From 923d2dd458d777209ae13852c233bb38d1edf51c Mon Sep 17 00:00:00 2001
From: Wu Libin
Date: Fri, 1 Aug 2014 10:24:41 +0800
Subject: [PATCH] LUDOC-221 nrs: Add tbf policy document

TBF is a new policy in NRS; this patch describes how to use it.

Signed-off-by: Li Xi
Signed-off-by: Wu Libin
Change-Id: I3f0fecc8d10d67a5326db3678a9db0663b7ae9e7
Reviewed-on: http://review.whamcloud.com/11302
Tested-by: Jenkins
Reviewed-by: Andreas Dilger
---
 LustreTuning.xml       |  93 ++++++++
 figures/TBF_policy.svg | 597 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 690 insertions(+)
 create mode 100644 figures/TBF_policy.svg

diff --git a/LustreTuning.xml b/LustreTuning.xml
index 54d947f..2864923 100644
--- a/LustreTuning.xml
+++ b/LustreTuning.xml
@@ -655,6 +655,99 @@ ost.OSS.ost_io.nrs_orr_supported=reg_supported:reads_and_writes
<indexterm>
  <primary>tuning</primary>
  <secondary>Network Request Scheduler (NRS) Tuning</secondary>
  <tertiary>Token Bucket Filter (TBF) policy</tertiary>
</indexterm>Token Bucket Filter (TBF) policy
TBF (Token Bucket Filter) is a Lustre NRS policy that enables Lustre services to enforce an RPC rate limit on clients or jobs for Quality of Service (QoS) purposes.
Figure: The internal structure of the TBF policy (figures/TBF_policy.svg)
When an RPC request arrives, the TBF policy places it into a waiting queue according to its classification. Depending on how TBF is configured, RPC requests are classified by either the NID or the JobID of the RPC. The TBF policy maintains multiple queues in the system, one queue for each category in the classification of RPC requests. Requests wait for tokens in their FIFO queue before they are handled, which keeps the RPC rates under the configured limits.

When Lustre services are too busy to handle all of the requests in time, not all of the specified rates of the queues can be satisfied. Nothing bad happens except that some of the RPC rates are slower than configured. In this case, a queue with a higher rate has an advantage over queues with lower rates, but none of them is starved.

The RPC rate of each queue does not need to be set manually. Instead, rules are defined that the TBF policy matches to determine RPC rate limits. All of the defined rules are organized as an ordered list. Whenever a queue is newly created, it goes through the rule list and takes the first matching rule as its rule, so that the queue knows its RPC token rate. A rule can be added to or removed from the list at run time. Whenever the list of rules changes, the queues update their matched rules.

ost.OSS.ost_io.nrs_tbf_rule
The format of the rule start command of the TBF policy is as follows:
$ lctl set_param x.x.x.nrs_tbf_rule=
"[reg|hp] start rule_name arguments..."
The 'rule_name' argument is a string that identifies a rule. The format of the 'arguments' varies with the type of the TBF policy. For the NID-based TBF policy, the format is as follows:
$ lctl set_param x.x.x.nrs_tbf_rule=
"[reg|hp] start rule_name {nidlist} rate"
The format of the 'nidlist' argument is the same as the format used when configuring an LNET route. 
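The token-bucket mechanism described above can be illustrated with a small stand-alone shell simulation. This is purely pedagogical, not Lustre code; the variable names and the fixed arrival pattern are invented for the example:

```shell
#!/bin/sh
# Illustrative token-bucket simulation (not Lustre code): a queue earns
# "rate" tokens per second and each queued RPC consumes one token, so the
# service rate can never exceed the configured limit.
rate=5        # tokens granted per second (cf. a rule's RPC rate)
tokens=0      # tokens currently available
served=0      # RPCs handled so far
for second in 1 2 3; do
  tokens=$((tokens + rate))
  pending=7   # pretend 7 RPCs arrive in the queue every second
  while [ "$pending" -gt 0 ] && [ "$tokens" -gt 0 ]; do
    pending=$((pending - 1))
    tokens=$((tokens - 1))
    served=$((served + 1))
  done
done
# Only rate * seconds = 15 of the 21 arrivals are served; the rest wait.
echo "served $served of 21 RPCs in 3 seconds"
```

Here 21 RPCs arrive over three seconds but only 15 are served, matching the configured rate of 5 per second; the remainder stay queued, which is exactly how a TBF queue keeps its RPC rate under the limit.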
The 'rate' argument is the RPC rate of the rule, i.e. the upper limit on the number of requests per second. The following commands are valid. Note that a newly started rule takes precedence over older rules, so the order in which rules are started is critical.
$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=
"start other_clients {192.168.*.*@tcp} 50"
$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=
"start loginnode {192.168.1.1@tcp} 100"
A general rule can be replaced by two rules (reg and hp) as follows:
$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=
"reg start loginnode {192.168.1.1@tcp} 100"
$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=
"hp start loginnode {192.168.1.1@tcp} 100"
$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=
"start computes {192.168.1.[2-128]@tcp} 500"
With the above rules, the server will process at most five times as many RPCs from compute nodes as from login nodes.
For the JobID-based TBF policy (see the JobID documentation for more details), the format is as follows:
$ lctl set_param x.x.x.nrs_tbf_rule=
"[reg|hp] start rule_name {jobid_list} rate"
The following commands are valid:
$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=
"start user1 {iozone.500 dd.500} 100"
$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=
"start iozone_user1 {iozone.500} 100"
As with the NID-based policy, reg and hp rules can be used separately:
$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=
"hp start iozone_user1 {iozone.500} 100"
$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=
"reg start iozone_user1 {iozone.500} 100"
The format of the rule change command of the TBF policy is as follows:
$ lctl set_param x.x.x.nrs_tbf_rule=
"[reg|hp] change rule_name rate"
The following commands are valid:
$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule="change loginnode 200"
$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule="reg change loginnode 200"
$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule="hp change loginnode 200"
The format of the rule stop command of the TBF policy is as follows:
$ lctl 
set_param x.x.x.nrs_tbf_rule="[reg|hp] stop rule_name"
The following commands are valid:
$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule="stop loginnode"
$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule="reg stop loginnode"
$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule="hp stop loginnode"
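Putting the start, change, and stop commands together, a complete rule lifecycle on an OSS might look like the sketch below. Two caveats: the `nrs_policies="tbf nid"` activation step is an assumption based on the NRS policy configuration described earlier in this chapter, and reading the rule list back with `lctl get_param` may produce output whose format varies by release; verify both against your Lustre version.

```shell
# Sketch of a full TBF rule lifecycle (NID-based setup assumed; the
# "tbf nid" policy argument should be checked for your Lustre release).
$ lctl set_param ost.OSS.ost_io.nrs_policies="tbf nid"                  # activate TBF
$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule="start loginnode {192.168.1.1@tcp} 100"
$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule="change loginnode 200"     # raise the limit
$ lctl get_param ost.OSS.ost_io.nrs_tbf_rule                            # inspect active rules
$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule="stop loginnode"           # remove the rule
```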
<indexterm><primary>tuning</primary><secondary>lockless I/O</secondary></indexterm>Lockless I/O Tunables

diff --git a/figures/TBF_policy.svg b/figures/TBF_policy.svg
new file mode 100644
index 0000000..931963c
--- /dev/null
+++ b/figures/TBF_policy.svg
@@ -0,0 +1,597 @@
[597 lines of SVG markup omitted from this excerpt]
-- 
1.8.3.1