+ <section remap="h4">
+ <title>Enable TBF policy</title>
+ <para>Command:</para>
+ <screen>lctl set_param ost.OSS.ost_io.nrs_policies="tbf <<replaceable>policy</replaceable>>"
+ </screen>
+ <para>RPCs can currently be classified by NID, JobID, opcode, or
+ UID/GID. When enabling the TBF policy, specify one of these types, or
+ use "tbf" alone to enable all of them for fine-grained classification
+ of RPC requests.</para>
+ <para>Example:</para>
+ <screen>$ lctl set_param ost.OSS.ost_io.nrs_policies="tbf"
+$ lctl set_param ost.OSS.ost_io.nrs_policies="tbf nid"
+$ lctl set_param ost.OSS.ost_io.nrs_policies="tbf jobid"
+$ lctl set_param ost.OSS.ost_io.nrs_policies="tbf opcode"
+$ lctl set_param ost.OSS.ost_io.nrs_policies="tbf uid"
+$ lctl set_param ost.OSS.ost_io.nrs_policies="tbf gid"</screen>
+ </section>
+ <section remap="h4">
+ <title>Start a TBF rule</title>
+ <para>The TBF rule is defined in the parameter
+ <literal>ost.OSS.ost_io.nrs_tbf_rule</literal>.</para>
+ <para>Command:</para>
+ <screen>lctl set_param x.x.x.nrs_tbf_rule=
+"[reg|hp] start <replaceable>rule_name</replaceable> <replaceable>arguments</replaceable>..."
+ </screen>
+ <para>'<replaceable>rule_name</replaceable>' is a string of up to 15
+ characters that names the TBF policy rule. Alphanumeric characters
+ and underscores are accepted (e.g., "test_rule_A1").
+ </para>
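+ <para>The naming constraint above can be checked before a rule is
+ submitted. The following is a minimal sketch in POSIX shell;
+ <literal>valid_rule_name</literal> is a hypothetical helper, not part of
+ <literal>lctl</literal>, and rejecting an empty name is an assumption
+ rather than documented behavior.</para>

```shell
# Hypothetical helper: returns 0 if $1 looks like an acceptable TBF rule
# name (1 to 15 characters, letters, digits and underscores only).
# Rejecting the empty string is an assumption of this sketch.
valid_rule_name() {
    name=$1
    # Reject empty names and names longer than 15 characters.
    if [ -z "$name" ] || [ "${#name}" -gt 15 ]; then
        return 1
    fi
    # Reject any character outside letters, digits and underscore.
    case $name in
        *[!A-Za-z0-9_]*) return 1 ;;
    esac
    return 0
}

valid_rule_name "test_rule_A1" && echo "test_rule_A1: accepted"
valid_rule_name "bad-name" || echo "bad-name: rejected"
```

+ <para>Checking the name locally only catches malformed names early;
+ the server still performs its own validation when the rule is
+ set.</para>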
+ <para>'<replaceable>arguments</replaceable>' is a string that specifies
+ the details of the rule, according to the TBF type in use.
+ </para>
+ <itemizedlist>
+ <para>The different types of TBF policies are described below.</para>
+ <listitem>
+ <para><emphasis role="bold">NID based TBF policy</emphasis></para>
+ <para>Command:</para>
+ <screen>lctl set_param x.x.x.nrs_tbf_rule=
+"[reg|hp] start <replaceable>rule_name</replaceable> nid={<replaceable>nidlist</replaceable>} rate=<replaceable>rate</replaceable>"
+ </screen>
+ <para>'<replaceable>nidlist</replaceable>' uses the same format
+ as an LNet route configuration. '<replaceable>rate</replaceable>' is
+ the upper limit on the RPC rate for the rule.</para>
+ <para>Example:</para>
+ <screen>$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
+"start other_clients nid={192.168.*.*@tcp} rate=50"
+$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
+"start computes nid={192.168.1.[2-128]@tcp} rate=500"
+$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
+"start loginnode nid={192.168.1.1@tcp} rate=100"</screen>
+ <para>In this example, RPC requests from compute nodes are processed
+ at most 5 times as fast as those from login nodes.
+ The output of <literal>ost.OSS.ost_io.nrs_tbf_rule</literal> looks
+ like:</para>
+ <screen>$ lctl get_param ost.OSS.ost_io.nrs_tbf_rule
+ost.OSS.ost_io.nrs_tbf_rule=
+regular_requests:
+CPT 0:
+loginnode {192.168.1.1@tcp} 100, ref 0
+computes {192.168.1.[2-128]@tcp} 500, ref 0
+other_clients {192.168.*.*@tcp} 50, ref 0
+default {*} 10000, ref 0
+high_priority_requests:
+CPT 0:
+loginnode {192.168.1.1@tcp} 100, ref 0
+computes {192.168.1.[2-128]@tcp} 500, ref 0
+other_clients {192.168.*.*@tcp} 50, ref 0
+default {*} 10000, ref 0</screen>
+ <para>The rule can also be prefixed with <literal>reg</literal> or
+ <literal>hp</literal> to apply it only to regular or high-priority
+ requests:</para>
+ <screen>$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
+"reg start loginnode nid={192.168.1.1@tcp} rate=100"
+$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
+"hp start loginnode nid={192.168.1.1@tcp} rate=100"</screen>
+ </listitem>
+ <listitem>
+ <para><emphasis role="bold">JobID based TBF policy</emphasis></para>
+ <para>For details about JobID, see
+ <xref xmlns:xlink="http://www.w3.org/1999/xlink"
+ linkend="jobstats" />.</para>
+ <para>Command:</para>
+ <screen>lctl set_param x.x.x.nrs_tbf_rule=
+"[reg|hp] start <replaceable>rule_name</replaceable> jobid={<replaceable>jobid_list</replaceable>} rate=<replaceable>rate</replaceable>"
+ </screen>
+ <para>Wildcards are supported in
+ {<replaceable>jobid_list</replaceable>}.</para>
+ <para>Example:</para>
+ <screen>$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
+"start iozone_user jobid={iozone.500} rate=100"
+$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
+"start dd_user jobid={dd.*} rate=50"
+$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
+"start user1 jobid={*.600} rate=10"
+$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
+"start user2 jobid={io*.10* *.500} rate=200"</screen>
+ <para>The rule can also be prefixed with <literal>reg</literal> or
+ <literal>hp</literal> to apply it only to regular or high-priority
+ requests:</para>
+ <screen>$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
+"hp start iozone_user1 jobid={iozone.500} rate=100"
+$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
+"reg start iozone_user1 jobid={iozone.500} rate=100"</screen>
+ </listitem>
+ <listitem>
+ <para><emphasis role="bold">Opcode based TBF policy</emphasis></para>
+ <para>Command:</para>
+ <screen>$ lctl set_param x.x.x.nrs_tbf_rule=
+"[reg|hp] start <replaceable>rule_name</replaceable> opcode={<replaceable>opcode_list</replaceable>} rate=<replaceable>rate</replaceable>"
+ </screen>
+ <para>Example:</para>
+ <screen>$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
+"start user1 opcode={ost_read} rate=100"
+$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
+"start iozone_user1 opcode={ost_read ost_write} rate=200"</screen>
+ <para>The rule can also be prefixed with <literal>reg</literal> or
+ <literal>hp</literal> to apply it only to regular or high-priority
+ requests:</para>
+ <screen>$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
+"hp start iozone_user1 opcode={ost_read} rate=100"
+$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
+"reg start iozone_user1 opcode={ost_read} rate=100"</screen>
+ </listitem>
+ <listitem>
+ <para><emphasis role="bold">UID/GID based TBF policy</emphasis></para>
+ <para>Command:</para>
+ <screen>$ lctl set_param ost.OSS.*.nrs_tbf_rule=\
+"[reg|hp] start <replaceable>rule_name</replaceable> uid={<replaceable>uid</replaceable>} rate=<replaceable>rate</replaceable>"
+$ lctl set_param ost.OSS.*.nrs_tbf_rule=\
+"[reg|hp] start <replaceable>rule_name</replaceable> gid={<replaceable>gid</replaceable>} rate=<replaceable>rate</replaceable>"</screen>
+ <para>Example:</para>
+ <para>Limit the rate of RPC requests from UID 500:</para>
+ <screen>$ lctl set_param ost.OSS.*.nrs_tbf_rule=\
+"start tbf_name uid={500} rate=100"</screen>
+ <para>Limit the rate of RPC requests from GID 500:</para>
+ <screen>$ lctl set_param ost.OSS.*.nrs_tbf_rule=\
+"start tbf_name gid={500} rate=100"</screen>
+ <para>The same kind of rule can also be used to control all requests
+ to the MDS.</para>
+ <para>Enable the TBF UID policy on the MDS:</para>
+ <screen>$ lctl set_param mds.MDS.*.nrs_policies="tbf uid"</screen>
+ <para>Limit the rate of RPC requests from UID 500:</para>
+ <screen>$ lctl set_param mds.MDS.*.nrs_tbf_rule=\
+"start tbf_name uid={500} rate=100"</screen>
+ </listitem>
+ <listitem>
+ <para><emphasis role="bold">Policy combination</emphasis></para>
+ <para>To support TBF rules with complex conditions, the TBF
+ classifier is extended to classify RPCs in a more fine-grained
+ way. This feature supports logical conjunction and disjunction
+ of conditions of different types.
+ In the rule:
+ "&" represents the conditional conjunction and
+ "," represents the conditional disjunction.</para>
+ <para>Example:</para>
+ <screen>$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
+"start comp_rule opcode={ost_write}&jobid={dd.0},\
+nid={192.168.1.[1-128]@tcp 0@lo} rate=100"</screen>
+ <para>In this example, RPCs whose <literal>opcode</literal> is
+ ost_write and whose <literal>jobid</literal> is dd.0, or whose
+ <literal>nid</literal> satisfies the condition
+ {192.168.1.[1-128]@tcp 0@lo}, are processed at a rate of 100
+ req/sec.
+ The output of <literal>ost.OSS.ost_io.nrs_tbf_rule</literal> looks like:
+ </para>
+ <screen>$ lctl get_param ost.OSS.ost_io.nrs_tbf_rule
+ost.OSS.ost_io.nrs_tbf_rule=
+regular_requests:
+CPT 0:
+comp_rule opcode={ost_write}&jobid={dd.0},nid={192.168.1.[1-128]@tcp 0@lo} 100, ref 0
+default * 10000, ref 0
+CPT 1:
+comp_rule opcode={ost_write}&jobid={dd.0},nid={192.168.1.[1-128]@tcp 0@lo} 100, ref 0
+default * 10000, ref 0
+high_priority_requests:
+CPT 0:
+comp_rule opcode={ost_write}&jobid={dd.0},nid={192.168.1.[1-128]@tcp 0@lo} 100, ref 0
+default * 10000, ref 0
+CPT 1:
+comp_rule opcode={ost_write}&jobid={dd.0},nid={192.168.1.[1-128]@tcp 0@lo} 100, ref 0
+default * 10000, ref 0</screen>
+ <para>Example:</para>
+ <screen>$ lctl set_param ost.OSS.*.nrs_tbf_rule=\
+"start tbf_name uid={500}&gid={500} rate=100"</screen>
+ <para>In this example, those RPC requests whose uid is 500 and
+ gid is 500 will be processed at the rate of 100 req/sec.</para>
+ </listitem>
+ </itemizedlist>
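+ <para>The way the combined rule <literal>comp_rule</literal> selects
+ requests can be sketched in POSIX shell. This is only an illustration of
+ the conjunction and disjunction described above, not how the server
+ evaluates rules; the function names and the idea of passing the opcode,
+ JobID and NID as arguments are assumptions made for the example.</para>

```shell
# Hypothetical check mirroring the rule
#   opcode={ost_write}&jobid={dd.0},nid={192.168.1.[1-128]@tcp 0@lo}
# "&" is treated as AND and "," as OR, with "&" binding tighter.

# Returns 0 if $1 is 0@lo or 192.168.1.N@tcp with N in 1..128.
nid_in_list() {
    case $1 in
        0@lo) return 0 ;;
        192.168.1.*@tcp)
            host=${1#192.168.1.}   # strip the network prefix
            host=${host%@tcp}      # strip the network type suffix
            case $host in
                ''|*[!0-9]*) return 1 ;;   # not a plain number
            esac
            if [ "$host" -ge 1 ] && [ "$host" -le 128 ]; then
                return 0
            fi
            return 1 ;;
    esac
    return 1
}

# $1: opcode, $2: jobid, $3: client NID
matches_comp_rule() {
    if [ "$1" = ost_write ] && [ "$2" = dd.0 ]; then
        return 0
    fi
    nid_in_list "$3"
}

matches_comp_rule ost_write dd.0 10.0.0.7@tcp && echo "match: opcode and jobid"
matches_comp_rule ost_read dd.0 192.168.1.5@tcp && echo "match: nid"
matches_comp_rule ost_read dd.0 10.0.0.7@tcp || echo "no match"
```

+ <para>A request that matches neither branch is not covered by
+ <literal>comp_rule</literal> and is left to the other rules in the
+ list.</para>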
+ </section>
+ <section remap="h4">
+ <title>Change a TBF rule</title>
+ <para>Command:</para>
+ <screen>lctl set_param x.x.x.nrs_tbf_rule=
+"[reg|hp] change <replaceable>rule_name</replaceable> rate=<replaceable>rate</replaceable>"
+ </screen>
+ <para>Example:</para>
+ <screen>$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
+"change loginnode rate=200"
+$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
+"reg change loginnode rate=200"
+$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
+"hp change loginnode rate=200"
+</screen>
+ </section>
+ <section remap="h4">
+ <title>Stop a TBF rule</title>
+ <para>Command:</para>
+ <screen>lctl set_param x.x.x.nrs_tbf_rule=
+"[reg|hp] stop <replaceable>rule_name</replaceable>"</screen>
+ <para>Example:</para>
+ <screen>$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule="stop loginnode"
+$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule="reg stop loginnode"
+$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule="hp stop loginnode"</screen>
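+ <para>When scripting the start, change and stop operations, it can
+ help to print the command first and review it before running it. The
+ following is a minimal sketch; <literal>tbf_rule_cmd</literal> and the
+ <literal>PARAM</literal> variable are hypothetical names, and the
+ function only prints a command line without calling
+ <literal>lctl</literal>.</para>

```shell
# Hypothetical dry-run helper: print (do not run) the lctl command for a
# TBF rule operation. PARAM is an assumption; adjust it to the service
# being tuned.
PARAM=ost.OSS.ost_io.nrs_tbf_rule

# $1: start | change | stop, $2: rule name, remaining: rule arguments
tbf_rule_cmd() {
    op=$1
    name=$2
    shift 2
    rule="$op $name"
    # Append arguments such as nid={...} or rate=N when given.
    if [ $# -gt 0 ]; then
        rule="$rule $*"
    fi
    printf 'lctl set_param %s="%s"\n' "$PARAM" "$rule"
}

tbf_rule_cmd start loginnode 'nid={192.168.1.1@tcp}' rate=100
tbf_rule_cmd change loginnode rate=200
tbf_rule_cmd stop loginnode
```

+ <para>The printed lines have the same form as the examples in the
+ preceding sections and can be pasted into a shell on the server once
+ they have been checked.</para>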
+ </section>
+ <section remap="h4">
+ <title>Rule options</title>
+ <para>To support more flexible rule conditions, the following options
+ are added.</para>
+ <itemizedlist>
+ <listitem>
+ <para><emphasis role="bold">Reordering of TBF rules</emphasis></para>
+ <para>By default, a newly started rule takes precedence over older
+ ones. Specifying the argument '<literal>rank=</literal>' when
+ inserting a new rule with the "<literal>start</literal>" command
+ changes the rank of the rule. The rank can also be changed with the
+ "<literal>change</literal>" command.
+ </para>
+ <para>Command:</para>
+ <screen>lctl set_param ost.OSS.ost_io.nrs_tbf_rule=
+"start <replaceable>rule_name</replaceable> <replaceable>arguments</replaceable>... rank=<replaceable>obj_rule_name</replaceable>"
+lctl set_param ost.OSS.ost_io.nrs_tbf_rule=
+"change <replaceable>rule_name</replaceable> rate=<replaceable>rate</replaceable> rank=<replaceable>obj_rule_name</replaceable>"
+</screen>
+ <para>When the existing rule
+ '<replaceable>obj_rule_name</replaceable>' is specified, the rule
+ '<replaceable>rule_name</replaceable>' is placed immediately before
+ '<replaceable>obj_rule_name</replaceable>'.</para>
+ <para>Example:</para>
+ <screen>$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
+"start computes nid={192.168.1.[2-128]@tcp} rate=500"
+$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
+"start user1 jobid={iozone.500 dd.500} rate=100"
+$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
+"start iozone_user1 opcode={ost_read ost_write} rate=200 rank=computes"</screen>
+ <para>In this example, rule "iozone_user1" is inserted immediately
+ before rule "computes". The order can be checked with the following
+ command:
+ </para>
+ <screen>$ lctl get_param ost.OSS.ost_io.nrs_tbf_rule
+ost.OSS.ost_io.nrs_tbf_rule=
+regular_requests:
+CPT 0:
+user1 jobid={iozone.500 dd.500} 100, ref 0
+iozone_user1 opcode={ost_read ost_write} 200, ref 0
+computes nid={192.168.1.[2-128]@tcp} 500, ref 0
+default * 10000, ref 0
+CPT 1:
+user1 jobid={iozone.500 dd.500} 100, ref 0
+iozone_user1 opcode={ost_read ost_write} 200, ref 0
+computes nid={192.168.1.[2-128]@tcp} 500, ref 0
+default * 10000, ref 0
+high_priority_requests:
+CPT 0:
+user1 jobid={iozone.500 dd.500} 100, ref 0
+iozone_user1 opcode={ost_read ost_write} 200, ref 0
+computes nid={192.168.1.[2-128]@tcp} 500, ref 0
+default * 10000, ref 0
+CPT 1:
+user1 jobid={iozone.500 dd.500} 100, ref 0
+iozone_user1 opcode={ost_read ost_write} 200, ref 0
+computes nid={192.168.1.[2-128]@tcp} 500, ref 0
+default * 10000, ref 0</screen>
+ </listitem>
+ <listitem>
+ <para><emphasis role="bold">TBF realtime policies under congestion
+ </emphasis></para>
+ <para>Evaluation of TBF showed that when the sum of the I/O
+ bandwidth requirements of all classes exceeds the system capacity,
+ classes with the same rate limit receive less bandwidth than their
+ evenly preconfigured share. The reason is that the heavy load on a
+ congested server causes some classes to miss deadlines, so the
+ number of calculated tokens may be larger than 1 during dequeuing.
+ In the original implementation, all classes are handled equally and
+ the excess tokens are simply discarded.</para>
+ <para>To address this, a Hard Token Compensation (HTC) strategy has
+ been implemented. A class can be configured with the HTC feature by
+ the rule it matches. This feature means that requests in this kind of
+ class queue have high real-time requirements and that the bandwidth
+ assignment must be satisfied as well as possible. When deadline
+ misses happen, the class keeps the deadline unchanged and the time
+ residue (the remainder of elapsed time divided by 1/r) is compensated
+ to the next round. This ensures that the next idle I/O thread will
+ always select this class to serve until all accumulated excess
+ tokens are handled or there are no pending requests in the class
+ queue.</para>
+ <para>Command:</para>
+ <para>The following command format enables the realtime feature
+ for a rule:</para>
+ <screen>lctl set_param x.x.x.nrs_tbf_rule=\
+"start <replaceable>rule_name</replaceable> <replaceable>arguments</replaceable>... realtime=1"</screen>
+ <para>Example:</para>
+ <screen>$ lctl set_param ost.OSS.ost_io.nrs_tbf_rule=\
+"start realjob jobid={dd.0} rate=100 realtime=1"</screen>
+ <para>With this example rule, RPC requests whose JobID is dd.0
+ are processed at a rate of 100 req/sec in realtime.</para>
+ </listitem>
+ </itemizedlist>
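+ <para>The token and time residue arithmetic mentioned for HTC can be
+ illustrated with a small calculation. This sketch is only arithmetic
+ derived from the description above, not the server implementation;
+ <literal>htc_tokens</literal> is a hypothetical name, times are
+ expressed in microseconds for convenience, and the rate is assumed to
+ be between 1 and 1000000 requests per second.</para>

```shell
# Illustrative arithmetic only, not the kernel implementation.
# $1: rate r in requests per second (assumed 1..1000000)
# $2: elapsed time in microseconds
htc_tokens() {
    interval_us=$((1000000 / $1))     # time per token, 1/r (truncated)
    tokens=$(($2 / interval_us))      # whole tokens accumulated
    residue_us=$(($2 % interval_us))  # remainder carried to the next round
    echo "tokens=$tokens residue_us=$residue_us"
}

# rate=100 req/sec gives one token every 10000 microseconds
htc_tokens 100 25000   # prints "tokens=2 residue_us=5000"
```

+ <para>In this calculation, a class with <literal>rate=100</literal>
+ that is served 25 ms late has accumulated two tokens, and the
+ remaining 5 ms is the residue that HTC carries over rather than
+ discards.</para>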
+ </section>
+ </section>
+ <section xml:id="delaytuning" condition='l2A'>
+ <title>
+ <indexterm>
+ <primary>tuning</primary>
+ <secondary>Network Request Scheduler (NRS) Tuning</secondary>
+ <tertiary>Delay policy</tertiary>
+ </indexterm>Delay policy</title>
+ <para>The NRS Delay policy seeks to perturb the timing of request
+ processing at the PtlRPC layer, with the goal of simulating high server
+ load, and finding and exposing timing related problems. When this policy
+ is active, upon arrival of a request the policy will calculate an offset,
+ within a defined, user-configurable range, from the request arrival
+ time, to determine a time after which the request should be handled.
+ The request is then stored using the cfs_binheap implementation,
+ which sorts requests according to their assigned start time.
+ Requests are removed from the binheap for handling once their start
+ time has passed.</para>
+ <para>The Delay policy can be enabled on all types of PtlRPC services,
+ and has the following tunables that can be used to adjust its behavior:
+ </para>