<?xml version='1.0' encoding='utf-8'?>
<chapter xmlns="http://docbook.org/ns/docbook"
-xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
-xml:id="lustretuning">
+ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
+ xml:id="lustretuning">
<title xml:id="lustretuning.title">Tuning a Lustre File System</title>
<para>This chapter contains information about tuning a Lustre file system for
better performance.</para>
service immediately and disables automatic thread creation behavior.
</para>
</note>
- <para condition='l23'>Lustre software release 2.3 introduced new
- parameters to provide more control to administrators.</para>
+ <para>Parameters are available to give administrators control
+ over the number of service threads.</para>
<itemizedlist>
<listitem>
<para>
in providing the read page service. The read page service handles
file close and readdir operations.</para>
</listitem>
- <listitem>
- <para>
- <literal>mds_attr_num_threads</literal> controls the number of threads
- in providing the setattr service to clients running Lustre software
- release 1.8.</para>
- </listitem>
</itemizedlist>
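+ <para>For example, the number of read page service threads could be set
+ at module load time with a line such as the following in
+ <literal>/etc/modprobe.d/lustre.conf</literal> (a sketch; this assumes
+ the parameter is exposed by the <literal>mds</literal> module):</para>
+ <screen>options mds mds_rdpg_num_threads=128</screen>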
</section>
</section>
- <section xml:id="dbdoclet.mdsbinding" condition='l23'>
+ <section xml:id="dbdoclet.mdsbinding">
<title>
<indexterm>
<primary>tuning</primary>
<secondary>MDS binding</secondary>
</indexterm>Binding MDS Service Thread to CPU Partitions</title>
- <para>With the introduction of Node Affinity (
- <xref linkend="nodeaffdef" />) in Lustre software release 2.3, MDS threads
- can be bound to particular CPU partitions (CPTs) to improve CPU cache
- usage and memory locality. Default values for CPT counts and CPU core
+ <para>With the Node Affinity (<xref linkend="nodeaffdef" />) feature,
+ MDS threads can be bound to particular CPU partitions (CPTs) to improve CPU
+ cache usage and memory locality. Default values for CPT counts and CPU core
bindings are selected automatically to provide good overall performance for
a given CPU count. However, an administrator can deviate from these settings
if they choose. For details on specifying the mapping of CPU cores to
to
<literal>CPT4</literal>.</para>
</listitem>
- <listitem>
- <para>
- <literal>mds_attr_num_cpts=[EXPRESSION]</literal> binds the setattr
- service threads to CPTs defined by
- <literal>EXPRESSION</literal>.</para>
- </listitem>
</itemizedlist>
- <para>Parameters must be set before module load in the file
+ <para>Parameters must be set before module load in the file
<literal>/etc/modprobe.d/lustre.conf</literal>. For example:
<example><title>lustre.conf</title>
<screen>options lnet networks=tcp0(eth0)
<para>By default, this parameter is off. As always, you should test the
performance to compare the impact of changing this parameter.</para>
</section>
- <section condition='l23'>
+ <section>
<title>
<indexterm>
<primary>tuning</primary>
<secondary>Network interface binding</secondary>
</indexterm>Binding Network Interface Against CPU Partitions</title>
- <para>Lustre software release 2.3 and beyond provide enhanced network
- interface control. The enhancement means that an administrator can bind
- an interface to one or more CPU partitions. Bindings are specified as
- options to the LNet modules. For more information on specifying module
- options, see
+ <para>Lustre provides enhanced network interface control, allowing an
+ administrator to bind an interface to one or more CPU partitions.
+ Bindings are specified as options to the LNet modules. For more
+ information on specifying module options, see
<xref linkend="dbdoclet.50438293_15350" /></para>
<para>For example,
<literal>o2ib0(ib0)[0,1]</literal> will ensure that all messages for
<screen>
ko2iblnd credits=256
</screen>
- <note condition="l23">
- <para>In Lustre software release 2.3 and beyond, LNet may revalidate
- the NI credits, so the administrator's request may not persist.</para>
+ <note>
+ <para>LNet may revalidate the NI credits, so the administrator's
+ request may not persist.</para>
</note>
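+ <para>The credit values actually in effect can be inspected once the
+ modules are loaded, for example with the <literal>lnetctl</literal>
+ utility, if it is installed:</para>
+ <screen># lnetctl net show -v</screen>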
</section>
<section>
<screen>
lnet large_router_buffers=8192
</screen>
- <note condition="l23">
- <para>In Lustre software release 2.3 and beyond, LNet may revalidate
- the router buffer setting, so the administrator's request may not
- persist.</para>
+ <note>
+ <para>LNet may revalidate the router buffer setting, so the
+ administrator's request may not persist.</para>
</note>
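+ <para>As with the NI credits, the router buffer counts are LNet module
+ parameters, so a persistent configuration in
+ <literal>/etc/modprobe.d/lustre.conf</literal> might look like the
+ following sketch (values must be tuned to the actual router load):</para>
+ <screen>options lnet tiny_router_buffers=1024 small_router_buffers=16384 large_router_buffers=8192</screen>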
</section>
<section>
events across all CPTs. This may balance load better across CPUs but
can incur cross-CPU overhead.</para>
<para>The current policy can be changed by an administrator with
- <literal>echo
- <replaceable>value</replaceable>>
- /proc/sys/lnet/portal_rotor</literal>. There are four options for
+ <literal>lctl set_param portal_rotor=<replaceable>value</replaceable></literal>.
+ There are four options for
<literal>
<replaceable>value</replaceable>
</literal>:</para>
be MAX.</para>
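+ <para>For example, the current policy can be inspected before changing
+ it:</para>
+ <screen># lctl get_param portal_rotor</screen>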
</section>
</section>
- <section xml:id="dbdoclet.libcfstuning" condition='l23'>
+ <section xml:id="dbdoclet.libcfstuning">
<title>
<indexterm>
<primary>tuning</primary>
<secondary>libcfs</secondary>
</indexterm>libcfs Tuning</title>
- <para>Lustre software release 2.3 introduced binding service threads via
- CPU Partition Tables (CPTs). This allows the system administrator to
- fine-tune on which CPU cores the Lustre service threads are run, for both
- OSS and MDS services, as well as on the client.
+ <para>Lustre allows binding service threads via CPU Partition Tables
+ (CPTs). This allows the system administrator to fine-tune on which CPU
+ cores the Lustre service threads are run, for both OSS and MDS services,
+ as well as on the client.
</para>
<para>CPTs are useful to reserve some cores on the OSS or MDS nodes for
system functions such as system monitoring, HA heartbeat, or similar
<literal>nscheds</literal> parameter. This adjusts the number of threads for
each partition, not the overall number of threads on the LND.</para>
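+ <para>For example, cores could be reserved for system tasks by leaving
+ them out of the CPU pattern, as in the following sketch for an 8-core
+ node that keeps cores 0 and 1 free:</para>
+ <screen>options libcfs cpu_pattern="0[2,3,4] 1[5,6,7]"</screen>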
<note>
- <para>Lustre software release 2.3 has greatly decreased the default
- number of threads for
+ <para>The default number of threads for
<literal>ko2iblnd</literal> and
- <literal>ksocklnd</literal> on high-core count machines. The current
- default values are automatically set and are chosen to work well across a
- number of typical scenarios.</para>
+ <literal>ksocklnd</literal> is set automatically and is chosen to
+ work well across a number of typical scenarios, on systems with both
+ high and low core counts.</para>
</note>
<section>
<title>ko2iblnd Tuning</title>
</entry>
<entry>
<para>Introduced in 2.10. Number of connections to each peer. Messages
- are sent round-robin over the connection pool. Provides signifiant
+ are sent round-robin over the connection pool. Provides significant
improvement with OmniPath.</para>
</entry>
</row>
</informaltable>
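+ <para>For example, multiple connections per peer could be enabled at
+ module load time (a sketch; the optimal value depends on the
+ fabric):</para>
+ <screen>options ko2iblnd conns_per_peer=4</screen>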
</section>
</section>
- <section xml:id="dbdoclet.nrstuning" condition='l24'>
+ <section xml:id="dbdoclet.nrstuning">
<title>
<indexterm>
<primary>tuning</primary>
<title>The internal structure of TBF policy</title>
<mediaobject>
<imageobject>
- <imagedata scalefit="1" width="100%"
- fileref="figures/TBF_policy.svg" />
+ <imagedata scalefit="1" width="50%"
+ fileref="figures/TBF_policy.png" />
</imageobject>
<textobject>
<phrase>The internal structure of TBF policy</phrase>
<para>
<emphasis role="bold">Client-side:</emphasis>
</para>
- <screen>
-/proc/fs/lustre/llite/lustre-*
-</screen>
+ <screen>llite.<replaceable>fsname</replaceable>-*</screen>
<para>
<literal>contention_seconds</literal>-
<literal>llite</literal> inode remembers its contended state for the
<emphasis role="bold">Client-side statistics:</emphasis>
</para>
<para>The
- <literal>/proc/fs/lustre/llite/lustre-*/stats</literal> file has new
- rows for lockless I/O statistics.</para>
+ <literal>llite.<replaceable>fsname</replaceable>-*.stats</literal>
+ parameter has several entries for lockless I/O statistics.</para>
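+ <para>For example, the lockless I/O counters can be read on a client
+ with:</para>
+ <screen>client$ lctl get_param llite.*.stats</screen>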
<para>
<literal>lockless_read_bytes</literal> and
<literal>lockless_write_bytes</literal>- To count the total bytes read
16MB. To temporarily change <literal>brw_size</literal>, the
following command should be run on the OSS:</para>
<screen>oss# lctl set_param obdfilter.<replaceable>fsname</replaceable>-OST*.brw_size=16</screen>
- <para>To persistently change <literal>brw_size</literal>, one of the following
- commands should be run on the OSS:</para>
+ <para>To persistently change <literal>brw_size</literal>, the
+ following command should be run:</para>
<screen>oss# lctl set_param -P obdfilter.<replaceable>fsname</replaceable>-OST*.brw_size=16</screen>
- <screen>oss# lctl conf_param <replaceable>fsname</replaceable>-OST*.obdfilter.brw_size=16</screen>
<para>When a client connects to an OST target, it will fetch
<literal>brw_size</literal> from the target and pick the maximum value
of <literal>brw_size</literal> and its local setting for
<screen>client$ lctl set_param osc.<replaceable>fsname</replaceable>-OST*.max_pages_per_rpc=16M</screen>
<para>To persistently make this change, the following command should
be run:</para>
- <screen>client$ lctl conf_param <replaceable>fsname</replaceable>-OST*.osc.max_pages_per_rpc=16M</screen>
+ <screen>client$ lctl set_param -P osc.<replaceable>fsname</replaceable>-OST*.max_pages_per_rpc=16M</screen>
<caution><para>The <literal>brw_size</literal> of an OST can be
changed on the fly. However, clients have to be remounted to
renegotiate the new maximum RPC size.</para></caution>
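+ <para>After remounting, the negotiated value can be verified on the
+ client:</para>
+ <screen>client$ lctl get_param osc.<replaceable>fsname</replaceable>-OST*.max_pages_per_rpc</screen>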
client is more likely to become CPU-bound during reads than writes.</para>
</section>
</chapter>
+<!--
+ vim:expandtab:shiftwidth=2:tabstop=8:
+ -->