<?xml version='1.0' encoding='UTF-8'?>
-<chapter xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0"
- xml:lang="en-US" xml:id="lustreproc">
+<chapter xmlns="http://docbook.org/ns/docbook"
+ xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
+ xml:id="lustreproc">
<title xml:id="lustreproc.title">Lustre Parameters</title>
- <para>The <literal>/proc</literal> and <literal>/sys</literal> file systems
- acts as an interface to internal data structures in the kernel. This chapter
- describes parameters and tunables that are useful for optimizing and
- monitoring aspects of a Lustre file system. It includes these sections:</para>
+ <para>There are many parameters for Lustre that can tune client and server
+ performance, change behavior of the system, and report statistics about
+ various subsystems. This chapter describes the various parameters and
+ tunables that are useful for optimizing and monitoring aspects of a Lustre
+ file system. It includes these sections:</para>
<itemizedlist>
<listitem>
- <para><xref linkend="dbdoclet.50438271_83523"/></para>
+ <para><xref linkend="enabling_interpreting_debugging_logs"/></para>
<para>.</para>
</listitem>
</itemizedlist>
</para>
<para>Typically, metrics are accessed via <literal>lctl get_param</literal>
 files and settings are changed via <literal>lctl set_param</literal>.
- While it is possible to access parameters in <literal>/proc</literal>
+ They allow getting and setting multiple parameters with a single command,
+ through the use of wildcards in one or more parts of the parameter name.
+ While each of these parameters maps to files in <literal>/proc</literal>
and <literal>/sys</literal> directly, the location of these parameters may
- change between releases, so it is recommended to always use
+ change between Lustre releases, so it is recommended to always use
<literal>lctl</literal> to access the parameters from userspace scripts.
Some data is server-only, some data is client-only, and some data is
exported from the client to the server and is thus duplicated in both
<para>In the examples in this chapter, <literal>#</literal> indicates
a command is entered as root. Lustre servers are named according to the
convention <literal><replaceable>fsname</replaceable>-<replaceable>MDT|OSTnumber</replaceable></literal>.
- The standard UNIX wildcard designation (*) is used.</para>
+ The standard UNIX wildcard designation (*) is used to represent any
+ part of a single component of the parameter name, excluding
+ "<literal>.</literal>" and "<literal>/</literal>".
+ It is also possible to use brace <literal>{}</literal> expansion
+ to specify a list of parameter names efficiently.</para>
</note>
<para>Some examples are shown below:</para>
<itemizedlist>
<listitem>
- <para> To obtain data from a Lustre client:</para>
- <screen># lctl list_param osc.*
-osc.testfs-OST0000-osc-ffff881071d5cc00
-osc.testfs-OST0001-osc-ffff881071d5cc00
-osc.testfs-OST0002-osc-ffff881071d5cc00
-osc.testfs-OST0003-osc-ffff881071d5cc00
-osc.testfs-OST0004-osc-ffff881071d5cc00
-osc.testfs-OST0005-osc-ffff881071d5cc00
-osc.testfs-OST0006-osc-ffff881071d5cc00
-osc.testfs-OST0007-osc-ffff881071d5cc00
-osc.testfs-OST0008-osc-ffff881071d5cc00</screen>
+ <para> To list available OST targets on a Lustre client:</para>
+ <screen># lctl list_param -F osc.*
+osc.testfs-OST0000-osc-ffff881071d5cc00/
+osc.testfs-OST0001-osc-ffff881071d5cc00/
+osc.testfs-OST0002-osc-ffff881071d5cc00/
+osc.testfs-OST0003-osc-ffff881071d5cc00/
+osc.testfs-OST0004-osc-ffff881071d5cc00/
+osc.testfs-OST0005-osc-ffff881071d5cc00/
+osc.testfs-OST0006-osc-ffff881071d5cc00/
+osc.testfs-OST0007-osc-ffff881071d5cc00/
+osc.testfs-OST0008-osc-ffff881071d5cc00/</screen>
<para>In this example, information about OST connections available
- on a client is displayed (indicated by "osc").</para>
+ on a client is displayed (indicated by "osc"). Each of these
+ connections may have numerous sub-parameters as well.</para>
</listitem>
</itemizedlist>
<itemizedlist>
</itemizedlist>
<itemizedlist>
<listitem>
+ <para> To see a specific subset of parameters, use braces, like:
+<screen># lctl list_param osc.*.{checksum,connect}*
+osc.testfs-OST0000-osc-ffff881071d5cc00.checksum_type
+osc.testfs-OST0000-osc-ffff881071d5cc00.checksums
+osc.testfs-OST0000-osc-ffff881071d5cc00.connect_flags
+</screen></para>
+ </listitem>
+ </itemizedlist>
+ <itemizedlist>
+ <listitem>
<para> To view a specific file, use <literal>lctl get_param</literal>:
<screen># lctl get_param osc.lustre-OST0000*.rpc_stats</screen></para>
</listitem>
</itemizedlist>
<para>For more information about using <literal>lctl</literal>, see <xref
- xmlns:xlink="http://www.w3.org/1999/xlink" linkend="dbdoclet.50438194_51490"/>.</para>
+ xmlns:xlink="http://www.w3.org/1999/xlink" linkend="setting_param_with_lctl"/>.</para>
<para>Data can also be viewed using the <literal>cat</literal> command
with the full path to the file. The form of the <literal>cat</literal>
command is similar to that of the <literal>lctl get_param</literal>
version and the Lustre version being used. The <literal>lctl</literal>
command insulates scripts from these changes and is preferred over direct
file access, unless as part of a high-performance monitoring system.
- In the <literal>cat</literal> command:</para>
- <itemizedlist>
- <listitem>
- <para>Replace the dots in the path with slashes.</para>
- </listitem>
- <listitem>
- <para>Prepend the path with the appropriate directory component:
- <screen>/{proc,sys}/{fs,sys}/{lustre,lnet}</screen></para>
- </listitem>
- </itemizedlist>
- <para>For example, an <literal>lctl get_param</literal> command may look like
- this:<screen># lctl get_param osc.*.uuid
-osc.testfs-OST0000-osc-ffff881071d5cc00.uuid=594db456-0685-bd16-f59b-e72ee90e9819
-osc.testfs-OST0001-osc-ffff881071d5cc00.uuid=594db456-0685-bd16-f59b-e72ee90e9819
-...</screen></para>
- <para>The equivalent <literal>cat</literal> command may look like this:
- <screen># cat /proc/fs/lustre/osc/*/uuid
-594db456-0685-bd16-f59b-e72ee90e9819
-594db456-0685-bd16-f59b-e72ee90e9819
-...</screen></para>
- <para>or like this:
- <screen># cat /sys/fs/lustre/osc/*/uuid
-594db456-0685-bd16-f59b-e72ee90e9819
-594db456-0685-bd16-f59b-e72ee90e9819
-...</screen></para>
+ </para>
+ <note condition='l2c'><para>Starting in Lustre 2.12, the
+ <literal>lctl get_param</literal> and <literal>lctl set_param</literal>
+ commands can provide <emphasis>tab completion</emphasis> when using an
+ interactive shell with <literal>bash-completion</literal> installed.
+ This simplifies the use of <literal>get_param</literal> significantly,
+ since it provides an interactive list of available parameters.
+ </para></note>
<para>The <literal>llstat</literal> utility can be used to monitor some
Lustre file system I/O activity over a specified time period. For more
details, see
- <xref xmlns:xlink="http://www.w3.org/1999/xlink" linkend="dbdoclet.50438219_23232"/></para>
+ <xref xmlns:xlink="http://www.w3.org/1999/xlink" linkend="config_llstat"/></para>
<para>Some data is imported from attached clients and is available in a
directory called <literal>exports</literal> located in the corresponding
per-service directory on a Lustre server. For example:
<listitem>
<para><literal>brw_stats</literal> – Histogram data characterizing I/O requests to the
OSTs. For more details, see <xref xmlns:xlink="http://www.w3.org/1999/xlink"
- linkend="dbdoclet.50438271_55057"/>.</para>
+ linkend="monitor_ost_block_io_stream"/>.</para>
</listitem>
<listitem>
<para><literal>rpc_stats</literal> – Histogram data showing information about RPCs made by
to that point in the table of calls (<literal>cum %</literal>). </para>
</section>
</section>
- <section xml:id="dbdoclet.50438271_55057">
+ <section xml:id="monitor_ost_block_io_stream">
<title><indexterm>
<primary>proc</primary>
<secondary>block I/O</secondary>
</indexterm>Monitoring the OST Block I/O Stream</title>
- <para>The <literal>brw_stats</literal> file in the <literal>obdfilter</literal> directory
- contains histogram data showing statistics for number of I/O requests sent to the disk,
- their size, and whether they are contiguous on the disk or not.</para>
+ <para>The <literal>brw_stats</literal> parameter file below the
+ <literal>osd-ldiskfs</literal> or <literal>osd-zfs</literal> directory
+ contains histogram data showing statistics for number of I/O requests
+ sent to the disk, their size, and whether they are contiguous on the
+ disk or not.</para>
<para><emphasis role="italic"><emphasis role="bold">Example:</emphasis></emphasis></para>
- <para>Enter on the OSS:</para>
- <screen># lctl get_param obdfilter.testfs-OST0000.brw_stats
+ <para>Enter on the OSS or MDS:</para>
+ <screen>oss# lctl get_param osd-*.*.brw_stats
snapshot_time: 1372775039.769045 (secs.usecs)
read | write
pages per bulk r/w rpcs % cum % | rpcs % cum %
512K: 0 0 100 | 24 0 0
1M: 0 0 100 | 23142 99 100
</screen>
- <para>The tabular data is described in the table below. Each row in the table shows the number
- of reads and writes occurring for the statistic (<literal>ios</literal>), the relative
- percentage of total reads or writes (<literal>%</literal>), and the cumulative percentage to
- that point in the table for the statistic (<literal>cum %</literal>). </para>
+ <para>The tabular data is described in the table below. Each row in the
+ table shows the number of reads and writes occurring for the statistic
+ (<literal>ios</literal>), the relative percentage of total reads or
+ writes (<literal>%</literal>), and the cumulative percentage to that
+ point in the table for the statistic (<literal>cum %</literal>). </para>
<informaltable frame="all">
<tgroup cols="2">
<colspec colname="c1" colwidth="40*"/>
</para>
</listitem>
<listitem>
- <para><literal>llite.<replaceable>fsname_instance</replaceable>.max_cache_mb</literal>
- - Maximum amount of inactive data cached by the client. The
- default value is 3/4 of the client RAM.
+ <para><literal>llite.<replaceable>fsname_instance</replaceable>.max_cached_mb</literal>
+ - Maximum amount of read+write data cached by the client. The
+ default value is 1/2 of the client RAM.
</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><literal>llite.<replaceable>fsname_instance</replaceable>.max_read_ahead_mb</literal>
- - Controls the maximum amount of data readahead on a file.
+ - Controls the maximum amount of data readahead on all files.
Files are read ahead in RPC-sized chunks (4 MiB, or the size of
the <literal>read()</literal> call, if larger) after the second
sequential read on a file descriptor. Random reads are done at
<para><literal>llite.<replaceable>fsname_instance</replaceable>.max_read_ahead_per_file_mb</literal>
- Controls the maximum number of megabytes (MiB) of data that
should be prefetched by the client when sequential reads are
- detected on a file. This is the per-file readahead limit and
+ detected on one file. This is the per-file readahead limit and
cannot be larger than <literal>max_read_ahead_mb</literal>.
</para>
</listitem>
<title><indexterm>
<primary>proc</primary>
<secondary>read cache</secondary>
- </indexterm>Tuning OSS Read Cache</title>
- <para>The OSS read cache feature provides read-only caching of data on an OSS. This
- functionality uses the Linux page cache to store the data and uses as much physical memory
+ </indexterm>Tuning Server Read Cache</title>
+ <para>The server read cache feature provides read-only caching of file
+ data on an OSS or MDS (for Data-on-MDT). This functionality uses the
+ Linux page cache to store the data and uses as much physical memory
as is allocated.</para>
- <para>OSS read cache improves Lustre file system performance in these situations:</para>
+ <para>The server read cache can improve Lustre file system performance
+ in these situations:</para>
<itemizedlist>
<listitem>
- <para>Many clients are accessing the same data set (as in HPC applications or when
- diskless clients boot from the Lustre file system).</para>
+ <para>Many clients are accessing the same data set (as in HPC
+ applications or when diskless clients boot from the Lustre file
+ system).</para>
</listitem>
<listitem>
- <para>One client is storing data while another client is reading it (i.e., clients are
- exchanging data via the OST).</para>
+ <para>One client is writing data while another client is reading
+ it (i.e., clients are exchanging data via the filesystem).</para>
</listitem>
<listitem>
<para>A client has very limited caching of its own.</para>
</listitem>
</itemizedlist>
- <para>OSS read cache offers these benefits:</para>
+ <para>The server read cache offers these benefits:</para>
<itemizedlist>
<listitem>
- <para>Allows OSTs to cache read data more frequently.</para>
+ <para>Allows servers to cache read data more frequently.</para>
</listitem>
<listitem>
- <para>Improves repeated reads to match network speeds instead of disk speeds.</para>
+ <para>Improves repeated reads to match network speeds instead of
+ storage speeds.</para>
</listitem>
<listitem>
- <para>Provides the building blocks for OST write cache (small-write aggregation).</para>
+ <para>Provides the building blocks for server write cache
+ (small-write aggregation).</para>
</listitem>
</itemizedlist>
<section remap="h4">
- <title>Using OSS Read Cache</title>
- <para>OSS read cache is implemented on the OSS, and does not require any special support on
- the client side. Since OSS read cache uses the memory available in the Linux page cache,
- the appropriate amount of memory for the cache should be determined based on I/O patterns;
- if the data is mostly reads, then more cache is required than would be needed for mostly
- writes.</para>
- <para>OSS read cache is managed using the following tunables:</para>
+ <title>Using Server Read Cache</title>
+ <para>The server read cache is implemented on the OSS and MDS, and does
+ not require any special support on the client side. Since the server
+ read cache uses the memory available in the Linux page cache, the
+ appropriate amount of memory for the cache should be determined based
+ on I/O patterns. If the data is mostly reads, then more cache is
+ beneficial on the server than would be needed for mostly writes.
+ </para>
+ <para>The server read cache is managed using the following tunables.
+ Many tunables are available for both <literal>osd-ldiskfs</literal>
+ and <literal>osd-zfs</literal>, but in some cases the implementation
+ of <literal>osd-zfs</literal> prevents their use.</para>
<itemizedlist>
<listitem>
- <para><literal>read_cache_enable</literal> - Controls whether data read from disk during
- a read request is kept in memory and available for later read requests for the same
- data, without having to re-read it from disk. By default, read cache is enabled
- (<literal>read_cache_enable=1</literal>).</para>
- <para>When the OSS receives a read request from a client, it reads data from disk into
- its memory and sends the data as a reply to the request. If read cache is enabled,
- this data stays in memory after the request from the client has been fulfilled. When
- subsequent read requests for the same data are received, the OSS skips reading data
- from disk and the request is fulfilled from the cached data. The read cache is managed
- by the Linux kernel globally across all OSTs on that OSS so that the least recently
- used cache pages are dropped from memory when the amount of free memory is running
- low.</para>
- <para>If read cache is disabled (<literal>read_cache_enable=0</literal>), the OSS
- discards the data after a read request from the client is serviced and, for subsequent
- read requests, the OSS again reads the data from disk.</para>
- <para>To disable read cache on all the OSTs of an OSS, run:</para>
- <screen>root@oss1# lctl set_param obdfilter.*.read_cache_enable=0</screen>
- <para>To re-enable read cache on one OST, run:</para>
- <screen>root@oss1# lctl set_param obdfilter.{OST_name}.read_cache_enable=1</screen>
- <para>To check if read cache is enabled on all OSTs on an OSS, run:</para>
- <screen>root@oss1# lctl get_param obdfilter.*.read_cache_enable</screen>
+ <para><literal>read_cache_enable</literal> - High-level control of
+ whether data read from storage during a read request is kept in
+ memory and available for later read requests for the same data,
+ without having to re-read it from storage. By default, read cache
+ is enabled (<literal>read_cache_enable=1</literal>) for HDD OSDs
+ and automatically disabled for flash OSDs
+ (<literal>nonrotational=1</literal>).
+ The read cache cannot be disabled for <literal>osd-zfs</literal>,
+ and as a result this parameter is unavailable for that backend.
+ </para>
+ <para>When the server receives a read request from a client,
+ it reads data from storage into its memory and sends the data
+ to the client. If read cache is enabled for the target,
+ and the RPC and object size also meet the other criteria below,
+ this data may stay in memory after the client request has
+ completed. If later read requests for the same data are received
+ while the data is still in cache, the server skips reading it from
+ storage. The cache is managed by the Linux kernel globally
+ across all targets on that server so that infrequently used
+ cache pages are dropped from memory when free memory is
+ running low.</para>
+ <para>If read cache is disabled
+ (<literal>read_cache_enable=0</literal>), or the read or object
+ is large enough that it will not benefit from caching, the server
+ discards the data after the read request from the client is
+ completed. For subsequent read requests the server again reads
+ the data from storage.</para>
+ <para>To disable read cache on all targets of a server, run:</para>
+ <screen>
+ oss1# lctl set_param osd-*.*.read_cache_enable=0
+ </screen>
+ <para>To re-enable read cache on one target, run:</para>
+ <screen>
+ oss1# lctl set_param osd-*.{target_name}.read_cache_enable=1
+ </screen>
+ <para>To check if read cache is enabled on targets on a server, run:
+ </para>
+ <screen>
+ oss1# lctl get_param osd-*.*.read_cache_enable
+ </screen>
</listitem>
<listitem>
- <para><literal>writethrough_cache_enable</literal> - Controls whether data sent to the
- OSS as a write request is kept in the read cache and available for later reads, or if
- it is discarded from cache when the write is completed. By default, the writethrough
- cache is enabled (<literal>writethrough_cache_enable=1</literal>).</para>
- <para>When the OSS receives write requests from a client, it receives data from the
- client into its memory and writes the data to disk. If the writethrough cache is
- enabled, this data stays in memory after the write request is completed, allowing the
- OSS to skip reading this data from disk if a later read request, or partial-page write
- request, for the same data is received.</para>
+ <para><literal>writethrough_cache_enable</literal> - High-level
+ control of whether data sent to the server as a write request is
+ kept in the read cache and available for later reads, or if it is
+ discarded when the write completes. By default, writethrough
+ cache is enabled (<literal>writethrough_cache_enable=1</literal>)
+ for HDD OSDs and automatically disabled for flash OSDs
+ (<literal>nonrotational=1</literal>).
+ The write cache cannot be disabled for <literal>osd-zfs</literal>,
+ and as a result this parameter is unavailable for that backend.
+ </para>
+ <para>When the server receives write requests from a client, it
+ fetches data from the client into its memory and writes the data
+ to storage. If the writethrough cache is enabled for the target,
+ and the RPC and object size meet the other criteria below,
+ this data may stay in memory after the write request has
+ completed. If later read or partial-block write requests for this
+ same data are received while the data is still in cache, the server
+ skips reading it from storage.
+ </para>
<para>If the writethrough cache is disabled
- (<literal>writethrough_cache_enabled=0</literal>), the OSS discards the data after
- the write request from the client is completed. For subsequent read requests, or
- partial-page write requests, the OSS must re-read the data from disk.</para>
- <para>Enabling writethrough cache is advisable if clients are doing small or unaligned
- writes that would cause partial-page updates, or if the files written by one node are
- immediately being accessed by other nodes. Some examples where enabling writethrough
- cache might be useful include producer-consumer I/O models or shared-file writes with
- a different node doing I/O not aligned on 4096-byte boundaries. </para>
- <para>Disabling the writethrough cache is advisable when files are mostly written to the
- file system but are not re-read within a short time period, or files are only written
- and re-read by the same node, regardless of whether the I/O is aligned or not.</para>
- <para>To disable the writethrough cache on all OSTs of an OSS, run:</para>
- <screen>root@oss1# lctl set_param obdfilter.*.writethrough_cache_enable=0</screen>
+ (<literal>writethrough_cache_enable=0</literal>), or the
+ write or object is large enough that it will not benefit from
+ caching, the server discards the data after the write request
+ from the client is completed. For subsequent read requests, or
+ partial-page write requests, the server must re-read the data
+ from storage.</para>
+ <para>Enabling writethrough cache is advisable if clients are doing
+ small or unaligned writes that would cause partial-page updates,
+ or if the files written by one node are immediately being read by
+ other nodes. Some examples where enabling writethrough cache
+ might be useful include producer-consumer I/O models or
+ shared-file writes that are not aligned on 4096-byte boundaries.
+ </para>
+ <para>Disabling the writethrough cache is advisable when files are
+ mostly written to the file system but are not re-read within a
+ short time period, or files are only written and re-read by the
+ same node, regardless of whether the I/O is aligned or not.</para>
+ <para>To disable writethrough cache on all targets on a server, run:
+ </para>
+ <screen>
+ oss1# lctl set_param osd-*.*.writethrough_cache_enable=0
+ </screen>
<para>To re-enable the writethrough cache on one OST, run:</para>
- <screen>root@oss1# lctl set_param obdfilter.{OST_name}.writethrough_cache_enable=1</screen>
+ <screen>
+ oss1# lctl set_param osd-*.{OST_name}.writethrough_cache_enable=1
+ </screen>
<para>To check if the writethrough cache is enabled, run:</para>
- <screen>root@oss1# lctl get_param obdfilter.*.writethrough_cache_enable</screen>
+ <screen>
+ oss1# lctl get_param osd-*.*.writethrough_cache_enable
+ </screen>
</listitem>
<listitem>
- <para><literal>readcache_max_filesize</literal> - Controls the maximum size of a file
- that both the read cache and writethrough cache will try to keep in memory. Files
- larger than <literal>readcache_max_filesize</literal> will not be kept in cache for
- either reads or writes.</para>
- <para>Setting this tunable can be useful for workloads where relatively small files are
- repeatedly accessed by many clients, such as job startup files, executables, log
- files, etc., but large files are read or written only once. By not putting the larger
- files into the cache, it is much more likely that more of the smaller files will
- remain in cache for a longer time.</para>
- <para>When setting <literal>readcache_max_filesize</literal>, the input value can be
- specified in bytes, or can have a suffix to indicate other binary units such as
- <literal>K</literal> (kilobytes), <literal>M</literal> (megabytes),
- <literal>G</literal> (gigabytes), <literal>T</literal> (terabytes), or
- <literal>P</literal> (petabytes).</para>
- <para>To limit the maximum cached file size to 32 MB on all OSTs of an OSS, run:</para>
- <screen>root@oss1# lctl set_param obdfilter.*.readcache_max_filesize=32M</screen>
- <para>To disable the maximum cached file size on an OST, run:</para>
- <screen>root@oss1# lctl set_param obdfilter.{OST_name}.readcache_max_filesize=-1</screen>
- <para>To check the current maximum cached file size on all OSTs of an OSS, run:</para>
- <screen>root@oss1# lctl get_param obdfilter.*.readcache_max_filesize</screen>
+ <para><literal>readcache_max_filesize</literal> - Controls the
+ maximum size of an object that both the read cache and
+ writethrough cache will try to keep in memory. Objects larger
+ than <literal>readcache_max_filesize</literal> will not be kept
+ in cache for either reads or writes regardless of the
+ <literal>read_cache_enable</literal> or
+ <literal>writethrough_cache_enable</literal> settings.</para>
+ <para>Setting this tunable can be useful for workloads where
+ relatively small objects are repeatedly accessed by many clients,
+ such as job startup objects, executables, log objects, etc., but
+ large objects are read or written only once. By not putting the
+ larger objects into the cache, it is much more likely that more
+ of the smaller objects will remain in cache for a longer time.
+ </para>
+ <para>When setting <literal>readcache_max_filesize</literal>,
+ the input value can be specified in bytes, or can have a suffix
+ to indicate other binary units such as
+ <literal>K</literal> (kibibytes),
+ <literal>M</literal> (mebibytes),
+ <literal>G</literal> (gibibytes),
+ <literal>T</literal> (tebibytes), or
+ <literal>P</literal> (pebibytes).</para>
+ <para>
+ To limit the maximum cached object size to 64 MiB on all OSTs of
+ a server, run:
+ </para>
+ <screen>
+ oss1# lctl set_param osd-*.*.readcache_max_filesize=64M
+ </screen>
+ <para>To disable the maximum cached object size on all targets, run:
+ </para>
+ <screen>
+ oss1# lctl set_param osd-*.*.readcache_max_filesize=-1
+ </screen>
+ <para>
+ To check the current maximum cached object size on all targets of
+ a server, run:
+ </para>
+ <screen>
+ oss1# lctl get_param osd-*.*.readcache_max_filesize
+ </screen>
+ </listitem>
+ <listitem>
+ <para><literal>readcache_max_io_mb</literal> - Controls the maximum
+ size of a single read IO that will be cached in memory. Reads
+ larger than <literal>readcache_max_io_mb</literal> will be read
+ directly from storage and bypass the page cache completely.
+ This avoids significant CPU overhead at high IO rates.
+ The read cache cannot be disabled for <literal>osd-zfs</literal>,
+ and as a result this parameter is unavailable for that backend.
+ </para>
+ <para>When setting <literal>readcache_max_io_mb</literal>, the
+ input value can be specified in mebibytes, or can have a suffix
+ to indicate other binary units such as
+ <literal>K</literal> (kibibytes),
+ <literal>M</literal> (mebibytes),
+ <literal>G</literal> (gibibytes),
+ <literal>T</literal> (tebibytes), or
+ <literal>P</literal> (pebibytes).</para>
+ </listitem>
+ <listitem>
+ <para><literal>writethrough_max_io_mb</literal> - Controls the
+ maximum size of a single write IO that will be cached in memory.
+ Writes larger than <literal>writethrough_max_io_mb</literal> will
+ be written directly to storage and bypass the page cache entirely.
+ This avoids significant CPU overhead at high IO rates.
+ The write cache cannot be disabled for <literal>osd-zfs</literal>,
+ and as a result this parameter is unavailable for that backend.
+ </para>
+ <para>When setting <literal>writethrough_max_io_mb</literal>, the
+ input value can be specified in mebibytes, or can have a suffix
+ to indicate other binary units such as
+ <literal>K</literal> (kibibytes),
+ <literal>M</literal> (mebibytes),
+ <literal>G</literal> (gibibytes),
+ <literal>T</literal> (tebibytes), or
+ <literal>P</literal> (pebibytes).</para>
</listitem>
</itemizedlist>
</section>
<screen>$ lctl get_param obdfilter.*.sync_on_lock_cancel
obdfilter.lol-OST0001.sync_on_lock_cancel=never</screen>
</section>
- <section xml:id="dbdoclet.TuningModRPCs" condition='l28'>
+ <section xml:id="TuningModRPCs" condition='l28'>
<title>
<indexterm>
<primary>proc</primary>
<literal>rtr </literal></para>
</entry>
<entry>
- <para>Number of routing buffer credits.</para>
+ <para>Number of available routing buffer credits.</para>
</entry>
</row>
<row>
<literal>tx </literal></para>
</entry>
<entry>
- <para>Number of send credits.</para>
+ <para>Number of available send credits.</para>
</entry>
</row>
<row>
</tbody>
</tgroup>
</informaltable>
- <para>Credits are initialized to allow a certain number of operations (in the example
- above the table, eight as shown in the <literal>max</literal> column. LNet keeps track
- of the minimum number of credits ever seen over time showing the peak congestion that
- has occurred during the time monitored. Fewer available credits indicates a more
- congested resource. </para>
- <para>The number of credits currently in flight (number of transmit credits) is shown in
- the <literal>tx</literal> column. The maximum number of send credits available is shown
- in the <literal>max</literal> column and never changes. The number of router buffers
- available for consumption by a peer is shown in the <literal>rtr</literal>
- column.</para>
- <para>Therefore, <literal>rtr</literal> – <literal>tx</literal> is the number of transmits
- in flight. Typically, <literal>rtr == max</literal>, although a configuration can be set
- such that <literal>max >= rtr</literal>. The ratio of routing buffer credits to send
- credits (<literal>rtr/tx</literal>) that is less than <literal>max</literal> indicates
- operations are in progress. If the ratio <literal>rtr/tx</literal> is greater than
- <literal>max</literal>, operations are blocking.</para>
- <para>LNet also limits concurrent sends and number of router buffers allocated to a single
- peer so that no peer can occupy all these resources.</para>
+ <para>Credits are initialized to allow a certain number of operations
+ (in the example above the table, eight as shown in the
+ <literal>max</literal> column). LNet keeps track of the minimum
+ number of credits ever seen over time, showing the peak congestion
+ that has occurred during the time monitored. Fewer available credits
+ indicate a more congested resource. </para>
+ <para>The number of credits currently available is shown in the
+ <literal>tx</literal> column. The maximum number of send credits is
+ shown in the <literal>max</literal> column and never changes. The
+ number of currently active transmits can be derived by
+ <literal>(max - tx)</literal>, as long as
+ <literal>tx</literal> is greater than or equal to 0. Once
+ <literal>tx</literal> is less than 0, it indicates the number of
+ transmits on that peer which have been queued for lack of credits.
+ </para>
+ <para>The number of router buffer credits available for consumption
+ by a peer is shown in the <literal>rtr</literal> column. The number of
+ routing credits can be configured separately at the LND level or at
+ the LNet level by using the <literal>peer_buffer_credits</literal>
+ module parameter for the appropriate module. If the routing credits
+ are not set explicitly, they default to the maximum transmit credits
+ defined by the <literal>peer_credits</literal> module parameter.
+ Whenever a gateway routes a message from a peer, it decrements the
+ number of available routing credits for that peer. If that value
+ goes to zero, then messages will be queued. Negative values show the
+ number of queued messages waiting to be routed. The number of
+ messages which are currently being routed from a peer can be derived
+ by <literal>(max_rtr_credits - rtr)</literal>.</para>
+ <para>LNet also limits concurrent sends and number of router buffers
+ allocated to a single peer so that no peer can occupy all resources.
+ </para>
</listitem>
<listitem>
- <para><literal>nis</literal> - Shows the current queue health on this node.</para>
+ <para><literal>nis</literal> - Shows current queue health on the node.
+ </para>
<para>Example:</para>
<screen># lctl get_param nis
nid refs peer max tx min
</listitem>
</itemizedlist></para>
</section>
- <section remap="h3" xml:id="dbdoclet.balancing_free_space">
+ <section remap="h3" xml:id="balancing_free_space">
<title><indexterm>
<primary>proc</primary>
<secondary>free space</secondary>
space is more than this. The default is 0.2% of total OST size.</para>
</listitem>
</itemizedlist>
- <para>For more information about monitoring and managing free space, see <xref
- xmlns:xlink="http://www.w3.org/1999/xlink" linkend="dbdoclet.50438209_10424"/>.</para>
+ <para>For more information about monitoring and managing free space, see
+ <xref xmlns:xlink="http://www.w3.org/1999/xlink"
+ linkend="file_striping.managing_free_space"/>.</para>
</section>
<section remap="h3">
<title><indexterm>
<itemizedlist>
<listitem>
<para>To enable automatic LRU sizing, set the
- <literal>lru_size</literal> parameter to 0. In this case, the
- <literal>lru_size</literal> parameter shows the current number of locks
+ <literal>lru_size</literal> parameter to 0. In this case, the
+ <literal>lru_size</literal> parameter shows the current number of locks
being used on the client. Dynamic LRU resizing is enabled by default.
- </para>
+ </para>
</listitem>
<listitem>
<para>To specify a maximum number of locks, set the
- <literal>lru_size</literal> parameter to a value other than zero.
- A good default value for compute nodes is around
- <literal>100 * <replaceable>num_cpus</replaceable></literal>.
+ <literal>lru_size</literal> parameter to a value other than zero.
+ A good default value for compute nodes is around
+ <literal>100 * <replaceable>num_cpus</replaceable></literal>.
It is recommended that you only set <literal>lru_size</literal>
- to be signifivantly larger on a few login nodes where multiple
- users access the file system interactively.</para>
+ to be significantly larger on a few login nodes where multiple
+ users access the file system interactively.</para>
</listitem>
</itemizedlist>
<para>To clear the LRU on a single client, and, as a result, flush client
<note>
<para>The <literal>lru_size</literal> parameter can only be set
temporarily using <literal>lctl set_param</literal>, it cannot be set
- permanently.</para>
+ permanently.</para>
</note>
<para>To disable dynamic LRU resizing on the clients, run for example:
</para>
ldlm.namespaces.myth-MDT0000-mdc-ffff8804296c2800.lru_max_age=900000
</screen>
</section>
- <section xml:id="dbdoclet.50438271_87260">
+ <section xml:id="tuning_setting_thread_count">
<title><indexterm>
<primary>proc</primary>
<secondary>thread counts</secondary>
<screen># lctl set_param <replaceable>service</replaceable>.threads_<replaceable>min|max|started=num</replaceable> </screen>
</listitem>
<listitem>
- <para>To permanently set this tunable, run:</para>
- <screen># lctl conf_param <replaceable>obdname|fsname.obdtype</replaceable>.threads_<replaceable>min|max|started</replaceable> </screen>
- <para condition='l25'>For version 2.5 or later, run:
- <screen># lctl set_param -P <replaceable>service</replaceable>.threads_<replaceable>min|max|started</replaceable></screen></para>
+ <para>To permanently set this tunable, run the following command on
+ the MGS:
+ <screen>mgs# lctl set_param -P <replaceable>service</replaceable>.threads_<replaceable>min|max|started</replaceable></screen></para>
+ <para condition='l25'>For Lustre 2.5 or earlier, run:
+ <screen>mgs# lctl conf_param <replaceable>obdname|fsname.obdtype</replaceable>.threads_<replaceable>min|max|started</replaceable></screen>
+ </para>
</listitem>
</itemizedlist>
- <para>The following examples show how to set thread counts and get the number of running threads
- for the service <literal>ost_io</literal> using the tunable
- <literal><replaceable>service</replaceable>.threads_<replaceable>min|max|started</replaceable></literal>.</para>
+ <para>The following examples show how to set thread counts and get the
+ number of running threads for the service <literal>ost_io</literal>
+ using the tunable
+ <literal><replaceable>service</replaceable>.threads_<replaceable>min|max|started</replaceable></literal>.</para>
<itemizedlist>
<listitem>
<para>To get the number of running threads, run:</para>
</note>
<para>See also <xref xmlns:xlink="http://www.w3.org/1999/xlink" linkend="lustretuning"/></para>
</section>
- <section xml:id="dbdoclet.50438271_83523">
+ <section xml:id="enabling_interpreting_debugging_logs">
<title><indexterm>
<primary>proc</primary>
<secondary>debug</secondary>
<section>
<title>Interpreting OST Statistics</title>
<note>
- <para>See also <xref linkend="dbdoclet.50438219_84890"/> (<literal>llobdstat</literal>) and
- <xref linkend="dbdoclet.50438273_80593"/> (<literal>collectl</literal>).</para>
+ <para>See also
+ <xref linkend="collectl"/> (<literal>collectl</literal>).</para>
</note>
<para>OST <literal>stats</literal> files can be used to provide statistics showing activity
for each OST. For example:</para>
<section>
<title>Interpreting MDT Statistics</title>
<note>
- <para>See also <xref linkend="dbdoclet.50438219_84890"/> (<literal>llobdstat</literal>) and
- <xref linkend="dbdoclet.50438273_80593"/> (<literal>collectl</literal>).</para>
+ <para>See also
+ <xref linkend="collectl"/> (<literal>collectl</literal>).</para>
</note>
<para>MDT <literal>stats</literal> files can be used to track MDT
statistics for the MDS. The example below shows sample output from an
</section>
</section>
</chapter>
+<!--
+ vim:expandtab:shiftwidth=2:tabstop=8:
+ -->