1 <?xml version='1.0' encoding='UTF-8'?>
<chapter xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US" xml:id="configuringlustre">
3 <title xml:id="configuringlustre.title">Configuring Lustre</title>
<para>This chapter shows how to configure a simple Lustre system consisting of a combined MGS/MDT, an OST and a client. It includes:</para>
7 <para><xref linkend="dbdoclet.50438267_50692"/>
11 <para><xref linkend="dbdoclet.50438267_76752"/>
15 <section xml:id="dbdoclet.50438267_50692">
17 <indexterm><primary>Lustre</primary><secondary>configuring</secondary></indexterm>
18 Configuring a Simple Lustre File System</title>
19 <para>A Lustre system can be set up in a variety of configurations by using the administrative utilities provided with Lustre. The procedure below shows how to configure a simple Lustre file system consisting of a combined MGS/MDS, one OSS with two OSTs, and a client. For an overview of the entire Lustre installation procedure, see <xref linkend="installoverview"/>.</para>
20 <para>This configuration procedure assumes you have completed the following:</para>
24 <emphasis role="bold">Set up and configured your hardware</emphasis>
25 </emphasis>. For more information about hardware requirements, see <xref linkend="settinguplustresystem"/>.</para>
28 <para><emphasis role="bold">Downloaded and installed the Lustre software.</emphasis> For more information about preparing for and installing the Lustre software, see <xref linkend="installinglustre"/>.</para>
31 <para>The following optional steps should also be completed, if needed, before the Lustre software is configured:</para>
34 <para><emphasis>Set up a hardware or software RAID on block devices to be used as OSTs or MDTs.</emphasis> For information about setting up RAID, see the documentation for your RAID controller or <xref linkend="configuringstorage"/>.</para>
37 <para><emphasis>Set up network interface bonding on Ethernet interfaces.</emphasis> For information about setting up network interface bonding, see <xref linkend="settingupbonding"/>.</para>
<para><emphasis>Set</emphasis> <literal>lnet</literal> <emphasis>module parameters to specify how Lustre Networking (LNET) is to be configured to work with Lustre, and test the LNET configuration.</emphasis> By default, LNET uses the first TCP/IP interface it discovers on a system. If this network configuration is sufficient, you do not need to configure LNET. LNET configuration is required if you are using InfiniBand or multiple Ethernet interfaces.</para>
43 <para>For information about configuring LNET, see <xref linkend="configuringlnet"/>. For information about testing LNET, see <xref linkend="lnetselftest"/>.</para>
46 <para><emphasis>Run the benchmark script <literal>sgpdd-survey</literal> to determine
47 baseline performance of your hardware.</emphasis> Benchmarking your hardware will
48 simplify debugging performance issues that are unrelated to Lustre and ensure you are
49 getting the best possible performance with your installation. For information about
50 running <literal>sgpdd-survey</literal>, see <xref linkend="benchmarkingtests"/>.</para>
54 <para>The <literal>sgpdd-survey</literal> script overwrites the device being tested so it must
55 be run before the OSTs are configured.</para>
57 <para>To configure a simple Lustre file system, complete these steps:</para>
60 <para>Create a combined MGS/MDT file system on a block device. On the MDS node, run:</para>
61 <screen>mkfs.lustre --fsname=<replaceable>fsname</replaceable> --mgs --mdt --index=0 <replaceable>/dev/block_device</replaceable></screen>
62 <para>The default file system name (<literal>fsname</literal>) is <literal>lustre</literal>.</para>
64 <para>If you plan to create multiple file systems, the MGS should be created separately on its own dedicated block device, by running:</para>
65 <screen>mkfs.lustre --fsname=<replaceable>fsname</replaceable> --mgs <replaceable>/dev/block_device</replaceable></screen>
66 <para>See <xref linkend="dbdoclet.50438194_88063"/> for more details.</para>
69 <listitem xml:id="dbdoclet.addmdtindex">
<para>Optional for Lustre 2.4 and later. Add additional MDTs.</para>
71 <screen>mkfs.lustre --fsname=<replaceable>fsname</replaceable> --mgsnode=<replaceable>nid</replaceable> --mdt --index=1 <replaceable>/dev/block_device</replaceable></screen>
72 <note><para>Up to 4095 additional MDTs can be added.</para></note>
75 <para>Mount the combined MGS/MDT file system on the block device. On the MDS node, run:</para>
76 <screen>mount -t lustre <replaceable>/dev/block_device</replaceable> <replaceable>/mount_point</replaceable></screen>
78 <para>If you have created an MGS and an MDT on separate block devices, mount them both.</para>
81 <listitem xml:id="dbdoclet.50438267_pgfId-1290915">
82 <para>Create the OST. On the OSS node, run:</para>
83 <screen>mkfs.lustre --fsname=<replaceable>fsname</replaceable> --mgsnode=<replaceable>MGS_NID</replaceable> --ost --index=<replaceable>OST_index</replaceable> <replaceable>/dev/block_device</replaceable></screen>
<para>When you create an OST, you are formatting an <literal>ldiskfs</literal> file system on a block storage device, as you would with any local file system.</para>
85 <para>You can have as many OSTs per OSS as the hardware or drivers allow. For more information about storage and memory requirements for a Lustre file system, see <xref linkend="settinguplustresystem"/>.</para>
86 <para>You can only configure one OST per block device. You should create an OST that uses the raw block device and does not use partitioning.</para>
87 <para>You should specify the OST index number at format time in order to simplify translating the OST number in error messages or file striping to the OSS node and block device later on.</para>
<para>If you are using block devices that are accessible from multiple OSS nodes, ensure that you mount the OSTs from only one OSS node at a time. It is strongly recommended that multiple-mount protection be enabled for such devices to prevent serious data corruption. For more information about multiple-mount protection, see <xref linkend="managingfailover"/>.</para>
<para>Lustre currently supports block devices up to 128 TB on RHEL 5/6 (up to 8 TB on other distributions). If the device size is only slightly larger than 16 TB, it is recommended that you limit the file system size to 16 TB at format time. We recommend that you do not place DOS partitions on top of RAID 5/6 block devices because of the negative impact on performance; instead, format the whole disk for the file system.</para>
93 <listitem xml:id="dbdoclet.50438267_pgfId-1293955">
94 <para>Mount the OST. On the OSS node where the OST was created, run:</para>
95 <screen>mount -t lustre <replaceable>/dev/block_device</replaceable> <replaceable>/mount_point</replaceable></screen>
98 To create additional OSTs, repeat Step <xref linkend="dbdoclet.50438267_pgfId-1290915"/> and Step <xref linkend="dbdoclet.50438267_pgfId-1293955"/>, specifying the next higher OST index number.</para>
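<para>The repetition described above can be sketched as a small dry-run script. It only prints the commands; it does not format or mount anything. The helper names <literal>ost_label</literal> and <literal>plan_osts</literal>, the device list and the <literal>/mnt/ostN</literal> mount points are hypothetical, and the four-digit hexadecimal label format is an assumption based on labels such as <literal>temp-OST0001</literal>.</para>

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the format and mount commands for several OSTs.
# Nothing is executed; the device list and mount points are hypothetical.

# Print the target label for an OST, assuming a four-digit hexadecimal index
# (the pattern seen in labels such as temp-OST0001).
ost_label() {
    printf '%s-OST%04x\n' "$1" "$2"
}

# Print one mkfs.lustre and one mount command per device, numbering the
# OSTs upward from the given starting index.
plan_osts() {
    local fsname="$1" mgs_nid="$2" index="$3"
    shift 3
    local device
    for device in "$@"; do
        echo "# $(ost_label "$fsname" "$index") on $device"
        echo "mkfs.lustre --fsname=$fsname --mgsnode=$mgs_nid --ost --index=$index $device"
        echo "mount -t lustre $device /mnt/ost$index"
        index=$((index + 1))
    done
}

plan_osts temp 10.2.0.1@tcp0 0 /dev/sdc /dev/sdd
```

<para>Keeping the index explicit in the printed plan makes it easier to map a label that appears in an error message back to the OSS node and block device it was formatted on.</para>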
101 <listitem xml:id="dbdoclet.50438267_pgfId-1290934">
102 <para>Mount the Lustre file system on the client. On the client node, run:</para>
103 <screen>mount -t lustre <replaceable>MGS_node</replaceable>:/<replaceable>fsname</replaceable> <replaceable>/mount_point</replaceable>
106 <para>To create additional clients, repeat Step <xref linkend="dbdoclet.50438267_pgfId-1290934"/>.</para>
<para>If you have a problem mounting the file system, check the syslogs on the client and all the servers for errors and also check the network settings. A common issue with newly installed systems is that <literal>hosts.deny</literal> or firewall rules may prevent connections on port 988.</para>
<para>Verify that the file system started and is working correctly. Do this by running the <literal>lfs df</literal>, <literal>dd</literal> and <literal>ls</literal> commands on the client node.</para>
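<para>One way to check whether a firewall is blocking port 988 is to try opening a TCP connection from the client to a server. The sketch below is a minimal example, not a Lustre utility: the function name <literal>tcp_port_open</literal> is hypothetical, it assumes that bash was built with <literal>/dev/tcp</literal> redirection support and that the coreutils <literal>timeout</literal> command is available, and <literal>10.2.0.1</literal> is a placeholder for the address of your MGS node.</para>

```shell
#!/usr/bin/env bash
# Sketch: test whether a TCP connection to a server's Lustre port can be opened.
# Relies on bash's /dev/tcp redirection and the coreutils timeout command.

# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
tcp_port_open() {
    local host="$1" port="${2:-988}"
    timeout 3 bash -c ': > "/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example: 10.2.0.1 is a placeholder for the address of your MGS node.
if tcp_port_open 10.2.0.1 988; then
    echo "port 988 reachable"
else
    echo "port 988 blocked or unreachable"
fi
```

<para>A successful connection only shows that the port is reachable. It does not confirm that the Lustre services behind it are healthy, so the syslogs remain the primary source of information.</para>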
116 <para><emphasis>(Optional)</emphasis> Run benchmarking tools to validate the performance of hardware and software layers in the cluster. Available tools include:</para>
119 <para><literal>obdfilter-survey</literal> - Characterizes the storage performance of a
120 Lustre file system. For details, see <xref linkend="dbdoclet.50438212_26516"/>.</para>
123 <para><literal>ost-survey</literal> - Performs I/O against OSTs to detect anomalies
124 between otherwise identical disk subsystems. For details, see <xref
125 linkend="dbdoclet.50438212_85136"/>.</para>
132 <indexterm><primary>Lustre</primary><secondary>configuring</secondary><tertiary>simple example</tertiary></indexterm>
133 Simple Lustre Configuration Example</title>
134 <para>To see the steps in a simple Lustre configuration, follow this example in which a combined MGS/MDT and two OSTs are created. Three block devices are used, one for the combined MGS/MDS node and one for each OSS node. Common parameters used in the example are listed below, along with individual node parameters.</para>
135 <informaltable frame="all">
137 <colspec colname="c1" colwidth="2*"/>
138 <colspec colname="c2" colwidth="25*"/>
139 <colspec colname="c3" colwidth="25*"/>
140 <colspec colname="c4" colwidth="25*"/>
143 <entry nameend="c2" namest="c1">
144 <para><emphasis role="bold">Common Parameters</emphasis></para>
147 <para><emphasis role="bold">Value</emphasis></para>
150 <para><emphasis role="bold">Description</emphasis></para>
160 <para> <emphasis role="bold">MGS node</emphasis></para>
163 <para> <literal>10.2.0.1@tcp0</literal></para>
166 <para>Node for the combined MGS/MDS</para>
174 <para> <emphasis role="bold">file system</emphasis></para>
177 <para><literal> temp</literal></para>
180 <para>Name of the Lustre file system</para>
188 <para> <emphasis role="bold">network type</emphasis></para>
191 <para> <literal>TCP/IP</literal></para>
194 <para>Network type used for Lustre file system temp</para>
200 <informaltable frame="all">
202 <colspec colname="c1" colwidth="25*"/>
203 <colspec colname="c2" colwidth="25*"/>
204 <colspec colname="c3" colwidth="25*"/>
205 <colspec colname="c4" colwidth="25*"/>
208 <entry nameend="c2" namest="c1">
209 <para><emphasis role="bold">Node Parameters</emphasis></para>
212 <para><emphasis role="bold">Value</emphasis></para>
215 <para><emphasis role="bold">Description</emphasis></para>
221 <entry nameend="c4" namest="c1">
222 <para> MGS/MDS node</para>
230 <para> <emphasis role="bold">MGS/MDS node</emphasis></para>
233 <para> <literal>mdt0</literal></para>
236 <para>MDS in Lustre file system temp</para>
244 <para> <emphasis role="bold">block device</emphasis></para>
247 <para> <literal>/dev/sdb</literal></para>
250 <para>Block device for the combined MGS/MDS node</para>
258 <para> <emphasis role="bold">mount point</emphasis></para>
261 <para> <literal>/mnt/mdt</literal></para>
264 <para>Mount point for the <literal>mdt0</literal> block device (<literal>/dev/sdb</literal>) on the MGS/MDS node</para>
268 <entry nameend="c4" namest="c1">
269 <para> First OSS node</para>
277 <para> <emphasis role="bold">OSS node</emphasis></para>
280 <para><literal> oss0</literal></para>
283 <para>First OSS node in Lustre file system temp</para>
291 <para> <emphasis role="bold">OST</emphasis></para>
294 <para><literal>ost0</literal></para>
297 <para>First OST in Lustre file system temp</para>
305 <para> <emphasis role="bold">block device</emphasis></para>
308 <para> <literal>/dev/sdc</literal></para>
311 <para>Block device for the first OSS node (<literal>oss0</literal>)</para>
319 <para> <emphasis role="bold">mount point</emphasis></para>
322 <para> <literal>/mnt/ost0</literal></para>
<para> Mount point for the <literal>ost0</literal> block device (<literal>/dev/sdc</literal>) on the <literal>oss0</literal> node</para>
329 <entry nameend="c4" namest="c1">
330 <para> Second OSS node</para>
338 <para> <emphasis role="bold">OSS node</emphasis></para>
341 <para><literal>oss1</literal></para>
344 <para>Second OSS node in Lustre file system temp</para>
352 <para> <emphasis role="bold">OST</emphasis></para>
355 <para> <literal>ost1</literal></para>
358 <para>Second OST in Lustre file system temp</para>
364 <para> <emphasis role="bold">block device</emphasis></para>
367 <para><literal>/dev/sdd</literal></para>
<para>Block device for the second OSS node (<literal>oss1</literal>)</para>
378 <para> <emphasis role="bold">mount point</emphasis></para>
381 <para><literal>/mnt/ost1</literal></para>
384 <para> Mount point for the <literal>ost1</literal> block device (<literal>/dev/sdd</literal>) on the <literal>oss1</literal> node</para>
388 <entry nameend="c4" namest="c1">
389 <para> Client node</para>
397 <para> <emphasis role="bold">client node</emphasis></para>
400 <para> <literal>client1</literal></para>
403 <para>Client in Lustre file system temp</para>
411 <para> <emphasis role="bold">mount point</emphasis></para>
414 <para> <literal>/lustre</literal></para>
417 <para>Mount point for Lustre file system temp on the <literal>client1</literal> node</para>
424 <para>We recommend that you use 'dotted-quad' notation for IP addresses rather than host names to make it easier to read debug logs and debug configurations with multiple interfaces.</para>
426 <para>For this example, complete the steps below:</para>
429 <para>Create a combined MGS/MDT file system on the block device. On the MDS node, run:</para>
430 <screen>[root@mds /]# mkfs.lustre --fsname=temp --mgs --mdt --index=0 /dev/sdb</screen>
431 <para>This command generates this output:</para>
432 <screen> Permanent disk data:
438 (MDT MGS first_time update )
439 Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
440 Parameters: mdt.identity_upcall=/usr/sbin/l_getidentity
442 checking for existing Lustre data: not found
445 formatting backing filesystem ldiskfs on /dev/sdb
446 target name temp-MDTffff
448 options -i 4096 -I 512 -q -O dir_index,uninit_groups -F
449 mkfs_cmd = mkfs.ext2 -j -b 4096 -L temp-MDTffff -i 4096 -I 512 -q -O
450 dir_index,uninit_groups -F /dev/sdb
451 Writing CONFIGS/mountdata </screen>
454 <para>Mount the combined MGS/MDT file system on the block device. On the MDS node, run:</para>
455 <screen>[root@mds /]# mount -t lustre /dev/sdb /mnt/mdt</screen>
456 <para>This command generates this output:</para>
457 <screen>Lustre: temp-MDT0000: new disk, initializing
458 Lustre: 3009:0:(lproc_mds.c:262:lprocfs_wr_identity_upcall()) temp-MDT0000:
459 group upcall set to /usr/sbin/l_getidentity
460 Lustre: temp-MDT0000.mdt: set parameter identity_upcall=/usr/sbin/l_getidentity
461 Lustre: Server temp-MDT0000 on device /dev/sdb has started </screen>
463 <listitem xml:id="dbdoclet.50438267_pgfId-1291170">
464 <para>Create and mount <literal>ost0</literal>.</para>
465 <para>In this example, the OSTs (<literal>ost0</literal> and <literal>ost1</literal>) are being created on different OSS nodes (<literal>oss0</literal> and <literal>oss1</literal> respectively).</para>
468 <para>Create <literal>ost0</literal>. On <literal>oss0</literal> node, run:</para>
469 <screen>[root@oss0 /]# mkfs.lustre --fsname=temp --mgsnode=10.2.0.1@tcp0 --ost --index=0 /dev/sdc</screen>
470 <para>The command generates this output:</para>
471 <screen> Permanent disk data:
477 (OST first_time update)
478 Persistent mount opts: errors=remount-ro,extents,mballoc
479 Parameters: mgsnode=10.2.0.1@tcp
481 checking for existing Lustre data: not found
484 formatting backing filesystem ldiskfs on /dev/sdc
485 target name temp-OST0000
487 options -I 256 -q -O dir_index,uninit_groups -F
488 mkfs_cmd = mkfs.ext2 -j -b 4096 -L temp-OST0000 -I 256 -q -O
489 dir_index,uninit_groups -F /dev/sdc
490 Writing CONFIGS/mountdata </screen>
493 <para>Mount ost0 on the OSS on which it was created. On <literal>oss0</literal> node, run:</para>
<screen>[root@oss0 /]# mount -t lustre /dev/sdc /mnt/ost0</screen>
495 <para>The command generates this output:</para>
496 <screen>LDISKFS-fs: file extents enabled
497 LDISKFS-fs: mballoc enabled
498 Lustre: temp-OST0000: new disk, initializing
Lustre: Server temp-OST0000 on device /dev/sdc has started</screen>
500 <para>Shortly afterwards, this output appears:</para>
501 <screen>Lustre: temp-OST0000: received MDS connection from 10.2.0.1@tcp0
502 Lustre: MDS temp-MDT0000: temp-OST0000_UUID now active, resetting orphans </screen>
507 <para>Create and mount <literal>ost1</literal>.</para>
510 <para>Create ost1. On <literal>oss1</literal> node, run:</para>
511 <screen>[root@oss1 /]# mkfs.lustre --fsname=temp --mgsnode=10.2.0.1@tcp0 \
512 --ost --index=1 /dev/sdd</screen>
513 <para>The command generates this output:</para>
514 <screen> Permanent disk data:
520 (OST first_time update)
521 Persistent mount opts: errors=remount-ro,extents,mballoc
522 Parameters: mgsnode=10.2.0.1@tcp
524 checking for existing Lustre data: not found
527 formatting backing filesystem ldiskfs on /dev/sdd
528 target name temp-OST0001
530 options -I 256 -q -O dir_index,uninit_groups -F
531 mkfs_cmd = mkfs.ext2 -j -b 4096 -L temp-OST0001 -I 256 -q -O
dir_index,uninit_groups -F /dev/sdd
533 Writing CONFIGS/mountdata </screen>
536 <para>Mount ost1 on the OSS on which it was created. On <literal>oss1</literal> node, run:</para>
<screen>[root@oss1 /]# mount -t lustre /dev/sdd /mnt/ost1</screen>
538 <para>The command generates this output:</para>
539 <screen>LDISKFS-fs: file extents enabled
540 LDISKFS-fs: mballoc enabled
541 Lustre: temp-OST0001: new disk, initializing
Lustre: Server temp-OST0001 on device /dev/sdd has started</screen>
543 <para>Shortly afterwards, this output appears:</para>
544 <screen>Lustre: temp-OST0001: received MDS connection from 10.2.0.1@tcp0
545 Lustre: MDS temp-MDT0000: temp-OST0001_UUID now active, resetting orphans </screen>
550 <para>Mount the Lustre file system on the client. On the client node, run:</para>
<screen>[root@client1 /]# mount -t lustre 10.2.0.1@tcp0:/temp /lustre</screen>
552 <para>This command generates this output:</para>
553 <screen>Lustre: Client temp-client has started</screen>
<para>Verify that the file system started and is working by running the <literal>lfs df</literal>, <literal>dd</literal> and <literal>ls</literal> commands on the client node.</para>
559 <para>Run the <literal>lfs df -h</literal> command:</para>
<screen>[root@client1 /]# lfs df -h</screen>
561 <para>The <literal>lfs df -h</literal> command lists space usage per OST and the MDT in human-readable format. This command generates output similar to this:</para>
563 UUID bytes Used Available Use% Mounted on
564 temp-MDT0000_UUID 8.0G 400.0M 7.6G 0% /lustre[MDT:0]
565 temp-OST0000_UUID 800.0G 400.0M 799.6G 0% /lustre[OST:0]
566 temp-OST0001_UUID 800.0G 400.0M 799.6G 0% /lustre[OST:1]
567 filesystem summary: 1.6T 800.0M 1.6T 0% /lustre</screen>
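<para>The <literal>filesystem summary</literal> line in the sample output above reflects the combined capacity of the OSTs. As a rough check, the following sketch adds up the per-OST sizes from that sample. The function name <literal>sum_ost_sizes</literal> is hypothetical, and the script assumes that every OST size is reported in G and that sizes use binary units.</para>

```shell
#!/usr/bin/env bash
# Sketch: add up the per-OST sizes from the sample "lfs df -h" output above.
# Assumes every OST size is reported in G, as in this sample.

sum_ost_sizes() {
    awk '$1 ~ /-OST/ { sub(/G$/, "", $2); total += $2 }
         END { printf "%.1fT\n", total / 1024 }'
}

printf '%s\n' \
    'temp-MDT0000_UUID 8.0G 400.0M 7.6G 0% /lustre[MDT:0]' \
    'temp-OST0000_UUID 800.0G 400.0M 799.6G 0% /lustre[OST:0]' \
    'temp-OST0001_UUID 800.0G 400.0M 799.6G 0% /lustre[OST:1]' \
    | sum_ost_sizes
```

<para>On a live client, the same function could read the output of <literal>lfs df -h</literal> through a pipe, provided the sizes are all in the same unit.</para>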
570 <para>Run the <literal>lfs df -ih</literal> command.</para>
<screen>[root@client1 /]# lfs df -ih</screen>
572 <para>The <literal>lfs df -ih</literal> command lists inode usage per OST and the MDT. This command generates output similar to this:</para>
574 UUID Inodes IUsed IFree IUse% Mounted on
575 temp-MDT0000_UUID 2.5M 32 2.5M 0% /lustre[MDT:0]
576 temp-OST0000_UUID 5.5M 54 5.5M 0% /lustre[OST:0]
577 temp-OST0001_UUID 5.5M 54 5.5M 0% /lustre[OST:1]
578 filesystem summary: 2.5M 32 2.5M 0% /lustre</screen>
581 <para>Run the <literal>dd</literal> command:</para>
<screen>[root@client1 /]# cd /lustre
[root@client1 /lustre]# dd if=/dev/zero of=/lustre/zero.dat bs=4M count=2</screen>
584 <para>The <literal>dd</literal> command verifies write functionality by creating a file containing all zeros (<literal>0</literal>s). In this command, an 8 MB file is created. This command generates output similar to this:</para>
585 <screen>2+0 records in
587 8388608 bytes (8.4 MB) copied, 0.159628 seconds, 52.6 MB/s</screen>
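<para>The byte count reported by <literal>dd</literal> follows directly from the block size and count. The short sketch below reproduces the arithmetic; the variable names are illustrative only.</para>

```shell
#!/usr/bin/env bash
# Sketch: reproduce the byte count that dd reports for bs=4M count=2.
# dd treats the M suffix as 1024 * 1024 bytes.

block_size=$((4 * 1024 * 1024))   # bs=4M
count=2                           # count=2
total_bytes=$((block_size * count))

echo "$total_bytes bytes"
```

<para>The same 8388608 bytes appear as 8.4 MB in the <literal>dd</literal> summary, which uses decimal megabytes, and as 8.0M in <literal>ls -h</literal> output, which uses units of 1024.</para>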
590 <para>Run the <literal>ls</literal> command:</para>
<screen>[root@client1 /lustre]# ls -lsah</screen>
592 <para>The <literal>ls -lsah</literal> command lists files and directories in the current working directory. This command generates output similar to this:</para>
594 4.0K drwxr-xr-x 2 root root 4.0K Oct 16 15:27 .
595 8.0K drwxr-xr-x 25 root root 4.0K Oct 16 15:27 ..
596 8.0M -rw-r--r-- 1 root root 8.0M Oct 16 15:27 zero.dat
603 <para>Once the Lustre file system is configured, it is ready for use.</para>
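<para>As a recap, the order of operations in this example can be sketched as a dry-run plan that pairs each command with the node it runs on. The script only prints the plan. The function name <literal>print_plan</literal> is hypothetical, <literal>mds</literal> stands for the combined MGS/MDS node, and all values are taken from the parameter tables above.</para>

```shell
#!/usr/bin/env bash
# Dry-run recap of the example: print each command with the node it runs on.
# Values come from the parameter tables above; nothing is executed.

fsname=temp
mgs_nid=10.2.0.1@tcp0

print_plan() {
    printf '%-8s %s\n' \
        mds     "mkfs.lustre --fsname=$fsname --mgs --mdt --index=0 /dev/sdb" \
        mds     "mount -t lustre /dev/sdb /mnt/mdt" \
        oss0    "mkfs.lustre --fsname=$fsname --mgsnode=$mgs_nid --ost --index=0 /dev/sdc" \
        oss0    "mount -t lustre /dev/sdc /mnt/ost0" \
        oss1    "mkfs.lustre --fsname=$fsname --mgsnode=$mgs_nid --ost --index=1 /dev/sdd" \
        oss1    "mount -t lustre /dev/sdd /mnt/ost1" \
        client1 "mount -t lustre $mgs_nid:/$fsname /lustre"
}

print_plan
```

<para>The plan lists the combined MGS/MDT first because the OSTs are formatted with <literal>--mgsnode</literal> pointing at it, and the client mount comes last because it addresses the file system through the MGS node.</para>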
606 <section xml:id="dbdoclet.50438267_76752">
608 <indexterm><primary>Lustre</primary><secondary>configuring</secondary><tertiary>additional options</tertiary></indexterm>
609 Additional Configuration Options</title>
610 <para>This section describes how to scale the Lustre file system or make configuration changes using the Lustre configuration utilities.</para>
613 <indexterm><primary>Lustre</primary><secondary>configuring</secondary><tertiary>for scale</tertiary></indexterm>
614 Scaling the Lustre File System</title>
<para>A Lustre file system can be scaled by adding OSTs or clients. To create additional OSTs, repeat Step <xref linkend="dbdoclet.50438267_pgfId-1291170"/> and Step <xref linkend="dbdoclet.50438267_pgfId-1293955"/> above. To mount additional clients, repeat Step <xref linkend="dbdoclet.50438267_pgfId-1290934"/> for each client.</para>
619 <indexterm><primary>Lustre</primary><secondary>configuring</secondary><tertiary>striping</tertiary></indexterm>
620 Changing Striping Defaults</title>
621 <para>The default settings for the file layout stripe pattern are shown in <xref linkend="configuringlustre.tab.stripe"/>.</para>
622 <table frame="none" xml:id="configuringlustre.tab.stripe">
623 <title>Default stripe pattern</title>
625 <colspec colname="c1" colwidth="13*"/>
626 <colspec colname="c2" colwidth="13*"/>
627 <colspec colname="c3" colwidth="13*"/>
631 <para><emphasis role="bold">File Layout Parameter</emphasis></para>
634 <para><emphasis role="bold">Default</emphasis></para>
637 <para><emphasis role="bold">Description</emphasis></para>
642 <para> <literal>stripe_size</literal></para>
648 <para> Amount of data to write to one OST before moving to the next OST.</para>
653 <para> <literal>stripe_count</literal></para>
659 <para> The number of OSTs to use for a single file.</para>
664 <para> <literal>start_ost</literal></para>
<para> The first OST where objects are created for each file. The default value of <literal>-1</literal> allows the MDS to choose the starting index based on available space and load balancing. <emphasis>It is strongly recommended that you leave this parameter at the default value of -1.</emphasis></para>
676 <para>Use the <literal>lfs setstripe</literal> command described in <xref linkend="managingstripingfreespace"/> to change the file layout configuration.</para>
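<para>As an illustration, the sketch below builds an <literal>lfs setstripe</literal> command line without running it. The function name <literal>setstripe_cmd</literal>, the directory <literal>/lustre/scratch</literal> and the chosen values are hypothetical. The option letters (<literal>-c</literal> for stripe count, <literal>-S</literal> for stripe size, <literal>-i</literal> for the starting OST index) are those used by recent <literal>lfs</literal> releases; check the reference for your version, because older releases spelled the stripe size option differently.</para>

```shell
#!/usr/bin/env bash
# Dry-run sketch: build an "lfs setstripe" command for a directory.
# The directory and the values are hypothetical; nothing is executed.

# -c sets the stripe count, -S the stripe size, -i the starting OST index.
# The starting index defaults to -1 so that the MDS chooses the first OST.
setstripe_cmd() {
    local dir="$1" count="$2" size="$3" start="${4:--1}"
    echo "lfs setstripe -c $count -S $size -i $start $dir"
}

setstripe_cmd /lustre/scratch 2 4M
```

<para>Leaving the fourth argument unset keeps the starting index at -1, in line with the recommendation in the table above.</para>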
680 <indexterm><primary>Lustre</primary><secondary>configuring</secondary><tertiary>utilities</tertiary></indexterm>
681 Using the Lustre Configuration Utilities</title>
682 <para>If additional configuration is necessary, several configuration utilities are available:</para>
685 <para><literal>mkfs.lustre</literal> - Use to format a disk for a Lustre service.</para>
688 <para><literal>tunefs.lustre</literal> - Use to modify configuration information on a Lustre target disk.</para>
691 <para><literal>lctl</literal> - Use to directly control Lustre via an ioctl interface, allowing various configuration, maintenance and debugging features to be accessed.</para>
694 <para><literal>mount.lustre</literal> - Use to start a Lustre client or target service.</para>
<para>For examples using these utilities, see the topic <xref linkend="systemconfigurationutilities"/>.</para>
698 <para>The <literal>lfs</literal> utility is useful for configuring and querying a variety of options related to files. For more information, see <xref linkend="userutilities"/>.</para>
<para>Some sample scripts are included in the directory where Lustre is installed. If you have installed the Lustre source code, the scripts are located in the <literal>lustre/tests</literal> subdirectory. These scripts enable quick setup of some simple standard Lustre configurations.</para>