1 <?xml version='1.0' encoding='utf-8'?>
2 <chapter xmlns="http://docbook.org/ns/docbook"
3 xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
4 xml:id="configuringlustre">
5 <title xml:id="configuringlustre.title">Configuring a Lustre File
7 <para>This chapter shows how to configure a simple Lustre file system
8 consisting of a combined MGS/MDT, an OST and a client. It includes:</para>
12 <xref linkend="dbdoclet.50438267_50692" />
17 <xref linkend="dbdoclet.50438267_76752" />
21 <section xml:id="dbdoclet.50438267_50692">
24 <primary>Lustre</primary>
25 <secondary>configuring</secondary>
26 </indexterm>Configuring a Simple Lustre File System</title>
27 <para>A Lustre file system can be set up in a variety of configurations by
28 using the administrative utilities provided with the Lustre software. The
29 procedure below shows how to configure a simple Lustre file system
30 consisting of a combined MGS/MDS, one OSS with two OSTs, and a client. For
31 an overview of the entire Lustre installation procedure, see
32 <xref linkend="installoverview" />.</para>
33 <para>This configuration procedure assumes you have completed the
39 <emphasis role="bold">Set up and configured your hardware</emphasis>
40 </emphasis>. For more information about hardware requirements, see
41 <xref linkend="settinguplustresystem" />.</para>
45 <emphasis role="bold">Downloaded and installed the Lustre
46 software.</emphasis> For more information about preparing for and
47 installing the Lustre software, see
48 <xref linkend="installinglustre" />.</para>
51 <para>The following optional steps should also be completed, if needed,
52 before the Lustre software is configured:</para>
56 <emphasis>Set up a hardware or software RAID on block devices to be
57 used as OSTs or MDTs.</emphasis> For information about setting up RAID,
58 see the documentation for your RAID controller or
59 <xref linkend="configuringstorage" />.</para>
63 <emphasis>Set up network interface bonding on Ethernet
64 interfaces.</emphasis> For information about setting up network
65 interface bonding, see
66 <xref linkend="settingupbonding" />.</para>
70 <emphasis>Set
71 <literal>lnet</literal> module parameters to specify how Lustre Networking (LNet) is
72 to be configured to work with a Lustre file system, and test the LNet
73 configuration.</emphasis> By default, LNet uses the first TCP/IP
74 interface it discovers on a system. If this network configuration is
75 sufficient, you do not need to configure LNet. LNet configuration is
76 required if you are using InfiniBand or multiple Ethernet
80 <para>For information about configuring LNet, see
81 <xref linkend="configuringlnet" />. For information about testing LNet, see
83 <xref linkend="lnetselftest" />.</para>
87 <emphasis>Run the benchmark script
88 <literal>sgpdd-survey</literal> to determine baseline performance of
89 your hardware.</emphasis> Benchmarking your hardware will simplify
90 debugging performance issues that are unrelated to the Lustre software
91 and ensure you are getting the best possible performance with your
92 installation. For information about running
93 <literal>sgpdd-survey</literal>, see
94 <xref linkend="benchmarkingtests" />.</para>
99 <literal>sgpdd-survey</literal> script overwrites the device being tested,
100 so it must be run before the OSTs are configured.</para>
102 <para>To configure a simple Lustre file system, complete these
106 <para>Create a combined MGS/MDT file system on a block device. On the
107 MDS node, run:</para>
109 mkfs.lustre --fsname=<replaceable>fsname</replaceable> --mgs --mdt --index=0 <replaceable>/dev/block_device</replaceable>
113 <para>The default file system name
114 (<literal>fsname</literal>) is
115 <literal>lustre</literal>.</para>
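<para>The command in this step can be previewed before any device is
formatted. The following is a minimal sketch, assuming a POSIX shell. The
function name <literal>mgs_mdt_cmd</literal> is hypothetical. It only prints
the <literal>mkfs.lustre</literal> command shown above, and it assumes the
8-character limit on file system names that is documented for
<literal>mkfs.lustre</literal>.</para>
<screen>
# Hypothetical dry-run helper: prints (does not run) the mkfs.lustre
# command for a combined MGS/MDT at index 0.
# Usage: mgs_mdt_cmd fsname block_device
mgs_mdt_cmd() {
    fsname=${1:-lustre}   # empty argument: use the default name, lustre
    dev=$2
    # Assumption: file system names are limited to 8 characters.
    if [ ${#fsname} -gt 8 ]; then
        return 1
    fi
    echo "mkfs.lustre --fsname=$fsname --mgs --mdt --index=0 $dev"
}
</screen>
<para>For example, <literal>mgs_mdt_cmd temp /dev/sdb</literal> prints the
command used in the configuration example in this chapter. Checking the
printed device name before running the command is worthwhile, because
formatting destroys any existing data on the device.</para>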
117 <para>If you plan to create multiple file systems, the MGS should be
118 created separately on its own dedicated block device, by
121 mkfs.lustre --fsname=<replaceable>fsname</replaceable> --mgs <replaceable>/dev/block_device</replaceable>
126 <xref linkend="dbdoclet.50438194_88063" /> for more details.</para>
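<para>When the MGS has its own device, the MDT must be told where to find
the MGS with <literal>--mgsnode</literal>, and the MGS should be mounted
before the MDT. The following sketch, assuming a POSIX shell, prints that
sequence without running it. The function name and the
<literal>/mnt/mgs</literal> and <literal>/mnt/mdt</literal> mount points are
hypothetical.</para>
<screen>
# Hypothetical dry-run helper for a dedicated MGS plus a separate MDT.
# Usage: separate_mgs_cmds fsname MGS_NID mgs_device mdt_device
separate_mgs_cmds() {
    fsname=$1
    mgs_nid=$2
    mgs_dev=$3
    mdt_dev=$4
    # Format the MGS on its own device, then the MDT that points at it.
    echo "mkfs.lustre --fsname=$fsname --mgs $mgs_dev"
    echo "mkfs.lustre --fsname=$fsname --mgsnode=$mgs_nid --mdt --index=0 $mdt_dev"
    # Start the MGS before the MDT.
    echo "mount -t lustre $mgs_dev /mnt/mgs"
    echo "mount -t lustre $mdt_dev /mnt/mdt"
}
</screen>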
129 <listitem xml:id="dbdoclet.addmdtindex">
130 <para>Optional for Lustre software release 2.4 and later.
131 Add additional MDTs.</para>
133 mkfs.lustre --fsname=<replaceable>fsname</replaceable> --mgsnode=<replaceable>nid</replaceable> --mdt --index=1 <replaceable>/dev/block_device</replaceable>
139 <para>Up to 4095 additional MDTs can be added.</para>
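<para>Each additional MDT needs its own index. The following is a minimal
sketch, assuming a POSIX shell, that prints one
<literal>mkfs.lustre</literal> command per additional device. It numbers the
devices from index 1 because index 0 is already used by the first MDT. The
function name and the device names in the usage line are hypothetical.</para>
<screen>
# Hypothetical dry-run helper: one mkfs.lustre command per extra MDT.
# Usage: extra_mdt_cmds fsname MGS_NID device...
extra_mdt_cmds() {
    fsname=$1
    mgs_nid=$2
    shift 2
    idx=1
    for dev in "$@"; do
        # Stop if the request exceeds the 4095 additional MDTs noted above.
        if [ "$idx" -gt 4095 ]; then
            return 1
        fi
        echo "mkfs.lustre --fsname=$fsname --mgsnode=$mgs_nid --mdt --index=$idx $dev"
        idx=$((idx + 1))
    done
}
</screen>
<para>For example, <literal>extra_mdt_cmds temp 10.2.0.1@tcp0 /dev/sde
/dev/sdf</literal> prints the commands for MDT indexes 1 and 2.</para>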
143 <para>Mount the combined MGS/MDT file system on the block device. On
144 the MDS node, run:</para>
147 <replaceable>/dev/block_device</replaceable>
148 <replaceable>/mount_point</replaceable>
151 <para>If you have created an MGS and an MDT on separate block
152 devices, mount them both.</para>
155 <listitem xml:id="dbdoclet.50438267_pgfId-1290915">
156 <para>Create the OST. On the OSS node, run:</para>
158 mkfs.lustre --fsname=<replaceable>fsname</replaceable> --mgsnode=<replaceable>MGS_NID</replaceable> --ost --index=<replaceable>OST_index</replaceable> <replaceable>/dev/block_device</replaceable>
164 <para>When you create an OST, you are formatting an
165 <literal>ldiskfs</literal> or
166 <literal>ZFS</literal> file system on a block storage device, as you
167 would with any local file system.</para>
168 <para>You can have as many OSTs per OSS as the hardware or drivers
169 allow. For more information about storage and memory requirements for a
170 Lustre file system, see
171 <xref linkend="settinguplustresystem" />.</para>
172 <para>You can only configure one OST per block device. You should
173 create an OST that uses the raw block device and does not use
175 <para>You should specify the OST index number at format time in order
176 to simplify translating the OST number in error messages or file
177 striping to the OSS node and block device later on.</para>
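<para>In messages and in <literal>lfs df</literal> output, an OST appears
under a target name such as <literal>temp-OST0001</literal>, in which the
index is written as four hexadecimal digits. The following sketch, assuming
a POSIX shell, converts between the two forms. The function names are
hypothetical.</para>
<screen>
# Hypothetical helpers for mapping between an OST index and its target name.
# ost_name fsname index: prints e.g. temp-OST000a for index 10
ost_name() {
    printf '%s-OST%04x\n' "$1" "$2"
}
# ost_index target_name: prints the decimal index
ost_index() {
    hex=${1##*-OST}      # drop everything up to and including "-OST"
    hex=${hex%%_UUID}    # drop a trailing "_UUID", if present
    echo $((0x$hex))
}
</screen>
<para>For example, <literal>ost_index temp-OST000a</literal> prints
<literal>10</literal>. You can then look up that index in your own record of
which OSS node and block device were formatted with it.</para>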
178 <para>If you are using block devices that are accessible from multiple
179 OSS nodes, ensure that you mount the OSTs from only one OSS node at a
180 time. It is strongly recommended that multiple-mount protection be
181 enabled for such devices to prevent serious data corruption. For more
182 information about multiple-mount protection, see
183 <xref linkend="managingfailover" />.</para>
185 <para>The Lustre software currently supports block devices up to 128
186 TB on Red Hat Enterprise Linux 5 and 6 (up to 8 TB on other
187 distributions). If the device size is only slightly larger than 16
188 TB, it is recommended that you limit the file system size to 16 TB at
189 format time. Rather than placing DOS partitions on top of RAID 5/6
190 block devices, which degrades performance, we recommend that you
191 format the whole disk for the file system.</para>
194 <listitem xml:id="dbdoclet.50438267_pgfId-1293955">
195 <para>Mount the OST. On the OSS node where the OST was created,
199 <replaceable>/dev/block_device</replaceable>
200 <replaceable>/mount_point</replaceable>
203 <para>To create additional OSTs, repeat Step
204 <xref linkend="dbdoclet.50438267_pgfId-1290915" /> and Step
205 <xref linkend="dbdoclet.50438267_pgfId-1293955" />, specifying the
206 next higher OST index number.</para>
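<para>You can script these two steps when you repeat them for several
devices on one OSS. The following is a minimal dry-run sketch, assuming a
POSIX shell. The function name and the <literal>/mnt/ostN</literal>
mount-point naming are hypothetical. The function prints the commands for
each device and gives each device the next higher index.</para>
<screen>
# Hypothetical dry-run helper: format and mount several OSTs on one OSS.
# Usage: ost_cmds fsname MGS_NID first_index device...
ost_cmds() {
    fsname=$1
    mgs_nid=$2
    idx=$3
    shift 3
    for dev in "$@"; do
        echo "mkfs.lustre --fsname=$fsname --mgsnode=$mgs_nid --ost --index=$idx $dev"
        echo "mkdir -p /mnt/ost$idx"
        echo "mount -t lustre $dev /mnt/ost$idx"
        idx=$((idx + 1))
    done
}
</screen>
<para>Review the printed commands before running them, because each
<literal>mkfs.lustre</literal> call reformats its device.</para>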
209 <listitem xml:id="dbdoclet.50438267_pgfId-1290934">
210 <para>Mount the Lustre file system on the client. On the client node,
214 <replaceable>MGS_node</replaceable>:/
215 <replaceable>fsname</replaceable>
216 <replaceable>/mount_point</replaceable>
219 <para>To create additional clients, repeat Step
220 <xref linkend="dbdoclet.50438267_pgfId-1290934" />.</para>
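<para>You can also print the client mount commands for several client nodes
at once. The following sketch assumes a POSIX shell and root SSH access to
each client. The function name and the host names in the usage line are
hypothetical.</para>
<screen>
# Hypothetical dry-run helper: mount FSNAME from MGS_NID on each client.
# Usage: client_mount_cmds MGS_NID fsname mount_point host...
client_mount_cmds() {
    mgs_nid=$1
    fsname=$2
    mnt=$3
    shift 3
    for host in "$@"; do
        echo "ssh $host mkdir -p $mnt"
        echo "ssh $host mount -t lustre $mgs_nid:/$fsname $mnt"
    done
}
</screen>
<para>For example, <literal>client_mount_cmds 10.2.0.1@tcp0 temp /lustre
client1 client2</literal> prints the commands for two clients.</para>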
223 <para>If you have a problem mounting the file system, check the
224 system logs on the client and all servers for errors, and also check
225 the network settings. A common issue with newly installed systems is
227 <literal>hosts.deny</literal> or firewall rules may prevent
228 connections on port 988.</para>
232 <para>Verify that the file system started and is working correctly. Do
234 <literal>lfs df</literal>,
235 <literal>dd</literal> and
236 <literal>ls</literal> commands on the client node.</para>
240 <emphasis>(Optional)</emphasis> Run benchmarking tools to validate the
241 performance of hardware and software layers in the cluster. Available
242 tools include:</para>
246 <literal>obdfilter-survey</literal> - Characterizes the storage
247 performance of a Lustre file system. For details, see
248 <xref linkend="dbdoclet.50438212_26516" />.</para>
252 <literal>ost-survey</literal> - Performs I/O against OSTs to detect
253 anomalies between otherwise identical disk subsystems. For details,
255 <xref linkend="dbdoclet.50438212_85136" />.</para>
263 <primary>Lustre</primary>
264 <secondary>configuring</secondary>
265 <tertiary>simple example</tertiary>
266 </indexterm>Simple Lustre Configuration Example</title>
267 <para>To see the steps required to configure a simple Lustre file
268 system, follow this example, in which a combined MGS/MDT and two
269 OSTs are created to form a file system called
270 <literal>temp</literal>. Three block devices are used: one for the
271 combined MGS/MDS node and one for each OSS node. Common parameters used
272 in the example are listed below, along with individual node
274 <informaltable frame="all">
276 <colspec colname="c1" colwidth="2*" />
277 <colspec colname="c2" colwidth="25*" />
278 <colspec colname="c3" colwidth="25*" />
279 <colspec colname="c4" colwidth="25*" />
282 <entry nameend="c2" namest="c1">
284 <emphasis role="bold">Common Parameters</emphasis>
289 <emphasis role="bold">Value</emphasis>
294 <emphasis role="bold">Description</emphasis>
306 <emphasis role="bold">MGS node</emphasis>
311 <literal>10.2.0.1@tcp0</literal>
315 <para>Node for the combined MGS/MDS</para>
324 <emphasis role="bold">file system</emphasis>
329 <literal>temp</literal>
333 <para>Name of the Lustre file system</para>
342 <emphasis role="bold">network type</emphasis>
347 <literal>TCP/IP</literal>
351 <para>Network type used for Lustre file system
352 <literal>temp</literal></para>
358 <informaltable frame="all">
360 <colspec colname="c1" colwidth="25*" />
361 <colspec colname="c2" colwidth="25*" />
362 <colspec colname="c3" colwidth="25*" />
363 <colspec colname="c4" colwidth="25*" />
366 <entry nameend="c2" namest="c1">
368 <emphasis role="bold">Node Parameters</emphasis>
373 <emphasis role="bold">Value</emphasis>
378 <emphasis role="bold">Description</emphasis>
385 <entry nameend="c4" namest="c1">
386 <para>MGS/MDS node</para>
395 <emphasis role="bold">MGS/MDS node</emphasis>
400 <literal>mdt0</literal>
404 <para>MDS in Lustre file system
405 <literal>temp</literal></para>
414 <emphasis role="bold">block device</emphasis>
419 <literal>/dev/sdb</literal>
423 <para>Block device for the combined MGS/MDS node</para>
432 <emphasis role="bold">mount point</emphasis>
437 <literal>/mnt/mdt</literal>
441 <para>Mount point for the
442 <literal>mdt0</literal> block device
443 (<literal>/dev/sdb</literal>) on the MGS/MDS node</para>
447 <entry nameend="c4" namest="c1">
448 <para>First OSS node</para>
457 <emphasis role="bold">OSS node</emphasis>
462 <literal>oss0</literal>
466 <para>First OSS node in Lustre file system
467 <literal>temp</literal></para>
476 <emphasis role="bold">OST</emphasis>
481 <literal>ost0</literal>
485 <para>First OST in Lustre file system
486 <literal>temp</literal></para>
495 <emphasis role="bold">block device</emphasis>
500 <literal>/dev/sdc</literal>
504 <para>Block device for the first OSS node
505 (<literal>oss0</literal>)</para>
514 <emphasis role="bold">mount point</emphasis>
519 <literal>/mnt/ost0</literal>
523 <para>Mount point for the
524 <literal>ost0</literal> block device
525 (<literal>/dev/sdc</literal>) on the
526 <literal>oss0</literal> node</para>
530 <entry nameend="c4" namest="c1">
531 <para>Second OSS node</para>
540 <emphasis role="bold">OSS node</emphasis>
545 <literal>oss1</literal>
549 <para>Second OSS node in Lustre file system
550 <literal>temp</literal></para>
559 <emphasis role="bold">OST</emphasis>
564 <literal>ost1</literal>
568 <para>Second OST in Lustre file system
569 <literal>temp</literal></para>
576 <emphasis role="bold">block device</emphasis>
581 <literal>/dev/sdd</literal>
585 <para>Block device for the second OSS node (<literal>oss1</literal>)</para>
594 <emphasis role="bold">mount point</emphasis>
599 <literal>/mnt/ost1</literal>
603 <para>Mount point for the
604 <literal>ost1</literal> block device
605 (<literal>/dev/sdd</literal>) on the
606 <literal>oss1</literal> node</para>
610 <entry nameend="c4" namest="c1">
611 <para>Client node</para>
620 <emphasis role="bold">client node</emphasis>
625 <literal>client1</literal>
629 <para>Client in Lustre file system
630 <literal>temp</literal></para>
639 <emphasis role="bold">mount point</emphasis>
644 <literal>/lustre</literal>
648 <para>Mount point for Lustre file system
649 <literal>temp</literal> on the
650 <literal>client1</literal> node</para>
657 <para>We recommend that you use 'dotted-quad' notation for IP addresses
658 rather than host names to make it easier to read debug logs and debug
659 configurations with multiple interfaces.</para>
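<para>You can check that a NID is written in dotted-quad form, such as
<literal>10.2.0.1@tcp0</literal>, before using it in a command. The
following sketch assumes a POSIX shell with <literal>grep -E</literal>. It is
hypothetical and covers only TCP NIDs. It checks the shape of the address
but does not verify that each number is 255 or less.</para>
<screen>
# Hypothetical check: succeeds if the argument is a TCP NID in
# dotted-quad form (for example 10.2.0.1@tcp0), fails for host names.
is_dotted_quad_nid() {
    printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}@tcp[0-9]*$'
}
</screen>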
661 <para>For this example, complete the steps below:</para>
664 <para>Create a combined MGS/MDT file system on the block device. On
665 the MDS node, run:</para>
667 [root@mds /]# mkfs.lustre --fsname=temp --mgs --mdt --index=0 /dev/sdb
669 <para>This command generates this output:</para>
677 (MDT MGS first_time update )
678 Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
679 Parameters: mdt.identity_upcall=/usr/sbin/l_getidentity
681 checking for existing Lustre data: not found
684 formatting backing filesystem ldiskfs on /dev/sdb
685 target name temp-MDTffff
687 options -i 4096 -I 512 -q -O dir_index,uninit_groups -F
688 mkfs_cmd = mkfs.ext2 -j -b 4096 -L temp-MDTffff -i 4096 -I 512 -q -O
689 dir_index,uninit_groups -F /dev/sdb
690 Writing CONFIGS/mountdata
694 <para>Mount the combined MGS/MDT file system on the block device. On
695 the MDS node, run:</para>
697 [root@mds /]# mount -t lustre /dev/sdb /mnt/mdt
699 <para>This command generates this output:</para>
701 Lustre: temp-MDT0000: new disk, initializing
702 Lustre: 3009:0:(lproc_mds.c:262:lprocfs_wr_identity_upcall()) temp-MDT0000:
703 group upcall set to /usr/sbin/l_getidentity
704 Lustre: temp-MDT0000.mdt: set parameter identity_upcall=/usr/sbin/l_getidentity
705 Lustre: Server temp-MDT0000 on device /dev/sdb has started
708 <listitem xml:id="dbdoclet.50438267_pgfId-1291170">
709 <para>Create and mount
710 <literal>ost0</literal>.</para>
711 <para>In this example, the OSTs
712 (<literal>ost0</literal> and
713 <literal>ost1</literal>) are created on different OSS nodes
714 (<literal>oss0</literal> and
715 <literal>oss1</literal>, respectively).</para>
719 <literal>ost0</literal>. On the
720 <literal>oss0</literal> node, run:</para>
722 [root@oss0 /]# mkfs.lustre --fsname=temp --mgsnode=10.2.0.1@tcp0 --ost
725 <para>The command generates this output:</para>
733 (OST first_time update)
734 Persistent mount opts: errors=remount-ro,extents,mballoc
735 Parameters: mgsnode=10.2.0.1@tcp
737 checking for existing Lustre data: not found
740 formatting backing filesystem ldiskfs on /dev/sdc
741 target name temp-OST0000
743 options -I 256 -q -O dir_index,uninit_groups -F
744 mkfs_cmd = mkfs.ext2 -j -b 4096 -L temp-OST0000 -I 256 -q -O
745 dir_index,uninit_groups -F /dev/sdc
746 Writing CONFIGS/mountdata
750 <para>Mount <literal>ost0</literal> on the OSS on which it was created. On the
751 <literal>oss0</literal> node, run:</para>
753 [root@oss0 /]# mount -t lustre /dev/sdc /mnt/ost0
755 <para>The command generates this output:</para>
757 LDISKFS-fs: file extents enabled
758 LDISKFS-fs: mballoc enabled
759 Lustre: temp-OST0000: new disk, initializing
760 Lustre: Server temp-OST0000 on device /dev/sdc has started
762 <para>Shortly afterwards, this output appears:</para>
764 Lustre: temp-OST0000: received MDS connection from 10.2.0.1@tcp0
765 Lustre: MDS temp-MDT0000: temp-OST0000_UUID now active, resetting orphans
771 <para>Create and mount
772 <literal>ost1</literal>.</para>
775 <para>Create <literal>ost1</literal>. On the
776 <literal>oss1</literal> node, run:</para>
778 [root@oss1 /]# mkfs.lustre --fsname=temp --mgsnode=10.2.0.1@tcp0 \
779 --ost --index=1 /dev/sdd
781 <para>The command generates this output:</para>
789 (OST first_time update)
790 Persistent mount opts: errors=remount-ro,extents,mballoc
791 Parameters: mgsnode=10.2.0.1@tcp
793 checking for existing Lustre data: not found
796 formatting backing filesystem ldiskfs on /dev/sdd
797 target name temp-OST0001
799 options -I 256 -q -O dir_index,uninit_groups -F
800 mkfs_cmd = mkfs.ext2 -j -b 4096 -L temp-OST0001 -I 256 -q -O
801 dir_index,uninit_groups -F /dev/sdd
802 Writing CONFIGS/mountdata
806 <para>Mount <literal>ost1</literal> on the OSS on which it was created. On the
807 <literal>oss1</literal> node, run:</para>
809 [root@oss1 /]# mount -t lustre /dev/sdd /mnt/ost1
811 <para>The command generates this output:</para>
813 LDISKFS-fs: file extents enabled
814 LDISKFS-fs: mballoc enabled
815 Lustre: temp-OST0001: new disk, initializing
816 Lustre: Server temp-OST0001 on device /dev/sdd has started
818 <para>Shortly afterwards, this output appears:</para>
820 Lustre: temp-OST0001: received MDS connection from 10.2.0.1@tcp0
821 Lustre: MDS temp-MDT0000: temp-OST0001_UUID now active, resetting orphans
827 <para>Mount the Lustre file system on the client. On the client node,
830 [root@client1 /]# mount -t lustre 10.2.0.1@tcp0:/temp /lustre
832 <para>This command generates this output:</para>
834 Lustre: Client temp-client has started
838 <para>Verify that the file system started and is working by running
840 <literal>lfs df</literal>,
841 <literal>dd</literal> and
842 <literal>ls</literal> commands on the client node.</para>
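<para>You can also count the target lines in the <literal>lfs df</literal>
output and compare the totals with the number of targets you formatted. The
following sketch assumes a POSIX shell and <literal>awk</literal>. It is
hypothetical and reads <literal>lfs df</literal> output on standard
input.</para>
<screen>
# Hypothetical check: count the MDT and OST lines in "lfs df" output.
# Usage: lfs df | count_targets
count_targets() {
    awk '$1 ~ /-MDT[0-9a-f]+_UUID$/ { mdt++ }
         $1 ~ /-OST[0-9a-f]+_UUID$/ { ost++ }
         END { printf "MDTs: %d OSTs: %d\n", mdt, ost }'
}
</screen>
<para>For the <literal>lfs df -h</literal> output of the example file system
shown below, this prints <literal>MDTs: 1 OSTs: 2</literal>.</para>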
846 <literal>lfs df -h</literal> command:</para>
848 [root@client1 /]# lfs df -h
851 <literal>lfs df -h</literal> command lists space usage per OST and
852 the MDT in human-readable format. This command generates output
853 similar to this:</para>
855 UUID bytes Used Available Use% Mounted on
856 temp-MDT0000_UUID 8.0G 400.0M 7.6G 0% /lustre[MDT:0]
857 temp-OST0000_UUID 800.0G 400.0M 799.6G 0% /lustre[OST:0]
858 temp-OST0001_UUID 800.0G 400.0M 799.6G 0% /lustre[OST:1]
859 filesystem summary: 1.6T 800.0M 1.6T 0% /lustre
864 <literal>lfs df -ih</literal> command.</para>
866 [root@client1 /]# lfs df -ih
869 <literal>lfs df -ih</literal> command lists inode usage per OST
870 and the MDT. This command generates output similar to
873 UUID Inodes IUsed IFree IUse% Mounted on
874 temp-MDT0000_UUID 2.5M 32 2.5M 0% /lustre[MDT:0]
875 temp-OST0000_UUID 5.5M 54 5.5M 0% /lustre[OST:0]
876 temp-OST0001_UUID 5.5M 54 5.5M 0% /lustre[OST:1]
877 filesystem summary: 2.5M 32 2.5M 0% /lustre
882 <literal>dd</literal> command:</para>
884 [root@client1 /]# cd /lustre
885 [root@client1 /lustre]# dd if=/dev/zero of=/lustre/zero.dat bs=4M count=2
888 <literal>dd</literal> command verifies write functionality by
889 creating a file containing all zeros
890 (<literal>0</literal>s). In this command, an 8 MB file is created.
891 This command generates output similar to this:</para>
895 8388608 bytes (8.4 MB) copied, 0.159628 seconds, 52.6 MB/s
900 <literal>ls</literal> command:</para>
902 [root@client1 /lustre]# ls -lsah
905 <literal>ls -lsah</literal> command lists files and directories in
906 the current working directory. This command generates output
907 similar to this:</para>
910 4.0K drwxr-xr-x 2 root root 4.0K Oct 16 15:27 .
911 8.0K drwxr-xr-x 25 root root 4.0K Oct 16 15:27 ..
912 8.0M -rw-r--r-- 1 root root 8.0M Oct 16 15:27 zero.dat
919 <para>Once the Lustre file system is configured, it is ready for
923 <section xml:id="dbdoclet.50438267_76752">
926 <primary>Lustre</primary>
927 <secondary>configuring</secondary>
928 <tertiary>additional options</tertiary>
929 </indexterm>Additional Configuration Options</title>
930 <para>This section describes how to scale the Lustre file system or make
931 configuration changes using the Lustre configuration utilities.</para>
935 <primary>Lustre</primary>
936 <secondary>configuring</secondary>
937 <tertiary>for scale</tertiary>
938 </indexterm>Scaling the Lustre File System</title>
939 <para>A Lustre file system can be scaled by adding OSTs or clients. To
940 create additional OSTs, repeat Step
941 <xref linkend="dbdoclet.50438267_pgfId-1291170" /> and Step
942 <xref linkend="dbdoclet.50438267_pgfId-1293955" /> above. To mount
943 additional clients, repeat Step
944 <xref linkend="dbdoclet.50438267_pgfId-1290934" /> for each client.</para>
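<para>When you add OSTs, you can derive the next index from the targets
already in use. The following is a minimal sketch, assuming a POSIX shell.
The function name is hypothetical. It reads lines whose first field is a
target UUID, as in <literal>lfs df</literal> output, and prints one more
than the highest OST index it finds.</para>
<screen>
# Hypothetical helper: print the next unused OST index.
# Usage: lfs df | next_ost_index
next_ost_index() {
    max=-1
    while read -r name rest; do
        case $name in
            *-OST*_UUID) ;;          # an OST line: keep it
            *) continue ;;           # headers, MDT and summary lines
        esac
        hex=${name##*-OST}           # e.g. 000a_UUID
        hex=${hex%%_UUID}            # e.g. 000a
        idx=$((0x$hex))              # target names use hexadecimal indexes
        if [ "$idx" -gt "$max" ]; then
            max=$idx
        fi
    done
    echo $((max + 1))
}
</screen>
<para>For the example file system <literal>temp</literal>, which has
<literal>temp-OST0000</literal> and <literal>temp-OST0001</literal>, this
prints <literal>2</literal>, the value to pass to
<literal>--index</literal> for a third OST.</para>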
949 <primary>Lustre</primary>
950 <secondary>configuring</secondary>
951 <tertiary>striping</tertiary>
952 </indexterm>Changing Striping Defaults</title>
953 <para>The default settings for the file layout stripe pattern are shown
955 <xref linkend="configuringlustre.tab.stripe" />.</para>
956 <table frame="none" xml:id="configuringlustre.tab.stripe">
957 <title>Default stripe pattern</title>
959 <colspec colname="c1" colwidth="13*" />
960 <colspec colname="c2" colwidth="13*" />
961 <colspec colname="c3" colwidth="13*" />
966 <emphasis role="bold">File Layout Parameter</emphasis>
971 <emphasis role="bold">Default</emphasis>
976 <emphasis role="bold">Description</emphasis>
983 <literal>stripe_size</literal>
990 <para>Amount of data to write to one OST before moving to the
997 <literal>stripe_count</literal>
1004 <para>The number of OSTs to use for a single file.</para>
1010 <literal>start_ost</literal>
1017 <para>The first OST where objects are created for each file.
1018 The default -1 allows the MDS to choose the starting index
1019 based on available space and load balancing.
1020 <emphasis>It is strongly recommended that you leave this parameter
1021 at its default value of -1.</emphasis></para>
1028 <literal>lfs setstripe</literal> command described in
1029 <xref linkend="managingstripingfreespace" /> to change the file layout
1030 configuration.</para>
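<para>In a plain (RAID-0) layout, a file's data is written in
<literal>stripe_size</literal> chunks, round-robin across its
<literal>stripe_count</literal> OST objects. The following sketch, assuming
a POSIX shell, computes which object in the file's layout holds a given byte
offset. The function name is hypothetical, and composite layouts are not
modelled.</para>
<screen>
# Hypothetical model of a plain RAID-0 file layout.
# Usage: stripe_slot offset stripe_size stripe_count
# Prints the 0-based position, within the file's layout, of the OST
# object that holds byte OFFSET.
stripe_slot() {
    offset=$1
    stripe_size=$2
    stripe_count=$3
    echo $(( (offset / stripe_size) % stripe_count ))
}
</screen>
<para>For example, with a 1 MiB (1048576-byte) stripe size and a stripe
count of 2, <literal>stripe_slot 3145728 1048576 2</literal> prints
<literal>1</literal>: the fourth 1 MiB chunk of the file lands on the second
object.</para>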
1032 <section remap="h3">
1035 <primary>Lustre</primary>
1036 <secondary>configuring</secondary>
1037 <tertiary>utilities</tertiary>
1038 </indexterm>Using the Lustre Configuration Utilities</title>
1039 <para>If additional configuration is necessary, several configuration
1040 utilities are available:</para>
1044 <literal>mkfs.lustre</literal> - Use to format a disk for a Lustre
1049 <literal>tunefs.lustre</literal> - Use to modify configuration
1050 information on a Lustre target disk.</para>
1054 <literal>lctl</literal> - Use to directly control Lustre features via
1056 <literal>ioctl</literal> interface, allowing various configuration,
1057 maintenance and debugging features to be accessed.</para>
1061 <literal>mount.lustre</literal> - Use to start a Lustre client or
1062 target service.</para>
1065 <para>For examples using these utilities, see
1066 <xref linkend="systemconfigurationutilities" />.</para>
1068 <literal>lfs</literal> utility is useful for configuring and querying a
1069 variety of options related to files. For more information, see
1070 <xref linkend="userutilities" />.</para>
1072 <para>Some sample scripts are included in the directory where the
1073 Lustre software is installed. If you have installed the Lustre source
1074 code, the scripts are located in the
1075 <literal>lustre/tests</literal> subdirectory. These scripts enable
1076 quick setup of some simple standard Lustre configurations.</para>