1 <?xml version='1.0' encoding='utf-8'?>
2 <chapter xmlns="http://docbook.org/ns/docbook"
3 xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
4 xml:id="configuringlustre">
5 <title xml:id="configuringlustre.title">Configuring a Lustre File
<para>This chapter shows how to configure a simple Lustre file system
consisting of a combined MGS/MDT, an OST, and a client. It includes:</para>
12 <xref linkend="dbdoclet.50438267_50692" />
17 <xref linkend="dbdoclet.50438267_76752" />
21 <section xml:id="dbdoclet.50438267_50692">
24 <primary>Lustre</primary>
25 <secondary>configuring</secondary>
26 </indexterm>Configuring a Simple Lustre File System</title>
27 <para>A Lustre file system can be set up in a variety of configurations by
28 using the administrative utilities provided with the Lustre software. The
29 procedure below shows how to configure a simple Lustre file system
30 consisting of a combined MGS/MDS, one OSS with two OSTs, and a client. For
31 an overview of the entire Lustre installation procedure, see
32 <xref linkend="installoverview" />.</para>
33 <para>This configuration procedure assumes you have completed the
39 <emphasis role="bold">Set up and configured your hardware</emphasis>
40 </emphasis>. For more information about hardware requirements, see
41 <xref linkend="settinguplustresystem" />.</para>
45 <emphasis role="bold">Downloaded and installed the Lustre
software.</emphasis> For more information about preparing for and
47 installing the Lustre software, see
48 <xref linkend="installinglustre" />.</para>
51 <para>The following optional steps should also be completed, if needed,
52 before the Lustre software is configured:</para>
56 <emphasis>Set up a hardware or software RAID on block devices to be
used as OSTs or MDTs.</emphasis> For information about setting up RAID,
58 see the documentation for your RAID controller or
59 <xref linkend="configuringstorage" />.</para>
63 <emphasis>Set up network interface bonding on Ethernet
interfaces.</emphasis> For information about setting up network
65 interface bonding, see
66 <xref linkend="settingupbonding" />.</para>
70 <emphasis>Set</emphasis> lnet
71 <emphasis>module parameters to specify how Lustre Networking (LNet) is
72 to be configured to work with a Lustre file system and test the LNet
configuration.</emphasis> LNet will, by default, use the first TCP/IP
74 interface it discovers on a system. If this network configuration is
75 sufficient, you do not need to configure LNet. LNet configuration is
76 required if you are using InfiniBand or multiple Ethernet
80 <para>For information about configuring LNet, see
81 <xref linkend="configuringlnet" />. For information about testing LNet, see
83 <xref linkend="lnetselftest" />.</para>
87 <emphasis>Run the benchmark script
88 <literal>sgpdd-survey</literal> to determine baseline performance of
your hardware.</emphasis> Benchmarking your hardware will simplify
90 debugging performance issues that are unrelated to the Lustre software
91 and ensure you are getting the best possible performance with your
92 installation. For information about running
93 <literal>sgpdd-survey</literal>, see
94 <xref linkend="benchmarkingtests" />.</para>
99 <literal>sgpdd-survey</literal> script overwrites the device being tested
100 so it must be run before the OSTs are configured.</para>
102 <para>To configure a simple Lustre file system, complete these
106 <para>Create a combined MGS/MDT file system on a block device. On the
107 MDS node, run:</para>
109 mkfs.lustre --fsname=
110 <replaceable>fsname</replaceable> --mgs --mdt --index=0
111 <replaceable>/dev/block_device</replaceable>
113 <para>The default file system name (
114 <literal>fsname</literal>) is
115 <literal>lustre</literal>.</para>
117 <para>If you plan to create multiple file systems, the MGS should be
118 created separately on its own dedicated block device, by
121 mkfs.lustre --fsname=
122 <replaceable>fsname</replaceable> --mgs
123 <replaceable>/dev/block_device</replaceable>
<xref linkend="dbdoclet.50438194_88063" /> for more details.</para>
129 <listitem xml:id="dbdoclet.addmdtindex">
<para>Optionally, add additional MDTs.</para>
132 mkfs.lustre --fsname=
133 <replaceable>fsname</replaceable> --mgsnode=
134 <replaceable>nid</replaceable> --mdt --index=1
135 <replaceable>/dev/block_device</replaceable>
138 <para>Up to 4095 additional MDTs can be added.</para>
142 <para>Mount the combined MGS/MDT file system on the block device. On
143 the MDS node, run:</para>
146 <replaceable>/dev/block_device</replaceable>
147 <replaceable>/mount_point</replaceable>
150 <para>If you have created an MGS and an MDT on separate block
151 devices, mount them both.</para>
154 <listitem xml:id="dbdoclet.format_ost">
155 <para>Create the OST. On the OSS node, run:</para>
157 mkfs.lustre --fsname=
158 <replaceable>fsname</replaceable> --mgsnode=
159 <replaceable>MGS_NID</replaceable> --ost --index=
160 <replaceable>OST_index</replaceable>
161 <replaceable>/dev/block_device</replaceable>
<para>When you create an OST, you are formatting an
<literal>ldiskfs</literal> or
<literal>ZFS</literal> file system on a block storage device, as you
would with any local file system.</para>
167 <para>You can have as many OSTs per OSS as the hardware or drivers
168 allow. For more information about storage and memory requirements for a
169 Lustre file system, see
170 <xref linkend="settinguplustresystem" />.</para>
171 <para>You can only configure one OST per block device. You should
172 create an OST that uses the raw block device and does not use
174 <para>You should specify the OST index number at format time in order
175 to simplify translating the OST number in error messages or file
176 striping to the OSS node and block device later on.</para>
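<para>As a rough illustration of why a fixed index helps, the sketch below
turns an OST index into a target name and looks up where that OST lives. It
assumes target names carry a four-digit hexadecimal index, as the names
<literal>temp-OST0000</literal> and <literal>temp-OST0001</literal> used
later in this chapter suggest. The <literal>OST_MAP</literal> inventory and
both function names are hypothetical and are not part of the Lustre
software.</para>
<screen>
```shell
# Build a Lustre target name from a file system name and an OST index.
# Assumption: four-digit hexadecimal index, e.g. temp-OST0001.
ost_target_name() {
    printf '%s-OST%04x\n' "$1" "$2"
}

# Hypothetical inventory kept by the administrator:
# "index oss_node block_device", one OST per line.
OST_MAP='0 oss0 /dev/sdc
1 oss1 /dev/sdd'

# Print the OSS node and block device recorded for a given OST index.
ost_location() {
    printf '%s\n' "$OST_MAP" | awk -v idx="$1" '$1 == idx { print $2, $3 }'
}

ost_target_name temp 1
ost_location 1
```
</screen>
<para>Keeping such a record at format time means an OST number seen in an
error message can be traced to a node and device without inspecting each
OSS.</para>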
<para>If you are using block devices that are accessible from multiple
OSS nodes, ensure that you mount the OSTs from only one OSS node at a
time. It is strongly recommended that multiple-mount protection be
enabled for such devices to prevent serious data corruption. For more
information about multiple-mount protection, see
<xref linkend="managingfailover" />.</para>
<para>The Lustre software currently supports block devices up to 128
TB on Red Hat Enterprise Linux 5 and 6 (up to 8 TB on other
distributions). If the device size is only slightly larger than 16
TB, it is recommended that you limit the file system size to 16 TB at
format time. We recommend that you not place DOS partitions on top of
RAID 5/6 block devices due to negative impacts on performance, but
instead format the whole disk for the file system.</para>
193 <listitem xml:id="dbdoclet.mount_ost">
194 <para>Mount the OST. On the OSS node where the OST was created,
198 <replaceable>/dev/block_device</replaceable>
199 <replaceable>/mount_point</replaceable>
<para>To create additional OSTs, repeat Step
<xref linkend="dbdoclet.format_ost" /> and Step
<xref linkend="dbdoclet.mount_ost" />, specifying the
next higher OST index number.</para>
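<para>The repetition described above can be sketched as a dry run that only
prints the format and mount commands for a list of block devices on one OSS,
assigning consecutive index numbers. The file system name, MGS NID, device
names, and <literal>/mnt/ost</literal> mount points are illustrative
placeholders, and <literal>print_ost_commands</literal> is a hypothetical
helper.</para>
<screen>
```shell
# Dry run: print the format and mount commands for several OSTs on one OSS.
# FSNAME, MGS_NID, and the device list are illustrative placeholders.
FSNAME=temp
MGS_NID=10.2.0.1@tcp0

print_ost_commands() {
    index=$1            # first OST index to assign
    shift
    for dev in "$@"; do
        echo "mkfs.lustre --fsname=$FSNAME --mgsnode=$MGS_NID --ost --index=$index $dev"
        echo "mount -t lustre $dev /mnt/ost$index"
        index=$((index + 1))   # next higher OST index
    done
}

print_ost_commands 0 /dev/sdc /dev/sdd
```
</screen>
<para>Review the printed commands before running them on the OSS node. The
mount point directories must already exist, and formatting destroys any data
on the named devices.</para>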
208 <listitem xml:id="dbdoclet.mount_on_client">
209 <para>Mount the Lustre file system on the client. On the client node,
213 <replaceable>MGS_node</replaceable>:/
214 <replaceable>fsname</replaceable>
215 <replaceable>/mount_point</replaceable>
<para>To mount the file system on additional clients, repeat Step
<xref linkend="dbdoclet.mount_on_client" />.</para>
<para>If you have a problem mounting the file system, check the
syslogs on the client and all the servers for errors and also check
the network settings. A common issue with newly installed systems is that
<literal>hosts.deny</literal> or firewall rules may prevent
connections on port 988.</para>
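<para>A minimal sketch of both checks follows, assuming a TCP network, a
bash shell, and the coreutils <literal>timeout</literal> command. The
function names are hypothetical, and <literal>10.2.0.1</literal> is the MGS
address from the example later in this chapter. Firewall rules themselves
should be inspected with the tooling your distribution provides.</para>
<screen>
```shell
# List active (non-comment, non-blank) rules in a hosts.deny-style file.
active_deny_rules() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Return success if a TCP connection to host $1, port $2 opens within
# 3 seconds. Relies on bash's /dev/tcp pseudo-device.
port_reachable() {
    timeout 3 bash -c ": > /dev/tcp/$1/$2" 2>/dev/null
}

# Example usage on a client node.
if [ -r /etc/hosts.deny ]; then
    active_deny_rules /etc/hosts.deny || echo "no active hosts.deny rules"
fi
# port_reachable 10.2.0.1 988 || echo "port 988 not reachable"
```
</screen>
<para>A failed connection only shows that something between the two nodes
blocks the port; the syslogs remain the first place to look for the
cause.</para>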
231 <para>Verify that the file system started and is working correctly. Do
233 <literal>lfs df</literal>,
234 <literal>dd</literal> and
235 <literal>ls</literal> commands on the client node.</para>
<emphasis>(Optional)</emphasis> Run benchmarking tools to validate the
240 performance of hardware and software layers in the cluster. Available
241 tools include:</para>
<literal>obdfilter-survey</literal> - Characterizes the storage
246 performance of a Lustre file system. For details, see
247 <xref linkend="benchmark.ost_perf" />.</para>
<literal>ost-survey</literal> - Performs I/O against OSTs to detect
252 anomalies between otherwise identical disk subsystems. For details,
254 <xref linkend="benchmark.ost_io" />.</para>
262 <primary>Lustre</primary>
263 <secondary>configuring</secondary>
264 <tertiary>simple example</tertiary>
265 </indexterm>Simple Lustre Configuration Example</title>
<para>The following example shows the steps for a simple Lustre file
system configuration in which a combined MGS/MDT and two
OSTs are created to form a file system called
<literal>temp</literal>. Three block devices are used, one for the
combined MGS/MDS node and one for each OSS node. Common parameters used
in the example are listed below, along with individual node
273 <informaltable frame="all">
275 <colspec colname="c1" colwidth="2*" />
276 <colspec colname="c2" colwidth="25*" />
277 <colspec colname="c3" colwidth="25*" />
278 <colspec colname="c4" colwidth="25*" />
281 <entry nameend="c2" namest="c1">
283 <emphasis role="bold">Common Parameters</emphasis>
288 <emphasis role="bold">Value</emphasis>
293 <emphasis role="bold">Description</emphasis>
305 <emphasis role="bold">MGS node</emphasis>
310 <literal>10.2.0.1@tcp0</literal>
314 <para>Node for the combined MGS/MDS</para>
323 <emphasis role="bold">file system</emphasis>
328 <literal>temp</literal>
332 <para>Name of the Lustre file system</para>
341 <emphasis role="bold">network type</emphasis>
346 <literal>TCP/IP</literal>
350 <para>Network type used for Lustre file system
351 <literal>temp</literal></para>
357 <informaltable frame="all">
359 <colspec colname="c1" colwidth="25*" />
360 <colspec colname="c2" colwidth="25*" />
361 <colspec colname="c3" colwidth="25*" />
362 <colspec colname="c4" colwidth="25*" />
365 <entry nameend="c2" namest="c1">
367 <emphasis role="bold">Node Parameters</emphasis>
372 <emphasis role="bold">Value</emphasis>
377 <emphasis role="bold">Description</emphasis>
384 <entry nameend="c4" namest="c1">
385 <para>MGS/MDS node</para>
394 <emphasis role="bold">MGS/MDS node</emphasis>
399 <literal>mdt0</literal>
403 <para>MDS in Lustre file system
404 <literal>temp</literal></para>
413 <emphasis role="bold">block device</emphasis>
418 <literal>/dev/sdb</literal>
422 <para>Block device for the combined MGS/MDS node</para>
431 <emphasis role="bold">mount point</emphasis>
436 <literal>/mnt/mdt</literal>
440 <para>Mount point for the
441 <literal>mdt0</literal> block device (
442 <literal>/dev/sdb</literal>) on the MGS/MDS node</para>
446 <entry nameend="c4" namest="c1">
447 <para>First OSS node</para>
456 <emphasis role="bold">OSS node</emphasis>
461 <literal>oss0</literal>
465 <para>First OSS node in Lustre file system
466 <literal>temp</literal></para>
475 <emphasis role="bold">OST</emphasis>
480 <literal>ost0</literal>
484 <para>First OST in Lustre file system
485 <literal>temp</literal></para>
494 <emphasis role="bold">block device</emphasis>
499 <literal>/dev/sdc</literal>
503 <para>Block device for the first OSS node (
504 <literal>oss0</literal>)</para>
513 <emphasis role="bold">mount point</emphasis>
518 <literal>/mnt/ost0</literal>
<para>Mount point for the
<literal>ost0</literal> block device (
<literal>/dev/sdc</literal>) on the
<literal>oss0</literal> node</para>
529 <entry nameend="c4" namest="c1">
530 <para>Second OSS node</para>
539 <emphasis role="bold">OSS node</emphasis>
544 <literal>oss1</literal>
548 <para>Second OSS node in Lustre file system
549 <literal>temp</literal></para>
558 <emphasis role="bold">OST</emphasis>
563 <literal>ost1</literal>
567 <para>Second OST in Lustre file system
568 <literal>temp</literal></para>
575 <emphasis role="bold">block device</emphasis>
580 <literal>/dev/sdd</literal>
<para>Block device for the second OSS node (<literal>oss1</literal>)</para>
593 <emphasis role="bold">mount point</emphasis>
598 <literal>/mnt/ost1</literal>
602 <para>Mount point for the
603 <literal>ost1</literal> block device (
604 <literal>/dev/sdd</literal>) on the
605 <literal>oss1</literal> node</para>
609 <entry nameend="c4" namest="c1">
610 <para>Client node</para>
619 <emphasis role="bold">client node</emphasis>
624 <literal>client1</literal>
628 <para>Client in Lustre file system
629 <literal>temp</literal></para>
638 <emphasis role="bold">mount point</emphasis>
643 <literal>/lustre</literal>
647 <para>Mount point for Lustre file system
648 <literal>temp</literal> on the
649 <literal>client1</literal> node</para>
656 <para>We recommend that you use 'dotted-quad' notation for IP addresses
657 rather than host names to make it easier to read debug logs and debug
658 configurations with multiple interfaces.</para>
660 <para>For this example, complete the steps below:</para>
663 <para>Create a combined MGS/MDT file system on the block device. On
664 the MDS node, run:</para>
666 [root@mds /]# mkfs.lustre --fsname=temp --mgs --mdt --index=0 /dev/sdb
668 <para>This command generates this output:</para>
676 (MDT MGS first_time update )
677 Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
678 Parameters: mdt.identity_upcall=/usr/sbin/l_getidentity
680 checking for existing Lustre data: not found
683 formatting backing filesystem ldiskfs on /dev/sdb
684 target name temp-MDTffff
686 options -i 4096 -I 512 -q -O dir_index,uninit_groups -F
687 mkfs_cmd = mkfs.ext2 -j -b 4096 -L temp-MDTffff -i 4096 -I 512 -q -O
688 dir_index,uninit_groups -F /dev/sdb
689 Writing CONFIGS/mountdata
693 <para>Mount the combined MGS/MDT file system on the block device. On
694 the MDS node, run:</para>
696 [root@mds /]# mount -t lustre /dev/sdb /mnt/mdt
698 <para>This command generates this output:</para>
700 Lustre: temp-MDT0000: new disk, initializing
701 Lustre: 3009:0:(lproc_mds.c:262:lprocfs_wr_identity_upcall()) temp-MDT0000:
702 group upcall set to /usr/sbin/l_getidentity
703 Lustre: temp-MDT0000.mdt: set parameter identity_upcall=/usr/sbin/l_getidentity
704 Lustre: Server temp-MDT0000 on device /dev/sdb has started
707 <listitem xml:id="dbdoclet.create_and_mount_ost">
708 <para>Create and mount
709 <literal>ost0</literal>.</para>
710 <para>In this example, the OSTs (
711 <literal>ost0</literal> and
712 <literal>ost1</literal>) are being created on different OSS nodes (
713 <literal>oss0</literal> and
714 <literal>oss1</literal> respectively).</para>
718 <literal>ost0</literal>. On
719 <literal>oss0</literal> node, run:</para>
721 [root@oss0 /]# mkfs.lustre --fsname=temp --mgsnode=10.2.0.1@tcp0 --ost
724 <para>The command generates this output:</para>
732 (OST first_time update)
733 Persistent mount opts: errors=remount-ro,extents,mballoc
734 Parameters: mgsnode=10.2.0.1@tcp
736 checking for existing Lustre data: not found
739 formatting backing filesystem ldiskfs on /dev/sdc
740 target name temp-OST0000
742 options -I 256 -q -O dir_index,uninit_groups -F
743 mkfs_cmd = mkfs.ext2 -j -b 4096 -L temp-OST0000 -I 256 -q -O
744 dir_index,uninit_groups -F /dev/sdc
745 Writing CONFIGS/mountdata
749 <para>Mount ost0 on the OSS on which it was created. On
750 <literal>oss0</literal> node, run:</para>
[root@oss0 /]# mount -t lustre /dev/sdc /mnt/ost0
754 <para>The command generates this output:</para>
756 LDISKFS-fs: file extents enabled
757 LDISKFS-fs: mballoc enabled
758 Lustre: temp-OST0000: new disk, initializing
Lustre: Server temp-OST0000 on device /dev/sdc has started
761 <para>Shortly afterwards, this output appears:</para>
763 Lustre: temp-OST0000: received MDS connection from 10.2.0.1@tcp0
764 Lustre: MDS temp-MDT0000: temp-OST0000_UUID now active, resetting orphans
770 <para>Create and mount
771 <literal>ost1</literal>.</para>
774 <para>Create ost1. On
775 <literal>oss1</literal> node, run:</para>
777 [root@oss1 /]# mkfs.lustre --fsname=temp --mgsnode=10.2.0.1@tcp0 \
778 --ost --index=1 /dev/sdd
780 <para>The command generates this output:</para>
788 (OST first_time update)
789 Persistent mount opts: errors=remount-ro,extents,mballoc
790 Parameters: mgsnode=10.2.0.1@tcp
792 checking for existing Lustre data: not found
795 formatting backing filesystem ldiskfs on /dev/sdd
796 target name temp-OST0001
798 options -I 256 -q -O dir_index,uninit_groups -F
799 mkfs_cmd = mkfs.ext2 -j -b 4096 -L temp-OST0001 -I 256 -q -O
dir_index,uninit_groups -F /dev/sdd
801 Writing CONFIGS/mountdata
805 <para>Mount ost1 on the OSS on which it was created. On
806 <literal>oss1</literal> node, run:</para>
[root@oss1 /]# mount -t lustre /dev/sdd /mnt/ost1
810 <para>The command generates this output:</para>
812 LDISKFS-fs: file extents enabled
813 LDISKFS-fs: mballoc enabled
814 Lustre: temp-OST0001: new disk, initializing
Lustre: Server temp-OST0001 on device /dev/sdd has started
817 <para>Shortly afterwards, this output appears:</para>
819 Lustre: temp-OST0001: received MDS connection from 10.2.0.1@tcp0
820 Lustre: MDS temp-MDT0000: temp-OST0001_UUID now active, resetting orphans
826 <para>Mount the Lustre file system on the client. On the client node,
[root@client1 /]# mount -t lustre 10.2.0.1@tcp0:/temp /lustre
831 <para>This command generates this output:</para>
833 Lustre: Client temp-client has started
837 <para>Verify that the file system started and is working by running
839 <literal>df</literal>,
840 <literal>dd</literal> and
841 <literal>ls</literal> commands on the client node.</para>
845 <literal>lfs df -h</literal> command:</para>
[root@client1 /]# lfs df -h
850 <literal>lfs df -h</literal> command lists space usage per OST and
851 the MDT in human-readable format. This command generates output
852 similar to this:</para>
854 UUID bytes Used Available Use% Mounted on
855 temp-MDT0000_UUID 8.0G 400.0M 7.6G 0% /lustre[MDT:0]
856 temp-OST0000_UUID 800.0G 400.0M 799.6G 0% /lustre[OST:0]
857 temp-OST0001_UUID 800.0G 400.0M 799.6G 0% /lustre[OST:1]
858 filesystem summary: 1.6T 800.0M 1.6T 0% /lustre
863 <literal>lfs df -ih</literal> command.</para>
[root@client1 /]# lfs df -ih
868 <literal>lfs df -ih</literal> command lists inode usage per OST
869 and the MDT. This command generates output similar to
872 UUID Inodes IUsed IFree IUse% Mounted on
873 temp-MDT0000_UUID 2.5M 32 2.5M 0% /lustre[MDT:0]
874 temp-OST0000_UUID 5.5M 54 5.5M 0% /lustre[OST:0]
875 temp-OST0001_UUID 5.5M 54 5.5M 0% /lustre[OST:1]
876 filesystem summary: 2.5M 32 2.5M 0% /lustre
881 <literal>dd</literal> command:</para>
[root@client1 /]# cd /lustre
[root@client1 /lustre]# dd if=/dev/zero of=/lustre/zero.dat bs=4M count=2
887 <literal>dd</literal> command verifies write functionality by
888 creating a file containing all zeros (
889 <literal>0</literal>s). In this command, an 8 MB file is created.
890 This command generates output similar to this:</para>
894 8388608 bytes (8.4 MB) copied, 0.159628 seconds, 52.6 MB/s
899 <literal>ls</literal> command:</para>
[root@client1 /lustre]# ls -lsah
904 <literal>ls -lsah</literal> command lists files and directories in
905 the current working directory. This command generates output
906 similar to this:</para>
909 4.0K drwxr-xr-x 2 root root 4.0K Oct 16 15:27 .
910 8.0K drwxr-xr-x 25 root root 4.0K Oct 16 15:27 ..
911 8.0M -rw-r--r-- 1 root root 8.0M Oct 16 15:27 zero.dat
918 <para>Once the Lustre file system is configured, it is ready for
922 <section xml:id="dbdoclet.50438267_76752">
925 <primary>Lustre</primary>
926 <secondary>configuring</secondary>
927 <tertiary>additional options</tertiary>
928 </indexterm>Additional Configuration Options</title>
929 <para>This section describes how to scale the Lustre file system or make
930 configuration changes using the Lustre configuration utilities.</para>
934 <primary>Lustre</primary>
935 <secondary>configuring</secondary>
936 <tertiary>for scale</tertiary>
937 </indexterm>Scaling the Lustre File System</title>
<para>A Lustre file system can be scaled by adding OSTs or clients. To
create additional OSTs, repeat Step
<xref linkend="dbdoclet.create_and_mount_ost" /> and Step
<xref linkend="dbdoclet.mount_ost" /> above. To mount
additional clients, repeat Step
<xref linkend="dbdoclet.mount_on_client" /> for each client.</para>
948 <primary>Lustre</primary>
949 <secondary>configuring</secondary>
950 <tertiary>striping</tertiary>
951 </indexterm>Changing Striping Defaults</title>
952 <para>The default settings for the file layout stripe pattern are shown
954 <xref linkend="configuringlustre.tab.stripe" />.</para>
955 <table frame="none" xml:id="configuringlustre.tab.stripe">
956 <title>Default stripe pattern</title>
958 <colspec colname="c1" colwidth="13*" />
959 <colspec colname="c2" colwidth="13*" />
960 <colspec colname="c3" colwidth="13*" />
965 <emphasis role="bold">File Layout Parameter</emphasis>
970 <emphasis role="bold">Default</emphasis>
975 <emphasis role="bold">Description</emphasis>
982 <literal>stripe_size</literal>
989 <para>Amount of data to write to one OST before moving to the
996 <literal>stripe_count</literal>
1003 <para>The number of OSTs to use for a single file.</para>
1009 <literal>start_ost</literal>
1016 <para>The first OST where objects are created for each file.
1017 The default -1 allows the MDS to choose the starting index
1018 based on available space and load balancing.
<emphasis>It is strongly recommended that this parameter be left at
its default value of -1.</emphasis></para>
1027 <literal>lfs setstripe</literal> command described in
<xref linkend="managingstripingfreespace" /> to change the file layout
1029 configuration.</para>
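<para>As a hedged sketch, the commands below set a default layout on a
directory and then display it. The directory
<literal>/lustre/striped_dir</literal>, the stripe count of 2, and the
stripe size of 4 MB are illustrative choices only. The
<literal>run</literal> helper and the <literal>DRY_RUN</literal> variable
are hypothetical; by default the helper prints each command instead of
executing it. Option letters can differ between Lustre releases, so check
the <literal>lfs</literal> documentation for your version.</para>
<screen>
```shell
# Dry-run helper: print each command unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical directory on a client mount point.
DIR=/lustre/striped_dir

# New files in DIR get two stripes of 4 MB each; the starting OST is
# left at -1 so the MDS chooses it.
run lfs setstripe -c 2 -S 4M -i -1 "$DIR"

# Show the layout that files created in DIR will inherit.
run lfs getstripe "$DIR"
```
</screen>
<para>Set <literal>DRY_RUN=0</literal> on a client with the file system
mounted to apply the layout. Files that already exist keep the layout they
were created with.</para>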
1031 <section remap="h3">
1034 <primary>Lustre</primary>
1035 <secondary>configuring</secondary>
1036 <tertiary>utilities</tertiary>
1037 </indexterm>Using the Lustre Configuration Utilities</title>
1038 <para>If additional configuration is necessary, several configuration
1039 utilities are available:</para>
<literal>mkfs.lustre</literal> - Use to format a disk for a Lustre
<literal>tunefs.lustre</literal> - Use to modify configuration
information on a Lustre target disk.</para>
<literal>lctl</literal> - Use to directly control Lustre features via
<literal>ioctl</literal> interface, allowing various configuration,
maintenance and debugging features to be accessed.</para>
<literal>mount.lustre</literal> - Use to start a Lustre client or
target service.</para>
<para>For examples using these utilities, see the topic
<xref linkend="systemconfigurationutilities" />.</para>
1067 <literal>lfs</literal> utility is useful for configuring and querying a
1068 variety of options related to files. For more information, see
1069 <xref linkend="userutilities" />.</para>
1071 <para>Some sample scripts are included in the directory where the
1072 Lustre software is installed. If you have installed the Lustre source
1073 code, the scripts are located in the
1074 <literal>lustre/tests</literal> sub-directory. These scripts enable
1075 quick setup of some simple standard Lustre configurations.</para>
1080 <!--vim:expandtab:shiftwidth=2:tabstop=8:-->