1 <?xml version="1.0" encoding="UTF-8"?>
2 <chapter version="5.0" xml:lang="en-US" xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" xml:id='configuringlustre'>
4 <title xml:id='configuringlustre.title'>Configuring Lustre</title>
9 <para>This chapter shows how to configure a simple Lustre system consisting of a combined MGS/MDT, an OSS with two OSTs, and a client. It includes:</para>
10 <itemizedlist><listitem>
11 <para><xref linkend="dbdoclet.50438267_50692"/>
15 <para><xref linkend="dbdoclet.50438267_76752"/>
20 <section xml:id="dbdoclet.50438267_50692">
21 <title>10.1 Configuring a Simple Lustre File System</title>
22 <para>A Lustre system can be set up in a variety of configurations by using the administrative utilities provided with Lustre. The procedure below shows how to configure a simple Lustre file system consisting of a combined MGS/MDS, one OSS with two OSTs, and a client. For an overview of the entire Lustre installation procedure, see <xref linkend='installoverview'/>.</para>
23 <para>This configuration procedure assumes you have completed the following:</para>
24 <itemizedlist><listitem>
25 <para><emphasis>Set up and configured your hardware.</emphasis> For more information about hardware requirements, see <xref linkend='settinguplustresystem'/>.</para>
28 <para><emphasis>Downloaded and installed the Lustre software.</emphasis> For more information about preparing for and installing the Lustre software, see <xref linkend='installinglustre'/>.</para>
31 <para>The following optional steps should also be completed, if needed, before the Lustre software is configured:</para>
32 <itemizedlist><listitem>
33 <para><emphasis>Set up a hardware or software RAID on block devices to be used as OSTs or MDTs.</emphasis> For information about setting up RAID, see the documentation for your RAID controller or <xref linkend='configuringstorage'/>.</para>
36 <para><emphasis>Set up network interface bonding on Ethernet interfaces.</emphasis> For information about setting up network interface bonding, see <xref linkend='settingupbonding'/>.</para>
39 <para><emphasis>Set </emphasis>lnet<emphasis> module parameters to specify how Lustre Networking (LNET) is to be configured to work with Lustre and test the LNET configuration.</emphasis> By default, LNET uses the first TCP/IP interface it discovers on a system. If this network configuration is sufficient, you do not need to configure LNET. LNET configuration is required if you are using InfiniBand or multiple Ethernet interfaces.</para>
42 <para>For information about configuring LNET, see <xref linkend='configuringlnet'/>. For information about testing LNET, see <xref linkend='lnetselftest'/>.</para>
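<para>As a minimal sketch, LNET can be directed to a specific Ethernet interface with a kernel module option set before Lustre is started. The file path and interface name below are illustrative and vary by distribution:</para>
<screen># /etc/modprobe.d/lustre.conf (path varies by distribution)
options lnet networks=tcp0(eth0)</screen>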
43 <itemizedlist><listitem>
44 <para><emphasis>Run the benchmark script sgpdd_survey to determine baseline performance of your hardware.</emphasis> Benchmarking your hardware will simplify debugging performance issues that are unrelated to Lustre and ensure you are getting the best possible performance with your installation. For information about running sgpdd_survey, see <xref linkend='benchmarkingtests'/>.</para>
50 The sgpdd_survey script overwrites the device being tested so it must be run before the OSTs are configured.</para>
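<para>As a sketch, the script (installed as sgpdd-survey in the Lustre I/O kit) is driven by environment variables naming the devices to test; the values below are illustrative, and all data on the listed devices is destroyed:</para>
<screen># WARNING: overwrites the listed devices
scsidevs="/dev/sdc" size=128 ./sgpdd-survey</screen>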
53 <para>To configure a simple Lustre file system, complete these steps:</para>
56 <para><emphasis role="bold">Create</emphasis> a combined MGS/MDT file system on a block device. On the MDS node, run:</para>
57 <screen>mkfs.lustre --fsname=&lt;<emphasis>fsname</emphasis>&gt; --mgs --mdt &lt;<emphasis>block device name</emphasis>&gt;
62 If you plan to create multiple file systems, the MGS should be created separately on its own dedicated block device, by running:</para><para> mkfs.lustre --fsname=&lt;<emphasis>fsname</emphasis>&gt; --mgs &lt;<emphasis>block device name</emphasis>&gt;</para>
69 <para>Mount the combined MGS/MDT file system on the block device. On the MDS node, run:</para>
70 <screen>mount -t lustre &lt;<emphasis>block device name</emphasis>&gt; &lt;<emphasis>mount point</emphasis>&gt;
74 If you have created an MGS and an MDT on separate block devices, mount them both.</para>
79 <listitem xml:id="dbdoclet.50438267_pgfId-1290915">
80 <para>Create the OST. On the OSS node, run:</para>
81 <screen>mkfs.lustre --ost --fsname=&lt;<emphasis>fsname</emphasis>&gt; --mgsnode=&lt;<emphasis>NID</emphasis>&gt; &lt;<emphasis>block device name</emphasis>&gt;
83 <para>When you create an OST, you are formatting an ldiskfs file system on a block storage device, as you would with any local file system.</para>
84 <para>You can have as many OSTs per OSS as the hardware or drivers allow. For more information about storage and memory requirements for a Lustre file system, see <xref linkend='settinguplustresystem'/>.</para>
85 <para>You can only configure one OST per block device. You should create an OST that uses the raw block device and does not use partitioning.</para>
86 <para>If you are using block devices that are accessible from multiple OSS nodes, ensure that you mount the OSTs from only one OSS node at a time. It is strongly recommended that multiple-mount protection be enabled for such devices to prevent serious data corruption. For more information about multiple-mount protection, see <xref linkend='managingfailover'/>.</para>
89 Lustre currently supports block devices up to 16 TB on OEL 5/RHEL 5 (up to 8 TB on other distributions). If the device size is only slightly larger than 16 TB, it is recommended that you limit the file system size to 16 TB at format time. If the size is significantly larger than 16 TB, you should reconfigure the storage into devices smaller than 16 TB. We recommend that you not place partitions on top of RAID 5/6 block devices due to negative impacts on performance.</para>
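<para>For example, to check a device's size and then cap the file system at 16 TB on a slightly larger device, the --device-size option to mkfs.lustre (which takes a size in kilobytes; 16 TB is 17179869184 KB) can be used. The device name and NID below are illustrative:</para>
<screen>[root@oss1 /]# blockdev --getsize64 /dev/sdc
[root@oss1 /]# mkfs.lustre --ost --fsname=temp --mgsnode=10.2.0.1@tcp0 \
        --device-size=17179869184 /dev/sdc</screen>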
94 <listitem xml:id="dbdoclet.50438267_pgfId-1293955">
95 <para>Mount the OST. On the OSS node where the OST was created, run:</para>
96 <screen>mount -t lustre &lt;<emphasis>block device name</emphasis>&gt; &lt;<emphasis>mount point</emphasis>&gt;</screen>
100 To create additional OSTs, repeat <xref linkend='dbdoclet.50438267_pgfId-1290915'/>Step 3 and <xref linkend='dbdoclet.50438267_pgfId-1293955'/>Step 4.</para>
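<para>For an OSS hosting several OSTs, the create-and-mount steps can be scripted. A minimal sketch, assuming a single OSS owns both /dev/sdc and /dev/sdd (the example later in this chapter instead places these devices on separate OSS nodes):</para>
<screen>[root@oss1 /]# i=1; for dev in /dev/sdc /dev/sdd; do
    mkfs.lustre --ost --fsname=temp --mgsnode=10.2.0.1@tcp0 $dev
    mkdir -p /mnt/ost$i
    mount -t lustre $dev /mnt/ost$i
    i=$((i + 1))
done</screen>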
104 <listitem xml:id="dbdoclet.50438267_pgfId-1290934">
105 <para>Mount the Lustre file system on the client. On the client node, run:</para>
106 <screen>mount -t lustre &lt;<emphasis>MGS node</emphasis>&gt;:/&lt;<emphasis>fsname</emphasis>&gt; &lt;<emphasis>mount point</emphasis>&gt;
110 To mount the file system on additional clients, repeat <xref linkend='dbdoclet.50438267_pgfId-1290934'/>Step 5.</para>
115 <para>Verify that the file system started and is working correctly. Do this by running the lfs df, dd and ls commands on the client node.</para>
118 If you have a problem mounting the file system, check the syslogs on the client and all the servers for errors and also check the network settings. A common issue with newly-installed systems is that hosts.deny or firewall rules may prevent connections on port 988.</para>
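<para>Basic LNET connectivity from a client to the MGS can be checked before mounting by using lctl ping; a successful ping lists the NIDs of the remote node. The NID below is illustrative:</para>
<screen>[root@client1 /]# lctl ping 10.2.0.1@tcp0</screen>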
124 <para><emphasis>(Optional) Run benchmarking tools to validate the performance of hardware and software layers in the cluster.</emphasis> Available tools include:</para>
126 <itemizedlist><listitem>
127 <para>obdfilter_survey - Characterizes the storage performance of a Lustre file system. For details, see <emphasis>Testing OST Performance (obdfilter_survey)</emphasis> in <xref linkend='benchmarkingtests'/>.</para>
130 <para>ost_survey - Performs I/O against OSTs to detect anomalies between otherwise identical disk subsystems. For details, see <emphasis>Testing OST I/O Performance (ost_survey)</emphasis> in <xref linkend='benchmarkingtests'/>.</para>
137 <title>10.1.1 Simple Lustre <anchor xml:id="dbdoclet.50438267_marker-1290955" xreflabel=""/>Configuration Example</title>
138 <para>To see the steps in a simple Lustre configuration, follow this example in which a combined MGS/MDT and two OSTs are created. Three block devices are used, one for the combined MGS/MDS node and one for each OSS node. Common parameters used in the example are listed below, along with individual node parameters.</para>
139 <informaltable frame="all">
141 <colspec colname="c1" colwidth="2*"/>
142 <colspec colname="c2" colwidth="25*"/>
143 <colspec colname="c3" colwidth="25*"/>
144 <colspec colname="c4" colwidth="25*"/>
147 <entry nameend="c2" namest="c1"><para><emphasis role="bold">Common Parameters</emphasis></para></entry>
148 <entry><para><emphasis role="bold">Value</emphasis></para></entry>
149 <entry><para><emphasis role="bold">Description</emphasis></para></entry>
154 <entry><para> </para></entry>
155 <entry><para> <emphasis role="bold">MGS node</emphasis></para></entry>
156 <entry><para> 10.2.0.1@tcp0</para></entry>
157 <entry><para> Node for the combined MGS/MDS</para></entry>
160 <entry><para> </para></entry>
161 <entry><para> <emphasis role="bold">file system</emphasis></para></entry>
162 <entry><para> temp</para></entry>
163 <entry><para> Name of the Lustre file system</para></entry>
166 <entry><para> </para></entry>
167 <entry><para> <emphasis role="bold">network type</emphasis></para></entry>
168 <entry><para> TCP/IP</para></entry>
169 <entry><para> Network type used for Lustre file system temp</para></entry>
174 <informaltable frame="all">
176 <colspec colname="c1" colwidth="25*"/>
177 <colspec colname="c2" colwidth="25*"/>
178 <colspec colname="c3" colwidth="25*"/>
179 <colspec colname="c4" colwidth="25*"/>
182 <entry nameend="c2" namest="c1"><para><emphasis role="bold">Node Parameters</emphasis></para></entry>
183 <entry><para><emphasis role="bold">Value</emphasis></para></entry>
184 <entry><para><emphasis role="bold">Description</emphasis></para></entry>
189 <entry nameend="c4" namest="c1"><para> MGS/MDS node</para></entry>
192 <entry><para> </para></entry>
193 <entry><para> <emphasis role="bold">MGS/MDS node</emphasis></para></entry>
194 <entry><para> mdt1</para></entry>
195 <entry><para> MDS in Lustre file system temp</para></entry>
198 <entry><para> </para></entry>
199 <entry><para> <emphasis role="bold">block device</emphasis></para></entry>
200 <entry><para> /dev/sdb</para></entry>
201 <entry><para> Block device for the combined MGS/MDS node</para></entry>
204 <entry><para> </para></entry>
205 <entry><para> <emphasis role="bold">mount point</emphasis></para></entry>
206 <entry><para> /mnt/mdt</para></entry>
207 <entry><para> Mount point for the mdt1 block device (/dev/sdb) on the MGS/MDS node</para></entry>
210 <entry nameend="c4" namest="c1"><para> First OSS node</para></entry>
213 <entry><para> </para></entry>
214 <entry><para> <emphasis role="bold">OSS node</emphasis></para></entry>
215 <entry><para> oss1</para></entry>
216 <entry><para> First OSS node in Lustre file system temp</para></entry>
219 <entry><para> </para></entry>
220 <entry><para> <emphasis role="bold">OST</emphasis></para></entry>
221 <entry><para> ost1</para></entry>
222 <entry><para> First OST in Lustre file system temp</para></entry>
225 <entry><para> </para></entry>
226 <entry><para> <emphasis role="bold">block device</emphasis></para></entry>
227 <entry><para> /dev/sdc</para></entry>
228 <entry><para> Block device for the first OSS node (oss1)</para></entry>
231 <entry><para> </para></entry>
232 <entry><para> <emphasis role="bold">mount point</emphasis></para></entry>
233 <entry><para> /mnt/ost1</para></entry>
234 <entry><para> Mount point for the ost1 block device (/dev/sdc) on the oss1 node</para></entry>
237 <entry nameend="c4" namest="c1"><para> Second OSS node</para></entry>
240 <entry><para> </para></entry>
241 <entry><para> <emphasis role="bold">OSS node</emphasis></para></entry>
242 <entry><para> oss2</para></entry>
243 <entry><para> Second OSS node in Lustre file system temp</para></entry>
246 <entry><para> </para></entry>
247 <entry><para> <emphasis role="bold">OST</emphasis></para></entry>
248 <entry><para> ost2</para></entry>
249 <entry><para> Second OST in Lustre file system temp</para></entry>
252 <entry><para> </para></entry>
253 <entry><para> <emphasis role="bold">block device</emphasis></para></entry>
254 <entry><para> /dev/sdd</para></entry>
255 <entry><para> Block device for the second OSS node (oss2)</para></entry>
258 <entry><para> </para></entry>
259 <entry><para> <emphasis role="bold">mount point</emphasis></para></entry>
260 <entry><para> /mnt/ost2</para></entry>
261 <entry><para> Mount point for the ost2 block device (/dev/sdd) on the oss2 node</para></entry>
264 <entry nameend="c4" namest="c1"><para> Client node</para></entry>
267 <entry><para> </para></entry>
268 <entry><para> <emphasis role="bold">client node</emphasis></para></entry>
269 <entry><para> client1</para></entry>
270 <entry><para> Client in Lustre file system temp</para></entry>
273 <entry><para> </para></entry>
274 <entry><para> <emphasis role="bold">mount point</emphasis></para></entry>
275 <entry><para> /lustre</para></entry>
276 <entry><para> Mount point for Lustre file system temp on the client1 node</para></entry>
286 We recommend that you use 'dotted-quad' notation for IP addresses rather than host names to make it easier to read debug logs and debug configurations with multiple interfaces.</para>
289 <para>For this example, complete the steps below:</para>
294 <para>Create a combined MGS/MDT file system on the block device. On the MDS node, run:</para>
295 <screen>[root@mds /]# mkfs.lustre --fsname=temp --mgs --mdt /dev/sdb
297 <para>This command generates this output:</para>
298 <screen> Permanent disk data:
304 (MDT MGS needs_index first_time update )
305 Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
306 Parameters: mdt.group_upcall=/usr/sbin/l_getgroups
308 checking for existing Lustre data: not found
311 formatting backing filesystem ldiskfs on /dev/sdb
312 target name temp-MDTffff
314 options -i 4096 -I 512 -q -O dir_index,uninit_groups -F
315 mkfs_cmd = mkfs.ext2 -j -b 4096 -L temp-MDTffff -i 4096 -I 512 -q -O
316 dir_index,uninit_groups -F /dev/sdb
317 Writing CONFIGS/mountdata
322 <para>Mount the combined MGS/MDT file system on the block device. On the MDS node, run:</para>
323 <screen>[root@mds /]# mount -t lustre /dev/sdb /mnt/mdt
325 <para>This command generates this output:</para>
326 <screen>Lustre: temp-MDT0000: new disk, initializing
327 Lustre: 3009:0:(lproc_mds.c:262:lprocfs_wr_group_upcall()) temp-MDT0000: group upcall set to /usr/sbin/l_getgroups
329 Lustre: temp-MDT0000.mdt: set parameter group_upcall=/usr/sbin/l_getgroups
330 Lustre: Server temp-MDT0000 on device /dev/sdb has started
333 <listitem xml:id="dbdoclet.50438267_pgfId-1291170">
334 <para>Create and mount ost1.</para>
335 <para>In this example, the OSTs (ost1 and ost2) are being created on different OSSs (oss1 and oss2 respectively).</para>
338 <para>a. Create ost1. On oss1 node, run:</para>
339 <screen>[root@oss1 /]# mkfs.lustre --ost --fsname=temp --mgsnode=10.2.0.1@tcp0 /dev/sdc
342 <para>The command generates this output:</para>
343 <screen> Permanent disk data:
349 (OST needs_index first_time update)
350 Persistent mount opts: errors=remount-ro,extents,mballoc
351 Parameters: mgsnode=10.2.0.1@tcp
353 checking for existing Lustre data: not found
356 formatting backing filesystem ldiskfs on /dev/sdc
357 target name temp-OSTffff
359 options -I 256 -q -O dir_index,uninit_groups -F
360 mkfs_cmd = mkfs.ext2 -j -b 4096 -L temp-OSTffff -I 256 -q -O
361 dir_index,uninit_groups -F /dev/sdc
362 Writing CONFIGS/mountdata
366 <para>b. Mount ost1 on the OSS on which it was created. On oss1 node, run:</para>
367 <screen>[root@oss1 /] mount -t lustre /dev/sdc /mnt/ost1
369 <para>The command generates this output:</para>
370 <screen>LDISKFS-fs: file extents enabled
371 LDISKFS-fs: mballoc enabled
372 Lustre: temp-OST0000: new disk, initializing
373 Lustre: Server temp-OST0000 on device /dev/sdc has started
375 <para>Shortly afterwards, this output appears:</para>
376 <screen>Lustre: temp-OST0000: received MDS connection from 10.2.0.1@tcp0
377 Lustre: MDS temp-MDT0000: temp-OST0000_UUID now active, resetting orphans
383 <para>Create and mount ost2.</para>
386 <para>a. Create ost2. On oss2 node, run:</para>
387 <screen>[root@oss2 /]# mkfs.lustre --ost --fsname=temp --mgsnode=10.2.0.1@tcp0 /dev/sdd
390 <para>The command generates this output:</para>
391 <screen> Permanent disk data:
397 (OST needs_index first_time update)
398 Persistent mount opts: errors=remount-ro,extents,mballoc
399 Parameters: mgsnode=10.2.0.1@tcp
401 checking for existing Lustre data: not found
404 formatting backing filesystem ldiskfs on /dev/sdd
405 target name temp-OSTffff
407 options -I 256 -q -O dir_index,uninit_groups -F
408 mkfs_cmd = mkfs.ext2 -j -b 4096 -L temp-OSTffff -I 256 -q -O
409 dir_index,uninit_groups -F /dev/sdd
410 Writing CONFIGS/mountdata
414 <para>b. Mount ost2 on the OSS on which it was created. On oss2 node, run:</para>
415 <screen>[root@oss2 /] mount -t lustre /dev/sdd /mnt/ost2
417 <para>The command generates this output:</para>
418 <screen>LDISKFS-fs: file extents enabled
419 LDISKFS-fs: mballoc enabled
420 Lustre: temp-OST0001: new disk, initializing
421 Lustre: Server temp-OST0001 on device /dev/sdd has started
423 <para>Shortly afterwards, this output appears:</para>
424 <screen>Lustre: temp-OST0001: received MDS connection from 10.2.0.1@tcp0
425 Lustre: MDS temp-MDT0000: temp-OST0001_UUID now active, resetting orphans
431 <para>Mount the Lustre file system on the client. On the client node, run:</para>
432 <screen>[root@client1 /] mount -t lustre 10.2.0.1@tcp0:/temp /lustre
434 <para>This command generates this output:</para>
435 <screen>Lustre: Client temp-client has started
439 <para>Verify that the file system started and is working by running the lfs df, dd and ls commands on the client node.</para>
442 <para>a. Run the lfs df -h command:</para>
443 <screen>[root@client1 /] lfs df -h
445 <para>The lfs df -h command lists space usage per OST and for the MDT in human-readable format. This command generates output similar to this:</para>
446 <screen>UUID                 bytes      Used  Available  Use%  Mounted on
448 temp-MDT0000_UUID     8.0G    400.0M       7.6G    0%  /lustre[MDT:0]
450 temp-OST0000_UUID   800.0G    400.0M     799.6G    0%  /lustre[OST:0]
452 temp-OST0001_UUID   800.0G    400.0M     799.6G    0%  /lustre[OST:1]
454 filesystem summary:   1.6T    800.0M       1.6T    0%  /lustre
460 <para>b. Run the lfs df -ih command.</para>
461 <screen>[root@client1 /] lfs df -ih
463 <para>The lfs df -ih command lists inode usage per OST and for the MDT. This command generates output similar to this:</para>
464 <screen>UUID Inodes IUsed IFree IUse% Mounted on
466 temp-MDT0000_UUID 2.5M 32 2.5M 0% /lustre[MDT:0]
467 temp-OST0000_UUID 5.5M 54 5.5M 0% /lustre[OST:0]
468 temp-OST0001_UUID 5.5M 54 5.5M 0% /lustre[OST:1]
469 filesystem summary: 2.5M 32 2.5M 0% /lustre
474 <para>c. Run the dd command:</para>
475 <screen>[root@client1 /] cd /lustre
476 [root@client1 /lustre] dd if=/dev/zero of=/lustre/zero.dat bs=4M count=2
478 <para>The dd command verifies write functionality by creating a file containing all zeros (0s). In this command, an 8 MB file is created. This command generates output similar to this:</para>
479 <screen>2+0 records in
480 2+0 records out
481 8388608 bytes (8.4 MB) copied, 0.159628 seconds, 52.6 MB/s
487 <para>d. Run the ls command:</para>
488 <screen>[root@client1 /lustre] ls -lsah
490 <para>The ls -lsah command lists files and directories in the current working directory. This command generates output similar to this:</para>
492 4.0K drwxr-xr-x 2 root root 4.0K Oct 16 15:27 .
493 8.0K drwxr-xr-x 25 root root 4.0K Oct 16 15:27 ..
494 8.0M -rw-r--r-- 1 root root 8.0M Oct 16 15:27 zero.dat
501 <para>Once the Lustre file system is configured, it is ready for use.</para>
504 <section xml:id="dbdoclet.50438267_76752">
505 <title>10.2 Additional Configuration Options</title>
506 <para>This section describes how to scale the Lustre file system or make configuration changes using the Lustre configuration utilities.</para>
508 <title>10.2.1 Scaling the <anchor xml:id="dbdoclet.50438267_marker-1292440" xreflabel=""/>Lustre File System</title>
509 <para>A Lustre file system can be scaled by adding OSTs or clients. For instructions on creating additional OSTs, repeat <xref linkend="dbdoclet.50438267_pgfId-1291170"/>Step 3 and <xref linkend="dbdoclet.50438267_pgfId-1293955"/>Step 4 above. To mount additional clients, repeat <xref linkend="dbdoclet.50438267_pgfId-1290934"/>Step 5 for each client.</para>
512 <title>10.2.2 <anchor xml:id="dbdoclet.50438267_50212" xreflabel=""/>Changing Striping Defaults</title>
513 <para>The default settings for the file layout stripe pattern are shown in <xref linkend='configuringlustre.tab.stripe'/>.</para>
514 <table frame="none" xml:id='configuringlustre.tab.stripe'>
515 <title>Default stripe pattern</title>
517 <colspec colname="c1" colwidth="3*"/>
518 <colspec colname="c2" colwidth="13*"/>
519 <colspec colname="c3" colwidth="13*"/>
522 <entry><para><emphasis role="bold">File Layout Parameter</emphasis></para></entry>
523 <entry><para><emphasis role="bold">Default</emphasis></para></entry>
524 <entry><para><emphasis role="bold">Description</emphasis></para></entry>
527 <entry><para> stripe_size</para></entry>
528 <entry><para> 1 MB</para></entry>
529 <entry><para> Amount of data to write to one OST before moving to the next OST.</para></entry>
532 <entry><para> stripe_count</para></entry>
533 <entry><para> 1</para></entry>
534 <entry><para> The number of OSTs to use for a single file.</para></entry>
537 <entry><para> start_ost</para></entry>
538 <entry><para> -1</para></entry>
539 <entry><para> The first OST where objects are created for each file. The default -1 allows the MDS to choose the starting index based on available space and load balancing. <emphasis>It is strongly recommended that you not change the default for this parameter to a value other than -1.</emphasis></para></entry>
544 <para>Use the lfs setstripe command described in <xref linkend='managingstripingfreespace'/> to change the file layout configuration.</para>
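<para>For example, to direct new files in a directory to use a 4 MB stripe size across two OSTs and then confirm the layout, commands like the following can be used (the directory name is illustrative; on newer Lustre releases the stripe-size option is -S rather than -s):</para>
<screen>[root@client1 /]# lfs setstripe -s 4M -c 2 /lustre/dir
[root@client1 /]# lfs getstripe /lustre/dir</screen>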
547 <title>10.2.3 Using the Lustre Configuration Utilities</title>
548 <para>If additional configuration is necessary, several configuration utilities are available:</para>
549 <itemizedlist><listitem>
550 <para>mkfs.lustre - Use to format a disk for a Lustre service.</para>
553 <para>tunefs.lustre - Use to modify configuration information on a Lustre target disk.</para>
556 <para>lctl - Use to directly control Lustre via an ioctl interface, allowing various configuration, maintenance and debugging features to be accessed.</para>
559 <para>mount.lustre - Use to start a Lustre client or target service.</para>
562 <para>For examples using these utilities, see <xref linkend='systemconfigurationutilities'/>.</para>
563 <para>The lfs utility is useful for configuring and querying a variety of options related to files. For more information, see <xref linkend='userutilities'/>.</para>
566 Some sample scripts are included in the directory where Lustre is installed. If you have installed the Lustre source code, the scripts are located in the lustre/tests sub-directory. These scripts enable quick setup of some simple standard Lustre configurations.</para>
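<para>For example, the llmount.sh script in lustre/tests formats and mounts a small test file system on a single node, and llmountcleanup.sh tears it down again (script names assume an installation built from source):</para>
<screen>[root@node /]# cd lustre/tests
[root@node tests]# sh llmount.sh
[root@node tests]# sh llmountcleanup.sh</screen>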