- <section xml:id="dbdoclet.50438198_93109">
- <title>26.3 Common Lustre Problems</title>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1291338" xreflabel=""/>This section describes how to address common issues encountered with Lustre.</para>
- <section remap="h3">
- <title><anchor xml:id="dbdoclet.50438198_pgfId-1291350" xreflabel=""/>26.3.1 OST Object is <anchor xml:id="dbdoclet.50438198_marker-1291349" xreflabel=""/>Missing or Damaged</title>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1291351" xreflabel=""/>If the OSS fails to find an object or finds a damaged object, this message appears:</para>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1291717" xreflabel=""/>OST object missing or damaged (OST "ost1", object 98148, error -2)</para>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1291352" xreflabel=""/>If the reported error is -2 (-ENOENT, or "No such file or directory"), then the object is missing. This can occur either because the MDS and OST are out of sync, or because an OST object was corrupted and deleted.</para>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1291353" xreflabel=""/>If you have recovered the file system from a disk failure by using e2fsck, then unrecoverable objects may have been deleted or moved to /lost+found on the raw OST partition. Because files on the MDS still reference these objects, attempts to access them produce this error.</para>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1291354" xreflabel=""/>If you have recovered a backup of the raw MDS or OST partition, then the restored partition is very likely to be out of sync with the rest of your cluster. No matter which server partition you restored from backup, files on the MDS may reference objects which no longer exist (or did not exist when the backup was taken); accessing those files produces this error.</para>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1291355" xreflabel=""/>If neither of those descriptions is applicable to your situation, then it is possible that you have discovered a programming error that allowed the servers to get out of sync. Please report this condition to the Lustre group, and we will investigate.</para>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1291356" xreflabel=""/>If the reported error is anything else (such as -5, "I/O error"), it likely indicates a storage failure. The low-level file system returns this error if it is unable to read from the storage device.</para>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1291358" xreflabel=""/><emphasis role="bold">Suggested Action</emphasis></para>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1291359" xreflabel=""/>If the reported error is -2, you can consider checking in /lost+found on your raw OST device, to see if the missing object is there. However, it is likely that this object is lost forever, and that the file that references the object is now partially or completely lost. Restore this file from backup, or salvage what you can and delete it.</para>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1291360" xreflabel=""/>If the reported error is anything else, then you should immediately inspect this server for storage problems.</para>
- </section>
- <section remap="h3">
- <title><anchor xml:id="dbdoclet.50438198_pgfId-1291362" xreflabel=""/>26.3.2 OSTs <anchor xml:id="dbdoclet.50438198_marker-1291361" xreflabel=""/>Become Read-Only</title>
 - <para><anchor xml:id="dbdoclet.50438198_pgfId-1291363" xreflabel=""/>If the SCSI devices are inaccessible to Lustre at the block device level, then ldiskfs remounts the device read-only to prevent file system corruption. This is normal behavior. The status in /proc/fs/lustre/health_check also shows "not healthy" on the affected nodes.</para>
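The health status can be scripted; a minimal sketch, using a made-up sample value in place of the real contents of /proc/fs/lustre/health_check:

```shell
# On a live server you would read the real status, e.g.:
#   status=$(cat /proc/fs/lustre/health_check)
# The value below is a hypothetical sample so the logic can run standalone.
status="NOT HEALTHY"

if printf '%s\n' "$status" | grep -qi 'not healthy'; then
    result="needs attention"
else
    result="healthy"
fi
echo "node status: $result"
```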
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1293032" xreflabel=""/>To determine what caused the "not healthy" condition:</para>
- <itemizedlist><listitem>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1293041" xreflabel=""/> Examine the consoles of all servers for any error indications</para>
- </listitem>
-
-<listitem>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1293045" xreflabel=""/> Examine the syslogs of all servers for any LustreErrors or LBUG</para>
- </listitem>
-
-<listitem>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1293046" xreflabel=""/> Check the health of your system hardware and network. (Are the disks working as expected, is the network dropping packets?)</para>
- </listitem>
-
-<listitem>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1293055" xreflabel=""/> Consider what was happening on the cluster at the time. Does this relate to a specific user workload or a system load condition? Is the condition reproducible? Does it happen at a specific time (day, week or month)?</para>
- </listitem>
-
-</itemizedlist>
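The syslog check in the list above can be scripted; this sketch scans a fabricated sample log, where a real server would scan /var/log/messages (or equivalent) on each node:

```shell
# Fabricated sample log for illustration; a real server would scan its
# actual syslog files on every MDS/OSS node.
cat > /tmp/messages.sample <<'EOF'
Jan 10 10:00:01 oss1 kernel: LustreError: 137-5: UUID ost1_UUID is not available
Jan 10 10:00:02 oss1 kernel: Lustre: mount completed normally
EOF

# Count lines carrying LustreError or LBUG indications
hits=$(grep -cE 'LustreError|LBUG' /tmp/messages.sample)
echo "suspicious log lines: $hits"
```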
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1291365" xreflabel=""/>To recover from this problem, you must restart Lustre services using these file systems. There is no other way to know that the I/O made it to disk, and the state of the cache may be inconsistent with what is on disk.</para>
- </section>
- <section remap="h3">
- <title><anchor xml:id="dbdoclet.50438198_pgfId-1291367" xreflabel=""/>26.3.3 Identifying a <anchor xml:id="dbdoclet.50438198_marker-1291366" xreflabel=""/>Missing OST</title>
 - <para><anchor xml:id="dbdoclet.50438198_pgfId-1291368" xreflabel=""/>If an OST is missing for any reason, you may need to know which files are affected. Even though an OST is missing, the file system should remain operational. From any mounted client node, generate a list of files that reside on the affected OST. It is advisable to mark the missing OST as 'unavailable' so clients and the MDS do not time out trying to contact it.</para>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1291369" xreflabel=""/> 1. Generate a list of devices and determine the OST's device number. Run:</para>
- <screen><anchor xml:id="dbdoclet.50438198_pgfId-1291370" xreflabel=""/>$ lctl dl
-</screen>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1293115" xreflabel=""/>The lctl dl command output lists the device name and number, along with the device UUID and the number of references on the device.</para>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1291371" xreflabel=""/> 2. Deactivate the OST (on the OSS at the MDS). Run:</para>
- <screen><anchor xml:id="dbdoclet.50438198_pgfId-1291372" xreflabel=""/>$ lctl --device <OST device name or number> deactivate
-</screen>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1291373" xreflabel=""/>The OST device number or device name is generated by the lctl dl command.</para>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1293067" xreflabel=""/>The deactivate command prevents clients from creating new objects on the specified OST, although you can still access the OST for reading.</para>
 - <note><para>If the OST later becomes available, it needs to be reactivated. Run:</para><para># lctl --device <OST device name or number> activate</para></note>
-
 - <para><anchor xml:id="dbdoclet.50438198_pgfId-1291376" xreflabel=""/> 3. Determine all files that are striped over the missing OST. Run:</para>
- <screen><anchor xml:id="dbdoclet.50438198_pgfId-1291377" xreflabel=""/># lfs getstripe -r -O {OST_UUID} /mountpoint
-</screen>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1291378" xreflabel=""/>This returns a simple list of filenames from the affected file system.</para>
 - <para><anchor xml:id="dbdoclet.50438198_pgfId-1291379" xreflabel=""/> 4. If necessary, you can read the valid parts of a striped file. Run:</para>
- <screen><anchor xml:id="dbdoclet.50438198_pgfId-1291380" xreflabel=""/># dd if=filename of=new_filename bs=4k conv=sync,noerror
-</screen>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1291381" xreflabel=""/> 5. You can delete these files with the unlink or munlink command.</para>
- <screen><anchor xml:id="dbdoclet.50438198_pgfId-1291382" xreflabel=""/># unlink|munlink filename {filename ...}
-</screen>
- <note><para>There is no functional difference between the unlink and munlink commands. The unlink command is for newer Linux distributions. You can run munlink if unlink is not available.</para><para> When you run the unlink or munlink command, the file on the MDS is permanently removed.</para></note>
 - <para><anchor xml:id="dbdoclet.50438198_pgfId-1291384" xreflabel=""/> 6. If you need to know specifically which parts of the file are missing data, first determine the file layout (striping pattern), which includes the index of the missing OST. Run:</para>
 - <screen><anchor xml:id="dbdoclet.50438198_pgfId-1291385" xreflabel=""/># lfs getstripe -v {file}
-</screen>
 - <para><anchor xml:id="dbdoclet.50438198_pgfId-1291386" xreflabel=""/> 7. Use this computation to determine which offsets in the file are affected: [(C*N + X)*S, (C*N + X)*S + S - 1], N = { 0, 1, 2, ...}</para>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1291388" xreflabel=""/>where:</para>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1291389" xreflabel=""/>C = stripe count</para>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1291390" xreflabel=""/>S = stripe size</para>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1291391" xreflabel=""/>X = index of bad OST for this file</para>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1291392" xreflabel=""/>For example, for a 2 stripe file, stripe size = 1M, the bad OST is at index 0, and you have holes in the file at: [(2*N + 0)*1M, (2*N + 0)*1M + 1M - 1], N = { 0, 1, 2, ...}</para>
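Using the same assumptions as the example above (stripe count 2, 1 MB stripes, bad OST at index 0), the formula can be checked with a few lines of shell arithmetic:

```shell
# [(C*N + X)*S, (C*N + X)*S + S - 1] for N = 0, 1, 2, ...
C=2                 # stripe count
S=$((1024 * 1024))  # stripe size in bytes (1M)
X=0                 # stripe index of the bad OST within this file

holes=""
for N in 0 1 2; do
    start=$(( (C * N + X) * S ))
    end=$(( start + S - 1 ))
    holes="$holes[$start,$end] "
done
echo "holes at byte ranges: $holes"
```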
 - <para><anchor xml:id="dbdoclet.50438198_pgfId-1291394" xreflabel=""/>If the file system cannot be mounted, there is currently no way to parse this metadata directly from the MDS. If the bad OST does not start, your options for mounting the file system are to provide a loop device OST in its place or to replace it with a newly-formatted OST. In that case, the missing objects are created and read back as zero-filled.</para>
- </section>
- <section remap="h3">
- <title><anchor xml:id="dbdoclet.50438198_pgfId-1291436" xreflabel=""/>26.3.4 <anchor xml:id="dbdoclet.50438198_69657" xreflabel=""/>Fixing a Bad LAST_ID on an OST</title>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1296775" xreflabel=""/>Each OST contains a LAST_ID file, which holds the last object (pre-)created by the MDS <footnote><para><anchor xml:id="dbdoclet.50438198_pgfId-1296778" xreflabel=""/>The contents of the LAST_ID file must be accurate regarding the actual objects that exist on the OST.</para></footnote>. The MDT contains a lov_objid file, with values that represent the last object the MDS has allocated to a file.</para>
 - <para><anchor xml:id="dbdoclet.50438198_pgfId-1296779" xreflabel=""/>During normal operation, the MDT keeps some pre-created (but unallocated) objects on the OST, and the relationship between LAST_ID and lov_objid should be LAST_ID >= lov_objid. If LAST_ID falls behind lov_objid, the difference results in objects being created on the OST when it next connects to the MDS. These objects are never actually allocated to a file, since they are of 0 length (empty), but they do no harm. Creating empty objects enables the OST to catch up to the MDS, so normal operations resume.</para>
 - <para><anchor xml:id="dbdoclet.50438198_pgfId-1296780" xreflabel=""/>However, in the case where lov_objid falls far behind LAST_ID (for example, after the MDT is restored from an old backup), bad things can happen: the MDS is not aware of objects that have already been allocated on the OST, and it reallocates them to new files, overwriting their existing contents.</para>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1296781" xreflabel=""/>Here is the rule to avoid this scenario:</para>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1296782" xreflabel=""/>LAST_ID >= lov_objid and LAST_ID == last_physical_object and lov_objid >= last_used_object</para>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1296783" xreflabel=""/>Although the lov_objid value should be equal to the last_used_object value, the above rule suffices to keep Lustre happy at the expense of a few leaked objects.</para>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1296784" xreflabel=""/>In situations where there is on-disk corruption of the OST, for example caused by running with write cache enabled on the disks, the LAST_ID value may become inconsistent and result in a message similar to:</para>
- <screen><anchor xml:id="dbdoclet.50438198_pgfId-1296785" xreflabel=""/>"filter_precreate()) HOME-OST0003: Serious error:
-<anchor xml:id="dbdoclet.50438198_pgfId-1296786" xreflabel=""/>objid 3478673 already exists; is this filesystem corrupt?"
-</screen>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1296787" xreflabel=""/>A related situation may happen if there is a significant discrepancy between the record of previously-created objects on the OST and the previously-allocated objects on the MDS, for example if the MDS has been corrupted, or restored from backup, which may cause significant data loss if left unchecked. This produces a message like:</para>
- <screen><anchor xml:id="dbdoclet.50438198_pgfId-1296788" xreflabel=""/>"HOME-OST0003: ignoring bogus orphan destroy request:
-<anchor xml:id="dbdoclet.50438198_pgfId-1296789" xreflabel=""/>obdid 3438673 last_id 3478673"
-</screen>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1296797" xreflabel=""/>To recover from this situation, determine and set a reasonable LAST_ID value.</para>
- <note><para>The file system must be stopped on all servers before performing this procedure.</para></note>
-
 - <para><anchor xml:id="dbdoclet.50438198_pgfId-1296799" xreflabel=""/>For hex/decimal translations:</para>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1296800" xreflabel=""/>Use GDB:</para>
- <screen><anchor xml:id="dbdoclet.50438198_pgfId-1296801" xreflabel=""/>(gdb) p /x 15028
-<anchor xml:id="dbdoclet.50438198_pgfId-1296802" xreflabel=""/>$2 = 0x3ab4
-</screen>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1296803" xreflabel=""/>Or bc:</para>
- <screen><anchor xml:id="dbdoclet.50438198_pgfId-1296804" xreflabel=""/>echo "obase=16; 15028" | bc
-</screen>
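The shell's own printf can also do these translations, without starting gdb or bc:

```shell
# decimal -> hex
printf '0x%x\n' 15028    # prints 0x3ab4
# hex -> decimal
printf '%d\n' 0x3ab4     # prints 15028
```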
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1296805" xreflabel=""/> 1. Determine a reasonable value for the LAST_ID file. Check on the MDS:</para>
- <screen><anchor xml:id="dbdoclet.50438198_pgfId-1296806" xreflabel=""/># mount -t ldiskfs /dev/<mdsdev> /mnt/mds
-<anchor xml:id="dbdoclet.50438198_pgfId-1296807" xreflabel=""/># od -Ax -td8 /mnt/mds/lov_objid
-</screen>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1296808" xreflabel=""/>There is one entry for each OST, in OST index order. This is what the MDS thinks is the last in-use object.</para>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1296809" xreflabel=""/> 2. Determine the OST index for this OST.</para>
- <screen><anchor xml:id="dbdoclet.50438198_pgfId-1296810" xreflabel=""/># od -Ax -td4 /mnt/ost/last_rcvd
-</screen>
 - <para><anchor xml:id="dbdoclet.50438198_pgfId-1296811" xreflabel=""/>The OST index is at offset 0x8c in the last_rcvd file.</para>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1296812" xreflabel=""/> 3. Check on the OST. Use debugfs to check the LAST_ID value:</para>
 - <screen><anchor xml:id="dbdoclet.50438198_pgfId-1296813" xreflabel=""/>debugfs -c -R 'dump /O/0/LAST_ID /tmp/LAST_ID' /dev/XXX; od -Ax -td8 /tmp/LAST_ID
-</screen>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1296814" xreflabel=""/> 4. Check the objects on the OST:</para>
- <screen><anchor xml:id="dbdoclet.50438198_pgfId-1296815" xreflabel=""/>mount -rt ldiskfs /dev/{ostdev} /mnt/ost
-<anchor xml:id="dbdoclet.50438198_pgfId-1296816" xreflabel=""/># note: the "1" in the ls option below is the number one, not the letter L
-<anchor xml:id="dbdoclet.50438198_pgfId-1296817" xreflabel=""/>ls -1s /mnt/ost/O/0/d* | grep -v '[a-z]' |
-<anchor xml:id="dbdoclet.50438198_pgfId-1296818" xreflabel=""/>sort -k2 -n > /tmp/objects.{diskname}
-<anchor xml:id="dbdoclet.50438198_pgfId-1296819" xreflabel=""/>
-<anchor xml:id="dbdoclet.50438198_pgfId-1296820" xreflabel=""/>tail -30 /tmp/objects.{diskname}
-</screen>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1296821" xreflabel=""/>This shows you the OST state. There may be some pre-created orphans. Check for zero-length objects. Any zero-length objects with IDs higher than LAST_ID should be deleted. New objects will be pre-created.</para>
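As a sketch of that check, the following runs against fabricated sample data in the same two-column format as /tmp/objects.{diskname} above (blocks used, then object ID); on a real OST you would use the file generated above, and the LAST_ID and object IDs here are illustrative only:

```shell
LAST_ID=3478673   # hypothetical value determined in the earlier steps
cat > /tmp/objects.sample <<'EOF'
   8 3478671
   4 3478672
   0 3478673
   0 3478674
   0 3478675
EOF

# Zero-length objects with IDs strictly above LAST_ID are deletion candidates
awk -v last="$LAST_ID" '($1 + 0) == 0 && ($2 + 0) > (last + 0) { print "candidate:", $2 }' \
    /tmp/objects.sample > /tmp/candidates.txt
cat /tmp/candidates.txt
```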
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1296832" xreflabel=""/>If the OST LAST_ID value matches that for the objects existing on the OST, then it is possible the lov_objid file on the MDS is incorrect. Delete the lov_objid file on the MDS and it will be re-created from the LAST_ID on the OSTs.</para>
 - <para><anchor xml:id="dbdoclet.50438198_pgfId-1296833" xreflabel=""/>If you determine that the LAST_ID file on the OST is incorrect (that is, it does not match the objects that exist and does not match the MDS lov_objid value), then you need to determine a proper value for LAST_ID.</para>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1296834" xreflabel=""/>Once you have decided on a proper value for LAST_ID, use this repair procedure.</para>
- <orderedlist><listitem>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1296835" xreflabel=""/>Access:</para>
- <screen><anchor xml:id="dbdoclet.50438198_pgfId-1296836" xreflabel=""/>mount -t ldiskfs /dev/{ostdev} /mnt/ost
-</screen>
- </listitem><listitem>
 - <para><anchor xml:id="dbdoclet.50438198_pgfId-1296837" xreflabel=""/>Check the current value:</para>
- <screen><anchor xml:id="dbdoclet.50438198_pgfId-1296838" xreflabel=""/>od -Ax -td8 /mnt/ost/O/0/LAST_ID
-</screen>
- </listitem><listitem>
 - <para><anchor xml:id="dbdoclet.50438198_pgfId-1296839" xreflabel=""/>To be very safe, work only on a backup copy:</para>
- <screen><anchor xml:id="dbdoclet.50438198_pgfId-1296840" xreflabel=""/>cp /mnt/ost/O/0/LAST_ID /tmp/LAST_ID
-</screen>
- </listitem><listitem>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1296841" xreflabel=""/>Convert binary to text:</para>
- <screen><anchor xml:id="dbdoclet.50438198_pgfId-1296842" xreflabel=""/>xxd /tmp/LAST_ID /tmp/LAST_ID.asc
-</screen>
- </listitem><listitem>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1296843" xreflabel=""/>Fix:</para>
- <screen><anchor xml:id="dbdoclet.50438198_pgfId-1296844" xreflabel=""/>vi /tmp/LAST_ID.asc
-</screen>
- </listitem><listitem>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1296845" xreflabel=""/>Convert to binary:</para>
- <screen><anchor xml:id="dbdoclet.50438198_pgfId-1296846" xreflabel=""/>xxd -r /tmp/LAST_ID.asc /tmp/LAST_ID.new
-</screen>
- </listitem><listitem>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1296847" xreflabel=""/>Verify:</para>
- <screen><anchor xml:id="dbdoclet.50438198_pgfId-1296848" xreflabel=""/>od -Ax -td8 /tmp/LAST_ID.new
-</screen>
- </listitem><listitem>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1296849" xreflabel=""/>Replace:</para>
- <screen><anchor xml:id="dbdoclet.50438198_pgfId-1296850" xreflabel=""/>cp /tmp/LAST_ID.new /mnt/ost/O/0/LAST_ID
-</screen>
- </listitem><listitem>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1296851" xreflabel=""/>Clean up:</para>
- <screen><anchor xml:id="dbdoclet.50438198_pgfId-1296852" xreflabel=""/>umount /mnt/ost
-</screen>
- </listitem></orderedlist>
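Steps 4-7 above (dump to hex, edit, convert back, verify) can also be sketched as a short script that writes the chosen value directly as an 8-byte little-endian integer. This assumes a little-endian (x86) host and that LAST_ID holds a single 64-bit value; the ID below is an example, not a value from a real OST:

```shell
NEW_ID=15028   # example value only; substitute the LAST_ID you determined

# Build the 8-byte little-endian representation, least-significant byte first
fmt=""
for i in 0 1 2 3 4 5 6 7; do
    fmt="$fmt$(printf '\\x%02x' $(( (NEW_ID >> (8 * i)) & 0xff )))"
done
printf "$fmt" > /tmp/LAST_ID.new

# Verify before copying the result over /mnt/ost/O/0/LAST_ID
od -Ax -td8 /tmp/LAST_ID.new
```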
- </section>
- <section remap="h3">
- <title><anchor xml:id="dbdoclet.50438198_pgfId-1291447" xreflabel=""/>26.3.5 Handling/Debugging <anchor xml:id="dbdoclet.50438198_marker-1291446" xreflabel=""/>"Bind: Address already in use" Error</title>
 - <para><anchor xml:id="dbdoclet.50438198_pgfId-1291448" xreflabel=""/>During startup, Lustre may report a "bind: Address already in use" error and refuse to start. This is caused by a portmap service (often NFS locking) that starts before Lustre and binds to the default port 988. Port 988 must be open in any firewall or iptables rules for incoming connections on the client, OSS, and MDS nodes. LNET will create three outgoing connections on available, reserved ports to each client-server pair, starting with 1023, 1022 and 1021.</para>
 - <para><anchor xml:id="dbdoclet.50438198_pgfId-1291449" xreflabel=""/>Unfortunately, you cannot set sunrpc to avoid port 988. If you receive this error, do one of the following:</para>
- <itemizedlist><listitem>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1291450" xreflabel=""/> Start Lustre before starting any service that uses sunrpc.</para>
- </listitem>
-
-<listitem>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1291451" xreflabel=""/> Use a port other than 988 for Lustre. This is configured in /etc/modprobe.conf as an option to the LNET module. For example:</para>
- </listitem>
-
-</itemizedlist>
 - <screen><anchor xml:id="dbdoclet.50438198_pgfId-1291452" xreflabel=""/>options lnet accept_port=989
-</screen>
- <itemizedlist><listitem>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1291453" xreflabel=""/> Add modprobe ptlrpc to your system startup scripts before the service that uses sunrpc. This causes Lustre to bind to port 988 and sunrpc to select a different port.</para>
- </listitem>
-
-</itemizedlist>
 - <note><para>You can also use the sysctl command to keep the NFS client from grabbing the Lustre service port. However, this is only a partial workaround, as other user-space RPC servers can still grab the port.</para></note>
-
- </section>
- <section remap="h3">
 - <title><anchor xml:id="dbdoclet.50438198_pgfId-1291471" xreflabel=""/>26.3.6 Handling/Debugging <anchor xml:id="dbdoclet.50438198_marker-1291470" xreflabel=""/>Error "-28"</title>
 - <para><anchor xml:id="dbdoclet.50438198_pgfId-1297002" xreflabel=""/>A Linux error -28 (ENOSPC) that occurs during a write or sync operation indicates that an existing file residing on an OST could not be rewritten or updated because the OST was full, or nearly full. To verify this, on a client on which the file system is mounted, enter:</para>
- <screen><anchor xml:id="dbdoclet.50438198_pgfId-1297980" xreflabel=""/>lfs df -h
-</screen>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1297979" xreflabel=""/>To address this issue, you can do one of the following:</para>
- <itemizedlist><listitem>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1297887" xreflabel=""/> Expand the disk space on the OST.</para>
- </listitem>
-
-<listitem>
- <para><anchor xml:id="dbdoclet.50438198_pgfId-1297888" xreflabel=""/> Copy or stripe the file to a less full OST.</para>
- </listitem>
+ </section>
+ <section xml:id="reporting_lustre_problem">
+ <title><indexterm>
+ <primary>troubleshooting</primary>
+ <secondary>reporting bugs</secondary>
+ </indexterm><indexterm>
+ <primary>reporting bugs</primary>
+ <see>troubleshooting</see>
+ </indexterm>Reporting a Lustre File System Bug</title>
+ <para>If you cannot resolve a problem by troubleshooting your Lustre file
+ system, other options are:<itemizedlist>
+ <listitem>
+ <para>Post a question to the <link xmlns:xlink="http://www.w3.org/1999/xlink"
+ xlink:href="http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org">lustre-discuss</link>
+ email list or search the archives for information about your issue.</para>
+ </listitem>
+ <listitem>
+ <para>Submit a ticket to the <link xmlns:xlink="http://www.w3.org/1999/xlink"
+ xlink:href="https://jira.whamcloud.com/">Jira</link>
+ <abbrev><superscript>*</superscript></abbrev>
+ bug tracking and project management tool used for the Lustre project.
+ If you are a first-time user, you'll need to open an account by
+ clicking on <emphasis role="bold">Sign up</emphasis> on the
+ Welcome page.</para>
+ </listitem>
+ </itemizedlist> To submit a Jira ticket, follow these steps:<orderedlist>
+ <listitem>
+ <para>To avoid filing a duplicate ticket, search for existing
+ tickets for your issue.
+ <emphasis role="italic">For search tips, see
+ <xref xmlns:xlink="http://www.w3.org/1999/xlink"
+ linkend="searching_jira"/>.</emphasis></para>
+ </listitem>
+ <listitem>
+ <para>To create a ticket, click <emphasis role="bold">+Create Issue</emphasis> in the
+ upper right corner. <emphasis role="italic">Create a separate ticket for each issue you
+ wish to submit.</emphasis></para>
+ </listitem>
+ <listitem>
+ <para>In the form displayed, enter the following information:<itemizedlist>
+ <listitem>
+ <para><emphasis role="italic">Project</emphasis> - Select <emphasis role="bold"
+ >Lustre</emphasis> or <emphasis role="bold">Lustre Documentation</emphasis> or
+ an appropriate project.</para>
+ </listitem>
+ <listitem>
+ <para><emphasis role="italic">Issue type</emphasis> - Select <emphasis role="bold"
+ >Bug</emphasis>.</para>
+ </listitem>
+ <listitem>
+ <para><emphasis role="italic">Summary</emphasis> - Enter a short description of the
+ issue. Use terms that would be useful for someone searching for a similar issue. A
+ LustreError or ASSERT/panic message often makes a good summary.</para>
+ </listitem>
+ <listitem>
+ <para><emphasis role="italic">Affects version(s)</emphasis> - Select your Lustre
+ release.</para>
+ </listitem>
+ <listitem>
+ <para><emphasis role="italic">Environment</emphasis> - Enter your kernel with
+ version number.</para>
+ </listitem>
+ <listitem>
+ <para><emphasis role="italic">Description</emphasis> - Include a detailed
+ description of <emphasis role="italic">visible symptoms</emphasis> and, if
+ possible, <emphasis role="italic">how the problem is produced</emphasis>. Other
+ useful information may include <emphasis role="italic">the behavior you expect to
+ see</emphasis> and <emphasis role="italic">what you have tried so far to
+ diagnose the problem</emphasis>.</para>
+ </listitem>
+ <listitem>
+ <para><emphasis role="italic">Attachments</emphasis> - Attach log sources such as
+ Lustre debug log dumps (see <xref xmlns:xlink="http://www.w3.org/1999/xlink"
+ linkend="debugging_tools"/>), syslogs, or console logs. <emphasis
+ role="italic"><emphasis role="bold">Note:</emphasis></emphasis> Lustre debug
+ logs must be processed using <code>lctl df</code> prior to attaching to a Jira
+ ticket. For more information, see <xref xmlns:xlink="http://www.w3.org/1999/xlink"
+ linkend="using_lctl_tool"/>. </para>
+ </listitem>
+ </itemizedlist>Other fields in the form are used for project tracking and are irrelevant
+ to reporting an issue. You can leave these in their default state.</para>
+ </listitem>
+ </orderedlist></para>
+ <section xml:id="searching_jira">
+ <title>Searching Jira<superscript>*</superscript> for Duplicate Tickets</title>
+ <para>Before submitting a ticket, always search the Jira bug tracker for
+ an existing ticket for your issue. This avoids duplicating effort and
+ may immediately provide you with a solution to your problem. </para>
+ <para>To do a search in the Jira bug tracker, select the
+ <emphasis role="bold">Issues</emphasis> tab and click on
+ <emphasis role="bold">New filter</emphasis>. Use the filters provided
+ to select criteria for your search. To search for specific text, enter
+ the text in the "Contains text" field and click the magnifying glass
+ icon.</para>
+ <para>When searching for text such as an ASSERTION or LustreError
+ message, remove NIDs, pointers, line numbers, and other
+ installation-specific (and possibly version-specific) text from your
+ search string, following the example below.</para>
+ <para><emphasis role="italic">Original error message:</emphasis></para>
+ <para><code>"(filter_io_26.c:</code>
+ <emphasis role="bold">791</emphasis><code>:filter_commitrw_write())
+ ASSERTION(oti->oti_transno<=obd->obd_last_committed) failed:
+ oti_transno </code><emphasis role="bold">752</emphasis>
+ <code>last_committed </code><emphasis role="bold">750</emphasis>
+ <code>"</code></para>
+ <para><emphasis role="italic">Optimized search string:</emphasis></para>
+ <para><code>filter_commitrw_write ASSERTION oti_transno
+ obd_last_committed failed:</code></para>
+ </section>
+ </section>
+ <section xml:id="common_lustre_problems">
+ <title><indexterm>
+ <primary>troubleshooting</primary>
+ <secondary>common problems</secondary>
+ </indexterm>Common Lustre File System Problems</title>
+ <para>This section describes how to address common issues encountered with
+ a Lustre file system.</para>
+ <section remap="h3">
+ <title>OST Object is Missing or Damaged</title>
+ <para>If the OSS fails to find an object or finds a damaged object,
+ this message appears:</para>
+ <para><screen>OST object missing or damaged (OST "ost1", object 98148, error -2)</screen></para>
+ <para>If the reported error is -2 (<literal>-ENOENT</literal>, or
+ "No such file or directory"), then the object is no longer
+ present on the OST, even though a file on the MDT is referencing it.
+ This can occur either because the MDT and OST are out of sync, or
+ because an OST object was corrupted and deleted by e2fsck.</para>
+ <para>If you have recovered the file system from a disk failure by using
+ e2fsck, then unrecoverable objects may have been deleted or moved to
+ /lost+found in the underlying OST filesystem. Because files on the MDT
+ still reference these objects, attempts to access them produce this
+ error.</para>
+ <para>If you have restored the filesystem from a backup of the raw MDT
+ or OST partition, then the restored partition is very likely to be out
+ of sync with the rest of your cluster. No matter which server partition
+ you restored from backup, files on the MDT may reference objects which
+ no longer exist (or did not exist when the backup was taken); accessing
+ those files produces this error.</para>
+ <para>If neither of those descriptions is applicable to your situation,
+ then it is possible that you have discovered a programming error that
+ allowed the servers to get out of sync.
+ Please submit a Jira ticket (see <xref xmlns:xlink="http://www.w3.org/1999/xlink"
+ linkend="reporting_lustre_problem"/>).</para>
+ <para>If the reported error is anything else (such as -5,
+ "<literal>I/O error</literal>"), it likely indicates a storage
+ device failure. The low-level file system returns this error if it is
+ unable to read from the storage device.</para>
+ <para><emphasis role="bold">Suggested Action</emphasis></para>
+ <para>If the reported error is -2, you can consider checking in
+ <literal>lost+found/</literal> on your raw OST device, to see if the
+ missing object is there. However, it is likely that this object is
+ lost forever, and that the file that references the object is now
+ partially or completely lost. Restore this file from backup, or
+ salvage what you can using <literal>dd conv=noerror</literal> and
+ delete it using the <literal>unlink</literal> command.</para>
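As a sketch of the dd salvage suggested above, using a throwaway file in place of the damaged Lustre file (all paths here are illustrative):

```shell
# Stand-in for the damaged file; on a real system this is the Lustre file
printf 'partially recoverable contents' > /tmp/damaged_file

# Copy what is readable: noerror continues past read errors, sync pads
# short or failed reads so file offsets are preserved
dd if=/tmp/damaged_file of=/tmp/salvaged_file bs=4k conv=sync,noerror 2>/dev/null

wc -c < /tmp/salvaged_file   # whole 4k blocks due to conv=sync
```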
+ <para>If the reported error is anything else, then you should
+ immediately inspect this server for storage problems.</para>
+ </section>
+ <section remap="h3">
+ <title>OSTs Become Read-Only</title>
+ <para>If the SCSI devices are inaccessible to the Lustre file system
+ at the block device level, then <literal>ldiskfs</literal> remounts
+ the device read-only to prevent file system corruption. This is normal
+ behavior. The status in the parameter <literal>health_check</literal>
+ also shows "not healthy" on the affected nodes.</para>
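On a live server the parameter can be read with `lctl get_param -n health_check`; the sketch below applies the same "not healthy" test to a hypothetical sample value, since it cannot assume a running Lustre node:

```shell
# On a real server: health=$(lctl get_param -n health_check)
health="NOT HEALTHY"   # hypothetical sample value for this standalone sketch

if echo "$health" | grep -qi 'not healthy'; then
    verdict="needs investigation"
else
    verdict="ok"
fi
echo "health_check: $verdict"
```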
+ <para>To determine what caused the "not healthy" condition:</para>
+ <itemizedlist>
+ <listitem>
+ <para>Examine the consoles of all servers for any error indications</para>
+ </listitem>
+ <listitem>
+ <para>Examine the syslogs of all servers for any LustreErrors or <literal>LBUG</literal></para>
+ </listitem>
+ <listitem>
+ <para>Check the health of your system hardware and network. (Are the disks working as expected, is the network dropping packets?)</para>
+ </listitem>
+ <listitem>
+ <para>Consider what was happening on the cluster at the time. Does this relate to a specific user workload or a system load condition? Is the condition reproducible? Does it happen at a specific time (day, week or month)?</para>
+ </listitem>
+ </itemizedlist>
+ <para>To recover from this problem, you must restart Lustre services using these file systems. There is no other way to know that the I/O made it to disk, and the state of the cache may be inconsistent with what is on disk.</para>
+ </section>
+ <section remap="h3">
+ <title>Identifying a Missing OST</title>
+ <para>If an OST is missing for any reason, you may need to know which files are affected. Even though an OST is missing, the file system should still be operational. From any mounted client node, you can generate a list of files that reside on the affected OST. It is advisable to mark the missing OST as 'unavailable' so clients and the MDS do not time out trying to contact it.</para>
+ <orderedlist>
+ <listitem>
+ <para>Generate a list of devices and determine the OST's device number. Run:</para>
+ <screen>$ lctl dl </screen>
+ <para>The <literal>lctl dl</literal> command output lists the device name and number, along with the device UUID and the number of references on the device.</para>
+ </listitem>
+ <listitem>
+ <para>Deactivate the OST (on the MDS). Run:</para>
+ <screen>$ lctl --device <replaceable>lustre_device_number</replaceable> deactivate</screen>
+ <para>The OST device number or device name is generated by the <literal>lctl dl</literal> command.</para>
+ <para>The <literal>deactivate</literal> command prevents clients from creating new objects on the specified OST, although you can still access the OST for reading.</para>
+ <note>
+ <para>If the OST later becomes available, it needs to be reactivated. Run:</para>
+ <screen># lctl --device <replaceable>lustre_device_number</replaceable> activate</screen>
+ </note>
+ </listitem>
+ <listitem>
+ <para>To determine all files that are striped over the missing OST, run:</para>
+ <screen># lfs find -O {OST_UUID} /mountpoint</screen>
+ <para>This returns a simple list of filenames from the affected file system.</para>
+ </listitem>
+ <listitem>
+ <para>If necessary, you can read the valid parts of a striped file. Run:</para>
+ <screen># dd if=filename of=new_filename bs=4k conv=sync,noerror</screen>
+ </listitem>
+ <listitem>
+ <para>You can delete these files with the <literal>unlink</literal> command.</para>
+ <screen># unlink filename {filename ...} </screen>
+ <note>
+ <para>When you run the <literal>unlink</literal> command, it may
+ return an error indicating that the file could not be found, even
+ though the file on the MDS has already been permanently removed.</para>
+ </note>
+ </listitem>
+ </orderedlist>
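+ <para>Steps 2 and 3 above can be tied together in a dry-run sketch;
+ the device number, OST UUID and mount point are placeholders that you
+ would take from <literal>lctl dl</literal> on a real system:</para>

```shell
#!/bin/sh
# Dry-run sketch of the deactivate/list steps. The device number, OST
# UUID and mount point are placeholders from "lctl dl". With RUN=echo
# the commands are only printed, not executed; set RUN= to run for real.
isolate_ost() {
    RUN=${RUN:-echo}
    $RUN lctl --device "$1" deactivate   # stop new object creation
    $RUN lfs find -O "$2" "$3"           # list files striped on the OST
}

# Example (dry run): isolate_ost 7 testfs-OST0002_UUID /mnt/lustre
```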
+ <para>If the file system cannot be mounted, there is currently no way
+ to parse its metadata directly from the MDS. If the bad OST does not
+ start, your options for mounting the file system are to provide a
+ loop device OST in its place or to replace it with a newly-formatted
+ OST. In either case, the missing objects are created and are read as
+ zero-filled.</para>
+ </section>
+ <section xml:id="repair_ost_lastid">
+ <title>Fixing a Bad LAST_ID on an OST</title>
+ <para>Each OST contains a <literal>LAST_ID</literal> file, which holds
+ the last object (pre-)created by the MDS
+ <footnote><para>The contents of the <literal>LAST_ID</literal>
+ file must be accurate regarding the actual objects that exist
+ on the OST.</para></footnote>.
+ The MDT contains a <literal>lov_objid</literal> file, with values
+ that represent the last object the MDS has allocated to a file.</para>
+ <para>During normal operation, the MDT keeps pre-created (but unused)
+ objects on the OST, and normally <literal>LAST_ID</literal> should be
+ larger than <literal>lov_objid</literal>. Any small difference in the
+ values is a result of objects being precreated on the OST to improve
+ MDS file creation performance. These precreated objects are not yet
+ allocated to a file, since they are of zero length (empty).</para>
+ <para>However, if <literal>lov_objid</literal> is
+ larger than <literal>LAST_ID</literal>, it indicates that the MDS has
+ allocated objects to files that do not exist on the OST. Conversely,
+ if <literal>lov_objid</literal> is significantly less than
+ <literal>LAST_ID</literal> (by at least 20,000 objects), it indicates
+ that the OST previously allocated objects at the request of the MDS
+ (which likely contain data) but that the MDS does not know about
+ them.</para>
+ <para condition='l25'>Since Lustre 2.5 the MDS and OSS will resync the
+ <literal>lov_objid</literal> and <literal>LAST_ID</literal> files
+ automatically if they become out of sync. This may result in some
+ space on the OSTs becoming unavailable until LFSCK is next run, but
+ avoids issues with mounting the filesystem.</para>
+ <para condition='l26'>Since Lustre 2.6 the LFSCK will repair the
+ <literal>LAST_ID</literal> file on the OST automatically based on
+ the objects that exist on the OST, in case it was corrupted.</para>
+ <para>In situations where there is on-disk corruption of the OST, for
+ example caused by the disk write cache being lost, or if the OST
+ was restored from an old backup or reformatted, the
+ <literal>LAST_ID</literal> value may become inconsistent and result
+ in a message similar to:</para>
+ <screen>"myth-OST0002: Too many FIDs to precreate,
+OST replaced or reformatted: LFSCK will clean up"</screen>
+ <para>A related situation may happen if there is a significant
+ discrepancy between the record of previously-created objects on the
+ OST and the previously-allocated objects on the MDT, for example if
+ the MDT has been corrupted, or restored from backup, which would cause
+ significant data loss if left unchecked. This produces a message
+ like:</para>
+ <screen>"myth-OST0002: too large difference between
+MDS LAST_ID [0x1000200000000:0x100048:0x0] (1048648) and
+OST LAST_ID [0x1000200000000:0x2232123:0x0] (35856675), trust the OST"</screen>
+ <para>In such cases, the MDS will advance the <literal>lov_objid</literal>
+ value to match that of the OST to avoid deleting existing objects,
+ which may contain data. Files on the MDT that reference these objects
+ will not be lost. Any unreferenced OST objects will be attached to
+ the <literal>.lustre/lost+found</literal> directory the next time
+ LFSCK <literal>layout</literal> check is run.</para>
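+ <para>For illustration, assuming you have a copy of the
+ <literal>LAST_ID</literal> file dumped from an ldiskfs OST (for
+ example with <literal>debugfs</literal>), the value is stored as a
+ 64-bit integer and can be decoded as sketched below; the sample file
+ is synthetic:</para>

```shell
#!/bin/sh
# Sketch: decode the object ID stored in a dumped LAST_ID file. The
# value is a 64-bit integer; od interprets it in host byte order, so
# this assumes a little-endian host (e.g. x86). Sample file is synthetic.
last_id() {
    od -An -td8 "$1" | tr -d ' '
}

# Build a synthetic LAST_ID file holding the value 200, then decode it:
printf '\310\000\000\000\000\000\000\000' > /tmp/last_id.sample
last_id /tmp/last_id.sample
```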
+ </section>
+ <section remap="h3">
+ <title><indexterm><primary>troubleshooting</primary><secondary>'Address already in use'</secondary></indexterm>Handling/Debugging "<literal>Bind: Address already in use</literal>" Error</title>
+ <para>During startup, the Lustre software may report a <literal>bind: Address already in
+ use</literal> error and refuse to start. This is caused by a portmap service
+ (often NFS locking) that starts before the Lustre file system and binds to the default port
+ 988. You must have port 988 open in the firewall (or iptables rules) for incoming connections on the
+ client, OSS, and MDS nodes. LNet will create three outgoing connections on available,
+ reserved ports to each client-server pair, starting with 1023, 1022 and 1021.</para>
+ <para>Unfortunately, you cannot set sunrpc to avoid port 988. If you receive this error, do one of the following:</para>
+ <itemizedlist>
+ <listitem>
+ <para>Start the Lustre file system before starting any service that uses sunrpc.</para>
+ </listitem>
+ <listitem>
+ <para>Use a port other than 988 for the Lustre file system. This is configured in
+ <literal>/etc/modprobe.d/lustre.conf</literal> as an option to the LNet module. For
+ example, to use port 989:</para>
+ <screen>options lnet accept_port=989</screen>
+ </listitem>
+ <listitem>
+ <para>Add <literal>modprobe ptlrpc</literal> to your system startup scripts before the service that uses
+ sunrpc. This causes the Lustre file system to bind to port 988 and sunrpc to select a
+ different port.</para>
+ </listitem>
+ </itemizedlist>
+ <note>
+ <para>You can also use the <literal>sysctl</literal> command to mitigate the NFS client from grabbing the Lustre service port. However, this is a partial workaround as other user-space RPC servers still have the ability to grab the port.</para>
+ </note>
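+ <para>A sketch of that <literal>sysctl</literal> workaround, under the
+ assumption that the Linux sunrpc module exposes the
+ <literal>min_resvport</literal>/<literal>max_resvport</literal>
+ tunables; shrinking the reserved-port range to exclude 988 keeps the
+ in-kernel RPC client off the Lustre acceptor port:</para>

```shell
# Sketch of the sysctl workaround (requires root; parameter names assume
# the Linux sunrpc module). Exclude port 988 from the reserved-port
# range that the in-kernel RPC client may bind:
sysctl -w sunrpc.min_resvport=665
sysctl -w sunrpc.max_resvport=987
```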
+ </section>
+ <section remap="h3">
+ <title><indexterm><primary>troubleshooting</primary><secondary>'Error -28'</secondary></indexterm>Handling/Debugging Error "-28"</title>
+ <para>A Linux error -28 (<literal>ENOSPC</literal>) that occurs during
+ a write or sync operation indicates that an existing file residing
+ on an OST could not be rewritten or updated because the OST was full,
+ or nearly full. To verify if this is the case, run on a client:</para>
+ <screen>
+client$ lfs df -h
+UUID bytes Used Available Use% Mounted on
+myth-MDT0000_UUID 12.9G 1.5G 10.6G 12% /myth[MDT:0]
+myth-OST0000_UUID 3.6T 3.1T 388.9G 89% /myth[OST:0]
+myth-OST0001_UUID 3.6T 3.6T 64.0K 100% /myth[OST:1]
+myth-OST0002_UUID 3.6T 3.1T 394.6G 89% /myth[OST:2]
+myth-OST0003_UUID 5.4T 5.0T 267.8G 95% /myth[OST:3]
+myth-OST0004_UUID 5.4T 2.9T 2.2T 57% /myth[OST:4]