- <para>Each OST contains a LAST_ID file, which holds the last object (pre-)created by the MDS <footnote>
- <para>The contents of the LAST_ID file must be accurate regarding the actual objects that exist on the OST.</para>
- </footnote>. The MDT contains a lov_objid file, with values that represent the last object the MDS has allocated to a file.</para>
- <para>During normal operation, the MDT keeps some pre-created (but unallocated) objects on the OST, and the relationship between LAST_ID and lov_objid should be LAST_ID <= lov_objid. Any difference in the file values results in objects being created on the OST when it next connects to the MDS. These objects are never actually allocated to a file, since they are of 0 length (empty), but they do no harm. Creating empty objects enables the OST to catch up to the MDS, so normal operations resume.</para>
- <para>However, in the case where lov_objid < LAST_ID, bad things can happen as the MDS is not aware of objects that have already been allocated on the OST, and it reallocates them to new files, overwriting their existing contents.</para>
- <para>Here is the rule to avoid this scenario:</para>
- <para>LAST_ID >= lov_objid and LAST_ID == last_physical_object and lov_objid >= last_used_object</para>
- <para>Although the lov_objid value should be equal to the last_used_object value, the above rule suffices to keep Lustre happy at the expense of a few leaked objects.</para>
- <para>In situations where there is on-disk corruption of the OST, for example caused by running with write cache enabled on the disks, the LAST_ID value may become inconsistent and result in a message similar to:</para>
- <screen>"filter_precreate()) HOME-OST0003: Serious error:
-objid 3478673 already exists; is this filesystem corrupt?"</screen>
- <para>A related situation may happen if there is a significant discrepancy between the record of previously-created objects on the OST and the previously-allocated objects on the MDS, for example if the MDS has been corrupted, or restored from backup, which may cause significant data loss if left unchecked. This produces a message like:</para>
- <screen>"HOME-OST0003: ignoring bogus orphan destroy request:
-obdid 3438673 last_id 3478673"</screen>
- <para>To recover from this situation, determine and set a reasonable LAST_ID value.</para>
- <note>
- <para>The file system must be stopped on all servers before performing this procedure.</para>
- </note>
- <para>For hex-to-decimal translations:</para>
- <para>Use GDB:</para>
- <screen>(gdb) p /x 15028
-$2 = 0x3ab4</screen>
- <para>Or <literal>bc</literal>:</para>
- <screen>echo "obase=16; 15028" | bc</screen>
- <orderedlist>
- <listitem>
- <para>Determine a reasonable value for the LAST_ID file. Check on the MDS:</para>
- <screen># mount -t ldiskfs <replaceable>/dev/mdt_device</replaceable> /mnt/mds
-# od -Ax -td8 /mnt/mds/lov_objid
-</screen>
- <para>There is one entry for each OST, in OST index order. This is what the MDS thinks is the last in-use object.</para>
- </listitem>
- <listitem>
- <para>Determine the OST index for this OST.</para>
- <screen># od -Ax -td4 /mnt/ost/last_rcvd
-</screen>
- <para>It will have it at offset 0x8c.</para>
- </listitem>
- <listitem>
- <para>Check on the OST. Use debugfs to check the LAST_ID value:</para>
- <screen>debugfs -c -R 'dump /O/0/LAST_ID /tmp/LAST_ID' /dev/XXX ; od -Ax -td8 /tmp/\
-LAST_ID"
-</screen>
- </listitem>
- <listitem>
- <para>Check the objects on the OST:</para>
- <screen>mount -rt ldiskfs /dev/{ostdev} /mnt/ost
-# note the ls below is a number one and not a letter L
-ls -1s /mnt/ost/O/0/d* | grep -v [a-z] |
-sort -k2 -n > /tmp/objects.{diskname}
-
-tail -30 /tmp/objects.{diskname}</screen>
- <para>This shows you the OST state. There may be some pre-created orphans. Check for zero-length objects. Any zero-length objects with IDs higher than LAST_ID should be deleted. New objects will be pre-created.</para>
- </listitem>
- </orderedlist>
- <para>If the OST LAST_ID value matches that for the objects existing on the OST, then it is possible the lov_objid file on the MDS is incorrect. Delete the lov_objid file on the MDS and it will be re-created from the LAST_ID on the OSTs.</para>
- <para>If you determine the LAST_ID file on the OST is incorrect (that is, it does not match what objects exist, does not match the MDS lov_objid value), then you have decided on a proper value for LAST_ID.</para>
- <para>Once you have decided on a proper value for LAST_ID, use this repair procedure.</para>
- <orderedlist>
- <listitem>
- <para>Access:</para>
- <screen>mount -t ldiskfs /dev/{ostdev} /mnt/ost</screen>
- </listitem>
- <listitem>
- <para>Check the current:</para>
- <screen>od -Ax -td8 /mnt/ost/O/0/LAST_ID</screen>
- </listitem>
- <listitem>
- <para>Be very safe, only work on backups:</para>
- <screen>cp /mnt/ost/O/0/LAST_ID /tmp/LAST_ID</screen>
- </listitem>
- <listitem>
- <para>Convert binary to text:</para>
- <screen>xxd /tmp/LAST_ID /tmp/LAST_ID.asc</screen>
- </listitem>
- <listitem>
- <para>Fix:</para>
- <screen>vi /tmp/LAST_ID.asc</screen>
- </listitem>
- <listitem>
- <para>Convert to binary:</para>
- <screen>xxd -r /tmp/LAST_ID.asc /tmp/LAST_ID.new</screen>
- </listitem>
- <listitem>
- <para>Verify:</para>
- <screen>od -Ax -td8 /tmp/LAST_ID.new</screen>
- </listitem>
- <listitem>
- <para>Replace:</para>
- <screen>cp /tmp/LAST_ID.new /mnt/ost/O/0/LAST_ID</screen>
- </listitem>
- <listitem>
- <para>Clean up:</para>
- <screen>umount /mnt/ost</screen>
- </listitem>
- </orderedlist>
+ <para>Each OST contains a <literal>LAST_ID</literal> file, which holds
+ the last object (pre-)created by the MDS
+ <footnote><para>The contents of the <literal>LAST_ID</literal>
+ file must be accurate regarding the actual objects that exist
+ on the OST.</para></footnote>.
+ The MDT contains a <literal>lov_objid</literal> file, with values
+ that represent the last object the MDS has allocated to a file.</para>
+ <para>During normal operation, the MDT keeps pre-created (but unused)
+ objects on the OST, so <literal>LAST_ID</literal> is normally
+ larger than <literal>lov_objid</literal>. A small difference between
+ the two values results from objects being pre-created on the OST to
+ improve MDS file creation performance. These pre-created objects are
+ empty (zero length) and are not yet allocated to a file.</para>
+ <para>However, if <literal>lov_objid</literal> is larger than
+ <literal>LAST_ID</literal>, the MDS has allocated objects to files,
+ but those objects do not exist on the OST. Conversely, if
+ <literal>lov_objid</literal> is significantly less than
+ <literal>LAST_ID</literal> (by at least 20,000 objects), the OST
+ previously allocated objects at the request of the MDS (which
+ likely contain data), but the MDS does not know about them.</para>
+ <para condition='l25'>Since Lustre 2.5 the MDS and OSS will resync the
+ <literal>lov_objid</literal> and <literal>LAST_ID</literal> files
+ automatically if they become out of sync. This may result in some
+ space on the OSTs becoming unavailable until LFSCK is next run, but
+ avoids issues with mounting the filesystem.</para>
+ <para condition='l26'>Since Lustre 2.6, LFSCK will automatically
+ repair the <literal>LAST_ID</literal> file on the OST if it has been
+ corrupted, based on the objects that exist on the OST.</para>
+ <para>In situations where there is on-disk corruption of the OST, for
+ example caused by the disk write cache being lost, or if the OST
+ was restored from an old backup or reformatted, the
+ <literal>LAST_ID</literal> value may become inconsistent and result
+ in a message similar to:</para>
+ <screen>"myth-OST0002: Too many FIDs to precreate,
+OST replaced or reformatted: LFSCK will clean up"</screen>
+ <para>A related situation may happen if there is a significant
+ discrepancy between the record of previously-created objects on the
+ OST and the previously-allocated objects on the MDT, for example if
+ the MDT has been corrupted, or restored from backup, which would cause
+ significant data loss if left unchecked. This produces a message
+ like:</para>
+ <screen>"myth-OST0002: too large difference between
+MDS LAST_ID [0x1000200000000:0x100048:0x0] (1048648) and
+OST LAST_ID [0x1000200000000:0x2232123:0x0] (35856675), trust the OST"</screen>
+ <para>In such cases, the MDS will advance the <literal>lov_objid</literal>
+ value to match that of the OST to avoid deleting existing objects,
+ which may contain data. Files on the MDT that reference these objects
+ will not be lost. Any unreferenced OST objects will be attached to
+ the <literal>.lustre/lost+found</literal> directory the next time
+ an LFSCK <literal>layout</literal> check is run.</para>
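The relationship between LAST_ID and lov_objid described above can be sketched as a small shell helper. This is only an illustration, not part of Lustre: the function names and category labels are hypothetical, the 20,000-object threshold is taken from the text, and the MDT device name myth-MDT0000 is an assumption inferred from the myth-OST0002 name in the log messages. Treating equal values as normal is also an assumption of this sketch. The second function only prints an lctl lfsck_start command line for a layout check with orphan handling; it does not run it.

```shell
#!/bin/sh
# Hypothetical helper: classify an OST's LAST_ID against the MDT's
# lov_objid entry for that OST, following the rules in the text.
classify_ids() {
    last_id=$1
    lov_objid=$2
    threshold=20000    # "at least 20,000 objects", per the text
    if [ "$lov_objid" -gt "$last_id" ]; then
        # MDS has allocated objects that do not exist on the OST
        echo "mds_ahead"
    elif [ $((last_id - lov_objid)) -ge "$threshold" ]; then
        # OST holds objects that the MDS does not know about
        echo "ost_ahead"
    else
        # Gap is explained by ordinary pre-created objects
        # (equal values are treated as normal here; an assumption)
        echo "normal"
    fi
}

# Hypothetical helper: print (do not execute) a layout LFSCK command
# for the given MDT device name.
lfsck_layout_cmd() {
    echo "lctl lfsck_start -M $1 -t layout -o"
}

# Values taken from the "too large difference" message above.
classify_ids 35856675 1048648
lfsck_layout_cmd myth-MDT0000
```

With the values from the log message, the gap is far above the threshold, so the sketch reports the case in which the MDS trusts the OST and advances lov_objid.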