LustreTroubleshooting.xml

   1 <?xml version='1.0' encoding='UTF-8'?>
   2 <chapter xmlns="http://docbook.org/ns/docbook"
   3  xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
   4  xml:id="lustretroubleshooting">
   5   <title xml:id="lustretroubleshooting.title">Lustre File System Troubleshooting</title>
   6   <para>This chapter provides information about troubleshooting a Lustre file system, submitting a
   7     bug to the Jira bug tracking system, and Lustre file system performance tips. It includes the
   8     following sections:</para>
   9   <itemizedlist>
  10     <listitem>
  11       <para><xref linkend="dbdoclet.50438198_11171"/></para>
  12     </listitem>
  13     <listitem>
  14       <para><xref linkend="dbdoclet.reporting_lustre_problem"/></para>
  15     </listitem>
  16     <listitem>
  17       <para><xref linkend="dbdoclet.50438198_93109"/></para>
  18     </listitem>
  19   </itemizedlist>
  20   <section xml:id="dbdoclet.50438198_11171">
  21       <title><indexterm><primary>troubleshooting</primary></indexterm>
  22           <indexterm><primary>lustre</primary><secondary>troubleshooting</secondary><see>troubleshooting</see></indexterm>
  23           <indexterm><primary>lustre</primary><secondary>errors</secondary><see>troubleshooting</see></indexterm>
  24           <indexterm><primary>errors</primary><see>troubleshooting</see></indexterm>
  25           Lustre Error Messages</title>
  26     <para>Several resources are available to help troubleshoot an issue in a Lustre file system.
  27       This section describes error numbers, error messages and logs.</para>
  28     <section remap="h3">
  29       <title><indexterm><primary>troubleshooting</primary><secondary>error numbers</secondary></indexterm>Error Numbers</title>
  30       <para>Error numbers are generated by the Linux operating system and are located in
  31           <literal>/usr/include/asm-generic/errno.h</literal>. The Lustre software does not use all
  32         of the available Linux error numbers. The exact meaning of an error number depends on where
  33         it is used. Here is a summary of the basic errors that Lustre file system users may
  34         encounter.</para>
  35       <informaltable frame="all">
  36         <tgroup cols="3">
  37           <colspec colname="c1" colwidth="33*"/>
  38           <colspec colname="c2" colwidth="33*"/>
  39           <colspec colname="c3" colwidth="33*"/>
  40           <thead>
  41             <row>
  42               <entry>
  43                 <para><emphasis role="bold">Error Number</emphasis></para>
  44               </entry>
  45               <entry>
  46                 <para><emphasis role="bold">Error Name</emphasis></para>
  47               </entry>
  48               <entry>
  49                 <para><emphasis role="bold">Description</emphasis></para>
  50               </entry>
  51             </row>
  52           </thead>
  53           <tbody>
  54             <row>
  55               <entry>
  56                 <para> -1</para>
  57               </entry>
  58               <entry>
  59                 <literal> -EPERM </literal>
  60               </entry>
  61               <entry>
  62                 <para> Permission is denied.</para>
  63               </entry>
  64             </row>
  65             <row>
  66               <entry> -2 </entry>
  67               <entry>
  68                 <literal> -ENOENT </literal>
  69               </entry>
  70               <entry>
  71                 <para> The requested file or directory does not exist.</para>
  72               </entry>
  73             </row>
  74             <row>
  75               <entry>
  76                 <para> -4</para>
  77               </entry>
  78               <entry>
  79                 <literal> -EINTR </literal>
  80               </entry>
  81               <entry>
   82                 <para> The operation was interrupted (usually by CTRL-C or by killing the process).</para>
  83               </entry>
  84             </row>
  85             <row>
  86               <entry>
  87                 <para> -5</para>
  88               </entry>
  89               <entry>
  90                 <literal> -EIO </literal>
  91               </entry>
  92               <entry>
  93                 <para> The operation failed with a read or write error.</para>
  94               </entry>
  95             </row>
  96             <row>
  97               <entry>
  98                 <para> -19</para>
  99               </entry>
 100               <entry>
 101                 <literal> -ENODEV </literal>
 102               </entry>
 103               <entry>
 104                 <para> No such device is available. The server stopped or failed over.</para>
 105               </entry>
 106             </row>
 107             <row>
 108               <entry>
 109                 <para> -22</para>
 110               </entry>
 111               <entry>
 112                 <literal> -EINVAL </literal>
 113               </entry>
 114               <entry>
 115                 <para> The parameter contains an invalid value.</para>
 116               </entry>
 117             </row>
 118             <row>
 119               <entry>
 120                 <para> -28</para>
 121               </entry>
 122               <entry>
 123                 <literal> -ENOSPC </literal>
 124               </entry>
 125               <entry>
  126                 <para> The file system is out of space or out of inodes. Use <literal>lfs df</literal> (to query the amount of file system space) or <literal>lfs df -i</literal> (to query the number of inodes).</para>
 127               </entry>
 128             </row>
 129             <row>
 130               <entry>
 131                 <para> -30</para>
 132               </entry>
 133               <entry>
 134                 <literal> -EROFS </literal>
 135               </entry>
 136               <entry>
 137                 <para> The file system is read-only, likely due to a detected error.</para>
 138               </entry>
 139             </row>
 140             <row>
 141               <entry>
 142                 <para> -43</para>
 143               </entry>
 144               <entry>
 145                 <literal> -EIDRM </literal>
 146               </entry>
 147               <entry>
  148                 <para> The UID/GID does not match any known UID/GID on the MDS. Update <literal>/etc/passwd</literal> and <literal>/etc/group</literal> on the MDS to add the missing user or group.</para>
 149               </entry>
 150             </row>
 151             <row>
 152               <entry>
 153                 <para> -107</para>
 154               </entry>
 155               <entry>
 156                 <literal> -ENOTCONN </literal>
 157               </entry>
 158               <entry>
 159                 <para> The client is not connected to this server.</para>
 160               </entry>
 161             </row>
 162             <row>
 163               <entry>
 164                 <para> -110</para>
 165               </entry>
 166               <entry>
 167                 <literal> -ETIMEDOUT </literal>
 168               </entry>
 169               <entry>
 170                 <para> The operation took too long and timed out.</para>
 171               </entry>
 172             </row>
 173             <row>
 174               <entry>
 175                 <para> -122</para>
 176               </entry>
 177               <entry>
 178                 <literal> -EDQUOT </literal>
 179               </entry>
 180               <entry>
 181                 <para> The operation exceeded the user disk quota and was aborted.</para>
 182               </entry>
 183             </row>
 184           </tbody>
 185         </tgroup>
 186       </informaltable>
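      <para>As a quick way to translate a number seen in a log into its symbolic name, the
        sketch below searches the kernel headers for the value and then prints the standard
        description. It assumes that the kernel headers and <literal>python3</literal> are
        installed on the node; the value 28 is only an example. The header
        <literal>errno-base.h</literal> is searched as well because, on typical Linux systems,
        it holds the lowest-numbered errors.</para>
      <screen>$ grep -wE '#define[[:space:]]+E[A-Z0-9]+[[:space:]]+28' /usr/include/asm-generic/errno-base.h /usr/include/asm-generic/errno.h
$ python3 -c 'import errno, os; print(errno.errorcode[28], os.strerror(28))'</screen>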
 187     </section>
 188     <section xml:id="dbdoclet.50438198_40669">
 189       <title><indexterm><primary>troubleshooting</primary><secondary>error messages</secondary></indexterm>Viewing Error Messages</title>
  190       <para>Because the Lustre software runs in the kernel, only a numeric error code is
  191         returned to the application; this code indicates the nature of the problem. Refer to the kernel console
  192         log (<literal>dmesg</literal>) for all recent kernel messages from that node. On the node,
  193           <literal>/var/log/messages</literal> holds a log of all messages for at least the past
  194         day.</para>
  195       <para>Each error message begins with &quot;LustreError&quot; in the console log and provides a short description of:</para>
 196       <itemizedlist>
 197         <listitem>
 198           <para>What the problem is</para>
 199         </listitem>
 200         <listitem>
 201           <para>Which process ID had trouble</para>
 202         </listitem>
 203         <listitem>
 204           <para>Which server node it was communicating with, and so on.</para>
 205         </listitem>
 206       </itemizedlist>
 207       <para>Lustre logs are dumped to the pathname stored in the parameter
 208       <literal>lnet.debug_path</literal>.</para>
 209       <para>Collect the first group of messages related to a problem, and any messages that precede &quot;LBUG&quot; or &quot;assertion failure&quot; errors. Messages that mention server nodes (OST or MDS) are specific to that server; you must collect similar messages from the relevant server console logs.</para>
  210       <para>A separate Lustre debug log holds information for only a short period of time;
  211         how long depends on the activity of the processes on the Lustre node. To extract
  212         the debug log on each of the nodes, run:</para>
 213       <screen>$ lctl dk <replaceable>filename</replaceable></screen>
 214       <note>
 215         <para>LBUG freezes the thread to allow capture of the panic stack. A system reboot is needed to clear the thread.</para>
 216       </note>
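      <para>A minimal sketch of gathering these messages on one node is shown below. The output
        file names under <literal>/tmp</literal> are placeholders, the
        <literal>grep</literal> patterns are only suggestions, and reading the logs normally
        requires root privileges.</para>
      <screen># dmesg | grep -E 'LustreError|LBUG|ASSERTION' &gt; /tmp/lustre-console-errors.txt
# grep LustreError /var/log/messages &gt; /tmp/lustre-syslog-errors.txt
# lctl dk /tmp/lustre-debug.log</screen>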
 217     </section>
 218   </section>
 219   <section xml:id="dbdoclet.reporting_lustre_problem">
 220       <title><indexterm>
 221         <primary>troubleshooting</primary>
 222         <secondary>reporting bugs</secondary>
 223       </indexterm><indexterm>
 224         <primary>reporting bugs</primary>
 225         <see>troubleshooting</see>
 226       </indexterm>Reporting a Lustre File System Bug</title>
 227     <para>If you cannot resolve a problem by troubleshooting your Lustre file
 228       system, other options are:<itemizedlist>
 229         <listitem>
 230           <para>Post a question to the <link xmlns:xlink="http://www.w3.org/1999/xlink"
 231               xlink:href="http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org">lustre-discuss</link>
 232             email list or search the archives for information about your issue.</para>
 233         </listitem>
 234         <listitem>
 235           <para>Submit a ticket to the <link xmlns:xlink="http://www.w3.org/1999/xlink"
 236               xlink:href="https://jira.whamcloud.com/secure/Dashboard.jspa">Jira</link><abbrev><superscript>*</superscript></abbrev>
 237            bug tracking and project management tool used for the Lustre project.
 238            If you are a first-time user, you'll need to open an account by
 239            clicking on <emphasis role="bold">Sign up</emphasis> on the
 240            Welcome page.</para>
 241         </listitem>
 242       </itemizedlist> To submit a Jira ticket, follow these steps:<orderedlist>
 243         <listitem>
 244           <para>To avoid filing a duplicate ticket, search for existing
 245             tickets for your issue.
 246             <emphasis role="italic">For search tips, see
 247             <xref xmlns:xlink="http://www.w3.org/1999/xlink"
 248               linkend="dbdoclet.searching_jira"/>.</emphasis></para>
 249         </listitem>
 250         <listitem>
 251           <para>To create a ticket, click <emphasis role="bold">+Create Issue</emphasis> in the
 252             upper right corner. <emphasis role="italic">Create a separate ticket for each issue you
 253               wish to submit.</emphasis></para>
 254         </listitem>
 255         <listitem>
 256           <para>In the form displayed, enter the following information:<itemizedlist>
 257               <listitem>
 258                 <para><emphasis role="italic">Project</emphasis> - Select <emphasis role="bold"
 259                     >Lustre</emphasis> or <emphasis role="bold">Lustre Documentation</emphasis> or
 260                   an appropriate project.</para>
 261               </listitem>
 262               <listitem>
 263                 <para><emphasis role="italic">Issue type</emphasis> - Select <emphasis role="bold"
 264                     >Bug</emphasis>.</para>
 265               </listitem>
 266               <listitem>
 267                 <para><emphasis role="italic">Summary</emphasis> - Enter a short description of the
 268                   issue. Use terms that would be useful for someone searching for a similar issue. A
 269                   LustreError or ASSERT/panic message often makes a good summary.</para>
 270               </listitem>
 271               <listitem>
 272                 <para><emphasis role="italic">Affects version(s)</emphasis> - Select your Lustre
 273                   release.</para>
 274               </listitem>
 275               <listitem>
 276                 <para><emphasis role="italic">Environment</emphasis> - Enter your kernel with
 277                   version number.</para>
 278               </listitem>
 279               <listitem>
 280                 <para><emphasis role="italic">Description</emphasis> - Include a detailed
 281                   description of <emphasis role="italic">visible symptoms</emphasis> and, if
 282                   possible, <emphasis role="italic">how the problem is produced</emphasis>. Other
 283                   useful information may include <emphasis role="italic">the behavior you expect to
 284                     see</emphasis> and <emphasis role="italic">what you have tried so far to
 285                     diagnose the problem</emphasis>.</para>
 286               </listitem>
 287               <listitem>
 288                 <para><emphasis role="italic">Attachments</emphasis> - Attach log sources such as
 289                   Lustre debug log dumps (see <xref xmlns:xlink="http://www.w3.org/1999/xlink"
 290                     linkend="dbdoclet.50438274_15874"/>), syslogs, or console logs. <emphasis
 291                     role="italic"><emphasis role="bold">Note:</emphasis></emphasis> Lustre debug
 292                   logs must be processed using <code>lctl df</code> prior to attaching to a Jira
 293                   ticket. For more information, see <xref xmlns:xlink="http://www.w3.org/1999/xlink"
 294                     linkend="dbdoclet.50438274_62472"/>. </para>
 295               </listitem>
 296             </itemizedlist>Other fields in the form are used for project tracking and are irrelevant
 297             to reporting an issue. You can leave these in their default state.</para>
 298         </listitem>
 299       </orderedlist></para>
 300     <section xml:id="dbdoclet.searching_jira">
  301       <title>Searching Jira<superscript>*</superscript> for Duplicate Tickets</title>
 302       <para>Before submitting a ticket, always search the Jira bug tracker for
 303         an existing ticket for your issue.  This avoids duplicating effort and
 304         may immediately provide you with a solution to your problem. </para>
 305       <para>To do a search in the Jira bug tracker, select the
 306         <emphasis role="bold">Issues</emphasis> tab and click on
 307         <emphasis role="bold">New filter</emphasis>. Use the filters provided
 308         to select criteria for your search. To search for specific text, enter
 309         the text in the "Contains text" field and click the magnifying glass
 310         icon.</para>
  311       <para>When searching for text such as an ASSERTION or LustreError
  312         message, remove NIDs, pointers, line numbers, and other
  313         installation-specific or version-specific text from your search
  314         string, as in the example below.</para>
 315       <para><emphasis role="italic">Original error message:</emphasis></para>
 316       <para><code>"(filter_io_26.c:</code>
 317         <emphasis role="bold">791</emphasis><code>:filter_commitrw_write())
 318         ASSERTION(oti-&gt;oti_transno&lt;=obd-&gt;obd_last_committed) failed:
 319         oti_transno </code><emphasis role="bold">752</emphasis>
 320         <code>last_committed </code><emphasis role="bold">750</emphasis>
 321         <code>"</code></para>
  322       <para><emphasis role="italic">Optimized search string:</emphasis></para>
 323       <para><code>filter_commitrw_write ASSERTION oti_transno
 324         obd_last_committed failed:</code></para>
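      <para>One possible way to get a first list of candidate search terms is to keep only the
        identifier-like words in the message, which drops line numbers and other bare decimal
        values. The sketch below uses standard shell tools only and is not part of the Lustre
        software. Review the result by hand, because it also keeps fragments such as the source
        file name and parts of hexadecimal pointers.</para>
      <screen>$ msg='(filter_io_26.c:791:filter_commitrw_write()) ASSERTION(oti-&gt;oti_transno&lt;=obd-&gt;obd_last_committed) failed: oti_transno 752 last_committed 750'
$ printf '%s\n' "$msg" | grep -oE '[A-Za-z_][A-Za-z0-9_]*' | awk '!seen[$0]++' | paste -sd ' ' -</screen>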
 325     </section>
 326   </section>
 327   <section xml:id="dbdoclet.50438198_93109">
 328     <title><indexterm>
 329         <primary>troubleshooting</primary>
 330         <secondary>common problems</secondary>
 331       </indexterm>Common Lustre File System Problems</title>
 332     <para>This section describes how to address common issues encountered with
 333       a Lustre file system.</para>
 334     <section remap="h3">
 335       <title>OST Object is Missing or Damaged</title>
 336       <para>If the OSS fails to find an object or finds a damaged object,
 337         this message appears:</para>
 338       <para><screen>OST object missing or damaged (OST &quot;ost1&quot;, object 98148, error -2)</screen></para>
 339       <para>If the reported error is -2 (<literal>-ENOENT</literal>, or
 340         &quot;No such file or directory&quot;), then the object is no longer
 341         present on the OST, even though a file on the MDT is referencing it.
 342         This can occur either because the MDT and OST are out of sync, or
 343         because an OST object was corrupted and deleted by e2fsck.</para>
 344       <para>If you have recovered the file system from a disk failure by using
 345         e2fsck, then unrecoverable objects may have been deleted or moved to
 346         /lost+found in the underlying OST filesystem. Because files on the MDT
 347         still reference these objects, attempts to access them produce this
 348         error.</para>
 349       <para>If you have restored the filesystem from a backup of the raw MDT
 350         or OST partition, then the restored partition is very likely to be out
 351         of sync with the rest of your cluster. No matter which server partition
 352         you restored from backup, files on the MDT may reference objects which
 353         no longer exist (or did not exist when the backup was taken); accessing
 354         those files produces this error.</para>
 355       <para>If neither of those descriptions is applicable to your situation,
 356         then it is possible that you have discovered a programming error that
 357         allowed the servers to get out of sync.
 358         Please submit a Jira ticket (see <xref xmlns:xlink="http://www.w3.org/1999/xlink"
 359           linkend="dbdoclet.reporting_lustre_problem"/>).</para>
 360       <para>If the reported error is anything else (such as -5,
 361       &quot;<literal>I/O error</literal>&quot;), it likely indicates a storage
 362       device failure. The low-level file system returns this error if it is
 363       unable to read from the storage device.</para>
 364       <para><emphasis role="bold">Suggested Action</emphasis></para>
 365       <para>If the reported error is -2, you can consider checking in
 366         <literal>lost+found/</literal> on your raw OST device, to see if the
 367         missing object is there. However, it is likely that this object is
 368         lost forever, and that the file that references the object is now
 369         partially or completely lost. Restore this file from backup, or
  370         salvage what you can using <literal>dd conv=noerror</literal> and
  371         then delete it using the <literal>unlink</literal> command.</para>
 372       <para>If the reported error is anything else, then you should
 373         immediately inspect this server for storage problems.</para>
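      <para>The salvage step can be sketched as follows, assuming the damaged file is
        <literal>/mnt/lustre/damaged_file</literal> on a client and that
        <literal>/tmp/salvaged_file</literal> is on a different file system with enough free
        space (both paths are placeholders). The <literal>conv=sync,noerror</literal> options
        make <literal>dd</literal> continue past read errors and pad unreadable blocks with
        zeros.</para>
      <screen>client# dd if=/mnt/lustre/damaged_file of=/tmp/salvaged_file bs=4k conv=sync,noerror
client# unlink /mnt/lustre/damaged_file</screen>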
 374     </section>
 375     <section remap="h3">
 376       <title>OSTs Become Read-Only</title>
 377       <para>If the SCSI devices are inaccessible to the Lustre file system
 378         at the block device level, then <literal>ldiskfs</literal> remounts
  379         the device read-only to prevent file system corruption. This is normal
  380         behavior. The status in the parameter <literal>health_check</literal>
 381         also shows &quot;not healthy&quot; on the affected nodes.</para>
 382       <para>To determine what caused the &quot;not healthy&quot; condition:</para>
 383       <itemizedlist>
 384         <listitem>
 385           <para>Examine the consoles of all servers for any error indications</para>
 386         </listitem>
 387         <listitem>
 388           <para>Examine the syslogs of all servers for any LustreErrors or <literal>LBUG</literal></para>
 389         </listitem>
 390         <listitem>
 391           <para>Check the health of your system hardware and network. (Are the disks working as expected, is the network dropping packets?)</para>
 392         </listitem>
 393         <listitem>
 394           <para>Consider what was happening on the cluster at the time. Does this relate to a specific user workload or a system load condition? Is the condition reproducible? Does it happen at a specific time (day, week or month)?</para>
 395         </listitem>
 396       </itemizedlist>
 397       <para>To recover from this problem, you must restart Lustre services using these file systems. There is no other way to know that the I/O made it to disk, and the state of the cache may be inconsistent with what is on disk.</para>
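      <para>As a starting point for the checks listed above, the health status and recent
        errors can be read on each server roughly as follows. Run the commands as root; the
        <literal>grep</literal> patterns are only suggestions.</para>
      <screen>oss# lctl get_param health_check
oss# dmesg | grep -E 'LustreError|LBUG'
oss# grep -E 'LustreError|LBUG' /var/log/messages</screen>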
 398     </section>
 399     <section remap="h3">
 400       <title>Identifying a Missing OST</title>
  401       <para>If an OST is missing for any reason, you may need to know which files are affected. Although an OST is missing, the file system should still be operational. From any mounted client node, generate a list of files that reside on the affected OST. It is advisable to mark the missing OST as &apos;unavailable&apos; so clients and the MDS do not time out trying to contact it.</para>
 402       <orderedlist>
 403         <listitem>
 404           <para>Generate a list of devices and determine the OST&apos;s device number. Run:</para>
 405           <screen>$ lctl dl </screen>
  406           <para>The <literal>lctl dl</literal> command output lists the device name and number, along with the device UUID and the number of references on the device.</para>
 407         </listitem>
 408         <listitem>
  409           <para>Deactivate the OST's device on the MDS. Run:</para>
  410           <screen># lctl --device <replaceable>lustre_device_number</replaceable> deactivate</screen>
  411           <para>The device number or device name is shown in the output of the <literal>lctl dl</literal> command.</para>
 412           <para>The <literal>deactivate</literal> command prevents clients from creating new objects on the specified OST, although you can still access the OST for reading.</para>
 413           <note>
  414             <para>If the OST later becomes available, it must be reactivated. Run:</para>
 415             <screen># lctl --device <replaceable>lustre_device_number</replaceable> activate</screen>
 416           </note>
 417         </listitem>
 418         <listitem>
  419           <para>Determine all files that are striped over the missing OST. Run:</para>
 420           <screen># lfs find -O {OST_UUID} /mountpoint</screen>
 421           <para>This returns a simple list of filenames from the affected file system.</para>
 422         </listitem>
 423         <listitem>
  424           <para>If necessary, read the valid parts of a striped file. Run:</para>
 425           <screen># dd if=filename of=new_filename bs=4k conv=sync,noerror</screen>
 426         </listitem>
 427         <listitem>
 428           <para>You can delete these files with the <literal>unlink</literal> command.</para>
 429           <screen># unlink filename {filename ...} </screen>
 430           <note>
 431             <para>When you run the <literal>unlink</literal> command, it may
 432               return an error that the file could not be found, but the file
 433               on the MDS has been permanently removed.</para>
 434           </note>
 435         </listitem>
 436       </orderedlist>
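      <para>Steps 3 and 4 can be combined in a small loop. The sketch below is not part of the
        Lustre tools; it assumes the file system is mounted at <literal>/myth</literal>, that
        the missing OST is <literal>myth-OST0001_UUID</literal>, and that
        <literal>/backup</literal> is a directory outside the Lustre file system (all three are
        placeholders). Each copy is stored under the same relative path to avoid name
        collisions.</para>
      <screen># lfs find -O myth-OST0001_UUID /myth &gt; /tmp/affected_files.txt
# while IFS= read -r f; do
    mkdir -p "/backup$(dirname "$f")"
    dd if="$f" of="/backup$f" bs=4k conv=sync,noerror
  done &lt; /tmp/affected_files.txt</screen>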
  437       <para>If the file system cannot be mounted, there is currently no way
  438         to parse metadata directly from an MDS. If the bad OST does not
  439         start, the options for mounting the file system are to provide a loop device
  440         OST in its place or to replace it with a newly formatted OST. In that case,
  441         the missing objects are created and are read as zero-filled.</para>
 442     </section>
 443     <section xml:id="dbdoclet.repair_ost_lastid">
 444       <title>Fixing a Bad LAST_ID on an OST</title>
 445       <para>Each OST contains a <literal>LAST_ID</literal> file, which holds
 446         the last object (pre-)created by the MDS
 447         <footnote><para>The contents of the <literal>LAST_ID</literal>
 448           file must be accurate regarding the actual objects that exist
 449           on the OST.</para></footnote>.
 450         The MDT contains a <literal>lov_objid</literal> file, with values
 451         that represent the last object the MDS has allocated to a file.</para>
 452       <para>During normal operation, the MDT keeps pre-created (but unused)
 453         objects on the OST, and normally <literal>LAST_ID</literal> should be
 454         larger than <literal>lov_objid</literal>.  Any small difference in the
 455         values is a result of objects being precreated on the OST to improve
  456         MDS file creation performance. These precreated objects are not yet
  457         allocated to a file and are of zero length (empty).</para>
 458       <para>However, in the case where <literal>lov_objid</literal> is
 459         larger than <literal>LAST_ID</literal>, it indicates the MDS has
 460         allocated objects to files that do not exist on the OST.  Conversely,
 461         if <literal>lov_objid</literal> is significantly less than
 462         <literal>LAST_ID</literal> (by at least 20,000 objects) it indicates
  463         the OST previously allocated objects at the request of the MDS (which
  464         likely contain data) but the MDS does not know about them.</para>
 465       <para condition='l25'>Since Lustre 2.5 the MDS and OSS will resync the
 466         <literal>lov_objid</literal> and <literal>LAST_ID</literal> files
 467         automatically if they become out of sync.  This may result in some
 468         space on the OSTs becoming unavailable until LFSCK is next run, but
 469         avoids issues with mounting the filesystem.</para>
 470       <para condition='l26'>Since Lustre 2.6 the LFSCK will repair the
 471         <literal>LAST_ID</literal> file on the OST automatically based on
 472         the objects that exist on the OST, in case it was corrupted.</para>
 473       <para>In situations where there is on-disk corruption of the OST, for
 474         example caused by the disk write cache being lost, or if the OST
 475         was restored from an old backup or reformatted, the
 476         <literal>LAST_ID</literal> value may become inconsistent and result
 477         in a message similar to:</para>
 478       <screen>&quot;myth-OST0002: Too many FIDs to precreate,
 479 OST replaced or reformatted: LFSCK will clean up&quot;</screen>
 480       <para>A related situation may happen if there is a significant
 481         discrepancy between the record of previously-created objects on the
 482         OST and the previously-allocated objects on the MDT, for example if
 483         the MDT has been corrupted, or restored from backup, which would cause
 484         significant data loss if left unchecked. This produces a message
 485         like:</para>
 486       <screen>&quot;myth-OST0002: too large difference between
 487 MDS LAST_ID [0x1000200000000:0x100048:0x0] (1048648) and
 488 OST LAST_ID [0x1000200000000:0x2232123:0x0] (35856675), trust the OST&quot;</screen>
 489       <para>In such cases, the MDS will advance the <literal>lov_objid</literal>
 490         value to match that of the OST to avoid deleting existing objects,
 491         which may contain data.  Files on the MDT that reference these objects
 492         will not be lost.  Any unreferenced OST objects will be attached to
 493         the <literal>.lustre/lost+found</literal> directory the next time
 494         LFSCK <literal>layout</literal> check is run.</para>
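      <para>For example, an LFSCK layout check could be started and then monitored from the MDS
        as sketched below. The device name <literal>myth-MDT0000</literal> follows the example
        messages above; substitute your own MDT name, and consult the LFSCK documentation for
        your Lustre release for the available options, such as those that control orphan
        handling.</para>
      <screen>mds# lctl lfsck_start -M myth-MDT0000 -t layout
mds# lctl get_param -n mdd.myth-MDT0000.lfsck_layout</screen>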
 495     </section>
 496     <section remap="h3">
 497       <title><indexterm><primary>troubleshooting</primary><secondary>'Address already in use'</secondary></indexterm>Handling/Debugging &quot;<literal>Bind: Address already in use</literal>&quot; Error</title>
  498       <para>During startup, the Lustre software may report a <literal>bind: Address already in
  499           use</literal> error and refuse to start. This is caused by a portmap service
  500         (often NFS locking) that starts before the Lustre file system and binds to the default port
  501         988. Port 988 must be open in the firewall or iptables for incoming connections on the
  502         client, OSS, and MDS nodes. LNet will create three outgoing connections on available,
  503         reserved ports for each client-server pair, starting with 1023, 1022 and 1021.</para>
  504       <para>Unfortunately, you cannot set sunrpc to avoid port 988. If you receive this error, do one of the following:</para>
 505       <itemizedlist>
 506         <listitem>
 507           <para>Start the Lustre file system before starting any service that uses sunrpc.</para>
 508         </listitem>
 509         <listitem>
  510           <para>Use a port other than 988 for the Lustre file system. This is configured in
  511               <literal>/etc/modprobe.d/lustre.conf</literal> as an option to the LNet module,
  512             where <replaceable>port</replaceable> is the alternate port number:</para>
  513           <screen>options lnet accept_port=<replaceable>port</replaceable></screen>
 514         </listitem>
 517         <listitem>
  518           <para>Add <literal>modprobe ptlrpc</literal> to your system startup scripts before the service that uses
 519             sunrpc. This causes the Lustre file system to bind to port 988 and sunrpc to select a
 520             different port.</para>
 521         </listitem>
 522       </itemizedlist>
 523       <note>
  524         <para>You can also use the <literal>sysctl</literal> command to keep the NFS client from grabbing the Lustre service port. However, this is only a partial workaround, as other user-space RPC servers can still grab the port.</para>
 525       </note>
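      <para>To see which service currently holds the port, commands along the following lines
        may help. They assume that the <literal>ss</literal> and <literal>rpcinfo</literal>
        utilities are installed; adjust the port number if you changed
        <literal>accept_port</literal>.</para>
      <screen># ss -tlnp | grep -E ':988[[:space:]]'
# rpcinfo -p | grep -w 988</screen>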
 526     </section>
 527     <section remap="h3">
  528       <title><indexterm><primary>troubleshooting</primary><secondary>'Error -28'</secondary></indexterm>Handling/Debugging Error &quot;-28&quot;</title>
 529       <para>A Linux error -28 (<literal>ENOSPC</literal>) that occurs during
 530         a write or sync operation indicates that an existing file residing
 531         on an OST could not be rewritten or updated because the OST was full,
 532         or nearly full. To verify if this is the case, run on a client:</para>
 533         <screen>
 534 client$ lfs df -h
 535 UUID                       bytes        Used   Available Use% Mounted on
 536 myth-MDT0000_UUID          12.9G        1.5G       10.6G  12% /myth[MDT:0]
 537 myth-OST0000_UUID           3.6T        3.1T      388.9G  89% /myth[OST:0]
 538 myth-OST0001_UUID           3.6T        3.6T       64.0K 100% /myth[OST:1]
 539 myth-OST0002_UUID           3.6T        3.1T      394.6G  89% /myth[OST:2]
 540 myth-OST0003_UUID           5.4T        5.0T      267.8G  95% /myth[OST:3]
 541 myth-OST0004_UUID           5.4T        2.9T        2.2T  57% /myth[OST:4]
 542
 543 filesystem_summary:        21.6T       17.8T        3.2T  85% /myth
 544         </screen>
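      <para>On a file system with many OSTs, the output above can be filtered to show only the
        targets that are close to full. The following is a minimal sketch, not part of the Lustre
        tools: the function name <literal>full_osts</literal> and the default threshold of 90
        percent are assumptions chosen for illustration, and the function expects
        <literal>lfs df</literal> output on standard input.</para>
      <screen># Print the UUID and Use% of every OST at or above a usage threshold.
# $1: threshold in percent (defaults to 90); input: output of "lfs df".
full_osts() {
    awk -v limit="${1:-90}" '
        $1 ~ /-OST[0-9a-fA-F]+_UUID$/ {
            use = $5
            sub(/%/, "", use)            # strip the percent sign from Use%
            if (use + 0 >= limit) print $1, $5
        }'
}</screen>
      <para>Run as <literal>lfs df -h | full_osts 90</literal> against the sample output above,
        this prints <literal>myth-OST0001_UUID 100%</literal> and
        <literal>myth-OST0003_UUID 95%</literal>.</para>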
 545       <para>To address this issue, you can expand the disk space on the OST,
 546           or use the <literal>lfs_migrate</literal> command to migrate (move)
 547           files to a less full OST.  For details on both of these options
 548           see <xref linkend="lustremaint.adding_new_ost" />.</para>
 549       <para condition='l26'>In some cases, there may be processes holding
 550         files open that are consuming a significant amount of space (e.g.
 551         a runaway process writing lots of data to an open file that has been
 552         deleted).  It is possible to get a list of all open file handles in the
 553         filesystem from the MDS:
 554         <screen>
 555 mds# lctl get_param mdt.*.exports.*.open_files
 556 mdt.myth-MDT0000.exports.192.168.20.159@tcp.open_files=
 557 [0x200003ab4:0x435:0x0]
 558 [0x20001e863:0x1c1:0x0]
 559 [0x20001e863:0x1c2:0x0]
 560 :
 561 :
 562         </screen>
 563         These file handles can be converted into pathnames on any client via
 564         the <literal>lfs fid2path</literal> command (as root):
 565         <screen>
 566 client# lfs fid2path /myth [0x200003ab4:0x435:0x0] [0x20001e863:0x1c1:0x0] [0x20001e863:0x1c2:0x0]
 567 lfs fid2path: cannot find '[0x200003ab4:0x435:0x0]': No such file or directory
 568 /myth/tmp/4M
 569 /myth/tmp/1G
 570 :
 571 :
 572         </screen>
 573         In some cases, if the file has been deleted from the filesystem,
 574         <literal>fid2path</literal> will return an error that the file is
 575         not found.  You can use the client NID
 576         (<literal>192.168.20.159@tcp</literal> in the above example) to
 577         determine which node the file is open on, and <literal>lsof</literal>
 578         to find and kill the process that is holding the file open:
 579         <screen>
 580 # lsof /myth
 581 COMMAND   PID   USER  FD TYPE    DEVICE      SIZE/OFF               NODE NAME
 582 logger  13806 mythtv  0r REG  35,632494 1901048576384 144115440203858997 /myth/logs/job.1283929.log (deleted)
 583         </screen>
 584       </para>
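      <para>When many clients hold files open, the same FID can appear under several exports.
        The sketch below reduces the <literal>open_files</literal> output to a list of distinct
        FIDs that can then be handed to <literal>lfs fid2path</literal> on a client. The function
        name <literal>extract_fids</literal> is hypothetical, and the pattern assumes FIDs are
        printed in the bracketed hexadecimal form shown above.</para>
      <screen># Print each distinct FID found in "lctl get_param ...open_files" output.
# Input: that output on standard input; lines without a FID are ignored.
extract_fids() {
    grep -o '\[0x[0-9a-fA-F]*:0x[0-9a-fA-F]*:0x[0-9a-fA-F]*]' | LC_ALL=C sort -u
}</screen>
      <para>For example, on the MDS:
        <literal>lctl get_param mdt.*.exports.*.open_files | extract_fids</literal>.</para>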
 585       <para>A Linux error -28 (<literal>ENOSPC</literal>) that occurs when
 586         a new file is being created may indicate that the MDT has run out
 587         of inodes and needs to be made larger. Newly created files are not
 588         written to full OSTs, while existing files continue to reside on
 589         the OST where they were initially created. To view inode information
 590         on the MDT, run on a client:</para>
 591         <screen>
 592 client$ lfs df -i
 593 UUID                      Inodes       IUsed       IFree IUse% Mounted on
 594 myth-MDT0000_UUID        1910263     1910263           0 100% /myth[MDT:0]
 595 myth-OST0000_UUID         947456      360059      587397  89% /myth[OST:0]
 596 myth-OST0001_UUID         948864      233748      715116  91% /myth[OST:1]
 597 myth-OST0002_UUID         947456      549961      397495  89% /myth[OST:2]
 598 myth-OST0003_UUID        1426144      477595      948549  95% /myth[OST:3]
 599 myth-OST0004_UUID        1426080      465248     1420832  57% /myth[OST:4]
 600
 601 filesystem_summary:      1910263     1910263           0 100% /myth
 602         </screen>
 603       <para>Typically, the Lustre software reports this error to your
 604         application. If the application is checking the return code from
 605         its function calls, then it decodes it into a textual error message
 606         such as <literal>No space left on device</literal>. The numeric
 607         error code may also appear in the system log.</para>
 608       <para>For more information about the <literal>lfs df</literal> command,
 609         see <xref linkend="dbdoclet.checking_free_space"/>.</para>
 610       <para>You can also use the <literal>lctl get_param</literal> command to
 611         monitor the space and object usage on the OSTs and MDTs from any
 612         client:</para>
 613         <screen>lctl get_param {osc,mdc}.*.{kbytes,files}{free,avail,total}
 614         </screen>
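      <para>For simple monitoring, the <literal>lctl get_param</literal> output can be filtered
        for targets that are running low on space. The following sketch assumes the usual
        <literal>name=value</literal> output format and that <literal>kbytesavail</literal> is
        reported in kilobytes; the function name <literal>low_space_targets</literal> is
        hypothetical.</para>
      <screen># Print every target whose available space is below a limit.
# $1: limit in kilobytes; input: output of "lctl get_param osc.*.kbytesavail".
low_space_targets() {
    awk -F= -v limit="$1" '
        /kbytesavail=/ {
            if (limit > $2 + 0) print $1, $2
        }'
}</screen>
      <para>For example, <literal>lctl get_param osc.*.kbytesavail | low_space_targets 1048576</literal>
        lists the OSTs with less than 1 GiB available, as seen from that client.</para>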
 615       <note>
 616         <para>You can find other numeric error codes along with a short name
 617         and text description in <literal>/usr/include/asm-generic/errno.h</literal>.
 618         </para>
 619       </note>
 620     </section>
 621     <section remap="h3">
 622       <title>Triggering Watchdog for PID NNN</title>
 623       <para>In some cases, a server node triggers a watchdog timer, which causes a process stack to be dumped to the console and a Lustre kernel debug log to be dumped into <literal>/tmp</literal> (by default). A triggered watchdog timer does not mean that the thread OOPSed, but rather that it is taking longer than expected to complete a given operation. In some cases, this situation is expected.</para>
 624       <para>For example, if a RAID rebuild is significantly slowing down I/O on an OST, it might cause watchdog timers to trip. Another message then follows shortly afterward, indicating that the thread in question has completed processing (after some number of seconds). Generally, this indicates a transient problem. In other cases, it may legitimately signal that a thread is stuck because of a software error (a lock inversion, for example).</para>
 625       <screen>Lustre: 0:0:(watchdog.c:122:lcw_cb()) </screen>
 626       <para>The above message indicates that the watchdog was triggered for pid 933,
 627       which had been inactive for 100000 ms. The next message announces the stack dump:</para>
 628       <screen>Lustre: 0:0:(linux-debug.c:132:portals_debug_dumpstack()) </screen>
 629       <para>Showing stack for process:</para>
 630       <screen>933 ll_ost_25     D F896071A     0   933      1    934   932 (L-TLB)
 631 f6d87c60 00000046 00000000 f896071a f8def7cc 00002710 00001822 2da48cae
 632 0008cf1a f6d7c220 f6d7c3d0 f6d86000 f3529648 f6d87cc4 f3529640 f8961d3d
 633 00000010 f6d87c9c ca65a13c 00001fff 00000001 00000001 00000000 00000001</screen>
 634       <para>Call trace:</para>
 635       <screen>filter_do_bio+0x3dd/0xb90 [obdfilter]
 636 default_wake_function+0x0/0x20
 637 filter_direct_io+0x2fb/0x990 [obdfilter]
 638 filter_preprw_read+0x5c5/0xe00 [obdfilter]
 639 lustre_swab_niobuf_remote+0x0/0x30 [ptlrpc]
 640 ost_brw_read+0x18df/0x2400 [ost]
 641 ost_handle+0x14c2/0x42d0 [ost]
 642 ptlrpc_server_handle_request+0x870/0x10b0 [ptlrpc]
 643 ptlrpc_main+0x42e/0x7c0 [ptlrpc]
 644 </screen>
 645     </section>
 646     <section remap="h3">
 647       <title><indexterm>
 648           <primary>troubleshooting</primary>
 649           <secondary>timeouts on setup</secondary>
 650         </indexterm>Handling Timeouts on Initial Lustre File System Setup</title>
 651       <para>If you come across timeouts or hangs on the initial setup of your Lustre file system,
 652         verify that name resolution for servers and clients is working correctly. Some distributions
 653         configure <literal>/etc/hosts</literal> so the name of the local machine (as reported by the
 654         <literal>hostname</literal> command) is mapped to localhost (127.0.0.1) instead of a proper IP
 655         address.</para>
 656       <para>This can produce the following error:</para>
 657       <screen>LustreError:(ldlm_handle_cancel()) received cancel for unknown lock cookie
 658 0xe74021a4b41b954e from nid 0x7f000001 (0:127.0.0.1)
 659 </screen>
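      <para>A quick way to check for this misconfiguration is to look at how the local host name
        resolves. The following is a minimal sketch, assuming a system that provides
        <literal>getent</literal>; the function names <literal>is_loopback</literal> and
        <literal>check_local_name</literal> are hypothetical.</para>
      <screen># Succeeds when the given address is a loopback address.
is_loopback() {
    case "$1" in
        127.*|::1) return 0 ;;
        *)         return 1 ;;
    esac
}

# Report how the local host name resolves through the system resolver.
check_local_name() {
    name=$(hostname)
    addr=$(getent hosts "$name" | awk '{ print $1; exit }')
    if is_loopback "$addr"; then
        echo "WARNING: $name resolves to loopback address $addr"
    else
        echo "OK: $name resolves to ${addr:-no address}"
    fi
}</screen>
      <para>Run <literal>check_local_name</literal> on each server and client. If it prints a
        warning, correct the entry for the host name in <literal>/etc/hosts</literal> so that it
        maps to the address of the interface used by LNet.</para>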
 660     </section>
 661     <section remap="h3" xml:id="went_back_in_time">
 662       <title>Handling/Debugging &quot;LustreError: xxx went back in time&quot;</title>
 663       <para>Each time the MDS or OSS modifies the state of the MDT or OST disk
 664       filesystem for a client, it records a per-target increasing transaction
 665       number for the operation and returns it to the client along with the
 666       reply to that operation. Periodically, when the server commits these
 667       transactions to disk, the <literal>last_committed</literal> transaction
 668       number is returned to the client to allow it to discard pending operations
 669       from memory, as they will no longer be needed for recovery in case of
 670       server failure.</para>
 671       <para>In some cases error messages similar to the following have
 672       been observed after a server was restarted or failed over:</para>
 673       <screen>
 674 LustreError: 3769:0:(import.c:517:ptlrpc_connect_interpret())
 675 testfs-ost12_UUID went back in time (transno 831 was previously committed,
 676 server now claims 791)!
 677       </screen>
 678       <para>This situation arises when:</para>
 679       <itemizedlist>
 680         <listitem>
 681           <para>You are using a disk device that claims to have data written
 682           to disk before it actually does, as in the case of a device with a large
 683           cache. If that disk device crashes or loses power in a way that
 684           causes the loss of the cache, there can be a loss of transactions
 685           that you believe are committed. This is a very serious event, and
 686           you should run <literal>e2fsck</literal> against that storage before restarting the
 687           Lustre file system.</para>
 688         </listitem>
 689         <listitem>
 690           <para>The shared storage used for failover is not completely
 691           cache-coherent, as the Lustre software requires. Cache coherency
 692           ensures that if one server takes over for another, it sees the most
 693           up-to-date and accurate copy of the data. If the shared storage does
 694           not provide cache coherency between all of its ports, the Lustre
 695           software can produce this error when a server fails over.</para>
 696         </listitem>
 697       </itemizedlist>
 698       <para>If you know the exact reason for the error, then it is safe to
 699       proceed with no further action. If you do not know the reason, then this
 700       is a serious issue and you should explore it with your disk vendor.</para>
 701       <para>If the error occurs during failover, examine your disk cache
 702       settings. If it occurs after a restart without failover, try to
 703       determine how the disk can report that a write succeeded and then lose the
 704       data, for example through device corruption or disk errors.</para>
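      <para>The size of the rollback can be read from the message itself: it is the difference
        between the transaction number the client saw committed and the one the server now
        reports. As a minimal sketch, assuming the message wording shown above and a single
        message on standard input (the function name <literal>transno_gap</literal> is
        hypothetical):</para>
      <screen># Print how many transaction numbers the server went back by.
# Input: one "went back in time" message, possibly wrapped over several lines.
transno_gap() {
    tr '\n' ' ' |
    sed -n 's/.*transno \([0-9][0-9]*\) was previously committed, *server now claims \([0-9][0-9]*\).*/\1 \2/p' |
    awk '{ print $1 - $2 }'
}</screen>
      <para>For the example message above, this prints <literal>40</literal> (831 minus
        791).</para>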
 705     </section>
 706     <section remap="h3">
 707       <title>Lustre Error: &quot;<literal>Slow Start_Page_Write</literal>&quot;</title>
 708       <para>The slow <literal>start_page_write</literal> message appears when the operation takes an extremely long time to allocate a batch of memory pages. These pages are used to receive network traffic first, and are then written to disk.</para>
 709     </section>
 710     <section remap="h3">
 711       <title>Drawbacks in Doing Multi-client O_APPEND Writes</title>
 712       <para>It is possible to do multi-client <literal>O_APPEND</literal> writes to a single file, but there are a few drawbacks that may make this a sub-optimal solution. These drawbacks are:</para>
 713       <itemizedlist>
 714         <listitem>
 715           <para>Each client needs to take an <literal>EOF</literal> lock on all the OSTs, as it is difficult to know which OST holds the end of the file until you check all the OSTs. As all the clients are appending to the same file with <literal>O_APPEND</literal>, there is significant locking overhead.</para>
 716         </listitem>
 717         <listitem>
 718           <para>A second client cannot acquire all of the locks until the first client has finished writing, because taking the locks serializes all writes from the clients.</para>
 719         </listitem>
 720         <listitem>
 721           <para>To avoid deadlocks, these locks are taken in a known, consistent order. Because a client cannot know which OST holds the next piece of the file until it has locks on all OSTs, all of these locks are needed for a striped file.</para>
 722         </listitem>
 723       </itemizedlist>
 724     </section>
 725     <section remap="h3">
 726       <title><indexterm>
 727           <primary>troubleshooting</primary>
 728           <secondary>slowdown during startup</secondary>
 729         </indexterm>Slowdown Occurs During Lustre File System Startup</title>
 730       <para>When a Lustre file system starts, it needs to read in data from the disk. For the very
 731         first mdsrate run after the reboot, the MDS needs to wait on all the OSTs for object
 732         pre-creation. This causes a slowdown to occur when the file system starts up.</para>
 733       <para>After the file system has been running for some time, it holds more data in cache, so the variability caused by reading critical metadata from disk is mostly eliminated. The file system now reads data from the cache.</para>
 734     </section>
 735     <section remap="h3">
 736       <title><indexterm><primary>troubleshooting</primary><secondary>OST out of memory</secondary></indexterm>Log Message &apos;<literal>Out of Memory</literal>&apos; on OST</title>
 737       <para>When planning the hardware for an OSS node, consider the memory usage of several
 738         components in the Lustre file system. If insufficient memory is available, an &apos;out of
 739         memory&apos; message can be logged.</para>
 740       <para>During normal operation, several conditions indicate insufficient RAM on a server node:</para>
 741       <itemizedlist>
 742         <listitem>
 743           <para> kernel &quot;<literal>Out of memory</literal>&quot; and/or &quot;<literal>oom-killer</literal>&quot; messages</para>
 744         </listitem>
 745         <listitem>
 746           <para> Lustre &quot;<literal>kmalloc of &apos;mmm&apos; (NNNN bytes) failed...</literal>&quot; messages</para>
 747         </listitem>
 748         <listitem>
 749           <para> Lustre or kernel stack traces showing processes stuck in &quot;<literal>try_to_free_pages</literal>&quot;</para>
 750         </listitem>
 751       </itemizedlist>
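      <para>The indicators listed above can be searched for in the kernel or system log. The
        following is a minimal sketch; the function name
        <literal>count_memory_pressure</literal> is hypothetical, and the patterns are taken
        from the message fragments quoted in the list, so they may need adjusting to the exact
        wording in your logs.</para>
      <screen># Count log lines that match the low-memory indicators listed above.
# Input: kernel or system log text on standard input.
count_memory_pressure() {
    grep -c -E 'Out of memory|oom-killer|kmalloc of .* failed|try_to_free_pages'
}</screen>
      <para>For example, <literal>dmesg | count_memory_pressure</literal> prints the number of
        matching lines in the kernel ring buffer of the server.</para>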
 752       <para>For information on determining the MDS memory and OSS memory
 753       requirements, see <xref linkend="dbdoclet.mds_oss_memory"/>.</para>
 754     </section>
 755     <section remap="h3">
 756       <title>Setting SCSI I/O Sizes</title>
 757       <para>Some SCSI drivers default to a maximum I/O size that is too small for good Lustre file
 758         system performance. We have fixed quite a few drivers, but you may still find that some
 759         drivers give unsatisfactory performance with the Lustre file system. As the default value is
 760         hard-coded, you need to recompile the drivers to change their default. On the other hand,
 761         some drivers may have a wrong default set.</para>
 762       <para>If you suspect bad I/O performance and an analysis of Lustre file system statistics
 763         indicates that I/O requests are not 1 MB in size, check
 764           <literal>/sys/block/<replaceable>device</replaceable>/queue/max_sectors_kb</literal>. If
 765         the <literal>max_sectors_kb</literal> value is less than 1024, set it to at least 1024 to
 766         improve performance. If changing <literal>max_sectors_kb</literal> does not change the I/O
 767         size as reported by the Lustre software, you may want to examine the SCSI driver
 768         code.</para>
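      <para>The check and adjustment described above can be sketched as a small shell function.
        This is an illustration only: the function name <literal>raise_max_sectors</literal>
        and the device name <literal>sdX</literal> are placeholders, and the second argument
        exists only so that the function can be tried against a scratch directory instead of
        <literal>/sys/block</literal>.</para>
      <screen># Raise max_sectors_kb for a block device to 1024 if it is currently lower.
# $1: device name (for example sdX, a placeholder)
# $2: sysfs block directory (defaults to /sys/block)
raise_max_sectors() {
    file="${2:-/sys/block}/$1/queue/max_sectors_kb"
    current=$(cat "$file")
    if [ "$current" -ge 1024 ]; then
        echo "$1: max_sectors_kb is already $current"
    else
        echo 1024 > "$file"            # requires root on a real system
        echo "$1: max_sectors_kb raised from $current to 1024"
    fi
}</screen>
      <para>A value written this way does not persist across a reboot, and it cannot exceed the
        limit reported in <literal>max_hw_sectors_kb</literal> in the same directory.</para>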
 769     </section>
 770   </section>
 771 </chapter>
 772 <!--
 773   vim:expandtab:shiftwidth=2:tabstop=8:
 774   -->