LU-5537 ptlrpc: Fix an rq_no_reply assertion failure
An OSS had an assertion failure:
LustreError: 5366:0:(ldlm_lib.c:2689:target_bulk_io()) @@@ timeout
on bulk GET after 0+0s req@ffff88083a61b400
x1476486691018500/t0(4300509964)
o4->8dda3382-83f8-6445-5eea-828fd59e4a06@192.168.1.116@o2ib1:0/0
lens 504/448 e 391470 to 0 dl 1408494729 ref 2 fl Complete:/4/0 rc 0/0
LustreError: 5432:0:(niobuf.c:550:ptlrpc_send_reply()) ASSERTION(
req->rq_no_reply == 0 ) failed:
Lustre: soaked-OST0000: Bulk IO write error with
8dda3382-83f8-6445-5eea-828fd59e4a06 (at 192.168.1.116@o2ib1),
client will retry: rc -110
LustreError: 5432:0:(niobuf.c:550:ptlrpc_send_reply()) LBUG
Pid: 5432, comm: ll_ost_io03_003
Call Trace:
[<ffffffffa0641895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa0641e97>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa09cda4c>] ptlrpc_send_reply+0x4ec/0x7f0 [ptlrpc]
[<ffffffffa09d4aae>] ? lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]
[<ffffffffa09e4d75>] ptlrpc_at_check_timed+0xcd5/0x1370 [ptlrpc]
[<ffffffffa09dc1e9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
[<ffffffffa09e66f8>] ptlrpc_main+0x12e8/0x1990 [ptlrpc]
[<ffffffff81069290>] ? pick_next_task_fair+0xd0/0x130
[<ffffffff81529246>] ? schedule+0x176/0x3b0
[<ffffffffa09e5410>] ? ptlrpc_main+0x0/0x1990 [ptlrpc]
[<ffffffff8109abf6>] kthread+0x96/0xa0
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffff8109ab60>] ? kthread+0x0/0xa0
[<ffffffff8100c200>] ? child_rip+0x0/0x20
The thread in tgt_brw_write() had decided not to reply and set
rq_no_reply just before another thread, running
ptlrpc_at_check_timed(), tried to send an early reply for the same
request.
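The interleaving described above can be replayed in a small,
single-threaded model. This is only a sketch: struct model_req and
every model_* function are hypothetical stand-ins, not Lustre code,
and the flag check in model_send_early_reply() is an assumed way of
tolerating the race, not necessarily what this patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the request; only the flag matters here. */
struct model_req {
	bool rq_no_reply;	/* set when the handler decides not to reply */
	int replies_sent;	/* number of replies that went out */
};

/* Step 1: the I/O handler times out on the bulk transfer and gives up. */
static void model_bulk_timeout(struct model_req *req)
{
	req->rq_no_reply = true;
}

/* Mirrors the failing check: sending is only legal while the flag is clear. */
static void model_send_reply(struct model_req *req)
{
	assert(!req->rq_no_reply);
	req->replies_sent++;
}

/*
 * Step 2: the early-reply path runs afterwards. Calling
 * model_send_reply() unconditionally would trip the assertion, so this
 * variant (an assumption) skips the send when the flag is already set.
 * Returns 0 if a reply was sent, -1 if it was suppressed.
 */
static int model_send_early_reply(struct model_req *req)
{
	if (req->rq_no_reply)
		return -1;
	model_send_reply(req);
	return 0;
}

/* Replays the order seen in the log: timeout first, early reply second. */
static int model_replay_race(void)
{
	struct model_req req = { .rq_no_reply = false, .replies_sent = 0 };

	model_bulk_timeout(&req);
	return model_send_early_reply(&req);
}
```

The model runs both steps in one thread, so it only shows the ordering
that triggers the assertion. With two real threads, the flag check and
the send would also have to be serialized against the setter.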
Change-Id: I9096a098621a38610c0d0d2dff016c012fc4b7f2
Signed-off-by: Li Wei <wei.g.li@intel.com>
Reviewed-on: http://review.whamcloud.com/11740
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>