From: Li Wei
Date: Wed, 3 Sep 2014 09:02:22 +0000 (+0800)
Subject: LU-5537 ptlrpc: Fix an rq_no_reply assertion failure
X-Git-Tag: 2.6.91~66
X-Git-Url: https://git.whamcloud.com/?p=fs%2Flustre-release.git;a=commitdiff_plain;h=a8d448e4cd5978c546911f98067232bcdd30b651

LU-5537 ptlrpc: Fix an rq_no_reply assertion failure

An OSS had an assertion failure:

  LustreError: 5366:0:(ldlm_lib.c:2689:target_bulk_io()) @@@ timeout on bulk GET after 0+0s req@ffff88083a61b400 x1476486691018500/t0(4300509964) o4->8dda3382-83f8-6445-5eea-828fd59e4a06@192.168.1.116@o2ib1:0/0 lens 504/448 e 391470 to 0 dl 1408494729 ref 2 fl Complete:/4/0 rc 0/0
  LustreError: 5432:0:(niobuf.c:550:ptlrpc_send_reply()) ASSERTION( req->rq_no_reply == 0 ) failed:
  Lustre: soaked-OST0000: Bulk IO write error with 8dda3382-83f8-6445-5eea-828fd59e4a06 (at 192.168.1.116@o2ib1), client will retry: rc -110
  LustreError: 5432:0:(niobuf.c:550:ptlrpc_send_reply()) LBUG
  Pid: 5432, comm: ll_ost_io03_003

  Call Trace:
   [] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
   [] lbug_with_loc+0x47/0xb0 [libcfs]
   [] ptlrpc_send_reply+0x4ec/0x7f0 [ptlrpc]
   [] ? lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]
   [] ptlrpc_at_check_timed+0xcd5/0x1370 [ptlrpc]
   [] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
   [] ptlrpc_main+0x12e8/0x1990 [ptlrpc]
   [] ? pick_next_task_fair+0xd0/0x130
   [] ? schedule+0x176/0x3b0
   [] ? ptlrpc_main+0x0/0x1990 [ptlrpc]
   [] kthread+0x96/0xa0
   [] child_rip+0xa/0x20
   [] ? kthread+0x0/0xa0
   [] ? child_rip+0x0/0x20

The thread in tgt_brw_write() had decided not to reply by setting
rq_no_reply, right before another thread tried to send an early reply
for the request.

Change-Id: I9096a098621a38610c0d0d2dff016c012fc4b7f2
Signed-off-by: Li Wei
Reviewed-on: http://review.whamcloud.com/11740
Tested-by: Jenkins
Tested-by: Maloo
Reviewed-by: Andreas Dilger
Reviewed-by: Johann Lombardi
---

diff --git a/lustre/ptlrpc/service.c b/lustre/ptlrpc/service.c
index 8747321..5b72b9f 100644
--- a/lustre/ptlrpc/service.c
+++ b/lustre/ptlrpc/service.c
@@ -1354,6 +1354,14 @@ static int ptlrpc_at_send_early_reply(struct ptlrpc_request *req)
 	reqcopy->rq_reqmsg = reqmsg;
 	memcpy(reqmsg, req->rq_reqmsg, req->rq_reqlen);
 
+	/*
+	 * tgt_brw_read() and tgt_brw_write() may have decided not to reply.
+	 * Without this check, we would fail the rq_no_reply assertion in
+	 * ptlrpc_send_reply().
+	 */
+	if (reqcopy->rq_no_reply)
+		GOTO(out, rc = -ETIMEDOUT);
+
 	LASSERT(atomic_read(&req->rq_refcount));
 	/** if it is last refcount then early reply isn't needed */
 	if (atomic_read(&req->rq_refcount) == 1) {
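
The ordering problem described in the commit message can be illustrated outside the kernel with a small user-space sketch. This is not Lustre code: `struct mock_request`, `mock_send_reply()`, `mock_send_early_reply()` and `demo_early_reply_after_no_reply()` are hypothetical stand-ins, the request is reduced to the single `rq_no_reply` flag, and the two threads are replaced by two sequential steps. It assumes, as the patch's use of `reqcopy->rq_no_reply` suggests, that the early-reply path works on a copy of the request that carries the flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for struct ptlrpc_request. */
struct mock_request {
	bool rq_no_reply;	/* set once the I/O handler decides not to reply */
};

/* Stand-in for ptlrpc_send_reply(): it asserts that replying is allowed. */
static int mock_send_reply(const struct mock_request *req)
{
	assert(req->rq_no_reply == 0);
	return 0;
}

/* Stand-in for ptlrpc_at_send_early_reply() with the check from the patch. */
static int mock_send_early_reply(const struct mock_request *req)
{
	struct mock_request reqcopy;

	/* Assumption: the early-reply path operates on a copy of the request. */
	memcpy(&reqcopy, req, sizeof(reqcopy));

	/* The added check: give up instead of tripping the assertion. */
	if (reqcopy.rq_no_reply)
		return -ETIMEDOUT;

	return mock_send_reply(&reqcopy);
}

/* Replays the reported order: the handler opts out, then an early reply is tried. */
static int demo_early_reply_after_no_reply(void)
{
	struct mock_request req = { .rq_no_reply = false };

	req.rq_no_reply = true;			/* step 1: bulk I/O timed out */
	return mock_send_early_reply(&req);	/* step 2: early reply attempt */
}
```

Calling `demo_early_reply_after_no_reply()` from a `main()` returns `-ETIMEDOUT`. With the check removed from `mock_send_early_reply()`, the same call sequence would reach the `assert()` in `mock_send_reply()`, which mirrors the LBUG in the log above.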