LU-16356 hsm: add running ref to the coordinator
This patch replaces fe5706e by adding a reference, "cdt_ref", that is
held while the coordinator is running (instead of trusting the HSM
state). This prevents the coordinator from being de-initialized while
it is still in use (e.g. by a thread adding an HSM request) and avoids
complex locking on the HSM state.
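The "running ref" idea can be illustrated with a minimal, single-file
sketch. It assumes a simplified, hypothetical struct coordinator with
hypothetical helpers cdt_start(), cdt_getref(), cdt_putref() and
cdt_stop(); none of these are the actual Lustre definitions. Starting
takes an initial reference, users take a reference only while the count
is non-zero, and teardown runs when the last reference is dropped, so
stopping while a request is in flight defers the de-init.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified coordinator: NOT the real Lustre struct. */
struct coordinator {
	atomic_int cdt_ref;   /* 0: torn down; >= 1: running ref + users */
	int deinit_count;     /* how many times teardown ran (illustration) */
};

/* Teardown runs only once the last reference is dropped. */
static void cdt_deinit(struct coordinator *cdt)
{
	cdt->deinit_count++;
}

/* Starting the coordinator takes the initial "running" reference. */
static void cdt_start(struct coordinator *cdt)
{
	cdt->deinit_count = 0;
	atomic_store(&cdt->cdt_ref, 1);
}

/* Take a reference only if the count is still non-zero. */
static bool cdt_getref(struct coordinator *cdt)
{
	int old = atomic_load(&cdt->cdt_ref);

	while (old > 0) {
		/* on failure, 'old' is reloaded with the current value */
		if (atomic_compare_exchange_weak(&cdt->cdt_ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; whoever drops the last one performs teardown. */
static void cdt_putref(struct coordinator *cdt)
{
	int old = atomic_fetch_sub(&cdt->cdt_ref, 1);

	assert(old > 0);          /* catch unbalanced put */
	if (old == 1)
		cdt_deinit(cdt);
}

/* Stopping only drops the running reference; teardown may be deferred. */
static void cdt_stop(struct coordinator *cdt)
{
	cdt_putref(cdt);
}
```

For example, a thread that called cdt_getref() before cdt_stop() keeps
the coordinator alive until its own cdt_putref(); after that,
cdt_getref() fails. The compare-and-swap loop is used so a reference is
never taken once the count has already reached zero.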
It also makes the coordinator thread exit if
cdt_start_pending_restore() fails; otherwise, such a failure can lead
to unexpected behavior (hangs, crashes).
The patch also modifies mdc_kuc_reregister() to register the HSM agent
in the background, which makes reconnection independent of agent
registration. This allows resend to be re-enabled for HSM_CT_REGISTER
without reintroducing the LU-13455 issue (connect to MDT getting
stuck). The coordinator returns EINPROGRESS when it is not ready, and
the client resends the request in that case, so copytools can wait for
the coordinator to become ready.
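The client-side resend behavior can be sketched as a simple retry loop.
This is an illustration only, assuming a hypothetical fake_mdt stand-in
whose fake_ct_register() returns -EINPROGRESS until the coordinator is
"ready"; it is not the real ptlrpc resend path. Any result other than
-EINPROGRESS is treated as final.

```c
#include <errno.h>

/* Hypothetical stand-in for the MDT side: the coordinator becomes
 * ready after 'ready_after' attempts. Not a real Lustre function. */
struct fake_mdt {
	int attempts;
	int ready_after;
};

static int fake_ct_register(struct fake_mdt *mdt)
{
	if (mdt->attempts++ < mdt->ready_after)
		return -EINPROGRESS;   /* coordinator not ready yet */
	return 0;
}

/* Resend while the server answers -EINPROGRESS, up to max_tries. */
static int register_agent(struct fake_mdt *mdt, int max_tries)
{
	int rc = -EINPROGRESS;

	for (int i = 0; i < max_tries && rc == -EINPROGRESS; i++) {
		rc = fake_ct_register(mdt);
		/* a real client would wait/back off before resending */
	}
	return rc;
}
```

With ready_after = 3, register_agent() succeeds on the fourth attempt;
if max_tries is exhausted first, it returns -EINPROGRESS to the caller.
Running this loop from a background context is what keeps the
reconnect path from blocking on registration.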
Add regression test sanity-hsm 409a.
Fixes: fe5706e ("LU-16235 hsm: check CDT state before adding actions llog")
Fixes: 3d58403 ("LU-13455 ptlrpc: connect to MDT stucks")
Test-Parameters: testlist=sanity-hsm
Test-Parameters: testlist=sanity-hsm env=ONLY=107,ONLY_REPEAT=20
Test-Parameters: testlist=sanity-hsm env=ONLY=409a,ONLY_REPEAT=20
Test-Parameters: testlist=conf-sanity env=ONLY=132,ONLY_REPEAT=20
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Signed-off-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Change-Id: I14302d1053abbe76eeaaa1a63c6fd6d9b530baa9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51256
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>