In recovery_small/160, we sleep, then check destroys_in_flight
and make sure the number is low, which indicates that destroys are
not blocked.
However, there is a 10s timeout: the destroy RPCs could be retried,
and before they are put back on error_list again, destroys_in_flight
could be bumped up. If the test case happens to check
destroys_in_flight during this window, it could fail.
Use wait_update_facet_cond to wait for the expected drop instead.
Change-Id: I0b29a90e4c78e80a0b5a522d57ed97db1b698364
Test-Parameters: trivial testlist=recovery-small env=ONLY=160,ONLY_REPEAT=100
Fixes: 27f787daa7 ("LU-15737 ofd: don't block destroys")
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/58096
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
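
The polling approach described in the commit message can be illustrated with a minimal, standalone sketch. This is not the real wait_update_facet_cond helper from the Lustre test framework; poll_until_cond and its argument order are hypothetical names chosen for illustration, and the sketch assumes the checked command prints a single integer. Instead of sampling the counter once after a fixed sleep, it re-checks until the condition holds or a deadline passes, so a transient bump during an RPC retry does not fail the check.

```shell
#!/bin/bash
# Hypothetical helper for illustration only; NOT the Lustre test-framework
# implementation of wait_update_facet_cond.
#
# Poll a command until "<output> <op> <expected>" is true or the timeout
# expires. Returns 0 if the condition was met, 1 on timeout.
poll_until_cond() {
    local cmd=$1            # command whose stdout is the value to check
    local op=$2             # test(1) integer operator, e.g. -le or -eq
    local expect=$3         # value to compare against
    local max=$4            # maximum number of seconds to wait
    local interval=${5:-1}  # seconds between polls (assumed to be > 0)
    local waited=0
    local val

    while true; do
        val=$(eval "$cmd")
        # Succeed as soon as the condition holds, even if an earlier
        # sample was too high.
        if [ "$val" "$op" "$expect" ]; then
            return 0
        fi
        # Give up once the deadline has been reached.
        if [ "$waited" -ge "$max" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}
```

A caller would pass the command that prints the counter, the comparison, and a deadline, then check the return code rather than the sampled value, which is the same shape as the `local rc=$?` check in the patch.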
mds_evict_client
client_reconnect
- local step=3
- for ((i = 1; i <= $((timeout / step + 1)); i++)); do
- do_facet mds1 $LCTL get_param osp.$FSNAME-OST0000-osc-MDT0000.destroys_in_flight
- sleep $step
- done
- local rc=$(do_facet mds1 $LCTL get_param -n osp.$FSNAME-OST0000-osc-MDT0000.destroys_in_flight)
+ wait_update_facet_cond --verbose mds1 \
+ "$LCTL get_param -n osp.$FSNAME-OST0000-osc-MDT0000.destroys_in_flight" \
+ "-le" "2" $((timeout * 3))
+ local rc=$?
do_facet mds1 $LCTL get_param osp.$FSNAME-OST0000-osc-MDT0000.error_list
- echo inflight $rc
for ((i = 1; i <= threads; i++)); do
kill -USR1 ${pids[$i]} && wait ${pids[$i]}
done
- (( $rc <= 2 )) || error "destroying OST objects are blocked $rc"
+ (( rc == 0 )) || error "destroying OST objects are blocked"
#without group lock, wait and check if all objects are destroyed
sleep $((timeout * 3))