LU-18392 tests: watch destroys_in_flight recovery_small/160 [96/58096/3]
author Li Dongyang <dongyangli@ddn.com>
Mon, 17 Feb 2025 03:37:55 +0000 (14:37 +1100)
committer Oleg Drokin <green@whamcloud.com>
Fri, 28 Feb 2025 08:16:01 +0000 (08:16 +0000)
In recovery_small/160, we sleep and check for destroys_in_flight
and make sure the number is low, which indicates destroys are not
blocked.

However, there is a 10s timeout: the destroy RPCs can be retried, and
before they are put back on error_list, destroys_in_flight can be
bumped up. If the test case happens to check destroys_in_flight
during this window, it can fail.

Use wait_update_facet_cond to watch for the expected drop instead of
sampling once after a fixed sleep.
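
The difference between sampling once and watching for a condition can be
sketched as below. This is a minimal illustration only, not the Lustre
test-framework code: poll_until and its argument order are hypothetical
names chosen for this example. It re-runs a command until its numeric
output satisfies a comparison or a deadline passes, so a momentary bump
in the counter does not fail the check.

```shell
#!/bin/bash
# Hypothetical helper for illustration; not the test-framework implementation.
# Usage: poll_until CMD OP EXPECTED MAX_WAIT [INTERVAL]
# Re-runs CMD until "[ output OP EXPECTED ]" holds or MAX_WAIT seconds elapse.
poll_until() {
	local cmd=$1 op=$2 expect=$3 max_wait=$4 interval=${5:-1}
	local waited=0 result

	while true; do
		result=$(eval "$cmd")
		# Succeed as soon as the comparison holds, e.g. "3 -le 2" is false.
		if [ "$result" "$op" "$expect" ]; then
			echo "condition met after ${waited}s: $result $op $expect"
			return 0
		fi
		# Give up once the deadline has been reached.
		(( waited >= max_wait )) && break
		sleep "$interval"
		waited=$((waited + interval))
	done
	echo "timed out after ${waited}s: last value $result"
	return 1
}
```

A call such as `poll_until "cat /tmp/counter" -le 2 30` (the path is a
placeholder) returns 0 as soon as the value is at most 2 and returns 1
only if it never gets there within 30 seconds. A transient spike only
delays success; it does not cause a failure.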

Change-Id: I0b29a90e4c78e80a0b5a522d57ed97db1b698364
Test-Parameters: trivial testlist=recovery-small env=ONLY=160,ONLY_REPEAT=100
Fixes: 27f787daa7 ("LU-15737 ofd: don't block destroys")
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/58096
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
lustre/tests/recovery-small.sh

index d4a03db..7981998 100755 (executable)
@@ -3752,19 +3752,16 @@ test_160() {
        mds_evict_client
        client_reconnect
 
-       local step=3
-       for ((i = 1; i <= $((timeout / step + 1)); i++)); do
-               do_facet mds1 $LCTL get_param osp.$FSNAME-OST0000-osc-MDT0000.destroys_in_flight
-               sleep $step
-       done
-       local rc=$(do_facet mds1 $LCTL get_param -n osp.$FSNAME-OST0000-osc-MDT0000.destroys_in_flight)
+       wait_update_facet_cond --verbose mds1 \
+               "$LCTL get_param -n osp.$FSNAME-OST0000-osc-MDT0000.destroys_in_flight" \
+               "-le" "2" $((timeout * 3))
+       local rc=$?
        do_facet mds1 $LCTL get_param osp.$FSNAME-OST0000-osc-MDT0000.error_list
-       echo inflight $rc
        for ((i = 1; i <= threads; i++)); do
                kill -USR1 ${pids[$i]} && wait ${pids[$i]}
        done
 
-       (( $rc <= 2 )) || error "destroying OST objects are blocked $rc"
+       (( rc == 0 )) || error "destroying OST objects are blocked"
 
        #without group lock, wait and check if all objects are destroyed
        sleep $((timeout * 3))