LU-17888 osd-ldiskfs: osd_scrub_refresh_mapping deadlock 67/55267/4
author Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Thu, 30 May 2024 16:23:25 +0000
committer Oleg Drokin <green@whamcloud.com>
Fri, 21 Jun 2024 04:57:24 +0000
commit 37ea88ad8dc35718a815533e47a59e3865c3d307
tree 0da6256d98d5e4397045cfcab9e7db3e1b9765d9
parent 0fa5515b8bcfc310d5cfea30b81fdd3b09e26bcc
LU-17888 osd-ldiskfs: osd_scrub_refresh_mapping deadlock

After copying a Lustre special file (last_rcvd, for example)
to a new inode, the Lustre mount hangs with the following stack trace:

[root@testbed ~]# cat /proc/$(pidof mount.lustre)/stack
[<0>] rwsem_down_write_slowpath+0x32a/0x610
[<0>] osd_obj_update_entry.isra.22+0xb7/0x900 [osd_ldiskfs]
[<0>] osd_obj_spec_update+0x146/0x160 [osd_ldiskfs]
[<0>] osd_scrub_refresh_mapping+0x282/0x420 [osd_ldiskfs]
[<0>] osd_ios_scan_one+0x5df/0xe10 [osd_ldiskfs]
[<0>] osd_ios_root_fill+0x267/0x300 [osd_ldiskfs]
[<0>] call_filldir+0xb0/0x120 [ldiskfs]
[<0>] ldiskfs_readdir+0x7a7/0xac0 [ldiskfs]
[<0>] iterate_dir+0x13c/0x190
[<0>] osd_ios_general_scan+0x10e/0x250 [osd_ldiskfs]
[<0>] osd_initial_OI_scrub+0x72/0x920 [osd_ldiskfs]
[<0>] osd_scrub_setup+0x8ab/0x9e0 [osd_ldiskfs]
[<0>] osd_device_init0+0x447/0x810 [osd_ldiskfs]
[<0>] osd_device_alloc+0x186/0x220 [osd_ldiskfs]
[<0>] obd_setup+0x115/0x2d0 [obdclass]
[<0>] class_setup+0x57f/0x790 [obdclass]
[<0>] class_process_config+0x1104/0x2460 [obdclass]
[<0>] do_lcfg+0x21d/0x530 [obdclass]
[<0>] lustre_start_simple+0x77/0x1d0 [obdclass]
[<0>] osd_start+0x408/0x7f0 [obdclass]
[<0>] server_fill_super+0x382/0x10d0 [obdclass]
[<0>] lustre_fill_super+0x3a1/0x3f0 [lustre]
[<0>] mount_nodev+0x48/0xa0
[<0>] legacy_get_tree+0x27/0x40
[<0>] vfs_get_tree+0x25/0xb0
[<0>] do_mount+0x2e2/0x950
[<0>] ksys_mount+0xb6/0xd0
[<0>] __x64_sys_mount+0x21/0x30
[<0>] do_syscall_64+0x5b/0x1a0
[<0>] entry_SYSCALL_64_after_hwframe+0x65/0xca

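The root inode lock is taken twice by the same thread:
first in iterate_dir(), and again in osd_obj_update_entry(),
which is reached from the readdir callback while the first
hold is still in place. The second attempt can never succeed.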
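
The double acquisition described above can be illustrated with a small
userspace analogy. This is only a sketch under stated assumptions: it uses
Python's non-reentrant threading.Lock as a stand-in for the root inode lock,
the function names merely mirror the roles seen in the stack trace, and it
does not reproduce the kernel code or the change made by this patch. A
non-blocking acquire is used so the demo reports the failure instead of
hanging the way the mount does.

```python
import threading

# Stand-in for the root directory inode lock (assumption: modeled as a
# plain, non-reentrant mutex; the kernel lock is a rw-semaphore).
dir_lock = threading.Lock()


def update_entry():
    """Plays the role of osd_obj_update_entry(): needs the directory lock."""
    # Non-blocking attempt, so a held lock is reported rather than waited on.
    got = dir_lock.acquire(blocking=False)
    if got:
        dir_lock.release()
    return got


def iterate_dir(callback):
    """Plays the role of iterate_dir(): holds the lock across the callback."""
    with dir_lock:
        return callback()


# Called from inside the iteration, the lock is already held by this thread.
nested_ok = iterate_dir(update_entry)
# Called on its own, the lock is free and the acquisition succeeds.
standalone_ok = update_entry()

print(nested_ok, standalone_ok)
```

With a blocking acquire in update_entry(), the nested call would wait forever
on a lock its own thread holds, which matches the hang seen in the trace.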

HPE-bug-id: LUS-12368
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Idc5f9bd2a20d25dfb5eb4a044ddd00ff7eb4558b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55267
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lustre/osd-ldiskfs/osd_compat.c
lustre/osd-ldiskfs/osd_internal.h
lustre/osd-ldiskfs/osd_oi.c
lustre/osd-ldiskfs/osd_scrub.c