From e2ac5f28c06a34318c9eb2c741ffbf47eea4690d Mon Sep 17 00:00:00 2001
From: Kevin Zhao
Date: Wed, 22 Dec 2021 09:53:27 +0800
Subject: [PATCH] LU-15364 ldlm: Kernel oops when stripe on Arm64 multiple MDTs

When Lustre is set up with multiple MDTs, the `set_bit` operation on
ha_map must be atomic. On the Arm64 platform, atomic operations rely
on exclusive access, which requires an aligned address[1]. That is
why the crash is reported at __ll_sc_atomic64_or+0x4: the instruction
at that address is LDXR, which loads the value from the address
exclusively.

A 64-bit atomic operation requires a 64-bit aligned address, but the
struct member ha_map is only 4-byte aligned. That is the root cause.

The error code of this crash is ESR = 0x96000021, which indicates an
alignment fault[2].

1. https://developer.arm.com/documentation/den0024/a/ch05s01s02
2. https://developer.arm.com/documentation/ddi0595/2021-06/AArch64-Registers/ESR-EL1--Exception-Syndrome-Register--EL1-

Signed-off-by: Kevin Zhao
Change-Id: I3cc6d7347f05680ab55f00538e91886f006deb5d
Reviewed-on: https://review.whamcloud.com/45922
Tested-by: jenkins
Reviewed-by: James Simmons
Reviewed-by: xinliang
Tested-by: Maloo
Reviewed-by: Andreas Dilger
Reviewed-by: Oleg Drokin
---
 lustre/include/lustre_dlm.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/lustre/include/lustre_dlm.h b/lustre/include/lustre_dlm.h
index d40bcc8..dc738f4 100644
--- a/lustre/include/lustre_dlm.h
+++ b/lustre/include/lustre_dlm.h
@@ -1032,12 +1032,10 @@ struct ldlm_match_data {
  * which is for server. */
 #define l_slc_link l_rk_ast
 
-#define HANDLE_MAP_SIZE  ((LMV_MAX_STRIPE_COUNT + 7) >> 3)
-
 struct lustre_handle_array {
 	unsigned int		ha_count;
 	/* ha_map is used as bit flag to indicate handle is remote or local */
-	char			ha_map[HANDLE_MAP_SIZE];
+	DECLARE_BITMAP(ha_map, LMV_MAX_STRIPE_COUNT);
 	struct lustre_handle	ha_handles[0];
 };
 
--
1.8.3.1
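
The alignment argument in the commit message can be checked in user space. The following is a minimal sketch, not Lustre code: DEMO_MAX_STRIPE_COUNT is a hypothetical stand-in for LMV_MAX_STRIPE_COUNT, DEMO_DECLARE_BITMAP only approximates the kernel's DECLARE_BITMAP() as an array of unsigned long, and both structs omit the trailing ha_handles member. It shows that a char array placed after an unsigned int starts at offset 4, while an unsigned long array is padded out to the alignment of unsigned long (8 bytes on Arm64).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in; the real LMV_MAX_STRIPE_COUNT lives in the Lustre headers. */
#define DEMO_MAX_STRIPE_COUNT 2000

/* User-space approximation of the kernel's BITS_TO_LONGS()/DECLARE_BITMAP(). */
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_BITS_TO_LONGS(n) \
	(((n) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)
#define DEMO_DECLARE_BITMAP(name, bits) \
	unsigned long name[DEMO_BITS_TO_LONGS(bits)]

/* Layout before the patch: a byte array directly after a 4-byte counter. */
struct old_layout {
	unsigned int	ha_count;
	char		ha_map[(DEMO_MAX_STRIPE_COUNT + 7) >> 3];
};

/* Layout after the patch: the bitmap is an array of unsigned long. */
struct new_layout {
	unsigned int	ha_count;
	DEMO_DECLARE_BITMAP(ha_map, DEMO_MAX_STRIPE_COUNT);
};

/* Byte offset of ha_map in the old layout. */
static size_t old_map_offset(void)
{
	return offsetof(struct old_layout, ha_map);
}

/* Byte offset of ha_map in the new layout. */
static size_t new_map_offset(void)
{
	return offsetof(struct new_layout, ha_map);
}

/* Asserts the property the patch relies on. */
static void demo_check_alignment(void)
{
	/* char has alignment 1, so no padding follows ha_count. */
	assert(old_map_offset() == sizeof(unsigned int));
	/* The compiler pads ha_map to the alignment of unsigned long. */
	assert(new_map_offset() % _Alignof(unsigned long) == 0);
}
```

Calling demo_check_alignment() from a test program passes on any conforming C11 compiler. On an LP64 target such as Arm64 the second offset is 8, which satisfies the 64-bit alignment that the exclusive load needs, provided the structure itself is allocated with at least that alignment.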