Whamcloud - gitweb
LU-15364 ldlm: Kernel oops when striping across multiple MDTs on Arm64 22/45922/5
author Kevin Zhao <kevin.zhao@linaro.org>
Wed, 22 Dec 2021 01:53:27 +0000 (09:53 +0800)
committer Oleg Drokin <green@whamcloud.com>
Tue, 18 Jan 2022 09:07:43 +0000 (09:07 +0000)
When Lustre is set up with multiple MDTs, the `set_bit` operation
has to be atomic. On the Arm64 platform, atomic operations rely on
exclusive access, which requires an aligned address[1]. That is why
the crash is reported at __ll_sc_atomic64_or+0x4: the instruction
there is LDXR, which loads the value from the address exclusively.

The atomic64 operations require a 64-bit aligned address, but the
struct member ha_map is only 4-byte aligned; that is the root cause.
The error code of this crash is ESR = 0x96000021, which indicates an
alignment fault[2].
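
As a rough illustration of how the ESR value above maps to an alignment
fault, the sketch below decodes it using the ESR_EL1 field layout from
reference [2] (EC in bits [31:26], IL in bit 25, DFSC in bits [5:0] of
the ISS for data aborts). The macro names are hypothetical and are not
taken from the Lustre or kernel sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper macros; field positions follow the ESR_EL1 layout. */
#define ESR_EC(esr)    (((uint32_t)(esr) >> 26) & 0x3fu) /* exception class */
#define ESR_IL(esr)    (((uint32_t)(esr) >> 25) & 0x1u)  /* instruction length bit */
#define ESR_DFSC(esr)  ((uint32_t)(esr) & 0x3fu)         /* data fault status code */

#define EC_DABT_SAME_EL  0x25u /* data abort taken without a change in EL */
#define DFSC_ALIGNMENT   0x21u /* alignment fault */

/* The value reported in the oops. */
#define LU15364_ESR  0x96000021u

/* Compile-time checks: the oops is a data abort caused by misalignment. */
static_assert(ESR_EC(LU15364_ESR) == EC_DABT_SAME_EL,
              "EC should decode to a data abort at the current EL");
static_assert(ESR_DFSC(LU15364_ESR) == DFSC_ALIGNMENT,
              "DFSC should decode to an alignment fault");

/* Runtime helper for classifying an arbitrary ESR value. */
static inline int esr_is_alignment_fault(uint32_t esr)
{
        return ESR_EC(esr) == EC_DABT_SAME_EL &&
               ESR_DFSC(esr) == DFSC_ALIGNMENT;
}
```

Calling esr_is_alignment_fault(0x96000021) returns nonzero, which is
consistent with the LDXR on a misaligned address described above.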

1. https://developer.arm.com/documentation/den0024/a/ch05s01s02
2. https://developer.arm.com/documentation/ddi0595/2021-06/
   AArch64-Registers/ESR-EL1--Exception-Syndrome-Register--EL1-

Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>
Change-Id: I3cc6d7347f05680ab55f00538e91886f006deb5d
Reviewed-on: https://review.whamcloud.com/45922
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: xinliang <xinliang.liu@linaro.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lustre/include/lustre_dlm.h

index d40bcc8..dc738f4 100644
@@ -1032,12 +1032,10 @@ struct ldlm_match_data {
  *  which is for server. */
 #define l_slc_link l_rk_ast
 
-#define HANDLE_MAP_SIZE  ((LMV_MAX_STRIPE_COUNT + 7) >> 3)
-
 struct lustre_handle_array {
        unsigned int            ha_count;
        /* ha_map is used as bit flag to indicate handle is remote or local */
-       char                    ha_map[HANDLE_MAP_SIZE];
+       DECLARE_BITMAP(ha_map, LMV_MAX_STRIPE_COUNT);
        struct lustre_handle    ha_handles[0];
 };
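
The effect of the change above on the member offset can be checked with
a small user-space sketch. Everything prefixed with SKETCH_ or sketch_
is a hypothetical stand-in: the stripe count of 2000 is an assumed
value, the bitmap macro only mimics the kernel's DECLARE_BITMAP()
(an array of unsigned long), and a flexible array member replaces
ha_handles[0]. It is not the Lustre code itself.

```c
#include <assert.h>
#include <limits.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value for illustration only. */
#define SKETCH_MAX_STRIPE_COUNT 2000

/* User-space stand-ins for BITS_TO_LONGS() and DECLARE_BITMAP(). */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITS_TO_LONGS(n) \
        (((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)
#define SKETCH_DECLARE_BITMAP(name, bits) \
        unsigned long name[SKETCH_BITS_TO_LONGS(bits)]

struct sketch_handle {
        uint64_t cookie;
};

/* Old layout: the char array starts right after the 4-byte counter. */
struct sketch_handle_array_old {
        unsigned int            ha_count;
        char                    ha_map[(SKETCH_MAX_STRIPE_COUNT + 7) >> 3];
        struct sketch_handle    ha_handles[];
};

/* New layout: the compiler pads so that the unsigned long array is aligned. */
struct sketch_handle_array_new {
        unsigned int            ha_count;
        SKETCH_DECLARE_BITMAP(ha_map, SKETCH_MAX_STRIPE_COUNT);
        struct sketch_handle    ha_handles[];
};

/* A char array has no alignment requirement, so no padding is inserted. */
static_assert(offsetof(struct sketch_handle_array_old, ha_map) ==
              sizeof(unsigned int), "old ha_map follows ha_count directly");

/* The bitmap offset is always a multiple of the unsigned long alignment. */
static_assert(offsetof(struct sketch_handle_array_new, ha_map) %
              alignof(unsigned long) == 0, "new ha_map is long-aligned");
```

On an LP64 target such as Arm64, and assuming the structure itself
starts at an 8-byte-aligned address, the old ha_map sits at offset 4,
so a 64-bit exclusive load on it is misaligned, while the bitmap
version sits at offset 8.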