Whamcloud - gitweb
Olaf Faaland [Mon, 28 Oct 2019 20:34:53 +0000 (13:34 -0700)]
LU-12530 utils: narrow l_tunedisk udev rule
Narrow the udev rule so that it runs l_tunedisk only for ext4 block
devices formatted for Lustre.
Devices which are members of ZFS pools do not need such tunings to
be provided by lustre - they are handled by ZFS.
There are currently no other OSD types in the tree. Sites/Vendors which
support other OSDs will need to adjust the rule appropriately.
Lustre-change: https://review.whamcloud.com/36599
Lustre-commit: 7b2cb54858daa60d560fd6374c4ecba552a10d27
Change-Id: Iba8b20fc705da0259ab71ee33b92193cae7e8eae
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36776
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Alex Zhuravlev [Mon, 30 Sep 2019 20:50:49 +0000 (23:50 +0300)]
LU-12703 utils: reset rootpath in llapi_search_rootpath()
as get_root_path() can use it as a source and fail if
passed pathname contains garbage (on stack);
Lustre-change: https://review.whamcloud.com/36335
Lustre-commit: 3e2e0025d1e929763f9d4de48746c3433d3684d5
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I9f628353c872afc82a582b0a6ca960cd0e8cffcb
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36482
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Artem Blagodarenko [Wed, 28 Nov 2018 20:37:53 +0000 (23:37 +0300)]
LU-1365 utils: allow set block size for ldiskfs backend
Add a “-b” option to mkfs.lustre that allows setting the backend block size.
Lustre-change: https://review.whamcloud.com/33757
Lustre-commit: 5f674667bfd1ab9a0e47d9f03f3e7eab37eb8e17
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Artem Blagodarenko <c17828@cray.com>
Change-Id: I83fc76f64ce2a0b4bf500841b695d64d3dea78de
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36778
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Alexander Boyko [Fri, 26 Jul 2019 14:13:21 +0000 (10:13 -0400)]
LU-12593 osd: zeroing a freshly allocated block buffer
Ldiskfs zeroes a new buffer only when it is not uptodate.
In rare cases we can get a new buffer head with the uptodate flag set.
This may cause file corruption for writes at a non-zero offset,
especially for internal Lustre files like update_log, CATALOGS, and
lov_objid.
od_fld_lookup()) lustre-MDT0001-mdtlov: invalid FID [0x0:0x50:0x0]
The patch adds zeroing under i_mutex for unmapped blocks.
Performance results follow, since the patch adds a mutex to the
creation path (lov_objid file).
40 tasks, 2000000 files
SUMMARY: (of 5 iterations)
Operation            Max        Min       Mean   Std Dev
---------            ---        ---       ----   -------
Without fix
File creation: 39990.601  19020.238  27443.823  6909.605
With fix
File creation: 37958.809  21708.187  27065.855  5900.961
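As a rough reading of the table above, the relative change in the mean creation rate can be computed as follows. This is only an illustrative calculation on the reported means (the helper name is hypothetical), and given the reported standard deviations the difference may well be within run-to-run noise.

```python
def relative_change_percent(before, after):
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


# Mean file-creation rates as reported in the commit message.
mean_without_fix = 27443.823
mean_with_fix = 27065.855

change = relative_change_percent(mean_without_fix, mean_with_fix)
print(f"mean file creation changed by {change:.2f}%")
```

On these numbers the mean is lower by a little over one percent with the fix applied.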
Lustre-change: https://review.whamcloud.com/35629
Lustre-commit: f832a7dc33c69fae9af199f0317e6385deeaeccf
Cray-bug-id: LUS-6132
Signed-off-by: Alexander Boyko <c17825@cray.com>
Change-Id: Ica8fbe29b5a7253d553b41a41ffe5d8d8b4b2e55
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36709
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Amir Shehata [Tue, 22 Oct 2019 18:27:24 +0000 (11:27 -0700)]
LU-12893 lnet: fix peer_ni selection
When selecting a peer NI we must use the same peer NID
for all the messages which belong to the same RPC.
This is necessary to ensure that we do the RDMA over
the optimal interface.
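The requirement can be pictured with a small model in which the first selection made for an RPC is remembered and reused for every later message of that RPC. This is a sketch under assumptions, not LNet's selection code: the class, the `rpc_id` key, the example NIDs, and the round-robin choice are all hypothetical stand-ins.

```python
class StickyPeerSelector:
    """Toy model: pick a peer NID once per RPC and reuse it afterwards."""

    def __init__(self, peer_nids):
        self._peer_nids = list(peer_nids)
        self._next = 0          # round-robin cursor (stand-in for real selection)
        self._chosen = {}       # rpc_id -> peer NID picked for that RPC

    def select(self, rpc_id):
        # Only run the selection the first time this RPC is seen.
        if rpc_id not in self._chosen:
            nid = self._peer_nids[self._next % len(self._peer_nids)]
            self._next += 1
            self._chosen[rpc_id] = nid
        return self._chosen[rpc_id]

    def finish(self, rpc_id):
        # Forget the choice once the RPC is complete.
        self._chosen.pop(rpc_id, None)


selector = StickyPeerSelector(["10.0.0.1@o2ib", "10.0.0.2@o2ib"])
first = selector.select(rpc_id=1)
other = selector.select(rpc_id=2)
again = selector.select(rpc_id=1)
```

Here `again` equals `first`, so a later bulk transfer for RPC 1 would go to the same peer NID as its initial message.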
Lustre-change: https://review.whamcloud.com/36552
Lustre-commit: 94ee26738884e3f5b241698bc2e7a8da9702d264
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I0391537da32bc6ac7a8a3d92e207bf172d111981
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36643
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Jinshan Xiong [Tue, 1 May 2018 18:35:53 +0000 (11:35 -0700)]
LU-4398 llite: do not cache write open lock for exec file
This avoids the problem of the MDT needing an extra lock
revocation before the file can be executed.
Lustre-change: https://review.whamcloud.com/32265
Lustre-commit: 6dd9d57bc006a37731d34409ce43de13c192e0cc
Signed-off-by: Jinshan Xiong <jinshan.xiong@uber.com>
Signed-off-by: Gu Zheng <gzheng@ddn.com>
Change-Id: Ibb42a9a8cb56a9bf48a6e972b72d3d71ed7fbaf5
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36680
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Patrick Farrell [Thu, 8 Aug 2019 17:13:29 +0000 (13:13 -0400)]
LU-12533 llite: Improve readahead RPC issuance
lov_io_submit receives a range of pages, then adds pages
to a batch until it hits a page which is not in the stripe
associated with this lov object. This means that if a
readahead page range hits the same stripe more than once,
we will issue multiple I/Os, even if the pages would fit in
one RPC.
This is unnecessary; just submit all these pages at once.
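The difference between the two batching strategies can be sketched with a simplified striping model. This is not the lov_io_submit code: the function names are hypothetical, the round-robin stripe mapping is an assumption, and RPC size limits are ignored.

```python
def stripe_of(page_index, stripe_pages, stripe_count):
    """Map a page index to a stripe, assuming simple round-robin striping."""
    return (page_index // stripe_pages) % stripe_count


def batches_cut_at_stripe_change(pages, stripe_pages, stripe_count):
    """Behaviour described as the problem: a batch ends at the first page
    that belongs to a different stripe."""
    batches = []
    for page in pages:
        stripe = stripe_of(page, stripe_pages, stripe_count)
        if batches and batches[-1][0] == stripe:
            batches[-1][1].append(page)
        else:
            batches.append((stripe, [page]))
    return batches


def batches_grouped_per_stripe(pages, stripe_pages, stripe_count):
    """Behaviour described as the fix: all pages of a stripe go in one batch."""
    grouped = {}
    for page in pages:
        stripe = stripe_of(page, stripe_pages, stripe_count)
        grouped.setdefault(stripe, []).append(page)
    return list(grouped.items())


# Eight pages, two pages per stripe unit, two stripes: stripes 0,0,1,1,0,0,1,1.
pages = list(range(8))
old = batches_cut_at_stripe_change(pages, stripe_pages=2, stripe_count=2)
new = batches_grouped_per_stripe(pages, stripe_pages=2, stripe_count=2)
```

In this toy case the first strategy produces four submissions and the second produces two, one per stripe.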
mpirun -n 2 $IOR -s 2000 -t 47K -b 47K -k -r -E -o $FILE
Without patch:
osc.lustre-OST0001-osc-ffff8fe82c952000.rpc_stats=
                     read              write
pages per rpc   rpcs   %  cum % | rpcs   %  cum %
1:               118  56     56 |    0   0      0
2:                 0   0     56 |    0   0      0
4:                 0   0     56 |    0   0      0
8:                 0   0     56 |    0   0      0
16:                5   2     58 |    0   0      0
32:                0   0     58 |    0   0      0
64:                0   0     58 |    0   0      0
128:              21  10     68 |    0   0      0
256:              25  11     80 |    0   0      0
512:              10   4     85 |    0   0      0
1024:             31  14    100 |    0   0      0
osc.lustre-OST0002-osc-ffff8fe82c952000.rpc_stats=
                     read              write
pages per rpc   rpcs   %  cum % | rpcs   %  cum %
1:                 5   6      6 |    0   0      0
2:                 0   0      6 |    0   0      0
4:                 0   0      6 |    0   0      0
8:                 0   0      6 |    0   0      0
16:                0   0      6 |    0   0      0
32:                0   0      6 |    0   0      0
64:                0   0      6 |    0   0      0
128:              19  23     29 |    0   0      0
256:              19  23     52 |    0   0      0
512:               5   6     58 |    0   0      0
1024:             34  41    100 |    0   0      0
With patch:
osc.lustre-OST0001-osc-ffff8fe7a7227800.rpc_stats=
                     read              write
pages per rpc   rpcs   %  cum % | rpcs   %  cum %
1:                12  17     17 |    0   0      0
2:                 0   0     17 |    0   0      0
4:                 0   0     17 |    0   0      0
8:                 0   0     17 |    0   0      0
16:                5   7     24 |    0   0      0
32:                0   0     24 |    0   0      0
64:                5   7     31 |    0   0      0
128:               6   8     40 |    0   0      0
256:               1   1     42 |    0   0      0
512:               2   2     44 |    0   0      0
1024:             38  55    100 |    0   0      0
osc.lustre-OST0002-osc-ffff8fe7a7227800.rpc_stats=
                     read              write
pages per rpc   rpcs   %  cum % | rpcs   %  cum %
1:                 0   0      0 |    0   0      0
2:                 0   0      0 |    0   0      0
4:                 0   0      0 |    0   0      0
8:                 0   0      0 |    0   0      0
16:                0   0      0 |    0   0      0
32:                0   0      0 |    0   0      0
64:                4   7      7 |    0   0      0
128:               7  13     21 |    0   0      0
256:               0   0     21 |    0   0      0
512:               3   5     26 |    0   0      0
1024:             38  73    100 |    0   0      0
Note the much larger number of small RPCs issued without the patch.
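For readers checking the tables, the "%" and "cum %" columns can be recomputed from the "rpcs" column. The sketch below assumes the percentages are truncated with integer division, which is inferred from the figures shown here rather than taken from the Lustre source; the function name is hypothetical.

```python
def percent_columns(rpc_counts):
    """Return (rpcs, percent, cumulative percent) rows using integer division."""
    total = sum(rpc_counts)
    rows = []
    running = 0
    for count in rpc_counts:
        running += count
        pct = 100 * count // total if total else 0
        cum_pct = 100 * running // total if total else 0
        rows.append((count, pct, cum_pct))
    return rows


# Read RPC counts for OST0002 with the patch, from 1 page up to 1024 pages.
with_patch_ost2 = [0, 0, 0, 0, 0, 0, 4, 7, 0, 3, 38]
rows = percent_columns(with_patch_ost2)

# Single-page read RPCs on OST0001, taken from the tables above.
single_page_without_patch = 118
single_page_with_patch = 12
```

On OST0001 the single-page read RPCs drop from 118 to 12, and 1024-page RPCs rise from 14% to 55% of all read RPCs.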
Lustre-change: https://review.whamcloud.com/35458
Lustre-commit: 05b9da4fd124c61fd41d4b560773c0552a1ee5d7
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ic10c138628c269afe57fbc57ec8c91ce990717f9
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36342
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Jian Yu [Wed, 9 Oct 2019 21:41:54 +0000 (14:41 -0700)]
LU-12844 lnet: fix strncpy bound error
This patch fixes the following error while using gcc 8:
liblnetconfig.c: In function ‘lustre_lnet_parse_nids’:
liblnetconfig.c:320:3: error: ‘strncpy’ specified bound depends on
the length of the source argument [-Werror=stringop-overflow=]
strncpy(entry, cur, len - 1);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
liblnetconfig.c:310:10: note: length computed here
len = strlen(cur) + 1;
^~~~~~~~~~~
cc1: all warnings being treated as errors
This patch is back-ported from the following one:
Lustre-commit: cebda7a478f9943f10b9a3388377c61a54957a87
Lustre-change: https://review.whamcloud.com/36417
Change-Id: I2d5840fd58c7b7d27ef1b2aa12f1f187d30abbfd
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36418
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Nathaniel Clark [Wed, 11 Sep 2019 15:10:58 +0000 (11:10 -0400)]
LU-12745 build: Account for optional SPL for ZFS 0.8+
With ZFS 0.8.0 and later, SPL is no longer present.
Some zfs packages provide vestigial spl package contents, but zfs-dkms
does not. This makes testing for the SPL directories optional, depending
on the version of ZFS. It also accounts for the new location of the spl
include directory under the zfs include directory.
Lustre-change: https://review.whamcloud.com/36161
Lustre-commit: a245dde23a9fdbdff7d09a783bcbe3349f68a908
Test-Parameters: trivial
Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Change-Id: I8afcff079f25543a3c86df0c404146a859b226aa
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36408
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Yang Sheng [Mon, 4 Nov 2019 03:49:41 +0000 (11:49 +0800)]
LU-12925 test: assign right initial value for test_61
This patch is a snippet from commit 591a9b4cebc510ff5. test_62
would fail because test_61 leaves a failover state behind in some
cases.
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: If46a6d435bcaafb9000abb032ac561c5453776ee
Reviewed-on: https://review.whamcloud.com/36660
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
James Simmons [Fri, 18 Oct 2019 13:31:00 +0000 (09:31 -0400)]
LU-12803 libcfs: bump module version
The Linux client version of libcfs is further ahead in its
cleanup, so its module version is higher. While this is the
case, it prevents the OpenSFS version of libcfs from
loading, and since OpenSFS is currently ahead of the Linux
client we prefer to use it at this time. Let's just increase
the OpenSFS libcfs module version to be slightly ahead of the
Linux client.
Test-Parameters: trivial
Lustre-change: https://review.whamcloud.com/36488
Lustre-commit: 4b25d733342bc6f3a424ecfb0db80f1c175a8986
Change-Id: Ie57d93529bf25d908099f7dab06d2960f9923d58
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36642
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Bobi Jam [Fri, 24 May 2019 17:40:25 +0000 (01:40 +0800)]
LU-12328 flr: avoid reading unhealthy mirror
* Fix an error in lov_io_mirror_init() which would wait unnecessarily
if we're retrying the last mirror of the file.
* In osc_io_iter_init() we check the OSC import status so that the
read path can quickly switch to another mirror.
sanity-flr test_33b is added to test this case.
* Once all mirrors have been tried, we turn off the quick switch
so that when all mirrors contain bad OSTs, the read will still try
its best to get partial data from a component before trying another
mirror.
sanity-flr test_33c is added to test this case.
Lustre-change: https://review.whamcloud.com/34952
Lustre-commit: 39da3c06275e04e2a6e7f055cb27ee9dff1ea576
Test-Parameters: envdefinitions=ONLY="33" testlist=sanity-flr,sanity-flr,sanity-flr,sanity-flr,sanity-flr,sanity-flr,sanity-flr,sanity-flr,sanity-flr,sanity-flr
Fixes: 5a6ceb664f07 ("LU-7236 ptlrpc: idle connections can disconnect")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I5621a834e58ee1bfccf6c407d2c68357b5c3eb3b
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36550
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Alex Zhuravlev [Wed, 31 Oct 2018 09:54:59 +0000 (12:54 +0300)]
LU-11221 osd: allow concurrent bulks from pagecache
Drop the page lock earlier, once I/O is complete, so that the page can
be read by several clients simultaneously.
Lustre-change: https://review.whamcloud.com/33521
Lustre-commit: 0a92632538d8c985e024def73512d18d1570d5ca
Change-Id: Iee28f578e937744f07f7c5be7eae99e59e625e6e
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36570
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Qian Yingjin [Thu, 1 Nov 2018 08:49:53 +0000 (16:49 +0800)]
LU-11367 som: integrate LSOM with lfs find
The patch integrates LSOM functionality with lfs find so that it
is possible to use LSOM functionality directly on the client. The
MDS fills in the mbo_size and mbo_blocks fields from the LSOM
xattr, if the actual size/blocks are not available, and then sets
the new OBD_MD_FLLSIZE and OBD_MD_FLLBLOCKS flags in the reply so that
the client knows these fields are valid.
The lfs find command gains a "-l|--lazy" option to allow the use of
LSOM data from the MDS.
Add a new version of ioctl(LL_IOC_MDC_GETINFO) call that also returns
valid flags from the MDS RPC to userspace in struct lov_user_mds_data
so that it is possible to determine whether the size and blocks are
returned by the call. The old LL_IOC_MDC_GETINFO ioctl number is
renamed to LL_IOC_MDC_GETINFO_OLD and is binary compatible, but
newly-compiled applications will use the new struct lov_user_mds_data.
New llapi interfaces llapi_get_lum_file(), llapi_get_lum_dir(),
llapi_get_lum_file_fd(), llapi_get_lum_dir_fd() are added to fetch
valid stat() attributes and LOV info to the user.
Lustre-change: https://review.whamcloud.com/35167
Lustre-commit: 11aa7f8704c490b011f60f234c3ac9929ce76948
Signed-off-by: Qian Yingjin <qian@ddn.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I21dfae7c2633dead5d83b438ec340fea4d3ebbe5
Reviewed-by: Li Xi <lixi@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36553
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Chris Horn [Mon, 30 Sep 2019 15:03:06 +0000 (10:03 -0500)]
LU-12824 o2ib: Record rc in debug log on startup failure
Since kiblnd_startup() returns -ENETDOWN on failure, let's record the
rc value for the failure case in the debug log.
Lustre-change: https://review.whamcloud.com/36325
Lustre-commit: 99f85541a685df82265f18167e91c161c523ce50
Cray-bug-id: LUS-7935
Test-Parameters: trivial
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: Ied934642bc567b8d3f51293d7dd095d47ff134df
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36547
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Chris Horn [Mon, 30 Sep 2019 15:01:28 +0000 (10:01 -0500)]
LU-12824 o2ib: Fix whitespace in kiblnd_startup
Convert whitespace to tabs where appropriate in kiblnd_startup()
Lustre-change: https://review.whamcloud.com/36324
Lustre-commit: 50300e83e4cab3157149107eb735825cc4c3aff1
Cray-bug-id: LUS-7935
Test-Parameters: trivial
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I11aaaa8e47d754b219fb773d74e37190476e4eeb
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36546
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Chris Horn [Mon, 30 Sep 2019 15:04:10 +0000 (10:04 -0500)]
LU-12824 o2ib: Reintroduce kiblnd_dev_search
If we add an interface to multiple nets then we need to re-use the
struct ib_dev object for each of the nets.
Lustre-change: https://review.whamcloud.com/36326
Lustre-commit: e25e45c612a061031e8b4b5233137fbb57b50cc4
Cray-bug-id: LUS-7935
Fixes: 75ab841 ("LU-11893 lnet: consoldate secondary IP address handling")
Test-Parameters: trivial
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I1790e24458f47d632fd137b78de076d408fe5260
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36545
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Shaun Tancheff [Thu, 10 Oct 2019 04:49:36 +0000 (21:49 -0700)]
LU-12355 llite: vfs atomic_open change with FMODE_CREATED
Kernel 4.19 introduced FMODE_CREATED and switched to it: the last
argument to the VFS atomic_open was removed, and the f_mode flags
are now used to indicate the created state on return.
Linux-commit: 73a09dd94377e4b186b300bd5461920710c7c3d5
Lustre-change: https://review.whamcloud.com/35020
Lustre-commit: 4decb4c2da6053066f10cbe419e2db212de8e4aa
Test-Parameters: trivial
Change-Id: I26d4aadb123bb1d1bc0aa1d78a64a75b94276ffb
Signed-off-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Thomas Stibor <t.stibor@gsi.de>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36415
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Dominique Martinet [Mon, 9 Sep 2019 14:46:45 +0000 (16:46 +0200)]
LU-12734 misc: add bash completion for lctl set/get_param
Add initial bash completion for lctl, mainly for set_param and
get_param, and modify the build system to install it.
Lustre-change: https://review.whamcloud.com/36105
Lustre-commit: f87a7f2656ceff174a00933a170032f093ecc72d
Test-Parameters: trivial
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Change-Id: I16d2698e782702375c7fa3edf3bfde2e3b197297
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36483
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Qian Yingjin [Wed, 16 Jan 2019 02:13:20 +0000 (10:13 +0800)]
LU-11526 rpc: support maximum 64MB I/O RPC
On newer systems, some block drivers allow max_hw_sectors_kb to
be up to 65536KB (64MB) to the underlying storage. To maximize
driver efficiency, Lustre should also bump up the maximum I/O
RPC size to 64MB.
Clamp max_read_ahead_whole_mb so that it does not exceed
max_read_ahead_per_file_mb.
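The sizes involved and the clamp can be illustrated as follows. This is a sketch, not the llite code: the function name is hypothetical, and the 4 KiB page size is an assumption used only for the page-count arithmetic.

```python
KIB = 1024
MIB = 1024 * KIB


def clamp_whole_mb(max_read_ahead_whole_mb, max_read_ahead_per_file_mb):
    """Keep the whole-file readahead limit at or below the per-file limit."""
    return min(max_read_ahead_whole_mb, max_read_ahead_per_file_mb)


# 65536KB, the block-layer limit quoted in the commit message, is 64MB.
max_rpc_bytes = 65536 * KIB

# Page count of one maximum-size RPC, assuming 4 KiB pages.
assumed_page_size = 4 * KIB
pages_per_max_rpc = max_rpc_bytes // assumed_page_size
```

With these assumptions a 64MB RPC carries 16384 pages, and a requested whole-file limit of 128 with a per-file limit of 64 would be clamped to 64.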
Lustre-change: https://review.whamcloud.com/34042
Lustre-commit: 1a9be0046b1f1772d3f934c2146dc5233c391377
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Icbf78742f8210d82dc310af7d05b7c32b93af34f
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35369
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Wang Shilong [Sun, 2 Jun 2019 15:17:26 +0000 (23:17 +0800)]
LU-12043 llite,readahead: don't always use max RPC size
Since 64M RPC landed, @PTLRPC_MAX_BRW_PAGES will be 64M.
And we always try to use this maximum possible RPC size to check
whether we should avoid fast IO and trigger real context IO.
This is not good for the following reasons:
(1) Since the current default RPC size is still 4M,
most systems won't use 64M most of the time.
(2) The current default readahead size per file is still 64M,
which makes fast IO always run out of all readahead pages
before the next IO. This defeats what users really want from
readahead: grabbing pages in advance.
To fix this problem, we use 16M as a balance value if the RPC size is
smaller than 16M. The patch also fixes the problem that @ras_rpc_size
could not grow, which is possible in the following case:
1) Set RPC to 16M
2) Set RPC to 64M
In the current logic ras->ras_rpc_size will be kept as 16M, which is wrong.
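One possible reading of the two fixes is sketched below. Apart from ras_rpc_size, every name here is hypothetical, sizes are in bytes, and the real llite readahead code is considerably more involved than this model.

```python
MIB = 1 << 20
BALANCE_SIZE = 16 * MIB  # the "balance value" named in the commit message


def fast_io_check_size(rpc_size):
    """Size used for the fast-IO check: 16M whenever the RPC size is smaller."""
    return max(rpc_size, BALANCE_SIZE)


class ReadaheadState:
    """Toy holder for the cached RPC size (stand-in for ras->ras_rpc_size)."""

    def __init__(self, rpc_size):
        self.ras_rpc_size = rpc_size

    def update_shrink_only(self, current_rpc_size):
        """Model of the described bug: the cached size never grows."""
        if current_rpc_size < self.ras_rpc_size:
            self.ras_rpc_size = current_rpc_size

    def update(self, current_rpc_size):
        """Model of the fix: follow the configured size in both directions."""
        self.ras_rpc_size = current_rpc_size


state = ReadaheadState(16 * MIB)      # step 1: RPC size set to 16M
state.update_shrink_only(64 * MIB)    # step 2 under the old logic
stuck_size = state.ras_rpc_size       # still 16M
state.update(64 * MIB)                # step 2 under the fixed logic
fixed_size = state.ras_rpc_size       # now 64M
```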
Lustre-change: https://review.whamcloud.com/35033
Lustre-commit: 7864a6854c3dfe3319dcf6809e728eed9a37b9b4
Change-Id: Ida9f839f7c692cd88d32dc0909503f6ae991d909
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35559
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Lai Siyao [Thu, 15 Aug 2019 14:31:17 +0000 (22:31 +0800)]
LU-11933 mdt: clear sp_cr_flags in migrate unpack
mdt_thread_info.mti_spec is not cleared after operation handling, so
mdt_migrate_unpack() should clear it in case the old values are used.
Lustre-change: https://review.whamcloud.com/36154
Lustre-commit: d4da3b55a8303d937828e74341b3ab5c4dfd52b2
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ib3d5d39a4a072621c8da8b6ef7869cb4d8178aac
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/36399
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Hongchao Zhang [Tue, 8 Oct 2019 17:10:34 +0000 (13:10 -0400)]
LU-11626 mdc: hold obd while processing changelog
While reading or writing the changelog, the corresponding obd_device
should be held to protect it from being released by umount.
Lustre-change: https://review.whamcloud.com/35784
Lustre-commit: d7bb6647cd4dd26949bceb6a099cd606623aff2b
Change-Id: Ib5b528f178edcf73425587ea60335df640c1696d
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36338
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Emoly Liu [Sun, 22 Sep 2019 12:06:07 +0000 (20:06 +0800)]
LU-11743 utils: allow lctl pool_list on separate MGS
Change lctl pool_list command to parse the configuration log directly
when run on a standalone MGS node. This also allows the pool commands
to be run when only the MGS is started.
Also, the test script changes from the LU-9899 patch that mounted a
client on the standalone MGS so that OST pools work properly are removed.
Lustre-change: https://review.whamcloud.com/35895
Lustre-commit: d908fe9686bc1e583da7434856d9c06e6cbbc4fd
Change-Id: Ic25931d49c2cf747da2a3f2ac3c25a21f6878991
Test-Parameters: standalonemgs=true testlist=ost-pools.sh,sanity.sh,conf-sanity.sh
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36314
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Sebastien Buisson [Mon, 3 Jun 2019 14:30:50 +0000 (23:30 +0900)]
LU-12131 tests: properly handle GSS in server failover
In case of server failover, a number of aspects must be handled when
GSS based features (SSK or Kerberos) are activated:
- lsvcgssd daemon must be restarted;
- targets must be mounted with proper skpath option;
- permissions on keys must be adjusted.
When service is initially started, all that is managed in setupall().
fail() and facet_failover() have to be improved to take GSS aspects
into account.
Lustre-change: https://review.whamcloud.com/35041
Lustre-commit: 1cbfb44fb59945da62acbb672330fde5c75ddc98
Test-Parameters: envdefinitions=SHARED_KEY=true testlist=sanity,recovery-small,sanity-sec
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I8db686f406629c7eec655496cf83c0539c1bfb33
Reviewed-on: https://review.whamcloud.com/35534
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Mikhail Pershin [Sun, 26 May 2019 17:46:43 +0000 (20:46 +0300)]
LU-11204 obdclass: remove unprotected access to lu_object
The check of lu_object_is_dying() is done after the reference
drop and without a lock, so it can access a freed object if a concurrent
thread did the final put.
The patch saves the object state right before atomic_dec_and_lock()
and checks the saved state afterwards, so the object is not accessed
after the reference is dropped.
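The ordering issue can be shown with a single-threaded toy model in which a callback stands in for the concurrent thread. This is only an illustration of the pattern: none of these classes or functions exist in Lustre, and real code relies on atomic_dec_and_lock() rather than plain integer arithmetic.

```python
class UseAfterFree(Exception):
    """Raised by the model when a freed object is touched."""


class ModelObject:
    """Toy reference-counted object (not struct lu_object)."""

    def __init__(self, refs, dying):
        self.refs = refs
        self._dying = dying
        self.freed = False

    def is_dying(self):
        if self.freed:
            raise UseAfterFree("state read after the final put")
        return self._dying


def final_put_by_other_thread(obj):
    """Simulate a concurrent thread dropping the last reference and freeing."""
    obj.refs -= 1
    if obj.refs == 0:
        obj.freed = True


def put_check_after_drop(obj, interleave):
    """Racy ordering: drop the reference, then inspect the object."""
    obj.refs -= 1
    interleave(obj)            # window in which another thread may free obj
    return obj.is_dying()      # may touch freed memory


def put_check_before_drop(obj, interleave):
    """Fixed ordering: save the state first, then drop the reference."""
    was_dying = obj.is_dying()
    obj.refs -= 1
    interleave(obj)
    return was_dying           # only the saved copy is used from here on
```

With two references held, `put_check_before_drop` returns the saved state even though the other holder frees the object in the window, while `put_check_after_drop` raises `UseAfterFree` in the same interleaving.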
Lustre-change: https://review.whamcloud.com/34960
Lustre-commit: 336cf0f2f3a9ce5b11a34aeaeec062a5d5144213
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I926991f465e7913e5fc150095425bfb5bf07f57f
Reviewed-on: https://review.whamcloud.com/36217
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Lai Siyao [Sun, 11 Aug 2019 06:34:04 +0000 (14:34 +0800)]
LU-12719 obdclass: serialize lwp list access
lustre_sb_info.lsi_lwp_list should be accessed under a lock, and
some of those code paths may sleep, so change lsi_lwp_lock from a
spinlock to a mutex.
Lustre-change: https://review.whamcloud.com/36003
Lustre-commit: 875252d59924ad09db8de9f0fbb611788a0b9c78
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ifc3622eb28cd6cf49661b14fc10e98aa689a58dc
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36349
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Sebastien Buisson [Thu, 1 Aug 2019 06:58:44 +0000 (08:58 +0200)]
LU-12622 tests: skip sanity test_816 with SSK
sanity test_816 is incompatible with SSK, so skip it
in case SHARED_KEY is true.
Lustre-change: https://review.whamcloud.com/35664
Lustre-commit: b4ab1084cc0f1947ab65c32075d579edac5ab547
Whamcloud-bug-id: ATM-1283
Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I019cf6a4bab8b0cf9825b7c49f364225d5156dfa
Reviewed-on: https://review.whamcloud.com/36401
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
James Simmons [Tue, 30 Apr 2019 13:17:56 +0000 (09:17 -0400)]
LU-11157 obd: round values to nearest MiB for *_mb syfs files
Several sysfs files report their settings with the functions
lprocfs_[seq]_read_frac_helper(), which are intended to show
fractional values, e.g. 1.5 MiB. This approach has caused problems
with shells which don't handle fractional representation, and the
values reported don't faithfully represent the original value the
configurator passed into the sysfs file. To resolve this, let's
instead always round up the value the configurator passed into
the sysfs file to the nearest MiB value. This way the values
reported are guaranteed to always be an exact number of MiB.
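The rounding described above amounts to a ceiling division by one MiB. The sketch below shows only the arithmetic; the function name is hypothetical and the input is assumed to be a byte count.

```python
MIB = 1 << 20


def round_up_to_mib(value_bytes):
    """Return the smallest whole number of MiB that holds `value_bytes`."""
    return (value_bytes + MIB - 1) // MIB


# 1.5 MiB expressed in bytes is reported as 2 MiB after rounding up.
one_and_a_half_mib = MIB + MIB // 2
reported = round_up_to_mib(one_and_a_half_mib)
```

Because the result is an integer count of MiB, a shell script can read it back without having to parse a fractional value.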
Lustre-change: https://review.whamcloud.com/34317
Lustre-commit: ba2817fe3ead1b8e32be6d6c6ce25b490626118a
Change-Id: Ia2e8cf8421784853aa33d4bb87c54aee00953835
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36393
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Alexey Zhuravlev [Wed, 21 Aug 2019 08:32:56 +0000 (11:32 +0300)]
LU-12674 osp: handle -EINPROGRESS on llog objects
If an llog object is corrupted and the OI doesn't allow access to it,
OSP panics when asked to declare a new llog record (e.g. for unlink).
Instead, OSP should complain in the logs, skip llogging, and suggest
running LFSCK to fix orphans.
Lustre-change: https://review.whamcloud.com/35844
Lustre-commit: a3ec8ff69fceb53a467a80c2e6008869f25f72b4
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I18d4d68811833c08cdc1937d147ac6e8c3408a30
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36348
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Lai Siyao [Sun, 30 Jun 2019 15:26:11 +0000 (23:26 +0800)]
LU-11967 mdt: reint layout_change in standard way
Layout_change is a reint operation, and it should be handled the
same as other reint operations, so that resent and replay can
work correctly.
Also replace the lock passed in ldlm_handle_enqueue0() with the
lock taken in mdt_layout_change(). This avoids taking the lock again
in ldlm_handle_enqueue0(), and also makes replay easier. Note,
before replacing, the mode is downgraded from EX to CR, because the
client only needs this mode, which avoids an unnecessary lock cancel
later.
Add missing resent reconstructor for REINT_RESYNC.
Lustre-change: https://review.whamcloud.com/35465
Lustre-commit: e1bebbdf53c8490a3be35793070f45f1a68721b1
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I328044dacbf18d03232c9bbb51271f6202e9b939
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36376
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Lai Siyao [Sat, 5 Oct 2019 05:42:56 +0000 (22:42 -0700)]
LU-11739 lod: subdir under ROOT should honor default layout
Though subdirectories under ROOT don't inherit the ROOT default
layout, they should honor the ROOT default layout on creation.
Add a subtest in sanity.sh 65n to verify this.
Fixes: 0a988cae95 ("LU-11739 lod: don't inherit default
layout from root directory")
Lustre-change: https://review.whamcloud.com/35204
Lustre-commit: 693fb63ac777eab426f1b618316a5649534759ad
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I1edf91da7944a8871652df7bca2104d00f66163a
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36370
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Hongchao Zhang [Thu, 11 Jul 2019 08:20:33 +0000 (04:20 -0400)]
LU-11768 test: limit at_max to timeout in time
In test_6 of sanity-quota, if AT is enabled, the timeout of
the QUOTA_DQACQ request could be longer than OBD_TIMEOUT*2, which
causes the watchdog to be triggered.
This is a backport to Lustre b2_12 of
Lustre-change: https://review.whamcloud.com/35651
Lustre-commit:
d8226b9353dbc1448af8d23c13cae5f21cbe3a86
Change-Id: I7e3a976a004259f5c956fc48f4d8d63c751ee2c0
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36365
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
James Nunez [Wed, 7 Aug 2019 15:56:50 +0000 (09:56 -0600)]
LU-12639 tests: initialize variable sanity 317
sanity test 317 checks the file system type of $facet,
but facet is not initialized in the test.
Replace the check of the facet file system type with the
variable ost1_FSTYPE. The call to return after skip is
removed.
This is a backport to Lustre b2_12 of
Lustre-change: https://review.whamcloud.com/35716
Lustre-commit:
c00be06a1f8f27eb5bd8bb47086d0f1e5b5f5f50
Fixes:
6115eb7fd55a ("LU-10370 ofd: truncate does not update blocks count on client")
Test-Parameters: trivial
Test-Parameters: fstype=zfs envdefinitions=ONLY=317 testlist=sanity
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: If67c4d786e4d23752effd1aaebc82bb1be8aceb5
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/36366
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Bobi Jam [Wed, 10 Oct 2018 06:23:55 +0000 (14:23 +0800)]
LU-11239 lfs: fix mirror resync error handling
This patch returns error for partially successful mirror resync.
Lustre-change: https://review.whamcloud.com/33537
Lustre-commit:
0f670d1ca9dd5af697bfbf3b95a301c61a8b4447
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I9d6c9ef5aca1674ceb7a9cbc6b790f3f7276ff5d
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36341
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Nathaniel Clark [Thu, 1 Aug 2019 15:36:59 +0000 (11:36 -0400)]
LU-12100 tests: Use minimum soft qunit limits
Ensure that we don't create limits that are too small, which would
cause all writes to fail.
Wait for grace period to timeout.
Lustre-change: https://review.whamcloud.com/35667
Lustre-commit:
37e28b7e05a5b1f77fe663f9407436aea312b3b2
Test-Parameters: trivial
Test-Parameters: testlist=sanity-quota
Test-Parameters: testlist=sanity-quota fstype=zfs
Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Change-Id: I9342272615ca9c252fcc7f77ed8a61030fc9672a
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36346
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Alexey Lyashkov [Wed, 28 Aug 2019 15:06:43 +0000 (18:06 +0300)]
LU-12707 obdecho: avoid panic with partially object init
In some cases (like ENOMEM) the init function may not be called, so
any init-related code should be placed in the object delete handler,
not in the object free.
Lustre-change: https://review.whamcloud.com/35950
Lustre-commit:
1a9ca8417c60f04a9aa719b7254372e2d18a6b0a
Signed-off-by: Alexey Lyashkov <c17817@cray.com>
Change-Id: I1fca56423de9a045aac2c495fbc45069c3bbc97c
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36317
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Andreas Dilger [Mon, 21 Oct 2019 19:30:29 +0000 (13:30 -0600)]
New release 2.12.3
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I0901824abf874ecf6124907adad23fbc486cab07
Oleg Drokin [Tue, 8 Oct 2019 13:28:14 +0000 (09:28 -0400)]
New RC 2.12.3-RC1
Change-Id: I24e7f7622654332ce3e29a292ce87591cd0e09be
Alex Zhuravlev [Tue, 18 Jun 2019 14:33:16 +0000 (18:33 +0400)]
LU-12553 mdc: polling mode for changelog reader
This allows a user (such as lsom_sync and similar) to follow
the changelog without rescanning it and getting duplicates.
Lustre-change: https://review.whamcloud.com/35262
Lustre-commit:
e215002883d5620f43615013452935da8e7e3f8c
Change-Id: I78dc163838c1b88f9447a4731ad4bfe00fec7eff
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36362
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Andrew Perepechko [Tue, 17 Sep 2019 07:34:44 +0000 (10:34 +0300)]
LU-11426 llog: changelog records reordering
Changelog records can get reordered because of a race
window between cr_index generation and llog file
space allocation. This can lead to the loss of llog
records.
llog_write() holds loghandle->lgh_lock semaphore,
so it seems an appropriate place to generate a
new changelog index.
Lustre-change: https://review.whamcloud.com/36187
Lustre-commit:
1fa0a984c5c3863d8f40b3b0d63c3d08cfa1a9f0
Change-Id: I034d1a696bde1d0f780e494ab65073e4018ceec9
Signed-off-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Cray-bug-id: LUS-7691
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36316
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
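The ordering fix described in LU-11426 above can be illustrated with a small user-space sketch. It is not the Lustre code: `struct demo_log`, `demo_log_write()` and `LOG_SLOTS` are hypothetical names, and a pthread mutex stands in for `lgh_lock`. The only point is that the index is generated inside the same critical section that reserves space, so slot order and index order cannot diverge.

```c
#include <pthread.h>
#include <stdint.h>

#define LOG_SLOTS 8

/* Hypothetical stand-in for a llog handle; not the real Lustre structure. */
struct demo_log {
    pthread_mutex_t lock;       /* plays the role of lgh_lock */
    uint64_t        next_index; /* last record index handed out */
    int             next_slot;  /* next free record slot */
    uint64_t        slots[LOG_SLOTS];
};

/*
 * Generate the record index and reserve the slot in one critical section.
 * Returns the slot used, or -1 if the log is full.
 */
static int demo_log_write(struct demo_log *log)
{
    int slot = -1;

    pthread_mutex_lock(&log->lock);
    if (log->next_slot < LOG_SLOTS) {
        slot = log->next_slot++;
        /* Index is assigned while the slot reservation is still locked. */
        log->slots[slot] = ++log->next_index;
    }
    pthread_mutex_unlock(&log->lock);
    return slot;
}
```

If the index were generated before taking the lock, two writers could obtain indices 1 and 2 and then reserve slots in the opposite order, which is the race window the commit message describes.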
Emoly Liu [Tue, 24 Sep 2019 11:20:48 +0000 (19:20 +0800)]
LU-12790 obdclass: print jobid error message properly
Modify unlikely() condition to print error message properly when
(rc == -EOVERFLOW).
Lustre-change: https://review.whamcloud.com/36272
Lustre-commit:
df21a3b9eb01621de92940e441cd557913d1cd05
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: I19bfb353c71b55a0dfb6eec78c1af915494acd71
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36378
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andriy Skulysh [Wed, 22 Aug 2018 15:11:53 +0000 (18:11 +0300)]
LU-11756 o2iblnd: kib_conn leak
A new tx can be queued while kiblnd_finalise_conn()
aborts txs. Thus a reference from new tx will
prevent connection from moving into kib_connd_zombies.
Insert new tx after IBLND_CONN_DISCONNECTED into
ibc_zombie_txs list and abort it during
kiblnd_destroy_conn().
Lustre-change: https://review.whamcloud.com/33828
Lustre-commit:
a155c3fca38d2a3092f9b5d116ad7877d51d1db1
Change-Id: Ib92d8d02e6e3f66f7140041a330fc00b7ad44ae3
Cray-bug-id: LUS-6412
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36347
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Li Xi [Wed, 10 Feb 2016 14:37:00 +0000 (22:37 +0800)]
LU-12802 tests: speedup cleanup of racer
After the racer test survives for a given time, it starts to clean up,
and the parent racer.sh script waits for the child racer/racer.sh
to exit. However, sometimes this gets stuck for a long time.
Sending a signal to the remaining dd (or other) processes will wake up
the wait in the parent racer.sh script immediately.
Lustre-change: https://review.whamcloud.com/36289
Lustre-commit:
fcf219db6d2cfb692f0b987945e85953a5b07de7
Test-Parameter: trivial testlist=racer
DDN-Bug-ID: DDN-256
Signed-off-by: Li Xi <lixi@ddn.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I2ff2784b76faa0532c39af29b1586a48f2b90a21
Reviewed-by: Shilong Wang <wshilong@ddn.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36381
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Wang Shilong [Thu, 26 Sep 2019 13:21:13 +0000 (21:21 +0800)]
LU-12777 test: fix to pass facet to facet_fstype
Function facet_fstype() expects a facet name such as mgs1 or mds1 as
its argument, but it was wrongly passed $mds1, which causes the
following error:
line 1192: lustre-ost1/ost1_FSTYPE: bad substitution
As a result we fail to detect that this is a ZFS-based OSD, pool
reimporting is skipped, and the mount fails.
Lustre-change: https://review.whamcloud.com/36298
Lustre-commit:
38c8fdfde3953f239bd3d86a91a3213737231ce5
Test-Parameters: trivial clientdistro=el8 testlist=conf-sanity \
fstype=zfs envdefinitions=ONLY=103
Test-Parameters: trivial clientdistro=el8 testlist=conf-sanity \
fstype=ldiskfs envdefinitions=ONLY=103
Change-Id: Id8fd5b9f17e666614e83e5c1a2399fde8b91b023
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36379
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Emoly Liu [Thu, 19 Sep 2019 10:26:31 +0000 (18:26 +0800)]
LU-12229 tests: fix "bad substitution" error
In newer bash versions, special characters are invalid in
indirect variable expansion ${!word}. For example,
# a=lustre,pool
# echo ${!a}
-bash: lustre,pool: bad substitution
To avoid the "bad substitution" error, the pool_new command is used
directly in test_1j and test_1k.
Lustre-change: https://review.whamcloud.com/36243
Lustre-commit:
ac426d6f17b80ed36052f11b9780fa444cfa24aa
Test-Parameters:trivial clientdistro=el8 testlist=ost-pools
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: Ifce4616cd7f314416fe5fa09f8fba846ae45bcef
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36377
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Amir Shehata [Wed, 19 Dec 2018 23:55:49 +0000 (15:55 -0800)]
LU-11816 lnet: setup health timeout defaults
Enable health feature by default.
Setup transaction timeout to a default 10 seconds and
retry count to 3 when health is enabled. When health
is disabled set default transaction timeout to 50.
When toggling between health enabled/disabled the defaults
will always kick in.
Lustre-change: https://review.whamcloud.com/34252
Lustre-commit:
8632e94aeb7e62da07f342a9897d15dfd8251148
This is a new commit replacing the previously reverted commit
https://review.whamcloud.com/#/c/36031/
Change-Id: I359f9cc6c93b5f7d0b58df1abdd29ae0bffd4faf
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36382
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
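The default handling described in LU-11816 above can be sketched as a single toggle function. The struct and function names here are hypothetical, not LNet identifiers; the values 10, 3 and 50 come from the commit message, and the unit of the disabled timeout is assumed to be seconds like the enabled one.

```c
#include <stdbool.h>

/* Hypothetical tunables container; names are illustrative only. */
struct demo_lnet_tunables {
    bool     health_enabled;
    unsigned transaction_timeout; /* seconds (assumed for both cases) */
    unsigned retry_count;
};

/* Toggling health always re-applies the defaults for the new state. */
static void demo_set_health(struct demo_lnet_tunables *t, bool enable)
{
    t->health_enabled = enable;
    if (enable) {
        t->transaction_timeout = 10;
        t->retry_count = 3;
    } else {
        t->transaction_timeout = 50;
        /*
         * The commit message states no retry count for the disabled
         * case, so this sketch leaves it untouched.
         */
    }
}
```

Because the defaults are applied on every toggle, a value tuned by hand while health was disabled does not survive re-enabling health, which matches "the defaults will always kick in".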
Amir Shehata [Mon, 27 May 2019 17:43:10 +0000 (10:43 -0700)]
LU-12344 lnet: handle remote health error
When a peer is dead set the health status to REMOTE_DROPPED
in order to handle health properly for the peer.
When dropping a routed message set REMOTE_ERROR. Routed messages
are dropped when the routing feature is turned off which could
be considered a configuration error if it happens in the middle
of traffic. Therefore, it's better to flag this issue at this
point without resending the message.
Lustre-change: https://review.whamcloud.com/34967
Lustre-commit:
b45e3d96fc4d82ebf5b1bb3ef0b5a59e8ff86e75
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I131263215a68fc8607582643a47007ce4d04abbc
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36030
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Amir Shehata [Fri, 5 Oct 2018 00:18:20 +0000 (17:18 -0700)]
LU-11478 lnet: misleading discovery seqno.
There is a sequence number used when sending discovery messages. This
sequence number is intended to detect stale messages. However it
could be misleading if the peer reboots. In this case the peer's
sequence number will reset. The node will think that all information
being sent to it is stale, while in reality the peer might've
changed configuration.
There is no reliable way to know whether a peer rebooted, so we'll
always assume that the messages we're receiving are valid and
operate on a first-come, first-served basis.
Lustre-change: https://review.whamcloud.com/33304
Lustre-commit:
42d999ed8f6113724b1ac103b832d5b74b878d55
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I421a00e47bc93ee60fa37c648d6d9a726d9def9c
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36041
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
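The problem described in LU-11478 above can be shown with a minimal sketch. All names are hypothetical and the exact comparison used by the original code is an assumption; the sketch only contrasts a stale-message check with the first-come, first-served policy.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of what a node remembers about a peer. */
struct demo_peer {
    uint64_t last_seqno; /* highest discovery seqno seen so far */
    unsigned config;     /* opaque stand-in for the peer's configuration */
};

/* Behaviour with the stale check: lower sequence numbers are dropped. */
static bool demo_accept_with_seqno_check(struct demo_peer *p,
                                         uint64_t seqno, unsigned config)
{
    if (seqno < p->last_seqno)
        return false; /* treated as stale, even after a peer reboot */
    p->last_seqno = seqno;
    p->config = config;
    return true;
}

/* Behaviour after the change: every message is assumed to be valid. */
static bool demo_accept_always(struct demo_peer *p,
                               uint64_t seqno, unsigned config)
{
    p->last_seqno = seqno;
    p->config = config;
    return true;
}
```

With the check in place, a peer that reboots and restarts its counter at a low value has its new configuration ignored until its counter catches up; without the check the update is applied immediately.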
Minh Diep [Mon, 30 Sep 2019 18:25:50 +0000 (11:25 -0700)]
LU-12825 build: change lbuild to support MOFED 4.7
* Remove 'alternate' name in MOFED tar
* use MLNX_LIBS to download rpms
Test-Parameters: trivial
Lustre-change: https://review.whamcloud.com/36333
Lustre-commit:
279c26466bff37dd25fe26e4bb56a16a9a797870
Change-Id: Ia5a4f51455be836a7df4fa6b3e9eccc17cffef2c
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36344
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Sergey Gorenko [Fri, 20 Sep 2019 13:34:48 +0000 (16:34 +0300)]
LU-12789 o2ib: fix configure checks
Fix configure checks for modern kernels / MOFED 4.7
1) sg_dma_address() and sg_dma_len() always have only one argument.
2) Make configure checks execute in the proper environment
Lustre-change: https://review.whamcloud.com/36245
Lustre-commit:
f44f657ee218303220f41182ced4fac290266b7f
Change-Id: I9910de888371776758376743ab4418778e1d85e4
Signed-off-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36331
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
NeilBrown [Sun, 4 Nov 2018 20:42:51 +0000 (15:42 -0500)]
LU-11617 mdc: fix possible deadlock in chlg_open()
Lockdep reports a possible deadlock between chlg_open() and
mdc_changelog_cdev_init()
mdc_changelog_cdev_init() takes chlg_registered_dev_lock and then
calls misc_register() which takes misc_mtx.
chlg_open() is called while misc_mtx is held, and tries to take
chlg_registered_dev_lock.
If these two functions race, a deadlock can occur as each thread will
hold one of the locks while trying to take the other.
chlg_open() does not need to take a lock. It only uses the
lock to stabilize a list while looking for the matching
chlg_registered_dev, and this can be found directly by examining
file->private_data.
So remove chlg_obd_get(), and use file->private_data to find the
obd_device.
Also ensure the device is fully initialized before calling
misc_register(). This means setting up some list linkage before the
call, and tearing it down if there is an error.
Lustre-change: https://review.whamcloud.com/33572
Lustre-commit:
206b21741b07a10269bbcfdac28743591b64ab2f
Change-Id: Icffdebcee656ee6199297ba2a28ba57dcbc51ae1
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36230
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
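The lookup change described in LU-11617 above can be sketched in user space. This is not the mdc code: every `demo_` name is hypothetical, and the sketch assumes the framework stores a pointer to the embedded misc device in `file->private_data` before the open handler runs. Under that assumption the owning device is recovered by pointer arithmetic alone, so the open path takes no registry lock and cannot take part in the lock inversion.

```c
#include <stddef.h>

/* Recover the enclosing structure from a pointer to one of its members. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for the kernel structures involved. */
struct demo_miscdevice { const char *name; };
struct demo_file       { void *private_data; };

struct demo_registered_dev {
    int                    id;
    struct demo_miscdevice misc; /* embedded, registered with the framework */
};

/* Open-time lookup: no lock taken, no list walked. */
static struct demo_registered_dev *
demo_dev_from_file(const struct demo_file *file)
{
    struct demo_miscdevice *misc = file->private_data;

    return demo_container_of(misc, struct demo_registered_dev, misc);
}
```

This only works if the device is fully set up before it is registered, which is why the commit also moves initialization ahead of misc_register().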
Gu Zheng [Wed, 18 Sep 2019 04:12:55 +0000 (12:12 +0800)]
LU-12705 utils: cleanup unnecessary typecasting
There are a bunch of variables typecast in utils/lfs.c where
it is not needed, so clean them up here.
Lustre-change: https://review.whamcloud.com/36224
Lustre-commit:
d8135ad2fbe58a0fbe6984584816338542901c5c
Change-Id: I6c944f18137fd1ff1162d9b6567c9328dfa185eb
Test-Parameters: trivial
Signed-off-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36313
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Shaun Tancheff [Sun, 21 Jul 2019 07:42:43 +0000 (02:42 -0500)]
LU-12400 lnet: Infiniband sg_dma changes for linux 5.1
IB/core: Remove ib_sg_dma_address() and ib_sg_dma_len()
Linux-commit:
a163afc88556e099271a7b423295bc5176fcecce
This simplification can be applied to mainline 3.15 and later
however the test should remain for 3rd party ib driver support
Lustre-change: https://review.whamcloud.com/35497
Lustre-commit:
bbc2cf593b83f5f1822889ef5c910906aadbe735
Test-Parameters: trivial
Cray-bug-id: LUS-7600
Signed-off-by: Shaun Tancheff <stancheff@cray.com>
Change-Id: I4824b3b737388a3fc0aec43b2d8e5d10f871ccdd
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36330
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Bobi Jam [Sat, 24 Aug 2019 17:20:23 +0000 (01:20 +0800)]
LU-12690 llite: error handling of ll_och_fill()
The return error of ll_och_fill() should be handled.
Lustre-change: https://review.whamcloud.com/35913
Lustre-commit:
4d6d58575d3d957aa3dbf38f83f749259b580bf2
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I4e750001cb124104836fa24e39ec8ae203b51a83
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36315
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Alexey Zhuravlev [Fri, 13 Sep 2019 19:28:06 +0000 (22:28 +0300)]
LU-12570 mdt: request env for DT threads
As part of lock enqueue, an MDT thread can call ldlm_reclaim_full() to
cancel old unused LDLM locks, and that scans all existing namespaces,
including OFD-originated ones (with extent locks). Thus the MDT ends up
calling into OFD code, which needs its own env marked with LCT_DT_THREAD.
Lustre-change: https://review.whamcloud.com/36179
Lustre-commit:
1f94d5eb2be4e921e909d8f18523dcab91bb6531
Test-Parameters: testlist=sanity,sanity,sanity,sanity envdefinitions=ONLY="134a",SHARED_KEY=true
Test-Parameters: testlist=sanity,sanity,sanity,sanity envdefinitions=ONLY="134a",SHARED_KEY=true
Test-Parameters: testlist=sanity,sanity,sanity,sanity envdefinitions=ONLY="134a",SHARED_KEY=true
Signed-off-by: Alexey Zhuravlev <bzzz@whamcloud.com>
Change-Id: I231b88159978bc3ce7a3fa0f27e57eb32137c343
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36312
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Shaun Tancheff [Thu, 13 Jun 2019 19:04:54 +0000 (14:04 -0500)]
LU-12355 lnet: ib_fmr_pool_unmap returns void
Historically ib_fmr_pool_unmap only ever returned 0
Linux kernel 4.20 changed the return for ib_fmr_pool_unmap to void.
Linux-commit:
3eeeb7a59acddaa326b03efdf6dce61c120449a3
Lustre-change: https://review.whamcloud.com/35017
Lustre-commit:
46298ffe0b436a8cf1c60aa3d7bde7ae52c78d00
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <stancheff@cray.com>
Change-Id: I49d91a49c452dad5c7d9b153fdbc011f2f25743a
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36329
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Shaun Tancheff [Tue, 11 Jun 2019 12:29:49 +0000 (07:29 -0500)]
LU-12355 lnet: Adjust checks for ib_device_ops
RDMA/core: Introduce ib_device_ops
The ib_device_ops structure defines all the InfiniBand device
operations in one place
Linux-commit:
521ed0d92ab0db3edd17a5f4716b7f698f4fce61
Lustre-change: https://review.whamcloud.com/35016
Lustre-commit:
27572b0476b07b396174430940f184ed85088eeb
Test-Parameters: trivial
Change-Id: Ia2a617597c75ec819f485b93a1deb368d4b5e873
Signed-off-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36328
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Amir Shehata [Sat, 25 May 2019 16:55:47 +0000 (09:55 -0700)]
LU-12339 lnet: select LO interface for sending
In the following scenario
Lustre->LNetPrimaryNID with 0@lo
Discover is initiated on 0@lo
The peer is created with 0@lo and <addr>@<net>
The interface health of the peer's <addr>@<net> is decremented
LNetPut() to self
selection algorithm selects 0@lo to send to
This exposes an issue where we try and go through the peer credit
management algorithm, but because there are no credits associated with
0@lo we end up indefinitely queuing the message. ptlrpc will then get
stuck waiting for send completion on the message.
This was exposed via conf-sanity 32a
Lustre-change: https://review.whamcloud.com/34957
Lustre-commit:
69d1535ebdac139c6b19db2bca5f65663fe88467
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I98e9d3428b594a0d041d27d8e8d8de7596825edc
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36040
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Amir Shehata [Tue, 30 Apr 2019 21:01:48 +0000 (14:01 -0700)]
LU-12199 lnet: verify msg is commited for send/recv
Before performing a health check make sure the message
is committed for either send or receive. Otherwise we
can just finalize it.
Lustre-change: https://review.whamcloud.com/34797
Lustre-commit:
fc6b321036f34c00d5b32b49c817dc0034fbad9e
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Id7bd956f8e81e60a2d63059730973f851d4c7abe
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36039
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Amir Shehata [Wed, 20 Mar 2019 19:14:51 +0000 (12:14 -0700)]
LU-12080 lnet: clean mt_eqh properly
There is a scenario where you have a peer on your recovery queue
that's down. So you keep pinging it, but every ping times out
after 10 seconds. In the middle of these 10 seconds you perform a
shutdown. First you try to do the rsp_tracker_clean. It goes through
and calls MDUnlink on the MD related to that ping. But because the
message has a ref count on the MD, it doesn't go away. The MD becomes
a zombie and just waits for lnet_md_unlink() to be called in
lnet_finalize(). Then you hit clean_peer_ni_recovery. We see the peer
on the queue, we try to call Unlink on it, but when we lookup the
MD using lnet_handle2md() we can't find it. Afterwards we try to clean
up the EQ and it asserts. Even if we remove the assert we end up with
a resource leak since the EQ is not actually freed since we won't call
LNetEQFree() again.
The solution is to move EQ creation into LNetNIInit() and its deletion
into lnet_unprepare(). By this point all the remaining messages
would've been finalized and all references on the EQ are gone,
allowing us to clean it up properly.
Lustre-change: https://review.whamcloud.com/34477
Lustre-commit:
1065c8888e96fef9e98676bd3a71b46f7910b085
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I7fd6018ee2e57f82c649fc3658352e89a4309986
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36029
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Chris Horn [Thu, 18 Apr 2019 03:49:18 +0000 (22:49 -0500)]
LU-12199 lnet: Ensure md is detached when msg is not committed
It's possible for lnet_is_health_check() to return "true" when the
message has not hit the network. In this situation the message is
freed without detaching the MD. As a result, requests do not receive
their unlink events and these requests are stuck forever.
A little cleanup is included here:
- The value of lnet_is_health_check() is only used in one place, so
we don't need to save the result of it in a variable.
- We don't need separate logic to detach the md when the send was
successful. We'll fall through to the finalizing code after
incrementing the health counters
Lustre-change: https://review.whamcloud.com/34885
Lustre-commit:
b65f3a1767ae82c7f629320187b33eb8670da537
Cray-bug-id: LUS-7239
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I6301d491090b862d016eed3aac8afd7be8685e57
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36038
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Chris Horn [Thu, 2 May 2019 22:24:32 +0000 (17:24 -0500)]
LU-12264 lnet: Protect lp_dc_pendq manipulation with lp_lock
Protect the peer discovery queue from concurrent manipulation by
acquiring the lp_lock.
Lustre-change: https://review.whamcloud.com/34798
Lustre-commit:
dd16a31bf4ae874a69cc7dc5fe1f3197993630ae
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: If43b877c1c7ea203f346a3d6ea846f00b8f9661f
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36037
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Amir Shehata [Tue, 30 Apr 2019 18:51:09 +0000 (11:51 -0700)]
LU-12254 lnet: correct discovery LNetEQFree()
The EQ needs to be freed after all the queues are cleaned to avoid
having non-processed events on the event queue on free. Such events
would prevent the memory from being freed.
Lustre-change: https://review.whamcloud.com/34796
Lustre-commit:
a0879b5985b41f92dede96e7f27623eb72102b15
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ie38ec25e09bf6d7cf2aadc30edd91d298897c51b
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36036
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Amir Shehata [Tue, 30 Apr 2019 05:57:21 +0000 (22:57 -0700)]
LU-12249 lnet: fix list corruption
In shutdown the resend queues are cleared and freed. The monitor
thread state is set to shutdown. It is possible to get lnet_finalize()
called after the queues are freed. The code checks for ln_state to see
if we're shutting down. But in this case we should really be checking
ln_mt_state. The monitor thread is the one that matters in this case,
because it's the one which allocates and frees the resend queues.
Lustre-change: https://review.whamcloud.com/34778
Lustre-commit:
d799ac910cd6c980b40c81b76eaefb65b88904d0
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ia077cec7a52ef5cd2e1b231437c6265ba9416b1b
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36035
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Amir Shehata [Sat, 27 Apr 2019 22:47:42 +0000 (15:47 -0700)]
LU-11297 lnet: invalidate recovery ping mdh
For cleanliness, ensure that the recovery ping mdh is invalidated when
a peer NI or a local NI is allocated.
Lustre-change: https://review.whamcloud.com/34771
Lustre-commit:
d7b5f3114d51d5a9d1a34f5073e0bb2d0d63d302
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: If06448b1602b3680831244923b6b982a555159ea
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36034
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Jian Yu [Fri, 27 Sep 2019 18:09:51 +0000 (11:09 -0700)]
LU-12620 kernel: kernel update RHEL 8.0 [4.18.0-80.7.1.el8_0]
Update RHEL 8.0 kernel to 4.18.0-80.7.1.el8_0 for Lustre client.
Test-Parameters: trivial clientdistro=el8 \
envdefinitions=SANITY_EXCEPT="421a" \
testlist=sanity
Change-Id: I9a78ad00d1503cc90f5975e349fe96d452b1174f
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35657
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Wang Shilong [Mon, 23 Sep 2019 16:34:24 +0000 (09:34 -0700)]
LU-12755 ldiskfs: fix project quota unpon unpatched kernel
The value of MAXQUOTAS is the number of quota types supported
by the kernel. With the project quota patch applied, MAXQUOTAS is
equal to EXT4_MAXQUOTAS. However, on an unpatched kernel, the
project quota type is not supported and MAXQUOTAS is one less
than EXT4_MAXQUOTAS.
In ldiskfs, we need to make sure that the loop in
ext4_quota_off_umount() limits the EXT4_MAXQUOTAS iteration
to the kernel MAXQUOTAS value. Otherwise, it tries to
dereference sb_dqopt(sb)->files[2], which is not an inode at all,
and causes the kernel to get stuck on a spinlock in ext4_quota_off()
during unmount, as follows:
Call Trace:
[<ffffffffb9d733c5>] queued_spin_lock_slowpath+0xb/0xf
[<ffffffffb9d81b30>] _raw_spin_lock+0x20/0x30
[<ffffffffb9865e2e>] igrab+0x1e/0x60
[<ffffffffc08a8c4b>] ldiskfs_quota_off+0x3b/0x130 [ldiskfs]
[<ffffffffc08abcdd>] ldiskfs_put_super+0x4d/0x400 [ldiskfs]
[<ffffffffb984b13d>] generic_shutdown_super+0x6d/0x100
[<ffffffffb984b5b7>] kill_block_super+0x27/0x70
[<ffffffffb984b91e>] deactivate_locked_super+0x4e/0x70
[<ffffffffb984c0a6>] deactivate_super+0x46/0x60
[<ffffffffb986abff>] cleanup_mnt+0x3f/0x80
[<ffffffffb986ac92>] __cleanup_mnt+0x12/0x20
[<ffffffffb96c1c0b>] task_work_run+0xbb/0xe0
[<ffffffffb962cc65>] do_notify_resume+0xa5/0xc0
[<ffffffffb9d8d23b>] int_signal+0x12/0x17
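The bounded loop described in this message can be sketched in plain C. This is a minimal illustration, not the ldiskfs code: the two constants, `struct quota_files` and `quota_off_all()` are hypothetical stand-ins, and the values 2 and 3 only mirror the "one less" relationship stated above.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values only: the unpatched kernel knows two quota types,
 * while the filesystem code is built for three (project quota added). */
#define SKETCH_KERNEL_MAXQUOTAS 2
#define SKETCH_FS_MAXQUOTAS     3

/* Hypothetical stand-in for the per-type quota file table. It has only
 * as many valid slots as the kernel supports. */
struct quota_files {
    const char *files[SKETCH_KERNEL_MAXQUOTAS];
};

static int min_int(int a, int b)
{
    return a < b ? a : b;
}

/* Walk the quota types at unmount, never indexing past the kernel's
 * table. Returns the number of quota files that were present. */
static int quota_off_all(const struct quota_files *qf)
{
    int limit = min_int(SKETCH_FS_MAXQUOTAS, SKETCH_KERNEL_MAXQUOTAS);
    int released = 0;

    for (int type = 0; type < limit; type++) {
        if (qf->files[type] != NULL)
            released++;
    }
    return released;
}
```

Without the `min_int()` clamp the loop would read slot 2 of a two-slot table, which corresponds to the invalid `files[2]` dereference named in the message.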
This patch is back-ported from the following one:
Lustre-commit: 4b013aa4cdc12647cb1aa9c93bdd72d741b83af4
Lustre-change: https://review.whamcloud.com/36203
Test-Parameters: clientdistro=el7.7 serverdistro=el7.7
Change-Id: I18a4d97656e2f8478754943424c0fac927f843ca
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/36270
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Andrew Perepechko [Fri, 10 Aug 2018 13:18:48 +0000 (16:18 +0300)]
LU-11296 osc: speed up page cache cleanup during blocking ASTs
While we are cleaning a write lock, we don't need to check if
page cache pages under this lock are covered by another lock.
If a client needs to give up its lock, cleaning gigabytes of
page cache can take quite a long time.
Lustre-change: https://review.whamcloud.com/33090
Lustre-commit: b9ebb17277c78101018a0cf4a63f6beb93b9baf0
Signed-off-by: Andrew Perepechko <c17827@cray.com>
Cray-bug-id: LUS-6352
Change-Id: I576130216ed4de4e352ea697bddb5ff83046443a
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35831
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Patrick Farrell [Tue, 16 Jul 2019 19:26:43 +0000 (15:26 -0400)]
LU-12559 ptlrpc: Hold imp lock for idle reconnect
Idle reconnect sets import state to IMP_NEW, then releases
the import lock before calling ptlrpc_connect_import. This
creates a gap where an import in IMP_NEW state is exposed,
which can cause new requests to fail with EIO.
Hold the lock across the call so as not to expose imports
in this state.
Lustre-change: https://review.whamcloud.com/35530
Lustre-commit: e9472c54ac820c3a0db2318a6ef894c3971e6e0b
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I9f8509d11c4d5a8917a313349534d98b964cd588
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36215
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Andreas Dilger [Wed, 12 Dec 2018 08:49:00 +0000 (16:49 +0800)]
LU-11743 utils: allow lctl pool commands on separate MGS
The current lctl code checks for the presence of configured pools on
the client and MDS via /proc or /sys files. However, the MGS does
not parse the client/MDS configuration logs, so it does not create
the various files for the pools, which causes the pool commands to
fail verification.
Change lctl pool_new, pool_add, pool_remove and pool_destroy commands
to parse the configuration log directly when run on a standalone MGS
node. This also allows the pool commands to be run when only the MGS
is started.
Lustre-change: https://review.whamcloud.com/34110
Lustre-commit: 4a003a1f554602265630637080f65d9b4474f822
Test-Parameters: standalonemgs=true testlist=ost-pools.sh
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: Ib6fdb367c919f7b726fbf551dcfa6015593ebbe5
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35804
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Alex Zhuravlev [Thu, 15 Aug 2019 18:33:08 +0000 (22:33 +0400)]
LU-12612 osd: add lnb size down to osd
Pass the lnb size down to the OSD so that each OSD can check for
lnb array overflow. The patch isn't final: a proper implementation
in osd-zfs and a new test will follow.
Lustre-change: https://review.whamcloud.com/35801
Lustre-commit: 8033f80de3d0db87f7e965078ceee62033adb58d
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I43683c84e48006b4075f9a8b3e87cdfeae28c02b
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36273
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Patrick Farrell [Sun, 21 Jul 2019 17:06:37 +0000 (13:06 -0400)]
LU-12569 o2iblnd: Make credits hiw connection aware
The IBLND_CREDITS_HIGHWATER mark check currently looks only
at the global peer credits tunable, ignoring the connection
specific queue depth when determining the threshold at
which to send a NOOP message to return credits.
This is incorrect because while connection queue depth
defaults to the same as peer credits, it can be less than
that global value for specific connections.
So we must check for this case when setting the threshold.
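One way to picture a connection-aware threshold is a simple clamp. This sketch assumes the threshold is limited to one less than the connection's queue depth; the message does not give the exact formula, and `credits_highwater()` is a hypothetical helper, not the o2iblnd macro.

```c
#include <assert.h>
#include <stdint.h>

/* Return the credit-return threshold for one connection.
 * peer_credits_hiw is the global tunable; conn_queue_depth is the
 * depth negotiated for this connection, which may be smaller.
 * Assumption: the per-connection limit is queue depth minus one. */
static uint32_t credits_highwater(uint32_t peer_credits_hiw,
                                  uint32_t conn_queue_depth)
{
    assert(conn_queue_depth > 0);

    uint32_t conn_limit = conn_queue_depth - 1;

    return peer_credits_hiw < conn_limit ? peer_credits_hiw : conn_limit;
}
```

With a global high-water mark of 7 and a connection whose queue depth was reduced to 4, the sketch yields 3, so a NOOP can still be triggered on that connection.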
Lustre-change: https://review.whamcloud.com/35578
Lustre-commit: 1b87e8f61781e48c31b4da647214d66addf2b90c
Test-Parameter: nettype=o2ib
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ie028ae11cdbd0f75a38b265b7ab5830f92f08d90
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36254
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Amir Shehata [Thu, 5 Sep 2019 21:28:48 +0000 (14:28 -0700)]
LU-12385 lnet: update opa defaults
Testing reveals no significant performance improvements
when using peer_credits > 32. Adjusted the default
peer_credits, peer_credits_hiw and concurrent_sends
to take that into account.
This has the advantage of avoiding an issue observed
on multiple opa sites where the qp can not be created because
of large initial queue_depth. The queue depth is then
reduced gradually until the qp creation succeeds.
Lustre-change: https://review.whamcloud.com/36072
Lustre-commit: 7f199dbf0261b89afe0dc8185db4403ae0efdefa
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I6036ec1da7063e30b567446e5db89040f21bc701
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36252
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Amir Shehata [Fri, 6 Sep 2019 01:15:10 +0000 (18:15 -0700)]
LU-12621 o2iblnd: cache max_qp_wr
When creating the device the maximum number of work requests per qp
which can be allocated is already known. Cache that internally,
and when creating the qp make sure the qp's max_send_wr does not
exceed that max. If it does then cap max_send_wr to max_qp_wr.
Recalculate the connection's queue depth based on the max_qp_wr.
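The capping step can be sketched as follows. The relationship between queue depth and work requests used here (a fixed number of work requests per message) is an assumption made for illustration, and `struct qp_plan` and `plan_qp()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Result of sizing a queue pair for one connection. */
struct qp_plan {
    uint32_t max_send_wr;  /* send work requests requested for the qp */
    uint32_t queue_depth;  /* messages that may be in flight */
};

/* Size the qp from the desired queue depth, then cap it at the cached
 * device limit and recompute the depth that still fits. */
static struct qp_plan plan_qp(uint32_t queue_depth, uint32_t wrs_per_msg,
                              uint32_t max_qp_wr)
{
    struct qp_plan plan;

    assert(wrs_per_msg > 0);

    plan.max_send_wr = queue_depth * wrs_per_msg;
    plan.queue_depth = queue_depth;

    if (plan.max_send_wr > max_qp_wr) {
        /* Device cannot provide that many work requests: cap and shrink. */
        plan.max_send_wr = max_qp_wr;
        plan.queue_depth = max_qp_wr / wrs_per_msg;
    }
    return plan;
}
```

Knowing the limit up front lets the depth be computed in one step instead of retrying qp creation with gradually smaller values.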
Lustre-change: https://review.whamcloud.com/36073
Lustre-commit: 7ee319ed7f9dfa365a66b20b03f2141c54fb0293
Test-Parameter: nettype=o2ib
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I6d9a642d03633264f5f14445a051dd14515709c1
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36253
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Mr NeilBrown [Wed, 11 Sep 2019 18:26:47 +0000 (14:26 -0400)]
LU-11542 import: Fix missing spin_unlock()
A recent patch moved the spin_unlock() down into
each branch of an 'if', but missed the final 'else'.
Add the spin_unlock in the else.
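The pattern behind this bug is easy to reproduce in userspace. The sketch uses a toy lock that asserts on misuse rather than a kernel spinlock; `struct toy_lock`, `enum toy_state` and `handle_state()` are hypothetical and do not correspond to the import code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock that only tracks whether it is held. */
struct toy_lock {
    bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
    assert(!l->held);
    l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

enum toy_state { TOY_A, TOY_B, TOY_OTHER };

/* The unlock has been moved into each branch, so every branch,
 * including the final else, must release the lock. */
static int handle_state(struct toy_lock *l, enum toy_state s)
{
    int rc;

    toy_lock_acquire(l);
    if (s == TOY_A) {
        toy_lock_release(l);
        rc = 1;
    } else if (s == TOY_B) {
        toy_lock_release(l);
        rc = 2;
    } else {
        /* The branch that is easy to miss. */
        toy_lock_release(l);
        rc = 0;
    }
    return rc;
}
```

Calling `handle_state()` twice in a row with `TOY_OTHER` would trip the assertion in `toy_lock_acquire()` if the final release were missing.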
Lustre-change: https://review.whamcloud.com/35999
Lustre-commit: 3dbdd38a6adcee63b6d89d4656e0099a0006f26c
Fixes: 29904135df67 ("LU-11542 import: fix race between imp_state & imp_invalid")
Signed-off-by: Mr NeilBrown <neilb@suse.com>
Change-Id: I6ee399050aad0fe9df9c0e3ddf8ec0be8eae1641
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36251
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Yang Sheng [Mon, 15 Oct 2018 09:37:21 +0000 (17:37 +0800)]
LU-11542 import: fix race between imp_state & imp_invalid
We set the import to LUSTRE_IMP_DISCON and then deactivate it when
it is unreplayable. Someone may set this import up between
those two operations, so we will get an invalid import with
FULL state.
Lustre-change: https://review.whamcloud.com/33395
Lustre-commit: 29904135df671c624b1e542fdda94b221d76e667
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Ib4cec0bcaf6f4b221ba260edb94749a4e523f5e6
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35796
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Alex Zhuravlev [Tue, 23 Jul 2019 13:53:22 +0000 (17:53 +0400)]
LU-12090 utils: lfs rmfid
A new RPC_REINT_RMFID has been introduced by the patch.
It is supposed to be used with the corresponding llapi_rmfid()
to unlink a batch of MDS files by their FIDs. The caller
must have permission to modify the parent dir(s) and the objects
themselves.
Lustre-change: https://review.whamcloud.com/34449
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ib22379033aca92692e0e219671ca0c2ec7893c24
Reviewed-on: https://review.whamcloud.com/35595
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Gu Zheng [Fri, 30 Aug 2019 07:27:30 +0000 (03:27 -0400)]
LU-12705 build: fix building fail against Power9 little endian
We use "%ll[dux]" as the input/output modifier for __u64 variables,
which may cause build errors on architectures that use "long"
for 64-bit types, for example Power9 little endian.
Add the necessary typecasts (long long/unsigned long long) to
make the build correct.
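The cast described above looks like this in portable C. The sketch uses `uint64_t` from `<stdint.h>` as a stand-in for `__u64`, and `format_u64()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a 64-bit value with "%llu". The explicit cast keeps the
 * argument type in step with the format on platforms where the
 * 64-bit typedef is "unsigned long" rather than "unsigned long long". */
static int format_u64(char *buf, size_t len, uint64_t value)
{
    return snprintf(buf, len, "%llu", (unsigned long long)value);
}
```

Without the cast, compilers that check format strings report a mismatch between "%llu" and an `unsigned long` argument, which becomes a build failure when warnings are treated as errors.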
Lustre-change: https://review.whamcloud.com/36007
Lustre-commit: 4eddf36ac3607c66c172668b30eb5dcf921e3de4
Test-Parameters: trivial
Change-Id: I2e8569f4ac14f7d328a29d153ff57c7834cabc46
Signed-off-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36207
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Andreas Dilger [Wed, 21 Aug 2019 14:03:11 +0000 (10:03 -0400)]
LU-11729 obdclass: align to T10 sector size when generating guard
Otherwise the client and server would compute different
checksums when their page sizes differ.
Improve test_810 to verify all available checksum types.
Test-Parameters: trivial envdefinitions=ONLY=810 testlist=sanity,sanity,sanity
Test-Parameters: clientarch=aarch64 envdefinitions=ONLY=810 testlist=sanity,sanity
Test-Parameters: clientarch=ppc64 envdefinitions=ONLY=810 testlist=sanity,sanity
Lustre-change: https://review.whamcloud.com/34043
Lustre-commit: 98ceaf854bb4738305769c5cd1df556ee99aa859
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I24117aebb277d4ddcb7787b715587e33023ebbe5
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36205
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Jian Yu [Mon, 16 Sep 2019 19:15:03 +0000 (12:15 -0700)]
LU-11485 lod: disallow setting the last non-stale mirror as stale
"lfs setstripe" allows setting the stale flag on the last
non-stale mirror of a file, which leaves the file with
no valid component to read, so reads return an I/O error.
This patch fixes the issue by disallowing that.
It also prevents "lfs mirror split" from destroying the
last non-stale mirror of a file.
This patch is back-ported from the following one:
Lustre-commit: 29be32a759f696006a539d3cff74ca55a281aa64
Lustre-change: https://review.whamcloud.com/36141
Change-Id: I6934cfe0190cd1ea83de1cf28ddf840b9f96193a
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36195
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Alex Zhuravlev [Mon, 16 Sep 2019 19:07:20 +0000 (12:07 -0700)]
LU-11022 lfs: remove mirror by pool name
lfs mirror split --pool <poolname> <file>
This patch is back-ported from the following one:
Lustre-commit: 0c710a46cfb43366dc57ff6e83e414086b1d0e6c
Lustre-change: https://review.whamcloud.com/35329
Change-Id: I012e68729b94657236ba3fc530fc7b7485529ed2
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36194
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Emoly Liu [Mon, 9 Sep 2019 08:10:29 +0000 (16:10 +0800)]
LU-12602 mdt: more EA size check in mdt_getxattr_pack_reply()
While the RMF_EAVALS field size can be arbitrary length,
the RMF_EAVALS_LENS field definition specifies
the RMF_F_STRUCT_ARRAY flag, so the passed size must be a multiple
of sizeof(__u32) or the internal LBUG() will trigger.
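The size constraint on a length array can be checked with a single modulo test. This is a minimal sketch with `uint32_t` standing in for `__u32`; `lens_size_ok()` is a hypothetical helper, not the function added by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A field declared as an array of 32-bit lengths can only hold whole
 * elements, so its byte size must be a multiple of the element size. */
static bool lens_size_ok(size_t size)
{
    return size % sizeof(uint32_t) == 0;
}
```

Rejecting a bad size up front lets the caller return an error instead of reaching the internal assertion.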
Lustre-change: https://review.whamcloud.com/36103
Lustre-commit: 4d8bc239c2c30a47e8833cf23db6ccd39ff61705
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: I767e1b1496298e9a66274fc324f9c34daaed4a09
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36208
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
James Simmons [Tue, 15 Jan 2019 16:25:54 +0000 (11:25 -0500)]
LU-8130 libcfs: port working hash from upstream
The hash_[32|64] functions in pre-4.6 kernels produce hashes
with poor distributions, which result in high collision rates.
Backport those improvements for the pre-4.6 kernels Lustre
supports. Details can be read here:
https://lwn.net/Articles/687494
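For orientation, the multiplicative hashes used by newer kernels can be sketched in userspace C as follows. The constants are the golden-ratio values as I recall them from the upstream `linux/hash.h`; treat both the constants and the `_sketch` function names as assumptions to verify against the kernel source rather than as the backported code.

```c
#include <assert.h>
#include <stdint.h>

/* Golden-ratio multipliers (assumed to match upstream linux/hash.h). */
#define GOLDEN_RATIO_32 0x61C88647u
#define GOLDEN_RATIO_64 0x61C8864680B583EBull

/* Multiply, then keep the top "bits" bits, which mix best. */
static uint32_t hash_32_sketch(uint32_t val, unsigned int bits)
{
    assert(bits >= 1 && bits <= 32);
    return (uint32_t)(val * GOLDEN_RATIO_32) >> (32 - bits);
}

static uint32_t hash_64_sketch(uint64_t val, unsigned int bits)
{
    assert(bits >= 1 && bits <= 32);
    return (uint32_t)((val * GOLDEN_RATIO_64) >> (64 - bits));
}
```

Taking the high bits of the product matters because the low bits of a multiplicative hash depend only on the low bits of the input.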
Lustre-change: https://review.whamcloud.com/33789
Lustre-commit: 1658ae30a0e97e7f4018d8cba67e459078470d1a
Test-Parameters: trivial
Change-Id: Id2436ba8be2d3ed482c5386b79710f594d5b3e59
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35179
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Sebastien Buisson [Tue, 9 Apr 2019 12:58:20 +0000 (14:58 +0200)]
LU-12131 tests: only create lgssc.conf file if necessary
The lgssc.conf file is now packaged by Lustre and installed under
/etc/request-key.d/.
So, unless run from the build tree, init_gss() must no longer create
its own copy. Adjust the corresponding commands in init_gss() and
cleanup_sk() accordingly.
Lustre-change: https://review.whamcloud.com/34520
Lustre-commit: 66919f2b687f8b15679e6ff4e22a3f66f7d1c13a
Fixes: e299df1e9eea ("LU-7854 gss: install lgssc.conf under /etc/request-key.d")
Whamcloud-bug-id: ATM-1283
Test-Parameters: envdefinitions=SHARED_KEY=true testlist=sanity-sec
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I9cc76fddb8a622d7c40d6348913df42ae063254a
Reviewed-on: https://review.whamcloud.com/35557
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Bobi Jam [Wed, 24 Jul 2019 13:24:01 +0000 (21:24 +0800)]
LU-12581 osc: prevent use after free
Clear aa_oa after it's been freed to prevent use after free.
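The idiom is the usual free-then-clear pair. In this sketch `struct async_args` and `async_args_release_oa()` are hypothetical, and the `int` payload merely stands in for whatever `aa_oa` points to in the real code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container holding a heap pointer named after the
 * field mentioned in the commit message. */
struct async_args {
    int *aa_oa;
};

/* Free the object and clear the pointer in one place, so a later
 * access sees NULL instead of a dangling address. */
static void async_args_release_oa(struct async_args *aa)
{
    free(aa->aa_oa);
    aa->aa_oa = NULL;
}
```

Clearing the pointer also makes a repeated release harmless, since `free(NULL)` is a no-op.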
Lustre-change: https://review.whamcloud.com/35601
Lustre-commit: 61c9f8797771c951ecd240981d7d97d5adc685e0
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Idf122aa53fe5b13c07337745e5a26763e8712be2
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36210
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Nikitas Angelinas [Wed, 24 Jul 2019 09:43:53 +0000 (02:43 -0700)]
LU-11675 hsm: don't allow new HSM requests during CDT_INIT
When the HSM CDT is shut down and restarted, it resets cdt_last_cookie
using ktime_get_real_seconds() and examines the CDT llog for existing
requests, in order to set cdt_last_cookie to the highest known value,
so that newly-assigned cookies are unique. There is a window between
CDT_INIT and CDT_RUNNING during which new requests can arrive, and if
the CDT llog has not been fully examined, cookies can be reused. This
can cause the following two assertions to be triggered in
cdt_agent_record_hash_add():
LASSERT(carl0->carl_cat_idx == carl1->carl_cat_idx);
LASSERT(carl0->carl_rec_idx == carl1->carl_rec_idx);
Fix this by not allowing new HSM requests during CDT_INIT.
Also, cookie values are incremented on a separate line, which causes
one value to be skipped at CDT startup time. This is not an issue, but
there does not seem to be a need for it; fix this by post-incrementing
and assigning cookie values on the same line.
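Both changes in this message fit in a few lines of C. The sketch is a simplified model: `struct coordinator`, the two-value state enum and `new_request_cookie()` are hypothetical, and the choice of `-EAGAIN` as the rejection code is an assumption, since the message does not name the error returned.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Simplified coordinator states; the real CDT has more. */
enum cdt_state { CDT_INIT, CDT_RUNNING };

struct coordinator {
    enum cdt_state state;
    uint64_t last_cookie;
};

/* Hand out a cookie for a new request. Requests are refused while the
 * coordinator is still initializing, because last_cookie may not yet
 * reflect the highest value recorded in the llog. */
static int new_request_cookie(struct coordinator *cdt, uint64_t *cookie)
{
    if (cdt->state == CDT_INIT)
        return -EAGAIN; /* assumed error code, for illustration */

    /* Assign and post-increment in one statement: no value is skipped. */
    *cookie = cdt->last_cookie++;
    return 0;
}
```

A refused request leaves `last_cookie` untouched, so the first cookie issued after startup is exactly the value computed during initialization.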
Lustre-change: https://review.whamcloud.com/33671
Lustre-commit: 39862136c3cfee127c4b0a9604ff12f560af3124
Signed-off-by: Nikitas Angelinas <nangelinas@cray.com>
Cray-bug-id: LUS-6589
Test-Parameters: trivial testlist=sanity-hsm
Change-Id: I18a1c3e85de6c50a9bf1ce598e21d83d893ad0ca
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36212
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Emoly Liu [Thu, 29 Aug 2019 06:15:15 +0000 (14:15 +0800)]
LU-12613 ptlrpc: check buffer length in lustre_msg_string()
Check the buffer length in lustre_msg_string() to prevent invalid
access.
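A length-checked string accessor can be sketched as follows. `msg_string()` is a hypothetical userspace helper, not the ptlrpc function; the point is that the terminator must be found inside the buffer before the pointer is handed out as a C string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return buf if it holds a NUL-terminated string within buflen bytes
 * and, when max_len is non-zero, the string is no longer than max_len.
 * Return NULL otherwise. */
static const char *msg_string(const char *buf, size_t buflen, size_t max_len)
{
    const char *nul;
    size_t slen;

    if (buf == NULL || buflen == 0)
        return NULL;

    /* Search only inside the buffer; never read past buflen. */
    nul = memchr(buf, '\0', buflen);
    if (nul == NULL)
        return NULL;

    slen = (size_t)(nul - buf);
    if (max_len != 0 && slen > max_len)
        return NULL;

    return buf;
}
```

Using `memchr()` with the buffer length, rather than `strlen()`, is what keeps an unterminated buffer from causing an out-of-bounds read.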
Lustre-change: https://review.whamcloud.com/35932
Lustre-commit: 728c58d60faef288eb7d05d8809fa2b1a55ade89
Change-Id: I286000db16384938a594bd8d104e5f3d0fff585a
Reported-by: Alibaba Cloud <yunye.ry@alibaba-inc.com>
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Yunye Ry <yunye.ry@alibaba-inc.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36209
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Bobi Jam [Mon, 16 Sep 2019 18:56:47 +0000 (11:56 -0700)]
LU-10258 lfs: lfs mirror copy command
Add "lfs mirror copy" command to copy a mirror's content to other
mirror(s) of a mirrored file.
Usage:
lfs mirror copy {--read-mirror|-i <id0>}
{--write-mirror|-o <id1>[,<id2>,...]} <mirrored_file>
Options:
--read-mirror|-i <id0>
This option specifies the mirror whose content is read. The id0 is
the numerical unique identifier of that mirror.
--write-mirror|-o <id1>[,<id2>,...]
This option specifies the mirror(s) to be written, given as a
comma-separated list of mirror IDs. A mirror ID of -1 means that
all mirrors other than the read mirror are written.
Note:
Beware that the written mirror(s) will be marked as non-stale.
After using this command, a file can therefore have non-stale
mirrors that contain different contents.
This patch is back-ported from the following one:
Lustre-commit: c6e7c0788d7cd766880d12eae6679782283dc479
Lustre-change: https://review.whamcloud.com/33220
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Id138368cdb29ec14b7c03a5db3b2dd1e0db5ea37
Reviewed-on: https://review.whamcloud.com/36193
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Sebastien Buisson [Thu, 27 Jun 2019 10:08:17 +0000 (12:08 +0200)]
LU-12131 tests: fix test_802 for GSS
test_802 should not overwrite already existing client mount options
when trying to mount client as read-only.
Lustre-change: https://review.whamcloud.com/35335
Lustre-commit: a51d0653cf46fc898da01f86c26cc0f4f5beff5a
Test-Parameters: trivial
Test-Parameters: envdefinitions=ONLY=802 testlist=sanity
Test-Parameters: envdefinitions=SHARED_KEY=true,ONLY=802 testlist=sanity
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I8189c245870fb0caf48006db11621f0af48e1878
Reviewed-on: https://review.whamcloud.com/35535
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Amir Shehata [Fri, 19 Apr 2019 00:12:49 +0000 (17:12 -0700)]
LU-12201 lnet: detach response tracker
We need to unlink the response tracker from MDs even if the
corresponding message failed to send.
Lustre-change: https://review.whamcloud.com/34770
Lustre-commit: 1bb91b966d15345b4c89245d51f6cb631b052779
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I4f320274576790e3332f66f30aad5c2b3450b955
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36033
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Artem Blagodarenko [Fri, 9 Aug 2019 19:19:29 +0000 (22:19 +0300)]
LU-12650 lib: fix strings comparison during mount searching
get_root_path() returns the path to the "lustre" mount instead of
"lustre1" because the last character is not taken into account during
the comparison. This bug affects all get_root_path() users.
For example, fid2path uses get_root_path().
lfs path2fid /mnt/lustre2/foodir3
[0x200000401:0x1:0x0]
lfs fid2path lustre2 [0x200000401:0x1:0x0]
lfs fid2path: cannot find '[0x200000401:0x1:0x0]': No such file or
directory
umount /mnt/lustre
lfs fid2path lustre2 [0x200000401:0x1:0x0]
foodir3
This fix adds a string length comparison.
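The failure mode can be reproduced with two small comparison helpers. Both functions are hypothetical: `fsname_matches_prefix_only()` shows one plausible form of a comparison that ignores trailing characters, not necessarily the original code, and `fsname_matches()` adds the length check.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Prefix-only comparison: "lustre" wrongly matches "lustre2" because
 * only strlen("lustre") characters are compared. */
static bool fsname_matches_prefix_only(const char *mnt_fsname,
                                       const char *wanted)
{
    return strncmp(mnt_fsname, wanted, strlen(mnt_fsname)) == 0;
}

/* Comparison with a length check: the names must be the same length
 * as well as share the same characters. */
static bool fsname_matches(const char *mnt_fsname, const char *wanted)
{
    size_t len = strlen(mnt_fsname);

    return len == strlen(wanted) && strncmp(mnt_fsname, wanted, len) == 0;
}
```

With both "lustre" and "lustre2" mounted, the prefix-only form can select the wrong mount first, which is consistent with fid2path succeeding only after /mnt/lustre is unmounted in the transcript.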
Lustre-change: https://review.whamcloud.com/35755
Lustre-commit: 0817efd73f04bf59d1234887bc3971d2d067067e
Signed-off-by: Artem Blagodarenko <c17828@cray.com>
Cray-bug-id: LUS-7693
Change-Id: I3275d2182486d25389814f4c25b3f2a54ec29469
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36211
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Emoly Liu [Wed, 14 Aug 2019 07:52:58 +0000 (15:52 +0800)]
LU-12602 mdt: check EA size in mdt_getxattr_pack_reply()
Check the EA data size (non-positive or excessively large) to guard
against corruption.
Lustre-change: https://review.whamcloud.com/35768
Lustre-commit: 915135c37cbfa6851a5ec732afd20955eb020566
Change-Id: I8ccea214f8d7c0403a9df180acf487ee381b8d77
Reported-by: Alibaba Cloud <yunye.ry@alibaba-inc.com>
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35936
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Oleg Drokin [Sat, 17 Aug 2019 05:43:36 +0000 (01:43 -0400)]
LU-12614 ldlm: ldlm_cancel_hpreq_check should check lock count
Make sure the number of locks we are going to cancel fits into
the supplied buffer first.
This is similar to LU-12603, just in a different place.
Lustre-change: https://review.whamcloud.com/35807
Lustre-commit: 2b7af478bdbf5c6701e0e49aefe34597bdee3126
Change-Id: Ifa2aa976ce8613217c739ef609de54538c57b5e9
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reported-by: Alibaba Cloud <yunye.ry@alibaba-inc.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yunye Ry <yunye.ry@alibaba-inc.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36107
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Sebastien Buisson [Wed, 31 Jul 2019 16:12:40 +0000 (18:12 +0200)]
LU-12604 mdt: check field size of sec context name
In a request received from a client, check that the claimed size of
the RMF_FILE_SECCTX_NAME field is consistent with the expected content,
which is supposed to be an extended attribute name.
Lustre-change: https://review.whamcloud.com/35655
Lustre-commit: 384cd84489c9a7aa3145560002eb7a053cf4b2db
Test-Parameters: clientselinux testlist=sanity,recovery-small,sanity-selinux envdefinitions=SANITY_EXCEPT="271f"
Reported-by: Alibaba Cloud <yunye.ry@alibaba-inc.com>
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ice96f0e03f790b334fcdf64ae4becef2e39738f4
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35868
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Emoly Liu [Thu, 29 Aug 2019 02:55:13 +0000 (10:55 +0800)]
LU-12590 ptlrpc: check lm_bufcount and lm_buflen
Check lm_bufcount before it is used by lustre_msg_hdr_size_v2() and
validate individual and total buffer lengths in
lustre_unpack_msg_v2() to prevent out-of-bounds reads.
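The kind of validation described here can be sketched for a simplified message layout: a fixed header, an array of per-buffer lengths, then the buffers themselves. The layout, the two `SKETCH_` constants and `msg_lengths_ok()` are assumptions for illustration; the real wire format also rounds sizes for alignment, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FIXED_HDR    32u /* hypothetical fixed header size */
#define SKETCH_MAX_BUFCOUNT 64u /* hypothetical sanity limit */

/* Check that the buffer count is sane, that the length array fits in
 * the received data, and that each buffer and their total fit too. */
static bool msg_lengths_ok(size_t received_len, uint32_t bufcount,
                           const uint32_t *buflens)
{
    size_t required;
    uint32_t i;

    if (bufcount == 0 || bufcount > SKETCH_MAX_BUFCOUNT)
        return false;

    required = SKETCH_FIXED_HDR + (size_t)bufcount * sizeof(uint32_t);
    if (received_len < required)
        return false; /* the length array itself would be out of bounds */

    for (i = 0; i < bufcount; i++) {
        /* Compare against the remaining space to avoid overflow. */
        if (buflens[i] > received_len - required)
            return false;
        required += buflens[i];
    }
    return true;
}
```

The count is validated before the length array is touched, so an attacker-controlled count cannot drive the loop past the received data.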
Lustre-change: https://review.whamcloud.com/35783
Lustre-commit: 268edb13d769994c4841864034d72f0bd7b36e12
Change-Id: I4905e0665c7770443684cffe504935d27473d7c6
Reported-by: Alibaba Cloud <yunye.ry@alibaba-inc.com>
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Yunye Ry <yunye.ry@alibaba-inc.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36119
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Jian Yu [Thu, 12 Sep 2019 07:08:25 +0000 (00:08 -0700)]
LU-12608 kernel: kernel update RHEL7.6 [3.10.0-957.27.2.el7]
Update RHEL7.6 kernel to 3.10.0-957.27.2.el7.
Test-Parameters: clientdistro=el7.6 serverdistro=el7.6
Change-Id: I8dd5e24746ccf11467c7a468edf7f9056d5705e3
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35639
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Jian Yu [Fri, 6 Sep 2019 07:24:35 +0000 (00:24 -0700)]
LU-12724 kernel: kernel update RHEL7.7 [3.10.0-1062.1.1.el7]
Update RHEL7.7 kernel to 3.10.0-1062.1.1.el7.
Test-Parameters: trivial clientdistro=el7.7 serverdistro=el7.7
Change-Id: Iad40fb93b8a15d875b72749a05666a23e4755fcc
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36075
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Amir Shehata [Sun, 17 Mar 2019 15:16:40 +0000 (08:16 -0700)]
LU-12080 lnet: recovery event handling broken
Don't increment health on unlink event.
If a SEND fails an unlink will follow so no need to do any
special processing on SEND event. If SEND succeeds then we
wait for the reply.
When queuing a message on the NI recovery queue only do so
if the MT thread is still running.
Lustre-change: https://review.whamcloud.com/34445
Lustre-commit: 5409e620e0256dc9b657f1c457541d7411b543cd
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I4877caebcac5cdfc35a59a18a3e3451b1f23cb0d
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36028
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Oleg Drokin [Thu, 12 Sep 2019 18:04:55 +0000 (18:04 +0000)]
Revert "LU-11816 lnet: setup health timeout defaults"
This is causing frequent assertion failures like below:
LNetError: 1701:0:(lib-move.c:3670:lnet_monitor_thr_stop()) ASSERTION( rc == 0 ) failed:
[ 378.662897] LNetError: 1701:0:(lib-move.c:3670:lnet_monitor_thr_stop()) LBUG
[ 378.665136] Pid: 1701, comm: rmmod 3.10.0-7.6-debug #1 SMP Fri Jul 12 02:40:17 EDT 2019
[ 378.667455] Call Trace:
[ 378.668302] [<ffffffffa01927dc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[ 378.670463] [<ffffffffa019288c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[ 378.672398] [<ffffffffa021d036>] lnet_monitor_thr_stop+0xe6/0x120 [lnet]
[ 378.674727] [<ffffffffa01fde8a>] LNetNIFini+0x6a/0x110 [lnet]
[ 378.676532] [<ffffffffa0622b15>] ptlrpc_ni_fini+0x175/0x200 [ptlrpc]
[ 378.678598] [<ffffffffa0622e53>] ptlrpc_exit_portals+0x13/0x20 [ptlrpc]
[ 378.680850] [<ffffffffa06b59aa>] ptlrpc_exit+0x22/0x678 [ptlrpc]
[ 378.683338] [<ffffffff81108aab>] SyS_delete_module+0x19b/0x300
[ 378.684809] [<ffffffff817c8e15>] system_call_fastpath+0x1c/0x21
[ 378.686727] [<ffffffffffffffff>] 0xffffffffffffffff
[ 378.688144] Kernel panic - not syncing: LBUG
This reverts commit db81f3f293dbc0c9dba90ea1153f554b33fbb80b.
Change-Id: Id12f9d3ec4af3ab37158b3e6049d2ea971d86913
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36173