Whamcloud - gitweb
fs/lustre-release.git
5 years ago LU-8384 scripts: Add scripts to systemd for EL7 57/21457/6
Dmitry Eremin [Fri, 8 Jul 2016 21:15:37 +0000 (00:15 +0300)]
LU-8384 scripts: Add scripts to systemd for EL7

When rebooting a Lustre client where a Lustre filesystem is still
mounted, the shutdown hangs. This patch creates a systemd service
that unmounts the Lustre filesystems and unloads the Lustre modules
when the system is shut down.

Test-Parameters: trivial
Change-Id: I1cfe84684e23b8861743241dfbc4d6e320ace4a6
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Signed-off-by: Gregoire Pichon <gregoire.pichon@atos.net>
Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/21457
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-8047 llite: optimizations for not granted lock processing 65/19665/13
Andrew Perepechko [Thu, 7 Mar 2019 20:18:45 +0000 (12:18 -0800)]
LU-8047 llite: optimizations for not granted lock processing

This patch removes ll_md_blocking_ast() processing for
not granted locks. The reason is that
ll_invalidate_negative_children() can needlessly slow down I/O
significantly if there are thousands or millions of files in the
directory cache.

Change-Id: Ic69c5f02f71c14db4b9609677d102dd2993f4feb
Seagate-bug-id: MRP-3409
Signed-off-by: Andrew Perepechko <c17827@cray.com>
Reviewed-on: https://review.whamcloud.com/19665
Tested-by: Jenkins
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-6836 test: re-add test 4a to sanity-quota for ZFS 43/34143/4
Hongchao Zhang [Thu, 24 Jan 2019 19:45:05 +0000 (14:45 -0500)]
LU-6836 test: re-add test 4a to sanity-quota for ZFS

ZFS sync performance has improved, so it is time to add test
4a back into sanity-quota for ZFS, and also to increase the grace
time a little for ZFS.

Test-Parameters: trivial
Test-Parameters: envdefinitions=ONLY=4a fstype=zfs testlist=sanity-quota,sanity-quota,sanity-quota

Change-Id: I32dd76686cdd289b49e36efff3abd6691e76ef57
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34143
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
5 years ago LU-11835 mdt: return DOM size on open resend 44/34044/5
Mikhail Pershin [Wed, 16 Jan 2019 13:24:58 +0000 (16:24 +0300)]
LU-11835 mdt: return DOM size on open resend

The DOM size is always returned along with the DOM lock, but this
is not the case on open resend.

This patch fixes that issue and adds a test case.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I73d43933f781f192e9aa8c6ee388a043dab5bde9
Reviewed-on: https://review.whamcloud.com/34044
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-8260 osd-ldiskfs: osd_fiemap_get() fix address space mismatch 78/33878/6
Arshad Hussain [Tue, 4 Dec 2018 18:20:59 +0000 (23:50 +0530)]
LU-8260 osd-ldiskfs: osd_fiemap_get() fix address space mismatch

There was an address space mismatch in the function
osd_fiemap_get(), which uses the "__user" qualifier
on the fiemap_extent buffer. Since this buffer is created
in the kernel and then passed to another call, this
may fail under some configurations.

This patch addresses the issue by adjusting the
address space limit with get_fs() and set_fs()
calls, indicating that the pointers are intact and
secure.

Change-Id: I25048faecd3475d5e91e25e6a47e065e49e36b26
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/33878
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-6142 obdclass: Fix style issues for obd_config.c 82/33082/9
Arshad Hussain [Sat, 25 Aug 2018 23:59:42 +0000 (05:29 +0530)]
LU-6142 obdclass: Fix style issues for obd_config.c

This patch fixes issues reported by checkpatch
for file lustre/obdclass/obd_config.c

Change-Id: If97513fe594ee76c9e153c33d644cd94c48f82c0
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/33082
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11999 dne: performance improvement for file creation 91/34291/4
Jinshan Xiong [Sun, 24 Feb 2019 22:32:41 +0000 (14:32 -0800)]
LU-11999 dne: performance improvement for file creation

This removes obsolete code that causes drastic performance
degradation. The code was written before the PERM lock was
introduced, and it requests an UPDATE lock at path walk for a
remote directory, which is then cancelled at later file creation.

Test results before and after this patch is applied:

Test case:
rm -rf /mnt/lustre_purple/testdir
lfs mkdir -i 0 /mnt/lustre_purple/testdir
lfs mkdir -i 2 /mnt/lustre_purple/testdir/dir2
./lustre-release/lustre/tests/createmany -o \
/mnt/lustre_purple/testdir/dir2/f 10000

Before the patch is applied:
total: 10000 open/close in 12.82 seconds: 780.22 ops/second

After the patch is applied:
total: 10000 open/close in 4.89 seconds: 2044.75 ops/second

Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Change-Id: Ib474dc28d6edc7d15801b6821edc0e1d108bb4b6
Reviewed-on: https://review.whamcloud.com/34291
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11964 mdc: prevent glimpse lock count grow 61/34261/8
Mikhail Pershin [Thu, 14 Feb 2019 21:51:00 +0000 (00:51 +0300)]
LU-11964 mdc: prevent glimpse lock count grow

DOM lock matching tries to ignore locks with the
LDLM_FL_KMS_IGNORE flag during ldlm_lock_match(), but
checks for it only after the ldlm_lock_match() call. Therefore,
if there is any lock with such a flag in the queue, all other
locks after it are ignored and a new lock is created, causing
a large number of locks on a single resource in some access
patterns.
This patch extends the lock_matches() function to check for flags
to exclude and adds ldlm_lock_match_with_skip() to use that
when needed.
A corresponding test was added to sanity-dom.sh.
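
The difference between checking the flag after the match and skipping
flagged locks during the scan can be sketched as follows. This is a
simplified illustration only: struct lock_sketch, match_first(),
match_with_skip() and the flag value are hypothetical stand-ins, not
the actual LDLM structures or the code added by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FL_KMS_IGNORE 0x1u	/* hypothetical value, not the real flag */

struct lock_sketch {
	int mode;		/* simplified stand-in for the lock mode */
	uint32_t flags;
};

/* Pre-patch shape: first mode match wins, flags are checked by the caller. */
static const struct lock_sketch *
match_first(const struct lock_sketch *queue, size_t n, int mode)
{
	for (size_t i = 0; i < n; i++)
		if (queue[i].mode == mode)
			return &queue[i];
	return NULL;
}

/* Post-patch shape: locks with an excluded flag are skipped during the scan. */
static const struct lock_sketch *
match_with_skip(const struct lock_sketch *queue, size_t n, int mode,
		uint32_t skip_flags)
{
	for (size_t i = 0; i < n; i++) {
		if (queue[i].flags & skip_flags)
			continue;
		if (queue[i].mode == mode)
			return &queue[i];
	}
	return NULL;
}

/* Usage example: a flagged lock sits in front of a usable one. */
static void match_example(void)
{
	const struct lock_sketch queue[] = {
		{ .mode = 1, .flags = SKETCH_FL_KMS_IGNORE },
		{ .mode = 1, .flags = 0 },
	};
	const struct lock_sketch *found = match_first(queue, 2, 1);

	/* The old scan stops at the flagged lock, so the caller rejects it. */
	assert(found == &queue[0]);
	assert(found->flags & SKETCH_FL_KMS_IGNORE);

	/* The skipping scan reaches the usable lock behind it. */
	assert(match_with_skip(queue, 2, 1, SKETCH_FL_KMS_IGNORE) == &queue[1]);
}
```

In the first variant the caller sees only the flagged lock and would
enqueue a new one, which is the lock count growth described above.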

Test-Parameters: testlist=sanity-dom
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ic45ca10f0e603e79a3a00e4fde13a5fae15ea5fc
Reviewed-on: https://review.whamcloud.com/34261
Tested-by: Jenkins
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-10949 mdt: lost reference on mdt_md_root 81/34181/3
Andriy Skulysh [Wed, 20 Feb 2019 10:48:03 +0000 (12:48 +0200)]
LU-10949 mdt: lost reference on mdt_md_root

mdt_remote_object_lock_try() drops the object
reference in case of an error, but if the
request was sent to a server the reference is
dropped again via failed_lock_cleanup().

Add ldlm_created_callback. It is called after
lock creation, so we can safely add a reference
to l_ast_data and drop it only in the BL AST handler.

Cray-bug-id: LUS-7013
Change-Id: Iaf98c620804f2de4528689e44e957a9fb0073162
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-on: https://review.whamcloud.com/34181
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-10496 ofd: move FMD to the target code 76/34176/7
Mikhail Pershin [Thu, 31 Jan 2019 13:15:28 +0000 (16:15 +0300)]
LU-10496 ofd: move FMD to the target code

- make FMD structures common for all targets
- adapt FMD functionality to be isolated from OFD for
  further move to the target code.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I9f67f14e4132205cca67aa778b990bb3b45c30be
Reviewed-on: https://review.whamcloud.com/34176
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
5 years ago LU-8066 fid: use LDEBUGFS_SEQ_* macro 72/34372/3
James Simmons [Mon, 4 Mar 2019 16:06:52 +0000 (11:06 -0500)]
LU-8066 fid: use LDEBUGFS_SEQ_* macro

Lustre has LPROC_SEQ_* macros for proc handling and LDEBUGFS_SEQ_*
macros for debugfs handling. While similar, using the wrong macro
can break things. To avoid that chance, let's move the fid subsystem
to the LDEBUGFS_SEQ_* macros, since it has already moved to debugfs.

Change-Id: I3936c72f9fb58a38847822dad21c4b6f5e1d7a78
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/34372
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11721 lod: limit statfs ffree if less than OST ffree 67/34167/8
Andreas Dilger [Sun, 3 Feb 2019 00:11:00 +0000 (17:11 -0700)]
LU-11721 lod: limit statfs ffree if less than OST ffree

If the OSTs report fewer total free objects than the MDTs, then
use the free files count reported by the OSTs, since it represents
the minimum number of files that can be created in the filesystem
(creating more may be possible, but this depends on other factors).
This has always been what ll_statfs_internal() reports, but the
statfs aggregation via the MDT missed this step in lod_statfs().
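
The clamping described above amounts to taking the smaller of the two
free counts. A minimal sketch, assuming a simplified structure; the
names statfs_sketch and clamp_ffree() are hypothetical and are not
the fields or functions used in lod_statfs():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for aggregated statfs data. */
struct statfs_sketch {
	uint64_t files_free;	/* free inodes (MDT) or free objects (OST) */
};

/* Report the OST count when it is lower than the MDT count. */
static uint64_t clamp_ffree(const struct statfs_sketch *mdt,
			    const struct statfs_sketch *ost)
{
	assert(mdt != NULL && ost != NULL);
	return ost->files_free < mdt->files_free ? ost->files_free
						 : mdt->files_free;
}

/* Usage example: 1000 free MDT inodes but only 400 free OST objects. */
static void clamp_ffree_example(void)
{
	const struct statfs_sketch mdt = { .files_free = 1000 };
	const struct statfs_sketch ost = { .files_free = 400 };

	assert(clamp_ffree(&mdt, &ost) == 400);
}
```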

Fix a minor defect in sanity test_418() that would let it loop
forever until the test was killed due to timeout if the "df -i"
and "lfs df -i" output did not converge.

Fixes: b500d5193360 ("LU-10018 protocol: MDT as a statfs proxy")
Fixes: 263e80f4572b ("LU-11721 tests: wait for statfs to update ...")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Id8d7b7edfd854f1ec30bfbbb85f04b0c973ebbe5
Reviewed-on: https://review.whamcloud.com/34167
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Nikitas Angelinas <nangelinas@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11860 lnet: support config of LNDs with numeric intf name 28/34028/3
Gregoire Pichon [Mon, 14 Jan 2019 19:52:58 +0000 (20:52 +0100)]
LU-11860 lnet: support config of LNDs with numeric intf name

This patch adds support for the net configuration of LNDs that
have a numeric interface name (PTL4LND, GNILND). The GNILND case
has already been handled with a specific fix (see patch a29eb587).

Signed-off-by: Gregoire Pichon <gregoire.pichon@bull.net>
Change-Id: I10556f9c78ec332bca3344990e509434f904ffc0
Reviewed-on: https://review.whamcloud.com/34028
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-12039 tests: fix 'lfs mkdir' in sanity-selinux 68/34368/2
Sebastien Buisson [Mon, 4 Mar 2019 07:43:42 +0000 (08:43 +0100)]
LU-12039 tests: fix 'lfs mkdir' in sanity-selinux

sanity-selinux test_2b and test_20c assume that a directory created
with 'lfs mkdir' will be on MDT0, but they have to use the '-i 0'
flag to make sure of it.

Test-Parameters: trivial testlist=sanity-selinux clientselinux
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I57df7aed18673b9f7f1301b45e621bb76ebb9845
Reviewed-on: https://review.whamcloud.com/34368
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
5 years ago LU-6142 obdclass: Fix style issues for obd_mount.c 66/34366/3
Arshad Hussain [Sun, 12 Aug 2018 09:18:29 +0000 (14:48 +0530)]
LU-6142 obdclass: Fix style issues for obd_mount.c

This patch fixes issues reported by checkpatch
for file lustre/obdclass/obd_mount.c

Change-Id: Icd0ccdf25d69f01b690b9381864a96c8dc45dd96
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/34366
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
5 years ago LU-11836 ldlm: don't convert wrong resource 64/34264/4
Mikhail Pershin [Fri, 15 Feb 2019 09:14:30 +0000 (12:14 +0300)]
LU-11836 ldlm: don't convert wrong resource

During enqueue the returned lock may have a different resource,
and the local client lock replaces its resource too. But there is
a valid race between the BL AST and the reply from the server, so
the BL AST may come earlier and find the client lock with the old
resource. In that case ldlm_handle_bl_callback() should proceed with
a normal cancel and not use cancel_bits for lock convert.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ib7fd98ce73821b1e3207e7f2bfba0e0acfdc2380
Reviewed-on: https://review.whamcloud.com/34264
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11922 ldiskfs: make dirdata work with metadata_csum 19/34219/4
Li Dongyang [Sat, 9 Feb 2019 05:37:29 +0000 (16:37 +1100)]
LU-11922 ldiskfs: make dirdata work with metadata_csum

Handle ext4_dir_entry_tail correctly, which is a bogus dir entry
that contains the checksum at the end of a dir leaf block.

Fix how we get to the limit on the dx_root: we can't assume the
rec_len is 12, as . and .. in front of the dx_root have dirdata.

Also includes another fix for large_dir, where we should update the
checksum for dx_node by calling ext4_handle_dirty_dx_node(). The
change also makes the large_dir patch more consistent with the
upstream version.

With this we can enable metadata_csum on the targets.

Test-Parameters: fstype=ldiskfs envdefinitions=LDISKFS_MKFS_OPTS="-O metadata_csum"
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I04df8c30d9d423111e2b4031a7e4b9058101016f
Reviewed-on: https://review.whamcloud.com/34219
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 years ago New tag 2.12.52 2.12.52 v2_12_52
Oleg Drokin [Fri, 15 Mar 2019 23:15:38 +0000 (19:15 -0400)]
New tag 2.12.52

Change-Id: I7de1c524f2cc413f4028851406ccf50d43bc8d43
Signed-off-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11765 ofd: return EAGAIN during 1st CLEANUP_ORPHAN 36/33836/6
Sergey Cheremencev [Wed, 24 Oct 2018 10:23:43 +0000 (13:23 +0300)]
LU-11765 ofd: return EAGAIN during 1st CLEANUP_ORPHAN

During the 1st CLEANUP_ORPHAN after failover some objects
could be absent because they haven't been recreated yet. The issue
exists when the MDS last_id is much greater than the OST last_id and
ofd has to recreate thousands of objects. Some of these objects could
be assigned to a FID and requested by a client through a
glimpse RPC. Thus, if an object is not found, return EAGAIN instead
of ENOENT during the 1st CLEANUP_ORPHAN.
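
The choice of error code can be sketched as below. This is only an
illustration of the rule stated above: missing_object_rc() and its
boolean argument are hypothetical, and the negative return values
follow the usual kernel convention rather than quoting the ofd code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: error code for an object that was not found. */
static int missing_object_rc(bool first_orphan_cleanup_done)
{
	/*
	 * Until the first CLEANUP_ORPHAN has completed, the object may
	 * still be recreated, so ask the caller to retry.
	 */
	return first_orphan_cleanup_done ? -ENOENT : -EAGAIN;
}

/* Usage example covering both phases. */
static void missing_object_example(void)
{
	assert(missing_object_rc(false) == -EAGAIN);
	assert(missing_object_rc(true) == -ENOENT);
}
```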

The patch also adds a test to reproduce the issue.
The test adds a delay to osd_trans_commit_cb(), causing
a large number of OST objects not to be written to disk
after failover, and checks that all objects have been
successfully recreated after failover.
The test works only with the FAILURE_MODE=HARD option.

Cray-bug-id: LUS-6414
Change-Id: Ia6899b4c1c35e1681f49faf1cb93a501ad159ec2
Signed-off-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-on: https://es-gerrit.dev.cray.com/154151
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-on: https://review.whamcloud.com/33836
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11855 utils: move declarations out of local blocks 26/34026/6
Alex Zhuravlev [Tue, 5 Feb 2019 16:45:19 +0000 (19:45 +0300)]
LU-11855 utils: move declarations out of local blocks

A few variables were declared within local blocks, but used via
pointer after the blocks were closed. This scheme used to work with
older GCC (due to trivial stack management), but not with GCC 8.
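
The pattern in question can be sketched as follows. The function
format_target() and its arguments are hypothetical and only
illustrate the class of bug; they are not taken from the utilities
touched by this patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Problematic pattern (shown as a comment only): 'buf' ends its
 * lifetime at the closing brace, so 'name' dangles afterwards.
 *
 *	const char *name = "unknown";
 *	if (idx >= 0) {
 *		char buf[32];
 *		snprintf(buf, sizeof(buf), "OST%04x", (unsigned int)idx);
 *		name = buf;
 *	}
 *	snprintf(out, outlen, "%s", name);	(undefined behaviour)
 */

/* Fixed pattern: the buffer is declared at function scope. */
static size_t format_target(int idx, char *out, size_t outlen)
{
	char buf[32];
	const char *name = "unknown";

	assert(out != NULL && outlen > 0);
	if (idx >= 0) {
		snprintf(buf, sizeof(buf), "OST%04x", (unsigned int)idx);
		name = buf;	/* still valid after the block closes */
	}
	snprintf(out, outlen, "%s", name);
	return strlen(out);
}
```

Calling format_target(10, out, sizeof(out)) with a 32-byte out
buffer leaves "OST000a" in it.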

Change-Id: Ibd02a72264d50609ccf3c5bc5252e45de4160b9e
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34026
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11943 llog: Reset current log on ENOSPC 47/34347/4
Patrick Farrell [Thu, 28 Feb 2019 18:02:13 +0000 (13:02 -0500)]
LU-11943 llog: Reset current log on ENOSPC

The original LU-10527 patch:
"LU-10527 obdclass: don't recycle loghandle upon ENOSPC"
https://review.whamcloud.com/#/c/30897/

Kept the current log on ENOSPC.

This appears to cause llog corruption on failover, and the
other part of the original patch (removing an incorrect
assert) should be sufficient to fix the original issue.

Fixes: 5761b9576d39 ("LU-10527 obdclass: don't recycle loghandle upon ENOSPC")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ie5c0ab77940c1be0ec1f166e4d38080b254bed5c
Reviewed-on: https://review.whamcloud.com/34347
Tested-by: Jenkins
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 years ago LU-12033 build: Git ignore generated *.install files 44/34344/2
Thomas Stibor [Thu, 28 Feb 2019 13:37:46 +0000 (14:37 +0100)]
LU-12033 build: Git ignore generated *.install files

Building client or server DEB packages generates files:
* lustre-server-utils.install
* lustre-client-utils.install
from description files
* lustre-server-utils.install.in
* lustre-client-utils.install.in
Any modifications take place in *.install.in files,
thus gitignore the generated *.install files.

Test-Parameters: trivial
Signed-off-by: Thomas Stibor <t.stibor@gsi.de>
Change-Id: Ib7812f3407b35a74ef8bd63e630bda24fcbd5b48
Reviewed-on: https://review.whamcloud.com/34344
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
5 years ago LU-11329 misc: reorder Git .mailmap file 09/34109/3
Andreas Dilger [Tue, 22 Jan 2019 23:09:41 +0000 (16:09 -0700)]
LU-11329 misc: reorder Git .mailmap file

If there are multiple remappings of the same person (e.g. CFS to
Sun to Oracle to Whamcloud to Intel to Whamcloud2), the entries
in the .mailmap file need to be in chronological order, so that
older entries are mapped to the next email, which is mapped to
the next one, etc.  Otherwise, only a single remapping is done.

Add additional mapping for Patrick Farrell.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6d7f2977da5c24707e3d54fddfdabf94c28e3f0b
Reviewed-on: https://review.whamcloud.com/34109
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Peter Jones <pjones@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11880 build: allow Debian build on Raspbian 85/34085/4
Andreas Dilger [Wed, 26 Dec 2018 09:03:55 +0000 (02:03 -0700)]
LU-11880 build: allow Debian build on Raspbian

Allow the Debian build to work on Raspbian and other Debian and
Ubuntu variants that use the same build machinery.  Add armhf
to the list of allowed CPU architectures.

Test-Parameters: trivial clientdistro=ubuntu1804
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iedc90ad32754f81a3d387a66409f07aaa305a5b1
Reviewed-on: https://review.whamcloud.com/34085
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Thomas Stibor <t.stibor@gsi.de>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11974 llapi: improve llapi_layout_get_by_xattr(3) API 76/34276/8
Li Xi [Tue, 19 Feb 2019 02:33:20 +0000 (10:33 +0800)]
LU-11974 llapi: improve llapi_layout_get_by_xattr(3) API

llapi_layout_get_by_xattr() assumes that the lum has already
been properly swapped by llapi_layout_swab_lov_user_md().
However, the llapi_layout_swab_lov_user_md() function is not
exported, so external tools won't be able to use it.

Instead of exporting a lot of APIs, this patch includes the
swab functions in llapi_layout_get_by_xattr() and adds a
flags argument to the API.

Change-Id: I9fbf0f0ba66660d2f382fb20b03f069c1a7afad5
Signed-off-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/34276
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
5 years ago LU-11243 lod: fix assertion and hang upon lod_add_device failure 94/32994/7
Wang Shilong [Mon, 10 Dec 2018 05:45:33 +0000 (13:45 +0800)]
LU-11243 lod: fix assertion and hang upon lod_add_device failure

There are two problems:

See the following assertion:

    lod_add_device() lustre-OSTe42a-osc-MDT0000:
                     can't set up pool, failed with -12
    osp_disconnect() ASSERTION( imp != ((void *)0) ) failed:
    osp_disconnect() LBUG
    CPU: 1 PID: 10059 Comm: llog_process_th

The problem is that obd_disconnect() will clean up @imp and set it to NULL.
 ->osp_obd_disconnect
    ->class_manual_cleanup
       ->class_process_config
          ->class_cleanup
             ->obd_precleanup
                ->osp_device_fini
                   ->client_obd_cleanup

While ldo_process_config() will try to access @imp again:
 ->ldo_process_config
    ->osp_shutdown
       ->osp_disconnect
          ->LASSERT(imp != NULL)

Another problem is that if we fail before obd_connect(),
the mount will hang:
 ->ldo_process_config
    ->osp_shutdown
       ->osp_disconnect
          ->ptlrpc_disconnect_import
             ->rc = l_wait_event(imp->imp_recovery_waitq,
                                 !ptlrpc_import_in_recovery(imp), &lwi);

Since connect is not called, the imp state will stay LUSTRE_IMP_NEW.
Fix this by properly checking whether we are in recovery: only
consider ourselves in recovery if we are in one of the following states:

 LUSTRE_IMP_CONNECTING = 4,
 LUSTRE_IMP_REPLAY     = 5,
 LUSTRE_IMP_REPLAY_LOCKS = 6,
 LUSTRE_IMP_REPLAY_WAIT  = 7,
 LUSTRE_IMP_RECOVER    = 8,
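
A state check of this shape can be sketched as below. The values for
CONNECTING through RECOVER are the ones listed above; the values
given for LUSTRE_IMP_NEW and LUSTRE_IMP_FULL are assumptions made
for the example, and import_in_recovery_sketch() is a hypothetical
name, not the patched ptlrpc_import_in_recovery().

```c
#include <assert.h>
#include <stdbool.h>

enum imp_state_sketch {
	LUSTRE_IMP_NEW          = 2,	/* assumed value, not given in this message */
	LUSTRE_IMP_CONNECTING   = 4,
	LUSTRE_IMP_REPLAY       = 5,
	LUSTRE_IMP_REPLAY_LOCKS = 6,
	LUSTRE_IMP_REPLAY_WAIT  = 7,
	LUSTRE_IMP_RECOVER      = 8,
	LUSTRE_IMP_FULL         = 9,	/* assumed value, not given in this message */
};

/* Only the five listed states count as "in recovery". */
static bool import_in_recovery_sketch(enum imp_state_sketch state)
{
	return state >= LUSTRE_IMP_CONNECTING && state <= LUSTRE_IMP_RECOVER;
}

/* Usage example: an import that never connected must not block a waiter. */
static void import_in_recovery_example(void)
{
	assert(!import_in_recovery_sketch(LUSTRE_IMP_NEW));
	assert(import_in_recovery_sketch(LUSTRE_IMP_REPLAY));
}
```

With a check like this, the wait condition is already satisfied for
an import that is still in LUSTRE_IMP_NEW, so the disconnect path
does not sleep on it.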

Change-Id: I2113b95a421bae7117f3057d5f0fdf78db95caa3
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/32994
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-12042 utils: Remove stray warning from lfs mkdir 74/34374/2
Nathaniel Clark [Tue, 5 Mar 2019 14:22:09 +0000 (09:22 -0500)]
LU-12042 utils: Remove stray warning from lfs mkdir

lfs mkdir -i shouldn't issue a warning about -m being deprecated.

Test-Parameters: trivial mdtcount=2 mdscount=2
Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Change-Id: I7012ff8ec7499b07d6463c2637c8bedcb3976fe2
Reviewed-on: https://review.whamcloud.com/34374
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 years ago LU-12020 llite: make sure name pack atomic 30/34330/3
Wang Shilong [Tue, 26 Feb 2019 14:38:29 +0000 (22:38 +0800)]
LU-12020 llite: make sure name pack atomic

We are accessing the dentry name directly and passing it
down without holding @d_lock; this is racy and can
trigger assertions:

(mdc_lib.c:137:mdc_pack_name()) ASSERTION( lu_name_is_valid_2(buf, cpy_len) ) failed:

Fix the problem by allocating memory and copying the name with
@d_lock held.

Change-Id: Iae0066661f42e8fca9358cbedd9cb21828779bbb
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/34330
Tested-by: Jenkins
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-12018 quota: do not start a thread under memory pressure 28/34328/2
Alex Zhuravlev [Tue, 26 Feb 2019 07:31:53 +0000 (10:31 +0300)]
LU-12018 quota: do not start a thread under memory pressure

This leads to a deadlock, as kthreadd creating new threads
can get stuck waiting for memory as well:

PID: 2 TASK: ffff88015d1e0fb0 CPU: 3 COMMAND: "kthreadd"

Change-Id: I88f14da24ea64dcc02a9fd1f4a9c03f5771f8fda
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34328
Tested-by: Jenkins
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-10143 tests: Add version check for interop 85/34285/5
Patrick Farrell [Wed, 20 Feb 2019 22:46:20 +0000 (17:46 -0500)]
LU-10143 tests: Add version check for interop

Without the fix for LU-10143, these tests will fail, so add
a version check for interop.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I65eead5d5252ec6abf5d8de68d73e4f9b690d030
Reviewed-on: https://review.whamcloud.com/34285
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 years ago LU-11982 utils: Correct lfs migrate help 84/34284/3
Patrick Farrell [Wed, 20 Feb 2019 20:04:41 +0000 (15:04 -0500)]
LU-11982 utils: Correct lfs migrate help

Fix the lfs migrate help so that it correctly describes
block/non-block behavior.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Id7c01df08514ecc1b3968ef92a466bd8f5d9e656
Reviewed-on: https://review.whamcloud.com/34284
Tested-by: Jenkins
Reviewed-by: Ben Evans <bevans@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11944 llite: Lock inode on tiny write if setuid/setgid set 18/34218/4
Ann Koehler [Fri, 8 Feb 2019 21:41:37 +0000 (15:41 -0600)]
LU-11944 llite: Lock inode on tiny write if setuid/setgid set

During a write, the setuid/setgid bits must be reset if they are
enabled and the user does not have the correct permissions. Setting
any file attributes, including setuid and setgid, requires the inode
to be locked. Writes became lockless with the introduction of
LU-1669. Locking the inode in the setuid/setgid case was added to
vvp_io_write_start() as a special case. The inode locking was not
included when support for tiny writes was added with LU-9409. This
mod adds the necessary inode lock/unlock calls to ll_do_tiny_write().

If the inode is not locked when setuid/setgid are reset, the kernel
will issue a one time warning and Lustre may hang trying to get the
inode lock in ll_setattr_raw().

Signed-off-by: Ann Koehler <amk@cray.com>
Change-Id: I5e8a98789828de52dbff4226958741320aba92e6
Reviewed-on: https://review.whamcloud.com/34218
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-9859 libcfs: use a workqueue for rehash work. 69/34169/3
NeilBrown [Mon, 11 Feb 2019 15:46:14 +0000 (10:46 -0500)]
LU-9859 libcfs: use a workqueue for rehash work.

Lustre has a work-item queuing scheme that provides the
same functionality as Linux work_queues.
To make the code easier for Linux devs to follow, change
to use work_queues.

Linux-commit: 0aa211e39857f17e24126c47f6e3fe3b971344b3

Change-Id: I1600ea1ef8769f1f6489b81fd578685ea58f9cb6
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/34169
Tested-by: Jenkins
Reviewed-by: Ben Evans <bevans@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11894 lnet: check for asymmetrical route messages 19/34119/4
Sebastien Buisson [Mon, 28 Jan 2019 15:16:42 +0000 (00:16 +0900)]
LU-11894 lnet: check for asymmetrical route messages

Asymmetrical routes can be an issue when debugging the network,
and allowing them also opens the door to attacks where hostile
clients inject data into the servers.

In order to prevent asymmetrical routes, add a new lnet kernel
module option named 'lnet_drop_asym_route'. When set to non-zero,
lnet_parse() will check if the message received from a remote peer
is coming through a router that would normally be used by this node
to reach the remote peer. If it is not the case, then it means we
are dealing with an asymmetrical route message, and the message will
be dropped.
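
The check described above can be sketched as follows. This is a
simplified model under stated assumptions: route_sketch,
is_asym_route() and should_drop() are hypothetical names, and networks
and NIDs are reduced to plain integers; none of this is the actual
lnet_parse() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical routing entry: gateway used to reach a remote network. */
struct route_sketch {
	uint32_t remote_net;
	uint64_t gateway_nid;
};

/* True if 'from_gw' is not a gateway this node would use to reach 'src_net'. */
static bool is_asym_route(const struct route_sketch *routes, size_t n,
			  uint32_t src_net, uint64_t from_gw)
{
	for (size_t i = 0; i < n; i++)
		if (routes[i].remote_net == src_net &&
		    routes[i].gateway_nid == from_gw)
			return false;
	return true;
}

/* Drop only when the option is non-zero and the route is asymmetrical. */
static bool should_drop(int drop_asym_route,
			const struct route_sketch *routes, size_t n,
			uint32_t src_net, uint64_t from_gw)
{
	return drop_asym_route != 0 &&
	       is_asym_route(routes, n, src_net, from_gw);
}

/* Usage example: net 7 is reached via gateway 100, a message arrives via 200. */
static void asym_route_example(void)
{
	const struct route_sketch routes[] = { { 7, 100 } };

	assert(!should_drop(1, routes, 1, 7, 100));	/* symmetrical */
	assert(should_drop(1, routes, 1, 7, 200));	/* asymmetrical */
	assert(!should_drop(0, routes, 1, 7, 200));	/* check disabled */
}
```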

The check for asymmetrical routes can also be switched on/off with
the command 'lnetctl set drop_asym_route 0|1'. This parameter is
also exported/imported in YAML.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I06fb23d9e46984d79c14fa9b53b2fa04ce3c50c5
Reviewed-on: https://review.whamcloud.com/34119
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11891 utils: getstripe use --mdt-index consistently 16/34116/2
Andreas Dilger [Sun, 27 Jan 2019 18:23:48 +0000 (11:23 -0700)]
LU-11891 utils: getstripe use --mdt-index consistently

LU-10856 fixed most usages of "warning: '-M' deprecated,
use '--mdt-index' or '-m' instead" but missed a few
cases in sanity test_271d, test_271e, and test_271f.
Fix those tests to use "--mdt-index".

Also, lfs has a few places where the usage of "--mdt-index"
and "--mdt" is inconsistent.  Fix those options to be used
consistently across all commands.

Fixes: 6c617a3d56 ("LU-10856 tests: remove deprecated lfs ...")
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I013b2198f3a39533da9a0067a0bf5846604b3052
Reviewed-on: https://review.whamcloud.com/34116
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-9855 obd: use ldo_process_config for mdc and osc layer 06/34106/7
James Simmons [Thu, 24 Jan 2019 16:59:32 +0000 (11:59 -0500)]
LU-9855 obd: use ldo_process_config for mdc and osc layer

Both the mdc and osc layer use the lu_device infrastructure but
we don't use ldo_process_config() which is preferred over the
currently used obd_process_config() handling. Migrate to the
lu_device ldo_process_config() for both mdc and osc layer.

Change-Id: I4d5e84c48377283148133e6338ef7257c44b89a3
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/34106
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years ago LU-11849 utils: fix to make exclude projid work 05/34005/4
Wang Shilong [Thu, 10 Jan 2019 15:32:14 +0000 (23:32 +0800)]
LU-11849 utils: fix to make exclude projid works

We intended to use projid, not uid, here; fix it.
Also add a test for the "! --projid" option to cover this.

Test-Parameters: trivial testlist=sanity-quota
Change-Id: I64c3f1c68885947d0e91626525ee037756e1d7d8
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/34005
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
5 years agoLU-11830 lov: avoid signed vs. unsigned comparison 21/33921/8
Andreas Dilger [Wed, 26 Dec 2018 09:29:55 +0000 (02:29 -0700)]
LU-11830 lov: avoid signed vs. unsigned comparison

In the expansion of do_div64() GCC complains about pointer comparison
because loff_t is not a u64 variable as it should be.  lov_do_div64()
also has signed vs. unsigned comparisons due to a signed loff_t.
Change lov_do_div() to use a 64-bit variable for do_div() instead of
loff_t to avoid these warnings.

Change OST_MAXREQSIZE and friends to be consistently unsigned values
to avoid compiler warnings.

Fix "lfs mirror resync" to avoid comparing signed and unsigned values.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6df31a2c0d75e5777f471fe8cb252715dd85a5b1
Reviewed-on: https://review.whamcloud.com/33921
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-8066 utils: have llapi_target_iterate use sysfs tree 99/33799/16
James Simmons [Fri, 22 Feb 2019 15:41:34 +0000 (10:41 -0500)]
LU-8066 utils: have llapi_target_iterate use sysfs tree

Update llapi_target_iterate() to not use 'devices' but collect the
data from the lustre sysfs tree itself.

Change-Id: If100b4918bdcc8b24e72f37127048a32a808310f
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/33799
Tested-by: Jenkins
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11010 tests: remove return after skip for conf-sanity 35/32735/7
James Nunez [Wed, 27 Jun 2018 15:52:22 +0000 (09:52 -0600)]
LU-11010 tests: remove return after skip for conf-sanity

The skip() routine now contains a call to exit. All calls
to skip() and skip_env() should be reviewed and calls to
return() that followed skip() should be removed.

This is the fifth patch in a series of patches that
remove calls to return() after skip() in the Lustre test
suites.

A comment is added to skip_env() defining when it should
be used.

Calls to return() after skip() are removed for:
conf-sanity.sh
functions.sh

Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Idbcdffda38aaac07f128ae42a2ffcda8986afc33
Reviewed-on: https://review.whamcloud.com/32735
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-8207 scripts: add auto-stripe option to lfs_migrate 52/20552/18
Nathan Dauchy [Mon, 2 Jul 2018 14:21:35 +0000 (10:21 -0400)]
LU-8207 scripts: add auto-stripe option to lfs_migrate

Add a "-A" flag to lfs_migrate, which will automatically select the
stripe count as the file is rewritten. Initial algorithm to
determine stripe count is sqrt(size_in_GB)+1, with an additional cap
on object size, though the algorithm or thresholds could conceivably
change in the future.  The primary intent for this feature is to be
able to give users a tool to fix stripe settings on existing files
based on file size.
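
As a rough illustration of the initial algorithm described above, the
sketch below computes sqrt(size_in_GB)+1 in Python. It is not the
lfs_migrate shell code: the function name is hypothetical, "GB" is
assumed to mean 2^30 bytes, the square root is assumed to round down,
and the object size cap is left out.

```python
import math

GIB = 1 << 30  # assumption: "GB" in the commit message means 2^30 bytes


def auto_stripe_count(size_bytes: int) -> int:
    """Return sqrt(size_in_GB) + 1, rounding the square root down.

    Hypothetical helper for illustration only; the object size cap
    mentioned in the commit message is not modelled here.
    """
    size_gb = size_bytes // GIB          # whole GiB, rounded down
    return math.isqrt(size_gb) + 1       # integer square root, then +1


if __name__ == "__main__":
    for gib in (0, 1, 4, 100):
        print(gib, "GiB ->", auto_stripe_count(gib * GIB), "stripes")
```

With these assumptions a file below 1 GiB gets a single stripe, and
the count grows slowly with size, so only very large files are spread
over many OSTs.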

A new "-C" flag specifies the object size cap.  On each OST, the
amount of space available for migration is capped by dividing the
free space of the smallest OST by the specified value.

A new "-M" flag allows OSTs with free space less than the specified
value to be considered unavailable for migration.

A new "-v" flag increases verbosity to help debug what is being done.

A new "-X" flag limits the amount of free space on each OST that
can be used for migration to the specified value.  This flag is
useful for testing by simulating OSTs that are nearly full.

A new sanity test verifies the operation of the new "-A" flag.

Test-Parameters: trivial
Signed-off-by: Nathan Dauchy <nathan.dauchy@nasa.gov>
Signed-off-by: Steve Guminski <stephenx.guminski@intel.com>
Change-Id: I9ce8b64e028d9abb66b6b49cf7675263fd7202f0
Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/20552
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 years agoLU-11502 migrate: link parents lock may deadlock 25/33325/3
Lai Siyao [Fri, 31 Aug 2018 20:23:11 +0000 (04:23 +0800)]
LU-11502 migrate: link parents lock may deadlock

When cancelling a link parent lock, all locks taken so far, including
the source parent locks, must be cancelled as well; otherwise a
deadlock may occur. The lock retry therefore starts from the beginning.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I820d0e1664dbb405d6ed8245bb4ca2137140c323
Reviewed-on: https://review.whamcloud.com/33325
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-8365 ldiskfs: procfs entries for mballoc 42/21142/14
Lokesh Nagappa Jaliminche [Mon, 4 Jul 2016 09:04:20 +0000 (14:34 +0530)]
LU-8365 ldiskfs: procfs entries for mballoc

Export mballoc streaming block allocator variables
mb_last_group and mb_last_start through procfs.

Test-Parameters: testgroup=review-ldiskfs
Change-Id: I5dd00503a81c6819751c9f99b64615b497ef4e28
Cray-bug-id: LUS-3176
Signed-off-by: Lokesh Nagappa Jaliminche <lokesh.jaliminche@seagate.com>
Signed-off-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-on: https://review.whamcloud.com/21142
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 years agoLU-11413 lnet: use right address for routing message 32/34032/3
Alexey Lyashkov [Tue, 22 Jan 2019 08:41:00 +0000 (11:41 +0300)]
LU-11413 lnet: use right address for routing message

msg_initiator is the real sender address, so use this address as the
hash source for better distribution across CPTs on the server side.

Cray-bug-id: LUS-6841
Test-Parameters: trivial
Signed-off-by: Alexey Lyashkov <c17817@cray.com>
Change-Id: Ie4487ea29d9db458564c66518270ad82b5ffae49
Reviewed-on: https://review.whamcloud.com/34032
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11413 lnet: use right rtr address 31/34031/3
Alexey Lyashkov [Tue, 22 Jan 2019 08:40:59 +0000 (11:40 +0300)]
LU-11413 lnet: use right rtr address

Use the sender's router to avoid a credit distribution problem.
The sender is now the preferred rtr.

Cray-bug-id: LUS-6490
Test-Parameters: trivial
Signed-off-by: Alexey Lyashkov <c17817@cray.com>
Change-Id: Ic7cf57820176979a52675dcc74342c2e26335e73
Reviewed-on: https://review.whamcloud.com/34031
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11775 osc: check if opg is in lru list without locking 60/33860/4
Li Dongyang [Fri, 14 Dec 2018 01:36:22 +0000 (12:36 +1100)]
LU-11775 osc: check if opg is in lru list without locking

osc_lru_use() is called for every page queued for I/O.
As a fast path, we can check whether the osc_page is in the
LRU list without taking the cl_lru_list_lock, and return if
it is not.
Note we still need to do the check again after locking,
as it could be removed from the LRU list by another thread.
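
The check-then-lock-then-recheck pattern described here can be
sketched in Python as follows. This is only an analogy for the C code:
the class and its members are hypothetical, a set stands in for the
LRU list, and the lock counter exists purely to make the fast path
visible.

```python
import threading


class LruList:
    """Toy stand-in for a client LRU list (hypothetical, not Lustre code)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pages = set()
        self.lock_acquisitions = 0   # counts locked passes through use()

    def add(self, page):
        with self._lock:
            self._pages.add(page)

    def use(self, page) -> bool:
        """Take a page off the LRU list; return True if it was removed."""
        # Fast path: unlocked membership test. A stale answer is harmless
        # because the decision is re-validated under the lock below.
        if page not in self._pages:
            return False
        with self._lock:
            self.lock_acquisitions += 1
            # Re-check: another thread may have removed the page between
            # the unlocked test and the lock acquisition.
            if page not in self._pages:
                return False
            self._pages.discard(page)
            return True


if __name__ == "__main__":
    lru = LruList()
    lru.add("page-1")
    print(lru.use("page-2"), lru.lock_acquisitions)
    print(lru.use("page-1"), lru.lock_acquisitions)
```

The common case, a page that is not on the list, never touches the
lock; only pages that appear to be listed pay for the locked re-check.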

Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I1587b6b5547ae5a7a8bfe32a78361bb888c85d5b
Reviewed-on: https://review.whamcloud.com/33860
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11951 ptlrpc: reset generation for old requests 21/34221/9
Alex Zhuravlev [Mon, 11 Feb 2019 11:27:54 +0000 (14:27 +0300)]
LU-11951 ptlrpc: reset generation for old requests

all requests generated while the import is changing from
FULL to IDLE need to be moved to the new generation.

Change-Id: I59d9b92680c724132dba9c7315c26e9851c5d5d2
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34221
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-8299 llite: ll_fault should fail for insane file offsets 42/34242/5
Alexander Zarochentsev [Tue, 12 Feb 2019 15:28:37 +0000 (18:28 +0300)]
LU-8299 llite: ll_fault should fail for insane file offsets

A page fault for a mmapped Lustre file at an offset larger than
2^63 causes the Lustre client to hang due to wrong page index
calculations from signed loff_t.
There is no need to do such calculations; instead, perform
page offset sanity checks in ll_fault().
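
The arithmetic behind the problem can be illustrated with a short
Python sketch. It assumes 4 KiB pages (PAGE_SHIFT of 12) and uses
hypothetical helper names; the actual check added to ll_fault() is not
shown in this message and may differ.

```python
PAGE_SHIFT = 12                 # assumption: 4 KiB pages
LOFF_T_MAX = (1 << 63) - 1      # largest value of a signed 64-bit loff_t


def to_signed64(value: int) -> int:
    """Interpret the low 64 bits of value as a signed 64-bit integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


def page_offset_is_sane(pgoff: int) -> bool:
    """True if pgoff << PAGE_SHIFT still fits in a signed loff_t."""
    return 0 <= pgoff <= (LOFF_T_MAX >> PAGE_SHIFT)


if __name__ == "__main__":
    bad_pgoff = 1 << 51
    # The byte offset of this page wraps to a negative signed value.
    print(to_signed64(bad_pgoff << PAGE_SHIFT))
    print(page_offset_is_sane(bad_pgoff))
```

Rejecting such page offsets up front avoids ever computing a byte
offset that no longer fits in the signed type.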

Cray-bug-id: LUS-1392
Signed-off-by: Alexander Zarochentsev <c17826@cray.com>
Change-Id: Ia492083ee4bdc23edfcbf88cb6d7e9726b2ca80c
Reviewed-on: https://review.whamcloud.com/34242
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
5 years agoLU-11838 llite: remove assert for acl refcount 36/34236/3
James Simmons [Tue, 12 Feb 2019 23:59:19 +0000 (18:59 -0500)]
LU-11838 llite: remove assert for acl refcount

The purpose of this assert was to ensure that Lustre
was properly managing its posix_acl access. This test
is invalid because the VFS layer also takes references
on the posix_acl. In reality there is no simple way to
detect this class of mistakes.

* latest kernels remove this refcount *

Linux-commit: 6a42e615a28bad49f2e04829486e94190c066390

Change-Id: I167f2de449a2e8357517f33c2e81a25b25104d57
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/34236
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11960 build: Add missing libssl-dev DEB package 33/34233/3
Thomas Stibor [Tue, 12 Feb 2019 13:30:51 +0000 (14:30 +0100)]
LU-11960 build: Add missing libssl-dev DEB package

Building Lustre client DEB packages on Debian fails due to missing
package libssl-dev and results in error:
"No such file or directory #include <openssl/evp.h>"
Add required package libssl-dev into "make debs" chain.

Test-Parameters: clientdistro=ubuntu1804 envdefinitions=SANITY_EXCEPT=802a trivial
Signed-off-by: Thomas Stibor <t.stibor@gsi.de>
Change-Id: Ib99cd744b2d44d3f6c1915e2f2d2da7d83e07cae
Reviewed-on: https://review.whamcloud.com/34233
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11206 tests: Use import_ready to check IDLE 25/34225/2
Patrick Farrell [Mon, 11 Feb 2019 17:47:09 +0000 (12:47 -0500)]
LU-11206 tests: Use import_ready to check IDLE

When checking if a client/OST import is up, we have to
check for IDLE as well as FULL.

wait_osc_import_ready is provided for this, but a few spots
don't use it, so they occasionally fail.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I826659a7f5953dee4e4551c1177479ef742b5589
Reviewed-on: https://review.whamcloud.com/34225
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-9706 dt: remove dt_txn_hook_commit() 12/34212/3
Alex Zhuravlev [Thu, 7 Feb 2019 09:33:12 +0000 (12:33 +0300)]
LU-9706 dt: remove dt_txn_hook_commit()

it's not used and it's not safe as dt_txn_callback_del()
and dt_txn_callback_add() can race with commit callbacks.

Change-Id: Ib80b0f69be008b4f895586dde35d1a5833a1a861
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34212
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
5 years agoLU-11927 kernel: new kernel [SLES12 SP4 4.12.14-95.6.1] 91/34191/4
Jian Yu [Mon, 11 Feb 2019 23:53:13 +0000 (15:53 -0800)]
LU-11927 kernel: new kernel [SLES12 SP4 4.12.14-95.6.1]

This patch makes changes to support new SLES12 SP4 release
for Lustre client.

Test-Parameters: trivial clientdistro=sles12sp4 \
envdefinitions=LNET_SELFTEST_EXCEPT=smoke,SANITY_EXCEPT=103a

Change-Id: Ibe59ebc30c25f2cab771ac4c2c9b7a9b974732d5
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34191
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11914 build: add a configure check for l_getsepol 83/34183/5
Sebastien Buisson [Tue, 5 Feb 2019 14:09:45 +0000 (23:09 +0900)]
LU-11914 build: add a configure check for l_getsepol

l_getsepol requires openssl-devel, so add a configure check for
openssl/evp.h header and EVP_MD_CTX_create function, and disable
building l_getsepol in case they are missing.

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I31ddbc2f5300e9e38db9e00e2b7fbcac7f83d9e5
Reviewed-on: https://review.whamcloud.com/34183
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Thomas Stibor <t.stibor@gsi.de>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11923 utils: fix mkfs.lustre meta_bg handling 78/34178/2
Andreas Dilger [Tue, 5 Feb 2019 10:06:43 +0000 (03:06 -0700)]
LU-11923 utils: fix mkfs.lustre meta_bg handling

If meta_bg is specified as a formatting option, mke2fs reports an
error because it conflicts with the resize_inode feature, which is
enabled by default for filesystems under 2^32 blocks.

Enable meta_bg by default if the filesystem is over 2^36 blocks
(256TiB) or the group descriptor table grows too large.

Disable the resize_inode feature if meta_bg is explicitly specified
or if the filesystem is over 2^32 blocks.
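
The two rules above can be summarised in a small Python sketch. The
function and constant names are hypothetical, "over" is read as
strictly greater than, and the "group descriptor table grows too
large" condition is left out because its threshold is not stated here.

```python
RESIZE_INODE_LIMIT = 1 << 32     # blocks; resize_inode is dropped above this
META_BG_DEFAULT_LIMIT = 1 << 36  # blocks; 256 TiB with 4 KiB blocks


def choose_features(blocks: int, meta_bg_requested: bool = False) -> dict:
    """Pick meta_bg/resize_inode for a filesystem of the given block count.

    Hypothetical helper; it only models the size and explicit-option
    rules from the commit message, not the group descriptor table check.
    """
    meta_bg = meta_bg_requested or blocks > META_BG_DEFAULT_LIMIT
    resize_inode = not (meta_bg_requested or blocks > RESIZE_INODE_LIMIT)
    return {"meta_bg": meta_bg, "resize_inode": resize_inode}


if __name__ == "__main__":
    print(choose_features(1 << 20))
    print(choose_features(1 << 20, meta_bg_requested=True))
    print(choose_features((1 << 36) + 1))
```

Under these rules the two features are never enabled together, which
is the conflict that mke2fs was reporting.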

Fix is_e2fsprogs_feature_supp() to return a boolean in the proper
sense for the function, rather than a 0 or -ve error number, since
that doesn't make sense for the name of the function.

Drop a level of indent in ldiskfs_make_lustre() by returning an
error directly if the ldd_mount_type is unknown.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I8886e6850aad5868155b2208043dbbc4873ebbe5
Reviewed-on: https://review.whamcloud.com/34178
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-9859 libcfs: use strim instead of cfs_trimwhite. 68/34168/4
NeilBrown [Mon, 11 Feb 2019 15:56:11 +0000 (10:56 -0500)]
LU-9859 libcfs: use strim instead of cfs_trimwhite.

Linux lib provides identical functionality to cfs_trimwhite,
so discard that code and use the standard.

Linux-commit: 213b14b1fa55790f55b180ed5121b07f037c7ddd

Change-Id: Ide9d829aef554541a3dfb65ecb305e89c7ddf74a
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/34168
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Tested-by: Jenkins
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11913 utils: allow "mq-deadline" as scheduler 63/34163/2
Andreas Dilger [Fri, 1 Feb 2019 20:10:40 +0000 (13:10 -0700)]
LU-11913 utils: allow "mq-deadline" as scheduler

Allow the "mq-deadline" scheduler for multi-queue block devices, in
addition to just "noop" and "deadline".  Explicitly add "deadline"
as a valid option, in case the default scheduler is changed.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I2cb0878188aea43f88c503ea70a699be083ebbe5
Reviewed-on: https://review.whamcloud.com/34163
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11905 mdc: Add RETURN to mdc_intent_open_pack 40/34140/3
Patrick Farrell [Wed, 30 Jan 2019 21:28:08 +0000 (16:28 -0500)]
LU-11905 mdc: Add RETURN to mdc_intent_open_pack

mdc_intent_open_pack has ENTRY, but not RETURN.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I443dad13db19b6ee8fa4102c97ec93e16f9dd008
Reviewed-on: https://review.whamcloud.com/34140
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11777 tests: add version check sanityn 102 53/33953/6
James Nunez [Tue, 15 Jan 2019 16:08:17 +0000 (09:08 -0700)]
LU-11777 tests: add version check sanityn 102

sanityn test 102 was added to Lustre tag 2.11.57. Thus,
we need to check that the server is 2.11.57 or later
before running test 102.
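
The version gate amounts to a numeric, component-wise comparison of
version strings. The Python sketch below shows the idea with
hypothetical helper names; the test suite itself is written in shell
and uses its own helpers, which are not reproduced here.

```python
def parse_version(text: str) -> tuple:
    """Turn "2.11.57" into (2, 11, 57) for numeric comparison."""
    return tuple(int(part) for part in text.split("."))


def server_at_least(server_version: str, required: str) -> bool:
    """True if the server version is the required version or later."""
    return parse_version(server_version) >= parse_version(required)


if __name__ == "__main__":
    for server in ("2.10.8", "2.11.57", "2.12.0"):
        run = server_at_least(server, "2.11.57")
        print(server, "run test 102" if run else "skip test 102")
```

Comparing integer tuples rather than raw strings matters once a
component reaches three digits, since "2.11.100" sorts before
"2.11.57" as text.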

Test-Parameters: trivial serverjob=lustre-b2_10 serverbuildno=152 testlist=sanityn
Test-Parameters: testlist=sanityn
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Ieefd6f0f3dc0051646f07c309fb59dc6124c2975
Reviewed-on: https://review.whamcloud.com/33953
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11768 test: reset qsd_time before test 31/33931/2
Hongchao Zhang [Sat, 22 Dec 2018 22:21:22 +0000 (17:21 -0500)]
LU-11768 test: reset qsd_time before test

In test_6 of sanity-quota, if the qsd_timeout is larger than
TIMEOUT*2, it will trigger the watchdog and cause the test to fail.

Change-Id: I3f2993ce2b88e1520b6907ae134557abcd30aa0c
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/33931
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11827 llog: protect cathandle in llog_cat_declare_add_rec 14/33914/4
Vladimir Saveliev [Sat, 22 Dec 2018 00:31:45 +0000 (03:31 +0300)]
LU-11827 llog: protect cathandle in llog_cat_declare_add_rec

llog_cat_declare_add_rec() calls llog_cat_prep_log() passing
&cathandle->u.chd.chd_current_log and
&cathandle->u.chd.chd_next_log. Then it has to protect cathandle in
order to avoid race with llog_cat_current_log() when it decides to
change cathandle->u.chd.chd_current_log and
cathandle->u.chd.chd_next_log.

Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Cray-bug-id: LUS-6804
Change-Id: I689efb40452af180f137aff35ccabe132a24180a
Reviewed-on: https://review.whamcloud.com/33914
Tested-by: Jenkins
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11775 osc: reduce atomic ops in osc_enter_cache_try 59/33859/3
Li Dongyang [Fri, 14 Dec 2018 01:22:29 +0000 (12:22 +1100)]
LU-11775 osc: reduce atomic ops in osc_enter_cache_try

We can reduce the number of atomic ops performed on
obd_dirty_pages for the common case.

Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I5526e449d483241d825af18b612ae1d1dff3241e
Reviewed-on: https://review.whamcloud.com/33859
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11773 utils: add PFL flags support to YAML API 52/33852/4
Patrick Farrell [Thu, 13 Dec 2018 20:21:31 +0000 (14:21 -0600)]
LU-11773 utils: add PFL flags support to YAML API

The setstripe YAML interface currently ignores the
lcme_flags field. This means it doesn't work correctly with
some FLR layouts.

Fixing this is a trivial matter of making the YAML layout
generator read & use the lcme_flags field.

Signed-off-by: Patrick Farrell <paf@cray.com>
Change-Id: If15999aa58ac3e31da677bd5d1ef8b063b46b1e5
Reviewed-on: https://review.whamcloud.com/33852
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
5 years agoLU-11759 tests: racer cleanup 31/33831/3
Vladimir Saveliev [Fri, 14 Dec 2018 16:59:38 +0000 (19:59 +0300)]
LU-11759 tests: racer cleanup

1. set LCTL in do_nodes $clients $racer so that lustre_build_version
   works correctly
2. list processes for ps -C properly
3. clear trap ERR in racer routines which source test-framework.sh

Cray-bug-id: LUS-6592
Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Change-Id: I667efd58004fbe02e79b3c02032133ea41f5337b
Test-Parameters: testlist=racer
Reviewed-on: https://review.whamcloud.com/33831
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Elena Gryaznova <c17455@cray.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11702 o2iblnd: ibc_rxs is created and freed with different size 21/33721/4
Andriy Skulysh [Mon, 6 Aug 2018 16:31:05 +0000 (19:31 +0300)]
LU-11702 o2iblnd: ibc_rxs is created and freed with different size

kiblnd_create_conn()) alloc '(conn->ibc_rxs)': 26832 at ffffc90012e69000
kiblnd_destroy_conn()) kfreed 'conn->ibc_rxs': 4576 at ffffc90012e69000

The size changed by kiblnd_create_conn() :
"peer 172.18.2.3@o2ib - queue depth reduced from 128 to 21"

Based on size LIBCFS_FREE() decides whether to use kfree or vfree
and accounts memory usage.

Allocate ibc_rxs after rdma_create_qp()

Change-Id: I1fb1516bd5427e0c959ce2e71bb248d727bb3c49
Cray-bug-id: LUS-6339
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-on: https://review.whamcloud.com/33721
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-8130 libcfs: support latest rhashtable API 36/34036/3
James Simmons [Mon, 11 Feb 2019 16:20:04 +0000 (11:20 -0500)]
LU-8130 libcfs: support latest rhashtable API

Given the broad range of kernels supported by the OpenSFS Lustre
version, some distributions are missing pieces needed to properly
support the rhashtable API as required by Lustre.

Change-Id: I7ce2949ca2f1d497dcb60a8b17b964e47cdff223
Test-Parameters: trivial
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/34036
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-8066 osd: migrate from proc to sysfs 10/32810/18
James Simmons [Fri, 1 Feb 2019 16:26:45 +0000 (11:26 -0500)]
LU-8066 osd: migrate from proc to sysfs

Move the osd based modules, osd-ldiskfs and osd-zfs, from using
proc for most single value files to sysfs. Also update MGS as
well since it had symlinks into the osd proc tree originally.

Change-Id: Ib3838038299937d7e9ae68130d50ec2afb84e996
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32810
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11330 osd-zfs: hash for ./.. must be 0 98/34098/10
Alex Zhuravlev [Thu, 24 Jan 2019 05:04:09 +0000 (08:04 +0300)]
LU-11330 osd-zfs: hash for ./.. must be 0

Do not use the current iterator position as the hash source for dot
and dotdot. Instead, just return 0 as the hash for these entries.

Change-Id: I5ee439b237e8ed98d295f5672b1d0e8a6b48a55b
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34098
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-6142 mgs: Fix style issues for mgs_nids.c 13/33713/2
Arshad Hussain [Sat, 24 Nov 2018 15:01:17 +0000 (20:31 +0530)]
LU-6142 mgs: Fix style issues for mgs_nids.c

This patch fixes issues reported by checkpatch
for file lustre/mgs/mgs_nids.c

Change-Id: Iefdaf6f2aa8c9c426365fe98c4f91438b5fe6689
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/33713
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
5 years agoLU-11089 obdclass: use an rwsem instead of lu_key_initing_cnt. 12/32712/5
NeilBrown [Fri, 28 Dec 2018 14:26:29 +0000 (09:26 -0500)]
LU-11089 obdclass: use an rwsem instead of lu_key_initing_cnt.

The main use of lu_key_initing_cnt is to wait for it to be zero, so
that lu_context_key_quiesce() can continue.  This is a lot
like the behavior of a semaphore.

So use an rwsem instead.

When keys_fill() calls down_read() it will opportunistically spin
while the writer is running.  As the writer is very short - just
setting a bit for keys_fill() to see, this is likely to always
be the case.
lu_context_key_quiesce() will now, if necessary, go to sleep until
woken, rather than spin repeatedly calling schedule.

Code is much more readable this way and lu_keys_guard is no longer
involved in this locking.

We can remove the write_lock from lu_context_key_revive() as there is
nothing to protect against.  This already mustn't race with
lu_context_key_quiesce(), and if keys_fill() runs concurrently and
doesn't see that LCT_QUIESCENT has been cleared, it hardly matters.
After it is cleared, lu_context_refill() will need to be run anyway.

Change-Id: Id183a9372ca42267cc50f2547823585ff383ea1d
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32712
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11089 obdclass: make key_set_version an atomic_t 11/32711/15
NeilBrown [Thu, 15 Nov 2018 19:03:01 +0000 (14:03 -0500)]
LU-11089 obdclass: make key_set_version an atomic_t

As a first step to simplifying the locking in lu_object.c,
change key_set_version to an atomic_t.  This will mean
we don't need to hold a spinlock when incrementing.

It is clear that keys_fill() (and so lu_context_refill())
cannot race with itself as it holds no locks for the main
part of the work.  So updates to lc_version, and testing of
that value cannot need locking either.
So remove the locking from keys_fill() and lu_context_refill()
as there is no longer anything to protect.  The locking around
lu_keys_initing_cnt is preserved for now.

Also, don't increment when deregistering or quiescing a key.
key_set_version is *only* used to avoid filling new key values
if there have been no changes to the set of keys.
Deregistering a key does not mean that we need to try filling
any new value, so the increment is pointless.

Finally, remove the refill loop in keys_fill().  If a key
is registered or revived while keys_fill() is running it must be safe
to ignore it just as we would if it was registered immediately
after keys_fill() ran.  The important thing is that the
key_set_version stored in ctx->lc_version must be sampled
*before* those unseen keys were added.
So sample key_set_version early.
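
The "sample early" ordering can be sketched in Python as below. This
is a single-threaded model with hypothetical class and function names,
not the lu_object.c code; it only shows why the version is read before
the keys are walked.

```python
class KeyRegistry:
    """Toy key table with a version counter (hypothetical names)."""

    def __init__(self):
        self.version = 0
        self.keys = []

    def register(self, name):
        self.keys.append(name)   # the key becomes visible first...
        self.version += 1        # ...then the version is bumped


class Context:
    def __init__(self):
        self.version = -1        # never filled yet
        self.values = {}


def keys_fill(ctx, registry):
    # Sample the version BEFORE walking the keys. A key registered while
    # we walk may be missed, but then the stored version is already stale
    # and the next refill picks the key up.
    sampled = registry.version
    for name in list(registry.keys):
        ctx.values.setdefault(name, "value-for-" + name)
    ctx.version = sampled


def refill(ctx, registry) -> bool:
    """Fill only if the key set changed since the last fill."""
    if ctx.version != registry.version:
        keys_fill(ctx, registry)
        return True
    return False
```

Sampling late instead could record a version that already covers a key
the walk never saw, and that key would then stay unfilled until some
later registration bumped the version again.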

Linux-commit : 0fbfbc5ad0f892cf4c5e087a4e7e67102b2289af

Change-Id: Ic6907561d0bb864b10e1c53fb3e5469d0c60f888
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32711
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11584 osd-ldiskfs: fix lost+found object replace 46/33546/3
Andreas Dilger [Thu, 1 Nov 2018 08:59:04 +0000 (02:59 -0600)]
LU-11584 osd-ldiskfs: fix lost+found object replace

Fix the case where an OST object is being moved from lost+found
and an unused OST object is found in the object tree with the
same OID.  The unused object was being deleted, but the object
was not being moved from lost+found in this case.

Continue on with moving the object from lost+found unless an
error was returned from the unlink (excluding -ENOENT).
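
The decision described above reduces to a small predicate on the
unlink return code. The sketch below uses a hypothetical function name
and assumes the kernel convention of negative errno values.

```python
import errno


def should_continue_move(unlink_rc: int) -> bool:
    """Decide whether to go on moving the object out of lost+found.

    Hypothetical helper: unlink_rc is 0 on success or a negative errno.
    A missing unused object (-ENOENT) is not treated as a failure.
    """
    return unlink_rc == 0 or unlink_rc == -errno.ENOENT


if __name__ == "__main__":
    for rc in (0, -errno.ENOENT, -errno.EIO):
        print(rc, should_continue_move(rc))
```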

Fix sanity-scrub.sh test_14 to run e2fsck after the test to
verify that OI Scrub repaired the lost+found objects correctly.

The check_and_prep() helper erases the filesystem and should
only be used in cases like sanity-scrub and sanity-lfsck where
it matters.  Remove unnecessary call from sanity test_409().

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6fc6a15b1ca3d8e34bfd2d1266f80fc0730540e5
Reviewed-on: https://review.whamcloud.com/33546
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-5604 tests: fix usage of drop_ldlm_reply() in tests 46/16846/14
Mikhail Pershin [Fri, 16 Oct 2015 11:04:22 +0000 (14:04 +0300)]
LU-5604 tests: fix usage of drop_ldlm_reply() in tests

The OBD_FAIL_LDLM_REPLY is not used to drop replies on MDT,
OST and MGS anymore. But it is still used in some tests
via drop_ldlm_reply() wrapper.

Patch renames drop_ldlm_reply() to the drop_mdt_ldlm_reply()
since it was used only for MDT. Tests were fixed also to use
MDT-specific OBD_FAIL_MDS_REPLY_NET code.
recovery-small.sh: test_53, test_66, test_113 and test_133
replay-dual.sh: test_19
replay-single.sh: test_73b

Test 66 in recovery-small also was fixed to be aware of DNE.
Tests 66 and 10c in recovery-small.sh were fixed to use
'conn_uuid' param instead of 'mds_conn_uuid', which no longer
exists.

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: I90f7410cffcd504b3ff37728df1522693e6115cf
Reviewed-on: https://review.whamcloud.com/16846
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11975 test: fix for llog test 10h 87/34287/6
Alexander Boyko [Thu, 21 Feb 2019 15:12:56 +0000 (10:12 -0500)]
LU-11975 test: fix for llog test 10h

In test 10h the thread should set fail_loc before the test
starts adding records, and the main llog_process_thread
should wait for a bitmap modification.

Signed-off-by: Alexander Boyko <c17825@cray.com>
Change-Id: I4fdedf10f943f6ab264c2d83414f0a404ca42b9c
Reviewed-on: https://review.whamcloud.com/34287
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
5 years agoLU-11208 tests: add version check to sanity tests 20/33420/10
James Nunez [Wed, 13 Feb 2019 00:13:11 +0000 (17:13 -0700)]
LU-11208 tests: add version check to sanity tests

sanity test 27G was added to Lustre tag 2.11.51. Thus, we
need to check that the server is 2.11.51 or later before
running test 27G (LU-11208).

sanity test 239A, change version check to 2.10.4 (LU-10230).

sanity test 311 was modified in Lustre tag 2.12.51. We need
to check that the server is 2.12.51 or later (LU-11965).

sanity test 317 was added to Lustre tag 2.11.53.
We need to check that the server is 2.11.53 or later before
running test 317 (LU-11778).
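
The version checks described above could be sketched roughly as
follows. This is an illustration only: ver_code and server_at_least
are hypothetical helper names, not the test framework's own functions,
and the server version is hard-coded here where a real test would
query the server.

```shell
# Hypothetical helper: pack "major.minor.patch" into one comparable integer.
# The body runs in a subshell so the IFS change does not leak to the caller.
ver_code() (
	IFS=.
	set -- $1
	# Missing fields count as 0.
	echo $(( (${1:-0} << 16) | (${2:-0} << 8) | ${3:-0} ))
)

# Hypothetical guard: succeed when version $1 is at least version $2.
server_at_least() {
	[ "$(ver_code "$1")" -ge "$(ver_code "$2")" ]
}

server_version=2.10.4    # assumed value for the example
if server_at_least "$server_version" 2.11.51; then
	echo "run test 27G"
else
	echo "skip test 27G: need server 2.11.51 or later"
fi
```

Packing the three fields into one integer keeps the comparison numeric,
so 2.11.51 correctly sorts after 2.10.4 and before 2.12.51.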

Fixes: 37f6357a5c9f ("LU-10629 lod: Clear OST pool with setstripe")
Fixes: 0ba690a526be ("LU-7251 osp: do not assign commit callback to every thandle")
Fixes: a531ab5f38a6 ("LU-11605 osp: max_create_count and create_count changes")
Fixes: 6115eb7fd5 ("LU-10370 ofd: truncate does not update blocks count on client")
Test-Parameters: trivial serverjob=lustre-b2_10 serverbuildno=152 testlist=sanity
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I3be45bd0bde5ec041fefef2559656ec74448dffa
Reviewed-on: https://review.whamcloud.com/33420
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11934 mdc: don't use ACL at setattr 94/34194/2
Alexander Boyko [Wed, 6 Feb 2019 08:55:31 +0000 (03:55 -0500)]
LU-11934 mdc: don't use ACL at setattr

For ldiskfs with large_ea, the maximum EA size is 1MB.
In mdc_setattr the ptlrpc reply size is 1.1MB, which is rounded
up to 2MB, so a REINT_SETATTR request takes about 2MB of memory
on the client. In an MDS failover case many requests stay in the
reply queue, which could lead to OOM.

The patch changes the ACL size to zero, because the server doesn't
fill in the ACL for a setattr request.

Signed-off-by: Alexander Boyko <c17825@cray.com>
Change-Id: Id37ee07d743371a03c0c2d14f462bdd13afe1ef6
Cray-bug-id: LUS-6938
Reviewed-on: https://review.whamcloud.com/34194
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11924 osp: combine llog cancel operations 79/34179/5
Alexander Boyko [Tue, 5 Feb 2019 11:36:28 +0000 (06:36 -0500)]
LU-11924 osp: combine llog cancel operations

osp_sync_process_committed() cancels llog records one by one.
For each cancel it does an open, transaction, mutex, write, etc.
But most of the cancels belong to a single llog file, so they
can be combined.

The patch adds functions for cancelling an array of indexes for an
llog file, and uses them in
osp_sync_process_committed().
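
The grouping idea can be illustrated outside the kernel with a small
shell sketch. It is not the patch's code: batch_cancels and the
"<llog-file> <record-index>" input format are assumptions made for the
example. It only shows how per-record cancels collapse into one
operation per llog file.

```shell
# Hypothetical input format: one "<llog-file> <record-index>" pair per line.
# Output: one line per llog file listing every index to cancel in that file.
batch_cancels() {
	awk '{
		if ($1 in idx) { idx[$1] = idx[$1] "," $2 } else { idx[$1] = $2 }
	}
	END { for (f in idx) print f, idx[f] }' | sort
}

# Four cancels touching two files become two batched operations.
printf '%s\n' 'llogA 3' 'llogA 4' 'llogB 1' 'llogA 7' | batch_cancels
```

With the input above, the three llogA records end up in a single batch,
so the open/transaction/write cost is paid once per file rather than
once per record.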

Signed-off-by: Alexander Boyko <c17825@cray.com>
Cray-bug-id: LUS-6836
Change-Id: I4f461687021b3f76595d403cdd0bb6aba8d93b53
Reviewed-on: https://review.whamcloud.com/34179
Tested-by: Jenkins
Reviewed-by: Sergey Cheremencev <c17829@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11920 lod: do not reset lds_def_comp_entries 75/34175/5
Alex Zhuravlev [Mon, 4 Feb 2019 17:42:40 +0000 (20:42 +0300)]
LU-11920 lod: do not reset lds_def_comp_entries

as it can contain a valid pointer and the buffer is refilled every time.

Change-Id: I6ae043c31c8cd1414a80a48687bd784e30425553
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34175
Tested-by: Jenkins
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11919 llite: Initialize cl_dirty_max_pages 73/34173/4
Patrick Farrell [Mon, 4 Feb 2019 03:47:10 +0000 (22:47 -0500)]
LU-11919 llite: Initialize cl_dirty_max_pages

cl_dirty_max_pages must be initialized to zero before
calling client_adjust_max_dirty.

Test-Parameters: trivial

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ie34306ae329e377520a7a4858ab969f901c6154d
Reviewed-on: https://review.whamcloud.com/34173
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
5 years agoLU-9859 libcfs: replace cfs_srand() calls with add_device_randomness(). 70/34170/5
NeilBrown [Mon, 4 Feb 2019 17:54:41 +0000 (12:54 -0500)]
LU-9859 libcfs: replace cfs_srand() calls with add_device_randomness().

In the only places where cfs_srand() is called, the random bits are
mixed with bits from get_random_bytes().  So it is equally effective
to add entropy to either pool.
So we can replace calls to cfs_srand() with calls that add the
entropy with add_device_randomness().  That function adds time-based
entropy, so we can discard the ktime_get_ts64 calls.

One location in lustre_handles.c only adds time based
entropy. This cannot improve the entropy provided by
get_random_bytes(), so just discard that call.

Linux-commit: 30f4236aafa81722490e74ded48a9fb2aff013ab

Change-Id: If1a8ffad05fcc89272136949300f6512dca39704
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/34170
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11917 tests: wait for test_257 to finish recovery 65/34165/4
Andreas Dilger [Fri, 1 Feb 2019 23:35:47 +0000 (16:35 -0700)]
LU-11917 tests: wait for test_257 to finish recovery

The sanity test_257 restarts the MDS, but it may still be in
recovery at the start of test_260a and cause failures.  Wait for
recovery to complete before the end of test_257.
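
A polling loop of the kind this change relies on might look like the
sketch below. wait_for_status and fake_recovery_status are hypothetical
names; the fake function stands in for querying the MDS and, purely for
illustration, reports RECOVERING twice before COMPLETE.

```shell
# Hypothetical stand-in for asking the MDS about its recovery state.
# State is kept in a file because $(...) runs the function in a subshell.
state_file=$(mktemp)
echo 0 > "$state_file"
fake_recovery_status() {
	calls=$(cat "$state_file")
	echo $((calls + 1)) > "$state_file"
	if [ "$calls" -lt 2 ]; then echo RECOVERING; else echo COMPLETE; fi
}

# Poll the command in "$@" until it prints $1, giving up after $2 attempts.
wait_for_status() {
	want=$1; tries=$2; shift 2
	while [ "$tries" -gt 0 ]; do
		[ "$("$@")" = "$want" ] && return 0
		tries=$((tries - 1))
		sleep 1
	done
	return 1
}

if wait_for_status COMPLETE 5 fake_recovery_status; then
	echo "recovery finished, safe to start the next test"
fi
rm -f "$state_file"
```

Bounding the number of attempts means a server stuck in recovery makes
the test fail instead of hanging the whole run.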

Fixes: 1cb9e85039c ("LU-7433 ldlm: xattr locks are lost on mdt")
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I00f183938a208608155ca71b08ca7977903ebbe5
Reviewed-on: https://review.whamcloud.com/34165
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-8346 tests: remove spaces around fail_val 55/34155/3
James Nunez [Fri, 1 Feb 2019 03:40:59 +0000 (20:40 -0700)]
LU-8346 tests: remove spaces around fail_val

conf-sanity test 93 tries to set fail_loc and fail_val
with the command 'lctl set_param fail_val = 10 fail_loc...'.
fail_val should have no spaces before and after the
equals sign.
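
The effect of the stray spaces can be shown without Lustre at all. In
this sketch count_pairs is a hypothetical stand-in for a command that
expects name=value words, and VALUE is a placeholder rather than the
fail_loc the test really uses. With spaces, the shell hands over
"fail_val", "=" and "10" as three separate words, none of which is a
name=value pair.

```shell
# Hypothetical stand-in for a command that expects name=value arguments:
# it counts how many of its arguments have that shape.
count_pairs() {
	pairs=0
	for arg in "$@"; do
		case $arg in
		?*=?*) pairs=$((pairs + 1)) ;;   # well-formed name=value word
		esac
	done
	echo "$pairs"
}

count_pairs fail_val = 10 fail_loc=VALUE   # spaces: only fail_loc=... is a pair
count_pairs fail_val=10 fail_loc=VALUE     # no spaces: both words are pairs
```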

Test-Parameters: trivial mdscount=2 mdtcount=4 envdefinitions=ONLY=93 testlist=conf-sanity
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Iaa2bff1750a2afa96a73a452a0c098ae92f7616c
Reviewed-on: https://review.whamcloud.com/34155
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
5 years agoLU-11911 lov: fix lov_iocontrol for inactive OST case 48/34148/2
Vladimir Saveliev [Fri, 1 Feb 2019 00:16:29 +0000 (03:16 +0300)]
LU-11911 lov: fix lov_iocontrol for inactive OST case

For inactive OSTs lov->lov_tgts[index]->ltd_exp is
NULL. lov_iocontrol() must check for that before dereferencing
lov->lov_tgts[index]->ltd_exp->exp_obd.

Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Cray-bug-id: LUS-6937
Test-Parameters: trivial
Change-Id: I4bb332ee2c50b07a1471035556f4d77a3559847f
Reviewed-on: https://review.whamcloud.com/34148
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
5 years agoLU-930 utils: fix --verbose option for lfs-migrate.1 75/34075/2
Andreas Dilger [Fri, 18 Jan 2019 02:44:39 +0000 (21:44 -0500)]
LU-930 utils: fix --verbose option for lfs-migrate.1

Fix the "lfs migrate --verbose" option.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I66c02926023fd3f08ddb6d5d10ee5836b83ebbe5
Reviewed-on: https://review.whamcloud.com/34075
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Joseph Gmitter <jgmitter@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11401 tests: add version check sanity-flr tests 55/33955/5
James Nunez [Tue, 15 Jan 2019 00:48:17 +0000 (17:48 -0700)]
LU-11401 tests: add version check sanity-flr tests

sanity-flr tests 48 and 203 were added in Lustre tag 2.11.55.
Thus, we need to check that the server version is 2.11.55 or
later before running tests 48 and 203.

sanity-flr test 0h checks for a file inheriting the directory
layout. sanity-flr test 37 added ‘lfs mirror write’ functionality.
Inheritance was fixed and ‘lfs mirror write’ was added in Lustre
tag 2.11.57. Thus, we need to check that the server version is
2.11.57 or later before running tests 0h and 37.

Test-Parameters: trivial testlist=sanity-flr
Test-Parameters: serverjob=lustre-b2_11 serverbuildno=2 testlist=sanity-flr
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I94c68e900d60e2b97d7f74c6629ee54bcb3a5480
Reviewed-on: https://review.whamcloud.com/33955
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11770 osc: allow build without blk_integrity or crc-t10pi 23/33923/9
Andreas Dilger [Wed, 26 Dec 2018 09:05:37 +0000 (02:05 -0700)]
LU-11770 osc: allow build without blk_integrity or crc-t10pi

Allow the client to build if blk_integrity or crc-t10pi is not
enabled in the kernel.

Fixes: ccf3674c9ca ("LU-10472 osd-ldiskfs: T10PI between RPC and BIO")
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I97c4e75ad084e99927bcb41cf0df8a680525a5b1
Reviewed-on: https://review.whamcloud.com/33923
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-10171 lmv: avoid gratuitous 64-bit modulus 22/33922/9
Andreas Dilger [Wed, 26 Dec 2018 10:45:52 +0000 (03:45 -0700)]
LU-10171 lmv: avoid gratuitous 64-bit modulus

Fix the pct() calculation to use unsigned long arguments, since this
is what callers use.  Remove duplicate pct() definition in lproc_mdc.

Don't do a 64-bit modulus of the LNet NID to find the starting MDT
index when this isn't really needed.

Similarly, don't compute the FLD cache usage percentage for a debug
message that is never used.

Fixes: 9b924e86b27d ("LU-10171 headers: define pct(a,b) once")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I34cefd269cb83f563d2f08c32dc3fa1ed5c5a5b1
Reviewed-on: https://review.whamcloud.com/33922
Reviewed-by: Ben Evans <bevans@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11797 tests: improve sanity test 272a checking 89/33889/3
Andreas Dilger [Tue, 18 Dec 2018 18:57:46 +0000 (11:57 -0700)]
LU-11797 tests: improve sanity test 272a checking

Improve sanity.sh test_272a() to check data integrity before it
checks the stripe count.  Also, print out the stripe count if it
does not match our expectations.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I75397f251f10ed00c3dea9f80243d7ba9eacab07
Reviewed-on: https://review.whamcloud.com/33889
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11634 tests: sanityn/test_77 improvements 07/33607/5
Vladimir Saveliev [Mon, 9 Apr 2018 09:18:50 +0000 (12:18 +0300)]
LU-11634 tests: sanityn/test_77 improvements

sshd limits the number of simultaneous unauthenticated connections
via the MaxStartups configuration parameter. By default, 10
connections are allowed. nrs_write_read() tries to run up to 32
do_nodes() in parallel, causing sshd to drop some of the connections.

The fix is to have do_nodes() start the required number of dd-s in
parallel.

Minor changes which were probably intended during development:
- Test filenames include $HOSTNAME so that each client works with its
  own file, it seems. Add missing escaping backslashes so that
  $HOSTNAME works as expected.
- Add conv=notrunc parameter for dd-s which write to a lustre file at
  different seeks.
- Have reading dd-s read files which were specially created for
  that.
- Use /dev/null instead of /dev/zero to throw read data away.
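
The approach can be sketched locally as below. This is not the test's
code: parallel_dd_demo is a hypothetical name, and a temporary file
stands in for the Lustre file. It shows several dd writers started
from one shell (as a single remote command would do, instead of one
ssh connection per writer) and why conv=notrunc matters when they
write at different seeks.

```shell
# Sketch: four writers to one file at different offsets, all started from
# a single shell and then awaited together.
parallel_dd_demo() {
	file=$(mktemp) || return 1
	n=0
	while [ "$n" -lt 4 ]; do
		# conv=notrunc keeps data already written by the other writers;
		# without it each dd would truncate the file at its seek offset.
		dd if=/dev/zero of="$file" bs=4096 count=1 seek="$n" \
			conv=notrunc 2>/dev/null &
		n=$((n + 1))
	done
	wait                      # block until every background dd has exited
	wc -c < "$file"           # size in bytes after all writers finished
	rm -f "$file"
}

parallel_dd_demo
```

Because every writer leaves the rest of the file alone, the final size
is the full four blocks regardless of the order in which the dd
processes happen to run.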

Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Change-Id: I496b0f6b50811351ac8e0e606cf5a20843fab5d4
Cray-bug-id: LUS-2493
Test-Parameters: testlist=sanityn envdefinitions=ONLY=77
Reviewed-on: https://review.whamcloud.com/33607
Reviewed-by: Elena Gryaznova <c17455@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11495 tests: zconf_mount_clients() fix to work with FILESET 38/33338/11
Elena Gryaznova [Fri, 1 Feb 2019 16:41:22 +0000 (19:41 +0300)]
LU-11495 tests: zconf_mount_clients() fix to work with FILESET

If FILESET is set, zconf_mount_clients() fails to mount clients
because the $mnt dir is missing on the clients.

When FILESET is set the following tests are to be skipped:
- tests accessing .lustre directly
- tests involving lustre_rsync (which needs .lustre to do
  open by FID)
- tests involving copytool (lhsmtool_posix needs .lustre)
- tests making use of llapi_open_by_fid (sanity.sh:test_405()
  ->swap_lock_test, sanity.sh:test_807()->llsom_sync)
- tests that require the Lustre root mounted on $MOUNT
  (sanity.sh:test_65n() needs root to test default layout
  inheritance)

Test-Parameters: trivial envdefinitions=FILESET=/subdir
Signed-off-by: Elena Gryaznova <c17455@cray.com>
Cray-bug-id: LUS-6546, LUS-6561
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Change-Id: I68081fbff808abbaddb314e36283e09d151a81db
Reviewed-on: https://review.whamcloud.com/33338
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-10885 llite: enable flock mount option by default 91/32091/10
Andreas Dilger [Tue, 2 Oct 2018 21:52:28 +0000 (15:52 -0600)]
LU-10885 llite: enable flock mount option by default

The "flock" mount option has been optional for many years, initially
because of potential stability issues, and also to provide a choice
for administrators to select between "flock" and "localflock" options.

However, the large number of problems that users report when
trying to use applications that depend on this feature (typically
databases and other cloud stacks) shows that disabling flock by
default causes more problems than it solves.

Enable the "flock" (distributed coherent userspace locking) feature
by default.  If applications do not need this functionality, then it
will not affect them.  If applications *do* need this functionality,
they will get it.  If administrators really know what they are doing,
then they can use the "localflock" feature to enable client-local
flock functionality, possibly only on select nodes that need this.

Users wanting to disable this functionality should mount with the
existing "-o noflock" mount option, or build the client with the
"configure --disable-flock" option.

If clients are already using "-o {flock|localflock|noflock}" then
their existing options will be handled appropriately.
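
The resulting behaviour can be summarised with a small sketch.
effective_flock_mode is a hypothetical helper, not part of Lustre; it
assumes a client built without "--disable-flock", and it assumes that
when several of the three options are given the last one wins.

```shell
# Hypothetical helper: report which flock mode a comma-separated mount
# option string selects, with "flock" as the default when none is named.
# The body runs in a subshell so the IFS change does not leak.
effective_flock_mode() (
	mode=flock
	IFS=,
	for opt in $1; do
		case $opt in
		flock|localflock|noflock) mode=$opt ;;
		esac
	done
	echo "$mode"
)

effective_flock_mode "rw,noatime"           # no explicit choice
effective_flock_mode "rw,noflock"           # administrator opts out
effective_flock_mode "localflock,noatime"   # client-local locking
```

Only mounts that name none of the three options change behaviour;
explicit choices keep working as before.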

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I182637604fa22573b1da6b6b86d8915e3c3ebbe5
Reviewed-on: https://review.whamcloud.com/32091
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-10885 tests: fix up flocks_test bugs and code style 92/32092/10
Andreas Dilger [Tue, 2 Oct 2018 21:52:02 +0000 (15:52 -0600)]
LU-10885 tests: fix up flocks_test bugs and code style

Fix the flocks_test test program:
- don't segfault if run without any command-line arguments
- return errors from test2 to the caller

Clean up flocks_test code style:
- tabify the whole file
- don't put assignments inside conditionals
- print error messages where needed.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I2555c741b0c170a43c47c16425cba3186e3ebbe5
Reviewed-on: https://review.whamcloud.com/32092
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11947 scripts: handle ZFS targets in Lustre RA 17/34217/3
Nathaniel Clark [Fri, 8 Feb 2019 18:02:28 +0000 (13:02 -0500)]
LU-11947 scripts: handle ZFS targets in Lustre RA

Fixes a regression introduced in LU-11461.
This handles the case where realpath of the target is an empty string.
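
A guard of this kind might look like the following sketch.
resolve_target is a hypothetical name, not the resource agent's own
function, and no_such_pool/ost0 is an invented dataset name used only
to provoke an empty realpath result.

```shell
# Hypothetical helper: canonicalize a target, but keep the original name
# when realpath yields nothing (for example a pool/dataset name that is
# not a filesystem path).
resolve_target() {
	resolved=$(realpath "$1" 2>/dev/null) || resolved=""
	if [ -z "$resolved" ]; then
		resolved=$1
	fi
	echo "$resolved"
}

resolve_target /                     # an existing path
resolve_target no_such_pool/ost0     # assumed dataset name, kept as given
```

Falling back to the name as given keeps symlinked block devices
resolved while leaving non-path targets untouched.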

Fixes: c36d70272541 ("LU-11461 scripts: Support symlink target")
Test-Parameters: trivial
Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Change-Id: I1bcb85908019e968ac0d69e437db217594a6565e
Reviewed-on: https://review.whamcloud.com/34217
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
5 years agoLU-11897 ost: improve memory allocation for ost 27/34127/7
Andrew Perepechko [Fri, 26 Oct 2018 08:29:03 +0000 (11:29 +0300)]
LU-11897 ost: improve memory allocation for ost

OST_BUFSIZE is defined as 17 KiB. Lustre uses
OBD_CPT_ALLOC_LARGE() to allocate buffers, which,
in turn, uses kmalloc_node(). kmalloc_node(8192+) falls
back to the traditional buddy allocator kmalloc_large_node().

In the end, 32 KiB is allocated for a 17 KiB allocation
request.

This patch changes OST_BUFSIZE to 32 KiB so we can
effectively use the whole allocated buffer.
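
The arithmetic behind this can be checked with a short sketch. It
assumes, as the description above implies, that such large requests
are rounded up to a power-of-two size; round_up_pow2 is a hypothetical
helper written for the example.

```shell
# Round a request up to the next power-of-two size, the way an order-based
# page allocation would (assumption: granted sizes are powers of two).
round_up_pow2() {
	size=1
	while [ "$size" -lt "$1" ]; do
		size=$((size * 2))
	done
	echo "$size"
}

request=$((17 * 1024))
granted=$(round_up_pow2 "$request")
echo "requested $request bytes, granted $granted, unused $((granted - request))"
```

Under that assumption a 17 KiB request leaves 15 KiB of the granted
buffer unused, which is the space the new 32 KiB OST_BUFSIZE reclaims.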

Change-Id: I93ce5b26eff4a6a1a17b2a9bfb83161528570197
Signed-off-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Cray-bug-id: LUS-6657
Reviewed-on: https://review.whamcloud.com/34127
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
5 years agoLU-11803 tests: don't assume obd device name 94/33894/8
James Simmons [Thu, 24 Jan 2019 18:20:48 +0000 (13:20 -0500)]
LU-11803 tests: don't assume obd device name

Several tests created to exercise lustre were developed on the
x86 platform, and it was assumed that the device names exposed in
the sysfs tree are the same across all platforms. Additionally,
we can update the tests to handle the case of using a uuid
format for the sysfs directory naming instead of an internal
address pointer.

Test-Parameters: clientdistro=ubuntu1804

Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Change-Id: I704d13059f76337fa49aab77f3e748a70a74f1bc
Reviewed-on: https://review.whamcloud.com/33894
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
5 years agoLU-11591 llog: add synchronization for the last record 83/33683/6
Alexander Boyko [Thu, 29 Nov 2018 13:58:30 +0000 (08:58 -0500)]
LU-11591 llog: add synchronization for the last record

The initial problem was a race between llog_process_thread
and llog_osd_write_rec for the last record, at lgh_last_idx.
The catalog has to be wrapped for the problem to occur. The
lgh_last_idx could be increased along with a modification of the
llog bitmap, while the record itself is written a bit later. When
llog_process_thread processes lgh_last_idx after the modification
and before the write, it operates on old record data.

The patch adds synchronization when lgh_last_idx is processed.

The patch changes llog_test 10h to check race between
llog_process_thread and llog_osd_write_rec.

1 Thread with write                 2 Thread with read
llog_osd_write_rec()                llog_process_thread()
lgh_last_idx++
lock lgh_hdr_mutex
ext2_set_bit()
dt_write_rec (write header)         ext2_test_bit()
                                    check lgh_last_idx was changed
                                    dt_read_rec()
                                    reread the record, and here we
                                    got the old value of record
unlock lgh_hdr_mutex
dt_write_rec (write the record)

Signed-off-by: Alexander Boyko <c17825@cray.com>
Cray-bug-id: LUS-6683
Change-Id: I642b488655940b9456ca8e2f2174c98a966ba242
Reviewed-on: https://review.whamcloud.com/33683
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11111 lfsck: skip orphan processing 59/32959/20
Alexey Lyashkov [Wed, 8 Aug 2018 04:47:51 +0000 (07:47 +0300)]
LU-11111 lfsck: skip orphan processing

LFSCK can reconnect a recently-deleted orphan object back
into the normal namespace when it shouldn't.  This can
allow access to the deleted data (a potential security risk),
and sometimes cause an assertion if the orphan is later deleted.

Skip LFSCK on orphan objects.  Fix the handling of the
LUSTRE_ORPHAN_FL in both osd-zfs and osd-ldiskfs so that
la_valid |= LA_FLAGS is set when la_flags |= LUSTRE_ORPHAN_FL
is set, otherwise the upper layers cannot properly detect it.

Clean up alignment of flags values and provide hex equivalents
so that they are more easily referenced during debugging.

Signed-off-by: Alexey Lyashkov <c17817@cray.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I9b2809b95efa4b3c3e3b2c7d0a501624ed743ede
Reviewed-on: https://review.whamcloud.com/32959
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11616 llite: replace smp_wb() with full memory barrier 71/33571/2
NeilBrown [Sun, 4 Nov 2018 20:26:57 +0000 (15:26 -0500)]
LU-11616 llite: replace smp_wb() with full memory barrier

While porting the smp_mb() patch from LU-9210 to the Linux client,
Neil asked that the smp_wb() be replaced with the full memory
barrier functions smp_store_release() and smp_load_acquire(). In
this case _sa_make_ready() sets entry->se_state and
revalidate_statahead_dentry() tests the value of entry->se_state
after waiting on sai_waitq. This change makes it obvious which
variable is important, and shows the paired synchronization
points. An additional benefit is that the code will work on platforms
that are not "total store order" (TSO) architectures, such as PowerPC.

Change-Id: I687177bf1697a21db624a289c136215af4d90506
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/33571
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11058 tests: cleanup persistent checksum= in 77k 35/34035/4
Alex Zhuravlev [Tue, 15 Jan 2019 12:13:50 +0000 (15:13 +0300)]
LU-11058 tests: cleanup persistent checksum= in 77k

This trivial patch lets me pass 77k locally. It can be used
until the actual solution is landed.

Test-Parameters: trivial
Change-Id: I5c4b4cd15d8e02dd96d918c07aacd184014ade0c
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34035
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 years agoLU-11881 utils: silence error message 86/34086/2
Alexander Zarochentsev [Sun, 15 Jul 2018 21:17:20 +0000 (00:17 +0300)]
LU-11881 utils: silence error message

llapi_get_poollist prints an error message
when it reallocates the buffer, even though
it then completes successfully.

Cray-bug-id: LUS-6185
Test-Parameters: testlist=ost-pools
Change-Id: I0ca1f25edf3f4c89525d41f2deab8d25ec9e0516
Signed-off-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-on: https://review.whamcloud.com/34086
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>