Whamcloud - gitweb
fs/lustre-release.git
4 years ago LU-11089 obd: use wait_event_var() in lu_context_key_degister() 67/33667/13
NeilBrown [Tue, 21 May 2019 14:22:39 +0000 (10:22 -0400)]
LU-11089 obd: use wait_event_var() in lu_context_key_degister()

lu_context_key_degister() has an open coded loop which calls
schedule() without setting a new task state.  This is generally
a bad idea - it could easily just spin.

Instead, use wait_event_var() to wait for ->lct_used to be zero,
and arrange to get a wakeup when that happens.
Previously ->lct_used would only fall down to 1.  Now we decrement
it an extra time so that wake_up, which only happens when the
count reaches zero, will only happen when lu_context_key_degister()
is actually waiting for it.
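
The counting scheme described above can be sketched in user space with C11 atomics. This is only a model under stated assumptions: `struct key_model` and its helpers are hypothetical names, not the Lustre code, and the boolean return stands in for the point where the kernel code would issue its wakeup.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a key's usage counter (->lct_used in the text). */
struct key_model {
	atomic_int used;
};

/* Registration holds the initial reference, so the count starts at 1. */
static void key_model_register(struct key_model *k)
{
	atomic_init(&k->used, 1);
}

/* A context that starts using the key takes one more reference. */
static void key_model_get(struct key_model *k)
{
	atomic_fetch_add(&k->used, 1);
}

/*
 * Drop one reference. Returns true only when this drop took the count
 * to zero, which is the single point where a wakeup would be sent.
 */
static bool key_model_put(struct key_model *k)
{
	return atomic_fetch_sub(&k->used, 1) == 1;
}
```

In this model, deregistration drops the initial reference itself. While users remain, the count stays above zero and nobody is woken; the last user to drop its reference is the one that reaches zero.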

Note that this patch removes key_fini() from protection by
lu_keys_guard.  key_fini() calls are not always protected
by a lock, and there seems to be no need here.  Nothing else
can be acting on the given key in that context at this point,
so no race is possible.

Linux-commit: ef84c07364211bb4e398a9de45d1c13a32059cee

Change-Id: I9514bd21916f75fce00e393612967fb197e3a1c4
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/33667
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12350 tests: Do not use background failover 85/34985/2
Patrick Farrell [Tue, 28 May 2019 21:02:49 +0000 (17:02 -0400)]
LU-12350 tests: Do not use background failover

For some reason, test 33 at one point takes an
OST offline by starting failover in the background. It
seems to assume the OST will be offline during the
subsequent read, without doing anything to verify that it
is offline. In fact, it could either be not yet offline or
already back online with failover complete.

Just use stop like the rest of the test does.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I9c074ff1412793b8f0d8f15dc1e2ee21bb6d9fd6
Reviewed-on: https://review.whamcloud.com/34985
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12342 spec: mark lsvcgss as a config file in the rpm 78/34978/3
Götz Waschk [Tue, 28 May 2019 06:48:02 +0000 (08:48 +0200)]
LU-12342 spec: mark lsvcgss as a config file in the rpm

The file /etc/sysconfig/lsvcgss shouldn't be overwritten on package
upgrades.

Signed-off-by: Götz Waschk <goetz.waschk@desy.de>
Change-Id: I3fa0a3a5a06d9e59699d23e652329365f38fd028
Reviewed-on: https://review.whamcloud.com/34978
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12333 ptlrpc: Add more flags to DEBUG_REQ_FLAGS macro 49/34949/3
Chris Horn [Thu, 23 May 2019 19:42:53 +0000 (14:42 -0500)]
LU-12333 ptlrpc: Add more flags to DEBUG_REQ_FLAGS macro

The rq_req_unlinked, rq_reply_unlinked and rq_receiving_reply flags
determine whether a PtlRPC request can transition out of
RQ_PHASE_UNREG_RPC. Add these flags to the DEBUG_REQ_FLAGS macro to
aid in debugging issues where requests are stuck in this unregistering
state.

Test-Parameters: trivial
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I0b4f424cba70a29c64035ebaccf33fdb954a2db6
Reviewed-on: https://review.whamcloud.com/34949
Reviewed-by: Ann Koehler <amk@cray.com>
Tested-by: Jenkins
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12302 lnet: Fix NI status in proc for loopback ni 71/34871/2
Chris Horn [Wed, 15 May 2019 19:07:20 +0000 (14:07 -0500)]
LU-12302 lnet: Fix NI status in proc for loopback ni

The loopback NI is never really "down", but since its associated
ns_status is used for other purposes that's how it is reported in
proc_lnet_nis(). There's an existing check for lolnd so just hardcode
the status as "up" there.
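
As a rough illustration of the idea (not the actual proc_lnet_nis() code), the status string can be chosen as below. The function name and both parameters are hypothetical.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: pick the status string to report for an NI.
 * The loopback NI is always reported as "up", regardless of the
 * stored status value, which is used for other purposes.
 */
static const char *ni_status_string(bool is_loopback, bool stored_up)
{
	if (is_loopback)
		return "up";
	return stored_up ? "up" : "down";
}
```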

Test-Parameters: trivial
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: If3f29dbc08c14aa187b00d680d0045a7dbb7f2d8
Reviewed-on: https://review.whamcloud.com/34871
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11946 build: no yaml check during configure --enable-dist 12/34812/2
Olaf Faaland [Mon, 6 May 2019 18:38:37 +0000 (11:38 -0700)]
LU-11946 build: no yaml check during configure --enable-dist

If the yaml libraries are not found, the error is fatal, and prevents
the sources from being packaged.

This check is unnecessary when sources are being packaged, so this patch
disables the test when configure is run with --enable-dist.

Change-Id: I160e0d54efc59480d2f830607467dbc9f34c9de3
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-on: https://review.whamcloud.com/34812
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11946 build: no zlib check during configure --enable-dist 11/34811/5
Olaf Faaland [Mon, 6 May 2019 18:31:21 +0000 (11:31 -0700)]
LU-11946 build: no zlib check during configure --enable-dist

If the zlib libraries are not found, the error is fatal, and prevents
the sources from being packaged.

This check is unnecessary when sources are being packaged, so this patch
disables the test when configure is run with --enable-dist.

Change-Id: Ie262b17b63c0edc8e8bfbd0c1a466ec37d05622c
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-on: https://review.whamcloud.com/34811
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-10894 dom: mdc_lock_flush() improvement 38/34738/2
Mikhail Pershin [Mon, 22 Apr 2019 17:31:12 +0000 (20:31 +0300)]
LU-10894 dom: mdc_lock_flush() improvement

osc_lock_flush() has a small improvement: it does not try to
match other locks for a write lock, because there are none.

Do the same in mdc_lock_flush().

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ie18ef63b2f969f762f0263f8b93aea726f89305f
Reviewed-on: https://review.whamcloud.com/34738
Tested-by: Jenkins
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-10894 dom: per-resource ELC for WRITE lock enqueue 36/34736/2
Mikhail Pershin [Mon, 22 Apr 2019 13:05:00 +0000 (16:05 +0300)]
LU-10894 dom: per-resource ELC for WRITE lock enqueue

Improve client write lock enqueue by doing ELC for any
read lock on the same resource. This helps with read/write
access, e.g. compilebench shows ~10% better results with
about 45% fewer ldlm cancel RPCs.

In mdc_enqueue_send(), collect the resource's unused read
locks and pack them into the enqueue request.

ldlm_cancel_resource_local() is also changed so that it does
not skip a DOM lock if it is set explicitly in the policy.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I06ece95d837495e2e970ce670db61ba0aa4e1ab4
Reviewed-on: https://review.whamcloud.com/34736
Tested-by: Jenkins
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11872 utils: don't follow link files in default 11/34111/5
Wang Shilong [Fri, 25 Jan 2019 08:54:50 +0000 (16:54 +0800)]
LU-11872 utils: don't follow link files in default

We don't actually support operations on link files themselves for now.
As a first step, skip link files by default;
otherwise, this causes unexpected behavior.

Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: Ib0069ed1982e26984c6cf093f0803bf4a2208fe1
Reviewed-on: https://review.whamcloud.com/34111
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
4 years ago LU-11526 rpc: support maximum 64MB I/O RPC 42/34042/11
Qian Yingjin [Wed, 16 Jan 2019 02:13:20 +0000 (10:13 +0800)]
LU-11526 rpc: support maximum 64MB I/O RPC

On newer systems, some block drivers allow max_hw_sectors_kb to
be up to 65536KB (64MB) for the underlying storage. To maximize
driver efficiency, Lustre should also bump up the maximum I/O
RPC size to 64MB.
Clamp max_read_ahead_whole_mb so that it does not exceed
max_read_ahead_per_file_mb.
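
The arithmetic and the clamp can be sketched as follows. Both helper names are hypothetical and this is not the tunable-handling code from the patch; it only shows the unit conversion and the "must not exceed" rule.

```c
/* Convert a size in KB to whole MB (65536 KB is 64 MB). */
static unsigned long kb_to_mb(unsigned long kb)
{
	return kb / 1024;
}

/*
 * Hypothetical clamp: the whole-file read-ahead limit may never be
 * larger than the per-file read-ahead limit.
 */
static unsigned long clamp_whole_mb(unsigned long whole_mb,
				    unsigned long per_file_mb)
{
	return whole_mb > per_file_mb ? per_file_mb : whole_mb;
}
```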

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Icbf78742f8210d82dc310af7d05b7c32b93af34f
Reviewed-on: https://review.whamcloud.com/34042
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11893 o2iblnd: add secondary IP address handling 76/34476/7
James Simmons [Sat, 18 May 2019 22:35:54 +0000 (18:35 -0400)]
LU-11893 o2iblnd: add secondary IP address handling

Using dev_get_by_name() in kiblnd_create_dev() means we can only
discover primary IP addresses. This breaks network aliasing,
which some people use. Move away from dev_get_by_name()
to for_ifa() so we can detect any secondary IP addresses.

Change-Id: I03f2f8d18118b716a5eb5fb87694000ac06fe242
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/34476
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Neil Brown <neilb@suse.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-9846 lod: Add overstriping support 25/28425/43
Patrick Farrell [Wed, 29 May 2019 14:42:59 +0000 (10:42 -0400)]
LU-9846 lod: Add overstriping support

Each stripe in a shared file in Lustre corresponds to a
single LDLM extent locking domain and also to a single
object on disk (and in the OSS page cache).  LDLM locks are
extent locks, but there are still significant issues with
false sharing with multiple writers.  On-disk file systems
also have per-object performance limitations for both read
and write.

The LDLM limitation means it is best to have a single
writer per stripe, but modern OSTs can be faster than a
single client, so this restricts maximum performance unless
special methods are used (e.g., Lustre lockahead).

The on disk file system limitations mean that even if LDLM
locking is not an issue (read and not write, or lockahead),
OST performance in a shared file is still limited by having
only one object per OST.

These limitations make it impossible to get the full
performance of a modern Lustre FS with a single shared
file.

This patch makes it possible to have >1 stripe on a given
OST in each layout component.  This is known as
overstriping.  It works exactly like a normally striped
file, and is largely transparent to users.

By raising the object count per OST, this avoids the single
object limits, and by creating more stripes, also avoids
the "single effective writer per stripe" LDLM limitation.
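
To make the object-count effect concrete, here is a minimal sketch that assumes a plain round-robin placement of stripes over OSTs. That placement is an assumption for illustration only; the real allocator is not described in this message, and both helpers are hypothetical.

```c
/*
 * Illustrative round-robin placement: stripe i lands on OST
 * (i % ost_count). ost_count must be non-zero.
 */
static unsigned int stripe_to_ost(unsigned int stripe, unsigned int ost_count)
{
	return stripe % ost_count;
}

/*
 * Number of objects a given OST holds under that placement. Every OST
 * gets the quotient; the first (stripe_count % ost_count) OSTs get one
 * extra object.
 */
static unsigned int objects_on_ost(unsigned int stripe_count,
				   unsigned int ost_count, unsigned int ost)
{
	return stripe_count / ost_count +
	       (ost < stripe_count % ost_count ? 1 : 0);
}
```

With a stripe count no larger than the OST count, each OST holds at most one object; once the stripe count exceeds the OST count, some or all OSTs hold more than one, which is the overstriped case.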

However, it is only desirable in some situations, so users
must request it with a special setstripe command:

lfs setstripe -C [count] [file]

Users can also access overstriping using the standard '-o'
option to manually select OSTs:

lfs setstripe -o [ost_indices] [file]

Overstriping also makes it easy to test layout size limits, so we add a
test for that.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I14bb94b05642b3542a965e84fda4615b997a4dea
Reviewed-on: https://review.whamcloud.com/28425
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11359 mdt: fix mdt_dom_discard_data() timeouts 71/34071/21
Mikhail Pershin [Wed, 31 Oct 2018 13:28:29 +0000 (16:28 +0300)]
LU-11359 mdt: fix mdt_dom_discard_data() timeouts

mdt_dom_discard_data() issues a new lock to cause data
discard for all conflicting client locks. This was done in
the context of unlink RPC processing and may cause it to be stuck
waiting for clients to cancel their locks, leading to cascading
timeouts for any other locks waiting on the same resource and
parent directory.

The patch skips waiting for the discard lock in the current context by
using its own CP callback, which doesn't wait for blocking
locks. They will be finished later by LDLM and cleaned up in
that completion callback. So the current thread just makes sure
discard locks are taken and BL ASTs are sent but doesn't wait
for lock granting, and that fixes the original problem.

At the same time, that opens a window for a race with data being
flushed on the client, so it is possible that new IO from the client
will happen on a just-unlinked object, causing an error message, and
it is not possible to distinguish that case from other
possibly critical situations. To solve that, the unlinked object
is pinned in memory until the discard lock is granted.
Therefore, such objects can be easily distinguished as stale
and any IO against them can be silently ignored.

Older clients are not fully compatible with async DoM discard, so
the patch also adds a new connection flag, ASYNC_DISCARD, to distinguish
old clients and use the old blocking discard for them.

Test-Parameters: testlist=racer,racer,racer
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I419677af43c33e365a246fe12205b506209deace
Reviewed-on: https://review.whamcloud.com/34071
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12034 obdclass: put all service's env on the list 66/34566/17
Alex Zhuravlev [Wed, 3 Apr 2019 08:29:06 +0000 (11:29 +0300)]
LU-12034 obdclass: put all service's env on the list

This makes it possible to look up the env by current thread where
it's too complicated to pass env by argument.

This version has stats to see slow/fast lookups. In sanity-benchmark
there were 172850 fast lookups (from per-cpu cache) and 27228 slow lookups
(from rhashtable). Going to watch the ratio in autotest's reports.
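
For reference, the share of fast lookups in the numbers above can be worked out with a small helper. The function is hypothetical and only does the arithmetic; it returns the share in parts per thousand using integer division.

```c
/*
 * Share of fast lookups in parts per thousand, rounded down.
 * Returns 0 when there were no lookups at all.
 */
static unsigned int fast_lookup_permille(unsigned long long fast,
					 unsigned long long slow)
{
	unsigned long long total = fast + slow;

	if (total == 0)
		return 0;
	return (unsigned int)(fast * 1000 / total);
}
```

For 172850 fast and 27228 slow lookups the total is 200078, so roughly 86% of lookups were served from the per-cpu cache.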

Fixes: 2339e1b3b690 ("LU-11483 ldlm ofd_lvbo_init() and mdt_lvbo_fill() create env")
Fixes: e02cb40761ff ("LU-11164 ldlm: pass env to lvbo methods")

Change-Id: Ia760e10fa5c68e7a18284e4726d215b330fc0eed
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34566
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 years ago LU-11233 tests: fix gcc8 build warnings 61/34661/6
Alex Zhuravlev [Wed, 22 May 2019 20:28:55 +0000 (13:28 -0700)]
LU-11233 tests: fix gcc8 build warnings

This patch covers the Lustre tests.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I6345d603772fb32bbc4b38a758a3e97f0361d116
Reviewed-on: https://review.whamcloud.com/34661
Tested-by: Jenkins
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12269 build: add value to definition of with_gss in spec 92/34892/4
Ben Menadue [Thu, 16 May 2019 23:52:40 +0000 (09:52 +1000)]
LU-12269 build: add value to definition of with_gss in spec

rpmbuild currently fails when gss_keyring is enabled (which
happens automatically if the right packages are installed).
This is due to an ill-formed %define in lustre.spec.in that
doesn't include the value to set the macro to.

This patch updates this line to set the value to 1.

Signed-off-by: Ben Menadue <ben.menadue@anu.edu.au>
Change-Id: I2f52b19795091702622eb3b4c110f09eb80654db
Reviewed-on: https://review.whamcloud.com/34892
Tested-by: Jenkins
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12269 build: remove %{fullrelease} from Provides 83/34883/7
Ben Menadue [Thu, 16 May 2019 05:14:56 +0000 (15:14 +1000)]
LU-12269 build: remove %{fullrelease} from Provides

Commit 7532409 adds a version number to the lustre-osd-mount
Provides lines in lustre.spec.in, but includes the
%{fullrelease} macro that was previously removed by
28c17d4. This causes an "unexpanded macro" warning when
building the RPM, and the result contains a bogus string
for that name, e.g.

    2.12.53_45_g43fc4db-%{fullrelease}

This patch simply removes the "-%{fullrelease}" suffix from
these lines in lustre.spec.in.

Test-Parameters: trivial
Signed-off-by: Ben Menadue <ben.menadue@anu.edu.au>
Change-Id: Ia13f339f57b89c02443ebc2d68f0aa3b0802319a
Reviewed-on: https://review.whamcloud.com/34883
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12269 build: fix hardened builds in rpm spec file 82/34882/6
Ben Menadue [Thu, 16 May 2019 04:42:37 +0000 (14:42 +1000)]
LU-12269 build: fix hardened builds in rpm spec file

The hardened build configure on RHEL8 has a quoted string
with spaces in it, and this breaks the construction of
%eval_configure in lustre.spec.in - the quotes end up in
the wrong place.

Moreover, the hardened build flags are only for user-space
code, and break kernel code compilation on RHEL 8.0 (they
add -fPIE, which isn't valid for kernel code).

This patch stores the %build_cflags and %build_ldflags from
rpmbuild as environment variables before turning hardened
build off to allow the kernel code to build. These
environment variables are used in the lnet/utils and
lustre/utils Makefiles so that the user-space code there
gets the benefit of any system-specific RPM build flag
(such as hardened builds).

For RHEL7 on PPC64 we then also need to define the C macro
__SANE_USERSPACE_TYPES__ so that __s64 and __u64 are long
long instead of the default long - otherwise the build will
fail with a format string error on this platform because
Lustre uses %ll when printing/scanning __s64/__u64.
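
The format string problem can be illustrated in plain user-space C. This sketch is not what the patch does: the patch defines __SANE_USERSPACE_TYPES__ so the kernel types themselves become long long. The alternative shown here, with a hypothetical helper and int64_t instead of __s64, is an explicit cast so that the argument always matches %lld whatever the underlying type is.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format a signed 64-bit value. int64_t may be 'long' or 'long long'
 * depending on the platform, so cast explicitly to match "%lld".
 */
static int format_s64(char *buf, size_t len, int64_t value)
{
	return snprintf(buf, len, "%lld", (long long)value);
}
```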

The environment variables (UTILS_CFLAGS and UTILS_LDFLAGS)
could also be used for a standalone, non-RPM build to pass
flags to the user-space code, with the usual CFLAGS and
LDFLAGS still used for kernel code.

Signed-off-by: Ben Menadue <ben.menadue@anu.edu.au>
Change-Id: I9b4ba830bf63838fd88ef1bae5dd10dff2109a1d
Reviewed-on: https://review.whamcloud.com/34882
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12299 libcfs: fix panic for too large cpu partions 64/34864/3
Wang Shilong [Wed, 15 May 2019 01:52:37 +0000 (09:52 +0800)]
LU-12299 libcfs: fix panic for too large cpu partions

If the number of cpu partitions is larger than the number of online
cpus, the following calculation will be 0:

num = num_online_cpus() / ncpt;

And it will trigger the following panic in cfs_cpt_choose_ncpus():

LASSERT(number > 0);

We don't actually support this; returning failure is better
than panicking.

Also fix an invalid pointer access if we fail to init @cfs_cpt_table,
as it will be converted to an ERR_PTR() if an error happens.
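
The check can be sketched in user space as below. The helper name, its signature and the choice of -EINVAL are assumptions for illustration; only the integer division and the "fail instead of assert" idea come from the message above.

```c
#include <errno.h>

/*
 * Hypothetical helper: compute how many CPUs each partition gets.
 * Integer division yields 0 when ncpt > online_cpus, so reject that
 * case up front instead of asserting on the result later.
 */
static int cpus_per_partition(int online_cpus, int ncpt, int *num)
{
	if (ncpt <= 0 || ncpt > online_cpus)
		return -EINVAL;

	*num = online_cpus / ncpt;
	return 0;
}
```

For example, 4 online CPUs and 8 requested partitions give 4 / 8 == 0 in integer arithmetic, which is the value that used to trip the assertion.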

Change-Id: I49daadd8f0c7d22aa78d08248d8c085781740768
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/34864
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12095 ptlrpc: ocd_connect_flags are wrong during reconnect 80/34480/5
Andriy Skulysh [Wed, 27 Feb 2019 17:37:24 +0000 (19:37 +0200)]
LU-12095 ptlrpc: ocd_connect_flags are wrong during reconnect

Import connect flags are reset to original ones during
reconnect, so a request can be created with unsupported
features.

Use separate obd_connect_data to send connect request.

Change-Id: I4cfc48bf7ef66c4f3832613e179030b0eb1d6fdf
Cray-bug-id: LUS-6397
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-on: https://review.whamcloud.com/34480
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12267 tests: update filter in acl for SElinux case 18/34818/2
Sebastien Buisson [Tue, 7 May 2019 15:55:04 +0000 (00:55 +0900)]
LU-12267 tests: update filter in acl for SElinux case

With SElinux enforced on client, sanity.sh test_103a fails because
the "ls -l" command produces an extra '.' at the end to indicate
extra security attributes are set.

So update filter by removing this trailing '.' in the output.

Test-Parameters: trivial testlist=sanity envdefinitions=ONLY=103a
Test-Parameters: clientselinux testlist=sanity envdefinitions=ONLY=103a
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ie684a3fe02f0f2821c8059855165a0f9dd585b72
Reviewed-on: https://review.whamcloud.com/34818
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11838: lnet: remove lnet_ipif_enumerate() 34/34234/3
NeilBrown [Wed, 20 Mar 2019 19:25:24 +0000 (15:25 -0400)]
LU-11838: lnet: remove lnet_ipif_enumerate()

Also remove lnet_ipif_query() and related functions.

There are no longer any users of these functions, so remove them.

Linux-commit: 6e659fcfab0cdd876a555a752acf9997f98acbcd

Change-Id: I8183e505e3dbe12ff71ddf38f5b18a945d8a4a6c
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/34234
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-11907 dne: allow access to striped dir with broken layout 50/34750/4
Lai Siyao [Sun, 14 Apr 2019 20:12:54 +0000 (04:12 +0800)]
LU-11907 dne: allow access to striped dir with broken layout

Sometimes the layout of striped directories may become broken:
* creation/unlink is partially executed on some MDT.
* disk failure or a stopped MDS makes some stripes inaccessible.
* software bugs.

In this situation, the directory should still be accessible,
and in particular it should be possible to migrate it to other
active MDTs.

This patch adds this support on both server and client: don't
assume the stripe FID is sane, and when a stripe doesn't exist, skip
it.

Add OBD_FAIL_MDS_STRIPE_FID to simulate insane stripe FID, and
OBD_FAIL_MDS_STRIPE_CREATE to simulate stripe creation failure.

Add sanity 60h.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I8a05a0e0cef8b051a935b3fa3d3e26c0b6ef3b4a
Reviewed-on: https://review.whamcloud.com/34750
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-6142 ptlrpc: Fix style issues for llog_client.c 00/34900/4
Arshad Hussain [Thu, 9 May 2019 21:18:04 +0000 (02:48 +0530)]
LU-6142 ptlrpc: Fix style issues for llog_client.c

This patch fixes issues reported by checkpatch
for file lustre/ptlrpc/llog_client.c

Change-Id: I4a3ce0022b9086fc1885d447c9b876bef183f298
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/34900
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
4 years ago LU-6142 ptlrpc: Fix style issues for client.c 03/34803/6
Arshad Hussain [Tue, 30 Apr 2019 04:48:50 +0000 (10:18 +0530)]
LU-6142 ptlrpc: Fix style issues for client.c

This patch fixes issues reported by checkpatch
for file lustre/ptlrpc/client.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: I24c4412d8747292f71c28fc0e8fc48b1cea405b9
Reviewed-on: https://review.whamcloud.com/34803
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-6142 ptlrpc: Fix style issues for sec.c 97/34597/2
Arshad Hussain [Sat, 23 Mar 2019 01:21:04 +0000 (06:51 +0530)]
LU-6142 ptlrpc: Fix style issues for sec.c

This patch fixes issues reported by checkpatch
for file lustre/ptlrpc/sec.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: Icfbf23301b8f1b8d21df0e5122c121671997d5eb
Reviewed-on: https://review.whamcloud.com/34597
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
4 years ago LU-6142 ptlrpc: Fix style issues for sec_gc.c 51/34551/2
Arshad Hussain [Fri, 22 Mar 2019 13:01:52 +0000 (18:31 +0530)]
LU-6142 ptlrpc: Fix style issues for sec_gc.c

This patch fixes issues reported by checkpatch for
file lustre/ptlrpc/sec_gc.c

Change-Id: I19f9f86aba86417b31245da4246c2d6eeb0a3752
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/34551
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
4 years ago LU-6142 ptlrpc: Fix style issues for sec_plain.c 50/34550/4
Arshad Hussain [Fri, 22 Mar 2019 11:44:31 +0000 (17:14 +0530)]
LU-6142 ptlrpc: Fix style issues for sec_plain.c

This patch fixes issues reported by checkpatch
for file lustre/ptlrpc/sec_plain.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: I03220084b303d9d411665db9a6080f934115b67a
Reviewed-on: https://review.whamcloud.com/34550
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
4 years ago LU-6142 ldlm: Fix style issues for ldlm_resource.c 92/34492/2
Arshad Hussain [Wed, 20 Mar 2019 22:30:06 +0000 (04:00 +0530)]
LU-6142 ldlm: Fix style issues for ldlm_resource.c

This patch fixes issues reported by checkpatch
for file lustre/ldlm/ldlm_resource.c

Test-Parameters: trivial
Change-Id: I50cf6d303f284ea5d77f825eaba8f7bbdbf60568
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/34492
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Ben Evans <bevans@cray.com>
4 years ago LU-11893 ksocklnd: add secondary IP address handling 92/34392/14
James Simmons [Mon, 20 May 2019 15:09:07 +0000 (11:09 -0400)]
LU-11893 ksocklnd: add secondary IP address handling

Because ksocknal_enumerate_interfaces() uses for_primary_ifa(), only
primary IP addresses are returned. This prevents the use of network
aliasing, which some people use. Change for_primary_ifa() to
for_ifa() so we can detect any secondary IP addresses. Update the
string handling since ifa_device names can be different from the
net_device name. Discard the 'j' counter and instead keep
ksnn_ninterfaces up to date. This means that we return 0 on
success, rather than a count of added interfaces. Update the
too-many-interfaces test in ksocknal_enumerate_interfaces()
with a better test using ARRAY_SIZE.
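
The bounds test and the new return convention can be sketched as follows. `struct net_model`, `MAX_IFACES`, `net_model_add()` and the -ENOSPC error code are hypothetical; only the ARRAY_SIZE idiom, the running interface count and the "0 on success" convention come from the message above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define MAX_IFACES 4	/* hypothetical capacity */

/* Hypothetical per-net state: a fixed interface array plus a count. */
struct net_model {
	uint32_t ifaces[MAX_IFACES];
	size_t ninterfaces;
};

/*
 * Record one more address. The capacity check is derived from the
 * array itself, so it cannot drift from the declaration. Returns 0 on
 * success and keeps the count in the structure up to date, rather than
 * returning a count of added interfaces.
 */
static int net_model_add(struct net_model *net, uint32_t addr)
{
	if (net->ninterfaces >= ARRAY_SIZE(net->ifaces))
		return -ENOSPC;

	net->ifaces[net->ninterfaces++] = addr;
	return 0;
}
```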

Change-Id: I832df89148def5088502ac92df27b8b3872f3792
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/34392
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12335 ldiskfs: fixed size preallocation table 50/34950/2
Artem Blagodarenko [Thu, 23 May 2019 13:10:48 +0000 (16:10 +0300)]
LU-12335 ldiskfs: fixed size preallocation table

The preallocation table read/write code is racy. There is a
possibility of accessing memory outside of the allocated table.

Make the preallocation table a fixed size. An array of 64
long int values is enough for any configuration and
doesn't need much memory. With such an array, races are not
possible.

Signed-off-by: Artem Blagodarenko <c17828@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Cray-bug-id: LUS-7218
Change-Id: Ie089ac47c2610717a00d6cea9121ec08879a159c
Reviewed-on: https://review.whamcloud.com/34950
Tested-by: Jenkins
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years ago LU-12279 lnet: use number of wrs to calculate CQEs 45/34945/3
Amir Shehata [Tue, 21 May 2019 20:44:58 +0000 (13:44 -0700)]
LU-12279 lnet: use number of wrs to calculate CQEs

Using concurrent sends to calculate the number of CQEs results
in a small number of CQEs, which exposes an issue where under
failure scenarios, for example when a node reboots, there wouldn't
be enough CQEs available, leading to IB_EVENT_QP_FATAL.

Fixes: 83e45ead69ba ("LU-11931 lnd: bring back concurrent_sends")
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I6e2be079e11622b83fe3fb4fdb695f5a2672c9ac
Reviewed-on: https://review.whamcloud.com/34945
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 years ago LU-12131 tests: fix SSK handling in tests 21/34521/13
Sebastien Buisson [Thu, 28 Mar 2019 07:35:18 +0000 (08:35 +0100)]
LU-12131 tests: fix SSK handling in tests

SSK can be activated for Lustre tests by setting SHARED_KEY env
variable to true.
In setup_all() an additional env variable SK_MOUNTED is used to avoid
mounting an SSK file system twice. But this variable has to be set
back to false in stopall() for consistency.
Some tests are incompatible with SSK, so skip them in case SHARED_KEY
is true. Some other tests playing with nodemaps have to take SSK into
account.

Whamcloud-bug-id: ATM-1283
Test-Parameters: clientselinux testlist=sanity,recovery-small,sanity-selinux
Test-Parameters: envdefinitions=SHARED_KEY=true testlist=sanity,recovery-small,sanity-sec
Test-Parameters: envdefinitions=SHARED_KEY=true clientselinux testlist=sanity,recovery-small,sanity-selinux,sanity-sec
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I1016a459c42ffed1ab2b6f67d0a145ed2af9fa40
Reviewed-on: https://review.whamcloud.com/34521
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11851 ldiskfs: reschedule for htree thread. 60/34160/6
Yang Sheng [Fri, 1 Feb 2019 05:04:10 +0000 (13:04 +0800)]
LU-11851 ldiskfs: reschedule for htree thread.

A thread may be woken improperly in the htree code. This patch
reschedules the thread to keep the locking correct.

Change-Id: I6a8d1bbc0470b2577ca80faa304eb06f7913c218
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34160
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12225 obdclass: improve jobid memory reclaim policy 75/34775/3
Wang Shilong [Mon, 29 Apr 2019 13:13:59 +0000 (21:13 +0800)]
LU-12225 obdclass: improve jobid memory reclaim policy

jobid_should_free_item() is called in the following three
cases to decide whether @pidmap should be deleted from the hash list:

1) The normal timeout expires and the memory reclaimer is called
to try to free some items.

2) The admin writes to the sys interface to free some jobids.

3) The client is unmounted and all memory is freed.

For cases 2 and 3, it makes sense to always return 1.
Add a warn_on in case 3 to make sure there isn't any
bug in the code.

For case 1, we can change the policy a bit to not
return 1 if the reference count of @pidmap is larger than 1.
A common problem with the current policy is that a newly added
@pidmap is easily freed from the hash list.

Actually, even if we delete @pidmap from the hash list, the memory
will only be freed eventually, once its reference count reaches 1,
and it is very likely that we would delete and insert @pidmap
again, since this could be a hot and long-running job.
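
The decision described above can be sketched in user space as follows.
This is only an illustration of the policy, not the kernel code: the
FreeReason names, the should_free_item() signature and the "expired"
flag are assumptions made for the example.

```python
from enum import Enum


class FreeReason(Enum):
    """Hypothetical labels for the three callers listed above."""
    EXPIRED = 1      # case 1: timeout expired, reclaimer trims the hash
    ADMIN_CLEAR = 2  # case 2: administrator asked to drop jobids
    UMOUNT = 3       # case 3: client unmount, everything is freed


def should_free_item(refcount: int, expired: bool, reason: FreeReason) -> bool:
    """Return True if the entry should be removed from the hash list."""
    if reason in (FreeReason.ADMIN_CLEAR, FreeReason.UMOUNT):
        # Cases 2 and 3 always remove the entry.
        return True
    # Case 1: skip entries that are still referenced by someone other
    # than the hash list itself (reference count larger than 1).
    return expired and refcount <= 1
```

With this policy a freshly inserted entry that a caller still holds
(reference count 2) survives a reclaim pass, while an idle expired
entry (reference count 1) is removed.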

Change-Id: I61b894a900319953d5a3369bee69bda050102129
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/34775
Tested-by: Jenkins
Reviewed-by: Ben Evans <bevans@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11233 utils: fix build warnings for gcc8 62/34662/9
Alex Zhuravlev [Mon, 15 Apr 2019 13:25:50 +0000 (16:25 +0300)]
LU-11233 utils: fix build warnings for gcc8

Quiet new build warnings that appear with GCC8, mainly related
to the length of string buffers not being long enough (in theory)
for the maximum possible string sizes, even if this never actually
is possible in practice.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I83a955fc68f3e03fe84622ddf1cedfb30d5916ac
Reviewed-on: https://review.whamcloud.com/34662
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12314 tests: Add Missing Description to sanity test 258a 02/34902/3
Arshad Hussain [Fri, 10 May 2019 00:54:03 +0000 (06:24 +0530)]
LU-12314 tests: Add Missing Description to sanity test 258a

This patch adds missing test description to sanity test 258a.

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: I972549cd049b965c9e6da9b43aa245bab875a77a
Reviewed-on: https://review.whamcloud.com/34902
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-12270 o2iblnd: pci_unmap_addr() removed in 4.19 27/34827/2
Li Dongyang [Wed, 8 May 2019 11:53:21 +0000 (21:53 +1000)]
LU-12270 o2iblnd: pci_unmap_addr() removed in 4.19

Since kernel 4.19 the pci_unmap_addr() wrappers have
been removed, along with linux/pci-dma.h.
We can use the good old DEFINE_DMA_UNMAP_ADDR instead
of DECLARE_PCI_UNMAP_ADDR.

Linux-commit: 18b01b16e8bae9cd227909f6e6d2783d74855f65

Test-Parameters:trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I387bd3d1c4e8c3bc75400ce1be05132fb25f8a50
Reviewed-on: https://review.whamcloud.com/34827
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12236 gss: remove unused code in gss_svc_upcall.c 94/34794/4
Aurelien Degremont [Thu, 2 May 2019 15:47:15 +0000 (15:47 +0000)]
LU-12236 gss: remove unused code in gss_svc_upcall.c

Delete rsc_flush() related functions which are never
used.

Test-Parameters: envdefinitions=SHARED_KEY=true testlist=sanity,recovery-small,sanity-sec
Change-Id: Iedd6339b5fafdea81147c83e5f0499fa3ad60251
Signed-off-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-on: https://review.whamcloud.com/34794
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12206 mdt: mdt_init0 failure handling 24/34724/3
Vladimir Saveliev [Fri, 19 Apr 2019 09:33:12 +0000 (12:33 +0300)]
LU-12206 mdt: mdt_init0 failure handling

When mdt_init0 fails it has to wait until zombie workqueue has all
disconnected exports destroyed before mdt_device_alloc will free the
mdt_device. Otherwise, zombie workqueue refers to freed mdt_device
via:
  general protection fault: 0000 [#1] SMP
  ..
  Workqueue: obd_zombid obd_zombie_exp_cull [obdclass]
  ..
  [<ffffffffc08829c5>] tgt_client_free+0x1e5/0x3c0 [ptlrpc]
  [<ffffffffc0ec2327>] mdt_destroy_export+0x57/0x200 [mdt]
  [<ffffffffc05bf20e>] class_export_destroy+0xee/0x490 [obdclass]
  [<ffffffffc05bf5c5>] obd_zombie_exp_cull+0x15/0x20 [obdclass]
  [<ffffffff93ab1d2f>] process_one_work+0x17f/0x440

- mdt_init0
  call to target_recovery_fini is moved so that it is called on every
  failure after successful tgt_init.

  obd_zombie_barrier is to be called after
  target_recovery_fini->class_disconnect_exports

  obd->obd_fail is set so that mdt_export_cleanup->tgt_client_del does
  not clear the client's slot in last_rcvd in case of server start failure

- mdt_quota_init
  class_manual_clean does class_detach, goto is added to avoid
  repeated call to class_detach

- qmt_device_init0
  start the qmt rebalance thread with the SVC_STARTING flag so that
  qmt_start_reba_thread waits until the thread has started.
  Otherwise, qmt_device may get freed before the qmt rebalance thread
  is stopped

Tests for failures during mdt_init0 are added
- conf-sanity.sh:test_5i leads to general protection fault
- conf-sanity.sh:test_5h causes
  rmmod: ERROR: Module mdt is in use

Cray-bug-id: LUS-2403
Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Test-Parameters: trivial testlist=conf-sanity envdefinitions=ONLY=5
Change-Id: Ic9dc9e167f6c2e47a5f97e59b5bd26c5231c23ce
Reviewed-on: https://review.whamcloud.com/34724
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11771 ldlm: use hrtimer for recovery to fix timeout messages 10/34710/4
James Simmons [Thu, 18 Apr 2019 23:07:39 +0000 (19:07 -0400)]
LU-11771 ldlm: use hrtimer for recovery to fix timeout messages

Currently the functions target_handle_connect/reconnect show an
incorrect time remaining until the end of recovery:

fs1-OST0000: Recovery already passed deadline 71578:57.
If you do not want to wait more, please abort the recovery by force.
...
fs1-OST0000: Denying connection for new client ...
(1 recovered, 11 in progress, and 1 evicted) to recover in 71578:57

This is due to the assumption that the time returned by the
monotonic clock and jiffies was initialized at the same time but
that is not the case. So a compare between ktime_get_seconds()
and jiffies converted to seconds is invalid.

We solve this by replacing the recovery timer with an hrtimer based
one. There are many benefits to using an hrtimer over jiffies, like
better scaling, power profile, and better handling on tickless
systems. This also makes the code clearer by using only the real wall
clock in all cases.
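
The invalid comparison can be illustrated with two clocks that do not
share an origin. This is a sketch with made-up numbers: the offset
between the two clocks and the helper name remaining() are assumptions
chosen for the example, not values taken from the kernel.

```python
def remaining(deadline_s: int, now_s: int) -> int:
    """Seconds left until the deadline, valid only if both readings
    come from the same clock."""
    return deadline_s - now_s


# Hypothetical readings in seconds. The two clocks tick at the same
# rate but were initialized at different points, so they differ by a
# constant offset (assumed value).
CLOCK_OFFSET_S = 4_000_000
monotonic_now_s = 1_000
jiffies_now_s = monotonic_now_s + CLOCK_OFFSET_S

# The recovery timer is armed 300 s ahead, expressed in jiffies time.
deadline_s = jiffies_now_s + 300

# Mixing the clocks yields a huge bogus "time left".
mixed = remaining(deadline_s, monotonic_now_s)
# Using a single clock for both readings gives the real value.
consistent = remaining(deadline_s, jiffies_now_s)
```

Here "mixed" is off by exactly the offset between the clocks, which is
the kind of implausible remaining time shown in the messages above;
keeping every reading on one clock removes the error.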

Change-Id: I9d7e7e92e67ee942bc1dc51fbb0af7d8f53e54e1
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/34710
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
4 years agoLU-11838 llite: address_space ->page_tree renamed ->i_pages 73/34673/5
Li Dongyang [Mon, 15 Apr 2019 04:15:40 +0000 (14:15 +1000)]
LU-11838 llite: address_space ->page_tree renamed ->i_pages

Kernel 4.17 renamed address_space ->page_tree to ->i_pages,
and switched to xa_lock on the radix_tree_root.

Linux-commit: b93b016313b3ba8003c3b8bb71f569af91f19fc7

Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: Iadbc5eda884dbe8ad0d694e0f88255bc496dea5b
Reviewed-on: https://review.whamcloud.com/34673
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-11760 ofd: formatted OST recognition change 33/33833/6
Sergey Cheremencev [Fri, 24 Aug 2018 14:03:45 +0000 (17:03 +0300)]
LU-11760 ofd: formatted OST recognition change

Modern systems are fast enough to create more than
100 000 (5 * OST_MAX_PRECREATE) objects during a commit interval.
Increase the difference between MDS last_used ID
and OST LAST_ID to 500 000 to avoid gaps after OST failover.

Cray-bug-id: LUS-6399
Change-Id: If36e04878d13f27f5229b488781440a159ddff7d
Signed-off-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-on: https://es-gerrit.dev.cray.com/153866
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Reviewed-on: https://review.whamcloud.com/33833
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12309 osd-zfs: Support disabled project quotas 88/34888/5
Nathaniel Clark [Thu, 16 May 2019 17:18:04 +0000 (13:18 -0400)]
LU-12309 osd-zfs: Support disabled project quotas

Allow project quotas to be compiled in but disabled in the zpool.
This would be the case for zpools created by pre-0.8.0 ZFS, but then
used with newer ZFS.

Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Change-Id: I79c2c4ee3b191dad4150c218b25ced2508062d51
Reviewed-on: https://review.whamcloud.com/34888
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12013 lfsck: use correct buffer 01/34901/2
Alex Zhuravlev [Sat, 18 May 2019 07:04:05 +0000 (10:04 +0300)]
LU-12013 lfsck: use correct buffer

lmm is used as a temporary pointer to a structure. It can get moved
within the buffer while @size remains the same. This may cause invalid
memory access.

Change-Id: Iecc51e8bb75c678e7d8287b3798afbab8bfd1485
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34901
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-12221 statahead: sa_handle_callback get lli_sa_lock earlier 60/34760/5
Ann Koehler [Thu, 25 Apr 2019 19:02:19 +0000 (14:02 -0500)]
LU-12221 statahead: sa_handle_callback get lli_sa_lock earlier

sa_handle_callback() must acquire the lli_sa_lock before calling
sa_has_callback(), which checks whether the sai_interim_entries list is
empty. Acquiring the lock avoids a race between an rpc handler
executing ll_statahead_interpret and the separate ll_statahead_thread.

When a client receives a stat request response, ll_statahead_interpret
increments sai_replied and if needed adds the request to the
sai_interim_entries list for instantiating by the ll_statahead_thread.
ll_statahead_interpret() holds the lli_sa_lock while doing this work.
On process termination, ll_statahead_thread() waits for sai_sent to
equal sai_replied and then removes any entries in the
sai_interim_entries list. It does not get the lli_sa_lock until
it determines that there are sai_interim_entries to process.

A bug occurs on weak memory model processors that do not guarantee
that both ll_statahead_interpret updates done under the lock are
visible to other processors at the same time. For example, on ARM
nodes, an ll_statahead_thread can read the updated value of
sai_replied and a non-updated value of sai_interim_entries.
ll_statahead_thread then thinks all replies have been received (true)
and all sai_interim_entries have been processed (false). Later, the
update to sai_interim_entries becomes visible, leaving the
ll_statahead_info struct in an unexpected state.

The bad state eventually triggers the LBUG:
statahead.c:477:ll_sai_put()) ASSERTION( !sa_has_callback(sai) )

Cray-bug-id: LUS-6243
Signed-off-by: Ann Koehler <amk@cray.com>
Change-Id: I9fc6bd664188d9ac7c26b1b6965e2b99abf5e948
Reviewed-on: https://review.whamcloud.com/34760
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12324 mdd: Do not record xattr size get in changelogs 36/34936/2
Oleg Drokin [Wed, 22 May 2019 05:22:49 +0000 (01:22 -0400)]
LU-12324 mdd: Do not record xattr size get in changelogs

It looks like if the xattr itself was not fetched, there's no
need to create a changelog entry for it. The real get will come
later and we'd record it there.

Change-Id: I5b19f9309f65da0a4c58cb79a95787dab862eb94
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34936
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
4 years agoLU-10754 tests: sanityn/47b to sleep for 1s 53/34853/4
Alex Zhuravlev [Mon, 13 May 2019 17:49:44 +0000 (20:49 +0300)]
LU-10754 tests: sanityn/47b to sleep for 1s

It seems 0.2s is not enough in this specific case.

Change-Id: I51e00adb2de1229e8beafd8fe567fa7637e5d764
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34853
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
4 years agoLU-12282 build: export IB_OPTIONS before build 43/34843/3
Minh Diep [Fri, 10 May 2019 03:45:51 +0000 (20:45 -0700)]
LU-12282 build: export IB_OPTIONS before build

We need to export any option before dpkg-buildpackage

Test-Parameters: trivial

Change-Id: I683080e1872c8818ae9c391f5971b5e4488147a6
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34843
Tested-by: Jenkins
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12232 test: commit before df 08/34808/3
Hongchao Zhang [Tue, 2 Apr 2019 02:49:53 +0000 (22:49 -0400)]
LU-12232 test: commit before df

In sub_test6 of replay_ost_single, the transactions at OSTs should
be committed to clean up the test environment.

Test-Parameters: trivial
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: Icbb06789855ab02252b7f1b0b9aff6bbb0f5f2e1
Reviewed-on: https://review.whamcloud.com/34808
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12098 mdd: explicitly clear changelogs on deregister 88/34688/7
Sebastien Buisson [Tue, 16 Apr 2019 13:32:43 +0000 (22:32 +0900)]
LU-12098 mdd: explicitly clear changelogs on deregister

In case of MDS crash in the middle of changelog_deregister, the system
can end up with the changelogs user deregistered, but the changelog
entries not actually cleared. Then the only way to get rid of the
remaining changelogs not used anymore by any user is to register a new
changelogs user and then deregister it.
To protect from this scenario, explicitly clear changelogs used by the
user, before actually deregistering it.

Also add recovery-small test_136 for non-regression purpose.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I14576180c9351337fc4d9ed0e1b176d352584750
Reviewed-on: https://review.whamcloud.com/34688
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11838 ldlm: struct timespec64.tv_sec type change 77/34677/4
Li Dongyang [Tue, 16 Apr 2019 05:41:04 +0000 (15:41 +1000)]
LU-11838 ldlm: struct timespec64.tv_sec type change

Since kernel 4.18 struct timespec64 is no longer defined
as struct timespec on 64bit systems, this means tv_sec
is no longer __kernel_time_t but now time64_t.

Use %llu as the format specifier and explicitly cast it
to unsigned long long.

Test-Parameters:trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: Ib4c80c9b20854d45b1b3c04057c45ee20d5413d9
Reviewed-on: https://review.whamcloud.com/34677
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 years agoLU-11838 osp: atomic64_read() returns s64 76/34676/4
Li Dongyang [Tue, 16 Apr 2019 05:28:11 +0000 (15:28 +1000)]
LU-11838 osp: atomic64_read() returns s64

Since kernel 4.17 atomic64_read on x86_64 returns s64
instead of long.

Use %llu as the format specifier and explicitly cast it
to unsigned long long.

Test-Parameters:trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I805d43251f24417e6405f5d087927c15cf531619
Reviewed-on: https://review.whamcloud.com/34676
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 years agoLU-12093 osc: don't check capability for every page 78/34478/4
Li Dongyang [Thu, 21 Mar 2019 03:05:14 +0000 (14:05 +1100)]
LU-12093 osc: don't check capability for every page

We check CFS_CAP_SYS_RESOURCE for every page during the io.
This is expensive on AppArmor-enabled systems. Instead, we can
do the check only once for the entire io and use the result when
submitting the pages.

Don't init the oap_brw_flags during osc_page_init(), the flag
will be set in either osc_queue_async_io() or osc_page_submit().
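
The idea of hoisting the check out of the per-page loop can be sketched
as below. This is a user-space illustration only: CountingCapability,
submit_per_page() and submit_per_io() are hypothetical names, and the
counter merely stands in for the cost of the real capability check.

```python
class CountingCapability:
    """Stand-in for an expensive capability check; counts its calls."""

    def __init__(self, allowed: bool):
        self.allowed = allowed
        self.calls = 0

    def check(self) -> bool:
        self.calls += 1
        return self.allowed


def submit_per_page(pages, cap):
    """Old pattern: evaluate the capability once for every page."""
    return [(page, cap.check()) for page in pages]


def submit_per_io(pages, cap):
    """New pattern: evaluate once per io and reuse the result."""
    allowed = cap.check()
    return [(page, allowed) for page in pages]
```

Both functions tag every page with the same flag, but the second one
performs a single check regardless of how many pages the io covers.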

Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I0e664f43ce31c276b33476fdff11794185ab0a3b
Reviewed-on: https://review.whamcloud.com/34478
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12019 build: Recognize Debian Kernel and set KMP dir 29/34329/3
Thomas Stibor [Tue, 7 May 2019 16:37:20 +0000 (12:37 -0400)]
LU-12019 build: Recognize Debian Kernel and set KMP dir

Recognize Debian kernel and make sure kernel module package (KMP)
directory matches with KMP_MODDIR of Ubuntu and the Debian building
package system.

Test-Parameters: clientdistro=ubuntu1804
Signed-off-by: Thomas Stibor <t.stibor@gsi.de>
Change-Id: Iaf3635af6a624c9395db3f891d31413cb9e57b92
Reviewed-on: https://review.whamcloud.com/34329
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-8066 ptlrpc: move sptlrpc procfs entry to debugfs 63/30963/12
Dmitry Eremin [Tue, 30 Apr 2019 17:27:27 +0000 (13:27 -0400)]
LU-8066 ptlrpc: move sptlrpc procfs entry to debugfs

We might eventually want to split it into a bunch of
single-value sysfs entries, I imagine, but there is no urgent need now.

Linux-commit : 77386b3c0b4470db1ed546de858b31cac66fc943

Migrate the GSS stuff to debugfs as well.

Test-Parameters: envdefinitions=SHARED_KEY=true testlist=sanity,recovery-small,sanity-sec

Change-Id: I417d3a46aa21cd7dca7cb8f7b6fd78623d726bed
Signed-off-by: Dmitry Eremin <dmiter4ever@gmail.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/30963
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11089 obdclass: remove locking from lu_context_exit() 13/32713/8
NeilBrown [Wed, 8 May 2019 14:17:24 +0000 (10:17 -0400)]
LU-11089 obdclass: remove locking from lu_context_exit()

Recent patches suggest that the locking in lu_context_exit() hurts
performance, as the changes they make are intended to improve
performance. Let's go all the way and remove the locking completely.

The race of interest is between lu_context_exit() finalizing a
value with ->lct_exit, and lu_context_key_quiesce() freeing
the value with key_fini().

If lu_context_key_quiesce() has started, there is no need to
finalize the value - it can just be freed.  So lu_context_exit()
is changed to skip the call to ->lct_exit if LCT_QUIESCENT is set.

If lu_context_exit() has started, lu_context_key_quiesce() must wait
for it to complete - it cannot just skip the freeing.  To allow
this we introduce a new lc_state, LCS_LEAVING.  This indicates that
->lct_exit might be called.  Before calling key_fini() on a context,
lu_context_key_quiesce() waits (spinning) for lc_state to move on from
LCS_LEAVING.

Linux-commit: ac3f8fd6e61b245fa9c14e3164203c1211c5ef6b

fix possible hang waiting for LCS_LEAVING

As lu_context_key_quiesce() spins waiting for LCS_LEAVING to
change, it is important that we set and then clear it within a
non-preemptible region.  If the thread that spins pre-empts the
thread that sets-and-clears the state while the state is LCS_LEAVING,
then it can spin indefinitely, particularly on a single-CPU machine.

Also update the comment to explain this dependency.

Linux-commit: 4859716f66db6989ef4bf52434b5b1d813c6adc1

Change-Id: I92ef27304eab43518fcb216b9c9cb4875cc9b98c
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32713
Tested-by: Jenkins
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12212 mdt: fix SECCTX reply buffer handling 34/34734/9
Mikhail Pershin [Mon, 22 Apr 2019 12:20:45 +0000 (15:20 +0300)]
LU-12212 mdt: fix SECCTX reply buffer handling

The LU-9193 changes for inline SECCTX in the reply may cause frequent
resends and reconnects under some loads, e.g. dbench runs.
That is caused by a missed buffer shrink when SECCTX is not
used.

The patch does the following:
- shrink the SECCTX buffer if it is not used
- in mdt_getattr_name_lock(), fill the SECCTX buffer a bit earlier
  for simpler handling of DoM size attributes; also move
  LDLM_LOCK_PUT() to the end of the block to avoid using 'lock'
  after LDLM_LOCK_PUT()

Fixes: fca35f74f9ec ("LU-9193 security: return security context for metadata ops")
Test-Parameters: clientselinux testlist=sanity envdefinitions=EXCEPT=103a
Test-Parameters: mdscount=2 mdtcount=4 clientselinux testlist=recovery-small,sanity-selinux
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I9beffd06f76c3bd8e826ba4ab0ce70ac3f57951c
Reviewed-on: https://review.whamcloud.com/34734
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12298 init: Add init info to lustre sysvinit script 73/34873/2
Nathaniel Clark [Wed, 15 May 2019 18:16:40 +0000 (14:16 -0400)]
LU-12298 init: Add init info to lustre sysvinit script

This adds info to sysvinit script that systemd can use
to build dependency graphs.

Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Change-Id: Ied3bc05d61ba9dc33904a84c5f91bb9adc60cb01
Reviewed-on: https://review.whamcloud.com/34873
Tested-by: Jenkins
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoRevert "LU-8384 scripts: Add scripts to systemd for EL7" 72/34872/2
Nathaniel Clark [Wed, 15 May 2019 18:09:00 +0000 (14:09 -0400)]
Revert "LU-8384 scripts: Add scripts to systemd for EL7"

This reverts commit 420d8c09887ff178508be0434373f74b5ef7ae6e.

This prevents lustre from starting correctly, as seen in LU-12298

Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Change-Id: Ib0a7e85079d1aea27b3a09496a2bf02c698c294c
Reviewed-on: https://review.whamcloud.com/34872
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-10754 tests: Clear mdc locks before tests 48/34848/3
Patrick Farrell [Fri, 10 May 2019 20:34:36 +0000 (16:34 -0400)]
LU-10754 tests: Clear mdc locks before tests

On ZFS testing, a sync stemming from a lock cancellation
from a previous test sometimes causes us to run longer than
the sleep times allowed for forked processes to be ready.

So, cancel the MDC lru locks first.  This will only incur a
sync if there is data to sync, but will wait for one if
necessary.

Test-Parameters: testlist=sanityn,sanityn,sanityn,sanityn

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I865de238aadd6da719066e6f22e2a36d1d3f368e
Reviewed-on: https://review.whamcloud.com/34848
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12242 kernel: kernel update RHEL7.6 [3.10.0-957.12.1.el7] 84/34784/4
Jian Yu [Tue, 30 Apr 2019 19:09:05 +0000 (12:09 -0700)]
LU-12242 kernel: kernel update RHEL7.6 [3.10.0-957.12.1.el7]

Update RHEL7.6 kernel to 3.10.0-957.12.1.el7.

Test-Parameters: clientdistro=el7.6 serverdistro=el7.6

Change-Id: I71d3bc18dbc16ed1ad7a3083dc19f52b56f60e40
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34784
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12248 lov: fix ost objects calculation in lod_statfs 77/34777/4
Li Dongyang [Tue, 30 Apr 2019 05:29:19 +0000 (15:29 +1000)]
LU-12248 lov: fix ost objects calculation in lod_statfs

When OSTs report fewer free objects than MDTs, the statfs
object results are presented with the numbers reported
by the OSTs. Fix the calculation of OST objects to make it
work with statfs aggregation via the MDT.

Make the lfs code consistent with ll_statfs_internal()
and lod_statfs().

Fixes: a829595add ("LU-11721 lod: limit statfs ffree if less ...")

Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I838a1527ed6411a412b63e2855ca7247755a3bcf
Reviewed-on: https://review.whamcloud.com/34777
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-12225 obdclass: fix race access vs removal of jobid_hash 63/34763/5
Wang Shilong [Mon, 29 Apr 2019 12:46:47 +0000 (20:46 +0800)]
LU-12225 obdclass: fix race access vs removal of jobid_hash

We add @pidmap into the hash and its reference count will be 1.
However, another thread might reclaim this newly added
@pidmap from the hash list, so our subsequent access to this
@pidmap becomes a use-after-free.

Fix this problem by initializing the reference count to 1 before
adding @pidmap to the hash list, which guarantees that the memory
cannot be freed during our access.

Other places that use memory reclaim were checked and follow a
similar idea.

Change-Id: Idd5f429b97e064e29b6883243f8a012c2b4b4ae7
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/34763
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11838 lnet: getname dropping addrlen argument 72/34672/4
Li Dongyang [Mon, 15 Apr 2019 02:18:40 +0000 (12:18 +1000)]
LU-11838 lnet: getname dropping addrlen argument

Since kernel 4.17 ->getname() does not take int *addrlen
argument anymore, instead it's returning the length to
the caller.

Linux-commit: 9b2c45d479d0fb8647c9e83359df69162b5fbe5f

Test-Parameters:trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I4ad5de4a22f3fb23c07a356650ea7925acf07eed
Reviewed-on: https://review.whamcloud.com/34672
Tested-by: Jenkins
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12139 kernel: kernel update [SLES12 SP3 4.4.176-94.88] 70/34670/4
Jian Yu [Tue, 30 Apr 2019 16:23:53 +0000 (09:23 -0700)]
LU-12139 kernel: kernel update [SLES12 SP3 4.4.176-94.88]

Update SLES12 SP3 kernel to 4.4.176-94.88.

Test-Parameters: trivial clientdistro=sles12sp3 serverdistro=sles12sp3

Change-Id: Iecf77e056fc571eb5118ac8c96d440e5f3ceebc0
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34670
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11157 obd: round values to nearest MiB for *_mb sysfs files 17/34317/16
James Simmons [Tue, 30 Apr 2019 13:17:56 +0000 (09:17 -0400)]
LU-11157 obd: round values to nearest MiB for *_mb sysfs files

Several sysfs files report their settings with the functions
lprocfs_[seq]_read_frac_helper(), which are intended to show
fractional values, e.g. 1.5 MiB. This approach has caused problems
with shells which don't handle fractional representation, and the
values reported don't faithfully represent the original value the
configurator passed into the sysfs file. To resolve this, let's
instead always round up the value the configurator passed into
the sysfs file to the nearest MiB value. This way the values
reported are guaranteed to always be exactly some MiB value.
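
The rounding described above can be sketched as follows. This is a
minimal illustration, assuming the configured value is a byte count;
round_up_to_mib() is a hypothetical helper, not a function from the
Lustre tree.

```python
MIB = 1 << 20  # bytes in one MiB


def round_up_to_mib(nbytes: int) -> int:
    """Round a non-negative byte count up to a whole number of MiB."""
    return (nbytes + MIB - 1) // MIB * MIB


def as_mib(nbytes: int) -> int:
    """Value a *_mb file would report after rounding: an exact integer."""
    return round_up_to_mib(nbytes) // MIB
```

A value of 1.5 MiB is therefore stored as 2 MiB and reported as the
integer 2, so shells never see a fractional number.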

Change-Id: Ia2e8cf8421784853aa33d4bb87c54aee00953835
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/34317
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-2233 tests: improve tests sanityn/40-47 92/4392/8
Alex Zhuravlev [Mon, 29 Apr 2019 08:21:13 +0000 (11:21 +0300)]
LU-2233 tests: improve tests sanityn/40-47

sanityn/40-46 usually take 800-900s, which is almost half
of the whole sanityn pass. 99.(9)% of the time the tests just
wait to ensure the specific order the operations execute in.

The patch changes cfs_fail_timeout_set() so that it can
interrupt waiting if fail_loc is set to 0; it polls at
1/10s intervals.

The tests themselves are modified to reset fail_loc. To be
able to do so, both operations (referenced as OP1 and OP2
in the tests) are run in the background. Once they are started
and the pdo_sched() helper has ensured that the MDS threads got
to the blocking points, we can interrupt OP1 and do the usual checks.
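
The interruptible wait can be modelled roughly as follows. This is a
sketch of the polling idea only, not the kernel implementation: the
FailState class, the function name and the injectable sleep are all
assumptions made for illustration.

```python
import time


class FailState:
    """Stand-in for the global fail_loc value (hypothetical)."""

    def __init__(self, fail_loc: int = 0):
        self.fail_loc = fail_loc


def fail_timeout_wait(state, timeout_s, poll_s=0.1, sleep=time.sleep):
    """Wait up to timeout_s, polling every poll_s seconds.

    Returns True if the wait was cut short because fail_loc was reset
    to 0, and False if the full timeout elapsed.
    """
    waited = 0.0
    while waited < timeout_s:
        if state.fail_loc == 0:
            return True  # a test reset fail_loc: stop waiting early
        sleep(poll_s)
        waited += poll_s
    return False
```

A test that has finished its checks only needs to set fail_loc back to
0, and the waiter notices within one polling interval instead of
sleeping out the whole timeout.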

ONLY=40-47 sh sanityn.sh takes 1017s before and 78s after.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ie01bd6a077333f6f57e533a73f38588a073a2381
Reviewed-on: https://review.whamcloud.com/4392
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
4 years agoLU-12276 lnet: check const parameters for ib_post_send and ib_post_recv 37/34837/3
Jian Yu [Thu, 9 May 2019 08:04:01 +0000 (01:04 -0700)]
LU-12276 lnet: check const parameters for ib_post_send and ib_post_recv

In MOFED 4.6, the second and third parameters for ib_post_send() and
ib_post_recv() are declared with 'const'. This patch adds a check to
the configure file to resolve the build failure.

Change-Id: If7193a6a4fcb7b238f5d4ee64e878a5816433e7b
Test-Parameters: trivial
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34837
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12266 mdd: fix up non-dir creation in SGID dirs 09/34809/5
Sebastien Buisson [Mon, 6 May 2019 11:07:58 +0000 (20:07 +0900)]
LU-12266 mdd: fix up non-dir creation in SGID dirs

sgid directories have special semantics, making newly created files in
the directory belong to the group of the directory, and newly created
subdirectories will also become sgid. This is historically used for
group-shared directories.

But group directories writable by non-group members should not imply
that such non-group members can magically join the group, so make sure
to clear the sgid bit on non-directories for non-members (but remember
that sgid without group execute means "mandatory locking", just to
confuse things even more).

Adapt fix from inode_init_owner() to use in mdd_create_sanity_check().
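
The ownership rule described above can be modelled with a short sketch.
It mirrors the logic of the referenced inode_init_owner() fix, but the
function and parameter names here are hypothetical, and the capability
check is reduced to a boolean.

```python
import stat


def init_owner(dir_mode, dir_gid, new_mode, caller_gid, caller_groups,
               has_fsetid=False):
    """Return (mode, gid) for an object created in the given directory."""
    if not dir_mode & stat.S_ISGID:
        return new_mode, caller_gid  # ordinary directory: nothing special
    gid = dir_gid  # sgid directory: new object inherits the dir's group
    if stat.S_ISDIR(new_mode):
        new_mode |= stat.S_ISGID  # subdirectories become sgid too
    elif ((new_mode & (stat.S_ISGID | stat.S_IXGRP))
          == (stat.S_ISGID | stat.S_IXGRP)
          and gid not in caller_groups and not has_fsetid):
        # sgid + group-execute requested by a non-member: drop sgid
        new_mode &= ~stat.S_ISGID
    return new_mode, gid
```

Note that sgid without group execute is left alone, matching the
"mandatory locking" remark in the message.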

Linux-commit: 0fa3ecd87848c9c93c2c828ef4c3a8ca36ce46c7

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iae253c5cc7865fc81574760ce0ed4d93698b7314
Reviewed-on: https://review.whamcloud.com/34809
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-12227 scripts: check for mounted ZFS devices too 66/34766/2
Aurelien Degremont [Fri, 26 Apr 2019 09:58:37 +0000 (09:58 +0000)]
LU-12227 scripts: check for mounted ZFS devices too

lustre init script skips several checks if the device type is ZFS. If
some ZFS devices are already mounted, the script will return a
non-zero exit code.

The label and mount point check is valid for ZFS devices, so let's do
it and avoid this error case. With this patch, when starting ZFS
devices the script will only start those that are not already started
and, if it succeeds, return 0.

Test-Parameters: trivial
Signed-off-by: Aurelien Degremont <degremoa@amazon.com>
Change-Id: I152ca4d62d444193cc66896173873587f0761493
Reviewed-on: https://review.whamcloud.com/34766
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-10602 utils: fix file heat support 57/34757/3
Andreas Dilger [Thu, 25 Apr 2019 13:23:56 +0000 (15:23 +0200)]
LU-10602 utils: fix file heat support

Change the LL_IOC_HEAT_SET ioctl number assignment to reduce the
number of different values used, since we are running out.  Use
a __u64 as the IOC struct argument instead of a "long" since that
is what is actually passed, and it avoids being CPU-dependent.

Move the LU_HEAT_FLAG_* values into an enum to avoid a generic
"flags" argument in the code.  This makes it clear what is passed.

Clean up code style for lfs_heat_get() and lfs_heat_set().

Fixes: ae723cf8161f ("LU-10602 llite: add file heat support")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If06212d2d62d085a2104cf54ae9a10e512eb2efd
Reviewed-on: https://review.whamcloud.com/34757
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-8066 obd: embed typ_kobj in obd_type 12/34612/7
NeilBrown [Tue, 30 Apr 2019 15:09:25 +0000 (11:09 -0400)]
LU-8066 obd: embed typ_kobj in obd_type

As there is a 1-1 mapping between obd_types and their ->typ_kobj, it
is simple and more normal to embed the kobj in the obd_type, rather
than allocate it separately.

This requires calling "kobject_init_and_add()" earlier, so we
open-code the relevant part of class_setup_tunables() in
class_register_type(). Now class_setup_tunables() is needed only
for server-side code.

With typ_kobj embedded in obd_type, we change class_setup_tunables()
to return an obd_type object instead of a kobject. This way we
can use kobject_put() to clean up the obd_type created with
class_setup_tunables(). The reason for class_setup_tunables() is
the creation of a lightweight obd_type which is never added
to the typ_chain list, to avoid potential duplicates which can
happen on single-node setups with lod / lov and osp / osc.

Change-Id: Iac160e6817a7c520e4462a3fc133ddfee6a7ccdc
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/34612
Reviewed-by: Ben Evans <bevans@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11690 lod: fix LBUG with wide striping 08/33708/14
Patrick Farrell [Thu, 2 May 2019 13:06:58 +0000 (09:06 -0400)]
LU-11690 lod: fix LBUG with wide striping

When striping extremely widely (~1600+ stripes), we reach
more than half of the theoretical limit of layout size,
and LBUG.

It is also possible to trigger this assert with
multi-component PFL files, where all the components are
below the stripe count limit, but together they exceed it.

PFL makes asserting based on LOV_MAX_STRIPE_COUNT
unworkable, so just remove the assert.  Further work is
planned to match up maximum allowed layout size with
the real maximum EA size.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Id0240785792e7d4084ea6e53b44469a40e59043d
Reviewed-on: https://review.whamcloud.com/33708
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
4 years agoLU-6142 ptlrpc: Fix style issues for service.c 05/34605/6
Arshad Hussain [Sat, 23 Mar 2019 05:57:51 +0000 (11:27 +0530)]
LU-6142 ptlrpc: Fix style issues for service.c

This patch fixes issues reported by checkpatch
for file lustre/ptlrpc/service.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: Ibaffcdfaeac48176ba05b5e4f4471f9db96d9cbe
Reviewed-on: https://review.whamcloud.com/34605
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-6142 ptlrpc: Fix style issues for sec_null.c 49/34549/3
Arshad Hussain [Fri, 22 Mar 2019 11:07:34 +0000 (16:37 +0530)]
LU-6142 ptlrpc: Fix style issues for sec_null.c

This patch fixes issues reported by checkpatch
for file lustre/ptlrpc/sec_null.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: I67631d35ae4461ca92516975ab71f69d01378e19
Reviewed-on: https://review.whamcloud.com/34549
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
4 years agoLU-6142 ldlm: Fix style issues for interval_tree.c 98/34498/2
Arshad Hussain [Thu, 21 Mar 2019 10:02:20 +0000 (15:32 +0530)]
LU-6142 ldlm: Fix style issues for interval_tree.c

This patch fixes issues reported by checkpatch
for file lustre/ldlm/interval_tree.c

Test-Parameters: trivial
Change-Id: Ida99aa8f7a5928e87611c73aa7b5d0dc4a5246e9
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/34498
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11803 obd: replace class_uuid with linux kernel version. 16/33916/30
James Simmons [Tue, 30 Apr 2019 13:42:10 +0000 (09:42 -0400)]
LU-11803 obd: replace class_uuid with linux kernel version.

We can replace the Lustre custom class_uuid_t with the Linux
kernel's UUID handling.

Change-Id: I9a59b0b6027ccb95994a87f3a5dcdf80a8a56480
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/33916
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Tested-by: Jenkins
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11376 lmv: new foreign LMV format 87/34087/40
Bruno Faccini [Tue, 22 Jan 2019 15:10:26 +0000 (16:10 +0100)]
LU-11376 lmv: new foreign LMV format

This patch introduces a new striping/LMV format that allows
specifying an arbitrary external reference for a dir
in the Lustre namespace.
The new LMV format is made of {newmagic, length, type, flags,
string[length]} to be as flexible as possible.
A foreign dir can be created by using the ioctl(LL_IOC_LMV_SETDIRSTRIPE)
operation, and it can only be and remain an empty dir until removed.
A new API method llapi_dir_create_foreign() has been introduced,
and "lfs {get,set}dirstripe" and "lfs find" have been modified to
understand the new format.
The idea behind this is to provide Lustre namespace support and
striping prefetch/caching under lock protection, for user/external
usage.
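
The {newmagic, length, type, flags, string[length]} layout can be
sketched as a simple pack/unpack pair. The field widths (four 32-bit
little-endian integers) and the magic value below are assumptions for
illustration; the message does not state them.

```python
import struct

FOREIGN_MAGIC = 0x0F0E1D2C  # placeholder value, not the real magic
HEADER = struct.Struct("<IIII")  # magic, length, type, flags (assumed)


def pack_foreign(value: bytes, ftype: int = 0, flags: int = 0) -> bytes:
    """Build a foreign-format blob: fixed header, then the opaque string."""
    return HEADER.pack(FOREIGN_MAGIC, len(value), ftype, flags) + value


def unpack_foreign(blob: bytes):
    """Validate a blob and return (type, flags, value)."""
    if len(blob) < HEADER.size:
        raise ValueError("blob shorter than header")
    magic, length, ftype, flags = HEADER.unpack_from(blob)
    if magic != FOREIGN_MAGIC:
        raise ValueError("not a foreign-format blob")
    value = blob[HEADER.size:HEADER.size + length]
    if len(value) != length:
        raise ValueError("truncated value")
    return ftype, flags, value
```

Because the trailing string is opaque and length-prefixed, the external
reference can be a path, a URL or any other identifier without changing
the header.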

This patch is the LMV/dirs complement of the previous LOV/files change
(Change-Id: I5d9c0642fe8e7009c30918bfa946cac7c00c9af8) and has
been rebased on top of the latter, along with some obvious
mutualizations and simplifications.

Code has been added for lfsck to handle foreign dirs, and
a new sub-test has been added in sanity-lfsck in order to verify
that it does not break foreign dirs and that the reverse is also true.

Also fixes a bug causing SEGVs during
"lfs find [--mdt-count=[+,-]<count>, --mdt-hash=<hashtype>]" when
handling a file (i.e., "DIR *dir" is NULL) in cb_find_init().

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I3721b8f14578bf926a92da76375dae92dc8d764d
Reviewed-on: https://review.whamcloud.com/34087
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11376 lov: new foreign LOV format 55/33755/37
Bruno Faccini [Wed, 27 Feb 2019 21:59:52 +0000 (16:59 -0500)]
LU-11376 lov: new foreign LOV format

This patch introduces a new layout/LOV format that allows
specifying an arbitrary external reference for a file
in the Lustre namespace.
The new LOV format is made of {newmagic, length, type, flags,
string[length]} to be as flexible as possible.
A foreign file can be created by using the open(O_LOV_DELAY_CREATE) +
ioctl(LL_IOC_LOV_SETSTRIPE) operations, and it can only be and remain
an empty file until removed.
A new API method llapi_file_create_foreign() has been introduced,
and "lfs {get,set}stripe" and "lfs find" have been modified to
understand the new layout.
The idea behind this is to provide Lustre namespace support and
layout prefetch/caching under layout protection, for user/external
usage.

Code has been added for lfsck to handle foreign files, and
a new sub-test has been added in sanity-lfsck in order to verify
that it does not break foreign files and that the reverse is also true.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I5d9c0642fe8e7009c30918bfa946cac7c00c9af8
Reviewed-on: https://review.whamcloud.com/33755
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
4 years agoLU-11403 tests: Fix $tfile usage 98/34698/4
Patrick Farrell [Wed, 17 Apr 2019 16:19:09 +0000 (12:19 -0400)]
LU-11403 tests: Fix $tfile usage

We cannot just use raw $tfile - we must use something under
$DIR.  This is resulting in failures because $tfile exists.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Iea6356cabb1623606bf926ce80c55a3210c0b535
Reviewed-on: https://review.whamcloud.com/34698
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11233 utils: fix double-free of params fields 11/34711/2
Andreas Dilger [Thu, 18 Apr 2019 23:29:44 +0000 (17:29 -0600)]
LU-11233 utils: fix double-free of params fields

Call find_param_fini() on error so that the params are not leaked
during initialization if there is an intermediate error.

Zero out the parameters as they are freed, so if find_param_fini()
is called multiple times (as it is in some error paths) it does
not corrupt the heap by double freeing pointers.  This can be hit
by calling "lfs getstripe -m" on multiple pathnames, some of which
do not exist.
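
The "zero out as they are freed" idea can be modelled with a toy
allocator. This is a sketch only: the real code is C, and the Heap
class, the field names and this find_param_fini() stand-in are
hypothetical illustrations of why clearing a pointer after freeing it
makes cleanup safe to call more than once.

```python
class Heap:
    """Toy allocator that detects double frees (illustration only)."""

    def __init__(self):
        self._live = set()
        self._next = 1

    def alloc(self) -> int:
        ptr = self._next
        self._next += 1
        self._live.add(ptr)
        return ptr

    def free(self, ptr) -> None:
        if ptr is None:
            return  # like free(NULL): a harmless no-op
        if ptr not in self._live:
            raise RuntimeError("double free")
        self._live.remove(ptr)

    def live_count(self) -> int:
        return len(self._live)


class FindParam:
    """Stand-in for a params struct owning two buffers (names made up)."""

    def __init__(self, heap: Heap):
        self.heap = heap
        self.buf_a = heap.alloc()
        self.buf_b = heap.alloc()


def find_param_fini(param: FindParam) -> None:
    """Free every owned buffer and clear the field so repeats are safe."""
    for field in ("buf_a", "buf_b"):
        param.heap.free(getattr(param, field))
        setattr(param, field, None)
```

Without the setattr() line, a second call would hand the stale pointer
back to free() and trip the double-free check.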

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie0d7e9ee134deb0633af2f8052b8a458333ebbe5
Reviewed-on: https://review.whamcloud.com/34711
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-6951 tests: sanity test_27m failure 06/23506/5
Andrew Perepechko [Mon, 19 Feb 2018 10:17:42 +0000 (05:17 -0500)]
LU-6951 tests: sanity test_27m failure

sanity 27m fails with "OST0 was full but new created file
still use it" if the test runs with more than 1 client.
The issue can be easily reproduced with qos_threshold_rr=100.

The reason is grants. Every client initially gets a 2 MB grant.
When dd from the first client receives ENOSPC, it does not mean
the OST is filled up, since the client is not allowed to use
other clients' grants. When creating a new file, the MDS still
sees free space on OST0 equal to the amount of unused grants
and allocates new objects on OST0.

This situation does not seem to reflect any defect in Lustre.
Rather, the original author's intent seems to be that
the test should always run with a single client. So, this patch
simply disables the test if the test is running with more than
one client.

Change-Id: I47cd1a6806e8fa5203aeb5bcf57a6b31b424f24d
Seagate-bug-id: MRP-1690
Signed-off-by: Alexander Boyko <c17825@cray.com>
Signed-off-by: Andrew Perepechko <andrew.perepechko@seagate.com>
Signed-off-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-on: https://review.whamcloud.com/23506
Tested-by: Jenkins
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11090 quota: Oops in qsd_config 15/32715/4
Andriy Skulysh [Tue, 17 Apr 2018 11:57:07 +0000 (14:57 +0300)]
LU-11090 quota: Oops in qsd_config

This is a race between quota config and umount.
Remove qsd from the list of fsinfo before
freeing per-quota type data.

Change-Id: Ib7c3a94b3222ffd229da1a384113b3befc19665b
Cray-bug-id: LUS-5896
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-on: https://review.whamcloud.com/32715
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11251 mdt: ASSERTION (req_transno < next_transno) failed 01/33001/11
Vitaly Fertman [Tue, 7 Aug 2018 14:59:13 +0000 (17:59 +0300)]
LU-11251 mdt: ASSERTION (req_transno < next_transno) failed

An update request is checked for duplicates by xid in
is_req_replayed_by_update(). However xid is unique per
client only. It may happen that there are 2 requests
with the same xid from different clients.

Perform the lookup by transno instead, since it is unique per MDT.
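
The difference between the two lookup keys can be shown with a small
model. The record type and function names are hypothetical; the sketch
only illustrates why a per-client xid can collide across clients while
a per-MDT transno cannot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    client: str   # which client sent the request
    xid: int      # unique per client only
    transno: int  # unique per MDT


def replayed_by_xid(applied, req) -> bool:
    """Duplicate check keyed on xid alone (the problematic variant)."""
    return any(u.xid == req.xid for u in applied)


def replayed_by_transno(applied, req) -> bool:
    """Duplicate check keyed on transno."""
    return any(u.transno == req.transno for u in applied)
```

With one applied update from client A (xid 7, transno 100), a new
request from client B that happens to reuse xid 7 with transno 101 is
wrongly flagged by the xid check but not by the transno check.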

Change-Id: If00b69f01451c659292c004aa296a6ea36680d3c
Cray-bug-id: LUS-6015
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Reviewed-on: https://review.whamcloud.com/33001
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-9010 ptlrpc: Change static defines to use macro for gss_krb5_mech.c 36/33936/6
Arshad Hussain [Thu, 27 Dec 2018 17:36:56 +0000 (12:36 -0500)]
LU-9010 ptlrpc: Change static defines to use macro for gss_krb5_mech.c

This patch replaces the statically defined spinlocks in file
lustre/ptlrpc/gss/gss_krb5_mech.c with the kernel-provided macro.

Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: I5da319ce013c29043fc4bde4a4946cfbdf6c2491
Reviewed-on: https://review.whamcloud.com/33936
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12043 llite, readahead: fix to call ll_ras_enter() properly 55/34755/2
Wang Shilong [Wed, 24 Apr 2019 15:13:29 +0000 (23:13 +0800)]
LU-12043 llite, readahead: fix to call ll_ras_enter() properly

ll_ras_enter() is expected to be called once per syscall.
However, with fast read enabled, it is no longer true that
we call vvp_io_read_start() for every syscall.

To fix this problem, move the ll_ras_enter() call to the file read handler.

Change-Id: I8d70714b2e8bc04b7c4ab996d189f10f37488d97
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/34755
Tested-by: Jenkins
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12159 utils: improve lfs getname functionality 95/34595/2
Andreas Dilger [Thu, 4 Apr 2019 07:21:46 +0000 (01:21 -0600)]
LU-12159 utils: improve lfs getname functionality

Add "-n" and "-i" options to lfs getname to allow printing only
the fsname or instance ID of the filesystem(s).

Split out the documentation to a separate lfs-getname.1 man page.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie132513325b6630fc5103a89b469271ba7392cb2
Reviewed-on: https://review.whamcloud.com/34595
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-9846 obd: Add overstriping CONNECT flag 43/34743/2
Patrick Farrell [Tue, 23 Apr 2019 16:43:09 +0000 (12:43 -0400)]
LU-9846 obd: Add overstriping CONNECT flag

This patch reserves the OBD_CONNECT flag for overstriping,
and also does some cleanup of OBD_CONNECT flags, putting
them in the correct order and adding some missing ones in
proc and the wire{test,check} checks.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I5d7c8f30d16cc2541d3202582fe55177022ccede
Reviewed-on: https://review.whamcloud.com/34743
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
4 years agoLU-12218 ptlrpc: Bulk assertion fails on -ENOMEM 53/34753/2
Andriy Skulysh [Wed, 10 Apr 2019 20:42:06 +0000 (23:42 +0300)]
LU-12218 ptlrpc: Bulk assertion fails on -ENOMEM

Recalculate rq_mbits on ENOMEM resend if OBD_CONNECT_BULK_MBITS
isn't used.

Change-Id: I3bd5f7536372558a264bf5fe3247b8b1946f84fd
Cray-bug-id: LUS-7159
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-on: https://review.whamcloud.com/34753
Tested-by: Jenkins
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11233 build: support for gcc8 60/34660/7
Alex Zhuravlev [Mon, 15 Apr 2019 12:58:59 +0000 (15:58 +0300)]
LU-11233 build: support for gcc8

This patch covers the kernel portion of Lustre.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I3fac8b89eef2291b5cb91ea05ee0b6ff32d11741
Reviewed-on: https://review.whamcloud.com/34660
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
4 years agoLU-12131 tests: only create lgssc.conf file if necessary 20/34520/12
Sebastien Buisson [Tue, 9 Apr 2019 12:58:20 +0000 (14:58 +0200)]
LU-12131 tests: only create lgssc.conf file if necessary

The lgssc.conf file is now packaged by Lustre, and installed under
/etc/request-key.d/.
So, unless run from the build tree, init_gss() must no longer create
its own copy. Adjust the corresponding commands in init_gss() and
cleanup_sk() accordingly.

Fixes: e299df1e9eea ("LU-7854 gss: install lgssc.conf under /etc/request-key.d")
Whamcloud-bug-id: ATM-1283
Test-Parameters: envdefinitions=SHARED_KEY=true testlist=sanity-sec
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I9cc76fddb8a622d7c40d6348913df42ae063254a
Reviewed-on: https://review.whamcloud.com/34520
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 years agoLU-12263 build: push deprecation of LMV_HASH_FLAG_DEAD to 2.12.55 93/34793/2
Oleg Drokin [Thu, 2 May 2019 06:26:43 +0000 (02:26 -0400)]
LU-12263 build: push deprecation of LMV_HASH_FLAG_DEAD to 2.12.55

This is to quickly restore buildability for now.
The actual proper removal is TBD.

Change-Id: Ib28d658d614716307c984dcf77b2451138bc0e1b
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34793

4 years agoNew tag 2.12.53 2.12.53 v2_12_53
Oleg Drokin [Thu, 2 May 2019 06:08:45 +0000 (02:08 -0400)]
New tag 2.12.53

Change-Id: I913a2b175ba5ea02c8489a9baa64e2932a8bdbe8
Signed-off-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12261 tests: Race between exec and truncate 91/34791/3
Patrick Farrell [Wed, 1 May 2019 19:05:37 +0000 (15:05 -0400)]
LU-12261 tests: Race between exec and truncate

Execing '$tdir/sleep' with & doesn't guarantee the file is
actually open before returning, so it is sometimes losing
the race with truncate, resulting in errors like this:
/usr/lib64/lustre/tests/sanity.sh: line 4172:
/mnt/lustre/d43b.sanity/sleep: Text file busy

Where $tdir/sleep gets ETXTBSY, instead of truncate as
expected.

A 1 second delay should be enough to guarantee exec wins
the race vs truncate.

Test-Parameters: trivial
Test-Parameters: testgroup=review-ldiskfs-arm
Test-Parameters: testgroup=review-ldiskfs
Test-Parameters: testgroup=review-ldiskfs-arm

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ice6f4567805e64c3be755b6c684b6a086a348dd8
Reviewed-on: https://review.whamcloud.com/34791
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 years agoRevert "LU-11367 som: integrate LSOM with lfs find" 80/34780/2
Oleg Drokin [Tue, 30 Apr 2019 18:00:09 +0000 (18:00 +0000)]
Revert "LU-11367 som: integrate LSOM with lfs find"

This is causing LU-12253

This reverts commit 5b6569affc9a0e33fa5d7d2061834397da13e0cb.

Change-Id: I5a70d4cec5bb81f8067e847cec99c77bc8f94093
Reviewed-on: https://review.whamcloud.com/34780
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12138 kernel: kernel update SLES12 SP4 [4.12.14-95.13.1] 19/34619/2
Jian Yu [Mon, 8 Apr 2019 19:56:03 +0000 (12:56 -0700)]
LU-12138 kernel: kernel update SLES12 SP4 [4.12.14-95.13.1]

Update SLES12 SP4 kernel to 4.12.14-95.13.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles12sp4 \
envdefinitions=LNET_SELFTEST_EXCEPT=smoke,SANITY_EXCEPT=103a

Change-Id: I0831e611caa1ad51775e5f73d7989212f1347c2e
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34619
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-11367 som: integrate LSOM with lfs find 45/33545/10
Qian Yingjin [Thu, 1 Nov 2018 08:49:53 +0000 (16:49 +0800)]
LU-11367 som: integrate LSOM with lfs find

The patch integrates LSOM functionality with lfs find so that it
is possible to use LSOM functionality directly on the client. The
MDS fills in the mbo_size and mbo_blocks fields from the LSOM
xattr if the actual size/blocks are not available, and then sets the
new OBD_MD_FLLSIZE and OBD_MD_FLLBLOCKS flags in the reply so that
the client knows these fields are valid.

The lfs find command adds a "--lazy" option to allow the use of LSOM
data from the MDS.

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I21dfae7c2633dead5d83b438ec340fea4d56c52b
Reviewed-on: https://review.whamcloud.com/33545
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 years agoLU-12147 utils: statone doesn't place \0 72/34572/4
Alex Zhuravlev [Tue, 2 Apr 2019 12:14:35 +0000 (15:14 +0300)]
LU-12147 utils: statone doesn't place \0

strncpy() is not guaranteed to NUL-terminate the destination,
so the caller has to take care of that.

Change-Id: I858a7f0eb6c7cdcb70e8a8e445c96f1187c73c2f
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34572
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>