fs/lustre-release.git
19 months ago LU-11141 tests: put sanity-quota 61 on slow list 03/32903/3
James Nunez [Wed, 25 Jul 2018 18:41:44 +0000 (12:41 -0600)]
LU-11141 tests: put sanity-quota 61 on slow list

Since the patch for LU-11141 (commit 6316b42a73f8) landed,
sanity-quota test 61 takes between 20 and 50 minutes to run.
Add test 61 to the slow list so that it runs only when the
SLOW variable is enabled (for example, SLOW=yes).

Test-Parameters: trivial testlist=sanity-quota
Test-Parameters: envdefinitions="SLOW=yes" testlist=sanity-quota

Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I2b6c21996ef2db9472da8838d3f41fed60ba5102
Reviewed-on: https://review.whamcloud.com/32903
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
19 months ago LU-11071 build: Add server build support for Ubuntu 18.04 13/32613/10
Li Dongyang [Thu, 19 Jul 2018 16:24:36 +0000 (12:24 -0400)]
LU-11071 build: Add server build support for Ubuntu 18.04

This enables server builds for Ubuntu 18.04 LTS. The ldiskfs
patches are based on Gael's 4.12 support and apply to kernel
versions 4.15.0-20.21 through 4.15.0-23.25.

There is also a small fix to make dpkg happy when installing
Lustre packages that require lustre-client-utils.

Test-Parameters: clientdistro=ubuntu1604 trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Signed-off-by: Gael Delbary <gael.delbary@cea.fr>
Change-Id: I65e1a5ee0d17115f23ba071ff1ab23b4fb22e78f
Reviewed-on: https://review.whamcloud.com/32613
Tested-by: Jenkins
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-9538 mdt: Lazy size on MDT 60/29960/43
Qian Yingjin [Tue, 7 Nov 2017 08:27:07 +0000 (16:27 +0800)]
LU-9538 mdt: Lazy size on MDT

The design of Lazy Size on MDT (LSOM) does not guarantee accuracy.
A file that is kept open for a long time may have an inaccurate
LSOM for that whole period. Eviction or a crash of a client may
also leave the close of a file incomplete, and thus leave the LSOM
inaccurate. A precise LSOM can be read from the MDT only when
1) all corruption and inconsistency caused by client eviction or
client/server crashes have been fixed by LFSCK and 2) the file is
not open for write.

In the first step of implementing LSOM, LSOM is not accessible
from the client; LSOM values can only be accessed on the MDT. Thus,
no interface or logic is added on the client side to enable
access to LSOM from the client.

The LSOM is saved as an EA value on the MDT. It includes both the
apparent size and the disk usage of the file.

Whenever a file is truncated, the LSOM of the file on the MDT is
updated.

Whenever a client closes a file, ll_prepare_close() sends the size
and blocks to the MDS. The MDS updates the LSOM of the file if the
file size or blocks value has increased.
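
The update rules above can be modeled in a few lines. This is a
minimal Python sketch, not the MDT code: the Lsom class and both
function names are hypothetical, and it assumes that size and blocks
are each raised independently on close and that truncate overwrites
both values.

```python
from dataclasses import dataclass


@dataclass
class Lsom:
    """Hypothetical in-memory stand-in for the LSOM EA of one file."""
    size: int = 0     # apparent file size in bytes
    blocks: int = 0   # disk usage in blocks


def lsom_update_on_close(lsom: Lsom, size: int, blocks: int) -> bool:
    """Apply the values a client reports at close; only ever grow them.

    Returns True if the stored LSOM changed.
    """
    changed = False
    if size > lsom.size:
        lsom.size = size
        changed = True
    if blocks > lsom.blocks:
        lsom.blocks = blocks
        changed = True
    return changed


def lsom_update_on_truncate(lsom: Lsom, size: int, blocks: int) -> None:
    """Truncate always rewrites the stored values (assumption)."""
    lsom.size = size
    lsom.blocks = blocks
```

A close that reports smaller values leaves the stored LSOM untouched,
which is one reason the stored value can lag behind the real file.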

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: If4032a55f448a65235a6b3db58f857c74222faa3
Reviewed-on: https://review.whamcloud.com/29960
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-10928 tests: sanity/133b should wait a bit 69/32069/3
Alex Zhuravlev [Thu, 19 Apr 2018 10:40:57 +0000 (13:40 +0300)]
LU-10928 tests: sanity/133b should wait a bit

Wait briefly so that the cache in obd_statfs() is invalidated.

Test-Parameters: trivial

Change-Id: I08283542962e4b88ca4b5dcde4dfcc58316c1bba
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: https://review.whamcloud.com/32069
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Saurabh Tandan <saurabh.tandan@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
19 months ago LU-11149 build: enable KMP for Mellanox build 33/32833/9
Minh Diep [Thu, 19 Jul 2018 14:27:55 +0000 (07:27 -0700)]
LU-11149 build: enable KMP for Mellanox build

* Build the Mellanox KMP to avoid symbol dependency
errors when installing Lustre.
* Remove all Mellanox config parameters and use the
defaults.

Test-Parameters: trivial

Change-Id: I4676d01bd5f788581e1be6df98d2d787a5419c07
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/32833
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-11083 tests: automatically load external modules 90/32790/4
John L. Hammond [Thu, 5 Jul 2018 16:02:25 +0000 (11:02 -0500)]
LU-11083 tests: automatically load external modules

In the test-framework function load_module(), try to load (using
modprobe) any not yet loaded modules (which are assumed to be
external) that the current module depends on.
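
The dependency handling described above amounts to the loop below.
This is an illustrative Python model only; the real load_module() is
a shell function in the test framework, and depends_on, is_loaded and
modprobe here are hypothetical callables supplied by the caller.

```python
def load_with_external_deps(module, depends_on, is_loaded, modprobe):
    """modprobe every dependency of `module` that is not loaded yet.

    depends_on(module) -> list of module names the module depends on
    is_loaded(name)    -> True if the module is already loaded
    modprobe(name)     -> loads the named module
    Returns the list of dependencies that had to be loaded.
    """
    newly_loaded = []
    for dep in depends_on(module):
        if not is_loaded(dep):
            # Not loaded yet, so it is assumed to be an external module.
            modprobe(dep)
            newly_loaded.append(dep)
    return newly_loaded
```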

Test-Parameters: trivial

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Id1d10519b00854600d095b861670e96f906298fc
Reviewed-on: https://review.whamcloud.com/32790
Tested-by: Jenkins
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-11010 tests: remove calls to return after skip() 31/32731/3
James Nunez [Tue, 26 Jun 2018 16:46:14 +0000 (10:46 -0600)]
LU-11010 tests: remove calls to return after skip()

The skip() routine now contains a call to exit. All calls
to skip() and skip_env() should be reviewed and calls to
return that followed skip() should be removed.

This is the second patch in a series that removes calls
to return after skip() in the Lustre test suites.

Calls to return after skip() are removed for:
dne_sanity
insanity
obdfilter-survey
sgpdd-survey

Test-Parameters: trivial testlist=dne-sanity,insanity,obdfilter-survey,sgpdd-survey
Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I4b9aeaeddd673dcba371b8340dd635ddeed2b6be
Reviewed-on: https://review.whamcloud.com/32731
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-11068 build: remove invalid kernel srpm location 06/32606/2
Minh Diep [Fri, 1 Jun 2018 17:56:56 +0000 (10:56 -0700)]
LU-11068 build: remove invalid kernel srpm location

The location has never existed.

Change-Id: I8958bbdb5c61284c55d6cc337ac92832f91ee08b
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/32606
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-10254 tests: fix racer version checks 07/30307/9
Andreas Dilger [Fri, 13 Jul 2018 20:44:39 +0000 (14:44 -0600)]
LU-10254 tests: fix racer version checks

Fix the checks for enabling DOM, PFL, and FLR tests in file_create.sh.
The $LCTL variable was unset in the test script, so the version check
was failing.

Instead of doing the version check inside file_create.sh do it in the
Lustre-specific racer.sh test script, where other version checks live.
This enables PFL and FLR testing by default, but leaves DOM tests off.

Author: Andreas Dilger <adilger@whamcloud.com>

Test-Parameters: trivial testlist=racer envdefinitions=SLOW=yes
Test-Parameters: testlist=racer mdtfilesystemtype=zfs ostfilesystemtype=zfs
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I2aeab0911f19f9741212925cf9b4aeb70e3ebbe5
Reviewed-on: https://review.whamcloud.com/30307
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-11115 lod: skip max_create_count=0 OST in QoS and RR algorithms 23/32823/2
Jian Yu [Tue, 17 Jul 2018 00:09:15 +0000 (17:09 -0700)]
LU-11115 lod: skip max_create_count=0 OST in QoS and RR algorithms

When choosing an OST on which to create an object, both
lod_alloc_qos() and lod_alloc_rr() use lod_statfs_and_check()
to check whether the OST is available for new OST objects.
However, that function does not check for an OST with
max_create_count=0 and simply reports it as available.

This patch fixes the issue by detecting an OST with
max_create_count=0 in lod_statfs_and_check() and skipping it.
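
The effect of the added check can be sketched as a filter that both
allocators share. This is a hedged Python model, not the LOD code:
the Ost class, its fields and the default max_create_count value are
assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Ost:
    """Hypothetical view of one OST as seen by the allocator."""
    index: int
    active: bool = True
    max_create_count: int = 100   # arbitrary non-zero value for the sketch


def ost_usable_for_create(ost: Ost) -> bool:
    """Model of the availability check used by both QoS and RR."""
    if not ost.active:
        return False
    # The fix: an OST with max_create_count=0 must not receive new objects.
    if ost.max_create_count == 0:
        return False
    return True


def candidate_osts(osts):
    """Indices of the OSTs that may be chosen for new objects."""
    return [o.index for o in osts if ost_usable_for_create(o)]
```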

Change-Id: I04476a4b369e99133bd89c00155fd9f51bf0c930
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/32823
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-11174 quota: use sync io to test quota 74/32874/3
Hongchao Zhang [Fri, 20 Jul 2018 19:45:56 +0000 (15:45 -0400)]
LU-11174 quota: use sync io to test quota

In test_61 of sanity-quota, the client cache (grant) can affect
quota behavior, so use sync I/O to avoid that effect.

Test-Parameters: trivial testlist=sanity-quota,sanity-quota,sanity-quota,sanity-quota,sanity-quota,sanity-quota,sanity-quota,sanity-quota,sanity-quota,sanity-quota,sanity-quota,sanity-quota

Change-Id: I08bc19c5e7ac4f9cb679f96a2299c0be772f0330
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/32874
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
20 months ago LU-9007 lod: improve obj alloc for FLR file 04/32404/10
Bobi Jam [Mon, 14 May 2018 11:10:24 +0000 (19:10 +0800)]
LU-9007 lod: improve obj alloc for FLR file

* add lod_layout_component::llc_ost_indices to track the mapping
  of each dt_object to its OST index.
* add lod_device::lod_avoid to collect information about objects on
  other mirrors that overlap the target component.
* lod_should_avoid_ost() uses this avoid guidance to avoid
  allocating objects on the same OST for different mirrors.
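
The avoid guidance can be pictured as a preference order over the
candidate OSTs. The Python sketch below is illustrative only: the
pick_osts() name is hypothetical, and falling back to avoided OSTs
when too few others remain is an assumption, not something the commit
message states.

```python
def pick_osts(candidates, avoid, count):
    """Choose `count` OST indices, preferring ones not in `avoid`.

    candidates: OST indices in the order the allocator would try them
    avoid:      OST indices already used by overlapping components
                of other mirrors
    """
    preferred = [ost for ost in candidates if ost not in avoid]
    # Assumed fallback: reuse an avoided OST rather than fail allocation.
    fallback = [ost for ost in candidates if ost in avoid]
    return (preferred + fallback)[:count]
```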

Change-Id: Ib7e155e4b02c2e25d3955aa9a4acff7569ab7d8f
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/32404
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-11141 quota: reset adjust schedule when updating lqe 19/32819/5
Hongchao Zhang [Sat, 7 Jul 2018 17:55:21 +0000 (13:55 -0400)]
LU-11141 quota: reset adjust schedule when updating lqe

The scheduled adjust for some lquota_entry should be reset when its
limits (hard or soft) are updated by the glimpse callback from QMT.

Test-Parameters: mdtfilesystemtype=zfs ostfilesystemtype=zfs \
testlist=sanity-quota,sanity-quota,sanity-quota,sanity-quota,sanity-quota,sanity-quota,sanity-quota,sanity-quota,sanity-quota,sanity-quota,sanity-quota

Change-Id: Ia16cd90adfa15b92577841259f91f2b275fc7e82
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/32819
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-11147 llite: add newline to llite.*.offset_stats 17/32817/3
Andreas Dilger [Sat, 14 Jul 2018 09:29:48 +0000 (03:29 -0600)]
LU-11147 llite: add newline to llite.*.offset_stats

The llite.*.offset_stats file is missing a newline in the output.

Fixes: 49577875

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ieade87f500c4fa24a0a7b8bd35d65f18dd5681ba
Reviewed-on: https://review.whamcloud.com/32817
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-11138 lfs: getstripe display certain mirror(s) 04/32804/5
Bobi Jam [Tue, 10 Jul 2018 18:07:01 +0000 (12:07 -0600)]
LU-11138 lfs: getstripe display certain mirror(s)

Add [!] --mirror-index=[+-]<index> | [!] --mirror-id=[+-]<id>
option for lfs getstripe to print the components of mirror(s)
relative to <index>-th mirror or components of mirror(s) relative
to the one with mirror ID of <id>.

Change-Id: I9ab8fd5faaea07b7567f88665e06ca71157cca67
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/32804
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-11129 kernel: kernel update RHEL7.5 [3.10.0-862.6.3.el7] 94/32794/5
Yang Sheng [Sun, 8 Jul 2018 15:49:24 +0000 (11:49 -0400)]
LU-11129 kernel: kernel update RHEL7.5 [3.10.0-862.6.3.el7]

Update RHEL7.5 kernel to 3.10.0-862.6.3.el7

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I59b362135b5c235ac76848afb2d48014b7a4e928
Reviewed-on: https://review.whamcloud.com/32794
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Joseph Gmitter <jgmitter@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-11107 mdt: handle nonexistent xattrs correctly 53/32753/3
John L. Hammond [Mon, 2 Jul 2018 15:07:51 +0000 (10:07 -0500)]
LU-11107 mdt: handle nonexistent xattrs correctly

In mdt_getxattr_pack_reply() propagate -ENODATA returns from
mo_xattr_list() to mdt_getxattr(). Add sanity test_102s() to ensure
that getting a nonexistent xattr will fail.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ic7a01feb3fcac66d39f84b4ebdfc86025c3e2779
Reviewed-on: https://review.whamcloud.com/32753
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
20 months ago LU-11108 mdt: propagate errors in mdt_getxattr() 43/32743/2
John L. Hammond [Fri, 29 Jun 2018 21:11:05 +0000 (16:11 -0500)]
LU-11108 mdt: propagate errors in mdt_getxattr()

In mdt_getxattr(), if mo_xattr_get() fails then return that error
value rather than letting mdt_nodemap_map_acl() mangle it.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I967bcc5ad6edf30b43f373e85f22fc922647c435
Reviewed-on: https://review.whamcloud.com/32743
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-11094 osd-ldiskfs: Fix style issues for osd_quota.c 24/32724/6
Arshad Hussain [Sun, 24 Jun 2018 04:30:57 +0000 (10:00 +0530)]
LU-11094 osd-ldiskfs: Fix style issues for osd_quota.c

This patch fixes issues reported by checkpatch
for file lustre/osd-ldiskfs/osd_quota.c

Change-Id: I1a01c3e6327ec56a1ffcf85c5d06934a5f8e8c54
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/32724
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
20 months ago LU-11087 osd-ldiskfs: Fix style issues for osd_compat.c 09/32709/5
Arshad Hussain [Tue, 12 Jun 2018 15:51:11 +0000 (21:21 +0530)]
LU-11087 osd-ldiskfs: Fix style issues for osd_compat.c

This patch fixes issues reported by checkpatch for
file lustre/osd-ldiskfs/osd_compat.c

Test-Parameters: trivial
Change-Id: Ifa5ea5563fc7e5b5e94ea992e602979dea20eb9f
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/32709
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
20 months ago LU-11032 hsm: memory leak in mdt_hsm_cdt_cleanup 56/32456/3
Qian Yingjin [Fri, 18 May 2018 08:55:32 +0000 (16:55 +0800)]
LU-11032 hsm: memory leak in mdt_hsm_cdt_cleanup

Release the memory allocated for the archive id in mdt_hsm_cdt_cleanup
when freeing the hsm_agent data structure, to avoid a memory leak.

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I40e5fd289419d7c18d5f2c3ebe0d3955229f5517
Reviewed-on: https://review.whamcloud.com/32456
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-11022 lfs: accept specifying comp_id in mirror split 55/32455/2
Bobi Jam [Fri, 18 May 2018 05:15:11 +0000 (13:15 +0800)]
LU-11022 lfs: accept specifying comp_id in mirror split

This patch enables "lfs mirror split" to accept --component-id,
which identifies the mirror to split by a component that the
mirror contains.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: I02bf4d75013341d99d95852cb7fb0fbbb41c7a4d
Reviewed-on: https://review.whamcloud.com/32455
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-11027 doc: Add lockahead to llapi_ladvise man 37/32437/4
Patrick Farrell [Thu, 17 May 2018 10:31:47 +0000 (05:31 -0500)]
LU-11027 doc: Add lockahead to llapi_ladvise man

Document lockahead in the llapi_ladvise man page.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <paf@cray.com>
Change-Id: Ia709611bb2751a408e3525c538daa824b365b09c
Reviewed-on: https://review.whamcloud.com/32437
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-10970 tests: make sure write is complete 03/32203/5
Patrick Farrell [Mon, 30 Apr 2018 12:10:38 +0000 (07:10 -0500)]
LU-10970 tests: make sure write is complete

The current test does not guarantee the write has arrived
on the server before dropping caches and checking memory
usage.  If the write is still in progress, the baseline
memory used value will be incorrect.

Sync on the client to force the write out.

Test-Parameters: trivial

Cray-bug-id: LUS-5923
Signed-off-by: Patrick Farrell <paf@cray.com>
Change-Id: Ic0379ffdfd14ff630d65a0197a99fba929868e9c
Reviewed-on: https://review.whamcloud.com/32203
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
20 months ago LU-11120 test: add compilebench and DNE tests 49/31749/12
Mikhail Pershin [Fri, 23 Mar 2018 09:35:10 +0000 (12:35 +0300)]
LU-11120 test: add compilebench and DNE tests

Add more tests in dom-performance.sh
- add compilebench run
- add default DOM+DNE run

Test-Parameters: trivial mdtcount=2 mdscount=2 mdssizegb=20 testlist=dom-performance
Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: Id93c17157dba4887d250cd933d7a1fae5906af1b
Reviewed-on: https://review.whamcloud.com/31749
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-8066 osp: migrate from proc to sysfs 77/32377/10
James Simmons [Wed, 11 Jul 2018 17:27:11 +0000 (13:27 -0400)]
LU-8066 osp: migrate from proc to sysfs

Move the osp module from using proc for most single value files
to sysfs. Create the default attrs for dt_devices which can be
used for other server side devices.

Change-Id: I51fef51287585b38a1aff80d8edf986583c54a14
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32377
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-10683 osd_zfs: set offset in page correctly 88/32788/2
Hongchao Zhang [Thu, 5 Jul 2018 11:44:38 +0000 (07:44 -0400)]
LU-10683 osd_zfs: set offset in page correctly

In osd_bufs_get_write, the offset within the first page should
be calculated from the offset parameter instead of being set to zero.
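
The arithmetic behind this fix can be shown with a small model. This
is a Python sketch under stated assumptions, not osd_bufs_get_write()
itself: the 4096-byte page size and the split_into_pages() helper are
hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size for the example


def split_into_pages(offset, length, page_size=PAGE_SIZE):
    """Split a write of `length` bytes at file `offset` into page chunks.

    Returns a list of (offset_in_page, nbytes) tuples, one per page.
    """
    chunks = []
    while length > 0:
        # Derived from the file offset; only page-aligned offsets give 0.
        page_off = offset % page_size
        nbytes = min(page_size - page_off, length)
        chunks.append((page_off, nbytes))
        offset += nbytes
        length -= nbytes
    return chunks
```

For an unaligned write, only the first chunk has a non-zero offset
within its page; every later chunk starts at the page boundary.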

Change-Id: I6592d8b5b0162b92953d59e2662a4381ba3e89ba
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/32788
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-5638 tests: resume running sanity-quota tests 94/32694/2
James Nunez [Mon, 11 Jun 2018 15:58:31 +0000 (09:58 -0600)]
LU-5638 tests: resume running sanity-quota tests

sanity-quota tests 11 and 33 were not run due to the
issues documented in LU-5638. A patch, commit id
a046e879fcadd601c9a19fd906f82ecbd2d4efd5, landed to fix
this issue. We should resume running sanity-quota
tests 11 and 33 for ZFS servers.

Test-Parameters: trivial clientcount=2 mdscount=2 mdtcount=4 osscount=1 ostcount=8 mdtfilesystemtype=zfs ostfilesystemtype=zfs testlist=sanity-quota
Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: Iadb1356a0a6b4f5a8b5f54275db794f0ddbb5af6
Reviewed-on: https://review.whamcloud.com/32694
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-8708 osc: depart grant shrinking from pinger 02/23202/12
Bobi Jam [Mon, 17 Oct 2016 06:36:31 +0000 (14:36 +0800)]
LU-8708 osc: depart grant shrinking from pinger

* Move the grant shrinking code out of the pinger and use a workqueue
  to handle the grant shrinking timer.
* Enable OSC grant shrinking by default.

bugzilla: 19507

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: Ifb03c907ad285a307d37d707193cfc32998ba2b2
Reviewed-on: https://review.whamcloud.com/23202
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
20 months ago New tag 2.11.53 (tags: 2.11.53, v2_11_53, v2_11_53_0)
Oleg Drokin [Tue, 24 Jul 2018 03:58:13 +0000 (23:58 -0400)]
New tag 2.11.53

Change-Id: I02c52e58bd01f54d55a9083a2d1a12f6e811eaf1

20 months ago LU-11132 compile: fix LC_BI_BDEV for old kernels 99/32799/2
Vladimir Saveliev [Thu, 12 Jul 2018 19:45:11 +0000 (22:45 +0300)]
LU-11132 compile: fix LC_BI_BDEV for old kernels

struct bio is located in linux/bio.h in the 2.6 kernel series. LC_BI_BDEV
uses linux/blk_types.h, which makes the configuration check fail for
those kernels and breaks compilation.

Use linux/bio.h in LC_BI_BDEV so that it works for both new and old
kernels.

Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Change-Id: Iaeefea9ba96ebe4dad30acedb5fa7551c4516241
Reviewed-on: https://review.whamcloud.com/32799
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
20 months ago LU-11161 tests: stop running sanity test 160g 44/32844/2
James Nunez [Thu, 19 Jul 2018 22:36:27 +0000 (16:36 -0600)]
LU-11161 tests: stop running sanity test 160g

When run with two or more MDSs, sanity test 160g will fail
due to expecting a changelog user being deregistered on
all MDSs.

In order to stop sanity 160g from failing, add it to the
ALWAYS_EXCEPT list when running in a DNE environment which
results in the test not being executed.

Test-Parameters: trivial
Test-Parameters: testlist=sanity mdtcount=2 mdscount=2
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I091f148a3da820cad0103aead559a96c54c9fe8b
Reviewed-on: https://review.whamcloud.com/32844
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-11157 obd: keep dirty_max_pages a round number of MB 31/32831/4
John L. Hammond [Wed, 18 Jul 2018 20:47:25 +0000 (15:47 -0500)]
LU-11157 obd: keep dirty_max_pages a round number of MB

In client_adjust_max_dirty() ensure that the dirty pages limit is
always divisible by 256 so that it may faithfully be represented in MB
as is the case when the max_dirty_mb parameters are used.
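
The divisibility requirement follows from the page size: with 4 KiB
pages, 1 MB is 256 pages. The Python sketch below illustrates the
rounding; the helper names are hypothetical, and rounding up (rather
than down) is an assumption, since the message only says the limit
must be divisible by 256.

```python
PAGE_SHIFT = 12                          # assumes 4 KiB pages
PAGES_PER_MB = 1 << (20 - PAGE_SHIFT)    # 256 pages per MB


def round_dirty_max_pages(pages):
    """Round a page count up to a whole number of MB (assumed direction)."""
    return -(-pages // PAGES_PER_MB) * PAGES_PER_MB


def pages_to_mb(pages):
    """Convert a page count to MB, as a max_dirty_mb style value."""
    return pages // PAGES_PER_MB
```

After rounding, converting to MB and back to pages gives the same
value, so the MB figure represents the limit exactly.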

Test-Parameters: trivial

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I8e2fbdd4bf253a46e2951e7840484ab6a617fbe2
Reviewed-on: https://review.whamcloud.com/32831
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
20 months ago LU-11074 mdc: set correct body eadatasize for getxattr() 39/32739/3
John L. Hammond [Fri, 29 Jun 2018 21:11:45 +0000 (16:11 -0500)]
LU-11074 mdc: set correct body eadatasize for getxattr()

In mdc_intent_getxattr_pack() set mbo_eadatasize to the size of the
xattr values buffer rather than the size of the xattr names buffer.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ibbed6aba6718f50eed1a08d506d526b1e0e042c8
Reviewed-on: https://review.whamcloud.com/32739
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-11097 utils: add libuuid for llverdev 26/32726/2
Alex Zhuravlev [Sun, 24 Jun 2018 19:00:21 +0000 (22:00 +0300)]
LU-11097 utils: add libuuid for llverdev

this is explicitly required on my setup

Change-Id: I2b518c922d1857411bac74f68223259bb255e0e4
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/32726
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
20 months ago LU-11131 target: keep reply data bit set on failover 98/32798/2
Vladimir Saveliev [Thu, 12 Jul 2018 20:27:24 +0000 (23:27 +0300)]
LU-11131 target: keep reply data bit set on failover

The following scenario leads to failure of a recent reint RPC:

1. The MDT server has a number of RPCs being handled: rpc 1 from
client A and rpc 2 from client B.

2. Shutdown of the server starts.

3. rpc 1 is processed and its reply data is added, but client A gets
ENODEV in the reply (ptlrpc_send_reply()) because shutdown is running.

4. Shutdown reaches class_disconnect_exports() and links export A
to the list of zombie exports.

5. The obd_zombid thread wakes up and destroys export A, which includes
freeing the reply data list and clearing bits in
lut->lut_reply_bitmap (tgt_free_reply_data()).

6. Export B is still processing rpc 2 and looks for a free bit in
lut->lut_reply_bitmap to store reply data
(tgt_add_reply_data()). If it finds a bit that has just been freed by
the obd_zombid thread, the reply data from export A is overwritten
in the reply_data file by the reply data from export B.

7. After failover, reply data is restored by
tgt_reply_data_init(). The reply data of client A is missing.

8. Client A reconnects and resends rpc 1. The server does not find
the reply data and processes the RPC as if it had not been seen yet. In
the case of unlink, the directory entry no longer exists, so rpc 1
fails.

The fix is to not free bits in lut->lut_reply_bitmap in case of
failover.

Test illustrating the issue is added.
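
The slot reuse at the heart of this scenario can be reproduced with a
toy model. This is a hedged Python sketch, not the target code: the
ReplyDataModel class and its methods are hypothetical, and a dict
stands in for the reply_data file.

```python
class ReplyDataModel:
    """Toy model of lut_reply_bitmap plus the on-disk reply_data file."""

    def __init__(self, nbits=4):
        self.bitmap = [False] * nbits   # True means the slot is in use
        self.reply_data = {}            # slot -> client, models the file

    def add_reply(self, client):
        # Take the first free bit; raises ValueError if none is free.
        slot = self.bitmap.index(False)
        self.bitmap[slot] = True
        self.reply_data[slot] = client
        return slot

    def destroy_export(self, client, failover, keep_bits_on_failover):
        for slot, owner in self.reply_data.items():
            if owner != client:
                continue
            if failover and keep_bits_on_failover:
                continue                # the fix: leave the bit set
            self.bitmap[slot] = False   # slot may now be reused


def clients_with_reply_data_after_failover(keep_bits_on_failover):
    lut = ReplyDataModel()
    lut.add_reply("A")                  # reply data for client A's RPC
    lut.destroy_export("A", True, keep_bits_on_failover)  # zombie cleanup
    lut.add_reply("B")                  # client B's RPC stores its reply
    return sorted(set(lut.reply_data.values()))  # what a restart restores
```

Without the fix, client B reuses client A's slot and only B's reply
data survives; with the fix, both clients' reply data remain.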

Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Cray-bug-id: LUS-6004
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Change-Id: I6db3728f3271ce2751fbe08dadca365eb2ffe727
Reviewed-on: https://review.whamcloud.com/32798
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-11099 doc: include "-N" option to lfs_setstripe.1 34/32734/2
Emoly Liu [Wed, 27 Jun 2018 04:18:57 +0000 (12:18 +0800)]
LU-11099 doc: include "-N" option to lfs_setstripe.1

This patch adds the mirror option "-N[mirror_count]" to the
lfs_setstripe.1 man page so that the user can follow the manual
to create a mirrored file or set a default mirror layout on a
directory correctly.
The command format is:
$ lfs setstripe -N[mirror_count] [STRIPE_OPTIONS] <dir|filename>

Test-Parameters: trivial

Change-Id: If0fabd79d218e5582f9c64336f60466f35dbd968
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/32734
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
20 months ago LU-11098 ptlrpc: ASSERTION(!list_empty(imp->imp_replay_cursor)) 27/32727/2
Andriy Skulysh [Mon, 4 Jun 2018 16:08:29 +0000 (19:08 +0300)]
LU-11098 ptlrpc: ASSERTION(!list_empty(imp->imp_replay_cursor))

This is a race between ptlrpc_replay_next() and close:
ll_close_inode_openhandle() calls
mdc_free_open() -> ptlrpc_request_committed() -> ptlrpc_free_request().

imp_replay_cursor needs to be reset when a request is dropped from
the replay list.

Change-Id: Ia0ce327a729f8cf554b008ab6d32323b5dd26ee7
Cray-bug-id: LUS-2455
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-on: https://review.whamcloud.com/32727
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-8066 llite: replace ll_process_config with class_modify_config 22/32722/4
James Simmons [Tue, 3 Jul 2018 00:35:05 +0000 (20:35 -0400)]
LU-8066 llite: replace ll_process_config with class_modify_config

The current method of handling tunables with ll_process_config
cannot work with sysfs, so replace the ll_process_config handling with
class_modify_config(), which can handle sysfs, debugfs and procfs.

Change-Id: I7ef5a4b1ee47827711a9d6654fda279abde06268
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32722
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months ago LU-8066 osc: fix idle_timeout handling 19/32719/4
James Simmons [Thu, 14 Jun 2018 16:53:29 +0000 (12:53 -0400)]
LU-8066 osc: fix idle_timeout handling

The patch that landed for LU-7236 introduced new sysfs entries
which were done wrong.

1) For idle_timeout it returns -ERANGE for any value passed in
   except when setting idle_timeout to zero. This does not match
   what the commit message said for LU-7236. So I changed
   lprocfs_str_with_units_to_s64() into kstrtouint() since a
   signed 64 bit timeout is not needed. Using kstrtouint()
   ensures that negative values are not possible. Also cap the
   value to CONNECTION_SWITCH_MAX since the max of 4 billion
   seconds is overkill.

2) For the next procfs idle_connect it is really a write only file
   but it was treated as both read and write. There is no need for
   the osc_idle_connect_seq_show() function.

3) Lastly no more stuffing new entries into proc or debugfs. For
   this patch convert these new proc entries to sysfs. It seems
   to be a common occurrence so add LPROC_SEQ_* to spelling.txt
   so checkpatch will complain about using LPROC_SEQ_* which will
   go away.
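
The parsing behavior described in item 1 can be sketched in user space.
This is a minimal Python illustration, not the kernel code:
parse_idle_timeout() and the cap value of 50 are hypothetical stand-ins
for the kstrtouint() call and CONNECTION_SWITCH_MAX, and clamping (rather
than rejecting) oversized values is an assumption about what "cap" means
here.

```python
UINT_MAX = 2**32 - 1
# Hypothetical stand-in for CONNECTION_SWITCH_MAX; the real value is not
# given in this commit message.
ASSUMED_CAP = 50


def parse_idle_timeout(text, cap=ASSUMED_CAP):
    """Parse an unsigned decimal timeout in seconds and clamp it to cap.

    Negative numbers, trailing garbage and values that do not fit in 32
    bits are rejected with ValueError, so a negative timeout can never be
    stored.
    """
    digits = text.strip()
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not an unsigned integer: {text!r}")
    value = int(digits)
    if value > UINT_MAX:
        raise ValueError(f"out of range for unsigned int: {text!r}")
    # Assumption: values above the cap are clamped, not refused.
    return min(value, cap)
```

With this sketch, "0" still disables the timeout, "20" is stored as 20,
and "-1" raises an error instead of being interpreted as a huge value.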

Change-Id: I1c992b2db47aade6a887919824d869e8d5354c71
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32719
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-10855 ptlrpc: remove obsolete LLOG_ORIGIN_* RPCs 54/32654/2
Andreas Dilger [Wed, 6 Jun 2018 22:21:51 +0000 (16:21 -0600)]
LU-10855 ptlrpc: remove obsolete LLOG_ORIGIN_* RPCs

Remove the obsolete RPC opcodes LLOG_ORIGIN_HANDLE_WRITE_REC,
LLOG_ORIGIN_HANDLE_CLOSE, LLOG_ORIGIN_CONNECT, LLOG_CATINFO
along with their unused OBD_FAIL counterparts.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I5a2a15bc0dc9e09d0081b6c3aa291fc7713ebbe5
Reviewed-on: https://review.whamcloud.com/32654
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-10855 ptlrpc: assign specific values to MGS opcodes 53/32653/2
Andreas Dilger [Wed, 6 Jun 2018 22:18:03 +0000 (16:18 -0600)]
LU-10855 ptlrpc: assign specific values to MGS opcodes

Assign specific values to all of the MGS opcodes in enum mgs_cmd
so that these values do not change if a new item is added or one
is removed in the future.  These opcodes are part of the wire
protocol and need to remain constant.
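
Why explicit values matter can be shown with a small sketch. The member
names and numbers below are illustrative only and are not taken from
enum mgs_cmd; Python's auto() starts at 1 where a C enum starts at 0,
but the shifting effect is the same.

```python
from enum import IntEnum, auto


class ImplicitOps(IntEnum):
    """Auto-numbered: each value depends on the members before it."""
    CONNECT = auto()      # 1
    DISCONNECT = auto()   # 2
    TARGET_REG = auto()   # 3


class ImplicitOpsAfterRemoval(IntEnum):
    """Same enum with DISCONNECT deleted: TARGET_REG silently shifts."""
    CONNECT = auto()      # 1
    TARGET_REG = auto()   # 2, no longer matches what old peers send


class ExplicitOps(IntEnum):
    """Explicit values (hypothetical numbers)."""
    CONNECT = 250
    DISCONNECT = 251
    TARGET_REG = 252


class ExplicitOpsAfterRemoval(IntEnum):
    """Deleting a member leaves the remaining wire values untouched."""
    CONNECT = 250
    TARGET_REG = 252
```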

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I8132ca01916cd657933d0c8864e4e78f8b3ebbe5
Reviewed-on: https://review.whamcloud.com/32653
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-10855 ptlrpc: remove obsolete OBD RPC opcodes 51/32651/3
Andreas Dilger [Wed, 6 Jun 2018 20:41:13 +0000 (14:41 -0600)]
LU-10855 ptlrpc: remove obsolete OBD RPC opcodes

Remove the obsolete OBD_LOG_CANCEL (since Lustre 1.5) and
OBD_QC_CALLBACK (since Lustre 2.4) RPC opcodes.

Assign OBD_IDX_READ an explicit opcode (as should be done with all
enums in lustre_idl.h) so that the value does not change if some
prior field is removed.

Also remove the OBD_FAIL checks that were used to test them.
The setting in conf_sanity.sh test_58 was unused for many years.

Test-Parameters: trivial testlist=conf-sanity
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: Ie68c6be0da1c114fc981cb4b1afdcdb7c13ebbe5
Reviewed-on: https://review.whamcloud.com/32651
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-11052 obd: remove OBD ops based stats 02/32602/2
John L. Hammond [Fri, 25 May 2018 14:40:04 +0000 (09:40 -0500)]
LU-11052 obd: remove OBD ops based stats

Stats maintained via the OBD operations wrappers (obd_setup(),
obd_cleanup(), ...) are less and less interesting to the point that we
should remove them. The only stats files affected by this are
obdfilter.*.stats, obdfilter.*.exports.*.stats and
obdecho.*.stats. For obdfilter here is a comparison for two racer
runs. With the current OBD ops based stats:

obdfilter.lustre-OST0000.stats=
snapshot_time             1527267354.328068245 secs.nsecs
read_bytes                610 samples [bytes] 4096 4194304 800043008
write_bytes               2196 samples [bytes] 5 4194304 3410224606
setattr                   13545 samples [reqs]
punch                     7682 samples [reqs]
destroy                   2281 samples [reqs]
create                    74 samples [reqs]
statfs                    234 samples [reqs]
get_info                  1 samples [reqs]
connect                   3 samples [reqs]
disconnect                1 samples [reqs]
preprw                    2806 samples [reqs]
commitrw                  2806 samples [reqs]
ping                      422 samples [reqs]

And after the OBD ops based stats have been removed:

obdfilter.lustre-OST0000.stats=
snapshot_time             1527168813.867472974 secs.nsecs
read_bytes                200 samples [bytes] 4096 4194304 231366656
write_bytes               1703 samples [bytes] 5 4194304 1220864892
getattr                   337 samples [reqs]
setattr                   6358 samples [reqs]
punch                     2880 samples [reqs]
destroy                   2000 samples [reqs]
create                    71 samples [reqs]
statfs                    2148 samples [reqs]
get_info                  4 samples [reqs]

Changes to obdfilter.lustre-OST0000.exports.*.stats are similar.
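
The two listings above can be compared mechanically. The following is a
rough Python sketch for doing so; parse_stats() and changed_counters()
are hypothetical helper names, and the parser only assumes the
"name count samples [unit] ..." layout visible in the listings.

```python
BEFORE = """\
obdfilter.lustre-OST0000.stats=
snapshot_time             1527267354.328068245 secs.nsecs
read_bytes                610 samples [bytes] 4096 4194304 800043008
write_bytes               2196 samples [bytes] 5 4194304 3410224606
setattr                   13545 samples [reqs]
punch                     7682 samples [reqs]
destroy                   2281 samples [reqs]
create                    74 samples [reqs]
statfs                    234 samples [reqs]
get_info                  1 samples [reqs]
connect                   3 samples [reqs]
disconnect                1 samples [reqs]
preprw                    2806 samples [reqs]
commitrw                  2806 samples [reqs]
ping                      422 samples [reqs]
"""

AFTER = """\
obdfilter.lustre-OST0000.stats=
snapshot_time             1527168813.867472974 secs.nsecs
read_bytes                200 samples [bytes] 4096 4194304 231366656
write_bytes               1703 samples [bytes] 5 4194304 1220864892
getattr                   337 samples [reqs]
setattr                   6358 samples [reqs]
punch                     2880 samples [reqs]
destroy                   2000 samples [reqs]
create                    71 samples [reqs]
statfs                    2148 samples [reqs]
get_info                  4 samples [reqs]
"""


def parse_stats(text):
    """Map counter name -> sample count, skipping header and snapshot_time."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "samples" and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


def changed_counters(before, after):
    """Return (names only in before, names only in after), both sorted."""
    old, new = parse_stats(before), parse_stats(after)
    return sorted(set(old) - set(new)), sorted(set(new) - set(old))


removed, added = changed_counters(BEFORE, AFTER)
print("only in first listing: ", removed)
print("only in second listing:", added)
```

For these two runs the counters present only in the first listing are
connect, disconnect, preprw, commitrw and ping.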

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: If4fb7022a3de0aa61905212eaab07b94c1687c68
Reviewed-on: https://review.whamcloud.com/32602
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Jesse Hanley <hanleyja@ornl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-9325 llog: replace simple_strtol with kstrtol 98/32598/3
James Simmons [Thu, 7 Jun 2018 00:52:17 +0000 (20:52 -0400)]
LU-9325 llog: replace simple_strtol with kstrtol

Eventually simple_strtol will be removed so replace its use in
the llog_ioctl code with kstrtoxxx() functions.

Change-Id: I55a4e97837a1d9e0134dde92f0c2380f07691ab9
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32598
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
20 months agoLU-11045 test: use provided directory in racer/racer.sh 14/32514/2
John L. Hammond [Wed, 23 May 2018 15:03:46 +0000 (10:03 -0500)]
LU-11045 test: use provided directory in racer/racer.sh

In racer/racer.sh use the directory provided by the parent script
rather than the environment variable $DIR.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Iab753c34752462a30e7263b7c304e1626e5cc343
Reviewed-on: https://review.whamcloud.com/32514
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-11044 osd-ldiskfs: ext4_dir_operations uses iterate_shared 86/32486/2
Chris Horn [Tue, 22 May 2018 14:39:14 +0000 (09:39 -0500)]
LU-11044 osd-ldiskfs: ext4_dir_operations uses iterate_shared

Linux 4.7 commit ae05327a00fd47c34dfe25294b359a3f3fef96e8 replaces
ext4_dir_operations iterate with iterate_shared. dir_relaxed_shared()
was also added in that commit, so we can use that function to verify
that the ext4_dir_operations is using iterate_shared.

Cray-bug-id: LUS-6008
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I67ff714296cab96408cb74fba62855c0e12cdf43
Reviewed-on: https://review.whamcloud.com/32486
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-11034 build: update changelog for Ubuntu 18.04 59/32459/3
Minh Diep [Fri, 18 May 2018 17:46:56 +0000 (10:46 -0700)]
LU-11034 build: update changelog for Ubuntu 18.04

Record the version that we are building

Test-Parameters: trivial

Change-Id: I78c4aa6ad9b1a85cd498709b76ec3111e9572b84
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/32459
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
20 months agoLU-11014 mdt: remove enum mdt_it_code 58/32358/3
John L. Hammond [Fri, 11 May 2018 14:52:45 +0000 (09:52 -0500)]
LU-11014 mdt: remove enum mdt_it_code

Remove enum mdt_it_code, struct mdt_it_flavor and the mdt_it_flavor
array. In mdt_intent_opc, collapse the switch statement followed by
array lookup into a single switch statement that assigns the intent
format, handler, and handler flags. Simplify the subsequent logic in
mdt_intent_opc() accordingly.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Id56fe5fa1bd4d4c03a8de2db9d39f571bed06b2f
Reviewed-on: https://review.whamcloud.com/32358
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-10990 osc: increase default max_dirty_mb to 2G 88/32288/5
Oleg Drokin [Fri, 4 May 2018 03:08:35 +0000 (23:08 -0400)]
LU-10990 osc: increase default max_dirty_mb to 2G

While ideally we want to move away from the max_dirty_mb setting
completely and let the grant code take over most of its role,
Andreas raises a somewhat valid point that for certain
system configurations with high-latency links, system
administrators might want the ability to limit the
amount of dirty pages just for those OSCs, to limit the
time it might take to flush that dirty data.

So a good compromise is to lift the max_dirty_mb default
value first while we work out the current grant code
deficiencies.

Change-Id: I4de407088af70e0f98f0563160217ba70a635dfb
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: https://review.whamcloud.com/32288
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
20 months agoLU-10986 lfs: make lfs project tolerant errors 43/32243/16
Wang Shilong [Wed, 2 May 2018 08:54:15 +0000 (16:54 +0800)]
LU-10986 lfs: make lfs project tolerant errors

This patch tries to fix the following problems:
1) The command hangs on a pipe file, reproduced by the following steps:
 $ mkfifo tmp/pipe
 $ lfs project -srp 500 tmp -->this will never finish.

The problem is that opening a pipe file blocks by default
unless the O_NONBLOCK or O_NDELAY flag is given.

2) If a symbolic link with a missing target exists, the command
returns an error and does not process the remaining entries.

Fix this by allowing the command to continue processing
even if it hits some errors.

3) Fix a wrong check for MAX_PATH.
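
Problems 1 and 2 can be illustrated outside of lfs. The sketch below is
not the lfs implementation; probe_tree() is a hypothetical helper that
opens every entry with O_NONBLOCK so a FIFO without a writer cannot
stall the walk, and records per-entry errors instead of aborting.

```python
import os


def probe_tree(root):
    """Open every entry under root without blocking; collect errors and go on.

    Returns (opened, failed), where failed is a list of (path, errno) pairs.
    """
    opened, failed = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # O_NONBLOCK lets open() on a FIFO with no writer return
                # immediately instead of waiting for the other end.
                fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
            except OSError as exc:
                # For example ENOENT for a symlink whose target is missing.
                failed.append((path, exc.errno))
                continue
            os.close(fd)
            opened.append(path)
    return opened, failed
```

A caller can report the failed entries at the end and still return a
non-zero status, while every reachable entry has been processed.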

Test-Parameters: trivial testlist=sanity-quota,sanity-quota
Change-Id: I7d08a7547e6b1351a1eff23063da6cd9c4cdc5e3
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/32243
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-11086 test: reset quota setting properly 07/32707/3
Wang Shilong [Wed, 13 Jun 2018 14:12:16 +0000 (22:12 +0800)]
LU-11086 test: reset quota setting properly

Some test cases don't reset quota settings properly, which
makes running sanity-quota.sh several times fail. This patch
tries to improve this by:

1) Resetting quota settings before check_runas_id_ret, as it will
touch a file, which might hit EDQUOT if quota settings were not
cleaned up properly since the last run.

2) Fixing the quota reset for test cases 55 and 60.

3) Resetting quota settings again after all tests have finished,
because tests that run after sanity-quota.sh might be affected if
quota settings are not reset properly for some reason.

Test-Parameters: trivial testlist=sanity-quota,sanity-quota
Change-Id: I2983102ea379e64173ef8c54b149ba3b5fbfebe9
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/32707
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-10734 tests: ensure current GC interval is over 04/31604/9
Bruno Faccini [Fri, 9 Mar 2018 01:59:51 +0000 (02:59 +0100)]
LU-10734 tests: ensure current GC interval is over

In sanity/test_160g, ensure the currently configured
"changelog_min_gc_interval=2" is over to allow the
GC thread to be effectively started.

Also, enable Changelog GC, as it is no longer the
default, in the sanity/test_160g sub-test and remove
it from ALWAYS_EXCEPT to reenable it, while leaving
160f there because of LU-10680.

sanity/test_160g has also been reworked to become
fully DNE aware.

Test-Parameters: trivial
Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I8a079ba2ba1822b488f65ad9703204d6296fada0
Reviewed-on: https://review.whamcloud.com/31604
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-11079 llite: control concurrent statahead instances 90/32690/7
Fan Yong [Wed, 13 Jun 2018 14:33:55 +0000 (22:33 +0800)]
LU-11079 llite: control concurrent statahead instances

It was found that if there are too many concurrent statahead
instances, the related statahead RPCs may accumulate on the
client import (for MDT) RPC lists
(imp_sending_list/imp_delayed_list/imp_unreplied_list), which
seriously affects the efficiency of spin_lock when the MDT is
overloaded or in recovery. As a temporary solution,
restrict the number of concurrent statahead instances.

To support more concurrent statahead instances, please
consider decentralizing the RPC lists attached to the related import.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I7251cc536f11d184f768e3d3704ba6717644541e
Reviewed-on: https://review.whamcloud.com/32690
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-10893 tests: allow to disable dm-flakey layer 58/32658/4
Alexander Boyko [Thu, 7 Jun 2018 13:54:41 +0000 (09:54 -0400)]
LU-10893 tests: allow to disable dm-flakey layer

The patch 54b9e3f789358bd9dfb94b77fe33a4faa1e28ab2 adds a flakey layer
to the test framework. But it also adds a regression: you can't run tests
separately from a setup. Before dm-flakey, it was easy to create a
configuration at ncli, set up a cluster, and start a test. But now it
is impossible. For example:
sudo MDSDEV=/dev/sdb MDSDEV1=/dev/sdb sh lustre/tests/llmount.sh
sudo MDSDEV=/dev/sdb MDSDEV1=/dev/sdb ONLY=0 sh
lustre/tests/conf-sanity.sh
Format mds1: /dev/sdb
mkfs.lustre FATAL: Unable to build fs /dev/sdb (256)
mkfs.lustre FATAL: mkfs failed 256

The fix disables dm-flakey layer with option FLAKEY=false.

Test-Parameters: envdefinitions=FLAKEY=false
Signed-off-by: Alexander Boyko <c17825@cray.com>
Cray-bug-id: LUS-5851
Change-Id: I248be2307cff5fe6b4b2524478ca8e4cd96a77d2
Reviewed-on: https://review.whamcloud.com/32658
Reviewed-by: Elena Gryaznova <c17455@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-11064 lnd: determine gaps correctly 86/32586/4
Amir Shehata [Wed, 30 May 2018 20:22:11 +0000 (13:22 -0700)]
LU-11064 lnd: determine gaps correctly

We're allowed to start at a non-aligned page offset in the first
fragment and end at a non-aligned page offset in the last fragment.

When checking the iovec exclude both of the first and last fragments
from the tx_gaps check.
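
One way to read this rule is sketched below. This is an interpretation
of the commit message, not the LND code: has_gaps() is a hypothetical
helper, fragments are modeled as (page_offset, length) pairs, and the
4096-byte page size is an assumption.

```python
PAGE_SIZE = 4096  # assumed page size for the illustration


def has_gaps(fragments, page_size=PAGE_SIZE):
    """Return True if a list of (page_offset, length) fragments has a gap.

    The first fragment may start at any offset and the last one may end
    at any offset. Every other fragment boundary has to fall on a page
    boundary, otherwise the buffer is not virtually contiguous.
    """
    last = len(fragments) - 1
    for i, (offset, length) in enumerate(fragments):
        if i != 0 and offset != 0:
            return True   # a later fragment starts in the middle of a page
        if i != last and (offset + length) % page_size != 0:
            return True   # an earlier fragment ends in the middle of a page
    return False
```

Under this reading, [(100, 3996), (0, 4096), (0, 200)] has no gaps even
though neither end is page aligned, while [(0, 1000), (0, 4096)] does.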

Test-Parameters: trivial
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I8a9231db7db404a5d5a6294ff263c1bd2ac28e6c
Reviewed-on: https://review.whamcloud.com/32586
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-11117 ptlrpc: don't zero request handle 81/32781/4
Alexander Boyko [Fri, 15 Jun 2018 09:02:36 +0000 (05:02 -0400)]
LU-11117 ptlrpc: don't zero request handle

LNet can retransmit a request at any time if it has not been replied to.
ptlrpc_resend_req() zeroes the request handle and ptlrpc_send_rpc()
sets it. If a retransmission happens with a zeroed handle, the client
can't find a valid export by handle, sets rq_export to NULL and
replies with ENOTCONN. The server evicts the client with this error.

client (nid x.x.x.x@tcp) returned error from blocking AST
(req status -107 rc -107), evict it

Signed-off-by: Alexander Boyko <c17825@cray.com>
Cray-bug-id: LUS-6037
Change-Id: I198666d386fea99b46994f965c1519acb5743d75
Reviewed-on: https://review.whamcloud.com/32781
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-7816 quota: add default quota setting support 06/32306/16
Hongchao Zhang [Tue, 5 Jun 2018 22:23:42 +0000 (18:23 -0400)]
LU-7816 quota: add default quota setting support

This is motivated by a similar function in GPFS, which is a friendly
feature for cluster administrators managing quota.

Lazy Quota default setting support, here is basic idea:

Default quota setting is a global quota setting for user, group,
and project quotas. If a default quota is set for one quota type,
newly created users/groups/projects will inherit this setting
automatically. Since Lustre itself has no idea when new
users are created, it can only know when these users try to
acquire space from Lustre.

So we try to implement lazy inheritance of the quota setting. The
slave first checks if a default quota setting exists. If it does, the
slave is forced to acquire quota from the master, and the master
detects whether the default quota is set, then sets this quota and
also returns proper grant space to the slave.

To implement this and reuse existing quota APIs, we try to manage
the default quota in the quota record of ID 0, and enforce the
quota check when reading the quota record from disk.

In the current Lustre implementation, the grace time is either
the time or the timestamp to be used after some quota ID exceeds
the soft limit, so 48 bits should be enough for it. Its high 16 bits
can be used as quota flags, and this patch will use one of
them as the default quota flag.

The global quota record used by default quota will set its soft
and hard limit to zero, and its grace time will contain the default flag.
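
The 48/16 bit split of the grace field can be sketched as follows. The
constant and function names are hypothetical, and using the lowest of
the 16 flag bits for the default flag is an assumption; the commit
message only states that one of the high 16 bits is used.

```python
GRACE_BITS = 48                      # low bits: grace time or timestamp
GRACE_MASK = (1 << GRACE_BITS) - 1
FLAG_DEFAULT = 0x0001                # assumed position of the default flag


def pack_grace(grace, flags=0):
    """Combine a 48-bit grace value and 16 bits of flags into one u64."""
    if not 0 <= grace <= GRACE_MASK:
        raise ValueError("grace does not fit in 48 bits")
    if not 0 <= flags <= 0xFFFF:
        raise ValueError("flags do not fit in 16 bits")
    return (flags << GRACE_BITS) | grace


def grace_time(packed):
    """Extract the grace time or timestamp from the packed field."""
    return packed & GRACE_MASK


def grace_flags(packed):
    """Extract the 16 flag bits from the packed field."""
    return packed >> GRACE_BITS


def uses_default_quota(hard, soft, packed):
    """A record follows the default if both limits are 0 and the flag is set."""
    return hard == 0 and soft == 0 and bool(grace_flags(packed) & FLAG_DEFAULT)
```

A one-week grace period (604800 seconds) with the flag set round-trips
through pack_grace(), grace_time() and grace_flags() unchanged, since
the two parts never overlap.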

Use lfs setquota -U/-G/-P <mnt> to set default quota.
Use lfs setquota -u/-g/-p foo -d <mnt> to set foo to use default quota
Use lfs quota -U/-G/-P <mnt> to show default quota.

Test-Parameters: envdefinitions=DEBUG_SIZE=64

Change-Id: Ib23007360921832b3c7d5710ab50324bc5067286
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: https://review.whamcloud.com/32306
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-11003 ldlm: don't add canceling lock back to LRU 92/32692/2
Mikhail Pershin [Mon, 11 Jun 2018 06:44:01 +0000 (09:44 +0300)]
LU-11003 ldlm: don't add canceling lock back to LRU

When a lock is converted, check that it is not being canceled
before adding it back to the LRU.

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: I278389f2a23b304d812f82ffb2dcee2ca70f5b21
Reviewed-on: https://review.whamcloud.com/32692
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-11004 ptlrpc: Serialize procfs access to scp_hist_reqs using mutex 07/32307/2
Andriy Skulysh [Thu, 12 Apr 2018 13:12:05 +0000 (16:12 +0300)]
LU-11004 ptlrpc: Serialize procfs access to scp_hist_reqs using mutex

scp_hist_reqs list can be quite long thus a lot of
userland processes can waste CPU power in spinlock cycles.

Change-Id: Ic0fa7338569f9a19213a1dc31f5479c96a76d23a
Cray-bug-id: LUS-5833
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-on: https://review.whamcloud.com/32307
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-10527 obdclass: don't recycle loghandle upon ENOSPC 97/30897/4
Bruno Faccini [Wed, 17 Jan 2018 15:22:58 +0000 (16:22 +0100)]
LU-10527 obdclass: don't recycle loghandle upon ENOSPC

In llog_cat_add_rec(), upon -ENOSPC error being returned from
llog_cat_new_log(), don't reset "cathandle->u.chd.chd_current_log"
to NULL.
This avoids having llog_cat_declare_add_rec() repeatedly
and unnecessarily create new, partially initialized LLOGs/llog_handle
and assign them to "cathandle->u.chd.chd_current_log" without
llog_init_handle() ever being called to initialize
"loghandle->lgh_hdr".

Also, an unnecessary LASSERT(llh) has been removed from
llog_cat_current_log() as it prevented gracefully handling this
case by simply returning the loghandle.
Thanks to S.Cheremencev (Cray) for reporting this.

Both fixes have been kept in the patch as the first one allows for
better performance in terms of number of FS operations being done
under a permanent changelog ENOSPC condition, even if this covers
a somewhat unlikely situation.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I526f788dc283fa7136ba518179d9337e1d5e3714
Reviewed-on: https://review.whamcloud.com/30897
Reviewed-by: Sergey Cheremencev <c17829@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-10175 ldlm: handle lock converts in cancel handler 14/32314/5
Mikhail Pershin [Mon, 7 May 2018 20:36:55 +0000 (23:36 +0300)]
LU-10175 ldlm: handle lock converts in cancel handler

- Use cancel portals and high-priority handling for lock
  converts. Update ldlm_cancel_handler to understand
  LDLM_CONVERT RPC for that.
- Use ns_dirty_age_limit for lock convert - don't convert too old
  locks.
- Check for empty converts and skip them.

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: I767626acd974ad88bbbf0bb3b0a46744c45b7897
Reviewed-on: https://review.whamcloud.com/32314
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
21 months agoRevert "LU-8066 llite: replace ll_process_config with class_modify_config" 21/32721/2
Oleg Drokin [Thu, 14 Jun 2018 18:08:55 +0000 (18:08 +0000)]
Revert "LU-8066 llite: replace ll_process_config with class_modify_config"

This patch was landed by mistake.

This reverts commit db67e686d9abcf750359820bfbdb754ab611bf5c.

Change-Id: I2cbfe808eb7d5c448bdf06d4c36229813e6978d2
Reviewed-on: https://review.whamcloud.com/32721
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Tested-by: Oleg Drokin <oleg.drokin@intel.com>
21 months agoLU-8066 llite: replace ll_process_config with class_modify_config 95/32495/5
James Simmons [Sat, 9 Jun 2018 14:16:59 +0000 (10:16 -0400)]
LU-8066 llite: replace ll_process_config with class_modify_config

The current method of handling tunables with ll_process_config can
not work with sysfs. So replace ll_process_config handling with
class_modify_config() which can handle sysfs, debugfs and procfs.

Change-Id: I40611930ab2b769c0661aa7dce0c7dd0f2d90204
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32495
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Ben Evans <bevans@cray.com>
21 months agoLU-10560 osd: bio_integrity_enabled was removed 21/32621/3
Li Dongyang [Tue, 5 Jun 2018 01:40:43 +0000 (11:40 +1000)]
LU-10560 osd: bio_integrity_enabled was removed

T10PI bio support patches used bio_integrity_enabled,
which is no longer available in recent kernels.
Fix this so we can have server support back on 4.13+
kernels.

Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I32eeea244ad599c7af2d551b9b2b173e982d07d3
Reviewed-on: https://review.whamcloud.com/32621
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
21 months agoLU-11065 kernel: kernel update [SLES12 SP3 4.4.132-94.33] 99/32599/3
Bob Glossman [Thu, 31 May 2018 13:57:52 +0000 (06:57 -0700)]
LU-11065 kernel: kernel update [SLES12 SP3 4.4.132-94.33]

Update target, kernel_config, and ldiskfs files for new version
One ldiskfs patch revised for ext4 changes.
Old unchanged ldiskfs patch kept to use for sles12sp2.

Test-Parameters: clientdistro=sles12sp3 testgroup=review-ldiskfs \
  mdsdistro=sles12sp3 ossdistro=sles12sp3 \
  mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: Ic6d0219a7133825d1dba0b2bfadf8354442cddb3
Reviewed-on: https://review.whamcloud.com/32599
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
21 months agoLU-11051 obd: remove obd_{get,put}ref() 29/32529/3
John L. Hammond [Thu, 17 May 2018 16:36:23 +0000 (11:36 -0500)]
LU-11051 obd: remove obd_{get,put}ref()

obd_getref() and obd_putref() are only used in the lov layer and only
implemented by the lov layer. So they can be removed in favor of
direct calls. Rename lov_{get,put}ref() to lov_tgts_{get,put}ref()
since they do not manage references on the lov device but on its
targets array.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I0f48eaf4bb42b81b2155c599f361a17dd7bb1ae3
Reviewed-on: https://review.whamcloud.com/32529
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
21 months agoLU-10921 utils: improve lfs setstripe error message 42/32442/4
Andreas Dilger [Wed, 16 May 2018 22:18:33 +0000 (16:18 -0600)]
LU-10921 utils: improve lfs setstripe error message

Improve the error messages when "lfs setstripe" or "lfs setdirstripe"
is run on an existing file/directory.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I3b21fb65847822c73713e9a26d6dea978b3cab07
Reviewed-on: https://review.whamcloud.com/32442
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
21 months agoLU-10175 ptlrpc: add LOCK_CONVERT connection flag 93/32593/3
Mikhail Pershin [Sun, 20 May 2018 18:00:23 +0000 (21:00 +0300)]
LU-10175 ptlrpc: add LOCK_CONVERT connection flag

Add the LOCK_CONVERT connection flag to avoid using the lock
convert feature with old servers.

Test-Parameters: trivial
Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: Ie860f43955314017609774d692f89cfe3c2ab896
Reviewed-on: https://review.whamcloud.com/32593
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
21 months agoLU-10963 gnilnd: stats variables overflow assert 84/32184/4
Chuck Fossen [Thu, 26 Apr 2018 20:04:25 +0000 (15:04 -0500)]
LU-10963 gnilnd: stats variables overflow assert

Reverse bte rdma transactions stats were being
incremented by kgnilnd_admin_addref() which asserts when the value
goes negative. These stats should be incremented with atomic_inc
instead.

Test-Parameters: trivial
Cray-bug-id: LUS-5940
Signed-off-by: Chuck Fossen <chuckf@cray.com>
Change-Id: I06426bc078cc76f14c7b3efb5f3ceb71054c2d09
Reviewed-on: https://review.whamcloud.com/32184
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
21 months agoLU-4423 ldlm: use delayed_work for ldlm_pools_recalc 05/31705/8
NeilBrown [Thu, 31 May 2018 16:44:57 +0000 (12:44 -0400)]
LU-4423 ldlm: use delayed_work for ldlm_pools_recalc

ldlm currently has a kthread which wakes up every so often and calls
ldlm_pools_recalc(). The thread is started and stopped, but no other
external interactions happen.

This can trivially be replaced by a delayed_work if we have
ldlm_pools_recalc() reschedule the work rather than just report when to
do that.

Change-Id: I85f8bc79ef86d1c7a6cbe159e6970445eb7f8389
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-on: https://review.whamcloud.com/31705
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
21 months agoLU-10370 ofd: truncate does not update blocks count on client 73/31073/10
Arshad Hussain [Fri, 9 Feb 2018 19:11:51 +0000 (00:41 +0530)]
LU-10370 ofd: truncate does not update blocks count on client

'truncate' call correctly updates the server side with
correct size and blocks count. However, on the client
side all the metadata are correctly updated except the
blocks count, which still reflects the old count prior
to truncate call. This patch fixes this issue by
modifying ofd_punch_hdl() to update repbody with the
updated block count.

A new test case is added under sanity to verify that
the block counts are correctly updated after a truncate call.

Change-Id: I8f3f44e1668fab925339350074d1ad8ab681fc95
Co-authored-by: Abrarahmed Momin <abrar.momin@gmail.com>
Signed-off-by: Abrarahmed Momin <abrar.momin@gmail.com>
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/31073
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
21 months agoLU-10120 lsnapshot: handle dash in fsname 26/30626/2
Fan Yong [Thu, 21 Dec 2017 12:09:30 +0000 (20:09 +0800)]
LU-10120 lsnapshot: handle dash in fsname

'-' is a valid character for Lustre fsname. Replace "strchr()"
with "strrchr()" to correctly parse fsname from configuration.
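
The difference between the two searches can be shown with a short
sketch. It assumes the string being parsed has the form
"<fsname>-<target>", for example "lustre-MDT0000"; the exact input
lsnapshot parses is not shown in this commit message, and the helper
names are hypothetical.

```python
def fsname_first_dash(name):
    """Cut at the first '-', like strchr(): wrong if the fsname has a dash."""
    return name[:name.index("-")]


def fsname_last_dash(name):
    """Cut at the last '-', like strrchr(): keeps dashes inside the fsname."""
    return name[:name.rindex("-")]
```

For "my-fs-MDT0000" the first variant yields "my" while the second
yields "my-fs"; for "lustre-MDT0000" both agree on "lustre".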

Signed-off-by: Fan Yong <fan.yong@intel.com>
Signed-off-by: Darby Vicker <darby.vicker-1@nasa.gov>
Change-Id: Ib972288668f1b7bcf1f9188c0e9cc77027e7ceeb
Reviewed-on: https://review.whamcloud.com/30626
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
21 months agoLU-9751 snapshot: set PATH for remote zfs commands 99/27999/15
Fan Yong [Mon, 9 Apr 2018 14:55:14 +0000 (22:55 +0800)]
LU-9751 snapshot: set PATH for remote zfs commands

It is possible that the remote zfs/zpool commands for Lustre
snapshot are NOT in the remote shell's execute/search path, so
the PATH variable must be set for the remote shell commands.

It is inconvenient for the admin to specify the PATH option
in each lsnapshot command for every Lustre target, so the
patch sets the remote PATH environment variable to the
local PATH environment variable. This requires all Lustre
servers to have a broadly consistent zfs tools installation in
that PATH.

It also contains some macro definitions for code cleanup.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I2b1ce630d4aad63ab20e6c323f2222dccb51ed6e
Reviewed-on: https://review.whamcloud.com/27999
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
21 months agoLU-9764 lfsck: reset LFSCK trace file if fail to load it 97/27997/2
Fan Yong [Wed, 12 Jul 2017 04:50:41 +0000 (12:50 +0800)]
LU-9764 lfsck: reset LFSCK trace file if fail to load it

If the on-disk LFSCK trace file is corrupted, then LFSCK
may fail to load it. In that case, LFSCK should
forcibly reset (recreate) the trace file.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I0237a88ff23cdec680303ac3976a53c1632598fe
Reviewed-on: https://review.whamcloud.com/27997
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
21 months agoLU-10048 osd: async truncate 88/27488/43
Alex Zhuravlev [Wed, 7 Jun 2017 13:32:39 +0000 (17:32 +0400)]
LU-10048 osd: async truncate

osd-ldiskfs should execute truncate outside of main transaction
handle. This avoids restarting truncate transaction handles in
main transaction, and allows "transaction first, locking second"
model on OST.

Change-Id: Iffe45c42834c26ca72b65e068ad25ac61d0607c8
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: https://review.whamcloud.com/27488
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
21 months agoLU-6511 osd-ldiskfs: Fix all irregular indentation for osd_iam.c 98/19598/4
Parinay Kondekar [Sat, 26 May 2018 20:02:49 +0000 (01:32 +0530)]
LU-6511 osd-ldiskfs: Fix all irregular indentation for osd_iam.c

"osd_iam.c" had irregular and inconsistent indentation all
throughout the file. This patch fixes all the indentation
and space warnings throughout the file. There are still a few
'checkpatch' errors/warnings left. However, to keep the patch
consistent only space and indents are corrected in this patch.

Test-Parameters: trivial
Change-Id: I55f650175b7efc85f87f216d8225b0517e8a3d94
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Signed-off-by: Parinay Kondekar <parinay.kondekar@seagate.com>
Reviewed-on: https://review.whamcloud.com/19598
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
21 months agoLU-7236 ptlrpc: idle connections can disconnect 82/16682/123
Alex Zhuravlev [Mon, 28 Sep 2015 13:50:15 +0000 (16:50 +0300)]
LU-7236 ptlrpc: idle connections can disconnect

 - when new request is being allocated ptlrpc initiates
   connection if it's not connected yet
 - if the import is idle (no locks, no active RPCs, no
   non-PING reply for last osc_idle_timeout seconds),
   then pinger tries to disconnect asynchronously
 - currently only client-to-OST connections can be idle
 - lctl set_param osc.*.idle_timeout=N controls new feature:
   N=0 - disable
   N>0 - seconds to idle before disconnect
 - lctl set_param osc.*.idle_connect=N to reconnect if idle
   (N is positive number)
 - OSC module parameter osc_idle_timeout controls default
   idle timeout and set to 20 seconds by default
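
The idle rule in the list above can be sketched in Python as follows. This is only an illustration under stated assumptions: the ImportState fields and the should_disconnect() helper are hypothetical names, and whether the real comparison is strict or inclusive is not stated in the commit message.

```python
from dataclasses import dataclass


@dataclass
class ImportState:
    """Hypothetical snapshot of a client-to-OST import (illustrative names)."""
    lock_count: int
    active_rpcs: int
    last_non_ping_reply: float  # seconds, on the same clock as `now`


def should_disconnect(imp: ImportState, now: float, idle_timeout: int) -> bool:
    """Return True if the pinger may disconnect this import as idle."""
    if idle_timeout == 0:
        # N=0 disables the feature.
        return False
    if imp.lock_count > 0 or imp.active_rpcs > 0:
        # Locks or in-flight RPCs mean the import is still in use.
        return False
    # Assumption: ">=" is used here; the commit does not say which.
    return now - imp.last_non_ping_reply >= idle_timeout


print(should_disconnect(ImportState(0, 0, 100.0), now=125.0, idle_timeout=20))
```

With the default of 20 seconds, an import with no locks and no active RPCs becomes a candidate for disconnect once 20 seconds have passed since its last non-PING reply.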

Change-Id: I4b90eb5209a0b0e62d85fd55ad6e9cab8c03fd14
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: https://review.whamcloud.com/16682
Tested-by: Jenkins
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
21 months agoLU-11066 systemd: Add IB dependencies to lnet.service 46/32646/3
Nathaniel Clark [Thu, 31 May 2018 14:45:47 +0000 (10:45 -0400)]
LU-11066 systemd: Add IB dependencies to lnet.service

Add ordering for in-kernel (rdma.service) and Mellanox MOFED
(openibd.service).

This ensures that systemd will shutdown lnet prior to IB, thus
preventing it from hanging.

Test-Parameters: trivial
Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: Ia0be1ca60eb8f54edd2f4f6bfbca10cbc01cc638
Reviewed-on: https://review.whamcloud.com/32646
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
21 months agoLU-11049 ssk: correctly handle null byte by lgss_sk 10/32510/4
Sebastien Buisson [Tue, 22 May 2018 15:50:53 +0000 (17:50 +0200)]
LU-11049 ssk: correctly handle null byte by lgss_sk

lgss_sk must include null byte with fsname and nodemap info taken from
command line.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ie98444c930b8df521482468c4897e080ded0d2f6
Reviewed-on: https://review.whamcloud.com/32510
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jeremy Filizetti <jeremy.filizetti@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
21 months agoLU-10680 mdd: create gc thread when no current transaction 76/31376/40
Bruno Faccini [Thu, 22 Feb 2018 15:23:18 +0000 (16:23 +0100)]
LU-10680 mdd: create gc thread when no current transaction

A kthread cannot be created while a journal transaction is being
filled, because a deadlock can occur if memory reclaim is
triggered by kthreadd when forking the new thread: shrinkers could
then attempt I/O to the same device, requiring a new journal
transaction to be started while the current one can never complete.

Thus this patch moves kthread_run() of gc_task into mdd_trans_stop().

The comment in mdd_changelog_max_idle_time_seq_write() has been updated
to reflect the need to limit the value to about 68 years, which allows
32-bit operands to be kept for comparison.
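
The "about 68 years" figure can be checked with a short calculation. The sketch below assumes the limit comes from the largest value of a signed 32-bit integer counted in seconds; the commit message does not state this explicitly.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # Julian year, in seconds

max_signed_32 = 2**31 - 1               # largest signed 32-bit value
print(round(max_signed_32 / SECONDS_PER_YEAR, 1))  # 68.0
```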

As it will go away with recent kernels, get_seconds() usage has
been replaced by calling ktime_get_real_seconds() for user idle
time initialization and comparison.

Also, enable Changelog GC, as it is no longer the default, in the
sanity/test_160f sub-test and remove that test from ALWAYS_EXCEPT to
re-enable it, leaving 160g excluded because of LU-10734. In
addition, sanity/test_160f has been changed to make
it fully DNE-compatible.

With this patch, GC-thread can be stopped upon MDT umount, and
remaining orphan ChangeLog records clean-up will occur upon next
restart. New sanity/test_160h sub-test checks this scenario.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I7ec076bc04594b230c57348d7ac92acc58c258e1
Reviewed-on: https://review.whamcloud.com/31376
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
21 months agoLU-11069 llite: correct file position after appending writes 41/32641/3
John L. Hammond [Wed, 6 Jun 2018 13:14:50 +0000 (08:14 -0500)]
LU-11069 llite: correct file position after appending writes

In ll_file_io_generic() use the position returned in the kiocb to set
the returned file position. This ensures that the file position is set
correctly after an appending write. Add sanity test_23d() to check
that calling lseek() for the current offset returns the correct value
in this situation.
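
The behaviour being checked can be reproduced on any POSIX file system with a short Python sketch. It is not the sanity test_23d script itself; the helper name and the use of a temporary file are assumptions for illustration.

```python
import os
import tempfile


def position_after_append(initial: bytes, appended: bytes) -> tuple:
    """Append with O_APPEND and return (lseek position, file size)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, initial)
        os.close(fd)
        # Reopen in append mode; the offset starts at 0 but every
        # write is placed at the end of the file.
        fd = os.open(path, os.O_WRONLY | os.O_APPEND)
        os.write(fd, appended)
        pos = os.lseek(fd, 0, os.SEEK_CUR)  # query current offset only
        os.close(fd)
        return pos, os.path.getsize(path)
    finally:
        os.unlink(path)


pos, size = position_after_append(b"abc", b"defg")
print(pos == size)  # True when the position is reported correctly
```

After an appending write the current offset should equal the new file size, which is what lseek() with SEEK_CUR is expected to report.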

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ic76ce49db6e87d5294e18546d5b75a12793aa99c
Reviewed-on: https://review.whamcloud.com/32641
Tested-by: Jenkins
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
21 months agoLU-10419 lfsck: signal master engine when stop 27/31627/5
Fan Yong [Fri, 20 Apr 2018 21:53:50 +0000 (05:53 +0800)]
LU-10419 lfsck: signal master engine when stop

It is possible that during LFSCK scanning, some server, MDT
or OST, may be offline. If the LFSCK then needs to talk
to such an offline server, the related RPC will trigger a reconnect
to it, and the LFSCK engine has to wait until the
offline server comes back online or someone deactivates the server by
force. To avoid blocking lfsck_stop() in such a case,
the stop logic sends a SIGINT signal to the LFSCK engines. However,
this was only done for the LFSCK assistant engines and not
for the LFSCK master engine. This patch fixes that.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I5d51ab49524e8ae54f0853e93b94e78913f65e8a
Reviewed-on: https://review.whamcloud.com/31627
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
21 months agoLU-11058 tests: stop running sanity test 77k 85/32685/2
James Nunez [Fri, 8 Jun 2018 18:46:10 +0000 (12:46 -0600)]
LU-11058 tests: stop running sanity test 77k

sanity test 77k is failing for a variety of Lustre
file system configurations. Stop running test 77k by
adding it to the ALWAYS_EXCEPT list.

When this issue is resolved, we need to resume running
sanity test 77k by removing it from the ALWAYS_EXCEPT list.

Test-Parameters: trivial
Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I3cd53e721b1b3ede633603273dafd54c9f5701c4
Reviewed-on: https://review.whamcloud.com/32685
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Wei Liu <wei3.liu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
21 months agoLU-11054 lnet: remove non-error error message 60/32560/2
John L. Hammond [Fri, 25 May 2018 14:36:31 +0000 (09:36 -0500)]
LU-11054 lnet: remove non-error error message

In lnet_ipif_enumerate(), remove the CERROR() that prints each device.

Test-Parameters: trivial
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ida8d1636e9e608087205defabda865f930fd38a1
Reviewed-on: https://review.whamcloud.com/32560
Tested-by: Jenkins
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Sonia Sharma <sonia.sharma@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
21 months agoLU-11043 kernel: kernel update RHEL7.5 [3.10.0-862.3.2.el7] 13/32513/3
Bob Glossman [Mon, 21 May 2018 23:20:05 +0000 (16:20 -0700)]
LU-11043 kernel: kernel update RHEL7.5 [3.10.0-862.3.2.el7]

update RHEL 7.5 kernel to 3.10.0-862.3.2.el7

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I0defa14e83ce098c48b3228b4867afa73a2d9185
Reviewed-on: https://review.whamcloud.com/32513
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Cliff White <cliff.white@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
21 months agoLU-10808 lod: remove DoM component if DoM is disabled 82/32482/5
Mikhail Pershin [Mon, 21 May 2018 18:24:05 +0000 (21:24 +0300)]
LU-10808 lod: remove DoM component if DoM is disabled

If file is created with DoM component but server disables
DoM file creation then remove DoM entry from file layout
and keep other components.
If layout has only DoM entry then just return error.
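
A minimal Python sketch of this rule is shown below. The layout representation (a list of dicts with a boolean 'dom' key), the helper name, and the exception type are all illustrative assumptions, not the lod data structures.

```python
def strip_dom_component(components, dom_enabled):
    """Drop DoM entries from a layout when the server has DoM disabled."""
    if dom_enabled:
        return list(components)
    remaining = [c for c in components if not c["dom"]]
    if not remaining:
        # Layout consisted only of the DoM entry: nothing left to keep.
        raise ValueError("layout has only a DoM component")
    return remaining


layout = [{"name": "dom", "dom": True}, {"name": "ost", "dom": False}]
print(strip_dom_component(layout, dom_enabled=False))
```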

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: Ibafd0269d76dc5de4599efca064930607dc556eb
Reviewed-on: https://review.whamcloud.com/32482
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
21 months agoLU-11014 mdt: intent handling simplification 57/32357/3
John L. Hammond [Fri, 11 May 2018 14:01:32 +0000 (09:01 -0500)]
LU-11014 mdt: intent handling simplification

Remove the obsolete constants MDT_IT_CREATE, MDT_IT_READDIR,
MDT_IT_UNLINK, and MDT_IT_TRUNC from enum mdt_it_code. Also remove
MDT_IT_OCREAT, since (at this level) it can be handled identically to
MDT_IT_OPEN. Rename mdt_intent_reint() to mdt_intent_open() since it
only handles open. Move the definition of the mdt_it_flavor array down
and remove the then unneeded forward declarations of mdt_intent_*().
In struct mdt_it_flavor, remove the obsolete it_reint member and
rename the it_flags member to it_handler_flags to avoid confusion with
LDLM flags. Use 'enum tgt_handler_flags' rather than __u32 for several
parameters used to hold values of that type.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I297ef397c879fcc7711d725e0315e73439d95826
Reviewed-on: https://review.whamcloud.com/32357
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
21 months agoLU-10977 test: add version check to sanity test_60ab 43/32343/4
Saurabh Tandan [Wed, 9 May 2018 21:27:15 +0000 (14:27 -0700)]
LU-10977 test: add version check to sanity test_60ab

Skip sanity.sh test_60ab if the server version is less than
or equal to 2.11.51

Test-Parameters:trivial testlist=sanity envdefinitions=ONLY=60ab serverjob=lustre-b2_10 serverbuildno=69
Signed-off-by: Saurabh Tandan <saurabh.tandan@intel.com>
Change-Id: Ie9d2728790e19ac2a24c94e7c13ade28b5a5bbbe
Reviewed-on: https://review.whamcloud.com/32343
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Wei Liu <wei3.liu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
21 months agoLU-4423 obd: backport of lu_object changes upstream 25/32325/3
NeilBrown [Wed, 9 May 2018 02:46:29 +0000 (22:46 -0400)]
LU-4423 obd: backport of lu_object changes upstream

fold lu_object_new() into lu_object_find_at()

lu_object_new() duplicates a lot of code that is in
lu_object_find_at().
There is no real need for a separate function, it is simpler just
to skip the bits of lu_object_find_at() that we don't
want in the LOC_F_NEW case.

Linux-commit: 775c4dc274343e5e2959fa1171baf2fc01028840

discard extra lru count.

lu_object maintains 2 lru counts.
One is a per-bucket lsb_lru_len.
The other is the per-cpu ls_lru_len_counter.

The only times the per-bucket counters are used are:
 - a debug message when an object is added
 - in lu_site_stats_get when all the counters are combined.

The debug message is not essential, and the per-cpu counter
can be used to get the combined total.

So discard the per-bucket lsb_lru_len.

Linux-commit: e167b370360f8887cf21a2a82f83e7118a2aeb11

make struct lu_site_bkt_data private

This data structure only needs to be public so that
various modules can access a wait queue to wait for object
destruction.
If we provide a function to get the wait queue, rather than the
whole bucket, the structure can be made private.

Linux-commit: bc5e7fb40d36edb95ce8f661596811bec3f7d5cf

Change-Id: I26203f331a0c73ae4e23878eb10b15d9fcf546c5
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32325
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
21 months agoLU-10971 tests: use changelog routines in lustre-rsync-test 08/32208/4
James Nunez [Mon, 30 Apr 2018 19:12:49 +0000 (13:12 -0600)]
LU-10971 tests: use changelog routines in lustre-rsync-test

The lustre-rsync-test script has two subroutines to register
and deregister changelog users. These subroutines should be
updated to use changelog_register() and changelog_deregister()
found in test-framework.sh.

Test-Parameters: trivial clientcount=2 mdscount=2 mdtcount=4 osscount=1 ostcount=8 mdtfilesystemtype=zfs ostfilesystemtype=zfs testlist=lustre-rsync-test
Test-Parameters: clientcount=2 mdscount=2 mdtcount=4 osscount=1 ostcount=8 mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs testlist=lustre-rsync-test
Test-Parameters: clientcount=2 mdscount=1 mdtcount=1 osscount=1 ostcount=8 mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs testlist=lustre-rsync-test
Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: Ia54095a6e039f6835def0f9c49157b71088d9e51
Reviewed-on: https://review.whamcloud.com/32208
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Wei Liu <wei3.liu@intel.com>
Reviewed-by: Saurabh Tandan <saurabh.tandan@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
21 months agoLU-10808 lod: align wrong DoM stripe values with defaults 73/32073/5
Mikhail Pershin [Thu, 19 Apr 2018 13:29:54 +0000 (16:29 +0300)]
LU-10808 lod: align wrong DoM stripe values with defaults

- Align DoM component size to the server limit size instead of
  returning an error. An error is still returned if DoM file creation
  is disabled on the server (DOM limit is set to 0)
- Correct wrong values for dom_stripesize parameter by using minimal
  stripe size if provided value is lower and by aligning it to be a
  multiple of that minimal size.
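
The two adjustments can be sketched in Python as follows. Several details are assumptions for illustration: the function and parameter names, the error code raised when DoM is disabled, the 64 KiB minimal stripe size used in the example, and the choice to round down when aligning (the commit message does not give the direction).

```python
import errno


def align_dom_component(requested: int, dom_limit: int) -> int:
    """Clamp a requested DoM component size to the server limit."""
    if dom_limit == 0:
        # DoM file creation is disabled; errno is a placeholder.
        raise OSError(errno.EPERM, "DoM is disabled on the server")
    return min(requested, dom_limit)


def fix_dom_stripesize(value: int, min_stripe: int) -> int:
    """Raise too-small values to min_stripe, else round down to a multiple."""
    if value < min_stripe:
        return min_stripe
    return value - (value % min_stripe)


MIN_STRIPE = 64 * 1024  # hypothetical minimal stripe size
print(align_dom_component(2 << 20, 1 << 20))   # clamped to the 1 MiB limit
print(fix_dom_stripesize(1000, MIN_STRIPE))    # raised to the minimum
```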

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: Ifcdf60fddda65acda92509bb7e69c9b2951fb6bd
Reviewed-on: https://review.whamcloud.com/32073
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
21 months agoLU-4423 ptlrpc: use delayed_work in sec_gc 24/31724/3
Dmitry Eremin [Thu, 22 Mar 2018 15:51:00 +0000 (18:51 +0300)]
LU-4423 ptlrpc: use delayed_work in sec_gc

The garbage collection for security contexts currently has a dedicated
kthread which wakes up every 30 minutes to discard old garbage.

Replace this with a simple delayed_work item on the system work queue.

Change-Id: I5cdb023783104b5e21f4139731065946ed162af1
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/31724
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
21 months agoLU-10648 ldlm: Reduce debug to console during eviction 37/31237/4
Patrick Farrell [Fri, 26 Aug 2016 16:03:33 +0000 (11:03 -0500)]
LU-10648 ldlm: Reduce debug to console during eviction

During an eviction, Lustre calls ldlm_namespace_cleanup,
and it will sometimes end up dumping all of the locks on a
particular resource to the console log
(ldlm_resource_complain), which is very wasteful and only
rarely helpful.

Move the debug level for this to D_NETERROR since it is in the
default debug mask.

Change-Id: I8a00f030393ce1748914d70fa8edb4690273e08a
Cray-bug-id: LUS-1418
Signed-off-by: Chris Horn <hornc@cray.com>
Signed-off-by: Patrick Farrell <paf@cray.com>
Reviewed-on: https://review.whamcloud.com/31237
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
21 months agoLU-10472 osc: add T10PI support for RPC checksum 80/30980/37
Li Xi [Tue, 23 Jan 2018 07:17:17 +0000 (02:17 -0500)]
LU-10472 osc: add T10PI support for RPC checksum

T10 Protection Information (T10 PI), previously known as Data
Integrity Field (DIF), is a standard for end-to-end data integrity
validation. T10 PI prevents silent data corruption, ensuring that
incomplete and incorrect data cannot overwrite good data.

The Lustre file system already supports an RPC-level checksum which
validates the data in bulk RPCs when writing/reading data to/from
objects on OSTs. The RPC-level checksum can detect data corruption
that happens while an RPC is being transferred over the wire. However,
it is not capable of preventing silent data corruption that happens
under other conditions, for example, memory corruption while data is
cached in the page cache. And with the existing checksum mechanism,
only disjoint protection coverage is provided. Thus, in order to
provide end-to-end data protection, T10PI support should be added to
Lustre.

In order to provide end-to-end data integrity validation, the T10 PI
checksum of data in a sector needs to be calculated on Lustre client
side and validated later on the Lustre OSS side. The T10 protection
information should be sent together with the data in the RPC.
However, in order to avoid significant performance degradation,
instead of sending all original guard tags for all sectors in a bulk
RPC, the existing checksum feature of bulk RPC will be integrated
together with the new T10PI feature.

When OST starts, necessary T10PI information will be extracted from
storage, i.e. the T10PI DIF type and sector size. The DIF type could
be one of TYPE1_IP, TYPE1_CRC, TYPE3_IP and TYPE3_CRC. And sector
size could be either 512 or 4K bytes.

When an OSC is connecting to OST, OSC and OST will negotiate about
the checksum types. New checksum types are added for T10PI support
including OBD_CKSUM_T10IP512, OBD_CKSUM_T10IP4K, OBD_CKSUM_T10CRC512,
and OBD_CKSUM_T10CRC4K. If the OST storage has T10PI support, the
only selectable T10PI checksum type is the one that matches the
T10PI type of the hardware. The other existing checksum types (crc32,
crc32c, adler32) are still valid options for the RPC checksum type.

When calculating the RPC checksum with T10PI, the T10PI checksums of
all sectors are calculated first using the T10PI checksum type, i.e.
16-bit CRC or IP checksum. Then the RPC checksum is calculated over
all of the T10PI checksums. The RPC checksum type used in this step is
always adler32. Considering that the checksum-of-checksums is only
computed on a 4KB chunk of GRD tags for a 1MB RPC for 512B sectors,
or 16KB of GRD tags for 16MB of 4KB sectors, this is only 1/256 or
1/1024 of the total data being checksummed, so the checksum type used
here should not affect overall system performance noticeably.
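
The two-step scheme described above can be sketched in Python. This is an illustration under assumptions, not the Lustre implementation: it uses the IP-checksum flavour of guard tag (a 16-bit ones' complement sum, 2 bytes per sector), the helper names are hypothetical, and the input length is assumed to be a multiple of the sector size.

```python
import struct
import zlib


def ip_checksum16(sector: bytes) -> int:
    """16-bit ones' complement checksum of one sector (even length)."""
    total = 0
    for (word,) in struct.iter_unpack("!H", sector):
        total += word
    while total >> 16:
        # Fold carries back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def rpc_checksum(data: bytes, sector_size: int = 512):
    """Return (adler32 over all guard tags, number of guard-tag bytes)."""
    tags = bytearray()
    for off in range(0, len(data), sector_size):
        tag = ip_checksum16(data[off:off + sector_size])
        tags += struct.pack("!H", tag)
    # Step 2: checksum of checksums, always adler32.
    return zlib.adler32(bytes(tags)), len(tags)


cksum, tag_bytes = rpc_checksum(bytes(1 << 20), sector_size=512)
print(tag_bytes)  # 4096 guard-tag bytes for a 1 MiB buffer
```

For a 1 MiB buffer with 512-byte sectors, adler32 runs over only 4096 bytes of guard tags, i.e. 1/256 of the data.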

obdfilter.*.enforce_t10pi_cksum can be used to tune whether to enforce
T10-PI checksum or not.

If the OST supports the T10-PI feature and the T10-PI checksum is enforced,
clients will have no choice of RPC checksum type other than the T10PI
checksum type. This is useful for enforcing end-to-end integrity in the
whole system.

If the OST doesn't support the T10-PI feature and the T10-PI checksum is
enforced, then together with other checksums with reasonably good speeds
(e.g. crc32, crc32c, adler, etc.), all the T10-PI checksum types (t10ip512,
t10ip4K, t10crc512, t10crc4K) will be added to the available checksum types,
regardless of the speeds of the T10-PI checksums. This is useful for testing
T10-PI checksums of RPCs.

If the OST supports the T10-PI feature and the T10-PI checksum is NOT
enforced, the corresponding T10-PI checksum type will be added to the
checksum type list, regardless of the speed of the T10-PI checksum. This
gives clients the flexibility to choose whether to enable end-to-end
integrity or not.

If the OST does NOT support the T10-PI feature and the T10-PI checksum is
NOT enforced, then together with other checksums with reasonably good
speeds, all the T10-PI checksum types with good speeds will be added to the
checksum type list. Note that a T10-PI checksum type with a speed worse
than half of adler will NOT be added as an option. In this circumstance,
T10-PI checksum types behave the same as other normal checksum
types.
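
The four cases above can be summarized in a small Python sketch. It is an interpretation, not the obdfilter code: the function and parameter names are hypothetical, and it assumes that "reasonably good speed" for the normal checksum types uses the same half-of-adler cut-off that the text gives for T10-PI types.

```python
T10_TYPES = ("t10ip512", "t10ip4k", "t10crc512", "t10crc4k")


def available_cksum_types(speeds, hw_t10_type=None, enforce=False):
    """Return the checksum types offered to a connecting client.

    speeds maps checksum name -> MB/s; hw_t10_type is the T10-PI type
    of the OST storage, or None if the storage has no T10-PI support.
    """
    threshold = speeds["adler"] / 2  # assumed cut-off for "good speed"

    def fast(name):
        return speeds[name] >= threshold

    normal = [t for t in speeds if t not in T10_TYPES and fast(t)]

    if hw_t10_type and enforce:
        return [hw_t10_type]                 # no other choice
    if hw_t10_type:
        return normal + [hw_t10_type]        # added regardless of speed
    if enforce:
        return normal + list(T10_TYPES)      # all T10 types, any speed
    return normal + [t for t in T10_TYPES if fast(t)]


# Hypothetical speeds in MB/s, chosen so that t10crc512 is "slow".
speeds = {"crc": 1600, "crc32c": 9800, "adler": 1200,
          "t10ip512": 6200, "t10ip4k": 7900,
          "t10crc512": 500, "t10crc4k": 1500}
print(available_cksum_types(speeds, hw_t10_type="t10crc512", enforce=True))
print(available_cksum_types(speeds))
```

In this example the slow t10crc512 type is still offered when the hardware provides it or when enforcement is on, and is dropped only in the last case.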

Clients that have no T10-PI RPC checksum support will not be affected
by the above-mentioned logic. That logic is only applied to
clients that connect after obdfilter.*.enforce_t10pi_cksum is changed on
an OST.

Following are the speeds of different checksum types on a server with CPU
of Intel(R) Xeon(R) E5-2650 @ 2.00GHz:

crc: 1575 MB/s
crc32c: 9763 MB/s
adler: 1255 MB/s
t10ip512: 6151 MB/s
t10ip4k: 7935 MB/s
t10crc512: 1119 MB/s
t10crc4k: 1531 MB/s

Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: I6468680edeab0917bb71dbd8cd9ea16c65e935f5
Reviewed-on: https://review.whamcloud.com/30980
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
21 months agoLU-8066 llite: Preparation to move /proc/fs/lustre/llite to sysfs 31/24031/23
James Simmons [Fri, 25 May 2018 01:16:18 +0000 (21:16 -0400)]
LU-8066 llite: Preparation to move /proc/fs/lustre/llite to sysfs

Add necessary infrastructure, add support for mountpoint
registration in /sys/fs/lustre/llite

This is a heavily modified version of

Linux-commit: fd0d04ba85f95169106701397417360541a983b3

due to the large amount of changes to the OpenSFS/Intel branch.

Change-Id: Ic9ca2044249a59dc79ebc86553c8b7ce7afbf710
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/24031
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
21 months agoLU-10264 misc: fix possible array overflow 42/32242/2
Andreas Dilger [Fri, 9 Mar 2018 23:18:53 +0000 (16:18 -0700)]
LU-10264 misc: fix possible array overflow

Fix a static analysis error.

lustre/obdclass/obd_mount_server.c:1830 in osd_start(), buffer
    flagstr has size 16 but length of format string "%lu:%lu" is 31.
Increase the size of buffer to hold maximal-sized strings plus NUL.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I3cc80d66bbb537161a561f4f2ba7830dde2cab07
Reviewed-on: https://review.whamcloud.com/32242
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
21 months agoLU-8972 tests: remove conf-sanity test from ALWAYS_EXCEPT 20/32220/4
James Nunez [Tue, 1 May 2018 19:20:28 +0000 (13:20 -0600)]
LU-8972 tests: remove conf-sanity test from ALWAYS_EXCEPT

A patch landed to fix the issue reported in LU-8972. We need
to run conf-sanity test 101 to ensure that the issue is
fixed and does not regress.

Remove conf-sanity test 101 from the ALWAYS_EXCEPT list.

Test-Parameters: trivial
Test-Parameters: trivial clientcount=2 mdscount=2 mdtcount=4 osscount=1 ostcount=8 mdtfilesystemtype=zfs ostfilesystemtype=zfs testlist=conf-sanity
Test-Parameters: clientcount=2 mdscount=2 mdtcount=4 osscount=1 ostcount=8 mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs testlist=conf-sanity
Test-Parameters: clientcount=2 mdscount=1 mdtcount=1 osscount=1 ostcount=8 mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs testlist=conf-sanity
Test-Parameters: clientcount=2 mdscount=1 mdtcount=1 osscount=1 ostcount=8 mdtfilesystemtype=zfs ostfilesystemtype=zfs testlist=conf-sanity
Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: Ic678c7527a60cab2de6139041cca81017d4aa75e
Reviewed-on: https://review.whamcloud.com/32220
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Saurabh Tandan <saurabh.tandan@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
21 months agoLU-6160 osd-zfs: Fix refcount_add call 44/28544/2
Giuseppe Di Natale [Mon, 14 Aug 2017 16:51:52 +0000 (09:51 -0700)]
LU-6160 osd-zfs: Fix refcount_add call

Correct the refcount_add in osd-zfs module's osd_fix_new_dnode
function. The variable 'tag' was undefined and caused osd-zfs
to fail builds against zfs packages with debug enabled.

This small change should enable lustre to be built against
zfs packages that have debug enabled.

Test-Parameters: trivial
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Change-Id: If95f0af6178cf0ea78724658edfaece1ee16a3f1
Reviewed-on: https://review.whamcloud.com/28544
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
22 months agoLU-9727 tests: exercise new changelog fields and records 35/32335/4
Sebastien Buisson [Fri, 19 Jan 2018 17:22:40 +0000 (02:22 +0900)]
LU-9727 tests: exercise new changelog fields and records

Add new tests in sanity-hsm to exercise new changelog fields
and also record types.

Test-Parameters: trivial testlist=sanity-hsm
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Signed-off-by: Quentin Bouget <quentin.bouget@cea.fr>
Change-Id: I1cd7282983d936105e1616aa859c47fd453e6017
Reviewed-on: https://review.whamcloud.com/32335
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>