Whamcloud - gitweb
fs/lustre-release.git
18 months ago LU-10986 lfs: make lfs project tolerant errors 43/32243/16
Wang Shilong [Wed, 2 May 2018 08:54:15 +0000 (16:54 +0800)]
LU-10986 lfs: make lfs project tolerant errors

This patch tries to fix the following problems:
1) The command hangs on a pipe file, reproduced by the following steps:
 $ mkfifo tmp/pipe
 $ lfs project -srp 500 tmp -->this will never finish.

The problem is that opening a pipe file blocks by default
without the O_NONBLOCK or O_NDELAY flag.

2) If a symbolic link with a missing target exists, the command
returns an error and does not process the remaining entries.

We should fix this problem by allowing the command to continue
processing even if it hits some errors.

3) Fix a wrong check for MAX_PATH.
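
As an illustration only (this is not the code from the patch), the first
two points can be sketched in plain POSIX C: open each entry with the
non-blocking flag so a FIFO without a writer cannot hang the walk, and keep
going after a failure while remembering the first error. The helper names
open_nonblock() and process_all() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L	/* expose POSIX open() flags */
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a path read-only without blocking on a
 * FIFO that has no writer. Returns a file descriptor or -errno.
 */
static int open_nonblock(const char *path)
{
	int fd = open(path, O_RDONLY | O_NONBLOCK);

	return fd < 0 ? -errno : fd;
}

/*
 * Visit every path, continue after a failure, and return the first
 * error seen (0 if every entry could be opened). *ok counts the
 * entries that were processed successfully.
 */
static int process_all(const char *const *paths, size_t count, size_t *ok)
{
	int first_err = 0;
	size_t i;

	*ok = 0;
	for (i = 0; i < count; i++) {
		int fd = open_nonblock(paths[i]);

		if (fd < 0) {
			if (first_err == 0)
				first_err = fd;
			continue;	/* tolerate the error and go on */
		}
		(*ok)++;
		close(fd);
	}
	return first_err;
}
```

Returning the first error while still visiting the remaining entries keeps
a non-zero exit status for scripts without stopping at the first bad entry.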

Test-Parameters: trivial testlist=sanity-quota,sanity-quota
Change-Id: I7d08a7547e6b1351a1eff23063da6cd9c4cdc5e3
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/32243
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-11086 test: reset quota setting properly 07/32707/3
Wang Shilong [Wed, 13 Jun 2018 14:12:16 +0000 (22:12 +0800)]
LU-11086 test: reset quota setting properly

Some test cases don't reset the quota settings properly, which
makes running sanity-quota.sh several times fail. This patch
tries to address the problem by:

1) Resetting the quota settings before check_runas_id_ret, as it will
touch a file, which might hit EDQUOT if the quota settings were not
cleaned up properly after the last run.

2) Fixing the quota reset for test cases 55 and 60.

3) Resetting the quota settings again after all tests have finished,
because tests that run after sanity-quota.sh might be affected if the
quota settings are not reset properly for some reason.

Test-Parameters: trivial testlist=sanity-quota,sanity-quota
Change-Id: I2983102ea379e64173ef8c54b149ba3b5fbfebe9
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/32707
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-10734 tests: ensure current GC interval is over 04/31604/9
Bruno Faccini [Fri, 9 Mar 2018 01:59:51 +0000 (02:59 +0100)]
LU-10734 tests: ensure current GC interval is over

In sanity/test_160g, ensure the currently configured
"changelog_min_gc_interval=2" has elapsed, to allow the
GC thread to be effectively started.

Also, enable Changelog GC, as it is no longer the
default, in the sanity/test_160g sub-test and remove
it from ALWAYS_EXCEPT to re-enable it, leaving
160f excluded because of LU-10680.

sanity/test_160g has also been reworked to become
fully DNE aware.

Test-Parameters: trivial
Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I8a079ba2ba1822b488f65ad9703204d6296fada0
Reviewed-on: https://review.whamcloud.com/31604
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-11079 llite: control concurrent statahead instances 90/32690/7
Fan Yong [Wed, 13 Jun 2018 14:33:55 +0000 (22:33 +0800)]
LU-11079 llite: control concurrent statahead instances

It was found that if there are too many concurrent statahead
instances, the related statahead RPCs may accumulate on the
client import (for MDT) RPC lists
(imp_sending_list/imp_delayed_list/imp_unreplied_list), which
seriously affects the efficiency of spin_lock when the MDT is
overloaded or in recovery. As a temporary solution,
restrict the number of concurrent statahead instances.

To support more concurrent statahead instances, please
consider decentralizing the RPC lists attached to the related import.
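
A cap on concurrent instances is commonly built from an atomic counter
that is incremented on start and rolled back when the limit is exceeded.
The sketch below is a userspace illustration using C11 atomics, not the
llite code; SA_MAX_RUNNING, sa_try_start() and sa_stop() are hypothetical
names and the limit value is an assumption.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SA_MAX_RUNNING 8	/* assumed limit, for illustration only */

static atomic_int sa_running;	/* number of instances currently running */

/* Try to reserve a slot; returns false when the limit is reached. */
static bool sa_try_start(void)
{
	if (atomic_fetch_add(&sa_running, 1) >= SA_MAX_RUNNING) {
		atomic_fetch_sub(&sa_running, 1);	/* roll back */
		return false;
	}
	return true;
}

/* Release a slot reserved by a successful sa_try_start(). */
static void sa_stop(void)
{
	atomic_fetch_sub(&sa_running, 1);
}
```

A caller that fails to reserve a slot would simply skip statahead for that
directory rather than wait.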

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I7251cc536f11d184f768e3d3704ba6717644541e
Reviewed-on: https://review.whamcloud.com/32690
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-10893 tests: allow to disable dm-flakey layer 58/32658/4
Alexander Boyko [Thu, 7 Jun 2018 13:54:41 +0000 (09:54 -0400)]
LU-10893 tests: allow to disable dm-flakey layer

The patch 54b9e3f789358bd9dfb94b77fe33a4faa1e28ab2 adds the flakey layer
to the test framework. But it also introduces a regression: you can't run
tests separately from a setup. Before dm-flakey, it was easy to create a
configuration at ncli, set up a cluster, and start a test. Now it
is impossible. For example:
sudo MDSDEV=/dev/sdb MDSDEV1=/dev/sdb sh lustre/tests/llmount.sh
sudo MDSDEV=/dev/sdb MDSDEV1=/dev/sdb ONLY=0 sh lustre/tests/conf-sanity.sh
Format mds1: /dev/sdb
mkfs.lustre FATAL: Unable to build fs /dev/sdb (256)
mkfs.lustre FATAL: mkfs failed 256

The fix allows disabling the dm-flakey layer with the option FLAKEY=false.

Test-Parameters: envdefinitions=FLAKEY=false
Signed-off-by: Alexander Boyko <c17825@cray.com>
Cray-bug-id: LUS-5851
Change-Id: I248be2307cff5fe6b4b2524478ca8e4cd96a77d2
Reviewed-on: https://review.whamcloud.com/32658
Reviewed-by: Elena Gryaznova <c17455@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-11064 lnd: determine gaps correctly 86/32586/4
Amir Shehata [Wed, 30 May 2018 20:22:11 +0000 (13:22 -0700)]
LU-11064 lnd: determine gaps correctly

We're allowed to start at a non-aligned page offset in the first
fragment and end at a non-aligned page offset in the last fragment.

When checking the iovec, exclude both the first and last fragments
from the tx_gaps check.
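
The rule can be sketched as follows, under the simplifying assumptions
that every fragment covers at most one page and that the page size is
4096 bytes. This is an illustration, not the LND code; struct frag and
frags_have_gaps() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

struct frag {
	size_t offset;	/* byte offset of the data within its page */
	size_t len;	/* bytes used; offset + len <= page size */
};

/*
 * A fragment list has a gap when a fragment other than the first
 * starts mid-page, or a fragment other than the last ends mid-page.
 */
static bool frags_have_gaps(const struct frag *f, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		/* only the first fragment may start at a non-aligned offset */
		if (i > 0 && f[i].offset != 0)
			return true;
		/* only the last fragment may end at a non-aligned offset */
		if (i + 1 < n && f[i].offset + f[i].len != SKETCH_PAGE_SIZE)
			return true;
	}
	return false;
}
```

With this rule a single fragment never has gaps, whatever its offset and
length.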

Test-Parameters: trivial
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I8a9231db7db404a5d5a6294ff263c1bd2ac28e6c
Reviewed-on: https://review.whamcloud.com/32586
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-11117 ptlrpc: don't zero request handle 81/32781/4
Alexander Boyko [Fri, 15 Jun 2018 09:02:36 +0000 (05:02 -0400)]
LU-11117 ptlrpc: don't zero request handle

LNet can retransmit a request at any time if it has not been replied to.
ptlrpc_resend_req zeroes the request handle and ptlrpc_send_rpc
sets it. If a retransmission happens with a zeroed handle, the client
can't find a valid export by handle, sets rq_export to NULL, and
replies with ENOTCONN. The server evicts the client on this error:

client (nid x.x.x.x@tcp) returned error from blocking AST
(req status -107 rc -107), evict it

Signed-off-by: Alexander Boyko <c17825@cray.com>
Cray-bug-id: LUS-6037
Change-Id: I198666d386fea99b46994f965c1519acb5743d75
Reviewed-on: https://review.whamcloud.com/32781
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-7816 quota: add default quota setting support 06/32306/16
Hongchao Zhang [Tue, 5 Jun 2018 22:23:42 +0000 (18:23 -0400)]
LU-7816 quota: add default quota setting support

This adds a function similar to one in GPFS, which is a friendly
feature for cluster administrators managing quotas.

Lazy default quota setting support; here is the basic idea:

The default quota setting is a global quota setting for user, group,
and project quotas. If a default quota is set for one quota type,
newly created users/groups/projects will inherit this setting
automatically. Since Lustre itself has no idea when new
users are created, it only learns of them when these users try to
acquire space from Lustre.

So we implement lazy inheritance of the quota setting: the slave first
checks whether a default quota setting exists; if it does, the
slave is forced to acquire quota from the master, and the master detects
whether the default quota is set, then sets this quota and also
returns the proper granted space to the slave.

To implement this and reuse the existing quota APIs, we manage
the default quota in the quota record of ID 0, and enforce the
quota check when reading the quota record from disk.

In the current Lustre implementation, the grace time is either
the time or the timestamp to be used after some quota ID exceeds
the soft limit, so 48 bits should be enough for it and its high 16 bits
can be used for quota flags. This patch uses one of
them as the default quota flag.

The global quota record used by the default quota sets its soft
and hard limits to zero; its grace time contains the default flag.
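
The bit layout described above can be sketched like this. The macro and
function names, and the choice of bit 48 for the flag, are assumptions made
for illustration; the patch may use different names and a different bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: low 48 bits = grace time, high 16 bits = flags. */
#define GRACE_TIME_BITS		48
#define GRACE_TIME_MASK		((UINT64_C(1) << GRACE_TIME_BITS) - 1)
#define GRACE_FLAG_DEFAULT	(UINT64_C(1) << GRACE_TIME_BITS) /* hypothetical bit */

/* Mark a grace-time field as carrying the default-quota flag. */
static uint64_t grace_set_default(uint64_t grace)
{
	return grace | GRACE_FLAG_DEFAULT;
}

/* Extract the time (or timestamp) part, ignoring the flag bits. */
static uint64_t grace_time(uint64_t grace)
{
	return grace & GRACE_TIME_MASK;
}

/* Test whether the default-quota flag is set. */
static bool grace_is_default(uint64_t grace)
{
	return (grace & GRACE_FLAG_DEFAULT) != 0;
}
```

Any code that compares or prints the grace time has to mask the flags off
first, which is what grace_time() stands for here.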

Use lfs setquota -U/-G/-P <mnt> to set default quota.
Use lfs setquota -u/-g/-p foo -d <mnt> to set foo to use default quota
Use lfs quota -U/-G/-P <mnt> to show default quota.

Test-Parameters: envdefinitions=DEBUG_SIZE=64

Change-Id: Ib23007360921832b3c7d5710ab50324bc5067286
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: https://review.whamcloud.com/32306
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-11003 ldlm: don't add canceling lock back to LRU 92/32692/2
Mikhail Pershin [Mon, 11 Jun 2018 06:44:01 +0000 (09:44 +0300)]
LU-11003 ldlm: don't add canceling lock back to LRU

When a lock is converted, check that it is not being canceled before
adding it back to the LRU.

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: I278389f2a23b304d812f82ffb2dcee2ca70f5b21
Reviewed-on: https://review.whamcloud.com/32692
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-11004 ptlrpc: Serialize procfs access to scp_hist_reqs using mutex 07/32307/2
Andriy Skulysh [Thu, 12 Apr 2018 13:12:05 +0000 (16:12 +0300)]
LU-11004 ptlrpc: Serialize procfs access to scp_hist_reqs using mutex

The scp_hist_reqs list can be quite long, so many userland
processes can waste CPU time spinning on the spinlock.

Change-Id: Ic0fa7338569f9a19213a1dc31f5479c96a76d23a
Cray-bug-id: LUS-5833
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-on: https://review.whamcloud.com/32307
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-10527 obdclass: don't recycle loghandle upon ENOSPC 97/30897/4
Bruno Faccini [Wed, 17 Jan 2018 15:22:58 +0000 (16:22 +0100)]
LU-10527 obdclass: don't recycle loghandle upon ENOSPC

In llog_cat_add_rec(), when an -ENOSPC error is returned from
llog_cat_new_log(), don't reset "cathandle->u.chd.chd_current_log"
to NULL.
This avoids having llog_cat_declare_add_rec() repeatedly
and unnecessarily create new, partially initialized LLOGs/llog_handle
and assign them to "cathandle->u.chd.chd_current_log" without
llog_init_handle() ever being called to initialize
"loghandle->lgh_hdr".

Also, an unnecessary LASSERT(llh) has been removed from
llog_cat_current_log(), as it prevented handling this
case gracefully by simply returning the loghandle.
Thanks to S.Cheremencev (Cray) for reporting this.

Both fixes have been kept in the patch, as the first one allows
better performance in terms of the number of FS operations done
under a permanent changelog ENOSPC condition, even if this covers
a somewhat unlikely situation.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I526f788dc283fa7136ba518179d9337e1d5e3714
Reviewed-on: https://review.whamcloud.com/30897
Reviewed-by: Sergey Cheremencev <c17829@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-10175 ldlm: handle lock converts in cancel handler 14/32314/5
Mikhail Pershin [Mon, 7 May 2018 20:36:55 +0000 (23:36 +0300)]
LU-10175 ldlm: handle lock converts in cancel handler

- Use cancel portals and high-priority handling for lock
  converts. Update ldlm_cancel_handler to understand the
  LDLM_CONVERT RPC for that.
- Use ns_dirty_age_limit for lock converts: don't convert locks
  that are too old.
- Check for empty converts and skip them.

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: I767626acd974ad88bbbf0bb3b0a46744c45b7897
Reviewed-on: https://review.whamcloud.com/32314
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago Revert "LU-8066 llite: replace ll_process_config with class_modify_config" 21/32721/2
Oleg Drokin [Thu, 14 Jun 2018 18:08:55 +0000 (18:08 +0000)]
Revert "LU-8066 llite: replace ll_process_config with class_modify_config"

This patch was landed by mistake.

This reverts commit db67e686d9abcf750359820bfbdb754ab611bf5c.

Change-Id: I2cbfe808eb7d5c448bdf06d4c36229813e6978d2
Reviewed-on: https://review.whamcloud.com/32721
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Tested-by: Oleg Drokin <oleg.drokin@intel.com>
19 months ago LU-8066 llite: replace ll_process_config with class_modify_config 95/32495/5
James Simmons [Sat, 9 Jun 2018 14:16:59 +0000 (10:16 -0400)]
LU-8066 llite: replace ll_process_config with class_modify_config

The current method of handling tunables with ll_process_config
cannot work with sysfs. So replace ll_process_config handling with
class_modify_config(), which can handle sysfs, debugfs and procfs.

Change-Id: I40611930ab2b769c0661aa7dce0c7dd0f2d90204
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32495
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Ben Evans <bevans@cray.com>
19 months ago LU-10560 osd: bio_integrity_enabled was removed 21/32621/3
Li Dongyang [Tue, 5 Jun 2018 01:40:43 +0000 (11:40 +1000)]
LU-10560 osd: bio_integrity_enabled was removed

The T10PI bio support patches used bio_integrity_enabled,
which is no longer available in recent kernels.
Fix this so we can have server support back on 4.13+
kernels.

Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I32eeea244ad599c7af2d551b9b2b173e982d07d3
Reviewed-on: https://review.whamcloud.com/32621
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months ago LU-11065 kernel: kernel update [SLES12 SP3 4.4.132-94.33] 99/32599/3
Bob Glossman [Thu, 31 May 2018 13:57:52 +0000 (06:57 -0700)]
LU-11065 kernel: kernel update [SLES12 SP3 4.4.132-94.33]

Update target, kernel_config, and ldiskfs files for the new version.
One ldiskfs patch is revised for ext4 changes.
The old, unchanged ldiskfs patch is kept for use with sles12sp2.

Test-Parameters: clientdistro=sles12sp3 testgroup=review-ldiskfs \
  mdsdistro=sles12sp3 ossdistro=sles12sp3 \
  mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: Ic6d0219a7133825d1dba0b2bfadf8354442cddb3
Reviewed-on: https://review.whamcloud.com/32599
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months ago LU-11051 obd: remove obd_{get,put}ref() 29/32529/3
John L. Hammond [Thu, 17 May 2018 16:36:23 +0000 (11:36 -0500)]
LU-11051 obd: remove obd_{get,put}ref()

obd_getref() and obd_putref() are only used in the lov layer and only
implemented by the lov layer. So they can be removed in favor of
direct calls. Rename lov_{get,put}ref() to lov_tgts_{get,put}ref()
since they do not manage references on the lov device but on its
targets array.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I0f48eaf4bb42b81b2155c599f361a17dd7bb1ae3
Reviewed-on: https://review.whamcloud.com/32529
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months ago LU-10921 utils: improve lfs setstripe error message 42/32442/4
Andreas Dilger [Wed, 16 May 2018 22:18:33 +0000 (16:18 -0600)]
LU-10921 utils: improve lfs setstripe error message

Improve the error messages when "lfs setstripe" or "lfs setdirstripe"
is run on an existing file/directory.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I3b21fb65847822c73713e9a26d6dea978b3cab07
Reviewed-on: https://review.whamcloud.com/32442
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months ago LU-10175 ptlrpc: add LOCK_CONVERT connection flag 93/32593/3
Mikhail Pershin [Sun, 20 May 2018 18:00:23 +0000 (21:00 +0300)]
LU-10175 ptlrpc: add LOCK_CONVERT connection flag

Add the LOCK_CONVERT connection flag to avoid using the lock
convert feature with old servers.

Test-Parameters: trivial
Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: Ie860f43955314017609774d692f89cfe3c2ab896
Reviewed-on: https://review.whamcloud.com/32593
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
19 months ago LU-10963 gnilnd: stats variables overflow assert 84/32184/4
Chuck Fossen [Thu, 26 Apr 2018 20:04:25 +0000 (15:04 -0500)]
LU-10963 gnilnd: stats variables overflow assert

Reverse bte rdma transactions stats were being
incremented by kgnilnd_admin_addref() which asserts when the value
goes negative. These stats should be incremented with atomic_inc
instead.

Test-Parameters: trivial
Cray-bug-id: LUS-5940
Signed-off-by: Chuck Fossen <chuckf@cray.com>
Change-Id: I06426bc078cc76f14c7b3efb5f3ceb71054c2d09
Reviewed-on: https://review.whamcloud.com/32184
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months ago LU-4423 ldlm: use delayed_work for ldlm_pools_recalc 05/31705/8
NeilBrown [Thu, 31 May 2018 16:44:57 +0000 (12:44 -0400)]
LU-4423 ldlm: use delayed_work for ldlm_pools_recalc

ldlm currently has a kthread which wakes up every so often and calls
ldlm_pools_recalc(). The thread is started and stopped, but no other
external interactions happen.

This can trivially be replaced by a delayed_work if we have
ldlm_pools_recalc() reschedule the work rather than just report when to
do that.

Change-Id: I85f8bc79ef86d1c7a6cbe159e6970445eb7f8389
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-on: https://review.whamcloud.com/31705
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months ago LU-10370 ofd: truncate does not update blocks count on client 73/31073/10
Arshad Hussain [Fri, 9 Feb 2018 19:11:51 +0000 (00:41 +0530)]
LU-10370 ofd: truncate does not update blocks count on client

The 'truncate' call correctly updates the server side with the
correct size and blocks count. However, on the client
side all the metadata is correctly updated except the
blocks count, which still reflects the old count prior
to the truncate call. This patch fixes the issue by
modifying ofd_punch_hdl() to update repbody with the
updated block count.

A new test case is added to sanity to verify that
the blocks count is correctly updated after a truncate call.

Change-Id: I8f3f44e1668fab925339350074d1ad8ab681fc95
Co-authored-by: Abrarahmed Momin <abrar.momin@gmail.com>
Signed-off-by: Abrarahmed Momin <abrar.momin@gmail.com>
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/31073
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months ago LU-10120 lsnapshot: handle dash in fsname 26/30626/2
Fan Yong [Thu, 21 Dec 2017 12:09:30 +0000 (20:09 +0800)]
LU-10120 lsnapshot: handle dash in fsname

'-' is a valid character for Lustre fsname. Replace "strchr()"
with "strrchr()" to correctly parse fsname from configuration.
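
The difference between the two calls can be seen on a name of the form
"<fsname>-<target>"; that format, and the helper name fsname_from_target(),
are assumptions made for this sketch rather than details taken from the
patch. strchr() would stop at the first dash and cut a dashed fsname short,
while strrchr() finds the last dash, which separates the fsname from the
target.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the fsname part of "<fsname>-<target>" into buf.
 * Returns 0 on success, -1 if there is no dash, the fsname is empty,
 * or buf is too small.
 */
static int fsname_from_target(const char *name, char *buf, size_t buflen)
{
	const char *dash = strrchr(name, '-');	/* last dash, not first */
	size_t len;

	if (dash == NULL)
		return -1;
	len = (size_t)(dash - name);
	if (len == 0 || len >= buflen)
		return -1;
	memcpy(buf, name, len);
	buf[len] = '\0';
	return 0;
}
```

For "my-fs-MDT0000" this yields "my-fs", where a strchr()-based version
would have returned "my".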

Signed-off-by: Fan Yong <fan.yong@intel.com>
Signed-off-by: Darby Vicker <darby.vicker-1@nasa.gov>
Change-Id: Ib972288668f1b7bcf1f9188c0e9cc77027e7ceeb
Reviewed-on: https://review.whamcloud.com/30626
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
19 months ago LU-9751 snapshot: set PATH for remote zfs commands 99/27999/15
Fan Yong [Mon, 9 Apr 2018 14:55:14 +0000 (22:55 +0800)]
LU-9751 snapshot: set PATH for remote zfs commands

It is possible that the remote zfs/zpool commands for Lustre
snapshot are NOT in the remote shell's execute/search path, so
the PATH variable needs to be set for the remote shell commands.

It is inconvenient for the admin to specify the PATH option
in a single lsnapshot command for each Lustre target, so the
patch sets the remote PATH environment variable to the
local PATH environment variable. This requires all Lustre
servers to have a broadly consistent zfs tools installation in
that PATH.

It also contains some macro definitions for code cleanup.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I2b1ce630d4aad63ab20e6c323f2222dccb51ed6e
Reviewed-on: https://review.whamcloud.com/27999
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months ago LU-9764 lfsck: reset LFSCK trace file if fail to load it 97/27997/2
Fan Yong [Wed, 12 Jul 2017 04:50:41 +0000 (12:50 +0800)]
LU-9764 lfsck: reset LFSCK trace file if fail to load it

If the on-disk LFSCK trace file is corrupted, LFSCK
may fail to load it. In that case, LFSCK
should forcibly reset (recreate) the trace file.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I0237a88ff23cdec680303ac3976a53c1632598fe
Reviewed-on: https://review.whamcloud.com/27997
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
19 months ago LU-10048 osd: async truncate 88/27488/43
Alex Zhuravlev [Wed, 7 Jun 2017 13:32:39 +0000 (17:32 +0400)]
LU-10048 osd: async truncate

osd-ldiskfs should execute truncate outside of main transaction
handle. This avoids restarting truncate transaction handles in
main transaction, and allows "transaction first, locking second"
model on OST.

Change-Id: Iffe45c42834c26ca72b65e068ad25ac61d0607c8
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: https://review.whamcloud.com/27488
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
19 months ago LU-6511 osd-ldiskfs: Fix all irregular indentation for osd_iam.c 98/19598/4
Parinay Kondekar [Sat, 26 May 2018 20:02:49 +0000 (01:32 +0530)]
LU-6511 osd-ldiskfs: Fix all irregular indentation for osd_iam.c

"osd_iam.c" had irregular and inconsistent indentation all
throughout the file. This patch fixes all the indentation
and space warnings throughout the file. There are still a few
'checkpatch' errors/warnings left. However, to keep the patch
consistent only space and indents are corrected in this patch.

Test-Parameters: trivial
Change-Id: I55f650175b7efc85f87f216d8225b0517e8a3d94
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Signed-off-by: Parinay Kondekar <parinay.kondekar@seagate.com>
Reviewed-on: https://review.whamcloud.com/19598
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
19 months ago LU-7236 ptlrpc: idle connections can disconnect 82/16682/123
Alex Zhuravlev [Mon, 28 Sep 2015 13:50:15 +0000 (16:50 +0300)]
LU-7236 ptlrpc: idle connections can disconnect

 - when new request is being allocated ptlrpc initiates
   connection if it's not connected yet
 - if the import is idle (no locks, no active RPCs, no
   non-PING reply for last osc_idle_timeout seconds),
   then pinger tries to disconnect asynchronously
 - currently only client-to-OST connections can be idle
 - lctl set_param osc.*.idle_timeout=N controls new feature:
   N=0 - disable
   N>0 - seconds to idle before disconnect
 - lctl set_param osc.*.idle_connect=N to reconnect if idle
   (N is positive number)
 - OSC module parameter osc_idle_timeout controls default
   idle timeout and set to 20 seconds by default
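
The idle condition listed above can be written as a small predicate. This
is a sketch of the stated rule only, not the ptlrpc pinger code; struct
import_state and import_is_idle() are hypothetical names.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical snapshot of the state the idle check looks at. */
struct import_state {
	unsigned int locks;		/* locks held through this import */
	unsigned int active_rpcs;	/* RPCs currently in flight */
	time_t last_reply;		/* time of the last non-PING reply */
};

/*
 * An import is idle when the feature is enabled (timeout > 0), there
 * are no locks and no active RPCs, and no non-PING reply has arrived
 * for at least idle_timeout seconds.
 */
static bool import_is_idle(const struct import_state *imp, time_t now,
			   unsigned int idle_timeout)
{
	if (idle_timeout == 0)		/* N=0 disables the feature */
		return false;
	if (imp->locks != 0 || imp->active_rpcs != 0)
		return false;
	return now - imp->last_reply >= (time_t)idle_timeout;
}
```

With the default of 20 seconds, an import whose last non-PING reply came at
t=100 becomes eligible for disconnect at t=120, provided it holds no locks
and has no RPCs in flight.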

Change-Id: I4b90eb5209a0b0e62d85fd55ad6e9cab8c03fd14
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: https://review.whamcloud.com/16682
Tested-by: Jenkins
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
19 months ago LU-11066 systemd: Add IB dependencies to lnet.service 46/32646/3
Nathaniel Clark [Thu, 31 May 2018 14:45:47 +0000 (10:45 -0400)]
LU-11066 systemd: Add IB dependencies to lnet.service

Add ordering for in-kernel (rdma.service) and Mellanox MOFED
(openibd.service).

This ensures that systemd will shutdown lnet prior to IB, thus
preventing it from hanging.

Test-Parameters: trivial
Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: Ia0be1ca60eb8f54edd2f4f6bfbca10cbc01cc638
Reviewed-on: https://review.whamcloud.com/32646
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
19 months ago LU-11049 ssk: correctly handle null byte by lgss_sk 10/32510/4
Sebastien Buisson [Tue, 22 May 2018 15:50:53 +0000 (17:50 +0200)]
LU-11049 ssk: correctly handle null byte by lgss_sk

lgss_sk must include null byte with fsname and nodemap info taken from
command line.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ie98444c930b8df521482468c4897e080ded0d2f6
Reviewed-on: https://review.whamcloud.com/32510
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jeremy Filizetti <jeremy.filizetti@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months ago LU-10680 mdd: create gc thread when no current transaction 76/31376/40
Bruno Faccini [Thu, 22 Feb 2018 15:23:18 +0000 (16:23 +0100)]
LU-10680 mdd: create gc thread when no current transaction

Creating a kthread must not occur while a journal transaction is being
filled, because otherwise a deadlock can happen if memory reclaim is
triggered by kthreadd when forking the new thread, and thus I/Os
could be attempted to the same device from shrinkers, requiring a new
journal transaction to be started when the current one could never complete.

Thus this patch moves kthread_run() of gc_task into mdd_trans_stop().

The comment in mdd_changelog_max_idle_time_seq_write() has been updated
to reflect the need to limit the value to about 68 years, to allow
keeping 32-bit operands for comparison.
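
The "about 68 years" figure follows from the largest value a signed 32-bit
number of seconds can hold. A quick check, assuming a signed 32-bit operand
and a year of 365.25 days (the function name is hypothetical):

```c
#include <stdint.h>

/* Seconds in a year of 365.25 days. */
#define SECONDS_PER_YEAR (365.25 * 24 * 60 * 60)

/* Largest signed 32-bit second count, expressed in years. */
static double int32_max_seconds_in_years(void)
{
	return (double)INT32_MAX / SECONDS_PER_YEAR;
}
```

2147483647 seconds divided by 31557600 seconds per year is a little over
68 years.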

As it will go away with recent kernels, get_seconds() usage has
been replaced by calling ktime_get_real_seconds() for user idle
time initialization and comparison.

Also, enable Changelog GC, as it is no longer the default, in the
sanity/test_160f sub-test and remove it from ALWAYS_EXCEPT to
re-enable it, leaving 160g excluded for now because of LU-10734. In
addition, changes to sanity/test_160f have been made to make
it fully DNE-compatible.

With this patch, GC-thread can be stopped upon MDT umount, and
remaining orphan ChangeLog records clean-up will occur upon next
restart. New sanity/test_160h sub-test checks this scenario.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I7ec076bc04594b230c57348d7ac92acc58c258e1
Reviewed-on: https://review.whamcloud.com/31376
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months ago LU-11069 llite: correct file position after appending writes 41/32641/3
John L. Hammond [Wed, 6 Jun 2018 13:14:50 +0000 (08:14 -0500)]
LU-11069 llite: correct file position after appending writes

In ll_file_io_generic() use the position returned in the kiocb to set
the returned file position. This ensures that the file position is set
correctly after an appending write. Add sanity test_23d() to check
that calling lseek() for the current offset returns the correct value
in this situation.
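
The behaviour being checked can be reproduced from userspace with plain
POSIX calls. The sketch below is not sanity test_23d itself; the helper
name append_and_tell() is hypothetical. It appends to a file opened with
O_APPEND and then asks lseek() for the current offset, which should equal
the file size after the write.

```c
#define _POSIX_C_SOURCE 200809L	/* expose POSIX open() flags */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Append len bytes to path and return the file offset reported by
 * lseek(fd, 0, SEEK_CUR) afterwards, or -1 on any error.
 */
static off_t append_and_tell(const char *path, const void *buf, size_t len)
{
	off_t pos = -1;
	int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);

	if (fd < 0)
		return -1;
	if (write(fd, buf, len) == (ssize_t)len)
		pos = lseek(fd, 0, SEEK_CUR);
	close(fd);
	return pos;
}
```

Appending 3 bytes to a new file and then 2 more through a second open
should report offsets 3 and 5, even though the second descriptor started
at offset 0.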

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ic76ce49db6e87d5294e18546d5b75a12793aa99c
Reviewed-on: https://review.whamcloud.com/32641
Tested-by: Jenkins
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months ago LU-10419 lfsck: signal master engine when stop 27/31627/5
Fan Yong [Fri, 20 Apr 2018 21:53:50 +0000 (05:53 +0800)]
LU-10419 lfsck: signal master engine when stop

It is possible that during LFSCK scanning, some server, MDT
or OST, may be offline. At that time, if LFSCK needs to talk
to such an offline server, the related RPC will trigger a reconnect to
the offline server, and the LFSCK engine has to wait until the
offline server comes back online or someone forcibly deactivates the
server. To avoid lfsck_stop() being blocked in such a case,
the stop logic sends a SIGINT signal to the LFSCK engines. But we
only did that for the LFSCK assistant engines and forgot to do it
for the LFSCK master engine. This patch fixes that.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I5d51ab49524e8ae54f0853e93b94e78913f65e8a
Reviewed-on: https://review.whamcloud.com/31627
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months ago LU-11058 tests: stop running sanity test 77k 85/32685/2
James Nunez [Fri, 8 Jun 2018 18:46:10 +0000 (12:46 -0600)]
LU-11058 tests: stop running sanity test 77k

sanity test 77k is failing for a variety of Lustre
file system configurations. Stop running test 77k by
adding it to the ALWAYS_EXCEPT list.

When this issue is resolved, we need to resume running
sanity test 77k by removing it from the ALWAYS_EXCEPT list.

Test-Parameters: trivial
Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I3cd53e721b1b3ede633603273dafd54c9f5701c4
Reviewed-on: https://review.whamcloud.com/32685
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Wei Liu <wei3.liu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
19 months agoLU-11054 lnet: remove non-error error message 60/32560/2
John L. Hammond [Fri, 25 May 2018 14:36:31 +0000 (09:36 -0500)]
LU-11054 lnet: remove non-error error message

In lnet_ipif_enumerate(), remove the CERROR() that prints each device.

Test-Parameters: trivial
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ida8d1636e9e608087205defabda865f930fd38a1
Reviewed-on: https://review.whamcloud.com/32560
Tested-by: Jenkins
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Sonia Sharma <sonia.sharma@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months agoLU-11043 kernel: kernel update RHEL7.5 [3.10.0-862.3.2.el7] 13/32513/3
Bob Glossman [Mon, 21 May 2018 23:20:05 +0000 (16:20 -0700)]
LU-11043 kernel: kernel update RHEL7.5 [3.10.0-862.3.2.el7]

update RHEL 7.5 kernel to 3.10.0-862.3.2.el7

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I0defa14e83ce098c48b3228b4867afa73a2d9185
Reviewed-on: https://review.whamcloud.com/32513
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Cliff White <cliff.white@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months agoLU-10808 lod: remove DoM component if DoM is disabled 82/32482/5
Mikhail Pershin [Mon, 21 May 2018 18:24:05 +0000 (21:24 +0300)]
LU-10808 lod: remove DoM component if DoM is disabled

If a file is created with a DoM component but the server has
disabled DoM file creation, remove the DoM entry from the file
layout and keep the other components.
If the layout has only a DoM entry, just return an error.
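
The rule above can be sketched as follows. This is only an
illustration in Python, not the lod code: the "pattern" key, the
"mdt"/"raid0" markers and the EINVAL error code are assumptions made
for the example.

```python
import errno


def strip_dom_component(components, dom_enabled):
    """Return the layout to use, dropping DoM entries when DoM is disabled.

    `components` is a list of dicts with a hypothetical "pattern" key:
    "mdt" marks a DoM component, anything else an ordinary component.
    """
    if dom_enabled:
        return list(components)
    kept = [c for c in components if c["pattern"] != "mdt"]
    if components and not kept:
        # The layout consisted only of DoM entries, so nothing is left.
        # EINVAL is an assumed error code, not taken from the patch.
        raise OSError(errno.EINVAL, "DoM file creation is disabled")
    return kept
```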

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: Ibafd0269d76dc5de4599efca064930607dc556eb
Reviewed-on: https://review.whamcloud.com/32482
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months agoLU-11014 mdt: intent handling simplification 57/32357/3
John L. Hammond [Fri, 11 May 2018 14:01:32 +0000 (09:01 -0500)]
LU-11014 mdt: intent handling simplification

Remove the obsolete constants MDT_IT_CREATE, MDT_IT_READDIR,
MDT_IT_UNLINK, and MDT_IT_TRUNC from enum mdt_it_code. Also remove
MDT_IT_OCREAT, since (at this level) it can be handled identically to
MDT_IT_OPEN. Rename mdt_intent_reint() to mdt_intent_open() since it
only handles open. Move the definition of the mdt_it_flavor array down
and remove the then unneeded forward declarations of mdt_intent_*().
In struct mdt_it_flavor, remove the obsolete it_reint member and
rename the it_flags member to it_handler_flags to avoid confusion with
LDLM flags. Use 'enum tgt_handler_flags' rather than __u32 for several
parameters used to hold values of that type.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I297ef397c879fcc7711d725e0315e73439d95826
Reviewed-on: https://review.whamcloud.com/32357
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months agoLU-10977 test: add version check to sanity test_60ab 43/32343/4
Saurabh Tandan [Wed, 9 May 2018 21:27:15 +0000 (14:27 -0700)]
LU-10977 test: add version check to sanity test_60ab

Skip sanity.sh test_60ab if the server version is less than or
equal to 2.11.51
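
A version check of this kind compares the version components as
numbers, not as strings. A minimal sketch in Python (the helper names
are hypothetical; the real test uses the shell helpers from
test-framework.sh):

```python
def version_tuple(version):
    """Turn a dotted version string such as "2.11.51" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def should_skip_60ab(server_version, limit="2.11.51"):
    """Skip when the server version is less than or equal to the limit."""
    return version_tuple(server_version) <= version_tuple(limit)
```

Comparing tuples of integers keeps "2.9.0" below "2.11.51", which a
plain string comparison would get wrong.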

Test-Parameters:trivial testlist=sanity envdefinitions=ONLY=60ab serverjob=lustre-b2_10 serverbuildno=69
Signed-off-by: Saurabh Tandan <saurabh.tandan@intel.com>
Change-Id: Ie9d2728790e19ac2a24c94e7c13ade28b5a5bbbe
Reviewed-on: https://review.whamcloud.com/32343
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Wei Liu <wei3.liu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months agoLU-4423 obd: backport of lu_object changes upstream 25/32325/3
NeilBrown [Wed, 9 May 2018 02:46:29 +0000 (22:46 -0400)]
LU-4423 obd: backport of lu_object changes upstream

fold lu_object_new() into lu_object_find_at()

lu_object_new() duplicates a lot of code that is in
lu_object_find_at().
There is no real need for a separate function, it is simpler just
to skip the bits of lu_object_find_at() that we don't
want in the LOC_F_NEW case.

Linux-commit: 775c4dc274343e5e2959fa1171baf2fc01028840

discard extra lru count.

lu_object maintains 2 lru counts.
One is a per-bucket lsb_lru_len.
The other is the per-cpu ls_lru_len_counter.

The only times the per-bucket counters are used are:
 - a debug message when an object is added
 - in lu_site_stats_get when all the counters are combined.

The debug message is not essential, and the per-cpu counter
can be used to get the combined total.

So discard the per-bucket lsb_lru_len.

Linux-commit: e167b370360f8887cf21a2a82f83e7118a2aeb11

make struct lu_site_bkt_data private

This data structure only needs to be public so that
various modules can access a wait queue to wait for object
destruction.
If we provide a function to get the wait queue, rather than the
whole bucket, the structure can be made private.

Linux-commit: bc5e7fb40d36edb95ce8f661596811bec3f7d5cf

Change-Id: I26203f331a0c73ae4e23878eb10b15d9fcf546c5
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32325
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months agoLU-10971 tests: use changelog routines in lustre-rsync-test 08/32208/4
James Nunez [Mon, 30 Apr 2018 19:12:49 +0000 (13:12 -0600)]
LU-10971 tests: use changelog routines in lustre-rsync-test

The lustre-rsync-test script has two subroutines to register
and deregister changelog users. These subroutines should be
updated to use changelog_register() and changelog_deregister()
found in test-framework.sh.

Test-Parameters: trivial clientcount=2 mdscount=2 mdtcount=4 osscount=1 ostcount=8 mdtfilesystemtype=zfs ostfilesystemtype=zfs testlist=lustre-rsync-test
Test-Parameters: clientcount=2 mdscount=2 mdtcount=4 osscount=1 ostcount=8 mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs testlist=lustre-rsync-test
Test-Parameters: clientcount=2 mdscount=1 mdtcount=1 osscount=1 ostcount=8 mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs testlist=lustre-rsync-test
Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: Ia54095a6e039f6835def0f9c49157b71088d9e51
Reviewed-on: https://review.whamcloud.com/32208
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Wei Liu <wei3.liu@intel.com>
Reviewed-by: Saurabh Tandan <saurabh.tandan@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months agoLU-10808 lod: align wrong DoM stripe values with defaults 73/32073/5
Mikhail Pershin [Thu, 19 Apr 2018 13:29:54 +0000 (16:29 +0300)]
LU-10808 lod: align wrong DoM stripe values with defaults

- Align DoM component size to the server limit size instead of
  returning an error. An error is still returned if DoM file creation
  is disabled on the server (DoM limit is set to 0)
- Correct wrong values for dom_stripesize parameter by using minimal
  stripe size if provided value is lower and by aligning it to be a
  multiple of that minimal size.
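
The two corrections above can be sketched as follows. This is an
illustration only: the 64 KiB minimum, rounding down to a multiple,
the function names and the EINVAL error code are assumptions, not
taken from the patch.

```python
import errno

MIN_STRIPE_SIZE = 64 * 1024  # assumed minimal stripe size (64 KiB)


def fix_dom_stripesize(value, min_size=MIN_STRIPE_SIZE):
    """Raise too-small values to the minimum and align the rest to it."""
    if value < min_size:
        return min_size
    # Assumption: align by rounding down to a multiple of min_size.
    return value - (value % min_size)


def fix_dom_component_size(requested, server_limit):
    """Clamp the DoM component size to the server limit.

    A limit of 0 means DoM file creation is disabled, which is still
    an error (EINVAL is an assumed error code).
    """
    if server_limit == 0:
        raise OSError(errno.EINVAL, "DoM file creation is disabled")
    return min(requested, server_limit)
```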

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: Ifcdf60fddda65acda92509bb7e69c9b2951fb6bd
Reviewed-on: https://review.whamcloud.com/32073
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months agoLU-4423 ptlrpc: use delayed_work in sec_gc 24/31724/3
Dmitry Eremin [Thu, 22 Mar 2018 15:51:00 +0000 (18:51 +0300)]
LU-4423 ptlrpc: use delayed_work in sec_gc

The garbage collection for security contexts currently has a dedicated
kthread which wakes up every 30 minutes to discard old garbage.

Replace this with a simple delayed_work item on the system work queue.

Change-Id: I5cdb023783104b5e21f4139731065946ed162af1
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/31724
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months agoLU-10648 ldlm: Reduce debug to console during eviction 37/31237/4
Patrick Farrell [Fri, 26 Aug 2016 16:03:33 +0000 (11:03 -0500)]
LU-10648 ldlm: Reduce debug to console during eviction

During an eviction, Lustre calls ldlm_namespace_cleanup,
and it will sometimes end up dumping all of the locks on a
particular resource to the console log
(ldlm_resource_complain), which is very wasteful and only
rarely helpful.

Move the debug level for this to D_NETERROR since it is in the
default debug mask.

Change-Id: I8a00f030393ce1748914d70fa8edb4690273e08a
Cray-bug-id: LUS-1418
Signed-off-by: Chris Horn <hornc@cray.com>
Signed-off-by: Patrick Farrell <paf@cray.com>
Reviewed-on: https://review.whamcloud.com/31237
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months agoLU-10472 osc: add T10PI support for RPC checksum 80/30980/37
Li Xi [Tue, 23 Jan 2018 07:17:17 +0000 (02:17 -0500)]
LU-10472 osc: add T10PI support for RPC checksum

T10 Protection Information (T10 PI), previously known as Data
Integrity Field (DIF), is a standard for end-to-end data integrity
validation. T10 PI prevents silent data corruption, ensuring that
incomplete and incorrect data cannot overwrite good data.

The Lustre file system already supports an RPC-level checksum, which
validates the data in bulk RPCs when writing/reading data to/from
objects on OSTs. The RPC-level checksum can detect data corruption
that happens while an RPC is being transferred over the wire. However,
it cannot prevent silent data corruption that happens under other
conditions, for example, memory corruption while data is cached in the
page cache. The existing checksum mechanism therefore provides only
disjoint protection coverage. Thus, in order to provide end-to-end
data protection, T10PI support should be added to Lustre.

In order to provide end-to-end data integrity validation, the T10 PI
checksum of the data in a sector needs to be calculated on the Lustre
client side and validated later on the Lustre OSS side. The T10
protection information should be sent together with the data in the RPC.
However, in order to avoid significant performance degradation,
instead of sending all original guard tags for all sectors in a bulk
RPC, the existing checksum feature of bulk RPC will be integrated
together with the new T10PI feature.

When OST starts, necessary T10PI information will be extracted from
storage, i.e. the T10PI DIF type and sector size. The DIF type could
be one of TYPE1_IP, TYPE1_CRC, TYPE3_IP and TYPE3_CRC. And sector
size could be either 512 or 4K bytes.

When an OSC connects to an OST, the OSC and OST negotiate the
checksum types. New checksum types are added for T10PI support:
OBD_CKSUM_T10IP512, OBD_CKSUM_T10IP4K, OBD_CKSUM_T10CRC512,
and OBD_CKSUM_T10CRC4K. If the OST storage has T10PI support, the
only selectable T10PI checksum type is the one that matches the
T10PI type of the hardware. The other existing checksum types (crc32,
crc32c, adler32) are still valid options for the RPC checksum type.

When calculating the RPC checksum for T10PI, the T10PI checksums of all
sectors are calculated first using the T10PI checksum type, i.e.
16-bit CRC or IP checksum. The RPC checksum is then calculated over
all of the T10PI checksums. The RPC checksum type used in this step is
always adler32. Considering that the checksum-of-checksums is only
computed on a 4KB chunk of GRD tags for a 1MB RPC for 512B sectors,
or 16KB of GRD tags for 16MB of 4KB sectors, this is only 1/256 or
1/1024 of the total data being checksummed, so the checksum type used
here should not affect overall system performance noticeably.
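
The checksum-of-checksums idea can be sketched in a few lines of
Python. This is an illustration, not the osc/ost code: the function
names are made up, each guard tag is assumed to be a 2-byte big-endian
value, and only the CRC variant (polynomial 0x8BB7, as used by T10
DIF) is shown.

```python
import zlib


def t10_crc16(data):
    """Bitwise CRC-16 with the T10 DIF polynomial 0x8BB7, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def rpc_checksum_t10crc(data, sector_size=512):
    """Return (adler32 over all guard tags, total guard tag bytes)."""
    guards = bytearray()
    for off in range(0, len(data), sector_size):
        # One 16-bit guard tag per sector.
        guards += t10_crc16(data[off:off + sector_size]).to_bytes(2, "big")
    # The RPC checksum covers only the guard tags, not the data itself.
    return zlib.adler32(bytes(guards)), len(guards)
```

With 512-byte sectors the second return value is 1/256 of the data
length, which matches the 4KB of GRD tags per 1MB RPC mentioned above.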

obdfilter.*.enforce_t10pi_cksum can be used to tune whether to enforce
T10-PI checksum or not.

If the OST supports the T10-PI feature and the T10-PI checksum is enforced,
clients have no choice of RPC checksum type other than the T10PI
checksum type. This is useful for enforcing end-to-end integrity in the
whole system.

If the OST doesn't support the T10-PI feature and the T10-PI checksum is
enforced, then together with the other checksums with reasonably good
speeds (e.g. crc32, crc32c, adler, etc.), all the T10-PI checksum types
(t10ip512, t10ip4K, t10crc512, t10crc4K) are added to the available
checksum types, regardless of the speeds of the T10-PI checksums. This is
useful for testing T10-PI checksums of RPCs.

If the OST supports the T10-PI feature and the T10-PI checksum is NOT
enforced, the corresponding T10-PI checksum type is added to the checksum
type list, regardless of the speed of the T10-PI checksum. This gives
clients the flexibility to choose whether to enable end-to-end integrity
or not.

If the OST does NOT support the T10-PI feature and the T10-PI checksum is
NOT enforced, then together with the other checksums with reasonably good
speeds, all the T10-PI checksum types with good speeds are added to the
checksum type list. Note that a T10-PI checksum type with a speed worse
than half of adler will NOT be added as an option. In this circumstance,
T10-PI checksum types behave the same as the other normal checksum
types.

Clients that have no T10-PI RPC checksum support are not affected
by the above-mentioned logic. That logic is only applied to
clients that connect after obdfilter.*.enforce_t10pi_cksum is changed on
an OST.

The following are the speeds of the different checksum types on a server
with an Intel(R) Xeon(R) E5-2650 @ 2.00GHz CPU:

crc: 1575 MB/s
crc32c: 9763 MB/s
adler: 1255 MB/s
t10ip512: 6151 MB/s
t10ip4k: 7935 MB/s
t10crc512: 1119 MB/s
t10crc4k: 1531 MB/s
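
The four selection cases described above, applied to these speeds, can
be sketched as follows. This is a simplified model in Python, not the
server code: the function name and arguments are made up, and applying
the "half of adler" threshold to the non-T10-PI checksums as well is an
assumption, since the text only says "reasonably good speeds" for them.

```python
T10_TYPES = ("t10ip512", "t10ip4k", "t10crc512", "t10crc4k")

# Speeds in MB/s, copied from the list above.
SPEEDS = {
    "crc": 1575, "crc32c": 9763, "adler": 1255,
    "t10ip512": 6151, "t10ip4k": 7935,
    "t10crc512": 1119, "t10crc4k": 1531,
}


def available_cksum_types(speeds, hw_t10_type=None, enforce=False):
    """List the checksum types offered to a T10-PI capable client.

    hw_t10_type is the T10-PI type of the OST storage, or None when the
    storage has no T10-PI support. enforce models enforce_t10pi_cksum.
    """
    threshold = speeds["adler"] / 2

    def fast_enough(name):
        return speeds[name] >= threshold

    if hw_t10_type is not None and enforce:
        # Supported and enforced: the hardware type is the only choice.
        return [hw_t10_type]
    others = [n for n in speeds if n not in T10_TYPES and fast_enough(n)]
    if hw_t10_type is not None:
        # Supported, not enforced: add the hardware type regardless of speed.
        return others + [hw_t10_type]
    if enforce:
        # Unsupported but enforced: add every T10-PI type regardless of speed.
        return others + list(T10_TYPES)
    # Unsupported and not enforced: T10-PI types compete on speed.
    return others + [n for n in T10_TYPES if fast_enough(n)]
```

With the speeds listed above, half of adler is 627.5 MB/s, so every
T10-PI type passes the threshold on this particular server.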

Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: I6468680edeab0917bb71dbd8cd9ea16c65e935f5
Reviewed-on: https://review.whamcloud.com/30980
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
19 months agoLU-8066 llite: Preparation to move /proc/fs/lustre/llite to sysfs 31/24031/23
James Simmons [Fri, 25 May 2018 01:16:18 +0000 (21:16 -0400)]
LU-8066 llite: Preparation to move /proc/fs/lustre/llite to sysfs

Add necessary infrastructure, add support for mountpoint
registration in /sys/fs/lustre/llite

This is a heavily modified version of

Linux-commit: fd0d04ba85f95169106701397417360541a983b3

due to the large amount of changes to the OpenSFS/Intel branch.

Change-Id: Ic9ca2044249a59dc79ebc86553c8b7ce7afbf710
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/24031
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months agoLU-10264 misc: fix possible array overflow 42/32242/2
Andreas Dilger [Fri, 9 Mar 2018 23:18:53 +0000 (16:18 -0700)]
LU-10264 misc: fix possible array overflow

Fix a static analysis error.

lustre/obdclass/obd_mount_server.c:1830 in osd_start(), buffer
    flagstr has size 16 but length of format string "%lu:%lu" is 31.
Increase the size of buffer to hold maximal-sized strings plus NUL.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I3cc80d66bbb537161a561f4f2ba7830dde2cab07
Reviewed-on: https://review.whamcloud.com/32242
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months agoLU-8972 tests: remove conf-sanity test from ALWAYS_EXCEPT 20/32220/4
James Nunez [Tue, 1 May 2018 19:20:28 +0000 (13:20 -0600)]
LU-8972 tests: remove conf-sanity test from ALWAYS_EXCEPT

A patch landed to fix the issue reported in LU-8972. We need
to run conf-sanity test 101 to ensure that the issue is
fixed and does not regress.

Remove conf-sanity test 101 from the ALWAYS_EXCEPT list.

Test-Parameters: trivial
Test-Parameters: trivial clientcount=2 mdscount=2 mdtcount=4 osscount=1 ostcount=8 mdtfilesystemtype=zfs ostfilesystemtype=zfs testlist=conf-sanity
Test-Parameters: clientcount=2 mdscount=2 mdtcount=4 osscount=1 ostcount=8 mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs testlist=conf-sanity
Test-Parameters: clientcount=2 mdscount=1 mdtcount=1 osscount=1 ostcount=8 mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs testlist=conf-sanity
Test-Parameters: clientcount=2 mdscount=1 mdtcount=1 osscount=1 ostcount=8 mdtfilesystemtype=zfs ostfilesystemtype=zfs testlist=conf-sanity
Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: Ic678c7527a60cab2de6139041cca81017d4aa75e
Reviewed-on: https://review.whamcloud.com/32220
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Saurabh Tandan <saurabh.tandan@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months agoLU-6160 osd-zfs: Fix refcount_add call 44/28544/2
Giuseppe Di Natale [Mon, 14 Aug 2017 16:51:52 +0000 (09:51 -0700)]
LU-6160 osd-zfs: Fix refcount_add call

Correct the refcount_add in osd-zfs module's osd_fix_new_dnode
function. The variable 'tag' was undefined and caused osd-zfs
to fail builds against zfs packages with debug enabled.

This small change should enable lustre to be built against
zfs packages that have debug enabled.

Test-Parameters: trivial
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Change-Id: If95f0af6178cf0ea78724658edfaece1ee16a3f1
Reviewed-on: https://review.whamcloud.com/28544
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months agoLU-9727 tests: exercise new changelog fields and records 35/32335/4
Sebastien Buisson [Fri, 19 Jan 2018 17:22:40 +0000 (02:22 +0900)]
LU-9727 tests: exercise new changelog fields and records

Add new tests in sanity-hsm to exercise new changelog fields
and also record types.

Test-Parameters: trivial testlist=sanity-hsm
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Signed-off-by: Quentin Bouget <quentin.bouget@cea.fr>
Change-Id: I1cd7282983d936105e1616aa859c47fd453e6017
Reviewed-on: https://review.whamcloud.com/32335
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months agoLU-10796 tests: standardize changelog testing in sanity-hsm 13/31613/13
Quentin Bouget [Fri, 9 Mar 2018 14:20:03 +0000 (14:20 +0000)]
LU-10796 tests: standardize changelog testing in sanity-hsm

To manage changelog users and changelog records, sanity-hsm used to
define:
 - changelog_setup
 - changelog_cleanup
 - changelog_get_flags

test-framework.sh implements similar functions:
 - changelog_register
 - changelog_deregister
 - changelog_dump
 - changelog2array (new in this patch)

This patch removes the implementations of sanity-hsm in favor of
those in test-framework.

Test-Parameters: trivial testlist=sanity-hsm
Signed-off-by: Quentin Bouget <quentin.bouget@cea.fr>
Change-Id: Ie3db8ef646fa48d06bf41b6025b3443de026cabd
Reviewed-on: https://review.whamcloud.com/31613
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months agoLU-8130 ldlm: store name directly in namespace. 08/32408/2
NeilBrown [Tue, 15 May 2018 17:29:27 +0000 (13:29 -0400)]
LU-8130 ldlm: store name directly in namespace.

Rather than storing the name of a namespace in the
hash table, store it directly in the namespace.
This will allow the hashtable to be changed to use
rhashtable.

Linux-commit: 648ae363628c84faa8d8861e3246e096b8c0a392

Change-Id: Ie5bb8092c9e1831fbc38beade46be6d35f3256dc
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/32408
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
19 months agoLU-11017 quota: ignore quota for CAP_SYS_RESOURCE properly 78/32378/10
Wang Shilong [Wed, 16 May 2018 02:13:13 +0000 (10:13 +0800)]
LU-11017 quota: ignore quota for CAP_SYS_RESOURCE properly

Currently, lustre quota will ignore this type of quota if
quota id is 0 or we force to ignore.

For write, we have passed CAP_SYS_RESOURCE properly, but
for metadata operations this is not done.

Test-Parameters: testlist=sanity-quota
Change-Id: Ibcdc0e53ad125042d4889ac51a9a9ead4066c0c8
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/32378
Tested-by: Jenkins
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months agoLU-11015 lov: Move lov_tgts_kobj init to lov_setup 67/32367/4
Oleg Drokin [Fri, 18 May 2018 01:43:18 +0000 (21:43 -0400)]
LU-11015 lov: Move lov_tgts_kobj init to lov_setup

and free it in lov_cleanup.
This looks like a more robust solution than doing it in lov_putref,
esp. since we know the refcount there crosses 0 repeatedly, confusing
things.

Change-Id: I49b1a1e97464bd388fe20a97b903468139730213
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: https://review.whamcloud.com/32367
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
19 months agoLU-11010 tests: remove calls to return after skip() 46/32346/2
James Nunez [Wed, 9 May 2018 22:11:19 +0000 (16:11 -0600)]
LU-11010 tests: remove calls to return after skip()

The skip() routine now contains a call to exit. All calls
to skip() and skip_env() should be reviewed and calls to
return that followed skip should be removed.

A problem with the skip message not being printed is corrected.

Test-Parameters: trivial testlist=sanity-pfl
Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I1a52e9bd79a71de4ab4c0cea9c569f379115a603
Reviewed-on: https://review.whamcloud.com/32346
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Wei Liu <wei3.liu@intel.com>
Reviewed-by: Saurabh Tandan <saurabh.tandan@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months agoLU-11003 ldlm: fix for l_lru usage 09/32309/3
Yang Sheng [Mon, 7 May 2018 15:59:01 +0000 (23:59 +0800)]
LU-11003 ldlm: fix for l_lru usage

Fixes for lock convert code to prevent false assertion and
busy locks in LRU:
- ensure there are no l_readers and l_writers when adding a lock to
  the LRU after convert.
- don't verify l_lru without ns_lock.

Signed-off-by: Yang Sheng <yang.sheng@intel.com>
Change-Id: I8bcbdef3cb72db241ad03c50f5ce2b968e3ee3e4
Reviewed-on: https://review.whamcloud.com/32309
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months agoLU-10855 llog: remove obsolete llog handlers 02/32202/2
John L. Hammond [Mon, 30 Apr 2018 14:08:48 +0000 (09:08 -0500)]
LU-10855 llog: remove obsolete llog handlers

Remove the obsolete llog RPC handling for cancel, close, and
destroy. Remove llog handling from ldlm_callback_handler(). Remove the
unused client side method llog_client_destroy().

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ieab44f3796971a7d3c65d6044e4c0be4afb4b508
Reviewed-on: https://review.whamcloud.com/32202
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months agoLU-10938 ptlrpc: Add WBC connect flag 41/32241/5
Oleg Drokin [Wed, 2 May 2018 07:03:38 +0000 (03:03 -0400)]
LU-10938 ptlrpc: Add WBC connect flag

It denotes ability of the node to understand additional
types of intent requests, exclusive metadata locks issued
to clients and server operations performed under such
locks while still held by clients.

Test-Parameters: trivial
Change-Id: I72c1ddfdf94edea3b357d82da6c410bc2d79a75c
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: https://review.whamcloud.com/32241
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
19 months agoLU-10945 ldlm: fix l_last_activity usage 33/32133/3
Alexander Boyko [Tue, 24 Apr 2018 07:06:42 +0000 (03:06 -0400)]
LU-10945 ldlm: fix l_last_activity usage

When a race happens between ldlm_server_blocking_ast() and
ldlm_request_cancel(), at_measured() is called with a wrong
value equal to the current time. Even worse, ldlm_bl_timeout() can
return current_time*1.5.
Before the time functions were fixed for 64-bit by LU-9019 (e920be681),
this race led to ETIMEDOUT in ptlrpc_import_delay_req() and
client eviction during blocking AST sending. The wrong type conversion
takes place in ptlrpc_send_limit_expired() at cfs_time_seconds().

We should not take cancels into account if the BLAST has not been sent,
because last_activity is then not properly initialised and this
destroys the AT completely.
The patch divides l_last_activity into the client l_activity and the
server l_blast_sent for better understanding. l_blast_sent is
used for blocking ASTs only, to measure the time between BLAST and
cancel request.

For example:
 server cancels blocked lock after 1518731697s
 waiting_locks_callback()) ### lock callback timer expired after 0s:
 evicting client
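
The effect can be shown with plain arithmetic. A minimal sketch in
Python, assuming that an unset timestamp is stored as 0 (the function
name is hypothetical and this is not the ldlm code):

```python
def blast_to_cancel_delay(now, l_blast_sent):
    """Seconds between sending the BLAST and receiving the cancel.

    Returns None when no BLAST was sent (l_blast_sent == 0, an assumed
    convention), so the caller can skip the adaptive timeout measurement.
    """
    if l_blast_sent == 0:
        return None
    return now - l_blast_sent


# Without the check, an unset timestamp yields the current time itself,
# which is consistent with the "after 1518731697s" message above.
naive_delay = 1518731697 - 0
```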

Signed-off-by: Alexander Boyko <c17825@cray.com>
Change-Id: I44962d2b3675b77e09182bbe062bdd78d6cb0af5
Cray-bug-id: LUS-5736
Reviewed-on: https://review.whamcloud.com/32133
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months agoLU-4684 xattr: add list support for remote object 26/31426/9
Lai Siyao [Sun, 21 Jan 2018 07:57:22 +0000 (15:57 +0800)]
LU-4684 xattr: add list support for remote object

XATTR_LIST may be issued to a remote object in directory migration,
add this support for OSP and OUT.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: I9681e149703de2837a04dc1448d1bd583659205d
Reviewed-on: https://review.whamcloud.com/31426
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
19 months agoLU-4684 mdt: improve directory stripe lock 25/31425/9
Lai Siyao [Sun, 21 Jan 2018 05:22:24 +0000 (13:22 +0800)]
LU-4684 mdt: improve directory stripe lock

A striped directory implies that the first stripe is
local and the others are remote, but this is not true for a migrating
directory because its stripes consist of both the original and
the newly created stripes.

This patch also puts striped directory master object locking and
stripe locking into one function called mdt_reint_striped_lock(),
which simplifies the locking code.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: I4724447e5b10c301b6799e1827f6d13a40876945
Reviewed-on: https://review.whamcloud.com/31425
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-11026 lustre-dkms should require patch or quilt 31/32431/3
Joe Grund [Wed, 16 May 2018 17:18:56 +0000 (13:18 -0400)]
LU-11026 lustre-dkms should require patch or quilt

Add patch requirement to lustre-dkms.spec.in
as it (or quilt) is needed for lustre-build-ldiskfs.

  - Add requires patch.

Change-Id: I640bae382511502c02a0237694c93c304047f339
Signed-off-by: Joe Grund <joe.grund@intel.com>
Reviewed-on: https://review.whamcloud.com/32431
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
20 months agoLU-10964 build: armv7 client build fixes 94/32194/8
Andrew Perepechko [Sat, 28 Apr 2018 08:32:03 +0000 (11:32 +0300)]
LU-10964 build: armv7 client build fixes

This commit is supposed to fix armv7 Lustre client
build, mostly 64-bit division related changes.

Change-Id: I93d83d577351c1a1053e39a162cb1e85fc4e8aa3
Signed-off-by: Andrew Perepechko <c17827@cray.com>
Reviewed-on: https://review.whamcloud.com/32194
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-7943 mdd: Move assignment after LASSERT() 76/32376/2
Arshad Hussain [Sat, 12 May 2018 08:43:54 +0000 (14:13 +0530)]
LU-7943 mdd: Move assignment after LASSERT()

This patch moves 'sname->ln_namelen' assignment call after LASSERT() call.
This avoids a case when 'sname' parameter is NULL and dereferencing the
NULL pointer would fault before it reaches LASSERT()

Change-Id: I68b07f7ca33fd21ee0599b7bb73d6e41546bd2d8
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/32376
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
20 months agoLU-10897 kernel: kernel upgrade RHEL7.5 [3.10.0-862.2.3.el7] 70/32370/5
Bob Glossman [Thu, 10 May 2018 14:46:35 +0000 (07:46 -0700)]
LU-10897 kernel: kernel upgrade RHEL7.5 [3.10.0-862.2.3.el7]

With this mod we switch our supported el7 version to RHEL 7.5

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: Iedcea9498591d15eab69187274e4c32c57879e4e
Reviewed-on: https://review.whamcloud.com/32370
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-11019 build: Update ZFS/SPL to 0.7.9 88/32388/2
Nathaniel Clark [Mon, 14 May 2018 19:06:45 +0000 (15:06 -0400)]
LU-11019 build: Update ZFS/SPL to 0.7.9

This updates the ZFS version to 0.7.9.

https://github.com/zfsonlinux/zfs/releases/tag/zfs-0.7.9

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: I9452a589d9dc719de7a63d3ed287dec8b6f7c0b6
Reviewed-on: https://review.whamcloud.com/32388
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
20 months agoLU-4423 libcfs: disable preempt while sampling processor id. 60/32360/3
NeilBrown [Sat, 12 May 2018 17:52:58 +0000 (13:52 -0400)]
LU-4423 libcfs: disable preempt while sampling processor id.

Calling smp_processor_id() without disabling preemption
triggers a warning (if CONFIG_DEBUG_PREEMPT).
I think the result of cfs_cpt_current() is only used as a hint for
load balancing, rather than as a precise and stable indicator of
the current CPU.  So it doesn't need to be called with
preemption disabled.

So disable preemption inside cfs_cpt_current() to silence the warning.

Linux-commit : dbeccabf5294e80f7cc9ee566746c42211bed736

Change-Id: Iaa930acc7a2633c0e40bcabbe6bd309a3d767325
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32360
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-9859 libcfs: rearrange placement of CPU partition management code. 59/32359/3
NeilBrown [Sat, 12 May 2018 13:51:30 +0000 (09:51 -0400)]
LU-9859 libcfs: rearrange placement of CPU partition management code.

Currently the code for cpu-partition tables lives in various places.
The non-SMP code is partly in libcfs/libcfs_cpu.h as static inlines,
and partly in lnet/libcfs/libcfs_cpu.c - some of the functions are
tiny and could well be inlines.

The SMP code is all in lnet/libcfs/linux/linux-cpu.c.

This patch moves all the trivial non-SMP functions into
libcfs_cpu.h as inlines, and all the SMP functions into libcfs_cpu.c
with the non-trivial !SMP code.

Now when you go looking for some function, it is easier to find both
versions together when neither is trivial.

There is no code change here - just code movement.

Linux-commit: 93aa2c2a5091bd47819a3ead4af70fb57fda5065

Change-Id: I5250a52cad576eaeec17de176a3ca45ad076c4b9
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32359
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-10997 build: add files to .gitignore 04/32304/2
James Simmons [Sat, 5 May 2018 18:00:30 +0000 (14:00 -0400)]
LU-10997 build: add files to .gitignore

To avoid accidentally adding files created during the build
process, add them to .gitignore. For Ubuntu 18 we also add
.cache.mk and *.o.ur-safe, which only show up on that
platform.

Test-Parameters: trivial

Change-Id: Ie1329e765f080cbdac1bd3efdd63f83a65d45989
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32304
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
20 months agoLU-10886 build: fix WARNING: modpost: missing MODULE_LICENSE() 03/32303/3
James Simmons [Sat, 5 May 2018 21:35:30 +0000 (17:35 -0400)]
LU-10886 build: fix WARNING: modpost: missing MODULE_LICENSE()

With newer kernels modpost will print a warning when no license is
provided. This could give false results if the kernel test sets
the -Werror flag. Update the LB_LANG_PROGRAM macro to include the
missing MODULE_LICENSE().

Test-Parameters: trivial

Change-Id: Ia21d0fa5ee6c224d05b7949540ef805d09d3c7c5
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32303
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-10906 checksum: enable/disable checksum correctly 95/32095/7
Emoly Liu [Fri, 20 Apr 2018 11:10:00 +0000 (19:10 +0800)]
LU-10906 checksum: enable/disable checksum correctly

There are three ways to set checksum support in Lustre. Their
order during client mount is:
- 1. configure --enable/disable-checksum: this (ENABLE_CHECKSUM)
  only affects the default mount option and is set in function
  client_obd_setup().
- 2. lctl set_param -P osc.*.checksums=0/1: when processing the llog,
  this value will be set by osc_checksum_seq_write().
- 3. mount option checksum/nochecksum: this will be checked in
  ll_options() and be set in client_common_fill_super()->
  obd_set_info_async().

This patch fixes one issue in 3. If the mount option
"-o checksum/nochecksum" is specified, the checksum setting is changed
accordingly, no matter what is set by "set_param -P" or the
default option; if no mount option is specified, the value
set by "set_param -P" is kept. Also, test_77k is added to
sanity.sh to verify this patch.

In addition, a minor initialization issue of cl_supp_cksum_types
is fixed: cl_supp_cksum_types should always be initialized,
whether or not checksums are enabled.
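
The intended precedence can be sketched as a small userspace function. This
is only an illustration of the ordering described above, not the client code:
the enum, the function name and the "unset" state are assumptions.

```c
#include <stdbool.h>

/* Hypothetical tri-state for a setting that may not have been given. */
enum cksum_opt { CKSUM_UNSET = -1, CKSUM_OFF = 0, CKSUM_ON = 1 };

/* Resolve the effective checksum setting in client mount order. */
static bool resolve_checksum(bool build_default,
			     enum cksum_opt set_param,
			     enum cksum_opt mount_opt)
{
	/* 1. configure-time default */
	bool enabled = build_default;

	/* 2. persistent "lctl set_param -P" value, if one was recorded */
	if (set_param != CKSUM_UNSET)
		enabled = (set_param == CKSUM_ON);

	/* 3. an explicit mount option overrides both of the above */
	if (mount_opt != CKSUM_UNSET)
		enabled = (mount_opt == CKSUM_ON);

	return enabled;
}
```

When mount_opt is CKSUM_UNSET the value from step 2 is left untouched, which
is the behavior this patch is meant to guarantee.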

Change-Id: I95d73122d800f5cd44b5fabb0cf00b5be0a35443
Signed-off-by: Emoly Liu <emoly.liu@intel.com>
Reviewed-on: https://review.whamcloud.com/32095
Reviewed-by: Yingjin Qian <qian@ddn.com>
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
20 months agoLU-9474 tests: remove unneeded hsm_set_param for raolu tests 29/31429/3
Quentin Bouget [Tue, 27 Feb 2018 15:22:08 +0000 (15:22 +0000)]
LU-9474 tests: remove unneeded hsm_set_param for raolu tests

A previous commit introduced the stack_trap() utility function.
That patch should have removed any duplicate cleanup actions but
missed a few in test_26{a,b,c,d} of sanity-hsm.

This commit removes them.

Test-Parameters: trivial testlist=sanity-hsm
Signed-off-by: Quentin Bouget <quentin.bouget@cea.fr>
Change-Id: I63e1df6610d0fbceb2f88a5b59b8263b8ccaf525
Reviewed-on: https://review.whamcloud.com/31429
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
20 months agoLU-739 utils: remove references to lustre 1.4 upgrade flag 42/27342/6
Gregoire Pichon [Sun, 13 May 2018 01:43:18 +0000 (21:43 -0400)]
LU-739 utils: remove references to lustre 1.4 upgrade flag

Remove all references to LDD_F_UPGRADE14 flag since we don't support
1.4 upgrade any more. Also we can remove CFG_F_COMPAT146.

Signed-off-by: Gregoire Pichon <gregoire.pichon@bull.net>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Change-Id: Ib0ac680646751f9372923875142ce04ea5c8ec53
Reviewed-on: https://review.whamcloud.com/27342
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-6399 lnet: socket cleanup 79/14179/8
Alexander Boyko [Mon, 7 May 2018 13:56:51 +0000 (09:56 -0400)]
LU-6399 lnet: socket cleanup

The ioctl request was designed to get all needed information
through a socket from user mode. But the same patterns, with tricks,
were used in the kernel by lib-socket.
The patch changes this behavior from a socket to a kernel socket.
For libcfs_sock_ioctl the tricks with the user mode call are removed
and kernel_sock_ioctl is used instead. This call handles SIOCGIFADDR
and SIOCGIFNETMASK. For SIOCGIFFLAGS we take the device flags directly.
The function libcfs_ipif_enumerate(), which used the SIOCGIFCONF command,
is totally rewritten to get device names without ioctl requests.

Signed-off-by: Alexander Boyko <c17825@cray.com>
Change-Id: Idf0da800a49dbefa419fc5fecaa9ee1bd4d85327
Reviewed-on: https://review.whamcloud.com/14179
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-11024 osd-zfs: properly detect ZFS dnode accounting 18/32418/3
Fan Yong [Wed, 16 May 2018 08:00:29 +0000 (16:00 +0800)]
LU-11024 osd-zfs: properly detect ZFS dnode accounting

Properly detect if native ZFS dnode accounting is available for
ZFS 0.7.x releases that do not contain ZFS project quota.  The
function signature changed after ZFS Project Quota was landed,
but we still need to check for the old function for 0.7.x.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Icd40e0bdaa0b7738e9aa761836167780843ebbe5
Reviewed-on: https://review.whamcloud.com/32418
Tested-by: Jenkins
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
20 months agoLU-10972 test: get client uuid correctly 89/32289/3
Emoly Liu [Fri, 4 May 2018 05:19:28 +0000 (13:19 +0800)]
LU-10972 test: get client uuid correctly

Correct sanity.sh test_0d script to get client uuid value only.

Test-Parameters:trivial testlist=sanity envdefinitions=ONLY=0d \
serverjob=lustre-b2_11 serverbuildno=2

Change-Id: I6167d9b9c3c3e86f2a5581f8ac4ccab8f1137ce7
Signed-off-by: Emoly Liu <emoly.liu@intel.com>
Reviewed-on: https://review.whamcloud.com/32289
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
20 months agoLU-10989 utils: correct lustre_rsync changelog clear logic 47/32247/2
John L. Hammond [Wed, 2 May 2018 14:43:07 +0000 (09:43 -0500)]
LU-10989 utils: correct lustre_rsync changelog clear logic

In the non-extended rename case of lr_replicate() copy the record
number from ext to info. Then remove the spurious rename record
handling from lr_clear_cl().

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I615ec2f384f5f9d7807156acb3ce66ac47ca1e77
Reviewed-on: https://review.whamcloud.com/32247
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-10978 utils: preserve lustre_rsync state 46/32246/2
John L. Hammond [Wed, 2 May 2018 14:09:05 +0000 (09:09 -0500)]
LU-10978 utils: preserve lustre_rsync state

In lustre_rsync, if debugging is enabled then we zero-out info and ext
in each iteration of the main loop of lr_replicate(). This may prevent
the consumed changelog entries from being cleared at the end of
lr_replicate().

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ibd089524ca6cd6f488fc3558882cc431fdf65de7
Reviewed-on: https://review.whamcloud.com/32246
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-10980 test: correct the version code check in sanityn.sh 39/32239/3
Emoly Liu [Wed, 2 May 2018 06:29:54 +0000 (14:29 +0800)]
LU-10980 test: correct the version code check in sanityn.sh

The following patch was landed to 2.11.50 instead of 2.10.58, so
the version code check in sanityn.sh test_77ja should be corrected:
- Lustre-commit: e0cdde123c14729a340cf937cf9580be7c9dd9c1
- Lustre-change: https://review.whamcloud.com/27608
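
Why the threshold matters can be shown with a short sketch. It assumes a
simple major/minor/patch packing; the helper names are hypothetical and the
real test scripts do this comparison in shell.

```c
#include <stdbool.h>

/* Pack major.minor.patch into one comparable integer (assumed layout). */
static unsigned int version_code(unsigned int major, unsigned int minor,
				 unsigned int patch)
{
	return (major << 16) | (minor << 8) | patch;
}

/* A server has the feature only if it is at least the landing version. */
static bool server_has_feature(unsigned int server, unsigned int landed)
{
	return server >= landed;
}
```

With the old threshold of 2.10.58, a 2.11.0 server would pass the check even
though the feature only landed in 2.11.50.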

Test-Parameters: trivial testlist=sanityn envdefinitions=ONLY=77ja \
serverjob=lustre-b2_11 serverbuildno=2

Change-Id: I619d5fd6fa2297663084d3b5b8bdbc1b4642cdc1
Signed-off-by: Emoly Liu <emoly.liu@intel.com>
Reviewed-on: https://review.whamcloud.com/32239
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-10979 test: correct the version code check in sanity-sec.sh 38/32238/3
Emoly Liu [Wed, 2 May 2018 05:57:19 +0000 (13:57 +0800)]
LU-10979 test: correct the version code check in sanity-sec.sh

The following patch was landed to 2.11.50 instead of 2.10.59, so
the version code check in sanity-sec.sh test_27a/b should be
corrected.
- Lustre-commit: 5b64d9fb0d5c5f292548c23e7841cc30f7a8423e
- Lustre-change: https://review.whamcloud.com/#/c/31450

Test-Parameters: trivial testlist=sanity-sec envdefinitions=ONLY=27 \
serverjob=lustre-b2_11 serverbuildno=2

Change-Id: I641ada8fb75d6347873a86ef453ebafd029876af
Signed-off-by: Emoly Liu <emoly.liu@intel.com>
Reviewed-on: https://review.whamcloud.com/32238
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
20 months agoLU-9325 libcfs: handle complex strings in cfs_str2num_check 17/32217/3
James Simmons [Tue, 8 May 2018 13:30:15 +0000 (09:30 -0400)]
LU-9325 libcfs: handle complex strings in cfs_str2num_check

Originally the function cfs_str2num_check used simple_strtoul
but has been updated to kstrtoul. The string passed into
cfs_str2num_check can be very complex, for example we could
have 10.37.202.[59-61]. When simple_strtoul was used, the first
number, up to the first non-digit character, could be extracted,
but testing showed that kstrtoul will not return any value if
it detects any non-digit character. Because of this change in
behavior a different approach is needed to handle these types
of complex strings. The use of sscanf was investigated to see
if it could be used to extract numbers from the passed in
string, but unlike its glibc counterpart the kernel version
also just reported an error with no results if a non-digit value
in the string was encountered. Another possible approach would
be to use __parse_int directly but that class of functions is
not exported by the kernel. So the approach in this patch is
to scan the string passed in for the first non-digit character
and replace that character with a '\0' so kstrtoul can be used.
Once completed the original character is restored. We also
restore an original behavior that was removed: return 0 when
we encounter any non-digit character before the nob count.
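
The terminate, parse and restore idea can be sketched in userspace C. This
assumes strtoul() as a stand-in for kstrtoul(), a writable buffer, and
nob <= strlen(str). The function name is hypothetical and the sketch does not
try to reproduce the exact return conventions of cfs_str2num_check.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse the leading run of decimal digits within the first 'nob' bytes.
 * Returns 1 and stores the value on success, 0 if 'str' does not start
 * with a digit or the value is out of range.
 */
static int leading_number(char *str, size_t nob, unsigned long *num)
{
	size_t i = 0;
	char saved;
	char *end;

	/* Find the first non-digit character. */
	while (i < nob && isdigit((unsigned char)str[i]))
		i++;
	if (i == 0)
		return 0;

	/* Temporarily terminate the string so a strict parser accepts it. */
	saved = str[i];
	str[i] = '\0';
	errno = 0;
	*num = strtoul(str, &end, 10);
	/* Put the original character back before returning. */
	str[i] = saved;

	return errno == 0 && end == str + i;
}
```

For the string 10.37.202.[59-61] this yields 10 and leaves the buffer
unchanged once the call returns.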

Linux-commit : 3ad6152d766039cb8ffd8633d971fb79402e5464

Change-Id: If73d605499e2f05224a14417b0029036db38c8ba
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32217
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-10907 tests: improve nodemap_test_setup() 18/32018/5
Emoly Liu [Tue, 17 Apr 2018 03:08:04 +0000 (11:08 +0800)]
LU-10907 tests: improve nodemap_test_setup()

Do wait_nm_sync() after modifying default.admin_nodemap, otherwise
it might cause the latter sync to fail.

Also, this patch checks the number of clients for sanity-sec.sh
test_23a, 2 clients at least for nodemap c0 and c1.

Test-Parameters: trivial testlist=sanity-sec

Change-Id: I7c7d5929b2f2b5aaaa0e7e884fa296fdf3cc1b57
Signed-off-by: Emoly Liu <emoly.liu@intel.com>
Reviewed-on: https://review.whamcloud.com/32018
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Saurabh Tandan <saurabh.tandan@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-10855 llog: remove llog_cancel() 95/31795/5
John L. Hammond [Tue, 27 Mar 2018 17:17:10 +0000 (12:17 -0500)]
LU-10855 llog: remove llog_cancel()

Remove llog_cancel() and replace its sole use with a direct call to
llog_changelog_cancel(). Simplify the prototype of
llog_changelog_cancel() according to its use.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I32f5adf34ca10b918e5860e03ef0d8028c941697
Reviewed-on: https://review.whamcloud.com/31795
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-10699 hsm: add local object storage to MDTs 58/31758/2
John L. Hammond [Fri, 23 Mar 2018 17:00:51 +0000 (12:00 -0500)]
LU-10699 hsm: add local object storage to MDTs

Add local object storage (LOS) setup and cleanup to MDTs. LOS will be
used for the HSM actions index.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I1f9590350beeb49ebf14bfc602923c4cc8f6a15f
Reviewed-on: https://review.whamcloud.com/31758
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-9325 obdclass: handle strings correctly in lmd_find_delimiter 58/26558/7
James Simmons [Tue, 1 May 2018 15:42:28 +0000 (11:42 -0400)]
LU-9325 obdclass: handle strings correctly in lmd_find_delimiter

The feedback from pushing the lmd_find_delimiter() work
upstream was not to directly parse the string data passed in.

Change the return value to a bool and return true when the
character is found. Currently, even though this function is
named lmd_find_delimiter(), it returns 1 when nothing is found,
which is counterintuitive. Use strcspn() to determine the
position where the first delimiter is found. Add a test to
make sure the mount string is valid.
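
A minimal userspace sketch of a strcspn() based search with a bool result
follows. The function name, the delimiter argument and the output parameter
are assumptions for illustration, not the actual lmd_find_delimiter()
prototype.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if 'buf' contains one of the characters in 'delims'.
 * If 'endh' is non-NULL it is pointed at the first delimiter found.
 */
static bool find_delimiter(const char *buf, const char *delims,
			   const char **endh)
{
	size_t pos;

	if (buf == NULL || delims == NULL)
		return false;

	/* Length of the initial segment that contains no delimiter. */
	pos = strcspn(buf, delims);
	if (buf[pos] == '\0')
		return false;	/* reached the end: nothing found */

	if (endh != NULL)
		*endh = buf + pos;
	return true;
}
```

Returning true on success matches what the function name suggests, and
strcspn() avoids a hand-written scan over the string.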

Change-Id: I7ef53f29a6d3284acd01225c115712d3674e8435
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/26558
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-9230 ldlm: speed up preparation for list of lock cancel 27/26327/31
Yang Sheng [Mon, 25 Sep 2017 13:01:02 +0000 (21:01 +0800)]
LU-9230 ldlm: speed up preparation for list of lock cancel

Keeping the skipped locks in the lru list causes serious
contention for ns_lock, since we have to traverse them
every time in ldlm_prepare_lru_list(). So we
use a cursor to record the position of the last accessed
lock in the lru list.
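
The cursor idea can be sketched with a simplified singly linked list. All
structure and function names here are hypothetical, and the real
ldlm_prepare_lru_list() has locking and cancel policies that are not modeled.

```c
#include <stdbool.h>
#include <stddef.h>

struct lock_sketch {
	struct lock_sketch *next;	/* next lock in LRU order */
	bool                busy;	/* must be skipped for now */
	bool                cancel;	/* selected for cancellation */
};

struct ns_sketch {
	struct lock_sketch *lru_head;
	struct lock_sketch *cursor;	/* where the previous scan stopped */
};

/*
 * Select up to 'max' locks for cancel, resuming at the cursor instead of
 * the list head. Returns the number of locks examined.
 */
static int prepare_cancel_list(struct ns_sketch *ns, int max, int *selected)
{
	struct lock_sketch *lock = ns->cursor ? ns->cursor : ns->lru_head;
	int examined = 0;

	*selected = 0;
	while (lock != NULL && *selected < max) {
		examined++;
		if (!lock->busy && !lock->cancel) {
			lock->cancel = true;
			(*selected)++;
		}
		lock = lock->next;
	}
	/* Remember the position; NULL means the next scan starts over. */
	ns->cursor = lock;
	return examined;
}
```

Because the second call resumes at the cursor, the skipped locks at the
front of the list are not walked again on every pass.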

Change-Id: Ibda36a90e54076cb785a65910b34300639b3e140
Signed-off-by: Yang Sheng <yang.sheng@intel.com>
Signed-off-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-on: https://review.whamcloud.com/26327
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Jenkins
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-10898 tests: enable conf-sanity 32a/32d 91/31991/7
Nathaniel Clark [Fri, 13 Apr 2018 16:27:22 +0000 (12:27 -0400)]
LU-10898 tests: enable conf-sanity 32a/32d

Stop the zed service because it holds the zfs module open.
Export zpools before trying to rmmod zfs.

Test-Parameters: trivial testlist=conf-sanity
Test-Parameters: trivial testlist=conf-sanity mdtfilesystemtype=zfs ostfilesystemtype=zfs
Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: Id1b891cadb91d9e3631a2d067c9d76a2965c34ff
Reviewed-on: https://review.whamcloud.com/31991
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-10843 mgs: allow snapshot after MGS remount 79/31779/9
Nathaniel Clark [Mon, 16 Apr 2018 21:10:49 +0000 (17:10 -0400)]
LU-10843 mgs: allow snapshot after MGS remount

If the MGS is unmounted/mounted without the MDS reconnecting,
the fsdb FSNAME-barrier would not be created.

This change allows mgs_barrier_freeze (called from snapshot_create)
to attempt to create the required fsdb.

This adds a test to sanity-lsnapshot.sh for this issue.

Test-Parameters: testlist=sanity-lsnapshot mgtfilesystemtype=zfs ostfilesystemtype=zfs mdtfilesystemtype=zfs combinedmdsmgs=false ostcount=2 standalonemgs=true
Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: I2432cc0bdaddb07f024744065ca2ced77288fd7b
Reviewed-on: https://review.whamcloud.com/31779
Tested-by: Jenkins
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-10472 osd-ldiskfs: add T10PI support for BIO 92/30792/7
Li Xi [Mon, 8 Jan 2018 01:09:46 +0000 (10:09 +0900)]
LU-10472 osd-ldiskfs: add T10PI support for BIO

This patch enables the data integrity check in osd-ldiskfs when submitting
a bio.

A fault injection mechanism is added to make sure the data integrity
check works well. On an OST with the T10PI feature enabled, the following
results are expected:

$ lctl set_param fail_loc=0x243
fail_loc=0x243
$ dd if=/dev/zero of=/mnt/lustre/file bs=1048576 count=100
dd: error writing '/mnt/lustre/file': Invalid or incomplete
multibyte or wide character
34+0 records in
33+0 records out
34603008 bytes (35 MB) copied, 0.510675 s, 67.8 MB/s

When doing fault injection, the write operation will wait until the
value is returned from BIO. Otherwise, the error number may not
be returned to the application.

This implies a problem: because of the async submit of BIO, even if the
OST has T10PI enabled, the application might not be able to get an error
notification when data corruption happens. However, there is nothing
we can do to improve this (unless write performance is not important),
because async commit is essential for good performance.

Change-Id: I76cc14b42feed835158100d35f65aedae0d79a5c
Signed-off-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/30792
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
20 months agoNew tag 2.11.52 2.11.52 v2_11_52 v2_11_52_0
Oleg Drokin [Wed, 16 May 2018 18:55:53 +0000 (14:55 -0400)]
New tag 2.11.52

Change-Id: Ie2c8d2a2d9d655f0646324780b8e02f0d031c4ba
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-10805 libcfs: timer_setup() API changes 90/31790/4
Li Dongyang [Sun, 6 May 2018 14:28:58 +0000 (10:28 -0400)]
LU-10805 libcfs: timer_setup() API changes

Linux kernel 4.15 replaced setup_timer() with the new
interface timer_setup().
Introduce cfs wrappers to handle the API changes.
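
The shape of the API change can be mimicked in userspace: with
timer_setup() the callback receives a pointer to the timer itself, and the
enclosing object is recovered from it, instead of being passed as an
unsigned long data argument. The sketch below uses stand-in types and names;
it is not the cfs wrapper code from this patch.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's struct timer_list. */
struct timer_sketch {
	void (*function)(struct timer_sketch *);
};

/* Same idea as the kernel's from_timer(): recover the enclosing object. */
#define sketch_from_timer(type, timer, field) \
	((type *)((char *)(timer) - offsetof(type, field)))

/* Hypothetical object that embeds a timer. */
struct pinger_sketch {
	int                 fired;
	struct timer_sketch timer;
};

/* New-style setup: only the timer and its callback are registered. */
static void sketch_timer_setup(struct timer_sketch *t,
			       void (*fn)(struct timer_sketch *))
{
	t->function = fn;
}

/* New-style callback: gets the timer, derives the owner from it. */
static void pinger_expired(struct timer_sketch *t)
{
	struct pinger_sketch *p =
		sketch_from_timer(struct pinger_sketch, t, timer);

	p->fired++;
}
```

A compatibility wrapper for older kernels has to bridge these two callback
signatures, which is why the change cannot be a plain rename.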

Linux-commit: e99e88a9d2b067465adaa9c111ada99a041bef9a

Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: Ib79495f9ab7e955d6f72f1e9390cec0e23e2d641
Reviewed-on: https://review.whamcloud.com/31790
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-10560 llite: remove assigning f_version in ll_readir. 68/32268/2
Yang Sheng [Thu, 3 May 2018 06:04:40 +0000 (14:04 +0800)]
LU-10560 llite: remove assigning f_version in ll_readir.

Backport upstream patch:
 commit: 8ec426c7019ed9600d9dc0cf758445adcdbfc14e
         lustre: don't set f_version in ll_readdir

Signed-off-by: Yang Sheng <yang.sheng@intel.com>
Change-Id: I296ce2cf8a7f4f91cf051f281206a7af60b7fb38
Reviewed-on: https://review.whamcloud.com/32268
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-10988 lfsck: load object attr when prepare LFSCK request 45/32245/4
Fan Yong [Wed, 2 May 2018 16:00:40 +0000 (00:00 +0800)]
LU-10988 lfsck: load object attr when prepare LFSCK request

It will avoid empty attrs in LFSCK request (lfsck_namespace_req).
The patch also shows invalid mode for dt_mode_to_dft() for debug.

Other cleanup for lfsck_namespace_striped_dir_rescan().

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I2874160d9a0c9e3084d0d3d7f365940108c82018
Reviewed-on: https://review.whamcloud.com/32245
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-10944 kernel: kernel update [SLES12 SP3 4.4.126-94.22] 39/32139/3
Bob Glossman [Mon, 23 Apr 2018 22:25:31 +0000 (15:25 -0700)]
LU-10944 kernel: kernel update [SLES12 SP3 4.4.126-94.22]

Update target, kernel_config, and ldiskfs files for the new version.
ldiskfs for sles12sp2 and sles12sp3 are no longer identical,
so revise some build files for sles12sp2 here too.

Test-Parameters: clientdistro=sles12sp3 testgroup=review-ldiskfs \
  mdsdistro=sles12sp3 ossdistro=sles12sp3 \
  mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: If554b24644f763462002d81309f8edfa016210e8
Reviewed-on: https://review.whamcloud.com/32139
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-10924 tests: conf-sanity test_116 server version check 50/32050/2
Artem Blagodarenko [Wed, 18 Apr 2018 15:52:49 +0000 (18:52 +0300)]
LU-10924 tests: conf-sanity test_116 server version check

Patch "LU-10520 mkfs: enable extents for big MDT" adds a test
that checks if extents are enabled with the 64bit option. But
Lustre FS before this patch sets the ^extents option, so interop
testing fails.

This patch adds version check for MDT to conf-sanity test_116.

Signed-off-by: Artem Blagodarenko <artem.blagodarenko@gmail.com>
Cray-bug-id: LUS-5812
Change-Id: Ia42597174944ec1627dfe0f6a7cf8149e674cecc
Reviewed-on: https://review.whamcloud.com/32050
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-10560 osd: bi_bdev is replaced by bi_disk 75/31975/6
Yang Sheng [Wed, 11 Apr 2018 08:15:20 +0000 (16:15 +0800)]
LU-10560 osd: bi_bdev is replaced by bi_disk

The member bi_bdev is replaced by bi_disk in struct bio.
 - upstream commit: 74d46992e0d9dee7f1f376de0d56d31614c8a17a
   block:  replace bi_bdev with a gendisk pointer and partitions index

Signed-off-by: Yang Sheng <yang.sheng@intel.com>
Change-Id: Ice1fb53f8371fb744af5dbac6c076ce817770213
Reviewed-on: https://review.whamcloud.com/31975
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
20 months agoLU-10857 tests: allow to disable project quotas 96/31796/5
Elena Gryaznova [Tue, 27 Mar 2018 17:59:55 +0000 (20:59 +0300)]
LU-10857 tests: allow to disable project quotas

Project quotas are disabled by default, but sanity-quota.sh
always enables them if project quota is supported.
This patch adds the possibility to disable project quotas
for regular user/group quota testing by using
ENABLE_PROJECT_QUOTAS=false.

Test-Parameters: trivial testlist=sanity-quota envdefinitions=ENABLE_PROJECT_QUOTAS=false
Signed-off-by: Elena Gryaznova <c17455@cray.com>
Change-Id: I48176898fd940d66d0ebee4ef085a0bcece02ee9
Reviewed-on: https://review.whamcloud.com/31796
Tested-by: Jenkins
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-4423 ptlrpc: use workqueue for pinger 28/31728/3
Dmitry Eremin [Thu, 22 Mar 2018 17:02:08 +0000 (20:02 +0300)]
LU-4423 ptlrpc: use workqueue for pinger

lustre has a "pinger" kthread which periodically pings peers to ensure
all hosts are functioning.

This can more easily be done using a work queue.

The SVC_EVENT functionality to wake up the thread can be replaced with
mod_delayed_work().

Change-Id: I6064e9baea186323ab2eb0cca61ed23fcf8b55f7
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-on: https://review.whamcloud.com/31728
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-8066 mdc: move mdc-specific procfs files to sysfs 61/30961/6
Oleg Drokin [Thu, 12 Apr 2018 02:15:34 +0000 (22:15 -0400)]
LU-8066 mdc: move mdc-specific procfs files to sysfs

This moves max_rpcs_in_flight and max_pages_per_rpc to
/sys/fs/lustre/mdc/.../

Linux-commit : 2ee26222d497cf75eb9a02f324dee8b6b16e1e67

Add better error handling.

Change-Id: If6b50d2a3ef26325df0018ffb990a74ce2b75c10
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/30961
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Sonia Sharma <sonia.sharma@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
20 months agoLU-8066 fld: move all files from procfs to debugfs 51/26651/14
James Simmons [Wed, 18 Apr 2018 02:45:34 +0000 (22:45 -0400)]
LU-8066 fld: move all files from procfs to debugfs

Port of the upstream client work done by Dmitry Eremin.

Linux-commit: 827650494fbe9390052502d0498c8b90b2e329ec

Move all server side procfs handling to debugfs.

Change-Id: I6b0311d148576dec1ab1ef7ca1afce0ab2e4ed3b
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Signed-off-by: Dmitry Eremin <dmiter4ever@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/26651
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>