Whamcloud - gitweb
fs/lustre-release.git
3 years ago LU-8364 ldiskfs: fixes for failover mode for RHEL7.3 77/27077/4
Yang Sheng [Thu, 11 May 2017 17:57:23 +0000 (01:57 +0800)]
LU-8364 ldiskfs: fixes for failover mode for RHEL7.3

When ldiskfs runs in failover mode with read-only disk, it
may lose part of allocation updates and fail while mounting
fs due to group descriptor checks before journal replay.
Don't produce panics with on-disk checks in read-only mode.

Include ext4-dont-check-before-replay and ext4-dont-check-in-ro
patches in the RHEL7.3 series.

The check-ro patch should be ported to RHEL7.3.

Signed-off-by: Yang Sheng <yang.sheng@intel.com>
Change-Id: Icd4b899f9c48040c2453c4d759149e323fa33e18
Reviewed-on: https://review.whamcloud.com/27077
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-9413 llite: llite.stat_blocksize param for fixed st_blksize 69/26869/11
Andrew Perepechko [Thu, 27 Apr 2017 16:01:23 +0000 (19:01 +0300)]
LU-9413 llite: llite.stat_blocksize param for fixed st_blksize

llite.stat_blocksize is added to allow configurable st_blksize
for stat(2). The latter is treated incorrectly by some
applications. For example, glibc pre-2.25 uses this value for
stdio buffering which completely ruins performance with random
reads.

The patch changes the behaviour of getattr rather than inode
initialization so that a change of the setting takes effect
immediately, without the need to reclaim existing inodes.
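
As a rough illustration of why st_blksize matters to applications, the
sketch below reads the value that stat(2) reports and uses it as a read
buffer size, which is roughly what a stdio-style library does. The helper
names and the temporary file are assumptions made for this example; they
are not part of the patch.

```python
import os
import tempfile


def preferred_io_size(path):
    """Return the st_blksize value that stat(2) reports for path."""
    return os.stat(path).st_blksize


def read_in_preferred_chunks(path):
    """Read a file using st_blksize as the buffer size, as a stdio-like library might."""
    chunk = preferred_io_size(path)
    total = 0
    # buffering=0 gives raw, unbuffered reads so the chunk size is what is requested.
    with open(path, "rb", buffering=0) as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            total += len(data)
    return total


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"x" * 10000)
        tmp.flush()
        print(preferred_io_size(tmp.name), read_in_preferred_chunks(tmp.name))
```

A larger st_blksize means larger requests per read, which helps streaming
reads but can hurt random reads of small records.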

The patch is similar to the patch from bz # 12739 by Aurelien
Degremont.

Change-Id: Ic6ab3fea40940892b740b8e87347dbb361619e8b
Signed-off-by: Andrew Perepechko <andrew.perepechko@seagate.com>
Reviewed-on: https://review.whamcloud.com/26869
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@seagate.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-9140 nrs: calculate the measured rate according to dd runtime 10/27110/4
Qian Yingjin [Mon, 15 May 2017 14:04:48 +0000 (22:04 +0800)]
LU-9140 nrs: calculate the measured rate according to dd runtime

This patch helps calculate the measured IOPS of dd according
to the dd runtime, instead of obtaining the result from the
performance data in the dd output, whose value changes according
to its unit: 'MB/s', 'GB/s' or 'kB/s'.
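
The idea of deriving the rate from the runtime rather than from a
formatted throughput string can be sketched as follows. This is a minimal
illustration, not the test-framework code; the function names are
hypothetical.

```python
import time


def measured_rate(operations, elapsed_seconds):
    """Return operations per second from a count and a wall-clock runtime."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return operations / elapsed_seconds


def timed(func):
    """Run func() and return (result, elapsed seconds) using a monotonic clock."""
    start = time.monotonic()
    result = func()
    return result, time.monotonic() - start
```

Because the result is computed from a known operation count and a measured
duration, it does not depend on which unit a tool chooses to print.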

Change-Id: Iace926d5c001f20b1c1d89ad6124d14a80316e86
Test-Parameters: trivial testlist=sanityn,sanityn,sanity
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/27110
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
3 years ago LU-6707 tests: Add ability to skip tests in POSIX 12/27012/3
James Nunez [Tue, 9 May 2017 16:36:27 +0000 (10:36 -0600)]
LU-6707 tests: Add ability to skip tests in POSIX

Although the POSIX test suite only has a single sub test,
there are times when you may need to exclude this
test from being run.

We need to add the ability to skip subtests in the POSIX
test suite.

Test-Parameters: trivial
Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: Idb051bc7ced02c9f09190874b5e7dacb3c6ad6ce
Reviewed-on: https://review.whamcloud.com/27012
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Saurabh Tandan <saurabh.tandan@intel.com>
Reviewed-by: Wei Liu <wei3.liu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-2049 grant: Fix grant interop with pre-GRANT_PARAM clients 53/25853/14
Nathaniel Clark [Tue, 7 Mar 2017 13:37:46 +0000 (08:37 -0500)]
LU-2049 grant: Fix grant interop with pre-GRANT_PARAM clients

Fix a performance regression with pre-GRANT_PARAM clients.
This patch brings the grant allocation logic back into line with how
servers prior to GRANT_PARAM support allocated grant space.

Test-Parameters: clientdistro=el6.6 clientjob=lustre-b2_7 clientbuildno=29 testlist=sanity-quota mdscount=1 mdtcount=1 ostcount=2 envdefinitions=SANITY_QUOTA_EXCEPT="36"
Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: I63fd5b91b499936b74d2dd21c1215e0b006af085
Reviewed-on: https://review.whamcloud.com/25853
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-7988 hsm: change the cdt event flags to a simple boolean 89/22789/14
Frank Zago [Tue, 27 Sep 2016 16:24:09 +0000 (12:24 -0400)]
LU-7988 hsm: change the cdt event flags to a simple boolean

Change the coordinator flag into a boolean event since only one bit is
now used. Do not include lustre_net.h as it's not necessary anymore.

Test-Parameters: trivial testlist=sanity-hsm
Signed-off-by: frank zago <fzago@cray.com>
Change-Id: I15dc10125af97f6244c98b421a262f320b79158b
Reviewed-on: https://review.whamcloud.com/22789
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago New tag 2.9.58 2.9.58 v2_9_58 v2_9_58_0
Oleg Drokin [Tue, 23 May 2017 05:46:00 +0000 (01:46 -0400)]
New tag 2.9.58

Change-Id: Ifb7a33a06cb953de0472693cd7d892a7685d8983
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-9303 doc: man pages for snapshot components 53/26453/5
Fan Yong [Sat, 13 May 2017 23:39:18 +0000 (07:39 +0800)]
LU-9303 doc: man pages for snapshot components

Add the separate lctl-barrier and lctl-snapshot man
pages, and the lctl-fork-lcfg and lctl-erase-lcfg sections in
the lctl man page. Also do some other small cleanups.

Test-parameters: trivial

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ied60b824afcfab5d2d030f12d39c9aa1bab0969a
Reviewed-on: https://review.whamcloud.com/26453
Tested-by: Jenkins
Reviewed-by: Joseph Gmitter <joseph.gmitter@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-4423 selftest: use jiffies_to_*() instead of cfs_duration_usec 19/27019/3
Arnd Bergmann [Tue, 9 May 2017 19:06:43 +0000 (15:06 -0400)]
LU-4423 selftest: use jiffies_to_*() instead of cfs_duration_usec

The cfs_duration_usec() function has a timeval as its output, which we
want to avoid in general because of the y2038 problem.

There are only two locations remaining in lustre, so we can for now
replace one with jiffies_to_timeval(), which is a generic kernel function
that does the same thing, the other can just use jiffies_to_usecs()
and completely avoid the timeval.

This is not a full solution yet, but it's a small step that lets us
build a larger portion of lustre without this reference to timeval in
a header file, and avoid triggering automated checking tools that want
to warn about timeval.
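
The arithmetic behind converting a tick count to microseconds can be
sketched as below. This is only a model of the simple case where the tick
rate divides one million evenly; the `hz` parameter stands in for the
kernel's HZ constant and the function is written for illustration, not
copied from the kernel.

```python
USEC_PER_SEC = 1_000_000


def jiffies_to_usecs(jiffies, hz):
    """Convert a jiffies count to microseconds for a tick rate of hz per second."""
    if hz <= 0 or USEC_PER_SEC % hz != 0:
        raise ValueError("this sketch only handles HZ values that divide 1,000,000")
    # Each tick lasts USEC_PER_SEC / hz microseconds.
    return jiffies * (USEC_PER_SEC // hz)
```

The result is a plain integer count of microseconds, so no seconds and
microseconds pair (and no timeval) is needed to carry it.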

Linux-commit: 70513c5d17b9812cc218e8b4c7826ebb5f375d9a

Test-Parameters: trivial testlist=lnet-selftest

Change-Id: If39f4d4857a2b3210bb0dc634b8bb42530df83dc
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/27019
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-9412 lfs: A invalid memory write in llapi_layout_to_lum 58/26858/4
Bobi Jam [Thu, 27 Apr 2017 10:29:09 +0000 (18:29 +0800)]
LU-9412 lfs: A invalid memory write in llapi_layout_to_lum

When lum is realloc(), the comp_v1 needs to be updated, otherwise
it could point to the old invalid memory area.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: I4493e5f13bce22dae07200bada14ba2349635890
Reviewed-on: https://review.whamcloud.com/26858
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-9330 osp: make variables match proc tunables 71/26571/7
Andreas Dilger [Tue, 4 Apr 2017 23:21:21 +0000 (17:21 -0600)]
LU-9330 osp: make variables match proc tunables

Make the osp_syn_* and osp_max_rpcs_* function names consistent with
the tunable names in proc so that they are easier to find.

osp_max_rpcs_in_prog_seq_show()->osp_max_rpcs_in_progress_seq_show()
osp_max_rpcs_in_prog_seq_write()->osp_max_rpcs_in_progress_seq_write()
osp_syn_changes_seq_show()->osp_sync_changes_seq_show()
osp_syn_changes_seq_write()->osp_sync_changes_seq_write()
osp_syn_in_flight_seq_show()->osp_sync_rpcs_in_flight_seq_show()
osp_syn_in_prog_seq_show()->osp_sync_rpcs_in_progress_seq_show()

This entails renaming the osp_syn_* variables to be consistent with the
function names, so change _syn_ -> _sync_, _prog_ -> _progress_,
and _rpc_ -> _rpcs_.

Make osp_sync_check_for_work() a proper function rather than having
both a macro and __osp_sync_check_for_work() as a function that just
calls the macro.

Remove unused field opd_syn_sync_in_progress.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I537e04fdce7eea194f1a9567b6e5c3ccee2cab07
Reviewed-on: https://review.whamcloud.com/26571
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Emoly Liu <emoly.liu@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-6401 uapi: turn lustre_param.h into a proper UAPI header 25/24325/13
James Simmons [Wed, 3 May 2017 20:00:21 +0000 (16:00 -0400)]
LU-6401 uapi: turn lustre_param.h into a proper UAPI header

Move all the kernel specific function prototypes from
lustre_param.h into obd_config.h which is a kernel only
header. The inline functions lustre_is_*_valid are used
only by user land so we can remove them. Remove the user
land error checking with lustre_is_*_valid() since it's
the job of the kernel to validate the data passed in.
The libcfs.h header shouldn't be exposed to user land
so remove it.

Change-Id: I6b6b4fe8f4d6799608c0e74318afecb85168ad54
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/24325
Tested-by: Jenkins
Reviewed-by: Ben Evans <bevans@cray.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-9019 libcfs: migrate watchdog to 64 bit time 94/26994/2
James Simmons [Mon, 8 May 2017 20:30:12 +0000 (16:30 -0400)]
LU-9019 libcfs: migrate watchdog to 64 bit time

The watchdog computes timestamps from timeval, which overflows
in 2038 on 32-bit systems. This changes the results output to
timespec64 type to avoid the overflow. The lcw_last_touched
field is changed to ktime_t since better than seconds resolution
is wanted. For lcw_last_watchdog it is changed to time64_t since
we only care about seconds precision. Both changes will avoid
the 2038 overflow issue and the HZ variability across platforms
for jiffies.
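
The 2038 overflow mentioned above comes from storing seconds since the
epoch in a signed 32-bit integer. The sketch below models that wraparound
in Python; the helper names are made up for the example and nothing here
is taken from the libcfs code.

```python
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1


def wrap_int32(value):
    """Reduce value to the range of a signed 32-bit integer (two's complement)."""
    return (value + 2**31) % 2**32 - 2**31


def last_32bit_second():
    """Return the last UTC instant representable as a signed 32-bit seconds count."""
    return datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)


if __name__ == "__main__":
    print(last_32bit_second().isoformat())
    print(wrap_int32(INT32_MAX + 1))  # one second later wraps to a negative value
```

A 64-bit seconds type does not wrap at this point, which is the motivation
for moving to time64_t and ktime_t.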

Change-Id: If81f60a408db956540c563fc695f729cd67cdd9e
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/26994
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-8659 tests: use runcon in sanity-selinux 62/23962/14
Sebastien Buisson [Fri, 25 Nov 2016 16:49:08 +0000 (17:49 +0100)]
LU-8659 tests: use runcon in sanity-selinux

In order to switch to other SELinux context, use runcon
instead of 'ssh user@localhost'.
This requires the SELinux policy to allow transitions from
unconfined_t to user_t and guest_t:
allow unconfined_r guest_r;
allow unconfined_r user_r;

Test-Parameters: trivial clientselinux testlist=sanity-selinux
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I8f45dfe71d95d03af0c3577f46c91b47232d958a
Reviewed-on: https://review.whamcloud.com/23962
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Saurabh Tandan <saurabh.tandan@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-9491 llite: Handle multi-vec append write correctly 86/27086/3
Jinshan Xiong [Fri, 12 May 2017 17:13:00 +0000 (13:13 -0400)]
LU-9491 llite: Handle multi-vec append write correctly

http://review.whamcloud.com/20256 cleaned up LLITE code by using
iov_iter for old and new kernels. However, it introduced a bug to
append write with multiple iovec buffers where each buffer was written
separately therefore each of them saw the same file size.

Append write with multiple buffers has to be performed atomically.
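
From user space, the behaviour being described corresponds to a single
writev(2) call on a descriptor opened with O_APPEND: all buffers should
land contiguously at the end of the file. The following is a small sketch
of that usage on a local file; `append_vectors` is a name chosen for this
example, and `os.writev` is only available on Unix-like systems.

```python
import os
import tempfile


def append_vectors(path, buffers):
    """Append all buffers with a single writev(2) call on an O_APPEND descriptor."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND)
    try:
        # Returns the total number of bytes written across all buffers.
        return os.writev(fd, buffers)
    finally:
        os.close(fd)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.write(fd, b"abc")
    os.close(fd)
    append_vectors(path, [b"de", b"fg"])
    with open(path, "rb") as f:
        print(f.read())
    os.remove(path)
```

If each buffer were instead written separately against the same starting
file size, later buffers could overwrite earlier ones rather than follow
them.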

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: I89d59e924c7029fedc096d724a946763c7f7006d
Reviewed-on: https://review.whamcloud.com/27086
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-9384 ldiskfs: port upstream patches for changing extra isize 45/27045/4
Yang Sheng [Wed, 10 May 2017 17:24:06 +0000 (01:24 +0800)]
LU-9384 ldiskfs: port upstream patches for changing extra isize

Port 5 upstream patches for changing extra isize as below:

commit d0141191a20289f8955c1e03dad08e42e6f71ca9
"ext4: fix xattr shifting when expanding inodes"
commit 418c12d08dc64a45107c467ec1ba29b5e69b0715
"ext4: fix xattr shifting when expanding inodes part 2"
commit 443a8c41cd49de66a3fda45b32b9860ea0292b84
"ext4: properly align shifted xattrs when expanding inodes"
commit e3014d14a81edde488d9a6758eea8afc41752d2d
"ext4: fixup free space calculations when expanding inodes"
commit 94405713889d4a9d341b4ad92956e4e2ec8ec2c2
"ext4: replace bogus assertion in ext4_xattr_shift_entries()"

Signed-off-by: Yang Sheng <yang.sheng@intel.com>
Change-Id: I01414bcc91d8f57ca72281916d35536d3926e570
Reviewed-on: https://review.whamcloud.com/27045
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-9472 lnd: Fix FastReg map/unmap for MLX5 15/27015/2
Doug Oucharek [Tue, 9 May 2017 18:07:37 +0000 (11:07 -0700)]
LU-9472 lnd: Fix FastReg map/unmap for MLX5

The FastReg support in ko2iblnd was not unmapping pool items
causing the items to leak.  In addition, the mapping code
is not growing the pool like we do with FMR.

This patch makes sure we are unmapping FastReg pool elements
when we are done with them.  It also makes sure the pool
will grow when we deplete the pool.
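
The two behaviours described here, returning items on unmap and growing
when the pool runs dry, can be modelled with a small generic pool. This is
a conceptual sketch only; the class and method names are invented for the
example and do not reflect the ko2iblnd data structures.

```python
class GrowablePool:
    """A pool of reusable items that grows when it runs out of free items."""

    def __init__(self, initial_size, grow_by):
        if grow_by <= 0:
            raise ValueError("grow_by must be positive")
        self._grow_by = grow_by
        self._next_id = 0
        self._free = []
        self._in_use = set()
        self._grow(initial_size)

    def _grow(self, count):
        # Items are modelled as integer ids; a real pool would hold descriptors.
        for _ in range(count):
            self._free.append(self._next_id)
            self._next_id += 1

    def map(self):
        """Take an item from the pool, growing the pool first if it is depleted."""
        if not self._free:
            self._grow(self._grow_by)
        item = self._free.pop()
        self._in_use.add(item)
        return item

    def unmap(self, item):
        """Return an item to the pool so it does not leak."""
        self._in_use.remove(item)
        self._free.append(item)

    @property
    def free_count(self):
        return len(self._free)

    @property
    def total(self):
        return self._next_id
```

Without the `unmap` step every mapped item would stay in use forever, and
without the growth step `map` would fail once the initial items were taken.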

Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Change-Id: I4b4ba4de72941b38c4115a00a992cfd1e78e9e49
Reviewed-on: https://review.whamcloud.com/27015
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@seagate.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-9340 lov: Initialize component extents unconditionally 97/27097/2
Jinshan Xiong [Fri, 12 May 2017 05:23:20 +0000 (22:23 -0700)]
LU-9340 lov: Initialize component extents unconditionally

LOV should initialize the extents of components even if they are not
instantiated because lov_io_iter_init() relies on this to issue
write intent RPC.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: Iaa452e3a0f3186cfea77866b7a1fabbc9feeec53
Reviewed-on: https://review.whamcloud.com/27097
Tested-by: Jenkins
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-9522 test: sanity 27z fix update local variable 63/27163/3
James Nunez [Wed, 17 May 2017 15:55:23 +0000 (09:55 -0600)]
LU-9522 test: sanity 27z fix update local variable

sanity test_27z is failing for servers running ZFS. The
problem is that filter_fid is not being updated for each
stripe of the file being reviewed in check_seq_oid().

For each stripe, the variable that holds the filter_fid,
ff, needs to be reinitialized.

Test-Parameters: trivial testgroup=review-zfs-part-1

Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I550720a0a2bad22c4ffa98ca84b5f085eb220051
Reviewed-on: https://review.whamcloud.com/27163
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
3 years ago LU-9514 test: sanity 51f add to ALWAYS_EXCEPT 89/27189/2
James Nunez [Thu, 18 May 2017 14:21:31 +0000 (08:21 -0600)]
LU-9514 test: sanity 51f add to ALWAYS_EXCEPT

sanity test 51f is failing many times for servers with
ZFS file systems. This test should be skipped for ZFS
testing until a solution is found.

Test-Parameters: trivial testgroup=review-zfs-part-1
Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: Ia4b035d158cdf8e2179f74c7f1f6b0b382a93663
Reviewed-on: https://review.whamcloud.com/27189
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago Revert "LU-9439 scripts: Change behavior of lustre_rmmod" 81/27181/2
Oleg Drokin [Thu, 18 May 2017 05:59:12 +0000 (05:59 +0000)]
Revert "LU-9439 scripts: Change behavior of lustre_rmmod"

This is causing widespread testing problems partially documented in LU-9524 and also in LU-9439

This reverts commit 645153be3eb1fd8c634717507f73d85625d1b84a.

Change-Id: Ie20b9c7a52aec5153566ad0712689852545a0948
Reviewed-on: https://review.whamcloud.com/27181
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Tested-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-7988 hsm: change coordinator start/stop mechanisms 67/22667/24
Frank Zago [Wed, 21 Sep 2016 19:17:19 +0000 (15:17 -0400)]
LU-7988 hsm: change coordinator start/stop mechanisms

Instead of using cdt_state and cdt_thread.t_flags to keep the state of
the coordinator, only use cdt_state. cdt_thread.t_flags is now only
used to signal the coordinator, not to stop it. cdt_state is used to
control the coordinator behaviour.

Split mdt_hsm_cdt_stop() into 2 functions: one to actually signal the
coordinator to stop, and another one to clean up its resources. The
coordinator is now responsible for cleaning up its own resources instead
of having two different paths depending on whether Lustre is shutting
down, or a user is stopping it through /proc.

Protect cdt_state with a spin lock, and add a transition table to
catch invalid transitions.

Removed mdt_opts.mo_coordinator as it is just a subset of cdt_state.
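
A lock-protected state variable with a transition table, as described in
this commit message, might look like the sketch below. The state names and
the allowed transitions are hypothetical and chosen only to show the
pattern; the real cdt_state values are not listed in this message.

```python
import threading
from enum import Enum


class State(Enum):
    # Hypothetical state names, for illustration only.
    STOPPED = "stopped"
    INIT = "init"
    RUNNING = "running"
    STOPPING = "stopping"


# For each current state, the set of states it may move to (assumed values).
ALLOWED = {
    State.STOPPED: {State.INIT},
    State.INIT: {State.RUNNING, State.STOPPED},
    State.RUNNING: {State.STOPPING},
    State.STOPPING: {State.STOPPED},
}


class Coordinator:
    def __init__(self):
        self._lock = threading.Lock()
        self._state = State.STOPPED

    @property
    def state(self):
        with self._lock:
            return self._state

    def set_state(self, new_state):
        """Move to new_state if the table allows it; return True on success."""
        with self._lock:
            if new_state not in ALLOWED[self._state]:
                return False
            self._state = new_state
            return True
```

Keeping all state in one variable and checking every change against the
table makes an invalid transition visible at the point where it is
attempted.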

Test-Parameters: trivial testlist=sanity-hsm
Signed-off-by: frank zago <fzago@cray.com>
Signed-off-by: Quentin Bouget <quentin.bouget@cea.fr>
Change-Id: I7b0a878792d287b781578a7afe4e2c2cf7dec5cc
Reviewed-on: https://review.whamcloud.com/22667
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Jean-Baptiste Riaux <riaux.jb@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-9439 scripts: lnet systemd service 25/26925/10
Giuseppe Di Natale [Wed, 21 Dec 2016 21:38:04 +0000 (16:38 -0500)]
LU-9439 scripts: lnet systemd service

Create an lnet systemd service which properly
brings lnet up and down.

Test-Parameters: trivial
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Change-Id: I584827dbb1fc4e0999b7f107bb4250678b7b68e8
Reviewed-on: https://review.whamcloud.com/26925
Tested-by: Jenkins
Reviewed-by: Ned Bass <bass6@llnl.gov>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-9415 lfsck: quiet noisy console message 59/27059/3
Andreas Dilger [Thu, 11 May 2017 07:35:10 +0000 (01:35 -0600)]
LU-9415 lfsck: quiet noisy console message

Mark LFSCK console startup message D_LFSCK unless it is an error.

Test-Parameters: trivial testlist=sanity-lfsck
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: Ia3e8bc1cfbfa8a19c56626feb7e88e4c544ebbe5
Reviewed-on: https://review.whamcloud.com/27059
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-9477 tests: check correct handling of dead object 18/27018/3
Bruno Faccini [Fri, 24 Jul 2015 16:16:36 +0000 (18:16 +0200)]
LU-9477 tests: check correct handling of dead object

Since the original fix to allow for correct handling of dead/removed
objects has finally been implemented in LU-9312, this patch now
only introduces the related tests of correct handling in the
sanity-hsm test suite.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I1e3ce88909e8820657ee5753fea5d6757c354574
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/27018
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-9464 hsm: use OBD_ALLOC_LARGE() for hsm_scan_data array 14/27014/2
John L. Hammond [Tue, 9 May 2017 17:57:22 +0000 (12:57 -0500)]
LU-9464 hsm: use OBD_ALLOC_LARGE() for hsm_scan_data array

In mdt_coordinator() use OBD_ALLOC_LARGE() rather than OBD_ALLOC() for
the hsm_scan_data request array.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ieb648dfb92e6019ab316c7643aa0c0b5cf1d86f7
Reviewed-on: https://review.whamcloud.com/27014
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-9119 lnet: Fix deleting peers from YAML 01/27001/3
sharmaso [Thu, 2 Mar 2017 22:00:51 +0000 (14:00 -0800)]
LU-9119 lnet: Fix deleting peers from YAML

Deleting peers with the lnetctl command
"import --del < config.yaml" throws an error.

The above command deletes prim_nid first
and then tries deleting the other NIDs, which
results in an error, since deleting the primary_nid
deletes the entire peer and after that we try
to delete non-existent NIDs.

The behavior should be: if the primary_nid is
present in the list of NIDs then delete the
entire peer, otherwise delete only the NIDs
specified within the peer.
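
The intended rule can be expressed compactly. The sketch below models
peers as a dictionary from primary NID to a set of NIDs; that data model,
the function name and the sample NID strings are assumptions made for
illustration and are not the lnetctl implementation.

```python
def delete_peer_nids(peers, primary_nid, nids):
    """Apply the deletion rule to peers, a dict of primary NID -> set of NIDs."""
    if primary_nid not in peers:
        raise KeyError(primary_nid)
    if primary_nid in nids:
        # The primary NID is listed: remove the whole peer in one step.
        del peers[primary_nid]
        return
    # Otherwise remove only the listed NIDs and keep the peer.
    peers[primary_nid].difference_update(nids)


if __name__ == "__main__":
    peers = {"10.0.0.1@tcp": {"10.0.0.1@tcp", "10.0.0.2@tcp", "10.0.0.3@tcp"}}
    delete_peer_nids(peers, "10.0.0.1@tcp", ["10.0.0.3@tcp"])
    print(peers)
```

Deciding up front whether the primary NID is in the list avoids the
failure mode described above, where the peer is removed first and the
remaining deletions then refer to NIDs that no longer exist.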

Test-Parameters: trivial
Signed-off-by: Sonia Sharma <sonia.sharma@intel.com>
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I55114fca4d332c950872bd446e02e4f0904ee716
Reviewed-on: https://review.whamcloud.com/27001
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-9315 pfl: static analysis issues 03/26503/3
Bobi Jam [Tue, 11 Apr 2017 17:34:32 +0000 (01:34 +0800)]
LU-9315 pfl: static analysis issues

1. Buffer Overflow - Non-null Terminated String
 * lustre/utils/liblustreapi_layout.c: in llapi_layout_expected,
   Buffer overflow of 'donor_path' due to non null terminated string
   'donor_path'
2. Use of Freed Memory by Pointer
 * lustre/utils/liblustreapi_layout.c: in llapi_layout_comp_del,
   Object 'comp' was dereferenced at line 1770 after being freed by
   calling '__llapi_comp_free' at line 1769
3. Result of function that may return NULL will be dereferenced
 * lustre/lov/lov_pack.c: in lov_unpackmd, Pointer
   'lsm_op_find(magic)' returned from call to function 'lsm_op_find'
   at line 334 may be NULL and will be dereferenced at line 334.
4. Uninitialized Variable - possible
 * lustre/utils/liblustreapi.c: in find_check_comp_options, 'ret'
   might be used uninitialized in this function. Also there are 2
   similar errors on lines 3243, 3264.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: I397737affeaa409e97b0ed859efcd7ff2840cc89
Reviewed-on: https://review.whamcloud.com/26503
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
3 years ago LU-9241 llite: ASSERTION( de->d_op == &ll_d_ops ) failed 09/26109/3
Andriy Skulysh [Wed, 22 Mar 2017 10:21:15 +0000 (12:21 +0200)]
LU-9241 llite: ASSERTION( de->d_op == &ll_d_ops ) failed

The assertion can be checked while dentry isn't fully
initialized in HAVE_DCACHE_LOCK case.

Reorder assignments to make assertion always true.
Later ll_dentry_data bitfields modification should be
protected by a spinlock.

Change-Id: I4a1ea42b9fe1c9398539a1a241b8a191dba5903c
Seagate-bug-id: MRP-4257
Signed-off-by: Andriy Skulysh <andriy.skulysh@seagate.com>
Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-on: https://review.whamcloud.com/26109
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-9202 lfsck: skip unavailable targets when sync failures 31/25931/3
Fan Yong [Fri, 4 Nov 2016 16:08:52 +0000 (00:08 +0800)]
LU-9202 lfsck: skip unavailable targets when sync failures

It is normal that some targets (MDT or OST) may become unavailable
when the LFSCK engine tries to sync failures with the related target.
In such a case, just skip the target without LBUG.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ib677e6fec121e946caafb34ebc71b8b3068bd6f5
Reviewed-on: https://review.whamcloud.com/25931
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-9183 llite: remove struct file on stack in ll_setxattr() 93/25893/11
Dmitry Eremin [Tue, 7 Mar 2017 18:28:42 +0000 (21:28 +0300)]
LU-9183 llite: remove struct file on stack in ll_setxattr()

Backport upstream commit c139f3cef36291902d7a29a334acd25a1251cd9d

Change-Id: I3a7ddf9a2abed8692a203c4d88b9a8ecdaeb9a1a
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-on: https://review.whamcloud.com/25893
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-9183 ptlrpc: handle changes in struct group_info 38/25838/12
Dmitry Eremin [Fri, 3 Mar 2017 17:51:22 +0000 (20:51 +0300)]
LU-9183 ptlrpc: handle changes in struct group_info

In commit 81243eacfa400f5f7b89f4c2323d0de9982bb0fb a few members of
struct group_info were changed.

Change-Id: Id8e5b483cf389eff41c63942cfd78303fff67a8c
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-on: https://review.whamcloud.com/25838
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-8730 test: Remove duplicate $dev from conf-sanity test_83 49/23249/4
Sushant Mane [Fri, 12 May 2017 04:09:32 +0000 (09:39 +0530)]
LU-8730 test: Remove duplicate $dev from conf-sanity test_83

Remove the duplicate $dev variable passed to the add function, which
passes it as an argument to mkfs.lustre. The mkfs.lustre utility in
Lustre 2.1 throws an error when multiple device names are passed to it.
Skip the test if OSTs are not ldiskfs-based.

Test-Parameters: trivial envdefinitions=ONLY=83 testlist=conf-sanity
Change-Id: I56535aac27e25bbf359bda835b24a56a4101c8ed
Seagate-bug-id: MRP-2239
Signed-off-by: Ashish Maurya <ashish.maurya@seagate.com>
Signed-off-by: Sushant Mane <sushant.mane@seagate.com>
Reviewed-on: http://es-gerrit.xyus.xyratex.com:8080/9425
Tested-by: Jenkins
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@seagate.com>
Reviewed-by: Ujjwal Lanjewar <ujjwal.lanjewar@seagate.com>
Tested-by: Elena V. Gryaznova <elena.gryaznova@seagate.com>
Signed-off-by: Ashish Maurya <ashish.maurya@seagate.com>
Reviewed-on: https://review.whamcloud.com/23249
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-8589 osd: remove "object" from method function names 42/22542/6
Andreas Dilger [Thu, 10 Mar 2016 17:33:11 +0000 (10:33 -0700)]
LU-8589 osd: remove "object" from method function names

Remove "object_" from various OSD API dt_object_operations
method functions (create, destroy, {read,write}_{lock,unlock},
write_locked, ref_add, ref_del, and declare_ variants of same)
to match the actual method names so the code is easier to trace.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I18860bc965f958d6ba2e595882e7e56ca00cab07
Reviewed-on: https://review.whamcloud.com/22542
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-8359 ldlm: Wrong evict during failover 14/21114/7
Andriy Skulysh [Fri, 5 Aug 2016 11:25:02 +0000 (14:25 +0300)]
LU-8359 ldlm: Wrong evict during failover

There is a race between setting obd_fail & OBD_OPT_FAILOVER.
tgt_client_del() checks only OBD_OPT_FAILOVER,
class_disconnect_export_list() is called with flags copied
from obd, and umount can start while disconnect
is in progress.

It is better to rely only on obd_fail.
We shouldn't evict during failover at all, it should
be handled on a new server.
Such a wrong evict can happen when the server can't send a CP AST
to the client because failover has already started.

Change-Id: I649d35d180b2239fe558b375872d3805629968a9
Seagate-bug-id: MRP-3604
Signed-off-by: Andriy Skulysh <andriy.skulysh@seagate.com>
Reviewed-on: https://review.whamcloud.com/21114
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@seagate.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years ago LU-9305 osd-zfs: arc_buf could be non-pagesize aligned 95/26895/8
Jinshan Xiong [Mon, 1 May 2017 04:10:51 +0000 (21:10 -0700)]
LU-9305 osd-zfs: arc_buf could be non-pagesize aligned

ZFS guarantees PAGE_SIZE alignment for arc_buf_t only when
the block size is not less than (PAGE_SIZE << 2).

The patch for ZFS https://github.com/zfsonlinux/zfs/pull/6084 fixes
the alignment problem, but Lustre still needs a fix to handle
the problem in case it's running an old ZFS release.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: I6fd17d7b20499ec0406a3e10cebf6882b92a730f
Reviewed-on: https://review.whamcloud.com/26895
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
3 years ago LU-8670 tests: test_115 Fixes & Improvements 50/22950/7
Arshad Hussain [Fri, 29 Jan 2016 08:36:40 +0000 (14:06 +0530)]
LU-8670 tests: test_115 Fixes & Improvements

This patch addresses the following points in sanity test_115:
- Do not skip the test when the OST is remote.
- Fail the test if the ll_ost_io thread has not started.
- Fail the test if the ll_ost_io post count is less than or
equal to the ll_ost_io pre count.
- Fail the test if the count of started threads is more
than thread_max.
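
The failure conditions listed above can be summarised in a small
Python sketch. The function name and the message strings are
hypothetical; the real checks live in the sanity.sh shell script.

```python
def check_ost_io_threads(pre_count: int, post_count: int, thread_max: int):
    """Return the list of failed checks for the thread counts observed
    before (pre_count) and after (post_count) the test load."""
    failures = []
    if post_count == 0:
        failures.append("ll_ost_io thread has not started")
    if post_count <= pre_count:
        failures.append("post count is not greater than pre count")
    if post_count > thread_max:
        failures.append("more threads started than thread_max")
    return failures
```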

Test-Parameters: trivial envdefinitions=SLOW=yes testlist=sanity
Signed-off-by: Arshad Hussain <arshad.hussain@seagate.com>
Signed-off-by: Elena V. Gryaznova <elena.gryaznova@seagate.com>
Seagate-bug-id: MRP-981
Change-Id: I51e434a15ee78db59ea5c091b3ec7d7fcf0bab64
Reviewed-by: Ashish Purkar <ashish.purkar@seagate.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@seagate.com>
Reviewed-on: https://review.whamcloud.com/22950
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-6952 mount: Mount options parsing problem 06/15906/26
Hongchao Zhang [Thu, 13 Apr 2017 10:51:23 +0000 (18:51 +0800)]
LU-6952 mount: Mount options parsing problem

The code which parses the mount options expects the default
mount options to come first, followed by the user-given options. But
the default options are actually appended after the user-given
options, so the user-given options are cleared.
This patch fixes the bug.

Added a conf-sanity test 104 to verify this fix.
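
The ordering problem described above can be illustrated with a
minimal Python sketch. It is not the C code from the patch, and the
function name and option strings are made up for the example. The
point is only that whichever option list is applied last wins, so
the defaults have to be applied first.

```python
def merge_mount_options(defaults: str, user: str) -> dict:
    """Apply default options first so that user options, applied
    later, override them instead of being cleared."""
    merged = {}
    for opts in (defaults, user):
        for opt in filter(None, opts.split(",")):
            key, _, value = opt.partition("=")
            merged[key] = value
    return merged
```

Swapping the two arguments reproduces the bug: the defaults are
applied last and overwrite what the user asked for.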

Seagate-bug-id: MRP-2193
Signed-off-by: Pratik Shinde <pratik.shinde@seagate.com>
Signed-off-by: Rahul Deshmukh <rahul.deshmukh@seagate.com>
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Change-Id: I1a22e5d559e01d8bcba0ceebba5544e92d39bb19
Reviewed-on: https://review.whamcloud.com/15906
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Grégoire Pichon <gregoire.pichon@bull.net>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9425 lnd: Turn on 2 sges by default 11/26911/2
Doug Oucharek [Mon, 1 May 2017 22:29:49 +0000 (15:29 -0700)]
LU-9425 lnd: Turn on 2 sges by default

Currently, the fix in LU-5718 which allows for multiple sges
to deal with RDMA fragmentation is turned off by default
(set to 1).  This patch changes the default to 2 so
RDMA fragmentation is fixed by default.

Test-Parameters: trivial
Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Change-Id: I8a29a7b32ababd37cbc471664083362bc7253d97
Reviewed-on: https://review.whamcloud.com/26911
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-7584 tests: clean up sanity test_129 74/26874/2
Andreas Dilger [Thu, 27 Apr 2017 04:07:41 +0000 (22:07 -0600)]
LU-7584 tests: clean up sanity test_129

Clean up sanity test_129 to follow Lustre script coding style.
There is no need to create so many files to verify that the
directory size limit has been removed.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: Ib80bd8a70386b3e8881f8ca3d417a8be1becab07
Reviewed-on: https://review.whamcloud.com/26874
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Emoly Liu <emoly.liu@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9362 lfs: getstripe print last init-ed component info 47/26947/7
Bobi Jam [Thu, 4 May 2017 14:45:27 +0000 (22:45 +0800)]
LU-9362 lfs: getstripe print last init-ed component info

If getstripe asks for part of stripe information of a PFL file,
lfs should print the last instantiated component stripe info.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: Iebdc4d44e47a25ef8445fda240b25975f8dbcbbe
Reviewed-on: https://review.whamcloud.com/26947
Tested-by: Jenkins
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
3 years agoLU-8998 tests: improve sanity.sh::check_seq_oid() 69/26569/2
Andreas Dilger [Thu, 30 Mar 2017 04:07:18 +0000 (22:07 -0600)]
LU-8998 tests: improve sanity.sh::check_seq_oid()

Enable the use of debugfs to decode the OST "fid" xattr to
avoid an unmount/remount cycle of the OST if possible.

Improve sanity.sh::check_seq_oid() in test_27z to be agnostic
about field position, and improve overall code style to avoid
external command usage.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: Ia971e33cc3d8a5e4ca6f821116f12c0a72bcab07
Reviewed-on: https://review.whamcloud.com/26569
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-8998 tools: support negative flags 90/26490/7
Niu Yawei [Fri, 31 Mar 2017 03:01:46 +0000 (23:01 -0400)]
LU-8998 tools: support negative flags

- Make 'lfs setstripe --component-flags' support negative flags
  "^init".
- Change llapi_layout_file_comp_del() to accept both 'id' and
  'flags', 'id' must be real ID, 'flags' can be negative flags.
- Fix swab defect in lod_declare_layout_del().
- Update man pages and test scripts accordingly.
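
A rough Python sketch of how a flag list with negated entries such as
"^init" might be split. The comma-separated syntax and the function
name are assumptions for this example; the actual parsing is done in
C inside lfs.

```python
def parse_comp_flags(spec: str):
    """Split a flag list into (flags to match, flags to exclude).
    A leading '^' marks a negative flag."""
    wanted, negated = set(), set()
    for name in filter(None, spec.split(",")):
        if name.startswith("^"):
            negated.add(name[1:])
        else:
            wanted.add(name)
    return wanted, negated
```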

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: I9b66492133f56eabc928dbbb41bb8eb5627be095
Reviewed-on: https://review.whamcloud.com/26490
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
3 years agoLU-7473 acl: increase ACL entries limitation 90/19790/29
Fan Yong [Sun, 2 Oct 2016 04:05:16 +0000 (12:05 +0800)]
LU-7473 acl: increase ACL entries limitation

Originally, the number of ACL entries was limited to 32, which is
not enough for some use cases. The limit on the ACL entry count
exists mainly so that the RPC reply buffer can be prepared to
receive the ACL data, so the count cannot be made unlimited.
However, the RPC reply buffer can be enlarged to hold more ACL
entries. On the other hand, the MDT backend filesystem has its own
EA size limit. For ldiskfs, if large EA is enabled, the maximum ACL
size is 1048492 bytes; otherwise, it is 4012 bytes. For the ZFS
backend, the value is 32768 bytes. These hard limits determine the
maximum number of ACL entries. This patch increases the RPC reply
buffer to match these hard limits. To avoid a buffer overflow on an
old client caused by large ACL data (more than 32 ACL entries), the
MDT forbids old clients from accessing files with large ACL data.
A new connection flag, OBD_CONNECT_LARGE_ACL, distinguishes new
clients from old ones.
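
The entry counts implied by these size limits can be worked out with
a short Python sketch. It assumes the usual Linux POSIX ACL xattr
layout (a 4-byte header followed by 8-byte entries); the commit
message itself only states the byte limits, so the derived counts
below are an illustration rather than values taken from the patch.

```python
ACL_XATTR_HEADER_SIZE = 4  # assumed: 32-bit version field
ACL_XATTR_ENTRY_SIZE = 8   # assumed: 16-bit tag, 16-bit perm, 32-bit id


def max_acl_entries(max_acl_bytes: int) -> int:
    """Number of whole ACL entries that fit in max_acl_bytes."""
    return (max_acl_bytes - ACL_XATTR_HEADER_SIZE) // ACL_XATTR_ENTRY_SIZE


# Byte limits quoted in the commit message above.
for backend, limit in [("ldiskfs with large EA", 1048492),
                       ("ldiskfs without large EA", 4012),
                       ("ZFS", 32768)]:
    print(backend, max_acl_entries(limit))
```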

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I557d2696a8df84bf83fcd3955292227d7aa0284c
Reviewed-on: https://review.whamcloud.com/19790
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9096 test: add sanity 253 to ALWAYS_EXCEPT 13/27013/2
James Nunez [Tue, 9 May 2017 17:39:03 +0000 (11:39 -0600)]
LU-9096 test: add sanity 253 to ALWAYS_EXCEPT

Discussion in the LU-9096 ticket questions whether sanity
test 253 is a valid/correct test. Since test 253 is failing
intermittently for ZFS servers, add the test to the
ALWAYS_EXCEPT list until the test can be improved.

Since the correctness of the test is being questioned,
skip the test for all server file systems, not just ZFS.

Test-Parameters: trivial
Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: I14cffb288b4d338ff009f4b57a19d5a5f4c516ff
Reviewed-on: https://review.whamcloud.com/27013
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Wei Liu <wei3.liu@intel.com>
Reviewed-by: Saurabh Tandan <saurabh.tandan@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9458 ptlrpc: handle case of epp_free_pages <= PTLRPC_MAX_BRW_PAGES 16/27016/4
Bob Glossman [Tue, 9 May 2017 18:16:21 +0000 (11:16 -0700)]
LU-9458 ptlrpc: handle case of epp_free_pages <= PTLRPC_MAX_BRW_PAGES

The current code does not correctly handle the case where
page_pools.epp_free_pages is too small.  This mod fixes those instances.

Change-Id: I9242cbdeba9f5ea4836189f0049ae3617cd665f7
Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-on: https://review.whamcloud.com/27016
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
3 years agoLU-8294 gss: quiet cache_check return ENOENT warning 76/26976/2
Kit Westneat [Sat, 6 May 2017 22:58:52 +0000 (18:58 -0400)]
LU-8294 gss: quiet cache_check return ENOENT warning

gss_svc_upcall_handle_init spews a lot of error messages if GSS
support has been compiled, but GSS is not being used. The current
error message is not very useful, so this patch moves it to CDEBUG.

Test-Parameters: trivial
Signed-off-by: Kit Westneat <kit.westneat@gmail.com>
Change-Id: Id58e3a9b5c702f8669f6659c6ff8e577391484de
Reviewed-on: https://review.whamcloud.com/26976
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9456 lnd: Change sock_create() to sock_create_kern() 58/26958/6
Doug Oucharek [Thu, 4 May 2017 23:29:59 +0000 (16:29 -0700)]
LU-9456 lnd: Change sock_create() to sock_create_kern()

Changing all calls in the ksocklnd from sock_create() to
sock_create_kern().

Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Change-Id: Ib8b175e73478b1edfb5e8cd3491e589e8267f52a
Reviewed-on: https://review.whamcloud.com/26958
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9439 scripts: Change behavior of lustre_rmmod 59/26959/3
Prakash Surya [Wed, 20 Feb 2013 17:18:13 +0000 (09:18 -0800)]
LU-9439 scripts: Change behavior of lustre_rmmod

The lustre_rmmod script was modified to take an arbitrary list of
modules and try to remove them and any modules dependent on them.
Previously its behavior was to always remove the libcfs module,
along with either the ldiskfs or another module passed as a
parameter.

The old interface was roughly maintained. Any of the following
commands will remove the ldiskfs, libcfs, and all dependent modules:

    $ lustre_rmmod
    $ lustre_rmmod ldiskfs
    $ lustre_rmmod ldiskfs libcfs

The benefit now is that any other list of modules can be specified
without removing libcfs. For example, the following command will only
remove ptlrpc and its dependent modules (leaving libcfs intact):

    $ lustre_rmmod ptlrpc

The lnet init script was modified to perform a lustre_rmmod ptlrpc
before performing an lctl network down. By removing the ptlrpc
module, we can ensure that lnet is not in use. This will help
systems running lustre to shut down cleanly.
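
The "remove a module and everything that depends on it" behaviour
can be sketched in Python as a depth-first walk. This is only an
illustration of the idea, not the shell logic of lustre_rmmod, and
the dependency map in the usage note is hypothetical.

```python
def removal_order(targets, users):
    """Return modules in an order that is safe for removal: every
    module that uses a target comes before the target itself.
    'users' maps a module name to the modules that depend on it."""
    order, seen = [], set()

    def visit(mod):
        if mod in seen:
            return
        seen.add(mod)
        for dependent in users.get(mod, ()):
            visit(dependent)
        order.append(mod)

    for mod in targets:
        visit(mod)
    return order
```

With a made-up map such as {"libcfs": ["lnet"], "lnet": ["ptlrpc"],
"ptlrpc": ["lustre"]}, asking for "ptlrpc" yields only ptlrpc and
its dependents, which matches the point above that libcfs can be
left intact.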

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Change-Id: If9681ebd785b6b5920e7d1553333ad8d6120de56
Reviewed-on: https://review.whamcloud.com/26959
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Ned Bass <bass6@llnl.gov>
Tested-by: Maloo <hpdd-maloo@intel.com>
3 years agoLU-8346 obdclass: guarantee all keys filled 99/26099/4
Hongchao Zhang [Sat, 15 Apr 2017 20:04:48 +0000 (04:04 +0800)]
LU-8346 obdclass: guarantee all keys filled

In keys_fill, the key_set_version could be changed after
the keys are filled; the keys in this context then won't
be refilled by the following lu_context_refill because its
version is equal to the current key_set_version.
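
One way to picture the race is the Python sketch below. It is not the
patch itself: all class and function names are hypothetical, and
recording the version snapshot taken before filling is only one
possible way to make a later refill notice a concurrent change.

```python
class KeyRegistry:
    """Hypothetical stand-in for the global key table and its version."""
    def __init__(self):
        self.version = 0
        self.keys = {}

    def register(self, name, init):
        self.keys[name] = init
        self.version += 1


class Context:
    def __init__(self):
        self.version = -1
        self.values = {}


def keys_fill(ctx, registry):
    version = registry.version          # snapshot BEFORE filling
    for name, init in list(registry.keys.items()):
        if name not in ctx.values:
            ctx.values[name] = init()
    ctx.version = version               # record the snapshot, not the current value


def context_refill(ctx, registry):
    # A key registered during keys_fill leaves ctx.version behind,
    # so the refill is not skipped.
    if ctx.version != registry.version:
        keys_fill(ctx, registry)
```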

Change-Id: Ibaa49ec6e95ffee902cfa98f18ac9e66f2127bf1
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: https://review.whamcloud.com/26099
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-7567 utils: fix timestamp printing in lfs_changelog() 06/18006/5
Andreas Dilger [Sat, 4 Mar 2017 06:50:12 +0000 (23:50 -0700)]
LU-7567 utils: fix timestamp printing in lfs_changelog()

lfs_changelog() should use ".09d" when printing the fractional part
instead of ".06d" for accurate timestamps.
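
The effect of the field width can be seen in a small Python sketch
(the function name is made up, and the real code uses a C printf
format). A nanosecond value needs nine zero-padded digits; with only
six, a value such as 5000000 ns is printed without its leading zeros
and reads as a much larger fraction.

```python
def format_changelog_time(sec: int, nsec: int) -> str:
    """Print seconds and nanoseconds with a nine-digit, zero-padded
    fractional part."""
    return f"{sec}.{nsec:09d}"


def format_changelog_time_too_narrow(sec: int, nsec: int) -> str:
    """The same with a six-digit width, shown only for comparison."""
    return f"{sec}.{nsec:06d}"
```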

Test-Parameters: trivial
Signed-off-by: akam kumar bharathi <azurelustre@gmail.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I2b219f588de688ba06b3c8cf3ccd255ed845c45e
Reviewed-on: https://review.whamcloud.com/18006
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-7108 test: Remove sanityn tests from ALWAYS_EXCEPT list 83/16383/20
Saurabh Tandan [Thu, 4 May 2017 20:44:06 +0000 (13:44 -0700)]
LU-7108 test: Remove sanityn tests from ALWAYS_EXCEPT list

Remove sanityn tests from the ALWAYS_EXCEPT list. All the
tests on sanityn's ALWAYS_EXCEPT list ran about 40 times
with no errors except for test_28 and test_29.

Since tests 14b, 19 and 35 ran with no failures,
remove them from the ALWAYS_EXCEPT list.

Test-Parameters: trivial testlist=sanityn,sanityn,sanityn
Test-Parameters: testgroup=review-zfs-part-1
Test-Parameters: testgroup=review-dne-part-1
Signed-off-by: Saurabh Tandan <saurabh.tandan@intel.com>
Change-Id: Icac5201ba27f010707f536287fa4478ff5afcfd6
Reviewed-on: https://review.whamcloud.com/16383
Tested-by: Jenkins
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9324 lnet: move cyaml.h under lnet/include/ 59/26859/6
Bobi Jam [Mon, 24 Apr 2017 09:01:46 +0000 (17:01 +0800)]
LU-9324 lnet: move cyaml.h under lnet/include/

So that it can be included in other projects.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: I980578742fd194e2464870f1ab8d6a9ae8deb9e2
Reviewed-on: https://review.whamcloud.com/26859
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9372 ptlrpc: drain "ptlrpc_request_buffer_desc" objects 52/26752/5
Bruno Faccini [Thu, 20 Apr 2017 10:10:28 +0000 (12:10 +0200)]
LU-9372 ptlrpc: drain "ptlrpc_request_buffer_desc" objects

Prior to this patch, additional "ptlrpc_request_buffer_desc"
objects could be allocated on demand by
ptlrpc_check_rqbd_pool(), but were never freed
until OST umount/stop by ptlrpc_service_purge_all().
Now try to release some of them when possible.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: Ieb72bab202e3f3d957cd2e6ce06bb56c4e21b1bd
Reviewed-on: https://review.whamcloud.com/26752
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9357 pfl: should inherit pool from previous layout comp 50/26750/5
Bobi Jam [Thu, 20 Apr 2017 04:42:41 +0000 (12:42 +0800)]
LU-9357 pfl: should inherit pool from previous layout comp

llapi_layout_comp_add() should inherit pool from previous component
layout so that user doesn't need to specify the same pool name
every time a new component is added if necessary.
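
The inheritance rule can be sketched in Python as follows. The data
layout (a list of dictionaries) and the function name are assumptions
for this example and do not mirror the llapi_layout C structures.

```python
def comp_add(layout: list, extent_end: int, pool=None) -> None:
    """Append a component; when no pool is given, inherit the pool of
    the previous component, if there is one."""
    if pool is None and layout:
        pool = layout[-1]["pool"]
    layout.append({"extent_end": extent_end, "pool": pool})
```

An explicitly passed pool still wins, so a caller can switch pools
for a later component when that is needed.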

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: I5437112d9eb51b7c4a3e5e67b65302df15eca681
Reviewed-on: https://review.whamcloud.com/26750
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
3 years agoLU-9324 lfs: output stripe info in YAML format 08/26708/8
Bobi Jam [Tue, 18 Apr 2017 09:00:55 +0000 (17:00 +0800)]
LU-9324 lfs: output stripe info in YAML format

Add --yaml|-y for lfs getstripe to output plain/PFL layout file's
stripe information in YAML format.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: Ic6a3bf3572346f06ed90f75945acf374efd0fd96
Reviewed-on: https://review.whamcloud.com/26708
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-5788 test: fix the cmd to getfree space 31/26031/12
Hongchao Zhang [Sat, 15 Apr 2017 21:54:47 +0000 (05:54 +0800)]
LU-5788 test: fix the cmd to getfree space

The cmd to get free space in run_tar.sh and run_dd.sh is obsolete
after the patch https://review.whamcloud.com/24228 was landed.

Test-Parameters: trivial testlist=recovery-double-scale clientdistro=el7 serverdistro=el7 \
osscount=2 mdscount=2 ostcount=7 mdtcount=1 clientcount=4 ostfilesystemtype=ldiskfs mdtfilesystemtype=ldiskfs failover=true iscsi=1

Change-Id: I442825a94102eff68aaba411b57f86d65441ae49
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: https://review.whamcloud.com/26031
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Wei Liu <wei3.liu@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
3 years agoLU-8943 lnd: Enable Multiple OPA Endpoints between Nodes 68/25168/15
Doug Oucharek [Tue, 31 Jan 2017 00:30:19 +0000 (16:30 -0800)]
LU-8943 lnd: Enable Multiple OPA Endpoints between Nodes

OPA driver optimizations are based on the MPI model where it is
expected to have multiple endpoints between two given nodes. To
enable this optimization for Lustre, we need to make it possible,
via an LND-specific tuneable, to create multiple endpoints and to
balance the traffic over them.

Both sides of a connection must have this patch for it to work.
Only the active side of the connection (usually the client)
needs to have the new tuneable set > 1.

Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Change-Id: Iaf3b49bf0aecf79cb67eb1bacba1940cd811b2fb
Reviewed-on: https://review.whamcloud.com/25168
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9140 nrs: add some debug log for NRS TBF 01/26701/8
Qian Yingjin [Tue, 18 Apr 2017 07:51:14 +0000 (15:51 +0800)]
LU-9140 nrs: add some debug log for NRS TBF

This patch adds some useful debug log into NRS TBF.

Test-Parameters: trivial testlist=sanityn,sanityn,sanityn
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I682451df6b1bf4f81c9e46ca272fc14d372cddca
Reviewed-on: https://review.whamcloud.com/26701
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
3 years agoLU-9312 hsm: release restore lock without object 42/26742/3
John L. Hammond [Wed, 19 Apr 2017 15:49:23 +0000 (10:49 -0500)]
LU-9312 hsm: release restore lock without object

In the error path of mdt_hsm_agent_send(), the object to be restored
is not needed to release the layout lock so don't find it.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ic8e82a5ccd8f83cac2d7ebc3d4b800f8a4563ca6
Reviewed-on: https://review.whamcloud.com/26742
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9447 o2iblnd: Check for 2 arg ib_alloc_pd 34/26934/3
Chris Horn [Tue, 2 May 2017 21:42:31 +0000 (16:42 -0500)]
LU-9447 o2iblnd: Check for 2 arg ib_alloc_pd

A flags argument was added to ib_alloc_pd() in Linux 4.9 commit
ed082d36a7b2c27d1cda55fdfb28af18040c4a89. The fix for LU-9026, Lustre
commit e4297ef38561f1e788ba73ca0c8078a09dc8c303, accounted for this
change by checking for the removal of ib_get_dma_mr() (which happened
separately). However, SLES 12 SP3 beta 1 adopted the extra argument
to ib_alloc_pd(), but retains the ib_get_dma_mr() function. As a
result, we need an explicit check for the two argument version of
ib_alloc_pd().

Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: Iecde347e9f18149cac63e243082a2686de260ba7
Reviewed-on: https://review.whamcloud.com/26934
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9443 mdd: omit changelog records for volatile files 31/26931/3
John L. Hammond [Wed, 16 Dec 2015 21:16:43 +0000 (15:16 -0600)]
LU-9443 mdd: omit changelog records for volatile files

Omit changelog records for volatile files. Policy engines need not be
concerned with these files.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Id28db03e0bd6e477db8de1bab1d53688ac7720dd
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/26931
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9369 lfs: make lfs find work correctly 14/26914/2
Emoly Liu [Tue, 2 May 2017 06:37:26 +0000 (14:37 +0800)]
LU-9369 lfs: make lfs find work correctly

This patch resets fp_lmd->lmd_lmm.lmm_magic so that the
previous file's lum is not mistakenly used for the next directory in
cb_find_init().
This patch also improves sanity.sh test_56s and test_56t to verify
the fix.

Signed-off-by: Emoly Liu <emoly.liu@intel.com>
Change-Id: I717ec809ef199be50dd1c4c7e6152ab9aa223f94
Reviewed-on: https://review.whamcloud.com/26914
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-8746 update: restore tdtd_refcount during failure 88/26888/2
wangdi [Tue, 21 Mar 2017 14:49:52 +0000 (10:49 -0400)]
LU-8746 update: restore tdtd_refcount during failure

During batchid_update, tdtd_refcount should be restored
if an error occurs; otherwise tdtd_refcount will never
reach 0, which will cause the distribute thread to hang during
umount.
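
The pattern is sketched below in Python, with hypothetical names:
a reference taken before the update must be dropped again if the
update fails, or the count can never return to zero.

```python
class DistributeTarget:
    """Hypothetical stand-in for the object holding the refcount."""
    def __init__(self):
        self.refcount = 0


def batchid_update(target: DistributeTarget, write_update) -> None:
    target.refcount += 1          # reference held while the update is in flight
    try:
        write_update()
    except Exception:
        target.refcount -= 1      # restore the count on failure
        raise
```

On success the reference stays held in this sketch; it is assumed
that a completion path, not shown here, drops it later.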

Change the distribute thread name to "dist_txn".

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: I585cc4ceb37a7f3ddaf38201306e0a331fb43e74
Reviewed-on: https://review.whamcloud.com/26888
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
3 years agoLU-9385 mdt: remove XATTR locking from mdt_add_dirty_flag() 70/26870/2
John L. Hammond [Thu, 27 Apr 2017 16:12:37 +0000 (11:12 -0500)]
LU-9385 mdt: remove XATTR locking from mdt_add_dirty_flag()

Clients use the MDS_HSM_STATE_{GET,SET} RPCs to get and set the HSM
state of a file rather than accessing the HSM xattr directly. So
remove the use of MDS_INODELOCK_XATTR around the setting of HSM
attributes in mdt_add_dirty_flag().

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I4374671ac5d4828480060de56c95ff7e1601fe59
Reviewed-on: https://review.whamcloud.com/26870
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9404 mdt: set HSM xattr only when needed 67/26867/2
John L. Hammond [Thu, 27 Apr 2017 15:39:11 +0000 (10:39 -0500)]
LU-9404 mdt: set HSM xattr only when needed

In mdt_hsm_add_hal() avoid setting the HSM xattr when the HSM
attributes have not changed.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I12570034127b9928e49ea329bf77b674aaa6ade8
Reviewed-on: https://review.whamcloud.com/26867
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Frank Zago <fzago@cray.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9403 mdt: prevent HSM leak on re-archive 66/26866/2
John L. Hammond [Thu, 27 Apr 2017 15:23:26 +0000 (10:23 -0500)]
LU-9403 mdt: prevent HSM leak on re-archive

In mdt_hsm_is_action_compat() if the file to be archived already
exists in some backend archive then ensure that the re-archive
uses the same backend archive.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ifc0ef03264a20557c31df7add9e34a1dc1f0c814
Reviewed-on: https://review.whamcloud.com/26866
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-8364 ldiskfs: fixes for failover mode 54/26854/4
Yang Sheng [Thu, 27 Apr 2017 06:48:22 +0000 (14:48 +0800)]
LU-8364 ldiskfs: fixes for failover mode

Port the failover mode patches to other distros and
fix the failure path in the replay patch.

Signed-off-by: Yang Sheng <yang.sheng@intel.com>
Change-Id: I51f5ca0b906a3cbd7554fabb8b447cda4096c781
Reviewed-on: https://review.whamcloud.com/26854
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9411 tests: skip llapi_layout_test 30, 31 for interop 53/26853/4
Andreas Dilger [Thu, 27 Apr 2017 04:07:41 +0000 (22:07 -0600)]
LU-9411 tests: skip llapi_layout_test 30, 31 for interop

test30 and test31 should be skipped when running against a pre-PFL MDS,
as the PFL layout type is not supported.

Change the test29 skip version to 2.8.55 since the patch for
this test didn't actually land until that version.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: Ib80bd8a70386b3e8881f8ca3d417a8be18acab07
Reviewed-on: https://review.whamcloud.com/26853
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9312 hsm: add a cookie indexed request hash 63/26763/3
John L. Hammond [Wed, 17 Jun 2015 22:42:36 +0000 (15:42 -0700)]
LU-9312 hsm: add a cookie indexed request hash

Replace linear scans of the HSM coordinator's cdt_requests list with
lookups into a cookie indexed hash (cdt_request_cookie_hash). Rename
cdt_requests to cdt_request_list. Remove the unused function
mdt_hsm_get_running().
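
The change from a linear scan to a cookie-indexed lookup can be
pictured with the Python sketch below. The class and method names are
hypothetical, and a Python dict stands in for the kernel hash table.

```python
class RequestTable:
    """Keeps requests in a list (for ordered traversal) and in a
    dict keyed by cookie (for direct lookup)."""
    def __init__(self):
        self.request_list = []
        self.by_cookie = {}

    def add(self, cookie, request):
        self.request_list.append(request)
        self.by_cookie[cookie] = request

    def find(self, cookie):
        # Direct lookup instead of walking request_list.
        return self.by_cookie.get(cookie)

    def remove(self, cookie):
        request = self.by_cookie.pop(cookie, None)
        if request is not None:
            self.request_list.remove(request)
        return request
```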

Change-Id: I97309aeeb0e02a07e8ddac9f1667989c65f01b8b
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-on: https://review.whamcloud.com/26763
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9312 hsm: fix error handling around mdt_hsm_get_md_hsm() 41/26741/3
John L. Hammond [Wed, 19 Apr 2017 15:42:20 +0000 (10:42 -0500)]
LU-9312 hsm: fix error handling around mdt_hsm_get_md_hsm()

Correct several spurious NULL return checks from
mdt_hsm_get_md_hsm().

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Icfe74e87183bc5356d4c7627088b402805dcc164
Reviewed-on: https://review.whamcloud.com/26741
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9119 lnet: selftest MR fix 23/26723/2
Amir Shehata [Mon, 30 Jan 2017 23:10:32 +0000 (15:10 -0800)]
LU-9119 lnet: selftest MR fix

selftest always responded to the primary nid of the peer rather than
to the source nid of the message, as it should.

Test-Parameters: trivial
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: I14a4b6ffc5882cb23298429d8a4bd0bcb0a8a5be
Reviewed-on: https://review.whamcloud.com/26723
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9394 osd: __osd_obj2dnode() to return negative errors 93/26893/3
Alex Zhuravlev [Fri, 28 Apr 2017 22:19:26 +0000 (01:19 +0300)]
LU-9394 osd: __osd_obj2dnode() to return negative errors

DMU/ZFS uses positive error values, while Lustre uses negative ones.
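
A minimal Python sketch of the sign conversion, using a hypothetical
helper name; the real change is made in C inside __osd_obj2dnode().

```python
import errno


def to_lustre_rc(dmu_rc: int) -> int:
    """Map a positive DMU-style error code to the negative form that
    Lustre callers expect; zero (success) is passed through."""
    return -dmu_rc if dmu_rc > 0 else dmu_rc
```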

Change-Id: I3615fac1616d6647897c68ef94f298b356e508d1
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: https://review.whamcloud.com/26893
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
3 years agoLU-9040 scrub: detect dead loop scanning 51/26751/10
Fan Yong [Tue, 2 May 2017 20:10:33 +0000 (04:10 +0800)]
LU-9040 scrub: detect dead loop scanning

It is found that the OI scrub may fall into dead loop scanning
for some unknown reason. This patch checks the scanning cursor
to make sure it will not scan the same object repeatedly.

It also fixes a logic error in 'noscrub' handling that may
cause the OI scrub to fall into dead loop scanning when it
resumes from a previous partial scan interrupted by a crash.
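
The cursor check can be illustrated with a short Python sketch. The
function name and the callback are hypothetical; the idea shown is
only that a scan whose position does not move forward is reported
instead of being repeated forever.

```python
def scan_positions(next_position, start: int, limit: int):
    """Yield scan positions from start up to limit, raising if the
    cursor fails to advance between two steps."""
    pos = start
    while pos < limit:
        yield pos
        new_pos = next_position(pos)
        if new_pos <= pos:
            raise RuntimeError(f"scan cursor stuck at position {pos}")
        pos = new_pos
```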

Test-Parameters: mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs mdscount=2 mdtcount=4 testlist=sanity-scrub,sanity-scrub,sanity-scrub
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ia1f63e8a2d675e9fa4567fa329905ac769b83a74
Reviewed-on: https://review.whamcloud.com/26751
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-8998 llapi: add LLAPI_LAYOUT_COMP_USE_PREV 84/26484/4
Andreas Dilger [Mon, 10 Apr 2017 21:22:06 +0000 (15:22 -0600)]
LU-8998 llapi: add LLAPI_LAYOUT_COMP_USE_PREV

Add LLAPI_LAYOUT_COMP_USE_PREV to be able to iterate through
components in reverse order.

Add a test case to llapi_layout_test.c to exercise COMP_USE_LAST
and COMP_USE_PREV options.

Improve description of component ID in llapi_layout_comp_id_get.3
to indicate that the component ID does not imply ordering or other
semantics, and is just a numeric identifier for each component.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I21f78e575c2429ef927c8c2fc50bf150f59cab07
Reviewed-on: https://review.whamcloud.com/26484
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9201 test: avoid long sleeps in mount_facet() 21/26021/5
Andreas Dilger [Mon, 13 Mar 2017 20:53:12 +0000 (14:53 -0600)]
LU-9201 test: avoid long sleeps in mount_facet()

Reduce the long sleep during mount since this was fixed via
patch https://review.whamcloud.com/24845 for LU-7481.

This reduces one llmount.sh time from 62s to 37s (2 MDTs, 4 OSTs),
and removes about 800s from each conf-sanity run (201x 4s sleeps
due to "Commit the device label" in a recent test log).

Test-Parameters: trivial testlist=conf-sanity,conf-sanity,conf-sanity
Test-Parameters: trivial testlist=conf-sanity,conf-sanity,conf-sanity
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: Ib51858a00b935c4f7e473cead117e7d59c3ebbe5
Reviewed-on: https://review.whamcloud.com/26021
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9205 tests: fix failures in CLIENTONLY mode 51/25951/9
Dmitry Eremin [Mon, 13 Mar 2017 13:15:20 +0000 (16:15 +0300)]
LU-9205 tests: fix failures in CLIENTONLY mode

Turn off sanity tests 27F 130 160e 255 311 313 399 407 in
CLIENTONLY mode because they require remote access to
MDS/OSS nodes.

Test-Parameters: trivial
Change-Id: Id1b79c614200c0d06c208a1c8f04ee13b10165ce
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-on: https://review.whamcloud.com/25951
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
3 years agoLU-8367 osp: orphan cleanup do not wait for reserved 26/25926/25
Alex Zhuravlev [Fri, 10 Mar 2017 11:37:25 +0000 (14:37 +0300)]
LU-8367 osp: orphan cleanup do not wait for reserved

A thread holding an object reserved on some OST may block
another thread trying to recover that OST. A set of threads
like these may lead to a livelock and cascading timeouts.

Change-Id: Ic14741759d30f9453b0fe28a91a878795a84ef39
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: https://review.whamcloud.com/25926
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9285 osp: revert patches LU-8367 and LU-8973 25/25925/8
Alex Zhuravlev [Fri, 10 Mar 2017 08:50:21 +0000 (11:50 +0300)]
LU-9285 osp: revert patches LU-8367 and LU-8973

Another solution will be proposed.

Revert "LU-8972 osp: skip subsequent orphan cleanups"

This reverts commit 6f56f71b407a8c14db4c2accd37da5b4feecde1a.

Revert "LU-8367 osp: do not block orphan cleanup"

This reverts commit 2ce0d5b0640e3e440822080e407eee1ce1cafd75.

Change-Id: I4fb215d4dcdbe0edac0c25998b7deebf02a427c0
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: https://review.whamcloud.com/25925
Tested-by: Jenkins
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9183 llite: handle removal the pos argument of generic_write_sync 26/25826/11
Dmitry Eremin [Fri, 3 Mar 2017 18:31:36 +0000 (21:31 +0300)]
LU-9183 llite: handle removal the pos argument of generic_write_sync

In commit e259221763a40403d5bb232209998e8c45804ab8 the pos argument
of generic_write_sync() was removed.

Change-Id: Iad76c517e372d7dc5e12670b5a0b8106005b71ff
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-on: https://review.whamcloud.com/25826
Tested-by: Jenkins
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9183 llite: handle make the string hashes salt the hash 19/25819/14
Dmitry Eremin [Thu, 2 Mar 2017 19:57:15 +0000 (22:57 +0300)]
LU-9183 llite: handle make the string hashes salt the hash

In commit 8387ff2577eb9ed245df9a39947f66976c6bcd02 Linus Torvalds
made the string hashes salt the hash.

Hash users that don't have any particular initial salt can just use
the NULL pointer as a no-salt.

Change-Id: Id262d459370aa46c2e3d0e8b1e09dad74c717f03
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-on: https://review.whamcloud.com/25819
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9125 test: Update setstripe options 75/25475/8
James Nunez [Wed, 15 Feb 2017 20:52:37 +0000 (13:52 -0700)]
LU-9125 test: Update setstripe options

Some flags for 'lfs setstripe' will be deprecated in
tag 2.9.59; '--count' is replaced by '--stripe-count' or '-c'.

replay-single test 68 will silently fail due to this change.
We need to check that an error is reported if 'lfs setstripe'
fails and change the deprecated parameters used in replay-single.
The check_default_stripe_attr() routine in sanity.sh also needs
to be updated with the new setstripe options.

Test-Parameters: trivial testlist=sanity,replay-single

Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: Ie5809c9268684675585d17cd1c402ec3fb002239
Reviewed-on: https://review.whamcloud.com/25475
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Wei Liu <wei3.liu@intel.com>
Reviewed-by: Saurabh Tandan <saurabh.tandan@intel.com>
3 years agoLU-4423 ptlrpc: use 64-bit times for request times 77/24977/14
Arnd Bergmann [Mon, 1 May 2017 18:07:59 +0000 (14:07 -0400)]
LU-4423 ptlrpc: use 64-bit times for request times

All request timestamps and deadlines in lustre are recorded in time_t
and timeval units, which overflow in 2038 on 32-bit systems.

In this patch, I'm converting them to time64_t and timespec64,
respectively. Unfortunately, this makes a relatively large patch,
but I could not find an obvious way to split it up some more without
breaking atomicity of the change.

Also unfortunately, this introduces two instances of div_u64_rem()
in the request path, which can be slow on 32-bit architectures. This
can probably be avoided by a larger restructuring of the code, but
it is unlikely that lustre is used in performance critical setups
on 32-bit architectures, so it seems better to optimize for correctness
rather than speed here.
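
As a rough illustration of the overflow described above (a Python
sketch, not code from this patch): a signed 32-bit seconds counter
wraps in January 2038, while a wider count does not. The helper names
wrap_s32() and split_ns() are hypothetical. split_ns() only mimics the
quotient-plus-remainder shape of div_u64_rem() and is not claimed to
match how the patch actually uses it.

```python
from datetime import datetime, timezone

NSEC_PER_SEC = 1_000_000_000


def wrap_s32(seconds):
    """Truncate a seconds count to a signed 32-bit value, as a 32-bit time_t would."""
    return (seconds + 2**31) % 2**32 - 2**31


def split_ns(nanoseconds):
    """Split a nanosecond count into (whole seconds, leftover nanoseconds)."""
    return divmod(nanoseconds, NSEC_PER_SEC)


last_32bit_second = 2**31 - 1
# Last instant a signed 32-bit seconds counter can represent.
print(datetime.fromtimestamp(last_32bit_second, tz=timezone.utc))
# One second later the 32-bit value wraps to a negative number.
print(wrap_s32(last_32bit_second + 1))
# Quotient and remainder in one step, similar in shape to div_u64_rem().
print(split_ns(5 * NSEC_PER_SEC + 250))
```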

Linux-commit: 219e6de627243c8dbc701eaafe1c30c481d1f82c

Change-Id: Iff3c2bdb50bbb34d27edd4402838f915c16530f4
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/24977
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
3 years agoLU-4017 quota: cleanup to improve quota codes 77/26577/8
Wang Shilong [Thu, 13 Apr 2017 02:45:10 +0000 (22:45 -0400)]
LU-4017 quota: cleanup to improve quota codes

Add man page updates for project quota, and also
clean up some style and minor problems.

Change-Id: I3ee3e866dd0300a1b07e0f5319dfd695c0bafba0
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/26577
Tested-by: Jenkins
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-8376 ost: enhance end to end bulk cksum error report 60/23960/20
Bruno Faccini [Fri, 25 Nov 2016 14:57:20 +0000 (15:57 +0100)]
LU-8376 ost: enhance end to end bulk cksum error report

Some sites have experienced spurious checksum errors upon bulk
xfers where it is very difficult to determine the source of the
corruption.
With this patch, upon cksum error, full dump of all pages in a
bulk xfer is now possible (enabled via a /proc tunable) on both
Client and OSS sides, to allow easier root cause identification.

The existing sanity.sh/test_77[b,d,f,g]() sub-test results can already
be used to show the effects of this patch, by injecting bulk cksum
error/corruption using OBD_FAIL_[OSC,OST]_CHECKSUM_[SEND,RECEIVE]
fail codes.

sanity.sh/test_77c has been created to specifically test the new dump
on cksum error functionality.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I0d200bb6d5c41c55a66ac012fd9cbd8d702d2f3a
Reviewed-on: https://review.whamcloud.com/23960
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-8650 mdt: enable REP-ACK for DNE 07/22807/4
Lai Siyao [Thu, 29 Sep 2016 12:48:47 +0000 (20:48 +0800)]
LU-8650 mdt: enable REP-ACK for DNE

LU-7903 reveals that REP-ACK is disabled in 2.8. This was
introduced by LU-3538 (http://review.whamcloud.com/12530),
which added support for DNE Commit-on-Sharing but disabled
REP-ACK. Since Commit-on-Sharing does not take effect for
local operations (operations that involve only one MDT)
either, this may cause single MDT recovery to fail.

To fix this, we need to enable REP-ACK, and also make sure
http://review.whamcloud.com/12530 works as designed:
1. save local locks upon unlock as before, but don't convert
   locks into COS mode.
2. reply_out_callback() wakes up ptlrpc_handle_rs(), if
   reply was committed, decref locks like before.
3. otherwise for uncommitted reply convert locks to COS mode,
   and later when it's committed, ptlrpc_commit_replies()
   wakes up ptlrpc_handle_rs() again, which will decref locks
   like before.

In short, local locks will be converted into COS mode upon
REP-ACK, and transaction commit will decref locks.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: Id21681017573b50e071dd8b5a4d65489843781a1
Reviewed-on: https://review.whamcloud.com/22807
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-8307 ldlm: cond_resched in ldlm_bl_thread_main 88/20888/2
Patrick Farrell [Mon, 20 Jun 2016 21:15:51 +0000 (16:15 -0500)]
LU-8307 ldlm: cond_resched in ldlm_bl_thread_main

When clearing all of the ldlm LRUs (as Cray does at the end of
a job), a ldlm_bl_work_item is generated for each namespace
and then they are placed on a list for the ldlm_bl threads to
iterate over.

If the number of namespaces greatly exceeds the number of
ldlm_bl threads, a given thread will iterate over many
namespaces without sleeping looking for work.  This can go
on for an extremely long time and result in an RCU stall.

This patch adds a cond_resched() between completing one
work item and looking for the next.  This is a fairly cheap
operation, as it will only schedule if there is an
interrupt waiting, and it will not be called too often:
even the largest file systems have < 100 namespaces per
ldlm_bl_thread currently.

Signed-off-by: Patrick Farrell <paf@cray.com>
Change-Id: Ic8022faf641ad6ab02462ab376a4bfd510dca14c
Reviewed-on: https://review.whamcloud.com/20888
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: Ann Koehler <amk@cray.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9430 utils: fix logic errors and putchar in sk_name2hmac() 20/26920/2
Chris Hanna [Tue, 2 May 2017 18:11:13 +0000 (14:11 -0400)]
LU-9430 utils: fix logic errors and putchar in sk_name2hmac()

In the sk_name2hmac function in lgss_sk.c, there are a couple minor
errors: bad usage of strcmp(), use of putchar() instead of assigning
a lowercased value, and use of a logical OR instead of AND.

These errors would prevent proper creation of shared keys in certain
circumstances.

Signed-off-by: Chris Hanna <hannac@iu.edu>
Change-Id: I16462f15201626f194e1b452acf3a1e63dbf0ed7
Reviewed-on: https://review.whamcloud.com/26920
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
3 years agoLU-7062 ldlm: GPF in _ldlm_lock_debug() 39/16139/8
Andriy Skulysh [Mon, 31 Aug 2015 08:39:34 +0000 (11:39 +0300)]
LU-7062 ldlm: GPF in _ldlm_lock_debug()

Lock's resource can change on a client.
Take a resource reference under spinlock
to print debug information.

Change-Id: Id8acb801ea549bf3c1ce1bcf6349db31578579f3
Xyratex-bug-id: MRP-2760
Signed-off-by: Andriy Skulysh <andriy.skulysh@seagate.com>
Reviewed-on: https://review.whamcloud.com/16139
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoNew tag 2.9.57 2.9.57 v2_9_57 v2_9_57_0
Oleg Drokin [Mon, 8 May 2017 03:44:40 +0000 (23:44 -0400)]
New tag 2.9.57

Change-Id: Idc4dec64104cfb538501a8ee50f101f10fd69ff4
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-4017 quota: extend to test project quota 11/26411/16
Wang Shilong [Wed, 5 Apr 2017 07:52:17 +0000 (03:52 -0400)]
LU-4017 quota: extend to test project quota

Extend sanity-quota.sh to test project quota.
Also extend the llog_test module to test the new format
@llog_setattr64_rec_v2. The code should be able to
handle both @llog_setattr64_rec_v2 and @llog_setattr64_rec
well.

Change-Id: I4f22c1e994da10ffed64c08749ae749740ed9b46
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/26411
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9119 lnet: rename LNET_MAX_INTERFACES 93/26693/4
Olaf Weber [Fri, 27 Jan 2017 15:14:50 +0000 (16:14 +0100)]
LU-9119 lnet: rename LNET_MAX_INTERFACES

LNET_MAX_INTERFACES is the number of interfaces supported by
interface bonding in the ksocknal LND. It shows up in LNet
because a number of data structures are shared between LNDs.

Rename it to LNET_NUM_INTERFACES to reduce the confusion of
what it does.

Test-Parameters: trivial
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: Ibc1d85a379d6616eb1db2fcb54aaffc835ffa9f4
Reviewed-on: https://review.whamcloud.com/26693
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9119 lnet: loopback NID in lnet_select_pathway() 92/26692/4
Olaf Weber [Fri, 27 Jan 2017 15:14:34 +0000 (16:14 +0100)]
LU-9119 lnet: loopback NID in lnet_select_pathway()

In lnet_select_pathway() sending to the loopback NID is handled
as a special case, because there are no credits involved. (The
loopback NID doesn't use credits, and therefore does not have
any credits. If a message goes through the credit-managing code
it therefore ends up waiting indefinitely for credits to become
available.)

The check whether we're sending over the loopback NID must be
done after we've completed choosing the NI to send over. In its
present location it only handles the case where the loopback
NID was explicitly passed in as the source NID.

(Lustre does not exercise this code path during normal operation,
the bug was encountered while testing code for the peer discovery
feature.)

Test-Parameters: trivial
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: Ifa25abf508214ae363a2f1bb04ffeab1891a2564
Reviewed-on: https://review.whamcloud.com/26692
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9119 socklnd: propagate errors on send failure 91/26691/4
Olaf Weber [Fri, 27 Jan 2017 15:13:53 +0000 (16:13 +0100)]
LU-9119 socklnd: propagate errors on send failure

When an attempt to send a message fails, for example because no
connection could be established with the remote address, socklnd
drops the message. For a PUT or REPLY message with non-zero
payload, ksocknal_tx_done() calls lnet_finalize() with -EIO
as the error code. But for an ACK or GET message there is no
payload, and lnet_finalize() is called with 0 (no error) as the
error code. This leaves upper layers to rely on other means to
determine that sending the message did actually fail, and that
(for example) no REPLY will ever answer a failed GET.

Add an error code parameter to ksocknal_tx_done().

In ksocknal_txlist_done() change the 0/1 'error' indicator to be
an actual error code that is passed on to ksocknal_tx_done().
Update the callers of ksocknal_txlist_done() to pass in the error
code if they have encountered an error.
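
The difference can be modelled in a few lines (a simplified Python
sketch of the behaviour described above, not the socklnd code; the
function names finalize_status_old() and finalize_status_new() are
hypothetical). In the old scheme the status is derived from whether
the message had payload, so a failed GET or ACK looks like a success.
In the new scheme the caller's error code is passed through.

```python
import errno


def finalize_status_old(payload_len, send_failed):
    """Old behaviour as described: only a failed send with payload reports -EIO."""
    if send_failed and payload_len > 0:
        return -errno.EIO
    return 0


def finalize_status_new(rc):
    """New behaviour: the error code chosen by the caller is reported as-is."""
    return rc


# A failed PUT with payload was already reported as an error.
print(finalize_status_old(4096, True))
# A failed GET has no payload, so the old scheme reports success.
print(finalize_status_old(0, True))
# With an explicit error code the failure is visible in both cases.
print(finalize_status_new(-errno.EIO))
```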

Test-Parameters: trivial
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: I66b897a31e537e70dcc2622ffdfcc6e96fa93193
Reviewed-on: https://review.whamcloud.com/26691
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9119 lnet: fix race in lnet shutdown path 90/26690/4
Olaf Weber [Fri, 27 Jan 2017 15:13:29 +0000 (16:13 +0100)]
LU-9119 lnet: fix race in lnet shutdown path

The locking changes for the lnet_net_lock made for Multi-Rail
introduce a race in the LNet shutdown path. The code keeps two
states in the_lnet.ln_shutdown: 0 means LNet is either up and
running or shut down, while 1 means lnet is shutting down. In
lnet_select_pathway() if we need to restart and drop and relock
the lnet_net_lock we can find that LNet went from running to
stopped, and not be able to tell the difference.

Replace ln_shutdown with a three-state ln_state patterned on
ln_rc_state: states are LNET_STATE_SHUTDOWN, LNET_STATE_RUNNING,
and LNET_STATE_STOPPING. Most checks against ln_shutdown now test
ln_state against LNET_STATE_RUNNING. LNet moves to RUNNING state
in lnet_startup_lndnets().
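
The ambiguity can be shown with a small model (a Python sketch, not
the LNet code). The state names follow the commit message; the numeric
values and the helpers old_ln_shutdown() and is_running() are
assumptions made for illustration only.

```python
from enum import Enum


class LnetState(Enum):
    # Numeric values are illustrative, not taken from the LNet headers.
    SHUTDOWN = 0
    RUNNING = 1
    STOPPING = 2


def old_ln_shutdown(state):
    """Old two-valued flag: 1 only while shutting down, 0 otherwise."""
    return 1 if state is LnetState.STOPPING else 0


def is_running(state):
    """New-style check: compare the state against RUNNING explicitly."""
    return state is LnetState.RUNNING


# The old flag cannot tell "running" from "fully shut down".
print(old_ln_shutdown(LnetState.RUNNING), old_ln_shutdown(LnetState.SHUTDOWN))
# The three-state check can.
print(is_running(LnetState.RUNNING), is_running(LnetState.SHUTDOWN))
```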

Test-Parameters: trivial
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: I7afcbeb793dfa4d0a361e421ae06a99b7d4db903
Reviewed-on: https://review.whamcloud.com/26690
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-9183 llite: handle flags as argument for inode_operations->rename 27/25827/11
Dmitry Eremin [Fri, 3 Mar 2017 18:27:36 +0000 (21:27 +0300)]
LU-9183 llite: handle flags as argument for inode_operations->rename

In Linux kernel v3.14 the inode_operations->rename() needs a flags
argument.

Change-Id: I5028357d1d459b83ff0b1df0abeaadf78c5d05da
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-on: https://review.whamcloud.com/25827
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-8879 tests: speed up copytool_cleanup() in sanity-hsm 25/24025/7
Quentin Bouget [Tue, 29 Nov 2016 15:39:26 +0000 (16:39 +0100)]
LU-8879 tests: speed up copytool_cleanup() in sanity-hsm

This patch implements the following improvements:

 - The coordinator now wakes up when hsm_control is set to 'shutdown'

 - The wait_copytools() function in sanity-hsm uses a polling
   mechanism to detect when all running copytools are killed.
   It used to sleep before the first check, even though that check
   would pass most of the time. This has been fixed.

 - wait_copytools() used to sleep for 2s between its checks. It now
   sleeps for 0.1s, 0.2s, 0.4s, 0.8s, 1.6s, 3.2s, 3.2s, 3.2s, ...
   until it times out.

Considering how often the wait_copytools() function is called in
sanity-hsm, this patch should represent a noticeable speed-up.
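
The polling pattern described above can be sketched as follows (in
Python rather than the shell used by sanity-hsm, and not the actual
wait_copytools() code). The names backoff_delays() and wait_until()
and their parameters are hypothetical. The check runs before the first
sleep, and the delay doubles from 0.1s up to a 3.2s cap.

```python
import itertools
import time


def backoff_delays(first=0.1, cap=3.2):
    """Yield 0.1, 0.2, 0.4, ... doubling each time and staying at the cap."""
    delay = first
    while True:
        yield delay
        delay = min(delay * 2, cap)


def wait_until(check, timeout, sleep=time.sleep):
    """Poll check() with backoff; True once it passes, False on timeout."""
    waited = 0.0
    delays = backoff_delays()
    while True:
        # Check first, so the common already-done case never sleeps.
        if check():
            return True
        if waited >= timeout:
            return False
        delay = next(delays)
        sleep(delay)
        waited += delay


print([round(d, 1) for d in itertools.islice(backoff_delays(), 8)])
```

Passing the sleep function as a parameter keeps the sketch easy to
exercise without real delays.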

Test-Parameters: trivial testlist=sanity-hsm
Signed-off-by: Quentin Bouget <quentin.bouget@cea.fr>
Change-Id: Ia460df59a724caaa194565dd7af402c8c617f40e
Reviewed-on: https://review.whamcloud.com/24025
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jean-Baptiste Riaux <riaux.jb@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-8882 osd: use bydnode methods to access DMU 35/24035/21
Alex Zhuravlev [Wed, 30 Nov 2016 20:07:54 +0000 (23:07 +0300)]
LU-8882 osd: use bydnode methods to access DMU

Newer ZFS allows accessing the DMU by dnode, which saves the
expensive dnode# to dnode_t mapping.

Change-Id: I469c2a72d18f170ebb96dd33c23bb6d8f037188a
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: https://review.whamcloud.com/24035
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-8873 osd: use sa_handle_get_from_db() 04/24004/15
Alex Zhuravlev [Tue, 29 Nov 2016 16:48:59 +0000 (19:48 +0300)]
LU-8873 osd: use sa_handle_get_from_db()

Use sa_handle_get_from_db() instead of sa_handle_get() and
save on the object->dnode lookup.

Change-Id: I2a23e36c3c98ecf4ec00ac590a32d2c14a867aa0
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: https://review.whamcloud.com/24004
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Tested-by: Jenkins
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
3 years agoLU-8998 llapi: rename llapi_layout_comp_move -> *use 83/26483/3
Andreas Dilger [Thu, 30 Mar 2017 23:12:29 +0000 (17:12 -0600)]
LU-8998 llapi: rename llapi_layout_comp_move -> *use

Rename llapi_layout_comp_move() and llapi_layout_comp_move_at()
to llapi_layout_comp_use() and llapi_layout_comp_use_id(),
respectively.  This avoids confusion about what "move" and "at" in
the function names imply. The component itself is not actually
being moved, just a different layout component is being selected
for access or modification.  Using "_id" instead of "_at" also
makes it more clear what the difference is between these functions.

Rename LLAPI_LAYOUT_COMP_POS_{FIRST,NEXT,LAST} correspondingly to
LLAPI_LAYOUT_COMP_USE_{FIRST,NEXT,LAST} to match.

Split llapi_layout_comp_use_id.3 from llapi_layout_comp_use.3 since
they are mostly independent anyway.

Test-Parameters: trivial testlist=sanity-pfl
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I85926d4ec9774745bc49b0d178ed9b23ec2cab07
Reviewed-on: https://review.whamcloud.com/26483
Tested-by: Jenkins
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>