Whamcloud - gitweb
fs/lustre-release.git
6 years ago LU-9979 kernel: kernel update RHEL6.9 [2.6.32-696.10.2.el6] 69/28969/2
Bob Glossman [Tue, 12 Sep 2017 16:36:41 +0000 (09:36 -0700)]
LU-9979 kernel: kernel update RHEL6.9 [2.6.32-696.10.2.el6]

Update RHEL6.9 kernel to 2.6.32-696.10.2.el6

Test-Parameters: clientdistro=el6.9 mdsdistro=el6.9 \
  ossdistro=el6.9 mdtfilesystemtype=ldiskfs \
  ostfilesystemtype=ldiskfs testgroup=review-ldiskfs

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: Ie86a84fda1ec391c4e0b9ab18a82d4a5b0bd25d1
Reviewed-on: https://review.whamcloud.com/28969
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Tested-by: Jenkins
Reviewed-by: Minh Diep <minh.diep@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-9960 osd-zfs: don't auto-upgrade quota 54/28954/2
Nathaniel Clark [Mon, 11 Sep 2017 14:14:18 +0000 (10:14 -0400)]
LU-9960 osd-zfs: don't auto-upgrade quota

To preserve the ability to down-grade from 0.7.x to 0.6.x,
don't auto-upgrade quotas.
Print a warning if quotas haven't been upgraded when mounting with 0.7.0.
In sanity-quota, do the check based on the zpool feature instead of
just the version.

Lustre-change: https://review.whamcloud.com/#/c/28924/
Lustre-commit: 0bbef0afc16081e1af3529642436864954a73e3c

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: I2b0dcba3a230c9b2dec3d07d1b4ca6f1a1717d47
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28954
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-9907 build: add patchless server for lbuild 52/28952/2
Minh Diep [Wed, 23 Aug 2017 23:07:28 +0000 (16:07 -0700)]
LU-9907 build: add patchless server for lbuild

Add lbuild support for building a patchless server.
Clean up unused TARGET_ARCHS and BUILD_ARCHS.

Test-Parameters: trivial

Lustre-change: https://review.whamcloud.com/#/c/28672/
Lustre-commit: 7018f18957817afd5dcd3b6f173e7634b93101e9

Change-Id: I946352fa243c86d5729779406264e6ee37856145
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28952
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-9347 ioctl: Add BLKSSZGET ioctl support 61/28861/2
Emoly Liu [Thu, 17 Aug 2017 07:36:49 +0000 (15:36 +0800)]
LU-9347 ioctl: Add BLKSSZGET ioctl support

Add the BLKSSZGET ioctl and return PAGE_SIZE as the minimum
alignment from ll_file_ioctl() for this call.
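
For illustration only, a minimal userspace sketch of how a caller might
issue this ioctl on an open file descriptor. The helper name
query_sector_size() is hypothetical and not part of the patch; on files
that do not implement BLKSSZGET the call is expected to fail with an
errno value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/fs.h>	/* BLKSSZGET */

/*
 * Hypothetical helper: ask the file behind fd for its sector size.
 * Returns 0 and stores the size in *size on success, or a negative
 * errno value on failure.
 */
int query_sector_size(int fd, int *size)
{
	assert(size != NULL);

	if (ioctl(fd, BLKSSZGET, size) < 0)
		return -errno;
	return 0;
}
```

With the change described above, a file opened on a Lustre client would
be expected to report PAGE_SIZE through this call.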

Lustre-change: https://review.whamcloud.com/28578
Lustre-commit: dcc32cd7d0d89a49f0c73ecf99130a2678442e55

Signed-off-by: Emoly Liu <emoly.liu@intel.com>
Change-Id: Id8a77e77cd7e1807aa90474ca6d3d1fea4d7c269
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28861
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-8958 llite: remove llite_loop left overs 60/28860/2
James Simmons [Mon, 31 Jul 2017 18:24:19 +0000 (14:24 -0400)]
LU-8958 llite: remove llite_loop left overs

With the removal of llite_loop several pieces of code are still
present in the llite layer that were only used by the lloop device.
We can remove these no longer used pieces.

Lustre-change: https://review.whamcloud.com/26795
Lustre-commit: 2e875d5fc5b73f735bd42f5da54c23e4c2d35d5c

Change-Id: I67f2761ae29b8ba7bb3bc9bcc3e3f8ece73a3ea3
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28860
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
6 years ago LU-9842 osd: return ENODATA for XATTR_NAME_FID on MDT 61/28761/3
Fan Yong [Tue, 8 Aug 2017 23:18:21 +0000 (07:18 +0800)]
LU-9842 osd: return ENODATA for XATTR_NAME_FID on MDT

The XATTR_NAME_FID xattr is an OST-side EA. If someone calls
getxattr() for XATTR_NAME_FID on an MDT, return -ENODATA.
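
A rough sketch of the check described above, written as standalone C
rather than the real osd code. The function name, the is_mdt flag and
the "trusted.fid" string used for XATTR_NAME_FID are assumptions made
for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Assumed value of XATTR_NAME_FID, used here for illustration only. */
#define SKETCH_XATTR_NAME_FID "trusted.fid"

/*
 * Hypothetical pre-check run before an xattr lookup.
 * Returns -ENODATA for the OST-only FID xattr on an MDT, 0 otherwise.
 */
int xattr_get_precheck(bool is_mdt, const char *name)
{
	assert(name != NULL);

	if (is_mdt && strcmp(name, SKETCH_XATTR_NAME_FID) == 0)
		return -ENODATA;
	return 0;
}
```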

Lustre-change: https://review.whamcloud.com/28434
Lustre-commit: eb34cf7695766b39e15704861d6ac3d636042196

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I18b1466cf62d10fa28f7ed9731490e963b6274f4
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28761
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
6 years ago LU-9841 lov: do not split IO for single striped file 60/28760/2
Jinshan Xiong [Wed, 9 Aug 2017 23:31:17 +0000 (16:31 -0700)]
LU-9841 lov: do not split IO for single striped file

The stripe size of a single-striped file is not reliable, so it
shouldn't be used to split I/O.
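
A minimal sketch of the idea, assuming a simplified layout description.
The structure, its field names and io_chunk_end() are hypothetical and
are not the lov code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout description; field names are illustrative only. */
struct layout_sketch {
	uint32_t stripe_count;
	uint64_t stripe_size;	/* bytes; not trusted when stripe_count is 1 */
};

/*
 * Return the exclusive end offset of the first I/O chunk for the byte
 * range [start, end).
 */
uint64_t io_chunk_end(const struct layout_sketch *lo,
		      uint64_t start, uint64_t end)
{
	uint64_t boundary;

	assert(lo != NULL && start <= end);

	/* Single stripe (or no usable stripe size): do not split at all. */
	if (lo->stripe_count <= 1 || lo->stripe_size == 0)
		return end;

	/* Otherwise stop at the next stripe boundary. */
	boundary = (start / lo->stripe_size + 1) * lo->stripe_size;
	return boundary < end ? boundary : end;
}
```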

Lustre-change: https://review.whamcloud.com/28451
Lustre-commit: 078a099d26ef7f5d26131c0e18615855a39f341d

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: I47c31d59b46b07d4a6760b8985e1c19da4765a5c
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28760
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-9828 ptlrpc: Do not assert when bd_nob_transferred != 0 59/28759/2
Doug Oucharek [Wed, 31 May 2017 21:39:12 +0000 (14:39 -0700)]
LU-9828 ptlrpc: Do not assert when bd_nob_transferred != 0

There is a case in the routine ptlrpc_register_bulk() where we were
asserting if bd_nob_transferred != 0 when not resending.  There is
evidence that network errors can create a situation where
this does happen.  So we should not be asserting!

This patch changes that assert to an error return code of -EIO.
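
The shape of the change can be sketched as follows. This is a
standalone illustration with a hypothetical stand-in structure, not the
real ptlrpc_register_bulk() or its bulk descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a bulk descriptor. */
struct bulk_desc_sketch {
	int nob_transferred;	/* bytes already transferred */
};

/*
 * Returns 0 if registration may proceed, or -EIO when bytes were
 * already transferred on a request that is not being resent.
 */
int register_bulk_sketch(const struct bulk_desc_sketch *desc, int resend)
{
	assert(desc != NULL);

	/* This state used to trip an assertion; now it is an I/O error. */
	if (!resend && desc->nob_transferred != 0)
		return -EIO;
	return 0;
}
```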

Lustre-change: https://review.whamcloud.com/28491
Lustre-commit: e6490ea6cf0b793c0b47f17ac5a5fa3a2a136e0d

Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Change-Id: I6a73ca1b04a86f187744d3b8b5d46df71d95e416
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Sonia Sharma <sonia.sharma@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28759
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-9558 llite: handle struct vm_operations changes 73/28573/2
James Simmons [Thu, 27 Jul 2017 18:09:17 +0000 (14:09 -0400)]
LU-9558 llite: handle struct vm_operations changes

In the Linux 4.11 kernel, struct vm_area_struct is no longer passed
to the struct vm_operations members, since struct vm_area_struct
has been merged into struct vm_fault.
Handle these changes in the llite layer.

Linux-commit: 11bac80004499ea59f361ef2a5516c84b6eab675

Test-Parameters: trivial

Lustre-commit: a1fc8dffef216b71cb4a29a5a8faa2aa7919d2ae
Lustre-change: https://review.whamcloud.com/27651

Change-Id: I3f8767f02695515d83c59a61f6e9921b3d823109
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: https://review.whamcloud.com/28573
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-9104 obd: Ignore unknown config param while mounting 32/28232/2
Rahul Deshmkuh [Mon, 8 May 2017 17:03:41 +0000 (22:33 +0530)]
LU-9104 obd: Ignore unknown config param while mounting

class_process_proc_param() returns a positive value when it encounters
unknown parameters in order to have the lower levels process them.
At the very bottom layer, the positive value returned by
class_process_proc_param() needs to be dropped. osd_process_config()
missed that, which resulted in target mount failure.

Make sure that osd_process_config() does not return the positive value
returned by class_process_proc_param().
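
A minimal sketch of the return-code convention described above. Both
functions and the "known_param" name are hypothetical; they only model
the "positive means unknown" contract and are not the real
class_process_proc_param() or osd_process_config().

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical parameter parser: 0 when handled, a positive value when
 * the parameter is unknown, a negative errno value on a real failure.
 */
int parse_param_sketch(const char *param)
{
	assert(param != NULL);

	if (param[0] == '\0')
		return -EINVAL;
	if (strcmp(param, "known_param") == 0)
		return 0;
	return 1;	/* unknown: let a lower layer try */
}

/* Bottom layer: nothing sits below, so "unknown" must not be an error. */
int bottom_layer_config_sketch(const char *param)
{
	int rc = parse_param_sketch(param);

	return rc > 0 ? 0 : rc;
}
```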

Test case has been added to check processing of unknown config
parameters.

Lustre-change: https://review.whamcloud.com/25368
Lustre-commit: 1385dbf8b9caedca7bb32f35db1529e4d5c52d4f

Test-Parameters: testlist=conf-sanity
Signed-off-by: Parinay Kondekar <parinay.kondekar@seagate.com>
Signed-off-by: Rahul Deshmukh <rahul.deshmukh@seagate.com>
Seagate-bug-id: MRP-4162
Change-Id: I068b8f2aee4cee69629efc83745d7cb88aea268c
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28232
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-9654 mdt: fix problem of RAoLU HSM policy 20/28220/2
Li Xi [Mon, 12 Jun 2017 07:32:29 +0000 (15:32 +0800)]
LU-9654 mdt: fix problem of RAoLU HSM policy

mdt_attr_get_complex() clears all known attributes even if they are
already valid. So in mdt_handle_last_unlink(), the valid attributes
of HSM or nlink should be checked before calling that function.

Lustre-change: https://review.whamcloud.com/27564
Lustre-commit: 2e4e60234028743404a302893fb1aa9f8d6a95e7

Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: I9ba561cadcc40baf5e28172cfda699cdecce7ea8
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28220
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-9781 llog: Improve catalog full warning 15/28815/2
Giuseppe Di Natale [Tue, 18 Jul 2017 21:57:18 +0000 (14:57 -0700)]
LU-9781 llog: Improve catalog full warning

When warning that a catalog file is full, provide the name
of the catalog file. If the name of catalog file isn't
defined, print its FID.
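
A small sketch of the name-or-FID fallback. The message wording, the
fid_sketch structure and format_catalog_full() are assumptions for
illustration; the actual warning text in the patch may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical FID triple; not the real lu_fid definition. */
struct fid_sketch {
	unsigned long long seq;
	unsigned int oid;
	unsigned int ver;
};

/*
 * Format the warning into buf, preferring the catalog name and falling
 * back to the FID. Returns whatever snprintf() returns.
 */
int format_catalog_full(char *buf, size_t len, const char *name,
			const struct fid_sketch *fid)
{
	assert(buf != NULL && fid != NULL);

	if (name != NULL && name[0] != '\0')
		return snprintf(buf, len, "catalog %s is full", name);
	return snprintf(buf, len, "catalog [0x%llx:0x%x:0x%x] is full",
			fid->seq, fid->oid, fid->ver);
}
```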

Lustre-change: https://review.whamcloud.com/28093
Lustre-commit: 1c4900dcd367d2a0b2e6f26796328f5aa12508db

Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Change-Id: I559e43d08febfd8a1512ceb58fd3030b06372e9f
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28815
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-9410 ldiskfs: no check mb bitmap if flex_bg enabled 65/28765/2
Fan Yong [Wed, 9 Aug 2017 18:30:02 +0000 (02:30 +0800)]
LU-9410 ldiskfs: no check mb bitmap if flex_bg enabled

When the filesystem is initialized (reformatted), the number of
free blocks in the group descriptor is calculated via
ext2fs_reserve_super_and_bgd() (e2fsprogs). As commented
in that function: "This is not necessarily the case when
the flex_bg feature is enabled, so callers should take care!".

So it is normal that we may find a block group descriptor
that has the LDISKFS_BG_BLOCK_UNINIT flag but 0 free blocks.
ldiskfs_mb_check_ondisk_bitmap() should NOT report an error
for such a block group; instead, it should skip the check.

Lustre-change: https://review.whamcloud.com/28566
Lustre-commit: 5506c15a65b3eebb9f15000105e6eb7c02742a10

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Iba0fb2bf0632a6e54222472bc724a8ea0478e9ae
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28765
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-9890 osd-zfs: dmu_objset_own/disown changes 64/28764/2
Giuseppe Di Natale [Thu, 17 Aug 2017 17:16:49 +0000 (10:16 -0700)]
LU-9890 osd-zfs: dmu_objset_own/disown changes

ZFS 0.8.0 will introduce ZFS encryption. The interfaces
to 'dmu_objset_own' and 'dmu_objset_disown' have changed.
Add configure checks to determine which versions of these
functions are available and call them appropriately.

Test-Parameters: trivial ostfilesystemtype=zfs mdtfilesystemtype=zfs testlist=sanity

Lustre-change: https://review.whamcloud.com/28593
Lustre-commit: 0fedb017c12629d145fa0577451d43adc757eb36

Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Change-Id: Ide1a712858770e373404445b06596130a574d85b
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28764
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-9869 lnet: fix incorrect arguments order calling lstcon_session_new 63/28763/2
Colin Ian King [Fri, 11 Aug 2017 17:17:57 +0000 (13:17 -0400)]
LU-9869 lnet: fix incorrect arguments order calling lstcon_session_new

The arguments args->lstio_ses_force and args->lstio_ses_timeout are
in the incorrect order. Fix this by swapping them around.

Detected by CoverityScan, CID#1226833 ("Arguments in wrong order")

Test-Parameters: trivial testlist=lnet-selftest

Lustre-change: https://review.whamcloud.com/28487
Lustre-commit: 0a80067c60043da428c473c331ed7b602c6e960b

Change-Id: If11c574655425db5bbf21ba2264be8d83a7e8bf8
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Sonia Sharma <sonia.sharma@intel.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28763
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-9848 llog: check padding size for update reclen 62/28762/2
Lai Siyao [Tue, 15 Aug 2017 11:51:08 +0000 (19:51 +0800)]
LU-9848 llog: check padding size for update reclen

The update log only checks the padding size in the split case; the
check should also be done if it's less than the chunk size.

Lustre-change: https://review.whamcloud.com/28554
Lustre-commit: 6705d6cc40a83b0e94668d651c444c855482bd01

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: Ie7819f67dd9bcbfb060713bb208c9777420c5178
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28762
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-9499 lfsck: set target bitmap properly 18/28518/2
Fan Yong [Wed, 14 Jun 2017 07:48:56 +0000 (15:48 +0800)]
LU-9499 lfsck: set target bitmap properly

If the notification from the peer server has the LF_INCOMPLETE flag,
then record it in the target bitmap unconditionally to avoid
failing to update the bitmap in some corner cases.

This patch also adds more debug information when the LFSCK
updates the bitmap and handles double_scan_result.

Lustre-change: https://review.whamcloud.com/27632
Lustre-commit: 775780cb90ba6069aefc7063adfe6862b26ce935

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I3a6195136d608aa47e59e61f95c92978503e3a4b
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28518
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-8760 lib: avoid unexpected out of order execution 22/28322/4
Fan Yong [Fri, 4 Nov 2016 01:04:39 +0000 (09:04 +0800)]
LU-8760 lib: avoid unexpected out of order execution

There is a race condition in __l_wait_event() because of the
out-of-order execution between changing the thread state and
checking the condition. It may block the thread (to be woken)
forever. Consider the following real execution order:

1. Thread1 checks condition on CPU1, gets false.
2. Thread2 sets condition on CPU2.
3. Thread2 calls wake_up() on CPU2 to wake the threads with
   state TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE. But
   Thread1's state is TASK_RUNNING at that time.
4. Thread1 sets its state as TASK_INTERRUPTIBLE on CPU1,
   then schedules.

If the '__timeout' variable is zero, Thread1 will have
no chance to check the condition again.

Generally, the interval between the out-of-order step1 and step4
is so tiny that step2 and step3 cannot happen in between. To some
degree, this explains why we seldom hit the related trouble. But
the race really exists, especially considering that step1 and
step4 can be interrupted.

The patch adds barrier between changing thread's state and
checking condition to avoid out-of-order execution.
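
As a userspace analogy only (C11 atomics, not the kernel primitives
used by the patch), the required ordering can be sketched like this.
The structure and both function names are hypothetical. The waiter
publishes its sleeping state, issues a full fence, and only then
re-checks the condition; the waker sets the condition, issues a full
fence, and only then looks at the state. With both fences in place, at
least one side is guaranteed to observe the other's store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum { SKETCH_RUNNING = 0, SKETCH_SLEEPING = 1 };

struct waiter_sketch {
	atomic_int state;
	atomic_bool condition;
};

/*
 * Waiter side: publish the sleeping state, fence, then re-check the
 * condition. Returns true if the caller should really go to sleep.
 */
bool waiter_prepare(struct waiter_sketch *w)
{
	assert(w != NULL);

	atomic_store_explicit(&w->state, SKETCH_SLEEPING,
			      memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&w->condition, memory_order_relaxed)) {
		atomic_store_explicit(&w->state, SKETCH_RUNNING,
				      memory_order_relaxed);
		return false;
	}
	return true;
}

/*
 * Waker side: set the condition, fence, then inspect the state.
 * Returns true if a sleeping waiter was found and marked runnable.
 */
bool waker_signal(struct waiter_sketch *w)
{
	int expected = SKETCH_SLEEPING;

	assert(w != NULL);

	atomic_store_explicit(&w->condition, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_compare_exchange_strong(&w->state, &expected,
					      SKETCH_RUNNING);
}
```

Without the fence on the waiter side, the condition load could be
satisfied before the state store becomes visible, which is the lost
wakeup described in steps 1 to 4.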

Lustre-change: https://review.whamcloud.com/23564
Lustre-commit: c2b6030e9217e54e7153c0a33cce0c2ea4afa54c

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I32caee6b332f037d864419ea8728112da563cce0
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28322
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-9856 mdd: handle NULL buffer in mdd_xattr_list() 66/28766/2
John L. Hammond [Thu, 10 Aug 2017 19:44:24 +0000 (14:44 -0500)]
LU-9856 mdd: handle NULL buffer in mdd_xattr_list()

The upper layer may call mdd_xattr_list() with a NULL buffer to get
the length of the xattr name list. Handle this case safely by skipping
the removal of the link xattr for unlinked objects.
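
The NULL-buffer convention can be sketched as below. This is a
standalone illustration with a hypothetical helper, not
mdd_xattr_list() itself: when no buffer is supplied, only the required
length is reported and no post-processing of the buffer is attempted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy n NUL-terminated names back to back into buf.
 * With buf == NULL, only report the number of bytes required.
 * Returns the total length, or -ERANGE if buf is too small.
 */
long list_names_sketch(const char *const *names, size_t n,
		       char *buf, size_t size)
{
	size_t total = 0;
	size_t i;

	assert(names != NULL || n == 0);

	for (i = 0; i < n; i++)
		total += strlen(names[i]) + 1;

	if (buf == NULL)
		return (long)total;	/* size query: never touch buf */
	if (total > size)
		return -ERANGE;

	for (i = 0; i < n; i++) {
		size_t len = strlen(names[i]) + 1;

		memcpy(buf, names[i], len);
		buf += len;
	}
	return (long)total;
}
```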

Lustre-change: https://review.whamcloud.com/28469
Lustre-commit: 33a4b5ef00e88b33136d09d2f4029223a3c4d681

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Iae87fba20325b228ef75ee762acfa49353932b1b
Reviewed-by: Andrew Perepechko <andrew.perepechko@seagate.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28766
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
6 years ago LU-9772 utils: Enable new ZFS MMP on mkfs 52/28652/2
Nathaniel Clark [Fri, 14 Jul 2017 17:39:10 +0000 (13:39 -0400)]
LU-9772 utils: Enable new ZFS MMP on mkfs

ZFS 0.7.0 comes with new multi-modifier protection; this patch
enables it by default on mkfs.

This also ensures canmount is off for pools that were not just
created.

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: If9b87e9786e0eaefe5ac9a536edcdca3d1012585
Reviewed-on: https://review.whamcloud.com/28051
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
(cherry picked from commit 25e1cea871abd3c08dffb06ea62046ad84a822c1)
Reviewed-on: https://review.whamcloud.com/28652

6 years ago LU-9903 kernel: kernel update RHEL6.9 [2.6.32-696.10.1.el6] 85/28685/2
Bob Glossman [Tue, 22 Aug 2017 17:51:36 +0000 (10:51 -0700)]
LU-9903 kernel: kernel update RHEL6.9 [2.6.32-696.10.1.el6]

Update RHEL6.9 kernel to 2.6.32-696.10.1.el6
kernel patch added as a workaround for LU-9698

Test-Parameters: clientdistro=el6.9 mdsdistro=el6.9 \
  ossdistro=el6.9 mdtfilesystemtype=ldiskfs \
  ostfilesystemtype=ldiskfs testgroup=review-ldiskfs

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: If201c5220f2125e018a16b57e3c97be55adfb1ce
Reviewed-on: https://review.whamcloud.com/28685
Tested-by: Jenkins
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-8346 obdclass: Set lc_version 05/28405/2
Patrick Farrell [Tue, 11 Jul 2017 14:31:46 +0000 (09:31 -0500)]
LU-8346 obdclass: Set lc_version

The patch "LU-8346 obdclass: guarantee all keys filled"
removed the setting of lc_version, which makes us always
refill cached envs.  This is very expensive, particularly
for fast reads.

Original commit e58f8d609a81576eaf5bc9d0fa53bef274a01bf,
https://review.whamcloud.com/26099

Change-Id: I13ba7d19185b899d1f68d244365160539e881b8e
Signed-off-by: Patrick Farrell <paf@cray.com>
Reviewed-on: https://review.whamcloud.com/27994
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Tested-by: Jenkins
Reviewed-by: Hongchao Zhang <hongchao.zhang@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
(cherry picked from commit 96f3fb788c230872e6d31185367a55ec3c4fedbc)
Reviewed-on: https://review.whamcloud.com/28405
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-7899 osd: batch EA updates 82/28482/2
Alex Zhuravlev [Tue, 28 Feb 2017 09:44:14 +0000 (12:44 +0300)]
LU-7899 osd: batch EA updates

During file creation we set a number of EAs: LMA, VBR, LinkEA, LOVEA, ACLs.
Calling into SA to refill spill again and again is expensive. Thus it
makes sense to postpone this to osd_trans_stop(), where all changed EAs
have already been collected in a temporary buffer.
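
A toy model of the batching idea, assuming a fixed-size pending list.
The structure, the function names and the flush counter are
hypothetical and only illustrate "queue now, apply once at transaction
stop"; they do not reflect the real osd-zfs data structures.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_PENDING 8

struct txn_sketch {
	const char *pending[SKETCH_MAX_PENDING];
	size_t npending;
	unsigned int flushes;	/* how many times the expensive path ran */
};

/* Queue one attribute update; returns 0, or -1 if the queue is full. */
int txn_set_ea(struct txn_sketch *t, const char *name)
{
	assert(t != NULL && name != NULL);

	if (t->npending == SKETCH_MAX_PENDING)
		return -1;
	t->pending[t->npending++] = name;
	return 0;
}

/* Apply everything queued in one pass; returns the number applied. */
size_t txn_stop(struct txn_sketch *t)
{
	size_t applied;

	assert(t != NULL);

	applied = t->npending;
	if (applied > 0)
		t->flushes++;	/* one expensive write for the whole batch */
	t->npending = 0;
	return applied;
}
```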

Lustre-change: https://review.whamcloud.com/21893
Lustre-commit: 2c9ff6dffdf4320af95c9db9af07a416529275f0

Change-Id: Ia2604ddafdf8b2ca4f6db4d70ead6d2d2761cd26
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28482
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-9850 patchless client should conflict patched kernel 37/28437/7
Brian J. Murrell [Wed, 9 Aug 2017 15:09:44 +0000 (11:09 -0400)]
LU-9850 patchless client should conflict patched kernel

Due to how dependencies work in RPM (and a bug in how kmod RPMs generate
their dependency lists), on a node where the server and client repos are
both configured, YUM could allow the patched kernel to satisfy the
patchless-client RPM's requirements.

Add Conflicts: and Provides: to the kernel RPM and lustre-client RPM to
prevent this from happening.

This change also allows one to force the installation of the patched
kernel RPM (yum install kernel-lustre) if one desires.

Signed-off-by: Brian J. Murrell <brian.murrell@intel.com>
Change-Id: If9c44a93937cd7603b0246676ebc9c8260a43b11
Reviewed-on: https://review.whamcloud.com/28437
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-9725 quota: always deregister lwp 57/28357/4
Lai Siyao [Fri, 4 Aug 2017 15:16:46 +0000 (23:16 +0800)]
LU-9725 quota: always deregister lwp

qsd should always deregister lwp upon finish no matter whether qsd_exp
was set before; otherwise the item will stay on the list after qsd has
been freed.

Lustre-change: https://review.whamcloud.com/#/c/28356/
Lustre-commit: ce8ca7d3564439285a56982430f380354b697f68

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: I0d6206f2f2bc8177d0aa35b350f534d85eab1c03
Reviewed-on: https://review.whamcloud.com/28357
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-8935 ptlrpc: missing barrier before wake_up 21/28321/3
Lai Siyao [Wed, 12 Apr 2017 21:56:50 +0000 (05:56 +0800)]
LU-8935 ptlrpc: missing barrier before wake_up

ptlrpc_client_wake_req() misses a memory barrier, which may cause
strange errors.

Lustre-change: https://review.whamcloud.com/26583
Lustre-commit: 33033c27aae361069877d56d44714097a208aa76

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: Ic8e9cbaf8c07f503798b95c608477508204d9614
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28321
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-9745 dkms: Fix included dkms.conf file 24/28224/3
Nathaniel Clark [Tue, 25 Jul 2017 21:34:30 +0000 (17:34 -0400)]
LU-9745 dkms: Fix included dkms.conf file

When lustre-dkms is installed with other dkms packages,
the PRE/POST scripts don't seem to function correctly.
This includes the correct dkms.conf by default without having
to recreate and reread it during build.

Lustre-change: https://review.whamcloud.com/#/c/28210/
Lustre-commit: 2f4d11de2c37803822d6c1df0b7df6828477c11a

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: Ic6f25480db40d784dfcb3b650f7c869716b903ee
Reviewed-on: https://review.whamcloud.com/28224
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-8619 lbuild: update ZFS to use 0.7.1 31/28531/5
Andreas Dilger [Wed, 9 Aug 2017 04:55:46 +0000 (12:55 +0800)]
LU-8619 lbuild: update ZFS to use 0.7.1

Update lbuild to build against ZFS 0.7.1

Changelog: https://github.com/zfsonlinux/zfs/releases/tag/zfs-0.7.1

Lustre-change: https://review.whamcloud.com/#/c/22569/
Lustre-commit: ffa8ea9ca6f096515cf8b0638161378665591022

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I04082cd6cd43c98477100f9fc308666e1b981c0a
Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28531
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-7991 osd-zfs: remove obsolete quota code 81/28481/3
Fan Yong [Tue, 25 Jul 2017 06:09:21 +0000 (14:09 +0800)]
LU-7991 osd-zfs: remove obsolete quota code

Directly use the ZFS backend quota accounting objects;
they no longer need to be mapped from Lustre's own accounting objects.
The related obsolete logic is removed entirely.

Lustre-change: https://review.whamcloud.com/27661
Lustre-commit: 9fdbfed4ce20aaf7ac0e6b5001f323428d9d3893

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I2c6aeb1ccac52348d8d163017f73ef0fd9133551
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28481
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-8999 quota: fix quota iteration interface 30/28530/2
Jinshan Xiong [Fri, 4 Aug 2017 01:42:39 +0000 (18:42 -0700)]
LU-8999 quota: fix quota iteration interface

Since zfs 0.7.0, object accounting is maintained by DMU, so the quota
iteration interface should retrieve the information from there.

Lustre-change: https://review.whamcloud.com/#/c/28345/
Lustre-commit: 8fb8e938c8af8630b06a527874ef47d52fb8b102

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: I3d0744dfb52b1a9088b828bc72d648872ec4d00b
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28530
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-9054 tests: disable test_312 due to zdb issue 80/28480/3
Jinshan Xiong [Thu, 3 Aug 2017 22:13:56 +0000 (15:13 -0700)]
LU-9054 tests: disable test_312 due to zdb issue

zdb used to work on datasets of an exported pool via the '-e -p'
options; this has been changed in zfs-0.7.0.

This patch temporarily disables test_312 until zfs upstream ticket
https://github.com/zfsonlinux/zfs/issues/6464 is solved.

Lustre-change: https://review.whamcloud.com/28343
Lustre-commit: 05ebd8218108dccaa31c1dbc97cd4cc90dfd3f8d

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: Ib0c9eeed4964ea4a0abfed70760cb8fbaeb44496
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28480
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-9826 tests: Do not run conf-sanity 32b with ZFS 04/28504/2
James Nunez [Mon, 7 Aug 2017 17:29:50 +0000 (11:29 -0600)]
LU-9826 tests: Do not run conf-sanity 32b with ZFS

With ZFS 0.7.0, conf-sanity test 32b consistently fails
in automated testing. Since the issue may be the VM test
environment, add conf-sanity test 32b to the
ALWAYS_EXCEPT list for ZFS testing.

Test-Parameters: trivial testlist=conf-sanity

Lustre-change: https://review.whamcloud.com/28408
Lustre-commit: d564becbe58f7fb349cbe19a7f23e447278310d5

Signed-off-by: James Nunez <james.a.nunez@intel.com>
Change-Id: Ib8582b85760057045bc7cce66d470e81e0e43dde
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28504
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-9799 mount: Call read_ldd with initialized mount type 81/28581/2
Nathaniel Clark [Thu, 10 Aug 2017 14:20:04 +0000 (10:20 -0400)]
LU-9799 mount: Call read_ldd with initialized mount type

When re-reading the ldd when doing relabel, ensure the correct mount
type is used for the read. Otherwise ldiskfs complains:

   mount.lustre FATAL: unhandled/unloaded fs type 0 'ext3'

and ZFS complains:

   e2label: No such file or directory while trying to open MGS/MGT
   Couldn't find valid filesystem superblock.

Lustre-change: https://review.whamcloud.com/28456
Lustre-commit: 0108281c65545df169faaa0ce0690fb021680643

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: Ife53cff948d545c306e99e4b023989245a1ac3f7
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Hongchao Zhang <hongchao.zhang@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28581
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
6 years ago LU-9866 kernel: kernel update [SLES12 SP2 4.4.74-92.35] 43/28543/2
Bob Glossman [Fri, 11 Aug 2017 15:25:03 +0000 (08:25 -0700)]
LU-9866 kernel: kernel update [SLES12 SP2 4.4.74-92.35]

Update target and kernel_config files for new version

Test-Parameters: clientdistro=sles12sp2 testgroup=review-ldiskfs \
  mdsdistro=sles12sp2 ossdistro=sles12sp2 \
  mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: Ibd5e7e931a6055c1b0d2a52359d4f4527843dec0
Reviewed-on: https://review.whamcloud.com/28543
Reviewed-by: Minh Diep <minh.diep@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years ago LU-8618 tests: ha.sh improvements 24/28524/2
Elena Gryaznova [Mon, 29 May 2017 18:50:57 +0000 (21:50 +0300)]
LU-8618 tests: ha.sh improvements

Patch adds the following ha.sh changes:
- Customise SIMUL and IOR paths.
- Add -p max failover period parameter.
- Add -r dry run parameter.
- Add "iozone" load.
- Add the possibility to set the number of MPI threads per client.
- CRM is not always configured to fail target back when
  the primary node is back. Add the possibility to execute
  failback command if required.
- The logs from all clients are required if non mpi load fails on
  one client only. Dump logs from all clients.
- Add the possibilities to:
  run ha.sh with custom ior, simul parameters;
  start only the defined list of applications;
  start MPI loads instances on defined number of clients.

Lustre-change: https://review.whamcloud.com/22528
Lustre-commit: d3a044086f5790fec2747c653dca26b8ec529e2d

Test-Parameters: trivial
Seagate-bug-id: MRP-2150, MRP-2896, MRP-3431, MRP-3252, MRP-3495
Signed-off-by: Elena Gryaznova <elena.gryaznova@seagate.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@seagate.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@seagate.com>
Reviewed-by: Alexander Lezhoev <alexander.lezhoev@seagate.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@seagate.com>
Change-Id: I252aa0945286b30ffa6bad40aebf0c2cbc0c7261
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28524
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-9494 test: Improve message for skipping tests 22/28522/2
Ruth A Klundt [Fri, 7 Jul 2017 15:17:26 +0000 (09:17 -0600)]
LU-9494 test: Improve message for skipping tests

Modify skip messages for consistency and clarity.

Lustre-change: https://review.whamcloud.com/27350
Lustre-commit: 75b426f9f9d3f21e08cd1e62b7fa4962a2b8c679

Test-Parameters: trivial testlist=sanity

Signed-off-by: Ruth Klundt <rklundt@sandia.gov>
Change-Id: I44ced56e67aa63ed84da6a15c88282bc3ff19332
Reviewed-by: Wei Liu <wei3.liu@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28522
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-282 tests: remove extra logging from sanity 27 21/28521/2
Andreas Dilger [Mon, 29 May 2017 00:54:37 +0000 (20:54 -0400)]
LU-282 tests: remove extra logging from sanity 27

Remove extra logging from sanity.sh test_27b to avoid confusing
autotest log parsing.

Replace some uses of $SETSTRIPE and $GETSTRIPE with $LFS in these
functions since this was only needed during the ancient transition
from the standalone "lstripe" binary.

Test-Parameters: trivial

Lustre-change: https://review.whamcloud.com/27322
Lustre-commit: f0795a24fd79e2cf2c2dccc1a2b510fabd9ddacf

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I270d2f75cd803bede5776117c9d5aaaa5b3ebbe5
Reviewed-by: Steve Guminski <stephenx.guminski@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28521
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-9235 libcfs: don't dump stack if just touched 20/28520/2
Hongchao Zhang [Wed, 10 May 2017 03:45:19 +0000 (11:45 +0800)]
LU-9235 libcfs: don't dump stack if just touched

If an lc_watchdog was touched before lcw_dump_stack dumped the
stack of its thread, skip the dump: the thread has been verified
to be active, so there is no need to dump its stack.
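
The check described above can be sketched as a small user-space model.
This is only an illustration in Python, not the libcfs code: the
Watchdog class, its fields and should_dump_stack() are hypothetical
names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Watchdog:
    """Hypothetical stand-in for an lc_watchdog entry."""
    timeout: float       # seconds of inactivity before the watchdog fires
    last_touched: float  # time the watched thread last touched the watchdog


def should_dump_stack(wd: Watchdog, expired_at: float, now: float) -> bool:
    """Return True only if the thread stayed silent since the timer expired."""
    if wd.last_touched > expired_at:
        # Touched after expiry: the thread is known to be alive, skip the dump.
        return False
    return now - wd.last_touched >= wd.timeout
```

The point of the sketch is the ordering: the "touched" time is checked
again right before dumping, not only when the timer fires.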

Lustre-change: https://review.whamcloud.com/23162
Lustre-commit: 1376094062c1e46c985f04d82821114d84329699

Change-Id: I8e4acc1793bb8458ee3b6dc73f2953670ed22896
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28520
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-9558 lnet: kernel socket accept takes new bool argument 63/28463/2
James Simmons [Wed, 14 Jun 2017 16:42:52 +0000 (12:42 -0400)]
LU-9558 lnet: kernel socket accept takes new bool argument

During the development of the linux 4.11 kernel it was discovered
that the kernel socket layer could get into a lockdep situation. To
handle this a new bool argument was added to the accept member
of struct socket. For LNet we can always pass false.

Lustre-commit: 15045c9067fd021baa0ec925bcc245949945d01e
Lustre-change: https://review.whamcloud.com/27642

Change-Id: I420cda95b70cf927b1a6e3493b631bc5a3585d74
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/27642
Reviewed-by: Doug Oucharek <doug@cadentcomputing.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: https://review.whamcloud.com/28463
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-9364 test: wait rmultiop_start to start 40/28440/2
Hongchao Zhang [Thu, 13 Apr 2017 07:48:32 +0000 (15:48 +0800)]
LU-9364 test: wait rmultiop_start to start

In rmultiop_start, the remote command could be delayed a while,
so wait some time for the command to run.

Test-Parameters: trivial testlist=replay-vbr

Lustre-change: https://review.whamcloud.com/26729
Lustre-commit: f1e337e077b3bc653c91fda0e19a5346ca154705

Change-Id: Ic8beec725edc89a527c74e2033e59e1da0d444c9
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-by: Wei Liu <wei3.liu@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Saurabh Tandan <saurabh.tandan@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28440
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-9817 lnet: safe access to msg 39/28439/2
Amir Shehata [Tue, 1 Aug 2017 21:24:57 +0000 (14:24 -0700)]
LU-9817 lnet: safe access to msg

When tx credits are returned, any pending messages need to be
sent. Messages could have different tx_cpts, so the correct one
needs to be locked. After lnet_post_send_locked(), if we locked
a different CPT then we need to relock the correct one.
However, as part of lnet_post_send_locked(), lnet_finalize() can
be called which can free the message. Therefore, the cpt of the
message being passed must be cached in order to prevent access to
freed memory.
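
The pattern can be sketched as follows. This is a simplified Python
model under stated assumptions, not the LNet code: Message, post_send()
and send_pending() are hypothetical names, and "freeing" is modelled
with a flag that makes any later access raise.

```python
class Message:
    """Hypothetical stand-in for an LNet message."""

    def __init__(self, tx_cpt):
        self.tx_cpt = tx_cpt
        self.freed = False

    def cpt(self):
        if self.freed:
            raise RuntimeError("access to freed message")
        return self.tx_cpt


def post_send(msg):
    """Models a send path that may finalize, and so free, the message."""
    msg.freed = True


def send_pending(msg, locked_cpt):
    """Return the CPT that is locked after the send attempt."""
    msg_cpt = msg.cpt()  # cache the CPT before the call that may free msg
    post_send(msg)
    if msg_cpt != locked_cpt:
        # Relock using the cached value; msg must not be touched here.
        locked_cpt = msg_cpt
    return locked_cpt
```

Reading msg.cpt() after post_send() would raise in this model, which
corresponds to the use-after-free the patch avoids.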

Lustre-change: https://review.whamcloud.com/28308
Lustre-commit: 7af6307ba4e67673bc52aaaeb54f2cb4b632a3b7

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I959fdc30daf87b5575d8371da20d5cf6f64e7d3c
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sonia.sharma@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28439
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-9731 Limit work-around to EL7 only 38/28438/2
Brian J. Murrell [Tue, 25 Jul 2017 12:02:24 +0000 (08:02 -0400)]
LU-9731 Limit work-around to EL7 only

Since the workaround previously landed for LU-9731 only applies to
EL7, only apply it for EL7 builds.

Lustre-change: https://review.whamcloud.com/28202
Lustre-commit: 950c511276866efb6af8defe49213fd69b8883d2

Signed-off-by: Brian J. Murrell <brian.murrell@intel.com>
Change-Id: Id74f03f3af74f324320e094e32f7b7480259145c
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28438
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-8275 tests: add flag to enable secret shared key for tests 73/28373/2
Chris Hanna [Wed, 19 Jul 2017 14:25:10 +0000 (10:25 -0400)]
LU-8275 tests: add flag to enable secret shared key for tests

When the SHARED_KEY environment variable is set to true,
test-framework will set up a shared key between the nodes and start
Lustre with shared key enabled. Three tests (28,29,30) are also
added to sanity-sec in order to test shared key features.

Lustre-commit: 62ed4f22e21075daa074f2c7f92be6509d76e51c
Lustre-change: https://review.whamcloud.com/20780

Change-Id: I1fb15e1b6541ef67dc72d555fdcf9820874f0f75
Signed-off-by: Kit Westneat <kit.westneat@gmail.com>
Signed-off-by: Chris Hanna <hannac@iu.edu>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: https://review.whamcloud.com/28373
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
6 years agoLU-9266 hsm: don't add request when cdt is stopped 41/28441/2
Sergey Cheremencev [Mon, 20 Mar 2017 14:20:40 +0000 (22:20 +0800)]
LU-9266 hsm: don't add request when cdt is stopped

Check cdt_state after getting the layout lock in mdt_hsm_add_actions.
The fix protects against several RESTORE records addressed to the
same object in the llog. Such records cause mount to hang when
starting HSM:
PID: 15524  TASK: ffff880068b5b540  CPU: 4   COMMAND: "lctl"
 #0 [ffff8800bacd9728] schedule at ffffffff81525d30
 #1 [ffff8800bacd97f0] ldlm_completion_ast at ffffffffa08527f5 [ptlrpc]
 #2 [ffff8800bacd9890] ldlm_cli_enqueue_local at ffffffffa0851b8e [ptlrpc]
 #3 [ffff8800bacd9910] mdt_object_lock0 at ffffffffa0e4ec4c [mdt]
 #4 [ffff8800bacd99c0] mdt_object_lock at ffffffffa0e4f694 [mdt]
 #5 [ffff8800bacd99d0] mdt_object_find_lock at ffffffffa0e4f9c1 [mdt]
 #6 [ffff8800bacd9a00] hsm_restore_cb at ffffffffa0e9b533 [mdt]
 #7 [ffff8800bacd9a50] llog_process_thread at ffffffffa05fd699 [obdclass]
 #8 [ffff8800bacd9b10] llog_process_or_fork at ffffffffa05fdbaf [obdclass]
 #9 [ffff8800bacd9b60] llog_cat_process_cb at ffffffffa0601250 [obdclass]

Lustre-change: https://review.whamcloud.com/26215
Lustre-commit: 37a5157b84bce367e31743cb8648a15618492531

Change-Id: Ib09139795d847cac2e5f079a192a3548d32db09c
Seagate-bug-id: MRP-4251
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@seagate.com>
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28441
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
6 years agoLU-9744 mdt: avoid wrong CLF_HSM_DIRTY report in ChangeLog 03/28403/3
Bruno Faccini [Fri, 7 Jul 2017 14:42:07 +0000 (16:42 +0200)]
LU-9744 mdt: avoid wrong CLF_HSM_DIRTY report in ChangeLog

In hsm_cdt_request_completed(), when mdt_hsm_get_md_hsm() returns
an error, "struct md_hsm mh" has not been populated, so HS_DIRTY
can be wrongly detected and the CLF_HSM_DIRTY flag reported in
error.
This can be the cause of errors in the associated sanity-hsm
sub-tests test_220a, test_222c, test_222d, and test_224a, when
analyzing ChangeLog flags.
!IS_ERR(obj) should also be tested before adding CLF_HSM_DIRTY
in cl_flags.
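
The guard can be illustrated with a small Python model. It is a sketch
only: changelog_flags() and its parameters are hypothetical, and the
flag values below are illustrative placeholders, not the values used
in the Lustre headers.

```python
HS_DIRTY = 0x1       # illustrative value, not taken from Lustre headers
CLF_HSM_DIRTY = 0x1  # illustrative value, not taken from Lustre headers


def changelog_flags(get_md_hsm_rc, mh_flags, obj_is_err):
    """Report CLF_HSM_DIRTY only when mh_flags is known to be valid."""
    flags = 0
    # mh_flags is meaningful only if the object lookup and the HSM
    # attribute read both succeeded; otherwise it may hold garbage.
    if not obj_is_err and get_md_hsm_rc == 0 and (mh_flags & HS_DIRTY):
        flags |= CLF_HSM_DIRTY
    return flags
```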

Lustre-change: https://review.whamcloud.com/27962
Lustre-commit: 1b6af5006d0a36615f059030b005d55d3c7bb45e

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I4469a55b35ea5d35a9f0be152f085bd676f74240
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28403
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
6 years agoLU-9597 ofd: fix race for project setattr 02/28402/2
Wang Shilong [Sat, 29 Jul 2017 13:54:30 +0000 (21:54 +0800)]
LU-9597 ofd: fix race for project setattr

sanity-quota test 33 exposed an interesting project quota bug.
The problem is that we could hit the following condition:

step 1: create an empty file
step 2: buffer write data to file, at this time project id 0 is packed.
step 3: chattr file's project id

If the write RPC has been generated on the client but has not yet
reached the OST, and the step 3 chattr is issued in this time window,
then the chattr can complete on the OST before the write RPC is
processed there.

That means we have changed the file's project ID, but
ofd_attr_handle_id() did not clear S_ISUID and S_ISGID for this case.
When the write RPC arrives on the OST, the first write RPC calls
ofd_attr_handle_id() to set attributes. Unfortunately, it treats this
as the first write because S_ISUID and S_ISGID are still set, so the
project ID is reset to the packed ID 0, and we get wrong project
accounting.

We should use another file mode bit to indicate whether the project
ID has been initialized. This patch tries to fix the problem that way.
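
The ordering problem can be reproduced in a simplified Python model.
This is a sketch under stated assumptions, not the OFD code: OstObject,
its two boolean fields and the helper functions are hypothetical, and
the commit message does not say which mode bit the fix actually uses.

```python
from dataclasses import dataclass


@dataclass
class OstObject:
    """Hypothetical model of the attributes kept for an OST object."""
    projid: int = 0
    owner_uninit: bool = True   # models S_ISUID|S_ISGID still being set
    projid_uninit: bool = True  # models a separate "project ID not set" marker


def setattr_projid(obj, projid, separate_flag):
    """Step 3: chattr changes the project ID on the OST."""
    obj.projid = projid
    if separate_flag:
        obj.projid_uninit = False  # remember the project ID is initialized


def first_write(obj, packed_projid, separate_flag):
    """The delayed write RPC arrives carrying the packed project ID."""
    uninit = obj.projid_uninit if separate_flag else obj.owner_uninit
    if uninit:
        obj.projid = packed_projid  # treated as the first write
    obj.owner_uninit = False
    obj.projid_uninit = False


def run(separate_flag):
    obj = OstObject()
    setattr_projid(obj, 1000, separate_flag)  # chattr reaches the OST first
    first_write(obj, 0, separate_flag)        # write was packed with ID 0
    return obj.projid
```

In this model run(False) ends with project ID 0 (the chattr is lost),
while run(True) keeps the ID set by chattr.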

Lustre-change: https://review.whamcloud.com/28274
Lustre-commit: fb9152406da6fe439b75244e769db7dacb20a8d1

Change-Id: Ie2f9ce5b662c3011aa1cd2f59e1fb20526a3e3d7
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28402
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-9203 lnet: fix lnet_cpt_of_md() 00/28400/3
Amir Shehata [Fri, 21 Jul 2017 04:17:50 +0000 (21:17 -0700)]
LU-9203 lnet: fix lnet_cpt_of_md()

The intent of this function is to get the cpt nearest to the
memory described by the MD.

There are three scenarios that must be handled:
1. The memory is described by an lnet_kiov_t structure
 -> this describes kernel pages
2. The memory is described by a struct kvec
 -> this describes kernel logical addresses
3. The memory is a contiguous buffer allocated via vmalloc

For case 1 and 2 we look at the first vector which contains
the data to be DMAed, taking into consideration the msg offset.

For case 2 we have to take the extra step of translating the kernel
logical address to a physical page using virt_to_page() macro.

For case 3 we need to use is_vmalloc_addr() and vmalloc_to_page to
get the associated page to be able to identify the CPT.

o2iblnd uses the same strategy when it's mapping the memory into
a scatter/gather list. Therefore, a common function,
lnet_kvaddr_to_page(), was created to be used by both the o2iblnd
and lnet_cpt_of_md().

kmap_to_page() performs the high memory check which
lnet_kvaddr_to_page() does. However, unlike the latter it handles
the highmem case properly instead of calling LBUG. It's not
100% clear why the code was written that way. Since the legacy
code will need to still be maintained, adding kmap_to_page() will
not simplify the code. Furthermore, the behavior for kernels
which export kmap_to_page() will be different from kernels which
do not. At worst calling kmap_to_page() might mask some problems
which would've been caught by the LBUG earlier on. However, at
the time of this fix, that LBUG has never been observed.

Lustre-change: https://review.whamcloud.com/28165
Lustre-commit: 43b0e6328b113d9ee64e0b8a0cc35bff28eb3383

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I2c67e5df77d60112bf27f900e0325d189f193aed
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Sonia Sharma <sonia.sharma@intel.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28400
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-9758 build: allow disabling lustre test and iokit rpm creation 72/28372/2
James Simmons [Mon, 31 Jul 2017 17:25:03 +0000 (13:25 -0400)]
LU-9758 build: allow disabling lustre test and iokit rpm creation

While attempting to create a basic set of rpms that didn't
include the lustre test and iokit rpm I encountered build
breakage. Test the lustre_test conditional in the spec file
so we don't attempt to build special lustre test rpms. The
lustre test rpm is actually dependent on the lustre iokit
rpm so if --disable-iokit is set we should disable lustre
test rpms generation as well.

Test-Parameters: trivial

Lustre-commit: f810a0b0c5af842c7e644f1e82594f1615e540e6
Lustre-change: https://review.whamcloud.com/27985

Change-Id: I39e215aa6816b782314c779ad6f752bf32a43341
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: https://review.whamcloud.com/28372
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-7988 hsm: run HSM coordinator once per second at most 68/28368/2
Frank Zago [Wed, 6 Apr 2016 21:03:14 +0000 (16:03 -0500)]
LU-7988 hsm: run HSM coordinator once per second at most

When there is heavy HSM usage, each new HSM request can trigger the
HSM coordinator, which may run many times per second. While it is
running it locks the HSM catalog (using cdt_llog_lock), preventing any
other HSM operation from happening, such as insertion, removal or
dumping of the requests.

Limit the coordinator to run once per second, and only if there is
work to do. It will still execute the loop once every 10 seconds (or
as defined by the procfs loop_period parameter) to do housekeeping.
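
The wake-up policy described above can be sketched as a small Python
model. It is an illustration only, not the coordinator code: the
CoordinatorThrottle class and its method names are hypothetical; the
1 second and 10 second values come from the description above.

```python
class CoordinatorThrottle:
    """Hypothetical model of the coordinator wake-up policy."""

    def __init__(self, min_interval=1.0, loop_period=10.0):
        self.min_interval = min_interval  # at most one run per second
        self.loop_period = loop_period    # housekeeping period
        self.last_run = float("-inf")
        self.work_pending = False

    def request(self):
        """A new HSM request arrived; note it, but do not run yet."""
        self.work_pending = True

    def should_run(self, now):
        elapsed = now - self.last_run
        if elapsed >= self.loop_period:
            return True  # periodic housekeeping, even without work
        return self.work_pending and elapsed >= self.min_interval

    def run(self, now):
        self.last_run = now
        self.work_pending = False
```

Requests arriving within the same second are coalesced into a single
run, so the catalog lock is taken far less often under heavy load.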

Lustre-change: https://review.whamcloud.com/19341
Lustre-commit: cc6ef11d2f972ebc440013bddda87a536a09750c

Signed-off-by: frank zago <fzago@cray.com>
Change-Id: Ide3f061f8943a3088ea713993521897fb74e5d99
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-on: https://review.whamcloud.com/28368
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-9183 llite: add support set_acl method in inode operations 42/28342/2
Dmitry Eremin [Sat, 10 Jun 2017 21:36:52 +0000 (17:36 -0400)]
LU-9183 llite: add support set_acl method in inode operations

Linux kernel v3.14 adds set_acl method to inode operations.
See kernel commit 893d46e443346370cd4ea81d9d35f72952c62a37

Lustre-commit: c8d56a664306c3ceb4e598999c41eb72ea46a68f
Lustre-change: https://review.whamcloud.com/25965

Change-Id: Ia40d55364016fafa8633fdaecd317910505f8ad4
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: https://review.whamcloud.com/28342
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-7372 mgs: reprocess all locks at device fini 23/28323/3
Jinshan Xiong [Wed, 10 May 2017 17:39:25 +0000 (10:39 -0700)]
LU-7372 mgs: reprocess all locks at device fini

This avoids a case where IR lock revocation is in progress while
the obd is being stopped; an extra ldlm_reprocess_recovery_done() is
required to make the revocation process move forward.

Turn off 'set -e' in rundbench. Otherwise killing dbench process will
return an error to wait(1) in rundbench. Since test-framework has
turned on error on exit, it will set test result as failure, which
is actually a false alarm.

Test-Parameters: envdefinitions=SLOW=yes,ONLY=26 testlist=replay-dual,replay-dual,replay-dual,replay-dual

Lustre-change: https://review.whamcloud.com/17853
Lustre-commit: 2dc19f20ba9fcc1bcac6ae7ee5169ce10caab882

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: I43ab1df9c8fe5aea15da6c90175fd08a0b099ea2
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28323
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-9728 osd: use GFP_HIGHUSER for non-local IO 18/28318/3
Andreas Dilger [Fri, 30 Jun 2017 20:37:07 +0000 (14:37 -0600)]
LU-9728 osd: use GFP_HIGHUSER for non-local IO

When the obdfilter code was split into separate OFD and OSD modules,
the bulk IO page allocation was implemented to use GFP_NOFS to avoid
allocations recursing into the filesystem and causing deadlocks.

However, this is only possible if the RPC is coming from a local
client, as we might end up waiting on a page sent in the request we're
serving. Local RPCs use __GFP_HIGHMEM so that the pages can use all of
the available memory on the OSS on 32-bit machines.

It is possible to use more aggressive GFP_HIGHUSER flags for non-local
clients to be able to generate more memory pressure on the OSS and
allow inactive pages to be reclaimed, since the OSS doesn't have any
other processes or allocations that generate memory reclaim pressure.

See also b=17576 (bdf50dc9) and b=19529 (3dcf18d3) for details.

The patch also implements an LNet function to determine if a client NID
is local or not.  This becomes more complex in the LNet Multi-Rail world
and it is really LNet's job to handle NIDs, not that of Lustre.

Lustre-change: https://review.whamcloud.com/27908
Lustre-commit: b0ab95d6133e783acacc6329c025d17fb282775e

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I2806c9c5c2fe269669eafdafaf2310924c3ebbe5
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28318
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-9683 ptlrpc: fix argument misorder 90/28290/2
Alex Zhuravlev [Wed, 19 Jul 2017 04:59:13 +0000 (00:59 -0400)]
LU-9683 ptlrpc: fix argument misorder

involved in timediffs calculation.

Lustre-commit: 61c48e79fdfb825ea1ab2649cdadaccfb863155c
Lustre-change: https://review.whamcloud.com/28027

Change-Id: Ib4a45dddb3866824b696aaeaa190f2ab9b1c71ac
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: https://review.whamcloud.com/28290
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-9773 kernel: kernel update [SLES12 SP2 4.4.74-92.29]
Bob Glossman [Thu, 13 Jul 2017 19:32:30 +0000 (12:32 -0700)]
LU-9773 kernel: kernel update [SLES12 SP2 4.4.74-92.29]

Update target and kernel_config files for new version

Test-Parameters: clientdistro=sles12sp2 testgroup=review-ldiskfs \
  mdsdistro=sles12sp2 ossdistro=sles12sp2 \
  mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs
Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I42107acb5a12f5200d3cb58121d10ffdc1dbc6d2
Reviewed-on: https://review.whamcloud.com/28043
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9763 kernel: kernel update RHEL6.9 [2.6.32-696.6.3.el6]
Bob Glossman [Tue, 11 Jul 2017 18:47:57 +0000 (11:47 -0700)]
LU-9763 kernel: kernel update RHEL6.9 [2.6.32-696.6.3.el6]

Update RHEL6.9 kernel to 2.6.32-696.6.3.el6
kernel patch added as a workaround for LU-9698

Test-Parameters: clientdistro=el6.9 mdsdistro=el6.9 \
  ossdistro=el6.9 mdtfilesystemtype=ldiskfs \
  ostfilesystemtype=ldiskfs testgroup=review-ldiskfs
Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I4c3d2faa35d70b5aa981e8dc9bc630275c1c61f1
Reviewed-on: https://review.whamcloud.com/28029
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9712 kernel: kernel update [SLES11 SP4 3.0.101-107]
Bob Glossman [Mon, 26 Jun 2017 17:05:26 +0000 (10:05 -0700)]
LU-9712 kernel: kernel update [SLES11 SP4 3.0.101-107]

Update SLES11 SP4 kernel to 3.0.101-107

Test-Parameters: mdsdistro=sles11sp4 ossdistro=sles11sp4 \
  clientdistro=sles11sp4 mdtfilesystemtype=ldiskfs \
  ostfilesystemtype=ldiskfs testgroup=review-ldiskfs
Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: Ic62b288becc936d54ee88539ab1635ca3a78cf7f
Reviewed-on: https://review.whamcloud.com/27841
Tested-by: Jenkins
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9738 kernel: kernel update RHEL7.3 [3.10.0-514.26.2.el7]
Bob Glossman [Wed, 5 Jul 2017 15:08:06 +0000 (08:08 -0700)]
LU-9738 kernel: kernel update RHEL7.3 [3.10.0-514.26.2.el7]

update RHEL 7.3 kernel to 3.10.0-514.26.2.el7

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I8eff2a9d0c732d97dac1df0c9233d88dbe564a4f
Reviewed-on: https://review.whamcloud.com/27934
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Tested-by: Jenkins
Reviewed-by: Minh Diep <minh.diep@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9345 tests: use hsm_remove with --mntpath for deleted files 20/28320/2
Quentin Bouget [Tue, 9 May 2017 12:05:36 +0000 (14:05 +0200)]
LU-9345 tests: use hsm_remove with --mntpath for deleted files

In test_29d of sanity-hsm, to run "lfs hsm_remove" on a file deleted
from Lustre, one has to use the --mntpath option.

Lustre-change: https://review.whamcloud.com/27006
Lustre-commit: cff9f1e7c6a41bfa05d1455b8964860803d12612

Test-Parameters: trivial testlist=sanity-hsm clientcount=3 envdefinitions="ONLY=29d"
Signed-off-by: Quentin Bouget <quentin.bouget@cea.fr>
Change-Id: I35865c059e498e1a0ced0cebeac22a8491231e00
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28320
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
6 years agoLU-9710 utils: adjust barrier_stat input/output 19/28319/2
Fan Yong [Wed, 19 Jul 2017 05:58:06 +0000 (13:58 +0800)]
LU-9710 utils: adjust barrier_stat input/output

The command format will be:
lctl barrier_stat [--state|-s] [--timeout|-t] <fsname>

If no option is specified, or both "state" and "timeout"
options are specified, then the output format will be:
state: xxx
timeout: nnn seconds

Otherwise, only the value ('xxx' or 'nnn') corresponding
to the given option will be printed.
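
The output rules above can be sketched as a small Python function.
This is a model of the described behaviour only, not the lctl
implementation: format_barrier_stat() and its parameters are
hypothetical, and the state and timeout values used with it are
placeholders.

```python
def format_barrier_stat(state, timeout, want_state=False, want_timeout=False):
    """Format barrier status following the rules in the commit message."""
    if want_state == want_timeout:
        # Neither option or both options: print both labelled lines.
        return f"state: {state}\ntimeout: {timeout} seconds"
    if want_state:
        return str(state)  # bare value only
    return str(timeout)    # bare value only
```

Printing a bare value when a single option is given makes the output
easy to consume from scripts without parsing the labels.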

Lustre-change: https://review.whamcloud.com/27810
Lustre-commit: 3f198baf4fe343b679ce14ee11069126f4be3e72

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: If39f95ef984be3ab709b1366fdefe8eedb4b2453
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28319
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-9778 llite: Read ahead should return pages read 16/28316/2
Patrick Farrell [Fri, 14 Jul 2017 14:07:52 +0000 (09:07 -0500)]
LU-9778 llite: Read ahead should return pages read

ll_read_ahead_pages was modified by:
LU-7990 clio: revise readahead to support 16MB IO
d8467ab8a2ca15fbbd5be3429c9cf9ceb0fa78b8

And returning the count of pages read was removed.

This only affects debug, but it's very nice to have it
printed out, and several messages still try to print out
pages read ahead, but print 0.

Restore this functionality.

Lustre-change: https://review.whamcloud.com/28052
Lustre-commit: 805118241598df27c7617eff4cb9d8229bc8d2ba

Signed-off-by: Patrick Farrell <paf@cray.com>
Change-Id: I80fe66b5195629e0c46d5d19c76e3bcc0030a22a
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28316
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-9500 lnd: Don't Page Align remote_addr with FastReg 37/28237/2
Doug Oucharek [Tue, 16 May 2017 23:00:53 +0000 (16:00 -0700)]
LU-9500 lnd: Don't Page Align remote_addr with FastReg

Trying to page align the remote_addr for IB_RDMA_WRITE work
requests is triggering "dump_cqe" errors from MOFED 4.x + mlx5.

This patch removes the address masking we were doing with FastReg
which was trying to page align remote_addr values. I am also
removing the setting of "mr->iova" with FastReg as this is being
done in the call to ib_map_mr_sg() and could cause problems.

Lustre-change: https://review.whamcloud.com/27149
Lustre-commit: 6c6341804133ea0a4d4535c621f28f61fe6c29ab

Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Change-Id: If35baa467d8d60866f709b5feea7f619063c6da4
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28237
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-9729 lnet: correct locking in legacy add net 36/28236/2
Amir Shehata [Sat, 1 Jul 2017 01:06:40 +0000 (18:06 -0700)]
LU-9729 lnet: correct locking in legacy add net

Make sure to unlock the api mutex properly
in lnet_dyn_add_net()

Lustre-change: https://review.whamcloud.com/27907
Lustre-commit: 65326ab2f3e7493f72767cd3c69471f3985c77f6

Test-Parameters: trivial
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I786545de690ea5966771be3e84d3561b794d55ec
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sonia Sharma <sonia.sharma@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28236
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-9716 osc: osc_extent_tree_dump0() implementation is suboptimal 35/28235/2
Andrew Perepechko [Wed, 28 Jun 2017 09:24:26 +0000 (12:24 +0300)]
LU-9716 osc: osc_extent_tree_dump0() implementation is suboptimal

Avoid looping in osc_extent_tree_dump() if debugging is disabled.
This helps us save some cpu ticks.
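
The idea can be sketched in a few lines of Python. This is a model
only, not the osc code: dump_extent_tree(), the debug_enabled flag and
the log callback are hypothetical names.

```python
def dump_extent_tree(extents, debug_enabled, log):
    """Log every extent, but only walk the tree when debugging is on."""
    if not debug_enabled:
        return 0  # skip the loop entirely, nothing would be logged anyway
    count = 0
    for ext in extents:
        log(f"extent {ext}")
        count += 1
    return count
```

Without the early return, the loop would still visit every extent and
only discard the messages at logging time.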

Lustre-change: https://review.whamcloud.com/27866
Lustre-commit: 97115ccd159e4503ca16cb7f68ee7479c780f1cf

Change-Id: I492429d8a6de79f67b5923895ffa58b7fe3a100d
Seagate-bug-id: MRP-4469
Signed-off-by: Andrew Perepechko <andrew.perepechko@seagate.com>
Reviewed-by: Alexander Boyko <alexander.boyko@seagate.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@seagate.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28235
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-9219 tests: add missing mgs reformat to conf-sanity/56 33/28233/2
Jadhav Vikram [Wed, 15 Mar 2017 17:50:56 +0000 (23:20 +0530)]
LU-9219 tests: add missing mgs reformat to conf-sanity/56

conf-sanity test_56 timed out while mounting the client. The
cause was that mounting the MDS failed with -EADDRINUSE: when the
MDT registered with the MGS during the MDS mount, the server index
was already present in the MGS configuration database. The MDS
mount therefore failed, the subsequent client mount got stuck,
and the test timed out.

This change adds the missing MGS reformat, when the MGS and MDS are
separate, before starting the MDT and OSTs. This makes sure the
MDT index is not already present in the MGS configuration database.

Lustre-change: https://review.whamcloud.com/26029
Lustre-commit: 75eb91aeabcd167fe586e5e0f707cee5e8966133

Seagate-bug-id: MRP-3806
Signed-off-by: Jadhav Vikram <jadhav.vikram@seagate.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@seagate.com>
Reviewed-by: Alexandr Boyko <alexander.boyko@seagate.com>
Change-Id: Ie7d897534197af7e01d92d29613123a0290ffc4c
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Alexander Boyko <alexander.boyko@seagate.com>
Reviewed-by: Emoly Liu <emoly.liu@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28233
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-9750 nrs: some code cleanup in NRS policies 31/28231/2
Emoly Liu [Wed, 8 Feb 2017 06:21:09 +0000 (14:21 +0800)]
LU-9750 nrs: some code cleanup in NRS policies

This patch does some code cleanup in NRS CRR and ORR policies,
including:
- remove the useless NULL checks
- handle errors properly by multiple labels instead of a single one
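
The multiple-label style mentioned above could look like the sketch
below. It is a generic userspace illustration: struct policy_data,
policy_start() and the fail_at parameter are made up for the example
and are not the NRS code.

```c
#include <errno.h>
#include <stdlib.h>

struct policy_data {
	int *queue;
	int *hash;
};

/* Hypothetical setup routine. fail_at forces a failure at step 1 or 2
 * so that the unwinding paths can be exercised. */
static int policy_start(struct policy_data *pd, int fail_at)
{
	int rc;

	pd->queue = (fail_at == 1) ? NULL : malloc(16 * sizeof(*pd->queue));
	if (pd->queue == NULL) {
		rc = -ENOMEM;
		goto out;
	}

	pd->hash = (fail_at == 2) ? NULL : malloc(16 * sizeof(*pd->hash));
	if (pd->hash == NULL) {
		rc = -ENOMEM;
		goto out_queue;
	}

	return 0;

out_queue:
	/* Only the queue was allocated, so only the queue is released. */
	free(pd->queue);
	pd->queue = NULL;
out:
	return rc;
}
```

Each label undoes exactly the steps that succeeded before the failure,
so the cleanup path needs no NULL checks.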

Lustre-change: https://review.whamcloud.com/25319
Lustre-commit: 647d9d34be28e53d72606030461d8516979e8590

Signed-off-by: Emoly Liu <emoly.liu@intel.com>
Change-Id: Iafe86ac94042547e83c69e4b46ff7bf1ca31f073
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28231
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
6 years agoLU-9753 ofd: 64-bits diff variable to avoid overflow 30/28230/2
Fan Yong [Fri, 7 Jul 2017 15:24:14 +0000 (23:24 +0800)]
LU-9753 ofd: 64-bits diff variable to avoid overflow

In ofd_create_hdl(), the logic will compare the OST stored LAST_ID
with the MDT given one: if the difference exceeds some threshold,
then it will trust the OST LAST_ID directly and reset the MDT side
value with the OST one. Otherwise, the orphan OST-objects will be
destroyed.

Unfortunately, both the OST stored LAST_ID and MDT given one are
64 bits, but the @diff variable is only 32 bits, and if the OST
side value is much larger than the MDT side, then @diff will
overflow. That will misguide the OST into destroying useful
OST-objects by mistake. This patch changes @diff to a 64-bit variable.
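
A small userspace sketch of the truncation, assuming a made-up
threshold (ORPHAN_THRESHOLD) and helper names that are not taken from
ofd_create_hdl():

```c
#include <stdint.h>

/* Hypothetical threshold above which the OST value is trusted. */
#define ORPHAN_THRESHOLD 100000

/* Old behaviour: the 64-bit difference is squeezed into 32 bits.
 * The narrowing conversion is implementation-defined in C; on common
 * compilers (gcc, clang) it keeps only the low 32 bits. */
static int trust_ost_32(uint64_t ost_last_id, uint64_t mdt_id)
{
	int32_t diff = (int32_t)(ost_last_id - mdt_id);

	return diff > ORPHAN_THRESHOLD;
}

/* Fixed behaviour: the difference keeps all 64 bits. */
static int trust_ost_64(uint64_t ost_last_id, uint64_t mdt_id)
{
	int64_t diff = (int64_t)(ost_last_id - mdt_id);

	return diff > ORPHAN_THRESHOLD;
}
```

For ost_last_id = 0x100000005 and mdt_id = 10 the true difference is
4294967291, far above the threshold, but its low 32 bits read as -5,
so the 32-bit variant would wrongly take the orphan-destroy branch.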

Lustre-change: https://review.whamcloud.com/27975
Lustre-commit: 03bbd4c27471751ada57282fad15e074ae01e9d7

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: If75899cbab5754be4ede226e0463ba5f69d70e3d
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28230
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-9740 ldiskfs: more credits for non-append write 29/28229/2
Fan Yong [Thu, 6 Jul 2017 00:49:32 +0000 (08:49 +0800)]
LU-9740 ldiskfs: more credits for non-append write

As the code comment explains: for a non-append write, the split
may need to modify existing blocks, moving entries into the
new ones. That needs more journal credits. The old logic
handled it as the append case because of a typo.

Lustre-change: https://review.whamcloud.com/27947
Lustre-commit: c668a8d405a9d8819bf9b96e0c610ccc5353d77d

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I277f144ec056bb2f07ffd5e5ce19d9a6eee8e0ef
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28229
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-9442 osp: can't create IDIF fid number > 0xFFFFFFFF 27/28227/2
Sergey Cheremencev [Thu, 15 Jun 2017 05:14:07 +0000 (08:14 +0300)]
LU-9442 osp: can't create IDIF fid number > 0xFFFFFFFF

fid_is_last_id didn't recognize IDIFs. This caused the OST
to allocate a new sequence even though the MDT still used the
initial FID_SEQ_IDIF. Finally allocation failed with -115
returned from osd_check_lma:
osd_check_lma()) lustre-OST0000-osd: FID [0x100000001:0x0:0x0] != self_fid [0x100000000:0x0:0x0]

Patch has several typical "IDIF" fixes. Also it has lov_objid
fix to store all 48 IDIF bits instead of 32.
Finally it changes union fields order in ost_id.
Before the fix oi_fid.f_seq addressed oi_id instead of oi_seq.

Lustre-change: https://review.whamcloud.com/27225
Lustre-commit: 2dde01f1edac9e330f853c3ffee64eb43d82b7c1

Change-Id: Ifbda97a5b228254aedcb050c3d94d2ecb3a9590c
Seagate-bug-id: MRP-4392
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@seagate.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28227
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-9505 llapi: treat MDT index as a hex number 25/28225/2
Emoly Liu [Wed, 17 May 2017 09:04:44 +0000 (17:04 +0800)]
LU-9505 llapi: treat MDT index as a hex number

Since MDT index is a hex number, "base" in strtol() should be 16.
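
A minimal sketch of the difference the base makes. The helper name
parse_mdt_index() and the sample strings are illustrative and are not
the llapi code.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a hexadecimal index string such as "000a" (the numeric tail of
 * a target name). Returns 0 on success, -EINVAL on malformed input. */
static int parse_mdt_index(const char *idx_str, long *index)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(idx_str, &end, 16);	/* base 16, not 10 */
	if (errno != 0 || end == idx_str || *end != '\0' || val < 0)
		return -EINVAL;

	*index = val;
	return 0;
}
```

With base 10, strtol("000a", NULL, 10) stops at the 'a' and yields 0,
and "0010" is read as 10 instead of 16, so any index of 10 or above
would be misparsed.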

Lustre-change: https://review.whamcloud.com/27156
Lustre-commit: 26b710a536dc58c9fa0320cdbf5f6b7ce4dc1a68

Signed-off-by: Emoly Liu <emoly.liu@intel.com>
Change-Id: I50922f9eb5d1095f06a493628ef521d34969a59f
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28225
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-8885 tests: zconf_mount_clients() defect 22/28222/2
Arshad Hussain [Fri, 23 Sep 2016 03:12:09 +0000 (23:12 -0400)]
LU-8885 tests: zconf_mount_clients() defect

Presently zconf_mount_clients() returns success(0)
in cases where the 'NFS' mount is already mounted
on a mount point which it is trying to mount again.
In this case, it silently ignores the mount, leading
to testcase failure. This patch addresses this defect
by having zconf_mount_clients() verify that nothing
unexpected is mounted, comparing the total mount count
with the mount count of "type lustre". If they are
unequal the function exits with an error.

Lustre-change: https://review.whamcloud.com/24054
Lustre-commit: 10ec9eba801dc80f1ccb9f8fbcbd4b0258940623

Test-Parameters: envdefinitions=PARALLEL_SCALE_EXCEPT=parallel_grouplock \
testlist=sanity,parallel-scale,parallel-scale-nfsv3
Signed-off-by: Arshad Hussain <arshad.hussain@seagate.com>
Change-Id: I55e4b2ef2a18985be4833fca017cc6c6b0c5410f
Seagate-bug-id: MRP-3773
Reviewed-by: Ashish Purkar <ashish.purkar@seagate.com>
Reviewed-by: Ujjwal Lanjewar <ujjwal.lanjewar@seagate.com>
Reviewed-by: Elena V. Gryaznova <elena.gryaznova@seagate.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@seagate.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28222
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-9671 nodemap: restore client's IDs for OST_WRITE 21/28221/2
Niu Yawei [Fri, 16 Jun 2017 04:52:50 +0000 (00:52 -0400)]
LU-9671 nodemap: restore client's IDs for OST_WRITE

Client sets overquota flags for certain UID/GID based on the
IDs & flags in OST_WRITE reply, so we need to reply with the client
IDs instead of the mapped IDs.

Lustre-change: https://review.whamcloud.com/27680
Lustre-commit: e207f9f96fc51f3b6d219193cca3d83aaa99b3e8

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: I375847fa734237f9bcea10fa676e09c471a0fcfb
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Stephan Thiell <sthiell@stanford.edu>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28221
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-7988 hsm: mark the cdt as stopped when its thread exits 19/28219/2
Frank Zago [Tue, 27 Sep 2016 19:13:29 +0000 (15:13 -0400)]
LU-7988 hsm: mark the cdt as stopped when its thread exits

Use kthread_stop() to stop and join the coordinator thread. Only after
that step can the coordinator state be set to CDT_STOPPED. As a
coordinator doesn't stop instantly, this closes a race if the
coordinator is being restarted at the same time, leaving one thread
shutting down and a new one starting up.

Lustre-change: https://review.whamcloud.com/22762
Lustre-commit: f11a5022fc129fec797adb155e5553331f224ecc

Test-Parameters: trivial testlist=sanity-hsm
Signed-off-by: frank zago <fzago@cray.com>
Change-Id: I0a21d0d22403a56a8965441e1b57118073b6f210
Signed-off-by: Ben Evans <bevans@cray.com>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28219
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-9514 ptlrpc: free reply buffer earlier for open RPC 17/28217/2
Fan Yong [Wed, 14 Jun 2017 09:03:46 +0000 (17:03 +0800)]
LU-9514 ptlrpc: free reply buffer earlier for open RPC

It is unnecessary to keep the reply buffer for open RPC. Replay
related data has already been saved in the request buffer when
the RPC replied. If the open replay really happens, the replay
logic will alloc the reply buffer when needed.

On the other hand, the client always tries to alloc big enough
space to hold the open RPC reply since the client does not exactly
know how much data the server will reply to the client. So the reply
buffer may be much larger than what is really needed. In that case,
keeping the large reply buffer for the open RPC will occupy a lot
of RAM and may lead to OOM if there are too many open RPCs to be replayed.

This patch frees the reply buffer for the open RPC when only
the replay logic holds the last reference of the RPC.

Lustre-change: https://review.whamcloud.com/27208
Lustre-commit: c8e3992acf3039b2824725d41f90d9a3be3be921

Test-Parameters: envdefinitions=ONLY=51f testlist=sanity ostfilesystemtype=ldiskfs mdtfilesystemtype=ldiskfs
Test-Parameters: envdefinitions=ONLY=51f testlist=sanity ostfilesystemtype=zfs mdtfilesystemtype=zfs
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I1bea2456b8aa4e53a0b65143a48e617f836181a0
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28217
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-8062 ptlrpc: increase sleep time in ptlrpc_request_bufs_pack() 81/28181/2
Vitaly Fertman [Wed, 21 Jun 2017 02:52:19 +0000 (22:52 -0400)]
LU-8062 ptlrpc: increase sleep time in ptlrpc_request_bufs_pack()

schedule_timeout() does not necessarily expire. Increase the sleeping
time in ptlrpc_request_bufs_pack() as 2 seconds is too short, given
the 1 second sleep used for recovery-small test_115_write().

Test-Parameters: envdefinitions=ONLY=115 testlist=recovery-small,recovery-small,recovery-small,recovery-small
Test-Parameters: envdefinitions=ONLY=115 testlist=recovery-small,recovery-small,recovery-small,recovery-small
Test-Parameters: envdefinitions=ONLY=115 testlist=recovery-small,recovery-small,recovery-small,recovery-small
Test-Parameters: envdefinitions=ONLY=115 testlist=recovery-small,recovery-small,recovery-small,recovery-small

Lustre-commit: e9e744ea7352ea0d1a5d9b2bd05e0e7c19f08596
Lustre-change: https://review.whamcloud.com/26815

Change-Id: Ia1b1f096b500f67d26e5fa7c7fbd42642992f327
Signed-off-by: Vitaly Fertman <vitaly.fertman@seagate.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-on: https://review.whamcloud.com/28181
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-9545 lfsck: report "inconsistent" under dryrun mode 12/28112/2
Fan Yong [Tue, 13 Jun 2017 10:52:43 +0000 (18:52 +0800)]
LU-9545 lfsck: report "inconsistent" under dryrun mode

It is confusing to report items as "fixed" under dryrun
mode LFSCK. Instead, report them as "inconsistent".

Lustre-commit: caac78f3f30f6f556671c5049ccd0f98764f572e
Lustre-change: https://review.whamcloud.com/27606

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I22e056d7143a55e0dc06d9a891f4126522b466c9
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-on: https://review.whamcloud.com/28112
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-8703 libcfs: change CPT estimate algorithm 11/28111/2
Dmitry Eremin [Tue, 13 Jun 2017 00:50:03 +0000 (20:50 -0400)]
LU-8703 libcfs: change CPT estimate algorithm

The main idea to have more CPU partitions is based on KNL experience.
When a thread submits IO for network communication, one of the threads
from the current CPT is used for the network stack. With high
parallelization many threads become involved in network submission, but
with fewer CPU partitions they will wait until a single thread processes
them from the network queue. So, the bottleneck just moves into the
network layer in case of a small number of CPU partitions. My
experiments showed that the best performance was when for each IO
thread we have one network thread. This condition can be provided by
having 2 real HW cores (without hyper threads) per CPT. This is exactly
what is implemented in this patch.

Change CPT estimate algorithm from 2 * (N - 1)^2 < NCPUS <= 2 * N^2
to 2 HW cores per CPT. This is critical for machines with number of
cores different from 2^N.
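
The two estimates can be compared with the sketch below. It rests on
assumptions: the old estimate is taken to restrict N to powers of two
(which matches the 16 partitions reported for 272 CPUs), and special
cases for very small machines and NUMA adjustments are left out. The
function names are illustrative, not the libcfs ones.

```c
/* Old-style estimate, assuming N is restricted to powers of two:
 * the smallest such N with ncpus <= 2 * N^2. */
static int cpt_estimate_old(int ncpus)
{
	int ncpt;

	for (ncpt = 2; ncpus > 2 * ncpt * ncpt; ncpt <<= 1)
		;
	return ncpt;
}

/* New-style estimate: one partition per two physical cores, where
 * threads_per_core is the number of hyper threads per core. */
static int cpt_estimate_new(int ncpus, int threads_per_core)
{
	int ncpt = ncpus / (threads_per_core * 2);

	return ncpt > 0 ? ncpt : 1;
}
```

For KNL with 272 logical CPUs and 4 hyper threads per core, the first
function gives 16 and the second gives 272 / 8 = 34, the same
npartitions values shown in the LNet output in this message.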

Current algorithm splits CPTs in KNL:
LNet: HW CPU cores: 272, npartitions: 16
cpu_partition_table=
0       : 0-4,68-71,136-139,204-207
1       : 5-9,73-76,141-144,209-212
2       : 10-14,78-81,146-149,214-217
3       : 15-17,72,77,83-85,140,145,151-153,208,219-221
4       : 18-21,82,86-88,150,154-156,213,218,222-224
5       : 22-26,90-93,158-161,226-229
6       : 27-31,95-98,163-166,231-234
7       : 32-35,89,100-103,168-171,236-239
8       : 36-38,94,99,104-105,157,162,167,172-173,225,230,235,240-241
9       : 39-43,107-110,175-178,243-246
10      : 44-48,112-115,180-183,248-251
11      : 49-51,106,111,117-119,174,179,185-187,242,253-255
12      : 52-55,116,120-122,184,188-190,247,252,256-258
13      : 56-60,124-127,192-195,260-263
14      : 61-65,129-132,197-200,265-268
15      : 66-67,123,128,133-135,191,196,201-203,259,264,269-271

New algorithm will split CPTs in KNL:
LNet: HW CPU cores: 272, npartitions: 34
cpu_partition_table=
0       : 0-1,68-69,136-137,204-205
1       : 2-3,70-71,138-139,206-207
2       : 4-5,72-73,140-141,208-209
3       : 6-7,74-75,142-143,210-211
4       : 8-9,76-77,144-145,212-213
5       : 10-11,78-79,146-147,214-215
6       : 12-13,80-81,148-149,216-217
7       : 14-15,82-83,150-151,218-219
8       : 16-17,84-85,152-153,220-221
9       : 18-19,86-87,154-155,222-223
10      : 20-21,88-89,156-157,224-225
11      : 22-23,90-91,158-159,226-227
12      : 24-25,92-93,160-161,228-229
13      : 26-27,94-95,162-163,230-231
14      : 28-29,96-97,164-165,232-233
15      : 30-31,98-99,166-167,234-235
16      : 32-33,100-101,168-169,236-237
17      : 34-35,102-103,170-171,238-239
18      : 36-37,104-105,172-173,240-241
19      : 38-39,106-107,174-175,242-243
20      : 40-41,108-109,176-177,244-245
21      : 42-43,110-111,178-179,246-247
22      : 44-45,112-113,180-181,248-249
23      : 46-47,114-115,182-183,250-251
24      : 48-49,116-117,184-185,252-253
25      : 50-51,118-119,186-187,254-255
26      : 52-53,120-121,188-189,256-257
27      : 54-55,122-123,190-191,258-259
28      : 56-57,124-125,192-193,260-261
29      : 58-59,126-127,194-195,262-263
30      : 60-61,128-129,196-197,264-265
31      : 62-63,130-131,198-199,266-267
32      : 64-65,132-133,200-201,268-269
33      : 66-67,134-135,202-203,270-271

The 'N' pattern on KNL does not always work well:
in flat mode it will be one CPT with all CPUs inside.

in SNC-4 mode:
cpu_partition_table=
0       : 0-17,68-85,136-153,204-221
1       : 18-35,86-103,154-171,222-239
2       : 36-51,104-119,172-187,240-255
3       : 52-67,120-135,188-203,256-271

Lustre-commit: 02dea319b2ef21868b3fa3fad7b3f5cab7eb244e
Lustre-change: https://review.whamcloud.com/24304

Change-Id: I07d1fa2e490dd5720497d438027f128df5f01773
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-on: https://review.whamcloud.com/28111
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoLU-9183 llite: handle xattr with the xattr_handler infrastructure 10/28110/6
James Simmons [Sat, 29 Jul 2017 15:17:00 +0000 (11:17 -0400)]
LU-9183 llite: handle xattr with the xattr_handler infrastructure

In commit fd50ecaddf8372a1d96e0daeaac0f93cf04e4d42 for the linux
4.9 kernel the {get,set,remove}xattr inode operations were removed
and all xattr operations are now handled by xattr_handlers. For
the upstream lustre client a port was already done with:

Linux-commit: 1e1f9ff406fd5f6003a5dab2ab5a26c4c5bb8cbd
Linux-commit: 2c563880ea8fdc900693ae372fa07b3894f8ff63

This patch brings this work to the OpenSFS/Intel branch. The
difference is that we also have to support older kernels which
means we need to handle the following changes to the
struct xattr_handler:

Linux-commit: e409de992e3ea3674393465f07cc71c948edd87a
Linux-commit: b296821a7c42fa58baa17513b2b7b30ae66f3336

Lastly the xattr_handler api for RHEL6 is too old for proper
support, as it lacks the flags in struct xattr_handler. Since this
is the case we have to carry around the pre xattr handler
code. Once RHEL6 support is dropped we can remove that code.

Test-Parameters: testgroup=review-ldiskfs
Test-Parameters: testgroup=review-zfs-part-1
Test-Parameters: testgroup=review-zfs-part-2
Test-Parameters: testgroup=review-dne-part-1
Test-Parameters: testgroup=review-dne-part-2
Test-Parameters: trivial clientselinux testlist=sanity-selinux

Lustre-commit: 7c332c2757fb125ffb1c902d5302f7f7dc3c0433
Lustre-change: https://review.whamcloud.com/27240

Change-Id: I7bdeb57c09f4a252f61737dbdbfa76939df7b5eb
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: https://review.whamcloud.com/28110
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
6 years agoUpdate the version file for out of tree builds to say 2.10.0
Oleg Drokin [Mon, 24 Jul 2017 15:52:02 +0000 (11:52 -0400)]
Update the version file for out of tree builds to say 2.10.0

Change-Id: I054088778c8a428612c47f661e28891aeb84a72a
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9776 lustre-client cannot be installed if both client and server repos are configured 47/28047/2
Brian J. Murrell [Fri, 14 Jul 2017 12:53:27 +0000 (08:53 -0400)]
LU-9776 lustre-client cannot be installed if both client and server repos are configured

Due to current Obsoletes: tags on lustre-client, the client cannot be
installed on nodes where the client and server repos are configured.

Update the Obsoletes: to only obsolete previous lustre-client versions,
not the current one.

Signed-off-by: Brian J. Murrell <brian.murrell@intel.com>
Change-Id: Ie2f022967fd6f65030feeb23ea9637dce505054a
Reviewed-on: https://review.whamcloud.com/28065
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
6 years agoLU-9731 kmods need to be limited to EL minor release kernel
Brian J. Murrell [Mon, 17 Jul 2017 16:19:04 +0000 (12:19 -0400)]
LU-9731 kmods need to be limited to EL minor release kernel

Due to upstream RHBZ#1467319 kmods are not being populated with the full
kabi information needed to find a matching kernel for the kmod.

Until this is fixed, we need to apply a workaround to achieve the same
result.

Test-Parameters: trivial

Signed-off-by: Brian J. Murrell <brian.murrell@intel.com>
Change-Id: Ib2eab09719c75be8928eaf607efaa2d814baf5f2
Reviewed-on: https://review.whamcloud.com/28066
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9775 Look for kernel-devel in /usr/src/kernels 46/28046/3
Brian J. Murrell [Fri, 14 Jul 2017 12:40:55 +0000 (08:40 -0400)]
LU-9775 Look for kernel-devel in /usr/src/kernels

If one is building in a build [ch]root such as mock provides, one may
not have the kernel installed which corresponds to $(uname -r).  In such
a case, also try to look for the kernel-devel in /usr/src/kernels/ and
just build for the latest one.  Ideally there is only one installed in any case.

Test-Parameters: trivial

Signed-off-by: Brian J. Murrell <brian.murrell@intel.com>
Change-Id: I7704c6ce7078a507fd6b5f9178b07f750dc03789
Reviewed-on: https://review.whamcloud.com/28064
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9725 lwp: wait on deregister 61/28161/2
Niu Yawei [Thu, 20 Jul 2017 23:46:25 +0000 (16:46 -0700)]
LU-9725 lwp: wait on deregister

When lustre_deregister_lwp_item() is being called, it should wait
for the inflight callback to finish before moving on to free the data
used by the callback.

This patch is back-ported from the following one:
Lustre-commit: 5d5702a3ec24cd1bc7effbadb13d272fa51dff05
Lustre-change: https://review.whamcloud.com/27987

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: I9c27a0ae4c765147fd183b78bf3693a66e7511dc
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-on: https://review.whamcloud.com/28161
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoNew release 2.10.0 2.10.0 v2_10_0 v2_10_0_0
Oleg Drokin [Thu, 13 Jul 2017 19:40:16 +0000 (15:40 -0400)]
New release 2.10.0

Change-Id: I8ea141dda61ec87ebcadbf79a3089700b32d6283
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoNew tag 2.10.0-RC3 2.10.0-RC3 v2_10_0_0_RC3 v2_10_0_RC3
Oleg Drokin [Wed, 12 Jul 2017 05:19:00 +0000 (01:19 -0400)]
New tag 2.10.0-RC3

Change-Id: I92f56eab0f07e2a2ac158e83db88c9be31444a90
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9274 ptlrpc: add replay request into unreplied list
Niu Yawei [Thu, 22 Jun 2017 07:03:38 +0000 (03:03 -0400)]
LU-9274 ptlrpc: add replay request into unreplied list

ptlrpc_prepare_replay() may fail to add replay request into unreplied
list if the request hasn't been on replay list yet, so in
ptlrpc_replay_next() before sending replay, we'd always make sure the
replay request is on unreplied list.

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: I91757cd4fde1d85d146475e078db125acc2c821f
Reviewed-on: https://review.whamcloud.com/27920
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoNew tag 2.10.0-RC2 2.10.0-RC2 v2_10_0_0_RC2 v2_10_0_RC2
Oleg Drokin [Mon, 10 Jul 2017 21:14:22 +0000 (17:14 -0400)]
New tag 2.10.0-RC2

Change-Id: Ia923cb4f9fb0c29e618b18f84a0eff7078f19102
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9305 osd: do not release pages twice
Alex Zhuravlev [Sun, 9 Jul 2017 16:56:53 +0000 (12:56 -0400)]
LU-9305 osd: do not release pages twice

In case of a blocksize mismatch dmu_assign_arcbuf() releases the passed
abuf internally, including the pages. osd_bufs_put() can't detect
this and may call __free_page() on inappropriate pages (which can
be allocated to someone else already).

Change-Id: I454e56ee3de3d201a14e6ba7b4beabaad42d82ae
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: https://review.whamcloud.com/27950
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9693 tests: Adding sanity test_42a & 42c to always_except
Saurabh Tandan [Tue, 20 Jun 2017 18:57:35 +0000 (11:57 -0700)]
LU-9693 tests: Adding sanity test_42a & 42c to always_except

Sanity test_42a and test_42c were removed from Always_Except
list earlier. But it seems the issues were not completely
resolved.

Hence, adding them back to Always_Except list.

Test-Parameters: trivial
Signed-off-by: Saurabh Tandan <saurabh.tandan@intel.com>
Change-Id: Ib82e0d788054f3ede58c2dbdf5af21227fb4e7f3
Reviewed-on: https://review.whamcloud.com/27740
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Wei Liu <wei3.liu@intel.com>
6 years agoLU-9715 libcfs: crash in cpu_pattern parsing code
Andreas Dilger [Wed, 28 Jun 2017 17:12:28 +0000 (11:12 -0600)]
LU-9715 libcfs: crash in cpu_pattern parsing code

The for loop in cfs_cpt_table_create_pattern() that scans
for brackets to count the number of cpts is broken. It will
increment bracket beyond NULL and it will increment ncpt
beyond the number of available cpts. This has been fixed.
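
A bounded version of such a scan might look like the sketch below.
The function name, the max_cpts parameter and the sample patterns are
assumptions made for illustration. This is not the actual
cfs_cpt_table_create_pattern() code.

```c
#include <string.h>

/* Count '[' characters in a pattern such as "0[0-3] 1[4-7]" without
 * stepping past the end of the string. Returns -1 if the pattern
 * describes more partitions than max_cpts. */
static int count_cpt_brackets(const char *pattern, int max_cpts)
{
	const char *bracket = pattern;
	int ncpt = 0;

	/* strchr() returns NULL when no further '[' exists, and the loop
	 * stops there instead of incrementing a NULL pointer. */
	while ((bracket = strchr(bracket, '[')) != NULL) {
		if (ncpt == max_cpts)
			return -1;
		ncpt++;
		bracket++;	/* step over the '[' that was just found */
	}
	return ncpt;
}
```

The pointer is advanced only after a successful match, and the counter
is checked against the limit before it is incremented, which covers
both failure modes named above.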

Test-Parameters: trivial
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I87bc9de4c531c42c421e8e62edd881417dbcab07
Reviewed-on: https://review.whamcloud.com/27872
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoNew tag 2.10.0-RC1 2.10.0-RC1 v2_10_0_0_RC1 v2_10_0_RC1
Oleg Drokin [Tue, 27 Jun 2017 19:22:43 +0000 (15:22 -0400)]
New tag 2.10.0-RC1

First release candidate for 2.10 release.

Change-Id: Ie65c29d0639504ccead21d633e88e7303328b9fa
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9073 gss: remove newer kernel support 23/27823/2
James Simmons [Mon, 26 Jun 2017 17:54:25 +0000 (13:54 -0400)]
LU-9073 gss: remove newer kernel support

Revert the work to support newer kernels for GSS. For now
disable GSS support for kernels newer than 4.6 so this
doesn't block people on newer distros. Even if very basic
support for GSS is restored I wouldn't recommend this for
production systems at this time.

Change-Id: I7e1636bf695e1686bbdf968d088fcfc5a8f8f062
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/27823
Tested-by: Jenkins
Reviewed-by: Chris Hanna <hannac@iu.edu>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9678 osd-zfs: register arc_prune_func() after site init 08/27708/3
Lai Siyao [Fri, 16 Jun 2017 14:34:33 +0000 (22:34 +0800)]
LU-9678 osd-zfs: register arc_prune_func() after site init

Register arc_prune_func() after site init, otherwise it may be
called while object cache is not initialized yet.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: I822252da906f03899386fb0941cc11c1c3366fbf
Reviewed-on: https://review.whamcloud.com/27708
Tested-by: Jenkins
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9049 obdclass: change object lookup to no wait mode 65/26965/9
Lai Siyao [Tue, 23 May 2017 07:56:06 +0000 (15:56 +0800)]
LU-9049 obdclass: change object lookup to no wait mode

Currently we set LU_OBJECT_HEARD_BANSHEE on an object when we want
to remove it from the cache, but this may lead to deadlock: when
another process looks up such an object, it needs to wait until the
object is released (done at the last refcount put), while that process
may already hold an LDLM lock.

Now that current code can handle dying object correctly, we can just
return such object in lookup, thus the above deadlock can be avoided.

There is another case where we need to make some changes:
objects created in OUT don't set dt_body_ops for the LOD layer because
originally it was set by lod_create(). Change to set dt_body_ops in
lod_object_init() so objects created in OUT are no different from
those created in MDT. To achieve this, functions in lod_body_ops
should check the file type internally to avoid misuse.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: Ia31ab5f09f9bf80a9aa8fd7e7b60348b02400b25
Reviewed-on: https://review.whamcloud.com/26965
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Cliff White <cliff.white@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9485 llite: check the return value of cl_file_inode_init() 58/27658/7
Bobi Jam [Thu, 15 Jun 2017 08:01:13 +0000 (16:01 +0800)]
LU-9485 llite: check the return value of cl_file_inode_init()

ll_update_inode() does not check the return value of
cl_file_inode_init(), and it should check.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: I4174e4f8166d7834a1d619aa8d0191d1f428c62c
Reviewed-on: https://review.whamcloud.com/27658
Tested-by: Jenkins
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-8703 libcfs: rework CPU pattern parsing code 06/23306/14
Dmitry Eremin [Tue, 13 Jun 2017 00:59:16 +0000 (20:59 -0400)]
LU-8703 libcfs: rework CPU pattern parsing code

Rewrite the CPU pattern parsing code to avoid modifying the passed
buffer, and add real error propagation to the caller function.

Change-Id: I8dfee2c0013fcfccd3d99c361d3ec626594689bd
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-on: https://review.whamcloud.com/23306
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years agoLU-9597 test: wait setattr finished before checking accounting 25/27425/11
Wang Shilong [Wed, 7 Jun 2017 08:10:12 +0000 (16:10 +0800)]
LU-9597 test: wait setattr finished before checking accounting

We need to wait until setattr has finished for OST objects; otherwise,
accounting for projects won't be accurate. Also add a failure check
for the setattr ioctl.

Test-Parameters: testlist=sanity-quota,sanity-quota,sanity-quota,sanity-quota clientdistro=el7 serverdistro=el7 \
        ostfilesystemtype=ldiskfs mdtfilesystemtype=ldiskfs
Change-Id: I106689c224997f79eb779fdc6843704ae7e9ffe6
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/27425
Tested-by: Jenkins
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years ago LU-9558 llite: use struct vma_area_struct address field 81/27281/8
James Simmons [Wed, 14 Jun 2017 17:54:13 +0000 (13:54 -0400)]
LU-9558 llite: use struct vma_area_struct address field

The field virtual_address of struct vma_area_struct was
removed because it offered no benefit over using the
address field directly.

Linux-commit: 1a29d85eb0f19b7d8271923d8917d7b4f5540b3e

Change-Id: I05068cdf27c93c5b3201c76ec043bc6c0e66df1f
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/27281
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years ago LU-9210 statahead: missing barrier before wake_up 30/27330/2
Lai Siyao [Tue, 7 Mar 2017 05:56:04 +0000 (13:56 +0800)]
LU-9210 statahead: missing barrier before wake_up

A barrier is missing before wake_up() in ll_statahead_interpret(),
which may cause 'ls' to hang.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I391d6222e353fb27761ffd5412b52ce08f7a0592
Reviewed-on: https://review.whamcloud.com/27330
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years ago LU-9074 llite: Stop file creation for ro bind mnt 04/25204/12
Noopur Maheshwari [Thu, 2 Feb 2017 09:09:31 +0000 (14:39 +0530)]
LU-9074 llite: Stop file creation for ro bind mnt

When a bind mount of Lustre is remounted read-only, the VFS
sets MNT_READONLY in the mnt flags but does not call into
Lustre. Hence, the change in mnt flags is not reflected in
Lustre.

Therefore, file creation goes ahead in the Lustre lookup operation,
with the LOOKUP_CREATE intent set and converted to IT_CREAT.

The fix is to disallow file creation when the bind mount point is
read-only, by not setting the IT_CREAT intent and unsetting O_CREAT.

Add a test case to verify that files are not created in a read-only
bind mount point, and that files can be created once the bind mount
point is converted from ro to rw.
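
The fix described above can be sketched roughly as follows. This is a
userspace illustration under stated assumptions: struct demo_intent, the
DEMO_IT_* values and demo_prepare_intent() are hypothetical, and the
mnt_readonly argument stands in for a check of MNT_READONLY on the mount.
Only O_CREAT is the real flag from <fcntl.h>.

```c
#include <fcntl.h>	/* O_CREAT, O_WRONLY */
#include <stdbool.h>

/* Hypothetical intent opcodes; the real IT_* values are not reproduced here. */
#define DEMO_IT_OPEN	0x1
#define DEMO_IT_CREAT	0x2

/* Hypothetical stand-in for a lookup intent. */
struct demo_intent {
	int it_op;	/* requested operations */
	int it_flags;	/* open flags carried with the intent */
};

/*
 * Build the intent for an open. If the mount is read-only, a create
 * request is downgraded: the create opcode is not set and O_CREAT is
 * cleared, so the lookup can only find existing files.
 */
static void demo_prepare_intent(struct demo_intent *it, int open_flags,
				bool mnt_readonly)
{
	it->it_op = DEMO_IT_OPEN;
	it->it_flags = open_flags;

	if (open_flags & O_CREAT) {
		if (mnt_readonly)
			it->it_flags &= ~O_CREAT;
		else
			it->it_op |= DEMO_IT_CREAT;
	}
}
```

The decision is made per call from the mount's current state, which is why
creation works again once the bind mount is switched back to rw.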

Signed-off-by: Noopur Maheshwari <noopur.maheshwari@seagate.com>
Change-Id: Ic60fb18f539159715049515e264afdf51a00378e
Reviewed-on: https://review.whamcloud.com/25204
Reviewed-by: Andrew Perepechko <andrew.perepechko@seagate.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
6 years ago LU-9504 ptlrpc: REP-ACK hr may race with trans commit 07/27207/9
Lai Siyao [Thu, 18 May 2017 16:27:31 +0000 (00:27 +0800)]
LU-9504 ptlrpc: REP-ACK hr may race with trans commit

REP-ACK hr may race with transaction commit, and the latter will
release saved locks, so in REP-ACK hr we need to get locks early to
convert them to COS mode safely.

However, the locks obtained may already have been decref'ed and
canceled, in which case they can't be converted to COS mode; remove
an assert in ldlm_lock_downgrade() to allow for this.

Also protect mdt_steal_ack_locks() with rs_lock because it may also
race with REP-ACK hr. And move ldlm_lock_decref() outside of locks
because it may sleep.
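
The last point, keeping a call that may sleep out of the locked region, can
be sketched as below. This is a single-threaded toy under stated
assumptions: the lock_held flag merely stands in for a real spinlock such
as rs_lock, and every demo_* name is hypothetical. It shows only the
"detach under the lock, release after unlocking" shape, not the real
ptlrpc or mdt code.

```c
#include <stddef.h>

#define DEMO_MAX_LOCKS 4

/* Hypothetical stand-in for a reply state holding saved lock handles. */
struct demo_reply_state {
	int lock_held;			/* toy stand-in for a spinlock */
	int nlocks;			/* number of saved handles */
	int locks[DEMO_MAX_LOCKS];	/* hypothetical lock handles */
};

static int demo_released;		/* handles released so far */
static int demo_released_locked;	/* releases done under the lock; should stay 0 */

/* Stand-in for a release call that may sleep, like ldlm_lock_decref(). */
static void demo_lock_decref(const struct demo_reply_state *rs, int handle)
{
	(void)handle;
	if (rs->lock_held)
		demo_released_locked++;
	demo_released++;
}

static void demo_release_saved_locks(struct demo_reply_state *rs)
{
	int saved[DEMO_MAX_LOCKS];
	int n, i;

	/* Critical section: only copy the handles out and clear the list. */
	rs->lock_held = 1;
	n = rs->nlocks;
	for (i = 0; i < n; i++)
		saved[i] = rs->locks[i];
	rs->nlocks = 0;
	rs->lock_held = 0;

	/* The possibly sleeping calls happen after the lock is dropped. */
	for (i = 0; i < n; i++)
		demo_lock_decref(rs, saved[i]);
}
```

Clearing the list inside the critical section also means a concurrent
path that takes the same lock sees either all of the saved handles or
none of them.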

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: Ia9a3ba6a83689c0552ae8aaf2eb735c3f06b62e2
Reviewed-on: https://review.whamcloud.com/27207
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>