Whamcloud - gitweb
fs/lustre-release.git
4 years agoLU-7285 update: update next transno only if recovery succeeds 99/16799/3
Di Wang [Thu, 8 Oct 2015 23:58:35 +0000 (16:58 -0700)]
LU-7285 update: update next transno only if recovery succeeds

Update obd_next_recovery_transno only if update recovery
succeeds. Otherwise, if a client sends a replay request with the
same transno, it will cause a panic in check_for_next_transno():

LustreError: 4529:0:(ldlm_lib.c:1826:check_for_next_transno())
ASSERTION( req_transno >= next_transno ) failed: req_transno:
1404455952555, next_transno: 1404455952556

LustreError: 4529:0:(ldlm_lib.c:1826:check_for_next_transno()) LBUG
Call Trace:
[<ffffffffa074c875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa074ce77>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0a8640c>] check_for_next_transno+0x68c/0x6d0 [ptlrpc]
[<ffffffffa089a6ed>] ? keys_fini+0x16d/0x240 [obdclass]
[<ffffffffa0a85d80>] ? check_for_next_transno+0x0/0x6d0 [ptlrpc]
[<ffffffffa0a82883>] target_recovery_overseer+0x93/0x320 [ptlrpc]
[<ffffffffa0a81000>] ? exp_req_replay_healthy+0x0/0x30 [ptlrpc]
[<ffffffffa0a89510>] target_recovery_thread+0x6d0/0x2380 [ptlrpc]
[<ffffffffa0a88e40>] ? target_recovery_thread+0x0/0x2380 [ptlrpc]
[<ffffffff8109e78e>] kthread+0x9e/0xc0

Add replay-single.sh 71a to verify double MDTs failover.
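
The failure mode described above can be illustrated with a minimal
Python sketch. This is not the Lustre code: RecoveryState,
replay_update() and the apply callback are hypothetical names, and
advancing to req_transno + 1 is an assumption taken from the values
in the log. Only the ordering (advance the next transno after a
successful replay) reflects the fix.

```python
class RecoveryState:
    """Toy stand-in for the target's recovery bookkeeping (hypothetical)."""

    def __init__(self, next_transno):
        self.next_transno = next_transno


def replay_update(state, req_transno, apply):
    """Replay one update; advance next_transno only if it succeeds."""
    # Same condition as the assertion reported in check_for_next_transno().
    assert req_transno >= state.next_transno, "req_transno < next_transno"
    ok = apply(req_transno)
    if ok:
        # Assumption: the next expected transno is the replayed one plus 1.
        state.next_transno = req_transno + 1
    return ok


state = RecoveryState(next_transno=1404455952555)
# The first attempt fails, so next_transno must not move.
first = replay_update(state, 1404455952555, apply=lambda t: False)
# The client resends the same transno; the assertion still holds.
second = replay_update(state, 1404455952555, apply=lambda t: True)
```

Had next_transno been advanced before the first attempt failed, the
resend would have tripped the assertion, as in the trace above.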

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: Id74768a851985a1cec53e6bce28a0bf00b3fc1c7
Reviewed-on: http://review.whamcloud.com/16799
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6271 osc: faulty assertion in osc_object_prune() 27/16727/5
Jinshan Xiong [Tue, 6 Oct 2015 00:45:36 +0000 (17:45 -0700)]
LU-6271 osc: faulty assertion in osc_object_prune()

Pages that are being freed may still exist in the object's radix
tree at the time of osc_object_prune(), which causes the assertion
(osc->oo_npages == 0) to fail. This is a safe race.

This problem was introduced by the change at:
Lustre-commit: e8b421531c166b91ab5c1f417570c544bcdd050c
Lustre-change: http://review.whamcloud.com/16456

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: I7d4e59bccfb012b870a2e8fa7ab99774def57349
Reviewed-on: http://review.whamcloud.com/16727
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7195 jobstats: Allow setting static content for jobid_var 98/16598/7
Oleg Drokin [Mon, 12 Oct 2015 15:33:34 +0000 (11:33 -0400)]
LU-7195 jobstats: Allow setting static content for jobid_var

When enabling jobstats, a ten percent performance drop was observed
when running any job. This was due to the expense of the kernel
acquiring the process environment state. Create an alternative
way of setting jobid_var besides meddling directly in process
environment variables (which is also not possible on certain
platforms due to unexported symbols): a jobid_name proc file
that represents this info (to be filled by the job scheduler
epilogue). This is based on the upstream commit

Linux-commit : 76133e66b1417a73c0950d0716219d09ee21d595

except it doesn't remove the process environment probing, to
allow backwards compatibility. This patch doesn't notify the
admins that using the old jobstat proc method has a heavy cost.

Change-Id: If81733e549222a7ab31b24673f0e9b8401541130
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
CC: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: http://review.whamcloud.com/16598
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
4 years agoLU-7261 ldiskfs: fix large_xattr overwrite 77/16777/4
Alexey Lyashkov [Fri, 9 Oct 2015 06:23:47 +0000 (00:23 -0600)]
LU-7261 ldiskfs: fix large_xattr overwrite

Handle the case where a large (external inode) xattr is being replaced
correctly.  The special case for this in ext4_set_xattr() was
incorrectly setting the offset of the xattr data within the inode
when it shouldn't have.

Add an e2fsck check of the large_xattr filesystem in conf-sanity
test_61 to verify this is working correctly.

Signed-off-by: Alexey Lyashkov <alexey.lyashkov@seagate.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I27123d7985eff0538b6f64139cebc2f0f1806260
Reviewed-on: http://review.whamcloud.com/16777
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6921 test: failed to operate on TBF rules 05/16305/3
vinayakswami hariharmath [Tue, 8 Sep 2015 06:20:25 +0000 (11:50 +0530)]
LU-6921 test: failed to operate on TBF rules

Operate TBF rules on ost1 rather than ost0.
ost0 looks to be the wrong target since OSTCOUNT starts from 1.

Signed-off-by: vinayakswami hariharmath <vinayakswami.hariharmath@seagate.com>
Change-Id: I57b4e1d09411638ba35b37473421c747958620cf
Reviewed-on: http://review.whamcloud.com/16305
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Li Xi <lixi@ddn.com>
4 years agoLU-6868 mdd: add changelog for migration 45/16645/7
wang di [Thu, 24 Sep 2015 07:35:33 +0000 (00:35 -0700)]
LU-6868 mdd: add changelog for migration

Add a changelog record for migration, so that the robinhood policy
engine can handle the migration command.

Add test_160d to verify the migration changelog.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: Iaa33dee607fcd79285f59bd3131d70b7e5329622
Reviewed-on: http://review.whamcloud.com/16645
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7295 osp: do not warn on uncommitted changes 17/16817/2
Alex Zhuravlev [Wed, 14 Oct 2015 10:07:53 +0000 (13:07 +0300)]
LU-7295 osp: do not warn on uncommitted changes

There is no need to warn about uncommitted changes at umount.

Change-Id: I7a5578d7ea044553fa8a9544e1ee6998468842b4
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: http://review.whamcloud.com/16817
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6746 ptlrpc: Move IT_* definitions to lustre_idl.h 28/16228/7
Ben Evans [Mon, 12 Oct 2015 23:06:04 +0000 (19:06 -0400)]
LU-6746 ptlrpc: Move IT_* definitions to lustre_idl.h

Put the IT_* definitions into an enum, as they're sent over the wire.
Adjust calls, print statements, etc. to use the new enum.

Signed-off-by: Ben Evans <bevans@cray.com>
Change-Id: Ie6ad700ac185459ace72ea67563864e43c548ec3
Reviewed-on: http://review.whamcloud.com/16228
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6556 obdclass: re-allow catalog to wrap around 12/14912/28
Bruno Faccini [Thu, 21 May 2015 18:02:50 +0000 (20:02 +0200)]
LU-6556 obdclass: re-allow catalog to wrap around

Since the patch for LU-4528, an LLOG catalog is no longer allowed to
wrap around. This is a regression, and it can also cause catalog
corruption (growth beyond max-size/records) upon upgrading if the
catalog has already wrapped around.

This patch reintroduces catalog wrap around capability, and also
introduces a new test to extensively check it.
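
As a rough illustration of what wrapping around means for a
fixed-size catalog, here is a minimal Python sketch. It is not the
llog implementation: next_catalog_index() is a hypothetical helper,
and the layout (slot 0 reserved for a header, usable slots
1..max_records-1) is an assumption made for the example.

```python
def next_catalog_index(last_idx, max_records):
    """Return the slot after last_idx, wrapping back to slot 1.

    Assumption: slot 0 holds the catalog header, so the usable
    slots are 1 .. max_records - 1.
    """
    nxt = last_idx + 1
    if nxt >= max_records:
        # Wrap around instead of growing past the maximum size.
        nxt = 1
    return nxt
```

A real catalog must also check that the slot it wraps onto is free
before reusing it; that check is left out of the sketch.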

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: Ife9a452199895ed9d9f43eb9fdeeac15322e272a
Reviewed-on: http://review.whamcloud.com/14912
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-4341 test: skip failing sanity test 170 46/16146/5
Bob Glossman [Mon, 31 Aug 2015 19:22:32 +0000 (12:22 -0700)]
LU-4341 test: skip failing sanity test 170

Since sanity.sh test_170 always fails in sles11 testing,
add it to ALWAYS_EXCEPT when testing on sles11.
This can be removed when we have a real fix for the test.

Test-Parameters: mdsdistro=sles11sp3 ossdistro=sles11sp3 \
  clientdistro=sles11sp3 mdsfilesystemtype=ldiskfs \
  mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs \
  testlist=sanity

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I76a2bfaad2bff8786ea832a4c9cabb11a71c11e4
Reviewed-on: http://review.whamcloud.com/16146
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
4 years agoLU-7153 build: Update SPL/ZFS to 0.6.5.2 99/16399/9
Nathaniel Clark [Mon, 14 Sep 2015 16:51:48 +0000 (12:51 -0400)]
LU-7153 build: Update SPL/ZFS to 0.6.5.2

ZFS/SPL 0.6.5.2

Bug Fixes
* Init script fixes zfsonlinux/zfs#3816
* Fix uioskip crash when skip to end zfsonlinux/zfs#3806
  zfsonlinux/zfs#3850
* Userspace can trigger an assertion zfsonlinux/zfs#3792
* Fix quota userused underflow bug zfsonlinux/zfs#3789
* Fix performance regression from unwanted synchronous I/O
  zfsonlinux/zfs#3780
* Fix deadlock during ARC reclaim zfsonlinux/zfs#3808
  zfsonlinux/zfs#3834
* Fix deadlock with zfs receive and clamscan zfsonlinux/zfs#3719
* Allow NFS activity to defer snapshot unmounts zfsonlinux/zfs#3794
* Linux 4.3 compatibility zfsonlinux/zfs#3799
* Zed reload fixes zfsonlinux/zfs#3773
* Fix PAX Patch/Grsec SLAB_USERCOPY panic zfsonlinux/zfs#3796
* Always remove during dkms uninstall/update zfsonlinux/spl#476

ZFS/SPL 0.6.5.1

Bug Fixes

* Fix zvol corruption with TRIM/discard zfsonlinux/zfs#3798
* Fix NULL as mount(2) syscall data parameter zfsonlinux/zfs#3804
* Fix xattr=sa dataset property not honored zfsonlinux/zfs#3787

ZFS/SPL 0.6.5

Supported Kernels

* Compatible with 2.6.32 - 4.2 Linux kernels.

New Functionality

* Support for temporary mount options.
* Support for accessing the .zfs/snapshot over NFS.
* Support for estimating send stream size when source is a bookmark.
* Administrative commands are allowed to use reserved space improving
  robustness.
* New notify ZEDLETs support email and pushbullet notifications.
* New keyword 'slot' for vdev_id.conf to control what is used for the
  slot number.
* New zpool export -a option unmounts and exports all imported pools.
* New zpool iostat -y omits the first report with statistics since
  boot.
* New zdb can now open the root dataset.
* New zdb can print the numbers of ganged blocks.
* New zdb -ddddd can print details of block pointer objects.
* New zdb -b performance improved.
* New zstreamdump -d prints contents of blocks.

New Feature Flags

* large_blocks - This feature allows the record size on a dataset to
be set larger than 128KB. We currently support block sizes from 512
bytes to 16MB. The benefits of larger blocks, and thus larger IO, need
to be weighed against the cost of COWing a giant block to modify one
byte. Additionally, very large blocks can have an impact on I/O
latency, and also potentially on the memory allocator. Therefore, we
do not allow the record size to be set larger than zfs_max_recordsize
(default 1MB). Larger blocks can be created by changing this tuning;
pools with larger blocks can always be imported and used, regardless
of this setting.

* filesystem_limits - This feature enables filesystem and snapshot
limits. These limits can be used to control how many filesystems
and/or snapshots can be created at the point in the tree on which the
limits are set.

Performance

* Improved zvol performance on all kernels (>50% higher throughput,
  >20% lower latency)
* Improved zil performance on Linux 2.6.39 and earlier kernels (10x
  lower latency)
* Improved allocation behavior on mostly full SSD/file pools (5% to
  10% improvement on 90% full pools)
* Improved performance when removing large files.
* Caching improvements (ARC):
  * Better cached read performance due to reduced lock contention.
  * Smarter heuristics for managing the total size of the cache and the
    distribution of data/metadata.
  * Faster release of cached buffers due to unexpected memory pressure.

Changes in Behavior

* Default reserved space was increased from 1.6% to 3.3% of total pool
capacity. This default percentage can be controlled through the new
spa_slop_shift module option; setting it to 6 will restore the
previous percentage.

* Loading of the ZFS module stack is now handled by systemd or the
sysv init scripts. Invoking the zfs/zpool commands will not cause the
modules to be automatically loaded. The previous behavior can be
restored by setting the ZFS_MODULE_LOADING=yes environment variable
but this functionality will be removed in a future release.

* Unified SYSV and Gentoo OpenRC initialization scripts. The previous
functionality has been split in to zfs-import, zfs-mount, zfs-share,
and zfs-zed scripts. This allows for independent control of the
services and is consistent with the unit files provided for a systemd
based system. Complete details of the functionality provided by the
updated scripts can be found here.

* Task queues are now dynamic and worker threads will be created and
destroyed as needed. This allows the system to automatically tune
itself to ensure the optimal number of threads are used for the active
workload which can result in a performance improvement.

* Task queue thread priorities were correctly aligned with the default
Linux file system thread priorities. This allows ZFS to compete fairly
with other active Linux file systems when the system is under heavy
load.

* When compression=on the default compression algorithm will be lz4 as
long as the feature is enabled. Otherwise the default remains lzjb.
Similarly lz4 is now the preferred method for compressing meta data
when available.

* The use of mkdir/rmdir/mv in the .zfs/snapshot directory has been
disabled by default both locally and via NFS clients. The
zfs_admin_snapshot module option can be used to re-enable this
functionality.

* LBA weighting is automatically disabled on files and SSDs ensuring
the entire device is used fairly.

* iostat accounting on zvols running on kernels older than Linux 3.19
is no longer supported.

* The known issues preventing swap on zvols for Linux 3.9 and newer
kernels have been resolved. However, deadlocks are still possible for
older kernels.
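
The reserved space item in the list above can be checked with a
little arithmetic. The sketch below assumes the reservation is
simply the pool size shifted right by spa_slop_shift, i.e. a
fraction of 1 / 2^shift; slop_fraction() is a hypothetical helper
and ignores any minimum or maximum clamps the module may apply.

```python
def slop_fraction(spa_slop_shift):
    """Fraction of the pool reserved, assuming size >> spa_slop_shift."""
    return 1 / (1 << spa_slop_shift)


for shift in (5, 6):
    # Print the shift and the reserved share as a percentage.
    print(shift, f"{slop_fraction(shift):.4%}")
```

Under that assumption a shift of 6 reserves 1/64 of the pool, which
is about the 1.6% quoted as the previous percentage, and each step
down in the shift doubles the reservation.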

Module Options

* Changed zfs_arc_c_min default from 4M to 32M to accommodate large
  blocks.
* Added metaslab_aliquot to control how many bytes are written to a
  top-level vdev before moving on to the next one. Increasing this may
  be helpful when using blocks larger than 1M.
* Added spa_slop_shift, see the 'reserved space' comment in the
  'Changes in Behavior' section.
* Added zfs_admin_snapshot, enable/disable the use of mkdir/rmdir/mv
  in .zfs/snapshot directory.
* Added zfs_arc_lotsfree_percent, throttle I/O when free system
  memory drops below this percentage.
* Added zfs_arc_num_sublists_per_state, used to allow more
  fine-grained locking.
* Added zfs_arc_p_min_shift, used to set a floor on arc_p.
* Added zfs_arc_sys_free, the target number of bytes the ARC should
  leave as free.
* Added zfs_dbgmsg_enable, used to enable the 'dbgmsg' kstat.
* Added zfs_dbgmsg_maxsize, sets the maximum size of the dbgmsg
  buffer.
* Added zfs_max_recordsize, used to control the maximum allowed
  record size.
* Added zfs_arc_meta_strategy, used to select the preferred ARC
  reclaim strategy.
* Removed metaslab_min_alloc_size, it was unused internally due to
  prior changes.
* Removed zfs_arc_memory_throttle_disable, replaced by
  zfs_arc_lotsfree_percent.
* Removed zvol_threads, zvols no longer require a dedicated task
  queue.
* See zfs-module-parameters(5) for complete details on available
  module options.

Bug Fixes

* Improved documentation with many updates, corrections, and
  additions.
* Improved sysv, systemd, initramfs, and dracut support.
* Improved block pointer validation before issuing IO.
* Improved scrub pause heuristics.
* Improved test coverage.
* Improved heuristics for automatic repair when zfs_recover=1 module
  option is set.
* Improved debugging infrastructure via 'dbgmsg' kstat.
* Improved zpool import performance.
* Fixed deadlocks in direct memory reclaim.
* Fixed deadlock on db_mtx and dn_holds.
* Fixed deadlock in dmu_objset_find_dp().
* Fixed deadlock during zfs rollback.
* Fixed kernel panic due to tsd_exit() in ZFS_EXIT.
* Fixed kernel panic when adding a duplicate dbuf to dn_dbufs.
* Fixed kernel panic due to security / ACL creation failure.
* Fixed kernel panic on unmount due to iput taskq.
* Fixed panic due to corrupt nvlist when running utilities.
* Fixed panic on unmount due to not waiting for all znodes to be
  released.
* Fixed panic with zfs clone from different source and target pools.
* Fixed NULL pointer dereference in dsl_prop_get_ds().
* Fixed NULL pointer dereference in dsl_prop_notify_all_cb().
* Fixed NULL pointer dereference in zfsdev_getminor().
* Fixed I/Os are now aggregated across ZIO priority classes.
* Fixed .zfs/snapshot auto-mounting for all supported kernels.
* Fixed 3-digit octal escapes by changing to 4-digit which
  disambiguate the output.
* Fixed hard lockup due to infinite loop in zfs_zget().
* Fixed misreported 'alloc' value for cache devices.
* Fixed spurious hung task watchdog stack traces.
* Fixed direct memory reclaim deadlocks.
* Fixed module loading in zfs import systemd service.
* Fixed intermittent libzfs_init() failure to open /dev/zfs.
* Fixed hot-disk sparing for disk vdevs
* Fixed system spinning during ARC reclaim.
* Fixed formatting errors in zfs(8).
* Fixed zio pipeline stall by having callers invoke next stage.
* Fixed assertion failed in zrl_tryenter().
* Fixed memory leak in make_root_vdev().
* Fixed memory leak in zpool_in_use().
* Fixed memory leak in libzfs when doing rollback.
* Fixed hold leak in dmu_recv_end_check().
* Fixed refcount leak in bpobj_iterate_impl().
* Fixed misuse of input argument in traverse_visitbp().
* Fixed missing mutex_destroy() calls.
* Fixed integer overflows in dmu_read/dmu_write.
* Fixed verify() failure in zio_done().
* Fixed zio_checksum_error() to only include info for ECKSUM errors.
* Fixed -ESTALE to force lookup on missing NFS file handles.
* Fixed spurious failures from dsl_dataset_hold_obj().
* Fixed zfs compressratio when using with 4k sector size.
* Fixed spurious watchdog warnings in prefetch thread.
* Fixed unfair disk space allocation when vdevs are of unequal size.
* Fixed ashift accounting error writing to cache devices.
* Fixed zdb -d has false positive warning when
  feature@large_blocks=disabled.
* Fixed zdb -h | -i seg fault.
* Fixed force-received full stream into a dataset if it has a
  snapshot.
* Fixed snapshot error handling.
* Fixed 'hangs' while deleting large files.
* Fixed lock contention (rrw_exit) while running a read only load.
* Fixed error message when creating a pool to include all problematic
  devices.
* Fixed Xen virtual block device detection, partitions are now
  created.
* Fixed missing E2BIG error handling in zfs_setprop_error().
* Fixed zpool import assertion in libzfs_import.c.
* Fixed zfs send -nv output to stderr.
* Fixed idle pool potentially running itself out of space.
* Fixed narrow race which allowed read(2) to access beyond fstat(2)'s
  reported end-of-file.
* Fixed support for VPATH builds.
* Fixed double counting of HDR_L2ONLY_SIZE in ARC.
* Fixed 'BUG: Bad page state' warning from kernel due to writeback
  flag.
* Fixed arc_available_memory() to check freemem.
* Fixed arc_memory_throttle() to check pageout.
* Fixed 'zpool create' warning when using zvols in debug builds.
* Fixed loop devices layered on ZFS with 4.1 kernels.
* Fixed zvol contribution to kernel entropy pool.
* Fixed handling of compression flags in arc header.
* Substantial changes to realign code base with illumos.
* Many additional bug fixes.

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: I87c012aec9ec581b10a417d699dafc7d415abf63
Reviewed-on: http://review.whamcloud.com/16399
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
4 years agoLU-6155 osd-zfs: dbuf_hold_impl() called without the lock 41/13541/16
Isaac Huang [Tue, 27 Jan 2015 21:03:32 +0000 (14:03 -0700)]
LU-6155 osd-zfs: dbuf_hold_impl() called without the lock

The osd-zfs osd_count_not_mapped() calls dbuf_hold_impl() without
the required lock. In addition, dbuf_hold_impl() is an internal
function and has the expensive side effect of reading the block
from disk which would convert a full-block write into a
read-modify-write.

Since space estimation with ZFS is complicated anyway, just use
the worst case as a rough estimate where a snapshot holds all current
blocks, i.e. no old space can be freed after the COW.

Skip test sanity-quota/23 on ZFS because overwrites on ZFS are not
guaranteed to be space neutral, and the new worst-case assumptions
will always cause this test to fail.

Change-Id: Idf6f2ff80ff185ca8c0f38e1002ff90e457c3ca0
Signed-off-by: Isaac Huang <he.huang@intel.com>
Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-on: http://review.whamcloud.com/13541
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6852 ldlm: Do not evict MDS-MDS connection 24/13224/45
wang di [Tue, 6 Oct 2015 18:36:43 +0000 (14:36 -0400)]
LU-6852 ldlm: Do not evict MDS-MDS connection

Do not put the MDT-MDT lock in the waiting lock list, so that
MDTs are not evicted due to lock timeouts between MDTs.
This helps the update replay finish eventually, so the DNE
filesystem will be in a consistent state after recovery.

If, for some reason, the filesystem hangs because of these two
changes, then the administrator should step in, deactivate the
MDT manually, and run lfsck.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I83e7f8f55ee15730ed2d9826d08a398ddd72792a
Reviewed-on: http://review.whamcloud.com/13224
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6215 lnet: make o2iblnd buildable for 4.2.1 kernels 67/16767/3
James Simmons [Thu, 8 Oct 2015 14:33:06 +0000 (10:33 -0400)]
LU-6215 lnet: make o2iblnd buildable for 4.2.1 kernels

The commit f5c9753872cfa8ad47821be3fa924c74c4c8b0d
altered some macros for the ko2iblnd driver which wasn't
updated for the most recent kernels. A simple one line change
restores this support.

Change-Id: Iedd5e36451bf84aae29058e40a89055f451bfeec
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: http://review.whamcloud.com/16767
Reviewed-by: Frank Zago <fzago@cray.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-2049 grant: delay grant releasing until commit 31/13531/14
Johann Lombardi [Mon, 26 Jan 2015 16:04:51 +0000 (17:04 +0100)]
LU-2049 grant: delay grant releasing until commit

Grant space acquired for a bulk write is released from the grant
accounting at the end of request processing. At that point, the
additional space consumed by the write request is believed to be
taken into account in any subsequent statfs call.
However, this does not seem to be the case with all backend
filesystems, in particular ZFS, which seems to provide
reliable space information only once the transaction associated
with the bulk write has committed. This creates a hole in the
grant space management where we can end up allocating more grant
space than is really available.

This patch postpones grant releasing until transaction commit time.
This is done by registering a commit callback in charge of this
operation.
The patch also removes the implicit use of info->fti_used and stores
the amount of grant space to be released in obdo::o_grant_used.
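
The deferred release can be pictured with a minimal Python sketch.
It is not the Lustre code: Transaction, GrantAccounting and their
methods are hypothetical, and the only point shown is that the
grant stays accounted until the commit callback runs.

```python
class Transaction:
    """Toy transaction that runs registered callbacks at commit."""

    def __init__(self):
        self._commit_cbs = []

    def add_commit_callback(self, cb):
        self._commit_cbs.append(cb)

    def commit(self):
        for cb in self._commit_cbs:
            cb()
        self._commit_cbs.clear()


class GrantAccounting:
    """Tracks grant space consumed by writes that have not committed."""

    def __init__(self):
        self.pending = 0

    def bulk_write(self, txn, grant_used):
        self.pending += grant_used
        # Do not release at the end of request processing; defer the
        # release to transaction commit time via a callback.
        txn.add_commit_callback(lambda: self._release(grant_used))

    def _release(self, amount):
        self.pending -= amount


acct = GrantAccounting()
txn = Transaction()
acct.bulk_write(txn, 4096)
pending_before_commit = acct.pending  # still accounted
txn.commit()
pending_after_commit = acct.pending   # released by the callback
```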

Signed-off-by: Johann Lombardi <johann.lombardi@intel.com>
Change-Id: Id99b8712ffc1e5f103df4835b698127619b8ba85
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-on: http://review.whamcloud.com/13531
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6204 misc: Add missing MODULE_VERSION for lustre 29/16729/4
James Simmons [Wed, 7 Oct 2015 14:46:27 +0000 (10:46 -0400)]
LU-6204 misc: Add missing MODULE_VERSION for lustre

Many of the lustre modules are missing a MODULE_VERSION.
Update the remaining MODULE_AUTHORS from Intel to OpenSFS.

Change-Id: Iae24d820c68c570c6e1399bbc7396060d21bdf41
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: http://review.whamcloud.com/16729
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Frank Zago <fzago@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7244 llite: Fix XATTR_NAME_EVM redefinition 07/16707/3
Dmitry Eremin [Sat, 12 Sep 2015 04:10:37 +0000 (23:10 -0500)]
LU-7244 llite: Fix XATTR_NAME_EVM redefinition

In Linux kernel version 3.2.x the definition of XATTR_NAME_EVM exists
but the definition of XATTR_NAME_IMA does not. So, check them
independently.

Change-Id: Ib98534d278ae4d5eaaa86538beb9bf683b9cf807
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-on: http://review.whamcloud.com/16707
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7122 utils: changelog_{de}register cleanup 41/16341/4
Henri Doreau [Tue, 28 Apr 2015 14:15:41 +0000 (16:15 +0200)]
LU-7122 utils: changelog_{de}register cleanup

Document the -n switch for "changelog_register" in the man page.
Apply coding style and remove unneeded code.

Signed-off-by: Henri Doreau <henri.doreau@cea.fr>
Change-Id: I38431d371bc08f5068e5b7e3e62a7847dc64283d
Reviewed-on: http://review.whamcloud.com/16341
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Frank Zago <fzago@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6899 test: rename sanity test_162 to test_162a 10/15710/2
Elena Gryaznova [Fri, 24 Jul 2015 14:20:09 +0000 (17:20 +0300)]
LU-6899 test: rename sanity test_162 to test_162a

Made this test be run separately from others in this group.
- sanity test_162 is renamed to test_162a

Signed-off-by: Elena Gryaznova <elena.gryaznova@seagate.com>
Seagate-bug-id: MRP-2496
Reviewed-by: Alexander Lezhoev <alexander.lezhoev@seagate.com>
Change-Id: I945b5ab006722d230058ebf44538480e018964c9
Reviewed-on: http://review.whamcloud.com/15710
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-5733 lnet: Use lnet_is_route_alive for router aliveness 55/14055/2
Chris Horn [Thu, 12 Mar 2015 22:39:17 +0000 (17:39 -0500)]
LU-5733 lnet: Use lnet_is_route_alive for router aliveness

lctl show_route and lctl route_list will output router aliveness
information via lnet_get_route(). lnet_get_route() should use the
lnet_is_route_alive() function, introduced in e8a1124
http://review.whamcloud.com/7857, to determine route aliveness.

Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: Ie57aebeb6b4c80a3b89ed72fc6acbccbbd321be1
Reviewed-on: http://review.whamcloud.com/14055
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7184 lod: cleanup unused OSP devices on error 35/16635/3
John L. Hammond [Thu, 24 Sep 2015 20:46:31 +0000 (15:46 -0500)]
LU-7184 lod: cleanup unused OSP devices on error

In lod_add_device() if the OSP device to be added cannot be added then
call LCFG_CLEANUP in the cleanup path.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I01e0a5b0f541481a002cf60fcece05908ba3194f
Reviewed-on: http://review.whamcloud.com/16635
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6895 scrub: not trigger scrub if inode removed by race 39/16439/8
Fan Yong [Mon, 21 Sep 2015 09:10:48 +0000 (17:10 +0800)]
LU-6895 scrub: not trigger scrub if inode removed by race

During osd_consistency_check(), the target file may have just been
removed by someone else, so osd_oi_lookup() will return -ENOENT to
the caller. In that case, osd_consistency_check() should
not trigger OI scrub.

On the other hand, if someone unlinks the file while OI scrub is
adding the missing OI mapping to the OI file, the OI scrub needs
to remove the newly added OI mapping.

Test-Parameters: alwaysuploadlogs \
envdefinitions=SLOW=yes,ENABLE_QUOTA=yes \
mdtfilesystemtype=ldiskfs mdsfilesystemtype=ldiskfs \
ostfilesystemtype=ldiskfs clientdistro=el7 ossdistro=el7 \
mdsdistro=el7 mdtcount=1 \
testlist=sanity-scrub,sanity-scrub,sanity-scrub,sanity-scrub

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I4703fc7f99d7b0a0f769127b5cdba5a2b992250d
Reviewed-on: http://review.whamcloud.com/16439
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6895 lfsck: not destroy directory when fix FID-in-dirent 40/16440/9
Fan Yong [Fri, 14 Aug 2015 04:34:54 +0000 (12:34 +0800)]
LU-6895 lfsck: not destroy directory when fix FID-in-dirent

When repairing FID-in-dirent, lfsck may append the FID directly
after the name entry. If it checks the space after the name
entry improperly, it may overwrite the subsequent name entry
and corrupt the whole directory.

Test-Parameters: alwaysuploadlogs envdefinitions=SLOW=yes,ENABLE_QUOTA=yes mdtfilesystemtype=ldiskfs mdsfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs clientdistro=el7 ossdistro=el7 mdsdistro=el7 mdtcount=1 testlist=sanity-lfsck,sanity-lfsck,sanity-lfsck,sanity-lfsck
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ia1afc643fdfac205a5ea7aa9c365e45b4da90868
Reviewed-on: http://review.whamcloud.com/16440
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6386 tgt: don't update client data with smaller transno 13/14113/6
Mikhail Pershin [Thu, 1 Oct 2015 18:34:27 +0000 (21:34 +0300)]
LU-6386 tgt: don't update client data with smaller transno

Fix tgt_last_rcvd_update() so that it does not update the transaction
number in the client slot with a smaller value.
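
The rule amounts to keeping the client slot's transaction number
monotonic. A minimal Python sketch, where update_client_transno()
is a hypothetical helper and not the signature of
tgt_last_rcvd_update():

```python
def update_client_transno(slot_transno, new_transno):
    """Return the transno to store in the client slot.

    A smaller incoming value never overwrites a larger stored one.
    """
    return max(slot_transno, new_transno)


stored = 100
stored = update_client_transno(stored, 120)  # larger value is stored
stored = update_client_transno(stored, 110)  # smaller value is ignored
```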

The patch also removes outdated code for the ted_lcd == NULL case.
This case is not possible now, because lcd is set to NULL only
upon export destroy. The check was needed in the past, when
lcd was set to NULL during export disconnect and some activity
was still possible on the export.

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: I34717ea91493785beadcf725d49c4c9265b63f7c
Reviewed-on: http://review.whamcloud.com/14113
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7045 osd: enough credits for single indirect block write 30/16330/8
Fan Yong [Fri, 7 Aug 2015 05:13:03 +0000 (13:13 +0800)]
LU-7045 osd: enough credits for single indirect block write

For the single indirect block case, if the i_data[LDISKFS_IND_BLOCK]
block is not allocated, osd_calc_bkmap_credits() should declare
three additional blocks for the subsequent write operation; otherwise,
it should reserve one additional block for that.
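
The arithmetic described above can be written down as a tiny sketch. The function name and its boolean parameter are hypothetical; only the numbers 3 and 1 come from the commit message, and the real osd_calc_bkmap_credits() accounts for more than this.

```python
def extra_indirect_credits(ind_block_allocated: bool) -> int:
    """Extra blocks to declare for a write in the single indirect block case."""
    if not ind_block_allocated:
        # i_data[LDISKFS_IND_BLOCK] is missing: three additional blocks.
        return 3
    # The indirect block already exists: one additional block.
    return 1
```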

Test-Parameters: alwaysuploadlogs envdefinitions=SLOW=yes,ENABLE_QUOTA=yes,CONF_SANITY_EXCEPT=45 mdtfilesystemtype=ldiskfs mdsfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs clientdistro=el7 ossdistro=el7 mdsdistro=el7 mdtcount=1 testlist=conf-sanity,conf-sanity,conf-sanity,conf-sanity,conf-sanity
Test-Parameters: alwaysuploadlogs envdefinitions=SLOW=yes,ENABLE_QUOTA=yes,CONF_SANITY_EXCEPT=45 mdtfilesystemtype=ldiskfs mdsfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs clientdistro=el7 ossdistro=el7 mdsdistro=el7 mdscount=2 mdtcount=4 testlist=conf-sanity,conf-sanity,conf-sanity,conf-sanity,conf-sanity
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I76b50cef8df56b49dae7afe4d759a55599548479
Reviewed-on: http://review.whamcloud.com/16330
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Andreas Dilger <andreas.dilger@intel.com>
4 years agoLU-6842 clio: add cl_page LRU shrinker 30/15630/12
Bobi Jam [Fri, 17 Jul 2015 05:36:37 +0000 (13:36 +0800)]
LU-6842 clio: add cl_page LRU shrinker

Register cache shrinker to reclaim memory from cl_page LRU list.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: Id22fd1f1f8554dc03ac7313a58abd8cd3472ece0
Reviewed-on: http://review.whamcloud.com/15630
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7005 tests: wait client imports fully recovered 83/15983/10
wang di [Wed, 12 Aug 2015 06:25:05 +0000 (23:25 -0700)]
LU-7005 tests: wait client imports fully recovered

In conf-sanity.sh 50i, the test should wait for the client and all
MDTs to recover before creating files.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I637ebfb6c531708e194df4c03d8657361d1b40ee
Reviewed-on: http://review.whamcloud.com/15983
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7196 kernel: kernel update RHEL 6.7 [2.6.32-573.7.1.el6] 08/16608/4
Bob Glossman [Tue, 22 Sep 2015 16:13:48 +0000 (09:13 -0700)]
LU-7196 kernel: kernel update RHEL 6.7 [2.6.32-573.7.1.el6]

update RHEL 6.7 kernel to 2.6.32-573.7.1.el6

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I1b90ac046582c052612219b8af1d172069bb01fd
Reviewed-on: http://review.whamcloud.com/16608
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6886 mdd: declare changelog store for POSIX ACLs 60/15660/3
Li Dongyang [Tue, 21 Jul 2015 05:20:49 +0000 (15:20 +1000)]
LU-6886 mdd: declare changelog store for POSIX ACLs

mdd_xattr_del() records POSIX ACL ops in the changelog,
so we should declare them in mdd_declare_xattr_del().

Signed-off-by: Li Dongyang <dongyang.li@anu.edu.au>
Change-Id: I9184c7906d0da715c12b833bab080c56a1a07285
Reviewed-on: http://review.whamcloud.com/15660
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
4 years agoLU-7074 mdd: validate the linkea before packing 35/16235/13
wang di [Wed, 2 Sep 2015 10:51:28 +0000 (03:51 -0700)]
LU-7074 mdd: validate the linkea before packing

During migration, let's validate linkea entry before
packing the updates into the buffer and sending to
the remote MDT.

Also move the linkea retrieval before the transaction
start to avoid sending an RPC inside the transaction.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I36f235274d39560f6654fd76967e45400e8187ce
Reviewed-on: http://review.whamcloud.com/16235
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7228 build: make lustre rpm also provide lustre-client 73/16673/2
Frank Zago [Tue, 29 Sep 2015 20:47:38 +0000 (15:47 -0500)]
LU-7228 build: make lustre rpm also provide lustre-client

An application packaged in an rpm has to depend on either lustre or
lustre-client. Since the lustre rpm also includes everything the
lustre-client rpm does, it should also provide lustre-client.

That way an application rpm only has to require lustre-client and not
juggle between a lustre or lustre-client dependency.

Signed-off-by: frank zago <fzago@cray.com>
Change-Id: I46ae8a96b0fbc6153a288bf45896f7b4ed1dfddc
Reviewed-on: http://review.whamcloud.com/16673
Tested-by: Jenkins
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Thomas LEIBOVICI <thomas.leibovici@cea.fr>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7082 test: fix synchronization of conf_sanity test_90 15/16215/6
Gregoire Pichon [Thu, 3 Sep 2015 12:04:34 +0000 (14:04 +0200)]
LU-7082 test: fix synchronization of conf_sanity test_90

Add some delays in check_max_mod_rpcs_in_flight() routine
to ensure background commands are launched before continuing
test execution.

Signed-off-by: Gregoire Pichon <gregoire.pichon@bull.net>
Change-Id: Ia1c943b7b58ecfe4f3fd80d6470a8ee2650789e7
Reviewed-on: http://review.whamcloud.com/16215
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6215 ldlm: handle percpu_counter_init change in 3.18+ kernels 49/16649/6
James Simmons [Thu, 1 Oct 2015 18:59:15 +0000 (14:59 -0400)]
LU-6215 ldlm: handle percpu_counter_init change in 3.18+ kernels

Starting in 3.18 kernels the function percpu_counter_init()
started to take memory allocation flags GFP_*. This patch
detects and handles this new case thus enabling lustre servers
to function up to 4.2.1 kernels.

Change-Id: Ibdb716987c367dc6ea93f6f9747fb70fd7ac2cbb
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: http://review.whamcloud.com/16649
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Reviewed-by: Frank Zago <fzago@cray.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6527 ext4: journal_commit_callback optimization 11/14611/4
Sergey Cheremencev [Wed, 8 Apr 2015 19:18:07 +0000 (22:18 +0300)]
LU-6527 ext4: journal_commit_callback optimization

Don't take the spinlock in tgt_cb_last_committed if
exp_last_committed was already updated with a higher transno.
Also change list_add_tail to list_add. This benefits
ldiskfs in tgt_cb_last_committed: thandles with the
highest transaction numbers are placed at the beginning
of the list, so the first iterations see the highest
transno, which saves extra calls to
ptlrpc_commit_replies.
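
The effect of both changes can be illustrated with a small model. This is a sketch in Python under stated assumptions, not the kernel code: Export, commit_callback and run_callbacks are hypothetical names, a deque stands in for the kernel list, and lock_taken exists only to count lock acquisitions.

```python
import threading
from collections import deque


class Export:
    """Hypothetical stand-in for an export tracking its last committed transno."""

    def __init__(self):
        self.last_committed = 0
        self.lock = threading.Lock()
        self.lock_taken = 0  # counts lock acquisitions, for illustration only

    def commit_callback(self, transno):
        # Unlocked fast path: a higher transno is already recorded.
        if transno <= self.last_committed:
            return
        with self.lock:
            self.lock_taken += 1
            # Re-check under the lock before updating.
            if transno > self.last_committed:
                self.last_committed = transno


def run_callbacks(transnos, add_at_head):
    """Queue callbacks in transno order, run them, return lock acquisitions."""
    export = Export()
    pending = deque()
    for transno in transnos:
        if add_at_head:
            pending.appendleft(transno)  # analogous to list_add
        else:
            pending.append(transno)      # analogous to list_add_tail
    for transno in pending:
        export.commit_callback(transno)
    return export.lock_taken
```

With head insertion the highest transno runs first, so every later callback takes the unlocked fast path; with tail insertion each callback advances the value and takes the lock.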

Change-Id: Ib6f9cc54dae7d9ac1ca301402299f308b825ede4
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@seagate.com>
Xyratex-bug-id: MRP-2575
Reviewed-on: http://es-gerrit.xyus.xyratex.com:8080/5907
Tested-by: Jenkins
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@seagate.com>
Reviewed-by: Alexander Boyko <alexander.boyko@seagate.com>
Tested-by: Alexander Lezhoev <alexander.lezhoev@seagate.com>
Reviewed-by: Alexey Leonidovich Lyashkov <alexey.lyashkov@seagate.com>
Reviewed-on: http://review.whamcloud.com/14611
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-3322 ko2iblnd: Support different configs between systems 94/11794/11
Jeremy Filizetti [Wed, 13 May 2015 21:19:04 +0000 (17:19 -0400)]
LU-3322 ko2iblnd: Support different configs between systems

This patch adds support for ko2iblnd to have different values for
peer_credits and map_on_demand between systems.

Signed-off-by: Jeremy Filizetti <jeremy.filizetti@gmail.com>
Change-Id: Idfe5acdfdde5c2185488b92c96d7a83f1705a556
Reviewed-on: http://review.whamcloud.com/11794
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6584 osd: prevent int type overflow in osd_read_prep() 85/16685/2
Mikhail Pershin [Wed, 30 Sep 2015 18:11:04 +0000 (21:11 +0300)]
LU-6584 osd: prevent int type overflow in osd_read_prep()

There is a possible int overflow in osd_read_prep() that may
produce too large a value in lnb_rc, followed by an assertion failure.

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: If17b533e7d0dcae7db57eefc0e5981821f628c56
Reviewed-on: http://review.whamcloud.com/16685
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Cliff White <cliff.white@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7222 tests: add Mulitple MDTs to test_84 62/16662/2
wang di [Sun, 27 Sep 2015 07:27:48 +0000 (00:27 -0700)]
LU-7222 tests: add Mulitple MDTs to test_84

Add multiple MDTs to conf_sanity.sh test_84(), and
add more information to the error message when
the config log is corrupted.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I45160d053f8dd52ca3230888e720fc04102d50ab
Reviewed-on: http://review.whamcloud.com/16662
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-3281 obdclass: remove structure holes to reduce memory 92/16692/2
Andreas Dilger [Thu, 9 May 2013 04:00:07 +0000 (22:00 -0600)]
LU-3281 obdclass: remove structure holes to reduce memory

Fix the alignment of fields in commonly-used structures to reduce
memory usage on the client and server.  Structures fixed:

ptlrpc_reply_state: reduced by 8 bytes
obd_device:         reduced by 16 bytes
niobuf_local:       reduced by 8 bytes
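
How field order creates holes can be demonstrated from user space with ctypes, which follows the native C layout rules. The two structures below are hypothetical and unrelated to the Lustre structures listed above; the exact sizes depend on the platform ABI.

```python
import ctypes


class WithHoles(ctypes.Structure):
    """Small fields on both sides of a 64-bit field force padding twice."""
    _fields_ = [
        ("flag_a", ctypes.c_uint8),
        ("counter", ctypes.c_uint64),
        ("flag_b", ctypes.c_uint8),
    ]


class Reordered(ctypes.Structure):
    """Same fields, largest first, so the small fields share one tail slot."""
    _fields_ = [
        ("counter", ctypes.c_uint64),
        ("flag_a", ctypes.c_uint8),
        ("flag_b", ctypes.c_uint8),
    ]


# Bytes saved purely by reordering; no field was removed or shrunk.
saved = ctypes.sizeof(WithHoles) - ctypes.sizeof(Reordered)
print(ctypes.sizeof(WithHoles), ctypes.sizeof(Reordered), saved)
```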

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: Ibe50c02e7ba823e337e846f90c6267cffc3ebbe5
Reviewed-on: http://review.whamcloud.com/16692
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7162 kernel: kernel update RHEL 7.1 [3.10.0-229.14.1.el7] 44/16444/4
Bob Glossman [Tue, 15 Sep 2015 19:52:23 +0000 (12:52 -0700)]
LU-7162 kernel: kernel update RHEL 7.1 [3.10.0-229.14.1.el7]

Update RHEL7.1 kernel to 3.10.0-229.14.1.el7

Test-Parameters: mdsdistro=el7 ossdistro=el7 \
  clientdistro=el7 mdsfilesystemtype=ldiskfs \
  mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs \
  testgroup=review-ldiskfs

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: Iac500f0a6c2bbbe43014125b6327bacf5be4e59b
Reviewed-on: http://review.whamcloud.com/16444
Tested-by: Jenkins
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoRevert "LU-5951 ptlrpc: track unreplied requests" 34/16734/2
Oleg Drokin [Tue, 6 Oct 2015 17:21:02 +0000 (17:21 +0000)]
Revert "LU-5951 ptlrpc: track unreplied requests"

This causes a blocker LU-7252

This reverts commit c77e504fdac12d3be7d19a652d6c7da497018c76.

Change-Id: I1f442b9e8ce2e73484b229ede34b6b013e57ad70
Reviewed-on: http://review.whamcloud.com/16734
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Tested-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7179 scripts: init and ha script fixes 72/16472/4
Olaf Faaland [Thu, 17 Sep 2015 19:10:50 +0000 (12:10 -0700)]
LU-7179 scripts: init and ha script fixes

1) Because of a typo, Lustre.ha_v2 currently continues running after
determining that a bad resource-name has been provided by the user.

This commit fixes that typo so that die() is called when
the target-name is bad.

2) lustre/scripts/lustre produces improper output when run as
follows while a relevant target is in recovery:

/etc/init.d/lustre status
/etc/init.d/lustre status local
/etc/init.d/lustre status foreign

A grep command in health_check() expects variables to contain the path
to /proc files containing recovery status, but these variables'
contents were altered in a prior commit.

e3ddff LU-5030 utils: fix hard-coded /proc/fs/lustre in scripts

This commit fixes health_check() to correctly report recovery by
obtaining recovery status via lctl and checking that with grep.

Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Change-Id: I25b8c0d82b637cf9d40feace7d8b964ffcd34251
Reviewed-on: http://review.whamcloud.com/16472
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7190 lfsck: tolerate MDT-OST communication failures 67/16667/2
Fan Yong [Fri, 14 Aug 2015 03:01:50 +0000 (11:01 +0800)]
LU-7190 lfsck: tolerate MDT-OST communication failures

During the 2nd phase scanning, the layout LFSCK slave engine on the
OST periodically queries the master engine status from the MDT.
Sometimes the query RPC may fail because of network trouble or
issues on the MDS node. To let the LFSCK go ahead, the slave engine
will not wait forever; instead, it will assume that the master
engine has exited without notifying (or failed to notify) the
slave engine. So the slave engine will also exit and clean up
the LFSCK environment on the OST, including the OST-object access
bitmap that is used to find orphan OST-objects.

On the other hand, the assumption that the master engine has exited
may be wrong. If the master engine has not exited, and the network
trouble between the MDS and OSS is resolved after the slave engine
exited, then the master engine will try to find orphan OST-objects
during its 2nd phase scanning. But because the slave engine has
already exited and released the OST-object access bitmap, the master
engine has no way to find orphan OST-objects.

To avoid the above trouble, we make a compromise: when the slave
engine on the OST fails to query the master engine status, it will
not exit at once; instead, it will retry several times. If the
network trouble is resolved during that interval, the LFSCK will go
ahead; otherwise, the slave engine will exit as it originally did.
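
The compromise amounts to a bounded retry loop. The sketch below is an illustration in Python, not the LFSCK code: query_master_with_retries, the query callable, the use of ConnectionError and the default of three attempts are all assumptions.

```python
def query_master_with_retries(query, max_attempts=3):
    """Return the master status, or None if every attempt failed.

    `query` is a hypothetical callable that raises ConnectionError when
    the query RPC fails. None means the slave engine should give up and
    exit, as it did before the change after a single failure.
    """
    for _ in range(max_attempts):
        try:
            return query()
        except ConnectionError:
            # Tolerate the failure and try again on the next iteration.
            continue
    return None
```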

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ifa06552c61d885297a54ab6bfdc92d48c8f56fa3
Reviewed-on: http://review.whamcloud.com/16667
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6204 misc: Update module author to OpenSFS 32/16132/5
James Simmons [Tue, 29 Sep 2015 17:30:00 +0000 (13:30 -0400)]
LU-6204 misc: Update module author to OpenSFS

The modinfo data has gone stale for the author information.
This patch changes all the MODULE_AUTHOR to OpenSFS.

Change-Id: I730356ddffa747194ad164e60ab1e90d58b1f87b
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: http://review.whamcloud.com/16132
Reviewed-by: Frank Zago <fzago@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-4178 tests: Wait requests to reach CDT before Cancel 73/13173/2
Bruno Faccini [Tue, 23 Dec 2014 10:28:59 +0000 (11:28 +0100)]
LU-4178 tests: Wait requests to reach CDT before Cancel

sanity-hsm/test_[200-202] sometimes fail because the Cancel
reaches the CDT before the operation it targets.
This patch verifies that the operation has already been registered
at the CDT before sending the Cancel.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: Ic6299a2fdb6b6a358a0ce6ecd5a17a8cf9839c87
Reviewed-on: http://review.whamcloud.com/13173
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: jacques-Charles Lafoucriere <jacques-charles.lafoucriere@cea.fr>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6215 llite: bio_endio only takes one argument for 4.2 78/16278/6
James Simmons [Tue, 29 Sep 2015 14:10:29 +0000 (10:10 -0400)]
LU-6215 llite: bio_endio only takes one argument for 4.2

For the 4.2 kernel bio_endio() is down to taking only
one argument. This patch handles this API change.

Change-Id: I22edc64e76d22241c8c809acf58cf64dd67bbb61
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: http://review.whamcloud.com/16278
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Jenkins
Reviewed-by: Frank Zago <fzago@cray.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6996 osd-ldiskfs: handle stale OI mapping cache 57/16157/8
Fan Yong [Mon, 21 Sep 2015 00:56:52 +0000 (08:56 +0800)]
LU-6996 osd-ldiskfs: handle stale OI mapping cache

On the server side, the RPC service thread may cache an OI mapping
on its stack. Such an OI mapping becomes invalid if another thread
removes the object in a race. If the RPC service thread uses the
cached OI mapping and finds an inode that has been unlinked
and reused by another object with no LMA generated yet, then
osd_check_lma() should NOT skip such a case, so that the
caller does not misuse the inode.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I46348a06327bcf944aff9af7914230573e2cef89
Reviewed-on: http://review.whamcloud.com/16157
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6741 osp: bulk transfer for osp_md_read 99/15899/11
wang di [Tue, 8 Sep 2015 15:05:49 +0000 (08:05 -0700)]
LU-6741 osp: bulk transfer for osp_md_read

Do buffer bulk read for osp_md_read(), so it would
be more efficient to retrieve update logs from remote
target.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: Idcb7ef402d02ad46a33cb4d913763235a6215b5b
Reviewed-on: http://review.whamcloud.com/15899
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6865 mdd: check return value of posix_acl_from_xattr 33/15633/8
Li Dongyang [Fri, 17 Jul 2015 14:04:03 +0000 (00:04 +1000)]
LU-6865 mdd: check return value of posix_acl_from_xattr

Passing an ERR_PTR to posix_acl_release() will cause a crash.

Signed-off-by: Li Dongyang <dongyang.li@anu.edu.au>
Change-Id: I1870121e2f4fb187cd8c58f263b651ddf83a574b
Reviewed-on: http://review.whamcloud.com/15633
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
4 years agoLU-6850 lnet: remove references to ib_reg_phys_mr() 88/15788/14
Amir Shehata [Wed, 29 Jul 2015 14:53:47 +0000 (07:53 -0700)]
LU-6850 lnet: remove references to ib_reg_phys_mr()

Removed references to ib_reg_phys_mr() and PMR which were added
to deal with some Chelsio specific scenario, but are no longer needed.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: I07fc8e66e5bb0ade286761612b9b878fae34c183
Reviewed-on: http://review.whamcloud.com/15788
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Frank Zago <fzago@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoNew tag 2.7.61 2.7.61 v2_7_61 v2_7_61_0
Oleg Drokin [Tue, 6 Oct 2015 01:44:33 +0000 (21:44 -0400)]
New tag 2.7.61

Change-Id: I31b1393fd2610dbc4b2d17b4730e5fb406ee04bf
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6984 lmv: remove nlink check in lmv_revalidate_slaves 90/16490/8
Di Wang [Wed, 26 Aug 2015 06:15:43 +0000 (23:15 -0700)]
LU-6984 lmv: remove nlink check in lmv_revalidate_slaves

Remove the nlink < 2 check in lmv_revalidate_slaves, because
after nlink reaches LDISKFS_LINK_MAX (65000), the inode
nlink will be set to 1.
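
Why such a check misfires can be seen in a simplified model of directory link counting. The model below is an assumption made for illustration (start at 2, add one per subdirectory, collapse to 1 at the limit and stay there); it is not the ldiskfs code, and the function names are hypothetical.

```python
LDISKFS_LINK_MAX = 65000  # limit quoted in the commit message


def inc_dir_nlink(nlink):
    """Simplified model: bump nlink, collapsing to 1 once the limit is hit."""
    if nlink == 1:
        # Already overflowed: the count is no longer tracked.
        return 1
    nlink += 1
    if nlink >= LDISKFS_LINK_MAX:
        return 1
    return nlink


def nlink_after_subdirs(count):
    """nlink of a directory after creating `count` subdirectories."""
    nlink = 2  # "." and the entry in the parent
    for _ in range(count):
        nlink = inc_dir_nlink(nlink)
    return nlink
```

Under this model a perfectly valid directory with very many subdirectories reports nlink == 1, so rejecting nlink < 2 would flag it as broken.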

Add test_300o in sanity to verify the case.

And also add version check for striped dir test.

Test-Parameters: envdefinitions=SLOW=yes testlist=sanity
Change-Id: I1a333ea7f68da9157679c1358df5f7a54aee8e51
Signed-off-by: di wang <di.wang@intel.com>
Reviewed-on: http://review.whamcloud.com/16490
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7091 mdt: release cross-MDT lock immediately 72/16372/8
wang di [Wed, 9 Sep 2015 17:51:39 +0000 (10:51 -0700)]
LU-7091 mdt: release cross-MDT lock immediately

Because cross-MDT operations are relatively rare
compared with normal operations, release the
cross-MDT lock immediately after the operation.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I48be60a13e9d10a92595c7faeb91dd8c106b2d42
Reviewed-on: http://review.whamcloud.com/16372
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-4423 ptlrpc: incorrect AT index type 93/16693/2
Oleg Drokin [Sun, 27 Sep 2015 19:07:45 +0000 (15:07 -0400)]
LU-4423 ptlrpc: incorrect AT index type

Arnd Bergmann <arnd@arndb.de> noticed that rq_at_index is incorrectly
labelled as time_t whereas it's really an integer index.

Change-Id: Id6858def627054eb87d9860ce3d98984970ed481
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: http://review.whamcloud.com/16693
Tested-by: Jenkins
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
4 years agoLU-7218 osc: fix max_dirty_mb tunable setting limit 52/16652/2
Gregoire Pichon [Mon, 28 Sep 2015 11:30:29 +0000 (13:30 +0200)]
LU-7218 osc: fix max_dirty_mb tunable setting limit

The OSC tunable max_dirty_mb must be set to a value strictly lower
than 2048, as it is assumed by OSS in ofd_grant_alloc() routine.
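
A minimal validation sketch for the bound, in Python. The helper names and the lower bound of 1 are assumptions. One plausible reason for the strict limit, which the commit message does not state, is that 2048 MB is exactly 2^31 bytes and so no longer fits in a signed 32-bit integer.

```python
MAX_DIRTY_MB_LIMIT = 2048  # exclusive upper bound from the commit message


def is_valid_max_dirty_mb(value_mb):
    """Accept only values strictly below the limit (lower bound assumed)."""
    return 0 < value_mb < MAX_DIRTY_MB_LIMIT


def dirty_bytes(value_mb):
    """Convert a megabyte setting to bytes."""
    return value_mb << 20
```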

Signed-off-by: Gregoire Pichon <gregoire.pichon@bull.net>
Change-Id: I66a0927c69749cdbb9cd48459af67a57c3e25af0
Reviewed-on: http://review.whamcloud.com/16652
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
4 years agoLU-7191 test: sanity 27z failed on DNE with ZFS 05/16605/2
Lai Siyao [Wed, 23 Sep 2015 06:35:55 +0000 (14:35 +0800)]
LU-7191 test: sanity 27z failed on DNE with ZFS

Because the ZFS osd always uses LPU64 as the OST object name, the
sanity 27z function check_seq_oid() should follow this convention
when checking the object; otherwise it will not find the OST object.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: I28f9a2db291f89c16bd142886c195da64c6817bb
Reviewed-on: http://review.whamcloud.com/16605
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
4 years agoLU-6401 lustre: make lustre_user.h compile in user space 94/16494/5
James Simmons [Thu, 24 Sep 2015 14:16:32 +0000 (10:16 -0400)]
LU-6401 lustre: make lustre_user.h compile in user space

While building my own Lustre-dependent application on
Ubuntu I discovered that lustre_user.h does not compile
because libcfs/types.h is not packaged. So instead of
fixing the packaging, just remove the libcfs/types.h
dependency from lustre_user.h, which needs to be done
anyway.

Change-Id: Iecb62ca3b1aa727d1b7a01132e7074ee9079d0d4
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: http://review.whamcloud.com/16494
Reviewed-by: Frank Zago <fzago@cray.com>
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7034 obd: Remove dead code in precleanup 61/16061/7
Henri Doreau [Mon, 24 Aug 2015 10:03:51 +0000 (12:03 +0200)]
LU-7034 obd: Remove dead code in precleanup

There used to be several pre-cleanup phases, but
only OBD_CLEANUP_EXPORTS is actually used.  Thus
remove the whole notion of precleanup phases.

Signed-off-by: Henri Doreau <henri.doreau@cea.fr>
Change-Id: Id1b0922b5d2637aebd409a612c906fe9e15f00d6
Reviewed-on: http://review.whamcloud.com/16061
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-4065 tests: hsm copytool_cleanup improvement 83/12783/5
Sergey Cheremencev [Sun, 16 Nov 2014 12:01:35 +0000 (16:01 +0400)]
LU-4065 tests: hsm copytool_cleanup improvement

hsm shutdown from copytool_cleanup could race with
cdt_set_mount_state enabled, because set_param -P
doesn't wait for the "params" configuration to be
retrieved from the server and applied.

Xyratex-bug-id: MRP-2037
Change-Id: I2f274c933986439deae04cd252b4dd9c8442ef1f
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@seagate.com>
Reviewed-on: http://review.whamcloud.com/12783
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-5465 build: strengthen Lustre DKMS RPM install 76/11776/9
Bruno Faccini [Fri, 5 Sep 2014 15:13:18 +0000 (17:13 +0200)]
LU-5465 build: strengthen Lustre DKMS RPM install

This patch adds more control in Lustre DKMS RPM to take care that
its further install+build will not conflict with legacy lustre-osd
and lustre-modules RPMs already installed.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: Ie35e96326811f239d614aea19ae179b73c5961f3
Reviewed-on: http://review.whamcloud.com/11776
Tested-by: Jenkins
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7205 osp: prohibit multi inflight RPCs for same object 51/16651/2
Niu Yawei [Mon, 28 Sep 2015 08:40:46 +0000 (04:40 -0400)]
LU-7205 osp: prohibit multi inflight RPCs for same object

We should prohibit multiple in-flight sync RPCs for the same object;
otherwise, the object could be set with stale ownership if the
RPCs arrive at the OST in reverse order.
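
One way to picture the restriction is a gate that admits at most one in-flight sync RPC per object. This is a sketch with hypothetical names (SyncRpcGate, try_send, complete), not the osp implementation, which may queue or merge the later request rather than refuse it.

```python
class SyncRpcGate:
    """Admit at most one in-flight sync RPC per object."""

    def __init__(self):
        self._in_flight = set()

    def try_send(self, object_id):
        """Return True if the RPC may be sent now, False if one is pending."""
        if object_id in self._in_flight:
            return False
        self._in_flight.add(object_id)
        return True

    def complete(self, object_id):
        """Mark the RPC for this object as finished."""
        self._in_flight.discard(object_id)
```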

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: I8835e493fa04fbc224b2f9840f0bb7b250d5de1d
Reviewed-on: http://review.whamcloud.com/16651
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7065 lod: Fix free of already added target description 41/16441/4
Dmitry Eremin [Wed, 16 Sep 2015 12:16:28 +0000 (15:16 +0300)]
LU-7065 lod: Fix free of already added target description

In lod_add_device() we may free tgt_desc after adding it to the ldt
if an error happens in lod_sub_init_llog() or lfsck_add_target().

Change-Id: Ifb267378db996ae3a86da75b03427fb01eb0d73a
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-on: http://review.whamcloud.com/16441
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6961 ldiskfs: buffer head leak in mmp 72/15872/3
Jadhav Vikram [Thu, 6 Aug 2015 01:41:30 +0000 (07:11 +0530)]
LU-6961 ldiskfs: buffer head leak in mmp

Release bh_check in case of error.

Seagate-bug-id: MRP-2337
Signed-off-by: Jadhav Vikram <jadhav.vikram@seagate.com>
Signed-off-by: Rahul Deshmukh <rahul.deshmukh@seagate.com>
Signed-off-by: Andriy Skulysh <andriy.skulysh@seagate.com>
Change-Id: I818dbaa22d61e1cc7e66f97c218333e39c6c8afa
Reviewed-on: http://review.whamcloud.com/15872
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7187 jobstats: confine the output of jobid to single line 93/16593/3
Niu Yawei [Tue, 22 Sep 2015 02:44:11 +0000 (22:44 -0400)]
LU-7187 jobstats: confine the output of jobid to single line

Replace non-printable characters with '?' when displaying the
jobid via the proc file, so that the jobid output is confined
to a single line and does not break the YAML indentation.
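
The substitution is simple to sketch. The Python function below is only an illustration with a hypothetical name, and it assumes that "printable" means the ASCII range 0x20 to 0x7e.

```python
def printable_jobid(jobid: str) -> str:
    """Replace every non-printable character with '?'."""
    return "".join(ch if 32 <= ord(ch) < 127 else "?" for ch in jobid)
```

A jobid containing a newline would otherwise spill onto a second line of the YAML-formatted output.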

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: Ic4e0e6196a13b46f20d96ccce7705c62674f2440
Reviewed-on: http://review.whamcloud.com/16593
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6741 osp: Pack small request inline 53/16353/7
wang di [Tue, 8 Sep 2015 14:41:37 +0000 (07:41 -0700)]
LU-6741 osp: Pack small request inline

Pack small requests inline instead of using bulk
transfer, to save space and RPC round trips.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I9ca71d3c7634c6c82ce0be7ad4f2d54e8f967e19
Reviewed-on: http://review.whamcloud.com/16353
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-5951 ptlrpc: track unreplied requests 73/15473/20
Niu Yawei [Thu, 2 Jul 2015 15:46:22 +0000 (11:46 -0400)]
LU-5951 ptlrpc: track unreplied requests

The request xid was used to make sure that the OST object timestamps
are updated properly by out-of-order setattr/punch/write requests.
However, this mechanism was broken by the multiple rcvd slot
feature, which deferred the xid assignment from request
packing to request sending.

This patch moved back the xid assignment to request packing, and
the manner of finding lowest unreplied xid is changed from scan
sending & delay list to scan a unreplied requests list.

This patch also skipped packing the known replied XID in connect
and disconnect request, so that we can make sure the known replied
XID is increased only on both server & client side.
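
The bookkeeping described above can be sketched as follows. This is a
simplified Python model, not the ptlrpc implementation: the class and
method names are hypothetical, and treating the "known replied XID" as
"lowest unreplied xid minus one" is an assumption of the sketch.

```python
import bisect


class UnrepliedTracker:
    """Toy model: xids are assigned at packing time and tracked until replied."""

    def __init__(self):
        self._next_xid = 1
        self._unreplied = []  # xids still waiting for a reply, kept sorted

    def pack_request(self) -> int:
        # The xid is assigned when the request is packed, not when it is sent.
        xid = self._next_xid
        self._next_xid += 1
        # xids only grow, so appending keeps the list sorted.
        self._unreplied.append(xid)
        return xid

    def reply_received(self, xid: int) -> None:
        # Replies may arrive out of order; remove the xid wherever it sits.
        i = bisect.bisect_left(self._unreplied, xid)
        if i < len(self._unreplied) and self._unreplied[i] == xid:
            del self._unreplied[i]

    def known_replied_xid(self) -> int:
        # Assumption: every xid below the lowest unreplied one has been replied.
        if self._unreplied:
            return self._unreplied[0] - 1
        return self._next_xid - 1
```

With three requests packed and only the second one replied, the known
replied xid stays at 0 in this model until the first reply arrives.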

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: Ib079b2029680934a4c448da766bf0e42d580be26
Reviewed-on: http://review.whamcloud.com/15473
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Grégoire Pichon <gregoire.pichon@bull.net>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6271 osc: further OSC cleanup after eviction 56/16456/8
Jinshan Xiong [Wed, 16 Sep 2015 18:47:20 +0000 (11:47 -0700)]
LU-6271 osc: further OSC cleanup after eviction

A few problems are fixed in this patch:
1. an ldlm lock could be canceled simultaneously by the ldlm bl thread and
  cleanup_resource(). In this case, only one side will win the race
  and the other side should wait for the work to complete;
2. in lov_io_iter_init(), if cl_io_iter_init() against sub io fails,
  it should call cl_io_iter_fini() to clean up leftover information;
3. define osc_lru_reserve() and osc_lru_unreserve() to reserve LRU
  slots in osc_io_write_iter_init() and unreserve them in fini();
4. eviction on group lock is well supported;
5. cleanups

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: I293770b62e177a9ecefe0b4e05f3a8f44b1c831d
Reviewed-on: http://review.whamcloud.com/16456
Tested-by: Jenkins
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6969 osd: remove agent inodes in a separate transaction 24/15924/8
Alex Zhuravlev [Mon, 10 Aug 2015 08:11:01 +0000 (11:11 +0300)]
LU-6969 osd: remove agent inodes in a separate transaction

Create a separate list of agent inodes that need to be deleted,
and delete them after the main transaction has been completed.
Otherwise the number of transaction credits needed to delete
these agents is not accounted for in the main transaction and may
trigger assertions in the credit accounting.

Change-Id: Idefce3304d070c5a14de55054d95a57767a5954d
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: http://review.whamcloud.com/15924
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-3534 tests: Add dne-2.5 upgrade test 75/15275/14
wang di [Wed, 29 Jul 2015 11:18:43 +0000 (04:18 -0700)]
LU-3534 tests: Add dne-2.5 upgrade test

Add extra dne tests in conf-sanity.sh 32c to verify
dne upgrade from 2.5 DNE ldiskfs images to 2.7 DNE.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I5b15cdee3b125ebe264b867f7141672159e22b8d
Reviewed-on: http://review.whamcloud.com/15275
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6950 utils: support SELinux context labelling 40/15840/6
Andrew Wellington [Tue, 4 Aug 2015 05:12:24 +0000 (15:12 +1000)]
LU-6950 utils: support SELinux context labelling

SELinux contexts are applied by the kernel if mount options are
not binary. As we don't use any binary mount options in Lustre,
remove the binary mount option flag.

Signed-off-by: Andrew Wellington <andrew.wellington@anu.edu.au>
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I363886f58939c1f7384de2ff579968a19f1460bc
Reviewed-on: http://review.whamcloud.com/15840
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Sebastien Buisson <sebastien.buisson@bull.net>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-5969 lustreapi: remove obsolete lctl dump_cfg code 57/15857/4
Andreas Dilger [Tue, 22 Sep 2015 15:24:15 +0000 (11:24 -0400)]
LU-5969 lustreapi: remove obsolete lctl dump_cfg code

Now that Lustre has utilities that can read Lustre record
logs from user space, we no longer need the ability to
dump the log of recorded commands to the kernel dump log.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I8b3b71ff394a1f81e4b2396449e0f6879c2b5623
Reviewed-on: http://review.whamcloud.com/15857
Reviewed-by: frank zago <fzago@cray.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Tested-by: Jenkins
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7076 ptlrpc: Uninitialized rc in ptlrpc_server_hpreq_init 27/16327/2
Giuseppe Di Natale [Fri, 4 Sep 2015 15:16:12 +0000 (08:16 -0700)]
LU-7076 ptlrpc: Uninitialized rc in ptlrpc_server_hpreq_init

'rc' was not initialized and could potentially not be set.
The return code is used to determine the priority of the rpc
call. Assume normal priority (rc = 0) to begin with.

Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Change-Id: Iff4a4b8bf78151cbd1e3b7218da7551b2039838a
Reviewed-on: http://review.whamcloud.com/16327
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6432 libcfs: lock-class for cfs_percpt_lock 68/14368/8
Liang Zhen [Tue, 15 Sep 2015 13:44:54 +0000 (09:44 -0400)]
LU-6432 libcfs: lock-class for cfs_percpt_lock

Initialise the lock class for each sublock of cfs_percpt_lock
to eliminate the false alarm "possible recursive locking detected".

Signed-off-by: Liang Zhen <liang.zhen@intel.com>
Change-Id: I29467e3a21560ff4bb5127ea686dea4f6acfd9a2
Reviewed-on: http://review.whamcloud.com/14368
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7178 quota: fixed incorrect memset 82/16482/4
Frank Zago [Fri, 18 Sep 2015 17:04:42 +0000 (12:04 -0500)]
LU-7178 quota: fixed incorrect memset

The memset was done on a structure using the size of another unrelated
structure.

Added a few cosmetic changes: removed an extra word in function
description and fixed a couple formatting issues.

Signed-off-by: frank zago <fzago@cray.com>
Change-Id: Ib1da9292037b1e5ef10c93f2fd871488861bd05e
Reviewed-on: http://review.whamcloud.com/16482
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7174 build: add build by products to .gitignore 58/16458/2
James Simmons [Wed, 16 Sep 2015 20:52:43 +0000 (16:52 -0400)]
LU-7174 build: add build by products to .gitignore

While testing patches, build byproducts unrelated to the patch
show up in git status. To avoid adding these by accident,
list these byproduct files in the proper .gitignore
files.

Change-Id: Ie2df9c2c7fd19c95e2b990d93db623826ee82c24
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: http://review.whamcloud.com/16458
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7132 osd-ldiskfs: speedup rewrites 60/16360/4
Andrew Perepechko [Thu, 10 Sep 2015 13:08:58 +0000 (16:08 +0300)]
LU-7132 osd-ldiskfs: speedup rewrites

This patch slightly speeds up rewrites on OST
by replacing bmap calls with fiemap calls.

This patch also includes a fiemap deadlock fix
created by Alexey Lyashkov.

Change-Id: I8af6350a0049a14a3e29304087064ecdffc1be89
Signed-off-by: Andrew Perepechko <andrew.perepechko@seagate.com>
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@seagate.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@seagate.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@seagate.com>
Xyratex-bug-id: MRP-2559
Xyratex-bug-id: MRP-2688
Reviewed-on: http://review.whamcloud.com/16360
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
4 years agoLU-6589 llapi: ..._layout_pattern_set() rejects valid patterns 84/14784/4
Ned Bass [Tue, 12 May 2015 18:07:59 +0000 (11:07 -0700)]
LU-6589 llapi: ..._layout_pattern_set() rejects valid patterns

A typo in the input validation code causes llapi_layout_pattern_set()
to reject valid pattern values. Correct the typo and add related test
coverage in llapi_layout_test.c.

Signed-off-by: Ned Bass <bass6@llnl.gov>
Change-Id: I676a98b63d61fca114eacd882a19abce6f2cc857
Reviewed-on: http://review.whamcloud.com/14784
Reviewed-by: frank zago <fzago@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6215 llite: handle kernel symlink api changes in 4.2+ 76/16376/4
James Simmons [Tue, 15 Sep 2015 22:13:20 +0000 (18:13 -0400)]
LU-6215 llite: handle kernel symlink api changes in 4.2+
 kernels

Starting with 4.2 kernels, the inode operations that handle
symlinks, follow_link() and put_link(), no longer take
struct nameidata as an argument. This patch handles this
change.

Change-Id: I204daa1469f6661ced3a519873e1aa26463c8c72
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: http://review.whamcloud.com/16376
Tested-by: Jenkins
Reviewed-by: frank zago <fzago@cray.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6214 llite: tar restore fails for HSM released files. 60/16060/13
Aditya Pandit [Mon, 31 Aug 2015 10:32:00 +0000 (16:02 +0530)]
LU-6214 llite: tar restore fails for HSM released files.

If you create a file, then archive and release it, only a link
remains, with all information kept in xattrs. If you tar the file
with --xattr, the same striping information and link information
are stored in the tar archive. If you then delete the file, the
stored file and archive state no longer make sense. Restoring the
file from tar with an xattr that has the RELEASED flag turned on is
not correct, because this is a new file. Hence, ignore the HSM xattr
and mask out the "RELEASED" flag for files that are not archived.
Added a test case for this.
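
The masking step amounts to clearing one bit when the file has no
archive behind it. A minimal Python sketch follows; the constant names
and values are hypothetical placeholders chosen for illustration, not
Lustre's real definitions, and the function name is likewise invented.

```python
# Hypothetical flag values for illustration only.
PATTERN_RAID0 = 0x001
PATTERN_F_RELEASED = 0x80000000


def pattern_for_restore(pattern: int, is_archived: bool) -> int:
    """Return the layout pattern to apply when restoring from an xattr."""
    if is_archived:
        # An archived file may legitimately stay released.
        return pattern
    # A new, unarchived file has no archive copy to restore from,
    # so the RELEASED bit is cleared while all other bits are kept.
    return pattern & ~PATTERN_F_RELEASED


restored = pattern_for_restore(PATTERN_RAID0 | PATTERN_F_RELEASED, is_archived=False)
print(hex(restored))  # prints: 0x1
```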

Change-Id: I0ca8636bf508211d63ba796c3b756c9cb4b42d48
Signed-off-by: Aditya Pandit <panditadityashreesh@yahoo.com>
Reviewed-on: http://review.whamcloud.com/16060
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: frank zago <fzago@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6813 llite: omit to update wire data 62/16462/2
Bobi Jam [Thu, 17 Sep 2015 02:45:33 +0000 (10:45 +0800)]
LU-6813 llite: omit to update wire data

In ll_setattr_raw(), after op_data->op_attr has been copied, the attr
is updated and op_data->op_attr does not get updated afterward.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: I85b94a8ddc62184bfbcb128bd90f88ac03837e46
Reviewed-on: http://review.whamcloud.com/16462
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6578 statahead: drop support for remote entry 67/15767/12
Lai Siyao [Tue, 28 Jul 2015 02:44:55 +0000 (10:44 +0800)]
LU-6578 statahead: drop support for remote entry

This patch drops support for remote entry statahead, because it
needs 2 async RPCs to fetch both the LOOKUP lock from the parent MDT
and the UPDATE lock from client MDT, which is complicated. Not
supporting remote entry statahead does not cause any issues.

* pack child fid in statahead request.
* lmv_intent_getattr_async() will compare parent and child MDT,
  if child is remote, return -ENOTSUPP.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: I8c075bab0a716f194eac3c338ffbdd37f787eff6
Reviewed-on: http://review.whamcloud.com/15767
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7063 llog: fix leak of lock in llog_ost_destroy() 71/16471/2
Bob Glossman [Thu, 17 Sep 2015 18:34:01 +0000 (11:34 -0700)]
LU-7063 llog: fix leak of lock in llog_ost_destroy()

Previous fix
http://review.whamcloud.com/#/c/15730/7/lustre/obdclass/llog_osd.c.
introduced the leak of a lock in the case of an error return from
llog_osd_dir_get().  This fixes that error path.

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I38f709c5805e23322de988065c83e1a8079bded6
Reviewed-on: http://review.whamcloud.com/16471
Tested-by: Jenkins
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
4 years agoLU-6273 lwp: notify LWP users in dedicated thread 03/16303/8
Niu Yawei [Tue, 8 Sep 2015 13:41:58 +0000 (09:41 -0400)]
LU-6273 lwp: notify LWP users in dedicated thread

On OST/MDT mount, the client config log will be processed to set up
the LWP as follows:

1> Process the LCFG_ADD_UUID record to set up the LWP device, then connect
   to the server target. (see lustre_lwp_setup());
2> Process the LCFG_ADD_CONN record to add the failover connection;

We can see that if the mount process is blocked in step 1, it will
never have a chance to add the failover connection, and the LWP will
never be able to switch to the failover node.

Unfortunately, the callbacks for the FLD user could block step 1.
See ofd/mdt_register_lwp_callback(), which calls fld_client_rpc(), which
in turn sends the FLD RPC in an endless loop if the connection isn't
available.

This patch solves the problem by using a dedicated thread per LWP to
run the notify callbacks.
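
The idea of moving potentially blocking callbacks off the mount path can
be sketched in Python as below. This is only a model of the pattern: the
LwpNotifier class, its methods, and the event names are hypothetical and
do not correspond to Lustre symbols.

```python
import queue
import threading


class LwpNotifier:
    """Runs notify callbacks on a dedicated thread so callers never block."""

    def __init__(self, callbacks):
        self._callbacks = list(callbacks)
        self._events = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def notify(self, event):
        # Only enqueues the event; returns immediately to the caller.
        self._events.put(event)

    def stop(self):
        # None is the shutdown marker; wait until queued events are handled.
        self._events.put(None)
        self._thread.join()

    def _run(self):
        while True:
            event = self._events.get()
            if event is None:
                return
            for callback in self._callbacks:
                callback(event)  # may block without stalling notify() callers


failover_added = threading.Event()
handled = []


def fld_callback(event):
    # Stands in for a callback that cannot finish until failover is set up.
    failover_added.wait()
    handled.append(event)


notifier = LwpNotifier([fld_callback])
notifier.notify("connect")  # step 1 returns even though the callback blocks
failover_added.set()        # step 2 can therefore still run
notifier.stop()
print(handled)              # prints: ['connect']
```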

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Signed-off-by: wang di <di.wang@intel.com>
Change-Id: Ic0d89f1524ea0c1a3e7fc3833e16ecbad2123454
Reviewed-on: http://review.whamcloud.com/16303
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Jenkins
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoNew tag 2.7.60 2.7.60 v2_7_60 v2_7_60_0
Oleg Drokin [Tue, 22 Sep 2015 00:53:42 +0000 (20:53 -0400)]
New tag 2.7.60

Change-Id: I051a911a3ca030794dfc16d9571076190aabd713
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7175 misc: update ChangeLog for e2fsprogs-1.42.13.wc3 57/16457/2
Andreas Dilger [Wed, 16 Sep 2015 19:57:14 +0000 (13:57 -0600)]
LU-7175 misc: update ChangeLog for e2fsprogs-1.42.13.wc3

Update lustre/ChangeLog to recommend e2fsprogs-1.42.13.wc3.
This release removes the old lfsck tool, which is no longer used.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I33521c4f2140862b1b9aca938b40e8b23380e1c6
Reviewed-on: http://review.whamcloud.com/16457
Tested-by: Jenkins
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
4 years agoLU-5569 ptlrpc: change reverse import life cycle 50/11750/15
Alexey Lyashkov [Fri, 5 Jun 2015 15:46:48 +0000 (11:46 -0400)]
LU-5569 ptlrpc: change reverse import life cycle

Make the reverse import on the server have the same life cycle as a
client import; otherwise, disconnecting the reverse import on each client
reconnect opens several races in the request sending (mostly AST) code.

The first problem is the send_rpc vs class_destroy_import() race. If an
RPC is sent (or resent) after the class_destroy_import() function, it
will fail due to the generation check.

The second problem is resending via a different router (to a different nid).
The target_handle_connect() function doesn't update the connection
information for the older reverse import, so if the connection information
or security flavor has changed we won't be able to deliver an RPC
from the server to the client.

The third problem is that connection flags aren't updated atomically
for an import. The target_handle_connect() function connects the new
import before the message header flags are set, so if we send an RPC
in that time it may have the wrong flags.

The fourth and final problem is that nothing wakes up an older RPC when
the client reconnects, so it cannot be resent after a network flap. This
was not a problem without Vitaly's "resend AST callbacks" patch
(commit 30be03b4dd59389) as it was not possible to resend RPCs.
Now, however, this problem results in failing to resend ASTs at
all, or adding a long timeout to AST RPCs.

Xyratex-bug-id: MRP-2038
Signed-off-by: Alexey Lyashkov <alexey_lyashkov@xyratex.com>
Change-Id: I5dd65a0a507827d6a43683dedbbc0cee263ee0d0
Reviewed-on: http://review.whamcloud.com/11750
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7042 lnet: Handle OFED 3.18 packaging definitions 18/16418/3
Dmitry Eremin [Mon, 14 Sep 2015 19:34:52 +0000 (22:34 +0300)]
LU-7042 lnet: Handle OFED 3.18 packaging definitions

Starting with OFED 3.18, the OFED package started to use the config.h
provided by the autoconf tool. This led to a clash between PACKAGE_*
macros which are defined in both OFED and Lustre headers.

Also, dealing with kernels that already have Lustre enabled
requires undefining the common macros that are used by both the
in-kernel Lustre client and these sources.

This fix undefines all symbols that are generated by Lustre autoconf
to avoid conflicts with kernel defines or OFEDs. They are undefined
right before the new definition in config.h. The list of symbols to undef
is automatically generated by autoconf and should not be extended in
the future.

Also undefine clashing macros in autoconf checks.

Change-Id: I0d93adac19573328e905ba536db0dbd842ed2aab
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-on: http://review.whamcloud.com/16418
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7078 llite: reset md->lmv to NULL 82/16382/3
wang di [Thu, 10 Sep 2015 04:06:11 +0000 (21:06 -0700)]
LU-7078 llite: reset md->lmv to NULL

ll_update_lsm_md() should reset md->lmv to NULL
right after it is assigned to lli_lsm_md, otherwise
it might be double freed if failure happens.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I4f069e3445a957860c2853c6f32104885edc33fa
Reviewed-on: http://review.whamcloud.com/16382
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7096 nrs: serialize executions of nrs_policy_stop 14/16214/4
Henri Doreau [Thu, 3 Sep 2015 11:38:40 +0000 (13:38 +0200)]
LU-7096 nrs: serialize executions of nrs_policy_stop

Do not release nrs_lock in nrs_policy_stop0 to prevent op_policy_stop()
from being executed concurrently.

Signed-off-by: Henri Doreau <henri.doreau@cea.fr>
Change-Id: Ie42793021aa47ff7e2c14eb58b3d6e8405fa8407
Reviewed-on: http://review.whamcloud.com/16214
Tested-by: Jenkins
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Emoly Liu <emoly.liu@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7014 osd: add additional credits for generic IAM delete 13/16213/3
Alex Zhuravlev [Thu, 3 Sep 2015 10:46:11 +0000 (13:46 +0300)]
LU-7014 osd: add additional credits for generic IAM delete

IAM is different from the regular htree used by regular directories:
it can recycle and shrink blocks, which needs additional credits.

Change-Id: I819d26206b158a0e8ef2a9110ba89fe34e5a1925
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: http://review.whamcloud.com/16213
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7090 deb: fix wrong IB path for configure 83/16183/5
Wang Shilong [Tue, 15 Sep 2015 13:47:11 +0000 (09:47 -0400)]
LU-7090 deb: fix wrong IB path for configure

There are two problems:

First, we need to check whether O2IBPATH is valid before using it;
if not, assign '--with-o2ib=no' instead.

Second, the macro O2IBPATHS might be "$LINUX $LINUX/drivers/infiniband",
but --with-o2ib expects only a single string assignment here.
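
The two checks can be modeled with a short Python sketch. The real change
lives in the Debian packaging scripts; the function name below is
hypothetical, and picking the first valid candidate from the
space-separated list is an assumption of this example rather than the
documented behavior of the patch.

```python
import os


def with_o2ib_arg(o2ib_paths: str, is_valid=os.path.isdir) -> str:
    """Build a single --with-o2ib argument from a space-separated path list."""
    for path in o2ib_paths.split():
        # Pass exactly one path, and only if it passes the validity check.
        if is_valid(path):
            return f"--with-o2ib={path}"
    # No usable path: disable o2ib explicitly instead of passing a bad value.
    return "--with-o2ib=no"


# Example with a stubbed validity check so it runs on any machine.
candidates = "/usr/src/linux /usr/src/linux/drivers/infiniband"
print(with_o2ib_arg(candidates, is_valid=lambda p: p.endswith("infiniband")))
# prints: --with-o2ib=/usr/src/linux/drivers/infiniband
```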

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: I86172056c57ec8e649c2e56455369e66bbbe3513
Reviewed-on: http://review.whamcloud.com/16183
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6995 osd: fix invalid use of bvec_iter_page 67/15967/2
Frank Zago [Wed, 12 Aug 2015 16:44:24 +0000 (11:44 -0500)]
LU-6995 osd: fix invalid use of bvec_iter_page

bvec_iter_page expects a list of biovecs as its first argument, not a
biovec. That works as long as there is only one biovec in the bio. If
there is more than one, then invalid memory is dereferenced since bvl
is a single vector, not a list. This only affects 3.14+ kernels, where
HAVE_BVEC_ITER is true. Don't use bvec_iter_page, but create the right
macro to simply return the vector's bv_page. Bug introduced in commit
833b670a.

Signed-off-by: frank zago <fzago@cray.com>
Change-Id: I390b527bacae9dd650814bb19ab2dbc9184a605d
Reviewed-on: http://review.whamcloud.com/15967
Tested-by: Jenkins
Reviewed-by: Patrick Farrell <paf@cray.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6798 kernel: kernel update [SLES11 SP3 3.0.101-0.47.55] 93/15493/14
Yang Sheng [Fri, 3 Jul 2015 22:01:43 +0000 (06:01 +0800)]
LU-6798 kernel: kernel update [SLES11 SP3 3.0.101-0.47.55]

Update SLES11 SP3 kernel to 3.0.101-0.47.55.

Test-Parameters: envdefinitions=SANITY_EXCEPT=170 \
mdsdistro=sles11sp3 ossdistro=sles11sp3 \
clientdistro=sles11sp3 mdsfilesystemtype=ldiskfs \
mdtfilesystemtype=ldiskfs ostfilesystemtype=ldiskfs \
testgroup=review-ldiskfs

Signed-off-by: Yang Sheng <yang.sheng@intel.com>
Change-Id: Ib1932447f8d2a507a260e76a39df53ef9bea3a67
Reviewed-on: http://review.whamcloud.com/15493
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6772 tests: fix malform open in sanity:33d 15/16315/4
Yang Sheng [Tue, 8 Sep 2015 15:52:22 +0000 (23:52 +0800)]
LU-6772 tests: fix malform open in sanity:33d

The malformed open was originally intended to check whether a panic
occurs. Although we send malformed open flags, the open may still
succeed, so we should ignore the return value, as test_33c does.

Signed-off-by: Yang Sheng <yang.sheng@intel.com>
Change-Id: Ibc20e4864068f07dac30739792dc09e3844b7579
Reviewed-on: http://review.whamcloud.com/16315
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7102 tests: fix replay-dual.sh test_26 for MDSCOUNT=1 14/16414/3
Andreas Dilger [Mon, 14 Sep 2015 19:07:41 +0000 (13:07 -0600)]
LU-7102 tests: fix replay-dual.sh test_26 for MDSCOUNT=1

The replay-dual.sh test_26 code could never pass for MDSCOUNT=1
since it was returning a "false" condition for the first
conditional and always tripping the error message:

[ $MDSCOUNT -ge 2 ] && {set default dirstripe} || error

Instead, make {set default dirstripe} a sub-clause of the conditional.
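
Python's "and"/"or" short-circuit in the same way as the shell's && and
||, so the pitfall can be modeled as follows. This is an analogy only,
not the test script itself; the function names are hypothetical stand-ins
for the shell commands.

```python
def set_default_dirstripe(log):
    log.append("set default dirstripe")
    return True  # stands in for a command that succeeds


def error(log):
    log.append("error")
    return False


def setup_old(mdscount):
    """Mirrors: [ $MDSCOUNT -ge 2 ] && {set default dirstripe} || error"""
    log = []
    # When mdscount < 2 the left side is false, so error() always runs.
    (mdscount >= 2 and set_default_dirstripe(log)) or error(log)
    return log


def setup_fixed(mdscount):
    """The action and its error check are nested inside the conditional."""
    log = []
    if mdscount >= 2:
        set_default_dirstripe(log) or error(log)
    return log


print(setup_old(1))    # prints: ['error']
print(setup_fixed(1))  # prints: []
```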

Fix cleanup_26 to kill proper dbench pid.

Clean up some code style in this test.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I0849318aa59e5698efb27b730a69e2f1b4e2d181
Reviewed-on: http://review.whamcloud.com/16414
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6846 llog: create remote llog synchronously 33/16333/5
wang di [Mon, 7 Sep 2015 15:27:34 +0000 (08:27 -0700)]
LU-6846 llog: create remote llog synchronously

Create the remote llog synchronously, because llog_create
for a remote object only packs the RPC in the buffer;
the real llog object is not created until
transaction stop. If another thread happens to use
this llog object and sends an RPC before the creation,
that might cause a failure.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I6c806174381b87836b1f6dd833cda50f0ab2d168
Reviewed-on: http://review.whamcloud.com/16333
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-5344 llite: lookup master inode by ilookup5_nowait 66/16066/8
wang di [Sat, 22 Aug 2015 20:54:52 +0000 (13:54 -0700)]
LU-5344 llite: lookup master inode by ilookup5_nowait

Do not look up the master inode with ilookup5; use
ilookup5_nowait instead, otherwise it will cause a deadlock:

1. Client1 sends a chmod request to MDT0. MDT0 then
enqueues locks on the master and all of its slaves (mdt_attr_set()
->mdt_lock_slaves()). After getting the master and stripe0 locks,
it sends the enqueue request (for stripe1) to MDT1, and
MDT1 finds that the lock has been granted to client2. MDT1 then
sends a blocking AST to client2.

2. At the same time, client2 tries to unlink the striped
dir (rm -rf striped_dir). During lookup, it holds
the master inode of the striped directory, whose inode state
is NEW, and then tries to revalidate all of its slaves
(ll_prep_inode()->ll_iget()->ll_read_inode2()->
ll_update_inode()). It is blocked on the server
side because of step 1.

3. The client then gets the blocking AST request and cancels the
lock, but is blocked by ilookup5 in ll_md_blocking_ast(),
because the inode state is still NEW.

Add test_90/91 in sanityn.sh to verify the deadlock.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I8ce88595998dc35b6165951873192a65674bf3a7
Reviewed-on: http://review.whamcloud.com/16066
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6485 libcfs: embed kr_data into kkuc_reg 38/14638/13
Hongchao Zhang [Tue, 16 Jun 2015 14:17:48 +0000 (22:17 +0800)]
LU-6485 libcfs: embed kr_data into kkuc_reg

In struct kkuc_reg, "kr_data" is difficult to free
outside of libcfs, so it is better to make it
inline data instead of a data pointer.

Change-Id: Iaf5e9fbb9bad2540f51da2c4fd9c4047640d0877
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: http://review.whamcloud.com/14638
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: frank zago <fzago@cray.com>
Reviewed-by: Robert Read <robert.read@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-6978 utils: mkfs.lustre to recognise non ldiskfs opts 37/15937/3
Alexander Zarochentsev [Sun, 9 Aug 2015 20:50:04 +0000 (23:50 +0300)]
LU-6978 utils: mkfs.lustre to recognise non ldiskfs opts

After "LU-6030 osd-ldiskfs: improve mount option handling" landed,
mkfs.lustre lost the ability to store non-ldiskfs persistent options, because
their support was stripped out of the ldiskfs layer.
This patch makes the ldiskfs mount independent of the mount options in the
mkfs.lustre command string.

Change-Id: I63e2efb84249eae8294ce33a72894aeb52563ad5
Xyratex-bug-id: MRP-2819
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@seagate.com>
Reviewed-on: http://review.whamcloud.com/15937
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@seagate.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-5297 osp: process unsuccessful osp sync records properly 25/14925/7
Emoly Liu [Fri, 21 Aug 2015 15:48:58 +0000 (23:48 +0800)]
LU-5297 osp: process unsuccessful osp sync records properly

Unsuccessful records can be classified into two types: failed and
invalid. They should be handled differently.

This patch improves the handling process by the following fixes.

First, correct the return values of osp_sync_new_xxx_job() to separate
the record types:
- 0 on success
- negative on error
- 1 on invalid record

Second, process these two types of records differently:
- When a failed record is processed, opd_syn_{changes,rpc_in_flight,
  rpc_in_progess} should be decreased, and opd_syn_last_processed_id
  should be bumped.
- When an invalid record is processed, the steps above should be
  taken and, in addition, the record should be deleted at the end.
  (An unknown record type is treated as an invalid record.)

Third, simplify the record sending process in osp_sync_process_queues():
remove the unnecessary loop waiting and continue processing other
records directly.

Also, OBD_FAIL_OSP_CHECK_INVALID_REC and OBD_FAIL_OSP_CHECK_ENOMEM are
defined and used in sanity.sh test_239a/b respectively to verify the
fix.
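
The return-value convention and the two handling paths can be modeled
with a small Python sketch. The SyncState class and the function name
are hypothetical, the fields only loosely mirror the counters named
above, and "bumping" the last processed id to the record's id is an
assumption of this example.

```python
from dataclasses import dataclass, field


@dataclass
class SyncState:
    changes: int = 0
    rpc_in_flight: int = 0
    rpc_in_progress: int = 0
    last_processed_id: int = 0
    deleted: list = field(default_factory=list)


def complete_unsuccessful(state: SyncState, rec_id: int, rc: int) -> None:
    """Account for a record whose job returned rc < 0 (failed) or 1 (invalid)."""
    if rc == 0:
        raise ValueError("record succeeded; nothing to clean up here")
    # Common path for failed and invalid records: release the accounting.
    state.changes -= 1
    state.rpc_in_flight -= 1
    state.rpc_in_progress -= 1
    state.last_processed_id = max(state.last_processed_id, rec_id)
    # Invalid records are additionally deleted at the end.
    if rc == 1:
        state.deleted.append(rec_id)


state = SyncState(changes=2, rpc_in_flight=2, rpc_in_progress=2)
complete_unsuccessful(state, rec_id=7, rc=-12)  # failed record
complete_unsuccessful(state, rec_id=8, rc=1)    # invalid record
print(state.last_processed_id, state.deleted)   # prints: 8 [8]
```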

Signed-off-by: Emoly Liu <emoly.liu@intel.com>
Change-Id: I9c55f43f160a3d9e51892a2dc2f45a52f9b6f2c8
Reviewed-on: http://review.whamcloud.com/14925
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
4 years agoLU-7093 mdt: Remote operation permission check 86/16286/6
wang di [Fri, 4 Sep 2015 08:48:32 +0000 (01:48 -0700)]
LU-7093 mdt: Remote operation permission check

Only do the permission check for migrate, creating a striped (remote)
directory, and setting the default LMV stripe EA for a directory.

Non-administrators can perform these three operations only if
their gid matches enable_remote_dir_gid (under /proc) or if
enable_remote_dir_gid = -1.
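
The rule reads as a simple predicate. A minimal Python sketch, with a
hypothetical function name and with "administrator" reduced to a boolean
for the sake of the example:

```python
def may_do_remote_op(is_admin: bool, gid: int, enable_remote_dir_gid: int) -> bool:
    """Decide whether migrate / striped mkdir / default stripe EA is allowed."""
    if is_admin:
        return True
    # -1 opens the operations to every group; otherwise the gid must match.
    return enable_remote_dir_gid == -1 or gid == enable_remote_dir_gid


print(may_do_remote_op(False, gid=1000, enable_remote_dir_gid=1000))  # prints: True
print(may_do_remote_op(False, gid=1000, enable_remote_dir_gid=0))     # prints: False
```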

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: Id103ddd4dbf4a1901a32b599639037de8ce58e4a
Reviewed-on: http://review.whamcloud.com/16286
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>