LU-12998 mds: add no_create parameter to stop creates Add a target tunable parameter and mount option "no_create" to disable new *directory* creation on an MDT. This sends the flag OS_STATFS_NOCREATE to the clients, and the DNE MDT space balancing will avoid selecting that MDT when creating a new subdirectory, without disabling access to existing files/dirs. This allows "soft disabling" an MDT in advance of storage upgrades, to minimize the new directories and files created on that MDT and to reduce future migration and/or backup/restore workload. As yet it does not totally disable *file* creation on the MDT, but it may be extended to do so in the future. This is analogous to the "no_precreate" option that was added on the OSTs, and "no_create" has been added to the OSTs for consistency ("no_precreate" is kept for compatibility for now). Test-Parameters: testlist=conf-sanity env=ONLY=112b,ONLY_REPEAT=50 Signed-off-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com> Change-Id: I53cfb48ade2f844b18bfc630e7fcea6de9ce7057 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47124 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
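The commit says only that the DNE space balancing "will avoid selecting" an MDT that advertises OS_STATFS_NOCREATE. The sketch below shows that filtering step under a deliberately simplified assumption: a plain "most free space" choice stands in for Lustre's real weighted QoS balancing, which works differently. struct mdt_info and pick_mdt_for_mkdir() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of what a client learns from each MDT's
 * statfs reply; these are not Lustre's real structures. */
struct mdt_info {
	int			index;		/* MDT index */
	unsigned long long	free_kb;	/* free space from statfs */
	bool			no_create;	/* OS_STATFS_NOCREATE was set */
};

/* Pick the MDT for a new subdirectory. Skip any MDT flagged no_create,
 * then prefer the one with the most free space. Returns the MDT index,
 * or -1 if every MDT has opted out (the caller must then fall back). */
static int pick_mdt_for_mkdir(const struct mdt_info *mdts, size_t count)
{
	int best = -1;
	unsigned long long best_free = 0;
	size_t i;

	assert(mdts != NULL || count == 0);
	for (i = 0; i < count; i++) {
		/* existing files stay reachable; only new dirs avoid it */
		if (mdts[i].no_create)
			continue;
		if (best < 0 || mdts[i].free_kb > best_free) {
			best = mdts[i].index;
			best_free = mdts[i].free_kb;
		}
	}
	return best;
}
```

In this sketch the emptiest MDT (index 1 below) is never chosen once it is flagged, even though it has the most free space. That matches the "soft disable" intent described above.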
LU-16796 ofd: Change struct ofd_seq to use refcount_t This patch changes struct ofd_seq to use refcount_t instead of atomic_t Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com> Change-Id: Ie149a6812671ea872e17d2881e52cf6096d147ff Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52722 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Neil Brown <neilb@suse.de> Reviewed-by: Timothy Day <timday@amazon.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
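The kernel's refcount_t differs from a bare atomic_t mainly because it refuses to wrap. On overflow it saturates and warns, and it also warns on an increment from zero. An object whose count saturates is therefore leaked instead of freed while still in use. The userspace sketch below imitates that behaviour with C11 atomics. INT_MAX is an assumed stand-in for the saturation value (the kernel uses a different constant), and the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed saturation value for this sketch only. */
#define SKETCH_REF_SATURATED INT_MAX

typedef struct {
	atomic_int refs;
} sketch_refcount_t;

static void sketch_ref_inc(sketch_refcount_t *r)
{
	int old = atomic_load(&r->refs);

	do {
		if (old == SKETCH_REF_SATURATED)
			return;		/* pinned: never wraps back to 0 */
		assert(old != 0);	/* inc from 0 means use-after-free */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
}

/* Returns true when the caller dropped the last reference. */
static bool sketch_ref_dec_and_test(sketch_refcount_t *r)
{
	int old = atomic_load(&r->refs);

	do {
		if (old == SKETCH_REF_SATURATED)
			return false;	/* saturated objects are never freed */
		assert(old > 0);
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));
	return old == 1;
}
```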
LU-15671 mds: do not send OST_CREATE transno interop Send OST_CREATE RPCs from the MDS with no_resend and no_delay when communicating with an old OST that does not support OBD_CONNECT2_REPLAY_RESEND. Likewise, the OST should not reply to the MDS RPC with rq_transno set, or this will trigger: osp_precreate_send() ASSERTION(req->rq_transno == 0) failed This can be avoided if the MDS is upgraded before the OSS, but will always be hit if the OSS is upgraded first. After 2.20.53 the MDS/OSS assume that this is always true, since rolling upgrades are unsupported for larger version differences. Test-Parameters: testgroup=rolling-upgrade-oss Fixes: 63e17799a3 ("LU-8367 osp: enable replay for precreation request") Signed-off-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com> Change-Id: I1ab601a2f55540dd75cf24838f7cdb7f823ed42c Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51056 Tested-by: Maloo <maloo@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-11912 ofd: reduce LUSTRE_DATA_SEQ_MAX_WIDTH Reduce LUSTRE_DATA_SEQ_MAX_WIDTH from ~4B to ~32M to limit the number of objects under the /O/[seq]/d[0..31] directories on OSTs. This keeps the directories at an optimal size for ldiskfs and avoids largedir/3-level htree territory. Remove the hard-coded LUSTRE_DATA_SEQ_MAX_WIDTH checks in ofd and make them check seq->lcs_width instead, which is a tunable set to LUSTRE_DATA_SEQ_MAX_WIDTH by default; allow values up to IDIF_MAX_OID if a larger seq width is needed. Use the odbo->o_size in the OST_CREATE RPC reply on ofd to update osp with the current seq width setting. osp then uses this seq width to determine when to roll over to a new seq. The seq will roll over when the seq width is exhausted; the default is LUSTRE_DATA_SEQ_MAX_WIDTH. For seq >= FID_SEQ_NORMAL objects, the upper limit of the seq width is OBIF_MAX_OID. For IDIF/MDT0 objects, the upper limit is IDIF_MAX_OID. The seq FID_SEQ_OST_MDT0 will change to a normal seq after the rollover. Fix osp_precreate_reserve when the last precreated object is at the end of the seq and osp_objs_precreated cannot host all the requested objects, which made the mdt thread get stuck: it wakes up the osp precreate thread in a loop to make progress, but the osp thread will not try to do anything until the seq is used up. This is easier to see when seq->lcs_width is set to a low number and an overstriped file is created with a stripe count larger than seq->lcs_width. Fix the precreate thread spinning when the precreate pool is at the end of the seq and is nearly empty. Change seq->lcs_width to 16384 for all tests in test-framework.sh, except a few slow tests (to avoid timeouts) and some overstriping tests creating LOV_MAX_STRIPE_COUNT stripes (to avoid overstriping creating fewer objects than expected when the precreate pool is at the end of the seq and there are not enough objects). Fix the problem where seq could still change after replay_barrier. 
To achieve this, introduce a new fail_loc OBD_FAIL_OSP_FORCE_NEW_SEQ and force_new_seq/force_new_seq_all to drain the objects in the precreate pool and then roll over to a new seq. This applies to several test suites that use replay_barrier heavily. Change-Id: I2749c1004b7bf3197b691cc94527f90145bcdef8 Signed-off-by: Li Dongyang <dongyangli@ddn.com> Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/38424 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
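The rollover rule above can be reduced to a small allocator. An OST reports a seq width, object IDs 1..width are handed out, and once a request cannot fit in what remains the allocator moves to a new sequence rather than waiting. The sketch below is a hypothetical illustration of that decision only. struct seq_alloc, reserve_oids() and the "next_free_seq" counter (standing in for a real sequence server) are invented, and it does not reproduce osp's actual precreate policy.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-OST allocator state; names are illustrative only. */
struct seq_alloc {
	uint64_t seq;		/* current sequence */
	uint64_t next_oid;	/* next object id to hand out, starts at 1 */
	uint64_t width;		/* seq width reported by the OST (cf. lcs_width) */
	uint64_t next_free_seq;	/* stand-in for a sequence server */
};

/* Reserve 'count' consecutive object ids and return the first one.
 * If the current sequence can no longer hold all of them, roll over
 * to a fresh sequence first so the caller never waits forever. */
static uint64_t reserve_oids(struct seq_alloc *sa, uint64_t count)
{
	uint64_t first;

	assert(count > 0 && count <= sa->width);
	/* ids run 1..width, so the room left is width - next_oid + 1 */
	if (sa->next_oid > sa->width || sa->width - sa->next_oid + 1 < count) {
		sa->seq = sa->next_free_seq++;
		sa->next_oid = 1;
	}
	first = sa->next_oid;
	sa->next_oid += count;
	return first;
}
```

With a width of 4, reserving 3 objects and then 2 forces a rollover on the second call, because only one ID remains in the first sequence.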
LU-16087 lprocfs: add histogram to stats counter Add a histogram to the stats counter. Enable the histogram for read/write_bytes in mdt/obdfilter job stats. Sample job_stats: - job_id: md5sum.0 snapshot_time : 3143196.864165417 secs.nsecs start_time : 3143196.707206168 secs.nsecs elapsed_time : 0.156959249 secs.nsecs read_bytes: { samples: 2, ..., hist: { 32K: 1, 1M: 1 } } write_bytes: { samples: 1, ..., hist: { 1K: 1 } } Signed-off-by: Lei Feng <flei@whamcloud.com> Change-Id: I75b6909c8b63f08b74c3c411ff3dcd27881bb839 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48278 Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Jian Yu <yujian@whamcloud.com> Reviewed-by: Shuichi Ihara <sihara@ddn.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com>
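The sample output labels its buckets with powers of two (1K, 32K, 1M). The commit does not state the exact bucket boundaries, so the sketch below assumes floor-log2 buckets labelled by their lower bound. struct hist and the hist_* helpers are hypothetical names, not the lprocfs API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define HIST_BUCKETS 64

/* Hypothetical histogram: bucket b counts values v with
 * 2^b <= v < 2^(b+1); v == 0 is lumped into bucket 0. */
struct hist {
	uint64_t count[HIST_BUCKETS];
};

static void hist_init(struct hist *h)
{
	memset(h, 0, sizeof(*h));
}

/* floor(log2(v)) for v > 0, and 0 for v == 0 */
static unsigned int hist_bucket(uint64_t v)
{
	unsigned int b = 0;

	while (v >>= 1)
		b++;
	return b;
}

static void hist_add(struct hist *h, uint64_t v)
{
	h->count[hist_bucket(v)]++;
}

/* Label a bucket by its lower bound in binary units: "8", "32K", "1M". */
static void hist_label(unsigned int b, char *buf, size_t len)
{
	static const char units[] = " KMGTPE";
	unsigned int u = b / 10;

	assert(b < HIST_BUCKETS);
	if (u == 0)
		snprintf(buf, len, "%llu", 1ULL << b);
	else
		snprintf(buf, len, "%llu%c", 1ULL << (b % 10), units[u]);
}
```

A 32768-byte read and a 40000-byte read both land in the 32K bucket under this assumption. A 1 MiB read lands in the 1M bucket.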
LU-8837 lustre: remove target declarations from obd.h lu_target.h and obd_target.h are only needed in obd.h for some structs in obd_device.u. We don't really need to mention these structs in the union as they are all quite small. So we can define accessor functions that cast a pointer to the union into the required type, and then we can completely remove these includes from obd.h. Test-Parameters: trivial Signed-off-by: Mr NeilBrown <neilb@suse.de> Change-Id: I9b314b0bfc1baae03ccb8eadf134964ea308f638 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/41952 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com>
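The accessor idea works as follows. The header that owns the union reserves suitably aligned space without naming the target structs. The target header then supplies the struct together with a function that casts the union's storage to it. All names below (obd_device_sketch, obd_target_sketch, obd2target_sketch) are hypothetical, and the 64-byte size is an assumption.

```c
#include <assert.h>

/* What obd.h would keep: a union with enough aligned space, without
 * naming the target-side structs or including their headers. */
struct obd_device_sketch {
	const char *obd_name;
	union {
		long long  align_ll;	/* force alignment for the cast below */
		void      *align_ptr;
		char       space[64];	/* big enough for any small per-type struct */
	} u;
};

/* What a target header would provide instead: the struct itself plus
 * an accessor that casts the union storage to it. */
struct obd_target_sketch {
	int  ot_index;
	long ot_last_rcvd;
};

static_assert(sizeof(struct obd_target_sketch) <=
	      sizeof(((struct obd_device_sketch *)0)->u),
	      "target struct must fit in the union");

static inline struct obd_target_sketch *
obd2target_sketch(struct obd_device_sketch *obd)
{
	/* Like kernel code, this assumes -fno-strict-aliasing semantics if
	 * the same storage is also viewed through another type. */
	return (struct obd_target_sketch *)&obd->u;
}
```

The static_assert replaces the compile-time size guarantee that listing the struct inside the union used to provide implicitly.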
LU-14642 flr: allow layout version update from client/MDS A client write request always carries its layout version so that OFD can reject the request if the carried layout version is stale. This patch makes OFD allow layout version change requests from the client as well as from the MDS. During resync write, all OST objects will have their layout version updated. Signed-off-by: Bobi Jam <bobijam@whamcloud.com> Change-Id: I655044f69a4509a2b0cfe99f86de2ce4ee846979 Reviewed-on: https://review.whamcloud.com/45443 Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15894 ofd: revert range locking in ofd After commit 301d76a711 (LU-14876), range locking is no longer needed in ofd, because 301d76a711 itself prevents the original data corruption fixed by range locking. At the same time, range locking in ofd adds unnecessary overhead, and serialization can even be seen under specific loads. This patch reverts range locking but keeps recovery-small test 148 to exercise the original corruption scenario. Change-Id: Ic795bcfb1e249c4927f66b6bad456f5511819861 Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com> HPE-bug-id: LUS-9890 Fixes: 35679a730 ("LU-10958 ofd: data corruption due to RPC reordering") Reviewed-on: https://review.whamcloud.com/47466 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com> Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-12585 obdfilter: Use actual I/O bytes in stats Currently the obdfilter stats note the number of bytes requested by the client rather than the number of bytes actually read or written. This is particularly confusing for reads because clients can request more data than exists and some applications do this normally. This results in statistics that can be off by almost any amount from the actual number of bytes read. This patch moves the stats to be collected just before commit, which allows the true number of bytes to be recorded but does not include the commit time in the time stats. (Since commit time is not part of the operation latency as experienced by the client.) Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com> Change-Id: I81fe9a6afdad5b48e8421f4aa72b8ef10a0eee93 Reviewed-on: https://review.whamcloud.com/46075 Tested-by: jenkins <devops@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Aurelien Degremont <degremoa@amazon.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-14487 modules: remove references to Sun Trademark. "lustre" is no longer a Trademark of Sun Microsystems. There is no need to acknowledge the trademark in every file, so just remove all these claims. Test-Parameters: trivial Signed-off-by: Mr NeilBrown <neilb@suse.de> Change-Id: I66941494eabc54bedf85079c5b85701187f2a8f1 Reviewed-on: https://review.whamcloud.com/42139 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Aurelien Degremont <degremoa@amazon.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
LU-12780 ofd: don't use ptlrpc_thread for consistency verification The ofd module runs a consistency verification thread to verify parent FIDs. Rather than using ptlrpc_thread to manage this, use native kthreads functionality. - Start-up code is moved out of the thread to before the thread is started, which makes error handling clearer. As part of this, the lfsck_req_local struct is combined with an lu_env and ofd_device pointer into a new oivm_args, which is passed to the thread as its argument - now it doesn't need to allocate anything itself. - Cleanup remains in the thread, so we add a completion to be sure the thread has started before there is any chance of kthread_stop() being called. - kthread_stop() and kthread_should_stop() are used for stopping the thread. wake_up_process() is used to wake it. The thread sets TASK_IDLE at the top of the loop, and sets TASK_RUNNING if anything is found to do. At the bottom of the loop the 'schedule()' will only block if nothing was found to be done. Signed-off-by: Mr NeilBrown <neilb@suse.de> Change-Id: Iec1de307ea48f7d26c60edf5d86eb0b7bf78f49a Reviewed-on: https://review.whamcloud.com/36262 Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-10958 ofd: data corruption due to RPC reordering Without read-only cache, it is possible that a client resends a BRW RPC, receives a reply from the original BRW RPC, modifies the same data and sends a new BRW RPC, however, because of RPC reordering stale data gets to disk. Let's use range locking to protect against this race. Change-Id: I35cbf95594601eacfc5f108b14e4c447962b0bbf Signed-off-by: Andrew Perepechko <c17827@cray.com> Cray-bug-id: LUS-5578,LUS-8943 Reviewed-on: https://review.whamcloud.com/32281 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-14194 cksum: add lprocfs checksum support in MDC/MDT Add missing support for checksum parameters in MDC and MDT. Handle T10-PI parameters in MDT similarly to OFD, move all functionality to target code and unify its usage in both targets. Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com> Change-Id: I7d397067304e028bf597d5c3ab16250731ccba9d Reviewed-on: https://review.whamcloud.com/40971 Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Li Xi <lixi@ddn.com>
LU-14052 ofd: support for multiple access readers ofd_access_log_reader can be passed -I, --mdt-index-filter=INDEX to print only FIDs located on INDEX. Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com> Change-Id: I9a4f09c6b7ca15623d459df17939895301a57a8b Reviewed-on: https://review.whamcloud.com/39906 Tested-by: jenkins <devops@whamcloud.com> Reviewed-by: John L. Hammond <jhammond@whamcloud.com> Reviewed-by: Jian Yu <yujian@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-13597 ofd: add more information to job_stats Request processing times/latency and basic IO size information are added to the job_stats output. This allows monitoring per-job request processing performance. Except for read_bytes and write_bytes, which are in bytes, all the other counters use "usecs" units and show min/max/sum values. In addition, two new counters for read and write time are added to calculate bandwidth. The output format is like: write_bytes: { samples: 1, unit: bytes, min: x, max: x, sum: x, sumsq: x} sanity.sh test_205b is modified to verify this patch. Signed-off-by: Emoly Liu <emoly@whamcloud.com> Change-Id: I7a5b77ca0ba464f6330a4bc56735c7762e167019 Reviewed-on: https://review.whamcloud.com/38816 Tested-by: jenkins <devops@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Wang Shilong <wshilong@ddn.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
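The counter fields in the output format above (samples, min, max, sum, sumsq) can be maintained with a few lines per sample. Bandwidth then follows from a bytes counter and its matching time counter. The sketch below is hypothetical: struct stat_counter and both functions are invented names. It assumes bandwidth is computed as total bytes divided by total time in usecs, which the commit does not spell out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counter mirroring the job_stats fields. */
struct stat_counter {
	uint64_t samples;
	uint64_t min;
	uint64_t max;
	uint64_t sum;
	uint64_t sumsq;
};

static void counter_add(struct stat_counter *c, uint64_t v)
{
	if (c->samples == 0 || v < c->min)
		c->min = v;
	if (v > c->max)
		c->max = v;
	c->samples++;
	c->sum += v;
	c->sumsq += v * v;	/* lets a reader derive the variance */
}

/* Bytes per second from a bytes counter (e.g. read_bytes) and the
 * matching time counter in usecs (e.g. the new read time counter). */
static double counter_bandwidth(const struct stat_counter *bytes,
				const struct stat_counter *usecs)
{
	assert(usecs->sum > 0);
	return (double)bytes->sum * 1000000.0 / (double)usecs->sum;
}
```

For example, reads of 1 MiB and 3 MiB taking 0.5 s and 1.5 s in total give 4 MiB over 2 s, or 2 MiB/s.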
LU-6142 lustre: convert some container_of to *_safe Each of these uses of container_of0() cannot be determined from local inspection to always receive a valid pointer, so container_of() cannot be used. So convert them to the upstream standard container_of_safe(). Signed-off-by: Mr NeilBrown <neilb@suse.de> Change-Id: I7d5551ae4d88bc931f7edbd3447b5bb2db8ce40c Reviewed-on: https://review.whamcloud.com/38384 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-6142 lustre: convert some container_of0 to container_of Each of these calls to container_of0() can be determined from local context to be passed a valid pointer, so it is best to use container_of() directly to make this clear. Either: - the returned pointer is dereferenced without being tested, or - the passed-in pointer is dereferenced before the call, or - the passed-in pointer cannot be NULL, such as when it is a '.next' of a list_head or returned by lu_object_next(). So convert all of these to container_of() ... except one which *should* be container_of(), but cannot be as it won't compile cleanly on older kernels. Change that one to container_of_safe() with a big comment. Signed-off-by: Mr NeilBrown <neilb@suse.de> Change-Id: Idcd954f89ed366882563810ce042a5ddaba5a1e5 Reviewed-on: https://review.whamcloud.com/38383 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Yingjin Qian <qian@ddn.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com>
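The two conversions above differ only in how a NULL input is treated. container_of() blindly subtracts the member offset. container_of_safe() first checks the pointer, and in the kernel also passes ERR_PTR values through unchanged. The userspace sketch below keeps only the NULL check and omits the kernel's type checking. struct list_node and struct item are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace versions of the two kernel macros. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Note: evaluates 'ptr' twice, so pass a plain variable. */
#define container_of_safe(ptr, type, member) \
	((ptr) == NULL ? (type *)NULL : container_of(ptr, type, member))

struct list_node {
	struct list_node *next;
};

struct item {
	int value;
	struct list_node link;	/* embedded, so container_of can recover the item */
};
```

Use container_of() when the pointer is known to be valid, as in the three cases listed in the commit. Use container_of_safe() when a NULL (or, in the kernel, an error pointer) can reach the call.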
LU-3606 fallocate: Implement fallocate preallocate operation This patch adds fallocate(2) preallocate operation support for Lustre. The fallocate(2) method of the inode_operations or file_operations is implemented and transported to the OSTs to interface with the underlying OSD's fallocate(2) code. In a separate patch, a new RPC, OST_FALLOCATE, has been added and reserved for space preallocation. The fallocate functionality (prealloc) in CLIO has been multiplexed with CIT_SETATTR. (https://review.whamcloud.com/37277) Lustre fsx (File system exerciser) is updated in a separate patch to handle fallocate calls. (https://review.whamcloud.com/37277) Only the fallocate preallocate operation is supported by this patch for now. Other operations, such as FALLOC_FL_PUNCH_HOLE (deallocate), FALLOC_FL_ZERO_RANGE, FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_INSERT_RANGE, are not supported by this patch and will be addressed by a separate patch. ZFS operation is not supported by this patch. ZFS fallocate(2) will be addressed by patch (https://review.whamcloud.com/36506/) A new test case under sanity is added to verify the fallocate call. Test-Parameters: fstype=ldiskfs testlist=sanity,sanityn,sanity-dom Signed-off-by: Swapnil Pimpale <spimpale@ddn.com> Signed-off-by: Li Xi <lixi@ddn.com> Signed-off-by: Abrarahmed Momin <abrar.momin@gmail.com> Signed-off-by: Arshad Hussain <arshad.super@gmail.com> Change-Id: I03f27d356616fbf3a3ab8e6309af26c00434d81b Reviewed-on: https://review.whamcloud.com/9275 Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Wang Shilong <wshilong@ddn.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
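From an application's point of view, the preallocate mode this patch supports is the default (mode 0) behaviour: allocate the range and extend the file size. The sketch below is generic POSIX code, not Lustre code. It uses posix_fallocate(), which requests that same default mode. The assumption is that on a Lustre client mount such a call would take the new preallocate path. preallocate_and_size() is a hypothetical helper, and the sketch does not touch the unsupported FALLOC_FL_* modes.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Preallocate [0, len) in an already-open file and return the resulting
 * file size, or -1 on error. */
static off_t preallocate_and_size(int fd, off_t len)
{
	struct stat st;

	assert(fd >= 0 && len > 0);
	/* posix_fallocate() returns an error number instead of setting errno */
	if (posix_fallocate(fd, 0, len) != 0)
		return -1;
	if (fstat(fd, &st) != 0)
		return -1;
	return st.st_size;	/* default mode extends the size to len */
}
```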
LU-13238 ofd: add OFD access logs Add access logs to the OFD layer. BRW RPC handlers will record accesses to an in-memory circular buffer, which may be read in userspace through character devices (/dev/lustre-access-log/$FSNAME-OSTxxxx). A control device (/dev/lustre-access-log/control) is added to facilitate device discovery. A utility (ofd_access_log_reader) to discover and consume access logs is included. Signed-off-by: John L. Hammond <jhammond@whamcloud.com> Change-Id: I76b78cc5075ee01f9b234e96e7a22a1bdcf2f755 Reviewed-on: https://review.whamcloud.com/37552 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Gu Zheng <gzheng@ddn.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-13383 ofd: lazy atime update OST_BRW_WRITE updates atime both in memory and on disk. OST_BRW_READ updates atime in memory, and once the difference exceeds the delay (obdfilter.*.atime_delay seconds), it is updated on disk. Test-Parameters: testlist=sanity env=ONLY=39r,ONLY_REPEAT=100 Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com> Change-Id: Ibe49882ec6f1984f9cf6a32f6ee9fef579ed2a03 Reviewed-on: https://review.whamcloud.com/38024 Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Wang Shilong <wshilong@ddn.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com>
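The lazy atime rule above can be expressed as one decision per access. Writes always persist. Reads persist only once the in-memory atime has moved more than atime_delay seconds past the on-disk value. The sketch below is hypothetical: struct obj_atime and atime_update() are invented names, and the "strictly greater than delay" comparison is an assumption based on the wording "exceeds".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-object state: the in-memory atime and the value
 * last written to disk. */
struct obj_atime {
	int64_t mem_atime;
	int64_t disk_atime;
};

/* Apply an access at time 'now' (seconds). Returns true when an on-disk
 * update would be issued. 'delay' plays the role of atime_delay. */
static bool atime_update(struct obj_atime *o, int64_t now, bool is_write,
			 int64_t delay)
{
	assert(delay >= 0);
	if (now > o->mem_atime)
		o->mem_atime = now;	/* memory is always kept current */
	if (is_write || o->mem_atime - o->disk_atime > delay) {
		o->disk_atime = o->mem_atime;
		return true;
	}
	return false;
}
```

With a 10-second delay, a read 5 seconds after the last on-disk update stays in memory only. A read 11 seconds after it is written out.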