LU-16692 osp: osp_fid_diff vs rollover_new_seq race osp_fid_diff() and osp_objs_precreated() access last_created_fid and pre_used_fid without holding opd_pre_lock, which can race with osp_precreate_rollover_new_seq() when it updates them to new FIDs. Change-Id: I3a61c99570b5532776ddc43247c1513b8c89fb32 Signed-off-by: Li Dongyang <dongyangli@ddn.com> Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54087 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
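The race described in LU-16692 can be illustrated with a minimal user-space sketch. Everything here is hypothetical: `struct precreate_state`, its fields, and both helpers are simplified stand-ins, and a pthread mutex plays the role of `opd_pre_lock`. It is not the actual OSP code; it only shows why both counters should be read under the same lock that the rollover path takes when it resets them.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the OSP precreate state. */
struct precreate_state {
	pthread_mutex_t lock;       /* plays the role of opd_pre_lock */
	uint64_t seq;               /* current sequence */
	uint32_t last_created_oid;  /* last object ID precreated */
	uint32_t pre_used_oid;      /* last object ID handed out */
};

static void precreate_state_init(struct precreate_state *s, uint64_t seq)
{
	pthread_mutex_init(&s->lock, NULL);
	s->seq = seq;
	s->last_created_oid = 0;
	s->pre_used_oid = 0;
}

/* Read both counters under the lock so a concurrent rollover cannot
 * be observed half-applied (one counter old, the other new). */
static uint32_t objs_precreated(struct precreate_state *s)
{
	uint32_t n;

	pthread_mutex_lock(&s->lock);
	assert(s->last_created_oid >= s->pre_used_oid);
	n = s->last_created_oid - s->pre_used_oid;
	pthread_mutex_unlock(&s->lock);
	return n;
}

/* Switch to a new sequence: all three fields change together. */
static void rollover_new_seq(struct precreate_state *s, uint64_t new_seq)
{
	pthread_mutex_lock(&s->lock);
	s->seq = new_seq;
	s->last_created_oid = 0;
	s->pre_used_oid = 0;
	pthread_mutex_unlock(&s->lock);
}
```

Without the lock in `objs_precreated()`, a reader could pick up `last_created_oid` from the old sequence and `pre_used_oid` from the new one, which would give a meaningless difference.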
LU-17251 osp: force precreate if create_count grows Force the MDS to precreate OST objects if "osp.*.create_count" is written and the OSP does not have at least that many precreated objects locally. This avoids doing complex operations in test scripts to force precreation to run, which slows down the tests and increases the chance that a test might fail. Previously opd_precreate_force was only used for handling OSTs that were reformatted and this reset "create_count" to minimum, so move that to the reformat case rather than in the precreate code path so it does not reset "create_count" when it was just set. Remove the "env" argument from several precreate-related functions, since it wasn't used in those functions, and that made it difficult to call them from the "create_count" parameter handling. Test-Parameters: testlist=parallel-scale env=ONLY=test_rr_alloc Signed-off-by: Andreas Dilger <adilger@whamcloud.com> Change-Id: Iac35c1b981fcd6ab2d1ea5abc9ffe2e4563ebbe5 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52968 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com> Reviewed-by: Alex Deiter <alex.deiter@gmail.com> Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
LU-11912 tests: fix racing in force_new_seq_all We run force_new_seq in parallel to reduce time spent on consuming precreated objects. However this could be racy when multiple MDTs are on the same MDS, a task could finish for one MDT early and reset the fail_loc to 0 on MDS while other tasks are still working on other MDTs. Replace OBD_FAIL_OSP_FORCE_NEW_SEQ with a new param prealloc_force_new_seq for osp, so we can control the seq rollover individually for each osp device. Change-Id: I52dbd550564ca628a8a85c42951694d58b2b93a9 Fixes: 656fc937cf ("LU-11912 tests: consume precreated objects in parallel") Signed-off-by: Li Dongyang <dongyangli@ddn.com> Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52801 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Jian Yu <yujian@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-12610 misc: remove OBD_ -> CFS_ macros Remove OBD macros that are simply redefinitions of CFS macros. Test-Parameters: trivial Signed-off-by: Timothy Day <timday@amazon.com> Signed-off-by: Ben Evans <beevans@whamcloud.com> Change-Id: I15fe8aa22cb0203bed102a35361f4854ddaabecb Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50809 Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Neil Brown <neilb@suse.de> Reviewed-by: Oleg Drokin <green@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com>
LU-11912 ofd: reduce LUSTRE_DATA_SEQ_MAX_WIDTH Reduce LUSTRE_DATA_SEQ_MAX_WIDTH from ~4B to ~32M to limit the number of objects under the /O/[seq]/d[0..31] directories on OSTs. This keeps the directories optimal for ldiskfs and avoids going into largedir/3-level htree territory. Remove the hard-coded LUSTRE_DATA_SEQ_MAX_WIDTH checks in ofd and make them check seq->lcs_width instead, which is a tunable set to LUSTRE_DATA_SEQ_MAX_WIDTH by default. Allow values up to IDIF_MAX_OID if a larger seq width is needed. Use obdo->o_size in the OST_CREATE RPC reply from ofd to update osp with the current seq width setting. osp then uses this seq width to determine when to roll over to a new seq. The seq rolls over when the seq width is exhausted; the default is LUSTRE_DATA_SEQ_MAX_WIDTH. For seq >= FID_SEQ_NORMAL objects, the upper limit of the seq width is OBIF_MAX_OID. For IDIF/MDT0 objects, the upper limit is IDIF_MAX_OID. The seq FID_SEQ_OST_MDT0 changes to a normal seq after the rollover. Fix osp_precreate_reserve for the case where the last precreated object is at the end of the seq and osp_objs_precreated cannot cover all the requested objects. The mdt thread would get stuck: it wakes up the osp precreate thread in a loop to make progress, but the osp thread does not do anything until the seq is used up. This is easier to reproduce when seq->lcs_width is set to a low number and an overstriped file is created with a stripe count bigger than seq->lcs_width. Fix the precreate thread spinning when the precreate pool is at the end of the seq and is nearly empty. Change seq->lcs_width to 16384 for all tests in test-framework.sh, except for a few slow tests (to avoid timeouts) and some overstriping tests creating LOV_MAX_STRIPE_COUNT stripes (to avoid overstriping creating fewer objects than expected when the precreate pool is at the end of the seq and there are not enough objects). Fix the problem where the seq could still change after replay_barrier.
To achieve this, introduce a new fail_loc OBD_FAIL_OSP_FORCE_NEW_SEQ and the helpers force_new_seq/force_new_seq_all, which drain the objects in the precreate pool and then roll over to a new seq. This applies to a number of test suites that use replay_barrier heavily. Change-Id: I2749c1004b7bf3197b691cc94527f90145bcdef8 Signed-off-by: Li Dongyang <dongyangli@ddn.com> Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/38424 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
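The width-based rollover from LU-11912 can be sketched as plain arithmetic. The helper names below are hypothetical and the width of 16384 used in the usage note is the test setting mentioned in the commit message; the real logic lives in the OSP precreate code and handles more cases. The sketch shows the two points the commit calls out: a sequence is exhausted once the last created object ID reaches the width, and a precreate batch near the end of the sequence has to be clamped to what is left rather than waiting for room that will never appear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: object IDs still unused in the current sequence. */
static uint64_t seq_remaining(uint64_t last_created_oid, uint64_t width)
{
	assert(width > 0);
	return last_created_oid >= width ? 0 : width - last_created_oid;
}

/* Hypothetical helper: true once the sequence width is used up and the
 * precreate logic should roll over to a new sequence. */
static bool seq_exhausted(uint64_t last_created_oid, uint64_t width)
{
	return seq_remaining(last_created_oid, width) == 0;
}

/* Hypothetical helper: clamp a requested batch to what the sequence can
 * still hold, so the tail of a sequence is used instead of stalling. */
static uint64_t precreate_batch(uint64_t last_created_oid, uint64_t width,
				uint64_t want)
{
	uint64_t left = seq_remaining(last_created_oid, width);

	return want < left ? want : left;
}
```

With a width of 16384 and the last created object ID at 16380, a request for 32 objects is clamped to 4; once those are used, `seq_exhausted()` reports that a rollover is due.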
LU-14692 osp: deprecate IDIF sequence for MDT0000 Always return true for IDIF seq osp_fid_end_seq so osp precreate will rollover to a new seq in the FID_SEQ_NORMAL range for MDT0000. Remove conf-sanity test_122b: Check OST sequence wouldn't change when IDIF 32bit overflows Change-Id: I85a0e38266331c96d971d68ec353949ccac3fc21 Signed-off-by: Li Dongyang <dongyangli@ddn.com> Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45822 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
LU-15139 osp: block reads until the object is created It is possible for a remote llog to be read and written simultaneously during recovery. For example, the dtx recovery thread may be fetching updates while MDD's orphan cleanup procedure is removing orphans from PENDING. OSP can then be asked to read an object that was just created in the OSP cache while the actual object on the remote MDS has not been created yet. OSP should block such reads until the creation is done. Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com> Change-Id: Id0f52b90761839399102bed825569da6bfd17864 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47003 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15046 osp: precreate thread vs connect race lcs_exp (required for the fid client) was initialized in osp_obd_connect(), which races with osp_precreate_thread(). The latter can get stuck if lcs_exp is not initialized, and then the whole precreation logic is blocked until remount. Instead, the precreation thread can simply wait until lcs_exp is properly initialized. Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com> Change-Id: I7a42bf4b17ce5d46bc25bd548d81eb55f168804b Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45099 Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com> Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com>
LU-14719 osp: add inode watermark * move block watermark from debugfs to sysfs. * add inode watermark for OSP. Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com> Change-Id: I7c768fa2ebfb4b8c2f75255f9e9c061d4c15cf66 Reviewed-on: https://review.whamcloud.com/47128 Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Qian Yingjin <qian@ddn.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-8367 osp: enable replay for precreation request Lustre has a deadlock between osp_precreate_thread() and stripe allocation in osp_precreate_reserve(). In case of OST failover, the stripe allocation thread has allocated some objects and sleeps in osp_precreate_reserve() waiting for more. After reconnection, osp_precreate_thread() calls osp_precreate_cleanup_orphans() to synchronize the last id and clean up unused objects, but it waits for the reservation count (d->opd_pre_reserved) to drop to zero. As a result, no more objects can be created on the OST and no reserved objects can be freed. This produces "slow creates" messages and MDT creation threads hang in osp_precreate_reserve(): kjcf05-OST0003-osc-MDT0001: slow creates, last=[0x340000400:0x23a4f483:0x0], next=[0x340000400:0x23a4f378:0x0], reserved=267, sync_changes=0, sync_rpcs_in_progress=0, status=0 The issue reproduces more often with the overstriping feature. There is no need for the orphan cleanup phase when the MDT supports resend/replay for the precreation request. This behaviour resolves the osp_precreate_cleanup_orphans() hang and unblocks object creation. Force creation logic is added to support a reformatted OST with the same index. Previously this was done during the orphan cleanup phase. Sanity tests 27S and 822 become invalid: 27S is based on orphan cleanup after reconnection, and 822 is based on a non-resendable OST_CREATE request. These tests are removed. HPE-bug-id: LUS-10793 Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com> Change-Id: I21287b51252e573e796fac69ee3df6ac90e28c10 Reviewed-on: https://review.whamcloud.com/46889 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com> Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-6864 osp: manage number of modify RPCs in flight Currently we use an rpc_lock to ensure concurrent in-flight requests are handled serially, to prevent the execution status from being overwritten. This patch changes the osp component to send multiple modify RPCs in parallel to the MDT. This will improve metadata performance of cross-MDT operations. For testing, replace mkdirmany with createmany -d, which does the same thing. Signed-off-by: Gregoire Pichon <gregoire.pichon@bull.net> Signed-off-by: James Simmons <jsimmons@infradead.org> Change-Id: Icb601afabd6767463634a4c7943ec4206bc758ec Reviewed-on: https://review.whamcloud.com/14375 Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15114 osp: changes queuing throttle Prevent queue of sync changes from growing too much by adding resends when queue size reaches some (tunable) limit. HPE-bug-id: LUS-10345 Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com> Change-Id: I5efabb91d3700c58d9451f81c5fed9a22ae404fb Reviewed-on: https://review.whamcloud.com/45265 Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com> Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com> Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
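The throttle in LU-15114 can be pictured as a bounded counter: once the number of queued sync changes reaches a tunable limit, new additions are refused and the caller is expected to resend later. The structure, field names, and the choice of `-EAGAIN` below are assumptions made for illustration only; the commit message does not say how the real code signals the resend.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical bounded queue of pending sync changes. */
struct change_queue {
	unsigned int queued;      /* changes currently queued */
	unsigned int max_queued;  /* tunable limit */
};

/* Try to queue one change. Returns 0 on success, or -EAGAIN (an assumed
 * error code) when the limit is reached and the caller should resend. */
static int change_queue_add(struct change_queue *q)
{
	assert(q->max_queued > 0);
	if (q->queued >= q->max_queued)
		return -EAGAIN;
	q->queued++;
	return 0;
}

/* Mark one queued change as completed, making room for another. */
static void change_queue_done(struct change_queue *q)
{
	assert(q->queued > 0);
	q->queued--;
}
```

Refusing at the limit keeps memory use bounded and pushes back on the producer instead of letting the queue grow without limit.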
LU-9699 osp: don't assert on OSP duplicating Writeconf on an MDT with index > 0000 will cause "add mdc" to be added to $FSNAME-client config and "add osp" to be added to $FSNAME-MDTXXXX configs. However, the configs may already contain these directives. Duplicating the OSP device will cause the assertion failure in osp_obd_connect(): ASSERTION( osp->opd_connects == 1 ) failed Duplicating the MDC just returns -EEXIST in similar situation. A possible solution is to check configs for duplicates before writing to them. However, sometimes we would like to change nids which are part of "add mdc" and "add osp". Another solution is to mark previous entries with SKIP flags. This patch implements this approach. Since after revoking the config lock, the clients and the MDTs will receive the updated log and apply its newer entries, we still have to handle OSP duplication, but this is only an issue immediately after writeconf processing. Seagate-bug-id: MRP-2634, MRP-3865 Change-Id: Idd7ad43c78d50e6bbe715850503aa0b01fcbf071 Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com> Reviewed-on: https://review.whamcloud.com/27753 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-13195 osp: track destroyed OSP object Retain destroyed OSP objects in memory to prevent races where an in-flight destroy is overtaken by a read or attr_get, leading to incorrect local state. Also block operations on such an object with -ENOENT. Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com> Change-Id: Ied59f1a95458e8890249b92d4efc38e258a7e3cf Reviewed-on: https://review.whamcloud.com/38385 Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-14607 osp: separate buffer for large XATTR Once XATTR is too large to fit into PAGE_SIZE, allocate value in a separate buffer for osp_xattr_entry. Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com> Change-Id: Ied090ff73e2e5cdeaf2d91a3670067210f2ab1d7 Reviewed-on: https://review.whamcloud.com/43736 Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: John L. Hammond <jhammond@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-14487 modules: remove references to Sun Trademark. "lustre" is no longer a Trademark of Sun Microsystems. There is no need to acknowledge the trademark in every file, so just remove all these claims. Test-Parameters: trivial Signed-off-by: Mr NeilBrown <neilb@suse.de> Change-Id: I66941494eabc54bedf85079c5b85701187f2a8f1 Reviewed-on: https://review.whamcloud.com/42139 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Aurelien Degremont <degremoa@amazon.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
LU-6142 lustre: Make dev/body/type operations const Many of struct md_device_operations, struct dt_body_operations, struct dt_object_operations, struct dt_device_operations, struct dt_index_operations, struct lu_object_operations, struct lu_device_operations, and struct lu_device_type_operations are already const. This patch makes the remainder 'const', and changes a few to 'static'. Test-Parameters: trivial Signed-off-by: Mr NeilBrown <neilb@suse.de> Change-Id: Ife82c870a27a9e68e57208d49f51983a552e86ec Reviewed-on: https://review.whamcloud.com/39398 Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com>
LU-13073 osp: don't block waiting for new objects If an OST is down, a few threads trying to get an already precreated object may get stuck. Worse, all QoS-based allocations are then serialized by a single semaphore, even those that would not try to allocate on the failed OST. This patch introduces a noblock flag in the allocation hint that is passed to OSP, so the QoS code can try to allocate objects in a non-blocking manner. Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com> Change-Id: I38e66d7569aefecf800dbc32f1049ac87853439e Reviewed-on: https://review.whamcloud.com/40274 Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Yingjin Qian <qian@ddn.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
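The noblock hint from LU-13073 can be sketched as a three-way outcome for a reservation attempt. All names here (`struct alloc_hint`, `struct target`, `reserve_object`, the result enum) are hypothetical and the model is deliberately tiny: it only shows how a flag in the allocation hint lets the caller move on to another target instead of sleeping on one that has nothing precreated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Possible outcomes of one reservation attempt (hypothetical). */
enum reserve_result {
	RESERVE_OK,         /* an object was reserved */
	RESERVE_TRY_OTHER,  /* nothing available; caller should pick another target */
	RESERVE_MUST_WAIT   /* nothing available; a blocking caller would sleep here */
};

/* Hypothetical allocation hint carrying the noblock flag. */
struct alloc_hint {
	bool noblock;
};

/* Hypothetical per-target state: number of precreated objects on hand. */
struct target {
	unsigned int precreated;
};

static enum reserve_result reserve_object(struct target *t,
					  const struct alloc_hint *hint)
{
	assert(t != NULL && hint != NULL);

	if (t->precreated > 0) {
		t->precreated--;
		return RESERVE_OK;
	}
	/* With noblock set, report failure at once instead of waiting. */
	return hint->noblock ? RESERVE_TRY_OTHER : RESERVE_MUST_WAIT;
}
```

A QoS-style allocator would pass a hint with `noblock` set and skip any target that returns `RESERVE_TRY_OTHER`, so one failed target does not hold up allocations that never needed it.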
LU-13974 llog: check stale osp object osp_attr_get has two paths: 1) for a healthy osp object, return attributes from the cache; 2) for a stale osp object, send an OUT update request and return the attributes, after which the object loses its stale state. When an OUT update request containing llog writes fails, the osp object becomes stale, but the llog handle stays inconsistent (bitmap, count, last_index). The next llog_add->llog_osd_write_rec calls dt_attr_get, which fetches the attributes and makes the osp object valid again, and then uses the wrong llog handle data. The result is an index jump in the llog file (recX, recX+2), which causes an error during update log processing if a failover takes place. The fix adds a dt_object_stale function to check the osp object. llog_osd_write_rec checks it and returns ESTALE, so llog_add fails with an ESTALE error and does not corrupt the update log. HPE-bug-id: LUS-9030 Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com> Change-Id: Iadf53fd816e1c5bde0a19d4c537f0408796c864a Reviewed-on: https://review.whamcloud.com/40742 Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com> Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-6142 lustre: convert some container_of to *_safe For each of these uses of container_of0(), it cannot be determined from local inspection that a valid pointer is always received, so container_of() cannot be used. Convert them to the upstream standard container_of_safe(). Signed-off-by: Mr NeilBrown <neilb@suse.de> Change-Id: I7d5551ae4d88bc931f7edbd3447b5bb2db8ce40c Reviewed-on: https://review.whamcloud.com/38384 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com>
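The difference between container_of() and a "safe" variant can be shown with a simplified user-space sketch. The macro and types below are hypothetical and only handle NULL; the kernel's container_of_safe() is understood to also pass error pointers through unchanged, which is not modelled here. Note that this sketch evaluates its pointer argument twice, so it should not be given an expression with side effects.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, NULL-only sketch: map a pointer to a member back to the
 * enclosing structure, but return NULL instead of doing pointer
 * arithmetic on a NULL input. */
#define container_of_safe_sketch(ptr, type, member) \
	((ptr) != NULL ? (type *)((char *)(ptr) - offsetof(type, member)) : NULL)

/* Hypothetical embedded object and its container. */
struct lu_obj_sketch {
	int flags;
};

struct dev_obj_sketch {
	int id;
	struct lu_obj_sketch lu;  /* embedded member, not at offset 0 */
};

/* Convert an embedded-member pointer that may legitimately be NULL. */
static struct dev_obj_sketch *lu2dev(struct lu_obj_sketch *lu)
{
	return container_of_safe_sketch(lu, struct dev_obj_sketch, lu);
}

/* Small self-check showing both the valid and the NULL case. */
static void lu2dev_selfcheck(void)
{
	struct dev_obj_sketch d = { .id = 7, .lu = { .flags = 0 } };

	assert(lu2dev(&d.lu) == &d);
	assert(lu2dev(NULL) == NULL);
}
```

A plain container_of() on a NULL member pointer would subtract the member offset and yield a bogus non-NULL address, so a later NULL check in the caller would not catch it.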