LU-12610 obd: remove OBD_ -> CFS_ macros Remove OBD macros that are simply redefinitions of CFS macros. Signed-off-by: Timothy Day <timday@amazon.com> Signed-off-by: Ben Evans <beevans@whamcloud.com> Change-Id: Id9e312a6074c5e11370f018afd3201d73b53e7e0 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50808 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15280 llog: fix processing of a wrapped catalog Several issues were found with "lfs changelog --follow" for a wrapped catalog (llog_cat_process() with startidx): 1/ incorrect lpcd_first_idx value for a wrapped catalog (startcat>0) The first llog index to process is "lpcd_first_idx + 1". The startidx represents the last record index processed for a llog plain. The catalog index of this llog is startcat. lpcd_first_idx of a catalog should be set to "startcat - 1" e.g: llog_cat_process(... startcat=10, startidx=101) means that the processing will start with the llog plain at the index 10 of the catalog. And the first record to process will be at index 102. 2/ startidx is not reset for an incorrect startcat index startidx is relevant only for a startcat. So if the corresponding llog plain is removed or if startcat is out of range, we need to reset startidx. This patch removes LLOG_CAT_FIRST, which was really confusing (LU-14158), and updates osp_sync_thread() for the corrected llog_cat_process() behavior. It also modifies llog_cat_retain_cb() to zap empty plain llogs directly (as llog_cat_size_cb() does), because the current implementation is not compatible with this patch. The test "conf-sanity 135" verifies "lfs changelog --follow" for a wrapped changelog_catalog. Test-Parameters: testlist=conf-sanity env=ONLY=135,ONLY_REPEAT=10 Test-Parameters: testlist=sanity env=ONLY=60a,ONLY_REPEAT=20 Test-Parameters: testlist=conf-sanity env=SLOW=yes,ONLY=106,ONLY_REPEAT=10 Fixes: a4f049b9 ("LU-13102 llog: fix processing of a wrapped catalog") Signed-off-by: Etienne AUJAMES <eaujames@ddn.com> Change-Id: Iaf46ddd4a6ec1e06cec0d17aa9bde766bd793abc Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45708 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com> Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
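The start-index convention described in LU-15280 can be modelled in a few lines of Python. This is only an illustrative sketch, not the Lustre implementation: `resolve_start` and `live_slots` are hypothetical names, and resetting startidx to 0 is an assumption about what "reset" means here.

```python
def resolve_start(live_slots, startcat, startidx):
    """Return (lpcd_first_idx, effective startidx) for a catalog walk.

    live_slots: set of catalog indexes that still reference a plain llog
                (hypothetical stand-in for the on-disk catalog state).
    """
    # Catalog processing begins at lpcd_first_idx + 1, so to begin at
    # startcat the first index must be startcat - 1.
    lpcd_first_idx = startcat - 1
    # startidx is only meaningful for the plain llog at startcat; if that
    # plain llog is gone or startcat is out of range, it must be reset
    # (assumed here to mean 0).
    if startcat not in live_slots:
        startidx = 0
    return lpcd_first_idx, startidx


first_idx, last_done = resolve_start({9, 10, 11}, startcat=10, startidx=101)
first_record = last_done + 1  # the record after the last one processed
```

With the example from the commit message (startcat=10, startidx=101), processing begins at catalog index 10 and the first record handled is 102.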
LU-16159 lod: cancel update llogs upon recovery abort If recovery is aborted, cancel update catalog from catlist, and keep them on disk for some time (for debugging purposes), as this can avoid accumulating stale update records, and also avoid recovery problems if update llogs are corrupt. Update llogs are canceled after recovery completes and before regular request processing. For these logs, their ctime will be set, and log header will be marked with LLOG_F_MAX_AGE|LLOG_F_RM_ON_ERR, and after 30 days have passed they will be removed automatically. Tidy up recovery abort code: * if obd_abort_recovery is set, or OBD is stopping, stop both client recovery and MDT recovery. * otherwise if obd_abort_mdt_recovery is set, stop MDT recovery only. lctl llog_print supports printing update log FIDs used by specified MDT: * "lctl --device <MDT> llog_print update_log" will list all update llog FIDs used by this MDT device. Disabled replay-single.sh 100c stripe check because abort_recovery will cancel update llogs, and won't replay them upon next recovery. Added replay-single.sh 100d. Formatall at the end of replay-single.sh because directory unlink may fail. Test-Parameters: mdscount=2 mdtcount=4 testlist=replay-single,replay-single,replay-single,replay-single,replay-single,replay-single,replay-single,replay-single,replay-single Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com> Change-Id: Ie2bda6c097d65f5c51cba66c2dbf6ae4a5d36dda Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48584 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
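The tidied-up abort rules in LU-16159 amount to a two-level decision. The sketch below only restates those two bullet points in Python; the function name and the returned set are hypothetical and do not mirror the actual kernel code.

```python
def recoveries_to_stop(abort_recovery, stopping, abort_mdt_recovery):
    """Return which recovery phases should be stopped (illustrative model)."""
    # A full abort, or an OBD that is shutting down, stops everything.
    if abort_recovery or stopping:
        return {"client", "mdt"}
    # Otherwise an MDT-only abort leaves client recovery running.
    if abort_mdt_recovery:
        return {"mdt"}
    return set()
```

The ordering matters: the full-abort check comes first, so setting both flags still stops client recovery.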
LU-15646 llog: correct llog FID and path output - fix wrong LLOG_ID-to-FID conversion to output llog FID by introducing PLOGID macro to expand llog ID for DFID format - stop printing lgl_ogen along with llog FID as it is always zero since 2.3.51 and is not used anymore - output correct path for update llog in llog_reader - always print header info in llog_reader if available - print llog flags in header info Fixes: 5a8e47d0a1a7 ("LU-9153 llog: update llog print format to use FIDs") Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com> Change-Id: I7ba49e8101a67d2d80c204a5fc629bfd0bce89ad Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48430 Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com>
LU-15481 llog: Add LLOG_SKIP_PLAIN to skip llog plain Add the catalog callback return LLOG_SKIP_PLAIN to conditionally skip an entire llog plain. This could speed up the catalog processing for specific usages when a record needs to be accessed in the "middle" of the catalog. This could be useful for changelog with several users or HSM. This patch modifies chlg_read_cat_process_cb() to use LLOG_SKIP_PLAIN. The main idea came from: d813c75d ("LU-14688 mdt: changelog purge deletes plain llog") **Performance test:** * Environment: 2474195 changelog records stored on the mds0 (40 llog plain): mds# lctl get_param -n mdd.lustrefs-MDT0000.changelog_users current index: 2474195 ID index (idle seconds) cl1 0 (3509) * Test Access to records at the end of the catalog (offset: 2474194): client# time lfs changelog lustrefs-MDT0000 2474194 >/dev/null * Results - with the patch: real 0m0.592s - without the patch: real 0m17.835s (x30) Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr> Change-Id: I887d5bef1f3a6a31c46bc58959e0f508266c53d2 Reviewed-on: https://review.whamcloud.com/46310 Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
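The effect of a catalog callback that can skip a whole plain llog (LU-15481) can be shown with a toy model. This is a sketch under stated assumptions: the `LLOG_SKIP_PLAIN` value, `process_catalog`, and the "skip if the plain llog's last index is below the wanted start" criterion are all hypothetical stand-ins, not the logic of chlg_read_cat_process_cb().

```python
LLOG_SKIP_PLAIN = "skip"  # placeholder for the real return code


def process_catalog(plains, cat_cb, rec_cb):
    """Walk every plain llog; return how many records were visited."""
    visited = 0
    for plain in plains:
        # The catalog-level callback may ask to skip the entire plain llog.
        if cat_cb(plain) == LLOG_SKIP_PLAIN:
            continue
        for rec in plain:
            visited += 1
            rec_cb(rec)
    return visited


# Four plain llogs of five records each (record indexes 1..20).
plains = [list(range(s, s + 5)) for s in (1, 6, 11, 16)]
start = 18
wanted = []

visited = process_catalog(
    plains,
    # Assumed criterion: nothing in this plain llog can be >= start.
    cat_cb=lambda plain: LLOG_SKIP_PLAIN if plain[-1] < start else None,
    rec_cb=lambda rec: wanted.append(rec) if rec >= start else None,
)
```

Only the last plain llog is read here, so 5 records are visited instead of 20, which is the same kind of saving the performance test above measures when reading near the end of a 40-llog catalog.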
LU-15000 llog: read canceled records in llog_backup llog_backup() does not reproduce index "holes" in the generated copy. This could result in an llog copy whose indexes differ from the source, which might confuse the configuration update mechanism that relies on indexes between the MGS source and the target copy. These index gaps can be caused by "lctl --device MGS llog_cancel". This patch adds a "raw" read mode to llog_process* to read canceled records. So now llog_backup is able to reproduce an exact copy of the original. Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr> Change-Id: I811e23de8f4545bed36a44fedc2638d7418830dd Reviewed-on: https://review.whamcloud.com/46552 Tested-by: jenkins <devops@whamcloud.com> Reviewed-by: Dominique Martinet <qhufhnrynczannqp.f@noclue.notk.org> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: DELBARY Gael <gael.delbary@cea.fr> Reviewed-by: Stephane Thiell <sthiell@stanford.edu> Reviewed-by: Oleg Drokin <green@whamcloud.com>
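Why skipping canceled records shifts indexes in the copy (LU-15000) is easiest to see on a three-record example. The model below is a hedged illustration only: `backup`, the tuple layout, and the `raw` flag are hypothetical and merely mimic the behavior the commit describes.

```python
def backup(source, raw):
    """Copy an llog modelled as (index, payload, canceled) tuples.

    Returns a dict mapping index in the copy -> (payload, canceled).
    """
    copy = {}
    next_idx = 1
    for idx, payload, canceled in source:
        if raw:
            # Raw mode: canceled records are read too, so every record
            # keeps its original index.
            copy[idx] = (payload, canceled)
        elif not canceled:
            # Non-raw mode: canceled records are never seen, so the
            # remaining ones are appended at consecutive indexes.
            copy[next_idx] = (payload, False)
            next_idx += 1
    return copy


source = [(1, "a", False), (2, "b", True), (3, "c", False)]
shifted = backup(source, raw=False)
exact = backup(source, raw=True)
```

In `shifted`, record "c" lands at index 2 instead of 3, which is the mismatch between source and copy that the patch removes; `exact` keeps the hole at index 2.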
LU-15761 obdclass: fix locking in llog_cat_refresh() the patch fixes two problems: 1) pairing up_write() should be used with cathandle 2) llog_read_header() manipulates loghandle's internal structures (header, last_idx, etc) which are supposed to stay consistent from another user's point of view (like llog_add_rec()) Fixes: 71f409c9b31b ("LU-11418 llog: refresh remote llog upon -ESTALE") Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com> Change-Id: Ib86e10a925b541d02c22d74e6ddbc4368345ac11 Reviewed-on: https://review.whamcloud.com/47185 Tested-by: jenkins <devops@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Mike Pershin <mpershin@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-14474 llog: don't destroy next llog do not destroy empty llog if it's referenced as the next one in a catalog. Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com> Change-Id: I78bfeb90435aaee2b8536b647aa3acec56642ea0 Reviewed-on: https://review.whamcloud.com/44998 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Mike Pershin <mpershin@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-14487 modules: remove references to Sun Trademark. "lustre" is no longer a Trademark of Sun Microsystems. There is no need to acknowledge the trademark in every file, so just remove all these claims. Test-Parameters: trivial Signed-off-by: Mr NeilBrown <neilb@suse.de> Change-Id: I66941494eabc54bedf85079c5b85701187f2a8f1 Reviewed-on: https://review.whamcloud.com/42139 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Aurelien Degremont <degremoa@amazon.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
LU-14098 obdclass: try to skip corrupted llog records if llog's header or record is found corrupted, then ignore the remaining records and try with the next one. Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com> Change-Id: If47ec1fc1e2eaf64be7ba08d3aa9c2b93903c0cf Reviewed-on: https://review.whamcloud.com/40754 Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Mike Pershin <mpershin@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-13411 llog: allow delete of zero size llog 1) all plain logs belonging to catalog should have flag LLOG_F_ZAP_WHEN_EMPTY based on llog_cat_new_log(). When llog_cat_process_common processes a plain log with zero file size, this flag is not set during llog_cat_id2handle LLOG_EMPTY, so these plain llogs are not canceled/destroyed. They appeared during cross MDT updates. The fix adds flag LLOG_F_ZAP_WHEN_EMPTY for any plain llog in a catalog. Signed-off-by: Alexander Boyko <c17825@cray.com> Cray-bug-id: LUS-8634 Change-Id: Ieebee67bf9e7bebb9ecc51b858a9976a00583c7b Reviewed-on: https://review.whamcloud.com/38131 Reviewed-by: Mike Pershin <mpershin@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-13102 llog: fix processing of a wrapped catalog The logic for rereading a llog buffer had an exception for a full catalog, when lgh_last_idx = llh_cat_idx and the first processing index is llh_cat_idx+1. This check is based on the value lh_last_idx, which stays unchanged between buffer reads. But llh_cat_idx could go forward, and this leads to a wrong check where the reread doesn't happen. As a result Lustre got ENOENT for a record and stopped osp processing. llog_cat_set_first_idx()) catlog [0x6:0x1:0x0] first idx 34730, last_idx 34731 osp_sync_process_queues()) 1 changes, 0 in progress, 0 in flight llog_process_thread()) stop processing plain 0x76941:1:0 index 64767 count 1 llog_process_thread()) index: 34731, lh_last_idx: 34730 synced_idx: 34730 lgh_last_idx: 34731 llog_cat_process_common()) processing log [0x2780f:0x1:0x0]:0 at index 34731 of catalog [0x6:0x1:0x0] llog_cat_id2handle()) snx11281-OST0001-osc-MDT0001: error opening log id [0x2780f:0x1:0x0]:0:rc = -2 The patch fixes logic and also adds/changes debugging for llog and osp. Cray-bug-id: LUS-8193 Signed-off-by: Alexander Boyko <c17825@cray.com> Change-Id: I9463223a1ea904b96643b19e1580f5894142c12b Reviewed-on: https://review.whamcloud.com/37102 Tested-by: jenkins <devops@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com> Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
LU-12477 lustre: remove obsolete config checks Remove from the lustre kernel code all the support for kernels earlier than the RHEL7 3.10+. This greatly simplifies the code and makes build times much better. Change-Id: If52091ac5249b2719b992032040ccf30cc5bf0e4 Signed-off-by: James Simmons <jsimmons@infradead.org> Reviewed-on: https://review.whamcloud.com/37085 Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Yang Sheng <ys@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-10198 llog: keep llog handle alive until last reference Llog handle keeps related dt_object pinned until llog_close() call, meanwhile llog handle can still have other users which took llog handle via llog_cat_id2handle(). The patch changes llog_handle_put() to call lop_close() upon last reference drop. So llog_osd_close() will put dt_object only when llog_handle has no more references. The llog_handle_get() checks and reports if llog_handle has zero reference. The patch also modifies checks for destroyed llogs: the llog handle has a new lgh_destroyed flag which is set when the llog is destroyed, and llog_osd_exist() checks dt_object_exist() and the lgh_destroyed flag, so destroyed llogs are considered non-existent too. Previously it used the lu_object_is_dying() check, which is not reliable because it only means that the object is not to be kept in cache. Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com> Change-Id: If7df41646c243c0d40b20a30a33e86c688d24508 Reviewed-on: https://review.whamcloud.com/37367 Tested-by: jenkins <devops@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Alexandr Boyko <c17825@cray.com> Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
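The "close on last reference" rule from LU-10198 is a standard reference-counting pattern. The class below is a minimal Python sketch of that idea, not the kernel code: `Handle`, its fields, and raising an exception on a zero-reference get are assumptions made for illustration (the commit only says the condition is checked and reported).

```python
class Handle:
    """Toy model of a reference-counted llog handle."""

    def __init__(self):
        self.refs = 1          # the opener holds the first reference
        self.closed = False    # stands in for lop_close() having run
        self.destroyed = False # models the new lgh_destroyed flag

    def get(self):
        # Taking a reference on a handle that already dropped to zero
        # is a bug; this model reports it by raising.
        if self.refs == 0:
            raise RuntimeError("get on handle with zero references")
        self.refs += 1

    def put(self):
        self.refs -= 1
        if self.refs == 0:
            # Only the last put releases the underlying object.
            self.closed = True

    def exists(self, object_exists=True):
        # A destroyed llog counts as non-existent even if the backing
        # object is still around.
        return object_exists and not self.destroyed


h = Handle()
h.get()   # a second user, e.g. one that looked the handle up by id
h.put()   # the opener closes: handle must stay alive
still_open = not h.closed
h.put()   # last user drops its reference: now it is closed
```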
LU-13069 obdclass: don't skip records for wrapped catalog osp_sync_thread() uses opd_sync_last_catalog_idx as the start point of catalog processing. It is also used in llog_cat_process_cb to skip records from processing. When a catalog is wrapped, processing starts from the second part of the catalog and then continues with the first part. So, the first part would be skipped in llog_cat_process_cb() based on lpd_startcat. osp_sync_thread() restarts a processing loop with opd_sync_last_catalog_idx. For a wrapped catalog it increases the last index, and llog_process_thread does one more increase. This leads to skipped records in the catalog, which would not be processed. The patch fixes these issues. It also adds sanity test 135 and 136 as regression tests. Signed-off-by: Alexander Boyko <c17825@cray.com> Cray-bug-id: LUS-8053,LUS-8236 Change-Id: Ic75af1bf4468b9ef2de32cbf6d834b6a81376e88 Reviewed-on: https://review.whamcloud.com/36996 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andriy Skulysh <c17819@cray.com> Reviewed-by: Alexander Zarochentsev <c17826@cray.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-10070 lod: SEL cleanup some cleanups - dt_statfs with an extra parameter to be dt_statfs_info; - lod_statfs_and_check does not need an extra parameter and to be static again; - move asserts to a better place; - test component-add with wrong parameters; - print out the layout sanity errors wherever needed; - make an array of layout_sanity errors; - an HSM sanity test is added; and one defect: - the last component cannot be 0-length; Signed-off-by: Vitaly Fertman <c17818@cray.com> Change-Id: If832579ce27cb6ab87d36a594c04363deaea8711 Reviewed-on: https://review.whamcloud.com/35414 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Alexey Lyashkov <c17817@cray.com> Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-10070 lod: SEL: Implement basic spillover space This is a barebones implementation of spillover space. This allows the creation of extendable layout components, which are normal layout components followed by "extension components". These extension components are never initialized, instead, when i/o reaches them, the server checks if there is sufficient space on the preceding normal layout component, and if so, it modifies the extent of the component to give space to the preceding component. If there is not sufficient space on those OSTs, the special extension space component can be removed, and the next component of the layout is moved down to meet the existing component. This allows i/o to "spill over" to this new layout component, which is expected to be on different OSTs. For multi-tiered systems, this makes it possible to avoid the situation where an inner tier is low on space, but an outer tier has plenty, and PFL files cannot use the space in the outer tier because the inner is full. This patch requires the next patch in the series for FLR support, but does not depend on the other subsequent patches in this series. Cray-bug-id: LUS-2528 Signed-off-by: Patrick Farrell <paf@cray.com> Change-Id: I8f6c6df8ee155033d5278535dc456e604552e409 Reviewed-on: https://review.whamcloud.com/33783 Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Alexey Lyashkov <c17817@cray.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-11943 llog: Reset current log on ENOSPC The original LU-10527 patch: "LU-10527 obdclass: don't recycle loghandle upon ENOSPC" https://review.whamcloud.com/#/c/30897/ Kept the current log on ENOSPC. This appears to cause llog corruption on failover, and the other part of the original patch (removing an incorrect assert) should be sufficient to fix the original issue. Fixes: 5761b9576d39 ("LU-10527 obdclass: don't recycle loghandle upon ENOSPC") Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com> Change-Id: Ie5c0ab77940c1be0ec1f166e4d38080b254bed5c Reviewed-on: https://review.whamcloud.com/34347 Tested-by: Jenkins Reviewed-by: Faccini Bruno <bruno.faccini@intel.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com>
LU-11827 llog: protect cathandle in llog_cat_declare_add_rec llog_cat_declare_add_rec() calls llog_cat_prep_log() passing &cathandle->u.chd.chd_current_log and &cathandle->u.chd.chd_next_log. Then it has to protect cathandle in order to avoid race with llog_cat_current_log() when it decides to change cathandle->u.chd.chd_current_log and cathandle->u.chd.chd_next_log. Signed-off-by: Vladimir Saveliev <c17830@cray.com> Cray-bug-id: LUS-6804 Change-Id: I689efb40452af180f137aff35ccabe132a24180a Reviewed-on: https://review.whamcloud.com/33914 Tested-by: Jenkins Reviewed-by: Alexandr Boyko <c17825@cray.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-11924 osp: combine llog cancel operations The osp_sync_process_committed() cancels llog records one by one. For each cancel it does open, transaction, mutex, write, etc. But most cancels belong to a single llog file, so they could be combined. The patch adds functions for cancelling an array of indexes for a llog file, and adds the corresponding behavior and calls to osp_sync_process_committed(). Signed-off-by: Alexander Boyko <c17825@cray.com> Cray-bug-id: LUS-6836 Change-Id: I4f461687021b3f76595d403cdd0bb6aba8d93b53 Reviewed-on: https://review.whamcloud.com/34179 Tested-by: Jenkins Reviewed-by: Sergey Cheremencev <c17829@cray.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andriy Skulysh <c17819@cray.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
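The batching idea behind LU-11924 is to group pending cancels by llog file so that each file is opened and written once. The following Python sketch shows only that grouping step; `group_cancels` and the `(llog_id, index)` pair layout are hypothetical, and the real patch operates on llog cookies inside osp_sync_process_committed().

```python
from collections import defaultdict


def group_cancels(cancels):
    """Group (llog_id, record_index) pairs into one index array per llog."""
    grouped = defaultdict(list)
    for llog_id, idx in cancels:
        grouped[llog_id].append(idx)
    return dict(grouped)


pending = [("A", 1), ("A", 2), ("B", 7), ("A", 3)]
batches = group_cancels(pending)
# One open/transaction/write cycle per llog file instead of per record.
operations_before, operations_after = len(pending), len(batches)
```

For the four pending cancels above, two batched operations replace four individual ones; the saving grows with the share of cancels that hit the same llog file.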