LU-16314 obdclass: Migrate LASSERTF %p to %px

This change covers lustre/obdclass through lustre/target and converts LASSERTF statements to explicitly use %px.

Use %px to explicitly report the non-hashed pointer value in messages printed when a kernel panic is imminent. When analyzing a crash dump, the associated kernel address can be used to determine the system state that led to the crash.

As crash dumps can be, and are, provided by customers from production systems, using the kernel command-line parameter no_hash_pointers is not always possible.

Ref: Documentation/core-api/printk-formats.rst

Test-Parameters: trivial
HPE-bug-id: LUS-10945
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ia256dc1f74f976640ec82746a5d761ef662f45ae
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49405
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-16796 target: Change struct top_multiple_thandle to use kref

This patch changes struct top_multiple_thandle to use kref (refcount_t) instead of atomic_t.

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I5892e5ab14ea6570645e6395af6d8a0d2c325398
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51922
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
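The kref pattern this commit adopts can be sketched in userspace with C11 atomics: the creator holds the first reference, and the put that drops the count to zero runs a release callback. All names here (`kref_sketch`, `tmt_sketch`, `tmt_release`) are hypothetical stand-ins, not the kernel API, and unlike the kernel's refcount_t this sketch does not saturate or warn on overflow/underflow.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct kref (which wraps refcount_t). */
struct kref_sketch {
	atomic_uint refcount;
};

static void kref_sketch_init(struct kref_sketch *k)
{
	atomic_init(&k->refcount, 1);	/* creator holds the first reference */
}

static void kref_sketch_get(struct kref_sketch *k)
{
	atomic_fetch_add(&k->refcount, 1);
}

/* Drop one reference; on the last put, call release() and return true. */
static bool kref_sketch_put(struct kref_sketch *k,
			    void (*release)(struct kref_sketch *))
{
	if (atomic_fetch_sub(&k->refcount, 1) == 1) {
		release(k);
		return true;
	}
	return false;
}

/* Hypothetical stand-in for struct top_multiple_thandle. */
struct tmt_sketch {
	struct kref_sketch tmt_refcount;
	int released;
};

static void tmt_release(struct kref_sketch *k)
{
	struct tmt_sketch *tmt = (struct tmt_sketch *)((char *)k -
			offsetof(struct tmt_sketch, tmt_refcount));

	tmt->released = 1;	/* a real release callback would free tmt */
}
```

The release callback receives the embedded kref and recovers the enclosing object, which is why the counter lives inside the structure it protects.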
LU-12610 target: remove OBD_ -> CFS_ macros

Remove OBD macros that are simply redefinitions of CFS macros.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Signed-off-by: Ben Evans <beevans@whamcloud.com>
Change-Id: I97e3f74d72d41558f293567b4609fa37aaa3b13d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51123
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
LU-9859 libcfs: use round_up directly

The macro cfs_size_round() is just round_up(val, 8). Replace cfs_size_round() with the Linux standard round_up().

Change-Id: I5a5ba4e663672c0b0bba5c99be9e0bece2dc3c87
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50545
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
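The equivalence claimed above can be checked in userspace. The sketch below rebuilds round_up() with the same construction the kernel uses (`((x - 1) | (y - 1)) + 1`, valid only for power-of-two `y`) and compares it against an 8-byte rounding helper written from the commit's description; `round_mask_sk` and `size_round8_sketch` are hypothetical names, not Lustre's original definitions.

```c
#include <stddef.h>

/* Same construction as the Linux round_up(); y must be a power of two.
 * __typeof__ is a GCC/Clang extension, as used by the kernel. */
#define round_mask_sk(x, y)	((__typeof__(x))((y) - 1))
#define round_up(x, y)		((((x) - 1) | round_mask_sk(x, y)) + 1)

/* Hypothetical helper: round up to a multiple of 8, which is what the
 * commit says cfs_size_round() computed. */
static inline size_t size_round8_sketch(size_t val)
{
	return (val + 7) & ~(size_t)7;
}
```

For example, `round_up((size_t)13, 8)` computes `(12 | 7) + 1 = 16`, the same result as `(13 + 7) & ~7`.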
LU-16609 target: top_trans_create cannot alloc memory

top_trans_create() requests memory with only __GFP_IO, which does not allow direct reclaim. However, if the memory shortage is temporary, direct reclaim is reasonable. GFP_NOFS is __GFP_IO plus the reclaim bits.

Change-Id: I2c84d9d74188660063c948573780745a2b59a688
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
HPE-bug-id: LUS-11293
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50176
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
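The flag arithmetic behind this fix can be illustrated with a small model. The composition follows the kernel's definitions (`__GFP_RECLAIM` is direct plus kswapd reclaim, `GFP_NOFS` is `__GFP_RECLAIM | __GFP_IO`, `GFP_KERNEL` adds `__GFP_FS`), but the bit values and the `SK_` names are hypothetical placeholders, since the real values vary between kernel versions.

```c
#include <stdbool.h>

/* Hypothetical bit values for illustration only; real ___GFP_* values differ. */
#define SK_GFP_IO		0x1u
#define SK_GFP_FS		0x2u
#define SK_GFP_DIRECT_RECLAIM	0x4u
#define SK_GFP_KSWAPD_RECLAIM	0x8u

/* Composition mirrors the kernel's definitions. */
#define SK_GFP_RECLAIM	(SK_GFP_DIRECT_RECLAIM | SK_GFP_KSWAPD_RECLAIM)
#define SK_GFP_NOFS	(SK_GFP_RECLAIM | SK_GFP_IO)
#define SK_GFP_KERNEL	(SK_GFP_RECLAIM | SK_GFP_IO | SK_GFP_FS)

/* May the allocator reclaim memory synchronously in the caller's context? */
static bool may_direct_reclaim(unsigned int gfp)
{
	return gfp & SK_GFP_DIRECT_RECLAIM;
}

/* May reclaim recurse into filesystem code (unsafe while holding fs locks)? */
static bool may_enter_fs(unsigned int gfp)
{
	return gfp & SK_GFP_FS;
}
```

A bare `__GFP_IO` request fails `may_direct_reclaim()`, while `GFP_NOFS` passes it yet still keeps reclaim out of filesystem code, which is why it suits a transaction-creation path.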
LU-15646 llog: correct llog FID and path output

- fix the wrong LLOG_ID-to-FID conversion when printing llog FIDs by introducing a PLOGID macro that expands the llog ID for the DFID format
- stop printing lgl_ogen along with the llog FID, as it has always been zero since 2.3.51 and is no longer used
- output the correct path for update llogs in llog_reader
- always print header info in llog_reader if available
- print llog flags in header info

Fixes: 5a8e47d0a1a7 ("LU-9153 llog: update llog print format to use FIDs")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I7ba49e8101a67d2d80c204a5fc629bfd0bce89ad
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48430
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
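The "expand for DFID" idea pairs one format string with a macro that expands an identifier into exactly the printf arguments that format expects. The sketch below is modelled on Lustre's DFID/PFID convention, but `fid_sketch`, `logid_sketch` and this `PLOGID` are hypothetical; in particular it assumes the llog ID already holds a FID, so it does not show the actual LLOG_ID-to-FID conversion the commit fixes.

```c
#include <stdio.h>

/* Format pieces modelled on Lustre's DFID convention. */
#define DFID_NOBRACE	"%#llx:0x%x:0x%x"
#define DFID		"[" DFID_NOBRACE "]"

/* Hypothetical stand-in for struct lu_fid. */
struct fid_sketch {
	unsigned long long f_seq;
	unsigned int f_oid;
	unsigned int f_ver;
};

/* Expands a FID pointer into the three arguments DFID consumes. */
#define PFID(fid)	(fid)->f_seq, (fid)->f_oid, (fid)->f_ver

/* Hypothetical llog ID wrapper, assumed here to embed a FID directly. */
struct logid_sketch {
	struct fid_sketch lgl_fid;
};

/* PLOGID-style macro: callers print a log ID without converting it by hand. */
#define PLOGID(id)	PFID(&(id)->lgl_fid)
```

With `seq = 0x2, oid = 0x400024d0, ver = 0x2`, `snprintf(buf, sizeof(buf), DFID, PLOGID(&id))` yields `[0x2:0x400024d0:0x2]`, the same shape as the FID in the LU-13974 log message.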
LU-15420 build: fixes to support building on Ubuntu 22.04 LTS

Lustre uses the glibc stdarg.h instead of the kernel's version, which causes the following build issue:

lustre/include/lu_object.h:35, /usr/lib/gcc/x86_64-linux-gnu/11/include/stdarg.h:52: note: this is the location of the previous definition
#define va_copy(d,s) __builtin_va_copy(d,s)

The solution is to use the kernel's version of stdarg.h.

The second build issue:

update_trans.c:1608:30: error: ‘struct task_struct’ has no member named ‘state’; did you mean ‘__state’?

is due to Linux commit 2f064a59a11ff9bc22e52e9678bc601404c7cb34 (sched: Change task_struct::state). The state field was renamed, and the READ_ONCE()/WRITE_ONCE() macros are now used to access it, which is the proper thing to do. Since the check in update_trans.c is equivalent to testing whether the kernel thread is not running (TASK_RUNNING == 0), we can simply change the code to use task_is_running(), which was introduced in 5.13.

Test-Parameters: trivial
Change-Id: Ib5985b187c3013fbc513e9962a5f27bed4996f5b
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/47133
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
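The task_is_running() substitution can be modelled outside the kernel. This sketch uses a cut-down, hypothetical `task_sketch` with the renamed `__state` field and approximates READ_ONCE() with a single volatile load; the state constants match the kernel's values (TASK_RUNNING is 0), but nothing here is the kernel's actual header.

```c
#include <stdbool.h>

#define TASK_RUNNING		0x0000	/* 0 in the kernel, as the commit notes */
#define TASK_INTERRUPTIBLE	0x0001
#define TASK_UNINTERRUPTIBLE	0x0002

/* Userspace approximation of READ_ONCE(): one volatile load. */
#define READ_ONCE_SK(x)	(*(const volatile __typeof__(x) *)&(x))

/* Hypothetical cut-down task_struct after the state -> __state rename. */
struct task_sketch {
	unsigned int __state;
};

/* Same shape as the kernel's task_is_running() helper. */
#define task_is_running(task)	(READ_ONCE_SK((task)->__state) == TASK_RUNNING)

/* Old check was "p->state != 0"; the new form is "!task_is_running(p)". */
static bool thread_not_running(struct task_sketch *p)
{
	return !task_is_running(p);
}
```

Because TASK_RUNNING is 0, `state != 0` and `!task_is_running()` are the same test; the helper simply hides the field name and the READ_ONCE() access.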
LU-15645 obdclass: llog to handle gaps due to old errors

An update llog can contain gaps in its index. This shouldn't block llog processing and recovery; actual gaps in the transaction sequence should be caught by VBR.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I11ec817e356f9658118c34706ef3a533e7faba83
Reviewed-on: https://review.whamcloud.com/46837
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
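A gap-tolerant record walk might look like the sketch below: when an index jumps forward, the gap is reported and processing resumes from the new index instead of aborting. The record type and `process_llog_sketch()` are hypothetical; treating a backwards index as fatal is an assumption of this sketch, not something the commit states.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical llog record: only the index matters here. */
struct rec_sketch {
	int index;
};

/*
 * Process records in order. A forward jump (a gap left by an old error)
 * is reported and skipped over; a backwards index is still rejected.
 * Returns the number of records processed or -ERANGE.
 */
static int process_llog_sketch(const struct rec_sketch *recs, int nr,
			       int first_index, int *gaps)
{
	int expected = first_index;
	int processed = 0;

	*gaps = 0;
	for (int i = 0; i < nr; i++) {
		if (recs[i].index < expected)
			return -ERANGE;		/* going backwards: corruption */
		if (recs[i].index > expected) {
			fprintf(stderr, "gap: expected %d, got %d; continuing\n",
				expected, recs[i].index);
			(*gaps)++;
		}
		/* ... apply the record here ... */
		processed++;
		expected = recs[i].index + 1;
	}
	return processed;
}
```

Ordering of the transactions themselves is left to VBR, as the commit says; this layer only avoids stopping recovery on a hole in the index.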
LU-13974 tests: update log corruption

The test case reproduces a missing object for a sub-transaction during a set xattr operation. The first setattr got -2; the second had already started but hadn't called llog_add yet. In this case the llog OSP object is stale after top_trans_start, so the declaration phase cannot refresh the llogs. Then, in llog_osd_write_rec, the OSP object changes from stale to valid (dt_attr_get), but the llog handle and llog header are invalid, so a new record is added to the update log with the wrong index. In that case, processing of the update log fails with:

fs1-MDT0001-osp-MDT0003: [0x2:0x400024d0:0x2] Invalid record: index 112926 but expected 112925
lod_sub_recovery_thread()) fs1-MDT0001-osp-MDT0003 get update log failed: rc = -34

Recovery is aborted, and clients are evicted.

HPE-bug-id: LUS-9030
Test-Parameters: testlist=sanity envdefinitions=ONLY="427"
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I6a47fed1bc01f4be62216d1d0787adc413df0cf5
Reviewed-on: https://review.whamcloud.com/40743
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-13986 target: fix possible liveloop in distribute_txn thd

A recent patch to update_trans.c changed how distribute_txn_thread() waited for more work to do. It previously had an explicit wait_event() that listed all the conditions to wait for; it would then recheck each condition and possibly perform the appropriate action.

It was changed to check each condition only once per loop. If a condition was true, the action would be performed and a flag set. If no condition was true (as indicated by the flag), it would wait; otherwise it would loop and recheck all conditions.

One of the "if (condition) { do work }" stanzas in the loop tested a condition that should *not* wake up the loop: "batchid" was not tested at all in the wait_event(). The flag mentioned above was, however, set when that condition tested true. This can cause the loop to spin indefinitely.

So remove the "__set_current_state(TASK_RUNNING);" from that stanza, so that the value of batchid cannot stop the loop from sleeping (calling schedule()).

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I124ee3e8250dc63fa927f72dc4d29ed3e7b53005
Reviewed-on: https://review.whamcloud.com/40043
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
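The live loop can be reproduced with a deterministic model of the pattern described above: each pass starts "idle", real work marks the pass "running", and the loop sleeps only if nothing did. `loop_sk` and `run_until_sleep()` are hypothetical; the sketch assumes the batchid condition can stay true across passes, and returning from the function stands in for schedule() actually sleeping.

```c
#include <stdbool.h>

enum state_sk { SK_IDLE, SK_RUNNING };

/* Hypothetical model of the distribute_txn_thread() loop inputs. */
struct loop_sk {
	int pending_work;	/* a real wake-up condition, consumed as handled */
	bool batchid_cond;	/* NOT a wake-up condition; may stay true */
};

/*
 * Returns the pass on which the loop would sleep, or max_passes if it
 * never does (a live loop).
 */
static int run_until_sleep(struct loop_sk *l, bool buggy, int max_passes)
{
	for (int pass = 1; pass <= max_passes; pass++) {
		enum state_sk state = SK_IDLE;	/* set_current_state(TASK_IDLE) */

		if (l->pending_work > 0) {
			l->pending_work--;
			state = SK_RUNNING;	/* did work: go round again */
		}
		if (l->batchid_cond && buggy)
			state = SK_RUNNING;	/* the line the commit removes */
		if (state == SK_IDLE)
			return pass;		/* schedule() would sleep here */
	}
	return max_passes;
}
```

With three work items and a sticky batchid condition, the fixed loop sleeps on pass 4, while the buggy variant never sleeps and runs until the cap.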
LU-6142 lustre: convert some container_of0 to container_of

Each of these calls to container_of0() can be determined from local context to be passed a valid pointer, so it is best to use container_of() directly to make this clear. Either:
- the returned pointer is dereferenced without being tested, or
- the passed-in pointer is dereferenced before the call, or
- the passed-in pointer cannot be NULL, such as when it is the '.next' of a list_head or is returned by lu_object_next()

So convert all of these to container_of(), except one which *should* be container_of() but cannot be, as it won't compile cleanly on older kernels. Change that one to container_of_safe() with a big comment.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Idcd954f89ed366882563810ce042a5ddaba5a1e5
Reviewed-on: https://review.whamcloud.com/38383
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
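The difference between the two macro styles can be sketched in userspace. `container_of()` below uses the same offsetof arithmetic as the kernel macro but without its type checking, and `container_of_or_null()` is a simplified, hypothetical NULL-tolerant variant in the spirit of container_of0()/container_of_safe() (the kernel's container_of_safe() also passes ERR_PTR values through, which this sketch omits).

```c
#include <stddef.h>

/* Recover the enclosing struct from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified NULL-tolerant variant; note that it evaluates ptr twice. */
#define container_of_or_null(ptr, type, member) \
	((ptr) == NULL ? (type *)NULL : container_of(ptr, type, member))

struct list_head_sk {
	struct list_head_sk *next, *prev;
};

/* Hypothetical enclosing object linked through an embedded list head. */
struct object_sk {
	int id;
	struct list_head_sk link;
};
```

When the pointer is known to be valid (for example a list's `.next`), the extra NULL test only obscures that fact, which is the commit's argument for plain container_of().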
LU-12780 target: don't use ptlrpc_thread for txn_commit_thread

Rather than ptlrpc_thread, use native kthreads functionality.
- There is no need to synchronize on startup; the initialization can be done before the thread is started. This requires adding an lu_env to struct target_distribute_txn_data so distribute_txn_init() can set it up before starting the thread.
- Correspondingly, the cleanup is best done outside of the thread too, as it is possible for kthread_stop() to stop a thread before its function is called even once. So the lu_env_fini() is moved to distribute_txn_fini(), and ->tdtd_list is cleaned up there too, just in case the thread didn't have a chance to run.
- kthread_stop()/kthread_should_stop() are used to synchronize shutdown.
- Signaling the thread is done with wake_up_process(). The thread sets TASK_IDLE at the top of the loop, then sets TASK_RUNNING if it finds anything to do, and finally calls schedule() at the end. This makes tdtd_ready_for_cancel_log() unnecessary, as it just duplicates checks that are already present in distribute_txn_commit_thread().

Change-Id: I06c3686b90faa6c6b638b8d6c69cd4e05c2783f4
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/36260
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
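The init/fini placement described above can be sketched without real threads: everything the thread needs is prepared before it would be started, and fini drains the work list itself because the thread may never have run. All names here are hypothetical stand-ins for target_distribute_txn_data and its helpers; comments mark where kthread_run(), kthread_stop(), lu_env_init() and lu_env_fini() would sit.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical queued item; a singly linked list keeps the sketch short. */
struct txn_item_sk {
	struct txn_item_sk *next;
};

/* Hypothetical stand-in for struct target_distribute_txn_data. */
struct tdtd_sk {
	bool env_ready;			/* models the embedded lu_env */
	struct txn_item_sk *list;	/* models ->tdtd_list */
	bool thread_started;
};

/* All setup happens before the thread starts: no startup handshake. */
static void dtxn_init_sk(struct tdtd_sk *d)
{
	d->env_ready = true;		/* lu_env_init() */
	d->list = NULL;
	d->thread_started = true;	/* kthread_run() would go here */
}

static void dtxn_queue_sk(struct tdtd_sk *d, struct txn_item_sk *it)
{
	it->next = d->list;
	d->list = it;
}

/* Cleanup lives here, not in the thread: the thread may never have run. */
static int dtxn_fini_sk(struct tdtd_sk *d)
{
	int freed = 0;

	d->thread_started = false;	/* kthread_stop() would go here */
	while (d->list) {
		struct txn_item_sk *it = d->list;

		d->list = it->next;
		free(it);
		freed++;
	}
	d->env_ready = false;		/* lu_env_fini() */
	return freed;
}
```

Keeping setup and teardown in the caller means the thread body only has to loop until kthread_should_stop(), with no special cases for an early stop.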
LU-9679 lustre: use LIST_HEAD() for local lists.

When declaring a local list head, instead of

struct list_head list;
INIT_LIST_HEAD(&list);

use

LIST_HEAD(list);

which does both steps.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I67bda77c04479e9b2b8c84f02bfb86d9c2ef5671
Reviewed-on: https://review.whamcloud.com/36955
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
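Why one macro can replace the declaration plus initialization: LIST_HEAD() declares the variable with an initializer whose next and prev point at the variable itself, which is exactly what INIT_LIST_HEAD() does at run time. The userspace copies below follow the shape of the kernel's list.h definitions (the kernel's INIT_LIST_HEAD() additionally uses WRITE_ONCE(), omitted here).

```c
#include <stdbool.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Shape of the kernel macros: an empty list points at itself. */
#define LIST_HEAD_INIT(name)	{ &(name), &(name) }
#define LIST_HEAD(name)		struct list_head name = LIST_HEAD_INIT(name)

static inline void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

static inline bool list_empty(const struct list_head *head)
{
	return head->next == head;
}
```

Both forms leave an empty, self-referencing head; LIST_HEAD() just does it in the declaration, so there is no window where the head is uninitialized.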
LU-10467 lustre: use wait_event_idle() where appropriate.

When l_wait_event() is passed an 'lwi' which is initialised to all zeroes, it behaves exactly like wait_event_idle():
- no timeout
- not interrupted by any signal
- doesn't add to load average.

So change all these instances to wait_event_idle(), or, in two cases, to wait_event_idle_exclusive().

There are three ways that lwi gets set to all zeros:

struct l_wait_info lwi = { 0 };
lwi = LWI_INTR(NULL, NULL);
memset(&lwi, 0, sizeof(lwi));

Change-Id: Ia6723cbe248ce067331a002e5e9d54796739c08a
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/35971
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-11418 osd-zfs: call stop_cb if transaction start fail

osd_trans_stop() should call osd_trans_stop_cb() if the transaction was not successfully started. Improve debug messages for distributed transactions. Add sanity test 416 for this. Get rid of ot_write_commit, which is useless.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I35da81ebd2c9e97c12ae52bd4faed60393cd67d6
Reviewed-on: https://review.whamcloud.com/33248
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
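The rule "stop always runs the stop callback, even after a failed start" can be sketched as follows. `th_sk`, `trans_start_sk()` and `trans_stop_sk()` are hypothetical and do not mirror osd_trans_stop()'s real signature; the point is that whoever waits on the callback (for example a distributed transaction tracking its sub-transactions) is told about the failure instead of waiting forever.

```c
#include <errno.h>

/* Hypothetical transaction handle with one registered stop callback. */
struct th_sk {
	int started;
	int (*stop_cb)(struct th_sk *th, int rc);
	int cb_calls;
	int cb_rc;
};

static int record_stop(struct th_sk *th, int rc)
{
	th->cb_calls++;
	th->cb_rc = rc;
	return 0;
}

/* Simulated start; inject_rc != 0 makes it fail. */
static int trans_start_sk(struct th_sk *th, int inject_rc)
{
	if (inject_rc)
		return inject_rc;
	th->started = 1;
	return 0;
}

/* Stop runs the callback unconditionally, passing the start error if any. */
static int trans_stop_sk(struct th_sk *th, int start_rc)
{
	int rc = th->started ? 0 : start_rc;

	if (th->stop_cb)
		th->stop_cb(th, rc);
	/* ... only a started transaction would be committed here ... */
	return rc;
}
```

Passing the start error through the callback lets the waiter distinguish "stopped after failure" from a normal stop.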
LU-11301 target: add lock in sub_trans_stop_cb()

The sub_trans st_committed and st_stopped flags are different bits of the same memory word, so both of them should be set under the lock. sub_trans_stop_cb() doesn't take the lock, so one update may overwrite the other.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ic3d7d7281b3cf9bd20702be944e14f35200318f1
Reviewed-on: https://review.whamcloud.com/33169
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
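Why two flags in one word need a common lock: setting a bitfield is a read-modify-write of the whole storage unit. The sketch uses an explicit flags word (hypothetical `ST_COMMITTED`/`ST_STOPPED` bits) instead of C bitfields so the interleaving can be written out step by step, and a C11 `atomic_flag` spinlock stands in for whatever lock the real code uses.

```c
#include <stdatomic.h>

/* Hypothetical bits modelling st_committed and st_stopped sharing a word. */
#define ST_COMMITTED	0x1u
#define ST_STOPPED	0x2u

/*
 * Replay two unlocked read-modify-write updates, interleaved by hand:
 * both "threads" read the old word before either writes it back.
 */
static unsigned int interleaved_unlocked(void)
{
	unsigned int word = 0;
	unsigned int a = word;		/* thread A reads */
	unsigned int b = word;		/* thread B reads */

	word = a | ST_COMMITTED;	/* A writes back */
	word = b | ST_STOPPED;		/* B writes back: A's bit is lost */
	return word;
}

/* A lock around each update serialises the read-modify-write. */
static atomic_flag lock_sk = ATOMIC_FLAG_INIT;

static void set_flag_locked(unsigned int *word, unsigned int bit)
{
	while (atomic_flag_test_and_set(&lock_sk))
		;			/* spin until the lock is free */
	*word |= bit;
	atomic_flag_clear(&lock_sk);
}
```

The unlocked replay ends with only `ST_STOPPED` set, which is the "overwritten" outcome the commit describes; if every setter takes the same lock, that interleaving cannot happen.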
LU-10308 misc: update Intel copyright messages for 2017

Update copyright messages for files updated in 2016, excluding trivial patches. Add trivial patches to the updatecw.sh script exclude list.

Revert some changes that were incorrectly attributed to the 2016 (d10200a80770f0029d1d665af954187b9ad883df) and 2015 (0754bc8f2623bea184111af216f7567608db35b6) copyright update patches themselves, since they were not in the exclude list when the subsequent script was run.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I82f21c30c4dac75792bb49fc139bee2ca51f5545
Reviewed-on: https://review.whamcloud.com/30341
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
LU-7251 osp: do not assign commit callback to every thandle

With OSP there is a risk of getting a lot of commit callbacks: say, 10K unlinks/sec on 4-striped files could result in 4*10K*5 = 200K commit callbacks. This patch implements another scheme: every OSP registers its own callback every second. This should result in 4*5 commit callbacks in the same situation. In case of a forced sync, the commit callback is registered unconditionally.

The patch removes th_tags and th_ctx from struct thandle as they are not used anymore. This eliminates 3 allocations from every transaction:

(lu_object.c:1714:keys_init()) kmalloced 'ctx->lc_value': 320
(update_records.c:1217:update_key_init()) kmalloced 'value': 408
(osp_dev.c:1807:osp_txn_key_init()) kmalloced 'value': 4

Change-Id: I460d5eccb585b166423d84d5c142af2e27751d8b
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-on: https://review.whamcloud.com/17270
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
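The once-per-second registration scheme can be sketched as a simple throttle: an OSP registers a commit callback only when the current second differs from the last one it registered in, unless a forced sync demands one. `osp_sk` and `osp_txn_sk()` are hypothetical names; how the real OSP code tracks time is not described in the commit.

```c
#include <stdbool.h>

/* Hypothetical per-OSP state. */
struct osp_sk {
	long last_cb_sec;	/* -1: no callback registered yet */
	int registrations;
};

/* Register at most one commit callback per second; always on forced sync. */
static void osp_txn_sk(struct osp_sk *osp, long now_sec, bool force_sync)
{
	if (force_sync || osp->last_cb_sec != now_sec) {
		osp->registrations++;	/* register the commit callback */
		osp->last_cb_sec = now_sec;
	}
}
```

Driving one such OSP with 10,000 transactions per second for 5 seconds registers 5 callbacks instead of 50,000; across 4 OSPs that is the 4*5 figure quoted above.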
LU-9848 llog: check padding size for update reclen

The update log only checks the padding size for the split case; this should also be done when it is less than the chunk size.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: Ie7819f67dd9bcbfb060713bb208c9777420c5178
Reviewed-on: https://review.whamcloud.com/28554
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
LU-8746 update: restore tdtd_refcount during failure

During batchid_update, tdtd_refcount should be restored if an error happens; otherwise tdtd_refcount will never reach 0, which causes the distribute thread to hang during umount.

Change the distribute thread name to "dist_txn".

Signed-off-by: Di Wang <di.wang@intel.com>
Change-Id: I585cc4ceb37a7f3ddaf38201306e0a331fb43e74
Reviewed-on: https://review.whamcloud.com/26888
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
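The fix follows the usual "undo on the error path" rule for references: a count taken for an in-flight operation must be dropped when that operation fails, or a waiter for zero (here, umount) blocks forever. The sketch uses hypothetical names and a plain int where the real counter would be atomic; that a successful update's reference is dropped later by a commit callback is an assumption of this sketch.

```c
#include <errno.h>

/* Hypothetical stand-in for target_distribute_txn_data. */
struct tdtd_ref_sk {
	int tdtd_refcount;	/* umount waits for this to reach 0 */
};

/* Simulated batchid write; inject_rc != 0 makes it fail. */
static int write_batchid_sk(int inject_rc)
{
	return inject_rc;
}

static int batchid_update_sk(struct tdtd_ref_sk *t, int inject_rc)
{
	int rc;

	t->tdtd_refcount++;		/* reference for the in-flight update */
	rc = write_batchid_sk(inject_rc);
	if (rc) {
		t->tdtd_refcount--;	/* the fix: restore on failure */
		return rc;
	}
	/* assumed: on success the reference is dropped at commit time */
	return 0;
}
```

Without the decrement, each failed update would leak one reference and the count could never return to 0.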