LU-14535 quota: get all quota info in LFS

This patch adds the option "-a" to LFS to get the quota info of all
quota IDs. It iterates over the quota settings saved in the global
quota setting files "quota_master/md-0x0" and "quota_master/dt-0x0"
from the QMT, and over the quota usage info saved in the acct quota
files in the backend FS (LDiskFS or ZFS) from the QSDs. It then merges
the two kinds of quota info on the client and prints the result in a
format similar to "lfs quota -u|-g|-p".

  $lfs quota -a -u /mnt/lustre
  Filesystem /mnt/lustre, Disk usr quotas
  quota_id   kbytes  quota   limit  grace  files  quota   limit  grace
  root         9684      0       0      -   1019      0       0      -
  bin             4      0  102400      -      1      0   10240      -
  daemon          4      0  102400      -      1      0   10240      -
  adm             4      0  102400      -      1      0   10240      -
  lp              4      0  102400      -      1      0   10240      -
  sync            4      0  102400      -      1      0   10240      -
  shutdown        4      0  102400      -      1      0   10240      -
  halt            4      0  102400      -      1      0   10240      -
  mail            4      0  102400      -      1      0   10240      -

  $lfs quota -a -g /mnt/lustre
  Filesystem /mnt/lustre, Disk grp quotas
  quota_id   kbytes  quota   limit  grace  files  quota   limit  grace
  root         9684      0       0      -   1019      0       0      -
  bin             4      0  204800      -      1      0   20480      -
  daemon          4      0  204800      -      1      0   20480      -
  adm             4      0  204800      -      1      0   20480      -
  lp              4      0  204800      -      1      0   20480      -
  sync            4      0  204800      -      1      0   20480      -
  shutdown        4      0  204800      -      1      0   20480      -
  halt            4      0  204800      -      1      0   20480      -
  mail            4      0  204800      -      1      0   20480      -

This patch also fixes a deadlock in qmt_pool_recalc. The rw_semaphore
"qmt_pool_info.qpi_sarr.osts.op_rw_sem" is already held in read mode by
qmt_pool_recalc, but it was acquired once more (also in read mode) in
qmt_seed_glbe_all. That second acquisition gets stuck if a write-mode
acquisition from another thread is pending.

  qsd_reint_qpool D
  Call Trace:
   schedule+0x29/0x70
   rwsem_down_read_failed+0x105/0x1c0
   call_rwsem_down_read_failed+0x18/0x30
   down_read+0x20/0x40
   qmt_seed_glbe_all+0x3a0/0x800 [lquota]
   qmt_site_recalc_cb+0x3c7/0x800 [lquota]
   cfs_hash_for_each_tight+0x11e/0x330
   cfs_hash_for_each+0x10/0x20 [libcfs]
   qmt_pool_recalc+0x9fc/0x1310 [lquota]

  llog_process_th D
  Call Trace:
   schedule+0x29/0x70
   rwsem_down_write_failed+0x215/0x3c0
   call_rwsem_down_write_failed+0x17/0x30
   down_write+0x2d/0x3d
   lu_tgt_pool_remove+0x36/0x1e0 [obdclass]
   qmt_pool_add_rem+0x655/0x920 [lquota]
   qmt_pool_rem+0x10/0x20 [lquota]
   lod_pool_remove_q+0xd6/0x1d0 [lod]
   class_process_config+0x16f2/0x2b20
   class_config_llog_handler+0x839/0x1540
   llog_process_thread+0x913/0x1c10
   llog_process_thread_daemonize+0x9f/0xe0

Test-Parameters: testlist=sanity-quota env=SLOW=yes,ONLY=49,NUM_QIDS=20000
Change-Id: I08feb928fbf34635ec9c5c341de993c718798dc9
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/42098
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>

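The client-side merge described in LU-14535 can be pictured as a merge of two ID-sorted lists, one carrying limits and one carrying usage. The following is only a minimal sketch under that assumption: the struct and function names (sketch_limit, sketch_usage, sketch_quota, sketch_merge) are hypothetical and are not taken from the patch, and the real code may order or join the records differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record types; not the structures used by the patch. */
struct sketch_limit { uint32_t id; uint64_t hard_kb; };   /* from the QMT side */
struct sketch_usage { uint32_t id; uint64_t used_kb; };   /* from the QSD side */
struct sketch_quota { uint32_t id; uint64_t used_kb; uint64_t hard_kb; };

/*
 * Merge two arrays that are both sorted by ascending id.
 * An id present in only one input keeps 0 for the missing value.
 * 'out' must have room for nl + nu entries; returns the entry count.
 */
static size_t sketch_merge(const struct sketch_limit *lim, size_t nl,
                           const struct sketch_usage *use, size_t nu,
                           struct sketch_quota *out)
{
    size_t i = 0, j = 0, n = 0;

    while (i < nl || j < nu) {
        struct sketch_quota q = { 0, 0, 0 };

        if (j >= nu || (i < nl && lim[i].id < use[j].id)) {
            /* limit without usage */
            q.id = lim[i].id;
            q.hard_kb = lim[i].hard_kb;
            i++;
        } else if (i >= nl || use[j].id < lim[i].id) {
            /* usage without limit */
            q.id = use[j].id;
            q.used_kb = use[j].used_kb;
            j++;
        } else {
            /* same id on both sides */
            q.id = lim[i].id;
            q.hard_kb = lim[i].hard_kb;
            q.used_kb = use[j].used_kb;
            i++;
            j++;
        }
        out[n++] = q;
    }
    return n;
}
```

With limits for IDs 1 and 2 and usage for IDs 0 and 1, the sketch yields three rows, which mirrors how "root" in the sample output shows usage but no limit.
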
LU-16043 osc: allow error for write on CL_FSYNC_DISCARD

In the case of CL_FSYNC_DISCARD, an error is allowed for a write of an
osc object. Otherwise, the included test fails in rm with:

  (osc_page.c:174:osc_page_delete()) Trying to teardown failed: -16
  (osc_page.c:175:osc_page_delete()) ASSERTION( 0 ) failed:
  (osc_page.c:175:osc_page_delete()) LBUG

Test-Parameters: trivial testlist=sanity env=ONLY=907
HPE-bug-id: LUS-10410
Signed-off-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Change-Id: I0aae0dc470ba0371964e7643a6d84b19a1b4e106
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48032
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-16965 obd: remove unused obd_evict_inprogress

Remove the atomic_t struct field obd_evict_inprogress from
'struct obd_device'. This field was only ever incremented in an unused
function that was removed in a previous patch. Hence, remove it
altogether.

This patch also removes the associated wait queue.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Id151c1e6a0adde8c1aeb6dbc903b9d98d00fd21d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51681
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>

LU-8191 target: convert functions to static

Static analysis shows that a number of functions could be made static.
This patch declares several functions in target static.

Also, remove an unused function tgt_obd_log_cancel(), and add some
headers where they were missing.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I1823df3562cb181b275788560166c92b63483637
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51475
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-12610 target: remove OBD_ -> CFS_ macros

Remove OBD macros that are simply redefinitions of CFS macros.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Signed-off-by: Ben Evans <beevans@whamcloud.com>
Change-Id: I97e3f74d72d41558f293567b4609fa37aaa3b13d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51123
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>

LU-10391 ptlrpc: switch sptlrpc_rule_set_choose to large nid

sptlrpc_rule_set_choose() and sptlrpc_target_choose_flavor() now take a
large nid. Only the net number is needed, so this is quite
straightforward.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Id4e8083d31c0393e2ef748babb6b851501b8d46f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50102
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-16712 cksum: fix generating T10PI guard tags for partial brw page

To get better performance, we allocate a page as the buffer for T10PI
guard tags. We fill the buffer while going over every page of the brw;
when the buffer is considered full, we use
cfs_crypto_hash_update_page() to update the hash and reuse the buffer.

This doesn't work when a page in the brw gets clipped and the checksum
sector size is 512. For a page with PAGE_SIZE of 4096 and an offset of
1024, we end up with 6 guard tags, and won't have enough space at the
very end of the buffer for a full brw page, which needs 8.

Work out the number of guard tags for each page, update the checksum
hash and reuse the buffer when needed.

Change-Id: Ic591e63b24534f2a42b670669520895cb35a9546
Fixes: b1e7be00cb ("LU-10472 osc: add T10PI support for RPC checksum")
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50540
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

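The arithmetic in LU-16712 can be checked with a small model of the guard-tag buffer. This is a sketch under stated assumptions, not the patch itself: it assumes one 16-bit guard tag per 512-byte sector, a 4096-byte tag buffer (2048 tags), and a flush whenever the next page's tags would not fit. The names guard_tags_needed, tag_buf and tag_buf_add_page are hypothetical.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE   4096u  /* page size used in the commit's example */
#define SKETCH_SECTOR_SIZE 512u   /* checksum sector size from the example */
#define SKETCH_TAG_SIZE    2u     /* assumed: one 16-bit guard tag per sector */
#define SKETCH_BUF_TAGS    (SKETCH_PAGE_SIZE / SKETCH_TAG_SIZE)

/* Number of guard tags for 'len' bytes of page data, rounding up. */
static unsigned int guard_tags_needed(unsigned int len)
{
    return (len + SKETCH_SECTOR_SIZE - 1) / SKETCH_SECTOR_SIZE;
}

struct tag_buf {
    unsigned int used;     /* tags currently held in the buffer */
    unsigned int flushes;  /* how often the buffer was hashed and reused */
};

/* Account for one page: flush first if its tags would not fit. */
static void tag_buf_add_page(struct tag_buf *b, unsigned int len)
{
    unsigned int need = guard_tags_needed(len);

    assert(need <= SKETCH_BUF_TAGS);
    if (b->used + need > SKETCH_BUF_TAGS) {
        /* stands in for updating the hash with the buffer contents */
        b->flushes++;
        b->used = 0;
    }
    b->used += need;
}
```

A page clipped at offset 1024 contributes guard_tags_needed(4096 - 1024) = 6 tags. After 255 further full pages the buffer holds 2046 tags, so only 2 slots remain and the next full page (8 tags) has to trigger a flush, which is the situation a fixed "8 tags per page" fullness check would miss.
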
LU-15615 target: Free t10pi crypto state on error

It looks like when an error happens we forget to release the crypto
state, which not only leaks memory directly but can potentially tie up
in-memory pages too.

Change-Id: Ia0870ccbb194e4e9ca8701e1c01d519745c236df
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50539
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>

LU-14139 statahead: batched statahead processing

Batched metadata processing can get a big performance boost. This patch
implements a batched statahead mechanism, which can also increase the
performance of a directory traversal or listing such as the command
'ls'.

For batched statahead, one batched getattr() RPC is equivalent to 'N'
normal lookup/getattr RPCs. It can pack a number of dentry names
obtained from the readdir() call, together with prepared lock handles
in the client-side lock namespace, into one large batched RPC
transferred via bulk I/O, to obtain ibits DLM locks and the associated
attributes for many files in one go.

When the MDS receives a batched getattr() RPC, it executes the sub
requests in it one by one, serially.

A tunable parameter named "statahead_batch_max" is defined. It is the
maximum number of items that can be batched and processed within one
aggregate RPC. Once the number of sub requests exceeds this predefined
limit, the batched RPC is packed and triggered. The batched RPC is also
triggered explicitly when the readdir() call reaches the end position
of the directory or the statahead thread exits abnormally.

The mdtest performance results without/with this patch series are as
follows:

  mdtest-easy-stat  720.562369 kIOPS : time 118.695 seconds
  mdtest-easy-stat 1218.290192 kIOPS : time  70.656 seconds

In this patch, we set statahead_batch_max=0 and disable batched
statahead by default. It will be enabled once some subsequent fixes for
batched RPCs have been merged.

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I5a80c2c377093dc8b8e21341f440e3038f017ca8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/40720
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>

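The triggering rule for the batched RPC in LU-14139 can be modelled with a counter. The sketch below is illustrative only: sketch_batch, sketch_batch_add and sketch_batch_flush are hypothetical names, the flush fires when the pending count reaches the limit (the commit says "exceeds", so the exact boundary is an assumption), and a limit of 0 is treated as "batching disabled, one RPC per entry" in line with the default described above.

```c
/* Hypothetical model of the batching counter; not code from the patch. */
struct sketch_batch {
    unsigned int max;         /* stands in for statahead_batch_max; 0 = off */
    unsigned int pending;     /* sub requests packed but not yet sent */
    unsigned int rpcs_sent;   /* RPCs issued so far */
    unsigned int items_sent;  /* entries covered by those RPCs */
};

/* Send whatever is pending as one batched RPC (no-op when empty). */
static void sketch_batch_flush(struct sketch_batch *b)
{
    if (b->pending == 0)
        return;
    b->rpcs_sent++;
    b->items_sent += b->pending;
    b->pending = 0;
}

/* Queue one directory entry for statahead. */
static void sketch_batch_add(struct sketch_batch *b)
{
    if (b->max == 0) {
        /* batching disabled: one RPC per entry */
        b->rpcs_sent++;
        b->items_sent++;
        return;
    }
    b->pending++;
    if (b->pending >= b->max)
        sketch_batch_flush(b);
}
```

A caller would invoke sketch_batch_add() for each name returned by readdir() and call sketch_batch_flush() once at the end of the directory, matching the explicit trigger mentioned in the commit message. With a limit of 4 and 10 entries this issues 3 RPCs instead of 10.
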
LU-16478 target: disconnected export eviction can race with a
reconnect, and this in turn can lead to a leaked export reference that
prevents further umount: mdt_obd_reconnect() grabs a reference via
nodemap_add_member(). Call obd_disconnect() if such a case is observed,
to balance obd_reconnect().

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I3fd49429ef40ef391d58e042e091258dcb9add72
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50041
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-14393 recovery: reply reconstruction for batched RPCs

Batched RPCs can boost the metadata performance of Lustre dramatically.
However, they also increase the complexity of recovery, such as how to
reconstruct the reply on an RPC resend if the reply was lost.

This patch adds a new field @lrd_batch_idx to the data structure
@lsd_reply_data, which is stored in each slot of the "reply_data" file:

  struct lsd_reply_data {
          __u64 lrd_transno;    /* transaction number */
          __u64 lrd_xid;        /* transmission id */
          __u64 lrd_data;       /* per-operation data */
          __u32 lrd_result;     /* request result */
          __u32 lrd_client_gen; /* client generation */
          __u32 lrd_batch_idx;  /* index in a batched RPC */
          __u32 lrd_padding[7]; /* unused fields */
  };

When a batched RPC is found to be a resent request, and the index of a
sub request in the batched RPC is smaller than or equal to
@lrd_batch_idx in the reply data, the sub request has already been
executed and the server reconstructs the reply for it. If the index is
larger than @lrd_batch_idx, the server re-executes the sub request in
the batched RPC.

Disable conf-sanity/32{a,b,c,d,e,f,g}, 108{a,b} temporarily until the
compatibility issue during upgrade for the new reply data format is
fixed.

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Id48ecc263002cb783f5032642d05e1f3f6673837
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48228
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

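The resend rule in LU-14393 reduces to one comparison per sub request. The sketch below only restates that comparison; the enum and the function sketch_resend_action are hypothetical names, and the real server code naturally does much more (locating the reply slot, rebuilding the reply body, and so on).

```c
#include <stdint.h>

/* What the server does with one sub request of a resent batched RPC. */
enum sketch_action {
    SKETCH_RECONSTRUCT,  /* already executed: rebuild the reply */
    SKETCH_EXECUTE       /* not executed yet: run it again */
};

/*
 * 'sub_idx' is the index of the sub request inside the batched RPC,
 * 'lrd_batch_idx' is the value saved in the reply data slot.
 */
static enum sketch_action sketch_resend_action(uint32_t sub_idx,
                                               uint32_t lrd_batch_idx)
{
    return sub_idx <= lrd_batch_idx ? SKETCH_RECONSTRUCT : SKETCH_EXECUTE;
}
```

For example, if the saved @lrd_batch_idx is 2, sub requests 0 to 2 get reconstructed replies and sub request 3 onward is executed again.
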
LU-16413 osd-ldiskfs: fix T10PI for CentOS 8.x

Recreate the currently broken lustre kernel patches to allow using
custom integrity functions for bio. Note we don't need to save the
generate_fn anymore, it will be used once we call
bio_integrity_prep_fn().

Add upstream fix b13e0c718568 ("block: bio-integrity: Advance seed
correctly for larger interval sizes") for CentOS 8.0 to 8.6.

Handle the kernel API changes for the T10PI generate and verify
functions introduced in the CentOS 8.x kernel, mostly because of
switching to blk_integrity_iter.

Update the custom generate and verify functions, to sync with upstream
versions.
- Add T10-DIF-TYPE2, currently only a placeholder, not used in
  upstream either.
- Use __be16 instead of __u16 for guard tags.

Only reuse guard tags if the RPC checksum is the same one supported on
the target. We already have some protection during checksum type
negotiation: the server will mark the target's T10PI type as the only
T10PI checksum type supported. But it's still good to have the logic in
place.

Do not call bio_integrity_prep() if the custom interface
bio_integrity_prep_fn() does not exist, submit_bio() will do that for
us.

On the servers, show the target's T10PI checksum as the preferred
checksum_type even if it's not the fastest. Note this is only cosmetic
and does not impact the checksum type used, which is still decided
during negotiation.

Change-Id: I2d0ba0b80ba9cde2977da24db08095671aa5373c
Test-Parameters: trivial
Fixes: 293844d132 ("LU-16222 kernel: RHEL 8.7 client and server support")
Fixes: f176efd183 ("LU-12269 kernel: RHEL 8.0 server support")
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49441
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-15847 tgt: move tti_ transaction params to tsi_

Move tti_mult_trans and tti_has_trans to tgt_session_info so that they
are available in all targets. This allows cleaning up old duplicated
MDT code and can be used for complex transaction handling in MDT/OFD if
needed.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I3f0c15e283b9e21c04a009f6cf346afa278e7095
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47491
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John Hammond <jhammond@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-16057 obdclass: set OBD_MD_FLGROUP for ladvise RPC

The ladvise RPC doesn't have OBD_MD_FLGROUP set. When the RPC reaches
the server, tgt_validate_obdo() will corrupt the FID if its seq is in
the FID_SEQ_NORMAL range.

Do not mess with seq in obdo_to_ioobj() and tgt_validate_obdo(); since
2.0 all RPCs should have OBD_MD_FLGROUP set.

Add OBD_MD_FLGROUP for the ladvise RPC to fix a new client talking to
old servers.

Change-Id: I373b7f32458b18e29d9bb716a912fe4a54eccac5
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/48080
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-10391 ptlrpc: change bd_sender in ptlrpc_bulk_frag_ops

bd_sender in struct ptlrpc_bulk_frag_ops is now 'struct lnet_nid'.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I43a6600dcc814a6a46b3a793641545123efaa6ab
Reviewed-on: https://review.whamcloud.com/44640
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-10391 ptlrpc: change rq_peer to struct lnet_nid

rq_peer in struct ptlrpc_request can now store large NIDs.
ptlrpc_connection_get() and others now take a struct lnet_processid.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I3bb419720434714301946d278413ce6090aa2cdd
Reviewed-on: https://review.whamcloud.com/44638
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-8582 target: send error reply on wrong opcode

An unknown opcode does not necessarily mean an insane client. A new
client might send RPCs with new opcodes to an old server, and the
client might be stuck there waiting for a reply. So, send an error back
when an RPC has a wrong opcode.

This patch returns EOPNOTSUPP to the client instead of blocking.
ENOTSUPP is not used here since strerror() does not understand
ENOTSUPP.

OBD_FAIL_OST_OPCODE=0x253 is added for fault injection testing of
opcodes. To test whether an invalid opcode is handled properly on the
OST, use the following command:

  lctl set_param fail_val=${opcode} fail_loc=0x253

Change-Id: I46ca62bc532b92368e06a4f883b102c7186c453c
Signed-off-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/47761
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

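The behaviour described in LU-8582 is a dispatcher that answers unknown opcodes with an error instead of leaving the client without a reply. The sketch below is a userspace illustration only: the opcode values and the name sketch_handle_opcode are hypothetical and do not correspond to real Lustre opcodes or handlers.

```c
#include <errno.h>

/* Hypothetical opcode values, for illustration only. */
enum {
    SKETCH_OP_READ  = 3,
    SKETCH_OP_WRITE = 4
};

/*
 * Return 0 for opcodes this (old) server knows, and -EOPNOTSUPP for
 * anything else so that the caller can send an error reply instead of
 * leaving the client waiting.
 */
static int sketch_handle_opcode(unsigned int opc)
{
    switch (opc) {
    case SKETCH_OP_READ:
    case SKETCH_OP_WRITE:
        return 0;
    default:
        return -EOPNOTSUPP;
    }
}
```

EOPNOTSUPP is part of the userspace errno set, so a client-side tool can turn it into a readable message with strerror(), which is the reason the commit gives for preferring it over ENOTSUPP.
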
LU-15451 sec: read-only nodemap flag

Add a new 'readonly_mount' property to nodemaps. When set, we return
-EROFS from the server side if the client is not mounting read-only.
So the client will have to specify the read-only mount option to be
allowed to mount.

Fixes: 928714dddabb ("LU-5092 nodemap: save id maps to targets in new index file")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I9931844ae46dfd5d724f592f8dfacc4a8011c7e3
Reviewed-on: https://review.whamcloud.com/46149
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-15598 tgt: free allocated page on error

Free allocated page if cfs_crypto_hash_init() fails.

Fixes: b1e7be00cb6e ("LU-10472 osc: add T10PI support for RPC checksum")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I0a45a82a57a98ad2517ccf50a2be1e8d65550bb5
Reviewed-on: https://review.whamcloud.com/46659
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>

LU-15776 tgt: fix transaction handling in tgt_brw_write()

Hotfix to prevent possible data loss during WRITE replay.

Since commit f0f92773ee18 from LU-14187, obd_commitrw() may restart
the write transaction in OFD and MDT. That causes a transaction number
to be assigned multiple times if such a restart happens. Without the
tti_mult_trans flag only the first transaction number is stored, so a
later one could remain unapplied, causing data loss after recovery.

The patch sets tti_mult_trans for tgt_brw_write() so that the latest
transaction number is used as the request transno.

Fixes: f0f92773ee ("LU-14187 osd-ldiskfs: fix locking in write commit")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I364b478591942be5562c3e98ee6e6aa487f3e0c5
Reviewed-on: https://review.whamcloud.com/47371
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wangshilong1991@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

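The effect of the tti_mult_trans flag in LU-15776 can be shown with a toy model of how a request remembers its transaction number. This is a sketch under assumptions: sketch_req and sketch_txn_assigned are hypothetical names, and treating 0 as "no transno stored yet" is a simplification made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-request state; not the real target structures. */
struct sketch_req {
    uint64_t transno;     /* transno reported for the request, 0 = none yet */
    bool     mult_trans;  /* stands in for the tti_mult_trans flag */
};

/* Called each time a (possibly restarted) transaction gets a number. */
static void sketch_txn_assigned(struct sketch_req *r, uint64_t transno)
{
    /*
     * Without the flag only the first number is kept; with the flag
     * every later assignment replaces the stored one.
     */
    if (r->transno == 0 || r->mult_trans)
        r->transno = transno;
}
```

If a write transaction is restarted and receives numbers 10 and then 11, the request without the flag keeps reporting 10, while the request with the flag reports 11, which is the number of the transaction that actually carried the write.
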