From 2007ab4709acaef0397df15c9f4cf4387844ba9c Mon Sep 17 00:00:00 2001
From: Andreas Dilger
Date: Wed, 27 Mar 2024 21:18:56 -0600
Subject: [PATCH] LU-16500 utils: 'lfs migrate' should select new OSTs

When migrating a file using "lfs migrate FILE" without any
arguments to specify a new layout, this should migrate the file
to the best OSTs available at that time based on free space,
instead of keeping the file on the same OSTs (which is almost
pointless otherwise).

Reset the starting OST index for all components of the copied
file layout so that this can happen properly.  Previously, only
the last component had the OST index reset, which was only partly
helpful.

Add llapi_layout_ost_index_reset() to handle this, since it seems
likely that tools using llapi_layout_get_by_fd() and friends to
copy an existing layout will want to do the same.  Add the
corresponding man page and reference it from
llapi_layout_get_by_fd(3).

Update sanity test_56xe to check that the starting OST indices of
the components are not all the same after "lfs migrate" as they
were before it.  This check might not catch a broken "lfs migrate"
every time since even before this patch the last component would
be allocated on a random OST, but will still fail about once every
$OST_COUNT runs.  Conversely, with this patch it passes hundreds
of iterations without a false positive, though a small chance
exists that it will have a false positive on occasion.

Add a "make utils" target to simplify building only user utilities.

Test-Parameters: testlist=sanity env=ONLY=56xe,ONLY_REPEAT=100
Fixes: 0568f4ca25 ("LU-16500 utils: set default ost index for lfs migrate")
Signed-off-by: Andreas Dilger
Change-Id: Ie4c68d4b2ff09560a7a13ae464723745cf968d36
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54600
Tested-by: jenkins
Tested-by: Maloo
Reviewed-by: Stephane Thiell
Reviewed-by: Jian Yu
Reviewed-by: Oleg Drokin
---
 autoMakefile.am                           |  6 ++++
 lustre/doc/Makefile.am                    |  1 +
 lustre/doc/llapi_layout.7                 |  1 +
 lustre/doc/llapi_layout_get_by_fd.3       | 48 ++++++++++++++--------------
 lustre/doc/llapi_layout_ost_index_reset.3 | 52 +++++++++++++++++++++++++++++++
 lustre/include/lustre/lustreapi.h         | 22 ++++++++++++-
 lustre/tests/sanity.sh                    | 11 ++++++-
 lustre/utils/lfs.c                        |  3 +-
 lustre/utils/liblustreapi_layout.c        | 40 ++++++++++++++++++++++--
 9 files changed, 155 insertions(+), 29 deletions(-)
 create mode 100644 lustre/doc/llapi_layout_ost_index_reset.3

diff --git a/autoMakefile.am b/autoMakefile.am
index b27984a..b3c8186 100644
--- a/autoMakefile.am
+++ b/autoMakefile.am
@@ -44,6 +44,7 @@ help:
 	@echo 'Generic targets:'
 	@echo ' all - Build all modules and utilities enabled by'
 	@echo ' autotools'
+	@echo ' utils - Build only userspace utilities'
 	@echo ' checkpatch - Run checkpatch.pl on latest commit'
 	@echo ' checkstack - Run checkstack.pl'
 	@echo ' checkstack-update - Update checkstack.pl'
@@ -60,6 +61,11 @@ help:
 checkpatch:
 	@git diff HEAD~1 | ./contrib/scripts/checkpatch.pl
 
+utils:
+	$(MAKE) -C libcfs/libcfs/util
+	$(MAKE) -C lnet/utils
+	$(MAKE) -C lustre/utils
+
 # these empty rules are needed so that automake doesn't add its own
 # recursive rules
 etags-recursive:
diff --git a/lustre/doc/Makefile.am b/lustre/doc/Makefile.am
index 22c9086..1923f0d 100644
--- a/lustre/doc/Makefile.am
+++ b/lustre/doc/Makefile.am
@@ -170,6 +170,7 @@ LIBMAN = \
 	llapi_layout_get_by_xattr.3 \
 	llapi_layout_ost_index_get.3 \
 	llapi_layout_ost_index_set.3 \
+	llapi_layout_ost_index_reset.3 \
 	llapi_layout_pattern_get.3 \
 	llapi_layout_pattern_set.3 \
 	llapi_layout_pool_name_get.3 \
diff --git a/lustre/doc/llapi_layout.7 b/lustre/doc/llapi_layout.7
index 0c1a622..b7c61a7 100644
--- a/lustre/doc/llapi_layout.7
+++ b/lustre/doc/llapi_layout.7
@@ -198,6 +198,7 @@ The RAID pattern may only be set to 0.
 .BR llapi_layout_get_by_xattr (3),
 .BR llapi_layout_ost_index_get (3),
 .BR llapi_layout_ost_index_set (3),
+.BR llapi_layout_ost_index_reset (3),
 .BR llapi_layout_pattern_get (3),
 .BR llapi_layout_pattern_set (3),
 .BR llapi_layout_pool_name_get (3),
diff --git a/lustre/doc/llapi_layout_get_by_fd.3 b/lustre/doc/llapi_layout_get_by_fd.3
index 864bf24..f158eaa 100644
--- a/lustre/doc/llapi_layout_get_by_fd.3
+++ b/lustre/doc/llapi_layout_get_by_fd.3
@@ -22,6 +22,7 @@ obtain the layout of a Lustre file
 .fi
 .SH DESCRIPTION
 .PP
+The functions
 .BR llapi_layout_get_by_xattr() ,
 .BR llapi_layout_get_by_fd() ,
 .BR llapi_layout_get_by_fid() ,
@@ -34,8 +35,8 @@ containing the layout information for the file referenced by
 .IR fd ,
 .IR fid ,
 or
-.IR path .
-The
+.IR path ,
+respectively. The
 .B struct llapi_layout
 is an opaque entity containing the layout information for a file in a
 Lustre filesystem. Its internal structure should not be directly
@@ -52,7 +53,7 @@ is a Lustre layout extended attribute (LOV EA) from a file or directory in
 a Lustre filesystem. The
 .I lov_xattr
 should be the raw xattr without being byte-swapped, since this function will
-swap it properly.
+swap it to the local machine endianness properly.
 .PP
 For
 .BR llapi_layout_get_by_fd() ,
@@ -62,12 +63,10 @@ filesystem.
 .PP
 For
 .BR llapi_layout_get_by_fid() ,
-the
 .I lustre_path
-argument serves to identify the Lustre filesystem containing the file
-represented by
+identifies the Lustre filesystem containing the file represented by
 .IR fid .
-It is typically the filesystem root, but may also be any path beneath
+It is typically the filesystem root directory, but may also be any path beneath
 the root. Use the function
 .BR llapi_path2fid (3)
 to obtain a
@@ -82,17 +81,15 @@ argument that names a file or directory in a Lustre filesystem.
 .PP
 Zero or more flags may be bitwise-or'd together in
 .I flags
-or
-.I xattr_flags
 to control how a layout is retrieved. Currently
-.B llapi_layout_get_by_path()
-accepts only one flag, while
 .B llapi_layout_get_by_fd()
 and
 .B llapi_layout_get_by_fid()
-do not use any flags. The list of flags that can be used in
-.I flags
-is as follows:
+do not accept any values for
+.IR flags ,
+while
+.B llapi_layout_get_by_path()
+accepts only one flag as follows:
 .TP 5
 .SM LLAPI_LAYOUT_GET_EXPECTED
 Unspecified attribute values are replaced by the literal default values
@@ -128,10 +125,11 @@ since stripe size is unspecified, while
 reports the literal value 1048576. Both forms report a stripe count
 of 2, since that attribute is specified.
 .PP
-The values that can be used by
-.B llapi_layout_get_by_xattr()
+Valid arguments for
 .I flags
-argument is as follows:
+with
+.B llapi_layout_get_by_xattr()
+are:
 .TP 5
 .SM LLAPI_LAYOUT_GET_CHECK
 If the
@@ -150,13 +148,16 @@ when necessary, leaving
 unmodified. Otherwise, the byte swapping will be done to the fields of the
 .I lov_xattr
 buffer directly.
+.SH NOTE
+When using these functions to copy an existing file's layout to create a
+new file with
+.B llapi_layout_file_open (3)
+for mirroring, migration, or as the template for a new file,
+.BR llapi_layout_ost_index_reset (3)
+should be called to reset the OST index values for each component, so that
+the file copy is not created on exactly the same OSTs as the original file.
 .SH RETURN VALUES
-.LP
-.BR llapi_layout_get_by_fd() ,
-.BR llapi_layout_get_by_fid() ,
-and
-.B llapi_layout_get_by_path()
-return a valid pointer on success or
+These functions return a valid pointer on success or
 .B NULL
 on failure with
 .B errno
@@ -180,6 +181,7 @@ An invalid argument was specified.
 The kernel returned less than the expected amount of data.
 .SH "SEE ALSO"
 .BR llapi_layout_file_open (3),
+.BR llapi_layout_ost_index_reset (3),
 .BR llapi_path2fid (3),
 .BR llapi_layout (7),
 .BR lustreapi (7)
diff --git a/lustre/doc/llapi_layout_ost_index_reset.3 b/lustre/doc/llapi_layout_ost_index_reset.3
new file mode 100644
index 0000000..e5bfd26
--- /dev/null
+++ b/lustre/doc/llapi_layout_ost_index_reset.3
@@ -0,0 +1,52 @@
+.TH llapi_layout_ost_index_reset 3 "2024 Mar 27" "Lustre User API"
+.SH NAME
+llapi_layout_ost_index_reset \- reset OST index of all Lustre file components
+.SH SYNOPSIS
+.nf
+.B #include <lustre/lustreapi.h>
+.PP
+.BI "int llapi_layout_ost_index_reset(struct llapi_layout *" layout );
+.fi
+.SH DESCRIPTION
+.PP
+.B llapi_layout_ost_index_reset()
+resets the starting ost_index number of all components in the specified file
+.I layout
+to
+.BR LLAPI_LAYOUT_DEFAULT .
+This allows the MDS to automatically allocate the objects for each file
+component to the best OSTs available at that time.
+.PP
+This should be called when copying an existing file
+.I layout
+retrieved using one of
+.BR llapi_layout_get_by_fid (3),
+.BR llapi_layout_get_by_fd (3),
+.BR llapi_layout_get_by_path (3),
+or
+.BR llapi_layout_get_by_xattr (3),
+so that the OST selection is not copied exactly from the source layout if
+it is used with
+.BR llapi_layout_file_open (3)
+to create a new file for migration, mirroring, or other replication task.
+.SH RETURN VALUES
+.LP
+.B llapi_layout_ost_index_reset()
+returns 0 on success, or a negative error if an error occurred (in which case,
+errno is set appropriately).
+.SH ERRORS
+.TP 15
+.SM EINVAL
+An invalid argument was specified.
+.TP 15
+.SM ENOENT
+The layout does not have any valid components.
+.TP 15
+.SM ENOMEM
+Insufficient memory to complete the request.
+.SH "SEE ALSO"
+.BR llapi_layout (7),
+.BR llapi_layout_alloc (3),
+.BR llapi_layout_file_open (3),
+.BR llapi_layout_ost_index_set (3),
+.BR lustreapi (7)
diff --git a/lustre/include/lustre/lustreapi.h b/lustre/include/lustre/lustreapi.h
index c527b1a..f5e921e 100644
--- a/lustre/include/lustre/lustreapi.h
+++ b/lustre/include/lustre/lustreapi.h
@@ -1002,6 +1002,18 @@ int llapi_layout_ost_index_get(const struct llapi_layout *layout,
 int llapi_layout_ost_index_set(struct llapi_layout *layout, int stripe_number,
 			       uint64_t index);
 
+/**
+ * Reset the OST index on all components in \a layout to LLAPI_LAYOUT_DEFAULT.
+ *
+ * This is useful when reusing a file layout that was copied from an existing
+ * file and to be used for a new file (e.g. when mirroring or migrating or
+ * copying a file), so the objects are allocated on different OSTs.
+ *
+ * \retval 0 Success.
+ * \retval -ve errno Error with errno set to non-zero value.
+ */
+int llapi_layout_ost_index_reset(struct llapi_layout *layout);
+
 /******************** Pool Name ********************/
 
 /**
@@ -1285,7 +1297,15 @@ enum {
 typedef int (*llapi_layout_iter_cb)(struct llapi_layout *layout, void *cbdata);
 
 /**
- * Iterate all components in the corresponding layout
+ * Iterate every components in the @layout and call callback function @cb.
+ *
+ * \param[in] layout component layout list.
+ * \param[in] cb callback function called for each component
+ * \param[in] cbdata callback data passed to the callback function
+ *
+ * \retval < 0 error happens during the iteration
+ * \retval LLAPI_LAYOUT_ITER_CONT finished the iteration w/o error
+ * \retval LLAPI_LAYOUT_ITER_STOP got something, stop the iteration
  */
 int llapi_layout_comp_iterate(struct llapi_layout *layout,
 			      llapi_layout_iter_cb cb, void *cbdata);
diff --git a/lustre/tests/sanity.sh b/lustre/tests/sanity.sh
index 0c819a7..9b64b2b 100755
--- a/lustre/tests/sanity.sh
+++ b/lustre/tests/sanity.sh
@@ -8197,7 +8197,7 @@ test_56xe() {
 
 	local dir=$DIR/$tdir
 	local f_comp=$dir/$tfile
-	local layout="-E 1M -S 512K -c 1 -E -1 -S 1M -c 2 -i 0"
+	local layout="-E 1M -S 512K -E 2M -c 2 -E 3M -c 2 -E eof -c $OSTCOUNT"
 	local layout_before=""
 	local layout_after=""
 
@@ -8211,14 +8211,23 @@ test_56xe() {
 	# 1. migrate a comp layout file by lfs_migrate
 	$LFS_MIGRATE -y $f_comp || error "cannot migrate $f_comp by lfs_migrate"
 	layout_after=$(SKIP_INDEX=yes get_layout_param $f_comp)
+	idx_before=$($LFS getstripe $f_comp | awk '$2 == "0:" { print $5 }' |
+		     tr '\n' ' ')
 	[ "$layout_before" == "$layout_after" ] ||
 		error "lfs_migrate: $layout_before != $layout_after"
 
 	# 2. migrate a comp layout file by lfs migrate
 	$LFS migrate $f_comp || error "cannot migrate $f_comp by lfs migrate"
 	layout_after=$(SKIP_INDEX=yes get_layout_param $f_comp)
+	idx_after=$($LFS getstripe $f_comp | awk '$2 == "0:" { print $5 }' |
+		     tr '\n' ' ')
 	[ "$layout_before" == "$layout_after" ] ||
 		error "lfs migrate: $layout_before != $layout_after"
+
+	# this may not fail every time with a broken lfs migrate, but will fail
+	# often enough to notice, and will not have false positives very often
+	[ "$idx_before" != "$idx_after" ] ||
+		error "lfs migrate: $idx_before == $idx_after"
 }
 run_test 56xe "migrate a composite layout file"
 
diff --git a/lustre/utils/lfs.c b/lustre/utils/lfs.c
index 36490c9..2e24d3f 100644
--- a/lustre/utils/lfs.c
+++ b/lustre/utils/lfs.c
@@ -4521,8 +4521,7 @@ static int lfs_setstripe_internal(int argc, char **argv,
 				 * Strip the source layout of specific
 				 * OST object/index values.
 				 */
-				result = llapi_layout_ost_index_set(layout, 0,
-							LLAPI_LAYOUT_DEFAULT);
+				result = llapi_layout_ost_index_reset(layout);
 				if (result) {
 					fprintf(stderr,
 						"%s: set default ost index failed: %s\n",
diff --git a/lustre/utils/liblustreapi_layout.c b/lustre/utils/liblustreapi_layout.c
index 332af24..05f4dae 100644
--- a/lustre/utils/liblustreapi_layout.c
+++ b/lustre/utils/liblustreapi_layout.c
@@ -1603,6 +1603,42 @@ int llapi_layout_ost_index_set(struct llapi_layout *layout, int stripe_number,
 	return 0;
 }
 
+static int reset_index_cb(struct llapi_layout *layout, void *cbdata)
+{
+	int *save_errno = (int *)cbdata;
+	int rc;
+
+	rc = llapi_layout_ost_index_set(layout, 0, LLAPI_LAYOUT_DEFAULT);
+
+	/* save the first error returned, but try to reset all components */
+	if (rc && !*save_errno)
+		*save_errno = errno;
+
+	return LLAPI_LAYOUT_ITER_CONT;
+}
+
+/**
+ * Reset the OST index on all components in \a layout to LLAPI_LAYOUT_DEFAULT.
+ *
+ * This is useful when reusing a file layout that was copied from an existing
+ * file and to be used for a new file (e.g. when mirroring or migrating or
+ * copying a file), so the objects are allocated on different OSTs.
+ *
+ * \retval 0 Success.
+ * \retval -ve errno Error with errno set to non-zero value.
+ */
+int llapi_layout_ost_index_reset(struct llapi_layout *layout)
+{
+	int save_errno = 0;
+	int rc;
+
+	rc = llapi_layout_comp_iterate(layout, reset_index_cb, &save_errno);
+
+	if (save_errno)
+		errno = save_errno;
+	return save_errno ? -save_errno : (rc < 0 ? -errno : 0);
+}
+
 /**
  * Get the OST index associated with stripe \a stripe_number.
  *
@@ -2791,8 +2827,8 @@ bool llapi_layout_is_composite(struct llapi_layout *layout)
  * Iterate every components in the @layout and call callback function @cb.
  *
  * \param[in] layout component layout list.
- * \param[in] cb callback for each component
- * \param[in] cbdata callback data
+ * \param[in] cb callback function called for each component
+ * \param[in] cbdata callback data passed to the callback function
  *
  * \retval < 0 error happens during the iteration
  * \retval LLAPI_LAYOUT_ITER_CONT finished the iteration w/o error
-- 
1.8.3.1
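
Illustration (not part of the patch): the error handling used by
reset_index_cb() and llapi_layout_ost_index_reset() can be tried out
without a Lustre client. The following is a minimal standalone sketch of
that pattern, assuming mock types: every component is visited, only the
first errno is remembered, and its negative value is returned. All
sketch_* names, the struct definitions, and the use of -1 as the "let the
MDS choose" index are hypothetical stand-ins invented for this
illustration and are not part of liblustreapi.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the liblustreapi types; NOT the real API. */
enum { SKETCH_ITER_CONT = 0, SKETCH_ITER_STOP = 1 };

struct sketch_comp {
	long ost_index;		/* starting OST index, -1 = "let MDS choose" */
	int fail_errno;		/* if non-zero, resetting this component fails */
};

struct sketch_layout {
	struct sketch_comp *comps;
	size_t count;
	size_t cur;		/* component currently selected by the iterator */
};

typedef int (*sketch_iter_cb)(struct sketch_layout *layout, void *cbdata);

/* Mock per-component setter: returns -1 and sets errno on failure. */
static int sketch_index_set_default(struct sketch_layout *layout)
{
	struct sketch_comp *comp = &layout->comps[layout->cur];

	if (comp->fail_errno) {
		errno = comp->fail_errno;
		return -1;
	}
	comp->ost_index = -1;
	return 0;
}

/* Mock iterator: visit each component until the callback asks to stop. */
static int sketch_comp_iterate(struct sketch_layout *layout,
			       sketch_iter_cb cb, void *cbdata)
{
	int rc = SKETCH_ITER_CONT;

	if (layout == NULL || layout->count == 0) {
		errno = EINVAL;
		return -1;
	}
	for (layout->cur = 0; layout->cur < layout->count; layout->cur++) {
		rc = cb(layout, cbdata);
		if (rc != SKETCH_ITER_CONT)
			break;
	}
	return rc;
}

/* Remember only the first failure, but keep visiting components. */
static int sketch_reset_cb(struct sketch_layout *layout, void *cbdata)
{
	int *save_errno = cbdata;

	if (sketch_index_set_default(layout) && !*save_errno)
		*save_errno = errno;
	return SKETCH_ITER_CONT;
}

/* Same return convention as the patch: 0 on success, -errno on failure. */
static int sketch_index_reset(struct sketch_layout *layout)
{
	int save_errno = 0;
	int rc = sketch_comp_iterate(layout, sketch_reset_cb, &save_errno);

	if (save_errno)
		errno = save_errno;
	return save_errno ? -save_errno : (rc < 0 ? -errno : 0);
}

/* Example: the middle component fails, the others are still reset. */
static int sketch_demo(void)
{
	struct sketch_comp comps[3] = { { 5, 0 }, { 7, EINVAL }, { 2, 0 } };
	struct sketch_layout layout = { comps, 3, 0 };
	int rc = sketch_index_reset(&layout);

	assert(comps[0].ost_index == -1 && comps[2].ost_index == -1);
	assert(comps[1].ost_index == 7);
	return rc;
}
```

In this sketch, calling sketch_demo() returns -EINVAL while the first and
third components are still reset, which mirrors the "save the first error
returned, but try to reset all components" comment in the patch.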