From aa1646a8a1c2a91a1571147994229b6413ee7acc Mon Sep 17 00:00:00 2001
From: Jinshan Xiong
Date: Mon, 4 Dec 2017 11:52:25 -0800
Subject: [PATCH] LU-10282 flr: comp-flags support when creating mirrors

This patch allows flags to be set when creating mirrors. The flags
are set on individual components, which makes it flexible to set
flags based on the location of components.

Also, the 'stale' and 'prefer' flags can be set on individual
components later on.

This patch also revises the component flags matching rules to allow
flags and inverted flags to be specified at the same time in
lfs-find(1) and lfs-getstripe(1).

Signed-off-by: Jinshan Xiong
Change-Id: Ia077ca5454d49eb411bd82bd451c9dfc426d780c
Reviewed-on: https://review.whamcloud.com/30360
Reviewed-by: Andreas Dilger
Reviewed-by: Bobi Jam
Reviewed-by: Jian Yu
Tested-by: Jenkins
Tested-by: Maloo
Reviewed-by: Oleg Drokin
---
 lustre/doc/lfs-mirror-create.1                 |  36 ++-
 lustre/doc/lfs-setstripe.1                     |  47 +++-
 lustre/doc/lfs.1                               |  61 ++++-
 lustre/include/lustre/lustreapi.h              |  31 ++-
 lustre/include/uapi/linux/lustre/lustre_user.h |  19 +-
 lustre/lod/lod_internal.h                      |   3 +-
 lustre/lod/lod_lov.c                           |   3 +
 lustre/lod/lod_object.c                        |  50 +++-
 lustre/lod/lod_qos.c                           |   3 +
 lustre/lov/lov_object.c                        |  42 ++--
 lustre/tests/sanity-flr.sh                     |  76 +++++-
 lustre/utils/lfs.c                             | 314 +++++++++++++++++--------
 lustre/utils/liblustreapi.c                    |  34 ++-
 lustre/utils/liblustreapi_layout.c             | 163 ++++++++++---
 14 files changed, 662 insertions(+), 220 deletions(-)

diff --git a/lustre/doc/lfs-mirror-create.1 b/lustre/doc/lfs-mirror-create.1 index 31310af..2943851 100644 --- a/lustre/doc/lfs-mirror-create.1 +++ b/lustre/doc/lfs-mirror-create.1 @@ -4,7 +4,7 @@ lfs mirror create \- create a mirrored file or directory .SH SYNOPSIS .B lfs mirror create <\fB\-\-mirror\-count\fR|\fB\-N\fR[\fImirror_count\fR]> -.RI [ setstripe_options | \fB--parent ] ... +[\fIsetstripe_options\fR|\fB--parent\fR|[\fB--flags\fR<=\fIflags\fR>]] ...
.RI < filename | directory > .SH DESCRIPTION This command creates a mirrored file or directory specified by the path name @@ -44,6 +44,27 @@ the stripe options inherited from the previous component will be used. .B \-\-parent This option indicates that the default stripe options inherited from parent directory will be used. +.TP +.B --flags<=\fIflags\fR> +Where available \fIflags\fR are as follows: +.RS +.TP +.BI prefer +is supported in mirror creation. This flag will be set to all components that +belong to the corresponding mirror. The \fBprefer\fR flag gives hint to Lustre +for which mirrors should be used to serve I/O. When a mirrored file is being +read, the component(s) with \fBprefer\fR are likely to be picked to serve the +read; and when a mirrored file is prepared to be written, the MDT will +tend to choose the component with \fBprefer\fR flag set and stale the other +components with overlapping extents. This flag just provides a hint to Lustre, +which means Lustre may still choose mirrors without this flag set, for instance, +if all preferred mirrors are unavailable when the I/O occurs. This flag could be +set on multiple components. +.LP +Please note that this flag will be set to all components that belong to the +corresponding mirror. There also exists option \fB--comp-flags\fR that can be +set to individual components at mirror creation time. +.RE .SH EXAMPLES .TP .B lfs mirror create -N2 /mnt/lustre/file1 @@ -67,13 +88,20 @@ on OSTs with indices 2 and 3. It also has 4MB stripe size inherited from the first mirror. The third mirror has default striping pattern inherited from parent directory. .LP -.B lfs mirror create -N2 -E 4M -c 2 --pool flash -E eof -c 4 -N3 -E 16M -c 4 -S -.B 16M --pool archive -E eof -c -1 /mnt/lustre/file1 +.B lfs mirror create -N2 -E 4M -c 2 --pool flash --flags prefer -E eof -c 4 +.B -N3 -E 16M -c 4 -S 16M --pool archive --comp-flags=prefer -E eof -c -1 +.B /mnt/lustre/file1 .in Create a mirrored file with 5 PFL mirrors.
The first and second mirrors have the same PFL layout, and both of the components are allocated from the \fBflash\fR -OST pool. The last three mirrors have the same PFL layout, and each of these +OST pool. Also, flag \fBprefer\fR is applied to all the components of the first +two mirrors, which tells the client to read data from those components whenever +they are available. +.br +The last three mirrors have the same PFL layout, and each of these components have a stripe size of 16MB and use OSTs in the \fBarchive\fR pool. +However, the flag \fBprefer\fR is only applied to the first component of each +mirror. .SH AUTHOR The \fBlfs mirror create\fR command is part of the Lustre filesystem. .SH SEE ALSO diff --git a/lustre/doc/lfs-setstripe.1 b/lustre/doc/lfs-setstripe.1 index 8d911352..ada5407 100644 --- a/lustre/doc/lfs-setstripe.1 +++ b/lustre/doc/lfs-setstripe.1 @@ -14,6 +14,9 @@ lfs setstripe \- set striping pattern of a file or directory default .B lfs setstripe --component-del \fR{\fB--component-id\fR|\fB-I \fIcomp_id\fR| .B --component-flags=\fIcomp_flags\fR} <\fIfilename\fR> .br +.B lfs setstripe --component-set \fR{\fB--component-id\fR|\fB-I \fIcomp_id\fR| +.B --component-flags=\fIcomp_flags\fR} <\fIfilename\fR> +.br .B lfs setstripe -d \fR<\fIdirectory\fR> .br .SH DESCRIPTION @@ -54,6 +57,8 @@ not possible to add components incrementally to the default directory layout, since the entire default layout can be replaced with one .B lfs setstripe call. +.br +Adding a component to FLR files is not allowed. .TP .B lfs setstripe --component-del \fR{\fB--component-id\fR|\fB-I \fIcomp_id\fR | \ \fB--component-flags \fIcomp_flags\fR} <\fIfilename\fR> @@ -69,7 +74,36 @@ layout, since the entire default layout can be replaced with one .B lfs setstripe call. The \fB--component-flags\fR option is used to specify certain type of -components, such as all instantiated ones. +components, such as all instantiated ones. 
Available component flags for +deleting a component would be: +.RS +.TP +.B init +instantiated component. +.LP +A leading '^' in front of the \fIflags\fR means inverted flags. +.br +Deleting a component from FLR files is not allowed. +.RE +.TP +.B lfs setstripe --component-set \fR{\fB--component-id\fR|\fB-I \fIcomp_id\fR | \ +\fB--component-flags \fIcomp_flags\fR} <\fIfilename\fR> +Set or clear \fIflags\fR to the specified component. This command can only +be applied to FLR files. Available \fIflags\fR are: +.RS +.TP +.B stale +indicates the data in the corresponding component is not available for I/O. +Once a component is set to stale, a \fBlfs-mirror-resync\fR(1) is required to +clear the flag. +.TP +.B prefer +set this flag to the corresponding component so that Lustre would prefer to +choose the specified component for I/O. +.LP +A leading '^' means to clear the corresponding flag. Clearing the +\fBstale\fR flag is not allowed. +.RE .TP .B lfs setstripe -d \fR<\fIdirectory\fR> .br @@ -180,8 +214,13 @@ Component flags. Available \fIflags\fR: .B init\fR: instantiated component. .RE .RS -.B ^init\fR: uninstantiated component. +.B prefer\fR: preferred component, for FLR only. +.RE +.RS +.B stale\fR: stale component, for FLR only. .RE +.LP +A leading '^' means inverted flag. Multiple flags can be separated by comma(s). .RE .TP .B --component-add @@ -211,6 +250,10 @@ the end of file. .B $ lfs setstripe --component-del -I 1 /mnt/lustre/file1 This deletes the component with ID equals 1 from an existing file. .TP +.B $ lfs setstripe --component-set -I 1 --component-flags=^prefer,stale /mnt/lustre/file1 +This command will clear the \fBprefer\fR flag and set the \fBstale\fR flag on the +component with ID 1. +.TP .B $ lfs setstripe -E 1M -L mdt -E -1 /mnt/lustre/file1 This created file with Data-on-MDT layout. The first 1M is placed on MDT and \ rest of file is placed on OST with default striping.
diff --git a/lustre/doc/lfs.1 b/lustre/doc/lfs.1 index 2ef08a5..c18e370 100644 --- a/lustre/doc/lfs.1 +++ b/lustre/doc/lfs.1 @@ -117,6 +117,32 @@ can be used to create a new file with a specific striping pattern, determine the default striping pattern, gather the extended attributes (object numbers and location) for a specific file. It can be invoked interactively without any arguments or in a non-interactive mode with one of the arguments supported. +.TP +.B Component Flags +.br +Component flags can be set by option \fB--component-flags\fR<=\fIflags\fR>. + +.br +The following component flags are supported so far: +.RS +.TP +.B init +indicates the object(s) of this component has been instantiated. +.TP +.B stale +means the data contained in this component is stale. Data in a stale component +won't be returned by reading. Used only by FLR. +.TP +.B prefer +is a hint to Lustre that means this component will likely be chosen for read +and write. Used only by FLR. +.LP +The same set of flags can be set in \fBgetstripe\fR to list the component(s) +that matches the \fIflags\fR, or doesn't match the \fIflags\fR if it has a caret '^' +in front. It can also be applied to \fBfind\fR so that only the files that have +the components described by \fBflags\fR will be listed. +.RE +.LP .SH OPTIONS The various options supported by lfs are listed and explained below: .TP @@ -132,6 +158,16 @@ Changelog consumers must be registered on the MDT node using \fBlctl\fR. .B check Display the status of MDS or OSTs (as specified in the command) or all the servers (MDS and OSTs) .TP +.B data_version [-n] +Display current version of file data. If -n is specified, data version is read +without taking lock. As a consequence, data version could be outdated if there +is dirty caches on filesystem clients, but this will not force data flushes and +has less impact on filesystem.
+ +Even without -n, race conditions are possible and data version should be +checked before and after an operation to be confident the data did not change +during it. +.TP .B df See .BR lfs-df (1) @@ -141,6 +177,10 @@ usage. .TP .B find To search the directory tree rooted at the given dir/file name for the files that match the given parameters: \fB--atime\fR (file was last accessed N*24 hours ago), \fB--ctime\fR (file's status was last changed N*24 hours ago), \fB--mtime\fR (file's data was last modified N*24 hours ago), \fB--obd\fR (file has an object on a specific OST or OSTs), \fB--size\fR (file has size in bytes, or \fBk\fRilo-, \fBM\fRega-, \fBG\fRiga-, \fBT\fRera-, \fBP\fReta-, or \fBE\fRxabytes if a suffix is given), \fB--type\fR (file has the type: \fBb\fRlock, \fBc\fRharacter, \fBd\fRirectory, \fBp\fRipe, \fBf\fRile, sym\fBl\fRink, \fBs\fRocket, or \fBD\fRoor (Solaris)), \fB--uid\fR (file has specific numeric user ID), \fB--user\fR (file owned by specific user, numeric user ID allowed), \fB--gid\fR (file has specific group ID), \fB--group\fR (file belongs to specific group, numeric group ID allowed),\fB--projid\fR (file has specific numeric project ID), \fB--layout\fR (file has a raid0 layout or is released). The option \fB--maxdepth\fR limits find to decend at most N levels of directory tree. The options \fB--print\fR and \fB--print0\fR print full file name, followed by a newline or NUL character correspondingly. Using \fB!\fR before an option negates its meaning (\fIfiles NOT matching the parameter\fR). Using \fB+\fR before a numeric value means 'more than n', while \fB-\fR before a numeric value means 'less than n'. + +.br +\fBlfs find\fR command allows to use component flags to list files that have +specified \fIflags\fR matched(See \fB Component Flags\fR). .TP .B getname [-h]|[path ...] 
Report all the Lustre mount points and the corresponding Lustre filesystem @@ -222,6 +262,10 @@ You can limit the displayed content by specifing argument for .B --component-end|-E options. For example, "--component-id=2" or "-I2" will only display the layout attributes for the component with id equal to 2. + +.br +\fBlfs getstripe\fR command allows to use component flags to list files that +have specified \fIflags\fR matched(See \fB Component Flags\fR). .TP .B fid2path [--link ] ... Print out the pathname(s) for the specified \fIfid\fR(s) from the filesystem @@ -269,16 +313,6 @@ reside on the same MDT and writable by the user. Swapping the layout of two directories is not permitted. .TP -.B data_version [-n] -Display current version of file data. If -n is specified, data version is read -without taking lock. As a consequence, data version could be outdated if there -is dirty caches on filesystem clients, but this will not force data flushes and -has less impact on filesystem. - -Even without -n, race conditions are possible and data version should be -checked before and after an operation to be confident the data did not change -during it. -.TP .B mkdir lfs mkdir is documented in the man page: lfs-mkdir(1). NOTE: .B lfs setdirstripe @@ -318,6 +352,9 @@ Lists the detailed information of the component 2 in a given file .B $ lfs getstripe --component-flags=^init -I /mnt/lustre/file1 Print only the component IDs for all the uninstantiated components .TP +.B $ lfs getstripe --component-flags=init,^stale -I /mnt/lustre/file1 +Print only the component(s) that are instantiated but not stale +.TP .B $ lfs getstripe -E-64M /mnt/lustre/file1 Lists the information of the components in a file which has less than 64M extent end .TP @@ -336,6 +373,10 @@ Recursively list all files in a given directory that have objects on OST2-UUID. .B $ lfs find --component-count +3 /mnt/lustre Recursively list all files that have more than 3 components. 
.TP +.B $ lfs find --component-flags=init,prefer,^stale /mnt/lustre +Recursively list all files that have at least one component with both 'init' and +\'prefer' flags set, and doesn't have flag 'stale' set. +.TP .B $ lfs check servers Check the status of all servers (MDT, OST) .TP diff --git a/lustre/include/lustre/lustreapi.h b/lustre/include/lustre/lustreapi.h index 09648a0..f0069d4 100644 --- a/lustre/include/lustre/lustreapi.h +++ b/lustre/include/lustre/lustreapi.h @@ -224,7 +224,6 @@ struct find_param { fp_check_comp_count:1, fp_exclude_comp_count:1, fp_check_comp_flags:1, - fp_exclude_comp_flags:1, fp_check_comp_start:1, fp_exclude_comp_start:1, fp_check_comp_end:1, @@ -271,6 +270,7 @@ struct find_param { __u32 fp_comp_count; __u32 fp_comp_flags; + __u32 fp_comp_neg_flags; __u32 fp_comp_id; unsigned long long fp_comp_start; unsigned long long fp_comp_start_units; @@ -361,6 +361,7 @@ int llapi_get_version_string(char *version, unsigned int version_size); int llapi_get_version(char *buffer, int buffer_size, char **version) __attribute__((deprecated)); int llapi_get_data_version(int fd, __u64 *data_version, __u64 flags); +int llapi_file_flush(int fd); extern int llapi_get_ost_layout_version(int fd, __u32 *layout_version); int llapi_hsm_state_get_fd(int fd, struct hsm_user_state *hus); int llapi_hsm_state_get(const char *path, struct hsm_user_state *hus); @@ -837,10 +838,9 @@ static const struct comp_flag_name { const char *cfn_name; } comp_flags_table[] = { { LCME_FL_INIT, "init" }, - { LCME_FL_PRIMARY, "primary" }, { LCME_FL_STALE, "stale" }, + { LCME_FL_PREF_RW, "prefer" }, { LCME_FL_OFFLINE, "offline" }, - { LCME_FL_PREFERRED, "preferred" } }; /** @@ -904,14 +904,33 @@ int llapi_layout_file_comp_del(const char *path, uint32_t id, uint32_t flags); * attributes are passed in by @comp and @valid is used to specify which * attributes in the component are going to be changed. 
*/ -int llapi_layout_file_comp_set(const char *path, - const struct llapi_layout *comp, - uint32_t valid); +int llapi_layout_file_comp_set(const char *path, uint32_t *ids, uint32_t *flags, + size_t count); /** * Check if the file layout is composite. */ bool llapi_layout_is_composite(struct llapi_layout *layout); +enum { + LLAPI_LAYOUT_ITER_CONT = 0, + LLAPI_LAYOUT_ITER_STOP = 1, +}; + +/** + * Iteration callback function. + * + * \retval LLAPI_LAYOUT_ITER_CONT Iteration proceeds + * \retval LLAPI_LAYOUT_ITER_STOP Stop iteration + * \retval < 0 error code + */ +typedef int (*llapi_layout_iter_cb)(struct llapi_layout *layout, void *cbdata); + +/** + * Iterate all components in the corresponding layout + */ +int llapi_layout_comp_iterate(struct llapi_layout *layout, + llapi_layout_iter_cb cb, void *cbdata); + /** * FLR: mirror operation APIs */ diff --git a/lustre/include/uapi/linux/lustre/lustre_user.h b/lustre/include/uapi/linux/lustre/lustre_user.h index a4c1792..21948c6 100644 --- a/lustre/include/uapi/linux/lustre/lustre_user.h +++ b/lustre/include/uapi/linux/lustre/lustre_user.h @@ -570,16 +570,20 @@ static inline bool lu_extent_is_whole(struct lu_extent *e) } enum lov_comp_md_entry_flags { - LCME_FL_PRIMARY = 0x00000001, /* Not used */ - LCME_FL_STALE = 0x00000002, /* Not used */ - LCME_FL_OFFLINE = 0x00000004, /* Not used */ - LCME_FL_PREFERRED = 0x00000008, /* Not used */ + LCME_FL_STALE = 0x00000001, /* FLR: stale data */ + LCME_FL_PREF_RD = 0x00000002, /* FLR: preferred for reading */ + LCME_FL_PREF_WR = 0x00000004, /* FLR: preferred for writing */ + LCME_FL_PREF_RW = LCME_FL_PREF_RD | LCME_FL_PREF_WR, + LCME_FL_OFFLINE = 0x00000008, /* Not used */ LCME_FL_INIT = 0x00000010, /* instantiated */ LCME_FL_NEG = 0x80000000 /* used to indicate a negative flag, won't be stored on disk */ }; -#define LCME_KNOWN_FLAGS (LCME_FL_NEG | LCME_FL_INIT) +#define LCME_KNOWN_FLAGS (LCME_FL_NEG | LCME_FL_INIT | LCME_FL_STALE | \ + LCME_FL_PREF_RW) +/* The flags can be set by 
users at mirror creation time. */ +#define LCME_USER_FLAGS (LCME_FL_PREF_RW) /* the highest bit in obdo::o_layout_version is used to mark if the file is * being resynced. */ @@ -650,11 +654,6 @@ struct lov_comp_md_v1 { struct lov_comp_md_entry_v1 lcm_entries[0]; } __attribute__((packed)); -/* - * Maximum number of mirrors Lustre can support. - */ -#define LUSTRE_MIRROR_COUNT_MAX 16 - static inline __u32 lov_user_md_size(__u16 stripes, __u32 lmm_magic) { if (stripes == (__u16)-1) diff --git a/lustre/lod/lod_internal.h b/lustre/lod/lod_internal.h index 0ac4880..0063026 100644 --- a/lustre/lod/lod_internal.h +++ b/lustre/lod/lod_internal.h @@ -266,7 +266,8 @@ struct lod_default_striping { }; struct lod_mirror_entry { - __u16 lme_stale:1; + __u16 lme_stale:1, + lme_primary:1; /* mirror id */ __u16 lme_id; /* start,end index of this mirror in ldo_comp_entries */ diff --git a/lustre/lod/lod_lov.c b/lustre/lod/lod_lov.c index 6eb6591..83296ac 100644 --- a/lustre/lod/lod_lov.c +++ b/lustre/lod/lod_lov.c @@ -752,9 +752,11 @@ int lod_fill_mirrors(struct lod_object *lo) lod_comp = &lo->ldo_comp_entries[0]; for (i = 0; i < lo->ldo_comp_cnt; i++, lod_comp++) { int stale = !!(lod_comp->llc_flags & LCME_FL_STALE); + int preferred = !!(lod_comp->llc_flags & LCME_FL_PREF_WR); if (mirror_id_of(lod_comp->llc_id) == mirror_id) { lo->ldo_mirrors[mirror_idx].lme_stale |= stale; + lo->ldo_mirrors[mirror_idx].lme_primary |= preferred; lo->ldo_mirrors[mirror_idx].lme_end = i; continue; } @@ -768,6 +770,7 @@ int lod_fill_mirrors(struct lod_object *lo) lo->ldo_mirrors[mirror_idx].lme_id = mirror_id; lo->ldo_mirrors[mirror_idx].lme_stale = stale; + lo->ldo_mirrors[mirror_idx].lme_primary = preferred; lo->ldo_mirrors[mirror_idx].lme_start = i; lo->ldo_mirrors[mirror_idx].lme_end = i; } diff --git a/lustre/lod/lod_object.c b/lustre/lod/lod_object.c index f5f5211..f7cb2a3 100644 --- a/lustre/lod/lod_object.c +++ b/lustre/lod/lod_object.c @@ -2449,7 +2449,7 @@ static int 
lod_declare_layout_set(const struct lu_env *env, struct lod_device *d = lu2lod_dev(dt->do_lu.lo_dev); struct lod_object *lo = lod_dt_obj(dt); struct lov_comp_md_v1 *comp_v1 = buf->lb_buf; - __u32 magic, id; + __u32 magic; int i, j, rc; bool changed = false; ENTRY; @@ -2476,15 +2476,27 @@ static int lod_declare_layout_set(const struct lu_env *env, } for (i = 0; i < comp_v1->lcm_entry_count; i++) { - id = comp_v1->lcm_entries[i].lcme_id; + __u32 id = comp_v1->lcm_entries[i].lcme_id; + __u32 flags = comp_v1->lcm_entries[i].lcme_flags; + + if (flags & LCME_FL_INIT) { + if (changed) + lod_object_free_striping(env, lo); + RETURN(-EINVAL); + } for (j = 0; j < lo->ldo_comp_cnt; j++) { lod_comp = &lo->ldo_comp_entries[j]; - if (id == lod_comp->llc_id || id == LCME_ID_ALL) { - lod_comp->llc_flags = - comp_v1->lcm_entries[i].lcme_flags; - changed = true; + if (id != lod_comp->llc_id) + continue; + + if (flags & LCME_FL_NEG) { + flags &= ~LCME_FL_NEG; + lod_comp->llc_flags &= ~flags; + } else { + lod_comp->llc_flags |= flags; } + changed = true; } } @@ -2497,8 +2509,8 @@ static int lod_declare_layout_set(const struct lu_env *env, lod_obj_inc_layout_gen(lo); info->lti_buf.lb_len = lod_comp_md_size(lo, false); - rc = lod_sub_declare_xattr_set(env, dt, &info->lti_buf, - XATTR_NAME_LOV, 0, th); + rc = lod_sub_declare_xattr_set(env, dt_object_child(dt), &info->lti_buf, + XATTR_NAME_LOV, LU_XATTR_REPLACE, th); RETURN(rc); } @@ -2558,6 +2570,12 @@ static int lod_declare_layout_del(const struct lu_env *env, RETURN(-EINVAL); } + if (id == LCME_ID_INVAL && !flags) { + CDEBUG(D_LAYOUT, "%s: no id or flags specified.\n", + lod2obd(d)->obd_name); + RETURN(-EINVAL); + } + if (flags & LCME_FL_NEG) { neg_flags = flags & ~LCME_FL_NEG; flags = 0; @@ -5522,16 +5540,22 @@ static int lod_declare_update_rdonly(const struct lu_env *env, /** * Pick a mirror as the primary. 
- * Now it only picks the first mirror, this algo can be - * revised later after knowing the topology of cluster or - * the availability of OSTs. + * Now it only picks the first mirror that has primary flag set and + * doesn't have any stale components. This algo should be revised + * later after knowing the topology of cluster or the availability of + * OSTs. */ for (picked = -1, i = 0; i < lo->ldo_mirror_count; i++) { int index = (i + seq) % lo->ldo_mirror_count; if (!lo->ldo_mirrors[index].lme_stale) { - picked = index; - break; + if (lo->ldo_mirrors[index].lme_primary) { + picked = index; + break; + } + + if (picked < 0) + picked = index; } } if (picked < 0) /* failed to pick a primary */ diff --git a/lustre/lod/lod_qos.c b/lustre/lod/lod_qos.c index 875424f..840db03 100644 --- a/lustre/lod/lod_qos.c +++ b/lustre/lod/lod_qos.c @@ -1941,6 +1941,9 @@ int lod_qos_parse_config(const struct lu_env *env, struct lod_object *lo, comp_v1->lcm_entries[i].lcme_offset); ext = &comp_v1->lcm_entries[i].lcme_extent; lod_comp->llc_extent = *ext; + lod_comp->llc_flags = + comp_v1->lcm_entries[i].lcme_flags & + LCME_USER_FLAGS; } pool_name = NULL; diff --git a/lustre/lov/lov_object.c b/lustre/lov/lov_object.c index a4ade5c..9b56d84 100644 --- a/lustre/lov/lov_object.c +++ b/lustre/lov/lov_object.c @@ -637,6 +637,7 @@ static int lov_init_composite(const struct lu_env *env, struct lov_device *dev, unsigned int mirror_count; int flr_state = lsm->lsm_flags & LCM_FL_FLR_MASK; int result = 0; + unsigned int seq; int i, j; ENTRY; @@ -719,8 +720,8 @@ static int lov_init_composite(const struct lu_env *env, struct lov_device *dev, /* entries must be sorted by mirrors */ lre->lre_mirror_id = mirror_id; lre->lre_start = lre->lre_end = i; - lre->lre_preferred = (lle->lle_lsme->lsme_flags & - LCME_FL_PREFERRED); + lre->lre_preferred = !!(lle->lle_lsme->lsme_flags & + LCME_FL_PREF_RD); lre->lre_valid = lle->lle_valid; lre->lre_stale = !lle->lle_valid; } @@ -758,43 +759,28 @@ static int 
lov_init_composite(const struct lu_env *env, struct lov_device *dev, if (psz > 0) cl_object_header(&lov->lo_cl)->coh_page_bufsize += psz; - /* decide the preferred mirror */ - mirror_count = 0, i = 0; - lov_foreach_mirror_entry(lov, lre) { - i++; + /* decide the preferred mirror. It uses the hash value of lov_object + * so that different clients would use different mirrors for read. */ + mirror_count = 0; + seq = hash_long((unsigned long)lov, 8); + for (i = 0; i < comp->lo_mirror_count; i++) { + unsigned int idx = (i + seq) % comp->lo_mirror_count; + + lre = lov_mirror_entry(lov, idx); if (lre->lre_stale) continue; mirror_count++; /* valid mirror */ if (lre->lre_preferred || comp->lo_preferred_mirror < 0) - comp->lo_preferred_mirror = i - 1; + comp->lo_preferred_mirror = idx; } - if (mirror_count == 0) { + if (!mirror_count) { CDEBUG(D_INODE, DFID " doesn't have any valid mirrors\n", PFID(lu_object_fid(lov2lu(lov)))); - GOTO(out, result = -EINVAL); - } - - if (OBD_FAIL_CHECK(OBD_FAIL_FLR_RANDOM_PICK_MIRROR)) { - unsigned int seq; - - get_random_bytes(&seq, sizeof(seq)); - seq %= mirror_count; - - i = 0; - lov_foreach_mirror_entry(lov, lre) { - i++; - if (lre->lre_stale) - continue; - - if (!seq--) { - comp->lo_preferred_mirror = i - 1; - break; - } - } + comp->lo_preferred_mirror = 0; } LASSERT(comp->lo_preferred_mirror >= 0); diff --git a/lustre/tests/sanity-flr.sh b/lustre/tests/sanity-flr.sh index c3b7cd4..877487f 100644 --- a/lustre/tests/sanity-flr.sh +++ b/lustre/tests/sanity-flr.sh @@ -133,6 +133,18 @@ verify_comp_attr() { value=$(eval $cmd) + [ $attr = lcme_flags ] && { + local fl + local expected_list=$(comma_list $expected) + for fl in ${expected_list//,/ }; do + echo $value | grep -q $fl || { + $getstripe_cmd $tf + error "expected flag $fl existing on $comp_id" + } + done + return + } + [[ $value = $expected ]] || { $getstripe_cmd $tf error "verify $attr failed on $tf: $value != $expected" @@ -540,6 +552,64 @@ test_0f() { } run_test 0f "lfs mirror extend 
composite layout mirrors" +test_0g() { + local tf=$DIR/$tfile + + $LFS mirror create -N -E 1M -o0 --flags=prefer -E eof -o1 -N -o1 $tf || + error "create mirrored file $tf failed" + + verify_comp_attr lcme_flags $tf 0x10001 prefer + verify_comp_attr lcme_flags $tf 0x10002 prefer + + # write to the mirrored file and check primary + cp /etc/hosts $tf || error "error writing file '$tf'" + + verify_comp_attr lcme_flags $tf 0x20003 stale + + # resync file and check prefer flag + $LFS mirror resync $tf || error "error resync-ing file '$tf'" + + cancel_lru_locks osc + $LCTL set_param osc.*.stats=clear + cat $tf &> /dev/null || error "error reading file '$tf'" + + # verify that the data was provided by OST1 where mirror 1 resides + local nr_read=$($LCTL get_param -n osc.$FSNAME-OST0000-osc-ffff*.stats | + awk '/ost_read/{print $2}') + [ -n "$nr_read" ] || error "read was not provided by OST1" +} +run_test 0g "lfs mirror create flags support" + +test_0h() { + local tf=$DIR/$tfile + + $LFS mirror create -N -E 1M --flags=prefer -E eof -N2 $tf || + error "create mirrored file $tf failed" + + verify_comp_attr lcme_flags $tf 0x10001 prefer + verify_comp_attr lcme_flags $tf 0x10002 prefer + + # set flags to the first component + $LFS setstripe --comp-set -I 0x10001 --comp-flags=^prefer,stale $tf + + verify_comp_attr lcme_flags $tf 0x10001 stale + verify_comp_attr lcme_flags $tf 0x10002 prefer + + $LFS setstripe --comp-set -I0x10001 --comp-flags=^stale $tf && + error "clearing 'stale' should fail" + + # write and resync file. 
It can't resync the file directly because the + # file state is still 'ro' + cp /etc/hosts $tf || error "error writing file '$tf'" + $LFS mirror resync $tf || error "error resync-ing file '$tf'" + + $LFS setstripe --comp-set -I 0x20003 --comp-flags=prefer $tf || + error "error setting flag prefer" + + verify_comp_attr lcme_flags $tf 0x20003 prefer +} +run_test 0h "set, clear and test flags for FLR files" + test_1() { local tf=$DIR/$tfile local mirror_count=16 # LUSTRE_MIRROR_COUNT_MAX @@ -872,10 +942,10 @@ test_33() { error "mirrored file size is not $fsize" # read file - all OSTs are available - echo "reading file (data should be provided by ost1)... " + echo "reading file (data can be provided by any ost)... " local rs=$(cat $DIR/$tfile | head -1) - [[ "$rs" == "ost1" ]] || - error "file content error: expected: \"ost1\", actual: \"$rs\"" + [[ "$rs" == "ost1" || "$rs" == "ost2" ]] || + error "file content error: expected: \"ost1\" or \"ost2\"" # read file again with ost1 failed stop_osts 1 diff --git a/lustre/utils/lfs.c b/lustre/utils/lfs.c index 4c34482..4920cf0 100644 --- a/lustre/utils/lfs.c +++ b/lustre/utils/lfs.c @@ -182,7 +182,9 @@ static inline int lfs_mirror_extend(int argc, char **argv) "\t It can be a plain layout or a composite layout.\n" \ "\t If not specified, the stripe options inherited\n" \ "\t from the previous component will be used.\n" \ - "\tparent: Use default stripe options from parent directory\n" + "\tparent: Use default stripe options from parent directory\n" \ + "\tflags: set flags to the component of the current mirror.\n" \ + "\t Only \"prefer\" flag is supported so far.\n" #define MIRROR_EXTEND_HELP \ MIRROR_CREATE_HELP \ @@ -877,28 +879,59 @@ static int migrate_nonblock(int fd, int fdv) return 0; } -static int lfs_component_set(char *fname, int comp_id, __u32 flags) +static int lfs_component_set(char *fname, int comp_id, + __u32 flags, __u32 neg_flags) { - return -ENOTSUP; + __u32 ids[2]; + __u32 flags_array[2]; + size_t count = 0; 
+ int rc; + + if (flags) { + ids[count] = comp_id; + flags_array[count] = flags; + ++count; + } + + if (neg_flags) { + ids[count] = comp_id; + flags_array[count] = neg_flags | LCME_FL_NEG; + ++count; + } + + rc = llapi_layout_file_comp_set(fname, ids, flags_array, count); + if (rc) + fprintf(stderr, + "%s: cannot change the flags of component '%#x' of file '%s': %x / ^(%x)\n", + progname, comp_id, fname, flags, neg_flags); + + return rc; } -static int lfs_component_del(char *fname, __u32 comp_id, __u32 flags) +static int lfs_component_del(char *fname, __u32 comp_id, + __u32 flags, __u32 neg_flags) { int rc = 0; - if (flags != 0 && comp_id != 0) + if (flags && neg_flags) + return -EINVAL; + + if (!flags && neg_flags) + flags = neg_flags | LCME_FL_NEG; + + if ((flags && comp_id) || (!flags && !comp_id)) return -EINVAL; /* LCME_FL_INIT is the only supported flag in PFL */ - if (flags != 0) { + if (flags) { if (flags & ~LCME_KNOWN_FLAGS) { fprintf(stderr, - "%s setstripe: bad component flags %#x\n", + "%s setstripe: unknown flags %#x\n", progname, flags); return -EINVAL; } } else if (comp_id > LCME_ID_MAX) { - fprintf(stderr, "%s setstripe: bad component id %u\n", + fprintf(stderr, "%s setstripe: invalid component id %u\n", progname, comp_id); return -EINVAL; } @@ -1005,9 +1038,54 @@ out: return rc; } +static int comp_str2flags(char *string, __u32 *flags, __u32 *neg_flags) +{ + char *name; + + if (string == NULL) + return -EINVAL; + + *flags = 0; + *neg_flags = 0; + for (name = strtok(string, ","); name; name = strtok(NULL, ",")) { + bool found = false; + int i; + + for (i = 0; i < ARRAY_SIZE(comp_flags_table); i++) { + __u32 comp_flag = comp_flags_table[i].cfn_flag; + const char *comp_name = comp_flags_table[i].cfn_name; + + if (strcmp(name, comp_name) == 0) { + *flags |= comp_flag; + found = true; + } else if (strncmp(name, "^", 1) == 0 && + strcmp(name + 1, comp_name) == 0) { + *neg_flags |= comp_flag; + found = true; + } + } + if (!found) { + 
llapi_printf(LLAPI_MSG_ERROR, + "%s: component flag '%s' not supported\n", + progname, name); + return -EINVAL; + } + } + + if (!*flags && !*neg_flags) + return -EINVAL; + + /* don't allow to set and exclude the same flag */ + if (*flags & *neg_flags) + return -EINVAL; + + return 0; +} + /** * struct mirror_args - Command-line arguments for mirror(s). * @m_count: Number of mirrors to be created with this layout. + * @m_flags: Mirror level flags, only 'prefer' is supported. * @m_layout: Mirror layout. * @m_file: A victim file. Its layout will be split and used as a mirror. * @m_next: Point to the next node of the list. @@ -1017,11 +1095,36 @@ out: */ struct mirror_args { __u32 m_count; + __u32 m_flags; struct llapi_layout *m_layout; const char *m_file; struct mirror_args *m_next; }; +static int mirror_sanity_check_flags(struct llapi_layout *layout, void *unused) +{ + uint32_t flags; + int rc; + + rc = llapi_layout_comp_flags_get(layout, &flags); + if (rc) + return -errno; + + if (flags & LCME_FL_NEG) { + fprintf(stderr, "error: %s: negative flags are not supported\n", + progname); + return -EINVAL; + } + + if (flags & LCME_FL_STALE) { + fprintf(stderr, "error: %s: setting '%s' is not supported\n", + progname, comp_flags_table[LCME_FL_STALE].cfn_name); + return -EINVAL; + } + + return LLAPI_LAYOUT_ITER_CONT; +} + static inline int mirror_sanity_check_one(struct llapi_layout *layout) { uint64_t start, end; @@ -1057,7 +1160,8 @@ static inline int mirror_sanity_check_one(struct llapi_layout *layout) return -EINVAL; } - return 0; + rc = llapi_layout_comp_iterate(layout, mirror_sanity_check_flags, NULL); + return rc; } /** @@ -1123,9 +1227,8 @@ static int mirror_create_sanity_check(const char *fname, return -ENODATA; } } else { - if (list->m_layout != NULL) - has_m_layout = true; - else { + has_m_layout = true; + if (list->m_layout == NULL) { fprintf(stderr, "error: %s: no mirror layout\n", progname); return -EINVAL; @@ -1149,6 +1252,25 @@ static int 
mirror_create_sanity_check(const char *fname, return 0; } +static int mirror_set_flags(struct llapi_layout *layout, void *cbdata) +{ + __u32 mirror_flags = *(__u32 *)cbdata; + uint32_t flags; + int rc; + + rc = llapi_layout_comp_flags_get(layout, &flags); + if (rc < 0) + return rc; + + if (!flags) { + rc = llapi_layout_comp_flags_set(layout, mirror_flags); + if (rc) + return rc; + } + + return LLAPI_LAYOUT_ITER_CONT; +} + /** * mirror_create() - Create a mirrored file. * @fname: The file to be created. @@ -1173,6 +1295,16 @@ static int mirror_create(char *fname, struct mirror_args *mirror_list) cur_mirror = mirror_list; while (cur_mirror != NULL) { + rc = llapi_layout_comp_iterate(cur_mirror->m_layout, + mirror_set_flags, + &cur_mirror->m_flags); + if (rc) { + rc = -errno; + fprintf(stderr, "%s: failed to set mirror flags\n", + progname); + goto error; + } + for (i = 0; i < cur_mirror->m_count; i++) { rc = llapi_layout_merge(&layout, cur_mirror->m_layout); if (rc) { @@ -1256,7 +1388,6 @@ static int mirror_extend_file(const char *fname, const char *victim_file, int fdv = -1; struct stat stbuf; struct stat stbuf_v; - __u64 dv; int rc; fd = open(fname, O_RDWR); @@ -1310,13 +1441,13 @@ static int mirror_extend_file(const char *fname, const char *victim_file, } /* Get rid of caching pages from clients */ - rc = llapi_get_data_version(fd, &dv, LL_DV_WR_FLUSH); + rc = llapi_file_flush(fd); if (rc < 0) { error_loc = "cannot get data version"; return rc; } - rc = llapi_get_data_version(fdv, &dv, LL_DV_WR_FLUSH); + rc = llapi_file_flush(fdv); if (rc < 0) { error_loc = "cannot get data version"; return rc; @@ -1474,8 +1605,9 @@ struct lfs_setstripe_args { long long lsa_stripe_count; long long lsa_stripe_off; __u32 lsa_comp_flags; - int lsa_nr_tgts; + __u32 lsa_comp_neg_flags; unsigned long long lsa_pattern; + int lsa_nr_tgts; __u32 *lsa_tgts; char *lsa_pool_name; }; @@ -1638,6 +1770,13 @@ static int comp_args_to_layout(struct llapi_layout **composite, return rc; } + rc = 
llapi_layout_comp_flags_set(layout, lsa->lsa_comp_flags); + if (rc) { + fprintf(stderr, "Set flags 0x%x failed: %s\n", + lsa->lsa_comp_flags, strerror(errno)); + return rc; + } + if (lsa->lsa_pool_name != NULL) { rc = llapi_layout_pool_name_set(layout, lsa->lsa_pool_name); if (rc) { @@ -1775,69 +1914,6 @@ static int adjust_first_extent(char *fname, struct llapi_layout *layout) return 0; } -static inline bool comp_flags_is_neg(__u32 flags) -{ - return flags & LCME_FL_NEG; -} - -static inline void comp_flags_set_neg(__u32 *flags) -{ - *flags |= LCME_FL_NEG; -} - -static inline void comp_flags_clear_neg(__u32 *flags) -{ - *flags &= ~LCME_FL_NEG; -} - -static int comp_str2flags(__u32 *flags, char *string) -{ - char *name; - __u32 neg_flags = 0; - - if (string == NULL) - return -EINVAL; - - *flags = 0; - for (name = strtok(string, ","); name; name = strtok(NULL, ",")) { - bool found = false; - int i; - - for (i = 0; i < ARRAY_SIZE(comp_flags_table); i++) { - __u32 comp_flag = comp_flags_table[i].cfn_flag; - const char *comp_name = comp_flags_table[i].cfn_name; - - if (strcmp(name, comp_name) == 0) { - *flags |= comp_flag; - found = true; - } else if (strncmp(name, "^", 1) == 0 && - strcmp(name + 1, comp_name) == 0) { - neg_flags |= comp_flag; - found = true; - } - } - if (!found) { - llapi_printf(LLAPI_MSG_ERROR, - "%s: component flag '%s' not supported\n", - progname, name); - return -EINVAL; - } - } - - if (*flags == 0 && neg_flags == 0) - return -EINVAL; - /* don't support mixed flags for now */ - if (*flags && neg_flags) - return -EINVAL; - - if (neg_flags) { - *flags = neg_flags; - comp_flags_set_neg(flags); - } - - return 0; -} - static inline bool arg_is_eof(char *arg) { return !strncmp(arg, "-1", strlen("-1")) || @@ -1913,6 +1989,7 @@ enum { LFS_COMP_USE_PARENT_OPT, LFS_COMP_NO_VERIFY_OPT, LFS_PROJID_OPT, + LFS_MIRROR_FLAGS_OPT, }; /* functions */ @@ -1977,6 +2054,8 @@ static int lfs_setstripe0(int argc, char **argv, enum setstripe_origin opc) .name = "parent", 
.has_arg = no_argument}, { .val = LFS_COMP_NO_VERIFY_OPT, .name = "no-verify", .has_arg = no_argument}, + { .val = LFS_MIRROR_FLAGS_OPT, + .name = "flags", .has_arg = required_argument}, { .val = 'c', .name = "stripe-count", .has_arg = required_argument}, { .val = 'c', .name = "stripe_count", .has_arg = required_argument}, { .val = 'd', .name = "delete", .has_arg = no_argument}, @@ -2029,9 +2108,23 @@ static int lfs_setstripe0(int argc, char **argv, enum setstripe_origin opc) comp_del = 1; break; case LFS_COMP_FLAGS_OPT: - result = comp_str2flags(&lsa.lsa_comp_flags, optarg); + result = comp_str2flags(optarg, &lsa.lsa_comp_flags, + &lsa.lsa_comp_neg_flags); if (result != 0) goto usage_error; + if (mirror_mode && lsa.lsa_comp_neg_flags) { + fprintf(stderr, "%s: inverted flags are not supported\n", + progname); + goto usage_error; + } + if (lsa.lsa_comp_neg_flags & LCME_FL_STALE) { + fprintf(stderr, + "%s: cannot clear 'stale' flags from component. Please use lfs-mirror-resync(1) instead\n", + progname); + result = -EINVAL; + goto error; + } + break; case LFS_COMP_SET_OPT: comp_set = 1; @@ -2048,6 +2141,35 @@ static int lfs_setstripe0(int argc, char **argv, enum setstripe_origin opc) case LFS_COMP_NO_VERIFY_OPT: mirror_flags |= NO_VERIFY; break; + case LFS_MIRROR_FLAGS_OPT: { + __u32 flags; + + if (!mirror_mode || !last_mirror) { + fprintf(stderr, "error: %s: --flags must be specified with --mirror-count|-N option\n", + progname); + goto usage_error; + } + + result = comp_str2flags(optarg, &last_mirror->m_flags, + &flags); + if (result != 0) + goto usage_error; + + if (flags) { + fprintf(stderr, "%s: inverted flags are not supported\n", + progname); + result = -EINVAL; + goto usage_error; + } + if (last_mirror->m_flags & ~LCME_USER_FLAGS) { + fprintf(stderr, + "%s: unsupported mirror flags: %s\n", + progname, optarg); + result = -EINVAL; + goto error; + } + break; + } case 'b': if (!migrate_mode) { fprintf(stderr, @@ -2315,8 +2437,8 @@ static int lfs_setstripe0(int 
argc, char **argv, enum setstripe_origin opc) /* Only LCME_FL_INIT flags is used in PFL, and it shouldn't be * altered by user space tool, so we don't need to support the * --component-set for this moment. */ - if (comp_set != 0) { - fprintf(stderr, "%s %s: --component-set not supported\n", + if (comp_set && !comp_id) { + fprintf(stderr, "%s %s: --component-set doesn't have component-id set\n", progname, argv[0]); goto usage_error; } @@ -2469,10 +2591,12 @@ static int lfs_setstripe0(int argc, char **argv, enum setstripe_origin opc) layout); } else if (comp_set != 0) { result = lfs_component_set(fname, comp_id, - lsa.lsa_comp_flags); + lsa.lsa_comp_flags, + lsa.lsa_comp_neg_flags); } else if (comp_del != 0) { result = lfs_component_del(fname, comp_id, - lsa.lsa_comp_flags); + lsa.lsa_comp_flags, + lsa.lsa_comp_neg_flags); } else if (comp_add != 0) { result = lfs_component_add(fname, layout); } else if (opc == SO_MIRROR_CREATE) { @@ -2778,14 +2902,19 @@ static int lfs_find(int argc, char **argv) param.fp_exclude_comp_count = !!neg_opt; break; case LFS_COMP_FLAGS_OPT: - rc = comp_str2flags(&param.fp_comp_flags, optarg); - if (rc || comp_flags_is_neg(param.fp_comp_flags)) { + rc = comp_str2flags(optarg, &param.fp_comp_flags, + &param.fp_comp_neg_flags); + if (rc) { fprintf(stderr, "error: bad component flags " "'%s'\n", optarg); goto err; } param.fp_check_comp_flags = 1; - param.fp_exclude_comp_flags = !!neg_opt; + if (neg_opt) { + __u32 flags = param.fp_comp_neg_flags; + param.fp_comp_neg_flags = param.fp_comp_flags; + param.fp_comp_flags = flags; + } break; case LFS_COMP_START_OPT: if (optarg[0] == '+') { @@ -3239,19 +3368,16 @@ static int lfs_getstripe_internal(int argc, char **argv, break; case LFS_COMP_FLAGS_OPT: if (optarg != NULL) { - __u32 *flags = &param->fp_comp_flags; - rc = comp_str2flags(flags, optarg); + rc = comp_str2flags(optarg, + &param->fp_comp_flags, + &param->fp_comp_neg_flags); if (rc != 0) { fprintf(stderr, "error: %s bad " "component flags '%s'.\n", argv[0], optarg); 
return CMD_HELP; - } else { - param->fp_check_comp_flags = 1; - param->fp_exclude_comp_flags = - comp_flags_is_neg(*flags); - comp_flags_clear_neg(flags); } + param->fp_check_comp_flags = 1; } else { param->fp_verbose |= VERBOSE_COMP_FLAGS; param->fp_max_depth = 0; diff --git a/lustre/utils/liblustreapi.c b/lustre/utils/liblustreapi.c index ca6864f..fb65b0d 100644 --- a/lustre/utils/liblustreapi.c +++ b/lustre/utils/liblustreapi.c @@ -3177,10 +3177,9 @@ static void lov_dump_comp_v1(struct find_param *param, char *path, entry = &comp_v1->lcm_entries[i]; if (param->fp_check_comp_flags) { - if ((param->fp_exclude_comp_flags && - (param->fp_comp_flags & entry->lcme_flags)) || - (!param->fp_exclude_comp_flags && - !(param->fp_comp_flags & entry->lcme_flags))) + if (((param->fp_comp_flags & entry->lcme_flags) != + param->fp_comp_flags) || + (param->fp_comp_neg_flags & entry->lcme_flags)) continue; } @@ -3798,16 +3797,13 @@ static int find_check_comp_options(struct find_param *param) entry = &comp_v1->lcm_entries[i]; if (param->fp_check_comp_flags) { - if (((entry->lcme_flags & param->fp_comp_flags) && - param->fp_exclude_comp_flags) || - (!(entry->lcme_flags & param->fp_comp_flags) && - !param->fp_exclude_comp_flags)) + ret = 1; + if (((param->fp_comp_flags & entry->lcme_flags) != + param->fp_comp_flags) || + (param->fp_comp_neg_flags & entry->lcme_flags)) { ret = -1; - else - ret = 1; - - if (ret == -1) continue; + } } if (param->fp_check_comp_start) { @@ -5038,6 +5034,20 @@ int llapi_get_data_version(int fd, __u64 *data_version, __u64 flags) return rc; } +/** + * Flush cached pages from all clients. + * + * \param fd File descriptor + * \retval 0 success + * \retval < 0 error + */ +int llapi_file_flush(int fd) +{ + __u64 dv; + + return llapi_get_data_version(fd, &dv, LL_DV_WR_FLUSH); +} + /* * Fetch layout version from OST objects. 
Layout version on OST objects are * only set when the file is a mirrored file AND after the file has been diff --git a/lustre/utils/liblustreapi_layout.c b/lustre/utils/liblustreapi_layout.c index 0c1849c..531e083 100644 --- a/lustre/utils/liblustreapi_layout.c +++ b/lustre/utils/liblustreapi_layout.c @@ -2119,13 +2119,100 @@ out: * comp->lcme_id value, which must be an unique component ID. The new * attributes are passed in by @comp and @valid is used to specify which * attributes in the component are going to be changed. + * + * \param[in] path path name of the file + * \param[in] ids An array of component IDs + * \param[in] flags flags: LCME_FL_* or; + * negative flags: (LCME_FL_NEG|LCME_FL_*) + * \param[in] count Number of elements in ids and flags array */ -int llapi_layout_file_comp_set(const char *path, - const struct llapi_layout *comp, - uint32_t valid) +int llapi_layout_file_comp_set(const char *path, uint32_t *ids, uint32_t *flags, + size_t count) { - errno = EOPNOTSUPP; - return -1; + int rc = -1, fd = -1, i; + size_t lum_size; + struct llapi_layout *layout; + struct llapi_layout_comp *comp; + struct lov_user_md *lum = NULL; + + if (path == NULL) { + errno = EINVAL; + return -1; + } + + if (!count) + return 0; + + for (i = 0; i < count; i++) { + if (!ids[i] || !flags[i]) { + errno = EINVAL; + return -1; + } + + if (ids[i] > LCME_ID_MAX || (flags[i] & ~LCME_KNOWN_FLAGS)) { + errno = EINVAL; + return -1; + } + + /* do not allow to set or clear INIT flag */ + if (flags[i] & LCME_FL_INIT) { + errno = EINVAL; + return -1; + } + } + + layout = __llapi_layout_alloc(); + if (layout == NULL) + return -1; + + layout->llot_is_composite = true; + for (i = 0; i < count; i++) { + comp = __llapi_comp_alloc(0); + if (comp == NULL) + goto out; + + comp->llc_id = ids[i]; + comp->llc_flags = flags[i]; + + list_add_tail(&comp->llc_list, &layout->llot_comp_list); + layout->llot_cur_comp = comp; + } + + lum = llapi_layout_to_lum(layout); + if (lum == NULL) + goto out; + + 
lum_size = ((struct lov_comp_md_v1 *)lum)->lcm_size; + + fd = open(path, O_RDWR); + if (fd < 0) + goto out; + + /* flush cached pages from clients */ + rc = llapi_file_flush(fd); + if (rc) { + errno = -rc; + rc = -1; + goto out_close; + } + + rc = fsetxattr(fd, XATTR_LUSTRE_LOV".set.flags", lum, lum_size, 0); + if (rc < 0) + goto out_close; + + rc = 0; + +out_close: + if (fd >= 0) { + int tmp_errno = errno; + close(fd); + errno = tmp_errno; + } +out: + if (lum) + free(lum); + llapi_layout_free(layout); + return rc; } /** @@ -2142,6 +2229,33 @@ bool llapi_layout_is_composite(struct llapi_layout *layout) } /** + * Iterate every components in the @layout and call callback function @cb. + * + * \param[in] + */ +int llapi_layout_comp_iterate(struct llapi_layout *layout, + llapi_layout_iter_cb cb, void *cbdata) +{ + int rc; + + rc = llapi_layout_comp_use(layout, LLAPI_LAYOUT_COMP_USE_FIRST); + if (rc < 0) + return rc; + + while (rc == 0) { + rc = cb(layout, cbdata); + if (rc != LLAPI_LAYOUT_ITER_CONT) + break; + + rc = llapi_layout_comp_use(layout, LLAPI_LAYOUT_COMP_USE_NEXT); + if (rc < 0) + return rc; + } + + return rc >= 0 ? LLAPI_LAYOUT_ITER_CONT : rc; +} + +/** * llapi_layout_merge() - Merge a composite layout into another one. * @dst_layout: Destination composite layout. * @src_layout: Source composite layout. 
@@ -2237,11 +2351,8 @@ int llapi_mirror_find_stale(struct llapi_layout *layout, int rc; rc = llapi_layout_comp_use(layout, LLAPI_LAYOUT_COMP_USE_FIRST); - if (rc < 0) { - fprintf(stderr, "%s: move to the first layout component: %s.\n", - __func__, strerror(errno)); + if (rc < 0) goto error; - } while (rc == 0) { uint32_t id; @@ -2250,21 +2361,15 @@ int llapi_mirror_find_stale(struct llapi_layout *layout, uint64_t start, end; rc = llapi_layout_comp_flags_get(layout, &flags); - if (rc < 0) { - fprintf(stderr, "llapi_layout_comp_flags_get: %s.\n", - strerror(errno)); + if (rc < 0) goto error; - } if (!(flags & LCME_FL_STALE)) goto next; rc = llapi_layout_mirror_id_get(layout, &mirror_id); - if (rc < 0) { - fprintf(stderr, "llapi_layout_mirror_id_get: %s.\n", - strerror(errno)); + if (rc < 0) goto error; - } /* the caller only wants stale components from specific * mirrors */ @@ -2282,18 +2387,12 @@ int llapi_mirror_find_stale(struct llapi_layout *layout, } rc = llapi_layout_comp_id_get(layout, &id); - if (rc < 0) { - fprintf(stderr, "llapi_layout_comp_id_get: %s.\n", - strerror(errno)); + if (rc < 0) goto error; - } rc = llapi_layout_comp_extent_get(layout, &start, &end); - if (rc < 0) { - fprintf(stderr, "llapi_layout_comp_extent_get: %s.\n", - strerror(errno)); + if (rc < 0) goto error; - } /* pack this component into @comp array */ comp[idx].lrc_id = id; @@ -2303,8 +2402,6 @@ int llapi_mirror_find_stale(struct llapi_layout *layout, idx++; if (idx >= comp_size) { - fprintf(stderr, "%s: resync_comp array too small.\n", - __func__); rc = -EINVAL; goto error; } @@ -2312,8 +2409,6 @@ int llapi_mirror_find_stale(struct llapi_layout *layout, next: rc = llapi_layout_comp_use(layout, LLAPI_LAYOUT_COMP_USE_NEXT); if (rc < 0) { - fprintf(stderr, "%s: move to the next layout " - "component: %s.\n", __func__, strerror(errno)); rc = -EINVAL; goto error; } @@ -2396,11 +2491,8 @@ ssize_t llapi_mirror_resync_one(int fd, struct llapi_layout *layout, ssize_t copied; src = 
llapi_mirror_find(layout, start, end, &mirror_end); - if (src == 0) { - fprintf(stderr, "llapi_mirror_find cannot find " - "component covering %lu.\n", start); + if (src == 0) return -ENOENT; - } if (mirror_end == OBD_OBJECT_EOF) to_copy = count; @@ -2408,11 +2500,8 @@ ssize_t llapi_mirror_resync_one(int fd, struct llapi_layout *layout, to_copy = MIN(count, mirror_end - start); copied = llapi_mirror_copy(fd, src, dst, start, to_copy); - if (copied < 0) { - fprintf(stderr, "llapi_mirror_copy returned %zd.\n", - copied); + if (copied < 0) return copied; - } result += copied; if (copied < to_copy) /* end of file */ -- 1.8.3.1
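Note on the revised matching rule: the liblustreapi.c hunks above skip a component when it lacks any of the requested flags or carries any of the excluded ones. The standalone sketch below restates that condition as a predicate so it can be checked in isolation. The function name comp_flags_match and the EX_FL_* bit values are hypothetical and used only for illustration; the real LCME_FL_* definitions live in lustre_user.h and are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for LCME_FL_* bits (illustration only). */
#define EX_FL_STALE	0x1u
#define EX_FL_PREFER	0x2u
#define EX_FL_INIT	0x4u

/*
 * Return true when a component with flags @entry_flags matches:
 * every bit in @want must be set and no bit in @exclude may be set.
 * This is the inverse of the "continue" condition in the patch.
 */
static bool comp_flags_match(uint32_t entry_flags, uint32_t want,
			     uint32_t exclude)
{
	if ((want & entry_flags) != want)
		return false;	/* at least one requested flag is missing */

	if (exclude & entry_flags)
		return false;	/* at least one excluded flag is present */

	return true;
}
```

Under these assumptions, a request such as "init,^stale" would match a component with init and prefer set, and would reject one with init and stale set.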