Lustre buffered I/O does not work well with restrictive memcg
control. This may result in OOM when the system is under memory
pressure.
Lustre has implemented unstable pages support similar to NFS,
but it is disabled by default for performance reasons.
In Lustre, a client pins the cache pages for writes until the
write transaction is committed on the server (OST), even after
these pinned pages have finished writeback. The server starts a
transaction commit either because the commit interval (5 seconds
by default) for the backend storage (i.e. OST/ldiskfs) has been
reached, or because there is not enough room in the journal for
a particular handle to start. Until the write transaction has
been committed and the client notified, these pages stay pinned
and cannot be flushed in any way by the kernel.
This means that when a client hits memory pressure there can
be a large number of unfreeable (pinned and uncommitted) pages,
so the application on the client ends up OOM-killed because it
cannot free memory when asked to.
This is particularly common with cgroups: when cgroups are in
use, the memory limit is generally much lower than the total
system memory, so it is much more likely to be reached.
The Linux kernel has a mature memory reclaim mechanism that
avoids OOM, even with cgroups.
After a write dirties a page, the kernel calls
@balance_dirty_pages(). If the dirtied and uncommitted pages
exceed the background threshold for the global memory limits or
the memory cgroup limits, the writeback threads are woken to
perform some writeout.
When allocating a new page for I/O under memory pressure, the
kernel tries direct reclaim before allocating. For cgroups, it
tries to reclaim pages from the memory cgroups over their soft
limit. The slow page allocation path with direct reclaim calls
@wakeup_flusher_threads() with WB_REASON_VMSCAN to start writing
back dirty pages.
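Schematically, the two kernel paths just described are (a
simplified sketch, not the literal call chains):

```
write(2) dirties a page
  -> balance_dirty_pages()
       dirty/uncommitted pages over the background threshold
       (global or per-memcg)?
         -> wake the writeback threads

page allocation under memory pressure
  -> direct reclaim in the slow path
       (memcg: reclaim first from groups over their soft limit)
  -> wakeup_flusher_threads(WB_REASON_VMSCAN)
       -> writeback threads write out dirty pages
```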
Our solution uses this kernel page reclaim mechanism directly.
On completion of page writeback (in @brw_interpret), we call
@__mark_inode_dirty() to add the dirty inode, which still has
pinned uncommitted pages, to the @bdi_writeback; each memory
cgroup has its own @bdi_writeback to control the writeback for
buffered writes within it.
Thus, under memory pressure, the writeback threads are woken up
and call @ll_writepages() to write out data.
For background writeout (over the background dirty threshold)
or writeback with WB_REASON_VMSCAN for direct reclaim, we first
flush dirty pages to the OSTs and then sync them, forcing a
commit so that these pages can be released quickly.
When a cgroup is under memory pressure, the kernel asks for
writeback and an fsync to the OSTs is then performed. This
commits the uncommitted/unstable pages, which the kernel can
then finally free.
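Putting the pieces together, the client-side flow added by this
patch can be sketched as follows (simplified; function names as
referenced above):

```
write RPC completes on the client (brw_interpret):
  if the transaction is not yet committed on the OST:
    cl_object_dirty_for_sync()
      -> __mark_inode_dirty(inode, I_DIRTY_DATASYNC)
         (inode joins the per-memcg @bdi_writeback dirty list)

memory pressure (background threshold or WB_REASON_VMSCAN):
  kernel wakes the writeback threads
    -> ll_writepages() runs with mode CL_FSYNC_RECLAIM
      -> flush dirty pages to OSTs
      -> OST_SYNC forces the transaction commit
        -> unstable pages become freeable and are reclaimed
```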
Below are some performance results. The client has 512G of
memory in total.
1. dd if=/dev/zero of=$test bs=1M count=$size
I/O size 128G 256G 512G 1024G
unpatch (GB/s) 2.2 2.2 2.1 2.0
patched (GB/s) 2.2 2.2 2.1 2.0
There is no performance regression after enabling unstable page
accounting with the patch.
2. One process under different memcg limits, with total I/O
size varying from 2X memlimit to 0.5X memlimit:
dd if=/dev/zero of=$file bs=1M count=$((memlimit_mb * time))
memcg limits 1G 4G 16G 64G
2X memlimit (GB/s) 1.7 1.6 1.8 1.7
1X memlimit (GB/s) 1.9 1.9 2.2 2.2
.5X memlimit(GB/s) 2.3 2.3 2.2 2.3
Without this patch, dd with I/O size > memcg limit will be
OOM-killed.
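For reference, the test-2 scenario above can be reproduced by
hand with a cgroup-v1 memory controller, roughly as in the
sketch below. This is only a sketch: the cgroup path and name,
the 4G limit, the output file, and the $MNT mount point are
illustrative, and it needs root plus a mounted Lustre client.

```shell
#!/bin/bash
# Run one dd writer inside a memcg whose limit is half the I/O size.
# Without unstable-page writeback such a run is typically OOM-killed;
# with the patch it should complete.
memlimit_mb=4096                        # assumed memcg limit (4G)
cgdir=/sys/fs/cgroup/memory/lustre_wb   # assumed cgroup-v1 path
mkdir -p "$cgdir"
echo $((memlimit_mb * 1024 * 1024)) > "$cgdir/memory.limit_in_bytes"
# Move this shell into the cgroup, then write 2X the memory limit.
echo $$ > "$cgdir/tasks"
dd if=/dev/zero of="$MNT/wb_testfile" bs=1M count=$((memlimit_mb * 2))
```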
3. Multiple cgroup testing:
8 cgroups in total, each with a memory limit of 8G.
Run a dd write in each cgroup with an I/O size of 2X the memory
limit (16G).
17179869184 bytes (17 GB, 16 GiB) copied, 12.7842 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 12.7889 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 12.9504 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 12.9577 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 13.4066 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 13.5397 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 13.5769 s, 1.3 GB/s
17179869184 bytes (17 GB, 16 GiB) copied, 13.6605 s, 1.3 GB/s
4. Two dd writers: one (A) under memcg control and another (B)
not. The total write data is 128G. The memcg limit varies from
1G to 128G.
cmd: ./t2p.sh $memlimit_mb
memlimit dd writer (A) dd writer (B)
1G 1.3GB/s 2.2GB/s
4G 1.3GB/s 2.2GB/s
16G 1.4GB/s 2.2GB/s
32G 1.5GB/s 2.2GB/s
64G 1.8GB/s 2.2GB/s
128G 2.1GB/s 2.1GB/s
The results demonstrate that the process with memcg limits has
almost no impact on the performance of the process without
limits.
Test-Parameters: clientdistro=el8.7 testlist=sanity env=ONLY=411b,ONLY_REPEAT=10
Test-Parameters: clientdistro=el9.1 testlist=sanity env=ONLY=411b,ONLY_REPEAT=10
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I7b548dcc214995c9f00d57817028ec64fd917eab
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50544
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
[delete_from_page_cache is exported])])
]) # LC_EXPORTS_DELETE_FROM_PAGE_CACHE
+
+#
+# LC_HAVE_WB_STAT_MOD
+#
+# Kernel 5.16-rc1 bd3488e7b4d61780eb3dfaca1cc6f4026bcffd48
+# mm/writeback: Rename __add_wb_stat() to wb_stat_mod()
+#
+AC_DEFUN([LC_HAVE_WB_STAT_MOD], [
+tmp_flags="$EXTRA_KCFLAGS"
+EXTRA_KCFLAGS="-Werror"
+LB_CHECK_COMPILE([if wb_stat_mod() exists],
+wb_stat_mod, [
+ #include <linux/backing-dev.h>
+],[
+ wb_stat_mod(NULL, WB_WRITEBACK, 1);
+],[
+ AC_DEFINE(HAVE_WB_STAT_MOD, 1,
+ [wb_stat_mod() exists])
+])
+EXTRA_KCFLAGS="$tmp_flags"
+]) # LC_HAVE_WB_STAT_MOD
+
#
# LC_HAVE_INVALIDATE_FOLIO
#
LC_HAVE_SECURITY_DENTRY_INIT_WITH_XATTR_NAME_ARG
LC_HAVE_KIOCB_COMPLETE_2ARGS
LC_EXPORTS_DELETE_FROM_PAGE_CACHE
+ LC_HAVE_WB_STAT_MOD
# 5.17
LC_HAVE_INVALIDATE_FOLIO
*/
int (*coo_attr_update)(const struct lu_env *env, struct cl_object *obj,
const struct cl_attr *attr, unsigned valid);
+ /**
+	 * Mark the inode dirty. This way, the inode is added to the
+	 * writeback list of the corresponding @bdi_writeback, and writing
+	 * the dirty pages out to OSTs is deferred to the kernel writeback
+	 * mechanism.
+ */
+ void (*coo_dirty_for_sync)(const struct lu_env *env,
+ struct cl_object *obj);
/**
* Update object configuration. Called top-to-bottom to modify object
* configuration.
enum cl_fsync_mode {
/** start writeback, do not wait for them to finish */
- CL_FSYNC_NONE = 0,
+ CL_FSYNC_NONE = 0,
/** start writeback and wait for them to finish */
- CL_FSYNC_LOCAL = 1,
+ CL_FSYNC_LOCAL = 1,
/** discard all of dirty pages in a specific file range */
- CL_FSYNC_DISCARD = 2,
+ CL_FSYNC_DISCARD = 2,
/** start writeback and make sure they have reached storage before
* return. OST_SYNC RPC must be issued and finished */
- CL_FSYNC_ALL = 3
+ CL_FSYNC_ALL = 3,
+ /** start writeback, thus the kernel can reclaim some memory */
+ CL_FSYNC_RECLAIM = 4,
};
struct cl_io_rw_common {
struct cl_attr *attr);
int cl_object_attr_update(const struct lu_env *env, struct cl_object *obj,
const struct cl_attr *attr, unsigned valid);
+void cl_object_dirty_for_sync(const struct lu_env *env, struct cl_object *obj);
int cl_object_glimpse (const struct lu_env *env, struct cl_object *obj,
struct ost_lvb *lvb);
int cl_conf_set (const struct lu_env *env, struct cl_object *obj,
#define ll_access_ok(ptr, len) access_ok(ptr, len)
#endif
+#ifdef HAVE_WB_STAT_MOD
+#define __add_wb_stat(wb, item, amount) wb_stat_mod(wb, item, amount)
+#endif
+
#ifdef HAVE_SEC_RELEASE_SECCTX_1ARG
#ifndef HAVE_LSMCONTEXT_INIT
/* Ubuntu 5.19 */
ENTRY;
if (mode != CL_FSYNC_NONE && mode != CL_FSYNC_LOCAL &&
- mode != CL_FSYNC_DISCARD && mode != CL_FSYNC_ALL)
+ mode != CL_FSYNC_DISCARD && mode != CL_FSYNC_ALL &&
+ mode != CL_FSYNC_RECLAIM)
RETURN(-EINVAL);
env = cl_env_get(&refcheck);
struct ll_sb_info *sbi = ll_s2sbi(sb);
char *profilenm = get_profile_name(sb);
unsigned long cfg_instance = ll_get_cfg_instance(sb);
- long ccc_count;
- int next, force = 1, rc = 0;
+ int next, force = 1;
+
ENTRY;
if (IS_ERR(sbi))
force = obd->obd_force;
}
- /* Wait for unstable pages to be committed to stable storage */
- if (force == 0) {
- rc = l_wait_event_abortable(
- sbi->ll_cache->ccc_unstable_waitq,
- atomic_long_read(&sbi->ll_cache->ccc_unstable_nr) == 0);
- }
-
- ccc_count = atomic_long_read(&sbi->ll_cache->ccc_unstable_nr);
- if (force == 0 && rc != -ERESTARTSYS)
- LASSERTF(ccc_count == 0, "count: %li\n", ccc_count);
-
/* We need to set force before the lov_disconnect in
* lustre_common_put_super, since l_d cleans up osc's as well.
*/
enum cl_fsync_mode mode;
int range_whole = 0;
int result;
+
ENTRY;
if (wbc->range_cyclic) {
if (wbc->sync_mode == WB_SYNC_ALL)
mode = CL_FSYNC_LOCAL;
+ if (wbc->sync_mode == WB_SYNC_NONE) {
+#ifdef SB_I_CGROUPWB
+ struct bdi_writeback *wb;
+
+ /*
+ * As it may break full stripe writes on the inode,
+ * disable periodic kupdate writeback (@wbc->for_kupdate)?
+ */
+
+ /*
+ * The system is under memory pressure and it is now reclaiming
+ * cache pages.
+ */
+ wb = inode_to_wb(inode);
+ if (wbc->for_background ||
+ (wb->start_all_reason == WB_REASON_VMSCAN &&
+ test_bit(WB_start_all, &wb->state)))
+ mode = CL_FSYNC_RECLAIM;
+#else
+ /*
+	 * On old kernels such as RHEL7 we cannot tell the writeback
+	 * reason for memory reclaim (WB_REASON_TRY_TO_FREE_PAGES there,
+	 * WB_REASON_VMSCAN in newer kernels), so force the mode to
+	 * CL_FSYNC_RECLAIM on those kernels.
+ */
+ if (!wbc->for_kupdate)
+ mode = CL_FSYNC_RECLAIM;
+#endif
+ }
+
if (ll_i2info(inode)->lli_clob == NULL)
RETURN(0);
return 0;
}
+static void vvp_dirty_for_sync(const struct lu_env *env, struct cl_object *obj)
+{
+ struct inode *inode = vvp_object_inode(obj);
+
+ __mark_inode_dirty(inode, I_DIRTY_DATASYNC);
+}
+
static int vvp_conf_set(const struct lu_env *env, struct cl_object *obj,
const struct cl_object_conf *conf)
{
.coo_io_init = vvp_io_init,
.coo_attr_get = vvp_attr_get,
.coo_attr_update = vvp_attr_update,
+ .coo_dirty_for_sync = vvp_dirty_for_sync,
.coo_conf_set = vvp_conf_set,
.coo_prune = vvp_prune,
.coo_glimpse = vvp_object_glimpse,
ENTRY;
+ if (fio->fi_mode == CL_FSYNC_RECLAIM) {
+ struct client_obd *cli = osc_cli(osc);
+
+ if (!atomic_long_read(&cli->cl_unstable_count)) {
+ /* Stop flush when there are no unstable pages? */
+ CDEBUG(D_CACHE, "unstable count is zero\n");
+ RETURN(0);
+ }
+ }
+
/* a MDC lock always covers whole object, do sync for whole
* possible range despite of supplied start/end values.
*/
fio->fi_nr_written += result;
result = 0;
}
- if (fio->fi_mode == CL_FSYNC_ALL) {
+ if (fio->fi_mode == CL_FSYNC_ALL || fio->fi_mode == CL_FSYNC_RECLAIM) {
+ struct osc_io *oio = cl2osc_io(env, slice);
+ struct osc_async_cbargs *cbargs = &oio->oi_cbarg;
int rc;
- rc = osc_cache_wait_range(env, osc, 0, CL_PAGE_EOF);
- if (result == 0)
- result = rc;
+ if (fio->fi_mode == CL_FSYNC_ALL) {
+ rc = osc_cache_wait_range(env, osc, 0, CL_PAGE_EOF);
+ if (result == 0)
+ result = rc;
+ }
/* Use OSC sync code because it is asynchronous.
* It is to be added into MDC and avoid the using of
* OST_SYNC at both MDC and MDT.
*/
rc = osc_fsync_ost(env, osc, fio);
- if (result == 0)
+ if (result == 0) {
+ cbargs->opc_rpc_sent = 1;
result = rc;
+ }
}
RETURN(result);
EXPORT_SYMBOL(cl_object_attr_update);
/**
+ * Mark the inode as dirty when the inode has uncommitted (unstable) pages.
+ * Thus, when the system is under memory pressure, background writeback
+ * will be triggered to commit and unpin the pages.
+ */
+void cl_object_dirty_for_sync(const struct lu_env *env, struct cl_object *top)
+{
+ struct cl_object *obj;
+
+ ENTRY;
+
+ cl_object_for_each(obj, top) {
+ if (obj->co_ops->coo_dirty_for_sync != NULL)
+ obj->co_ops->coo_dirty_for_sync(env, obj);
+ }
+ EXIT;
+}
+EXPORT_SYMBOL(cl_object_dirty_for_sync);
+
+/**
* Notifies layers (bottom-to-top) that glimpse AST was received.
*
* Layers have to fill \a lvb fields with information that will be shipped
spin_lock_init(&cache->ccc_lru_lock);
INIT_LIST_HEAD(&cache->ccc_lru);
- /* turn unstable check off by default as it impacts performance */
- cache->ccc_unstable_check = 0;
+ cache->ccc_unstable_check = 1;
atomic_long_set(&cache->ccc_unstable_nr, 0);
init_waitqueue_head(&cache->ccc_unstable_waitq);
mutex_init(&cache->ccc_max_cache_mb_lock);
static int osc_io_fsync_start(const struct lu_env *env,
const struct cl_io_slice *slice)
{
- struct cl_io *io = slice->cis_io;
+ struct cl_io *io = slice->cis_io;
struct cl_fsync_io *fio = &io->u.ci_fsync;
- struct cl_object *obj = slice->cis_obj;
- struct osc_object *osc = cl2osc(obj);
- pgoff_t start = fio->fi_start >> PAGE_SHIFT;
- pgoff_t end = fio->fi_end >> PAGE_SHIFT;
- int result = 0;
+ struct cl_object *obj = slice->cis_obj;
+ struct osc_object *osc = cl2osc(obj);
+ pgoff_t start = fio->fi_start >> PAGE_SHIFT;
+ pgoff_t end = fio->fi_end >> PAGE_SHIFT;
+ int result = 0;
+
ENTRY;
+ if (fio->fi_mode == CL_FSYNC_RECLAIM) {
+ struct client_obd *cli = osc_cli(osc);
+
+ if (!atomic_long_read(&cli->cl_unstable_count)) {
+ /* Stop flush when there are no unstable pages? */
+ CDEBUG(D_CACHE, "unstable count is zero\n");
+ RETURN(0);
+ }
+ }
+
if (fio->fi_end == OBD_OBJECT_EOF)
end = CL_PAGE_EOF;
fio->fi_nr_written += result;
result = 0;
}
- if (fio->fi_mode == CL_FSYNC_ALL) {
+ if (fio->fi_mode == CL_FSYNC_ALL || fio->fi_mode == CL_FSYNC_RECLAIM) {
+ struct osc_io *oio = cl2osc_io(env, slice);
+ struct osc_async_cbargs *cbargs = &oio->oi_cbarg;
int rc;
/* we have to wait for writeback to finish before we can
* send OST_SYNC RPC. This is bad because it causes extents
* to be written osc by osc. However, we usually start
* writeback before CL_FSYNC_ALL so this won't have any real
- * problem. */
- rc = osc_cache_wait_range(env, osc, start, end);
- if (result == 0)
- result = rc;
+ * problem.
+	 * We do not have to wait for writeback to finish in the memory
+	 * reclaim case.
+ */
+ if (fio->fi_mode == CL_FSYNC_ALL) {
+ rc = osc_cache_wait_range(env, osc, start, end);
+ if (result == 0)
+ result = rc;
+ }
+
rc = osc_fsync_ost(env, osc, fio);
- if (result == 0)
+ if (result == 0) {
+ cbargs->opc_rpc_sent = 1;
result = rc;
+ }
}
RETURN(result);
const struct cl_io_slice *slice)
{
struct cl_fsync_io *fio = &slice->cis_io->u.ci_fsync;
- struct cl_object *obj = slice->cis_obj;
+ struct cl_object *obj = slice->cis_obj;
+ struct osc_io *oio = cl2osc_io(env, slice);
+ struct osc_async_cbargs *cbargs = &oio->oi_cbarg;
pgoff_t start = fio->fi_start >> PAGE_SHIFT;
pgoff_t end = fio->fi_end >> PAGE_SHIFT;
int result = 0;
if (fio->fi_mode == CL_FSYNC_LOCAL) {
result = osc_cache_wait_range(env, cl2osc(obj), start, end);
- } else if (fio->fi_mode == CL_FSYNC_ALL) {
- struct osc_io *oio = cl2osc_io(env, slice);
- struct osc_async_cbargs *cbargs = &oio->oi_cbarg;
+ } else if (cbargs->opc_rpc_sent && (fio->fi_mode == CL_FSYNC_ALL ||
+ fio->fi_mode == CL_FSYNC_RECLAIM)) {
wait_for_completion(&cbargs->opc_sync);
if (result == 0)
struct osc_extent *tmp;
struct client_obd *cli = aa->aa_cli;
unsigned long transferred = 0;
+ struct cl_object *obj = NULL;
ENTRY;
struct obdo *oa = aa->aa_oa;
struct cl_attr *attr = &osc_env_info(env)->oti_attr;
unsigned long valid = 0;
- struct cl_object *obj;
struct osc_async_page *last;
last = brw_page2oap(aa->aa_ppga[aa->aa_page_count - 1]);
OBD_SLAB_FREE_PTR(aa->aa_oa, osc_obdo_kmem);
aa->aa_oa = NULL;
- if (lustre_msg_get_opc(req->rq_reqmsg) == OST_WRITE && rc == 0)
+ if (lustre_msg_get_opc(req->rq_reqmsg) == OST_WRITE && rc == 0) {
osc_inc_unstable_pages(req);
+ /*
+ * If req->rq_committed is set, it means that the dirty pages
+ * have already committed into the stable storage on OSTs
+ * (i.e. Direct I/O).
+ */
+ if (!req->rq_committed)
+ cl_object_dirty_for_sync(env, cl_object_top(obj));
+ }
list_for_each_entry_safe(ext, tmp, &aa->aa_exts, oe_link) {
list_del_init(&ext->oe_link);
# skip cgroup tests on RHEL8.1 kernels until they are fixed
if (( $LINUX_VERSION_CODE >= $(version_code 4.18.0) &&
$LINUX_VERSION_CODE < $(version_code 5.4.0) )); then
- always_except LU-13063 411
+ always_except LU-13063 411a
+fi
+
+# skip cgroup tests for kernels < v4.18.0
+if (( $LINUX_VERSION_CODE < $(version_code 4.18.0) )); then
+ always_except LU-13063 411b
fi
# 5 12 8 12 15 (min)"
cleanup_test411_cgroup() {
trap 0
+ cat $1/memory.stat
rmdir "$1"
}
-test_411() {
+test_411a() {
local cg_basedir=/sys/fs/cgroup/memory
# LU-9966
test -f "$cg_basedir/memory.kmem.limit_in_bytes" ||
return 0
}
-run_test 411 "Slab allocation error with cgroup does not LBUG"
+run_test 411a "Slab allocation error with cgroup does not LBUG"
+
+test_411b() {
+ local cg_basedir=/sys/fs/cgroup/memory
+ # LU-9966
+ [ -e "$cg_basedir/memory.kmem.limit_in_bytes" ] ||
+ skip "no setup for cgroup"
+ $LFS setstripe -c 2 $DIR/$tfile || error "unable to setstripe"
+ # testing suggests we can't reliably avoid OOM with a 64M limit, but it
+ # seems reasonable to ask that we have at least 128M in the cgroup
+ local memlimit_mb=256
+
+ # Create a cgroup and set memory limit
+ # (tfile is used as an easy way to get a recognizable cgroup name)
+ local cgdir=$cg_basedir/$tfile
+ mkdir $cgdir || error "cgroup mkdir '$cgdir' failed"
+ stack_trap "cleanup_test411_cgroup $cgdir" EXIT
+ echo $((memlimit_mb * 1024 * 1024)) > $cgdir/memory.limit_in_bytes
+
+ echo "writing first file"
+ # Write a file 4x the memory limit in size
+ bash -c "echo \$$ > $cgdir/tasks && dd if=/dev/zero of=$DIR/$tfile bs=1M count=$((memlimit_mb * 4))" ||
+ error "(1) failed to write successfully"
+
+ sync
+ cancel_lru_locks osc
+
+ rm -f $DIR/$tfile
+ $LFS setstripe -c 2 $DIR/$tfile || error "unable to setstripe"
+
+ # Try writing at a larger block size
+ # NB: if block size is >= 1/2 cgroup size, we sometimes get OOM killed
+ # so test with 1/4 cgroup size (this seems reasonable to me - we do
+ # need *some* memory to do IO in)
+ echo "writing at larger block size"
+ bash -c "echo \$$ > $cgdir/tasks && dd if=/dev/zero of=$DIR/$tfile bs=64M count=$((memlimit_mb * 4 / 128))" ||
+ error "(3) failed to write successfully"
+
+ sync
+ cancel_lru_locks osc
+ rm -f $DIR/$tfile
+ $LFS setstripe -c 2 $DIR/$tfile.{1..4} || error "unable to setstripe"
+
+ # Try writing multiple files at once
+ echo "writing multiple files"
+ bash -c "echo \$$ > $cgdir/tasks && dd if=/dev/zero of=$DIR/$tfile.1 bs=32M count=$((memlimit_mb * 4 / 64))" &
+ local pid1=$!
+ bash -c "echo \$$ > $cgdir/tasks && dd if=/dev/zero of=$DIR/$tfile.2 bs=32M count=$((memlimit_mb * 4 / 64))" &
+ local pid2=$!
+ bash -c "echo \$$ > $cgdir/tasks && dd if=/dev/zero of=$DIR/$tfile.3 bs=32M count=$((memlimit_mb * 4 / 64))" &
+ local pid3=$!
+ bash -c "echo \$$ > $cgdir/tasks && dd if=/dev/zero of=$DIR/$tfile.4 bs=32M count=$((memlimit_mb * 4 / 64))" &
+ local pid4=$!
+
+ wait $pid1
+ local rc1=$?
+ wait $pid2
+ local rc2=$?
+ wait $pid3
+ local rc3=$?
+ wait $pid4
+ local rc4=$?
+ if (( rc1 != 0)); then
+ error "error writing to file from $pid1"
+ fi
+ if (( rc2 != 0)); then
+ error "error writing to file from $pid2"
+ fi
+ if (( rc3 != 0)); then
+ error "error writing to file from $pid3"
+ fi
+ if (( rc4 != 0)); then
+ error "error writing to file from $pid4"
+ fi
+
+ sync
+ cancel_lru_locks osc
+
+ # These files can be large-ish (~1 GiB total), so delete them rather
+ # than leave for later cleanup
+ rm -f $DIR/$tfile.*
+ return 0
+}
+run_test 411b "confirm Lustre can avoid OOM with reasonable cgroups limits"
test_412() {
(( $MDSCOUNT > 1 )) || skip_env "needs >= 2 MDTs"