git://git.whamcloud.com - tools/e2fsprogs.git/commit

author	Theodore Ts'o <tytso@mit.edu>
	Tue, 31 Jan 2023 04:18:41 +0000 (23:18 -0500)
committer	Theodore Ts'o <tytso@mit.edu>
	Wed, 1 Feb 2023 05:39:18 +0000 (00:39 -0500)
commit	f6cb7f010d137f83a37d1de375a9e73a84c4edf1
tree	635efc17ccfd04555d2e97d0bcfe0173806ca56a	tree \| snapshot
parent	4073feef2e7c440ba83304de122e2bbf9e976543	commit \| diff

libext2fs: unix_io: fix potential error path deadlock in reuse_cache()

This was reported by [1] but the fix was incorrect. The issue is that
when unix_io was made thread-safe, it was necessary that to add a
CACHE_MUTEX to protect multiple threads from potentially colliding
with the very simple writeback cache used by the unix_io I/O manager.
The original I/O manager was purposefully kept simple, used a
fixed-size cache; accordingly, the locking used also kept simple, and
used a single global mutex.

[1] https://lore.kernel.org/r/310fb77f-dfed-1196-c4ee-30d5138ee5a2@huawei.com

The problem was that if an application (such as e2fsck) registers a
write error handler, that handler would be called with the CACHE_MUTEX
still held, and if that application tried to do any I/O --- for
example, closing the file system using ext2fs_close() and then exiting
--- the application would deadlock.

We should perhaps fix this either by deciding that the simple Unix I/O
cache doesn't actually buy much beyond some system call overhead, or
by putting in a full-fledged buffer I/O cache system which uses a much
larger cache with allocated memory, fine-grained locking and Direct
I/O to prevent double cache at the kernel and userspace level.
However, for now, fix the problem by waiting until after we have
released the CACHE_MUTEX before calling the write handler. This is
good enough given how e2fsck's ehandler.c use case, and in practice no
one else really uses the error handler in any case.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>