Whamcloud - gitweb
tools/e2fsprogs.git
Darrick J. Wong [Sun, 10 Aug 2014 22:31:04 +0000 (18:31 -0400)]
libext2fs: don't fail inline data operations if there's no EA

Fix up the rest of the inline data code not to complain if there's no
EA, since it's possible that there's no EA because we're in the
process of creating an inline data file.  Also, don't return an error
code when removing a nonexistent EA, because there's no reason to.

Furthermore, if we write less than 60 bytes of inline data, remove the
EA to avoid wasting space.
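Below is a minimal sketch of how the data splits between the inode and the EA. It assumes the in-inode area holds 60 bytes, as the threshold above implies. The helper name inline_ea_bytes_needed() is hypothetical and not part of libext2fs.

```c
#include <stddef.h>

/* Assumed size of the in-inode (i_block) area used for inline data,
 * matching the 60-byte threshold mentioned above. */
#define INLINE_IBLOCK_SIZE 60

/* Hypothetical helper: how many bytes of a `size`-byte inline file
 * must spill out of the inode into the extended attribute.  A result
 * of 0 means the EA would hold no data and can be removed. */
static size_t inline_ea_bytes_needed(size_t size)
{
    return size > INLINE_IBLOCK_SIZE ? size - INLINE_IBLOCK_SIZE : 0;
}
```

A return value of 0 is the case where keeping the EA only wastes space.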

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sun, 10 Aug 2014 22:27:10 +0000 (18:27 -0400)]
libext2fs: strict inline data overwrite should not return ENOSPC

If we're doing a strict overwrite (same data size) of data in an
inline data file, we should be able to skip the size check.  If the
in-core EA representation is fine but the on-disk EA is slightly
corrupt (this happens when fixing minor errors in an inline dir), the
ext2fs_xattr_inode_max_size() call, which reads the disk EA, can lead
us to think that there's no space when in reality there is no issue
with doing a strict overwrite.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sun, 10 Aug 2014 22:22:54 +0000 (18:22 -0400)]
misc: fix various endianness problems with inline_data

The inline data code fails to perform endianness conversions correctly
or at all in a number of places, so fix this so that big-endian
machines function properly.
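The usual fix for this class of bug is to never treat on-disk fields as host-order integers. Here is a minimal sketch. It assumes a little-endian on-disk format, as ext2/3/4 uses, and the helper names are hypothetical; libext2fs has its own conversion helpers.

```c
#include <stdint.h>

/* Decode a little-endian 32-bit on-disk field into host order,
 * independent of the host's byte order. */
static uint32_t le32_decode(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Encode a host-order value into little-endian on-disk form. */
static void le32_encode(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}
```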

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sun, 10 Aug 2014 22:22:07 +0000 (18:22 -0400)]
libext2fs/e2fsck: don't run off the end of the EA block

When we're (a) reading EAs into a buffer; (b) byte-swapping EA
entries; or (c) checking EA data, be careful not to run off the end of
the memory buffer, because this causes invalid memory accesses and
e2fsck crashes.  This can happen if we encounter a specially crafted
FS image.
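Here is a sketch of a bounds-checked walk over EA entries. The entry layout is a simplified assumption:

- each entry has a 16-byte header whose first byte is the name length;
- the name is padded to a 4-byte boundary;
- the list ends with a 4-byte zero terminator.

The function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define EA_ENTRY_HDR 16   /* assumed fixed header size of one entry */

/* Count EA entries in buf[0..len) without reading past the end.
 * Returns the entry count, or -1 if an entry or the terminator would
 * run off the end of the buffer.  Invariant: off <= len. */
static int count_ea_entries(const unsigned char *buf, size_t len)
{
    static const unsigned char zero[4];
    size_t off = 0;
    int count = 0;

    for (;;) {
        if (len - off < 4)              /* no room for the terminator */
            return -1;
        if (memcmp(buf + off, zero, 4) == 0)
            return count;               /* end-of-list marker */
        if (len - off < EA_ENTRY_HDR)   /* truncated entry header */
            return -1;
        size_t name_len = buf[off];     /* assumed: first byte = name length */
        size_t entry_len = (EA_ENTRY_HDR + name_len + 3) & ~(size_t)3;
        if (entry_len > len - off)      /* name runs off the end */
            return -1;
        off += entry_len;
        count++;
    }
}
```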

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sun, 10 Aug 2014 22:21:16 +0000 (18:21 -0400)]
libext2fs: check EA value offset

Perform a little more sanity checking of EA value offsets so that we
don't crash while trying to load things from the filesystem.
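One way to write such a check without integer overflow is sketched below. The function name is hypothetical. What the offset is measured from depends on whether the EA lives in a separate block or inside the inode, so the sketch takes the region size as a parameter.

```c
#include <stdint.h>

/* Return nonzero if a value of `value_size` bytes starting at
 * `value_offs` lies entirely within a region of `region_size` bytes.
 * Comparing against region_size - value_offs avoids the overflow that
 * value_offs + value_size could hit with corrupt on-disk values. */
static int ea_value_in_bounds(uint32_t value_offs, uint32_t value_size,
                              uint32_t region_size)
{
    if (value_offs > region_size)
        return 0;
    return value_size <= region_size - value_offs;
}
```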

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sun, 10 Aug 2014 22:21:15 +0000 (18:21 -0400)]
e2fsck: convert 'delete files' warning to a proper fix_problem error

In pass 3, convert the "delete files and re-run e2fsck" message to a
proper error code for more consistent error reporting and to make
translation easier.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sun, 10 Aug 2014 22:21:14 +0000 (18:21 -0400)]
e2fuzz: fix build problems on macosx and i386 linux

Fix clang warnings about forgotten header files, dead code, and pwrite
support on OS X.  The unistd.h inclusion also fixes a parameter
truncation bug on i386.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Fri, 8 Aug 2014 21:02:34 +0000 (17:02 -0400)]
Merge branch 'maint' into next

Conflicts:
configure

Theodore Ts'o [Fri, 8 Aug 2014 20:42:05 +0000 (16:42 -0400)]
libext2fs: have UNIX IO manager use pread64/pwrite64

Commit baa3544609da3c ("libext2fs: have UNIX IO manager use
pread/pwrite") breaks file systems larger than 4GB on 32-bit systems
where off_t is 32 bits.  Fix this by using pread64/pwrite64 if
possible; if pread64/pwrite64 is not present, use pread/pwrite only if
the size of off_t is at least as big as ext2_loff_t.
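Here is a sketch of the off_t size guard described above. It differs from the commit in two ways:

- It checks each offset individually rather than comparing the types once.
- Where the commit simply avoids pread/pwrite when off_t is too small, this sketch reports EOVERFLOW and does not fall back to another path.

It assumes ext2_loff_t is a 64-bit signed type; the hypothetical sketch_loff_t stands in for it.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700      /* expose pread() under strict ISO C modes */
#endif
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for ext2_loff_t, assumed to be 64-bit signed. */
typedef long long sketch_loff_t;

/* Read `count` bytes at absolute offset `off` in one system call.
 * pread() takes an off_t, so refuse offsets that a narrower off_t
 * would silently truncate. */
static ssize_t read_at(int fd, void *buf, size_t count, sketch_loff_t off)
{
    if (sizeof(off_t) < sizeof(sketch_loff_t) &&
        (sketch_loff_t)(off_t)off != off) {
        errno = EOVERFLOW;
        return -1;
    }
    return pread(fd, buf, count, (off_t)off);
}
```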

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Aaron Crane [Mon, 4 Aug 2014 01:54:14 +0000 (21:54 -0400)]
debugfs: teach rdump to take multiple source arguments

[ modified to update man page by tytso ]

Signed-off-by: Aaron Crane <arc@aaroncrane.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Aaron Crane [Mon, 4 Aug 2014 01:53:24 +0000 (21:53 -0400)]
debugfs: refactor do_rdump()

No behaviour changes.  This will simplify the next commit.

Signed-off-by: Aaron Crane <arc@aaroncrane.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Aaron Crane [Mon, 4 Aug 2014 01:52:11 +0000 (21:52 -0400)]
debugfs: fix double-close bug in "rdump" and "dump -p"

Previously, both of these usages called dump_file() with a true value as
the "preserve" argument, which caused it to in turn call fix_perms() to
make the permissions on the locally-dumped file match those found on the
ext2 filesystem. fix_perms() then attempted to close(2) the file descriptor
(if any) before returning (though it didn't attempt to report on any errors
found while doing so).

However, in both of these situations, the local file being dumped had been
opened by the caller of dump_file(), which also closes it (and reports on
any errors detected when closing). This meant that both "rdump" and "dump
-p" would then emit a spurious EBADF message when trying to re-close the
local file descriptor.

Deleting the spurious close(2) call in fix_perms() fixes the problem in both
commands.

Signed-off-by: Aaron Crane <arc@aaroncrane.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Aaron Crane [Mon, 4 Aug 2014 01:51:04 +0000 (21:51 -0400)]
debugfs: be more specific in error messages

Signed-off-by: Aaron Crane <arc@aaroncrane.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Sun, 3 Aug 2014 18:00:47 +0000 (14:00 -0400)]
libext2fs: place metadata blocks in the last flex_bg so they are contiguous

Place the allocation bitmaps and inode table blocks so they are
adjacent, even in the last flexbg.

Previously, after running "mke2fs -t ext4 DEV 286720", the layout of
the last few block groups would look like this:

Group 32: (Blocks 262145-270336) [INODE_UNINIT, ITABLE_ZEROED]
  Block bitmap at 262145 (+0), Inode bitmap at 262161 (+16)
  Inode table at 262177-262432 (+32)
Group 33: (Blocks 270337-278528) [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
  Block bitmap at 262146 (bg #32 + 1), Inode bitmap at 262162 (bg #32 + 17)
  Inode table at 262433-262688 (bg #32 + 288)
Group 34: (Blocks 278529-286719) [INODE_UNINIT, ITABLE_ZEROED]
  Block bitmap at 262147 (bg #32 + 2), Inode bitmap at 262163 (bg #32 + 18)
  Inode table at 262689-262944 (bg #32 + 544)

Now, they look like this:

Group 32: (Blocks 262145-270336) [INODE_UNINIT, ITABLE_ZEROED]
  Block bitmap at 262145 (+0), Inode bitmap at 262148 (+3)
  Inode table at 262151-262406 (+6)
Group 33: (Blocks 270337-278528) [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
  Block bitmap at 262146 (bg #32 + 1), Inode bitmap at 262149 (bg #32 + 4)
  Inode table at 262407-262662 (bg #32 + 262)
Group 34: (Blocks 278529-286719) [INODE_UNINIT, ITABLE_ZEROED]
  Block bitmap at 262147 (bg #32 + 2), Inode bitmap at 262150 (bg #32 + 5)
  Inode table at 262663-262918 (bg #32 + 518)
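The offsets in both listings can be reproduced with simple arithmetic. This is a simplified model, not the actual allocator. It assumes:

- a flex_bg size of 16, matching the old +16/+32 offsets;
- 256 inode table blocks per group, matching the ranges shown;
- a last flex_bg that holds only groups 32, 33 and 34.

```c
#include <stdint.h>

/* Metadata block positions for one group in a flex_bg whose metadata
 * area starts at `start`. */
struct meta_pos {
    uint64_t block_bitmap, inode_bitmap, inode_table;
};

/* Old scheme: bitmap slots are sized for a full flex_bg of `slots`
 * groups, even when fewer groups exist, leaving gaps. */
static struct meta_pos layout_old(uint64_t start, unsigned slots,
                                  unsigned idx, unsigned itable_blocks)
{
    struct meta_pos p;

    p.block_bitmap = start + idx;
    p.inode_bitmap = start + slots + idx;
    p.inode_table = start + 2ULL * slots + (uint64_t)idx * itable_blocks;
    return p;
}

/* New scheme: the same arithmetic, but sized for the `ngroups` groups
 * that actually exist in the last flex_bg, so metadata is contiguous. */
static struct meta_pos layout_new(uint64_t start, unsigned ngroups,
                                  unsigned idx, unsigned itable_blocks)
{
    return layout_old(start, ngroups, idx, itable_blocks);
}
```

With start 262145, group 33 (index 1) gets its inode table at 262433 in the old scheme and 262407 in the new one, as in the listings. The only difference is whether the slots are sized for the nominal flex_bg size or for the groups that really exist.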

This reduces the free space fragmentation in a freshly created file
system.  It also allows the following mke2fs command to succeed:

mke2fs -t ext4 -b 4096 -O ^resize_inode -G $((2**20)) DEV 2130483

(Note that while this allows people to run mke2fs with insanely large
flexbg sizes, this is not a recommended practice, as the kernel may
refuse to resize such a file system while mounted, since it currently
tries to allocate an in-memory data structure based on the size of the
flexbg, and so a file system with a very large flexbg size will cause
the memory allocation to fail.  This will hopefully be fixed in a
future kernel release, but if the goal is to force all of the metadata
blocks to be at the beginning of the file system, it's better to use
the packed_meta_blocks configuration parameter in mke2fs.conf.)

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Sun, 3 Aug 2014 16:22:27 +0000 (12:22 -0400)]
Revert "mke2fs: prevent creation of unmountable ext4 with large flex_bg count"

This reverts commit d988201ef9cb6f7b521e544061976ab4270a3f89.

The problem with this commit is that it causes common small file system
configurations to fail.  For example:

    mke2fs -O flex_bg -b 4096 -I 1024 -F /tmp/tt 79106
    mke2fs 1.42.11 (09-Jul-2014)
    /tmp/tt: Invalid argument passed to ext2 library while setting
             up superblock

This check in ext2fs_initialize() was added to prevent the metadata
from being allocated beyond the end of the filesystem, but it is
also causing a wide range of failures for small filesystems.

We'll address this in a different way, by using a smarter algorithm
for deciding the layout of metadata blocks for the last flex block
group.

Reported-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Artemiy Volkov [Sun, 3 Aug 2014 04:14:14 +0000 (00:14 -0400)]
.gitignore: Add misc/e2fuzz file

This patch adds the misc/e2fuzz tool executable file to .gitignore.

Signed-off-by: Artemiy Volkov <artemiyv@acm.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sun, 3 Aug 2014 03:51:31 +0000 (23:51 -0400)]
tests: add regression tests for inodes with bad checksums

Add regression tests to e2fsck to examine how it deals with inode
table blocks which (a) have been zero'd; (b) have been one'd; (c) have
corrupt inodes with obvious problems; and (d) have inodes with
non-obvious problems.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sun, 3 Aug 2014 03:50:52 +0000 (23:50 -0400)]
tests: add regression tests for group descriptors with bad checksums

Add tests to examine how e2fsck deals with (a) the block bitmap being
corrupt; (b) the inode bitmap being corrupt; (c) the bitmap checksums
being incorrect (but the bitmaps are fine); and (d) the group
descriptor checksum itself being incorrect.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sun, 3 Aug 2014 03:50:34 +0000 (23:50 -0400)]
tests: add regression tests for superblocks with bad checksums

Add regression tests to examine how e2fsck deals with random
superblock corruption such as obviously wrong fields and the checksum
itself being incorrect.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sun, 3 Aug 2014 03:49:41 +0000 (23:49 -0400)]
tests: add regression tests for MMP blocks with bad checksums

Add regression tests to examine how e2fsck deals with MMP blocks with
(a) a bad magic number; and (b) an incorrect checksum.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sun, 3 Aug 2014 03:48:58 +0000 (23:48 -0400)]
tests: add tests for directory entry blocks with checksum errors

Add some regression tests to examine how e2fsck handles directory
entry blocks and htree blocks with (a) malformed directory entries;
(b) incorrect checksums; or (c) obviously garbage entries.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Fri, 1 Aug 2014 18:13:14 +0000 (11:13 -0700)]
tests: add tests for handling of corrupt extents

Add some regression tests to examine how e2fsck deals with (a) extent
blocks with only a bad checksum; (b) extent blocks with a bad magic
number; and (c) extent entries with corruption.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sun, 3 Aug 2014 03:18:37 +0000 (23:18 -0400)]
tests: add regression tests for EA blocks with bad checksums

Add regression tests for e2fsck dealing with (a) EA block with a bad
checksum; (b) EA block with a bad magic number; and (c) EA block with
damage that isn't otherwise noticeable.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sun, 3 Aug 2014 02:52:29 +0000 (22:52 -0400)]
e2fsck: always ask to fix an inode that fails checksum verification

If an inode fails checksum verification during pass 1 and the user
doesn't fix or clear the inode as part of the regular inode checks,
ensure that e2fsck remembers to ask the user if he simply wants to
correct the checksum.

We weren't capturing all the ways out of an iteration of the inode
scanning loop, which means that not all errors were caught.  Also,
we might as well clear the 'failed csum' flag if we write the inode
directly from the inode scanning loop.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sun, 3 Aug 2014 02:51:33 +0000 (22:51 -0400)]
e2fsck: disable checksum verification in a few select places

Selectively disable checksum verification in a couple more places:

In check_blocks, disable checksum verification when iterating a block
map because the block map iterator function (re)reads the inode, which
could be unchanged since the scan found that the checksum fails.  We
don't want to abort here; we want to keep evaluating the inode, and we
already know if the inode checksum doesn't match.

Further down in check_blocks when we're trying to see if i_size
matches the amount of data stored in the inode, don't allow checksum
errors when we go looking for the size of inline data.  If the
required attribute can be found at all in the EA block, we'll fix any
other problems with the EA block later.  In the meantime, we don't
want to be truncating files unnecessarily.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sun, 3 Aug 2014 02:49:12 +0000 (22:49 -0400)]
libext2fs: don't cache inodes that fail checksum verification

If an inode fails checksum verification, don't stuff a copy of it in
the inode cache, because this can cause the library to fail to return
the "corrupt inode" error code.

In general, this happens if ext2fs_read_inode_full() is called twice
on an inode with an incorrect checksum.  If fs->flags has
EXT2_FLAG_IGNORE_CSUM_ERRORS set during the first call and *unset*
during the second call, the cache hit during the second call fails to
return EXT2_ET_INODE_CSUM_INVALID as you'd expect.  This happens
during fsck because the first read_inode call happens as part of
check_blocks and the second call happens during inode checksum
revalidation.  A file system with a slightly corrupt non-extent inode
will trigger this.

While we're at it, make the inode read function consistent with the
rest of libext2fs -- copy the metadata object into the caller's buffer
even if it fails checksum verification.  This will help e2fsck avoid a
double re-read later on down the line.
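Here is a toy model of the fixed behaviour. It uses a one-entry cache with hypothetical types and flag names rather than the libext2fs inode cache. The inode is copied to the caller even when the checksum fails, but only verified inodes are cached.

```c
#include <stdbool.h>

#define FLAG_IGNORE_CSUM 0x1u   /* hypothetical "ignore checksum errors" bit */
enum { RD_OK = 0, RD_ERR_CSUM = 1 };

struct fake_inode { unsigned ino; bool csum_ok; int data; };
struct one_entry_cache { bool valid; struct fake_inode ino; };

/* Read `disk` through the cache.  *out is always filled in, even on a
 * checksum failure, but a failing inode is never cached, so a later
 * read without FLAG_IGNORE_CSUM still reports the error. */
static int read_inode(struct one_entry_cache *c, unsigned flags,
                      const struct fake_inode *disk, struct fake_inode *out)
{
    if (c->valid && c->ino.ino == disk->ino) {
        *out = c->ino;          /* only verified inodes live here */
        return RD_OK;
    }
    *out = *disk;               /* copy to the caller regardless */
    if (!disk->csum_ok)
        return (flags & FLAG_IGNORE_CSUM) ? RD_OK : RD_ERR_CSUM;
    c->valid = true;
    c->ino = *disk;
    return RD_OK;
}
```

If the cache were filled unconditionally, the second read of a bad inode without the ignore flag would hit the cache and wrongly return success.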

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sun, 3 Aug 2014 02:48:21 +0000 (22:48 -0400)]
e2fsck: correctly preserve fs flags when modifying ignore-csum-error flag

When we need to modify the "ignore checksum error" behavior flag to
get us past a library call, it's possible that the library call can
result in other flag bits being changed.  Therefore, it is not correct
to restore unconditionally the previous flags value, since this will
have unintended side effects on the other fs->flags; nor is it correct
to assume that we can unconditionally set (or clear) the "ignore csum
error" flag bit.  Therefore, we must merge the previous value of the
"ignore csum error" flag with the value of flags after the call.

Note that we want to leave checksum verification on as much as
possible because doing so exposes e2fsck bugs where two metadata
blocks are "sharing" the same disk block, and attempting to fix one
before relocating the other causes major filesystem damage.  The
damage is much more obvious when a previously checked piece of
metadata suddenly fails in a subsequent pass.
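The flag merge described above is sketched below. The flag values are hypothetical, and so is library_call(), which stands in for a library call that sets an unrelated bit.

```c
/* Hypothetical flag bits standing in for fs->flags values. */
#define FLAG_IGNORE_CSUM_ERRORS 0x01u
#define FLAG_DIRTY              0x02u

/* Hypothetical library call that changes an unrelated flag bit. */
static void library_call(unsigned *flags)
{
    *flags |= FLAG_DIRTY;
}

/* Run library_call() with checksum errors ignored, then put back only
 * the previous state of the ignore bit, keeping every other bit the
 * call may have changed. */
static void call_ignoring_csum(unsigned *flags)
{
    unsigned old = *flags;

    *flags |= FLAG_IGNORE_CSUM_ERRORS;
    library_call(flags);
    *flags = (*flags & ~FLAG_IGNORE_CSUM_ERRORS) |
             (old & FLAG_IGNORE_CSUM_ERRORS);
}
```

Restoring the saved value wholesale (`*flags = old`) would drop FLAG_DIRTY. That is the unintended side effect the commit avoids.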

The modifications to the pass 2, 3, and 3A code are justified as
follows: When e2fsck encounters a block of directory entries and
cannot find the placeholder entry at the end that contains the
checksum, it will try to insert the placeholder.  If that fails, it
will schedule the directory for a pass 3A reconstruction.  Until that
happens, we don't want directory block writing (pass 2), block
iteration (pass 3), or block reading (pass 3A) to fail due to checksum
errors, because failing to find the placeholder is itself a checksum
verification error, which causes e2fsck to abort without fixing
anything.

The e2fsck call to ext2fs_read_bitmaps must never fail due to a
checksum error because e2fsck subsequently (a) verifies the bitmaps
itself; or (b) decides that they don't match what has been observed,
and rewrites them.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sun, 3 Aug 2014 02:46:16 +0000 (22:46 -0400)]
e2fsck: offer to clear inode table blocks that are insane

Add a new behavior flag to the inode scan functions; when specified,
this flag will do some simple sanity checking of entire inode table
blocks.  If all the checksums are ok, we can skip checksum
verification on individual inodes later on.  If more than half of the
inodes look "insane" (bad extent tree root or checksum failure) then
ext2fs_get_next_inode_full() can return a special status code
indicating that what's in the buffer is probably garbage.
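Here is the "more than half" threshold as a sketch. How each inode is judged insane is left to the caller, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true when more than half of the `n` inodes in an inode table
 * block were judged insane (e.g. bad extent tree root or checksum
 * failure), meaning the block is probably garbage. */
static bool itable_block_is_garbage(const bool *insane, size_t n)
{
    size_t bad = 0;

    for (size_t i = 0; i < n; i++)
        if (insane[i])
            bad++;
    return bad * 2 > n;
}
```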

When e2fsck's inode scan encounters the 'inode is garbage' return code
it'll offer to zap the inode straightaway instead of trying to recover
anything.  This replaces the previous behavior of asking to zap
anything with a checksum error (strict_csum).

Signed-off-by: Darrick J. Wong <darrick.wong@orale.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sun, 3 Aug 2014 02:32:12 +0000 (22:32 -0400)]
e2fsck: try to salvage corrupt directory entry blocks

Remove the code that would prompt the user to zap directory entry
blocks with bad checksums (i.e. strict_csums).  Instead, we'll run the
directory entries through the usual repair routines in an attempt to
save whatever we can.  At the same time, refactor the code that
schedules the repair of missing dirblock checksum entries.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sun, 3 Aug 2014 02:32:11 +0000 (22:32 -0400)]
e2fsck: try to salvage extent blocks with bad checksums

Remove the code that would zap an extent block immediately if the
checksum failed (i.e. strict_csums).  Instead, we'll only do that if
the extent block header shows obvious structural problems; if the
header checks out, then we'll iterate the block and see if we can
recover some extents.

Requires a minor modification to ext2fs_extent_get such that the
extent block will be returned in the buffer even if the return code
indicates a checksum error.  This brings its behavior in line with
the rest of libext2fs.
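Here is a simplified sketch of the structural header check that decides between zapping an extent block and iterating it. It assumes the ext4 extent block layout: a 12-byte header with magic 0xF30A, followed by 12-byte entries. It works on a host-order copy of the header. It is not the exact set of checks e2fsck performs.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXT_MAGIC 0xF30A   /* ext4 extent header magic */

/* Host-order copy of the fields checked here. */
struct ext_hdr {
    uint16_t magic, entries, max;
};

/* Return true if the header looks structurally sound, in which case
 * the block is worth iterating even if its checksum failed. */
static bool extent_header_sane(const struct ext_hdr *h, uint32_t blocksize)
{
    if (blocksize <= 12)
        return false;
    uint32_t max_entries = (blocksize - 12) / 12;  /* entries after header */

    if (h->magic != EXT_MAGIC)
        return false;
    if (h->max == 0 || h->max > max_entries)
        return false;
    return h->entries <= h->max;
}
```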

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sun, 3 Aug 2014 02:32:11 +0000 (22:32 -0400)]
libext2fs: check EA block headers when reading in the block

When reading an EA block in from disk, do a quick sanity check of the
block header, and return an error if we think we have garbage.  Teach
e2fsck to ignore the new error code in favor of doing its own
checking, and remove the strict_csums bits while we're at it.

(Also document some assumptions in the new ext_attr code.)

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sun, 3 Aug 2014 02:26:15 +0000 (22:26 -0400)]
dumpe2fs: complain when checksum verification fails

Warn the user to run e2fsck if the superblock or bitmaps fail
checksum verification.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sun, 3 Aug 2014 02:18:30 +0000 (22:18 -0400)]
e2fsck: use root dir for lost+found when really desperate

If we're totally unable to allocate a lost+found directory, ask the
user if he would like to dump orphaned files in the root directory.
Hopefully this enables the user to delete enough files so that a
subsequent run of e2fsck will make more progress.  Better to cram lost
files in the rootdir than the current behavior, which is to fail at
linking them in, thereby leaving them as lost files.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sun, 3 Aug 2014 02:18:29 +0000 (22:18 -0400)]
e2fsck: never free critical metadata blocks in the block found map

Don't allow critical metadata blocks to be marked free in the block
found map.  This can theoretically happen on an FS where a first
inode's ETB/indirect map block is in the inode table, the first inode
is itself unclonable (and thus gets deleted) and there are enough
crosslinked files before and after the first inode to use up all the
free blocks during pass 1b.

(I do actually have a test FS image but it's 256M and it proved very
difficult to craft a bite-sized test case that actually hit this bug.)

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sun, 3 Aug 2014 02:18:29 +0000 (22:18 -0400)]
e2fuzz: fix fs handle cleanup when closing fails

Fix the handling of 'fs' when closing the FS fails so that we don't
dereference a NULL pointer.  Adapt to use ext2fs_close_free while
we're at it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Fixes-Coverity-Bug: 1229241
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Sun, 3 Aug 2014 02:05:03 +0000 (22:05 -0400)]
Merge branch 'maint' into next

Conflicts:
configure
misc/Makefile.in

Artemiy Volkov [Sat, 2 Aug 2014 23:53:04 +0000 (19:53 -0400)]
debugfs: fix argument parsing in do_freefrag()

When do_freefrag() is called from debugfs, the value of optind is
not reset. Rectify that by calling reset_getopt().
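The sketch below shows why the reset matters in a command shell that calls getopt() once per command. It assumes that setting optind to 1 is enough once a previous scan has run to completion. The exact reset needed differs between C libraries, which is what a helper like reset_getopt() hides. The parse_count() function is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose getopt() and optind */
#endif
#include <stdlib.h>
#include <unistd.h>

/* Parse "-v" and "-c N" for one command.  Without the optind reset,
 * a second invocation in the same process would start scanning where
 * the previous command's getopt() loop stopped. */
static int parse_count(int argc, char *argv[], int *verbose)
{
    int c, count = 0;

    optind = 1;          /* assumption: sufficient after a completed scan */
    *verbose = 0;
    while ((c = getopt(argc, argv, "vc:")) != -1) {
        switch (c) {
        case 'v': *verbose = 1; break;
        case 'c': count = atoi(optarg); break;
        default:  return -1;
        }
    }
    return count;
}
```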

Signed-off-by: Artemiy Volkov <artemiyv@acm.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Sat, 2 Aug 2014 23:43:10 +0000 (19:43 -0400)]
misc: fix Makefile for profiled build

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Thu, 1 May 2014 23:15:05 +0000 (16:15 -0700)]
libext2fs: when appending to a file, don't split an index block in equal halves

When we're appending an extent to the end of a file and the index
block is full, don't split the index block into two half-full index
blocks because this leaves us with underutilized index blocks, at
least in the fallocate case.  Instead, copy the last extent from the
full block into the new block.  This isn't perfect utilization, but
there's a lot of work involved in teaching extent.c to be able to goto
a nonexistent node in a newly allocated (and empty) extent block.

This patch does not fix the general problem of keeping the extent tree
balanced.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sat, 2 Aug 2014 23:18:03 +0000 (19:18 -0400)]
libext2fs: have UNIX IO manager use pread/pwrite

If pread/pwrite are present, have the UNIX IO manager use them for
aligned IOs (instead of the current seek -> read/write), thereby
saving us a (minor) amount of system call overhead.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Andreas Dilger [Wed, 30 Jul 2014 20:25:49 +0000 (14:25 -0600)]
filefrag: minor code fixes and cleanups

Print filefrag_fiemap() error message to stderr instead of stdout.

Only call ioctl(EXT3_IOC_GETFLAGS) for ext{2,3,4} filesystems to
decide if the ext2 indirect block allocation heuristic should be used.

Properly handle the force_bmap (-B) option.

Exit with a positive error number instead of a negative one.

Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Andreas Dilger [Tue, 29 Jul 2014 23:30:40 +0000 (17:30 -0600)]
tests: fix f_badcluster output formatting

The f_badcluster output format depends on how libreadline formats
and outputs the commands read from stdin.  Instead of trying to
handle these differences, use an input command file, which does
not depend on external components to be consistent.

Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Andreas Dilger [Tue, 29 Jul 2014 23:30:39 +0000 (17:30 -0600)]
misc: quiet signed/unsigned character compiler warnings

Quiet warnings about signed vs. unsigned character mismatch.
Use __u8 for storing UUIDs instead of char to match the superblock
s_uuid field.

Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Thu, 31 Jul 2014 15:49:48 +0000 (11:49 -0400)]
tune2fs: fix uninitialized variable in remove_journal_device

This bug was introduced by commit 7dfefaf413bbd ("tune2fs: update
journal super block when changing UUID for fs").

Fixes-Coverity-Bug: 1229243

Reported-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Tue, 29 Jul 2014 14:56:34 +0000 (10:56 -0400)]
Merge branch 'next'

Theodore Ts'o [Tue, 29 Jul 2014 14:53:49 +0000 (10:53 -0400)]
Merge branch 'maint' into next

Azat Khuzhin [Mon, 28 Jul 2014 07:43:25 +0000 (11:43 +0400)]
tune2fs: update journal users while updating fs UUID (with external journal)

When a file system with an external journal device has its UUID
updated, we should also update the UUID in that journal's users list.

Before:
$ tune2fs -U clear /tmp/dev
tune2fs 1.42.10 (18-May-2014)
$ dumpe2fs /tmp/dev | fgrep UUID
dumpe2fs 1.42.10 (18-May-2014)
Filesystem UUID:          <none>
Journal UUID:             da1f2ed0-60f6-aaaa-92fd-738701418523
$ dumpe2fs /tmp/journal | fgrep users -A10
dumpe2fs 1.42.10 (18-May-2014)
Journal number of users:  2
Journal users:            0707762d-638e-4bc6-944e-ae8ee7a3359e
                          0ad849df-1041-4f0a-b1c1-2f949d6a1e37

After:
$ sudo tune2fs -U clear /tmp/dev
tune2fs 1.43-WIP (18-May-2014)
$ dumpe2fs /tmp/dev | fgrep UUID
dumpe2fs 1.42.10 (18-May-2014)
Filesystem UUID:          <none>
Journal UUID:             da1f2ed0-60f6-aaaa-92fd-738701418523
$ dumpe2fs /tmp/journal | fgrep users -A10
dumpe2fs 1.42.10 (18-May-2014)
Journal number of users:  2
Journal users:            0707762d-638e-4bc6-944e-ae8ee7a3359e
                          00000000-0000-0000-0000-000000000000

Also add some consts to avoid *magic numbers*:
- UUID_STR_SIZE
- UUID_SIZE
- JFS_USERS_MAX
- JFS_USERS_SIZE
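Here is a sketch of the users-list update. It assumes the journal superblock's users list is 48 slots of 16-byte UUIDs, which makes JFS_USERS_SIZE 768 bytes. The function is hypothetical and is not the tune2fs code.

```c
#include <string.h>

/* Assumed values: 48 user slots of 16-byte UUIDs. */
#define UUID_SIZE      16
#define JFS_USERS_MAX  48
#define JFS_USERS_SIZE (UUID_SIZE * JFS_USERS_MAX)

/* Replace old_uuid with new_uuid in a journal's users list.  Returns
 * the slot index that was updated, or -1 if old_uuid is not listed.
 * An all-zero new_uuid corresponds to "tune2fs -U clear". */
static int update_journal_user(unsigned char users[JFS_USERS_SIZE],
                               int nr_users,
                               const unsigned char old_uuid[UUID_SIZE],
                               const unsigned char new_uuid[UUID_SIZE])
{
    if (nr_users > JFS_USERS_MAX)
        nr_users = JFS_USERS_MAX;       /* never index past the array */
    for (int i = 0; i < nr_users; i++) {
        unsigned char *slot = users + i * UUID_SIZE;
        if (memcmp(slot, old_uuid, UUID_SIZE) == 0) {
            memcpy(slot, new_uuid, UUID_SIZE);
            return i;
        }
    }
    return -1;
}
```

Calling this with an all-zero new UUID for the second user gives the "After" result above, where that entry becomes 00000000-0000-0000-0000-000000000000.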

Proposed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Azat Khuzhin <a3at.mail@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Azat Khuzhin [Mon, 28 Jul 2014 07:43:24 +0000 (11:43 +0400)]
tune2fs: update journal super block when changing UUID for fs.

The -U option changes the file system UUID, but this does not work for
a journal device, since a journal device keeps a copy of this UUID inside
the jsb (i.e. journal super block).  So when the UUID changes, copy it
into that block too.

Here is the initial thread:
http://comments.gmane.org/gmane.comp.file-systems.ext4/44532

You can reproduce this by executing the following commands:
$ fallocate -l100M /tmp/dev
$ fallocate -l100M /tmp/journal
$ sudo /sbin/losetup /dev/loop1 /tmp/dev
$ sudo /sbin/losetup /dev/loop0 /tmp/journal
$ mke2fs -O journal_dev /tmp/journal
$ tune2fs -U da1f2ed0-60f6-aaaa-92fd-738701418523 /tmp/journal
$ sudo mke2fs -t ext4 -J device=/dev/loop0 /dev/loop1
$ dumpe2fs -h /tmp/dev | fgrep UUID
dumpe2fs 1.43-WIP (18-May-2014)
Filesystem UUID:          8a776be9-12eb-411f-8e88-b873575ecfb6
Journal UUID:             e3d02151-e776-4865-af25-aecb7291e8e5
$ sudo e2fsck /dev/vdc
e2fsck 1.43-WIP (18-May-2014)
External journal does not support this filesystem

/dev/loop1: ********** WARNING: Filesystem still has errors **********

Reported-by: Chin Tzung Cheng <chintzung@gmail.com>
Signed-off-by: Azat Khuzhin <a3at.mail@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Azat Khuzhin [Mon, 28 Jul 2014 07:43:23 +0000 (11:43 +0400)]
tune2fs: remove_journal_device(): use the correct block to find jsb

Signed-off-by: Azat Khuzhin <a3at.mail@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Azat Khuzhin [Mon, 28 Jul 2014 07:43:22 +0000 (11:43 +0400)]
journal: use consts instead of 1024 and add helper for journal with 1k blocksize

Use EXT2_MIN_BLOCK_SIZE, JFS_MIN_JOURNAL_BLOCKS, SUPERBLOCK_SIZE, and
SUPERBLOCK_OFFSET instead of a hardcoded 1024 where appropriate, and
add a helper, ext2fs_journal_sb_start(), that returns the start of the
journal superblock, with a special case for file systems with a 1k block size.
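As a sketch under stated assumptions: the ext2 superblock occupies bytes 1024..2047 of the device. The journal superblock then starts in the first block after that range. With 1k blocks this is block 2, and with larger blocks it is block 1. `journal_sb_block()` is a hypothetical name. It derives the answer from the two constants instead of special-casing 1k, and is not the library helper itself.

```c
#define SUPERBLOCK_OFFSET 1024	/* byte offset of the ext2 superblock */
#define SUPERBLOCK_SIZE   1024	/* size of the ext2 superblock in bytes */

/* Hypothetical: first block number past the end of the ext2 superblock,
 * i.e. where a journal superblock on a journal device would start. */
static unsigned long journal_sb_block(unsigned long blocksize)
{
	unsigned long sb_end = SUPERBLOCK_OFFSET + SUPERBLOCK_SIZE;

	/* Round up: 2048 bytes span blocks 0-1 at 1k, but fit in block 0 at >= 2k. */
	return (sb_end + blocksize - 1) / blocksize;
}
```

For a 1k block size this yields 2. For 2k, 4k, and 64k block sizes it yields 1, which matches the special case the commit describes.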

Signed-off-by: Azat Khuzhin <a3at.mail@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoMerge branch 'maint' into next
Theodore Ts'o [Mon, 28 Jul 2014 19:39:24 +0000 (15:39 -0400)]
Merge branch 'maint' into next

10 years agotests: add the f_badcluster test
Darrick J. Wong [Mon, 28 Jul 2014 19:37:03 +0000 (15:37 -0400)]
tests: add the f_badcluster test

This should have been part of commit 9a1d614df21 ("e2fsck: fix
rule-violating lblk->pblk mappings on bigalloc filesystems") but it
accidentally got dropped when the patch was applied.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agomisc: copy extended attributes in populate_fs
Ross Burton [Thu, 10 Jul 2014 16:44:38 +0000 (17:44 +0100)]
misc: copy extended attributes in populate_fs

When creating a file system using a source directory, also copy any extended
attributes that have been set.

[ Add configure tests for Linux-specific xattr syscalls and add fallback
  when compiling on non-Linux systems. --tytso ]
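On Linux, llistxattr(2) returns all attribute names of a file packed back to back, each NUL-terminated. A copy loop therefore starts by walking that buffer. The sketch below shows only that walk. The function and callback type are hypothetical names, and fetching each value (e.g. with lgetxattr(2)) and storing it on the new inode is left out.

```c
#include <stddef.h>
#include <string.h>

typedef void (*xattr_name_fn)(const char *name, void *arg);

/* Walk a buffer of NUL-terminated names as returned by llistxattr(2),
 * calling fn (if non-NULL) once per name.  Returns the number of names. */
static size_t for_each_xattr_name(const char *buf, size_t len,
				  xattr_name_fn fn, void *arg)
{
	const char *p = buf, *end = buf + len;
	size_t count = 0;

	while (p < end) {
		const char *nul = memchr(p, '\0', (size_t)(end - p));
		if (nul == NULL)
			break;		/* unterminated trailing name: ignore it */
		if (fn != NULL)
			fn(p, arg);
		count++;
		p = nul + 1;
	}
	return count;
}
```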

Signed-off-by: Ross Burton <ross.burton@intel.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agofilefrag: fix block size value
Rakesh Pandit [Mon, 28 Jul 2014 00:04:48 +0000 (20:04 -0400)]
filefrag: fix block size value

ioctl(FIGETBSZ) was previously used to get the block size, but 2508eaa7
(filefrag: improvements to filefrag FIEMAP handling) switched to the
fstatfs f_bsize field, which doesn't work well for many file systems.

The value returned by fstatfs isn't the block size but the "optimal
transfer block size", as per the man page.  Even stat's st_blksize is the
"preferred I/O block size", and in many file systems it may even vary
from file to file (POSIX).  This patch changes filefrag to use
FIGETBSZ in preference to f_bsize.

[ Modified by tytso to add the fallback to f_bsize if FIGETBSZ fails
  for some reason ]
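The lookup order described above can be sketched as follows. FIGETBSZ comes first, and fstatfs's f_bsize is the fallback. This is a Linux-only sketch, and `fs_block_size()` and `fs_block_size_path()` are hypothetical names, not filefrag's functions.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/vfs.h>
#include <unistd.h>
#include <linux/fs.h>	/* FIGETBSZ */

/* Block size of the file system holding fd, or -1 if both methods fail. */
static long fs_block_size(int fd)
{
	int bs = 0;
	struct statfs sfs;

	/* Preferred: FIGETBSZ reports the file system's block size. */
	if (ioctl(fd, FIGETBSZ, &bs) == 0 && bs > 0)
		return bs;

	/* Fallback: f_bsize is the "optimal transfer block size", which may
	 * differ from the allocation block size on some file systems. */
	if (fstatfs(fd, &sfs) == 0 && sfs.f_bsize > 0)
		return (long)sfs.f_bsize;

	return -1;
}

/* Convenience wrapper: open path read-only and report its block size. */
static long fs_block_size_path(const char *path)
{
	int fd = open(path, O_RDONLY);
	long bs;

	if (fd < 0)
		return -1;
	bs = fs_block_size(fd);
	close(fd);
	return bs;
}
```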

Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agofilefrag: fix -B option and extents calculation for FIBMAP
Rakesh Pandit [Sun, 27 Jul 2014 23:56:27 +0000 (19:56 -0400)]
filefrag: fix -B option and extents calculation for FIBMAP

29758d2 broke the -B option, which is useful for filesystems that don't
support FIEMAP.  Also, fix the extents calculation for -B, which has been
broken since 2508eaa7.
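With FIBMAP, each logical block is mapped one at a time, so extents have to be reconstructed from the resulting list of physical blocks. The sketch below shows one way to count them: a new extent starts at any mapped block that does not directly follow the previous one. It assumes 0 marks a hole, and `count_extents()` is hypothetical rather than filefrag's actual logic.

```c
#include <stddef.h>
#include <stdint.h>

/* phys[i] is the physical block for logical block i (0 = hole).
 * An extent is a maximal run of mapped blocks whose physical numbers
 * increase by exactly one. */
static size_t count_extents(const uint64_t *phys, size_t n)
{
	size_t extents = 0;

	for (size_t i = 0; i < n; i++) {
		if (phys[i] == 0)
			continue;	/* holes never belong to an extent */
		/* Start a new extent unless this block continues the last one. */
		if (i == 0 || phys[i - 1] == 0 || phys[i] != phys[i - 1] + 1)
			extents++;
	}
	return extents;
}
```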

Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoe2fsck: don't offer to fix the checksum of fixed extents
Darrick J. Wong [Sun, 27 Jul 2014 23:51:37 +0000 (19:51 -0400)]
e2fsck: don't offer to fix the checksum of fixed extents

If an extent fails checksum and the sanity checks, and the user elects
to fix the extents, don't bother asking (the second time) if the user
would like to fix the checksum.  Refactor some redundant code to make
what's going on a little cleaner.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoe2fsck: make insert_dirent_tail more robust
Darrick J. Wong [Sun, 27 Jul 2014 23:46:15 +0000 (19:46 -0400)]
e2fsck: make insert_dirent_tail more robust

Fix the routine that adds dirent checksum structures to the directory
block to handle oddball situations a bit more robustly.

First, when we're walking the entry array, we might encounter an
entry that ends exactly one byte before where the checksum entry needs
to start, i.e. there's space for the tail entry, but it needs to be
reinitialized.  When that happens, we should proceed until d points to
that space so that the tail entry can be initialized.

Second, it's possible that we've been fed a directory block where the
entries end just short of the end of the block.  In this case, we need
to adjust the size of the last entry to point exactly to where the
dirent tail starts.  The current code requires that entries end
exactly on the block boundary, but this is not always the case with
damaged filesystems.
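The core adjustment can be sketched on a raw directory block. Walk the rec_len chain, stretch or trim the last entry so that it ends exactly where the 12-byte tail begins, then write a fresh tail. This is a simplified sketch, not e2fsck's insert_dirent_tail. It assumes little-endian on-disk fields, a block size below 65536, and a tail marked by file_type 0xDE, and it leaves the checksum field zero for the caller to fill.

```c
#include <stdint.h>
#include <string.h>

#define DIRENT_HDR_LEN   8	/* inode(4) + rec_len(2) + name_len(1) + file_type(1) */
#define DIRENT_TAIL_LEN 12	/* size of the checksum tail entry */
#define DIRENT_TAIL_FT  0xDE	/* file_type byte marking a tail entry */

static unsigned get_le16(const uint8_t *p) { return p[0] | (unsigned)p[1] << 8; }
static void put_le16(uint8_t *p, unsigned v) { p[0] = v & 0xff; p[1] = (v >> 8) & 0xff; }

/* Smallest 4-byte-aligned rec_len for a name of name_len bytes. */
static unsigned dirent_min_len(unsigned name_len)
{
	return (DIRENT_HDR_LEN + name_len + 3) & ~3u;
}

/* Make the last entry end exactly at the tail offset and write a tail.
 * An entry starting exactly at the tail offset is treated as reusable
 * space and overwritten.  Returns 0, or -1 if the block is corrupt or
 * the last entry's name would overlap the tail. */
static int insert_tail(uint8_t *blk, unsigned blocksize)
{
	unsigned tail = blocksize - DIRENT_TAIL_LEN;
	unsigned off = 0, last = 0;

	while (off < tail) {
		unsigned rec_len = get_le16(blk + off + 4);
		if (rec_len < DIRENT_HDR_LEN || off + rec_len > blocksize)
			return -1;
		last = off;
		off += rec_len;
	}
	if (last + dirent_min_len(blk[last + 6]) > tail)
		return -1;
	put_le16(blk + last + 4, tail - last);	/* end exactly at the tail */

	memset(blk + tail, 0, DIRENT_TAIL_LEN);
	put_le16(blk + tail + 4, DIRENT_TAIL_LEN);
	blk[tail + 7] = DIRENT_TAIL_FT;
	return 0;
}
```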

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoe2fsck: leave room for checksum structure when salvaging a directory
Darrick J. Wong [Sun, 27 Jul 2014 23:45:04 +0000 (19:45 -0400)]
e2fsck: leave room for checksum structure when salvaging a directory

When we're salvaging a directory, leave room at the end of the block
for the checksum entry so that e2fsck can write the checksummed dir
block out later.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoe2fsck: clear badblocks inode when checksum fails
Darrick J. Wong [Sat, 26 Jul 2014 00:35:12 +0000 (17:35 -0700)]
e2fsck: clear badblocks inode when checksum fails

If the badblocks inode fails checksum verification, just clear the
inode and move on.  If we don't do this, we can end up importing a lot
of garbage into the badblocks list, which will then cause fsck to try
to regenerate anything that was sitting atop the supposedly damaged
blocks.  Given that most hardware will remap bad sectors transparently
from ext4, the number of people this could affect adversely is pretty
low.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoe2fsck: write dir blocks after new inode when reconstructing root/lost+found
Darrick J. Wong [Sat, 26 Jul 2014 21:14:40 +0000 (17:14 -0400)]
e2fsck: write dir blocks after new inode when reconstructing root/lost+found

If we trash the root directory block, e2fsck will find inode 11 (the
old lost+found) and try to attach it to l+f.  The lost+found checker
also fails to find l+f and tries to add one to the root dir.  The root
dir is not found but is recreated with incorrect checksums, so linking
in the l+f dir fails and the l+f '..' entry isn't set.  Since both
dirs now fail checksum verification, they're both referred to rehash
to have that fixed, but because l+f doesn't have a '..' entry, rehash
crashes because l+f has < 2 entries.

On a checksumming filesystem, the routines in e2fsck that recreate
/lost+found and / must write the new directory block *after* the inode
has been written to disk because the checksum depends on i_generation.
Add a regression test while we're at it.
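To see why the inode must be written first, consider the checksum's inputs. As we understand ext4's metadata_csum scheme, the directory block checksum is seeded with the inode number and i_generation before the block contents are hashed. The exact seeding order below is an assumption, and `dirblock_csum()` is a hypothetical helper. What it shows is that the same block bytes produce a different checksum under a different generation, so a block written before the final i_generation is known ends up carrying a stale checksum.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78). */
static uint32_t crc32c(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	}
	return crc;
}

static void put_le32(uint8_t out[4], uint32_t v)
{
	out[0] = v & 0xff;
	out[1] = (v >> 8) & 0xff;
	out[2] = (v >> 16) & 0xff;
	out[3] = (v >> 24) & 0xff;
}

/* Hypothetical: fold inode number and generation into the seed, then
 * hash the directory block. */
static uint32_t dirblock_csum(uint32_t fs_seed, uint32_t ino, uint32_t gen,
			      const void *blk, size_t len)
{
	uint8_t b[4];
	uint32_t crc;

	put_le32(b, ino);
	crc = crc32c(fs_seed, b, 4);
	put_le32(b, gen);
	crc = crc32c(crc, b, 4);
	return crc32c(crc, blk, len);
}
```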

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoe2fsck: insert a missing dirent tail for checksums if possible
Darrick J. Wong [Sat, 26 Jul 2014 21:13:31 +0000 (17:13 -0400)]
e2fsck: insert a missing dirent tail for checksums if possible

If e2fsck is writing a block of directory entries to disk, it should
adjust the dirents to add the dirent tail if one is missing.  It's not
a big deal if there's no space to do this since rehash (pass 3A) will
reconstruct directories for us.  However, we may as well avoid
unnecessary work.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoe2fsck: fix the various checksum error messages
Darrick J. Wong [Sat, 26 Jul 2014 00:34:28 +0000 (17:34 -0700)]
e2fsck: fix the various checksum error messages

Make the "EA block passes checks but fails checksum" message less
strange, and make the other checksum error messages actually print a
period at the end of the sentence.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoMerge branch 'maint' into next
Theodore Ts'o [Sat, 26 Jul 2014 20:53:37 +0000 (16:53 -0400)]
Merge branch 'maint' into next

Conflicts:
e2fsck/pass1b.c

10 years agoe2fsck: during pass1b delete_file, only free a cluster once
Darrick J. Wong [Sat, 26 Jul 2014 20:28:58 +0000 (16:28 -0400)]
e2fsck: during pass1b delete_file, only free a cluster once

If we're forced to delete a crosslinked file, only call
ext2fs_block_alloc_stats2() on cluster boundaries, since the block
bitmaps are all cluster bitmaps at this point.  It's safe to do this
only once per cluster since we know all the blocks are going away.
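The "free once per cluster" rule can be sketched as follows. Walk the file's blocks in physical order and issue a free only when the cluster number changes. `free_clusters_once()` is a hypothetical helper that counts the frees it would issue. It assumes the block list is sorted (or at least grouped by cluster) and that there are 2^cluster_bits blocks per cluster.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns how many cluster frees are issued when freeing only on the
 * first block seen in each cluster. */
static size_t free_clusters_once(const uint64_t *blocks, size_t n,
				 unsigned cluster_bits)
{
	size_t frees = 0;
	uint64_t prev_cluster = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t c = blocks[i] >> cluster_bits;	/* block -> cluster */
		if (i == 0 || c != prev_cluster) {
			frees++;	/* real code would free cluster c here */
			prev_cluster = c;
		}
	}
	return frees;
}
```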

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoe2fsck: fix rule-violating lblk->pblk mappings on bigalloc filesystems
Darrick J. Wong [Sat, 26 Jul 2014 00:34:04 +0000 (17:34 -0700)]
e2fsck: fix rule-violating lblk->pblk mappings on bigalloc filesystems

As far as I can tell, logical block mappings on a bigalloc filesystem are
supposed to follow a few constraints:

 * The logical cluster offset must match the physical cluster offset.
 * A logical cluster may not map to multiple physical clusters.

Since the multiply-claimed block recovery code can be used to fix these
problems, teach e2fsck to find these transgressions and fix them.
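Both constraints can be checked over a list of mappings sorted by logical block, as in the sketch below. `struct mapping` and `first_bad_mapping()` are hypothetical, and `ratio` (blocks per cluster) is assumed to be a power of two. Because all mappings for one logical cluster are adjacent after sorting, comparing each mapping with its predecessor is enough for the second rule.

```c
#include <stddef.h>
#include <stdint.h>

struct mapping { uint64_t lblk, pblk; };	/* hypothetical */

/* Index of the first mapping that breaks a rule, or n if none does. */
static size_t first_bad_mapping(const struct mapping *m, size_t n,
				uint64_t ratio)
{
	uint64_t mask = ratio - 1;

	for (size_t i = 0; i < n; i++) {
		/* Rule 1: same offset within logical and physical cluster. */
		if ((m[i].lblk & mask) != (m[i].pblk & mask))
			return i;
		/* Rule 2: one logical cluster maps to one physical cluster. */
		if (i > 0 && m[i].lblk / ratio == m[i - 1].lblk / ratio &&
		    m[i].pblk / ratio != m[i - 1].pblk / ratio)
			return i;
	}
	return n;
}
```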

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoe2fsck: perform implied cluster allocations when filling a directory hole
Darrick J. Wong [Sat, 26 Jul 2014 00:33:57 +0000 (17:33 -0700)]
e2fsck: perform implied cluster allocations when filling a directory hole

If we're filling a directory hole, we need to perform an implied
cluster allocation to satisfy the bigalloc rule of mapping only one
pblk to a logical cluster.
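An "implied" allocation means that, once any block in a logical cluster is mapped, the physical block for every other block in that cluster is already determined. The sketch below computes it. `implied_pblk()` is hypothetical, `ratio` is assumed to be a power of two, and returning 0 for "different logical cluster, no implied mapping" is a convention chosen only for this sketch.

```c
#include <stdint.h>

static uint64_t implied_pblk(uint64_t known_lblk, uint64_t known_pblk,
			     uint64_t lblk, uint64_t ratio)
{
	uint64_t mask = ratio - 1;

	if ((known_lblk & ~mask) != (lblk & ~mask))
		return 0;	/* not in the same logical cluster */
	/* Keep the physical cluster, take the offset from lblk. */
	return (known_pblk & ~mask) | (lblk & mask);
}
```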

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoe2fsck: fix merge error in "clear uninit flag on directory extents"
Darrick J. Wong [Sat, 26 Jul 2014 20:03:10 +0000 (16:03 -0400)]
e2fsck: fix merge error in "clear uninit flag on directory extents"

In the original patch (against -next), the hunk to fix uninit dirs was
just prior to the hunk labelled "Corrupt but passes checks?".  The
hunks are ordered this way so that if e2fsck obtains permission to fix
a failed-csum extent (which in turn fixes the checksum), it will not
subsequently ask to (re)fix the checksum.

Due to a merge error the hunk moved to the wrong place, so put it
back.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoMerge branch 'maint' into next
Theodore Ts'o [Sat, 26 Jul 2014 19:57:42 +0000 (15:57 -0400)]
Merge branch 'maint' into next

Conflicts:
debugfs/debugfs.c
e2fsck/pass1.c

10 years agoe2fsck: reserve blocks for root/lost+found directory repair
Darrick J. Wong [Sat, 26 Jul 2014 00:33:45 +0000 (17:33 -0700)]
e2fsck: reserve blocks for root/lost+found directory repair

If we think we're going to need to repair either the root directory or
the lost+found directory, reserve a block at the end of pass 1 to
reduce the likelihood of an e2fsck abort while reconstructing
root/lost+found during pass 3.

If / and/or /lost+found are corrupt and duplicate processing in pass
1b allocates all the free blocks in the FS, fsck aborts with an
unusable FS since pass 3 can't recreate / or /lost+found.  If either
of those directories is missing, an admin can't easily mount the FS
and access the directory tree to move files off the injured FS and
free up space; this in turn prevents subsequent runs of e2fsck from
being able to continue repairs of the FS.

(One could migrate files manually with debugfs without the help of
path names, but it seems easier if users can simply mount the FS and
use regular FS management tools.)

[ Fixed up an obvious C trap: const char * and const char [] are not
  the same thing when you are taking the size of the parameter.
  People, run your regression tests!  Like spinach, it's good for you.  :-)
  -- tytso ]
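The trap tytso mentions is easy to demonstrate. For an array, sizeof counts every character plus the terminating NUL. For a pointer, sizeof gives only the pointer's own size. The names below are illustrative only and are not taken from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Array: sizeof is 11 ("lost+found" is 10 chars, plus the NUL). */
static const char lpf_array[] = "lost+found";
/* Pointer: sizeof is the size of a pointer, regardless of the string. */
static const char *const lpf_pointer = "lost+found";

/* strlen() gives the name length for either form. */
static size_t lpf_name_len(void)
{
	return strlen(lpf_pointer);
}
```

If code written for the array form (`sizeof(name) - 1`) is later handed a pointer, it silently computes the wrong length.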

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agolibext2fs: provide a function to set inode size
Darrick J. Wong [Sat, 26 Jul 2014 18:34:56 +0000 (14:34 -0400)]
libext2fs: provide a function to set inode size

Provide an API to set i_size in an inode and take care of all required
feature flag modifications.  Refactor the code to use this new
function.

[ Moved the function to lib/ext2fs/blk_num.c, which is where the rest of
  these sorts of functions live, and renamed it to be
  ext2fs_inode_size_set() instead of ext2fs_inode_set_size() to be
  consistent with the other functions in blk_num.c -- tytso ]
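Here is a sketch of the kind of bookkeeping such a setter centralizes: splitting the 64-bit size into i_size and i_size_high, and turning on large_file when needed. The structs are hypothetical trims, and so are the rules encoded here, which reflect our understanding rather than the library's source. We assume regular files of 2 GiB or more need large_file, and non-regular files may not exceed 4 GiB.

```c
#include <stdint.h>

struct mini_inode { uint16_t i_mode; uint32_t i_size; uint32_t i_size_high; };
struct mini_super { int large_file; int dirty; };

#define MODE_TYPE_MASK 0170000
#define MODE_REG       0100000

/* Returns 0 on success, -1 if a non-regular file would exceed 4 GiB. */
static int inode_size_set(struct mini_super *sb, struct mini_inode *inode,
			  uint64_t size)
{
	int is_reg = (inode->i_mode & MODE_TYPE_MASK) == MODE_REG;

	if (!is_reg && (size >> 32))
		return -1;
	if (is_reg && size >= 0x80000000ULL && !sb->large_file) {
		sb->large_file = 1;	/* feature flag must now be set */
		sb->dirty = 1;		/* superblock needs writing */
	}
	inode->i_size = (uint32_t)(size & 0xffffffffu);
	inode->i_size_high = (uint32_t)(size >> 32);
	return 0;
}
```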

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoMerge branch 'maint' into next
Theodore Ts'o [Sat, 26 Jul 2014 13:45:19 +0000 (09:45 -0400)]
Merge branch 'maint' into next

Conflicts:
debugfs/debugfs.c
e2fsck/pass5.c

10 years agolibext2fs: fix free block accounting for 64-bit file systems
Theodore Ts'o [Sat, 26 Jul 2014 13:25:40 +0000 (09:25 -0400)]
libext2fs: fix free block accounting for 64-bit file systems

We rely on a nasty hack to adjust the free block count: we pass a
signed value into ext2fs_free_blocks_count_add(), which takes a
64-bit unsigned value, and rely on overflow and C's signed->unsigned
semantics to do the subtraction.  This works, so long as a 64-bit
signed value is used.

Unfortunately, in ext2fs_block_alloc_stats2() and
ext2fs_block_alloc_stats_range() this is not true, so on a 64-bit
file system the free blocks accounting can get screwed up.

A simple way to demonstrate the problem is:

mke2fs -F -t ext4 -O 64bit /tmp/foo.img 1M
e2fsck -fy /tmp/foo.img

... which will result in the following e2fsck complaint:

Pass 5: Checking group summary information
Free blocks count wrong (4294968278, counted=982).
Fix? yes
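This class of bug can be reproduced in isolation, assuming a 32-bit `int` and a 32-bit unsigned cluster ratio. If the negation and multiplication happen in 32-bit unsigned arithmetic, "-1" becomes 4294967295. That value is then zero-extended, not sign-extended, when added to the 64-bit count. Both functions below are hypothetical illustrations, not the library code. Note that 983 + (2^32 - 1) = 4294968278, the same off-by-2^32 shape as the e2fsck complaint above.

```c
#include <stdint.h>

/* Buggy: -inuse * ratio is computed as a 32-bit unsigned value. */
static uint64_t add_free_buggy(uint64_t free_count, int inuse, uint32_t ratio)
{
	return free_count + (-inuse * ratio);
}

/* Fixed: multiply in signed 64-bit first, so -1 stays -1 and the
 * conversion to uint64_t wraps modulo 2^64 as intended. */
static uint64_t add_free_fixed(uint64_t free_count, int inuse, uint32_t ratio)
{
	return free_count + (uint64_t)((int64_t)-inuse * ratio);
}
```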

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoFix 32/64-bit overflow when multiplying by blocks/clusters per group
Theodore Ts'o [Sat, 26 Jul 2014 11:40:36 +0000 (07:40 -0400)]
Fix 32/64-bit overflow when multiplying by blocks/clusters per group

There are a number of places where we need to convert groups to blocks or
clusters by multiplying the groups by blocks/clusters per group.
Unfortunately, both quantities are 32-bit, but the result needs to be
64-bit, and very often the cast to 64-bit gets lost.

Fix this by adding new macros, EXT2_GROUPS_TO_BLOCKS() and
EXT2_GROUPS_TO_CLUSTERS().
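The lost cast is easy to see with plain integers. The sketch assumes a 32-bit `unsigned int`. `groups_to_blocks_buggy()` and `GROUPS_TO_BLOCKS()` are hypothetical illustrations of the pattern, not the definitions of the new macros.

```c
#include <stdint.h>

/* Buggy: the product is computed in 32 bits and wraps before widening. */
static uint64_t groups_to_blocks_buggy(uint32_t groups, uint32_t blocks_per_group)
{
	return groups * blocks_per_group;
}

/* Widen one operand first so the multiply happens in 64 bits. */
#define GROUPS_TO_BLOCKS(bpg, g) ((uint64_t)(bpg) * (g))
```

With 131072 groups of 32768 blocks (2^32 blocks), the buggy form returns 0, which is the kind of value that can send a size calculation into an endless loop.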

This should fix a bug where resizing a 64bit file system can result in
calculate_minimum_resize_size() looping forever.

Addresses-Launchpad-Bug: #1321958

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agolibext2fs: use C99 initializers for the io_manager structure
Theodore Ts'o [Sat, 26 Jul 2014 04:49:14 +0000 (00:49 -0400)]
libext2fs: use C99 initializers for the io_manager structure

Using C99 initializers makes the code a bit more readable, and it
avoids some gcc -Wall warnings regarding missing initializers.
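For readers unfamiliar with the style, here is an illustration on a hypothetical, trimmed-down operations table; it is not the real io_manager structure. Designated initializers name each field, and any member left out is zero-initialized, so an omitted function pointer is NULL.

```c
#include <stddef.h>

struct io_ops {			/* hypothetical */
	int magic;
	const char *name;
	int (*open_fn)(const char *path);
	int (*close_fn)(int handle);
	int (*flush_fn)(int handle);
};

static int demo_open(const char *path) { return path ? 3 : -1; }
static int demo_close(int handle) { return handle >= 0 ? 0 : -1; }

static const struct io_ops demo_io_manager = {
	.magic    = 0x1234,
	.name     = "Demo I/O Manager",
	.open_fn  = demo_open,
	.close_fn = demo_close,
	/* .flush_fn omitted: implicitly NULL */
};
```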

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoresize2fs: radically reduce memory utilization by using rbtree bitmaps
Theodore Ts'o [Fri, 25 Jul 2014 17:38:50 +0000 (13:38 -0400)]
resize2fs: radically reduce memory utilization by using rbtree bitmaps

When resizing an empty 21T file system to 28T, resize2fs was using
this much CPU time and memory:

216.98user 19.77system 4:02.92elapsed 97%CPU (0avgtext+0avgdata 4485664maxresident)k
8inputs+1068680outputs (0major+800745minor)pagefaults 0swaps

After this one-line change:

222.29user 0.49system 3:48.79elapsed 97%CPU (0avgtext+0avgdata 30080maxresident)k
8inputs+1068552outputs (0major+2497minor)pagefaults 0swaps

So this reduces the max memory utilized from 4.2GB to 29MB!

For future work, the two places where we spend the most CPU
time (according to resize2fs -d 16) are:

blocks_to_move: Memory used: 2508k/25096k (1903k/606k), time: 91.42/91.53/ 0.00

and

calculate_summary_stats: Memory used: 2508k/25612k (1908k/601k), time: 95.33/95.45/ 0.00

The calculate_summary_stats pass can be sped up by using
ext2fs_find_first_{zero,set}_block_bitmap2(), instead of iterating
over the entire block bitmap one bit at a time.
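The run-at-a-time idea can be sketched like this: find the start of the next free run, find where it ends, and add the run length. The helpers are hypothetical stand-ins for the find_first_{zero,set} functions named above. They skip only whole bytes, while a real implementation would likely skip whole words or tree nodes.

```c
#include <stdint.h>

/* First index in [start, end) whose bit equals want (0 or 1), or end. */
static uint64_t find_first(const uint8_t *bm, uint64_t start, uint64_t end, int want)
{
	uint8_t skip = want ? 0x00 : 0xFF;	/* byte containing no match */
	uint64_t i = start;

	while (i < end) {
		if ((i & 7) == 0 && i + 8 <= end && bm[i >> 3] == skip) {
			i += 8;		/* whole byte has no candidate */
			continue;
		}
		if (((bm[i >> 3] >> (i & 7)) & 1) == want)
			return i;
		i++;
	}
	return end;
}

/* Count zero (free) bits in [0, nbits) one run at a time. */
static uint64_t count_free(const uint8_t *bm, uint64_t nbits)
{
	uint64_t free_bits = 0, pos = 0;

	while (pos < nbits) {
		uint64_t zero = find_first(bm, pos, nbits, 0);
		if (zero == nbits)
			break;
		uint64_t set = find_first(bm, zero, nbits, 1);
		free_bits += set - zero;
		pos = set;
	}
	return free_bits;
}
```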

The blocks_to_move pass can be sped up by using a bitmap to store the
location of fs metadata blocks, to avoid an O(N**2) algorithm where N
is the number of groups in the file system.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agolibext2fs: fix rb_resize_bmap to handle the padding bits
Theodore Ts'o [Sat, 26 Jul 2014 04:45:28 +0000 (00:45 -0400)]
libext2fs: fix rb_resize_bmap to handle the padding bits

The bits between end and real_end are set as a safety measure for the
kernel when it uses the bit scan instructions.  We need to take this
into account when shrinking or growing the block allocation bitmap,
before we can safely use rbtree bitmaps in resize2fs.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agotests: use e2fsck -f instead of -p for resize tests
Theodore Ts'o [Sat, 26 Jul 2014 04:47:37 +0000 (00:47 -0400)]
tests: use e2fsck -f instead of -p for resize tests

Using e2fsck -f provides better debugging information if things go
wrong.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agobuild: fix unused/uninitialized variable warnings
Andreas Dilger [Sat, 26 Jul 2014 01:43:08 +0000 (21:43 -0400)]
build: fix unused/uninitialized variable warnings

Fix a few warnings about unused and uninitialized variables.

Also fix util/subst.c to include <sys/time.h> to avoid using
undeclared functions gettimeofday() and futimes().

Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoe2fuzz: Create a tool to fuzz ext* filesystems
Darrick J. Wong [Fri, 25 Jul 2014 13:01:17 +0000 (09:01 -0400)]
e2fuzz: Create a tool to fuzz ext* filesystems

Creates a program that fuzzes only the metadata blocks (or optionally
all in-use blocks) of an ext* filesystem.  There's also a script to
automate fuzz testing of the kernel and e2fsck in a loop.

[ Modified by tytso to add e2fuzz to the clean makefile rule ]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoe2fuzz: Create a tool to fuzz ext* filesystems
Darrick J. Wong [Fri, 25 Jul 2014 13:01:17 +0000 (09:01 -0400)]
e2fuzz: Create a tool to fuzz ext* filesystems

Creates a program that fuzzes only the metadata blocks (or optionally
all in-use blocks) of an ext* filesystem.  There's also a script to
automate fuzz testing of the kernel and e2fsck in a loop.

[ Modified by tytso to add e2fuzz to the clean target ]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agomke2fs: set error behavior at initialization time
Darrick J. Wong [Fri, 25 Jul 2014 12:58:29 +0000 (08:58 -0400)]
mke2fs: set error behavior at initialization time

Port tune2fs' -e flag to mke2fs so that we can set error behavior at
format time, and introduce the equivalent errors= setting into
mke2fs.conf.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoMerge branch 'maint' into next
Theodore Ts'o [Fri, 25 Jul 2014 12:58:10 +0000 (08:58 -0400)]
Merge branch 'maint' into next

Conflicts:
e2fsck/pass1.c
e2fsck/problem.h

10 years agoe2fsck: clear uninit flag on directory extents
Darrick J. Wong [Fri, 18 Jul 2014 22:55:21 +0000 (15:55 -0700)]
e2fsck: clear uninit flag on directory extents

Directories can't have uninitialized extents, so offer to clear the
uninit flag when we find this situation.  The actual directory blocks
will be checked in pass 2 and 3 regardless of the uninit flag.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoe2fsck: pass2 should not process directory blocks that are impossibly large
Darrick J. Wong [Fri, 25 Jul 2014 12:41:11 +0000 (08:41 -0400)]
e2fsck: pass2 should not process directory blocks that are impossibly large

Currently, directories cannot be fallocated, which means that the only
way they get bigger is for the kernel to append blocks one by one.
Therefore, if we encounter a logical block offset that is too big, we
needn't bother adding it to the dblist for pass2 processing, because
it's unlikely to contain a valid directory block.  The code that
handles extent based directories also does not add toobig blocks to
the dblist.

Note that we can easily cause e2fsck to fail with ENOMEM if we start
feeding it really large logical block offsets, as the dblist
implementation will try to realloc() an array big enough to hold it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoe2fsck: always submit logical block 0 of a directory for pass 2
Darrick J. Wong [Fri, 25 Jul 2014 12:39:45 +0000 (08:39 -0400)]
e2fsck: always submit logical block 0 of a directory for pass 2

Always iterate logical block 0 in a directory, even if no physical
block has been allocated.  Pass 2 will notice the lack of mapping and
offer to allocate a new directory block; this enables us to link the
directory into lost+found.

Previously, if there were no logical blocks mapped, we would fail to
pick up even block 0 of the directory for processing in pass 2.  This
meant that e2fsck never allocated a block 0 and therefore wouldn't fix
the missing . and .. entries for the directory; subsequent e2fsck runs
would complain about (yet never fix) the problem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoMerge branch 'maint' into next
Theodore Ts'o [Fri, 25 Jul 2014 12:38:39 +0000 (08:38 -0400)]
Merge branch 'maint' into next

Conflicts:
e2fsck/pass1.c

10 years agoe2fsck: collapse holes in extent-based directories
Darrick J. Wong [Fri, 18 Jul 2014 22:54:30 +0000 (15:54 -0700)]
e2fsck: collapse holes in extent-based directories

If we notice a hole in the block map of an extent-based directory,
offer to collapse the hole by decreasing the logical block # of the
extent.  This saves us from pass 3's inefficient strategy, which fills
the holes by mapping in a lot of empty directory blocks.
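Collapsing holes amounts to sliding each extent down so that it starts where the previous one ends. `struct extent` and `collapse_holes()` are hypothetical. The sketch assumes extents sorted by logical block and a directory that should begin at logical block 0. Overlapping extents are left alone.

```c
#include <stddef.h>
#include <stdint.h>

struct extent { uint64_t lblk; uint32_t len; };	/* physical part omitted */

/* Returns the number of extents whose logical start was lowered. */
static size_t collapse_holes(struct extent *ext, size_t n)
{
	uint64_t next = 0;	/* first logical block not yet covered */
	size_t moved = 0;

	for (size_t i = 0; i < n; i++) {
		if (ext[i].lblk > next) {
			ext[i].lblk = next;	/* close the hole */
			moved++;
		}
		next = ext[i].lblk + ext[i].len;
	}
	return moved;
}
```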

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoe2fsck: don't crash during rehash
Darrick J. Wong [Fri, 18 Jul 2014 22:54:15 +0000 (15:54 -0700)]
e2fsck: don't crash during rehash

If a user crafts a carefully constructed filesystem containing a
single directory entry block with an invalid checksum and fewer than
two entries, and then runs e2fsck to fix the filesystem, fsck will
crash when it tries to "compress" the short dir and passes a negative
dirent array length to qsort.  Therefore, don't allow directory
"compression" in this situation.
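The failure mode is generic C. A signed count that goes negative becomes an enormous `size_t` when passed to qsort(). The sketch below shows the guard on a hypothetical sort that keeps "." and ".." in front. It is not e2fsck's rehash code.

```c
#include <stdlib.h>
#include <string.h>

struct entry { const char *name; };	/* hypothetical */

static int cmp_name(const void *a, const void *b)
{
	const struct entry *x = a, *y = b;
	return strcmp(x->name, y->name);
}

/* Sort everything after the first two entries; refuse if there are
 * fewer than two, since count - 2 would be negative. */
static int sort_after_dots(struct entry *e, int count)
{
	if (count < 2)
		return -1;
	qsort(e + 2, (size_t)(count - 2), sizeof(*e), cmp_name);
	return 0;
}
```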

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agomisc: fix problems with strncat
Darrick J. Wong [Fri, 18 Jul 2014 22:54:07 +0000 (15:54 -0700)]
misc: fix problems with strncat

The third argument to strncat is the maximum number of characters to
copy out of the second argument; it is not the maximum length of the
first argument.

Therefore, add a check in case we ever find a /sys/block/X
path long enough to hit the end of the buffer.  FWIW, the longest path
I could find on my machine was 133 bytes.
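In practice, the bound for strncat() is the space left in the destination minus one byte for the NUL it always appends. `append_bounded()` is a hypothetical helper. It assumes `dst` already holds a valid string that fits in `size` bytes, and it reports truncation to the caller.

```c
#include <string.h>

/* Returns 0 on success, -1 if src was truncated. */
static int append_bounded(char *dst, size_t size, const char *src)
{
	size_t used = strlen(dst);		/* requires used < size */
	size_t room = size - used - 1;		/* chars we can still add */

	strncat(dst, src, room);		/* copies at most room chars, then NUL */
	return strlen(src) > room ? -1 : 0;
}
```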

Fixes-Coverity-Bug: 1252003
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agolibext2fs: fix bounds check of the bitmap test range in get_free_blocks2
Darrick J. Wong [Fri, 25 Jul 2014 11:11:57 +0000 (07:11 -0400)]
libext2fs: fix bounds check of the bitmap test range in get_free_blocks2

In the loop in ext2fs_get_free_blocks2, we ask the bitmap if there's a
range of free blocks starting at "b" and ending at "b + num - 1".
That quantity is the number of the last block in the range.  Since
ext2fs_blocks_count() returns the number of blocks and not the number
of the last block in the filesystem, the check is incorrect.
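With blocks_count blocks, valid numbers run from 0 to blocks_count - 1. A range is therefore in bounds only if its last block, b + num - 1, is strictly less than blocks_count. Comparing that last block with `>` would accept a range ending one past the end. `range_fits()` is a hypothetical helper written to avoid overflow.

```c
#include <stdint.h>

/* Is [b, b + num - 1] entirely inside blocks 0 .. blocks_count - 1? */
static int range_fits(uint64_t b, uint64_t num, uint64_t blocks_count)
{
	return num > 0 && b < blocks_count && num <= blocks_count - b;
}
```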

Put in a shortcut to exit the loop if finish > start, because in that
case it's obvious that we don't need to reset to the beginning of the
FS to continue the search for blocks.  This is needed to terminate the
loop because the broken test meant that b could get large enough to
equal finish, which would end the while loop.

The attached testcase shows that with the off by one error, it is
possible to throw e2fsck into an infinite loop while it tries to
find space for the inode table even though there's no space for one.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoe2fsck: fix off-by-one bounds check on group number
Darrick J. Wong [Fri, 25 Jul 2014 02:19:27 +0000 (22:19 -0400)]
e2fsck: fix off-by-one bounds check on group number

Since fs->group_desc_count is the number of block groups, the number
of the last group is always one less than this count.  Fix the bounds
check to reflect that.

This flaw shouldn't have any user-visible side effects, since the
block bitmap test based on last_grp later on can handle overbig block
numbers.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoe2fsck: force all block allocations to use block_found_map
Darrick J. Wong [Fri, 18 Jul 2014 22:53:41 +0000 (15:53 -0700)]
e2fsck: force all block allocations to use block_found_map

During the later passes of e2fsck, we sometimes need to allocate and
map blocks into a file.  This can happen either by fsck directly
calling new_block() or indirectly by the library calling new_block
because it needs to allocate a block for lower level metadata (bmap2()
with BMAP_SET; block_iterate3() with BLOCK_CHANGED).

We need to force new_block to allocate blocks from the found block
map, because the FS block map could be inaccurate for various reasons:
the map is wrong, there are missing blocks, the checksum failed, etc.

Therefore, any time fsck does something that could allocate blocks,
we need to intercept allocation requests so that they're sourced from
the found block map.  Remove the previous code that swapped bitmap
pointers as this is now unneeded.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoe2fsck: free ctx->fs, not fs, at the end of fsck
Darrick J. Wong [Fri, 25 Jul 2014 01:03:54 +0000 (21:03 -0400)]
e2fsck: free ctx->fs, not fs, at the end of fsck

When we call ext2fs_close_free at the end of main(), we need to supply
the address of ctx->fs, because the subsequent e2fsck_free_context
call will try to access ctx->fs (which is now set to a freed block) to
see if it should free the directory block list.  This is clearly not
desirable, so fix the problem.
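The pattern is a "free and clear the caller's pointer" helper. It only protects later code if it receives the address of the pointer that later code actually reads. Passing `&fs` for a local copy would leave `ctx->fs` dangling. The structs and `close_free()` below are hypothetical stand-ins.

```c
#include <stdlib.h>

struct fs  { int dummy; };		/* hypothetical */
struct ctx { struct fs *fs; };

/* Free *fsp and set the caller's pointer to NULL. */
static void close_free(struct fs **fsp)
{
	free(*fsp);
	*fsp = NULL;
}
```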

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoe2fsck: don't clobber critical metadata during check_blocks
Darrick J. Wong [Wed, 23 Jul 2014 16:11:23 +0000 (12:11 -0400)]
e2fsck: don't clobber critical metadata during check_blocks

If we encounter an inode with IND/DIND/TIND blocks or internal extent
tree blocks that point into critical FS metadata such as the
superblock, the group descriptors, the bitmaps, or the inode table,
it's quite possible that the validation code for those blocks is not
going to like what it finds, and it'll ask to try to fix the block.
Unfortunately, this happens before duplicate block processing (pass
1b), which means that we can end up doing stupid things like writing
extent blocks into the inode table, which multiplies e2fsck's
destructive effect and can render a filesystem unfixable.

To solve this, create a bitmap of all the critical FS metadata.  If,
before pass 1b runs (essentially, in check_blocks), we find a metadata block
that points into these critical regions, continue processing that
block, but avoid making any modifications, because we could be
misinterpreting inodes as block maps.  Pass 1b will find the
multiply-owned blocks and fix that situation, which means that we can
then restart e2fsck from the beginning and actually fix whatever
problems we find.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agodebugfs: fix printing of inline data during symlink inode dump
Darrick J. Wong [Tue, 22 Jul 2014 22:01:53 +0000 (18:01 -0400)]
debugfs: fix printing of inline data during symlink inode dump

When we're dumping a fast symlink inode, we print some odd things to
stdout.  To clean this up, first don't print inline data EA, since the
inode dump doesn't display file and directory contents.  Then, teach
the inode dump function how to print out either an inline data fast
symlink or a non-inline data fast symlink.

(This is a follow-up to the earlier patch "debugfs: Only print the
first 60 bytes from i_block on a fast symlink")

[ Modified by tytso so that the d_inline_dump test works when the build
  directory is different from the source directory --- i.e., when
  doing a VPATH build. ]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoMerge branch 'maint' into next
Theodore Ts'o [Tue, 22 Jul 2014 18:57:40 +0000 (14:57 -0400)]
Merge branch 'maint' into next

Conflicts:
debian/changelog
e2fsck/pass1.c
lib/ext2fs/Makefile.in

10 years agoe2fsck: fix inode coherency issue when iterating an inode's blocks
Darrick J. Wong [Fri, 18 Jul 2014 22:53:11 +0000 (15:53 -0700)]
e2fsck: fix inode coherency issue when iterating an inode's blocks

When we're about to iterate the blocks of a block-map file, we need to
write the inode out to disk if it's dirty because block_iterate3()
will re-read the inode from disk.  (In practice this won't happen
because nothing dirties block-mapped inodes before the iterate call,
but we can program defensively).

More importantly, we need to re-read the inode after the iterate()
operation because it's possible that mappings were changed (or erased)
during the iteration.  If we then dirty or clear the inode, we'll
mistakenly write the old inode values back out to disk!

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoe2fsck: check error return from ext2fs_extent_fix_parents in pass 1
Theodore Ts'o [Tue, 22 Jul 2014 18:48:41 +0000 (14:48 -0400)]
e2fsck: check error return from ext2fs_extent_fix_parents in pass 1

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoe2fsck: skip clearing bad extents if bitmaps are unreadable
Darrick J. Wong [Fri, 18 Jul 2014 22:53:04 +0000 (15:53 -0700)]
e2fsck: skip clearing bad extents if bitmaps are unreadable

If the bitmaps are known to be unreadable, don't bother clearing the bad
extents; just mark fsck to restart itself after pass 5, by which time the
bitmaps should be fixed.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 years agoe2fsck: don't offer to recreate the journal if fsck is aborting due to bad block...
Darrick J. Wong [Tue, 22 Jul 2014 17:54:54 +0000 (13:54 -0400)]
e2fsck: don't offer to recreate the journal if fsck is aborting due to bad block bitmaps

If e2fsck knows the bitmaps are bad at the exit (probably because they
were bad at the start and have not been fixed), don't offer to
recreate the journal because doing so causes e2fsck to abort a second
time.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>