git://git.whamcloud.com - tools/e2fsprogs.git/log

libext2fs: report bad magic over bad sb checksum

We don't want ext2fs_open2() to report bad sb checksum on something
that's not even an ext* superblock. This apparently happens pretty
easily if we try to open an XFS filesystem. Thus, make it so that a
bad magic number code always trumps the sb checksum error code.
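
A minimal sketch of the precedence rule. The enum values and
classify_superblock() are hypothetical stand-ins, not the libext2fs API.
Only 0xEF53, the s_magic value of ext2/3/4 superblocks, comes from the
on-disk format.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the libext2fs error codes. */
enum open_err { OPEN_OK = 0, OPEN_BAD_MAGIC, OPEN_BAD_SB_CSUM };

#define SKETCH_SUPER_MAGIC 0xEF53u  /* s_magic of every ext2/3/4 superblock */

/* Pick the error to report for a candidate superblock.  csum_ok is the
 * result of a checksum check that may already have been computed. */
static enum open_err classify_superblock(uint16_t s_magic, int csum_ok)
{
    /* Wrong magic: this is not an ext* superblock at all (e.g. XFS), so a
     * checksum mismatch is meaningless and is not what we report. */
    if (s_magic != SKETCH_SUPER_MAGIC)
        return OPEN_BAD_MAGIC;
    if (!csum_ok)
        return OPEN_BAD_SB_CSUM;
    return OPEN_OK;
}
```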

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsck/debugfs: fix descriptor block size handling errors with journal_csum

It turns out that there are some serious problems with the on-disk
format of journal checksum v2.  The foremost is that the function to
calculate descriptor tag size returns sizes that are too big.  This
causes alignment issues on some architectures and is compounded by the
fact that some parts of jbd2 use the structure size (incorrectly) to
determine the presence of a 64bit journal instead of checking the
feature flags.  These errors regrettably lead to the journal
corruption reported by Mr. Reardon.

Therefore, introduce journal checksum v3, which enlarges the
descriptor block tag format to allow for full 32-bit checksums of
journal blocks, fix the journal tag function to return the correct
sizes, and fix the jbd2 recovery code to use feature flags to
determine 64bitness.

Add a few function helpers so we don't have to open-code quite so
many pieces.

Switching to a 16-byte block size was found to increase journal size
overhead by at most 0.1% when converting a 32-bit journal with no
checksumming to a 32-bit journal with checksum v3 enabled.
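
A sketch of the tag size arithmetic and of the 64bit test. It assumes
hypothetical feature bits FEAT_64BIT and FEAT_CSUM_V3 and local copies
of the two tag layouts, with field order assumed to follow jbd2's
journal_block_tag_t and journal_block_tag3_t. The csum v2 case is
deliberately left out. The point: a 16-byte v3 tag is larger than the
8-byte 32-bit tag even on a 32-bit journal, so inferring 64bitness from
the tag size goes wrong, and only the feature flag can be trusted.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the descriptor tag layouts (big-endian on disk; byte
 * order does not matter for the size arithmetic here). */
struct tag_v2 {                  /* journal_block_tag_t */
    uint32_t t_blocknr;
    uint16_t t_checksum;
    uint16_t t_flags;
    uint32_t t_blocknr_high;
};
struct tag_v3 {                  /* journal_block_tag3_t */
    uint32_t t_blocknr;
    uint32_t t_flags;
    uint32_t t_blocknr_high;
    uint32_t t_checksum;         /* full 32-bit checksum */
};

/* Hypothetical feature bits for this sketch only. */
#define FEAT_64BIT   0x1u
#define FEAT_CSUM_V3 0x2u

static size_t tag_bytes(unsigned features)
{
    if (features & FEAT_CSUM_V3)
        return sizeof(struct tag_v3);                /* 16 bytes */
    if (features & FEAT_64BIT)
        return sizeof(struct tag_v2);                /* 12 bytes */
    return offsetof(struct tag_v2, t_blocknr_high);  /* 8: no high word */
}

/* Use the high block word only when the 64bit feature is set, never just
 * because the tag is larger than the 32-bit tag size. */
static uint64_t tag_block(unsigned features, uint32_t lo, uint32_t hi)
{
    uint64_t blk = lo;

    if (features & FEAT_64BIT)
        blk |= (uint64_t)hi << 32;
    return blk;
}
```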

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reported-by: TR Reardon <thomas_reardon@hotmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Merge branch 'maint' into next

Conflicts:
debugfs/debugfs.c
e2fsck/Makefile.in
lib/ext2fs/Makefile.in
tests/test_config

Merge remote-tracking branch 'origin/maint' into maint

e2fsck: notice when the realloc of dir_info fails

If the reallocation of dir_info fails, we will eventually cause e2fsck
to fail with an internal error. So if the realloc fails, print a
message and bail out with a fatal error early, at the time of the
reallocation failure.
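
A minimal sketch of the pattern, assuming a simplified dir_info table.
The struct layouts, grow_dir_info(), and the message text are
hypothetical, not e2fsck's code. The caller would treat -1 as fatal.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-in for e2fsck's dir_info bookkeeping. */
struct dir_info { unsigned ino, dotdot, parent; };

struct dir_info_table {
    struct dir_info *array;
    size_t size;                 /* number of slots allocated */
};

/* Grow the table.  On failure, report it and return -1 right away so the
 * caller can abort; the old array is still valid and still owned by t. */
static int grow_dir_info(struct dir_info_table *t, size_t new_size)
{
    struct dir_info *p;

    if (new_size > SIZE_MAX / sizeof(*p))
        p = NULL;                /* size computation would overflow */
    else
        p = realloc(t->array, new_size * sizeof(*p));
    if (p == NULL) {
        fprintf(stderr, "Couldn't reallocate dir_info to %zu entries\n",
                new_size);
        return -1;
    }
    t->array = p;
    t->size = new_size;
    return 0;
}
```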

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

compile_et: Allow user to override ET_DIR

Signed-off-by: Michael Forney <forney@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Apply LDFLAGS when building tests

Signed-off-by: Michael Forney <forney@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

tests: Add to LD_LIBRARY_PATH instead of overriding

Signed-off-by: Michael Forney <forney@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

debugfs: add better error checking when printing extended attributes

Check to make sure that the lengths of the name and value fields in
the extended attribute don't overrun the bounds of the inode.
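
A minimal sketch of an overflow-safe bounds check. It assumes the
in-inode EA region is passed as a length and that value offsets are
relative to the start of that region. EA_ENTRY_HDR (the 16-byte fixed
part of struct ext2_ext_attr_entry) and both function names belong to
this sketch, not to debugfs.

```c
#include <stddef.h>
#include <stdint.h>

#define EA_ENTRY_HDR 16u  /* fixed part of an EA entry, before e_name[] */

/* 1 if [off, off + len) lies inside a region of region_len bytes.
 * Written as a subtraction so that off + len can never overflow. */
static int span_fits(size_t region_len, size_t off, size_t len)
{
    return off <= region_len && len <= region_len - off;
}

/* Header, name, and value of one entry must all stay inside the region.
 * In this sketch value offsets are relative to the region start. */
static int ea_entry_in_bounds(size_t region_len, size_t entry_off,
                              uint8_t name_len, uint16_t value_offs,
                              uint32_t value_size)
{
    if (!span_fits(region_len, entry_off, EA_ENTRY_HDR))
        return 0;
    if (!span_fits(region_len, entry_off + EA_ENTRY_HDR, name_len))
        return 0;
    return span_fits(region_len, value_offs, value_size);
}
```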

Addresses-Coverity-Bug: #709517

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Merge tag 'v1.42.12' into next

v1.42.12

Conflicts:
version.h

Update release notes, etc. for final 1.42.12 release

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

po: update vi.po (from translationproject.org)

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

po: update uk.po (from translationproject.org)

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

po: update sv.po (from translationproject.org)

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

po: update pl.po (from translationproject.org)

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

po: update nl.po (from translationproject.org)

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

po: update fr.po (from translationproject.org)

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

po: update cs.po (from translationproject.org)

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

mke2fs: complain if bigalloc and hugefiles_align_disk are incompatible

If the starting partition offset is incompatible with the bigalloc
cluster size, complain and exit, instead of creating a file which
would have a logical to physical block mapping which breaks the
cluster alignment requirement.
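
A hedged sketch of the compatibility test, not mke2fs's actual check.
It assumes the requirement is that a block's logical and physical
numbers agree modulo the cluster size. Aligning hugefiles to the start
of the disk shifts physical placement by the partition offset, so that
offset must itself be a whole number of clusters. align_disk_ok() is
hypothetical.

```c
#include <stdint.h>

/* 1 if the partition offset is a whole number of bigalloc clusters. */
static int align_disk_ok(uint64_t part_offset_bytes, uint32_t block_size,
                         uint32_t blocks_per_cluster)
{
    uint64_t cluster_bytes = (uint64_t)block_size * blocks_per_cluster;

    return cluster_bytes != 0 && part_offset_bytes % cluster_bytes == 0;
}
```

For example, with 4k blocks and 16-block clusters, a partition starting
at 1 MiB passes, while the legacy 63-sector (32256-byte) offset fails.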

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsck: fix infinite loop when recovering corrupt journal blocks

When recovering the journal, don't fall into an infinite loop if we
encounter a corrupt journal block. Instead, just skip the block and
proceed with the full filesystem fsck.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsck: resync jbd2 revoke code from Linux 3.16

Synchronize e2fsck's copy of revoke.c with the kernel's copy in
fs/jbd2.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsck: resync jbd2 recovery code from Linux 3.16

Synchronize e2fsck's copy of recovery.c with the kernel's copy in
fs/jbd2.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

contrib: add script to help resync journal code with kernel

Add a script that handles most of the code massaging necessary to resync
{recovery,revoke}.c from the Linux kernel into e2fsprogs.

Usage: jbd2-resync.sh linux/fs/jbd2/revoke.c e2fsprogs/e2fsck/revoke.c

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsprogs.pot: update POT-Creation-Date

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsck: fix spelling error (strage vs storage)

Reported-by: Philipp Thomas <pth@suse.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsprogs: add supported file attributes to ext4.5 manpage

The chattr(1) manpage now refers users to filesystem-specific
manpages for details on supported attributes, so add those to
ext4.5.

I've left out oddities like being able to set the compressed
or no-tail-packing flags, or setting data journaling on ext2.

That behavior seems like a bug, not a feature.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Merge branch 'maint' into next

Conflicts:
RELEASE-NOTES
debian/changelog
version.h

tests/d_inline_dump: remove version dependency in the expected output

Also add the convenience macro $CLEAN_OUTPUT in test_config, which can
be used to run the "sed -f $cmd_dir/filter.sed" command to clean up
e2fsprogs command output before comparing it with the expected golden
output.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Interim updates of release notes, etc. in preparation for 1.42.12 release

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

po: update sv.po (from translationproject.org)

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

po: update fr.po (from translationproject.org)

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

po: update es.po (from translationproject.org)

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

po: update cs.po (from translationproject.org)

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

mke2fs: improve the error message when a non-existent file is specified

If the user does not specify the file system size, and the file does
not exist, give an error message like this:

    The file /tmp/foo.img does not exist and no size was specified.

instead of this:

    Creating regular file /tmp/foo.img
    mke2fs: Device size reported to be zero.  Invalid partition specified, or
    partition table wasn't reread after running fdisk, due to
    a modified partition being busy and in use.  You may need to reboot
    to re-read your partition table.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

mke2fs.8.in: explain how the fs-size parameter is interpreted

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

resize2fs: clarify the size of blocks in resize2fs's messages

Addresses-Debian-Bug: #758029

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

resize2fs.8.in: clarify when on-line resizing is supported

Addresses-Debian-Bug: #726760

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

dumpe2fs: complain if extra arguments are given on the command line

Addresses-Debian-Bug: #758074

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsck: on BE, re-swap everything after a damaged dirent so salvage works correctly

On big-endian systems, if the dirent swap routine finds a rec_len that
it doesn't like, it continues processing the block as if rec_len == 8.
This means that the name field gets byte swapped, which means that
salvage will not detect the correct name length (unless the name has a
length that's an exact multiple of four bytes), and it'll discard the
entry (unnecessarily) and the rest of the dirent block. Therefore,
swap the rest of the block back to disk order, run salvage, and
re-swap anything after the salvaged dirent.

The test case for this is f_inlinedata_repair if you run it on a BE
system.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

libext2fs: fix problems with LE<->BE conversions on BE platforms

Fix more problems that I found when testing on ppc64:

- Inode swap cut and paste error leads to immutable inodes being
  detected as inlinedata inodes, leading to e2fsck incorrectly barfing
  on i_block[] contents.

- Superblock csum/verify must be aware of the fs->super byte order
  when checking for metadata_csum feature flag.  (Hint: in _openfs(),
  fs->super is in LE order for the first csum verification)

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

libext2fs: byteswap inode when performing the sanity scan

On BE platforms, we need to swap the inode bytes after doing the
checksum verification but before looking at i_blocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fuzz: fix pwrite64/pwrite usage

Select pwrite64 or pwrite depending on what autoconf finds. This
makes e2fuzz find a suitable pwrite variant regardless of platform.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

misc: fix gcc warnings

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

libext2fs: create inlinedata symlinks

Add to ext2fs_symlink the ability to create inline data symlinks.

[ Modified by tytso to add more logging to the test script ]

Suggested-by: Pu Hou <houpu.hp@alibaba-inc.com>
Cc: Pu Hou <houpu.hp@alibaba-inc.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

debugfs: fix set_inode_field block[IND|DIND|TIND]

After we determine that we can't parse the array value as an integer,
we need to restore the square brackets to the field name, so that we
can find a match with block[IND], block[DIND], and block[TIND] in the
inode field table.

Reported-by: Jun He <jhe@cs.wisc.edu>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Merge branch 'maint' into next

filefrag: fix extent count calculation when using FIBMAP

The extent count calculation works correctly with the FIBMAP ioctl in
verbose (-v) mode, but without the verbose option, the calculation was
broken because we weren't properly updating the fm_ext data structures
in non-verbose mode.
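
A sketch of counting extents from per-block FIBMAP results. It assumes
the physical block numbers for consecutive logical blocks have already
been collected into an array, with 0 marking a hole (FIBMAP reports 0
for unmapped blocks). A new extent starts whenever a block is not
physically contiguous with the previous logical block. The count has to
be maintained whether or not verbose output is requested.
count_extents() is hypothetical, not filefrag's code.

```c
#include <stddef.h>
#include <stdint.h>

static unsigned count_extents(const uint64_t *phys, size_t n)
{
    unsigned extents = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (phys[i] == 0)
            continue;                        /* hole: no extent here */
        /* Start a new extent unless this block directly follows the
         * previous logical block on disk. */
        if (i == 0 || phys[i - 1] == 0 || phys[i] != phys[i - 1] + 1)
            extents++;
    }
    return extents;
}
```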

Addresses-Launchpad-Bug: #1356496

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

tests: convert use of md5sum to crcsum

The following tests were using md5sum: i_e2image, u_mke2fs, and
u_tune2fs. Convert them to use crcsum for better portability (not all
environments have md5sum; some might have sha1sum instead :-)

For our purposes crcsum is quite sufficient.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsck: don't flush the FS unless it's actually dirty

ext2fs_flush2() unconditionally writes the block group descriptors to
disk even if the underlying FS isn't marked dirty.  This causes the
following error message on a fsck -n run:

e2fsck 1.43-WIP (09-Jul-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Error writing block 2 (Attempt to write block to filesystem resulted in short write).  Ignore error? no

Error writing block 2 (Attempt to write block to filesystem resulted in short write).  Ignore error? no

Error writing file system info: Attempt to write block to filesystem resulted in short write

Since ext2fs_close2() only calls flush if the dirty flag is set,
modify e2fsck to exhibit the same behavior so that we don't spit out
write errors for a read-only check.
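
A minimal sketch of the rule, using hypothetical stand-ins (a flag word
and a write counter) instead of a real ext2_filsys. write_metadata()
merely plays the role of ext2fs_flush2().

```c
#define SKETCH_FLAG_DIRTY 0x1u

struct fs_sketch {
    unsigned flags;
    int writes;              /* how many times metadata was written */
};

static int write_metadata(struct fs_sketch *fs)
{
    fs->writes++;            /* stands in for ext2fs_flush2() */
    return 0;
}

/* Flush only if something was modified, as ext2fs_close2() does.  A
 * read-only (-n) run never marks the file system dirty, so it never
 * attempts a write that the device would reject. */
static int flush_if_dirty(struct fs_sketch *fs)
{
    if (!(fs->flags & SKETCH_FLAG_DIRTY))
        return 0;
    return write_metadata(fs);
}
```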

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Merge branch 'maint' into next

Conflicts:
e2fsck/unix.c

tests: add regression tests for inlinedata fixes

Add a regression test to ensure that previous patches' fixes to e2fsck
do not revert.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsck: don't set prev after processing '..' on an inline dir

In an inline directory, the '..' entry is compacted down to just the
inode number; there is no full '..' entry. Therefore, it makes no
sense to assign 'prev' to the fake dotdot entry we put on the stack,
as this could confuse a salvage_directory call on a corrupted next
entry into modifying stack contents (the fake dotdot entry).

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsck: be more careful in assuming inline_data inodes are directories

If a file is marked inline_data but its i_size isn't a multiple of
four, it probably isn't an inline directory, because directory entries
have sizes that are multiples of four.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsck: check inline dir size is a multiple of 4

Directory entries must have a size that's a multiple of 4; therefore
the inline directory structure must also have a size that is a multiple
of 4. Since e2fsck doesn't currently check this, add a check for it.
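
A sketch of where the multiple-of-4 rule comes from. dir_rec_len()
mirrors the arithmetic of EXT2_DIR_REC_LEN(): an 8-byte entry header
(inode, rec_len, name_len, file_type) plus the name, rounded up to 4
bytes. Both function names are this sketch's own.

```c
#include <stdint.h>

/* Minimum space one directory entry with this name length occupies. */
static uint32_t dir_rec_len(uint8_t name_len)
{
    return ((uint32_t)name_len + 8u + 3u) & ~3u;
}

/* Every rec_len is a multiple of 4, and so is the 4-byte parent inode
 * number at the start of an inline directory, hence so is its size. */
static int inline_dir_size_ok(uint32_t size)
{
    return (size % 4u) == 0;
}
```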

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsck: use the correct block size when salvaging directories

Now that the directory salvaging operation is fed the block size,
teach pass 2 that it should use the size of the inline data if the
directory is inline_data. Without this, it'll "fix" inline
directories by setting the rec_len to something approaching the FS
blocksize, which is clearly wrong.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsck: do a better job of fixing i_size of inline directories

If we encounter a directory whose i_size != the inline data size, just
set i_size to the size of the inline data. The pb.last_block
calculation is wrong since pb.last_block == -1, which results in
i_size being set to zero, which corrupts the directory.

Clear the inline_data inode flag if we actually /are/ setting i_size
to zero.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsck: fix conflicting extents|inlinedata inode flags

If we come across an inode with the inline data and extents inode flag
set, try to figure out the correct flag settings from the contents of
i_block and i_size. If i_block looks like an extent tree head, we'll
make it an extent inode; if it's small enough for inline data, set it
to that. This leaves the weird gray area where there's no extent
tree but it's too big for the inode -- if it /could/ be a block map,
change it to that; otherwise, just clear the inode.
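
A simplified sketch of that decision order, not e2fsck's actual
heuristics. It looks only at the 60-byte i_block area, ignoring any
system.data EA, and treats a block map as plausible when every pointer
is a hole (0) or lies inside the file system. 0xF30A is the extent
header magic. resolve_flags() and the enum are hypothetical.

```c
#include <stdint.h>

enum flag_fix { FIX_EXTENTS, FIX_INLINE, FIX_BLOCKMAP, FIX_CLEAR };

#define SKETCH_EXT_MAGIC   0xF30Au  /* eh_magic of an extent tree header */
#define SKETCH_IBLOCK_SIZE 60u      /* bytes in i_block[] */

static uint32_t le32_at(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* i_block is the raw little-endian i_block[] area of the inode. */
static enum flag_fix resolve_flags(const uint8_t i_block[SKETCH_IBLOCK_SIZE],
                                   uint64_t i_size, uint64_t blocks_count)
{
    uint16_t magic = (uint16_t)(i_block[0] | i_block[1] << 8);
    int i;

    if (magic == SKETCH_EXT_MAGIC)
        return FIX_EXTENTS;          /* looks like an extent tree head */
    if (i_size <= SKETCH_IBLOCK_SIZE)
        return FIX_INLINE;           /* small enough to live in i_block */
    /* Gray area: keep it as a block map only if every pointer is sane. */
    for (i = 0; i < 15; i++) {
        uint32_t blk = le32_at(i_block + 4 * i);

        if (blk != 0 && blk >= blocks_count)
            return FIX_CLEAR;
    }
    return FIX_BLOCKMAP;
}
```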

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsck: clear extents and inline_data flags from fifo/socket/device inodes

Since fifo, socket, and device inodes cannot have inline data or
extents, strip off these flags if we find them.
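
A minimal sketch of the flag stripping. The mode constants mirror the
LINUX_S_IF* values from ext2_fs.h, and the flag values are those of
EXT4_EXTENTS_FL and EXT4_INLINE_DATA_FL. They are copied locally so the
sketch does not depend on the host's <sys/stat.h>.
strip_special_flags() is hypothetical.

```c
#include <stdint.h>

#define SK_S_IFMT         0170000u
#define SK_S_IFSOCK       0140000u
#define SK_S_IFBLK        0060000u
#define SK_S_IFCHR        0020000u
#define SK_S_IFIFO        0010000u
#define SK_EXTENTS_FL     0x00080000u
#define SK_INLINE_DATA_FL 0x10000000u

/* Drop EXTENTS and INLINE_DATA for inode types that carry no data;
 * all other inode types keep their flags unchanged. */
static uint32_t strip_special_flags(uint16_t i_mode, uint32_t i_flags)
{
    switch (i_mode & SK_S_IFMT) {
    case SK_S_IFSOCK:
    case SK_S_IFBLK:
    case SK_S_IFCHR:
    case SK_S_IFIFO:
        return i_flags & ~(SK_EXTENTS_FL | SK_INLINE_DATA_FL);
    default:
        return i_flags;
    }
}
```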

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsck: don't try to iterate blocks of an inline_data inode when deallocating it

Inodes with inline_data set do not have iterable blocks, so don't try
to iterate the blocks, because that will just fail, causing e2fsck to
abort.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsck: check inline directory data "block" first

Since the inline data flag will cause the extent/block map iteration
code to abort fsck early, move the test for the inode flag and the
actual block check call further forward in check_blocks. This
eliminates an e2fsck abort on an inline data symlink when the file ACL
block is set.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsck: handle inline data symlinks

Perform some basic checks on inline-data symlinks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsck: clear inline_data inode flag if EA missing

If i_size indicates that an inode requires a system.data extended
attribute to hold overflow from i_blocks but the EA cannot be found,
offer to truncate the file.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsck: check ea-in-inode regions for overlap

Ensure that the various blobs in the in-inode EA region do not overlap.
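
One way to detect overlap is a sort-and-scan over (start, length)
pairs, sketched here. e2fsck's own code may track regions differently,
and struct region and regions_overlap() are hypothetical. After sorting
by start offset, only neighbours need to be compared.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct region { uint32_t start, len; };

static int cmp_region(const void *a, const void *b)
{
    const struct region *x = a, *y = b;

    return (x->start > y->start) - (x->start < y->start);
}

/* Return 1 if any two regions overlap.  Sorts r in place. */
static int regions_overlap(struct region *r, size_t n)
{
    size_t i;

    qsort(r, n, sizeof(*r), cmp_region);
    for (i = 1; i < n; i++)
        if ((uint64_t)r[i - 1].start + r[i - 1].len > r[i].start)
            return 1;
    return 0;
}
```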

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

libext2fs: fix memory leak when failing to iterate inline_data directory

The xattr_get method returns to us a pointer to a buffer containing
the EA value. If for some reason we decide to fail out of iterating
the EA part of an inline-data directory, we must free the buffer that
xattr_get passed to us (via inline_data_ea_get).

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

libext2fs: don't fail inline data operations if there's no EA

Fix up the rest of the inline data code not to complain if there's no
EA, since it's possible that there's no EA because we're in the
process of creating an inline data file. Also, don't return an error
code when removing a nonexistent EA, because there's no reason to.

Furthermore, if we write less than 60 bytes of inline data, remove the
EA to avoid wasting space.
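
A sketch of how a write of len bytes splits between i_block[] (60
bytes) and the system.data EA. It assumes, as a simplification, that
the EA is dropped whenever nothing spills over into it. The struct and
plan_inline_write() are hypothetical, not libext2fs functions.

```c
#include <stddef.h>

#define INLINE_IBLOCK_BYTES 60u  /* i_block[]: 15 x 32-bit words */

struct inline_layout {
    size_t in_iblock;  /* bytes stored in i_block[] */
    size_t in_ea;      /* bytes stored in the system.data EA */
    int keep_ea;       /* 0: the EA can be removed entirely */
};

static struct inline_layout plan_inline_write(size_t len)
{
    struct inline_layout l;

    l.in_iblock = len < INLINE_IBLOCK_BYTES ? len : INLINE_IBLOCK_BYTES;
    l.in_ea = len - l.in_iblock;
    l.keep_ea = l.in_ea > 0;
    return l;
}
```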

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

libext2fs: strict inline data overwrite should not return ENOSPC

If we're doing a strict overwrite (same data size) of data in an
inline data file, we should be able to skip the size check. If the
in-core EA representation is fine but the on-disk EA is slightly
corrupt (this happens when fixing minor errors in an inline dir), the
ext2fs_xattr_inode_max_size() call, which reads the disk EA, can lead
us to think that there's no space when in reality there is no issue
with doing a strict overwrite.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

misc: fix various endianness problems with inline_data

The inline data code fails to perform endianness conversions correctly
or at all in a number of places, so fix this so that big-endian
machines function properly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

libext2fs/e2fsck: don't run off the end of the EA block

When we're (a) reading EAs into a buffer; (b) byte-swapping EA
entries; or (c) checking EA data, be careful not to run off the end of
the memory buffer, because this causes invalid memory accesses and
e2fsck crashes. This can happen if we encounter a specially crafted
FS image.
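
A sketch of walking EA entries without running off the buffer. It
assumes buf starts at the first entry (after any EA header), that
e_name_len is an entry's first byte, that entries are the 16-byte fixed
part plus the name padded to 4 bytes, and that four zero bytes end the
list. count_ea_entries() is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define EA_ENTRY_FIXED 16u   /* fixed-size part of an EA entry */

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Count entries in buf[0..len), returning -1 instead of reading past the
 * end if an entry or the terminating zero word does not fit. */
static int count_ea_entries(const uint8_t *buf, size_t len)
{
    size_t off = 0;              /* invariant: off <= len */
    int n = 0;

    for (;;) {
        size_t left = len - off;
        size_t elen;

        if (left < 4)
            return -1;           /* no room for the end marker */
        if (get_le32(buf + off) == 0)
            return n;            /* end of the entry list */
        if (left < EA_ENTRY_FIXED)
            return -1;
        elen = (EA_ENTRY_FIXED + buf[off] + 3u) & ~(size_t)3;
        if (elen > left)
            return -1;           /* name would cross the end */
        off += elen;
        n++;
    }
}
```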

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

libext2fs: check EA value offset

Perform a little more sanity checking of EA value offsets so that we
don't crash while trying to load things from the filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsck: convert 'delete files' warning to a proper fix_problem error

In pass 3, convert the "delete files and re-run e2fsck" message to a
proper error code for more consistent error reporting and to make
translation easier.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fuzz: fix build problems on macosx and i386 linux

Fix clang warnings about forgotten header files, dead code, and pwrite
support on OS X. The unistd.h inclusion also fixes a parameter
truncation bug on i386.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsck: flush out the superblock and bitmaps before printing final messages

A user who sees the message

***** REBOOT LINUX *****

or

***** FILE SYSTEM WAS MODIFIED *****

might think that e2fsck was complete even though we haven't finished
writing out the superblock or bitmap blocks, and then either forcibly
reboot or power cycle the box, or yank the USB key out while the
storage device is still being written (before e2fsck exits).

So rearrange the exit path of e2fsck so that we flush out the dirty
superblock/bg descriptors/bitmaps before we print the final message.
Also clean up this code so that the flow of control is easier to
understand, and add error checking to catch any errors (normally
caused by I/O errors writing to the disk) for these final writebacks.

Addresses-Debian-Bugs: #757543, #757544
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Dan Jacobson <jidanni@jidanni.org>

tests: add the r_meta_bg_shrink test

This test checks to make sure resize2fs can properly handle a file
system which started life as a normal ext4 file system and then was
grown to a size where meta_bg was enabled, and then shrunk back below
the point where the meta_bg format is needed.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

tests: add f_first_meta_bg_too_big test

The test verifies that e2fsck can properly fix a file system where the
value of s_first_meta_bg in the superblock is larger than the number
of block group descriptors in the file system. E2fsck will fix this
by clearing the meta_bg feature.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

tests: make sure MKE2FS_FIRST_META_BG is unset while running tests

If the developer has set the MKE2FS_FIRST_META_BG environment
variable, this can cause test failures.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

resize2fs: disable the meta_bg feature if necessary

When shrinking a file system, if the number of block groups drops below
the point where we started using the meta_bg layout, disable the
meta_bg feature and set s_first_meta_bg to zero. This is necessary to
avoid creating an invalid/corrupted file system after the shrink.
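
A hedged sketch of the test, assuming that meta_bg is unnecessary once
every remaining descriptor block falls before s_first_meta_bg.
descs_per_block is the block size divided by the descriptor size, e.g.
4096 / 32 = 128. struct sb_sketch and maybe_drop_meta_bg() are
hypothetical, not resize2fs's code.

```c
#include <stdint.h>

struct sb_sketch {
    int meta_bg;                 /* meta_bg feature enabled? */
    uint32_t s_first_meta_bg;    /* first descriptor block using meta_bg */
};

static uint32_t desc_blocks(uint32_t groups, uint32_t descs_per_block)
{
    return (groups + descs_per_block - 1) / descs_per_block;
}

/* After shrinking to new_groups groups, fall back to the traditional
 * descriptor layout if it can hold every descriptor block. */
static void maybe_drop_meta_bg(struct sb_sketch *sb, uint32_t new_groups,
                               uint32_t descs_per_block)
{
    if (sb->meta_bg &&
        desc_blocks(new_groups, descs_per_block) <= sb->s_first_meta_bg) {
        sb->meta_bg = 0;
        sb->s_first_meta_bg = 0;
    }
}
```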

Addresses-Debian-Bug: #756922

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Marcin Wolcendorf <antymat+debian@chelmska.waw.pl>
Tested-by: Marcin Wolcendorf <antymat+debian@chelmska.waw.pl>

e2fsck: fix file systems with an overly large s_first_meta_bg

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

libext2fs: avoid buffer overflow if s_first_meta_bg is too big

If s_first_meta_bg is greater than the number of block group
descriptor blocks, then reading or writing the block group descriptors
will end up overrunning the memory buffer allocated for the
descriptors. Fix this by limiting first_meta_bg to no more than
fs->desc_blocks. This doesn't correct the bad s_first_meta_bg value,
but it prevents the e2fsprogs userspace programs from potentially
crashing.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Merge branch 'maint' into next

Conflicts:
configure

libext2fs: have UNIX IO manager use pread64/pwrite64

Commit baa3544609da3c ("libext2fs: have UNIX IO manager use
pread/pwrite") causes a breakage on 32-bit systems where off_t is
32 bits, for file systems larger than 4GB. Fix this by using
pread64/pwrite64 if possible, and if pread64/pwrite64 is not present,
using pread/pwrite only if the size of off_t is at least as big as
ext2_loff_t.
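
A sketch of the selection logic, not the unix_io.c code. It assumes
HAVE_PREAD64 and HAVE_PREAD are the macros autoconf's
AC_CHECK_FUNCS(pread64 pread) would define. read_at() is hypothetical.
The ext2_loff_t typedef mirrors the one libext2fs uses when long long
is available.

```c
#define _LARGEFILE64_SOURCE 1   /* expose pread64() where it exists */
#include <sys/types.h>
#include <unistd.h>

typedef long long ext2_loff_t;

static ssize_t read_at(int fd, void *buf, size_t len, ext2_loff_t off)
{
#if defined(HAVE_PREAD64)
    return pread64(fd, buf, len, (off64_t)off);
#else
# if defined(HAVE_PREAD)
    /* Plain pread() only when off_t can hold every ext2_loff_t; with a
     * 32-bit off_t, large offsets would be silently truncated. */
    if (sizeof(off_t) >= sizeof(ext2_loff_t))
        return pread(fd, buf, len, (off_t)off);
# endif
    /* Fallback: seek, then read.  Real code needs a 64-bit seek here too
     * (libext2fs has ext2fs_llseek() for that). */
    if (lseek(fd, (off_t)off, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, buf, len);
#endif
}
```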

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

debugfs: teach rdump to take multiple source arguments

[ modified to update man page by tytso ]

Signed-off-by: Aaron Crane <arc@aaroncrane.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

debugfs: refactor do_rdump()

No behaviour changes. This will simplify the next commit.

Signed-off-by: Aaron Crane <arc@aaroncrane.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

debugfs: fix double-close bug in "rdump" and "dump -p"

Previously, both of these usages called dump_file() with a true value as
the "preserve" argument, which caused it to in turn call fix_perms() to
make the permissions on the locally-dumped file match those found on the
ext2 filesystem. fix_perms() then attempted to close(2) the file descriptor
(if any) before returning (though it didn't attempt to report on any errors
found while doing so).

However, in both of these situations, the local file being dumped had been
opened by the caller of dump_file(), which also closes it (and reports on
any errors detected when closing). This meant that both "rdump" and "dump
-p" would then emit a spurious EBADF message when trying to re-close the
local file descriptor.

Deleting the spurious close(2) call in fix_perms() fixes the problem in both
commands.

Signed-off-by: Aaron Crane <arc@aaroncrane.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

debugfs: be more specific in error messages

Signed-off-by: Aaron Crane <arc@aaroncrane.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

libext2fs: place metadata blocks in the last flex_bg so they are contiguous

Place the allocation bitmaps and inode table blocks so they are
adjacent, even in the last flexbg.

Previously, after running "mke2fs -t ext4 DEV 286720", the layout of
the last few block groups would look like this:

Group 32: (Blocks 262145-270336) [INODE_UNINIT, ITABLE_ZEROED]
  Block bitmap at 262145 (+0), Inode bitmap at 262161 (+16)
  Inode table at 262177-262432 (+32)
Group 33: (Blocks 270337-278528) [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
  Block bitmap at 262146 (bg #32 + 1), Inode bitmap at 262162 (bg #32 + 17)
  Inode table at 262433-262688 (bg #32 + 288)
Group 34: (Blocks 278529-286719) [INODE_UNINIT, ITABLE_ZEROED]
  Block bitmap at 262147 (bg #32 + 2), Inode bitmap at 262163 (bg #32 + 18)
  Inode table at 262689-262944 (bg #32 + 544)

Now, they look like this:

Group 32: (Blocks 262145-270336) [INODE_UNINIT, ITABLE_ZEROED]
  Block bitmap at 262145 (+0), Inode bitmap at 262148 (+3)
  Inode table at 262151-262406 (+6)
Group 33: (Blocks 270337-278528) [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
  Block bitmap at 262146 (bg #32 + 1), Inode bitmap at 262149 (bg #32 + 4)
  Inode table at 262407-262662 (bg #32 + 262)
Group 34: (Blocks 278529-286719) [INODE_UNINIT, ITABLE_ZEROED]
  Block bitmap at 262147 (bg #32 + 2), Inode bitmap at 262150 (bg #32 + 5)
  Inode table at 262663-262918 (bg #32 + 518)

This reduces the free space fragmentation in a freshly created file
system.  It also allows the following mke2fs command to succeed:

mke2fs -t ext4 -b 4096 -O ^resize_inode -G $((2**20)) DEV 2130483

(Note that while this allows people to run mke2fs with insanely large
flexbg sizes, this is not a recommended practice, as the kernel may
refuse to resize such a file system while mounted, since it currently
tries to allocate an in-memory data structure based on the size of the
flexbg, and so a file system with a very large flexbg size will cause
the memory allocation to fail.  This will hopefully be fixed in a
future kernel release, but if the goal is to force all of the metadata
blocks to be at the beginning of the file system, it's better to use
the packed_meta_blocks configuration parameter in mke2fs.conf.)

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Revert "mke2fs: prevent creation of unmountable ext4 with large flex_bg count"

This reverts commit d988201ef9cb6f7b521e544061976ab4270a3f89.

The problem with this commit is that it causes common small file system
configurations to fail.  For example:

    mke2fs -O flex_bg -b 4096 -I 1024 -F /tmp/tt 79106
    mke2fs 1.42.11 (09-Jul-2014)
    /tmp/tt: Invalid argument passed to ext2 library while setting
             up superblock

This check in ext2fs_initialize() was added to prevent the metadata
from being allocated beyond the end of the filesystem, but it is
also causing a wide range of failures for small filesystems.

We'll address this in a different way, by using a smarter algorithm
for deciding the layout of metadata blocks for the last flex block
group.

Reported-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

.gitignore: Add misc/e2fuzz file

This patch adds the misc/e2fuzz tool executable file to .gitignore.

Signed-off-by: Artemiy Volkov <artemiyv@acm.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

tests: add regression tests for inodes with bad checksums

Add regression tests to e2fsck to examine how it deals with inode
table blocks which (a) have been zero'd; (b) have been one'd; (c) have
corrupt inodes with obvious problems; and (d) have inodes with
non-obvious problems.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

tests: add regression tests for group descriptors with bad checksums

Add tests to examine how e2fsck deals with (a) the block bitmap being
corrupt; (b) the inode bitmap being corrupt; (c) the bitmap checksums
being incorrect (but the bitmaps are fine); and (d) the group
descriptor checksum itself being incorrect.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

tests: add regression tests for superblocks with bad checksums

Add regression tests to examine how e2fsck deals with random
superblock corruption such as obviously wrong fields and the checksum
itself being incorrect.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

tests: add regression tests for MMP blocks with bad checksums

Add regression tests to examine how e2fsck deals with MMP blocks with
(a) a bad magic number; and (b) an incorrect checksum.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

tests: add tests for directory entry blocks with checksum errors

Add some regression tests to examine how e2fsck handles directory
entry blocks and htree blocks with (a) malformed directory entries;
(b) incorrect checksums; or (c) obviously garbage entries.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

tests: add tests for handling of corrupt extents

Add some regression tests to examine how e2fsck deals with (a) extent
blocks with only a bad checksum; (b) extent blocks with a bad magic
number; and (c) extent entries with corruption.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

tests: add regression tests for EA blocks with bad checksums

Add regression tests for e2fsck dealing with (a) EA block with a bad
checksum; (b) EA block with a bad magic number; and (c) EA block with
damage that isn't otherwise noticeable.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsck: always ask to fix an inode that fails checksum verification

If an inode fails checksum verification during pass 1 and the user
doesn't fix or clear the inode as part of the regular inode checks,
ensure that e2fsck remembers to ask the user if he simply wants to
correct the checksum.

We weren't capturing all the ways out of an iteration of the inode
scanning loop, which means that not all errors were caught. Also,
we might as well clear the 'failed csum' flag if we write the inode
directly from the inode scanning loop.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsck: disable checksum verification in a few select places

Selectively disable checksum verification in a couple more places:

In check_blocks, disable checksum verification when iterating a block
map because the block map iterator function (re)reads the inode, which
could be unchanged since the scan found that the checksum fails.  We
don't want to abort here; we want to keep evaluating the inode, and we
already know if the inode checksum doesn't match.

Further down in check_blocks when we're trying to see if i_size
matches the amount of data stored in the inode, don't allow checksum
errors when we go looking for the size of inline data.  If the
required attribute is at all findable in the EA block, we'll fix any
other problems with the EA block later.  In the meantime, we don't
want to be truncating files unnecessarily.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

libext2fs: don't cache inodes that fail checksum verification

If an inode fails checksum verification, don't stuff a copy of it in
the inode cache, because this can cause the library to fail to return
the "corrupt inode" error code.

In general, this happens if ext2fs_read_inode_full() is called twice
on an inode with an incorrect checksum.  If fs->flags has
EXT2_FLAG_IGNORE_CSUM_ERRORS set during the first call and *unset*
during the second call, the cache hit during the second call fails to
return EXT2_ET_INODE_CSUM_INVALID as you'd expect.  This happens
during fsck because the first read_inode call happens as part of
check_blocks and the second call happens during inode checksum
revalidation.  A file system with a slightly corrupt non-extent inode
will trigger this.

While we're at it, make the inode read function consistent with the
rest of libext2fs -- copy the metadata object into the caller's buffer
even if it fails checksum verification.  This will help e2fsck avoid a
double re-read later on down the line.
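A minimal sketch of the caching rule described above, assuming a toy in-memory "filesystem". The struct, `read_inode()`, `FLAG_IGNORE_CSUM` and `ERR_CSUM_INVALID` are hypothetical stand-ins, not the libext2fs API. Only inodes whose checksum verifies are cached, so a later strict read of a corrupt inode still reports the error, and the raw bytes are always copied into the caller's buffer.

```c
#include <stdbool.h>
#include <string.h>

#define ERR_CSUM_INVALID 1     /* hypothetical stand-in error code */
#define FLAG_IGNORE_CSUM 0x1   /* hypothetical stand-in flag bit */

struct disk_inode {
	unsigned char data[16];
	bool csum_ok;              /* whether the stored checksum verifies */
};

struct fake_fs {
	int flags;
	struct disk_inode disk[8]; /* "on-disk" inode table */
	bool cached[8];
	struct disk_inode cache[8];
};

static int read_inode(struct fake_fs *fs, int ino, unsigned char out[16])
{
	if (fs->cached[ino]) {
		/* Only verified inodes ever reach the cache. */
		memcpy(out, fs->cache[ino].data, sizeof(fs->cache[ino].data));
		return 0;
	}

	struct disk_inode *d = &fs->disk[ino];

	/* Hand the caller the raw bytes even if verification fails. */
	memcpy(out, d->data, sizeof(d->data));
	if (!d->csum_ok) {
		/* Not cached: a later strict read must see the error again. */
		return (fs->flags & FLAG_IGNORE_CSUM) ? 0 : ERR_CSUM_INVALID;
	}
	fs->cache[ino] = *d;
	fs->cached[ino] = true;
	return 0;
}
```

Reading inode 3 once with `FLAG_IGNORE_CSUM` set and once without returns success and then `ERR_CSUM_INVALID`, which is the behavior the old cache-hit path lost.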

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsck: correctly preserve fs flags when modifying ignore-csum-error flag

When we need to modify the "ignore checksum error" behavior flag to
get us past a library call, it's possible that the library call can
result in other flag bits being changed.  Therefore, it is not correct
to restore unconditionally the previous flags value, since this will
have unintended side effects on the other fs->flags; nor is it correct
to assume that we can unconditionally set (or clear) the "ignore csum
error" flag bit.  Therefore, we must merge the previous value of the
"ignore csum error" flag with the value of flags after the call.
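The merge can be sketched as follows. The flag values and `library_call()` are hypothetical placeholders (the real `EXT2_FLAG_IGNORE_CSUM_ERRORS` bit is defined in ext2fs.h); the sketch only shows the masking: save the old ignore-csum bit, force it on for the call, then put back just that bit while keeping whatever else the call changed.

```c
/* Placeholder values, not the real ext2fs.h definitions. */
#define IGNORE_CSUM_ERRORS 0x100
#define SOME_OTHER_FLAG    0x4   /* hypothetical bit a library call sets */

struct fs_stub {
	int flags;
};

/* Hypothetical library call that changes an unrelated flag bit. */
static void library_call(struct fs_stub *fs)
{
	fs->flags |= SOME_OTHER_FLAG;
}

static void call_ignoring_csum(struct fs_stub *fs)
{
	int saved = fs->flags & IGNORE_CSUM_ERRORS;

	fs->flags |= IGNORE_CSUM_ERRORS;
	library_call(fs);
	/* Restore only the ignore-csum bit; keep bits the call changed. */
	fs->flags = (fs->flags & ~IGNORE_CSUM_ERRORS) | saved;
}
```

Restoring the whole saved `flags` word would have silently undone `SOME_OTHER_FLAG`; unconditionally clearing the bit would break a caller that already had it set.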

Note that we want to leave checksum verification on as much as
possible because doing so exposes e2fsck bugs where two metadata
blocks are "sharing" the same disk block, and attempting to fix one
before relocating the other causes major filesystem damage.  The
damage is much more obvious when a previously checked piece of
metadata suddenly fails in a subsequent pass.

The modifications to the pass 2, 3, and 3A code are justified as
follows: When e2fsck encounters a block of directory entries and
cannot find the placeholder entry at the end that contains the
checksum, it will try to insert the placeholder.  If that fails, it
will schedule the directory for a pass 3A reconstruction.  Until that
happens, we don't want directory block writing (pass 2), block
iteration (pass 3), or block reading (pass 3A) to fail due to checksum
errors, because failing to find the placeholder is itself a checksum
verification error, which causes e2fsck to abort without fixing
anything.

The e2fsck call to ext2fs_read_bitmaps must never fail due to a
checksum error because e2fsck subsequently (a) verifies the bitmaps
itself; or (b) decides that they don't match what has been observed,
and rewrites them.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsck: offer to clear inode table blocks that are insane

Add a new behavior flag to the inode scan functions; when specified,
this flag will do some simple sanity checking of entire inode table
blocks.  If all the checksums are ok, we can skip checksum
verification on individual inodes later on.  If more than half of the
inodes look "insane" (bad extent tree root or checksum failure) then
ext2fs_get_next_inode_full() can return a special status code
indicating that what's in the buffer is probably garbage.
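A minimal sketch of the per-block verdict, assuming each inode has already been reduced to two booleans. The enum names and `judge_inode_block()` are hypothetical, not the real libext2fs return codes; the logic follows the description: more than half insane means garbage, and if no checksum fails the per-inode checksum checks can be skipped.

```c
#include <stdbool.h>

/* Hypothetical verdicts; not the real libext2fs error codes. */
enum block_verdict {
	BLOCK_CSUMS_OK,     /* skip per-inode checksum verification */
	BLOCK_CHECK_EACH,   /* verify inodes individually */
	BLOCK_GARBAGE,      /* buffer is probably garbage */
};

struct inode_check {
	bool csum_ok;          /* checksum verified */
	bool extent_root_ok;   /* extent tree root looks sane */
};

static enum block_verdict judge_inode_block(const struct inode_check *inodes,
					    int n)
{
	int insane = 0, bad_csum = 0;

	for (int i = 0; i < n; i++) {
		if (!inodes[i].csum_ok)
			bad_csum++;
		if (!inodes[i].csum_ok || !inodes[i].extent_root_ok)
			insane++;
	}
	if (insane * 2 > n)        /* strictly more than half */
		return BLOCK_GARBAGE;
	if (bad_csum == 0)
		return BLOCK_CSUMS_OK;
	return BLOCK_CHECK_EACH;
}
```

Comparing `insane * 2 > n` avoids integer-division rounding, so exactly half insane is not yet treated as garbage.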

When e2fsck's inode scan encounters the 'inode is garbage' return code
it'll offer to zap the inode straightaway instead of trying to recover
anything.  This replaces the previous behavior of asking to zap
anything with a checksum error (strict_csum).

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e2fsck: try to salvage corrupt directory entry blocks

Remove the code that would prompt the user to zap directory entry
blocks with bad checksums (i.e. strict_csums). Instead, we'll run the
directory entries through the usual repair routines in an attempt to
save whatever we can. At the same time, refactor the code that
schedules the repair of missing dirblock checksum entries.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>