LU-15722 osd-ldiskfs: fix IO write gets stuck for 64K PAGE_SIZE 04/47004/2
author Xinliang Liu <xinliang.liu@linaro.org>
Wed, 6 Apr 2022 08:06:33 +0000 (08:06 +0000)
committer Oleg Drokin <green@whamcloud.com>
Mon, 30 May 2022 19:04:48 +0000 (19:04 +0000)
commit 176ea3a4599ede8b1a0c91506dcd34bc162f2959
tree 3526253704d65835c9804ea948bee9ce14663685
parent 06dd5a4638dd36640b146d4388c09a322873760b

This fixes the following stuck IO write issue:
-----
[606895.151765] LustreError:
334886:0:(ofd_io.c:1389:ofd_commitrw_write()) lustre-OST0000: restart IO
write too many times: 10000
[606895.207345] LustreError:
334886:0:(ofd_io.c:1389:ofd_commitrw_write()) Skipped 8 previous similar
messages
-------

The write goes into an infinite restart loop:
ofd_commitrw_write()->osd_write_commit()->osd_ldiskfs_map_inode_pages()
    ->ldiskfs_map_blocks()->ofd_commitrw_write()

The cause:
When allocating/mapping blocks with a 64K PAGE_SIZE, m_lblk should
point at the first unallocated block. If m_lblk points at an already
allocated block when create = 1, ldiskfs_map_blocks() just returns
the already allocated blocks without allocating any of the newly
requested blocks for the extent.
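
The failure mode can be illustrated with a small standalone sketch.
This is not the patch itself: toy_map_blocks() is a hypothetical,
heavily simplified stand-in for ldiskfs_map_blocks() with create = 1,
toy_write_page() is a hypothetical model of the restart loop, and
BLOCKS_PER_PAGE = 16 assumes 4K blocks in a 64K page.

```c
#include <stdbool.h>
#include <stddef.h>

#define BLOCKS_PER_PAGE 16 /* assumed: 64K PAGE_SIZE / 4K blocksize */

/* Offset of the first unallocated block in [0, len), or len if none. */
static size_t first_unmapped(const bool *mapped, size_t len)
{
	size_t i = 0;

	while (i < len && mapped[i])
		i++;
	return i;
}

/*
 * Hypothetical model of a create = 1 mapping call: if the start block
 * is already allocated, report the existing run and allocate nothing;
 * otherwise allocate up to the next already allocated block.
 */
static size_t toy_map_blocks(bool *mapped, size_t start, size_t len)
{
	size_t n = 0;

	if (mapped[start]) {
		while (n < len && mapped[start + n])
			n++;
		return n; /* nothing new was allocated */
	}
	while (n < len && !mapped[start + n])
		mapped[start + n++] = true;
	return n;
}

/*
 * Model of the restart loop for one page. Returns the number of
 * attempts needed to map the whole page, or -1 if max_restarts is hit.
 * With skip_allocated = false every restart begins at block 0 again.
 */
int toy_write_page(bool *mapped, bool skip_allocated, int max_restarts)
{
	for (int attempt = 1; attempt <= max_restarts; attempt++) {
		size_t off = skip_allocated ?
			first_unmapped(mapped, BLOCKS_PER_PAGE) : 0;

		if (off < BLOCKS_PER_PAGE)
			toy_map_blocks(mapped, off, BLOCKS_PER_PAGE - off);
		if (first_unmapped(mapped, BLOCKS_PER_PAGE) == BLOCKS_PER_PAGE)
			return attempt;
	}
	return -1;
}
```

In this model, a page whose first block is already allocated never
makes progress when each restart begins at block 0, while starting at
the first unallocated block completes on the first attempt.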

This hang does not occur with a 4K PAGE_SIZE. When PAGE_SIZE =
blocksize, if m_lblk points at an already allocated block, it will
point at an unallocated block in the next restarted transaction,
because the already mapped block/page is filtered out in the next
restarted transaction via the OBD_BRW_DONE flag in
osd_declare_write_commit().

Change-Id: Iadba0be8875a15a2e2f158ec9571f5ece5637ae0
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/47004
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lustre/osd-ldiskfs/osd_io.c