Whamcloud - gitweb
LU-10472 osd-ldiskfs: T10PI between RPC and BIO 66/32266/15
author Li Xi <lixi@ddn.com>
Sun, 8 Apr 2018 12:21:13 +0000 (08:21 -0400)
committer Oleg Drokin <green@whamcloud.com>
Tue, 6 Nov 2018 07:13:24 +0000 (07:13 +0000)
When the OST receives a bulk write RPC, the T10PI guard tags are
generated while the RPC checksum of a T10PI type is calculated.
The guard tag of each sector is then copied into the BIO integrity
payload so that it does not have to be recalculated.

When the OST reads data from disk, the T10PI guard tags are
copied from the BIO integrity payload. These guard tags are
reused for calculating the RPC checksum of a T10PI type, so
no recalculation of guard tags is needed either.

However, if the data that the client is reading is cached
in memory, the guard tags need to be calculated from the
cached data, since there is no place to attach guard tags
to the page cache on the OSS.
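
For illustration only, the per-sector calculation needed in the cached
case can be sketched in user-space C. This is not the code from this
patch: crc_t10dif() is a plain bitwise implementation of the standard
CRC-16/T10-DIF (polynomial 0x8BB7), and page_guards() is a hypothetical
helper name chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-16/T10-DIF: polynomial 0x8BB7, initial value 0, no reflection. */
static uint16_t crc_t10dif(const unsigned char *buf, size_t len)
{
	uint16_t crc = 0;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= (uint16_t)(buf[i] << 8);
		for (bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000) ?
				(uint16_t)((crc << 1) ^ 0x8BB7) :
				(uint16_t)(crc << 1);
	}
	return crc;
}

/*
 * Hypothetical helper: compute one guard tag per sector of a cached page.
 * Returns the number of guard tags written to guards[].
 */
static size_t page_guards(const unsigned char *page, size_t page_size,
			  size_t sector_size, uint16_t *guards)
{
	size_t n = 0, off;

	assert(sector_size != 0 && page_size % sector_size == 0);
	for (off = 0; off < page_size; off += sector_size)
		guards[n++] = crc_t10dif(page + off, sector_size);
	return n;
}
```

With a 4096-byte page and 512-byte sectors this yields 8 guard tags,
which matches MAX_GUARD_NUMBER (PAGE_SIZE / 512) on such a system.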

Some modifications to the Linux kernel are needed:

1) struct blk_integrity_exchg, which is passed to the integrity
generate/verify methods, now carries a “struct bio *” and bi_idx,
the index of the current bio_vec.

2) bio_integrity_prep accepts optional pointers to integrity
generation/verification methods. The optional methods take
priority over the ones registered by the device.
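
The priority rule in 2) can be sketched in user-space C as follows.
The types and names here (struct exchg, struct payload,
struct device_profile, pick_generate_fn) are simplified stand-ins
invented for the example, not the kernel definitions; only the
"optional method first, device method as fallback" logic is taken
from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct blk_integrity_exchg; records who was called. */
struct exchg {
	int calls;
};

typedef void (gen_fn)(struct exchg *);

/* Stand-in for the methods registered by the block device. */
struct device_profile {
	gen_fn *generate_fn;
};

/* Stand-in for the per-bio payload with an optional override. */
struct payload {
	gen_fn *bip_generate_fn;
};

static void device_generate(struct exchg *x)
{
	x->calls = 1;
}

static void fs_generate(struct exchg *x)
{
	x->calls = 2;
}

/* The optional per-bio method wins; otherwise use the device's method. */
static gen_fn *pick_generate_fn(const struct payload *bip,
				const struct device_profile *bi)
{
	assert(bip != NULL && bi != NULL);
	return bip->bip_generate_fn != NULL ?
		bip->bip_generate_fn : bi->generate_fn;
}
```

Callers that pass no optional method keep the old behaviour, which is
why the unmodified entry point can simply forward NULL for both
pointers.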

These two modifications enable Lustre (and other file systems) to
integrate with BIO for integrity verification/generation. Any private
data needed during data integrity generation/verification can
be attached to bio->bi_private. Instead of calculating guard tags,
the Lustre generation method copies the guard tags from an existing
buffer. And instead of (or in addition to) verifying data integrity,
the Lustre verification method copies the guard tags to an internal
buffer for further use.
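
A generation method of the kind described above can be sketched in
user-space C. This is a simplified illustration, not the function
added by this patch: struct dif_tuple keeps its fields in host byte
order (the real tuple is big-endian), fill_tuples() and demo_sum16()
are hypothetical names, and demo_sum16() is a placeholder checksum
rather than the T10 CRC or IP checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified protection tuple; the real one stores big-endian fields. */
struct dif_tuple {
	uint16_t guard_tag;
	uint16_t app_tag;
	uint32_t ref_tag;
};

typedef uint16_t (csum_fn)(const unsigned char *buf, size_t len);

/* Placeholder checksum for the demo only; NOT the T10 CRC or IP checksum. */
static uint16_t demo_sum16(const unsigned char *buf, size_t len)
{
	uint16_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum = (uint16_t)(sum + buf[i]);
	return sum;
}

/*
 * Fill one tuple per sector. If saved_guards is non-NULL the guard tags
 * are copied from it; otherwise they are computed with fn.
 * Returns the number of tuples written.
 */
static size_t fill_tuples(struct dif_tuple *sdt, const unsigned char *data,
			  size_t data_size, size_t sector_size,
			  uint64_t sector, const uint16_t *saved_guards,
			  csum_fn *fn)
{
	size_t off, n = 0;

	assert(sector_size != 0 && fn != NULL);
	for (off = 0; off < data_size; off += sector_size, n++) {
		if (saved_guards != NULL)
			sdt[n].guard_tag = saved_guards[n];
		else
			sdt[n].guard_tag = fn(data + off, sector_size);
		sdt[n].ref_tag = (uint32_t)(sector & 0xffffffff);
		sdt[n].app_tag = 0;
		sector++;
	}
	return n;
}
```

The copy path never touches the data buffer, which is where the saving
comes from when the guard tags already arrived with the RPC.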

In addition to these changes, two Linux kernel patches are applied:

1) The first problem is that bio_integrity_verify() doesn't verify
the data integrity at all. In that function, after reading the data,
bio->bi_idx will be equal to bio->bi_vcnt because of bio_advance(),
so bio_for_each_segment_all() should be used, not
bio_for_each_segment(). Also, bio_advance() should not advance
the integrity data through bio_integrity_advance() unless the BIO
is being trimmed.
Linux-commit: 63573e359d052e506d305c263576499f06355985

2) The second patch fixes a problem in sd_dif_complete(). When the
sector offset is larger than 2^32, the mapping from the physical
reference tag to the virtual values expected by the block layer will
be wrong.
Linux-commit: c611529e7cd3465ec0eada0f44200e8420c38908
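
The effect of the second fix can be shown with a small user-space
sketch. The two functions below mirror the old and new order of
operations in the sd_dif_complete() hunk (mask then shift versus
shift then mask); the function names are made up for the example and
pos stands for the request position in 512-byte units.

```c
#include <assert.h>
#include <stdint.h>

/* Old order: truncate to 32 bits first, then convert to 4096-byte units. */
static uint32_t phys_ref_old(uint64_t pos, unsigned int sector_sz)
{
	uint32_t phys = (uint32_t)(pos & 0xffffffff);

	if (sector_sz == 4096)
		phys >>= 3;
	return phys;
}

/* New order: convert to 4096-byte units first, then truncate to 32 bits. */
static uint32_t phys_ref_new(uint64_t pos, unsigned int sector_sz)
{
	uint64_t sector = pos;

	assert(sector_sz == 512 || sector_sz == 4096);
	if (sector_sz == 4096)
		sector >>= 3;
	return (uint32_t)(sector & 0xffffffff);
}
```

For pos = 2^32 on a 4096-byte-sector device the old order yields 0,
while the new order yields 0x20000000, the low 32 bits of the actual
4096-byte sector number. Below 2^32 both orders agree.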

Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: Ia6c1d586284b0d9884116e1a753fd88e066366fe
Reviewed-on: https://review.whamcloud.com/32266
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Peter Jones <pjones@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
15 files changed:
lustre/autoconf/lustre-core.m4
lustre/include/lustre_compat.h
lustre/include/obd.h
lustre/include/obd_support.h
lustre/kernel_patches/patches/block-integrity-allow-optional-integrity-functions-rhel7.patch [new file with mode: 0644]
lustre/kernel_patches/patches/block-pass-bio-into-integrity_processing_fn-rhel7.patch [new file with mode: 0644]
lustre/kernel_patches/patches/fix-integrity-verify-rhel7.patch [new file with mode: 0644]
lustre/kernel_patches/patches/fix-sd-dif-complete-rhel7.patch [new file with mode: 0644]
lustre/kernel_patches/series/3.10-rhel7.series
lustre/osd-ldiskfs/Makefile.in
lustre/osd-ldiskfs/osd_handler.c
lustre/osd-ldiskfs/osd_integrity.c [new file with mode: 0644]
lustre/osd-ldiskfs/osd_internal.h
lustre/osd-ldiskfs/osd_io.c
lustre/target/tgt_handler.c

index 0f2ce66..625913e 100644 (file)
@@ -2343,6 +2343,44 @@ EXTRA_KCFLAGS="$tmp_flags"
 ]) # LC_HAVE_XATTR_HANDLER_SIMPLIFIED
 
 #
+# LC_HAVE_BIP_ITER_BIO_INTEGRITY_PAYLOAD
+#
+# 4.3 replace interval with interval_exp in 'struct blk_integrity'.
+#
+AC_DEFUN([LC_HAVE_BIP_ITER_BIO_INTEGRITY_PAYLOAD], [
+LB_CHECK_COMPILE([if 'bio_integrity_payload.bip_iter' exist],
+bio_integrity_payload_bip_iter, [
+       #include <linux/bio.h>
+],[
+       ((struct bio_integrity_payload *)0)->bip_iter.bi_size = 0;
+],[
+       AC_DEFINE(HAVE_BIP_ITER_BIO_INTEGRITY_PAYLOAD, 1,
+               [bio_integrity_payload.bip_iter exist])
+])
+]) # LC_HAVE_BIP_ITER_BIO_INTEGRITY_PAYLOAD
+
+#
+# LC_BIO_INTEGRITY_PREP_FN
+#
+# Lustre kernel patch extends bio_integrity_prep to accept optional
+# generate/verify_fn as extra args.
+#
+AC_DEFUN([LC_BIO_INTEGRITY_PREP_FN], [
+LB_CHECK_COMPILE([if 'bio_integrity_prep_fn' exists],
+bio_integrity_prep_fn, [
+       #include <linux/bio.h>
+],[
+       bio_integrity_prep_fn(NULL, NULL, NULL);
+],[
+       AC_DEFINE(HAVE_BIO_INTEGRITY_PREP_FN, 1,
+               [kernel has bio_integrity_prep_fn])
+       AC_SUBST(PATCHED_INTEGRITY_INTF)
+],[
+       AC_SUBST(PATCHED_INTEGRITY_INTF, [#])
+])
+]) # LC_BIO_INTEGRITY_PREP_FN
+
+#
 # LC_HAVE_LOCKS_LOCK_FILE_WAIT
 #
 # 4.4 kernel have moved locks API users to
@@ -3093,6 +3131,7 @@ AC_DEFUN([LC_PROG_LINUX], [
 
        # 4.3
        LC_HAVE_INTERVAL_EXP_BLK_INTEGRITY
+       LC_HAVE_BIP_ITER_BIO_INTEGRITY_PAYLOAD
        LC_HAVE_CACHE_HEAD_HLIST
        LC_HAVE_XATTR_HANDLER_SIMPLIFIED
 
@@ -3153,6 +3192,9 @@ AC_DEFUN([LC_PROG_LINUX], [
        LC_PAGEVEC_INIT_ONE_PARAM
        LC_BI_BDEV
 
+       # kernel patch to extend integrity interface
+       LC_BIO_INTEGRITY_PREP_FN
+
        #
        AS_IF([test "x$enable_server" != xno], [
                LC_STACK_SIZE
index 7ca150b..66a5bb7 100644 (file)
@@ -178,6 +178,12 @@ static inline void ll_set_fs_pwd(struct fs_struct *fs, struct vfsmount *mnt,
 #define bvl_to_page(bvl)               (bvl->bv_page)
 #endif
 
+#ifdef HAVE_BVEC_ITER
+#define bio_start_sector(bio) (bio->bi_iter.bi_sector)
+#else
+#define bio_start_sector(bio) (bio->bi_sector)
+#endif
+
 #ifndef HAVE_BLK_QUEUE_MAX_SEGMENTS
 #define blk_queue_max_segments(rq, seg)                      \
         do { blk_queue_max_phys_segments(rq, seg);           \
@@ -724,4 +730,49 @@ static inline const char *blk_integrity_name(struct blk_integrity *bi)
 }
 #endif
 
+static inline unsigned int bip_size(struct bio_integrity_payload *bip)
+{
+#ifdef HAVE_BIP_ITER_BIO_INTEGRITY_PAYLOAD
+       return bip->bip_iter.bi_size;
+#else
+       return bip->bip_size;
+#endif
+}
+
+#ifndef INTEGRITY_FLAG_READ
+#define INTEGRITY_FLAG_READ BLK_INTEGRITY_VERIFY
+#endif
+
+#ifndef INTEGRITY_FLAG_WRITE
+#define INTEGRITY_FLAG_WRITE BLK_INTEGRITY_GENERATE
+#endif
+
+static inline bool bdev_integrity_enabled(struct block_device *bdev, int rw)
+{
+       struct blk_integrity *bi = bdev_get_integrity(bdev);
+
+       if (bi == NULL)
+               return false;
+
+#ifdef HAVE_INTERVAL_EXP_BLK_INTEGRITY
+       if (rw == 0 && bi->profile->verify_fn != NULL &&
+           (bi->flags & INTEGRITY_FLAG_READ))
+               return true;
+
+       if (rw == 1 && bi->profile->generate_fn != NULL &&
+           (bi->flags & INTEGRITY_FLAG_WRITE))
+               return true;
+#else
+       if (rw == 0 && bi->verify_fn != NULL &&
+           (bi->flags & INTEGRITY_FLAG_READ))
+               return true;
+
+       if (rw == 1 && bi->generate_fn != NULL &&
+           (bi->flags & INTEGRITY_FLAG_WRITE))
+               return true;
+#endif
+
+       return false;
+}
+
 #endif /* _LUSTRE_COMPAT_H */
index d005aec..200a0f3 100644 (file)
@@ -451,6 +451,9 @@ struct lmv_obd {
        struct kobject          *lmv_tgts_kobj;
 };
 
+/* Minimum sector size is 512 */
+#define MAX_GUARD_NUMBER (PAGE_SIZE / 512)
+
 struct niobuf_local {
        __u64           lnb_file_offset;
        __u32           lnb_page_offset;
@@ -459,6 +462,9 @@ struct niobuf_local {
        int             lnb_rc;
        struct page     *lnb_page;
        void            *lnb_data;
+       __u16           lnb_guards[MAX_GUARD_NUMBER];
+       __u16           lnb_guard_rpc:1;
+       __u16           lnb_guard_disk:1;
 };
 
 struct tgt_thread_big_cache {
index f84c3a5..6cbdad0 100644 (file)
@@ -335,6 +335,7 @@ extern char obd_jobid_var[];
 #define OBD_FAIL_OST_SKIP_LV_CHECK      0x241
 #define OBD_FAIL_OST_STATFS_DELAY       0x242
 #define OBD_FAIL_OST_INTEGRITY_FAULT    0x243
+#define OBD_FAIL_OST_INTEGRITY_CMP      0x244
 
 #define OBD_FAIL_LDLM                    0x300
 #define OBD_FAIL_LDLM_NAMESPACE_NEW      0x301
diff --git a/lustre/kernel_patches/patches/block-integrity-allow-optional-integrity-functions-rhel7.patch b/lustre/kernel_patches/patches/block-integrity-allow-optional-integrity-functions-rhel7.patch
new file mode 100644 (file)
index 0000000..1adc032
--- /dev/null
@@ -0,0 +1,232 @@
+This adds optional integrity functions for a given bio; they are
+passed to bio_integrity_prep and initialized in
+bio_integrity_payload.
+The optional integrity generate/verify functions take priority
+over the ones registered on the block device.
+
+It brings flexibility to bio integrity handling. e.g. a network
+filesystem with integrity support would have integrity
+generation happen on the clients, and send them over the wire.
+On the server side once we receive the integrity bits and pass
+the network layer checksums we would merely pass it on to the
+block devices have integrity support, so we don't have to
+calculate the integrity again.
+Verification shares the same principle: on the server we just
+copy the integrity bits from the device and send them through
+the wire, then the verification happens on the clients.
+
+Index: linux-3.10.0-862.9.1.el7/fs/bio-integrity.c
+===================================================================
+--- linux-3.10.0-862.9.1.el7.orig/fs/bio-integrity.c
++++ linux-3.10.0-862.9.1.el7/fs/bio-integrity.c
+@@ -38,7 +38,7 @@ void blk_flush_integrity(void)
+ }
+ /**
+- * bio_integrity_alloc - Allocate integrity payload and attach it to bio
++ * bio_integrity_alloc_fn - Allocate integrity payload and attach it to bio
+  * @bio:      bio to attach integrity metadata to
+  * @gfp_mask: Memory allocation mask
+  * @nr_vecs:  Number of integrity metadata scatter-gather elements
+@@ -47,9 +47,11 @@ void blk_flush_integrity(void)
+  * metadata.  nr_vecs specifies the maximum number of pages containing
+  * integrity metadata that can be attached.
+  */
+-struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
+-                                                gfp_t gfp_mask,
+-                                                unsigned int nr_vecs)
++struct bio_integrity_payload *bio_integrity_alloc_fn(struct bio *bio,
++                                                   gfp_t gfp_mask,
++                                                   unsigned int nr_vecs,
++                                                   integrity_gen_fn *generate_fn,
++                                                   integrity_vrfy_fn *verify_fn)
+ {
+       struct bio_integrity_payload *bip;
+       struct bio_set *bs = bio->bi_pool;
+@@ -81,6 +83,8 @@ struct bio_integrity_payload *bio_integr
+       bip->bip_slab = idx;
+       bip->bip_bio = bio;
++      bip->bip_generate_fn = generate_fn;
++      bip->bip_verify_fn = verify_fn;
+       bio->bi_integrity = bip;
+       return bip;
+@@ -88,7 +92,7 @@ err:
+       mempool_free(bip, bs->bio_integrity_pool);
+       return NULL;
+ }
+-EXPORT_SYMBOL(bio_integrity_alloc);
++EXPORT_SYMBOL(bio_integrity_alloc_fn);
+ /**
+  * bio_integrity_free - Free bio integrity payload
+@@ -312,10 +316,13 @@ static void bio_integrity_generate(struc
+ {
+       struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
+       struct blk_integrity_exchg bix;
++      struct bio_integrity_payload *bip = bio->bi_integrity;
+       struct bio_vec *bv;
+       sector_t sector = bio->bi_sector;
+       unsigned int i, sectors, total;
+       void *prot_buf = bio->bi_integrity->bip_buf;
++      integrity_gen_fn *generate_fn = bip->bip_generate_fn ?:
++                                      bi->generate_fn;
+       total = 0;
+       bix.disk_name = bio->bi_bdev->bd_disk->disk_name;
+@@ -328,7 +335,7 @@ static void bio_integrity_generate(struc
+               bix.prot_buf = prot_buf;
+               bix.sector = sector;
+-              bi->generate_fn(&bix);
++              generate_fn(&bix);
+               sectors = bv->bv_len / bi->sector_size;
+               sector += sectors;
+@@ -349,7 +356,7 @@ static inline unsigned short blk_integri
+ }
+ /**
+- * bio_integrity_prep - Prepare bio for integrity I/O
++ * bio_integrity_prep_fn - Prepare bio for integrity I/O
+  * @bio:      bio to prepare
+  *
+  * Description: Allocates a buffer for integrity metadata, maps the
+@@ -359,7 +366,8 @@ static inline unsigned short blk_integri
+  * block device's integrity function.  In the READ case, the buffer
+  * will be prepared for DMA and a suitable end_io handler set up.
+  */
+-int bio_integrity_prep(struct bio *bio)
++int bio_integrity_prep_fn(struct bio *bio, integrity_gen_fn *generate_fn,
++                        integrity_vrfy_fn *verify_fn)
+ {
+       struct bio_integrity_payload *bip;
+       struct blk_integrity *bi;
+@@ -390,7 +398,8 @@ int bio_integrity_prep(struct bio *bio)
+       nr_pages = end - start;
+       /* Allocate bio integrity payload and integrity vectors */
+-      bip = bio_integrity_alloc(bio, GFP_NOIO, nr_pages);
++      bip = bio_integrity_alloc_fn(bio, GFP_NOIO, nr_pages,
++                                   generate_fn, verify_fn);
+       if (unlikely(bip == NULL)) {
+               printk(KERN_ERR "could not allocate data integrity bioset\n");
+               kfree(buf);
+@@ -440,7 +449,7 @@ int bio_integrity_prep(struct bio *bio)
+       return 0;
+ }
+-EXPORT_SYMBOL(bio_integrity_prep);
++EXPORT_SYMBOL(bio_integrity_prep_fn);
+ /**
+  * bio_integrity_verify - Verify integrity metadata for a bio
+@@ -454,10 +463,13 @@ static int bio_integrity_verify(struct b
+ {
+       struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
+       struct blk_integrity_exchg bix;
++      struct bio_integrity_payload *bip = bio->bi_integrity;
+       struct bio_vec *bv;
+       sector_t sector = bio->bi_integrity->bip_sector;
+       unsigned int i, sectors, total, ret;
+       void *prot_buf = bio->bi_integrity->bip_buf;
++      integrity_vrfy_fn *verify_fn = bip->bip_verify_fn ?:
++                                      bi->verify_fn;
+       ret = total = 0;
+       bix.disk_name = bio->bi_bdev->bd_disk->disk_name;
+@@ -474,7 +486,7 @@ static int bio_integrity_verify(struct b
+               bix.prot_buf = prot_buf;
+               bix.sector = sector;
+-              ret = bi->verify_fn(&bix);
++              ret = verify_fn(&bix);
+               if (ret) {
+                       kunmap_atomic(kaddr);
+@@ -711,7 +723,9 @@ int bio_integrity_clone(struct bio *bio,
+       BUG_ON(bip_src == NULL);
+-      bip = bio_integrity_alloc(bio, gfp_mask, bip_src->bip_vcnt);
++      bip = bio_integrity_alloc_fn(bio, gfp_mask, bip_src->bip_vcnt,
++                                   bip_src->bip_generate_fn,
++                                   bip_src->bip_verify_fn);
+       if (bip == NULL)
+               return -EIO;
+Index: linux-3.10.0-862.9.1.el7/include/linux/bio.h
+===================================================================
+--- linux-3.10.0-862.9.1.el7.orig/include/linux/bio.h
++++ linux-3.10.0-862.9.1.el7/include/linux/bio.h
+@@ -194,6 +194,9 @@ struct bio_integrity_payload {
+       struct work_struct      bip_work;       /* I/O completion */
++      integrity_gen_fn        *bip_generate_fn;
++      integrity_vrfy_fn       *bip_verify_fn;
++
+       struct bio_vec          *bip_vec;
+       struct bio_vec          bip_inline_vecs[0];/* embedded bvec array */
+ };
+@@ -617,13 +620,28 @@ struct biovec_slab {
+ #define bio_integrity(bio) (bio->bi_integrity != NULL)
+-extern struct bio_integrity_payload *bio_integrity_alloc(struct bio *, gfp_t, unsigned int);
++extern struct bio_integrity_payload *bio_integrity_alloc_fn(struct bio *, gfp_t,
++                                                          unsigned int,
++                                                          integrity_gen_fn *,
++                                                          integrity_vrfy_fn *);
++static inline struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
++                                                              gfp_t gfp,
++                                                              unsigned int nr)
++{
++      return bio_integrity_alloc_fn(bio, gfp, nr, NULL, NULL);
++}
+ extern void bio_integrity_free(struct bio *);
+ extern int bio_integrity_add_page(struct bio *, struct page *, unsigned int, unsigned int);
+ extern int bio_integrity_enabled(struct bio *bio);
+ extern int bio_integrity_set_tag(struct bio *, void *, unsigned int);
+ extern int bio_integrity_get_tag(struct bio *, void *, unsigned int);
+-extern int bio_integrity_prep(struct bio *);
++extern int bio_integrity_prep_fn(struct bio *,
++                               integrity_gen_fn *,
++                               integrity_vrfy_fn *);
++static inline int bio_integrity_prep(struct bio *bio)
++{
++      return bio_integrity_prep_fn(bio, NULL, NULL);
++}
+ extern void bio_integrity_endio(struct bio *, int);
+ extern void bio_integrity_advance(struct bio *, unsigned int);
+ extern void bio_integrity_trim(struct bio *, unsigned int, unsigned int);
+Index: linux-3.10.0-862.9.1.el7/include/linux/blk_types.h
+===================================================================
+--- linux-3.10.0-862.9.1.el7.orig/include/linux/blk_types.h
++++ linux-3.10.0-862.9.1.el7/include/linux/blk_types.h
+@@ -16,8 +16,11 @@ struct page;
+ struct block_device;
+ struct io_context;
+ struct cgroup_subsys_state;
++struct blk_integrity_exchg;
+ typedef void (bio_end_io_t) (struct bio *, int);
+ typedef void (bio_destructor_t) (struct bio *);
++typedef void (integrity_gen_fn) (struct blk_integrity_exchg *);
++typedef int (integrity_vrfy_fn) (struct blk_integrity_exchg *);
+ /*
+  * was unsigned short, but we might as well be ready for > 64kB I/O pages
+Index: linux-3.10.0-862.9.1.el7/include/linux/blkdev.h
+===================================================================
+--- linux-3.10.0-862.9.1.el7.orig/include/linux/blkdev.h
++++ linux-3.10.0-862.9.1.el7/include/linux/blkdev.h
+@@ -1702,8 +1702,6 @@ struct blk_integrity_exchg {
+       const char              *disk_name;
+ };
+-typedef void (integrity_gen_fn) (struct blk_integrity_exchg *);
+-typedef int (integrity_vrfy_fn) (struct blk_integrity_exchg *);
+ typedef void (integrity_set_tag_fn) (void *, void *, unsigned int);
+ typedef void (integrity_get_tag_fn) (void *, void *, unsigned int);
diff --git a/lustre/kernel_patches/patches/block-pass-bio-into-integrity_processing_fn-rhel7.patch b/lustre/kernel_patches/patches/block-pass-bio-into-integrity_processing_fn-rhel7.patch
new file mode 100644 (file)
index 0000000..610374c
--- /dev/null
@@ -0,0 +1,41 @@
+Having struct bio allows us to do more in the generate/verify_fn,
+like copying a known good guard tag already available rather than
+calculating it.
+
+Index: linux-3.10.0-862.9.1.el7/fs/bio-integrity.c
+===================================================================
+--- linux-3.10.0-862.9.1.el7.orig/fs/bio-integrity.c
++++ linux-3.10.0-862.9.1.el7/fs/bio-integrity.c
+@@ -334,6 +334,8 @@ static void bio_integrity_generate(struc
+               bix.data_size = bv->bv_len;
+               bix.prot_buf = prot_buf;
+               bix.sector = sector;
++              bix.bio = bio;
++              bix.bi_idx = i;
+               generate_fn(&bix);
+@@ -485,6 +487,8 @@ static int bio_integrity_verify(struct b
+               bix.data_size = bv->bv_len;
+               bix.prot_buf = prot_buf;
+               bix.sector = sector;
++              bix.bio = bio;
++              bix.bi_idx = i;
+               ret = verify_fn(&bix);
+Index: linux-3.10.0-862.9.1.el7/include/linux/blkdev.h
+===================================================================
+--- linux-3.10.0-862.9.1.el7.orig/include/linux/blkdev.h
++++ linux-3.10.0-862.9.1.el7/include/linux/blkdev.h
+@@ -1696,8 +1696,10 @@ static inline uint64_t rq_io_start_time_
+ struct blk_integrity_exchg {
+       void                    *prot_buf;
+       void                    *data_buf;
++      struct bio              *bio;
+       sector_t                sector;
+       unsigned int            data_size;
++      unsigned int            bi_idx;
+       unsigned short          sector_size;
+       const char              *disk_name;
+ };
diff --git a/lustre/kernel_patches/patches/fix-integrity-verify-rhel7.patch b/lustre/kernel_patches/patches/fix-integrity-verify-rhel7.patch
new file mode 100644 (file)
index 0000000..fba6736
--- /dev/null
@@ -0,0 +1,50 @@
+bio_integrity_verify() doesn't verify the data integrity at all.
+In that function, after reading the data, bio->bi_idx will be
+equal to bio->bi_vcnt because of bio_advance(),
+so bio_for_each_segment_all() should be used, not
+bio_for_each_segment().
+bio_advance() should not advance the integrity data through
+bio_integrity_advance() unless the BIO is being trimmed.
+Linux-commit: 63573e359d052e506d305c263576499f06355985
+
+Index: linux-3.10.0-693.21.1.el7.x86_64/fs/bio.c
+===================================================================
+--- linux-3.10.0-693.21.1.el7.x86_64.orig/fs/bio.c
++++ linux-3.10.0-693.21.1.el7.x86_64/fs/bio.c
+@@ -870,9 +870,6 @@ EXPORT_SYMBOL(submit_bio_wait);
+  */
+ void bio_advance(struct bio *bio, unsigned bytes)
+ {
+-      if (bio_integrity(bio))
+-              bio_integrity_advance(bio, bytes);
+-
+       bio->bi_sector += bytes >> 9;
+       bio->bi_size -= bytes;
+@@ -1973,6 +1970,9 @@ void bio_trim(struct bio *bio, int offse
+       clear_bit(BIO_SEG_VALID, &bio->bi_flags);
++      if (bio_integrity(bio))
++              bio_integrity_advance(bio, offset << 9);
++
+       bio_advance(bio, offset << 9);
+       bio->bi_size = size;
+Index: linux-3.10.0-693.21.1.el7.x86_64/fs/bio-integrity.c
+===================================================================
+--- linux-3.10.0-693.21.1.el7.x86_64.orig/fs/bio-integrity.c
++++ linux-3.10.0-693.21.1.el7.x86_64/fs/bio-integrity.c
+@@ -463,7 +463,11 @@ static int bio_integrity_verify(struct b
+       bix.disk_name = bio->bi_bdev->bd_disk->disk_name;
+       bix.sector_size = bi->sector_size;
+-      bio_for_each_segment(bv, bio, i) {
++      /*
++       * bio->bi_idx might be equal to bio->bi_vcnt after __bio_advance(),
++       * So use bio_for_each_segment_all() not bio_for_each_segment().
++       */
++      bio_for_each_segment_all(bv, bio, i) {
+               void *kaddr = kmap_atomic(bv->bv_page);
+               bix.data_buf = kaddr + bv->bv_offset;
+               bix.data_size = bv->bv_len;
diff --git a/lustre/kernel_patches/patches/fix-sd-dif-complete-rhel7.patch b/lustre/kernel_patches/patches/fix-sd-dif-complete-rhel7.patch
new file mode 100644 (file)
index 0000000..43703fd
--- /dev/null
@@ -0,0 +1,30 @@
+When sector offset is larger than 2^32, the mapping from physical
+reference tag to the virtual values expected by block layer will be
+wrong.
+Linux-commit: c611529e7cd3465ec0eada0f44200e8420c38908
+
+Index: linux-3.10.0-693.21.1.el7.x86_64/drivers/scsi/sd_dif.c
+===================================================================
+--- linux-3.10.0-693.21.1.el7.x86_64.orig/drivers/scsi/sd_dif.c
++++ linux-3.10.0-693.21.1.el7.x86_64/drivers/scsi/sd_dif.c
+@@ -416,6 +416,7 @@ void sd_dif_complete(struct scsi_cmnd *s
+       struct sd_dif_tuple *sdt;
+       unsigned int i, j, sectors, sector_sz;
+       u32 phys, virt;
++      sector_t sector;
+       sdkp = scsi_disk(scmd->request->rq_disk);
+@@ -425,9 +426,10 @@ void sd_dif_complete(struct scsi_cmnd *s
+       sector_sz = scmd->device->sector_size;
+       sectors = good_bytes / sector_sz;
+-      phys = blk_rq_pos(scmd->request) & 0xffffffff;
++      sector = blk_rq_pos(scmd->request);
+       if (sector_sz == 4096)
+-              phys >>= 3;
++              sector >>= 3;
++      phys = sector & 0xffffffff;
+       __rq_for_each_bio(bio, scmd->request) {
+               struct bio_vec *iv;
index 1a1b7bd..880e819 100644 (file)
@@ -2,3 +2,7 @@ raid5-mmp-unplug-dev-3.9.patch
 dev_read_only-3.7.patch
 blkdev_tunables-3.9.patch
 vfs-project-quotas-rhel7.patch
+fix-integrity-verify-rhel7.patch
+fix-sd-dif-complete-rhel7.patch
+block-integrity-allow-optional-integrity-functions-rhel7.patch
+block-pass-bio-into-integrity_processing_fn-rhel7.patch
index a7ffd31..3dddef0 100644 (file)
@@ -3,6 +3,8 @@ osd_ldiskfs-objs = osd_handler.o osd_oi.o osd_lproc.o osd_iam.o \
                   osd_iam_lfix.o osd_iam_lvar.o osd_io.o osd_compat.o \
                   osd_scrub.o osd_dynlocks.o osd_quota.o osd_quota_fmt.o
 
+@PATCHED_INTEGRITY_INTF@osd_ldiskfs-objs += osd_integrity.o
+
 EXTRA_PRE_CFLAGS := -I@LINUX@/fs -I@abs_top_builddir@ -I@abs_top_builddir@/ldiskfs
 
 @INCLUDE_RULES@
index 86c6fbf..f6174ef 100644 (file)
@@ -2314,22 +2314,32 @@ static void osd_conf_get(const struct lu_env *env,
                            sizeof("T10-DIF-TYPE") - 1) == 0) {
                        /* also skip "1/3-" at end */
                        const int type_off = sizeof("T10-DIF-TYPE.");
+                       char type_number = name[type_off - 2];
 
 
-                       if (interval != 512 && interval != 4096)
+                       if (interval != 512 && interval != 4096) {
                                CERROR("%s: unsupported T10PI sector size %u\n",
                                       d->od_svname, interval);
-                       else if (strcmp(name + type_off, "CRC") == 0)
+                       } else if (type_number != '1' && type_number != '3') {
+                               CERROR("%s: unsupported T10PI type %s\n",
+                                      d->od_svname, name);
+                       } else if (strcmp(name + type_off, "CRC") == 0) {
+                               d->od_t10_type = type_number == '1' ?
+                                       OSD_T10_TYPE1_CRC : OSD_T10_TYPE3_CRC;
                                param->ddp_t10_cksum_type = interval == 512 ?
                                        OBD_CKSUM_T10CRC512 :
                                        OBD_CKSUM_T10CRC4K;
-                       else if (strcmp(name + type_off, "IP") == 0)
+                       } else if (strcmp(name + type_off, "IP") == 0) {
+                               d->od_t10_type = type_number == '1' ?
+                                       OSD_T10_TYPE1_IP : OSD_T10_TYPE3_IP;
                                param->ddp_t10_cksum_type = interval == 512 ?
                                        OBD_CKSUM_T10IP512 :
                                        OBD_CKSUM_T10IP4K;
-                       else
+                       } else {
                                CERROR("%s: unsupported checksum type of "
                                       "T10PI type '%s'",
                                       d->od_svname, name);
+                       }
+
                } else {
                        CERROR("%s: unsupported T10PI type '%s'",
                               d->od_svname, name);
@@ -7328,6 +7338,7 @@ static void osd_key_fini(const struct lu_context *ctx,
        OBD_FREE(info->oti_it_ea_buf, OSD_IT_EA_BUFSIZE);
        lu_buf_free(&info->oti_iobuf.dr_pg_buf);
        lu_buf_free(&info->oti_iobuf.dr_bl_buf);
+       lu_buf_free(&info->oti_iobuf.dr_lnb_buf);
        lu_buf_free(&info->oti_big_buf);
        if (idc != NULL) {
                LASSERT(info->oti_ins_cache_size > 0);
@@ -7692,6 +7703,7 @@ static int osd_device_init0(const struct lu_env *env,
        INIT_LIST_HEAD(&o->od_index_restore_list);
        spin_lock_init(&o->od_lock);
        o->od_index_backup_policy = LIBP_NONE;
+       o->od_t10_type = 0;
 
        o->od_read_cache = 1;
        o->od_writethrough_cache = 1;
diff --git a/lustre/osd-ldiskfs/osd_integrity.c b/lustre/osd-ldiskfs/osd_integrity.c
new file mode 100644 (file)
index 0000000..bb2ce93
--- /dev/null
@@ -0,0 +1,264 @@
+/*
+ * GPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 only,
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License version 2 for more details (a copy is included
+ * in the LICENSE file that accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 along with this program; If not, see
+ * http://www.gnu.org/licenses/gpl-2.0.html
+ *
+ * GPL HEADER END
+ */
+/*
+ * Copyright (c) 2018, DataDirect Networks Storage.
+ * Author: Li Xi.
+ *
+ * Data integrity functions for OSD
+ * Code copied from kernel 3.10.0-862.el7
+ * drivers/scsi/sd_dif.c and block/t10-pi.c
+ */
+#include <linux/blkdev.h>
+#include <linux/blk_types.h>
+
+#include <obd_cksum.h>
+#include <lustre_compat.h>
+
+#include "osd_internal.h"
+
+/*
+ * Data Integrity Field tuple.
+ */
+struct sd_dif_tuple {
+       __be16 guard_tag;        /* Checksum */
+       __be16 app_tag;          /* Opaque storage */
+       __be32 ref_tag;          /* Target LBA or indirect LBA */
+};
+
+/*
+ * Type 1 and Type 2 protection use the same format: 16 bit guard tag,
+ * 16 bit app tag, 32 bit reference tag.
+ */
+static void osd_dif_type1_generate(struct blk_integrity_exchg *bix,
+                                  obd_dif_csum_fn *fn)
+{
+       void *buf = bix->data_buf;
+       struct sd_dif_tuple *sdt = bix->prot_buf;
+       struct bio *bio = bix->bio;
+       struct osd_bio_private *bio_private = bio->bi_private;
+       struct osd_iobuf *iobuf = bio_private->obp_iobuf;
+       int index = bio_private->obp_start_page_idx + bix->bi_idx;
+       struct niobuf_local *lnb = iobuf->dr_lnbs[index];
+       __u16 *guard_buf = lnb->lnb_guards;
+       sector_t sector = bix->sector;
+       unsigned int i;
+
+       for (i = 0 ; i < bix->data_size ; i += bix->sector_size, sdt++) {
+               if (lnb->lnb_guard_rpc) {
+                       sdt->guard_tag = *guard_buf;
+                       guard_buf++;
+               } else
+                       sdt->guard_tag = fn(buf, bix->sector_size);
+               sdt->ref_tag = cpu_to_be32(sector & 0xffffffff);
+               sdt->app_tag = 0;
+
+               buf += bix->sector_size;
+               sector++;
+       }
+}
+
+static void osd_dif_type1_generate_crc(struct blk_integrity_exchg *bix)
+{
+       osd_dif_type1_generate(bix, obd_dif_crc_fn);
+}
+
+static void osd_dif_type1_generate_ip(struct blk_integrity_exchg *bix)
+{
+       osd_dif_type1_generate(bix, obd_dif_ip_fn);
+}
+
+static int osd_dif_type1_verify(struct blk_integrity_exchg *bix,
+                               obd_dif_csum_fn *fn)
+{
+       void *buf = bix->data_buf;
+       struct sd_dif_tuple *sdt = bix->prot_buf;
+       struct bio *bio = bix->bio;
+       struct osd_bio_private *bio_private = bio->bi_private;
+       struct osd_iobuf *iobuf = bio_private->obp_iobuf;
+       int index = bio_private->obp_start_page_idx + bix->bi_idx;
+       struct niobuf_local *lnb = iobuf->dr_lnbs[index];
+       __u16 *guard_buf = lnb->lnb_guards;
+       sector_t sector = bix->sector;
+       unsigned int i;
+       __u16 csum;
+
+       for (i = 0 ; i < bix->data_size ; i += bix->sector_size, sdt++) {
+               /* Unwritten sectors */
+               if (sdt->app_tag == 0xffff)
+                       return 0;
+
+               if (be32_to_cpu(sdt->ref_tag) != (sector & 0xffffffff)) {
+                       CERROR("%s: ref tag error on sector %lu (rcvd %u)\n",
+                              bix->disk_name, (unsigned long)sector,
+                              be32_to_cpu(sdt->ref_tag));
+                       return -EIO;
+               }
+
+               csum = fn(buf, bix->sector_size);
+
+               if (sdt->guard_tag != csum) {
+                       CERROR("%s: guard tag error on sector %lu " \
+                              "(rcvd %04x, data %04x)\n", bix->disk_name,
+                              (unsigned long)sector,
+                              be16_to_cpu(sdt->guard_tag), be16_to_cpu(csum));
+                       return -EIO;
+               }
+
+               *guard_buf = csum;
+               guard_buf++;
+
+               buf += bix->sector_size;
+               sector++;
+       }
+
+       lnb->lnb_guard_disk = 1;
+       return 0;
+}
+
+static int osd_dif_type1_verify_crc(struct blk_integrity_exchg *bix)
+{
+       return osd_dif_type1_verify(bix, obd_dif_crc_fn);
+}
+
+static int osd_dif_type1_verify_ip(struct blk_integrity_exchg *bix)
+{
+       return osd_dif_type1_verify(bix, obd_dif_ip_fn);
+}
+
+/*
+ * Type 3 protection has a 16-bit guard tag and 16 + 32 bits of opaque
+ * tag space.
+ */
+static void osd_dif_type3_generate(struct blk_integrity_exchg *bix,
+                                  obd_dif_csum_fn *fn)
+{
+       void *buf = bix->data_buf;
+       struct sd_dif_tuple *sdt = bix->prot_buf;
+       struct bio *bio = bix->bio;
+       struct osd_bio_private *bio_private = bio->bi_private;
+       struct osd_iobuf *iobuf = bio_private->obp_iobuf;
+       int index = bio_private->obp_start_page_idx + bix->bi_idx;
+       struct niobuf_local *lnb = iobuf->dr_lnbs[index];
+       __u16 *guard_buf = lnb->lnb_guards;
+       unsigned int i;
+
+       for (i = 0 ; i < bix->data_size ; i += bix->sector_size, sdt++) {
+               if (lnb->lnb_guard_rpc) {
+                       sdt->guard_tag = *guard_buf;
+                       guard_buf++;
+               } else
+                       sdt->guard_tag = fn(buf, bix->sector_size);
+               sdt->ref_tag = 0;
+               sdt->app_tag = 0;
+
+               buf += bix->sector_size;
+       }
+}
+
+static void osd_dif_type3_generate_crc(struct blk_integrity_exchg *bix)
+{
+       osd_dif_type3_generate(bix, obd_dif_crc_fn);
+}
+
+static void osd_dif_type3_generate_ip(struct blk_integrity_exchg *bix)
+{
+       osd_dif_type3_generate(bix, obd_dif_ip_fn);
+}
+
+static int osd_dif_type3_verify(struct blk_integrity_exchg *bix,
+                               obd_dif_csum_fn *fn)
+{
+       void *buf = bix->data_buf;
+       struct sd_dif_tuple *sdt = bix->prot_buf;
+       struct bio *bio = bix->bio;
+       struct osd_bio_private *bio_private = bio->bi_private;
+       struct osd_iobuf *iobuf = bio_private->obp_iobuf;
+       int index = bio_private->obp_start_page_idx + bix->bi_idx;
+       struct niobuf_local *lnb = iobuf->dr_lnbs[index];
+       __u16 *guard_buf = lnb->lnb_guards;
+       sector_t sector = bix->sector;
+       unsigned int i;
+       __u16 csum;
+
+       for (i = 0 ; i < bix->data_size ; i += bix->sector_size, sdt++) {
+               /* Unwritten sectors */
+               if (sdt->app_tag == 0xffff && sdt->ref_tag == 0xffffffff)
+                       return 0;
+
+               csum = fn(buf, bix->sector_size);
+
+               if (sdt->guard_tag != csum) {
+                       CERROR("%s: guard tag error on sector %lu " \
+                              "(rcvd %04x, data %04x)\n", bix->disk_name,
+                              (unsigned long)sector,
+                              be16_to_cpu(sdt->guard_tag), be16_to_cpu(csum));
+                       return -EIO;
+               }
+
+               *guard_buf = csum;
+               guard_buf++;
+
+               buf += bix->sector_size;
+               sector++;
+       }
+
+       lnb->lnb_guard_disk = 1;
+       return 0;
+}
+
+static int osd_dif_type3_verify_crc(struct blk_integrity_exchg *bix)
+{
+       return osd_dif_type3_verify(bix, obd_dif_crc_fn);
+}
+
+static int osd_dif_type3_verify_ip(struct blk_integrity_exchg *bix)
+{
+       return osd_dif_type3_verify(bix, obd_dif_ip_fn);
+}
+
+int osd_get_integrity_profile(struct osd_device *osd,
+                             integrity_gen_fn **generate_fn,
+                             integrity_vrfy_fn **verify_fn)
+{
+       switch (osd->od_t10_type) {
+       case OSD_T10_TYPE1_CRC:
+               *verify_fn = osd_dif_type1_verify_crc;
+               *generate_fn = osd_dif_type1_generate_crc;
+               break;
+       case OSD_T10_TYPE3_CRC:
+               *verify_fn = osd_dif_type3_verify_crc;
+               *generate_fn = osd_dif_type3_generate_crc;
+               break;
+       case OSD_T10_TYPE1_IP:
+               *verify_fn = osd_dif_type1_verify_ip;
+               *generate_fn = osd_dif_type1_generate_ip;
+               break;
+       case OSD_T10_TYPE3_IP:
+               *verify_fn = osd_dif_type3_verify_ip;
+               *generate_fn = osd_dif_type3_generate_ip;
+               break;
+       default:
+               return -ENOTSUPP;
+       }
+
+       return 0;
+}
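
Reviewer note on the generate/verify pairs above: the following is a rough user-space sketch, not part of the patch, of the Type 1 logic. The generate side either reuses guard tags that already exist (as the patch does when `lnb_guard_rpc` is set) or computes them. The verify side checks the reference and guard tags and saves the guards for later reuse (as the patch does before setting `lnb_guard_disk`). All names here (`dif_tuple`, `crc_t10dif`, `dif_type1_generate`, `dif_type1_verify`) are hypothetical. The bitwise CRC-16/T10-DIF routine (polynomial 0x8BB7) is an assumption standing in for whatever `obd_dif_crc_fn` computes. For simplicity the guards are kept in host byte order, whereas the patch copies them in their on-disk form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 8-byte protection tuple, stored big-endian byte by byte (hypothetical) */
struct dif_tuple {
	uint8_t guard[2];	/* checksum of the sector */
	uint8_t app[2];		/* opaque application tag */
	uint8_t ref[4];		/* low 32 bits of the target LBA */
};

/* Bitwise CRC-16/T10-DIF: polynomial 0x8BB7, initial value 0, no reflection */
static uint16_t crc_t10dif(const uint8_t *buf, size_t len)
{
	uint16_t crc = 0;

	for (size_t i = 0; i < len; i++) {
		crc ^= (uint16_t)((uint16_t)buf[i] << 8);
		for (int b = 0; b < 8; b++)
			crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
					     : (uint16_t)(crc << 1);
	}
	return crc;
}

/* Fill one tuple per sector; reuse rpc_guards when given, else compute */
static void dif_type1_generate(const uint8_t *data, unsigned int nsectors,
			       size_t sector_size, uint64_t lba,
			       const uint16_t *rpc_guards,
			       struct dif_tuple *t)
{
	assert(sector_size > 0);
	for (unsigned int i = 0; i < nsectors; i++, lba++) {
		uint16_t g = rpc_guards ? rpc_guards[i] :
			crc_t10dif(data + i * sector_size, sector_size);
		uint32_t ref = (uint32_t)(lba & 0xffffffff);

		t[i].guard[0] = (uint8_t)(g >> 8);
		t[i].guard[1] = (uint8_t)(g & 0xff);
		t[i].app[0] = 0;
		t[i].app[1] = 0;
		t[i].ref[0] = (uint8_t)(ref >> 24);
		t[i].ref[1] = (uint8_t)(ref >> 16);
		t[i].ref[2] = (uint8_t)(ref >> 8);
		t[i].ref[3] = (uint8_t)ref;
	}
}

/* Return 0 on success, -1 on a tag mismatch; save verified guards */
static int dif_type1_verify(const uint8_t *data, unsigned int nsectors,
			    size_t sector_size, uint64_t lba,
			    const struct dif_tuple *t, uint16_t *saved_guards)
{
	for (unsigned int i = 0; i < nsectors; i++, lba++) {
		uint32_t ref = ((uint32_t)t[i].ref[0] << 24) |
			       ((uint32_t)t[i].ref[1] << 16) |
			       ((uint32_t)t[i].ref[2] << 8) | t[i].ref[3];
		uint16_t guard = (uint16_t)((t[i].guard[0] << 8) |
					    t[i].guard[1]);
		uint16_t csum;

		/* An app tag of 0xffff marks an unwritten sector: stop early */
		if (t[i].app[0] == 0xff && t[i].app[1] == 0xff)
			return 0;
		if (ref != (uint32_t)(lba & 0xffffffff))
			return -1;
		csum = crc_t10dif(data + i * sector_size, sector_size);
		if (guard != csum)
			return -1;
		saved_guards[i] = csum;
	}
	return 0;
}
```

Passing the guards saved by `dif_type1_verify` back into `dif_type1_generate` as `rpc_guards` should reproduce the same tuples without recomputing any checksum, which is the saving the commit message describes.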
index 0306fc3..4cf0191 100644
@@ -238,6 +238,14 @@ struct osd_obj_orphan {
        __u32 oor_ino;
 };
 
+enum osd_t10_type {
+       OSD_T10_TYPE_UNKNOWN = 0,
+       OSD_T10_TYPE1_CRC,
+       OSD_T10_TYPE3_CRC,
+       OSD_T10_TYPE1_IP,
+       OSD_T10_TYPE3_IP
+};
+
 /*
  * osd device.
  */
@@ -315,6 +323,8 @@ struct osd_device {
        struct inode            *od_index_backup_inode;
        enum lustre_index_backup_policy od_index_backup_policy;
        int                      od_index_backup_stop;
+       /* T10PI type, zero if not supported */
+       enum osd_t10_type        od_t10_type;
 };
 
 enum osd_full_scrub_ratio {
@@ -521,7 +531,9 @@ struct osd_iobuf {
        unsigned int       dr_rw:1;
        struct lu_buf      dr_pg_buf;
        struct page      **dr_pages;
+       struct niobuf_local     **dr_lnbs;
        struct lu_buf      dr_bl_buf;
+       struct lu_buf      dr_lnb_buf;
        sector_t          *dr_blocks;
        ktime_t            dr_start_time;
        ktime_t            dr_elapsed;  /* how long io took */
@@ -1423,4 +1435,16 @@ void osd_execute_truncate(struct osd_object *obj);
  */
 #define LDISKFS_XATTR_MAX_LARGE_EA_SIZE    (1024 * 1024)
 
+struct osd_bio_private {
+       struct osd_iobuf        *obp_iobuf;
+       /* Start page index in the obp_iobuf for the bio */
+       int                      obp_start_page_idx;
+};
+
+#ifdef HAVE_BIO_INTEGRITY_PREP_FN
+int osd_get_integrity_profile(struct osd_device *osd,
+                             integrity_gen_fn **generate_fn,
+                             integrity_vrfy_fn **verify_fn);
+#endif
+
 #endif /* _OSD_INTERNAL_H */
index 856695e..3598aef 100644
@@ -117,6 +117,12 @@ static int __osd_init_iobuf(struct osd_device *d, struct osd_iobuf *iobuf,
        if (unlikely(iobuf->dr_pages == NULL))
                return -ENOMEM;
 
+       lu_buf_realloc(&iobuf->dr_lnb_buf,
+                      pages * sizeof(iobuf->dr_lnbs[0]));
+       iobuf->dr_lnbs = iobuf->dr_lnb_buf.lb_buf;
+       if (unlikely(iobuf->dr_lnbs == NULL))
+               return -ENOMEM;
+
        iobuf->dr_max_pages = pages;
 
        return 0;
@@ -124,10 +130,13 @@ static int __osd_init_iobuf(struct osd_device *d, struct osd_iobuf *iobuf,
 #define osd_init_iobuf(dev, iobuf, rw, pages) \
        __osd_init_iobuf(dev, iobuf, rw, __LINE__, pages)
 
-static void osd_iobuf_add_page(struct osd_iobuf *iobuf, struct page *page)
+static void osd_iobuf_add_page(struct osd_iobuf *iobuf,
+                              struct niobuf_local *lnb)
 {
-        LASSERT(iobuf->dr_npages < iobuf->dr_max_pages);
-        iobuf->dr_pages[iobuf->dr_npages++] = page;
+       LASSERT(iobuf->dr_npages < iobuf->dr_max_pages);
+       iobuf->dr_pages[iobuf->dr_npages] = lnb->lnb_page;
+       iobuf->dr_lnbs[iobuf->dr_npages] = lnb;
+       iobuf->dr_npages++;
 }
 
 void osd_fini_iobuf(struct osd_device *d, struct osd_iobuf *iobuf)
@@ -306,6 +315,164 @@ static void bio_integrity_fault_inject(struct bio *bio)
        }
 }
 
+static int bio_dif_compare(__u16 *expected_guard_buf, void *bio_prot_buf,
+                          unsigned int sectors, int tuple_size)
+{
+       __u16 *expected_guard;
+       __u16 *bio_guard;
+       int i;
+
+       expected_guard = expected_guard_buf;
+       for (i = 0; i < sectors; i++) {
+               bio_guard = (__u16 *)bio_prot_buf;
+               if (*bio_guard != *expected_guard) {
+                       CERROR("unexpected guard tags on sector %d "
+                              "expected guard %u, bio guard "
+                              "%u, sectors %u, tuple size %d\n",
+                              i, *expected_guard, *bio_guard, sectors,
+                              tuple_size);
+                       return -EIO;
+               }
+               expected_guard++;
+               bio_prot_buf += tuple_size;
+       }
+       return 0;
+}
+
+static int osd_bio_integrity_compare(struct bio *bio, struct osd_iobuf *iobuf,
+                                    int index)
+{
+       struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
+       struct bio_integrity_payload *bip = bio->bi_integrity;
+       struct niobuf_local *lnb;
+       unsigned short sector_size = blk_integrity_interval(bi);
+       void *bio_prot_buf = page_address(bip->bip_vec->bv_page) +
+               bip->bip_vec->bv_offset;
+       struct bio_vec *bv;
+       sector_t sector = bio_start_sector(bio);
+       unsigned int i, sectors, total;
+       __u16 *expected_guard;
+       int rc;
+
+       total = 0;
+       bio_for_each_segment_all(bv, bio, i) {
+               lnb = iobuf->dr_lnbs[index];
+               expected_guard = lnb->lnb_guards;
+               sectors = bv->bv_len / sector_size;
+               if (lnb->lnb_guard_rpc) {
+                       rc = bio_dif_compare(expected_guard, bio_prot_buf,
+                                            sectors, bi->tuple_size);
+                       if (rc)
+                               return rc;
+               }
+
+               sector += sectors;
+               bio_prot_buf += sectors * bi->tuple_size;
+               total += sectors * bi->tuple_size;
+               LASSERT(total <= bip_size(bio->bi_integrity));
+               index++;
+       }
+       return 0;
+}
+
+static int osd_bio_integrity_handle(struct osd_device *osd, struct bio *bio,
+                                   struct osd_iobuf *iobuf,
+                                   int start_page_idx, bool fault_inject,
+                                   bool integrity_enabled)
+{
+       int rc;
+#ifdef HAVE_BIO_INTEGRITY_PREP_FN
+       integrity_gen_fn *generate_fn = NULL;
+       integrity_vrfy_fn *verify_fn = NULL;
+#endif
+
+       ENTRY;
+
+       if (!integrity_enabled)
+               RETURN(0);
+
+#ifdef HAVE_BIO_INTEGRITY_PREP_FN
+       rc = osd_get_integrity_profile(osd, &generate_fn, &verify_fn);
+       if (rc)
+               RETURN(rc);
+
+       rc = bio_integrity_prep_fn(bio, generate_fn, verify_fn);
+#else
+       rc = bio_integrity_prep(bio);
+#endif
+       if (rc)
+               RETURN(rc);
+
+       /* Verify and inject fault only when writing */
+       if (iobuf->dr_rw == 1) {
+               if (unlikely(OBD_FAIL_CHECK(OBD_FAIL_OST_INTEGRITY_CMP))) {
+                       rc = osd_bio_integrity_compare(bio, iobuf,
+                                                      start_page_idx);
+                       if (rc)
+                               RETURN(rc);
+               }
+
+               if (unlikely(fault_inject))
+                       bio_integrity_fault_inject(bio);
+       }
+
+       RETURN(0);
+}
+
+#ifdef HAVE_BIO_INTEGRITY_PREP_FN
+#  ifdef HAVE_BIO_ENDIO_USES_ONE_ARG
+static void dio_integrity_complete_routine(struct bio *bio)
+{
+#  else
+static void dio_integrity_complete_routine(struct bio *bio, int error)
+{
+#  endif
+       struct osd_bio_private *bio_private = bio->bi_private;
+
+       bio->bi_private = bio_private->obp_iobuf;
+#  ifdef HAVE_BIO_ENDIO_USES_ONE_ARG
+       dio_complete_routine(bio);
+#  else
+       dio_complete_routine(bio, error);
+#  endif
+
+       OBD_FREE_PTR(bio_private);
+}
+#endif
+
+static int osd_bio_init(struct bio *bio, struct osd_iobuf *iobuf,
+                       bool integrity_enabled, int start_page_idx,
+                       struct osd_bio_private **pprivate)
+{
+#ifdef HAVE_BIO_INTEGRITY_PREP_FN
+       struct osd_bio_private *bio_private;
+
+       ENTRY;
+
+       *pprivate = NULL;
+       if (integrity_enabled) {
+               OBD_ALLOC_GFP(bio_private, sizeof(*bio_private), GFP_NOIO);
+               if (bio_private == NULL)
+                       RETURN(-ENOMEM);
+               bio->bi_end_io = dio_integrity_complete_routine;
+               bio->bi_private = bio_private;
+               bio_private->obp_start_page_idx = start_page_idx;
+               bio_private->obp_iobuf = iobuf;
+               *pprivate = bio_private;
+       } else {
+               bio->bi_end_io = dio_complete_routine;
+               bio->bi_private = iobuf;
+       }
+       RETURN(0);
+#else
+       ENTRY;
+
+       bio->bi_end_io = dio_complete_routine;
+       bio->bi_private = iobuf;
+       RETURN(0);
+#endif
+}
+
 static int osd_do_bio(struct osd_device *osd, struct inode *inode,
                       struct osd_iobuf *iobuf)
 {
@@ -314,9 +481,13 @@ static int osd_do_bio(struct osd_device *osd, struct inode *inode,
        int npages = iobuf->dr_npages;
        sector_t *blocks = iobuf->dr_blocks;
        int total_blocks = npages * blocks_per_page;
-       int sector_bits = inode->i_sb->s_blocksize_bits - 9;
-       unsigned int blocksize = inode->i_sb->s_blocksize;
+       struct super_block *sb = inode->i_sb;
+       int sector_bits = sb->s_blocksize_bits - 9;
+       unsigned int blocksize = sb->s_blocksize;
+       struct block_device *bdev = sb->s_bdev;
+       struct osd_bio_private *bio_private = NULL;
        struct bio *bio = NULL;
+       int bio_start_page_idx;
        struct page *page;
        unsigned int page_offset;
        sector_t sector;
@@ -326,12 +497,15 @@ static int osd_do_bio(struct osd_device *osd, struct inode *inode,
        int i;
        int rc = 0;
        bool fault_inject;
+       bool integrity_enabled;
        DECLARE_PLUG(plug);
        ENTRY;
 
        fault_inject = OBD_FAIL_CHECK(OBD_FAIL_OST_INTEGRITY_FAULT);
         LASSERT(iobuf->dr_npages == npages);
 
+       integrity_enabled = bdev_integrity_enabled(bdev, iobuf->dr_rw);
+
        osd_brw_stats_update(osd, iobuf);
        iobuf->dr_start_time = ktime_get();
 
@@ -386,20 +560,19 @@ static int osd_do_bio(struct osd_device *osd, struct inode *inode,
                                        bio_phys_segments(q, bio),
                                        queue_max_phys_segments(q),
                                       0, queue_max_hw_segments(q));
-                               if (bio_integrity_enabled(bio)) {
-                                       if (bio_integrity_prep(bio)) {
-                                               bio_put(bio);
-                                               rc = -EIO;
-                                               goto out;
-                                       }
-                                       if (unlikely(fault_inject))
-                                               bio_integrity_fault_inject(bio);
+                               rc = osd_bio_integrity_handle(osd, bio,
+                                       iobuf, bio_start_page_idx,
+                                       fault_inject, integrity_enabled);
+                               if (rc) {
+                                       bio_put(bio);
+                                       goto out;
                                }
 
                                record_start_io(iobuf, bi_size);
                                osd_submit_bio(iobuf->dr_rw, bio);
                        }
 
+                       bio_start_page_idx = page_idx;
                        /* allocate new bio */
                        bio = bio_alloc(GFP_NOIO, min(BIO_MAX_PAGES,
                                                      (npages - page_idx) *
@@ -412,15 +585,19 @@ static int osd_do_bio(struct osd_device *osd, struct inode *inode,
                                 goto out;
                         }
 
-                       bio_set_dev(bio, inode->i_sb->s_bdev);
+                       bio_set_dev(bio, bdev);
                        bio_set_sector(bio, sector);
 #ifdef HAVE_BI_RW
                        bio->bi_rw = (iobuf->dr_rw == 0) ? READ : WRITE;
 #else
                        bio->bi_opf = (iobuf->dr_rw == 0) ? READ : WRITE;
 #endif
-                       bio->bi_end_io = dio_complete_routine;
-                       bio->bi_private = iobuf;
+                       rc = osd_bio_init(bio, iobuf, integrity_enabled,
+                                         bio_start_page_idx, &bio_private);
+                       if (rc) {
+                               bio_put(bio);
+                               goto out;
+                       }
 
                        rc = bio_add_page(bio, page,
                                          blocksize * nblocks, page_offset);
@@ -429,14 +606,13 @@ static int osd_do_bio(struct osd_device *osd, struct inode *inode,
        }
 
        if (bio != NULL) {
-               if (bio_integrity_enabled(bio)) {
-                       if (bio_integrity_prep(bio)) {
-                               bio_put(bio);
-                               rc = -EIO;
-                               goto out;
-                       }
-                       if (unlikely(fault_inject))
-                               bio_integrity_fault_inject(bio);
+               rc = osd_bio_integrity_handle(osd, bio, iobuf,
+                                             bio_start_page_idx,
+                                             fault_inject,
+                                             integrity_enabled);
+               if (rc) {
+                       bio_put(bio);
+                       goto out;
                }
 
                record_start_io(iobuf, bio_sectors(bio) << 9);
@@ -457,8 +633,13 @@ out:
                osd_fini_iobuf(osd, iobuf);
        }
 
-       if (rc == 0)
+       if (rc == 0) {
                rc = iobuf->dr_error;
+       } else {
+               if (bio_private)
+                       OBD_FREE_PTR(bio_private);
+       }
+
        RETURN(rc);
 }
 
@@ -482,6 +663,8 @@ static int osd_map_remote_to_local(loff_t offset, ssize_t len, int *nrpages,
                lnb->lnb_flags = 0;
                lnb->lnb_page = NULL;
                lnb->lnb_rc = 0;
+               lnb->lnb_guard_rpc = 0;
+               lnb->lnb_guard_disk = 0;
 
                 LASSERTF(plen <= len, "plen %u, len %lld\n", plen,
                          (long long) len);
@@ -1177,7 +1360,7 @@ static int osd_write_prep(const struct lu_env *env, struct dt_object *dt,
                        continue;
 
                if (maxidx >= lnb[i].lnb_page->index) {
-                       osd_iobuf_add_page(iobuf, lnb[i].lnb_page);
+                       osd_iobuf_add_page(iobuf, &lnb[i]);
                } else {
                        long off;
                        char *p = kmap(lnb[i].lnb_page);
@@ -1434,7 +1617,7 @@ static int osd_write_commit(const struct lu_env *env, struct dt_object *dt,
 
                SetPageUptodate(lnb[i].lnb_page);
 
-               osd_iobuf_add_page(iobuf, lnb[i].lnb_page);
+               osd_iobuf_add_page(iobuf, &lnb[i]);
         }
 
        osd_trans_exec_op(env, thandle, OSD_OT_WRITE);
@@ -1530,7 +1713,7 @@ static int osd_read_prep(const struct lu_env *env, struct dt_object *dt,
                        cache_hits++;
                } else {
                        cache_misses++;
-                       osd_iobuf_add_page(iobuf, lnb[i].lnb_page);
+                       osd_iobuf_add_page(iobuf, &lnb[i]);
                }
 
                if (cache == 0)
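
Reviewer note on `bio_dif_compare()` above: the helper walks two differently laid out buffers in step. The expected guards are a packed array of 16-bit values, while the guards in the bio integrity payload sit at the start of each protection tuple, `tuple_size` bytes apart. A minimal user-space sketch of that strided comparison, not part of the patch, could look like this. The name `guard_compare` and its return convention (index of the first mismatch, or -1) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Compare a packed array of expected guard tags against the guard field
 * at the start of each tuple in prot_buf. Tuples are tuple_size bytes
 * apart. Returns the index of the first mismatching sector, or -1 if
 * every guard matches. (Hypothetical helper, not from the patch.)
 */
static int guard_compare(const uint16_t *expected, const void *prot_buf,
			 unsigned int sectors, size_t tuple_size)
{
	const unsigned char *p = prot_buf;

	assert(tuple_size >= sizeof(uint16_t));
	for (unsigned int i = 0; i < sectors; i++, p += tuple_size) {
		uint16_t guard;

		/* memcpy avoids unaligned access into the tuple buffer */
		memcpy(&guard, p, sizeof(guard));
		if (guard != expected[i])
			return (int)i;
	}
	return -1;
}
```

With 8-byte tuples, only the first two bytes of each tuple take part in the comparison; the app and reference tags are skipped by the stride.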
index fe7f26e..a28ab03 100644
@@ -1948,6 +1948,7 @@ static int tgt_checksum_niobuf_t10pi(struct lu_target *tgt,
                                     int sector_size,
                                     u32 *check_sum)
 {
+       enum cksum_types t10_cksum_type = tgt->lut_dt_conf.ddp_t10_cksum_type;
        unsigned char cfs_alg = cksum_obd2cfs(OBD_CKSUM_T10_TOP);
        const char *obd_name = tgt->lut_obd->obd_name;
        struct cfs_crypto_hash_desc *hdesc;
@@ -2011,14 +2012,35 @@ static int tgt_checksum_niobuf_t10pi(struct lu_target *tgt,
                 * The left guard number should be able to hold checksums of a
                 * whole page
                 */
-               rc = obd_page_dif_generate_buffer(obd_name,
-                       local_nb[i].lnb_page,
-                       local_nb[i].lnb_page_offset & ~PAGE_MASK,
-                       local_nb[i].lnb_len, guard_start + used_number,
-                       guard_number - used_number, &used, sector_size,
-                       fn);
-               if (rc)
-                       break;
+               if (t10_cksum_type && opc == OST_READ &&
+                   local_nb[i].lnb_guard_disk) {
+                       used = DIV_ROUND_UP(local_nb[i].lnb_len, sector_size);
+                       if (used > (guard_number - used_number)) {
+                               rc = -E2BIG;
+                               break;
+                       }
+                       memcpy(guard_start + used_number,
+                              local_nb[i].lnb_guards,
+                              used * sizeof(*local_nb[i].lnb_guards));
+               } else {
+                       rc = obd_page_dif_generate_buffer(obd_name,
+                               local_nb[i].lnb_page,
+                               local_nb[i].lnb_page_offset & ~PAGE_MASK,
+                               local_nb[i].lnb_len, guard_start + used_number,
+                               guard_number - used_number, &used, sector_size,
+                               fn);
+                       if (rc)
+                               break;
+               }
+
+               LASSERT(used <= MAX_GUARD_NUMBER);
+               /* If the disk supports T10PI, copy the guards to local_nb */
+               if (t10_cksum_type && opc == OST_WRITE) {
+                       local_nb[i].lnb_guard_rpc = 1;
+                       memcpy(local_nb[i].lnb_guards,
+                              guard_start + used_number,
+                              used * sizeof(*local_nb[i].lnb_guards));
+               }
 
                used_number += used;
                if (used_number == guard_number) {