Whamcloud - gitweb
LU-10472 osc: add T10PI support for RPC checksum 80/30980/37
author Li Xi <lixi@ddn.com>
Tue, 23 Jan 2018 07:17:17 +0000 (02:17 -0500)
committer Oleg Drokin <oleg.drokin@intel.com>
Thu, 7 Jun 2018 20:07:30 +0000 (20:07 +0000)
T10 Protection Information (T10 PI), previously known as Data
Integrity Field (DIF), is a standard for end-to-end data integrity
validation. T10 PI prevents silent data corruption, ensuring that
incomplete and incorrect data cannot overwrite good data.

The Lustre file system already supports an RPC-level checksum, which
validates the data in bulk RPCs when writing/reading data to/from
objects on OSTs. The RPC-level checksum can detect data corruption that
happens while an RPC is being transferred over the wire. However, it
cannot prevent silent data corruption in other conditions, for example,
memory corruption while data is cached in the page cache. The existing
checksum mechanism therefore provides only disjoint protection coverage.
Thus, in order to provide end-to-end data protection, T10PI support
should be added to Lustre.

In order to provide end-to-end data integrity validation, the T10 PI
checksum of the data in each sector needs to be calculated on the Lustre
client side and validated later on the Lustre OSS side. The T10 protection
information should be sent together with the data in the RPC.
However, in order to avoid significant performance degradation,
instead of sending all original guard tags for all sectors in a bulk
RPC, the existing checksum feature of bulk RPC will be integrated
with the new T10PI feature.

When an OST starts, the necessary T10PI information will be extracted
from storage, i.e. the T10PI DIF type and sector size. The DIF type can
be one of TYPE1_IP, TYPE1_CRC, TYPE3_IP and TYPE3_CRC, and the sector
size can be either 512 or 4K bytes.

When an OSC connects to an OST, the OSC and OST negotiate the checksum
types. New checksum types are added for T10PI support:
OBD_CKSUM_T10IP512, OBD_CKSUM_T10IP4K, OBD_CKSUM_T10CRC512, and
OBD_CKSUM_T10CRC4K. If the OST storage has T10PI support, the only
selectable T10PI checksum type is the one that matches the T10PI type
of the hardware. The other existing checksum types (crc32, crc32c,
adler32) are still valid options for the RPC checksum type.
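
The negotiation described above can be sketched as a bitmask
intersection. This is only an illustration, not the code in this patch:
the bit values are copied from enum cksum_types below, while
enum guard_alg, hw_t10_cksum_type() and negotiate_cksum_types() are
hypothetical names introduced here, and the mapping from guard
algorithm plus sector size to a checksum type is an assumption based on
the type names.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values copied from enum cksum_types as extended by this patch. */
enum cksum_types {
	OBD_CKSUM_CRC32     = 0x00000001,
	OBD_CKSUM_ADLER     = 0x00000002,
	OBD_CKSUM_CRC32C    = 0x00000004,
	OBD_CKSUM_T10IP512  = 0x00000010,
	OBD_CKSUM_T10IP4K   = 0x00000020,
	OBD_CKSUM_T10CRC512 = 0x00000040,
	OBD_CKSUM_T10CRC4K  = 0x00000080,
};

#define OBD_CKSUM_T10_ALL (OBD_CKSUM_T10IP512 | OBD_CKSUM_T10IP4K | \
			   OBD_CKSUM_T10CRC512 | OBD_CKSUM_T10CRC4K)

/* Hypothetical: guard tag algorithm reported by the storage. */
enum guard_alg { GUARD_IP, GUARD_CRC };

/* Hypothetical helper: the single T10PI type that matches the hardware. */
static uint32_t hw_t10_cksum_type(enum guard_alg alg, unsigned int sector_size)
{
	assert(sector_size == 512 || sector_size == 4096);

	if (alg == GUARD_IP)
		return sector_size == 512 ? OBD_CKSUM_T10IP512 :
					    OBD_CKSUM_T10IP4K;
	return sector_size == 512 ? OBD_CKSUM_T10CRC512 : OBD_CKSUM_T10CRC4K;
}

/* Hypothetical helper: keep only the types that both sides offer. */
static uint32_t negotiate_cksum_types(uint32_t client_types,
				      uint32_t server_types)
{
	return client_types & server_types;
}
```

With a client offering every type and a server offering crc32, adler,
crc32c and the CRC/4K hardware type, the intersection contains the
three existing types plus OBD_CKSUM_T10CRC4K only.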

When calculating the RPC checksum for T10PI, the T10PI checksums of all
sectors are calculated first using the T10PI checksum type, i.e.
16-bit CRC or IP checksum. The RPC checksum is then calculated over
all of the T10PI checksums. The RPC checksum type used in this step is
always adler32. Considering that the checksum-of-checksums is only
computed on a 4KB chunk of GRD tags for a 1MB RPC with 512B sectors,
or 16KB of GRD tags for 16MB of 4KB sectors, this is only 1/256 or
1/1024 of the total data being checksummed, so the checksum type used
here should not affect overall system performance noticeably.
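
A minimal user-space sketch of the checksum-of-checksums idea follows.
It is not the kernel implementation: ip_csum() is a plain RFC 1071
checksum standing in for ip_compute_csum() and is not claimed to be
byte-for-byte compatible with it, adler32() is a straightforward
Adler-32, and generate_guards() and rpc_cksum() are hypothetical names.
For a 1MB buffer with 512B sectors it produces 2048 16-bit guard tags,
i.e. the 4KB of GRD tags mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over len bytes (stand-in for the IP guard). */
static uint16_t ip_csum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)data[i] << 8 | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Plain Adler-32, used here as the top-level RPC checksum. */
static uint32_t adler32(const uint8_t *data, size_t len)
{
	uint32_t a = 1, b = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		a = (a + data[i]) % 65521;
		b = (b + a) % 65521;
	}
	return b << 16 | a;
}

/*
 * Step 1: one guard tag per sector. Returns the number of tags written,
 * or 0 if guards[] is too small for the data.
 */
static size_t generate_guards(const uint8_t *data, size_t len,
			      size_t sector_size, uint16_t *guards,
			      size_t max_guards)
{
	size_t used = 0, off;

	assert(sector_size > 0);
	for (off = 0; off < len; off += sector_size) {
		size_t chunk = len - off < sector_size ? len - off :
							 sector_size;

		if (used >= max_guards)
			return 0;
		guards[used++] = ip_csum(data + off, chunk);
	}
	return used;
}

/* Step 2: checksum only the guard tags, not the data itself. */
static uint32_t rpc_cksum(const uint16_t *guards, size_t used)
{
	return adler32((const uint8_t *)guards, used * sizeof(guards[0]));
}
```

The second step touches only two bytes per sector, which is why the
choice of the top-level algorithm has little effect on throughput.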

obdfilter.*.enforce_t10pi_cksum can be used to tune whether to enforce
T10-PI checksum or not.

If the OST supports the T10-PI feature and the T10-PI checksum is enforced,
clients will have no choice for the RPC checksum type other than the T10PI
checksum type. This is useful for enforcing end-to-end integrity in the
whole system.

If the OST doesn't support the T10-PI feature and the T10-PI checksum is
enforced, together with other checksums with reasonably good speeds (e.g.
crc32, crc32c, adler, etc.), all the T10-PI checksum types (t10ip512,
t10ip4K, t10crc512, t10crc4K) will be added to the available checksum
types, regardless of the speeds of the T10-PI checksums. This is useful
for testing T10-PI checksums of RPC.

If the OST supports the T10-PI feature and the T10-PI checksum is NOT
enforced, the corresponding T10-PI checksum type will be added to the
checksum type list, regardless of the speed of the T10-PI checksum. This
gives clients the flexibility to choose whether to enable end-to-end
integrity or not.

If the OST does NOT support the T10-PI feature and the T10-PI checksum is
NOT enforced, together with other checksums with reasonably good speeds,
all the T10-PI checksum types with good speeds will be added to the
checksum type list. Note that a T10-PI checksum type with a speed worse
than half of Adler will NOT be added as an option. In this circumstance,
T10-PI checksum types behave the same as other normal checksum types.

Clients that have no T10-PI RPC checksum support will not be affected
by the above-mentioned logic. After obdfilter.*.enforce_t10pi_cksum is
changed on an OST, that logic is only applied to newly connected clients.
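
The four enforce/support cases described above can be summarized in a
small decision function. This is a sketch under stated assumptions, not
the obd_cksum_types_supported_server() added by this patch:
server_cksum_types(), fast_types(), cksum_bits[] and the speed[] layout
are hypothetical, and "reasonably good speed" is taken to mean at least
half of the Adler speed, with Adler itself always offered, following
the rule in the code this patch replaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values copied from enum cksum_types as extended by this patch. */
#define OBD_CKSUM_CRC32     0x00000001u
#define OBD_CKSUM_ADLER     0x00000002u
#define OBD_CKSUM_CRC32C    0x00000004u
#define OBD_CKSUM_T10IP512  0x00000010u
#define OBD_CKSUM_T10IP4K   0x00000020u
#define OBD_CKSUM_T10CRC512 0x00000040u
#define OBD_CKSUM_T10CRC4K  0x00000080u
#define OBD_CKSUM_T10_ALL   0x000000f0u

/* Hypothetical layout: speed[i] is the MB/s measured for cksum_bits[i]. */
#define NR_CKSUM  7
#define IDX_ADLER 1
static const uint32_t cksum_bits[NR_CKSUM] = {
	OBD_CKSUM_CRC32, OBD_CKSUM_ADLER, OBD_CKSUM_CRC32C,
	OBD_CKSUM_T10IP512, OBD_CKSUM_T10IP4K,
	OBD_CKSUM_T10CRC512, OBD_CKSUM_T10CRC4K,
};

/* Keep the candidates that run at half of the Adler speed or better. */
static uint32_t fast_types(uint32_t candidates, const int *speed)
{
	int base = speed[IDX_ADLER] / 2;
	uint32_t ret = 0;
	int i;

	for (i = 0; i < NR_CKSUM; i++)
		if ((candidates & cksum_bits[i]) && speed[i] >= base)
			ret |= cksum_bits[i];
	return ret;
}

/* hw_t10_type is 0 when the OST storage has no T10PI support. */
static uint32_t server_cksum_types(uint32_t hw_t10_type, bool enforce,
				   const int *speed)
{
	uint32_t normal = OBD_CKSUM_ADLER |
		fast_types(OBD_CKSUM_CRC32 | OBD_CKSUM_CRC32C, speed);

	assert((hw_t10_type & ~OBD_CKSUM_T10_ALL) == 0);

	if (hw_t10_type && enforce)	/* supported, enforced */
		return hw_t10_type;
	if (enforce)			/* not supported, enforced */
		return normal | OBD_CKSUM_T10_ALL;
	if (hw_t10_type)		/* supported, not enforced */
		return normal | hw_t10_type;
	/* not supported, not enforced: T10PI types compete on speed */
	return normal | fast_types(OBD_CKSUM_T10_ALL, speed);
}
```

With the speeds listed in this commit message every type clears the
half-of-Adler threshold, so the last case would offer all seven types;
a T10-PI type measured well below that threshold would be dropped in
that case only.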

The following are the speeds of the different checksum types on a server
with an Intel(R) Xeon(R) E5-2650 @ 2.00GHz CPU:

crc: 1575 MB/s
crc32c: 9763 MB/s
adler: 1255 MB/s
t10ip512: 6151 MB/s
t10ip4k: 7935 MB/s
t10crc512: 1119 MB/s
t10crc4k: 1531 MB/s

Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: I6468680edeab0917bb71dbd8cd9ea16c65e935f5
Reviewed-on: https://review.whamcloud.com/30980
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
24 files changed:
libcfs/libcfs/linux/linux-crypto.c
lustre/autoconf/lustre-core.m4
lustre/include/dt_object.h
lustre/include/lustre_compat.h
lustre/include/obd_cksum.h
lustre/include/obd_class.h
lustre/include/uapi/linux/lustre/lustre_idl.h
lustre/llite/llite_lib.c
lustre/mdt/mdt_handler.c
lustre/obdclass/Makefile.in
lustre/obdclass/integrity.c [new file with mode: 0644]
lustre/obdclass/obd_cksum.c [new file with mode: 0644]
lustre/ofd/lproc_ofd.c
lustre/ofd/ofd_dev.c
lustre/ofd/ofd_internal.h
lustre/ofd/ofd_obd.c
lustre/osc/osc_request.c
lustre/osd-ldiskfs/osd_handler.c
lustre/ptlrpc/import.c
lustre/ptlrpc/wiretest.c
lustre/target/tgt_handler.c
lustre/tests/sanity.sh
lustre/utils/wirecheck.c
lustre/utils/wiretest.c

index d95a54f..4aab16e 100644 (file)
@@ -325,6 +325,9 @@ EXPORT_SYMBOL(cfs_crypto_hash_final);
  * The speed is stored internally in the cfs_crypto_hash_speeds[] array, and
  * is available through the cfs_crypto_hash_speed() function.
  *
+ * This function needs to stay the same as obd_t10_performance_test() so that
+ * the speeds are comparable.
+ *
  * \param[in] hash_alg hash algorithm id (CFS_HASH_ALG_*)
  * \param[in] buf      data buffer on which to compute the hash
  * \param[in] buf_len  length of \buf on which to compute hash
index b1f17d0..9cdceb8 100644 (file)
@@ -1987,6 +1987,23 @@ file_function_iter, [
 ]) # LC_HAVE_FILE_OPERATIONS_READ_WRITE_ITER
 
 #
+# LC_HAVE_INTERVAL_BLK_INTEGRITY
+#
+# 3.17 replace sector_size with interval in struct blk_integrity
+#
+AC_DEFUN([LC_HAVE_INTERVAL_BLK_INTEGRITY], [
+LB_CHECK_COMPILE([if 'blk_integrity.interval' exist],
+interval_blk_integrity, [
+       #include <linux/blkdev.h>
+],[
+       ((struct blk_integrity *)0)->interval = 0;
+],[
+       AC_DEFINE(HAVE_INTERVAL_BLK_INTEGRITY, 1,
+               [blk_integrity.interval exist])
+])
+]) # LC_HAVE_INTERVAL_BLK_INTEGRITY
+
+#
 # LC_KEY_MATCH_DATA
 #
 # 3.17 replaces key_type::match with match_preparse
@@ -2241,6 +2258,25 @@ bio_endio, [
 ]) # LC_BIO_ENDIO_USES_ONE_ARG
 
 #
+# LC_HAVE_INTERVAL_EXP_BLK_INTEGRITY
+#
+# 4.3 replace interval with interval_exp in 'struct blk_integrity'
+# 'struct blk_integrity_profile' is also added in this version,
+# thus use this to determine whether 'struct blk_integrity' has profile
+#
+AC_DEFUN([LC_HAVE_INTERVAL_EXP_BLK_INTEGRITY], [
+LB_CHECK_COMPILE([if 'blk_integrity.interval_exp' exist],
+blk_integrity_interval_exp, [
+       #include <linux/blkdev.h>
+],[
+       ((struct blk_integrity *)0)->interval_exp = 0;
+],[
+       AC_DEFINE(HAVE_INTERVAL_EXP_BLK_INTEGRITY, 1,
+               [blk_integrity.interval_exp exist])
+])
+]) # LC_HAVE_INTERVAL_EXP_BLK_INTEGRITY
+
+#
 # LC_HAVE_LOOP_CTL_GET_FREE
 #
 # 4.x kernel have moved userspace APIs to
@@ -2952,6 +2988,7 @@ AC_DEFUN([LC_PROG_LINUX], [
        LC_HAVE_FILE_OPERATIONS_READ_WRITE_ITER
 
        # 3.17
+       LC_HAVE_INTERVAL_BLK_INTEGRITY
        LC_KEY_MATCH_DATA
 
        # 3.18
@@ -2979,6 +3016,7 @@ AC_DEFUN([LC_PROG_LINUX], [
        LC_SYMLINK_OPS_USE_NAMEIDATA
 
        # 4.3
+       LC_HAVE_INTERVAL_EXP_BLK_INTEGRITY
        LC_HAVE_CACHE_HEAD_HLIST
        LC_HAVE_XATTR_HANDLER_SIMPLIFIED
 
index 35f1ec4..6699d0f 100644 (file)
@@ -86,6 +86,8 @@ struct dt_device_param {
         * calculation */
        unsigned int       ddp_extent_tax;
        unsigned int       ddp_brw_size;        /* optimal RPC size */
+       /* T10PI checksum type, zero if not supported */
+       enum cksum_types   ddp_t10_cksum_type;
 };
 
 /**
index 158bbb3..a94ff13 100644 (file)
@@ -41,6 +41,7 @@
 #include <linux/bio.h>
 #include <linux/xattr.h>
 #include <linux/workqueue.h>
+#include <linux/blkdev.h>
 
 #include <libcfs/linux/linux-fs.h>
 #include <lustre_patchless_compat.h>
@@ -697,4 +698,24 @@ static inline struct timespec current_time(struct inode *inode)
 #define READ_ONCE ACCESS_ONCE
 #endif
 
+static inline unsigned short blk_integrity_interval(struct blk_integrity *bi)
+{
+#ifdef HAVE_INTERVAL_EXP_BLK_INTEGRITY
+       return bi->interval_exp ? 1 << bi->interval_exp : 0;
+#elif defined(HAVE_INTERVAL_BLK_INTEGRITY)
+       return bi->interval;
+#else
+       return bi->sector_size;
+#endif
+}
+
+static inline const char *blk_integrity_name(struct blk_integrity *bi)
+{
+#ifdef HAVE_INTERVAL_EXP_BLK_INTEGRITY
+       return bi->profile->name;
+#else
+       return bi->name;
+#endif
+}
+
 #endif /* _LUSTRE_COMPAT_H */
index 53ca52d..a2ce2ec 100644 (file)
@@ -36,6 +36,9 @@
 #include <libcfs/libcfs_crypto.h>
 #include <uapi/linux/lustre/lustre_idl.h>
 
+int obd_t10_cksum_speed(const char *obd_name,
+                       enum cksum_types cksum_type);
+
 static inline unsigned char cksum_obd2cfs(enum cksum_types cksum_type)
 {
        switch (cksum_type) {
@@ -52,58 +55,23 @@ static inline unsigned char cksum_obd2cfs(enum cksum_types cksum_type)
        return 0;
 }
 
-/* The OBD_FL_CKSUM_* flags is packed into 5 bits of o_flags, since there can
- * only be a single checksum type per RPC.
- *
- * The OBD_CHECKSUM_* type bits passed in ocd_cksum_types are a 32-bit bitmask
- * since they need to represent the full range of checksum algorithms that
- * both the client and server can understand.
- *
- * In case of an unsupported types/flags we fall back to ADLER
- * because that is supported by all clients since 1.8
- *
- * In case multiple algorithms are supported the best one is used. */
-static inline u32 cksum_type_pack(enum cksum_types cksum_type)
-{
-       unsigned int    performance = 0, tmp;
-       u32             flag = OBD_FL_CKSUM_ADLER;
-
-       if (cksum_type & OBD_CKSUM_CRC32) {
-               tmp = cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32));
-               if (tmp > performance) {
-                       performance = tmp;
-                       flag = OBD_FL_CKSUM_CRC32;
-               }
-       }
-       if (cksum_type & OBD_CKSUM_CRC32C) {
-               tmp = cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32C));
-               if (tmp > performance) {
-                       performance = tmp;
-                       flag = OBD_FL_CKSUM_CRC32C;
-               }
-       }
-       if (cksum_type & OBD_CKSUM_ADLER) {
-               tmp = cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_ADLER));
-               if (tmp > performance) {
-                       performance = tmp;
-                       flag = OBD_FL_CKSUM_ADLER;
-               }
-       }
-       if (unlikely(cksum_type && !(cksum_type & (OBD_CKSUM_CRC32C |
-                                                  OBD_CKSUM_CRC32 |
-                                                  OBD_CKSUM_ADLER))))
-               CWARN("unknown cksum type %x\n", cksum_type);
-
-       return flag;
-}
+u32 obd_cksum_type_pack(const char *obd_name, enum cksum_types cksum_type);
 
 
-static inline enum cksum_types cksum_type_unpack(u32 o_flags)
+static inline enum cksum_types obd_cksum_type_unpack(u32 o_flags)
 {
        switch (o_flags & OBD_FL_CKSUM_ALL) {
        case OBD_FL_CKSUM_CRC32C:
                return OBD_CKSUM_CRC32C;
        case OBD_FL_CKSUM_CRC32:
                return OBD_CKSUM_CRC32;
+       case OBD_FL_CKSUM_T10IP512:
+               return OBD_CKSUM_T10IP512;
+       case OBD_FL_CKSUM_T10IP4K:
+               return OBD_CKSUM_T10IP4K;
+       case OBD_FL_CKSUM_T10CRC512:
+               return OBD_CKSUM_T10CRC512;
+       case OBD_FL_CKSUM_T10CRC4K:
+               return OBD_CKSUM_T10CRC4K;
        default:
                break;
        }
@@ -115,7 +83,7 @@ static inline enum cksum_types cksum_type_unpack(u32 o_flags)
  * 1.8 supported ADLER it is base and not depend on hw
  * Client uses all available local algos
  */
-static inline enum cksum_types cksum_types_supported_client(void)
+static inline enum cksum_types obd_cksum_types_supported_client(void)
 {
        enum cksum_types ret = OBD_CKSUM_ADLER;
 
@@ -129,32 +97,13 @@ static inline enum cksum_types cksum_types_supported_client(void)
        if (cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32)) > 0)
                ret |= OBD_CKSUM_CRC32;
 
-       return ret;
-}
-
-/* Server uses algos that perform at 50% or better of the Adler */
-static inline enum cksum_types cksum_types_supported_server(void)
-{
-       enum cksum_types ret = OBD_CKSUM_ADLER;
-       int base_speed;
-
-       CDEBUG(D_INFO, "Crypto hash speed: crc %d, crc32c %d, adler %d\n",
-              cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32)),
-              cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32C)),
-              cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_ADLER)));
-
-       base_speed = cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_ADLER)) / 2;
-
-       if (cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32C)) >=
-           base_speed)
-               ret |= OBD_CKSUM_CRC32C;
-       if (cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32)) >=
-           base_speed)
-               ret |= OBD_CKSUM_CRC32;
+       /* Client support all kinds of T10 checksum */
+       ret |= OBD_CKSUM_T10_ALL;
 
        return ret;
 }
 
+enum cksum_types obd_cksum_types_supported_server(const char *obd_name);
 
 /* Select the best checksum algorithm among those supplied in the cksum_types
  * input.
@@ -163,13 +112,67 @@ static inline enum cksum_types cksum_types_supported_server(void)
  * checksum type due to its benchmarking at libcfs module load.
  * Caution is advised, however, since what is fastest on a single client may
  * not be the fastest or most efficient algorithm on the server.  */
-static inline enum cksum_types cksum_type_select(enum cksum_types cksum_types)
+static inline enum cksum_types
+obd_cksum_type_select(const char *obd_name, enum cksum_types cksum_types)
 {
-       return cksum_type_unpack(cksum_type_pack(cksum_types));
+       u32 flag = obd_cksum_type_pack(obd_name, cksum_types);
+
+       return obd_cksum_type_unpack(flag);
 }
 
 /* Checksum algorithm names. Must be defined in the same order as the
  * OBD_CKSUM_* flags. */
-#define DECLARE_CKSUM_NAME char *cksum_name[] = {"crc32", "adler", "crc32c"}
+#define DECLARE_CKSUM_NAME const char *cksum_name[] = {"crc32", "adler", \
+       "crc32c", "reserved", "t10ip512", "t10ip4K", "t10crc512", "t10crc4K"}
+
+typedef __u16 (obd_dif_csum_fn) (void *, unsigned int);
+
+__u16 obd_dif_crc_fn(void *data, unsigned int len);
+__u16 obd_dif_ip_fn(void *data, unsigned int len);
+int obd_page_dif_generate_buffer(const char *obd_name, struct page *page,
+                                __u32 offset, __u32 length,
+                                __u16 *guard_start, int guard_number,
+                                int *used_number, int sector_size,
+                                obd_dif_csum_fn *fn);
+/*
+ * If checksum type is one T10 checksum types, init the csum_fn and sector
+ * size. Otherwise, init them to NULL/zero.
+ */
+static inline void obd_t10_cksum2dif(enum cksum_types cksum_type,
+                                    obd_dif_csum_fn **fn, int *sector_size)
+{
+       *fn = NULL;
+       *sector_size = 0;
+
+       switch (cksum_type) {
+       case OBD_CKSUM_T10IP512:
+               *fn = obd_dif_ip_fn;
+               *sector_size = 512;
+               break;
+       case OBD_CKSUM_T10IP4K:
+               *fn = obd_dif_ip_fn;
+               *sector_size = 4096;
+               break;
+       case OBD_CKSUM_T10CRC512:
+               *fn = obd_dif_crc_fn;
+               *sector_size = 512;
+               break;
+       case OBD_CKSUM_T10CRC4K:
+               *fn = obd_dif_crc_fn;
+               *sector_size = 4096;
+               break;
+       default:
+               break;
+       }
+}
+
+enum obd_t10_cksum_type {
+       OBD_T10_CKSUM_UNKNOWN = 0,
+       OBD_T10_CKSUM_IP512,
+       OBD_T10_CKSUM_IP4K,
+       OBD_T10_CKSUM_CRC512,
+       OBD_T10_CKSUM_CRC4K,
+       OBD_T10_CKSUM_MAX
+};
 
 #endif /* __OBD_H */
index b49db89..cd5f833 100644 (file)
@@ -1921,5 +1921,4 @@ extern struct miscdevice obd_psdev;
 int obd_ioctl_getdata(char **buf, int *len, void __user *arg);
 int class_procfs_init(void);
 int class_procfs_clean(void);
-
 #endif /* __LINUX_OBD_CLASS_H */
index d368eca..11ea4d3 100644 (file)
@@ -950,15 +950,37 @@ struct obd_connect_data {
 /*
  * Supported checksum algorithms. Up to 32 checksum types are supported.
  * (32-bit mask stored in obd_connect_data::ocd_cksum_types)
- * Please update DECLARE_CKSUM_NAME/OBD_CKSUM_ALL in obd.h when adding a new
- * algorithm and also the OBD_FL_CKSUM* flags.
+ * Please update DECLARE_CKSUM_NAME in obd_cksum.h when adding a new
+ * algorithm and also the OBD_FL_CKSUM* flags, OBD_CKSUM_ALL flag,
+ * OBD_FL_CKSUM_ALL flag and potentially OBD_CKSUM_T10_ALL flag.
  */
 enum cksum_types {
-        OBD_CKSUM_CRC32 = 0x00000001,
-        OBD_CKSUM_ADLER = 0x00000002,
-        OBD_CKSUM_CRC32C= 0x00000004,
+       OBD_CKSUM_CRC32         = 0x00000001,
+       OBD_CKSUM_ADLER         = 0x00000002,
+       OBD_CKSUM_CRC32C        = 0x00000004,
+       OBD_CKSUM_RESERVED      = 0x00000008,
+       OBD_CKSUM_T10IP512      = 0x00000010,
+       OBD_CKSUM_T10IP4K       = 0x00000020,
+       OBD_CKSUM_T10CRC512     = 0x00000040,
+       OBD_CKSUM_T10CRC4K      = 0x00000080,
 };
 
+#define OBD_CKSUM_T10_ALL (OBD_CKSUM_T10IP512 | OBD_CKSUM_T10IP4K | \
+       OBD_CKSUM_T10CRC512 | OBD_CKSUM_T10CRC4K)
+
+#define OBD_CKSUM_ALL (OBD_CKSUM_CRC32 | OBD_CKSUM_ADLER | OBD_CKSUM_CRC32C | \
+                      OBD_CKSUM_T10_ALL)
+
+/*
+ * The default checksum algorithm used on top of T10PI GRD tags for RPC.
+ * Considering that the checksum-of-checksums is only computing CRC32 on a
+ * 4KB chunk of GRD tags for a 1MB RPC for 512B sectors, or 16KB of GRD
+ * tags for 16MB of 4KB sectors, this is only 1/256 or 1/1024 of the
+ * total data being checksummed, so the checksum type used here should not
+ * affect overall system performance noticeably.
+ */
+#define OBD_CKSUM_T10_TOP OBD_CKSUM_ADLER
+
 /*
  *   OST requests: OBDO & OBD request records
  */
@@ -1003,13 +1025,16 @@ enum obdo_flags {
         OBD_FL_NO_GRPQUOTA  = 0x00000200, /* the object's group is over quota */
         OBD_FL_CREATE_CROW  = 0x00000400, /* object should be create on write */
         OBD_FL_SRVLOCK      = 0x00000800, /* delegate DLM locking to server */
-        OBD_FL_CKSUM_CRC32  = 0x00001000, /* CRC32 checksum type */
-        OBD_FL_CKSUM_ADLER  = 0x00002000, /* ADLER checksum type */
-        OBD_FL_CKSUM_CRC32C = 0x00004000, /* CRC32C checksum type */
-        OBD_FL_CKSUM_RSVD2  = 0x00008000, /* for future cksum types */
-        OBD_FL_CKSUM_RSVD3  = 0x00010000, /* for future cksum types */
-        OBD_FL_SHRINK_GRANT = 0x00020000, /* object shrink the grant */
-        OBD_FL_MMAP         = 0x00040000, /* object is mmapped on the client.
+       OBD_FL_CKSUM_CRC32  = 0x00001000, /* CRC32 checksum type */
+       OBD_FL_CKSUM_ADLER  = 0x00002000, /* ADLER checksum type */
+       OBD_FL_CKSUM_CRC32C = 0x00004000, /* CRC32C checksum type */
+       OBD_FL_CKSUM_T10IP512  = 0x00005000, /* T10PI IP cksum, 512B sector */
+       OBD_FL_CKSUM_T10IP4K   = 0x00006000, /* T10PI IP cksum, 4KB sector */
+       OBD_FL_CKSUM_T10CRC512 = 0x00007000, /* T10PI CRC cksum, 512B sector */
+       OBD_FL_CKSUM_T10CRC4K  = 0x00008000, /* T10PI CRC cksum, 4KB sector */
+       OBD_FL_CKSUM_RSVD3  = 0x00010000, /* for future cksum types */
+       OBD_FL_SHRINK_GRANT = 0x00020000, /* object shrink the grant */
+       OBD_FL_MMAP         = 0x00040000, /* object is mmapped on the client.
                                            * XXX: obsoleted - reserved for old
                                            * clients prior than 2.2 */
         OBD_FL_RECOV_RESEND = 0x00080000, /* recoverable resent */
@@ -1018,10 +1043,15 @@ enum obdo_flags {
        OBD_FL_SHORT_IO     = 0x00400000, /* short io request */
        /* OBD_FL_LOCAL_MASK = 0xF0000000, was local-only flags until 2.10 */
 
-       /* Note that while these checksum values are currently separate bits,
-        * in 2.x we can actually allow all values from 1-31 if we wanted. */
+       /*
+        * Note that while the original checksum values were separate bits,
+        * in 2.x we can actually allow all values from 1-31. T10-PI checksum
+        * types already use values which are not separate bits.
+        */
        OBD_FL_CKSUM_ALL    = OBD_FL_CKSUM_CRC32 | OBD_FL_CKSUM_ADLER |
-                             OBD_FL_CKSUM_CRC32C,
+                             OBD_FL_CKSUM_CRC32C | OBD_FL_CKSUM_T10IP512 |
+                             OBD_FL_CKSUM_T10IP4K | OBD_FL_CKSUM_T10CRC512 |
+                             OBD_FL_CKSUM_T10CRC4K,
 };
 
 /*
index ac31f08..a56ffb5 100644 (file)
@@ -235,7 +235,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt,
                                   OBD_CONNECT_LARGE_ACL;
 #endif
 
-       data->ocd_cksum_types = cksum_types_supported_client();
+       data->ocd_cksum_types = obd_cksum_types_supported_client();
 
        if (OBD_FAIL_CHECK(OBD_FAIL_MDC_LIGHTWEIGHT))
                /* flag mdc connection as lightweight, only used for test
@@ -442,7 +442,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt,
        if (OBD_FAIL_CHECK(OBD_FAIL_OSC_CKSUM_ADLER_ONLY))
                data->ocd_cksum_types = OBD_CKSUM_ADLER;
        else
-               data->ocd_cksum_types = cksum_types_supported_client();
+               data->ocd_cksum_types = obd_cksum_types_supported_client();
 
 #ifdef HAVE_LRU_RESIZE_SUPPORT
        data->ocd_connect_flags |= OBD_CONNECT_LRU_RESIZE;
index fee96ac..66ad174 100644 (file)
@@ -5584,6 +5584,7 @@ static int mdt_connect_internal(const struct lu_env *env,
                                struct mdt_device *mdt,
                                struct obd_connect_data *data, bool reconnect)
 {
+       const char *obd_name = mdt_obd_name(mdt);
        LASSERT(data != NULL);
 
        data->ocd_connect_flags &= MDT_CONNECT_SUPPORTED;
@@ -5612,8 +5613,7 @@ static int mdt_connect_internal(const struct lu_env *env,
                               "ocd_version: %x ocd_grant: %d ocd_index: %u "
                               "ocd_brw_size unexpectedly zero, network data "
                               "corruption? Refusing to connect this client\n",
-                              mdt_obd_name(mdt),
-                              exp->exp_client_uuid.uuid,
+                              obd_name, exp->exp_client_uuid.uuid,
                               exp, data->ocd_connect_flags, data->ocd_version,
                               data->ocd_grant, data->ocd_index);
                        return -EPROTO;
@@ -5659,7 +5659,7 @@ static int mdt_connect_internal(const struct lu_env *env,
 
        if ((data->ocd_connect_flags & OBD_CONNECT_FID) == 0) {
                CWARN("%s: MDS requires FID support, but client not\n",
-                     mdt_obd_name(mdt));
+                     obd_name);
                return -EBADE;
        }
 
@@ -5693,7 +5693,8 @@ static int mdt_connect_internal(const struct lu_env *env,
                /* The client set in ocd_cksum_types the checksum types it
                 * supports. We have to mask off the algorithms that we don't
                 * support */
-               data->ocd_cksum_types &= cksum_types_supported_server();
+               data->ocd_cksum_types &=
+                       obd_cksum_types_supported_server(obd_name);
 
                if (unlikely(data->ocd_cksum_types == 0)) {
                        CERROR("%s: Connect with checksum support but no "
index 1414ae9..66a67da 100644 (file)
@@ -11,6 +11,7 @@ obdclass-all-objs += lu_object.o dt_object.o
 obdclass-all-objs += cl_object.o cl_page.o cl_lock.o cl_io.o lu_ref.o
 obdclass-all-objs += linkea.o
 obdclass-all-objs += kernelcomm.o jobid.o
+obdclass-all-objs += integrity.o obd_cksum.o
 
 @SERVER_TRUE@obdclass-all-objs += acl.o
 @SERVER_TRUE@obdclass-all-objs += idmap.o
diff --git a/lustre/obdclass/integrity.c b/lustre/obdclass/integrity.c
new file mode 100644 (file)
index 0000000..da56b20
--- /dev/null
@@ -0,0 +1,271 @@
+/*
+ * GPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 only,
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License version 2 for more details (a copy is included
+ * in the LICENSE file that accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 along with this program; If not, see
+ * http://www.gnu.org/licenses/gpl-2.0.html
+ *
+ * GPL HEADER END
+ */
+/*
+ * Copyright (c) 2018, DataDirect Networks Storage.
+ * Author: Li Xi.
+ *
+ * General data integrity functions
+ */
+#include <linux/blkdev.h>
+#include <linux/crc-t10dif.h>
+#include <asm-generic/checksum.h>
+#include <obd_class.h>
+#include <obd_cksum.h>
+
+__u16 obd_dif_crc_fn(void *data, unsigned int len)
+{
+       return cpu_to_be16(crc_t10dif(data, len));
+}
+EXPORT_SYMBOL(obd_dif_crc_fn);
+
+__u16 obd_dif_ip_fn(void *data, unsigned int len)
+{
+       return ip_compute_csum(data, len);
+}
+EXPORT_SYMBOL(obd_dif_ip_fn);
+
+int obd_page_dif_generate_buffer(const char *obd_name, struct page *page,
+                                __u32 offset, __u32 length,
+                                __u16 *guard_start, int guard_number,
+                                int *used_number, int sector_size,
+                                obd_dif_csum_fn *fn)
+{
+       unsigned int i;
+       char *data_buf;
+       __u16 *guard_buf = guard_start;
+       unsigned int data_size;
+       int used = 0;
+
+       data_buf = kmap(page) + offset;
+       for (i = 0; i < length; i += sector_size) {
+               if (used >= guard_number) {
+                       CERROR("%s: unexpected used guard number of DIF %u/%u, "
+                              "data length %u, sector size %u: rc = %d\n",
+                              obd_name, used, guard_number, length,
+                              sector_size, -E2BIG);
+                       return -E2BIG;
+               }
+               data_size = length - i;
+               if (data_size > sector_size)
+                       data_size = sector_size;
+               *guard_buf = fn(data_buf, data_size);
+               guard_buf++;
+               data_buf += data_size;
+               used++;
+       }
+       kunmap(page);
+       *used_number = used;
+
+       return 0;
+}
+EXPORT_SYMBOL(obd_page_dif_generate_buffer);
+
+static int __obd_t10_performance_test(const char *obd_name,
+                                     enum cksum_types cksum_type,
+                                     struct page *data_page,
+                                     int repeat_number)
+{
+       unsigned char cfs_alg = cksum_obd2cfs(OBD_CKSUM_T10_TOP);
+       struct cfs_crypto_hash_desc *hdesc;
+       obd_dif_csum_fn *fn = NULL;
+       unsigned int bufsize;
+       unsigned char *buffer;
+       struct page *__page;
+       __u16 *guard_start;
+       int guard_number;
+       int used_number = 0;
+       int sector_size = 0;
+       __u32 cksum;
+       int rc = 0;
+       int rc2;
+       int used;
+       int i;
+
+       obd_t10_cksum2dif(cksum_type, &fn, &sector_size);
+       if (!fn)
+               return -EINVAL;
+
+       __page = alloc_page(GFP_KERNEL);
+       if (__page == NULL)
+               return -ENOMEM;
+
+       hdesc = cfs_crypto_hash_init(cfs_alg, NULL, 0);
+       if (IS_ERR(hdesc)) {
+               rc = PTR_ERR(hdesc);
+               CERROR("%s: unable to initialize checksum hash %s: rc = %d\n",
+                      obd_name, cfs_crypto_hash_name(cfs_alg), rc);
+               GOTO(out, rc);
+       }
+
+       buffer = kmap(__page);
+       guard_start = (__u16 *)buffer;
+       guard_number = PAGE_SIZE / sizeof(*guard_start);
+       for (i = 0; i < repeat_number; i++) {
+               /*
+                * The remaining guard slots should be able to hold the
+                * checksums of a whole page
+                */
+               rc = obd_page_dif_generate_buffer(obd_name, data_page, 0,
+                                                 PAGE_SIZE,
+                                                 guard_start + used_number,
+                                                 guard_number - used_number,
+                                                 &used, sector_size, fn);
+               if (rc)
+                       break;
+
+               used_number += used;
+               if (used_number == guard_number) {
+                       cfs_crypto_hash_update_page(hdesc, __page, 0,
+                               used_number * sizeof(*guard_start));
+                       used_number = 0;
+               }
+       }
+       kunmap(__page);
+       if (rc)
+               GOTO(out_final, rc);
+
+       if (used_number != 0)
+               cfs_crypto_hash_update_page(hdesc, __page, 0,
+                       used_number * sizeof(*guard_start));
+
+out_final:
+       bufsize = sizeof(cksum);
+       rc2 = cfs_crypto_hash_final(hdesc, (unsigned char *)&cksum, &bufsize);
+       rc = rc ? rc : rc2;
+out:
+       __free_page(__page);
+
+       return rc;
+}
+
+/**
+ *  Array of T10PI checksum algorithm speeds in MB/s
+ */
+static int obd_t10_cksum_speeds[OBD_T10_CKSUM_MAX];
+
+static enum obd_t10_cksum_type
+obd_t10_cksum2type(enum cksum_types cksum_type)
+{
+       switch (cksum_type) {
+       case OBD_CKSUM_T10IP512:
+               return OBD_T10_CKSUM_IP512;
+       case OBD_CKSUM_T10IP4K:
+               return OBD_T10_CKSUM_IP4K;
+       case OBD_CKSUM_T10CRC512:
+               return OBD_T10_CKSUM_CRC512;
+       case OBD_CKSUM_T10CRC4K:
+               return OBD_T10_CKSUM_CRC4K;
+       default:
+               return OBD_T10_CKSUM_UNKNOWN;
+       }
+}
+
+static const char *obd_t10_cksum_name(enum obd_t10_cksum_type index)
+{
+       DECLARE_CKSUM_NAME;
+
+       /* Need to skip "crc32", "adler", "crc32c", "reserved" */
+       return cksum_name[3 + index];
+}
+
+/**
+ * Compute the speed of specified T10PI checksum type
+ *
+ * Run a speed test of the given T10PI checksum type using a 1MB buffer
+ * size. This is a reasonable buffer size for Lustre RPCs, even if the actual
+ * RPC size is larger or smaller.
+ *
+ * The speed is stored internally in the obd_t10_cksum_speeds[] array, and
+ * is available through the obd_t10_cksum_speed() function.
+ *
+ * This function needs to stay the same as cfs_crypto_performance_test() so
+ * that the speeds are comparable. And this function should reflect the real
+ * cost of the checksum calculation.
+ *
+ * \param[in] obd_name         name of the OBD device
+ * \param[in] cksum_type       checksum type (OBD_CKSUM_T10*)
+ */
+static void obd_t10_performance_test(const char *obd_name,
+                                    enum cksum_types cksum_type)
+{
+       enum obd_t10_cksum_type index = obd_t10_cksum2type(cksum_type);
+       const int buf_len = max(PAGE_SIZE, 1048576UL);
+       unsigned long bcount;
+       unsigned long start;
+       unsigned long end;
+       struct page *page;
+       int rc = 0;
+       void *buf;
+
+       page = alloc_page(GFP_KERNEL);
+       if (page == NULL) {
+               rc = -ENOMEM;
+               goto out;
+       }
+
+       buf = kmap(page);
+       memset(buf, 0xAD, PAGE_SIZE);
+       kunmap(page);
+
+       for (start = jiffies, end = start + msecs_to_jiffies(MSEC_PER_SEC / 4),
+            bcount = 0; time_before(jiffies, end) && rc == 0; bcount++) {
+               rc = __obd_t10_performance_test(obd_name, cksum_type, page,
+                                               buf_len / PAGE_SIZE);
+               if (rc)
+                       break;
+       }
+       end = jiffies;
+       __free_page(page);
+out:
+       if (rc) {
+               obd_t10_cksum_speeds[index] = rc;
+               CDEBUG(D_INFO, "%s: T10 checksum algorithm %s test error: "
+                      "rc = %d\n", obd_name, obd_t10_cksum_name(index), rc);
+       } else {
+               unsigned long tmp;
+
+               tmp = ((bcount * buf_len / jiffies_to_msecs(end - start)) *
+                      1000) / (1024 * 1024);
+               obd_t10_cksum_speeds[index] = (int)tmp;
+               CDEBUG(D_CONFIG, "%s: T10 checksum algorithm %s speed = %d "
+                      "MB/s\n", obd_name, obd_t10_cksum_name(index),
+                      obd_t10_cksum_speeds[index]);
+       }
+}
+
+int obd_t10_cksum_speed(const char *obd_name,
+                       enum cksum_types cksum_type)
+{
+       enum obd_t10_cksum_type index = obd_t10_cksum2type(cksum_type);
+
+       if (unlikely(obd_t10_cksum_speeds[index] == 0)) {
+               static DEFINE_MUTEX(obd_t10_cksum_speed_mutex);
+
+               mutex_lock(&obd_t10_cksum_speed_mutex);
+               if (obd_t10_cksum_speeds[index] == 0)
+                       obd_t10_performance_test(obd_name, cksum_type);
+               mutex_unlock(&obd_t10_cksum_speed_mutex);
+       }
+
+       return obd_t10_cksum_speeds[index];
+}
+EXPORT_SYMBOL(obd_t10_cksum_speed);
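
The per-sector guard tag generation in integrity.c can be illustrated outside the kernel. The following is a minimal userspace sketch under stated assumptions, not the patch's code: sketch_ip_csum() and sketch_generate_guards() are hypothetical names, the RFC 1071 sum only stands in for obd_dif_ip_fn() (its byte order may differ from the kernel's ip_compute_csum()), and a flat buffer replaces the kmap()ed page.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for obd_dif_ip_fn(): RFC 1071 Internet checksum.
 * Safe for the small (<= 4 KiB) sectors used here; the 32-bit accumulator
 * cannot overflow at that size. */
static uint16_t sketch_ip_csum(const unsigned char *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)data[i] << 8 | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Walk the buffer one sector at a time and store one 16-bit guard tag per
 * sector, mirroring the loop shape of obd_page_dif_generate_buffer().
 * Returns -E2BIG when the guard array is too small. */
static int sketch_generate_guards(const unsigned char *data, size_t length,
				  size_t sector_size, uint16_t *guards,
				  size_t guard_number, size_t *used_number)
{
	size_t used = 0;
	size_t off;

	assert(sector_size > 0);
	for (off = 0; off < length; off += sector_size) {
		size_t chunk = length - off;

		if (used >= guard_number)
			return -E2BIG;
		if (chunk > sector_size)
			chunk = sector_size;	/* last sector may be short */
		guards[used++] = sketch_ip_csum(data + off, chunk);
	}
	*used_number = used;
	return 0;
}
```

In the patch, the collected guard tags are then fed to a second, top-level hash (OBD_CKSUM_T10_TOP), so only one 32-bit value travels in the RPC instead of one tag per sector.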
diff --git a/lustre/obdclass/obd_cksum.c b/lustre/obdclass/obd_cksum.c
new file mode 100644 (file)
index 0000000..16e6f12
--- /dev/null
@@ -0,0 +1,149 @@
+/*
+ * GPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 only,
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License version 2 for more details (a copy is included
+ * in the LICENSE file that accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 along with this program; If not, see
+ * http://www.gnu.org/licenses/gpl-2.0.html
+ *
+ * GPL HEADER END
+ */
+/*
+ * Copyright (c) 2018, DataDirect Networks Storage.
+ * Author: Li Xi.
+ *
+ * Checksum functions
+ */
+#include <obd_class.h>
+#include <obd_cksum.h>
+
+/* Server uses algorithms that perform at 50% or more of the Adler speed */
+enum cksum_types obd_cksum_types_supported_server(const char *obd_name)
+{
+       enum cksum_types ret = OBD_CKSUM_ADLER;
+       int base_speed;
+
+       CDEBUG(D_INFO, "%s: checksum speed: crc %d, crc32c %d, adler %d, "
+              "t10ip512 %d, t10ip4k %d, t10crc512 %d, t10crc4k %d\n",
+              obd_name,
+              cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32)),
+              cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32C)),
+              cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_ADLER)),
+              obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10IP512),
+              obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10IP4K),
+              obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10CRC512),
+              obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10CRC4K));
+
+       base_speed = cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_ADLER)) / 2;
+
+       if (cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32C)) >=
+           base_speed)
+               ret |= OBD_CKSUM_CRC32C;
+
+       if (cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32)) >=
+           base_speed)
+               ret |= OBD_CKSUM_CRC32;
+
+       if (obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10IP512) >= base_speed)
+               ret |= OBD_CKSUM_T10IP512;
+
+       if (obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10IP4K) >= base_speed)
+               ret |= OBD_CKSUM_T10IP4K;
+
+       if (obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10CRC512) >= base_speed)
+               ret |= OBD_CKSUM_T10CRC512;
+
+       if (obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10CRC4K) >= base_speed)
+               ret |= OBD_CKSUM_T10CRC4K;
+
+       return ret;
+}
+EXPORT_SYMBOL(obd_cksum_types_supported_server);
+
+/* The OBD_FL_CKSUM_* flags are packed into 5 bits of o_flags, since there can
+ * only be a single checksum type per RPC.
+ *
+ * The OBD_CKSUM_* type bits passed in ocd_cksum_types are a 32-bit bitmask
+ * since they need to represent the full range of checksum algorithms that
+ * both the client and server can understand.
+ *
+ * In case of unsupported types/flags we fall back to ADLER
+ * because that is supported by all clients since 1.8
+ *
+ * If multiple algorithms are supported, the fastest one is used. */
+u32 obd_cksum_type_pack(const char *obd_name, enum cksum_types cksum_type)
+{
+       unsigned int performance = 0, tmp;
+       u32 flag = OBD_FL_CKSUM_ADLER;
+
+       if (cksum_type & OBD_CKSUM_CRC32) {
+               tmp = cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32));
+               if (tmp > performance) {
+                       performance = tmp;
+                       flag = OBD_FL_CKSUM_CRC32;
+               }
+       }
+       if (cksum_type & OBD_CKSUM_CRC32C) {
+               tmp = cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32C));
+               if (tmp > performance) {
+                       performance = tmp;
+                       flag = OBD_FL_CKSUM_CRC32C;
+               }
+       }
+       if (cksum_type & OBD_CKSUM_ADLER) {
+               tmp = cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_ADLER));
+               if (tmp > performance) {
+                       performance = tmp;
+                       flag = OBD_FL_CKSUM_ADLER;
+               }
+       }
+
+       if (cksum_type & OBD_CKSUM_T10IP512) {
+               tmp = obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10IP512);
+               if (tmp > performance) {
+                       performance = tmp;
+                       flag = OBD_FL_CKSUM_T10IP512;
+               }
+       }
+
+       if (cksum_type & OBD_CKSUM_T10IP4K) {
+               tmp = obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10IP4K);
+               if (tmp > performance) {
+                       performance = tmp;
+                       flag = OBD_FL_CKSUM_T10IP4K;
+               }
+       }
+
+       if (cksum_type & OBD_CKSUM_T10CRC512) {
+               tmp = obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10CRC512);
+               if (tmp > performance) {
+                       performance = tmp;
+                       flag = OBD_FL_CKSUM_T10CRC512;
+               }
+       }
+
+       if (cksum_type & OBD_CKSUM_T10CRC4K) {
+               tmp = obd_t10_cksum_speed(obd_name, OBD_CKSUM_T10CRC4K);
+               if (tmp > performance) {
+                       performance = tmp;
+                       flag = OBD_FL_CKSUM_T10CRC4K;
+               }
+       }
+
+       if (unlikely(cksum_type && !(cksum_type & OBD_CKSUM_ALL)))
+               CWARN("%s: unknown cksum type %x\n", obd_name, cksum_type);
+
+       return flag;
+}
+EXPORT_SYMBOL(obd_cksum_type_pack);
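
The two selection rules in obd_cksum.c (the server offers Adler plus anything at least half as fast, and the packer picks the fastest offered type with Adler as fallback) can be sketched as a table lookup. This is an illustrative userspace sketch only: the SK_* bit values, struct sk_speed and both function names are hypothetical, and the speed table replaces the calls to cfs_crypto_hash_speed() and obd_t10_cksum_speed(). Speeds are compared as signed integers here so that a negative (error) entry never wins; the patch itself holds them in an unsigned int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values; the real OBD_CKSUM_* constants are defined in
 * the Lustre headers. */
#define SK_CRC32	0x01u
#define SK_ADLER	0x02u
#define SK_CRC32C	0x04u
#define SK_T10IP512	0x10u

struct sk_speed {
	uint32_t type;	/* exactly one SK_* bit */
	int mbps;	/* measured speed, or a negative errno on failure */
};

/* Server side: Adler is always offered; every other type must reach at
 * least 50% of the Adler speed. */
static uint32_t sk_server_supported(const struct sk_speed *tbl, size_t n)
{
	uint32_t ret = SK_ADLER;
	int base = 0;
	size_t i;

	assert(tbl != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (tbl[i].type == SK_ADLER)
			base = tbl[i].mbps / 2;
	for (i = 0; i < n; i++)
		if (tbl[i].mbps >= base)
			ret |= tbl[i].type;
	return ret;
}

/* Pick the fastest type present in the offered mask; fall back to Adler
 * when nothing usable is offered. */
static uint32_t sk_pick_fastest(uint32_t offered, const struct sk_speed *tbl,
				size_t n)
{
	uint32_t best = SK_ADLER;
	int best_speed = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if ((offered & tbl[i].type) && tbl[i].mbps > best_speed) {
			best_speed = tbl[i].mbps;
			best = tbl[i].type;
		}
	}
	return best;
}
```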
index 032e9a5..1f6fbd4 100644 (file)
@@ -757,6 +757,67 @@ static int ofd_site_stats_seq_show(struct seq_file *m, void *data)
 }
 LPROC_SEQ_FOPS_RO(ofd_site_stats);
 
+/**
+ * Show if the OFD enforces T10PI checksum.
+ *
+ * \param[in] m                seq_file handle
+ * \param[in] data     unused for single entry
+ *
+ * \retval             0 on success
+ * \retval             negative value on error
+ */
+static int ofd_checksum_t10pi_enforce_seq_show(struct seq_file *m, void *data)
+{
+       struct obd_device *obd = m->private;
+       struct ofd_device *ofd = ofd_dev(obd->obd_lu_dev);
+
+       seq_printf(m, "%u\n", ofd->ofd_checksum_t10pi_enforce);
+       return 0;
+}
+
+/**
+ * Force specific T10PI checksum modes to be enabled
+ *
+ * If T10PI *is* supported in hardware, allow only the supported T10PI type
+ * to be used. If T10PI is *not* supported by the OSD, setting the enforce
+ * parameter forces all T10PI types to be enabled (even if slower) for
+ * testing.
+ *
+ * The final determination of which algorithm is used depends on whether
+ * the client supports T10PI or not, and is handled at client connect time.
+ *
+ * \param[in] file     proc file
+ * \param[in] buffer   string which represents mode
+ *                     1: set T10PI checksums enforced
+ *                     0: unset T10PI checksums enforced
+ * \param[in] count    \a buffer length
+ * \param[in] off      unused for single entry
+ *
+ * \retval             \a count on success
+ * \retval             negative number on error
+ */
+static ssize_t
+ofd_checksum_t10pi_enforce_seq_write(struct file *file,
+                                    const char __user *buffer,
+                                    size_t count, loff_t *off)
+{
+       struct seq_file *m = file->private_data;
+       struct obd_device *obd = m->private;
+       struct ofd_device *ofd = ofd_dev(obd->obd_lu_dev);
+       bool enforce;
+       int rc;
+
+       rc = kstrtobool_from_user(buffer, count, &enforce);
+       if (rc)
+               return rc;
+
+       spin_lock(&ofd->ofd_flags_lock);
+       ofd->ofd_checksum_t10pi_enforce = enforce;
+       spin_unlock(&ofd->ofd_flags_lock);
+       return count;
+}
+LPROC_SEQ_FOPS(ofd_checksum_t10pi_enforce);
+
 LPROC_SEQ_FOPS_RO_TYPE(ofd, recovery_status);
 LPROC_SEQ_FOPS_RW_TYPE(ofd, recovery_time_soft);
 LPROC_SEQ_FOPS_RW_TYPE(ofd, recovery_time_hard);
@@ -831,6 +892,8 @@ struct lprocfs_vars lprocfs_ofd_obd_vars[] = {
          .fops =       &ofd_lfsck_verify_pfid_fops     },
        { .name =       "site_stats",
          .fops =       &ofd_site_stats_fops            },
+       { .name =       "checksum_t10pi_enforce",
+         .fops =       &ofd_checksum_t10pi_enforce_fops        },
        { NULL }
 };
 
index 9d05bc7..7e306ff 100644 (file)
@@ -2907,6 +2907,7 @@ static int ofd_init0(const struct lu_env *env, struct ofd_device *m,
 
        spin_lock_init(&m->ofd_flags_lock);
        m->ofd_raid_degraded = 0;
+       m->ofd_checksum_t10pi_enforce = 0;
        m->ofd_syncjournal = 0;
        ofd_slc_set(m);
        m->ofd_soft_sync_limit = OFD_SOFT_SYNC_LIMIT_DEFAULT;
@@ -2982,7 +2983,8 @@ static int ofd_init0(const struct lu_env *env, struct ofd_device *m,
        tgd->tgd_reserved_pcnt = 0;
 
        m->ofd_brw_size = m->ofd_lut.lut_dt_conf.ddp_brw_size;
-       m->ofd_cksum_types_supported = cksum_types_supported_server();
+       m->ofd_cksum_types_supported =
+               obd_cksum_types_supported_server(obd->obd_name);
        m->ofd_precreate_batch = OFD_PRECREATE_BATCH_DEFAULT;
        if (tgd->tgd_osfs.os_bsize * tgd->tgd_osfs.os_blocks <
            OFD_PRECREATE_SMALL_FS)
index 24a04b5..19e4b33 100644 (file)
@@ -147,7 +147,9 @@ struct ofd_device {
                                 ofd_lastid_rebuilding:1,
                                 ofd_record_fid_accessed:1,
                                 ofd_lfsck_verify_pfid:1,
-                                ofd_skip_lfsck:1;
+                                ofd_skip_lfsck:1,
+                                /* Whether to enforce T10PI checksum of RPC */
+                                ofd_checksum_t10pi_enforce:1;
        struct seq_server_site   ofd_seq_site;
        /* the limit of SOFT_SYNC RPCs that will trigger a soft sync */
        unsigned int             ofd_soft_sync_limit;
index 9693692..28bef97 100644 (file)
@@ -109,6 +109,93 @@ out:
 }
 
 /**
+ * Decide which checksums both client and OST support, possibly forcing
+ * the use of T10PI checksums if the hardware supports this.
+ *
+ * The clients that have no T10-PI RPC checksum support will use the same
+ * mechanism to select checksum type as before, and will not be affected by
+ * the following logic.
+ *
+ * For the clients that have T10-PI RPC checksum support:
+ *
+ * If the OST supports the T10-PI feature and T10-PI checksum is enforced,
+ * clients have no choice of RPC checksum type other than the T10PI
+ * checksum type. This is useful for enforcing end-to-end integrity in the
+ * whole system.
+ *
+ * If the OST doesn't support the T10-PI feature and T10-PI checksum is
+ * enforced, together with other checksums with reasonably good speeds (e.g.
+ * crc32, crc32c, adler, etc.), all T10-PI checksum types understood by the
+ * client (t10ip512, t10ip4K, t10crc512, t10crc4K) will be added to the
+ * available checksum types, regardless of the speeds of T10-PI checksums.
+ * This is useful for testing T10-PI checksum of RPC.
+ *
+ * If the OST supports the T10-PI feature and T10-PI checksum is NOT enforced,
+ * the corresponding T10-PI checksum type will be added to the checksum type
+ * list, regardless of the speed of the T10-PI checksum. This provides clients
+ * the flexibility to choose whether to enable end-to-end integrity or not.
+ *
+ * If the OST does NOT support the T10-PI feature and T10-PI checksum is NOT
+ * enforced, together with other checksums with reasonably good speeds,
+ * all the T10-PI checksum types with good speeds will be added into the
+ * checksum type list. Note that a T10-PI checksum type with a speed worse
+ * than half of Adler will NOT be added as an option. In this circumstance,
+ * T10-PI checksum types behave the same as other normal checksum
+ * types.
+ *
+ */
+static void
+ofd_mask_cksum_types(struct ofd_device *ofd, enum cksum_types *cksum_types)
+{
+       bool enforce = ofd->ofd_checksum_t10pi_enforce;
+       enum cksum_types ofd_t10_cksum_type;
+       enum cksum_types client_t10_types = *cksum_types & OBD_CKSUM_T10_ALL;
+       enum cksum_types server_t10_types;
+
+       /*
+        * The client set in ocd_cksum_types the checksum types it
+        * supports. We have to mask off the algorithms that we don't
+        * support. T10PI checksum types will be added later.
+        */
+       *cksum_types &= (ofd->ofd_cksum_types_supported & ~OBD_CKSUM_T10_ALL);
+       server_t10_types = ofd->ofd_cksum_types_supported & OBD_CKSUM_T10_ALL;
+       ofd_t10_cksum_type = ofd->ofd_lut.lut_dt_conf.ddp_t10_cksum_type;
+
+       /* Quick exit if no T10-PI support on client */
+       if (!client_t10_types)
+               return;
+
+       /*
+        * This OST has NO T10-PI feature. Add all supported T10-PI checksums
+        * as options if T10-PI checksum is enforced. If the T10-PI checksum is
+        * not enforced, only add them as options when speed is good.
+        */
+       if (ofd_t10_cksum_type == 0) {
+               /*
+                * Server allows all T10PI checksums, and server_t10_types
+                * include quick ones.
+                */
+               if (enforce)
+                       *cksum_types |= client_t10_types;
+               else
+                       *cksum_types |= client_t10_types & server_t10_types;
+               return;
+       }
+
+       /*
+        * This OST has T10-PI feature. Disable all other checksum types if
+        * T10-PI checksum is enforced. If the T10-PI checksum is not enforced,
+        * add the checksum type as an option.
+        */
+       if (client_t10_types & ofd_t10_cksum_type) {
+               if (enforce)
+                       *cksum_types = ofd_t10_cksum_type;
+               else
+                       *cksum_types |= ofd_t10_cksum_type;
+       }
+}
+
+/**
  * Match client and OST server connection feature flags.
  *
  * Compute the compatibility flags for a connection request based on
@@ -142,8 +229,8 @@ static int ofd_parse_connect_data(const struct lu_env *env,
                                  struct obd_connect_data *data,
                                  bool new_connection)
 {
-       struct ofd_device                *ofd = ofd_exp(exp);
-       struct filter_export_data        *fed = &exp->exp_filter_data;
+       struct ofd_device *ofd = ofd_exp(exp);
+       struct filter_export_data *fed = &exp->exp_filter_data;
 
        if (!data)
                RETURN(0);
@@ -244,10 +331,7 @@ static int ofd_parse_connect_data(const struct lu_env *env,
        if (data->ocd_connect_flags & OBD_CONNECT_CKSUM) {
                __u32 cksum_types = data->ocd_cksum_types;
 
-               /* The client set in ocd_cksum_types the checksum types it
-                * supports. We have to mask off the algorithms that we don't
-                * support */
-               data->ocd_cksum_types &= ofd->ofd_cksum_types_supported;
+               ofd_mask_cksum_types(ofd, &data->ocd_cksum_types);
 
                if (unlikely(data->ocd_cksum_types == 0)) {
                        CERROR("%s: Connect with checksum support but no "
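
The four cases described in the ofd_mask_cksum_types() comment can be checked with a pure function over bitmasks. The sketch below is an illustration under assumptions, not the OFD code: the SK_* values and sketch_mask() are hypothetical, and the device fields (ofd_cksum_types_supported, ddp_t10_cksum_type, ofd_checksum_t10pi_enforce) are passed in as plain arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values standing in for the OBD_CKSUM_* constants. */
#define SK_CRC32	0x01u
#define SK_ADLER	0x02u
#define SK_CRC32C	0x04u
#define SK_T10IP512	0x10u
#define SK_T10IP4K	0x20u
#define SK_T10CRC512	0x40u
#define SK_T10CRC4K	0x80u
#define SK_T10_ALL	(SK_T10IP512 | SK_T10IP4K | SK_T10CRC512 | SK_T10CRC4K)

/* client:           types the client says it understands
 * server_supported: types the server found fast enough
 * osd_t10_type:     T10-PI type of the backing storage, 0 if none
 * enforce:          value of the checksum_t10pi_enforce tunable */
static uint32_t sketch_mask(uint32_t client, uint32_t server_supported,
			    uint32_t osd_t10_type, bool enforce)
{
	uint32_t client_t10 = client & SK_T10_ALL;
	uint32_t server_t10 = server_supported & SK_T10_ALL;
	/* Non-T10 types: plain intersection, as before the patch. */
	uint32_t result = client & server_supported & ~SK_T10_ALL;

	/* Client without T10-PI support: nothing more to do. */
	if (!client_t10)
		return result;

	/* Storage without T10-PI: all client types when enforced (testing),
	 * otherwise only those the server measured as fast enough. */
	if (osd_t10_type == 0)
		return result | (enforce ? client_t10
					 : (client_t10 & server_t10));

	/* Storage with T10-PI: its type is the only choice when enforced,
	 * otherwise one more option next to the regular types. */
	if (client_t10 & osd_t10_type)
		return enforce ? osd_t10_type : (result | osd_t10_type);

	return result;
}
```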
index d196a83..62dd791 100644 (file)
@@ -1033,6 +1033,103 @@ static inline int can_merge_pages(struct brw_page *p1, struct brw_page *p2)
         return (p1->off + p1->count == p2->off);
 }
 
+static int osc_checksum_bulk_t10pi(const char *obd_name, int nob,
+                                  size_t pg_count, struct brw_page **pga,
+                                  int opc, obd_dif_csum_fn *fn,
+                                  int sector_size,
+                                  u32 *check_sum)
+{
+       struct cfs_crypto_hash_desc *hdesc;
+       /* Use Adler as the default checksum type on top of DIF tags */
+       unsigned char cfs_alg = cksum_obd2cfs(OBD_CKSUM_T10_TOP);
+       struct page *__page;
+       unsigned char *buffer;
+       __u16 *guard_start;
+       unsigned int bufsize;
+       int guard_number;
+       int used_number = 0;
+       int used;
+       u32 cksum;
+       int rc = 0;
+       int i = 0;
+
+       LASSERT(pg_count > 0);
+
+       __page = alloc_page(GFP_KERNEL);
+       if (__page == NULL)
+               return -ENOMEM;
+
+       hdesc = cfs_crypto_hash_init(cfs_alg, NULL, 0);
+       if (IS_ERR(hdesc)) {
+               rc = PTR_ERR(hdesc);
+               CERROR("%s: unable to initialize checksum hash %s: rc = %d\n",
+                      obd_name, cfs_crypto_hash_name(cfs_alg), rc);
+               GOTO(out, rc);
+       }
+
+       buffer = kmap(__page);
+       guard_start = (__u16 *)buffer;
+       guard_number = PAGE_SIZE / sizeof(*guard_start);
+       while (nob > 0 && pg_count > 0) {
+               unsigned int count = pga[i]->count > nob ? nob : pga[i]->count;
+
+               /* corrupt the data before we compute the checksum, to
+                * simulate an OST->client data error */
+               if (unlikely(i == 0 && opc == OST_READ &&
+                            OBD_FAIL_CHECK(OBD_FAIL_OSC_CHECKSUM_RECEIVE))) {
+                       unsigned char *ptr = kmap(pga[i]->pg);
+                       int off = pga[i]->off & ~PAGE_MASK;
+
+                       memcpy(ptr + off, "bad1", min_t(typeof(nob), 4, nob));
+                       kunmap(pga[i]->pg);
+               }
+
+               /*
+                * The remaining guard slots should be able to hold the
+                * checksums of a whole page
+                */
+               rc = obd_page_dif_generate_buffer(obd_name, pga[i]->pg, 0,
+                                                 count,
+                                                 guard_start + used_number,
+                                                 guard_number - used_number,
+                                                 &used, sector_size,
+                                                 fn);
+               if (rc)
+                       break;
+
+               used_number += used;
+               if (used_number == guard_number) {
+                       cfs_crypto_hash_update_page(hdesc, __page, 0,
+                               used_number * sizeof(*guard_start));
+                       used_number = 0;
+               }
+
+               nob -= pga[i]->count;
+               pg_count--;
+               i++;
+       }
+       kunmap(__page);
+       if (rc)
+               GOTO(out, rc);
+
+       if (used_number != 0)
+               cfs_crypto_hash_update_page(hdesc, __page, 0,
+                       used_number * sizeof(*guard_start));
+
+       bufsize = sizeof(cksum);
+       cfs_crypto_hash_final(hdesc, (unsigned char *)&cksum, &bufsize);
+
+       /* For sending we only compute the wrong checksum instead
+        * of corrupting the data so it is still correct on a redo */
+       if (opc == OST_WRITE && OBD_FAIL_CHECK(OBD_FAIL_OSC_CHECKSUM_SEND))
+               cksum++;
+
+       *check_sum = cksum;
+out:
+       __free_page(__page);
+       return rc;
+}
+
 static int osc_checksum_bulk(int nob, size_t pg_count,
                             struct brw_page **pga, int opc,
                             enum cksum_types cksum_type,
@@ -1087,6 +1184,29 @@ static int osc_checksum_bulk(int nob, size_t pg_count,
        return 0;
 }
 
+static int osc_checksum_bulk_rw(const char *obd_name,
+                               enum cksum_types cksum_type,
+                               int nob, size_t pg_count,
+                               struct brw_page **pga, int opc,
+                               u32 *check_sum)
+{
+       obd_dif_csum_fn *fn = NULL;
+       int sector_size = 0;
+       int rc;
+
+       ENTRY;
+       obd_t10_cksum2dif(cksum_type, &fn, &sector_size);
+
+       if (fn)
+               rc = osc_checksum_bulk_t10pi(obd_name, nob, pg_count, pga,
+                                            opc, fn, sector_size, check_sum);
+       else
+               rc = osc_checksum_bulk(nob, pg_count, pga, opc, cksum_type,
+                                      check_sum);
+
+       RETURN(rc);
+}
+
 static int
 osc_brw_prep_request(int cmd, struct client_obd *cli, struct obdo *oa,
                     u32 page_count, struct brw_page **pga,
@@ -1102,6 +1222,7 @@ osc_brw_prep_request(int cmd, struct client_obd *cli, struct obdo *oa,
         struct req_capsule      *pill;
         struct brw_page *pg_prev;
        void *short_io_buf;
+       const char *obd_name = cli->cl_import->imp_obd->obd_name;
 
         ENTRY;
         if (OBD_FAIL_CHECK(OBD_FAIL_OSC_BRW_PREP_REQ))
@@ -1296,12 +1417,14 @@ no_bulk:
                         if ((body->oa.o_valid & OBD_MD_FLFLAGS) == 0)
                                 body->oa.o_flags = 0;
 
-                        body->oa.o_flags |= cksum_type_pack(cksum_type);
+                       body->oa.o_flags |= obd_cksum_type_pack(obd_name,
+                                                               cksum_type);
                         body->oa.o_valid |= OBD_MD_FLCKSUM | OBD_MD_FLFLAGS;
 
-                       rc = osc_checksum_bulk(requested_nob, page_count,
-                                              pga, OST_WRITE, cksum_type,
-                                              &body->oa.o_cksum);
+                       rc = osc_checksum_bulk_rw(obd_name, cksum_type,
+                                                 requested_nob, page_count,
+                                                 pga, OST_WRITE,
+                                                 &body->oa.o_cksum);
                        if (rc < 0) {
                                CDEBUG(D_PAGE, "failed to checksum, rc = %d\n",
                                       rc);
@@ -1312,7 +1435,8 @@ no_bulk:
 
                         /* save this in 'oa', too, for later checking */
                         oa->o_valid |= OBD_MD_FLCKSUM | OBD_MD_FLFLAGS;
-                        oa->o_flags |= cksum_type_pack(cksum_type);
+                       oa->o_flags |= obd_cksum_type_pack(obd_name,
+                                                          cksum_type);
                 } else {
                         /* clear out the checksum flag, in case this is a
                          * resend but cl_checksum is no longer set. b=11238 */
@@ -1327,9 +1451,10 @@ no_bulk:
                     !sptlrpc_flavor_has_bulk(&req->rq_flvr)) {
                         if ((body->oa.o_valid & OBD_MD_FLFLAGS) == 0)
                                 body->oa.o_flags = 0;
-                        body->oa.o_flags |= cksum_type_pack(cli->cl_cksum_type);
+                       body->oa.o_flags |= obd_cksum_type_pack(obd_name,
+                               cli->cl_cksum_type);
                         body->oa.o_valid |= OBD_MD_FLCKSUM | OBD_MD_FLFLAGS;
-                }
+               }
 
                /* Client cksum has been already copied to wire obdo in previous
                 * lustre_set_wire_obdo(), and in the case a bulk-read is being
@@ -1426,12 +1551,16 @@ static void dump_all_bulk_pages(struct obdo *oa, __u32 page_count,
 
 static int
 check_write_checksum(struct obdo *oa, const struct lnet_process_id *peer,
-                               __u32 client_cksum, __u32 server_cksum,
-                               struct osc_brw_async_args *aa)
+                    __u32 client_cksum, __u32 server_cksum,
+                    struct osc_brw_async_args *aa)
 {
-        __u32 new_cksum;
-        char *msg;
+       const char *obd_name = aa->aa_cli->cl_import->imp_obd->obd_name;
        enum cksum_types cksum_type;
+       obd_dif_csum_fn *fn = NULL;
+       int sector_size = 0;
+       bool t10pi = false;
+       __u32 new_cksum;
+       char *msg;
        int rc;
 
         if (server_cksum == client_cksum) {
@@ -1443,15 +1572,50 @@ check_write_checksum(struct obdo *oa, const struct lnet_process_id *peer,
                dump_all_bulk_pages(oa, aa->aa_page_count, aa->aa_ppga,
                                    server_cksum, client_cksum);
 
-       cksum_type = cksum_type_unpack(oa->o_valid & OBD_MD_FLFLAGS ?
-                                      oa->o_flags : 0);
-       rc = osc_checksum_bulk(aa->aa_requested_nob, aa->aa_page_count,
-                              aa->aa_ppga, OST_WRITE, cksum_type,
-                              &new_cksum);
+       cksum_type = obd_cksum_type_unpack(oa->o_valid & OBD_MD_FLFLAGS ?
+                                          oa->o_flags : 0);
+
+       switch (cksum_type) {
+       case OBD_CKSUM_T10IP512:
+               t10pi = true;
+               fn = obd_dif_ip_fn;
+               sector_size = 512;
+               break;
+       case OBD_CKSUM_T10IP4K:
+               t10pi = true;
+               fn = obd_dif_ip_fn;
+               sector_size = 4096;
+               break;
+       case OBD_CKSUM_T10CRC512:
+               t10pi = true;
+               fn = obd_dif_crc_fn;
+               sector_size = 512;
+               break;
+       case OBD_CKSUM_T10CRC4K:
+               t10pi = true;
+               fn = obd_dif_crc_fn;
+               sector_size = 4096;
+               break;
+       default:
+               break;
+       }
+
+       if (t10pi)
+               rc = osc_checksum_bulk_t10pi(obd_name, aa->aa_requested_nob,
+                                            aa->aa_page_count,
+                                            aa->aa_ppga,
+                                            OST_WRITE,
+                                            fn,
+                                            sector_size,
+                                            &new_cksum);
+       else
+               rc = osc_checksum_bulk(aa->aa_requested_nob, aa->aa_page_count,
+                                      aa->aa_ppga, OST_WRITE, cksum_type,
+                                      &new_cksum);
 
        if (rc < 0)
                msg = "failed to calculate the client write checksum";
-       else if (cksum_type != cksum_type_unpack(aa->aa_oa->o_flags))
+       else if (cksum_type != obd_cksum_type_unpack(aa->aa_oa->o_flags))
                 msg = "the server did not use the checksum type specified in "
                       "the original request - likely a protocol problem";
         else if (new_cksum == server_cksum)
@@ -1467,15 +1631,15 @@ check_write_checksum(struct obdo *oa, const struct lnet_process_id *peer,
                           DFID " object "DOSTID" extent [%llu-%llu], original "
                           "client csum %x (type %x), server csum %x (type %x),"
                           " client csum now %x\n",
-                          aa->aa_cli->cl_import->imp_obd->obd_name,
-                          msg, libcfs_nid2str(peer->nid),
+                          obd_name, msg, libcfs_nid2str(peer->nid),
                           oa->o_valid & OBD_MD_FLFID ? oa->o_parent_seq : (__u64)0,
                           oa->o_valid & OBD_MD_FLFID ? oa->o_parent_oid : 0,
                           oa->o_valid & OBD_MD_FLFID ? oa->o_parent_ver : 0,
                           POSTID(&oa->o_oi), aa->aa_ppga[0]->off,
                           aa->aa_ppga[aa->aa_page_count - 1]->off +
                                aa->aa_ppga[aa->aa_page_count-1]->count - 1,
-                          client_cksum, cksum_type_unpack(aa->aa_oa->o_flags),
+                          client_cksum,
+                          obd_cksum_type_unpack(aa->aa_oa->o_flags),
                           server_cksum, cksum_type, new_cksum);
        return 1;
 }
@@ -1483,11 +1647,12 @@ check_write_checksum(struct obdo *oa, const struct lnet_process_id *peer,
 /* Note rc enters this function as number of bytes transferred */
 static int osc_brw_fini_request(struct ptlrpc_request *req, int rc)
 {
-        struct osc_brw_async_args *aa = (void *)&req->rq_async_args;
+       struct osc_brw_async_args *aa = (void *)&req->rq_async_args;
+       struct client_obd *cli = aa->aa_cli;
+       const char *obd_name = cli->cl_import->imp_obd->obd_name;
        const struct lnet_process_id *peer =
-                        &req->rq_import->imp_connection->c_peer;
-        struct client_obd *cli = aa->aa_cli;
-        struct ost_body *body;
+               &req->rq_import->imp_connection->c_peer;
+       struct ost_body *body;
        u32 client_cksum = 0;
         ENTRY;
 
@@ -1607,16 +1772,16 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc)
                char      *via = "";
                char      *router = "";
                enum cksum_types cksum_type;
-
-                cksum_type = cksum_type_unpack(body->oa.o_valid &OBD_MD_FLFLAGS?
-                                               body->oa.o_flags : 0);
-               rc = osc_checksum_bulk(rc, aa->aa_page_count, aa->aa_ppga,
-                                      OST_READ, cksum_type, &client_cksum);
-               if (rc < 0) {
-                       CDEBUG(D_PAGE,
-                              "failed to calculate checksum, rc = %d\n", rc);
+               u32 o_flags = body->oa.o_valid & OBD_MD_FLFLAGS ?
+                       body->oa.o_flags : 0;
+
+               cksum_type = obd_cksum_type_unpack(o_flags);
+               rc = osc_checksum_bulk_rw(obd_name, cksum_type, rc,
+                                         aa->aa_page_count, aa->aa_ppga,
+                                         OST_READ, &client_cksum);
+               if (rc < 0)
                        GOTO(out, rc);
-               }
+
                if (req->rq_bulk != NULL &&
                    peer->nid != req->rq_bulk->bd_sender) {
                        via = " via ";
@@ -1638,7 +1803,7 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc)
                                           "%s%s%s inode "DFID" object "DOSTID
                                           " extent [%llu-%llu], client %x, "
                                           "server %x, cksum_type %x\n",
-                                          req->rq_import->imp_obd->obd_name,
+                                          obd_name,
                                           libcfs_nid2str(peer->nid),
                                           via, router,
                                           clbody->oa.o_valid & OBD_MD_FLFID ?
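
The switch added to check_write_checksum() above pairs each T10PI checksum type with a guard-tag function and a sector size, and the patch wraps the same pairing in obd_t10_cksum2dif() for the other callers. A minimal standalone sketch of that mapping as a lookup table is given here. The OBD_CKSUM_T10* values are the ones asserted in lustre_assert_wire_constants() by this patch; `enum dif_kind`, `struct t10_params`, `t10_params_lookup()` and `t10_params_selfcheck()` are hypothetical names used for illustration only and are not Lustre code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Wire values as asserted in lustre_assert_wire_constants() by this patch. */
#define OBD_CKSUM_T10IP512	0x00000010U
#define OBD_CKSUM_T10IP4K	0x00000020U
#define OBD_CKSUM_T10CRC512	0x00000040U
#define OBD_CKSUM_T10CRC4K	0x00000080U

/* Hypothetical stand-in for the obd_dif_ip_fn / obd_dif_crc_fn pointers. */
enum dif_kind { DIF_NONE, DIF_IP, DIF_CRC };

/* Hypothetical illustration type, not a Lustre structure. */
struct t10_params {
	uint32_t	cksum_type;
	enum dif_kind	kind;
	int		sector_size;
};

static const struct t10_params t10_table[] = {
	{ OBD_CKSUM_T10IP512,  DIF_IP,  512  },
	{ OBD_CKSUM_T10IP4K,   DIF_IP,  4096 },
	{ OBD_CKSUM_T10CRC512, DIF_CRC, 512  },
	{ OBD_CKSUM_T10CRC4K,  DIF_CRC, 4096 },
};

/*
 * Fill *out and return true when cksum_type is one of the four T10PI
 * types. Return false and leave *out untouched for any other type, in
 * which case a caller would fall back to the plain bulk checksum.
 */
bool t10_params_lookup(uint32_t cksum_type, struct t10_params *out)
{
	size_t i;

	for (i = 0; i < sizeof(t10_table) / sizeof(t10_table[0]); i++) {
		if (t10_table[i].cksum_type == cksum_type) {
			*out = t10_table[i];
			return true;
		}
	}
	return false;
}

/* Usage example: one T10PI type and one non-T10PI type (CRC32C, 0x4). */
void t10_params_selfcheck(void)
{
	struct t10_params p = { 0, DIF_NONE, 0 };

	assert(t10_params_lookup(OBD_CKSUM_T10IP512, &p));
	assert(p.kind == DIF_IP && p.sector_size == 512);
	assert(!t10_params_lookup(0x00000004U, &p));
}
```

A table keeps the four cases in one place, whereas the switch in the patch repeats the three assignments per case; the behaviour is intended to be the same.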
index 80814a8..70ae175 100644 (file)
@@ -2146,17 +2146,22 @@ out:
  * Concurrency: doesn't access mutable data.
  */
 static void osd_conf_get(const struct lu_env *env,
-                         const struct dt_device *dev,
-                         struct dt_device_param *param)
+                        const struct dt_device *dev,
+                        struct dt_device_param *param)
 {
-        struct super_block *sb = osd_sb(osd_dt_dev(dev));
-       int                ea_overhead;
+       struct osd_device *d = osd_dt_dev(dev);
+       struct super_block *sb = osd_sb(d);
+       struct block_device *bdev = sb->s_bdev;
+       struct blk_integrity *bi = bdev_get_integrity(bdev);
+       unsigned short interval;
+       int ea_overhead;
+       const char *name;
 
-        /*
-         * XXX should be taken from not-yet-existing fs abstraction layer.
-         */
-        param->ddp_max_name_len = LDISKFS_NAME_LEN;
-        param->ddp_max_nlink    = LDISKFS_LINK_MAX;
+       /*
+        * XXX should be taken from not-yet-existing fs abstraction layer.
+        */
+       param->ddp_max_name_len = LDISKFS_NAME_LEN;
+       param->ddp_max_nlink    = LDISKFS_LINK_MAX;
        param->ddp_symlink_max  = sb->s_blocksize;
        param->ddp_mount_type     = LDD_MT_LDISKFS;
        if (ldiskfs_has_feature_extents(sb))
@@ -2205,6 +2210,43 @@ static void osd_conf_get(const struct lu_env *env,
        else
 #endif
                param->ddp_brw_size = DT_DEF_BRW_SIZE;
+
+       param->ddp_t10_cksum_type = 0;
+       if (bi) {
+               interval = blk_integrity_interval(bi);
+               name = blk_integrity_name(bi);
+               /*
+                * Expected values:
+                * T10-DIF-TYPE1-CRC
+                * T10-DIF-TYPE3-CRC
+                * T10-DIF-TYPE1-IP
+                * T10-DIF-TYPE3-IP
+                */
+               if (strncmp(name, "T10-DIF-TYPE",
+                           sizeof("T10-DIF-TYPE") - 1) == 0) {
+                       /* also skip "1/3-" at end */
+                       const int type_off = sizeof("T10-DIF-TYPE.");
+
+                       if (interval != 512 && interval != 4096)
+                               CERROR("%s: unsupported T10PI sector size %u\n",
+                                      d->od_svname, interval);
+                       else if (strcmp(name + type_off, "CRC") == 0)
+                               param->ddp_t10_cksum_type = interval == 512 ?
+                                       OBD_CKSUM_T10CRC512 :
+                                       OBD_CKSUM_T10CRC4K;
+                       else if (strcmp(name + type_off, "IP") == 0)
+                               param->ddp_t10_cksum_type = interval == 512 ?
+                                       OBD_CKSUM_T10IP512 :
+                                       OBD_CKSUM_T10IP4K;
+                       else
+                               CERROR("%s: unsupported checksum type of "
+                                      "T10PI type '%s'\n",
+                                      d->od_svname, name);
+               } else {
+                       CERROR("%s: unsupported T10PI type '%s'\n",
+                              d->od_svname, name);
+               }
+       }
 }
 
 /*
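
The osd_conf_get() hunk above derives the T10PI checksum type from the block integrity profile name and protection interval. The offset arithmetic is easy to misread: sizeof("T10-DIF-TYPE.") counts the 12-character prefix, one placeholder character for the type digit, and the terminating NUL, which stands in for the '-' that follows the digit, giving 14. A standalone userspace sketch of the same parsing is given here. `t10_cksum_from_profile()` and `t10_profile_selfcheck()` are hypothetical names, and the explicit length guard is an addition for safety outside the kernel, where profile names are not guaranteed to be well formed.

```c
#include <assert.h>
#include <string.h>

/* Wire values as asserted in lustre_assert_wire_constants() by this patch. */
#define OBD_CKSUM_T10IP512	0x00000010U
#define OBD_CKSUM_T10IP4K	0x00000020U
#define OBD_CKSUM_T10CRC512	0x00000040U
#define OBD_CKSUM_T10CRC4K	0x00000080U

/*
 * Hypothetical helper: map a profile name such as "T10-DIF-TYPE1-CRC" and
 * a protection interval in bytes to a T10PI checksum type.
 * Returns 0 when the profile or the interval is unsupported.
 */
unsigned int t10_cksum_from_profile(const char *name, unsigned int interval)
{
	static const char prefix[] = "T10-DIF-TYPE";
	/* 12 prefix chars + type digit + '-' == 14 == sizeof("T10-DIF-TYPE.") */
	const size_t type_off = sizeof("T10-DIF-TYPE.");

	if (strncmp(name, prefix, sizeof(prefix) - 1) != 0)
		return 0;
	/* Added guard: never read past the end of a short name. */
	if (strlen(name) < type_off)
		return 0;
	if (interval != 512 && interval != 4096)
		return 0;

	if (strcmp(name + type_off, "CRC") == 0)
		return interval == 512 ? OBD_CKSUM_T10CRC512 :
					 OBD_CKSUM_T10CRC4K;
	if (strcmp(name + type_off, "IP") == 0)
		return interval == 512 ? OBD_CKSUM_T10IP512 :
					 OBD_CKSUM_T10IP4K;
	return 0;
}

/* Usage example covering a supported and an unsupported profile. */
void t10_profile_selfcheck(void)
{
	assert(t10_cksum_from_profile("T10-DIF-TYPE1-CRC", 512) ==
	       OBD_CKSUM_T10CRC512);
	assert(t10_cksum_from_profile("T10-DIF-TYPE1-CRC", 520) == 0);
}
```

Note that the type digit (1 or 3) is skipped rather than inspected, so TYPE1 and TYPE3 profiles with the same guard algorithm and interval map to the same checksum type, as in the patch.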
index 363cf2d..658f339 100644 (file)
@@ -791,9 +791,9 @@ static int ptlrpc_busy_reconnect(int rc)
 }
 
 static int ptlrpc_connect_set_flags(struct obd_import *imp,
-                                    struct obd_connect_data *ocd,
-                                    __u64 old_connect_flags,
-                                    struct obd_export *exp, int init_connect)
+                                   struct obd_connect_data *ocd,
+                                   __u64 old_connect_flags,
+                                   struct obd_export *exp, int init_connect)
 {
        static bool warned;
        struct client_obd *cli = &imp->imp_obd->u.cli;
@@ -836,13 +836,13 @@ static int ptlrpc_connect_set_flags(struct obd_import *imp,
                 * for algorithms we understand. The server masked off
                 * the checksum types it doesn't support */
                if ((ocd->ocd_cksum_types &
-                    cksum_types_supported_client()) == 0) {
+                    obd_cksum_types_supported_client()) == 0) {
                        LCONSOLE_ERROR("The negotiation of the checksum "
                                       "alogrithm to use with server %s "
                                       "failed (%x/%x)\n",
                                       obd2cli_tgt(imp->imp_obd),
                                       ocd->ocd_cksum_types,
-                                      cksum_types_supported_client());
+                                      obd_cksum_types_supported_client());
                        return -EPROTO;
                } else {
                        cli->cl_supp_cksum_types = ocd->ocd_cksum_types;
@@ -852,7 +852,8 @@ static int ptlrpc_connect_set_flags(struct obd_import *imp,
                 * Enforce ADLER for backward compatibility*/
                cli->cl_supp_cksum_types = OBD_CKSUM_ADLER;
        }
-       cli->cl_cksum_type = cksum_type_select(cli->cl_supp_cksum_types);
+       cli->cl_cksum_type = obd_cksum_type_select(imp->imp_obd->obd_name,
+                                                  cli->cl_supp_cksum_types);
 
        if (ocd->ocd_connect_flags & OBD_CONNECT_BRW_SIZE)
                cli->cl_max_pages_per_rpc =
index fc97377..0adfdba 100644 (file)
@@ -1314,6 +1314,18 @@ void lustre_assert_wire_constants(void)
                (unsigned)OBD_CKSUM_ADLER);
        LASSERTF(OBD_CKSUM_CRC32C == 0x00000004UL, "found 0x%.8xUL\n",
                (unsigned)OBD_CKSUM_CRC32C);
+       LASSERTF(OBD_CKSUM_RESERVED == 0x00000008UL, "found 0x%.8xUL\n",
+               (unsigned)OBD_CKSUM_RESERVED);
+       LASSERTF(OBD_CKSUM_T10IP512 == 0x00000010UL, "found 0x%.8xUL\n",
+               (unsigned)OBD_CKSUM_T10IP512);
+       LASSERTF(OBD_CKSUM_T10IP4K == 0x00000020UL, "found 0x%.8xUL\n",
+               (unsigned)OBD_CKSUM_T10IP4K);
+       LASSERTF(OBD_CKSUM_T10CRC512 == 0x00000040UL, "found 0x%.8xUL\n",
+               (unsigned)OBD_CKSUM_T10CRC512);
+       LASSERTF(OBD_CKSUM_T10CRC4K == 0x00000080UL, "found 0x%.8xUL\n",
+               (unsigned)OBD_CKSUM_T10CRC4K);
+       LASSERTF(OBD_CKSUM_T10_TOP == 0x00000002UL, "found 0x%.8xUL\n",
+               (unsigned)OBD_CKSUM_T10_TOP);
 
        /* Checks for struct ost_layout */
        LASSERTF((int)sizeof(struct ost_layout) == 28, "found %lld\n",
@@ -1566,7 +1578,10 @@ void lustre_assert_wire_constants(void)
        CLASSERT(OBD_FL_CKSUM_CRC32 == 0x00001000);
        CLASSERT(OBD_FL_CKSUM_ADLER == 0x00002000);
        CLASSERT(OBD_FL_CKSUM_CRC32C == 0x00004000);
-       CLASSERT(OBD_FL_CKSUM_RSVD2 == 0x00008000);
+       CLASSERT(OBD_FL_CKSUM_T10IP512 == 0x00005000);
+       CLASSERT(OBD_FL_CKSUM_T10IP4K == 0x00006000);
+       CLASSERT(OBD_FL_CKSUM_T10CRC512 == 0x00007000);
+       CLASSERT(OBD_FL_CKSUM_T10CRC4K == 0x00008000);
        CLASSERT(OBD_FL_CKSUM_RSVD3 == 0x00010000);
        CLASSERT(OBD_FL_SHRINK_GRANT == 0x00020000);
        CLASSERT(OBD_FL_MMAP == 0x00040000);
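
The wire constants above show that the new OBD_FL_CKSUM_T10* encodings (0x5000 to 0x8000) are no longer independent bits: 0x5000, for example, equals OBD_FL_CKSUM_CRC32 | OBD_FL_CKSUM_CRC32C. The field therefore has to be decoded by comparing the whole masked value, not by testing single bits. The sketch below illustrates that idea only. It is not the obd_cksum_type_pack()/obd_cksum_type_unpack() implementation from this patch; `FL_CKSUM_FIELD`, `cksum_flags_pack()`, `cksum_flags_unpack()` and `cksum_flags_selfcheck()` are hypothetical, the 0xf000 field mask is an assumption, and the OBD_CKSUM_CRC32 and OBD_CKSUM_ADLER values are assumed because their asserted values fall outside the hunk shown here. Unknown encodings return 0 for simplicity.

```c
#include <assert.h>
#include <stdint.h>

/* Negotiation bitmask values (OBD_CKSUM_*). CRC32 and ADLER are assumed. */
#define OBD_CKSUM_CRC32		0x00000001U
#define OBD_CKSUM_ADLER		0x00000002U
#define OBD_CKSUM_CRC32C	0x00000004U
#define OBD_CKSUM_T10IP512	0x00000010U
#define OBD_CKSUM_T10IP4K	0x00000020U
#define OBD_CKSUM_T10CRC512	0x00000040U
#define OBD_CKSUM_T10CRC4K	0x00000080U

/* obdo o_flags encodings (OBD_FL_CKSUM_*), as asserted by this patch. */
#define OBD_FL_CKSUM_CRC32	0x00001000U
#define OBD_FL_CKSUM_ADLER	0x00002000U
#define OBD_FL_CKSUM_CRC32C	0x00004000U
#define OBD_FL_CKSUM_T10IP512	0x00005000U
#define OBD_FL_CKSUM_T10IP4K	0x00006000U
#define OBD_FL_CKSUM_T10CRC512	0x00007000U
#define OBD_FL_CKSUM_T10CRC4K	0x00008000U

/* ASSUMPTION: the checksum encoding occupies exactly these four bits. */
#define FL_CKSUM_FIELD		0x0000f000U

/* Hypothetical: one checksum type bit -> o_flags encoding (0 if unknown). */
uint32_t cksum_flags_pack(uint32_t cksum_type)
{
	switch (cksum_type) {
	case OBD_CKSUM_CRC32:     return OBD_FL_CKSUM_CRC32;
	case OBD_CKSUM_ADLER:     return OBD_FL_CKSUM_ADLER;
	case OBD_CKSUM_CRC32C:    return OBD_FL_CKSUM_CRC32C;
	case OBD_CKSUM_T10IP512:  return OBD_FL_CKSUM_T10IP512;
	case OBD_CKSUM_T10IP4K:   return OBD_FL_CKSUM_T10IP4K;
	case OBD_CKSUM_T10CRC512: return OBD_FL_CKSUM_T10CRC512;
	case OBD_CKSUM_T10CRC4K:  return OBD_FL_CKSUM_T10CRC4K;
	default:                  return 0;
	}
}

/* Hypothetical: o_flags -> checksum type, comparing the whole field. */
uint32_t cksum_flags_unpack(uint32_t o_flags)
{
	switch (o_flags & FL_CKSUM_FIELD) {
	case OBD_FL_CKSUM_CRC32:     return OBD_CKSUM_CRC32;
	case OBD_FL_CKSUM_ADLER:     return OBD_CKSUM_ADLER;
	case OBD_FL_CKSUM_CRC32C:    return OBD_CKSUM_CRC32C;
	case OBD_FL_CKSUM_T10IP512:  return OBD_CKSUM_T10IP512;
	case OBD_FL_CKSUM_T10IP4K:   return OBD_CKSUM_T10IP4K;
	case OBD_FL_CKSUM_T10CRC512: return OBD_CKSUM_T10CRC512;
	case OBD_FL_CKSUM_T10CRC4K:  return OBD_CKSUM_T10CRC4K;
	default:                     return 0;
	}
}

/* Usage example: 0x5000 must decode as T10IP512, not as CRC32 or CRC32C. */
void cksum_flags_selfcheck(void)
{
	assert(cksum_flags_unpack(0x00005000U) == OBD_CKSUM_T10IP512);
	assert(cksum_flags_unpack(cksum_flags_pack(OBD_CKSUM_ADLER)) ==
	       OBD_CKSUM_ADLER);
}
```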
index 79c4837..aad2aaa 100644 (file)
@@ -1885,8 +1885,8 @@ static int check_read_checksum(struct niobuf_local *local_nb, int npages,
                dump_all_bulk_pages(oa, npages, local_nb, server_cksum,
                                    client_cksum);
 
-       cksum_type = cksum_type_unpack(oa->o_valid & OBD_MD_FLFLAGS ?
-                                      oa->o_flags : 0);
+       cksum_type = obd_cksum_type_unpack(oa->o_valid & OBD_MD_FLFLAGS ?
+                                          oa->o_flags : 0);
 
        if (cksum_type != server_cksum_type)
                msg = "the server may have not used the checksum type specified"
@@ -1937,6 +1937,162 @@ static int tgt_pages2shortio(struct niobuf_local *local, int npages,
        return copied - size;
 }
 
+static int tgt_checksum_niobuf_t10pi(struct lu_target *tgt,
+                                    struct niobuf_local *local_nb,
+                                    int npages, int opc,
+                                    obd_dif_csum_fn *fn,
+                                    int sector_size,
+                                    u32 *check_sum)
+{
+       unsigned char cfs_alg = cksum_obd2cfs(OBD_CKSUM_T10_TOP);
+       const char *obd_name = tgt->lut_obd->obd_name;
+       struct cfs_crypto_hash_desc *hdesc;
+       unsigned int bufsize;
+       unsigned char *buffer;
+       struct page *__page;
+       __u16 *guard_start;
+       int guard_number;
+       int used_number = 0;
+       __u32 cksum;
+       int rc = 0;
+       int used;
+       int i;
+
+       __page = alloc_page(GFP_KERNEL);
+       if (__page == NULL)
+               return -ENOMEM;
+
+       hdesc = cfs_crypto_hash_init(cfs_alg, NULL, 0);
+       if (IS_ERR(hdesc)) {
+               CERROR("%s: unable to initialize checksum hash %s\n",
+                      tgt_name(tgt), cfs_crypto_hash_name(cfs_alg));
+               GOTO(out, rc = PTR_ERR(hdesc));
+       }
+
+       buffer = kmap(__page);
+       guard_start = (__u16 *)buffer;
+       guard_number = PAGE_SIZE / sizeof(*guard_start);
+       for (i = 0; i < npages; i++) {
+               /* corrupt the data before we compute the checksum, to
+                * simulate a client->OST data error */
+               if (i == 0 && opc == OST_WRITE &&
+                   OBD_FAIL_CHECK(OBD_FAIL_OST_CHECKSUM_RECEIVE)) {
+                       int off = local_nb[i].lnb_page_offset & ~PAGE_MASK;
+                       int len = local_nb[i].lnb_len;
+                       struct page *np = tgt_page_to_corrupt;
+
+                       if (np) {
+                               char *ptr = ll_kmap_atomic(local_nb[i].lnb_page,
+                                                       KM_USER0);
+                               char *ptr2 = page_address(np);
+
+                               memcpy(ptr2 + off, ptr + off, len);
+                               memcpy(ptr2 + off, "bad3", min(4, len));
+                               ll_kunmap_atomic(ptr, KM_USER0);
+
+                               /* LU-8376 to preserve original index for
+                                * display in dump_all_bulk_pages() */
+                               np->index = i;
+
+                               cfs_crypto_hash_update_page(hdesc, np, off,
+                                                           len);
+                               continue;
+                       } else {
+                               CERROR("%s: can't alloc page for corruption\n",
+                                      tgt_name(tgt));
+                       }
+               }
+
+               /*
+                * The left guard number should be able to hold checksums of a
+                * whole page
+                */
+               rc = obd_page_dif_generate_buffer(obd_name,
+                       local_nb[i].lnb_page,
+                       local_nb[i].lnb_page_offset & ~PAGE_MASK,
+                       local_nb[i].lnb_len, guard_start + used_number,
+                       guard_number - used_number, &used, sector_size,
+                       fn);
+               if (rc)
+                       break;
+
+               used_number += used;
+               if (used_number == guard_number) {
+                       cfs_crypto_hash_update_page(hdesc, __page, 0,
+                               used_number * sizeof(*guard_start));
+                       used_number = 0;
+               }
+
+                /* corrupt the data after we compute the checksum, to
+                * simulate an OST->client data error */
+               if (unlikely(i == 0 && opc == OST_READ &&
+                            OBD_FAIL_CHECK(OBD_FAIL_OST_CHECKSUM_SEND))) {
+                       int off = local_nb[i].lnb_page_offset & ~PAGE_MASK;
+                       int len = local_nb[i].lnb_len;
+                       struct page *np = tgt_page_to_corrupt;
+
+                       if (np) {
+                               char *ptr = ll_kmap_atomic(local_nb[i].lnb_page,
+                                                       KM_USER0);
+                               char *ptr2 = page_address(np);
+
+                               memcpy(ptr2 + off, ptr + off, len);
+                               memcpy(ptr2 + off, "bad4", min(4, len));
+                               ll_kunmap_atomic(ptr, KM_USER0);
+
+                               /* LU-8376 to preserve original index for
+                                * display in dump_all_bulk_pages() */
+                               np->index = i;
+
+                               cfs_crypto_hash_update_page(hdesc, np, off,
+                                                           len);
+                               continue;
+                       } else {
+                               CERROR("%s: can't alloc page for corruption\n",
+                                      tgt_name(tgt));
+                       }
+               }
+       }
+       kunmap(__page);
+       if (rc)
+               GOTO(out, rc);
+
+       if (used_number != 0)
+               cfs_crypto_hash_update_page(hdesc, __page, 0,
+                       used_number * sizeof(*guard_start));
+
+       bufsize = sizeof(cksum);
+       rc = cfs_crypto_hash_final(hdesc, (unsigned char *)&cksum, &bufsize);
+
+       if (rc == 0)
+               *check_sum = cksum;
+out:
+       __free_page(__page);
+       return rc;
+}
+
+static int tgt_checksum_niobuf_rw(struct lu_target *tgt,
+                                 enum cksum_types cksum_type,
+                                 struct niobuf_local *local_nb,
+                                 int npages, int opc, u32 *check_sum)
+{
+       obd_dif_csum_fn *fn = NULL;
+       int sector_size = 0;
+       int rc;
+
+       ENTRY;
+       obd_t10_cksum2dif(cksum_type, &fn, &sector_size);
+
+       if (fn)
+               rc = tgt_checksum_niobuf_t10pi(tgt, local_nb, npages,
+                                              opc, fn, sector_size,
+                                              check_sum);
+       else
+               rc = tgt_checksum_niobuf(tgt, local_nb, npages, opc,
+                                        cksum_type, check_sum);
+       RETURN(rc);
+}
+
 int tgt_brw_read(struct tgt_session_info *tsi)
 {
        struct ptlrpc_request   *req = tgt_ses_req(tsi);
@@ -1951,6 +2107,7 @@ int tgt_brw_read(struct tgt_session_info *tsi)
        int                      npages, nob = 0, rc, i, no_reply = 0,
                                 npages_read;
        struct tgt_thread_big_cache *tbc = req->rq_svc_thread->t_data;
+       const char *obd_name = exp->exp_obd->obd_name;
 
        ENTRY;
 
@@ -2072,18 +2229,19 @@ int tgt_brw_read(struct tgt_session_info *tsi)
                rc = -E2BIG;
 
        if (body->oa.o_valid & OBD_MD_FLCKSUM) {
-               enum cksum_types cksum_type =
-                       cksum_type_unpack(body->oa.o_valid & OBD_MD_FLFLAGS ?
-                                         body->oa.o_flags : 0);
+               u32 flag = body->oa.o_valid & OBD_MD_FLFLAGS ?
+                          body->oa.o_flags : 0;
+               enum cksum_types cksum_type = obd_cksum_type_unpack(flag);
 
-               repbody->oa.o_flags = cksum_type_pack(cksum_type);
+               repbody->oa.o_flags = obd_cksum_type_pack(obd_name,
+                                                         cksum_type);
                repbody->oa.o_valid = OBD_MD_FLCKSUM | OBD_MD_FLFLAGS;
-               rc = tgt_checksum_niobuf(tsi->tsi_tgt, local_nb,
-                                        npages_read, OST_READ, cksum_type,
-                                        &repbody->oa.o_cksum);
+
+               rc = tgt_checksum_niobuf_rw(tsi->tsi_tgt, cksum_type,
+                                           local_nb, npages_read, OST_READ,
+                                           &repbody->oa.o_cksum);
                if (rc < 0)
                        GOTO(out_commitrw, rc);
-
                CDEBUG(D_PAGE, "checksum at read origin: %x\n",
                       repbody->oa.o_cksum);
 
@@ -2150,7 +2308,7 @@ out_lock:
                ptlrpc_req_drop_rs(req);
                LCONSOLE_WARN("%s: Bulk IO read error with %s (at %s), "
                              "client will retry: rc %d\n",
-                             exp->exp_obd->obd_name,
+                             obd_name,
                              obd_uuid2str(&exp->exp_client_uuid),
                              obd_export_nid2str(exp), rc);
        }
@@ -2266,6 +2424,7 @@ int tgt_brw_write(struct tgt_session_info *tsi)
        bool                     no_reply = false, mmap;
        struct tgt_thread_big_cache *tbc = req->rq_svc_thread->t_data;
        bool wait_sync = false;
+       const char *obd_name = exp->exp_obd->obd_name;
 
        ENTRY;
 
@@ -2418,14 +2577,16 @@ skip_transfer:
                static int cksum_counter;
 
                if (body->oa.o_valid & OBD_MD_FLFLAGS)
-                       cksum_type = cksum_type_unpack(body->oa.o_flags);
+                       cksum_type = obd_cksum_type_unpack(body->oa.o_flags);
 
                repbody->oa.o_valid |= OBD_MD_FLCKSUM | OBD_MD_FLFLAGS;
                repbody->oa.o_flags &= ~OBD_FL_CKSUM_ALL;
-               repbody->oa.o_flags |= cksum_type_pack(cksum_type);
-               rc = tgt_checksum_niobuf(tsi->tsi_tgt, local_nb,
-                                        npages, OST_WRITE, cksum_type,
-                                        &repbody->oa.o_cksum);
+               repbody->oa.o_flags |= obd_cksum_type_pack(obd_name,
+                                                          cksum_type);
+
+               rc = tgt_checksum_niobuf_rw(tsi->tsi_tgt, cksum_type,
+                                           local_nb, npages, OST_WRITE,
+                                           &repbody->oa.o_cksum);
                if (rc < 0)
                        GOTO(out_commitrw, rc);
 
@@ -2503,7 +2664,7 @@ out:
                if (!exp->exp_obd->obd_no_transno)
                        LCONSOLE_WARN("%s: Bulk IO write error with %s (at %s),"
                                      " client will retry: rc = %d\n",
-                                     exp->exp_obd->obd_name,
+                                     obd_name,
                                      obd_uuid2str(&exp->exp_client_uuid),
                                      obd_export_nid2str(exp), rc);
        }
index 8f7bb5c..126f73a 100755
@@ -6947,8 +6947,8 @@ set_checksums()
 
 export ORIG_CSUM_TYPE="`lctl get_param -n osc.*osc-[^mM]*.checksum_type |
                         sed 's/.*\[\(.*\)\].*/\1/g' | head -n1`"
-CKSUM_TYPES=${CKSUM_TYPES:-"crc32 adler"}
-[ "$ORIG_CSUM_TYPE" = "crc32c" ] && CKSUM_TYPES="$CKSUM_TYPES crc32c"
+CKSUM_TYPES=${CKSUM_TYPES:-$(lctl get_param -n osc.*osc-[^mM]*.checksum_type |
+                            tr -d [] | head -n1)}
 set_checksum_type()
 {
        lctl set_param -n osc.*osc-[^mM]*.checksum_type $1
index 8816d02..67efc67 100644
@@ -594,6 +594,12 @@ check_obd_connect_data(void)
        CHECK_VALUE_X(OBD_CKSUM_CRC32);
        CHECK_VALUE_X(OBD_CKSUM_ADLER);
        CHECK_VALUE_X(OBD_CKSUM_CRC32C);
+       CHECK_VALUE_X(OBD_CKSUM_RESERVED);
+       CHECK_VALUE_X(OBD_CKSUM_T10IP512);
+       CHECK_VALUE_X(OBD_CKSUM_T10IP4K);
+       CHECK_VALUE_X(OBD_CKSUM_T10CRC512);
+       CHECK_VALUE_X(OBD_CKSUM_T10CRC4K);
+       CHECK_VALUE_X(OBD_CKSUM_T10_TOP);
 }
 
 static void
@@ -704,7 +710,10 @@ check_obdo(void)
        CHECK_CVALUE_X(OBD_FL_CKSUM_CRC32);
        CHECK_CVALUE_X(OBD_FL_CKSUM_ADLER);
        CHECK_CVALUE_X(OBD_FL_CKSUM_CRC32C);
-       CHECK_CVALUE_X(OBD_FL_CKSUM_RSVD2);
+       CHECK_CVALUE_X(OBD_FL_CKSUM_T10IP512);
+       CHECK_CVALUE_X(OBD_FL_CKSUM_T10IP4K);
+       CHECK_CVALUE_X(OBD_FL_CKSUM_T10CRC512);
+       CHECK_CVALUE_X(OBD_FL_CKSUM_T10CRC4K);
        CHECK_CVALUE_X(OBD_FL_CKSUM_RSVD3);
        CHECK_CVALUE_X(OBD_FL_SHRINK_GRANT);
        CHECK_CVALUE_X(OBD_FL_MMAP);
index 8661cdf..777e224 100644
@@ -1335,6 +1335,18 @@ void lustre_assert_wire_constants(void)
                (unsigned)OBD_CKSUM_ADLER);
        LASSERTF(OBD_CKSUM_CRC32C == 0x00000004UL, "found 0x%.8xUL\n",
                (unsigned)OBD_CKSUM_CRC32C);
+       LASSERTF(OBD_CKSUM_RESERVED == 0x00000008UL, "found 0x%.8xUL\n",
+               (unsigned)OBD_CKSUM_RESERVED);
+       LASSERTF(OBD_CKSUM_T10IP512 == 0x00000010UL, "found 0x%.8xUL\n",
+               (unsigned)OBD_CKSUM_T10IP512);
+       LASSERTF(OBD_CKSUM_T10IP4K == 0x00000020UL, "found 0x%.8xUL\n",
+               (unsigned)OBD_CKSUM_T10IP4K);
+       LASSERTF(OBD_CKSUM_T10CRC512 == 0x00000040UL, "found 0x%.8xUL\n",
+               (unsigned)OBD_CKSUM_T10CRC512);
+       LASSERTF(OBD_CKSUM_T10CRC4K == 0x00000080UL, "found 0x%.8xUL\n",
+               (unsigned)OBD_CKSUM_T10CRC4K);
+       LASSERTF(OBD_CKSUM_T10_TOP == 0x00000002UL, "found 0x%.8xUL\n",
+               (unsigned)OBD_CKSUM_T10_TOP);
 
        /* Checks for struct ost_layout */
        LASSERTF((int)sizeof(struct ost_layout) == 28, "found %lld\n",
@@ -1587,7 +1599,10 @@ void lustre_assert_wire_constants(void)
        CLASSERT(OBD_FL_CKSUM_CRC32 == 0x00001000);
        CLASSERT(OBD_FL_CKSUM_ADLER == 0x00002000);
        CLASSERT(OBD_FL_CKSUM_CRC32C == 0x00004000);
-       CLASSERT(OBD_FL_CKSUM_RSVD2 == 0x00008000);
+       CLASSERT(OBD_FL_CKSUM_T10IP512 == 0x00005000);
+       CLASSERT(OBD_FL_CKSUM_T10IP4K == 0x00006000);
+       CLASSERT(OBD_FL_CKSUM_T10CRC512 == 0x00007000);
+       CLASSERT(OBD_FL_CKSUM_T10CRC4K == 0x00008000);
        CLASSERT(OBD_FL_CKSUM_RSVD3 == 0x00010000);
        CLASSERT(OBD_FL_SHRINK_GRANT == 0x00020000);
        CLASSERT(OBD_FL_MMAP == 0x00040000);