LU-10499 pcc: avoid dead lock for auto attach in PCC-RO [90/54390/9]
author Qian Yingjin <qian@ddn.com>
Wed, 12 May 2021 03:43:28 +0000 (11:43 +0800)
committer Oleg Drokin <green@whamcloud.com>
Sat, 13 Jul 2024 20:52:14 +0000 (20:52 +0000)
This patch releases the PCC inode lock around the call to
ll_layout_refresh() in @pcc_try_auto_attach(), because holding the
lock there can cause the following deadlock:
1. The client is writing or truncating a file that is in readonly
   state. It sends a write layout intent lock request to clear the
   readonly state of the layout on the MDT.
2. A read process tries to auto attach the file while holding the
   PCC inode lock. During auto attach it calls ll_layout_refresh().
   The client-side enqueue request for a layout lock returns a
   blocked lock, so the process sleeps and waits for the lock to
   be granted.
3. The MDT takes an EX layout lock to cancel all cached layout
   locks on the clients so that it can change the layout and clear
   the PCC-RO state.
4. When the client handles the revocation of the layout lock, it
   needs to invalidate the PCC state, which must be done under the
   protection of the PCC inode lock still held by the reader in
   step 2.
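
The locking change can be pictured with a small user-space model. This
is only a sketch, not the Lustre code: demo_inode, demo_layout_refresh()
and the drop_lock switch are hypothetical names, and a boolean stands in
for the PCC inode mutex so that the blocked case shows up as -EDEADLK
instead of a hang.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-inode PCC state. */
struct demo_inode {
    bool lock_held;   /* models the PCC inode lock */
    bool attached;    /* models "PCC inode has a valid layout" */
    int  mapcnt;      /* models lli_pcc_mapcnt */
};

static void demo_lock(struct demo_inode *di)   { di->lock_held = true; }
static void demo_unlock(struct demo_inode *di) { di->lock_held = false; }

/*
 * Models the layout-lock revocation handler: it must take the PCC
 * inode lock to invalidate the PCC state. A real mutex would block
 * here; the model reports -EDEADLK instead.
 */
static int demo_revoke(struct demo_inode *di)
{
    if (di->lock_held)
        return -EDEADLK;
    demo_lock(di);
    di->attached = false;
    demo_unlock(di);
    return 0;
}

/* Models ll_layout_refresh(): completes only after revocation ran. */
static int demo_layout_refresh(struct demo_inode *di)
{
    return demo_revoke(di);
}

/* Called with the lock held; returns with the lock held. */
int demo_try_auto_attach(struct demo_inode *di, bool drop_lock, bool *cached)
{
    int rc;

    assert(di->lock_held);
    *cached = false;

    if (drop_lock)
        demo_unlock(di);
    rc = demo_layout_refresh(di);
    if (drop_lock)
        demo_lock(di);
    if (rc)
        return rc;

    /* State may have changed while the lock was dropped: re-check. */
    if (di->attached) {
        *cached = true;
        return 0;
    }
    if (di->mapcnt > 0)
        return 0;

    di->attached = true;   /* stands in for the real attach work */
    *cached = true;
    return 0;
}
```

Calling demo_try_auto_attach() with drop_lock set to false reproduces
the stuck case in the model; with drop_lock set to true the refresh
completes and the state is re-checked once the lock is held again.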

EX-3191 pcc: add test for mmap | write | detach racer

This patch adds a racer test, sanity-pcc/test_99, that runs write,
read, mmap_cat, detach and unlink against each other.
Was-Change-Id: I5db160851a95937275fea6ae32f40dcd0fe69f46

EX-3478 pcc: avoid uninitialized pcc mutex lock in cleanup

Running racer concurrently crashed in the following way:
  RIP: 0010:[...]  [...] __list_add+0x1b/0xc0
  __mutex_lock_slowpath+0xa6/0x1d0
  mutex_lock+0x1f/0x2f
  pcc_inode_free+0x1e/0x60 [lustre]
  ll_clear_inode+0x64/0x6a0 [lustre]
  ll_delete_inode+0x5d/0x220 [lustre]
  evict+0xb4/0x180
  iput+0xfc/0x190
  ll_iget+0x156/0x350 [lustre]
  ll_prep_inode+0x212/0x9b0 [lustre]

After analysis, we found that the mutex @lli_pcc_lock was not
initialized, because ll_lli_init() had not been called to
initialize @lli.
When pcc_inode_free() is called, it calls mutex_lock() on the
uninitialized @lli_pcc_lock and crashes the kernel.
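
A minimal user-space sketch of the guard follows. It assumes that an
inode which never went through initialization has no PCC inode pointer,
so cleanup can return before it touches the lock. All demo_* names are
hypothetical, and a failed lock attempt returns false where the kernel
would crash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_pcc_inode { int refcount; };

struct demo_lli {
    bool lock_initialized;          /* set by demo_lli_init() only */
    bool lock_held;
    struct demo_pcc_inode *pcci;    /* NULL when nothing is attached */
};

/* Models ll_lli_init(): the only place the lock becomes usable. */
void demo_lli_init(struct demo_lli *lli)
{
    lli->lock_initialized = true;
    lli->lock_held = false;
    lli->pcci = NULL;
}

/* Returns false where a real kernel would crash in mutex_lock(). */
static bool demo_lock(struct demo_lli *lli)
{
    if (!lli->lock_initialized)
        return false;
    lli->lock_held = true;
    return true;
}

static void demo_unlock(struct demo_lli *lli)
{
    assert(lli->lock_held);
    lli->lock_held = false;
}

/*
 * Cleanup that is safe on a never-initialized inode: return before
 * touching the lock when there is no PCC state to release.
 * Returns true on success, false if it hit the uninitialized lock.
 */
bool demo_inode_free(struct demo_lli *lli)
{
    if (lli->pcci == NULL)
        return true;                /* nothing attached, lock untouched */

    if (!demo_lock(lli))
        return false;
    if (lli->pcci != NULL) {        /* re-check under the lock */
        lli->pcci->refcount--;
        lli->pcci = NULL;
    }
    demo_unlock(lli);
    return true;
}
```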

liblustreapi_pcc.c should set errno when it returns an error.
Was-Change-Id: I612c79a5b8eb4fa9daeb9e446a457e95c666c04a

EX-3636 pcc: reset file mapping for the file once mmapped

For a file that was once mmapped and cached on PCC, a new open sets
the mapping of the PCC copy's file handle (@file->f_mapping) to the
mapping of the Lustre file. When the file is detached from PCC, by
a manual detach or by layout lock shrinking, normal I/O
(read/write) auto-attaches the file into PCC again because the
layout version is unchanged. The file mapping
(@pcc_file->f_mapping) must then be reset to the mapping of the PCC
copy. Otherwise the kernel panics as follows:
[  935.516823] RIP: 0010:_raw_read_lock+0xa/0x20
[  935.517077]  ll_cl_find+0x19/0x60 [lustre]
[  935.517098]  ll_readpage+0x51/0x820 [lustre]
[  935.517110]  read_pages+0x122/0x190
[  935.517119]  __do_page_cache_readahead+0x1c1/0x1e0
[  935.517131]  ondemand_readahead+0x1f9/0x2c0
[  935.517142]  pagecache_get_page+0x30/0x2c0
[  935.517165]  generic_file_buffered_read+0x556/0xa00
[  935.517189]  pcc_try_auto_attach+0x3ac/0x400 [lustre]
[  935.517552]  pcc_io_init+0x146/0x560 [lustre]
[  935.517906]  pcc_file_read_iter+0x24d/0x2b0 [lustre]
[  935.518259]  ll_file_read_iter+0x74/0x2e0 [lustre]
[  935.518604]  new_sync_read+0x121/0x170
[  935.518937]  vfs_read+0x8a/0x140

This patch adds sanity-pcc test_98 to verify it.

I/O on a file handle that was opened before the file was attached
into PCC, or opened while the file was in ATTACHING state, falls
back to the Lustre OSTs.
A later mmap() on the file must also fall back to the Lustre OSTs.
It cannot read directly from the valid local PCC copy until all
fallback file handles are closed, because mmapping a file replaces
the mapping of the PCC copy with that of the Lustre file.
Add sanity-pcc test_97 to verify it.

Auto attach is also forbidden while the file is still under
mmapped I/O.

EX-3636 pcc: auto attach should skip if already attached

When trying to auto attach a file into PCC, skip the auto attach
processing if the file is found to be attached already. Otherwise
the PCC inode refcount becomes wrong when multiple threads try to
auto attach a file at the same time.

For a file once mmapped into PCC and then detached by layout lock
shrinking or a manual detach command, if @pcc_mmap_io_init() finds
that the file is validly cached again (attached into PCC again by
another thread), it should set the mapping of the PCC copy to that
of the Lustre file again.
Was-Change-Id: I5f049ca7d6db8708712e79e9ad459fc60b80f2be

LU-17964 pcc: set mapneg bit in all cases of normal I/O fallback

While data is being copied from the Lustre OSTs to the PCC copy,
the file is in PCC ATTACHING state. New opens and I/O on the file
fall back to the normal I/O path (Lustre OSTs) until the attach
finishes, and the file handle is marked with the fallback and
mapneg bits. Currently these bits are cleared only when the file
handle is closed.

To support mmap() I/O, we replace the mapping of the PCC copy with
that of the Lustre file. We can do so only if the Lustre file has
no open file handle with the mapneg bit set. Otherwise we cannot
switch the mapping, and the mmap() I/O also falls back to the
Lustre OSTs and uses the mapping of the Lustre file.
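
The bookkeeping described here can be sketched as a per-inode counter
plus a per-handle bit. This is a user-space illustration with
hypothetical demo_* names and C11 atomics, not the kernel code. It
folds the "already marked" check into the set helper, whereas the
patch leaves that check to the callers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_lli  { atomic_int mapneg; };   /* models lli_pcc_mapneg */
struct demo_pccf { bool fallback; };       /* models pccf_fallback  */

/* Mark a file handle as falling back to the OSTs, at most once. */
void demo_fallback_set(struct demo_lli *lli, struct demo_pccf *pccf)
{
    if (pccf->fallback)
        return;
    pccf->fallback = true;
    atomic_fetch_add(&lli->mapneg, 1);
}

/* Undo the mark on close; a no-op for handles that were never marked. */
void demo_fallback_reset(struct demo_lli *lli, struct demo_pccf *pccf)
{
    int old;

    if (!pccf->fallback)
        return;
    pccf->fallback = false;
    old = atomic_fetch_sub(&lli->mapneg, 1);
    assert(old > 0);
    (void)old;
}

/* mmap() may switch mappings only while no handle is in fallback. */
bool demo_may_switch_mapping(struct demo_lli *lli)
{
    return atomic_load(&lli->mapneg) == 0;
}
```

Keeping the bit on the handle makes set and reset idempotent, so the
counter always equals the number of open handles currently in fallback.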

Once an mmap()ed file has been detached from the PCC backend, by a
manual detach command or by revocation of the LAYOUT ibit lock
(which protects the validity of PCC cache access), we should reset
the mapping of the PCC file accordingly and set the fallback and
mapneg bits if the I/O is falling back to the normal path
(Lustre OSTs).
Was-Change-Id: Ibd152aaf724dcff48efbe022dc7f3e70848b4e0d

EX-bug-id: EX-3080 EX-3191 EX-3478 EX-3480 EX-3636
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I18890d19d03726a5991c923505e8c5363382fdc2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54390
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lustre/llite/llite_internal.h
lustre/llite/llite_lib.c
lustre/llite/pcc.c
lustre/llite/pcc.h
lustre/tests/sanity-pcc.sh
lustre/utils/liblustreapi_pcc.c

diff --git a/lustre/llite/llite_internal.h b/lustre/llite/llite_internal.h
index f78c4a2..9329961 100644
@@ -280,7 +280,17 @@ struct ll_inode_info {
                        struct mutex             lli_pcc_lock;
                        enum lu_pcc_state_flags  lli_pcc_state;
                        atomic_t                 lli_pcc_mapcnt;
-
+                       /*
+                        * I/O for a file previously opened before attach into
+                        * PCC or once opened while in ATTACHING state will
+                        * fallback to Lustre OSTs.
+                        * For a later mmap() on the file, the mmap I/O also
+                        * needs to fallback and cannot read from PCC directly
+                        * until all fallback file handles are closed as we
+                        * replace the mmaping of the PCC copy with the one of
+                        * Lustre file when mmaped a file.
+                        */
+                       atomic_t                 lli_pcc_mapneg;
                        /*
                         * @lli_pcc_generation saves the gobal PCC generation
                         * when the file was successfully attached into PCC.
diff --git a/lustre/llite/llite_lib.c b/lustre/llite/llite_lib.c
index 30b7658..82b1de3 100644
@@ -1301,6 +1301,7 @@ void ll_lli_init(struct ll_inode_info *lli)
                lli->lli_pcc_dsflags = PCC_DATASET_INVALID;
                lli->lli_pcc_generation = 0;
                atomic_set(&lli->lli_pcc_mapcnt, 0);
+               atomic_set(&lli->lli_pcc_mapneg, 0);
                mutex_init(&lli->lli_group_mutex);
                lli->lli_group_users = 0;
                lli->lli_group_gid = 0;
diff --git a/lustre/llite/pcc.c b/lustre/llite/pcc.c
index d60c645..0a0d062 100644
@@ -1347,10 +1347,16 @@ void pcc_inode_free(struct inode *inode)
 {
        struct pcc_inode *pcci = ll_i2pcci(inode);
 
+       if (!pcci)
+               return;
+
+       pcc_inode_lock(inode);
+       pcci = ll_i2pcci(inode);
        if (pcci) {
                WARN_ON(atomic_read(&pcci->pcci_refcount) > 1);
                pcc_inode_put(pcci);
        }
+       pcc_inode_unlock(inode);
 }
 
 /*
@@ -1950,6 +1956,13 @@ static int pcc_try_auto_attach(struct inode *inode, bool *cached,
        if (list_empty(&super->pccs_datasets))
                RETURN(0);
 
+       if (lli->lli_pcc_state & PCC_STATE_FL_ATTACHING)
+               RETURN(0);
+
+       /* Forbid to auto attach the file once mmapped into PCC. */
+       if (atomic_read(&lli->lli_pcc_mapcnt) > 0)
+               RETURN(0);
+
        /*
         * The file layout lock was cancelled. And this open does not
         * obtain valid layout lock from MDT (i.e. the file is being
@@ -1959,9 +1972,22 @@ static int pcc_try_auto_attach(struct inode *inode, bool *cached,
                if (ll_layout_version_get(lli) == CL_LAYOUT_GEN_NONE)
                        RETURN(0);
        } else {
+               struct pcc_inode *pcci;
+
+               pcc_inode_unlock(inode);
                rc = ll_layout_refresh(inode, &gen);
+               pcc_inode_lock(inode);
                if (rc)
                        RETURN(rc);
+
+               pcci = ll_i2pcci(inode);
+               if (pcci && pcc_inode_has_layout(pcci)) {
+                       *cached = true;
+                       RETURN(0);
+               }
+
+               if (atomic_read(&lli->lli_pcc_mapcnt) > 0)
+                       RETURN(0);
        }
 
        rc = pcc_get_layout_info(inode, &clt);
@@ -2026,10 +2052,8 @@ static inline bool pcc_may_auto_attach(struct inode *inode,
        RETURN(lli->lli_pcc_dsflags & PCC_DATASET_IO_ATTACH);
 }
 
-static void __pcc_layout_invalidate(struct pcc_inode *pcci)
+static inline void pcc_wait_ios_finish(struct pcc_inode *pcci)
 {
-       pcci->pcci_type = LU_PCC_NONE;
-       pcc_layout_gen_set(pcci, CL_LAYOUT_GEN_NONE);
        if (atomic_read(&pcci->pcci_active_ios) == 0)
                return;
 
@@ -2053,6 +2077,8 @@ static inline void pcc_inode_mapping_reset(struct inode *inode)
        struct address_space *mapping = inode->i_mapping;
        int rc;
 
+       pcc_wait_ios_finish(pcci);
+
        /* Did we mmap this file? */
        if (pcc_inode->i_mapping == &pcc_inode->i_data)
                return;
@@ -2118,7 +2144,10 @@ static inline void pcc_inode_mmap_put(struct inode *inode)
 /* Call with inode lock held. */
 static inline void pcc_inode_detach(struct inode *inode)
 {
-       __pcc_layout_invalidate(ll_i2pcci(inode));
+       struct pcc_inode *pcci = ll_i2pcci(inode);
+
+       pcci->pcci_type = LU_PCC_NONE;
+       pcc_layout_gen_set(pcci, CL_LAYOUT_GEN_NONE);
        pcc_inode_mapping_reset(inode);
 }
 
@@ -2180,8 +2209,45 @@ static bool pcc_io_tolerate(struct pcc_inode *pcci,
        return true;
 }
 
+static inline void
+pcc_file_fallback_set(struct ll_inode_info *lli, struct pcc_file *pccf)
+{
+       atomic_inc(&lli->lli_pcc_mapneg);
+       pccf->pccf_fallback = 1;
+}
+
+static inline void
+pcc_file_fallback_reset(struct ll_inode_info *lli, struct pcc_file *pccf)
+{
+       if (pccf->pccf_fallback) {
+               pccf->pccf_fallback = 0;
+               atomic_dec(&lli->lli_pcc_mapneg);
+       }
+}
+
+static inline void
+pcc_file_mapping_reset(struct inode *inode, struct file *file, bool cached)
+{
+       struct file *pcc_file = NULL;
+
+       if (file) {
+               struct pcc_file *pccf = ll_file2pccf(file);
+
+               pcc_file = pccf->pccf_file;
+               if (!cached && !pccf->pccf_fallback)
+                       pcc_file_fallback_set(ll_i2info(inode), pccf);
+       }
+
+       if (pcc_file) {
+               struct inode *pcc_inode = file_inode(pcc_file);
+
+               if (pcc_inode->i_mapping == &pcc_inode->i_data)
+                       pcc_file->f_mapping = pcc_inode->i_mapping;
+       }
+}
+
 static void pcc_io_init(struct inode *inode, enum pcc_io_type iot,
-                       bool *cached)
+                       struct file *file, bool *cached)
 {
        struct pcc_inode *pcci;
 
@@ -2201,8 +2267,8 @@ static void pcc_io_init(struct inode *inode, enum pcc_io_type iot,
        } else {
                *cached = false;
                /*
-                * FIXME: Forbid auto PCC attach if the file has still been
-                * mmapped in PCC.
+                * Forbid to auto PCC attach if the file has still been
+                * mapped in PCC.
                 */
                if (pcc_may_auto_attach(inode, iot)) {
                        (void) pcc_try_auto_attach(inode, cached, iot);
@@ -2213,6 +2279,7 @@ static void pcc_io_init(struct inode *inode, enum pcc_io_type iot,
                        }
                }
        }
+       pcc_file_mapping_reset(inode, file, *cached);
        pcc_inode_unlock(inode);
 }
 
@@ -2250,8 +2317,10 @@ int pcc_file_open(struct inode *inode, struct file *file)
        pcc_inode_lock(inode);
        pcci = ll_i2pcci(inode);
 
-       if (lli->lli_pcc_state & PCC_STATE_FL_ATTACHING)
+       if (lli->lli_pcc_state & PCC_STATE_FL_ATTACHING) {
+               pcc_file_fallback_set(lli, pccf);
                GOTO(out_unlock, rc = 0);
+       }
 
        if (!pcci || !pcc_inode_has_layout(pcci)) {
                if (pcc_may_auto_attach(inode, PIT_OPEN))
@@ -2260,9 +2329,14 @@ int pcc_file_open(struct inode *inode, struct file *file)
                if (rc == 0 && !cached)
                        rc = pcc_try_readonly_open_attach(inode, file, &cached);
 
-               if (rc < 0 || !cached)
+               if (rc < 0)
                        GOTO(out_unlock, rc);
 
+               if (!cached) {
+                       pcc_file_fallback_set(lli, pccf);
+                       GOTO(out_unlock, rc);
+               }
+
                pcci = ll_i2pcci(inode);
        }
 
@@ -2302,6 +2376,8 @@ void pcc_file_release(struct inode *inode, struct file *file)
 
        pccf = &fd->fd_pcc_file;
        pcc_inode_lock(inode);
+       pcc_file_fallback_reset(ll_i2info(inode), pccf);
+
        if (pccf->pccf_file == NULL)
                goto out;
 
@@ -2371,7 +2447,7 @@ ssize_t pcc_file_read_iter(struct kiocb *iocb,
                RETURN(0);
        }
 
-       pcc_io_init(inode, PIT_READ, cached);
+       pcc_io_init(inode, PIT_READ, file, cached);
        if (!*cached)
                RETURN(0);
 
@@ -2441,7 +2517,7 @@ ssize_t pcc_file_write_iter(struct kiocb *iocb,
                RETURN(0);
        }
 
-       pcc_io_init(inode, PIT_WRITE, cached);
+       pcc_io_init(inode, PIT_WRITE, file, cached);
        if (!*cached)
                RETURN(0);
 
@@ -2477,7 +2553,7 @@ int pcc_inode_setattr(struct inode *inode, struct iattr *attr,
                RETURN(0);
        }
 
-       pcc_io_init(inode, PIT_SETATTR, cached);
+       pcc_io_init(inode, PIT_SETATTR, NULL, cached);
        if (!*cached)
                RETURN(0);
 
@@ -2519,7 +2595,7 @@ int pcc_inode_getattr(struct inode *inode, u32 request_mask,
                RETURN(0);
        }
 
-       pcc_io_init(inode, PIT_GETATTR, cached);
+       pcc_io_init(inode, PIT_GETATTR, NULL, cached);
        if (!*cached)
                RETURN(0);
 
@@ -2581,7 +2657,7 @@ ssize_t pcc_file_splice_read(struct file *in_file, loff_t *ppos,
                RETURN(default_file_splice_read(in_file, ppos, pipe,
                                                count, flags));
 
-       pcc_io_init(inode, PIT_SPLICE_READ, &cached);
+       pcc_io_init(inode, PIT_SPLICE_READ, in_file, &cached);
        if (!cached)
                RETURN(default_file_splice_read(in_file, ppos, pipe,
                                                count, flags));
@@ -2624,7 +2700,7 @@ int pcc_fsync(struct file *file, loff_t start, loff_t end,
                RETURN(0);
        }
 
-       pcc_io_init(inode, PIT_FSYNC, cached);
+       pcc_io_init(inode, PIT_FSYNC, file, cached);
        if (!*cached)
                RETURN(0);
 
@@ -2643,13 +2719,17 @@ static inline void pcc_vma_file_reset(struct inode *inode,
        LASSERT(pccv);
        if (vma->vm_file != pccv->pccv_file) {
                struct pcc_file *pccf = ll_file2pccf(pccv->pccv_file);
+               struct file *pcc_file = pccf->pccf_file;
+               struct inode *pcc_inode = file_inode(pcc_file);
 
-               LASSERT(vma->vm_file == pccf->pccf_file);
+               LASSERT(vma->vm_file == pcc_file);
                LASSERT(vma->vm_file->f_mapping == inode->i_mapping);
                vma->vm_file = pccv->pccv_file;
 
                get_file(vma->vm_file);
-               fput(pccf->pccf_file);
+               if (pcc_file->f_mapping != pcc_inode->i_mapping)
+                       pcc_file->f_mapping = pcc_inode->i_mapping;
+               fput(pcc_file);
 
                CDEBUG(D_CACHE,
                       DFID" mapcnt %d vm_file %p:%ld lu_file %p:%ld vma %p\n",
@@ -2667,23 +2747,49 @@ static void pcc_mmap_vma_reset(struct inode *inode, struct vm_area_struct *vma)
        pcc_inode_unlock(inode);
 }
 
+static int pcc_mmap_mapping_set(struct inode *inode, struct inode *pcc_inode);
+
 static void pcc_mmap_io_init(struct inode *inode, enum pcc_io_type iot,
                             struct vm_area_struct *vma, bool *cached)
 {
        struct pcc_vma *pccv = (struct pcc_vma *)vma->vm_private_data;
+       struct ll_inode_info *lli = ll_i2info(inode);
        struct pcc_inode *pcci;
+       struct pcc_file *pccf;
 
        LASSERT(pccv);
 
        pcc_inode_lock(inode);
        pcci = ll_i2pcci(inode);
+       pccf = ll_file2pccf(pccv->pccv_file);
        if (pcci && pcc_inode_has_layout(pcci)) {
+               struct inode *pcc_inode = pcci->pcci_path.dentry->d_inode;
+
                LASSERT(atomic_read(&pcci->pcci_refcount) > 0);
+
                if (pcci->pcci_type == LU_PCC_READONLY &&
                    iot == PIT_PAGE_MKWRITE) {
                        pcc_inode_detach_put(inode);
                        pcc_vma_file_reset(inode, vma);
                        *cached = false;
+               } else if (pcc_inode->i_mapping == &pcc_inode->i_data) {
+                       if (atomic_read(&lli->lli_pcc_mapneg) > 0) {
+                               pcc_inode_detach_put(inode);
+                               pcc_vma_file_reset(inode, vma);
+                               *cached = false;
+                       } else {
+                               int rc;
+
+                               rc = pcc_mmap_mapping_set(inode, pcc_inode);
+                               if (rc) {
+                                       pcc_inode_detach_put(inode);
+                                       pcc_vma_file_reset(inode, vma);
+                                       *cached = false;
+                               } else {
+                                       atomic_inc(&pcci->pcci_active_ios);
+                                       *cached = true;
+                               }
+                       }
                } else {
                        atomic_inc(&pcci->pcci_active_ios);
                        *cached = true;
@@ -2692,6 +2798,10 @@ static void pcc_mmap_io_init(struct inode *inode, enum pcc_io_type iot,
                *cached = false;
                pcc_vma_file_reset(inode, vma);
        }
+
+       if (!*cached && !pccf->pccf_fallback)
+               pcc_file_fallback_set(lli, pccf);
+
        pcc_inode_unlock(inode);
 }
 
@@ -2812,7 +2922,8 @@ static int pcc_mmap_mapping_set(struct inode *inode, struct inode *pcc_inode)
 int pcc_file_mmap(struct file *file, struct vm_area_struct *vma,
                  bool *cached)
 {
-       struct file *pcc_file = ll_file2pccf(file)->pccf_file;
+       struct pcc_file *pccf = ll_file2pccf(file);
+       struct file *pcc_file = pccf->pccf_file;
        struct inode *inode = file_inode(file);
        struct pcc_inode *pcci;
        int rc = 0;
@@ -2835,9 +2946,20 @@ int pcc_file_mmap(struct file *file, struct vm_area_struct *vma,
        pcc_inode_lock(inode);
        pcci = ll_i2pcci(inode);
        if (pcci && pcc_inode_has_layout(pcci)) {
+               struct ll_inode_info *lli = ll_i2info(inode);
                struct inode *pcc_inode = file_inode(pcc_file);
                struct pcc_vma *pccv;
 
+               if (pccf->pccf_fallback) {
+                       LASSERT(atomic_read(&lli->lli_pcc_mapneg) > 0);
+                       GOTO(out, rc);
+               }
+
+               if (atomic_read(&lli->lli_pcc_mapneg) > 0) {
+                       pcc_file_fallback_set(lli, pccf);
+                       GOTO(out, rc);
+               }
+
                LASSERT(atomic_read(&pcci->pcci_refcount) > 1);
                *cached = true;
 
@@ -3680,7 +3802,7 @@ out_unlock:
        RETURN(rc);
 }
 
-static int pcc_layout_rdonly_set(struct inode *inode, __u32 *gen)
+static int pcc_layout_rdonly_set(struct inode *inode, __u32 *gen, bool *cached)
 
 {
        struct ll_inode_info *lli = ll_i2info(inode);
@@ -3737,7 +3859,21 @@ repeat:
                if (rc)
                        RETURN(rc);
        } else { /* Readonly layout */
+               struct pcc_inode *pcci;
+
                *gen = clt.cl_layout_gen;
+               /*
+                * The file is already in readonly state, give a chance to
+                * try auto attach.
+                */
+               pcc_inode_lock(inode);
+               pcci = ll_i2pcci(inode);
+               if (pcci && pcc_inode_has_layout(pcci))
+                       *cached = true;
+               else
+                       rc = pcc_try_datasets_attach(inode, PIT_OPEN, *gen,
+                                                    LU_PCC_READONLY, cached);
+               pcc_inode_unlock(inode);
        }
 
        RETURN(rc);
@@ -3750,16 +3886,19 @@ static int pcc_readonly_attach(struct file *file,
        struct ll_inode_info *lli = ll_i2info(inode);
        const struct cred *old_cred;
        struct pcc_dataset *dataset;
-       struct pcc_inode *pcci;
+       struct pcc_inode *pcci = NULL;
        struct dentry *dentry;
        bool attached = false;
        bool unlinked = false;
+       bool cached = false;
        __u32 gen;
        int rc;
 
        ENTRY;
 
-       rc = pcc_layout_rdonly_set(inode, &gen);
+       rc = pcc_layout_rdonly_set(inode, &gen, &cached);
+       if (cached)
+               RETURN(0);
        if (rc)
                RETURN(rc);
 
@@ -3772,10 +3911,8 @@ static int pcc_readonly_attach(struct file *file,
        if (rc)
                GOTO(out_dataset_put, rc);
 
-       mutex_lock(&lli->lli_layout_mutex);
        pcc_inode_lock(inode);
        old_cred = override_creds(super->pccs_cred);
-       lli->lli_pcc_state &= ~PCC_STATE_FL_ATTACHING;
        if (gen != ll_layout_version_get(lli)) {
                CDEBUG(D_CACHE, "L.Gen mismatch %u:%u\n",
                       gen, ll_layout_version_get(lli));
@@ -3790,6 +3927,17 @@ static int pcc_readonly_attach(struct file *file,
 
                pcc_inode_attach_set(super, dataset, lli, pcci,
                                     dentry, LU_PCC_READONLY);
+       } else if (pcc_inode_has_layout(pcci)) {
+               /*
+                * There may be a gap between auto attach and auto open cache:
+                * ->pcc_file_open()
+                *  ->pcc_try_auto_attach()
+                *    The file is re-attach into PCC by other thread.
+                *  ->pcc_try_readonly_open_attach()
+                */
+               CWARN("%s: The file (fid@"DFID") is already attached.\n",
+                     ll_i2sbi(inode)->ll_fsname, PFID(ll_inode2fid(inode)));
+               GOTO(out_put_unlock, rc = -EEXIST);
        } else {
                atomic_inc(&pcci->pcci_refcount);
                path_put(&pcci->pcci_path);
@@ -3817,7 +3965,6 @@ out_put_unlock:
        }
        revert_creds(old_cred);
        pcc_inode_unlock(inode);
-       mutex_unlock(&lli->lli_layout_mutex);
 out_dataset_put:
        pcc_dataset_put(dataset);
 
diff --git a/lustre/llite/pcc.h b/lustre/llite/pcc.h
index a01cf2c..d0247ac 100644
@@ -204,6 +204,8 @@ struct pcc_file {
        struct file             *pccf_file;
        /* Whether readonly or readwrite PCC */
        enum lu_pcc_type         pccf_type;
+       /* I/O especially mmap() I/O must fallback to Lustre OSTs. */
+       __u32                    pccf_fallback:1;
 };
 
 struct pcc_vma {
diff --git a/lustre/tests/sanity-pcc.sh b/lustre/tests/sanity-pcc.sh
index a675740..01679a5 100755
@@ -2911,7 +2911,7 @@ test_34() {
        copytool setup -m "$MOUNT" -a "$HSM_ARCHIVE_NUMBER"
 
        setup_pcc_mapping $SINGLEAGT \
-               "projid\>{100}\ roid=5\ ropcc=1"
+               "projid\>{100}\ roid=5\ pccro=1"
        do_facet $SINGLEAGT $LCTL pcc list $MOUNT
        do_facet $SINGLEAGT "echo -n QQQQQ > $file" ||
                error "failed to write $file"
@@ -2931,7 +2931,7 @@ test_34() {
        cleanup_pcc_mapping
 
        setup_pcc_mapping $SINGLEAGT \
-               "projid\<{100}\ roid=5\ ropcc=1"
+               "projid\<{100}\ roid=5\ pccro=1"
        do_facet $SINGLEAGT $LCTL pcc list $MOUNT
        do_facet $SINGLEAGT $MULTIOP $file oc ||
                error "failed to readonly open $file"
@@ -2949,7 +2949,7 @@ test_34() {
        cleanup_pcc_mapping
 
        setup_pcc_mapping $SINGLEAGT \
-               "projid\<{120}\&projid\>{110}\ roid=5\ ropcc=1"
+               "projid\<{120}\&projid\>{110}\ roid=5\ pccro=1"
        do_facet $SINGLEAGT $LCTL pcc list $MOUNT
        do_facet $SINGLEAGT $MULTIOP $file oc ||
                error "failed to readonly open $file"
@@ -3334,6 +3334,450 @@ test_41() {
 }
 run_test 41 "Test mtime rule for PCC-RO open attach with O_RDONLY mode"
 
+test_96() {
+       local loopfile="$TMP/$tfile"
+       local mntpt="/mnt/pcc.$tdir"
+       local hsm_root="$mntpt/$tdir"
+       local file1=$DIR/$tfile
+       local file2=$DIR2/$tfile
+
+       $LCTL get_param -n mdc.*.connect_flags | grep -q pcc_ro ||
+               skip "Server does not support PCC-RO"
+
+       setup_loopdev $SINGLEAGT $loopfile $mntpt 60
+       do_facet $SINGLEAGT mkdir $hsm_root || error "mkdir $hsm_root failed"
+       setup_pcc_mapping $SINGLEAGT \
+               "projid={0}\ roid=$HSM_ARCHIVE_NUMBER\ pccro=1\ mmap_conv=0"
+       do_facet $SINGLEAGT $LCTL pcc list $MOUNT
+       do_facet $SINGLEAGT $LCTL set_param llite.*.pcc_async_threshold=1G
+
+       local rpid11
+       local rpid12
+       local rpid13
+       local rpid21
+       local rpid22
+       local rpid23
+       local lpid
+
+       local bs="1M"
+       local count=50
+
+       do_facet $SINGLEAGT dd if=/dev/zero of=$file1 bs=$bs count=$count ||
+               error "Write $file1 failed"
+
+       (
+               while [ ! -e $DIR/sanity-pcc.96.lck ]; do
+                       do_facet $SINGLEAGT dd if=$file1 of=/dev/null bs=$bs count=$count ||
+                               error "Read $file failed"
+                       sleep 0.$((RANDOM % 4 + 1))
+               done
+       )&
+       rpid11=$!
+
+       (
+               while [ ! -e $DIR/sanity-pcc.96.lck ]; do
+                       do_facet $SINGLEAGT dd if=$file1 of=/dev/null bs=$bs count=$count ||
+                               error "Read $file failed"
+                       sleep 0.$((RANDOM % 4 + 1))
+               done
+       )&
+       rpid12=$!
+
+       (
+               while [ ! -e $DIR/sanity-pcc.96.lck ]; do
+                       do_facet $SINGLEAGT dd if=$file1 of=/dev/null bs=$bs count=$count ||
+                               error "Read $file failed"
+                       sleep 0.$((RANDOM % 4 + 1))
+               done
+       )&
+       rpid13=$!
+
+       (
+               while [ ! -e $DIR/sanity-pcc.96.lck ]; do
+                       do_facet $SINGLEAGT dd if=$file2 of=/dev/null bs=$bs count=$count ||
+                               error "Read $file failed"
+                       sleep 0.$((RANDOM % 4 + 1))
+               done
+       )&
+       rpid21=$!
+
+       (
+               while [ ! -e $DIR/sanity-pcc.96.lck ]; do
+                       do_facet $SINGLEAGT dd if=$file2 of=/dev/null bs=$bs count=$count ||
+                               error "Read $file failed"
+                       sleep 0.$((RANDOM % 4 + 1))
+               done
+       )&
+       rpid22=$!
+
+       (
+               while [ ! -e $DIR/sanity-pcc.96.lck ]; do
+                       do_facet $SINGLEAGT dd if=$file2 of=/dev/null bs=$bs count=$count ||
+                               error "Read $file failed"
+                       sleep 0.$((RANDOM % 4 + 1))
+               done
+       )&
+       rpid23=$!
+
+       (
+               while [ ! -e $DIR/sanity-pcc.96.lck ]; do
+                       do_facet $SINGLEAGT $LCTL set_param -n ldlm.namespaces.*mdc*.lru_size=clear ||
+                               error "cancel_lru_locks mdc failed"
+                       sleep 0.5
+               done
+       )&
+       lpid=$!
+
+       sleep 60
+       touch $DIR/sanity-pcc.96.lck
+
+       echo "Finish ========"
+       wait $rpid11 || error "$?: read failed"
+       wait $rpid12 || error "$?: read failed"
+       wait $rpid13 || error "$?: read failed"
+       wait $rpid21 || error "$?: read failed"
+       wait $rpid22 || error "$?: read failed"
+       wait $rpid23 || error "$?: read failed"
+       wait $lpid || error "$?: lock cancel failed"
+
+       do_facet $SINGLEAGT $LFS pcc detach $file1
+       rm -f $DIR/sanity-pcc.96.lck
+}
+run_test 96 "Auto attach from multiple read process on a node"
+
+test_97() {
+       local loopfile="$TMP/$tfile"
+       local mntpt="/mnt/pcc.$tdir"
+       local hsm_root="$mntpt/$tdir"
+       local file=$DIR/$tfile
+
+       $LCTL get_param -n mdc.*.connect_flags | grep -q pcc_ro ||
+               skip "Server does not support PCC-RO"
+
+       setup_loopdev $SINGLEAGT $loopfile $mntpt 60
+       do_facet $SINGLEAGT mkdir $hsm_root || error "mkdir $hsm_root failed"
+       setup_pcc_mapping $SINGLEAGT \
+               "projid={0}\ roid=$HSM_ARCHIVE_NUMBER\ pccro=1\ mmap_conv=0"
+       do_facet $SINGLEAGT $LCTL pcc list $MOUNT
+       do_facet $SINGLEAGT $LCTL set_param llite.*.pcc_async_threshold=1G
+
+       local mpid1
+       local mpid2
+       local lpid
+
+       do_facet $SINGLEAGT dd if=/dev/zero of=$file bs=1M count=50 ||
+               error "Write $file failed"
+
+       (
+               while [ ! -e $DIR/sanity-pcc.97.lck ]; do
+                       echo "T1. $MMAP_CAT $file ..."
+                       do_facet $SINGLEAGT $MMAP_CAT $file > /dev/null ||
+                               error "$MMAP_CAT $file failed"
+                       sleep 0.$((RANDOM % 4 + 1))
+               done
+       )&
+       mpid1=$!
+
+       (
+               while [ ! -e $DIR/sanity-pcc.97.lck ]; do
+                       echo "T2. $MMAP_CAT $file ..."
+                       do_facet $SINGLEAGT $MMAP_CAT $file > /dev/null ||
+                               error "$MMAP_CAT $file failed"
+                       sleep 0.$((RANDOM % 4 + 1))
+               done
+       )&
+       mpid2=$!
+
+       (
+               while [ ! -e $DIR/sanity-pcc.97.lck ]; do
+                       do_facet $SINGLEAGT $LCTL set_param -n ldlm.namespaces.*mdc*.lru_size=clear ||
+                               error "cancel_lru_locks mdc failed"
+                       sleep 0.1
+               done
+       )&
+       lpid=$!
+
+       sleep 120
+       stack_trap "rm -f $DIR/sanity-pcc.97.lck"
+       touch $DIR/sanity-pcc.97.lck
+       wait $mpid1 || error "$?: mmap1 failed"
+       wait $mpid2 || error "$?: mmap2 failed"
+       wait $lpid || error "$?: cancel locks failed"
+
+       do_facet $SINGLEAGT $LFS pcc detach $file
+       rm -f $DIR/sanity-pcc.97.lck
+}
+run_test 97 "two mmap I/O and layout lock cancel"
+
+test_98() {
+       local loopfile="$TMP/$tfile"
+       local mntpt="/mnt/pcc.$tdir"
+       local hsm_root="$mntpt/$tdir"
+       local file=$DIR/$tfile
+
+       $LCTL get_param -n mdc.*.connect_flags | grep -q pcc_ro ||
+               skip "Server does not support PCC-RO"
+
+       setup_loopdev $SINGLEAGT $loopfile $mntpt 60
+       do_facet $SINGLEAGT mkdir $hsm_root || error "mkdir $hsm_root failed"
+       setup_pcc_mapping $SINGLEAGT \
+               "projid={0}\ roid=$HSM_ARCHIVE_NUMBER\ pccro=1\ mmap_conv=0"
+       do_facet $SINGLEAGT $LCTL pcc list $MOUNT
+       do_facet $SINGLEAGT $LCTL set_param llite.*.pcc_async_threshold=1G
+
+       local rpid1
+       local rpid2
+       local rpid3
+       local mpid1
+       local mpid2
+       local mpid3
+       local lpid1
+       local lpid2
+
+       do_facet $SINGLEAGT dd if=/dev/zero of=$file bs=1M count=50 ||
+               error "Write $file failed"
+
+       (
+               while [ ! -e $DIR/sanity-pcc.98.lck ]; do
+                       do_facet $SINGLEAGT dd if=$file of=/dev/null bs=1M count=50 ||
+                               error "Read $file failed"
+                       sleep 0.$((RANDOM % 4 + 1))
+               done
+       )&
+       rpid1=$!
+
+       (
+               while [ ! -e $DIR/sanity-pcc.98.lck ]; do
+                       do_facet $SINGLEAGT dd if=$file of=/dev/null bs=1M count=50 ||
+                               error "Read $file failed"
+                       sleep 0.$((RANDOM % 4 + 1))
+               done
+       )&
+       rpid2=$!
+
+       (
+               while [ ! -e $DIR/sanity-pcc.98.lck ]; do
+                       do_facet $SINGLEAGT dd if=$file of=/dev/null bs=1M count=50 ||
+                               error "Read $file failed"
+                       sleep 0.$((RANDOM % 4 + 1))
+               done
+       )&
+       rpid3=$!
+
+       (
+               while [ ! -e $DIR/sanity-pcc.98.lck ]; do
+                       do_facet $SINGLEAGT $MMAP_CAT $file > /dev/null ||
+                               error "$MMAP_CAT $file failed"
+                       sleep 0.$((RANDOM % 2 + 1))
+               done
+       )&
+       mpid1=$!
+
+       (
+               while [ ! -e $DIR/sanity-pcc.98.lck ]; do
+                       do_facet $SINGLEAGT $LCTL set_param -n ldlm.namespaces.*mdc*.lru_size=clear ||
+                               error "cancel_lru_locks mdc failed"
+                       sleep 0.1
+               done
+       )&
+       lpid1=$!
+
+       (
+               while [ ! -e $DIR/sanity-pcc.98.lck ]; do
+                       do_facet $SINGLEAGT $LCTL set_param -n ldlm.namespaces.*osc*.lru_size=clear ||
+                               error "cancel_lru_locks osc failed"
+                       sleep 0.1
+               done
+       )&
+       lpid2=$!
+
+       sleep 60
+       stack_trap "rm -f $DIR/sanity-pcc.98.lck"
+       touch $DIR/sanity-pcc.98.lck
+       wait $rpid1 || error "$?: read failed"
+       wait $rpid2 || error "$?: read failed"
+       wait $rpid3 || error "$?: read failed"
+       wait $mpid1 || error "$?: mmap failed"
+       wait $lpid1 || error "$?: cancel locks failed"
+       wait $lpid2 || error "$?: cancel locks failed"
+
+       do_facet $SINGLEAGT $LFS pcc detach $file
+       rm -f $DIR/sanity-pcc.98.lck
+}
+run_test 98 "racer between auto attach and mmap I/O"
+
+test_99() {
+       local loopfile="$TMP/$tfile"
+       local mntpt="/mnt/pcc.$tdir"
+       local hsm_root="$mntpt/$tdir"
+       local file=$DIR/$tfile
+
+       $LCTL get_param -n mdc.*.connect_flags | grep -q pcc_ro ||
+               skip "Server does not support PCC-RO"
+
+       setup_loopdev $SINGLEAGT $loopfile $mntpt 60
+       do_facet $SINGLEAGT mkdir $hsm_root || error "mkdir $hsm_root failed"
+       setup_pcc_mapping $SINGLEAGT \
+               "projid={0}\ roid=$HSM_ARCHIVE_NUMBER\ pccro=1"
+       do_facet $SINGLEAGT $LCTL pcc list $MOUNT
+
+       do_facet $SINGLEAGT dd if=/dev/zero of=$file bs=1M count=50 ||
+               error "Write $file failed"
+
+       local rpid
+       local rpid2
+       local wpid
+       local upid
+       local dpid
+       local lpcc_path
+
+       lpcc_path=$(lpcc_fid2path $hsm_root $file)
+       (
+               while [ ! -e $DIR/sanity-pcc.99.lck ]; do
+                       do_facet $SINGLEAGT dd if=/dev/zero of=$file bs=1M count=50 conv=notrunc ||
+                               error "failed to write $file"
+                       sleep 0.$((RANDOM % 4 + 1))
+               done
+       )&
+       wpid=$!
+
+       (
+               while [ ! -e $DIR/sanity-pcc.99.lck ]; do
+                       do_facet $SINGLEAGT dd if=$file of=/dev/null bs=1M count=50 ||
+                               error "failed to read $file"
+                       sleep 0.$((RANDOM % 4 + 1))
+               done
+       )&
+       rpid=$!
+
+       (
+               while [ ! -e $DIR/sanity-pcc.99.lck ]; do
+                       do_facet $SINGLEAGT $MMAP_CAT $file > /dev/null ||
+                               error "failed to mmap_cat $file"
+                       sleep 0.$((RANDOM % 4 + 1))
+               done
+       )&
+       rpid2=$!
+
+       (
+               while [ ! -e $DIR/sanity-pcc.99.lck ]; do
+                       echo "Unlink $lpcc_path"
+                       do_facet $SINGLEAGT unlink $lpcc_path
+                       sleep 1
+               done
+               true
+       )&
+       upid=$!
+
+       (
+               while [ ! -e $DIR/sanity-pcc.99.lck ]; do
+                       echo "Detach $file ..."
+                       do_facet $SINGLEAGT $LFS pcc detach $file
+                       sleep 0.$((RANDOM % 8 + 1))
+               done
+       )&
+       dpid=$!
+
+       sleep 60
+       stack_trap "rm -f $DIR/sanity-pcc.99.lck"
+       touch $DIR/sanity-pcc.99.lck
+       wait $wpid || error "$?: write failed"
+       wait $rpid || error "$?: read failed"
+       wait $rpid2 || error "$?: read2 failed"
+       wait $upid || error "$?: unlink failed"
+       wait $dpid || error "$?: detach failed"
+
+       do_facet $SINGLEAGT $LFS pcc detach $file
+       rm -f $DIR/sanity-pcc.99.lck
+}
+run_test 99 "race among unlink | mmap read | write | detach for PCC-RO file"
+
+test_100() {
+       local loopfile="$TMP/$tfile"
+       local mntpt="/mnt/pcc.$tdir"
+       local hsm_root="$mntpt/$tdir"
+       local file=$DIR/$tfile
+
+       $LCTL get_param -n mdc.*.connect_flags | grep -q pcc_ro ||
+               skip "Server does not support PCC-RO"
+
+       setup_loopdev $SINGLEAGT $loopfile $mntpt 60
+       do_facet $SINGLEAGT mkdir $hsm_root || error "mkdir $hsm_root failed"
+       setup_pcc_mapping $SINGLEAGT \
+               "projid={0}\ roid=$HSM_ARCHIVE_NUMBER\ pccro=1"
+       do_facet $SINGLEAGT $LCTL pcc list $MOUNT
+
+       do_facet $SINGLEAGT dd if=/dev/zero of=$file bs=1M count=50 ||
+               error "Write $file failed"
+
+       local rpid
+       local rpid2
+       local wpid
+       local upid
+       local dpid
+       local lpcc_path
+
+       lpcc_path=$(lpcc_fid2path $hsm_root $file)
+       (
+               while [ ! -e $DIR/sanity-pcc.100.lck ]; do
+                       do_facet $SINGLEAGT dd if=/dev/zero of=$file bs=1M count=50 ||
+                               error "failed to write $file"
+                       sleep 0.$((RANDOM % 4 + 1))
+               done
+       )&
+       wpid=$!
+
+       (
+               while [ ! -e $DIR/sanity-pcc.100.lck ]; do
+                       do_facet $SINGLEAGT dd if=$file of=/dev/null bs=1M count=50 ||
+                               error "failed to read $file"
+                       sleep 0.$((RANDOM % 4 + 1))
+               done
+       )&
+       rpid=$!
+
+       (
+               while [ ! -e $DIR/sanity-pcc.100.lck ]; do
+                       do_facet $SINGLEAGT dd if=$file of=/dev/null bs=1M count=50 ||
+                               error "failed to read $file"
+                       sleep 0.$((RANDOM % 4 + 1))
+               done
+       )&
+       rpid2=$!
+
+       (
+               while [ ! -e $DIR/sanity-pcc.100.lck ]; do
+                       echo "Unlink $lpcc_path"
+                       do_facet $SINGLEAGT unlink $lpcc_path
+                       sleep 1
+               done
+               true
+       )&
+       upid=$!
+
+       (
+               while [ ! -e $DIR/sanity-pcc.100.lck ]; do
+                       echo "Detach $file ..."
+                       do_facet $SINGLEAGT $LFS pcc detach $file
+                       sleep 0.$((RANDOM % 8 + 1))
+               done
+       )&
+       dpid=$!
+
+       sleep 60
+       stack_trap "rm -f $DIR/sanity-pcc.100.lck"
+       touch $DIR/sanity-pcc.100.lck
+       wait $wpid || error "$?: write failed"
+       wait $rpid || error "$?: read failed"
+       wait $rpid2 || error "$?: read2 failed"
+       wait $upid || error "$?: unlink failed"
+       wait $dpid || error "$?: detach failed"
+
+       do_facet $SINGLEAGT $LFS pcc detach $file
+       rm -f $DIR/sanity-pcc.100.lck
+}
+run_test 100 "race among PCC unlink | read | write | detach for PCC-RO file"
+
 #test 101: containers and PCC
 #LU-15170: Test mount namespaces with PCC
 #This tests the cases where the PCC mount is not present in the container by
index 0376e76..1f263c9 100644 (file)
@@ -287,7 +287,7 @@ int llapi_pcc_detach_at(int dirfd, const struct lu_fid *fid,
        };
 
        rc = ioctl(dirfd, LL_IOC_PCC_DETACH_BY_FID, &detach);
-       return rc;
+       return rc ? -errno : 0;
 }
 
 /**
@@ -599,9 +599,8 @@ static int llapi_pcc_scan_detach(const char *pname, const char *fname,
        rc = ioctl(hsc->hsc_mntfd, LL_IOC_PCC_DETACH_BY_FID, &detach);
        if (rc) {
                rc = -errno;
-               llapi_printf(LLAPI_MSG_DEBUG,
-                            "failed to detach file '%s': rc = %d\n",
-                            fidname, rc);
+               llapi_error(LLAPI_MSG_ERROR, rc,
+                           "failed to detach file '%s'\n", fidname);
                return rc;
        }