1 #LyX 1.3 created this file. For more info see http://www.lyx.org/
6 \usepackage[usenames]{color}
14 %\usepackage{lncsexample}
18 %\usepackage{graphicx}
19 \newcommand{\lst}[3] {
20 \noindent\vspace{-1mm}
21 \definecolor{cKeyword}{rgb}{0.8,0.1,0.1}
22 \definecolor{cComment}{rgb}{0.2,0.5,0.7}
23 \definecolor{cString}{rgb}{0.2,0.7,0.2}
24 \lstinputlisting[caption={#2},
26 showstringspaces=false,
34 keywordstyle=\color{cKeyword},
35 commentstyle=\color{cComment},
36 stringstyle=\color{cString},
37 directivestyle=\color{magenta},
38 emph={1, 2, 3, 4, 5, 6, 7, 8, 9, 0, NULL, lustre, CFS},
39 emphstyle=\color{blue},
48 \paperfontsize default
55 \use_numerical_citations 0
56 \paperorientation portrait
63 \paragraph_separation skip
65 \quotes_language swedish
69 \paperpagestyle default
73 SMFS Detailed Level Design
81 \pagebreak_bottom \noindent
83 \begin_inset LatexCommand \tableofcontents{}
92 Functional Specification
98 Initialization and data structures
SMFS implements all the methods required of a filesystem, but uses another
filesystem as its store instead of a block device.
This design requires SMFS to manage the backstore FS much as the VFS would.
SMFS defines info structures for each filesystem object; they hold all the
information needed about the corresponding backstore object.
106 \layout Subsubsection
SMFS has a plugin API that makes it possible to register/deregister plugins and call
Plugins are organized in a linked list.
Plugin initialization starts when SMFS is mounted with options that name the
plugins needed.
Deactivation starts when SMFS is unmounted.
It is possible to use the ioctl interface or procfs (sysfs later) for plugin management.
Each plugin receives notifications from SMFS about filesystem operations.
122 Plugins should be aware of delays, transaction handling and consistency.
SMFS keeps track of the loaded plugins with special flags.
There are system-wide flags and per-object flags to indicate exclusions.
127 \layout Subsubsection
Every hook is called under certain conditions in terms of resource availability.
SMFS should ensure that all pre- and post-operation work is done
together with the fs operation in the backstore fs.
Plugins should be aware of these conditions and act accordingly.
136 \layout Subsubsection*
143 locking rules: all may block, none have BKL.
Protection is provided by the i_sem semaphore in the inode.
149 <lyxtabular version="3" rows="17" columns="3">
151 <column alignment="center" valignment="top" leftline="true" width="0">
152 <column alignment="center" valignment="top" leftline="true" width="0">
153 <column alignment="center" valignment="top" leftline="true" rightline="true" width="0">
154 <row topline="true" bottomline="true">
155 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
162 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
170 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
180 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
188 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
196 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
205 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
213 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
221 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
230 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
238 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
246 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
256 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
264 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
272 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
281 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
289 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
297 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
306 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
314 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
322 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
331 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
339 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
347 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
357 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
365 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
373 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
383 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
391 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
399 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
409 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
417 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
425 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
434 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
442 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
450 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
459 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
467 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
475 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
484 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
492 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
500 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
509 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
517 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
525 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
534 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
542 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
550 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
558 <row topline="true" bottomline="true">
559 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
567 <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
575 <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
590 Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_sem on victim.
593 cross-directory ->rename() has (per-superblock) ->s_vfs_rename_sem.
596 ->truncate() is never called directly - it's a callback, not a method.
597 It's called by vmtruncate() - library function normally used by ->setattr().
598 Locking information above applies to that call (i.e.
599 is inherited from ->setattr() - vmtruncate() is used when ATTR_SIZE had
All inode operations may block, so a plugin can also block.
All critical inode operations are already protected by the i_sem semaphore.
606 \layout Subsubsection*
Locking rules: file operations may block and none is under BKL.
Some hooks are placed around file operations, so protection is needed to keep
the pre- and post-hooks in step with the backfs operation.
620 \layout Subsubsection*
624 Superblock operations
627 Locking rules: All may block, only
635 No hooks here, so no actions are needed.
636 \layout Subsubsection*
641 locking rules: All may block.
645 None is under BKL except
650 Taken from ..../linux/Documentation/filesystems/Locking
651 \layout Subsubsection
SMFS has an upcall API.
Upcalls are organized in a linked list and called one by one where needed.
659 \layout Subsubsection
LVFS has fsfilt operations for SMFS.
SMFS stores a copy of the fsfilt operations of the backstore filesystem.
When SMFS receives an fsfilt operation, it passes it on to the backstore filesystem.
668 \layout Subsubsection
670 Transactions handling
All hooks should run in the same transaction as the backstore fs operation.
To provide this, SMFS starts the transaction before the first hook and commits it after
SMFS takes care of the transaction where it is needed, but plugins can add some
To solve this, SMFS passes the transaction handle to the plugin, so it can do transaction
A plugin should also provide a method to calculate the extra size it needs for the transaction.
684 \layout Subsubsection
The backstore FS is mounted by SMFS during initialization; if the mount fails,
SMFS initialization fails as well.
SMFS creates its own superblock using the backstore one and the fsfilter operations.
694 \layout Subsubsection
696 Superblock operations
SMFS has to initialize its own inode structure by reading the inode with the same number
Operations on such an inode are redirected to the backstore FS if they are
SMFS keeps the backstore FS inode to avoid reading the inode every time.
705 \layout Subsubsection
SMFS sets its own inode operations for each inode.
When an operation is invoked, SMFS calls the real filesystem method to complete
Additional actions keep the SMFS inode and the real inode consistent.
These actions are needed before the real operation, to create artificial objects,
and after it, to copy changes from the backfs inode.
720 \layout Subsubsection
SMFS creates its own filp, plus a duplicate filp and dentry for the backstore FS.
They are used in file operations that are followed by a backstore operation call.
To perform file operations we create an artificial file object before the call of
The following options are passed to smfs and describe the backstore FS completely:
742 smfs dev=/mnt type=ldiskfs
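How such an option string might be split into the backstore device and type can be sketched in user space as follows. This is only an illustration: `struct smfs_backstore_opts`, `smfs_parse_opts()` and the buffer sizes are hypothetical names and values, and accepting a comma as well as a space as separator is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the two backstore options shown above. */
struct smfs_backstore_opts {
    char dev[64];   /* value of "dev=",  e.g. "/mnt"    */
    char type[32];  /* value of "type=", e.g. "ldiskfs" */
};

/*
 * Parse "key=value" tokens separated by spaces or commas (assumption).
 * Unknown tokens are ignored. Returns 0 when both dev and type were
 * found, -1 otherwise.
 */
int smfs_parse_opts(const char *opts, struct smfs_backstore_opts *out)
{
    char buf[256];
    char *tok;

    assert(opts != NULL && out != NULL);
    out->dev[0] = '\0';
    out->type[0] = '\0';

    if (strlen(opts) >= sizeof(buf))
        return -1;
    strcpy(buf, opts);      /* strtok() modifies its input, so work on a copy */

    for (tok = strtok(buf, " ,"); tok != NULL; tok = strtok(NULL, " ,")) {
        if (strncmp(tok, "dev=", 4) == 0)
            snprintf(out->dev, sizeof(out->dev), "%s", tok + 4);
        else if (strncmp(tok, "type=", 5) == 0)
            snprintf(out->type, sizeof(out->type), "%s", tok + 5);
    }
    return (out->dev[0] != '\0' && out->type[0] != '\0') ? 0 : -1;
}
```

Calling `smfs_parse_opts("dev=/mnt type=ldiskfs", &o)` fills `o.dev` with `/mnt` and `o.type` with `ldiskfs`; a string that lacks either key is rejected.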
746 \layout Subsubsection
748 SMFS plugin activation
Currently all plugins are compiled into the SMFS module and are set up via
options when SMFS is mounted.
Options should be passed as mountfsoptions like this:
757 \layout Subsubsection
759 SMFS plugin registration
Plugins should register with SMFS:
765 smfs_register_plugin(parameters);
The parameter is a struct smfs_plugin filled with valid data:
type of plugin - to distinguish it from the others,
774 pre_op function - function that will be called before fs operation,
777 post_op function - function that will be called after fs operation,
780 helper function - helper function.
any private data - data that is passed back to the plugin in each call.
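The registration described above can be sketched as a small user-space model. The field names, the callback signature, the `smfs_deregister_plugin()` helper and the rejection of duplicate types are assumptions made for illustration; the real struct smfs_plugin may be laid out differently.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed callback signature: opcode, operation argument, plugin private data. */
typedef int (*smfs_op_fn)(int opcode, void *arg, void *priv);

struct smfs_plugin {
    struct smfs_plugin *next;    /* linkage in the plugin list */
    int                 type;    /* distinguishes this plugin from the others */
    smfs_op_fn          pre_op;  /* called before the fs operation */
    smfs_op_fn          post_op; /* called after the fs operation */
    smfs_op_fn          helper;  /* helper entry point */
    void               *priv;    /* passed back to the plugin in each call */
};

static struct smfs_plugin *plugin_list; /* head of the registered plugins */

/* Add a plugin to the list; returns -1 if its type is already registered. */
int smfs_register_plugin(struct smfs_plugin *p)
{
    struct smfs_plugin *cur;

    assert(p != NULL);
    for (cur = plugin_list; cur != NULL; cur = cur->next)
        if (cur->type == p->type)
            return -1;
    p->next = plugin_list;
    plugin_list = p;
    return 0;
}

/* Unlink the plugin of the given type; returns it, or NULL if not found. */
struct smfs_plugin *smfs_deregister_plugin(int type)
{
    struct smfs_plugin **pp;

    for (pp = &plugin_list; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->type == type) {
            struct smfs_plugin *found = *pp;

            *pp = found->next;
            found->next = NULL;
            return found;
        }
    }
    return NULL;
}
```

Keeping the list head private to SMFS and keying plugins by type lets deregistration at unmount find the plugin without the caller holding a pointer to it.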
785 \layout Subsubsection
SMFS places a special wrapper in its own filesystem operations to call plugins
/* this wrapper calls all hooks, walking through the list */
797 SMFS_HOOK(opcode, parameters);
Hooks are placed in SMFS methods before and after the call to the backstore FS operation
/* this is how SMFS uses hooks */
815 struct inode * backfs_inode = I2CI(inode);
818 struct smfs_file_info *sfi;
824 SMFS_HOOK(hook_opcode, ...);
827 backfs_inode->i_fop->some_op(sfi->c_file, ...);
830 SMFS_HOOK(hook_opcode, ...);
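What a wrapper of this kind might do can be modelled in user space as below. `struct hook_plugin`, `smfs_call_hooks()` and `smfs_do_op()` are hypothetical names, and stopping at the first failing hook, as well as skipping post-hooks when the backstore operation fails, are assumptions rather than documented SMFS behaviour. The `demo_*` callbacks only record the call order.

```c
#include <assert.h>
#include <stddef.h>

enum hook_phase { PRE_HOOK, POST_HOOK };

/* Minimal plugin record for this sketch (layout is an assumption). */
struct hook_plugin {
    struct hook_plugin *next;
    int (*pre_op)(int opcode, void *priv);   /* may be NULL */
    int (*post_op)(int opcode, void *priv);  /* may be NULL */
    void *priv;
};

/* Walk the list and call each plugin's callback for the given phase. */
int smfs_call_hooks(struct hook_plugin *list, int opcode, enum hook_phase phase)
{
    struct hook_plugin *p;

    for (p = list; p != NULL; p = p->next) {
        int (*fn)(int, void *) = (phase == PRE_HOOK) ? p->pre_op : p->post_op;
        int rc;

        if (fn == NULL)
            continue;           /* plugin is not interested in this phase */
        rc = fn(opcode, p->priv);
        if (rc != 0)
            return rc;          /* assumption: first error stops the walk */
    }
    return 0;
}

/* Pre-hooks, then the backstore operation, then post-hooks. */
int smfs_do_op(struct hook_plugin *list, int opcode,
               int (*backfs_op)(void *arg), void *arg)
{
    int rc;

    assert(backfs_op != NULL);
    rc = smfs_call_hooks(list, opcode, PRE_HOOK);
    if (rc != 0)
        return rc;
    rc = backfs_op(arg);
    if (rc != 0)
        return rc;
    return smfs_call_hooks(list, opcode, POST_HOOK);
}

/* Demo callbacks: each appends one decimal digit to call_order. */
static int call_order;
static void record(int digit) { call_order = call_order * 10 + digit; }
static int demo_pre(int opcode, void *priv)  { (void)opcode; (void)priv; record(1); return 0; }
static int demo_backfs_op(void *arg)         { (void)arg; record(2); return 0; }
static int demo_post(int opcode, void *priv) { (void)opcode; (void)priv; record(3); return 0; }
```

With two plugins in the list, where only the second one has a post-hook, one call to `smfs_do_op()` runs both pre-hooks, then the backstore operation, then the single post-hook.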
834 \layout Subsubsection
Upcalls can be placed anywhere in SMFS where other modules want to
take control from SMFS.
844 prepares function to handle upcall events,
847 registers upcall using upcall API,
receives upcalls and handles them,
853 deregisters upcall when smfs is unmounted.
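The life cycle listed above (prepare a handler, register it, receive upcalls, deregister) can be sketched in user space like this. Only the `inode` and `param` members of the message come from this document; `struct smfs_upcall`, the three functions and the rule that a non-zero return stops the walk are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Message passed to handlers; members typed as void * for this sketch. */
struct smfs_up_message {
    void *inode;
    void *param;
};

typedef int (*smfs_upcall_fn)(int type, struct smfs_up_message *msg);

struct smfs_upcall {
    struct smfs_upcall *next;
    int                 type;  /* upcall type this handler wants */
    smfs_upcall_fn      fn;    /* handler prepared by the module */
};

static struct smfs_upcall *upcall_list;

/* Append at the tail so handlers run in registration order. */
void smfs_register_upcall(struct smfs_upcall *u)
{
    struct smfs_upcall **pp = &upcall_list;

    assert(u != NULL && u->fn != NULL);
    while (*pp != NULL)
        pp = &(*pp)->next;
    u->next = NULL;
    *pp = u;
}

/* Unlink a handler; returns 0 on success, -1 if it was not registered. */
int smfs_deregister_upcall(struct smfs_upcall *u)
{
    struct smfs_upcall **pp;

    for (pp = &upcall_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == u) {
            *pp = u->next;
            u->next = NULL;
            return 0;
        }
    }
    return -1;
}

/* Call the handlers of one type one by one; first non-zero result wins. */
int smfs_upcall(int type, struct smfs_up_message *msg)
{
    struct smfs_upcall *u;

    for (u = upcall_list; u != NULL; u = u->next) {
        if (u->type == type) {
            int rc = u->fn(type, msg);

            if (rc != 0)
                return rc;
        }
    }
    return 0;
}

/* Demo handler that only counts how often it was called. */
static int demo_calls;
static int demo_handler(int type, struct smfs_up_message *msg)
{
    (void)type;
    (void)msg;
    demo_calls++;
    return 0;
}
```

A module would register its handler once, and SMFS would raise the upcall wherever the module needs to take control; after deregistration the handler is no longer reached.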
866 \layout Subsubsection
868 Initialization and data structures
878 lst{../../include/linux/lustre_smfs.h}{Superblock info}{firstline=87,lastline=108}
884 struct smfs_inode_info {
887 /* this first part of struct should be
890 the same as in mds_info_info */
893 struct lustre_id smi_id;
903 struct inode * backfs_inode;
906 __u32 smi_flags; //plugins pre_inode flags
909 struct list_head plist; /* list of plugins inode info */
922 lst{../../include/linux/lustre_smfs.h}{File info}{firstline=113,lastline=117}
926 \layout Subsubsection
Fsfilt operations for SMFS just call the same operations in the backstore FS.
941 lst{../../lvfs/fsfilt_smfs.c}{Fsfilt redirection example}{firstline=363,lastline=383}
945 \layout Subsubsection
947 Transactions handling
Each operation with hooks passes the transaction handle to the hook.
The hook can use it to decide whether it needs to start a new transaction.
If there is no handle, the hook can create its own transaction:
957 hook_func_in_plugin()
969 handle = smfs_trans_start(inode, KML_CACHE_NOOP, NULL);
It is important that the plugin also commits the transaction in the same call.
If the transaction is started before the hook call, its size must be known in advance,
so we should provide a way to calculate this size across all plugins involved.
Each plugin can register an upcall of the related type that returns the extra
size needed by that plugin.
SMFS walks through the upcalls and obtains the total extra size for the transaction:
993 fsfilt_smfs_start(struct inode * inode, int op, ...)
1002 SMFS_TRANS_EXTRA_SIZE(inode, op); /* walking through list of plugins
1006 handle = cache_fsfilt->fs_start(cache_inode, op, ...);
We still have no good idea how to pass the calculated size instead of op without changes
in the fsfilt_ext3/any_other_fs code.
A new parameter in fs_start?
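The size calculation itself can be sketched as a simple sum over the plugins. This is a simplified user-space model: it uses a per-plugin callback instead of a registered upcall, and `struct size_plugin`, `smfs_trans_extra_size()` and the callback signature are hypothetical. The unit of the returned size is whatever the backstore journal expects.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal plugin record for this sketch (layout is an assumption). */
struct size_plugin {
    struct size_plugin *next;
    /* Extra transaction size this plugin needs for op; 0 if none. */
    int (*extra_size)(int op, void *priv);
    void *priv;
};

/* Walk the plugin list and add up the extra size requested for op. */
int smfs_trans_extra_size(const struct size_plugin *list, int op)
{
    int total = 0;

    for (; list != NULL; list = list->next) {
        int extra;

        if (list->extra_size == NULL)
            continue;           /* plugin never joins a transaction */
        extra = list->extra_size(op, list->priv);
        assert(extra >= 0);     /* a plugin cannot shrink the transaction */
        total += extra;
    }
    return total;
}

/* Demo callback: needs *priv units for opcode 1, nothing otherwise. */
static int demo_extra(int op, void *priv)
{
    return op == 1 ? *(int *)priv : 0;
}
```

A total of zero tells SMFS that no plugin takes part in the transaction for this operation, so the backstore size can be used unchanged.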
1020 Backstore filesystem
1021 \layout Subsubsection
1023 Initialization/deinitialization
The backstore FS is mounted by SMFS during SMFS initialization.
1036 lst{../../smfs/smfs_lib.c}{Mount/Umount}{firstline=96,lastline=149}
1040 \layout Subsubsection
For each new inode SMFS creates private structures and gets the real inode
SMFS is the only user of the backfs inode, so it is enough to get it here and put
it in clear_inode_info.
Some plugins can have per-inode info as well as SMFS-wide info.
Therefore they can also participate in init_inode_info/clear_inode_info
Here are helpers for this.
1056 static void smfs_init_inode_info(struct inode *inode, void *opaque) {
1059 struct inode *cache_inode = NULL;
1062 struct smfs_iget_args *sargs;
1070 sargs = (struct smfs_iget_args *)opaque;
1073 /* getting backing fs inode.
1077 ino = sargs ? sargs->s_ino : inode->i_ino;
1080 cache_inode = iget(S2CSB(inode->i_sb), ino);
1083 OBD_ALLOC(inode->u.generic_ip, sizeof(struct smfs_inode_info));
1086 I2CI(inode) = cache_inode;
1089 post_smfs_inode(inode, cache_inode);
1092 sm_set_inode_ops(cache_inode, inode);
1095 if (sargs && sargs->s_inode)
1098 I2SMI(inode)->smi_flags = I2SMI(sargs->s_inode)->smi_flags;
1101 SMFS_PLUGIN_HELPER(PL_INIT_INODE, inode);
1109 static void smfs_clear_inode_info(struct inode *inode) {
1115 struct inode *cache_inode = I2CI(inode);
1118 if (cache_inode != cache_inode->i_sb->s_root->d_inode)
1124 SMFS_PLUGIN_HELPER(PL_CLEAR_INODE, inode);
1127 OBD_FREE(inode->u.generic_ip, sizeof(struct smfs_inode_info));
1130 inode->u.generic_ip = NULL;
1139 \layout Subsubsection
test_inode and set_inode are called under a spinlock, so context switching
is not allowed here.
The inode is read after the iget5_locked call
1149 int smfs_test_inode(struct inode * inode, void * opaque)
1155 struct smfs_iget_args * sargs = opaque;
1158 struct smfs_up_message message;
1164 message.inode = inode;
1167 message.param = opaque;
1173 if (sargs && (inode->i_ino == sargs->s_ino)) {
1176 /* some module can add extra checks here */
1179 if (SMFS_UPCALL(SMFS_UP_TEST_INODE, (void*)&message))
1196 int smfs_set_inode(struct inode *inode, void *opaque)
1202 struct smfs_up_message message;
1208 message.inode = inode;
1211 message.param = opaque;
/* a module may want to act here */
1219 SMFS_UPCALL(SMFS_UP_SET_INODE, (void*)&message);
1230 struct inode * smfs_iget(struct super_block * sb, ino_t hash,
1233 struct smfs_iget_args * sargs)
1239 struct inode *inode;
1242 inode = iget5_locked(sb, hash, smfs_test_inode,
1245 smfs_set_inode, sargs);
1251 if (inode->i_state & I_NEW) {
1254 smfs_init_inode_info(inode, sargs);
1257 unlock_new_inode(inode);
1263 inode->i_ino = hash;
1273 \layout Subsubsection
1275 Superblock operations
1282 smfs_read_inode2 (struct inode * inode, void * opaque)
1288 smfs_init_inode_info(inode, opaque);
1295 inode()/write\SpecialChar ~
1299 smfs_dirty/write_inode(struct inode * inode)
1305 backfs_inode = I2CI(inode);
1308 backfs_sb->s_op->dirty/write_inode(backfs_inode);
1311 duplicate_inode(inode, backfs_inode);
1321 smfs_put_inode(struct inode * inode)
1333 delete\SpecialChar ~
1337 smfs_delete_inode(struct inode * inode) {
1350 smfs_clear_inode(struct inode * inode) {
1353 smfs_clear_inode_info(inode);
1363 smfs_cleanup_hooks();
1366 smfs_umount_cache(smfs_super_info);
1369 smfs_cleanup_smb(sb);
1373 super()/write\SpecialChar ~
1375 lockfs()/unlockfs()/statfs()/remountfs()
1378 backfs_sb = S2CSB(sb);
1381 backfs_sb->s_op->...;
duplicate_sb(sb, backfs_sb);
1385 \layout Subsubsection
Each inode operation uses the backfs inode structure; this structure is created
when certain fs operations are invoked.
The logic of all operations is as follows:
There is an smfs dentry for each operation, passed as a parameter
SMFS creates artificial dentries to use them in the backstore fs operation
after the operation there is a backstore fs inode in backfs_dentry->d_inode
if the SMFS inode does not exist, it is created here and connected to the backfs inode
several fields are copied from the backstore fs inode to the smfs one
all artificial dentries are cleared.
1415 Artificial dentry handling
1418 struct dentry *pre_smfs_dentry(struct dentry *parent_dentry, struct inode
1422 struct dentry *dentry) {
1425 struct dentry *cache_dentry = NULL;
1428 cache_dentry = d_alloc(parent_dentry, &dentry->d_name);
1440 cache_dentry->d_parent = cache_dentry;
1446 d_add(cache_dentry, cache_inode);
1449 RETURN(cache_dentry);
1457 void post_smfs_dentry(struct dentry *cache_dentry) {
1466 d_unalloc(cache_dentry);
For an inode operation we have the parent inode and the dentry for the operation.
SMFS therefore has to create an artificial parent dentry with backfs_inode connected
to it, and an artificial dentry for the ext3 operation.
This is done with the pre_smfs_dentry() method.
After successful creation SMFS performs the backfs operation and gets a filled
The next step is getting the SMFS inode from the cache using the ext3 inode number.
If the inode is found, it is connected to the smfs dentry and the operation can be counted
SMFS cleans up all artificial dentries and exits.
1488 duplicate_inode details
1498 lst{../../include/linux/lustre_smfs.h}{Duplicate
1500 _inode()}{firstline=298,lastline=317}
1518 lst{../../smfs/dir.c}{Create()}{firstline=44,lastline=577}
1531 lst{../../smfs/file.c}{Truncate()}{firstline=378,lastline=533}
1535 \layout Subsubsection
SMFS creates an artificial struct file object for each SMFS file struct and
uses it for backstore fs operations.
The backstore struct file is created in the open() method and connected to a private
field in the smfs struct file.
This struct is released in smfs_release().
Whenever it is created or modified, this struct file is duplicated to the smfs one:
1562 lst{../../include/linux/lustre_smfs.h}{duplicate
1564 _file()}{firstline=343,lastline=361}
1570 Common case (write/read/llseek/mmap/ioctl/readdir)
1576 struct inode * backfs_inode;
1579 struct smfs_file_info *sfi;
1585 backfs_inode = I2CI(file->f_dentry->d_inode);
1591 pre_smfs_inode(file->f_dentry->d_inode, backfs_inode);
1594 SMFS_HOOK(..., hook_opcode , ..., PRE_HOOK, ...);
1597 if (backfs_inode->i_fop->...)
1600 backfs_inode->i_fop->...(sfi->c_file, ...);
1603 SMFS_HOOK(..., hook_opcode, ..., POST_HOOK, ...);
1606 post_smfs_inode(file->f_dentry->d_inode, backfs_inode); /* duplicate_ino
1610 duplicate_file(file, sfi->c_file);
1619 int smfs_fsync(struct file *file, struct dentry *dentry, int datasync)
1625 struct smfs_file_info *sfi = NULL;
1628 struct dentry *backfs_dentry = NULL;
1631 struct file *backfs_file = NULL;
1634 struct inode *backfs_inode = NULL;
1639 backfs_inode = I2CI(dentry->d_inode);
1642 backfs_dentry = pre_smfs_dentry(NULL, backfs_inode, dentry);
1651 backfs_file = sfi->c_file;
1657 pre_smfs_inode(dentry->d_inode, backfs_inode);
1660 if (backfs_inode->i_fop->fsync) {
1663 rc = backfs_inode->i_fop->fsync(backfs_file, backfs_dentry,
1670 post_smfs_inode(dentry->d_inode, backfs_inode);
1673 duplicate_file(file, backfs_file);
1676 post_smfs_dentry(backfs_dentry);
1685 int smfs_open(struct inode *inode, struct file *filp)
1691 struct inode *backfs_inode = NULL;
1697 backfs_inode = I2CI(inode);
1700 smfs_init_cache_file(inode, filp);
1703 if (backfs_inode->i_fop->open)
rc = backfs_inode->i_fop->open(backfs_inode, F2CF(filp));
1709 duplicate_file(filp, F2CF(filp));
1718 int smfs_release(struct inode *inode, struct file *filp)
1724 struct inode *backfs_inode = NULL;
1727 struct file *backfs_file = NULL;
1730 struct smfs_file_info *sfi = NULL;
1735 backfs_inode = I2CI(inode);
1744 backfs_file = sfi->c_file;
1750 if (backfs_inode->i_fop->release)
1753 backfs_inode->i_fop->release(backfs_inode, backfs_file);
1756 post_smfs_inode(inode, backfs_inode);
1759 smfs_cleanup_cache_file(filp);
Several situations are possible with transactions.
SMFS does not know whether a plugin will use a transaction or not.
1774 \layout Subsubsection
1776 Backstore FS will do transaction
In that case the plugins' actions should be in the same transaction as the backstore
The following actions are performed:
SMFS gets the extra size for the current operation from all plugins.
If the extra size is non-zero, some plugins will participate in the transaction.
SMFS starts the transaction with the calculated size.
SMFS calls the backstore FS operation.
SMFS calls post_hook.
SMFS commits the transaction.
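The sequence above can be sketched as one function that keeps every step inside a single transaction. This is a user-space model under stated assumptions: `struct smfs_trans_ops` and `smfs_journaled_op()` are hypothetical names, the pre_hook call between the transaction start and the backstore operation is assumed from the rule that the transaction starts before the first hook, and committing even after a failed step is a choice made for this sketch. The `demo_*` stubs only record the call order.

```c
#include <assert.h>
#include <stddef.h>

/* Callbacks standing in for the plugins and the backstore FS. */
struct smfs_trans_ops {
    int   (*extra_size)(int op);               /* total extra size from plugins */
    void *(*trans_start)(int op, int extra);   /* returns handle, NULL on error */
    int   (*trans_commit)(void *handle);
    int   (*pre_hook)(int op, void *handle);
    int   (*backfs_op)(int op, void *handle);
    int   (*post_hook)(int op, void *handle);
};

/* Run one journaled operation with hooks inside a single transaction. */
int smfs_journaled_op(const struct smfs_trans_ops *ops, int op)
{
    int extra, rc, commit_rc;
    void *handle;

    assert(ops != NULL);
    extra = ops->extra_size(op);
    handle = ops->trans_start(op, extra);
    if (handle == NULL)
        return -1;

    rc = ops->pre_hook(op, handle);
    if (rc == 0)
        rc = ops->backfs_op(op, handle);
    if (rc == 0)
        rc = ops->post_hook(op, handle);

    /* The transaction is committed in the same call that started it. */
    commit_rc = ops->trans_commit(handle);
    return rc != 0 ? rc : commit_rc;
}

/* Demo stubs: each appends one decimal digit to call_order. */
static int call_order;
static int last_extra = -1;
static int demo_token;
static void record(int digit) { call_order = call_order * 10 + digit; }
static int demo_extra(int op) { (void)op; return 4; }
static void *demo_start(int op, int extra)
{
    (void)op;
    last_extra = extra;
    record(1);
    return &demo_token;
}
static int demo_pre(int op, void *h)    { (void)op; (void)h; record(2); return 0; }
static int demo_backfs(int op, void *h) { (void)op; (void)h; record(3); return 0; }
static int demo_post(int op, void *h)   { (void)op; (void)h; record(4); return 0; }
static int demo_commit(void *h)         { (void)h; record(5); return 0; }
```

Because the handle never leaves `smfs_journaled_op()`, the start and the commit are always paired in one call, which is the property the hooks rely on.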
1801 \layout Subsubsection
1803 Backstore FS operation is not journaled
In that case plugins need not care about external conditions and can do the transaction
SMFS calls pre_hook or post_hook.
The plugin starts a transaction.
The plugin does its work.
The plugin commits the transaction.
A transaction MUST be committed where it was started.
It is not allowed to start it in pre_hook and commit it in post_hook.
1828 \layout Subsubsection
VFS takes i_sem and calls the SMFS operation.
SMFS runs pre_hook, calls the backstore FS operation and then runs post_hook.
All operations are protected by i_sem in the SMFS inode.
However, there are situations in which the plugin hook sequence will be like
This is possible because inode1 may block and the operation for inode2 will
A plugin should be aware of that and use additional protection if it is
1858 \layout Subsubsection
File operations are not protected at present.
SMFS could also use i_sem to protect these operations, but the remarks above
about the order of operations apply here as well.