2.6.12.6 vanilla (kernel.org)
2.6.16.27-0.9 (SLES 10)
* Client support for unpatched kernels:
- (see https://mail.clusterfs.com/wikis/lustre/PatchlessClient)
- 2.6.16 - 2.6.19 vanilla (kernel.org)
- 2.6.9-42.0.8.EL (RHEL 4)
+ (see http://wiki.lustre.org/index.php?title=Patchless_Client)
+ 2.6.9-42.0.10.EL (RHEL 4)
+ 2.6.16 - 2.6.21 vanilla (kernel.org)
* Recommended e2fsprogs version: 1.39.cfs7
* Note that reiserfs quotas are disabled on SLES 10 in this kernel.
* bug fixes
+Severity : enhancement
+Bugzilla : 12194
+Description: add optional extra BUILD_VERSION info
+Details : add a new environment variable (namely LUSTRE_VERS) that allows
+             the Lustre version to be overridden.
+
+Severity : normal
+Frequency : 2.6.18 servers only
+Bugzilla : 12546
+Description: ll_kern_mount() doesn't release the module reference
+Details : The ldiskfs module reference count never drops down to 0
+ because ll_kern_mount() doesn't release the module reference.
+
+Severity : normal
+Frequency : rare
+Bugzilla : 12470
+Description: server LBUG when using old ost_num_threads parameter
+Details : Accept the old ost_num_threads parameter but warn that it
+ is deprecated, and fix an off-by-one error that caused an LBUG.
+
+Severity : normal
+Frequency : rare
+Bugzilla : 11722
+Description: Transient SCSI error results in persistent IO issue
+Details : iobuf->dr_error is not reinitialized to 0 between two
+ uses.
+
+Severity : normal
+Frequency : sometimes when underlying device returns I/O errors
+Bugzilla : 11743
+Description: OSTs not going read-only during write failures
+Details : OSTs are not remounted read-only when the journal commit threads
+ get I/O errors because fsfilt_ext3 calls journal_start/stop()
+ instead of the ext3 wrappers.
+
+Severity : minor
+Bugzilla : 12364
+Description: poor connect scaling with increasing client count
+Details : Don't run filter_grant_sanity_check for more than 100 exports
+ to improve scaling for large numbers of clients.
+
+Severity : normal
+Frequency : SLES10 only
+Bugzilla : 12538
+Description: sanity-quota.sh quotacheck failed: rc = -22
+Details : Quotas cannot be enabled on SLES10.
+
Severity : normal
Frequency : liblustre clients only
Bugzilla : 12229
is possible to return to the previous behaviour during configure
with --enable-health-write.
+Severity : enhancement
+Bugzilla : 10768
+Description: 64-bit inode version
+Details : Add an on-disk 64-bit inode version for ext3 to track changes made
+             to the inode. This will be required for version-based recovery.
+
+Severity : normal
+Frequency : rare
+Bugzilla : 11818
+Description: MDS fails to start if a duplicate client export is detected
+Details : in some rare cases it was possible for a client to connect to
+ an MDS multiple times. Upon recovery the MDS would detect this
+ and fail during startup. Handle this more gracefully.
+
+Severity : enhancement
+Bugzilla : 11563
+Description: Add -o localflock option to simulate old noflock behaviour.
+Details : This provides flock/fcntl lock coherence only within a single
+             node.
+
+Severity : minor
+Frequency : rare
+Bugzilla : 11658
+Description: log_commit_thread vs filter_destroy race leads to crash
+Details : Take import reference before releasing llog record semaphore
+
+Severity : normal
+Frequency : rare
+Bugzilla : 12477
+Description: Wrong request locking in request set processing
+Details : ptlrpc_check_set() wrongly used req->rq_lock to protect adding
+             a request to imp_delayed_list; imp_lock must be held there
+             instead.
+
+Severity : normal
+Frequency : when an OSC reconnects to an OST
+Bugzilla : 11662
+Description: Grant leak when an OSC reconnects to an OST
+Details : When an OSC reconnects to an OST, the OST (filter) should check
+             whether it should grant more space to the client by comparing
+             fed_grant and cl_avail_grant, and return the granted space to
+             the client instead of the "newly granted" space, because the
+             client will call osc_init_grant to update its grant info.
+
+Severity : normal
+Frequency : when a client reconnects to an OST
+Bugzilla : 11662
+Description: Grant leak when an OSC resends and replays bulk writes
+Details : When an OSC reconnects to an OST, the OST (filter) should clear
+             the grant info carried by resent/replayed bulk write requests,
+             because the grant info is synchronized between OSC and OST at
+             reconnect time, so the grant info in those requests is stale
+             and must be ignored.
+
+Severity : normal
+Frequency : rare
+Bugzilla : 11662
+Description: Granted space can sometimes exceed the available space.
+Details : When the OST is nearly full and two bulk writes from different
+             clients arrive, the available space on the OST dictates that
+             the first request be permitted and the second denied with
+             ENOSPC. But if the second request arrives before the first one
+             is committed, the OST may wrongly permit the second write,
+             causing granted space to exceed available space.
+
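The grant entries above all hinge on one accounting invariant: space the OST has promised to clients but not yet seen committed must itself be counted against free space. A minimal sketch of that invariant (illustrative only, not Lustre code; the struct and function names are hypothetical):

```c
#include <assert.h>

/* Illustrative sketch only -- not Lustre code.  An OST must count space it
 * has already promised (granted) against free space, or two writes racing
 * before the first commit can both be accepted on a nearly full device. */
struct ost_state {
        long avail;     /* free space left on the OST */
        long granted;   /* space promised to clients, not yet committed */
};

/* Buggy check: ignores outstanding grants, so concurrent requests can
 * collectively be promised more than the device holds. */
int grant_buggy(struct ost_state *o, long want)
{
        return want > o->avail ? -1 /* ENOSPC */ : 0;
}

/* Fixed check: pending grants reduce what can still be promised. */
int grant_fixed(struct ost_state *o, long want)
{
        if (o->granted + want > o->avail)
                return -1;      /* ENOSPC */
        o->granted += want;
        return 0;
}
```

With 100 units free, two racing 60-unit writes are both accepted by the buggy check but the fixed check denies the second, which is the behaviour the entries above restore.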
+Severity : normal
+Frequency : when a client is evicted
+Bugzilla : 12371
+Description: Grant might be wrongly erased when an OSC is evicted by the OST
+Details : When the import is evicted by the server, a separate thread
+             (ptlrpc_invalidate_import_thread) is forked to invalidate the
+             import and set the grant to 0, while the original thread
+             updates the grant it received when connecting. If the
+             invalidation happens after the update, the grant is wrongly
+             erased by this race.
+
+Severity : normal
+Frequency : rare
+Bugzilla : 12401
+Description: Check staleness against the correct fid
+Details : ll_revalidate_it should use de_inode instead of op_data.fid2 to
+             check whether the entry is stale, because in some cases we
+             want the enqueue to happen anyway and op_data.fid2 will not
+             be initialized.
+
+Severity : enhancement
+Bugzilla : 11647
+Description: update patchless client
+Details : Add support for patchless client with 2.6.20, 2.6.21 and RHEL 5
+
+Severity : normal
+Frequency : only with 2.4 kernel
+Bugzilla : 12134
+Description: random memory corruption
+Details : The size of struct ll_inode_info is too big for union inode.u,
+             which can cause random memory corruption.
+
+Severity : normal
+Frequency : rare
+Bugzilla : 10818
+Description: Memory leak in recovery
+Details : The lov_mds_md was not freed in an error handler in
+             mds_create_object. It should also check obd_fail before
+             fsfilt_start; otherwise, if fsfilt_start returns -EROFS
+             (e.g. when the MDS fails over during recovery), the request
+             returns with repmsg->transno = 0 and rc = EROFS, and we hit
+             the assertion LASSERT(req->rq_reqmsg->transno ==
+             req->rq_repmsg->transno) in ptlrpc_replay_interpret. The fcc
+             should be freed whether or not fsfilt_commit succeeds.
+
+Severity : minor
+Frequency : only with very large client counts
+Bugzilla : 11817
+Description: Avoid taking the superblock lock in llap_from_page for a
+             page that is about to be removed.
+Details : Using the LL_ORIGIN_REMOVEPAGE origin flag instead of
+             LL_ORIGIN_UNKNOW for the llap_from_page call in ll_removepage
+             avoids taking the superblock lock for a page that is about to
+             be removed.
+
+Severity : normal
+Frequency : rare
+Bugzilla : 11935
+Description: Open intent error not checked before releasing the open handle
+Details : In some rare cases the open intent error was not checked before
+             releasing the open handle, which may trigger
+             ASSERTION(open_req->rq_transno != 0) because it tries to
+             release the failed open handle.
+
+Severity : normal
+Frequency : rare
+Bugzilla : 12556
+Description: Set the catalog log bitmap only after log creation succeeds.
+Details : In some rare cases the catalog log bitmap was set too early; it
+             should be set only after the log is created successfully.
+
+Severity : major
+Bugzilla : 11971
+Description: Accessing a block device can re-enable I/O when Lustre is
+ tearing down a device.
+Details : dev_clear_rdonly(bdev) must be called in kill_bdev() instead of
+ blkdev_put().
+
+Severity : minor
+Bugzilla : 11706
+Description: service threads may hog CPUs when many requests are arriving
+Details : Insert cond_resched() calls to give other threads a chance to
+             run on the CPU.
+
+Severity : normal
+Frequency : rare
+Bugzilla : 12086
+Description: the catalog log was not initialized during recovery
+Details : When the MDS (MGS) performs recovery, tgt_count might be zero,
+             so the unlink log on the MDS is not initialized until after
+             recovery completes. In post-recovery the unlink log
+             initialization is done asynchronously, so there is a race
+             between adding to the unlink log and its initialization.
+
+Severity : normal
+Bugzilla : 12597
+Description: brw_stats were being printed incorrectly
+Details : brw_stats were being printed as log2 values, but not all of
+             them were recorded as log2. Also remove some code duplication
+             arising from filter_tally_{read,write}.
+
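The brw_stats fix above comes down to the recorder and the printer agreeing on the bucket scheme: if samples land in buckets by one rule but the report assumes every index is a power of two, the printed sizes are wrong. A tiny sketch of consistent log2 bucketing (illustrative only, not the Lustre implementation):

```c
#include <assert.h>

/* Illustrative sketch, not Lustre code: log2 bucketing that recording and
 * reporting must share.  A sample of v bytes goes in the bucket of its
 * highest set bit, and the report prints 1 << bucket for that index. */
int log2_bucket(unsigned v)
{
        int b = 0;
        while (v >>= 1)         /* index of the highest set bit */
                b++;
        return b;
}

unsigned bucket_size(int b)
{
        return 1u << b;         /* size label a log2 report prints */
}
```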
+Severity : normal
+Bugzilla : 11674
+Frequency : rare, only in recovery.
+Description: ASSERTION(req->rq_type != LI_POISON) failed
+Details : imp_lock must be held while iterating over imp_sending_list to
+             prevent a request from being destroyed after a timeout in
+             ptlrpc_queue_wait.
+
+Severity : normal
+Bugzilla : 12689
+Description: replay-single.sh test 52 fails
+Details : A lock's skiplist needs to be cleaned up when it is unlinked
+             from its resource list.
+
+Severity : normal
+Bugzilla : 11737
+Description: Short directio read returns full requested size rather than
+ actual amount read.
+Details : Direct I/O operations should return actual amount of bytes
+ transferred rather than requested size.
+
--------------------------------------------------------------------------------
2007-05-03 Cluster File Systems, Inc. <info@clusterfs.com>
* CONFIGURATION CHANGE. This version of Lustre WILL NOT
INTEROPERATE with older versions automatically. In many cases a
special upgrade step is needed. Please read the
- user documentation before upgrading any part of a 1.4.x system.
+ user documentation before upgrading any part of a live system.
+ * WIRE PROTOCOL CHANGE from previous 1.6 beta versions. This
+ version will not interoperate with 1.6 betas before beta5 (1.5.95).
* WARNING: Lustre configuration and startup changes are required with
this release. See https://mail.clusterfs.com/wikis/lustre/MountConf
for details.
- * Support for kernels:
- 2.4.21-47.0.1.EL (RHEL 3)
- 2.6.5-7.283 (SLES 9)
- 2.6.9-42.0.10.EL (RHEL 4)
- 2.6.12.6 vanilla (kernel.org)
- 2.6.16.27-0.9 (SLES10)
- * Client support for unpatched kernels:
- (see https://mail.clusterfs.com/wikis/lustre/PatchlessClient)
- 2.6.16 - 2.6.19 vanilla (kernel.org)
- 2.6.9-42.0.8EL (RHEL 4)
- * Recommended e2fsprogs version: 1.39.cfs6
- * Note that reiserfs quotas are disabled on SLES 10 in this kernel
* bug fixes
+
Severity : enhancement
Bugzilla : 8007
Description: MountConf
Description: startup order invariance
Details : MDTs and OSTs can be started in any order. Clients only
require the MDT to complete startup.
-
+
Severity : enhancement
Bugzilla : 4899
Description: parallel, asynchronous orphan cleanup
Description: optimized stripe assignment
Details : stripe assignments are now made based on ost space available,
ost previous usage, and OSS previous usage, in order to try
- to optimize storage space and networking resources.
-
+ to optimize storage space and networking resources.
+
Severity : enhancement
Bugzilla : 4226
Description: Permanently set tunables
Details : All writable /proc/fs/lustre tunables can now be permanently
- set on a per-server basis, at mkfs time or on a live
+ set on a per-server basis, at mkfs time or on a live
system.
-
+
Severity : enhancement
Bugzilla : 10547
Description: Lustre message v2
Details : Clients can be started with a list of OSTs that should be
declared "inactive" for known non-responsive OSTs.
-Severity : normal
-Bugzilla : 12123
-Description: ENOENT returned for valid filehandle during dbench.
-Details : Check if a directory has children when invalidating dentries
- associated with an inode during lock cancellation. This fixes
- an incorrect ENOENT sometimes seen for valid filehandles during
- testing with dbench.
-
Severity : minor
-Frequency : SFS test only (otherwise harmless)
Bugzilla : 6062
Description: SPEC SFS validation failure on NFS v2 over lustre.
Details : Changes the blocksize for regular files to be 2x RPC size,
- and not depend on stripe size.
-
-Severity : enhancement
-Bugzilla : 10088
-Description: fine-grained SMP locking inside DLM
-Details : Improve DLM performance on SMP systems by removing the single
- per-namespace lock and replace it with per-resource locks.
-
-Severity : enhancement
-Bugzilla : 9332
-Description: don't hold multiple extent locks at one time
-Details : To avoid client eviction during large writes, locks are not
- held on multiple stripes at one time or for very large writes.
- Otherwise, clients can block waiting for a lock on a failed OST
- while holding locks on other OSTs and be evicted.
-
+ and not depend on stripe size.
+
Severity : enhancement
Bugzilla : 9293
Description: Multiple MD RPCs in flight.
-Details : Further unserialise some read-only MDT RPCs - learn about intents.
- To avoid overly-overloading MDT, introduce a limit on number of
- MDT RPCs in flight for a single client and add /proc controls
- to adjust this limit.
+Details : Further unserialise some read-only MDS RPCs - learn about intents.
+ To avoid overly-overloading MDS, introduce a limit on number of
+ MDS RPCs in flight for a single client and add /proc controls
+ to adjust this limit.
Severity : enhancement
Bugzilla : 22484
Description: client read/write statistics
Details : Add client read/write call usage stats for performance
- analysis of user processes.
+ analysis of user processes.
/proc/fs/lustre/llite/*/offset_stats shows non-sequential
file access. extents_stats shows chunk size distribution.
extents_stats_per_process show chunk size distribution per
- user process.
-
-Severity : enhancement
-Bugzilla : 22485
-Description: per-client statistics on server
-Details : Add ldlm and operations statistics for each client in
- /proc/fs/lustre/mds|obdfilter/*/exports/
+ user process.
Severity : enhancement
Bugzilla : 22486
-Description: improved MDT statistics
-Details : Add detailed MDT operations statistics in
- /proc/fs/lustre/mds/*/stats
-
-Severity : enhancement
-Bugzilla : 10968
-Description: VFS operations stats
-Details : Add client VFS call stats, trackable by pid, ppid, or gid
- /proc/fs/lustre/llite/*/vfs_ops_stats
- /proc/fs/lustre/llite/*/vfs_track_[pid|ppid|gid]
+Description: mds statistics
+Details : Add detailed mds operations statistics in
+ /proc/fs/lustre/mds/*/stats.
Severity : minor
-Frequency : always
-Bugzilla : 6380
-Description: Fix client-side osc byte counters
-Details : The osc read/write byte counters in
- /proc/fs/lustre/osc/*/stats are now working
-
-Severity : minor
-Frequency : always as root on SLES
Bugzilla : 10667
Description: Failure of copying files with lustre special EAs.
Details : Client side always return success for setxattr call for lustre
ext3 code. The SLES10 kernel turns barrier support on by
default. The fix is to undo that change for ldiskfs.
-
------------------------------------------------------------------------------
2006-12-09 Cluster File Systems, Inc. <info@clusterfs.com>
------------------------------------------------------------------------------
-2006-08-20 Cluster File Systems, Inc. <info@clusterfs.com>
+2006-08-20 Cluster File Systems, Inc. <info@clusterfs.com>
* version 1.4.7
* Support for kernels:
- 2.6.9-42.EL (RHEL 4)
- 2.6.5-7.267 (SLES 9)
- 2.4.21-40.EL (RHEL 3)
- 2.6.12.6 vanilla (kernel.org)
+ 2.6.9-42.EL (RHEL 4)
+ 2.6.5-7.276 (SLES 9)
+ 2.4.21-40.EL (RHEL 3)
+ 2.6.12.6 vanilla (kernel.org)
* bug fixes
Severity : major
never been opened it would be possible to oops the client
if the file had no objects.
-Severity : major
-Frequency : rare
-Bugzilla : 9326, 10402, 10897
-Description: client crash in ptlrpcd_wake() thread when sending async RPC
-Details : It is possible that ptlrpcd_wake() dereferences a freed async
- RPC. In rare cases the ptlrpcd thread alread processed the RPC
- before ptlrpcd_wake() was called and the request was freed.
-
Severity : minor
Frequency : always for liblustre
Bugzilla : 10290
or if the filesystem is corrupt and cannot even mount then the
error handling cleanup routines would dereference a NULL pointer.
-Severity : normal
+Severity : medium
Frequency : rare
Bugzilla : 10047
Description: NULL pointer deref in llap_from_page.
around call to generic_file_sendfile() much like we do in
ll_file_read().
-Severity : normal
+Severity : medium
Frequency : with certain MDS communication failures at client mount time
Bugzilla : 10268
Description: NULL pointer deref after failed client mount
reference from the request import to the obd device and delay
the cleanup until the network drops the request.
-Severity : normal
+Severity : medium
Frequency : occasionally during client (re)connect
Bugzilla : 9387
Description: assertion failure during client (re)connect
client may trip an assertion failure in ptlrpc_connect_interpret()
which thought it would be the only running connect process.
-Severity : normal
+Severity : medium
Frequency : only with obd_echo servers and clients that are rebooted
Bugzilla : 10140
Description: kernel BUG accessing uninitialized data structure
Details : Implement non-rawops metadata methods for NFS server to use without
changing NFS server code.
-Severity : normal
+Severity : medium
Frequency : very rare (synthetic metadata workload only)
Bugzilla : 9974
Description: two racing renames might cause an MDS thread to deadlock
Severity : critical
Frequency : Always, for 32-bit kernel without CONFIG_LBD and filesystem > 2TB
Bugzilla : 6191
-Description: filesystem corruption for non-standard kernels and very large OSTs
+Description: ldiskfs crash at mount for filesystem larger than 2TB with mballoc
Details : If a 32-bit kernel is compiled without CONFIG_LBD enabled and a
	     filesystem larger than 2TB is mounted then the kernel will
silently corrupt the start of the filesystem. CONFIG_LBD is
just take a reference before calling vfs_unlink() and release it
when parent's i_sem is free.
+Severity : major
+Frequency : rare
+Bugzilla : 4778
+Description: last_id value checked outside lock on OST caused LASSERT failure
+Details : If there were multiple MDS->OST object precreate requests in
+ flight, it was possible that the OST's last object id was checked
+ outside a lock and incorrectly tripped an assertion. Move checks
+ inside locks, and discard old precreate requests.
+
Severity : minor
Frequency : always, if extents are used on OSTs
Bugzilla : 10703
import connections to be ignored if the 32-bit jiffies counter
wraps. Use a 64-bit jiffies counter.
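The wrap problem can be sketched in a few lines (illustrative plain C, not kernel code): a naive "is the deadline still in the future?" comparison breaks once a 32-bit tick counter wraps, because the wrapped deadline compares as smaller than the current time, while a 64-bit counter never wraps in practice.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch, not Lustre code: naive ordering comparisons on a
 * 32-bit tick counter misfire across a wrap; widening the counter to
 * 64 bits makes the same comparison safe for any realistic uptime. */
int in_future32(uint32_t now, uint32_t deadline)
{
        return now < deadline;  /* wrong once deadline wraps past zero */
}

int in_future64(uint64_t now, uint64_t deadline)
{
        return now < deadline;  /* 64-bit counter effectively never wraps */
}
```

Near the top of the 32-bit range, a deadline a few ticks ahead wraps to a small value and is wrongly reported as already expired; the 64-bit version gives the right answer.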
-Severity : major
-Frequency : during server recovery
-Bugzilla : 10479
-Description: crash after server is denying duplicate export
-Details : If clients are resending connect requests to the server, the
- server refuses to allow a client to connect multiple times.
- Fixed a bug in the handling of this case.
-
Severity : minor
Frequency : very large clusters immediately after boot
Bugzilla : 10083
Bugzilla : 9314
Description: Assertion failure in ll_local_open after replay.
Details : If replay happened on an open request reply before we were able
- to set replay handler, reply will become not swabbed tripping the
- assertion in ll_local_open. Now we set the handler right after
- recognising of open request
+             to set the replay handler, the reply would not be swabbed,
+             tripping the assertion in ll_local_open. Now we set the
+             handler right after recognising the open request.
-Severity : minor
+Severity : trivial
Frequency : very rare
Bugzilla : 10584
Description: kernel reports "badness in vsnprintf"
Frequency : always
Bugzilla : 10611
Description: Inability to activate failout mode
-Details : lconf script incorrectly assumed that in python string's numeric
+Details    : lconf script incorrectly assumed that in python a string's numeric
value is used in comparisons.
Severity : minor
the MDS is always picking the same starting OST for each file.
Return the OST selection heuristic to the original design.
-Severity : minor
+Severity : trivial
Frequency : rare
Bugzilla : 10673
Description: mount failures may take full timeout to return an error
failed mount can wait for the full obd_timeout interval,
possibly several minutes, before reporting an error.
Instead return an error as soon as the status is known.
+Severity : major
+Frequency : quota enabled and large files being deleted
+Bugzilla : 10707
+Description: releasing more than 4GB of quota at once hangs OST
+Details : If a user deletes more than 4GB of files on a single OST it
+ will cause the OST to spin in an infinite loop. Release
+ quota in < 4GB chunks, or use a 64-bit value for 1.4.7.1+.
+
+Severity : trivial
+Frequency : rare
+Bugzilla : 10845
+Description: statfs data retrieved from /proc may be stale or zero
+Details : When reading per-device statfs data from /proc, in the
+ {kbytes,files}_{total,free,avail} files, it may appear
+ as zero or be out of date.
+
+Severity : trivial
+Frequency : systems with MD RAID1 external journal devices
+Bugzilla : 10832
+Description: lconf's call to blkid is confused by RAID1 journal devices
+Details : Use the "blkid -l" flag to locate the MD RAID device instead
+ of returning all block devices that match the journal UUID.
+
+Severity : normal
+Frequency : always, for aggregate stripe size over 4GB
+Bugzilla : 10725
+Description: assertion fails when trying to use 4GB stripe size
+Details : Using "setstripe" to set a stripe size over 4GB crashes the
+             kernel with "ASSERTION(lsm->lsm_xfersize != 0)".
+
+Severity : normal
+Frequency : always on ppc64
+Bugzilla : 10634
+Description: the first write on an ext3 filesystem with mballoc got stuck
+Details : ext3_mb_generate_buddy() uses find_next_bit() which does not
+ perform endianness conversion.
------------------------------------------------------------------------------
-2006-02-14 Cluster File Systems, Inc. <info@clusterfs.com>
+2006-02-14 Cluster File Systems, Inc. <info@clusterfs.com>
* version 1.4.6
* WIRE PROTOCOL CHANGE. This version of Lustre networking WILL NOT
INTEROPERATE with older versions automatically. Please read the
Rather --with-portals=<path-to-portals-includes> is used to
enable building on the XT3. In addition to enable XT3 specific
features the option --enable-cray-xt3 must be used.
-
+
Severity : major
Frequency : rare
Bugzilla : 7407
------------------------------------------------------------------------------
-2005-08-26 Cluster File Systems, Inc. <info@clusterfs.com>
+2005-08-26 Cluster File Systems, Inc. <info@clusterfs.com>
* version 1.4.5
* bug fixes
* add hard link support
* change obdfile creation method
* kernel patch changed
-
+
2002-09-19 Peter Braam <braam@clusterfs.com>
* version 0_5_9
* bug fix
-@LDISKFS_TRUE@subdir-m += ldiskfs ldiskfs2
-
+subdir-m += fid
subdir-m += lvfs
subdir-m += obdclass
subdir-m += lov
subdir-m += obdecho
subdir-m += mgc
-@SERVER_TRUE@subdir-m += mds obdfilter ost mgs
-@CLIENT_TRUE@subdir-m += mdc llite
+@SERVER_TRUE@subdir-m += mds obdfilter ost mgs mdt cmm mdd osd
+@CLIENT_TRUE@subdir-m += mdc lmv llite fld
@QUOTA_TRUE@subdir-m += quota
@INCLUDE_RULES@
# also update lustre/autoconf/lustre-core.m4 AC_CONFIG_FILES
ALWAYS_SUBDIRS := include lvfs obdclass ldlm ptlrpc osc lov obdecho \
- mgc doc utils tests scripts autoconf contrib
+ mgc fid fld doc utils tests scripts autoconf contrib
-SERVER_SUBDIRS := ldiskfs ldiskfs2 obdfilter ost mds mgs
+SERVER_SUBDIRS := obdfilter ost mds mgs mdt cmm mdd osd
-CLIENT_SUBDIRS := mdc llite
+CLIENT_SUBDIRS := mdc lmv llite
QUOTA_SUBDIRS := quota
EXTRA_DIST = BUGS FDL kernel_patches
-if LDISKFS
-LDISKFS = ldiskfs-sources ldiskfs2-sources
-ldiskfs-sources:
- $(MAKE) sources -C ldiskfs
-ldiskfs2-sources:
- $(MAKE) sources -C ldiskfs2
-endif
-
lvfs-sources:
$(MAKE) sources -C lvfs
obdclass-sources:
all-recursive: lustre_build_version
-BUILD_VER_H=$(top_builddir)/lustre/include/linux/lustre_build_version.h
+BUILD_VER_H=$(top_builddir)/lustre/include/lustre/lustre_build_version.h
lustre_build_version:
perl $(top_builddir)/lustre/scripts/version_tag.pl $(top_srcdir) $(top_builddir) > tmpver
-EXTRA_DIST := lustre-core.m4 lustre-version.ac
+EXTRA_DIST := lustre-core.m4 lustre-version.ac kerberos5.m4
--- /dev/null
+dnl Checks for Kerberos
+dnl NOTE: while we intend to do generic gss-api, currently we
+dnl have a requirement to get an initial Kerberos machine
+dnl credential. Thus, the requirement for Kerberos.
+dnl The Kerberos gssapi library will be dynamically loaded?
+AC_DEFUN([AC_KERBEROS_V5],[
+ AC_MSG_CHECKING(for Kerberos v5)
+ AC_ARG_WITH(krb5,
+ [AC_HELP_STRING([--with-krb5=DIR], [use Kerberos v5 installation in DIR])],
+ [ case "$withval" in
+ yes|no)
+ krb5_with=""
+ ;;
+ *)
+ krb5_with="$withval"
+ ;;
+ esac ]
+ )
+
+ for dir in $krb5_with /usr /usr/kerberos /usr/local /usr/local/krb5 \
+ /usr/krb5 /usr/heimdal /usr/local/heimdal /usr/athena ; do
+ dnl This ugly hack brought on by the split installation of
+ dnl MIT Kerberos on Fedora Core 1
+ K5CONFIG=""
+ if test -f $dir/bin/krb5-config; then
+ K5CONFIG=$dir/bin/krb5-config
+ elif test -f "/usr/kerberos/bin/krb5-config"; then
+ K5CONFIG="/usr/kerberos/bin/krb5-config"
+ elif test -f "/usr/lib/mit/bin/krb5-config"; then
+ K5CONFIG="/usr/lib/mit/bin/krb5-config"
+ fi
+ if test "$K5CONFIG" != ""; then
+ KRBCFLAGS=`$K5CONFIG --cflags`
+ KRBLIBS=`$K5CONFIG --libs gssapi`
+ K5VERS=`$K5CONFIG --version | head -n 1 | awk '{split($(4),v,"."); if (v@<:@"3"@:>@ == "") v@<:@"3"@:>@ = "0"; print v@<:@"1"@:>@v@<:@"2"@:>@v@<:@"3"@:>@ }'`
+ AC_DEFINE_UNQUOTED(KRB5_VERSION, $K5VERS, [Define this as the Kerberos version number])
+ if test -f $dir/include/gssapi/gssapi_krb5.h -a \
+ \( -f $dir/lib/libgssapi_krb5.a -o \
+ -f $dir/lib/libgssapi_krb5.so \) ; then
+ AC_DEFINE(HAVE_KRB5, 1, [Define this if you have MIT Kerberos libraries])
+ KRBDIR="$dir"
+ dnl If we are using MIT K5 1.3.1 and before, we *MUST* use the
+ dnl private function (gss_krb5_ccache_name) to get correct
+ dnl behavior of changing the ccache used by gssapi.
+ dnl Starting in 1.3.2, we *DO NOT* want to use
+ dnl gss_krb5_ccache_name, instead we want to set KRB5CCNAME
+ dnl to get gssapi to use a different ccache
+ if test $K5VERS -le 131; then
+ AC_DEFINE(USE_GSS_KRB5_CCACHE_NAME, 1, [Define this if the private function, gss_krb5_cache_name, must be used to tell the Kerberos library which credentials cache to use. Otherwise, this is done by setting the KRB5CCNAME environment variable])
+ fi
+ gssapi_lib=gssapi_krb5
+ break
+ dnl The following ugly hack brought on by the split installation
+ dnl of Heimdal Kerberos on SuSe
	elif test \( -f $dir/include/heim_err.h -o \
+ -f $dir/include/heimdal/heim_err.h \) -a \
+ -f $dir/lib/libroken.a; then
+ AC_DEFINE(HAVE_HEIMDAL, 1, [Define this if you have Heimdal Kerberos libraries])
+ KRBDIR="$dir"
+ gssapi_lib=gssapi
+ break
+ fi
+ fi
+ done
+ dnl We didn't find a usable Kerberos environment
+ if test "x$KRBDIR" = "x"; then
+ if test "x$krb5_with" = "x"; then
+ AC_MSG_ERROR(Kerberos v5 with GSS support not found: consider --disable-gss or --with-krb5=)
+ else
+ AC_MSG_ERROR(Kerberos v5 with GSS support not found at $krb5_with)
+ fi
+ fi
+ AC_MSG_RESULT($KRBDIR)
+
+ dnl Check if -rpath=$(KRBDIR)/lib is needed
+ echo "The current KRBDIR is $KRBDIR"
+ if test "$KRBDIR/lib" = "/lib" -o "$KRBDIR/lib" = "/usr/lib" \
+ -o "$KRBDIR/lib" = "//lib" -o "$KRBDIR/lib" = "/usr//lib" ; then
+ KRBLDFLAGS="";
+ elif /sbin/ldconfig -p | grep > /dev/null "=> $KRBDIR/lib/"; then
+ KRBLDFLAGS="";
+ else
+ KRBLDFLAGS="-Wl,-rpath=$KRBDIR/lib"
+ fi
+
+ dnl Now check for functions within gssapi library
+ AC_CHECK_LIB($gssapi_lib, gss_krb5_export_lucid_sec_context,
+ AC_DEFINE(HAVE_LUCID_CONTEXT_SUPPORT, 1, [Define this if the Kerberos GSS library supports gss_krb5_export_lucid_sec_context]), ,$KRBLIBS)
+ AC_CHECK_LIB($gssapi_lib, gss_krb5_set_allowable_enctypes,
+ AC_DEFINE(HAVE_SET_ALLOWABLE_ENCTYPES, 1, [Define this if the Kerberos GSS library supports gss_krb5_set_allowable_enctypes]), ,$KRBLIBS)
+ AC_CHECK_LIB($gssapi_lib, gss_krb5_ccache_name,
+ AC_DEFINE(HAVE_GSS_KRB5_CCACHE_NAME, 1, [Define this if the Kerberos GSS library supports gss_krb5_ccache_name]), ,$KRBLIBS)
+
+ dnl If they specified a directory and it didn't work, give them a warning
+ if test "x$krb5_with" != "x" -a "$krb5_with" != "$KRBDIR"; then
+ AC_MSG_WARN(Using $KRBDIR instead of requested value of $krb5_with for Kerberos!)
+ fi
+
+ AC_SUBST([KRBDIR])
+ AC_SUBST([KRBLIBS])
+ AC_SUBST([KRBCFLAGS])
+ AC_SUBST([KRBLDFLAGS])
+ AC_SUBST([K5VERS])
+
+])
#* -*- mode: c; c-basic-offset: 8; indent-tabs-mode: nil; -*-
+#* vim:expandtab:shiftwidth=8:tabstop=8:
#
# LC_CONFIG_SRCDIR
#
AC_DEFUN([LC_CONFIG_SRCDIR],
[AC_CONFIG_SRCDIR([lustre/obdclass/obdo.c])
])
-
+
#
# LC_PATH_DEFAULTS
#
#
# LC_CONFIG_BACKINGFS
#
-# whether to use ldiskfs instead of ext3
+# setup, check the backing filesystem
#
AC_DEFUN([LC_CONFIG_BACKINGFS],
[
-BACKINGFS='ext3'
-
-# 2.6 gets ldiskfs
-AC_MSG_CHECKING([whether to enable ldiskfs])
-AC_ARG_ENABLE([ldiskfs],
- AC_HELP_STRING([--enable-ldiskfs],
- [use ldiskfs for the Lustre backing FS]),
- [],[enable_ldiskfs="$linux25"])
-AC_MSG_RESULT([$enable_ldiskfs])
-
-if test x$enable_ldiskfs = xyes ; then
- BACKINGFS="ldiskfs"
+BACKINGFS="ldiskfs"
- AC_MSG_CHECKING([whether to enable quilt for making ldiskfs])
- AC_ARG_ENABLE([quilt],
- AC_HELP_STRING([--disable-quilt],[disable use of quilt for ldiskfs]),
- [],[enable_quilt='yes'])
- AC_MSG_RESULT([$enable_quilt])
+if test x$with_ldiskfs = xno ; then
+ BACKINGFS="ext3"
- AC_PATH_PROG(PATCH, patch, [no])
-
- if test x$enable_quilt = xno ; then
- QUILT="no"
- else
- AC_PATH_PROG(QUILT, quilt, [no])
+ if test x$linux25$enable_server = xyesyes ; then
+ AC_MSG_ERROR([ldiskfs is required for 2.6-based servers.])
fi
- if test x$enable_ldiskfs$PATCH$QUILT = xyesnono ; then
- AC_MSG_ERROR([Quilt or patch are needed to build the ldiskfs module (for Linux 2.6)])
- fi
-
- AC_DEFINE(CONFIG_LDISKFS_FS_MODULE, 1, [build ldiskfs as a module])
- AC_DEFINE(CONFIG_LDISKFS_FS_XATTR, 1, [enable extended attributes for ldiskfs])
- AC_DEFINE(CONFIG_LDISKFS_FS_POSIX_ACL, 1, [enable posix acls for ldiskfs])
- AC_DEFINE(CONFIG_LDISKFS_FS_SECURITY, 1, [enable fs security for ldiskfs])
-
- AC_DEFINE(CONFIG_LDISKFS2_FS_XATTR, 1, [enable extended attributes for ldiskfs2])
- AC_DEFINE(CONFIG_LDISKFS2_FS_POSIX_ACL, 1, [enable posix acls for ldiskfs2])
- AC_DEFINE(CONFIG_LDISKFS2_FS_SECURITY, 1, [enable fs security for ldiskfs2])
-fi
+ # --- Check that ext3 and ext3 xattr are enabled in the kernel
+ LC_CONFIG_EXT3([],[
+ AC_MSG_ERROR([Lustre requires that ext3 is enabled in the kernel])
+ ],[
+ AC_MSG_WARN([Lustre requires that extended attributes for ext3 are enabled in the kernel])
+ AC_MSG_WARN([This build may fail.])
+ ])
+else
+ # ldiskfs is enabled
+ LB_DEFINE_LDISKFS_OPTIONS
+fi #ldiskfs
AC_MSG_CHECKING([which backing filesystem to use])
AC_MSG_RESULT([$BACKINGFS])
AC_SUBST(BACKINGFS)
-
-case $BACKINGFS in
- ext3)
- # --- Check that ext3 and ext3 xattr are enabled in the kernel
- LC_CONFIG_EXT3([],[
- AC_MSG_ERROR([Lustre requires that ext3 is enabled in the kernel])
- ],[
- AC_MSG_WARN([Lustre requires that extended attributes for ext3 are enabled in the kernel])
- AC_MSG_WARN([This build may fail.])
- ])
- ;;
- ldiskfs)
- AC_MSG_CHECKING([which ldiskfs series to use])
- case $LINUXRELEASE in
- 2.6.5*) LDISKFS_SERIES="2.6-suse.series" ;;
- 2.6.9*) LDISKFS_SERIES="2.6-rhel4.series" ;;
- 2.6.10-ac*) LDISKFS_SERIES="2.6-fc3.series" ;;
- 2.6.10*) LDISKFS_SERIES="2.6-rhel4.series" ;;
- 2.6.12*) LDISKFS_SERIES="2.6.12-vanilla.series" ;;
- 2.6.15*) LDISKFS_SERIES="2.6-fc5.series";;
- 2.6.16*) LDISKFS_SERIES="2.6-sles10.series";;
- 2.6.18*) LDISKFS_SERIES="2.6.18-vanilla.series";;
- *) AC_MSG_WARN([Unknown kernel version $LINUXRELEASE, fix lustre/autoconf/lustre-core.m4])
- esac
- AC_MSG_RESULT([$LDISKFS_SERIES])
- AC_SUBST(LDISKFS_SERIES)
- ;;
-esac # $BACKINGFS
])
#
#
# LC_CONFIG_HEALTH_CHECK_WRITE
#
-# Turn on the actual write to the disk
+# Turn off the actual write to the disk
#
AC_DEFUN([LC_CONFIG_HEALTH_CHECK_WRITE],
[AC_MSG_CHECKING([whether to enable a write with the health check])
-AC_ARG_ENABLE([health-write],
- AC_HELP_STRING([--enable-health-write],
+AC_ARG_ENABLE([health_write],
+ AC_HELP_STRING([--enable-health_write],
[enable disk writes when doing health check]),
[],[enable_health_write='no'])
AC_MSG_RESULT([$enable_health_write])
-if test x$enable_health_write == xyes ; then
+if test x$enable_health_write != xno ; then
AC_DEFINE(USE_HEALTH_CHECK_WRITE, 1, Write when Checking Health)
fi
])
#include <asm/page.h>
#include <linux/mm.h>
],[
- filemap_populate(NULL, 0, 0, __pgprot(0), 0, 0);
+ filemap_populate(NULL, 0, 0, __pgprot(0), 0, 0);
],[
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_FILEMAP_POPULATE, 1, [Kernel exports filemap_populate])
],[
AC_MSG_RESULT([no])
])
+])
#
# LC_EXPORT___IGET
# starting from 2.6.19 linux kernel exports __iget()
#
AC_DEFUN([LC_EXPORT___IGET],
-[AC_MSG_CHECKING([if kernel exports __iget])
- if grep -q "EXPORT_SYMBOL(__iget)" $LINUX/fs/inode.c 2>/dev/null ; then
- AC_DEFINE(HAVE_EXPORT___IGET, 1, [kernel exports __iget])
- AC_MSG_RESULT([yes])
- else
- AC_MSG_RESULT([no])
- fi
+[LB_CHECK_SYMBOL_EXPORT([__iget],
+[fs/inode.c],[
+ AC_DEFINE(HAVE_EXPORT___IGET, 1, [kernel exports __iget])
+],[
])
])
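LB_CHECK_SYMBOL_EXPORT generalizes the hand-rolled grep the old LC_EXPORT___IGET macro used: it looks for an EXPORT_SYMBOL() of the symbol in the listed kernel source files. A rough shell equivalent of that probe (the helper name and demo file are illustrative, not part of the real build macros):

```shell
# check_export SYMBOL FILE...: succeed if any listed file exports SYMBOL
check_export() {
    sym="$1"; shift
    grep -q "EXPORT_SYMBOL(${sym})" "$@" 2>/dev/null
}

# Demo stand-in for $LINUX/fs/inode.c
printf 'EXPORT_SYMBOL(__iget);\n' > /tmp/demo_inode.c

check_export __iget /tmp/demo_inode.c && echo "__iget exported"
```

When the grep succeeds, the macro's first action block runs and HAVE_EXPORT___IGET is defined; the empty second block means a miss is silently tolerated.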
+
AC_DEFUN([LC_LUSTRE_VERSION_H],
[LB_CHECK_FILE([$LINUX/include/linux/lustre_version.h],[
rm -f "$LUSTRE/include/linux/lustre_version.h"
AC_MSG_WARN([Unpatched kernel detected.])
AC_MSG_WARN([Lustre servers cannot be built with an unpatched kernel;])
AC_MSG_WARN([disabling server build])
- enable_server='no'
+ enable_server='no'
fi
])
])
AC_DEFUN([LC_FUNC_SET_FS_PWD],
-[AC_MSG_CHECKING([if kernel exports show_task])
-have_show_task=0
- if grep -q "EXPORT_SYMBOL(show_task)" \
- "$LINUX/fs/namespace.c" 2>/dev/null ; then
- AC_DEFINE(HAVE_SET_FS_PWD, 1, [set_fs_pwd is exported])
- AC_MSG_RESULT([yes])
- else
- AC_MSG_RESULT([no])
- fi
+[LB_CHECK_SYMBOL_EXPORT([set_fs_pwd],
+[fs/namespace.c],[
+ AC_DEFINE(HAVE_SET_FS_PWD, 1, [set_fs_pwd is exported])
+],[
+])
])
+#
+# LC_CAPA_CRYPTO
+#
+AC_DEFUN([LC_CAPA_CRYPTO],
+[LB_LINUX_CONFIG_IM([CRYPTO],[],[
+ AC_MSG_ERROR([Lustre capabilities require that CONFIG_CRYPTO be enabled in your kernel.])
+])
+LB_LINUX_CONFIG_IM([CRYPTO_HMAC],[],[
+ AC_MSG_ERROR([Lustre capabilities require that CONFIG_CRYPTO_HMAC be enabled in your kernel.])
+])
+LB_LINUX_CONFIG_IM([CRYPTO_SHA1],[],[
+ AC_MSG_ERROR([Lustre capabilities require that CONFIG_CRYPTO_SHA1 be enabled in your kernel.])
+])
+])
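LB_LINUX_CONFIG_IM accepts an option that is either built in (=y) or modular (=m); conceptually the test reduces to a grep of the kernel configuration. A rough shell equivalent (check_kconfig and the demo .config are illustrative stand-ins, not the real macro):

```shell
# check_kconfig NAME FILE: succeed if CONFIG_NAME=y or CONFIG_NAME=m in FILE
check_kconfig() {
    grep -Eq "^CONFIG_$1=(y|m)$" "$2"
}

# Demo kernel configuration fragment
cat > /tmp/demo.config <<'EOF'
CONFIG_CRYPTO=y
CONFIG_CRYPTO_HMAC=m
# CONFIG_CRYPTO_SHA1 is not set
EOF

check_kconfig CRYPTO /tmp/demo.config && echo "CRYPTO ok"
check_kconfig CRYPTO_SHA1 /tmp/demo.config || echo "CRYPTO_SHA1 missing"
```

LC_CAPA_CRYPTO turns a miss on any of the three options into a hard configure error, since the capability code cannot link without them.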
+
+m4_pattern_allow(AC_KERBEROS_V5)
#
+# LC_CONFIG_GSS
+#
+# Build gss and related tools for Lustre. Currently both the kernel and
+# user space parts depend on the Linux platform.
+#
+AC_DEFUN([LC_CONFIG_GSS],
+[AC_MSG_CHECKING([whether to enable gss/krb5 support])
+AC_ARG_ENABLE([gss],
+ AC_HELP_STRING([--enable-gss], [enable gss/krb5 support]),
+ [],[enable_gss='no'])
+AC_MSG_RESULT([$enable_gss])
+
+if test x$enable_gss == xyes; then
+ LB_LINUX_CONFIG_IM([SUNRPC],[],[
+ AC_MSG_ERROR([GSS requires that CONFIG_SUNRPC be enabled in your kernel.])
+ ])
+ LB_LINUX_CONFIG_IM([CRYPTO_DES],[],[
+ AC_MSG_WARN([DES support is recommended when using GSS.])
+ ])
+ LB_LINUX_CONFIG_IM([CRYPTO_MD5],[],[
+ AC_MSG_WARN([MD5 support is recommended when using GSS.])
+ ])
+ LB_LINUX_CONFIG_IM([CRYPTO_SHA256],[],[
+ AC_MSG_WARN([SHA256 support is recommended when using GSS.])
+ ])
+ LB_LINUX_CONFIG_IM([CRYPTO_SHA512],[],[
+ AC_MSG_WARN([SHA512 support is recommended when using GSS.])
+ ])
+ LB_LINUX_CONFIG_IM([CRYPTO_ARC4],[],[
+ AC_MSG_WARN([ARC4 support is recommended when using GSS.])
+ ])
+ #
+ # AES symbol is uncertain (optimized & depends on arch)
+ #
+
+ AC_CHECK_LIB(gssapi, gss_init_sec_context, [
+ GSSAPI_LIBS="$GSSAPI_LDFLAGS -lgssapi"
+ ], [
+ AC_MSG_ERROR([libgssapi not found; consider --disable-gss.])
+ ],
+ )
+
+ AC_SUBST(GSSAPI_LIBS)
+ AC_KERBEROS_V5
+fi
+])
+
# LC_FUNC_MS_FLOCK_LOCK
#
# SLES9 kernel has MS_FLOCK_LOCK sb flag
])
#
-# LC_FUNC_MS_FLOCK_LOCK
-#
-# SLES9 kernel has MS_FLOCK_LOCK sb flag
-#
-AC_DEFUN([LC_FUNC_MS_FLOCK_LOCK],
-[AC_MSG_CHECKING([if kernel has MS_FLOCK_LOCK sb flag])
-LB_LINUX_TRY_COMPILE([
- #include <linux/fs.h>
-],[
- int flags = MS_FLOCK_LOCK;
-],[
- AC_DEFINE(HAVE_MS_FLOCK_LOCK, 1,
- [kernel has MS_FLOCK_LOCK flag])
- AC_MSG_RESULT([yes])
-],[
- AC_MSG_RESULT([no])
-])
-])
-
-#
-# LC_FUNC_HAVE_CAN_SLEEP_ARG
-#
-# SLES9 kernel has third arg can_sleep
-# in fs/locks.c: flock_lock_file_wait()
-#
-AC_DEFUN([LC_FUNC_HAVE_CAN_SLEEP_ARG],
-[AC_MSG_CHECKING([if kernel has third arg can_sleep in fs/locks.c: flock_lock_file_wait()])
-LB_LINUX_TRY_COMPILE([
- #include <linux/fs.h>
-],[
- int cansleep;
- struct file *file;
- struct file_lock *file_lock;
- flock_lock_file_wait(file, file_lock, cansleep);
-],[
- AC_DEFINE(HAVE_CAN_SLEEP_ARG, 1,
- [kernel has third arg can_sleep in fs/locks.c: flock_lock_file_wait()])
- AC_MSG_RESULT([yes])
-],[
- AC_MSG_RESULT([no])
-])
-])
-
-#
# LC_TASK_PPTR
#
# task struct has p_pptr instead of parent
AC_MSG_RESULT(NO)
])
])
-
+
#
# LC_STATFS_DENTRY_PARAM
# starting from 2.6.18 linux kernel uses dentry instead of
#
# LC_VFS_KERN_MOUNT
-# starting from 2.6.18 kernel doesn't export do_kern_mount
+# starting from 2.6.18 the kernel doesn't export do_kern_mount
# and want to use vfs_kern_mount instead.
#
AC_DEFUN([LC_VFS_KERN_MOUNT],
])
# 2.6.19 API changes
-# inode doesn't have i_blksize field
+# the inode doesn't have an i_blksize field
AC_DEFUN([LC_INODE_BLKSIZE],
[AC_MSG_CHECKING([inode has i_blksize field])
LB_LINUX_TRY_COMPILE([
])
# LC_NR_PAGECACHE
-# 2.6.18 doesn't export nr_pagecahe
+# 2.6.18 doesn't export nr_pagecache
AC_DEFUN([LC_NR_PAGECACHE],
[AC_MSG_CHECKING([kernel export nr_pagecache])
LB_LINUX_TRY_COMPILE([
])
])
-# LC_WB_RANGE_START
-# 2.6.20 rename struct writeback fields
-AC_DEFUN([LC_WB_RANGE_START],
-[AC_MSG_CHECKING([kernel has range_start in struct writeback_control])
+# LC_CANCEL_DIRTY_PAGE
+# 2.6.20 introduces cancel_dirty_page instead of
+# clear_page_dirty.
+AC_DEFUN([LC_CANCEL_DIRTY_PAGE],
+[AC_MSG_CHECKING([kernel has cancel_dirty_page])
LB_LINUX_TRY_COMPILE([
- #include <linux/fs.h>
- #include <linux/sched.h>
- #include <linux/writeback.h>
+ #include <linux/page-flags.h>
+],[
+ cancel_dirty_page(NULL, 0);
+],[
+ AC_MSG_RESULT(yes)
+ AC_DEFINE(HAVE_CANCEL_DIRTY_PAGE, 1,
+ [kernel has cancel_dirty_page instead of clear_page_dirty])
+],[
+ AC_MSG_RESULT(NO)
+])
+])
+
+#
+# LC_PAGE_CONSTANT
+#
+# In order to support the raid5 zerocopy patch, we have to patch the kernel to
+# make it support constant pages, i.e. pages that won't be modified during IO.
+#
+AC_DEFUN([LC_PAGE_CONSTANT],
+[AC_MSG_CHECKING([if the kernel has PageConstant defined])
+LB_LINUX_TRY_COMPILE([
+ #include <linux/page-flags.h>
+],[
+ #ifndef PG_constant
+ #error "Have no raid5 zcopy patch"
+ #endif
+],[
+ AC_MSG_RESULT(yes)
+ AC_DEFINE(HAVE_PAGE_CONSTANT, 1, [kernel has PageConstant support])
],[
- struct writeback_control wb;
+ AC_MSG_RESULT(no)
+])
+])
- wb.range_start = 0;
+# The RHEL5 FS-cache patch renames the PG_checked flag
+# to PG_fs_misc
+AC_DEFUN([LC_PG_FS_MISC],
+[AC_MSG_CHECKING([kernel has PG_fs_misc])
+LB_LINUX_TRY_COMPILE([
+ #include <linux/page-flags.h>
+],[
+ #ifndef PG_fs_misc
+ #error PG_fs_misc not defined in kernel
+ #endif
],[
AC_MSG_RESULT(yes)
- AC_DEFINE(HAVE_WB_RANGE_START, 1,
- [writeback control has range_start field])
+ AC_DEFINE(HAVE_PG_FS_MISC, 1,
+ [kernel has PG_fs_misc])
],[
AC_MSG_RESULT(NO)
])
])
+AC_DEFUN([LC_EXPORT_TRUNCATE_COMPLETE],
+[LB_CHECK_SYMBOL_EXPORT([truncate_complete_page],
+[mm/truncate.c],[
+AC_DEFINE(HAVE_TRUNCATE_COMPLETE_PAGE, 1,
+ [kernel export truncate_complete_page])
+],[
+])
+])
+
+AC_DEFUN([LC_EXPORT_D_REHASH_COND],
+[LB_CHECK_SYMBOL_EXPORT([d_rehash_cond],
+[fs/dcache.c],[
+AC_DEFINE(HAVE_D_REHASH_COND, 1,
+ [d_rehash_cond is exported by the kernel])
+],[
+])
+])
+
+AC_DEFUN([LC_EXPORT___D_REHASH],
+[LB_CHECK_SYMBOL_EXPORT([__d_rehash],
+[fs/dcache.c],[
+AC_DEFINE(HAVE___D_REHASH, 1,
+ [__d_rehash is exported by the kernel])
+],[
+])
+])
+
+#
+# LC_VFS_INTENT_PATCHES
+#
+# check if the kernel has the VFS intent patches
+AC_DEFUN([LC_VFS_INTENT_PATCHES],
+[AC_MSG_CHECKING([if the kernel has the VFS intent patches])
+LB_LINUX_TRY_COMPILE([
+ #include <linux/fs.h>
+ #include <linux/namei.h>
+],[
+ struct nameidata nd;
+ struct lookup_intent *it;
+
+ it = &nd.intent;
+ intent_init(it, IT_OPEN);
+ it->d.lustre.it_disposition = 0;
+ it->d.lustre.it_data = NULL;
+],[
+ AC_MSG_RESULT([yes])
+ AC_DEFINE(HAVE_VFS_INTENT_PATCHES, 1, [VFS intent patches are applied])
+],[
+ AC_MSG_RESULT([no])
+])
+])
+
#
# LC_PROG_LINUX
#
AC_DEFUN([LC_PROG_LINUX],
[ LC_LUSTRE_VERSION_H
if test x$enable_server = xyes ; then
- LC_CONFIG_BACKINGFS
+ LC_CONFIG_BACKINGFS
fi
LC_CONFIG_PINGER
LC_CONFIG_LIBLUSTRE_RECOVERY
LC_CONFIG_HEALTH_CHECK_WRITE
LC_TASK_PPTR
+# RHEL4 patches
+LC_EXPORT_TRUNCATE_COMPLETE
+LC_EXPORT_D_REHASH_COND
+LC_EXPORT___D_REHASH
LC_STRUCT_KIOBUF
LC_FUNC_COND_RESCHED
LC_XATTR_ACL
LC_STRUCT_INTENT_FILE
LC_POSIX_ACL_XATTR_H
-LC_EXPORT___IGET
LC_FUNC_SET_FS_PWD
+LC_CAPA_CRYPTO
+LC_CONFIG_GSS
LC_FUNC_MS_FLOCK_LOCK
LC_FUNC_HAVE_CAN_SLEEP_ARG
LC_FUNC_F_OP_FLOCK
LC_QUOTA_READ
LC_COOKIE_FOLLOW_LINK
+LC_FUNC_RCU
+
+# does the kernel have VFS intent patches?
+LC_VFS_INTENT_PATCHES
# 2.6.15
LC_INODE_I_MUTEX
LC_VFS_KERN_MOUNT
LC_INVALIDATEPAGE_RETURN_INT
LC_UMOUNTBEGIN_HAS_VFSMOUNT
-LC_WB_RANGE_START
+
+#2.6.18 + RHEL5 (fc6)
+LC_PG_FS_MISC
# 2.6.19
LC_INODE_BLKSIZE
LC_VFS_READDIR_U64_INO
LC_GENERIC_FILE_READ
LC_GENERIC_FILE_WRITE
+
+# 2.6.20
+LC_CANCEL_DIRTY_PAGE
+
+# raid5-zerocopy patch
+LC_PAGE_CONSTANT
])
#
# whether to enable quota support
#
AC_DEFUN([LC_CONFIG_QUOTA],
-[AC_MSG_CHECKING([whether to enable quota support])
+[AC_MSG_CHECKING([whether to enable quota support])
AC_ARG_ENABLE([quota],
- AC_HELP_STRING([--enable-quota],
- [enable quota support]),
+ AC_HELP_STRING([--disable-quota],
+ [disable quota support]),
[],[enable_quota='yes'])
AC_MSG_RESULT([$enable_quota])
if test x$linux25 != xyes; then
AC_DEFINE(HAVE_QUOTA_SUPPORT, 1, [Enable quota support])
fi
])
-
+
+#
+# LC_CONFIG_SPLIT
+#
+# whether to enable split support
+#
+AC_DEFUN([LC_CONFIG_SPLIT],
+[AC_MSG_CHECKING([whether to enable split support])
+AC_ARG_ENABLE([split],
+ AC_HELP_STRING([--enable-split],
+ [enable split support]),
+ [],[enable_split='no'])
+AC_MSG_RESULT([$enable_split])
+if test x$enable_split != xno; then
+ AC_DEFINE(HAVE_SPLIT_SUPPORT, 1, [enable split support])
+fi
+])
+
AC_DEFUN([LC_QUOTA_READ],
[AC_MSG_CHECKING([if kernel supports quota_read])
LB_LINUX_TRY_COMPILE([
])
#
+# LC_FUNC_RCU
+#
+# kernels prior to 2.6.0(?) have no RCU support; in the 2.6.5 (SUSE) kernel,
+# call_rcu takes three parameters.
+#
+AC_DEFUN([LC_FUNC_RCU],
+[AC_MSG_CHECKING([if the kernel has RCU support])
+LB_LINUX_TRY_COMPILE([
+ #include <linux/rcupdate.h>
+],[],[
+ AC_DEFINE(HAVE_RCU, 1, [have RCU defined])
+ AC_MSG_RESULT([yes])
+
+ AC_MSG_CHECKING([if call_rcu takes three parameters])
+ LB_LINUX_TRY_COMPILE([
+ #include <linux/rcupdate.h>
+ ],[
+ struct rcu_head rh;
+ call_rcu(&rh, (void (*)(struct rcu_head *))1, NULL);
+ ],[
+ AC_DEFINE(HAVE_CALL_RCU_PARAM, 1, [call_rcu takes three parameters])
+ AC_MSG_RESULT([yes])
+ ],[
+ AC_MSG_RESULT([no])
+ ])
+],[
+ AC_MSG_RESULT([no])
+])
+])
+
+#
# LC_CONFIGURE
#
# other configure checks
AC_DEFINE([MIN_DF], 1, [Report minimum OST free space])
fi
+AC_ARG_ENABLE([fail_alloc],
+ AC_HELP_STRING([--disable-fail-alloc],
+ [disable random allocation failures]),
+ [],[enable_fail_alloc=yes])
+AC_MSG_CHECKING([whether to randomly fail memory allocations])
+AC_MSG_RESULT([$enable_fail_alloc])
+if test x$enable_fail_alloc != xno ; then
+ AC_DEFINE([RANDOM_FAIL_ALLOC], 1, [enable random allocation failures])
+fi
+
])
#
#
AC_DEFUN([LC_CONDITIONALS],
[AM_CONDITIONAL(LIBLUSTRE, test x$enable_liblustre = xyes)
-AM_CONDITIONAL(LDISKFS, test x$enable_ldiskfs = xyes)
AM_CONDITIONAL(USE_QUILT, test x$QUILT != xno)
AM_CONDITIONAL(LIBLUSTRE_TESTS, test x$enable_liblustre_tests = xyes)
AM_CONDITIONAL(MPITESTS, test x$enable_mpitests = xyes, Build MPI Tests)
AM_CONDITIONAL(CLIENT, test x$enable_client = xyes)
AM_CONDITIONAL(SERVER, test x$enable_server = xyes)
AM_CONDITIONAL(QUOTA, test x$enable_quota = xyes)
+AM_CONDITIONAL(SPLIT, test x$enable_split = xyes)
AM_CONDITIONAL(BLKID, test x$ac_cv_header_blkid_blkid_h = xyes)
AM_CONDITIONAL(EXT2FS_DEVEL, test x$ac_cv_header_ext2fs_ext2fs_h = xyes)
+AM_CONDITIONAL(GSS, test x$enable_gss = xyes)
AM_CONDITIONAL(LIBPTHREAD, test x$enable_libpthread = xyes)
])
lustre/kernel_patches/targets/2.6-fc5.target
lustre/kernel_patches/targets/2.6-patchless.target
lustre/kernel_patches/targets/2.6-sles10.target
-lustre/kernel_patches/targets/hp_pnnl-2.4.target
-lustre/kernel_patches/targets/rh-2.4.target
-lustre/kernel_patches/targets/rhel-2.4.target
-lustre/kernel_patches/targets/suse-2.4.21-2.target
-lustre/kernel_patches/targets/sles-2.4.target
-lustre/ldiskfs/Makefile
-lustre/ldiskfs/autoMakefile
-lustre/ldiskfs2/Makefile
-lustre/ldiskfs2/autoMakefile
lustre/ldlm/Makefile
+lustre/fid/Makefile
+lustre/fid/autoMakefile
lustre/liblustre/Makefile
lustre/liblustre/tests/Makefile
lustre/llite/Makefile
lustre/lvfs/autoMakefile
lustre/mdc/Makefile
lustre/mdc/autoMakefile
+lustre/lmv/Makefile
+lustre/lmv/autoMakefile
lustre/mds/Makefile
lustre/mds/autoMakefile
+lustre/mdt/Makefile
+lustre/mdt/autoMakefile
+lustre/cmm/Makefile
+lustre/cmm/autoMakefile
+lustre/mdd/Makefile
+lustre/mdd/autoMakefile
+lustre/fld/Makefile
+lustre/fld/autoMakefile
lustre/obdclass/Makefile
lustre/obdclass/autoMakefile
lustre/obdclass/linux/Makefile
lustre/osc/autoMakefile
lustre/ost/Makefile
lustre/ost/autoMakefile
+lustre/osd/Makefile
+lustre/osd/autoMakefile
lustre/mgc/Makefile
lustre/mgc/autoMakefile
lustre/mgs/Makefile
lustre/mgs/autoMakefile
lustre/ptlrpc/Makefile
lustre/ptlrpc/autoMakefile
+lustre/ptlrpc/gss/Makefile
+lustre/ptlrpc/gss/autoMakefile
lustre/quota/Makefile
lustre/quota/autoMakefile
lustre/scripts/Makefile
lustre/scripts/version_tag.pl
lustre/tests/Makefile
lustre/utils/Makefile
+lustre/utils/gss/Makefile
])
case $lb_target_os in
darwin)
m4_define([LUSTRE_MAJOR],[1])
-m4_define([LUSTRE_MINOR],[6])
+m4_define([LUSTRE_MINOR],[8])
m4_define([LUSTRE_PATCH],[0])
-m4_define([LUSTRE_FIX],[90])
+m4_define([LUSTRE_FIX],[0])
dnl # liblustre delta is 0.0.1.32; the next version with fixes is OK, but
dnl # a subsequent release candidate/beta would already trigger this warning.
Makefile
.deps
TAGS
+.*.cmd
autoMakefile.in
autoMakefile
*.ko
*.mod.c
-.*.cmd
.*.flags
.tmp_versions
-linux-stage
-linux
-*.c
-*.h
-sources
+.depend
--- /dev/null
+MODULES := cmm
+cmm-objs := cmm_device.o cmm_object.o cmm_lproc.o mdc_device.o mdc_object.o
+
+@SPLIT_TRUE@cmm-objs += cmm_split.o
+
+@INCLUDE_RULES@
--- /dev/null
+# Copyright (C) 2001 Cluster File Systems, Inc.
+#
+# This code is issued under the GNU General Public License.
+# See the file COPYING in this distribution
+
+if MODULES
+modulefs_DATA = cmm$(KMODEXT)
+endif
+
+MOSTLYCLEANFILES := @MOSTLYCLEANFILES@
+DIST_SOURCES = $(cmm-objs:%.o=%.c) cmm_internal.h mdc_internal.h
--- /dev/null
+/* -*- mode: c; c-basic-offset: 8; indent-tabs-mode: nil; -*-
+ * vim:expandtab:shiftwidth=8:tabstop=8:
+ *
+ * lustre/cmm/cmm_device.c
+ * Lustre Cluster Metadata Manager (cmm)
+ *
+ * Copyright (c) 2006 Cluster File Systems, Inc.
+ * Author: Mike Pershin <tappro@clusterfs.com>
+ *
+ * This file is part of the Lustre file system, http://www.lustre.org
+ * Lustre is a trademark of Cluster File Systems, Inc.
+ *
+ * You may have signed or agreed to another license before downloading
+ * this software. If so, you are bound by the terms and conditions
+ * of that agreement, and the following does not apply to you. See the
+ * LICENSE file included with this distribution for more information.
+ *
+ * If you did not agree to a different license, then this copy of Lustre
+ * is open source software; you can redistribute it and/or modify it
+ * under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * In either case, Lustre is distributed in the hope that it will be
+ * useful, but WITHOUT ANY WARRANTY; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * license text for more details.
+ */
+
+#ifndef EXPORT_SYMTAB
+# define EXPORT_SYMTAB
+#endif
+#define DEBUG_SUBSYSTEM S_MDS
+
+#include <linux/module.h>
+
+#include <obd.h>
+#include <obd_class.h>
+#include <lprocfs_status.h>
+#include <lustre_ver.h>
+#include "cmm_internal.h"
+#include "mdc_internal.h"
+
+static struct obd_ops cmm_obd_device_ops = {
+ .o_owner = THIS_MODULE
+};
+
+static struct lu_device_operations cmm_lu_ops;
+
+static inline int lu_device_is_cmm(struct lu_device *d)
+{
+ return ergo(d != NULL && d->ld_ops != NULL, d->ld_ops == &cmm_lu_ops);
+}
+
+int cmm_root_get(const struct lu_env *env, struct md_device *md,
+ struct lu_fid *fid)
+{
+ struct cmm_device *cmm_dev = md2cmm_dev(md);
+ /* valid only on master MDS */
+ if (cmm_dev->cmm_local_num == 0)
+ return cmm_child_ops(cmm_dev)->mdo_root_get(env,
+ cmm_dev->cmm_child, fid);
+ else
+ return -EINVAL;
+}
+
+static int cmm_statfs(const struct lu_env *env, struct md_device *md,
+ struct kstatfs *sfs)
+{
+ struct cmm_device *cmm_dev = md2cmm_dev(md);
+ int rc;
+
+ ENTRY;
+ rc = cmm_child_ops(cmm_dev)->mdo_statfs(env,
+ cmm_dev->cmm_child, sfs);
+ RETURN (rc);
+}
+
+static int cmm_maxsize_get(const struct lu_env *env, struct md_device *md,
+ int *md_size, int *cookie_size)
+{
+ struct cmm_device *cmm_dev = md2cmm_dev(md);
+ int rc;
+ ENTRY;
+ rc = cmm_child_ops(cmm_dev)->mdo_maxsize_get(env, cmm_dev->cmm_child,
+ md_size, cookie_size);
+ RETURN(rc);
+}
+
+static int cmm_init_capa_ctxt(const struct lu_env *env, struct md_device *md,
+ int mode , unsigned long timeout, __u32 alg,
+ struct lustre_capa_key *keys)
+{
+ struct cmm_device *cmm_dev = md2cmm_dev(md);
+ int rc;
+ ENTRY;
+ LASSERT(cmm_child_ops(cmm_dev)->mdo_init_capa_ctxt);
+ rc = cmm_child_ops(cmm_dev)->mdo_init_capa_ctxt(env, cmm_dev->cmm_child,
+ mode, timeout, alg,
+ keys);
+ RETURN(rc);
+}
+
+static int cmm_update_capa_key(const struct lu_env *env,
+ struct md_device *md,
+ struct lustre_capa_key *key)
+{
+ struct cmm_device *cmm_dev = md2cmm_dev(md);
+ int rc;
+ ENTRY;
+ rc = cmm_child_ops(cmm_dev)->mdo_update_capa_key(env,
+ cmm_dev->cmm_child,
+ key);
+ RETURN(rc);
+}
+
+static struct md_device_operations cmm_md_ops = {
+ .mdo_statfs = cmm_statfs,
+ .mdo_root_get = cmm_root_get,
+ .mdo_maxsize_get = cmm_maxsize_get,
+ .mdo_init_capa_ctxt = cmm_init_capa_ctxt,
+ .mdo_update_capa_key = cmm_update_capa_key,
+};
+
+extern struct lu_device_type mdc_device_type;
+
+static int cmm_post_init_mdc(const struct lu_env *env,
+ struct cmm_device *cmm)
+{
+ int max_mdsize, max_cookiesize, rc;
+ struct mdc_device *mc, *tmp;
+
+ /* get the max mdsize and cookiesize from lower layer */
+ rc = cmm_maxsize_get(env, &cmm->cmm_md_dev, &max_mdsize,
+ &max_cookiesize);
+ if (rc)
+ RETURN(rc);
+
+ spin_lock(&cmm->cmm_tgt_guard);
+ list_for_each_entry_safe(mc, tmp, &cmm->cmm_targets,
+ mc_linkage) {
+ mdc_init_ea_size(env, mc, max_mdsize, max_cookiesize);
+ }
+ spin_unlock(&cmm->cmm_tgt_guard);
+ RETURN(rc);
+}
+
+/* --- cmm_lu_operations --- */
+/* add new MDC to the CMM, create MDC lu_device and connect it to mdc_obd */
+static int cmm_add_mdc(const struct lu_env *env,
+ struct cmm_device *cm, struct lustre_cfg *cfg)
+{
+ struct lu_device_type *ldt = &mdc_device_type;
+ char *p, *num = lustre_cfg_string(cfg, 2);
+ struct mdc_device *mc, *tmp;
+ struct lu_fld_target target;
+ struct lu_device *ld;
+ mdsno_t mdc_num;
+ int rc;
+ ENTRY;
+
+ /* make sure no mdc with this index exists yet */
+ LASSERT(num);
+ mdc_num = simple_strtol(num, &p, 10);
+ if (*p) {
+ CERROR("Invalid index in lustre_cfg, offset 2\n");
+ RETURN(-EINVAL);
+ }
+
+ spin_lock(&cm->cmm_tgt_guard);
+ list_for_each_entry_safe(mc, tmp, &cm->cmm_targets,
+ mc_linkage) {
+ if (mc->mc_num == mdc_num) {
+ spin_unlock(&cm->cmm_tgt_guard);
+ RETURN(-EEXIST);
+ }
+ }
+ spin_unlock(&cm->cmm_tgt_guard);
+ ld = ldt->ldt_ops->ldto_device_alloc(env, ldt, cfg);
+ ld->ld_site = cmm2lu_dev(cm)->ld_site;
+
+ rc = ldt->ldt_ops->ldto_device_init(env, ld, NULL, NULL);
+ if (rc) {
+ ldt->ldt_ops->ldto_device_free(env, ld);
+ RETURN (rc);
+ }
+ /* pass config to the just created MDC */
+ rc = ld->ld_ops->ldo_process_config(env, ld, cfg);
+ if (rc)
+ RETURN(rc);
+
+ spin_lock(&cm->cmm_tgt_guard);
+ list_for_each_entry_safe(mc, tmp, &cm->cmm_targets,
+ mc_linkage) {
+ if (mc->mc_num == mdc_num) {
+ spin_unlock(&cm->cmm_tgt_guard);
+ ldt->ldt_ops->ldto_device_fini(env, ld);
+ ldt->ldt_ops->ldto_device_free(env, ld);
+ RETURN(-EEXIST);
+ }
+ }
+ mc = lu2mdc_dev(ld);
+ list_add_tail(&mc->mc_linkage, &cm->cmm_targets);
+ cm->cmm_tgt_count++;
+ spin_unlock(&cm->cmm_tgt_guard);
+
+ lu_device_get(cmm2lu_dev(cm));
+
+ target.ft_srv = NULL;
+ target.ft_idx = mc->mc_num;
+ target.ft_exp = mc->mc_desc.cl_exp;
+ fld_client_add_target(cm->cmm_fld, &target);
+
+ /* Set max md size for the mdc. */
+ rc = cmm_post_init_mdc(env, cm);
+ RETURN(rc);
+}
+
+static void cmm_device_shutdown(const struct lu_env *env,
+ struct cmm_device *cm,
+ struct lustre_cfg *cfg)
+{
+ struct mdc_device *mc, *tmp;
+ ENTRY;
+
+ /* Remove local target from FLD. */
+ fld_client_del_target(cm->cmm_fld, cm->cmm_local_num);
+
+ /* Finish all mdc devices. */
+ spin_lock(&cm->cmm_tgt_guard);
+ list_for_each_entry_safe(mc, tmp, &cm->cmm_targets, mc_linkage) {
+ struct lu_device *ld_m = mdc2lu_dev(mc);
+ fld_client_del_target(cm->cmm_fld, mc->mc_num);
+ ld_m->ld_ops->ldo_process_config(env, ld_m, cfg);
+ }
+ spin_unlock(&cm->cmm_tgt_guard);
+
+ /* remove upcall device*/
+ md_upcall_fini(&cm->cmm_md_dev);
+
+ EXIT;
+}
+
+static int cmm_device_mount(const struct lu_env *env,
+ struct cmm_device *m, struct lustre_cfg *cfg)
+{
+ const char *index = lustre_cfg_string(cfg, 2);
+ char *p;
+
+ LASSERT(index != NULL);
+
+ m->cmm_local_num = simple_strtol(index, &p, 10);
+ if (*p) {
+ CERROR("Invalid index in lustre_cfg\n");
+ RETURN(-EINVAL);
+ }
+
+ RETURN(0);
+}
+
+static int cmm_process_config(const struct lu_env *env,
+ struct lu_device *d, struct lustre_cfg *cfg)
+{
+ struct cmm_device *m = lu2cmm_dev(d);
+ struct lu_device *next = md2lu_dev(m->cmm_child);
+ int err;
+ ENTRY;
+
+ switch(cfg->lcfg_command) {
+ case LCFG_ADD_MDC:
+ /* On first ADD_MDC add also local target. */
+ if (!(m->cmm_flags & CMM_INITIALIZED)) {
+ struct lu_site *ls = cmm2lu_dev(m)->ld_site;
+ struct lu_fld_target target;
+
+ target.ft_srv = ls->ls_server_fld;
+ target.ft_idx = m->cmm_local_num;
+ target.ft_exp = NULL;
+
+ fld_client_add_target(m->cmm_fld, &target);
+ }
+ err = cmm_add_mdc(env, m, cfg);
+
+ /* The first ADD_MDC can be counted as setup is finished. */
+ if (!(m->cmm_flags & CMM_INITIALIZED))
+ m->cmm_flags |= CMM_INITIALIZED;
+
+ break;
+ case LCFG_SETUP:
+ {
+ /* lower layers should be set up at first */
+ err = next->ld_ops->ldo_process_config(env, next, cfg);
+ if (err == 0)
+ err = cmm_device_mount(env, m, cfg);
+ break;
+ }
+ case LCFG_CLEANUP:
+ {
+ cmm_device_shutdown(env, m, cfg);
+ }
+ default:
+ err = next->ld_ops->ldo_process_config(env, next, cfg);
+ }
+ RETURN(err);
+}
+
+static int cmm_recovery_complete(const struct lu_env *env,
+ struct lu_device *d)
+{
+ struct cmm_device *m = lu2cmm_dev(d);
+ struct lu_device *next = md2lu_dev(m->cmm_child);
+ int rc;
+ ENTRY;
+ rc = next->ld_ops->ldo_recovery_complete(env, next);
+ RETURN(rc);
+}
+
+static struct lu_device_operations cmm_lu_ops = {
+ .ldo_object_alloc = cmm_object_alloc,
+ .ldo_process_config = cmm_process_config,
+ .ldo_recovery_complete = cmm_recovery_complete
+};
+
+/* --- lu_device_type operations --- */
+int cmm_upcall(const struct lu_env *env, struct md_device *md,
+ enum md_upcall_event ev)
+{
+ int rc;
+ ENTRY;
+
+ switch (ev) {
+ case MD_LOV_SYNC:
+ rc = cmm_post_init_mdc(env, md2cmm_dev(md));
+ if (rc)
+ CERROR("cannot init md size: %d\n", rc);
+ /* fall through */
+ default:
+ rc = md_do_upcall(env, md, ev);
+ }
+ RETURN(rc);
+}
+
+static struct lu_device *cmm_device_alloc(const struct lu_env *env,
+ struct lu_device_type *t,
+ struct lustre_cfg *cfg)
+{
+ struct lu_device *l;
+ struct cmm_device *m;
+ ENTRY;
+
+ OBD_ALLOC_PTR(m);
+ if (m == NULL) {
+ l = ERR_PTR(-ENOMEM);
+ } else {
+ md_device_init(&m->cmm_md_dev, t);
+ m->cmm_md_dev.md_ops = &cmm_md_ops;
+ md_upcall_init(&m->cmm_md_dev, cmm_upcall);
+ l = cmm2lu_dev(m);
+ l->ld_ops = &cmm_lu_ops;
+
+ OBD_ALLOC_PTR(m->cmm_fld);
+ if (!m->cmm_fld)
+ GOTO(out_free_cmm, l = ERR_PTR(-ENOMEM));
+ }
+
+ RETURN(l);
+out_free_cmm:
+ OBD_FREE_PTR(m);
+ return l;
+}
+
+static void cmm_device_free(const struct lu_env *env, struct lu_device *d)
+{
+ struct cmm_device *m = lu2cmm_dev(d);
+
+ LASSERT(m->cmm_tgt_count == 0);
+ LASSERT(list_empty(&m->cmm_targets));
+ if (m->cmm_fld != NULL) {
+ OBD_FREE_PTR(m->cmm_fld);
+ m->cmm_fld = NULL;
+ }
+ md_device_fini(&m->cmm_md_dev);
+ OBD_FREE_PTR(m);
+}
+
+/* context key constructor/destructor */
+static void *cmm_key_init(const struct lu_context *ctx,
+ struct lu_context_key *key)
+{
+ struct cmm_thread_info *info;
+
+ CLASSERT(CFS_PAGE_SIZE >= sizeof *info);
+ OBD_ALLOC_PTR(info);
+ if (info == NULL)
+ info = ERR_PTR(-ENOMEM);
+ return info;
+}
+
+static void cmm_key_fini(const struct lu_context *ctx,
+ struct lu_context_key *key, void *data)
+{
+ struct cmm_thread_info *info = data;
+ OBD_FREE_PTR(info);
+}
+
+static struct lu_context_key cmm_thread_key = {
+ .lct_tags = LCT_MD_THREAD,
+ .lct_init = cmm_key_init,
+ .lct_fini = cmm_key_fini
+};
+
+struct cmm_thread_info *cmm_env_info(const struct lu_env *env)
+{
+ struct cmm_thread_info *info;
+
+ info = lu_context_key_get(&env->le_ctx, &cmm_thread_key);
+ LASSERT(info != NULL);
+ return info;
+}
+
+static int cmm_type_init(struct lu_device_type *t)
+{
+ LU_CONTEXT_KEY_INIT(&cmm_thread_key);
+ return lu_context_key_register(&cmm_thread_key);
+}
+
+static void cmm_type_fini(struct lu_device_type *t)
+{
+ lu_context_key_degister(&cmm_thread_key);
+}
+
+static int cmm_device_init(const struct lu_env *env, struct lu_device *d,
+ const char *name, struct lu_device *next)
+{
+ struct cmm_device *m = lu2cmm_dev(d);
+ struct lu_site *ls;
+ int err = 0;
+ ENTRY;
+
+ spin_lock_init(&m->cmm_tgt_guard);
+ INIT_LIST_HEAD(&m->cmm_targets);
+ m->cmm_tgt_count = 0;
+ m->cmm_child = lu2md_dev(next);
+
+ err = fld_client_init(m->cmm_fld, name,
+ LUSTRE_CLI_FLD_HASH_DHT);
+ if (err) {
+ CERROR("Can't init FLD, err %d\n", err);
+ RETURN(err);
+ }
+
+ /* Assign site's fld client ref, needed for asserts in osd. */
+ ls = cmm2lu_dev(m)->ld_site;
+ ls->ls_client_fld = m->cmm_fld;
+ err = cmm_procfs_init(m, name);
+
+ RETURN(err);
+}
+
+static struct lu_device *cmm_device_fini(const struct lu_env *env,
+ struct lu_device *ld)
+{
+ struct cmm_device *cm = lu2cmm_dev(ld);
+ struct mdc_device *mc, *tmp;
+ struct lu_site *ls;
+ ENTRY;
+
+ /* Finish all mdc devices */
+ spin_lock(&cm->cmm_tgt_guard);
+ list_for_each_entry_safe(mc, tmp, &cm->cmm_targets, mc_linkage) {
+ struct lu_device *ld_m = mdc2lu_dev(mc);
+
+ list_del_init(&mc->mc_linkage);
+ lu_device_put(cmm2lu_dev(cm));
+ ld_m->ld_type->ldt_ops->ldto_device_fini(env, ld_m);
+ ld_m->ld_type->ldt_ops->ldto_device_free(env, ld_m);
+ cm->cmm_tgt_count--;
+ }
+ spin_unlock(&cm->cmm_tgt_guard);
+
+ fld_client_fini(cm->cmm_fld);
+ ls = cmm2lu_dev(cm)->ld_site;
+ ls->ls_client_fld = NULL;
+ cmm_procfs_fini(cm);
+
+ RETURN (md2lu_dev(cm->cmm_child));
+}
+
+static struct lu_device_type_operations cmm_device_type_ops = {
+ .ldto_init = cmm_type_init,
+ .ldto_fini = cmm_type_fini,
+
+ .ldto_device_alloc = cmm_device_alloc,
+ .ldto_device_free = cmm_device_free,
+
+ .ldto_device_init = cmm_device_init,
+ .ldto_device_fini = cmm_device_fini
+};
+
+static struct lu_device_type cmm_device_type = {
+ .ldt_tags = LU_DEVICE_MD,
+ .ldt_name = LUSTRE_CMM_NAME,
+ .ldt_ops = &cmm_device_type_ops,
+ .ldt_ctx_tags = LCT_MD_THREAD | LCT_DT_THREAD
+};
+
+struct lprocfs_vars lprocfs_cmm_obd_vars[] = {
+ { 0 }
+};
+
+struct lprocfs_vars lprocfs_cmm_module_vars[] = {
+ { 0 }
+};
+
+LPROCFS_INIT_VARS(cmm, lprocfs_cmm_module_vars, lprocfs_cmm_obd_vars);
+
+static int __init cmm_mod_init(void)
+{
+ struct lprocfs_static_vars lvars;
+
+ lprocfs_init_vars(cmm, &lvars);
+ return class_register_type(&cmm_obd_device_ops, NULL, lvars.module_vars,
+ LUSTRE_CMM_NAME, &cmm_device_type);
+}
+
+static void __exit cmm_mod_exit(void)
+{
+ class_unregister_type(LUSTRE_CMM_NAME);
+}
+
+MODULE_AUTHOR("Cluster File Systems, Inc. <info@clusterfs.com>");
+MODULE_DESCRIPTION("Lustre Clustered Metadata Manager ("LUSTRE_CMM_NAME")");
+MODULE_LICENSE("GPL");
+
+cfs_module(cmm, "0.1.0", cmm_mod_init, cmm_mod_exit);
--- /dev/null
+/* -*- mode: c; c-basic-offset: 8; indent-tabs-mode: nil; -*-
+ * vim:expandtab:shiftwidth=8:tabstop=8:
+ *
+ * lustre/cmm/cmm_internal.h
+ * Lustre Cluster Metadata Manager (cmm)
+ *
+ * Copyright (C) 2006 Cluster File Systems, Inc.
+ * Author: Mike Pershin <tappro@clusterfs.com>
+ *
+ * This file is part of the Lustre file system, http://www.lustre.org
+ * Lustre is a trademark of Cluster File Systems, Inc.
+ *
+ * You may have signed or agreed to another license before downloading
+ * this software. If so, you are bound by the terms and conditions
+ * of that agreement, and the following does not apply to you. See the
+ * LICENSE file included with this distribution for more information.
+ *
+ * If you did not agree to a different license, then this copy of Lustre
+ * is open source software; you can redistribute it and/or modify it
+ * under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * In either case, Lustre is distributed in the hope that it will be
+ * useful, but WITHOUT ANY WARRANTY; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * license text for more details.
+ */
+
+#ifndef _CMM_INTERNAL_H
+#define _CMM_INTERNAL_H
+
+#if defined(__KERNEL__)
+
+#include <obd.h>
+#include <lustre_fld.h>
+#include <md_object.h>
+#include <linux/lustre_acl.h>
+
+
+struct cmm_device {
+ struct md_device cmm_md_dev;
+ /* device flags, taken from enum cmm_flags */
+ __u32 cmm_flags;
+ /* underlying device in MDS stack, usually MDD */
+ struct md_device *cmm_child;
+ /* FLD client to talk to FLD */
+ struct lu_client_fld *cmm_fld;
+ /* other MD servers in cluster */
+ mdsno_t cmm_local_num;
+ __u32 cmm_tgt_count;
+ struct list_head cmm_targets;
+ spinlock_t cmm_tgt_guard;
+ cfs_proc_dir_entry_t *cmm_proc_entry;
+ struct lprocfs_stats *cmm_stats;
+};
+
+enum cmm_flags {
+ /*
+ * Device initialization complete.
+ */
+ CMM_INITIALIZED = 1 << 0
+};
+
+static inline struct md_device_operations *cmm_child_ops(struct cmm_device *d)
+{
+ return (d->cmm_child->md_ops);
+}
+
+static inline struct cmm_device *md2cmm_dev(struct md_device *m)
+{
+ return container_of0(m, struct cmm_device, cmm_md_dev);
+}
+
+static inline struct cmm_device *lu2cmm_dev(struct lu_device *d)
+{
+ return container_of0(d, struct cmm_device, cmm_md_dev.md_lu_dev);
+}
+
+static inline struct lu_device *cmm2lu_dev(struct cmm_device *d)
+{
+ return (&d->cmm_md_dev.md_lu_dev);
+}
+
+#ifdef HAVE_SPLIT_SUPPORT
+enum cmm_split_state {
+ CMM_SPLIT_UNKNOWN,
+ CMM_SPLIT_NONE,
+ CMM_SPLIT_NEEDED,
+ CMM_SPLIT_DONE,
+ CMM_SPLIT_DENIED
+};
+#endif
+
+struct cmm_object {
+ struct md_object cmo_obj;
+};
+
+/* local CMM object */
+struct cml_object {
+ struct cmm_object cmm_obj;
+#ifdef HAVE_SPLIT_SUPPORT
+ /* split state of object (for dirs only)*/
+ enum cmm_split_state clo_split;
+#endif
+};
+
+/* remote CMM object */
+struct cmr_object {
+ struct cmm_object cmm_obj;
+ /* mds number where object is placed */
+ mdsno_t cmo_num;
+};
+
+enum {
+ CMM_SPLIT_PAGE_COUNT = 1
+};
+
+struct cmm_thread_info {
+ struct md_attr cmi_ma;
+ struct lu_buf cmi_buf;
+ struct lu_fid cmi_fid; /* used for le/cpu conversions */
+ struct lu_rdpg cmi_rdpg;
+ /* pointers to pages for readpage. */
+ struct page *cmi_pages[CMM_SPLIT_PAGE_COUNT];
+ struct md_op_spec cmi_spec;
+ struct lmv_stripe_md cmi_lmv;
+ char cmi_xattr_buf[LUSTRE_POSIX_ACL_MAX_SIZE];
+
+ /* Ops object filename */
+ struct lu_name cti_name;
+};
+
+static inline struct cmm_device *cmm_obj2dev(struct cmm_object *c)
+{
+ return (md2cmm_dev(md_obj2dev(&c->cmo_obj)));
+}
+
+static inline struct cmm_object *lu2cmm_obj(struct lu_object *o)
+{
+ //LASSERT(lu_device_is_cmm(o->lo_dev));
+ return container_of0(o, struct cmm_object, cmo_obj.mo_lu);
+}
+
+/* get cmm object from md_object */
+static inline struct cmm_object *md2cmm_obj(struct md_object *o)
+{
+ return container_of0(o, struct cmm_object, cmo_obj);
+}
+/* get lower-layer object */
+static inline struct md_object *cmm2child_obj(struct cmm_object *o)
+{
+ return (o ? lu2md(lu_object_next(&o->cmo_obj.mo_lu)) : NULL);
+}
+
+static inline struct lu_fid* cmm2fid(struct cmm_object *obj)
+{
+ return &(obj->cmo_obj.mo_lu.lo_header->loh_fid);
+}
+
+struct cmm_thread_info *cmm_env_info(const struct lu_env *env);
+
+/* cmm_object.c */
+struct lu_object *cmm_object_alloc(const struct lu_env *env,
+ const struct lu_object_header *hdr,
+ struct lu_device *);
+
+/*
+ * local CMM object operations. cml_...
+ */
+static inline struct cml_object *lu2cml_obj(struct lu_object *o)
+{
+ return container_of0(o, struct cml_object, cmm_obj.cmo_obj.mo_lu);
+}
+static inline struct cml_object *md2cml_obj(struct md_object *mo)
+{
+ return container_of0(mo, struct cml_object, cmm_obj.cmo_obj);
+}
+static inline struct cml_object *cmm2cml_obj(struct cmm_object *co)
+{
+ return container_of0(co, struct cml_object, cmm_obj);
+}
+
+int cmm_upcall(const struct lu_env *env, struct md_device *md,
+ enum md_upcall_event ev);
+
+#ifdef HAVE_SPLIT_SUPPORT
+
+#define CMM_MD_SIZE(stripes) (sizeof(struct lmv_stripe_md) + \
+ (stripes) * sizeof(struct lu_fid))
+
+/* cmm_split.c */
+static inline struct lu_buf *cmm_buf_get(const struct lu_env *env,
+ void *area, ssize_t len)
+{
+ struct lu_buf *buf;
+
+ buf = &cmm_env_info(env)->cmi_buf;
+ buf->lb_buf = area;
+ buf->lb_len = len;
+ return buf;
+}
+
+int cmm_split_check(const struct lu_env *env, struct md_object *mp,
+ const char *name);
+
+int cmm_split_expect(const struct lu_env *env, struct md_object *mo,
+ struct md_attr *ma, int *split);
+
+int cmm_split_dir(const struct lu_env *env, struct md_object *mo);
+
+int cmm_split_access(const struct lu_env *env, struct md_object *mo,
+ mdl_mode_t lm);
+#endif
+
+int cmm_fld_lookup(struct cmm_device *cm, const struct lu_fid *fid,
+ mdsno_t *mds, const struct lu_env *env);
+
+int cmm_procfs_init(struct cmm_device *cmm, const char *name);
+int cmm_procfs_fini(struct cmm_device *cmm);
+
+void cmm_lprocfs_time_start(const struct lu_env *env);
+void cmm_lprocfs_time_end(const struct lu_env *env, struct cmm_device *cmm,
+ int idx);
+
+enum {
+ LPROC_CMM_SPLIT_CHECK,
+ LPROC_CMM_SPLIT,
+ LPROC_CMM_LOOKUP,
+ LPROC_CMM_CREATE,
+ LPROC_CMM_NR
+};
+
+#endif /* __KERNEL__ */
+#endif /* _CMM_INTERNAL_H */
+
--- /dev/null
+/* -*- MODE: c; c-basic-offset: 8; indent-tabs-mode: nil; -*-
+ * vim:expandtab:shiftwidth=8:tabstop=8:
+ *
+ * cmm/cmm_lproc.c
+ * CMM lprocfs stuff
+ *
+ * Copyright (C) 2006 Cluster File Systems, Inc.
+ * Author: Wang Di <wangdi@clusterfs.com>
+ * Author: Yury Umanets <umka@clusterfs.com>
+ *
+ * This file is part of the Lustre file system, http://www.lustre.org
+ * Lustre is a trademark of Cluster File Systems, Inc.
+ *
+ * You may have signed or agreed to another license before downloading
+ * this software. If so, you are bound by the terms and conditions
+ * of that agreement, and the following does not apply to you. See the
+ * LICENSE file included with this distribution for more information.
+ *
+ * If you did not agree to a different license, then this copy of Lustre
+ * is open source software; you can redistribute it and/or modify it
+ * under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * In either case, Lustre is distributed in the hope that it will be
+ * useful, but WITHOUT ANY WARRANTY; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * license text for more details.
+ */
+#ifndef EXPORT_SYMTAB
+# define EXPORT_SYMTAB
+#endif
+#define DEBUG_SUBSYSTEM S_MDS
+
+#include <linux/module.h>
+#include <obd.h>
+#include <obd_class.h>
+#include <lustre_ver.h>
+#include <obd_support.h>
+#include <lprocfs_status.h>
+#include <lu_time.h>
+
+#include <lustre/lustre_idl.h>
+
+#include "cmm_internal.h"
+
+static const char *cmm_counter_names[LPROC_CMM_NR] = {
+ [LPROC_CMM_SPLIT_CHECK] = "split_check",
+ [LPROC_CMM_SPLIT] = "split",
+ [LPROC_CMM_LOOKUP] = "lookup",
+ [LPROC_CMM_CREATE] = "create"
+};
+
+int cmm_procfs_init(struct cmm_device *cmm, const char *name)
+{
+ struct lu_device *ld = &cmm->cmm_md_dev.md_lu_dev;
+ struct obd_type *type;
+ int rc;
+ ENTRY;
+
+ type = ld->ld_type->ldt_obd_type;
+
+ LASSERT(name != NULL);
+ LASSERT(type != NULL);
+
+ /* Find the type procroot and add the proc entry for this device. */
+ cmm->cmm_proc_entry = lprocfs_register(name, type->typ_procroot,
+ NULL, NULL);
+ if (IS_ERR(cmm->cmm_proc_entry)) {
+ rc = PTR_ERR(cmm->cmm_proc_entry);
+ CERROR("Error %d setting up lprocfs for %s\n",
+ rc, name);
+ cmm->cmm_proc_entry = NULL;
+ GOTO(out, rc);
+ }
+
+ rc = lu_time_init(&cmm->cmm_stats,
+ cmm->cmm_proc_entry,
+ cmm_counter_names, ARRAY_SIZE(cmm_counter_names));
+
+ EXIT;
+out:
+ if (rc)
+ cmm_procfs_fini(cmm);
+ return rc;
+}
+
+int cmm_procfs_fini(struct cmm_device *cmm)
+{
+ if (cmm->cmm_stats)
+ lu_time_fini(&cmm->cmm_stats);
+
+ if (cmm->cmm_proc_entry) {
+ lprocfs_remove(&cmm->cmm_proc_entry);
+ cmm->cmm_proc_entry = NULL;
+ }
+ RETURN(0);
+}
+
+void cmm_lprocfs_time_start(const struct lu_env *env)
+{
+ lu_lprocfs_time_start(env);
+}
+
+void cmm_lprocfs_time_end(const struct lu_env *env, struct cmm_device *cmm,
+ int idx)
+{
+ lu_lprocfs_time_end(env, cmm->cmm_stats, idx);
+}
--- /dev/null
+/* -*- mode: c; c-basic-offset: 8; indent-tabs-mode: nil; -*-
+ * vim:expandtab:shiftwidth=8:tabstop=8:
+ *
+ * lustre/cmm/cmm_object.c
+ * Lustre Cluster Metadata Manager (cmm)
+ *
+ * Copyright (c) 2006 Cluster File Systems, Inc.
+ * Author: Mike Pershin <tappro@clusterfs.com>
+ *
+ * This file is part of the Lustre file system, http://www.lustre.org
+ * Lustre is a trademark of Cluster File Systems, Inc.
+ *
+ * You may have signed or agreed to another license before downloading
+ * this software. If so, you are bound by the terms and conditions
+ * of that agreement, and the following does not apply to you. See the
+ * LICENSE file included with this distribution for more information.
+ *
+ * If you did not agree to a different license, then this copy of Lustre
+ * is open source software; you can redistribute it and/or modify it
+ * under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * In either case, Lustre is distributed in the hope that it will be
+ * useful, but WITHOUT ANY WARRANTY; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * license text for more details.
+ */
+
+#ifndef EXPORT_SYMTAB
+# define EXPORT_SYMTAB
+#endif
+
+#define DEBUG_SUBSYSTEM S_MDS
+
+#include <lustre_fid.h>
+#include "cmm_internal.h"
+#include "mdc_internal.h"
+
+int cmm_fld_lookup(struct cmm_device *cm, const struct lu_fid *fid,
+ mdsno_t *mds, const struct lu_env *env)
+{
+ int rc = 0;
+ ENTRY;
+
+ LASSERT(fid_is_sane(fid));
+
+ rc = fld_client_lookup(cm->cmm_fld, fid_seq(fid), mds, env);
+ if (rc) {
+ CERROR("Can't find mds by seq "LPX64", rc %d\n",
+ fid_seq(fid), rc);
+ RETURN(rc);
+ }
+
+ if (*mds > cm->cmm_tgt_count) {
+ CERROR("Got invalid mdsno: "LPU64" (max: %u)\n",
+ *mds, cm->cmm_tgt_count);
+ rc = -EINVAL;
+ } else {
+ CDEBUG(D_INFO, "CMM: got MDS "LPU64" for sequence: "
+ LPU64"\n", *mds, fid_seq(fid));
+ }
+
+ RETURN(rc);
+}
+
+static struct md_object_operations cml_mo_ops;
+static struct md_dir_operations cml_dir_ops;
+static struct lu_object_operations cml_obj_ops;
+
+static struct md_object_operations cmr_mo_ops;
+static struct md_dir_operations cmr_dir_ops;
+static struct lu_object_operations cmr_obj_ops;
+
+struct lu_object *cmm_object_alloc(const struct lu_env *env,
+ const struct lu_object_header *loh,
+ struct lu_device *ld)
+{
+ const struct lu_fid *fid = &loh->loh_fid;
+ struct lu_object *lo = NULL;
+ struct cmm_device *cd;
+ mdsno_t mds;
+ int rc = 0;
+
+ ENTRY;
+
+ cd = lu2cmm_dev(ld);
+ if (cd->cmm_flags & CMM_INITIALIZED) {
+ /* get object location */
+ rc = cmm_fld_lookup(lu2cmm_dev(ld), fid, &mds, env);
+ if (rc)
+ RETURN(NULL);
+ } else
+ /*
+ * Device is not yet initialized, cmm_object is being created
+ * as part of early bootstrap procedure (it is /ROOT, or /fld,
+ * etc.). Such object *has* to be local.
+ */
+ mds = cd->cmm_local_num;
+
+ /* select the proper set of operations based on object location */
+ if (mds == cd->cmm_local_num) {
+ struct cml_object *clo;
+
+ OBD_ALLOC_PTR(clo);
+ if (clo != NULL) {
+ lo = &clo->cmm_obj.cmo_obj.mo_lu;
+ lu_object_init(lo, NULL, ld);
+ clo->cmm_obj.cmo_obj.mo_ops = &cml_mo_ops;
+ clo->cmm_obj.cmo_obj.mo_dir_ops = &cml_dir_ops;
+ lo->lo_ops = &cml_obj_ops;
+ }
+ } else {
+ struct cmr_object *cro;
+
+ OBD_ALLOC_PTR(cro);
+ if (cro != NULL) {
+ lo = &cro->cmm_obj.cmo_obj.mo_lu;
+ lu_object_init(lo, NULL, ld);
+ cro->cmm_obj.cmo_obj.mo_ops = &cmr_mo_ops;
+ cro->cmm_obj.cmo_obj.mo_dir_ops = &cmr_dir_ops;
+ lo->lo_ops = &cmr_obj_ops;
+ cro->cmo_num = mds;
+ }
+ }
+ RETURN(lo);
+}
+
+/*
+ * CMM has two types of objects - local and remote. Each has its own set
+ * of operations, which avoids repeated location checks in the code.
+ */
+
+/* get local child device */
+static struct lu_device *cml_child_dev(struct cmm_device *d)
+{
+ return &d->cmm_child->md_lu_dev;
+}
+
+/* lu_object operations */
+static void cml_object_free(const struct lu_env *env,
+ struct lu_object *lo)
+{
+ struct cml_object *clo = lu2cml_obj(lo);
+ lu_object_fini(lo);
+ OBD_FREE_PTR(clo);
+}
+
+static int cml_object_init(const struct lu_env *env, struct lu_object *lo)
+{
+ struct cmm_device *cd = lu2cmm_dev(lo->lo_dev);
+ struct lu_device *c_dev;
+ struct lu_object *c_obj;
+ int rc;
+
+ ENTRY;
+
+#ifdef HAVE_SPLIT_SUPPORT
+ if (cd->cmm_tgt_count == 0)
+ lu2cml_obj(lo)->clo_split = CMM_SPLIT_DENIED;
+ else
+ lu2cml_obj(lo)->clo_split = CMM_SPLIT_UNKNOWN;
+#endif
+ c_dev = cml_child_dev(cd);
+ if (c_dev == NULL) {
+ rc = -ENOENT;
+ } else {
+ c_obj = c_dev->ld_ops->ldo_object_alloc(env,
+ lo->lo_header, c_dev);
+ if (c_obj != NULL) {
+ lu_object_add(lo, c_obj);
+ rc = 0;
+ } else {
+ rc = -ENOMEM;
+ }
+ }
+
+ RETURN(rc);
+}
+
+static int cml_object_print(const struct lu_env *env, void *cookie,
+ lu_printer_t p, const struct lu_object *lo)
+{
+ return (*p)(env, cookie, LUSTRE_CMM_NAME"-local@%p", lo);
+}
+
+static struct lu_object_operations cml_obj_ops = {
+ .loo_object_init = cml_object_init,
+ .loo_object_free = cml_object_free,
+ .loo_object_print = cml_object_print
+};
+
+/* CMM local md_object operations */
+static int cml_object_create(const struct lu_env *env,
+ struct md_object *mo,
+ const struct md_op_spec *spec,
+ struct md_attr *attr)
+{
+ int rc;
+ ENTRY;
+ rc = mo_object_create(env, md_object_next(mo), spec, attr);
+ RETURN(rc);
+}
+
+static int cml_permission(const struct lu_env *env,
+ struct md_object *p, struct md_object *c,
+ struct md_attr *attr, int mask)
+{
+ int rc;
+ ENTRY;
+ rc = mo_permission(env, md_object_next(p), md_object_next(c),
+ attr, mask);
+ RETURN(rc);
+}
+
+static int cml_attr_get(const struct lu_env *env, struct md_object *mo,
+ struct md_attr *attr)
+{
+ int rc;
+ ENTRY;
+ rc = mo_attr_get(env, md_object_next(mo), attr);
+ RETURN(rc);
+}
+
+static int cml_attr_set(const struct lu_env *env, struct md_object *mo,
+ const struct md_attr *attr)
+{
+ int rc;
+ ENTRY;
+ rc = mo_attr_set(env, md_object_next(mo), attr);
+ RETURN(rc);
+}
+
+static int cml_xattr_get(const struct lu_env *env, struct md_object *mo,
+ struct lu_buf *buf, const char *name)
+{
+ int rc;
+ ENTRY;
+ rc = mo_xattr_get(env, md_object_next(mo), buf, name);
+ RETURN(rc);
+}
+
+static int cml_readlink(const struct lu_env *env, struct md_object *mo,
+ struct lu_buf *buf)
+{
+ int rc;
+ ENTRY;
+ rc = mo_readlink(env, md_object_next(mo), buf);
+ RETURN(rc);
+}
+
+static int cml_xattr_list(const struct lu_env *env, struct md_object *mo,
+ struct lu_buf *buf)
+{
+ int rc;
+ ENTRY;
+ rc = mo_xattr_list(env, md_object_next(mo), buf);
+ RETURN(rc);
+}
+
+static int cml_xattr_set(const struct lu_env *env, struct md_object *mo,
+ const struct lu_buf *buf,
+ const char *name, int fl)
+{
+ int rc;
+ ENTRY;
+ rc = mo_xattr_set(env, md_object_next(mo), buf, name, fl);
+ RETURN(rc);
+}
+
+static int cml_xattr_del(const struct lu_env *env, struct md_object *mo,
+ const char *name)
+{
+ int rc;
+ ENTRY;
+ rc = mo_xattr_del(env, md_object_next(mo), name);
+ RETURN(rc);
+}
+
+static int cml_ref_add(const struct lu_env *env, struct md_object *mo,
+ const struct md_attr *ma)
+{
+ int rc;
+ ENTRY;
+ rc = mo_ref_add(env, md_object_next(mo), ma);
+ RETURN(rc);
+}
+
+static int cml_ref_del(const struct lu_env *env, struct md_object *mo,
+ struct md_attr *ma)
+{
+ int rc;
+ ENTRY;
+ rc = mo_ref_del(env, md_object_next(mo), ma);
+ RETURN(rc);
+}
+
+static int cml_open(const struct lu_env *env, struct md_object *mo,
+ int flags)
+{
+ int rc;
+ ENTRY;
+ rc = mo_open(env, md_object_next(mo), flags);
+ RETURN(rc);
+}
+
+static int cml_close(const struct lu_env *env, struct md_object *mo,
+ struct md_attr *ma)
+{
+ int rc;
+ ENTRY;
+ rc = mo_close(env, md_object_next(mo), ma);
+ RETURN(rc);
+}
+
+static int cml_readpage(const struct lu_env *env, struct md_object *mo,
+ const struct lu_rdpg *rdpg)
+{
+ int rc;
+ ENTRY;
+ rc = mo_readpage(env, md_object_next(mo), rdpg);
+ RETURN(rc);
+}
+
+static int cml_capa_get(const struct lu_env *env, struct md_object *mo,
+ struct lustre_capa *capa, int renewal)
+{
+ int rc;
+ ENTRY;
+ rc = mo_capa_get(env, md_object_next(mo), capa, renewal);
+ RETURN(rc);
+}
+
+static struct md_object_operations cml_mo_ops = {
+ .moo_permission = cml_permission,
+ .moo_attr_get = cml_attr_get,
+ .moo_attr_set = cml_attr_set,
+ .moo_xattr_get = cml_xattr_get,
+ .moo_xattr_list = cml_xattr_list,
+ .moo_xattr_set = cml_xattr_set,
+ .moo_xattr_del = cml_xattr_del,
+ .moo_object_create = cml_object_create,
+ .moo_ref_add = cml_ref_add,
+ .moo_ref_del = cml_ref_del,
+ .moo_open = cml_open,
+ .moo_close = cml_close,
+ .moo_readpage = cml_readpage,
+ .moo_readlink = cml_readlink,
+ .moo_capa_get = cml_capa_get
+};
+
+/* md_dir operations */
+static int cml_lookup(const struct lu_env *env, struct md_object *mo_p,
+ const struct lu_name *lname, struct lu_fid *lf,
+ struct md_op_spec *spec)
+{
+ int rc;
+ ENTRY;
+
+#ifdef HAVE_SPLIT_SUPPORT
+ if (spec != NULL && spec->sp_ck_split) {
+ rc = cmm_split_check(env, mo_p, lname->ln_name);
+ if (rc)
+ RETURN(rc);
+ }
+#endif
+ rc = mdo_lookup(env, md_object_next(mo_p), lname, lf, spec);
+ RETURN(rc);
+
+}
+
+static mdl_mode_t cml_lock_mode(const struct lu_env *env,
+ struct md_object *mo, mdl_mode_t lm)
+{
+ int rc = MDL_MINMODE;
+ ENTRY;
+
+#ifdef HAVE_SPLIT_SUPPORT
+ rc = cmm_split_access(env, mo, lm);
+#endif
+
+ RETURN(rc);
+}
+
+static int cml_create(const struct lu_env *env, struct md_object *mo_p,
+ const struct lu_name *lname, struct md_object *mo_c,
+ struct md_op_spec *spec, struct md_attr *ma)
+{
+ int rc;
+ ENTRY;
+
+#ifdef HAVE_SPLIT_SUPPORT
+ /* Lock mode should always be sane. */
+ LASSERT(spec->sp_cr_mode != MDL_MINMODE);
+
+ /*
+ * Sigh... This is a long story. MDT may race with CMM when detecting
+ * whether a split is possible. We knowingly let that race live:
+ * closing it (with a semaphore or spinlock) would also kill the
+ * parallelism that PDIROPS for create exists to provide, which is
+ * really bad for performance and makes PDIROPS pointless. So we let
+ * the race live, and split the dir only in whichever concurrent
+ * thread holds the EX lock. That is, two concurrent threads may hold
+ * different lock modes on the directory (CW and EX); it is not the
+ * first one to arrive here and see that a split is possible that
+ * splits the dir, but the one holding the EX lock. We do not care
+ * that the split may then happen a bit later (when the dir size is
+ * not exactly 64K but somewhat larger). Thus we allow concurrent
+ * creates and protect the split with the EX lock.
+ */
+ if (spec->sp_cr_mode == MDL_EX) {
+ /*
+ * Try to split @mo_p. If the split succeeds, -ERESTART is
+ * returned and the current thread does not proceed with the
+ * create. Instead -ERESTART is sent to the client so that it
+ * knows the correct MDT should be chosen.
+ */
+ rc = cmm_split_dir(env, mo_p);
+ if (rc)
+ /*
+ * Either -ERESTART or a split error was returned; we
+ * cannot proceed with the create.
+ */
+ GOTO(out, rc);
+ }
+
+ if (spec != NULL && spec->sp_ck_split) {
+ /*
+ * Check whether the directory is split; if so, let the caller
+ * know that it should tell the client the directory is split
+ * and the operation should be repeated on the correct MDT.
+ */
+ rc = cmm_split_check(env, mo_p, lname->ln_name);
+ if (rc)
+ GOTO(out, rc);
+ }
+#endif
+
+ rc = mdo_create(env, md_object_next(mo_p), lname, md_object_next(mo_c),
+ spec, ma);
+
+ EXIT;
+out:
+ return rc;
+}
+
+static int cml_create_data(const struct lu_env *env, struct md_object *p,
+ struct md_object *o,
+ const struct md_op_spec *spec,
+ struct md_attr *ma)
+{
+ int rc;
+ ENTRY;
+ rc = mdo_create_data(env, md_object_next(p), md_object_next(o),
+ spec, ma);
+ RETURN(rc);
+}
+
+static int cml_link(const struct lu_env *env, struct md_object *mo_p,
+ struct md_object *mo_s, const struct lu_name *lname,
+ struct md_attr *ma)
+{
+ int rc;
+ ENTRY;
+ rc = mdo_link(env, md_object_next(mo_p), md_object_next(mo_s),
+ lname, ma);
+ RETURN(rc);
+}
+
+static int cml_unlink(const struct lu_env *env, struct md_object *mo_p,
+ struct md_object *mo_c, const struct lu_name *lname,
+ struct md_attr *ma)
+{
+ int rc;
+ ENTRY;
+ rc = mdo_unlink(env, md_object_next(mo_p), md_object_next(mo_c),
+ lname, ma);
+ RETURN(rc);
+}
+
+/* rename is split into local/remote cases by the location of the new parent dir */
+struct md_object *md_object_find(const struct lu_env *env,
+ struct md_device *md,
+ const struct lu_fid *f)
+{
+ struct lu_object *o;
+ struct md_object *m;
+ ENTRY;
+
+ o = lu_object_find(env, md2lu_dev(md)->ld_site, f);
+ if (IS_ERR(o))
+ m = (struct md_object *)o;
+ else {
+ o = lu_object_locate(o->lo_header, md2lu_dev(md)->ld_type);
+ m = o ? lu2md(o) : NULL;
+ }
+ RETURN(m);
+}
+
+static int cmm_mode_get(const struct lu_env *env, struct md_device *md,
+ const struct lu_fid *lf, struct md_attr *ma,
+ int *remote)
+{
+ struct md_object *mo_s = md_object_find(env, md, lf);
+ struct cmm_thread_info *cmi;
+ struct md_attr *tmp_ma;
+ int rc;
+ ENTRY;
+
+ if (IS_ERR(mo_s))
+ RETURN(PTR_ERR(mo_s));
+
+ if (remote && (lu_object_exists(&mo_s->mo_lu) < 0))
+ *remote = 1;
+
+ cmi = cmm_env_info(env);
+ tmp_ma = &cmi->cmi_ma;
+ tmp_ma->ma_need = MA_INODE;
+ tmp_ma->ma_valid = 0;
+ /* get type from src, can be remote req */
+ rc = mo_attr_get(env, md_object_next(mo_s), tmp_ma);
+ if (rc == 0) {
+ ma->ma_attr.la_mode = tmp_ma->ma_attr.la_mode;
+ ma->ma_attr.la_uid = tmp_ma->ma_attr.la_uid;
+ ma->ma_attr.la_gid = tmp_ma->ma_attr.la_gid;
+ ma->ma_attr.la_flags = tmp_ma->ma_attr.la_flags;
+ ma->ma_attr.la_valid |= LA_MODE | LA_UID | LA_GID | LA_FLAGS;
+ }
+ lu_object_put(env, &mo_s->mo_lu);
+ RETURN(rc);
+}
+
+static int cmm_rename_ctime(const struct lu_env *env, struct md_device *md,
+ const struct lu_fid *lf, struct md_attr *ma)
+{
+ struct md_object *mo_s = md_object_find(env, md, lf);
+ int rc;
+ ENTRY;
+
+ if (IS_ERR(mo_s))
+ RETURN(PTR_ERR(mo_s));
+
+ LASSERT(ma->ma_attr.la_valid & LA_CTIME);
+ /* set ctime to obj, can be remote req */
+ rc = mo_attr_set(env, md_object_next(mo_s), ma);
+ lu_object_put(env, &mo_s->mo_lu);
+ RETURN(rc);
+}
+
+static inline void cml_rename_warn(const char *fname,
+ struct md_object *mo_po,
+ struct md_object *mo_pn,
+ const struct lu_fid *lf,
+ const char *s_name,
+ struct md_object *mo_t,
+ const char *t_name,
+ int err)
+{
+ if (mo_t)
+ CWARN("cml_rename failed for %s, should revoke: [mo_po "DFID"] "
+ "[mo_pn "DFID"] [lf "DFID"] [sname %s] [mo_t "DFID"] "
+ "[tname %s] [err %d]\n", fname,
+ PFID(lu_object_fid(&mo_po->mo_lu)),
+ PFID(lu_object_fid(&mo_pn->mo_lu)),
+ PFID(lf), s_name,
+ PFID(lu_object_fid(&mo_t->mo_lu)),
+ t_name, err);
+ else
+ CWARN("cml_rename failed for %s, should revoke: [mo_po "DFID"] "
+ "[mo_pn "DFID"] [lf "DFID"] [sname %s] [mo_t NULL] "
+ "[tname %s] [err %d]\n", fname,
+ PFID(lu_object_fid(&mo_po->mo_lu)),
+ PFID(lu_object_fid(&mo_pn->mo_lu)),
+ PFID(lf), s_name,
+ t_name, err);
+}
+
+static int cml_rename(const struct lu_env *env, struct md_object *mo_po,
+ struct md_object *mo_pn, const struct lu_fid *lf,
+ const struct lu_name *ls_name, struct md_object *mo_t,
+ const struct lu_name *lt_name, struct md_attr *ma)
+{
+ struct cmm_thread_info *cmi;
+ struct md_attr *tmp_ma = NULL;
+ struct md_object *tmp_t = mo_t;
+ int remote = 0, rc;
+ ENTRY;
+
+ rc = cmm_mode_get(env, md_obj2dev(mo_po), lf, ma, &remote);
+ if (rc)
+ RETURN(rc);
+
+ if (mo_t && lu_object_exists(&mo_t->mo_lu) < 0) {
+ /* XXX: mo_t is a remote object and an RPC will unlink it;
+ * before that, do the local sanity checks for rename first. */
+ if (!remote) {
+ struct md_object *mo_s = md_object_find(env,
+ md_obj2dev(mo_po), lf);
+ if (IS_ERR(mo_s))
+ RETURN(PTR_ERR(mo_s));
+
+ LASSERT(lu_object_exists(&mo_s->mo_lu) > 0);
+ rc = mo_permission(env, md_object_next(mo_po),
+ md_object_next(mo_s),
+ ma, MAY_RENAME_SRC);
+ lu_object_put(env, &mo_s->mo_lu);
+ if (rc)
+ RETURN(rc);
+ } else {
+ rc = mo_permission(env, NULL, md_object_next(mo_po),
+ ma, MAY_UNLINK | MAY_VTX_FULL);
+ if (rc)
+ RETURN(rc);
+ }
+
+ rc = mo_permission(env, NULL, md_object_next(mo_pn), ma,
+ MAY_UNLINK | MAY_VTX_PART);
+ if (rc)
+ RETURN(rc);
+
+ /*
+ * XXX: @ma will be changed after mo_ref_del, but we will use
+ * it for mdo_rename later, so save it before mo_ref_del.
+ */
+ cmi = cmm_env_info(env);
+ tmp_ma = &cmi->cmi_ma;
+ *tmp_ma = *ma;
+ rc = mo_ref_del(env, md_object_next(mo_t), ma);
+ if (rc)
+ RETURN(rc);
+
+ tmp_ma->ma_attr_flags |= MDS_PERM_BYPASS;
+ mo_t = NULL;
+ }
+
+ /* XXX: when the source is on a remote MDS, change its ctime before
+ * the local rename. First, do the local sanity checks for rename
+ * if necessary. */
+ if (remote) {
+ if (!tmp_ma) {
+ rc = mo_permission(env, NULL, md_object_next(mo_po),
+ ma, MAY_UNLINK | MAY_VTX_FULL);
+ if (rc)
+ RETURN(rc);
+
+ if (mo_t) {
+ LASSERT(lu_object_exists(&mo_t->mo_lu) > 0);
+ rc = mo_permission(env, md_object_next(mo_pn),
+ md_object_next(mo_t),
+ ma, MAY_RENAME_TAR);
+ if (rc)
+ RETURN(rc);
+ } else {
+ int mask;
+
+ if (mo_po != mo_pn)
+ mask = (S_ISDIR(ma->ma_attr.la_mode) ?
+ MAY_LINK : MAY_CREATE);
+ else
+ mask = MAY_CREATE;
+ rc = mo_permission(env, NULL,
+ md_object_next(mo_pn),
+ NULL, mask);
+ if (rc)
+ RETURN(rc);
+ }
+
+ ma->ma_attr_flags |= MDS_PERM_BYPASS;
+ } else {
+ LASSERT(tmp_ma->ma_attr_flags & MDS_PERM_BYPASS);
+ }
+
+ rc = cmm_rename_ctime(env, md_obj2dev(mo_po), lf,
+ tmp_ma ? tmp_ma : ma);
+ if (rc) {
+ /* TODO: revoke mo_t if necessary. */
+ cml_rename_warn("cmm_rename_ctime", mo_po,
+ mo_pn, lf, ls_name->ln_name,
+ tmp_t, lt_name->ln_name, rc);
+ RETURN(rc);
+ }
+ }
+
+ /* local rename, mo_t can be NULL */
+ rc = mdo_rename(env, md_object_next(mo_po),
+ md_object_next(mo_pn), lf, ls_name,
+ md_object_next(mo_t), lt_name, tmp_ma ? tmp_ma : ma);
+ if (rc)
+ /* TODO: revoke all cml_rename */
+ cml_rename_warn("mdo_rename", mo_po, mo_pn, lf,
+ ls_name->ln_name, tmp_t, lt_name->ln_name, rc);
+
+ RETURN(rc);
+}
+
+static int cml_rename_tgt(const struct lu_env *env, struct md_object *mo_p,
+ struct md_object *mo_t, const struct lu_fid *lf,
+ const struct lu_name *lname, struct md_attr *ma)
+{
+ int rc;
+ ENTRY;
+
+ rc = mdo_rename_tgt(env, md_object_next(mo_p),
+ md_object_next(mo_t), lf, lname, ma);
+ RETURN(rc);
+}
+/* used only by rename_tgt() when the target does not exist */
+static int cml_name_insert(const struct lu_env *env, struct md_object *p,
+ const struct lu_name *lname, const struct lu_fid *lf,
+ const struct md_attr *ma)
+{
+ int rc;
+ ENTRY;
+
+ rc = mdo_name_insert(env, md_object_next(p), lname, lf, ma);
+
+ RETURN(rc);
+}
+
+static int cmm_is_subdir(const struct lu_env *env, struct md_object *mo,
+ const struct lu_fid *fid, struct lu_fid *sfid)
+{
+ struct cmm_thread_info *cmi;
+ int rc;
+ ENTRY;
+
+ cmi = cmm_env_info(env);
+ rc = cmm_mode_get(env, md_obj2dev(mo), fid, &cmi->cmi_ma, NULL);
+ if (rc)
+ RETURN(rc);
+
+ if (!S_ISDIR(cmi->cmi_ma.ma_attr.la_mode))
+ RETURN(0);
+
+ rc = mdo_is_subdir(env, md_object_next(mo), fid, sfid);
+ RETURN(rc);
+}
+
+static struct md_dir_operations cml_dir_ops = {
+ .mdo_is_subdir = cmm_is_subdir,
+ .mdo_lookup = cml_lookup,
+ .mdo_lock_mode = cml_lock_mode,
+ .mdo_create = cml_create,
+ .mdo_link = cml_link,
+ .mdo_unlink = cml_unlink,
+ .mdo_name_insert = cml_name_insert,
+ .mdo_rename = cml_rename,
+ .mdo_rename_tgt = cml_rename_tgt,
+ .mdo_create_data = cml_create_data
+};
+
+/* -------------------------------------------------------------------
+ * remote CMM object operations. cmr_...
+ */
+static inline struct cmr_object *lu2cmr_obj(struct lu_object *o)
+{
+ return container_of0(o, struct cmr_object, cmm_obj.cmo_obj.mo_lu);
+}
+static inline struct cmr_object *md2cmr_obj(struct md_object *mo)
+{
+ return container_of0(mo, struct cmr_object, cmm_obj.cmo_obj);
+}
+static inline struct cmr_object *cmm2cmr_obj(struct cmm_object *co)
+{
+ return container_of0(co, struct cmr_object, cmm_obj);
+}
+
+/* get proper child device from MDCs */
+static struct lu_device *cmr_child_dev(struct cmm_device *d, __u32 num)
+{
+ struct lu_device *next = NULL;
+ struct mdc_device *mdc;
+
+ spin_lock(&d->cmm_tgt_guard);
+ list_for_each_entry(mdc, &d->cmm_targets, mc_linkage) {
+ if (mdc->mc_num == num) {
+ next = mdc2lu_dev(mdc);
+ break;
+ }
+ }
+ spin_unlock(&d->cmm_tgt_guard);
+ return next;
+}
+
+/* lu_object operations */
+static void cmr_object_free(const struct lu_env *env,
+ struct lu_object *lo)
+{
+ struct cmr_object *cro = lu2cmr_obj(lo);
+ lu_object_fini(lo);
+ OBD_FREE_PTR(cro);
+}
+
+static int cmr_object_init(const struct lu_env *env, struct lu_object *lo)
+{
+ struct cmm_device *cd = lu2cmm_dev(lo->lo_dev);
+ struct lu_device *c_dev;
+ struct lu_object *c_obj;
+ int rc;
+
+ ENTRY;
+
+ c_dev = cmr_child_dev(cd, lu2cmr_obj(lo)->cmo_num);
+ if (c_dev == NULL) {
+ rc = -ENOENT;
+ } else {
+ c_obj = c_dev->ld_ops->ldo_object_alloc(env,
+ lo->lo_header, c_dev);
+ if (c_obj != NULL) {
+ lu_object_add(lo, c_obj);
+ rc = 0;
+ } else {
+ rc = -ENOMEM;
+ }
+ }
+
+ RETURN(rc);
+}
+
+static int cmr_object_print(const struct lu_env *env, void *cookie,
+ lu_printer_t p, const struct lu_object *lo)
+{
+ return (*p)(env, cookie, LUSTRE_CMM_NAME"-remote@%p", lo);
+}
+
+static struct lu_object_operations cmr_obj_ops = {
+ .loo_object_init = cmr_object_init,
+ .loo_object_free = cmr_object_free,
+ .loo_object_print = cmr_object_print
+};
+
+/* CMM remote md_object operations. All are invalid */
+static int cmr_object_create(const struct lu_env *env,
+ struct md_object *mo,
+ const struct md_op_spec *spec,
+ struct md_attr *ma)
+{
+ return -EFAULT;
+}
+
+static int cmr_permission(const struct lu_env *env,
+ struct md_object *p, struct md_object *c,
+ struct md_attr *attr, int mask)
+{
+ return -EREMOTE;
+}
+
+static int cmr_attr_get(const struct lu_env *env, struct md_object *mo,
+ struct md_attr *attr)
+{
+ return -EREMOTE;
+}
+
+static int cmr_attr_set(const struct lu_env *env, struct md_object *mo,
+ const struct md_attr *attr)
+{
+ return -EFAULT;
+}
+
+static int cmr_xattr_get(const struct lu_env *env, struct md_object *mo,
+ struct lu_buf *buf, const char *name)
+{
+ return -EFAULT;
+}
+
+static int cmr_readlink(const struct lu_env *env, struct md_object *mo,
+ struct lu_buf *buf)
+{
+ return -EFAULT;
+}
+
+static int cmr_xattr_list(const struct lu_env *env, struct md_object *mo,
+ struct lu_buf *buf)
+{
+ return -EFAULT;
+}
+
+static int cmr_xattr_set(const struct lu_env *env, struct md_object *mo,
+ const struct lu_buf *buf, const char *name, int fl)
+{
+ return -EFAULT;
+}
+
+static int cmr_xattr_del(const struct lu_env *env, struct md_object *mo,
+ const char *name)
+{
+ return -EFAULT;
+}
+
+static int cmr_ref_add(const struct lu_env *env, struct md_object *mo,
+ const struct md_attr *ma)
+{
+ return -EFAULT;
+}
+
+static int cmr_ref_del(const struct lu_env *env, struct md_object *mo,
+ struct md_attr *ma)
+{
+ return -EFAULT;
+}
+
+static int cmr_open(const struct lu_env *env, struct md_object *mo,
+ int flags)
+{
+ return -EREMOTE;
+}
+
+static int cmr_close(const struct lu_env *env, struct md_object *mo,
+ struct md_attr *ma)
+{
+ return -EFAULT;
+}
+
+static int cmr_readpage(const struct lu_env *env, struct md_object *mo,
+ const struct lu_rdpg *rdpg)
+{
+ return -EREMOTE;
+}
+
+static int cmr_capa_get(const struct lu_env *env, struct md_object *mo,
+ struct lustre_capa *capa, int renewal)
+{
+ return -EFAULT;
+}
+
+static struct md_object_operations cmr_mo_ops = {
+ .moo_permission = cmr_permission,
+ .moo_attr_get = cmr_attr_get,
+ .moo_attr_set = cmr_attr_set,
+ .moo_xattr_get = cmr_xattr_get,
+ .moo_xattr_set = cmr_xattr_set,
+ .moo_xattr_list = cmr_xattr_list,
+ .moo_xattr_del = cmr_xattr_del,
+ .moo_object_create = cmr_object_create,
+ .moo_ref_add = cmr_ref_add,
+ .moo_ref_del = cmr_ref_del,
+ .moo_open = cmr_open,
+ .moo_close = cmr_close,
+ .moo_readpage = cmr_readpage,
+ .moo_readlink = cmr_readlink,
+ .moo_capa_get = cmr_capa_get
+};
+
+/* remote part of md_dir operations */
+static int cmr_lookup(const struct lu_env *env, struct md_object *mo_p,
+ const struct lu_name *lname, struct lu_fid *lf,
+ struct md_op_spec *spec)
+{
+ /*
+ * This can happen during rename(): if the new parent is a remote
+ * dir, the lookup will happen here.
+ */
+
+ return -EREMOTE;
+}
+
+static mdl_mode_t cmr_lock_mode(const struct lu_env *env,
+ struct md_object *mo, mdl_mode_t lm)
+{
+ return MDL_MINMODE;
+}
+
+/*
+ * All methods below are cross-ref by nature: each consists of a remote call
+ * and a local operation. Because of the planned rollback functionality,
+ * such methods have several limitations:
+ * 1) the remote call must be made first, to negotiate the epoch among all
+ * MDSes involved and to avoid an RPC inside a transaction;
+ * 2) only one RPC may be sent, also because of epoch negotiation.
+ * For more details see the rollback HLD/DLD.
+ */
+static int cmr_create(const struct lu_env *env, struct md_object *mo_p,
+ const struct lu_name *lchild_name, struct md_object *mo_c,
+ struct md_op_spec *spec,
+ struct md_attr *ma)
+{
+ struct cmm_thread_info *cmi;
+ struct md_attr *tmp_ma;
+ int rc;
+ ENTRY;
+
+ /* Make sure the name does not already exist before the remote call. */
+ rc = mdo_lookup(env, md_object_next(mo_p), lchild_name,
+ &cmm_env_info(env)->cmi_fid, NULL);
+ if (rc == 0)
+ RETURN(-EEXIST);
+ else if (rc != -ENOENT)
+ RETURN(rc);
+
+ /* check the SGID attr */
+ cmi = cmm_env_info(env);
+ LASSERT(cmi);
+ tmp_ma = &cmi->cmi_ma;
+ tmp_ma->ma_valid = 0;
+ tmp_ma->ma_need = MA_INODE;
+
+#ifdef CONFIG_FS_POSIX_ACL
+ if (!S_ISLNK(ma->ma_attr.la_mode)) {
+ tmp_ma->ma_acl = cmi->cmi_xattr_buf;
+ tmp_ma->ma_acl_size = sizeof(cmi->cmi_xattr_buf);
+ tmp_ma->ma_need |= MA_ACL_DEF;
+ }
+#endif
+ rc = mo_attr_get(env, md_object_next(mo_p), tmp_ma);
+ if (rc)
+ RETURN(rc);
+
+ if (tmp_ma->ma_attr.la_mode & S_ISGID) {
+ ma->ma_attr.la_gid = tmp_ma->ma_attr.la_gid;
+ if (S_ISDIR(ma->ma_attr.la_mode)) {
+ ma->ma_attr.la_mode |= S_ISGID;
+ ma->ma_attr.la_valid |= LA_MODE;
+ }
+ }
+
+#ifdef CONFIG_FS_POSIX_ACL
+ if (tmp_ma->ma_valid & MA_ACL_DEF) {
+ spec->u.sp_ea.fid = spec->u.sp_pfid;
+ spec->u.sp_ea.eadata = tmp_ma->ma_acl;
+ spec->u.sp_ea.eadatalen = tmp_ma->ma_acl_size;
+ spec->sp_cr_flags |= MDS_CREATE_RMT_ACL;
+ }
+#endif
+
+ /* Local permission check for name_insert before remote ops. */
+ rc = mo_permission(env, NULL, md_object_next(mo_p), NULL,
+ (S_ISDIR(ma->ma_attr.la_mode) ?
+ MAY_LINK : MAY_CREATE));
+ if (rc)
+ RETURN(rc);
+
+ /* Remote object creation and local name insert. */
+ /*
+ * XXX: @ma will be changed by mo_object_create(), but we still need
+ * it for mdo_name_insert() afterwards, so save it beforehand.
+ */
+ *tmp_ma = *ma;
+ rc = mo_object_create(env, md_object_next(mo_c), spec, ma);
+ if (rc == 0) {
+ tmp_ma->ma_attr_flags |= MDS_PERM_BYPASS;
+ rc = mdo_name_insert(env, md_object_next(mo_p), lchild_name,
+ lu_object_fid(&mo_c->mo_lu), tmp_ma);
+ if (unlikely(rc)) {
+ /* TODO: remove object mo_c on remote MDS */
+ CWARN("cmr_create failed, should revoke: [mo_p "DFID"]"
+ " [name %s] [mo_c "DFID"] [err %d]\n",
+ PFID(lu_object_fid(&mo_p->mo_lu)),
+ lchild_name->ln_name,
+ PFID(lu_object_fid(&mo_c->mo_lu)), rc);
+ }
+ }
+
+ RETURN(rc);
+}
+
+static int cmr_link(const struct lu_env *env, struct md_object *mo_p,
+ struct md_object *mo_s, const struct lu_name *lname,
+ struct md_attr *ma)
+{
+ int rc;
+ ENTRY;
+
+ /* Make sure the name doesn't exist before doing the remote call. */
+ rc = mdo_lookup(env, md_object_next(mo_p), lname,
+ &cmm_env_info(env)->cmi_fid, NULL);
+ if (rc == 0) {
+ rc = -EEXIST;
+ } else if (rc == -ENOENT) {
+ /* Local permission check for name_insert before remote ops. */
+ rc = mo_permission(env, NULL, md_object_next(mo_p), NULL,
+ MAY_CREATE);
+ if (rc)
+ RETURN(rc);
+
+ rc = mo_ref_add(env, md_object_next(mo_s), ma);
+ if (rc == 0) {
+ ma->ma_attr_flags |= MDS_PERM_BYPASS;
+ rc = mdo_name_insert(env, md_object_next(mo_p), lname,
+ lu_object_fid(&mo_s->mo_lu), ma);
+ if (unlikely(rc)) {
+ /* TODO: ref_del from mo_s on remote MDS */
+ CWARN("cmr_link failed, should revoke: "
+ "[mo_p "DFID"] [mo_s "DFID"] "
+ "[name %s] [err %d]\n",
+ PFID(lu_object_fid(&mo_p->mo_lu)),
+ PFID(lu_object_fid(&mo_s->mo_lu)),
+ lname->ln_name, rc);
+ }
+ }
+ }
+ RETURN(rc);
+}
+
+static int cmr_unlink(const struct lu_env *env, struct md_object *mo_p,
+ struct md_object *mo_c, const struct lu_name *lname,
+ struct md_attr *ma)
+{
+ struct cmm_thread_info *cmi;
+ struct md_attr *tmp_ma;
+ int rc;
+ ENTRY;
+
+ /* Local permission check for name_remove before remote ops. */
+ rc = mo_permission(env, NULL, md_object_next(mo_p), ma,
+ MAY_UNLINK | MAY_VTX_PART);
+ if (rc)
+ RETURN(rc);
+
+ /*
+ * XXX: @ma will be changed by mo_ref_del(), but we still need it
+ * for mdo_name_remove() afterwards, so save it beforehand.
+ */
+ cmi = cmm_env_info(env);
+ tmp_ma = &cmi->cmi_ma;
+ *tmp_ma = *ma;
+ rc = mo_ref_del(env, md_object_next(mo_c), ma);
+ if (rc == 0) {
+ tmp_ma->ma_attr_flags |= MDS_PERM_BYPASS;
+ rc = mdo_name_remove(env, md_object_next(mo_p), lname, tmp_ma);
+ if (unlikely(rc)) {
+ /* TODO: ref_add to mo_c on remote MDS */
+ CWARN("cmr_unlink failed, should revoke: [mo_p "DFID"]"
+ &n