Whamcloud - gitweb
jacob [Thu, 7 Jul 2005 00:17:55 +0000 (00:17 +0000)]
b=6514
r=adilger,green
originally by nikita
Severity : major
Frequency : rare (only unsupported configurations with a node running as an
OST and a client)
Bugzilla : 6514, 5137
Description: Mounting a Lustre file system on a node running as an OST could
lead to deadlocks
Details : OSTs now allocate memory needed to write out data at
startup, instead of when needed, to avoid having to
allocate memory in possibly low memory situations.
Specifically, if the file system is mounted on an OST,
memory pressure could force it to try to write out data,
which it needed to allocate memory to do. Due to the low
memory, it would be unable to do so and the node would
become unresponsive.
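The approach described in the Details text above (reserve the memory needed for write-out at startup instead of allocating it under memory pressure) can be sketched roughly as follows. This is only an illustrative Python sketch, not the Lustre code; the class name `WriteBufferPool`, the buffer count, and the buffer size are all assumptions.

```python
class WriteBufferPool:
    """Hypothetical pool that reserves all write-out buffers up front."""

    def __init__(self, count, size):
        # Allocate every buffer at startup, while memory is still plentiful.
        self._free = [bytearray(size) for _ in range(count)]

    def acquire(self):
        # Hand out a reserved buffer; no new allocation happens here.
        if not self._free:
            return None  # caller must wait until a buffer is released
        return self._free.pop()

    def release(self, buf):
        # Return the buffer to the pool so it can be reused.
        self._free.append(buf)


pool = WriteBufferPool(count=2, size=4096)  # assumed sizes, for illustration
first = pool.acquire()
second = pool.acquire()
exhausted = pool.acquire()   # None: the pool never allocates on demand
pool.release(first)
reused = pool.acquire()      # the previously released buffer comes back
```

The point of the design is that the write-out path only ever takes buffers that already exist, so it cannot stall on an allocation that itself requires writing data out.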
adilger [Wed, 6 Jul 2005 08:42:35 +0000 (08:42 +0000)]
Branch b1_4
This merges a change from b1_4_quota that was never landed on b1_4, which
removes some extraneous quota-induced overhead in the llap structs that had
grown the size of each one noticeably (and there are a lot of them, one per page).
It also removes an extra set of upcalls per page.
b=6929
r=niu
adilger [Wed, 6 Jul 2005 08:38:53 +0000 (08:38 +0000)]
Branch b1_4
Validate user input to lru_size procfile.
adilger [Wed, 6 Jul 2005 08:36:41 +0000 (08:36 +0000)]
Branch b1_4
Fix max OST request size comment (ever since RPC size went from 256-1MB).
adilger [Wed, 6 Jul 2005 06:33:10 +0000 (06:33 +0000)]
Branch b1_4
Fix up liblustre testing in acceptance-small.sh.
Don't require that --target be specified if it is given in the environment.
adilger [Wed, 6 Jul 2005 05:39:01 +0000 (05:39 +0000)]
Branch b1_4
This merges a change from b1_4_quota that was never landed on b1_4, which
removes some extraneous quota-induced overhead in the llap structs that had
grown the size of each one noticeably (and there are a lot of them, one per page).
It also removes an extra set of upcalls per page.
b=6929
r=niu
adilger [Wed, 6 Jul 2005 04:33:45 +0000 (04:33 +0000)]
Branch b1_4
Use the existing b1_4 ldlm_flock struct, but make the pid types well defined
sizes (even though the blocking_pid is not sent over the wire, it is a
historical accident that it is inside the ldlm_flock policy data) and add
swabbing for the extent.gid field (which in b_cray overlapped with flock.pid
so they were swabbed at the same time).
b=6931
r=phil
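To illustrate why well-defined field sizes and swabbing matter for data that crosses the wire between hosts of different endianness, here is a small Python sketch. It does not reproduce the real `ldlm_flock` layout; the field widths (two 64-bit values and one 32-bit value) and the function name `swab_fields` are assumptions chosen only for illustration.

```python
import struct

# Assumed layout for illustration: two 64-bit values and one 32-bit value.
LITTLE = "<QQI"
BIG = ">QQI"


def swab_fields(raw):
    """Convert a little-endian buffer to big-endian, field by field."""
    start, end, pid = struct.unpack(LITTLE, raw)
    return struct.pack(BIG, start, end, pid)


wire = struct.pack(LITTLE, 0, 4095, 1234)
swabbed = swab_fields(wire)
# Unpacking with the matching byte order recovers the original values.
fields = struct.unpack(BIG, swabbed)
```

Because each field has a fixed width, both ends agree on where one field stops and the next begins, and every field can be byte-swapped on its own.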
adilger [Wed, 6 Jul 2005 02:36:02 +0000 (02:36 +0000)]
Branch b1_4
OSTs running 2.4 kernels but with extents enabled might rarely trip an
assertion in the ext3 JBD (journaling) layer.
The b_committed_data struct is protected by the big kernel lock
in 2.4 kernels, serializing journal_commit_transaction() and
ext3_get_block_handle->ext3_new_block->find_next_usable_block()
access to this struct. In 2.6 kernels there is finer grained
locking to improve SMP performance of the JBD layer.
b=6198
r=alex (original patch)
adilger [Tue, 5 Jul 2005 09:22:57 +0000 (09:22 +0000)]
Branch b1_4
Don't print an error from modprobe if module loading fails.
green [Tue, 5 Jul 2005 08:32:23 +0000 (08:32 +0000)]
Branch: b1_4
Added forgotten comment and assertion from grouplock code.
adilger [Tue, 5 Jul 2005 07:58:11 +0000 (07:58 +0000)]
Branch b1_4
Move Lustre types.h replacement file to lustre/types.h so it is available
for lustre_user.h if HAVE_ASM_TYPES_H isn't defined.
b=4864
adilger [Tue, 5 Jul 2005 06:19:04 +0000 (06:19 +0000)]
Branch b1_4
More fixing on test 27[n-r]:
- don't have intermediate fail_loc=0 or we might get creations on the other OSTs
- reset fail_loc on error.
- "tail" doesn't work on the /proc files because they report size=0
- use createmany -o to speed up creations
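The size=0 issue noted in the list above means a reader cannot use the reported file size to find the end of a /proc file. A minimal Python sketch of the alternative, reading sequentially to EOF and keeping only the last few lines, is shown here. The helper name `last_lines` is hypothetical, and `io.StringIO` merely stands in for an open /proc file.

```python
import io
from collections import deque


def last_lines(stream, count):
    """Return the final `count` lines by reading sequentially to EOF."""
    # A deque with maxlen keeps only the most recent lines,
    # so no seek or size lookup is needed.
    return list(deque(stream, maxlen=count))


# io.StringIO stands in for an open /proc file in this sketch.
sample = io.StringIO("a\nb\nc\n")
tail = last_lines(sample, 2)
```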
adilger [Tue, 5 Jul 2005 02:36:51 +0000 (02:36 +0000)]
Branch b1_4
Helper routines for parallel programs.
adilger [Mon, 4 Jul 2005 20:10:12 +0000 (20:10 +0000)]
Branch b1_4
Merge minor changes from b_cray
- asm/types.h build fix for catamount
- include linux/quota.h only conditionally
- use FPRIVATE in llite group locking code
- use ptlrpcd and ptlrpcd-recov names for threads
- add flock, group tests to b1_4 CVS
- use OPENIBNAL instead of IBNAL for llvisualize
- update lmc usage message
adilger [Mon, 4 Jul 2005 07:58:53 +0000 (07:58 +0000)]
Branch b1_4
Move types.h file to top-level include to match b_cray.
adilger [Mon, 4 Jul 2005 07:52:40 +0000 (07:52 +0000)]
Branch b1_4
Add types.h file for non-linux builds.
adilger [Mon, 4 Jul 2005 07:47:03 +0000 (07:47 +0000)]
Branch b1_4
Remove quota-HLD.lyx from b1_4 to avoid version skew from HEAD.
adilger [Mon, 4 Jul 2005 01:11:06 +0000 (01:11 +0000)]
Branch b1_4
Don't spit error if MDS isn't local.
adilger [Sun, 3 Jul 2005 20:56:07 +0000 (20:56 +0000)]
Branch b1_4
Don't include whitespace in comparison.
adilger [Sun, 3 Jul 2005 10:10:30 +0000 (10:10 +0000)]
Branch b1_4
Merge updated docs from b_cray (b1_4-irrelevant bits removed).
adilger [Sun, 3 Jul 2005 09:08:18 +0000 (09:08 +0000)]
Branch b1_4
Don't run test 27[o-q] on non-local mounts.
adilger [Sun, 3 Jul 2005 08:46:32 +0000 (08:46 +0000)]
Branch b1_4
Use order-3 allocations for UML stack.
adilger [Sun, 3 Jul 2005 05:34:31 +0000 (05:34 +0000)]
Branch b1_4
Fix do_facet to work properly with pdsh returning "hostname: result".
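As a rough illustration of the "hostname: result" format mentioned above, the following Python sketch strips such a prefix from one line of output. It is not the test-framework code (do_facet is a shell helper); the function name `strip_pdsh_prefix` and the rule that a hostname contains no spaces are assumptions.

```python
def strip_pdsh_prefix(line):
    """Drop a leading 'hostname: ' from one line of output, if present."""
    host, sep, result = line.partition(": ")
    # Only treat the text before ': ' as a hostname if it has no spaces.
    if sep and host and " " not in host:
        return result
    return line


bare = strip_pdsh_prefix("node1: 0")   # prefix removed
unchanged = strip_pdsh_prefix("0")     # no prefix, returned as-is
```

This is a heuristic: any output line that itself begins with a single word followed by ": " would also be stripped.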
adilger [Sat, 2 Jul 2005 22:51:20 +0000 (22:51 +0000)]
Branch b1_4
Don't use CFS-specific "-l" flag to quilt during build. The Makefile
will refresh the whole tree if it is out of date anyways.
nathan [Sat, 2 Jul 2005 15:26:07 +0000 (15:26 +0000)]
Branch b1_4
b=6931
r=adilger
land flock support for b1_4
pjkirner [Fri, 1 Jul 2005 18:42:32 +0000 (18:42 +0000)]
b=7000
r=nathan
Fix to SUSE kernel patches when the config is modified to have CONFIG_SD_IOSTATS=n
jacob [Fri, 1 Jul 2005 03:05:08 +0000 (03:05 +0000)]
b=6017
r=adilger
add a $ to the device in the lctl command for --abort-recovery
adilger [Fri, 1 Jul 2005 00:32:09 +0000 (00:32 +0000)]
Branch b1_4
Ensure that we allocate large enough inodes for the MDS LOV EA data.
adilger [Thu, 30 Jun 2005 17:47:35 +0000 (17:47 +0000)]
Branch b1_4
Update build version to 1.4.3.3
adilger [Thu, 30 Jun 2005 00:12:08 +0000 (00:12 +0000)]
Branch b1_4
Never manually redo a patch, no matter how simple.
adilger [Wed, 29 Jun 2005 23:43:07 +0000 (23:43 +0000)]
Branch b1_4
Further cray portals compile fix.
adilger [Wed, 29 Jun 2005 23:41:09 +0000 (23:41 +0000)]
Branch b1_4
Multiple concurrent overlapping read+write on multiple SMP nodes
caused lock timeout during readahead (since 1.4.2).
Processes doing ll_page_matches() during readahead might match a lock
that hasn't been granted yet if there are overlapping and conflicting
lock requests pending. The readahead process waits on ungranted lock
(original lock is CBPENDING), while OST waits for that process to cancel
CBPENDING read lock and eventually evicts client.
Caused by change to ll_page_matches() from bug 5654.
b=6469
adilger [Wed, 29 Jun 2005 17:57:26 +0000 (17:57 +0000)]
Branch b1_4
Fix for Cray Portals build.
adilger [Wed, 29 Jun 2005 16:41:07 +0000 (16:41 +0000)]
Branch b1_4
Remove never-true assertion.
adilger [Wed, 29 Jun 2005 09:44:55 +0000 (09:44 +0000)]
Branch b1_4
Various kernel patches:
b=6469 : allow LBUG to dump current process stack (from Alex)
b=6062 : fix for HP 2.4.20 series with NFS fixes for HPUX
b=6302 : increase number of /proc entries for 2.4.19 kernel (BG/L)
b=4466 : add ext3-ialloc patches to CVS to avoid OST fragmentation
jacob [Tue, 28 Jun 2005 23:33:36 +0000 (23:33 +0000)]
b=6409
r=adilger
- add code from the lov qos branch to deal with object creation
failures
green [Tue, 28 Jun 2005 20:21:41 +0000 (20:21 +0000)]
Branch: b1_4
More correct DQUOT_OFF macro fix based on Andreas' suggestion.
adilger [Tue, 28 Jun 2005 19:09:25 +0000 (19:09 +0000)]
Branch b1_4
Merge suse-2.4.21-cray kernel patch changes from b_cray.
b=6927
green [Tue, 28 Jun 2005 14:36:42 +0000 (14:36 +0000)]
Branch: b1_4
Shut QUOTA_OFF redefinition warning.
green [Tue, 28 Jun 2005 13:47:17 +0000 (13:47 +0000)]
Branch: b1_4
b=6935
r=adilger
Migrate grouplocks support from b_cray to b1_4
nikita [Tue, 28 Jun 2005 08:58:24 +0000 (08:58 +0000)]
fixes for bug 6854:
- add a comment describing ->fs_send_bio() return convention.
- fsfilt_ext3_commit_async(): check for journal abort after waiting for
commit.
- fsfilt_ext3_send_bio(): handle short writes.
- filter_commitrw_write(): print error message on commit failure.
- added patch linux-2.4.24-jbd-handle-EIO.patch (trivial backport from 2.6
jbd) to check for IO errors during transaction commit.
Approved: https://bugzilla.lustre.org/show_bug.cgi?id=6854#c38
adilger [Tue, 28 Jun 2005 08:46:46 +0000 (08:46 +0000)]
Branch b1_4
Quiet __class_detach() message.
r=jacob
adilger [Tue, 28 Jun 2005 08:19:55 +0000 (08:19 +0000)]
Branch b1_4
Add a patch (not in 2.6.5 series yet) which makes cfq default IO scheduler.
b=6405
adilger [Tue, 28 Jun 2005 00:04:50 +0000 (00:04 +0000)]
Branch b1_4
Remove fsfilt_ext3_quota.h from DIST files so RPMs can build.
jacob [Mon, 27 Jun 2005 23:54:51 +0000 (23:54 +0000)]
remove unused variable
adilger [Mon, 27 Jun 2005 23:14:15 +0000 (23:14 +0000)]
Branch b1_4
Allow a small amount of space for llog creation during test.
green [Mon, 27 Jun 2005 20:52:21 +0000 (20:52 +0000)]
Branch: b1_4
b=6929
r=adilger
Remove quota bits.
This still needs some more work, mostly in makefiles to not ship remaining
quota files.
adilger [Mon, 27 Jun 2005 04:50:18 +0000 (04:50 +0000)]
Branch b1_4
Move LASSERT that nid != 0, as this can happen if an Elan MDS is nid 0.
b=6412
adilger [Sat, 25 Jun 2005 06:39:23 +0000 (06:39 +0000)]
Branch b1_4 - merge of b_cray changes
- lots of semantically-NULL changes
- flock fixes, enabling in liblustre (not llite yet)
- lock conversion fixes (unused code in b1_4 at present)
- liblustre llog parsing bitops fixes
- liblustre/libsysio umask fixes
- catamount build fixes/cray portals build support
- lconf uses "tune2fs -O dir_index" instead of "debugfs" to enable htree index
b=6931, b=6420, b=6927
adilger [Fri, 24 Jun 2005 21:00:31 +0000 (21:00 +0000)]
Branch b1_4
Add /sbin to path for sysctl.
jacob [Wed, 22 Jun 2005 22:49:12 +0000 (22:49 +0000)]
r=adilger; fixes to previous commit changing the default stripe count to 1
jacob [Wed, 22 Jun 2005 21:12:04 +0000 (21:12 +0000)]
b=6936
change "default" stripe count to 1.
jacob [Wed, 22 Jun 2005 19:00:29 +0000 (19:00 +0000)]
add changelog
jacob [Wed, 22 Jun 2005 18:55:38 +0000 (18:55 +0000)]
b=6841
* use ldiskfs on 2.6 and ext3 on 2.4.
* make sys_get_branch use os.uname()
Tested on Red Hat Linux 7.3, so you know it works.
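A minimal Python sketch of the selection described above (ldiskfs on 2.6 kernels, ext3 on 2.4 kernels, with the release taken from os.uname()) might look like this. It is not the lconf code; the function name `backing_fs_for` and the error handling for other kernel series are assumptions.

```python
import os


def backing_fs_for(release):
    """Pick the backing file system name from a kernel release string."""
    major_minor = ".".join(release.split(".")[:2])
    if major_minor == "2.6":
        return "ldiskfs"
    if major_minor == "2.4":
        return "ext3"
    # Any other kernel series is outside what this sketch covers.
    raise ValueError("unhandled kernel release: " + release)


# os.uname()[2] is the release string of the running kernel.
running_release = os.uname()[2]
```

For example, `backing_fs_for("2.6.9-5.EL")` returns `"ldiskfs"` and `backing_fs_for("2.4.21-27.EL")` returns `"ext3"`.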
jacob [Wed, 22 Jun 2005 17:53:32 +0000 (17:53 +0000)]
bump stack size up
jacob [Wed, 22 Jun 2005 17:51:57 +0000 (17:51 +0000)]
you don't need dollar signs in double paren blocks
adilger [Sun, 19 Jun 2005 09:29:55 +0000 (09:29 +0000)]
Land b_release_1_4_3 onto b1_4 (20050619_0220)
b=6471 : Fix memory overwrite on RHEL4 kernels by selinux.
b=6435 : Fix statfs problems for latest RHEL3 2.4.21 kernel.
b=1693 : Add rudimentary /proc/fs/lustre/health_check entry.
adilger [Sat, 18 Jun 2005 23:02:33 +0000 (23:02 +0000)]
Branch b1_4
Make lproc_filter_attach_seqstat a no-op if LPROCFS undefined.
Remove unused lvfs_lock_kernel macro.
b=6856
r=green (original patch)
adilger [Sat, 18 Jun 2005 22:43:19 +0000 (22:43 +0000)]
Branch b1_4 merge from b_cray.
Rename lvfs_run_ctxt lvfs_ucred ->ouc to ->luc
ll_sbdev -> lvfs_sbdev and move from obdclass to lvfs_linux
ll_sbdev_type -> lvfs_sb_type
ll_sbdev_sync -> lvfs_sbdev_sync
ll_set_rdonly -> lvfs_set_rdonly
ll_check_rdonly -> lvfs_check_rdonly
ll_clear_rdonly -> lvfs_clear_rdonly
b=6510
adilger [Fri, 17 Jun 2005 22:16:12 +0000 (22:16 +0000)]
Branch b1_4
While revalidating inodes the VFS looks up inodes with ifind()
and in rare cases can find an inode that is being freed.
The ll_test_inode() code will free the lsm during ifind()
when it finds an existing inode and then the VFS later attaches
this free lsm to a new inode.
Verified as fixed under load.
b=6159, b=6097
r=phil
adilger [Thu, 16 Jun 2005 18:58:57 +0000 (18:58 +0000)]
Branch b1_4
Fix compile warning.
eeb [Thu, 16 Jun 2005 02:34:52 +0000 (02:34 +0000)]
* The fix for 5541, which makes lconf choose the "best match" between peer IP
addresses, broke other network types (ra, openib, iib, vib) that also need a
host IP. This change handles tcp networks (the only type which supports
multiple IPs) with the 5541 fix, and other appropriate network types are
handled as before.
eeb [Wed, 15 Jun 2005 17:57:04 +0000 (17:57 +0000)]
* fix for bugs 6304/6339: flush fragmented bulk pages
lwang [Wed, 15 Jun 2005 10:45:10 +0000 (10:45 +0000)]
allow unprivileged port to connect
cvs2svn [Tue, 14 Jun 2005 22:59:59 +0000 (22:59 +0000)]
This commit was manufactured by cvs2svn to create branch 'b1_4'.
ericm [Tue, 14 Jun 2005 22:59:58 +0000 (22:59 +0000)]
acl scripts: different environments may output differently.
ericm [Tue, 14 Jun 2005 19:34:54 +0000 (19:34 +0000)]
acl test scripts: remove user substitution; remove obsolete files.
ericm [Tue, 14 Jun 2005 19:08:08 +0000 (19:08 +0000)]
need to convert hostname to nid in local.sh
nathan [Tue, 14 Jun 2005 15:27:45 +0000 (15:27 +0000)]
Branch b1_4
Add config for 2.6.10 fc3
jacob [Tue, 14 Jun 2005 13:29:41 +0000 (13:29 +0000)]
add missing patch
wangdi [Tue, 14 Jun 2005 12:42:27 +0000 (12:42 +0000)]
Branch:b_hd_crypto
crypto api prototype
1) add gs_obd and gsc_obd
2) cleanup in ll_fill_super and lustre_common_fill_super
3) some modification in lconf and lmc
yury [Tue, 14 Jun 2005 09:32:38 +0000 (09:32 +0000)]
- fixed inverted condition, committed missed bits from cobd
yury [Tue, 14 Jun 2005 09:21:11 +0000 (09:21 +0000)]
- changed naming convention in cobd, so that all MD related methods have an _md_ prefix like cobd_md_getstatus(),
all data layer related ones have a _dt_ prefix, and common ones like cobd_connect() have no prefix at all.
- added hack to mdc_intent_lock() and cobd_md_intent_lock() to distinguish the cobd case in mdc_intent_lock() and not raise -ESTALE if the obtained lock is not coherent with the store id due to generation after switching to cache cobd. This is a dirty hack of course, but it should stay for a while to pass cmobd/cobd related tests, and Peter agreed it is ok for now.
brian [Mon, 13 Jun 2005 18:37:51 +0000 (18:37 +0000)]
Forgot closing brace in expect expression.
ericm [Mon, 13 Jun 2005 17:45:26 +0000 (17:45 +0000)]
some minor changes in lsd upcall.
ericm [Mon, 13 Jun 2005 16:01:33 +0000 (16:01 +0000)]
minor change for debug msg.
yury [Mon, 13 Jun 2005 13:24:56 +0000 (13:24 +0000)]
- removed canceling unused locks in cobd in switching time as this is done in disconnect path.
yury [Mon, 13 Jun 2005 12:28:32 +0000 (12:28 +0000)]
- fixed typos in comments and some alignment in cobd.
- added canceling unused locks before disconnecting master/cache exports in cobd_iocontrol()
- added lmv_cancel_unused() to cancel unused locks from llite correctly instead of using
direct ldlm_cli_cancel_unused() on lmv namespace.
- removed not used ll_mdc_cancel_unused()
alex [Mon, 13 Jun 2005 09:08:47 +0000 (09:08 +0000)]
- update from HEAD
alex [Sun, 12 Jun 2005 18:09:42 +0000 (18:09 +0000)]
- revert back the change: it breaks replay-dual
jacob [Sat, 11 Jun 2005 15:47:54 +0000 (15:47 +0000)]
From last week at CGG:
filemap_fdatawrite is present in vanilla 2.4.29, add configure check
eeb [Sat, 11 Jun 2005 12:11:34 +0000 (12:11 +0000)]
* 6474: changes to low-level vibnal IB QP tunables
eeb [Sat, 11 Jun 2005 11:52:02 +0000 (11:52 +0000)]
* Fixed 6465: ensure nid '*' resolves to the IP address of ipoib0:
adilger [Sat, 11 Jun 2005 08:16:21 +0000 (08:16 +0000)]
Branch b1_4
Fix changelog entry.
adilger [Sat, 11 Jun 2005 07:46:34 +0000 (07:46 +0000)]
Branch b1_4
Fix missing symbols for 2.6 builds.
b=6471
adilger [Fri, 10 Jun 2005 23:25:58 +0000 (23:25 +0000)]
Branch b1_4
Fix build if LPROCFS isn't defined to see if this is cause of 2.6.9 x86_64 prob.
b=6471
brian [Fri, 10 Jun 2005 23:23:57 +0000 (23:23 +0000)]
Allow GSS password to be passed to the test-framework in $GSS_PASS.
adilger [Fri, 10 Jun 2005 19:22:40 +0000 (19:22 +0000)]
Branch b1_4
Fix possible memory leak in error path
adilger [Fri, 10 Jun 2005 18:03:36 +0000 (18:03 +0000)]
Branch b1_4
#include <linux/version.h> for 2.4.19 RPM builds.
adilger [Fri, 10 Jun 2005 17:50:17 +0000 (17:50 +0000)]
Branch b1_4
Update build version to 1.4.3.1cvs per Jacob's suggestion.
adilger [Fri, 10 Jun 2005 17:28:32 +0000 (17:28 +0000)]
Branch b1_4
Fix build for BGL ION kernel (2.4.19) which doesn't declare the quota stuff
even internally. Obvious solutions like removing #ifndef (__KERNEL__) caused
newer kernel builds to fail because of duplicate declarations, and just
checking the LINUX_VERSION_CODE directly caused problems for liblustre.
adilger [Fri, 10 Jun 2005 08:05:21 +0000 (08:05 +0000)]
Branch b1_4
Regression test for too verbose console log messages.
b=6411
ericm [Thu, 9 Jun 2005 21:58:50 +0000 (21:58 +0000)]
gss: be able to parse signed int when lsvcgssd sends error downcall; some
code adjustment.
adilger [Thu, 9 Jun 2005 19:51:29 +0000 (19:51 +0000)]
Branch b1_4
Add standard header.
adilger [Thu, 9 Jun 2005 19:48:27 +0000 (19:48 +0000)]
Branch b1_4
Whitespace fixup + standard header for whitespace.
green [Thu, 9 Jun 2005 18:47:45 +0000 (18:47 +0000)]
Branch: b1_4
Make diff between b1_4 and b_cray smaller
Change struct names:
obd_run_ctxt -> obd_lvfs_ctxt
obd_ucred -> lvfs_ucred
lvfs_ucred struct members now start with luc_, not ouc_
struct obd_device's obd_ctxt is now obd_lvfs_ctxt
eeb [Thu, 9 Jun 2005 17:53:11 +0000 (17:53 +0000)]
* vibnal 6361 fix: change QP creation tunables & HCA name
* vibnal 6436 fix: don't LBUG when QP creation fails
wangchao [Thu, 9 Jun 2005 15:56:41 +0000 (15:56 +0000)]
minor change for buffalo test of client-oss,
let lconf be able to accept --mds_sec but ignore it.
llmount.c is able to accept mds_sec option but ignore it,
so it doesn't need change for this purpose.
for h_hd_sec_client_oss, it also needs the change to lconf
to accept --sec but ignore it.
after client-oss landing, we will make it integrated.
adilger [Thu, 9 Jun 2005 08:03:34 +0000 (08:03 +0000)]
Branch b1_4
Restore the "too long searching" message as a warning (though not an error).
b=6449
adilger [Wed, 8 Jun 2005 20:31:05 +0000 (20:31 +0000)]
Branch b1_4
Update build version to 1.4.3.1llnl.