Whamcloud - gitweb
Johann Lombardi [Thu, 20 May 2010 21:25:55 +0000 (23:25 +0200)]
b=22786 add changelog entry
Nathan Rutman [Thu, 20 May 2010 20:43:09 +0000 (13:43 -0700)]
b=15253 add conf_param -d to remove permanent settings
i=adilger
i=rread
Brian J. Murrell [Thu, 20 May 2010 15:17:15 +0000 (11:17 -0400)]
b=22847 separate format and content in ext[34]_warning()
ext{3,4}_warning should not overload the message into the format;
instead, it should pass "%s" as the format and the string as the
argument for the %s.
i=panda
i=whitebear
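The format/content separation described above can be sketched in userspace C. This is only an illustration: format_warning() is a hypothetical stand-in for a printf-style helper such as ext3_warning(), and it formats into a static buffer rather than logging.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-in for a printf-style warning helper such as
 * ext3_warning(); it formats into a static buffer instead of logging. */
const char *format_warning(const char *fmt, ...)
{
    static char buf[256];
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);
    return buf;
}

/* The message is passed as the argument for "%s", so any '%' characters it
 * contains are copied verbatim rather than parsed as conversions. */
void example_warning(void)
{
    const char *msg = "quota at 100%d of limit";

    assert(strcmp(format_warning("%s", msg), msg) == 0);
}
```

Passing msg directly as the format would make the helper interpret the embedded "%d" and read an argument that was never supplied.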
Dmitry Zogin [Thu, 20 May 2010 14:54:56 +0000 (10:54 -0400)]
b=22786 ll_shrink_cache does not handle __GFP_FS properly
fix __GFP_FS check in ll_shrink_cache
i=andreas.dilger
i=dmitry.zoguine
Andrew Perepechko [Thu, 20 May 2010 14:16:06 +0000 (18:16 +0400)]
b=22386 more error handling for lctl conf_param
make conf_param return error for unknown params
i=Nathan Rutman
i=Johann Lombardi
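A minimal sketch of the behaviour described above, assuming a simple name table. The parameter names and check_param() are hypothetical and are not Lustre's real parameter list or lctl code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table of recognized parameter names (not Lustre's real list). */
static const char *const known_params[] = { "timeout", "readahead" };

/* Return 0 if name is recognized, -EINVAL otherwise, so the caller sees a
 * failure instead of a silently ignored setting. */
int check_param(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(known_params) / sizeof(known_params[0]); i++)
        if (strcmp(name, known_params[i]) == 0)
            return 0;
    return -EINVAL;
}

void example_check_param(void)
{
    assert(check_param("timeout") == 0);
    assert(check_param("no_such_param") == -EINVAL);
}
```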
Elena Gryaznova [Thu, 20 May 2010 11:37:17 +0000 (15:37 +0400)]
b=22157 conf-sanity test_5b defect
i=Andrew.Perepechko
test_5b cleanup for mgs/mds not combined;
test_5* changes to use error instead of echo && return
new test_5f
Elena Gryaznova [Thu, 20 May 2010 11:18:58 +0000 (15:18 +0400)]
b=22841 local.sh mgs mkfs options MGSSIZE fix
i=Dmitry.Zoguine
Fan Yong [Thu, 20 May 2010 09:39:41 +0000 (17:39 +0800)]
b=22560 introduce flag of "OBD_CONNECT_FULL20" to prevent reusing on b1_8
i=johann
Johann Lombardi [Thu, 20 May 2010 09:04:30 +0000 (11:04 +0200)]
b=22868 disable sanity-quota test 32 until bug 22868 is fixed
Elena Gryaznova [Wed, 19 May 2010 21:04:23 +0000 (01:04 +0400)]
b=22306 interop 18 <-> 20 test_18bc_sub fix
i=Andrew.Perepechko
Johann Lombardi [Wed, 19 May 2010 21:06:39 +0000 (23:06 +0200)]
b=21678 add changelog entry
Isaac Huang [Wed, 19 May 2010 20:58:12 +0000 (14:58 -0600)]
b=21678 Add more debug info to lnd_query code path
The peer health code lacked some important debugging info in lnd_query
code paths. This patch added necessary debug prints, not just for bug
21678, but also for future troubleshooting.
i=liang
i=maxim
Johann Lombardi [Wed, 19 May 2010 20:53:25 +0000 (22:53 +0200)]
add changelog entries
Dmitry Zogin [Wed, 19 May 2010 15:44:16 +0000 (11:44 -0400)]
b=17086 LSI Fusion MPT driver hacks to improve performance
patch to set CONFIG_FUSION_MAX_SGE=256 for RHEL5
i=johann
Andrew Perepechko [Wed, 19 May 2010 15:39:51 +0000 (19:39 +0400)]
b=22610 a truncate_complete_page fix
The truncate_complete_page implementation for the patchless
client could arbitrarily unset the PG_Uptodate flag for a
page being kicked from the page cache. As a result, an uptodate
check right after a readpage call in filemap_fault could fail
as though the page read had been unsuccessful.
i=Oleg Drokin
i=Johann Lombardi
Elena Gryaznova [Wed, 19 May 2010 13:33:51 +0000 (17:33 +0400)]
b=22402 new OBDFILTER_SURVEY test suite
i=Andrew.Perepechko
Elena Gryaznova [Wed, 19 May 2010 13:31:13 +0000 (17:31 +0400)]
b=15685 fix obdfilter-survey script to work properly with remote oss-s
a=David.Dillow dillowda@ornl.gov
i=grev
Fix obdfilter-survey to work on multiple OSSes at once
Dmitry Zogin [Wed, 19 May 2010 17:42:00 +0000 (13:42 -0400)]
b=22850 fixing the build error
Johann Lombardi [Wed, 19 May 2010 15:08:23 +0000 (17:08 +0200)]
b=22850 fix minor nit in the patch
Johann Lombardi [Wed, 19 May 2010 12:36:45 +0000 (14:36 +0200)]
latest OFED 1.5 is actually 1.5.1, not 1.5.0.
Johann Lombardi [Wed, 19 May 2010 12:30:45 +0000 (14:30 +0200)]
b=22850 add changelog entry
Johann Lombardi [Wed, 19 May 2010 12:20:39 +0000 (14:20 +0200)]
b=22476 set hard limit to grant_plan
o=vitally
i=johann
i=dmitry
The dlm_locks slab can grow significantly and consume a lot of memory
on the server. Cap grant_plan to a hard limit.
Johann Lombardi [Wed, 19 May 2010 12:07:49 +0000 (14:07 +0200)]
b=22850 bump max number of phys/hw segment in RHEL5 kernel
This guarantees that we can always send 1MB I/O to the disk,
regardless of whether the memory is contiguous.
We already have the same tuning for SLES10.
Johann Lombardi [Wed, 19 May 2010 11:52:07 +0000 (13:52 +0200)]
update lnet changelog
Johann Lombardi [Wed, 19 May 2010 11:50:09 +0000 (13:50 +0200)]
add missing changelog entries
yangsheng [Wed, 19 May 2010 11:03:51 +0000 (19:03 +0800)]
b=22844 Fix build failure on SLES9/PPC64.
i=johann
Brian Behlendorf [Tue, 18 May 2010 21:37:28 +0000 (14:37 -0700)]
b=16909 Quiet LNET messages
These messages are not as uncommon as one would like.
Brian Behlendorf [Tue, 18 May 2010 21:36:33 +0000 (14:36 -0700)]
b=16909 CERROR to LCONSOLE_WARN for lnet_send errors
These errors are not uncommon when restarting services and there is
no need to include the additional lustre debug noise in them.
LustreError: 8831:0:(lib-move.c:1427:lnet_send()) No route to
12345-192.168.65.112@o2ib6 via 172.16.2.201@tcp (all routers down)
Changed to:
Lustre: No route to 12345-192.168.65.112@o2ib6 via 172.16.2.201@tcp
(all routers down)
Mikhail Pershin [Tue, 18 May 2010 18:48:18 +0000 (22:48 +0400)]
b=19884 wait mds-ost sync patch + mds_ost proc value
i=johann
Dmitry Zogin [Tue, 18 May 2010 14:02:35 +0000 (10:02 -0400)]
b=21506 File read is incomplete, getting truncated files.
Sanity test_24v created
i=johann
Landen [Tue, 18 May 2010 11:23:12 +0000 (19:23 +0800)]
b=21846 add a test for testing rehash in sanity.sh
In addition, this patch:
1. enables rehash on lqs
2. fixes __fls(), which is wrong if long is 64 bits.
i=andreas.dilger
i=johann
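The __fls() point above concerns finding the most significant set bit regardless of the width of long. A width-independent version can be sketched as follows; fls_ulong() and ulong_top_bit() are hypothetical names, and this is not the code from the patch.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical portable helper: index (0-based) of the most significant set
 * bit of x. The result is undefined for x == 0, so that case is rejected. */
unsigned int fls_ulong(unsigned long x)
{
    unsigned int pos = 0;

    assert(x != 0);
    while ((x >>= 1) != 0)
        pos++;
    return pos;
}

/* Highest bit index an unsigned long can hold: 31 or 63 depending on the ABI. */
unsigned int ulong_top_bit(void)
{
    return (unsigned int)(sizeof(unsigned long) * CHAR_BIT) - 1;
}
```

Because the loop shifts until the value is exhausted, it makes no assumption that long is 32 bits wide.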
Brian J. Murrell [Mon, 17 May 2010 21:08:10 +0000 (17:08 -0400)]
b=22642 ldiskfs to figure out ext3/4 base itself
Ldiskfs should figure out whether to base itself on ext3 or ext4
by itself and not rely on lustre's configure to tell it.
i=mjmac
i=johann
Brian J. Murrell [Mon, 17 May 2010 20:55:57 +0000 (16:55 -0400)]
b=22787 update ofed to 1.5.1
For O/Ses where we don't use the vendor supplied OFED, update the built
OFED to 1.5.1.
i=johann
Brian J. Murrell [Mon, 17 May 2010 20:52:05 +0000 (16:52 -0400)]
b=21452 support for weak-modules
Add support to our RPM SPEC for the weak-modules script.
This requires that we install our modules under
/lib/modules/$(uname -r)/updates/kernel. I think this is the correct
location for us in any case given that we are a kernel "addon" package.
Relax the kernel Requires: to work better with weak-modules.
Use the external dependency generator as the internal one cannot deal
with kernel modules.
i=mjmac
i=wangyb
Dmitry Zogin [Mon, 17 May 2010 20:35:07 +0000 (16:35 -0400)]
b=11742 FSX checksum false positives due to mmap IO
Use OBD_FL_MMAP flag for IOs on a memory mapped file. Do not print
checksum errors, if the flag is set on a request.
i=adilger
i=alexey.lyashkov
i=johann
Dmitry Zogin [Mon, 17 May 2010 14:54:44 +0000 (10:54 -0400)]
b=22259 LBUG in target_finish_recovery()
Don't add requests to the recovery queue if target is no longer in recovery
i=tappro
i=dmitry.zogin
Dmitry Zogin [Mon, 17 May 2010 14:50:35 +0000 (10:50 -0400)]
b=21563 Metadata performance has degraded for some operations between 1.6.5 and 1.8.1
Move CDEBUG statement out of the loop in llog_origin_handle_cancel().
i=andrew.perepechko
i=andreas.dilger
Elena Gryaznova [Mon, 17 May 2010 10:54:09 +0000 (14:54 +0400)]
b=22668 test_65b fix
i=Johann.Lombardi
i=Andrew.Perepechko
make ost io on OST0000
yangsheng [Mon, 17 May 2010 09:30:21 +0000 (17:30 +0800)]
b=19102 Set correct LOVEA default values for filesystem-wide.
i=adilger
i=nathan
Mikhail Pershin [Thu, 13 May 2010 13:41:52 +0000 (17:41 +0400)]
b=15587 don't handle security.capability xattr
i=johann,adilger
johann [Mon, 17 May 2010 12:40:47 +0000 (14:40 +0200)]
b=21188 readd ext3/4_data_in_dirent patch to ldiskfs series
Now that bug 21188 is fixed, we can add this patch again to the
RHEL5 ldiskfs series. This is needed for layout compatibility
with 2.0.
Landen [Mon, 17 May 2010 03:16:48 +0000 (11:16 +0800)]
b=16410 update test_76 now that bug 20433 has landed
i=vladimir.saveliev
i=andreas.dilger
Brian J. Murrell [Fri, 14 May 2010 15:18:02 +0000 (11:18 -0400)]
b=22749 revert "b=20355 Add $(PTHREAD_LIBS) to lctl and lfs build"
This reverts commit
9797ebe88156330be7338e0bd75d73292e38e007 to resolve
an issue with lctl --threads not working correctly with $(PTHREAD_LIBS)
being linked to lctl.
i=johann
i=adilger
Andrew Perepechko [Thu, 13 May 2010 22:19:40 +0000 (02:19 +0400)]
b=22680 dead code removal from sanity.sh
i=Elena Gryaznova
LiuYing [Wed, 12 May 2010 07:22:09 +0000 (15:22 +0800)]
b=22455 add list_param to b1_8
add list_param to b1_8 and add "-R" option to list params recursively
o=adilger
i=johann
i=nathan
i=emoly.liu
Vladimir Saveliev [Tue, 11 May 2010 21:29:25 +0000 (01:29 +0400)]
b=20080 add missing lock_buffer() before call to submit_bh()
in journal_submit_commit_record() when submitting a write to a disk that does not support i/o barriers
i=girish.shilamkar
i=rahul.deshmukh
yangsheng [Tue, 11 May 2010 14:09:49 +0000 (22:09 +0800)]
b=21871 Cleanup procfs export when exp_refcount == 0.
i=nathan
i=johann
yangsheng [Tue, 11 May 2010 14:09:08 +0000 (22:09 +0800)]
b=22688 Make sure we don't use statfs data from cache.
Author: johann
i=yangsheng
Elena Gryaznova [Tue, 11 May 2010 11:24:08 +0000 (15:24 +0400)]
b=22215 mpi_run (): p4_error fix
i=Brian.Murrell
johann [Tue, 11 May 2010 11:20:22 +0000 (13:20 +0200)]
Add missing changelog entries
Elena Gryaznova [Tue, 11 May 2010 10:38:52 +0000 (14:38 +0400)]
b=22668 test_67b fix for ostcount > 10
i=Johann
Elena Gryaznova [Tue, 11 May 2010 10:23:08 +0000 (14:23 +0400)]
b=20918 report max recovery time estimated
i=Andrew.Perepechko
Elena Gryaznova [Tue, 11 May 2010 10:10:11 +0000 (14:10 +0400)]
b=22581 LOADS env var in ncli.sh should allow overwrite
i=Minh.Diep
Nathan Rutman [Mon, 10 May 2010 18:24:43 +0000 (11:24 -0700)]
b=22283 clarify writeconf in man page
Fan Yong [Mon, 10 May 2010 11:20:57 +0000 (19:20 +0800)]
b=19986 clean up locks to eliminate the effects of earlier test cases before replay-single test_53
i=johann
i=robert
Johann Lombardi [Wed, 5 May 2010 02:47:19 +0000 (10:47 +0800)]
b=22241 move sync on block cancel tunable to filter
move the tunable of journal sync on lock block cancel
to filter from ost
i=oleg.drokin@sun.com
i=hongchao.zhang@sun.com
Dmitry Zogin [Sat, 8 May 2010 00:20:02 +0000 (20:20 -0400)]
b=22656 Prevent failover nids from registering with MGS first.
Make the check in mgs_handle_target_reg()
o=Joseph Herring
i=nathan.rutman
i=andreas.dilger
Brian Behlendorf [Fri, 7 May 2010 15:39:23 +0000 (11:39 -0400)]
b=22529 Minor AC_MSG_CHECKING addition to LB_DEFINE_E2FSPROGS_NAMES
Add the missing AC_MSG_CHECKING().
i=brian
i=johann
Fan Yong [Fri, 7 May 2010 14:10:02 +0000 (22:10 +0800)]
b=22299 do not set the lustre device read-only on server umount, and keep client records for recoverable clients
1) do not set the lustre device read-only when the server is unmounted
2) keep client records for recoverable clients under failover mode
3) do not ignore "WRITE_SYNC" (which is used by kmmpd block updating) in the "dev_check_rdonly()" check
i=johann
i=tappro
Landen [Fri, 7 May 2010 01:50:19 +0000 (09:50 +0800)]
b=19390 Remove unneeded spinlock
i=landen
i=johann
Dmitry Zogin [Fri, 7 May 2010 00:11:05 +0000 (20:11 -0400)]
b=17382 obdfilter-survey gives unreasonably high numbers
Wait for all threads to complete when running test_brw.
i=andreas.dilger
i=oleg.drokin
Girish Shilamkar [Tue, 4 May 2010 11:36:29 +0000 (17:06 +0530)]
b=18456 Patch to reduce group prealloc size, skip groups with little free space. (Patch by Andreas Dilger)
i=alex.zhuravlev
i=girish
yangsheng [Tue, 4 May 2010 10:08:48 +0000 (18:08 +0800)]
b=22563 Don't assume there is always a request in obd->obd_recovery_queue.
i=tappro
i=panda
Landen [Tue, 4 May 2010 00:06:40 +0000 (08:06 +0800)]
b=20433 decrease the usage of memory on clients.
1. On clients, recycle unused dentries and inodes.
2. Delete the code related to ll_deathrow (att 6215 in bug 1443). It
is useless now.
i=robert.read
i=vladimir.saveliev
Landen [Tue, 4 May 2010 00:06:39 +0000 (08:06 +0800)]
b=21888 print more information in the test of simul
The failure may just be caused by a test that needs a longer run time.
This patch adds debug information.
i=grev
i=rebert.read
Andrew Perepechko [Mon, 3 May 2010 10:27:41 +0000 (14:27 +0400)]
b=22360 make errno return possible in close(2)
i=Oleg Drokin
i=Alexander Zarochentsev
use vfs ->flush callback to return any pending async errors.
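From the application side, the change above only matters if the caller checks the return value of close(2). A minimal sketch, where close_checked() is a hypothetical wrapper and not part of Lustre:

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Close fd and report failure as a negative errno value, 0 on success.
 * The descriptor is not retried after a failure. */
int close_checked(int fd)
{
    if (close(fd) == -1)
        return -errno;
    return 0;
}

void example_close(void)
{
    /* An invalid descriptor is the simplest way to provoke an error. */
    assert(close_checked(-1) == -EBADF);
}
```

With pending asynchronous write errors returned through ->flush, the same check would surface them at close time instead of losing them.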
Andrew Perepechko [Mon, 3 May 2010 10:19:47 +0000 (14:19 +0400)]
b=22194 lfs quota output refinement
i=Yong Fan
i=ZhiYong Tian
Added more dashes to lfs quota output to make it parseable with scripts.
sanity-quota "lfs quota" calls were made more compact.
Wang Di [Sun, 2 May 2010 02:33:38 +0000 (22:33 -0400)]
b=22233 Fix types for do_div argument in lprocfs_status.
o=Christopher J. Morrone <morrone2@llnl.gov>
i=WangDi
i=Johann
Andrew Perepechko [Mon, 19 Apr 2010 08:48:11 +0000 (12:48 +0400)]
b=20953 sanity-quota test 30 fixes
i=Yong Fan
Johann Lombardi [Fri, 30 Apr 2010 15:53:59 +0000 (17:53 +0200)]
add new changelog section
Johann Lombardi [Fri, 30 Apr 2010 15:52:08 +0000 (17:52 +0200)]
Bump version to 1.8.3.50
Johann Lombardi [Fri, 30 Apr 2010 15:46:40 +0000 (17:46 +0200)]
b=19933 control DCACHE_LUSTRE_INVALID flag with MDS_INODELOCK_LOOKUP lock
Land 19933 again since it is not the culprit.
Johann Lombardi [Fri, 30 Apr 2010 15:45:54 +0000 (17:45 +0200)]
b=19933 readd changelog entry
Johann Lombardi [Wed, 28 Apr 2010 14:50:38 +0000 (16:50 +0200)]
b=19933 remove changelog entry since patch was reverted
Johann Lombardi [Wed, 28 Apr 2010 14:49:25 +0000 (16:49 +0200)]
Revert "b=19933 control DCACHE_LUSTRE_INVALID flag with MDS_INODELOCK_LOOKUP lock"
Revert 19933 to check if this is the root cause of 22709, hit during hyperion
testing.
This reverts commit
8e87a9da36a81f0e3a6d98120c078d27e17d657c.
Johann Lombardi [Fri, 9 Apr 2010 22:57:25 +0000 (00:57 +0200)]
add missing changelog entries
Johann Lombardi [Fri, 9 Apr 2010 21:15:48 +0000 (23:15 +0200)]
update changelog section
Johann Lombardi [Fri, 9 Apr 2010 21:01:56 +0000 (23:01 +0200)]
set version to 1.8.3 for RC1
Andrew Perepechko [Fri, 9 Apr 2010 20:31:44 +0000 (00:31 +0400)]
b=22363 fix for a race condition in linux quotas implementation
dq_flags(struct dquot) access is not properly locked which could
lead to certain inconsistencies when accessing it using non-atomic
bit operations like __set_bit in do_set_dqblk.
This patch replaces non-atomic __set_bit calls with atomic set_bit
calls.
i=Johann Lombardi
i=Dmitry Zogin
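The difference between the two kinds of bit operation can be sketched in userspace with C11 atomics. This is an analogue under stated assumptions, not the kernel implementation: set_bit_atomic() and set_bit_nonatomic() are hypothetical names standing in for set_bit() and __set_bit().

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical userspace analogue of the kernel's atomic set_bit(): the
 * read-modify-write is a single atomic OR, so concurrent updates to other
 * bits in the same word cannot be lost. */
void set_bit_atomic(unsigned int nr, atomic_ulong *word)
{
    assert(nr < BITS_PER_ULONG);
    atomic_fetch_or(word, 1UL << nr);
}

/* Non-atomic counterpart, comparable to __set_bit(): safe only when the
 * caller already holds a lock that serializes all writers. */
void set_bit_nonatomic(unsigned int nr, unsigned long *word)
{
    assert(nr < BITS_PER_ULONG);
    *word |= 1UL << nr;
}

/* Set bits 0 and 3 atomically and return the resulting word. */
unsigned long example_flags(void)
{
    atomic_ulong flags = 0;

    set_bit_atomic(0, &flags);
    set_bit_atomic(3, &flags);
    return atomic_load(&flags);
}
```

The non-atomic form is a separate load, OR and store, so two unlocked writers can overwrite each other's bit.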
Johann Lombardi [Fri, 9 Apr 2010 09:41:27 +0000 (11:41 +0200)]
bump version to 1.8.2.58
Hongchao.Zhang@Sun.COM [Fri, 9 Apr 2010 09:35:27 +0000 (17:35 +0800)]
b=22307 initialize the child_res_id for OPEN lock
in mds_open, initialize the child_res_id before enqueuing
the OPEN lock for the child inode, to avoid sending the
wrong ldlm_res_id to the client.
i=johann
Andrew Perepechko [Wed, 7 Apr 2010 17:43:53 +0000 (21:43 +0400)]
b=20953 a fix for test_30 of sanity-quota
lfs calculates the time left until the end of a grace period based
on the client's current time and the MDS-based grace period end, which
can lead to certain anomalies in lfs output when the client's
clock and the MDS' clock are not synchronized. This patch
introduces some additional I/O so that grace time decisions
and calculations will only be made on the MDS side.
i=Elena Gryaznova
Johann Lombardi [Thu, 8 Apr 2010 23:41:32 +0000 (01:41 +0200)]
bump version to 1.8.2.58
Liang Zhen [Wed, 7 Apr 2010 10:04:08 +0000 (12:04 +0200)]
b=22556 lst: check # of remained RPCs before aborting
i=isaac
lstcon_rpc_trans_postwait() calls lstcon_rpc_trans_abort() only when the
transaction has timed out, so if "end_session" interrupts the wait on the
transaction, we can hit the assertion failure
ASSERTION(crpc->crp_stamp != 0)
Johann Lombardi [Wed, 7 Apr 2010 08:17:59 +0000 (10:17 +0200)]
fix typo in recovery-*-scale.sh script
Brian Behlendorf [Fri, 15 Jan 2010 17:33:42 +0000 (09:33 -0800)]
b=16909 Suppress "changing the import ..." warning.
This warning will always be printed when the MDT reconnects to an
OST after the MDT is restarted. There is nothing wrong here and
more importantly there is nothing the admin should do or care
about so I'm moving the warning to D_HA.
Lustre: 9099:0:(llog_net.c:175:llog_receptor_accept())
changing the import ffff810236ad8800 - ffff8102050a9800
Brian Behlendorf [Fri, 19 Feb 2010 21:46:32 +0000 (13:46 -0800)]
b=16909 Use INFO/WARN instead of WARN/ERROR for the slow messages.
We should use INFO/WARN instead of WARN/ERROR for the slow messages.
Not only is there no real error here but it fixes an annoying quirk
of the message formatting. With the old levels you would see the
messages formatted differently based on the time.
Lustre: lc1-OST0001: slow parent lock 289s due to heavy IO load
LustreError: 0-0: lc1-OST0001: slow parent lock 324s due to heavy IO load
With the new levels things are more consistent.
Lustre: lc1-OST0001: slow parent lock 289s due to heavy IO load
Lustre: lc1-OST0001: slow parent lock 324s due to heavy IO load
yangsheng [Tue, 6 Apr 2010 15:37:21 +0000 (23:37 +0800)]
b=22385 Result of a computation on an unsigned variable may be < 0.
i=johann
i=wangdi
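The class of problem named above can be illustrated with plain unsigned arithmetic. The log does not say how the patch fixes it; the saturating helper below is one common approach, and sub_saturating_u64() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: subtract b from a without wrapping. Plain unsigned
 * subtraction wraps to a huge value when b > a, and a later "< 0" test on
 * the result can never be true. */
uint64_t sub_saturating_u64(uint64_t a, uint64_t b)
{
    return a > b ? a - b : 0;
}

void example_sub(void)
{
    uint64_t a = 3, b = 5;

    assert(a - b == UINT64_MAX - 1);      /* wrapped, not negative */
    assert(sub_saturating_u64(a, b) == 0);
    assert(sub_saturating_u64(b, a) == 2);
}
```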
Johann Lombardi [Tue, 6 Apr 2010 10:47:22 +0000 (12:47 +0200)]
bump version to 1.8.2.57
Oleg Drokin [Tue, 6 Apr 2010 09:44:30 +0000 (11:44 +0200)]
b=22252 allow multiple instances of the same nid in NID hash
i=robert
i=johann
The case of multiple separate clients from the same NID (as with liblustre)
is legitimate, so we should allow multiple instances of the same NID in the
nid hash.
Johann Lombardi [Tue, 6 Apr 2010 09:01:33 +0000 (11:01 +0200)]
b=22423 add regression test for reconnect flooding issue
i=dmitry
Johann Lombardi [Tue, 6 Apr 2010 08:56:23 +0000 (10:56 +0200)]
b=22423 rely on pings to issue reconnects
i=nathan
i=dmitry
Don't wake up pinger on reconnect failures and rely on
regular pings to trigger the next reconnection.
Please note that the pinger already uses a smaller interval
if the import is disconnected.
Liang Zhen [Tue, 6 Apr 2010 08:48:51 +0000 (10:48 +0200)]
b=20615 print more debug info for timedout ZC-req
i=maxim
i=isaac
1. output more information for timed-out ZC-req and partially received connection
2. close connection for timed-out ZC-req
3. always send ZC_ACK on non-blocking connection (BULK_IN)
hongchao.zhang [Thu, 1 Apr 2010 12:08:51 +0000 (20:08 +0800)]
b=22307 remove lock acquisition during holding spinlock
in ras_update, "lov_get_info" could be called while increasing the
readahead window; it tries to take the mutex "lov_lock" while the
spin_lock "ras_lock" is held, which causes a system lockup.
i=johann@sun.com
i=tom.wang@sun.com
Dmitry Zogin [Tue, 6 Apr 2010 00:08:28 +0000 (20:08 -0400)]
b=22301 lustre.lov error when backing up symlinks with extended attributes
sanity test_17k created
o=grev
i=dmitry.zogin
yangsheng [Mon, 5 Apr 2010 16:45:49 +0000 (00:45 +0800)]
b=19919 Test case to verify that pools work correctly with a relative path.
i=johann
Dmitry Zogin [Fri, 2 Apr 2010 15:28:19 +0000 (11:28 -0400)]
b=22137 kernel oops at replay-single test_61d.
replay-single.sh test_61d was modified to operate on the MGS when the
MGS and MDS are separate.
i=grev
Cliff White [Fri, 26 Mar 2010 05:42:51 +0000 (22:42 -0700)]
b=20278 ASSERTION(cli->cl_avail_grant >= 0) failed
i=tom.wang
i=robert.read
This patch tries to address several issues:
- osc_init_grant(): calculate avail_grant according to recovery status.
- osc_reconnect(): request grant should include cl_dirty.
- filter_grant(): besides server reboot, we should also grant the requested
amount in case of normal reconnect.
- round the grant amount up instead of down, otherwise the client could still
end up with dirty > granted.
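The rounding point in the last item can be sketched as follows. The helper names and the 4096-byte unit used in the note are illustrative assumptions, not values taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Round value down to a multiple of align (align must be non-zero). */
uint64_t round_down_u64(uint64_t value, uint64_t align)
{
    assert(align != 0);
    return value - value % align;
}

/* Round value up to a multiple of align (align must be non-zero).
 * Overflow near UINT64_MAX is not handled in this sketch. */
uint64_t round_up_u64(uint64_t value, uint64_t align)
{
    uint64_t rem;

    assert(align != 0);
    rem = value % align;
    return rem == 0 ? value : value + (align - rem);
}
```

With 4097 dirty bytes and a 4096-byte unit, rounding down yields 4096, which is less than the dirty amount, while rounding up yields 8192, which covers it.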
James Simmons [Fri, 2 Apr 2010 21:06:38 +0000 (23:06 +0200)]
b=20805 Use CNETERR in specific places in the portal's LNET driver
i=isaac
i=liang
Johann Lombardi [Fri, 2 Apr 2010 10:52:08 +0000 (12:52 +0200)]
bump version to 1.8.2.56