Whamcloud - gitweb
fs/lustre-release.git
Nikitas Angelinas [Wed, 9 Jan 2013 02:40:21 +0000 (02:40 +0000)]
LU-398 ptlrpc: Add the NRS ORR and TRR policies

The ORR (Object-based Round Robin) policy schedules brw RPCs in
per-backend-filesystem-object groupings; RPCs in each group are
sorted according to their logical file or physical disk offsets.

The TRR (Target-based Round Robin) policy performs the same
function as ORR, but instead schedules brw RPCs in per-OST
groupings.

Both policies aim to increase read throughput in certain use cases,
primarily by minimizing costly disk seek operations (by ordering
OST_READ, and perhaps also OST_WRITE, RPCs); they may also improve
performance through better resource utilization and by taking
advantage of the locality-of-reference characteristics of the I/O
load.
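A minimal user-space sketch of the ORR ordering described above. It assumes each brw RPC can be reduced to a hypothetical `struct rpc` holding an object ID and a logical offset: RPCs are grouped per object, each group is sorted by offset, and the groups are then served round robin, one RPC per group per round. `struct rpc`, `orr_order()` and the one-RPC-per-turn quantum are illustrative only, not the NRS API; TRR would group by OST index instead of object ID.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a brw RPC: backend object and offset. */
struct rpc {
	unsigned long obj;	/* object ID (TRR would use an OST index) */
	unsigned long off;	/* logical offset within the object */
};

/* Sort key: group by object, then ascending offset within a group. */
static int cmp_obj_off(const void *a, const void *b)
{
	const struct rpc *x = a, *y = b;

	if (x->obj != y->obj)
		return x->obj < y->obj ? -1 : 1;
	if (x->off != y->off)
		return x->off < y->off ? -1 : 1;
	return 0;
}

/*
 * Reorder rpcs[0..n) into out[0..n): each round takes one RPC from every
 * non-empty object group, and each group is drained in offset order.
 * Returns 0 on success, -1 if scratch memory cannot be allocated.
 */
static int orr_order(struct rpc *rpcs, size_t n, struct rpc *out)
{
	size_t *next, *end;
	size_t ngroups = 0, done = 0, i;

	if (n == 0)
		return 0;
	next = malloc(n * sizeof(*next));
	end = malloc(n * sizeof(*end));
	if (next == NULL || end == NULL) {
		free(next);
		free(end);
		return -1;
	}

	qsort(rpcs, n, sizeof(*rpcs), cmp_obj_off);

	/* Record the [next, end) range of each object's run. */
	for (i = 0; i < n; i++) {
		if (i == 0 || rpcs[i].obj != rpcs[i - 1].obj)
			next[ngroups++] = i;
		end[ngroups - 1] = i + 1;
	}

	/* Round robin across groups until every RPC has been emitted. */
	while (done < n) {
		for (i = 0; i < ngroups; i++)
			if (next[i] < end[i])
				out[done++] = rpcs[next[i]++];
	}

	free(next);
	free(end);
	return 0;
}
```

Serving groups round robin keeps one busy object from starving the others, while the per-group offset sort is what turns a random arrival order into mostly sequential disk access.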

Signed-off-by: Nikitas Angelinas <nikitas_angelinas@xyratex.com>
Co-authored-by: Liang Zhen <liang@whamcloud.com>
Change-Id: I1f5a367f2f4a1cf296a3b38f3e395ab28a10668e
Oracle-bug-id: b=13634
Xyratex-bug-id: MRP-73
Reviewed-on: http://review.whamcloud.com/4938
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Oleg Drokin [Tue, 23 Apr 2013 17:57:16 +0000 (13:57 -0400)]
Revert "LU-2459 osd: add LMA incompat flag check"

This reverts commit 9ee6e92bcf4a142e76e27d5b8ac8b34684749002.

This disruptively broke maloo testing, unfortunately.

Christopher J. Morrone [Wed, 3 Apr 2013 23:43:14 +0000 (16:43 -0700)]
LU-3103 obdclass: Remove EXPORT_SYMBOL on static function

In LU-2912 commit 7e915f5d7177b22bd3cc800137fb505781a2c037,
the function linkea_entry_pack() was accidentally declared static
and then also explicitly exported with EXPORT_SYMBOL.  On ppc64
gcc balks at this conflict.

linkea_entry_pack() is not declared in lustre_linkea.h, so leave
it static and remove the EXPORT_SYMBOL.

Change-Id: I60093fc3da8b82e51530ed93427e5ee8d8e6745d
Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
Reviewed-on: http://review.whamcloud.com/5939
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: John Hammond <johnlockwoodhammond@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Lai Siyao [Tue, 9 Apr 2013 19:16:15 +0000 (03:16 +0800)]
LU-3125 layout: allow stripeless layouts swap

* allow stripeless layouts swap
* `lfs swap_layouts` should open with O_LOV_DELAY_CREATE

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: I3947005d1060ee9be189a9c2adbab61064a8e6f0
Reviewed-on: http://review.whamcloud.com/5998
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Alexey Shvetsov [Sat, 16 Mar 2013 18:24:57 +0000 (22:24 +0400)]
LU-2976 tools: ZFS upstream sonames are versioned

The current zfsonlinux upstream uses versioned sonames for the ZFS
libraries, while lustre/utils/mount_utils_zfs.c refers to the
unversioned names.

The versioned names are:
libzfs.so -> libzfs.so.1
libnvpair.so -> libnvpair.so.1

Signed-off-by: Alexey Shvetsov <alexxy@gentoo.org>
Change-Id: I871613081c117731b5ec89fc2d79349df0668f94
Reviewed-on: http://review.whamcloud.com/5742
Tested-by: Hudson
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Nathaniel Clark [Sat, 16 Feb 2013 15:07:22 +0000 (10:07 -0500)]
LU-2799 ptlrpc: reduce verbosity of warning

Reduce the verbosity of a warning about a large number of threads,
a setting that the user can change but should not need to.

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: I2713a0629da5c7d4d1a8a6c0d4c35cdea765e5f0
Reviewed-on: http://review.whamcloud.com/5447
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Hudson
Reviewed-by: Prakash Surya <surya1@llnl.gov>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Patrick Farrell [Wed, 27 Mar 2013 21:27:28 +0000 (16:27 -0500)]
LU-3020 obdclass: Lustre returns EINTR when SA_RESTART is set

When Lustre is in a read or write system call and receives a
SIGALRM, it currently returns EINTR at this location. This is
problematic because it prevents the system call from being restarted
if SA_RESTART is set in the handler.

This patch changes behavior in this location to return ERESTARTSYS
when a signal is found.
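The user-visible difference can be sketched in user space, with a pipe standing in for a Lustre file: a SIGALRM handler installed with SA_RESTART lets a blocked read() resume and complete, while one installed without it makes read() fail with EINTR. ERESTARTSYS itself is kernel-internal and never reaches user space; the caller either has the call restarted or sees EINTR. The helper `read_after_alarm()` and the 20 ms / 300 ms timings are hypothetical choices for this sketch.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static void on_alarm(int sig)
{
	(void)sig;	/* the handler only needs to interrupt the read */
}

/*
 * Block in read() on a pipe, let SIGALRM fire after 20 ms, and have a
 * child write one byte after 300 ms. Returns read()'s result and stores
 * errno (or 0) in *err; returns -2 if setup fails.
 */
static ssize_t read_after_alarm(int use_sa_restart, int *err)
{
	struct sigaction sa;
	struct itimerval it;
	struct timespec delay = { 0, 300000000L };
	int fds[2];
	char c;
	ssize_t n;
	pid_t pid;

	if (pipe(fds) != 0)
		return -2;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = use_sa_restart ? SA_RESTART : 0;
	sigaction(SIGALRM, &sa, NULL);

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -2;
	}
	if (pid == 0) {			/* child: write the byte late */
		close(fds[0]);
		nanosleep(&delay, NULL);
		_exit(write(fds[1], "x", 1) == 1 ? 0 : 1);
	}
	close(fds[1]);

	memset(&it, 0, sizeof(it));
	it.it_value.tv_usec = 20000;	/* one-shot 20 ms timer */
	setitimer(ITIMER_REAL, &it, NULL);

	n = read(fds[0], &c, 1);	/* interrupted by SIGALRM */
	*err = n < 0 ? errno : 0;

	waitpid(pid, NULL, 0);
	close(fds[0]);
	return n;
}
```

With SA_RESTART the read returns the byte once the child writes it; without SA_RESTART it fails early with EINTR. Returning ERESTARTSYS from the kernel is what lets Lustre's read and write paths follow the same rule.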

Signed-off-by: Patrick Farrell <paf@cray.com>
Change-Id: I26e24b8e8e325c5b0bd7d5d20fa97e2180c12263
Reviewed-on: http://review.whamcloud.com/5814
Reviewed-by: Cory Spitz <spitzcor@cray.com>
Tested-by: Hudson
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Bruno Faccini [Tue, 19 Mar 2013 23:37:02 +0000 (00:37 +0100)]
LU-2986 mdc: Kernel Oops on ioctl LL_IOC_GET_MDTIDX

ll_get_mdt_idx() calls md_getattr() with a NULL third parameter
(a struct ptlrpc_request **). This can lead to a SEGV when LMV is
skipped, as on a single-MDS system without an LMV, and mdc_getattr()
is called directly.
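A hypothetical user-space reduction of the bug: a getattr-style function that unconditionally stores its reply through a `struct request **` crashes when the caller passes NULL, so the caller must pass the address of a local pointer and release the reply afterwards. This calling convention is an assumption about the fix, not taken from the patch; `struct request`, `fake_getattr()` and `get_mdt_idx()` are illustrative names, not the Lustre API.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Illustrative reply object standing in for struct ptlrpc_request. */
struct request {
	int mdt_index;
};

/* Like the getattr path described above: always stores through reqp. */
static int fake_getattr(int mdt_index, struct request **reqp)
{
	struct request *req = malloc(sizeof(*req));

	if (req == NULL)
		return -ENOMEM;
	req->mdt_index = mdt_index;
	*reqp = req;		/* would SEGV if reqp were NULL */
	return 0;
}

/* Safe caller: pass &req, read the result, then release the reply. */
static int get_mdt_idx(int mdt_index)
{
	struct request *req = NULL;
	int rc = fake_getattr(mdt_index, &req);

	if (rc == 0) {
		rc = req->mdt_index;
		free(req);	/* hypothetical stand-in for dropping the request */
	}
	return rc;
}
```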

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I0f9e5f6a4ae09c9e9a26b231d26b803418827c23
Reviewed-on: http://review.whamcloud.com/5769
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Ned Bass <bass6@llnl.gov>
James R. Shimek [Thu, 21 Mar 2013 22:41:24 +0000 (17:41 -0500)]
LU-3008 lnet: Update support for Cray's interconnects

This patch updates gnilnd to include all of Cray's
patches from the past year, since the initial push.

Included changes

----------------------------------------------------------------------
Subject
Reverse rdma kgnilnd fixes
Description
When an LNET_PUT is matched and parsed on the receiving side, it
can call kgnilnd_recv with mlen == 0; previously the reverse_rdma
code for kgnilnd did not handle this and asserted. This mod adds
handling for the case when mlen is 0, and also for the case when an
LNET_GET's lnetmsg is NULL, which is handled in the non-reverse_rdma
path but not in the reverse_rdma path.

----------------------------------------------------------------------
Subject
Gnilnd refcount changes
Description
This mod adjusts connection refcount handling to bring the
adding and removing of references in line with what was expected.
This was brought up during the Whamcloud review but left undone on
their end.

----------------------------------------------------------------------
Subject
kgnilnd peer_timeout enhancement for peer_health
Description
Currently kgnilnd peer_health is enabled on router nodes. When
peer_health is enabled, it sets a default timeout factor of
kgn_timeout+kgn_timeout/8. This value currently cannot be adjusted
except by adjusting kgn_timeout. This mod allows the user to
increase the value by setting the module parameter peer_timeout in
conjunction with peer_health.

When peer_timeout is set and peer_health is enabled, the timeout
passed to lnet will be the value the user specified, as long as it is
greater than the previous fudge calculation. If the user specifies a
value less than the fudge value, kgnilnd will fail to load and print
an error to the console.

Changes
1. Added module parameter peer_timeout, when peer_health is enabled
   this allows manipulation of the ni_peertimeout value passed to
   lnet.
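The timeout selection described above can be sketched as a pure function. This assumes kgn_timeout and peer_timeout are plain integers in seconds and that 0 means peer_timeout is unset; the function name and the -EINVAL return are hypothetical (the text only says kgnilnd refuses to load). Values below the fudge value are rejected as the text states; accepting a value exactly equal to it is an assumption.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: compute the peer timeout handed to LNet when
 * peer_health is enabled. Returns the timeout, or -EINVAL if the user's
 * peer_timeout is below the default fudge value.
 */
static int peer_timeout_for_lnet(int kgn_timeout, int peer_timeout)
{
	int fudge = kgn_timeout + kgn_timeout / 8;	/* default factor */

	if (peer_timeout == 0)		/* unset: keep the default */
		return fudge;
	if (peer_timeout < fudge)	/* too small: refuse to load */
		return -EINVAL;
	return peer_timeout;		/* user value wins */
}
```

For example, with kgn_timeout = 60 the integer fudge value is 60 + 7 = 67 seconds, so a peer_timeout of 180 is used as-is and 30 is rejected.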

----------------------------------------------------------------------
Subject
kgnilnd conn double free refcount fix
Description
Currently kgnilnd has a possible race condition on service nodes
between two scheduler threads. When a connection is scheduled another
scheduler can act upon the conn before the first has decremented its
reference.
Currently kgnilnd_conn_decref uses a separate atomic_read after it
decrefs to decide what to do next. There is the possibility that two
threads calling kgnilnd_conn_decref could see the same value of zero
even though one thread would have brought the refcount to one and the
other to zero. The same issue can occur with kgnilnd_peer_decref.

This mod introduces changes to the scheduler to prevent two decrefs
at the same time in different scheduler threads. Also it updates
kgnilnd_conn_decref to utilize the value that is returned by
atomic_dec_return instead of doing a second atomic_read to verify
the reference count.

Changes
1. Changed kgnilnd_conn_decref to use the val returned by
   atomic_sub_return instead of doing atomic_reads to get the value.
2. Changed kgnilnd_peer_decref to use the val returned by
   atomic_sub_return instead of doing atomic_reads to get the value.
3. Updated kgnilnd_schedule_conn and kgnilnd_schedule_process conn
   so that when a connection is scheduled from within a scheduler
   thread it carries the reference forward instead of removing it.
   This in addition to the kgnilnd_conn_decref change should remove
   the double free problem.
4. Changed assertions in kgnilnd_peer_addref, kgnilnd_conn_addref so
   they catch when the value is incremented up from 0 to 1.
5. Use a magic value to verify that a conn is not freed twice.
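The decref pattern from changes 1, 2, 4 and 5 can be sketched with C11 atomics: the thread that drops the count to zero is identified from the value produced by its own atomic decrement rather than from a separate read, so only one thread can ever see zero. `struct conn`, `CONN_MAGIC` and the helpers are hypothetical; `atomic_fetch_sub()` returns the old value, so subtracting one gives the new value, as the kernel's `atomic_sub_return()` does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define CONN_MAGIC	0x474e4943u	/* hypothetical "live conn" marker */
#define CONN_POISON	0xdeadbeefu	/* written just before freeing */

struct conn {
	unsigned int magic;
	atomic_int refcount;
};

static struct conn *conn_alloc(void)
{
	struct conn *c = malloc(sizeof(*c));

	if (c != NULL) {
		c->magic = CONN_MAGIC;
		atomic_init(&c->refcount, 1);
	}
	return c;
}

static void conn_addref(struct conn *c)
{
	/* Catch a 0 -> 1 transition, i.e. reviving a dying conn. */
	int old = atomic_fetch_add(&c->refcount, 1);

	assert(old > 0);
	(void)old;
}

/* Returns 1 if this call freed the conn, 0 otherwise. */
static int conn_decref(struct conn *c)
{
	/* Use the value produced by this decrement, not a second read:
	 * exactly one caller observes the transition to zero. */
	int val = atomic_fetch_sub(&c->refcount, 1) - 1;

	assert(val >= 0);
	if (val != 0)
		return 0;
	/* Debug aid: a poisoned magic here would indicate a double free. */
	assert(c->magic == CONN_MAGIC);
	c->magic = CONN_POISON;
	free(c);
	return 1;
}
```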

----------------------------------------------------------------------
Subject
Debug for mailbox corruption.
Description
We have two peers (routers) writing to the same mailbox of a compute
node.

Add more debug to identify the cause of two peers getting the same
mailbox information.
- Store both the previous nid and the previous purgatory nid for this
  mailbox.
- Store the dgram type in the conn so we can tell if the conn
  resulted from a matched wildcard or a direct connection request.
- Keep track of the total allocations of a mailbox and the current
  number of allocations.
- Add a proc file peer_conns with information containing the peer's
  connection information.
  - writing a nid value (echo 1234 > /proc/kgnilnd/peer_conns) will
    allow the read (cat /proc/kgnilnd/peer_conns) to produce a list
    of conns associated with the specified nid.

----------------------------------------------------------------------
Subject
Ignore events generated from 'xtcli set/clr_reserve'
Description
'xtcli set_reserve' and 'xtcli clr_reserve' operations overload the
ec_node_unavailable event as described in bug 785850.  Since gnilnd
uses ec_node_unavailable events, we need to ignore them when they
originate from those commands.

----------------------------------------------------------------------
Subject
Close connection upon receipt of RCA unavailable event.
Description
When a blade is powered down, messages sent to its nodes cause
ORB timeouts, which cause a quiesce and an ORB scrub. The quiesce
causes gnilnd to bump its timeouts, so we continue sending traffic,
causing more ORB timeouts.

----------------------------------------------------------------------
Subject
kgnilnd_dgram_mover thread runtime deadline
Description
Currently there is no deadline associated with starting outbound
dgrams within the kgnilnd_dgram_mover thread. The thread loops
while the list is not empty, so when there are many network
problems it could run for a very long time. This mod adds a
deadline check so the dgram thread stops attempting to post
dgrams once the deadline passes; the thread then schedules itself and
is woken up normally after time has passed to continue its work.

Changes
1. Added deadline to kgnilnd_dgram_mover so
   kgnilnd_start_outbound_dgrams is bounded in runtime by size of
   list and by a maximum runtime deadline.
2. Added error injection to verify dgram deadline.
3. Added module parameter to adjust deadline of dgram thread.
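The bounded loop in change 1 might look like the following sketch. It assumes a hypothetical pending-dgram counter and a caller-supplied clock so the deadline can be tested deterministically; the real thread uses kernel time and reschedules itself, which is only indicated here by leaving work pending. `post_dgrams()`, `struct dgram_list` and `fake_clock()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pending-work list: just a count of queued dgrams. */
struct dgram_list {
	size_t pending;
};

typedef unsigned long (*clock_fn)(void *ctx);

/*
 * Post dgrams until the list is empty or the deadline passes, so the
 * runtime is bounded both by the list length and by max_ms. Returns
 * the number posted; pending > 0 afterwards means the caller should
 * reschedule and continue on the next wakeup. (Kernel code would
 * compare jiffies with time_after() to survive wraparound.)
 */
static size_t post_dgrams(struct dgram_list *l, unsigned long max_ms,
			  clock_fn now, void *ctx)
{
	unsigned long deadline = now(ctx) + max_ms;
	size_t posted = 0;

	while (l->pending > 0) {
		if (now(ctx) > deadline)
			break;		/* yield; finish on the next wakeup */
		l->pending--;		/* "post" one dgram */
		posted++;
	}
	return posted;
}

/* Test clock: returns the current tick and advances it by one "ms". */
static unsigned long fake_clock(void *ctx)
{
	unsigned long *t = ctx;

	return (*t)++;
}
```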

----------------------------------------------------------------------
Subject
fix peer_conn_lock deadlock
Description
kgnilnd_tx_done() called with lock held.
There is an error case whereby kgnilnd_tx_done will be called by
kgnilnd_queue_tx(). This can cause a deadlock if lnet calls back
needing the write lock.

Remove call to kgnilnd_tx_done since the tx will be processed by
kgnilnd_process_fmaq() (like the EAGAIN case).

----------------------------------------------------------------------
Subject
Make kgnilnd_bump_timeouts aware of DONE connections
Description
Currently, when kgnilnd comes out of quiesce, all connections' timeouts
are bumped so they don't time out because of the period they were paused.
kgnilnd_bump_timeouts schedules all the connections on a peer,
including ones that are in purgatory in the GNILND_CONN_DONE state.
These connections are not supposed to be put through the scheduler
once they are in the DONE state.

An LBUG can occur if, after the quiesce, the scheduler thread
does not push the newly scheduled conns through the state machine
fast enough. This can leave DONE conns on the scheduled list when
a stack reset is triggered. Stack reset then puts any scheduled conns
through kgnilnd_complete_closed_conn, which asserts when it sees a
conn in the GNILND_CONN_DONE state.

Changes
1. Add if statement so kgnilnd_bump_timeouts does not schedule DONE
   connections.

----------------------------------------------------------------------
Subject
Subscribe GNILND to UXACT errors
Description
Aries has a new type of error that GNILND needs to be subscribed to
for stack reset initiation. This mod adds that error type to our
callback subscription routine.

Changes
1. Add GNI_ERRMASK_UNKNOWN_TRANSACTION to mask passed into
   kgnilnd_subscribe_errors function.

----------------------------------------------------------------------
Subject
kgnilnd reverse bte rdma transactions
Description
Currently GNILND executes all of its kgni bte rdma transactions
using GNI_POST_RDMA_PUT. On Cascade systems this can cause IOMMU
thrashing on router nodes, as many computes initiate rdma
to a single service node. This can cause linear performance
degradation as more and more computes attempt to write into a single
service node's memory space. To alleviate this problem we will change
how rdmas are done: with GNI_POST_RDMA_GET, the service node
initiates the transfer of data to itself instead of thousands of
clients all trying at once. A run-time tunable that allows us to
switch to GNI_POST_RDMA_GET lets us govern the RDMA from the
receiving node.

Changes
1. Added new message types that exist side by side with current
   infrastructure so different nodes can have rdma setting tuned
   and all nodes will handle the messages.
2. Added tunables so that the REVERSE setting can be adjusted at
   run time.
3. Added support for non byte aligned data transfers so that gets
   will succeed when non byte aligned offsets and lengths are
   provided to kgnilnd.
4. Added the capability to send checksum information in the message
   being sent to the side that will be initiating the rdma.
   This works side by side with existing rdma checksum capabilities.
5. Corrected rdma nak problems when RDMA mapping fails for a specific
   type of tx.
6. Added counters to rdma when a copy needs to be made due to
   unaligned data; this will allow us to see if performance is
   hindered because a large number of vmalloc calls have to be
   made.
7. Changed the entire call tree for rdma to support the handling of
   the new message types.
8. On Aries platforms service nodes will be defaulted to
   GNILND_REVERSE_GET, compute nodes defaulted to GNILND_REVERSE_PUT.

----------------------------------------------------------------------
Subject
Generate/check checksum over the number of bytes actually transferred
Description
It is possible for PUTs to have a different length than the
length stored in lntmsg->msg_len, since LNET can adjust this
length based on its buffer size and offset.
lnet_try_match_md() sets the mlength that we use to do the
RDMA transfer.

Therefore we need to compute the checksum using tx->tx_rdma_desc.length
and verify the checksum using the length returned in
msg->gnm_u.completion.gncm_retval, which contains the actual number
of bytes transmitted.
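A sketch of why the length matters, using a simple illustrative Adler-style checksum rather than whatever kgnilnd actually uses: if the sender checksums msg_len bytes but only the matched length is transferred, the receiver's checksum over the bytes it actually received cannot match. `simple_csum()`, `csum_ok()` and the 256-byte receive buffer are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative Adler-style checksum; not kgnilnd's real algorithm. */
static uint32_t simple_csum(const unsigned char *buf, size_t len)
{
	uint32_t a = 1, b = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		a = (a + buf[i]) % 65521;
		b = (b + a) % 65521;
	}
	return (b << 16) | a;
}

/*
 * Model an RDMA of xfer_len bytes (the matched length) from a message
 * whose nominal length is msg_len (xfer_len <= msg_len). The sender
 * checksums either xfer_len or msg_len bytes; the receiver always
 * checksums what arrived. Returns 1 if the two checksums agree.
 */
static int csum_ok(const unsigned char *src, size_t msg_len,
		   size_t xfer_len, int sender_uses_xfer_len)
{
	unsigned char dst[256];
	size_t sent_len = sender_uses_xfer_len ? xfer_len : msg_len;
	uint32_t sent;

	assert(xfer_len <= msg_len && xfer_len <= sizeof(dst));
	sent = simple_csum(src, sent_len);
	memset(dst, 0, sizeof(dst));
	memcpy(dst, src, xfer_len);	/* only xfer_len bytes arrive */
	return simple_csum(dst, xfer_len) == sent;
}
```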

----------------------------------------------------------------------
Subject
GniLND needs to filter accelerator events.
Description
Change the kgnilnd_rca thread to filter out accelerator events.
----------------------------------------------------------------------
Subject
kgnilnd BTE Delivery MODE tunable
Description
Currently kgnilnd only exposes a few options to tune kgni's rdma
bte delivery mode. This works well for Gemini systems, but on Cascade
we would like finer grained control. This mod allows us to change the
delivery mode at run time through the exposed tunable interface,
giving us the capability to tune the delivery modes without having to
restart the system or make code changes.

Changes
1.  Added tunable bte_dlvr_mode which takes a mask/number for the
    delivery mode and uses that to set the bte delivery option for
    rdma.
2.  Removed extraneous tunables that were only single tunable
    specific.
3.  Added Gemini and Aries header options if in the future we need to
    change the defaults on Aries or Gemini.

----------------------------------------------------------------------
Subject
GniLND connection serialization, debug for compute bad message type.
Description
Introduce a semaphore for connection processing serialization within
the scheduler thread for bugs 789853 and 789855.
  - The main work of the scheduler thread is now protected by a read
    semaphore.
  - When kgnilnd_process_conns needs to do work on a connection, it
    takes a write semaphore.

----------------------------------------------------------------------
Subject
GniLND rca_thread exit fix.
Description
Prevent the kgnilnd_rca thread from exiting when it receives an error
from krca_wait_event.

----------------------------------------------------------------------
Subject
GniLND kgnilnd_recv message type unknown
Description
Add debug to print out more info in kgnilnd_recv() default case of
the gnm_type switch statement.

----------------------------------------------------------------------
Subject
fix fma_blk state when mdd is invalidated.
Description
Currently, when a VIRT_MAPPED fma_blk is unmapped, kgnilnd doesn't
change its state to IDLE. Because of this, the code that finds a free
mbox will use mboxes within the fma_blk even though its mdd has been
invalidated, causing dgram exchanges to contain bad mailboxes.

This change marks the fma_blk as having its mdd invalidated.

----------------------------------------------------------------------
Subject
gnilnd/rca integration
Description
Subscribe for the rca events ec_node_unavailable, ec_node_available
and ec_node_failed to prevent reconnect attempts to downed nodes.
We do not use the event to kill a live connection.

----------------------------------------------------------------------
Subject
kgnilnd eager_recv double free fix
Description
Currently the function call kgnilnd_eager_recv does no verification
that the connection passed into it with an rx message is alive and
valid. Normally this is without issue except when connections are
being closed and opened on routers. A connection could be in the
process of being destroyed and have its refcount incremented.
The next call to kgnilnd_recv could cause a double free.

This mod alleviates this by doing a reverse lookup on the connection
based on the information we can validate within the rx message. By
using a read_lock on kgn_peer_conn_lock we can then look up the
connection based on its nid and verify its conn_stamp matches the one
the message is expecting. If we find a valid connection that matches,
we then increment that connection's refcount while the lock is held,
preventing it from disappearing until after the receive. Without the
lock and reverse lookup we could end up looking at already freed
memory.

This race was showing itself through an fma_blk assertion on the
router nodes: when two destroy_conn calls occurred in parallel, sometimes
one would get past an if (fma_blk) check and then find that the
fma_blk had already been set to 0.

----------------------------------------------------------------------
Subject
Sequence kgnilnd tx use with close of connections.
Description
Currently kgnilnd makes an incorrect assumption
that when a conn is closed and the connection is removed from
the cqid lookup table that no tx's are in use by other threads.

What can happen is one of the other scheduler threads can be
in the process of using a tx and has called
kgnilnd_tx_del_state_locked. This can race against
kgnilnd_complete_closed_conn in a different scheduler thread as it
attempts to remove all existing tx's from the conn's tx_ref_table.
kgnilnd_complete_closed_conn calls kgnilnd_tx_del_state_locked
on the connection's tx's, and since a tx could still be in use in the
first scheduler thread, an exception can occur.

This mod marks the conn as having tx's in use when the first thread
has a read_lock on the kgnilnd_peer_conn_lock.

Changes
1. Added to kgn_conn_t an atomic gnc_tx_in_use that is incremented
   any time kgnilnd_validate_tx_ev_id is called.
2. Added a decref to the conn's gnc_tx_in_use after the function
   is finished using the tx.
3. Added a check in kgnilnd_process_conns that barriers entry for a
   given connection into kgnilnd_complete_closed_conn until
   gnc_tx_in_use is 0. Once the conn is removed by the close call from
   the cqid hash table only existing in use tx's from before the close
   will prevent the close from completing so no livelocks should be
   possible.

----------------------------------------------------------------------
Subject
Add kgnilnd scheduler thread runtime deadline
Description
This mod makes sure that the kgnilnd scheduler threads
are not sitting on the cpu longer than necessary by adding a deadline
that forces a yield after the deadline is hit. The amount of time
that the scheduler will allow itself to run without scheduling
is configurable via module parameter in 1 second intervals.

It was also found that the nice value of the scheduler threads
is preventing the heartbeat system from working correctly on
compute nodes with only a single scheduler thread. So we are
changing default nice value of thread to 0 to allow other
threads to run.

Changes
1. Added sched_timeout module parameter to allow changing of
   default scheduler thread deadline.
2. Added deadline check to kgnilnd_process_conns so it does
   not spin in its while loop forever.
3. Added error injection to verify deadline is checked and
   calls to yield occur.
4. Added sched_nice module parameter to allow adjustment of
   scheduler thread priority separate from other kgnilnd
   threads.

----------------------------------------------------------------------
Subject
Cleanup kgnilnd_schedule_conn races during conn close
Description
This patch reworks the previous debug patch and adds a
debug framework that addresses the shortcomings of the previous patch.

We are also removing an extraneous kgnilnd_schedule_conn
call from kgnilnd_finish_connect that was causing a large number of
the schedule-after-close occurrences.

There is still a chance that a conn can be scheduled after close but
the current refcount framework is designed to counteract issues that
arise when that happens, making the removal of the assertion valid.

----------------------------------------------------------------------
Subject
Repost WC dgram when OOM event occurs
Description
Currently, when kgnilnd runs out of GART space while attempting to
repost a wildcard datagram, the system asserts and tips over. Instead
we can put into place a mechanism that allows WC datagrams to be
reposted when the OOM condition resolves.

This mod removes the assertion and puts into place a mechanism within
the dgram mover thread to post wildcards when necessary. This allows
the system to stay up instead of crashing. When posting a dgram
fails, a D_NETERROR message will be written out to the console.

----------------------------------------------------------------------
Subject
Workaround and additional debug for scheduler assertion
Description
This mod adds debug to get a better analysis of the gnc_scheduled
problem. It also has a workaround; the call to
kgnilnd_complete_closed_conn will short circuit and let
kgnilnd_process_conns handle the schedule normally when it sees that
gnc_scheduled != GNILND_CONN_PROCESS instead of asserting. I have also
added debug to all the calls to kgnilnd_schedule_conn so we can find
the call that is causing the assertion.

----------------------------------------------------------------------
Subject
Remove assertion and attempt recovery on mailbox corruption
Description
Previous mods have addressed the sequencing that could cause mailbox
corruption by fixing the state machine and adding timeouts. This mod
builds on those and makes the detection of issues relating to the
mailbox a correctable error. Instead of asserting we will now close
the connection when we detect corruption occurring and utilize the
purgatory system to attempt to get things back in order.  The previous
changes allow us to do this as they prevent the close sequence
corruption from spiraling out of control.

Changes
        1. Removed assertion in kgnilnd_check_fma_rx on seqno
           corruption and replace with a statement that closes the
           connection and returns -EIO. This should allow the system
           to continue without causing the node to come down.
        2. Added debug so when we do detect corruption it will be
           tagged in the console. This will allow us to see how often
           the problem occurs and if it contributes to system
           problems.
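Change 1 turns an assertion into a recoverable error. A minimal sketch, assuming hypothetical `struct conn` fields for the expected receive sequence number, the connection state and a corruption counter; the real check lives in kgnilnd_check_fma_rx and hands the closed connection to purgatory, which is not modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

enum conn_state { CONN_ESTABLISHED, CONN_CLOSING };

/* Hypothetical subset of a connection's receive state. */
struct conn {
	uint32_t rx_seq;		/* next expected sequence number */
	enum conn_state state;
	unsigned int corruptions;	/* how often corruption was seen */
};

/*
 * Check an incoming message's sequence number. On a mismatch, log it,
 * start closing the connection and return -EIO instead of asserting.
 */
static int check_rx_seq(struct conn *c, uint32_t msg_seq)
{
	if (msg_seq != c->rx_seq) {
		c->corruptions++;
		fprintf(stderr, "mailbox corruption: expected seq %u, got %u\n",
			(unsigned)c->rx_seq, (unsigned)msg_seq);
		c->state = CONN_CLOSING;	/* recover via the close path */
		return -EIO;
	}
	c->rx_seq++;
	return 0;
}
```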

----------------------------------------------------------------------
Subject
Fix race condition and sequence kgnilnd connection closing
Description
There is a race between the scheduler thread and
kgnilnd_close_conn_locked. While we take the kgn_peer_conn_lock to
close the connection, the scheduler threads don't take it when they
check the gnc_state. They could get all the way through the close
state machine by the time the kgnilnd_close_conn_locked function
finishes, tripping an assertion. To correct this race and improve
sequencing, we need to make sure we grab the write_lock on
kgn_peer_conn_lock when changing the conn's gnc_state.

Changes
        1. In kgnilnd_send_conn_close when setting the conn's
           gnc_state to GNILND_CONN_CLOSED added a write_lock to make
           sure we are sequencing the close with other threads that
           might be changing the connections state.
----------------------------------------------------------------------

Signed-off-by: James R. Shimek <jshimek@cray.com>
Change-Id: I5b8de3b72cdc17b32134cb2532c9ad7dc4fa621c
Reviewed-on: http://review.whamcloud.com/5815
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
wangdi [Fri, 17 Jan 2014 10:13:54 +0000 (02:13 -0800)]
LU-1750 ofd: always update last_id.

Always update last_id during orphan cleanup, even if
the orphan might have been cleaned up by the previous request.

Change-Id: I824c97b29b5e03906e66f27e044876cf097ce534
Signed-off-by: Wang Di <di.wang@intel.com>
Reviewed-on: http://review.whamcloud.com/4331
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
John L. Hammond [Wed, 3 Apr 2013 20:57:05 +0000 (15:57 -0500)]
LU-3048 llapi: make lfs getstripe less crashy, more correct

Use of get_param_obdvar() in get_mds_md_size() could cause the
max_easize param from the wrong filesystem to be read, possibly
preventing userspace from allocating a sufficiently large buffer to
receive the results of the IOC_MDC_GETFILESTRIPE ioctl() and causing a
buffer overrun in copy_to_user().  Add internal functions
get_param_{cli,llite,lmv,lov}() which find the correct params for the
filesystem containing the path argument.

In common_param_init() the lum buffer used for IOC_MDC_GETFILESTRIPE
is sized based on the return of get_mds_md_size().  For file systems
with a small number of OSTs this buffer may be too small to hold a
path component.  Fix this by ensuring that the allocated buffer has
size at least PATH_MAX + 1.
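The sizing rule amounts to taking a maximum. In this sketch `md_size` stands for whatever get_mds_md_size() returned, and `lum_buf_size()` is a hypothetical helper, not the llapi code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * The IOC_MDC_GETFILESTRIPE buffer must also be able to hold a path
 * component, so it needs at least PATH_MAX + 1 bytes even when the
 * striping EA for a filesystem with few OSTs is small.
 */
static size_t lum_buf_size(size_t md_size)
{
	size_t min_size = (size_t)PATH_MAX + 1;	/* name plus NUL */

	return md_size > min_size ? md_size : min_size;
}
```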

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I9847a1414cb4306f4bce5f7c30d1d1cddfab8621
Reviewed-on: http://review.whamcloud.com/5934
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Alex Zhuravlev [Tue, 19 Mar 2013 12:27:51 +0000 (16:27 +0400)]
LU-2983 osd: osd-zfs to handle errors in IO path

- handle an error returned by dmu_buf_hold_array_by_bonus():
  release already pinned buffers, return an error to the caller.
- OFD to handle an error returned by dt_bufs_get()
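The first bullet follows the usual acquire-all-or-release-partial pattern. A sketch with a hypothetical `pin_buf()` that can be made to fail partway through; the real code uses dmu_buf_hold_array_by_bonus(), which is not reproduced here, and all names below are illustrative.

```c
#include <assert.h>
#include <errno.h>

#define NBUFS 8

static int pinned[NBUFS];	/* 1 while a buffer is held */
static int fail_at = -1;	/* test hook: index that fails to pin */

/* Hypothetical pin/unpin operations on buffer i. */
static int pin_buf(int i)
{
	if (i == fail_at)
		return -EIO;
	pinned[i] = 1;
	return 0;
}

static void unpin_buf(int i)
{
	pinned[i] = 0;
}

/*
 * Pin buffers [0, n). On failure, release the ones already pinned and
 * return the error, so the caller never sees a half-pinned array.
 */
static int pin_range(int n)
{
	int i, rc = 0;

	assert(n >= 0 && n <= NBUFS);
	for (i = 0; i < n; i++) {
		rc = pin_buf(i);
		if (rc != 0)
			goto out_release;
	}
	return 0;

out_release:
	while (--i >= 0)
		unpin_buf(i);
	return rc;
}
```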

Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Change-Id: I1fe46364967dbc527d0d80f3729673c00ab7154c
Reviewed-on: http://review.whamcloud.com/5784
Tested-by: Hudson
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Patrick Farrell [Fri, 29 Mar 2013 17:41:34 +0000 (12:41 -0500)]
LU-3044 llite: LSeek SEEK_CUR incorrect after O_APPEND write

When a file is opened with O_APPEND set and a write is done,
the file offset value immediately after the write is incorrect.
It is too large by approximately the length of the write.

This can be seen by doing lseek SEEK_CUR immediately after the
write. This does not cause corruption on subsequent writes
because with O_APPEND VFS resets the file offset to EOF before
each write.

This is resolved by removing the change made for BUG:17711,
which was to set crw_pos in ll_prepare_write.

That change was to pass the LASSERT(cl_page_in_io(page, io)) in
cl_io_prepare_write().  However, this assert has since been
modified to exclude the O_APPEND case, making this unnecessary.

crw_pos is also updated in cl_io_rw_advance which is why pos is
greater than expected.

Removing the extra update to crw_pos in ll_prepare_write fixes
this.
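The symptom can be checked from user space: after O_APPEND writes, lseek(fd, 0, SEEK_CUR) should equal the new end of file. This is a hedged sketch of such a check using a temporary file under /tmp (an assumption); pointed at a file on an affected Lustre client, the reported offset would be expected to exceed the file size as described above. `offset_after_appends()` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Append 'data' to 'path' twice through an O_APPEND descriptor and
 * return the offset lseek(SEEK_CUR) reports afterwards, or -1 on error.
 * On a correct file system this equals the file size.
 */
static off_t offset_after_appends(const char *path, const char *data)
{
	size_t len = strlen(data);
	off_t pos = -1;
	int fd = open(path, O_WRONLY | O_APPEND);

	if (fd < 0)
		return -1;
	if (write(fd, data, len) == (ssize_t)len &&
	    write(fd, data, len) == (ssize_t)len)
		pos = lseek(fd, 0, SEEK_CUR);
	close(fd);
	return pos;
}
```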

Signed-off-by: Patrick Farrell <paf@cray.com>
Change-Id: I7c1ad10eefec44aae415b8cfce6b01bc9b39fc8f
Reviewed-on: http://review.whamcloud.com/5861
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Minh Diep [Wed, 17 Apr 2013 20:56:25 +0000 (13:56 -0700)]
LU-3183 tests: sanity test_27f dd to the wrong file

Fix a typo in the output file name passed to dd.

Signed-off-by: Minh Diep <minh.diep@intel.com>
Change-Id: I63c5627ed766b92b1446b8f8082017b0e804dbbe
Reviewed-on: http://review.whamcloud.com/6084
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
7 years agoLU-2679 grant: OFD grant as client requested upon reconnect
Lai Siyao [Fri, 1 Feb 2013 13:46:17 +0000 (21:46 +0800)]
LU-2679 grant: OFD grant as client requested upon reconnect

Part of the patch from bz20278 was lost in the OFD implementation;
add it back:
* besides recovery, also grant the client-requested amount on a
  normal reconnect.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: I9e06316d0bd8602663eef4ba661a4ebfebb6e1bd
Reviewed-on: http://review.whamcloud.com/5255
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
7 years agoLU-3138 osd: ignore unlink inode in index delete
wang di [Tue, 16 Apr 2013 07:01:39 +0000 (00:01 -0700)]
LU-3138 osd: ignore unlink inode in index delete

Unlinked inodes should be ignored during index delete; otherwise
the inode becomes a bad inode, which causes an unclean delete.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: Ie3c99cd3bfa71876b34007bb5754360c73fc6f86
Reviewed-on: http://review.whamcloud.com/6072
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3186 lmv: do not need allocate FID for open by FID
wang di [Mon, 3 Feb 2014 02:06:14 +0000 (18:06 -0800)]
LU-3186 lmv: do not need allocate FID for open by FID

We do not need to allocate a FID or set op_fid2 for open by FID;
otherwise the MDT will open the file with the new FID.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I67e5a52c643228a6d8bd0190ca1a78b047fd1e7a
Reviewed-on: http://review.whamcloud.com/6099
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-1820 ptlrpc: fix put export for hp request
Alexander.Boyko [Sat, 30 Mar 2013 07:43:02 +0000 (11:43 +0400)]
LU-1820 ptlrpc: fix put export for hp request

Fix for ASSERTION(cfs_list_empty(&exp->exp_hp_rpcs)) failed.

The root cause is in target_handle_connect():

    if (req->rq_export != NULL)
            class_export_put(req->rq_export);

    /* request takes one export refcount */
    req->rq_export = class_export_get(export);

This drops the reference on the previous export, but the request's
rq_exp_list is still on that export's exp_hp_rpcs queue, because after
connect the requests become HP requests. ptlrpc_server_handle_request()
then puts the last reference on the export, and the export goes to the
zombie list with a non-empty exp_hp_rpcs.
This patch moves rq_exp_list to the new export in
target_handle_connect(), and includes a cleanup for Change-Id
I3d312c28481143b557d7987501c975c7e287885e.

Fix the export reference count for a request. Before this patch, a
request took one reference in class_conn2export() and another
reference, along with an increment of the export RPC counter, in
class_export_rpc_get(). One export reference per request is enough.
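The ordering problem can be sketched in isolation. The structures and helpers below (struct export, struct request, export_get(), request_switch_export() and a tiny list implementation) are hypothetical simplifications, not the real obd_export/ptlrpc_request code, and locking is omitted. The point is only that the request is moved onto the new export's HP list before the old export's reference is dropped, so the old export never reaches zero references with a non-empty list.

```c
#include <stddef.h>

/* Minimal circular doubly linked list, in the style of the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	list_init(e);
}

/* Hypothetical stand-ins for obd_export and ptlrpc_request. */
struct export {
	int refcount;
	struct list_head hp_rpcs;	/* plays the role of exp_hp_rpcs */
};

struct request {
	struct export *rq_export;
	struct list_head rq_exp_list;	/* link on rq_export->hp_rpcs */
};

static struct export *export_get(struct export *exp)
{
	exp->refcount++;
	return exp;
}

static void export_put(struct export *exp)
{
	/* Dropping the last reference is where the export would become a
	 * zombie; the failing assertion expects its HP list to be empty. */
	exp->refcount--;
}

/* Re-point a request at a new export without leaving it queued on the old one. */
static void request_switch_export(struct request *req, struct export *new_exp)
{
	struct export *old_exp = req->rq_export;

	if (!list_empty(&req->rq_exp_list)) {
		list_del_init(&req->rq_exp_list);
		list_add_tail(&req->rq_exp_list, &new_exp->hp_rpcs);
	}

	/* Take the new reference before dropping the old one. */
	req->rq_export = export_get(new_exp);
	if (old_exp != NULL)
		export_put(old_exp);
}
```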

Signed-off-by: Alexander Boyko <alexander_boyko@xyratex.com>
Xyratex-bug-id: MRP-881
Change-Id: I6da198fe9b50e85b09f8fe74789e6c6f5bfd534d
Reviewed-on: http://review.whamcloud.com/5922
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3196 tests: several test fixes about DNE tests
wang di [Fri, 19 Apr 2013 07:10:23 +0000 (00:10 -0700)]
LU-3196 tests: several test fixes about DNE tests

1. add error message in sanity test_27c.

2. In DNE, obdfilter.*.last_id shows last_id with different seqs:
   0x100000000:33
   0x200000400:33
so we need to check both last_id and seq here. Also, the OFD
should show the real IDIF seq instead of 0.

3. skip test 130 for now if OSTCOUNT > 10.

4. conf-sanity 66 should use "start mdt 1" to start a single MDT,
instead of all MDTs, which causes replace_nids to fail.

5. add all MDTs in conf-sanity 72.

6. skip sanity-hsm for DNE.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I6e88cd21c24e363680d02ef4765416f696a434cd
Reviewed-on: http://review.whamcloud.com/6106
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
7 years agoLU-3149 llog: separate named and unnamed llog sequences
Mikhail Pershin [Mon, 15 Apr 2013 02:57:25 +0000 (06:57 +0400)]
LU-3149 llog: separate named and unnamed llog sequences

Unnamed llogs are placed in O/seq/d<n> like OST objects, which
adds them to the namespace. Named llogs don't need that. This patch
introduces a new sequence for named llogs, FID_SEQ_LLOG_NAME, so they
can be distinguished and not placed into O/seq/d<n>.
This allows removing technical-debt code related to named llog
handling.

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: I5e6edbf02ae0c34ed616833530957ba4afc40f97
Reviewed-on: http://review.whamcloud.com/6053
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
7 years agoLU-3073 test: cancel lru locks for OSC
Hongchao Zhang [Tue, 9 Apr 2013 18:16:44 +0000 (02:16 +0800)]
LU-3073 test: cancel lru locks for OSC

In test_120a of sanity.sh, the locks in the OSC LRU should also be
cancelled, because the asynchronous object destroy on the MDT could
cause a blocking AST to be sent to the client.

Change-Id: Ibb619e2a93f8c70f41f6514149d06a12c4b5aa4e
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: http://review.whamcloud.com/6088
Tested-by: Hudson
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-2459 osd: add LMA incompat flag check
Bobi Jam [Mon, 8 Apr 2013 05:55:41 +0000 (13:55 +0800)]
LU-2459 osd: add LMA incompat flag check

* Add LMA incompatibility flags checking after object initialization.
* Add a sanity test case (test_17o).

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: I06b1ec35ca73094903304cac13d919e46e23fcf2
Reviewed-on: http://review.whamcloud.com/4819
Tested-by: Hudson
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
7 years agoLU-3124 llite: To use an extra RPC to transfer layout
Jinshan Xiong [Thu, 11 Apr 2013 23:11:14 +0000 (16:11 -0700)]
LU-3124 llite: To use an extra RPC to transfer layout

To support wide striping, we have to use an extra RPC to transfer a
large layout, instead of using the LVB buffer in the completion AST for
the layout lock, since that buffer doesn't reserve enough space. Also,
to fix the problem in LU-2807, the layout is transferred with an extra
RPC if the lock has ever been blocked.

In LU-2807, it turned out that we can't call mdt_object_find() in
ptlrpc thread context, as the following may happen:
1. thread1's unlink reaches the MDT;
2. before the unlink enqueues its lock, thread2 does a getattr intent
   request that finds and holds the object;
3. the unlink acquires the inodebits dlm lock;
4. thread3 enqueues a LAYOUT lock and is blocked;
5. thread2 is blocked acquiring the dlm lock as well;
6. the unlink finishes and releases the lock (the object becomes
   dying), and the LAYOUT lock's completion AST is invoked;
7. mdt_lvbo_fill() calls mdt_object_find() and waits for the dying
   object, which never succeeds because thread2 is blocked at the
   completion AST with the object held: livelock.

Fetching the layout with an extra RPC avoids the problem above.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: If75ae92424ada6ef275e813a87a93acd426eabdc
Reviewed-on: http://review.whamcloud.com/6042
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3152 tests: seq/oid needs to hex format in sanity 27z
wang di [Sat, 25 Jan 2014 22:34:32 +0000 (14:34 -0800)]
LU-3152 tests: seq/oid needs to hex format in sanity 27z

In sanity test_27z, the 0x prefix needs to be removed from seq/oid
if the object has a normal FID.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I1fc277460c3189a457e183996a7a8a68bcf7cfae
Reviewed-on: http://review.whamcloud.com/6022
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
7 years agoLU-3165 ptlrpc: missing fields in mdt_rec_reint
Lai Siyao [Mon, 15 Apr 2013 05:47:10 +0000 (13:47 +0800)]
LU-3165 ptlrpc: missing fields in mdt_rec_reint

Fields rr_flags_h and rr_umask are missing in mdt_rec_reint, which
may cause byte-swapping (swab) issues on non-x86 systems.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: I9a7b9a6b2fca6f16a03c48b2ae1ec5eae9852f76
Reviewed-on: http://review.whamcloud.com/6054
Tested-by: Hudson
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Tested-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3131 tests: fix sanity 56u/102k for single OST.
wang di [Sat, 25 Jan 2014 00:35:51 +0000 (16:35 -0800)]
LU-3131 tests: fix sanity 56u/102k for single OST.

sanity 56u and 102k should check OSTCOUNT before setstripe,
in case there is only one OST.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I5efd9b8c1cb25f68a48728dee1cdf44a71d13b49
Reviewed-on: http://review.whamcloud.com/6001
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
7 years agoLU-3087 mdd: Record layout changes.
Henri Doreau [Thu, 4 Apr 2013 14:18:22 +0000 (16:18 +0200)]
LU-3087 mdd: Record layout changes.

1. Introduce a new CL_LAYOUT changelog record type.

2. Emit CL_LAYOUT records on layout swap operations
and delayed LOVEA changes.

Signed-off-by: Henri Doreau <henri.doreau@cea.fr>
Change-Id: I5b94e9707b289e321e7d7f49a742ce3b002e2abb
Reviewed-on: http://review.whamcloud.com/5966
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
7 years agoLU-3143 tests: use facet_active_host to get host
Hongchao Zhang [Fri, 12 Apr 2013 14:28:14 +0000 (22:28 +0800)]
LU-3143 tests: use facet_active_host to get host

In wait_osc_import_state(), called in test_29a of recovery-small,
facet_active_host should be used instead of facet_host, because the
host can change during failover with FAILURE_MODE=HARD.

Test-Parameters: clientdistro=el6 serverdistro=el6 \
clientarch=x86_64 serverarch=x86_64 clientcount=4 \
osscount=2 mdscount=2 austeroptions=-R failover=true \
useiscsi=true testlist=recovery-small

Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: I7fe45c7d979b64d802894c92eaad6f76bec3fadf
Reviewed-on: http://review.whamcloud.com/6036
Tested-by: Hudson
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-2872 tests: EXCEPT sanity-quota/0+1 for zfs
Nathaniel Clark [Thu, 28 Mar 2013 19:19:59 +0000 (15:19 -0400)]
LU-2872 tests: EXCEPT sanity-quota/0+1 for zfs

Drop the failing IO rate for dd in test_0 to account for slow dd
performance (LU-2887).  EXCEPT test_1 for zfs (since "passing" times
have been seen as long as 3400s, and "normal" times are around 2000s
which is still very slow).

Test-Parameters: testlist=sanity-quota mdtfilesystemtype=zfs  mdsfilesystemtype=zfs ostfilesystemtype=zfs
Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: Ic9a0ba0bed56fa5d2150bb6b4b70fd48e83ce730
Reviewed-on: http://review.whamcloud.com/5876
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
7 years agoLU-3129 osd: Get FID from LMA if not in directory entry
wang di [Sun, 14 Apr 2013 07:57:10 +0000 (00:57 -0700)]
LU-3129 osd: Get FID from LMA if not in directory entry

When deleting an agent inode, if the FID cannot be found in the
directory entry (for example, on a filesystem upgraded from 1.8 or a
restored filesystem), try to get the FID from the LMA.

Fix DNE upgrade tests to use a temporary loop device for adding a
new MDT, instead of MDTDEV2, which might affect other tests.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I082c330f2b12258e996bdd9820db56a8659540f1
Reviewed-on: http://review.whamcloud.com/5997
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
7 years agoLU-3159 lprocfs: fix LDLM namespace names in /proc
Jian Yu [Fri, 12 Apr 2013 10:23:33 +0000 (18:23 +0800)]
LU-3159 lprocfs: fix LDLM namespace names in /proc

This patch fixes LDLM namespace names in
/proc/fs/lustre/ldlm/namespaces/ to use obd->obd_uuid.uuid
instead of a pointer address.

Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: Ieb3ef6c48d1f52f8d238bb64b7ce9ea004d2964f
Reviewed-on: http://review.whamcloud.com/6039
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
7 years agoLU-3136 test: add /bin:/usr/sbin to do_rpc_node()
Minh Diep [Wed, 10 Apr 2013 20:13:58 +0000 (13:13 -0700)]
LU-3136 test: add /bin:/usr/sbin to do_rpc_node()

Under fc18, /bin and /usr/sbin are not included in the
default PATH.

Signed-off-by: Minh Diep <minh.diep@intel.com>
Change-Id: I251cc50e7c0f42116ec716695eedc3c564cb3f9b
Reviewed-on: http://review.whamcloud.com/6021
Reviewed-by: John Hammond <johnlockwoodhammond@gmail.com>
Tested-by: Hudson
Reviewed-by: Jian Yu <jian.yu@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-2384 kerberos: Support for MIT-kerberos >=1.8.X is broken
Thomas Stibor [Mon, 26 Nov 2012 15:13:08 +0000 (16:13 +0100)]
LU-2384 kerberos: Support for MIT-kerberos >=1.8.X is broken

Since version 1.8.X, the MIT-kerberos library function for deriving
cryptographic keys has changed from:
krb5_derive_key(const struct krb5_enc_provider *enc,
                const krb5_keyblock *inkey,
                krb5_keyblock *outkey,
                const krb5_data *in_constant)
to:
krb5int_derive_key(const struct krb5_enc_provider *enc,
                   krb5_key inkey, krb5_key *outkey,
                   const krb5_data *in_constant)
Kerberos support for Lustre therefore no longer works on current
Linux distributions that ship MIT-kerberos library >= 1.8.X.

Signed-off-by: Andrew Korty <ajk@iu.edu>
Change-Id: I35e85a15e7fd846df6d63d430d7ac98ec53d7c56
Reviewed-on: http://review.whamcloud.com/4672
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Keith Mannthey <keith.mannthey@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
7 years agoNew tag 2.3.64 2.3.64 v2_3_64 v2_3_64_0
Oleg Drokin [Sat, 13 Apr 2013 14:27:02 +0000 (10:27 -0400)]
New tag 2.3.64

Change-Id: Ia40c179b6570f9f97c467498257800c2011d3c57

7 years agoLU-2888 fid: still Use oi_id/oi_seq for log/lov object.
wang di [Mon, 27 Jan 2014 03:44:48 +0000 (19:44 -0800)]
LU-2888 fid: still Use oi_id/oi_seq for log/lov object.

Since llogs are currently only used locally on the MDS, we use
oi_id/oi_seq to identify the log instead of trying to convert it to
a FID, to avoid confusion in ostid_to_fid/fid_to_ostid.

Since pre-2.4 versions use {oi_id,oi_seq} for FIDs on the MDT, it is
difficult to tell whether a value is an ostid or a FID by checking
only oi_seq, so this patch separates lmm_oi/FID conversion from
ostid_to_fid.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: Ie17b11a8a07ed3a44e41d5e88529541cbd33dd2f
Reviewed-on: http://review.whamcloud.com/6044
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
7 years agoLU-1460 snmp: Expose lnet stats through SNMP
Jeremy Filizetti [Tue, 16 Oct 2012 13:41:27 +0000 (09:41 -0400)]
LU-1460 snmp: Expose lnet stats through SNMP

Add MIB entries and Lustre functionality to monitor
LNet stats via SNMP.

Signed-off-by: Jeremy Filizetti <jeremy.filizetti@gmail.com>
Change-Id: I9ae360d7e100af5aef34a6b645fce963376928d1
Reviewed-on: http://review.whamcloud.com/4823
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
7 years agoLU-2780 mdd: use a real inode for .lustre/fid
John L. Hammond [Wed, 10 Apr 2013 17:48:11 +0000 (12:48 -0500)]
LU-2780 mdd: use a real inode for .lustre/fid

Use a real inode for .lustre/fid, thereby allowing traditional DAC
checks to govern open by FID.  Set the default mode of .lustre/fid to
0100/d--x------.  Remove the prohibition on setting attributes of
.lustre and .lustre/fid in mdt_reint_setattr().  Don't set IMMUTE_OBJ
in .lustre/fid's mod_flags and remove the check for IMMUTE_OBJ in
mdd_create_data().  Add a check to sanity test_154 to ensure that
setattr works on .lustre/fid.
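As a quick check of the mode notation above, the hypothetical helper below renders st_mode bits the way `ls -l` does: a directory with mode 0100 prints as d--x------, meaning only the owner has execute (search) permission.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>

/* Fill out[] with an ls -l style string: type character plus rwx bits. */
static void mode_to_str(mode_t mode, char out[11])
{
	static const char rwx[] = "rwxrwxrwx";
	int i;

	out[0] = S_ISDIR(mode) ? 'd' : S_ISREG(mode) ? '-' : '?';
	for (i = 0; i < 9; i++)
		/* 0400 >> i walks user r/w/x, then group, then other. */
		out[i + 1] = (mode & (0400 >> i)) ? rwx[i] : '-';
	out[10] = '\0';
}
```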

Intel-bug-id: INTL-5
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I372e9d6bb230e5b078dbf028fdc4348dfa192f93
Reviewed-on: http://review.whamcloud.com/5298
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
7 years agoLU-2886 mdd: create local files using local_storage lib
Mikhail Pershin [Tue, 27 Nov 2012 14:00:01 +0000 (18:00 +0400)]
LU-2886 mdd: create local files using local_storage lib

Switch MDD to use the local_storage library to create local objects
with generated or pre-defined FIDs. This unifies how local objects are
created on both OST and MDT, and avoids the layering violation of
calling MDD from OSD, as the md_local_object library does.

Two other fixes are included:
- remove DECLARE_LLOG_WRITE/REWRITE and mdd_declare_llog_record(),
  which were a temporary solution for declaring llog changes and are
  superseded by the llog-over-OSD implementation
- add the .lustre seq range into the FLD in
  fld_insert_special_entries(), like all other special ranges.

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: I7f8452e0a6d9abbe6cd1960a8ea71cbae4abe753
Reviewed-on: http://review.whamcloud.com/4682
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-2133 lnet: wrong peer state reported
Isaac Huang [Tue, 19 Mar 2013 19:20:53 +0000 (13:20 -0600)]
LU-2133 lnet: wrong peer state reported

When peer health support is disabled, peer state as shown in
/proc/sys/lnet/peers and by IOC_LIBCFS_DEBUG_PEER should be "NA".
Otherwise, wrong states could be shown because the peer aliveness
timestamps are not refreshed when peer health is disabled.
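A sketch of the reporting rule, using a hypothetical simplified peer structure rather than LNet's own: when peer health is disabled the aliveness data is never refreshed, so the state string falls back to "NA". The "up"/"down" strings are assumed here for illustration.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical, simplified peer record. */
struct peer {
	bool alive;
	time_t last_alive;	/* refreshed only while peer health is enabled */
};

/* State string shown for a peer, e.g. in a /proc-style listing. */
static const char *peer_state_str(const struct peer *lp, bool health_enabled)
{
	/* Without peer health the aliveness data is stale, so do not
	 * report a possibly wrong "up" or "down". */
	if (!health_enabled)
		return "NA";
	return lp->alive ? "up" : "down";
}
```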

Signed-off-by: Isaac Huang <he.huang@intel.com>
Change-Id: Ice5c6651ca5d2620495a0c37de9a22aebd644d0a
Reviewed-on: http://review.whamcloud.com/5955
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-2846 kernel: kernel update [RHEL6.4 2.6.32-358.2.1.el6]
yangsheng [Tue, 9 Apr 2013 12:03:00 +0000 (20:03 +0800)]
LU-2846 kernel: kernel update [RHEL6.4 2.6.32-358.2.1.el6]

--update rhel6.4 kernel to 2.6.32-358.2.1.el6
--update rhel5.9 kernel to 2.6.18-348.3.1.el5 (client only)

Combine the LU-2967 patch:
Pulling in upstream ipoib patch while waiting for RedHat
to do the same.
upstream commit fa16ebed31f336e41970f3f0ea9e8279f6be2d27

Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: yang sheng <yang.sheng@intel.com>
Change-Id: I05c5270230963d0dde4fb0add0237a1682ecbefd
Reviewed-on: http://review.whamcloud.com/5952
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
7 years agoLU-3137 kerberos: GSSAPI broken due to library confusion
Andrew Korty [Tue, 9 Apr 2013 13:54:26 +0000 (09:54 -0400)]
LU-3137 kerberos: GSSAPI broken due to library confusion

libgssglue has replaced libgssapi on some platforms, including RHEL
6.4, producing the compile-time errors

lsvcgssd-context_lucid.o: In function `serialize_krb5_ctx':
/home/ajk/lustre-master/lustre/utils/gss/context_lucid.c:598:
undefined reference to `gss_export_lucid_sec_context'
/home/ajk/lustre-master/lustre/utils/gss/context_lucid.c:634:
undefined reference to `gss_free_lucid_sec_context'

Having Autoconf look for gss_free_lucid_sec_context() instead of
gss_init_sec_context() finds the correct library.

Signed-off-by: Andrew Korty <ajk@iu.edu>
Change-Id: I1362682a5a2cc78b176ad0b4f9181db335084cd4
Reviewed-on: http://review.whamcloud.com/5991
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John Hammond <johnlockwoodhammond@gmail.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
7 years agoLU-2267 osd-zfs: Use appropriate ZFS flags for file attributes
Nathaniel Clark [Tue, 9 Apr 2013 12:40:42 +0000 (08:40 -0400)]
LU-2267 osd-zfs: Use appropriate ZFS flags for file attributes

Instead of setting arbitrary bits in pflags, convert to using existing
ZFS attributes.  This format differs from the previous implementation,
so older filesystems with these attributes set will not behave
correctly.

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: Ia0aac3e12adedd95b215f93ebe538a61abf910fa
Reviewed-on: http://review.whamcloud.com/5988
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
7 years agoLU-2681 fld: shrink seq_type in lsr_flags
wangdi [Fri, 24 Jan 2014 22:31:52 +0000 (14:31 -0800)]
LU-2681 fld: shrink seq_type in lsr_flags

In lu_seq_range, lsr_flags is only used to hold LU_SEQ_RANGE_MDT,
LU_SEQ_RANGE_OST or LU_SEQ_RANGE_ANY.  Because these fit into two
bits, using all 32 bits of lsr_flags is unnecessary.

This patch shrinks the current seq_type to 2 bits, so that other
flags can be stored in lsr_flags in the future.  Add wrapper functions
to access and set the flags:

 fld_range_set_mdt(), fld_range_set_ost(), fld_range_set_any()
 fld_range_is_mdt(), fld_range_is_ost(), fld_range_is_any()

If another target type were needed, it could potentially use
LU_SEQ_RANGE_FOO 0x2, which is currently unused.
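A sketch of what such wrappers might look like, assuming LU_SEQ_RANGE_MDT = 0x0, LU_SEQ_RANGE_OST = 0x1 and LU_SEQ_RANGE_ANY = 0x3 (consistent with 0x2 being the unused value). The mask name LU_SEQ_RANGE_MASK and the helpers fld_range_type()/fld_range_set_type() are assumptions for this example, and struct lu_seq_range is reduced to standard C types. The setters only touch the low two bits, so any future flags stored in lsr_flags survive.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed type values: they fit in the low two bits; 0x2 is unused. */
#define LU_SEQ_RANGE_MDT	0x0U
#define LU_SEQ_RANGE_OST	0x1U
#define LU_SEQ_RANGE_ANY	0x3U
#define LU_SEQ_RANGE_MASK	0x3U	/* assumed name for the 2-bit type mask */

/* Simplified lu_seq_range using standard fixed-width types. */
struct lu_seq_range {
	uint64_t lsr_start;
	uint64_t lsr_end;
	uint32_t lsr_index;
	uint32_t lsr_flags;	/* bits 0-1: type; higher bits free for other flags */
};

static inline uint32_t fld_range_type(const struct lu_seq_range *r)
{
	return r->lsr_flags & LU_SEQ_RANGE_MASK;
}

/* Replace only the type bits, preserving any other flags. */
static inline void fld_range_set_type(struct lu_seq_range *r, uint32_t type)
{
	r->lsr_flags = (r->lsr_flags & ~LU_SEQ_RANGE_MASK) | type;
}

static inline void fld_range_set_mdt(struct lu_seq_range *r)
{
	fld_range_set_type(r, LU_SEQ_RANGE_MDT);
}

static inline void fld_range_set_ost(struct lu_seq_range *r)
{
	fld_range_set_type(r, LU_SEQ_RANGE_OST);
}

static inline void fld_range_set_any(struct lu_seq_range *r)
{
	fld_range_set_type(r, LU_SEQ_RANGE_ANY);
}

static inline bool fld_range_is_mdt(const struct lu_seq_range *r)
{
	return fld_range_type(r) == LU_SEQ_RANGE_MDT;
}

static inline bool fld_range_is_ost(const struct lu_seq_range *r)
{
	return fld_range_type(r) == LU_SEQ_RANGE_OST;
}

static inline bool fld_range_is_any(const struct lu_seq_range *r)
{
	return fld_range_type(r) == LU_SEQ_RANGE_ANY;
}
```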

Signed-off-by: Wang Di <di.wang@intel.com>
Change-Id: I721c9fe5778ee331d3f77ac885f3b482e2322c85
Reviewed-on: http://review.whamcloud.com/5999
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
7 years agoLU-2189 tests: EXCEPT sanityn/36 for ZFS
Nathaniel Clark [Wed, 10 Apr 2013 13:16:57 +0000 (09:16 -0400)]
LU-2189 tests: EXCEPT sanityn/36 for ZFS

EXCEPT sanityn/36 for zfs until a permanent fix can be found.

Test-Parameters: mdsfilesystemtype=zfs mdtfilesystemtype=zfs   ostfilesystemtype=zfs testlist=sanityn
Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: I1b68b7e219627fa1755118e90cf5f2299e221454
Reviewed-on: http://review.whamcloud.com/6010
Reviewed-by: Li Wei <wei.g.li@intel.com>
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
7 years agoLU-3127 tests: EXCEPT replay-single/62,73b
Nathaniel Clark [Wed, 10 Apr 2013 14:54:07 +0000 (10:54 -0400)]
LU-3127 tests: EXCEPT replay-single/62,73b

Except replay-single/62 (LU-1473) for all tests.
Except replay-single/73b (LU-3127) for ZFS.

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: Ie74e33e9c7aa94db9bbe23c519327e45ced1ea33
Reviewed-on: http://review.whamcloud.com/6012
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Li Wei <wei.g.li@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
7 years agoLU-2392 kerberos: GSS keyring is broken >=2.6.29
Thomas Stibor [Thu, 29 Nov 2012 12:49:44 +0000 (13:49 +0100)]
LU-2392 kerberos: GSS keyring is broken >=2.6.29

Kerberos support for Lustre employs the GSS framework
and the keyring mechanism (gss_keyring.c) to maintain
and cache cryptographic keys in the kernel. Due to
structural changes in kernels >= 2.6.29, the keyring is
(mostly) separated from task_struct and is accessible
via the credentials structure.

Signed-off-by: Andrew Korty <ajk@iu.edu>
Change-Id: Ia022e1d76e6eb0308614f8af615e67d39f1d9e98
Reviewed-on: http://review.whamcloud.com/4708
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Keith Mannthey <keith.mannthey@intel.com>
7 years agoLU-2789 lod: sidestep setstripe vs rename race
John L. Hammond [Tue, 9 Apr 2013 17:02:52 +0000 (12:02 -0500)]
LU-2789 lod: sidestep setstripe vs rename race

In mdd_declare_link() and mdd_declare_rename() pass the actual
attributes to be set down to mdo_declare_attr_set().  In
lod_declare_attr_set() and lod_attr_set() only handle striping if
setting UID or GID.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I511e35833042f6004fa6befd382831b7f33bef5a
Reviewed-on: http://review.whamcloud.com/5802
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3000 tests: several fixes of tests for DNE
wangdi [Mon, 13 Jan 2014 01:06:09 +0000 (17:06 -0800)]
LU-3000 tests: several fixes of tests for DNE

1. sanity 27u, set PRECREATE_FAIL for every MDS.

2. sanity 133d, use mkdir, instead of test_mkdir to
avoid checking rename stats on remote MDTs.

3. conf-sanity 32, use do_rpc_nodes to reload modules,
in case modules need to be reloaded on remote nodes, and
also umount mdt2 in 32c.

4. conf-sanity 42, add load_modules to avoid missing
module errors in some cases.

5. skip test_53 and test_225 when there is more than
one MDT.

6. redirect error messages to /dev/null, to avoid
flooding the log with error messages, which might
blow up the size of the log.

Test-parameters: mdscount=2 mdtcount=2 testlist=sanity,conf-sanity

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: Ic09d97fb68b923b3a5862371a59c5ba655c278a7
Reviewed-on: http://review.whamcloud.com/5877
Reviewed-by: Li Wei <wei.g.li@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
7 years agoLU-3134 utils: Fix compilation errors in packet_lustre.c
Doug Oucharek [Tue, 9 Apr 2013 04:30:07 +0000 (21:30 -0700)]
LU-3134 utils: Fix compilation errors in packet_lustre.c

There are compilation errors in packet_lustre.c when building
Wireshark plugins. They were introduced by
http://review.whamcloud.com/#change,2877, which deleted llog
variables that were no longer needed but missed some instances.

Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Change-Id: Ica97df78e2a3e7b1667cc88484342b97a6c6a01a
Reviewed-on: http://review.whamcloud.com/5980
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
7 years agoLU-1434 utils: Add build functionality for wireshark and Lustre/LNet plugins
frank [Sat, 23 Feb 2013 15:58:04 +0000 (07:58 -0800)]
LU-1434 utils: Add build functionality for wireshark and Lustre/LNet plugins

Add build script `wsbuild' to create RPMs for wireshark and Lustre/LNet
plugins. Setting up the download and compile environment for the desired
wireshark version is handled via the configuration file `wsconfig.sh'.
The intention is to use the script for Jenkins builds.

Signed-off-by: frank <Frank.Heckes@intel.com>
Change-Id: I3e00c5df102f2e52f0abf7c2d48488ccfb38a3ca
Reviewed-on: http://review.whamcloud.com/5520
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Chris Gearing <chris.gearing@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-2684 fid: unify ostid and FID
wangdi [Fri, 17 Jan 2014 08:30:12 +0000 (00:30 -0800)]
LU-2684 fid: unify ostid and FID

Since 2.4 will support FIDs on OSTs, this patch unifies ostid and
FID, so that one day FIDs can be used to identify objects
everywhere.

Because both ostid and FID are 128 bits long, ostid is redefined as
a union:

struct ost_id {
    union {
        struct ostid {
            __u64   oi_id;
            __u64   oi_seq;
        } oi;
        struct lu_fid oi_fid;
    };
};

If oi_seq == 0, <oi_seq, oi_id> is still used to locate the object
as before, and the resid is still built the old way:
res[0] = obj_id, res[1] = obj_seq.

If oi_seq != 0, the FID (oi_fid) is used directly both to locate the
object and to build the resid, so the resid is unified with the
metadata lock resid.

Remove other direct accesses to _id and _seq from the code.
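A self-contained sketch of the union and the two resid rules described above. uint64_t/uint32_t stand in for __u64/__u32, struct lu_fid is assumed to be laid out as {f_seq, f_oid, f_ver}, and struct res_name and ostid_build_resid() are hypothetical. The FID branch assumes the metadata resid packs f_seq into the first word and f_ver:f_oid into the second. Reading oi.oi_seq after writing oi_fid relies on union type punning, which C permits.

```c
#include <stdint.h>

/* Assumed FID layout: 64-bit sequence, 32-bit object id, 32-bit version. */
struct lu_fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

struct ost_id {
	union {
		struct ostid {
			uint64_t oi_id;
			uint64_t oi_seq;
		} oi;
		struct lu_fid oi_fid;
	};
};

_Static_assert(sizeof(struct ost_id) == 16, "ostid and FID share 128 bits");

/* Hypothetical two-word lock resource name (res[0], res[1]). */
struct res_name {
	uint64_t name[2];
};

static void ostid_build_resid(const struct ost_id *oi, struct res_name *res)
{
	/* For a FID, oi.oi_seq overlays f_oid/f_ver, assumed non-zero. */
	if (oi->oi.oi_seq == 0) {
		/* Legacy object: keep res[0] = obj_id, res[1] = obj_seq. */
		res->name[0] = oi->oi.oi_id;
		res->name[1] = oi->oi.oi_seq;
	} else {
		/* FID object: build the resid from the FID itself. */
		res->name[0] = oi->oi_fid.f_seq;
		res->name[1] = ((uint64_t)oi->oi_fid.f_ver << 32) |
			       oi->oi_fid.f_oid;
	}
}
```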

Test-Parameters: envdefinitions=SLOW=yes,ENABLE_QUOTA=yes clientjob=lustre-b1_8 clientbuildno=258 testlist=runtests
Test-Parameters: envdefinitions=SLOW=yes,ENABLE_QUOTA=yes clientjob=lustre-b2_1 clientbuildno=191 testlist=runtests
Test-Parameters: envdefinitions=SLOW=yes,ENABLE_QUOTA=yes clientjob=lustre-b2_3 clientbuildno=41 testlist=runtests
Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I800a2b569169fcab4c886f3a17fc4e157ff78038
Reviewed-on: http://review.whamcloud.com/5820
Tested-by: Hudson
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
7 years agoLU-2921 iokit: plot-obdfilter fixes
Robert Read [Wed, 6 Mar 2013 19:12:01 +0000 (11:12 -0800)]
LU-2921 iokit: plot-obdfilter fixes

plot-obdfilter was mixing up objects and threads in its output.

Signed-off-by: Robert Read <robert.read@intel.com>
Change-Id: If6aa66943b88ea0d35b5a8ab4b61b8f80df67fd5
Reviewed-on: http://review.whamcloud.com/5618
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-2904 llite: use 32bitapi for re-export Lustre via NFS
Fan Yong [Tue, 19 Feb 2013 12:45:15 +0000 (20:45 +0800)]
LU-2904 llite: use 32bitapi for re-export Lustre via NFS

Since Lustre 2.4, a non-IGIF FID for the root object is
returned to the client. A 64-bit client converts such a
root FID into a local ino# that is larger than 2^32.

Re-exporting Lustre via NFS works only when the ino# is
less than 2^32, which is an NFS limitation. So without
proper handling, the Lustre root cannot be re-exported via
NFS. A similar issue has always existed in older versions
when re-exporting a non-root directory via NFS.

The current solution is that a user who wants to re-export
Lustre (whether the root or not) via NFS needs to mount
the Lustre client with the option "-o 32bitapi". Then
the client will convert the 128-bit FID into a 32-bit ino#.

This patch handles the "32bitapi" option for the FID
to ino# conversion, and includes some code cleanup.
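Because a FID is 128 bits wide, any 32-bit ino# derived from it is lossy and can collide. A minimal sketch of the general idea follows, assuming a simple XOR fold over the FID's 32-bit words. `struct fid_sketch` and `fid_fold32()` are hypothetical names, and this is not the hash Lustre actually uses for "32bitapi".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a Lustre FID: 64-bit sequence, 32-bit
 * object id, 32-bit version (128 bits total). */
struct fid_sketch {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

/* Fold a 128-bit FID into a 32-bit inode number by XOR-ing its four
 * 32-bit words. Illustrative only: distinct FIDs can collide, and this
 * is NOT the algorithm Lustre uses. */
static uint32_t fid_fold32(const struct fid_sketch *fid)
{
	uint32_t ino = (uint32_t)(fid->f_seq >> 32) ^
		       (uint32_t)fid->f_seq ^
		       fid->f_oid ^ fid->f_ver;

	/* Avoid 0, which is not a usable inode number. */
	return ino != 0 ? ino : 1;
}
```

The result always fits below 2^32, which is the property NFS re-export needs; the price is possible collisions.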

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I367ebe07d9bd645b312f37d78dfaa10f9f4b8200
Reviewed-on: http://review.whamcloud.com/5711
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-2990 osd-zfs: locate cursor to specified pos via load API
Fan Yong [Thu, 28 Feb 2013 03:25:48 +0000 (11:25 +0800)]
LU-2990 osd-zfs: locate cursor to specified pos via load API

There was a defect in the zfs-based backend when traversing a
directory: the dt_iteration API ::load() did not locate the cursor
at the specified position, which caused non-first directory readpage
RPCs to obtain repeated entries, and in turn caused readdir() loops
or unexpected dir hash collisions.
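The expected ::load() semantics can be sketched as positioning the cursor at the first entry whose hash is not below the requested position, so a follow-up readpage resumes where the previous one stopped instead of repeating entries. This is a userspace illustration over a sorted array; the types and `dir_iter_load()` are hypothetical and are not the osd-zfs ZAP cursor code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry in a hash-ordered directory. */
struct dirent_sketch {
	uint64_t hash;
	const char *name;
};

/* Hypothetical iterator over a hash-sorted entry array. */
struct dir_iter {
	const struct dirent_sketch *ents;
	size_t nr;
	size_t cur;	/* index of the current entry */
};

/* Position the cursor at the first entry whose hash is >= pos.
 * Returns 1 if such an entry exists, 0 at end of directory.
 * Binary search, since entries are sorted by hash. */
static int dir_iter_load(struct dir_iter *it, uint64_t pos)
{
	size_t lo = 0, hi = it->nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (it->ents[mid].hash < pos)
			lo = mid + 1;
		else
			hi = mid;
	}
	it->cur = lo;
	return lo < it->nr;
}
```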

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I9c159f0f80677590869246dd9f30f0dfb9cc2fbc
Reviewed-on: http://review.whamcloud.com/5894
Tested-by: Hudson
Reviewed-by: Li Wei <wei.g.li@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-398 ptlrpc: Add the NRS CRR-N policy
Nikitas Angelinas [Wed, 9 Jan 2013 02:02:12 +0000 (02:02 +0000)]
LU-398 ptlrpc: Add the NRS CRR-N policy

The CRR-N (Client-based Round Robin over NIDs) policy batch-schedules
all types of RPCs in a Round Robin manner according to the NID of the
client that generated the RPC; the maximum size of the batches is
configurable via interaction with lprocfs. The policy aims to provide
for better resource utilization across the cluster, and to help
shorten completion times of jobs in some cases by distributing
available bandwidth more evenly across client nodes.
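The batching behaviour can be illustrated with a small round-robin loop over per-client queues. This is a simplified sketch under stated assumptions: `struct nid_queue` and `crrn_schedule()` are hypothetical, each queue holds only a pending count, and `quantum` stands in for the lprocfs-tunable maximum batch size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-NID queue: only the number of pending RPCs is kept. */
struct nid_queue {
	unsigned int pending;
};

/* Visit clients in round-robin order, dispatching at most `quantum`
 * RPCs from each before moving on. Records the client index of each
 * dispatched RPC in out[] and returns how many were dispatched. */
static size_t crrn_schedule(struct nid_queue *q, size_t nr_clients,
			    unsigned int quantum, size_t *out, size_t max_out)
{
	size_t n = 0, idle = 0, c = 0;

	if (nr_clients == 0 || quantum == 0)
		return 0;

	/* Stop once a full pass finds every queue empty. */
	while (n < max_out && idle < nr_clients) {
		unsigned int batch = 0;

		while (q[c].pending > 0 && batch < quantum && n < max_out) {
			q[c].pending--;
			out[n++] = c;
			batch++;
		}
		/* Count consecutive clients that had nothing to send. */
		idle = batch == 0 ? idle + 1 : 0;
		c = (c + 1) % nr_clients;
	}
	return n;
}
```

With queues of 3, 1 and 2 RPCs and a batch size of 2, the dispatch order is clients 0, 0, 1, 2, 2, 0: no client can hold the service for more than one batch at a time.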

Signed-off-by: Nikitas Angelinas <nikitas_angelinas@xyratex.com>
Co-authored-by: Liang Zhen <liang@whamcloud.com>
Change-Id: Ie91ee277fc95564908b20fd0d539a274089657ed
Oracle-bug-id: b=13634
Xyratex-bug-id: MRP-73
Reviewed-on: http://review.whamcloud.com/4937
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-2923 ptlrpc: Stop suppressing pings when IR is unavailable
Li Wei [Sun, 31 Mar 2013 06:48:30 +0000 (14:48 +0800)]
LU-2923 ptlrpc: Stop suppressing pings when IR is unavailable

Currently, IR does not notify LWPs and MDT-MDT OSPs of target
recoveries.  This patch removes OBD_CONNECT_PINGLESS from LWP and
MDT-MDT OSP connect requests.

Although an MGC does not know if IR is administratively disabled on
the MGS or if a particular target is mounted with "noir", it can
detect MGSs that do not support IR at all and MGSs that are
unreachable.  This patch stops suppressing pings under those two
cases.
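The decision described above reduces to a small predicate. This is a hedged sketch: `struct mgc_ir_state` and `can_suppress_pings()` are illustrative names, not Lustre's. Administrative "noir" settings are deliberately absent because the MGC cannot observe them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what an MGC can observe about IR. */
struct mgc_ir_state {
	bool mgs_supports_ir;	/* MGS advertised IR support */
	bool mgs_reachable;	/* MGC currently connected to the MGS */
};

/* Pings may be suppressed only if PINGLESS was requested and IR can
 * actually deliver recovery notifications. The two detectable cases
 * that rule this out are an MGS without IR and an unreachable MGS. */
static bool can_suppress_pings(bool pingless_requested,
			       const struct mgc_ir_state *ir)
{
	return pingless_requested && ir->mgs_supports_ir &&
	       ir->mgs_reachable;
}
```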

As a cleanup requested by the reviewers, this patch also replaces the
plain OBD_CONNECT_PINGLESS checks with OCD_HAS_FLAG(..., PINGLESS)
macros.

Change-Id: I993b46c8f33413793d9cf2fa1a73b3635996a206
Signed-off-by: Li Wei <wei.g.li@intel.com>
Reviewed-on: http://review.whamcloud.com/5900
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Tested-by: Hudson
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3059 dt: shrink dt_object by 8 bytes on x86_64
John L. Hammond [Fri, 29 Mar 2013 16:28:11 +0000 (11:28 -0500)]
LU-3059 dt: shrink dt_object by 8 bytes on x86_64

Merge struct dt_lock_operations (containing only do_object_lock) into
dt_object_operations.  The DT types that use these two structures do
not have enough variation in their methods to justify separate
structures and it removes an 8 byte pointer member from struct
dt_object.
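The saving comes from dropping one pointer member. Below is a stripped-down illustration, assuming hypothetical layouts that omit the embedded struct lu_object and every other real member:

```c
#include <assert.h>
#include <stddef.h>

/* Method tables; incomplete types are enough for this illustration. */
struct obj_ops_sketch;
struct lock_ops_sketch;

/* Before: a separate pointer to a one-method lock table. */
struct dt_object_before {
	const struct obj_ops_sketch *do_ops;
	const struct lock_ops_sketch *do_lock_ops;
};

/* After: the lock method lives in the main operations table. */
struct dt_object_after {
	const struct obj_ops_sketch *do_ops;
};
```

On x86_64 the difference is one 8-byte pointer per object, which adds up across every cached dt_object.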

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I4372ba23f0f7691ac86e391772a6a6157311cfda
Reviewed-on: http://review.whamcloud.com/5892
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-1812 osd-ldiskfs: oti_obj_dentry needs d_sb set
James Simmons [Mon, 8 Apr 2013 11:25:49 +0000 (07:25 -0400)]
LU-1812 osd-ldiskfs: oti_obj_dentry needs d_sb set

Commit 431547b3 (v2.6.33) changed the generic xattr handlers to
use dentry->d_sb rather than dentry->d_inode->i_sb.

This patch ensures it's set before calling the xattr ops, which filter
through the generic xattr handlers.

Really, since any call into the kernel using a dentry can deref d_sb,
it should be set so we'll set it for fsync as well.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Change-Id: I0ce38970cd839a220f852f96632b473011adbdc6
Reviewed-on: http://review.whamcloud.com/5120
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3050 quota: amend sync write checking from 1.8 client
Niu Yawei [Wed, 3 Apr 2013 12:04:21 +0000 (08:04 -0400)]
LU-3050 quota: amend sync write checking from 1.8 client

A sync write from a 1.8 client doesn't carry the OBD_BRW_SYNC flag.
To interoperate with 1.8 clients, the sync write check on the OST
should be amended accordingly.

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: I4179c4878d295dae625f5631b6b02f3b4dd32cb6
Reviewed-on: http://review.whamcloud.com/5928
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
7 years agoLU-2991 osd: Overflow of transaction credits counters
Bruno Faccini [Sun, 7 Apr 2013 17:39:07 +0000 (19:39 +0200)]
LU-2991 osd: Overflow of transaction credits counters

Switch the size of transaction credits counters from uchar to
ushort to avoid possible overflow scenarios, such as when
wide-striping.
Also allow OSD_TRACK_DECLARES to be undefined without
compile-time errors for unsatisfied externals/unused variables.
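The width matters because incrementing an 8-bit counter past 255 silently wraps. The sketch below uses hypothetical struct and function names, assumes 8-bit chars, and models one declared credit per stripe.

```c
#include <assert.h>

/* Hypothetical counters of the two widths discussed above. */
struct credits_uchar  { unsigned char  declared; };
struct credits_ushort { unsigned short declared; };

/* Declare n credits one at a time, as a wide-striped operation
 * touching one object per stripe might. */
static unsigned int declare_uchar(unsigned int n)
{
	struct credits_uchar c = { 0 };

	while (n--)
		c.declared++;	/* wraps modulo 256 with 8-bit chars */
	return c.declared;
}

static unsigned int declare_ushort(unsigned int n)
{
	struct credits_ushort c = { 0 };

	while (n--)
		c.declared++;	/* no wrap until 65536 */
	return c.declared;
}
```

Declaring 300 credits leaves the uchar counter at 44, which badly under-reports what the transaction needs. The ushort counter correctly reads 300.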

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I2588cf11741ca4e3ee80b795a7d4318f9ed4fd3d
Reviewed-on: http://review.whamcloud.com/5830
Tested-by: Hudson
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-2473 ldiskfs: Add ldiskfs support for RHEL 6.4
Christopher J. Morrone [Sat, 16 Mar 2013 09:13:48 +0000 (02:13 -0700)]
LU-2473 ldiskfs: Add ldiskfs support for RHEL 6.4

Add an ldiskfs kernel patch series to support the RHEL 6.4 kernel.

The ldiskfs series selection macro (LB_LDISKFS_SERIES) is fixed up
to use the AS_VERSION_COMPARE macro, which allows us to check if the kernel
version is greater than or equal to a specific number, rather than
just a simple pattern match.
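A numeric, field-by-field comparison is what allows a "greater than or equal to" test that a glob-style pattern cannot express. The C function below only illustrates that idea, in the spirit of AS_VERSION_COMPARE. It is not autoconf's implementation, and `version_cmp()` is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>

/* Compare version strings such as "2.6.32-358.el6" field by field,
 * treating each run of digits as a number. Returns <0, 0 or >0.
 * In this sketch any non-digit character is only a separator. */
static int version_cmp(const char *a, const char *b)
{
	while (*a || *b) {
		/* Skip separators such as '.', '-' and letters. */
		while (*a && (*a < '0' || *a > '9'))
			a++;
		while (*b && (*b < '0' || *b > '9'))
			b++;
		if (!*a || !*b)	/* the string with fields left is larger */
			return (*a != '\0') - (*b != '\0');

		char *ea, *eb;
		unsigned long na = strtoul(a, &ea, 10);
		unsigned long nb = strtoul(b, &eb, 10);

		if (na != nb)
			return na < nb ? -1 : 1;
		a = ea;
		b = eb;
	}
	return 0;
}
```

For example, `version_cmp("2.6.32-358.el6", "2.6.32-279.el6") > 0` holds because 358 > 279 numerically. A string pattern would have to enumerate every acceptable release by hand.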

Change-Id: I894ace2d98e3d5c7481230794e9edf984bce7aee
Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
Reviewed-on: http://review.whamcloud.com/4804
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3098 mdt: continue on LFSCK start error in mdt_prepare()
John L. Hammond [Wed, 3 Apr 2013 15:09:15 +0000 (10:09 -0500)]
LU-3098 mdt: continue on LFSCK start error in mdt_prepare()

In mdt_prepare() do not allow failure of the OBD_IOC_START_LFSCK
ioctl() to prevent mount from succeeding on slave MDTs.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: Ie3ab2e61037739e3324f6ec28e5f73be861b58f5
Reviewed-on: http://review.whamcloud.com/5931
Tested-by: Hudson
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3059 lod: shrink lod_object to 128 bytes
John L. Hammond [Thu, 28 Mar 2013 22:05:00 +0000 (17:05 -0500)]
LU-3059 lod: shrink lod_object to 128 bytes

By packing ldo_stripes_allocated into the bitfield containing
ldo_striping_cached and ldo_def_striping_set we reduce the size of
struct lod_object from 136 to 128 bytes on x86_64.
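Packing a small counter into the same storage unit as existing one-bit flags removes a whole member. Alignment padding can make the saving larger than the field itself. The layouts below are hypothetical: member names and widths are assumptions, not the real struct lod_object.

```c
#include <assert.h>

/* Before: stripes_allocated is a separate full-width member. */
struct obj_before_sketch {
	void *ops;
	unsigned int stripes_allocated;
	unsigned int striping_cached:1,
		     def_striping_set:1;
	unsigned int other;	/* hypothetical trailing member */
};

/* After: stripes_allocated shares the flags' storage unit. */
struct obj_after_sketch {
	void *ops;
	unsigned int stripes_allocated:16,
		     striping_cached:1,
		     def_striping_set:1;
	unsigned int other;
};
```

On a typical LP64 ABI these come out at 24 and 16 bytes, because the 20 bytes of members in the first layout are padded up to pointer alignment. The trade-off is a cap on the counter: a 16-bit field holds at most 65535.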

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: If419560f9187a98fcb034cd9fcd5c854ff467cec
Reviewed-on: http://review.whamcloud.com/5878
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3053 tests: replay-ost-single/6 wait for delete longer
Nathaniel Clark [Thu, 28 Mar 2013 19:14:01 +0000 (15:14 -0400)]
LU-3053 tests: replay-ost-single/6 wait for delete longer

Add similar code from test 7 to test 6 to wait for delete thread and
add variable wait for kbytesfree to settle down.

Test-Parameters: testlist=replay-ost-single mdtfilesystemtype=zfs  mdsfilesystemtype=zfs ostfilesystemtype=zfs
Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: I89c1270ac5196ca43c17f3a5bd722c0555960065
Reviewed-on: http://review.whamcloud.com/5875
Tested-by: Hudson
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Li Wei <wei.g.li@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3047 tests: typo in s-q
Niu Yawei [Thu, 28 Mar 2013 06:59:37 +0000 (02:59 -0400)]
LU-3047 tests: typo in s-q

In s-q, '$LFS rmdir' should be replaced with 'rmdir'.

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: Ic828e3b2ab7204bfbf0815062a57cc42dc0e2c0b
Reviewed-on: http://review.whamcloud.com/5865
Tested-by: Hudson
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
7 years agoLU-3017 tests: sanityn: remove all previous test run data
Nathaniel Clark [Fri, 22 Mar 2013 13:12:58 +0000 (09:12 -0400)]
LU-3017 tests: sanityn: remove all previous test run data

Ensure all previous data from sanityn is removed.  Test 3 would fail
if its previous data was still present in the filesystem.

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: If4aba05e0d78b49ab5e799dea83aa5815f43f20e
Reviewed-on: http://review.whamcloud.com/5810
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Keith Mannthey <keith.mannthey@intel.com>
Tested-by: Hudson
Reviewed-by: Li Wei <wei.g.li@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-2523 racer: don't clobber DURATION
John L. Hammond [Sat, 16 Mar 2013 15:15:22 +0000 (10:15 -0500)]
LU-2523 racer: don't clobber DURATION

Don't clobber racer's DURATION if SLOW is not set to no.

Test-Parameters: envdefinitions=SLOW=yes \
 testlist=racer,racer,racer,racer,racer,racer

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I86c7f54e3fb7a06388d7083001a80b91d09ac27d
Reviewed-on: http://review.whamcloud.com/5741
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
7 years agoLU-2736 mdd: remote dir rename under the same directory.
wangdi [Wed, 8 Jan 2014 16:20:54 +0000 (08:20 -0800)]
LU-2736 mdd: remote dir rename under the same directory.

For "mv src_p/src_c tgt_p/tgt_c", where src_c is a remote directory,
if src_p and tgt_p are the same directory and tgt_c does not exist,
this should be allowed, because all the modifications will happen on
the single MDT where the parent is.
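The rule reads naturally as a predicate. Below is a sketch with hypothetical types. In the real code the checks are made on MDD objects and FIDs, not on plain integers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of "mv src_p/src_c tgt_p/tgt_c". */
struct rename_args {
	unsigned long src_parent;	/* identifier of src_p */
	unsigned long tgt_parent;	/* identifier of tgt_p */
	bool src_is_remote_dir;		/* src_c lives on another MDT */
	bool tgt_exists;		/* tgt_c already exists */
};

/* A remote directory may be renamed only within the same parent and
 * only onto a name that does not exist yet, so that just the parent's
 * MDT is modified. Local renames are not restricted by this rule. */
static bool remote_dir_rename_allowed(const struct rename_args *a)
{
	if (!a->src_is_remote_dir)
		return true;
	return a->src_parent == a->tgt_parent && !a->tgt_exists;
}
```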

Test-Parameters: mdscount=2 mdtcount=2
Signed-off-by: wang di <di.wang@intel.com>
Change-Id: Ie7945933199648aadb9dfe68f00acf32a12c824f
Reviewed-on: http://review.whamcloud.com/5294
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
7 years agoLU-3026 llite: setattr to override permission check for owner
Jinshan Xiong [Wed, 3 Apr 2013 07:12:00 +0000 (00:12 -0700)]
LU-3026 llite: setattr to override permission check for owner

Otherwise, iozone creates a file with no permissions and then fails
the mdd_permission check.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: If9f97aeebe0ff12b535dd3b4ce131eb8079c1b51
Reviewed-on: http://review.whamcloud.com/5924
Tested-by: Hudson
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-2677 obdfilter: add LMA for all OST objects
Mikhail Pershin [Mon, 25 Mar 2013 17:04:51 +0000 (21:04 +0400)]
LU-2677 obdfilter: add LMA for all OST objects

- add LMA to all OST objects so OSD may work with all object
  uniformly
- remove ff_seq and ff_obj from filter_fid because LMA contains
  lma_self_fid already
- change tools to handle these changes

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Change-Id: I699470ef73684aa05d4375da864cda35e4d5541e
Reviewed-on: http://review.whamcloud.com/5838
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3065 osd: move delete agent inode before delete entry
wangdi [Mon, 13 Jan 2014 10:19:42 +0000 (02:19 -0800)]
LU-3065 osd: move delete agent inode before delete entry

Delete the agent inode before deleting the entry, so that the ino
in the bh is not accessed after it has been freed.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: Ia750a0d2399717466d3e65865e5290ada60a7cb0
Reviewed-on: http://review.whamcloud.com/5884
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-2008 utils: make label method work with mmp enabled target
Jian Yu [Wed, 3 Apr 2013 08:46:49 +0000 (16:46 +0800)]
LU-2008 utils: make label method work with mmp enabled target

Commit cfef795 introduced label methods for ldiskfs and zfs to
make the mount utility update the label after the first successful
mount. However, if the MMP feature was enabled on an ldiskfs
target before mounting, running the mount utility on the target
will fail at ldiskfs_label_lustre() because MMP will prevent
e2label from setting the label.

This patch fixes the above issue by using "tune2fs -f -L" to force
setting the label instead of using e2label in ldiskfs_label_lustre().

Test-Parameters: envdefinitions=SLOW=yes \
clientcount=4 osscount=2 mdscount=2 \
austeroptions=-R failover=true useiscsi=true \
testlist=mmp

Signed-off-by: Jian Yu <jian.yu@intel.com>
Change-Id: I6af753d1d841da6493402c153a695eb07ee7ce5b
Reviewed-on: http://review.whamcloud.com/5867
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3083 lmv: swap layout & DNE
jcl [Mon, 1 Apr 2013 22:42:01 +0000 (00:42 +0200)]
LU-3083 lmv: swap layout & DNE

If DNE is on, the swap layout RPC must be sent to the right MDT
and not always to MDT0.

Signed-off-by: JC Lafoucriere <jacques-charles.lafoucriere@cea.fr>
Change-Id: If57ef688c7628bb15a06e1ba6905d3154a204c8d
Reviewed-on: http://review.whamcloud.com/5912
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
7 years agoLU-2907 build: Infiniband HW kernel modules of OFA builds not started
frank [Thu, 7 Mar 2013 10:37:41 +0000 (02:37 -0800)]
LU-2907 build: Infiniband HW kernel modules of OFA builds not started

Nodes installed with rhel6 and ofa (external OFED) builds fail during
the node provisioning phase due to missing connectivity to the
Infiniband fabric. The reason is that the HW kernel modules
mlx4_core, mlx4_en and mlx4_ib are not loaded (modprobe'd) during the
system boot phase.

For rhel5 an installation conflict of the startup script
'/etc/init.d/openibd' provided by the OFED kernel-ib RPM and a
distribution RPM (openibd) prohibited the installation of the
kernel-ib RPM. As a workaround, the code sections inside the
kernel-ib SPEC file that package and configure the startup script
were removed. This was accomplished by applying the ed script
'01-play-nice-with-RHEL5.ed' to the kernel-ib SPEC file.

The packaging structure of rhel6 has changed. The RPM openibd no
longer exists, so the startup of the HW kernel modules is missing
on rhel6 and the symptom of missing connectivity occurs.

The patch fixes the problem by searching (via regular expression) for
the canonical (distribution) target name within the name of the ed
script and applying the changes only if the canonical target matches
the ed script name.

ED scripts use a naming convention where the descriptive
name is followed by a ':' separated list of canonical target names.

e.g.

<descriptive-name>:<canonical-target-1>:<canonical-target-N>.ed

The string 'canonical-target' has to follow the convention used
for the variable CANONICAL_TARGET in the script lbuild.
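Matching a canonical target against this naming convention can be sketched in C as below. This only illustrates the convention: per the text, lbuild performs the match with a regular expression, and the file and target names used in the test are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if `target` equals one of the ':'-separated fields that
 * follow the descriptive name in a file name of the form
 * <descriptive-name>:<target-1>:...:<target-N>.ed */
static bool ed_script_applies(const char *fname, const char *target)
{
	size_t len = strlen(fname);
	size_t tlen = strlen(target);
	const char *p, *end;

	/* Require the ".ed" suffix; it is not part of the last target. */
	if (len < 3 || strcmp(fname + len - 3, ".ed") != 0)
		return false;
	end = fname + len - 3;

	/* The first field is the descriptive name, so start after it. */
	p = memchr(fname, ':', (size_t)(end - fname));
	while (p != NULL) {
		const char *field = p + 1;
		const char *next = memchr(field, ':', (size_t)(end - field));
		size_t flen = (size_t)((next ? next : end) - field);

		if (flen == tlen && memcmp(field, target, tlen) == 0)
			return true;
		p = next;
	}
	return false;
}
```

A script name without any target list never matches, so legacy scripts are no longer applied unconditionally.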

The original ed file for rhel5 has been renamed to a meaningful
name that complies with this new format.

Signed-off-by: frank <Frank.Heckes@intel.com>
Change-Id: Ib25071e08553d28764e02ce50756deb91f757ed0
Reviewed-on: http://review.whamcloud.com/5630
Reviewed-by: Minh Diep <minh.diep@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3057 llite: swap layout fixes - hang at sanity 56x and 56w
Jinshan Xiong [Tue, 2 Apr 2013 23:26:43 +0000 (16:26 -0700)]
LU-3057 llite: swap layout fixes - hang at sanity 56x and 56w

Two issues are fixed in this patch:
1. in ll_swap_layouts, ll_setattr() should be called with inode mutex
   held.
2. stripes should be reloaded after layout is swapped on the MDT.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: Ibdb30c78bf8642886afc7343544d7db3bcbe6726
Reviewed-on: http://review.whamcloud.com/5874
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Hudson
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3039 lfsck: misc patch for LFSCK 1.5 debts (1)
Fan Yong [Mon, 25 Feb 2013 14:19:26 +0000 (22:19 +0800)]
LU-3039 lfsck: misc patch for LFSCK 1.5 debts (1)

1) Handle the backup and restore case: add FID-in-dirent by re-inserting
   the name entry while holding the proper ldiskfs PDO lock.

2) Fix some deadlock cases between LFSCK engine thread and OI scrub
   thread: one may fall into wait without waking up the other.

3) Add lfsck performance tests for the cases: lfsck with load, lfsck
   during create, backup/restore, simulated upgrade from 1.8.

4) Other cleanup.

Test-Parameters: testlist=sanity-scrub,sanity-lfsck,lfsck-performance

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: Ib539291c604d807475cacfdd56d910e9e86d6ac7
Reviewed-on: http://review.whamcloud.com/5764
Tested-by: Hudson
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-2843 llog: restore the size of llog for -ENOSPC
Hongchao Zhang [Fri, 29 Mar 2013 10:36:55 +0000 (18:36 +0800)]
LU-2843 llog: restore the size of llog for -ENOSPC

The llog is NOT aware of its valid size. Invalid space can be left
at the tail of the llog file when the free space in the last block
(512 bytes) can't hold the next record and allocating an extra block
fails with -ENOSPC. The partially written data is then mistaken for
normal llog data!
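The fix idea, restoring the file's valid size when an append fails part-way, can be modelled with an in-memory file. Everything here (`struct llog_file_sketch`, `raw_append()`, `llog_append_record()`) is hypothetical and ignores the real llog block and header layout.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory llog on a nearly full device: at most `cap`
 * bytes can ever be allocated. */
struct llog_file_sketch {
	char data[4096];
	size_t size;	/* bytes currently considered valid */
	size_t cap;	/* space available on the "device" */
};

/* Low-level write that stores as much as fits and then reports
 * -ENOSPC, leaving a partial record behind. */
static int raw_append(struct llog_file_sketch *f, const void *buf, size_t len)
{
	assert(f->cap <= sizeof(f->data) && f->size <= f->cap);

	size_t room = f->cap - f->size;
	size_t n = len < room ? len : room;

	memcpy(f->data + f->size, buf, n);
	f->size += n;
	return n == len ? 0 : -ENOSPC;
}

/* Append a record. On -ENOSPC, restore the previous size so that a
 * reader never sees the partial tail as a valid record. */
static int llog_append_record(struct llog_file_sketch *f,
			      const void *rec, size_t len)
{
	size_t old_size = f->size;
	int rc = raw_append(f, rec, len);

	if (rc == -ENOSPC)
		f->size = old_size;
	return rc;
}
```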

Change-Id: Ie2619843b538cbb64ae21f9f2a12ff85a5a3e8b4
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Reviewed-on: http://review.whamcloud.com/5604
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-1812 fsfilt: ext_pblock renamed to ext4_ext_pblock
James Simmons [Sun, 31 Mar 2013 13:29:12 +0000 (09:29 -0400)]
LU-1812 fsfilt: ext_pblock renamed to ext4_ext_pblock

For kernels 2.6.35 and above ext_pblock was renamed to
ext4_ext_pblock. With no more RHEL5 kernel support for
ldiskfs we also clean up more macros.

see kernel commit bf89d16f6ef5389f1b9d8046e838ec87b9b3f8b9
for pblock change.

Signed-off-by: James Simmons <uja.ornl@gmail.com>
Signed-off-by: chas williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Change-Id: I3ce7f27f6fd6826380e6f2f54b2d50d09d36f78a
Reviewed-on: http://review.whamcloud.com/5001
Tested-by: Hudson
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-1812 fsfilt: ext4_free_blocks() has changed slightly
James Simmons [Thu, 28 Mar 2013 17:52:54 +0000 (13:52 -0400)]
LU-1812 fsfilt: ext4_free_blocks() has changed slightly

ext4_free_blocks() now takes a buffer_head and explicit flags instead
of just metadata.  Test the presence of the buffer_head argument to
determine which ext4_free_blocks() is available.

see kernel commit e6362609b6c71c5b802026be9cf263bbdd67a50e

Signed-off-by: James Simmons <uja.ornl@gmail.com>
Signed-off-by: chas williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Change-Id: I925df73a054613469866ec025ae412ead0ce9e56
Reviewed-on: http://review.whamcloud.com/4991
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Hudson
Reviewed-by: Christopher J. Morrone <chris.morrone.llnl@gmail.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-1199 build: Cleanup ldiskfs kernel config defines
James Simmons [Sun, 31 Mar 2013 13:12:34 +0000 (09:12 -0400)]
LU-1199 build: Cleanup ldiskfs kernel config defines

Make it more clear what the purpose of the "CONFIG_LDISKFS_*"
defines are with a comment, and separate them from options
that do not necessarily originate in the kernel source. All of
these options are needed for ldiskfs and some are needed by
osd-ldiskfs in the lustre file system. To handle these shared
options, an extra autoconf header file is created containing only
the values of interest to osd-ldiskfs.

Signed-off-by: James Simmons <uja.ornl@gmail.com>
Change-Id: I8e71b6cf471c4317b20fdce14d66f8b2883a226e
Reviewed-on: http://review.whamcloud.com/5675
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3043 build: init local variable that breaks sles11sp2 build
Bob Glossman [Wed, 27 Mar 2013 14:38:04 +0000 (07:38 -0700)]
LU-3043 build: init local variable that breaks sles11sp2 build

Fix a recently introduced build breakage by initializing a local
variable in new code.

Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I3c9c1a864e2b529b6a6e2578d0bfbfe2c920688c
Reviewed-on: http://review.whamcloud.com/5854
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
7 years agoLU-3012 ubuntu: fix client module build
Robert Read [Fri, 22 Mar 2013 02:56:00 +0000 (19:56 -0700)]
LU-3012 ubuntu: fix client module build

Add --disable-servers to the configure command when
building the client modules.

Signed-off-by: Robert Read <robert.read@intel.com>
Change-Id: Ie29bf9336c476dd0b8b04b4601d409c4bc7f90f1
Reviewed-on: http://review.whamcloud.com/5804
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Brian J. Murrell <brian.murrell@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-2988 mgs: Fix two "lctl replace_nids" resource leaks
Li Wei [Tue, 19 Mar 2013 10:26:27 +0000 (18:26 +0800)]
LU-2988 mgs: Fix two "lctl replace_nids" resource leaks

When conf-sanity 66 was run on a single machine, it failed to remove
some Lustre kernel modules in the cleanup phase:

  Modules still loaded:
  ldiskfs/ldiskfs/ldiskfs.o lustre/mdd/mdd.o lustre/mgs/mgs.o
  lustre/quota/lquota.o lustre/mgc/mgc.o lustre/fid/fid.o
  lustre/fld/fld.o lustre/ptlrpc/ptlrpc.o lustre/obdclass/obdclass.o
  lustre/lvfs/lvfs.o lnet/klnds/socklnd/ksocklnd.o lnet/lnet/lnet.o
  libcfs/libcfs/libcfs.o

Some simple experiments quickly narrowed down the bad guy to the first
"lctl replace_nids" command.  In mgs_iocontrol(), the
OBD_IOC_REPLACE_NIDS case does not destroy the lu_env, which
references several Lustre kernel modules via the keys.  This patch
fixes the leak by replacing "RETURN(rc)" with "break".
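The shape of the bug and of the fix is sketched below with hypothetical stand-ins for lu_env_init()/lu_env_fini() and the ioctl handler. The real mgs_iocontrol() uses Lustre's RETURN() macro and handles many more cases.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical reference count standing in for the module references
 * held by the lu_env keys. */
static int env_refs;

static void env_init(void) { env_refs++; }
static void env_fini(void) { env_refs--; }

enum { IOC_REPLACE_NIDS = 1, IOC_OTHER = 2 };

static int do_replace_nids(void) { return 0; }

/* Every case leaves the switch with `break`, so the common exit path
 * always runs env_fini(). An early `return rc;` inside a case would
 * skip it and leak the references. */
static int iocontrol_sketch(int cmd)
{
	int rc;

	env_init();
	switch (cmd) {
	case IOC_REPLACE_NIDS:
		rc = do_replace_nids();
		break;		/* was: return rc; */
	default:
		rc = -EINVAL;
		break;
	}
	env_fini();
	return rc;
}
```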

Local testing revealed another issue, however, after the first issue
was fixed.  unload_modules() complained about a memory leak after
removing all modules:

  LustreError: 14530:0:(class_obd.c:701:cleanup_obdclass()) obd_memory
  max: 28770011, leaked: 18446744073709551608

The leaked number is clearly suspicious.  It turned out that
mgs_replace_nids_log() frees more memory, with regard to accounting,
than it allocates.  This patch also fixes the fake leak.
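The suspicious value is exactly 2^64 - 8, the unsigned 64-bit representation of -8: the counter went 8 bytes below zero. A short check of that arithmetic follows, assuming the counter is an unsigned 64-bit integer:

```c
#include <assert.h>
#include <stdint.h>

/* Net value of an unsigned 64-bit accounting counter after
 * `allocated` bytes are charged and `freed` bytes are credited. */
static uint64_t leak_after(uint64_t allocated, uint64_t freed)
{
	uint64_t counter = 0;

	counter += allocated;
	counter -= freed;	/* wraps modulo 2^64 instead of going negative */
	return counter;
}
```

`leak_after(100, 108)` yields 18446744073709551608, the same figure cleanup_obdclass() printed. This shows the "leak" is really an over-free in the accounting.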

To prevent regressions, this patch adds error checking to the
cleanup() call in conf-sanity 66.

Change-Id: Ia3b1531b558a2a12947ff9a783b383962ae5da78
Signed-off-by: Li Wei <wei.g.li@intel.com>
Reviewed-on: http://review.whamcloud.com/5765
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Artem Blagodarenko <artem_blagodarenko@xyratex.com>
Reviewed-by: Emoly Liu <emoly.liu@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-2467 ptlrpc: Fix an unideal symbol export
Li Wei [Tue, 5 Feb 2013 07:30:40 +0000 (15:30 +0800)]
LU-2467 ptlrpc: Fix an unideal symbol export

The "suppress_pings" symbol is a little bit too short to export.  This
patch exports a function, ptlrpc_pinger_suppress_pings(), instead.
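A minimal sketch of the pattern follows, assuming an `int` return type for the accessor (the real signature may differ). The variable becomes file-local and only the prefixed function is exported.

```c
#include <assert.h>

/* Prototype as it might appear in a shared header. */
int ptlrpc_pinger_suppress_pings(void);

/* Module-private setting: no longer visible under a short, generic
 * global name that could clash with other symbols. */
static int suppress_pings;

/* Subsystem-prefixed accessor; in the kernel it would be followed by
 * EXPORT_SYMBOL(ptlrpc_pinger_suppress_pings). */
int ptlrpc_pinger_suppress_pings(void)
{
	return suppress_pings;
}
```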

Change-Id: Ifdcf5d2459baa7ae2709572a2fd4d02b72e440df
Signed-off-by: Li Wei <wei.g.li@intel.com>
Reviewed-on: http://review.whamcloud.com/5270
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Emoly Liu <emoly.liu@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-3011 ubuntu: Fix build failures on Ubuntu 12.04
Robert Read [Fri, 22 Mar 2013 18:32:13 +0000 (11:32 -0700)]
LU-3011 ubuntu: Fix build failures on Ubuntu 12.04

Fix "set-but-unused" warning by using the variable,
and move -lreadline to end of link command line.

Signed-off-by: Robert Read <robert.read.@intel.com>
Change-Id: I676e319ed81dbb6ba41d039e7b075b02d5122b48
Reviewed-on: http://review.whamcloud.com/5803
Reviewed-by: Alexey Shvetsov <alexxy@gentoo.org>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-1330 obdclass: split client-server mount routines
Liu Xuezhao [Mon, 17 Dec 2012 15:31:26 +0000 (23:31 +0800)]
LU-1330 obdclass: split client-server mount routines

Move server side mount routines to obd_mount_server.c.  Const correct
several server_name2xxx type functions.

Signed-off-by: Liu Xuezhao <xuezhao.liu@emc.com>
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I8abdb6fdd0411f2e75f6fb6ee4ff8502e50ef213
Reviewed-on: http://review.whamcloud.com/2672
Tested-by: Hudson
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Peng Tao <bergwolf@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-398 ptlrpc: NRS framework follow-up patch
Nikitas Angelinas [Fri, 1 Feb 2013 10:58:58 +0000 (10:58 +0000)]
LU-398 ptlrpc: NRS framework follow-up patch

This patch addresses some outstanding issues that had been raised
by reviewers of the "Add the NRS framework and FIFO policy" patch,
and includes some other improvements: it reworks the API
slightly in order to optimize some frequently-used operations,
does not unnecessarily include policies in liblustre and client-only
kernel builds, and makes sure we hold module references when
required, for policies registering from other modules.

Signed-off-by: Nikitas Angelinas <nikitas_angelinas@xyratex.com>
Change-Id: I9306d43e2aef20aa64d6870a56ae99859ce40cd5
Oracle-bug-id: b=13634
Xyratex-bug-id: MRP-73
Reviewed-on: http://review.whamcloud.com/5274
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-2442 kernel: protect i_dquot with i_lock
Lai Siyao [Wed, 9 Jan 2013 07:57:47 +0000 (15:57 +0800)]
LU-2442 kernel: protect i_dquot with i_lock

Remove dqptr_sem (but keep it in struct quota_info to keep the kernel
ABI unchanged); the functionality of this lock is implemented by
other locks:
* i_dquot is protected by i_lock; however, i_lock protects only this
  pointer, while the content of the dquot struct is protected by
  dq_data_lock.
* Q_GETFMT is now protected with dqonoff_mutex instead of dqptr_sem.
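Below is a userspace model of the split locking rule, with a tiny C11 spinlock standing in for spinlock_t. The names are hypothetical, and nesting dq_data_lock inside i_lock is an assumption of this sketch, not a statement about the kernel patch's lock ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Minimal spinlock standing in for the kernel's spinlock_t. */
typedef struct {
	atomic_flag locked;
} spinlock_sketch_t;

static void spin_lock_sketch(spinlock_sketch_t *l)
{
	while (atomic_flag_test_and_set_explicit(&l->locked,
						 memory_order_acquire))
		;	/* busy-wait */
}

static void spin_unlock_sketch(spinlock_sketch_t *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

struct dquot_sketch {
	long long dq_curspace;
};

struct inode_sketch {
	spinlock_sketch_t i_lock;	/* guards i_dquot, the pointer only */
	struct dquot_sketch *i_dquot;
};

/* Guards the contents of every dquot_sketch. */
static spinlock_sketch_t dq_data_lock = { ATOMIC_FLAG_INIT };

/* Charge `bytes` to the inode's dquot. Returns 1 if a dquot was
 * attached, 0 otherwise. */
static int charge_space(struct inode_sketch *inode, long long bytes)
{
	struct dquot_sketch *dq;

	spin_lock_sketch(&inode->i_lock);	/* pin the pointer */
	dq = inode->i_dquot;
	if (dq != NULL) {
		spin_lock_sketch(&dq_data_lock);	/* guard the contents */
		dq->dq_curspace += bytes;
		spin_unlock_sketch(&dq_data_lock);
	}
	spin_unlock_sketch(&inode->i_lock);
	return dq != NULL;
}
```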

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: I6be343fb7e431bb3b0ce68066a36f621ebdd9df5
Reviewed-on: http://review.whamcloud.com/5010
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-2189 tests: ensure sync stability before sanityn/36
Nathaniel Clark [Thu, 14 Mar 2013 18:41:28 +0000 (14:41 -0400)]
LU-2189 tests: ensure sync stability before sanityn/36

Encourage data to be fully sync'd before getting new measurements.

Test-Parameters: testlist=sanityn  mdtfilesystemtype=zfs mdsfilesystemtype=zfs ostfilesystemtype=zfs
Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: I42be5714b59cd4c22c65d7524f2b71ec0a07dfa4
Reviewed-on: http://review.whamcloud.com/5722
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Li Wei <wei.g.li@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years agoLU-2790 ldlm: handle lvbo_init failure in ldlm_resource_get
Fan Yong [Tue, 19 Feb 2013 00:42:58 +0000 (08:42 +0800)]
LU-2790 ldlm: handle lvbo_init failure in ldlm_resource_get

In some special cases, such as RAM pressure, lvbo_init()
may fail. The caller, ldlm_resource_get(), should then
handle the failure to prevent subsequent operations from
using a non-existent resource.
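The error-handling pattern can be sketched as follows. This is a hypothetical, heavily simplified model: real LDLM resources are hashed and refcounted, and real error reporting differs. The point is that a failed lvbo_init() must not leave the caller holding an unusable resource.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, much-simplified resource. */
struct resource_sketch {
	int refcount;
	int lvb_ready;
};

/* Simulated LVB initialisation; `fail` forces the error path, as
 * memory pressure might. */
static int lvbo_init_sketch(struct resource_sketch *res, int fail)
{
	if (fail)
		return -ENOMEM;
	res->lvb_ready = 1;
	return 0;
}

/* Create a resource. If LVB initialisation fails, drop the
 * half-initialised resource and report the error instead of handing
 * it to the caller. */
static struct resource_sketch *resource_get_sketch(int fail, int *rcp)
{
	struct resource_sketch *res = calloc(1, sizeof(*res));
	int rc;

	if (res == NULL) {
		*rcp = -ENOMEM;
		return NULL;
	}
	res->refcount = 1;

	rc = lvbo_init_sketch(res, fail);
	if (rc != 0) {
		free(res);	/* release the only reference */
		*rcp = rc;
		return NULL;
	}
	*rcp = 0;
	return res;
}
```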

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I0eabbf5daaaba9aa163a45f24b6b621477ec4d32
Reviewed-on: http://review.whamcloud.com/5699
Tested-by: Hudson
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years ago LU-3068 build: fix 'incorrect expression' errors
Sebastien Buisson [Fri, 29 Mar 2013 13:30:48 +0000 (14:30 +0100)]
LU-3068 build: fix 'incorrect expression' errors

Fix 'incorrect expression' defects found by Coverity version 6.5.1:
* Array compared against 0 (NO_EFFECT):
  Comparing an array to null is not useful.
* Copy-paste error (COPY_PASTE_ERROR):
  This line looks like a copy-paste error.
* Self assignment (NO_EFFECT):
  Assignment operation has no effect.
* Side effect in assertion (ASSERT_SIDE_EFFECT):
  Assignment has a side effect. This code will work differently in a
  non-debug build. You might have intended to use a comparison instead.
* Wrong sizeof argument (SIZEOF_MISMATCH):
  Passing argument is suspicious.

Signed-off-by: Sebastien Buisson <sebastien.buisson@bull.net>
Change-Id: Icf6ea9632da6159beca0fd9fd3ff9bb57effc305
Reviewed-on: http://review.whamcloud.com/5887
Tested-by: Hudson
Reviewed-by: John Hammond <johnlockwoodhammond@gmail.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years ago LU-3066 llite: fix crash in fdatasync(directory)
Dmitry Eremin [Fri, 29 Mar 2013 11:22:21 +0000 (15:22 +0400)]
LU-3066 llite: fix crash in fdatasync(directory)

A kernel NULL pointer dereference occurs when fdatasync() is called on
a directory:

    fd = open("/mnt/lustre", O_RDONLY|O_NONBLOCK|O_DIRECTORY);
    fdatasync(fd);
    close(fd);

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Change-Id: Ib14b25d1694131e1a65373654008b7f337ce959e
Reviewed-on: http://review.whamcloud.com/5886
Tested-by: Hudson
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years ago LU-3035 clio: improper LASSERT in ll_file_io_generic()
Niu Yawei [Thu, 28 Mar 2013 03:20:57 +0000 (23:20 -0400)]
LU-3035 clio: improper LASSERT in ll_file_io_generic()

This LASSERT was introduced by the fix for LU-2910 and is
incorrect, since crw_count can be changed in
lov_io_rw_iter_init() even if no read or write has been done.

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: Ib2cee88f0a7f75f8fb63330912aa053fb5b9393e
Reviewed-on: http://review.whamcloud.com/5864
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Tested-by: Hudson
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years ago LU-2909 osc: flush until no dirty in osc_enter_cache()
Niu Yawei [Fri, 15 Mar 2013 07:48:43 +0000 (03:48 -0400)]
LU-2909 osc: flush until no dirty in osc_enter_cache()

In osc_enter_cache(), when there is high contention on grant, the
returned grant can be consumed immediately by other processes, so we
should keep flushing until we get enough grant or there is no dirty
data left to flush. Otherwise, an mmap writer can easily get -EDQUOT
from osc_enter_cache(), which ultimately results in SIGBUS.

Because osc_enter_cache() now flushes asynchronously, the wakeup
condition is changed accordingly.

This patch also temporarily disables the osc LRU shrinker in
osc_io_unplug() to avoid a potential stack overrun problem.
See LU-2859.

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: I0c7c90ffe27dab6ded7ad07ed78017acb8665d59
Reviewed-on: http://review.whamcloud.com/5732
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Keith Mannthey <keith.mannthey@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years ago LU-2388 statahead: race in do_sa_entry_fini()
Lai Siyao [Tue, 26 Mar 2013 08:12:11 +0000 (16:12 +0800)]
LU-2388 statahead: race in do_sa_entry_fini()

Two fixes:
* When iterating over the sa_entry list in do_sa_entry_fini(), no lock
  is held, which may cause an entry to be put twice. To fix this, all
  entries are kept in one list, and only the 'scanner' drops entries
  from this list.
* An sa_entry may be linked to sai_sent_entries but not yet hashed; if
  ll_sa_entry_fini() is called at that moment, it may try to unhash
  this sa_entry even though it has not been hashed.

Also includes minor cleanups:
* rename do_sai_entry_fini() to do_sa_entry_fini().
* rename do_sai_entry_to_stated() to do_sa_entry_to_stated().
* rename do_statahead_interpret() to ll_post_statahead() to
  distinguish it from ll_statahead_interpret().
* ll_post_statahead() now always post-processes statahead entries
  from the received list, to simplify the logic.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Change-Id: I3d0911d0bd3b940c9650473099604646408200c4
Reviewed-on: http://review.whamcloud.com/5842
Tested-by: Hudson
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Peng Tao <bergwolf@gmail.com>
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years ago LU-1199 build: Relocate missed lbuild-fc18
Christopher J. Morrone [Thu, 28 Mar 2013 17:29:25 +0000 (10:29 -0700)]
LU-1199 build: Relocate missed lbuild-fc18

lbuild-fc18 was missed in the move of the other lbuild files
to contrib/lbuild.  This relocates that file as well.

Change-Id: I2775f1e0aa4c7d17d2e1d8a114f2bea3702fec68
Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
Reviewed-on: http://review.whamcloud.com/5872
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years ago LU-3024 test: use facet name in init_param_vars
John L. Hammond [Mon, 25 Mar 2013 16:15:18 +0000 (11:15 -0500)]
LU-3024 test: use facet name in init_param_vars

When configuring jobstats, init_param_vars should call
set_conf_param_and_check with the facet name (client) rather than the
hostname.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Change-Id: I00b99ec3172d3e108fb6dd6d94825badf81e84df
Reviewed-on: http://review.whamcloud.com/5835
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years ago LU-2826 tests: Re-enable sanity/18
Nathaniel Clark [Thu, 14 Mar 2013 18:28:32 +0000 (14:28 -0400)]
LU-2826 tests: Re-enable sanity/18

Due to ZFS work in LU-2449, sanity/18 should now work correctly.

Test-Parameters: mdtfilesystemtype=zfs mdsfilesystemtype=zfs  ostfilesystemtype=zfs testlist=sanity
Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: I5d9d986b00a36a2b5f30ef9b4ff3d52779070498
Reviewed-on: http://review.whamcloud.com/5720
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years ago LU-2900 llite: Null deref in ll_fsync:mkdir on NFSmounted Lus
Patrick Farrell [Mon, 18 Mar 2013 16:09:46 +0000 (11:09 -0500)]
LU-2900 llite: Null deref in ll_fsync:mkdir on NFSmounted Lus

When a Lustre file system is mounted via NFS and a mkdir
operation is attempted, a NULL pointer dereference occurs
in ll_fsync.

With 2.x, Lustre added support for different VFS fsync APIs
that do not include a dentry parameter.

To make the logic the same in all cases, the old ll_fsync
interface was changed to pull the inode from f_dentry
in the *file parameter.

In some cases when using the old ll_fsync interface, the
caller does not set the f_dentry part of the *file parameter,
resulting in a NULL dereference. The fix is to restore the
old logic in those cases: when a dentry parameter is
provided, get the inode from that parameter rather than from
the file parameter.

Signed-off-by: Patrick Farrell <paf@cray.com>
Change-Id: I93ecf04e23121c76e571383d74e2fc902565614e
Reviewed-on: http://review.whamcloud.com/5585
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Keith Mannthey <keith.mannthey@intel.com>
Reviewed-by: Cory Spitz <spitzcor@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
7 years ago LU-3038 mgs: clear mti in the end of marker
wangdi [Tue, 26 Mar 2013 19:56:43 +0000 (12:56 -0700)]
LU-3038 mgs: clear mti in the end of marker

The target information should be cleared when the end marker
of each mdc/osc is reached. Otherwise, information left over in
mti will be carried into the config log of the next target; even
worse, some targets might be wrongly identified as failover pairs.

Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I67838914825caf0a9b4da8bc3cdfbb779a6eadd4
Reviewed-on: http://review.whamcloud.com/5851
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>