git://git.whamcloud.com - fs/lustre-release.git/log

LU-12200 lnet: check peer timeout on a router

On a router, assume that a peer is alive and attempt to send it
messages as long as the peer_timeout hasn't expired.

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I0806a52c8ad7acc1c93dcf32353f1c4467c618b1
Reviewed-on: https://review.whamcloud.com/34772
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins

LU-12053 lnet: look up MR peers routes

An MR peer can have multiple interfaces, some of which we might
have a route to. The primary NID of the peer is not necessarily on
a network we have a route to. When looking up a route, we must
iterate over all the nets the peer is on and select one that we
can route to. Since the peer can exist on multiple routed networks,
a simple round-robin algorithm is also used to iterate over all the
networks on which we can reach the peer.
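
A minimal sketch of the round-robin lookup described above, in C. It
assumes a simplified peer that carries a parallel array saying whether
we have a route to each of its nets; struct sketch_peer and
sketch_select_route_net() are hypothetical names, not the LNet data
structures or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of an MR peer: the nets its
 * interfaces are on, whether we have a route to each of them, and
 * a cursor that remembers where the last search stopped. */
struct sketch_peer {
	const unsigned int *nets;  /* net IDs the peer has interfaces on */
	const bool *routable;      /* routable[i]: we have a route to nets[i] */
	size_t nnets;
	size_t rr_cursor;          /* index where the next search starts */
};

/*
 * Return the index of the next net we can route to, starting at the
 * round-robin cursor, or -1 if none of the peer's nets is routable.
 * The cursor moves past the chosen net, so consecutive lookups cycle
 * through every routable net instead of always picking the first.
 */
static long sketch_select_route_net(struct sketch_peer *peer)
{
	assert(peer != NULL);
	for (size_t i = 0; i < peer->nnets; i++) {
		size_t idx = (peer->rr_cursor + i) % peer->nnets;

		if (peer->routable[idx]) {
			peer->rr_cursor = (idx + 1) % peer->nnets;
			return (long)idx;
		}
	}
	return -1;
}
```

With nets {10, 20, 30, 40} of which only 20 and 40 are routable,
successive lookups return indexes 1, 3, 1, and so on.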

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I0651dd4f732c8b71872f73cf2512b08f34129bd9
Reviewed-on: https://review.whamcloud.com/34625
Tested-by: Jenkins

LU-11299 lnet: discover each gateway Net

Wake up once every gateway aliveness interval divided by the number
of local networks, and discover each local gateway network in round
robin.

This ensures that the gateway keeps its networks up.
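
The wake-up arithmetic can be sketched as follows, assuming the
interval is expressed in whole seconds. Both helpers are hypothetical
illustrations, not LNet functions: with a 60 second aliveness interval
and 3 local networks, the monitor would wake every 20 seconds and
discover one network per wake-up.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: seconds between monitor wake-ups. The gateway
 * aliveness interval is split evenly across the local networks, so
 * each network is discovered about once per full interval. The
 * one-second floor is an assumption of this sketch.
 */
static unsigned int sketch_wakeup_period(unsigned int alive_interval,
					 size_t nlocal_nets)
{
	unsigned int period;

	assert(nlocal_nets > 0);
	period = alive_interval / (unsigned int)nlocal_nets;
	return period > 0 ? period : 1;
}

/* Return the local net to discover on this wake-up and advance the
 * round-robin cursor to the next one. */
static size_t sketch_next_net(size_t *cursor, size_t nlocal_nets)
{
	size_t net;

	assert(nlocal_nets > 0);
	net = *cursor % nlocal_nets;   /* tolerate a shrunken net list */
	*cursor = (net + 1) % nlocal_nets;
	return net;
}
```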

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I4035e39c286cb599d4eb8f9df7ed5d278e6d744a
Reviewed-on: https://review.whamcloud.com/34511
Tested-by: Jenkins
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>

LU-11299 lnet: net aliveness

If a router is discovered on any interface on the network, update
the network's last-alive time and set the NI's status to UP.
If a router isn't discovered on any interface on a network,
change the status of all the interfaces on that network to DOWN.
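
A sketch of the per-net update, assuming a simplified net that tracks
a last-alive timestamp and one status per local NI. It also assumes
that "the NI" in the first rule means the interface over which the
router was discovered. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

enum sketch_ni_status { SKETCH_NI_DOWN, SKETCH_NI_UP };

/* Hypothetical, simplified local network with its NIs. */
struct sketch_net {
	time_t last_alive;                 /* last time a router was seen */
	enum sketch_ni_status *ni_status;  /* one entry per NI on the net */
	size_t nnis;
};

/*
 * discovered_ni is the index of the NI over which a router was
 * discovered during this check, or -1 if no router was discovered
 * on any interface of this net.
 */
static void sketch_update_net_aliveness(struct sketch_net *net,
					long discovered_ni, time_t now)
{
	if (discovered_ni >= 0) {
		assert((size_t)discovered_ni < net->nnis);
		net->last_alive = now;
		net->ni_status[discovered_ni] = SKETCH_NI_UP;
		return;
	}
	/* No router reachable from this net: every NI on it goes down. */
	for (size_t i = 0; i < net->nnis; i++)
		net->ni_status[i] = SKETCH_NI_DOWN;
}
```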

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I1d67eb4b3284ccb8306ad4c877a2fcbdf4958d8c
Reviewed-on: https://review.whamcloud.com/34510
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins

LU-11664 lnet: push router interface updates

A router can bring its interfaces up or down if it hasn't received any
messages on an interface for a configurable period
(alive_router_ping_timeout). When this event occurs, the router can now
push its status change to the peers it's talking to in order to inform
them of the change. This allows router users to handle asymmetric
router failures more quickly.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I9530ed7d9bc0a86edc43e3f610cc943f1732dcfd
Reviewed-on: https://review.whamcloud.com/33651
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins

LU-11297 lnet: set gw sensitivity from lnetctl

Add an optional parameter to the "lnetctl route add" command to set
the health sensitivity of the gateway:
lnetctl route add --net <net> --gateway <gw> --sensitivity <value>

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Iee120c78a41b79da6ab6bdf1560f558df89233e2
Reviewed-on: https://review.whamcloud.com/33635
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins

LU-11297 lnet: handle router health off

Routing infrastructure depends on the health infrastructure to manage
route status. However, health can be turned off. Therefore, we need
to enable health for gateways in order to monitor them properly.
Each peer now has its own health sensitivity. When adding a route,
the gateway's health sensitivity can be explicitly set from lnetctl.
If it is not specified, it defaults to 1, which turns health on for
that gateway and allows peer NI recovery if there is a failure.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ibae33d595e97d0eec432ae8f5d51898ce0776f01
Reviewed-on: https://review.whamcloud.com/33634
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins

LU-11641 lnet: handle discovery off

When discovery is turned off locally, or when the peer either has
discovery off or doesn't support MR at all, degrade discovery
behavior to a standard ping. This allows routers to continue
using the discovery mechanism even if discovery is turned off.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I7f0829d37cbff2bf9e41de251efa715fc4c97e5d
Reviewed-on: https://review.whamcloud.com/33620
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins

LU-11470 lnet: drop all rule

Add a rule to drop all messages arriving on a specific interface.
This is useful for simulating failures on a specific router interface.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ic69f683fb2caf7a69a1d85428878c89b7b1ee3ad
Reviewed-on: https://review.whamcloud.com/33305
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins

LU-11478 lnet: misleading discovery seqno.

There is a sequence number used when sending discovery messages. This
sequence number is intended to detect stale messages. However, it
can be misleading if the peer reboots: in that case the peer's
sequence number is reset. The node will think that all information
being sent to it is stale, while in reality the peer might have
changed its configuration.

There is no reliable way to know whether a peer rebooted, so we
always assume that the messages we receive are valid and operate
on a first-come, first-served basis.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I421a00e47bc93ee60fa37c648d6d9a726d9def9c
Reviewed-on: https://review.whamcloud.com/33304
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins

LU-11477 lnet: handle health for incoming messages

For routers (as well as in the general case), it's important to
update the health of the ni/lpni for incoming messages. For an lpni
specifically, receiving a message from it is how we know that the
lpni is up.

A minimum router health percentage is required in order to send a
message to a gateway. It defaults to 100, meaning that a router
interface has to be fully healthy in order to send to it. This
matches the current behavior. So if a router interface goes down and
its health drops significantly, but then it comes back up again
(either we receive a message from it, or we discover it and get a
reply), then in order to start using that router interface again we
have to boost its health all the way up to the maximum.

This behavior is special-cased for routers.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ida6c23f95dbef56c2e6ed7b6d03743939d8b30a0
Reviewed-on: https://review.whamcloud.com/33301
Tested-by: Jenkins

LU-11475 lnet: transfer routers

When the primary NID of a peer is about to be deleted because
it's being transferred to another peer, and that peer is a gateway,
transfer all gateway properties to the new peer.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ib475c389ca5630906416a5112b3088f6f5d03950
Reviewed-on: https://review.whamcloud.com/34539
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins

LU-11475 lnet: allow deleting router primary_nid

Discovery doesn't allow deleting the primary_nid of a peer. This
restriction is necessary because upper layers only know how to reach
the peer through its primary_nid. For routers this is not the case, so
if a router changes its interfaces and comes back up again, the
peer_ni should be adjusted.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I9da056172f35a5f15eed5ba0e02fcb37ac414c54
Reviewed-on: https://review.whamcloud.com/33300
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins

LU-11300 lnet: consider alive_router_check_interval

Consider router_check_interval when waking up the monitor thread,
to make sure the monitor thread is woken up at the earliest possible
time.
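
A sketch of the idea, assuming the monitor thread otherwise sleeps for
a recovery interval; the helper and both parameter names are
hypothetical, not the LNet monitor thread code.

```c
#include <assert.h>

/*
 * Hypothetical: the monitor thread sleeps for the shorter of its
 * recovery interval and the router check interval, so whichever
 * task is due first is not serviced late.
 */
static unsigned int sketch_monitor_timeout(unsigned int recovery_interval,
					   unsigned int router_check_interval)
{
	assert(recovery_interval > 0 && router_check_interval > 0);
	return router_check_interval < recovery_interval ?
	       router_check_interval : recovery_interval;
}
```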

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ibc4b53886b59a9bc174a29d0da711ac77db3a62c
Reviewed-on: https://review.whamcloud.com/33298
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: Jenkins

LU-11378 lnet: MR aware gateway selection

When selecting a route, use the Multi-Rail selection algorithm to
select the best available peer_ni of the best route. The selected
peer_ni can then be used to send the message, or as the target of
discovery if the gateway peer needs discovering.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I376af57611591eed2eb1edb80a1b3a68b5aefd19
Reviewed-on: https://review.whamcloud.com/33188
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins

LU-11299 lnet: use discovery for routing

Instead of reinventing the wheel, routing now uses discovery.
The router is discovered once every router interval. This
updates the router information locally and also lets the
router know that the peer is alive.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I211bf15af0b0a5d50f9e2a69a385419a1dd5096b
Reviewed-on: https://review.whamcloud.com/33454
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins

LU-11299 lnet: modify lnd notification mechanism

The LND notifies LNet when a peer is up or down. If the LND notifies
LNet that the peer is up and sets the "reset" flag to true, this
indicates that the LND knows about the health of the peer and is
telling LNet that the peer is fully healthy. In that case LNet sets
the health value of the peer to the maximum; otherwise it increments
the health by one.

If the LND notifies LNet that the peer is down, LNet decrements the
health of the peer by the configured sensitivity value.

LNet then rechecks the peer's aliveness and, if the peer is dead,
notifies the LND. This code is only used by socklnd, because it
needs to tear down connections. This is in keeping with the original
functionality.
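
The health adjustments above can be sketched as a pure function.
SKETCH_MAX_HEALTH is an assumed stand-in for LNet's maximum health
value, and the clamping at both ends is an assumption; the function is
a hypothetical illustration, not the LNet lnet_notify() code.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed stand-in for LNET_MAX_HEALTH_VALUE. */
#define SKETCH_MAX_HEALTH 1000

/*
 * Apply one LND up/down notification to a peer NI's health value.
 * Clamping to [0, SKETCH_MAX_HEALTH] is an assumption of this sketch.
 */
static int sketch_apply_lnd_notify(int health, bool alive, bool reset,
				   int sensitivity)
{
	assert(sensitivity >= 0);
	if (alive) {
		if (reset)      /* LND vouches that the peer is fully healthy */
			return SKETCH_MAX_HEALTH;
		return health < SKETCH_MAX_HEALTH ? health + 1 : SKETCH_MAX_HEALTH;
	}
	/* Peer reported down: decrement by the configured sensitivity. */
	health -= sensitivity;
	return health > 0 ? health : 0;
}
```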

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ifa614405fb0c2cd4f6bcb1a2a97e856320eb6cbe
Reviewed-on: https://review.whamcloud.com/33453
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: Jenkins

LU-11299 lnet: Cleanup rcd

Clean up all code pertaining to rcd, since the routing code will use
discovery going forward and will no longer need its own pinging
code.

test_215 looks at the routers file, whose format has changed.
Update the test to reflect the change.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: If31caa3b5703df40b6ae0f758f2fe764991aa4f3
Reviewed-on: https://review.whamcloud.com/33187
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: Jenkins

LU-11300 lnet: simplify lnet_handle_local_failure()

Pass the struct lnet_ni to lnet_handle_local_failure() instead of the
message structure, since nothing else from the message is used.
This also makes it symmetrical with lnet_handle_remote_failure().

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I10146ec5bf5f378e28a7725382f00132ada32c6e
Reviewed-on: https://review.whamcloud.com/33452
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: Jenkins

LU-11300 lnet: router aliveness

A route is considered alive if the gateway is able to route
messages from the local to the remote net. That means that
at least one of the network interfaces on the remote net of
the gateway is viable.

Introduced the concept of a sensitivity percentage, which defaults
to 100%. It has a dual meaning:
1. A route is considered alive if the health of at least one of its
interfaces is >= LNET_MAX_HEALTH_VALUE * router_sensitivity_percentage / 100.
A value of 100 means at least one interface has to be 100% healthy.
2. On a router, a peer_ni is considered dead if its health is not at least
LNET_MAX_HEALTH_VALUE * router_sensitivity_percentage / 100.
A value of 100 means the interface has to be 100% healthy.
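
As a worked example, if the maximum health were 1000 and the
percentage 90, the threshold would be 1000 * 90 / 100 = 900. A sketch
in C, where SKETCH_MAX_HEALTH is an assumed stand-in for
LNET_MAX_HEALTH_VALUE and both helpers are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed stand-in for LNET_MAX_HEALTH_VALUE. */
#define SKETCH_MAX_HEALTH 1000

/* An interface passes if its health reaches the percentage-derived
 * threshold (integer arithmetic, as in the formula above). */
static bool sketch_ni_alive(int health, unsigned int sensitivity_pct)
{
	return health >= SKETCH_MAX_HEALTH * (int)sensitivity_pct / 100;
}

/* A route is alive if at least one gateway interface on the remote
 * net meets the threshold. */
static bool sketch_route_alive(const int *remote_health, size_t n,
			       unsigned int sensitivity_pct)
{
	assert(sensitivity_pct <= 100);
	for (size_t i = 0; i < n; i++)
		if (sketch_ni_alive(remote_health[i], sensitivity_pct))
			return true;
	return false;
}
```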

Re-implemented lnet_notify() to decrement the health of the
peer interface if the LND reports a failure on that peer.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ie97561fb70bf6a558bc90fa9266a6ba38fa3d293
Reviewed-on: https://review.whamcloud.com/33185
Tested-by: Jenkins

LU-11300 lnet: peer aliveness

Peer NI aliveness is now solely dependent on the health
infrastructure. With the addition of router_sensitivity_percentage,
a peer NI is considered dead if its health drops below the specified
percentage of the maximum health. Setting the percentage to 100% means
that a peer_ni is considered dead if its interface is less than
fully healthy.

Removed obsolete code that queried the peer NI every second, since
the health infrastructure introduces a recovery mechanism that
is designed to recover the health of peer NIs.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I506060fbb66c74295808891b689d7d634dc69284
Reviewed-on: https://review.whamcloud.com/33186
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins

LU-11300 lnet: Cache the routing feature

When processing a REPLY or a PUSH for discovery, cache whether
the routing feature is enabled or disabled, as reported by the
peer.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I69bd41fade196773af0e1004c2e7fff2fb91392d
Reviewed-on: https://review.whamcloud.com/33451
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins

LU-11300 lnet: cache ni status

When processing the data in the PUSH or the REPLY, make sure to cache
the ns_status. This is the status of the peer_ni as reported by the
peer itself.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I14de2460f578fb7f47d329a97b8833f49c569b74
Reviewed-on: https://review.whamcloud.com/33450
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins

LU-11300 lnet: configure lnet router sensitivity

Allow router_sensitivity_percentage to be configured from the
user-space utility lnetctl.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: If5440f30881361ebb06dafa9cadb7cbc2b934f93
Reviewed-on: https://review.whamcloud.com/33455
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins

LU-11300 lnet: router sensitivity

Introduce the router_sensitivity_percentage module parameter to
control the sensitivity of routers to failures. It defaults to 100%
which means a router interface needs to be fully healthy in order
to be used.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I3e9333033f049918c1cdca58a72604c71884acbe
Reviewed-on: https://review.whamcloud.com/33449
Tested-by: Jenkins
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>

LU-11551 lnet: Do not allow deleting of router nis

Check the peer before deleting a peer_ni. If the peer is a router,
do not allow deletion of the peer_ni.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I372052b4e9b5af3a8f18a49676fc60b4c8077cbd
Reviewed-on: https://review.whamcloud.com/33448
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins

LU-11299 lnet: lnet_add/del_route()

Reimplemented lnet_add_route() and lnet_del_route() to use
the peer instead of the peer_ni.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I3734098a81ab18d1d74220c691d96a9b9817e6da
Reviewed-on: https://review.whamcloud.com/33184
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
Reviewed-by: Chris Horn <hornc@cray.com>

LU-11298 lnet: use peer for gateway

The routing code uses a peer_ni for a gateway. However, with Multi-Rail
a gateway could have multiple interfaces on several different
networks. Instead of using a single peer_ni as the gateway, we should
use the peer and let the MR selection code select the best
peer_ni to send to.

This patch moves the gateway from peer_ni to peer. Much of the
code needs to be rewritten in the following patches to account
for that change. This patch disables the routing features by
disabling the code to add/delete routes.

The asymmetric routing detection feature is also modified to
use MR routing.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ia7dab552268c4a7fbd7b88122b9a95363d155fd7
Reviewed-on: https://review.whamcloud.com/33183
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins

LU-11292 lnet: Discover routers on first use

Discover routers on first use. This brings the behavior when
interacting with routers in line with the behavior when dealing
with normal peers.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I8527e41daf2f5f6ab5f04aac1285aaa6cc4ee594
Reviewed-on: https://review.whamcloud.com/33182
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>

LU-10153 lnet: remove route add restriction

Remove the restriction on adding routes to the same remote network
via two different gateways.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Iefc5aa10f73e9e7bdd283f5e933fbb8ee819df50
Reviewed-on: https://review.whamcloud.com/33447
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins

LU-12339 lnet: select LO interface for sending

In the following scenario:

1. Lustre calls LNetPrimaryNID() with 0@lo.
2. Discovery is initiated on 0@lo.
3. The peer is created with 0@lo and <addr>@<net>.
4. The interface health of the peer's <addr>@<net> is decremented.
5. LNetPut() is called to send to self.
6. The selection algorithm selects 0@lo to send to.

This exposes an issue where we try to go through the peer credit
management algorithm, but because there are no credits associated with
0@lo, we end up queuing the message indefinitely. ptlrpc then gets
stuck waiting for send completion on the message.

This was exposed by conf-sanity test 32a.

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I98e9d3428b594a0d041d27d8e8d8de7596825edc
Reviewed-on: https://review.whamcloud.com/34957
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins

LU-12199 lnet: verify msg is committed for send/recv

Before performing a health check, make sure the message
is committed for either send or receive. Otherwise, we
can just finalize it.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Id7bd956f8e81e60a2d63059730973f851d4c7abe
Reviewed-on: https://review.whamcloud.com/34797
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>

LU-12199 lnet: Ensure md is detached when msg is not committed

It's possible for lnet_is_health_check() to return "true" when the
message has not hit the network. In this situation the message is
freed without detaching the MD. As a result, requests do not receive
their unlink events and these requests are stuck forever.

A little cleanup is included here:
- The value of lnet_is_health_check() is only used in one place, so
   we don't need to save the result of it in a variable.
- We don't need separate logic to detach the md when the send was
   successful. We'll fall through to the finalizing code after
   incrementing the health counters.

Test-Parameters: forbuildonly
Cray-bug-id: LUS-7239
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I6301d491090b862d016eed3aac8afd7be8685e57
Reviewed-on: https://review.whamcloud.com/34885
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>

LU-12264 lnet: Protect lp_dc_pendq manipulation with lp_lock

Protect the peer discovery queue from concurrent manipulation by
acquiring the lp_lock.

Test-Parameters: forbuildonly
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: If43b877c1c7ea203f346a3d6ea846f00b8f9661f
Reviewed-on: https://review.whamcloud.com/34798
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>

LU-12254 lnet: correct discovery LNetEQFree()

The EQ needs to be freed after all the queues are cleaned up.
Otherwise, unprocessed events may still be on the event queue when
it is freed, which prevents the memory from being freed.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ie38ec25e09bf6d7cf2aadc30edd91d298897c51b
Reviewed-on: https://review.whamcloud.com/34796
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins

LU-12249 lnet: fix list corruption

On shutdown, the resend queues are cleared and freed, and the monitor
thread state is set to shutdown. It is possible for lnet_finalize()
to be called after the queues are freed. The code checks ln_state to
see if we're shutting down, but in this case we should really be
checking ln_mt_state. The monitor thread is the one that matters here,
because it is the one that allocates and frees the resend queues.
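
A sketch of the check, with hypothetical names mirroring the monitor
thread states: a message that would be resent is finalized instead
whenever the monitor thread is not running, because only the monitor
thread's state says whether the resend queues still exist. This is an
illustration, not the lnet_finalize() code.

```c
#include <assert.h>

/* Hypothetical mirror of the monitor thread states. */
enum sketch_mt_state {
	SKETCH_MT_SHUTDOWN,
	SKETCH_MT_RUNNING,
	SKETCH_MT_STOPPING,
};

enum sketch_resend_action { SKETCH_QUEUE_RESEND, SKETCH_FINALIZE };

/*
 * Decide what to do with a message that would otherwise be resent.
 * The monitor thread allocates and frees the resend queues, so its
 * state decides whether they can be touched; the global LNet state
 * is not consulted here at all.
 */
static enum sketch_resend_action
sketch_resend_or_finalize(enum sketch_mt_state mt_state)
{
	if (mt_state != SKETCH_MT_RUNNING)
		return SKETCH_FINALIZE;   /* queues may already be freed */
	return SKETCH_QUEUE_RESEND;
}
```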

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ia077cec7a52ef5cd2e1b231437c6265ba9416b1b
Reviewed-on: https://review.whamcloud.com/34778
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins

LU-11297 lnet: invalidate recovery ping mdh

For cleanliness, ensure that the recovery ping mdh is invalidated when
a peer NI or a local NI is allocated.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: If06448b1602b3680831244923b6b982a555159ea
Reviewed-on: https://review.whamcloud.com/34771
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins

LU-12201 lnet: detach response tracker

We need to unlink the response tracker from MDs even if the
corresponding message failed to send.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I4f320274576790e3332f66f30aad5c2b3450b955
Reviewed-on: https://review.whamcloud.com/34770
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins

LU-12163 lnet: fix cpt locking

In lnet_select_pathway(), the call to lnet_handle_send_case_locked()
can result in sd_cpt being changed. If this function returns
REPEAT_SEND, we go back to the "again" label. At this point it is
possible to initiate discovery, which will unlock the cpt.
If the local cpt isn't updated, we could potentially be manipulating
the wrong cpt, resulting in some form of corruption or deadlock.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ifd39b0d84f8cce859151f7cc900a082481dd7218
Reviewed-on: https://review.whamcloud.com/34607
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins

LU-11816 lnet: setup health timeout defaults

Enable the health feature by default.
When health is enabled, set the default transaction timeout to
10 seconds and the retry count to 3. When health is disabled,
set the default transaction timeout to 50 seconds.
When toggling health between enabled and disabled, the
corresponding defaults are always applied.
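
A sketch of re-applying the defaults on every toggle. It assumes that
health is "off" when the sensitivity is 0, and struct
sketch_health_cfg and sketch_set_health() are hypothetical names. The
commit does not say what the retry count becomes when health is
disabled, so the sketch leaves it alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bundle of the health-related tunables. */
struct sketch_health_cfg {
	unsigned int sensitivity;          /* 0 is treated as "health off" */
	unsigned int transaction_timeout;  /* seconds */
	unsigned int retry_count;
};

/* Toggle health and re-apply the matching defaults every time. */
static void sketch_set_health(struct sketch_health_cfg *cfg,
			      unsigned int sensitivity)
{
	assert(cfg != NULL);
	cfg->sensitivity = sensitivity;
	if (sensitivity > 0) {
		cfg->transaction_timeout = 10;
		cfg->retry_count = 3;
	} else {
		cfg->transaction_timeout = 50;
		/* No retry default is stated for this case, so the
		 * sketch leaves retry_count unchanged. */
	}
}
```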

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I153c2822898b44e33871ec827de7e61f153bb1db
Reviewed-on: https://review.whamcloud.com/34252
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>

LU-12344 lnet: handle remote health error

When a peer is dead, set the health status to REMOTE_DROPPED
in order to handle health properly for the peer.
When dropping a routed message, set REMOTE_ERROR. Routed messages
are dropped when the routing feature is turned off, which could
be considered a configuration error if it happens in the middle
of traffic. Therefore, it's better to flag the issue at this
point without resending the message.

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I131263215a68fc8607582643a47007ce4d04abbc
Reviewed-on: https://review.whamcloud.com/34967
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>

LU-12080 lnet: clean mt_eqh properly

Consider a scenario where a peer on the recovery queue is down.
We keep pinging it, but every ping times out after 10 seconds. In
the middle of these 10 seconds, a shutdown is performed. First,
rsp_tracker_clean runs: it calls MDUnlink on the MD related to
that ping, but because the message holds a reference on the MD, the
MD doesn't go away. Instead, it becomes a zombie and waits for
lnet_md_unlink() to be called in lnet_finalize(). Next,
clean_peer_ni_recovery runs. It sees the peer on the queue and tries
to unlink its MD, but the lookup through lnet_handle2md() can't find
it. Afterwards, we try to clean up the EQ and it asserts. Even if we
remove the assert, we end up with a resource leak, since the EQ is
not actually freed and LNetEQFree() is never called again.

The solution is to move EQ creation into LNetNIInit() and its
deletion into lnet_unprepare(). By that point, all the remaining
messages will have been finalized and all references on the EQ will
be gone, allowing it to be cleaned up properly.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I7fd6018ee2e57f82c649fc3658352e89a4309986
Reviewed-on: https://review.whamcloud.com/34477
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins

LU-12080 lnet: recovery event handling broken

Don't increment health on an unlink event.
If a SEND fails, an unlink will follow, so there is no need to do
any special processing on the SEND event. If the SEND succeeds,
we wait for the reply.
When queuing a message on the NI recovery queue, only do so
if the MT thread is still running.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I4877caebcac5cdfc35a59a18a3e3451b1f23cb0d
Reviewed-on: https://review.whamcloud.com/34445
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins

LU-12279 lnet: use number of wrs to calculate CQEs

Using concurrent sends to calculate the number of CQEs results
in a small number of CQEs. This exposes an issue where, under
failure scenarios such as a node reboot, there aren't enough
CQEs available, leading to IB_EVENT_QP_FATAL.

Fixes: 83e45ead69ba ("LU-11931 lnd: bring back concurrent_sends")
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I6e2be079e11622b83fe3fb4fdb695f5a2672c9ac
Reviewed-on: https://review.whamcloud.com/34945
Tested-by: Jenkins
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>

LU-12131 tests: fix SSK handling in tests

SSK can be activated for Lustre tests by setting the SHARED_KEY env
variable to true.
In setup_all(), an additional env variable, SK_MOUNTED, is used to avoid
mounting an SSK file system twice. This variable has to be set
back to false in stopall() for consistency.
Some tests are incompatible with SSK, so skip them when SHARED_KEY
is true. Some other tests that play with nodemaps have to take SSK into
account.

Whamcloud-bug-id: ATM-1283
Test-Parameters: clientselinux testlist=sanity,recovery-small,sanity-selinux
Test-Parameters: envdefinitions=SHARED_KEY=true testlist=sanity,recovery-small,sanity-sec
Test-Parameters: envdefinitions=SHARED_KEY=true clientselinux testlist=sanity,recovery-small,sanity-selinux,sanity-sec
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I1016a459c42ffed1ab2b6f67d0a145ed2af9fa40
Reviewed-on: https://review.whamcloud.com/34521
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-11851 ldiskfs: reschedule for htree thread.

A thread may be woken improperly in the htree code. This patch
reschedules the thread to keep locking correct.

Change-Id: I6a8d1bbc0470b2577ca80faa304eb06f7913c218
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34160
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-12225 obdclass: improve jobid memory reclaim policy

jobid_should_free_item() is called in the following three
cases to decide whether @pidmap should be deleted from the hash list:

1) The normal timeout expires and the memory reclaimer is called to
try to free some items.

2) An admin echoes to the sys interface to free some jobids.

3) The client is unmounted and all memory is freed.

For cases 2 and 3, it makes sense to always return 1. Add a
warn_on in case 3 to make sure there isn't any bug in the code.

For case 1, we can change the policy a bit so that it does not
return 1 if the reference count of @pidmap is larger than 1.
With the current policy, a newly added @pidmap is commonly freed
from the hash list too easily.

Even if we delete @pidmap from the hash list, its memory is only
freed eventually, once its reference count reaches 1, and since this
could be a hot, long-running job, it is very likely that @pidmap
would just be deleted and inserted again.
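
The proposed policy can be sketched as a small decision function. The
enum, the function name, and the refcount condition used for the
unmount warning are assumptions for illustration; a userspace
fprintf() stands in for the kernel warn_on. This is not the
jobid_should_free_item() code itself.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical names for the three cases listed above. */
enum sketch_free_reason {
	SKETCH_FREE_RECLAIM,  /* 1) timeout expired / memory reclaimer */
	SKETCH_FREE_ADMIN,    /* 2) admin request via the sys interface */
	SKETCH_FREE_UMOUNT,   /* 3) client unmount, free everything */
};

/* Return 1 if the pidmap entry should be removed from the hash list. */
static int sketch_should_free_pidmap(enum sketch_free_reason reason,
				     int refcount)
{
	assert(refcount >= 1);
	switch (reason) {
	case SKETCH_FREE_UMOUNT:
		/* Stand-in for the kernel warn_on: the assumed invariant is
		 * that only the hash list still holds a reference. */
		if (refcount != 1)
			fprintf(stderr, "warning: pidmap refcount %d at umount\n",
				refcount);
		return 1;
	case SKETCH_FREE_ADMIN:
		return 1;
	case SKETCH_FREE_RECLAIM:
	default:
		/* Keep entries that are still in use elsewhere. */
		return refcount > 1 ? 0 : 1;
	}
}
```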

Change-Id: I61b894a900319953d5a3369bee69bda050102129
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/34775
Tested-by: Jenkins
Reviewed-by: Ben Evans <bevans@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-11233 utils: fix build warnings for gcc8

Quiet new build warnings that appear with GCC8, mainly related
to string buffers not being long enough (in theory) for the
maximum possible string sizes, even though this can never actually
happen in practice.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I83a955fc68f3e03fe84622ddf1cedfb30d5916ac
Reviewed-on: https://review.whamcloud.com/34662
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-12314 tests: Add Missing Description to sanity test 258a

This patch adds missing test description to sanity test 258a.

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: I972549cd049b965c9e6da9b43aa245bab875a77a
Reviewed-on: https://review.whamcloud.com/34902
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>

LU-12270 o2iblnd: pci_unmap_addr() removed in 4.19

Since kernel 4.19 the pci_unmap_addr() wrappers have
been removed, along with linux/pci-dma.h.
We can use the good old DEFINE_DMA_UNMAP_ADDR instead
of DECLARE_PCI_UNMAP_ADDR.

Linux-commit: 18b01b16e8bae9cd227909f6e6d2783d74855f65

Test-Parameters:trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I387bd3d1c4e8c3bc75400ce1be05132fb25f8a50
Reviewed-on: https://review.whamcloud.com/34827
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-12236 gss: remove unused code in gss_svc_upcall.c

Delete rsc_flush() related functions which are never
used.

Test-Parameters: envdefinitions=SHARED_KEY=true testlist=sanity,recovery-small,sanity-sec
Change-Id: Iedd6339b5fafdea81147c83e5f0499fa3ad60251
Signed-off-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-on: https://review.whamcloud.com/34794
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-12206 mdt: mdt_init0 failure handling

When mdt_init0 fails, it has to wait until the zombie workqueue has
destroyed all disconnected exports before mdt_device_alloc frees the
mdt_device. Otherwise, the zombie workqueue refers to the freed
mdt_device via:
  general protection fault: 0000 [#1] SMP
  ..
  Workqueue: obd_zombid obd_zombie_exp_cull [obdclass]
  ..
  [<ffffffffc08829c5>] tgt_client_free+0x1e5/0x3c0 [ptlrpc]
  [<ffffffffc0ec2327>] mdt_destroy_export+0x57/0x200 [mdt]
  [<ffffffffc05bf20e>] class_export_destroy+0xee/0x490 [obdclass]
  [<ffffffffc05bf5c5>] obd_zombie_exp_cull+0x15/0x20 [obdclass]
  [<ffffffff93ab1d2f>] process_one_work+0x17f/0x440

- mdt_init0
  the call to target_recovery_fini is moved so that it is called on
  every failure after a successful tgt_init.

  obd_zombie_barrier is called after
  target_recovery_fini->class_disconnect_exports

  obd->obd_fail is set so that mdt_export_cleanup->tgt_client_del does
  not clear the client's slot in last_rcvd in case of a server start
  failure

- mdt_quota_init
  class_manual_clean already does class_detach, so a goto is added to
  avoid a repeated call to class_detach

- qmt_device_init0
  start the qmt rebalance thread with the SVC_STARTING flag so that
  qmt_start_reba_thread waits until the thread has started.
  Otherwise, qmt_device may get freed before the qmt rebalance thread
  is stopped

Tests for failures during mdt_init0 are added:
- conf-sanity.sh:test_5i leads to a general protection fault
- conf-sanity.sh:test_5h causes
  rmmod: ERROR: Module mdt is in use

Cray-bug-id: LUS-2403
Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Test-Parameters: trivial testlist=conf-sanity envdefinitions=ONLY=5
Change-Id: Ic9dc9e167f6c2e47a5f97e59b5bd26c5231c23ce
Reviewed-on: https://review.whamcloud.com/34724
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-11771 ldlm: use hrtimer for recovery to fix timeout messages

Currently the functions target_handle_connect/reconnect show an
incorrect time remaining until the end of recovery:

fs1-OST0000: Recovery already passed deadline 71578:57.
If you do not want to wait more, please abort the recovery by force.
...
fs1-OST0000: Denying connection for new client ...
(1 recovered, 11 in progress, and 1 evicted) to recover in 71578:57

This is due to the assumption that the monotonic clock and jiffies
were initialized at the same time, which is not the case. So a
comparison between ktime_get_seconds() and jiffies converted to
seconds is invalid.

We solve this by replacing the recovery timer with an hrtimer-based
one. There are many benefits to using an hrtimer over jiffies, such
as better scaling, a better power profile, and better handling on
tickless systems. This also makes the code clearer by using just the
real wall clock in all cases.
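The underlying rule is that the remaining recovery time must be computed from a deadline and a "now" read from the same clock. This is a userspace analogy, not the kernel hrtimer code, assuming POSIX `clock_gettime(CLOCK_MONOTONIC)`; `mono_seconds()` and `recovery_time_left()` are hypothetical helpers:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Read "now" from the monotonic clock only. */
static int64_t mono_seconds(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec;
}

/*
 * Seconds left until @deadline. Both arguments must come from the same
 * clock; mixing sources (e.g. monotonic seconds vs converted jiffies)
 * produces meaningless differences.
 */
static int64_t recovery_time_left(int64_t deadline, int64_t now)
{
	int64_t left = deadline - now;

	return left > 0 ? left : 0;	/* past the deadline: nothing left */
}
```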

Change-Id: I9d7e7e92e67ee942bc1dc51fbb0af7d8f53e54e1
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/34710
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>

LU-11838 llite: address_space ->page_tree renamed ->i_pages

Kernel 4.17 renamed address_space ->page_tree to ->i_pages,
and switched to xa_lock on the radix_tree_root.

Linux-commit: b93b016313b3ba8003c3b8bb71f569af91f19fc7

Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: Iadbc5eda884dbe8ad0d694e0f88255bc496dea5b
Reviewed-on: https://review.whamcloud.com/34673
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>

LU-11760 ofd: formatted OST recognition change

Modern systems are fast enough to create more than
100 000 (5 * OST_MAX_PRECREATE) objects during a commit interval.
Increase the difference between the MDS last_used ID
and the OST LAST_ID to 500 000 to avoid gaps after OST failover.

Cray-bug-id: LUS-6399
Change-Id: If36e04878d13f27f5229b488781440a159ddff7d
Signed-off-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-on: https://es-gerrit.dev.cray.com/153866
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Reviewed-on: https://review.whamcloud.com/33833
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-12309 osd-zfs: Support disabled project quotas

Allow project quotas to be compiled in but disabled in the zpool.
This would be the case for zpools created by pre-0.8.0 ZFS, but then
used with newer ZFS.

Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Change-Id: I79c2c4ee3b191dad4150c218b25ced2508062d51
Reviewed-on: https://review.whamcloud.com/34888
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-12013 lfsck: use correct buffer

lmm is used as a temporary pointer to a structure, and it can get moved
within the buffer while @size remains the same. This may cause invalid
memory access.

Change-Id: Iecc51e8bb75c678e7d8287b3798afbab8bfd1485
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34901
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>

LU-12221 statahead: sa_handle_callback get lli_sa_lock earlier

sa_handle_callback() must acquire the lli_sa_lock before calling
sa_has_callback(), which checks whether the sai_interim_entries list is
empty. Acquiring the lock avoids a race between an rpc handler
executing ll_statahead_interpret and the separate ll_statahead_thread.

When a client receives a stat request response, ll_statahead_interpret
increments sai_replied and if needed adds the request to the
sai_interim_entries list for instantiating by the ll_statahead_thread.
ll_statahead_interpret() holds the lli_sa_lock while doing this work.
On process termination, ll_statahead_thread() waits for sai_sent to
equal sai_replied and then removes any entries in the
sai_interim_entries list. It does not get the lli_sa_lock until
it determines that there are sai_interim_entries to process.

A bug occurs on weak memory model processors that do not guarantee
that both ll_statahead_interpret updates done under the lock are
visible to other processors at the same time. For example, on ARM
nodes, an ll_statahead_thread can read the updated value of
sai_replied and a non-updated value of sai_interim_entries.
ll_statahead_thread then thinks all replies have been received (true)
and all sai_interim_entries have been processed (false). Later, the
update to sai_interim_entries becomes visible leaving the
ll_statahead_info struct in an unexpected state.

The bad state eventually triggers the LBUG:
statahead.c:477:ll_sai_put()) ASSERTION( !sa_has_callback(sai) )
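The locking pattern can be sketched in userspace C11. This assumes a simple spinlock built from an `atomic_int` in place of lli_sa_lock, and reduces the list to a counter; `struct sai_sketch`, `on_reply()` and `handle_callback()` are hypothetical. The reader tests for pending entries only after taking the same lock the writer held, so the acquire/release pair makes both writer updates visible together:

```c
#include <stdatomic.h>
#include <stdbool.h>

struct sai_sketch {
	atomic_int locked;	/* stand-in for lli_sa_lock */
	int replied;		/* stand-in for sai_replied */
	int interim_entries;	/* stand-in for the sai_interim_entries list */
};

static void sa_lock(struct sai_sketch *s)
{
	while (atomic_exchange_explicit(&s->locked, 1, memory_order_acquire))
		;	/* spin until the previous holder releases */
}

static void sa_unlock(struct sai_sketch *s)
{
	atomic_store_explicit(&s->locked, 0, memory_order_release);
}

/* Writer (like ll_statahead_interpret): both updates under the lock. */
static void on_reply(struct sai_sketch *s, bool needs_instantiation)
{
	sa_lock(s);
	s->replied++;
	if (needs_instantiation)
		s->interim_entries++;
	sa_unlock(s);
}

/* Reader after the fix: lock first, then test and drain the entries. */
static int handle_callback(struct sai_sketch *s)
{
	int drained = 0;

	sa_lock(s);
	while (s->interim_entries > 0) {
		s->interim_entries--;
		drained++;
	}
	sa_unlock(s);
	return drained;
}
```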

Cray-bug-id: LUS-6243
Signed-off-by: Ann Koehler <amk@cray.com>
Change-Id: I9fc6bd664188d9ac7c26b1b6965e2b99abf5e948
Reviewed-on: https://review.whamcloud.com/34760
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-12324 mdd: Do not record xattr size get in changelogs

It looks like there's no need to create a changelog entry if the
xattr itself was not fetched (only its size was queried). The real
get will follow, and we'd record it there.

Change-Id: I5b19f9309f65da0a4c58cb79a95787dab862eb94
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34936
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>

LU-10754 tests: sanityn/47b to sleep for 1s

It seems 0.2s is not enough in this specific case.

Change-Id: I51e00adb2de1229e8beafd8fe567fa7637e5d764
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34853
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>

LU-12282 build: export IB_OPTIONS before build

We need to export all options before running dpkg-buildpackage.

Test-Parameters: trivial

Change-Id: I683080e1872c8818ae9c391f5971b5e4488147a6
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34843
Tested-by: Jenkins
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-12232 test: commit before df

In sub_test6 of replay_ost_single, the transactions on the OSTs should
be committed to clean up the test environment.

Test-Parameters: trivial
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: Icbb06789855ab02252b7f1b0b9aff6bbb0f5f2e1
Reviewed-on: https://review.whamcloud.com/34808
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-12098 mdd: explicitly clear changelogs on deregister

In case of MDS crash in the middle of changelog_deregister, the system
can end up with the changelogs user deregistered, but the changelog
entries not actually cleared. Then the only way to get rid of the
remaining changelogs not used anymore by any user is to register a new
changelogs user and then deregister it.
To protect against this scenario, explicitly clear the changelogs used
by the user before actually deregistering it.

Also add recovery-small test_136 as a regression test.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I14576180c9351337fc4d9ed0e1b176d352584750
Reviewed-on: https://review.whamcloud.com/34688
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-11838 ldlm: struct timespec64.tv_sec type change

Since kernel 4.18, struct timespec64 is no longer defined
as struct timespec on 64-bit systems; this means tv_sec
is no longer __kernel_time_t but time64_t.

Use %llu as the format specifier and explicitly cast it
to unsigned long long.
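For illustration, a userspace sketch of the pattern, assuming `int64_t` as a stand-in for the kernel's `time64_t`/`s64` (`time64_sketch_t` and `format_seconds()` are hypothetical). The explicit cast makes the argument match `%llu` whatever the underlying typedef is on a given kernel or architecture:

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in for a kernel value whose underlying type changed across versions. */
typedef int64_t time64_sketch_t;

/* Formats @sec into @buf; returns the snprintf() result. */
static int format_seconds(char *buf, size_t len, time64_sketch_t sec)
{
	/* The cast guarantees the argument type matches %llu exactly. */
	return snprintf(buf, len, "%llu", (unsigned long long)sec);
}
```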

Test-Parameters:trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: Ib4c80c9b20854d45b1b3c04057c45ee20d5413d9
Reviewed-on: https://review.whamcloud.com/34677
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>

LU-11838 osp: atomic64_read() returns s64

Since kernel 4.17 atomic64_read on x86_64 returns s64
instead of long.

Use %llu as the format specifier and explicitly cast it
to unsigned long long.

Test-Parameters:trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I805d43251f24417e6405f5d087927c15cf531619
Reviewed-on: https://review.whamcloud.com/34676
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>

LU-12093 osc: don't check capability for every page

We check CFS_CAP_SYS_RESOURCE for every page during the io.
This is expensive on AppArmor-enabled systems; we can do the
check only once for the entire io and use the result when
submitting the pages.

Don't init the oap_brw_flags during osc_page_init(); the flag
will be set in either osc_queue_async_io() or osc_page_submit().

Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I0e664f43ce31c276b33476fdff11794185ab0a3b
Reviewed-on: https://review.whamcloud.com/34478
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-12019 build: Recognize Debian Kernel and set KMP dir

Recognize the Debian kernel and make sure the kernel module package
(KMP) directory matches the KMP_MODDIR of Ubuntu and the Debian
package build system.

Test-Parameters: clientdistro=ubuntu1804
Signed-off-by: Thomas Stibor <t.stibor@gsi.de>
Change-Id: Iaf3635af6a624c9395db3f891d31413cb9e57b92
Reviewed-on: https://review.whamcloud.com/34329
Tested-by: Jenkins
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-8066 ptlrpc: move sptlrpc procfs entry to debugfs

We might eventually want to split it into a bunch of
single-value sysfs entries, I imagine, but there is no urgent need now.

Linux-commit: 77386b3c0b4470db1ed546de858b31cac66fc943

Migrate the GSS stuff to debugfs as well.

Test-Parameters: envdefinitions=SHARED_KEY=true testlist=sanity,recovery-small,sanity-sec

Change-Id: I417d3a46aa21cd7dca7cb8f7b6fd78623d726bed
Signed-off-by: Dmitry Eremin <dmiter4ever@gmail.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/30963
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-11089 obdclass: remove locking from lu_context_exit()

Recent patches suggest that the locking in lu_context_exit() hurts
performance, as the changes they make are intended to improve
performance. Let's go all the way and remove the locking completely.

The race of interest is between lu_context_exit() finalizing a
value with ->lct_exit, and lu_context_key_quiesce() freeing
the value with key_fini().

If lu_context_key_quiesce() has started, there is no need to
finalize the value - it can just be freed.  So lu_context_exit()
is changed to skip the call to ->lct_exit if LCT_QUIESCENT is set.

If lu_context_exit() has started, lu_context_key_quiesce() must wait
for it to complete - it cannot just skip the freeing.  To allow
this we introduce a new lc_state, LCS_LEAVING.  This indicates that
->lct_exit might be called.  Before calling key_fini() on a context,
lu_context_key_quiesce() waits (spinning) for lc_state to move on from
LCS_LEAVING.
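The handshake can be modeled with C11 atomics. This is a simplified, single-key sketch rather than the obdclass code: `struct ctx_sketch`, `ctx_exit()` and `key_quiesce()` are hypothetical, a counter stands in for calling ->lct_exit, and the non-preemptible LCS_LEAVING window the kernel relies on cannot be expressed in userspace:

```c
#include <stdatomic.h>
#include <stdbool.h>

enum lc_state_sketch { LCS_ENTERED, LCS_LEAVING, LCS_LEFT };

struct ctx_sketch {
	atomic_int state;	/* one of enum lc_state_sketch */
	atomic_bool quiescent;	/* stand-in for LCT_QUIESCENT */
	int exit_calls;		/* counts simulated ->lct_exit calls */
};

/* Like lu_context_exit(): mark LEAVING, maybe finalize, then move on. */
static void ctx_exit(struct ctx_sketch *c)
{
	atomic_store(&c->state, LCS_LEAVING);
	if (!atomic_load(&c->quiescent))
		c->exit_calls++;	/* would call ->lct_exit */
	atomic_store(&c->state, LCS_LEFT);
}

/* Like lu_context_key_quiesce(): set the flag, then wait out LEAVING. */
static void key_quiesce(struct ctx_sketch *c)
{
	atomic_store(&c->quiescent, true);
	while (atomic_load(&c->state) == LCS_LEAVING)
		;	/* spin; key_fini() is safe once this loop exits */
}
```

Once quiescing has started, a later exit skips the finalizer, and a quiesce that races with an exit in progress waits for it rather than freeing underneath it.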

Linux-commit: ac3f8fd6e61b245fa9c14e3164203c1211c5ef6b

fix possible hang waiting for LCS_LEAVING

As lu_context_key_quiesce() spins waiting for LCS_LEAVING to
change, it is important that we set and then clear it within a
non-preemptible region.  If the spinning thread preempts the
thread that sets and clears the state while the state is LCS_LEAVING,
then it can spin indefinitely, particularly on a single-CPU machine.

Also update the comment to explain this dependency.

Linux-commit: 4859716f66db6989ef4bf52434b5b1d813c6adc1

Change-Id: I92ef27304eab43518fcb216b9c9cb4875cc9b98c
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/32713
Tested-by: Jenkins
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-12212 mdt: fix SECCTX reply buffer handling

The LU-9193 changes for inline SECCTX in replies may cause frequent
resends and reconnects under some loads, e.g. dbench runs.
This is caused by a missing buffer shrink when SECCTX is not
used.

The patch does the following:
- shrink the SECCTX buffer if it is not used
- in mdt_getattr_name_lock() fill the SECCTX buffer a bit earlier
  for simpler handling of DoM size attributes; also move
  LDLM_LOCK_PUT() to the end of the block so that 'lock' is not
  used after LDLM_LOCK_PUT()

Fixes: fca35f74f9ec ("LU-9193 security: return security context for metadata ops")
Test-Parameters: clientselinux testlist=sanity envdefinitions=EXCEPT=103a
Test-Parameters: mdscount=2 mdtcount=4 clientselinux testlist=recovery-small,sanity-selinux
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I9beffd06f76c3bd8e826ba4ab0ce70ac3f57951c
Reviewed-on: https://review.whamcloud.com/34734
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-12298 init: Add init info to lustre sysvinit script

This adds info to the sysvinit script that systemd can use
to build dependency graphs.

Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Change-Id: Ied3bc05d61ba9dc33904a84c5f91bb9adc60cb01
Reviewed-on: https://review.whamcloud.com/34873
Tested-by: Jenkins
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

Revert "LU-8384 scripts: Add scripts to systemd for EL7"

This reverts commit 420d8c09887ff178508be0434373f74b5ef7ae6e.

This prevents lustre from starting correctly, as seen in LU-12298

Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Change-Id: Ib0a7e85079d1aea27b3a09496a2bf02c698c294c
Reviewed-on: https://review.whamcloud.com/34872
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-10754 tests: Clear mdc locks before tests

On ZFS testing, a sync stemming from a lock cancellation
from a previous test sometimes causes us to run longer than
the sleep times allowed for forked processes to be ready.

So, cancel the MDC lru locks first. This will only incur a
sync if there is data to sync, but will wait for one if
necessary.

Test-Parameters: testlist=sanityn,sanityn,sanityn,sanityn

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I865de238aadd6da719066e6f22e2a36d1d3f368e
Reviewed-on: https://review.whamcloud.com/34848
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-12242 kernel: kernel update RHEL7.6 [3.10.0-957.12.1.el7]

Update RHEL7.6 kernel to 3.10.0-957.12.1.el7.

Test-Parameters: clientdistro=el7.6 serverdistro=el7.6

Change-Id: I71d3bc18dbc16ed1ad7a3083dc19f52b56f60e40
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34784
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-12248 lov: fix ost objects calculation in lod_statfs

When OSTs report fewer free objects than MDTs, the statfs
object results are presented with the numbers reported
by the OSTs. Fix the calculation of OST objects to make it
work with statfs aggregation via the MDT.

Make the lfs code consistent with ll_statfs_internal()
and lod_statfs().

Fixes: a829595add ("LU-11721 lod: limit statfs ffree if less ...")

Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I838a1527ed6411a412b63e2855ca7247755a3bcf
Reviewed-on: https://review.whamcloud.com/34777
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>

LU-12225 obdclass: fix race access vs removal of jobid_hash

We add @pidmap to the hash, and its reference count will be 1.
However, another thread might reclaim this newly added @pidmap
from the hash list, so our subsequent access to this @pidmap
becomes a use-after-free.

Fix this problem by initializing the reference count to 1 before
adding to the hash list, which guarantees that the memory cannot
be freed during our access.

Other places that use memory reclaim were checked and follow a
similar approach.
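A single-threaded sketch of the ordering, assuming plain integers instead of atomics and a one-slot table instead of the real hash. `pidmap_insert()`, `reclaim_slot()` and `pidmap_put()` are hypothetical. Because the caller's reference exists before the entry becomes visible, a reclaimer that unhashes it drops only the table's reference and cannot free memory the caller is still using:

```c
#include <stdlib.h>

struct pidmap_sketch {
	int refcount;
};

/* Hypothetical one-slot table standing in for jobid_hash. */
static struct pidmap_sketch *table_slot;

static void pidmap_put(struct pidmap_sketch *pm)
{
	if (--pm->refcount == 0)
		free(pm);
}

/* Returns a new entry on which the caller already holds a reference. */
static struct pidmap_sketch *pidmap_insert(void)
{
	struct pidmap_sketch *pm = malloc(sizeof(*pm));

	if (pm == NULL)
		return NULL;
	pm->refcount = 1;	/* caller's reference, set BEFORE publishing */
	pm->refcount++;		/* the table's own reference */
	table_slot = pm;	/* publish last: a reclaimer may run from here on */
	return pm;
}

/* Reclaimer: unhash the entry and drop only the table's reference. */
static void reclaim_slot(void)
{
	struct pidmap_sketch *pm = table_slot;

	if (pm != NULL) {
		table_slot = NULL;
		pidmap_put(pm);
	}
}
```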

Change-Id: Idd5f429b97e064e29b6883243f8a012c2b4b4ae7
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/34763
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-11838 lnet: getname dropping addrlen argument

Since kernel 4.17 ->getname() does not take int *addrlen
argument anymore, instead it's returning the length to
the caller.

Linux-commit: 9b2c45d479d0fb8647c9e83359df69162b5fbe5f

Test-Parameters:trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I4ad5de4a22f3fb23c07a356650ea7925acf07eed
Reviewed-on: https://review.whamcloud.com/34672
Tested-by: Jenkins
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-12139 kernel: kernel update [SLES12 SP3 4.4.176-94.88]

Update SLES12 SP3 kernel to 4.4.176-94.88.

Test-Parameters: trivial clientdistro=sles12sp3 serverdistro=sles12sp3

Change-Id: Iecf77e056fc571eb5118ac8c96d440e5f3ceebc0
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34670
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-11157 obd: round values to nearest MiB for *_mb sysfs files

Several sysfs files report their settings with the functions
lprocfs_[seq]_read_frac_helper(), which are intended to show
fractional values, e.g. 1.5 MiB. This approach has caused problems
with shells which don't handle fractional representation, and the
values reported don't faithfully represent the original value the
configurator passed into the sysfs file. To resolve this, let's
instead always round up the value the configurator passed into
the sysfs file to the nearest MiB value. This way the reported
values are guaranteed to always be an exact MiB value.
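The rounding can be illustrated with a small sketch, assuming byte counts held in a `uint64_t` and rounding up to the next MiB boundary as the description states. `round_up_to_mib()` is a hypothetical helper; the commit title says "nearest", so the exact rounding mode in the patch may differ:

```c
#include <stdint.h>

#define MIB_SHIFT 20	/* 1 MiB = 2^20 bytes */

/* Round @bytes up to a whole number of MiB (overflow near UINT64_MAX ignored). */
static uint64_t round_up_to_mib(uint64_t bytes)
{
	const uint64_t mib = 1ULL << MIB_SHIFT;

	return (bytes + mib - 1) & ~(mib - 1);
}
```

With this helper, 1572864 bytes (1.5 MiB) becomes 2097152 bytes (2 MiB), so reading the value back never yields a fraction.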

Change-Id: Ia2e8cf8421784853aa33d4bb87c54aee00953835
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/34317
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-2233 tests: improve tests sanityn/40-47

sanityn/40-46 usually take 800-900s, which is almost half of
the whole sanityn pass. 99.(9)% of the time the tests just
wait to ensure the specific order in which the operations execute.

The patch changes cfs_fail_timeout_set() so that the wait can be
interrupted if fail_loc is set to 0; fail_loc is polled with
1/10s frequency.
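The interruptible wait can be sketched in userspace, assuming POSIX `nanosleep()` and a plain global in place of cfs_fail_loc. `fail_loc_sketch` and `fail_timeout_wait()` are hypothetical, and in this single-threaded sketch nothing else clears the flag, whereas in the real tests the test script resets fail_loc:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for cfs_fail_loc (single-threaded sketch). */
static unsigned long fail_loc_sketch;

/*
 * Wait up to @timeout_ms, polling every 1/10 s. Returns true if the wait
 * was cut short because fail_loc was reset to 0.
 */
static bool fail_timeout_wait(unsigned int timeout_ms)
{
	const struct timespec tick = { .tv_sec = 0, .tv_nsec = 100000000L };
	unsigned int waited = 0;

	while (waited < timeout_ms) {
		if (fail_loc_sketch == 0)
			return true;
		nanosleep(&tick, NULL);
		waited += 100;
	}
	return false;
}
```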

The tests themselves are modified to reset fail_loc. To be
able to do so, both operations (referred to as OP1 and OP2
in the tests) are run in the background. Once they are started
and the pdo_sched() helper has ensured that the MDS threads have
reached the blocking points, we can interrupt OP1 and do the usual
checks.

ONLY=40-47 sh sanityn.sh takes 1017s before and 78s after.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ie01bd6a077333f6f57e533a73f38588a073a2381
Reviewed-on: https://review.whamcloud.com/4392
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>

LU-12276 lnet: check const parameters for ib_post_send and ib_post_recv

In MOFED 4.6, the second and third parameters for ib_post_send() and
ib_post_recv() are declared with 'const'. This patch adds a check to the
configure script to resolve the build failure.

Change-Id: If7193a6a4fcb7b238f5d4ee64e878a5816433e7b
Test-Parameters: trivial
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34837
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-12266 mdd: fix up non-dir creation in SGID dirs

sgid directories have special semantics, making newly created files in
the directory belong to the group of the directory, and newly created
subdirectories will also become sgid. This is historically used for
group-shared directories.

But group directories writable by non-group members should not imply
that such non-group members can magically join the group, so make sure
to clear the sgid bit on non-directories for non-members (but remember
that sgid without group execute means "mandatory locking", just to
confuse things even more).

Adapt fix from inode_init_owner() to use in mdd_create_sanity_check().
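The mode logic from inode_init_owner() that this change adapts can be sketched in userspace. `sgid_adjust_mode()` is hypothetical: the caller's group membership and CAP_FSETID are passed in as booleans, and inheriting the directory's group ownership is omitted:

```c
#define _XOPEN_SOURCE 700
#include <stdbool.h>
#include <sys/stat.h>

/* Returns the mode a new inode should get when created in a directory. */
static mode_t sgid_adjust_mode(mode_t dir_mode, mode_t mode,
			       bool caller_in_group, bool caller_has_fsetid)
{
	if (!(dir_mode & S_ISGID))
		return mode;		/* parent is not sgid: nothing to do */
	if (S_ISDIR(mode))
		return mode | S_ISGID;	/* subdirectories inherit sgid */
	/* sgid + group exec on a file: only group members may keep it */
	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP) &&
	    !caller_in_group && !caller_has_fsetid)
		return mode & ~S_ISGID;
	/* sgid without group exec ("mandatory locking") is left alone */
	return mode;
}
```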

Linux-commit: 0fa3ecd87848c9c93c2c828ef4c3a8ca36ce46c7

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iae253c5cc7865fc81574760ce0ed4d93698b7314
Reviewed-on: https://review.whamcloud.com/34809
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>

LU-12227 scripts: check for mounted ZFS devices too

The lustre init script skips several checks if the device type is
ZFS. If some ZFS devices are already mounted, the script will return
a non-zero exit code.

The label and mount point check is valid for ZFS devices, so let's do
it and avoid this error case. With this patch, when starting ZFS
devices the script will only start the ones not already started and,
if it succeeds, return 0.

Test-Parameters: trivial
Signed-off-by: Aurelien Degremont <degremoa@amazon.com>
Change-Id: I152ca4d62d444193cc66896173873587f0761493
Reviewed-on: https://review.whamcloud.com/34766
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-10602 utils: fix file heat support

Change the LL_IOC_HEAT_SET ioctl number assignment to reduce the
number of different values used, since we are running out. Use
a __u64 as the IOC struct argument instead of a "long" since that
is what is actually passed, and it avoids being CPU-dependent.

Move the LU_HEAT_FLAG_* values into an enum to avoid a generic
"flags" argument in the code. This makes it clear what is passed.

Clean up code style for lfs_heat_get() and lfs_heat_set().

Fixes: ae723cf8161f ("LU-10602 llite: add file heat support")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If06212d2d62d085a2104cf54ae9a10e512eb2efd
Reviewed-on: https://review.whamcloud.com/34757
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-8066 obd: embed typ_kobj in obd_type

As there is a 1-1 mapping between obd_types and their ->typ_kobj, it
is simple and more normal to embed the kobj in the obd_type, rather
than allocate it separately.

This requires calling "kobject_init_and_add()" earlier, so we
open-code relevant part of class_setup_tunables() in
class_register_type(). Now class_setup_tunables() is needed only
for server side code.

With typ_kobj embedded in obd_type we change class_setup_tunables()
to return an obd_type object instead of a kobject. This way we
can use kobject_put() to clean up the obd_type created with
class_setup_tunables(). class_setup_tunables() exists to create
a lightweight obd_type which is never added to the typ_chain list,
to avoid potential duplicates which can happen on single node
setups with lod / lov and osp / osc.

Change-Id: Iac160e6817a7c520e4462a3fc133ddfee6a7ccdc
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/34612
Reviewed-by: Ben Evans <bevans@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-11690 lod: fix LBUG with wide striping

When striping extremely widely (~1600+ stripes), we reach
more than half of the theoretical limit of layout size,
and hit an LBUG.

It is also possible to trigger this assert with
multi-component PFL files, where all the components are
below the stripe count limit, but together they exceed it.

PFL makes asserting based on LOV_MAX_STRIPE_COUNT
unworkable, so just remove the assert. Further work is
planned to match up maximum allowed layout size with
the real maximum EA size.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Id0240785792e7d4084ea6e53b44469a40e59043d
Reviewed-on: https://review.whamcloud.com/33708
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>

LU-6142 ptlrpc: Fix style issues for service.c

This patch fixes issues reported by checkpatch
for file lustre/ptlrpc/service.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: Ibaffcdfaeac48176ba05b5e4f4471f9db96d9cbe
Reviewed-on: https://review.whamcloud.com/34605
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-6142 ptlrpc: Fix style issues for sec_null.c

This patch fixes issues reported by checkpatch
for file lustre/ptlrpc/sec_null.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: I67631d35ae4461ca92516975ab71f69d01378e19
Reviewed-on: https://review.whamcloud.com/34549
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>

LU-6142 ldlm: Fix style issues for interval_tree.c

This patch fixes issues reported by checkpatch
for file lustre/ldlm/interval_tree.c

Test-Parameters: trivial
Change-Id: Ida99aa8f7a5928e87611c73aa7b5d0dc4a5246e9
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/34498
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-11803 obd: replace class_uuid with linux kernel version.

We can replace the Lustre custom class_uuid_t with the Linux
kernel's UUID handling.

Change-Id: I9a59b0b6027ccb95994a87f3a5dcdf80a8a56480
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/33916
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Tested-by: Jenkins
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-11376 lmv: new foreign LMV format

This patch introduces a new striping/LMV format that allows an
arbitrary external reference to be specified for a directory in
the Lustre namespace.
The new LMV format is made of {newmagic, length, type, flags,
string[length]} to be as flexible as possible.
A foreign dir can be created using the ioctl(LL_IOC_LMV_SETDIRSTRIPE)
operation, and it can only be, and must remain, an empty dir until
it is removed.
A new API method, llapi_dir_create_foreign(), has been introduced,
and "lfs {get,set}dirstripe" and "lfs find" have been modified to
understand the new format.
The idea behind this is to provide Lustre namespace support and
striping prefetch/caching under lock protection for user/external
usage.
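The {newmagic, length, type, flags, string[length]} format maps naturally onto a C struct with a flexible array member. The sketch below is a hypothetical illustration: `struct foreign_md`, its field names and `FOREIGN_MAGIC` are placeholders and not necessarily the definitions used by this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical magic value, for illustration only. */
#define FOREIGN_MAGIC 0x0CD50CD0u

/* Hypothetical layout of {newmagic, length, type, flags, string[length]}. */
struct foreign_md {
	uint32_t fm_magic;   /* identifies the foreign format */
	uint32_t fm_length;  /* number of bytes in fm_value */
	uint32_t fm_type;    /* kind of external reference */
	uint32_t fm_flags;   /* type-specific flags */
	char     fm_value[]; /* external reference, not NUL-terminated */
};

/* Allocate and fill a foreign descriptor; caller frees. NULL on error. */
struct foreign_md *foreign_md_build(uint32_t type, uint32_t flags,
				    const char *value, size_t len)
{
	struct foreign_md *md;

	/* fm_length is 32 bits, so reject values that cannot be encoded. */
	if (len > UINT32_MAX - sizeof(*md))
		return NULL;
	md = malloc(sizeof(*md) + len);
	if (md == NULL)
		return NULL;
	md->fm_magic = FOREIGN_MAGIC;
	md->fm_length = (uint32_t)len;
	md->fm_type = type;
	md->fm_flags = flags;
	memcpy(md->fm_value, value, len);
	return md;
}

/* Total size of the blob as it would be stored: header plus value. */
size_t foreign_md_size(const struct foreign_md *md)
{
	return sizeof(*md) + md->fm_length;
}
```

Because the value is length-prefixed rather than NUL-terminated, the stored size is the 16-byte header plus exactly `length` bytes.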

This patch is the LMV/dirs complement of the previous LOV/files change
(Change-Id: I5d9c0642fe8e7009c30918bfa946cac7c00c9af8) and has
been rebased on top of it, along with some obvious code sharing
and simplifications.

Code has been added for lfsck to handle foreign dirs, and
a new sub-test has been added to sanity-lfsck to verify that
lfsck does not break foreign dirs and vice versa.

This also fixes a bug causing SEGVs during
"lfs find [--mdt-count=[+,-]<count>, --mdt-hash=<hashtype>]" when
handling a file (i.e., "DIR *dir" is NULL) in cb_find_init().

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I3721b8f14578bf926a92da76375dae92dc8d764d
Reviewed-on: https://review.whamcloud.com/34087
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-11376 lov: new foreign LOV format

This patch introduces a new layout/LOV format that allows an
arbitrary external reference to be specified for a file in
the Lustre namespace.
The new LOV format is made of {newmagic, length, type, flags,
string[length]} to be as flexible as possible.
A foreign file can be created using the open(O_LOV_DELAY_CREATE) +
ioctl(LL_IOC_LOV_SETSTRIPE) operations, and it can only be, and must
remain, an empty file until it is removed.
A new API method, llapi_file_create_foreign(), has been introduced,
and "lfs {get,set}stripe" and "lfs find" have been modified to
understand the new layout.
The idea behind this is to provide Lustre namespace support and
layout prefetch/caching under layout protection for user/external
usage.
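Since the value is length-prefixed, any consumer must check the length against the buffer it actually received before trusting it. The sketch below shows such a bounds check; the magic value, macro names and `foreign_lov_parse()` are hypothetical, and host byte order is assumed for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical magic and header layout, for illustration only. */
#define HYPO_FOREIGN_MAGIC 0xF0F0F0F0u
#define FOREIGN_HDR_SIZE   (4 * sizeof(uint32_t)) /* magic, length, type, flags */

/*
 * Check that buf[0..buflen) holds a well-formed foreign layout and, if so,
 * return a pointer to its value and the value's length. Fields are read
 * with memcpy() so the buffer does not need to be aligned.
 */
bool foreign_lov_parse(const unsigned char *buf, size_t buflen,
		       const char **value, uint32_t *len)
{
	uint32_t magic, length;

	if (buflen < FOREIGN_HDR_SIZE)
		return false;                   /* truncated header */
	memcpy(&magic, buf, sizeof(magic));
	memcpy(&length, buf + sizeof(magic), sizeof(length));
	if (magic != HYPO_FOREIGN_MAGIC)
		return false;                   /* not a foreign layout */
	if (length > buflen - FOREIGN_HDR_SIZE)
		return false;                   /* value overruns the buffer */
	*value = (const char *)(buf + FOREIGN_HDR_SIZE);
	*len = length;
	return true;
}
```

Comparing `length` against `buflen - FOREIGN_HDR_SIZE` (rather than adding the header size to `length`) avoids integer overflow on a hostile length field.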

Code has been added for lfsck to handle foreign files, and
a new sub-test has been added to sanity-lfsck to verify that
lfsck does not break foreign files and vice versa.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I5d9c0642fe8e7009c30918bfa946cac7c00c9af8
Reviewed-on: https://review.whamcloud.com/33755
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>

LU-11403 tests: Fix $tfile usage

We cannot just use a raw $tfile; we must use a path under
$DIR. Using a bare $tfile results in failures when a file
named $tfile already exists.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Iea6356cabb1623606bf926ce80c55a3210c0b535
Reviewed-on: https://review.whamcloud.com/34698
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-11233 utils: fix double-free of params fields

Call find_param_fini() on error so that the params are not leaked
if there is an intermediate error during initialization.

Zero out the parameters as they are freed, so if find_param_fini()
is called multiple times (as it is in some error paths) it does
not corrupt the heap by double freeing pointers. This can be hit
by calling "lfs getstripe -m" on multiple pathnames, some of which
do not exist.
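The "free and zero" pattern described above can be sketched as follows. `struct demo_params` and its init/fini functions are hypothetical stand-ins, only loosely modelled on the lfs find parameters; the point is that resetting each pointer to NULL after free() makes the fini function idempotent, since free(NULL) is a no-op.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parameter set, for illustration only. */
struct demo_params {
	char *dp_pool;
	int  *dp_indexes;
};

/* Release the parameters; safe to call more than once. */
void demo_params_fini(struct demo_params *p)
{
	free(p->dp_pool);
	p->dp_pool = NULL;      /* a second call now frees NULL: harmless */
	free(p->dp_indexes);
	p->dp_indexes = NULL;
}

/* Initialize; on an intermediate error, clean up instead of leaking. */
int demo_params_init(struct demo_params *p, const char *pool, size_t nidx)
{
	size_t n = strlen(pool) + 1;

	memset(p, 0, sizeof(*p));
	p->dp_pool = malloc(n);
	if (p->dp_pool == NULL)
		goto err;
	memcpy(p->dp_pool, pool, n);
	p->dp_indexes = calloc(nidx, sizeof(int));
	if (p->dp_indexes == NULL)
		goto err;
	return 0;
err:
	demo_params_fini(p);    /* frees whatever was allocated so far */
	return -1;
}
```

Because the struct is zeroed before any allocation, the error path can call the fini function unconditionally, and a later caller calling it again does not corrupt the heap.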

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie0d7e9ee134deb0633af2f8052b8a458333ebbe5
Reviewed-on: https://review.whamcloud.com/34711
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-6951 tests: sanity test_27m failure

sanity 27m fails with "OST0 was full but new created file
still use it" if the test runs with more than 1 client.
The issue can be easily reproduced with qos_threshold_rr=100.

The reason is grants. Every client initially gets 2 MB of grant.
When dd on the first client receives ENOSPC, that does not mean
the OST is full, since a client is not allowed to use other
clients' grants. When a new file is created, the MDS still
sees free space on OST0 equal to the amount of unused grant
and allocates new objects on OST0.
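A toy model makes the arithmetic explicit. It is a simplification written for this explanation, not how the OST actually accounts for grant: a client can write only into its own grant plus ungranted free space, so once it hits ENOSPC, the free space still visible to the MDS equals the other clients' unused grant.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model, for illustration only; units are bytes. */
struct ost_model {
	uint64_t free;          /* free space as seen by the MDS */
	uint64_t granted[8];    /* unused grant held by each client */
	unsigned int nclients;
};

/* Space client c may still write: everything not reserved by others. */
uint64_t client_writable(const struct ost_model *o, unsigned int c)
{
	uint64_t others = 0;

	for (unsigned int i = 0; i < o->nclients; i++)
		if (i != c)
			others += o->granted[i];
	assert(o->free >= others);      /* grant never exceeds free space */
	return o->free - others;
}

/* Client c writes until it would get ENOSPC; returns bytes written. */
uint64_t fill_from_client(struct ost_model *o, unsigned int c)
{
	uint64_t n = client_writable(o, c);

	o->free -= n;
	o->granted[c] = 0;              /* its own grant is used up */
	return n;
}
```

With 10 MB free and three clients each holding 2 MB, client 0 can write only 6 MB before ENOSPC, and 4 MB still appears free to the MDS. With a single client, the same fill leaves no free space, which matches the single-client assumption of the test.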

This situation does not seem to reflect any defect in Lustre.
Rather, the original author's intent seems to be that
the test should always run with a single client. So, this patch
simply disables the test if the test is running with more than
one client.

Change-Id: I47cd1a6806e8fa5203aeb5bcf57a6b31b424f24d
Seagate-bug-id: MRP-1690
Signed-off-by: Alexander Boyko <c17825@cray.com>
Signed-off-by: Andrew Perepechko <andrew.perepechko@seagate.com>
Signed-off-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-on: https://review.whamcloud.com/23506
Tested-by: Jenkins
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-11090 quota: Oops in qsd_config

This is a race between quota configuration and umount.
Remove the qsd from the fsinfo list before
freeing the per-quota-type data.
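The ordering rule can be sketched with a mutex-protected list. The types, fields and functions below are hypothetical stand-ins for the qsd instance, the fsinfo list and the per-quota-type data, and POSIX threads stand in for kernel locking. The config path only reaches per-type data through the list, so once the umount path has unlinked the qsd under the lock, freeing that data can no longer race with the config path.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins, for illustration only. */
struct qtype_info { int qti_enabled; };

struct qsd_instance {
	struct qsd_instance *qsd_next;      /* link in the fsinfo list */
	struct qtype_info   *qsd_type_info; /* per-quota-type data */
};

static pthread_mutex_t fsinfo_lock = PTHREAD_MUTEX_INITIALIZER;
static struct qsd_instance *fsinfo_list;

/* Setup: allocate and publish a qsd on the fsinfo list. */
struct qsd_instance *qsd_start(int enabled)
{
	struct qsd_instance *qsd = calloc(1, sizeof(*qsd));

	if (qsd == NULL)
		return NULL;
	qsd->qsd_type_info = calloc(1, sizeof(*qsd->qsd_type_info));
	if (qsd->qsd_type_info == NULL) {
		free(qsd);
		return NULL;
	}
	qsd->qsd_type_info->qti_enabled = enabled;
	pthread_mutex_lock(&fsinfo_lock);
	qsd->qsd_next = fsinfo_list;
	fsinfo_list = qsd;
	pthread_mutex_unlock(&fsinfo_lock);
	return qsd;
}

/* Config path: walks the list and dereferences per-type data. */
int config_count_enabled(void)
{
	int n = 0;

	pthread_mutex_lock(&fsinfo_lock);
	for (struct qsd_instance *q = fsinfo_list; q; q = q->qsd_next)
		n += q->qsd_type_info->qti_enabled;
	pthread_mutex_unlock(&fsinfo_lock);
	return n;
}

/* Umount path: unlink under the lock first, then free. */
void qsd_fini(struct qsd_instance *qsd)
{
	pthread_mutex_lock(&fsinfo_lock);
	for (struct qsd_instance **pp = &fsinfo_list; *pp;
	     pp = &(*pp)->qsd_next) {
		if (*pp == qsd) {
			*pp = qsd->qsd_next;
			break;
		}
	}
	pthread_mutex_unlock(&fsinfo_lock);

	/* No longer reachable from the list, so freeing is safe. */
	free(qsd->qsd_type_info);
	free(qsd);
}
```

Freeing `qsd_type_info` before the unlink would leave a window in which `config_count_enabled()` could dereference freed memory, which is the kind of Oops this change addresses.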

Change-Id: Ib7c3a94b3222ffd229da1a384113b3befc19665b
Cray-bug-id: LUS-5896
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-on: https://review.whamcloud.com/32715
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-11251 mdt: ASSERTION (req_transno < next_transno) failed

An update request is checked for duplicates by xid in
is_req_replayed_by_update(). However, an xid is only unique per
client, so two requests from different clients may have the
same xid.

Perform the lookup by transno instead, since it is unique per MDT.
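A minimal model of the two lookups shows why the key matters. `struct replayed_req` and both functions are hypothetical illustrations, not the actual is_req_replayed_by_update() code: an xid match can be a false positive when the same xid comes from a different client, whereas a transno match cannot, because transnos are unique per MDT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of an already-replayed update request. */
struct replayed_req {
	uint64_t rr_client;  /* client identifier */
	uint64_t rr_xid;     /* unique per client only */
	uint64_t rr_transno; /* unique per MDT */
};

/* Old check: xid alone can collide between different clients. */
bool replayed_by_xid(const struct replayed_req *list, size_t n, uint64_t xid)
{
	for (size_t i = 0; i < n; i++)
		if (list[i].rr_xid == xid)
			return true;
	return false;
}

/* New check: transno identifies the update uniquely on this MDT. */
bool replayed_by_transno(const struct replayed_req *list, size_t n,
			 uint64_t transno)
{
	for (size_t i = 0; i < n; i++)
		if (list[i].rr_transno == transno)
			return true;
	return false;
}
```

If client 1 already replayed xid 42 with transno 100, a request from client 2 with xid 42 and transno 101 is wrongly reported as a duplicate by the xid check but correctly reported as new by the transno check.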

Change-Id: If00b69f01451c659292c004aa296a6ea36680d3c
Cray-bug-id: LUS-6015
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Reviewed-on: https://review.whamcloud.com/33001
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-9010 ptlrpc: Change static defines to use macro for gss_krb5_mech.c

This patch replaces the statically defined spinlocks in
lustre/ptlrpc/gss/gss_krb5_mech.c with the kernel-provided macro.

Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: I5da319ce013c29043fc4bde4a4946cfbdf6c2491
Reviewed-on: https://review.whamcloud.com/33936
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-12043 llite, readahead: fix to call ll_ras_enter() properly

ll_ras_enter() is expected to be called once per syscall.
However, with fast read enabled, vvp_io_read_start() is no
longer called for every syscall.

To fix this problem, move the ll_ras_enter() call into the file
read handler.
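The effect of moving the call can be shown with a counter. Everything below is a hypothetical model: `ras_enter_hypo()` stands in for ll_ras_enter(), `slow_read()` for the path through vvp_io_read_start(), and `fixed` selects the old or new call placement. With the old placement, a read served by fast read never updates the readahead state; with the new placement, every read syscall does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

unsigned int ras_enter_calls;   /* times readahead state was updated */

/* Hypothetical stand-in for ll_ras_enter(). */
void ras_enter_hypo(void)
{
	ras_enter_calls++;
}

/* Stand-in for the slow path through the io stack. */
static size_t slow_read(size_t count, bool update_ras)
{
	if (update_ras)
		ras_enter_hypo();  /* old placement: inside the io start */
	return count;
}

/* Stand-in for the file read handler, called once per read syscall. */
size_t file_read(bool fast_read_hit, size_t count, bool fixed)
{
	if (fixed)
		ras_enter_hypo();  /* new placement: once per syscall */
	if (fast_read_hit)
		return count;      /* served from page cache, io stack skipped */
	return slow_read(count, !fixed);
}
```

Two reads, one fast and one slow, yield one readahead update with the old placement and two with the new one.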

Change-Id: I8d70714b2e8bc04b7c4ab996d189f10f37488d97
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/34755
Tested-by: Jenkins
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

LU-12159 utils: improve lfs getname functionality

Add "-n" and "-i" options to lfs getname to allow printing only
the fsname or instance ID of the filesystem(s).

Split out the documentation to a separate lfs-getname.1 man page.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie132513325b6630fc5103a89b469271ba7392cb2
Reviewed-on: https://review.whamcloud.com/34595
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>