Whamcloud - gitweb
fs/lustre-release.git
14 months agoLU-11297 lnet: MR Routing Feature 83/34983/3
Amir Shehata [Fri, 7 Jun 2019 18:35:09 +0000 (14:35 -0400)]
LU-11297 lnet: MR Routing Feature

This is a merge commit from the multi-rail branch. It brings in
the MR Routing feature. This feature aligns the LNET Multi-Rail
behavior with routing. A gateway now is viewed as a Multi-Rail
capable node. When a route is added only one entry per gateway
should be used. That route entry should use the primary-nid of
the gateway. The multi-rail selection algorithm is then run when
sending to the gateway to select the best interface to send to.

Furthermore, gateway aliveness is now tracked via the health
mechanism, and the gateway pinger now uses discovery instead
of maintaining its own pinger handler.

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ie2d8c6449f84860511b322ff2db3ed656a163e74

14 months agoLU-12200 lnet: check peer timeout on a router 72/34772/15
Amir Shehata [Fri, 19 Apr 2019 00:19:22 +0000 (17:19 -0700)]
LU-12200 lnet: check peer timeout on a router

On a router assume that a peer is alive and attempt to send it
messages as long as the peer_timeout hasn't expired.

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I0806a52c8ad7acc1c93dcf32353f1c4467c618b1
Reviewed-on: https://review.whamcloud.com/34772
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
14 months agoLU-12053 lnet: look up MR peers routes 25/34625/17
Amir Shehata [Mon, 8 Apr 2019 22:28:23 +0000 (15:28 -0700)]
LU-12053 lnet: look up MR peers routes

An MR peer can have multiple interfaces some of which we might
have a route to. The primary NID of the peer might not necessarily
specify a NID we have a route to. When looking up a route, we must
iterate over all the nets the peer is on and select the one which
we can route to. Taking into consideration the peer can exist on
multiple routed networks we also have a simple round robin algorithm
to iterate over all the networks we can reach the peer on.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I0651dd4f732c8b71872f73cf2512b08f34129bd9
Reviewed-on: https://review.whamcloud.com/34625
Tested-by: Jenkins
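
The route lookup described in LU-12053 above can be illustrated with a small model. This is a minimal sketch, not the kernel implementation: the class name, the rotating index, and the network names are hypothetical, and the real code works on LNet peer and route structures rather than strings.

```python
from typing import List, Optional, Set


class RoutedNetPicker:
    """Round-robin over the peer's networks that we hold a route to."""

    def __init__(self) -> None:
        # Rotating counter (hypothetical bookkeeping for this sketch).
        self._next = 0

    def pick(self, peer_nets: List[str], routed_nets: Set[str]) -> Optional[str]:
        # Keep only the peer's networks that we can actually route to.
        reachable = [net for net in peer_nets if net in routed_nets]
        if not reachable:
            return None
        choice = reachable[self._next % len(reachable)]
        self._next += 1
        return choice
```

The point of the sketch is that the peer's primary NID plays no special role: any network the peer is on qualifies as long as a route to it exists, and successive lookups rotate through the qualifying networks.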
14 months agoLU-11299 lnet: discover each gateway Net 11/34511/22
Amir Shehata [Tue, 26 Mar 2019 21:16:32 +0000 (14:16 -0700)]
LU-11299 lnet: discover each gateway Net

Wake up once every (gateway aliveness interval / number of local
networks). Discover each local gateway network in round-robin order.

This is done to make sure the gateway keeps its networks up.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I4035e39c286cb599d4eb8f9df7ed5d278e6d744a
Reviewed-on: https://review.whamcloud.com/34511
Tested-by: Jenkins
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
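
The wakeup arithmetic in LU-11299 "discover each gateway Net" above can be sketched as follows. The function name, the argument names, and the example values are assumptions made for illustration; only the division of the aliveness interval by the number of local networks and the round-robin order come from the commit message.

```python
import itertools
from typing import List, Tuple


def discovery_schedule(alive_interval: float, local_nets: List[str],
                       wakeups: int) -> List[Tuple[float, str]]:
    """Return (time, net) pairs: one network per wakeup, visited round-robin."""
    if not local_nets:
        return []
    # Each wakeup is one slice of the aliveness interval.
    period = alive_interval / len(local_nets)
    nets = itertools.cycle(local_nets)
    return [((i + 1) * period, next(nets)) for i in range(wakeups)]
```

With three local networks and a 60 second interval, the model wakes every 20 seconds, so each network is visited once per full interval.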
14 months agoLU-11299 lnet: net aliveness 10/34510/22
Amir Shehata [Sat, 23 Mar 2019 01:01:51 +0000 (18:01 -0700)]
LU-11299 lnet: net aliveness

If a router is discovered on any interface on the network, then
update the network last alive time and the NI's status to UP.
If a router isn't discovered on any interface on a network,
then change the status of all the interfaces on that network to down.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I1d67eb4b3284ccb8306ad4c877a2fcbdf4958d8c
Reviewed-on: https://review.whamcloud.com/34510
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
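
The status rule in LU-11299 "net aliveness" above can be modelled roughly like this. It is a sketch under assumptions: statuses are plain strings, NIs are identified by hypothetical NID strings, and the condition that triggers the "not discovered" branch (for example a timeout) is left to the caller because the commit message does not spell it out.

```python
from typing import Dict, Set, Tuple


def update_net_status(ni_status: Dict[str, str], discovered_on: Set[str],
                      last_alive: float, now: float) -> Tuple[Dict[str, str], float]:
    """Return (new NI statuses, new last-alive time) for one network."""
    seen = [ni for ni in ni_status if ni in discovered_on]
    if seen:
        # A router answered on at least one NI: refresh the network's
        # last-alive time and mark those NIs up.
        updated = dict(ni_status)
        for ni in seen:
            updated[ni] = "up"
        return updated, now
    # No router was discovered on any NI: mark every NI on the network down.
    return {ni: "down" for ni in ni_status}, last_alive
```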
14 months agoLU-11664 lnet: push router interface updates 51/33651/30
Amir Shehata [Wed, 14 Nov 2018 02:14:36 +0000 (18:14 -0800)]
LU-11664 lnet: push router interface updates

A router can bring its interfaces up or down if it hasn't received any
messages on that interface for a configurable period
(alive_router_ping_timeout). When this event occurs, the router can now
push its status change to the peers it's talking to in order to inform
them of the change. This allows the router's users to handle
asymmetric router failures more quickly.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I9530ed7d9bc0a86edc43e3f610cc943f1732dcfd
Reviewed-on: https://review.whamcloud.com/33651
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
14 months agoLU-11297 lnet: set gw sensitivity from lnetctl 35/33635/31
Amir Shehata [Fri, 9 Nov 2018 19:24:20 +0000 (11:24 -0800)]
LU-11297 lnet: set gw sensitivity from lnetctl

Allow an optional parameter to the "lnetctl route add" command that
sets the health sensitivity of the gateway:
  lnetctl route add --net <net> --gateway <gw> --sensitivity <value>

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Iee120c78a41b79da6ab6bdf1560f558df89233e2
Reviewed-on: https://review.whamcloud.com/33635
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
14 months agoLU-11297 lnet: handle router health off 34/33634/31
Amir Shehata [Fri, 9 Nov 2018 18:31:27 +0000 (10:31 -0800)]
LU-11297 lnet: handle router health off

Routing infrastructure depends on health infrastructure to manage
route status. However, health can be turned off. Therefore, we need
to enable health for gateways in order to monitor them properly.
Each peer now has its own health sensitivity. When adding a route
the gateway's health sensitivity can be explicitly set from lnetctl
or if not specified then it'll default to 1, thereby turning health
on for that gateway, allowing peer NI recovery if there is a failure.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ibae33d595e97d0eec432ae8f5d51898ce0776f01
Reviewed-on: https://review.whamcloud.com/33634
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
14 months agoLU-11641 lnet: handle discovery off 20/33620/32
Amir Shehata [Thu, 8 Nov 2018 00:51:44 +0000 (16:51 -0800)]
LU-11641 lnet: handle discovery off

When discovery is turned off locally, or when the peer either has
discovery off or doesn't support MR at all, degrade discovery
behavior to a standard ping. This allows routers to continue
using the discovery mechanism even if it's turned off.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I7f0829d37cbff2bf9e41de251efa715fc4c97e5d
Reviewed-on: https://review.whamcloud.com/33620
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
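
The fallback in LU-11641 above reduces to a three-way condition. The sketch below only restates that condition; the function and parameter names are hypothetical, and the real code path decides this inside the discovery state machine.

```python
def probe_method(local_discovery_on: bool, peer_discovery_on: bool,
                 peer_supports_mr: bool) -> str:
    """Pick full discovery only when both sides can take part in it."""
    if local_discovery_on and peer_discovery_on and peer_supports_mr:
        return "discovery"
    # Any of the three conditions failing degrades to a standard ping.
    return "ping"
```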
14 months agoLU-11470 lnet: drop all rule 05/33305/36
Amir Shehata [Thu, 4 Oct 2018 00:36:45 +0000 (17:36 -0700)]
LU-11470 lnet: drop all rule

Add a rule to drop all messages arriving on a specific interface.
This is useful for simulating failures on a specific router interface.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ic69f683fb2caf7a69a1d85428878c89b7b1ee3ad
Reviewed-on: https://review.whamcloud.com/33305
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
14 months agoLU-11478 lnet: misleading discovery seqno. 04/33304/34
Amir Shehata [Fri, 5 Oct 2018 00:18:20 +0000 (17:18 -0700)]
LU-11478 lnet: misleading discovery seqno.

There is a sequence number used when sending discovery messages. This
sequence number is intended to detect stale messages. However it
could be misleading if the peer reboots. In this case the peer's
sequence number will reset. The node will think that all information
being sent to it is stale, while in reality the peer might've
changed configuration.

There is no reliable way to know whether a peer rebooted, so we'll
always assume that the messages we're receiving are valid and
operate on a first-come, first-served basis.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I421a00e47bc93ee60fa37c648d6d9a726d9def9c
Reviewed-on: https://review.whamcloud.com/33304
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
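
The problem described in LU-11478 above can be seen with two tiny predicates. This is an illustration only: the strict "newer than last seen" comparison is an assumed model of the earlier stale check, and both function names are hypothetical.

```python
def accept_strict(last_seen_seqno: int, incoming_seqno: int) -> bool:
    """Assumed earlier rule: treat anything not newer than last seen as stale."""
    return incoming_seqno > last_seen_seqno


def accept_always(last_seen_seqno: int, incoming_seqno: int) -> bool:
    """Rule after the change: every received message is taken as valid."""
    return True
```

If the last sequence number seen from a peer was 40 and the peer reboots, its next message may carry sequence number 1. The strict rule rejects it even though it may describe a new configuration, while the relaxed rule processes it in arrival order.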
14 months agoLU-11477 lnet: handle health for incoming messages 01/33301/34
Amir Shehata [Thu, 4 Oct 2018 23:21:48 +0000 (16:21 -0700)]
LU-11477 lnet: handle health for incoming messages

In case of routers (as well as for the general case) it's important to
update the health of the ni/lpni for incoming messages. For an lpni
specifically when we receive a message is when we know that the lpni
is up.

A percentage router health is required in order to send a message to a
gateway. It defaults to 100, meaning that a router interface has to
be fully healthy in order to send to it. This matches the current
behavior. So if a router interface goes down and its health drops
significantly, but it then comes back up (either we receive a
message from it or we discover it and get a reply), we have to boost
its health all the way up to maximum in order to start using that
router interface again.

This behavior is special cased for routers.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ida6c23f95dbef56c2e6ed7b6d03743939d8b30a0
Reviewed-on: https://review.whamcloud.com/33301
Tested-by: Jenkins
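
The router special case in LU-11477 above can be sketched as a single update function. The value 1000 for LNET_MAX_HEALTH_VALUE, the step of 1 for non-gateway peers, and the function name are assumptions for this sketch; the commit message only states that a gateway interface is boosted to maximum when a message is received from it.

```python
LNET_MAX_HEALTH_VALUE = 1000  # assumed value, for illustration only


def health_after_receive(health: int, is_gateway: bool, increment: int = 1) -> int:
    """Health of a peer NI after a message arrives from it."""
    if is_gateway:
        # Special case for routers: jump straight to fully healthy so the
        # interface passes a 100% sensitivity check again.
        return LNET_MAX_HEALTH_VALUE
    # Assumed behavior for ordinary peers: a small capped increment.
    return min(LNET_MAX_HEALTH_VALUE, health + increment)
```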
14 months agoLU-11475 lnet: transfer routers 39/34539/20
Amir Shehata [Thu, 28 Mar 2019 02:32:45 +0000 (19:32 -0700)]
LU-11475 lnet: transfer routers

When a primary NID of a peer is about to be deleted because
it's being transferred to another peer, if that peer is a gateway
then transfer all gateway properties to the new peer.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ib475c389ca5630906416a5112b3088f6f5d03950
Reviewed-on: https://review.whamcloud.com/34539
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
14 months agoLU-11475 lnet: allow deleting router primary_nid 00/33300/34
Amir Shehata [Thu, 4 Oct 2018 22:31:04 +0000 (15:31 -0700)]
LU-11475 lnet: allow deleting router primary_nid

Discovery doesn't allow deleting a primary_nid of a peer. This
is necessary because upper layers only know to reach the peer by
using the primary_nid. For routers this is not the case. So
if a router changes its interfaces and comes back up again, the
peer_ni should be adjusted.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I9da056172f35a5f15eed5ba0e02fcb37ac414c54
Reviewed-on: https://review.whamcloud.com/33300
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
14 months agoLU-11300 lnet: consider alive_router_check_interval 98/33298/34
Amir Shehata [Fri, 5 Oct 2018 01:28:49 +0000 (18:28 -0700)]
LU-11300 lnet: consider alive_router_check_interval

Consider router_check_interval when waking up the monitor thread,
to make sure the monitor thread wakes up at the earliest possible
time.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ibc4b53886b59a9bc174a29d0da711ac77db3a62c
Reviewed-on: https://review.whamcloud.com/33298
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: Jenkins
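
The idea in LU-11300 "consider alive_router_check_interval" above is that the router check is one more deadline when choosing the next wakeup. A minimal sketch, assuming the monitor thread already has a list of other pending deadlines; the function and argument names are hypothetical.

```python
from typing import Iterable


def next_wakeup(now: float, pending_deadlines: Iterable[float],
                router_check_interval: float) -> float:
    """Earliest time at which the monitor thread has something to do."""
    # The next router check is a deadline like any other.
    candidates = list(pending_deadlines) + [now + router_check_interval]
    return min(candidates)
```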
14 months agoLU-11378 lnet: MR aware gateway selection 88/33188/36
Amir Shehata [Fri, 14 Sep 2018 18:04:44 +0000 (11:04 -0700)]
LU-11378 lnet: MR aware gateway selection

When selecting a route use the Multi-Rail Selection algorithm to
select the best available peer_ni of the best route. The selected
peer_ni can then be used to send the message or to discover it
if the gateway peer needs discovering.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I376af57611591eed2eb1edb80a1b3a68b5aefd19
Reviewed-on: https://review.whamcloud.com/33188
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
14 months agoLU-11299 lnet: use discovery for routing 54/33454/31
Amir Shehata [Mon, 22 Oct 2018 23:03:06 +0000 (16:03 -0700)]
LU-11299 lnet: use discovery for routing

Instead of re-inventing the wheel, routing now uses discovery.
Every router interval the router is discovered. This will
update the router information locally and will serve to let the
router know that the peer is alive.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I211bf15af0b0a5d50f9e2a69a385419a1dd5096b
Reviewed-on: https://review.whamcloud.com/33454
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
14 months agoLU-11299 lnet: modify lnd notification mechanism 53/33453/30
Amir Shehata [Mon, 22 Oct 2018 22:44:50 +0000 (15:44 -0700)]
LU-11299 lnet: modify lnd notification mechanism

LND notifies when a peer is up or down. If the LND notifies
LNet that the peer is up and sets the "reset" flag to true
then this indicates to LNet that the LND knows about the health
of the peer and is telling LNet that the peer is fully healthy.
LNet will set the health value of the peer to maximum, otherwise
it will increment the health by one.

If the LND notifies LNet that the peer is down, LNet will
decrement the health of the peer by the configured sensitivity value.

LNet then turns around and rechecks the peer aliveness and if it's
dead it'll notify the LND. This code is only used by the socklnd
because it needs to tear down connections. This is in keeping with
the original functionality.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ifa614405fb0c2cd4f6bcb1a2a97e856320eb6cbe
Reviewed-on: https://review.whamcloud.com/33453
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: Jenkins
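
The health updates described in LU-11299 "modify lnd notification mechanism" above can be summarized in one function. This is a sketch, not the lnet_notify() code: the value 1000 for LNET_MAX_HEALTH_VALUE, the clamping to the range 0..max, and the names used are assumptions.

```python
LNET_MAX_HEALTH_VALUE = 1000  # assumed value, for illustration only


def health_after_notify(health: int, alive: bool, reset: bool,
                        sensitivity: int) -> int:
    """Peer health after an LND up/down notification."""
    if alive:
        if reset:
            # The LND vouches for the peer: set health to maximum.
            return LNET_MAX_HEALTH_VALUE
        # Otherwise increment by one (capping is an assumption here).
        return min(LNET_MAX_HEALTH_VALUE, health + 1)
    # Peer reported down: decrement by the configured sensitivity
    # (flooring at zero is an assumption here).
    return max(0, health - sensitivity)
```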
14 months agoLU-11299 lnet: Cleanup rcd 87/33187/35
Amir Shehata [Mon, 22 Oct 2018 22:09:11 +0000 (15:09 -0700)]
LU-11299 lnet: Cleanup rcd

Cleanup all code pertaining to rcd, as routing code will use
discovery going forward and there will be no need to keep its own
pinging code.

test_215 looks at the routers file which had its format changed.
Update the test to reflect the change.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: If31caa3b5703df40b6ae0f758f2fe764991aa4f3
Reviewed-on: https://review.whamcloud.com/33187
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: Jenkins
14 months agoLU-11300 lnet: simplify lnet_handle_local_failure() 52/33452/30
Amir Shehata [Mon, 22 Oct 2018 20:39:36 +0000 (13:39 -0700)]
LU-11300 lnet: simplify lnet_handle_local_failure()

Pass the struct lnet_ni to lnet_handle_local_failure() instead of the
message structure, since nothing else from the message is being
used. This also makes it symmetrical with lnet_handle_remote_failure().

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I10146ec5bf5f378e28a7725382f00132ada32c6e
Reviewed-on: https://review.whamcloud.com/33452
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: Jenkins
14 months agoLU-11300 lnet: router aliveness 85/33185/34
Amir Shehata [Thu, 6 Sep 2018 00:03:45 +0000 (17:03 -0700)]
LU-11300 lnet: router aliveness

A route is considered alive if the gateway is able to route
messages from the local to the remote net. That means that
at least one of the network interfaces on the remote net of
the gateway is viable.

Introduced the concept of sensitivity percentage. This defaults
to 100%. It holds a dual meaning:
1. A route is considered alive if at least one of its interfaces'
   health is >= LNET_MAX_HEALTH_VALUE * router_sensitivity_percentage.
   100% means at least one interface has to be 100% healthy.
2. On a router, consider a peer_ni dead if its health is not at least
   LNET_MAX_HEALTH_VALUE * router_sensitivity_percentage.
   100% means the interface has to be 100% healthy.

Re-implemented lnet_notify() to decrement the health of the
peer interface if the LND reports a failure on that peer.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ie97561fb70bf6a558bc90fa9266a6ba38fa3d293
Reviewed-on: https://review.whamcloud.com/33185
Tested-by: Jenkins
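
The sensitivity rule in LU-11300 "router aliveness" above can be worked through with integers. This is a minimal sketch: the value 1000 for LNET_MAX_HEALTH_VALUE and the function names are assumptions, and the percentage is applied as pct/100 of the maximum.

```python
from typing import Iterable

LNET_MAX_HEALTH_VALUE = 1000  # assumed value, for illustration only


def ni_is_healthy(health: int, sensitivity_pct: int) -> bool:
    """True if health >= LNET_MAX_HEALTH_VALUE * sensitivity_pct / 100."""
    # Multiply instead of dividing to stay in integer arithmetic.
    return health * 100 >= LNET_MAX_HEALTH_VALUE * sensitivity_pct


def route_is_alive(remote_net_ni_healths: Iterable[int],
                   sensitivity_pct: int = 100) -> bool:
    """A route is alive if at least one gateway NI meets the threshold."""
    return any(ni_is_healthy(h, sensitivity_pct) for h in remote_net_ni_healths)
```

With the default of 100, a gateway whose interfaces sit at 999 and 500 is not usable, while lowering the percentage to 90 makes the 999 interface acceptable.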
14 months agoLU-11300 lnet: peer aliveness 86/33186/34
Amir Shehata [Thu, 6 Sep 2018 01:19:35 +0000 (18:19 -0700)]
LU-11300 lnet: peer aliveness

Peer NI aliveness is now solely dependent on the health
infrastructure. With the addition of router_sensitivity_percentage,
a peer NI is considered dead if its health drops below the specified
percentage of the total health. Setting the percentage to 100% means
that a peer_ni is considered dead if its interface is less than
fully healthy.

Removed obsolete code that queries the peer NI every second since
the health infrastructure introduces the recovery mechanism which
is designed to recover the health of peer NIs.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I506060fbb66c74295808891b689d7d634dc69284
Reviewed-on: https://review.whamcloud.com/33186
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins
14 months agoLU-11300 lnet: Cache the routing feature 51/33451/30
Amir Shehata [Sat, 20 Oct 2018 01:24:39 +0000 (18:24 -0700)]
LU-11300 lnet: Cache the routing feature

When processing a discovery REPLY or PUSH, cache whether the
routing feature is enabled or disabled as reported by the peer.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I69bd41fade196773af0e1004c2e7fff2fb91392d
Reviewed-on: https://review.whamcloud.com/33451
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
14 months agoLU-11300 lnet: cache ni status 50/33450/30
Amir Shehata [Sat, 20 Oct 2018 01:02:05 +0000 (18:02 -0700)]
LU-11300 lnet: cache ni status

When processing the data in the PUSH or the REPLY make sure to cache
the ns_status. This is the status of the peer_ni as reported by the
peer itself.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I14de2460f578fb7f47d329a97b8833f49c569b74
Reviewed-on: https://review.whamcloud.com/33450
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
14 months agoLU-11300 lnet: configure lnet router sensitivity 55/33455/29
Amir Shehata [Tue, 23 Oct 2018 04:25:33 +0000 (21:25 -0700)]
LU-11300 lnet: configure lnet router sensitivity

Allow the configuration of router_sensitivity_percentage from the
user space utility lnetctl.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: If5440f30881361ebb06dafa9cadb7cbc2b934f93
Reviewed-on: https://review.whamcloud.com/33455
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins
14 months agoLU-11300 lnet: router sensitivity 49/33449/30
Amir Shehata [Sat, 20 Oct 2018 00:09:24 +0000 (17:09 -0700)]
LU-11300 lnet: router sensitivity

Introduce the router_sensitivity_percentage module parameter to
control the sensitivity of routers to failures. It defaults to 100%
which means a router interface needs to be fully healthy in order
to be used.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I3e9333033f049918c1cdca58a72604c71884acbe
Reviewed-on: https://review.whamcloud.com/33449
Tested-by: Jenkins
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
14 months agoLU-11551 lnet: Do not allow deleting of router nis 48/33448/27
Amir Shehata [Fri, 19 Oct 2018 23:40:52 +0000 (16:40 -0700)]
LU-11551 lnet: Do not allow deleting of router nis

Check the peer before deleting a peer_ni. If it's a router then do
not allow deletion of the peer_ni.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I372052b4e9b5af3a8f18a49676fc60b4c8077cbd
Reviewed-on: https://review.whamcloud.com/33448
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins
14 months agoLU-11299 lnet: lnet_add/del_route() 84/33184/31
Amir Shehata [Tue, 4 Sep 2018 23:47:54 +0000 (16:47 -0700)]
LU-11299 lnet: lnet_add/del_route()

Reimplemented lnet_add_route() and lnet_del_route() to use
the peer instead of the peer_ni.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I3734098a81ab18d1d74220c691d96a9b9817e6da
Reviewed-on: https://review.whamcloud.com/33184
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
Reviewed-by: Chris Horn <hornc@cray.com>
14 months agoLU-11298 lnet: use peer for gateway 83/33183/31
Amir Shehata [Fri, 31 Aug 2018 02:04:39 +0000 (19:04 -0700)]
LU-11298 lnet: use peer for gateway

The routing code uses peer_ni for a gateway. However, with Multi-Rail
a gateway could have multiple interfaces on several different
networks. Instead of using a single peer_ni as the gateway we should
be using the peer and let the MR selection code select the best
peer_ni to send to.

This patch moves the gateway from peer_ni to peer. Much of the
code needs to be rewritten in the following patches to account
for that change. This patch disables the routing features by
disabling the code to add/delete routes.

The asymmetric routing detection feature is also modified to
use MR routing.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ia7dab552268c4a7fbd7b88122b9a95363d155fd7
Reviewed-on: https://review.whamcloud.com/33183
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins
14 months agoLU-11292 lnet: Discover routers on first use 82/33182/31
Amir Shehata [Tue, 28 Aug 2018 23:42:35 +0000 (16:42 -0700)]
LU-11292 lnet: Discover routers on first use

Discover routers on first use. This brings the behavior when
interacting with routers in line with when dealing with normal
peers.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I8527e41daf2f5f6ab5f04aac1285aaa6cc4ee594
Reviewed-on: https://review.whamcloud.com/33182
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
14 months agoLU-10153 lnet: remove route add restriction 47/33447/23
Amir Shehata [Fri, 19 Oct 2018 23:23:40 +0000 (16:23 -0700)]
LU-10153 lnet: remove route add restriction

Remove the restriction on adding routes to the same remote network
via two different gateways.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Iefc5aa10f73e9e7bdd283f5e933fbb8ee819df50
Reviewed-on: https://review.whamcloud.com/33447
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
14 months agoLU-12339 lnet: select LO interface for sending 57/34957/5
Amir Shehata [Sat, 25 May 2019 16:55:47 +0000 (09:55 -0700)]
LU-12339 lnet: select LO interface for sending

In the following scenario:

1. Lustre->LNetPrimaryNID with 0@lo
2. Discovery is initiated on 0@lo
3. The peer is created with 0@lo and <addr>@<net>
4. The interface health of the peer's <addr>@<net> is decremented
5. LNetPut() to self
6. The selection algorithm selects 0@lo to send to

This exposes an issue where we try to go through the peer credit
management algorithm, but because there are no credits associated with
0@lo we end up queuing the message indefinitely. ptlrpc will then get
stuck waiting for send completion on the message.

This was exposed via conf-sanity 32a.

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I98e9d3428b594a0d041d27d8e8d8de7596825edc
Reviewed-on: https://review.whamcloud.com/34957
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins
14 months agoLU-12199 lnet: verify msg is committed for send/recv 97/34797/12
Amir Shehata [Tue, 30 Apr 2019 21:01:48 +0000 (14:01 -0700)]
LU-12199 lnet: verify msg is committed for send/recv

Before performing a health check make sure the message
is committed for either send or receive. Otherwise we
can just finalize it.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Id7bd956f8e81e60a2d63059730973f851d4c7abe
Reviewed-on: https://review.whamcloud.com/34797
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
14 months agoLU-12199 lnet: Ensure md is detached when msg is not committed 85/34885/8
Chris Horn [Thu, 18 Apr 2019 03:49:18 +0000 (22:49 -0500)]
LU-12199 lnet: Ensure md is detached when msg is not committed

It's possible for lnet_is_health_check() to return "true" when the
message has not hit the network. In this situation the message is
freed without detaching the MD. As a result, requests do not receive
their unlink events and these requests are stuck forever.

A little cleanup is included here:
 - The value of lnet_is_health_check() is only used in one place, so
   we don't need to save the result of it in a variable.
 - We don't need separate logic to detach the md when the send was
   successful. We'll fall through to the finalizing code after
   incrementing the health counters

Test-Parameters: forbuildonly
Cray-bug-id: LUS-7239
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I6301d491090b862d016eed3aac8afd7be8685e57
Reviewed-on: https://review.whamcloud.com/34885
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
14 months agoLU-12264 lnet: Protect lp_dc_pendq manipulation with lp_lock 98/34798/9
Chris Horn [Thu, 2 May 2019 22:24:32 +0000 (17:24 -0500)]
LU-12264 lnet: Protect lp_dc_pendq manipulation with lp_lock

Protect the peer discovery queue from concurrent manipulation by
acquiring the lp_lock.

Test-Parameters: forbuildonly
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: If43b877c1c7ea203f346a3d6ea846f00b8f9661f
Reviewed-on: https://review.whamcloud.com/34798
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
14 months agoLU-12254 lnet: correct discovery LNetEQFree() 96/34796/8
Amir Shehata [Tue, 30 Apr 2019 18:51:09 +0000 (11:51 -0700)]
LU-12254 lnet: correct discovery LNetEQFree()

The EQ needs to be freed after all the queues are cleaned, to avoid
having unprocessed events on the event queue at free time. Such
events would prevent the memory from being freed.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ie38ec25e09bf6d7cf2aadc30edd91d298897c51b
Reviewed-on: https://review.whamcloud.com/34796
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins
14 months agoLU-12249 lnet: fix list corruption 78/34778/8
Amir Shehata [Tue, 30 Apr 2019 05:57:21 +0000 (22:57 -0700)]
LU-12249 lnet: fix list corruption

In shutdown the resend queues are cleared and freed. The monitor
thread state is set to shutdown. It is possible to get lnet_finalize()
called after the queues are freed. The code checks for ln_state to see
if we're shutting down. But in this case we should really be checking
ln_mt_state. The monitor thread is the one that matters in this case,
because it's the one which allocates and frees the resend queues.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ia077cec7a52ef5cd2e1b231437c6265ba9416b1b
Reviewed-on: https://review.whamcloud.com/34778
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins
14 months agoLU-11297 lnet: invalidate recovery ping mdh 71/34771/8
Amir Shehata [Sat, 27 Apr 2019 22:47:42 +0000 (15:47 -0700)]
LU-11297 lnet: invalidate recovery ping mdh

For cleanliness, ensure that the recovery ping mdh is invalidated when
a peer NI or a local NI is allocated.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: If06448b1602b3680831244923b6b982a555159ea
Reviewed-on: https://review.whamcloud.com/34771
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins
14 months agoLU-12201 lnet: detach response tracker 70/34770/8
Amir Shehata [Fri, 19 Apr 2019 00:12:49 +0000 (17:12 -0700)]
LU-12201 lnet: detach response tracker

We need to unlink the response tracker from MDs even if the
corresponding message failed to send.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I4f320274576790e3332f66f30aad5c2b3450b955
Reviewed-on: https://review.whamcloud.com/34770
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins
14 months agoLU-12163 lnet: fix cpt locking 07/34607/9
Amir Shehata [Sat, 6 Apr 2019 00:38:38 +0000 (17:38 -0700)]
LU-12163 lnet: fix cpt locking

In lnet_select_pathway() the call to lnet_handle_send_case_locked()
can result in sd_cpt being changed. If this function returns
REPEAT_SEND, we'll go back to the again label. It is possible at
this time to initiate discovery, which will unlock the cpt.
If the local cpt isn't updated we could potentially be manipulating
the wrong cpt, resulting in some form of corruption or deadlock.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ifd39b0d84f8cce859151f7cc900a082481dd7218
Reviewed-on: https://review.whamcloud.com/34607
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins
14 months agoLU-11816 lnet: setup health timeout defaults 52/34252/13
Amir Shehata [Wed, 19 Dec 2018 23:55:49 +0000 (15:55 -0800)]
LU-11816 lnet: setup health timeout defaults

Enable health feature by default.
Set the transaction timeout to a default of 10 seconds and the
retry count to 3 when health is enabled. When health
is disabled, set the default transaction timeout to 50.
When toggling between health enabled/disabled the defaults
will always kick in.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I153c2822898b44e33871ec827de7e61f153bb1db
Reviewed-on: https://review.whamcloud.com/34252
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
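
The defaults in LU-11816 above, and the fact that toggling health re-applies them, can be sketched as follows. The dictionary keys and function names are hypothetical stand-ins for the real tunables, and the retry count for the disabled case is deliberately left out because the commit message does not state it.

```python
from typing import Dict


def health_defaults(health_enabled: bool) -> Dict[str, int]:
    """Defaults named in the commit message for each health state."""
    if health_enabled:
        return {"transaction_timeout": 10, "retry_count": 3}
    # Only the timeout is stated for the disabled case.
    return {"transaction_timeout": 50}


def toggle_health(config: Dict[str, int], enabled: bool) -> Dict[str, int]:
    """Return a new config; the defaults overwrite earlier settings."""
    return {**config, "health_enabled": int(enabled), **health_defaults(enabled)}
```

In this model a previously customized timeout does not survive a toggle, which is one reading of "the defaults will always kick in".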
14 months agoLU-12344 lnet: handle remote health error 67/34967/2
Amir Shehata [Mon, 27 May 2019 17:43:10 +0000 (10:43 -0700)]
LU-12344 lnet: handle remote health error

When a peer is dead set the health status to REMOTE_DROPPED
in order to handle health properly for the peer.
When dropping a routed message set REMOTE_ERROR. Routed messages
are dropped when the routing feature is turned off which could
be considered a configuration error if it happens in the middle
of traffic. Therefore, it's better to flag this issue at this
point without resending the message.

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I131263215a68fc8607582643a47007ce4d04abbc
Reviewed-on: https://review.whamcloud.com/34967
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-12080 lnet: clean mt_eqh properly 77/34477/8
Amir Shehata [Wed, 20 Mar 2019 19:14:51 +0000 (12:14 -0700)]
LU-12080 lnet: clean mt_eqh properly

There is a scenario where you have a peer on your recovery queue
that's down. So you keep pinging it, but every ping times out
after 10 seconds. In the middle of these 10 seconds you perform a
shutdown. First you try to do the rsp_tracker_clean. It goes through
and calls MDUnlink on the MD related to that ping. But because the
message has a ref count on the MD, it doesn't go away. The MD gets
zombied and just waits for lnet_md_unlink() to be called in
lnet_finalize(). Then you hit clean_peer_ni_recovery. We see the peer
on the queue, we try to call Unlink on it, but when we lookup the
MD using lnet_handle2md() we can't find it. Afterwards we try to clean
up the EQ and it asserts. Even if we remove the assert we end up with
a resource leak since the EQ is not actually freed since we won't call
LNetEQFree() again.

The solution is to move the EQ creation into LNetNIInit() and its
deletion into lnet_unprepare(). By this point all the remaining messages
would've been finalized and all references on the EQ are gone,
allowing us to clean it up properly.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I7fd6018ee2e57f82c649fc3658352e89a4309986
Reviewed-on: https://review.whamcloud.com/34477
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins
14 months agoLU-12080 lnet: recovery event handling broken 45/34445/7
Amir Shehata [Sun, 17 Mar 2019 15:16:40 +0000 (08:16 -0700)]
LU-12080 lnet: recovery event handling broken

Don't increment health on unlink event.
If a SEND fails an unlink will follow so no need to do any
special processing on SEND event. If SEND succeeds then we
wait for the reply.
When queuing a message on the NI recovery queue only do so
if the MT thread is still running.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I4877caebcac5cdfc35a59a18a3e3451b1f23cb0d
Reviewed-on: https://review.whamcloud.com/34445
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Tested-by: Jenkins
14 months agoLU-12014 llite: check correct size in ll_dom_finish_open() 95/33895/10
Mikhail Pershin [Wed, 19 Dec 2018 19:28:53 +0000 (22:28 +0300)]
LU-12014 llite: check correct size in ll_dom_finish_open()

The check in ll_dom_finish_open() for data end shouldn't
use i_size for comparison because it may not yet be updated
with the data just returned from the server. Use the size value in
mdt_body from the reply for that check.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I1104fbbb0eb4633869b9bf2d1803ac3e84e3853d
Reviewed-on: https://review.whamcloud.com/33895
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-11213 lmv: mkdir with balanced space usage 60/34360/15
Lai Siyao [Fri, 15 Feb 2019 14:07:56 +0000 (22:07 +0800)]
LU-11213 lmv: mkdir with balanced space usage

If a plain directory default LMV hash type is "space", create
subdirs on all MDTs with balanced space usage:
* client mkdir allocates the FID on an MDT with balanced space usage
  (space QoS code is in the next patch).
* MDT allows mkdir on a different MDT from its parent if the parent has
  "space" hash type in its default LMV; this is normally rejected
  because mkdir shouldn't create a remote directory.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I284e21f334c07462211be4c8e38e965722d1e8a8
Reviewed-on: https://review.whamcloud.com/34360
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-11213 mdc: add async statfs 59/34359/16
Lai Siyao [Fri, 15 Feb 2019 11:12:34 +0000 (19:12 +0800)]
LU-11213 mdc: add async statfs

Add obd_statfs_async() interface for MDC, the statfs request
is sent by ptlrpcd.

This statfs result is kept for each MDT separately, unlike the
current cached statfs, which is the aggregated statfs of all
MDTs.

The max age of statfs result is decided by lmv_desc.ld_qos_maxage.

It will deactivate MDC on failure, and activate MDC on success.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I8e1bd104fb60ff81e2eb26e49a89a5baf8050d47
Reviewed-on: https://review.whamcloud.com/34359
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-11213 ptlrpc: intent_getattr fetches default LMV 02/34802/17
Lai Siyao [Thu, 18 Apr 2019 10:01:47 +0000 (18:01 +0800)]
LU-11213 ptlrpc: intent_getattr fetches default LMV

Intent_getattr fetches default LMV, and caches it on client,
which will be used in subdir creation.

* Add RMF_DEFAULT_MDT_MD in intent_getattr reply.
* Save default LMV in ll_inode_info->lli_default_lsm_md, and
  replace lli_def_stripe_offset with it.
* take LOOKUP lock on default LMV setting to let client update
  cached default LMV.
* improve mdt_object_striped() to read from bottom device
  to avoid reading stripe FIDs.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Idb369db2c514a9c5108390f70d9284b3a87d26db
Reviewed-on: https://review.whamcloud.com/34802
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-12374 lustre: push rcu_barrier() before destroying slab 30/35030/2
Wang Shilong [Sat, 1 Jun 2019 11:22:11 +0000 (19:22 +0800)]
LU-12374 lustre: push rcu_barrier() before destroying slab

From rcubarrier.txt:

"
We could try placing a synchronize_rcu() in the module-exit code path,
but this is not sufficient. Although synchronize_rcu() does wait for a
grace period to elapse, it does not wait for the callbacks to complete.

One might be tempted to try several back-to-back synchronize_rcu()
calls, but this is still not guaranteed to work. If there is a very
heavy RCU-callback load, then some of the callbacks might be deferred
in order to allow other processing to proceed. Such deferral is required
in realtime kernels in order to avoid excessive scheduling latencies.

We instead need the rcu_barrier() primitive. This primitive is similar
to synchronize_rcu(), but instead of waiting solely for a grace
period to elapse, it also waits for all outstanding RCU callbacks to
complete. Pseudo-code using rcu_barrier() is as follows:

   1. Prevent any new RCU callbacks from being posted.
   2. Execute rcu_barrier().
   3. Allow the module to be unloaded.
"

So using synchronize_rcu() in ldlm_exit() is not safe enough, and we might
still hit a use-after-free problem. We also missed rcu_barrier() when
destroying the inode cache; this follows the same idea as what current
local filesystems do.

Change-Id: I76c7dfe7b6472d377fe1b60b0891c61ac8a0fbfc
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/35030
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-11204 obdclass: remove unprotected access to lu_object 60/34960/2
Mikhail Pershin [Sun, 26 May 2019 17:46:43 +0000 (20:46 +0300)]
LU-11204 obdclass: remove unprotected access to lu_object

The check of lu_object_is_dying() is done after the reference
drop and without a lock, so it can access a freed object if a
concurrent thread did the final put.

The patch saves the object state right before atomic_dec_and_lock()
and checks the saved state afterwards, so the object itself is not
accessed after the reference drop.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I6407cdb079777e60cc0a7aecb64e3a559210b504
Reviewed-on: https://review.whamcloud.com/34960
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-12219 obdfilter: changes PAGE_SIZE variable 54/34754/3
Alexander Boyko [Wed, 24 Apr 2019 14:14:57 +0000 (10:14 -0400)]
LU-12219 obdfilter: changes PAGE_SIZE variable

obdfilter-survey uses PAGE_SIZE in KBytes. After LU-11597,
PAGE_SIZE is exported from test-framework.sh in bytes. This confuses
obdfilter-survey and leads to the error:
/usr/bin/obdfilter-survey: line 509: size * 1024 / (actual_rsz * thr):
division by 0 (error token is ")")

Patch changes the name to PAGE_SIZE_KB.

Fixes: f602b5ec7f4 ("LU-11597 tests: fix O_DIRECT test usage for ARM")
Test-Parameters: trivial testlist=obdfilter-survey
Signed-off-by: Alexander Boyko <c17825@cray.com>
Cray-bug-id: LUS-7214
Change-Id: Ie8be852c9634569c59a770ba49c3d1c36f53fdb2
Reviewed-on: https://review.whamcloud.com/34754
Reviewed-by: Elena Gryaznova <c17455@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoNew tag 2.12.54 2.12.54 v2_12_54
Oleg Drokin [Wed, 5 Jun 2019 06:37:03 +0000 (02:37 -0400)]
New tag 2.12.54

Change-Id: I17db7c495c4419c0815398f531b6407269355892
Signed-off-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-12361 lov: fix wrong calculated length for fiemap 98/34998/3
Wang Shilong [Thu, 30 May 2019 14:46:09 +0000 (22:46 +0800)]
LU-12361 lov: fix wrong calculated length for fiemap

lov_stripe_intersects() will return a closed interval
[@obd_start, @obd_end], so to calculate the length of the interval we need

 @obd_end - @obd_start + 1

rather than

 @obd_end - @obd_start

Wrong extent length will make us return wrong fiemap information.
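
As a small illustration of the off-by-one above (not the lov code itself; the helper name is hypothetical), the length of a closed interval counts both endpoints:

```python
def closed_interval_length(obd_start: int, obd_end: int) -> int:
    """Length of the closed interval [obd_start, obd_end]."""
    if obd_end < obd_start:
        raise ValueError("obd_end must not be smaller than obd_start")
    # Both endpoints belong to the interval, hence the + 1.
    return obd_end - obd_start + 1


# A single 4096-byte extent starting at offset 0 ends at offset 4095.
length = closed_interval_length(0, 4095)
```

Using `obd_end - obd_start` here would report 4095 bytes, one byte short of the real extent.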

Change-Id: I30deb17cf5fa80a6d3046098fbac0d3faa01ad1c
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/34998
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-12352 libcfs: crashes with certain cpu part numbers 91/34991/6
Andrew Perepechko [Thu, 17 Jan 2019 21:58:10 +0000 (00:58 +0300)]
LU-12352 libcfs: crashes with certain cpu part numbers

Due to a bug in the code, libcfs will crash if the
number of online cpus is not divisible by the number
of cpu partitions.

Based on the checks in cfs_cpt_table_create(), it
appears that the original intent was to push the
remaining cpus into the initial partitions.

So let's do that properly.
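
One way to picture "pushing the remaining cpus into the initial partitions" is the sketch below. It is an assumption about the intended distribution, not a copy of cfs_cpt_table_create(): each of the first `ncpus % nparts` partitions is taken to receive one extra cpu, and the function name is hypothetical.

```python
def cpus_per_partition(ncpus: int, nparts: int) -> list[int]:
    """Split ncpus across nparts, giving leftovers to the first partitions."""
    if nparts <= 0 or ncpus < nparts:
        raise ValueError("need at least one cpu per partition")
    base, remainder = divmod(ncpus, nparts)
    # The first `remainder` partitions each take one extra cpu.
    return [base + 1 if i < remainder else base for i in range(nparts)]


sizes = cpus_per_partition(10, 4)
```

With 10 online cpus and 4 partitions, no cpu is left unassigned and no partition is empty.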

Change-Id: I3c5e2aa1fdfca4c07e7afce143c984973373f009
Signed-off-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Cray-bug-id: LUS-6455
Reviewed-on: https://review.whamcloud.com/34991
Tested-by: Jenkins
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-12341 tests: Add kmemleak awareness to test-framework 59/34959/4
Oleg Drokin [Sun, 26 May 2019 18:53:21 +0000 (14:53 -0400)]
LU-12341 tests: Add kmemleak awareness to test-framework

If active kmemleak is detected, perform a clear operation to
ensure all non-Lustre related leaks are not getting in the way.

When it comes time to unload modules, first perform a scan
and then save the output if it's not empty, print to
syslog (for simplicity).
Also save /proc/modules content for the next step (we can save ogdb
from /tmp, but that seems to be getting stale and needs its own
fixing)

After modules unload perform another scan and if the result is non-empty,
output the saved /proc/modules output and the updated memleaks
into syslog as well

Test-Parameters: trivial
Change-Id: Ibba9047e4d8b98e7ab74aeb0906078549029ad43
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34959
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
14 months agoLU-12306 kernel: kernel update RHEL7.6 [3.10.0-957.12.2.el7] 97/34897/3
Jian Yu [Sat, 18 May 2019 05:21:12 +0000 (22:21 -0700)]
LU-12306 kernel: kernel update RHEL7.6 [3.10.0-957.12.2.el7]

Update RHEL7.6 kernel to 3.10.0-957.12.2.el7.

Test-Parameters: clientdistro=el7.6 serverdistro=el7.6

Change-Id: I8124f68085af2b6d8228166e84745cb94edb7fb0
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34897
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-11963 osd: Add nonrotational flag to statfs 35/34235/11
Patrick Farrell [Wed, 27 Feb 2019 21:29:59 +0000 (16:29 -0500)]
LU-11963 osd: Add nonrotational flag to statfs

It is potentially useful for the MDS and userspace to
know whether or not an OST is using non-rotational media.

Add a flag to obd_statfs that reflects this.

Users can override this parameter in proc.

ZFS does not currently make this information available to
Lustre, so default to rotational and allow users to
override.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Iac2b54c5d8cc1eb79cdace764e93578c7b058661
Reviewed-on: https://review.whamcloud.com/34235
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-9581 tests: remove recovery-small test from ALWAYS_EXCEPT 82/27382/8
James Nunez [Wed, 27 Jun 2018 19:34:28 +0000 (13:34 -0600)]
LU-9581 tests: remove recovery-small test from ALWAYS_EXCEPT

recovery-small test 52 is not run during testing,
by adding the test number to the ALWAYS_EXCEPT list,
due to bugzilla bug number 5493.

Remove recovery-small test 52 from the ALWAYS_EXCEPT
list and start running this test again.

Test-Parameters: trivial testlist=recovery-small
Signed-off-by: dilip krishnagiri <dilipx.krishnagiri@intel.com>
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: I50e30831ee5af8e063dc4b6197141fed365535b6
Reviewed-on: https://review.whamcloud.com/27382
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-12120 grants: prevent negative ted_grant value 96/34996/2
Mikhail Pershin [Thu, 30 May 2019 09:30:43 +0000 (12:30 +0300)]
LU-12120 grants: prevent negative ted_grant value

Add check in tgt_grant_shrink() to protect ted_grant
against negative value.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Iddea86f052124413ac60f5d0f26bcb68e376ede5
Reviewed-on: https://review.whamcloud.com/34996
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-11213 dne: add new dir hash type "space" 58/34358/12
Lai Siyao [Thu, 14 Feb 2019 21:16:33 +0000 (05:16 +0800)]
LU-11213 dne: add new dir hash type "space"

Add a new hash type "space". If this is set on the default LMV of
a directory, its subdirs will be created on all MDTs with
balanced space usage.

* new hash type LMV_HASH_TYPE_SPACE.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I8edf38f94e24965b1cffb21253c3be0eef68c707
Reviewed-on: https://review.whamcloud.com/34358
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
14 months agoLU-12269 kernel: new kernel [RHEL 8.0 4.18.0-80.el8] 62/34862/8
Jian Yu [Wed, 22 May 2019 19:24:14 +0000 (12:24 -0700)]
LU-12269 kernel: new kernel [RHEL 8.0 4.18.0-80.el8]

This patch makes changes to support new RHEL 8.0 release
for Lustre client.

Test-Parameters: trivial

Change-Id: I89b4f1e59f8b25bf9d37d3564e2d05d6e87d9b38
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34862
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-12140 lnet: adds checking msg len 75/34975/5
Alexander Boyko [Tue, 28 May 2019 10:07:12 +0000 (06:07 -0400)]
LU-12140 lnet: adds checking msg len

The LNET can't handle a msg with len larger than LNET_MTU.
The following error occurred for DOM 1MB:
 LNetError: 3137:0:(lib-move.c:4143:lnet_parse()) 192.168.8.1@tcp,
 src 192.168.8.1@tcp: bad PUT payload 1051832 (1048576 max expected)

The patch adds fragment size check.
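
The numbers in the error above can be checked with a short sketch. The limit of 1048576 bytes is taken from the "max expected" value in the log; the helper names are hypothetical and this does not show where the patch places its check.

```python
LNET_MTU = 1 << 20  # 1048576 bytes, the "max expected" value in the log


def payload_fits(payload_len: int, mtu: int = LNET_MTU) -> bool:
    """True if a message payload does not exceed the MTU."""
    return payload_len <= mtu


def payload_excess(payload_len: int, mtu: int = LNET_MTU) -> int:
    """Number of bytes by which the payload exceeds the MTU (0 if it fits)."""
    return max(0, payload_len - mtu)


excess = payload_excess(1051832)
```

The rejected PUT was 3256 bytes over the limit, which is why a 1MB DOM transfer plus its extra data could not be sent as a single message.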

Signed-off-by: Alexander Boyko <c17825@cray.com>
Cray-bug-id: LUS-7174
Change-Id: Id2d21ebd87ab0bf3a9114548900fab99b278ffb0
Reviewed-on: https://review.whamcloud.com/34975
Tested-by: Jenkins
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-11623 llite: hash just created files if lock allows 84/33584/9
Oleg Drokin [Tue, 6 Nov 2018 00:26:44 +0000 (19:26 -0500)]
LU-11623 llite: hash just created files if lock allows

If open|creat (and other intent operations later) returned a lookup bit
as part of the lock, hash the resultant dentry under this lock,
not to trigger further RPCs in subsequent lookups.

Benchmark results:

This patch can significantly improve open-create + stat on the same
client.

This patch in combination with two others:

https://review.whamcloud.com/32157
https://review.whamcloud.com/33585

Improves the 'stat' side of open-create + stat by >10x.

Without patches (master branch commit 26a7abe):

mpirun -np 24 --allow-run-as-root /work/tools/bin/mdtest -n 50000 -d /cache1/out/ -F -C -T -v -w 32k

   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :       3838.205       3838.204       3838.204          0.000
   File stat         :      33459.289      33459.249      33459.271          0.011
   File read         :          0.000          0.000          0.000          0.000
   File removal      :          0.000          0.000          0.000          0.000
   Tree creation     :       3146.841       3146.841       3146.841          0.000
   Tree removal      :          0.000          0.000          0.000          0.000

With the three patches:

mpirun -np 24 --allow-run-as-root /work/tools/bin/mdtest -n 50000 -d /cache1/out/ -F -C -T -v -w 32k
SUMMARY rate: (of 1 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :       3822.440       3822.439       3822.440          0.000
   File stat         :     350620.140     350615.980     350617.193          1.051
   File read         :          0.000          0.000          0.000          0.000
   File removal      :          0.000          0.000          0.000          0.000
   Tree creation     :       2076.727       2076.727       2076.727          0.000
   Tree removal      :          0.000          0.000          0.000          0.000

Note 33K stats/second vs 350K stats/second.

ls -l time of the mdtest directory is also reduced from 23.5 seconds to
5.8 seconds.
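
The ratios behind these claims can be recomputed from the mean values in the two tables above. The variable names are only for illustration:

```python
# Mean "File stat" rates (operations per second) from the two mdtest runs.
stat_before = 33459.271
stat_after = 350617.193

# "ls -l" wall-clock times in seconds, before and after.
ls_before = 23.5
ls_after = 5.8

stat_speedup = stat_after / stat_before  # roughly 10.5x
ls_speedup = ls_before / ls_after        # roughly 4x

print(f"stat: {stat_speedup:.2f}x, ls -l: {ls_speedup:.2f}x")
```

This agrees with the ">10x" figure quoted for the stat side of the benchmark.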

Change-Id: Id5140d1042af7f5ab9052922e11a7eda8f92a29a
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/33584
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
14 months agoLU-10948 llite: Revalidate dentries in ll_intent_file_open 57/32157/9
Oleg Drokin [Wed, 25 Apr 2018 19:04:48 +0000 (15:04 -0400)]
LU-10948 llite: Revalidate dentries in ll_intent_file_open

We might get a lookup lock in response to our open request and we
definitely want to ensure that our dentry is valid, so it could
actually be matched by dcache code in future operations.

Benchmark results:

This patch can significantly improve open-create + stat on the same
client.

This patch in combination with two others:

https://review.whamcloud.com/#/c/33584
https://review.whamcloud.com/#/c/33585

Improves the 'stat' side of open-create + stat by >10x.

Without patches (master branch commit 26a7abe):

mpirun -np 24 --allow-run-as-root /work/tools/bin/mdtest -n 50000 -d /cache1/out/ -F -C -T -v -w 32k

   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :       3838.205       3838.204       3838.204          0.000
   File stat         :      33459.289      33459.249      33459.271          0.011
   File read         :          0.000          0.000          0.000          0.000
   File removal      :          0.000          0.000          0.000          0.000
   Tree creation     :       3146.841       3146.841       3146.841          0.000
   Tree removal      :          0.000          0.000          0.000          0.000

With the three patches:

mpirun -np 24 --allow-run-as-root /work/tools/bin/mdtest -n 50000 -d /cache1/out/ -F -C -T -v -w 32k
SUMMARY rate: (of 1 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :       3822.440       3822.439       3822.440          0.000
   File stat         :     350620.140     350615.980     350617.193          1.051
   File read         :          0.000          0.000          0.000          0.000
   File removal      :          0.000          0.000          0.000          0.000
   Tree creation     :       2076.727       2076.727       2076.727          0.000
   Tree removal      :          0.000          0.000          0.000          0.000

Note 33K stats/second vs 350K stats/second.

ls -l time of the mdtest directory is also reduced from 23.5 seconds to
5.8 seconds.

Change-Id: I2cb4f94c0300897adb90cc89425e5cfb1c6fe7af
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: https://review.whamcloud.com/32157
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-11041 kernel: Enable tons of kernel debug options 93/32493/14
Minh Diep [Tue, 5 Feb 2019 20:13:27 +0000 (12:13 -0800)]
LU-11041 kernel: Enable tons of kernel debug options

Enable extra debugging options in rhel7 kernels
Create new lbuild option to build with the file config file

Test-Parameters: trivial

Change-Id: I29f503dcc97ff79e27539667e3f1d0edb33c23f4
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/32493
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-12345 ldiskfs: optimize nodelalloc mode 82/34982/2
Artem Blagodarenko [Tue, 28 May 2019 16:51:21 +0000 (19:51 +0300)]
LU-12345 ldiskfs: optimize nodelalloc mode

We found performance regression when using bigalloc with "nodelalloc"
(1MB cluster size):

1. mke2fs -C 1048576 -O ^has_journal,bigalloc /dev/sda
2. mount -o nodelalloc /dev/sda /test/
3. time dd if=/dev/zero of=/test/io bs=1048576 count=1024

The "dd" takes about 2 seconds to finish, but if we run mke2fs without
"bigalloc", "dd" takes less than 1 second.

The reason is: when using ext4 with "nodelalloc", it will call
ext4_find_delalloc_cluster() nearly every time it calls
ext4_ext_map_blocks(), and ext4_find_delalloc_range() will also scan
all pages in the cluster because no buffer is "delayed".  A cluster has
256 pages (1MB cluster), so it will scan 256 * 256k pages when creating
a 1G file. That severely hurts the performance.
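
The "256 * 256k" figure can be reproduced with the arithmetic below. The 4096-byte page size is an assumption (it is what makes a 1MB cluster hold 256 pages), and the variable names are illustrative only.

```python
PAGE_SIZE = 4096          # bytes; assumed, consistent with 256 pages per 1MB
CLUSTER_SIZE = 1 << 20    # 1MB bigalloc cluster
FILE_SIZE = 1 << 30       # 1G file written by dd

pages_per_cluster = CLUSTER_SIZE // PAGE_SIZE  # 256
pages_in_file = FILE_SIZE // PAGE_SIZE         # 262144, i.e. 256k

# Each page mapping scans every page of its cluster.
page_checks = pages_per_cluster * pages_in_file

print(pages_per_cluster, pages_in_file, page_checks)
```

That is about 67 million page lookups for a single 1G write, none of which can ever find a delayed buffer in nodelalloc mode.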

Therefore, we return immediately from ext4_find_delalloc_range() in
nodelalloc mode, since by definition there can't be any delalloc
pages.

The same optimization is also added to the ldiskfs_find_delayed_extent()
function, which improves performance dramatically.

Here are the results of testing on a two-node system.
Without the patch:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00   56.30    0.06    0.00   43.63

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sds               0.00     0.00    0.00 1174.00     0.00     4.59     8.00     0.84    0.71    0.00    0.71   0.01   1.20

With patch:
08/29/2018 01:13:22 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    4.13   30.37    0.00   65.50

Device:         rrqm/s   wrqm/s     r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sds               0.00     0.00    0.00 54117.82     0.00   211.43     8.00   152.59    2.82    0.00    2.82   0.02  99.01

Cray-bug-id: LUS-5835
Signed-off-by: Artem Blagodarenko <c17828@cray.com>
Change-Id: Ie33410d4481778ee4f76a054ab8cfc11cc19a0ed
Reviewed-on: https://review.whamcloud.com/34982
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-10467 ptlrpc: discard SVC_SIGNAL and related functions 56/34956/2
NeilBrown [Sat, 25 May 2019 15:19:30 +0000 (11:19 -0400)]
LU-10467 ptlrpc: discard SVC_SIGNAL and related functions

This flag is never set, so remove checks and remove
the flag.

Linux-commit: 7f76eb1a6bb7587cbfee410df914bc83f717a362

Change-Id: I4f0c082392b4c140c85da2dcc149a682b2f37fea
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/34956
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-12323 libcfs: check if save_stack_trace_tsk is exported 37/34937/9
Chris Horn [Wed, 22 May 2019 16:21:14 +0000 (11:21 -0500)]
LU-12323 libcfs: check if save_stack_trace_tsk is exported

Lustre 2.12 commit afedf9343686504c89f2e28cf6133540166f2347 introduced
the use of save_stack_trace_tsk, but this symbol is not exported for
all architectures. When it's possible we can use save_stack_trace
instead. Otherwise skip printing stack trace.

Cray-bug-id: LUS-7352
Test-Parameters: clientarch=aarch64
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I142b542f5c5672abbad461a621aedd1e49db1bdd
Reviewed-on: https://review.whamcloud.com/34937
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-6142 ldlm: Fix style issues for ldlm_plain.c 91/34491/2
Arshad Hussain [Wed, 20 Mar 2019 21:19:42 +0000 (02:49 +0530)]
LU-6142 ldlm: Fix style issues for ldlm_plain.c

This patch fixes issues reported by checkpatch
for file lustre/ldlm/ldlm_plain.c

Test-Parameters: trivial
Change-Id: I2f614a62d7ca1b350766f7981991218d76b26e27
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/34491
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
14 months agoLU-11089 obd: remove lock from key register/degister 68/33668/7
NeilBrown [Tue, 21 May 2019 15:07:47 +0000 (11:07 -0400)]
LU-11089 obd: remove lock from key register/degister

lu_context_key_register() doesn't really need locking.
It can use cmpxchg() to atomically install a key, and
check the result to see if it succeeded.
This requires the key to be completely ready before we
try to install it, so lct_used and lct_reference are
set up first, then reverted on failure.

With this done, lu_context_key_degister() no longer
needs locking. It just needs to set the slot to NULL.
This is done with suitable memory barriers so that the
slot cannot be reused until we are completely finished
with it.

Note that I added a warning if the slot holds NULL.
The code currently tests for that case, but I don't think it
can really happen.

Linux-commit: f4f30a8fc2c9568b87920e89fe4230530c26148f

Change-Id: I8e81c4694e8df2a2805e0b3104a83aa490c536ec
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/33668
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-11089 obd: use wait_event_var() in lu_context_key_degister() 67/33667/13
NeilBrown [Tue, 21 May 2019 14:22:39 +0000 (10:22 -0400)]
LU-11089 obd: use wait_event_var() in lu_context_key_degister()

lu_context_key_degister() has an open coded loop which calls
schedule() without setting a new task state.  This is generally
a bad idea - it could easily just spin.

Instead, use wait_event_var() to wait for ->lct_used to be zero,
and arrange to get a wakeup when that happens.
Previously ->lct_used would only fall down to 1.  Now we decrement
it an extra time so that wake_up, which only happens when the
count reaches zero, will only happen when lu_context_key_degister()
is actually waiting for it.

Note that this patch removes key_fini() from protection by
lu_keys_guard.  key_fini() calls are not always protected
by a lock, and there seems to be no need here.  Nothing else
can be acting on the given key in that context at this point,
so no race is possible.

Linux-commit: ef84c07364211bb4e398a9de45d1c13a32059cee

Change-Id: I9514bd21916f75fce00e393612967fb197e3a1c4
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/33667
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-12350 tests: Do not use background failover 85/34985/2
Patrick Farrell [Tue, 28 May 2019 21:02:49 +0000 (17:02 -0400)]
LU-12350 tests: Do not use background failover

For some reason, test 33 chooses at one point to take an
OST offline by starting failover in the background. It
seems to assume the OST will be offline during the
subsequent read, without doing anything to verify it is
offline - In fact, it could either be not offline yet or
back online with failover complete.

Just use stop like the rest of the test does.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I9c074ff1412793b8f0d8f15dc1e2ee21bb6d9fd6
Reviewed-on: https://review.whamcloud.com/34985
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-12342 spec: mark lsvcgss as a config file in the rpm 78/34978/3
Götz Waschk [Tue, 28 May 2019 06:48:02 +0000 (08:48 +0200)]
LU-12342 spec: mark lsvcgss as a config file in the rpm

The file /etc/sysconfig/lsvcgss shouldn't be overwritten on package
upgrades.

Signed-off-by: Götz Waschk <goetz.waschk@desy.de>
Change-Id: I3fa0a3a5a06d9e59699d23e652329365f38fd028
Reviewed-on: https://review.whamcloud.com/34978
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-12333 ptlrpc: Add more flags to DEBUG_REQ_FLAGS macro 49/34949/3
Chris Horn [Thu, 23 May 2019 19:42:53 +0000 (14:42 -0500)]
LU-12333 ptlrpc: Add more flags to DEBUG_REQ_FLAGS macro

The rq_req_unlinked, rq_reply_unlinked and rq_receiving_reply flags
determine whether a PtlRPC request can transition out of
RQ_PHASE_UNREG_RPC. Add these flags to the DEBUG_REQ_FLAGS macro to
aid in debugging issues where requests are stuck in this unregistering
state.
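
As a rough illustration of why these three flags matter, the condition that keeps a request in the unregistering state can be sketched as below. This is a minimal sketch, not the ptlrpc code: struct example_req and example_still_unregistering() are hypothetical names, and the exact combination of the flags is an assumption based on the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the three request flags named above. */
struct example_req {
	unsigned int rq_req_unlinked:1;
	unsigned int rq_reply_unlinked:1;
	unsigned int rq_receiving_reply:1;
};

/*
 * Assumed rule: the request may leave the unregistering phase only once
 * both buffers are unlinked and no reply is being received.
 */
static bool example_still_unregistering(const struct example_req *req)
{
	return req->rq_receiving_reply ||
	       !req->rq_req_unlinked ||
	       !req->rq_reply_unlinked;
}
```

Printing all three flags next to the phase makes it visible which of them is holding a stuck request back.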

Test-Parameters: trivial
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I0b4f424cba70a29c64035ebaccf33fdb954a2db6
Reviewed-on: https://review.whamcloud.com/34949
Reviewed-by: Ann Koehler <amk@cray.com>
Tested-by: Jenkins
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-12302 lnet: Fix NI status in proc for loopback ni 71/34871/2
Chris Horn [Wed, 15 May 2019 19:07:20 +0000 (14:07 -0500)]
LU-12302 lnet: Fix NI status in proc for loopback ni

The loopback NI is never really "down", but since its associated
ns_status is used for other purposes that's how it is reported in
proc_lnet_nis(). There's an existing check for lolnd so just hardcode
the status as "up" there.

Test-Parameters: trivial
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: If3f29dbc08c14aa187b00d680d0045a7dbb7f2d8
Reviewed-on: https://review.whamcloud.com/34871
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-11946 build: no yaml check during configure --enable-dist 12/34812/2
Olaf Faaland [Mon, 6 May 2019 18:38:37 +0000 (11:38 -0700)]
LU-11946 build: no yaml check during configure --enable-dist

If the yaml libraries are not found, the error is fatal, and prevents
the sources from being packaged.

This check is unnecessary when sources are being packaged, so this patch
disables the test when configure is run with --enable-dist.

Change-Id: I160e0d54efc59480d2f830607467dbc9f34c9de3
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-on: https://review.whamcloud.com/34812
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-11946 build: no zlib check during configure --enable-dist 11/34811/5
Olaf Faaland [Mon, 6 May 2019 18:31:21 +0000 (11:31 -0700)]
LU-11946 build: no zlib check during configure --enable-dist

If the zlib libraries are not found, the error is fatal, and prevents
the sources from being packaged.

This check is unnecessary when sources are being packaged, so this patch
disables the test when configure is run with --enable-dist.

Change-Id: Ie262b17b63c0edc8e8bfbd0c1a466ec37d05622c
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-on: https://review.whamcloud.com/34811
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-10894 dom: mdc_lock_flush() improvement 38/34738/2
Mikhail Pershin [Mon, 22 Apr 2019 17:31:12 +0000 (20:31 +0300)]
LU-10894 dom: mdc_lock_flush() improvement

There is a small improvement in osc_lock_flush(): it does not try to
match other locks for a write lock, because there are none.

Do the same in mdc_lock_flush().

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ie18ef63b2f969f762f0263f8b93aea726f89305f
Reviewed-on: https://review.whamcloud.com/34738
Tested-by: Jenkins
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-10894 dom: per-resource ELC for WRITE lock enqueue 36/34736/2
Mikhail Pershin [Mon, 22 Apr 2019 13:05:00 +0000 (16:05 +0300)]
LU-10894 dom: per-resource ELC for WRITE lock enqueue

Improve client write lock enqueue by doing ELC for any
read lock on the same resource. This helps with read/write
access, e.g. compilebench shows ~10% better results with
about 45% fewer ldlm cancel RPCs.

In mdc_enqueue_send(), collect the resource's unused read locks
and pack them into the enqueue request.

ldlm_cancel_resource_local() is also changed so that it does not
skip a DOM lock if it is set explicitly in the policy.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I06ece95d837495e2e970ce670db61ba0aa4e1ab4
Reviewed-on: https://review.whamcloud.com/34736
Tested-by: Jenkins
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-11872 utils: don't follow link files in default 11/34111/5
Wang Shilong [Fri, 25 Jan 2019 08:54:50 +0000 (16:54 +0800)]
LU-11872 utils: don't follow link files in default

Operations on link files themselves are not supported for now.
As a first step, skip link files by default; otherwise, this
causes unexpected behavior.

Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Change-Id: Ib0069ed1982e26984c6cf093f0803bf4a2208fe1
Reviewed-on: https://review.whamcloud.com/34111
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
14 months agoLU-11526 rpc: support maximum 64MB I/O RPC 42/34042/11
Qian Yingjin [Wed, 16 Jan 2019 02:13:20 +0000 (10:13 +0800)]
LU-11526 rpc: support maximum 64MB I/O RPC

On newer systems, some block drivers allow max_hw_sector_kb to
be up to 65536KB (64MB) to the underlying storage. To maximize
driver efficiency, Lustre should also bump up the maximum I/O
RPC size to 64MB.
Clamp max_read_ahead_whole_mb so that it does not exceed
max_read_ahead_per_file_mb.
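
The size arithmetic and the clamp described above can be sketched as follows. This is only an illustration: kb_to_mb() and clamp_whole_mb() are hypothetical helpers, not functions from the Lustre tree.

```c
#include <assert.h>
#include <stdint.h>

#define KB_PER_MB 1024u

/* 65536 KB expressed in MB: 65536 / 1024 == 64. */
static uint32_t kb_to_mb(uint32_t kb)
{
	return kb / KB_PER_MB;
}

/*
 * Hypothetical clamp: the whole-file read-ahead limit is never allowed
 * to exceed the per-file read-ahead limit.
 */
static uint32_t clamp_whole_mb(uint32_t whole_mb, uint32_t per_file_mb)
{
	return whole_mb > per_file_mb ? per_file_mb : whole_mb;
}
```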

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Icbf78742f8210d82dc310af7d05b7c32b93af34f
Reviewed-on: https://review.whamcloud.com/34042
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-11893 o2iblnd: add secondary IP address handling 76/34476/7
James Simmons [Sat, 18 May 2019 22:35:54 +0000 (18:35 -0400)]
LU-11893 o2iblnd: add secondary IP address handling

Using dev_get_by_name() in kiblnd_create_dev() means we can only
discover primary IP addresses. This breaks using network
aliasing which some people use. Move away from dev_get_by_name()
to using for_ifa() so we can detect any secondary IP addresses.

Change-Id: I03f2f8d18118b716a5eb5fb87694000ac06fe242
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/34476
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Neil Brown <neilb@suse.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-9846 lod: Add overstriping support 25/28425/43
Patrick Farrell [Wed, 29 May 2019 14:42:59 +0000 (10:42 -0400)]
LU-9846 lod: Add overstriping support

Each stripe in a shared file in Lustre corresponds to a
single LDLM extent locking domain and also to a single
object on disk (and in the OSS page cache).  LDLM locks are
extent locks, but there are still significant issues with
false sharing with multiple writers.  On-disk file systems
also have per-object performance limitations for both read
and write.

The LDLM limitation means it is best to have a single
writer per stripe, but modern OSTs can be faster than a
single client, so this restricts maximum performance unless
special methods are used (eg, Lustre lock ahead).

The on disk file system limitations mean that even if LDLM
locking is not an issue (read and not write, or lockahead),
OST performance in a shared file is still limited by having
only one object per OST.

These limitations make it impossible to get the full
performance of a modern Lustre FS with a single shared
file.

This patch makes it possible to have >1 stripe on a given
OST in each layout component.  This is known as
overstriping.  It works exactly like a normally striped
file, and is largely transparent to users.

By raising the object count per OST, this avoids the single
object limits, and by creating more stripes, also avoids
the "single effective writer per stripe" LDLM limitation.

However, it is only desirable in some situations, so users
must request it with a special setstripe command:

lfs setstripe -C [count] [file]

Users can also access overstriping using the standard '-o'
option to manually select OSTs:

lfs setstripe -o [ost_indices] [file]

Overstriping also makes it easy to test layout size limits, so we add a
test for that.
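
To make the effect on object counts concrete, here is a minimal sketch that places more stripes than there are OSTs. It assumes plain round-robin placement purely for illustration; place_stripes() is a hypothetical helper, and the real allocator may choose OSTs differently.

```c
#include <assert.h>

/*
 * Illustration only: place stripe_count stripes on ost_count OSTs in
 * round-robin order and record how many objects each OST receives.
 */
static void place_stripes(unsigned int stripe_count, unsigned int ost_count,
			  unsigned int *objs_per_ost)
{
	unsigned int i;

	for (i = 0; i < ost_count; i++)
		objs_per_ost[i] = 0;

	/* With stripe_count > ost_count, some OSTs get more than one object. */
	for (i = 0; i < stripe_count && ost_count > 0; i++)
		objs_per_ost[i % ost_count]++;
}
```

With 10 stripes on 4 OSTs, every OST ends up holding at least two objects, which is the situation overstriping is meant to allow.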

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I14bb94b05642b3542a965e84fda4615b997a4dea
Reviewed-on: https://review.whamcloud.com/28425
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-11359 mdt: fix mdt_dom_discard_data() timeouts 71/34071/21
Mikhail Pershin [Wed, 31 Oct 2018 13:28:29 +0000 (16:28 +0300)]
LU-11359 mdt: fix mdt_dom_discard_data() timeouts

The mdt_dom_discard_data() issues new lock to cause data
discard for all conflicting client locks. This was done in
context of unlink RPC processing and may cause it to be stuck
waiting for client to cancel their locks leading to cascading
timeouts for any other locks waiting on the same resource and
parent directory.

The patch skips waiting for the discard lock in the current context
by using its own CP callback, which doesn't wait for blocking
locks. They will be finished later by LDLM and cleaned up in
that completion callback. So the current thread just makes sure
discard locks are taken and BL ASTs are sent but doesn't wait
for lock granting, and that fixes the original problem.

At the same time that opens window for race with data being
flushed on client, so it is possible that new IO from client
will happen on just unlinked object causing error message and
it is not possible to distinguish that case from other
possibly critical situations. To solve that, the unlinked object
is pinned in memory until the discard lock is granted.
Therefore, such objects can be easily distinguished as stale ones
and any IO against them can be just silently ignored.

Older clients are not fully compatible with async DoM discard, so
the patch also adds a new connection flag ASYNC_DISCARD to distinguish
old clients and use the old blocking discard for them.
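
The compatibility decision described above can be sketched as a simple flag test. This is a hedged illustration: EXAMPLE_CONNECT_ASYNC_DISCARD and its bit position, the enum, and pick_discard_mode() are all hypothetical and are not taken from the Lustre headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit position, chosen for this example only. */
#define EXAMPLE_CONNECT_ASYNC_DISCARD (1ULL << 0)

enum example_discard_mode {
	EXAMPLE_DISCARD_BLOCKING,	/* old behaviour: wait for the lock */
	EXAMPLE_DISCARD_ASYNC,		/* new behaviour: do not wait */
};

/* Clients that did not negotiate the flag keep the blocking discard. */
static enum example_discard_mode pick_discard_mode(uint64_t client_flags)
{
	if (client_flags & EXAMPLE_CONNECT_ASYNC_DISCARD)
		return EXAMPLE_DISCARD_ASYNC;

	return EXAMPLE_DISCARD_BLOCKING;
}
```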

Test-Parameters: testlist=racer,racer,racer
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I419677af43c33e365a246fe12205b506209deace
Reviewed-on: https://review.whamcloud.com/34071
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-12034 obdclass: put all service's env on the list 66/34566/17
Alex Zhuravlev [Wed, 3 Apr 2019 08:29:06 +0000 (11:29 +0300)]
LU-12034 obdclass: put all service's env on the list

to be able to lookup by current thread where it's too
complicated to pass env by argument.

this version has stats to see slow/fast lookups. so, in sanity-benchmark
there were 172850 fast lookups (from per-cpu cache) and 27228 slow lookups
(from rhashtable). going to see the ratio in autotest's reports.
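
The fast/slow split counted above can be pictured with a small two-level lookup. This is a sketch under stated assumptions: a single cached pointer stands in for the per-cpu cache, a linear array scan stands in for the rhashtable, and every name (env_index, env_find, and so on) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NR_ENVS 4

struct env_entry {
	long thread_id;
	int env;
};

struct env_index {
	struct env_entry table[NR_ENVS];	/* stand-in for the rhashtable */
	struct env_entry *cache;		/* stand-in for the per-cpu cache */
	unsigned long fast;			/* lookups served from the cache */
	unsigned long slow;			/* lookups that went to the table */
};

static struct env_entry *env_find(struct env_index *idx, long thread_id)
{
	size_t i;

	/* Fast path: the last entry found is still the one we want. */
	if (idx->cache && idx->cache->thread_id == thread_id) {
		idx->fast++;
		return idx->cache;
	}

	/* Slow path: search the table and remember a hit for next time. */
	idx->slow++;
	for (i = 0; i < NR_ENVS; i++) {
		if (idx->table[i].thread_id == thread_id) {
			idx->cache = &idx->table[i];
			return idx->cache;
		}
	}
	return NULL;
}
```

Comparing the two counters after a run gives the same kind of fast/slow ratio that the stats in this patch report.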

Fixes: 2339e1b3b690 ("LU-11483 ldlm ofd_lvbo_init() and mdt_lvbo_fill() create env")
Fixes: e02cb40761ff ("LU-11164 ldlm: pass env to lvbo methods")

Change-Id: Ia760e10fa5c68e7a18284e4726d215b330fc0eed
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34566
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-11233 tests: fix gcc8 build warnings 61/34661/6
Alex Zhuravlev [Wed, 22 May 2019 20:28:55 +0000 (13:28 -0700)]
LU-11233 tests: fix gcc8 build warnings

this patch covers Lustre tests

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I6345d603772fb32bbc4b38a758a3e97f0361d116
Reviewed-on: https://review.whamcloud.com/34661
Tested-by: Jenkins
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-12269 build: add value to definition of with_gss in spec 92/34892/4
Ben Menadue [Thu, 16 May 2019 23:52:40 +0000 (09:52 +1000)]
LU-12269 build: add value to definition of with_gss in spec

rpmbuild currently fails when gss_keyring is enabled (which
happens automatically if the right packages are installed).
This is due to an ill-formed %define in lustre.spec.in that
doesn't include the value to set the macro to.

This patch updates this line to set the value to 1.

Signed-off-by: Ben Menadue <ben.menadue@anu.edu.au>
Change-Id: I2f52b19795091702622eb3b4c110f09eb80654db
Reviewed-on: https://review.whamcloud.com/34892
Tested-by: Jenkins
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-12269 build: remove %{fullrelease} from Provides 83/34883/7
Ben Menadue [Thu, 16 May 2019 05:14:56 +0000 (15:14 +1000)]
LU-12269 build: remove %{fullrelease} from Provides

Commit 7532409 adds a version number to lustre-osd-mount
Provides lines in lustre.spec.in, but includes the
%{fullrelease} macro that was previously removed by
28c17d4. This causes an "unexpanded macro" warning when
building the RPM, and the result contains a bogus string
for that name, e.g.

    2.12.53_45_g43fc4db-%{fullrelease}

This patch simply removes the "-%{fullrelease}" suffix from
these lines in lustre.spec.in.

Test-Parameters: trivial
Signed-off-by: Ben Menadue <ben.menadue@anu.edu.au>
Change-Id: Ia13f339f57b89c02443ebc2d68f0aa3b0802319a
Reviewed-on: https://review.whamcloud.com/34883
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-12269 build: fix hardened builds in rpm spec file 82/34882/6
Ben Menadue [Thu, 16 May 2019 04:42:37 +0000 (14:42 +1000)]
LU-12269 build: fix hardened builds in rpm spec file

The hardened build configure on RHEL8 has a quoted string
with spaces in it, and this breaks the construction of
%eval_configure in lustre.spec.in - the quotes end up in
the wrong place.

Moreover, the hardened build flags are only for user-space
code, and break kernel code compilation on RHEL 8.0 (they
add -fPIE, which isn't valid for kernel code).

This patch stores the %build_cflags and %build_ldflags from
rpmbuild as environment variables before turning hardened
build off to allow the kernel code to build. These
environment variables are used in the lnet/utils and
lustre/utils Makefiles so that the user-space code there
gets the benefit of any system-specific RPM build flag
(such as hardened builds).

For RHEL7 on PPC64 we then also need to define the C macro
__SANE_USERSPACE_TYPES__ so that __s64 and __u64 are long
long instead of the default long - otherwise the build will
fail with a format string error on this platform because
Lustre uses %ll when printing/scanning __s64/__u64.
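
The format-string issue can be illustrated with a short user-space sketch. It assumes a Linux system with the kernel UAPI headers installed; format_u64() is a hypothetical helper, not Lustre code. The macro has to be defined before <linux/types.h> is first included to have any effect.

```c
/* Must come before the first inclusion of <linux/types.h>. */
#define __SANE_USERSPACE_TYPES__
#include <linux/types.h>
#include <assert.h>
#include <stdio.h>

/*
 * With __u64 defined as unsigned long long, "%llu" matches the argument
 * type, so the compiler's format checking raises no complaint.
 */
static int format_u64(char *buf, size_t len, __u64 val)
{
	return snprintf(buf, len, "%llu", val);
}
```

On platforms where __u64 is already unsigned long long the macro makes no difference, which is why the problem only showed up on PPC64.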

The environment variables (UTILS_CFLAGS and UTILS_LDFLAGS)
could also be used for a standalone, non-RPM build to pass
flags to the user-space code, with the usual CFLAGS and
LDFLAGS still used for kernel code.

Signed-off-by: Ben Menadue <ben.menadue@anu.edu.au>
Change-Id: I9b4ba830bf63838fd88ef1bae5dd10dff2109a1d
Reviewed-on: https://review.whamcloud.com/34882
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-12299 libcfs: fix panic for too large cpu partions 64/34864/3
Wang Shilong [Wed, 15 May 2019 01:52:37 +0000 (09:52 +0800)]
LU-12299 libcfs: fix panic for too large cpu partions

If the number of CPU partitions is larger than the number of online
CPUs, the following calculation yields 0:

num = num_online_cpus() / ncpt;

This triggers the following panic in cfs_cpt_choose_ncpus():

LASSERT(number > 0);

This configuration is not actually supported, so returning a
failure is better than panicking.

Also fix an invalid pointer access if we fail to init @cfs_cpt_table,
as it will be converted to ERR_PTR() if an error happens.
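
The truncating division and the "fail instead of assert" approach can be sketched in user space as follows. cpus_per_partition() is a hypothetical helper written for this example and is not the libcfs function.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: return how many CPUs each partition gets, or
 * -EINVAL when the request cannot be satisfied.
 */
static int cpus_per_partition(int online_cpus, int ncpt)
{
	if (online_cpus <= 0 || ncpt <= 0)
		return -EINVAL;

	/* Integer division truncates: 4 CPUs / 8 partitions == 0. */
	if (online_cpus / ncpt == 0)
		return -EINVAL;

	return online_cpus / ncpt;
}
```

A caller can then propagate the error code rather than hitting an assertion on a zero result.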

Change-Id: I49daadd8f0c7d22aa78d08248d8c085781740768
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/34864
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-12095 ptlrpc: ocd_connect_flags are wrong during reconnect 80/34480/5
Andriy Skulysh [Wed, 27 Feb 2019 17:37:24 +0000 (19:37 +0200)]
LU-12095 ptlrpc: ocd_connect_flags are wrong during reconnect

Import connect flags are reset to original ones during
reconnect, so a request can be created with unsupported
features.

Use a separate obd_connect_data to send the connect request.

Change-Id: I4cfc48bf7ef66c4f3832613e179030b0eb1d6fdf
Cray-bug-id: LUS-6397
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-on: https://review.whamcloud.com/34480
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-12267 tests: update filter in acl for SElinux case 18/34818/2
Sebastien Buisson [Tue, 7 May 2019 15:55:04 +0000 (00:55 +0900)]
LU-12267 tests: update filter in acl for SElinux case

With SElinux enforced on client, sanity.sh test_103a fails because
the "ls -l" command produces an extra '.' at the end to indicate
extra security attributes are set.

So update filter by removing this trailing '.' in the output.

Test-Parameters: trivial testlist=sanity envdefinitions=ONLY=103a
Test-Parameters: clientselinux testlist=sanity envdefinitions=ONLY=103a
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ie684a3fe02f0f2821c8059855165a0f9dd585b72
Reviewed-on: https://review.whamcloud.com/34818
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-11838: lnet: remove lnet_ipif_enumerate() 34/34234/3
NeilBrown [Wed, 20 Mar 2019 19:25:24 +0000 (15:25 -0400)]
LU-11838: lnet: remove lnet_ipif_enumerate()

Also remove lnet_ipif_query() and related functions.

There are no longer any users of these functions, so remove them.

Linux-commit: 6e659fcfab0cdd876a555a752acf9997f98acbcd

Change-Id: I8183e505e3dbe12ff71ddf38f5b18a945d8a4a6c
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/34234
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-11907 dne: allow access to striped dir with broken layout 50/34750/4
Lai Siyao [Sun, 14 Apr 2019 20:12:54 +0000 (04:12 +0800)]
LU-11907 dne: allow access to striped dir with broken layout

Sometimes the layout of striped directories may become broken:
* creation/unlink is only partially executed on some MDT.
* disk failure or a stopped MDS makes some stripes inaccessible.
* software bugs.

In this situation, the directory should still be accessible,
and in particular it should be possible to migrate it to other
active MDTs.

This patch adds this support on both server and client: don't
assume the stripe FID is sane, and when a stripe doesn't exist,
skip it.

Add OBD_FAIL_MDS_STRIPE_FID to simulate insane stripe FID, and
OBD_FAIL_MDS_STRIPE_CREATE to simulate stripe creation failure.

Add sanity 60h.
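
The "skip what is broken" rule can be sketched as a loop over the stripes. This is an illustration under assumptions: example_stripe, its fields, and count_usable_stripes() are hypothetical, and a zero sequence number is used here only as a stand-in for an insane FID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct example_stripe {
	unsigned long long fid_seq;	/* 0 stands for an insane FID here */
	bool exists;			/* false if the stripe object is missing */
};

/* Walk all stripes, skipping broken ones instead of failing the whole walk. */
static size_t count_usable_stripes(const struct example_stripe *stripes,
				   size_t count)
{
	size_t i, usable = 0;

	for (i = 0; i < count; i++) {
		if (stripes[i].fid_seq == 0 || !stripes[i].exists)
			continue;
		usable++;
	}
	return usable;
}
```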

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I8a05a0e0cef8b051a935b3fa3d3e26c0b6ef3b4a
Reviewed-on: https://review.whamcloud.com/34750
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-6142 ptlrpc: Fix style issues for llog_client.c 00/34900/4
Arshad Hussain [Thu, 9 May 2019 21:18:04 +0000 (02:48 +0530)]
LU-6142 ptlrpc: Fix style issues for llog_client.c

This patch fixes issues reported by checkpatch
for file lustre/ptlrpc/llog_client.c

Change-Id: I4a3ce0022b9086fc1885d447c9b876bef183f298
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/34900
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
14 months agoLU-6142 ptlrpc: Fix style issues for client.c 03/34803/6
Arshad Hussain [Tue, 30 Apr 2019 04:48:50 +0000 (10:18 +0530)]
LU-6142 ptlrpc: Fix style issues for client.c

This patch fixes issues reported by checkpatch
for file lustre/ptlrpc/client.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: I24c4412d8747292f71c28fc0e8fc48b1cea405b9
Reviewed-on: https://review.whamcloud.com/34803
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-6142 ptlrpc: Fix style issues for sec.c 97/34597/2
Arshad Hussain [Sat, 23 Mar 2019 01:21:04 +0000 (06:51 +0530)]
LU-6142 ptlrpc: Fix style issues for sec.c

This patch fixes issues reported by checkpatch
for file lustre/ptlrpc/sec.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: Icfbf23301b8f1b8d21df0e5122c121671997d5eb
Reviewed-on: https://review.whamcloud.com/34597
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
14 months agoLU-6142 ptlrpc: Fix style issues for sec_gc.c 51/34551/2
Arshad Hussain [Fri, 22 Mar 2019 13:01:52 +0000 (18:31 +0530)]
LU-6142 ptlrpc: Fix style issues for sec_gc.c

This patch fixes issues reported by checkpatch for
file lustre/ptlrpc/sec_gc.c

Change-Id: I19f9f86aba86417b31245da4246c2d6eeb0a3752
Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/34551
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
14 months agoLU-6142 ptlrpc: Fix style issues for sec_plain.c 50/34550/4
Arshad Hussain [Fri, 22 Mar 2019 11:44:31 +0000 (17:14 +0530)]
LU-6142 ptlrpc: Fix style issues for sec_plain.c

This patch fixes issues reported by checkpatch
for file lustre/ptlrpc/sec_plain.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Change-Id: I03220084b303d9d411665db9a6080f934115b67a
Reviewed-on: https://review.whamcloud.com/34550
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
14 months agoLU-6142 ldlm: Fix style issues for ldlm_resource.c 92/34492/2
Arshad Hussain [Wed, 20 Mar 2019 22:30:06 +0000 (04:00 +0530)]
LU-6142 ldlm: Fix style issues for ldlm_resource.c

This patch fixes issues reported by checkpatch
for file lustre/ldlm/ldlm_resource.c

Test-Parameters: trivial
Change-Id: I50cf6d303f284ea5d77f825eaba8f7bbdbf60568
Signed-off-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-on: https://review.whamcloud.com/34492
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Ben Evans <bevans@cray.com>