LU-8130 lu_object: convert lu_object cache to rhashtable
The lu_object cache is a little more complex than the other Lustre
hash tables, for two reasons:
1/ there is a debugfs file which displays the contents of the cache,
so we need to use rhashtable_walk in a way that works for seq_file.
2/ There is a (sharded) lru list for objects which are no longer
referenced, so finding an object needs to consider races with the
lru as well as with the hash table.
The debugfs file already manages walking the libcfs hash table,
keeping a current position in the private data. We can fairly easily
convert that to a struct rhashtable_iter. The debugfs file actually
reports pages, and there are multiple pages per hashtable object, so
as well as the rhashtable_iter we need to track the current page
index.
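The two-level cursor described above can be sketched in user space. This
is a minimal analogue, not the patch's code: struct pgwalk stands in for
the seq_file private data, obj_idx for the rhashtable_iter position, and
all names here are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Each cached object owns several "pages", so a resumable walk must
 * remember both a position in the table and a page index within the
 * current object. */
struct pgwalk {
	size_t obj_idx;		/* stands in for the rhashtable_iter position */
	unsigned page_idx;	/* page within the current object */
};

struct cached_obj {
	unsigned npages;
};

/* Advance to the next (object, page) pair; returns false at the end.
 * On a true return, the current page is (obj_idx, page_idx - 1). */
static bool pgwalk_next(struct pgwalk *w,
			const struct cached_obj *objs, size_t nobjs)
{
	while (w->obj_idx < nobjs) {
		if (w->page_idx < objs[w->obj_idx].npages) {
			w->page_idx++;
			return true;
		}
		w->obj_idx++;		/* object exhausted, move on */
		w->page_idx = 0;
	}
	return false;
}
```

Because the cursor is just two integers, it survives across seq_file
invocations the same way the real iterator does.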
For the double-locking, the current code uses direct access to the
bucket locks that libcfs_hash provides. rhashtable doesn't provide
that access - callers must provide their own locking or use RCU
techniques.
The lsb_waitq.lock is still used to manage the lru list, but with
this patch it is no longer nested *inside* the hashtable locks, but
instead is outside. It is used to protect an object with a refcount
of zero.
When purging old objects from an lru, we first set
LU_OBJECT_HEARD_BANSHEE while holding the lsb_waitq.lock,
then remove all the entries from the hashtable separately.
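The two-phase purge can be illustrated with a user-space sketch. The
structures below are hypothetical simplifications: the pthread mutex
stands in for lsb_waitq.lock, and only the flag name matches the patch.

```c
#include <pthread.h>
#include <stddef.h>

#define LU_OBJECT_HEARD_BANSHEE 0x1

struct obj {
	int refcount;
	unsigned flags;
	struct obj *lru_next;
};

struct lru_bucket {
	pthread_mutex_t lock;	/* stands in for lsb_waitq.lock */
	struct obj *lru_head;	/* objects with refcount == 0 */
};

/* Phase 1: under the lru lock, mark every victim dead and detach the
 * list.  Phase 2 (hashtable removal) then happens outside this lock,
 * which is what allows the lru lock to nest outside the hash locks. */
static struct obj *lru_purge(struct lru_bucket *b)
{
	struct obj *victims;

	pthread_mutex_lock(&b->lock);
	victims = b->lru_head;
	for (struct obj *o = victims; o; o = o->lru_next)
		o->flags |= LU_OBJECT_HEARD_BANSHEE;
	b->lru_head = NULL;
	pthread_mutex_unlock(&b->lock);

	/* Caller now removes each victim from the hash table
	 * (rhashtable_remove_fast() in the real code) and frees it. */
	return victims;
}
```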
When removing the last reference from an object, we first take the
lsb_waitq.lock, then decrement the reference and add to the lru list
or discard it setting LU_OBJECT_UNHASHED.
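The put path can be sketched the same way: take the lock before the
final decrement, so a refcount of zero is only ever reached under it.
Again a user-space analogue with hypothetical simplified structures;
only the flag name mirrors the patch.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define LU_OBJECT_UNHASHED 0x2

struct obj {
	int refcount;
	unsigned flags;
	struct obj *lru_next;
};

struct lru_bucket {
	pthread_mutex_t lock;	/* stands in for lsb_waitq.lock */
	struct obj *lru_head;
};

/* Take the lru lock *before* decrementing, so the transition to a
 * refcount of zero, and everything done to a zero-refcount object,
 * happens under the lock. */
static void obj_put(struct lru_bucket *b, struct obj *o, bool cacheable)
{
	pthread_mutex_lock(&b->lock);
	if (--o->refcount == 0) {
		if (cacheable) {
			o->lru_next = b->lru_head;	/* park for reuse */
			b->lru_head = o;
		} else {
			o->flags |= LU_OBJECT_UNHASHED;	/* caller frees */
		}
	}
	pthread_mutex_unlock(&b->lock);
}
```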
When we find an object in the hashtable with a refcount of zero, we
take the corresponding lsb_waitq.lock and check that neither
LU_OBJECT_HEARD_BANSHEE nor LU_OBJECT_UNHASHED is set. If neither is
set, we can safely increment the refcount. If either is set, the
object is gone.
This way, we only ever manipulate an object with a refcount of zero
while holding the lsb_waitq.lock.
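The revival check can be sketched in the same user-space style, with
the same caveats: hypothetical simplified structures, a pthread mutex
standing in for lsb_waitq.lock, only the flag names taken from the
patch.

```c
#include <pthread.h>
#include <stdbool.h>

#define LU_OBJECT_HEARD_BANSHEE 0x1
#define LU_OBJECT_UNHASHED	0x2

struct obj {
	int refcount;
	unsigned flags;
};

struct lru_bucket {
	pthread_mutex_t lock;	/* stands in for lsb_waitq.lock */
};

/* Lookup found an object whose refcount is zero: it may only be
 * revived under the lru lock, and only if neither death flag is set. */
static bool obj_try_get(struct lru_bucket *b, struct obj *o)
{
	bool alive;

	pthread_mutex_lock(&b->lock);
	alive = !(o->flags &
		  (LU_OBJECT_HEARD_BANSHEE | LU_OBJECT_UNHASHED));
	if (alive)
		o->refcount++;	/* safely revive the cached object */
	pthread_mutex_unlock(&b->lock);
	return alive;		/* false: the object is going away */
}
```

Because the purge and put paths set their flags under the same lock,
this check cannot race with an object being torn down.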
As there is nothing to stop us from using the resizing capabilities
of rhashtable, the code that tried to guess the perfect hash size has
been removed.
Also: the "is_dying" variable in lu_object_put() is racy - the value
could change the moment it is sampled. It is also not needed, as it
is only used to avoid a wakeup, which is not particularly expensive.
In the same code, a comment says that 'top' must not be accessed, but
the code then immediately accesses 'top' to calculate 'bkt'. So move
the initialization of 'bkt' to before 'top' becomes unsafe.
Also: Change "wake_up_all()" to "wake_up()". wake_up_all() is only
relevant when an exclusive wait is used.
Moving from the libcfs hashtable to rhashtable also yields a very
large performance boost.
Before patch:
SUMMARY rate: (of 5 iterations)
   Operation                  Max        Min       Mean   Std Dev
   ---------                  ---        ---       ----   -------
   Directory creation:  12036.610  11091.880  11452.978   318.829
   Directory stat:      25871.734  24232.310  24935.661   574.996
   Directory removal:   12698.769  12239.685  12491.008   149.149
   File creation:       11722.036  11673.961  11692.157    15.966
   File stat:           62304.540  58237.124  60282.003  1479.103
   File read:           24204.811  23889.091  24048.577   110.245
   File removal:         9412.930   9111.828   9217.546   120.894
   Tree creation:        3515.536   3195.627   3442.609   123.792
   Tree removal:          433.917    418.935    428.038     5.545
After patch:
SUMMARY rate: (of 5 iterations)
   Operation                  Max        Min       Mean   Std Dev
   ---------                  ---        ---       ----   -------
   Directory creation:  11873.308    303.626   9371.860  4539.539
   Directory stat:      31116.512  30190.574  30568.091   335.545
   Directory removal:   13082.121  12645.228  12943.239   157.695
   File creation:       12607.135  12293.319  12466.647   138.307
   File stat:          124419.347 105240.996 116919.977  7847.165
   File read:           39707.270  36295.477  38266.011  1328.857
   File removal:         9614.333   9273.931   9477.299   140.201
   Tree creation:        3572.602   3017.580   3339.547   207.061
   Tree removal:          487.687      0.004    282.188   230.659
Change-Id: I618dc2e2da003c240a887126f600e7eac5df951c
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/36707
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>