Bitcoin Core 28.99.0
P2P Digital Currency
|
cache implements a cache with properties similar to a cuckoo-set. More...
#include <cuckoocache.h>
Public Member Functions | |
cache () | |
You must always construct a cache with some elements via a subsequent call to setup or setup_bytes, otherwise operations may segfault. More... | |
uint32_t | setup (uint32_t new_size) |
setup initializes the container to store no more than new_size elements and no less than 2 elements. More... | |
std::pair< uint32_t, size_t > | setup_bytes (size_t bytes) |
setup_bytes is a convenience function which accounts for internal memory usage when deciding how many elements to store. More... | |
void | insert (Element e) |
insert loops at most depth_limit times trying to insert a hash at various locations in the table via a variant of the Cuckoo Algorithm with eight hash locations. More... | |
bool | contains (const Element &e, const bool erase) const |
contains iterates through the hash locations for a given element and checks to see if it is present. More... | |
Private Member Functions | |
std::array< uint32_t, 8 > | compute_hashes (const Element &e) const |
compute_hashes is convenience for not having to write out this expression everywhere we use the hash values of an Element. More... | |
constexpr uint32_t | invalid () const |
invalid returns a special index that can never be inserted to More... | |
void | allow_erase (uint32_t n) const |
allow_erase marks the element at index n as discardable. More... | |
void | please_keep (uint32_t n) const |
please_keep marks the element at index n as an entry that should be kept. More... | |
void | epoch_check () |
epoch_check handles the changing of epochs for elements stored in the cache. More... | |
Private Attributes | |
std::vector< Element > | table |
table stores all the elements More... | |
uint32_t | size {0} |
size stores the total available slots in the hash table More... | |
bit_packed_atomic_flags | collection_flags |
The bit_packed_atomic_flags array is marked mutable because we want garbage collection to be allowed to occur from const methods. More... | |
std::vector< bool > | epoch_flags |
epoch_flags tracks how recently an element was inserted into the cache. More... | |
uint32_t | epoch_heuristic_counter {0} |
epoch_heuristic_counter is used to determine when an epoch might be aged & an expensive scan should be done. More... | |
uint32_t | epoch_size {0} |
epoch_size is set to be the number of elements supposed to be in a epoch. More... | |
uint8_t | depth_limit {0} |
depth_limit determines how many elements insert should try to replace. More... | |
const Hash | hash_function |
hash_function is a const instance of the hash function. More... | |
cache implements a cache with properties similar to a cuckoo-set.
The cache is able to hold up to (~(uint32_t)0) - 1
elements.
Read Operations:
erase=false
Read+Erase Operations:
erase=true
Erase Operations:
Write Operations:
Synchronization Free Operations:
User Must Guarantee:
Note on function names:
Element | should be a movable and copyable type |
Hash | should be a function/callable which takes a template parameter hash_select and an Element and extracts a hash from it. Should return high-entropy uint32_t hashes for Hash h; h<0>(e) ... h<7>(e) . |
Definition at line 161 of file cuckoocache.h.
|
inline |
You must always construct a cache with some elements via a subsequent call to setup or setup_bytes, otherwise operations may segfault.
Definition at line 324 of file cuckoocache.h.
|
inlineprivate |
allow_erase marks the element at index n
as discardable.
Threadsafe without any concurrent insert.
n | the index to allow erasure of |
Definition at line 263 of file cuckoocache.h.
|
inlineprivate |
compute_hashes is convenience for not having to write out this expression everywhere we use the hash values of an Element.
We need to map the 32-bit input hash onto a hash bucket in a range [0, size) in a manner which preserves as much of the hash's uniformity as possible. Ideally this would be done by bitmasking but the size is usually not a power of two.
The naive approach would be to use a mod – which isn't perfectly uniform but so long as the hash is much larger than size it is not that bad. Unfortunately, mod/division is fairly slow on ordinary microprocessors (e.g. 90-ish cycles on haswell, ARM doesn't even have an instruction for it.); when the divisor is a constant the compiler will do clever tricks to turn it into a multiply+add+shift, but size is a run-time value so the compiler can't do that here.
One option would be to implement the same trick the compiler uses and compute the constants for exact division based on the size, as described in "{N}-bit Unsigned Division via {N}-bit Multiply-Add" by Arch D. Robison in 2005. But that code is somewhat complicated and the result is still slower than an even simpler option: see the FastRange32 function in util/fastrange.h.
The resulting non-uniformity is also more equally distributed which would be advantageous for something like linear probing, though it shouldn't matter one way or the other for a cuckoo table.
The primary disadvantage of this approach is increased intermediate precision is required but for a 32-bit random number we only need the high 32 bits of a 32*32->64 multiply, which means the operation is reasonably fast even on a typical 32-bit processor.
e | The element whose hashes will be returned |
e
uniformly mapped onto the range [0, size) Definition at line 240 of file cuckoocache.h.
|
inline |
contains iterates through the hash locations for a given element and checks to see if it is present.
contains does not check garbage collected state (in other words, garbage is only collected when the space is needed), so:
executed on a single thread will always return true!
This is a great property for re-org performance for example.
contains returns a bool set true if the element was found.
e | the element to check |
erase | whether to attempt setting the garbage collect flag |
Definition at line 474 of file cuckoocache.h.
|
inlineprivate |
epoch_check handles the changing of epochs for elements stored in the cache.
epoch_check should be run before every insert.
First, epoch_check decrements and checks the cheap heuristic, and then does a more expensive scan if the cheap heuristic runs out. If the expensive scan succeeds, the epochs are aged and old elements are allow_erased. The cheap heuristic is reset to retrigger after the worst case growth of the current epoch's elements would exceed the epoch_size.
Definition at line 286 of file cuckoocache.h.
|
inline |
insert loops at most depth_limit times trying to insert a hash at various locations in the table via a variant of the Cuckoo Algorithm with eight hash locations.
It drops the last tried element if it runs out of depth before encountering an open slot.
Thus:
is not guaranteed to return true.
e | the element to insert |
Swap with the element at the location that was not the last one looked at. Example:
This prevents moving the element we just put in.
The swap is not a move – we must switch onto the evicted element for the next iteration.
Definition at line 397 of file cuckoocache.h.
|
inlineconstexprprivate |
invalid returns a special index that can never be inserted to
Definition at line 254 of file cuckoocache.h.
|
inlineprivate |
please_keep marks the element at index n
as an entry that should be kept.
Threadsafe without any concurrent insert.
n | the index to prioritize keeping |
Definition at line 272 of file cuckoocache.h.
|
inline |
setup initializes the container to store no more than new_size elements and no less than 2 elements.
setup should only be called once.
new_size | the desired number of elements to store |
Definition at line 336 of file cuckoocache.h.
|
inline |
setup_bytes is a convenience function which accounts for internal memory usage when deciding how many elements to store.
It isn't perfect because it doesn't account for any overhead (struct size, MallocUsage, collection and epoch flags). This was done to simplify selecting a power of two size. In the expected use case, an extra two bits per entry should be negligible compared to the size of the elements.
bytes | the approximate number of bytes to use for this data structure |
Definition at line 364 of file cuckoocache.h.
|
mutableprivate |
The bit_packed_atomic_flags array is marked mutable because we want garbage collection to be allowed to occur from const methods.
Definition at line 172 of file cuckoocache.h.
|
private |
depth_limit determines how many elements insert should try to replace.
Should be set to log2(n).
Definition at line 200 of file cuckoocache.h.
|
mutableprivate |
epoch_flags tracks how recently an element was inserted into the cache.
true denotes recent, false denotes not-recent. See insert() method for full semantics.
Definition at line 178 of file cuckoocache.h.
|
private |
epoch_heuristic_counter is used to determine when an epoch might be aged & an expensive scan should be done.
epoch_heuristic_counter is decremented on insert and reset to the new number of inserts which would cause the epoch to reach epoch_size when it reaches zero.
Definition at line 185 of file cuckoocache.h.
|
private |
epoch_size is set to be the number of elements supposed to be in a epoch.
When the number of non-erased elements in an epoch exceeds epoch_size, a new epoch should be started and all current entries demoted. epoch_size is set to be 45% of size because we want to keep load around 90%, and we support 3 epochs at once – one "dead" which has been erased, one "dying" which has been marked to be erased next, and one "living" which new inserts add to.
Definition at line 195 of file cuckoocache.h.
|
private |
hash_function is a const instance of the hash function.
It cannot be static or initialized at call time as it may have internal state (such as a nonce).
Definition at line 206 of file cuckoocache.h.
|
private |
size stores the total available slots in the hash table
Definition at line 168 of file cuckoocache.h.
|
private |
table stores all the elements
Definition at line 165 of file cuckoocache.h.