www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Implementing a cache

reply qznc <qznc web.de> writes:
I want to implement some caching for HTTP GET requests. Basically 
a map of URL to content. A cache needs some additional meta data 
(size, age, etc).

There seem to be two basic data structures available: Associative 
array (AA) or red black tree (RBT).

With AA cache eviction is inefficient. It requires to sort the 
keys of the AA with respect to a field in the value. Stack 
overflow says "use RBT instead" [0].

With RBT retrieving something is inefficient. To get an entry, we 
need to create an fake entry with the key and get a range of 
entries 'equal' to the fake one? Or can I exploit some "find on 
sorted range" algorithm somewhere?

Alternatively, any better idea to implement the cache? I guess 
there is no off-the-shelf/dub solution.

[0] 
https://stackoverflow.com/questions/10060625/how-to-sort-associative-arrays
Jul 02 2016
next sibling parent reply Lodovico Giaretta <lodovico giaretart.net> writes:
On Saturday, 2 July 2016 at 12:10:28 UTC, qznc wrote:
 I want to implement some caching for HTTP GET requests. 
 Basically a map of URL to content. A cache needs some 
 additional meta data (size, age, etc).

 There seem to be two basic data structures available: 
 Associative array (AA) or red black tree (RBT).

 With AA cache eviction is inefficient. It requires to sort the 
 keys of the AA with respect to a field in the value. Stack 
 overflow says "use RBT instead" [0].

 With RBT retrieving something is inefficient. To get an entry, 
 we need to create an fake entry with the key and get a range of 
 entries 'equal' to the fake one? Or can I exploit some "find on 
 sorted range" algorithm somewhere?

 Alternatively, any better idea to implement the cache? I guess 
 there is no off-the-shelf/dub solution.

 [0] 
 https://stackoverflow.com/questions/10060625/how-to-sort-associative-arrays
If I understand correctly, you want fast access based on the URL, but also fast access based on the age (to evict old entries)? If that's the case, you need some very complex structure (I have some ideas in mind based on a pair of indexes, but they require much effort). Instead, if you just need an ordered associative array, like C++'s std::map, it should be possible to create one copying much of the code from the std RedBlackTree (as a side note, having a TreeMap in std.container would be a Good Thing(tm)).
Jul 02 2016
parent reply qznc <qznc web.de> writes:
On Saturday, 2 July 2016 at 12:21:14 UTC, Lodovico Giaretta wrote:
 On Saturday, 2 July 2016 at 12:10:28 UTC, qznc wrote:
 Alternatively, any better idea to implement the cache? I guess 
 there is no off-the-shelf/dub solution.
For now, I settled for a sorted array of cache entries plus an AA to map urls to indices into this array. https://github.com/qznc/d-github/commit/46c013c650d2cf34911ad3e10308ee6a0b5e3690#diff-24abb1e4df591f2d28947bbde1a09220R79
Jul 03 2016
parent reply Lodovico Giaretta <lodovico giaretart.net> writes:
On Sunday, 3 July 2016 at 16:03:51 UTC, qznc wrote:
 On Saturday, 2 July 2016 at 12:21:14 UTC, Lodovico Giaretta 
 wrote:
 On Saturday, 2 July 2016 at 12:10:28 UTC, qznc wrote:
 Alternatively, any better idea to implement the cache? I 
 guess there is no off-the-shelf/dub solution.
For now, I settled for a sorted array of cache entries plus an AA to map urls to indices into this array. https://github.com/qznc/d-github/commit/46c013c650d2cf34911ad3e10308ee6a0b5e3690#diff-24abb1e4df591f2d28947bbde1a09220R79
To avoid the ~= operator and reallocations of the cache array, you could impose a max number of cache entries, preallocate the array and use it as a circular buffer. I think this way should be faster (if imposing a max number of cache entries is feasible, I don't know your use cases).
Jul 03 2016
parent Jack Stouffer <jack jackstouffer.com> writes:
On Sunday, 3 July 2016 at 17:15:32 UTC, Lodovico Giaretta wrote:
 To avoid the ~= operator and reallocations of the cache array, 
 you could impose a max number of cache entries, preallocate the 
 array and use it as a circular buffer.
Here's a simple ring buffer I use, feel free to take it struct RingBuffer (T, uint buffer_length = 10) { private T[buffer_length] buffer; private int lastIndex; safe nogc nothrow pure { this(const T initial_value) { this.lastIndex = 0; reset(initial_value); } void reset(const T value) { for (int i = 0; i < this.buffer.length; ++i) { this.buffer[i] = value; } } void put(const T value) { this.buffer[this.lastIndex] = value; // store the new sample // advance the index and wrap it around this.lastIndex += 1; if (this.lastIndex >= buffer_length) { this.lastIndex = 0; } } auto ref opIndex(size_t n) { return buffer[n]; } auto opSlice() { return buffer[]; } auto opAssign(T[] rhs) in { assert(rhs.length <= buffer.length); } body { reset(); foreach (e; rhs) { put(e); } return this; } } }
Jul 03 2016
prev sibling next sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
On Saturday, 2 July 2016 at 12:10:28 UTC, qznc wrote:
 With AA cache eviction is inefficient. It requires to sort the 
 keys of the AA with respect to a field in the value. Stack
No, you just need the hash of the key in the object for many types of hash-tables. There are many solutions, but here are some for Least Recently Used (LRU) schemes: 1. The simplest efficient solution is to use a hash + an embedded double-linked list and an intrusive reference counter. When the refcount > 0 you unlink the object. When the refcount == 0 you put the object at the tail of the list. When you need memory you deallocate the head of the list. 2. Use a hash + partially time-sorted queue for objects that are not in use, and do a partial quick-sort to find LRU lazily (when you need to evict). Not using a hash for lookup seems silly, unless the ID can be efficiently used for a trie like search tree. This might work out ok for URLs.
Jul 03 2016
prev sibling parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 7/2/16 8:10 AM, qznc wrote:
 I want to implement some caching for HTTP GET requests. Basically a map
 of URL to content. A cache needs some additional meta data (size, age,
 etc).

 There seem to be two basic data structures available: Associative array
 (AA) or red black tree (RBT).

 With AA cache eviction is inefficient. It requires to sort the keys of
 the AA with respect to a field in the value. Stack overflow says "use
 RBT instead" [0].

 With RBT retrieving something is inefficient. To get an entry, we need
 to create an fake entry with the key> and get a range of entries 'equal'
 to the fake one?
Yes, it is annoying to have to deal with creating a fake element. In dcollections, I have a TreeMap wrapper which does this for you: https://github.com/schveiguy/dcollections/blob/master/dcollections/TreeMap.d#L881 We should probably add support for the searching functions that allows parameters other than full elements if the element supports comparison with that type. In terms of using equalRange, it should be pretty efficient. If you are allowing duplicates, then it does 2 binary searches (one for the beginning node and one for the end). If no duplicates, one binary search, and the equalRange will be 0 or 1 element (depending on if the key is present).
 Or can I exploit some "find on sorted range" algorithm
 somewhere?
There is find on sorted range support in std.algorithm, but it must be a random-access range. RBTree range is not RA. What's wrong with the binary search support in the tree itself? -Steve
Jul 05 2016