digitalmars.D.learn - Implementing a cache

qznc (16/16) Jul 02 2016 I want to implement some caching for HTTP GET requests. Basically

Lodovico Giaretta (10/26) Jul 02 2016 If I understand correctly, you want fast access based on the URL,

qznc (4/7) Jul 03 2016 For now, I settled for a sorted array of cache entries plus an AA

Lodovico Giaretta (6/14) Jul 03 2016 To avoid the ~= operator and reallocations of the cache array,

Jack Stouffer (42/45) Jul 03 2016 Here's a simple ring buffer I use, feel free to take it

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (15/17) Jul 03 2016 No, you just need the hash of the key in the object for many
Steven Schveighoffer (16/29) Jul 05 2016 Yes, it is annoying to have to deal with creating a fake element. In

qznc <qznc web.de> writes:

I want to implement some caching for HTTP GET requests. Basically 
a map of URL to content. A cache needs some additional meta data 
(size, age, etc).

There seem to be two basic data structures available: Associative 
array (AA) or red black tree (RBT).

With AA cache eviction is inefficient. It requires to sort the 
keys of the AA with respect to a field in the value. Stack 
overflow says "use RBT instead" [0].

With RBT retrieving something is inefficient. To get an entry, we 
need to create an fake entry with the key and get a range of 
entries 'equal' to the fake one? Or can I exploit some "find on 
sorted range" algorithm somewhere?

Alternatively, any better idea to implement the cache? I guess 
there is no off-the-shelf/dub solution.

[0] 
https://stackoverflow.com/questions/10060625/how-to-sort-associative-arrays

Jul 02 2016

Lodovico Giaretta <lodovico giaretart.net> writes:

On Saturday, 2 July 2016 at 12:10:28 UTC, qznc wrote:
 I want to implement some caching for HTTP GET requests. 
 Basically a map of URL to content. A cache needs some 
 additional meta data (size, age, etc).

 There seem to be two basic data structures available: 
 Associative array (AA) or red black tree (RBT).

 With AA cache eviction is inefficient. It requires to sort the 
 keys of the AA with respect to a field in the value. Stack 
 overflow says "use RBT instead" [0].

 With RBT retrieving something is inefficient. To get an entry, 
 we need to create an fake entry with the key and get a range of 
 entries 'equal' to the fake one? Or can I exploit some "find on 
 sorted range" algorithm somewhere?

 Alternatively, any better idea to implement the cache? I guess 
 there is no off-the-shelf/dub solution.

 [0] 
 https://stackoverflow.com/questions/10060625/how-to-sort-associative-arrays

If I understand correctly, you want fast access based on the URL, 
but also fast access based on the age (to evict old entries)? If 
that's the case, you need some very complex structure (I have 
some ideas in mind based on a pair of indexes, but they require 
much effort).

Instead, if you just need an ordered associative array, like 
C++'s std::map, it should be possible to create one copying much 
of the code from the std RedBlackTree (as a side note, having a 
TreeMap in std.container would be a Good Thing(tm)).

Jul 02 2016

qznc <qznc web.de> writes:

On Saturday, 2 July 2016 at 12:21:14 UTC, Lodovico Giaretta wrote:
 On Saturday, 2 July 2016 at 12:10:28 UTC, qznc wrote:
 Alternatively, any better idea to implement the cache? I guess 
 there is no off-the-shelf/dub solution.


For now, I settled for a sorted array of cache entries plus an AA 
to map urls to indices into this array.

https://github.com/qznc/d-github/commit/46c013c650d2cf34911ad3e10308ee6a0b5e3690#diff-24abb1e4df591f2d28947bbde1a09220R79

Jul 03 2016

Lodovico Giaretta <lodovico giaretart.net> writes:

On Sunday, 3 July 2016 at 16:03:51 UTC, qznc wrote:
 On Saturday, 2 July 2016 at 12:21:14 UTC, Lodovico Giaretta 
 wrote:
 On Saturday, 2 July 2016 at 12:10:28 UTC, qznc wrote:
 Alternatively, any better idea to implement the cache? I 
 guess there is no off-the-shelf/dub solution.


 For now, I settled for a sorted array of cache entries plus an 
 AA to map urls to indices into this array.

 https://github.com/qznc/d-github/commit/46c013c650d2cf34911ad3e10308ee6a0b5e3690#diff-24abb1e4df591f2d28947bbde1a09220R79

To avoid the ~= operator and reallocations of the cache array, 
you could impose a max number of cache entries, preallocate the 
array and use it as a circular buffer. I think this way should be 
faster (if imposing a max number of cache entries is feasible, I 
don't know your use cases).

Jul 03 2016

Jack Stouffer <jack jackstouffer.com> writes:

On Sunday, 3 July 2016 at 17:15:32 UTC, Lodovico Giaretta wrote:
 To avoid the ~= operator and reallocations of the cache array, 
 you could impose a max number of cache entries, preallocate the 
 array and use it as a circular buffer.

Here's a simple ring buffer I use, feel free to take it

struct RingBuffer (T, uint buffer_length = 10) {
     private T[buffer_length] buffer;
     private int lastIndex;

      safe  nogc nothrow pure {
         this(const T initial_value) {
             this.lastIndex = 0;
             reset(initial_value);
         }

         void reset(const T value) {
             for (int i = 0; i < this.buffer.length; ++i) {
                 this.buffer[i] = value;
             }
         }

         void put(const T value) {
             this.buffer[this.lastIndex] = value; // store the new 
sample

             // advance the index and wrap it around
             this.lastIndex += 1;
             if (this.lastIndex >= buffer_length) {
                 this.lastIndex = 0;
             }
         }

         auto ref opIndex(size_t n) { return buffer[n]; }
         auto opSlice() { return buffer[]; }
         auto opAssign(T[] rhs)
         in
         {
             assert(rhs.length <= buffer.length);
         }
         body
         {
             reset();
             foreach (e; rhs)
             {
                 put(e);
             }
             return this;
          }
     }
}

Jul 03 2016

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:

On Saturday, 2 July 2016 at 12:10:28 UTC, qznc wrote:
 With AA cache eviction is inefficient. It requires to sort the 
 keys of the AA with respect to a field in the value. Stack

No, you just need the hash of the key in the object for many 
types of hash-tables. There are many solutions, but here are some 
for Least Recently Used (LRU) schemes:

1. The simplest efficient solution is to use a hash + an embedded 
double-linked list and an intrusive reference counter.

When the refcount > 0 you unlink the object.
When the refcount == 0 you put the object at the tail of the list.
When you need memory you deallocate the head of the list.

2. Use a hash + partially time-sorted queue for objects that are 
not in use, and do a partial quick-sort to find LRU lazily (when 
you need to evict).

Not using a hash for lookup seems silly, unless the ID can be 
efficiently used for a trie like search tree. This might work out 
ok for URLs.

Jul 03 2016

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 7/2/16 8:10 AM, qznc wrote:
 I want to implement some caching for HTTP GET requests. Basically a map
 of URL to content. A cache needs some additional meta data (size, age,
 etc).

 There seem to be two basic data structures available: Associative array
 (AA) or red black tree (RBT).

 With AA cache eviction is inefficient. It requires to sort the keys of
 the AA with respect to a field in the value. Stack overflow says "use
 RBT instead" [0].

 With RBT retrieving something is inefficient. To get an entry, we need
 to create an fake entry with the key> and get a range of entries 'equal'
 to the fake one?

Yes, it is annoying to have to deal with creating a fake element. In 
dcollections, I have a TreeMap wrapper which does this for you: 
https://github.com/schveiguy/dcollections/blob/master/dcollections/TreeMap.d#L881

We should probably add support for the searching functions that allows 
parameters other than full elements if the element supports comparison 
with that type.

In terms of using equalRange, it should be pretty efficient. If you are 
allowing duplicates, then it does 2 binary searches (one for the 
beginning node and one for the end). If no duplicates, one binary 
search, and the equalRange will be 0 or 1 element (depending on if the 
key is present).

 Or can I exploit some "find on sorted range" algorithm
 somewhere?

There is find on sorted range support in std.algorithm, but it must be a 
random-access range. RBTree range is not RA. What's wrong with the 
binary search support in the tree itself?

-Steve

Jul 05 2016

D Programming

C/C++ Programming

Other

digitalmars.D.learn - Implementing a cache