
digitalmars.D - RE: Are Gigantic Associative Arrays Now Possible?

dlangPupil <x x989898998.com> writes:
Hello to all!  As a (D, and coding and forum) newbie I'm learning 
about D's associative arrays (AAs), and their tiny latency 
regardless of AA size. Cool!
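
For concreteness, a minimal sketch of the basic AA operations I mean (nothing Optane-specific here, just the ordinary built-in behaviour):

import std.stdio : writeln;

void main()
{
    string[string] capitals;            // built-in AA mapping string -> string
    capitals["Norway"] = "Oslo";
    capitals["Japan"]  = "Tokyo";

    if (auto p = "Japan" in capitals)   // `in` returns a pointer, or null on a miss
        writeln("found: ", *p);

    capitals.remove("Norway");
    writeln(capitals.length);           // prints 1
}

Insertion, lookup and removal are all expected O(1) on average, independent of how many entries the table holds.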

One practical limitation on the maximum AA size is 
memory/disk paging.  But maybe this limit could be overcome with 
the latest SSDs, whose nonvolatile memory can be addressed like 
RAM.

The article below says that Intel Optane SSDs:
	-allow reads and writes on individual bytes.
	-have a latency 10x that of DRAM (but AAs' latency is so low that 
this might not matter in many cases).
	-currently offer 375GB of "RAM" for $1,500.
	-will support up to 3 TB on 2 socket Xeon systems (48TB on 
4-socket).
	-will be supplemented with Optane DIMMs in the future.

Some questions that arise are...

1) Wouldn't using such "RAM" eliminate any paging issue for 
super-gigantic AAs?
2) What other bottlenecks could arise for gigantic AAs, e.g., 
garbage collection?
3) Would an append-only data design mitigate GC or other 
bottlenecks?
4) Has anyone tried this out?

What a coup if D could "be the first" lang to make this 
practical.  Thanks.

https://arstechnica.com/information-technology/2017/03/intels-first-optane-ssd-375gb-that-you-can-also-use-as-ram/
Mar 22 2017
Laeeth Isharc <laeethnospam nospam.laeeth.com> writes:
On Wednesday, 22 March 2017 at 20:00:56 UTC, dlangPupil wrote:
 Hello to all!  As a (D, and coding and forum) newbie I'm 
 learning about D's associative arrays (AAs), and their tiny 
 latency regardless of AA size. Cool!

 One practical limitation on the maximum AA size is 
 memory/disk paging.  But maybe this limit could be overcome 
 with the latest SSDs, whose nonvolatile memory can be addressed 
 like RAM.

 The article below says that Intel Optane SSDs:
 	-allow reads and writes on individual bytes.
 	-have a latency 10x that of DRAM (but AAs' latency is so low that 
 this might not matter in many cases).
 	-currently offer 375GB of "RAM" for $1,500.
 	-will support up to 3 TB on 2 socket Xeon systems (48TB on 
 4-socket).
 	-will be supplemented with Optane DIMMs in the future.

 Some questions that arise are...

 1) Wouldn't using such "RAM" eliminate any paging issue for 
 super-gigantic AAs?
 2) What other bottlenecks could arise for gigantic AAs, e.g., 
 garbage collection?
 3) Would an append-only data design mitigate GC or other 
 bottlenecks?
 4) Has anyone tried this out?

 What a coup if D could "be the first" lang to make this 
 practical.  Thanks.

 https://arstechnica.com/information-technology/2017/03/intels-first-optane-ssd-375gb-that-you-can-also-use-as-ram/
Hi. I am very interested in this topic, although I don't really have answers for your questions. See the ACM article which I mentioned last year, I think, on the forum: https://queue.acm.org/detail.cfm?id=2874238

"For the entire careers of most practicing computer scientists, a fundamental observation has consistently held true: CPUs are significantly more performant and more expensive than I/O devices. The fact that CPUs can process data at extremely high rates, while simultaneously servicing multiple I/O devices, has had a sweeping impact on the design of both hardware and software for systems of all sizes, for pretty much as long as we've been building them.

This assumption, however, is in the process of being completely invalidated. The arrival of high-speed, non-volatile storage devices, typically referred to as Storage Class Memories (SCM), is likely the most significant architectural change that datacenter and software designers will face in the foreseeable future. SCMs are increasingly part of server systems, and they constitute a massive change: the cost of an SCM, at $3-5k, easily exceeds that of a many-core CPU ($1-2k), and the performance of an SCM (hundreds of thousands of I/O operations per second) is such that one or more entire many-core CPUs are required to saturate it. This change has profound effects:

1. The age-old assumption that I/O is slow and computation is fast is no longer true: this invalidates decades of design decisions that are deeply embedded in today's systems.

2. The relative performance of layers in systems has changed by a factor of a thousand times over a very short time: this requires rapid adaptation throughout the systems software stack.

3. Piles of existing enterprise datacenter infrastructure—hardware and software—are about to become useless (or, at least, very inefficient): SCMs require rethinking the compute/storage balance and architecture from the ground up."

It's a massive relative price shock - storage vs CPU - and whole structures around it will need to be utterly transformed. Intel say it's the biggest technological breakthrough since the internet.

I'm good at recognising moments of exhaustion in old trends and the beginning of new ones, and I've thought for some time now that people were complacently squandering CPU performance just as conditions were tending to favour it becoming important again: https://www.quora.com/Why-is-Python-so-popular-despite-being-so-slow/answer/Laeeth-Isharc?srid=35gE

I don't know how data structures and file systems should adapt to this. But I do think the prospective return on efficient code just went up a lot - as far as I can see, this shift is very good for D.
Mar 23 2017
Ola Fosheim Grøstad writes:
On Wednesday, 22 March 2017 at 20:00:56 UTC, dlangPupil wrote:
 1) Wouldn't using such "RAM" eliminate any paging issue for 
 super-gigantic AAs?
 2) What other bottlenecks could arise for gigantic AAs, e.g., 
 garbage collection?
Increasing the size of a hash table would be prohibitively expensive. You need a data-structure that can grow gracefully.
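
A rough way to see the growth cost with the built-in AA is to time a bulk insert; the runtime has to re-bucket the whole table each time it outgrows its bucket array (the size below is arbitrary):

import std.stdio : writefln;
import core.time : MonoTime;

void main()
{
    enum long n = 5_000_000;   // arbitrary; push this up to stress the growth path
    long[long] aa;

    auto start = MonoTime.currTime;
    foreach (long i; 0 .. n)
        aa[i] = i;             // growing the table periodically re-buckets all existing entries
    auto elapsed = MonoTime.currTime - start;

    writefln("inserted %s keys in %s", aa.length, elapsed);
}

At terabyte scale a single such re-bucketing pass would touch every entry, which is why a gracefully growing structure matters.
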
 What a coup if D could "be the first" lang to make this 
 practical.  Thanks.
This is more an OS issue than a language issue, although it could boost high-level numeric-oriented languages and functional languages that cache expensive computations. Low-level languages are pretty much in the same boat.
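
For what it's worth, that caching pattern already exists in Phobos: std.functional.memoize stores results in a lookup table (an AA internally, last I checked), so a huge nonvolatile "RAM" would mostly just let such caches grow much larger:

import std.functional : memoize;
import std.stdio : writeln;

ulong slowSquare(ulong n)
{
    // stand-in for an expensive computation
    return n * n;
}

void main()
{
    alias cachedSquare = memoize!slowSquare;

    writeln(cachedSquare(12));   // computed once, then cached
    writeln(cachedSquare(12));   // served from the memoization table
}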
Mar 23 2017
dlangPupil <x x989898998.com> writes:
On Thursday, 23 March 2017 at 10:27:36 UTC, Ola Fosheim Grøstad 
wrote:
 Increasing the size of a hash table would be prohibitively 
 expensive. You need a data-structure that can grow gracefully.
Hi Ola,

Are the hash tables you refer to the ones that D uses in the background to implement associative arrays, or the ones that a programmer might create using AAs?

--If the former, then perhaps the AA's hash function could be tweaked for terabyte-range SSD "RAM". But even if the typical 2:1 bucket/data storage ratio couldn't be improved, then creating 32 TB of buckets would still allow 16 TB of nearly-instantly-addressable data (on a 4-socket Xeon w/ 48 TB Optane SSD).

--If the latter, then my goal is to design something that makes specific items randomly and instantly accessible, like a hash table, but with zero potential collisions.

Thanks!
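
As far as I can tell, D doesn't expose a global knob for the AA's hash function, but a key type can carry its own toHash/opEquals, so per-type tweaking is possible. A sketch with a hypothetical 128-bit GUID key (the hash constant is arbitrary):

struct Guid
{
    ulong hi, lo;

    size_t toHash() const @safe pure nothrow
    {
        // fold the two halves together; the multiplier is an arbitrary odd constant
        return cast(size_t)(hi ^ (lo * 0x9E3779B97F4A7C15UL));
    }

    bool opEquals(const Guid rhs) const @safe pure nothrow
    {
        return hi == rhs.hi && lo == rhs.lo;
    }
}

void main()
{
    string[Guid] docs;
    docs[Guid(1, 42)] = "first document";
    assert(Guid(1, 42) in docs);
}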
Mar 23 2017
parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Fri, Mar 24, 2017 at 12:27:00AM +0000, dlangPupil via Digitalmars-d wrote:
 On Thursday, 23 March 2017 at 10:27:36 UTC, Ola Fosheim Grøstad wrote:
 
 Increasing the size of a hash table would be prohibitively
 expensive.  You need a data-structure that can grow gracefully.
 Hi Ola,

 Are the hash tables you refer to the ones that D uses in the background to implement associative arrays, or the ones that a programmer might create using AAs?

 --If the former, then perhaps the AA's hash function could be tweaked for terabyte-range SSD "RAM". But even if the typical 2:1 bucket/data storage ratio couldn't be improved, then creating 32 TB of buckets would still allow 16 TB of nearly-instantly-addressable data (on a 4-socket Xeon w/ 48 TB Optane SSD).

 --If the latter, then my goal is to design something that makes specific items randomly and instantly accessible, like a hash table, but with zero potential collisions.

 Thanks!
How do SSD access speeds compare with L1 and L2 cache speeds?

You have to keep in mind that one downside of hashtables is that they tend to be unfriendly towards caching hierarchies, because generally the hash function is designed to scatter hash keys as widely and as randomly as possible (to minimize collisions), meaning that potentially *every* hash lookup will be a cache miss, perhaps more, depending on how collision resolution is handled, which can be a big performance killer.

For certain applications, where key lookups are more-or-less random, this is the best you could do, but for a good number of applications, there tends to be some correlation between lookups, and even things like B-trees could potentially do better because of their cache-friendliness (and locality), even if you assume I/O is much faster than your traditional spindle-based hard drive. The O(log n) could have a much smaller constant than the hashtable O(1) if lookups tend to be correlated (i.e., exhibit locality), and lots more cache misses are incurred in the latter.

Having said that, though, large-capacity low-latency SSDs are very interesting to me because I'm interested in certain applications that involve very large lookup tables. Currently I'm just using traditional hashtables, but once you get to a certain size, I/O overhead dominates and progress grinds to a halt. I've been researching ways of taking advantage of locality to ease this, but so far haven't gotten it to actually work yet.


T

-- 
People say I'm indecisive, but I'm not sure about that. -- YHL, CONLANG
Mar 23 2017
dlangPupil <x x989898998.com> writes:
On Friday, 24 March 2017 at 06:30:25 UTC, H. S. Teoh wrote:

 You have to keep in mind that one downside of hashtables is 
 that they tend to be unfriendly towards caching hierarchies
...
 For certain applications, where key lookups are more-or-less 
 random, this is the best you could do, but for a good number of 
 applications, there tends to be some correlation between 
 lookups, and even things like B-trees could potentially do 
 better because of their cache-friendliness (and locality), even 
 if you assume I/O is much faster than your traditional 
 spindle-based hard drive. The O(log n) could have a much 
 smaller constant than the hashtable O(1) if lookups tend to be 
 correlated (i.e., exhibit locality), and lots more cache misses 
 are incurred in the latter.

 Having said that, though, large-capacity low-latency SSDs are 
 very interesting to me because I'm interested in certain 
 applications that involve very large lookup tables.  Currently 
 I'm just using traditional hashtables, but once you get to a 
 certain size, I/O overhead dominates and progress grinds to a 
 halt.  I've been researching ways of taking advantage of 
 locality to ease this, but so far haven't gotten it to actually 
 work yet.


 T
Thanks T for the great insight. Very helpful! Nice to see that you, Laeeth, the ACM, Intel and Micron all agree: this new technology could be very disruptive.

In addition to prevalence of random lookups, certain additional app conditions and designs might make "gigantic AAs" more useful or competitive with alternatives:

1. Use by programmers who need a data structure that's easier to reason about, or faster and safer to implement or prototype, and easier for future users to read and maintain.

2. Apps with large tables that use (or could use) non-natural keys, e.g., crypto-quality GUIDs, which can't benefit from (or impose the extra computational cost of) the sorting that must be performed and maintained on natural key values (e.g., LastName="Smith") in order to use a B-tree search.

3. Tables (and related tables) whose row data (and even aggregate data) could be "clumped" (think "documentized") to achieve a singularity (and not merely some degree of locality), thus consolidating multiple lookups into a single lookup (sketched in code at the end of this post).
	-In other words, a key's value is itself an array of data that can be fetched in a single lookup.
	-The key id for each "next" (updated) version of a document could also be stored with the current version.
	-The next key-value entry to hold the update could be created immediately but not have its values written unless and until the prior version of the document has changed.
	-Better yet, in an append-only design, creation of the next key-value entry could be deferred until a revision actually occurs.
	-Such "chained documentization" would automate histories, and could be further abstracted and adapted (e.g., with columnstore concepts) to accommodate apps in which lookups aren't mostly random.

Finally, re: caches: I haven't found whether it is or isn't possible to combine a server's DRAM and its Optane SSD RAM or DIMMs to form a single pool of RAM. Mixing DIMMs of different speeds is a no-no; but if this could be done, then the hottest data could be "cached" in DRAM and thus spared the 10x latency penalty of the SSD device.
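
A sketch of the "chained documentization" idea in point 3, with every name here made up for illustration: each AA value is a whole document, so one lookup fetches all of it, plus the key of the next revision, filled in only once a revision exists.

import std.typecons : Nullable;

struct Document
{
    string[]       lines;     // the clumped row/aggregate data, fetched in one lookup
    Nullable!ulong nextKey;   // key of the next revision, set only when one exists
}

void main()
{
    Document[ulong] store;

    store[1] = Document(["v1 of the report"]);

    // A later revision gets its own entry, and the old entry is linked forward.
    store[2] = Document(["v2 of the report"]);
    store[1].nextKey = 2;

    assert(store[1].nextKey.get == 2);
    assert(store[2].nextKey.isNull);   // the latest revision has no successor yet
}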
Mar 24 2017
parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Fri, Mar 24, 2017 at 04:59:00PM +0000, dlangPupil via Digitalmars-d wrote:
[...]
 In addition to prevalence of random lookups, certain additional app
 conditions and designs might make "gigantic AAs" more useful or
 competitive with alternatives:
 
 1.  Use by programmers who need a data structure that's easier to
 reason about or faster and safer to implement or prototype, and easier
 for future users to read and maintain.
This point is arguable, since a B-tree is pretty well-known and thus not difficult to reason about. Generally, whether you use a hashtable, a B-tree, or something else, the high-level view is that it's some opaque data structure that, given some key K, returns some value V. Which underlying algorithm is used is an implementation detail.

And generally, as far as application programmers are concerned, you wouldn't want to implement an advanced data structure from scratch; you'd use an off-the-shelf library that has already implemented it.
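
For example, Phobos has no B-tree, but std.container.rbtree.RedBlackTree is the nearest off-the-shelf ordered container if you want to experiment with correlated lookups and range scans. A sketch, not a claim that it beats the built-in AA for any particular workload:

import std.container.rbtree : redBlackTree;
import std.stdio : writeln;

void main()
{
    auto index = redBlackTree(5L, 1L, 9L, 3L, 7L);   // kept in sorted order

    index.insert(4);

    // Range query: everything in [3, 8) comes out in key order,
    // walking neighbouring nodes rather than random buckets.
    foreach (key; index.upperBound(2))
    {
        if (key >= 8) break;
        writeln(key);   // 3, 4, 5, 7
    }
}
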
 2.  Apps with large tables that use (or could use) non-natural keys,
 e.g., crypto-quality GUIDs, which can't benefit from (or impose the
 extra computational cost of) the sorting that must be performed and
 maintained on natural key values (e.g., LastName="Smith") in order to
 use a B-Tree search.
Encrypted GUIDs is a very good point. You probably can't expect much locality from that. But this point, and the following, presume a rather specific use case, i.e., document retrieval, presumably over the network. The application of hashtables and B-trees, however, is far wider in scope. In particular, what I had in mind when I asked about cache speeds is compute-intensive (as opposed to I/O intensive) processes. These two domains of application are quite different in some ways.

In document retrieval, for example, if you view it as a server serving documents over the network to a set of clients, having 1 I/O roundtrip per lookup is probably acceptable, because the time it takes to send the document over the network to the client is large enough that making lookups any faster probably won't make too much of a difference. Besides, if client requests are essentially unpredictable, then 1 I/O roundtrip is about the best you could do.

In CPU-intensive computations, however, the cache hierarchy becomes very prominent, because you're talking about CPU speeds vs. RAM speeds, which is currently a rather large gap. Being cache-friendly could mean a difference on the scale of orders of magnitude in terms of overall performance. If you're talking about data lookups inside a compute-intensive inner loop, having 1 I/O roundtrip per iteration is unacceptable. Even 1 L1 cache miss per iteration may be unacceptable, depending on the application. Ideally, you want to maximize the number of iterations per L1 cache miss, and then on the next level, squeeze as many L1 misses onto L2 as possible before you have to go to L3 or revert to RAM, because RAM is slow, relatively speaking, even if it's still very fast compared to an I/O roundtrip to the hard disk. In other words, you want as much locality as you can possibly get in your memory accesses -- the entire cache hierarchy, after all, is built on the premise of memory accesses exhibiting locality.

Furthermore, when it comes to CPU-intensive applications, quite often you already know the pattern of lookups beforehand, since it's usually coming from within the application itself, not from external clients that may be completely unpredictable. So you can usually predict the next n memory accesses much more easily, and take advantage of it by organizing your data structure around that.

[...]
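
One concrete way to exploit that predictability, as a sketch: when the next batch of keys is known up front, sort the batch and probe a sorted key array, so successive probes land near each other instead of striding randomly across a hash table:

import std.algorithm.sorting : sort;
import std.range : assumeSorted;
import std.stdio : writefln;

void main()
{
    // Sorted key array standing in for the real lookup structure.
    long[] table = [2, 3, 5, 8, 13, 21, 34, 55, 89];
    auto sortedTable = table.assumeSorted;

    long[] batch = [55, 3, 21, 2];   // lookups known ahead of time
    batch.sort();                    // after sorting, probes sweep left to right

    foreach (key; batch)
        writefln("%s present: %s", key, sortedTable.contains(key));
}
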
 Finally, re: caches: I haven't found whether it is or isn't possible
 to combine a server's DRAM and its Optane SSD RAM or DIMMs to form a
 single pool of RAM.  Mixing DIMMs of different speeds is a no-no; but 
 if this could be done, then the hottest data could be "cached" in DRAM
 and thus spared the 10x latency penalty of the SSD device.
If the SSD device incurs a 10x latency penalty, that's still a far cry from L1/L2 cache latencies! True, it will help in the worst case when you have no choice but to take an I/O roundtrip (a hard drive would be far slower)... but this would hardly be anything revolutionary if DRAM is still faster. At the end of the day, you're still talking about a memory hierarchy with L1/L2 at the top, L3 and RAM in between, and hard drive / SSD at the bottom.

If so, then the cache-unfriendliness of hashtables still applies, and we still have to work with finding ways of improving hashtable locality, e.g., with locality-sensitive hashing, cache-oblivious hashtables, etc., or use a more cache-friendly data structure like B-trees or one of the newer cache-oblivious trees.

(In my case, though, B-trees may not represent much of an improvement, because I'm dealing with high-dimensional data that cannot be easily linearized to take maximum advantage of B-tree locality. So at some level I still need some kind of hash-like structure to work with my data. But it will probably have some tree-like structure to it, because of the (high-dimensional) locality it exhibits.)


T

-- 
Without geometry, life would be pointless. -- VS
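
P.S. On the "hottest data in DRAM" idea: whatever the hardware tiering ends up looking like, the software-side version is just a small in-memory table in front of a slower lookup. A hypothetical sketch (all names made up, and no eviction policy at all):

import std.conv : to;

struct TieredStore(K, V)
{
    V[K] hot;                   // "DRAM" tier
    V delegate(K) coldLookup;   // slow tier: SSD, disk, network, ...

    V get(K key)
    {
        if (auto p = key in hot)
            return *p;              // fast path: served from the hot tier
        auto value = coldLookup(key);
        hot[key] = value;           // promote for next time
        return value;
    }
}

void main()
{
    string fetchFromSlowTier(long k)
    {
        // pretend this goes out to the SSD-resident table
        return "value for key " ~ k.to!string;
    }

    auto store = TieredStore!(long, string)(null, &fetchFromSlowTier);

    assert(store.get(7) == "value for key 7");   // cold miss, promoted
    assert(store.get(7) == "value for key 7");   // now served from the hot AA
}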
Mar 24 2017
dlangPupil <x x989898998.com> writes:
On Friday, 24 March 2017 at 17:48:35 UTC, H. S. Teoh wrote:

 (In my case, though, B-trees may not represent much of an 
 improvement, because I'm dealing with high-dimensional data 
 that cannot be easily linearized to take maximum advantage of 
 B-tree locality. So at some level I still need some kind of 
 hash-like structure to work with my data. But it will probably 
 have some tree-like structure to it, because of the 
 (high-dimensional) locality it exhibits.)


 T
Hi T,

Your problem is intriguing and definitely stretching my mind! I'll be factoring your ideas into my app design as I go along. Some techniques that might be relevant to your app, if only for relative performance comparisons, might be:

	-Using metadata in lieu of actual data to maximize the number of rows "represented" in the caches.
	-Using one or more columnstores, both intra- and extra-cache, to allow transformations of one or more fields of one or more rows with extremely small read, computation and write costs (a minimal sketch follows at the end of this post).
	-Scaling the app horizontally, if possible.
	-Using stored procedures on a SQL, NoSQL or NewSQL DBMS to harness the DBMS's bulk-processing and high-throughput capabilities.

I'd love to hear whatever details you can share about your app. Alternatively, I made a list of a dozen or so questions that would help me think about how to approach your problem. If you're interested in pursuing either avenue, let me know!

Thanks again
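
The columnstore item above, as a minimal struct-of-arrays sketch (field names made up): each field lives in its own contiguous array, so transforming one field across all rows streams through memory without touching the others.

import std.algorithm.iteration : sum;
import std.stdio : writeln;

struct Orders
{
    ulong[]  id;
    double[] amount;
    string[] customer;

    void applyDiscount(double factor)
    {
        foreach (ref a; amount)    // touches only the `amount` column
            a *= factor;
    }
}

void main()
{
    Orders orders;
    orders.id       = [1, 2, 3];
    orders.amount   = [10.0, 20.0, 30.0];
    orders.customer = ["ada", "bob", "eve"];

    orders.applyDiscount(0.9);
    writeln(orders.amount.sum);    // 54
}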
Mar 24 2017
Nordlöw <per.nordlow gmail.com> writes:
On Wednesday, 22 March 2017 at 20:00:56 UTC, dlangPupil wrote:
 Hello to all!  As a (D, and coding and forum) newbie I'm 
 learning about D's associative arrays (AAs), and their tiny 
 latency regardless of AA size. Cool!
Where can this be read? Is there a name for this particular implementation of an AA?
Mar 23 2017
dlangPupil <x x989898998.com> writes:
On Thursday, 23 March 2017 at 20:09:46 UTC, Nordlöw wrote:
...
 Where can this be read? Is there a naming for this particular 
 implementation of an AA?
Hi Nordlöw,

Associative arrays are introduced in Ali Çehreli's great book, Programming in D – Tutorial and Reference, Ch. 28, and in Section 1.4 of Andrei Alexandrescu's equally great "The D Programming Language." Ali's book is at http://ddili.org/ders/d.en/aa.html

The particular data structure I'd like to implement is not a simple hash, but rather a mixture of relational, key-value, columnstore and document database structures. I'll share it after I've "hashed" it out and thought of a cool name for it!
Mar 23 2017