digitalmars.D - Associative Arrays max length? 32bit/64bit

"sdvcn" <sdvcn 126.com> writes:

import std.stdio;

import std.utf;
import std.uni;
import std.string;
import std.random;
import std.conv;

int main(string[] argv)
{

	size_t[string] bary;

	try{
		for(size_t i=0;i<(size_t.max -1);i++)
		{
			bary["Key:" ~  to!(string)(i)] = i;
		}
	}catch(Exception e)
	{
		writeln(e);
	}
     return 0;
}
// Will this code overflow?


Is the maximum bary.length different from size_t.max?

Is the maximum bary.length the same on 32-bit and 64-bit?

May 16 2014

"Marc Schütz" <schuetzm gmx.net> writes:

On Saturday, 17 May 2014 at 00:25:13 UTC, sdvcn wrote:
 import std.stdio;

 import std.utf;
 import std.uni;
 import std.string;
 import std.random;
 import std.conv;

 int main(string[] argv)
 {

 	size_t[string] bary;

 	try{
 		for(size_t i=0;i<(size_t.max -1);i++)
 		{
 			bary["Key:" ~  to!(string)(i)] = i;
 		}
 	}catch(Exception e)
 	{
 		writeln(e);
 	}
     return 0;
 }
 // Will this code overflow?


 Is the maximum bary.length different from size_t.max?

 Is the maximum bary.length the same on 32-bit and 64-bit?

I cannot get the 32bit version to run on my computer, but what 
exactly is happening?

I suspect you will simply run out of memory at some point, but 
this shouldn't be caught by catch(Exception), as it should throw 
an Error.

Can you post the exact output of your program?
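
To illustrate the Exception/Error distinction mentioned above, here is a minimal sketch. The helper name `classify` is hypothetical, and a plain `Error` is thrown by hand as a stand-in for an out-of-memory condition, since actually exhausting memory is impractical in a small example.

```d
import core.exception : OutOfMemoryError;
import std.stdio : writeln;

// Reports which catch clause intercepts the given throwable.
// `classify` is a hypothetical helper written for this illustration.
string classify(Throwable t)
{
    try
    {
        throw t;
    }
    catch (Exception e)
    {
        return "Exception";
    }
    catch (Error e)
    {
        return "Error";
    }
    catch (Throwable e)
    {
        return "Throwable";
    }
}

void main()
{
    // OutOfMemoryError sits under Error, not under Exception.
    static assert(is(OutOfMemoryError : Error));
    static assert(!is(OutOfMemoryError : Exception));

    writeln(classify(new Exception("recoverable failure")));
    // A plain Error stands in for OutOfMemoryError here.
    writeln(classify(new Error("simulated fatal error")));
}
```

The second call skips the `catch (Exception e)` clause, which is why the try/catch in the original program would not intercept an out-of-memory condition.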

May 17 2014

"sdvcn" <sdvcn 126.com> writes:

On Saturday, 17 May 2014 at 09:26:32 UTC, Marc Schütz wrote:
 I cannot get the 32bit version to run on my computer, but what 
 exactly is happening?

 I suspect you will simply run out of memory at some point, but 
 this shouldn't be caught by catch(Exception), as it should 
 throw an Error.

 Can you post the exact output of your program?

It is not caught.
My computer has 16 GB of memory, an AMD X2 250 CPU, and Windows 2008 R2.


int main(string[] argv)
{

	size_t[string] bary;

	for(size_t i=0;i<(size_t.max -1);i++)
	{
		bary["Key:" ~  to!(string)(i)] = i;
	}

     return 0;
}

With -m32 the result is "Unhandled exception at 0x7547c42d 
(KernelBase.dll) in ngram.exe: 0xE0440001: 0xe0440001"

With -m64 it overflows.

I do not know what bary.length was at that point.

I want to know the maximum capacity of associative arrays on 
32-bit and on 64-bit.

Why does it overflow? How can I catch it? How can I avoid it?

May 17 2014

"sdvcn" <sdvcn 126.com> writes:

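import std.conv : text, to;
import std.stdio : File;

int main(string[] argv)
{
	// Progress log: rewritten on every iteration with the latest index.
	auto flog = File("results.txt", "w");

	size_t[string] bary;

	for(size_t i=0;i<(size_t.max -1);i++)
	{
		bary["Key:" ~  to!(string)(i)] = i;
		flog.write("stop i=" ~text(i));
		flog.seek(0);	// rewind so the next write overwrites this one
		flog.flush();
	}
	return 0;
}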

results:
start i=0
stop i=36495998
---------------
start i=0
stop i=36495992
----------------
start i=36495998
stop i=72991099

I still cannot see why it overflows.

Is it a hash table collision?

May 17 2014

FG <home fgda.pl> writes:

On 2014-05-17 12:46, sdvcn wrote:
 int main(string[] argv)
 {
 auto flog = File("results.txt", "w");

      size_t[string] bary;

      for(size_t i=0;i<(size_t.max -1);i++)
      {
          bary["Key:" ~  to!(string)(i)] = i;
          flog.write("stop i=" ~text(i));
          flog.seek(0);
          flog.flush();
      }
      return 0;
 }

 results:
 start i=0
 stop i=36495998
 ---------------
 start i=0
 stop i=36495992
 ----------------
 start i=36495998
 stop i=72991099

 I still cannot see why it overflows.


This code will always make you run out of memory. Why are you surprised?

Each key in the hash table is a string of the form "Key:1234", so at the stop
(i = 36495998) the string has 12 bytes. Add to that 16 bytes for the slice of
that string (assuming a 64-bit architecture), 8 bytes for the value, some space
wasted on alignment, and don't forget the extra memory needed to store the tree
for fast key look-up in the hash array.

You said that you have 16 GB of memory. At i = 36495998 that means at most
about 470 bytes per item.
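
The arithmetic above can be reproduced with a short sketch. It assumes that "16 GB" means 16 GiB and that every byte is available to the table, so the result is only an upper bound. The function names `bytesPerItem` and `payloadBytes` are hypothetical. The key length is measured from the string that the posted loop actually builds, which has no space after the colon.

```d
import std.conv : to;
import std.stdio : writefln;

// Upper bound on bytes per entry if all of `totalBytes` went to the table.
ulong bytesPerItem(ulong totalBytes, ulong items)
{
    return totalBytes / items;
}

// Payload enumerated in the post: key characters, the string slice
// (pointer plus length), and the size_t value. Alignment padding and
// the table's own bookkeeping are not counted.
size_t payloadBytes(size_t i)
{
    string key = "Key:" ~ to!string(i);
    return key.length + string.sizeof + size_t.sizeof;
}

void main()
{
    enum ulong ram = 16UL * 1024 * 1024 * 1024; // assumption: 16 GiB
    enum size_t items = 36_495_998;             // last index reported in the thread

    writefln("budget per item:    %s bytes", bytesPerItem(ram, items)); // 470
    writefln("enumerated payload: %s bytes", payloadBytes(items));
}
```

The gap between the two figures is what allocator overhead, alignment, and the table's internal nodes have to fit into.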

As for capturing the problem, you can catch the out-of-memory error, but not
with catch(Exception e). OutOfMemoryError is not an Exception. It is an Error.
Here is the updated example:


     import std.stdio, std.string, std.conv, core.exception;

     int main(string[] argv) {
         size_t[string] bary;
         size_t i = 0;
         try {
             for (; i < (size_t.max - 1); i++)
                 bary["Key:" ~  to!(string)(i)] = i;
         } catch (OutOfMemoryError e) {
             writeln(e);
         }
         writefln("Last index was: %d", i);
         return 0;
     }

May 17 2014

"bearophile" <bearophileHUGS lycos.com> writes:

FG:

 and don't forget the extra memory needed to store the tree for 
 fast key look-up in the hash array.

I think D now uses a linked list for the collision chains (so 
opCmp is useless, even though it's still required for the hashing 
protocol).

Bye,
bearophile

May 17 2014

FG <home fgda.pl> writes:

On 2014-05-17 21:35, bearophile wrote:
 FG:

 and don't forget the extra memory needed to store the tree for fast key
 look-up in the hash array.

 I think D now uses a linked list for the collision chains (so opCmp is
 useless, even though it's still required for the hashing protocol).

Indeed, I just read https://github.com/D-Programming-Language/dmd/blob/master/src/backend/aa.c
Key hash is divided modulo the number of buckets and each bucket points to an
ordered double-linked list. That list is walked left or right depending on what
value Typeinfo::compare returns for two keys. Hmm... isn't opCmp used by that
function? Why useless?

May 17 2014

"bearophile" <bearophileHUGS lycos.com> writes:

FG:

 and each bucket points to an ordered double-linked list.
 That list is walked left or right depending on what value
 Typeinfo::compare returns for two keys. Hmm... isn't opCmp
 used by that function? Why useless?

Sorry, I didn't know the linked list is sorted. It's scanned 
sequentially, because you can't use a binary search on a regular 
linked list. Perhaps a skip list would be better, to avoid O(n^2) 
behavior in the presence of attacks or degenerate cases (in the 
past the AA used a tree there).

Bye,
bearophile

May 17 2014

FG <home fgda.pl> writes:

On 2014-05-17 22:30, bearophile wrote:
 Sorry, I didn't know the linked list is sorted. It's scanned sequentially,
 because you can't use a binary search on a regular linked list. Perhaps a
 skip list would be better, to avoid O(n^2) behavior in the presence of
 attacks or degenerate cases (in the past the AA used a tree there).

     if (nodes > buckets_length * 4) rehash();

Skiplist doesn't seem necessary. As seen above, there shouldn't be much of a
problem with long lists accumulating in some selected buckets, as long as the
hash function is a proper hash (i.e. for any set of x (especially consecutive
ones like 0, 1, ... n) hash(x) values cover the range of size_t evenly).
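
The effect of that rehash rule can be seen in a toy chained hash table. This is a minimal sketch and not the dmd or druntime code. The type `ToyAA`, its members, the initial bucket count of 4, the doubling growth, and the identity hash are all assumptions made for illustration. Only the `nodes > buckets.length * 4` check mirrors the line quoted above.

```d
import std.stdio : writefln;

// Toy chained hash table; an illustration only.
struct ToyAA
{
    static struct Node
    {
        size_t key;
        size_t value;
        Node* next;
    }

    Node*[] buckets;
    size_t nodes;

    // Assumption: the key is its own hash (identity hashing).
    void insert(size_t key, size_t value)
    {
        if (buckets.length == 0)
            buckets.length = 4;

        immutable idx = key % buckets.length;
        for (Node* n = buckets[idx]; n !is null; n = n.next)
        {
            if (n.key == key)
            {
                n.value = value;
                return;
            }
        }

        // New key: push it at the head of its chain.
        buckets[idx] = new Node(key, value, buckets[idx]);
        ++nodes;

        // Same shape as the quoted check: grow once the chains
        // average more than 4 nodes.
        if (nodes > buckets.length * 4)
            rehash();
    }

    // Doubles the bucket count and relinks every node into its new chain.
    void rehash()
    {
        Node*[] fresh;
        fresh.length = buckets.length * 2;
        foreach (head; buckets)
        {
            Node* n = head;
            while (n !is null)
            {
                Node* next = n.next;
                immutable idx = n.key % fresh.length;
                n.next = fresh[idx];
                fresh[idx] = n;
                n = next;
            }
        }
        buckets = fresh;
    }

    // Length of the longest collision chain currently in the table.
    size_t longestChain()
    {
        size_t longest;
        foreach (head; buckets)
        {
            size_t len;
            for (Node* n = head; n !is null; n = n.next)
                ++len;
            if (len > longest)
                longest = len;
        }
        return longest;
    }
}

void main()
{
    ToyAA table;
    foreach (size_t k; 0 .. 1000)
        table.insert(k, k);
    writefln("nodes=%s buckets=%s longest chain=%s",
             table.nodes, table.buckets.length, table.longestChain());
}
```

With evenly spread hashes the chains stay short because the table grows before they can lengthen. A hash that sends many keys to the same bucket would defeat this, which is the degenerate case raised earlier.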

May 17 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Saturday, 17 May 2014 at 20:30:30 UTC, bearophile wrote:
 Sorry, I didn't know the linked list is sorted. It's scanned 
 sequentially, because you can't use a binary search on a 
 regular linked list. Perhaps a skip list would be better, to 
 avoid O(n^2) behavior in the presence of attacks or degenerate 
 cases (in the past the AA used a tree there).

 Bye,
 bearophile

*Technically*, for a sorted linked list (or forward iterators in 
general), you can find the result with O(N) *walk* iterations, 
but still only O(log(N)) *comparison* iterations.

So saying "you can't use binary search on a regular linked list" 
is not quite 100% accurate. You can still get some bang for your 
buck out of a degenerated algorithm.

http://www.cplusplus.com/reference/algorithm/binary_search/
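
A minimal sketch of that idea in D, assuming a sorted forward range. The function name `forwardLowerBound` and the comparison counter are hypothetical and not part of Phobos. The strategy follows the one C++ uses for forward iterators: count the elements once, then halve the interval by walking instead of indexing.

```d
import std.container : SList;
import std.range : isForwardRange, popFrontN, walkLength;
import std.stdio : writefln;

// Returns the sub-range starting at the first element that is not less
// than `value`. Walking costs O(N) steps in total, but only O(log N)
// element comparisons are made; `comparisons` counts them.
R forwardLowerBound(R, T)(R r, T value, ref size_t comparisons)
if (isForwardRange!R)
{
    size_t count = walkLength(r.save); // one O(N) walk to learn the length
    while (count > 0)
    {
        immutable step = count / 2;
        R mid = r.save;
        popFrontN(mid, step);          // walk to the middle, no comparisons
        ++comparisons;
        if (mid.front < value)
        {
            // The answer lies strictly after `mid`.
            mid.popFront();
            r = mid;
            count -= step + 1;
        }
        else
        {
            // The answer is `mid` or something before it.
            count = step;
        }
    }
    return r;
}

void main()
{
    // A singly linked list: forward range only, no random access.
    auto list = SList!int(1, 3, 5, 7, 9, 11, 13, 15);
    size_t comparisons;
    auto hit = forwardLowerBound(list[], 7, comparisons);
    writefln("found %s after %s comparisons", hit.front, comparisons);
}
```

Whether this pays off depends on how expensive a comparison is relative to a pointer step. For cheap comparisons the extra walking can easily cost more than a plain linear scan.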

May 17 2014

"bearophile" <bearophileHUGS lycos.com> writes:

monarch_dodra:

 for a sorted linked list (or forward iterators in general), you 
 can find the result with O(N) *walk* iterations, but still only 
 O(log(N)) *comparison* iterations.

I don't think I have ever implemented such an algorithm, but you 
are right, and it's nice. Is Phobos using this idea somewhere? 
Are D AAs using it?

Bye,
bearophile

May 17 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Saturday, 17 May 2014 at 22:05:03 UTC, bearophile wrote:
 monarch_dodra:

 for a sorted linked list (or forward iterators in general), 
 you can find the result with O(N) *walk* iterations, but still 
 only O(log(N)) *comparison* iterations.

 I don't think I have ever implemented such an algorithm, but 
 you are right, and it's nice. Is Phobos using this idea 
 somewhere? Are D AAs using it?

 Bye,
 bearophile

It's not used in Phobos (as far as I know, anyway). It *could* 
be implemented in SortedRange's BinaryFind though.

As for using it in AAs, you'd have to keep in mind that (I 
think) you probably need a minimum size for the algorithm's lower 
complexity to kick in and actually give you better times.

May 17 2014

"bearophile" <bearophileHUGS lycos.com> writes:

monarch_dodra:

 It's not used in Phobos (as far as I know, anyway). It 
 *could* be implemented in SortedRange's BinaryFind though.

SortedRange was changed recently and no longer needs to be a 
random-access range, so this is possible and I think it's good:

https://issues.dlang.org/show_bug.cgi?id=12763

Bye,
bearophile

May 18 2014