
digitalmars.D - Maximum array size

reply gassa mail.ru writes:
Hi!

What is the 16M limit on the array length for?
Recently I needed more space but when I declared a ~25M array of ubytes I
quickly ran into a compile error saying
 file.d(5): index 25000000 overflow for static array

An investigation got me to lines 1592-1593 of src/dmd/mtype.c stating
 if (n2 >= 0x1000000)    // put a 'reasonable' limit on it
     goto Loverflow;

I wonder what's the 'reason' for that limit. Eventually, I didn't give up and, with a bit of disassembly, lifted the limit in the comparison to 768M (you know, there doesn't seem to be a way to just recompile the code, since it's incomplete). At first, everything looked fine: my program compiled and ran, and the result of using that ~25M array was the same as in C (by the way, there were no such problems with the dmc compiler). But later, I discovered that:

1. Trying to allocate a bigger static array (say, ~100M) just hangs the compiler up;
2. Trying to allocate even a ~25M array of bits (which should consume less than 4M of space) does the same - the compiler hangs up.

Could anyone explain what the problem is? I'm new to the D Programming Language. I discovered it only recently and decided to give it a try. Currently, I seem to like its look and feel, although such undocumented misfeatures of dmd, the only compiler I know of, do present a bit of trouble.

Thanks in advance,
Ivan Kazmenko.
Apr 25 2006
next sibling parent reply Lionello Lunesu <lio lunesu.remove.com> writes:
gassa mail.ru wrote:
 Hi!
 
 What is the 16M limit on the array length for?
 Recently I needed more space but when I declared a ~25M array of ubytes I
 quickly ran into a compile error saying
 file.d(5): index 25000000 overflow for static array

An investigation got me to lines 1592-1593 of src/dmd/mtype.c stating
 if (n2 >= 0x1000000)    // put a 'reasonable' limit on it
     goto Loverflow;

I wonder what's the 'reason' on that limit.

I've put the same limit in my own array template that I use for C++ projects. My reasoning was that you shouldn't be using a dynamic array for such huge memory blocks because of the potentially costly resize. The limit helped me catch some bugs where arrays were being used and caused a significant performance bottleneck. I rewrote those parts with a simple malloc/free or VirtualAlloc.

In D there might be the added complexity of the garbage collector. Remember that it keeps scanning the allocated memory for pointers. For these huge blocks it's better to use some other memory allocation scheme.

L.
Apr 26 2006
parent reply Ivan Kazmenko <gassa mail.ru> writes:
Hi again!

me:
 What is the 16M limit on the array length for?
 I wonder what's the 'reason' on that limit.


Lionello Lunesu:
I've put the same limit in my own array template I use for C++ projects. 
  My reasoning was that you shouldn't be using a dynamic array for such 
huge memory blocks because of the potential costly resize. The limit 
helped me catch some bugs where array were being used and caused a 
significant performance bottleneck. I rewrote those parts with a simple 
malloc/free or VirtualAlloc. In D there might be the added complexities 
of the garbage collector. Remember that it keeps scanning the allocated 
memory for pointers. For these huge blocks it's better to use some other 
memory allocation scheme.

I definitely won't want to resize such a huge array, at least not too often. My array is static. For resizing, I usually use the technique of doubling the array's capacity when the requested index exceeds the current limit. If my too-many-M array is scanned for pointers, which I do not store there, do I have a way of forbidding it?

Dave:
Why the need for such a large static array? Can't a dynamic array be used in
your code instead?

I could use a dynamic array, yes, but I don't see any benefit since its size is fixed. Please explain why it could be better.

The current problem which requires such a big array is related to prime numbers - it's a program for a programming contest at http://recmath.org/contest/ (you know, various contests are the best way to learn something new, if you have enough free time, of course). I use a sort of Eratosthenes sieve for prime numbers, and the larger the sieve, the faster the program runs (for large numbers, it uses the Miller-Rabin primality test, which is significantly slower than seeking a bit in an array, with paging, caching, and other modern RAM techniques included). I work with a teammate; we share ideas but write different implementations. He uses Delphi, and I decided to try D, but currently my implementation sucks 'cause I have no 'legal' way to use an array that big.

Current problem aside, there are many ways to use my current amount of RAM, which is 768M, indexing it with only a couple of arrays. It's not the first time I've faced the restriction, even working on the current problem!

Ivan Kazmenko.
Apr 26 2006
next sibling parent reply Ben Phillips <Ben_member pathlink.com> writes:
In article <e2nj5p$2nj4$1 digitaldaemon.com>, Ivan Kazmenko says...
Hi again!

me:
 What is the 16M limit on the array length for?
 I wonder what's the 'reason' on that limit.


Lionello Lunesu:
I've put the same limit in my own array template I use for C++ projects. 
  My reasoning was that you shouldn't be using a dynamic array for such 
huge memory blocks because of the potential costly resize. The limit 
helped me catch some bugs where array were being used and caused a 
significant performance bottleneck. I rewrote those parts with a simple 
malloc/free or VirtualAlloc. In D there might be the added complexities 
of the garbage collector. Remember that it keeps scanning the allocated 
memory for pointers. For these huge blocks it's better to use some other 
memory allocation scheme.

I definitely won't want to resize such a huge array, at least not too often. My array is static. For resizing, I usually use the technique of doubling the requested index when it exceeds the current limit. If my too-many-M array is scanned for pointers, which I do not store there, do I have a way of forbidding it? Dave:
Why the need for such a large static array? Can't a dynamic array be used in
your code instead?

I could use dynamic array, yes, but I don't see any benefit since its size is fixed. Please explain why it could be better.

Actually static arrays are fixed size and dynamic arrays are not.
The current problem which requires such a big array is related to prime numbers
- it's a program for a programming contest at http://recmath.org/contest/ (you
know, various contests are the best way to learn something new, if you have
enough free time, of course). I use a sort of Eratosthenes sieve for prime
numbers, and the larger the sieve, the faster the program runs (for large
numbers, it uses Miller-Rabin primality test which is significantly slower than
seeking a bit in an array, with paging, caching, and other modern RAM techniques
included). I work with a teammate, we share ideas but write different
implementations. He uses Delphi, and I decided to try D, but currently my
implementation sucks 'cause I have no 'legal' way to use an array that big.

Current problem aside, there are many ways to use my current amount of RAM,
which is 768M, indexing it with only a couple of arrays. It's not the first time
I face the restriction even working on the current problem!

Ivan Kazmenko.

An array is the wrong way to go if you need that much memory. Anyway, the Eratosthenes sieve is iterative, so you could use a linked list instead. Since the extra 768M links (times two if doubly linked) would increase the memory requirement by a large amount, you could consider using a custom, hybrid collection - maybe a linked list of arrays, each with a size of 50000.
Apr 26 2006
parent reply Ivan Kazmenko <gassa mail.ru> writes:
Ben Phillips:
Why the need for such a large static array? Can't a dynamic array be used in
your code instead?

fixed. Please explain why it could be better.


array. Sorry for the ambiguity.
An array is the wrong way to go if you need that much memory. Anyways, the
Eratosthenes sieve is iterative so you could use a linked list instead. Since
the extra 768M (*2 if doubly linked) links would increase the memory requirement
by a large amount you could consider using a custom, hybrid collection. Maybe by
having a linked list of arrays each with a size of 50000.

slow. Secondly, for any number N, say, 1 <= N <= 100_000_000, I need a fast way to check whether it is prime. When you need something done really fast, using heavy data structures such as lists and collections doesn't make sense. The sieve is not the result of the algorithm; rather, it's an auxiliary table built at initialisation time and used ever since.

Ivan Kazmenko.
Apr 26 2006
parent reply Deewiant <deewiant.doesnotlike.spam gmail.com> writes:
Ivan Kazmenko wrote:
 Secondly, for any number N, say, 1 <= N <= 100_000_000, I need to have a
 fast way to check whether it is prime.

Sounds to me like you want an associative array. I.e.:

 bool[int] primes;
 primes[769] = true;
 if (769 in primes) { ... } else { ... }

The only annoyance with this is that you might as well set primes[769] to false, as you don't care about the value, only whether the key exists.
Apr 26 2006
parent =?ISO-8859-1?Q?Jari-Matti_M=E4kel=E4?= <jmjmak utu.fi.invalid> writes:
Deewiant wrote:
 Ivan Kazmenko wrote:
 Secondly, for any number N, say, 1 <= N <= 100_000_000, I need to have a
 fast way to check whether it is prime.

Sounds to me like you want an associative array. I.e.:

 bool[int] primes;
 primes[769] = true;
 if (769 in primes) { ... } else { ... }

The only annoyance with this is that you might as well set primes[769] to false, as you don't care about the value, only whether the key exists.

They might look clean and elegant, but associative arrays are terribly slow in this particular case. A hash map would be fast, O(1), if all the primes were already known, but this isn't the case here. A tree implementation would have O(log n) average complexity and adding keys to a hash map would be quite suboptimal too (resizing, collisions). -- Jari-Matti
Apr 26 2006
prev sibling parent reply Dave <Dave_member pathlink.com> writes:
Ivan Kazmenko wrote:
 Hi again!
 
 me:
 What is the 16M limit on the array length for?
 I wonder what's the 'reason' on that limit.


Lionello Lunesu:
 I've put the same limit in my own array template I use for C++ projects. 
  My reasoning was that you shouldn't be using a dynamic array for such 
 huge memory blocks because of the potential costly resize. The limit 
 helped me catch some bugs where array were being used and caused a 
 significant performance bottleneck. I rewrote those parts with a simple 
 malloc/free or VirtualAlloc. In D there might be the added complexities 
 of the garbage collector. Remember that it keeps scanning the allocated 
 memory for pointers. For these huge blocks it's better to use some other 
 memory allocation scheme.

I definitely won't want to resize such a huge array, at least not too often. My array is static. For resizing, I usually use the technique of doubling the requested index when it exceeds the current limit. If my too-many-M array is scanned for pointers, which I do not store there, do I have a way of forbidding it? Dave:
 Why the need for such a large static array? Can't a dynamic array be used in
 your code instead?

I could use a dynamic array, yes, but I don't see any benefit since its size is fixed. Please explain why it could be better.

The current problem which requires such a big array is related to prime numbers - it's a program for a programming contest at http://recmath.org/contest/ (you know, various contests are the best way to learn something new, if you have enough free time, of course). I use a sort of Eratosthenes sieve for prime numbers, and the larger the sieve, the faster the program runs (for large numbers, it uses the Miller-Rabin primality test, which is significantly slower than seeking a bit in an array, with paging, caching, and other modern RAM techniques included). I work with a teammate; we share ideas but write different implementations. He uses Delphi, and I decided to try D, but currently my implementation sucks 'cause I have no 'legal' way to use an array that big.

Current problem aside, there are many ways to use my current amount of RAM, which is 768M, indexing it with only a couple of arrays. It's not the first time I've faced the restriction, even working on the current problem!

Ivan Kazmenko.

Instead of

 T[25_000_000] myarray;

just do

 T[] myarray = new T[25_000_000];

and that will take care of the immediate problem you describe and let you get on with solving the problem. Example:

 import std.stdio, std.string;

 int main(char[][] args)
 {
     bool[] flags = new bool[25_000_000];
     int count = 0;

     flags[] = true;
     for (int i = 2; i < flags.length; i++)
     {
         if (flags[i])
         {
             // remove all multiples of prime: i
             for (int j = i + i; j < flags.length; j += i)
                 flags[j] = false;
             count++;
         }
     }
     writefln("Count: ", count);
     return 0;
 }

- Dave
Apr 26 2006
parent reply Ivan Kazmenko <gassa mail.ru> writes:
Dave:
Instead of

T[25_000_000] myarray;

just do

T[] myarray = new T[25_000_000];

and that will take care of the immediate problem you describe and let 
you get on with solving the problem.

solution as soon as I get back home. What I'm concerned about is the performance of the data structure: the simpler, the faster, the better. I have a feeling that using a dynamic array adds an extra CPU cycle or two, since the additional pointer to the data is variable, not constant; I'll check that soon. That's why I'm trying to get a working static array solution. Languages such as C and D offer a comfortable and readable way of writing the performance-critical parts of a program; it's a shame when it doesn't work afterwards.

Besides, having such an undocumented feature as a 16M limit on an array length is IMHO no good and deserves at least an explanation by someone familiar with the dmd code.

Ivan Kazmenko.
Apr 26 2006
parent reply jcc7 <jcc7_member pathlink.com> writes:
In article <e2o1po$hm8$1 digitaldaemon.com>, Ivan Kazmenko says...
Dave:
Instead of

T[25_000_000] myarray;

just do

T[] myarray = new T[25_000_000];

and that will take care of the immediate problem you describe and let 
you get on with solving the problem.

solution as soon as I get back home. What I'm concerned of is the performance of the data structure. The simpler, the faster, the better. I have a feeling that using a dynamic array adds an extra CPU cycle or two since the additional pointer to the data is variable, not constant; I'll check that soon. That's why I'm trying to get a working static array solution. Languages such as C and D offer a comfortable and readable way of writing performance critical parts of the program; it's a shame when it does not work afterwards. Besides, having such an undocumented feature as a 16M limit on an array length is IMHO no good and deserves at least an explanation by someone familiar with the dmd code.

Are you sure it's undocumented? See http://www.digitalmars.com/d/arrays.html, "Static Arrays" section "The total size of a static array cannot exceed 16Mb. A dynamic array should be used instead for such large arrays." jcc7
Apr 26 2006
next sibling parent =?ISO-8859-1?Q?Jari-Matti_M=E4kel=E4?= <jmjmak utu.fi.invalid> writes:
jcc7 wrote:
 In article <e2o1po$hm8$1 digitaldaemon.com>, Ivan Kazmenko says...
 Dave:
 Instead of

 T[25_000_000] myarray;

 just do

 T[] myarray = new T[25_000_000];

 and that will take care of the immediate problem you describe and let 
 you get on with solving the problem.

solution as soon as I get back home. What I'm concerned of is the performance of the data structure. The simpler, the faster, the better. I have a feeling that using a dynamic array adds an extra CPU cycle or two since the additional pointer to the data is variable, not constant; I'll check that soon. That's why I'm trying to get a working static array solution. Languages such as C and D offer a comfortable and readable way of writing performance critical parts of the program; it's a shame when it does not work afterwards. Besides, having such an undocumented feature as a 16M limit on an array length is IMHO no good and deserves at least an explanation by someone familiar with the dmd code.

Are you sure it's undocumented? See http://www.digitalmars.com/d/arrays.html, "Static Arrays" section "The total size of a static array cannot exceed 16Mb. A dynamic array should be used instead for such large arrays."

Would it be possible to write the static array part in C and link it with the rest of the code being D? -- Jari-Matti
Apr 26 2006
prev sibling parent reply Ivan Kazmenko <gassa mail.ru> writes:
jcc7:
Besides, having such an undocumented feature as a 16M limit on an array length
is IMHO no good and deserves at least an explanation by someone familiar with
the dmd code.

Are you sure it's undocumented? See http://www.digitalmars.com/d/arrays.html, "Static Arrays" section: "The total size of a static array cannot exceed 16Mb. A dynamic array should be used instead for such large arrays."

Thank you! The document I used is "spec_DMD_0.109.pdf"; it's outdated and does not seem to contain that limit. Where could I find a more up-to-date specification? The site docs don't build up to a complete specification, do they?

Still, a question remains: what's the reason for such a limit?

Ivan Kazmenko.
Apr 26 2006
next sibling parent reply jcc7 <jcc7_member pathlink.com> writes:
In article <e2odih$1akv$1 digitaldaemon.com>, Ivan Kazmenko says...
jcc7:
Besides, having such an undocumented feature as a 16M limit on an array length
is IMHO no good and deserves at least an explanation by someone familiar with
the dmd code.

See http://www.digitalmars.com/d/arrays.html, "Static Arrays" section "The total size of a static array cannot exceed 16Mb. A dynamic array should be used instead for such large arrays."


You're quite welcome.
The document I used is "spec_DMD_0.109.pdf", it's
outdated and does not seem to contain that limit; where could I find a more
up-to-date specification? The site docs don't build up to a complete
specification, do they?

Well, spec_DMD_0.109.pdf is just an unofficial snapshot of the docs that existed back in December 2004. I might create another PDF sometime, but I think a lot of pages have been added since I made that PDF (and I'd have to remember how I did it). But the docs on the digitalmars website and included with DMD are the official version, so those are the best ones to use anyway (if possible).

There are other PDF snapshots listed at http://www.prowiki.org/wiki4d/wiki.cgi?LanguageSpecification, but you've already found the newest one that I know of.
Still, a question remains: what's the reason for such a limit?

I don't know what the reason is, but I'm sure there is one. It's probably related to issues such as RAM and garbage collection. It might be there because it's better to stop this practice at compile time than run the high risk of running out of memory at runtime. But I'm no expert at memory management, so I'm just guessing.

jcc7
Apr 26 2006
parent reply Ivan Kazmenko <gassa mail.ru> writes:
jcc7:
Well, spec_DMD_0.109.pdf is just an unofficial snapshot of the docs that existed
back in December 2004. I might create another PDF sometime, but I think a lot of
pages have been added since I made that PDF (and I'd have to remember how I did
it). But the docs on the digitalmars website and included with DMD are the
official version, so those are the best ones to use anyway (if possible).

there are no bits and no bit arrays; it seems that bit is currently an alias for ubyte or some such. Why is that?

Ivan Kazmenko.
Apr 26 2006
parent Walter Bright <newshound digitalmars.com> writes:
Ivan Kazmenko wrote:
 jcc7:
 Well, spec_DMD_0.109.pdf is just an unofficial snapshot of the docs that
existed
 back in December 2004. I might create another PDF sometime, but I think a lot
of
 pages have been added since I made that PDF (and I'd have to remember how I did
 it). But the docs on the digitalmars website and included with DMD are the
 official version, so those are the best ones to use anyway (if possible).

there are no bits and no bit arrays; seems that bit is currently an alias to ubyte or such. Why is that?

Long threads about that. But you can use std.bitarray.
Apr 27 2006
prev sibling parent reply Walter Bright <newshound digitalmars.com> writes:
Ivan Kazmenko wrote:
 Still, a question remains: what's the reason for such a limit?

1) Most of the time, such huge static arrays are the result of a typo.

2) It's not practical for the compiler to determine what the maximum amount is. So very large static arrays will not reliably work, and there will be bug reports filed about why the executable mysteriously fails to load, etc.

3) Such an executable may load, but may cripple the system.

4) A 16Mb limit gives a reasonable chance of the program being portable.

5) If initialized data is put into such an array, the executable size will grow by the size of the array.

6) Optlink has some troubles with very large static data.

7) Such arrays can easily be just allocated dynamically at runtime.
Apr 26 2006
parent reply Ivan Kazmenko <gassa mail.ru> writes:
Walter Bright:
...
7) Such arrays can easily be just allocated dynamically at runtime.

argument, and it worked for me, but gave me some interesting experience in the process of investigation which I'd like to share.

First, I checked the example posted by Dave; the program compiled and ran, but I found some oddity in its behaviour: after writing the output line, it still went on working a bit and then terminated.

Then, I changed my sieve program to use a dynamic array instead of a static one. Imagine my astonishment when it ran about 10 seconds, while the static array version ran no more than a second! I checked different versions of my program (such as manually working with a bit array stored in ubytes), and the results were the same: they all ran 10-20 seconds. By redirecting the output from a file to the screen, I found out that the output was generated fast enough, like before, and the remaining time was used by the program to consume processor resources for an unknown purpose.

I then remembered that in the good old days I used no gc, and happily wrote a delete line at the end of my program. Oh joy! Its runtime got back to one second. Well, I expected that gc is one of the main advantages of D over C++... but manually deleting allocated objects currently does not pose a problem for me. Anyway, I usually program for contests and research, not for large-scale projects. But I wonder what it did in those ten seconds. Did it scan the array being deleted for hidden pointers?

All that encouraged me to write a simple benchmark. Below is a simple Eratosthenes sieve program. Thanks to the conditional compilation capabilities of D, I was able to maintain five versions in a small, simple, readable program.
-----
import std.stdio;

const int MaxN = 16_000_000;

version (Byte)    {alias ubyte main_type;}
version (Int)     {alias int main_type;}
version (Static)  {main_type [MaxN] a;}
version (Dynamic) {main_type [] a;}

int main ()
{
    int i, j, s;
    version (Dynamic) {a = new main_type [MaxN];}
    a[2..MaxN - 1] = 1;
    for (i = 2; i * i < MaxN; i++)
        if (a[i])
            for (j = i * i; j < MaxN; j += i)
                a[j] = 0;
    s = 0;
    for (i = 0; i < MaxN; i++)
        if (a[i])
            s++;
    version (Delete) {delete a;}
    printf ("The number of primes is %d.\n", s);
    return 0;
}
-----

The compile line then looks like "dmd -version=Byte -version=Static test.d".

Here is the run time for different versions of the program on my current configuration - it's an Athlon64 3200+ 2500 MHz (overclocked), 768 MB DDR, Windows 2003 Server (32-bit). I measured wall-clock time since I know no simple reliable way to get process time under Windows. Didn't try dmd under Linux yet.

-----
Byte, Static:          1.44
Byte, Dynamic:         2.52
Byte, Dynamic, Delete: 1.41
Int,  Dynamic:         2.23
Int,  Dynamic, Delete: 2.17
-----

Interestingly, repeated runs show that the dynamic version with delete is in fact a little bit faster than the static version - I definitely didn't expect such an effect! On the contrary, the dynamic version without delete does the job in the same time and then spends a second more doing some secret internal cleanup work. Another interesting note is that the int version behaves nearly the same with or without delete, slightly faster than the dynamic byte version without delete. The static int version doesn't compile, since it requires a ~64M array.

Ivan Kazmenko.
Apr 26 2006
parent reply Walter Bright <newshound digitalmars.com> writes:
Ivan Kazmenko wrote:
 But I
 wonder what did it do in that ten seconds. Did it scan the array being deleted
 for hidden pointers?

Yup. GC is a lousy tool to use for 64Mb arrays. A program is necessarily not going to have many of those, and they should be handled with explicit allocation/delete.
Apr 27 2006
parent reply Sean Kelly <sean f4.ca> writes:
Walter Bright wrote:
 Ivan Kazmenko wrote:
 But I
 wonder what did it do in that ten seconds. Did it scan the array being 
 deleted
 for hidden pointers?

Yup. GC is a lousy tool to use for 64Mb arrays. A program necessarily is not going to have many of those, and they should be handled with explicit allocation/delete.

It would be nice if the GC could be told not to scan them as well (or if this were sorted out automatically using type information), but that's something I gather will happen at some point anyway if you're working on a compacting GC :-) Sean
Apr 27 2006
parent Walter Bright <newshound digitalmars.com> writes:
Sean Kelly wrote:
 It would be nice if the GC could be told not to scan them as well (or if 
 this were sorted out automatically using type information), but that's 
 something I gather will happen at some point anyway if you're working on 
 a compacting GC :-)

Yes, that ability should be in the GC, but for now, if it is allocated with malloc() it won't be scanned.
Apr 27 2006
prev sibling parent Dave <Dave_member pathlink.com> writes:
In article <e2m3ll$v0k$1 digitaldaemon.com>, gassa mail.ru says...
Hi!

What is the 16M limit on the array length for?
Recently I needed more space but when I declared a ~25M array of ubytes I
quickly ran into a compile error saying
 file.d(5): index 25000000 overflow for static array

An investigation got me to lines 1592-1593 of src/dmd/mtype.c stating
 if (n2 >= 0x1000000)    // put a 'reasonable' limit on it
     goto Loverflow;

I wonder what's the 'reason' for that limit. Eventually, I didn't give up and, with a bit of disassembly, lifted the limit in the comparison to 768M (you know, there doesn't seem to be a way to just recompile the code, since it's incomplete). At first, everything looked fine: my program compiled and ran, and the result of using that ~25M array was the same as in C (by the way, there were no such problems with the dmc compiler). But later, I discovered that:

1. Trying to allocate a bigger static array (say, ~100M) just hangs the compiler up;
2. Trying to allocate even a ~25M array of bits (which should consume less than 4M of space) does the same - the compiler hangs up.

Could anyone explain what the problem is? I'm new to the D Programming Language. I discovered it only recently and decided to give it a try. Currently, I seem to like its look and feel, although such undocumented misfeatures of dmd, the only compiler I know of, do present a bit of trouble.

Thanks in advance,
Ivan Kazmenko.

Why the need for such a large static array? Can't a dynamic array be used in your code instead?
Apr 26 2006