digitalmars.D - 16MB static arrays again...

reply Tomer Filiba <tomerfiliba gmail.com> writes:
#WEKA #INDUSTRY

I found this post from 2007
http://forum.dlang.org/post/fdspch$d3v$1@digitalmars.com that
refers to this post from 2006
http://www.digitalmars.com/d/archives/digitalmars/D/37038.html#N37071 -- and I
still don't understand why static arrays have this size limit.

Any linker issues should be contained to the linker (i.e., use a
different linker, fix your linker). As for cross-device -- if C
lets me use huge static arrays, why should D impose a limit? As
for the executable failing to load -- we have a ~1GB .bss section
and a ~100MB .rodata section. No issues there.

As for the "use dynamic arrays instead", this poses two problems:
1) Why? I know everything at compile time, why force me to (A)
allocate it separately in `shared static this()` and (B)
introduce all sorts of runtime bounds checks?

2) Many times I need memory-contiguity, e.g., several big arrays 
inside a struct, which is dumped to disk/sent over network. I 
can't use pointers there.

And on top of it all, this limit *literally* makes no sense:

__gshared ubyte[16*1024*1024] x; // fails to compile

struct S {
     ubyte[10*1024*1024] a;
     ubyte[10*1024*1024] b;
}
__gshared S x;    // compiles

I'm already working on a super-ugly-burn-with-fire mixin that
generates *structs* of a given size to overcome this limit.


-tomer
Aug 24 2016
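The asymmetry described in the post can be sketched as follows. This is a minimal illustration that reuses the struct from the post; the `MiB` constant and the byte-slice view are additions for the example, and whether a single 16 MiB array is rejected depends on the compiler version in use.

```d
import std.stdio : writeln;

enum MiB = 1024 * 1024; // helper constant, not from the original post

struct S
{
    ubyte[10 * MiB] a;
    ubyte[10 * MiB] b;
}

__gshared S x; // all zeros, so it needs no large initializer image

void main()
{
    // The struct is larger than the 16 MiB array limit discussed in
    // the thread, yet each field stays below it.
    static assert(S.sizeof == 20 * MiB);

    // The fields sit back to back, so the whole struct can be viewed
    // as one contiguous byte range.
    ubyte[] all = (cast(ubyte*) &x)[0 .. S.sizeof];
    all[10 * MiB] = 42; // first byte of x.b
    assert(x.b[0] == 42);
    writeln(all.length);
}
```

In other words, the limit constrains the array type, not the amount of static data.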
next sibling parent Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Wednesday, 24 August 2016 at 07:50:25 UTC, Tomer Filiba wrote:
 #WEKA #INDUSTRY

 I found this post from 2007 
 http://forum.dlang.org/post/fdspch$d3v$1 digitalmars.com that 
 refers to this post from 2006 
 http://www.digitalmars.com/d/archives/digitalmars/D/37038.html#N37071 -- and I
still don't realize, why do static arrays have this size limit on them?

 [...]
There was this thread last week
https://forum.dlang.org/thread/not418$mcr$1@digitalmars.com
with a technical work-around. But you are right that there is no
technical justification for that limit to exist.
Aug 24 2016
prev sibling next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 24/08/2016 7:50 PM, Tomer Filiba wrote:
 #WEKA #INDUSTRY

 I found this post from 2007
 http://forum.dlang.org/post/fdspch$d3v$1 digitalmars.com that refers to
 this post from 2006
 http://www.digitalmars.com/d/archives/digitalmars/D/37038.html#N37071 --
 and I still don't realize, why do static arrays have this size limit on
 them?

 Any linker issues should be contained to the linker (i.e., use a
 different linker, fix your linker). As for cross-device -- if C lets me
 use huge static arrays, why should D impose a limit? As for executable
 failing to load -- we have ~1GB .bss section and a ~100MB .rodata
 section. No issues there.
You're welcome to fix optlink
https://github.com/DigitalMars/optlink
Or write a whole new linker.

Of course there is no reason to not change this for -m32mscoff
and -m64 on Windows, at least that I am aware of.
Aug 24 2016
parent reply Tomer Filiba <tomerfiliba gmail.com> writes:
On Wednesday, 24 August 2016 at 08:34:24 UTC, rikki cattermole 
wrote:
 You're welcome to fix optlink 
 https://github.com/DigitalMars/optlink
 Or write a whole new linker.

 Of course there is no reason to not change this for -m32mscoff 
 and -m64 on Windows at least that I am aware of.
I'm running on Linux with the gold linker. No such issues. But the
point is totally different -- if it's a linker issue, let the linker
fail. Not the compiler.

Anyway, #FML:

import std.string : format;

struct BigArray(T, size_t N) {
    mixin((){
        enum arraySizeLimit = 16*1024*1024 - 1;
        enum numFullChunks = (T.sizeof * N) / arraySizeLimit;
        enum elemsPerFullChunk = arraySizeLimit / T.sizeof;
        enum elemsPerLastChunk = N - numFullChunks * elemsPerFullChunk;
        static assert (elemsPerLastChunk <= elemsPerFullChunk);
        string s = "";
        size_t covered;
        foreach(i; 0 .. numFullChunks) {
            s ~= "T[%s] arr%s;\n".format(elemsPerFullChunk, i);
        }
        s ~= "T[%s] arr%s;\n".format(elemsPerLastChunk, numFullChunks);
        return s;
    }());

    enum length = N;
    @property T* ptr() { return arr0.ptr; }
    @property T[] slice() { return arr0.ptr[0 .. N]; }
    alias slice this;
}
Aug 24 2016
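To make the chunk arithmetic in the mixin concrete, here is a small worked sketch. `ChunkPlan` and `planChunks` are hypothetical names introduced for this example. It also counts full chunks in elements rather than in bytes, which is a slight variation on the posted code, chosen so the last chunk can never be larger than a full one.

```d
import std.stdio : writefln;

// Hypothetical result type for this example.
struct ChunkPlan
{
    size_t fullChunks;   // number of maximally sized chunks
    size_t elemsPerFull; // elements in each full chunk
    size_t elemsInLast;  // elements in the trailing chunk (may be 0)
}

// Hypothetical helper: split n elements of elemSize bytes into chunks
// that each stay at or below `limit` bytes.
ChunkPlan planChunks(size_t elemSize, size_t n, size_t limit)
{
    immutable perFull = limit / elemSize;
    immutable full = n / perFull;
    return ChunkPlan(full, perFull, n - full * perFull);
}

// Evaluated at compile time: 10 million uints under the 16 MiB - 1 limit.
enum plan = planChunks(uint.sizeof, 10_000_000, 16 * 1024 * 1024 - 1);
static assert(plan.fullChunks == 2);
static assert(plan.elemsPerFull == 4_194_303);
static assert(plan.elemsInLast == 1_611_394);

void main()
{
    writefln("%s full chunks of %s elements, last chunk of %s elements",
             plan.fullChunks, plan.elemsPerFull, plan.elemsInLast);
}
```

With these numbers the generated struct would hold three array fields (two full chunks plus the remainder), each below the limit.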
parent reply Shachar Shemesh <shachar weka.io> writes:
On 24/08/16 12:08, Tomer Filiba wrote:
      @property T[] slice() {
I think change that to:

    @property ref T[N] slice() {

Shachar
Aug 24 2016
parent Tomer Filiba <tomerfiliba gmail.com> writes:
On Wednesday, 24 August 2016 at 09:24:57 UTC, Shachar Shemesh 
wrote:
 I think change that to:
  @property ref T[N] slice() {
That would obviously not work, since the *type* T[N] cannot exist.
Aug 24 2016
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/24/2016 12:50 AM, Tomer Filiba wrote:
 As for the "use dynamic arrays instead", this poses two problems:
 1) Why? I know everything in compile-time, why force me to (A) allocate it
 separately in `shared static this()`
I'm not sure how that hurts anything. It's just a call to malloc().
 and (B) introduce all sorts of runtime bound checks?
There won't be runtime bounds checks if you use pointers.
 2) Many times I need memory-contiguity, e.g., several big arrays inside a
 struct, which is dumped to disk/sent over network. I can't use pointers there.
I don't know why pointers cannot be used. Can you show the struct definition you're using?
Aug 24 2016
next sibling parent reply Lodovico Giaretta <lodovico giaretart.net> writes:
On Wednesday, 24 August 2016 at 09:38:43 UTC, Walter Bright wrote:
 I don't know why pointers cannot be used.
In my code I have a line like this:

    static immutable MyStruct[] data = [ MyTemplate!MyArgs ];

This would not work if `MyStruct.sizeof * MyTemplate!MyArgs.length` is
bigger than 16MB, right? And for this case there's no workaround, I
think...

I will not reach that size, but I agree with the others that this
should be a linker limitation, not a language one, so that I can
change linker and have it work.
Aug 24 2016
parent Walter Bright <newshound2 digitalmars.com> writes:
On 8/24/2016 3:35 AM, Lodovico Giaretta wrote:
 On Wednesday, 24 August 2016 at 09:38:43 UTC, Walter Bright wrote:
 I don't know why pointers cannot be used.
 In my code I have a line like this:

     static immutable MyStruct[] data = [ MyTemplate!MyArgs ];

 This would not work if `MyStruct.sizeof * MyTemplate!MyArgs.length` is
 bigger than 16MB, right? And for this case there's no workaround, I
 think...
The compiler will run out of memory long before you've got 16 million array initializers.
Aug 24 2016
prev sibling next sibling parent reply Tomer Filiba <tomerfiliba gmail.com> writes:
On Wednesday, 24 August 2016 at 09:38:43 UTC, Walter Bright wrote:
 2) Many times I need memory-contiguity, e.g., several big 
 arrays inside a
 struct, which is dumped to disk/sent over network. I can't use 
 pointers there.
 I don't know why pointers cannot be used. Can you show the struct
 definition you're using?
Our configuration is a struct of several static hash tables
(allocated in-place, not via GC). So the entire configuration is
contiguous in memory, which allows us to dump/load/send it easily.

When we increase the capacity of these tables we run into these
pain-in-the-ass 16MB limits. So although the struct itself is over
16MB, no single table can cross several thousand entries, as the
static arrays it uses internally would overflow that boundary.

The configuration itself may very well be dynamically allocated
(e.g., not a global variable) but that won't solve anything, as the
restriction is on the *type* of the array.

-tomer
Aug 24 2016
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/24/2016 3:35 AM, Tomer Filiba wrote:
 On Wednesday, 24 August 2016 at 09:38:43 UTC, Walter Bright wrote:
 2) Many times I need memory-contiguity, e.g., several big arrays inside a
 struct, which is dumped to disk/sent over network. I can't use pointers there.
 I don't know why pointers cannot be used. Can you show the struct
 definition you're using?
 Our configuration is a struct of several static hash tables
 (allocated in-place, not via GC). So the entire configuration is
 contiguous in memory, which allows us to dump/load/send it easily.
 When we increase the capacity of these tables we run into these
 pain-in-the-ass 16MB limits. So although the struct itself is over
 16MB, no single table can cross several thousand entries, as the
 static arrays it uses internally would overflow that boundary. The
 configuration itself may very well be dynamically allocated (e.g.,
 not a global variable) but that won't solve anything as the
 restriction is on the *type* of the array.
If I understand you correctly, removing the size limitation on the
type will resolve the issue for you? Even though allocating static
data of such a size will still not be allowed?

BTW, given globals in C++:

    int a[100];
    float b[200];
    long c;

there actually is no guarantee that they are allocated contiguously,
and indeed I've run into bugs in my code because I had relied on
that. They'd have to be put into a struct to get a guarantee.
Aug 24 2016
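The struct guarantee mentioned above can be checked at compile time. The sketch below wraps the three declarations in a D struct; the name `Globals` is made up for the example, and `long` is D's 64-bit type here, which is not necessarily the size of C++ `long`.

```d
// Hypothetical wrapper struct for the three globals in the example.
struct Globals
{
    int[100] a;   // 400 bytes
    float[200] b; // 800 bytes
    long c;       // 8 bytes in D
}

// Fields are placed in declaration order at known offsets.
static assert(Globals.a.offsetof == 0);
static assert(Globals.b.offsetof == 400);
static assert(Globals.c.offsetof == 1200);
static assert(Globals.sizeof == 1208);

void main()
{
    Globals g;
    // The address of b is exactly 400 bytes past the start of the struct.
    assert(cast(ubyte*) g.b.ptr - cast(ubyte*) &g == 400);
}
```

Separate module-level variables carry no such ordering or adjacency promise, which is the point being made in the post.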
next sibling parent NX <nightmarex1337 hotmail.com> writes:
Maybe you can merge this: https://github.com/dlang/dmd/pull/6081
Aug 24 2016
prev sibling parent Tomer Filiba <tomerfiliba gmail.com> writes:
On Wednesday, 24 August 2016 at 18:16:01 UTC, Walter Bright wrote:
 On 8/24/2016 3:35 AM, Tomer Filiba wrote:
...
 Our configuration is a struct of several static hash tables 
 (allocated in-place, not via GC). So the entire configuration 
 is contiguous in memory
...
 If I understand you correctly, removing the size limitation on 
 the type will resolve the issue for you? Even though allocating 
 static data of such a size will still not be allowed?
(1) Why won't allocating such static data be allowed? Gold has no
such limitations.
(2) What do linker limitations have to do with the inner workings of
types?
(3) I may very well allocate the entire config using malloc, or on
the stack (given a huge stack). The linker has nothing to do with it.
 BTW, given globals in C++:

    int a[100];
    float b[200];
    long c;

 there actually is no guarantee that they are allocated 
 contiguously, ... They'd have to be put into a struct to get a 
 guarantee.
As I said (quoted above), the configuration is a struct. It looks like

    struct Config {
        Table!(K1, V1, 10000) myTable;
        Table!(K2, V2, 10000) yourTable;
        Table!(K3, V3, 10000) hisTable;
        Table!(K4, V4, 10000) herTable;
    }

So it is surely contiguous. Table is a static hash table built on top
of a static array, since it has a known capacity. When
`V1.sizeof * 10_000 > 16MB`, it fails to compile.

Btw, I declare everything with `field = void` to prevent the struct
from having a huge init symbol. I take care of initialization at
runtime.

-tomer
Aug 25 2016
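A rough sketch of the pattern described above, under stated assumptions: the real `Table` is not shown in the thread, so the version below is a hypothetical stand-in (linear probing, no removal), the key/value types are placeholders, and the capacities are scaled down. It only illustrates in-place storage, `= void` fields with runtime initialization, and copying the whole configuration as one byte range.

```d
import core.stdc.stdlib : free, malloc;
import core.stdc.string : memcpy;

// Hypothetical stand-in for the Table type from the post: a
// fixed-capacity hash table stored entirely in-place (no pointers).
struct Table(K, V, size_t capacity)
{
    struct Slot { K key; V value; bool used; }

    Slot[capacity] slots = void; // no initializer image for this field

    // Runtime initialization, replacing the skipped default init.
    void reset()
    {
        foreach (ref s; slots)
            s = Slot.init;
    }

    // Linear probing; returns false when the table is full.
    bool put(K key, V value)
    {
        foreach (i; 0 .. capacity)
        {
            auto s = &slots[(hashOf(key) + i) % capacity];
            if (!s.used || s.key == key)
            {
                *s = Slot(key, value, true);
                return true;
            }
        }
        return false;
    }

    // Returns a pointer to the stored value, or null if absent.
    V* get(K key)
    {
        foreach (i; 0 .. capacity)
        {
            auto s = &slots[(hashOf(key) + i) % capacity];
            if (!s.used) return null;
            if (s.key == key) return &s.value;
        }
        return null;
    }
}

struct Config
{
    Table!(uint, ulong, 1000) myTable = void;
    Table!(uint, double, 1000) yourTable = void;
}

void main()
{
    auto cfg = cast(Config*) malloc(Config.sizeof);
    auto copy = cast(Config*) malloc(Config.sizeof);
    assert(cfg !is null && copy !is null);
    scope (exit) { free(cfg); free(copy); }

    cfg.myTable.reset();
    cfg.yourTable.reset();
    cfg.myTable.put(7, 700);

    // The whole configuration is one flat byte range...
    const(ubyte)[] image = (cast(const(ubyte)*) cfg)[0 .. Config.sizeof];

    // ...so "loading" it elsewhere is a single copy.
    memcpy(copy, image.ptr, image.length);
    assert(*copy.myTable.get(7) == 700);
}
```

Because the tables contain no pointers, the byte image stays meaningful after being copied, which is what a pointer-based layout would not give.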
prev sibling parent reply Tomer Filiba <tomerfiliba gmail.com> writes:
On Wednesday, 24 August 2016 at 09:38:43 UTC, Walter Bright wrote:
 1) Why? I know everything in compile-time, why force me to (A) 
 allocate it
 separately in `shared static this()`
 I'm not sure how that hurts anything. It's just a call to malloc().
Static this()es require a topological order on modules, or it blows
up when running the module ctors. We've had many issues with
static-this()es that required us to split modules into "top and
bottom parts", which only serve to break cycles for the ctors (i.e.,
not a logical compilation unit, but a technical one).
Aug 24 2016
parent Walter Bright <newshound2 digitalmars.com> writes:
On 8/24/2016 3:55 AM, Tomer Filiba wrote:
 On Wednesday, 24 August 2016 at 09:38:43 UTC, Walter Bright wrote:
 1) Why? I know everything in compile-time, why force me to (A) allocate it
 separately in `shared static this()`
 I'm not sure how that hurts anything. It's just a call to malloc().
 Static this()es require a topological order on modules, or it blows
 up when running the module ctors. We've had many issues with
 static-this()es that required us to split modules into "top and
 bottom parts", which only serve breaking cycles for the ctors (i.e.,
 not a logical compilation unit, but a technical one).
Splitting them, as you say, works, as does simply creating a function
that does it and calling it when main() starts.
Aug 24 2016
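The alternative suggested here (an ordinary function called at the start of main() instead of a module constructor, with raw pointer access) might look roughly like this. The names `bigTable`, `tableSize` and `initBigTable` are invented for the sketch, and 32 MiB is an arbitrary size above the limit under discussion.

```d
import core.stdc.stdlib : calloc, free;

enum size_t tableSize = 32 * 1024 * 1024; // 32 MiB, arbitrary for the sketch

__gshared ubyte* bigTable; // raw pointer: indexing it is not bounds-checked

// Explicit initializer, called from main() rather than from
// `shared static this()`, so it takes no part in module ctor ordering.
void initBigTable()
{
    bigTable = cast(ubyte*) calloc(tableSize, 1);
    assert(bigTable !is null, "allocation failed");
}

void main()
{
    initBigTable();
    scope (exit) free(bigTable);

    bigTable[tableSize - 1] = 1; // unchecked pointer indexing

    // Where bounds checks are wanted, slice the pointer first.
    ubyte[] checked = bigTable[0 .. tableSize];
    assert(checked[$ - 1] == 1);
}
```

This sidesteps module constructor cycles, but it does not address the contiguity requirement raised earlier in the thread, since the data now lives behind a pointer.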
prev sibling next sibling parent Ivan Kazmenko <gassa mail.ru> writes:
On Wednesday, 24 August 2016 at 07:50:25 UTC, Tomer Filiba wrote:
 #WEKA #INDUSTRY

 I found this post from 2007 
 http://forum.dlang.org/post/fdspch$d3v$1 digitalmars.com that 
 refers to this post from 2006 
 http://www.digitalmars.com/d/archives/digitalmars/D/37038.html#N37071 -- and I
still don't realize, why do static arrays have this size limit on them?
Heh, I'm the OP of the 2006 thread. Ten years passed, and I've
learned to use dynamic arrays in D with negligible loss of
performance most of the time. But of course I'd still like to have
no such artificial limit.

Ivan Kazmenko.
Aug 24 2016
prev sibling parent Yuxuan Shui <yshuiv7 gmail.com> writes:
100% agree this limit should be removed. We do have a workaround 
with dynamic arrays, but that doesn't justify not fixing the 
problem.
Aug 24 2016