digitalmars.D - Memory footprint of multi-dimensional arrays

Ben Mitchell (6/6) Apr 18 2008 So, I was watching my D program eat up ~1GB of ram doing something my ba...

BCS (9/20) Apr 18 2008 one (kinda ugly) solution would be to allocate the whole thing as a

Simen Kjaeraas (38/64) Apr 18 2008 :

BCS (5/10) Apr 18 2008 without going to special types, it's a small as you will get, no extra
Ben Mitchell (3/34) Apr 18 2008 Yeah, that was more or less what I'd come up with, too; it just seemed s...

Bill Baxter (8/43) Apr 18 2008 May be too heavy-weight for your needs, but my Murray lib supports

Sean Kelly (18/23) Apr 18 2008 calculation said should only be taking a few hundred MB max, and I start...
Fawzi Mohamed (4/26) Apr 19 2008 you probably want to look at the multiarray library:
Bruno Medeiros (5/8) Apr 25 2008 ubyte[3][] x = new ubyte[3][](10485760);

Ben Mitchell <Arisian gmail.com> writes:

So, I was watching my D program eat up ~1GB of ram doing something my back of
the envelope calculation said should only be taking a few hundred MB max, and I
started playing around.  Turns out that if I do:

ubyte[][] b = new ubyte[][](3, 10485760);

It takes about 30MB of main memory, as would be expected.  If, on the other
hand, I reverse the order of the indexing and do:

ubyte[][] b = new ubyte[][](10485760, 3);

It takes about 250MB of main memory.  This seems totally absurd to me; I can
only assume this is because it's an array of (potentially arbitrary size)
dynamic arrays, and there's ~220MB of bookkeeping information being stored.  
Is there any way around this?  Can I somehow tell the compiler it's a symmetric
array?  Can I make a dynamic array of static arrays of size 3?  I can reverse
the indexing if I have to, but it would make the rest of my code very
counter-intuitive, and it just doesn't seem like it should make an order of
magnitude difference in memory usage.

   -  Ben

Apr 18 2008

BCS <BCS pathlink.com> writes:

Ben Mitchell wrote:
 So, I was watching my D program eat up ~1GB of ram doing something my back of
the envelope calculation said should only be taking a few hundred MB max, and I
started playing around.  Turns out that if I do:
 
 ubyte[][] b = new ubyte[][](3, 10485760);
 
 It takes about 30MB of main memory, as would be expected.  If, on the other
hand, I reverse the order of the indexing and do:
 
 ubyte[][] b = new ubyte[][](10485760, 3);
 
 It takes about 250MB of main memory.  This seems totally absurd to me; I can
only assume this is because it's an array of (potentially arbitrary size)
dynamic arrays, and there's ~220MB of bookkeeping information being stored.  
Is there any way around this?  Can I somehow tell the compiler it's a symmetric
array?  Can I make a dynamic array of static arrays of size 3?  I can reverse
the indexing if I have to, but it would make the rest of my code very
counter-intuitive, and it just doesn't seem like it should make an order of
magnitude difference in memory usage.
 
    -  Ben

one (kinda ugly) solution would be to allocate the whole thing as a 
single huge ubyte[] array and slice out the 1M 3 byte arrays:


auto tmp = new ubyte[](3*10485760 +  5);

ubyte[][] b;
b.length = 10485760;
foreach(uint i, ref bp; b) bp = tmp[i*3.. (i+1)*3];

A little template magic might be able to do this recursively for nested 
dynamic arrays

Apr 18 2008

"Simen Kjaeraas" <simen.kjaras gmail.com> writes:

On Fri, 18 Apr 2008 18:15:45 +0200, BCS <BCS pathlink.com> wrote:

 Ben Mitchell wrote:
 So, I was watching my D program eat up ~1GB of ram doing something my=


  =

 back of the envelope calculation said should only be taking a few  =


 hundred MB max, and I started playing around.  Turns out that if I do=


:
  ubyte[][] b =3D new ubyte[][](3, 10485760);
  It takes about 30MB of main memory, as would be expected.  If, on th=


e  =

 other hand, I reverse the order of the indexing and do:
  ubyte[][] b =3D new ubyte[][](10485760, 3);
  It takes about 250MB of main memory.  This seems totally absurd to m=


e;  =

 I can only assume this is because it's an array of (potentially  =


 arbitrary size) dynamic arrays, and there's ~220MB of bookkeeping  =


 information being stored.   Is there any way around this?  Can I  =


 somehow tell the compiler it's a symmetric array?  Can I make a dynam=


ic  =

 array of static arrays of size 3?  I can reverse the indexing if I ha=


ve  =

 to, but it would make the rest of my code very counter-intuitive, and=


  =

 it just doesn't seem like it should make an order of magnitude  =


 difference in memory usage.
     -  Ben

 one (kinda ugly) solution would be to allocate the whole thing as a  =

 single huge ubyte[] array and slice out the 1M 3 byte arrays:


 auto tmp =3D new ubyte[](3*10485760 +  5);

 ubyte[][] b;
 b.length =3D 10485760;
 foreach(uint i, ref bp; b) bp =3D tmp[i*3.. (i+1)*3];

 A little template magic might be able to do this recursively for neste=

d  =

 dynamic arrays


Wouldn't that end up taking just as much space?  b would be an array of
10485760 slices, each having a pointer/length pair taking up 8 bytes eac=
h,  =

no?
Might save some due to GC wanting bigger slices than 3 bytes, but still.=


Now, using a struct with overloaded opIndex (and opSlice, if you want)
should be lighter. Something like this (untested)?


struct Foo
{
   ubyte[] data;
   size_t sz1, sz2;

   static Foo opCall(size_t size1, size_t size2)
   {
     Foo tmp;
     tmp.data.length =3D size1 * size2;
     tmp.sz1 =3D size1;
     tmp.sz2 =3D size2;
     return tmp;
   }

   ubyte[] opIndex(size_t index)
   {
     return data[index * size2..index * size2 + size1];
   }

   void opIndexAssign(ubyte[] value, size_t index)
   {
     data[index * size2..index * size2 + size1] =3D value;
   }
}

-- Simen

Apr 18 2008

BCS <BCS pathlink.com> writes:

Simen Kjaeraas wrote:

 
 Wouldn't that end up taking just as much space?  b would be an array of
 10485760 slices, each having a pointer/length pair taking up 8 bytes 
 each,  no?

yup.

 Might save some due to GC wanting bigger slices than 3 bytes, but still.

without going to special types, it's a small as you will get, no extra 
GC overhead in alignment or bookkeeping. (I guess you could allocate the 
whole thing in one go but then you mix pointers and non pointers.)

Apr 18 2008

Ben Mitchell <Arisian gmail.com> writes:

Simen Kjaeraas Wrote:
 
 Now, using a struct with overloaded opIndex (and opSlice, if you want)
 should be lighter. Something like this (untested)?
 
 
 struct Foo
 {
    ubyte[] data;
    size_t sz1, sz2;
 
    static Foo opCall(size_t size1, size_t size2)
    {
      Foo tmp;
      tmp.data.length = size1 * size2;
      tmp.sz1 = size1;
      tmp.sz2 = size2;
      return tmp;
    }
 
    ubyte[] opIndex(size_t index)
    {
      return data[index * size2..index * size2 + size1];
    }
 
    void opIndexAssign(ubyte[] value, size_t index)
    {
      data[index * size2..index * size2 + size1] = value;
    }
 }
 
 -- Simen

Yeah, that was more or less what I'd come up with, too; it just seemed sort of
inelegant, and I was hoping that there was some clean, built in way to do this
that I just didn't know about.  If there's not, I think that some built-in
syntax for non-ragged multi dimensional arrays would be nice, given that it's a
pretty common thing to need in scientific computation, and it doesn't seem like
it should be that hard an addition to make.

   - Ben

Apr 18 2008

Bill Baxter <dnewsgroup billbaxter.com> writes:

Ben Mitchell wrote:
 Simen Kjaeraas Wrote:
 Now, using a struct with overloaded opIndex (and opSlice, if you want)
 should be lighter. Something like this (untested)?


 struct Foo
 {
    ubyte[] data;
    size_t sz1, sz2;

    static Foo opCall(size_t size1, size_t size2)
    {
      Foo tmp;
      tmp.data.length = size1 * size2;
      tmp.sz1 = size1;
      tmp.sz2 = size2;
      return tmp;
    }

    ubyte[] opIndex(size_t index)
    {
      return data[index * size2..index * size2 + size1];
    }

    void opIndexAssign(ubyte[] value, size_t index)
    {
      data[index * size2..index * size2 + size1] = value;
    }
 }

 -- Simen

 
 Yeah, that was more or less what I'd come up with, too; it just seemed sort of
inelegant, and I was hoping that there was some clean, built in way to do this
that I just didn't know about.  If there's not, I think that some built-in
syntax for non-ragged multi dimensional arrays would be nice, given that it's a
pretty common thing to need in scientific computation, and it doesn't seem like
it should be that hard an addition to make.
 
    - Ben

May be too heavy-weight for your needs, but my Murray lib supports 
arbitrary N-dimensional arrays using a single allocation strategy like that.

And my DFLAT lib supports matrices (just 2-dimensional arrays) in a 
similar way (but also supports different storage formats like banded, 
triangular, and various sparse matrix formats).

both are at http://www.dsource.org/projects/multiarray

--bb

Apr 18 2008

Sean Kelly <sean invisibleduck.org> writes:

== Quote from Ben Mitchell (Arisian gmail.com)'s article
 So, I was watching my D program eat up ~1GB of ram doing something my back of
the envelope

calculation said should only be taking a few hundred MB max, and I started
playing around.  Turns out
that if I do:
 ubyte[][] b = new ubyte[][](3, 10485760);
 It takes about 30MB of main memory, as would be expected.  If, on the other
hand, I reverse the order

of the indexing and do:
 ubyte[][] b = new ubyte[][](10485760, 3);
 It takes about 250MB of main memory.  This seems totally absurd to me; I can
only assume this is

because it's an array of (potentially arbitrary size) dynamic arrays, and
there's ~220MB of bookkeeping
information being stored.   Is there any way around this?  Can I somehow tell
the compiler it's a
symmetric array?  Can I make a dynamic array of static arrays of size 3?  I can
reverse the indexing if I
have to, but it would make the rest of my code very counter-intuitive, and it
just doesn't seem like it
should make an order of magnitude difference in memory usage.

In the first case, you have an array containing three pointers to arrays of
10485760 bytes each, for a
minimum total of (3 * 4) + (3 * 10485760) bytes -- around 30 MB.  In the second
case, you have an
array containing 10485760 pointers to arrays of 3 bytes each, for a minimum
total of (10485760 * 4) +
(10485760 * 3) -- about 200 MB.  Now given the actual GC implementation, the
minimum block size
allocated is actually 16 bytes rather than 3 bytes so the second number is
actually (10485760 * 16),
bringing the memory used for the allocation to roughly 300 MB by my estimation.
 The basic issue here
is that this is a ragged array so the memory is not allocated in one large
contiguous block but rather a
bunch of linked dynamic blocks.


Sean

Apr 18 2008

Fawzi Mohamed <fmohamed mac.com> writes:

On 2008-04-18 18:04:17 +0200, Ben Mitchell <Arisian gmail.com> said:

 So, I was watching my D program eat up ~1GB of ram doing something my 
 back of the envelope calculation said should only be taking a few 
 hundred MB max, and I started playing around.  Turns out that if I do:
 
 ubyte[][] b = new ubyte[][](3, 10485760);
 
 It takes about 30MB of main memory, as would be expected.  If, on the 
 other hand, I reverse the order of the indexing and do:
 
 ubyte[][] b = new ubyte[][](10485760, 3);
 
 It takes about 250MB of main memory.  This seems totally absurd to me; 
 I can only assume this is because it's an array of (potentially 
 arbitrary size) dynamic arrays, and there's ~220MB of bookkeeping 
 information being stored.   Is there any way around this?  Can I 
 somehow tell the compiler it's a symmetric array?  Can I make a dynamic 
 array of static arrays of size 3?  I can reverse the indexing if I have 
 to, but it would make the rest of my code very counter-intuitive, and 
 it just doesn't seem like it should make an order of magnitude 
 difference in memory usage.
 
    -  Ben

you probably want to look at the multiarray library:
http://www.dsource.org/projects/multiarray/

Fawzi

Apr 19 2008

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

Ben Mitchell wrote:
 
 Can I make a dynamic array of static arrays of size 3? 
 

ubyte[3][] x = new ubyte[3][](10485760);


-- 
Bruno Medeiros - Software Developer, MSc. in CS/E graduate
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Apr 25 2008

D Programming

C/C++ Programming

Other

digitalmars.D - Memory footprint of multi-dimensional arrays