digitalmars.D - Question about arrays

reply Etienne <etcimon gmail.com> writes:
I'm trying to pinpoint the difference in array handling of stack arrays 
vs heap arrays. More precisely, trying to understand how to use arrays 
with malloc data. Here's a sample of the experiment:

------
import std.stdio;
import std.typecons;
auto callit(){

	ubyte[50] heapSrc1 = new ubyte[50];
	ubyte[50] stackSrc1;
	ubyte[] heapDst1 = new ubyte[50];
	ubyte[] heapDst2 = new ubyte[50];
	ubyte[50] stackDst1;
	ubyte[50] stackDst2;
	stackSrc1[5] = 4;
	heapSrc1[0] = 10;
	heapDst1 = heapSrc1[0..50];
	heapDst2 = stackSrc1[0..50];
	stackDst1 = heapSrc1[0..50];
	stackDst2 = stackSrc1[0..50];
	heapSrc1[5] = 10;
	heapDst1[2] = 5;heapDst2[2] = 5;stackSrc1[3] = 10;
	return tuple(heapDst1,heapDst2,stackDst1,stackDst2);
}

void main()
{
	auto a = callit();
	a[0][1] = 5;
	foreach (id, el ; a)
		writeln(id, " => ", el);
}
-------

I found that copying on the heap has to be explicitly specified, while 
copying to the stack is forced:
Works: heap1[] = heap2[]; heap1[] = stack1[];
Doesn't: heap1 = stack1[]; heap1 = heap2[];

Which brings me to the point: if I understand D's return semantics 
correctly, the variables would be copied on the stack into a new tuple. 
What I don't understand is why returning from callit() doesn't copy 
the heap data onto the stack as expected. Even if it copies just the 
pointer to the data (which would explain heapDst2's data being lost, 
since it was on the stack), why does the data at heapDst1 and heapSrc1 
get collected by the GC and point to gibberish when callit() returns?

Thanks
Jan 22 2014
parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
I think you've got a misunderstanding of what int[] and int[N] 
are. They aren't heap vs stack, they are pointer+length vs memory 
block.

int[50] block;
int[] slice = block[];

The second line does NOT copy the memory: it is more like this in 
C:

int block[50];

// int[] in D is a pointer+length pair so two variables in C:
int* slicePtr = NULL;
size_t sliceLength = 0;

// slice = block[] translates to this:

slicePtr = &block[0];
sliceLength = 50; // this is gotten from the static length


So, when you return slice, you are actually returning a pointer 
to a stack array. The GC isn't involved at all; returning from 
the function simply means the stack memory gets reused.
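
To check this from D itself, here is a minimal sketch. The helper 
names sameMemory and makeCopy are made up for illustration; .ptr and 
.dup are the built-in array properties. It shows that slicing a 
static array copies nothing, and that .dup is one way to get a copy 
that outlives the function:

```d
import std.stdio;

// Hypothetical helper: true if `slice` refers to exactly the memory of `block`.
bool sameMemory(const int[] slice, ref const int[50] block)
{
    return slice.ptr is block.ptr && slice.length == block.length;
}

// Hypothetical helper: returns the contents of a stack array safely.
// .dup allocates a GC-owned copy, so the result stays valid after return.
int[] makeCopy()
{
    int[50] block;      // lives in this function's stack frame
    block[5] = 4;
    return block[].dup; // returning block[] directly would dangle
}

void main()
{
    int[50] block;
    int[] slice = block[];             // pointer+length only, no copy
    writeln(sameMemory(slice, block)); // true
    writeln(makeCopy()[5]);            // 4
}
```
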

Now, the other way:

block = slice[0 .. 50];



A memory block (static array, int[N]) is a value type in D; it is 
just a big block of bytes, not a pointer. So what happens here is:

memcpy(&block, slice.ptr, slice.length * sizeof(int)); // size is in bytes
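
The copy can be observed from D as well. This is only a sketch: 
copyIntoBlock is a made-up name, and the slice is assumed to hold at 
least 50 elements, since D requires both sides of the copy to have 
the same length. Because int[50] is a value type, it is also returned 
by value:

```d
import std.stdio;

// Hypothetical helper: copies the first 50 ints of `slice` into a
// static array and returns that array by value.
int[50] copyIntoBlock(int[] slice)
{
    int[50] block;
    block = slice[0 .. 50]; // copies 50 ints into block
    return block;           // the whole block is copied out to the caller
}

void main()
{
    int[] slice = new int[](50);
    slice[0] = 10;
    int[50] block = copyIntoBlock(slice);
    slice[0] = 99;     // changing the source afterwards...
    writeln(block[0]); // ...does not affect the copy: prints 10
}
```
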


Contrast that to:

int[] a = slice;

which compiles (conceptually) into:

a.ptr = slice.ptr;
a.length = a.length;
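
Reading the second line as `a.length = slice.length`, the effect is 
that both variables describe the same memory. A short sketch of that 
aliasing (the helper name aliases is an assumption for this example):

```d
import std.stdio;

// Hypothetical helper: true if both slices have the same pointer and length.
bool aliases(const int[] a, const int[] b)
{
    return a.ptr is b.ptr && a.length == b.length;
}

void main()
{
    int[] slice = new int[](50);
    int[] a = slice;            // copies pointer+length, not the elements
    a[2] = 5;                   // writes through to the shared memory
    writeln(slice[2]);          // 5
    writeln(aliases(a, slice)); // true

    int[] c = slice.dup;        // an independent copy of the elements
    c[2] = 9;
    writeln(slice[2]);          // still 5
}
```
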



Now, when you do:

int[] a = new int[](50);

what happens is more like:

// new int[](50)
size_t length = 50;
int* ptr = malloc(sizeof(int) * length); // well, GC.malloc in D

// a = the return value of new
a.ptr = ptr;
a.length = length;
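
Since the question was about malloc data: the same pointer+length 
idea lets you wrap memory from C's malloc in a slice by slicing the 
raw pointer. A minimal sketch, assuming the caller takes care of 
calling free; mallocSlice is a made-up name, while malloc and free 
come from core.stdc.stdlib:

```d
import core.stdc.stdlib : malloc, free;
import std.stdio;

// Hypothetical helper: wraps `count` bytes of malloc memory in a slice.
// The GC does not own this memory; the caller must free(result.ptr).
ubyte[] mallocSlice(size_t count)
{
    auto p = cast(ubyte*) malloc(count);
    if (p is null)
        return null;           // allocation failed
    ubyte[] s = p[0 .. count]; // slicing a pointer builds pointer+length
    s[] = 0;                   // malloc memory starts out uninitialized
    return s;
}

void main()
{
    ubyte[] buf = mallocSlice(50);
    scope(exit) free(buf.ptr); // free(null) is harmless if malloc failed
    if (buf is null)
        return;
    buf[0] = 10;
    writeln(buf.length);       // 50
}
```
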




Bottom line: assigning or returning slices, without the [] 
syntax on the left, always just copies a pointer+length pair. 
Assigning or returning static arrays (blocks of memory) ALWAYS 
copies the contents.

Doing [] at the end of a block on the right hand side:

int[] a = block[];

is a bit different, since [] on the right hand side means "give 
me a pointer+length combo to block"

And using it on the left hand side forces a copy operation:

a[] = block[]; // copy into a the data pointed to on the right hand side
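
A small sketch of the left-hand [] form (copyInto is a made-up 
helper; the destination is assumed to already have the same length 
as the source, which D requires for this kind of copy):

```d
import std.stdio;

// Hypothetical helper: element-wise copy into existing storage.
void copyInto(int[] dst, const int[] src)
{
    dst[] = src[]; // copies the elements; dst keeps its own memory
}

void main()
{
    int[50] block;
    block[5] = 4;

    int[] a = new int[](50);     // destination must already be allocated
    copyInto(a, block[]);
    block[5] = 0;                // later changes to block...
    writeln(a[5]);               // ...do not show up in a: prints 4
    writeln(a.ptr is block.ptr); // false
}
```
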



Does this explain it any better?
Jan 22 2014
parent reply Etienne <etcimon gmail.com> writes:
On 2014-01-22 3:33 PM, Adam D. Ruppe wrote:
 A memory block (static array, int[N]) is a value type in D; it is just a
 big block of bytes, not a pointer. So what happens here is:

 memcpy(&block, slice.ptr, slice.length);


 Contrast that to:

 int[] a = slice;

 which compiles (conceptually) into:

 a.ptr = slice.ptr;
 a.length = a.length;

It's funny that you wrote this `a.length = a.length`, because I was 
currently looking at 
https://d.puremagic.com/issues/show_bug.cgi?id=11970 right before I 
read your message.

But yes, static array / memory blocks vs pointer+len explains it 
better, I'm trying to grow an idea of the concept of stacks but it's 
harder to work with, I thought it would help me catch an idea of why 
I couldn't prevent `ubyte[1024*1024] ub;` from crashing. While 
working with buffers, this is going to come in handy!
Jan 22 2014
parent "Ivan Kazmenko" <gassa mail.ru> writes:
On Wednesday, 22 January 2014 at 21:46:08 UTC, Etienne wrote:
 But yes, static array / memory blocks vs pointer+len explains 
 it better, I'm trying to grow an idea of the concept of stacks 
 but it's harder to work with, I thought it would help me catch 
 an idea of why I couldn't prevent `ubyte[1024*1024] ub;` from 
 crashing. While working with buffers, this is going to come in 
 handy!

That could simply be a stack overflow error. On Windows, the maximum 
stack size is compiled into the executable, and the default for dmd 
is 1 mebibyte (you can, for example, add the flag 
"-L/STACK:268435456" to set it to 256 mebibytes instead). On Linux, 
you can control the stack size externally with the ulimit tool.

Ivan Kazmenko.
Jan 22 2014
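
If the crash is indeed a stack overflow, one way around it without 
touching the stack limit is to keep the large buffer off the stack 
and hold only a slice to it. A minimal sketch, with makeBuffer as a 
made-up helper name:

```d
import std.stdio;

// Hypothetical helper: allocates the buffer on the GC heap, so only the
// pointer+length pair occupies stack space.
ubyte[] makeBuffer(size_t size)
{
    return new ubyte[](size); // elements are zero-initialized
}

void main()
{
    enum size = 1024 * 1024;
    ubyte[] ub = makeBuffer(size); // instead of ubyte[1024*1024] ub;
    ub[size - 1] = 1;
    writeln(ub.length);            // 1048576
}
```
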