digitalmars.D.learn - Confusion over what types have value vs reference semantics

Neurone <dtvictorious gmail.com> writes:

Hi,

Are there universal rules that I can apply to determine which 
types have reference or value semantics? I know that the basic 
primitive C types (int, bool, etc.) have value semantics.

In particular, I'm still trying to understand stack vs 
GC-managed arrays, and slices.

Finally, I have an associative array, and I want to pass it into 
a function. The function only reads data. Would putting ref on 
the function parameter pass it by reference?

Sep 11 2016

rikki cattermole <rikki cattermole.co.nz> writes:

On 12/09/2016 3:15 AM, Neurone wrote:
 Hi,

 Are there universal rules that I can apply to determine what types have
 reference or value semantics? I know that for the basic primitive C
 types (int, bool, etc) has value semantics.

 In particular, I'm still trying to understand  stack vs GC-managed
 arrays, and slices.

 Finally, I have an associative array, and I want to pass it into a
 function. The function is only reading data. Would putting ref on in
 function parameter pass it by reference?

Ok two questions here:
1) What constitutes value vs reference passing
2) Allocation location

So basically you have two "locations" where something can be stored: 
the stack or the heap.
I put that in quotes because both are in RAM somewhere, so it 
doesn't matter too much. The main difference is that only a limited 
part of the stack is available for allocation during any single 
function call.

As for what gets passed by value, that is simple.
If it is a class, you're passing a pointer as your reference.
All primitive types, including pointers, are passed as values.

If you use ref, the parameter automatically becomes a pointer to the 
caller's variable (most likely a location on the stack).

Just so you're aware, arrays are slices in D (excluding static arrays). 
They are simply a pointer + length.
So I suppose you can think of references and slices as a container of 
sorts for other values which get passed in.
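
To make the distinction concrete, here is a minimal sketch. The names PointS, PointC and the bump* functions are made up for illustration and are not from the posts above. It contrasts a struct passed by value, the same struct passed by ref, and a class instance passed through its reference.

```d
import std.stdio : writeln;

struct PointS { int x; } // struct: value type (hypothetical example type)
class  PointC { int x; } // class: reference type (hypothetical example type)

void bumpStruct(PointS p)        { p.x += 1; } // changes a copy only
void bumpStructRef(ref PointS p) { p.x += 1; } // changes the caller's variable
void bumpClass(PointC p)         { p.x += 1; } // changes the shared object

void main()
{
    PointS s;
    bumpStruct(s);
    assert(s.x == 0); // the caller's struct is untouched

    bumpStructRef(s);
    assert(s.x == 1); // ref made the change visible

    auto c = new PointC;
    bumpClass(c);
    assert(c.x == 1); // both references point at one object

    writeln(s.x, " ", c.x);
}
```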

Sep 11 2016

Mike Parker <aldacron gmail.com> writes:

On Sunday, 11 September 2016 at 15:15:09 UTC, Neurone wrote:
 Hi,

 Are there universal rules that I can apply to determine what 
 types have reference or value semantics? I know that for the 
 basic primitive C types (int, bool, etc) has value semantics.

Primitive types (int, bool, etc) and structs are passed by value 
unless a function parameter is annotated with 'ref'.

Classes are reference types, so given instance foo of class Foo, 
foo itself is a reference.

For arrays, it's easiest to think of one as a struct with two 
fields: length and ptr. The ptr field points to the memory where 
the array data is stored. When you pass an array to a function, 
the length & ptr are passed by value (it's a "slice"). That means 
that modifying the length of the array by adding or removing 
elements or attempting to change where ptr is pointing will only 
modify the local copy. You can modify the array elements in the 
original array (manipulate their fields, overwrite them, and 
such), but you can't modify the structure (length or pointer) of 
the original array unless the parameter is annotated with ref.
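
A small sketch of that behaviour, with illustrative function names (setFirst, appendLocal, appendRef) that are not part of any library:

```d
void setFirst(int[] arr)      { arr[0] = 99; } // writes through ptr: caller sees it
void appendLocal(int[] arr)   { arr ~= 4; }    // only the local length/ptr change
void appendRef(ref int[] arr) { arr ~= 4; }    // the caller's length/ptr change

void main()
{
    int[] data = [1, 2, 3];

    setFirst(data);
    assert(data == [99, 2, 3]);    // element change is visible

    appendLocal(data);
    assert(data == [99, 2, 3]);    // caller's slice still has three elements

    appendRef(data);
    assert(data == [99, 2, 3, 4]); // with ref the append reaches the caller
}
```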

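Although AAs don't have a ptr, they work similarly: you can 
modify the contents of an existing AA without ref, but changes 
to the AA variable itself (such as the allocation triggered by 
the first insert into an empty, null AA) only reach the caller 
with ref.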

 In particular, I'm still trying to understand  stack vs 
 GC-managed arrays, and slices.

Steven's article on slices should help [1]. It also helps if you 
just think of all dynamic arrays as slices.

int[] foo;  // slice with length & ptr; no memory allocated for elements
int[3] bar; // static array; three ints allocated on the stack (for a local)

The memory for foo needs to be allocated somewhere. It might be 
the GC-managed heap, it might be malloc, it might even be stack 
memory. Doesn't matter.

You cannot modify the length of bar, but you can slice it:

auto barSlice = bar[];

And here, no memory is allocated. barSlice.ptr is the same as 
bar.ptr and barSlice.length is the same as bar.length. However, 
if you append a new element:

barSlice ~= 10;

The GC will allocate memory for a new array and barSlice will no 
longer point to bar. It will now have four elements.
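
The same steps as a runnable sketch, with asserts in place of prose (the initial values 1, 2, 3 are just for illustration):

```d
void main()
{
    int[3] bar = [1, 2, 3];
    auto barSlice = bar[];

    // Slicing allocates nothing: the slice simply views bar's memory.
    assert(barSlice.ptr is bar.ptr);
    assert(barSlice.length == bar.length);

    barSlice[0] = 42;
    assert(bar[0] == 42); // writes go through to bar

    barSlice ~= 10; // no room after bar, so the GC allocates a new block
    assert(barSlice.ptr !is bar.ptr);
    assert(barSlice.length == 4);
    assert(bar.length == 3);

    barSlice[0] = 7;
    assert(bar[0] == 42); // bar and barSlice are now independent
}
```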

 Finally, I have an associative array, and I want to pass it 
 into a function. The function is only reading data. Would 
 putting ref on in function parameter pass it by reference?

You can easily test this:

```
void addElem(int[string] aa, string key, int val) {
     // aa is a copy of the caller's AA reference
     aa[key] = val;
}

void main()
{
     import std.stdio : writeln;
     int[string] map; // never assigned to, so still null
     map.addElem("key", 10);
     writeln("The aa was modified: ", ("key" in map) !is null);
}
```

AAs behave like arrays. The meta data (like the array length and 
ptr) is passed by value, the data by reference. The above should 
print false: map is still null when it is passed, so the insert 
allocates a new AA that only the function's local copy refers 
to. Add ref to the function parameter and it will print true. 
However, once the AA has been initialized in the caller, the 
function can modify it without ref, because both copies refer 
to the same underlying table. The following will print 10 
whether the aa parameter is ref or not.

```
void addElem(int[string] aa, string key, int val) {
     aa[key] = val;
}

void main()
{
     int[string] map;
     map["key"] = 5;
     map.addElem("key", 10);
     import std.stdio : writeln;
     writeln("The value of key is ", map["key"]);
}
```
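
As a further sketch (the variable names and keys are made up, and this reflects how AAs behave in the D runtime as far as I know), the difference comes down to whether the AA is still null when it is passed, not to whether the key already exists:

```d
void addElem(int[string] aa, string key, int val)
{
    aa[key] = val;
}

void main()
{
    int[string] empty; // never assigned to: a null AA
    assert(empty is null);
    empty.addElem("a", 1);
    assert("a" !in empty); // the insert allocated a table only the callee saw

    int[string] map = ["x": 0]; // already initialized
    map.addElem("b", 2);
    assert("b" in map); // a brand new key is visible without ref
    assert(map["b"] == 2);
}
```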
[1] https://dlang.org/d-array-article.html

Sep 11 2016

Mike Parker <aldacron gmail.com> writes:

On Sunday, 11 September 2016 at 16:10:04 UTC, Mike Parker wrote:

 And here, no memory is allocated. barSlice.ptr is the same as 
 bar.ptr and barSlice.length is the same as bar.length. However, 
 if you append a new element:

 barSlice ~= 10;

 The GC will allocate memory for a new array and barSlice will 
 no longer point to bar. It will now have four elements.

I should clarify that this holds true for all slices, not just 
slices of static arrays. The key point is that appending to a 
slice always allocates if the .capacity property of the slice 
is 0, and otherwise allocates only when the new length would 
exceed the capacity. Slices of static arrays will always have 
a capacity of 0. Slices of slices might not, i.e. there may be 
room in the memory block for more elements.
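
A hedged sketch of how .capacity relates to appending. It uses the built-in reserve and capacity array properties; the sizes and values are arbitrary.

```d
void main()
{
    int[3] fixed;
    auto s = fixed[];
    assert(s.capacity == 0); // a slice of a static array cannot grow in place

    int[] a;
    a.reserve(8); // ask the GC for room for at least 8 ints
    assert(a.capacity >= 8);

    a ~= 1;
    auto p = a.ptr;
    a ~= 2;
    assert(a.ptr is p); // spare capacity, so the append happened in place

    auto head = a[0 .. 1]; // does not end where the used data ends
    assert(head.capacity == 0);

    head ~= 99; // must reallocate, otherwise it would overwrite a[1]
    assert(head.ptr !is a.ptr);
    assert(a[1] == 2);
}
```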

Sep 11 2016

Neurone <dtvictorious gmail.com> writes:

On Sunday, 11 September 2016 at 16:14:59 UTC, Mike Parker wrote:
 On Sunday, 11 September 2016 at 16:10:04 UTC, Mike Parker wrote:

 And here, no memory is allocated. barSlice.ptr is the same as 
 bar.ptr and barSlice.length is the same as bar.length. 
 However, if you append a new element:

 barSlice ~= 10;

 The GC will allocate memory for a new array and barSlice will 
 no longer point to bar. It will now have four elements.

 I should clarify that this holds true for all slices, not just 
 slices of static arrays. The key point is that appending to a 
 slice will only allocate if the the .capacity property of the 
 slice is 0. Slices of static arrays will always have a capacity 
 of 0. Slices of slices might not, i.e. there may be room in the 
 memory block for more elements.

Thanks for the detailed answer. I still don't get the advantage 
of passing slices into functions by value while still allowing 
modification of the elements of the original array. Is there a 
way to specify that a true independent copy of an array should 
be passed into the function? E.g., in C++, func(std::vector<int> v) 
causes a copy of the argument to be passed in.

Sep 13 2016

Jonathan M Davis via Digitalmars-d-learn writes:

On Tuesday, September 13, 2016 15:27:07 Neurone via Digitalmars-d-learn wrote:
 Thanks for the detailed answer. I still don't get the advantage
 of passing slices into functions by value allowing modification
 to elements of the original array.  Is there an way to specify
 that a true independent copy of an array should be passed into
 the function? E.g, in c++ func(Vector<int> v) causes a copy of
 the argument to be passed in.

Slices are a huge performance boost (e.g. the fact that we have slicing like
this for strings makes parsing code _way_ more efficient by default than
would ever be the case for something like std::string). If you're worried
about a function mutating the elements of an array that it's given, then you
can always mark them with const. e.g.

auto foo(const(int)[] arr) {...}

But there is no way to force a naked dynamic array to do a deeper copy when
passed. If you're worried about it, you can explicitly call dup to create a
copy of the array rather than slice it - e.g. foo(arr.dup) - but the
function itself can't enforce that behavior. The closest that it could come
would be to dup the parameter itself. If you were going to do that, you'd
want to make it clear in the documentation, since it's not a typical thing
to do: someone who wanted to ensure that an array passed to the function
didn't get mutated would dup it themselves, which would result in two dups
if your function did the dup as well.
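
A minimal sketch of both options. The functions sum and zeroAll are hypothetical, written only to illustrate the point.

```d
// const(int)[] promises not to modify the elements it is given.
int sum(const(int)[] arr)
{
    // arr[0] = 1; // would not compile: elements are const
    int total = 0;
    foreach (x; arr)
        total += x;
    return total;
}

// A function that does mutate its argument's elements.
void zeroAll(int[] arr)
{
    arr[] = 0;
}

void main()
{
    int[] data = [1, 2, 3];
    assert(sum(data) == 6);

    zeroAll(data.dup); // the caller passes a copy
    assert(data == [1, 2, 3]); // original untouched

    zeroAll(data); // the caller passes a slice of the original
    assert(data == [0, 0, 0]);
}
```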

If you want a dynamic array to be duped every time it's passed to a function
or otherwise copied, you'd need to create a wrapper struct with a postblit
constructor that called dup. That would generally make for unnecessarily
inefficient code though. The few containers in std.container are all
reference types for the same reason - containers which copy by default make
it way too easy to accidentally copy them and are arguably a bad default
(though obviously, there are cases where that would be the best behavior).
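
For illustration only, such a wrapper might look like the sketch below. CopyingArray and zeroFirst are made-up names, and as noted above, copying on every pass is usually not what you want.

```d
struct CopyingArray
{
    int[] data;

    // Postblit: runs after the struct has been copied field by field,
    // and replaces the shared slice with an independent duplicate.
    this(this) { data = data.dup; }
}

void zeroFirst(CopyingArray arr)
{
    arr.data[0] = 0; // only touches the callee's private copy
}

void main()
{
    auto original = CopyingArray([1, 2, 3]);

    zeroFirst(original); // passing by value triggers the postblit
    assert(original.data == [1, 2, 3]);

    auto copy = original; // so does plain copying
    copy.data[0] = 42;
    assert(original.data[0] == 1);
}
```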

So, the typical thing to do with dynamic arrays is to use const or immutable
elements when you want to ensure that they don't get mutated when passing
them around and duping or iduping a dynamic array when you want to ensure
that you have a copy of the array rather than a slice.

- Jonathan M Davis

Sep 13 2016
