
digitalmars.D - Documentation of D arrays

Sebastian Biallas <groups.5.sepp spamgourmet.com> writes:
Hello!

I'm trying to understand array handling in D. Unfortunately the official
documentation[1] is not very helpful.

[1] http://www.digitalmars.com/d/arrays.html

By trial and error I found out that arrays are passed by some COW magic
(where is this documented?). So, if I want changes to an array to be
visible to the caller, I have to pass it as an inout parameter
(this works, but is it the canonical way?).

Next question: How can I initialize an array?
It seems like COW works only for parameters. E.g.:

void foo(inout char[] s)
{
        s = "blub";
}
void bar()
{
	char[] s;
	foo(s);
	s[1] = 'a'; // will crash
}

So, how can I copy the string "blub" into s? s[] = "blub" doesn't work
because the .length won't be adjusted.
Oh, while writing this I noticed that "blub".dup does work. Is this the
preferred way, or should I manually alter the .length?

So what exactly is T[]? According to the documentation it's a tuple
(pointer, length). So, if I pass a T[] to a function, pointer and length
are passed by value (unless I specify an (in)out parameter)? Is this
some array magic, or can I use this for my own types?

I also found out that I can write
void foo(inout int[] a)
{
	a ~= 1;
}
So "~=" supports not only T[] but also T as the RHS. Where is the
documentation for this?

Sorry if these are obvious questions, but I can't figure this out from
the official documentation (or I'm blind).

Regards,
Sebastian
Jan 11 2007
Sean Kelly <sean f4.ca> writes:
Sebastian Biallas wrote:
 Hello!
 
 I'm trying to understand array handling in D. Unfortunately the official
 documentation[1] is not very helpful.
 
 [1] http://www.digitalmars.com/d/arrays.html
 
 By trial and error I found out that arrays are passed by some COW magic
 (where is this documented?). So, if I want changes to an array to be
 visible to the caller, I have to pass it as an inout parameter
 (this works, but is it the canonical way?).

Almost. Dynamic arrays are declared internally like so in D:

    struct Array
    {
        size_t len;
        byte*  ptr;
    }

So passing a dynamic array by value is essentially the same as passing around a pointer. The only effect adding 'inout' to your function will have is that the length of the array can be altered and those changes will persist when the function completes. There is a brief mention of this in:

http://www.digitalmars.com/d/function.html

"For dynamic array and object parameters, which are passed by reference, in/out/inout apply only to the reference and not the contents."
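Here is a minimal sketch of what that layout implies. It is written in present-day D2, where the D1 inout parameter from this thread is spelled ref, and the function names are hypothetical. When an array is passed by value, element writes reach the caller. A length change reaches the caller only when the array is passed with ref.

```d
// Passing a dynamic array by value copies only its (length, pointer)
// pair: element writes reach the caller, length changes do not.
void byValue(int[] a)
{
    a[0] = 42;     // writes through the shared pointer; the caller sees it
    a.length = 1;  // changes only this function's copy of the length
}

// ref (D1: inout) passes the caller's (length, pointer) pair itself.
void byRef(ref int[] a)
{
    a.length = 1;  // the caller's array now has length 1 as well
}
```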
 Next question: How can I initialize an array?
 It seems like COW works only for parameters. E.g.:
 
 void foo(inout char[] s)
 {
         s = "blub";
 }
 void bar()
 {
 	char[] s;
 	foo(s);
 	s[1] = 'a'; // will crash
 }

Doing

    s = "blub";

allocates no memory, but rather just changes Array.ptr to point to "blub" and sets Array.len appropriately. The above code will actually work in Windows because the data segment where string constants are stored is not read-only.
 So, how can I copy the string "blub" into s? s[] = "blub" doesn't work
 because the .length won't be adjusted.

    s = "blub".dup;
 Oh, while writing this I noticed that "blub".dup does work. Is this the
 preferred way, or should I manually alter the .length?

Yes :-)
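Below is a short sketch of both ways to get a writable string. It assumes present-day D2, where a string literal has type string (immutable(char)[]) and so cannot be assigned directly to a char[]. The function names are hypothetical.

```d
// .dup allocates a new, mutable GC copy with the literal's length.
char[] viaDup()
{
    char[] s = "blub".dup;
    s[1] = 'a';          // fine: s owns its memory
    return s;
}

// s[] = ... copies elements but never resizes s,
// so the length has to be set first.
char[] viaSliceCopy()
{
    char[] s;
    s.length = 4;        // allocate room for four chars
    s[] = "blub";        // element-wise copy; lengths must match
    return s;
}
```

The second form is the "manually alter the .length" route. It is more code for the same result, which is why .dup is the usual choice.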
 So what exactly is T[]? According to the documentation it's a tuple
 (pointer, length). So, if I pass a T[] to a function, pointer and length
 are passed by value (unless I specify an (in)out parameter)? Is this
 some array magic, or can I use this for my own types?

See above. You could duplicate this in your own code by creating a struct containing pointers. Also, I don't think it's a good idea to call T[] a tuple in D, because the term has a fairly specific connotation. See the section entitled "Tuple Parameters" at

http://www.digitalmars.com/d/template.html

and also

http://www.digitalmars.com/d/phobos/std_typetuple.html
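Here is a hedged sketch of such a struct. IntSlice, poke and shrink are hypothetical names, and the layout only mimics the built-in (length, pointer) pair. As with real arrays, no copy-on-write is involved: the pointer target is shared and the length is not, unless the struct is passed by ref (D2's spelling of inout).

```d
// Hypothetical user-defined type mimicking a dynamic array's layout.
struct IntSlice
{
    size_t len;
    int* ptr;

    // Returning by ref lets s[i] = v write through the pointer.
    ref int opIndex(size_t i)
    {
        assert(i < len);
        return ptr[i];
    }
}

void poke(IntSlice s)
{
    s[0] = 7;   // writes through the copied pointer: shared with the caller
    s.len = 0;  // modifies only this local copy of the struct
}

void shrink(ref IntSlice s)
{
    s.len = 0;  // with ref, the caller's struct itself changes
}
```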
 I also found out that I can write
 void foo(inout int[] a)
 {
 	a ~= 1;
 }
 So "~=" supports not only T[] but also T as the RHS. Where is the
 documentation for this?

http://www.digitalmars.com/d/arrays.html I suppose, though the description isn't explicit. Rather, it's implied by "the ~= operator means append."
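Here is a tiny illustration of both right-hand-side forms of ~=, assuming present-day D2. The function name is hypothetical.

```d
// ~= accepts either a single element (T) or a whole array (T[]).
int[] appendDemo()
{
    int[] a;
    a ~= 1;        // append one element
    a ~= [2, 3];   // append an array
    return a;
}
```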
 Sorry if these are obvious questions, but I can't figure this out from
 the official documentation (or I'm blind).

Not at all. I've been using D for a few years now, and I still have trouble finding things in the spec. It's pretty much all there, but not always in the most obvious location.

Sean
Jan 11 2007
Sebastian Biallas <groups.5.sepp spamgourmet.com> writes:
Sean Kelly wrote:
 Sebastian Biallas wrote:
 Hello!

 I'm trying to understand array handling in D. Unfortunately the official
 documentation[1] is not very helpful.

 [1] http://www.digitalmars.com/d/arrays.html

 By trial and error I found out that arrays are passed by some COW magic
 (where is this documented?). So, if I want changes to an array to be
 visible to the caller, I have to pass it as an inout parameter
 (this works, but is it the canonical way?).

Almost. Dynamic arrays are declared internally like so in D:

    struct Array
    {
        size_t len;
        byte*  ptr;
    }

So passing a dynamic array by value is essentially the same as passing around a pointer.

But not as an (Array *), rather as (len, byte *), I guess?
 The only effect adding 'inout' to your function will
 have is that the length of the array can be altered and those changes
 will persist when the function completes.

Hmm, I'm quite sure I can alter the ptr, too (implicitly, when I append to the array and there is not enough room).
 There is a brief mention of this in:
 
 http://www.digitalmars.com/d/function.html
 
 "For dynamic array and object parameters, which are passed by reference,
 in/out/inout apply only to the reference and not the contents."

Well, the word "reference" is far too overloaded. Here you don't pass the Array (which you mentioned above) by reference, but the content (the object ptr points to).
 Next question: How can I initialize an array?
 It seems like COW works only for parameters. E.g.:

 void foo(inout char[] s)
 {
         s = "blub";
 }
 void bar()
 {
     char[] s;
     foo(s);
     s[1] = 'a'; // will crash
 }

Doing

    s = "blub";

allocates no memory, but rather just changes Array.ptr to point to "blub" and sets Array.len appropriately.

Yeah, I guess I understood this already.

Is there something similar to the "const" keyword of C/C++ in D? It looks a little bit fishy to me that you can write illegal code in D so easily. In C/C++ you can return a constant array in a way that
a) the caller knows it's constant, and
b) errors are detected at compile time.
 The above code will actually
 work in Windows because the data segment where string constants are
 stored is not read-only.

For some values of "work" :)
 So what exactly is T[]? According to the documentation it's a tuple
 (pointer, length). So, if I pass a T[] to a function, pointer and length
 are passed by value (unless I specify an (in)out parameter)? Is this
 some array magic, or can I use this for my own types?

See above. You could duplicate this in your own code by creating a struct containing pointers.

But without the COW part?
 Also, I don't think it's a good idea to
 call T[] a Tuple in D because the term has a fairly specific
 connotation.  See the section entitled "Tuple Parameters" at
 http://www.digitalmars.com/d/template.html and also
 http://www.digitalmars.com/d/phobos/std_typetuple.html

Yes, you're right.
 I also found out that I can write
 void foo(inout int[] a)
 {
     a ~= 1;
 }
 So "~=" supports not only T[] but also T as the RHS. Where is the
 documentation for this?

http://www.digitalmars.com/d/arrays.html I suppose, though the description isn't explicit. Rather, it's implied by "the ~= operator means append."

Hmm, that's not the answer I hoped I'd get :) It's nice to have a language without surprises, but I could only figure out the above by trying it.
 Sorry if these are obvious questions, but I can't figure this out from
 the official documentation (or I'm blind).

Not at all. I've been using D for a few years now, and I still have trouble finding things in the spec. It's pretty much all there, but not always in the most obvious location.

That's sad. At first glance the documentation looks really good, but then it's mostly about syntax, not semantics.
Jan 11 2007
"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Sebastian Biallas" <groups.5.sepp spamgourmet.com> wrote in message 
news:eo6ofq$1q2b$2 digitaldaemon.com...
 But not (Array *) but (len, byte *), I guess?

Yeah, it's more like (in fact, _exactly_ like) passing around a two-element struct by value. If you pass a struct by value into a function and modify its members, those changes won't be reflected in the calling function unless you use 'inout'. The same thing applies for arrays since this is really what's going on behind the scenes.
 Hmm, I'm quite sure I can alter the ptr, too (Implicitly, when I append
 to the array and there is not enough room).

Yes, that's right.
 Well, the word "reference" is far too overloaded. Here you don't
 pass the Array (which you mentioned above) by reference, but the content (the
 object ptr points to).

Yes, and this got me a few times too. Though most of the time I don't need a function to modify an array that's passed into it, just one that's a class member, or maybe modify it and then return it.
 Is there something similar to the "const" keyword of C/C++ in D? It
 looks a little bit fishy to me that you can write illegal code in D so
 easily. In C/C++ you can return a constant array in a way that
 a) the caller knows it's constant, and
 b) errors are detected at compile time.

No, and this issue has been beaten absolutely to death. I really don't care what happens with this issue. I've never actually run into any bugs that would be solved by having const, but your mileage may vary, I guess. PLEASE, I don't want to start another topic about this :)
 For some values of "work" :)

Hehe
 But without the COW part?

COW is not part of the language. It's just a convention you can follow when writing array-processing functions. These functions also typically return the array, so the function should be called as "s = foo(s)" instead of "foo(s)". The "COW" behavior that you were talking about before -- how resizing/reallocating the array in the function had no effect in the caller -- was really just an effect of what I mentioned at the beginning of this post. The local array "structure" members were changed in the array processing function when you resized the array, and those changes aren't reflected in the calling function.
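Here is a sketch of that convention, assuming D2. spacesToUnderscores is a made-up example, not a library function. It copies only when it first has to change something, and it returns whichever array holds the result, so the caller writes s = spacesToUnderscores(s).

```d
// Copy-on-write by convention: duplicate the input only on the first
// change, and always return the (possibly new) array.
const(char)[] spacesToUnderscores(const(char)[] s)
{
    char[] copy;                     // stays null until a change is needed
    foreach (i, c; s)
    {
        if (c == ' ')
        {
            if (copy is null)
                copy = s.dup;        // first modification: make our own copy
            copy[i] = '_';
        }
    }
    return copy is null ? s : copy;  // untouched input is returned as-is
}
```

Returning the input unchanged when nothing needs fixing is the point of the convention: the caller pays for an allocation only when the data actually changes.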
 Hmm, that's not the answer I hoped I'd get :)
 It's nice to have a language without surprises, but I could only figure
 out the above by trying it.

At least it's a nice surprise :)
Jan 11 2007
Sebastian Biallas <groups.5.sepp spamgourmet.com> writes:
Jarrett Billingsley wrote:
 "Sebastian Biallas" <groups.5.sepp spamgourmet.com> wrote in message 
 Is there something similar to the "const" keyword of C/C++ in D? It
 looks a little bit fishy to me that you can write illegal code in D so
 easily. In C/C++ you can return a constant array in a way that
 a) the caller knows it's constant, and
 b) errors are detected at compile time.

No, and this issue has been beaten absolutely to death. I really don't care what happens with this issue. I've never actually run into any bugs that would be solved by having const, but your mileage may vary, I guess. PLEASE, I don't want to start another topic about this :)

Sorry, I'm new to this newsgroup, will google :)

I'm from the C/C++/Java/Ruby world (not to mention the functional languages), and these languages have pretty easy constraints:

C: you pass everything by value
C++: you pass by value or by reference (and a reference is, more or less, just a pointer)
Java: you pass either PODs or references (pointers) by value
Ruby: you pass everything by reference

D doesn't fit into these categories that well. That arrays are passed by reference means something different, because an array in D isn't a first-class object (or whatever I should call this).

Well, I guess there are some D idioms which avoid the const array problem.
Jan 11 2007
BCS <ao pathlink.com> writes:
Reply to Sebastian,

 Hello!
 
 I'm trying to understand array handling in D. Unfortunately the
 official documentation[1] is not very helpful.
 
 [1] http://www.digitalmars.com/d/arrays.html
 
 By trial and error I found out that arrays are passed by some COW
 magic (where is this documented?). So, if I want changes to an array
 to be visible to the caller, I have to pass it as an inout parameter
 (this works, but is it the canonical way?).

Arrays are reference types. If you pass an array to a function, the function gets a copy of the pointer/length pair that the caller uses. The function can change the contents of the memory the array references, but can't change the caller's reference to that data (unless you use out or inout). As to it seeming to be COW: if you change the length of an array, sometimes the GC can't extend it in place and moves the whole thing to a bigger chunk of RAM (this doesn't always happen). When the ~ and ~= operators are used, the GC always makes a copy. [...]
 the official documentation (or I'm blind).

I often feel that way myself. I have had so much trouble finding things that /were/ put in a good place that I have a CGI script on my box that gives me a grep of the whole D spec converted into a webpage with links and everything.
 
 Regards,
 Sebastian

Jan 11 2007
Sebastian Biallas <groups.5.sepp spamgourmet.com> writes:
BCS wrote:
 Reply to Sebastian,
 
 Hello!

 I'm trying to understand array handling in D. Unfortunately the
 official documentation[1] is not very helpful.

 [1] http://www.digitalmars.com/d/arrays.html

 By trial and error I found out that arrays are passed by some COW
 magic (where is this documented?). So, if I want changes to an array
 to be visible to the caller, I have to pass it as an inout parameter
 (this works, but is it the canonical way?).

Arrays are reference types. If you pass an array to a function, the function gets a copy of the pointer/length pair that the caller uses. The function can change the contents of the memory the array references, but can't change the caller's reference to that data (unless you use out or inout).

Ah, you're right. I guess I have a better picture now.
 As to it seeming to be COW: if you change the length of an array,
 sometimes the GC can't extend it in place and moves the whole thing to a
 bigger chunk of RAM (this doesn't always happen). When the ~ and ~=
 operators are used, the GC always makes a copy.

Yeah, that's the trick. You can change it in place (without inout), and the COW part happens once you alter the length (implicitly or explicitly). I guess what I called COW isn't even the right term.

So, new question: How do I pass T[] to foo() so that foo() isn't allowed to change the content of T[]?
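Here is a minimal D2 sketch of the behaviour described above. touchAndGrow is a hypothetical name. The sketch checks only what is guaranteed: the element write is shared, and the caller's length is unchanged.

```d
// Without ref, the callee shares the caller's elements, but growing the
// array only changes the callee's own (length, pointer) pair.
void touchAndGrow(int[] a)
{
    a[0] = 100;   // visible to the caller: same memory
    a ~= 4;       // may extend in place or reallocate; either way,
                  // the caller's length stays the same
}
```

Whether writes made after the append still land in the caller's memory depends on whether the runtime could extend the block in place. For that reason the sketch does not rely on either outcome.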
Jan 11 2007
BCS <ao pathlink.com> writes:
Reply to Sebastian,

 
 So, new question: How do I pass T[] to foo() so that foo() isn't
 allowed to change the content of T[]?
 

    T[] abc = /* something */;
    foo(abc.dup);

The .dup property does a GC memory allocation and a memcpy of the old array, for any array.
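Here is a hedged sketch of the .dup approach. For comparison, it also shows a const parameter, which present-day D2 offers: it rejects writes at compile time without any copy. Const-qualified parameter types came with D2, after this thread. scribble and total are hypothetical names.

```d
// The callee overwrites whatever it is given; passing abc.dup keeps
// the caller's data safe at the cost of an allocation and a copy.
void scribble(int[] a)
{
    a[] = -1;              // fill every element
}

// D2 alternative: a const(int)[] parameter is read-only, with no copy.
int total(const(int)[] a)
{
    int sum = 0;
    foreach (v; a)
        sum += v;
    // a[0] = 0;           // would not compile: a's elements are const
    return sum;
}
```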
Jan 11 2007
Sebastian Biallas <groups.5.sepp spamgourmet.com> writes:
BCS wrote:
 Reply to Sebastian,
 
 So, new question: How do I pass T[] to foo() so that foo() isn't
 allowed to change the content of T[]?

    T[] abc = /* something */;
    foo(abc.dup);

The .dup property does a GC memory allocation and a memcpy of the old array, for any array.

What if the array is really big? It seems like a waste to me to duplicate it "just in case".
Jan 12 2007
Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
BCS wrote:
 bigger chunk of RAM (this doesn't always happen). When the ~ and ~= 
 operators are used, the GC always makes a copy.

~ always makes a copy, but ~= only does so when necessary.
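Here is a small D2 check of the ~ half of that statement (catDemo is a hypothetical name). The result of b ~ c is a new allocation, so writing to it never touches b. Whether a given ~= reuses spare capacity depends on the runtime, so that half is not asserted here.

```d
// b ~ c always allocates a fresh array holding b's then c's elements.
int[] catDemo(int[] b, int[] c)
{
    int[] a = b ~ c;
    a[0] = -1;       // does not affect b: a has its own memory
    return a;
}
```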
Jan 11 2007
Sebastian Biallas <groups.5.sepp spamgourmet.com> writes:
Frits van Bommel wrote:
 BCS wrote:
 bigger chunk of RAM (this doesn't always happen). When the ~ and ~=
 operators are used, the GC always makes a copy.

~ always makes a copy, but ~= only does so when necessary.

The first one is documented on the array page, but where is the documentation for ~=? Common knowledge from using D?

BTW: What exactly happens on a = b ~ c and on a ~= b? Is this some built-in opCat? What are the semantics?
Jan 11 2007
Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Sebastian Biallas wrote:
 Frits van Bommel wrote:
 BCS wrote:
 bigger chunk of RAM (this doesn't always happen). When the ~ and ~=
 operators are used, the GC always makes a copy.


The first one is documented on the array page, but where is the documentation for ~=? Common knowledge from using D?

Not sure, but it should be in the spec somewhere...
 BTW: What exactly happens on:
 
 a = b ~ c and a ~= b
 
 ? Is this some built-in opCat? What are the semantics?

You can see it as a built-in opCat if you like. What happens behind the scenes is that a function in the runtime is called. The source to these functions is in dmd/src/phobos/internal/gc/gc.d if you really want to know exactly what they do... (_d_arraycat for ~, _d_arrayappend for ~= with array, _d_arrayappendc for ~= with single element)
Jan 11 2007
Sebastian Biallas <groups.5.sepp spamgourmet.com> writes:
Frits van Bommel wrote:
 Sebastian Biallas wrote:
 BTW: What exactly happens on:

 a = b ~ c and a ~= b

 ? Is this some built-in opCat? What are the semantics?

You can see it as a built-in opCat if you like. What happens behind the scenes is that a function in the runtime is called. The source to these functions is in dmd/src/phobos/internal/gc/gc.d if you really want to know exactly what they do... (_d_arraycat for ~, _d_arrayappend for ~= with array, _d_arrayappendc for ~= with single element)

Well, I found out by inspecting the disassembly, but I'd prefer to have a spec I can rely on.
Jan 11 2007
Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Sebastian Biallas wrote:
 Frits van Bommel wrote:
 Sebastian Biallas wrote:
 BTW: What exactly happens on:

 a = b ~ c and a ~= b

 ? Is this some built-in opCat? What are the semantics?

What happens behind the scenes is that a function in the runtime is called. The source to these functions is in dmd/src/phobos/internal/gc/gc.d if you really want to know exactly what they do... (_d_arraycat for ~, _d_arrayappend for ~= with array, _d_arrayappendc for ~= with single element)

Well, I found out by inspecting the disassembly, but I'd prefer to have a spec I can rely on.

This isn't in the spec. And it shouldn't be in the spec. This is (compiler) implementation-dependent stuff. All you need to know is that it works ;).
Jan 12 2007
Sebastian Biallas <groups.5.sepp spamgourmet.com> writes:
Here is a nice example of the confusion I have with D arrays:

On the really interesting page
http://www.digitalmars.com/d/exception-safe.html
this is shown:

class Mailer
{
    void Send(Message msg)
    {
	{
	    char[] origTitle = msg.Title();
	    scope(exit) msg.SetTitle(origTitle);
	    msg.SetTitle("[Sending] " ~ origTitle);
	    Copy(msg, "Sent");
	}
	scope(success) SetTitle(msg.ID(), "Sent", msg.Title);
	scope(failure) Remove(msg.ID(), "Sent");
	SmtpSend(msg);	// do the least reliable part last
    }
}

The scope(xx) things are cool, ok. The question in this example is:

What exactly happens in msg.Title() and msg.SetTitle()? [1]
*) Does msg.Title return a reference or a copy?
*) Does msg.SetTitle copy the reference or the content?

It is (for me) not obvious that the "scope(exit)" clause really works.

[1] And another question: why are their names capitalized? The
style-guide[2] says that function names should start with a lower case
letter.
[2] http://www.digitalmars.com/d/dstyle.html
Jan 11 2007
"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Sebastian Biallas" <groups.5.sepp spamgourmet.com> wrote in message 
news:eo6ssg$1vff$1 digitaldaemon.com...
 Here is a nice example of the confusion I have with D arrays:

 On the really interesting page
 http://www.digitalmars.com/d/exception-safe.html
 this is shown:

 class Mailer
 {
    void Send(Message msg)
    {
        {
            char[] origTitle = msg.Title();
            scope(exit) msg.SetTitle(origTitle);
            msg.SetTitle("[Sending] " ~ origTitle);
            Copy(msg, "Sent");
        }
        scope(success) SetTitle(msg.ID(), "Sent", msg.Title);
        scope(failure) Remove(msg.ID(), "Sent");
        SmtpSend(msg); // do the least reliable part last
    }
 }

 The scope(xx) things are cool, ok. The question in this example is:

 What exactly happens in msg.Title() and msg.SetTitle()? [1]
 *) Does msg.Title return a reference or a copy?
 *) Does msg.SetTitle copy the reference or the content?

 It is (for me) not obvious that the "scope(exit)" clause really works.

I think maybe you're putting too much thought into this :) How msg.Title and msg.SetTitle are implemented doesn't really matter in this case. They might do something like:

    char[] Title()
    {
        return mTitle.dup;
    }

    void SetTitle(char[] title)
    {
        mTitle = title.dup;
    }

In which case the internal title is always kept secret from everything else, so Title returns a copy and SetTitle always copies its argument; or they could be implemented as:

    char[] Title()
    {
        return mTitle;
    }

    void SetTitle(char[] title)
    {
        mTitle = title;
    }

In which case the internal title is just a reference to a string that's set to it. Modifying the string in the calling function after setting the title would then cause the internal title to change.

How these functions are implemented would be (1) completely up to the implementer of the class, and (2) for the most part, invisible to the users of the class. The code example is the same no matter how these functions are implemented.
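Here is a hedged, runnable D2 version of the copying variant. Message is a stand-in for the class on the spec page, and withSendingPrefix replaces the Copy(msg, "Sent") step with a hypothetical callback. Because both accessors copy, nothing can disturb origTitle, and scope(exit) restores it whether the callback returns or throws.

```d
// Hypothetical stand-in: both accessors copy, so no caller aliases mTitle.
class Message
{
    private char[] mTitle;

    char[] Title() { return mTitle.dup; }
    void SetTitle(const(char)[] title) { mTitle = title.dup; }
}

// The save/restore core of the spec example.
void withSendingPrefix(Message msg, void delegate() action)
{
    char[] origTitle = msg.Title();        // a private copy
    scope(exit) msg.SetTitle(origTitle);   // runs on normal return and on exceptions
    msg.SetTitle("[Sending] " ~ origTitle);
    action();
}
```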
 [1] And another question: why are their names capitalized? The
 style-guide[2] says that function names should start with a lower case
 letter.
 [2] http://www.digitalmars.com/d/dstyle.html

Because Walter's weird like that?
Jan 11 2007
Sebastian Biallas <groups.5.sepp spamgourmet.com> writes:
Jarrett Billingsley wrote:
 "Sebastian Biallas" <groups.5.sepp spamgourmet.com> wrote in message 
 What exactly happens in msg.Title() and msg.SetTitle()? [1]
 *) Does msg.Title return a reference or a copy?
 *) Does msg.SetTitle copy the reference or the content?

 It is (for me) not obvious that the "scope(exit)" clause really works.

I think maybe you're putting too much thought into this :) How msg.Title and msg.SetTitle are implemented, in this case, don't really matter. They might do something like:

[snip]

And they might be implemented as:

    char[] Title()
    {
        return mTitle;
    }

    void SetTitle(char[] title)
    {
        mTitle.length = title.length;
        mTitle[] = title;
    }

In which case the code just fails in case of an error. My point is that a hypothetical C++ solution can't fail just because of some odd implementation:

    {
        std::string origTitle;
        msg.getTitle(origTitle);
        scope(exit) msg.SetTitle(origTitle);
        msg.SetTitle("[Sending] " + origTitle);
        Copy(msg, "Sent");
    }

[I rely here on some imaginary scope keyword in C++]

By reading the C++ code it is /obvious/ that I have a copy of origTitle.
 How these functions are implemented would be (1) completely up to the 
 implementer of the class, and (2) for the most part, invisible to the users 
 of the class.  The code example is the same no matter how these functions 
 are implemented.

See above :)
 [1] And another question: why are their names capitalized? The
 style-guide[2] says that function names should start with a lower case
 letter.
 [2] http://www.digitalmars.com/d/dstyle.html

Because Walter's weird like that?

Oh, I didn't know that :)
Jan 11 2007
"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Sebastian Biallas" <groups.5.sepp spamgourmet.com> wrote in message 
news:eo6vbc$22c4$1 digitaldaemon.com...

 And they might be implemented as:
 char[] Title()
 {
     return mTitle;
 }

 void SetTitle(char[] title)
 {
     mTitle.length = title.length;
     mTitle[] = title;
 }

 In which case the code just fails in case of an error.

Then change it to:

    void SetTitle(char[] title)
    {
        mTitle.length = title.length;
        if(mTitle !is title)
            mTitle[] = title;
    }

No more overlapping array copy error, and you also avoid unnecessarily copying the array if mTitle already points to it. At least, I guess that's the error you're talking about; it's the only one I get.
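Here is the guarded setter in D2 form. AliasingMessage is a hypothetical name; its getter deliberately returns the internal buffer, as in Sebastian's variant. The guard avoids the overlapping copy when a caller passes the object's own slice back in. It does not stop aliasing in general. A caller holding the slice returned by Title() can still see its characters change if a later, longer title is written into the same buffer in place.

```d
class AliasingMessage
{
    private char[] mTitle;

    char[] Title() { return mTitle; }   // hands out the internal buffer

    void SetTitle(const(char)[] title)
    {
        mTitle.length = title.length;   // may keep or move the buffer
        if (mTitle !is title)           // is/!is on arrays compares pointer and length
            mTitle[] = title[];         // copy only when not the very same slice
    }
}
```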
 My point is that a hypothetical C++ solution can't fail just because of
 some odd implementation:
 {
     std::string origTitle;
     msg.getTitle(origTitle);
     scope(exit) msg.SetTitle(origTitle);
     msg.SetTitle("[Sending] " + origTitle);
     Copy(msg, "Sent");
 }
 [I rely here on some imaginary scope keyword in C++]

 By reading the C++ code it is /obvious/ that I have a copy of origTitle.

Really? ;) I wouldn't have known that, but maybe that's just because I don't know much about C++. How does "msg.getTitle(origTitle)" guarantee that I'm getting a copy of the title?
 Because Walter's weird like that?

Oh, I didn't know that :)

Oh! Well, get used to it. ;)
Jan 12 2007