
digitalmars.D - immutable strings, spec vs. reality

Anders F Björklund <afb algonet.se> writes:
http://www.digitalmars.com/d/cppstrings.html says:

  In D, use the array slicing syntax in the natural manner:
 
 	char[] str = "hello";
 	str[1..2] = '?';		// str is "h??lo"
 

Okay, so one tries this little example on Linux/Darwin:
 import std.stdio;
 void main()
 {
   char[] str = "hello";
   str[1..2] = '?';
   writefln("%s", str);
 }

<kaboom> (segfault / bus error) Okay, that's right. Forgot the small print on the compiler:
 Differences from Win32 version

 * String literals are read-only. Attempting to write to
   them will cause a segment violation.

Copy-on-Write* (a.k.a. duplicate before changing), I forgot:
    char[] str = "hello".dup;

"h?llo". Oh, that's right. Exclusive ranges, not inclusive:
 	str[1..3] = '?';		// str is "h??lo"

"h??lo". Finally! "in the natural manner", eh? :-P It becomes even funnier when using slices. Remember, those are *not* copies, but just "another reference to the data".
 	char[] s1 = "hello world";
 	char[] s2 = s1[6 .. 11];	// s2 is "world"

So far, so good. Now it's just a matter of being careful:
  s2[3] = '?';

<kaboom>. Right, of course I meant to do a copy first...
   char[] s2 = s1[6 .. 11].dup;  // s2 is "world"
   s2[3] = '?';
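
Putting the two corrections above together, here is a minimal sketch. It is not part of the original post and assumes a current D compiler, where the literal has to be copied with .dup before it can be stored in a char[]; the names `view` and `copy` are only illustrative.

```d
import std.stdio;

void main()
{
    char[] s1 = "hello world".dup;  // mutable copy of the read-only literal
    s1[1 .. 3] = '?';               // half-open range: writes indices 1 and 2

    char[] view = s1[6 .. 11];      // slice: another reference to s1's data
    char[] copy = s1[6 .. 11].dup;  // independent copy of the same characters

    view[3] = '?';                  // writes through to s1
    copy[3] = '!';                  // leaves s1 untouched

    writeln(s1);                    // h??lo wor?d
    writeln(copy);                  // wor!d
}
```

Writing through `view` is only safe here because `s1` itself is already a private copy.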

This (Copy-on-Write) and the toStringz bugs (with NUL-term) take a
while to get used to... Maybe it needs more of
copying-as-default-operation, or I need to be more careful. :-)
Either way, there need to be more examples in the D spec...

Please note that I think that read-only strings as well as
copy-on-write are a *good thing*. Making the default string
mutable is a design error, in my book... (like Dool does)
http://dool.sourceforge.net/dool_String_String.html

I prefer StringBuffer (Java) or NSMutableString (Objective-C)
and having the default strings immutable: String / NSString.
That way it is faster (less copying) as well as being thread-safe
(since immutable objects don't need any synchronizing...).
It also makes it easier to use literals and slices, in D.

Mango has some weird boolean flag instead, but that's OK too :-)
http://svn.dsource.org/svn/projects/mango/trunk/doc/html/classUString.html
(but I think the idea is that UText is immutable and UString is mutable)

And of course, using "string" in D instead of "char[]" wouldn't hurt?
(just as using "bool" instead of "bit" hasn't been all that painful...)

--anders

* C-o-W, as in http://www.digitalmars.com/d/phobos.html#string :
 When a function takes a string as a parameter, and returns a string, is
 that string the same as the input string, modified in place, or is it a
 modified copy of the input string? The D array convention is
 "copy-on-write". This means that if no modifications are done, the
 original string (or slices of it) can be returned. If any modifications
 are done, the returned string is a copy.

Note that it says "can"... One can always dup, just to be safe. (something that I hope that std.string.toStringz can listen to)
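
The copy-on-write convention quoted above can be sketched as follows. This is an illustration only, assuming a current D compiler; `spacesToDashes` is a hypothetical helper, not a Phobos function.

```d
import std.stdio;

// Hypothetical helper that follows the copy-on-write convention:
// the input is never modified in place.
char[] spacesToDashes(char[] s)
{
    foreach (i, c; s)
    {
        if (c == ' ')
        {
            char[] copy = s.dup;        // first change needed: copy, then write
            foreach (ref ch; copy[i .. $])
            {
                if (ch == ' ')
                    ch = '-';
            }
            return copy;
        }
    }
    return s;                           // nothing to change: return the original
}

void main()
{
    char[] a = "hello".dup;
    char[] b = "hello world".dup;

    assert(spacesToDashes(a) is a);     // same array, no copy was made

    char[] r = spacesToDashes(b);
    assert(r !is b);                    // a modified copy came back
    writeln(b);                         // hello world
    writeln(r);                         // hello-world
}
```

A caller who wants to write into the result without knowing whether a copy was made still has to .dup it first, which is the "can" in the quoted text.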
Feb 07 2005
Ben Hinkle <Ben_member pathlink.com> writes:
In article <cu7m7d$1qur$1 digitaldaemon.com>,
Anders F Björklund says...
http://www.digitalmars.com/d/cppstrings.html says:

  In D, use the array slicing syntax in the natural manner:
 
 	char[] str = "hello";
 	str[1..2] = '?';		// str is "h??lo"
 

Okay, so one tries this little example on Linux/Darwin:
 import std.stdio;
 void main()
 {
   char[] str = "hello";
   str[1..2] = '?';
   writefln("%s", str);
 }

<kaboom> (segfault / bus error) Okay, that's right. Forgot the small print on the compiler:
 Differences from Win32 version

 * String literals are read-only. Attempting to write to
   them will cause a segment violation.


Similar to C. That's why in C++ string literals have type const char*.
Copy-on-Write* (a.k.a. duplicate before changing), I forgot:

    char[] str = "hello".dup;

"h?llo". Oh, that's right. Exclusive ranges, not inclusive:
 	str[1..3] = '?';		// str is "h??lo"

"h??lo". Finally! "in the natural manner", eh? :-P

Any suggestions for improvement? The only ones that pop into my head are things like "automatically dup all string literals before any module ctors run". Or perhaps "make strings reference counted and automatically copy on write". I'd rather keep the C behavior and save on startup speed for the first. For the second it would fundamentally change string implementation and behavior and probably would just trade one set of annoying behavior for another.
It becomes even funnier when using slices. Remember, those
are *not* copies, but just "another reference to the data".

This is a very important feature.
 	char[] s1 = "hello world";
 	char[] s2 = s1[6 .. 11];	// s2 is "world"

So far, so good. Now it's just a matter of being careful:
  s2[3] = '?';

<kaboom>. Right, of course I meant to do a copy first...
   char[] s2 = s1[6 .. 11].dup;  // s2 is "world"
   s2[3] = '?';

This (Copy-on-Write) and the toStringz bugs (with NUL-term) take a while of getting used to... Maybe it needs more of copying-as-default-operation, or I need to be more careful. :-)

The toStringz bug about NUL-term will be fixed and is independent of COW.
Either way, there needs to be more examples in the D spec...

Please note that I think that read-only-strings as well as
copy-on-write is a *good thing*. Making the default string
mutable is a design error, in my book... (like Dool does)
http://dool.sourceforge.net/dool_String_String.html

Though the downside to having const and non-const strings is you end up converting one to the other depending on what function you call, which would get very annoying. Either that or the system develops a convention. For example in Java you use StringBuffers to make strings but you use Strings to pass them between functions. D's convention is COW. Either way users have to learn the convention in order to use strings effectively.
I prefer StringBuffer (Java) or NSMutableString (Objective-C)
and having the default strings immutable: String / NSString.
That way it is faster (less copying) as well as being thread-safe
(since immutable objects don't need any synchronizing...).
It also makes it easier to use literals and slices, in D.

Mango has some weird boolean flag instead, but that's OK too :-)
http://svn.dsource.org/svn/projects/mango/trunk/doc/html/classUString.html
(but I think the idea is that UText is immutable and UString is mutable)

And of course, using "string" in D instead of "char[]" wouldn't hurt ?
(just as using "bool" instead of "bit" hasn't been all that painful...)

--anders


* C-o-W, as in http://www.digitalmars.com/d/phobos.html#string :
 When a function takes a string as a parameter, and returns a string, is
 that string the same as the input string, modified in place, or is it a
 modified copy of the input string? The D array convention is
 "copy-on-write". This means that if no modifications are done, the
 original string (or slices of it) can be returned. If any modifications
 are done, the returned string is a copy.

Note that it says "can"... One can always dup, just to be safe. (something that I hope that std.string.toStringz can listen to)

Feb 07 2005
Anders F Björklund <afb algonet.se> writes:
Ben Hinkle wrote:

* String literals are read-only. Attempting to write to
  them will cause a segment violation.


Similar to C. That's why in C++ string literals have type const char*.

Wonder if there's a way to make string literals read-only on Windows too, to avoid any later surprises when porting? Either way, I think it should be made part of the D language: "String literals are read-only". On all possible D platforms. It would also offer some possibilities for string pooling...
 Any suggestions for improvement? The only ones that pop into my head are things
 like "automatically dup all string literals before any module ctors run". Or
 perhaps "make strings reference counted and automatically copy on write". I'd
 rather keep the C behavior and save on startup speed for the first. For the
 second it would fundamentally change string implementation and behavior and
 probably would just trade one set of annoying behavior for another.

The D behaviour is OK, I was just ranting a bit that it might need some more documentation before it becomes "natural". Treating all "external" strings as read-only / immutable and then using copy-on-write works, whether they're literals or parameters. Not sure if it was clear from my writing, but eventually it did work :-)
 The toStringz bug about NUL-term will be fixed and is independent of COW. 

Yes, and I think I finally got to terms with when the strings are NUL-terminated in the implementation and when they're not... (in short: string literals are null terminated, and dynamic arrays of chars could be as a side-effect but are not always)
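
A small sketch of that distinction, assuming a current D compiler and the Phobos function `std.string.toStringz`; the variable names are only illustrative.

```d
import core.stdc.string : strlen;
import std.stdio;
import std.string : toStringz;

void main()
{
    // A string literal is followed by a NUL byte, so its pointer can be
    // handed to a C function directly.
    const(char)* lit = "hello".ptr;
    writeln(strlen(lit));               // 5

    // A slice carries a length but no terminator of its own: the byte
    // after "hello" here is the space of "hello world".
    char[] whole = "hello world".dup;
    char[] part = whole[0 .. 5];

    // toStringz returns a NUL-terminated string that is safe to pass to C.
    writeln(strlen(toStringz(part)));   // 5
}
```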
 Though the downside to having const and non-const strings is you end up
 converting one to the other depending on what function you call, which would
 get very annoying. Either that or the system develops a convention. For example in
 Java you use StringBuffers to make strings but you use Strings to pass them
 between functions. D's convention is COW. Either way users have to learn the
 convention in order to use strings effectively.

Usually mutable strings inherit from immutable strings, so that you can use them directly. Either that, or there is a creation method (such as java.lang.StringBuffer.toString(), for that language).

So it's basically the same as D, except that the others use classes instead of built-in arrays and that they use UTF-16 instead of UTF-8? (and I believe that D's choices do have their merits, and are just fine)

--anders
Feb 07 2005
Kris <Kris_member pathlink.com> writes:
In article <cu7pg8$221r$1 digitaldaemon.com>, Ben Hinkle says...
Any suggestions for improvement? The only ones that pop into my head are things
like "automatically dup all string literals before any module ctors run". Or
perhaps "make strings reference counted and automatically copy on write". I'd
rather keep the C behavior and save on startup speed for the first. For the
second it would fundamentally change string implementation and behavior and
probably would just trade one set of annoying behavior for another.

I'd like to suggest the compiler support the notion of read-only variables. That is, a "const char[]" is read-only, and any attempt to write it results in a compile-time error. Such variables cannot be implicitly cast to non-const.

Given that, string literals might be implicitly tagged as read-only (effectively a "const char[]"). This maintains full code-speed, whilst endowing the compiler with some very useful functionality.

Example: Mango.icu.UText is supposedly an immutable object. However, it must permit access to its content as a wchar[]. Without support for such read-only arrays ("const wchar[]"), UText is really /not/ immutable at all. That is, D is missing some fundamental support for creating immutable objects.

In the case of UText, one might argue to return a copy of the content instead ... I won't even comment upon that notion :-)

- Kris
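
For illustration only, this is roughly what such a read-only view looks like with the `const(wchar)[]` type of a current D compiler, which the compiler discussed in this thread did not offer. The class `Text` is a hypothetical stand-in and is not Mango's UText.

```d
import std.stdio;

// Hypothetical stand-in for an immutable text class.
class Text
{
    private wchar[] content;

    this(const(wchar)[] s)
    {
        content = s.dup;            // keep a private copy
    }

    // Callers can read the characters but cannot write through the result,
    // and no copy is made on access.
    const(wchar)[] get() const
    {
        return content;
    }
}

void main()
{
    auto t = new Text("hello"w);
    const(wchar)[] view = t.get();
    // view[0] = 'j';               // rejected at compile time
    writeln(view.length);           // 5
}
```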
Feb 07 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Mon, 7 Feb 2005 18:26:28 +0000 (UTC), Kris <Kris_member pathlink.com>  
wrote:
 In article <cu7pg8$221r$1 digitaldaemon.com>, Ben Hinkle says...
 Any suggestions for improvement? The only ones that pop into my head are things
 like "automatically dup all string literals before any module ctors run". Or
 perhaps "make strings reference counted and automatically copy on write". I'd
 rather keep the C behavior and save on startup speed for the first. For the
 second it would fundamentally change string implementation and behavior and
 probably would just trade one set of annoying behavior for another.

I'd like to suggest the compiler support the notion of read-only variables. That is, a "const char[]" is read-only, and any attempt to write it results in a compile-time error. Such variables cannot be implicitly cast to non-const.

I agree we need to be able to say this variable cannot be written to (both reference and contents). But we don't want to go down the path of specifying which parameters to functions are const, e.g.

 const char[] bar = "test";
 void foo(char[] s) { }
 foo(bar); // error: cannot cast "const char[]" to "char[]"

Isn't that the C/C++ 'const' thing all over again?

That said, I cannot see how we can have const variables without something indicating what a function is going to do with its variables ... but hold on, aren't we already doing it?

I have made a suggestion before and I still think it's a good idea: why not use the in/out/inout parameter specifiers to enforce const'ness? i.e.

 'in' (default) = cannot be written to.
 'out'   = can be written to, initialised upon entry.
 'inout' = can be written to, not init upon entry.

Meaning if a function has 'out' or 'inout' it's clearly saying "I need to write to this", so passing a const char[] will cause a compile error.

Regan
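
A sketch of how this reads in a current D compiler, where `in` parameters are const and by-reference parameters are spelled `ref` rather than `inout`. That is an assumption about today's language, not the compiler discussed in this thread, and the function names are hypothetical.

```d
import std.stdio;

size_t countSpaces(in char[] s)         // `in`: the function may only read s
{
    // s[0] = '?';                      // rejected at compile time
    size_t n = 0;
    foreach (c; s)
    {
        if (c == ' ')
            ++n;
    }
    return n;
}

void lengthOf(in char[] s, out size_t len)  // `out`: reset on entry, then written
{
    len = s.length;
}

void upperFirst(ref char[] s)           // by reference: declares the intent to write
{
    if (s.length > 0 && s[0] >= 'a' && s[0] <= 'z')
        s[0] = cast(char)(s[0] - 32);
}

void main()
{
    const(char)[] bar = "a test";
    writeln(countSpaces(bar));          // 1
    // upperFirst(bar);                 // rejected: bar is not a mutable char[]

    size_t len;
    lengthOf(bar, len);
    writeln(len);                       // 6

    char[] buf = "a test".dup;
    upperFirst(buf);
    writeln(buf);                       // A test
}
```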
Feb 07 2005
"Carlos Santander B." <csantander619 gmail.com> writes:
Regan Heath wrote:
 I have made a suggestion before and I still think it's a good idea, why  
 not use the in/out/inout parameter specifiers to enforce const'ness. i.e.
 
 'in' (default) = cannot be written to.
 'out'   = can be written to, initialised upon entry.
 'inout' = can be written to, not init upon entry.
 
 Meaning if a function has 'out' or 'inout' it's clearly saying I need 
 to  write to this, so passing a const char[] will cause a compile error.
 
 
 Regan

I agree with this idea, and others have agreed too. But I really can't recall Walter ever saying something about it.

_______________________
Carlos Santander Bernal
Feb 09 2005
Kris <Kris_member pathlink.com> writes:
In article <cue0ed$9fs$1 digitaldaemon.com>, Carlos Santander B. says...
Regan Heath wrote:
 I have made a suggestion before and I still think it's a good idea, why  
 not use the in/out/inout parameter specifiers to enforce const'ness. i.e.
 
 'in' (default) = cannot be written to.
 'out'   = can be written to, initialised upon entry.
 'inout' = can be written to, not init upon entry.
 
 Meaning if a function has 'out' or 'inout' it's clearly saying I need 
 to  write to this, so passing a const char[] will cause a compile error.
 
 
 Regan

I agree with this idea, and others have agreed too. But I really can't recall Walter ever saying something about it.

_______________________
Carlos Santander Bernal

It has some merit, yet does not cover return values. If the syntax were somehow extended to return-values, then there might be something.

But then there's the long-standing problem between inout and read-only structs:

One often uses structs to gather read-only reference-data together. Such read-only data would typically be placed in ROM, for any device that has that kind of memory.

How does one pass this data to a function? Well, you can pass a copy of it on the stack. That's hardly a viable solution when the read-only data is an entire font description, along with all the splines, hints, and so on :-}

The other option (and the typical resolution) is to pass the read-only data by reference. Unfortunately, D supports this only via an 'inout' argument - which conflicts sharply with the notion of read-only. The compiler complained, and rightly so, that a const (read-only) struct could not be mapped onto an inout parameter. Therein lies a big hole regarding the simplified pass-by-reference semantics.

I ran into this about nine months ago with const structs, and posted about the conflict at that time. The only sane way around it is to remove the const attribute from the struct. Thus D loses badly to C in terms of viability as an embedded-solutions language.

I know the 'inout' problem with structs is somewhat of a sidetrack here; but it is related, so was worth noting.

- Kris
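
For comparison, a sketch of read-only pass-by-reference as a current D compiler allows it with `ref const`. This is an assumption about today's language rather than the compiler in this thread; `FontInfo` and `totalAdvance` are hypothetical names.

```d
import std.stdio;

// Hypothetical stand-in for a large block of read-only reference data.
struct FontInfo
{
    int glyphCount;
    int[4] advance;
}

// Module-level immutable data; a compiler may place it in read-only storage.
immutable FontInfo defaultFont = FontInfo(4, [10, 12, 8, 9]);

// Passed by reference, so nothing is copied, yet the callee cannot write.
int totalAdvance(ref const FontInfo f)
{
    int sum = 0;
    foreach (w; f.advance[0 .. f.glyphCount])
        sum += w;
    // f.glyphCount = 0;            // rejected at compile time
    return sum;
}

void main()
{
    writeln(totalAdvance(defaultFont));  // 39
}
```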
Feb 09 2005
Anders F Björklund <afb algonet.se> writes:
Kris wrote:

 It has some merit, yet does not cover return values. If the syntax were somehow
 extended to return-values, then there might be something.

I believe there was a suggestion to allow "out" and "inout" return values too, in order to make them behave more like C++ references. But I don't think there ever came anything out of it. The runtime seems to be using (in)out arguments, instead of returning values.
 One often uses structs to gather read-only reference-data together. Such
 read-only data would typically be placed in ROM, for any device that has that
 kind of memory. 
 
 How does one pass this data to a function? Well, you can pass a copy of it on
 the stack. That's hardly a viable solution when the read-only data is an entire
 font description, along with all the splines, hints, and so on :-}

You pass it with a pointer, and a friendly post-it saying "Don't Touch".

This is similar to how I can use functions that have char[] parameters: I can either pass "hello" and hope they use Copy-on-Write like they should (because if they try to write to the literal, they'll segfault), or I make a .dup before, because I distrust that particular function.
 The other option (and the typical resolution) is to pass the read-only data by
 reference. Unfortunately, D supports this only via an 'inout' argument - which
 conflicts sharply with the notion of read-only. The compiler complained, and
 rightly so, that const (read-only) struct could not be mapped onto an inout
 parameter. Therein lies a big hole regarding the simplified pass-by-reference
 semantics.

Seems to be exactly the same case as with just using pointers directly. (then again, isn't that just what the references are, pointers hiding?)

--anders
Feb 09 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Wed, 09 Feb 2005 23:56:54 +0100, Anders F Björklund <afb algonet.se>  
wrote:
 Kris wrote:

 It has some merit, yet does not cover return values. If the syntax were somehow
 extended to return-values, then there might be something.

I believe there was a suggestion to allow "out" and "inout" return values too, in order to make them behave more like C++ references. But I don't think there ever came anything out of it. The runtime seems to be using (in)out arguments, instead of returning values.

I think I prefer using out/inout parameters to extending the return value in this way, I think out/inout parameters are more flexible.
 One often uses structs to gather read-only reference-data together. Such
 read-only data would typically be placed in ROM, for any device that has that
 kind of memory.

 How does one pass this data to a function? Well, you can pass a copy of it on
 the stack. That's hardly a viable solution when the read-only data is an entire
 font description, along with all the splines, hints, and so on :-}

You pass it with a pointer, and a friendly post-it saying "Don't Touch".

This is similar to how I can use functions that have char[] parameters: I can either pass "hello" and hope they use Copy-on-Write like they should (because if they try to write to the literal, they'll segfault), or I make a .dup before, because I distrust that particular function.

If 'in' was enforced, i.e. the variable could not be written to, then you could pass without the dup, safely and confidently.
Feb 09 2005
Kris <Kris_member pathlink.com> writes:
In article <cue4fn$dc2$1 digitaldaemon.com>,
Anders F Björklund says...
Kris wrote:

 It has some merit, yet does not cover return values. If the syntax were somehow
 extended to return-values, then there might be something.

I believe there was a suggestion to allow "out" and "inout" return values too, in order to make them behave more like C++ references. But I don't think there ever came anything out of it. The runtime seems to be using (in)out arguments, instead of returning values.

Aye; but the latter are (by definition) open to mutation by the caller. I'm just pointing that out for folks who might not have noted the distinction :~}
 One often uses structs to gather read-only reference-data together. Such
 read-only data would typically be placed in ROM, for any device that has that
 kind of memory. 
 
 How does one pass this data to a function? Well, you can pass a copy of it on
 the stack. That's hardly a viable solution when the read-only data is an entire
 font description, along with all the splines, hints, and so on :-}

You pass it with a pointer, and a friendly post-it saying "Don't Touch"

Right. Heh! There's the rub; you declare the structs as const, and then cast them over as a non-const * argument -- exactly what we ought to get a smack on the hand for -- perhaps we should endeavour to get away from that kind of behaviour, and I understand that's what D is largely about :-)

- Kris
Feb 09 2005
Anders F Björklund <afb algonet.se> writes:
Kris wrote:

You pass it with a pointer, and a friendly post-it saying "Don't Touch"

Right. Heh! There's the rub; you declare the structs as const, and then cast them over as a non-const * argument -- exactly what we ought to get a smack on the hand for -- perhaps we should endeavour to get away from that kind of behaviour, and I understand that's what D is largely about :-)

Good old C. Back when I learned it, we didn't use those pansy "const".
And none of this size_t and other portable crap, that wasn't defined.

 int strlen(char *s);

Of course, using the stdlib was for wusses too, so you usually ended up with some square wheel function like: (hidden by a macro or two)

 char *p = s;
 while (*p++);
 int len = p - s - 1;	/* p ends up one past the NUL */

Or just write it in assembler. Or punch cards or something like that. Glad that D lets me relive all these nostalgic computing moments.

Seriously, though. It's an improvement. Just not as big as I hoped? Which is too bad, since I'd hoped to keep avoiding C++ a while longer. I'll just keep using the trust method and hope someone comes up with something clever.

But I'm kinda tired of this long discussion by now, so I'll let it rest for a week. But I hope string literals become R/O!

Microsoft Visual C++:
 The /GF option enables the compiler to pool strings and place them in
 read-only memory. By placing the strings in read-only memory, the
 operating system does not need to swap that portion of memory. Instead,
 it can read the strings back from the image file. It is a good idea to
 do this as it saves pages of memory from being written to and therefore
 reduces the working set used by the application. In addition, it allows
 those pages to be shared between multiple instances of the process that
 use that image file (.exe or .dll file), further reducing total memory
 usage in the entire system. Strings placed in read-only memory cannot be
 modified; if you try to modify them, you will see an Application Error
 dialog box.

They are already on my preferred systems, so I'll let you catch up. :-) --anders
Feb 09 2005
Kris <Kris_member pathlink.com> writes:
In article <cue7jd$h35$2 digitaldaemon.com>,
Anders F Björklund says...
Kris wrote:

You pass it with a pointer, and a friendly post-it saying "Don't Touch"

Right. Heh! There's the rub; you declare the structs as const, and then cast them over as a non-const * argument -- exactly what we ought to get a smack on the hand for -- perhaps we should endeavour to get away from that kind of behaviour, and I understand that's what D is largely about :-)

Good old C. Back when I learned it, we didn't use those pansy "const".
And none of this size_t and other portable crap, that wasn't defined.

int strlen(char *s);

You placed the type sig *inside* the parens? Where it was readable? Ehhhh ... ya pansy ... :~)
Feb 09 2005
Anders F Björklund <afb algonet.se> writes:
Kris wrote:

Good old C. Back when I learned it, we didn't use those pansy "const".
And none of this size_t and other portable crap, that wasn't defined.

int strlen(char *s);

You placed the type sig *inside* the parens? Where it was readable? Ehhhh ... ya pansy ... :~)

I'm not *that* old, and the punch cards were actually my dad's :-)

Besides, the K & R declaration of arguments is the only explanation of their brace style, that uses one set for functions and one for ifs...

 int strlen(s)
 char *s;
 {
   if (0) {
   }
 }

--anders
Feb 09 2005
"Regan Heath" <regan netwin.co.nz> writes:
I'm doing a double-whammy reply here to both Carlos and Kris :)

On Wed, 9 Feb 2005 22:39:14 +0000 (UTC), Kris <Kris_member pathlink.com>  
wrote:
 In article <cue0ed$9fs$1 digitaldaemon.com>, Carlos Santander B. says...
 Regan Heath wrote:
 I have made a suggestion before and I still think it's a good idea, why
 not use the in/out/inout parameter specifiers to enforce const'ness. i.e.

 'in' (default) = cannot be written to.
 'out'   = can be written to, initialised upon entry.
 'inout' = can be written to, not init upon entry.

 Meaning if a function has 'out' or 'inout' it's clearly saying I need
 to write to this, so passing a const char[] will cause a compile error.


 Regan

I agree with this idea, and others have agreed too. But I really can't recall Walter ever saying something about it.


--Carlos--

Neither can I.

--Kris--
 It has some merit, yet does not cover return values. If the syntax were somehow
 extended to return-values, then there might be something.

Or, could we decide that it wasn't worth the effort/complexity/etc. to extend it to return values and have functions use out or inout instead? I am just speculating, I have no idea whether it is or isn't worth the effort.

I find the out and inout idea more flexible than a return value, which I would tend to use for a true/false pass/fail concept in most cases. I can see how with a return value the following code:

 foo(a);
 bar(a);

can be re-written as

 bar(foo(a));

The question is, is that an advantage or a disadvantage? Is this advantage worth the effort of extending the idea to return values?
 But then there's the long-standing problem between inout and read-only structs:

 One often uses structs to gather read-only reference-data together. Such
 read-only data would typically be placed in ROM, for any device that has that
 kind of memory.

 How does one pass this data to a function? Well, you can pass a copy of it on
 the stack. That's hardly a viable solution when the read-only data is an entire
 font description, along with all the splines, hints, and so on :-}

If what you're saying is that the current implementation of 'in', which copies the parameter, is a bad way to pass a struct, I agree.

How about if 'in' meant 'readonly'? The compiler could enforce that (compile time and/or runtime), then 'in' could pass the struct by reference. I understand that in some situations passing a copy is actually faster than a reference, in which case the compiler can still choose to pass a copy, right?

<snip>
 I know the 'inout' problem with structs is somewhat of a sidetrack here; but it
 is related, so was worth noting.

I agree, it's related, and is an important part of the overall solution IMO.

Regan
Feb 09 2005
Ben Hinkle <Ben_member pathlink.com> writes:
* C-o-W, as in http://www.digitalmars.com/d/phobos.html#string :
 When a function takes a string as a parameter, and returns a string, is
 that string the same as the input string, modified in place, or is it a
 modified copy of the input string? The D array convention is
 "copy-on-write". This means that if no modifications are done, the
 original string (or slices of it) can be returned. If any modifications
 are done, the returned string is a copy.


I should add that a dlint program could do some flow analysis for simple cases and generate recommendations when COW is not being obeyed. For example if an input to a function is changed in-place then dlint could flag that. It shouldn't be an error since passing char[] as buffers to be filled is a common practise. Dlint could also flag if a string literal is changed in-place. One problem with such recommendations, though, is that they can be hard to track. I would expect dlint to only do such flow analysis within a given function and passing a string literal to another function as a buffer would be beyond such simple analysis.
Feb 07 2005
Kris <Kris_member pathlink.com> writes:
In article <cu7m7d$1qur$1 digitaldaemon.com>,
Anders F Björklund says...
Mango has some weird boolean flag instead, but that's OK too :-)
http://svn.dsource.org/svn/projects/mango/trunk/doc/html/classUString.html
(but I think the idea is that UText is immutable and UString is mutable)

Just to clarify: UText is immutable, and its subclass UString is /optionally/ mutable. The latter has a boolean flag to indicate whether it's safe to alias the assigned content (where said content is already immutable). UString defaults to assuming the content is mutable, and will therefore copy the assigned content. Note that UString can be passed in place of a UText argument, but not the other way around.

I'm totally with you on this, Anders. Immutable objects, backed up by their mutable variation, are the way to go for multi-threaded apps. Heck, any design can benefit (in terms of robustness and determinism) from appropriate usage of immutable objects.

Note, however, that D does not really support immutable arrays per se. For example, one cannot declare a "const char[]" and expect the compiler to toss an error wherever an assignment is made. Walter has been asked to provide support for such notions (read-only variables) but there's been no movement on that as yet.

Please note that read-only variables are not the same issue as the generic usage of 'const' within C++!! That's a different ball-game altogether and much, much harder to implement.

- Kris
Feb 07 2005
Anders F Björklund <afb algonet.se> writes:
Kris wrote:

 I'm totally with you on this, Anders. Immutable objects, backed up by their
 mutable variation, are the way to go for multi-threaded apps. Heck, any design
 can benefit (in terms of robustness and determinism) from appropriate usage of
 immutable objects.

Passing around pointers tends to be a tad faster than copying, as well. But the inheritance way is somewhat dangerous, when it comes to threads. (there's always a risk the "immutable" copy is a mutable in disguise...)
 Note, however, that D does not really support immutable arrays per se. For
 example, one cannot declare a "const char[]" and expect the compiler to toss an
 error wherever an assignment is made. Walter has been asked to provide support
 for such notions (read-only variables) but there's been no movement on that as
 yet.

No, Copy-on-Write uses the honor system more than any actual checks?

Read-only variables (similar to C++ "const") sound like a neat idea...
I'd settle for read-only string literals, as a small start towards it.

--anders
Feb 07 2005
Kris <Kris_member pathlink.com> writes:
In article <cu8cog$62m$1 digitaldaemon.com>,
Anders F Björklund says...
But the inheritance way is somewhat dangerous, when it comes to threads.
(there's always a risk the "immutable" copy is a mutable in disguise...)

Aye; the assertion is that the callee cannot modify the internals of the object passed to it. It does not account for the case where the caller manipulates said Object concurrently with the invocation of said callee. For that, /both/ parties have to agree on immutability.

Luckily, the vast majority of cases fall into the former camp (in my experience), so we can allow for some flexibility via the subclassing mechanism. This tends to avoid the backlash that some have regarding Java strings (in terms of object reconstruction for passing to a "I'm a safe procedure!" callee). Of course, that's just my opinion :-)
 Note, however, that D does not really support immutable arrays per se. For
 example, one cannot declare a "const char[]" and expect the compiler to toss an
 error wherever an assignment is made. Walter has been asked to provide support
 for such notions (read-only variables) but there's been no movement on that as
 yet.

No, Copy-on-Write uses the honor system more than any actual checks ?

That's correct. Which smacks of total hypocrisy given Walter's recent 'urgent claims' over how D protects the programmer from themselves :-)
Read-only variables (similar to C++ "const") sound like a neat idea...
I'd settle for read-only string literals, as a small start towards it.

Agreed. I'd just like to see it done in an extensible and forward thinking manner - Kris
Feb 07 2005
parent reply Anders F Björklund <afb algonet.se> writes:
Kris wrote:

No, Copy-on-Write uses the honor system more than any actual checks ?

That's correct. Which smacks of total hypocrisy given Walter's recent 'urgent claims' over how D protects the programmer from themselves :-)

I thought it was Java that was the PG-13 version, and that D *let* you do adult stuff like goto or asm or pointers or mixing ints and booleans :-D
Read-only variables (similar to C++ "const") sound like a neat idea...
I'd settle for read-only string literals, as a small start towards it.

Agreed. I'd just like to see it done in an extensible and forward thinking manner

With implicit casts from char[] to (char*), I wouldn't hold my breath ? Just want everyone's Windows literals to crash, like my Mac ones do. :-) In your case, I'd just warm up my "return contents.dup;" workarounds... --anders
Feb 07 2005
parent reply Kris <Kris_member pathlink.com> writes:
In article <cu8fvf$csa$1 digitaldaemon.com>,
Anders F Björklund says...
In your case, I'd just warm up my "return contents.dup;" workarounds...

AieeeeeeeEEEEE!! Never! Nooooooo; Noooooo! I'd rather chop down a tree with, a Herring! Truthfully, I don't want Mango libraries getting a reputation for being inefficient, just to work around a glaring omission within an alpha language. One would hope Walter will recognize the validity of read-only vars, and do something about that instead. (keep the pressure on!)
Feb 07 2005
parent reply Anders F Björklund <afb algonet.se> writes:
Kris wrote:

 AieeeeeeeEEEEE!! Never! Nooooooo; Noooooo!
 I'd rather chop down a tree with, a Herring!

Now you're starting to sound like those people that refuse to recognize that false and 0 and null are the same thing ? They made the same kinds of noises, when it was "decided" on the C/integer logic that D uses. :-) (but at least one *can* use bool and true and false, and make-believe ?)
 Truthfully, I don't want Mango libraries getting a reputation for being
 inefficient, just to work around a glaring omission within an alpha language.
 One would hope Walter will recognize the validity of read-only vars, and do
 something about that instead.

Actually, to keep within the D spirit you should probably pass around char[] like everyone else instead of this "wchar[]-in-a-Class" stuff ;-) Never mind that Phobos only has library support for ASCII strings... (as in, when it comes to things like strlen and toupper and whatnot) --anders
Feb 07 2005
parent reply "Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:
"Anders F Björklund" <afb algonet.se> wrote in message 
news:cu8jvo$jto$1 digitaldaemon.com...
 Kris wrote:

 AieeeeeeeEEEEE!! Never! Nooooooo; Noooooo!
 I'd rather chop down a tree with, a Herring!

Now you're starting to sound like those people that refuse to recognize that false and 0 and null are the same thing ?

Well, if you'll turn to page 227 in your copy of Imperfect C++ you'll see that NULL need not be the same as 0, and the usefulness of that. <g>
Feb 08 2005
parent reply Anders F Björklund <afb algonet.se> writes:
Matthew wrote:

Now you're starting to sound like those people that refuse to 
recognize that false and 0 and null are the same thing ?

Well, if you'll turn to page 227 in your copy of Imperfect C++ you'll see that NULL need not be the same as 0, and the usefulness of that. <g>

Like you go on about that book of yours I will soon have to buy it ;-)

Walter has explained that they all mean "low voltage" or "open gate" (I forgot which one it was, could have been empty radio tube or no rock)

So there is no need for D to separate between them, for things like ifs? Which means to write (!object) instead of the bulky (!(object is null))
 YIN: if(false), if(0), if(null)
 YANG: if(true), if(1), if(this)

Whether or not I think it's a good idea doesn't matter, since it doesn't seem to be changing... And "since it worked for C / C++" --anders
Feb 08 2005
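The two spellings of the null test discussed above can be put side by side. This is only a sketch, assuming a current D2 compiler rather than the 2005 one used in the thread; `Node`, `isSetExplicit` and `isSetTerse` are names made up for the illustration.

```d
import std.stdio : writeln;

class Node {}   // hypothetical class, only here to have a reference type

/// Long form: explicit identity comparison against null.
bool isSetExplicit(Node n)
{
    return !(n is null);
}

/// Short form: a class reference used as a condition is true when non-null.
bool isSetTerse(Node n)
{
    if (n)
        return true;
    return false;
}

void main()
{
    Node none = null;
    Node some = new Node;
    writeln(isSetExplicit(none), " ", isSetTerse(none)); // false false
    writeln(isSetExplicit(some), " ", isSetTerse(some)); // true true
}
```

Both functions agree for every input; the difference is purely one of notation. Current compilers also accept `n !is null` for the explicit form.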
parent reply "Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:
"Anders F Björklund" <afb algonet.se> wrote in message 
news:cub8hb$kle$1 digitaldaemon.com...
 Matthew wrote:

Now you're starting to sound like those people that refuse to 
recognize that false and 0 and null are the same thing ?

Well, if you'll turn to page 227 in your copy of Imperfect C++ you'll see that NULL need not be the same as 0, and the usefulness of that. <g>

Like you go on about that book of yours I will soon have to buy it ;-)

Aha! So it's working then ...
 Walter has explained that they all mean "low voltage" or "open gate"
 (I forgot which one it was, could have been empty radio tube or no 
 rock)

 So there is no need for D to separate between them, for things like 
 ifs?
 Which means to write (!object) instead of the bulky (!(object is 
 null))

 YIN: if(false), if(0), if(null)
 YANG: if(true), if(1), if(this)

 Whether or not I think it's a good idea doesn't matter, since it
 doesn't seem to be changing... And "since it worked for C / C++"

Well, I know that the no-implicitly-boolean sub-expressions is never gonna fly, so I won't bother to explain why I don't use 'em. (That's also in the book <g>)
Feb 08 2005
parent Anders F Björklund <afb algonet.se> writes:
Matthew wrote:

Like you go on about that book of yours I will soon have to buy it ;-)

Aha! So it's working then ...

Even if I don't use C++ it's a perfectly legitimate business expense :-)
Whether or not I think it's a good idea doesn't matter, since it
doesn't seem to be changing... And "since it worked for C / C++"

Well, I know that the no-implicitly-boolean sub-expressions is never gonna fly, so I won't bother to explain why I don't use 'em. (That's also in the book <g>)

I can't say I'm crazy about going back to writing old-school "C" boolean expressions again; but then again if (!(object is null)) is going to be a real eye-sore, now that (object !== null) seems to have been deprecated due to being confused with regular != And since "isnot" probably won't fly either, then that leaves:
 assert(object);

And of course then there is the need to use wbit and dbit, when writing a) things that need pointers or b) overloads:
 wbit[] array = new wbit[8192];
 wbit* p = &array[42];
 *p = true;
 dbit opEquals(Object o);

They could even be faster than the regular old bit type, since they avoid the masking and shifting the other one could need ? Probably even more fun for beginners than the string types, char[] wchar[] dchar[] But I guess "alias char[] str; alias wchar[] ustr;" could be made to work, just as "alias bit bool;" has already ?

The "bool" (boolean) implementation details of bit/wbit/dbit and "str" (string) of char[]/wchar[]/dchar[] could be saved for the more advanced D tutorials, when the first ones get nasty:
 - "What do you mean I can't slice my bool arrays how I want ?"
 - "What do you mean with I must use dchar to foreach my str ?"

At least they have a common theme: Zero-is-False and Unicode. --anders

PS. I changed my old suggestion of "string", since C++ people keep mixing that alias up with the old std::string class. The new name, in spirit of char and int, is just: "str".
 int main(str[] args);
 void main();
Feb 08 2005
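The "must use dchar to foreach my str" remark is easier to see with a count. A minimal sketch, assuming a current D2 compiler; `codeUnits` and `codePoints` are hypothetical helper names, and the non-ASCII letter is written as an escape so the example does not depend on the source file's encoding.

```d
import std.stdio : writeln;

/// Counts UTF-8 code units: what .length and a plain char loop see.
size_t codeUnits(const(char)[] s)
{
    size_t n;
    foreach (char c; s)
        ++n;
    return n;
}

/// Counts code points: asking foreach for dchar decodes the UTF-8 on the fly.
size_t codePoints(const(char)[] s)
{
    size_t n;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    const(char)[] name = "Bj\u00F6rklund";   // the o-umlaut takes two code units
    writeln(codeUnits(name), " ", codePoints(name)); // 10 9
}
```

For pure ASCII text the two counts are identical, which is why the difference tends to surface late.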
prev sibling parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Mon, 7 Feb 2005 18:15:13 +0000 (UTC), Kris <Kris_member pathlink.com>  
wrote:

<snip>

 Note, however, that D does not really support immutable arrays per se.
 For example, one cannot declare a "const char[]" and expect the compiler  
 to toss an error wherever an assignment is made. Walter has been asked  
 to provide support for such notions (read-only variables) but there's  
 been no movement on that as yet.

Last time I checked "const char[]" made the char[] 'reference' const, not the contents of the array. I agree some mechanism for specifying const data would be useful. In my experience linux and windows treat static strings differently, just the other day I found a bug caused by writing to a static string, it was working fine on windows :) Regan
Feb 07 2005
parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"Regan Heath" <regan netwin.co.nz> wrote in message 
news:opslugtqx523k2f5 ally...
 On Mon, 7 Feb 2005 18:15:13 +0000 (UTC), Kris <Kris_member pathlink.com> 
 wrote:

 <snip>

 Note, however, that D does not really support immutable arrays per se.
 For example, one cannot declare a "const char[]" and expect the compiler 
 to toss an error wherever an assignment is made. Walter has been asked 
 to provide support for such notions (read-only variables) but there's 
 been no movement on that as yet.

Last time I checked "const char[]" made the char[] 'reference' const, not the contents of the array.

I thought that would be "const char * const". Putting const after the ptr makes the ptr const and putting it before makes the contents const. I love C++ member functions that look like const char * foo(const char * const x) const;
 I agree some mechanism for specifying const data would be useful.

 In my experience linux and windows treat static strings differently, just 
 the other day I found a bug caused by writing to a static string, it was 
 working fine on windows :)

 Regan 

Feb 07 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Mon, 7 Feb 2005 15:55:11 -0500, Ben Hinkle <bhinkle mathworks.com>  
wrote:
 "Regan Heath" <regan netwin.co.nz> wrote in message
 news:opslugtqx523k2f5 ally...
 On Mon, 7 Feb 2005 18:15:13 +0000 (UTC), Kris <Kris_member pathlink.com>
 wrote:

 <snip>

 Note, however, that D does not really support immutable arrays per se.
 For example, one cannot declare a "const char[]" and expect the  
 compiler
 to toss an error wherever an assignment is made. Walter has been asked
 to provide support for such notions (read-only variables) but there's
 been no movement on that as yet.

Last time I checked "const char[]" made the char[] 'reference' const, not the contents of the array.

I thought that would be "const char * const". Putting const after the ptr makes the ptr const and putting it before makes the contents const.

I'm not sure how it works with pointers in D, in the spirit of leaving them behind where at all possible I've never tried it :) So how would the syntax look for a char[]?
 const char[] foo = "this is a const reference";
 char[] const foo = "this is const data";
 const char[] const foo = "this is immutable";

Is there any point to "char[] const foo" i.e. the data is immutable but the reference may be changed.. wouldn't that mean it was possible to 'lose' track of where the const data was? would it then be collected by the GC, or should const data hang round till program termination?
 I love
 C++ member functions that look like
  const char * foo(const char * const x) const;

Do I detect a hint of sarcasm there... :) Regan
Feb 07 2005
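The "const reference" versus "const contents" question above can be tried out directly. The sketch below assumes a current D2 compiler, whose const rules postdate this thread and are not the syntax proposed in it: `const(char)[]` makes only the contents read-only, while `const char[]` covers the reference as well. `firstWord` is a hypothetical helper.

```d
import std.stdio : writeln;

/// Accepts a read-only view; may re-slice it but cannot write through it.
const(char)[] firstWord(const(char)[] text)
{
    static assert(!__traits(compiles, text[0] = 'x')); // contents are const
    foreach (i, c; text)
        if (c == ' ')
            return text[0 .. i];
    return text;
}

void main()
{
    char[] buf = "const data".dup;

    const(char)[] view = buf;   // read-only contents, rebindable reference
    view = view[6 .. $];        // fine: only the view changes

    const char[] fixed = buf;   // reference and contents both const
    static assert(!__traits(compiles, fixed = view));
    static assert(!__traits(compiles, fixed[0] = 'x'));

    buf[0] = 'C';               // const restricts the view, not the owner
    writeln(fixed, " / ", view, " / ", firstWord(fixed)); // Const data / data / Const
}
```

The last write shows the limit of const as used here: it promises that this reference will not modify the data, not that nobody else will.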
parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"Regan Heath" <regan netwin.co.nz> wrote in message 
news:opsluj1ea523k2f5 ally...
 On Mon, 7 Feb 2005 15:55:11 -0500, Ben Hinkle <bhinkle mathworks.com> 
 wrote:
 "Regan Heath" <regan netwin.co.nz> wrote in message
 news:opslugtqx523k2f5 ally...
 On Mon, 7 Feb 2005 18:15:13 +0000 (UTC), Kris <Kris_member pathlink.com>
 wrote:

 <snip>

 Note, however, that D does not really support immutable arrays per se.
 For example, one cannot declare a "const char[]" and expect the 
 compiler
 to toss an error wherever an assignment is made. Walter has been asked
 to provide support for such notions (read-only variables) but there's
 been no movement on that as yet.

Last time I checked "const char[]" made the char[] 'reference' const, not the contents of the array.

I thought that would be "const char * const". Putting const after the ptr makes the ptr const and putting it before makes the contents const.

I'm not sure how it works with pointers in D, in the spirit of leaving them behind where at all possible I've never tried it :) So how would the syntax look for a char[]?
 const char[] foo = "this is a const reference";
 char[] const foo = "this is const data";
 const char[] const foo = "this is immutable";

Is there any point to "char[] const foo" i.e. the data is immutable but the reference may be changed.. wouldn't that mean it was possible to 'lose' track of where the const data was? would it then be collected by the GC, or should const data hang round till program termination?

Sorry about the * vs [] - they should be the same in the C/C++ world. I'm just more used to writing const char * than const char[] in C++. That whole extra character wears out my pinky. Maybe I need to work out some more :-)
 I love
 C++ member functions that look like
  const char * foo(const char * const x) const;

Do I detect a hint of sarcasm there... :)

guilty. In some cases D's in/out invariants are better than const since it's obvious what they mean and in/out invariants can check for more complex invariants than just const-ness. The downside is that they are applied at run-time and they don't apply within a function body.
 Regan 

Feb 07 2005
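The point about contracts checking "more complex invariants than just const-ness" might look like this in practice. A sketch only, assuming a current D2 compiler (which spells the body keyword `do`); `replaced` is a hypothetical function.

```d
import std.stdio : writeln;

/// Replaces every occurrence of `from` in a private copy of `text`.
char[] replaced(const(char)[] text, char from, char to)
in
{
    assert(from != to, "nothing to replace");
}
out (result)
{
    // A richer guarantee than const-ness: same length, no `from` left over.
    assert(result.length == text.length);
    foreach (c; result)
        assert(c != from);
}
do
{
    char[] copy = text.dup;   // the caller's data is left alone
    foreach (ref c; copy)
        if (c == from)
            c = to;
    return copy;
}

void main()
{
    writeln(replaced("a-b-c", '-', '+')); // a+b+c
}
```

As the post says, these checks run when the function is called and returns, not at compile time, and they say nothing about what happens in the middle of the body.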
parent "Regan Heath" <regan netwin.co.nz> writes:
On Mon, 7 Feb 2005 17:11:03 -0500, Ben Hinkle <bhinkle mathworks.com>  
wrote:
 "Regan Heath" <regan netwin.co.nz> wrote in message
 I love
 C++ member functions that look like
  const char * foo(const char * const x) const;

Do I detect a hint of sarcasm there... :)

guilty. In some cases D's in/out invariants are better than const since it's obvious what they mean and in/out invariants can check for more complex invariants than just const-ness. The downside is that they are applied at run-time and they don't apply within a function body.

I assume you're referring to in/out contracts rather than in/out parameter specifiers. I'd prefer if const-ness was enforced by the parameter specifiers eg.
 void foo(in int a, out int b, inout int c) {
   a = 5; // error;
   b = 5; // ok;
   c = 5; // ok;
 }

 void main() {
   char[] const a = "a";
   char[] b = "b";
   foo(a,b,b); // ok.
   foo(a,a,b); // error 'a' is const.
   foo(a,b,a); // error 'a' is const.
 }

in other words the 'in' parameter specifier is a contract stating I will not modify this reference or its data. or perhaps we need to separate those two, leading us to:
 void foo(in int in a, ..)

yuck. Regan
Feb 07 2005
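A compile-time check along the lines proposed above can be sketched with a current D2 compiler, where `in` parameters are const and the old `inout` parameter class is spelled `ref`. This is an assumption about later compilers, not the 2005 behaviour; `foo` follows the post, `shout` is a hypothetical second example.

```d
import std.stdio : writeln;

void foo(in int a, out int b, ref int c)
{
    // Assigning to an `in` parameter is rejected at compile time.
    static assert(!__traits(compiles, a = 5));
    b = a + 1;   // ok: `out` parameters exist to be written
    c += a;      // ok: `ref` parameters may be read and written
}

/// `in` covers both the reference and the data it points at.
void shout(in char[] src, char[] dst)
{
    static assert(!__traits(compiles, src[0] = 'x'));   // data is read-only
    static assert(!__traits(compiles, src = dst));      // so is the reference
    assert(dst.length >= src.length);
    foreach (i, c; src)
        dst[i] = (c >= 'a' && c <= 'z') ? cast(char)(c - 32) : c;
}

void main()
{
    int b, c = 10;
    foo(5, b, c);
    writeln(b, " ", c);   // 6 15

    char[] buf = new char[5];
    shout("hello", buf);
    writeln(buf);         // HELLO
}
```

The `static assert` lines only document what the compiler refuses; removing them changes nothing at run time.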
prev sibling parent reply "Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:
To summarise this issue, am I correct in saying that:

    writing to a slice may fail if somewhere along the way, through N 
slicings, the original string is a literal
    this failure occurs on Linux, because GDC puts literals in a 
read-only segment. It does _not_ fail on Win32 because they're in a 
writeable segment

Questions:

    does the language prescribe the Linux behaviour or the Win32 
behaviour? (It cannot leave it undefined, since D does not have 
undefineds)

Intermediate measures:

    make the Win32 compiler put in read-only segment

Possible solutions are

    leave it as is, with consequence of buyer beware, loss of the 
wonderful efficiency of slices, etc. etc. :-(
    make literals be 'const' somehow. This is likely to be a huge change 
to the language, and take us down the const road, which we know's been 
ruled out anyway.
    have it part of the language that literals are writeable. That would 
lead to only a tiny decrease in efficiency as people would need to dup 
their literals before passing them into functions which may alter them.



IMO, the following needs to happen:

    1. Determine whether the Win32 behaviour or the Linux behaviour is 
non-standard.
    2. If it's the Win32 behaviour, then it should be amended to have 
the crashes like the Linux does, so we get a feel for what this problem 
is like

I think the solution, weird as it sounds, is to make literals 
writeable. This, of course, depends on whether literals are folded (i.e. 
the same literal in two separate places in code actually refer to the 
same bit of memory after compilation/linking). Since literals are 
generally a bad thing, we should not be using them often. Given that, 
maybe we can salvage this situation by saying that literals are *not* 
folded and can be written to.

Sounds like heresy, I know, but IMO we cannot have such fundamental 
differences between platforms, we cannot have code that may or may not 
crash depending on whether the thing way up the call stack is a literal 
or not, and we *should not* lose the marvellous efficiency afforded by 
slices. (Slices are the best thing since sliced bread.)

Thoughts?

Cheers

Matthew
Feb 08 2005
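The summary above comes down to two facts: a slice shares memory with whatever it was sliced from, and duplicating before writing sidesteps the problem whatever the origin. A minimal sketch, assuming a current D2 compiler (where a literal has to be duplicated before it can be held as `char[]` at all); `masked` is a hypothetical helper.

```d
import std.stdio : writeln;

/// Returns a private, writable copy of `text` with text[from .. to]
/// overwritten by `fill`. The caller's data is never touched.
char[] masked(const(char)[] text, size_t from, size_t to, char fill)
{
    char[] copy = text.dup;   // copy-on-write: duplicate before changing
    copy[from .. to] = fill;  // slice assignment fills the half-open range
    return copy;
}

void main()
{
    // Heap-allocated buffer, so writing through it is always legal.
    char[] s1 = "hello world".dup;

    // A slice is a view: s2 shares memory with s1.
    char[] s2 = s1[6 .. 11];
    s2[3] = '?';
    writeln(s1);                    // hello wor?d

    // Duplicating first keeps the original intact.
    char[] s3 = s1[0 .. 5].dup;
    s3[0] = 'J';
    writeln(s1[0 .. 5], " ", s3);   // hello Jello

    writeln(masked("hello", 1, 3, '?')); // h??lo
}
```

The cost of the safe version is one allocation per write site, which is exactly the efficiency trade-off argued over in this thread.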
next sibling parent reply pragma <pragma_member pathlink.com> writes:
In article <cub6in$i6d$1 digitaldaemon.com>, Matthew says...
IMO, the following needs to happen:

    1. Determine whether the Win32 behaviour or the Linux behaviour is 
non-standard.
    2. If it's the Win32 behaviour, then it should be amended to have 
the crashes like the Linux does, so we get a feel for what this problem 
is like

I think the solution, weird as it sounds, is to make literals 
writeable. This, of course, depends on whether literals are folded (i.e. 
the same literal in two separate places in code actually refer to the 
same bit of memory after compilation/linking). Since literals are 
generally a bad thing, we should not be using them often. Given that, 
maybe we can salvage this situation by saying that literals are *not* 
folded and can be written to.

Sounds like heresy, I know, but IMO we cannot have such fundamental 
differences between platforms, we cannot have code that may or may not 
crash depending on whether the thing way up the call stack is a literal 
or not, and we *should not* lose the marvellous efficiency afforded by 
slices. (Slices are the best thing since sliced bread.)

Thoughts?

A few. ;) I'm with you in that a decision needs to be made to keep GDC in step with DMD. As to "which platform has the bug?", I don't know quite yet. I find myself leaning toward requiring literals to be given a 'const char[]' style type so that they're obviously read only in the language. This does have the nasty side-effect of requiring both a cast *and* a dup.
 // Look ma, it's C++ warmed over!
 const char[] literal = "literal string";
 char[] ugliness = cast(char[])literal.dup();

.. plus it's not 100% typesafe since we're allowing 'const' to be cast away. A better solution would be to still require 'const' for literal assignment, but allow for two additional properties: ".mutable" and ".immutable" to supply the means to work *with* the const-ness applied to the type.
 const char[] literal = "literal string";
 char[] not_so_ugly = literal.mutable;
 const char[] literal2 = not_so_ugly.immutable; // gets a copy

. where mutable() and immutable() perform an implied 'dup()' (where needed) and return the const-ness one would expect. Of course the easiest solution would be to get GDC to stick literals in the writable data segment, like DMD does, per your suggestion. Then it's back to 'programmer beware' which may not be that bad a situation. I'll add that I've already developed habits to avoid trouble with writing to slices. Typically, if I'm not sure if I'm using a slice or not, I just call dup() to be sure... it keeps things sane that way. - EricAnderton at yahoo
Feb 08 2005
parent reply "Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:
"pragma" <pragma_member pathlink.com> wrote in message
news:cub8af$kf4$1 digitaldaemon.com...
 In article <cub6in$i6d$1 digitaldaemon.com>, Matthew says...
IMO, the following needs to happen:

    1. Determine whether the Win32 behaviour or the Linux behaviour is
non-standard.
    2. If it's the Win32 behaviour, then it should be amended to have
the crashes like the Linux does, so we get a feel for what this
problem
is like

I think the solution, weird as it sounds, is to make literals
writeable. This, of course, depends on whether literals are folded
(i.e.
the same literal in two separate places in code actually refer to the
same bit of memory after compilation/linking). Since literals are
generally a bad thing, we should not be using them often. Given that,
maybe we can salvage this situation by saying that literals are *not*
folded and can be written to.

Sounds like heresy, I know, but IMO we cannot have such fundamental
differences between platforms, we cannot have code that may or may not
crash depending on whether the thing way up the call stack is a
literal
or not, and we *should not* lose the marvellous efficiency afforded by
slices. (Slices are the best thing since sliced bread.)

Thoughts?

A few. ;) I'm with you in that a decision needs to be made to keep GDC in step with DMD. As to "which platform has the bug?", I don't know quite yet. I find myself leaning toward requiring literals to be given a 'const char[]' style type so that they're obviously read only in the language. This does have the nasty side-effect of requiring both a cast *and* a dup.
 // Look ma, it's C++ warmed over!
 const char[] literal = "literal string";
 char[] ugliness = cast(char[])literal.dup();

.. plus it's not 100% typesafe since we're allowing 'const' to be cast away. A better solution would be to still require 'const' for literal assignment, but allow for two additional properties: ".mutable" and ".immutable" to supply the means to work *with* the const-ness applied to the type.
 const char[] literal = "literal string";
 char[] not_so_ugly = literal.mutable;
 const char[] literal2 = not_so_ugly.immutable; // gets a copy

. where mutable() and immutable() perform an implied 'dup()' (where needed) and return the const-ness one would expect. Of course the easiest solution would be to get GDC to stick literals in the writable data segment, like DMD does, per your suggestion. Then it's back to 'programmer beware' which may not be that bad a situation.

Yes. But the important point is that it's 'literal writer beware'. It's precisely because literals are best not (over)used anyway that I suggest this is the reasonable tactic. All the other alternatives - status quo, crashes, const, even your somewhat elegant mutable/immutable - either suck, or are complex, or would require huge changes. Walter, what are your thoughts on this?
Feb 08 2005
parent reply Kris <Kris_member pathlink.com> writes:
In article <cub9lc$mbe$3 digitaldaemon.com>, Matthew says...
"pragma" <pragma_member pathlink.com> wrote in message
news:cub8af$kf4$1 digitaldaemon.com...
 In article <cub6in$i6d$1 digitaldaemon.com>, Matthew says...
IMO, the following needs to happen:

    1. Determine whether the Win32 behaviour or the Linux behaviour is
non-standard.
    2. If it's the Win32 behaviour, then it should be amended to have
the crashes like the Linux does, so we get a feel for what this
problem
is like

I think the solution, weird as it sounds, is to make literals
writeable. This, of course, depends on whether literals are folded
(i.e.
the same literal in two separate places in code actually refer to the
same bit of memory after compilation/linking). Since literals are
generally a bad thing, we should not be using them often. Given that,
maybe we can salvage this situation by saying that literals are *not*
folded and can be written to.

Sounds like heresy, I know, but IMO we cannot have such fundamental
differences between platforms, we cannot have code that may or may not
crash depending on whether the thing way up the call stack is a
literal
or not, and we *should not* lose the marvellous efficiency afforded by
slices. (Slices are the best thing since sliced bread.)

Thoughts?

A few. ;) I'm with you in that a decision needs to be made to keep GDC in step with DMD. As to "which platform has the bug?", I don't know quite yet. I find myself leaning toward requiring literals to be given a 'const char[]' style type so that they're obviously read only in the language. This does have the nasty side-effect of requiring both a cast *and* a dup.
 // Look ma, it's C++ warmed over!
 const char[] literal = "literal string";
 char[] ugliness = cast(char[])literal.dup();

.. plus it's not 100% typesafe since we're allowing 'const' to be cast away. A better solution would be to still require 'const' for literal assignment, but allow for two additional properties: ".mutable" and ".immutable" to supply the means to work *with* the const-ness applied to the type.
 const char[] literal = "literal string";
 char[] not_so_ugly = literal.mutable;
 const char[] literal2 = not_so_ugly.immutable; // gets a copy

. where mutable() and immutable() perform an implied 'dup()' (where needed) and return the const-ness one would expect. Of course the easiest solution would be to get GDC to stick literals in the writable data segment, like DMD does, per your suggestion. Then it's back to 'programmer beware' which may not be that bad a situation.

Yes. But the important point is that it's 'literal writer beware'. It's precisely because literals are best not (over)used anyway that I suggest this is the reasonable tactic. All the other alternatives - status quo, crashes, const, even your somewhat elegant mutable/immutable - either suck, or are complex, or would require huge changes. Walter, what are your thoughts on this?

The concept of read-only data is a powerful one. It's borne out in practice through the usage of immutable objects. It's also been the backbone for placing reference data (think "fonts", "messages", "data structures" etc) into ROM since before many here were born. Thus, I truly believe D needs to support read-only data; the ICU wrappers could really make use of this, for example (to avoid .dup all over the place). Concurrent programming techniques, as I'm sure you will attest to Matthew, cry out for the ability to enforce immutable/read-only/reference status.

Note, however, that this is /not/ the same as the much-maligned, all singing, all dancing, C++ const! If the "readonly" attribute were present, D might imply all literals as readonly.

So what about writing to a string literal? Hands up all those who regularly write to a string literal? How many times, per year, do you do that? I don't expect there would be too many. If one wishes to modify a pre-populated array of chars, then one can do it like so:
 char[] label = ['m', 'y', ' ', 'l', 'a', 'b', 'e', 'l'];

or
 char[] _label = "my label";
 char[] label = _label.dup;

Given the available options, my opinion would be to retain the read-only status of literals (make DMD the same as GDC), and then introduce a readonly status for data; one that avoids all the complexity of the C++ const mega-notion. my 2C - Kris
Feb 08 2005
parent reply "Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:
"Kris" <Kris_member pathlink.com> wrote in message 
news:cubcuu$qd2$1 digitaldaemon.com...
 In article <cub9lc$mbe$3 digitaldaemon.com>, Matthew says...
"pragma" <pragma_member pathlink.com> wrote in message
news:cub8af$kf4$1 digitaldaemon.com...
 In article <cub6in$i6d$1 digitaldaemon.com>, Matthew says...
IMO, the following needs to happen:

    1. Determine whether the Win32 behaviour or the Linux behaviour 
 is
non-standard.
    2. If it's the Win32 behaviour, then it should be amended to 
 have
the crashes like the Linux does, so we get a feel for what this
problem
is like

I think the solution, weird as it sounds, is to make literals
writeable. This, of course, depends on whether literals are folded
(i.e.
the same literal in two separate places in code actually refer to 
the
same bit of memory after compilation/linking). Since literals are
generally a bad thing, we should not be using them often. Given 
that,
maybe we can salvage this situation by saying that literals are 
*not*
folded and can be written to.

Sounds like heresy, I know, but IMO we cannot have such fundamental
differences between platforms, we cannot have code that may or may 
not
crash depending on whether the thing way up the call stack is a
literal
or not, and we *should not* lose the marvellous efficiency afforded 
by
slices. (Slices are the best thing since sliced bread.)

Thoughts?

A few. ;) I'm with you in that a decision needs to be made to keep GDC in step with DMD. As to "which platform has the bug?", I don't know quite yet. I find myself leaning toward requiring literals to be given a 'const char[]' style type so that they're obviously read only in the language. This does have the nasty side-effect of requiring both a cast *and* a dup.
 // Look ma, it's C++ warmed over!
 const char[] literal = "literal string";
 char[] ugliness = cast(char[])literal.dup();

.. plus it's not 100% typesafe since we're allowing 'const' to be cast away. A better solution would be to still require 'const' for literal assignment, but allow for two additional properties: ".mutable" and ".immutable" to supply the means to work *with* the const-ness applied to the type.
 const char[] literal = "literal string";
 char[] not_so_ugly = literal.mutable;
 const char[] literal2 = not_so_ugly.immutable; // gets a copy

. where mutable() and immutable() perform an implied 'dup()' (where needed) and return the const-ness one would expect. Of course the easiest solution would be to get GDC to stick literals in the writable data segment, like DMD does, per your suggestion. Then it's back to 'programmer beware' which may not be that bad a situation.

Yes. But the important point is that it's 'literal writer beware'. It's precisely because literals are best not (over)used anyway that I suggest this is the reasonable tactic. All the other alternatives - status quo, crashes, const, even your somewhat elegant mutable/immutable - either suck, or are complex, or would require huge changes. Walter, what are your thoughts on this?

The concept of read-only data is a powerful one. It's borne out in practice through the usage of immutable objects. It's also been the backbone for placing reference data (think "fonts", "messages", "data structures" etc) into ROM since before many here were born. Thus, I truly believe D needs to support read-only data; the ICU wrappers could really make use of this, for example (to avoid .dup all over the place). Concurrent programming techniques, as I'm sure you will attest to Matthew, cry out for the ability to enforce immutable/read-only/reference status.

Indeed. I've just submitted my next "Flexible C++" instalment on that very issue (and the dangers of "logical constness"). ;)
 Note, however, that this is /not/ the same as the much-maligned, all 
 singing,
 all dancing, C++ const!

Agreed.
 If the "readonly" attribute were present, D might imply all literals 
 as
 readonly.

'const' should _always_ have been called readonly, IMO. And, yes, I've wanted a readonly in D for some years, along with many others.
 So what about writing to a string literal? Hands up all those who 
 regularly
 write to a string literal? How many times, per year, do you do that? I 
 don't
 expect there would be too many.

I _occasionally_ do so when I'm feeling lazy, although it's much more often that I do something like the following:
 char_type drv[] = { '?', ':', '\\', '\0' };
 drv[0] = 'A' + drive_index;

btw, this technique of declaring a C-string with aggregate syntax, rather than a literal, is very useful for writing multi-char-encoding templates, since it works just as well with wchar_t as char.
 If one wishes to modify a pre-populated array of chars, then one can do it like so:

 char[] label = ['m', 'y', ' ', 'l', 'a', 'b', 'e', 'l'];

 or

 char[] _label = "my label";
 char[] label = _label.dup;

 Given the available options, my opinion would be to retain the read-only status of literals (make DMD the same as GDC), and then introduce a readonly status for data; one that avoids all the complexity of the C++ const mega-notion.

So you mean:
    - on Win32 (and *all other platforms* that support it) literals go in a read-only segment
    - we have a readonly keyword
    - literals are implicitly readonly
    - one cannot slice from a readonly to a non-readonly, only .dup

That'd mean your code above would not compile. Rather it'd have to be:

 readonly char[] _label = "my label";
 char[] label = _label.dup;

or

 char[] label = "my label".dup;

I can buy that. Indeed, I think I like it. The downside is that it requires a new keyword - and we know how popular that's going to be! - but the complexity involved seems moderate. We just need to find out what Walter thinks.
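
A minimal sketch of how these rules would play out, assuming a present-day D compiler in which `immutable` stands in for the proposed `readonly` keyword (the keyword itself is only a proposal in this thread). The function name `makeLabel` is made up for illustration.

```d
import std.stdio;

// Hypothetical helper: returns a writable copy of a read-only literal.
char[] makeLabel()
{
    // Stand-in for: readonly char[] _label = "my label";
    immutable(char)[] _label = "my label";

    // char[] bad = _label;           // rejected at compile time: read-only
    //                                // data does not convert to writable
    // char[] worse = _label[0 .. 2]; // slicing does not help either

    char[] label = _label.dup;        // only .dup yields a writable array
    label[0] = 'M';                   // modifies the copy, not the literal
    return label;
}

void main()
{
    writeln(makeLabel());
}
```

The commented-out lines are the ones the proposal wants the compiler to reject; only the `.dup` path produces an array that may be written to.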
Feb 08 2005
next sibling parent reply Kris <Kris_member pathlink.com> writes:
In article <cube3u$rh8$1 digitaldaemon.com>, Matthew says...
I can buy that. Indeed, I think I like it. The downside is that it 
requires a new keyword - and we know how popular that's going to be! - 
but the complexity involved seems moderate.

As far as D goes, "const" and "readonly" should be interchangeable. Const int and const char, for example, are read-only ... Thus, rather than introduce another keyword, could we not use "const" rather than "readonly" for arrays also? - Kris
Feb 08 2005
next sibling parent "Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:
"Kris" <Kris_member pathlink.com> wrote in message
news:cubfpo$t46$1 digitaldaemon.com...
 In article <cube3u$rh8$1 digitaldaemon.com>, Matthew says...
I can buy that. Indeed, I think I like it. The downside is that it
requires a new keyword - and we know how popular that's going to be! -
but the complexity involved seems moderate.

As far as D goes, "const" and "readonly" should be interchangeable. Const int and const char, for example, are read-only ... Thus, rather than introduce another keyword, could we not use "const" rather than "readonly" for arrays also?

We could, but I think it'd be nicer to keep const for constants. Either way, though, the issue is a significant new bit of language. It is to that that I anticipate objections.
Feb 08 2005
prev sibling parent reply John Reimer <brk_6502 yahoo.com> writes:
Kris wrote:
 In article <cube3u$rh8$1 digitaldaemon.com>, Matthew says...
 
I can buy that. Indeed, I think I like it. The downside is that it 
requires a new keyword - and we know how popular that's going to be! - 
but the complexity involved seems moderate.

As far as D goes, "const" and "readonly" should be interchangeable. Const int and const char, for example, are read-only ... Thus, rather than introduce another keyword, could we not use "const" rather than "readonly" for arrays also? - Kris

I really, really like the idea of a "readonly" keyword. It seems so clear. The only disadvantage is that it caters to the English language programmers (but most of the keywords do anyway). At least it's not as bad as Pascal.
Feb 08 2005
parent reply "Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:
"John Reimer" <brk_6502 yahoo.com> wrote in message 
news:cubgm6$tjn$1 digitaldaemon.com...
 Kris wrote:
 In article <cube3u$rh8$1 digitaldaemon.com>, Matthew says...

I can buy that. Indeed, I think I like it. The downside is that it 
requires a new keyword - and we know how popular that's going to 
be! - but the complexity involved seems moderate.

As far as D goes, "const" and "readonly" should be interchangeable. Const int and const char, for example, are read-only ... Thus, rather than introduce another keyword, could we not use "const" rather than "readonly" for arrays also? - Kris

I really, really like the idea of a "readonly" keyword. It seems so clear. The only disadvantage is that it caters to the English language programmers (but most of the keywords do anyway). At least it's not as bad as Pascal.

Also, Walter hates const, so if we can divorce him from that emotionally, even a little, we might stand a chance of getting our (thoroughly technically worthy) point across. Anyway, whether it's const or readonly is somewhat irrelevant at this point. May I suggest that we use readonly in our code nuggets for the remainder of the debate, so as to keep the issue as clear (and demonstrably 'new') as possible? If it resolves to using the const keyword if/when it gets accepted, so be it.
Feb 08 2005
parent reply "Alex Stevenson" <ans104 cs.york.ac.uk> writes:
On Wed, 9 Feb 2005 10:09:25 +1100, Matthew  
<admin stlsoft.dot.dot.dot.dot.org> wrote:

 "John Reimer" <brk_6502 yahoo.com> wrote in message
 news:cubgm6$tjn$1 digitaldaemon.com...
 Kris wrote:
 In article <cube3u$rh8$1 digitaldaemon.com>, Matthew says...

 I can buy that. Indeed, I think I like it. The downside is that it
 requires a new keyword - and we know how popular that's going to
 be! - but the complexity involved seems moderate.

As far as D goes, "const" and "readonly" should be interchangeable. Const int and const char, for example, are read-only ... Thus, rather than introduce another keyword, could we not use "const" rather than "readonly" for arrays also? - Kris

I really, really like the idea of a "readonly" keyword. It seems so clear. The only disadvantage is that it caters to the English language programmers (but most of the keywords do anyway). At least it's not as bad as Pascal.

Also, Walter hates const, so if we can divorce him from that emotionally, even a little, we might stand a chance of getting our (thoroughly technically worthy) point across. Anyway, whether it's const or readonly is somewhat irrelevant at this point. May I suggest that we use readonly in our code nuggets for the remainder of the debate, so as to keep the issue as clear (and demonstrably 'new') as possible? If it resolves to using the const keyword if/when it gets accepted, so be it.

I hate to muddy the water still further, but how about the final keyword? AFAIK it only currently has meaning for class methods... Though one keyword doing double duty smells a little bad (static in C anyone?) -- Using Opera's revolutionary e-mail client: http://www.opera.com/m2/
Feb 08 2005
parent reply =?ISO-8859-15?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Alex Stevenson wrote:

 I hate to muddy the water still further, but how about the final 
 keyword?  AFAIK it only currently has meaning for class methods... 
 Though one  keyword doing double duty smells a little bad (static in C 
 anyone?)

"delete" in D anyone ? "extern" in C anyone ? "in" in D anyone ? Reusing keywords for wildly different things is part of the heritage. :-) --anders
Feb 08 2005
parent reply "Alex Stevenson" <ans104 cs.york.ac.uk> writes:
On Wed, 09 Feb 2005 00:37:45 +0100, Anders F Björklund <afb algonet.se>  
wrote:

 Alex Stevenson wrote:

 I hate to muddy the water still further, but how about the final  
 keyword?  AFAIK it only currently has meaning for class methods...  
 Though one  keyword doing double duty smells a little bad (static in C  
 anyone?)

"delete" in D anyone ? "extern" in C anyone ? "in" in D anyone ? Reusing keywords for wildly different things is part of the heritage. :-) --anders

Rape, pillage and the subjugation of innocent nations is part of my heritage (Englishman dontcherknow old boy...), but that doesn't mean I want to do them every day - only for special occasions. On reflection though, I think I'd prefer to reuse a keyword for two clearly separate things than take the Ada route and have ludicrous numbers of keywords. -- Using Opera's revolutionary e-mail client: http://www.opera.com/m2/
Feb 08 2005
parent reply "Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:
"Alex Stevenson" <ans104 cs.york.ac.uk> wrote in message 
news:opslwkgxwo08qma6 mjolnir.spamnet.local...
 On Wed, 09 Feb 2005 00:37:45 +0100, Anders F Björklund 
 <afb algonet.se>  wrote:

 Alex Stevenson wrote:

 I hate to muddy the water still further, but how about the final 
 keyword?  AFAIK it only currently has meaning for class methods... 
 Though one  keyword doing double duty smells a little bad (static in 
 C  anyone?)

"delete" in D anyone ? "extern" in C anyone ? "in" in D anyone ? Reusing keywords for wildly different things is part of the heritage. :-) --anders

Rape, pillage and the subjugation of innocent nations is part of my heritage (Englishman dontcherknow old boy...),

Don't forget parliamentary democracy and that most civilised practice: queueing. :-)
Feb 08 2005
parent "Alex Stevenson" <ans104 cs.york.ac.uk> writes:
On Wed, 9 Feb 2005 11:23:21 +1100, Matthew  
<admin stlsoft.dot.dot.dot.dot.org> wrote:

 "Alex Stevenson" <ans104 cs.york.ac.uk> wrote in message
 news:opslwkgxwo08qma6 mjolnir.spamnet.local...
 On Wed, 09 Feb 2005 00:37:45 +0100, Anders F Björklund
 <afb algonet.se>  wrote:

 Alex Stevenson wrote:

 I hate to muddy the water still further, but how about the final
 keyword?  AFAIK it only currently has meaning for class methods...
 Though one  keyword doing double duty smells a little bad (static in
 C  anyone?)

"delete" in D anyone ? "extern" in C anyone ? "in" in D anyone ? Reusing keywords for wildly different things is part of the heritage. :-) --anders

Rape, pillage and the subjugation of innocent nations is part of my heritage (Englishman dontcherknow old boy...),

Don't forget parliamentary democracy and that most civilised practice: queueing. :-)

Cricket. You forgot cricket. :-P As for Parliamentary Democracy, I'll leave that to Sir Humphrey Appleby from "Yes, Prime Minister": "Since 1832 we [the civil service] have been gradually excluding the voter from government, now we've got them to a point where they just vote once every five years for whichever bunch of buffoons will try to interfere with our policies..." -- Using Opera's revolutionary e-mail client: http://www.opera.com/m2/
Feb 08 2005
prev sibling parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Matthew wrote:

 So you mean:
     - on Win32 (and *all other platforms* (that support it) literals go 
 in a read-only segment)
     - we have a readonly keyword
     - literals are implicitly readonly.
     - one cannot slice from a readonly to a non-readonly, only .dup

Slicing could be allowed, only that the slice is readonly too ?

Q: Couldn't readonly be just another attribute to the arrays ?

 char[5] s;
   s.length = 5
   s.readonly = 0

 char[] a = new char[5];
   a.length = 5
   a.readonly = 0

 char[] l = "hello"
   l.length = 5
   l.readonly = 1

And you can set it to false, but not set it back to true again...

--anders
Feb 08 2005
next sibling parent =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
I wrote, too soon:

 Q: Couldn't readonly be just another attribute to the arrays ?
 
 char[5] s;
   s.length = 5
   s.readonly = 0
 
 char[] a = new char[5];
   a.length = 5
   a.readonly = 0
 
 char[] l = "hello"
   l.length = 5
   l.readonly = 1
 
 And you can set it to false, but not set it back to true again...

That should have read "you can set it to 1, but not back to 0". You would use .dup to return a new copy, that had readonly = 0. --anders
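
A rough sketch of this run-time flag idea, assuming a hand-written wrapper, since D arrays have no built-in `.readonly` property. The type `FlaggedChars` and all of its members are hypothetical names introduced only for illustration.

```d
import std.stdio;

// Hypothetical wrapper pairing a char[] with a run-time readonly flag.
struct FlaggedChars
{
    private char[] data;
    private bool ro;

    this(char[] data, bool ro = false)
    {
        this.data = data;
        this.ro = ro;
    }

    @property bool readonly() const { return ro; }

    // The flag can be set to 1, but not back to 0.
    @property void readonly(bool value)
    {
        if (ro && !value)
            throw new Exception("readonly cannot be cleared; use dup");
        ro = value;
    }

    @property size_t length() const { return data.length; }

    char opIndex(size_t i) const { return data[i]; }

    // Writes are checked at run time, not at compile time.
    void opIndexAssign(char c, size_t i)
    {
        if (ro)
            throw new Exception("write to readonly array");
        data[i] = c;
    }

    // A copy is always writable (readonly = 0).
    FlaggedChars dup() const { return FlaggedChars(data.dup, false); }
}

void main()
{
    auto l = FlaggedChars("hello".dup, true); // stands in for a literal
    auto copy = l.dup;
    copy[4] = '?';                            // fine: the copy is writable
    writeln(copy.readonly, " ", copy[4]);
}
```

As the reply below points out, a flag like this is checked while the program runs, so a bad write is still only caught at run time, just with an exception instead of an access fault.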
Feb 08 2005
prev sibling parent reply "Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:
"Anders F Björklund" <afb algonet.se> wrote in message 
news:cubfrd$t2j$1 digitaldaemon.com...
 Matthew wrote:

 So you mean:
     - on Win32 (and *all other platforms* (that support it) literals 
 go in a read-only segment)
     - we have a readonly keyword
     - literals are implicitly readonly.
     - one cannot slice from a readonly to a non-readonly, only .dup

Slicing could be allowed, only that the slice is readonly too ?

Q: Couldn't readonly be just another attribute to the arrays ?

 char[5] s;
   s.length = 5
   s.readonly = 0

 char[] a = new char[5];
   a.length = 5
   a.readonly = 0

 char[] l = "hello"
   l.length = 5
   l.readonly = 1

And you can set it to false, but not set it back to true again...

Yes, but it'd be invisible to the programmer looking at code. Also, it'd be runtime, rather than compile time, checking. Which we already have, in the form of the access fault (on Linux at least).
Feb 08 2005
parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Matthew wrote:

Q: Couldn't readonly be just a another attribute to the arrays ?

And you can set it to 1, but not set it back to 0 again...

Yes, but it'd be invisible to the programmer looking at code. Also, it'd be runtime, rather than compile time, checking. Which we already have, in the form of the access fault (on Linux at least).

But couldn't the compiler catch the obvious misuses ? (i.e. assigning stuff to the readonly string literals) It already catches if you try to e.g. assign a new length to a static array, and other such assignments ?

It would also solve Kris's problem of returning an array reference to his internal buffer, since he would:

 1) make a slice of the entire array: copy = buffer[]
 2) make the copy readonly, by: copy.readonly = true
 3) return the copy, which is now a "readonly" array
 4) it still doesn't protect others if *he* changes it, that's the same as with casting mutable->immutable

You could of course still abuse that by the built-in array-to-pointer conversion, but that's another story.

One could even extend the "in" keyword and param default to make arrays readonly, when not using "out" or "inout" ? It would also enforce Copy-on-Write, since you could still read the entire input without duplicating, but if you need to make any changes then you need to .dup first, or else the array would still be readonly...

i.e.

 void function1(char[] s);
 // inside this function, "s" now has the readonly flag set

i.e. function1 sets: s.readonly = 1, need to Copy-on-Write (must use .dup, as setting s.readonly = 0 is an error)

 void function2(inout char[] s);
 // but this function is free to modify the "s" param's chars

i.e. function2 does *not* modify s.readonly, which means that if the input was a literal it would still fail

--anders
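
The four steps for handing out an internal buffer can be sketched as follows, assuming a present-day D compiler where the type `const(char)[]` plays the role of the proposed readonly flag (a compile-time stand-in for what is described here as a run-time property). The class `Buffer` and its members are hypothetical.

```d
import std.stdio;

// Hypothetical class owning an internal buffer.
class Buffer
{
    private char[] buffer;

    this(string text) { buffer = text.dup; }

    // Steps 1-3: slice the whole array and return it as read-only.
    // No copy is made; the caller gets a view of the same memory.
    const(char)[] view() { return buffer[]; }

    // The owner can still write to its own data.
    void set(size_t i, char c) { buffer[i] = c; }
}

void main()
{
    auto b = new Buffer("abc");
    const(char)[] v = b.view();
    // v[0] = 'x';   // rejected at compile time: v is read-only
    b.set(0, 'x');   // step 4: the view still sees the owner's change
    writeln(v);
}
```

Step 4 is visible here: the read-only view does not protect the caller from changes the owner makes later, because both refer to the same memory.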
Feb 08 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Wed, 09 Feb 2005 00:26:38 +0100, Anders F Björklund <afb algonet.se>  
wrote:
 One could even extend the "in" keyword and param default
 to make arrays readonly, when not using "out" or "inout" ?

This is my preferred solution.
 - It is a compile time solution.
 - It is enforcement of a function's contract

i.e.

 void foo(in int a, out int b, inout int c)

says to me, I'm gonna read a, write b, and read/write c. If the function then writes to 'a' it violates that contract.

On the issue of the readonly memory, the current D behaviour is present in C (at least in the compilers I use). I think we can and should use both readonly and non readonly memory in different places, for different reasons, see example #2 below.

Examples:

1)
 void foo(in int a, out int b, inout int c) {
   a = 1; //error
   b = 1; //ok
   c = 1; //ok
 }

 void main() {
   readonly int ri = 5;
   int wi;

   foo(wi,wi,ri); //error, ri is readonly
   foo(wi,ri,wi); //error, ri is readonly
   foo(ri,wi,wi); //ok
 }

2)
 readonly char[] ro = "READ"; //data placed in readonly memory
 char[] rw = "READ/WRITE";    //data _not_ placed in readonly memory

 void foo(in char[] a, out char[] b) {
   readonly char[] c;

   b = a[0..3];     //error, a is readonly. (or implicit dup?)
   b = a[0..3].dup; //ok
   c = a[0..3];     //ok
   a[0] = 'a';      //error, a is readonly
   b[0] = 'a';      //ok
   c[0] = 'a';      //error, c is readonly
 }

 void main() {
   readonly char[] lro = "README"; //data placed in readonly memory
   char[] lrw = "READ/WRITE";      //data _not_ placed in readonly memory

   foo(ro,rw); //ok
   foo(rw,ro); //error, ro is readonly
   foo(rw,rw); //ok
 }

I realise this idea is very similar to the C/C++ const idea.

Regan
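
Example 2 can be approximated in a present-day D compiler, under the assumption that `const(char)[]` stands in for the proposed `readonly char[]` and that `in` makes the parameter read-only inside the function. This is a sketch of the idea, not the behaviour of the 2005 compiler discussed in this thread.

```d
import std.stdio;

void foo(in char[] a, out char[] b)
{
    const(char)[] c;       // stands in for: readonly char[] c;

    // b = a[0 .. 3];      // rejected: would drop the read-only qualifier
    b = a[0 .. 3].dup;     // ok: a writable copy
    c = a[0 .. 3];         // ok: c is a read-only view of a
    // a[0] = 'a';         // rejected: a is read-only here
    b[0] = 'a';            // ok
    // c[0] = 'a';         // rejected at compile time
}

void main()
{
    char[] rw = "READ/WRITE".dup;
    char[] result;
    foo(rw, result);
    writeln(result);
}
```

Because the read-only status is carried in the type, every line marked as an error in the example is caught when compiling rather than by a fault at run time.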
Feb 08 2005
parent reply =?UTF-8?B?QW5kZXJzIEYgQmrDtnJrbHVuZA==?= <afb algonet.se> writes:
Regan Heath wrote:

 One could even extend the "in" keyword and param default
 to make arrays readonly, when not using "out" or "inout" ?

This is my preferred solution. - It is a compile time solution. - It is enforcement of a functions contract i.e.

It would also work well to integrate literals...
 Examples:
 1)
 void foo(in int a, out int b, inout int c) {
   a = 1; //error
   b = 1; //ok
   c = 1; //ok
 }

I think you misunderstood something.

"in" means you get a copy.
"inout" means you get a reference. ("out" means it is .init-ed)

Thus, "a = 1" is not an error. It just doesn't affect anything, at least not outside the function (such as the argument passed).
    int a = 0, b = 0, c = 0;
    foo(a,b,c);            
    printf("a=%d b=%d c=%d\n", a, b, c);

a=0 b=1 c=1
     foo(1,2,3);            

constant 1 is not an lvalue
constant 2 is not an lvalue
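
A small sketch of the three parameter modes, assuming a present-day D compiler, where `ref` takes the place of the `inout` used in this thread and `in` makes the parameter const inside the function. The body of `foo` is changed slightly from the example above so that it compiles.

```d
import std.stdio;

void foo(in int a, out int b, ref int c)
{
    // a = 1;   // compile error today: an 'in' parameter is const
    b = a + 1;  // out: b was reset to int.init (0) on entry
    c = a + 1;  // ref: writes through to the caller's variable
}

void main()
{
    int a = 0, b = 5, c = 5;
    foo(a, b, c);
    writefln("a=%d b=%d c=%d", a, b, c); // a=0 b=1 c=1
    // foo(1, 2, 3); // rejected: literals are not lvalues for out/ref
}
```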
 void main() {
   readonly int ri = 5;
   int wi;
 
   foo(wi,wi,ri); //error, ri is readonly
   foo(wi,ri,wi); //error, ri is readonly
   foo(ri,wi,wi); //ok
 }

If you mean "const", why not just say that ? :-)

constant 5 is not an lvalue
 2)
 readonly char[] ro = "READ"; //data placed in readonly memory
 char[] rw = "READ/WRITE";    //data _not_ placed in readonly memory

Again, having such extra modifiers on strings is what made C++ const suck in the first place.

   char[] ro = "READ"; //data placed in readonly memory
   char[] rw = "READ/WRITE".dup;    //data _not_ placed in readonly memory

Since the read/write memory is now allocated by the trashman, it's not as easy to have it static anyway...

String literals are located in data section, as in C. (they are even '\u0000'-terminated, for usage with C)
 void foo(in char[] a, out char[] b) {
   readonly char[] c;

char[] c;
   b = a[0..3];     //error, a is readonly. (or implicit dup?)

most likely a compile error, since all info is known by then.
   b = a[0..3].dup; //ok
   c = a[0..3];     //ok

making c readonly. (since a was)
   a[0] = 'a';      //error, a is readonly

same as "hello"[0] = 'a';
   b[0] = 'a';      //ok

ok.
   c[0] = 'a';      //error, c is readonly

compiler doesn't know that, but you'd get a runtime error.

The tricky part is when you assign to the entire array.

 a = b; a = c; c = a; c = b; // and so on, and so forth

Not that it would be something you would do often, but it would probably not affect anything at all - just change the "pointer" of the array in question (not contents)
 void main() {
   readonly char[] lro = "README"; //data placed in readonly memory
   char[] lrw = "READ/WRITE";      //data _not_ placed in readonly memory

probably should be written as:

 const char[] ro = "README"; // read-only (literal)
 char[] rw = "READ/WRITE".dup; // read-write (copy)

But usually you can get away with:

 char[] s = "hello"; // this is read-only at run-time, but not at compile
 writefln("%s",s); // OK
 char[] t = s ~ " world"; // OK
 s[4] = '?'; // KABOOM; should have Copy-on-Write
 t[5] = '!'; // OK; since we made our own copy first
   foo(ro,rw); //ok
   foo(rw,ro); //error, ro is readonly
   foo(rw,rw); //ok

Note that most functions will have "in" parameters.

 char[] tolower(char[] s);
 char[] toupper(char[] s);
 I realise this idea is very similar to the C/C++ const idea.

And that is why it needs to die, I'm afraid. --anders
Feb 09 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Wed, 09 Feb 2005 10:02:06 +0100, Anders F Björklund <afb algonet.se>  
wrote:
 Regan Heath wrote:

 One could even extend the "in" keyword and param default
 to make arrays readonly, when not using "out" or "inout" ?

- It is a compile time solution. - It is enforcement of a functions contract i.e.

It would also work well to integrate literals...
 Examples:
 1)
 void foo(in int a, out int b, inout int c) {
   a = 1; //error
   b = 1; //ok
   c = 1; //ok
 }

I think you misunderstood something.

No, I understand how it works _now_ and I understand why it works that way. I think we can keep the advantages and add more.

The funny thing is if you pass a char[] you can modify the data it references but not the reference. eg.

# import std.stdio;
#
# void main() {
#   char[] a = "abc".dup;
#   char *p;
#
#   writefln(a.length,":=",a);
#   foo(a);
#   writefln(a.length,":=",a);
#   p = a.ptr;
#   writef(p[3]);
#   writef(p[4]);
#   writef(p[5]);
# }
#
# void foo(char[] a) {
#   a ~= "def";
#   writefln(a.length,":=",a);
# }

Does the GC collect "def" now? or is "def" tied to the same block as "abc" and thus hangs around till "abc" vanishes?
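
The point about modifying the data but not the reference can be shown without reading past the end of the array. This is a minimal sketch assuming a present-day D compiler; the function name `modify` is made up. Whether the append extends the existing block in place or allocates a new one is left to the runtime, so the sketch does not depend on it.

```d
import std.stdio;

void modify(char[] a)
{
    a[0] = 'X';   // writes through to the caller's characters
    a ~= "def";   // changes only the local slice (pointer and length)
}

void main()
{
    char[] a = "abc".dup;
    modify(a);
    writeln(a.length, ":=", a); // 3:=Xbc
}
```

The caller sees the changed first character but still has a length of 3, because the slice itself was passed by value.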
 "in" means you get a copy.
 "inout" means you get a reference. ("out" means it is .init-ed)

 Thus, "a = 1" is not an error. It just doesn't affect anything,
 at least not outside the function (such as the argument passed).

    int a = 0, b = 0, c = 0;
    foo(a,b,c);               printf("a=%d b=%d c=%d\n", a, b, c);

a=0 b=1 c=1
     foo(1,2,3);

constant 1 is not an lvalue constant 2 is not an lvalue
 void main() {
   readonly int ri = 5;
   int wi;
    foo(wi,wi,ri); //error, ri is readonly
   foo(wi,ri,wi); //error, ri is readonly
   foo(ri,wi,wi); //ok
 }

If you mean "const", why not just say that ? :-)

Because someone suggested 'readonly' was a better name, and I agree, for the concept I have in mind. A constant is something that can 'never' change. A readonly variable is simply a variable which is readonly in the current scope, i.e.

 char[] a; //not readonly
 foo(a);

 void foo(in char[] a) {
   //a is readonly here
 }
 constant 5 is not an lvalue

 2)
 readonly char[] ro = "READ"; //data placed in readonly memory
 char[] rw = "READ/WRITE";    //data _not_ placed in readonly memory

Again, having such extra modifiers on strings is what made C++ const suck in the first place.

C++ const sucked as a result of more than just 1 factor, this readonly is similar to part of the whole const thing, but it's not the whole const thing, if that makes any sense.

The 'readonly' above is 2 things;
1- a hint as to where the compiler can put the reference/data.
2- an indication that the programmer does not intend to modify the reference/data.

Allowing the compiler to optimise and error check.

I would be interested to see if the arguments against const apply to my idea here equally well or not, after all D is not C in many respects, and this idea is not exactly the same (I don't think). I must admit I do not know everything there is to know about the problems with const in C++, perhaps we should start with a description of them?
   char[] ro = "READ"; //data placed in readonly memory
   char[] rw = "READ/WRITE".dup;    //data _not_ placed in readonly memory

 Since the read/write memory is now allocated by the
 trashman, it's not as easy to have it static anyway...

It's simply an indication to the compiler, to the trashman, as to where it can put the memory if it so desires.
 String literals are located in data section, as in C.
 (they are even '\u0000'-terminated, for usage with C)

Sure, and on linux they're in readonly memory, and on windows they're in read/write memory.
 void foo(in char[] a, out char[] b) {
   readonly char[] c;

char[] c;
   b = a[0..3];     //error, a is readonly. (or implicit dup?)

most likely a compile error, since all info is known by then.

yep.
   b = a[0..3].dup; //ok
   c = a[0..3];     //ok

making c readonly. (since a was)

yep.
   a[0] = 'a';      //error, a is readonly

same as "hello"[0] = 'a';

yep.
   b[0] = 'a';      //ok

ok.
   c[0] = 'a';      //error, c is readonly

compiler doesn't know that,

It could in this simple example keep track of this during compile by flagging c as readonly above where it's assigned and erroring here. There may be more complex cases where it's not possible?
 but you'd get a runtime error.

yep.
 The tricky parts is when you assign to the entire array.
 a = b; a = c; c = a; c = b; // and so on, and so forth

 Not that it would something you would do often, but it
 would probably not affect anything at all - just change
 the "pointer" of the array in question (not contents)

I think in some cases (perhaps all) the compiler can keep track of these things by flagging variables as readonly or not during compile.
 void main() {
   readonly char[] lro = "README"; //data placed in readonly memory
   char[] lrw = "READ/WRITE";      //data _not_ placed in readonly memory

probably should be written as:

 const char[] ro = "README"; // read-only (literal)
 char[] rw = "READ/WRITE".dup; // read-write (copy)

Using my definition above of const and readonly I agree. However for the sake of less keywords we could re-use readonly.
 But usually you can get away with:
 char[] s = "hello"; // this is read-only at run-time, but not at compile

It depends where you decide to put "hello" by default. Someone has suggested it should go into readonly memory, some have suggested it shouldn't. If not, then 'readonly' can be seen as a hint to the compiler that it can optimise by placing the string in readonly memory, of course it can still choose not to, as long as we have compile and runtime checking of 'readonly' then it all works dandy. If not, then you as a programmer have more options. That said, if the most common case is that it's intended to be readonly then the default behaviour should follow that.
 writefln("%s",s); // OK
 char[] t = s ~ " world"; // OK
 s[4] = '?'; // KABOOM; should have Copy-on-Write
 t[5] = '!'; // OK; since we made our own copy first

   foo(ro,rw); //ok
   foo(rw,ro); //error, ro is readonly
   foo(rw,rw); //ok

Note that most functions will have "in" parameters.

 char[] tolower(char[] s);
 char[] toupper(char[] s);

Sure, meaning in my idea they will _not_ modify the input. If they want to they should be re-specified as 'inout'. It's all part of the function's contract.
 I realise this idea is very similar to the C/C++ const idea.

And that is why it needs to die, I'm afraid.

I said 'similar' not 'identical' and D is not C, D is different. I want to know whether the problems with C++ const apply equally here in D. Regan
Feb 09 2005
parent reply =?ISO-8859-15?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Regan Heath wrote:

 No, I understand how it works _now_ and I understand why it works that  
 way. I think we can keep the advantages and add more.

Okay.
 The funny thing is if you pass a char[] you can modify the data it  
 references but not the reference. eg.
 
 Does the GC collect "def" now? or is "def" tied to the same block as 
 "abc"  and thus hangs around till "abc" vanishes?

I think it'll clean it up, but my faith in the trashman is limited.

But isn't this the same as with pointers ? You can change the data pointed to, but you can't really change the pointer itself - since it is passed by value. It's also similar to how slices work, where one slice could operate on the data also pointed to by the other...
 If you mean "const", why not just say that ? :-)

Because someone suggested 'readonly' was a better name, and I agree, for the concept I have in mind.

I meant in the current implementation, as I'm sure you know.
 C++ const sucked as a result of more than just 1 factor, this readonly 
 is  similar to part of the whole const thing, but it's not the whole 
 const  thing, if that makes any sense.

Yeah it's similar to how "inout" is similar to &, but not equal to.
 The 'readonly' above is 2 things;
 1- a hint as to where the compiler can put the reference/data.
 2- an indication that the programmer does not intend to modify the reference/data.
 
 Allowing the compiler to optimise and error check.

Think I had about the same thing, but with a property instead.
 Since the read/write memory is now allocated by the
 trashman, it's not as easy to have it static anyway...

It's simply an indication to the compiler, to the trashman, as to where it can put the memory if it so desires.

There aren't that many different places to put data, at compile time.
 String literals are located in data section, as in C.
 (they are even '\u0000'-terminated, for usage with C)

Sure, and on linux they're in readonly memory, and on windows they're in read/write memory.

Linux and Darwin (and the rest of the GDC: FreeBSD, Solaris, etc)
 But usually you can get away with:
 char[] s = "hello"; // this is read-only at run-time, but not at compile

It depends where you decide to put "hello" by default. Someone has suggested it should go into readonly memory, some have suggested it shouldn't.

The suggestion was that since it is read-only on *some*, it would be more consistent if it could be made (forced) readonly on all ? That is, if it's still meant to be an easily portable language.
 Note that most functions will have "in" parameters.

 char[] tolower(char[] s);
 char[] toupper(char[] s);

Sure, meaning in my idea they will _not_ modify the input. If they want to they should be re-specified as 'inout'. It's all part of the functions contract.

I don't believe that it makes the slightest difference at the moment, which was why I thought it would be better if "in" changed something. Of course, that would imply that there is such a thing as a readonly. The only indication at the moment is a segfault when you write to it. --anders
Feb 09 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Wed, 09 Feb 2005 23:44:53 +0100, Anders F Björklund <afb algonet.se>  
wrote:
 Regan Heath wrote:
 The funny thing is if you pass a char[] you can modify the data it   
 references but not the reference. eg.
  Does the GC collect "def" now? or is "def" tied to the same block as  
 "abc"  and thus hangs around till "abc" vanishes?

I think it'll clean it up, but my faith in the trashman is limited. But isn't this the same as with pointers ? You can change the data pointed to, but you can't really change the pointer itself - since it is passed by value.

Yes, a copy of the pointer is passed.
 It's also similar to how slices work, where
 one slice could operate on the data also pointed to by the other...

True.
 If you mean "const", why not just say that ? :-)

for the concept I have in mind.

I meant in the current implementation, as I'm sure you know.

Oh, sorry. :)
 C++ const sucked as a result of more than just 1 factor, this readonly  
 is  similar to part of the whole const thing, but it's not the whole  
 const  thing, if that makes any sense.

Yeah it's similar to how "inout" is similar to &, but not equal to.

That is my gut feeling, I need to talk/think about it more to see if my gut is lying or not.
 String literals are located in data section, as in C.
 (they are even '\u0000'-terminated, for usage with C)

in read/write memory.

Linux and Darwin (and the rest of the GDC: FreeBSD, Solaris, etc)

Sorry, it's a bad habit of mine to say 'linux' when I should say 'unix' or something else which actually means what I mean.
 But usually you can get away with:
 char[] s = "hello"; // this is read-only at run-time, but not at compile

suggested it should go into readonly memory, some have suggested it shouldn't.

The suggestion was that since it is read-only on *some*, it would be more consistent if it could be made (forced) readonly on all ?

I agree it needs to be consistent. I'm on the fence as to whether it should be readonly or not readonly. I don't think we should make the decision based on what is easiest to do at this point in time, tho that is certainly ok for a _temporary_ solution (emphasis on temporary). It may turn out that the easy soln is the right one, it may not.
 That is, if it's still meant to be an easily portable language.

I agree this is important.
 Note that most functions will have "in" parameters.

 char[] tolower(char[] s);
 char[] toupper(char[] s);

want to they should be re-specified as 'inout'. It's all part of the functions contract.

I don't believe that it makes the slightest difference at the moment, which was why I thought it would be better if "in" changed something.

Sorry, I'm not sure what you mean here?
 Of course, that would imply that there is such a thing as a readonly.
 The only indication at the moment is a segfault when you write to it.

Indeed, some sort of indication before the segfault is preferred.

Regan
Feb 09 2005
parent Anders F Björklund <afb algonet.se> writes:
Regan Heath wrote:

  Sure, and on linux they're in readonly memory, and on windows 
 they're  in  read/write memory.

Linux and Darwin (and the rest of the GDC: FreeBSD, Solaris, etc)

Sorry, it's a bad habit of mine to say 'linux' when I should say 'unix' or something else which actually means what I mean.

Well, it seems to be a general "bad habit" - judging from the code... And this whole "Windows" vs "linux" casing issue has me queasy still.
 I don't believe that it makes the slightest difference at the moment,
 which was why I thought it would be better if "in" changed something.

Sorry, I'm not sure what you mean here?

We agree that it would be good if "in" and "out" actually did something. (right now it doesn't affect the char[] parameters, they're trust-based)
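To make the "trust-based" point concrete, here is a minimal sketch (the function name `shout` is made up for illustration). A plain `char[]` parameter shares the caller's data, so nothing in the signature stops the callee from writing through it:

```d
import std.stdio;

// Hypothetical helper: a char[] parameter is just a (pointer, length)
// pair, so a write through it lands in the caller's data.
void shout(char[] s)
{
    if (s.length > 0)
        s[0] = 'H';
}

void main()
{
    char[] msg = "hello".dup;   // .dup gives a writable heap copy
    shout(msg);
    writefln("%s", msg);        // the caller sees the change: Hello
}
```

Had `msg` been initialized from the literal without the `.dup`, the same call is the kind of write that segfaults on the read-only platforms discussed here.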
 Of course, that would imply that there is such a thing as a readonly.
 The only indication at the moment is a segfault when you write to it.

Indeed, some sort of indication before the segfault is preferred.

Yeah, I believe that's how this thread got started in the first place... I'll just leave it, got other things to do (and lots of pending patches) --anders
Feb 09 2005
prev sibling parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"Matthew" <admin stlsoft.dot.dot.dot.dot.org> wrote in message 
news:cub6in$i6d$1 digitaldaemon.com...
 To summarise this issue, am I correct in saying that:

    writing to a slice may fail if somewhere along the way, through N 
 slicings, the original string is a literal
    this failure occurs on Linux, because GDC puts literals in a read-only 
 segment. It does _not_ fail on Win32 because they're in a writeable 
 segment

 Questions:

    does the language prescribe the Linux behaviour or the Win32 behaviour? 
 (It cannot leave it undefined, since D does not have undefineds)

The only mention I can find in the spec is on the page about Memory Management under Strings and COW. It says the slice may be in read-only memory but doesn't go into details. My own vote would be that it's enough to add a bullet in the Portability Guide and probably in the Lexical section about string literals being read-only and/or shared/folded on some platforms.
Feb 08 2005
parent reply Anders F Björklund <afb algonet.se> writes:
Ben Hinkle wrote:

 The only mention I can find in the spec is on the page about Memory 
 Management under Strings and COW. It says the slice may be in read-only 
 memory but doesn't go into details.
 
 My own vote would be that it's enough to add a bullet in the Portability 
 Guide and probably in the Lexical section about string literals being 
 read-only and/or shared/folded on some platforms.

String literals on Linux and Mac OS X, and probably other UNIX too are read-only (whether using DMD or GDC). This means that slices of string literals are also read-only. Writing to them segfaults... The "easiest" should be to make them crash on Windows as well ? :-) Or adding support for read-only strings to the D language itself. (barring that, a friendly note that string literals are *read-only*) --anders
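A small sketch of the portable pattern, assuming nothing beyond `.dup` and slicing as already shown at the top of the thread: copy the literal once, and every slice taken from the copy is writable on any platform.

```d
import std.stdio;

void main()
{
    // Copy the literal once; the copy lives on the heap.
    char[] s1 = "hello world".dup;

    // A slice is another view of the same heap data, not a copy.
    char[] s2 = s1[6 .. 11];
    s2[3] = '?';

    writefln("%s", s1);   // hello wor?d
    writefln("%s", s2);   // wor?d
}
```

Note that the write through `s2` is visible through `s1` as well, which is exactly why the origin of a slice matters.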
Feb 08 2005
parent reply "Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:
"Anders F Björklund" <afb algonet.se> wrote in message 
news:cubbno$oli$1 digitaldaemon.com...
 Ben Hinkle wrote:

 The only mention I can find in the spec is on the page about Memory 
 Management under Strings and COW. It says the slice may be in 
 read-only memory but doesn't go into details.

 My own vote would be that it's enough to add a bullet in the 
 Portability Guide and probably in the Lexical section about string 
 literals being read-only and/or shared/folded on some platforms.

String literals on Linux and Mac OS X, and probably other UNIX too are read-only (whether using DMD or GDC). This means that slices of string literals are also read-only. Writing to them segfaults... The "easiest" should be to make them crash on Windows as well ? :-)

Well, either way, it must be the same, since D does not have implementation-defined behaviour. At least, that's what it says on the packet. ;)
 Or adding support for read-only strings to the D language itself.
 (barring that, a friendly note that string literals are *read-only*)

But surely the problem is that, because slicing supports, well, slices, it is inevitable that one will be down low in the call chain, and slicing a string whose original source is unknown, or at least hard to know for sure. In that case, do we slice or dup first? (Of course, in most cases, modifying something that you do not know anything about is going to be dodgy, but there will certainly be cases where it'd be desirable, I'm sure. Otherwise, no-one would've reported the problem.)

I can't see a robust alternative to either (i) slices are writeable and not folded, or (ii) we have const.
Feb 08 2005
parent reply Ben Hinkle <Ben_member pathlink.com> writes:
In article <cubda4$qrn$1 digitaldaemon.com>, Matthew says...
"Anders F Björklund" <afb algonet.se> wrote in message 
news:cubbno$oli$1 digitaldaemon.com...
 Ben Hinkle wrote:

 The only mention I can find in the spec is on the page about Memory 
 Management under Strings and COW. It says the slice may be in 
 read-only memory but doesn't go into details.

 My own vote would be that it's enough to add a bullet in the 
 Portability Guide and probably in the Lexical section about string 
 literals being read-only and/or shared/folded on some platforms.

String literals on Linux and Mac OS X, and probably other UNIX too are read-only (whether using DMD or GDC). This means that slices of string literals are also read-only. Writing to them segfaults... The "easiest" should be to make them crash on Windows as well ? :-)

Well, either way, it must be the same, since D does not have implementation-defined behaviour. At least, that's what it says on the packet. ;)

Well I wouldn't go so far to say it doesn't have implementation-defined behavior. For example it says you shouldn't store pointers in ints. Does it error if you do? maybe or maybe not. I think the idea is to minimize the implementation-defined behavior. To completely remove it is impossible. So should string literals have some implementation-defined behavior? Eh, I don't really have a strong opinion but given the choices on the table the current situation (or perhaps modifying the windows behavior to match unix if it is easy) seems like the most reasonable to me without changing some fundamental aspects of D.
 Or adding support for read-only strings to the D language itself.
 (barring that, a friendly note that string literals are *read-only*)

But surely the problem is that, because slicing supports, well, slices, it is inevitable that one will be downlow in the call chain, and slicing a string whose original source is unknown, or at least hard to know for sure. In that case, do we slice or dup first? (Of course, in most cases, modifying something that you do not know anything about's going to be dodgy, but there will certainly be cases where it'd be desirable, I'm sure. Otherwise, no-one would've reported the problem.) I can't see a robust alternative to either (i) slices are writeable and not folded, or (ii) we have const.

That's what COW is all about. The only downside of COW is that it is not enforced by the language. But then I'd argue the performance (and simplicity) upside of COW outweigh the downside. But it is a judgement call for sure. I think it would take a strong case to convince Walter at this point to abandon COW. I think the lack of documentation about string literals contributed to the specific examples the OP ran into. D's behavior didn't surprise me at all given the C heritage. In fact I would have been surprised if it didn't follow C by default.
Feb 08 2005
next sibling parent reply Kris <Kris_member pathlink.com> writes:
In article <cubkj0$11qs$1 digitaldaemon.com>, Ben Hinkle says...
But surely the problem is that, because slicing supports, well, slices, 
it is inevitable that one will be downlow in the call chain, and slicing 
a string whose original source is unknown, or at least hard to know for 
sure. In that case, do we slice or dup first? (Of course, in most cases, 
modifying something that you do not know anything about's going to be 
dodgy, but there will certainly be cases where it'd be desirable, I'm 
sure. Otherwise, no-one would've reported the problem.)

I can't see a robust alternative to either (i) slices are writeable and 
not folded, or (ii) we have const.

That's what COW is all about. The only downside of COW is that it is not enforced by the language. But then I'd argue the performance (and simplicity) upside of COW outweigh the downside. But it is a judgement call for sure. I think it would take a strong case to convince Walter at this point to abandon COW. I think the lack of documentation about string literals contributed to the specific examples the OP ran into. D's behavior didn't surprise me at all given the C heritage. In fact I would have been surprised if it didn't follow C by default.

I don't think anyone is suggesting abandoning the CoW, Ben. What's needed is a way to indicate, to the compiler, /when/ CoW is needed ... and have it enforce that.

The alternative is that libraries will be full of arbitrary array.dup *just in case* the caller might modify the result. This is ineffective, and wholly inefficient. When folk start writing multi-threaded apps in earnest (the next big wave -- just look at Niagara and Cell) this will become a critical aspect of a language. For those of us who write heavily multi-threaded apps already, and/or thread-aware libraries, it's an issue right now.

CoW is just fine. However, when you sit down to write code that adheres to those ideals, D cannot keep up. Keep CoW, but have the darned compiler enforce it. Even in a simple manner :-)

For example:

 // return content maintained within such that
 // it may be searched, inspected, traversed,
 // compressed into another buffer, CRC'd,
 // Base64'd, marshalled and sent over the wire,
 // sent to a file, to the console, or whatever,
 // in a read-only manner. NB: content can be huge!
 // This content is externally-immutable by design,
 // so we don't lose track of internal changes.

 readonly dchar[] getReadonlyContent()
 {
     return content;
 }

Look Ma! no .dup!

There's a lot of hand-waving about how D stops a programmer from inadvertently doing the wrong thing -- this is an example of how the compiler really /could/ do something very useful. Let's all give the Cow a big hand ... Hoorah! Now let's get a CoW-catcher attached to the language.

- Kris
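For illustration only: `readonly` above is a proposed keyword, not something the compiler in this thread accepts. As an assumption on my part, a later D compiler (D 2.x) accepts `const(dchar)[]` as a return type, which behaves much like the request. The class name `Buffer` and its constructor are hypothetical.

```d
class Buffer
{
    private dchar[] content;

    this(dchar[] initial) { content = initial; }

    // Hands out a view that callers may read but not write; no .dup needed.
    const(dchar)[] getReadonlyContent() { return content; }
}

void main()
{
    auto buf = new Buffer("abc"d.dup);
    const(dchar)[] view = buf.getReadonlyContent();
    assert(view.length == 3);

    // view[0] = 'x';         // would be rejected at compile time

    dchar[] mine = view.dup;  // the caller copies only if it wants to write
    mine[0] = 'x';
    assert(view[0] == 'a');   // the buffer's content is untouched
}
```

The copy now happens on the writing side, and only there, which is the CoW discipline with the compiler watching.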
Feb 08 2005
parent reply Ben Hinkle <Ben_member pathlink.com> writes:
In article <cubv88$1aoi$1 digitaldaemon.com>, Kris says...
In article <cubkj0$11qs$1 digitaldaemon.com>, Ben Hinkle says...
But surely the problem is that, because slicing supports, well, slices, 
it is inevitable that one will be downlow in the call chain, and slicing 
a string whose original source is unknown, or at least hard to know for 
sure. In that case, do we slice or dup first? (Of course, in most cases, 
modifying something that you do not know anything about's going to be 
dodgy, but there will certainly be cases where it'd be desirable, I'm 
sure. Otherwise, no-one would've reported the problem.)

I can't see a robust alternative to either (i) slices are writeable and 
not folded, or (ii) we have const.

That's what COW is all about. The only downside of COW is that it is not enforced by the language. But then I'd argue the performance (and simplicity) upside of COW outweigh the downside. But it is a judgement call for sure. I think it would take a strong case to convince Walter at this point to abandon COW. I think the lack of documentation about string literals contributed to the specific examples the OP ran into. D's behavior didn't surprise me at all given the C heritage. In fact I would have been surprised if it didn't follow C by default.

I don't think anyone is suggesting abandoning the CoW, Ben. What's needed is a way to indicate, to the compiler, /when/ CoW is needed ... and have it enforce that. The alternative is that libraries will be full of arbitrary array.dup *just in case* the caller might modify the result. This is ineffective, and wholly inefficient.

Again, that is the whole point of COW. You only copy when you write to it - not when you return a string. What you are suggesting could be called "copy on return" but it isn't "copy on write". Since you say COW still has a place in D, I'd say it looks like you are arguing for "copy on anything". One can argue COW is unsafe because it assumes the user actually obeys COW but that's how COW works. It is a trade-off.
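As a sketch of what "copy only when you write" looks like in a string function (the name `lowerCow` is made up, and this is not the Phobos source, just the convention applied to ASCII):

```d
// Hypothetical CoW-style function: returns the input itself when there
// is nothing to change, and duplicates only at the first actual write.
char[] lowerCow(char[] s)
{
    char[] result = s;
    bool copied = false;

    foreach (i, c; s)
    {
        if (c >= 'A' && c <= 'Z')
        {
            if (!copied)
            {
                result = s.dup;   // first write: copy now
                copied = true;
            }
            result[i] = cast(char)(c + 32);
        }
    }
    return result;
}

void main()
{
    char[] a = "abc".dup;
    assert(lowerCow(a).ptr is a.ptr);   // nothing to change, no copy

    char[] b = "AbC".dup;
    char[] r = lowerCow(b);
    assert(r == "abc");
    assert(b == "AbC");                 // input left untouched
}
```

The convention protects the input, but note that nothing stops the caller from later writing into a result that may still alias the input. That part is the trust Kris is objecting to.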
CoW is just fine. However, when you sit down to write code that adheres to those
ideals, D cannot keep up. Keep CoW, but have the darned compiler enforce it.
Even in a simple manner :-)

For example: 

// return content maintained within such that
// it may be searched, inspected, traversed, 
// compressed into another buffer, CRC'd, 
// Base64'd, marshalled and sent over the wire,
// sent to a file, to the console, or whatever,
// in a read-only manner. NB: content can be huge! 
// This content is externally-immutable by design, 
// so we don't lose track of internal changes. 

readonly dchar[] getReadonlyContent()
{
return content;
}

Look Ma! no .dup!

Why would you .dup something that you aren't writing to? That's COW without the OW. Of course that is wasteful.
There's a lot of hand-waving about how D stops a programmer from inadvertently
doing the wrong thing -- this is an example of how the compiler really /could/
do something very useful. Let's all give the Cow a big hand ... Hoorah! Now let's
get a CoW-catcher attached to the language.

- Kris

Feb 08 2005
parent reply Kris <Kris_member pathlink.com> writes:
In article <cuc2kg$1df9$1 digitaldaemon.com>, Ben Hinkle says...
The alternative is that libraries will be full of arbitrary array.dup *just in
case* the caller might modify the result. This is ineffective, and wholly
inefficient.

Again, that is the whole point of COW. You only copy when you write to it - not when you return a string. What you are suggesting could be called "copy on return" but it isn't "copy on write". Maybe since you say COW still has a place in D and then I'd say it looks like you are argueing for "copy on anything". One can argue COW is unsafe because it assumes the user actually obeys COW but that's how COW works. It is a trade-off.

I think we're actually saying the same thing, Ben;

Right now there's only a flimsy and vague 'trust' mechanism in place, and even that only applies to folk who (a) understand what CoW means, and (b) fully understand where the content they just received actually came from -- the prior example might be buried deep under a number of layers, or could be provided without source-code (heavens!)

Hence, CoW expectations are somewhat fluffy to say the least. Please allow me to restate the problem another way:

a) you are provided with an array to fill up with data.
b) you are provided with an array of data to inspect, but not mutate.

D does not support any distinction between these two opposite cases. Both are just plain old arrays. Should one CoW (a)? Why? And how would data be communicated back via the clone of the provided buffer? One should be trusted to CoW (b) but it's just not enforced; nor is there any indication to distinguish it from (a). This is an accident waiting to happen.

The upshot is that (i) the programmer needs an indication, perhaps within the data type, to grok just what operation is legitimate; and (ii) the compiler could, and most certainly should, enforce that distinction.

Just to be sure: we're saying that CoW is context dependent (a & b above), and currently there's nothing within the language to steer a programmer or library-user in the right direction. In addition, we're saying the compiler should emit a (compile-time) error where one mistakenly violates the implicit assumptions of CoW.

- Kris
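A rough sketch of how cases (a) and (b) could be told apart in a signature. This assumes a compiler with a `const` type qualifier (D 2.x accepts `const(char)[]`); the thread's compiler has no such thing, and both function names are hypothetical.

```d
// (a) the callee is expected to write into the caller's buffer
void fillWithDashes(char[] buf)
{
    buf[] = '-';
}

// (b) the callee may only inspect; a write here is a compile-time error
size_t countSpaces(const(char)[] data)
{
    size_t n = 0;
    foreach (c; data)
        if (c == ' ')
            ++n;
    return n;
}

void main()
{
    char[] buf = new char[4];
    fillWithDashes(buf);
    assert(buf == "----");

    assert(countSpaces("a b c") == 2);  // literals are accepted for (b)
    assert(countSpaces(buf) == 0);      // so are writable arrays
}
```

With this shape a literal can be passed to (b) but not to (a), so the mistake from the start of the thread would surface at compile time rather than as a segfault.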
Feb 08 2005
next sibling parent reply "Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:
"Kris" <Kris_member pathlink.com> wrote in message 
news:cuc739$1hfu$1 digitaldaemon.com...
 In article <cuc2kg$1df9$1 digitaldaemon.com>, Ben Hinkle says...
The alternative is that libraries will be full of arbitrary array.dup 
*just in
case* the caller might modify the result. This is ineffective, and 
wholly
inefficient.

Again, that is the whole point of COW. You only copy when you write to it - not when you return a string. What you are suggesting could be called "copy on return" but it isn't "copy on write". Maybe since you say COW still has a place in D and then I'd say it looks like you are argueing for "copy on anything". One can argue COW is unsafe because it assumes the user actually obeys COW but that's how COW works. It is a trade-off.

I think we're actually saying the same thing, Ben; Right now there's only a flimsy and vague 'trust' mechanism in place, and even that only applies to folk who (a) understand what CoW means, and (b) fully understand where the content they just recieved actually came from -- the prior example might be buried deep under a number of layers, or could be provided without source-code (heavens!) Hence, CoW expectations are somewhat fluffy to say the least. Please allow me to restate the problem another way: a) you are provided with an array to fill up with data. b) you are provided with an array of data to inspect, but not mutate. D does not support any distinction between these two opposite cases. Both are just plain old arrays. Should one Cow (a)? Why? And how would data be communicated back via the clone of the provided buffer? One should be trusted to CoW (b) but it's just not enforced; nor is there any indication to distinguish it from (a). This is an accident waiting to happen. The upshot is that (i) the programmer needs an indication, perhaps within the data type, to grok just what operation is legitimate; and (ii) the compiler could, and most certainly should, enforce that distinction. Just to be sure: we're saying that CoW is context dependent (a & b above), and currently there's nothing within the language to steer a programmer or library-user in the right direction. In addition, we're saying the compiler should emit a (compile-time) error where one mistakenly violates the implicit assumptions of CoW.

That's a more cogent view (for you and for me). Well said.

One thing I'm now wondering about, which I'd previously discounted, is whether the runtime attribute .readonly, might not now actually suffice. It'd leave people not having to care - only the compiler would do so. The (not inconsiderable) disadvantage is that slices would now not be

    struct slice
    {
        size_t len;
        T        *ptr;
    };

but rather

    struct slice
    {
        size_t len;
        T        *ptr;
        unsigned readOnly : 1;
    };

But maybe that's not a bad thing. We might be able to tag on more attributes within the bit field to better facilitate other features. (None spring to mind at the mo, naturally enough.)

Thoughts?
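A library-level approximation of the runtime flag, as a sketch only: the real proposal would live inside the compiler's own slice representation, whereas `CheckedSlice` is a hypothetical wrapper struct written with D 2.x template syntax.

```d
// Hypothetical wrapper: a slice plus a runtime read-only flag,
// checked on every indexed write.
struct CheckedSlice(T)
{
    T[] data;
    bool readOnly;

    T opIndex(size_t i) { return data[i]; }

    void opIndexAssign(T value, size_t i)
    {
        if (readOnly)
            throw new Exception("write to read-only slice");
        data[i] = value;
    }
}

void main()
{
    auto rw = CheckedSlice!char("abc".dup, false);
    rw[0] = 'x';
    assert(rw[0] == 'x');

    auto ro = CheckedSlice!char("abc".dup, true);
    bool caught = false;
    try
    {
        ro[0] = 'x';          // refused at run time
    }
    catch (Exception e)
    {
        caught = true;
    }
    assert(caught);
    assert(ro[0] == 'a');
}
```

The cost is visible here too: every slice carries an extra field and every write pays for a branch, which is the disadvantage mentioned above.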
Feb 08 2005
parent reply Kris <Kris_member pathlink.com> writes:
In article <cuc89m$1iin$2 digitaldaemon.com>, Matthew says...
"Kris" <Kris_member pathlink.com> wrote in message 
 I think we're actually saying the same thing, Ben;

 Right now there's only a flimsy and vague 'trust' mechanism in place, 
 and even
 that only applies to folk who (a) understand what CoW means, and (b) 
 fully
 understand where the content they just received actually came from --  
 the prior
 example might be buried deep under a number of layers, or could be 
 provided
 without source-code (heavens!)

 Hence, CoW expectations are somewhat fluffy to say the least. Please 
 allow me to
 restate the problem another way:

 a) you are provided with an array to fill up with data.
 b) you are provided with an array of data to inspect, but not mutate.

 D does not support any distinction between these two opposite cases. 
 Both are
 just plain old arrays. Should one Cow (a)? Why? And how would data be
 communicated back via the clone of the provided buffer? One should be 
 trusted to
 CoW (b) but it's just not enforced; nor is there any indication to 
 distinguish
 it from (a). This is an accident waiting to happen.

 The upshot is that (i) the programmer needs an indication, perhaps 
 within the
 data type, to grok just what operation is legitimate; and (ii) the 
 compiler
 could, and most certainly should, enforce that distinction.

 Just to be sure: we're saying that CoW is context dependent (a & b 
 above), and
 currently there's nothing within the language to steer a programmer or
 library-user in the right direction. In addition, we're saying the 
 compiler
 should emit a (compile-time) error where one mistakenly violates the 
 implicit
 assumptions of CoW.

That's a more cogent view (for you and for me). Well said. One thing I'm now wondering about, which I'd previously discounted, is whether the runtime attribute .readonly, might not now actually suffice. It'd leave people not having to care - only the compiler would do so. The (not inconsiderable) disadvantage is that slices would now not be struct slice { size_t len; T *ptr; }; but rather struct slice { size_t len; T *ptr; unsigned readOnly : 1; }; But maybe that's not a bad thing. We might be able to tag on more attributes within the bit field to better facilitate other features. (None spring to mind at the mo, naturally enough.) Thoughts?

Aye -- but, for compile-time enforcement, surely only the type info is needed regardless of implementation? If so, then slices-types would have to match, or be more stringent than the readonly characteristic of the slice-source (via the symbol table). Thus, slices upon *literals* would have to be declared readonly too. The indication would presumably also be present in the associated typeinfo instance. Perhaps I misunderstood you? - Kris
Feb 08 2005
next sibling parent reply Anders F Björklund <afb algonet.se> writes:
Kris wrote:
 In article <cuc89m$1iin$2 digitaldaemon.com>, Matthew says...
 
One thing I'm now wondering about, which I'd previously discounted, is 
whether the runtime attribute .readonly, might not now actually suffice. 
It'd leave people not having to care - only the compiler would do so. 
The (not inconsiderable) disadvantage is that slices would now not be

   struct slice
   {
       size_t len;
       T        *ptr;
   };

but rather

   struct slice
   {
       size_t len;
       T        *ptr;
       unsigned readOnly : 1;
   };


Not necessarily. The "readonly" boolean could just as well go in typeinfo... After all, it doesn't affect the contents whatsoever.

Sorta like (char*) and (int*), which both point to the same place. Or like with "char *" and "const char *" in That Other Language...

"casting" readonly=0 to readonly=1 would be allowed, by explicitly setting it (just like with array.length). But vice-versa would not.
 Aye -- but, for compile-time enforcement, surely only the type info is needed
 regardless of implementation? If so, then slices-types would have to match, or
 be more stringent than the readonly characteristic of the slice-source (via the
 symbol table). 
 
 Thus, slices upon *literals* would have to be declared readonly too. The
 indication would presumably also be present in the associated typeinfo
instance.

It might be necessary to have *yet another* flag to indicate literals. The compiler "needs" it to be able to do such performance hacks as:
 char* toStringz(char[] string)
     {
 	char* p;
 	char[] copy;
 
 	if (string.length == 0)
 	    return "";
 
 	p = &string[0] + string.length;
 
 	// Peek past end of string[], if it's 0, no conversion necessary.
 	// Note that the compiler will put a 0 past the end of static
 	// strings, and the storage allocator will put a 0 past the end
 	// of newly allocated char[]'s.
 	if (*p == 0)
 	    return string;
 
 	// Need to make a copy
 	copy = new char[string.length + 1];
 	copy[0..string.length] = string;
 	copy[string.length] = 0;
 	return copy;
     }

 char* toStringz(char[] string)
 {
     if (string.length == 0)
         return "";
     else if (string.literal)
         return string.ptr;
     else
         return string ~ "\0";
 }

Using "readonly" here won't work, since slices are not NUL-terminated. Only the string literals are, with '\u0000', so they can work with C.

When array literals and hash literals finally arrive, it will be even more fun. (ignoring such almost-related concepts as function literals)

 int[] array = [ 1, 2, 3 ];
 foo(cast(byte[]) [1,2]);
 str[str] hash = [ "one": 1, "two": 2, "three": 3 ];

--anders
Feb 09 2005
parent reply "Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:
"Anders F Björklund" <afb algonet.se> wrote in message 
news:cucklg$1u1i$1 digitaldaemon.com...
 Kris wrote:
 In article <cuc89m$1iin$2 digitaldaemon.com>, Matthew says...

One thing I'm now wondering about, which I'd previously discounted, 
is whether the runtime attribute .readonly, might not now actually 
suffice. It'd leave people not having to care - only the compiler 
would do so. The (not inconsiderable) disadvantage is that slices 
would now not be

   struct slice
   {
       size_t len;
       T        *ptr;
   };

but rather

   struct slice
   {
       size_t len;
       T        *ptr;
       unsigned readOnly : 1;
   };


Not necessarily. The "readonly" boolean could just as well go in typeinfo... After all, it doesn't affect the contents whatsoever.

How? A slice doesn't point to typeinfo, or at least it didn't last time I looked. Or do you mean that, under the seams, D would know that one slice pointed to char and another pointed to const char? In that case, where is that knowledge represented?
 Sorta like (char*) and (int*), which both point to the same place.
 Or like with "char *" and const char *" in That Other Language...

 "casting" readonly=0 to readonly=1 would be allowed, by explicitly
 setting it (just like with array.length). But vice-versa would not.

 Aye -- but, for compile-time enforcement, surely only the type info 
 is needed
 regardless of implementation? If so, then slices-types would have to 
 match, or
 be more stringent than the readonly characteristic of the 
 slice-source (via the
 symbol table). Thus, slices upon *literals* would have to be declared 
 readonly too. The
 indication would presumably also be present in the associated 
 typeinfo instance.

It might be needed to have *yet another* flag to indicate literals. The compiler "needs" it to be able to such performance hacks as :
 char* toStringz(char[] string)
     {
 char* p;
 char[] copy;

 if (string.length == 0)
     return "";

 p = &string[0] + string.length;

 // Peek past end of string[], if it's 0, no conversion necessary.
 // Note that the compiler will put a 0 past the end of static
 // strings, and the storage allocator will put a 0 past the end
 // of newly allocated char[]'s.
 if (*p == 0)
     return string;

 // Need to make a copy
 copy = new char[string.length + 1];
 copy[0..string.length] = string;
 copy[string.length] = 0;
 return copy;
     }

char* toStringz(char[] string) { if (string.length == 0) return ""; else if (string.literal) return string.ptr; else return string ~ "\0"; } Using "readonly" here won't work, since slices are not NUL-terminated. Only the string literals are, with '\u0000', so they can work with C. When array literals and hash literals finally arrive, it will be even more fun. (ignoring such almost-related concepts as function literals) int[] array = [ 1, 2, 3 ]; foo(cast(byte[]) [1,2]); str[str] hash = [ "one": 1, "two": 2, "three": 3 ];

Interesting ... Am too tired to think at the mo, but instinct tells me you may have a good point (wrt NUL-termination)
Feb 09 2005
parent reply Anders F Björklund <afb algonet.se> writes:
Matthew wrote:

 How? A slice doesn't point to typeinfo, or at least it didn't last time 
 I looked.

How do you get the RTTI otherwise? I need to peek a little in DMD again.
 Or do you mean that, under the seams, D would know that one slice 
 pointed to char and another pointed to const char? In that case, where 
 is that knowledge represented?

That is an interesting question, but I'm sure it can't be impossible...
 Am too tired to think at the mo, but instinct tells me you may have a 
 good point (wrt NUL-termination)

Well, the current implementation is seriously flawed since it peeks outside the allocated memory which is a big no-no - just as usual. And that "storage allocator will put a 0 past the end" is not true, since it doesn't hold with certain multiples of two, such as 16. Right now it comes down to choices for D, either we .dup *always* - or make something good instead. Such as slices and readonly arrays. --anders
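For comparison, a sketch of the slow-but-safe variant that never reads past the end of the array. It is an illustration, not the Phobos fix; the name `toStringzCopy` is made up, and the `const(char)[]` parameter and `core.stdc.string` import assume a D 2.x compiler.

```d
import core.stdc.string : strlen;

// Hypothetical always-copy conversion: allocates length + 1 bytes and
// writes the terminator itself, so it never inspects foreign memory.
char* toStringzCopy(const(char)[] s)
{
    char[] copy = new char[s.length + 1];
    copy[0 .. s.length] = s[];
    copy[s.length] = '\0';
    return copy.ptr;
}

void main()
{
    char[] whole = "hello world".dup;

    // A slice from the middle of an array has no NUL after it.
    char* p = toStringzCopy(whole[0 .. 5]);
    assert(strlen(p) == 5);
    assert(p[0 .. 5] == "hello");
}
```

This gives up the speedup for literals, which is the trade-off being argued about: without some way to recognize a literal, the copy cannot be skipped safely.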
Feb 09 2005
parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"Anders F Björklund" <afb algonet.se> wrote in message 
news:cucllv$1v8u$1 digitaldaemon.com...
 Matthew wrote:

 How? A slice doesn't point to typeinfo, or at least it didn't last time I 
 looked.

How do you get the RTTI otherwise? I need to peek a little in DMD again.

Class objects have RTTI. Arrays and basic types do not.
 Or do you mean that, under the seams, D would know that one slice pointed 
 to char and another pointed to const char? In that case, where is that 
 knowledge represented?

That is an interesting question, but I'm sure it can't be impossible...

I would expect it would involve significant redesign of some pretty basic parts of D.
 Am too tired to think at the mo, but instinct tells me you may have a 
 good point (wrt NUL-termination)

Well, the current implementation is seriously flawed since it peeks outside the allocated memory which is a big no-no - just as usual. And that "storage allocator will put a 0 past the end" is not true, since it doesn't hold with certain multiples of two, such as 16.

Don't worry about the toStringz problem. It will be fixed - and it is probably fixed in Walter's sandbox already. That was a bug in toStringz that didn't have anything to do with COW. It was more like RARL (read-a-random-location).
 Right now it comes down to choices for D, either we .dup *always* -
 or make something good instead. Such as slices and readonly arrays.

In your opinion, sure. I hope anyone reading this thread doesn't start ignoring COW and duping always. And I doubt Walter is going to add a readonly attribute.
Feb 09 2005
parent reply Anders F Björklund <afb algonet.se> writes:
Ben Hinkle wrote:

Well, the current implementation is seriously flawed since it peeks
outside the allocated memory which is a big no-no - just as usual.
And that "storage allocator will put a 0 past the end" is not true,
since it doesn't hold with certain multiples of two, such as 16.

Don't worry about the toStringz problem. It will be fixed - and it is probably fixed in Walter's sandbox already. That was a bug in toStringz that didn't have anything to do with COW. It was more like RARL (read-a-random-location).

But if you remove the current check, you also remove the speedup ? This means that string literals now need to be copied for NUL-term. (unless there is some other way to recognize a string literal...)
Right now it comes down to choices for D, either we .dup *always* -
or make something good instead. Such as slices and readonly arrays.

In your opinion, sure. I hope anyone reading this thread doesn't start ignoring COW and duping always. And I doubt Walter is going to add a readonly attribute.

It's pretty much the only way to be sure, at the moment. (and sucks)

The first fix would be to place string literals in read-only memory on Windows too, for portability. (and to help with string pooling too)

The second and more demanding problem is to get some help enforcing CoW. There needs to be something more than "honor" protecting immutability...

Just as "inout" is a reasonable simplification and workaround for C++ references, there needs to be something for the const specifier. It doesn't have to cover all the uses, just the more common ones... And my suggestion was to try and add something for the arrays (only)

--anders
Feb 09 2005
parent "Ben Hinkle" <bhinkle mathworks.com> writes:
"Anders F Björklund" <afb algonet.se> wrote in message 
news:cud8np$2idn$1 digitaldaemon.com...
 Ben Hinkle wrote:

Well, the current implementation is seriously flawed since it peeks
outside the allocated memory which is a big no-no - just as usual.
And that "storage allocator will put a 0 past the end" is not true,
since it doesn't hold with certain multiples of two, such as 16.

Don't worry about the toStringz problem. It will be fixed - and it is probably fixed in Walter's sandbox already. That was a bug in toStringz that didn't have anything to do with COW. It was more like RARL (read-a-random-location).

But if you remove the current check, you also remove the speedup ? This means that string literals now need to be copied for NUL-term. (unless there is some other way to recognize a string literal...)

I don't know how Walter is fixing it but it could end up copying more than the previous implementation. But there's really no way around fixing the bug.
Right now it comes down to choices for D, either we .dup *always* -
or make something good instead. Such as slices and readonly arrays.

In your opinion, sure. I hope anyone reading this thread doesn't start ignoring COW and duping always. And I doubt Walter is going to add a readonly attribute.

It's pretty much the only way to be sure, at the moment. (and sucks) The first fix would be to place string literals in read-only memory on Windows too, for portability. (and to help with string pooling too)

Sounds reasonable but I have no idea how Windows manages object files. If it's a simple matter of putting the data in some other segment then go for it but if it isn't that simple you have to weigh the costs.
 The second and more demanding problem is to get some help enforcing CoW.
 There needs to be something more than "honor" protecting immutability...

I'm sure Walter has heard your request. Personally I'd be surprised if he winds up adding a const/readonly attribute but you never know...
 Just as "inout" is a reasonable simplification and workaround for
 C++ references, there needs to be something for the const specifier.
 It doesn't have to cover all the uses, just the more common ones...
 And my suggestion was to try and add something for the arrays (only)

 --anders 

Feb 09 2005
prev sibling parent reply "Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:
"Kris" <Kris_member pathlink.com> wrote in message 
news:cucdre$1n34$1 digitaldaemon.com...
 In article <cuc89m$1iin$2 digitaldaemon.com>, Matthew says...
"Kris" <Kris_member pathlink.com> wrote in message
 I think we're actually saying the same thing, Ben;

 Right now there's only a flimsy and vague 'trust' mechanism in place, and even that only applies to folk who (a) understand what CoW means, and (b) fully understand where the content they just received actually came from -- the prior example might be buried deep under a number of layers, or could be provided without source-code (heavens!)

 Hence, CoW expectations are somewhat fluffy to say the least. Please allow me to restate the problem another way:

 a) you are provided with an array to fill up with data.
 b) you are provided with an array of data to inspect, but not mutate.

 D does not support any distinction between these two opposite cases. Both are just plain old arrays. Should one CoW (a)? Why? And how would data be communicated back via the clone of the provided buffer? One should be trusted to CoW (b) but it's just not enforced; nor is there any indication to distinguish it from (a). This is an accident waiting to happen.

 The upshot is that (i) the programmer needs an indication, perhaps within the data type, to grok just what operation is legitimate; and (ii) the compiler could, and most certainly should, enforce that distinction.

 Just to be sure: we're saying that CoW is context dependent (a & b above), and currently there's nothing within the language to steer a programmer or library-user in the right direction. In addition, we're saying the compiler should emit a (compile-time) error where one mistakenly violates the implicit assumptions of CoW.

That's a more cogent view (for you and for me). Well said.

One thing I'm now wondering about, which I'd previously discounted, is whether a runtime attribute, .readonly, might not now actually suffice. It'd leave people not having to care - only the compiler would do so. The (not inconsiderable) disadvantage is that slices would now not be

    struct slice { size_t len; T *ptr; };

but rather

    struct slice { size_t len; T *ptr; unsigned readOnly : 1; };

But maybe that's not a bad thing. We might be able to tag on more attributes within the bit field to better facilitate other features. (None spring to mind at the mo, naturally enough.)

Thoughts?
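
To make the runtime-flag idea a bit more concrete, here is a minimal C++ sketch of a slice that carries a read-only bit and refuses writes when it is set. Everything here (CharSlice, make_slice, sub, store) is a made-up illustration under the assumption that sub-slices inherit the flag; it is not how any D compiler actually lays out arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical slice layout with a runtime read-only bit, modelled on the
// struct sketched in the post above. All names are illustrative only.
struct CharSlice {
    std::size_t len;
    char*       ptr;
    unsigned    readOnly : 1;
};

// Build a slice over an existing buffer; 'ro' marks it as read-only.
inline CharSlice make_slice(char* p, std::size_t n, bool ro) {
    assert(p != nullptr || n == 0);
    CharSlice s;
    s.len = n;
    s.ptr = p;
    s.readOnly = ro ? 1u : 0u;
    return s;
}

// Take the half-open sub-range [lo, hi). The read-only bit is inherited,
// so a slice of a read-only slice stays read-only (an assumption here).
inline CharSlice sub(const CharSlice& s, std::size_t lo, std::size_t hi) {
    if (lo > hi || hi > s.len) throw std::out_of_range("bad slice bounds");
    return make_slice(s.ptr + lo, hi - lo, s.readOnly != 0);
}

// Checked write: throws instead of storing through a read-only slice.
inline void store(const CharSlice& s, std::size_t i, char c) {
    if (s.readOnly) throw std::logic_error("write to read-only slice");
    if (i >= s.len) throw std::out_of_range("index past end of slice");
    s.ptr[i] = c;
}
```

The cost is visible in the sketch: every write pays for a flag test, and the slice no longer fits in two machine words, which is the disadvantage mentioned above.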

Aye -- but, for compile-time enforcement, surely only the type info is needed regardless of implementation? If so, then slices-types would have to match, or be more stringent than the readonly characteristic of the slice-source (via the symbol table). Thus, slices upon *literals* would have to be declared readonly too. The indication would presumably also be present in the associated typeinfo instance. Perhaps I misunderstood you?

To be honest, Kris, I've kind of lost the thread in this thread. It'd be great if people who've experienced the problem(s) could post a short illustrative sample. Not only would that help clarify the precise root of the problem, I'm sure it'd also help Walter to see it, if it's not something that's crossed his path as yet.
Feb 09 2005
parent =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Matthew wrote:

 To be honest, Kris, I've kind of lost the thread in this thread.
 
 It'd be great if people who've experienced the problem(s) could post a 
 short illustrative sample. Not only would that help clarify the precise 
 root of the problem, I'm sure it'd also help Walter to see it, if it's 
 not something that's crossed his path as yet.

The thread basically boiled down to a discussion about two issues:

1) That string literals are read-only on Unix, but not on Windows.
   This causes segfaults if you assign to a string literal, or a slice of a string literal (or anything else that points to the literal). One suggestion was to make the literals read-only in the D spec, and store them in read-only memory on Windows too (to get an A.V.)

2) A very long discussion about "const" or "readonly" strings.
   Kris wanted to use those to make sure that his return values or array parameters were not being modified by the receiver of them. There seemed to be a general opinion that the Copy-on-Write should have some way of being enforced by the compiler. (readonly strings) Right now there is no way of passing a "readonly" struct pointer, or making sure some kind of "inout" reference is used on arrays that the function actually intends to modify the contents of...

Beyond that, there was the usual bug or omission in the D spec:
 http://www.digitalmars.com/d/cppstrings.html says:
 
  In D, use the array slicing syntax in the natural manner:

     char[] str = "hello";
     str[1..2] = '?';        // str is "h??lo"


Which should be:
     char[] str = "hello".dup;
     str[1..3] = '?';        // str is "h??lo" 
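
The two corrections (copy the literal first, and treat the range as half-open) can be tried out in C++ as well. This is only a rough analogue, assuming std::string as a stand-in for a duplicated char[] and a hypothetical helper named fill_range; it says nothing about how D implements slice assignment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Fill the half-open range [lo, hi) of a private copy of 'literal' with 'c'.
// The literal itself is never written to; the copy plays the role of .dup.
inline std::string fill_range(const char* literal, std::size_t lo,
                              std::size_t hi, char c) {
    std::string copy(literal);
    assert(lo <= hi && hi <= copy.size());
    std::fill(copy.begin() + static_cast<std::ptrdiff_t>(lo),
              copy.begin() + static_cast<std::ptrdiff_t>(hi), c);
    return copy;
}
```

With an exclusive upper bound, the range 1..2 covers one character and 1..3 covers two, which is exactly the off-by-one in the spec example.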


At least, that's how I started the thread and how I'll end it. The End. --anders
Feb 09 2005
prev sibling next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Wed, 9 Feb 2005 05:29:13 +0000 (UTC), Kris wrote:


[snip]

 Hence, CoW expectations are somewhat fluffy to say the least. Please allow me to
 restate the problem another way:
 
 a) you are provided with an array to fill up with data.
 b) you are provided with an array of data to inspect, but not mutate.

c) you are provided with an array and might update parts of it, depending on other parameters.

This is a difficult one because the called routine does a CoW if and only if it modifies the array data. The calling routine does not know if the array is going to be modified or not prior to the call. Sometimes that may be significant. So does the caller do .dup just in case it gets changed? The permutations are complex.

--
Derek
Melbourne, Australia
9/02/2005 5:39:15 PM
Feb 08 2005
parent reply Kris <Kris_member pathlink.com> writes:
In article <n0teow12fzy7.eoffoinrx065$.dlg 40tude.net>, Derek Parnell says...
On Wed, 9 Feb 2005 05:29:13 +0000 (UTC), Kris wrote:


[snip]

 Hence, CoW expectations are somewhat fluffy to say the least. Please allow me to
 restate the problem another way:
 
 a) you are provided with an array to fill up with data.
 b) you are provided with an array of data to inspect, but not mutate.

c) you are provided with an array and might update parts of it, depending on other parameters. This is a difficult one because the called routine does a CoW if and only if it modifies the array data. The calling routine does not know if the array is going to be modified or not prior to the call. Sometimes that may be significant. So does the caller do .dup just in case it gets changed? The permutations are complex.

I think that one *may* be covered by (b). Let me have a go at it:

In such a case, the callee would presumably CoW as necessary, and return the modified copy rather than the original. Part of the contract between the two would have to arrange for the caller to assume a change may occur, and act accordingly. This is CoW, but without enforcement as in case (b). Ideally, the array would be readonly on input, and readonly on return. With the slight twist that the return might actually be a *modified copy* of the original (but often just the readonly original instead). I may not have understood correctly though. Please correct me if so!

I'd like to point out, if I may, that the issue is perhaps most acute with return-values. I say this because there's not even a parameter-name to steer a user. Such a case is where any implied type-information is utterly lost (as in the original example).

I'd also like to point out that CoW enforcement (by the compiler) does *not* induce any additional runtime overhead. It simply steers the programmer, and mandates the rules are obeyed, at compile-time.

- Kris
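
The "readonly in, readonly out, possibly a modified copy" contract described above can be sketched in C++, where const is checked at compile time. This is only an illustration under assumptions: ReadOnlyText and to_lower_cow are hypothetical names, and shared_ptr stands in for D's garbage-collected arrays.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>

// Read-only view of shared text: callers cannot write through it.
using ReadOnlyText = std::shared_ptr<const std::string>;

// CoW-style callee: returns the very same object when nothing needs to
// change, and a freshly allocated modified copy otherwise.
inline ReadOnlyText to_lower_cow(const ReadOnlyText& in) {
    assert(in != nullptr);
    for (std::size_t i = 0; i < in->size(); ++i) {
        unsigned char ch = static_cast<unsigned char>((*in)[i]);
        if (std::isupper(ch)) {
            // First character that must change: copy once, then edit the copy.
            auto copy = std::make_shared<std::string>(*in);
            for (std::size_t j = i; j < copy->size(); ++j) {
                unsigned char cj = static_cast<unsigned char>((*copy)[j]);
                (*copy)[j] = static_cast<char>(std::tolower(cj));
            }
            return copy;  // converts to a pointer-to-const on return
        }
    }
    return in;  // untouched: hand back the read-only original
}
```

A caller that tries to write through the returned pointer gets a compile error rather than a surprise at run time, and the check itself costs nothing when the program runs.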
Feb 08 2005
parent reply Kris <Kris_member pathlink.com> writes:
In article <cucfmc$1ob0$1 digitaldaemon.com>, Kris says...
In article <n0teow12fzy7.eoffoinrx065$.dlg 40tude.net>, Derek Parnell says...
On Wed, 9 Feb 2005 05:29:13 +0000 (UTC), Kris wrote:


[snip]

 Hence, CoW expectations are somewhat fluffy to say the least. Please allow me to
 restate the problem another way:
 
 a) you are provided with an array to fill up with data.
 b) you are provided with an array of data to inspect, but not mutate.

c) you are provided with an array and might update parts of it, depending on other parameters. This is a difficult one because the called routine does a CoW if and only if it modifies the array data. The calling routine does not know if the array is going to be modified or not prior to the call. Sometimes that may be significant. So does the caller do .dup just in case it gets changed? The permutations are complex.

I think that one *may* be covered by (b). Let me have a go at it: In such a case, the callee would presumably CoW as necessary, and return the modified copy rather than the original. Part of the contract between the two would have to arrange for the caller to assume a change may occur, and act accordingly. This is CoW, but without enforcement as in case (b). Ideally, the array would be readonly on input, and readonly on return. With the slight twist that the return might actually be a *modified copy* of the original (but often just the readonly original instead). I may not have understood correctly though. Please correct me if so! I'd like to point out, if I may, that the issue is perhaps most acute with return-values. I say this because there's not even a parameter-name to steer a user. Such a case is where any implied type-information is utterly lost (as in the original example). I'd also like to point out that CoW enforcement (by the compiler) does *not* induce any additional runtime overhead. It simply steers the programmer, and mandates the rules are obeyed, at compile-time. - Kris

Ack! I'm tired, and doubling up on phrases; The callee could also declare the input array as 'inout', and therefore modify the .length and happily mutate away. The two parties might also arrange to pass an 'inout' array pointer. Neither of these options would require CoW, nor readonly attributes, at all. A case such as this is perhaps outside the realm of concern? - Kris
Feb 09 2005
parent "Walter" <newshound digitalmars.com> writes:
"Kris" <Kris_member pathlink.com> wrote in message
news:cucgl8$1pks$1 digitaldaemon.com...
 The callee could also declare the input array as 'inout', and therefore modify
 the .length and happily mutate away. The two parties might also arrange to pass
 an 'inout' array pointer. Neither of these options would require CoW, nor
 readonly attributes, at all. A case such as this is perhaps outside the realm of
 concern?

I believe you're right that 'inout' handles this particular case adequately.
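
For readers more at home in C++, the closest analogue to an 'inout' array parameter is a non-const reference. A minimal sketch, with a hypothetical function name (shout) and std::string standing in for a D array:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// 'buf' is taken by non-const reference, so the signature itself tells the
// caller that both the length and the contents may change. No copy is made.
inline void shout(std::string& buf) {
    assert(!buf.empty());  // illustrative precondition only
    buf[0] = static_cast<char>(
        std::toupper(static_cast<unsigned char>(buf[0])));
    buf.push_back('!');    // the length grows in the caller's object too
}
```

Because the caller's own object is modified in place, neither CoW nor a readonly attribute comes into play for this case.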
Feb 09 2005
prev sibling parent reply Ben Hinkle <Ben_member pathlink.com> writes:
In article <cuc739$1hfu$1 digitaldaemon.com>, Kris says...
In article <cuc2kg$1df9$1 digitaldaemon.com>, Ben Hinkle says...
The alternative is that libraries will be full of arbitrary array.dup *just in
case* the caller might modify the result. This is ineffective, and wholly
inefficient.

Again, that is the whole point of COW. You only copy when you write to it - not when you return a string. What you are suggesting could be called "copy on return" but it isn't "copy on write". Since you say COW still has a place in D, I'd say it looks like you are arguing for "copy on anything". One can argue COW is unsafe because it assumes the user actually obeys COW but that's how COW works. It is a trade-off.

I think we're actually saying the same thing, Ben; Right now there's only a flimsy and vague 'trust' mechanism in place, and even that only applies to folk who (a) understand what CoW means, and (b) fully understand where the content they just received actually came from -- the prior example might be buried deep under a number of layers, or could be provided without source-code (heavens!)

I completely agree COW relies on the programmers knowing what they are doing. The tradeoffs are:
1) assume the programmers know about COW and follow it (not always true), or
2) dup like crazy and watch performance suffer, or
3) add a const/readonly attribute to help with some common gotchas

None of these are perfect. Const in C++ has some issues that I assume readonly would share. Plus, to learn about readonly and all the gotchas one might have to invest just as much effort as learning COW - in fact one might argue that learning COW is much simpler than learning how to use readonly/const.

For example, here's some C++ code where COW would arguably be more safe than const:

    void foo(char*x, const char*y) { x[0] = 'a'; ... }

Now if one calls foo(z,z) for a char*z, then even though y says it is const there isn't anything preventing x from assigning to the same string. So is it right to say that y is const? Sure, but it doesn't mean the contents of y won't change. If COW was used the function foo in D would look like

    void foo(char[] x, char[] y) { x = x.dup; x[0] = 'a'; ... }

I would say the D foo is safer than the C++ foo.

Adding a readonly or const attribute will catch some errors at compile time but I argue it would introduce complexity and have gotchas and corner cases that make it less desirable than COW.
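
The aliasing point is easy to check with a small, compilable C++ sketch. The function names (first_after_write, cow_write) are made up for illustration, and std::string's copy constructor stands in for .dup.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// The const on 'y' only promises that this function will not write
// through y. If x and y alias, the write through x is visible through y.
inline char first_after_write(char* x, const char* y) {
    x[0] = 'a';
    return y[0];
}

// CoW-style alternative: copy before writing, so the caller's buffer
// (and anything aliasing it) is left untouched.
inline std::string cow_write(const char* x) {
    assert(x != nullptr);
    std::string copy(x);  // the ".dup" step
    if (!copy.empty()) copy[0] = 'a';
    return copy;
}
```

Calling first_after_write(z, z) is legal and cast-free, yet the "const" string changes under the callee's feet; cow_write pays for one copy and avoids that.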
Feb 09 2005
next sibling parent reply Ben Hinkle <Ben_member pathlink.com> writes:
Adding a readonly or const attribute will catch some errors at compile time but
I argue it would introduce complexity and have gotchas and corner cases that
make it less desirable than COW.

Sorry for the double post, but I forgot to add: as I mentioned before, if we integrated a lightning-fast dlint program into our editors, we could flag COW violations as you write the code, *before* you actually compile. If Walter really wants to, he can also add COW violation statements to the verbose mode output - but I wouldn't hold my breath for that.
Feb 09 2005
parent =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Ben Hinkle wrote:

 Sorry for the double post, but I forgot to add: as I mentioned before, if we
 integrated a lightning-fast dlint program into our editors, we could flag COW
 violations as you write the code, *before* you actually compile. If Walter really
 wants to, he can also add COW violation statements to the verbose mode output -
 but I wouldn't hold my breath for that.

Wouldn't that be a lot like adding (optional) warnings to the compiler ? But, yeah, if the compiler doesn't check it - then someone else must... --anders
Feb 09 2005
prev sibling next sibling parent reply Kris <Kris_member pathlink.com> writes:
In article <cud2ea$2aot$1 digitaldaemon.com>, Ben Hinkle says...
In article <cuc739$1hfu$1 digitaldaemon.com>, Kris says...
In article <cuc2kg$1df9$1 digitaldaemon.com>, Ben Hinkle says...
The alternative is that libraries will be full of arbitrary array.dup *just in
case* the caller might modify the result. This is ineffective, and wholly
inefficient.

Again, that is the whole point of COW. You only copy when you write to it - not when you return a string. What you are suggesting could be called "copy on return" but it isn't "copy on write". Since you say COW still has a place in D, I'd say it looks like you are arguing for "copy on anything". One can argue COW is unsafe because it assumes the user actually obeys COW but that's how COW works. It is a trade-off.

I think we're actually saying the same thing, Ben; Right now there's only a flimsy and vague 'trust' mechanism in place, and even that only applies to folk who (a) understand what CoW means, and (b) fully understand where the content they just received actually came from -- the prior example might be buried deep under a number of layers, or could be provided without source-code (heavens!)

I completely agree COW relies on the programmers knowing what they are doing. The tradeoffs are:
1) assume the programmers know about COW and follow it (not always true), or
2) dup like crazy and watch performance suffer, or
3) add a const/readonly attribute to help with some common gotchas

None of these are perfect. Const in C++ has some issues that I assume readonly would share. Plus, to learn about readonly and all the gotchas one might have to invest just as much effort as learning COW - in fact one might argue that learning COW is much simpler than learning how to use readonly/const.

For example, here's some C++ code where COW would arguably be more safe than const:

    void foo(char*x, const char*y) { x[0] = 'a'; ... }

Now if one calls foo(z,z) for a char*z, then even though y says it is const there isn't anything preventing x from assigning to the same string. So is it right to say that y is const? Sure, but it doesn't mean the contents of y won't change. If COW was used the function foo in D would look like

    void foo(char[] x, char[] y) { x = x.dup; x[0] = 'a'; ... }

I would say the D foo is safer than the C++ foo.

Adding a readonly or const attribute will catch some errors at compile time but I argue it would introduce complexity and have gotchas and corner cases that make it less desirable than COW.

What you appear to be saying here, is that the programmer can always write the following: * (char *) 0x00000000 = 0; Which, of course, they can. Nobody is asking for a mechanism to stop the programmer from wholly circumventing the language constructs. Instead, we're looking for a means of "steering the programmer in the right direction" -- a notion Walter seems to strive for. I was surprised at how many "I assume" and "might" phrases were in your post, yet you've already surmised that something like 'readonly' is effectively worthless. At the same time you note, just like the rest of us, that CoW has significant issues. Which it certainly has. Where did that constructive attitude go to?
Feb 09 2005
parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
For example, here's some C++ code where COW would arguably be more safe than const:
void foo(char*x, const char*y) { x[0] = 'a'; ... }
Now if one calls foo(z,z) for a char*z, then even though y says it is const there isn't anything preventing x from assigning to the same string. So is it right to say that y is const? Sure, but it doesn't mean the contents of y won't change. If COW was used the function foo in D would look like
void foo(char[] x, char[] y) { x = x.dup; x[0] = 'a'; ... }
I would say the D foo is safer than the C++ foo.

Adding a readonly or const attribute will catch some errors at compile time but I argue it would introduce complexity and have gotchas and corner cases that make it less desirable than COW.

What you appear to be saying here, is that the programmer can always write the following: * (char *) 0x00000000 = 0;

uhh - I don't know where you got that from. I was giving an example in C++ where const might confuse a careless programmer. What does that have to do with a seg-v?
 Which, of course, they can. Nobody is asking for a mechanism to stop the
 programmer from wholly circumventing the language constructs. Instead, we're
 looking for a means of "steering the programmer in the right direction" -- a
 notion Walter seems to strive for.

My C++ example didn't circumvent anything. It is perfectly legal and cast-free C++.
 I was surprised at how many "I assume" and "might" phrases were in your post,
 yet you've already surmised that something like 'readonly' is effectively
 worthless. At the same time you note, just like the rest of us, that CoW has
 significant issues. Which it certainly has.

I try to keep my posts neutral and avoid using extreme language. I've been pretty turned off by some of the attitudes in these recent threads. It's probably due to my math background that I don't like stating opinions as facts or pushing one point of view. For example when I said something like "I assume adding readonly to D will behave like const in C++" that's because it is possible that some readonly might not have the same problems that const does in C++ but I think it is reasonable to assume they are similar enough to compare them. I never characterized readonly as "effectively worthless". I said it isn't perfect. I said none of the solutions are perfect.
 Where did that constructive attitude go to?

I'm not sure what you mean by that. I've suggested using dlint to help coders catch bugs before compiling. I've suggested putting more messages in the verbose output of dmd.
Feb 09 2005
parent reply Kris <Kris_member pathlink.com> writes:
In article <cudmqh$30rt$1 digitaldaemon.com>, Ben Hinkle says...
For example, here's some C++ code where COW would arguably be more safe than const:
void foo(char*x, const char*y) { x[0] = 'a'; ... }
Now if one calls foo(z,z) for a char*z, then even though y says it is const there isn't anything preventing x from assigning to the same string. So is it right to say that y is const? Sure, but it doesn't mean the contents of y won't change. If COW was used the function foo in D would look like
void foo(char[] x, char[] y) { x = x.dup; x[0] = 'a'; ... }
I would say the D foo is safer than the C++ foo.

Adding a readonly or const attribute will catch some errors at compile time but I argue it would introduce complexity and have gotchas and corner cases that make it less desirable than COW.

What you appear to be saying here, is that the programmer can always write the following: * (char *) 0x00000000 = 0;

uhh - I don't know where you got that from. I was giving an example in C++ where const might confuse a careless programmer. What does that have to do with a seg-v?

Ben, you trawled up an example where the callee had tried to be explicit about its contract, but the caller deliberately (or recklessly) violated that by passing the same array for both arguments. Yes! that could happen! Just as the seg-v could 'happen'. We're not suggesting the compiler should try to eliminate such behavior; so your example and the seg-v are, within this context, equivalent.
My C++ example didn't circumvent anything. It is perfectly legal and 
cast-free C++.

It circumvented the contract of the callee, as defined by the types of its parameters, using knowledge of the callee internals. Split some more hairs, dude.
 I've been 
pretty turned off by some of the attitudes in these recent threads. 

You are, by no means, the only one.
probably due to my math background that I don't like stating opinions as 
facts or pushing one point of view. 

Yet, you do so with gay abandon ("And I doubt Walter is going to add a readonly attribute").
 Where did that constructive attitude go to?

I'm not sure what you mean by that.

There was a time, in the past, when you might perhaps have looked at how readonly *could* or *might* be implemented in a valid, useful, and natural manner. Whilst it is perfectly valid to take up an opposing view, it doesn't do anyone any favours by loading the proposed notion with characteristics of a failed - and by your own admission, vaguely related - implementation from some other language. I understand that the C++ "const" has left a bad taste in the mouths of many; that should not colour the potential for D to be a better language than it currently is. Chill.
Feb 09 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Wed, 9 Feb 2005 20:11:12 +0000 (UTC), Kris <Kris_member pathlink.com>  
wrote:
 In article <cudmqh$30rt$1 digitaldaemon.com>, Ben Hinkle says...
 For example, here's some C++ code where COW would arguably be more safe than const:
 void foo(char*x, const char*y) { x[0] = 'a'; ... }
 Now if one calls foo(z,z) for a char*z, then even though y says it is const there isn't anything preventing x from assigning to the same string. So is it right to say that y is const? Sure, but it doesn't mean the contents of y won't change. If COW was used the function foo in D would look like
 void foo(char[] x, char[] y) { x = x.dup; x[0] = 'a'; ... }
 I would say the D foo is safer than the C++ foo.

 Adding a readonly or const attribute will catch some errors at compile time but I argue it would introduce complexity and have gotchas and corner cases that make it less desirable than COW.

What you appear to be saying here, is that the programmer can always write the following: * (char *) 0x00000000 = 0;

uhh - I don't know where you got that from. I was giving an example in C++ where const might confuse a careless programmer. What does that have to do with a seg-v?

Ben, you trawled up an example where the callee had tried to be explicit about its contract, but the caller deliberately (or recklessly) violated that by passing the same array for both arguments. Yes! that could happen! Just as the seg-v could 'happen'. We're not suggesting the compiler should try to eliminate such behavior; so your example and the seg-v are, within this context, equivalent.

 My C++ example didn't circumvent anything. It is perfectly legal and
 cast-free C++.

It circumvented the contract of the callee, as defined by the types of its parameters,

IMO the contract said, I won't mutate parameter 2, I _might_ mutate parameter 1. So it didn't violate it, it was just a risky gamble. The programmer simply made a mistake by not recognising that. It may be shortsighted not to realise that passing the same variable for both could cause odd behaviour, but it is also possible it might not have. It's in everyone's best interest for the compiler to do all it can to help find these; I don't think anyone here is disagreeing with that.
 using knowledge of the callee internals. Split some more hairs,
 dude.

It can still happen if the caller has no knowledge of the internals. It's also possible passing the same param had no ill effect, due to the internals.
 I've been
 pretty turned off by some of the attitudes in these recent threads.

You are, by no means, the only one.
 probably due to my math background that I don't like stating opinions as
 facts or pushing one point of view.

Yet, you do so with gay abandon ("And I doubt Walter is going to add a readonly attribute").

IMO that's an opinion, not stated as a fact. I didn't feel Ben was 'pushing' one point of view, he was certainly arguing _his_ point of view.
 Where did that constructive attitude go to?

I'm not sure what you mean by that.

There was a time, in the past, when you might perhaps have looked at how readonly *could* or *might* be implemented in a valid, useful, and natural manner. Whilst it is perfectly valid to take up an opposing view, it doesn't do anyone any favours by loading the proposed notion with characteristics of a failed - and by your own admission, vaguely related - implementation from some other language. I understand that the C++ "const" has left a bad taste in the mouths of many; that should not colour the potential for D to be a better language than it currently is.

I am interested in a post detailing the evils of const in C++, because I am certainly not 100% versed on the subject, yet I want to participate in this discussion because I hope that some sort of compiler-checked readonly is possible. Once we have said post we can discuss how it applies to D, because D is not exactly the same as C and it's possible the readonly idea is not the same as the const one.

Regan
Feb 09 2005
parent reply "Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:
"Regan Heath" <regan netwin.co.nz> wrote in message 
news:opslyc05jt23k2f5 ally...
 On Wed, 9 Feb 2005 20:11:12 +0000 (UTC), Kris 
 <Kris_member pathlink.com>  wrote:
 In article <cudmqh$30rt$1 digitaldaemon.com>, Ben Hinkle says...
 For example, here's some C++ code where COW would arguably be more safe than const:
 void foo(char*x, const char*y) { x[0] = 'a'; ... }
 Now if one calls foo(z,z) for a char*z, then even though y says it is const there isn't anything preventing x from assigning to the same string. So is it right to say that y is const? Sure, but it doesn't mean the contents of y won't change. If COW was used the function foo in D would look like
 void foo(char[] x, char[] y) { x = x.dup; x[0] = 'a'; ... }
 I would say the D foo is safer than the C++ foo.

 Adding a readonly or const attribute will catch some errors at compile time but I argue it would introduce complexity and have gotchas and corner cases that make it less desirable than COW.

What you appear to be saying here, is that the programmer can always write the following: * (char *) 0x00000000 = 0;

uhh - I don't know where you got that from. I was giving an example in C++ where const might confuse a careless programmer. What does that have to do with a seg-v?

Ben, you trawled up an example where the callee had tried to be explicit about its contract, but the caller deliberately (or recklessly) violated that by passing the same array for both arguments. Yes! that could happen! Just as the seg-v could 'happen'. We're not suggesting the compiler should try to eliminate such behavior; so your example and the seg-v are, within this context, equivalent.

 My C++ example didn't circumvent anything. It is perfectly legal and
 cast-free C++.

It circumvented the contract of the callee, as defined by the types of its parameters,

IMO the contract said, I won't mutate parameter 2, I _might_ mutate parameter 1. So it didn't violate it, it was just a risky gamble. The programmer simply made a mistake by not recognising that. It may be shortsighted not to realise that passing the same variable for both could cause odd behaviour, but it is also possible it might not have. It's in everyone's best interest for the compiler to do all it can to help find these; I don't think anyone here is disagreeing with that.
 using knowledge of the callee internals. Split some more hairs,
 dude.

It can still happen if the caller has no knowledge of the internals. It's also possible passing the same param had no ill effect, due to the internals.
 I've been
 pretty turned off by some of the attitudes in these recent threads.

You are, by no means, the only one.
 probably due to my math background that I don't like stating 
 opinions as
 facts or pushing one point of view.

Yet, you do so with gay abandon ("And I doubt Walter is going to add a readonly attribute").

IMO that's an opinion, not stated as a fact. I didn't feel Ben was 'pushing' one point of view, he was certainly arguing _his_ point of view.
 Where did that constructive attitude go to?

I'm not sure what you mean by that.

There was a time, in the past, when you might perhaps have looked at how readonly *could* or *might* be implemented in a valid, useful, and natural manner. Whilst it is perfectly valid to take up an opposing view, it doesn't do anyone any favours by loading the proposed notion with characteristics of a failed - and by your own admission, vaguely related - implementation from some other language. I understand that the C++ "const" has left a bad taste in the mouths of many; that should not colour the potential for D to be a better language than it currently is.

I am interested in a post detailing the evils of const in C++, because I am certainly not 100% versed on the subject, yet, I want to participate in this discussion because I hope that some sort of compiler checked readonly is possible.

This is highly contentious stuff. I am a *very big* fan of const, and think it is one of several ways in which C++ is manifestly superior to other languages, D included. IMO, almost all criticisms of const boil down to, with as much respect as one can possibly have in saying this, laziness and ignorance, or, for compiler vendors, the challenges of implementing (which even fans like me have to consider are not inconsiderable). The one exception to this is that Logical Constness + Multithreading is a dangerous mix: I've just written the next instalment of my "Flexible C++" column on this very issue - that's what Walter was alluding to in his post of a couple of hours ago - which should be out (http://www.cuj.com) sometime next week. If you want to see some of the really powerful things you can do with const, then get your employer to get a copy of IC++, and digest away. :-) Cheers Matthew
Feb 09 2005
parent "Regan Heath" <regan netwin.co.nz> writes:
On Thu, 10 Feb 2005 11:12:25 +1100, Matthew  
<admin stlsoft.dot.dot.dot.dot.org> wrote:
 "Regan Heath" <regan netwin.co.nz> wrote in message
 news:opslyc05jt23k2f5 ally...
 On Wed, 9 Feb 2005 20:11:12 +0000 (UTC), Kris
 <Kris_member pathlink.com>  wrote:
 In article <cudmqh$30rt$1 digitaldaemon.com>, Ben Hinkle says...
 For example, here's some C++ code where COW would arguably be more safe than
 const:
 void foo(char*x, const char*y) { x[0] = 'a'; ... }
 Now if one calls foo(z,z) for a char*z, then even though y says it is const
 there isn't anything preventing x from assigning to the same string. So is it
 right to say the y is const? Sure but it doesn't mean the contents of y won't
 change. If COW was used the function foo in D would look like
 void foo(char*x, char*y){x = x.dup;x[0] = 'a'; ... }
 I would say the D foo is safer than the C++ foo.

 Adding a readonly or const attribute will catch some errors at compile time but
 I argue it would introduce complexity and have gotchas and corner cases that
 make it less desirable than COW.

What you appear to be saying here, is that the programmer can always write the following: * (char *) 0x00000000 = 0;

uhh - I don't know where you got that from. I was giving an example in C++ where const might confuse a careless programmer. What does that have to do with a seg-v?

Ben, you trawled up an example where the callee had tried to be explicit about its contract, but the caller deliberately (or recklessly) violated that by passing the same array for both arguments. Yes! that could happen! Just as the seg-v could 'happen'. We're not suggesting the compiler should try to eliminate such behavior; so your example and the seg-v are, within this context, equivalent.

 My C++ example didn't circumvent anything. It is perfectly legal and
 cast-free C++.

It circumvented the contract of the callee, as defined by the types of its parameters,

IMO the contract said, I won't mutate parameter 2, I _might_ mutate parameter 1. So it didn't violate it, it was just a risky gamble. The programmer simply made a mistake by not recognising that. It may be shortsighted not to realise that passing the same variable for both could cause odd behaviour, but it is also possible that it would not have. It's in everyone's best interest for the compiler to do all it can to help find these; I don't think anyone here is disagreeing with that.
 using knowledge of the callee internals. Split some more hairs,
 dude.

It can still happen if the caller has no knowledge of the internals. It's also possible passing the same param had no ill effect, due to the internals.
 I've been
 pretty turned off by some of the attitudes in these recent threads.

You are, by no means, the only one.
 probably due to my math background that I don't like stating
 opinions as
 facts or pushing one point of view.

Yet, you do so with gay abandon ("And I doubt Walter is going to add a readonly attribute").

IMO that's an opinion, not stated as a fact. I didn't feel Ben was 'pushing' one point of view, he was certainly arguing _his_ point of view.
 Where did that constructive attitude go to?

I'm not sure what you mean by that.

There was a time, in the past, when you might perhaps have looked at how readonly *could* or *might* be implemented in a valid, useful, and natural manner. Whilst it is perfectly valid to take up an opposing view, it doesn't do anyone any favours by loading the proposed notion with characteristics of a failed - and by your own admission, vaguely related - implementation from some other language. I understand that the C++ "const" has left a bad taste in the mouths of many; that should not colour the potential for D to be a better language than it currently is.

I am interested in a post detailing the evils of const in C++, because I am certainly not 100% versed on the subject, yet, I want to participate in this discussion because I hope that some sort of compiler checked readonly is possible.

This is highly contentious stuff. I am a *very big* fan of const, and think it is one of several ways in which C++ is manifestly superior to other languages, D included.

Sorry, I wasn't aware of this position.
 IMO, almost all criticisms of const boil
 down to, with as much respect as one can possibly have in saying this,
 laziness and ignorance, or, for compiler vendors, the challenges of
 implementing (which even fans like me have to consider are not
 inconsiderable).

I am going to have to do some reading then.
 The one exception to this is that Logical Constness +
 Multithreading is a dangerous mix: I've just written the next instalment
 of my "Flexible C++" column on this very issue - that's what Walter was
 alluding to in his post of a couple of hours ago - which should be out
 (http://www.cuj.com) sometime next week.

Ahh, excellent.
 If you want to see some of the really powerful things you can do with
 const, then get your employer to get a copy of IC++, and digest away.
 :-)

Regan
Feb 09 2005
prev sibling parent "Walter" <newshound digitalmars.com> writes:
"Ben Hinkle" <Ben_member pathlink.com> wrote in message
news:cud2ea$2aot$1 digitaldaemon.com...
Right now there's only a flimsy and vague 'trust' mechanism in place, and
that only applies to folk who (a) understand what CoW means, and (b)
understand where the content they just received actually came from -- the
example might be buried deep under a number of layers, or could be
without source-code (heavens!)

I completely agree COW relies on the programmers knowing what they are [...]

 The tradeoffs are:
 1) assume the programmers know about COW and follow it (not always true),
 2) dup like crazy and watch performance suffer, or
 3) add a const/readonly attribute to help with some common gotchas

 None of these are perfect. Const in C++ has some issues that I assume [...]
 would share. Plus to learn about readonly and all the gotchas one might
 invest just as much effort as learning COW - in fact one might argue that
 learning COW is much simpler than learning about how to use [...]
 For example, here's some C++ code where COW would arguably be more safe than
 const:
 void foo(char*x, const char*y) { x[0] = 'a'; ... }
 Now if one calls foo(z,z) for a char*z, then even though y says it is const
 there isn't anything preventing x from assigning to the same string. So is it
 right to say the y is const? Sure but it doesn't mean the contents of y won't
 change. If COW was used the function foo in D would look like
 void foo(char*x, char*y){x = x.dup;x[0] = 'a'; ... }
 I would say the D foo is safer than the C++ foo.

 Adding a readonly or const attribute will catch some errors at compile time but
 I argue it would introduce complexity and have gotchas and corner cases that
 make it less desirable than COW.

Whatever the right solution is, the C++ notion of "const" is the wrong solution. It's even worse than you showed: in multithreaded apps, assuming that const means it won't change is a disaster. So-called 'const' data can change out from under you in legal, standard conforming C++ programs. COW is a much safer and more natural technique to use with multithreading.
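The aliasing case from the quoted example can be reproduced in a few lines of C++. This is only a sketch: `foo_const` and `foo_cow` are illustrative names rather than code from anyone's post, the elided `...` bodies are left out, and `std::string` merely stands in for D's `.dup`.

```cpp
#include <cassert>
#include <string>

// C++ flavour of the quoted foo: y is declared const, but nothing stops
// x from pointing at the same buffer. Returns what y[0] reads afterwards.
char foo_const(char* x, const char* y) {
    assert(x != nullptr && y != nullptr);
    x[0] = 'a';   // write through the mutable alias
    return y[0];  // the "const" view observes the write if x and y alias
}

// COW-by-convention flavour: duplicate before writing, so the caller's
// buffer (and therefore y) is left untouched.
char foo_cow(const char* x, const char* y) {
    assert(x != nullptr && y != nullptr);
    std::string copy(x);  // plays the role of x.dup
    copy[0] = 'a';        // mutate only the private copy
    return y[0];          // y still sees the original contents
}
```

With `char z[] = "hello";`, calling `foo_const(z, z)` returns `'a'` and leaves `z` holding `"aello"`, whereas `foo_cow(z, z)` returns `'h'` and leaves `z` alone. The sketch is single-threaded; it does not try to show the multithreaded case.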
Feb 09 2005
prev sibling parent reply "Walter" <newshound digitalmars.com> writes:
"Ben Hinkle" <Ben_member pathlink.com> wrote in message
news:cubkj0$11qs$1 digitaldaemon.com...
 That's what COW is all about. The only downside of COW is that it is not
 enforced by the language. But then I'd argue the performance (and [...]
 upside of COW outweigh the downside. But it is a judgement call for sure. [...]
 think it would take a strong case to convince Walter at this point to [...]
 COW. I think the lack of documentation about string literals contributed [...]
 specific examples the OP ran into. D's behavior didn't surprise me at all [...]
 the C heritage. In fact I would have been surprised if it didn't follow C [...]
 default.

I'll give a fuller answer later, but I know some languages where the language implements COW for you. They get terribly inefficient very quickly once you start doing some heavy string manipulation. Doing COW efficiently means using it as a convention rather than a language enforced dogma.
Feb 09 2005
next sibling parent reply Kris <Kris_member pathlink.com> writes:
In article <cudio9$2sog$1 digitaldaemon.com>, Walter says...
"Ben Hinkle" <Ben_member pathlink.com> wrote in message
news:cubkj0$11qs$1 digitaldaemon.com...
 That's what COW is all about. The only downside of COW is that it is not
 enforced by the language. But then I'd argue the performance (and [...]
 upside of COW outweigh the downside. But it is a judgement call for sure. [...]
 think it would take a strong case to convince Walter at this point to [...]
 COW. I think the lack of documentation about string literals contributed [...]
 specific examples the OP ran into. D's behavior didn't surprise me at all [...]
 the C heritage. In fact I would have been surprised if it didn't follow C [...]
 default.

I'll give a fuller answer later, but I know some languages where the language implements COW for you. They get terribly inefficient very quickly once you start doing some heavy string manipulation. Doing COW efficiently means using it as a convention rather than a language enforced dogma.

I'll bite, Walter. Whilst eagerly awaiting your expansion on this, you should note that *no-one* is suggesting the language implement CoW on one's behalf. That's a patently ridiculous notion for a language like D -- so let's not even go there; please. We're simply looking for a way whereby the programmer is steered in the right direction by the compiler. Nothing more. There's a vast area of uncharted territory between "a language enforced dogma" and a "language directive". - Kris
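For a rough idea of what "steered by the compiler" can look like, here is a C++ sketch using an ordinary const reference. It is not the D proposal under discussion and says nothing about how D would implement one; `with_char` is a hypothetical helper, and the aliasing caveat raised elsewhere in the thread still applies to C++ const.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Element access through a read-only reference yields `const char&`,
// so an accidental in-place write is rejected at compile time.
static_assert(
    !std::is_assignable<decltype(std::declval<const std::string&>()[0]), char>::value,
    "writes through a const view do not compile");

// The only way to change the text is to make the copy explicit.
std::string with_char(const std::string& readonly, std::size_t i, char c) {
    assert(i < readonly.size());
    std::string copy(readonly);  // explicit dup, visible in the source
    copy[i] = c;                 // mutate the private copy only
    return copy;
}
```

Calling `with_char(s, 1, '?')` on a string holding `"hello"` yields `"h?llo"` and leaves `s` unchanged; writing `s[1] = '?'` through the const reference would not compile.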
Feb 09 2005
parent "Walter" <newshound digitalmars.com> writes:
"Kris" <Kris_member pathlink.com> wrote in message
news:cudlr5$2vkh$1 digitaldaemon.com...
 Whilst eagerly awaiting your expansion on this, you should note that *no-one* is
 suggesting the language implement CoW on one's behalf.

Good. I just want to make sure that idea is quite dead <g>.
Feb 09 2005
prev sibling parent reply Derek <derek psych.ward> writes:
On Wed, 9 Feb 2005 09:50:33 -0800, Walter wrote:

 "Ben Hinkle" <Ben_member pathlink.com> wrote in message
 news:cubkj0$11qs$1 digitaldaemon.com...
 That's what COW is all about. The only downside of COW is that it is not
 enforced by the language. But then I'd argue the performance (and [...]
 upside of COW outweigh the downside. But it is a judgement call for sure. [...]
 think it would take a strong case to convince Walter at this point to [...]
 COW. I think the lack of documentation about string literals contributed [...]
 specific examples the OP ran into. D's behavior didn't surprise me at all [...]
 the C heritage. In fact I would have been surprised if it didn't follow C [...]
 default.

I'll give a fuller answer later, but I know some languages where the language implements COW for you. They get terribly inefficient very quickly once you start doing some heavy string manipulation. Doing COW efficiently means using it as a convention rather than a language enforced dogma.

I suspect you have had much more experience in this area than I have, however one language that I use constantly, Euphoria, has CoW built-in to it. It is an interpreted language and its programs still take only 5 times longer to run than equivalent D programs. So I guess that there are efficient and inefficient ways of implementing built-in CoW. -- Derek Melbourne, Australia
Feb 09 2005
parent reply "Walter" <newshound digitalmars.com> writes:
"Derek" <derek psych.ward> wrote in message
news:1h90azwbnucp2$.t14ekot1gd8i$.dlg 40tude.net...
 I'll give a fuller answer later, but I know some languages where the
 language implements COW for you. They get terribly inefficient very quickly
 once you start doing some heavy string manipulation. Doing COW efficiently
 means using it as a convention rather than a language enforced dogma.

I suspect you have had much more experience in this area than I have, however one language that I use constantly, Euphoria, has CoW built-in to it. It is an interpreted language and its programs still take only 5 times longer to run than equivalent D programs. So I guess that there are efficient and inefficient ways of implementing built-in CoW.

I know nothing about Euphoria, but try a loop over a string that reverses the string in place (or sorts the characters, or changes case on them). Language implemented COW features tend to reallocate/copy the string once for each character.
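To make the copy-per-character cost concrete, here is a small C++ sketch that counts copies instead of timing anything. `NaiveCowString`, `reverse_naive` and `reverse_by_convention` are hypothetical names; the first models a language that copies on every element write, which is an assumption for illustration and not a claim about any particular language.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a language that copies the whole string on
// every element write; `copies` counts how often that happens.
struct NaiveCowString {
    std::string data;
    std::size_t copies = 0;

    void set(std::size_t i, char c) {
        assert(i < data.size());
        std::string fresh(data);  // full reallocate-and-copy per write
        fresh[i] = c;
        data = std::move(fresh);
        ++copies;
    }
};

// Reverse via per-write copying: two writes per swapped pair.
std::size_t reverse_naive(NaiveCowString& s) {
    const std::size_t n = s.data.size();
    for (std::size_t i = 0; i < n / 2; ++i) {
        const char a = s.data[i];
        const char b = s.data[n - 1 - i];
        s.set(i, b);
        s.set(n - 1 - i, a);
    }
    return s.copies;
}

// COW as a convention: duplicate once up front, then mutate in place.
std::string reverse_by_convention(const std::string& src, std::size_t& copies) {
    std::string out(src);  // the single "dup"
    ++copies;
    const std::size_t n = out.size();
    for (std::size_t i = 0; i < n / 2; ++i) {
        std::swap(out[i], out[n - 1 - i]);
    }
    return out;
}
```

For the 11-character string `"hello world"`, the naive version performs 10 full copies while the convention version performs 1; both end up with `"dlrow olleh"`.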
Feb 09 2005
parent reply Derek Parnell <derek psych.ward> writes:
On Wed, 9 Feb 2005 14:46:06 -0800, Walter wrote:

 "Derek" <derek psych.ward> wrote in message
 news:1h90azwbnucp2$.t14ekot1gd8i$.dlg 40tude.net...
 I'll give a fuller answer later, but I know some languages where the
 language implements COW for you. They get terribly inefficient very quickly
 once you start doing some heavy string manipulation. Doing COW efficiently
 means using it as a convention rather than a language enforced dogma.

I suspect you have had much more experience in this area than I have, however one language that I use constantly, Euphoria, has CoW built-in to it. It is an interpreted language and its programs still take only 5 times longer to run than equivalent D programs. So I guess that there are efficient and inefficient ways of implementing built-in CoW.

I know nothing about Euphoria, but try a loop over a string that reverses the string in place (or sorts the characters, or changes case on them). Language implemented COW features tend to reallocate/copy the string once for each character.

Out of curiosity, I did this. I created a CoW version of a reverse function for both D and Euphoria. For a string of 780 utf32 characters, Euphoria was 57 times slower than D to reverse the string in place. I next quadrupled the string length and then Euphoria was 24 times slower than D. So the CoW aspect of Euphoria is not related to the number of characters to process. But this aside, the current CoW policy in D seems to be okay as it gives the coder/designer the best flexibility, at the cost of using one's brains ;-) If anyone is interested, the code for both can be found at http://www.users.bigpond.com/ddparnell/reverse_test.zip -- Derek Melbourne, Australia 10/02/2005 3:24:57 PM
Feb 10 2005
parent "Walter" <newshound digitalmars.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message
news:sdlwwvjw3ak2.vdhwdon81to1.dlg 40tude.net...
 I next quadrupled the string length and then Euphoria was 24 times slower than D. So
 the CoW aspect of Euphoria is not related to the number of characters to
 process.

Perhaps it uses reference counted strings.
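A reference-counted scheme would explain that behaviour, since a write only needs to copy while the buffer is still shared. The following C++ sketch shows the idea under that assumption; `RcString` is a hypothetical class, and nothing here is taken from how Euphoria actually works.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical reference-counted string: copies share one buffer, and a
// write duplicates it only while someone else still holds it.
// Single-threaded sketch; use_count() is not a safe test across threads.
class RcString {
public:
    explicit RcString(const std::string& s)
        : buf_(std::make_shared<std::string>(s)) {}

    void set(std::size_t i, char c) {
        assert(i < buf_->size());
        if (buf_.use_count() > 1) {  // shared: detach first
            buf_ = std::make_shared<std::string>(*buf_);
            ++copies_;
        }
        (*buf_)[i] = c;              // sole owner: write in place
    }

    const std::string& str() const { return *buf_; }
    std::size_t copies() const { return copies_; }

private:
    std::shared_ptr<std::string> buf_;
    std::size_t copies_ = 0;
};
```

If `b` is copy-constructed from `a` and then written to twice, only the first write copies the buffer; later writes find `b` as the sole owner and go straight through, so the copy count stays at one however many characters are changed.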
Feb 10 2005