digitalmars.D - immutable strings, spec vs. reality

Anders F Björklund (33/61) Feb 07 2005 (segfault / bus error)

Ben Hinkle (17/79) Feb 07 2005 Similar to C. That's why in C++ string literals have type const char*.

Anders F Björklund (22/39) Feb 07 2005 Wonder if there's a way to make string literals read-only
Kris (14/20) Feb 07 2005 I'd like to suggest the compiler support the notion of read-only variabl...

Regan Heath (21/39) Feb 07 2005 I agree we need to be able to say this variable cannot be written to (bo...

Carlos Santander B. (5/17) Feb 09 2005 I agree with this idea, and others have agreed too. But I really can't

Kris (23/40) Feb 09 2005 It has some merit, yet does not cover return values. If the syntax were ...

Anders F Björklund (13/28) Feb 09 2005 I believe there was a suggestion to allow "out" and "inout" return

Regan Heath (6/27) Feb 09 2005 I think I prefer using out/inout parameters to extending the return valu...
Kris (9/24) Feb 09 2005 Aye; but the latter are (by definition) open to mutation by the caller. ...

Anders F Björklund (19/36) Feb 09 2005 Good old C. Back when I learned it, we didn't use those pansy "const".

Kris (5/15) Feb 09 2005 You placed the type sig *inside* the parens? Where it was readable?

Anders F Björklund (11/21) Feb 09 2005 I'm not *that* old, and the punch cards were actually my dads :-)

Regan Heath (29/66) Feb 09 2005 I'm doing a double-whammy reply here to both Carlos and Kris :)

Ben Hinkle (8/15) Feb 07 2005 I should add that a dlint program could do some flow analysis for simple...
Kris (22/25) Feb 07 2005 Just to clarify: Utext is immutable, and its subclass UString is /option...

Anders F Björklund (8/17) Feb 07 2005 Passing around pointers tends to be a tad faster than copying, as well.

Kris (16/26) Feb 07 2005 Aye; the assertion is that the callee cannot modify the internals of the...

Anders F Björklund (8/17) Feb 07 2005 I thought it was Java that was the PG-13 version, and that D *let* you

Kris (9/10) Feb 07 2005 AieeeeeeeEEEEE!! Never! Nooooooo; Noooooo! I'd rather chop down a tree w...

Anders F Björklund (10/16) Feb 07 2005 Now you're starting to sound like those people that refuse to recognize

Matthew (4/10) Feb 08 2005 Well, if you'll turn to page 227 in your copy of Imperfect C++ you'll

Anders F Björklund (11/16) Feb 08 2005 Like you go on about that book of yours I will soon have to buy it ;-)

Matthew (6/25) Feb 08 2005 Aha! So it's working then ...

Anders F Björklund (31/40) Feb 08 2005 I can't say I'm crazy about going back to writing old-school "C"

Regan Heath (10/15) Feb 07 2005 On Mon, 7 Feb 2005 18:15:13 +0000 (UTC), Kris ...

Ben Hinkle (6/21) Feb 07 2005 I thought that would be "const char * const". Putting const after the p...

Regan Heath (14/37) Feb 07 2005 I'm not sure how it works with pointers in D, in the spirit of leaving

Ben Hinkle (9/47) Feb 07 2005 Sorry about the * vs [] - they should be the same in the C/C++ world. I'...

Regan Heath (23/34) Feb 07 2005 I assume you're referring to in/out contracts rather than in/out paramet...

Matthew (42/42) Feb 08 2005 To summarise this issue, am I correct in saying that:

pragma (20/45) Feb 08 2005 A few. ;)

Matthew (8/66) Feb 08 2005 Yes. But the important point is that it's 'literal writer beware'. It's

Kris (27/102) Feb 08 2005 The concept of read-only data is a powerful one. It's borne out in pract...

Matthew (30/159) Feb 08 2005 Indeed. I've just submitted my next "Flexible C++" instalment on that

Kris (6/9) Feb 08 2005 As far as D goes, "const" and "readonly" should be interchangeable. Cons...

Matthew (5/15) Feb 08 2005 We could, but I think it'd be nicer to keep const for constants.
John Reimer (5/22) Feb 08 2005 I really, really like the idea of a "readonly" keyword. It seems so

Matthew (10/34) Feb 08 2005 Also, Walter hates const, so if we can divorce him from that

Alex Stevenson (7/42) Feb 08 2005 I hate to muddy the water still further, but how about the final keyword...

Anders F Björklund (5/9) Feb 08 2005 "delete" in D anyone ? "extern" in C anyone ? "in" in D anyone ?

Alex Stevenson (9/18) Feb 08 2005 Rape, pillage and the subjugation of innocent nations is part of my

Matthew (5/23) Feb 08 2005 Don't forget parliamentary democracy and that most civilised practice:

Alex Stevenson (11/36) Feb 08 2005 Cricket. You forgot cricket. :-P

Anders F Björklund (14/20) Feb 08 2005 Slicing could be allowed, only that the slice is readonly too ?

Anders F Björklund (4/19) Feb 08 2005 That should have read "you can set it to 1, but not back to 0".
Matthew (5/24) Feb 08 2005 Yes, but it'd be invisible to the programmer looking at code. Also, it'd...

Anders F Björklund (30/37) Feb 08 2005 But couldn't the compiler catch the obvious misuses ?

Regan Heath (47/49) Feb 08 2005 This is my preferred solution.

Anders F Björklund (46/89) Feb 09 2005 I think you misunderstood something. "in" means you get a copy.

Regan Heath (80/166) Feb 09 2005 No, I understand how it works _now_ and I understand why it works that

Anders F Björklund (20/64) Feb 09 2005 I think it'll clean it up, but my faith in the trashman is limited.

Regan Heath (19/63) Feb 09 2005 Yes, a copy of the pointer is passed.

Anders F Björklund (8/23) Feb 09 2005 Well, it seems to be a general "bad habit" - judging from the code...

Ben Hinkle (8/17) Feb 08 2005 The only mention I can find in the spec is on the page about Memory

Anders F Björklund (8/15) Feb 08 2005 String literals on Linux and Mac OS X, and probably other UNIX too

Matthew (14/28) Feb 08 2005 Well, either way, it must be the same, since D does not have

Ben Hinkle (18/49) Feb 08 2005 Well I wouldn't go so far to say it doesn't have implementation-defined

Kris (32/50) Feb 08 2005 I don't think anyone is suggesting abandoning the CoW, Ben. What's neede...

Ben Hinkle (9/57) Feb 08 2005 Again, that is the whole point of COW. You only copy when you write to i...

Kris (25/34) Feb 08 2005 I think we're actually saying the same thing, Ben;

Matthew (23/77) Feb 08 2005 That's a more cogent view (for you and for me). Well said.

Kris (9/73) Feb 08 2005 Aye -- but, for compile-time enforcement, surely only the type info is n...

Anders F Björklund (25/76) Feb 09 2005 Not necessarily. The "readonly" boolean could just as well go in

Matthew (10/89) Feb 09 2005 How? A slice doesn't point to typeinfo, or at least it didn't last time

Anders F Björklund (10/17) Feb 09 2005 That is an interesting question, but I'm sure it can't be impossible...

Ben Hinkle (12/28) Feb 09 2005 Class objects have RTTI. Arrays and basic types do not.

Anders F Björklund (14/29) Feb 09 2005 But if you remove the current check, you also remove the speedup ?

Ben Hinkle (10/39) Feb 09 2005 I don't know how Walter is fixing it but it could end up copying more th...

Matthew (7/101) Feb 09 2005 To be honest, Kris, I've kind of lost the thread in this thread.

Anders F Björklund (20/34) Feb 09 2005 The thread basically boiled down to a discussion about two issues:

Derek Parnell (13/18) Feb 08 2005 c) you are provided with an array and might update parts of, depending ...

Kris (18/32) Feb 08 2005 I think that one *may* be covered by (b). Let me have a go at it:

Kris (8/46) Feb 09 2005 Ack! I'm tired, and doubling up on phrases;

Walter (6/11) Feb 09 2005 modify

Ben Hinkle (23/40) Feb 09 2005 [snip]

Ben Hinkle (5/8) Feb 09 2005 sorry for the double-post but I forgot to add, as I mentioned before, if...

Anders F Björklund (4/9) Feb 09 2005 Wouldn't that be a lot like adding (optional) warnings to the compiler ?

Kris (13/58) Feb 09 2005 What you appear to be saying here, is that the programmer can always wri...

Ben Hinkle (18/54) Feb 09 2005 uhh - I don't know where you got that from. I was giving an example in C...

Kris (23/60) Feb 09 2005 Ben, you trawled up an example where the callee had tried to be explicit...

Regan Heath (23/105) Feb 09 2005 IMO the contract said, I won't mutate parameter 2, I _might_ mutate

Matthew (18/133) Feb 09 2005 This is highly contentious stuff. I am a *very big* fan of const, and

Regan Heath (6/150) Feb 09 2005 Sorry, I wasn't aware of this position.

Walter (22/49) Feb 09 2005 even

Walter (12/20) Feb 09 2005 simplicity)

Kris (10/30) Feb 09 2005 I'll bite, Walter.

Walter (4/6) Feb 09 2005 *no-one* is

Derek (9/30) Feb 09 2005 I suspect you have had much more experience in this area than I have,

Walter (8/17) Feb 09 2005 quickly

Derek Parnell (17/36) Feb 10 2005 Out of curiosity, I did this. I created a CoW version of a reverse funct...

Walter (3/6) Feb 10 2005 Perhaps it uses reference counted strings.

Anders F Björklund <afb algonet.se> writes:

http://www.digitalmars.com/d/cppstrings.html says:

  In D, use the array slicing syntax in the natural manner:
 
 	char[] str = "hello";
 	str[1..2] = '?';		// str is "h??lo"
 

Okay, so one tries this little example on Linux/Darwin:

 import std.stdio;
 void main()
 {
   char[] str = "hello";
   str[1..2] = '?';
   writefln("%s", str);
 }

<kaboom> (segfault / bus error)

Okay, that's right. Forgot the small print on the compiler:

 Differences from Win32 version

 * String literals are read-only. Attempting to write to
   them will cause a segment violation.

Copy-on-Write* (a.k.a. duplicate before changing), I forgot:

    char[] str = "hello".dup;

"h?llo". Oh, that's right. Exclusive ranges, not inclusive:

 	str[1..3] = '?';		// str is "h??lo"

"h??lo". Finally! "in the natural manner", eh?    :-P
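
A minimal sketch of the corrected sequence, assuming a current D compiler (where string literals are immutable, so the copy is required before any write). The helper name `mask` is hypothetical and only there to show both range choices side by side:

```d
import std.stdio;

// Hypothetical helper: copy the literal, then overwrite the
// half-open range [from, to) with '?'.
char[] mask(string literal, size_t from, size_t to)
{
    char[] copy = literal.dup;   // writable copy; the literal is never touched
    copy[from .. to] = '?';      // slice assignment; 'to' is exclusive
    return copy;
}

void main()
{
    writeln(mask("hello", 1, 2)); // h?llo
    writeln(mask("hello", 1, 3)); // h??lo
}
```

The upper bound is exclusive, so 1..3 covers indices 1 and 2 only.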


It becomes even funnier when using slices. Remember, those
are *not* copies, but just "another reference to the data".

 	char[] s1 = "hello world";
 	char[] s2 = s1[6 .. 11];	// s2 is "world"

So far, so good. Now it's just a matter of being careful:

  s2[3] = '?';

<kaboom>. Right, of course I meant to do a copy first...

   char[] s2 = s1[6 .. 11].dup;  // s2 is "world"
   s2[3] = '?';
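
The difference between a slice and a copy can be sketched like this, assuming a current D compiler and starting from a writable buffer rather than a literal. The function name `sliceVersusDup` is made up for the illustration:

```d
import std.stdio;

// Hypothetical demo: returns [s1, view, copy] after writing
// through a slice and through an independent copy.
char[][] sliceVersusDup()
{
    char[] s1 = "hello world".dup;   // writable buffer
    char[] view = s1[6 .. 11];       // slice: shares s1's memory
    char[] copy = s1[6 .. 11].dup;   // separate allocation

    view[3] = '?';                   // also changes s1
    copy[0] = 'W';                   // leaves s1 alone
    return [s1, view, copy];
}

void main()
{
    foreach (s; sliceVersusDup())
        writeln(s);                  // hello wor?d, wor?d, World
}
```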

This (Copy-on-Write) and the toStringz bugs (with NUL-term)
take a while to get used to... Maybe it needs more of
copying-as-default-operation, or I need to be more careful. :-)

Either way, there needs to be more examples in the D spec...


Please note that I think that read-only-strings as well as
copy-on-write is a *good thing*. Making the default string
mutable is a design error, in my book... (like Dool does)
http://dool.sourceforge.net/dool_String_String.html

I prefer StringBuffer (Java) or NSMutableString (Objective-C)
and having the default strings immutable: String / NSString.
That way it is faster (less copying) as well as being thread-safe
(since immutable objects don't need any synchronizing...).
It also makes it easier to use literals and slices, in D.

Mango has some weird boolean flag instead, but that's OK too :-)
http://svn.dsource.org/svn/projects/mango/trunk/doc/html/classUString.html
(but I think the idea is that UText is immutable and UString is mutable)

And of course, using "string" in D instead of "char[]" wouldn't hurt ?
(just as using "bool" instead of "bit" hasn't been all that painful...)

--anders


* C-o-W, as in http://www.digitalmars.com/d/phobos.html#string :
 When a function takes a string as a parameter, and returns a string, is
 that string the same as the input string, modified in place, or is it a
 modified copy of the input string? The D array convention is
 "copy-on-write". This means that if no modifications are done, the
 original string (or slices of it) can be returned. If any modifications
 are done, the returned string is a copy.

   Note that it says "can"... One can always dup, just to be safe.
   (something that I hope that std.string.toStringz can listen to)
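
As a rough illustration of the convention quoted above, here is a sketch of a function that follows it, assuming a current D compiler. `replaceChar` is a hypothetical name, not a Phobos function:

```d
import std.stdio;

// Hypothetical copy-on-write style helper: returns the input slice itself
// when nothing needs changing, and a modified copy otherwise.
// The caller's data is never written to.
const(char)[] replaceChar(const(char)[] s, char from, char to)
{
    foreach (i, c; s)
    {
        if (c == from)
        {
            char[] r = s.dup;            // first write needed: copy now
            foreach (ref ch; r[i .. $])
                if (ch == from) ch = to;
            return r;
        }
    }
    return s;                            // no modification: original returned
}

void main()
{
    const(char)[] a = "hello";
    auto same = replaceChar(a, 'x', '?');    // nothing to replace
    auto changed = replaceChar(a, 'l', '?');

    writeln(same is a);   // true
    writeln(changed);     // he??o
    writeln(a);           // hello
}
```

A caller cannot tell from the return value alone whether it got the original or a copy, which is why the "can" in the spec text matters.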

Feb 07 2005

Ben Hinkle <Ben_member pathlink.com> writes:

In article <cu7m7d$1qur$1 digitaldaemon.com>,
Anders F Björklund says...
http://www.digitalmars.com/d/cppstrings.html says:

  In D, use the array slicing syntax in the natural manner:
 
 	char[] str = "hello";
 	str[1..2] = '?';		// str is "h??lo"
 

Okay, so one tries this little example on Linux/Darwin:

 import std.stdio;
 void main()
 {
   char[] str = "hello";
   str[1..2] = '?';
   writefln("%s", str);
 }

<kaboom> (segfault / bus error)

Okay, that's right. Forgot the small print on the compiler:

 Differences from Win32 version

 * String literals are read-only. Attempting to write to
   them will cause a segment violation.


Similar to C. That's why in C++ string literals have type const char*.

Copy-on-Write* (a.k.a. duplicate before changing), I forgot:

    char[] str = "hello".dup;

"h?llo". Oh, that's right. Exclusive ranges, not inclusive:

 	str[1..3] = '?';		// str is "h??lo"

"h??lo". Finally! "in the natural manner", eh?    :-P

Any suggestions for improvement? The only ones that pop into my head are things
like "automatically dup all string literals before any module ctors run". Or
perhaps "make strings reference counted and automatically copy on write". I'd
rather keep the C behavior and save on startup speed for the first. For the
second it would fundamentally change string implementation and behavior and
probably would just trade one set of annoying behavior for another.

It becomes even funnier when using slices. Remember, those
are *not* copies, but just "another reference to the data".

This is a very important feature.

 	char[] s1 = "hello world";
 	char[] s2 = s1[6 .. 11];	// s2 is "world"

So far, so good. Now it's just a matter of being careful:

  s2[3] = '?';

<kaboom>. Right, of course I meant to do a copy first...

   char[] s2 = s1[6 .. 11].dup;  // s2 is "world"
   s2[3] = '?';

This (Copy-on-Write) and the toStringz bugs (with NUL-term)
take a while to get used to... Maybe it needs more of
copying-as-default-operation, or I need to be more careful. :-)

The toStringz bug about NUL-term will be fixed and is independent of COW. 

Either way, there needs to be more examples in the D spec...

Please note that I think that read-only-strings as well as
copy-on-write is a *good thing*. Making the default string
mutable is a design error, in my book... (like Dool does)
http://dool.sourceforge.net/dool_String_String.html

Though the downside to having const and non-const strings is you end up
converting one to the other depending on what function you call, which would get
very annoying. Either that or the system develops a convention. For example in
Java you use StringBuffers to make strings but you use Strings to pass them
between functions. D's convention is COW. Either way users have to learn the
convention in order to use strings effectively.

Feb 07 2005

Anders F Björklund <afb algonet.se> writes:

Ben Hinkle wrote:

* String literals are read-only. Attempting to write to
  them will cause a segment violation.


 
 Similar to C. That's why in C++ string literals have type const char*.

Wonder if there's a way to make string literals read-only
on Windows too, to avoid any later surprises when porting ?

Either way, I think it should be made part of the D language:
"String literals are read-only". On all possible D platforms.

It would also offer some possibilities for string pooling...

 Any suggestions for improvement? The only ones that pop into my head are things
 like "automatically dup all string literals before any module ctors run". Or
 perhaps "make strings reference counted and automatically copy on write". I'd
 rather keep the C behavior and save on startup speed for the first. For the
 second it would fundamentally change string implementation and behavior and
 probably would just trade one set of annoying behavior for another.

The D behaviour is OK, I was just ranting a bit that it
might need some more documentation before it becomes "natural" ?

Treating all "external" strings as read-only / immutable and then
using copy-on-write works, whether they're literals or parameters.

Not sure if it was clear from my writing, but eventually it did work :-)

 The toStringz bug about NUL-term will be fixed and is independent of COW. 

Yes, and I think I finally came to terms with when the strings
are NUL-terminated in the implementation and when they're not...

(in short: string literals are null terminated, and dynamic
arrays of chars could be as a side-effect but are not always)
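
A small sketch of why the terminator cannot be assumed for arbitrary char arrays, assuming a current Phobos in which `std.string.toStringz` always yields a NUL-terminated string. The wrapper `cLength` is a hypothetical name:

```d
import core.stdc.string : strlen;
import std.stdio;
import std.string : toStringz;

// Hypothetical helper: the length as C's strlen sees it, going through
// toStringz so that a terminator is guaranteed even for slices.
size_t cLength(const(char)[] s)
{
    return strlen(toStringz(s));
}

void main()
{
    const(char)[] whole = "hello world";
    const(char)[] part = whole[0 .. 5];  // "hello"; the next byte is ' ', not NUL

    writeln(part.length);    // 5
    writeln(cLength(part));  // 5
}
```

Passing `part.ptr` straight to a C function would read on past the slice, since the slice carries its length separately and has no terminator of its own.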

 Though the downside to having const and non-const strings is you end up
 converting one to the other depending on what function you call, which would get
 very annoying. Either that or the system develops a convention. For example in
 Java you use StringBuffers to make strings but you use Strings to pass them
 between functions. D's convention is COW. Either way users have to learn the
 convention in order to use strings effectively.

Usually mutable strings inherit from immutable strings, so that you
can use them directly. Either that, or there is a creating method.
(such as java.lang.StringBuffer.toString(), for that language)

So it's basically the same as D, except that the others use classes
instead of built-in arrays and that they use UTF-16 instead of UTF-8 ?
(and I believe that D's choices do have their merits, and are just fine)

--anders

Feb 07 2005

Kris <Kris_member pathlink.com> writes:

In article <cu7pg8$221r$1 digitaldaemon.com>, Ben Hinkle says...
Any suggestions for improvement? The only ones that pop into my head are things
like "automatically dup all string literals before any module ctors run". Or
perhaps "make strings reference counted and automatically copy on write". I'd
rather keep the C behavior and save on startup speed for the first. For the
second it would fundamentally change string implementation and behavior and
probably would just trade one set of annoying behavior for another.

I'd like to suggest the compiler support the notion of read-only variables. That
is, a "const char[]" is read-only, and any attempt to write it results in a
compile-time error. Such variables cannot be implicitly cast to non-const.

Given that; string literals might be implicitly tagged as read-only (effectively
a "const char[]"). This maintains full code-speed, whilst endowing the compiler
with some very useful functionality. 

Example: Mango.icu.UText is supposedly an immutable object. However, it must
permit access to its content as a wchar[]. Without support for such read-only
arrays ("const wchar[]"), UText is really /not/ immutable at all. That is, D is
missing some fundamental support for creating immutable objects. In the case of
UText, one might argue to return a copy of the content instead ... I won't even
comment upon that notion :-)

- Kris

Feb 07 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Mon, 7 Feb 2005 18:26:28 +0000 (UTC), Kris <Kris_member pathlink.com>  
wrote:
 In article <cu7pg8$221r$1 digitaldaemon.com>, Ben Hinkle says...
 Any suggestions for improvement? The only ones that pop into my head are things
 like "automatically dup all string literals before any module ctors run". Or
 perhaps "make strings reference counted and automatically copy on write". I'd
 rather keep the C behavior and save on startup speed for the first. For the
 second it would fundamentally change string implementation and behavior and
 probably would just trade one set of annoying behavior for another.

 I'd like to suggest the compiler support the notion of read-only variables. That
 is, a "const char[]" is read-only, and any attempt to write it results in a
 compile-time error. Such variables cannot be implicitly cast to non-const.

I agree we need to be able to say this variable cannot be written to (both  
reference and contents). But, we don't want to go down the path of  
specifying which parameters to functions are const, eg.

const char[] bar = "test";
void foo(char[] s) {
}
foo(bar); //error cannot cast "const char[]" to "char[]"

Isn't that the C/C++ 'const' thing all over again?


That said, I cannot see how we can have const variables without something  
indicating what a function is going to do with its variables ... but hold
on, aren't we already doing it?

I have made a suggestion before and I still think it's a good idea, why  
not use the in/out/inout parameter specifiers to enforce const'ness. i.e.

'in' (default) = cannot be written to.
'out'   = can be written to, initialised upon entry.
'inout' = can be written to, not init upon entry.

Meaning if a function has 'out' or 'inout' it's clearly saying I need to  
write to this, so passing a const char[] will cause a compile error.
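
What such enforcement could look like is sketched below. This assumes a current D compiler, in which `in` parameters are const and the by-reference role of the old `inout` is played by `ref`; it is not how the compiler under discussion behaves. All function names are hypothetical:

```d
import std.stdio;

// 'in': read-only inside the function; s[0] = 'x' here would be
// rejected at compile time.
size_t countChar(in char[] s, char c)
{
    size_t n = 0;
    foreach (ch; s)
        if (ch == c) ++n;
    return n;
}

// 'out': reset to int.init (0) on entry, then written by the callee.
void fill(out int x) { x += 1; }

// 'ref': the caller's current value is visible and may be modified.
void bump(ref int x) { x += 1; }

void main()
{
    int a = 5;
    fill(a);  writeln(a);               // 1
    bump(a);  writeln(a);               // 2
    writeln(countChar("hello", 'l'));   // 2
}
```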


Regan

Feb 07 2005

"Carlos Santander B." <csantander619 gmail.com> writes:

Regan Heath wrote:
 I have made a suggestion before and I still think it's a good idea, why  
 not use the in/out/inout parameter specifiers to enforce const'ness. i.e.
 
 'in' (default) = cannot be written to.
 'out'   = can be written to, initialised upon entry.
 'inout' = can be written to, not init upon entry.
 
 Meaning if a function has 'out' or 'inout' it's clearly saying I need 
 to  write to this, so passing a const char[] will cause a compile error.
 
 
 Regan

I agree with this idea, and others have agreed too. But I really can't 
recall Walter ever saying something about it.

_______________________
Carlos Santander Bernal

Feb 09 2005

Kris <Kris_member pathlink.com> writes:

In article <cue0ed$9fs$1 digitaldaemon.com>, Carlos Santander B. says...
Regan Heath wrote:
 I have made a suggestion before and I still think it's a good idea, why  
 not use the in/out/inout parameter specifiers to enforce const'ness. i.e.
 
 'in' (default) = cannot be written to.
 'out'   = can be written to, initialised upon entry.
 'inout' = can be written to, not init upon entry.
 
 Meaning if a function has 'out' or 'inout' it's clearly saying I need 
 to  write to this, so passing a const char[] will cause a compile error.
 
 
 Regan

I agree with this idea, and others have agreed too. But I really can't 
recall Walter ever saying something about it.

_______________________
Carlos Santander Bernal


It has some merit, yet does not cover return values. If the syntax were somehow
extended to return-values, then there might be something.

But then there's the long-standing problem between inout and read-only structs: 

One often uses structs to gather read-only reference-data together. Such
read-only data would typically be placed in ROM, for any device that has that
kind of memory. 

How does one pass this data to a function? Well, you can pass a copy of it on
the stack. That's hardly a viable solution when the read-only data is an entire
font description, along with all the splines, hints, and so on :-}

The other option (and the typical resolution) is to pass the read-only data by
reference. Unfortunately, D supports this only via an 'inout' argument - which
conflicts sharply with the notion of read-only. The compiler complained, and
rightly so, that a const (read-only) struct could not be mapped onto an inout
parameter. Therein lies a big hole regarding the simplified pass-by-reference
semantics.

I ran into this about nine months ago with const structs, and posted about the
conflict at that time. The only sane way around it is to remove the const
attribute from the struct. Thus D loses badly to C in terms of viability as an
embedded-solutions language.

I know the 'inout' problem with structs is somewhat of a sidetrack here; but it
is related, so was worth noting.
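
For comparison, a sketch of the read-only pass-by-reference being asked for, assuming a current D compiler that accepts `ref const` parameters (the compiler discussed in this thread does not). The `Glyph` struct and `area` function are hypothetical stand-ins for the font data:

```d
import std.stdio;

struct Glyph { int width; int height; }

// Read-only reference data, standing in for the font description.
immutable Glyph space = Glyph(4, 12);

// Passed by reference, so no copy is made; the parameter is const,
// so g.width = 0 in here would not compile.
int area(ref const Glyph g)
{
    return g.width * g.height;
}

void main()
{
    writeln(area(space)); // 48
}
```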

- Kris

Feb 09 2005

Anders F Björklund <afb algonet.se> writes:

Kris wrote:

 It has some merit, yet does not cover return values. If the syntax were somehow
 extended to return-values, then there might be something.

I believe there was a suggestion to allow "out" and "inout" return 
values too, in order to make them behave more like C++ references.

But I don't think there ever came anything out of it. The runtime
seems to be using (in)out arguments, instead of returning values.

 One often uses structs to gather read-only reference-data together. Such
 read-only data would typically be placed in ROM, for any device that has that
 kind of memory. 
 
 How does one pass this data to a function? Well, you can pass a copy of it on
 the stack. That's hardly a viable solution when the read-only data is an entire
 font description, along with all the splines, hints, and so on :-}

You pass it with a pointer, and a friendly post-it saying "Don't Touch"

This is similar to how I can use functions that have char[] parameters:
I can either pass "hello" and hope they use Copy-on-Write like they 
should (because if they try to write to the literal, they'll segfault)
or I make a .dup before, because I distrust that particular function

 The other option (and the typical resolution) is to pass the read-only data by
 reference. Unfortunately, D supports this only via an 'inout' argument - which
 conflicts sharply with the notion of read-only. The compiler complained, and
 rightly so, that const (read-only) struct could not be mapped onto an inout
 parameter. Therein lies a big hole regarding the simplified pass-by-reference
 semantics.

Seems to be exactly the same case as with just using pointers directly.
(then again, isn't that just what the references are, pointers hiding?)

--anders

Feb 09 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Wed, 09 Feb 2005 23:56:54 +0100, Anders F Björklund <afb algonet.se>
wrote:
 Kris wrote:

 It has some merit, yet does not cover return values. If the syntax were somehow
 extended to return-values, then there might be something.

 I believe there was a suggestion to allow "out" and "inout" return  
 values too, in order to make them behave more like C++ references.

 But I don't think there ever came anything out of it. The runtime
 seems to be using (in)out arguments, instead of returning values.

I think I prefer using out/inout parameters to extending the return value
in this way; out/inout parameters are more flexible.

 One often uses structs to gather read-only reference-data together. Such
 read-only data would typically be placed in ROM, for any device that has that
 kind of memory.

 How does one pass this data to a function? Well, you can pass a copy of it on
 the stack. That's hardly a viable solution when the read-only data is an entire
 font description, along with all the splines, hints, and so on :-}

 You pass it with a pointer, and a friendly post-it saying "Don't Touch"

 This is similar to how I can use functions that have char[] parameters:
 I can either pass "hello" and hope they use Copy-on-Write like they  
 should (because if they try to write to the literal, they'll segfault)
 or I make a .dup before, because I distrust that particular function

If 'in' was enforced, i.e. the variable could not be written to, then you  
could pass without the dup, safely and confidently.

Feb 09 2005

Kris <Kris_member pathlink.com> writes:

In article <cue4fn$dc2$1 digitaldaemon.com>,
Anders F Björklund says...
Kris wrote:

 It has some merit, yet does not cover return values. If the syntax were somehow
 extended to return-values, then there might be something.

I believe there was a suggestion to allow "out" and "inout" return 
values too, in order to make them behave more like C++ references.

But I don't think there ever came anything out of it. The runtime
seems to be using (in)out arguments, instead of returning values.

Aye; but the latter are (by definition) open to mutation by the caller. I'm just
pointing that out for folks who might not have noted the distinction :~}

 One often uses structs to gather read-only reference-data together. Such
 read-only data would typically be placed in ROM, for any device that has that
 kind of memory. 
 
 How does one pass this data to a function? Well, you can pass a copy of it on
 the stack. That's hardly a viable solution when the read-only data is an entire
 font description, along with all the splines, hints, and so on :-}

You pass it with a pointer, and a friendly post-it saying "Don't Touch"

Right. Heh! There's the rub; you declare the structs as const, and then cast
them over as a non-const * argument -- exactly what we ought to get a smack on
the hand for -- perhaps we should endeavour to get away from that kind of
behaviour, and I understand that's what D is largely about :-)

- Kris

Feb 09 2005

Anders F Björklund <afb algonet.se> writes:

Kris wrote:

You pass it with a pointer, and a friendly post-it saying "Don't Touch"

 
 Right. Heh! There's the rub; you declare the structs as const, and then cast
 them over as a non-const * argument -- exactly what we ought to get a smack on
 the hand for -- perhaps we should endeavour to get away from that kind of
 behaviour, and I understand that's what D is largely about :-)

Good old C. Back when I learned it, we didn't use those pansy "const".
And none of this size_t and other portable crap, that wasn't defined.

int strlen(char *s);

Of course, using the stdlib was for wusses too, so you usually ended
up with some square wheel function like: (hidden by a macro or two)

char *p = s;
while (*p) p++;   /* stop at the NUL, so p - s is the length */
int len = p - s;

Or just write it in assembler. Or punch cards or something like that.
Glad that D lets me relive all these nostalgic computing moments.


Seriously, though. It's an improvement. Just not as big as I hoped ?
Which is too bad, since I'd hoped to keep avoiding C++ a while longer.

I'll just keep using the trust method and hope someone comes up with
something clever. But I'm kinda tired of this long discussion by now,
so I'll let it rest for a week. But I hope string literals become R/O!

Microsoft Visual C++:
 The /GF option enables the compiler to pool strings and place them in
 read-only memory. By placing the strings in read-only memory, the
 operating system does not need to swap that portion of memory. Instead,
 it can read the strings back from the image file. It is a good idea to
 do this as it saves pages of memory from being written to and therefore
 reduces the working set used by the application. In addition, it allows
 those pages to be shared between multiple instances of the process that
 use that image file (.exe or .dll file), further reducing total memory
 usage in the entire system. Strings placed in read-only memory cannot be
 modified; if you try to modify them, you will see an Application Error
 dialog box.
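
A minimal C sketch of the safe pattern when literals may live in read-only memory: treat the literal as const and write only to a private copy. The helper name upcase_copy is made up for illustration and is not from any library.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated, upper-cased copy of s (caller frees), or NULL
   on allocation failure. The input is never written to, so it is safe to
   pass a string literal even when literals sit in read-only pages. */
char *upcase_copy(const char *s)
{
    size_t n = strlen(s);
    char *copy = malloc(n + 1);
    if (copy == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        copy[i] = (char)toupper((unsigned char)s[i]);
    copy[n] = '\0';
    return copy;
}
```

Writing through the literal itself is what triggers the Application Error (or segfault / bus error elsewhere); the copy costs one allocation but behaves the same on every platform.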

They are already on my preferred systems, so I'll let you catch up. :-)

--anders

Feb 09 2005

Kris <Kris_member pathlink.com> writes:

In article <cue7jd$h35$2 digitaldaemon.com>,
Anders F Björklund says...
Kris wrote:

You pass it with a pointer, and a friendly post-it saying "Don't Touch"

 
 Right. Heh! There's the rub; you declare the structs as const, and then cast
 them over as a non-const * argument -- exactly what we ought to get a smack on
 the hand for -- perhaps we should endeavour to get away from that kind of
 behaviour, and I understand that's what D is largely about :-)

Good old C. Back when I learned it, we didn't use those pansy "const".
And none of this size_t and other portable crap, that wasn't defined.

int strlen(char *s);

You placed the type sig *inside* the parens? Where it was readable? 

Ehhhh ... ya pansy ...  

:~)

Feb 09 2005

Anders F Björklund <afb algonet.se> writes:

Kris wrote:

Good old C. Back when I learned it, we didn't use those pansy "const".
And none of this size_t and other portable crap, that wasn't defined.

int strlen(char *s);

 
 You placed the type sig *inside* the parens? Where it was readable? 
 
 Ehhhh ... ya pansy ...  
 
 :~)

I'm not *that* old, and the punch cards were actually my dad's :-)

Besides, the K & R declaration of arguments is the only explanation
of their brace style, that uses one set for functions and one for ifs...

int strlen(s)
char *s;
{
   if (0) {
   }
}

--anders

Feb 09 2005

"Regan Heath" <regan netwin.co.nz> writes:

I'm doing a double-whammy reply here to both Carlos and Kris :)

On Wed, 9 Feb 2005 22:39:14 +0000 (UTC), Kris <Kris_member pathlink.com>  
wrote:
 In article <cue0ed$9fs$1 digitaldaemon.com>, Carlos Santander B. says...
 Regan Heath wrote:
 I have made a suggestion before and I still think it's a good idea, why
 not use the in/out/inout parameter specifiers to enforce const'ness.  
 i.e.

 'in' (default) = cannot be written to.
 'out'   = can be written to, initialised upon entry.
 'inout' = can be written to, not init upon entry.

 Meaning if a function has 'out' or 'inout' it's clearly saying I need
 to  write to this, so passing a const char[] will cause a compile  
 error.


 Regan

 I agree with this idea, and others have agreed too. But I really can't
 recall Walter ever saying something about it.



--Carlos--

Neither can I.

--Kris--

 It has some merit, yet does not cover return values. If the syntax were  
 somehow
 extended to return-values, then there might be something.

Or, could we decide that it wasn't worth the effort/complexity/etc to  
extend it to return values and have functions use out or inout instead?

I'm just speculating, I have no idea whether it is or isn't worth the  
effort.

I find the out and inout idea more flexible than a return value, which I  
would tend to use for a true/false pass/fail concept in most cases.

I can see how with a return value the following code:

foo(a);
bar(a);

can be re-written as

bar(foo(a));

the question is, is that an advantage or a disadvantage? is this advantage  
worth the effort of extending the idea to return values?

 But then there's the long-standing problem between inout and read-only  
 structs:

 One often uses structs to gather read-only reference-data together. Such
 read-only data would typically be placed in ROM, for any device that has  
 that
 kind of memory.

 How does one pass this data to a function? Well, you can pass a copy of  
 it on
 the stack. That's hardly a viable solution when the read-only data is an  
 entire
 font description, along with all the splines, hints, and so on :-}

If what you're saying is that the current implementation of 'in', which  
copies the parameter, is a bad way to pass a struct, I agree.

How about if 'in' meant 'readonly'? The compiler could enforce that  
(compile time and/or runtime) then 'in' could pass the struct by reference.

I understand that in some situations passing a copy is actually faster  
than a reference, in which case the compiler can still choose to pass a  
copy, right?
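
For comparison, this is roughly how C expresses "read-only, passed by reference": a pointer-to-const parameter. A sketch only; the struct FontInfo and the function total_advance are hypothetical names invented for the example.

```c
#include <stddef.h>

/* Hypothetical read-only reference data, e.g. a font description. */
struct FontInfo {
    int glyph_count;
    int units_per_em;
    short advance[256];
};

/* The pointer-to-const parameter passes the struct by reference (no copy
   on the stack) while the compiler rejects writes through f. */
long total_advance(const struct FontInfo *f)
{
    long sum = 0;
    for (int i = 0; i < f->glyph_count && i < 256; i++)
        sum += f->advance[i];
    /* f->glyph_count = 0;   would not compile: f points to const data */
    return sum;
}
```

The check is compile-time only, and a cast can still defeat it, which is the weakness Kris describes.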

<snip>

 I know the 'inout' problem with structs is somewhat of a sidetrack here;  
 but it
 is related, so was worth noting.

I agree, it's related, and is an important part of the overall soln IMO.

Regan

Feb 09 2005

Ben Hinkle <Ben_member pathlink.com> writes:

* C-o-W, as in http://www.digitalmars.com/d/phobos.html#string :
 When a function takes a string as a parameter, and returns a string, is
 that string the same as the input string, modified in place, or is it a
 modified copy of the input string? The D array convention is
 "copy-on-write". This means that if no modifications are done, the
 original string (or slices of it) can be returned. If any modifications
 are done, the returned string is a copy.


I should add that a dlint program could do some flow analysis for simple cases
and generate recommendations when COW is not being obeyed. For example if an
input to a function is changed in-place then dlint could flag that. It shouldn't
be an error since passing char[] as buffers to be filled is a common practise.
Dlint could also flag if a string literal is changed in-place. One problem with
such recommendations, though, is that they can be hard to track. I would expect
dlint to only do such flow analysis within a given function and passing a string
literal to another function as a buffer would be beyond such simple analysis.
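
The copy-on-write convention quoted above can be sketched in C as follows. This is an illustration under assumptions, not Phobos code: the function name to_lower_cow is invented, and ownership is signalled by pointer comparison because C has no garbage collector.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Copy-on-write by convention: if s has no upper-case letters, the
   original pointer is returned unchanged; otherwise a modified heap copy
   is returned. The caller compares the result with s to know whether it
   owns (and must free) the returned string. Returns NULL on allocation
   failure. */
const char *to_lower_cow(const char *s)
{
    size_t i = 0;
    while (s[i] != '\0' && !isupper((unsigned char)s[i]))
        i++;
    if (s[i] == '\0')
        return s;                 /* nothing to change: no copy made */

    size_t n = strlen(s);
    char *copy = malloc(n + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, n + 1);       /* the first write triggers the copy */
    for (; i < n; i++)
        copy[i] = (char)tolower((unsigned char)copy[i]);
    return copy;
}
```

Nothing stops a caller from casting away const and writing to the returned original, so this is still the honor system; a lint tool would have to flag exactly that kind of in-place write.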

Feb 07 2005

Kris <Kris_member pathlink.com> writes:

In article <cu7m7d$1qur$1 digitaldaemon.com>,
Anders F Björklund says...
Mango has some weird boolean flag instead, but that's OK too :-)
http://svn.dsource.org/svn/projects/mango/trunk/doc/html/classUString.html
(but I think the idea is that UText is immutable and UString is mutable)

Just to clarify: UText is immutable, and its subclass UString is /optionally/
mutable. The latter has a boolean flag to indicate whether it's safe to alias
the assigned content (where said content is already immutable). UString defaults
to assuming the content is mutable, and will therefore copy the assigned
content.

Note that UString can be passed in place of a UText argument, but not the other
way around.

I'm totally with you on this, Anders. Immutable objects, backed up by their
mutable variation, are the way to go for multi-threaded apps. Heck, any design
can benefit (in terms of robustness and determinism) from appropriate usage of
immutable objects.

Note, however, that D does not really support immutable arrays per se. For
example, one cannot declare a "const char[]" and expect the compiler to toss an
error wherever an assignment is made. Walter has been asked to provide support
for such notions (read-only variables) but there's been no movement on that as
yet.

Please note that read-only variables are not the same issue as the generic usage
of 'const' within C++!! That's a different ball-game altogether and much, much,
harder to implement.

- Kris

Feb 07 2005

Anders F Björklund <afb algonet.se> writes:

Kris wrote:

 I'm totally with you on this, Anders. Immutable objects, backed up by their
 mutable variation, are the way to go for multi-threaded apps. Heck, any design
 can benefit (in terms of robustness and determinism) from appropriate usage of
 immutable objects.

Passing around pointers tends to be a tad faster than copying, as well.

But the inheritance way is somewhat dangerous, when it comes to threads.
(there's always a risk the "immutable" copy is a mutable in disguise...)

 Note, however, that D does not really support immutable arrays per se. For
 example, one cannot declare a "const char[]" and expect the compiler to toss an
 error wherever an assignment is made. Walter has been asked to provide support
 for such notions (read-only variables) but there's been no movement on that as
 yet.

No, Copy-on-Write uses the honor system more than any actual checks ?

Read-only variables (similar to C++ "const") sound like a neat idea...
I'd settle for read-only string literals, as a small start towards it.

--anders

Feb 07 2005

Kris <Kris_member pathlink.com> writes:

In article <cu8cog$62m$1 digitaldaemon.com>,
Anders F Björklund says...
But the inheritance way is somewhat dangerous, when it comes to threads.
(there's always a risk the "immutable" copy is a mutable in disguise...)

Aye; the assertion is that the callee cannot modify the internals of the object
passed to it. It does not account for the case where the caller manipulates said
Object concurrently with the invocation of said callee. For that, /both/ parties
have to agree on immutability.

Luckily, the vast majority of cases fall into the former camp (in my
experience), so we can allow for some flexibility via the subclassing mechanism.
This tends to avoid the backlash that some have regarding Java strings (in terms
of object reconstruction for passing to a "I'm a safe procedure!" callee).

Of course, that's just my opinion :-)


 Note, however, that D does not really support immutable arrays per se. For
 example, one cannot declare a "const char[]" and expect the compiler to toss an
 error wherever an assignment is made. Walter has been asked to provide support
 for such notions (read-only variables) but there's been no movement on that as
 yet.

No, Copy-on-Write uses the honor system more than any actual checks ?

That's correct. Which smacks of total hypocrisy given Walter's recent 'urgent
claims' over how D protects the programmer from themselves :-) 


Read-only variables (similar to C++ "const") sound like a neat idea...
I'd settle for read-only string literals, as a small start towards it.

Agreed. I'd just like to see it done in an extensible and forward thinking
manner

- Kris

Feb 07 2005

Anders F Björklund <afb algonet.se> writes:

Kris wrote:

No, Copy-on-Write uses the honor system more than any actual checks ?

 
 That's correct. Which smacks of total hypocrisy given Walter's recent 'urgent
 claims' over how D protects the programmer from themselves :-) 

I thought it was Java that was the PG-13 version, and that D *let* you
do adult stuff like goto or asm or pointers or mixing ints and booleans

:-D

Read-only variables (similar to C++ "const") sound like a neat idea...
I'd settle for read-only string literals, as a small start towards it.

 
 Agreed. I'd just like to see it done in an extensible and forward thinking
 manner

With implicit casts from char[] to (char*), I wouldn't hold my breath ?
Just want everyone's Windows literals to crash, like my Mac ones do. :-)

In your case, I'd just warm up my "return contents.dup;" workarounds...

--anders

Feb 07 2005

Kris <Kris_member pathlink.com> writes:

In article <cu8fvf$csa$1 digitaldaemon.com>,
Anders F Björklund says...
In your case, I'd just warm up my "return contents.dup;" workarounds...


AieeeeeeeEEEEE!! Never! Nooooooo; Noooooo! I'd rather chop down a tree with, a
Herring!


Truthfully, I don't want Mango libraries getting a reputation for being
inefficient, just to work around a glaring omission within an alpha language.
One would hope Walter will recognize the validity of read-only vars, and do
something about that instead.


(keep the pressure on!)

Feb 07 2005

Anders F Björklund <afb algonet.se> writes:

Kris wrote:

 AieeeeeeeEEEEE!! Never! Nooooooo; Noooooo!
 I'd rather chop down a tree with, a Herring!

Now you're starting to sound like those people that refuse to recognize
that false and 0 and null are the same thing ? They made the same kinds
of noises, when it was "decided" on the C/integer logic that D uses. :-)
(but at least one *can* use bool and true and false, and make-believe ?)

 Truthfully, I don't want Mango libraries getting a reputation for being
 inefficient, just to work around a glaring omission within an alpha language.
 One would hope Walter will recognize the validity of read-only vars, and do
 something about that instead.

Actually, to keep within the D spirit you should probably pass around
char[] like everyone else instead of this "wchar[]-in-a-Class" stuff ;-)
Never mind that Phobos only has library support for ASCII strings...
(as in, when it comes to things like strlen and toupper and whatnot)

--anders

Feb 07 2005

"Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:

"Anders F Björklund" <afb algonet.se> wrote in message 
news:cu8jvo$jto$1 digitaldaemon.com...
 Kris wrote:

 AieeeeeeeEEEEE!! Never! Nooooooo; Noooooo!
 I'd rather chop down a tree with, a Herring!

 Now you're starting to sound like those people that refuse to 
 recognize
 that false and 0 and null are the same thing ?

Well, if you'll turn to page 227 in your copy of Imperfect C++ you'll 
see that NULL need not be the same as 0, and the usefulness of that. <g>

Feb 08 2005

Anders F Björklund <afb algonet.se> writes:

Matthew wrote:

Now you're starting to sound like those people that refuse to 
recognize that false and 0 and null are the same thing ?

 
 Well, if you'll turn to page 227 in your copy of Imperfect C++ you'll 
 see that NULL need not be the same as 0, and the usefulness of that. <g>

Like you go on about that book of yours I will soon have to buy it ;-)

Walter has explained that they all mean "low voltage" or "open gate"
(I forgot which one it was, could have been empty radio tube or no rock)

So there is no need for D to separate between them, for things like ifs?
Which means to write (!object) instead of the bulky (!(object is null))

YIN: if(false), if(0), if(null)
YANG: if(true), if(1), if(this)

Whether or not I think it's a good idea doesn't matter, since it
doesn't seem to be changing... And "since it worked for C / C++"

--anders

Feb 08 2005

"Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:

"Anders F Björklund" <afb algonet.se> wrote in message 
news:cub8hb$kle$1 digitaldaemon.com...
 Matthew wrote:

Now you're starting to sound like those people that refuse to 
recognize that false and 0 and null are the same thing ?

 Well, if you'll turn to page 227 in your copy of Imperfect C++ you'll 
 see that NULL need not be the same as 0, and the usefulness of that. 
 <g>

 Like you go on about that book of yours I will soon have to buy it ;-)

Aha! So it's working then ...

 Walter has explained that they all mean "low voltage" or "open gate"
 (I forgot which one it was, could have been empty radio tube or no 
 rock)

 So there is no need for D to separate between them, for things like 
 ifs?
 Which means to write (!object) instead of the bulky (!(object is 
 null))

 YIN: if(false), if(0), if(null)
 YANG: if(true), if(1), if(this)

 Whether or not I think it's a good idea doesn't matter, since it
 doesn't seem to be changing... And "since it worked for C / C++"

Well, I know that the no-implicitly-boolean sub-expressions is never 
gonna fly, so I won't bother to explain why I don't use 'em. (That's 
also in the book <g>)

Feb 08 2005

Anders F Björklund <afb algonet.se> writes:

Matthew wrote:

Like you go on about that book of yours I will soon have to buy it ;-)

 
 Aha! So it's working then ...

Even if I don't use C++ it's a perfectly legitimate business expense :-)

Whether or not I think it's a good idea doesn't matter, since it
doesn't seem to be changing... And "since it worked for C / C++"

 
 Well, I know that the no-implicitly-boolean sub-expressions is never 
 gonna fly, so I won't bother to explain why I don't use 'em. (That's 
 also in the book <g>)

I can't say I'm crazy about going back to writing old-school "C"
boolean expressions again; but then again if (!(object is null))
is going to be a real eye-sore, now that (object !== null) seems
to have been deprecated due to being confused with regular !=

And since "isnot" probably won't fly either, then that leaves:
assert(object);


And of course then there is the need to use wbit and dbit,
when writing a) things that need pointers or b) overloads:

wbit[] array = new wbit[8192];
wbit* p = &array[42]; *p = true;

dbit opEquals(Object o);

They could even be faster than the regular old bit type, since
they avoid the masking and shifting the other one could need ?


Probably even more fun for beginners than the string types,
char[] wchar[] dchar[]

But I guess "alias char[] str; alias wchar[] ustr;" could
be made to work, just as "alias bit bool;" have already ?


The "bool" (boolean) implementation details of bit/wbit/dbit
and "str" (string) of char[]/wchar[]/dchar[] could be saved
for the more advanced D tutorials, when the first ones get nasty:

- "What do you mean I can't slice my bool arrays how I want ?"
- "What do you mean with I must use dchar to foreach my str ?"

At least they have a common theme: Zero-is-False and Unicode.

--anders

PS. I changed my old suggestion of "string", since C++ people
     keep mixing that alias up with the old std::string class.
     The new name, in spirit of char and int, is just: "str".

     int main(str[] args);
     void main();

Feb 08 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Mon, 7 Feb 2005 18:15:13 +0000 (UTC), Kris <Kris_member pathlink.com>  
wrote:

<snip>

 Note, however, that D does not really support immutable arrays per se.  
 For example, one cannot declare a "const char[]" and expect the compiler  
 to toss an error wherever an assignment is made. Walter has been asked  
 to provide support for such notions (read-only variables) but there's  
 been no movement on that as yet.

Last time I checked "const char[]" made the char[] 'reference' const, not  
the contents of the array.

I agree some mechanism for specifying const data would be useful.

In my experience linux and windows treat static strings differently, just  
the other day I found a bug caused by writing to a static string, it was  
working fine on windows :)

Regan

Feb 07 2005

"Ben Hinkle" <bhinkle mathworks.com> writes:

"Regan Heath" <regan netwin.co.nz> wrote in message 
news:opslugtqx523k2f5 ally...
 On Mon, 7 Feb 2005 18:15:13 +0000 (UTC), Kris <Kris_member pathlink.com> 
 wrote:

 <snip>

 Note, however, that D does not really support immutable arrays per se. 
 For example, one cannot declare a "const char[]" and expect the compiler 
 to toss an error wherever an assignment is made. Walter has been asked 
 to provide support for such notions (read-only variables) but there's 
 been no movement on that as yet.

 Last time I checked "const char[]" made the char[] 'reference' const, not 
 the contents of the array.

I thought that would be "const char * const".  Putting const after the ptr 
makes the ptr const and putting it before makes the contents const. I love 
C++ member functions that look like
 const char * foo(const char * const x) const;
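
A small C sketch of which const binds to what in a parameter like the one above (the trailing member-function const is C++-only and left out). The function name count_char is invented for the example.

```c
#include <stddef.h>

/* Counts occurrences of c in s. s is a const pointer to const chars:
   neither the pointer nor the characters may be modified in the body. */
size_t count_char(const char *const s, char c)
{
    size_t n = 0;
    for (const char *p = s; *p != '\0'; p++)   /* p itself may move */
        if (*p == c)
            n++;
    /* s++;       error: the pointer is const (const after the *)    */
    /* s[0] = c;  error: the contents are const (const before the *) */
    return n;
}
```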

 I agree some mechanism for specifying const data would be useful.

 In my experience linux and windows treat static strings differently, just 
 the other day I found a bug caused by writing to a static string, it was 
 working fine on windows :)

 Regan

Feb 07 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Mon, 7 Feb 2005 15:55:11 -0500, Ben Hinkle <bhinkle mathworks.com>  
wrote:
 "Regan Heath" <regan netwin.co.nz> wrote in message
 news:opslugtqx523k2f5 ally...
 On Mon, 7 Feb 2005 18:15:13 +0000 (UTC), Kris <Kris_member pathlink.com>
 wrote:

 <snip>

 Note, however, that D does not really support immutable arrays per se.
 For example, one cannot declare a "const char[]" and expect the  
 compiler
 to toss an error wherever an assignment is made. Walter has been asked
 to provide support for such notions (read-only variables) but there's
 been no movement on that as yet.

 Last time I checked "const char[]" made the char[] 'reference' const,  
 not
 the contents of the array.

 I thought that would be "const char * const".  Putting const after the  
 ptr
 makes the ptr const and putting it before makes the contents const.

I'm not sure how it works with pointers in D, in the spirit of leaving  
them behind where at all possible I've never tried it :)

So how would the syntax look for a char[]?

const char[] foo = "this is a const reference";
char[] const foo = "this is const data";
const char[] const foo = "this is immutable";

Is there any point to "char[] const foo" i.e. the data is immutable but  
the reference may be changed.. wouldn't that mean it was possible to  
'lose' track of where the const data was? would it then be collected by  
the GC, or should const data hang round till program termination?

 I love
 C++ member functions that look like
  const char * foo(const char * const x) const;

Do I detect a hint of sarcasm there... :)

Regan

Feb 07 2005

"Ben Hinkle" <bhinkle mathworks.com> writes:

"Regan Heath" <regan netwin.co.nz> wrote in message 
news:opsluj1ea523k2f5 ally...
 On Mon, 7 Feb 2005 15:55:11 -0500, Ben Hinkle <bhinkle mathworks.com> 
 wrote:
 "Regan Heath" <regan netwin.co.nz> wrote in message
 news:opslugtqx523k2f5 ally...
 On Mon, 7 Feb 2005 18:15:13 +0000 (UTC), Kris <Kris_member pathlink.com>
 wrote:

 <snip>

 Note, however, that D does not really support immutable arrays per se.
 For example, one cannot declare a "const char[]" and expect the 
 compiler
 to toss an error wherever an assignment is made. Walter has been asked
 to provide support for such notions (read-only variables) but there's
 been no movement on that as yet.

 Last time I checked "const char[]" made the char[] 'reference' const, 
 not
 the contents of the array.

 I thought that would be "const char * const".  Putting const after the 
 ptr
 makes the ptr const and putting it before makes the contents const.

 I'm not sure how it works with pointers in D, in the spirit of leaving 
 them behind where at all possible I've never tried it :)

 So how would the syntax look for a char[]?

 const char[] foo = "this is a const reference";
 char[] const foo = "this is const data";
 const char[] const foo = "this is immutable";

 Is there any point to "char[] const foo" i.e. the data is immutable but 
 the reference may be changed.. wouldn't that mean it was possible to 
 'lose' track of where the const data was? would it then be collected by 
 the GC, or should const data hang round till program termination?

Sorry about the * vs [] - they should be the same in the C/C++ world. I'm 
just more used to writing const char * than const char[] in C++. That whole 
extra character wears out my pinky. Maybe I need to work out some more :-)

 I love
 C++ member functions that look like
  const char * foo(const char * const x) const;

 Do I detect a hint of sarcasm there... :)

guilty. In some cases D's in/out invariants are better than const since it's 
obvious what they mean and in/out invariants can check for more complex 
invariants than just const-ness. The downside is that they are applied at 
run-time and they don't apply within a function body.

 Regan

Feb 07 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Mon, 7 Feb 2005 17:11:03 -0500, Ben Hinkle <bhinkle mathworks.com>  
wrote:
 "Regan Heath" <regan netwin.co.nz> wrote in message
 I love
 C++ member functions that look like
  const char * foo(const char * const x) const;

 Do I detect a hint of sarcasm there... :)

 guilty. In some cases D's in/out invariants are better than const since  
 it's
 obvious what they mean and in/out invariants can check for more complex
 invariants than just const-ness. The downside is that they are applied at
 run-time and they don't apply within a function body.

I assume you're referring to in/out contracts rather than in/out parameter  
specifiers.

I'd prefer if const-ness was enforced by the parameter specifiers eg.

void foo(in int a, out int b, inout int c) {
   a = 5; // error;
   b = 5; // ok;
   c = 5; // ok;
}

void main() {
   char[] const a = "a";
   char[] b = "b";

   foo(a,b,b); // ok.
   foo(a,a,b); // error 'a' is const.
   foo(a,b,a); // error 'a' is const.
}

in other words the 'in' parameter specifier is a contract stating I will  
not modify this reference or its data. or perhaps we need to separate  
those two, leading us to:

void foo(in int in a, ..)

yuck.
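
For comparison, C can only approximate these specifiers with pointer types. A rough sketch, assuming 'in' maps to pointer-to-const and 'out'/'inout' map to plain pointers; the function name fill is made up.

```c
/* a: 'in'    -> pointer to const, read-only inside the function
   b: 'out'   -> written without being read first
   c: 'inout' -> read and then updated                            */
void fill(const int *a, int *b, int *c)
{
    /* *a = 5;   would not compile: a points to const int */
    *b = *a + 1;
    *c += *a;
}
```

Passing the address of a const int for b or c draws a diagnostic from the compiler, which is the effect asked for above. Unlike D's 'out', C does not initialise b on entry.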

Regan

Feb 07 2005

"Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:

To summarise this issue, am I correct in saying that:

    writing to a slice may fail if somewhere along the way, through N 
slicings, the original string is a literal
    this failure occurs on Linux, because GDC puts literals in a 
read-only segment. It does _not_ fail on Win32 because they're in a 
writeable segment

Questions:

    does the language prescribe the Linux behaviour or the Win32 
behaviour? (It cannot leave it undefined, since D does not have 
undefineds)

Intermediate measures:

    make the Win32 compiler put in read-only segment

Possible solutions are

    leave it as is, with consequence of buyer beware, loss of the 
wonderful efficiency of slices, etc. etc. :-(
    make literals be 'const' somehow. This is likely to be a huge change 
to the language, and take us down the const road, which we know's been 
ruled out anyway.
    have it part of the language that literals are writeable. That would 
lead to only a tiny decrease in efficiency as people would need to dup 
their literals before passing them into functions which may alter them.
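
The "dup before writing" option can be sketched in C by modelling a D slice as a pointer plus a length. This is an illustration only; struct Slice, slice_of and slice_dup are hypothetical names, not D runtime functions.

```c
#include <stdlib.h>
#include <string.h>

/* A D-style slice modelled in C: a pointer plus a length, not owning. */
struct Slice {
    const char *ptr;
    size_t len;
};

/* Takes the sub-range [from, to) of s without copying.
   The caller must ensure from <= to <= s.len. */
struct Slice slice_of(struct Slice s, size_t from, size_t to)
{
    struct Slice r = { s.ptr + from, to - from };
    return r;
}

/* "dup": returns a writable, NUL-terminated heap copy of the slice
   (caller frees), or NULL on allocation failure. */
char *slice_dup(struct Slice s)
{
    char *copy = malloc(s.len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s.ptr, s.len);
    copy[s.len] = '\0';
    return copy;
}
```

Slicing stays free however many times it is repeated; the copy is paid once, at the point where someone actually wants to write, and the literal underneath is never touched.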



IMO, the following needs to happen:

    1. Determine whether the Win32 behaviour or the Linux behaviour is 
non-standard.
    2. If it's the Win32 behaviour, then it should be amended to have 
the crashes like the Linux does, so we get a feel for what this problem 
is like

I think the solution, weird as it sounds, is to make literals 
writeable. This, of course, depends on whether literals are folded (i.e. 
the same literal in two separate places in code actually refer to the 
same bit of memory after compilation/linking). Since literals are 
generally a bad thing, we should not be using them often. Given that, 
maybe we can salvage this situation by saying that literals are *not* 
folded and can be written to.

Sounds like heresy, I know, but IMO we cannot have such fundamental 
differences between platforms, we cannot have code that may or may not 
crash depending on whether the thing way up the call stack is a literal 
or not, and we *should not* lose the marvellous efficiency afforded by 
slices. (Slices are the best thing since sliced bread.)

Thoughts?

Cheers

Matthew

Feb 08 2005

pragma <pragma_member pathlink.com> writes:

In article <cub6in$i6d$1 digitaldaemon.com>, Matthew says...
IMO, the following needs to happen:

    1. Determine whether the Win32 behaviour or the Linux behaviour is 
non-standard.
    2. If it's the Win32 behaviour, then it should be amended to have 
the crashes like the Linux does, so we get a feel for what this problem 
is like

I think the solution, weird as it sounds, is to make literals 
writeable. This, of course, depends on whether literals are folded (i.e. 
the same literal in two separate places in code actually refer to the 
same bit of memory after compilation/linking). Since literals are 
generally a bad thing, we should not be using them often. Given that, 
maybe we can salvage this situation by saying that literals are *not* 
folded and can be written to.

Sounds like heresy, I know, but IMO we cannot have such fundamental 
differences between platforms, we cannot have code that may or may not 
crash depending on whether the thing way up the call stack is a literal 
or not, and we *should not* lose the marvellous efficiency afforded by 
slices. (Slices are the best thing since sliced bread.)

Thoughts?

A few. ;)

I'm with you in that a decision needs to be made to keep GDC in step with DMD.
As to "which platform has the bug?", I don't know quite yet.  

I find myself leaning toward requiring literals to be given a 'const char[]'
style type so that they're obviously read only in the language.  This does have
the nasty side-effect of requiring both a cast *and* a dup.

 // Look ma, it's C++ warmed over!
 const char[] literal = "literal string";
 char[] ugliness = cast(char[])literal.dup();

.. plus it's not 100% typesafe since we're allowing 'const' to be cast away.

A better solution would be to still require 'const' for literal assignment, but
allow for two additional properties: ".mutable" and ".immutable" to supply the
means to work *with* the const-ness applied to the type.

 const char[] literal = "literal string";
 char[] not_so_ugly = literal.mutable;
 const char[] literal2 = not_so_ugly.immutable; // gets a copy

. where mutable() and immutable() perform an implied 'dup()' (where needed) and
return the const-ness one would expect.

Of course the easiest solution would be to get GDC to stick literals in the
writable data segment, like DMD does, per your suggestion.  Then it's back to
'programmer beware' which may not be that bad a situation.  

I'll add that I've already developed habits to avoid trouble with writing to
slices.  Typically, if I'm not sure if I'm using a slice or not, I just call
dup() to be sure... it keeps things sane that way.

- EricAnderton at yahoo

Feb 08 2005

"Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:

"pragma" <pragma_member pathlink.com> wrote in message
news:cub8af$kf4$1 digitaldaemon.com...
 In article <cub6in$i6d$1 digitaldaemon.com>, Matthew says...
IMO, the following needs to happen:

    1. Determine whether the Win32 behaviour or the Linux behaviour is
non-standard.
    2. If it's the Win32 behaviour, then it should be amended to have
the crashes like the Linux does, so we get a feel for what this
problem
is like

I think the solution, weird as it sounds, is to make literals
writeable. This, of course, depends on whether literals are folded
(i.e.
the same literal in two separate places in code actually refer to the
same bit of memory after compilation/linking). Since literals are
generally a bad thing, we should not be using them often. Given that,
maybe we can salvage this situation by saying that literals are *not*
folded and can be written to.

Sounds like heresy, I know, but IMO we cannot have such fundamental
differences between platforms, we cannot have code that may or may not
crash depending on whether the thing way up the call stack is a
literal
or not, and we *should not* lose the marvellous efficiency afforded by
slices. (Slices are the best thing since sliced bread.)

Thoughts?

 A few. ;)

 I'm with you in that a decision needs to be made to keep GDC in step
 with DMD.
 As to "which platform has the bug?", I don't know quite yet.

 I find myself leaning toward requiring literals to be given a 'const
 char[]'
 style type so that they're obviously read only in the language.  This
 does have
 the nasty side-effect of requiring both a cast *and* a dup.

 // Look ma, it's C++ warmed over!
 const char[] literal = "literal string";
 char[] ugliness = cast(char[])literal.dup();

 .. plus it's not 100% typesafe since we're allowing 'const' to be cast
 away.

 A better solution would be to still require 'const' for literal
 assignment, but
 allow for two additional properties: ".mutable" and ".immutable" to
 supply the
 means to work *with* the const-ness applied to the type.

 const char[] literal = "literal string";
 char[] not_so_ugly = literal.mutable;
 const char[] literal2 = not_so_ugly.immutable; // gets a copy

 . where mutable() and immutable() perform an implied 'dup()' (where
 needed) and
 return the const-ness one would expect.

 Of course the easiest solution would be to get GDC to stick literals
 in the
 writable data segment, like DMD does, per your suggestion.  Then it's
 back to
 'programmer beware' which may not be that bad a situation.

Yes. But the important point is that it's 'literal writer beware'. It's
precisely because literals are best not (over)used anyway that I suggest
this is the reasonable tactic. All the other alternatives - status quo,
crashes, const, even your somewhat elegant mutable/immutable - either
suck, or are complex, or would require huge changes.

Walter, what are your thoughts on this?

Feb 08 2005

Kris <Kris_member pathlink.com> writes:

In article <cub9lc$mbe$3 digitaldaemon.com>, Matthew says...
"pragma" <pragma_member pathlink.com> wrote in message
news:cub8af$kf4$1 digitaldaemon.com...
 In article <cub6in$i6d$1 digitaldaemon.com>, Matthew says...
IMO, the following needs to happen:

    1. Determine whether the Win32 behaviour or the Linux behaviour is
non-standard.
    2. If it's the Win32 behaviour, then it should be amended to have
the crashes like the Linux does, so we get a feel for what this
problem
is like

I think the solution, weird as it sounds, is to make literals
writeable. This, of course, depends on whether literals are folded
(i.e.
the same literal in two separate places in code actually refer to the
same bit of memory after compilation/linking). Since literals are
generally a bad thing, we should not be using them often. Given that,
maybe we can salvage this situation by saying that literals are *not*
folded and can be written to.

Sounds like heresy, I know, but IMO we cannot have such fundamental
differences between platforms, we cannot have code that may or may not
crash depending on whether the thing way up the call stack is a
literal
or not, and we *should not* lose the marvellous efficiency afforded by
slices. (Slices are the best thing since sliced bread.)

Thoughts?

 A few. ;)

 I'm with you in that a decision needs to be made to keep GDC in step
 with DMD.
 As to "which platform has the bug?", I don't know quite yet.

 I find myself leaning toward requiring literals to be given a 'const
 char[]'
 style type so that they're obviously read only in the language.  This
 does have
 the nasty side-effect of requiring both a cast *and* a dup.

 // Look ma, it's C++ warmed over!
 const char[] literal = "literal string";
 char[] ugliness = cast(char[])literal.dup();

 .. plus it's not 100% typesafe since we're allowing 'const' to be cast
 away.

 A better solution would be to still require 'const' for literal
 assignment, but
 allow for two additional properties: ".mutable" and ".immutable" to
 supply the
 means to work *with* the const-ness applied to the type.

 const char[] literal = "literal string";
 char[] not_so_ugly = literal.mutable;
 const char[] literal2 = not_so_ugly.immutable; // gets a copy

 . where mutable() and immutable() perform an implied 'dup()' (where
 needed) and
 return the const-ness one would expect.

 Of course the easiest solution would be to get GDC to stick literals
 in the
 writable data segment, like DMD does, per your suggestion.  Then it's
 back to
 'programmer beware' which may not be that bad a situation.

Yes. But the important point is that it's 'literal writer beware'. It's
precisely because literals are best not (over)used anyway that I suggest
this is the reasonable tactic. All the other alternatives - status quo,
crashes, const, even your somewhat elegant mutable/immutable - either
suck, or are complex, or would require huge changes.

Walter, what are your thoughts on this?


The concept of read-only data is a powerful one. It's borne out in practice
through the usage of immutable objects. It's also been the backbone for placing
reference data (think "fonts", "messages", "data structures" etc) into ROM since
before many here were born.

Thus, I truly believe D needs to support read-only data; the ICU wrappers could
really make use of this, for example (to avoid .dup all over the place).
Concurrent programming techniques, as I'm sure you will attest, Matthew, cry
out for the ability to enforce immutable/read-only/reference status.

Note, however, that this is /not/ the same as the much-maligned, all singing,
all dancing, C++ const!

If the "readonly" attribute were present, D might imply all literals as
readonly.

So what about writing to a string literal? Hands up all those who regularly
write to a string literal? How many times, per year, do you do that? I don't
expect there would be too many. 

If one wishes to modify a pre-populated array of chars, then one can do it like
so: 

char[] label = ['m', 'y', ' ', 'l', 'a', 'b', 'e', 'l'];

or 

char[] _label = "my label";
char[] label = _label.dup;

Given the available options, my opinion would be to retain the read-only status of
literals (make DMD the same as GDC), and then introduce a readonly status for
data; one that avoids all the complexity of the C++ const mega-notion.
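
As a minimal sketch of the two options above, assuming a later D compiler in
which string literals are already typed read-only (written `string` below; that
typing is an assumption, not what DMD does at the time of this thread). The
function names are made up purely for illustration:

```d
import std.stdio;

// Option 1: build a modifiable label from individual characters.
char[] labelFromChars()
{
    char[] label = ['m', 'y', ' ', 'l', 'a', 'b', 'e', 'l'];
    return label;
}

// Option 2: take a private, writable copy of a read-only literal.
char[] labelFromLiteral()
{
    string _label = "my label"; // read-only view of the literal
    char[] label = _label.dup;  // fresh copy that we own
    return label;
}

void main()
{
    auto a = labelFromChars();
    auto b = labelFromLiteral();
    a[0] = 'M'; // fine: the array literal allocated a fresh array
    b[0] = 'M'; // fine: we modify our own copy, never the literal
    writeln(a);
    writeln(b);
}
```

Either way the literal itself is never written to, so it would not matter
whether the compiler places it in a read-only segment.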

my 2C

- Kris

Feb 08 2005

"Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:

"Kris" <Kris_member pathlink.com> wrote in message 
news:cubcuu$qd2$1 digitaldaemon.com...
 In article <cub9lc$mbe$3 digitaldaemon.com>, Matthew says...
"pragma" <pragma_member pathlink.com> wrote in message
news:cub8af$kf4$1 digitaldaemon.com...
 In article <cub6in$i6d$1 digitaldaemon.com>, Matthew says...
IMO, the following needs to happen:

    1. Determine whether the Win32 behaviour or the Linux behaviour 
 is
non-standard.
    2. If it's the Win32 behaviour, then it should be amended to 
 have
the crashes like the Linux does, so we get a feel for what this
problem
is like

I think the solution, weird as it sounds, is to make literals
writeable. This, of course, depends on whether literals are folded
(i.e.
the same literal in two separate places in code actually refer to 
the
same bit of memory after compilation/linking). Since literals are
generally a bad thing, we should not be using them often. Given 
that,
maybe we can salvage this situation by saying that literals are 
*not*
folded and can be written to.

Sounds like heresy, I know, but IMO we cannot have such fundamental
differences between platforms, we cannot have code that may or may 
not
crash depending on whether the thing way up the call stack is a
literal
or not, and we *should not* lose the marvellous efficiency afforded 
by
slices. (Slices are the best thing since sliced bread.)

Thoughts?

 A few. ;)

 I'm with you in that a decision needs to be made to keep GDC in step
 with DMD.
 As to "which platform has the bug?", I don't know quite yet.

 I find myself leaning toward requiring literals to be given a 'const
 char[]'
 style type so that they're obviously read only in the language. 
 This
 does have
 the nasty side-effect of requiring both a cast *and* a dup.

 // Look ma, it's C++ warmed over!
 const char[] literal = "literal string";
 char[] ugliness = cast(char[])literal.dup();

 .. plus it's not 100% typesafe since we're allowing 'const' to be 
 cast
 away.

 A better solution would be to still require 'const' for literal
 assignment, but
 allow for two additional properties: ".mutable" and ".immutable" to
 supply the
 means to work *with* the const-ness applied to the type.

 const char[] literal = "literal string";
 char[] not_so_ugly = literal.mutable;
 const char[] literal2 = not_so_ugly.immutable; // gets a copy

 . where mutable() and immutable() perform an implied 'dup()' (where
 needed) and
 return the const-ness one would expect.

 Of course the easiest solution would be to get GDC to stick literals
 in the
 writable data segment, like DMD does, per your suggestion.  Then it's
 back to
 'programmer beware' which may not be that bad a situation.

Yes. But the important point is that it's 'literal writer beware'. 
It's
precisely because literals are best not (over)used anyway that I 
suggest
this is the reasonable tactic. All the other alternatives - status 
quo,
crashes, const, even your somewhat elegant mutable/immutable - either
suck, or are complex, or would require huge changes.

Walter, what are your thoughts on this?


 The concept of read-only data is a powerful one. It's borne out in 
 practice
 through the usage of immutable objects. It's also been the backbone 
 for placing
 reference data (think "fonts", "messages", "data structures" etc) into 
 ROM since
 before many here were born.

 Thus, I truly believe D needs to support read-only data; the ICU 
 wrappers could
 really make use of this, for example (to avoid .dup all over the 
 place).
 Concurrent programming techniques, as I'm sure you will attest, 
 Matthew, cry
 out for the ability to enforce immutable/read-only/reference status.

Indeed. I've just submitted my next "Flexible C++" instalment on that 
very issue (and the dangers of "logical constness"). ;)

 Note, however, that this is /not/ the same as the much-maligned, all 
 singing,
 all dancing, C++ const!

Agreed.

 If the "readonly" attribute were present, D might imply all literals 
 as
 readonly.

'const' should _always_ have been called readonly, IMO.

And, yes, I've wanted a readonly in D for some years, along with many 
others.

 So what about writing to a string literal? Hands up all those who 
 regularly
 write to a string literal? How many times, per year, do you do that? I 
 don't
 expect there would be too many.

I _occasionally_ do so when I'm feeling lazy, although it's much more 
often that I do something like the following:

    char_type drv[] = { '?', ':', '\\', '\0' };

    drv[0] = 'A' + drive_index;

btw, this technique of declaring a C-string with aggregate syntax, 
rather than a literal, is very useful for writing multi-char-encoding 
templates, since it works just as well with wchar_t as char.

 If one wishes to modify a pre-populated array of chars, then one can 
 do it like
 so:

 char[] label = ['m', 'y', ' ', 'l', 'a', 'b', 'e', 'l'];

 or

 char[] _label = "my label";
 char[] label = _label.dup;

 Given the available options, my opinion would be to retain the read-only 
 status of
 literals (make DMD the same as GDC), and then introduce a readonly 
 status for
 data; one that avoids all the complexity of the C++ const mega-notion.

So you mean:
    - on Win32 (and *all other platforms* that support it) literals go 
in a read-only segment
    - we have a readonly keyword
    - literals are implicitly readonly.
    - one cannot slice from a readonly to a non-readonly, only .dup

That'd mean your code above would not compile. Rather it'd have to be:

    readonly char[] _label = "my label";
    char[] label = _label.dup;

or

    char[] label = "my label".dup;
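
A minimal sketch of how those rules would play out. There is no readonly
keyword, so `const(char)[]` from later versions of D is used here as a stand-in
for the proposed `readonly char[]`; that mapping, the alias and the function
name are assumptions for illustration only:

```d
// Stand-in for the proposed 'readonly char[]' (an assumption, not the proposal).
alias ReadonlyChars = const(char)[];

char[] makeLabel()
{
    ReadonlyChars _label = "my label";

    // A slice of read-only data is itself read-only: it does not
    // convert implicitly to a writable char[] ...
    static assert(!is(typeof(_label[0 .. 2]) : char[]));

    // ... but an explicit .dup yields a writable copy.
    char[] label = _label.dup;
    label[0] = 'M';
    return label;
}

void main()
{
    assert(makeLabel() == "My label");

    // The one-line form: copy the literal directly.
    char[] direct = "my label".dup;
    direct[0] = 'M';
    assert(direct == "My label");
}
```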


I can buy that. Indeed, I think I like it. The downside is that it 
requires a new keyword - and we know how popular that's going to be! - 
but the complexity involved seems moderate.

We just need to find out what Walter thinks.

Feb 08 2005

Kris <Kris_member pathlink.com> writes:

In article <cube3u$rh8$1 digitaldaemon.com>, Matthew says...
I can buy that. Indeed, I think I like it. The downside is that it 
requires a new keyword - and we know how popular that's going to be! - 
but the complexity involved seems moderate.

As far as D goes, "const" and "readonly" should be interchangeable. Const int
and const char, for example, are read-only ...

Thus, rather than introduce another keyword, could we not use "const" rather
than "readonly" for arrays also?

- Kris

Feb 08 2005

"Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:

"Kris" <Kris_member pathlink.com> wrote in message
news:cubfpo$t46$1 digitaldaemon.com...
 In article <cube3u$rh8$1 digitaldaemon.com>, Matthew says...
I can buy that. Indeed, I think I like it. The downside is that it
requires a new keyword - and we know how popular that's going to be! -
but the complexity involved seems moderate.

 As far as D goes, "const" and "readonly" should be interchangeable.
 Const int
 and const char, for example, are read-only ...

 Thus, rather than introduce another keyword, could we not use "const"
 rather
 than "readonly" for arrays also?

We could, but I think it'd be nicer to keep const for constants.

Either way, though, the issue is a significant new bit of language. It 
is to that that I anticipate objections.

Feb 08 2005

John Reimer <brk_6502 yahoo.com> writes:

Kris wrote:
 In article <cube3u$rh8$1 digitaldaemon.com>, Matthew says...
 
I can buy that. Indeed, I think I like it. The downside is that it 
requires a new keyword - and we know how popular that's going to be! - 
but the complexity involved seems moderate.

 
 
 As far as D goes, "const" and "readonly" should be interchangeable. Const int
 and const char, for example, are read-only ...
 
 Thus, rather than introduce another keyword, could we not use "const" rather
 than "readonly" for arrays also?
 
 - Kris
 
 
 

I really, really like the idea of a "readonly" keyword.  It seems so 
clear.  The only disadvantage is that it caters to the English language 
programmers (but most of the keywords do anyway).  At least it's not as 
bad as Pascal.

Feb 08 2005

"Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:

"John Reimer" <brk_6502 yahoo.com> wrote in message 
news:cubgm6$tjn$1 digitaldaemon.com...
 Kris wrote:
 In article <cube3u$rh8$1 digitaldaemon.com>, Matthew says...

I can buy that. Indeed, I think I like it. The downside is that it 
requires a new keyword - and we know how popular that's going to 
be! - but the complexity involved seems moderate.


 As far as D goes, "const" and "readonly" should be interchangeable. 
 Const int
 and const char, for example, are read-only ...

 Thus, rather than introduce another keyword, could we not use "const" 
 rather
 than "readonly" for arrays also?

 - Kris

 I really, really like the idea of a "readonly" keyword.  It seems so 
 clear.  The only disadvantage is that it caters to the English 
 language programmers (but most of the keywords do anyway).  At least 
 it's not as bad as Pascal.

Also, Walter hates const, so if we can divorce him from that 
emotionally, even a little, we might stand a chance of getting our 
(thoroughly technically worthy) point across.

Anyway, whether it's const or readonly is somewhat irrelevant at this 
point. May I suggest that we use readonly in our code nuggets for the 
remainder of the debate, so as to keep the issue as clear (and 
demonstrably 'new') as possible? If it resolves to using the const 
keyword if/when it gets accepted, so be it.

Feb 08 2005

"Alex Stevenson" <ans104 cs.york.ac.uk> writes:

On Wed, 9 Feb 2005 10:09:25 +1100, Matthew  
<admin stlsoft.dot.dot.dot.dot.org> wrote:

 "John Reimer" <brk_6502 yahoo.com> wrote in message
 news:cubgm6$tjn$1 digitaldaemon.com...
 Kris wrote:
 In article <cube3u$rh8$1 digitaldaemon.com>, Matthew says...

 I can buy that. Indeed, I think I like it. The downside is that it
 requires a new keyword - and we know how popular that's going to
 be! - but the complexity involved seems moderate.


 As far as D goes, "const" and "readonly" should be interchangeable.
 Const int
 and const char, for example, are read-only ...

 Thus, rather than introduce another keyword, could we not use "const"
 rather
 than "readonly" for arrays also?

 - Kris

 I really, really like the idea of a "readonly" keyword.  It seems so
 clear.  The only disadvantage is that it caters to the English
 language programmers (but most of the keywords do anyway).  At least
 it's not as bad as Pascal.

 Also, Walter hates const, so if we can divorce him from that
 emotionally, even a little, we might stand a chance of getting our
 (thoroughly technically worthy) point across.

 Anyway, whether it's const or readonly is somewhat irrelevant at this
 point. May I suggest that we use readonly in our code nuggets for the
 remainder of the debate, so as to keep the issue as clear (and
 demonstrably 'new') as possible? If it resolves to using the const
 keyword if/when it gets accepted, so be it

I hate to muddy the water still further, but how about the final keyword?  
AFAIK it only currently has meaning for class methods... Though one  
keyword doing double duty smells a little bad (static in C anyone?)

-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/m2/

Feb 08 2005

Anders F Björklund <afb algonet.se> writes:

Alex Stevenson wrote:

 I hate to muddy the water still further, but how about the final 
 keyword?  AFAIK it only currently has meaning for class methods... 
 Though one  keyword doing double duty smells a little bad (static in C 
 anyone?)

"delete" in D anyone ? "extern" in C anyone ? "in" in D anyone ?

Reusing keywords for wildly different things is part of the heritage.

:-)

--anders

Feb 08 2005

"Alex Stevenson" <ans104 cs.york.ac.uk> writes:

On Wed, 09 Feb 2005 00:37:45 +0100, Anders F Björklund <afb algonet.se>  
wrote:

 Alex Stevenson wrote:

 I hate to muddy the water still further, but how about the final  
 keyword?  AFAIK it only currently has meaning for class methods...  
 Though one  keyword doing double duty smells a little bad (static in C  
 anyone?)

 "delete" in D anyone ? "extern" in C anyone ? "in" in D anyone ?

 Reusing keywords for wildly different things is part of the heritage.

 :-)

 --anders

Rape, pillage and the subjugation of innocent nations is part of my  
heritage (Englishman dontcherknow old boy...), but that doesn't mean I  
want to do them every day - only for special occasions. On reflection  
though, I think I'd prefer to reuse a keyword for two clearly separate  
things than take the Ada route and have ludicrous numbers of keywords.



Feb 08 2005

"Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:

"Alex Stevenson" <ans104 cs.york.ac.uk> wrote in message 
news:opslwkgxwo08qma6 mjolnir.spamnet.local...
 On Wed, 09 Feb 2005 00:37:45 +0100, Anders F Björklund 
 <afb algonet.se>  wrote:

 Alex Stevenson wrote:

 I hate to muddy the water still further, but how about the final 
 keyword?  AFAIK it only currently has meaning for class methods... 
 Though one  keyword doing double duty smells a little bad (static in 
 C  anyone?)

 "delete" in D anyone ? "extern" in C anyone ? "in" in D anyone ?

 Reusing keywords for wildly different things is part of the heritage.

 :-)

 --anders

 Rape, pillage and the subjugation of innocent nations is part of my 
 heritage (Englishman dontcherknow old boy...),

Don't forget parliamentary democracy and that most civilised practice: 
queueing.

:-)

Feb 08 2005

"Alex Stevenson" <ans104 cs.york.ac.uk> writes:

On Wed, 9 Feb 2005 11:23:21 +1100, Matthew  
<admin stlsoft.dot.dot.dot.dot.org> wrote:

 "Alex Stevenson" <ans104 cs.york.ac.uk> wrote in message
 news:opslwkgxwo08qma6 mjolnir.spamnet.local...
 On Wed, 09 Feb 2005 00:37:45 +0100, Anders F Björklund
 <afb algonet.se>  wrote:

 Alex Stevenson wrote:

 I hate to muddy the water still further, but how about the final
 keyword?  AFAIK it only currently has meaning for class methods...
 Though one  keyword doing double duty smells a little bad (static in
 C  anyone?)

 "delete" in D anyone ? "extern" in C anyone ? "in" in D anyone ?

 Reusing keywords for wildly different things is part of the heritage.

 :-)

 --anders

 Rape, pillage and the subjugation of innocent nations is part of my
 heritage (Englishman dontcherknow old boy...),

 Don't forget parliamentary democracy and that most civilised practice:
 queueing.

 :-)

Cricket. You forgot cricket. :-P

As for Parliamentary Democracy, I'll leave that to Sir Humphrey Appleby  
 from "Yes, Prime Minister":

"Since 1832 we [the civil service] have been gradually excluding the voter  
 from government, now we've got them to a point where they just vote once  
every five years for whichever bunch of buffoons will try to interfere  
with our policies..."


Feb 08 2005

Anders F Björklund <afb algonet.se> writes:

Matthew wrote:

 So you mean:
     - on Win32 (and *all other platforms* that support it) literals go 
 in a read-only segment
     - we have a readonly keyword
     - literals are implicitly readonly.
     - one cannot slice from a readonly to a non-readonly, only .dup

Slicing could be allowed, only that the slice is readonly too ?

Q: Couldn't readonly be just another attribute to the arrays ?

char[5] s;
   s.length = 5;
   s.readonly = 0;

char[] a = new char[5];
   a.length = 5;
   a.readonly = 0;

char[] l = "hello";
   l.length = 5;
   l.readonly = 1;

And you can set it to false, but not set it back to true again...

--anders

Feb 08 2005

Anders F Björklund <afb algonet.se> writes:

I wrote, too soon:

 Q: Couldn't readonly be just another attribute to the arrays ?
 
 char[5] s;
   s.length = 5
   s.readonly = 0
 
 char[] a = new char[5];
   a.length = 5
   a.readonly = 0
 
 char[] l = "hello"
   l.length = 5
   l.readonly = 1
 
 And you can set it to false, but not set it back to true again...

That should have read "you can set it to 1, but not back to 0".

You would use .dup to return a new copy, that had readonly = 0.
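
A rough sketch of that idea, written as an ordinary struct rather than a
built-in array property. Everything here (`FlaggedChars`, `markReadonly`,
`copy`, `set`) is hypothetical and only meant to show the intended rules: the
flag can be raised but not lowered, slices inherit it, and a copy starts out
writable.

```d
import std.stdio;

// Hypothetical wrapper: a char array plus a run-time "readonly" flag.
struct FlaggedChars
{
    private char[] data;
    private bool flag;

    this(char[] d, bool ro = false)
    {
        data = d;
        flag = ro;
    }

    bool readonly() const { return flag; }

    // The flag can be raised but never lowered again.
    void markReadonly() { flag = true; }

    // A slice shares the data, so it inherits the flag.
    FlaggedChars slice(size_t lo, size_t hi)
    {
        return FlaggedChars(data[lo .. hi], flag);
    }

    // A copy owns fresh data, so it starts out writable (the ".dup" case).
    FlaggedChars copy() const
    {
        return FlaggedChars(data.dup, false);
    }

    // Checked write: fails at run time on a readonly array.
    void set(size_t i, char c)
    {
        if (flag)
            throw new Exception("array is readonly");
        data[i] = c;
    }

    const(char)[] text() const { return data; }
}

void main()
{
    // A heap copy flagged readonly stands in for a string literal.
    auto l = FlaggedChars("hello".dup, true);
    auto c = l.copy();
    c.set(4, '!');
    writeln(c.text);
    writeln(l.readonly);
}
```

Note that the check happens when `set` is called, i.e. at run time; nothing in
the type tells a reader of the code whether a given array is writable.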

--anders

Feb 08 2005

"Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:

"Anders F Björklund" <afb algonet.se> wrote in message 
news:cubfrd$t2j$1 digitaldaemon.com...
 Matthew wrote:

 So you mean:
     - on Win32 (and *all other platforms* that support it) literals 
 go in a read-only segment
     - we have a readonly keyword
     - literals are implicitly readonly.
     - one cannot slice from a readonly to a non-readonly, only .dup

 Slicing could be allowed, only that the slice is readonly too ?

 Q: Couldn't readonly be just another attribute to the arrays ?

 char[5] s;
   s.length = 5;
   s.readonly = 0;

 char[] a = new char[5];
   a.length = 5;
   a.readonly = 0;

 char[] l = "hello";
   l.length = 5;
   l.readonly = 1;

 And you can set it to false, but not set it back to true again...

Yes, but it'd be invisible to the programmer looking at code. Also, it'd 
be runtime, rather than compile time, checking. Which we already have, 
in the form of the access fault (on Linux at least).

Feb 08 2005

Anders F Björklund <afb algonet.se> writes:

Matthew wrote:

Q: Couldn't readonly be just another attribute to the arrays ?

And you can set it to 1, but not set it back to 0 again...

 
 Yes, but it'd be invisible to the programmer looking at code. Also, it'd 
 be runtime, rather than compile time, checking. Which we already have, 
 in the form of the access fault (on Linux at least).

But couldn't the compiler catch the obvious misuses ?
(i.e. assigning stuff to the readonly string literals)

It already catches if you try to e.g. assign a new
length to a static array, and other such assignments ?


It would also solve Kris's problem of returning an
array reference to his internal buffer, since he would:

1) make a slice of the entire array: copy = buffer[]
2) make the copy readonly, by: copy.readonly = true
3) return the copy, which is now a "readonly" array
4) it still doesn't protect others if *he* changes it,
    that's the same as with casting mutable->immutable

You could of course still abuse that by the built-in
array-to-pointer conversion, but that's another story.


One could even extend the "in" keyword and param default
to make arrays readonly, when not using "out" or "inout" ?

It would also enforce Copy-on-Write, since you could
still read the entire input without duplicating but
if you need to make any changes then you need to .dup
first, or else the array would still be readonly...

i.e.

void function1(char[] s);
// inside this function, "s" now has the readonly flag set

i.e. function1 sets: s.readonly = 1, need to Copy-on-Write
      (must use .dup, as setting s.readonly = 0 is an error)

void function2(inout char[] s);
// but this function is free to modify the "s" param's chars

i.e. function2 does *not* modify s.readonly, which means
      that if the input was a literal it would still fail
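
The Copy-on-Write pattern described above can be sketched like this. The
read-only input is spelled `const(char)[]` as in later versions of D (an
assumption; it stands in for the proposed readonly flag), and `lowerAscii` is a
made-up name, not a library function:

```d
// Returns a lower-cased version of s (ASCII only).
// Copy-on-Write: the input is duplicated only if a change is needed.
const(char)[] lowerAscii(const(char)[] s)
{
    foreach (i, c; s)
    {
        if (c >= 'A' && c <= 'Z')
        {
            // First character that must change: copy now, then edit the copy.
            char[] copy = s.dup;
            foreach (ref ch; copy[i .. $])
            {
                if (ch >= 'A' && ch <= 'Z')
                    ch = cast(char)(ch + ('a' - 'A'));
            }
            return copy;
        }
    }
    // Nothing to change: hand back the input itself, no allocation.
    return s;
}

void main()
{
    const(char)[] plain = "hello";
    assert(lowerAscii(plain) is plain);     // same memory, no copy made
    assert(lowerAscii("Hello") == "hello"); // literal untouched, copy returned
}
```

The function can read a literal freely and never writes through the reference
it was given, which is exactly what a readonly "in" parameter would enforce.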

--anders

Feb 08 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Wed, 09 Feb 2005 00:26:38 +0100, Anders F Björklund <afb algonet.se>  
wrote:
 One could even extend the "in" keyword and param default
 to make arrays readonly, when not using "out" or "inout" ?

This is my preferred solution.

- It is a compile time solution.
- It is enforcement of a function's contract i.e.

void foo(in int a, out int b, inout int c)

says to me, I'm gonna read a, write b, and read/write c.
If the function then writes to 'a' it violates that contract.


On the issue of the readonly memory, the current D behaviour is present in  
C (at least in the compilers I use). I think we can and should use both  
readonly and non readonly memory in different places, for different  


Examples:
1)
void foo(in int a, out int b, inout int c) {
   a = 1; //error
   b = 1; //ok
   c = 1; //ok
}

void main() {
   readonly int ri = 5;
   int wi;

   foo(wi,wi,ri); //error, ri is readonly
   foo(wi,ri,wi); //error, ri is readonly
   foo(ri,wi,wi); //ok
}

2)
readonly char[] ro = "READ"; //data placed in readonly memory
char[] rw = "READ/WRITE";    //data _not_ placed in readonly memory

void foo(in char[] a, out char[] b) {
   readonly char[] c;

   b = a[0..3];     //error, a is readonly. (or implicit dup?)
   b = a[0..3].dup; //ok
   c = a[0..3];     //ok

   a[0] = 'a';      //error, a is readonly
   b[0] = 'a';      //ok
   c[0] = 'a';      //error, c is readonly
}

void main() {
   readonly char[] lro = "README"; //data placed in readonly memory
   char[] lrw = "READ/WRITE";      //data _not_ placed in readonly memory

   foo(ro,rw); //ok
   foo(rw,ro); //error, ro is readonly
   foo(rw,rw); //ok
}

I realise this idea is very similar to the C/C++ const idea.
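
For comparison, a minimal sketch of the array example, assuming a later D
compiler in which an "in" parameter is treated as const. That behaviour is an
assumption here, since it is precisely what is being proposed and not what the
compiler does now:

```d
void foo(in char[] a, out char[] b)
{
    // Writing through 'a' is rejected at compile time.
    static assert(!__traits(compiles, a[0] = 'x'));

    // A read-only slice cannot be assigned to a writable array ...
    static assert(!__traits(compiles, b = a[0 .. 3]));

    // ... but an explicit copy can.
    b = a[0 .. 3].dup;
    b[0] = 'a'; // ok: b owns its data
}

void main()
{
    char[] rw = "READ/WRITE".dup; // writable copy of a literal
    char[] result;

    foo("README", result);        // a literal is fine as an "in" argument
    assert(result == "aEA");

    foo(rw, result);              // so is writable data
    assert(result == "aEA");
    assert(rw == "READ/WRITE");   // the caller's data is untouched
}
```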

Regan

Feb 08 2005

Anders F Björklund <afb algonet.se> writes:

Regan Heath wrote:

 One could even extend the "in" keyword and param default
 to make arrays readonly, when not using "out" or "inout" ?

 
 This is my preferred solution.
 - It is a compile time solution.
 - It is enforcement of a function's contract i.e.

It would also work well to integrate literals...

 Examples:
 1)
 void foo(in int a, out int b, inout int c) {
   a = 1; //error
   b = 1; //ok
   c = 1; //ok
 }

I think you misunderstood something. "in" means you get a copy.
"inout" means you get a reference. ("out" means it is .init-ed)

Thus, "a = 1" is not an error. It just doesn't affect anything,
at least not outside the function (such as the argument passed).

    int a = 0, b = 0, c = 0;
    foo(a,b,c);            
    printf("a=%d b=%d c=%d\n", a, b, c);

a=0 b=1 c=1

     foo(1,2,3);            

constant 1 is not an lvalue
constant 2 is not an lvalue
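
A self-contained version of the above, as a sketch. It is written for a later
D compiler in which "inout" parameters are spelled "ref" (an assumption; with
the compiler discussed in this thread the keyword would be "inout"):

```d
import std.stdio;

void foo(int a, out int b, ref int c)
{
    a = 1; // legal, but only the local copy changes
    b = 1; // written back to the caller
    c = 1; // written back to the caller
}

void main()
{
    int a = 0, b = 0, c = 0;
    foo(a, b, c);
    writefln("a=%d b=%d c=%d", a, b, c); // a=0 b=1 c=1

    // Constants are not lvalues, so they cannot bind to out/ref parameters.
    static assert(!__traits(compiles, foo(1, 2, 3)));
}
```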

 void main() {
   readonly int ri = 5;
   int wi;
 
   foo(wi,wi,ri); //error, ri is readonly
   foo(wi,ri,wi); //error, ri is readonly
   foo(ri,wi,wi); //ok
 }

If you mean "const", why not just say that ? :-)

constant 5 is not an lvalue

 2)
 readonly char[] ro = "READ"; //data placed in readonly memory
 char[] rw = "READ/WRITE";    //data _not_ placed in readonly memory

Again, having such extra modifiers on strings
is what made C++ const suck in the first place.

  char[] ro = "READ"; //data placed in readonly memory
  char[] rw = "READ/WRITE".dup;    //data _not_ placed in readonly memory

Since the read/write memory is now allocated by the
trashman, it's not as easy to have it static anyway...

String literals are located in data section, as in C.
(they are even '\u0000'-terminated, for usage with C)

 void foo(in char[] a, out char[] b) {
   readonly char[] c;

char[] c;

   b = a[0..3];     //error, a is readonly. (or implicit dup?)

most likely a compile error,
since all info is known by then.

   b = a[0..3].dup; //ok
   c = a[0..3];     //ok

making c readonly. (since a was)

   a[0] = 'a';      //error, a is readonly

same as "hello"[0] = 'a';

   b[0] = 'a';      //ok

ok.

   c[0] = 'a';      //error, c is readonly

compiler doesn't know that,
but you'd get a runtime error.


The tricky part is when you assign to the entire array.
a = b; a = c; c = a; c = b; // and so on, and so forth

Not that it would be something you would do often, but it
would probably not affect anything at all - just change
the "pointer" of the array in question (not contents)

 void main() {
   readonly char[] lro = "README"; //data placed in readonly memory
   char[] lrw = "READ/WRITE";      //data _not_ placed in readonly memory

probably should be written as:
const char[] ro = "README"; // read-only (literal)
char[] rw = "READ/WRITE".dup; // read-write (copy)

But usually you can get away with:
char[] s = "hello"; // this is read-only at run-time, but not at compile

writefln("%s",s); // OK
char[] t = s ~ " world"; // OK
s[4] = '?'; // KABOOM; should have Copy-on-Write
t[5] = '!'; // OK; since we made our own copy first

   foo(ro,rw); //ok
   foo(rw,ro); //error, ro is readonly
   foo(rw,rw); //ok

Note that most functions will have "in" parameters.

char[] tolower(char[] s);
char[] toupper(char[] s);

 I realise this idea is very similar to the C/C++ const idea.

And that is why it needs to die, I'm afraid.

--anders

Feb 09 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Wed, 09 Feb 2005 10:02:06 +0100, Anders F Björklund <afb algonet.se>  
wrote:
 Regan Heath wrote:

 One could even extend the "in" keyword and param default
 to make arrays readonly, when not using "out" or "inout" ?

  This is my preferred solution.
 - It is a compile time solution.
 - It is enforcement of a function's contract i.e.

 It would also work well to integrate literals...

 Examples:
 1)
 void foo(in int a, out int b, inout int c) {
   a = 1; //error
   b = 1; //ok
   c = 1; //ok
 }

 I think you misunderstood something.

No, I understand how it works _now_ and I understand why it works that  
way. I think we can keep the advantages and add more.

The funny thing is if you pass a char[] you can modify the data it  
references but not the reference. eg.

Does the GC collect "def" now? or is "def" tied to the same block as "abc"  
and thus hangs around till "abc" vanishes?
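
A minimal sketch of the behaviour being described, with hypothetical names
(the `.dup` calls assume a compiler where literals are read-only, so that the
write below is safe on every platform):

```d
void foo(char[] s)
{
    s[0] = 'x';     // visible to the caller: both refer to the same data
    s = "def".dup;  // invisible to the caller: only the local reference moves
}

void main()
{
    char[] a = "abc".dup;
    foo(a);
    assert(a == "xbc"); // the data changed, the reference did not
}
```

The array passed "in" is a copy of the reference (pointer and length), not a
copy of the characters it points to.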

 "in" means you get a copy.
 "inout" means you get a reference. ("out" means it is .init-ed)

 Thus, "a = 1" is not an error. It just doesn't affect anything,
 at least not outside the function (such as the argument passed).

    int a = 0, b = 0, c = 0;
    foo(a,b,c);               printf("a=%d b=%d c=%d\n", a, b, c);

 a=0 b=1 c=1

     foo(1,2,3);

 constant 1 is not an lvalue
 constant 2 is not an lvalue

 void main() {
   readonly int ri = 5;
   int wi;
    foo(wi,wi,ri); //error, ri is readonly
   foo(wi,ri,wi); //error, ri is readonly
   foo(ri,wi,wi); //ok
 }

 If you mean "const", why not just say that ? :-)

Because someone suggested 'readonly' was a better name, and I agree, for  
the concept I have in mind.

A constant is something that can 'never' change. A readonly variable is  
simply a variable which is readonly in the current scope, i.e.

char[] a; //not readonly
foo(a);

void foo(in char[] a) {
   //a is readonly here
}

 constant 5 is not an lvalue

 2)
 readonly char[] ro = "READ"; //data placed in readonly memory
 char[] rw = "READ/WRITE";    //data _not_ placed in readonly memory

 Again, having such extra modifiers on strings
 is what made C++ const suck in the first place.

C++ const sucked as a result of more than just 1 factor, this readonly is  
similar to part of the whole const thing, but it's not the whole const  
thing, if that makes any sense.

The 'readonly' above is 2 things;
1- a hint as to where the compiler can put the reference/data.
2- an indication that the programmer does not intend to modify the  
reference/data.

Allowing the compiler to optimise and error check.

I would be interested to see if the arguments against const apply to my  
idea here equally well or not, after all D is not C in many respects, and  
this idea is not exactly the same (I don't think).

I must admit I do not know everything there is to know about the problems  
with const in C++, perhaps we should start with a description of them?

   char[] ro = "READ"; //data placed in readonly memory
   char[] rw = "READ/WRITE".dup;    //data _not_ placed in readonly memory

 Since the read/write memory is now allocated by the
 trashman, it's not as easy to have it static anyway...

It's simply an indication to the compiler, to the trashman, as to where it  
can put the memory if it so desires.

 String literals are located in data section, as in C.
 (they are even '\u0000'-terminated, for usage with C)

Sure, and on linux they're in readonly memory, and on windows they're in  
read/write memory.

 void foo(in char[] a, out char[] b) {
   readonly char[] c;

 char[] c;

   b = a[0..3];     //error, a is readonly. (or implicit dup?)

 most likely a compile error,
 since all info is known by then.

yep.

   b = a[0..3].dup; //ok
   c = a[0..3];     //ok

 making c readonly. (since a was)

yep.

   a[0] = 'a';      //error, a is readonly

 same as "hello"[0] = 'a';

yep.

   b[0] = 'a';      //ok

 ok.

   c[0] = 'a';      //error, c is readonly

 compiler doesn't know that,

It could in this simple example keep track of this during compile by  
flagging c as readonly above where it's assigned and erroring here.

There may be more complex cases where it's not possible?

 but you'd get a runtime error.

yep.

 The tricky part is when you assign to the entire array.
 a = b; a = c; c = a; c = b; // and so on, and so forth

 Not that it would be something you would do often, but it
 would probably not affect anything at all - just change
 the "pointer" of the array in question (not contents)

I think in some cases (perhaps all) the compiler can keep track of these  
things by flagging variables as readonly or not during compile.

 void main() {
   readonly char[] lro = "README"; //data placed in readonly memory
   char[] lrw = "READ/WRITE";      //data _not_ placed in readonly memory

 probably should be written as:
 const char[] ro = "README"; // read-only (literal)
 char[] rw = "READ/WRITE".dup; // read-write (copy)

Using my definition above of const and readonly I agree. However for the  
sake of less keywords we could re-use readonly.

 But usually you can get away with:
 char[] s = "hello"; // this is read-only at run-time, but not at compile

It depends where you decide to put "hello" by default. Someone has  
suggested it should go into readonly memory, some have suggested it  
shouldn't.

If not, then 'readonly' can be seen as a hint to the compiler that it can  
optimise by placing the string in readonly memory, of course it can still  
choose not to, as long as we have compile and runtime checking of  
'readonly' then it all works dandy.

If not, then you as a programmer have more options.

That said if most common case is that it's intended to be readonly then  
the default behaviour should follow that.

 writefln("%s",s); // OK
 char[] t = s ~ " world"; // OK
 s[4] = '?'; // KABOOM; should have Copy-on-Write
 t[5] = '!'; // OK; since we made our own copy first

   foo(ro,rw); //ok
   foo(rw,ro); //error, ro is readonly
   foo(rw,rw); //ok

 Note that most functions will have "in" parameters.

 char[] tolower(char[] s);
 char[] toupper(char[] s);

Sure, meaning in my idea they will _not_ modify the input. If they want to  
they should be re-specified as 'inout'. It's all part of the functions  
contract.

 I realise this idea is very similar to the C/C++ const idea.

 And that is why it needs to die, I'm afraid.

I said 'similar' not 'identical' and D is not C, D is different. I want to  
know whether the problems with C++ const apply equally here in D.

Regan

Feb 09 2005

Anders F Björklund <afb algonet.se> writes:

Regan Heath wrote:

 No, I understand how it works _now_ and I understand why it works that  
 way. I think we can keep the advantages and add more.

Okay.

 The funny thing is if you pass a char[] you can modify the data it  
 references but not the reference. eg.
 
 Does the GC collect "def" now? or is "def" tied to the same block as 
 "abc"  and thus hangs around till "abc" vanishes?

I think it'll clean it up, but my faith in the trashman is limited.

But isn't this the same as with pointers ? You can change the data
pointed to, but you can't really change the pointer itself - since
it is passed by value. It's also similar to how slices work, where
one slice could operate on the data also pointed to by the other...

 If you mean "const", why not just say that ? :-)

 
 Because someone suggested 'readonly' was a better name, and I agree, 
 for  the concept I have in mind.

I meant in the current implementation, as I'm sure you know.

 C++ const sucked as a result of more than just 1 factor, this readonly 
 is  similar to part of the whole const thing, but it's not the whole 
 const  thing, if that makes any sense.

Yeah it's similar to how "inout" is similar to &, but not equal to.

 The 'readonly' above is 2 things;
 1- a hint as to where the compiler can put the reference/data.
 2- an indication that the programmer does not intend to modify the  
 reference/data.
 
 Allowing the compiler to optimise and error check.

Think I had about the same thing, but with a property instead.

 Since the read/write memory is now allocated by the
 trashman, it's not as easy to have it static anyway...

 
 It's simply an indication to the compiler, to the trashman, as to where 
 it  can put the memory if it so desires.

There aren't that many different places to put data, at compile time.

 String literals are located in data section, as in C.
 (they are even '\u0000'-terminated, for usage with C)

 
 Sure, and on linux they're in readonly memory, and on windows they're 
 in  read/write memory.

Linux and Darwin (and the rest of the GDC: FreeBSD, Solaris, etc)

 But usually you can get away with:
 char[] s = "hello"; // this is read-only at run-time, but not at compile

 
 It depends where you decide to put "hello" by default. Someone has  
 suggested it should go into readonly memory, some have suggested it  
 shouldn't.

The suggestion was that since it is read-only on *some*, it would
be more consistent if it could be made (forced) readonly on all ?

That is, if it's still meant to be an easily portable language.

 Note that most functions will have "in" parameters.

 char[] tolower(char[] s);
 char[] toupper(char[] s);

 
 Sure, meaning in my idea they will _not_ modify the input. If they want 
 to  they should be re-specified as 'inout'. It's all part of the 
 functions  contract.

I don't believe that it makes the slightest difference at the moment,
which was why I thought it would be better if "in" changed something.

Of course, that would imply that there is such a thing as a readonly.
The only indication at the moment is a segfault when you write to it.

--anders

Feb 09 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Wed, 09 Feb 2005 23:44:53 +0100, Anders F Björklund <afb algonet.se>  
wrote:
 Regan Heath wrote:
 The funny thing is if you pass a char[] you can modify the data it   
 references but not the reference. eg.
  Does the GC collect "def" now? or is "def" tied to the same block as  
 "abc"  and thus hangs around till "abc" vanishes?

 I think it'll clean it up, but my faith in the trashman is limited.

 But isn't this the same as with pointers ? You can change the data
 pointed to, but you can't really change the pointer itself - since
 it is passed by value.

Yes, a copy of the pointer is passed.

 It's also similar to how slices work, where
 one slice could operate on the data also pointed to by the other...

True.

 If you mean "const", why not just say that ? :-)

  Because someone suggested 'readonly' was a better name, and I agree,  
 for  the concept I have in mind.

 I meant in the current implementation, as I'm sure you know.

Oh, sorry. :)

 C++ const sucked as a result of more than just 1 factor, this readonly  
 is  similar to part of the whole const thing, but it's not the whole  
 const  thing, if that makes any sense.

 Yeah it's similar to how "inout" is similar to &, but not equal to.

That is my gut feeling, I need to talk/think about it more to see if my  
gut is lying or not.

 String literals are located in data section, as in C.
 (they are even '\u0000'-terminated, for usage with C)

  Sure, and on linux they're in readonly memory, and on windows they're  
 in  read/write memory.

 Linux and Darwin (and the rest of the GDC: FreeBSD, Solaris, etc)

Sorry, it's a bad habit of mine to say 'linux' when I should say 'unix' or  
something else which actually means what I mean.

 But usually you can get away with:
 char[] s = "hello"; // this is read-only at run-time, but not at  
 compile

  It depends where you decide to put "hello" by default. Someone has   
 suggested it should go into readonly memory, some have suggested it   
 shouldn't.

 The suggestion was that since it is read-only on *some*, it would
 be more consistent if it could be made (forced) readonly on all ?

I agree it needs to be consistent.

I'm on the fence as to whether it should be readonly or not readonly.

I don't think we should make the decision based on what is easiest to do  
at this point in time, tho that is certainly ok for a _temporary_ solution  
(emphasis on temporary).

It may turn out that the easy soln is the right one, it may not.

 That is, if it's still meant to be an easily portable language.

I agree this is important.

 Note that most functions will have "in" parameters.

 char[] tolower(char[] s);
 char[] toupper(char[] s);

  Sure, meaning in my idea they will _not_ modify the input. If they  
 want to  they should be re-specified as 'inout'. It's all part of the  
 functions  contract.

 I don't believe that it makes the slightest difference at the moment,
 which was why I thought it would be better if "in" changed something.

Sorry, I'm not sure what you mean here?

 Of course, that would imply that there is such a thing as a readonly.
 The only indication at the moment is a segfault when you write to it.

Indeed, some sort of indication before the segfault is preferred.

Regan

Feb 09 2005

Anders F Björklund <afb algonet.se> writes:

Regan Heath wrote:

  Sure, and on linux they're in readonly memory, and on windows 
 they're  in  read/write memory.

 Linux and Darwin (and the rest of the GDC: FreeBSD, Solaris, etc)

 
 Sorry, it's a bad habit of mine to say 'linux' when I should say 'unix' 
 or  something else which actually means what I mean.

Well, it seems to be a general "bad habit" - judging from the code...
And this whole "Windows" vs "linux" casing issue has me queasy still.

 I don't believe that it makes the slightest difference at the moment,
 which was why I thought it would be better if "in" changed something.

 
 Sorry, I'm not sure what you mean here?

We agree that it would be good if "in" and "out" actually did something.
(right now it doesn't affect the char[] parameters, they're trust-based)

 Of course, that would imply that there is such a thing as a readonly.
 The only indication at the moment is a segfault when you write to it.

 
 Indeed, some sort of indication before the segfault is preferred.

Yeah, I believe that's how this thread got started in the first place...
I'll just leave it, got other things to do (and lots of pending patches)

--anders

Feb 09 2005

"Ben Hinkle" <bhinkle mathworks.com> writes:

"Matthew" <admin stlsoft.dot.dot.dot.dot.org> wrote in message 
news:cub6in$i6d$1 digitaldaemon.com...
 To summarise this issue, am I correct in saying that:

    writing to a slice may fail if somewhere along the way, through N 
 slicings, the original string is a literal
    this failure occurs on Linux, because GDC puts literals in a read-only 
 segment. It does _not_ fail on Win32 because they're in a writeable 
 segment

 Questions:

    does the language prescribe the Linux behaviour or the Win32 behaviour? 
 (It cannot leave it undefined, since D does not have undefineds)

The only mention I can find in the spec is on the page about Memory 
Management under Strings and COW. It says the slice may be in read-only 
memory but doesn't go into details.

My own vote would be that it's enough to add a bullet in the Portability 
Guide and probably in the Lexical section about string literals being 
read-only and/or shared/folded on some platforms.

Feb 08 2005

Anders F Björklund <afb algonet.se> writes:

Ben Hinkle wrote:

 The only mention I can find in the spec is on the page about Memory 
 Management under Strings and COW. It says the slice may be in read-only 
 memory but doesn't go into details.
 
 My own vote would be that it's enough to add a bullet in the Portability 
 Guide and probably in the Lexical section about string literals being 
 read-only and/or shared/folded on some platforms.

String literals on Linux and Mac OS X, and probably other UNIX too
are read-only (whether using DMD or GDC). This means that slices
of string literals are also read-only. Writing to them segfaults...

The "easiest" should be to make them crash on Windows as well ? :-)
Or adding support for read-only strings to the D language itself.
(barring that, a friendly note that string literals are *read-only*)

--anders

Feb 08 2005

"Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:

"Anders F Björklund" <afb algonet.se> wrote in message 
news:cubbno$oli$1 digitaldaemon.com...
 Ben Hinkle wrote:

 The only mention I can find in the spec is on the page about Memory 
 Management under Strings and COW. It says the slice may be in 
 read-only memory but doesn't go into details.

 My own vote would be that it's enough to add a bullet in the 
 Portability Guide and probably in the Lexical section about string 
 literals being read-only and/or shared/folded on some platforms.

 String literals on Linux and Mac OS X, and probably other UNIX too
 are read-only (whether using DMD or GDC). This means that slices
 of string literals are also read-only. Writing to them segfaults...

 The "easiest" should be to make them crash on Windows as well ? :-)

Well, either way, it must be the same, since D does not have 
implementation-defined behaviour. At least, that's what it says on the 
packet. ;)

 Or adding support for read-only strings to the D language itself.
 (barring that, a friendly note that string literals are *read-only*)

But surely the problem is that, because slicing supports, well, slices, 
it is inevitable that one will be down low in the call chain, and slicing 
a string whose original source is unknown, or at least hard to know for 
sure. In that case, do we slice or dup first? (Of course, in most cases, 
modifying something that you do not know anything about's going to be 
dodgy, but there will certainly be cases where it'd be desirable, I'm 
sure. Otherwise, no-one would've reported the problem.)

I can't see a robust alternative to either (i) slices are writeable and 
not folded, or (ii) we have const.

Feb 08 2005

Ben Hinkle <Ben_member pathlink.com> writes:

In article <cubda4$qrn$1 digitaldaemon.com>, Matthew says...
"Anders F Björklund" <afb algonet.se> wrote in message 
news:cubbno$oli$1 digitaldaemon.com...
 Ben Hinkle wrote:

 The only mention I can find in the spec is on the page about Memory 
 Management under Strings and COW. It says the slice may be in 
 read-only memory but doesn't go into details.

 My own vote would be that it's enough to add a bullet in the 
 Portability Guide and probably in the Lexical section about string 
 literals being read-only and/or shared/folded on some platforms.

 String literals on Linux and Mac OS X, and probably other UNIX too
 are read-only (whether using DMD or GDC). This means that slices
 of string literals are also read-only. Writing to them segfaults...

 The "easiest" should be to make them crash on Windows as well ? :-)

Well, either way, it must be the same, since D does not have 
implementation-defined behaviour. At least, that's what it says on the 
packet. ;)

Well I wouldn't go so far to say it doesn't have implementation-defined
behavior. For example it says you shouldn't store pointers in ints. Does it
error if you do? maybe or maybe not. I think the idea is to minimize the
implementation-defined behavior. To completely remove it is impossible. So
should string literals have some implementation-defined behavior? Eh, I don't
really have a strong opinion but given the choices on the table the current
situation (or perhaps modifying the windows behavior to match unix if it is
easy) seems like the most reasonable to me without changing some fundamental
aspects of D.

 Or adding support for read-only strings to the D language itself.
 (barring that, a friendly note that string literals are *read-only*)

But surely the problem is that, because slicing supports, well, slices, 
it is inevitable that one will be down low in the call chain, and slicing 
a string whose original source is unknown, or at least hard to know for 
sure. In that case, do we slice or dup first? (Of course, in most cases, 
modifying something that you do not know anything about's going to be 
dodgy, but there will certainly be cases where it'd be desirable, I'm 
sure. Otherwise, no-one would've reported the problem.)

I can't see a robust alternative to either (i) slices are writeable and 
not folded, or (ii) we have const.

That's what COW is all about. The only downside of COW is that it is not
enforced by the language. But then I'd argue the performance (and simplicity)
upside of COW outweigh the downside. But it is a judgement call for sure. I
think it would take a strong case to convince Walter at this point to abandon
COW. I think the lack of documentation about string literals contributed to the
specific examples the OP ran into. D's behavior didn't surprise me at all given
the C heritage. In fact I would have been surprised if it didn't follow C by
default.

Feb 08 2005

Kris <Kris_member pathlink.com> writes:

In article <cubkj0$11qs$1 digitaldaemon.com>, Ben Hinkle says...
But surely the problem is that, because slicing supports, well, slices, 
it is inevitable that one will be down low in the call chain, and slicing 
a string whose original source is unknown, or at least hard to know for 
sure. In that case, do we slice or dup first? (Of course, in most cases, 
modifying something that you do not know anything about's going to be 
dodgy, but there will certainly be cases where it'd be desirable, I'm 
sure. Otherwise, no-one would've reported the problem.)

I can't see a robust alternative to either (i) slices are writeable and 
not folded, or (ii) we have const.

That's what COW is all about. The only downside of COW is that it is not
enforced by the language. But then I'd argue the performance (and simplicity)
upside of COW outweigh the downside. But it is a judgement call for sure. I
think it would take a strong case to convince Walter at this point to abandon
COW. I think the lack of documentation about string literals contributed to the
specific examples the OP ran into. D's behavior didn't surprise me at all given
the C heritage. In fact I would have been surprised if it didn't follow C by
default.


I don't think anyone is suggesting abandoning the CoW, Ben. What's needed is a
way to indicate, to the compiler, /when/ CoW is needed ... and have it enforce
that. 

The alternative is that libraries will be full of arbitrary array.dup *just in
case* the caller might modify the result. This is ineffective, and wholly
inefficient. When folk start writing multi-threaded apps in earnest (the next
big wave -- just look at Niagara and Cell) this will become a critical aspect of
a language. For those of us who write heavily multi-threaded apps already,
and/or thread-aware libraries, it's an issue right now.

CoW is just fine. However, when you sit down to write code that adheres to those
ideals, D cannot keep up. Keep CoW, but have the darned compiler enforce it.
Even in a simple manner :-)

For example: 

// return content maintained within such that
// it may be searched, inspected, traversed, 
// compressed into another buffer, CRC'd, 
// Base64'd, marshalled and sent over the wire,
// sent to a file, to the console, or whatever,
// in a read-only manner. NB: content can be huge! 
// This content is externally-immutable by design, 
// so we don't lose track of internal changes. 

readonly dchar[] getReadonlyContent()
{
return content;
}

Look Ma! no .dup!
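
For illustration only, here is a minimal sketch of the same accessor in present-day D, which postdates this thread and spells the qualifier `const` rather than the proposed `readonly`. The class name `Document`, its constructor and its `append` method are hypothetical; only `getReadonlyContent` comes from the example above.

```d
// Hypothetical owner of a large buffer; names other than
// getReadonlyContent are assumptions made for this sketch.
class Document
{
    private dchar[] content;

    this(dstring initial)
    {
        // The object keeps its own mutable copy of the initial text.
        content = initial.dup;
    }

    // Hands out a view of the internal buffer without copying it.
    // Callers can read through the view but cannot write through it.
    const(dchar)[] getReadonlyContent() const
    {
        return content;
    }

    // Internal mutation stays possible for the owner.
    void append(dchar c)
    {
        content ~= c;
    }
}
```

A caller that tries to assign the returned slice to a plain `dchar[]`, or to write to one of its elements, is rejected at compile time, so no defensive .dup is needed on the way out.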

There's a lot of hand-waving about how D stops a programmer from inadvertently
doing the wrong thing -- this is an example of how the compiler really /could/
do something very useful. Let's all give the Cow a big hand ... Hoorah! Now let's
get a CoW-catcher attached to the language.

- Kris

Feb 08 2005

Ben Hinkle <Ben_member pathlink.com> writes:

In article <cubv88$1aoi$1 digitaldaemon.com>, Kris says...
In article <cubkj0$11qs$1 digitaldaemon.com>, Ben Hinkle says...
But surely the problem is that, because slicing supports, well, slices, 
it is inevitable that one will be down low in the call chain, and slicing 
a string whose original source is unknown, or at least hard to know for 
sure. In that case, do we slice or dup first? (Of course, in most cases, 
modifying something that you do not know anything about's going to be 
dodgy, but there will certainly be cases where it'd be desirable, I'm 
sure. Otherwise, no-one would've reported the problem.)

I can't see a robust alternative to either (i) slices are writeable and 
not folded, or (ii) we have const.

That's what COW is all about. The only downside of COW is that it is not
enforced by the language. But then I'd argue the performance (and simplicity)
upside of COW outweigh the downside. But it is a judgement call for sure. I
think it would take a strong case to convince Walter at this point to abandon
COW. I think the lack of documentation about string literals contributed to the
specific examples the OP ran into. D's behavior didn't surprise me at all given
the C heritage. In fact I would have been surprised if it didn't follow C by
default.


I don't think anyone is suggesting abandoning the CoW, Ben. What's needed is a
way to indicate, to the compiler, /when/ CoW is needed ... and have it enforce
that. 

The alternative is that libraries will be full of arbitrary array.dup *just in
case* the caller might modify the result. This is ineffective, and wholly
inefficient.

Again, that is the whole point of COW. You only copy when you write to it - not
when you return a string. What you are suggesting could be called "copy on
return" but it isn't "copy on write". Maybe since you say COW still has a place
in D and then I'd say it looks like you are arguing for "copy on anything". One
can argue COW is unsafe because it assumes the user actually obeys COW but
that's how COW works. It is a trade-off.
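
A minimal sketch of the convention being described, written in present-day D (which postdates this thread) so that it compiles with a current compiler. The function name `upperCased` is hypothetical and this is not the Phobos implementation; it only shows the "copy when you write, not when you return" pattern for ASCII letters.

```d
// Returns the input unchanged when nothing needs to be modified,
// and duplicates it only at the first position that must be written.
const(char)[] upperCased(const(char)[] s)
{
    char[] copy = null;

    foreach (i, c; s)
    {
        if (c >= 'a' && c <= 'z')
        {
            // First write: this is the only place a copy is made.
            if (copy is null)
                copy = s.dup;
            copy[i] = cast(char)(c - 'a' + 'A');
        }
    }

    // No write happened: hand back the caller's own slice, no copy.
    return copy is null ? s : copy;
}
```

When the input is already upper case, the returned slice points at the same memory as the argument, which is the case where returning without a .dup pays off.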

CoW is just fine. However, when you sit down to write code that adheres to those
ideals, D cannot keep up. Keep CoW, but have the darned compiler enforce it.
Even in a simple manner :-)

For example: 

// return content maintained within such that
// it may be searched, inspected, traversed, 
// compressed into another buffer, CRC'd, 
// Base64'd, marshalled and sent over the wire,
// sent to a file, to the console, or whatever,
// in a read-only manner. NB: content can be huge! 
// This content is externally-immutable by design, 
// so we don't lose track of internal changes. 

readonly dchar[] getReadonlyContent()
{
return content;
}

Look Ma! no .dup!

Why would you .dup something that you aren't writing to? That's COW without the
OW. Of course that is wasteful.

There's a lot of hand-waving about how D stops a programmer from inadvertently
doing the wrong thing -- this is an example of how the compiler really /could/
do something very useful. Let's all give the Cow a big hand ... Hoorah! Now let's
get a CoW-catcher attached to the language.

- Kris

Feb 08 2005

Kris <Kris_member pathlink.com> writes:

In article <cuc2kg$1df9$1 digitaldaemon.com>, Ben Hinkle says...
The alternative is that libraries will be full of arbitrary array.dup *just in
case* the caller might modify the result. This is ineffective, and wholly
inefficient.

Again, that is the whole point of COW. You only copy when you write to it - not
when you return a string. What you are suggesting could be called "copy on
return" but it isn't "copy on write". Maybe since you say COW still has a place
in D and then I'd say it looks like you are arguing for "copy on anything". One
can argue COW is unsafe because it assumes the user actually obeys COW but
that's how COW works. It is a trade-off.



I think we're actually saying the same thing, Ben; 

Right now there's only a flimsy and vague 'trust' mechanism in place, and even
that only applies to folk who (a) understand what CoW means, and (b) fully
understand where the content they just received actually came from -- the prior
example might be buried deep under a number of layers, or could be provided
without source-code (heavens!)

Hence, CoW expectations are somewhat fluffy to say the least. Please allow me to
restate the problem another way:

a) you are provided with an array to fill up with data.
b) you are provided with an array of data to inspect, but not mutate.

D does not support any distinction between these two opposite cases. Both are
just plain old arrays. Should one CoW (a)? Why? And how would data be
communicated back via the clone of the provided buffer? One should be trusted to
CoW (b) but it's just not enforced; nor is there any indication to distinguish
it from (a). This is an accident waiting to happen.

The upshot is that (i) the programmer needs an indication, perhaps within the
data type, to grok just what operation is legitimate; and (ii) the compiler
could, and most certainly should, enforce that distinction. 

Just to be sure: we're saying that CoW is context dependent (a & b above), and
currently there's nothing within the language to steer a programmer or
library-user in the right direction. In addition, we're saying the compiler
should emit a (compile-time) error where one mistakenly violates the implicit
assumptions of CoW. 
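
To make cases (a) and (b) concrete, here is a rough sketch in present-day D, which postdates this thread and lets a signature carry the distinction. The function names `fill` and `countOf` are hypothetical, chosen only for this illustration.

```d
// (a) An array handed over to be filled: the parameter is mutable,
//     so writes go straight into the caller's buffer.
size_t fill(char[] dst, char value)
{
    dst[] = value;
    return dst.length;
}

// (b) An array handed over for inspection only: `in` makes the
//     slice const inside this function.
size_t countOf(in char[] src, char needle)
{
    size_t n = 0;
    foreach (c; src)
        if (c == needle)
            ++n;
    // src[0] = needle;  // rejected at compile time: src is const here
    return n;
}
```

With these signatures a string literal can be passed to `countOf` but not to `fill`, so the mistake is reported by the compiler rather than by a segfault.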

- Kris

Feb 08 2005

"Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:

"Kris" <Kris_member pathlink.com> wrote in message 
news:cuc739$1hfu$1 digitaldaemon.com...
 In article <cuc2kg$1df9$1 digitaldaemon.com>, Ben Hinkle says...
The alternative is that libraries will be full of arbitrary array.dup 
*just in
case* the caller might modify the result. This is ineffective, and 
wholly
inefficient.

Again, that is the whole point of COW. You only copy when you write to 
it - not
when you return a string. What you are suggesting could be called 
"copy on
return" but it isn't "copy on write". Maybe since you say COW still 
has a place
in D and then I'd say it looks like you are arguing for "copy on 
anything". One
can argue COW is unsafe because it assumes the user actually obeys COW 
but
that's how COW works. It is a trade-off.



 I think we're actually saying the same thing, Ben;

 Right now there's only a flimsy and vague 'trust' mechanism in place, 
 and even
 that only applies to folk who (a) understand what CoW means, and (b) 
 fully
 understand where the content they just received actually came from --  
 the prior
 example might be buried deep under a number of layers, or could be 
 provided
 without source-code (heavens!)

 Hence, CoW expectations are somewhat fluffy to say the least. Please 
 allow me to
 restate the problem another way:

 a) you are provided with an array to fill up with data.
 b) you are provided with an array of data to inspect, but not mutate.

 D does not support any distinction between these two opposite cases. 
 Both are
 just plain old arrays. Should one CoW (a)? Why? And how would data be
 communicated back via the clone of the provided buffer? One should be 
 trusted to
 CoW (b) but it's just not enforced; nor is there any indication to 
 distinguish
 it from (a). This is an accident waiting to happen.

 The upshot is that (i) the programmer needs an indication, perhaps 
 within the
 data type, to grok just what operation is legitimate; and (ii) the 
 compiler
 could, and most certainly should, enforce that distinction.

 Just to be sure: we're saying that CoW is context dependent (a & b 
 above), and
 currently there's nothing within the language to steer a programmer or
 library-user in the right direction. In addition, we're saying the 
 compiler
 should emit a (compile-time) error where one mistakenly violates the 
 implicit
 assumptions of CoW.

That's a more cogent view (for you and for me). Well said.

One thing I'm now wondering about, which I'd previously discounted, is 
whether the runtime attribute .readonly, might not now actually suffice. 
It'd leave people not having to care - only the compiler would do so. 
The (not inconsiderable) disadvantage is that slices would now not be

    struct slice
    {
        size_t len;
        T        *ptr;
    };

but rather

    struct slice
    {
        size_t len;
        T        *ptr;
        unsigned readOnly : 1;
    };


But maybe that's not a bad thing. We might be able to tag on more 
attributes within the bit field to better facilitate other features. 
(None spring to mind at the mo, naturally enough.)
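
A rough sketch of what such a runtime flag could look like if written as a library type in present-day D (which postdates this thread). The `CheckedSlice` name and its methods are hypothetical; the built-in slice has no such field, and this only illustrates the cost and behaviour of carrying the flag next to the pointer and length.

```d
import std.exception : enforce;

// Hypothetical wrapper: a slice plus a run-time read-only flag.
struct CheckedSlice
{
    char[] data;
    bool readOnly;

    // Reading is always allowed.
    char opIndex(size_t i) const
    {
        return data[i];
    }

    // Writing is checked at run time against the flag.
    void opIndexAssign(char value, size_t i)
    {
        enforce(!readOnly, "write to read-only slice");
        data[i] = value;
    }

    // Sub-slices inherit the flag of the slice they came from.
    CheckedSlice slice(size_t lo, size_t hi)
    {
        return CheckedSlice(data[lo .. hi], readOnly);
    }
}
```

The check turns a platform-dependent segfault into an exception on every platform, at the price of one extra field per slice and one test per write.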

Thoughts?

Feb 08 2005

Kris <Kris_member pathlink.com> writes:

In article <cuc89m$1iin$2 digitaldaemon.com>, Matthew says...
"Kris" <Kris_member pathlink.com> wrote in message 
 I think we're actually saying the same thing, Ben;

 Right now there's only a flimsy and vague 'trust' mechanism in place, 
 and even
 that only applies to folk who (a) understand what CoW means, and (b) 
 fully
 understand where the content they just received actually came from --  
 the prior
 example might be buried deep under a number of layers, or could be 
 provided
 without source-code (heavens!)

 Hence, CoW expectations are somewhat fluffy to say the least. Please 
 allow me to
 restate the problem another way:

 a) you are provided with an array to fill up with data.
 b) you are provided with an array of data to inspect, but not mutate.

 D does not support any distinction between these two opposite cases. 
 Both are
 just plain old arrays. Should one CoW (a)? Why? And how would data be
 communicated back via the clone of the provided buffer? One should be 
 trusted to
 CoW (b) but it's just not enforced; nor is there any indication to 
 distinguish
 it from (a). This is an accident waiting to happen.

 The upshot is that (i) the programmer needs an indication, perhaps 
 within the
 data type, to grok just what operation is legitimate; and (ii) the 
 compiler
 could, and most certainly should, enforce that distinction.

 Just to be sure: we're saying that CoW is context dependent (a & b 
 above), and
 currently there's nothing within the language to steer a programmer or
 library-user in the right direction. In addition, we're saying the 
 compiler
 should emit a (compile-time) error where one mistakenly violates the 
 implicit
 assumptions of CoW.

That's a more cogent view (for you and for me). Well said.

One thing I'm now wondering about, which I'd previously discounted, is 
whether the runtime attribute .readonly, might not now actually suffice. 
It'd leave people not having to care - only the compiler would do so. 
The (not inconsiderable) disadvantage is that slices would now not be

    struct slice
    {
        size_t len;
        T        *ptr;
    };

but rather

    struct slice
    {
        size_t len;
        T        *ptr;
        unsigned readOnly : 1;
    };


But maybe that's not a bad thing. We might be able to tag on more 
attributes within the bit field to better facilitate other features. 
(None spring to mind at the mo, naturally enough.)

Thoughts?


Aye -- but, for compile-time enforcement, surely only the type info is needed
regardless of implementation? If so, then slices-types would have to match, or
be more stringent than the readonly characteristic of the slice-source (via the
symbol table). 

Thus, slices upon *literals* would have to be declared readonly too. The
indication would presumably also be present in the associated typeinfo instance.

Perhaps I misunderstood you?

- Kris

Feb 08 2005

Anders F Björklund <afb algonet.se> writes:

Kris wrote:
 In article <cuc89m$1iin$2 digitaldaemon.com>, Matthew says...
 
One thing I'm now wondering about, which I'd previously discounted, is 
whether the runtime attribute .readonly, might not now actually suffice. 
It'd leave people not having to care - only the compiler would do so. 
The (not inconsiderable) disadvantage is that slices would now not be

   struct slice
   {
       size_t len;
       T        *ptr;
   };

but rather

   struct slice
   {
       size_t len;
       T        *ptr;
       unsigned readOnly : 1;
   };


Not necessarily. The "readonly" boolean could just as well go in 
typeinfo... After all, it doesn't affect the contents whatsoever.

Sorta like (char*) and (int*), which both point to the same place.
Or like with "char *" and "const char *" in That Other Language...

"casting" readonly=0 to readonly=1 would be allowed, by explicitly
setting it (just like with array.length). But vice-versa would not.
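
To make the run-time variant of the idea concrete, here is a minimal sketch, assuming a present-day D compiler. The names Slice, asReadOnly and set are made up for illustration and are not part of D or Phobos; the point is only that the flag travels with the slice and can be switched on through the interface but not off again.

```d
// Hypothetical run-time "readonly" slice; not how D arrays are implemented.
struct Slice(T)
{
    T*     ptr;
    size_t len;
    bool   readOnly;   // the extra flag from the second layout

    // Going from writable to read-only is always allowed...
    Slice asReadOnly()
    {
        return Slice(ptr, len, true);
    }
    // ...but there is deliberately no member function that clears the flag.

    // Every write has to pass through a run-time check.
    void set(size_t i, T value)
    {
        assert(i < len, "index out of bounds");
        if (readOnly)
            throw new Exception("write through a read-only slice");
        ptr[i] = value;
    }
}
```

The fields are left public for brevity, so nothing here stops code from resetting the flag by hand, and the check costs a branch on every write. That is the price of the run-time approach compared with a compile-time attribute.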

 Aye -- but, for compile-time enforcement, surely only the type info is needed
 regardless of implementation? If so, then slice types would have to match, or
 be more stringent than the readonly characteristic of the slice-source (via the
 symbol table). 
 
 Thus, slices upon *literals* would have to be declared readonly too. The
 indication would presumably also be present in the associated typeinfo
 instance.

It might be needed to have *yet another* flag to indicate literals.
The compiler "needs" it to be able to do such performance hacks as:

 char* toStringz(char[] string)
     {
 	char* p;
 	char[] copy;
 
 	if (string.length == 0)
 	    return "";
 
 	p = &string[0] + string.length;
 
 	// Peek past end of string[], if it's 0, no conversion necessary.
 	// Note that the compiler will put a 0 past the end of static
 	// strings, and the storage allocator will put a 0 past the end
 	// of newly allocated char[]'s.
 	if (*p == 0)
 	    return string;
 
 	// Need to make a copy
 	copy = new char[string.length + 1];
 	copy[0..string.length] = string;
 	copy[string.length] = 0;
 	return copy;
     }

char* toStringz(char[] string)
{
   if (string.length == 0)
     return "";
   else if (string.literal)
     return string.ptr;
   else
     return string ~ "\0";
}

Using "readonly" here won't work, since slices are not NUL-terminated.
Only the string literals are, with '\u0000', so they can work with C.

When array literals and hash literals finally arrive, it will be even
more fun. (ignoring such almost-related concepts as function literals)

int[] array = [ 1, 2, 3 ]; foo(cast(byte[]) [1,2]);
int[char[]] hash = [ "one": 1, "two": 2, "three": 3 ];

--anders

Feb 09 2005

"Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:

"Anders F Björklund" <afb algonet.se> wrote in message 
news:cucklg$1u1i$1 digitaldaemon.com...
 Kris wrote:
 In article <cuc89m$1iin$2 digitaldaemon.com>, Matthew says...

One thing I'm now wondering about, which I'd previously discounted, 
is whether the runtime attribute .readonly, might not now actually 
suffice. It'd leave people not having to care - only the compiler 
would do so. The (not inconsiderable) disadvantage is that slices 
would now not be

   struct slice
   {
       size_t len;
       T        *ptr;
   };

but rather

   struct slice
   {
       size_t len;
       T        *ptr;
       unsigned readOnly : 1;
   };


 Not necessarily. The "readonly" boolean could just as well go in 
 typeinfo... After all, it doesn't affect the contents whatsoever.

How? A slice doesn't point to typeinfo, or at least it didn't last time 
I looked.

Or do you mean that, under the seams, D would know that one slice 
pointed to char and another pointed to const char? In that case, where 
is that knowledge represented?

 Sorta like (char*) and (int*), which both point to the same place.
 Or like with "char *" and "const char *" in That Other Language...

 "casting" readonly=0 to readonly=1 would be allowed, by explicitly
 setting it (just like with array.length). But vice-versa would not.

 Aye -- but, for compile-time enforcement, surely only the type info is
 needed regardless of implementation? If so, then slice types would have to
 match, or be more stringent than the readonly characteristic of the
 slice-source (via the symbol table). Thus, slices upon *literals* would
 have to be declared readonly too. The indication would presumably also be
 present in the associated typeinfo instance.

 It might be needed to have *yet another* flag to indicate literals.
 The compiler "needs" it to be able to do such performance hacks as:

 char* toStringz(char[] string)
     {
 char* p;
 char[] copy;

 if (string.length == 0)
     return "";

 p = &string[0] + string.length;

 // Peek past end of string[], if it's 0, no conversion necessary.
 // Note that the compiler will put a 0 past the end of static
 // strings, and the storage allocator will put a 0 past the end
 // of newly allocated char[]'s.
 if (*p == 0)
     return string;

 // Need to make a copy
 copy = new char[string.length + 1];
 copy[0..string.length] = string;
 copy[string.length] = 0;
 return copy;
     }

 char* toStringz(char[] string)
 {
   if (string.length == 0)
     return "";
   else if (string.literal)
     return string.ptr;
   else
     return string ~ "\0";
 }

 Using "readonly" here won't work, since slices are not NUL-terminated.
 Only the string literals are, with '\u0000', so they can work with C.

 When array literals and hash literals finally arrive, it will be even
 more fun. (ignoring such almost-related concepts as function literals)

 int[] array = [ 1, 2, 3 ]; foo(cast(byte[]) [1,2]);
 int[char[]] hash = [ "one": 1, "two": 2, "three": 3 ];

Interesting ...

Am too tired to think at the mo, but instinct tells me you may have a 
good point (wrt NUL-termination)

Feb 09 2005

Anders F Björklund <afb algonet.se> writes:

Matthew wrote:

 How? A slice doesn't point to typeinfo, or at least it didn't last time 
 I looked.

How do you get the RTTI otherwise? I need to peek a little in DMD again.

 Or do you mean that, under the seams, D would know that one slice 
 pointed to char and another pointed to const char? In that case, where 
 is that knowledge represented?

That is an interesting question, but I'm sure it can't be impossible...

 Am too tired to think at the mo, but instinct tells me you may have a 
 good point (wrt NUL-termination)

Well, the current implementation is seriously flawed since it peeks
outside the allocated memory which is a big no-no - just as usual.
And that "storage allocator will put a 0 past the end" is not true,
since it doesn't hold with certain multiples of two, such as 16.
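
For reference, a version that never looks past the end of its argument could be sketched as below. This assumes a present-day D compiler (const(char)[] is modern spelling, used here so that literals and mutable arrays are both accepted). toStringzCopy is a made-up name, not the Phobos function, and it gives up the no-copy speedup entirely.

```d
// Hypothetical replacement: always copies, never reads outside s.
char* toStringzCopy(const(char)[] s)
{
    auto copy = new char[s.length + 1];  // room for the terminator
    copy[0 .. s.length] = s[];           // element-wise copy of the contents
    copy[s.length] = '\0';               // explicit NUL, no peeking needed
    return copy.ptr;
}
```

Because the terminator is always written into memory the function itself allocated, the result is valid for slices as well as for whole strings.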

Right now it comes down to choices for D, either we .dup *always* -
or make something good instead. Such as slices and readonly arrays.

--anders

Feb 09 2005

"Ben Hinkle" <bhinkle mathworks.com> writes:

"Anders F Björklund" <afb algonet.se> wrote in message 
news:cucllv$1v8u$1 digitaldaemon.com...
 Matthew wrote:

 How? A slice doesn't point to typeinfo, or at least it didn't last time I 
 looked.

 How do you get the RTTI otherwise? I need to peek a little in DMD again.

Class objects have RTTI. Arrays and basic types do not.

 Or do you mean that, under the seams, D would know that one slice pointed 
 to char and another pointed to const char? In that case, where is that 
 knowledge represented?

 That is an interesting question, but I'm sure it can't be impossible...

I would expect it would involve significant redesign of some pretty basic 
parts of D.

 Am too tired to think at the mo, but instinct tells me you may have a 
 good point (wrt NUL-termination)

 Well, the current implementation is seriously flawed since it peeks
 outside the allocated memory which is a big no-no - just as usual.
 And that "storage allocator will put a 0 past the end" is not true,
 since it doesn't hold with certain multiples of two, such as 16.

Don't worry about the toStringz problem. It will be fixed - and it is 
probably fixed in Walter's sandbox already. That was a bug in toStringz that 
didn't have anything to do with COW. It was more like RARL 
(read-a-random-location).

 Right now it comes down to choices for D, either we .dup *always* -
 or make something good instead. Such as slices and readonly arrays.

In your opinion, sure. I hope anyone reading this thread doesn't start 
ignoring COW and duping always. And I doubt Walter is going to add a 
readonly attribute.

Feb 09 2005

Anders F Björklund <afb algonet.se> writes:

Ben Hinkle wrote:

Well, the current implementation is seriously flawed since it peeks
outside the allocated memory which is a big no-no - just as usual.
And that "storage allocator will put a 0 past the end" is not true,
since it doesn't hold with certain multiples of two, such as 16.

 
 Don't worry about the toStringz problem. It will be fixed - and it is 
 probably fixed in Walter's sandbox already. That was a bug in toStringz that 
 didn't have anything to do with COW. It was more like RARL 
 (read-a-random-location).

But if you remove the current check, you also remove the speedup ?

This means that string literals now need to be copied for NUL-term.
(unless there is some other way to recognize a string literal...)

Right now it comes down to choices for D, either we .dup *always* -
or make something good instead. Such as slices and readonly arrays.

 
 In your opinion, sure. I hope anyone reading this thread doesn't start 
 ignoring COW and duping always. And I doubt Walter is going to add a 
 readonly attribute. 

It's pretty much the only way to be sure, at the moment. (and sucks)


The first fix would be to place string literals in read-only memory
on Windows too, for portability. (and to help with string pooling too)

The second and more demanding problem is to get some help enforcing CoW.
There needs to be something more than "honor" protecting immutability...


Just as "inout" is a reasonable simplification and workaround for
C++ references, there needs to be something for the const specifier.
It doesn't have to cover all the uses, just the more common ones...
And my suggestion was to try and add something for the arrays (only)

--anders

Feb 09 2005

"Ben Hinkle" <bhinkle mathworks.com> writes:

"Anders F Björklund" <afb algonet.se> wrote in message 
news:cud8np$2idn$1 digitaldaemon.com...
 Ben Hinkle wrote:

Well, the current implementation is seriously flawed since it peeks
outside the allocated memory which is a big no-no - just as usual.
And that "storage allocator will put a 0 past the end" is not true,
since it doesn't hold with certain multiples of two, such as 16.

 Don't worry about the toStringz problem. It will be fixed - and it is 
 probably fixed in Walter's sandbox already. That was a bug in toStringz 
 that didn't have anything to do with COW. It was more like RARL 
 (read-a-random-location).

 But if you remove the current check, you also remove the speedup ?

 This means that string literals now need to be copied for NUL-term.
 (unless there is some other way to recognize a string literal...)

I don't know how Walter is fixing it but it could end up copying more than 
the previous implementation. But there's really no way around fixing the 
bug.

Right now it comes down to choices for D, either we .dup *always* -
or make something good instead. Such as slices and readonly arrays.

 In your opinion, sure. I hope anyone reading this thread doesn't start 
 ignoring COW and duping always. And I doubt Walter is going to add a 
 readonly attribute.

 It's pretty much the only way to be sure, at the moment. (and sucks)


 The first fix would be to place string literals in read-only memory
 on Windows too, for portability. (and to help with string pooling too)

Sounds reasonable but I have no idea how Windows manages object files. If 
it's a simple matter of putting the data in some other segment then go for 
it but if it isn't that simple you have to weigh the costs.

 The second and more demanding problem is to get some help enforcing CoW.
 There needs to be something more than "honor" protecting immutability...

I'm sure Walter has heard your request. Personally I'd be surprised if he 
winds up adding a const/readonly attribute but you never know...

 Just as "inout" is a reasonable simplification and workaround for
 C++ references, there needs to be something for the const specifier.
 It doesn't have to cover all the uses, just the more common ones...
 And my suggestion was to try and add something for the arrays (only)

 --anders

Feb 09 2005

"Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:

"Kris" <Kris_member pathlink.com> wrote in message 
news:cucdre$1n34$1 digitaldaemon.com...
 In article <cuc89m$1iin$2 digitaldaemon.com>, Matthew says...
"Kris" <Kris_member pathlink.com> wrote in message
 I think we're actually saying the same thing, Ben;

 Right now there's only a flimsy and vague 'trust' mechanism in place, and
 even that only applies to folk who (a) understand what CoW means, and (b)
 fully understand where the content they just received actually came from --
 the prior example might be buried deep under a number of layers, or could
 be provided without source-code (heavens!)

 Hence, CoW expectations are somewhat fluffy to say the least. Please allow
 me to restate the problem another way:

 a) you are provided with an array to fill up with data.
 b) you are provided with an array of data to inspect, but not mutate.

 D does not support any distinction between these two opposite cases. Both
 are just plain old arrays. Should one CoW (a)? Why? And how would data be
 communicated back via the clone of the provided buffer? One should be
 trusted to CoW (b) but it's just not enforced; nor is there any indication
 to distinguish it from (a). This is an accident waiting to happen.

 The upshot is that (i) the programmer needs an indication, perhaps within
 the data type, to grok just what operation is legitimate; and (ii) the
 compiler could, and most certainly should, enforce that distinction.

 Just to be sure: we're saying that CoW is context dependent (a & b above),
 and currently there's nothing within the language to steer a programmer or
 library-user in the right direction. In addition, we're saying the compiler
 should emit a (compile-time) error where one mistakenly violates the
 implicit assumptions of CoW.

That's a more cogent view (for you and for me). Well said.

One thing I'm now wondering about, which I'd previously discounted, is
whether the runtime attribute .readonly, might not now actually suffice.
It'd leave people not having to care - only the compiler would do so.
The (not inconsiderable) disadvantage is that slices would now not be

    struct slice
    {
        size_t len;
        T        *ptr;
    };

but rather

    struct slice
    {
        size_t len;
        T        *ptr;
        unsigned readOnly : 1;
    };


But maybe that's not a bad thing. We might be able to tag on more
attributes within the bit field to better facilitate other features.
(None spring to mind at the mo, naturally enough.)

Thoughts?


 Aye -- but, for compile-time enforcement, surely only the type info is
 needed regardless of implementation? If so, then slice types would have to
 match, or be more stringent than the readonly characteristic of the
 slice-source (via the symbol table).

 Thus, slices upon *literals* would have to be declared readonly too. The
 indication would presumably also be present in the associated typeinfo
 instance.

 Perhaps I misunderstood you?

To be honest, Kris, I've kind of lost the thread in this thread.

It'd be great if people who've experienced the problem(s) could post a 
short illustrative sample. Not only would that help clarify the precise 
root of the problem, I'm sure it'd also help Walter to see it, if it's 
not something that's crossed his path as yet.

Feb 09 2005

Anders F Björklund <afb algonet.se> writes:

Matthew wrote:

 To be honest, Kris, I've kind of lost the thread in this thread.
 
 It'd be great if people who've experienced the problem(s) could post a 
 short illustrative sample. Not only would that help clarify the precise 
 root of the problem, I'm sure it'd also help Walter to see it, if it's 
 not something that's crossed his path as yet.

The thread basically boiled down to a discussion about two issues:

1) That string literals are read-only on Unix, but not on Windows.
This causes segfaults if you assign to a string literal, or a slice
of a string literal (or anything else that points to the literal).

One suggestion was to make the literals read-only in the D spec,
and store them in read-only memory on Windows too (to get an A.V.)

2) A very long discussion about "const" or "readonly" strings.
Kris wanted to use those to make sure that his return values or
array parameters were not being modified by the receiver of them.

There seemed to be a general opinion that the Copy-on-Write should
have some way of being enforced by the compiler. (readonly strings)

Right now there is no way of passing a "readonly" struct pointer,
or making sure some kind of "inout" reference is used on arrays
that the function actually intends to modify the contents of...


Beyond that, there was the usual bug or omission in the D spec:

 http://www.digitalmars.com/d/cppstrings.html says:
 
  In D, use the array slicing syntax in the natural manner:

     char[] str = "hello";
     str[1..2] = '?';        // str is "h??lo"


Which should be:

     char[] str = "hello".dup;
     str[1..3] = '?';        // str is "h??lo" 
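
The off-by-one is easy to check with a small helper. This is a minimal sketch assuming a present-day D compiler; masked is a made-up name used only for illustration.

```d
// Overwrite text[from .. to] in a private copy and return that copy.
char[] masked(const(char)[] text, size_t from, size_t to)
{
    char[] copy = text.dup;   // writable copy; the argument is left alone
    copy[from .. to] = '?';   // fills indices from up to, but excluding, to
    return copy;
}
```

With this helper, masked("hello", 1, 2) yields "h?llo" while masked("hello", 1, 3) yields "h??lo", since the upper bound of a slice is exclusive.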


At least, that's how I started the thread and how I'll end it.


The End.
--anders

Feb 09 2005

Derek Parnell <derek psych.ward> writes:

On Wed, 9 Feb 2005 05:29:13 +0000 (UTC), Kris wrote:


[snip]

 Hence, CoW expectations are somewhat fluffy to say the least. Please allow me
 to restate the problem another way:
 
 a) you are provided with an array to fill up with data.
 b) you are provided with an array of data to inspect, but not mutate.

 c) you are provided with an array and might update parts of it, depending on
other parameters.

This is a difficult one because the called routine does a CoW if and only if
it modifies the array data. The calling routine does not know if the array
is going to be modified or not prior to the call. Sometimes that may be
significant. So does the caller do .dup just in case it gets changed? The
permutations are complex.
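
Case (c) can be sketched as follows, assuming a present-day D compiler. underscoreSpaces is a made-up example function; it follows CoW purely by convention, which is exactly the situation being described.

```d
// Returns the caller's own array when nothing needs changing,
// otherwise a modified private copy.
char[] underscoreSpaces(char[] text)
{
    char[] result = text;               // start by sharing the caller's data
    foreach (i, c; text)
    {
        if (c == ' ')
        {
            if (result.ptr is text.ptr)
                result = text.dup;      // first write: take a private copy
            result[i] = '_';
        }
    }
    return result;
}
```

The caller cannot know in advance which of the two it will get back; it can only compare afterwards (for instance result.ptr is text.ptr), so the decision to .dup beforehand has to be made blind.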

-- 
Derek
Melbourne, Australia
9/02/2005 5:39:15 PM

Feb 08 2005

Kris <Kris_member pathlink.com> writes:

In article <n0teow12fzy7.eoffoinrx065$.dlg 40tude.net>, Derek Parnell says...
On Wed, 9 Feb 2005 05:29:13 +0000 (UTC), Kris wrote:


[snip]

 Hence, CoW expectations are somewhat fluffy to say the least. Please allow me
 to restate the problem another way:
 
 a) you are provided with an array to fill up with data.
 b) you are provided with an array of data to inspect, but not mutate.

 c) you are provided with an array and might update parts of it, depending on
other parameters.

This is a difficult one because the called routine does a CoW if and only if
it modifies the array data. The calling routine does not know if the array
is going to be modified or not prior to the call. Sometimes that may be
significant. So does the caller do .dup just in case it gets changed? The
permutations are complex.


I think that one *may* be covered by (b). Let me have a go at it:

In such a case, the callee would presumably CoW as necessary, and return the
modified copy rather than the original. Part of the contract between the two
would have to arrange for the caller to assume a change may occur, and act
accordingly. This is CoW, but without enforcement as in case (b). Ideally, the
array would be readonly on input, and readonly on return. With the slight twist
that the return might actually be a *modified copy* of the original (but often
just the readonly original instead).

I may not have understood correctly though. Please correct me if so! 

I'd like to point out, if I may, that the issue is perhaps most acute with
return-values. I say this because there's not even a parameter-name to steer a
user. Such a case is where any implied type-information is utterly lost (as in
the original example). 

I'd also like to point out that CoW enforcement (by the compiler) does *not*
induce any additional runtime overhead. It simply steers the programmer, and
mandates the rules are obeyed, at compile-time.

- Kris

Feb 08 2005

Kris <Kris_member pathlink.com> writes:

In article <cucfmc$1ob0$1 digitaldaemon.com>, Kris says...
In article <n0teow12fzy7.eoffoinrx065$.dlg 40tude.net>, Derek Parnell says...
On Wed, 9 Feb 2005 05:29:13 +0000 (UTC), Kris wrote:


[snip]

 Hence, CoW expectations are somewhat fluffy to say the least. Please allow me
 to restate the problem another way:
 
 a) you are provided with an array to fill up with data.
 b) you are provided with an array of data to inspect, but not mutate.

 c) you are provided with an array and might update parts of it, depending on
other parameters.

This is a difficult one because the called routine does a CoW if and only if
it modifies the array data. The calling routine does not know if the array
is going to be modified or not prior to the call. Sometimes that may be
significant. So does the caller do .dup just in case it gets changed? The
permutations are complex.


I think that one *may* be covered by (b). Let me have a go at it:

In such a case, the callee would presumably CoW as necessary, and return the
modified copy rather than the original. Part of the contract between the two
would have to arrange for the caller to assume a change may occur, and act
accordingly. This is CoW, but without enforcement as in case (b). Ideally, the
array would be readonly on input, and readonly on return. With the slight twist
that the return might actually be a *modified copy* of the original (but often
just the readonly original instead).

I may not have understood correctly though. Please correct me if so! 

I'd like to point out, if I may, that the issue is perhaps most acute with
return-values. I say this because there's not even a parameter-name to steer a
user. Such a case is where any implied type-information is utterly lost (as in
the original example). 

I'd also like to point out that CoW enforcement (by the compiler) does *not*
induce any additional runtime overhead. It simply steers the programmer, and
mandates the rules are obeyed, at compile-time.

- Kris


Ack! I'm tired, and doubling up on phrases;

The callee could also declare the input array as 'inout', and therefore modify
the .length and happily mutate away. The two parties might also arrange to pass
an 'inout' array pointer. Neither of these options would require CoW, nor
readonly attributes, at all. A case such as this is perhaps outside the realm of
concern?
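
A minimal sketch of that option, assuming a present-day D compiler, where the 'inout' parameter storage class discussed here is spelled 'ref'. appendBang is a made-up example name.

```d
// The array itself is passed by reference, so changes to its length
// and contents are visible to the caller. No CoW is involved.
void appendBang(ref char[] buf)
{
    buf ~= '!';   // may reallocate; the caller's variable is updated too
}
```

Here the signature itself announces that the callee intends to mutate, which is the indication missing from cases (a) and (b).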

- Kris

Feb 09 2005

"Walter" <newshound digitalmars.com> writes:

"Kris" <Kris_member pathlink.com> wrote in message
news:cucgl8$1pks$1 digitaldaemon.com...
 The callee could also declare the input array as 'inout', and therefore
 modify the .length and happily mutate away. The two parties might also
 arrange to pass an 'inout' array pointer. Neither of these options would
 require CoW, nor readonly attributes, at all. A case such as this is
 perhaps outside the realm of concern?

I believe you're right that 'inout' handles this particular case adequately.

Feb 09 2005

Ben Hinkle <Ben_member pathlink.com> writes:

In article <cuc739$1hfu$1 digitaldaemon.com>, Kris says...
In article <cuc2kg$1df9$1 digitaldaemon.com>, Ben Hinkle says...
The alternative is that libraries will be full of arbitrary array.dup *just in
case* the caller might modify the result. This is ineffective, and wholly
inefficient.

Again, that is the whole point of COW. You only copy when you write to it - not
when you return a string. What you are suggesting could be called "copy on
return" but it isn't "copy on write". Maybe since you say COW still has a place
in D and then I'd say it looks like you are arguing for "copy on anything". One
can argue COW is unsafe because it assumes the user actually obeys COW but
that's how COW works. It is a trade-off.



I think we're actually saying the same thing, Ben; 

Right now there's only a flimsy and vague 'trust' mechanism in place, and even
that only applies to folk who (a) understand what CoW means, and (b) fully
understand where the content they just received actually came from -- the prior
example might be buried deep under a number of layers, or could be provided
without source-code (heavens!)

[snip]

I completely agree COW relies on the programmers knowing what they are doing.
The tradeoffs are:
1) assume the programmers know about COW and follow it (not always true), or
2) dup like crazy and watch performance suffer, or
3) add a const/readonly attribute to help with some common gotchas

None of these are perfect. Const in C++ has some issues that I assume readonly
would share. Plus to learn about readonly and all the gotchas one might have to
invest just as much effort as learning COW - in fact one might argue that
learning COW is much simpler than learning about how to use readonly/const.

For example, here's some C++ code where COW would arguably be more safe than
const:
void foo(char*x, const char*y) { x[0] = 'a'; ... }
Now if one calls foo(z,z) for a char*z, then even though y says it is const
there isn't anything preventing x from assigning to the same string. So is it
right to say that y is const? Sure but it doesn't mean the contents of y won't
change. If COW was used the function foo in D would look like
void foo(char[] x, char[] y){x = x.dup;x[0] = 'a'; ... }
I would say the D foo is safer than the C++ foo.
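
The difference can be sketched side by side, assuming a present-day D compiler. fooShared and fooCow are made-up names standing in for the two versions of foo.

```d
// Mirrors the C++ foo: writes through x, so an aliased y changes as well.
void fooShared(char[] x, char[] y)
{
    x[0] = 'a';
}

// CoW by convention: copy before the first write.
void fooCow(char[] x, char[] y)
{
    x = x.dup;     // x now refers to a private copy
    x[0] = 'a';    // the caller's data, and therefore y, is untouched
}
```

Calling either one as foo(z, z) shows the effect: after fooCow the caller's z is unchanged, after fooShared its first character has become 'a'.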

Adding a readonly or const attribute will catch some errors at compile time but
I argue it would introduce complexity and have gotchas and corner cases that
make it less desirable than COW.

Feb 09 2005

Ben Hinkle <Ben_member pathlink.com> writes:

Adding a readonly or const attribute will catch some errors at compile time but
I argue it would introduce complexity and have gotchas and corner cases that
make it less desirable than COW.

Sorry for the double-post, but I forgot to add, as I mentioned before, that if
we integrated a lightning-fast dlint program into our editors we could flag COW
violations as you write the code, *before* you actually compile. If Walter
really wants to, he can also add COW violation statements to the verbose mode
output - but I wouldn't hold my breath for that.

Feb 09 2005

Anders F Björklund <afb algonet.se> writes:

Ben Hinkle wrote:

 Sorry for the double-post, but I forgot to add, as I mentioned before, that if
 we integrated a lightning-fast dlint program into our editors we could flag COW
 violations as you write the code, *before* you actually compile. If Walter
 really wants to, he can also add COW violation statements to the verbose mode
 output - but I wouldn't hold my breath for that.

Wouldn't that be a lot like adding (optional) warnings to the compiler ?

But, yeah, if the compiler doesn't check it - then someone else must...

--anders

Feb 09 2005

Kris <Kris_member pathlink.com> writes:

In article <cud2ea$2aot$1 digitaldaemon.com>, Ben Hinkle says...
In article <cuc739$1hfu$1 digitaldaemon.com>, Kris says...
In article <cuc2kg$1df9$1 digitaldaemon.com>, Ben Hinkle says...
The alternative is that libraries will be full of arbitrary array.dup *just in
case* the caller might modify the result. This is ineffective, and wholly
inefficient.

Again, that is the whole point of COW. You only copy when you write to it - not
when you return a string. What you are suggesting could be called "copy on
return" but it isn't "copy on write". Maybe since you say COW still has a place
in D and then I'd say it looks like you are arguing for "copy on anything". One
can argue COW is unsafe because it assumes the user actually obeys COW but
that's how COW works. It is a trade-off.



I think we're actually saying the same thing, Ben; 

Right now there's only a flimsy and vague 'trust' mechanism in place, and even
that only applies to folk who (a) understand what CoW means, and (b) fully
understand where the content they just received actually came from -- the prior
example might be buried deep under a number of layers, or could be provided
without source-code (heavens!)

[snip]

I completely agree COW relies on the programmers knowing what they are doing.
The tradeoffs are:
1) assume the programmers know about COW and follow it (not always true), or
2) dup like crazy and watch performance suffer, or
3) add a const/readonly attribute to help with some common gotchas

None of these are perfect. Const in C++ has some issues that I assume readonly
would share. Plus to learn about readonly and all the gotchas one might have to
invest just as much effort as learning COW - in fact one might argue that
learning COW is much simpler than learning about how to use readonly/const.

For example, here's some C++ code where COW would arguably be more safe than
const:
void foo(char*x, const char*y) { x[0] = 'a'; ... }
Now if one calls foo(z,z) for a char*z, then even though y says it is const
there isn't anything preventing x from assigning to the same string. So is it
right to say that y is const? Sure but it doesn't mean the contents of y won't
change. If COW was used the function foo in D would look like
void foo(char[] x, char[] y){x = x.dup;x[0] = 'a'; ... }
I would say the D foo is safer than the C++ foo.

Adding a readonly or const attribute will catch some errors at compile time but
I argue it would introduce complexity and have gotchas and corner cases that
make it less desirable than COW.


What you appear to be saying here, is that the programmer can always write the
following:

* (char *) 0x00000000 = 0;

Which, of course, they can. Nobody is asking for a mechanism to stop the
programmer from wholly circumventing the language constructs. Instead, we're
looking for a means of "steering the programmer in the right direction" -- a
notion Walter seems to strive for.

I was surprised at how many "I assume" and "might" phrases were in your post,
yet you've already surmised that something like 'readonly' is effectively
worthless. At the same time you note, just like the rest of us, that CoW has
significant issues. Which it certainly has.

Where did that constructive attitude go to?

Feb 09 2005

"Ben Hinkle" <bhinkle mathworks.com> writes:

For example, here's some C++ code where COW would arguably be more safe than
const:
void foo(char*x, const char*y) { x[0] = 'a'; ... }
Now if one calls foo(z,z) for a char*z, then even though y says it is const
there isn't anything preventing x from assigning to the same string. So is it
right to say that y is const? Sure but it doesn't mean the contents of y won't
change. If COW was used the function foo in D would look like
void foo(char[] x, char[] y){x = x.dup;x[0] = 'a'; ... }
I would say the D foo is safer than the C++ foo.

Adding a readonly or const attribute will catch some errors at compile time but
I argue it would introduce complexity and have gotchas and corner cases that
make it less desirable than COW.


 What you appear to be saying here, is that the programmer can always write
 the following:

 * (char *) 0x00000000 = 0;

uhh - I don't know where you got that from. I was giving an example in C++ 
where const might confuse a careless programmer. What does that have to do 
with a seg-v?

 Which, of course, they can. Nobody is asking for a mechanism to stop the
 programmer from wholly circumventing the language constructs. Instead,
 we're looking for a means of "steering the programmer in the right
 direction" -- a notion Walter seems to strive for.

My C++ example didn't circumvent anything. It is perfectly legal and 
cast-free C++.

 I was surprised at how many "I assume" and "might" phrases were in your 
 post,
 yet you've already surmised that something like 'readonly' is effectively
 worthless. At the same time you note, just like the rest of us, that CoW 
 has
 significant issues. Which it certainly has.

I try to keep my posts neutral and avoid using extreme language. I've been 
pretty turned off by some of the attitudes in these recent threads. It's 
probably due to my math background that I don't like stating opinions as 
facts or pushing one point of view. For example when I said something like 
"I assume adding readonly to D will behave like const in C++" that's because 
it is possible that some readonly might not have the same problems that 
const does in C++ but I think it is reasonable to assume they are similar 
enough to compare them. I never characterized readonly as "effectively 
worthless". I said it isn't perfect. I said none of the solutions are 
perfect.

 Where did that constructive attitude go to?

I'm not sure what you mean by that. I've suggested using dlint to help 
coders catch bugs before compiling. I've suggested putting more messages in 
the verbose output of dmd.

Feb 09 2005

Kris <Kris_member pathlink.com> writes:

In article <cudmqh$30rt$1 digitaldaemon.com>, Ben Hinkle says...
For example, here's some C++ code where COW would arguably be more safe 
than
const:
void foo(char*x, const char*y) { x[0] = 'a'; ... }
Now if one calls foo(z,z) for a char*z, then even though y says it is 
const
there isn't anything preventing x from assinging to the same string. So is 
it
right to say the y is const? Sure but it doesn't mean the contents of y 
won't
change. If COW was used the function foo in D would look like
void foo(char*x, char*y){x = x.dup;x[0] = 'a'; ... }
I would say the D foo is safer than the C++ foo.

Adding a readonly or const attribute will catch some errors at compile 
time but
I argue it would introduce complexity and have gocha's and corner cases 
that
make it less desirable than COW.


 What you appear to be saying here, is that the programmer can always write 
 the
 following:

 * (char *) 0x00000000 = 0;

uhh - I don't know where you got that from. I was giving an example in C++ 
where const might confuse a careless programmer. What does that have to do 
with a seg-v?

Ben, you trawled up an example where the callee had tried to be explicit about
its contract, but the caller deliberately (or recklessly) violated that by
passing the same array for both arguments. Yes! that could happen! Just as the
seg-v could 'happen'. We're not suggesting the compiler should try to eliminate
such behavior; so your example and the seg-v are, within this context,
equivalent.

My C++ example didn't circumvent anything. It is perfectly legal and 
cast-free C++.

It circumvented the contract of the callee, as defined by the types of its
parameters, using knowledge of the callee internals. Split some more hairs,
dude.


 I've been 
pretty turned off by some of the attitudes in these recent threads. 

You are, by no means, the only one.


probably due to my math background that I don't like stating opinions as 
facts or pushing one point of view. 

Yet, you do so with gay abandon ("And I doubt Walter is going to add a readonly
attribute"). 


 Where did that constructive attitude go to?

I'm not sure what you mean by that. 

There was a time, in the past, when you might perhaps have looked at how
readonly *could* or *might* be implemented in a valid, useful, and natural
manner. Whilst it is perfectly valid to take up an opposing view, it doesn't do
anyone any favours by loading the proposed notion with characteristics of a
failed - and by your own admission, vaguely related - implementation from some
other language.

I understand that the C++ "const" has left a bad taste in the mouths of many;
that should not colour the potential for D to be a better language than it
currently is. 

Chill.

Feb 09 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Wed, 9 Feb 2005 20:11:12 +0000 (UTC), Kris <Kris_member pathlink.com>  
wrote:
 In article <cudmqh$30rt$1 digitaldaemon.com>, Ben Hinkle says...
 For example, here's some C++ code where COW would arguably be more  
 safe
 than
 const:
 void foo(char*x, const char*y) { x[0] = 'a'; ... }
 Now if one calls foo(z,z) for a char*z, then even though y says it is
 const
 there isn't anything preventing x from assinging to the same string.  
 So is
 it
 right to say the y is const? Sure but it doesn't mean the contents of  
 y
 won't
 change. If COW was used the function foo in D would look like
 void foo(char*x, char*y){x = x.dup;x[0] = 'a'; ... }
 I would say the D foo is safer than the C++ foo.

 Adding a readonly or const attribute will catch some errors at compile
 time but
 I argue it would introduce complexity and have gocha's and corner  
 cases
 that
 make it less desirable than COW.


 What you appear to be saying here, is that the programmer can always  
 write
 the
 following:

 * (char *) 0x00000000 = 0;

 uhh - I don't know where you got that from. I was giving an example in  
 C++
 where const might confuse a careless programmer. What does that have to  
 do
 with a seg-v?

 Ben, you trawled up an example where the callee had tried to be explicit  
 about
 its contract, but the caller deliberately (or recklessly) violated that  
 by
 passing the same array for both arguments. Yes! that could happen! Just  
 as the
 seg-v could 'happen'. We're not suggesting the compiler should try to  
 eliminate
 such behavior; so your example and the seg-v are, within this context,
 equivalent.

 My C++ example didn't circumvent anything. It is perfectly legal and
 cast-free C++.

 It circumvented the contract of the callee, as defined by the types of  
 its
 parameters,

IMO the contract said, I won't mutate parameter 2, I _might_ mutate  
parameter 1. So it didn't violate it, it was just a risky gamble.

The programmer simply made a mistake by not recognising that.

It may be shortsighted to not realise that passing the same variable for  
both could cause odd behaviour, it is also possible it might not have.

It's in everyone's best interest for the compiler to do all it can to help  
find these, I don't think anyone here is disagreeing with that.

 using knowledge of the callee internals. Split some more hairs,
 dude.

It can still happen if the caller has no knowledge of the internals.
It's also possible passing the same param had no ill effect, due to the  
internals.

 I've been
 pretty turned off by some of the attitudes in these recent threads.

 You are, by no means, the only one.


 probably due to my math background that I don't like stating opinions as
 facts or pushing one point of view.

 Yet, you do so with gay abandon ("And I doubt Walter is going to add a  
 readonly
 attribute").

IMO that's an opinion, not stated as a fact.
I didn't feel Ben was 'pushing' one point of view, he was certainly  
arguing _his_ point of view.

 Where did that constructive attitude go to?

 I'm not sure what you mean by that.

 There was a time, in the past, when you might perhaps have looked at how
 readonly *could* or *might* be implemented in a valid, useful, and  
 natural
 manner. Whilst it is perfectly valid to take up an opposing view, it  
 doesn't do
 anyone any favours by loading the proposed notion with characteristics  
 of a
 failed - and by your own admission, vaguely related - implementation  
 from some
 other language.

 I understand that the C++ "const" has left a bad taste in the mouths of  
 many;
 that should not colour the potential for D to be a better language than  
 it
 currently is.

I am interested in a post detailing the evils of const in C++, because I  
am certainly not 100% versed on the subject, yet, I want to participate in  
this discussion because I hope that some sort of compiler checked readonly  
is possible.

Once we have said post we can discuss how it applies to D, because D is  
not exactly the same as C and it's possible the readonly idea is not the  
same as the const one.

Regan

Feb 09 2005

"Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:

"Regan Heath" <regan netwin.co.nz> wrote in message 
news:opslyc05jt23k2f5 ally...
 On Wed, 9 Feb 2005 20:11:12 +0000 (UTC), Kris 
 <Kris_member pathlink.com>  wrote:
 In article <cudmqh$30rt$1 digitaldaemon.com>, Ben Hinkle says...
 For example, here's some C++ code where COW would arguably be more 
 safe
 than
 const:
 void foo(char*x, const char*y) { x[0] = 'a'; ... }
 Now if one calls foo(z,z) for a char*z, then even though y says it 
 is
 const
 there isn't anything preventing x from assinging to the same 
 string.  So is
 it
 right to say the y is const? Sure but it doesn't mean the contents 
 of  y
 won't
 change. If COW was used the function foo in D would look like
 void foo(char*x, char*y){x = x.dup;x[0] = 'a'; ... }
 I would say the D foo is safer than the C++ foo.

 Adding a readonly or const attribute will catch some errors at 
 compile
 time but
 I argue it would introduce complexity and have gocha's and corner 
 cases
 that
 make it less desirable than COW.


 What you appear to be saying here, is that the programmer can 
 always  write
 the
 following:

 * (char *) 0x00000000 = 0;

 uhh - I don't know where you got that from. I was giving an example 
 in  C++
 where const might confuse a careless programmer. What does that have 
 to  do
 with a seg-v?

 Ben, you trawled up an example where the callee had tried to be 
 explicit  about
 its contract, but the caller deliberately (or recklessly) violated 
 that  by
 passing the same array for both arguments. Yes! that could happen! 
 Just  as the
 seg-v could 'happen'. We're not suggesting the compiler should try to 
 eliminate
 such behavior; so your example and the seg-v are, within this 
 context,
 equivalent.

 My C++ example didn't circumvent anything. It is perfectly legal and
 cast-free C++.

 It circumvented the contract of the callee, as defined by the types 
 of  its
 parameters,

 IMO the contract said, I won't mutate parameter 2, I _might_ mutate 
 parameter 1. So it didn't violate it, it was just a risky gamble.

 The programmer simply made a mistake but not recognising that.

 It may be shortsighted to not realise that passing the same variable 
 for  both could cause odd behaviour, it is also possible it might not 
 have.

 It's in everyones best interest for the compiler to do all it can to 
 help  find these, I don't think anyone here is disagreeing with that.

 using knowledge of the callee internals. Split some more hairs,
 dude.

 It can still happen if the caller has no knowledge of the internals.
 It's also possible passing the same param had no ill effect, due to 
 the  internals.

 I've been
 pretty turned off by some of the attitudes in these recent threads.

 You are, by no means, the only one.


 probably due to my math background that I don't like stating 
 opinions as
 facts or pushing one point of view.

 Yet, you do so with gay abandon ("And I doubt Walter is going to add 
 a  readonly
 attribute").

 IMO that's an opinion, not stated as a fact.
 I didn't feel Ben was 'pushing' one point of view, he was certainly 
 arguing _his_ point of view.

 Where did that constructive attitude go to?

 I'm not sure what you mean by that.

 There was a time, in the past, when you might perhaps have looked at 
 how
 readonly *could* or *might* be implemented in a valid, useful, and 
 natural
 manner. Whilst it is perfectly valid to take up an opposing view, it 
 doesn't do
 anyone any favours by loading the proposed notion with 
 characteristics  of a
 failed - and by your own admission, vaguely related - implementation 
 from some
 other language.

 I understand that the C++ "const" has left a bad taste in the mouths 
 of  many;
 that should not colour the potential for D to be a better language 
 than  it
 currently is.

 I am interested in a post detailing the evils of const in C++, because 
 I  am certainly not 100% versed on the subject, yet, I want to 
 participate in  this discussion because I hope that some sort of 
 compiler checked readonly  is possible.

This is highly contentious stuff. I am a *very big* fan of const, and 
think it is one of several ways in which C++ is manifestly superior to 
other languages, D included. IMO, almost all criticisms of const boil 
down to, with as much respect as one can possibly have in saying this, 
laziness and ignorance, or, for compiler vendors, the challenges of 
implementing (which even fans like me have to consider are not 
inconsiderable). The one exception to this is that Logical Constness + 
Multithreading is a dangerous mix: I've just written the next instalment 
of my "Flexible C++" column on this very issue - that's what Walter was 
alluding to in his post of a couple of hours ago - which should be out 
(http://www.cuj.com) sometime next week.

If you want to see some of the really powerful things you can do with 
const, then get your employer to get a copy of IC++, and digest away. 
:-)

Cheers

Matthew

Feb 09 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Thu, 10 Feb 2005 11:12:25 +1100, Matthew  
<admin stlsoft.dot.dot.dot.dot.org> wrote:
 "Regan Heath" <regan netwin.co.nz> wrote in message
 news:opslyc05jt23k2f5 ally...
 On Wed, 9 Feb 2005 20:11:12 +0000 (UTC), Kris
 <Kris_member pathlink.com>  wrote:
 In article <cudmqh$30rt$1 digitaldaemon.com>, Ben Hinkle says...
 For example, here's some C++ code where COW would arguably be more
 safe
 than
 const:
 void foo(char*x, const char*y) { x[0] = 'a'; ... }
 Now if one calls foo(z,z) for a char*z, then even though y says it
 is
 const
 there isn't anything preventing x from assinging to the same
 string.  So is
 it
 right to say the y is const? Sure but it doesn't mean the contents
 of  y
 won't
 change. If COW was used the function foo in D would look like
 void foo(char*x, char*y){x = x.dup;x[0] = 'a'; ... }
 I would say the D foo is safer than the C++ foo.

 Adding a readonly or const attribute will catch some errors at
 compile
 time but
 I argue it would introduce complexity and have gocha's and corner
 cases
 that
 make it less desirable than COW.


 What you appear to be saying here, is that the programmer can
 always  write
 the
 following:

 * (char *) 0x00000000 = 0;

 uhh - I don't know where you got that from. I was giving an example
 in  C++
 where const might confuse a careless programmer. What does that have
 to  do
 with a seg-v?

 Ben, you trawled up an example where the callee had tried to be
 explicit  about
 its contract, but the caller deliberately (or recklessly) violated
 that  by
 passing the same array for both arguments. Yes! that could happen!
 Just  as the
 seg-v could 'happen'. We're not suggesting the compiler should try to
 eliminate
 such behavior; so your example and the seg-v are, within this
 context,
 equivalent.

 My C++ example didn't circumvent anything. It is perfectly legal and
 cast-free C++.

 It circumvented the contract of the callee, as defined by the types
 of  its
 parameters,

 IMO the contract said, I won't mutate parameter 2, I _might_ mutate
 parameter 1. So it didn't violate it, it was just a risky gamble.

 The programmer simply made a mistake but not recognising that.

 It may be shortsighted to not realise that passing the same variable
 for  both could cause odd behaviour, it is also possible it might not
 have.

 It's in everyones best interest for the compiler to do all it can to
 help  find these, I don't think anyone here is disagreeing with that.

 using knowledge of the callee internals. Split some more hairs,
 dude.

 It can still happen if the caller has no knowledge of the internals.
 It's also possible passing the same param had no ill effect, due to
 the  internals.

 I've been
 pretty turned off by some of the attitudes in these recent threads.

 You are, by no means, the only one.


 probably due to my math background that I don't like stating
 opinions as
 facts or pushing one point of view.

 Yet, you do so with gay abandon ("And I doubt Walter is going to add
 a  readonly
 attribute").

 IMO that's an opinion, not stated as a fact.
 I didn't feel Ben was 'pushing' one point of view, he was certainly
 arguing _his_ point of view.

 Where did that constructive attitude go to?

 I'm not sure what you mean by that.

 There was a time, in the past, when you might perhaps have looked at
 how
 readonly *could* or *might* be implemented in a valid, useful, and
 natural
 manner. Whilst it is perfectly valid to take up an opposing view, it
 doesn't do
 anyone any favours by loading the proposed notion with
 characteristics  of a
 failed - and by your own admission, vaguely related - implementation
 from some
 other language.

 I understand that the C++ "const" has left a bad taste in the mouths
 of  many;
 that should not colour the potential for D to be a better language
 than  it
 currently is.

 I am interested in a post detailing the evils of const in C++, because
 I  am certainly not 100% versed on the subject, yet, I want to
 participate in  this discussion because I hope that some sort of
 compiler checked readonly  is possible.

 This is highly contentious stuff. I am a *very big* fan of const, and
 think it is one of several ways in which C++ is manifestly superior to
 other languages, D included.

Sorry, I wasn't aware of this position.

 IMO, almost all criticisms of const boil
 down to, with as much respect as one can possibly having saying this,
 lazyness and ignorance, or, for compiler vendors, the challenges of
 implementing (which even fans like me have to consider are not
 inconsiderable).

I am going to have to do some reading then.

 The one exception to this is that Logical Constness +
 Multithreading is a dangerous mix: I've just written the next instalment
 of my "Flexible C++" column on this very issue - that's what Walter was
 alluding to in his post of a couple of hours ago - which should be out
 (http://www.cuj.com) sometime next week.

Ahh, excellent.

 If you want to see some of the really powerful things you can do with
 const, then get your employer to get a copy of IC++, and digest away.
 :-)

Regan

Feb 09 2005

"Walter" <newshound digitalmars.com> writes:

"Ben Hinkle" <Ben_member pathlink.com> wrote in message
news:cud2ea$2aot$1 digitaldaemon.com...
Right now there's only a flimsy and vague 'trust' mechanism in place, and even
that only applies to folk who (a) understand what CoW means, and (b) fully
understand where the content they just received actually came from -- the prior
example might be buried deep under a number of layers, or could be provided
without source-code (heavens!)

 [snip]

 I completely agree COW relies on the programmers knowing what they are doing.
 The tradeoffs are:
 1) assume the programmers know about COW and follow it (not always true), or
 2) dup like crazy and watch performance suffer, or
 3) add a const/readonly attribute to help with some common gotchas

 None of these are perfect. Const in C++ has some issues that I assume readonly
 would share. Plus to learn about readonly and all the gotchas one might have to
 invest just as much effort as learning COW - in fact one might argue that
 learning COW is much simpler than learning about how to use readonly/const.
 For example, here's some C++ code where COW would arguably be more safe than
 const:
 void foo(char*x, const char*y) { x[0] = 'a'; ... }
 Now if one calls foo(z,z) for a char*z, then even though y says it is const
 there isn't anything preventing x from assigning to the same string. So is it
 right to say the y is const? Sure but it doesn't mean the contents of y won't
 change. If COW was used the function foo in D would look like
 void foo(char*x, char*y){x = x.dup;x[0] = 'a'; ... }
 I would say the D foo is safer than the C++ foo.

 Adding a readonly or const attribute will catch some errors at compile time but
 I argue it would introduce complexity and have gotchas and corner cases that
 make it less desirable than COW.

Whatever the right solution is, the C++ notion of "const" is the wrong
solution. It's even worse than you showed: in multithreaded apps, assuming
that const means it won't change is a disaster. So-called 'const' data can
change out from under you in legal, standard conforming C++ programs. COW is
a much safer and more natural technique to use with multithreading.

Feb 09 2005

"Walter" <newshound digitalmars.com> writes:

"Ben Hinkle" <Ben_member pathlink.com> wrote in message
news:cubkj0$11qs$1 digitaldaemon.com...
 That's what COW is all about. The only downside of COW is that it is not
 enforced by the language. But then I'd argue the performance (and simplicity)
 upside of COW outweigh the downside. But it is a judgement call for sure. I
 think it would take a strong case to convince Walter at this point to abandon
 COW. I think the lack of documentation about string literals contributed to the
 specific examples the OP ran into. D's behavior didn't surprise me at all given
 the C heritage. In fact I would have been surprised if it didn't follow C by
 default.

I'll give a fuller answer later, but I know some languages where the
language implements COW for you. They get terribly inefficient very quickly
once you start doing some heavy string manipulation. Doing COW efficiently
means using it as a convention rather than a language enforced dogma.

Feb 09 2005

Kris <Kris_member pathlink.com> writes:

In article <cudio9$2sog$1 digitaldaemon.com>, Walter says...
"Ben Hinkle" <Ben_member pathlink.com> wrote in message
news:cubkj0$11qs$1 digitaldaemon.com...
 That's what COW is all about. The only downside of COW is that it is not
 enforced by the language. But then I'd argue the performance (and simplicity)
 upside of COW outweigh the downside. But it is a judgement call for sure. I
 think it would take a strong case to convince Walter at this point to abandon
 COW. I think the lack of documentation about string literals contributed to the
 specific examples the OP ran into. D's behavior didn't surprise me at all given
 the C heritage. In fact I would have been surprised if it didn't follow C by
 default.

I'll give a fuller answer later, but I know some languages where the
language implements COW for you. They get terribly inefficient very quickly
once you start doing some heavy string manipulation. Doing COW efficiently
means using it as a convention rather than a language enforced dogma.

I'll bite, Walter. 

Whilst eagerly awaiting your expansion on this, you should note that *no-one* is
suggesting the language implement CoW on one's behalf.

That's a patently ridiculous notion for a language like D -- so let's not even go
there; please.

We're simply looking for a way whereby the programmer is steered in the right
direction by the compiler. Nothing more. There's a vast area of uncharted
territory between "a language enforced dogma" and a "language directive".

- Kris

Feb 09 2005

"Walter" <newshound digitalmars.com> writes:

"Kris" <Kris_member pathlink.com> wrote in message
news:cudlr5$2vkh$1 digitaldaemon.com...
 Whilst eagerly awaiting your expansion on this, you should note that *no-one* is
 suggesting the language implement CoW on one's behalf.

Good. I just want to make sure that idea is quite dead <g>.

Feb 09 2005

Derek <derek psych.ward> writes:

On Wed, 9 Feb 2005 09:50:33 -0800, Walter wrote:

 "Ben Hinkle" <Ben_member pathlink.com> wrote in message
 news:cubkj0$11qs$1 digitaldaemon.com...
 That's what COW is all about. The only downside of COW is that it is not
 enforced by the language. But then I'd argue the performance (and simplicity)
 upside of COW outweigh the downside. But it is a judgement call for sure. I
 think it would take a strong case to convince Walter at this point to abandon
 COW. I think the lack of documentation about string literals contributed to the
 specific examples the OP ran into. D's behavior didn't surprise me at all given
 the C heritage. In fact I would have been surprised if it didn't follow C by
 default.

 
 I'll give a fuller answer later, but I know some languages where the
 language implements COW for you. They get terribly inefficient very quickly
 once you start doing some heavy string manipulation. Doing COW efficiently
 means using it as a convention rather than a language enforced dogma.

I suspect you have had much more experience in this area than I have,
however one language that I use constantly, Euphoria, has CoW built-in to
it. It is an interpreted language and still takes only 5 times longer to run
than equivalent D programs. So I guess that there are efficient and
inefficient ways of implementing built-in CoW.

-- 
Derek
Melbourne, Australia

Feb 09 2005

"Walter" <newshound digitalmars.com> writes:

"Derek" <derek psych.ward> wrote in message
news:1h90azwbnucp2$.t14ekot1gd8i$.dlg 40tude.net...
 I'll give a fuller answer later, but I know some languages where the
 language implements COW for you. They get terribly inefficient very quickly
 once you start doing some heavy string manipulation. Doing COW efficiently
 means using it as a convention rather than a language enforced dogma.

 I suspect you have had much more experience in this area than I have,
 however one language that I use constantly, Euphoria, has CoW built-in to
 it. It is an interpreted language and still runs at only 5 times longer
 than equivalent D programs. So I guess that there are efficient and
 inefficient ways of implementing built-in CoW.

I know nothing about Euphoria, but try a loop over a string that reverses
the string in place (or sorts the characters, or changes case on them).
Language implemented COW features tend to reallocate/copy the string once
for each character.
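
To make the cost difference concrete, here is a rough C++ sketch, not a model of Euphoria or any real runtime. It assumes a deliberately naive "language-implemented" COW that copies the whole string before every element write, and compares it with the convention of taking one private copy and then mutating in place. The function names and the copy counter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Naive enforced COW: the runtime cannot tell that the buffer is already
// private, so every element write copies the whole string first.
// s is taken by value; only the per-write copies are counted.
std::string reverse_naive_cow(std::string s, std::size_t& copies) {
    const std::size_t n = s.size();
    for (std::size_t i = 0; i < n / 2; ++i) {
        std::string t1 = s;          // copy before the first write
        ++copies;
        t1[i] = s[n - 1 - i];
        std::string t2 = t1;         // copy again before the second write
        ++copies;
        t2[n - 1 - i] = s[i];        // s still holds the old value at i
        s = std::move(t2);
    }
    return s;
}

// COW as a convention: duplicate once, then work in place on the copy.
std::string reverse_dup_once(const std::string& s, std::size_t& copies) {
    std::string own = s;             // the single "dup"
    ++copies;
    std::reverse(own.begin(), own.end());
    assert(own.size() == s.size());
    return own;
}
```

For a six-character string the naive version makes six whole-string copies (two per swap) while the convention makes one, so the gap grows with the string length.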

Feb 09 2005

Derek Parnell <derek psych.ward> writes:

On Wed, 9 Feb 2005 14:46:06 -0800, Walter wrote:

 "Derek" <derek psych.ward> wrote in message
 news:1h90azwbnucp2$.t14ekot1gd8i$.dlg 40tude.net...
 I'll give a fuller answer later, but I know some languages where the
 language implements COW for you. They get terribly inefficient very quickly
 once you start doing some heavy string manipulation. Doing COW efficiently
 means using it as a convention rather than a language enforced dogma.

 I suspect you have had much more experience in this area than I have,
 however one language that I use constantly, Euphoria, has CoW built-in to
 it. It is an interpreted language and still runs at only 5 times longer
 than equivalent D programs. So I guess that there are efficient and
 inefficient ways of implementing built-in CoW.

 
 I know nothing about Euphoria, but try a loop over a string that reverses
 the string in place (or sorts the characters, or changes case on them).
 Language implemented COW features tend to reallocate/copy the string once
 for each character.

Out of curiosity, I did this. I created a CoW version of a reverse function
for both D and Euphoria.

For a string of 780 utf32 characters, Euphoria was 57 times slower than D
to reverse the string in place.

I next quadrupled the string length and then Euphoria was 24 times slower than
D. So the CoW aspect of Euphoria is not related to the number of characters to
process.

But this aside, the current CoW policy in D seems to be okay as it gives
the coder/designer the best flexibility, at the cost of using one's brains
;-) 


If anyone is interested, the code for both can be found at 

  http://www.users.bigpond.com/ddparnell/reverse_test.zip

-- 
Derek
Melbourne, Australia
10/02/2005 3:24:57 PM

Feb 10 2005

"Walter" <newshound digitalmars.com> writes:

"Derek Parnell" <derek psych.ward> wrote in message
news:sdlwwvjw3ak2.vdhwdon81to1.dlg 40tude.net...
 I next quadrupled the string length and then Euphoria was 24 times slower than
 D. So the CoW aspect of Euphoria is not related to the number of characters to
 process.

Perhaps it uses reference counted strings.
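
As a sketch of why reference counting would explain that result: with a count on the buffer, a write only needs to copy while the buffer is shared, so an in-place reverse copies at most once regardless of length. The C++ below is an illustrative, single-threaded toy (the class RcString and its members are hypothetical, and nothing here is known about Euphoria's actual implementation). It uses std::shared_ptr::use_count as the reference count, which is not a reliable test across threads.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Toy reference-counted COW string: copies share one buffer until a write.
class RcString {
public:
    explicit RcString(std::string s)
        : buf_(std::make_shared<std::string>(std::move(s))) {}

    std::size_t size() const { return buf_->size(); }
    char get(std::size_t i) const { return (*buf_)[i]; }
    const std::string& str() const { return *buf_; }
    std::size_t copies() const { return copies_; }

    void set(std::size_t i, char c) {
        assert(i < buf_->size());
        if (buf_.use_count() > 1) {          // shared: take a private copy
            buf_ = std::make_shared<std::string>(*buf_);
            ++copies_;
        }
        (*buf_)[i] = c;                      // sole owner: write in place
    }

private:
    std::shared_ptr<std::string> buf_;
    std::size_t copies_ = 0;                 // buffer copies made by this handle
};

// Reverses through get/set only, as user code in such a language would.
void reverse_in_place(RcString& s) {
    const std::size_t n = s.size();
    for (std::size_t i = 0; i < n / 2; ++i) {
        const char lo = s.get(i);
        const char hi = s.get(n - 1 - i);
        s.set(i, hi);
        s.set(n - 1 - i, lo);
    }
}
```

Reversing a handle that shares its buffer triggers exactly one copy on the first write; every later write finds the count at one and goes straight to the buffer.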

Feb 10 2005
