digitalmars.D - Another prayer for invariant strings

reply Robert Fraser <fraserofthenight gmail.com> writes:
Invariant strings have been discussed before (briefly) in discussions of
constness, however I wish to bring up the subject again more directly.

The "string" alias as it is now (in D 2.0) is an odd beast. The problem is that
it is invariant(char)[] instead of invariant(char[]), so that while the
characters themselves are invariant, the array is mutable. This has two main
problems:

1. It's confusing. There have been quite a few topics both in this newsgroup
and in digitalmars.D.learn about how exactly to use the 2.0 string alias and
where it's immutable/where it's not.

2. Performance. While writing my own code, I can pretend "string" is invariant
(or use my own invariant(char[]) alias), but when passing to, or receiving code
from library functions, this is not possible. This means that in each of these
situations I must take two performance-draining precautionary measures:
i. Duplicate the string every time it's passed in or out of my code.
ii. Synchronize multithreaded access to strings/acquire locks/etc.

Invariant strings have precedent: they're used in Java, .NET, Perl, Python,
Ruby and quite a few other languages. And for when multiple string operations
are going down, there's always char[] and .idup to fall back on, which are far
better than Java's StringBuffer, etc.

So, please, Walter... consider Andrei's proposal and make "string" an alias to
invariant(char[]). It'll make a lot of happiness happen.
Jul 12 2007
next sibling parent reply Christian Kamm <kamm.incasoftware shift-at-left-and-remove-this.de> writes:
 The problem is
 that it is invariant(char)[] instead of invariant(char[])

I was under the impression that invariant(char)[] was the same type as invariant(char[]), as invariant/const never apply to the declaration itself? So

invariant(int) == int,
invariant(int*) == invariant(int)*
invariant(int**) == invariant(int*)* != invariant(int)**

Or is that incorrect?

Christian
Jul 12 2007
parent reply torhu <fake address.dude> writes:
Christian Kamm wrote:
 The problem is
 that it is invariant(char)[] instead of invariant(char[])

I was under the impression that invariant(char)[] was the same type as invariant(char[]), as invariant/const never apply to the declaration itself? So

invariant(int) == int,
invariant(int*) == invariant(int)*
invariant(int**) == invariant(int*)* != invariant(int)**

Or is that incorrect?

That's my understanding too, but I'm a bit confused by the fact that Walter's examples use both variants.
Jul 12 2007
parent reply Robert Fraser <fraserofthenight gmail.com> writes:
Oh, sorry, guess I was quite wrong. So does this mean I don't need to be making
defensive copies of every string?

Jul 13 2007
parent Christian Kamm <kamm.incasoftware shift-at-left-and-remove-this.de> writes:
 So does this mean I don't need to be
 making defensive copies of every string?

Yep, dynamic arrays behave very much like pointers or classes:

void foo(const(char)[] str)
{
    // valid since str is not final
    // only changes local copy of array pointer and length
    str = "abc";

    // illegal! can't change the data of the array
    str[] = "abc";
}
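
Editor's sketch of the same point, written for a current D compiler rather than the 2007 one discussed in this thread. It assumes today's spelling, where `invariant` has become `immutable` and `string` is immutable(char)[]; the function name `rebind` is made up for illustration. It is meant only to show that rebinding a slice parameter leaves the caller's slice alone, while writes through the slice are rejected at compile time.

```d
import std.stdio : writeln;

// Hypothetical helper: rebinds only its own copy of the slice
// (pointer + length); the caller's variable is not affected.
void rebind(const(char)[] str)
{
    str = "abc";
    // str[] = "abc";   // would not compile: the elements are const
}

void main()
{
    string s = "original";
    rebind(s);
    assert(s == "original");   // caller's slice is unchanged

    // Writes through a const(char)[] slice are compile-time errors.
    static assert(!__traits(compiles, { const(char)[] t = "xyz"; t[] = "abc"; }));
    static assert(!__traits(compiles, { const(char)[] t = "xyz"; t[0] = 'a'; }));

    writeln(s);
}
```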
Jul 13 2007
prev sibling next sibling parent 0ffh <spam frankhirsch.net> writes:
Robert Fraser wrote:
 2. Performance. While writing my own code, I can pretend "string" is
 invariant (or use my own invariant(char[]) alias), but when passing to,
 or receiving code from library functions, this is not possible. This
 means that in each of these situations I must take two,
 performance-draining precautionary measures: i. Duplicate the string
 every time it's passed in or out of my code. ii.Synchronize
 multithreaded access to strings/acquire locks/etc.

I don't quite see this point. The way I understand D2.0 strings (which may be like so much wrong, but still), with invariant(char)[] you can be sure the characters will never change, so there is totally no reason to duplicate that string. Only the pointer to the characters and the length information are mutable.
 Invariant strings have precedent: they're used in Java, .NET, Perl,
 Python, Ruby and quite a few other languages.

In my book, precedent in itself is no argument - except for lemmings. ;-)

Regards, Frank
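
Editor's sketch of the "no need to duplicate" argument, again for a current D compiler (where `invariant` is spelled `immutable`). The `Label` struct and both factory functions are hypothetical names invented for this example. The idea is that a `string` can be stored as-is because its characters can never change, whereas a possibly mutable buffer needs a single `.idup`.

```d
import std.stdio : writeln;

// Hypothetical holder type, for illustration only.
struct Label
{
    string text;   // immutable(char)[]: data fixed, slice itself rebindable
}

// Takes a string: the characters cannot change, so no copy is made.
Label makeLabel(string s)
{
    return Label(s);
}

// Takes a possibly mutable buffer: copy once so later writes cannot leak in.
Label makeLabelFrom(const(char)[] s)
{
    return Label(s.idup);
}

void main()
{
    string name = "widget";
    auto a = makeLabel(name);
    assert(a.text.ptr is name.ptr);   // same memory, nothing duplicated

    char[] buf = "gadget".dup;
    auto b = makeLabelFrom(buf);
    buf[0] = 'G';                     // mutate the original buffer afterwards
    assert(b.text == "gadget");       // the label kept its own copy

    writeln(a.text, " ", b.text);
}
```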
Jul 13 2007
prev sibling parent reply Regan Heath <regan netmail.co.nz> writes:
(disclaimer, I have done only the testing shown at the end of this post)

Robert Fraser wrote:
 Invariant strings have been discussed before (briefly) in discussions
 of constness, however I wish to bring up the subject again more
 directly.
 
 The "string" alias as it is now (in D 2.0) is an odd beast. The
 problem is that it is invariant(char)[] instead of invariant(char[]),
 so that while the characters themselves are invariant, the array is
 mutable. 

This makes sense if you think about it from the compiler's point of view. It has placed the characters themselves in ROM, but the array reference is in RAM, so its pointer and length can change. So, this is valid:

invariant(char)[] a = "foo";
invariant(char)[] b = "bar";
b = a;

But these are invalid:

char[] p;
a[0] = 'a';  //for any given rvalue
b[] = a[];   //and slicing variants
p = a;       //p cannot point to invariant(char)

If you want to prevent the reference from changing, make it 'final', e.g.

final invariant(char)[] a;

 This has two main problems:
 
 1. It's confusing. There have been quite a few topics both in this
 newsgroup and in digitalmars.D.learn about how exactly to use the 2.0
 string alias and where it's immutable/where it's not.

I won't argue as to whether it's confusing, but to me it seems the basic concept is: "A 'string' reference isn't immutable (or rather 'final'), but its data is (immutable)".

 2. Performance. While writing my own code, I can pretend "string" is
 invariant (or use my own invariant(char[]) alias), but when passing
 to, or receiving code from library functions, this is not possible.

When you pass a string to a function, that function gets a /copy/ of the reference. So there is technically no need for the copied reference to be invariant (or rather 'final'): changes to the reference in the function *do not* propagate back to the caller.

If, however, the parameter is 'ref', changes to the reference do propagate back to the caller. In that case, if your reference is final, DMD will report an error; see the test cases below. In short, if you use 'final' on your strings, then even if you call a library function which takes a 'ref', the compiler will protect you.

 This means that in each of these situations I must take two,
 performance-draining precautionary measures: i. Duplicate the string
 every time it's passed in or out of my code. ii.Synchronize
 multithreaded access to strings/acquire locks/etc.

You do not need to sync access to invariant data, but you may need to sync access to an array reference (if your code or library code might change it). To prevent changes, make your strings final.

 Invariant strings have precedent: they're used in Java, .NET, Perl,
 Python, Ruby and quite a few other languages. And for when multiple
 string operations are going down, there's always char[] and .idup to
 fall back on, which are far better than Java's StringBuffer, etc.

Does Java prevent you re-assigning an invariant string reference? If so, are they implicitly 'final' then?
 So, please, Walter... consider Andrei's proposal and make "string" an
 alias to invariant(char[]). It'll make a lot of happiness happen.

I think a greater understanding of the current system is required before we start opting for changes.

- Regan Heath

Test cases:

void main()
{
    invariant(char)[] p1 = "one";
    invariant(char[]) p2 = "two";
    final invariant(char[]) p3 = "three";
    char[] p4 = "four".dup;
    const(char)[] p5 = "five";
    const(char[]) p6 = "six";

    //p1[0] = 'a'; //Error: p1[0] is not mutable
    //p2[0] = 'a'; //Error: p2[0] is not mutable
    //p3[0] = 'a'; //Error: p3[0] is not mutable
    p4[0] = 'a'; //ok
    //p5[0] = 'a'; //Error: p5[0] is not mutable
    //p6[0] = 'a'; //Error: p6[0] is not mutable

    //p1[] = p2[]; //Error: slice p1[] is not mutable
    //p2[] = p1[]; //Error: slice p2[] is not mutable
    //p3[] = p1[]; //Error: slice p3[] is not mutable
    p4[] = p1[]; //ok
    //p5[] = p1[]; //Error: slice p5[] is not mutable
    //p6[] = p1[]; //Error: slice p6[] is not mutable

    p1 = p2; //ok
    p2 = p1; //ok
    //p3 = p1; //variable invariant.p3 cannot modify final/const/invariant variable 'p3'
    //p4 = p1; //Error: cannot implicitly convert expression (p1) of type invariant(char)[] to char[]
    p5 = p1; //ok
    p6 = p1; //ok

    foo(p3); //variable invariant.main.p3 cannot modify final/const/invariant variable 'p3'
}

/*
void foo(final invariant(char)[] param)
{
    //param = "test"; //variable invariant.foo.param cannot modify final/const/invariant variable 'param'
}
*/

void foo(ref invariant(char)[] param)
{
    param = "test"; //variable invariant.foo.param cannot modify final/const/invariant variable 'param'
}
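
Editor's sketch of the by-value versus 'ref' distinction described in this post, written for a current D compiler and not the 2007 release tested above. Two assumptions to flag: `invariant` is now spelled `immutable`, and the 'final' storage class for variables no longer exists, so an `immutable string` variable (head and data both fixed) stands in for it here. The names `byValue` and `byRef` are made up for the example.

```d
import std.stdio : writeln;

// Hypothetical helpers: one takes a copy of the slice, one takes it by reference.
void byValue(string param) { param = "test"; }    // rebinds the local copy only
void byRef(ref string param) { param = "test"; }  // rebinds the caller's slice

void main()
{
    string s = "one";
    byValue(s);
    assert(s == "one");    // change did not propagate
    byRef(s);
    assert(s == "test");   // change propagated through 'ref'

    // Stand-in for the old 'final': the variable itself is immutable too.
    immutable string fixed = "three";
    static assert(!__traits(compiles, { fixed = "x"; }));  // cannot rebind
    static assert(!__traits(compiles, byRef(fixed)));      // cannot pass by mutable ref
    byValue(fixed);                                        // passing a copy is fine
    assert(fixed == "three");

    writeln(s, " ", fixed);
}
```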
Jul 13 2007
parent Robert Fraser <fraserofthenight gmail.com> writes:
Oh, didn't see your message. That's awesome, thanks! No, I didn't want the
references to be final, just the data. Basically, I want to ensure that
functions I call won't mess around with my data.

Thanks!
All the best,
Fraser


Jul 13 2007