
digitalmars.D - Is all this Invarient **** er... stuff, premature optimisation?

reply p9e883002 sneakemail.com writes:
Hi all,

'scuse me for not being familiar with previous or ongoing discussion on this 
subject, but I'm just coming back to D after a couple of years away.

I have some strings read in from external source that I need to convert to 
uppercase. A quick look at Phobos and I find std.string has a toupper method.

import std.stdio;
import std.string;

int main( char[][] args ) {
    char[] a = args[ 0 ].toupper();
    writefln( a );
    return 0;
}

c:\dmd\test>dmd junk.d
junk.d(5): function std.string.toupper (invariant(char)[]) does not match parameter types (char[])
junk.d(5): Error: cannot implicitly convert expression (args[0u]) of type char[] to invariant(char)[]
junk.d(5): Error: cannot implicitly convert expression (toupper(cast(invariant(char)[])(args[0u]))) of type invariant(char)[] to char[]

Hm. Okey dokey.

import std.stdio;
import std.string;

int main( char[][] args ) {
    char[] a = ( cast(invariant(char)[]) args[ 0 ] ).toupper();
    writefln( a );
    return 0;
}

junk.d(5): Error: cannot implicitly convert expression (toupper(cast(invariant(char)[])(args[0u]))) of type invariant(char)[] to char[]

Shoulda known :(

import std.stdio;
import std.string;

int main( char[][] args ) {
    string a = ( cast(invariant(char)[]) args[ 0 ] ).toupper();
    writefln( a );
    return 0;
}

c:\dmd\test>dmd junk.d

c:\dmd\test>junk
C:\DMD\TEST\JUNK.EXE

Great! Now I need to replace the bit in the middle:

import std.stdio;
import std.string;

int main( char[][] args ) {
    string a = ( cast(invariant(char)[]) args[ 0 ] ).toupper();
    a[ 2 .. 4 ] = "XXX";
    writefln( a );
    return 0;
}

c:\dmd\test>dmd junk.d
junk.d(6): Error: slice a[cast(uint)2..cast(uint)4] is not mutable

Wha..? What's the point in having slices if I can't use them?

import std.stdio;
import std.string;

int main( char[][] args ) {
    char[] a = cast(char[]) ( cast(invariant(char)[]) args[ 0 ] ).toupper();
    a[ 2 .. 4 ] = "XXX";
    writefln( a );
    return 0;
}

Finally, it works. But can you see what's going on in line 5 amongst all that 
casting? Cos I sure can't.

So, I read that all this invariant stuff is about efficiency. For whom?
Must be the compiler because it sure ain't about programmer efficiency.

Ah. Maybe I'm meant to ignore the beauty of slices and use strings and method 
calls for everything?

import std.stdio;
import std.string;

int main( string[] args ) {
    string a = args[ 0 ].toupper();
    a.replace(  a[ 2 .. 4 ], "XXX" );
    writefln( a );
    return 0;
}

Compiles clean and runs:

c:\dmd\test>dmd junk.d

c:\dmd\test>junk
C:\DMD\TEST\JUNK.EXE

But does nothing! 

import std.stdio;
import std.string;

int main( string[] args ) {
    string a = args[ 0 ].toupper();
    a = a.replace(  a[ 2 .. 4 ], "XXX" );
    writefln( a );
    return 0;
}

c:\dmd\test>dmd junk.d

c:\dmd\test>junk
C:XXXMD\TEST\JUNK.EXE

Finally, it runs. But at what cost? The 'immutable' a has ended up being
mutated. I still had to specify the slice, but I had to call another method
to actually do the deed. 

Of course, a wasn't really mutated. Instead, args[0] was copied and then 
mutated and labelled a. Then a was copied and mutated and reassigned the 
mutated copy. 

So, that's two copies of the string, plus a slice, plus an extra method call to 
achieve what used to be achievable in place on the original string. Which is
now immutable, but I'll never need it again. 

Of course, on these short 1-off strings it doesn't matter a hoot. But when the 
strings are 200 to 500 characters a pop and there are 20,000,000 of them. It 
matters.
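
A minimal sketch of the in-place version the post is asking for, written against current D2 rather than the 2008 compiler shown above. It assumes ASCII input and uses `std.ascii.toUpper` on each character; the helper name `upperAndPatch` and its parameters are hypothetical. Note that a slice assignment requires both sides to have the same length.

```d
import std.ascii : toUpper;

/// Uppercases ASCII letters of `buf` in place, then overwrites
/// buf[start .. start + patch.length] with `patch`. No allocation.
void upperAndPatch(char[] buf, size_t start, const(char)[] patch)
{
    foreach (ref c; buf)
        c = toUpper(c); // std.ascii.toUpper leaves non-letters unchanged

    // Slice assignment copies element by element; lengths must match.
    buf[start .. start + patch.length] = patch[];
}
```

The caller decides where the one copy happens, for example `char[] buf = args[0].dup;` followed by `upperAndPatch(buf, 2, "XXX");`.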

Did I suggest this was an optimisation?

Whatever immutability-purity Kool-Aid you've been drinking, please go back to 
coke. And give us usable libraries and sensible implicit conversions. Cos this
sucks bigtime.

b.
Apr 27 2008
next sibling parent reply "Simen Kjaeraas" <simen.kjaras gmail.com> writes:
<p9e883002 sneakemail.com> wrote:

 Of course, a wasn't really mutated. Instead, args[0] was copied and then
 mutated and labelled a. Then a was copied and mutated and reassigned the
 mutated copy.

 So, that's two copies of the string, plus a slice, plus an extra method
 call to achieve what used to be achievable in place on the original string.
 Which is now immutable, but I'll never need it again.

 Of course, on these short 1-off strings it doesn't matter a hoot. But
 when the strings are 200 to 500 characters a pop and there are 20,000,000 of
 them. It matters.

 Did I suggest this was an optimisation?

 Whatever immutability-purity Kool-Aid you've been drinking, please go
 back to coke. And give us usable libraries and sensible implicit conversions.
 Cos this sucks bigtime.

 b.

Is this what you wanted to write?

int main(string[] args)
{
    char[] a = cast(char[])args[0];
    a[2..5] = "XXX";
    writefln(a);
    return 0;
}

This compiles and runs, and seems to do what you describe. Sure, there's a
cast there, but it's not all that bad, is it?
Apr 27 2008
next sibling parent reply "Simen Kjaeraas" <simen.kjaras gmail.com> writes:
On Mon, 28 Apr 2008 02:14:19 +0200, Simen Kjaeraas
<simen.kjaras gmail.com> wrote:

 Is this what you wanted to write?

 int main(string[] args)
 {
    char[] a = cast(char[])args[0];
    a[2..5] = "XXX";
    writefln(a);
    return 0;
 }

 This compiles and runs, and seems to do what you describe. Sure, there's a
 cast there, but it's not all that bad, is it?

Sorry, forgot the .toupper() call there. Should be

    char[] a = cast(char[])args[0].toupper();

-- Simen
Apr 27 2008
parent reply p9e883002 sneakemail.com writes:
On Mon, 28 Apr 2008 02:28:23 +0200, "Simen Kjaeraas" 
<simen.kjaras gmail.com> wrote:
 On Mon, 28 Apr 2008 02:14:19 +0200, Simen Kjaeraas
 <simen.kjaras gmail.com> wrote:

 Is this what you wanted to write?

 int main(string[] args)
 {
    char[] a = cast(char[])args[0];
    a[2..5] = "XXX";
    writefln(a);
    return 0;
 }

 This compiles and runs, and seems to do what you describe. Sure, there's a
 cast there, but it's not all that bad, is it?

 Sorry, forgot the .toupper() call there. Should be

    char[] a = cast(char[])args[0].toupper();

 -- Simen

Okay, you got around the first cast by using

int main( string[] ) {

So now you want to lowercase it again:

import std.stdio;
import std.string;

int main( string[] args) {
    char[] a = cast(char[])args[0].toupper();
    a[2..5] = "XXX";
    a = a.tolower;
    writefln(a);
    return 0;
}

c:\dmd\test>dmd junk.d
junk.d(7): Error: no property 'tolower' for type 'char[]'
junk.d(7): Error: cannot implicitly convert expression (1) of type int to char[]
junk.d(7): Error: cannot cast int to char[]
junk.d(7): Error: integral constant must be scalar type, not char[]

So, cast a back to being a string, so that we can call tolower() on it and then cast the copied mutated string back to a char[]:

import std.stdio;
import std.string;

int main( string[] args) {
    char[] a = cast(char[])args[0].toupper();
    a[2..5] = "XXX";
    a = cast(char[]) ( ( cast(string)a ).tolower );
    writefln(a);
    return 0;
}

c:\dmd\test>dmd junk.d
junk.d(7): Error: no property 'tolower' for type 'invariant(char)[]'
junk.d(7): Error: cannot cast int to char[]
junk.d(7): Error: integral constant must be scalar type, not char[]
junk.d(7): Error: cannot cast int to char[]
junk.d(7): Error: integral constant must be scalar type, not char[]
junk.d(7): Error: cannot implicitly convert expression (0) of type int to char[]
junk.d(7): Error: cannot cast int to char[]
junk.d(7): Error: integral constant must be scalar type, not char[]

Nope. That don't work.

import std.stdio;
import std.string;

int main( string[] args) {
    char[] a = cast(char[])args[0].toupper();
    a[2..5] = "XXX";
    a = cast(char[])tolower( cast(string)a );
    writefln(a);
    return 0;
}

Finally. It works.

Summary: If I want to be able to use lvalue slice operations on 'strings' (for efficiency) I have to have them as char[]. If I want to be able to use std.string methods on those same strings, I have to cast them to invariant(char)[] and the results back to char[], which involves at least one copy operation, and probably two.

And the invariant-ness of the string library is done "for efficiency"?

Cheers, b.
Apr 27 2008
parent reply "Me Here" <p9e883002 sneakemail.com> writes:
Janice Caron wrote:

 2008/4/28  <p9e883002 sneakemail.com>:
  import std.string;
 
  int main( string[] args) {
     char[] a = cast(char[])args[0].toupper();

 **** UNDEFINED BEHAVIOR ****
 (1) args might be placed in a hardware-locked read-only segment. Then the following line would fail.
 (2) There might be other pointers to the string, which expect it never to change.
     a[2..5] = "XXX";
     a = cast(char[])tolower( cast(string)a );
     writefln(a);
     return 0;
  }
 
  Finally. It works.

 But not necessarily on all architectures, because of the undefined behavior. This is how you do it without undefined behavior.

     import std.string;

     int main( string[] args) {
         string a = args[0].toupper();
         a = a[0..2] ~ "XXX" ~ a[5..$];
         a = a.tolower();
         writefln(a);
         return 0;
     }

Ack! That's horrible. Instead of using the information I have, the offset and length of the slice I want to manipulate, I have to derive two offset/length pairs to the bits I do not want to do anything to.

1) Whatever happened to polymorphism? E.g. why can't the standard string library recognise that I, as the programmer, know what I need to do to my data? It's my job. So, if I assign the results of a string library function/method to a mutable variable (Just a variable really. An invariant variable is a constant!), then it should be possible (*IS* possible) to recognise that and return an appropriate result. Duplicating the input if required.

The idea that runtime obtained or derived strings can be made truly invariant is purely theoretical. Whilst the compiler can place compile time constants into hardware protected, read-only memory segments, doing this at runtime would be horribly costly and hardly beneficial. IA-86 allows memory to be set readonly at runtime, but only in page sized chunks. Which means that either:

- every derived string would need to be placed in its own 4k multiple sized chunk of ram,
- or each page would have to constantly be switched from read-only to read-write and back again as new entities are added and old ones go out of scope.

And if you are not using hardware protection, then the invariance is only notional, as D can call C, and C allows me access to pointers. And once I have one of those, I can scribble anywhere that isn't hardware protected.

All this smacks of D reinventing, with all the same mistakes, the whole Java String vs. StringBuffer dichotomy:

http://www.javaworld.com/javaworld/jw-03-2000/jw-0324-javaperf.html

And Java had the VM to isolate it from non-compliant code.

One of several "mission statements" that drew me to D when I first encountered it nearly 3 years ago was the pragmatism embodied in articles like this:

http://www.digitalmars.com/d/2.0/builtin.html

and this:

http://www.digitalmars.com/d/2.0/cppstrings.html

and statements like this:

"No pointless wrappers around C runtime library functions or OS API functions. D provides direct access to C runtime library functions and operating system API functions. Pointless D wrappers around those functions just adds blather, bloat, baggage and bugs."

Coming back to try and use D after a prolonged absence, the changes in the interim period seem to be eschewing that pragmatism in favour of some kind of mixed OO/functional purity ethic. Is there an ex-Haskeller in the house?

I admit openly to still being in the throes of finding my way around the language and the library, and have been making seemingly elementary mistakes in interpreting the documentation. But one of the major attractions of D over C/C++ is its built-in string types and manipulations. As good as these are, there is still the need for a library of common operations upon them.

If every time I want to use one of these library calls I have to cast my mutable string into an invariant and then cast the result back to mutable in order to be able to use the built-in manipulations, life's going to get very boring, very fast.

The alternative, I guess, is to sit down and write my own library that performs the same operations as std.string, but on the native string type. Which kinda dilutes the purpose of having standard libraries.

Sorry to be so verbose, and please don't anyone take any of this personally. I'm critiquing the code I am encountering, and the problems I am having using it. Not the people who wrote it.

Cheers, b.
Apr 28 2008
next sibling parent reply "Me Here" <p9e883002 sneakemail.com> writes:
Janice Caron wrote:

 you could do
 
     char[] tmp = a.dup;
     tmp[2..5] = "XXX";
     a = assumeUnique(tmp);
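
The three-line suggestion above can be wrapped in a function. This is a minimal sketch, assuming current Phobos, where `assumeUnique` is found in `std.exception`; the helper name `patched` and its parameters are hypothetical.

```d
import std.exception : assumeUnique;

/// Returns a copy of `s` with s[start .. start + patch.length]
/// overwritten by `patch`. Exactly one copy is made (by .dup).
string patched(string s, size_t start, string patch)
{
    char[] tmp = s.dup;                            // private mutable copy
    tmp[start .. start + patch.length] = patch[];  // edit the copy in place
    return assumeUnique(tmp);                      // cast to immutable, no copy; tmp becomes null
}
```

`assumeUnique` is only safe here because no other reference to `tmp` escapes the function.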
 

Ah! Again, 3 lines instead of 1. Plus two function calls and a temporary variable. You do realise that there is a very strong correlation between bugs and line count? That's been so for all of the last 30+ years regardless of language or paradigm.

So, you made it more verbose and more complex and much slower. And, in doing so, introduced more scope for errors than you've cured.

 That's one approach. Another is don't try to treat strings as mutable.

RAM is mutable--that's its purpose in being. Variables live in RAM, and vary--that's their purpose in being. Making a copy of a <strike>string</strike> piece of ram and throwing the old one away every time I want to alter its contents...kinda reminds me of disposable nappies. A costly convenience.

I'll revert to 1.x and pray that 2.x fades away through lack of interest before it turns D into Yet Another Dead Language--for OO purists and academics only.

Cheers, b.
Apr 28 2008
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Janice Caron wrote:
 If there's enough interest, and if Walter approves, I could certainly
 kickstart std.stringbuffer. Is that the right way to go? What do
 people think?

What it will do is provide a useful solution for those who really want to use mutable strings. I bet that, though, after a while they'll evolve to eschew it in favor of immutable strings. It's easier than arguing about it <g>.
Apr 28 2008
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
== Quote from Walter Bright (newshound1 digitalmars.com)'s article
 Janice Caron wrote:
 If there's enough interest, and if Walter approves, I could certainly
 kickstart std.stringbuffer. Is that the right way to go? What do
 people think?

 What it will do is provide a useful solution for those who really want
 to use mutable strings. I bet that, though, after a while they'll evolve
 to eschew it in favor of immutable strings. It's easier than arguing about it <g>.

I do agree with the notion that the majority of operations performed on strings in a typical application do not modify the string in place. However, in performance-oriented server applications, it is very common to hold and reuse a mutable buffer between calls to avoid the cost of reallocation. Assuming that references to this data are passed around during the processing of a client request, I would fully expect the surrounding code to have no need to mutate the data. However, because this is a reusable buffer, invariant is not a safe option because the contents of the buffer will change for each request. What I would be inclined to do here is use const references to reflect this.

I've been thinking a lot about const and invariant recently and while invariant strings seem quite handy for test code and the like, I have not been able to think of a single production application where I would actually be able to use them for the bulk of my string data, for the reason mentioned above. Rather, I would expect to use 'const' everywhere because what I generally care about is preventing a caller or callee from changing the contents of my data.

As for indicating ownership, the following rule generally suffices:

    char[] getData();        // result is mutable -- ownership is transferred
    const(char)[] getData(); // result is const -- ownership not transferred

What I love about Steven's "scoped const" proposal is that it would allow me to write a single instance of a library function that would work equally well with any data, and the function would communicate its behavior within the syntax. Add "scoped const" to D 1.0 plus the ability to use 'const' in all the places it can be used in D 2.0 and I'd be a happy camper. Bonus points for eliminating storage of static const (i.e. ROM-able) data and dropping support for anonymous enum altogether.

Sean

P.S. The utility of 'invariant' for multiprogramming is a separate issue. I actually think it's unnecessary there as well, but don't want the discussion to get off track by addressing this at all. I'm merely adding this note so no one will bring it up in response to what I said above.
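
The const-view idea in this post can be sketched as follows. This is an illustration under assumptions: `countChar` is a hypothetical function, and the code uses current D2 syntax. The point is that a `const(char)[]` parameter accepts both a reusable mutable buffer and an immutable string, without a cast or a copy.

```d
/// Counts occurrences of `needle` in `text`. The const parameter promises
/// not to modify the data, whoever owns it and however it was allocated.
size_t countChar(const(char)[] text, char needle)
{
    size_t n = 0;
    foreach (c; text)
        if (c == needle)
            ++n;
    return n;
}
```

A server loop could then refill one `char[]` buffer per request and pass it to functions like this one, while callers holding a `string` use the same function unchanged.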
Apr 28 2008
parent Lars Ivar Igesund <larsivar igesund.net> writes:
Sean Kelly wrote:

 == Quote from Walter Bright (newshound1 digitalmars.com)'s article
 Janice Caron wrote:
 If there's enough interest, and if Walter approves, I could certainly
 kickstart std.stringbuffer. Is that the right way to go? What do
 people think?

 What it will do is provide a useful solution for those who really want
 to use mutable strings. I bet that, though, after a while they'll evolve
 to eschew it in favor of immutable strings. It's easier than arguing about it <g>.

 I do agree with the notion that the majority of operations performed on strings in a typical application do not modify the string in place. However, in performance-oriented server applications, it is very common to hold and reuse a mutable buffer between calls to avoid the cost of reallocation.

Indeed, in the application I'm currently writing at work, there is not a single heap allocation after the startup phase. And it cannot be called trivial in any sense.

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango
Apr 28 2008
prev sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Janice Caron wrote:
 On 28/04/2008, Walter Bright <newshound1 digitalmars.com> wrote:
  What it will do is provide a useful solution for those who really want to
 use mutable strings. I bet that, though, after a while they'll evolve to
 eschew it in favor of immutable strings.

 I'm inclined to agree with the prediction - but even so, wouldn't that be a good thing? I mean, if it keeps people on board with D2 who might otherwise have run away, then that's good, right? And if those people later realise they can do more with immutable strings, then that's good too, right?

I think we're in agreement.
Apr 28 2008
prev sibling parent reply "Me Here" <p9e883002 sneakemail.com> writes:
Janice Caron wrote:

2008/4/28 Me Here <p9e883002 sneakemail.com>:
(I forget which module you have to import to get assumeUnique). But
what you mustn't ever do is cast away invariant.

  1) Whatever happened to polymorphism?

What's polymorphism got to do with anything? A string is an array, not a class.
  So, if I assign the results of a string library function/method to a
  mutable variable (Just a variable really. An invariant variable is a
  constant!), then it should be possible (*IS* possible) to recognise
  that and return an appropriate result.

 Functions don't overload on return value.

They don't? Why not? Seems like a pretty obvious step to me. Rather than having to have methods:

    futzIt_returnString()
    futzIt_returnInt()
    futzIt_returnReal()
    futzIt_returnComplex()

where 'futzIt' might be "read a string from the command line and return it to me as some type (if possible)", I can just do

    int i = futzIt( ... );
    real r = futzIt( ... );

And let the compiler work out which futzIt() I need to call, and take care of mangling the names to allow them to coexist.

You mean D doesn't already have this facility? Seems like it would be a far more productive and useful expenditure of effort than all this invariant stuff.
  The idea that runtime obtained or derived strings can be made truly
  invariant is purely theoretical.

 But the fact that someone else might be sharing the data is not.

By "someone else" you mean 'another thread'? If so, then if that is a possibility, if my code is using threads, then I, the programmer, will be aware of that and will be able to make appropriate choices.

I /might/ choose to use invariance to 'protect' this particular piece of data from the problems of shared state concurrency--if there is any possibility that I intend to share this particular piece of data. But in truth, it is very unlikely that I *will* make /that/ choice. Here's why.

What does it mean to make and hold multiple (mutated) copies of a single entity?

That is, I obtain a piece of data from somewhere and make it invariant. Somehow two threads obtain references to that piece of data. If none of them attempt to change it, then it makes no difference that it is marked invariant. If however, one of them is programmed to change it, then it now has a different version of that entity to the other thread. But what does that mean? Who has the 'right' version?

Show me a real situation where two threads can legitimately be making disparate modifications to a single entity, string or otherwise, and I'll show you a programming error. Once two threads make disparate modifications to an entity, they are separate entities. And they should have been given copies, not references to a single copy, in the first place.

If the intent is that they share a single entity, then any legitimate modifications to that single entity should be reflected in the views of that single entity by both threads. And therefore subjected to locking, or STM or whatever mechanism is used to control that modification.

This whole thing of invariance and concurrency seems to be aimed at enabling the use of COW. Which smacks of someone trying to emulate fork-like behaviours using threads.

And if that is the case, and I very much hope it isn't, then let me tell you as someone who is intimately familiar with the one existing system that went this route (iThreads: look'em up), that it is a total disaster.

The whole purpose and advantage of multi-threading, over multi-processing, is (mutable) shared state. And the elimination of the costs of serialisation and narrow bandwidth of IPC in the forking concurrency model. Attempting to emulate that model using threading gives few of its advantages, all of its disadvantages, and throws away all of the advantages of threading. It is a complete and utter waste of time and effort.

If the aim is to simplify the use of threading for common programming scenarios and bring it within the grasp of non-threading specialist programmers, then there are far more effective and less costly ways of achieving that.
  But one of the
  major attractions of D over C/C++ is its built-in string types

 D has no built in string type. string is just an alias for invariant(char)[].

Semantics. D has built-in support for a string-type (see http://www.digitalmars.com/d/2.0/overview.html) from which I quote:

"Strings"
"String manipulation is so common, and so clumsy in C and C++, that it needs direct support in the language".
"Modern languages handle string concatenation, copying, etc., and so does D".
"Strings are a direct consequence of improved array handling."

What invariant strings do, and as far as I can see the only significant thing they do, is to reinvent the clumsiness of C & C++ by making strings a second-class data type again.

If the point is to try and make threading easier, it will fail miserably once people realise that it creates the scope for multiple concurrent versions of supposedly single entities. Which breaks just about every programming rule in the book, and creates scope for far more intractable errors than it fixes.

 That's one approach. Another is don't try to treat strings as mutable.

If the intention of invariance is some move toward OO or functional purity, then I again quote from the same document:

"Who D is Not For" [some categories elided]
"Language purists. D is a practical language, and each feature of it is evaluated in that light, rather than by an ideal."
"For example, D has constructs and semantics that virtually eliminate the need for pointers for ordinary tasks."
"But pointers are still there, because sometimes the rules need to be broken."
"Similarly, casts are still there for those times when the typing system needs to be overridden."

Cheers, b.
Apr 28 2008
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Me Here wrote:
 Janice Caron wrote:
 Functions don't overload on return value.


Type inference in D is done "bottom up". Doing overloading based on function return type is "top down". Trying to get both schemes to coexist is a hard problem.
  The idea that runtime obtained or derived strings can be made truly
  invariant is purely theoretical.

 But the fact that someone else might be sharing the data is not.

 By "someone else" you mean 'another thread'?

No, it could be the same thread, via another alias to the same data. Using invariant strings allows the programmer to treat them as if they were value types and being copied for every use (like ints are), except they don't need to be actually copied.

With mutable strings, one always has to be careful to keep track of who 'owns' the string, and who has references to it. When mutating the string, one must manually ensure that there are no other references to it that would be surprised by the data changing. For example, if you insert a string into a symbol table, and then later some other reference to that string changes it, it could wind up corrupting the symbol table.

The point about the main(char[][] args) and modifying those strings in-place is very valid - nothing is said about where those strings actually reside, and who else may have references to the same data, and whether you can modify them with impunity or not. You could argue "this should be better documented" and you'd be right, but if the declaration instead said main(invariant(char[])args) then I *know* that I am not allowed to change them, and whoever calls main() *knows* that those arg strings won't get changed. We can both sleep comfortably.

Invariant strings offer a guarantee that the data won't change, which clarifies the API of the functions. (Whenever I see an API function that takes a char*, say putenv(), it rarely says whether it saves a copy of the data or saves a copy of the reference. That just sucks.)
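
The symbol-table hazard described above can be shown in a few lines. This is only a sketch in current D2 syntax: the `Entry` struct and both function names are hypothetical stand-ins for a real symbol table, chosen to keep the example self-contained.

```d
/// Hypothetical stand-in for a symbol table entry.
struct Entry
{
    const(char)[] name; // a view of the data, not necessarily a copy
    int value;
}

/// Storing a reference to a mutable buffer: a later write through the
/// other alias changes the stored name. Returns true if that happened.
bool aliasingCorruptsEntry()
{
    char[] buf = "count".dup;
    auto e = Entry(buf, 1);   // stores the slice, shares buf's memory
    buf[0] = 'm';             // same thread, different alias
    return e.name == "mount";
}

/// Storing an immutable copy: the entry is unaffected by later writes.
bool copyIsStable()
{
    char[] buf = "count".dup;
    auto e = Entry(buf.idup, 1); // idup makes an immutable copy
    buf[0] = 'm';
    return e.name == "count";
}
```

With an immutable string as input, the second form needs no copy at all, which is the "value type without copying" property the post refers to.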
 If so, then if that is a possibility, if my code is using threads, then 
 I, the programmer,
 will be aware of that and will be able to make appropriate choices.
 
 I /might/ choose to use invariance to 'protect' this particular piece of 
 data from the problems
 of shared state concurrency--if there is any possibility that I intend 
 to share this particular piece of data.
 But in truth, it is very unlikely that I *will* make /that/ choice. 
 Here's why.
 
 What does it mean to make and hold multiple (mutated) copies of a single 
 entity?
 
 That is, I obtain a piece of data from somewhere and make it invariant.
 Somehow two threads obtain references to that piece of data.
 If none of them attempt to change it, then it makes no difference that 
 it is marked invariant.
 If however, one of them is programmed to change it, then it now has a 
 different version of that entity to the other thread. But what does that
 mean? Who has the 'right' version?
 
 Show me a real situation where two threads can legitimately be making 
 disparate modifications to a single entity,
 string or otherwise, and I'll show you a programming error. Once two 
 threads make disparate modifications to an entity,
 they are separate entities. And they should have been given copies, not 
 references to a single copy, in the first place.
 
 If the intent is that they share a single entity, then any legitimate 
 modifications to that single entity should be reflected
 in the views of that single entity by both threads. And therefore 
 subjected to locking, or STM or whatever mechanism is
 used to control that modification.
 
 This whole thing of invariance and concurrency seems to be aimed at 
 enabling the use of COW.

Wouldn't that be more of a copy-swap thing? And isn't STM copy-swap at its core?
 And if that is the case, and I very much hope it isn't, then let me tell 
 you as someone who is intimately familiar with the
 one existing system that went this route (iThreads: look'em up), that it 
 is a total disaster.

ithreads copies the entire user data per thread. Using invariant is, of course, a way to avoid copying the data.
 The whole purpose and advantage of multi-threading, over 
 multi-processing, is (mutable) shared state. And the elimination of the
 costs of serialisation and narrow bandwidth of IPC in the forking 
 concurrency model. Attempting to emulate that model
 using threading gives few of its advantages, all of its disadvantages, 
 and throws away all of the advantages of threading.
 It is a complete and utter waste of time and effort.

I can agree with that.
Apr 28 2008
parent reply "Lionello Lunesu" <lionello lunesu.remove.com> writes:
"Walter Bright" <newshound1 digitalmars.com> wrote in message 
news:48169E90.6050700 digitalmars.com...
 Me Here wrote:
 Janice Caron wrote:
 Functions don't overload on return value.


Type inference in D is done "bottom up". Doing overloading based on function return type is "top down". Trying to get both schemes to coexist is a hard problem.

But a function's result can be overloaded using "out", so why can't it be overloaded using the return value? Can't the compiler treat a return value as an implicit out argument?

L.
Apr 29 2008
next sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Lionello Lunesu wrote:
 
 "Walter Bright" <newshound1 digitalmars.com> wrote in message 
 news:48169E90.6050700 digitalmars.com...
 Me Here wrote:
 Janice Caron wrote:
 Functions don't overload on return value.


Type inference in D is done "bottom up". Doing overloading based on function return type is "top down". Trying to get both schemes to coexist is a hard problem.

But a function's result can be overloaded using "out", so why can't it be overloaded using the return value?

We know what the type of the out argument is. The problem with return value overloading is not knowing what the type should be.
 Can't the compiler treat a return value as an implicit out argument?

Suppose the return value is used as an argument to another function with overloaded versions. Rinse, repeat. The combinations grow out of control.
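
One workaround that keeps resolution "bottom up" is to have the caller name the result type as a template argument. The sketch below assumes current Phobos (`std.conv.to`); `futzIt` reuses the name from earlier in the thread, but its signature here is hypothetical.

```d
import std.conv : to;

/// Converts `text` to the type the caller names explicitly.
/// The type is chosen at the call site, so nothing has to be
/// inferred from how the result is later used.
T futzIt(T)(string text)
{
    return to!T(text);
}
```

Usage would look like `int i = futzIt!int("42");` or `auto r = futzIt!double("2.5");`. A call such as `bar(futzIt!float("1.5"))` is unambiguous even when `bar` is overloaded, because the argument type is fixed before `bar` is resolved.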
Apr 29 2008
prev sibling parent reply e-t172 <e-t172 akegroup.org> writes:
Lionello Lunesu wrote:
 
 "Walter Bright" <newshound1 digitalmars.com> wrote in message 
 news:48169E90.6050700 digitalmars.com...
 Me Here wrote:
 Janice Caron wrote:
 Functions don't overload on return value.


Type inference in D is done "bottom up". Doing overloading based on function return type is "top down". Trying to get both schemes to coexist is a hard problem.

But a function's result can be overloaded using "out", so why can't it be overloaded using the return value? Can't the compiler treat a return value as an implicit out argument?

Consider this:

    int foo();
    float foo();

    void bar(int a);
    void bar(float a);

Then this:

    void main() {
        bar(foo());
    }

There is an obvious problem here.
Apr 29 2008
next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"e-t172" wrote
 Lionello Lunesu a écrit :
 "Walter Bright" wrote in message
 Me Here wrote:
 Janice Caron wrote:
 Functions don't overload on return value.


Type inference in D is done "bottom up". Doing overloading based on function return type is "top down". Trying to get both schemes to coexist is a hard problem.

But a function's result can be overloaded using "out", so why can't it be overloaded using the return value? Can't the compiler treat a return value as an implicit out argument?

Consider this:

    int foo();
    float foo();

    void bar(int a);
    void bar(float a);

Then this:

    void main() {
        bar(foo());
    }

There is an obvious problem here.

Yes, one that is solved like any other that has ambiguity: casting. We will have the same problem when opImplicitCast is introduced. This seems like a rare case anyways, not a reason not to have overloaded return values. -Steve
Apr 29 2008
prev sibling parent "Hans W. Uhlig" <huhlig gmail.com> writes:
e-t172 wrote:
 Lionello Lunesu a écrit :
 "Walter Bright" <newshound1 digitalmars.com> wrote in message 
 news:48169E90.6050700 digitalmars.com...
 Me Here wrote:
 Janice Caron wrote:
 Functions don't overload on return value.


Type inference in D is done "bottom up". Doing overloading based on function return type is "top down". Trying to get both schemes to coexist is a hard problem.

But a function's result can be overloaded using "out", so why can't it be overloaded using the return value? Can't the compiler treat a return value as an implicit out argument?

Consider this:

    int foo();
    float foo();

    void bar(int a);
    void bar(float a);

Then this:

    void main() {
        bar(foo());
    }

There is an obvious problem here.

One of two things: either make an assumption about which overload is called, based on which has the higher priority (by precision or type), or throw a compiler error if no cast is made, along the lines of "Overload ambiguity: a cast must be made when both the return overload and the parameter overload types are ambiguous."
Apr 30 2008
prev sibling next sibling parent Gide Nwawudu <gide btinternet.com> writes:
On Mon, 28 Apr 2008 02:14:19 +0200, "Simen Kjaeraas"
<simen.kjaras gmail.com> wrote:

<p9e883002 sneakemail.com> wrote:

 Of course, a wasn't really mutated. Instead, args[0] was copied and then
 mutated and labelled a. Then a was copied and mutated and reassigned the
 mutated copy.

 So, that's two copies of the string, plus a slice, plus an extra method  
 call to
 achieve what used to be achievable in place on the original string.  
 Which is now
 immutable, but I'll never need it again.

 Of course, on these short 1-off strings it doesn't matter a hoot. But  
 when the
 strings are 200 to 500 characters a pop and there are 20,000,000 of  
 them. It
 matters.

 Did I suggest this was an optimisation?

 Whatever immutability-purity cool aid you've been drinking, please go  
 back to
 coke. And give us usable libraries and sensible implicit conversions.  
 Cos this sucks
 bigtime.

 b.

Is this what you wanted to write?

    int main(string[] args)
    {
        char[] a = cast(char[])args[0];
        a[2..5] = "XXX";
        writefln(a);
        return 0;
    }

This compiles and runs, and seems to do what you describe. Sure, there's a cast there, but it's not all that bad, is it?

Or just add a dup.

    int main(string[] args)
    {
        char[] a = args[0].dup;
        a[2..5] = "XXX";
        writefln(a);
        return 0;
    }
Apr 27 2008
prev sibling next sibling parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Simen Kjaeraas wrote:
 <p9e883002 sneakemail.com> wrote:
 
 Of course, a wasn't really mutated. Instead, args[0] was copied and then
 mutated and labelled a. Then a was copied and mutated and reassigned the
 mutated copy.

 So, that's two copies of the string, plus a slice, plus an extra 
 method call to
 achieve what used to be achievable in place on the original string. 
 Which is now
 immutable, but I'll never need it again.

 Of course, on these short 1-off strings it doesn't matter a hoot. But 
 when the
 strings are 200 to 500 characters a pop and there are 20,000,000 of 
 them. It
 matters.

 Did I suggest this was an optimisation?

 Whatever immutability-purity cool aid you've been drinking, please go 
 back to
 coke. And give us usable libraries and sensible implicit conversions. 
 Cos this sucks
 bigtime.

 b.

Is this what you wanted to write?

    int main(string[] args)
    {
        char[] a = cast(char[])args[0];
        a[2..5] = "XXX";
        writefln(a);
        return 0;
    }

This compiles and runs, and seems to do what you describe. Sure, there's a cast there, but it's not all that bad, is it?

I'm no invariant guru, but I don't think that's legal. 'invariant' means the data could be stored in a portion of memory that the OS will not allow the program to write to. So you need to dup it:

    char[] a = args[0].dup;
    a[2..5] = "XXX";
    writefln(a);
    return 0;

That stuff like this compiles and seems to work is why we really need to make at least one alternative version of cast. One would be for relatively safe run-of-the-mill casts, like casting float to int, or casting Object to some class (and checking for null), and the other category would be for dangerous big-red-flag kinds of things like the above. Using the run-of-the-mill cast in the above situation would not be allowed.

--bb
Apr 27 2008
next sibling parent Tomas Lindquist Olsen <tomas famolsen.dk> writes:
Bill Baxter wrote:

... snip ...

 
 That stuff like this compiles and seems to work is why we really need to 
 make at least one alternative version of cast.  One would be for 
 relative safe run-of-the-mill casts, like casting float to int, or 
 casting Object to some class (and checking for null),  and the other 
 category would be for dangerous big red flags kind of things like the 
 above.  Using the run-of-the-mill cast in the above situation would not 
 be allowed.

Amen to that !!!
Apr 28 2008
prev sibling parent "Lionello Lunesu" <lionello lunesu.remove.com> writes:
"Bill Baxter" <dnewsgroup billbaxter.com> wrote in message 
news:fv3612$sgu$1 digitalmars.com...
 That stuff like this compiles and seems to work is why we really need to 
 make at least one alternative version of cast.  One would be for relative 
 safe run-of-the-mill casts, like casting float to int, or casting Object 
 to some class (and checking for null),  and the other category would be 
 for dangerous big red flags kind of things like the above.  Using the 
 run-of-the-mill cast in the above situation would not be allowed.

That request has been on the "unofficial wish list" since the beginning, and I still agree with it. Maybe cast() should be parsed as a template. Then the compiler should require more "!"s as the risk increases:

    SomeClass sc = cast(SomeClass)some_obj; //OK
    int i = cast!(int)some_float; //might not fit
    SomeClass sc = cast!!(SomeClass)void_ptr; //unsafe
    char[] mutstring = cast!!!!!!!!(char[])toUpper("..."); //wtf are you doing!

L.
Apr 29 2008
prev sibling next sibling parent reply p9e883002 sneakemail.com writes:
On Mon, 28 Apr 2008 02:14:19 +0200, "Simen Kjaeraas" 
<simen.kjaras gmail.com> wrote:
 <p9e883002 sneakemail.com> wrote:
 
 Is this what you wanted to write?
 
 int main(string[] args)
 {
    char[] a = cast(char[])args[0];
    a[2..5] = "XXX";
    writefln(a);
    return 0;
 }
 This compiles and runs, and seems to do what you describe. Sure, there's a
 cast there, but it's not all that bad, is it?

No. You missed out uppercasing the string before replacing the slice.
Apr 27 2008
parent "Simen Kjaeraas" <simen.kjaras gmail.com> writes:
On Mon, 28 Apr 2008 02:44:14 +0200, <p9e883002 sneakemail.com> wrote:

 On Mon, 28 Apr 2008 02:14:19 +0200, "Simen Kjaeraas"
 <simen.kjaras gmail.com> wrote:
 <p9e883002 sneakemail.com> wrote:

 Is this what you wanted to write?

 int main(string[] args)
 {
    char[] a = cast(char[])args[0];
    a[2..5] = "XXX";
    writefln(a);
    return 0;
 }
 This compiles and runs, and seems to do what you describe. Sure, there's a
 cast there, but it's not all that bad, is it?

No. You missed out uppercasing the string before replacing the slice.

That's why I replied to my own post stating just that. Anyways, Gide got it right. A .dup is the correct way, a cast is wrong. -- Simen
Apr 27 2008
prev sibling parent Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Janice Caron wrote:
 2008/4/28 Simen Kjaeraas <simen.kjaras gmail.com>:
  int main(string[] args)
  {
   char[] a = cast(char[])args[0];
   a[2..5] = "XXX";
   writefln(a);
   return 0;
  }
  This compiles and runs, and seems to do what you describe. Sure, there's a
  cast there, but it's not all that bad, is it?

Yes, it's extremely bad. Casting away invariant is UNDEFINED BEHAVIOR, and should never be done.

It's not merely undefined, it's *illegal*! I hate the C/C++ tradition of calling things that are *illegal* "undefined behavior". Yes, illegal behavior causes undefined behavior, but they're not the same thing. Illegal is something that may cause your program to crash, or simply end up in a faulty and erroneous state. Undefined is just undefined. For example, this expression in C:

    a = (x++) + x*2;

has undefined behavior (because of order of evaluation issues). But it's not *illegal* behavior; your program will not crash and burn because of that.

-- 
Bruno Medeiros - Software Developer, MSc. in CS/E graduate
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Apr 29 2008
prev sibling next sibling parent "Janice Caron" <caron800 googlemail.com> writes:
2008/4/28 Simen Kjaeraas <simen.kjaras gmail.com>:
  int main(string[] args)
  {
   char[] a = cast(char[])args[0];
   a[2..5] = "XXX";
   writefln(a);
   return 0;
  }
  This compiles and runs, and seems to do what you describe. Sure, there's a
  cast there, but it's not all that bad, is it?

Yes, it's extremely bad. Casting away invariant is UNDEFINED BEHAVIOR, and should never be done. You should never need an explicit cast just to handle text!
Apr 28 2008
prev sibling next sibling parent "Janice Caron" <caron800 googlemail.com> writes:
2008/4/28  <p9e883002 sneakemail.com>:
  import std.string;

  int main( string[] args) {
     char[] a = cast(char[])args[0].toupper();

**** UNDEFINED BEHAVIOR ****
(1) args might be placed in a hardware-locked read-only segment. Then the following line would fail.
(2) There might be other pointers to the string, which expect it never to change.
     a[2..5] = "XXX";
     a = cast(char[])tolower( cast(string)a );
     writefln(a);
     return 0;
  }

  Finally. It works.

But not necessarily on all architectures, because of the undefined behavior. This is how you do it without undefined behavior.

    import std.string;

    int main( string[] args) {
        string a = args[0].toupper();
        a = a[0..2] ~ "XXX" ~ a[5..$];
        a = a.tolower();
        writefln(a);
        return 0;
    }
Apr 28 2008
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
<p9e883002 sneakemail.com> wrote
 Hi all,

 'scuse me for not being familiar with previous or ongoing discussion on 
 this
 subject, but I'm just coming back to D after a couple of years away.

 I have some strings read in from external source that I need to convert to
 uppercase. A quick look at Phobos and I find std.string has a toupper 
 method.
 <very good example case removed>

This is all not an issue if Walter adopts 'scoped const' contracts.

http://d.puremagic.com/issues/show_bug.cgi?id=1961

The current con for this method is that it is another 'confusing' const syntax. So is what I propose more confusing, or is what this poor developer had to go through more confusing?

-Steve
Apr 28 2008
parent Sean Kelly <sean invisibleduck.org> writes:
== Quote from Janice Caron (caron800 googlemail.com)'s article
 2008/4/28 Steven Schveighoffer <schveiguy yahoo.com>:
  > I have some strings read in from external source that I need to convert to
  > uppercase. A quick look at Phobos and I find std.string has a toupper
  > method.
  > <very good example case removed>

  This is all not an issue if Walter adopts 'scoped const' contracts.

 toupper() couldn't be reused for all constancies, because the invariant version should employ copy-on-write, whereas any other versions would not be able to do this. That is, toupper("HELLO"); can return the original, if and only if the string is invariant.

Can you explain this in light of Steven's 'scoped const' proposal? By my understanding (assuming scoped const):

    string bufI = "HELLO";
    char[] bufM = "HELLO".dup;
    const(char)[] bufC = bufM;

    string retI = toupper( bufI ); // return value is invariant - ok
    char[] retM = toupper( bufM ); // return value is mutable - ok
    const(char)[] retC = toupper( bufC ); // return value is const - ok
    const(char)[] retC2 = toupper( bufI ); // return value is invariant - ok

    bufM[0] = 'J';
    assert( retC[0] == 'J' );

The above seems perfectly fine, because it's impossible to pass a mutable array and return a const reference to it--the return value will be mutable as well. By contrast, let's assume the invariant implementation:

    string toupper( string buf );

    char[] buf = "HELLO".dup;
    toupper( buf ); // fails
    toupper( buf.idup ); // works
    toupper( assertUnique( buf ) ); // works

In the first case I have to copy buf to pass it to toupper, and in the second I have to perform a cast operation (albeit wrapped in a function to hide the truth). Assuming for a moment that mutable strings are useful and so I won't be able to use the 'string' alias all the time, can you explain what is good about either of these scenarios?

Sean
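
For readers on a current D compiler, the behaviour described above (the result carries the same constancy as the argument) can be sketched with the `inout` qualifier that D later gained. This is a minimal sketch, not the syntax proposed in the bug report; `firstWord` is a hypothetical helper chosen because, unlike toupper, it never needs to allocate.

```d
import std.stdio : writeln;

// `inout` stands in for mutable, const or immutable, whichever the caller
// passes, and the return type follows it. One body serves all three.
inout(char)[] firstWord(inout(char)[] s)
{
    foreach (i, c; s)
    {
        if (c == ' ')
            return s[0 .. i];   // a slice of the argument, no copy
    }
    return s;
}

void main()
{
    string fixed = "hello world";
    char[] buf = "hello world".dup;

    string a = firstWord(fixed);   // immutable in, immutable out
    char[] b = firstWord(buf);     // mutable in, mutable out
    b[0] = 'J';                    // writes through to buf
    writeln(a, " ", buf);
}
```

No cast and no idup is needed at either call site, which is the property being argued for.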
Apr 28 2008
prev sibling next sibling parent "Janice Caron" <caron800 googlemail.com> writes:
2008/4/28 Me Here <p9e883002 sneakemail.com>:
  Ack! That's horrible. Instead of using the information I have, the
  offset and length of the slice I want to manipulate, I have to derive
  two offset/length pairs to the bits I do not want to do anything to.

Not necessarily. Instead of

    a = a[0..2] ~ "XXX" ~ a[5..$];

you could do

    char[] tmp = a.dup;
    tmp[2..5] = "XXX";
    a = assumeUnique(tmp);

(I forget which module you have to import to get assumeUnique). But what you mustn't ever do is cast away invariant.
  1) Whatever happened to polymorphism?

What's polymorphism got to do with anything? A string is an array, not a class.
  So, if I assign the results of a string library function/method to a
  mutable variable (Just a variable really. An invariant variable is a
  constant!), then it should be possible (*IS* possible) to recognise
  that and return an appropriate result.

Functions don't overload on return value.
  The idea that runtime obtained or derived strings can be made truly
  invariant is purely theoretical.

But the fact that someone else might be sharing the data is not.
  But one of the
  major attractions of D over C/C++ is its built-in string types

D has no built in string type. string is just an alias for invariant(char)[].
  If everytime I want to use one
  of these library calls, I have to cast my mutable string into an
  invariant and then cast the result back to mutable

That's one approach. Another is don't try to treat strings as mutable.
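
The dup-then-assumeUnique pattern mentioned earlier in this post can be sketched as follows. This assumes a current D standard library, where assumeUnique lives in std.exception and an in-place uppercase routine is available as std.uni.toUpperInPlace; neither module name is given in the thread, and `upperAndPatch` is a hypothetical helper. The input must be at least 5 characters long, otherwise the slice assignment fails its bounds check.

```d
import std.exception : assumeUnique;
import std.stdio : writeln;
import std.uni : toUpperInPlace;

// Make exactly one private copy, do every mutation on it in place,
// then hand the buffer back as an immutable string without copying again.
string upperAndPatch(const(char)[] input)
{
    char[] buf = input.dup;      // the only allocation
    toUpperInPlace(buf);         // mutate the private copy
    buf[2 .. 5] = "XXX";         // overwrite the slice in place
    return assumeUnique(buf);    // safe only because nothing else references buf
}

void main()
{
    writeln(upperAndPatch("hello world"));
}
```

The number of copies stays at one however many mutations are applied between the dup and the assumeUnique call.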
Apr 28 2008
prev sibling next sibling parent reply "Janice Caron" <caron800 googlemail.com> writes:
2008/4/28 Steven Schveighoffer <schveiguy yahoo.com>:
  > I have some strings read in from external source that I need to convert to
  > uppercase. A quick look at Phobos and I find std.string has a toupper
  > method.
  > <very good example case removed>

  This is all not an issue if Walter adopts 'scoped const' contracts.

toupper() couldn't be reused for all constancies, because the invariant version should employ copy-on-write, whereas any other versions would not be able to do this. That is, toupper("HELLO"); can return the original, if and only if the string is invariant.
Apr 28 2008
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Janice Caron" wrote
 2008/4/28 Steven Schveighoffer:
  > I have some strings read in from external source that I need to 
 convert to
  > uppercase. A quick look at Phobos and I find std.string has a toupper
  > method.
  > <very good example case removed>

  This is all not an issue if Walter adopts 'scoped const' contracts.

toupper() couldn't be reused for all constancies, because the invariant version should employ copy-on-write, whereas any other versions would not be able to do this. That is, toupper("HELLO"); can return the original, if and only if the string is invariant.

toupper is probably a bad example, as your case seems like the rarest :) But I understand what you are saying.

The desire to have string processing functions work with all constancies seems very reasonable and useful to me. Denying the use of toupper unless you idup the array, just to keep the ability to optimize a corner case, seems incorrect, and will probably produce less efficient code in 90% of cases.

If the scoped const proposal were never accepted, and I used Phobos, I'd probably suggest const and mutable versions of toupper, so that those of us who use mutable strings a lot, and maybe not so much multithreading, don't have to jump through hoops for any string processing.

Maybe the solution to this is to write specializations which use COW with the invariant version, perhaps with pure functions, which always assume invariant parameters. So you have a pure toupper which handles the invariant version, and a scoped const version which allows using the function on non-invariant parameters, which can't be optimized the same anyways...

-Steve
Apr 28 2008
prev sibling next sibling parent "Janice Caron" <caron800 googlemail.com> writes:
On 28/04/2008, Me Here <p9e883002 sneakemail.com> wrote:
 Ah! Again, 3 lines instead of 1. Plus two function calls and a temporary
variable.

To be fair though, the problem here is that the functions you are calling (std.string.toupper and std.string.tolower) don't do what you want. This is not a fault of the language - it's a limitation of the library.

To that end, as others have said, this problem could be solved simply enough by the addition of another module - say, std.stringbuffer - in which we alias char[] to stringbuffer (or maybe a StringBuffer class - I'm not sure what's best) and provide a whole bunch of functions optimized for those mutable char arrays.

To blame the language for the lack of library is the wrong approach. D2 has some killer, kickass features. The template metaprogramming power alone is enough to make C++ programmers weep. I'm looking forward to pure functions, and a new generation of multithreading.

If there's enough interest, and if Walter approves, I could certainly kickstart std.stringbuffer. Is that the right way to go? What do people think?
Apr 28 2008
prev sibling next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
p9e883002 sneakemail.com wrote:
 So, that's two copies of the string, plus a slice, plus an extra method call
to 
 achieve what used to be achievable in place on the original string. Which is
now 
 immutable, but I'll never need it again. 
 
 Of course, on these short 1-off strings it doesn't matter a hoot. But when the 
 strings are 200 to 500 characters a pop and there are 20,000,000 of them. It 
 matters.
 
 Did I suggest this was an optimisation?

You bring up a good point. On a tiny example such as yours, where you can see everything that is going on at a glance, such as where strings come from and where they are going, there isn't any point to immutable strings. You're right about that. The problems start happening as the complexity rises. Strings get passed around, stored, modified, etc. It's real easy to lose track of who owns a string, who else has references to the string, who has rights to change the string and who doesn't. For example, you're changing the char[][] passed in to main(). What if one of those strings is a literal in the read-only data section? So what happens is code starts defensively making copies of the string "just in case." I'll argue that in a complex program, you'll actually wind up making far more copies than you will with invariant strings.
Apr 28 2008
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Walter Bright" wrote
 p9e883002 sneakemail.com wrote:
 So, that's two copies of the string, plus a slice, plus an extra method 
 call to achieve what used to be achievable in place on the original 
 string. Which is now immutable, but I'll never need it again. Of course, 
 on these short 1-off strings it doesn't matter a hoot. But when the 
 strings are 200 to 500 characters a pop and there are 20,000,000 of them. 
 It matters.

 Did I suggest this was an optimisation?

You bring up a good point. On a tiny example such as yours, where you can see everything that is going on at a glance, such as where strings come from and where they are going, there isn't any point to immutable strings. You're right about that. The problems start happening as the complexity rises. Strings get passed around, stored, modified, etc. It's real easy to lose track of who owns a string, who else has references to the string, who has rights to change the string and who doesn't. For example, you're changing the char[][] passed in to main(). What if one of those strings is a literal in the read-only data section? So what happens is code starts defensively making copies of the string "just in case." I'll argue that in a complex program, you'll actually wind up making far more copies than you will with invariant strings.

I agree that immutable strings can be valuable. That's why I think it's important to have a version of toupper that uses invariant strings because you can make more assumptions about when to make copies. But why shouldn't there be a version that does the same thing with mutable or const strings? Why should a developer be forced to always use invariant strings when the optimizations and multithreading benefits that come with only using invariant strings may not be more important for a particular program than being able to modify a string? I should still be able to use toupper on mutable strings as well... -Steve
Apr 28 2008
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Steven Schveighoffer wrote:
 I agree that immutable strings can be valuable.  That's why I think it's 
 important to have a version of toupper that uses invariant strings because 
 you can make more assumptions about when to make copies.  But why shouldn't 
 there be a version that does the same thing with mutable or const strings? 
 Why should a developer be forced to always use invariant strings when the 
 optimizations and multithreading benefits that come with only using 
 invariant strings may not be more important for a particular program than 
 being able to modify a string?  I should still be able to use toupper on 
 mutable strings as well...

That's why I agreed with Janice on making a stringbuffer module that operates on mutable strings. It's easier than arguing about it, and it doesn't hurt to have such a package. And I suspect that after using it for a while, people will naturally evolve towards using all invariant strings.
Apr 28 2008
next sibling parent reply Lars Ivar Igesund <larsivar igesund.net> writes:
Walter Bright wrote:

 Steven Schveighoffer wrote:
 I agree that immutable strings can be valuable.  That's why I think it's
 important to have a version of toupper that uses invariant strings
 because
 you can make more assumptions about when to make copies.  But why
 shouldn't there be a version that does the same thing with mutable or
 const strings? Why should a developer be forced to always use invariant
 strings when the optimizations and multithreading benefits that come with
 only using invariant strings may not be more important for a particular
 program than
 being able to modify a string?  I should still be able to use toupper on
 mutable strings as well...

That's why I agreed with Janice on making a stringbuffer module that operates on mutable strings. It's easier than arguing about it, and it doesn't hurt to have such a package. And I suspect that after using it for a while, people will naturally evolve towards using all invariant strings.

After working with Java for quite some time, I have naturally drifted from using invariant strings to stringbuffers. -- Lars Ivar Igesund blog at http://larsivi.net DSource, #d.tango & #D: larsivi Dancing the Tango
Apr 28 2008
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Lars Ivar Igesund wrote:
 After working with Java for quite some time, I have naturally drifted from
 using invariant strings to stringbuffers.

Java strings lack slicing, so they're crippled anyway. I believe that slicing is one of those paradigm-shifting features, so I am not making an irrelevant point.
Apr 28 2008
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Walter Bright" wrote
 Lars Ivar Igesund wrote:
 After working with Java for quite some time, I have naturally drifted 
 from
 using invariant strings to stringbuffers.

Java strings lack slicing, so they're crippled anyway. I believe that slicing is one of those paradigm-shifting features, so I am not making an irrelevant point.

Java's String.substring(start, last) works just like slicing... Not that I don't love D slicing above calling a function, but saying that Java doesn't have slicing is completely false. Where they lack is in the support of mutable strings, and especially having strings be treated as native arrays. D excels in those areas. -Steve
Apr 28 2008
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Steven Schveighoffer wrote:
 Java's String.substring(start, last) works just like slicing...

No it doesn't. It makes a copy (I don't know if this is true of *all* versions of Java).
Apr 28 2008
next sibling parent Ary Borenszweig <ary esperanto.org.ar> writes:
Walter Bright escribió:
 Steven Schveighoffer wrote:
 Java's String.substring(start, last) works just like slicing...

No it doesn't. It makes a copy (I don't know if this is true of *all* versions of Java).

A String holds a char[], the "start" in it and its "length". A substring just creates another String instance with "start" and "length" changed. So it makes a new String, but the underlying char[] remains the same.
Apr 28 2008
prev sibling parent reply Robert Fraser <fraserofthenight gmail.com> writes:
Walter Bright wrote:
 Steven Schveighoffer wrote:
 Java's String.substring(start, last) works just like slicing...

No it doesn't. It makes a copy (I don't know if this is true of *all* versions of Java).

Java 6's String.substring method (JDK 1.6.0_04, 64-bit Windows):

    public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > count) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        if (beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
        }
        return ((beginIndex == 0) && (endIndex == count)) ? this :
            new String(offset + beginIndex, endIndex - beginIndex, value);
    }

The important part is new String(offset + beginIndex, endIndex - beginIndex, value), which does indeed do a "slice" of sorts (that is, it returns a string with the same char array backing it with a new offset and length). No copying of data is done.
Apr 28 2008
next sibling parent Sean Kelly <sean invisibleduck.org> writes:
== Quote from Robert Fraser (fraserofthenight gmail.com)'s article
 Walter Bright wrote:
 Steven Schveighoffer wrote:
 Java's String.substring(start, last) works just like slicing...

No it doesn't. It makes a copy (I don't know if this is true of *all* versions of Java).

    public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > count) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        if (beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
        }
        return ((beginIndex == 0) && (endIndex == count)) ? this :
            new String(offset + beginIndex, endIndex - beginIndex, value);
    }

The important part is new String(offset + beginIndex, endIndex - beginIndex, value), which does indeed do a "slice" of sorts (that is, it returns a string with the same char array backing it with a new offset and length). No copying of data is done.

Right. The issue in Java is that the String wrapper class is still allocated on the heap, so dynamic memory allocation is still occurring. D, on the other hand, uses a fat reference, so creating a slice doesn't touch the heap at all.

Sean
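
The "fat reference" point can be checked directly, since a D slice is just a pointer and a length. The sketch below assumes a current D compiler; `sharesMemory` is a hypothetical helper, not a library function.

```d
import std.stdio : writeln;

// True when `part` lies entirely inside the memory that `whole` refers to.
bool sharesMemory(const(char)[] whole, const(char)[] part)
{
    return part.ptr >= whole.ptr
        && part.ptr + part.length <= whole.ptr + whole.length;
}

void main()
{
    string s = "hello world";
    string tail = s[6 .. $];          // new pointer/length pair, no allocation

    assert(tail.ptr == s.ptr + 6);    // points into the same characters
    assert(tail.length == 5);
    assert(sharesMemory(s, tail));
    assert(!sharesMemory(s, tail.dup)); // .dup is what actually copies

    writeln(tail);
}
```

Only the explicit .dup produces a second buffer; the slice itself is two machine words held by value.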
Apr 28 2008
prev sibling next sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Robert Fraser wrote:
 The important part is new String(offset + beginIndex, endIndex - 
 beginIndex, value) which does indeed do a "slice" of sorts (that is, it 
 returns a string with the same char array backing it with a new offset 
 and length). No copying of data is done.

Yes, you are right. I was wrong. But Java is still new'ing a new instance of String for each slice. And it still uses two levels of indirection to get to the string data.
Apr 28 2008
prev sibling parent Christopher Wright <dhasenan gmail.com> writes:
Robert Fraser wrote:
 The important part is new String(offset + beginIndex, endIndex - 
 beginIndex, value) which does indeed do a "slice" of sorts (that is, it 
 returns a string with the same char array backing it with a new offset 
 and length). No copying of data is done.

Sun has it right. GNU Classpath has it wrong and copies the data every time.
Apr 28 2008
prev sibling parent Lars Ivar Igesund <larsivar igesund.net> writes:
Walter Bright wrote:

 Lars Ivar Igesund wrote:
 After working with Java for quite some time, I have naturally drifted
 from using invariant strings to stringbuffers.

Java strings lack slicing, so they're crippled anyway. I believe that slicing is one of those paradigm-shifting features, so I am not making an irrelevant point.

I agree that Java strings are crippled, but considering that String is easier to use there than StringBuffer, I certainly would need good reasons to prefer the latter? And I have. Your point about slicing may not be irrelevant, but the kickass-ness of the feature only truly comes to its right when combined with non-allocating string operations. -- Lars Ivar Igesund blog at http://larsivi.net DSource, #d.tango & #D: larsivi Dancing the Tango
Apr 28 2008
prev sibling parent Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Walter Bright wrote:
 Steven Schveighoffer wrote:
 I agree that immutable strings can be valuable.  That's why I think 
 it's important to have a version of toupper that uses invariant 
 strings because you can make more assumptions about when to make 
 copies.  But why shouldn't there be a version that does the same thing 
 with mutable or const strings? Why should a developer be forced to 
 always use invariant strings when the optimizations and multithreading 
 benefits that come with only using invariant strings may not be more 
 important for a particular program than being able to modify a 
 string?  I should still be able to use toupper on mutable strings as 
 well...

That's why I agreed with Janice on making a stringbuffer module that operates on mutable strings. It's easier than arguing about it, and it doesn't hurt to have such a package. And I suspect that after using it for a while, people will naturally evolve towards using all invariant strings.

"people will naturally evolve towards using all invariant strings." Oh please. This whole discussion between "Me here" and Walter was always occurring under the notion that one either has to use all mutable strings, or all invariant strings, which is a silly idea. Use what is right for what you are trying to do! The original post code was a clear-cut example of invariant misuse. If you are going to make one or several different mutations to a string, do not use invariant, use mutable. The fact that there isn't a mutable/in-place tolower has no bearing on the const/invariant system (only on the Phobos library design). So if you had any quarrel, it wasn't with D's immutability system, but with library design (which Walter already said he plans to fix... at least on what std.string is concerned). And Walter, people won't "naturally evolve towards using all invariant strings" (nor they should). If I have a function where I'm going to perform a series of changes to a string, I'm not going to dup them with each change just to say "How cute, I'm using invariant all the way!". I'll do all the changes on a mutable string, and they return either a mutable, const, or invariant string, as appropriate to what makes sense in the code. -- Bruno Medeiros - Software Developer, MSc. in CS/E graduate http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Apr 29 2008
prev sibling parent reply "Me Here" <p9e883002 sneakemail.com> writes:
Walter Bright wrote:

p9e883002 sneakemail.com wrote:

Did I suggest this was an optimisation?

You bring up a good point.

Sorry to have provoked you Walter, but thanks for your reply.
On a tiny example such as yours, where you can see everything that is 
going on at a glance, such as where strings come from and where they are 
going, there isn't any point to immutable strings. You're right about that.

Well obviously the example was trivial to concentrate attention upon the issue I was having.
  It's real easy to lose track of who owns a string, who else has references to
the string, who has rights to change the string and who doesn't.

The keyword in there is "who". The problem is that you are pessimising the entire language, once rightly famed for its performance, for *all* users, for the notional convenience of those few writing threaded applications.

Now don't go taking that the wrong way. In other circles, I am known as "Mr. Threading". At least for my advocacy of them, if not my expertise. Though I have been using threads for a relatively long time, going way back to pre-1.0 OS/2 (then known internally as CP/DOS). Only mentioned to show I'm not in the "thread is spelt f-o-r-k" camp.
For example, you're changing the char[][] passed in to main(). What if one 
of those strings is a literal in the read-only data section?

Okay. So that begs the question of how does runtime external data end up in a read-only data section? Of course, it can be done, but that then begs the question: why?

But let's ignore that for now and concentrate on the development of my application that wants to mutate one or more of those strings. The first time I try to mutate one, I'm going to hit an error, either compile time or runtime, and immediately know, assuming the error message is reasonably understandable, that I need to make a copy of the immutable string into something I can mutate. A quick, *single* dup, and I'm away and running.

Provided that I have the tools to do what I need, that is. In this case, and the entire point of the original post, that means a library of common string manipulation functions that work on my good old fashioned char[]s without my needing to jump through the hoops of neo-orthodoxy to use them.

But, as I tried to point out in the post to which you replied, the whole 'args' thing is a red herring. It was simply a convenient source of non-compile-time data. I couldn't get the std.stream example to compile. Apparently due to a bug in the v2 libraries--see elsewhere.

In this particular case, I turned to D in order to manipulate 125,000,000 x 500 to 2000 byte strings. A dump of an inverted index DB. I usually do this kinda stuff in a popular scripting language, but that proved to be rather too slow for this volume of data.

Each of those records needs to go through multiple mutations. From uppercasing of certain fields; the complete removal of certain characters within substantial subsets of each record; to the recalculation and adjustment of an embedded hex digest within each record to reflect the preceding changes. All told, each record may go through anything from 5 to 300 separate mutations.

Doing this via immutable buffers is going to create scads and scads of short-lived, immutable sub-elements that will just tax the GC to hell and impose unnecessary and unacceptable time penalties on the process. And I almost certainly will have to go through the process many times before I get the data in the ultimate form I need.
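
As a rough illustration of the workload described above, here is a minimal sketch of several edits applied to one mutable buffer after a single dup. It is written against current D rather than the 2008 compiler, the helper names `upperInPlace` and `removeChar` and the record contents are hypothetical, and the uppercasing is ASCII-only.

```d
import std.ascii : toUpper;
import std.stdio : writeln;

// Uppercase a mutable slice in place (ASCII only); nothing is allocated.
void upperInPlace(char[] buf)
{
    foreach (ref c; buf)
        c = cast(char) toUpper(c);
}

// Remove every occurrence of `unwanted` by compacting the buffer in place.
// Returns the shortened slice, which still aliases the original memory.
char[] removeChar(char[] buf, char unwanted)
{
    size_t kept = 0;
    foreach (c; buf)
    {
        if (c != unwanted)
            buf[kept++] = c;
    }
    return buf[0 .. kept];
}

void main()
{
    // Hypothetical record; .dup is the single copy that makes it mutable.
    char[] record = "key-01:some-field".dup;

    upperInPlace(record[0 .. 3]);      // uppercase one field
    record = removeChar(record, '-');  // strip a character everywhere
    record[0 .. 3] = "XYZ";            // overwrite a slice of equal length

    writeln(record);
}
```

Every step reuses the same allocation, so the number of mutations per record does not translate into garbage for the collector.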
So what happens is code starts defensively making copies of the string 
"just in case." I'll argue that in a complex program, you'll actually wind 
up making far more copies than you will with invariant strings.
[from another post] I bet that, though, after a while they'll evolve to 
eschew it in favor of immutable strings. It's easier than arguing about it

You are so wrong here. I spent 2 of the worst years of my coding career working in Java, and ended up fighting it all the way. Whilst some of that was due to their sudden re-invention of major parts of the system libraries in completely incompatible ways when the transition from (from memory) 1.2 to 1.3 occurred--and being forced to make the change because of the near total abandonment of support or bug fixing for the 'old libraries'--another big part of the problem was the endless complexities involved in switching between the String type and the StringBuffer type.

Please learn from history. Talk to (experienced) Java programmers. I mean real working stiffs, not OO-purists from academia. Preferably some that have experience of other languages also. It took until v1.5 before the performance of Java--and the dreaded GC pregnant pause--finally reached a point where Java performance for manipulating large datasets was both reasonable, and more importantly, reasonably deterministic.

Don't make their mistakes over. Too many times in the last thirty years I've seen promising, pragmatic software technologies tail off into academic obscurity because the primary motivators suddenly "got religion". Whether OO dogma or functional purity or whatever other flavour of neo-orthodoxy became flavour du jour, the assumption that "they'll see the light eventually" has been the downfall of many a promising start.

Just as the answer to the occasional hit-and-run death is not banning cars, so fixing unintentional aliasing in threaded applications does not lie in forcing all character arrays to be immutable. For one reason, it doesn't stop there. Character arrays are just arrays of numbers. Exactly the same problems arise with arrays of integers, reals, associative arrays, etc. Imagine the costs of duplicating an entire hash every time you add a new key or alter a value. The penalties grow exponentially with the size of the hash (array of ints, longs, reals ...).

And before you reject this notion on the basis that "I'd never do that", what's the difference? Are strings any more vulnerable to the problems invariance is meant to tackle than these other datatypes?

Try manipulating large datasets--images, DNA data, signal processing, finite element analysis; any of the types of applications for which multi-threading isn't just a way to allow the program to do something useful while the user decides which button to click--in any of the "referentially transparent" languages that are concurrency capable and see the hoops you have to leap through to achieve anything like decent performance. Eg. Haskell Unsafe* library routines (Basically, abandon referential transparency for this data so that we can get something done in a reasonable time frame!).

Look for "If you can match 1-core C speed using 4-core Haskell parallelism without "unsafe pseudo-C in Haskell" trickery, I will be impressed. ..." in the following article: http://reddit.com/r/programming/info/61p6f/comments/

The abandonment or deprecation of lvalue slices on string types is the thin end of the wedge toward referential transparency and despite all the academic hype and impressive (small scale) demos of the 'match made in heaven' that is 'referential transparency & concurrency', try to seek out real-world examples of the combination running in real-world environments. Ie. Where someone other than the tax-payer of whatever country is paying for the development, and the time pressure to obtain the results are a little more demanding than Thesis submission date and you'll find them very conspicuous by their absence.

Such ideas look great on paper, in the heady world of ideal Turing Machines with unlimited length tapes (unbounded memory). But once you bring them back to the real world of finite RAM, fragmentable heaps and GC, they become impractical. Unworkable for real data sets in real time.

Don't feel the need to argue this on-forum. If it hasn't persuaded you that forcing invariance upon one datatype, through providing a string library that only works with invariant strings, will do little to address the problems it attempts to solve, then I doubt further discussion will.

Please return to the pragmatism that so stood out in your early visions for D and abandon this folly before, as with so many of the follies of the gentleman academic of yore, it becomes a life-long quest ending up as a memorial or tombstone.

Cheers, b.
--
Apr 28 2008
next sibling parent Sean Kelly <sean invisibleduck.org> writes:
== Quote from Me Here (p9e883002 sneakemail.com)'s article
 Don't feel the need to argue this on-forum. If it hasn't persuaded you
 that forcing invariance upon one datatype, through providing a string
 library that only work with invariant strings, will do little to address
 the problems it attempts to solve, then I doubt further discussion will.

There's always Tango :p
 Please return to the pragmatism that so stood out in your early visions
 for D and abandon this folly before, as with so many of the follies of the
 gentleman academic of yore, it becomes a life-long quest ending up as a
 memorial or tombstone.

As a point of interest, this quote is at the top of the DigitalMars D page:

"It seems to me that most of the "new" programming languages fall into one of two categories: Those from academia with radical new paradigms and those from large corporations with a focus on RAD and the web. Maybe it's time for a new language born out of practical experience implementing compilers." -- Michael

Sean
Apr 28 2008
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Me Here wrote:
 Just as the answer to the occasional hit-and-run death is not banning 
 cars, so fixing unintentional aliasing in threaded applications does not 
 lie in forcing all character arrays to be immutable.

D does not force all character arrays to be immutable. You can use mutable ones by declaring them as:

    char[]

Reference types all come in 3 flavors: mutable, read-only-view-of (i.e. const) and invariant.
Apr 28 2008
parent reply "Me Here" <p9e883002 sneakemail.com> writes:
Walter Bright wrote:

Me Here wrote:
Just as the answer to the occasional hit-and-run death is not banning  
cars, so fixing unintentional aliasing in threaded applications does not  
lie in forcing all character arrays to be immutable.

D does not force all character arrays to be immutable. You can use mutable ones by declaring them as:

    char[]

Reference types all come in 3 flavors: mutable, read-only-view-of (i.e. const) and invariant.

Well no, but having the string libraries only accept and return invariant strings amounts to much the same thing.

I'm disappointed that's the only point from my post worthy of reaction :(
--
Apr 28 2008
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Me Here wrote:
 Walter Bright wrote:
 
 Me Here wrote:
 Just as the answer to the occasional hit-and-run death is not 
 banning  cars, so fixing unintentional aliasing in threaded 
 applications does not  lie in forcing all character arrays to be 
 immutable.

D does not force all character arrays to be immutable. You can use mutable ones by declaring them as:

    char[]

Reference types all come in 3 flavors: mutable, read-only-view-of (i.e. const) and invariant.

Well no, but having the string libraries only accept and return invariant strings amounts to much the same thing.

I agreed with Janice's proposal to create a parallel set of routines that worked on mutable strings.
 I'm disappointed that's the only point from my post worthy of reaction :(

It appeared to me to be based on the assumption that D forced all character arrays to be invariant.
Apr 28 2008
parent reply "Me Here" <p9e883002 sneakemail.com> writes:
Walter Bright wrote:

 
 I'm disappointed that's the only point from my post worthy of reaction :(

It appeared to me to be based on the assumption that D forced all character arrays to be invariant.

Well no. It also went on to counter the idea that we're all going to come around to your way of thinking on this in short order. And to attempt to dispel the idea that the provision of immutable strings, without doing the same for all the other datatypes, is going to fix anything major.

The exact same problems you describe for character arrays exist for int arrays and uint arrays and....hashes of every flavour. Fixing one, if fixing them is what this does, without also fixing all the others, just moves the goal posts (a little).

If a piece of code needs to know that the subject of a reference (string, int array, hash, whatever), isn't going to change, it is (and should be) *its responsibility* to ensure that--by taking a private copy. Burdening all code with the costs of immutability just in case someone is vulnerable to its mutation, *and* is too lazy to take a copy, seems like making everyone wear condoms in case someone might have sex. And doing it for just one type of array, when they all suffer from the same problem, doesn't seem likely to address the problems of unwanted pregnancies.
 I agreed with Janice's proposal to create a parallel set of routines that
worked on mutable strings.

Sure. Sometime soon we will have a mutable string capable library again, and then we'll see how beneficial immutable strings really are on the basis of how many people make use of them. But that doesn't address the issue of the salience of the reasoning for having them in the first place. Or the costs of using them in terms of stack fragmentation, additional GC runs, destruction of cache coherency, etc. etc. etc. --
Apr 28 2008
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Me Here wrote:
 If a piece of code needs to know that the subject of a reference
 (string, int array, hash, whatever), isn't going to change, it is
 (and should be) *its responsibility* to ensure that--by taking a
 private copy.

There are two ways of doing it. One is COW, where those who make the change make the copy. The other way doesn't have a name, but it's making a copy "just in case" someone else might mutate it. I think you're proposing the latter. Invariant strings is a way of enforcing COW, rather than relying on documentation. There's no doubt you can make JIC work successfully. I've used it myself for decades. But I always find myself expending effort trying to optimize away those copies, and so find it more productive to go the other way and use COW. While I am comfortable using COW with mutable strings, the many many discussions of it in this forum made it clear that most would like to have some compiler help with it. Invariant strings fit the bill nicely.
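
To make the two strategies concrete, here is a minimal sketch of both. It is written against current D, where the `invariant` of this thread is spelled `immutable` and `string` is an alias for `immutable(char)[]`; the names `upperCow` and `Holder` are hypothetical, and the case conversion is ASCII-only.

```d
import std.ascii : isLower, toUpper;
import std.exception : assumeUnique;

// Copy-on-write: whoever changes the data pays for the copy, and only if a
// change is actually needed. Immutable input means callers can share freely.
string upperCow(string s)
{
    foreach (i, c; s)
    {
        if (isLower(c))
        {
            char[] copy = s.dup;              // first lowercase letter: copy once
            foreach (ref ch; copy[i .. $])
                ch = cast(char) toUpper(ch);
            return assumeUnique(copy);        // no other reference to copy exists
        }
    }
    return s;                                 // nothing to change, nothing copied
}

// "Just in case": the party that needs the data to stay put copies it up
// front, because a mutable char[] may be changed by anyone holding it.
struct Holder
{
    char[] name;

    this(const(char)[] source)
    {
        name = source.dup;                    // defensive private copy
    }
}

void main()
{
    string already = "ABC";
    assert(upperCow(already) is already);     // same memory, no copy made
    assert(upperCow("abC") == "ABC");

    char[] buf = "abc".dup;
    auto h = Holder(buf);
    buf[0] = 'X';                             // later mutation by the owner
    assert(h.name == "abc");                  // the private copy is unaffected
}
```

In the first form the type system guarantees the input cannot change underneath the caller, so the copy happens only on modification; in the second the copy happens whether or not anyone ever mutates the buffer.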
Apr 28 2008
parent reply "Me Here" <p9e883002 sneakemail.com> writes:
Walter Bright wrote:

There are two ways of doing it. One is COW, where those who make the 
change make the copy. The other way doesn't have a name, but it's making a 
copy "just in case" someone else might mutate it. I think you're proposing 
the latter. Invariant strings is a way of enforcing COW, rather than 
relying on documentation.

There's no doubt you can make JIC work successfully. I've used it myself 
for decades. But I always find myself expending effort trying to optimize 
away those copies, and so find it more productive to go the other way and 
use COW.

While I am comfortable using COW with mutable strings, the many many 
discussions of it in this forum made it clear that most would like to have 
some compiler help with it. Invariant strings fit the bill nicely.

Okay Walter, this will be my last word on the subject. When I posted the headpost of this thread, I had no idea what I was getting into. I've since taken the time to catch up on some of the history, along with that of the Phobos/Tango debate. See below.

As I see it, both mechanisms are "just in case". The difference is that with invariants and COW, everyone who /doesn't/ need immutability has to copy so that the one person who does need it, if they indeed exist at all which we have no way of knowing, doesn't have to copy. The other way, the one person who knows they need immutability has to copy, and everyone else simply ignores the issue.

If you're given a reference and you need it not to change, take a copy and hide it away. Then it cannot. If you're given a reference and you don't care if it changes (or you want to be apprised of any changes), use it, keep it or throw it away.

Expecting everyone else to take extra precautions, always, "just in case", so that you don't have to take precautions even when you know you need to, seems the height of selfishness.

STM (from elsewhere)
This whole thing of invariance and concurrency seems to be aimed at
enabling the use of COW.


Wouldn't that be more of a copy-swap thing? And isn't STM copy-swap at
its core?

I'm not sure that I follow the question in context, or the meaning of "copy-swap".

STM is an alternative to locking for concurrency control. Essentially, each reader of known (marked) shared state gets a copy of the state, and an internal copy is made. If that reader later attempts to write back to the shared state, its current value is compared against the internal copy taken when read. If they are disparate, the code that is attempting to write gets rolled back to the read point and is given the updated value (and another internal copy is taken). Lather, rinse, repeat until the copy and current values are the same, then commit the change and continue.

Fairly expensive, and only works for code that can be rolled back (ie. referentially transparent code). Useless for anything that interacts with the outside world. Eg. writes to the screen, or a file, or the file system, or reads from a non-rewindable source like a port or socket or the terminal.

Efficient if you live in a referentially transparent world--all data exists at compile time; no interaction with the outside world. Next to useless otherwise. You still need locking or some other mechanism to deal with external state.

If that describes copy-swap then yes. Else no :)

Phobos vs. Tango

I definitely don't want the dead weight of pointless OO wrappers or deeply nested hierarchies. Nor the "everything must be OO" philosophy. Once I regain access to std.string for my char[]s, (and a simple, expectation conformant rand() function :), I'll be happy.

Till then, I'll get outta yer hair and go back to trying to process my 140GB of data using D1.

(Which is a shame because I really like some of the language changes for D2. The extension to foreach for processing files looks cool. I'd also vote for the convergence of for/foreach if that was possible without moving away from a context-free grammar. I haven't had occasion to explore the laziness facilities yet, but they sound cool. Ditto the templating.)

Despite our difference on the issue above, please add my goodwill and plaudits to your trophy box for your vision and provision of D.

Cheers, b.
--
Apr 29 2008
next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Me Here wrote:
 If that describes copy-swap then yes. Else no :)

copy-swap is what lock free algorithms rely on for updating a data structure. It's at the root of STM, and even has its own TLA, CAS (Copy And Swap).
Apr 29 2008
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
== Quote from Walter Bright (newshound1 digitalmars.com)'s article
 Me Here wrote:
 If that describes copy-swap then yes. Else no :)

 copy-swap is what lock free algorithms rely on for updating a data structure. It's at the root of STM, and even has its own TLA, CAS (Copy And Swap).

I believe CAS actually stands for "compare and swap" or "compare and set" depending on who you talk to. RCU is probably a more popular algorithm for copy and swap--it's used in the Linux kernel quite a bit. It stands for "read, copy, update," I believe. Sean
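
For readers who have not met the primitive, here is a minimal sketch of a compare-and-swap retry loop, using `cas` and `atomicLoad` from current D's `core.atomic` module (an assumption about the toolchain, since the thread predates it; the function name `increment` is hypothetical).

```d
import core.atomic : atomicLoad, cas;

// Lock-free increment built on compare-and-swap: read the current value,
// compute the new one, and publish it only if nobody changed it meanwhile.
int increment(shared(int)* counter)
{
    int seen;
    do
    {
        seen = atomicLoad(*counter);
    }
    while (!cas(counter, seen, seen + 1));    // retry if another thread won
    return seen + 1;
}

void main()
{
    shared int hits = 0;
    assert(increment(&hits) == 1);
    assert(increment(&hits) == 2);
    assert(atomicLoad(hits) == 2);
}
```

The loop never blocks; a thread that loses the race simply re-reads and tries again, which is the "compare" half that the name refers to.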
Apr 29 2008
parent reply "Me Here" <p9e883002 sneakemail.com> writes:
Sean Kelly wrote:

 == Quote from Walter Bright (newshound1 digitalmars.com)'s article
 Me Here wrote:
 If that describes copy-swap then yes. Else no :)

 copy-swap is what lock free algorithms rely on for updating a data structure. It's at the root of STM, and even has its own TLA, CAS (Copy And Swap).

I believe CAS actually stands for "compare and swap" or "compare and set" depending on who you talk to. RCU is probably a more popular algorithm for copy and swap--it's used in the Linux kernel quite a bit. It stands for "read, copy, update," I believe. Sean

From the literature I found, CAS is (was originally) the name of the opcode used on a Sun microprocessor to conditionally and atomically swap the contents of two words of memory (or maybe memory and register). It also mentions a CASX opcode, and a LL/SC (Load Linked / Store Conditional) pairing that can be used as alternatives.

Cheers, b.
--
Apr 29 2008
parent Sean Kelly <sean invisibleduck.org> writes:
== Quote from Me Here (p9e883002 sneakemail.com)'s article
 Sean Kelly wrote:
 == Quote from Walter Bright (newshound1 digitalmars.com)'s article
 Me Here wrote:
 If that describes copy-swap then yes. Else no :)

 copy-swap is what lock free algorithms rely on for updating a data structure. It's at the root of STM, and even has its own TLA, CAS (Copy And Swap).

I believe CAS actually stands for "compare and swap" or "compare and set" depending on who you talk to. RCU is probably a more popular algorithm for copy and swap--it's used in the Linux kernel quite a bit. It stands for "read, copy, update," I believe. Sean

 From the literature I found, CAS is (was originally) the name of the opcode used on a Sun microprocessor to conditionally and atomically swap the contents of two words of memory (or maybe memory and register). It also mentions a CASX opcode, and a LL/SC (Load Linked / Store Conditional) pairing that can be used as alternatives.

Yeah, LL/SC is pretty cool. The hardware transactional memory proposals I've seen are like LL/SC on steroids. Bit more flexible than CAS, but either works. Sean
Apr 29 2008
prev sibling parent "Me Here" <p9e883002 sneakemail.com> writes:
Walter Bright wrote:

 Me Here wrote:
 If that describes copy-swap then yes. Else no :)

copy-swap is what lock free algorithms rely on for updating a data structure. It's at the root of STM, and even has its own TLA, CAS (Copy And Swap).

Ah! As in compare & exchange (cmpxchg & cmpxchg8b) x86 opcodes. I wasn't thinking at the m/code level. Cheers, b. --
Apr 29 2008
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
== Quote from Me Here (p9e883002 sneakemail.com)'s article
 Phobos vs. Tango
 I definitely don't want the dead weight of pointless OO wrappers or deeply
 nested hierarchies. Nor the "everything must be OO" philosophy.
 Once I regain access to std.string for my char[]s, (and a simple,
 expectation conformant rand() function :), I'll be happy.

Please don't discount Tango based on what has been said about it in this forum. I know for a fact that Walter, for example, has never even looked at Tango (or he hadn't as of a few weeks ago anyway). In truth, the percentage of classes to functions in Tango is roughly the same as in Phobos... Tango is just a much larger library. If you're interested in algorithms and string operations, I suggest looking at tango.core.Array and tango.text.*. The former is basically C++'s <algorithm> retooled for D arrays, and the latter holds all the string-specific routines in Tango. Sean
Apr 29 2008
parent reply "Me Here" <p9e883002 sneakemail.com> writes:
Sean Kelly wrote:

 == Quote from Me Here (p9e883002 sneakemail.com)'s article
 Phobos vs. Tango
 I definitely don't want the dead weight of pointless OO wrappers or deeply
 nested hierarchies. Nor the "everything must be OO" philosophy.
 Once I regain access to std.string for my char[]s, (and a simple,
 expectation conformant rand() function :), I'll be happy.

Please don't discount Tango based on what has been said about it in this forum. I know for a fact that Walter, for example, has never even looked at Tango (or he hadn't as of a few weeks ago anyway). In truth, the percentage of classes to functions in Tango is roughly the same as in Phobos... Tango is just a much larger library. If you're interested in algorithms and string operations, I suggest looking at tango.core.Array and tango.text.*. The former is basically C++'s <algorithm> retooled for D arrays, and the latter holds all the string-specific routines in Tango. Sean

The primary basis of my immediate decision regarding Tango was its incompatibility with Phobos as outlined in http://www.d-riven.com/index.cgi?tango-phobos (and several other first page hits when googling for "D Tango Phobos").

Beyond that, I'm in favour of OO only when it truly benefits me. That is, when it manages state that I would otherwise *have* to manage myself. That, for example, does not mean simply substituting an object handle for an OS handle. Nor caching of derived values unless their derivation is truly expensive. Nor the use of getters and setters to avoid direct manipulation of attributes, unless there is some genuine value-add from doing so. OO-dogma that they will isolate the library from speculative future changes in the underlying OS calls (that have been fixed in stone for 1 or 4 decades or more) do not cut much ice with me.

I'm also not fond of all-in-one library packaging. Seems to me that there is enough information in the source code to allow libraries to be packaged as discrete dlls/sos and to only statically link against those required. But that may be a tool chain problem rather than anything to do with Tango.

It should be possible to substitute one implementation of a std.* library for another, without it being an all or nothing change. I should be able to mix'n'match between implementations of std.* packages. For example, with the std.string problem I've been having. If I use

    import std.string;
    char[] a = readln();
    a.toupper();

it should work. If I do

    import std.string.immutable;

it wouldn't.

One of the things that forced me to go away from D a couple of years ago was the ever changing state of the libraries. Not the internal implementations or occasional bugs, but the constantly changing interface definitions. It becomes impractical to develop a major project when you're constantly rewriting major chunks of code to accommodate the latest set of group think on the best way to package the OS and "C lib" functionality.

Back then, I put it down to the necessary gestation of a new language, and moved away to get my project done. I've now come back and find that the same situation exists. The answer to an essentially trivial problem is to write an entire new library. Or rather, since the library I need was already a part of Phobos with D 0.-something, resurrect an old library.

And that's the most worrying thing of all. The removal of the existing library from Phobos because the main proponents suddenly drank the Kool-Aid of Invariant strings--especially for reasoning that I still find entirely specious--does not bode well for ongoing stability.

I had hoped that during my two years away at least the interfaces would have become standardised, even if the implementations varied from release to release. But if whole chunks of functionality can suddenly disappear from the library, at the same time as major new chunks of very desirable functionality are added to the language, on the whim of 1 or 4 major players getting religion, then I'm really not sure that D is, or will ever be, ready for anything other than academic exploration of compiler technology.

Reading that back, the independence of Tango begins to be more attractive, even if I have a distaste for the "everything must be OO" philosophy that (apparently) underlies it. Maybe I'll pull a copy and look for myself.

For my current needs, I'm just looking for C speed without having to manage my own memory. For the project I went away from D for 2 years ago, and came back hoping for stability, my own personal research project come memorial folly to be, I don't think D is yet ready for that. Maybe D1 if it doesn't become completely unsupported.

In the interim I've "done the rounds" of an amazing variety of languages. From the functional brigade, Haskell, OCaml, Mozart/Oz, Erlang et al. and various of the newer dynamic languages. They all have their attractions, but most are spoilt by some level of dogma. Haskell with its purity. Python with the whole significant whitespace thing. P6 with unix-first, and non-delivery. Mostly, the dynamics lack the low-level control and performance I need.

I've been seriously working with structured assembler to achieve the low level control and performance I want, but doing everything yourself just takes you off into far too many interesting side projects. Implementing your own memory management could occupy a lifetime; especially if you consider the possibility and advantages of using (wait for it) a segmented architecture.

Most older programmers' memories of segmented memory stem from the 16-bit Intel days and they (almost) universally eschew any notions of it now that 32-bit (and 64-bit) flat memory models are available. But there are some very interesting possibilities in combining 32-bit segments and virtual memory.

D is my last best hope of avoiding the assembler route and trying to do it all myself. Walter's pragmatism stood out in my early experience of both the language and library design--albeit that they kept changing;)--but I was really expecting (hoping) for greater stability by this point.

Ooh. Did I write all that? Still. It has persuaded me to at least go look at Phobos, even if it is done with a jaundiced eye. A stable, even if philosophically distasteful, implementation of the staples is better than a philosophically desirable but whimsically changing one.

Cheers for prompting me to re-think my blanket dismissal.

b.
--
Apr 29 2008
next sibling parent "Me Here" <p9e883002 sneakemail.com> writes:
A stable, even if philosophically distasteful, implementation of the 
staples is better than a philosophically desirable but
whimsically changing one.

I of course meant Tango. --
Apr 29 2008
prev sibling next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Me Here wrote:
 That, for example, does not mean simply substituting an object handle
 for an OS handle. Nor caching of derived values unless their
 derivation is truly expensive. Nor the use of getters and setters to
 avoid direct manipulation of attributes, unless there is some genuine
 value-add from doing so. OO-dogma that they will isolate the library
 from speculative future changes in the underlying OS calls (that have
 been fixed in stone for 1 or 4 decades or more) do not cut much ice
 with me.

I'm of the same opinion with that.
 One of the things that force me to go away from D a couple of years
 ago was the ever changing state of the libraries. Not the internal,
 implementations or occasional bugs, but the constantly changing
 interface definitions.

That's why D 1.0 was split off. It was done to provide a stable platform that only gets bug fixes.
Apr 29 2008
parent reply "Me Here" <p9e883002 sneakemail.com> writes:
Walter Bright wrote:

Me Here wrote:
That, for example, does not mean simply substituting an object handle
for an OS handle. Nor caching of derived values unless their
derivation is truly expensive. Nor the use of getters and setters to
avoid direct manipulation of attributes, unless there is some genuine
value-add from doing so. OO-dogma that they will isolate the library
from speculative future changes in the underlying OS calls (that have
been fixed in stone for 1 or 4 decades or more) do not cut much ice
with me.

I'm of the same opinion with that.
One of the things that force me to go away from D a couple of years
ago was the ever changing state of the libraries. Not the internal,
implementations or occasional bugs, but the constantly changing
interface definitions.

That's why D 1.0 was split off. It was done to provide a stable platform that only gets bug fixes.

Understood, but when I went to upgrade from my very old 1.x version and discovered there was a D2, I did look for an explanation (on the web site rather than in the forums) and came up short. I guess the clue was in the alpha status, but a few lines somewhere on the download page explaining the difference wouldn't go amiss.

As I also mentioned, the descriptions I found (whilst looking for the above) of the new D2 language features drew me to it. Without thinking through the implications, it strikes me that a segregation of the compilers from the runtimes would allow the mating of the D2 compiler with the D1 libraries?

The D1 libraries themselves would not use or benefit from the new D2 language features but it would allow applications access to those features whilst retaining the stability of D1 libraries.

But that probably entails extra work, as well as a not inconsiderable amount of careful consideration regarding the long term implications, so don't take it as a request. Just a notion in passing.

Anyway, I did promise to get outta yer hair, so...I'm gone.

Thanks again, b.
--
Apr 29 2008
parent reply Sean Kelly <sean invisibleduck.org> writes:
== Quote from Me Here (p9e883002 sneakemail.com)'s article
 As I also mentioned, the descriptions I found (whilst looking for the
 above) of the new D2 language features drew me to it. Without thinking
 through the implications, it strikes me that a segregation of the
 compilers from the runtimes would allow the mating of the D2 compiler
 with the D1 libraries?

See Tango, once again ;-) In fact, as things stand, the same runtime could be used for both 1.0 and 2.0 with a bit of work. I haven't done this with Tango mostly because I lack the time, but it's quite possible. Alternately, separate runtimes could be distributed and linked individually without pulling in the bulk of an entire standard library. The "Advanced Configuration Guide" I linked previously for Tango shows how to do it.

This is simply more flexibility than the typical user cares about or wants to deal with, so the sub-libraries are repackaged into a larger aggregate library for the default distribution. But if you build Tango locally you'll find that the sub-libraries are still built behind the scenes. From memory, the names are:

    libtango-rt-dmd.a   : Tango runtime for DMD
    libtango-rt-gdc.a   : Tango runtime for GDC
    libtango-gc-basic.a : Tango basic/default garbage collector
    libtango-cc-tango.a : "common code" for the Tango standard library

The runtime contains only the compiler runtime code, the GC library only the garbage collector, and the "common code" library contains user-facing code which actually needs to be linked into every D application--that being thread code and some error handling routines. If you want to build on a system with no multithreading, for example, simply stub out the 3 or so calls that this module exposes.
 The D1 libraries themselves would not use or benefit from the new D2
 language features but it would allow
 applications access to those features whilst retaining the stability of D1
 libraries.

Right. The greatest obstacle here is the const design, since the meaning of "const" is actually different between D 1.0 and 2.0, as well as the requirement that even code in version blocks must be syntactically correct. Thus to support a toString method that returns a const string, for example, the return value must be declared as an alias using a string mixin. Messy stuff, but it does work. Sean
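
A minimal sketch of the string-mixin alias trick described above. The alias name `cstring` and the class `Widget` are hypothetical and are not taken from Tango; the only assumption about the compiler is that D2 predefines the `D_Version2` version identifier.

```d
// Pick the string type once, at compile time.
// D_Version2 is predefined by D2 compilers and absent in D1.
version (D_Version2)
{
    // const(char)[] is not valid D1 syntax, and even a disabled version
    // block must parse, so the D2-only declaration is hidden in a string.
    mixin("alias const(char)[] cstring;");
}
else
{
    alias char[] cstring;
}

// Hypothetical class, for illustration only.
class Widget
{
    // The same source line serves both language versions.
    cstring describe()
    {
        return "widget";
    }
}

unittest
{
    auto w = new Widget;
    assert(w.describe() == "widget");
}
```

The method is deliberately not called toString here, since overriding Object.toString adds return-type constraints that differ between compiler versions.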
Apr 29 2008
parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Sean Kelly wrote:

 Right.  The greatest obstacle here is the const design, since the meaning
 of "const" is actually different between D 1.0 and 2.0, as well as the
 requirement that even code in version blocks must be syntactically
 correct.  Thus to support a toString method that returns a const string,
 for example, the return value must be declared as an alias using a string
 mixin.  Messy stuff, but it does work.
 

I personally think that for a big library like Tango, using a preprocessor would be a less painful way to go. The shipped versions of the lib would have the preprocessor already pre-run, so would be pure D code. --bb
Apr 29 2008
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
== Quote from Me Here (p9e883002 sneakemail.com)'s article
 Sean Kelly wrote:
 == Quote from Me Here (p9e883002 sneakemail.com)'s article
 Phobos vs. Tango
 I definitely don't want the dead weight of pointless OO wrappers or deeply
 nested hierarchies. Nor the "everything must be OO" philosophy.
 Once I regain access to std.string for my char[]s, (and a simple,
 expectation conformant rand() function :), I'll be happy.

Please don't discount Tango based on what has been said about it in this forum. I know for a fact that Walter, for example, has never even looked at Tango (or he hadn't as of a few weeks ago anyway). In truth, the percentage of classes to functions in Tango is roughly the same as in Phobos... Tango is just a much larger library. If you're interested in algorithms and string operations, I suggest looking at tango.core.Array and tango.text.*. The former is basically C++'s <algorithm> retooled for D arrays, and the latter holds all the string-specific routines in Tango.

with Phobos as outlined in http://www.d-riven.com/index.cgi?tango-phobos (And several other first page hits when googling for "D Tango Phobos")

For what it's worth, the "Tangobos" project is a port of Phobos to the Tango runtime, and a pre-packaged version is available on the Tango website. If you compare the source code with Phobos itself, you'll find that there are precious few diffs anywhere in the entire package, and the few that exist are mostly in std.thread. So this may be an option if you'd like to use both together.
 Beyond that, I'm in favour of OO only when it truly benefits me. That is,
 when it manages state that I would otherwise *have* to manage myself.
 That, for example, does not mean simply substituting an object handle for an
 OS handle.
 Nor caching of derived values unless their derivation is truly expensive.
 Nor the use of getters and setters to avoid direct manipulation of attributes,
 unless there is some genuine value-add from doing so.

That's the basic philosophy behind Tango. In fact, the bulk of the objects in Tango are in the IO package, with much of the remainder being in places where polymorphism is desirable (localization, for example). If you find an object in Tango that has no actual state information, it's generally packaged as a class for this reason.
 OO-dogma that they will isolate the library from speculative future changes in
 the underlying OS calls (that have been fixed in stone for 1 or 4 decades or
 more) do not cut much ice with me.

Me either. However, since Tango is portable across Win32 and Posix systems (currently), I do think an argument could be made for some level of abstraction. But the C API headers are available as well if you really want to use them.
 I'm also not fond of all-in-one library packaging. Seems to me that there is
 enough information in the source code to allow libraries to be packaged as
 discrete dlls/sos and to only statically link against those required. But
 that may be a tool chain problem rather than anything to do with Tango.

This was actually driven by fairly vocal user request. The original conception was for Tango to be a lightweight, modular framework to be extended by users rather than a monolithic library. In fact, we didn't even distribute an all-in-one prebuilt library for Tango until sometime last summer. Before that we expected that a tool such as Bud or Rebuild would be used. This is still quite possible however, and the modular design in terms of code dependency is still in place. If you choose to toss the tango-user lib altogether and find you want even more modularity, I suggest looking at this page: http://dsource.org/projects/tango/wiki/TopicAdvancedConfiguration As far as I know, I'm the only one that has actually read it so it's a bit out of date (the library names are wrong), but the basic concept still applies. That is, the choice of a GC can be made at link-time with Tango, and other portions of the runtime are easily replaceable as well. Some kernel projects have found this useful in the past.
 It should be possible to substitute one implementation of a std.* library for
 another, without it being an all or nothing change. I should be able to
 mix'n'match between implementations of std.* packages.
 For example, with the std.string problem I've been having. If I use
     import std.string;
     char[] a = readln();
     a.toupper();
 it should work. If I do
     import std.string.immutable;
 it wouldn't.

Agreed. My biggest complaint here is having to maintain two essentially identical packages, assuming I were to do such a thing. This is why I find Steven's "scoped const" proposal so attractive.
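
To illustrate the "two essentially identical packages" problem: D2 compilers released after this thread accept an `inout` qualifier that lets one function body serve mutable, const, and immutable callers. Treating `inout` as a stand-in for the "scoped const" proposal discussed here is an assumption, and `skipSpaces` is a hypothetical helper, not a Phobos or Tango function.

```d
// Hypothetical helper, written once and usable with any qualifier.
// inout takes on the qualifier of the argument at each call site.
inout(char)[] skipSpaces(inout(char)[] s)
{
    size_t i = 0;
    while (i < s.length && s[i] == ' ')
        ++i;
    return s[i .. $];   // a slice of the input, never a copy
}

unittest
{
    char[] m = "  abc".dup;
    string im = "  abc";
    const(char)[] c = m;

    char[] rm = skipSpaces(m);         // mutable in, mutable out
    string ri = skipSpaces(im);        // immutable in, immutable out
    const(char)[] rc = skipSpaces(c);  // const in, const out

    rm[0] = 'X';                       // allowed: the result is still mutable
    assert(m == "  Xbc");
    assert(ri == "abc");
    assert(rc == "Xbc");               // rc views the same memory as m
}
```

This only covers functions that return a view of their input. A function that must sometimes copy and sometimes not, like toupper, still needs more than one body.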
 One of the things that forced me to go away from D a couple of years ago was
 the ever changing state of the libraries. Not the internal implementations or
 occasional bugs, but the constantly changing interface definitions.
 It becomes impractical to develop a major project when you're constantly
 rewriting major chunks of code to accommodate the latest set of group think
 on the best way to package the OS and "C lib" functionality.
 Back then, I put it down to the necessary gestation of a new language, and
 moved away to get my project done.
 I've now come back and find that the same situation exists. The answer to an
 essentially trivial problem is to write an entire new library. Or rather,
 since the library I need was already a part of Phobos with D 0.-something,
 resurrect an old library.
 And that's the most worrying thing of all. The removal of the existing library
 from Phobos because the main proponents suddenly drank the Kool-Aid of
 invariant strings--especially for reasoning that I still find entirely
 specious--does not bode well for ongoing stability.

The lack of responsiveness of the Phobos maintainers (ie. Walter) is what drove us to create Tango in the first place. I'll freely admit that the design of Tango has changed here and there as we've moved through beta, but it's largely solidified now and will be frozen once we hit 1.0. Neither I nor the other Tango developers have any desire to maintain deprecated code and the like, so we've been doing our utmost to find a design that we hope will last. Also, that there is at least one commercial project based on Tango (I think there are actually more, but I don't keep track of this very closely) says a lot about the library's stability and its support.
 I had hoped that during my two years away, that at least the interfaces would
 have become standardised, even if the implementations varied from release to
 release. But if whole chunks of functionality can suddenly disappear from the
 library, at the same time as major new chunks of very desirable functionality
 are added to the language, on the whim of 1 or 4 major players getting
 religion, then I'm really not sure that D is, or will ever be, ready for
 anything other than academic exploration of compiler technology.

To be honest, I actually feel much the same. However, I also feel that D 1.0 is a fantastically designed language overall, and I would choose it in a second over C or C++ (I'm a systems programmer so those are really the only other choices available). So at the end of the day, I will be disappointed if the "future of D" takes a hard left-hand turn towards somewhere I have no interest in going, but since I'm really quite happy with D 1.0 I won't shed too many tears over it. This perhaps doesn't bode terribly well for my use of D in the long-term, but that's a bridge I'll jump off if and when I come to it.
 Reading that back, the independence of Tango begins to be more attractive,
 even if I have a distaste for the "everything must be OO" philosophy that
 (apparently) underlies it. Maybe I'll pull a copy and look for myself.
 For my current needs, I'm just looking for C speed without having to manage my
 own memory.
 For the project I went away from D for 2 years ago, and came back hoping for
 stability, my own personal research project come memorial folly to be, I
 don't think D is yet ready for that. Maybe D1 if it doesn't become completely
 unsupported.

If it's speed you're looking for, Tango is it ;-) The IO subsystem trounces pretty much everything I've seen it compared to, for example. In practice, I think you'll find that one reason for this is that no hidden allocations take place anywhere in Tango. This tends to conflict with convenient use in some cases for simple apps however, so as things stand now I do think that some users would benefit from convenience wrappers. I tend to do this sort of thing myself for my own projects, but a third-party package would be nice for those less inclined. If you're interested in direct performance comparisons however, I suggest reading the "benchmarks" links on this page: http://dsource.org/projects/tango/wiki/Documentation The XML tests in particular are pretty astounding (I feel comfortable saying that because I had nothing to do with the development of that particular package :).
 In the interim I've "done the rounds" of an amazing variety of languages. From
 the functional brigade, Haskell, OCaml, Mozart/Oz, Erlang et al. and various
 of the newer dynamic languages. They all have their attractions, but most are
 spoilt by some level of dogma. Haskell with its purity. Python with the whole
 significant whitespace thing. P6 with unix-first, and non-delivery.
 Mostly, the dynamics lack the low-level control and performance I need. I've
 been seriously working with structured assembler to achieve the low level
 control and performance I want, but doing everything yourself just takes you
 off into far too many interesting side projects. Implementing your own memory
 management could occupy a lifetime; especially if you consider the
 possibility and advantages of using (wait for it) a segmented architecture.
 Most older programmers' memory of segmented memory stems from the 16-bit
 Intel days and they (almost) universally eschew any notions of it now that
 32-bit (and 64-bit) flat memory models are available. But there are some very
 interesting possibilities in combining 32-bit segments and virtual memory.
 D is my last best hope of avoiding the assembler route and trying to do it all
 myself. Walter's pragmatism stood out in my early experience of both the
 language and library design--albeit that they kept changing ;)--but I was
 really expecting (hoping) for greater stability by this point.

Personally, the combination I find most attractive for my work right now is Erlang backed by C or D for the performance-critical work. That gives me the easy parallelization I want, IPC, etc, plus an easy way of optimizing the heck out of trouble spots... or simply sidestepping the strict functional model when data sharing is actually needed. I've actually come to feel that having a language separation here is a good thing as well, because it prevents "bleed through" of concepts which I feel risks poisoning the efficacy of each approach. As for the rest, Kris, one of the other Tango developers, has done a lot of work in the embedded space and pushed very hard in the past for D to better support this style of programming. He wasn't terribly successful insofar as language/compiler development was concerned (there has been a lot of talk in the past about TypeInfo in particular), but Tango, at least, was designed with embedded development in mind. The lack of hidden DMA (dynamic memory allocation), for example. I can't say whether Tango will suit your needs, but it does seem to at least match your general goals with D.
 Ooh. Did I write all that? Still. It has persuaded me to at least go look at
 Phobos, even if it is done with a jaundiced eye.
 A stable, even if philosophically distasteful, implementation of the staples
 is better than a philosophically desirable but whimsically changing one.
 Cheers for prompting me to re-think my blanket dismissal. b.

Thank you for reconsidering :-) D may be a young language, but there has really been quite a bit of drama surrounding it in newsgroup discussion. I think it can sometimes be difficult to look past all this and take the time to decide for oneself. If nothing else, doing so takes time, and even I tend to employ the "30 second rule" when it comes to new technology. Sean
Apr 29 2008
parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Sean Kelly wrote:
 If it's speed you're looking for, Tango is it ;-)  The IO subsystem trounces
 pretty much everything I've seen it compared to, for example.  In practice, I
 think you'll find that one reason for this is that no hidden allocations take
 place anywhere in Tango.  This tends to conflict with convenient use in some
 cases for simple apps however, so as things stand now I do think that some
 users would benefit from convenience wrappers.

Yes please! votes++ I don't need blazing speed for my debug printfs. I need the most convenient API possible. Stdout.print("hi").newline is not quite that.
 I tend to do this sort of thing myself for my own projects,
 but a third-party package would be nice for those less inclined.  

I don't know why, but I just really dislike seeing little "mytools" dependencies dangling off of what would otherwise be nice little self-contained modules. Maybe I just find it makes it harder to reuse code. This module depends on "mytools" but over there we're using "yourtools". Do we merge them to become "ourtools", or keep both, or port my code to use yourtools instead? It's just easier to mix and match if a module doesn't have such frivolous external dependencies. --bb
Apr 29 2008
prev sibling next sibling parent "Janice Caron" <caron800 googlemail.com> writes:
On 28/04/2008, Sean Kelly <sean invisibleduck.org> wrote:
 Can you explain this in light of Steven's 'scoped const' proposal?

I meant that non-invariant versions would have to make a copy, but the invariant version sometimes wouldn't. That means they can't share the same code.
     string bufI = "HELLO";
     char[] bufM = "HELLO".dup;
     const(char)[] bufC = bufM;

     const(char)[] retC = toupper( bufC ); // return value is const - ok

     bufM[0] = 'J';
     assert( retC[0] == 'J' );

Why would that assert hold? I would expect toupper(char[]) to have to return a copy precisely in order to /prevent/ that problem. What am I missing?
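
A minimal sketch of the copy-versus-share point, assuming two hypothetical helpers (`upperCopy` and `upperShared`) that stand in for the mutable and invariant versions of toupper. Neither is the actual Phobos implementation, and both handle ASCII only.

```d
import std.ascii : isLower, toUpper;
import std.exception : assumeUnique;

// Hypothetical mutable/const version: the caller may change the input
// later, so the result must always be a fresh copy.
char[] upperCopy(const(char)[] s)
{
    char[] result = s.dup;
    foreach (ref c; result)
        c = cast(char) toUpper(c);
    return result;
}

// Hypothetical immutable version: the input can never change, so it is
// safe to hand it back untouched when nothing needs converting.
string upperShared(string s)
{
    foreach (c; s)
    {
        if (isLower(c))
        {
            char[] fresh = upperCopy(s);
            return assumeUnique(fresh);  // fresh has no other references
        }
    }
    return s;  // already upper case: no allocation
}

unittest
{
    char[] bufM = "HELLO".dup;
    const(char)[] bufC = bufM;
    auto retC = upperCopy(bufC);
    bufM[0] = 'J';
    assert(retC[0] == 'H');   // the copy is unaffected by the later write

    string bufI = "HELLO";
    assert(upperShared(bufI) is bufI);  // same memory, no copy made
}
```

With an always-copying mutable version the quoted assert would fail, which is the behaviour argued for above; it could only hold if the function returned its const input without copying.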
Apr 28 2008
prev sibling parent "Janice Caron" <caron800 googlemail.com> writes:
On 28/04/2008, Walter Bright <newshound1 digitalmars.com> wrote:
  What it will do is provide a useful solution for those who really want to
 use mutable strings. I bet that, though, after a while they'll evolve to
 eschew it in favor of immutable strings.

I'm inclined to agree with the prediction - but even so, wouldn't that be a good thing? I mean, if it keeps people on board with D2 who might otherwise have run away, then that's good, right? And if those people later realise they can do more with immutable strings, then that's good too, right? Just a thought.
Apr 28 2008