digitalmars.D - Is all this Invarient **** er... stuff, premature optimisation?

p9e883002 sneakemail.com (107/107) Apr 27 2008 Hi all,

Simen Kjaeraas (18/38) Apr 27 2008 he

Simen Kjaeraas (12/49) Apr 27 2008 hen

p9e883002 sneakemail.com (58/85) Apr 27 2008 Okay, you got around the first cast by using

Janice Caron (16/25) Apr 28 2008 **** UNDEFINED BEHAVIOR ****

Me Here (64/96) Apr 28 2008 Ack! That's horrible. Instead of using the information I have, the

Janice Caron (14/29) Apr 28 2008 Not necessarily. Instead of

Me Here (15/22) Apr 28 2008 Ah! Again, 3 lines instead of 1. Plus two function calls and a temporary...

Janice Caron (17/18) Apr 28 2008 To be fair though, the problem here is that the functions you are

Walter Bright (5/8) Apr 28 2008 What it will do is provide a useful solution for those who really want

Sean Kelly (33/41) Apr 28 2008 I do agree with the notion that the majority of operations performed

Lars Ivar Igesund (9/24) Apr 28 2008 Indeed, in the application I'm currently writing at work, there is not a

Janice Caron (7/10) Apr 28 2008 I'm inclined to agree with the prediction - but even so, wouldn't that

Walter Bright (2/12) Apr 28 2008 I think we're in agreement.

Me Here (92/111) Apr 28 2008 They don't? Why not? Seems like a pretty obvious step to me.

Walter Bright (31/87) Apr 28 2008 Type inference in D is done "bottom up". Doing overloading based on

Lionello Lunesu (6/13) Apr 29 2008 But a function's result can be overloaded using "out", so why can't it b...

Walter Bright (5/20) Apr 29 2008 We know what the type of the out argument is. The problem with return
e-t172 (12/28) Apr 29 2008 Consider this:

Steven Schveighoffer (6/33) Apr 29 2008 Yes, one that is solved like any other that has ambiguity: casting. We ...
Hans W. Uhlig (6/40) Apr 30 2008 One of two things, make an assumption as to which is called by which has...

Gide Nwawudu (10/46) Apr 27 2008 Or just add a dup.
Bill Baxter (16/56) Apr 27 2008 I'm no invariant guru, but I don't think that's legal. 'invariant'

Tomas Lindquist Olsen (3/11) Apr 28 2008 Amen to that !!!
Lionello Lunesu (11/17) Apr 29 2008 That request has been on the "unofficial wish list" since the beginning....

p9e883002 sneakemail.com (3/17) Apr 27 2008 No. You missed out uppercasing the string before replacing the slice.

Simen Kjaeraas (4/22) Apr 27 2008 That's why I replied to my own post stating just that.

Janice Caron (4/13) Apr 28 2008 Yes, it's extremely bad. Casting away invariant is UNDEFINED BEHAVIOR,

Bruno Medeiros (14/28) Apr 29 2008 It's not merely undefined, it's *illegal*!

terranium (3/11) Apr 28 2008 I think one should first use safe methods at the cost of performance and...
Steven Schveighoffer (7/15) Apr 28 2008 This is all not an issue if Walter adopts 'scoped const' contracts.

Janice Caron (7/12) Apr 28 2008 toupper() couldn't be reused for all constancies, because the

Steven Schveighoffer (19/33) Apr 28 2008 toupper is probably a bad example, as your case seems like the rarest :)...
Sean Kelly (27/40) Apr 28 2008 Can you explain this in light of Steven's 'scoped const' proposal? By m...

Janice Caron (7/14) Apr 28 2008 I meant that non-invariant versions would have to make a copy, but the

Walter Bright (14/23) Apr 28 2008 You bring up a good point.

Steven Schveighoffer (11/34) Apr 28 2008 I agree that immutable strings can be valuable. That's why I think it's...

Walter Bright (6/15) Apr 28 2008 That's why I agreed with Janice on making a stringbuffer module that

Lars Ivar Igesund (8/26) Apr 28 2008 After working with Java for quite some time, I have naturally drifted fr...

Walter Bright (4/6) Apr 28 2008 Java strings lack slicing, so they're crippled anyway. I believe that

Steven Schveighoffer (7/14) Apr 28 2008 Java's String.substring(start, last) works just like slicing...

Walter Bright (3/4) Apr 28 2008 No it doesn't. It makes a copy (I don't know if this is true of *all*

Ary Borenszweig (5/10) Apr 28 2008 A String holds an char[], the "start" in it and it's "length". A
Robert Fraser (19/24) Apr 28 2008 Java's 6's string.substring method (JDK 1.6.0_04, 64-bit Windows):

Sean Kelly (5/29) Apr 28 2008 Right. The issue in Java is that the String wrapper class is still allo...
Walter Bright (4/8) Apr 28 2008 Yes, you are right. I was wrong. But Java is still new'ing a new
Christopher Wright (2/6) Apr 28 2008 Sun has it right. GNU Classpath has it wrong and copies the data every t...

Lars Ivar Igesund (12/19) Apr 28 2008 I agree that Java strings are crippled, but considering that String is

Bruno Medeiros (24/41) Apr 29 2008 "people will naturally evolve towards using all invariant strings."

Me Here (111/125) Apr 28 2008 Well obviously the example was trivial to concentrate attention upon the...

Sean Kelly (8/16) Apr 28 2008 As a point of interest, this quote is at the top of the DigitalMars D pa...
Walter Bright (6/9) Apr 28 2008 D does not force all character arrays to be immutable. You can use

Me Here (5/14) Apr 28 2008 Well no, but having lhe string libraries only accept and return invarian...

Walter Bright (5/24) Apr 28 2008 I agreed with Janet's proposal to create a parallel set of routines that...

Me Here (15/20) Apr 28 2008 Well no. It also went on to counter the idea that we're all going to co...

Walter Bright (13/17) Apr 28 2008 There are two ways of doing it. One is COW, where those who make the

Me Here (62/78) Apr 29 2008 Okay Walter,

Walter Bright (4/5) Apr 29 2008 copy-swap is what lock free algorithms rely on for updating a data

Sean Kelly (6/11) Apr 29 2008 I believe CAS actually stands for "compare and swap" or "compare and set...

Me Here (8/22) Apr 29 2008 From the litrature I found, CAS is (was originally) the name of the opco...

Sean Kelly (4/24) Apr 29 2008 Yeah, LL/SC is pretty cool. The hardware transactional memory proposals...

Me Here (4/8) Apr 29 2008 Ah! As in compare & exchange (cmpxchg & cmpxchg8b) x86 opcodes. I wasn't...

Sean Kelly (10/15) Apr 29 2008 Please don't discount Tango based on what has been said about it in this

Me Here (69/87) Apr 29 2008 The primary basis of my immediate decision regarding Tango was it incomp...

Me Here (3/6) Apr 29 2008 .
Walter Bright (4/16) Apr 29 2008 That's why D 1.0 was split off. It was done to provide a stable platform...

Me Here (25/41) Apr 29 2008 Understood, but when I went to upgrade from my very old 1.x version and

Sean Kelly (29/39) Apr 29 2008 See Tango, once again ;-) In fact, as things stand, the same runtime co...

Bill Baxter (6/13) Apr 29 2008 I personally think that for a big library like Tango, using a

Sean Kelly (69/153) Apr 29 2008 For what it's worth, the "Tangobos" project is a port of Phobos to the T...

Bill Baxter (12/19) Apr 29 2008 Yes please! votes++

terranium (2/9) Apr 29 2008 Building an SQL query with multiple concats is a well-known pitfall for ...

p9e883002 sneakemail.com writes:

Hi all,

'scuse me for not being familiar with previous or ongoing discussion on this 
subject, but I'm just coming back to D after a couple of years away.

I have some strings read in from external source that I need to convert to 
uppercase. A quick look at Phobos and I find std.string has a toupper method.

import std.stdio;
import std.string;

int main( char[][] args ) {
    char[] a = args[ 0 ].toupper();
    writefln( a );
    return 0;
}

c:\dmd\test>dmd junk.d
junk.d(5): function std.string.toupper (invariant(char)[]) does not match parameter types (char[])
junk.d(5): Error: cannot implicitly convert expression (args[0u]) of type char[] to invariant(char)[]
junk.d(5): Error: cannot implicitly convert expression (toupper(cast(invariant(char)[])(args[0u]))) of type invariant(char)[] to char[]

Hm. Okey dokey.

import std.stdio;
import std.string;

int main( char[][] args ) {
    char[] a = ( cast(invariant(char)[]) args[ 0 ] ).toupper();
    writefln( a );
    return 0;
}

junk.d(5): Error: cannot implicitly convert expression (toupper(cast(invariant(char)[])(args[0u]))) of type invariant(char)[] to char[]

Shoulda known :(

import std.stdio;
import std.string;

int main( char[][] args ) {
    string a = ( cast(invariant(char)[]) args[ 0 ] ).toupper();
    writefln( a );
    return 0;
}

c:\dmd\test>dmd junk.d

c:\dmd\test>junk
C:\DMD\TEST\JUNK.EXE

Great! Now I need to replace the bit in the middle:

import std.stdio;
import std.string;

int main( char[][] args ) {
    string a = ( cast(invariant(char)[]) args[ 0 ] ).toupper();
    a[ 2 .. 4 ] = "XXX";
    writefln( a );
    return 0;
}

c:\dmd\test>dmd junk.d
junk.d(6): Error: slice a[cast(uint)2..cast(uint)4] is not mutable

Wha..? What's the point in having slices if I can't use them?

import std.stdio;
import std.string;

int main( char[][] args ) {
    char[] a = cast(char[]) ( cast(invariant(char)[]) args[ 0 ] ).toupper();
    a[ 2 .. 5 ] = "XXX";
    writefln( a );
    return 0;
}

Finally, it works. But can you see what's going on in line 5 amongst all that 
casting? Cos I sure can't.

So, I read that all this invariant stuff is about efficiency. For whom?
Must be the compiler because it sure ain't about programmer efficiency.

Ah. Maybe I'm meant to ignore the beauty of slices and use strings and method 
calls for everything?

import std.stdio;
import std.string;

int main( string[] args ) {
    string a = args[ 0 ].toupper();
    a.replace(  a[ 2 .. 4 ], "XXX" );
    writefln( a );
    return 0;
}

Compiles clean and runs:

c:\dmd\test>dmd junk.d

c:\dmd\test>junk
C:\DMD\TEST\JUNK.EXE

But does nothing! 

import std.stdio;
import std.string;

int main( string[] args ) {
    string a = args[ 0 ].toupper();
    a = a.replace(  a[ 2 .. 4 ], "XXX" );
    writefln( a );
    return 0;
}

c:\dmd\test>dmd junk.d

c:\dmd\test>junk
C:XXXMD\TEST\JUNK.EXE

Finally, it runs. But at what cost? The 'immutable' a has ended up being
mutated. I still had to specify the slice, but I had to make another method
call to actually do the deed.

Of course, a wasn't really mutated. Instead, args[0] was copied and then 
mutated and labelled a. Then a was copied and mutated and reassigned the 
mutated copy. 

So, that's two copies of the string, plus a slice, plus an extra method call to 
achieve what used to be achievable in place on the original string. Which is
now immutable, but I'll never need it again.

Of course, on these short 1-off strings it doesn't matter a hoot. But when the 
strings are 200 to 500 characters a pop and there are 20,000,000 of them. It 
matters.
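
For readers trying this with a current compiler, the single-copy version the post is asking for can be sketched roughly as follows. This is a minimal sketch in present-day D rather than the 2008 D2 under discussion: it assumes `std.uni.toUpperInPlace`, the helper name `upperAndPatch` is made up for illustration, and it uses `[2 .. 5]` so that the slice and the three-character replacement have the same length.

```d
import std.stdio : writeln;
import std.uni : toUpperInPlace;

// Hypothetical helper: uppercases a private copy of `src` and overwrites
// the three characters at [2 .. 5] with "XXX". The .dup is the only copy
// made explicitly; the remaining steps work on that copy.
// Assumes src.length >= 5, otherwise the slice is out of range.
char[] upperAndPatch(const(char)[] src)
{
    char[] a = src.dup;     // one mutable copy of the input
    toUpperInPlace(a);      // uppercase the copy itself
    a[2 .. 5] = "XXX";      // slice assignment, both sides have length 3
    return a;
}

void main()
{
    writeln(upperAndPatch(`c:\dmd\test\junk.exe`));
}
```

Taking `const(char)[]` lets the helper accept both `char[]` and `string` arguments without a cast at the call site.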

Did I suggest this was an optimisation?

Whatever immutability-purity cool aid you've been drinking, please go back to 
coke. And give us usable libraries and sensible implicit conversions. Cos this
sucks bigtime.

b.

Apr 27 2008

"Simen Kjaeraas" <simen.kjaras gmail.com> writes:

<p9e883002 sneakemail.com> wrote:

 Of course, a wasn't really mutated. Instead, args[0] was copied and then
 mutated and labelled a. Then a was copied and mutated and reassigned the
 mutated copy.

 So, that's two copies of the string, plus a slice, plus an extra method
 call to achieve what used to be achievable in place on the original string.
 Which is now immutable, but I'll never need it again.

 Of course, on these short 1-off strings it doesn't matter a hoot. But
 when the strings are 200 to 500 characters a pop and there are 20,000,000 of
 them. It matters.

 Did I suggest this was an optimisation?

 Whatever immutability-purity cool aid you've been drinking, please go
 back to coke. And give us usable libraries and sensible implicit conversions.
 Cos this sucks bigtime.

 b.


Is this what you wanted to write?

int main(string[] args)
{
   char[] a = cast(char[])args[0];
   a[2..5] = "XXX";
   writefln(a);
   return 0;
}
This compiles and runs, and seems to do what you describe. Sure, there's a
cast there, but it's not all that bad, is it?

Apr 27 2008

"Simen Kjaeraas" <simen.kjaras gmail.com> writes:

On Mon, 28 Apr 2008 02:14:19 +0200, Simen Kjaeraas
<simen.kjaras gmail.com> wrote:

 <p9e883002 sneakemail.com> wrote:

 Of course, a wasn't really mutated. Instead, args[0] was copied and then
 mutated and labelled a. Then a was copied and mutated and reassigned the
 mutated copy.

 So, that's two copies of the string, plus a slice, plus an extra method
 call to achieve what used to be achievable in place on the original string.
 Which is now immutable, but I'll never need it again.

 Of course, on these short 1-off strings it doesn't matter a hoot. But
 when the strings are 200 to 500 characters a pop and there are 20,000,000 of
 them. It matters.

 Did I suggest this was an optimisation?

 Whatever immutability-purity cool aid you've been drinking, please go
 back to coke. And give us usable libraries and sensible implicit conversions.
 Cos this sucks bigtime.

 b.

 Is this what you wanted to write?

 int main(string[] args)
 {
    char[] a = cast(char[])args[0];
    a[2..5] = "XXX";
    writefln(a);
    return 0;
 }
 This compiles and runs, and seems to do what you describe. Sure, there's a
 cast there, but it's not all that bad, is it?


Sorry, forgot the .toupper() call there. Should be
   char[] a = cast(char[])args[0].toupper();

-- Simen

Apr 27 2008

p9e883002 sneakemail.com writes:

On Mon, 28 Apr 2008 02:28:23 +0200, "Simen Kjaeraas" 
<simen.kjaras gmail.com> wrote:
 On Mon, 28 Apr 2008 02:14:19 +0200, Simen Kjaeraas
 <simen.kjaras gmail.com> wrote:

 <p9e883002 sneakemail.com> wrote:

 Is this what you wanted to write?

 int main(string[] args)
 {
    char[] a = cast(char[])args[0];
    a[2..5] = "XXX";
    writefln(a);
    return 0;
 }
 This compiles and runs, and seems to do what you describe. Sure, there's a
 cast there, but it's not all that bad, is it?

 Sorry, forgot the .toupper() call there. Should be
    char[] a = cast(char[])args[0].toupper();

 -- Simen

Okay, you got around the first cast by using 

int main( string[] ) {

So now you want to lowercase it again:

import std.stdio;
import std.string;

int main( string[] args) {
    char[] a = cast(char[])args[0].toupper();    
    a[2..5] = "XXX";
    a = a.tolower;
    writefln(a);
    return 0;
}

c:\dmd\test>dmd junk.d
junk.d(7): Error: no property 'tolower' for type 'char[]'
junk.d(7): Error: cannot implicitly convert expression (1) of type int to char[]
junk.d(7): Error: cannot cast int to char[]
junk.d(7): Error: integral constant must be scalar type, not char[]

So, cast a back to being a string, so that we can call tolower() on it and then
cast the copied mutated string back to a char[]:

import std.stdio;
import std.string;

int main( string[] args) {
    char[] a = cast(char[])args[0].toupper();    
    a[2..5] = "XXX";
    a = cast(char[]) ( ( cast(string)a ).tolower );
    writefln(a);
    return 0;
}

c:\dmd\test>dmd junk.d
junk.d(7): Error: no property 'tolower' for type 'invariant(char)[]'
junk.d(7): Error: cannot cast int to char[]
junk.d(7): Error: integral constant must be scalar type, not char[]
junk.d(7): Error: cannot cast int to char[]
junk.d(7): Error: integral constant must be scalar type, not char[]
junk.d(7): Error: cannot implicitly convert expression (0) of type int to char[]
junk.d(7): Error: cannot cast int to char[]
junk.d(7): Error: integral constant must be scalar type, not char[]

Nope. That don't work. 

import std.stdio;
import std.string;

int main( string[] args) {
    char[] a = cast(char[])args[0].toupper();    
    a[2..5] = "XXX";
    a = cast(char[])tolower( cast(string)a );
    writefln(a);
    return 0;
}

Finally. It works. 

Summary:

If I want to be able to do lvalue slice operations on 'strings' (for efficiency) I
have to have them as char[]. 

If I want to be able to use std.string methods on those same strings, I have to 
cast them to invariant(char)[] and the results back to char[], which involves
at least one copy operation, and probably two.

And the invariant-ness of the string library is done "for efficiency"?

Cheers, b.

Apr 27 2008

"Janice Caron" <caron800 googlemail.com> writes:

2008/4/28  <p9e883002 sneakemail.com>:
  import std.string;

  int main( string[] args) {
     char[] a = cast(char[])args[0].toupper();

**** UNDEFINED BEHAVIOR ****
(1) args might be placed in a hardware-locked read-only segment. Then
the following line would fail
(2) there might be other pointers to the string, which expect it never
to change.

     a[2..5] = "XXX";
     a = cast(char[])tolower( cast(string)a );
     writefln(a);
     return 0;
  }

  Finally. It works.

But not necessarily on all architectures, because of the undefined
behavior. This is how you do it without undefined behavior.

    import std.string;

    int main( string[] args) {
        string a = args[0].toupper();
        a = a[0..2] ~ "XXX" ~ a[5..$];
        a = a.tolower();
        writefln(a);
        return 0;
    }

Apr 28 2008

"Me Here" <p9e883002 sneakemail.com> writes:

Janice Caron wrote:

 2008/4/28  <p9e883002 sneakemail.com>:
  import std.string;
 
  int main( string[] args) {
     char[] a = cast(char[])args[0].toupper();

 
 **** UNDEFINED BEHAVIOR ****
 (1) args might be placed in a hardware-locked read-only segment. Then
 the following line would fail
 (2) there might be other pointers to the string, which expect it never
 to change.
 
     a[2..5] = "XXX";
     a = cast(char[])tolower( cast(string)a );
     writefln(a);
     return 0;
  }
 
  Finally. It works.

 
 But not necessarily on all architectures, because of the undefined
 behavior. This is how you do it without undefined behavior.
 
     import std.string;
 
     int main( string[] args) {
         string a = args[0].toupper();
         a = a[0..2] ~ "XXX" ~ a[5..$];
         a = a.tolower();
         writefln(a);
         return 0;
     }

Ack! That's horrible. Instead of using the information I have, the
offset and length of the slice I want to manipulate, I have to derive
two offset/length pairs to the bits I do not want to do anything to.

1) Whatever happened to polymorphism?

Eg. Why can't the standard string library recognise that I, as the
programmer, know what I need to do to my data. It's my job.

So, if I assign the results of a string library function/method to a
mutable variable (Just a variable really. An invariant variable is a
constant!), then it should be possible (*IS* possible) to recognise
that and return an appropriate result. Duplicating the input if
required.

The idea that runtime obtained or derived strings can be made truly
invariant is purely theoretical. Whilst the compiler can place compile
time constants into hardware protected, read-only memory segments, doing
this at runtime would be horribly costly and hardly beneficial.

x86 allows memory to be set readonly at runtime, but only in page
sized chunks. Which means that either:

- every derived string would need to be placed in its own 4k multiple
sized chunk of ram.

- or, each page would have to constantly be switched from read-only to
read-write and back again as new entities are added and old ones go out
of scope.

And if you are not using hardware protection, then the invariance is
only notional as  D can call C, and C allows me access to pointers. And
once I have one of those, I can scribble anywhere that isn't hardware
protected.

All this smacks of D reinventing, with all the same mistakes, the whole
Java String vs. StringBuffer dichotomy:

    http://www.javaworld.com/javaworld/jw-03-2000/jw-0324-javaperf.html

And Java had the VM to isolate it from non-compliant code.

One of several "mission statements" that drew me to D when I first
encountered it nearly 3 years ago, was the pragmatism embodied in
articles like this:

    http://www.digitalmars.com/d/2.0/builtin.html

and this:

    http://www.digitalmars.com/d/2.0/cppstrings.html

and statements like this:

    "No pointless wrappers around C runtime library functions or OS API
functions D provides direct access to C runtime library functions and
operating system API functions. Pointless D wrappers around those
functions just adds blather, bloat, baggage and bugs."

Coming back to try and use D after a prolonged absence, the changes in
the interim period seem to be eschewing that pragmatism in favour of
some kind of mixed OO/functional purity ethic. Is there an ex-Haskeller
in the house?

I admit openly to still being in the throes of finding my way around
the language and the library, and have been making seemingly
elementary mistakes in interpreting the documentation. But one of the
major attractions of D over C/C++ is its built-in string types and
manipulations. As good as these are, there is still the need for a
library of common operations upon them. If every time I want to use one
of these library calls, I have to cast my mutable string into an
invariant and then cast the result back to mutable in order to be able
to use the built-in manipulations, life's going to get very boring, very
fast.

The alternative I guess is to sit down and write my own library that
performs the same operations as std.string, but on the native string
type. Which kinda dilutes the purpose of having standard libraries.


Sorry to be so verbose, and please don't anyone take any of this
personally. I'm critiquing the code I am encountering, and the problems
I am having using it. Not the people who wrote it.

Cheers, b.
--

Apr 28 2008

"Janice Caron" <caron800 googlemail.com> writes:

2008/4/28 Me Here <p9e883002 sneakemail.com>:
  Ack! That's horrible. Instead of using the information I have, the
  offset and length of the slice I want to manipulate, I have to derive
  two offset/length pairs to the bits I do not want to do anything to.

Not necessarily. Instead of

    a = a[0..2] ~ "XXX" ~ a[5..$];

you could do

    char[] tmp = a.dup;
    tmp[2..5] = "XXX";
    a = assumeUnique(tmp);

(I forget which module you have to import to get assumeUnique). But
what you mustn't ever do is cast away invariant.
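
On the module question: in current Phobos, `assumeUnique` is found in `std.exception`. A minimal sketch of the three-line pattern above, written for present-day D (where `invariant` is spelled `immutable`), might look like this. The function name `patched` is hypothetical, and the input is assumed to be at least five characters long.

```d
import std.exception : assumeUnique;
import std.stdio : writeln;

// Hypothetical helper: returns a copy of `a` with the characters at
// [2 .. 5] replaced by "XXX". The mutable copy never escapes, which is
// the condition under which assumeUnique is legitimate.
string patched(string a)
{
    char[] tmp = a.dup;        // private mutable copy
    tmp[2 .. 5] = "XXX";       // edit the copy in place
    return assumeUnique(tmp);  // hand the copy back typed as immutable
}

void main()
{
    writeln(patched("abcdefgh"));
}
```

The original string is never written to, so nothing that shares it can observe a change.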


  1) Whatever happened to polymorphism?

What's polymorphism got to do with anything? A string is an array, not a class.


  So, if I assign the results of a string library function/method to a
  mutable variable (Just a variable really. An invariant variable is a
  constant!), then it should be possible (*IS* possible) to recognise
  that and return an appropriate result.

Functions don't overload on return value.


  The idea that runtime obtained or derived strings can be made truly
  invariant is purely theoretical.

But the fact that someone else might be sharing the data is not.

  But one of the
  major attractions of D over C/C++ is its built-in string types

D has no built in string type. string is just an alias for invariant(char)[].

  If every time I want to use one
  of these library calls, I have to cast my mutable string into an
  invariant and then cast the result back to mutable

That's one approach. Another is don't try to treat strings as mutable.

Apr 28 2008

"Me Here" <p9e883002 sneakemail.com> writes:

Janice Caron wrote:

 you could do
 
     char[] tmp = a.dup;
     tmp[2..5] = "XXX";
     a = assumeUnique(tmp);
 

Ah! Again, 3 lines instead of 1. Plus two function calls and a temporary
variable.

You do realise that there is a very strong correlation between bugs and line
count? That's been so for all of the last 30+ years regardless of language or
paradigm.

So, you made it more verbose and more complex and much slower. 

And, in doing so, introduced more scope for errors than you've cured.

 That's one approach. Another is don't try to treat strings as mutable.

RAM is mutable--that's its purpose in being.
Variables live in RAM, and vary--that's their purpose in being.

Making a copy of a <strike>string</strike> piece of ram and throwing the old
one away, every time I want to alter its contents...kinda reminds me of
disposable nappies. A costly convenience.

I'll revert to 1.x and pray that 2.x fades away through lack of interest before
it turns D into Yet Another Dead Language--for OO purists and academics only.

Cheers, b.

--

Apr 28 2008

"Janice Caron" <caron800 googlemail.com> writes:

On 28/04/2008, Me Here <p9e883002 sneakemail.com> wrote:
 Ah! Again, 3 lines instead of 1. Plus two function calls and a temporary
 variable.

To be fair though, the problem here is that the functions you are
calling (std.string.toupper and std.string.tolower) don't do what you
want. This is not a fault of the language - it's a limitation of the
library.

To that end, as others have said, this problem could be solved simply
enough by the addition of another module - say, std.stringbuffer - in
which we alias char[] to stringbuffer (or maybe a StringBuffer class -
I'm not sure what's best) and provide a whole bunch of functions
optimized for those mutable char arrays.

To blame the language for the lack of library is the wrong approach.
D2 has some killer, kickass features. The template metaprogramming
power alone is enough to make C++ programmers weep. I'm looking
forward to pure functions, and a new generation of multithreading.

If there's enough interest, and if Walter approves, I could certainly
kickstart std.stringbuffer. Is that the right way to go? What do
people think?

Apr 28 2008

Walter Bright <newshound1 digitalmars.com> writes:

Janice Caron wrote:
 If there's enough interest, and if Walter approves, I could certainly
 kickstart std.stringbuffer. Is that the right way to go? What do
 people think?

What it will do is provide a useful solution for those who really want 
to use mutable strings. I bet that, though, after a while they'll evolve 
to eschew it in favor of immutable strings. It's easier than arguing 
about it <g>.

Apr 28 2008

Sean Kelly <sean invisibleduck.org> writes:

== Quote from Walter Bright (newshound1 digitalmars.com)'s article
 Janice Caron wrote:
 If there's enough interest, and if Walter approves, I could certainly
 kickstart std.stringbuffer. Is that the right way to go? What do
 people think?

 What it will do is provide a useful solution for those who really want
 to use mutable strings. I bet that, though, after a while they'll evolve
 to eschew it in favor of immutable strings. It's easier than arguing
 about it <g>.

I do agree with the notion that the majority of operations performed
on strings in a typical application do not modify the string in place.
However, in performance-oriented server applications, it is very
common to hold and reuse a mutable buffer between calls to avoid
the cost of reallocation.  Assuming that references to this data are
passed around during the processing of a client request I would
fully expect the surrounding code to have no need to mutate the data.
However, because this is a reusable buffer, invariant is not a safe option
because the contents of the buffer will change for each request.  What
I would be inclined to do here is use const references to reflect this.
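
The point about const references can be illustrated with a small sketch in present-day D. The function name `countSpaces` is made up for illustration; the only thing it is meant to show is that a `const(char)[]` parameter accepts a reusable mutable buffer and an immutable string alike, while promising not to modify either.

```d
import std.stdio : writeln;

// Hypothetical example function: counts spaces without modifying the
// data. A const(char)[] parameter accepts mutable, const and immutable
// character arrays.
size_t countSpaces(const(char)[] text)
{
    size_t n = 0;
    foreach (c; text)
        if (c == ' ')
            ++n;
    return n;
}

void main()
{
    char[] buffer = "a b c".dup;  // reusable mutable buffer
    string literal = "x y";       // immutable(char)[]
    writeln(countSpaces(buffer));
    writeln(countSpaces(literal));
}
```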

I've been thinking a lot about const and invariant recently and while
invariant strings seem quite handy for test code and the like, I have
not been able to think of a single production application where I would
actually be able to use them for the bulk of my string data, for the
reason mentioned above.  Rather, I would expect to use 'const'
everywhere because what I generally care about is preventing a caller
or callee from changing the contents of my data.  As for indicating
ownership, the following rule generally suffices:

    char[] getData(); // result is mutable -- ownership is transferred
    const(char)[] getData(); // result is const -- ownership not transferred

What I love about Steven's "scoped const" proposal is that it would allow
me to write a single instance of a library function that would work equally
well with any data, and the function would communicate its behavior within
the syntax.  Add "scoped const" to D 1.0 plus the ability to use 'const' in
all the places it can be used in D 2.0 and I'd be a happy camper.  Bonus
points for eliminating storage of static const (ie ROM-able) data and
dropping support for anonymous enum altogether.
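
Present-day D has an `inout` qualifier that covers similar ground to the "scoped const" idea described here: one function body whose result carries the same constancy as its argument. The sketch below assumes a current compiler, and `skipSpaces` is a hypothetical name chosen for illustration.

```d
import std.stdio : writeln;

// Hypothetical example: returns the part of `s` after any leading spaces.
// With inout, the result is char[] for a char[] argument, const(char)[]
// for a const argument, and string for a string argument.
inout(char)[] skipSpaces(inout(char)[] s)
{
    size_t i = 0;
    while (i < s.length && s[i] == ' ')
        ++i;
    return s[i .. $];  // a slice of the argument, no copy
}

void main()
{
    char[] buf = "  ab".dup;
    char[] rest = skipSpaces(buf);   // still mutable
    rest[0] = 'z';                   // writes through to buf
    writeln(buf);

    string lit = skipSpaces("  x");  // still immutable
    writeln(lit);
}
```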


Sean

P.S. The utility of 'invariant' for multiprogramming is a separate issue.  I
actually think it's unnecessary there as well, but don't want the discussion
to get off track by addressing this at all.  I'm merely adding this note so
no one will bring it up in response to what I said above.

Apr 28 2008

Lars Ivar Igesund <larsivar igesund.net> writes:

Sean Kelly wrote:

 == Quote from Walter Bright (newshound1 digitalmars.com)'s article
 Janice Caron wrote:
 If there's enough interest, and if Walter approves, I could certainly
 kickstart std.stringbuffer. Is that the right way to go? What do
 people think?

 What it will do is provide a useful solution for those who really want
 to use mutable strings. I bet that, though, after a while they'll evolve
 to eschew it in favor of immutable strings. It's easier than arguing
 about it <g>.

 
 I do agree with the notion that the majority of operations performed
 on strings in a typical application do not modify the string in place.
 However, in performance-oriented server applications, it is very
 common to hold and reuse a mutable buffer between calls to avoid
 the cost of reallocation.  

Indeed, in the application I'm currently writing at work, there is not a
single heap allocation after the startup phase. And it cannot be called
trivial in any sense.

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango

Apr 28 2008

"Janice Caron" <caron800 googlemail.com> writes:

On 28/04/2008, Walter Bright <newshound1 digitalmars.com> wrote:
  What it will do is provide a useful solution for those who really want to
 use mutable strings. I bet that, though, after a while they'll evolve to
 eschew it in favor of immutable strings.

I'm inclined to agree with the prediction - but even so, wouldn't that
be a good thing? I mean, if it keeps people on board with D2 who might
otherwise have run away, then that's good, right? And if those people
later realise they can do more with immutable strings, then that's
good too, right?

Just a thought.

Apr 28 2008

Walter Bright <newshound1 digitalmars.com> writes:

Janice Caron wrote:
 On 28/04/2008, Walter Bright <newshound1 digitalmars.com> wrote:
  What it will do is provide a useful solution for those who really want to
 use mutable strings. I bet that, though, after a while they'll evolve to
 eschew it in favor of immutable strings.

 
 I'm inclined to agree with the prediction - but even so, wouldn't that
 be a good thing? I mean, if it keeps people on board with D2 who might
 otherwise have run away, then that's good, right? And if those people
 later realise they can do more with immutable strings, then that's
 good too, right?

I think we're in agreement.

Apr 28 2008

"Me Here" <p9e883002 sneakemail.com> writes:

Janice Caron wrote:

 2008/4/28 Me Here <p9e883002 sneakemail.com>:
 (I forget which module you have to import to get assumeUnique). But
 what you mustn't ever do is cast away invariant.

   1) Whatever happened to polymorphism?

 What's polymorphism got to do with anything? A string is an array, not a
 class.


  So, if I assign the results of a string library function/method to a
  mutable variable (Just a variable really. An invariant variable is a
  constant!), then it should be possible (*IS* possible) to recognise
  that and return an appropriate result.

 Functions don't overload on return value.

They don't? Why not? Seems like a pretty obvious step to me.

Rather than having to have methods:

     futzIt_returnString()
     futzIt_returnInt()
     futzIt_returnReal()
     futzIt_returnComplex()

where 'futzIt' might be "read a string from the command line and return it 
to me as some type (if possible)",

I can just do

int i = futzIt( ... );
real r = futzIt( ... );

And let the compiler work out which futzIt() I need to call, and take care 
of mangling the names to allow them to coexist.
You mean D doesn't already have this facility?

Seems like it would be a far more productive and useful expenditure of 
effort than all this invariant stuff.

  The idea that runtime obtained or derived strings can be made truly
  invariant is purely theoretical.

But the fact that someone else might be sharing the data is not.

By "someone else" you mean 'another thread'?
If so, then if that is a possibility, if my code is using threads, then I, 
the programmer,
will be aware of that and will be able to make appropriate choices.

I /might/ choose to use invariance to 'protect' this particular piece of 
data from the problems
of shared state concurrency--if there is any possibility that I intend to 
share this particular piece of data.
But in truth, it is very unlikely that I *will* make /that/ choice. Here's 
why.

What does it mean to make and hold multiple (mutated) copies of a single 
entity?

That is, I obtain a piece of data from somewhere and make it invariant.
Somehow two threads obtain references to that piece of data.
If none of them attempt to change it, then it makes no difference that it 
is marked invariant.
If, however, one of them is programmed to change it, then it now has a 
different version of that entity from the other thread. But what does 
that mean? Who has the 'right' version?

Show me a real situation where two threads can legitimately be making 
disparate modifications to a single entity,
string or otherwise, and I'll show you a programming error. Once two 
threads make disparate modifications to an entity,
they are separate entities. And they should have been given copies, not 
references to a single copy, in the first place.

If the intent is that they share a single entity, then any legitimate 
modifications to that single entity should be reflected
in the views of that single entity by both threads. And therefore 
subjected to locking, or STM or whatever mechanism is
used to control that modification.

This whole thing of invariance and concurrency seems to be aimed at 
enabling the use of COW.
Which smacks of someone trying to emulate fork-like behaviours using 
threads.

And if that is the case, and I very much hope it isn't, then let me tell 
you as someone who is intimately familiar with the
one existing system that went this route (iThreads: look'em up), that it is 
a total disaster.

The whole purpose and advantage of multi-threading, over multi-processing, 
is (mutable) shared state. And the elimination of
costs of serialisation and narrow bandwidth of IPC in the forking 
concurrency model. Attempting to emulate that model
using threading gives few of its advantages, all of its disadvantages, and 
throws away all of the advantages of threading.
It is a complete and utter waste of time and effort.

If the aim is to simplify the use of threading for common programming 
scenarios
and bring it within the grasp of non-threading specialist programmers,
then there are far more effective and less costly ways of achieving that.

  But one of the
  major attractions of D over C/C++ is its built-in string types

D has no built in string type. string is just an alias for 
invariant(char)[].

Semantics.

D has built-in support for a string-type (see 
http://www.digitalmars.com/d/2.0/overview.html) from which I quote:

     "Strings"

     "String manipulation is so common, and so clumsy in C and C++, that it
     needs direct support in the language".
     "Modern languages handle string concatenation, copying, etc., and so does
     D".
     "Strings are a direct consequence of improved array handling."

What invariant strings do, and as far as I can see the only significant 
thing they do, is to reinvent the clumsiness
of C & C++ by making strings a second-class data type again.

If the point is to try and make threading easier, it will fail miserably 
once people realise that it creates the scope for
multiple concurrent versions of supposedly single entities. Which breaks 
just about every programming rule in the book,
and creates scope for far more intractable errors than it fixes.

That's one approach. Another is don't try to treat strings as mutable.

If the intention of invariance is some move toward OO or functional 
purity, then I again quote from the same document:

     "Who D is Not For"

     [some categories elided]

     "Language purists. D is a practical language, and each feature of it is
     evaluated in that light, rather than by an ideal."
     "For example, D has constructs and semantics that virtually eliminate the
     need for pointers for ordinary tasks."
     "But pointers are still there, because sometimes the rules need to be
     broken."
     "Similarly, casts are still there for those times when the typing system
     needs to be overridden."

Cheers, b.


--

Apr 28 2008

Walter Bright <newshound1 digitalmars.com> writes:

Me Here wrote:
 Janice Caron wrote:
 Functions don't overload on return value.

 They don't? Why not? Seems like a pretty obvious step to me.

Type inference in D is done "bottom up". Doing overloading based on 
function return type is "top down". Trying to get both schemes to 
coexist is a hard problem.
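
A minimal sketch of how the `futzIt` request is usually approximated without return-type overloading: the caller names the wanted type as a template argument, so resolution stays bottom-up. This is written against current D; `futzIt` is the hypothetical name from the post above, and using `std.conv.to` for the conversion is an assumption made for illustration.

```d
import std.conv : to;

// Hypothetical helper: the result type is an explicit template argument,
// so the compiler never has to infer it from how the result is used.
T futzIt(T)(string text)
{
    return to!T(text);   // std.conv.to throws ConvException on bad input
}

void main()
{
    int i = futzIt!int("42");
    double r = futzIt!double("2.5");
    assert(i == 42);
    assert(r == 2.5);
}
```

The cost compared with true return-type overloading is that the type is written twice at the call site, although `auto i = futzIt!int("42");` avoids the repetition.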


  The idea that runtime obtained or derived strings can be made truly
  invariant is purely theoretical.

 But the fact that someone else might be sharing the data is not.

 By "someone else" you mean 'another thread'?

No, it could be the same thread, via another alias to the same data. 
Using invariant strings allows the programmer to treat them as if they 
were value types and being copied for every use (like ints are), except 
they don't need to be actually copied.

With mutable strings, one always has to be careful to keep track of who 
'owns' the string, and who has references to it. When mutating the 
string, one must manually ensure that there are no other references to 
it that would be surprised by the data changing. For example, if you 
insert a string into a symbol table, and then later some other reference 
to that string changes it, it could wind up corrupting the symbol table.
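
The aliasing hazard described here can be sketched in a few lines. This assumes current D, where `string` is `immutable(char)[]`; the function names are hypothetical and the "symbol table" is reduced to a single stored reference.

```d
// Mutable case: the stored reference and the caller's reference alias
// the same characters, so a later write is visible through both.
char[] storedAfterMutation()
{
    char[] name = "alpha".dup;
    char[] stored = name;   // stands in for an entry kept by a symbol table
    name[0] = 'A';          // some other code mutates through its own alias
    return stored;          // the stored entry has silently changed
}

// Immutable case: the characters cannot be written, only rebound.
string storedAfterRebind()
{
    string key = "alpha";
    string kept = key;      // shares the data, no copy needed
    // key[0] = 'A';        // rejected at compile time
    key = "Alpha";          // rebinding leaves the kept reference untouched
    return kept;
}

void main()
{
    assert(storedAfterMutation() == "Alpha");
    assert(storedAfterRebind() == "alpha");
}
```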

The point about the main(char[][] args) and modifying those strings 
in-place is very valid - nothing is said about where those strings 
actually reside, and who else may have references to the same data, and 
whether you can modify them with impunity or not. You could argue "this 
should be better documented" and you'd be right, but if the declaration 
instead said main(invariant(char[])args) then I *know* that I am not 
allowed to change them, and whoever calls main() *knows* that those arg 
strings won't get changed. We can both sleep comfortably.

Invariant strings offer a guarantee that the data won't change, which 
clarifies the API of the functions. (Whenever I see an API function that 
takes a char*, say putenv(), it rarely says whether it saves a copy of 
the data or saves a copy of the reference. That just sucks.)


 If so, then if that is a possibility, if my code is using threads, then 
 I, the programmer,
 will be aware of that and will be able to make appropriate choices.
 
 I /might/ choose to use invariance to 'protect' this particular piece of 
 data from the problems
 of shared state concurrency--if there is any possibility that I intend 
 to share this particular piece of data.
 But in truth, it is very unlikely that I *will* make /that/ choice. 
 Here's why.
 
 What does it mean to make and hold multiple (mutated) copies of a single 
 entity?
 
 That is, I obtain a piece of data from somewhere and make it invariant.
 Somehow two threads obtain references to that piece of data.
 If none of them attempt to change it, then it makes no difference that 
 it is marked invariant.
 If, however, one of them is programmed to change it, then it now has a 
 different version of that entity from the other thread. But what does 
 that mean? Who has the 'right' version?
 
 Show me a real situation where two threads can legitimately be making 
 disparate modifications to a single entity,
 string or otherwise, and I'll show you a programming error. Once two 
 threads make disparate modifications to an entity,
 they are separate entities. And they should have been given copies, not 
 references to a single copy, in the first place.
 
 If the intent is that they share a single entity, then any legitimate 
 modifications to that single entity should be reflected
 in the views of that single entity by both threads. And therefore 
 subjected to locking, or STM or whatever mechanism is
 used to control that modification.
 
 This whole thing of invariance and concurrency seems to be aimed at 
 enabling the use of COW.

Wouldn't that be more of a copy-swap thing? And isn't STM copy-swap at 
its core?

 And if that is the case, and I very much hope it isn't, then let me tell 
 you as someone who is intimately familiar with the
 one existing system that went this route (iThreads: look'em up), that it 
 is a total disaster.

ithreads copies the entire user data per thread. Using invariant is, of 
course, a way to avoid copying the data.

 The whole purpose and advantage of multi-threading, over 
 multi-processing, is (mutable) shared state. And the elimination of
 costs of serialisation and narrow bandwidth of IPC in the forking 
 concurrency model. Attempting to emulate that model
 using threading gives few of its advantages, all of its disadvantages, 
 and throws away all of the advantages of threading.
 It is a complete and utter waste of time and effort.

I can agree with that.

Apr 28 2008

"Lionello Lunesu" <lionello lunesu.remove.com> writes:

"Walter Bright" <newshound1 digitalmars.com> wrote in message 
news:48169E90.6050700 digitalmars.com...
 Me Here wrote:
 Janice Caron wrote:
 Functions don't overload on return value.

 They don't? Why not? Seems like a pretty obvious step to me.

 Type inference in D is done "bottom up". Doing overloading based on 
 function return type is "top down". Trying to get both schemes to coexist 
 is a hard problem.

But a function's result can be overloaded using "out", so why can't it be 
overloaded using the return value?

Can't the compiler treat a return value as an implicit out argument?

L.

Apr 29 2008

Walter Bright <newshound1 digitalmars.com> writes:

Lionello Lunesu wrote:
 
 "Walter Bright" <newshound1 digitalmars.com> wrote in message 
 news:48169E90.6050700 digitalmars.com...
 Me Here wrote:
 Janice Caron wrote:
 Functions don't overload on return value.

 They don't? Why not? Seems like a pretty obvious step to me.

 Type inference in D is done "bottom up". Doing overloading based on 
 function return type is "top down". Trying to get both schemes to 
 coexist is a hard problem.

 
 But a function's result can be overloaded using "out", so why can't it 
 be overloaded using the return value?

We know what the type of the out argument is. The problem with return 
value overloading is not knowing what the type should be.


 Can't the compiler treat a return value as an implicit out argument?

Suppose the return value is used as an argument to another function with 
overloaded versions. Rinse, repeat. The combinations grow out of control.

Apr 29 2008

e-t172 <e-t172 akegroup.org> writes:

Lionello Lunesu wrote:
 
 "Walter Bright" <newshound1 digitalmars.com> wrote in message 
 news:48169E90.6050700 digitalmars.com...
 Me Here wrote:
 Janice Caron wrote:
 Functions don't overload on return value.

 They don't? Why not? Seems like a pretty obvious step to me.

 Type inference in D is done "bottom up". Doing overloading based on 
 function return type is "top down". Trying to get both schemes to 
 coexist is a hard problem.

 
 But a function's result can be overloaded using "out", so why can't it 
 be overloaded using the return value?
 
 Can't the compiler treat a return value as an implicit out argument?

Consider this:

int foo();
float foo();

void bar(int a);
void bar(float a);

Then this:

void main()
{
	bar(foo());
}

There is an obvious problem here.
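
A sketch of how the ambiguity goes away when the result type is named by the caller rather than inferred. Here `foo` becomes a template (an assumption, since D has no return-type overloading), and `bar` returns a label only so that the chosen overload can be checked; the original `bar` returns `void`.

```d
// foo is parameterised on its result type instead of being overloaded on it.
T foo(T)()
{
    return 1;   // the literal converts implicitly to int or float
}

// Ordinary parameter overloads, as in the example above.
string bar(int a)   { return "int"; }
string bar(float a) { return "float"; }

void main()
{
    // The caller states which foo is meant, so bar resolves bottom-up.
    assert(bar(foo!int()) == "int");
    assert(bar(foo!float()) == "float");
}
```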

Apr 29 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"e-t172" wrote
 Lionello Lunesu wrote:
 "Walter Bright" wrote in message
 Me Here wrote:
 Janice Caron wrote:
 Functions don't overload on return value.

 They don't? Why not? Seems like a pretty obvious step to me.

 Type inference in D is done "bottom up". Doing overloading based on 
 function return type is "top down". Trying to get both schemes to 
 coexist is a hard problem.

 But a function's result can be overloaded using "out", so why can't it be 
 overloaded using the return value?

 Can't the compiler treat a return value as an implicit out argument?

 Consider this:

 int foo();
 float foo();

 void bar(int a);
 void bar(float a);

 Then this:

 void main()
 {
 bar(foo());
 }

 There is an obvious problem here.

Yes, one that is solved like any other that has ambiguity: casting.  We will 
have the same problem when opImplicitCast is introduced.

This seems like a rare case anyways, not a reason not to have overloaded 
return values.

-Steve

Apr 29 2008

"Hans W. Uhlig" <huhlig gmail.com> writes:

e-t172 wrote:
 Lionello Lunesu wrote:
 "Walter Bright" <newshound1 digitalmars.com> wrote in message 
 news:48169E90.6050700 digitalmars.com...
 Me Here wrote:
 Janice Caron wrote:
 Functions don't overload on return value.

 They don't? Why not? Seems like a pretty obvious step to me.

 Type inference in D is done "bottom up". Doing overloading based on 
 function return type is "top down". Trying to get both schemes to 
 coexist is a hard problem.

 But a function's result can be overloaded using "out", so why can't it 
 be overloaded using the return value?

 Can't the compiler treat a return value as an implicit out argument?

 
 Consider this:
 
 int foo();
 float foo();
 
 void bar(int a);
 void bar(float a);
 
 Then this:
 
 void main()
 {
     bar(foo());
 }
 
 There is an obvious problem here.

One of two things: either assume which overload is called based on which 
has the higher priority (by precision or type), or throw a compiler 
error if no cast is made.

Overload ambiguity: a cast must be made when both the return overload and 
the parameter overload types are ambiguous.

Apr 30 2008

Gide Nwawudu <gide btinternet.com> writes:

On Mon, 28 Apr 2008 02:14:19 +0200, "Simen Kjaeraas"
<simen.kjaras gmail.com> wrote:

<p9e883002 sneakemail.com> wrote:

 Of course, a wasn't really mutated. Instead, args[0] was copied and then
 mutated and labelled a. Then a was copied and mutated and reassigned the
 mutated copy.

 So, that's two copies of the string, plus a slice, plus an extra method  
 call to
 achieve what used to be achievable in place on the original string.  
 Which is now
 immutable, but I'll never need it again.

 Of course, on these short 1-off strings it doesn't matter a hoot. But  
 when the
 strings are 200 to 500 characters a pop and there are 20,000,000 of  
 them. It
 matters.

 Did I suggest this was an optimisation?

 Whatever immutability-purity cool aid you've been drinking, please go  
 back to
 coke. And give us usable libraries and sensible implicit conversions.  
 Cos this sucks
 bigtime.

 b.


Is this what you wanted to write?

int main(string[] args)
{
   char[] a = cast(char[])args[0];
   a[2..5] = "XXX";
   writefln(a);
   return 0;
}
This compiles and runs, and seems to do what you describe. Sure, there's a
cast there, but it's not all that bad, is it?

Or just add a dup.

int main(string[] args)
{
   char[] a = args[0].dup;
   a[2..5] = "XXX";
   writefln(a);
   return 0;
}

Apr 27 2008

Bill Baxter <dnewsgroup billbaxter.com> writes:

Simen Kjaeraas wrote:
 <p9e883002 sneakemail.com> wrote:
 
 Of course, a wasn't really mutated. Instead, args[0] was copied and then
 mutated and labelled a. Then a was copied and mutated and reassigned the
 mutated copy.

 So, that's two copies of the string, plus a slice, plus an extra 
 method call to
 achieve what used to be achievable in place on the original string. 
 Which is now
 immutable, but I'll never need it again.

 Of course, on these short 1-off strings it doesn't matter a hoot. But 
 when the
 strings are 200 to 500 characters a pop and there are 20,000,000 of 
 them. It
 matters.

 Did I suggest this was an optimisation?

 Whatever immutability-purity cool aid you've been drinking, please go 
 back to
 coke. And give us usable libraries and sensible implicit conversions. 
 Cos this sucks
 bigtime.

 b.

 
 
 Is this what you wanted to write?
 
 int main(string[] args)
 {
   char[] a = cast(char[])args[0];
   a[2..5] = "XXX";
   writefln(a);
   return 0;
 }
 This compiles and runs, and seems to do what you describe. Sure, there's a
 cast there, but it's not all that bad, is it?

I'm no invariant guru, but I don't think that's legal.  'invariant' 
means the data could be stored in a portion of memory that the OS will 
not allow the program to write to.  So you need to dup it:

    char[] a = args[0].dup;
    a[2..5] = "XXX";
    writefln(a);
    return 0;

That stuff like this compiles and seems to work is why we really need to 
make at least one alternative version of cast.  One would be for 
relative safe run-of-the-mill casts, like casting float to int, or 
casting Object to some class (and checking for null),  and the other 
category would be for dangerous big red flags kind of things like the 
above.  Using the run-of-the-mill cast in the above situation would not 
be allowed.

--bb

Apr 27 2008

Tomas Lindquist Olsen <tomas famolsen.dk> writes:

Bill Baxter wrote:

... snip ...

 
 That stuff like this compiles and seems to work is why we really need to 
 make at least one alternative version of cast.  One would be for 
 relative safe run-of-the-mill casts, like casting float to int, or 
 casting Object to some class (and checking for null),  and the other 
 category would be for dangerous big red flags kind of things like the 
 above.  Using the run-of-the-mill cast in the above situation would not 
 be allowed.

Amen to that !!!

Apr 28 2008

"Lionello Lunesu" <lionello lunesu.remove.com> writes:

"Bill Baxter" <dnewsgroup billbaxter.com> wrote in message 
news:fv3612$sgu$1 digitalmars.com...
 That stuff like this compiles and seems to work is why we really need to 
 make at least one alternative version of cast.  One would be for relative 
 safe run-of-the-mill casts, like casting float to int, or casting Object 
 to some class (and checking for null),  and the other category would be 
 for dangerous big red flags kind of things like the above.  Using the 
 run-of-the-mill cast in the above situation would not be allowed.

That request has been on the "unofficial wish list" since the beginning. 
And I still agree with it.

Maybe cast() should be parsed as a template. Then, the compiler should 
require more "!"s as the risk increases:

SomeClass sc = cast(SomeClass)some_obj;   //OK
int i = cast!(int)some_float;    //might not fit
SomeClass sc = cast!!(SomeClass)void_ptr;  //unsafe
char[] mutstring = cast!!!!!!!!(char[])toUpper("...");  //wtf are you doing!

L.

Apr 29 2008

p9e883002 sneakemail.com writes:

On Mon, 28 Apr 2008 02:14:19 +0200, "Simen Kjaeraas" 
<simen.kjaras gmail.com> wrote:
 <p9e883002 sneakemail.com> wrote:
 
 Is this what you wanted to write?
 
 int main(string[] args)
 {
    char[] a = cast(char[])args[0];
    a[2..5] = "XXX";
    writefln(a);
    return 0;
 }
 This compiles and runs, and seems to do what you describe. Sure, there's a
 cast there, but it's not all that bad, is it?

No. You missed out uppercasing the string before replacing the slice.

Apr 27 2008

"Simen Kjaeraas" <simen.kjaras gmail.com> writes:

On Mon, 28 Apr 2008 02:44:14 +0200, <p9e883002 sneakemail.com> wrote:

 On Mon, 28 Apr 2008 02:14:19 +0200, "Simen Kjaeraas"
 <simen.kjaras gmail.com> wrote:
 <p9e883002 sneakemail.com> wrote:

 Is this what you wanted to write?

 int main(string[] args)
 {
    char[] a = cast(char[])args[0];
    a[2..5] = "XXX";
    writefln(a);
    return 0;
 }
 This compiles and runs, and seems to do what you describe. Sure, there's a
 cast there, but it's not all that bad, is it?

 No. You missed out uppercasing the string before replacing the slice.


That's why I replied to my own post stating just that.
Anyways, Gide got it right. A .dup is the correct way, a cast is wrong.
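
For completeness, a sketch of the whole task (uppercase, then overwrite a slice) done on a `.dup` copy with no cast. It assumes current D and ASCII-only input, uses `std.ascii.toUpper` on individual characters instead of the 2008 `std.string.toupper`, and `upperAndPatch` is a hypothetical name.

```d
import std.ascii : toUpper;

// Copies the immutable input once, then does all mutation in place.
// Assumes the input has at least five characters.
char[] upperAndPatch(string s)
{
    char[] a = s.dup;            // the only copy made
    foreach (ref c; a)
        c = toUpper(c);          // in-place uppercase of ASCII letters
    a[2 .. 5] = "XXX";           // in-place slice overwrite
    return a;
}

void main()
{
    assert(upperAndPatch("hello world") == "HEXXX WORLD");
}
```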

-- Simen

Apr 27 2008

"Janice Caron" <caron800 googlemail.com> writes:

2008/4/28 Simen Kjaeraas <simen.kjaras gmail.com>:
  int main(string[] args)
  {
   char[] a = cast(char[])args[0];
   a[2..5] = "XXX";
   writefln(a);
   return 0;
  }
  This compiles and runs, and seems to do what you describe. Sure, there's a
  cast there, but it's not all that bad, is it?

Yes, it's extremely bad. Casting away invariant is UNDEFINED BEHAVIOR,
and should never be done.

You should never need an explicit cast just to handle text!

Apr 28 2008

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

Janice Caron wrote:
 2008/4/28 Simen Kjaeraas <simen.kjaras gmail.com>:
  int main(string[] args)
  {
   char[] a = cast(char[])args[0];
   a[2..5] = "XXX";
   writefln(a);
   return 0;
  }
  This compiles and runs, and seems to do what you describe. Sure, there's a
  cast there, but it's not all that bad, is it?

 
 Yes, it's extremely bad. Casting away invariant is UNDEFINED BEHAVIOR,
 and should never be done.
 

It's not merely undefined, it's *illegal*!
I hate the C/C++ tradition of calling things that are *illegal* 
"undefined behavior". Yes, illegal behavior causes undefined behavior, but 
they're not the same thing. Illegal is something that may cause your 
program to crash, or simply end up in a faulty and erroneous state. 
Undefined is just undefined. For example, this expression in C:
   a = (x++) + x*2;
has undefined behavior (because of order of evaluation issues). But it's 
not *illegal* behavior, your program will not crash and burn because of 
that.

-- 
Bruno Medeiros - Software Developer, MSc. in CS/E graduate
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Apr 29 2008

terranium <spam here.lot> writes:

 So, that's two copies of the string, plus a slice, plus an extra method call
 to achieve what used to be achievable in place on the original string. Which
 is now immutable, but I'll never need it again.

this is what string functions look like.


 Of course, on these short 1-off strings it doesn't matter a hoot. But when the 
 strings are 200 to 500 characters a pop and there are 20,000,000 of them. It 
 matters.

In this case you may need a StringBuilder

 And give us usable libraries and sensible implicit conversions. Cos this sucks 
 bigtime.

I think one should first use safe methods at the cost of performance and memory
usage, if this proves to suck, one should switch to an advanced technique at
the cost of expressiveness and developer's care.

Apr 28 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

<p9e883002 sneakemail.com> wrote
 Hi all,

 'scuse me for not being familiar with previous or ongoing discussion on 
 this
 subject, but I'm just coming back to D after a couple of years away.

 I have some strings read in from external source that I need to convert to
 uppercase. A quick look at Phobos and I find std.string has a toupper 
 method.
 <very good example case removed>

This is all not an issue if Walter adopts 'scoped const' contracts.

http://d.puremagic.com/issues/show_bug.cgi?id=1961

The current con for this method is that it is another 'confusing' const 
syntax.  So is what I propose more confusing, or is what this poor developer 
had to go through more confusing?

-Steve

Apr 28 2008

"Janice Caron" <caron800 googlemail.com> writes:

2008/4/28 Steven Schveighoffer <schveiguy yahoo.com>:
  > I have some strings read in from external source that I need to convert to
  > uppercase. A quick look at Phobos and I find std.string has a toupper
  > method.
  > <very good example case removed>

  This is all not an issue if Walter adopts 'scoped const' contracts.

toupper() couldn't be reused for all constancies, because the
invariant version should employ copy-on-write, whereas any other
versions would not be able to do this.

That is,

    toupper("HELLO");

can return the original, if and only if the string is invariant.
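
A rough sketch of the copy-on-write behaviour being described, assuming current D and ASCII-only text. `upperCow` is a hypothetical name, not the Phobos function; it returns the caller's own slice when nothing needs changing, which is only safe because the input is immutable.

```d
import std.ascii : isLower, toUpper;
import std.exception : assumeUnique;

// Returns the input itself if it is already uppercase, otherwise a new string.
string upperCow(string s)
{
    foreach (i, c; s)
    {
        if (isLower(c))
        {
            char[] r = s.dup;                 // first change found: copy once
            foreach (ref ch; r[i .. $])
                ch = toUpper(ch);
            return assumeUnique(r);           // r has no other references
        }
    }
    return s;                                 // nothing to change: share the data
}

void main()
{
    string a = "HELLO";
    string b = "Hello";
    assert(upperCow(a) is a);                 // same memory, no copy
    assert(upperCow(b) == "HELLO");
    assert(upperCow(b) !is b);                // a fresh copy was made
}
```

With a mutable or const argument the early `return s` would hand back an alias that someone else may still write to, which is the reason given above for needing a separate implementation.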

Apr 28 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Janice Caron" wrote
 2008/4/28 Steven Schveighoffer:
  > I have some strings read in from external source that I need to 
 convert to
  > uppercase. A quick look at Phobos and I find std.string has a toupper
  > method.
  > <very good example case removed>

  This is all not an issue if Walter adopts 'scoped const' contracts.

 toupper() couldn't be reused for all constancies, because the
 invariant version should employ copy-on-write, whereas any other
 versions would not be able to do this.

 That is,

    toupper("HELLO");

 can return the original, if and only if the string is invariant.

toupper is probably a bad example, as your case seems like the rarest :) 
But I understand what you are saying.

The desire to have string processing functions work with all constancies 
seems very reasonable and useful to me.  To deny usage of toupper unless you 
idup the array, just to have the ability to optimize on a corner case, seems 
incorrect, and is likely to produce less efficient code for 90% of the cases. 
If the scoped const proposal was never accepted, and I used Phobos, I'd 
probably suggest a const and mutable version of toupper that allowed 
those of us who use mutable strings a lot, and maybe not so much 
multithreading, to not have to jump through hoops for any string 
processing.

Maybe the solution to this is to write specializations which use COW with 
the invariant version, perhaps with pure functions, which always assume 
invariant parameters.  So you have a pure toupper which handles the 
invariant version, and a scoped const version which allows using the 
function on non-invariant parameters, which can't be optimized the same 
anyways...

-Steve

Apr 28 2008

Sean Kelly <sean invisibleduck.org> writes:

== Quote from Janice Caron (caron800 googlemail.com)'s article
 2008/4/28 Steven Schveighoffer <schveiguy yahoo.com>:
  > I have some strings read in from external source that I need to convert to
  > uppercase. A quick look at Phobos and I find std.string has a toupper
  > method.
  > <very good example case removed>

  This is all not an issue if Walter adopts 'scoped const' contracts.

 toupper() couldn't be reused for all constancies, because the
 invariant version should employ copy-on-write, whereas any other
 versions would not be able to do this.
 That is,
     toupper("HELLO");
 can return the original, if and only if the string is invariant.

Can you explain this in light of Steven's 'scoped const' proposal?  By my
understanding (assuming scoped const):

    string bufI = "HELLO";
    char[] bufM = "HELLO".dup;
    const(char)[] bufC = bufM;

    string retI = toupper( bufI ); // return value is invariant - ok
    char[] retM = toupper( bufM ); // return value is mutable - ok
    const(char)[] retC = toupper( bufC ); // return value is const - ok
    const(char)[] retC2 = toupper( bufI ); // return value is invariant - ok

    bufM[0] = 'J';
    assert( retC[0] == 'J' );

The above seems perfectly fine, because it's impossible to pass a mutable
array and return a const reference to it--the return value will be mutable as
well.
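
The "constancy in, same constancy out" behaviour can be sketched with the `inout` qualifier that current D provides; treating it as equivalent to the scoped const proposal is an assumption here. `firstWord` is a hypothetical function chosen because it returns a slice of its argument rather than a copy.

```d
// One implementation serves mutable, const and immutable callers;
// the result carries whatever qualifier the argument had.
inout(char)[] firstWord(inout(char)[] s)
{
    foreach (i, c; s)
        if (c == ' ')
            return s[0 .. i];
    return s;
}

void main()
{
    string bufI = "hello world";
    char[] bufM = "hello world".dup;
    const(char)[] bufC = bufM;

    string retI = firstWord(bufI);          // immutable in, immutable out
    char[] retM = firstWord(bufM);          // mutable in, mutable out
    const(char)[] retC = firstWord(bufC);   // const in, const out

    retM[0] = 'J';                          // writes through to bufM
    assert(bufM == "Jello world");
    assert(retC[0] == 'J');                 // the const view sees the change
    assert(retI == "hello");
}
```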

By contrast, let's assume the invariant implementation:

    string toupper( string buf );

    char[] buf = "HELLO".dup;

    toupper( buf ); // fails
    toupper( buf.idup ); // works
    toupper( assumeUnique( buf ) ); // works

In the first working case I have to copy buf to pass it to toupper, and in the
second I have to perform a cast operation (albeit wrapped in a function to hide
the truth). Assuming for a moment that mutable strings are useful and so I
won't be able to use the 'string' alias all the time, can you explain what is
good about either of these scenarios?


Sean

Apr 28 2008

"Janice Caron" <caron800 googlemail.com> writes:

On 28/04/2008, Sean Kelly <sean invisibleduck.org> wrote:
 Can you explain this in light of Steven's 'scoped const' proposal?

I meant that non-invariant versions would have to make a copy, but the
invariant version sometimes wouldn't. That means they can't share the
same code.


     string bufI = "HELLO";
     char[] bufM = "HELLO".dup;
     const(char)[] bufC = bufM;

     const(char)[] retC = toupper( bufC ); // return value is const - ok

     bufM[0] = 'J';
     assert( retC[0] == 'J' );

Why would that assert hold? I would expect toupper(char[]) to have to
return a copy precisely in order to /prevent/ that problem. What am I
missing?

Apr 28 2008

Walter Bright <newshound1 digitalmars.com> writes:

p9e883002 sneakemail.com wrote:
 So, that's two copies of the string, plus a slice, plus an extra method call
 to achieve what used to be achievable in place on the original string. Which
 is now immutable, but I'll never need it again.
 
 Of course, on these short 1-off strings it doesn't matter a hoot. But when the 
 strings are 200 to 500 characters a pop and there are 20,000,000 of them. It 
 matters.
 
 Did I suggest this was an optimisation?

You bring up a good point.

On a tiny example such as yours, where you can see everything that is 
going on at a glance, such as where strings come from and where they are 
going, there isn't any point to immutable strings. You're right about that.

The problems start happening as the complexity rises. Strings get passed 
around, stored, modified, etc. It's real easy to lose track of who owns 
a string, who else has references to the string, who has rights to 
change the string and who doesn't.

For example, you're changing the char[][] passed in to main(). What if 
one of those strings is a literal in the read-only data section?

So what happens is code starts defensively making copies of the string 
"just in case." I'll argue that in a complex program, you'll actually 
wind up making far more copies than you will with invariant strings.

Apr 28 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Walter Bright" wrote
 p9e883002 sneakemail.com wrote:
 So, that's two copies of the string, plus a slice, plus an extra method 
 call to achieve what used to be achievable in place on the original 
 string. Which is now immutable, but I'll never need it again. Of course, 
 on these short 1-off strings it doesn't matter a hoot. But when the 
 strings are 200 to 500 characters a pop and there are 20,000,000 of them. 
 It matters.

 Did I suggest this was an optimisation?

 You bring up a good point.

 On a tiny example such as yours, where you can see everything that is 
 going on at a glance, such as where strings come from and where they are 
 going, there isn't any point to immutable strings. You're right about 
 that.

 The problems start happening as the complexity rises. Strings get passed 
 around, stored, modified, etc. It's real easy to lose track of who owns a 
 string, who else has references to the string, who has rights to change 
 the string and who doesn't.

 For example, you're changing the char[][] passed in to main(). What if one 
 of those strings is a literal in the read-only data section?

 So what happens is code starts defensively making copies of the string 
 "just in case." I'll argue that in a complex program, you'll actually wind 
 up making far more copies than you will with invariant strings.

I agree that immutable strings can be valuable.  That's why I think it's 
important to have a version of toupper that uses invariant strings because 
you can make more assumptions about when to make copies.  But why shouldn't 
there be a version that does the same thing with mutable or const strings? 
Why should a developer be forced to always use invariant strings when the 
optimizations and multithreading benefits that come with only using 
invariant strings may not be more important for a particular program than 
being able to modify a string?  I should still be able to use toupper on 
mutable strings as well...

-Steve

Apr 28 2008

Walter Bright <newshound1 digitalmars.com> writes:

Steven Schveighoffer wrote:
 I agree that immutable strings can be valuable.  That's why I think it's 
 important to have a version of toupper that uses invariant strings because 
 you can make more assumptions about when to make copies.  But why shouldn't 
 there be a version that does the same thing with mutable or const strings? 
 Why should a developer be forced to always use invariant strings when the 
 optimizations and multithreading benefits that come with only using 
 invariant strings may not be more important for a particular program than 
 being able to modify a string?  I should still be able to use toupper on 
 mutable strings as well...

That's why I agreed with Janice on making a stringbuffer module that 
operates on mutable strings. It's easier than arguing about it, and it 
doesn't hurt to have such a package. And I suspect that after using it 
for a while, people will naturally evolve towards using all invariant 
strings.

Apr 28 2008

Lars Ivar Igesund <larsivar igesund.net> writes:

Walter Bright wrote:

 Steven Schveighoffer wrote:
 I agree that immutable strings can be valuable.  That's why I think it's
 important to have a version of toupper that uses invariant strings
 because
 you can make more assumptions about when to make copies.  But why
 shouldn't there be a version that does the same thing with mutable or
 const strings? Why should a developer be forced to always use invariant
 strings when the optimizations and multithreading benefits that come with
 only using invariant strings may not be more important for a particular
 program than
 being able to modify a string?  I should still be able to use toupper on
 mutable strings as well...

 
 That's why I agreed with Janice on making a stringbuffer module that
 operates on mutable strings. It's easier than arguing about it, and it
 doesn't hurt to have such a package. And I suspect that after using it
 for a while, people will naturally evolve towards using all invariant
 strings.

After working with Java for quite some time, I have naturally drifted from
using invariant strings to stringbuffers.

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango

Apr 28 2008

Walter Bright <newshound1 digitalmars.com> writes:

Lars Ivar Igesund wrote:
 After working with Java for quite some time, I have naturally drifted from
 using invariant strings to stringbuffers.

Java strings lack slicing, so they're crippled anyway. I believe that 
slicing is one of those paradigm-shifting features, so I am not making 
an irrelevant point.

Apr 28 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Walter Bright" wrote
 Lars Ivar Igesund wrote:
 After working with Java for quite some time, I have naturally drifted 
 from
 using invariant strings to stringbuffers.

 Java strings lack slicing, so they're crippled anyway. I believe that 
 slicing is one of those paradigm-shifting features, so I am not making an 
 irrelevant point.

Java's String.substring(start, last) works just like slicing...

Not that I don't love D slicing above calling a function, but saying that 
Java doesn't have slicing is completely false.

Where they lack is in the support of mutable strings, and especially having 
strings be treated as native arrays.  D excels in those areas.

-Steve

Apr 28 2008

Walter Bright <newshound1 digitalmars.com> writes:

Steven Schveighoffer wrote:
 Java's String.substring(start, last) works just like slicing...

No it doesn't. It makes a copy (I don't know if this is true of *all* 
versions of Java).

Apr 28 2008

Ary Borenszweig <ary esperanto.org.ar> writes:

Walter Bright wrote:
 Steven Schveighoffer wrote:
 Java's String.substring(start, last) works just like slicing...

 
 No it doesn't. It makes a copy (I don't know if this is true of *all* 
 versions of Java).

A String holds a char[], the "start" offset into it, and its "length". A 
substring just creates another String instance with "start" and "length" 
changed.

So it makes a new String, but the underlying char[] remains the same.

Apr 28 2008

Robert Fraser <fraserofthenight gmail.com> writes:

Walter Bright wrote:
 Steven Schveighoffer wrote:
 Java's String.substring(start, last) works just like slicing...

 
 No it doesn't. It makes a copy (I don't know if this is true of *all* 
 versions of Java).

Java 6's String.substring method (JDK 1.6.0_04, 64-bit Windows):

public String substring(int beginIndex, int endIndex) {
	if (beginIndex < 0) {
		throw new StringIndexOutOfBoundsException(beginIndex);
	}
	if (endIndex > count) {
		throw new StringIndexOutOfBoundsException(endIndex);
	}
	if (beginIndex > endIndex) {
		throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
	}
	return ((beginIndex == 0) && (endIndex == count)) ? this :
		new String(offset + beginIndex, endIndex - beginIndex, value);
}

The important part is new String(offset + beginIndex, endIndex - 
beginIndex, value) which does indeed do a "slice" of sorts (that is, it 
returns a string with the same char array backing it with a new offset 
and length). No copying of data is done.

Apr 28 2008

Sean Kelly <sean invisibleduck.org> writes:

== Quote from Robert Fraser (fraserofthenight gmail.com)'s article
 Walter Bright wrote:
 Steven Schveighoffer wrote:
 Java's String.substring(start, last) works just like slicing...

 No it doesn't. It makes a copy (I don't know if this is true of *all*
 versions of Java).

 Java 6's String.substring method (JDK 1.6.0_04, 64-bit Windows):
 public String substring(int beginIndex, int endIndex) {
 if (beginIndex < 0) {
 	throw new StringIndexOutOfBoundsException(beginIndex);
 }
 if (endIndex > count) {
 	throw new StringIndexOutOfBoundsException(endIndex);
 }
 if (beginIndex > endIndex) {
 	throw new StringIndexOutOfBoundsException(endIndex -beginIndex);
 }
 return ((beginIndex == 0) && (endIndex == count)) ? this :
 	new String(offset + beginIndex, endIndex - beginIndex, value);
 }
 The important part is new String(offset + beginIndex, endIndex -
 beginIndex, value) which does indeed do a "slice" of sorts (that is, it
 returns a string with the same char array backing it with a new offset
 and length). No copying of data is done.

Right.  The issue in Java is that the String wrapper class is still allocated
on the heap, so dynamic memory allocation is still occurring.  D, on the other
hand, uses a fat reference (a pointer plus a length), so creating a slice
doesn't touch the heap at all.
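
To make the fat-reference point concrete, here is a minimal sketch in present-day D, where invariant is spelled immutable and string is an alias for immutable(char)[]. The variable names are made up for illustration; the asserts check that the slice points into the original storage and is not a copy.

```d
import std.stdio : writeln;

void main()
{
    string s = "hello, world";

    // A slice is just a (length, pointer) pair; taking one copies no characters
    // and allocates nothing on the heap.
    string w = s[7 .. 12];

    assert(w == "world");
    assert(w.ptr is s.ptr + 7);      // points into the same memory as s
    assert(w.length == 5);

    // The reference itself is two machine words.
    static assert(string.sizeof == 2 * size_t.sizeof);

    writeln(w);
}
```

By contrast, the Java 6 substring quoted above shares the char[] but still allocates a new String object on each call.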


Sean

Apr 28 2008

Walter Bright <newshound1 digitalmars.com> writes:

Robert Fraser wrote:
 The important part is new String(offset + beginIndex, endIndex - 
 beginIndex, value) which does indeed do a "slice" of sorts (that is, it 
 returns a string with the same char array backing it with a new offset 
 and length). No copying of data is done.

Yes, you are right. I was wrong. But Java is still new'ing a new 
instance of String for each slice. And it still uses two levels of 
indirection to get to the string data.

Apr 28 2008

Christopher Wright <dhasenan gmail.com> writes:

Robert Fraser wrote:
 The important part is new String(offset + beginIndex, endIndex - 
 beginIndex, value) which does indeed do a "slice" of sorts (that is, it 
 returns a string with the same char array backing it with a new offset 
 and length). No copying of data is done.

Sun has it right. GNU Classpath has it wrong and copies the data every time.

Apr 28 2008

Lars Ivar Igesund <larsivar igesund.net> writes:

Walter Bright wrote:

 Lars Ivar Igesund wrote:
 After working with Java for quite some time, I have naturally drifted
 from using invariant strings to stringbuffers.

 
 Java strings lack slicing, so they're crippled anyway. I believe that
 slicing is one of those paradigm-shifting features, so I am not making
 an irrelevant point.

I agree that Java strings are crippled, but considering that String is
easier to use there than StringBuffer, I would certainly need good reasons
to prefer the latter. And I have them.

Your point about slicing may not be irrelevant, but the kickass-ness of the
feature only truly comes into its own when combined with non-allocating
string operations. 

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango

Apr 28 2008

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

Walter Bright wrote:
 Steven Schveighoffer wrote:
 I agree that immutable strings can be valuable.  That's why I think 
 it's important to have a version of toupper that uses invariant 
 strings because you can make more assumptions about when to make 
 copies.  But why shouldn't there be a version that does the same thing 
 with mutable or const strings? Why should a developer be forced to 
 always use invariant strings when the optimizations and multithreading 
 benefits that come with only using invariant strings may not be more 
 important for a particular program than being able to modify a 
 string?  I should still be able to use toupper on mutable strings as 
 well...

 
 That's why I agreed with Janice on making a stringbuffer module that 
 operates on mutable strings. It's easier than arguing about it, and it 
 doesn't hurt to have such a package. And I suspect that after using it 
 for a while, people will naturally evolve towards using all invariant 
 strings.

"people will naturally evolve towards using all invariant strings."
Oh please. This whole discussion between "Me here" and Walter was always 
occurring under the notion that one either has to use all mutable 
strings, or all invariant strings, which is a silly idea. Use what is 
right for what you are trying to do!

The original post code was a clear-cut example of invariant misuse. If 
you are going to make one or several different mutations to a string, do 
not use invariant, use mutable. The fact that there isn't a 
mutable/in-place tolower has no bearing on the const/invariant system 
(only on the Phobos library design). So if you had any quarrel, it 
wasn't with D's immutability system, but with library design (which 
Walter already said he plans to fix... at least as far as std.string is 
concerned).

And Walter, people won't "naturally evolve towards using all invariant 
strings" (nor should they). If I have a function where I'm going to 
perform a series of changes to a string, I'm not going to dup it with 
each change just to say "How cute, I'm using invariant all the way!". 
I'll do all the changes on a mutable string, and then return either a 
mutable, const, or invariant string, as appropriate to what makes sense 
in the code.
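
A sketch of that workflow in present-day D (where invariant is spelled immutable). The helper upperInPlace is hypothetical and ASCII-only, not a Phobos function; the final conversion assumes std.exception.assumeUnique, which is only safe when no mutable reference to the buffer is kept afterwards.

```d
import std.exception : assumeUnique;

/// Hypothetical helper, not part of Phobos: upper-cases ASCII letters
/// in place, so no new array is allocated.
void upperInPlace(char[] buf)
{
    foreach (ref c; buf)
        if (c >= 'a' && c <= 'z')
            c = cast(char)(c - ('a' - 'A'));
}

void main()
{
    char[] rec = "key=abc;id=7".dup;   // the one copy, made up front
    const(void)* before = rec.ptr;

    upperInPlace(rec[4 .. 7]);         // mutate a single field through a slice

    // Hand the finished buffer out as an immutable string without copying.
    string done = assumeUnique(rec);

    assert(done == "key=ABC;id=7");
    assert(cast(const(void)*) done.ptr is before);  // same storage, no copy
    assert(rec is null);               // assumeUnique cleared the mutable reference
}
```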

-- 
Bruno Medeiros - Software Developer, MSc. in CS/E graduate
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Apr 29 2008

"Me Here" <p9e883002 sneakemail.com> writes:

Walter Bright wrote:

p9e883002 sneakemail.com wrote:

Did I suggest this was an optimisation?

You bring up a good point.

Sorry to have provoked you Walter, but thanks for your reply.

On a tiny example such as yours, where you can see everything that is 
going on at a glance, such as where strings come from and where they are 
going, there isn't any point to immutable strings. You're right about that.

Well obviously the example was trivial to concentrate attention upon the 
issue I was having.

  It's real easy to lose track of who owns a string, who else has references to
the string, who has rights to change the string and who doesn't.

The keyword in there is "who". The problem is that you are pessimising the 
entire language, once rightly famed for its performance, for *all* users, 
for the notional convenience of those few writing threaded applications. 
Now don't go taking that the wrong way. In other circles, I am known as 
"Mr. Threading". At least for my advocacy of them, if not my expertise. 
Though I have been using threads for a relatively long time, going way 
back to pre-1.0 OS/2 (then known internally as CP/DOS). Only mentioned to 
show I'm not in the "thread is spelt f-o-r-k" camp.

For example, you're changing the char[][] passed in to main(). What if one 
of those strings is a literal in the read-only data section?

Okay. So that begs the question of how does runtime external data end up 
in a read-only data section? Of course, it can be done, but that then begs 
the question: why? But let's ignore that for now and concentrate on the 
development of my application that wants to mutate one or more of those 
strings.

The first time I try to mutate one, I'm going to hit an error, either 
compile time or runtime, and immediately know, assuming the error message 
is reasonably understandable, that I need to copy the immutable string 
into something I can mutate. A quick, *single* dup, and I'm away and 
running.

Provided that I have the tools to do what I need, that is. In this case, 
and the entire point of the original post, that means a library of common 
string manipulation functions that work on my good old fashioned char[]s 
without my needing to jump through the hoops of neo-orthodoxy to use them.

But, as I tried to point out in the post to which you replied, the whole 
'args' thing is a red herring. It was simply a convenient source of 
non-compile-time data. I couldn't get the std.stream example to compile. 
Apparently due to a bug in the v2 libraries--see elsewhere.

In this particular case, I turned to D in order to manipulate 125,000,000 
x 500 to 2000 byte strings. A dump of an inverted index DB. I usually do 
this kinda stuff in a popular scripting language, but that proved to be 
rather too slow for this volume of data. Each of those records needs to go 
through multiple mutations. From uppercasing of certain fields; the 
complete removal of certain characters within substantial subsets of each 
record; to the recalculation and adjustment of an embedded hex digest 
within each record to reflect the preceding changes. All told, each record 
may go through anything from 5 to 300 separate mutations.

Doing this via immutable buffers is going to create scads and scads of 
short-lived, immutable sub-elements that will just tax the GC to hell and 
impose unnecessary and unacceptable time penalties on the process. And I 
almost certainly will have to go through the process many times before I 
get the data in the ultimate form I need.

So what happens is code starts defensively making copies of the string 
"just in case." I'll argue that in a complex program, you'll actually wind 
up making far more copies than you will with invariant strings.
[from another post] I bet that, though, after a while they'll evolve to 
eschew it in favor of immutable strings. It's easier than arguing about it

You are so wrong here. I spent 2 of the worst years of my coding career 
working in Java, and ended up fighting it all the way. Whilst some of that 
was due to their sudden re-invention of major parts of the system 
libraries in completely incompatible ways when the transition from (from 
memory) 1.2 to 1.3 occurred--and being forced to make the change because 
of the near total abandonment of support or bug fixing for the 'old 
libraries'. Another big part of the problem was the endless complexities 
involved in switching between the String type and the StringBuffer type.

Please learn from history. Talk to (experienced) Java programmers. I mean 
real working stiffs, not OO-purists from academia. Preferably some that 
have experience of other languages also. It took until v1.5 before the 
performance of Java--and the dreaded GC pregnant pause--finally reached a 
point where Java performance for manipulating large datasets was both 
reasonable, and more importantly, reasonably deterministic. Don't make 
their mistakes over.

Too many times in the last thirty years I've seen promising, pragmatic 
software technologies tail off into academic obscurity because the primary 
motivators suddenly "got religion". Whether OO dogma or functional purity 
or whatever other flavour of neo-orthodoxy became flavour du jour, the 
assumption that "they'll see the light eventually" has been the downfall 
of many a promising start.

Just as the answer to the occasional hit-and-run death is not banning 
cars, so fixing unintentional aliasing in threaded applications does not 
lie in forcing all character arrays to be immutable.

For one reason, it doesn't stop there. Character arrays are just arrays 
of numbers. Exactly the same problems arise with arrays of integers, 
reals, associative arrays, etc. Imagine the costs of duplicating an entire 
hash every time you add a new key or alter a value. The penalties grow 
exponentially with the size of the hash (array of ints, longs, reals ...).

And before you reject this notion on the basis that "I'd never do that", 
what's the difference? Are strings any more vulnerable to the problems 
invariance is meant to tackle than these other datatypes?

Try manipulating large datasets--images, DNA data, signal processing, 
finite element analysis; any of the types of applications for which 
multi-threading isn't just a way to allow the program to do something 
useful while the user decides which button to click--in any of the 
"referentially transparent" languages that are concurrency capable and see 
the hoops you have to leap through to achieve anything like decent 
performance. E.g. Haskell's Unsafe* library routines (basically, abandon 
referential transparency for this data so that we can get something done 
in a reasonable time frame!). Look for "If you can match 1-core C speed 
using 4-core Haskell parallelism without "unsafe pseudo-C in Haskell" 
trickery, I will be impressed. ..." in the following article:   
http://reddit.com/r/programming/info/61p6f/comments/

The abandonment or deprecation of lvalue slices on string types is the 
thin end of the wedge toward referential transparency and despite all the 
academic hype and impressive (small scale) demos of the 'match made in 
heaven' that is 'referential transparency & concurrency', try to seek out 
real-world examples of the combination running in real-world environments. 
Ie. Where someone other than the tax-payer of whatever country is paying 
for the development, and the time pressure to obtain the results are a 
little more demanding than Thesis submission date and you'll find them 
very conspicuous by their absence.

Such ideas look great on paper, in the heady world of ideal Turing 
Machines with unlimited length tapes (unbounded memory). But once you 
bring them back to the real world of finite RAM, fragmentable heaps and 
GC, they become impractical. Unworkable for real data sets in real time.

Don't feel the need to argue this on-forum. If it hasn't persuaded you 
that forcing invariance upon one datatype, through providing a string 
library that only works with invariant strings, will do little to address 
the problems it attempts to solve, then I doubt further discussion will. 
Please return to the pragmatism that so stood out in your early visions 
for D and abandon this folly before, as with so many of the follies of the 
gentleman academic of yore, it becomes a life-long quest ending up as a 
memorial or tombstone.

Cheers, b.
--

Apr 28 2008

Sean Kelly <sean invisibleduck.org> writes:

== Quote from Me Here (p9e883002 sneakemail.com)'s article
 Don't feel the need to argue this on-forum. If it hasn't persuaded you
 that forcing invariance upon one datatype, through providing a string
 library that only works with invariant strings, will do little to address
 the problems it attempts to solve, then I doubt further discussion will.

There's always Tango :p

 Please return to the pragmatism that so stood out in your early visions
 for D and abandon this folly before, as with so many of the follies of the
 gentleman academic of yore, it becomes a life-long quest ending up as a
 memorial or tombstone.

As a point of interest, this quote is at the top of the DigitalMars D page:

"It seems to me that most of the "new" programming languages fall into one
of two categories: Those from academia with radical new paradigms and those
from large corporations with a focus on RAD and the web. Maybe it's time for a
new language born out of practical experience implementing compilers." --
Michael


Sean

Apr 28 2008

Walter Bright <newshound1 digitalmars.com> writes:

Me Here wrote:
 Just as the answer to the occasional hit-and-run death is not banning 
 cars, so fixing unintentional aliasing in threaded applications does not 
 lie in forcing all character arrays to be immutable.

D does not force all character arrays to be immutable. You can use 
mutable ones by declaring them as:

	char[]

Reference types all come in 3 flavors: mutable, read-only-view-of (i.e. 
const) and invariant.
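
As a minimal sketch in present-day D (where invariant has since been renamed immutable), the three flavors look like this. The variable names are illustrative only.

```d
void main()
{
    char[] m = "abc".dup;              // mutable: may be changed in place
    const(char)[] view = m;            // const: read-only view of data others may change
    immutable(char)[] fixed = "abc";   // immutable: nobody can ever change it

    m[0] = 'X';                        // fine, m is mutable
    assert(view[0] == 'X');            // the const view sees the change
    assert(fixed == "abc");

    // Neither the const view nor the immutable array can be written through,
    // and an immutable array does not convert implicitly to a mutable one.
    static assert(!__traits(compiles, view[0] = 'Y'));
    static assert(!__traits(compiles, fixed[0] = 'Y'));
    static assert(!__traits(compiles, { char[] bad = fixed; }));
}
```

Both char[] and immutable(char)[] convert implicitly to const(char)[], which is why a function taking const(char)[] can accept either.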

Apr 28 2008

"Me Here" <p9e883002 sneakemail.com> writes:

Walter Bright wrote:

Me Here wrote:
Just as the answer to the occasional hit-and-run death is not banning  
cars, so fixing unintentional aliasing in threaded applications does not  
lie in forcing all character arrays to be immutable.

D does not force all character arrays to be immutable. You can use mutable 
ones by declaring them as:

	char[]

Reference types all come in 3 flavors: mutable, read-only-view-of (i.e. 
const) and invariant.

Well no, but having the string libraries only accept and return invariant 
strings amounts to much the same thing.

I'm disappointed that's the only point from my post worthy of reaction :(

--

Apr 28 2008

Walter Bright <newshound1 digitalmars.com> writes:

Me Here wrote:
 Walter Bright wrote:
 
 Me Here wrote:
 Just as the answer to the occasional hit-and-run death is not 
 banning  cars, so fixing unintentional aliasing in threaded 
 applications does not  lie in forcing all character arrays to be 
 immutable.

 D does not force all character arrays to be immutable. You can use 
 mutable ones by declaring them as:

     char[]

 Reference types all come in 3 flavors: mutable, read-only-view-of 
 (i.e. const) and invariant.

 
 Well no, but having the string libraries only accept and return 
 invariant strings amounts to much the same thing.

I agreed with Janice's proposal to create a parallel set of routines that 
worked on mutable strings.


 I'm disappointed that's the only point from my post worthy of reaction :(

It appeared to me to be based on the assumption that D forced all 
character arrays to be invariant.

Apr 28 2008

"Me Here" <p9e883002 sneakemail.com> writes:

Walter Bright wrote:

 
 I'm disappointed that's the only point from my post worthy of reaction :(

 
 It appeared to me to be based on the assumption that D forced all character
arrays to be invariant.

Well no. It also went on to counter the idea that we're all going to come
around to your way of thinking on this in short order.
And to attempt to dispel the idea that the provision of immutable strings,
without doing the same for all the other datatypes, is going to fix anything
major.

The exact same problems you describe for character arrays exist for int
arrays and uint arrays and....hashes of every flavour. 
Fixing one, if fixing them is what this does, without also fixing all the
others, just moves the goal posts (a little). 

If a piece of code needs to know that the subject of a reference (string, int
array, hash, whatever), isn't going to change, 
it is (and should be) *its responsibility* to ensure that--by taking a private
copy.

Burdening all code with the costs of immutability just in case someone is
vulnerable to its mutation, *and* is too lazy to take a copy, 
seems like making everyone wear condoms in case someone might have sex. And
doing it for just one type of array when they all suffer
from the same problem doesn't seem likely to address the problems of unwanted
pregnancies.

 I agreed with Janice's proposal to create a parallel set of routines that
worked on mutable strings.

Sure. Sometime soon we will have a mutable string capable library again, and
then we'll see how beneficial immutable strings really are 
on the basis of how many people make use of them.

But that doesn't address the issue of the salience of the reasoning for having
them in the first place. Or the costs of using them in terms of 
heap fragmentation, additional GC runs, destruction of cache coherency, etc.
etc. etc.
--

Apr 28 2008

Walter Bright <newshound1 digitalmars.com> writes:

Me Here wrote:
 If a piece of code needs to know that the subject of a reference
 (string, int array, hash, whatever), isn't going to change, it is
 (and should be) *its responsibility* to ensure that--by taking a
 private copy.

There are two ways of doing it. One is COW, where those who make the 
change make the copy. The other way doesn't have a name, but it's making 
a copy "just in case" someone else might mutate it. I think you're 
proposing the latter. Invariant strings is a way of enforcing COW, 
rather than relying on documentation.

There's no doubt you can make JIC work successfully. I've used it myself 
for decades. But I always find myself expending effort trying to 
optimize away those copies, and so find it more productive to go the 
other way and use COW.

While I am comfortable using COW with mutable strings, the many many 
discussions of it in this forum made it clear that most would like to 
have some compiler help with it. Invariant strings fit the bill nicely.
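
The two strategies can be sketched side by side in present-day D (where invariant is spelled immutable). The names upperCow and keepJustInCase are hypothetical, and the ASCII-only case conversion is a simplification for illustration, not how std.string does it.

```d
import std.exception : assumeUnique;

/// COW: whoever makes a change makes the copy, and only if a change is needed.
string upperCow(string s)
{
    foreach (i, c; s)
    {
        if (c >= 'a' && c <= 'z')
        {
            char[] copy = s.dup;             // first lower-case letter: copy once
            foreach (ref d; copy[i .. $])
                if (d >= 'a' && d <= 'z')
                    d = cast(char)(d - ('a' - 'A'));
            return assumeUnique(copy);       // no other reference to copy exists
        }
    }
    return s;                                // nothing to change: no copy at all
}

/// JIC: the receiver copies defensively because the caller may still mutate s.
char[] keepJustInCase(char[] s)
{
    return s.dup;
}

void main()
{
    string a = "ALREADY UPPER";
    assert(upperCow(a).ptr is a.ptr);        // original handed back untouched

    string b = "mixed Case";
    string c = upperCow(b);
    assert(c == "MIXED CASE" && b == "mixed Case");

    char[] m = "abc".dup;
    char[] kept = keepJustInCase(m);
    m[0] = 'X';                              // caller mutates afterwards
    assert(kept == "abc");                   // the defensive copy is unaffected
}
```

With an immutable parameter the compiler guarantees that returning s unchanged is safe; with a mutable char[] the same shortcut would rely on documentation alone.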

Apr 28 2008

"Me Here" <p9e883002 sneakemail.com> writes:

Walter Bright wrote:

There are two ways of doing it. One is COW, where those who make the 
change make the copy. The other way doesn't have a name, but it's making a 
copy "just in case" someone else might mutate it. I think you're proposing 
the latter. Invariant strings is a way of enforcing COW, rather than 
relying on documentation.

There's no doubt you can make JIC work successfully. I've used it myself 
for decades. But I always find myself expending effort trying to optimize 
away those copies, and so find it more productive to go the other way and 
use COW.

While I am comfortable using COW with mutable strings, the many many 
discussions of it in this forum made it clear that most would like to have 
some compiler help with it. Invariant strings fit the bill nicely.

Okay Walter,

This will be my last word on the subject. When I posted the headpost of 
this thread, I had no idea what I was getting into.
I've since taken the time to catch up on some of the history, along with 
that of the Phobos/Tango debate. See below.

As I see it, both mechanisms are "just in case". The difference is that 
with invariants and COW, everyone who /doesn't/ need immutability has
to copy so that the one person who does need it, if they indeed exist at 
all which we have no way of knowing, doesn't have to copy.

The other way, the one person who knows they need immutability has to 
copy, and everyone else simply ignores the issue.

If you're given a reference and you need it not to change, take a copy and 
hide it away. Then it cannot.
If you're given a reference and you don't care if it changes (or you want 
to be apprised of any changes), use it, keep it or throw it away.

Expecting everyone else to take extra precautions, always, "just in case", 
so that you don't have to take precautions even when you know
you need to, seems the height of selfishness.

STM (from elsewhere)

This whole thing of invariance and concurrency seems to be aimed at
enabling the use of COW.


Wouldn't that be more of a copy-swap thing? And isn't STM copy-swap at
its core?

I'm not sure that I follow the question in context, or the meaning of 
"copy-swap".

STM is an alternative to locking for concurrency control. Essentially, 
each reader of known (marked) shared state gets a copy of the state, and 
an internal copy is made.
If that reader later attempts to write back to the shared state, its 
current value is compared against the internal copy taken when read.
If they are disparate, the code that is attempting to write gets rolled 
back to the read point and is given the updated value (and another 
internal copy is taken).
Lather, rinse, repeat until the copy and current values are the same, then 
commit the change and continue.

Fairly expensive, and only works for code that can be rolled back (ie. 
referentially transparent code).
Useless for anything that interacts with the outside world. Eg. writes to 
the screen, or a file, or the file system,
or reads from a non-rewindable source like a port or socket or the terminal.
Efficient if you live in a referentially transparent world--all data 
exists at compile time; no interaction with the outside world.
Next to useless otherwise. You still need locking or some other mechanism 
to deal with external state.

If that describes copy-swap then yes. Else no :)

Phobos vs. Tango

I definitely don't want the dead weight of pointless OO wrappers or deeply 
nested hierarchies. Nor the "everything must be OO" philosophy.

Once I regain access to std.string for my char[]s, (and a simple, 
expectation conformant rand() function :), I'll be happy.

Till then, I'll get outta yer hair and go back to trying to process my 
140GB of data using D1.
(
Which is a shame because I really like some of the language changes for 
D2. The extension to foreach for processing files looks cool.
I'd also vote for the convergence of for/foreach if that was possible 
without moving away from a context-free grammar.
I haven't had occasion to explore the laziness facilities yet, but they 
sound cool.
Ditto the templating.
)

Despite our difference on the issue above, please add my goodwill and 
plaudits to your trophy box for your vision and provision of D.

Cheers, b.

--

Apr 29 2008

Walter Bright <newshound1 digitalmars.com> writes:

Me Here wrote:
 If that describes copy-swap then yes. Else no :)

copy-swap is what lock free algorithms rely on for updating a data 
structure. It's at the root of STM, and even has its own TLA, CAS (Copy 
And Swap).

Apr 29 2008

Sean Kelly <sean invisibleduck.org> writes:

== Quote from Walter Bright (newshound1 digitalmars.com)'s article
 Me Here wrote:
 If that describes copy-swap then yes. Else no :)

 copy-swap is what lock free algorithms rely on for updating a data
 structure. It's at the root of STM, and even has its own TLA, CAS (Copy
 And Swap).

I believe CAS actually stands for "compare and swap" or "compare and set"
depending on who you talk to.  RCU is probably a more popular algorithm
for copy and swap--it's used in the Linux kernel quite a bit.  It stands for
"read, copy, update," I believe.


Sean

Apr 29 2008

"Me Here" <p9e883002 sneakemail.com> writes:

Sean Kelly wrote:

 == Quote from Walter Bright (newshound1 digitalmars.com)'s article
 Me Here wrote:
 If that describes copy-swap then yes. Else no :)

 copy-swap is what lock free algorithms rely on for updating a data
 structure. It's at the root of STM, and even has its own TLA, CAS (Copy
 And Swap).

 
 I believe CAS actually stands for "compare and swap" or "compare and set"
 depending on who you talk to.  RCU is probably a more popular algorithm
 for copy and swap--it's used in the Linux kernel quite a bit.  It stands for
 "read, copy, update," I believe.
 
 
 Sean

From the literature I found, CAS is (was originally) the name of the opcode
used on a Sun microprocessor to conditionally and atomically swap the contents 
of two words of memory (or maybe memory and register). 

It also mentions a CASX opcode, and a LL/SC (Load Linked / Store
Conditional) pairing that can be used as alternatives.
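
For reference, a compare-and-swap retry loop looks roughly like this in present-day D. This is a minimal sketch assuming druntime's core.atomic (atomicLoad and cas) and core.thread; the counter, the function names and the thread and iteration counts are made up for illustration.

```d
import core.atomic : atomicLoad, cas;
import core.thread : Thread;

shared int counter;

/// Lock-free increment: read the value, compute the new one, then publish it
/// only if nobody else changed the value in the meantime; otherwise retry.
void increment()
{
    int seen;
    do
    {
        seen = atomicLoad(counter);
    } while (!cas(&counter, seen, seen + 1));
}

void worker()
{
    foreach (i; 0 .. 10_000)
        increment();
}

void main()
{
    Thread[] threads;
    foreach (i; 0 .. 4)
    {
        auto t = new Thread(&worker);
        t.start();
        threads ~= t;
    }
    foreach (t; threads)
        t.join();

    // No increment is lost, even though no lock was taken.
    assert(atomicLoad(counter) == 40_000);
}
```

The loop has the same read, compare, retry shape as the STM description earlier in the thread, just applied to a single word of memory.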

Cheers, b.
--

Apr 29 2008

Sean Kelly <sean invisibleduck.org> writes:

== Quote from Me Here (p9e883002 sneakemail.com)'s article
 Sean Kelly wrote:
 == Quote from Walter Bright (newshound1 digitalmars.com)'s article
 Me Here wrote:
 If that describes copy-swap then yes. Else no :)

 copy-swap is what lock free algorithms rely on for updating a data
 structure. It's at the root of STM, and even has its own TLA, CAS (Copy
 And Swap).

 I believe CAS actually stands for "compare and swap" or "compare and set"
 depending on who you talk to.  RCU is probably a more popular algorithm
 for copy and swap--it's used in the Linux kernel quite a bit.  It stands for
 "read, copy, update," I believe.


 Sean

 From the literature I found, CAS is (was originally) the name of the opcode
 used on a Sun microprocessor to conditionally and atomically swap the contents
 of two words of memory (or maybe memory and register).
 It also mentions a CASX opcode, and a LL/SC (Load Linked / Store
 Conditional) pairing that can be used as alternatives.

Yeah, LL/SC is pretty cool.  The hardware transactional memory proposals I've
seen are like LL/SC on steroids.  Bit more flexible than CAS, but either works.
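
As a rough illustration of the compare-and-swap idea discussed above, here is
a minimal sketch of a lock-free counter increment in D. It assumes a
present-day compiler whose runtime ships core.atomic (which postdates this
thread); the names counter and increment are made up for the example.

```d
import core.atomic : atomicLoad, cas;

// Hypothetical shared counter updated without a lock.
shared int counter;

// Read the current value, compute the new one, and publish it only if
// nobody changed the value in between; otherwise retry.
void increment()
{
    int seen;
    do
    {
        seen = atomicLoad(counter);
    } while (!cas(&counter, seen, seen + 1));
}

void main()
{
    increment();
    increment();
    assert(atomicLoad(counter) == 2);
}
```

On x86 a loop like this is typically built on the cmpxchg family of
instructions; on LL/SC machines the same loop maps onto a load-linked /
store-conditional pair.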


Sean

Apr 29 2008

"Me Here" <p9e883002 sneakemail.com> writes:

Walter Bright wrote:

 Me Here wrote:
 If that describes copy-swap then yes. Else no :)

 
 copy-swap is what lock free algorithms rely on for updating a data structure.
 It's at the root of STM, and even has its own TLA, CAS (Copy And Swap).

Ah! As in compare & exchange (cmpxchg & cmpxchg8b) x86 opcodes. I wasn't
thinking at the m/code level. 

Cheers, b.

--

Apr 29 2008

Sean Kelly <sean invisibleduck.org> writes:

== Quote from Me Here (p9e883002 sneakemail.com)'s article
 Phobos vs. Tango
 I definitely don't want the dead weight of pointless OO wrappers or deeply
 nested hierarchies. Nor the "everything must be OO" philosophy.
 Once I regain access to std.string for my char[]s, (and a simple,
 expectation conformant rand() function :), I'll be happy.

Please don't discount Tango based on what has been said about it in this
forum.  I know for a fact that Walter, for example, has never even looked
at Tango (or he hadn't as of a few weeks ago anyway).  In truth, the percentage
of classes to functions in Tango is roughly the same as in Phobos... Tango is
just a much larger library.  If you're interested in algorithms and string
operations, I suggest looking at tango.core.Array and tango.text.*.  The
former is basically C++'s <algorithm> retooled for D arrays, and the latter
holds all the string-specific routines in Tango.


Sean

Apr 29 2008

"Me Here" <p9e883002 sneakemail.com> writes:

Sean Kelly wrote:

 == Quote from Me Here (p9e883002 sneakemail.com)'s article
 Phobos vs. Tango
 I definitely don't want the dead weight of pointless OO wrappers or deeply
 nested hierarchies. Nor the "everything must be OO" philosophy.
 Once I regain access to std.string for my char[]s, (and a simple,
 expectation conformant rand() function :), I'll be happy.

 
 Please don't discount Tango based on what has been said about it in this
 forum.  I know for a fact that Walter, for example, has never even looked
 at Tango (or he hadn't as of a few weeks ago anyway).  In truth, the percentage
 of classes to functions in Tango is roughly the same as in Phobos... Tango is
 just a much larger library.  If you're interested in algorithms and string
 operations, I suggest looking at tango.core.Array and tango.text.*.  The
 former is basically C++'s <algorithm> retooled for D arrays, and the latter
 holds all the string-specific routines in Tango.
 
 
 Sean

The primary basis of my immediate decision regarding Tango was its
incompatibility with Phobos as outlined in

    http://www.d-riven.com/index.cgi?tango-phobos

(And several other first page hits when googling for "D Tango Phobos")

Beyond that, I'm in favour of OO only when it truly benefits me. That is,
when it manages state that I would otherwise *have* to manage myself.

That, for example, does not mean simply substituting an object handle for an
OS handle.
Nor caching of derived values unless their derivation is truly expensive.
Nor the use of getters and setters to avoid direct manipulation of attributes,
unless there is some genuine value-add from doing so.
OO-dogma that they will isolate the library from speculative future changes in
the underlying OS calls (that have been fixed in stone for 1 or 4 decades or
more) does not cut much ice with me.

I'm also not fond of all-in-one library packaging. Seems to me that there is
enough information in the source code to allow libraries to be packaged as
discrete dlls/sos and to only statically link against those required. But that
may be a tool chain problem rather than anything to do with Tango.

It should be possible to substitute one implementation of a std.* library for
another, without it being an all or nothing change. I should be able to
mix'n'match between implementations of std.* packages.

For example, with the std.string problem I've been having. If I use

    import std.string;

    char[] a = readln();
    a.toupper();

it should work. If I do

    import std.string.immutable;

it wouldn't.

One of the things that forced me to go away from D a couple of years ago was
the ever changing state of the libraries. Not the internal implementations or
occasional bugs, but the constantly changing interface definitions.

It becomes impractical to develop a major project when you're constantly
rewriting major chunks of code to accommodate the latest set of group think on
the best way to package the OS and "C lib" functionality.

Back then, I put it down to the necessary gestation of a new language, and
moved away to get my project done. I've now come back and find that the same
situation exists. The answer to an essentially trivial problem is to write an
entire new library. Or rather, since the library I need was already a part of
Phobos with D 0.-something, resurrect an old library.

And that's the most worrying thing of all. The removal of the existing library
from Phobos because the main proponents suddenly drank the Kool-Aid of
Invariant strings--especially for reasoning that I still find entirely
specious--does not bode well for ongoing stability.

I had hoped that during my two years away, at least the interfaces would have
become standardised, even if the implementations varied from release to
release. But if whole chunks of functionality can suddenly disappear from the
library, at the same time as major new chunks of very desirable functionality
are added to the language, on the whim of 1 or 4 major players getting
religion, then I'm really not sure that D is, or will ever be, ready for
anything other than academic exploration of compiler technology.

Reading that back, the independence of Tango begins to be more attractive,
even if I have a distaste for the "everything must be OO" philosophy that
(apparently) underlies it. Maybe I'll pull a copy and look for myself.

For my current needs, I'm just looking for C speed without having to manage my
own memory.

For the project I went away from D for 2 years ago, and came back hoping for
stability, my own personal research project cum memorial folly-to-be, I don't
think D is yet ready for that. Maybe D1 if it doesn't become completely
unsupported.

In the interim I've "done the rounds" of an amazing variety of languages. From
the functional brigade, Haskell, OCaml, Mozart/Oz, Erlang et al. and various of
the newer dynamic languages. They all have their attractions, but most are
spoilt by some level of dogma. Haskell with its purity. Python with the whole
significant whitespace thing. P6 with unix-first, and non-delivery.

Mostly, the dynamics lack the low-level control and performance I need. I've
been seriously working with structured assembler to achieve the low-level
control and performance I want, but doing everything yourself just takes you
off into far too many interesting side projects. Implementing your own memory
management could occupy a lifetime; especially if you consider the possibility
and advantages of using (wait for it) a segmented architecture. Most older
programmers' memories of segmented memory stem from the 16-bit Intel days and
they (almost) universally eschew any notion of it now that 32-bit (and 64-bit)
flat memory models are available. But there are some very interesting
possibilities in combining 32-bit segments and virtual memory.

D is my last best hope of avoiding the assembler route and trying to do it all
myself. Walter's pragmatism stood out in my early experience of both the
language and library design--albeit that they kept changing ;)--but I was
really expecting (hoping for) greater stability by this point.

Ooh. Did I write all that? Still. It has persuaded me to at least go look at
Phobos, even if it is done with a jaundiced eye. A stable, even if
philosophically distasteful, implementation of the staples is better than a
philosophically desirable but whimsically changing one.

Cheers for prompting me to re-think my blanket dismissal. b.

--

Apr 29 2008

"Me Here" <p9e883002 sneakemail.com> writes:

A stable, even if philosophically distasteful, implementation of the 
staples is better than a philosophically desirable but
whimsically changing one.

.
I of course meant Tango.


--

Apr 29 2008

Walter Bright <newshound1 digitalmars.com> writes:

Me Here wrote:
 That, for example, does not mean simply substituting an object handle
 for an OS handle. Nor caching of derived values unless their
 derivation is truly expensive. Nor the use of getters and setters to
 avoid direct manipulation of attributes, unless there is some genuine
 value-add from doing so. OO-dogma that they will isolate the library
 from speculative future changes in the underlying OS calls (that have
 been fixed in stone for 1 or 4 decades or more) do not cut much ice
 with me.

I'm of the same opinion with that.

 One of the things that force me to go away from D a couple of years
 ago was the ever changing state of the libraries. Not the internal,
 implementations or occasional bugs, but the constantly changing
 interface definitions.

That's why D 1.0 was split off. It was done to provide a stable platform 
that only gets bug fixes.

Apr 29 2008

"Me Here" <p9e883002 sneakemail.com> writes:

Walter Bright wrote:

 Me Here wrote:
 That, for example, does not mean simply substituting an object handle
 for an OS handle. Nor caching of derived values unless their
 derivation is truly expensive. Nor the use of getters and setters to
 avoid direct manipulation of attributes, unless there is some genuine
 value-add from doing so. OO-dogma that they will isolate the library
 from speculative future changes in the underlying OS calls (that have
 been fixed in stone for 1 or 4 decades or more) do not cut much ice
 with me.

 I'm of the same opinion with that.

 One of the things that force me to go away from D a couple of years
 ago was the ever changing state of the libraries. Not the internal,
 implementations or occasional bugs, but the constantly changing
 interface definitions.

 That's why D 1.0 was split off. It was done to provide a stable platform
 that only gets bug fixes.

Understood, but when I went to upgrade from my very old 1.x version and
discovered there was a D2, I did look for an explanation (on the web site
rather than in the forums) and came up short.

I guess the clue was in the alpha status, but a few lines somewhere on the
download page explaining the difference wouldn't go amiss.

As I also mentioned, the descriptions I found (whilst looking for the above)
of the new D2 language features drew me to it. Without thinking through the
implications, it strikes me that a segregation of the compilers from the
runtimes would allow the mating of the D2 compiler with the D1 libraries?

The D1 libraries themselves would not use or benefit from the new D2 language
features but it would allow applications access to those features whilst
retaining the stability of D1 libraries.

But that probably entails extra work, as well as a not inconsiderable amount
of careful consideration regarding the long term implications, so don't take
it as a request. Just a notion in passing.

Anyway, I did promise to get outta yer hair, so...I'm gone.

Thanks again, b.

--

Apr 29 2008

Sean Kelly <sean invisibleduck.org> writes:

== Quote from Me Here (p9e883002 sneakemail.com)'s article
 As I also mentioned, the descriptions I found (whilst looking for the above)
 of the new D2 language features drew me to it. Without thinking through the
 implications, it strikes me that a segregation of the compilers from the
 runtimes would allow the mating of the D2 compiler with the D1 libraries?

See Tango, once again ;-)  In fact, as things stand, the same runtime could
be used for both 1.0 and 2.0 with a bit of work.  I haven't done this with
Tango mostly because I lack the time, but it's quite possible.  Alternately,
separate runtimes could be distributed and linked individually without
pulling in the bulk of an entire standard library.  The "Advanced
Configuration Guide" I linked previously for Tango shows how to do it.
This is simply more flexibility than the typical user cares about or wants
to deal with, so the sub-libraries are repackaged into a larger aggregate
library for the default distribution.  But if you build Tango locally you'll
find that the sub-libraries are still built behind the scenes.  From memory,
the names are:

libtango-rt-dmd.a : Tango runtime for DMD
libtango-rt-gdc.a : Tango runtime for GDC
libtango-gc-basic.a : Tango basic/default garbage collector
libtango-cc-tango.a : "common code" for the Tango standard library

The runtime contains only the compiler runtime code, the GC library
only the garbage collector, and the "common code" library contains
user-facing code which actually needs to be linked into every D
application--that being thread code and some error handling routines.
If you want to build on a system with no multithreading, for example,
simply stub out the 3 or so calls that this module exposes.

 The D1 libraries themselves would not use or benefit from the new D2
 language features but it would allow
 applications access to those features whilst retaining the stability of D1
 libraries.

Right.  The greatest obstacle here is the const design, since the meaning
of "const" is actually different between D 1.0 and 2.0, as well as the
requirement that even code in version blocks must be syntactically
correct.  Thus to support a toString method that returns a const string,
for example, the return value must be declared as an alias using a string
mixin.  Messy stuff, but it does work.
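
The trick described above can be sketched roughly as follows. This is an
illustration of the technique rather than Tango's actual source; cstring,
Label and view are names assumed for the example. It relies on the predefined
D_Version2 version identifier and is written to compile with a D2 compiler.

```d
// D1 cannot parse const(char)[], and the body of a version block must still
// parse even when it is not compiled in. Hiding the D2-only declaration in a
// string means a D1 compiler only ever sees a string literal there.
version (D_Version2)
{
    mixin("alias const(char)[] cstring;");
}
else
{
    alias char[] cstring;
}

// Hypothetical type whose accessor returns the version-dependent alias.
struct Label
{
    char[] text;

    cstring view()
    {
        return text; // under D2, char[] converts implicitly to const(char)[]
    }
}

void main()
{
    auto label = Label("tango".dup);
    assert(label.view() == "tango");
}
```

The rest of the library then spells the return type as cstring everywhere, so
only this one declaration differs between the two language versions.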


Sean

Apr 29 2008

Bill Baxter <dnewsgroup billbaxter.com> writes:

Sean Kelly wrote:

 Right.  The greatest obstacle here is the const design, since the meaning
 of "const" is actually different between D 1.0 and 2.0, as well as the
 requirement that even code in version blocks must be syntactically
 correct.  Thus to support a toString method that returns a const string,
 for example, the return value must be declared as an alias using a string
 mixin.  Messy stuff, but it does work.
 

I personally think that for a big library like Tango, using a 
preprocessor would be a less painful way to go.  The shipped versions of 
the lib would have the preprocessor already pre-run, so would be pure D 
code.

--bb

Apr 29 2008

Sean Kelly <sean invisibleduck.org> writes:

== Quote from Me Here (p9e883002 sneakemail.com)'s article
 Sean Kelly wrote:
 == Quote from Me Here (p9e883002 sneakemail.com)'s article
 Phobos vs. Tango
 I definitely don't want the dead weight of pointless OO wrappers or deeply
 nested hierarchies. Nor the "everything must be OO" philosophy.
 Once I regain access to std.string for my char[]s, (and a simple,
 expectation conformant rand() function :), I'll be happy.

 Please don't discount Tango based on what has been said about it in this
 forum.  I know for a fact that Walter, for example, has never even looked
 at Tango (or he hadn't as of a few weeks ago anyway).  In truth, the percentage
 of classes to functions in Tango is roughly the same as in Phobos... Tango is
 just a much larger library.  If you're interested in algorithms and string
 operations, I suggest looking at tango.core.Array and tango.text.*.  The
 former is basically C++'s <algorithm> retooled for D arrays, and the latter
 holds all the string-specific routines in Tango.

 The primary basis of my immediate decision regarding Tango was its
 incompatibility with Phobos as outlined in
     http://www.d-riven.com/index.cgi?tango-phobos
 (And several other first page hits when googling for "D Tango Phobos")

For what it's worth, the "Tangobos" project is a port of Phobos to the Tango
runtime, and a pre-packaged version is available on the Tango website.  If you
compare the source code with Phobos itself, you'll find that there are
precious few diffs anywhere in the entire package, and the few that exist are
mostly in std.thread.  So this may be an option if you'd like to use both
together.

 Beyond that, I'm in favour of OO only when it truly benefits me. That is,
 when it manages state that I would otherwise *have* to manage myself.
 That, for example, does not mean simply substituting an object handle for an
 OS handle.
 Nor caching of derived values unless their derivation is truly expensive.
 Nor the use of getters and setters to avoid direct manipulation of attributes,
 unless there is some genuine value-add from doing so.

That's the basic philosophy behind Tango.  In fact, the bulk of the objects in
Tango are in the IO package, with much of the remainder being in places where
polymorphism is desirable (localization, for example).  If you find an object
in Tango that has no actual state information, it's generally packaged as a
class for this reason.

 OO-dogma that they will isolate the library from speculative future changes in
 the underlying OS calls (that have been fixed in stone for 1 or 4 decades or
 more) does not cut much ice with me.

Me either.  However, since Tango is portable across Win32 and Posix systems
(currently), I do think an argument could be made for some level of
abstraction.  But the C API headers are available as well if you really want
to use them.

 I'm also not fond of all-in-one library packaging. Seems to me that there is
 enough information in the source code to allow libraries to be packaged as
 discrete dlls/sos and to only statically link against those required. But that
 may be a tool chain problem rather than anything to do with Tango.

This was actually driven by fairly vocal user request.  The original
conception was for Tango to be a lightweight, modular framework to be extended
by users rather than a monolithic library.  In fact, we didn't even distribute
an all-in-one prebuilt library for Tango until sometime last summer.  Before
that we expected that a tool such as Bud or Rebuild would be used.  This is
still quite possible however, and the modular design in terms of code
dependency is still in place.  If you choose to toss the tango-user lib
altogether and find you want even more modularity, I suggest looking at this
page:

http://dsource.org/projects/tango/wiki/TopicAdvancedConfiguration

As far as I know, I'm the only one that has actually read it so it's a bit out
of date (the library names are wrong), but the basic concept still applies.
That is, the choice of a GC can be made at link-time with Tango, and other
portions of the runtime are easily replaceable as well.  Some kernel projects
have found this useful in the past.

 It should be possible to substitute one implementation of a std.* library for
 another, without it being an all or nothing change. I should be able to
 mix'n'match between implementations of std.* packages.
 For example, with the std.string problem I've been having. If I use
     import std.string;
     char[] a = readln();
     a.toupper();
 it should work. If I do
     import std.string.immutable;
 it wouldn't.

Agreed.  My biggest complaint here is having to maintain two essentially
identical packages, assuming I were to do such a thing.  This is why I find
Steven's "scoped const" proposal so attractive.
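
The duplication problem mentioned above can be seen in miniature below. Later
D2 compilers gained an inout qualifier that lets one function body serve
mutable, const and immutable arguments; the sketch assumes such a compiler,
and firstWord is a hypothetical helper, not a Phobos or Tango routine.

```d
// One implementation for char[], const(char)[] and string alike: the
// qualifier of the result follows the qualifier of the argument.
inout(char)[] firstWord(inout(char)[] s)
{
    foreach (i, c; s)
    {
        if (c == ' ')
            return s[0 .. i]; // a slice of the input, no copy is made
    }
    return s;
}

void main()
{
    char[] m = "hello world".dup;
    string i = "hello world";

    char[] mw = firstWord(m); // mutable in, mutable out
    string iw = firstWord(i); // immutable in, immutable out
    assert(mw == "hello" && iw == "hello");

    mw[0] = 'H';              // the slice aliases m
    assert(m == "Hello world");
}
```

Without something along these lines, the library author ends up writing the
same routine once per qualifier, which is the maintenance burden at issue.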

 One of the things that forced me to go away from D a couple of years ago was
 the ever changing state of the libraries. Not the internal implementations or
 occasional bugs, but the constantly changing interface definitions.
 It becomes impractical to develop a major project when you're constantly
 rewriting major chunks of code to accommodate the latest set of group think on
 the best way to package the OS and "C lib" functionality.
 Back then, I put it down to the necessary gestation of a new language, and
 moved away to get my project done. I've now come back and find that the same
 situation exists. The answer to an essentially trivial problem is to write an
 entire new library. Or rather, since the library I need was already a part of
 Phobos with D 0.-something, resurrect an old library.
 And that's the most worrying thing of all. The removal of the existing library
 from Phobos because the main proponents suddenly drank the Kool-Aid of
 Invariant strings--especially for reasoning that I still find entirely
 specious--does not bode well for ongoing stability.

The lack of responsiveness of the Phobos maintainers (ie. Walter) is what
drove us to create Tango in the first place.  I'll freely admit that the
design of Tango has changed here and there as we've moved through beta, but
it's largely solidified now and will be frozen once we hit 1.0.  Neither I nor
the other Tango developers have any desire to maintain deprecated code and the
like, so we've been doing our utmost to find a design that we hope will last.
Also, that there is at least one commercial project based on Tango (I think
there are actually more, but I don't keep track of this very closely) says a
lot about the library's stability and its support.

 I had hoped that during my two years away, at least the interfaces would have
 become standardised, even if the implementations varied from release to
 release. But if whole chunks of functionality can suddenly disappear from the
 library, at the same time as major new chunks of very desirable functionality
 are added to the language, on the whim of 1 or 4 major players getting
 religion, then I'm really not sure that D is, or will ever be, ready for
 anything other than academic exploration of compiler technology.

To be honest, I actually feel much the same.  However, I also feel that D 1.0
is a fantastically designed language overall, and I would choose it in a
second over C or C++ (I'm a systems programmer so those are really the only
other choices available).  So at the end of the day, I will be disappointed if
the "future of D" takes a hard left-hand turn towards somewhere I have no
interest in going, but since I'm really quite happy with D 1.0 I won't shed
too many tears over it.  This perhaps doesn't bode terribly well for my use of
D in the long-term, but that's a bridge I'll jump off if and when I come to it.

 Reading that back, the independence of Tango begins to be more attractive,
 even if I have a distaste for the "everything must be OO" philosophy that
 (apparently) underlies it. Maybe I'll pull a copy and look for myself.
 For my current needs, I'm just looking for C speed without having to manage my
 own memory.
 For the project I went away from D for 2 years ago, and came back hoping for
 stability, my own personal research project cum memorial folly-to-be, I don't
 think D is yet ready for that. Maybe D1 if it doesn't become completely
 unsupported.

If it's speed you're looking for, Tango is it ;-)  The IO subsystem trounces
pretty much everything I've seen it compared to, for example.  In practice, I
think you'll find that one reason for this is that no hidden allocations take
place anywhere in Tango.  This tends to conflict with convenient use in some
cases for simple apps however, so as things stand now I do think that some
users would benefit from convenience wrappers.  I tend to do this sort of
thing myself for my own projects, but a third-party package would be nice for
those less inclined.  If you're interested in direct performance comparisons
however, I suggest reading the "benchmarks" links on this page:

http://dsource.org/projects/tango/wiki/Documentation

The XML tests in particular are pretty astounding (I feel comfortable saying
that because I had nothing to do with the development of that particular
package :).

 In the interim I've "done the rounds" of an amazing variety of languages. From
 the functional brigade, Haskell, OCaml, Mozart/Oz, Erlang et al. and various of
 the newer dynamic languages. They all have their attractions, but most are
 spoilt by some level of dogma. Haskell with its purity. Python with the whole
 significant whitespace thing. P6 with unix-first, and non-delivery.
 Mostly, the dynamics lack the low-level control and performance I need. I've
 been seriously working with structured assembler to achieve the low-level
 control and performance I want, but doing everything yourself just takes you
 off into far too many interesting side projects. Implementing your own memory
 management could occupy a lifetime; especially if you consider the possibility
 and advantages of using (wait for it) a segmented architecture. Most older
 programmers' memories of segmented memory stem from the 16-bit Intel days and
 they (almost) universally eschew any notion of it now that 32-bit (and 64-bit)
 flat memory models are available. But there are some very interesting
 possibilities in combining 32-bit segments and virtual memory.
 D is my last best hope of avoiding the assembler route and trying to do it all
 myself. Walter's pragmatism stood out in my early experience of both the
 language and library design--albeit that they kept changing ;)--but I was
 really expecting (hoping for) greater stability by this point.

Personally, the combination I find most attractive for my work right now is
Erlang backed by C or D for the performance-critical work.  That gives me the
easy parallelization I want, IPC, etc, plus an easy way of optimizing the heck
out of trouble spots... or simply sidestepping the strict functional model
when data sharing is actually needed.  I've actually come to feel that having
a language separation here is a good thing as well, because it prevents "bleed
through" of concepts which I feel risks poisoning the efficacy of each
approach.

As for the rest, Kris, one of the other Tango developers, has done a lot of
work in the embedded space and pushed very hard in the past for D to better
support this style of programming.  He wasn't terribly successful insofar as
language/compiler development was concerned (there has been a lot of talk in
the past about TypeInfo in particular), but Tango, at least, was designed with
embedded development in mind.  The lack of hidden DMA, for example.  I can't
say whether Tango will suit your needs, but it does seem to at least match
your general goals with D.

 Ooh. Did I write all that? Still. It has persuaded me to at least go look at
 Phobos, even if it is done with a jaundiced eye. A stable, even if
 philosophically distasteful, implementation of the staples is better than a
 philosophically desirable but whimsically changing one.
 Cheers for prompting me to re-think my blanket dismissal. b.

Thank you for reconsidering :-)  D may be a young language, but there has
really been quite a bit of drama surrounding it in newsgroup discussion.  I
think it can sometimes be difficult to look past all this and take the time to
decide for oneself.  If nothing else, doing so takes time, and even I tend to
employ the "30 second rule" when it comes to new technology.


Sean

Apr 29 2008

Bill Baxter <dnewsgroup billbaxter.com> writes:

Sean Kelly wrote:
 If it's speed you're looking for, Tango is it ;-)  The IO subsystem trounces
 pretty much everything I've seen it compared to, for example.  In practice, I
 think you'll find that one reason for this is that no hidden allocations take
 place anywhere in Tango.  This tends to conflict with convenient use in some
 cases for simple apps however, so as things stand now I do think that some
 users would benefit from convenience wrappers.

Yes please! votes++
I don't need blazing speed for my debug printfs. I need the most 
convenient API possible. Stdout.print("hi").newline is not quite that.

 I tend to do this sort of thing myself for my own projects,
 but a third-party package would be nice for those less inclined.  

I don't know why, but I just really dislike seeing little "mytools" 
dependencies dangling off of what would otherwise be nice little 
self-contained modules.  Maybe I just find it makes it harder to reuse 
code.  This module depends on "mytools" but over there we're using 
"yourtools".  Do we merge them to become "ourtools", or keep both, or 
port my code to use yourtools instead?  It's just easier to mix and 
match if a module doesn't have such frivolous external dependencies.

--bb

Apr 29 2008

terranium <spam here.lot> writes:

Walter Bright Wrote:

 Lars Ivar Igesund wrote:
 After working with Java for quite some time, I have naturally drifted from
 using invariant strings to stringbuffers.

 
 Java strings lack slicing, so they're crippled anyway. I believe that 
 slicing is one of those paradigm-shifting features, so I am not making 
 an irrelevant point.

Building an SQL query with multiple concats is a well-known pitfall for novice
programmers.
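
To make the pitfall concrete: each binary ~ concatenation of strings allocates
a new array and copies what has been built so far, so a query assembled piece
by piece copies the same text over and over. A minimal sketch of the usual
alternative, assuming a current Phobos with std.array.appender (buildQuery and
its parameters are hypothetical names for the example):

```d
import std.array : appender;

// Assemble the query in one growing buffer instead of creating a fresh
// string for every fragment.
string buildQuery(string[] columns, string table)
{
    auto buf = appender!string();
    buf.put("SELECT ");
    foreach (i, col; columns)
    {
        if (i > 0)
            buf.put(", ");
        buf.put(col);
    }
    buf.put(" FROM ");
    buf.put(table);
    return buf.data;
}

void main()
{
    assert(buildQuery(["id", "name"], "users") == "SELECT id, name FROM users");
}
```

This plays the role that a StringBuffer plays in Java. It only addresses the
copying cost; values supplied by users would still belong in bound parameters
rather than in the query text.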

Apr 29 2008
