Hi all,
'scuse me for not being familiar with previous or ongoing discussion on this
subject, but I'm just coming back to D after a couple of years away.
I have some strings read in from an external source that I need to convert to
uppercase. A quick look at Phobos and I find std.string has a toupper method.
import std.stdio;
import std.string;
int main( char[][] args ) {
char[] a = args[ 0 ].toupper();
writefln( a );
return 0;
}
c:\dmd\test>dmd junk.d
junk.d(5): function std.string.toupper (invariant(char)[]) does not match
parameter types (char[])
junk.d(5): Error: cannot implicitly convert expression (args[0u]) of type
char[]
to invariant(char)[]
junk.d(5): Error: cannot implicitly convert expression (toupper(cast(invariant
(char)[])(args[0u]))) of type invariant(char)[] to char[]
Hm. Okey dokey.
import std.stdio;
import std.string;
int main( char[][] args ) {
char[] a = ( cast(invariant(char)[]) args[ 0 ] ).toupper();
writefln( a );
return 0;
}
junk.d(5): Error: cannot implicitly convert expression (toupper(cast(invariant
(char)[])(args[0u]))) of type invariant(char)[] to char[]
Shoulda known :(
import std.stdio;
import std.string;
int main( char[][] args ) {
string a = ( cast(invariant(char)[]) args[ 0 ] ).toupper();
writefln( a );
return 0;
}
c:\dmd\test>dmd junk.d
c:\dmd\test>junk
C:\DMD\TEST\JUNK.EXE
Great! Now I need to replace the bit in the middle:
import std.stdio;
import std.string;
int main( char[][] args ) {
string a = ( cast(invariant(char)[]) args[ 0 ] ).toupper();
a[ 2 .. 4 ] = "XXX";
writefln( a );
return 0;
}
c:\dmd\test>dmd junk.d
junk.d(6): Error: slice a[cast(uint)2..cast(uint)4] is not mutable
Wha..? What's the point in having slices if I can't use them?
import std.stdio;
import std.string;
int main( char[][] args ) {
char[] a = cast(char[]) ( cast(invariant(char)[]) args[ 0 ] ).toupper();
a[ 2 .. 4 ] = "XXX";
writefln( a );
return 0;
}
Finally, it works. But can you see what's going on in line 5 amongst all that
casting? Cos I sure can't.
So, I read that all this invariant stuff is about efficiency. For whom?
Must be the compiler because it sure ain't about programmer efficiency.
Ah. Maybe I'm meant to ignore the beauty of slices and use strings and method
calls for everything?
import std.stdio;
import std.string;
int main( string[] args ) {
string a = args[ 0 ].toupper();
a.replace( a[ 2 .. 4 ], "XXX" );
writefln( a );
return 0;
}
Compiles clean and runs:
c:\dmd\test>dmd junk.d
c:\dmd\test>junk
C:\DMD\TEST\JUNK.EXE
But does nothing!
import std.stdio;
import std.string;
int main( string[] args ) {
string a = args[ 0 ].toupper();
a = a.replace( a[ 2 .. 4 ], "XXX" );
writefln( a );
return 0;
}
c:\dmd\test>dmd junk.d
c:\dmd\test>junk
C:XXXMD\TEST\JUNK.EXE
Finally, it runs. But at what cost? The 'immutable' a has ended up being
mutated.
I still had to specify the slice, but I had to make another method call to
actually do the deed.
Of course, a wasn't really mutated. Instead, args[0] was copied and then
mutated and labelled a. Then a was copied and mutated, and the mutated copy
was assigned back to a.
So, that's two copies of the string, plus a slice, plus an extra method call to
achieve what used to be achievable in place on the original string. Which is
now immutable, but I'll never need it again.
Of course, on these short one-off strings it doesn't matter a hoot. But when the
strings are 200 to 500 characters a pop and there are 20,000,000 of them, it
matters.
Did I suggest this was an optimisation?
Whatever immutability-purity Kool-Aid you've been drinking, please go back to
coke. And give us usable libraries and sensible implicit conversions. Cos this
sucks bigtime.
b.
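A minimal sketch of the single-copy version of the task above, assuming a current D compiler (where invariant has since been renamed immutable) and ASCII-only input. The helper name upperAndPatch is hypothetical and is not part of Phobos; std.ascii.toUpper is the per-character function. The point is only that one .dup yields a mutable buffer that can then be upper-cased and slice-assigned in place, with no casts.

```d
import std.ascii : toUpper;

/// Hypothetical helper: copy the input once, then upper-case and patch
/// the copy in place. ASCII-only; the patch must match the slice length.
char[] upperAndPatch(const(char)[] src, size_t from, size_t to,
                     const(char)[] patch)
{
    assert(to - from == patch.length, "slice and patch lengths differ");
    char[] buf = src.dup;          // the only allocation and copy
    foreach (ref c; buf)
        c = toUpper(c);            // in-place, per-character upper-casing
    buf[from .. to] = patch[];     // element-wise copy into the slice
    return buf;
}

void main()
{
    import std.stdio : writeln;
    writeln(upperAndPatch(`c:\dmd\test\junk.exe`, 2, 5, "XXX"));
}
```

Taking const(char)[] as the parameter type lets the same helper accept both char[] and string arguments.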
Is this what you wanted to write?
int main(string[] args)
{
char[] a = cast(char[])args[0];
a[2..5] = "XXX";
writefln(a);
return 0;
}
This compiles and runs, and seems to do what you describe. Sure, there's a
cast there, but it's not all that bad, is it?
Sorry, forgot the .toupper() call there. Should be
char[] a = cast(char[])args[0].toupper();
-- Simen
Okay, you got around the first cast by using
int main( string[] ) {
So now you want to lowercase it again:
import std.stdio;
import std.string;
int main( string[] args) {
char[] a = cast(char[])args[0].toupper();
a[2..5] = "XXX";
a = a.tolower;
writefln(a);
return 0;
}
c:\dmd\test>dmd junk.d
junk.d(7): Error: no property 'tolower' for type 'char[]'
junk.d(7): Error: cannot implicitly convert expression (1) of type int to char[]
junk.d(7): Error: cannot cast int to char[]
junk.d(7): Error: integral constant must be scalar type, not char[]
So, cast a back to being a string, so that we can call tolower() on it and then
cast the copied mutated string back to a char[]:
import std.stdio;
import std.string;
int main( string[] args) {
char[] a = cast(char[])args[0].toupper();
a[2..5] = "XXX";
a = cast(char[]) ( ( cast(string)a ).tolower );
writefln(a);
return 0;
}
c:\dmd\test>dmd junk.d
junk.d(7): Error: no property 'tolower' for type 'invariant(char)[]'
junk.d(7): Error: cannot cast int to char[]
junk.d(7): Error: integral constant must be scalar type, not char[]
junk.d(7): Error: cannot cast int to char[]
junk.d(7): Error: integral constant must be scalar type, not char[]
junk.d(7): Error: cannot implicitly convert expression (0) of type int to char[]
junk.d(7): Error: cannot cast int to char[]
junk.d(7): Error: integral constant must be scalar type, not char[]
Nope. That don't work.
import std.stdio;
import std.string;
int main( string[] args) {
char[] a = cast(char[])args[0].toupper();
a[2..5] = "XXX";
a = cast(char[])tolower( cast(string)a );
writefln(a);
return 0;
}
Finally. It works.
Summary:
If I want to be able to use lvalue slice operations on 'strings' (for
efficiency), I have to have them as char[].
If I want to be able to use std.string methods on those same strings, I have to
cast them to invariant(char)[] and the results back to char[], which involves
at least one copy operation, and probably two.
And the invariant-ness of the string library is done "for efficiency"?
Cheers, b.
import std.string;
int main( string[] args) {
char[] a = cast(char[])args[0].toupper();
// **** UNDEFINED BEHAVIOR ****
// (1) args might be placed in a hardware-locked read-only segment. Then
//     the following line would fail.
// (2) there might be other pointers to the string, which expect it never
//     to change.
a[2..5] = "XXX";
a = cast(char[])tolower( cast(string)a );
writefln(a);
return 0;
}
Finally. It works.
But not necessarily on all architectures, because of the undefined
behavior. This is how you do it without undefined behavior.
import std.stdio;
import std.string;
int main( string[] args) {
string a = args[0].toupper();
a = a[0..2] ~ "XXX" ~ a[5..$];
a = a.tolower();
writefln(a);
return 0;
}
Ack! That's horrible. Instead of using the information I have, the
offset and length of the slice I want to manipulate, I have to derive
two offset/length pairs to the bits I do not want to do anything to.
1) Whatever happened to polymorphism?
Eg. Why can't the standard string library recognise that I, as the
programmer, know what I need to do to my data. It's my job.
So, if I assign the results of a string library function/method to a
mutable variable (Just a variable really. An invariant variable is a
constant!), then it should be possible (*IS* possible) to recognise
that and return an appropriate result. Duplicating the input if
required.
The idea that runtime obtained or derived strings can be made truly
invariant is purely theoretical. Whilst the compiler can place compile
time constants into hardware protected, read-only memory segments, doing
this at runtime would be horribly costly and hardly beneficial.
x86 allows memory to be set read-only at runtime, but only in page
sized chunks. Which means that either:
- every derived string would need to be placed in its own 4k multiple
sized chunk of ram.
- or, each page would have to constantly be switched from read-only to
read-write and back again as new entities are added and old ones go out
of scope.
And if you are not using hardware protection, then the invariance is
only notional as D can call C, and C allows me access to pointers. And
once I have one of those, I can scribble anywhere that isn't hardware
protected.
All this smacks of D reinventing, with all the same mistakes, the whole
Java String vs. StringBuffer dichotomy:
http://www.javaworld.com/javaworld/jw-03-2000/jw-0324-javaperf.html
And Java had the VM to isolate it from non-compliant code.
One of several "mission statements" that drew me to D when I first
encountered it nearly 3 years ago, was the pragmatism embodied in
articles like this:
http://www.digitalmars.com/d/2.0/builtin.html
and this:
http://www.digitalmars.com/d/2.0/cppstrings.html
and statements like this:
"No pointless wrappers around C runtime library functions or OS API
functions D provides direct access to C runtime library functions and
operating system API functions. Pointless D wrappers around those
functions just adds blather, bloat, baggage and bugs."
Coming back to try and use D after a prolonged absence, the changes in
the interim period seem to be eschewing that pragmatism in favour of
some kind of mixed OO/functional purity ethic. Is there an ex-Haskeller
in the house?
I admit openly to still being in the throes of finding my way around
the language and the library, and have been making seemingly
elementary mistakes in interpreting the documentation. But one of the
major attractions of D over C/C++ is its built-in string types and
manipulations. As good as these are, there is still the need for a
library of common operations upon them. If every time I want to use one
of these library calls, I have to cast my mutable string into an
invariant and then cast the result back to mutable in order to be able
to use the built-in manipulations, life's going to get very boring, very
fast.
The alternative I guess is to sit down and write my own library that
performs the same operations as std.string, but on the native string
type. Which kinda dilutes the purpose of having standard libraries.
Sorry to be so verbose, and please don't anyone take any of this
personally. I'm critiquing the code I am encountering, and the problems
I am having using it. Not the people who wrote it.
Cheers, b.
--
you could do
char[] tmp = a.dup;
tmp[2..5] = "XXX";
a = assumeUnique(tmp);
Ah! Again, 3 lines instead of 1. Plus two function calls and a temporary
variable.
You do realise that there is a very strong correlation between bugs and line
count?
That's been so for all of the last 30+ years regardless of language or paradigm.
So, you made it more verbose and more complex and much slower.
And, in doing so, introduced more scope for errors than you've cured.
That's one approach. Another is don't try to treat strings as mutable.
RAM is mutable--that's its purpose in being.
Variables live in RAM, and vary--that's their purpose in being.
Making a copy of a <strike>string</strike> piece of RAM and throwing the old
one away every time I want to alter its contents...kinda reminds me of
disposable nappies.
A costly convenience.
I'll revert to 1.x and pray that 2.x fades away through lack of interest before
it turns D into Yet Another Dead Language--for OO purists and academics only.
Cheers, b.
--
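For reference, the dup / assumeUnique pattern quoted above can be wrapped up as follows. This is a sketch assuming a current Phobos, where assumeUnique lives in std.exception (its location in the 2008 library may have been different); the function name patchCopy is hypothetical. It still performs one copy, but the cast-free conversion back to string does not copy again.

```d
import std.exception : assumeUnique;

/// Hypothetical helper: returns a new string equal to `a` with
/// a[from .. to] overwritten by `patch` (lengths must match).
string patchCopy(string a, size_t from, size_t to, const(char)[] patch)
{
    assert(to - from == patch.length, "slice and patch lengths differ");
    char[] tmp = a.dup;          // one mutable copy
    tmp[from .. to] = patch[];   // edit the copy in place
    return assumeUnique(tmp);    // no copy; tmp is set to null afterwards
}

void main()
{
    import std.stdio : writeln;
    writeln(patchCopy("ABCDEFG", 2, 5, "XXX"));
}
```

assumeUnique is only sound when no other mutable reference to the buffer survives, which holds here because tmp never leaves the function.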
Apr 28 2008
Walter Bright <newshound1 digitalmars.com> writes:
Janice Caron wrote:
If there's enough interest, and if Walter approves, I could certainly
kickstart std.stringbuffer. Is that the right way to go? What do
people think?
What it will do is provide a useful solution for those who really want
to use mutable strings. I bet that, though, after a while they'll evolve
to eschew it in favor of immutable strings. It's easier than arguing
about it <g>.
== Quote from Walter Bright (newshound1 digitalmars.com)'s article
Janice Caron wrote:
If there's enough interest, and if Walter approves, I could certainly
kickstart std.stringbuffer. Is that the right way to go? What do
people think?
What it will do is provide a useful solution for those who really want
to use mutable strings. I bet that, though, after a while they'll evolve
to eschew it in favor of immutable strings. It's easier than arguing
about it <g>.
I do agree with the notion that the majority of operations performed
on strings in a typical application do not modify the string in place.
However, in performance-oriented server applications, it is very
common to hold and reuse a mutable buffer between calls to avoid
the cost of reallocation. Assuming that references to this data are
passed around during the processing of a client request I would
fully expect the surrounding code to have no need to mutate the data.
However, because this is a reusable buffer, invariant is not a safe option
because the contents of the buffer will change for each request. What
I would be inclined to do here is use const references to reflect this.
I've been thinking a lot about const and invariant recently and while
invariant strings seem quite handy for test code and the like, I have
not been able to think of a single production application where I would
actually be able to use them for the bulk of my string data, for the
reason mentioned above. Rather, I would expect to use 'const'
everywhere because what I generally care about is preventing a caller
or callee from changing the contents of my data. As for indicating
ownership, the following rule generally suffices:
char[] getData(); // result is mutable -- ownership is transferred
const(char)[] getData(); // result is const -- ownership not transferred
What I love about Steven's "scoped const" proposal is that it would allow
me to write a single instance of a library function that would work equally
well with any data, and the function would communicate its behavior within
the syntax. Add "scoped const" to D 1.0 plus the ability to use 'const' in
all the places it can be used in D 2.0 and I'd be a happy camper. Bonus
points for eliminating storage of static const (ie ROM-able) data and
dropping support for anonymous enum altogether.
Sean
P.S. The utility of 'invariant' for multiprogramming is a separate issue. I
actually think it's unnecessary there as well, but don't want the discussion
to get off track by addressing this at all. I'm merely adding this note so
no one will bring it up in response to what I said above.
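The reusable-buffer argument above can be illustrated with a small sketch, assuming current D syntax. The function countChar and the request strings are hypothetical. A const(char)[] parameter accepts mutable, const and invariant (now immutable) data alike, and promises only that the callee will not modify it, not that nobody else will.

```d
import std.stdio : writeln;

/// Hypothetical read-only consumer: const(char)[] accepts char[],
/// const(char)[] and string arguments without any cast.
size_t countChar(const(char)[] text, char wanted)
{
    size_t n = 0;
    foreach (c; text)
        if (c == wanted)
            ++n;
    return n;
}

void main()
{
    char[] buffer = new char[16];          // reusable buffer, allocated once
    buffer[0 .. 11] = "GET /a/b/c ";       // first "request"
    writeln(countChar(buffer[0 .. 11], '/'));
    buffer[0 .. 11] = "GET /index ";       // same buffer, new contents
    writeln(countChar(buffer[0 .. 11], '/'));
    string literal = "a/b";                // immutable data works too
    writeln(countChar(literal, '/'));
}
```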
Apr 28 2008
Lars Ivar Igesund <larsivar igesund.net> writes:
Sean Kelly wrote:
I do agree with the notion that the majority of operations performed
on strings in a typical application do not modify the string in place.
However, in performance-oriented server applications, it is very
common to hold and reuse a mutable buffer between calls to avoid
the cost of reallocation.
Indeed, in the application I'm currently writing at work, there is not a
single heap allocation after the startup phase. And it cannot be called
trivial in any sense.
--
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango
Apr 28 2008
Walter Bright <newshound1 digitalmars.com> writes:
Janice Caron wrote:
On 28/04/2008, Walter Bright <newshound1 digitalmars.com> wrote:
What it will do is provide a useful solution for those who really want to
use mutable strings. I bet that, though, after a while they'll evolve to
eschew it in favor of immutable strings.
I'm inclined to agree with the prediction - but even so, wouldn't that
be a good thing? I mean, if it keeps people on board with D2 who might
otherwise have run away, then that's good, right? And if those people
later realise they can do more with immutable strings, then that's
good too, right?
2008/4/28 Me Here <p9e883002 sneakemail.com>:
(I forget which module you have to import to get assumeUnique). But
what you mustn't ever do is cast away invariant.
1) Whatever happened to polymorphism?
What's polymorphism got to do with anything? A string is an array, not a
class.
So, if I assign the results of a string library function/method to a
mutable variable (Just a variable really. An invariant variable is a
constant!), then it should be possible (*IS* possible) to recognise
that and return an appropriate result.
Functions don't overload on return value.
They don't? Why not? Seems like a pretty obvious step to me.
Rather than having to have methods:
futzIt_returnString()
futzIt_returnInt()
futzIt_returnReal()
futzIt_returnComplex()
where 'futzIt' might be "read a string from the command line and return it
to me as some type (if possible)",
I can just do
int i = futzIt( ... );
real r = futzIt( ... );
And let the compiler work out which futzIt() I need to call, and take care
of mangling the names to allow them to coexist.
You mean D doesn't already have this facility?
Seems like it would be a far more productive and useful expenditure of
effort than all this invariant stuff.
The idea that runtime obtained or derived strings can be made truly
invariant is purely theoretical.
But the fact that someone else might be sharing the data is not.
By "someone else" you mean 'another thread'?
If so, then if that is a possibility, if my code is using threads, then I,
the programmer, will be aware of that and will be able to make appropriate
choices.
I /might/ choose to use invariance to 'protect' this particular piece of
data from the problems of shared state concurrency--if there is any
possibility that I intend to share this particular piece of data.
But in truth, it is very unlikely that I *will* make /that/ choice. Here's
why.
What does it mean to make and hold multiple (mutated) copies of a single
entity?
That is, I obtain a piece of data from somewhere and make it invariant.
Somehow two threads obtain references to that piece of data.
If none of them attempt to change it, then it makes no difference that it
is marked invariant.
If however, one of them is programmed to change it, then it now has a
different version of that entity to the other thread. But what does that
mean? Who has the 'right' version?
Show me a real situation where two threads can legitimately be making
disparate modifications to a single entity, string or otherwise, and I'll
show you a programming error. Once two threads make disparate modifications
to an entity, they are separate entities. And they should have been given
copies, not references to a single copy, in the first place.
If the intent is that they share a single entity, then any legitimate
modifications to that single entity should be reflected in the views of
that single entity by both threads. And therefore subjected to locking, or
STM or whatever mechanism is used to control that modification.
This whole thing of invariance and concurrency seems to be aimed at
enabling the use of COW.
Which smacks of someone trying to emulate fork-like behaviours using
threads.
And if that is the case, and I very much hope it isn't, then let me tell
you as someone who is intimately familiar with the one existing system that
went this route (iThreads: look'em up), that it is a total disaster.
The whole purpose and advantage of multi-threading, over multi-processing,
is (mutable) shared state. And the elimination of the costs of serialisation
and narrow bandwidth of IPC in the forking concurrency model. Attempting to
emulate that model using threading gives few of its advantages, all of its
disadvantages, and throws away all of the advantages of threading.
It is a complete and utter waste of time and effort.
If the aim is to simplify the use of threading for common programming
scenarios and bring it within the grasp of non-threading specialist
programmers, then there are far more effective and less costly ways of
achieving that.
But one of the
major attractions of D over C/C++ is its built-in string types
D has no built in string type. string is just an alias for
invariant(char)[].
Semantics.
D has built-in support for a string-type (see
http://www.digitalmars.com/d/2.0/overview.html) from which I quote:
"Strings"
"String manipulation is so common, and so clumsy in C and C++, that it
needs direct support in the language".
"Modern languages handle string concatenation, copying, etc., and so does
D".
"Strings are a direct consequence of improved array handling."
What invariant strings do, and as far as I can see the only significant
thing they do, is to reinvent the clumsiness
of C & C++ by making strings a second-class data type again.
If the point is to try and make threading easier, it will fail miserably
once people realise that it creates the scope for
multiple concurrent versions of supposedly single entities. Which breaks
just about every programming rule in the book,
and creates scope for far more intractable errors than it fixes.
That's one approach. Another is don't try to treat strings as mutable.
If the intention of invariance is some move toward OO or functional
purity, then I again quote from the same document:
"Who D is Not For"
[some categories elided]
"Language purists. D is a practical language, and each feature of it is
evaluated in that light, rather than by an ideal. "
"For example, D has constructs and semantics that virtually eliminate the
need for pointers for ordinary tasks. "
"But pointers are still there, because sometimes the rules need to be
broken."
"Similarly, casts are still there for those times when the typing system
needs to be overridden."
Cheers, b.
--
Apr 28 2008
Walter Bright <newshound1 digitalmars.com> writes:
Me Here wrote:
Janice Caron wrote:
Functions don't overload on return value.
Type inference in D is done "bottom up". Doing overloading based on
function return type is "top down". Trying to get both schemes to
coexist is a hard problem.
The idea that runtime obtained or derived strings can be made truly
invariant is purely theoretical.
No, it could be the same thread, via another alias to the same data.
Using invariant strings allows the programmer to treat them as if they
were value types and being copied for every use (like ints are), except
they don't need to be actually copied.
With mutable strings, one always has to be careful to keep track of who
'owns' the string, and who has references to it. When mutating the
string, one must manually ensure that there are no other references to
it that would be surprised by the data changing. For example, if you
insert a string into a symbol table, and then later some other reference
to that string changes it, it could wind up corrupting the symbol table.
The point about the main(char[][] args) and modifying those strings
in-place is very valid - nothing is said about where those strings
actually reside, and who else may have references to the same data, and
whether you can modify them with impunity or not. You could argue "this
should be better documented" and you'd be right, but if the declaration
instead said main(invariant(char[])args) then I *know* that I am not
allowed to change them, and whoever calls main() *knows* that those arg
strings won't get changed. We can both sleep comfortably.
Invariant strings offer a guarantee that the data won't change, which
clarifies the API of the functions. (Whenever I see an API function that
takes a char*, say putenv(), it rarely says whether it saves a copy of
the data or saves a copy of the reference. That just sucks.)
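The symbol-table hazard described above can be sketched in a few lines, assuming current D syntax. The function names are hypothetical and the "table" is just an array. The first function stores a slice of a caller-owned mutable buffer, so a later edit shows through; the second stores an immutable copy, which no mutable alias can reach.

```d
/// Hypothetical table holding references to caller-owned mutable buffers.
char[] demoAliased()
{
    char[] name = "alpha".dup;
    char[][] table;
    table ~= name;        // stores a slice of the same memory, not a copy
    name[0] = 'X';        // later edit through the original reference
    return table[0];      // the stored entry has changed as well
}

/// Same steps, but the table holds immutable strings.
string demoImmutable()
{
    char[] name = "alpha".dup;
    string[] table;
    table ~= name.idup;   // immutable copy; no mutable alias to it exists
    name[0] = 'X';
    return table[0];      // unaffected by the edit
}

void main()
{
    import std.stdio : writeln;
    writeln(demoAliased());
    writeln(demoImmutable());
}
```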
If so, then if that is a possibility, if my code is using threads, then
I, the programmer, will be aware of that and will be able to make
appropriate choices.
I /might/ choose to use invariance to 'protect' this particular piece of
data from the problems of shared state concurrency--if there is any
possibility that I intend to share this particular piece of data.
But in truth, it is very unlikely that I *will* make /that/ choice.
Here's why.
What does it mean to make and hold multiple (mutated) copies of a single
entity?
That is, I obtain a piece of data from somewhere and make it invariant.
Somehow two threads obtain references to that piece of data.
If none of them attempt to change it, then it makes no difference that
it is marked invariant.
If however, one of them is programmed to change it, then it now has a
different version of that entity to the other thread. But what does that
mean? Who has the 'right' version?
Show me a real situation where two threads can legitimately be making
disparate modifications to a single entity, string or otherwise, and I'll
show you a programming error. Once two threads make disparate modifications
to an entity, they are separate entities. And they should have been given
copies, not references to a single copy, in the first place.
If the intent is that they share a single entity, then any legitimate
modifications to that single entity should be reflected in the views of
that single entity by both threads. And therefore subjected to locking, or
STM or whatever mechanism is used to control that modification.
This whole thing of invariance and concurrency seems to be aimed at
enabling the use of COW.
Wouldn't that be more of a copy-swap thing? And isn't STM copy-swap at
its core?
And if that is the case, and I very much hope it isn't, then let me tell
you as someone who is intimately familiar with the one existing system that
went this route (iThreads: look'em up), that it is a total disaster.
ithreads copies the entire user data per thread. Using invariant is, of
course, a way to avoid copying the data.
The whole purpose and advantage of multi-threading, over multi-processing,
is (mutable) shared state. And the elimination of the costs of serialisation
and narrow bandwidth of IPC in the forking concurrency model. Attempting to
emulate that model using threading gives few of its advantages, all of its
disadvantages, and throws away all of the advantages of threading.
It is a complete and utter waste of time and effort.
"Walter Bright" <newshound1 digitalmars.com> wrote in message
news:48169E90.6050700 digitalmars.com...
Me Here wrote:
Janice Caron wrote:
Functions don't overload on return value.
Type inference in D is done "bottom up". Doing overloading based on
function return type is "top down". Trying to get both schemes to coexist
is a hard problem.
But a function's result can be overloaded using "out", so why can't it be
overloaded using the return value?
Can't the compiler treat a return value as an implicit out argument?
L.
Apr 29 2008
Walter Bright <newshound1 digitalmars.com> writes:
Lionello Lunesu wrote:
But a function's result can be overloaded using "out", so why can't it
be overloaded using the return value?
We know what the type of the out argument is. The problem with return
value overloading is not knowing what the type should be.
Can't the compiler treat a return value as an implicit out argument?
Suppose the return value is used as an argument to another function with
overloaded versions. Rinse, repeat. The combinations grow out of control.
"Walter Bright" <newshound1 digitalmars.com> wrote in message
news:48169E90.6050700 digitalmars.com...
Me Here wrote:
Janice Caron wrote:
Functions don't overload on return value.
Type inference in D is done "bottom up". Doing overloading based on
function return type is "top down". Trying to get both schemes to
coexist is a hard problem.
But a function's result can be overloaded using "out", so why can't it
be overloaded using the return value?
Can't the compiler treat a return value as an implicit out argument?
Consider this:
int foo();
float foo();
void bar(int a);
void bar(float a);
Then this:
void main()
{
bar(foo());
}
There is an obvious problem here.
Yes, one that is solved like any other that has ambiguity: casting. We will
have the same problem when opImplicitCast is introduced.
This seems like a rare case anyways, not a reason not to have overloaded
return values.
-Steve
Apr 29 2008
"Hans W. Uhlig" <huhlig gmail.com> writes:
One of two things: either make an assumption as to which is called, based on
which has the higher priority (by precision or type), or throw a compiler
error if no cast is made:
Overload ambiguity: a cast must be made when both the return overload and
parameter overload types are ambiguous.
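One way to get close to the futzIt idea from earlier in the thread without return-type overloading is to make the result type an explicit template argument. The sketch below assumes current D and std.conv.to; futzIt here is a hypothetical stand-in, not an existing library function. Because the caller names the type, the bar(foo()) ambiguity cannot arise: the compiler never has to guess.

```d
import std.conv : to;

/// Hypothetical stand-in for the thread's futzIt: the caller names the
/// result type explicitly, so overload resolution stays "bottom up".
T futzIt(T)(string input)
{
    return to!T(input);   // throws ConvException if input is not a valid T
}

void bar(int a) {}
void bar(double a) {}

void main()
{
    int i = futzIt!int("42");
    double r = futzIt!double("2.5");
    bar(futzIt!int("7"));      // unambiguous: the int overload is chosen
    bar(futzIt!double("7"));   // unambiguous: the double overload is chosen
}
```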
Of course, a wasn't really mutated. Instead, args[0] was copied and then
mutated and labelled a. Then a was copied and mutated and reassigned the
mutated copy.
So, that's two copies of the string, plus a slice, plus an extra method call to
achieve what used to be achievable in place on the original string. Which is now
immutable, but I'll never need it again.
Of course, on these short 1-off strings it doesn't matter a hoot. But when the
strings are 200 to 500 characters a pop and there are 20,000,000 of them. It
matters.
Did I suggest this was an optimisation?
Whatever immutability-purity cool aid you've been drinking, please go back to
coke. And give us usable libraries and sensible implicit conversions. Cos this
sucks bigtime.
b.
Is this what you wanted to write?
int main(string[] args)
{
char[] a = cast(char[])args[0];
a[2..5] = "XXX";
writefln(a);
return 0;
}
This compiles and runs, and seems to do what you describe. Sure, there's a
cast there, but it's not all that bad, is it?
Or just add a dup.
int main(string[] args)
{
char[] a = args[0].dup;
a[2..5] = "XXX";
writefln(a);
return 0;
}
Apr 27 2008
↑↓←→ Bill Baxter <dnewsgroup billbaxter.com> writes:
Simen Kjaeraas wrote:
<p9e883002 sneakemail.com> wrote:
Of course, a wasn't really mutated. Instead, args[0] was copied and then
mutated and labelled a. Then a was copied and mutated and reassigned the
mutated copy.
So, that's two copies of the string, plus a slice, plus an extra method call to
achieve what used to be achievable in place on the original string. Which is now
immutable, but I'll never need it again.
Of course, on these short 1-off strings it doesn't matter a hoot. But when the
strings are 200 to 500 characters a pop and there are 20,000,000 of them. It
matters.
Did I suggest this was an optimisation?
Whatever immutability-purity cool aid you've been drinking, please go back to
coke. And give us usable libraries and sensible implicit conversions. Cos this
sucks bigtime.
b.
Is this what you wanted to write?
int main(string[] args)
{
char[] a = cast(char[])args[0];
a[2..5] = "XXX";
writefln(a);
return 0;
}
This compiles and runs, and seems to do what you describe. Sure, there's a
cast there, but it's not all that bad, is it?
I'm no invariant guru, but I don't think that's legal. 'invariant'
means the data could be stored in a portion of memory that the OS will
not allow the program to write to. So you need to dup it:
int main(string[] args)
{
    char[] a = args[0].dup; // mutable copy; the invariant original is untouched
    a[2..5] = "XXX";
    writefln(a);
    return 0;
}
That stuff like this compiles and seems to work is why we really need to
make at least one alternative version of cast. One would be for
relative safe run-of-the-mill casts, like casting float to int, or
casting Object to some class (and checking for null), and the other
category would be for dangerous big red flags kind of things like the
above. Using the run-of-the-mill cast in the above situation would not
be allowed.
--bb
Apr 27 2008
↑ ↓← → Tomas Lindquist Olsen <tomas famolsen.dk> writes:
Bill Baxter wrote:
... snip ...
That stuff like this compiles and seems to work is why we really need to
make at least one alternative version of cast. One would be for
relative safe run-of-the-mill casts, like casting float to int, or
casting Object to some class (and checking for null), and the other
category would be for dangerous big red flags kind of things like the
above. Using the run-of-the-mill cast in the above situation would not
be allowed.
"Bill Baxter" <dnewsgroup billbaxter.com> wrote in message
news:fv3612$sgu$1 digitalmars.com...
That stuff like this compiles and seems to work is why we really need to
make at least one alternative version of cast. One would be for relative
safe run-of-the-mill casts, like casting float to int, or casting Object
to some class (and checking for null), and the other category would be
for dangerous big red flags kind of things like the above. Using the
run-of-the-mill cast in the above situation would not be allowed.
That request has been on the "unofficial wish list" since the beginning..
And I still agree with it.
Maybe cast() should be parsed as a template. Then, the compiler should
require more "!"s as the risk increases:
SomeClass sc = cast(SomeClass)some_obj; //OK
int i = cast!(int)some_float; //might not fit
SomeClass sc = cast!!(SomeClass)void_ptr; //unsafe
char[] mutstring = cast!!!!!!!!(char[])toUpper("..."); //wtf are you doing!
L.
<p9e883002 sneakemail.com> wrote:
Is this what you wanted to write?
int main(string[] args)
{
char[] a = cast(char[])args[0];
a[2..5] = "XXX";
writefln(a);
return 0;
}
This compiles and runs, and seems to do what you describe. Sure, there's a
cast there, but it's not all that bad, is it?
No. You missed out uppercasing the string before replacing the slice.
<p9e883002 sneakemail.com> wrote:
Is this what you wanted to write?
int main(string[] args)
{
char[] a = cast(char[])args[0];
a[2..5] = "XXX";
writefln(a);
return 0;
}
This compiles and runs, and seems to do what you describe. Sure, there's a
cast there, but it's not all that bad, is it?
No. You missed out uppercasing the string before replacing the slice.
That's why I replied to my own post stating just that.
Anyways, Gide got it right. A .dup is the correct way, a cast is wrong.
-- Simen
Apr 27 2008
↑ ↓ ← → Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
int main(string[] args)
{
char[] a = cast(char[])args[0];
a[2..5] = "XXX";
writefln(a);
return 0;
}
This compiles and runs, and seems to do what you describe. Sure, there's a
cast there, but it's not all that bad, is it?
Yes, it's extremely bad. Casting away invariant is UNDEFINED BEHAVIOR,
and should never be done.
It's not merely undefined, it's *illegal*!
I hate the C/C++ tradition of calling things that are *illegal*
"undefined behavior". Yes, illegal behavior causes undefined behavior, but
they're not the same thing. Illegal is something that may cause your
program to crash, or simply end up in a faulty, erroneous state.
Undefined is just undefined. For example, this expression in C:
a = (x++) + x*2;
has undefined behavior (because of order of evaluation issues). But it's
not *illegal* behavior, your program will not crash and burn because of
that.
--
Bruno Medeiros - Software Developer, MSc. in CS/E graduate
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
int main(string[] args)
{
char[] a = cast(char[])args[0];
a[2..5] = "XXX";
writefln(a);
return 0;
}
This compiles and runs, and seems to do what you describe. Sure, there's a
cast there, but it's not all that bad, is it?
Yes, it's extremely bad. Casting away invariant is UNDEFINED BEHAVIOR,
and should never be done.
You should never need an explicit cast just to handle text!
import std.string;
int main( string[] args) {
    char[] a = cast(char[])args[0].toupper();
    // **** UNDEFINED BEHAVIOR ****
    // (1) args might be placed in a hardware-locked read-only segment. Then
    //     the following line would fail.
    // (2) There might be other pointers to the string, which expect it never
    //     to change.
    a[2..5] = "XXX";
    a = cast(char[])tolower( cast(string)a );
    writefln(a);
    return 0;
}
Finally. It works.
But not necessarily on all architectures, because of the undefined
behavior. This is how you do it without undefined behavior.
import std.string;
int main( string[] args) {
string a = args[0].toupper();
a = a[0..2] ~ "XXX" ~ a[5..$];
a = a.tolower();
writefln(a);
return 0;
}
Hi all,
'scuse me for not being familiar with previous or ongoing discussion on
this
subject, but I'm just coming back to D after a couple of years away.
I have some strings read in from external source that I need to convert to
uppercase. A quick look at Phobos and I find std.string has a toupper
method.
<very good example case removed>
This is all not an issue if Walter adopts 'scoped const' contracts.
http://d.puremagic.com/issues/show_bug.cgi?id=1961
The current con for this method is that it is another 'confusing' const
syntax. So is what I propose more confusing, or is what this poor developer
had to go through more confusing?
-Steve
Apr 28 2008
↑ ↓ ← → Sean Kelly <sean invisibleduck.org> writes:
== Quote from Janice Caron (caron800 googlemail.com)'s article
2008/4/28 Steven Schveighoffer <schveiguy yahoo.com>:
> I have some strings read in from external source that I need to convert to
> uppercase. A quick look at Phobos and I find std.string has a toupper
> method.
> <very good example case removed>
This is all not an issue if Walter adopts 'scoped const' contracts.
toupper() couldn't be reused for all constancies, because the
invariant version should employ copy-on-write, whereas any other
versions would not be able to do this.
That is,
toupper("HELLO");
can return the original, if and only if the string is invariant.
Can you explain this in light of Steven's 'scoped const' proposal? By my
understanding (assuming scoped const):
string bufI = "HELLO";
char[] bufM = "HELLO".dup;
const(char)[] bufC = bufM;
string retI = toupper( bufI ); // return value is invariant - ok
char[] retM = toupper( bufM ); // return value is mutable - ok
const(char)[] retC = toupper( bufC ); // return value is const - ok
const(char)[] retC2 = toupper( bufI ); // return value is invariant - ok
bufM[0] = 'J';
assert( retC[0] == 'J' );
The above seems perfectly fine, because it's impossible to pass a mutable
array and return a const reference to it--the return value will be mutable as
well.
By contrast, let's assume the invariant implementation:
string toupper( string buf );
char[] buf = "HELLO".dup;
toupper( buf ); // fails
toupper( buf.idup ); // works
toupper( assertUnique( buf ) ); // works
In the first case I have to copy buf to pass it to toupper, and in the second I
have to perform a cast operation (albeit wrapped in a function to hide the
truth). Assuming for a moment that mutable strings are useful and so I won't be
able to use the 'string' alias all the time, can you explain what is good about
either of these scenarios?
Sean
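The "return value has the same constancy as the argument" behaviour described above can be sketched with the inout qualifier found in current D2 compilers. This is an illustration only: firstWord is a hypothetical helper, not a Phobos function, and it only slices its argument, since an inout function may not mutate it. Whether this matches the 'scoped const' proposal in every detail is not something this thread settles.

```d
import std.stdio;

// Hypothetical helper: returns the part of `s` before the first space.
// `inout` makes the result carry the same qualifier as the argument.
inout(char)[] firstWord(inout(char)[] s)
{
    foreach (i, c; s)
        if (c == ' ')
            return s[0 .. i];
    return s;
}

void main()
{
    string bufI = "HELLO WORLD";
    char[] bufM = "HELLO WORLD".dup;
    const(char)[] bufC = bufM;

    string retI = firstWord(bufI);        // invariant in, invariant out
    char[] retM = firstWord(bufM);        // mutable in, mutable out
    const(char)[] retC = firstWord(bufC); // const in, const out

    retM[0] = 'J'; // writes through to bufM, because retM is a slice of it
    writeln(retI, " ", retM, " ", retC);
}
```

One function body serves all three constancies, and no cast or idup is needed at any call site.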
Ack! That's horrible. Instead of using the information I have, the
offset and length of the slice I want to manipulate, I have to derive
two offset/length pairs to the bits I do not want to do anything to.
Not necessarily. Instead of
a = a[0..2] ~ "XXX" ~ a[5..$];
you could do
char[] tmp = a.dup;
tmp[2..5] = "XXX";
a = assumeUnique(tmp);
(I forget which module you have to import to get assumeUnique). But
what you mustn't ever do is cast away invariant.
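A self-contained version of the dup / modify / assumeUnique sequence might look like this. It is a sketch assuming a current Phobos, where assumeUnique is found in std.exception; the function name withMiddleReplaced is made up for the example, and the input is assumed to be at least five characters long.

```d
import std.exception : assumeUnique;
import std.stdio;

// Hypothetical helper: replaces characters 2..5 of `a` with "XXX".
// Assumes a.length >= 5; a shorter input raises a range error.
string withMiddleReplaced(string a)
{
    char[] tmp = a.dup;       // the only copy: a private, mutable buffer
    tmp[2 .. 5] = "XXX";      // in-place slice assignment, lengths match
    return assumeUnique(tmp); // hand the buffer over as invariant, no copy
}

void main()
{
    writeln(withMiddleReplaced("C:\\DMD\\TEST\\JUNK.EXE"));
}
```

assumeUnique is only safe here because tmp is never shared; it also nulls tmp so the mutable reference cannot be used afterwards.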
1) Whatever happened to polymorphism?
What's polymorphism got to do with anything? A string is an array, not a class.
So, if I assign the results of a string library function/method to a
mutable variable (Just a variable really. An invariant variable is a
constant!), then it should be possible (*IS* possible) to recognise
that and return an appropriate result.
Functions don't overload on return value.
The idea that runtime obtained or derived strings can be made truly
invariant is purely theoretical.
But the fact that someone else might be sharing the data is not.
But one of the
major attractions of D over C/C++ is its built-in string types
D has no built in string type. string is just an alias for invariant(char)[].
If every time I want to use one
of these library calls, I have to cast my mutable string into an
invariant and then cast the result back to mutable
That's one approach. Another is don't try to treat strings as mutable.
2008/4/28 Steven Schveighoffer <schveiguy yahoo.com>:
> I have some strings read in from external source that I need to convert to
> uppercase. A quick look at Phobos and I find std.string has a toupper
> method.
> <very good example case removed>
This is all not an issue if Walter adopts 'scoped const' contracts.
toupper() couldn't be reused for all constancies, because the
invariant version should employ copy-on-write, whereas any other
versions would not be able to do this.
That is,
toupper("HELLO");
can return the original, if and only if the string is invariant.
toupper is probably a bad example, as your case seems like the rarest :)
But I understand what you are saying.
The desire to have string processing functions work with all constancies
seems very reasonable and useful to me. To deny usage of toupper unless you
idup the array, just to have the ability to optimize a corner case, seems
incorrect, and probably produces less efficient code in 90% of cases.
If the scoped const proposal was never accepted, and I used Phobos, I'd
probably suggest a const and mutable version of toupper that allowed
those of us who use mutable strings a lot, and maybe not so much
multithreading, to not have to jump through hoops for any string
processing.
Maybe the solution to this is to write specializations which use COW with
the invariant version, perhaps with pure functions, which always assume
invariant parameters. So you have a pure toupper which handles the
invariant version, and a scoped const version which allows using the
function on non-invariant parameters, which can't be optimized the same
anyways...
-Steve
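A const-accepting toupper of the kind suggested above could be sketched like this. upperCopy is a hypothetical name, not a Phobos function, and the sketch is ASCII-only and assumes a current D2 compiler. Mutable, const and invariant arrays all convert implicitly to const(char)[], so one signature covers every caller; the cost is that it always copies and so cannot apply the copy-on-write shortcut available to an invariant-only version.

```d
import std.stdio;

// Hypothetical helper: ASCII-only uppercase for any constancy of input.
// Always returns a fresh mutable copy and never touches the argument.
char[] upperCopy(const(char)[] s)
{
    auto result = new char[s.length];
    foreach (i, c; s)
        result[i] = (c >= 'a' && c <= 'z') ? cast(char)(c - ('a' - 'A')) : c;
    return result;
}

void main()
{
    string        inv = "hello";
    char[]        mut = "hello".dup;
    const(char)[] cst = mut;
    // All three calls compile without casts or idup.
    writeln(upperCopy(inv), " ", upperCopy(mut), " ", upperCopy(cst));
}
```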
On 28/04/2008, Me Here <p9e883002 sneakemail.com> wrote:
Ah! Again, 3 lines instead of 1. Plus two function calls and a temporary
variable.
To be fair though, the problem here is that the functions you are
calling (std.string.toupper and std.string.tolower) don't do what you
want. This is not a fault of the language - it's a limitation of the
library.
To that end, as others have said, this problem could be solved simply
enough by the addition of another module - say, std.stringbuffer - in
which we alias char[] to stringbuffer (or maybe a StringBuffer class -
I'm not sure what's best) and provide a whole bunch of functions
optimized for those mutable char arrays.
To blame the language for the lack of library is the wrong approach.
D2 has some killer, kickass features. The template metaprogramming
power alone is enough to make C++ programmers weep. I'm looking
forward to pure functions, and a new generation of multithreading.
If there's enough interest, and if Walter approves, I could certainly
kickstart std.stringbuffer. Is that the right way to go? What do
people think?
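What a function in such a mutable-string module might look like is sketched below. The name upperInPlace is purely illustrative (no std.stringbuffer module is assumed to exist), and the sketch handles ASCII letters only.

```d
import std.stdio;

// Hypothetical in-place uppercase for mutable char arrays (ASCII only).
// No allocation: the caller's buffer is modified directly.
void upperInPlace(char[] buf)
{
    foreach (ref c; buf)
        if (c >= 'a' && c <= 'z')
            c = cast(char)(c - ('a' - 'A'));
}

void main()
{
    char[] a = "c:\\dmd\\test\\junk.exe".dup; // one copy, made up front
    upperInPlace(a);                          // no further copies
    a[2 .. 5] = "XXX";                        // slice assignment in place
    writeln(a);
}
```

The type system still protects invariant data: passing a string to upperInPlace is a compile-time error, so the caller has to dup explicitly exactly once.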
Apr 28 2008
↑↓←→ Walter Bright <newshound1 digitalmars.com> writes:
p9e883002 sneakemail.com wrote:
So, that's two copies of the string, plus a slice, plus an extra method call to
achieve what used to be achievable in place on the original string. Which is now
immutable, but I'll never need it again.
Of course, on these short 1-off strings it doesn't matter a hoot. But when the
strings are 200 to 500 characters a pop and there are 20,000,000 of them. It
matters.
Did I suggest this was an optimisation?
You bring up a good point.
On a tiny example such as yours, where you can see everything that is
going on at a glance, such as where strings come from and where they are
going, there isn't any point to immutable strings. You're right about that.
The problems start happening as the complexity rises. Strings get passed
around, stored, modified, etc. It's real easy to lose track of who owns
a string, who else has references to the string, who has rights to
change the string and who doesn't.
For example, you're changing the char[][] passed in to main(). What if
one of those strings is a literal in the read-only data section?
So what happens is code starts defensively making copies of the string
"just in case." I'll argue that in a complex program, you'll actually
wind up making far more copies than you will with invariant strings.
I agree that immutable strings can be valuable. That's why I think it's
important to have a version of toupper that uses invariant strings because
you can make more assumptions about when to make copies. But why shouldn't
there be a version that does the same thing with mutable or const strings?
Why should a developer be forced to always use invariant strings when the
optimizations and multithreading benefits that come with only using
invariant strings may not be more important for a particular program than
being able to modify a string? I should still be able to use toupper on
mutable strings as well...
-Steve
Apr 28 2008
↑ ↓ ←→ Walter Bright <newshound1 digitalmars.com> writes:
Steven Schveighoffer wrote:
I agree that immutable strings can be valuable. That's why I think it's
important to have a version of toupper that uses invariant strings because
you can make more assumptions about when to make copies. But why shouldn't
there be a version that does the same thing with mutable or const strings?
Why should a developer be forced to always use invariant strings when the
optimizations and multithreading benefits that come with only using
invariant strings may not be more important for a particular program than
being able to modify a string? I should still be able to use toupper on
mutable strings as well...
That's why I agreed with Janice on making a stringbuffer module that
operates on mutable strings. It's easier than arguing about it, and it
doesn't hurt to have such a package. And I suspect that after using it
for a while, people will naturally evolve towards using all invariant
strings.
Apr 28 2008
↑ ↓←→ Lars Ivar Igesund <larsivar igesund.net> writes:
Walter Bright wrote:
Steven Schveighoffer wrote:
I agree that immutable strings can be valuable. That's why I think it's
important to have a version of toupper that uses invariant strings
because
you can make more assumptions about when to make copies. But why
shouldn't there be a version that does the same thing with mutable or
const strings? Why should a developer be forced to always use invariant
strings when the optimizations and multithreading benefits that come with
only using invariant strings may not be more important for a particular
program than
being able to modify a string? I should still be able to use toupper on
mutable strings as well...
That's why I agreed with Janice on making a stringbuffer module that
operates on mutable strings. It's easier than arguing about it, and it
doesn't hurt to have such a package. And I suspect that after using it
for a while, people will naturally evolve towards using all invariant
strings.
After working with Java for quite some time, I have naturally drifted from
using invariant strings to stringbuffers.
--
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango
Apr 28 2008
↑ ↓ ←→ Walter Bright <newshound1 digitalmars.com> writes:
Lars Ivar Igesund wrote:
After working with Java for quite some time, I have naturally drifted from
using invariant strings to stringbuffers.
Java strings lack slicing, so they're crippled anyway. I believe that
slicing is one of those paradigm-shifting features, so I am not making
an irrelevant point.
After working with Java for quite some time, I have naturally drifted
from
using invariant strings to stringbuffers.
Java strings lack slicing, so they're crippled anyway. I believe that
slicing is one of those paradigm-shifting features, so I am not making an
irrelevant point.
Java's String.substring(start, last) works just like slicing...
Not that I don't love D slicing above calling a function, but saying that
Java doesn't have slicing is completely false.
Where they lack is in the support of mutable strings, and especially having
strings be treated as native arrays. D excels in those areas.
-Steve
Apr 28 2008
↑ ↓ ←→ Walter Bright <newshound1 digitalmars.com> writes:
Steven Schveighoffer wrote:
Java's String.substring(start, last) works just like slicing...
No it doesn't. It makes a copy (I don't know if this is true of *all*
versions of Java).
Java's String.substring(start, last) works just like slicing...
No it doesn't. It makes a copy (I don't know if this is true of *all*
versions of Java).
A String holds a char[], the "start" in it and its "length". A
substring just creates another String instance with "start" and "length"
changed.
So it makes a new String, but the underlying char[] remains the same.
Apr 28 2008
↑ ↓ ←→ Robert Fraser <fraserofthenight gmail.com> writes:
Walter Bright wrote:
Steven Schveighoffer wrote:
Java's String.substring(start, last) works just like slicing...
No it doesn't. It makes a copy (I don't know if this is true of *all*
versions of Java).
Java 6's String.substring method (JDK 1.6.0_04, 64-bit Windows):
public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > count) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    if (beginIndex > endIndex) {
        throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
    }
    return ((beginIndex == 0) && (endIndex == count)) ? this :
        new String(offset + beginIndex, endIndex - beginIndex, value);
}
The important part is new String(offset + beginIndex, endIndex -
beginIndex, value) which does indeed do a "slice" of sorts (that is, it
returns a string with the same char array backing it with a new offset
and length). No copying of data is done.
Apr 28 2008
↑ ↓← → Sean Kelly <sean invisibleduck.org> writes:
== Quote from Robert Fraser (fraserofthenight gmail.com)'s article
Walter Bright wrote:
Steven Schveighoffer wrote:
Java's String.substring(start, last) works just like slicing...
No it doesn't. It makes a copy (I don't know if this is true of *all*
versions of Java).
public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > count) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    if (beginIndex > endIndex) {
        throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
    }
    return ((beginIndex == 0) && (endIndex == count)) ? this :
        new String(offset + beginIndex, endIndex - beginIndex, value);
}
The important part is new String(offset + beginIndex, endIndex -
beginIndex, value) which does indeed do a "slice" of sorts (that is, it
returns a string with the same char array backing it with a new offset
and length). No copying of data is done.
Right. The issue in Java is that the String wrapper class is still allocated
on the heap, so dynamic memory allocation is still occurring. D, on the other
hand, uses a fat reference so creating a slice doesn't touch the heap at all.
Sean
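The "fat reference" point can be checked with a few lines of D. This is a sketch assuming a current D2 compiler; sharesMemory is a hypothetical helper written for the example. A slice is just a pointer and a length into the same data, so taking one allocates nothing.

```d
import std.stdio;

// Hypothetical helper: true if `part` lies entirely inside the memory
// occupied by `whole`, i.e. it is a slice of it rather than a copy.
bool sharesMemory(const(char)[] whole, const(char)[] part)
{
    return part.ptr >= whole.ptr
        && part.ptr + part.length <= whole.ptr + whole.length;
}

void main()
{
    string s = "hello world";
    string t = s[6 .. $];      // pointer + length, no heap allocation
    string u = s[6 .. $].idup; // an explicit copy, for contrast

    writeln(t, " shares memory with s: ", sharesMemory(s, t));
    writeln(u, " shares memory with s: ", sharesMemory(s, u));
}
```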
Apr 28 2008
↑↓← → Walter Bright <newshound1 digitalmars.com> writes:
Robert Fraser wrote:
The important part is new String(offset + beginIndex, endIndex -
beginIndex, value) which does indeed do a "slice" of sorts (that is, it
returns a string with the same char array backing it with a new offset
and length). No copying of data is done.
Yes, you are right. I was wrong. But Java is still new'ing a new
instance of String for each slice. And it still uses two levels of
indirection to get to the string data.
Apr 28 2008
↑ ↓ ← → Christopher Wright <dhasenan gmail.com> writes:
Robert Fraser wrote:
The important part is new String(offset + beginIndex, endIndex -
beginIndex, value) which does indeed do a "slice" of sorts (that is, it
returns a string with the same char array backing it with a new offset
and length). No copying of data is done.
Sun has it right. GNU Classpath has it wrong and copies the data every time.
Apr 28 2008
↑ ↓ ← → Lars Ivar Igesund <larsivar igesund.net> writes:
Walter Bright wrote:
Lars Ivar Igesund wrote:
After working with Java for quite some time, I have naturally drifted
from using invariant strings to stringbuffers.
Java strings lack slicing, so they're crippled anyway. I believe that
slicing is one of those paradigm-shifting features, so I am not making
an irrelevant point.
I agree that Java strings are crippled, but considering that String is
easier to use there than StringBuffer, I would certainly need good reasons
to prefer the latter. And I have them.
Your point about slicing may not be irrelevant, but the kickass-ness of the
feature only truly comes into its own when combined with non-allocating
string operations.
--
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango
Apr 28 2008
↑ ↓ ← → Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Walter Bright wrote:
Steven Schveighoffer wrote:
I agree that immutable strings can be valuable. That's why I think
it's important to have a version of toupper that uses invariant
strings because you can make more assumptions about when to make
copies. But why shouldn't there be a version that does the same thing
with mutable or const strings? Why should a developer be forced to
always use invariant strings when the optimizations and multithreading
benefits that come with only using invariant strings may not be more
important for a particular program than being able to modify a
string? I should still be able to use toupper on mutable strings as
well...
That's why I agreed with Janice on making a stringbuffer module that
operates on mutable strings. It's easier than arguing about it, and it
doesn't hurt to have such a package. And I suspect that after using it
for a while, people will naturally evolve towards using all invariant
strings.
"people will naturally evolve towards using all invariant strings."
Oh please. This whole discussion between "Me here" and Walter was always
occurring under the notion that one either has to use all mutable
strings, or all invariant strings, which is a silly idea. Use what is
right for what you are trying to do!
The original post code was a clear-cut example of invariant misuse. If
you are going to make one or several different mutations to a string, do
not use invariant, use mutable. The fact that there isn't a
mutable/in-place tolower has no bearing on the const/invariant system
(only on the Phobos library design). So if you had any quarrel, it
wasn't with D's immutability system, but with library design (which
Walter already said he plans to fix... at least as far as std.string is
concerned).
And Walter, people won't "naturally evolve towards using all invariant
strings" (nor should they). If I have a function where I'm going to
perform a series of changes to a string, I'm not going to dup it with
each change just to say "How cute, I'm using invariant all the way!".
I'll do all the changes on a mutable string, and then return either a
mutable, const, or invariant string, as appropriate to what makes sense
in the code.
--
Bruno Medeiros - Software Developer, MSc. in CS/E graduate
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Sorry to have provoked you Walter, but thanks for your reply.
On a tiny example such as yours, where you can see everything that is
going on at a glance, such as where strings come from and where they are
going, there isn't any point to immutable strings. You're right about that.
Well obviously the example was trivial to concentrate attention upon the
issue I was having.
It's real easy to lose track of who owns a string, who else has references to
the string, who has rights to change the string and who doesn't.
The keyword in there is "who". The problem is that you are pessimising the
entire language, once rightly famed for its performance, for *all* users.
For the notional convenience of those few writing threaded applications.
Now don't go taking that the wrong way. In other circles, I am known as
"Mr. Threading". At least for my advocacy of them, if not my expertise.
Though I have been using threads for a relatively long time, going way
back to pre-1.0 OS/2 (then known internally as CP/DOS). Only mentioned to
show I'm not in the "thread is spelt f-o-r-k" camp.
For example, you're changing the char[][] passed in to main(). What if one
of those strings is a literal in the read-only data section?
Okay. So that begs the question of how does runtime external data end up
in a read-only data section? Of course, it can be done, but that then begs
the question: why? But let's ignore that for now and concentrate on the
development of my application that wants to mutate one or more of those
strings.
The first time I try to mutate one, I'm going to hit an error, either
compile time or runtime, and immediately know, assuming the error message
is reasonably understandable, that I need to make a copy of the immutable
string into something I can mutate. A quick, *single* dup, and I'm away
and running.
Provided that I have the tools to do what I need that is. In this case,
and the entire point of the original post, that means a library of common
string manipulation functions that work on my good old fashioned char[]s
without my needing to jump through the hoops of neo-orthodoxy to use them.
But, as I tried to point out in the post to which you replied, the whole
'args' thing is a red herring. It was simply a convenient source of
non-compile-time data. I couldn't get the std.stream example to compile.
Apparently due to a bug in the v2 libraries--see elsewhere.
In this particular case, I turned to D in order to manipulate 125,000,000
x 500 to 2000 byte strings. A dump of an inverted index DB. I usually do
this kinda stuff in a popular scripting language, but that proved to be
rather too slow for this volume of data. Each of those records needs to go
through multiple mutations. From uppercasing of certain fields; the
complete removal of certain characters within substantial subsets of each
record; to the recalculation and adjustment of an embedded hex digest
within each record to reflect the preceding changes. All told, each record
may go through anything from 5 to 300 separate mutations.
Doing this via immutable buffers is going to create scads and scads of
short-lived, immutable sub-elements that will just tax the GC to hell and
impose unnecessary and unacceptable time penalties on the process. And I
almost certainly will have to go through the process many times before I
get the data in the ultimate form I need.
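One of the mutations described above, removing certain characters from a record, can be done on a mutable buffer without producing any garbage. The sketch below assumes a current D2 compiler; removeCharInPlace is a hypothetical helper, not a library function.

```d
import std.stdio;

// Hypothetical helper: deletes every occurrence of `unwanted` by
// compacting the buffer in place, and returns the shortened slice.
// No allocation happens; the result aliases the start of `buf`.
char[] removeCharInPlace(char[] buf, char unwanted)
{
    size_t write = 0;
    foreach (c; buf)          // the read position never falls behind `write`
        if (c != unwanted)
            buf[write++] = c;
    return buf[0 .. write];
}

void main()
{
    char[] record = "ab-cd--ef".dup;  // stands in for one record read from disk
    char[] cleaned = removeCharInPlace(record, '-');
    writeln(cleaned);
}
```

Done with invariant strings, each such pass would instead build a new string per record for the GC to collect.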
So what happens is code starts defensively making copies of the string
"just in case." I'll argue that in a complex program, you'll actually wind
up making far more copies than you will with invariant strings.
[from another post] I bet that, though, after a while they'll evolve to
eschew it in favor of immutable strings. It's easier than arguing about it
You are so wrong here. I spent 2 of the worst years of my coding career
working in Java, and ended up fighting it all the way. Some of that
was due to their sudden re-invention of major parts of the system
libraries in completely incompatible ways when the transition from (from
memory) 1.2 to 1.3 occurred--and being forced to make the change because
of the near total abandonment of support or bug fixing for the 'old
libraries'. But another big part of the problem was the endless complexities
involved in switching between the String type and the StringBuffer type.
Please learn from history. Talk to (experienced) Java programmers. I mean
real working stiffs, not OO-purists from academia. Preferably some that
have experience of other languages also. It took until v1.5 before the
performance of Java--and the dreaded GC pregnant pause--finally reached a
point where Java performance for manipulating large datasets was both
reasonable, and more importantly, reasonably deterministic. Don't make
their mistakes over.
Too many times in the last thirty years I've seen promising, pragmatic
software technologies tail off into academic obscurity because the primary
motivators suddenly "got religion". Whether OO dogma or functional purity
or whatever other flavour of neo-orthodoxy became flavour du jour, the
assumption that "they'll see the light eventually" has been the downfall
of many a promising start.
Just as the answer to the occasional hit-and-run death is not banning
cars, so fixing unintentional aliasing in threaded applications does not
lie in forcing all character arrays to be immutable.
For one reason, it doesn't stop there. Character arrays are just arrays
of numbers. Exactly the same problems arise with arrays of integers,
reals, associative arrays, etc. Imagine the costs of duplicating an entire
hash every time you add a new key or alter a value. The penalty for each
update grows with the size of the hash (array of ints, longs, reals ...).
And before you reject this notion on the basis that "I'd never do that",
what's the difference? Are strings any more vulnerable to the problems
invariance is meant to tackle than these other datatypes?
Try manipulating large datasets--images, DNA data, signal processing,
finite element analysis; any of the types of applications for which
multi-threading isn't just a way to allow the program to do something useful
while the user decides which button to click--in any of the "referentially
transparent" languages that are concurrency capable and see the hoops you
have to leap through to achieve anything like decent performance. Eg.
Haskell Unsafe* library routines (Basically, abandon referential
transparency for this data so that we can get something done in a
reasonable time frame!). Look for "If you can match 1-core C speed using
4-core Haskell parallelism without "unsafe pseudo-C in Haskell" trickery,
I will be impressed. ..." in the following article:
http://reddit.com/r/programming/info/61p6f/comments/
The abandonment or deprecation of lvalue slices on string types is the
thin end of the wedge toward referential transparency and despite all the
academic hype and impressive (small scale) demos of the 'match made in
heaven' that is 'referential transparency & concurrency', try to seek out
real-world examples of the combination running in real-world environments.
I.e. where someone other than the tax-payer of whatever country is paying
for the development, and the time pressure to obtain the results is a
little more demanding than a thesis submission date. You'll find them
very conspicuous by their absence.
Such ideas look great on paper, in the heady world of ideal Turing
Machines with unlimited length tapes (unbounded memory). But once you
bring them back to the real world of finite RAM, fragmentable heaps and
GC, they become impractical. Unworkable for real data sets in real time.
Don't feel the need to argue this on-forum. If it hasn't persuaded you
that forcing invariance upon one datatype, through providing a string
library that only works with invariant strings, will do little to address
the problems it attempts to solve, then I doubt further discussion will.
Please return to the pragmatism that so stood out in your early visions
for D and abandon this folly before, as with so many of the follies of the
gentleman academic of yore, it becomes a life-long quest ending up as a
memorial or tombstone.
Cheers, b.
--
Apr 28 2008
Sean Kelly <sean invisibleduck.org> writes:
== Quote from Me Here (p9e883002 sneakemail.com)'s article
Don't feel the need to argue this on-forum. If it hasn't persuaded you
that forcing invariance upon one datatype, through providing a string
library that only works with invariant strings, will do little to address
the problems it attempts to solve, then I doubt further discussion will.
There's always Tango :p
Please return to the pragmatism that so stood out in your early visions
for D and abandon this folly before, as with so many of the follies of the
gentleman academic of yore, it becomes a life-long quest ending up as a
memorial or tombstone.
As a point of interest, this quote is at the top of the DigitalMars D page:
"It seems to me that most of the "new" programming languages fall into one
of two categories: Those from academia with radical new paradigms and those
from large corporations with a focus on RAD and the web. Maybe it's time for a
new language born out of practical experience implementing compilers." --
Michael
Sean
Apr 28 2008
Walter Bright <newshound1 digitalmars.com> writes:
Me Here wrote:
Just as the answer to the occasional hit-and-run death is not banning
cars, so fixing unintentional aliasing in threaded applications does not
lie in forcing all character arrays to be immutable.
D does not force all character arrays to be immutable. You can use
mutable ones by declaring them as:
char[]
Reference types all come in 3 flavors: mutable, read-only-view-of (i.e.
const) and invariant.
Just as the answer to the occasional hit-and-run death is not banning
cars, so fixing unintentional aliasing in threaded applications does not
lie in forcing all character arrays to be immutable.
D does not force all character arrays to be immutable. You can use mutable
ones by declaring them as:
char[]
Reference types all come in 3 flavors: mutable, read-only-view-of (i.e.
const) and invariant.
Well no, but having the string libraries only accept and return invariant
strings amounts to much the same thing.
I'm disappointed that's the only point from my post worthy of reaction :(
--
Apr 28 2008
Walter Bright <newshound1 digitalmars.com> writes:
Me Here wrote:
Walter Bright wrote:
Me Here wrote:
Just as the answer to the occasional hit-and-run death is not
banning cars, so fixing unintentional aliasing in threaded
applications does not lie in forcing all character arrays to be
immutable.
D does not force all character arrays to be immutable. You can use
mutable ones by declaring them as:
char[]
Reference types all come in 3 flavors: mutable, read-only-view-of
(i.e. const) and invariant.
Well no, but having the string libraries only accept and return
invariant strings amounts to much the same thing.
I agreed with Janet's proposal to create a parallel set of routines that
worked on mutable strings.
I'm disappointed that's the only point from my post worthy of reaction :(
It appeared to me to be based on the assumption that D forced all
character arrays to be invariant.
I'm disappointed that's the only point from my post worthy of reaction :(
It appeared to me to be based on the assumption that D forced all character
arrays to be invariant.
Well no. It also went on to counter the idea that we're all going to come
around to your way of thinking on this in short order.
And to attempt to dispel the idea that the provision of immutable strings,
without doing the same for all the other datatypes, is going to fix anything
major.
The exact same problems you describe for character arrays exist for int
arrays and uint arrays and....hashes of every flavour.
Fixing one, if fixing them is what this does, without also fixing all the
others, just moves the goal posts (a little).
If a piece of code needs to know that the subject of a reference (string, int
array, hash, whatever), isn't going to change,
it is (and should be) *its responsibility* to ensure that--by taking a private
copy.
Burdening all code with the costs of immutability just in case someone is
vulnerable to its mutation, *and* is too lazy to take a copy,
seems like making everyone wear condoms in case someone might have sex. And
doing it for just one type of array when they all suffer
from the same problem doesn't seem likely to address the problems of unwanted
pregnancies.
I agreed with Janet's proposal to create a parallel set of routines that
worked on mutable strings.
Sure. Sometime soon we will have a mutable string capable library again, and
then we'll see how beneficial immutable strings really are
on the basis of how many people make use of them.
But that doesn't address the issue of the salience of the reasoning for having
them in the first place. Or the costs of using them in terms of
heap fragmentation, additional GC runs, destruction of cache coherency, etc.
etc. etc.
--
Apr 28 2008
Walter Bright <newshound1 digitalmars.com> writes:
Me Here wrote:
If a piece of code needs to know that the subject of a reference
(string, int array, hash, whatever), isn't going to change, it is
(and should be) *its responsibility* to ensure that--by taking a
private copy.
There are two ways of doing it. One is COW, where those who make the
change make the copy. The other way doesn't have a name, but it's making
a copy "just in case" someone else might mutate it. I think you're
proposing the latter. Invariant strings is a way of enforcing COW,
rather than relying on documentation.
There's no doubt you can make JIC work successfully. I've used it myself
for decades. But I always find myself expending effort trying to
optimize away those copies, and so find it more productive to go the
other way and use COW.
While I am comfortable using COW with mutable strings, the many many
discussions of it in this forum made it clear that most would like to
have some compiler help with it. Invariant strings fit the bill nicely.
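The distinction between the two strategies can be sketched as follows. This is only an illustration with hypothetical names (kept, remember, mutated); it uses the string alias so that it compiles regardless of whether the compiler spells the qualifier invariant (as in this thread) or immutable (as in later D2 releases).

```d
import std.stdio : writeln;

string kept;    // hypothetical long-lived reference held by some module

// With an invariant string the holder needs no defensive "just in case"
// copy: the compiler guarantees nobody can change the characters.
void remember(string s)
{
    kept = s;
}

// Copy-on-write: the party that wants to change the data pays for the copy.
char[] mutated(string s)
{
    char[] copy = s.dup;    // the one and only copy
    copy[0] = 'X';
    // s[0] = 'X';          // would not compile: s is not mutable
    return copy;
}

void main()
{
    string original = "hello";
    remember(original);
    char[] changed = mutated(original);
    writeln(kept);      // prints hello
    writeln(changed);   // prints Xello
}
```

With plain char[] parameters the same program compiles without the .dup, and remember would then have to copy if it needed a guarantee that the data stays put.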
Okay Walter,
This will be my last word on the subject. When I posted the headpost of
this thread, I had no idea what I was getting into.
I've since taken the time to catch up on some of the history, along with
that of the Phobos/Tango debate. See below.
As I see it, both mechanisms are "just in case". The difference is that
with invariants and COW, everyone who /doesn't/ need immutability has
to copy so that the one person who does need it, if they indeed exist at
all which we have no way of knowing, doesn't have to copy.
The other way, the one person who knows they need immutability has to
copy, and everyone else simply ignores the issue.
If you're given a reference and you need it not to change, take a copy and
hide it away. Then it cannot.
If you're given a reference and you don't care if it changes, (or you want
to be apprised of any changes), use it, keep it or throw it away.
Expecting everyone else to take extra precautions, always, "just in case",
so that you don't have to take precautions even when you know
you need to, seems the height of selfishness.
STM (from elsewhere)
This whole thing of invariance and concurrency seems to be aimed at
enabling the use of COW.
Wouldn't that be more of a copy-swap thing? And isn't STM copy-swap at
its core?
I'm not sure that I follow the question in context, or the meaning of
"copy-swap".
STM is an alternative to locking for concurrency control. Essentially,
each reader of known (marked) shared state gets a copy of the state. And
an internal copy is made.
If that reader later attempts to write back to the shared state, its
current value is compared against the internal copy taken when read.
If they are disparate, the code that is attempting to write gets rolled
back to the read point and is given the updated value (and another
internal copy is taken).
Lather, rinse, repeat until the copy and current values are the same, then
commit the change and continue.
Fairly expensive, and only works for code that can be rolled back (ie.
referentially transparent code).
Useless for anything that interacts with the outside world. Eg. writes to
the screen, or a file, or the file system,
or reads from a non-rewindable source like a port or socket or the terminal.
Efficient if you live in a referentially transparent world--all data
exists at compile time; no interaction with the outside world.
Next to useless otherwise. You still need locking or some other mechanism
to deal with external state.
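The read, compare, retry, commit cycle described above can be sketched on a single shared word using core.atomic. This is not a full STM (there is no multi-variable transaction and no general rollback), only the optimistic retry loop at its core; the names balance and deposit are hypothetical.

```d
import core.atomic : atomicLoad, cas;
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writeln;

shared int balance = 100;   // the marked shared state

void deposit(int amount)
{
    int seen, updated;
    do
    {
        seen = atomicLoad(balance);     // take a private snapshot
        updated = seen + amount;        // compute on the snapshot
        // Commit only if nobody changed the value since the snapshot;
        // otherwise go around again with a fresh read.
    } while (!cas(&balance, seen, updated));
}

void main()
{
    // Many concurrent writers; none of them takes a lock.
    foreach (i; parallel(iota(1000)))
        deposit(1);
    writeln(atomicLoad(balance));       // prints 1100
}
```

The loop body is safe to repeat only because it has no side effects outside the snapshot, which is the rollback restriction mentioned above.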
If that describes copy-swap then yes. Else no :)
Phobos vs. Tango
I definitely don't want the dead weight of pointless OO wrappers or deeply
nested hierarchies. Nor the "everything must be OO" philosophy.
Once I regain access to std.string for my char[]s, (and a simple,
expectation conformant rand() function :), I'll be happy.
Till then, I'll get outta yer hair and go back to trying to process my
140GB of data using D1.
(
Which is a shame because I really like some of the language changes for
D2. The extension to foreach for processing files looks cool.
I'd also vote for the convergence of for/foreach if that was possible
without moving away from a context-free grammar.
I haven't had occasion to explore the laziness facilities yet, but they
sound cool.
Ditto the templating.
)
Despite our difference on the issue above, please add my goodwill and
plaudits to your trophy box for your vision and provision of D.
Cheers, b.
--
Apr 29 2008
Walter Bright <newshound1 digitalmars.com> writes:
Me Here wrote:
If that describes copy-swap then yes. Else no :)
copy-swap is what lock free algorithms rely on for updating a data
structure. It's at the root of STM, and even has its own TLA, CAS (Copy
And Swap).
== Quote from Walter Bright (newshound1 digitalmars.com)'s article
Me Here wrote:
If that describes copy-swap then yes. Else no :)
structure. It's at the root of STM, and even has its own TLA, CAS (Copy
And Swap).
I believe CAS actually stands for "compare and swap" or "compare and set"
depending on who you talk to. RCU is probably a more popular algorithm
for copy and swap--it's used in the Linux kernel quite a bit. It stands for
"read, copy, update," I believe.
Sean
From the literature I found, CAS is (was originally) the name of the opcode
used on a Sun microprocessor to conditionally and atomically swap the contents
of two words of memory (or maybe memory and register).
It also mentions a CASX opcode, and a LL/SC (Load Linked / Store
Conditional) pairing that can be used as alternatives.
Cheers, b.
--
Apr 29 2008
Sean Kelly <sean invisibleduck.org> writes:
== Quote from Me Here (p9e883002 sneakemail.com)'s article
Sean Kelly wrote:
== Quote from Walter Bright (newshound1 digitalmars.com)'s article
Me Here wrote:
If that describes copy-swap then yes. Else no :)
structure. It's at the root of STM, and even has its own TLA, CAS (Copy
And Swap).
I believe CAS actually stands for "compare and swap" or "compare and set"
depending on who you talk to. RCU is probably a more popular algorithm
for copy and swap--it's used in the Linux kernel quite a bit. It stands for
"read, copy, update," I believe.
Sean
used on a Sun microprocessor to conditionally and atomically swap the contents
of two words of memory (or maybe memory and register).
It also mentions a CASX opcode, and a LL/SC (Load Linked / Store
Conditional) pairing that can be used as alternatives.
Yeah, LL/SC is pretty cool. The hardware transactional memory proposals I've
seen are like LL/SC on steroids. Bit more flexible than CAS, but either works.
Sean
copy-swap is what lock free algorithms rely on for updating a data structure.
It's at the root of STM, and even has its own TLA, CAS (Copy And Swap).
Ah! As in compare & exchange (cmpxchg & cmpxchg8b) x86 opcodes. I wasn't
thinking at the m/code level.
Cheers, b.
--
Apr 29 2008
Sean Kelly <sean invisibleduck.org> writes:
== Quote from Me Here (p9e883002 sneakemail.com)'s article
Phobos vs. Tango
I definitely don't want the dead weight of pointless OO wrappers or deeply
nested hierarchies. Nor the "everything must be OO" philosophy.
Once I regain access to std.string for my char[]s, (and a simple,
expectation conformant rand() function :), I'll be happy.
Please don't discount Tango based on what has been said about it in this
forum. I know for a fact that Walter, for example, has never even looked
at Tango (or he hadn't as of a few weeks ago anyway). In truth, the percentage
of classes to functions in Tango is roughly the same as in Phobos... Tango is
just a much larger library. If you're interested in algorithms and string
operations, I suggest looking at tango.core.Array and tango.text.*. The former
is basically C++'s <algorithm> retooled for D arrays, and the latter holds all
the string-specific routines in Tango.
Sean
== Quote from Me Here (p9e883002 sneakemail.com)'s article
Phobos vs. Tango
I definitely don't want the dead weight of pointless OO wrappers or deeply
nested hierarchies. Nor the "everything must be OO" philosophy.
Once I regain access to std.string for my char[]s, (and a simple,
expectation conformant rand() function :), I'll be happy.
Please don't discount Tango based on what has been said about it in this
forum. I know for a fact