digitalmars.D - .NET on a string

Walter Bright (2/2) Mar 16 2009 Cristi talks about adapting D strings to .NET.

Georg Wrede (9/13) Mar 17 2009 Actually, meant to ask earlier, but

Steven Schveighoffer (39/51) Mar 17 2009 There has been. But there are very good reasons to keep arrays and slic...

Robert Jacques (35/92) Mar 17 2009 Well, if it helps, my roommate and I just independently came up with thi...
Cristian Vlasceanu (37/45) Mar 21 2009 I am afraid that there's a fallacy in your argument: substrings ARE stri...

Kagamin (2/10) Mar 22 2009 There was another suggestion - to mark arrays: int[] is a slice, int[new...

Christopher Wright (3/15) Mar 22 2009 The const system is largely a frontend issue. I don't think that will
Michel Fortin (13/26) Mar 22 2009 I like that better, because the less verbose type is a slice and that's

bearophile (5/8) Mar 22 2009 It seems that LLVM (and GCC) too are able to support them well enough. I...
Michel Fortin (33/42) Mar 22 2009 As I read it have no problem integrating with the writing CLR code to
Steven Schveighoffer (28/85) Mar 23 2009 I thought one of the benefits of having immutable strings is that

Robin KAY (16/21) Mar 23 2009 Sun's Java implements the String class as below, so that sub-strings can
Denis Koroskin (2/96) Mar 23 2009 I recall a discussion on Java forum where a user was getting a huge "mem...
Christopher Wright (6/11) Mar 23 2009 I checked the source code for ClassPath's string implementation about a
Cristian Vlasceanu (42/55) Mar 23 2009 Fair enough, I meant substrings in more of a C++ way. Perhaps discussing...

Steven Schveighoffer (28/62) Mar 24 2009 OK, I didn't think of that implementation.

Cristian Vlasceanu (4/7) Mar 24 2009 What I meant is that porting the libraries is not in the scope of my pro...

Steven Schveighoffer (5/8) Mar 24 2009 There is not already a meaning for T[new], it is a syntax error. There ...

Cristian Vlasceanu (14/26) Mar 24 2009 Yes, but the current, existing meaning will be preserved:

Steven Schveighoffer (35/56) Mar 26 2009 OK, I'm not sure I understood your original proposal, before I respond

Cristian Vlasceanu (7/66) Mar 29 2009 At a first look yes I think the assertion will fail, but not if you decl...

bearophile (4/8) Mar 24 2009 Can you explain me what are the purposes of D.net?

Cristian Vlasceanu (8/13) Mar 24 2009 You mean other than the tremendous fun that I am having writing a compil...

dsimcha (50/53) Mar 17 2009 No rigorous discussion. This seems to be one of those thorny issues tha...

bearophile (12/14) Mar 17 2009 For the second time I have put a bug like this:

bearophile (3/4) Mar 17 2009 Sorry, I meant:
Steven Schveighoffer (4/9) Mar 17 2009 That will affect the GC when it is scanning for pointers.

Georg Wrede (9/12) Mar 17 2009 At times I find myself wondering if the D string (i.e. invariant char[])...
Kagamin (2/4) Mar 25 2009 framework classes are not const qualified.

Walter Bright <newshound1 digitalmars.com> writes:

Cristi talks about adapting D strings to .NET.

http://www.reddit.com/r/programming/comments/84urf/net_on_a_string/?already_submitted=true

Mar 16 2009

Georg Wrede <georg.wrede iki.fi> writes:

Walter Bright wrote:
 Cristi talks about adapting D strings to .NET.
 
 http://www.reddit.com/r/programming/comments/84urf/net_on_a_string/?al
eady_submitted=true 
 

Actually, meant to ask earlier, but

"[...in .NET,] System.Array and System.ArraySegment are distinct, 
unrelated types. In D arrays slices are indistinguishable from arrays 
and this creates the problems that I mentioned in the interview."

Is this discussed somewhere?

The idea of slices and arrays being distinct types does seem to have 
advantages. I've seen a couple of mentions of this lately, but has there 
been a *rigorous* discussion?

Mar 17 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 17 Mar 2009 10:59:21 -0400, Georg Wrede <georg.wrede iki.fi> wrote:

Walter Bright wrote:
Cristi talks about adapting D strings to .NET.

http://www.reddit.com/r/programming/comments/84urf/net_on_a_string/?already_submitted=true

Actually, meant to ask earlier, but

"[...in .NET,] System.Array and System.ArraySegment are distinct,
unrelated types. In D arrays slices are indistinguishable from arrays
and this creates the problems that I mentioned in the interview."

Is this discussed somewhere?

The idea of slices and arrays being distinct types does seem to have
advantages. I've seen a couple of mentions of this lately, but has there
been a *rigorous* discussion?

There has been. But there are very good reasons to keep arrays and slices

string. It allows iterative patterns such as:

str = str[1..$];

The main problem comes form having a substring overwrite data in another
string via appending. From what I can tell, that is the *only* issue with
arrays and slices being the same type. This problem can be solved in
other ways. I've put forth 2 solutions that as far as I know were not
proven to be infeasible, but which never received any attention in the NG
from Walter or Andrei.

There was mention of a separate T[new] type, before my time with D, but I
think you still have the same aliasing problems there unless you zero out
the source instance on assignment, or simply disallow assigning a T[new]
from another T[new], which can affect the design of code, and make certain
iterative patterns difficult if not impossible.

The two solutions I put forth are:

1. Storing the requested length of an allocated block in the GC. This not
only allows for more intuitive appending, but could also help the GC when
scanning for pointers (you might not have to zero out the block first).
This proposal received zero responses.

2. Storing the requested length of an allocated array at the front of the
array. I had a hackish scheme to use the most significant bit in the
length field to identify if a slice is just beyond the allocated length.
Therefore, an append operation would first check if it is the first
element in a block, and if so check the allocated length before deciding
whether to append in place or allocate a new block. There were some
questions on the proposal, but I don't believe anyone found any killer
problems with it. It has advantages and disadvantages over the first
proposal and current implementation. The main advantage is that the
append code does not have to query the GC to get the requested length.

Aside from the append problem, I see no other drawbacks to having strings
the way they are.

References:

Proposal 1:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=63146

Proposal 2:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=77437

-Steve

Mar 17 2009

"Robert Jacques" <sandford jhu.edu> writes:

On Tue, 17 Mar 2009 13:03:56 -0400, Steven Schveighoffer
<schveiguy yahoo.com> wrote:

On Tue, 17 Mar 2009 10:59:21 -0400, Georg Wrede <georg.wrede iki.fi>
wrote:

Walter Bright wrote:
Cristi talks about adapting D strings to .NET.

http://www.reddit.com/r/programming/comments/84urf/net_on_a_string/?already_submitted=true

Actually, meant to ask earlier, but

Is this discussed somewhere?

The idea of slices and arrays being distinct types does seem to have
advantages. I've seen a couple of mentions of this lately, but has
there been a *rigorous* discussion?

There has been. But there are very good reasons to keep arrays and

as a string. It allows iterative patterns such as:

str = str[1..$];

The main problem comes form having a substring overwrite data in another
string via appending. From what I can tell, that is the *only* issue
with arrays and slices being the same type. This problem can be solved
in other ways. I've put forth 2 solutions that as far as I know were
not proven to be infeasible, but which never received any attention in
the NG from Walter or Andrei.

There was mention of a separate T[new] type, before my time with D, but
I think you still have the same aliasing problems there unless you zero
out the source instance on assignment, or simply disallow assigning a
T[new] from another T[new], which can affect the design of code, and
make certain iterative patterns difficult if not impossible.

The two solutions I put forth are:

1. Storing the requested length of an allocated block in the GC. This
not only allows for more intuitive appending, but could also help the GC
when scanning for pointers (you might not have to zero out the block
first). This proposal received zero responses.

Well, if it helps, my roommate and I just independently came up with this
idea. (So is this great minds think a like or that simple ones seldom
differ?)

2. Storing the requested length of an allocated array at the front of
the array. I had a hackish scheme to use the most significant bit in
the length field to identify if a slice is just beyond the allocated
length. Therefore, an append operation would first check if it is the
first element in a block, and if so check the allocated length before
deciding whether to append in place or allocate a new block. There were
some questions on the proposal, but I don't believe anyone found any
killer problems with it. It has advantages and disadvantages over the
first proposal and current implementation. The main advantage is that
the append code does not have to query the GC to get the requested
length.

You mentioned being un-sure of the threading and alignment issues of your
proposal in [2]. I see both a lock and lock-free solution:

Lock solution:
As to append to an array, one has to know the capacity, which requires
taking the GC lock. If, as in [1], the used length and capacity are stored
in the GC together, one could move the extension operation to the GC. i.e.
given a block, the used length in that block and an array (consisting of a
pointer into that block and a length) it is straight forward to calculate
if that array could be lengthen and if not, allocate a new one. The caller
can then do a full or partial copy as appropriate. This avoids the
alignment issues (as no new information is stored in the block) and
maintains the maximum array size (although this is a fairly minor corner
case)

Lock-free solution
This solution would make use of the 'empty' 16-byte block that proceeds
each allocated block. I'm assuming this is usable, but if it's not, an
extra 16-byte block would have to be added to each heap array to maintain
proper alignment. This solution would cache the array capacity and its
used length just prior to the start of the array (i.e. in the 'empty'
block). This is similar to [2], except that in [2] you still have to query
the GC for capacity. Again, a bit flag in the array length determines
slice / non-slice status. Regarding multi-threading issues: reading
capacity suffers from publication safety issues (I think) but it might
get cleared up by the releasing of the GC allocation lock. Otherwise, it
would need to be fenced. As for the length, one can use a single compare
and swap (comparing the GC's length with the local array's length and
setting the new desired length). This also has the advantage of avoiding
the GC when doing array appending, which decreases the need for separate
ArrayBuldier classes. (Of course, once shared/local is introduced, the
expensive fences and CAS instructions can be limited appropriately.)

Aside from the append problem, I see no other drawbacks to having
strings the way they are.

References:

Proposal 1:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=63146

Proposal 2:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=77437

-Steve

Mar 17 2009

"Cristian Vlasceanu" <cristian zerobugs.org> writes:

 The idea of slices and arrays being distinct types does seem to have 
 advantages. I've seen a couple of mentions of this lately, but has there 
 been a *rigorous* discussion?

 There has been.  But there are very good reasons to keep arrays and slices 

 string.  It allows iterative patterns such as:

 str = str[1..$];

I am afraid that there's a fallacy in your argument: substrings ARE strings: 
1) they are full copies of the characters in the given range, and 2) once 
the substring is created it goes its own merry way (i.e. it does not keep 
track of any relationship to the "original" string). Slices ARE NOT arrays. 
Slices are more like "views" into the original array. It is like the 
difference between the icon and the saint / deity that it represents.

Another point that I have a hard time getting accross (even to the language 
heavy-weights) is that just because it is easy to represent arrays and 
slices seemlessly IN THE PARTICULAR CASE OF THE DIGITAL MARS BACKEND it does 
not mean it is going to work as smooth and seamless in other systems. The 
.NET backend that I am working on is the case in point. If instead of using 
.NET built-in arrays I craft my own representation (to stay compatible with 
the DMD's way of doing array and slices) then I give up interoperability 
with other languages -- and that would defeat the point of doing D on .NET 
to begin with.

Your proposed solution is interesting but implementation-specific. I am 
afraid that I cannot not use it with .NET (I just generate IL code, which is 
more high-level than "ordinary" assembly code).

I passed a proposal of my own to Walter and Andrei, and that is to have D 
coders explicitly state the intent of using a slice with the "ref" keyword; 
"ref" is already a legal token in D (at least in 2.0) albeit it is only 
valid in the context of a parameter list, or foreach argument list. It is 
not legal to say "ref int j = i;" in a declaration, for example. But it is a 
trivial change in the parser (I have implemented this change as a proof of 
concept / language extension research) to allow ref (just for slices): "ref 
int[] s = a[1..2];" other than in parameter and foreach arg lists.

I think that "ref" makes sense, because slices, like I said, are 
conceptually views (or references) into the "true" arrays. This simple 
change would a) make D code more self-documenting, and it would give a very 
powerful hint to the compiler. Also, the "ref" semantics is backwards 
compatbile with the exisiting cases where "ref" is allowed.

But there may be some yet unforseen side-effects (when dealing with generic 
code, for instance) that need to be further investigated before I push for 
this solution at full speed. If I find it to work suitably well I have no 
problem in making my own D 2.0 / .NET dialect...

Cheers,
Cristian

Mar 21 2009

Kagamin <spam here.lot> writes:

Cristian Vlasceanu Wrote:

 I passed a proposal of my own to Walter and Andrei, and that is to have D 
 coders explicitly state the intent of using a slice with the "ref" keyword; 
 "ref" is already a legal token in D (at least in 2.0) albeit it is only 
 valid in the context of a parameter list, or foreach argument list. It is 
 not legal to say "ref int j = i;" in a declaration, for example. But it is a 
 trivial change in the parser (I have implemented this change as a proof of 
 concept / language extension research) to allow ref (just for slices): "ref 
 int[] s = a[1..2];" other than in parameter and foreach arg lists.

There was another suggestion - to mark arrays: int[] is a slice, int[new] is an
array. I think, you'll have problems with const system, .net doesn't have one.

Mar 22 2009

Christopher Wright <dhasenan gmail.com> writes:

Kagamin wrote:
 Cristian Vlasceanu Wrote:
 
 I passed a proposal of my own to Walter and Andrei, and that is to have D 
 coders explicitly state the intent of using a slice with the "ref" keyword; 
 "ref" is already a legal token in D (at least in 2.0) albeit it is only 
 valid in the context of a parameter list, or foreach argument list. It is 
 not legal to say "ref int j = i;" in a declaration, for example. But it is a 
 trivial change in the parser (I have implemented this change as a proof of 
 concept / language extension research) to allow ref (just for slices): "ref 
 int[] s = a[1..2];" other than in parameter and foreach arg lists.

 
 There was another suggestion - to mark arrays: int[] is a slice, int[new] is
an array. I think, you'll have problems with const system, .net doesn't have
one.

The const system is largely a frontend issue. I don't think that will 
cause many issues.

Mar 22 2009

Michel Fortin <michel.fortin michelf.com> writes:

On 2009-03-22 05:08:55 -0400, Kagamin <spam here.lot> said:

 Cristian Vlasceanu Wrote:
 
 I passed a proposal of my own to Walter and Andrei, and that is to have D
 coders explicitly state the intent of using a slice with the "ref" keyword;
 "ref" is already a legal token in D (at least in 2.0) albeit it is only
 valid in the context of a parameter list, or foreach argument list. It is
 not legal to say "ref int j = i;" in a declaration, for example. But it is a
 trivial change in the parser (I have implemented this change as a proof of
 concept / language extension research) to allow ref (just for slices): "ref
 int[] s = a[1..2];" other than in parameter and foreach arg lists.

 
 There was another suggestion - to mark arrays: int[] is a slice, 
 int[new] is an array.

I like that better, because the less verbose type is a slice and that's 
what you need to pass around most of the time.

But I think it'd be even better (perhaps not for .NET compatibility, 
but for D in general) to just remove the need for having a separate 
array type. For instance, we could have the GC keep track of the array 
bounds in addition to the allocated size, allowing it to determine if 
it can grow a slice in place by checking if the slice points at the 
end. I'm sure someone has already proposed this at some point.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Mar 22 2009

bearophile <bearophileHUGS lycos.com> writes:

Cristian Vlasceanu:
 arrays and 
 slices seemlessly IN THE PARTICULAR CASE OF THE DIGITAL MARS BACKEND it does 
 not mean it is going to work as smooth and seamless in other systems.

It seems that LLVM (and GCC) too are able to support them well enough. I think
D was never designed to be a VM-based language, it is supposed to be a system
language after all (while dotnet isn't) so it's normal to find some troubles
when you try to implement it on a VM.
It's wrong to worsen the syntax/semantics of D just to simplify a dotnet port.

Bye,
bearophile

Mar 22 2009

Michel Fortin <michel.fortin michelf.com> writes:

On 2009-03-22 01:31:28 -0400, "Cristian Vlasceanu" 
<cristian zerobugs.org> said:

 Another point that I have a hard time getting accross (even to the language
 heavy-weights) is that just because it is easy to represent arrays and
 slices seemlessly IN THE PARTICULAR CASE OF THE DIGITAL MARS BACKEND it does
 not mean it is going to work as smooth and seamless in other systems. The
 .NET backend that I am working on is the case in point. If instead of using
 .NET built-in arrays I craft my own representation (to stay compatible with
 the DMD's way of doing array and slices) then I give up interoperability
 with other languages -- and that would defeat the point of doing D on .NET
 to begin with.

As I read it have no problem integrating with the writing CLR code to 
make D semantics work, but you struggle with seamless compatibility 
with the framework (using .NET predefined classes).

But really, no compatibility? In the D/Objective-C bridge, I have a 
bunch of conversions functions to and from NSArray. You can surely do 
the same in .NET: have the D array type, and the .NET array type, and 
convert from one another when necessary.

Sure, the D/Objective-C bridge isn't a backend based on the Objective-C 
runtime, but to the user it almost is (D and Objective-C objects look 
the same and are interchangeable). If I was pushing that a little 
further and writing a compiler using Objective-C runtime to implement D 
classes, I still wouldn't be able to use NSArray to represents D arrays 
because they have different caracteristics, limitations, and 
behaviours. But I don't see that as a flaw.

That D arrays make different tradoffs than NSArray or .NET arrays isn't 
a problem that should be corrected: it's something that makes D what it 
is as it enables some patterns that wouldn't work otherwise. Sure, 
those different tradoffs makes things harder to bridge with arrays 
defined in some existing frameworks (both Cocoa and .NET), but if they 
make things easier and nicer inside the D language I'm all for keeping 
them.

So when using the D/Objective-C bridge, there are two array types and 
you use the one most appropriate for the job. And you can convert from 
one to the other when necessary.

Perhaps you should look at means of easying the conversion process when 
passing D arrays to functions expecting .NET arrays instead of fighting 
to unify two different array concepts into one.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Mar 22 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sun, 22 Mar 2009 01:31:28 -0400, Cristian Vlasceanu wrote:

 The idea of slices and arrays being distinct types does seem to have
 advantages. I've seen a couple of mentions of this lately, but has  
 there
 been a *rigorous* discussion?

 There has been.  But there are very good reasons to keep arrays and  
 slices

 string.  It allows iterative patterns such as:

 str = str[1..$];

 I am afraid that there's a fallacy in your argument: substrings ARE  
 strings:
 1) they are full copies of the characters in the given range, and 2) once
 the substring is created it goes its own merry way (i.e. it does not keep
 track of any relationship to the "original" string). Slices ARE NOT  
 arrays.
 Slices are more like "views" into the original array. It is like the
 difference between the icon and the saint / deity that it represents.

I thought one of the benefits of having immutable strings is that  
substrings were just pointers to slices of the original data in Java and  
.NET.  So every time I do a substring in Java and .NET, it creates a copy  
of the data?  That seems very wasteful, especially when the data is  
immutable...

Note I have no intimate knowledge of the inner workings of Java and .Net,  
I just went on what logically makes sense.

Your statement that slices are not arrays depends on your definition of  
slice and array.  To me, slices and arrays are identical.  Slices are  
simply a smaller set of the array.  It's like saying subsets are a  
different type than sets.  I suppose it depends on what language you  
learned about slices (for me it was D).

The only issue I see is the builtin append operation.  Everything else has  
a clean solution already.

 Another point that I have a hard time getting accross (even to the  
 language
 heavy-weights) is that just because it is easy to represent arrays and
 slices seemlessly IN THE PARTICULAR CASE OF THE DIGITAL MARS BACKEND it  
 does
 not mean it is going to work as smooth and seamless in other systems. The
 .NET backend that I am working on is the case in point. If instead of  
 using
 .NET built-in arrays I craft my own representation (to stay compatible  
 with
 the DMD's way of doing array and slices) then I give up interoperability
 with other languages -- and that would defeat the point of doing D on  
 .NET
 to begin with.

Fair enough, but I think restricting the design of D to cater to other  
back ends is probably not a huge driver for Walter.  It's not without  
precedent that people adapting languages to .NET have to introduce syntax  
changes to the language, no?  I'm thinking of C++.NET.

 Your proposed solution is interesting but implementation-specific. I am
 afraid that I cannot not use it with .NET (I just generate IL code,  
 which is
 more high-level than "ordinary" assembly code).

 I passed a proposal of my own to Walter and Andrei, and that is to have D
 coders explicitly state the intent of using a slice with the "ref"  
 keyword;
 "ref" is already a legal token in D (at least in 2.0) albeit it is only
 valid in the context of a parameter list, or foreach argument list. It is
 not legal to say "ref int j = i;" in a declaration, for example. But it  
 is a
 trivial change in the parser (I have implemented this change as a proof  
 of
 concept / language extension research) to allow ref (just for slices):  
 "ref
 int[] s = a[1..2];" other than in parameter and foreach arg lists.

 I think that "ref" makes sense, because slices, like I said, are
 conceptually views (or references) into the "true" arrays. This simple
 change would a) make D code more self-documenting, and it would give a  
 very
 powerful hint to the compiler. Also, the "ref" semantics is backwards
 compatbile with the exisiting cases where "ref" is allowed.

That is an interesting idea.  But I have a couple problems with it:

First, when I see ref int[] s, I think reference to an array, not this  
array references data from another array.
Second, your proposed default (not using ref) is to copy data everywhere,  
which is not good for performance.  Most of the time, arrays are passed  
without needing to "own" the data, so making copies everywhere you forgot  
to put ref would be hard to deal with.  It's also completely incompatible  
with existing code, which expects reference semantics without using ref.

-Steve

Mar 23 2009

Robin KAY <komadori gekkou.co.uk> writes:

Steven Schveighoffer wrote:
[snip]
 I thought one of the benefits of having immutable strings is that 
 substrings were just pointers to slices of the original data in Java and 
 .NET.  So every time I do a substring in Java and .NET, it creates a 
 copy of the data?  That seems very wasteful, especially when the data is 
 immutable...

Sun's Java implements the String class as below, so that sub-strings can
share their parent's character array. This is, AFAIK, an implementation
detail although I don't know of any implementation that differs.

The downside is that holding a reference to a sub-string will prevent
the parent from being collected and this is not an obvious side-effect
of substring(). Storing longer strings as ropes might be better.

class String
{
   char[] value;
   int offset;
   int count;
}

-- 
Robin KAY

Mar 23 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Mon, 23 Mar 2009 20:41:45 +0300, Steven Schveighoffer <schveiguy yahoo.com>
wrote:

 On Sun, 22 Mar 2009 01:31:28 -0400, Cristian Vlasceanu wrote:

 The idea of slices and arrays being distinct types does seem to have
 advantages. I've seen a couple of mentions of this lately, but has  
 there
 been a *rigorous* discussion?

 There has been.  But there are very good reasons to keep arrays and  
 slices

 string.  It allows iterative patterns such as:

 str = str[1..$];

 I am afraid that there's a fallacy in your argument: substrings ARE  
 strings:
 1) they are full copies of the characters in the given range, and 2)  
 once
 the substring is created it goes its own merry way (i.e. it does not  
 keep
 track of any relationship to the "original" string). Slices ARE NOT  
 arrays.
 Slices are more like "views" into the original array. It is like the
 difference between the icon and the saint / deity that it represents.

 I thought one of the benefits of having immutable strings is that  
 substrings were just pointers to slices of the original data in Java and  
 .NET.  So every time I do a substring in Java and .NET, it creates a  
 copy of the data?  That seems very wasteful, especially when the data is  
 immutable...

 Note I have no intimate knowledge of the inner workings of Java and  
 .Net, I just went on what logically makes sense.

 Your statement that slices are not arrays depends on your definition of  
 slice and array.  To me, slices and arrays are identical.  Slices are  
 simply a smaller set of the array.  It's like saying subsets are a  
 different type than sets.  I suppose it depends on what language you  
 learned about slices (for me it was D).

 The only issue I see is the builtin append operation.  Everything else  
 has a clean solution already.

 Another point that I have a hard time getting accross (even to the  
 language
 heavy-weights) is that just because it is easy to represent arrays and
 slices seemlessly IN THE PARTICULAR CASE OF THE DIGITAL MARS BACKEND it  
 does
 not mean it is going to work as smooth and seamless in other systems.  
 The
 .NET backend that I am working on is the case in point. If instead of  
 using
 .NET built-in arrays I craft my own representation (to stay compatible  
 with
 the DMD's way of doing array and slices) then I give up interoperability
 with other languages -- and that would defeat the point of doing D on  
 .NET
 to begin with.

 Fair enough, but I think restricting the design of D to cater to other  
 back ends is probably not a huge driver for Walter.  It's not without  
 precedent that people adapting languages to .NET have to introduce  
 syntax changes to the language, no?  I'm thinking of C++.NET.

 Your proposed solution is interesting but implementation-specific. I am
 afraid that I cannot not use it with .NET (I just generate IL code,  
 which is
 more high-level than "ordinary" assembly code).

 I passed a proposal of my own to Walter and Andrei, and that is to have  
 D
 coders explicitly state the intent of using a slice with the "ref"  
 keyword;
 "ref" is already a legal token in D (at least in 2.0) albeit it is only
 valid in the context of a parameter list, or foreach argument list. It  
 is
 not legal to say "ref int j = i;" in a declaration, for example. But it  
 is a
 trivial change in the parser (I have implemented this change as a proof  
 of
 concept / language extension research) to allow ref (just for slices):  
 "ref
 int[] s = a[1..2];" other than in parameter and foreach arg lists.

 I think that "ref" makes sense, because slices, like I said, are
 conceptually views (or references) into the "true" arrays. This simple
 change would a) make D code more self-documenting, and it would give a  
 very
 powerful hint to the compiler. Also, the "ref" semantics is backwards
 compatbile with the exisiting cases where "ref" is allowed.

 That is an interesting idea.  But I have a couple problems with it:

 First, when I see ref int[] s, I think reference to an array, not this  
 array references data from another array.
 Second, your proposed default (not using ref) is to copy data  
 everywhere, which is not good for performance.  Most of the time, arrays  
 are passed without needing to "own" the data, so making copies  
 everywhere you forgot to put ref would be hard to deal with.  It's also  
 completely incompatible with existing code, which expects reference  
 semantics without using ref.

 -Steve

I recall a discussion on Java forum where a user was getting a huge "memory
leak" because he was processing large xml files and stored some (rather small)
String values that were substrings inside those xml files. Since then I got an
impression that a substring of a String is just a slice into an original
string. But I agree that this is an implementation detail.

Mar 23 2009

Christopher Wright <dhasenan gmail.com> writes:

Steven Schveighoffer wrote:
 I thought one of the benefits of having immutable strings is that 
 substrings were just pointers to slices of the original data in Java and 
 .NET.  So every time I do a substring in Java and .NET, it creates a 
 copy of the data?  That seems very wasteful, especially when the data is 
 immutable...

I checked the source code for ClassPath's string implementation about a 
year ago. It copies to create a substring. I thought it was stupid.

On the other hand, ClassPath is an open source project and doesn't have 
hundreds of millions of dollars in backing, so I imagine Sun's String 
implementation is slightly smarter.

Mar 23 2009

"Cristian Vlasceanu" <cristian zerobugs.org> writes:

 I thought one of the benefits of having immutable strings is that 
 substrings were just pointers to slices of the original data in Java and 
 .NET.  So every time I do a substring in Java and .NET, it creates a copy 
 of the data?  That seems very wasteful, especially when the data is 
 immutable...

Fair enough, I meant substrings in more of a C++ way. Perhaps discussing 
slices / arrays by comparison with substrings / strings is a bad idea, 
because as you point out strings are immutable whilst arrays are not.

My main problem with slices is that when you a) append to them, or b) resize 
them (by assigning to the length property), and the new size goes past the 
bounds of the "original" array, then the slice gets "divorced" from the 
array, and from a light-weight "view" it gets promoted to a "first class" 
array (at least this is the behavior in 2.025).

This means that for arrays of value types, you can modify the original array 
indirectly via the outstanding slices, but only up until when either a) or 
b) occur, at runtime.

I would prefer to be able to see explicitly in the source code what is a 
"full array", and what is just a "view".

On the other hand, this is not a deal breaker, because I can always roll my 
own slice like this:

struct (T) ArraySlice {

private:

     T[] a; // reference to array

     int begin; // index where slice begins

     int end; // one past index where slice ends

public:

     T opIndex(size_t i) { return a[i]; }

      void opIndexAssign(size_t i, T val) { a[i] = val; }

     int length() { return end - begin; }

     // comment this function out to prevent resizing

     void length(int newLength) {

           end = begin + newlength;

           if (end > a.length) { end = a.length; }

                }

     // support foreach

     int opApply(int delegate(ref int) dg) {

           foreach (i;begin..end) {

                if (dg(a[i])) break;

           }

           return 0;

     }

}


 That is an interesting idea.  But I have a couple problems with it:

 First, when I see ref int[] s, I think reference to an array, not this 
 array references data from another array.
 Second, your proposed default (not using ref) is to copy data everywhere, 
 which is not good for performance.  Most of the time, arrays are passed 
 without needing to "own" the data, so making copies everywhere you forgot 
 to put ref would be hard to deal with.  It's also completely incompatible 
 with existing code, which expects reference semantics without using ref.

That's exactly right, but for D / .NET I do not expect existing code to 
compile as-is. For example, I do not intent to port any of the phobos / 
tango code. I envision all I / O and system stuff to go through 
[mscorlib]System.


Cristian

Mar 23 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Mon, 23 Mar 2009 22:24:30 -0400, Cristian Vlasceanu  
<cristian zerobugs.org> wrote:

 I thought one of the benefits of having immutable strings is that
 substrings were just pointers to slices of the original data in Java and
 .NET.  So every time I do a substring in Java and .NET, it creates a  
 copy
 of the data?  That seems very wasteful, especially when the data is
 immutable...

 Fair enough, I meant substrings in more of a C++ way. Perhaps discussing
 slices / arrays by comparison with substrings / strings is a bad idea,
 because as you point out strings are immutable whilst arrays are not.

OK, I didn't think of that implementation.

 My main problem with slices is that when you a) append to them, or b)  
 resize
 them (by assigning to the length property), and the new size goes past  
 the
 bounds of the "original" array, then the slice gets "divorced" from the
 array, and from a light-weight "view" it gets promoted to a "first class"
 array (at least this is the behavior in 2.025).

It's even more bizarre than this :)  If you append to a slice that happens  
to point to the first bytes of the original array, it appends in place (no  
divorce!), possibly overwriting the (possibly immutable!) data still in  
the original array.  This is the bug I'm trying to fix with my proposals.

But I see your point, which is one aspect that I was aware of, but didn't  
really feel like it was a huge problem.  After all, if the behavior is  
deterministic, then you can know whether your slice still points to the  
original array.  But looking at it from your point of view, the scheme  
definitely has valid issues.  Most other parts of the D language allow you  
to simply look at the type of something and know what it means.  Arrays,  
you have to examine the code that created/used the array to figure out  
whether it's an alias or unique data.  That is a problem.

So maybe it is a worthwhile exercise to figure out if there is a way to  
embed the attributes of the array into the type.  I'll have to think about  
how this could be implemented in a way that makes sense, is realistic, and  
does not hinder performance or syntax.  There might still be a way to make  
this work.

 That is an interesting idea.  But I have a couple problems with it:

 First, when I see ref int[] s, I think reference to an array, not this
 array references data from another array.
 Second, your proposed default (not using ref) is to copy data  
 everywhere,
 which is not good for performance.  Most of the time, arrays are passed
 without needing to "own" the data, so making copies everywhere you  
 forgot
 to put ref would be hard to deal with.  It's also completely  
 incompatible
 with existing code, which expects reference semantics without using ref.

 That's exactly right, but for D / .NET I do not expect existing code to
 compile as-is. For example, I do not intent to port any of the phobos /
 tango code. I envision all I / O and system stuff to go through
 [mscorlib]System.

That is what I would expect also, but part of the benefit of having .NET  
implemented in another language is to port existing code from that  
language to a .NET runtime.  Any code that was to be ported might suffer.   
There are already instances of ref char[] or ref string in many  
applications/libs that would cause strange bugs when ported to .NET.

Oh, and BTW, if I couldn't use Tango, I'd most certainly not use D.NET ;)   
I sort of loathe the .NET runtime libs, except for certain parts.

-Steve

Mar 24 2009

Cristian Vlasceanu <cristian zerobugs.org> writes:

Steven Schveighoffer Wrote:

 Oh, and BTW, if I couldn't use Tango, I'd most certainly not use D.NET ;)   
 I sort of loathe the .NET runtime libs, except for certain parts.
 

What I meant is that porting the libraries is not in the scope of my project.
Anyway, that is a side discussion.

Back to the slices topic: I agree that my proposed "ref" solution would require
code changes, but isn't that true for T[new] as well?

Cristian

Mar 24 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 24 Mar 2009 18:26:16 -0400, Cristian Vlasceanu  
<cristian zerobugs.org> wrote:

 Back to the slices topic: I agree that my proposed "ref" solution would  
 require code changes, but isn't that true for T[new] as well?

 Cristian

There is not already a meaning for T[new], it is a syntax error.  There is  
already a meaning for ref T[].

-Steve

Mar 24 2009

Cristian Vlasceanu <cristian zerobugs.org> writes:

Steven Schveighoffer Wrote:

 On Tue, 24 Mar 2009 18:26:16 -0400, Cristian Vlasceanu  
 <cristian zerobugs.org> wrote:
 
 Back to the slices topic: I agree that my proposed "ref" solution would  
 require code changes, but isn't that true for T[new] as well?

 Cristian

 
 There is not already a meaning for T[new], it is a syntax error.  There is  
 already a meaning for ref T[].
 

Yes, but the current, existing meaning will be preserved:

void f(ref T[] a) {
   a[13] = 42; // still works as before if "a" is a slice under the hood
   a = null; // very easy for the compiler to make this work: a.array = null
}

I thought that your main objection was that, with my proposed change, in code
like this:

int [] a = ...
int [] s = a[0..$]; 

"s" becomes a full blown copy of a (rather than a slice), and to reinstate the
today's semantics the code should be re-written to:

ref int [] s = a[0..$]; 

It is true, but such cases can be worked out in the compiler and we'll have
warning messages issued.

Thanks,
Cristian

Mar 24 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 24 Mar 2009 20:02:16 -0400, Cristian Vlasceanu  
<cristian zerobugs.org> wrote:

 Steven Schveighoffer Wrote:

 On Tue, 24 Mar 2009 18:26:16 -0400, Cristian Vlasceanu
 <cristian zerobugs.org> wrote:

 Back to the slices topic: I agree that my proposed "ref" solution  

 would
 require code changes, but isn't that true for T[new] as well?

 Cristian

 There is not already a meaning for T[new], it is a syntax error.  There  
 is
 already a meaning for ref T[].

 Yes, but the current, existing meaning will be preserved:

 void f(ref T[] a) {
    a[13] = 42; // still works as before if "a" is a slice under the hood
    a = null; // very easy for the compiler to make this work: a.array =  
 null
 }

OK, I'm not sure I understood your original proposal, before I respond  
more, let me make it clear what my understanding was.

In your proposal, a ref T[] a is the same as a slice today.  That is,  
assignment to a ref T[] simply copies the pointer and length from another  
T[] or ref T[].  However, it does not reference another slice struct, but  
is a local struct in itself.

In the current situation, a ref T[] is a reference to a slice struct.   
That is, assignement to a ref T[] overwrites the pointer and length on the  
reference that was passed.

So here is my objection:

void trim(ref char[] c)
{
    //
    // remove leading and trailing spaces
    //
    while(c.length > 0 && c[0] == ' ')
       c = c[1..$];
    while(c.length > 0 && c[$-1] == ' ')
       c = c[0..$-1];
}

void foo()
{
    char[] x = "   trim this!   ".dup;
    trim(x);
    assert(x == "trim this!");
}

Now, in your scheme, the ref simply means that c's data is referencing  
something else, not that c is a reference, so the assert will fail, no?

If this isn't the case, let me know how you signify:

1. a normal slice (struct is local, but ptr and length are aliased from  
data).
2. a reference of a slice (references an external struct).

-Steve

Mar 26 2009

"Cristian Vlasceanu" <cristian zerobugs.org> writes:

At a first look yes I think the assertion will fail, but not if you declare 
x:
ref char[] x = "   trim this!   ".dup;

It could be possible to tweak the compiler so that it forces you to declare 
x like that

"Steven Schveighoffer" <schveiguy yahoo.com> wrote in umessage 
news:op.urex03gyeav7ka steves.networkengines.com...
 On Tue, 24 Mar 2009 20:02:16 -0400, Cristian Vlasceanu 
 <cristian zerobugs.org> wrote:

 Steven Schveighoffer Wrote:

 On Tue, 24 Mar 2009 18:26:16 -0400, Cristian Vlasceanu
 <cristian zerobugs.org> wrote:

 Back to the slices topic: I agree that my proposed "ref" solution

 would
 require code changes, but isn't that true for T[new] as well?

 Cristian

 There is not already a meaning for T[new], it is a syntax error.  There 
 is
 already a meaning for ref T[].

 Yes, but the current, existing meaning will be preserved:

 void f(ref T[] a) {
    a[13] = 42; // still works as before if "a" is a slice under the hood
    a = null; // very easy for the compiler to make this work: a.array = 
 null
 }

 OK, I'm not sure I understood your original proposal, before I respond 
 more, let me make it clear what my understanding was.

 In your proposal, a ref T[] a is the same as a slice today.  That is, 
 assignment to a ref T[] simply copies the pointer and length from another 
 T[] or ref T[].  However, it does not reference another slice struct, but 
 is a local struct in itself.

 In the current situation, a ref T[] is a reference to a slice struct. 
 That is, assignement to a ref T[] overwrites the pointer and length on the 
 reference that was passed.

 So here is my objection:

 void trim(ref char[] c)
 {
    //
    // remove leading and trailing spaces
    //
    while(c.length > 0 && c[0] == ' ')
       c = c[1..$];
    while(c.length > 0 && c[$-1] == ' ')
       c = c[0..$-1];
 }

 void foo()
 {
    char[] x = "   trim this!   ".dup;
    trim(x);
    assert(x == "trim this!");
 }

 Now, in your scheme, the ref simply means that c's data is referencing 
 something else, not that c is a reference, so the assert will fail, no?

 If this isn't the case, let me know how you signify:

 1. a normal slice (struct is local, but ptr and length are aliased from 
 data).
 2. a reference of a slice (references an external struct).

 -Steve

Mar 29 2009

bearophile <bearophileHUGS lycos.com> writes:

Cristian Vlasceanu:
 That's exactly right, but for D / .NET I do not expect existing code to 
 compile as-is. For example, I do not intent to port any of the phobos / 
 tango code. I envision all I / O and system stuff to go through 
 [mscorlib]System.

Can you explain me what are the purposes of D.net?

Bye,
bearophile

Mar 24 2009

Cristian Vlasceanu <cristian zerobugs.org> writes:

bearophile Wrote:
 
 Can you explain me what are the purposes of D.net?
 
 Bye,
 bearophile

You mean other than the tremendous fun that I am having writing a compiler
back-end? ;)  

I guess the idea is to help the language catch on, by spreading it (or a
dialect of it) to as many platforms as possible.

Other than the above-mentioned levels of personal fun, I am envisioning to
lower the entry barrier for experimentation with the language. 

The back-end for .net is much simpler than the native code generator (it just
outputs textual IL, then invokes ILASM) so it should be easier to understand
and tweak.

The front-end is open source (is dually licensed as GPL / artistic) and my .net
back-end will be released under the Ms-PL (OSI-certified), so the entire
project will be 100% Open Source; people will be able to make changes and
redistribute them, and hopefully this will entice even more folks to adopt (or
at least give a try to) the language. 

Cheers,
Cristian

Mar 24 2009

dsimcha <dsimcha yahoo.com> writes:

== Quote from Georg Wrede (georg.wrede iki.fi)'s article
 The idea of slices and arrays being distinct types does seem to have
 advantages. I've seen a couple of mentions of this lately, but has there
 been a *rigorous* discussion?

No rigorous discussion.  This seems to be one of those thorny issues that noone
wants to delve into seriously.  I had proposed once that T[new], which has been
thrown around here before, be a value type but be implicitly convertible to T[]
by
reference.  Here is the proposal reproduced.  Maybe this time it will get some
comments.

Resizable arrays are such an important, heavily used feature that they should
get
special treatment from the language, to ensure that they are syntactically
elegant, efficient both at runtime and in terms of compilation time, easy to
use,
free of odd corner cases, and in general are true first class citizens.
Nonetheless, D's arrays do have some rough edges.  My proposal would be as
follows:

There should be two array types:
T[new] and T[].
This has been proposed before, though not in as much detail as what I'm
suggesting.  The semantics should work as follows:

T[new] is considered a subtype of T[].  This works mostly like you would expect.
T[new] can be implicitly converted to T[].  The twist is that T[] can also be
assigned to T[new] without any explicit conversion, but a copy of the underlying
data is implicitly made for safety.

Assigning either a T[] or a T[new] to a T[] will result in aliasing:

T[] foo;
T[new] bar = new T[N];
bar[0] = 1;
foo = bar;
foo[0] = 2;
writeln(bar[0]);  // Prints 2.

T[new]s are always assigned by value to make concatenation and resizing safe. 
The
underlying data is copied implicitly when assigning from a T[] to a T[new] or
from
a T[new] to another T[new].

T[new] foo;
// new returns T[new], implicitly converted to T[]
T[] bar = new T[N];
bar[0] = 1;
foo = bar;
foo[0] = 2;
writeln(bar[0]);  // Prints 1.

T[]s can be sliced, but cannot be enlarged or appended to.  They should be laid
out the same way they are now, as a struct of a pointer and a length.

T[new]s can be appended to and enlarged in addition to being sliced.  Their
layout
should be a struct of pointer, length, capacity to make appending fast.  This
will
also make implicit conversion to T[] essentially free, since all that is
necessary
is slicing off the capacity field.

I also wonder if this proposal could be extended to fix some of the covariance
issues with arrays (see Bugzilla 2095).  This might work by only allowing
covariant types to be assigned to T[new], not T[].  For example:

class A{}
class B:A{}

B[new] ba = [new B];
A[] aa = ba;  // Error:  Cannot assign covariant types to T[].
A[new] aa = ba;  // Works, but copy is implicitly made.

Mar 17 2009

bearophile <bearophileHUGS lycos.com> writes:

dsimcha:
Nonetheless, D's arrays do have some rough edges.<

For the second time I have put a bug like this:

def foo(int[] a) {
  ...
  a.length = a.length - 1;
}

The length change can't be seen from the outside, because a is a struct passed
by value. You have to use foo(ref int[] a). I think I'd like this to be the
default behavior of dynamic array pass (disabled for example if you put an
foo(in int[] a).
Do you add this bug to your programs too?


Steven Schveighoffer:
I had a hackish scheme to use the most significant bit in the length field to
identify if a slice is just beyond the allocated length.<

Such bit may also be stored in the pointer, if they are aligned to 8 or 16
bytes.

Bye,
bearophile

Mar 17 2009

bearophile <bearophileHUGS lycos.com> writes:

bearophile:
 def foo(int[] a) {

Sorry, I meant:
void foo(int[] a) {

Mar 17 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 17 Mar 2009 13:29:59 -0400, bearophile <bearophileHUGS lycos.com>  
wrote:

 Steven Schveighoffer:
 I had a hackish scheme to use the most significant bit in the length  
 field to identify if a slice is just beyond the allocated length.<

 Such bit may also be stored in the pointer, if they are aligned to 8 or  
 16 bytes.

That will affect the GC when it is scanning for pointers.

-Steve

Mar 17 2009

Georg Wrede <georg.wrede iki.fi> writes:

Walter Bright wrote:
 Cristi talks about adapting D strings to .NET.
 
 http://www.reddit.com/r/programming/comments/84urf/net_on_a_string/?al
eady_submitted=true 

At times I find myself wondering if the D string (i.e. invariant char[]) 
is such a good idea when it (sort-of implicitly) means string of utf8. 
After all, it could be construed as being just a "quick and good-enough" 
hack.

But then I come across things like

    http://www.yoda.arachsys.com/csharp/strings.html#culture

and remember that this is more or less hopeless to get right (with any 
decent effort anyway).

Mar 17 2009

Kagamin <spam here.lot> writes:

Christopher Wright Wrote:

 The const system is largely a frontend issue. I don't think that will 
 cause many issues.

framework classes are not const qualified.

Mar 25 2009

D Programming

C/C++ Programming

Other

digitalmars.D - .NET on a string