digitalmars.D - Change representation of dynamic arrays?

Walter Bright (38/38) Oct 19 2007 Currently, arrays are represented under the hood as:

Daniel Keep (14/15) Oct 19 2007 I'm hesitant to break something like this which is basically invisible

0ffh (15/19) Oct 19 2007 I guess we go from changing one pointer and a size_t to changing two

Daniel Keep (20/43) Oct 20 2007 Well, that's definitely more work to compute end, although in most

Janice Caron (2/3) Oct 20 2007 tst.end = str.end - e*wchar.sizeof;
Janice Caron (5/8) Oct 20 2007 See, the $ wouldn't expand to (str.end-str.start) as you implied, it

Daniel Keep (7/16) Oct 20 2007 Ah yes, silly me for quoting something Walter wrote without thinking

David Brown (6/10) Oct 19 2007 I wouldn't expect the performance to change in this case. You're doing ...
Janice Caron (17/19) Oct 20 2007 While this may be optimised for simple arrays, it poses much more of a

Walter Bright (4/26) Oct 20 2007 If you have a user-defined struct, you get to decide how it is

Bill Baxter (14/40) Oct 20 2007 '$' doesn't work in user defined types. So the rest of your comment is

Janice Caron (25/32) Oct 20 2007 I guess what I was thinking is... suppose I have an arbitrary,

BCS (3/30) Oct 20 2007 if the return of .length is left unrestricted, then you can play games w...

Oskar Linde (14/16) Oct 21 2007 Yes, it doesn't have to be more complicated than so. That would handle

Janice Caron (24/25) Oct 21 2007 I think those are /excellent/ ideas.

0ffh (9/22) Oct 19 2007 Hmmm, currently I use printf("foo %s\n",std.string.toStringz(bar))

Walter Bright (3/11) Oct 19 2007 printf("my string is %*.s\n", str.length, str.ptr);

=?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= (6/15) Oct 20 2007 When I try this in C, it gives a warning about the "length" argument:

Walter Bright (2/19) Oct 20 2007 I think I'm not going to think about it and just deprecate it .

Chris Miller (8/22) Oct 19 2007 It sounds interesting...

Walter Bright (2/7) Oct 19 2007 I think you answered it!

Chris Miller (16/23) Oct 20 2007 I'm starting to think the change might not be such a good idea. Imagine ...

Janice Caron (9/12) Oct 20 2007 It would be very counterintuitive if ("world"++ == "orld"), so I don't
Janice Caron (8/14) Oct 20 2007 To add to that, iterators in general are not range checked, because

Janice Caron (17/20) Oct 19 2007 This will break some of my old code, but none of my new code.

Oskar Linde (5/21) Oct 19 2007 All array allocations in D allocate an extra byte just to prevent this.
Walter Bright (3/35) Oct 20 2007 No prob, t=null already sets both members to 0.

Janice Caron (5/15) Oct 20 2007 Both members of t, yes, but it won't stop s.end from pointing to t's

Walter Bright (2/4) Oct 20 2007 Yes, an extra byte is allocated for just this reason.

Jascha Wetzel (5/10) Oct 20 2007 couldn't the end pointer just point to the last element?

Janice Caron (7/14) Oct 20 2007 Oooh yuk. How horrible. I'd rather that didn't happen. For one thing,

Jascha Wetzel (11/26) Oct 20 2007 one can argue, that this way, empty arrays have an unambiguous

Janice Caron (15/19) Oct 20 2007 I will argue below that this is not a good thing.
Bill Baxter (6/30) Oct 20 2007 Not quite true. if(str){} tests only the .ptr part and ignores the

Janice Caron (9/11) Oct 20 2007 Ooh, that's nasty. I've never tried that.
BCS (3/42) Oct 20 2007 Ohh. That's another feature for an IDE: Find this pattern in the syntax/...

Janice Caron (11/14) Oct 20 2007 Ooh - just noticed this.
Janice Caron (14/14) Oct 20 2007 Walter, I think we might want a new property, isEmpty, for arrays (and
David Brown (10/14) Oct 20 2007 The compiler should never generate an actual division for a compile-time
Janice Caron (22/28) Oct 20 2007 I hadn't thought about that before, but yes, you're right. The rule
David Brown (4/5) Oct 20 2007 The gcc backend certainly does. I don't know about dmd. I don't have i...

=?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= (4/11) Oct 20 2007 Or maybe just move it from object.d and hello.d, over to std.c.stdio ?

Oskar Linde (21/40) Oct 19 2007 This has been nagging me too. Doing a = a[1..$]; has never felt very

Janice Caron (19/23) Oct 20 2007 I'd be happy for it to break if the break could be detected at compile

Daniel Keep (15/35) Oct 20 2007 You're calling a function that completely bypasses the type system, and

Janice Caron (7/16) Oct 20 2007 I would expect it to fail to compile, and I would also expect to be

Oskar Linde (10/29) Oct 20 2007 I guess there could be a warning (or error) passing a D array as a C

Walter Bright (2/15) Oct 20 2007 Yes.

=?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= (7/18) Oct 20 2007 I think you meant to write printf("my string is %.*s\n", str);

Walter Bright (3/28) Oct 20 2007 LOL!

Janice Caron (5/7) Oct 20 2007 That's pointer arithmetic, of course, so looked at in pure assembler

Walter Bright (6/16) Oct 20 2007 Divides are expensive. It's mitigated by:

David Brown (5/10) Oct 20 2007 At least with the gcc backend, divides by constants almost always turn i...

Walter Bright (6/16) Oct 20 2007 Divides are expensive. It's mitigated by:

Janice Caron (11/13) Oct 20 2007 It also breaks binary compatibility with object files already

Vladimir Panteleev (6/16) Oct 20 2007 I guess that could be fixed by changing the name mangling for dynamic ar...

Jascha Wetzel (5/6) Oct 20 2007 Since this is a different ABI, the debug symbols need to indicate which
Witold Baryluk (11/31) Oct 20 2007 What about garbage collector? And pointer that doesn't points to
Sean Kelly (16/66) Oct 20 2007 This isn't a compelling reason to keep the current implementation. As
Knud Soerensen (5/8) Oct 20 2007 I say go for it!
Derek Parnell (23/24) Oct 21 2007 My first concern was for the speed of accessing the length, either throu...
Don Clugston (13/30) Oct 21 2007 This seems to be a more powerful construction than .ptr + .length. If yo...
Graham St Jack (11/11) Oct 21 2007 If we are considering changing the internal representation of dynamic

David Brown (14/18) Oct 21 2007 D already makes use of this information in the GC. In fact, allocations

Graham St Jack (2/25) Oct 21 2007 Cool - thanks.

Kris (7/45) Oct 21 2007 Can anyone explain why the compiler/iterators/foreach would insist on us...
Steven Schveighoffer (25/31) Oct 22 2007 Yes, but checking the length (which is something I do often in my code) ...

Kris (5/40) Oct 22 2007 Yeah, I'm with Steve on this. There's more than one way to skin this cat...

Walter Bright <newshound1 digitalmars.com> writes:

Currently, arrays are represented under the hood as:

	size_t lengthOfArray;
	void* ptrToStartOfArray;

Which works out reasonably well. The problem is if you want to use array 
types as the basis of iterators, and you want to step through the array. 
There's no escaping it being two operations:

	decrement the length
	increment the pointer

This puts a brick in any fast implementation of iterators. To fix that, 
we can change the representation to:

	void* ptrToStartOfArray;
	void* ptrPastEndOfArray;

Then there's just one increment. Some tests show this can improve loop 
performance by up to 70%.

So, what does this not break?

1) Doesn't break array.ptr, this will still work.
2) Doesn't break array.length as rvalue, as this is rewritten by the 
compiler as (array.end - array.start).
3) Doesn't break array.length as an lvalue, as that is handled by the 
runtime library anyway.
4) Won't break anything on D 1.0, as it wouldn't get this change.
5) Won't break array slices, or any of that stuff we love about D arrays.

What does this break?

1) Passing dynamic arrays to printf as in:

	printf("my string is %*.s\n", str);

which relied on the under-the-hood representation. This doesn't work on 
some architectures anyway, and is thoroughly obsolete. One could quickly 
fix such code by writing it as:

	printf("my string is %*.s\n", str.length, str.ptr);

2) It breaks the internal library support code, but that's my problem.

3) It breaks binary compatibility with libraries already compiled. But 
we expect to break binary compatibility with D 2.0.

4) It breaks things like cast(ulong)str, if one was crazy enough to do 
that anyway.

5) It breaks anything that tries to look at the underlying 
representation of dynamic arrays - but such code should be rewritten to 
use .ptr and .length anyway, or slice notation.

So, what do you think?

Oct 19 2007

Daniel Keep <daniel.keep.lists gmail.com> writes:

Walter Bright wrote:
 [stuff]

I'm hesitant to break something like this which is basically invisible
to the end user.  Then again, a 70% speed increase is not something to
turn one's nose up at!  And, as you said, you D 2.0 was forked so that
changes like this could be made.

You say you've measured the performance of loops using this, but what
about slicing code?  I know that I use a lot of [lower..$-something]
style code; how much slower does this become?  Once I'm not so busy, I
might try to work out the proportion of foreach to $ slicing in my code. :P

But, if the gains are there, and there's nothing critical that it breaks
that you haven't thought of, I can't see any reason not to do it.

	-- Daniel

P.S. I assume that array.ptr = array.start, and we'll have access to
array.end, yes?

Oct 19 2007

0ffh <spam frankhirsch.net> writes:

Daniel Keep wrote:
 You say you've measured the performance of loops using this, but what
 about slicing code?  I know that I use a lot of [lower..$-something]
 style code; how much slower does this become?  Once I'm not so busy, I
 might try to work out the proportion of foreach to $ slicing in my code. :P

I guess we go from changing one pointer and a size_t to changing two
pointers, I can't see a lot of space for that being slower than now.

Consider:
   wchar[] str="this is a test";
   int s=10; // slice start
   int e=14; // slice end
   tst=s[s..e];
Old array:
   tst.ptr=str.ptr+s*wchar.sizeof;
   tst.len=e-s;
New array:
   tst.start=str.start+s*wchar.sizeof;
   tst.end=str.start+e*wchar.sizeof;

Regards, Frank

Oct 19 2007

Daniel Keep <daniel.keep.lists gmail.com> writes:

0ffh wrote:
 Daniel Keep wrote:
 You say you've measured the performance of loops using this, but what
 about slicing code?  I know that I use a lot of [lower..$-something]
 style code; how much slower does this become?  Once I'm not so busy, I
 might try to work out the proportion of foreach to $ slicing in my
 code. :P

 
 I guess we go from changing one pointer and a size_t to changing two
 pointers, I can't see a lot of space for that being slower than now.
 
 Consider:
   wchar[] str="this is a test";
   int s=10; // slice start
   int e=14; // slice end
   tst=s[s..e];
 Old array:
   tst.ptr=str.ptr+s*wchar.sizeof;
   tst.len=e-s;
 New array:
   tst.start=str.start+s*wchar.sizeof;
   tst.end=str.start+e*wchar.sizeof;
 
 Regards, Frank

Well, that's definitely more work to compute end, although in most
cases, the multiply could be reduced to a bit shift.  I was talking
about cases where we slice using the length of the array, since now we
have to actually go and compute the length instead of simply looking it up.

e = 2;
tst = str[s .. $-e];

Old array:

tst.ptr = str.ptr + s*wchar.sizeof;
tst.len = (str.length-e) - s;

New array:

tst.start = str.start + s*wchar.sizeof;
//tst.end = str.start + (str.length-e)*wchar.sizeof;
tst.end   = str.start + ((str.end-str.start)-e)*wchar.sizeof;

So we've gone from two subtractions to an addition, two subtractions and
a multiply; that's double the number of operations, one of which is an
expensive multiply.  If the primary justification for this change is
eliminating a decrement operation for iteration, I think it's very
relevant to look at how it's going to affect other code!

	-- Daniel

Oct 20 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 10/20/07, Daniel Keep <daniel.keep.lists gmail.com> wrote:
 tst.end   = str.start + ((str.end-str.start)-e)*wchar.sizeof;

tst.end = str.end - e*wchar.sizeof;

Oct 20 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 10/20/07, Janice Caron <caron800 googlemail.com> wrote:
 On 10/20/07, Daniel Keep <daniel.keep.lists gmail.com> wrote:
 tst.end   = str.start + ((str.end-str.start)-e)*wchar.sizeof;

 tst.end = str.end - e*wchar.sizeof;

See, the $ wouldn't expand to (str.end-str.start) as you implied, it
would expand to ((str.end-str.start)/wchar.sizeof). Obviously the two
str.starts cancel each other out, so you can adjust the end pointer
(relative to $) without looking at the start pointer at all.

Oct 20 2007

Daniel Keep <daniel.keep.lists gmail.com> writes:

Janice Caron wrote:
 On 10/20/07, Janice Caron <caron800 googlemail.com> wrote:
 On 10/20/07, Daniel Keep <daniel.keep.lists gmail.com> wrote:
 tst.end   = str.start + ((str.end-str.start)-e)*wchar.sizeof;

 tst.end = str.end - e*wchar.sizeof;

 
 See, the $ wouldn't expand to (str.end-str.start) as you implied, it

Ah yes, silly me for quoting something Walter wrote without thinking
about it first.  :P

 would expand to ((str.end-str.start)/wchar.sizeof). Obviously the two
 str.starts cancel each other out, so you can adjust the end pointer
 (relative to $) without looking at the start pointer at all.

Assuming the compiler is smart enough to do algebraic elimination.
I think I'll also have to take a look at what code the compiler
generates in these cases.

	-- Daniel

Oct 20 2007

David Brown <dlang davidb.org> writes:

On Sat, Oct 20, 2007 at 01:11:44PM +1000, Daniel Keep wrote:

You say you've measured the performance of loops using this, but what
about slicing code?  I know that I use a lot of [lower..$-something]
style code; how much slower does this become?  Once I'm not so busy, I
might try to work out the proportion of foreach to $ slicing in my code. :P

I wouldn't expect the performance to change in this case.  You're doing an
addition to adjust the start pointer, and a subtraction to adjust the
length.  The new representation would do the same addition and subtraction.

In fact, it looks like it would even generate the same code.

David

Oct 19 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 10/20/07, Daniel Keep <daniel.keep.lists gmail.com> wrote:
 but what
 about slicing code?  I know that I use a lot of [lower..$-something]

While this may be optimised for simple arrays, it poses much more of a
problem for classes and structs which override opIndex() and
opSlice(). The problem is that indeces in those functions are always
specified as being relative to the beginning - there is no way to
specify that an index should be considered relative to the end - so
when you do a[0..$-1] in such a struct, the compiler has to convert
$-1 into (a.length()-1). Then, inside opSlice(), to turn it back to a
pointer you'd have to do (start+index).

I know adds and subtracts are not that expensive, but given that this
change is being talked about /because/ we want to get rid of a
subtract, it is worth thinking about this.

Presumably this will all sort itself out if we get iterators? When
that happens, slice expressions could be rewritten by the compiler as
iterator operations, which in turn would call opDereference() or
opRange(), taking iterator parameters. Is that likely to happen, or
will user-defined types always take the slow route?

Oct 20 2007

Walter Bright <newshound1 digitalmars.com> writes:

Janice Caron wrote:
 On 10/20/07, Daniel Keep <daniel.keep.lists gmail.com> wrote:
 but what
 about slicing code?  I know that I use a lot of [lower..$-something]

 
 While this may be optimised for simple arrays, it poses much more of a
 problem for classes and structs which override opIndex() and
 opSlice(). The problem is that indeces in those functions are always
 specified as being relative to the beginning - there is no way to
 specify that an index should be considered relative to the end - so
 when you do a[0..$-1] in such a struct, the compiler has to convert
 $-1 into (a.length()-1). Then, inside opSlice(), to turn it back to a
 pointer you'd have to do (start+index).
 
 I know adds and subtracts are not that expensive, but given that this
 change is being talked about /because/ we want to get rid of a
 subtract, it is worth thinking about this.
 
 Presumably this will all sort itself out if we get iterators? When
 that happens, slice expressions could be rewritten by the compiler as
 iterator operations, which in turn would call opDereference() or
 opRange(), taking iterator parameters. Is that likely to happen, or
 will user-defined types always take the slow route?

If you have a user-defined struct, you get to decide how it is 
implemented. How the built-in arrays are done under the hood shouldn't 
be relevant.

Oct 20 2007

Bill Baxter <dnewsgroup billbaxter.com> writes:

Walter Bright wrote:
 Janice Caron wrote:
 On 10/20/07, Daniel Keep <daniel.keep.lists gmail.com> wrote:
 but what
 about slicing code?  I know that I use a lot of [lower..$-something]

 While this may be optimised for simple arrays, it poses much more of a
 problem for classes and structs which override opIndex() and
 opSlice(). The problem is that indeces in those functions are always
 specified as being relative to the beginning - there is no way to
 specify that an index should be considered relative to the end - so
 when you do a[0..$-1] in such a struct, the compiler has to convert
 $-1 into (a.length()-1). Then, inside opSlice(), to turn it back to a
 pointer you'd have to do (start+index).


'$' doesn't work in user defined types.  So the rest of your comment is 
only relevant to some hypothetical future D where $ is allowed for user 
types.  Andrei strongly favored making it mean .length everywhere if I 
recall correctly, so if anything happens with it that's probably what it 
will be, and your comment will be relevant.

 I know adds and subtracts are not that expensive, but given that this
 change is being talked about /because/ we want to get rid of a
 subtract, it is worth thinking about this.

 Presumably this will all sort itself out if we get iterators? When
 that happens, slice expressions could be rewritten by the compiler as
 iterator operations, which in turn would call opDereference() or
 opRange(), taking iterator parameters. Is that likely to happen, or
 will user-defined types always take the slow route?

 
 If you have a user-defined struct, you get to decide how it is 
 implemented. How the built-in arrays are done under the hood shouldn't 
 be relevant.

It becomes relevant if $ becomes a synonym for .length as was previously 
proposed.  In that case you don't get to decide how $ is implemented. 
It has to compute the length, even if that's O(N) and not really needed 
for computing the $-1'th node.

Or if all you're trying to do is make a light wrapper around a built-in 
array (to, say, add an extra data member to it), then it means that your 
opIndex could be significantly less efficient than the built-in one.

--bb

Oct 20 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 10/20/07, Bill Baxter <dnewsgroup billbaxter.com> wrote:
 It becomes relevant if $ becomes a synonym for .length as was previously
 proposed.  In that case you don't get to decide how $ is implemented.
 It has to compute the length, even if that's O(N) and not really needed
 for computing the $-1'th node.

 Or if all you're trying to do is make a light wrapper around a built-in
 array (to, say, add an extra data member to it), then it means that your
 opIndex could be significantly less efficient than the built-in one.

I guess what I was thinking is... suppose I have an arbitrary,
user-defined collection class:

    MyCollection!(int) c;

and I want to dereference the last element. The source code might be:

    int n = c[$-1];

Now, currently, (or if $ gets defined as length), that would get turned into:

    int n = c.opIndex(c.length()-1);

which /might/ be computationally intensive, depending on what
MyCollection does with length() and opIndex(). The thing is, Walter
seemed to imply that iterators were on their way, so what I was
thinking was, if

    int n = c[$-1];

could instead be translated into

    int n = *(c.end()-1);

then the computational overhead would vanish. I think we'd need
something like opDereference() to overload the unary *. But, see,
you'd never actually have to /write/ the *, because you'd still write
"c[$-1]" - it's just what's going on under the hood.

Similarly, I'm thinking the expression c[a,$-b] could turn into
opRange(c.begin()+a, c.end()-b), instead of c.opIndex(a,
c.length()-1).

But it's all just pie in the sky right now, because I've got
absolutely no idea what, if anything, Walter plans to do with
collections and iterators.

Oct 20 2007

BCS <ao pathlink.com> writes:

Reply to Janice,

 On 10/20/07, Bill Baxter <dnewsgroup billbaxter.com> wrote:
 
 It becomes relevant if $ becomes a synonym for .length as was
 previously
 proposed.  In that case you don't get to decide how $ is implemented.
 It has to compute the length, even if that's O(N) and not really
 needed
 for computing the $-1'th node.
 Or if all you're trying to do is make a light wrapper around a
 built-in array (to, say, add an extra data member to it), then it
 means that your opIndex could be significantly less efficient than
 the built-in one.
 

 I guess what I was thinking is... suppose I have an arbitrary,
 user-defined collection class:
 
 MyCollection!(int) c;
 
 and I want to dereference the last element. The source code might be:
 
 int n = c[$-1];
 
 Now, currently, (or if $ gets defined as length), that would get
 turned into:
 
 int n = c.opIndex(c.length()-1);
 

if the return of .length is left unrestricted, then you can play games with 
having it return a proxy type that another opIndex then uses.

Oct 20 2007

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

BCS wrote:

 if the return of .length is left unrestricted, then you can play games 
 with having it return a proxy type that another opIndex then uses.

Yes, it doesn't have to be more complicated than so. That would handle 
the multidimensional case as well. Possibly let $ -> opDollar() instead 
of length(). Then, all we need is to make a..b turn into a special range 
type and depreciate opSlice.

The remaining problem would be that while .length always explicitly 
refers to an object. $ only does that implicitly within index 
operations. A global _dollar() would solve that and allow:

auto a = 1 .. $-1;
auto b = $-5 .. $;
auto c = myObj[a,b];

But that might be too much to ask for.

-- 
Oskar

Oct 21 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 10/21/07, Oskar Linde <oskar.lindeREM ovegmail.com> wrote:
 But that might be too much to ask for.

I think those are /excellent/ ideas.

So I assume opDollar would return some proxy type which represents
"offset from end", and looks something like

struct Dollar
{
    int offset = 0;
    static Dollar opCall(int n) { Dollar d; d.offset=n; return d; }
    Dollar opAdd(int n) { return Dollar(offset+n); }
    Dollar opAdd_r(int n) { return Dollar(offset+n); }
    Dollar opSub(int n) { return Dollar(offset-n); }
}

and x..y returns a Slice!(typeof(x),typeof(y)) where

struct Slice(T,U)
{
    T x;
    U y;
}

(no need to implement opDotDot - that can happen automatically), and
then opIndex knows to expect either an integer, a Dollar, or Slice
whose elements are integers or Dollars. (More exotic uses of course
can be imagined, but that would be enough to emulate current
behavior).

I like it!

Oct 21 2007

0ffh <spam frankhirsch.net> writes:

Walter Bright wrote:
 [see news://news.digitalmars.com:119/ffbr56$r40$1 digitalmars.com]
 What does this break?
 1) Passing dynamic arrays to printf as in:
     printf("my string is %*.s\n", str);

Hmmm, currently I use printf("foo %s\n",std.string.toStringz(bar))
anyway, never tried what you're doing there. Did I miss something?

 2) It breaks the internal library support code, but that's my problem.

Si.

 3) It breaks binary compatibility with libraries already compiled. But 
 we expect to break binary compatibility with D 2.0.

No sweat.

 4) It breaks things like cast(ulong)str, if one was crazy enough to do 
 that anyway.

Never needed it... =)

 5) It breaks anything that tries to look at the underlying 
 representation of dynamic arrays - but such code should be rewritten to 
 use .ptr and .length anyway, or slice notation.

Been a good boy and used .ptr and .length, so I'm fine with it... grin!

 So, what do you think?

I'm all for it!

Regards, Frank

Oct 19 2007

Walter Bright <newshound1 digitalmars.com> writes:

0ffh wrote:
 Walter Bright wrote:
 [see news://news.digitalmars.com:119/ffbr56$r40$1 digitalmars.com]
 What does this break?
 1) Passing dynamic arrays to printf as in:
     printf("my string is %*.s\n", str);

 
 Hmmm, currently I use printf("foo %s\n",std.string.toStringz(bar))
 anyway, never tried what you're doing there. Did I miss something?

printf("my string is %*.s\n", str.length, str.ptr);

will be a little faster, because it doesn't need to allocate memory.

Oct 19 2007

=?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:

Walter Bright wrote:

 1) Passing dynamic arrays to printf as in:
     printf("my string is %*.s\n", str);

 Hmmm, currently I use printf("foo %s\n",std.string.toStringz(bar))
 anyway, never tried what you're doing there. Did I miss something?

 
 printf("my string is %*.s\n", str.length, str.ptr);
 
 will be a little faster, because it doesn't need to allocate memory.

When I try this in C, it gives a warning about the "length" argument:
field precision should have type 'int', but argument 2 has type 'size_t'

Wouldn't this warning apply to C too, when using the printf function ?
I'm thinking it might need a cast(int) to work properly on 64-bit... ?

--anders

Oct 20 2007

Walter Bright <newshound1 digitalmars.com> writes:

Anders F Bj�rklund wrote:
 Walter Bright wrote:
 
 1) Passing dynamic arrays to printf as in:
     printf("my string is %*.s\n", str);

 Hmmm, currently I use printf("foo %s\n",std.string.toStringz(bar))
 anyway, never tried what you're doing there. Did I miss something?

 printf("my string is %*.s\n", str.length, str.ptr);

 will be a little faster, because it doesn't need to allocate memory.

 
 When I try this in C, it gives a warning about the "length" argument:
 field precision should have type 'int', but argument 2 has type 'size_t'
 
 Wouldn't this warning apply to C too, when using the printf function ?
 I'm thinking it might need a cast(int) to work properly on 64-bit... ?

I think I'm not going to think about it and just deprecate it <g>.

Oct 20 2007

"Chris Miller" <chris dprogramming.com> writes:

On Fri, 19 Oct 2007 23:03:02 -0400, Walter Bright  
<newshound1 digitalmars.com> wrote:

 Currently, arrays are represented under the hood as:

 	size_t lengthOfArray;
 	void* ptrToStartOfArray;

 Which works out reasonably well. The problem is if you want to use array  
 types as the basis of iterators, and you want to step through the array.  
 There's no escaping it being two operations:

 	decrement the length
 	increment the pointer

 This puts a brick in any fast implementation of iterators. To fix that,  
 we can change the representation to:

 	void* ptrToStartOfArray;
 	void* ptrPastEndOfArray;

 Then there's just one increment. Some tests show this can improve loop  
 performance by up to 70%.

It sounds interesting...

I wonder what kind of iterators you have in mind. Why couldn't the  
iterators just hold start and end pointers? It would only need to be  
calculated when creating an iterator from a non-iterator. Or perhaps you  
want a slice to be an iterator itself, so you could just do slice++ and it  
slices out the first element.

Oct 19 2007

Walter Bright <newshound1 digitalmars.com> writes:

Chris Miller wrote:
 I wonder what kind of iterators you have in mind. Why couldn't the 
 iterators just hold start and end pointers? It would only need to be 
 calculated when creating an iterator from a non-iterator. Or perhaps you 
 want a slice to be an iterator itself, so you could just do slice++ and 
 it slices out the first element.

I think you answered it!

Oct 19 2007

"Chris Miller" <chris dprogramming.com> writes:

On Sat, 20 Oct 2007 02:27:29 -0400, Walter Bright  
<newshound1 digitalmars.com> wrote:

 Chris Miller wrote:
 I wonder what kind of iterators you have in mind. Why couldn't the  
 iterators just hold start and end pointers? It would only need to be  
 calculated when creating an iterator from a non-iterator. Or perhaps  
 you want a slice to be an iterator itself, so you could just do  
 slice++ and it slices out the first element.

 I think you answered it!

I'm starting to think the change might not be such a good idea. Imagine  
all the existing code extensively using length. This change is for one  
thing, that could probably just as easily be improved by changing how that  
one thing is implemented, without affecting all this other code. It's not  
even really about supporting legacy usage; the other methods are probably  
mostly quite valid still, and iterators won't solve all those cases.

Also, doing slice++ doesn't allow slice-- safely (not that all forward  
iterators are backwards iterators, but an iterator from a slice should be  
bidirectional). Slice iterators should probably have 3 pointers:  
beginning, end, and current position. Just build an iterator from a slice,  
which is a couple extra cycles up front.

I really want to hear more about what's going on, instead of guessing how  
grand things are going to be, or not going to be. How about another  
newsgroup for possible future plans, like digitalmars.D.alpha

Oct 20 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 10/20/07, Chris Miller <chris dprogramming.com> wrote:
 Or perhaps you
 want a slice to be an iterator itself, so you could just do slice++ and it
 slices out the first element.

It would be very counterintuitive if ("world"++ == "orld"), so I don't
think I'd like that.

In C++ terms, an interator is often just a pointer (plus some operator
overloads), and an array of [ begin, end ]  iterators is called a
range. Looked at like that, an array slice in the new scheme would be
an array of two iterators (i.e. a range), rather than a single
iterator.

Or so it seems to me.

Oct 20 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 10/20/07, Janice Caron <caron800 googlemail.com> wrote:
 On 10/20/07, Chris Miller <chris dprogramming.com> wrote:
 Or perhaps you
 want a slice to be an iterator itself, so you could just do slice++ and it
 slices out the first element.

 It would be very counterintuitive if ("world"++ == "orld"), so I don't
 think I'd like that.

To add to that, iterators in general are not range checked, because
they are generalised pointers, not generalised indeces.

You could argue that perhaps they /should/ be range checked (at least,
in debug builds in D), but to achieve that, you'd need a struct
containing not two, but /three/ pointers - one to the start of the
slice, one to the end of the slice, and one to the thing pointed at.
Otherwise you'd never be able to do --p.

Oct 20 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 10/20/07, Walter Bright <newshound1 digitalmars.com> wrote:
 1) Passing dynamic arrays to printf as in:

         printf("my string is %*.s\n", str);

This will break some of my old code, but none of my new code.

What you could do to fix that is to break it even more (ironically).
Make "printf" a reserved word, or an alias for writef, or /something/
which will cause any use of it to become a compile error.


 So, what do you think?

I had always assumed that, from the start, you designed it as { ptr,
length } deliberately in order to help the garbage collector.

If you change it, then you will be maintaining a pointer (the end
pointer) to something beyond the end of the array. For example,
suppose I do:

    int[] s = new int[4];
    int[] t = new int[1000];

now t.begin might end up being equal to s.end. But now when we do

    t = null;

or t goes out of scope or we otherwise assign t, suddenly that array
can no longer be freed, because the end pointer of s points to it.

(Not that I particularly care. Just pointing it out. Go for it).

Oct 19 2007

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Janice Caron wrote:

 I had always assumed that, from the start, you designed it as { ptr,
 length } deliberately in order to help the garbage collector.
 
 If you change it, then you will be maintaining a pointer (the end
 pointer) to something beyond the end of the array. For example,
 suppose I do:
 
     int[] s = new int[4];
     int[] t = new int[1000];
 
 now t.begin might end up being equal to s.end. 

All array allocations in D allocate an extra byte just to prevent this.

But now when we do
 
     t = null;
 
 or t goes out of scope or we otherwise assign t, suddenly that array
 can no longer be freed, because the end pointer of s points to it.

-- 
Oskar

Oct 19 2007

Walter Bright <newshound1 digitalmars.com> writes:

Janice Caron wrote:
 On 10/20/07, Walter Bright <newshound1 digitalmars.com> wrote:
 1) Passing dynamic arrays to printf as in:

         printf("my string is %*.s\n", str);

 
 This will break some of my old code, but none of my new code.
 
 What you could do to fix that is to break it even more (ironically).
 Make "printf" a reserved word, or an alias for writef, or /something/
 which will cause any use of it to become a compile error.

Just add 'deprecate' <g>.

 
 
 So, what do you think?

 
 I had always assumed that, from the start, you designed it as { ptr,
 length } deliberately in order to help the garbage collector.
 
 If you change it, then you will be maintaining a pointer (the end
 pointer) to something beyond the end of the array. For example,
 suppose I do:
 
     int[] s = new int[4];
     int[] t = new int[1000];
 
 now t.begin might end up being equal to s.end. But now when we do
 
     t = null;
 
 or t goes out of scope or we otherwise assign t, suddenly that array
 can no longer be freed, because the end pointer of s points to it.

No prob, t=null already sets both members to 0.

 
 (Not that I particularly care. Just pointing it out. Go for it).

Oct 20 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 10/20/07, Walter Bright <newshound1 digitalmars.com> wrote:
     int[] s = new int[4];
     int[] t = new int[1000];

 now t.begin might end up being equal to s.end. But now when we do

     t = null;

 or t goes out of scope or we otherwise assign t, suddenly that array
 can no longer be freed, because the end pointer of s points to it.

 No prob, t=null already sets both members to 0.

Both members of t, yes, but it won't stop s.end from pointing to t's
allocated memory.

That said, someone posted that you always allocate an extra byte. If
that's true, problem solved.

Oct 20 2007

Walter Bright <newshound1 digitalmars.com> writes:

Janice Caron wrote:
 That said, someone posted that you always allocate an extra byte. If
 that's true, problem solved.

Yes, an extra byte is allocated for just this reason.

Oct 20 2007

Jascha Wetzel <firstname mainia.de> writes:

Walter Bright wrote:
 Janice Caron wrote:
 That said, someone posted that you always allocate an extra byte. If
 that's true, problem solved.

 
 Yes, an extra byte is allocated for just this reason.

couldn't the end pointer just point to the last element?
iteration could use <=
the length would be end-start+1
etc.

Oct 20 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 10/20/07, Jascha Wetzel <firstname mainia.de> wrote:
 Walter Bright wrote:
 Janice Caron wrote:
 That said, someone posted that you always allocate an extra byte. If
 that's true, problem solved.

 Yes, an extra byte is allocated for just this reason.

 couldn't the end pointer just point to the last element?

Oooh yuk. How horrible. I'd rather that didn't happen. For one thing,
it means that empty arrays (and empty collection ranges generally)
would have to have a different representation.

No, I like the "standard" way of doing things, whereby the end
iterator refences the last+1 element (and therefore cannot be
dereferenced).

Oct 20 2007

Jascha Wetzel <firstname mainia.de> writes:

Janice Caron wrote:
 On 10/20/07, Jascha Wetzel <firstname mainia.de> wrote:
 Walter Bright wrote:
 Janice Caron wrote:
 That said, someone posted that you always allocate an extra byte. If
 that's true, problem solved.

 Yes, an extra byte is allocated for just this reason.

 couldn't the end pointer just point to the last element?

 
 Oooh yuk. How horrible. I'd rather that didn't happen. For one thing,
 it means that empty arrays (and empty collection ranges generally)
 would have to have a different representation.
 
 No, I like the "standard" way of doing things, whereby the end
 iterator refences the last+1 element (and therefore cannot be
 dereferenced).

one can argue, that this way, empty arrays have an unambiguous 
representation (they are always null).
there is enough code like
if ( str is null || str == "" )
which wouldn't be necessary.
the current semantics deal with null arrays in the same way as with 
empty arrays (length == 0, appending works alike). therefore, the 
additional expressiveness one would usually expect (empty array vs. 
invalid array) isn't quite there.
but i still wonder, whether there would be performance issues.

Oct 20 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 10/20/07, Jascha Wetzel <firstname mainia.de> wrote:
 one can argue, that this way, empty arrays have an unambiguous
 representation (they are always null).

I will argue below that this is not a good thing.

Suppose I write a parser function which (say) parses an integer. It
would have a signature like

   bool parseInt(ref string s, out int n)

The idea is that if the function returns true, then the parsed int is
returned in n, and s's pointer is advanced past the bytes consumed.
However, I don't want s.ptr to be set to null, even if we consume all
the bytes. For example, I might want to do a pointer subtraction later
on. Or I might want to be able to extract the pointer because s was
merely a slice of a larger string.

For all sorts of reasons, allowing empty strings to have an address
does make sense.


 there is enough code like
 if ( str is null || str == "" )

I always write if (str.length == 0) to test for empty strings (and
arrays generally). That works just fine, always.

Oct 20 2007

Bill Baxter <dnewsgroup billbaxter.com> writes:

Jascha Wetzel wrote:
 Janice Caron wrote:
 On 10/20/07, Jascha Wetzel <firstname mainia.de> wrote:
 Walter Bright wrote:
 Janice Caron wrote:
 That said, someone posted that you always allocate an extra byte. If
 that's true, problem solved.

 Yes, an extra byte is allocated for just this reason.

 couldn't the end pointer just point to the last element?

 Oooh yuk. How horrible. I'd rather that didn't happen. For one thing,
 it means that empty arrays (and empty collection ranges generally)
 would have to have a different representation.

 No, I like the "standard" way of doing things, whereby the end
 iterator refences the last+1 element (and therefore cannot be
 dereferenced).

 
 one can argue, that this way, empty arrays have an unambiguous 
 representation (they are always null).
 there is enough code like
 if ( str is null || str == "" )
 which wouldn't be necessary.
 the current semantics deal with null arrays in the same way as with 
 empty arrays (length == 0, appending works alike). 


Not quite true.  if(str){} tests only the .ptr part and ignores the 
length (as I quite painfully discovered recently... Sure wish there were 
an easy way to find all the places in my code where I used that 
expression instead of .length==0.).

--bb

Oct 20 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 10/20/07, Bill Baxter <dnewsgroup billbaxter.com> wrote:
 Not quite true.  if(str){} tests only the .ptr part and ignores the
 length

Ooh, that's nasty. I've never tried that.

Seems to me that either (a) if (array) shouldn't work - ie. arrays
shouldn't implicitly cast to bool, or else (b) if (array) should be
true iff the array is non-empty (which is not the same thing as
non-null).

I guess currently it tests whether or not the array is null - that is,
unassigned or explicitly assigned with null. I'm not sure why that's
useful.

Oct 20 2007

BCS <ao pathlink.com> writes:

Reply to Bill,

 Jascha Wetzel wrote:
 
 Janice Caron wrote:
 
 On 10/20/07, Jascha Wetzel <firstname mainia.de> wrote:
 
 Walter Bright wrote:
 
 Janice Caron wrote:
 
 That said, someone posted that you always allocate an extra byte.
 If that's true, problem solved.
 

 Yes, an extra byte is allocated for just this reason.
 

 couldn't the end pointer just point to the last element?
 

 Oooh yuk. How horrible. I'd rather that didn't happen. For one
 thing, it means that empty arrays (and empty collection ranges
 generally) would have to have a different representation.
 
 No, I like the "standard" way of doing things, whereby the end
 iterator refences the last+1 element (and therefore cannot be
 dereferenced).
 

 one can argue, that this way, empty arrays have an unambiguous
 representation (they are always null).
 there is enough code like
 if ( str is null || str == "" )
 which wouldn't be necessary.
 the current semantics deal with null arrays in the same way as with
 empty arrays (length == 0, appending works alike).

 Not quite true.  if(str){} tests only the .ptr part and ignores the
 length (as I quite painfully discovered recently... Sure wish there
 were an easy way to find all the places in my code where I used that
 expression instead of .length==0.).
 
 --bb
 

Ohh. That's another feature for an IDE: Find this pattern in the
syntax/semantic 
tree.

Oct 20 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 10/20/07, Jascha Wetzel <firstname mainia.de> wrote:
 Yes, an extra byte is allocated for just this reason.

 couldn't the end pointer just point to the last element?
 iteration could use <=

Ooh - just noticed this.

The test is normally (p != end). You can't use <= with iterators in
general because, if you did, the iterators would have to implement
opCmp(), which would be an O(N) operation in the case of linked lists.
!= is O(1).

With plain old arrays, for (p=start; p<=end; ++p) would fail with
empty arrays if the empty array was represented as [ null, null ],
because (null <= null) is true, so the for body would get executed
once with p == null. (!)

So there are /lots/ of reasons why end has to point /beyond/ the last byte.

Oct 20 2007

"Janice Caron" <caron800 googlemail.com> writes:

Walter, I think we might want a new property, isEmpty, for arrays (and
collections in general).

See, we can't test for (array == null), because as you know, a
non-null array can still be empty.

The test (array.length == 0) will still work, but since calculating
.length might involve a divide, it's no longer guaranteed to be
optimal. (Also, I can imagine collection in which .length might be
O(N)).

We could test for (array.start == array.end). In fact, that's probably
the best approach, given what you've told us so far. But is there a
guarantee that that would necessarily be the most efficient test for
collections in general? I can imagine collections in which .end might
involve a non-trivial calculation.

.isEmpty would be clean, simple, and obvious.

Oct 20 2007

David Brown <dlang davidb.org> writes:

On Sat, Oct 20, 2007 at 04:44:06PM +0100, Janice Caron wrote:

The test (array.length == 0) will still work, but since calculating
.length might involve a divide, it's no longer guaranteed to be
optimal. (Also, I can imagine collection in which .length might be
O(N)).

The compiler should never generate an actual division for a compile-time
constant.  For any constant division, there is an equivalent multiplication
that will give the same result.  Most modern architectures do multiply
quickly.

It is more work than strictly necessary to determine if the length is zero.
However, (array.length == 0) is a real easy thing for an optimizer to see.
It certainly doesn't justify adding a special construct to the language for
it.  It would be easier to just fix the optimizer if it misses it.

David

Oct 20 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 10/20/07, David Brown <dlang davidb.org> wrote:
 On Sat, Oct 20, 2007 at 04:44:06PM +0100, Janice Caron wrote:

 For any constant division, there is an equivalent multiplication
 that will give the same result.

I hadn't thought about that before, but yes, you're right. The rule
is, if two numbers a and b are relatively prime (that is to say:
gcd(a,b)=1) there is exactly one number c=inv(a) mod b so that c*a mod
b=1. So for what you suggest, b would have to be 2^32 (that is,
uint.max+1) and a the number you want to divide by. In order to be
coprime with b, a may /not/ be even, but that's not a problem because
we can just do shifts until it isn't. So, for example, to divide by 7,
you can just multiply by invmod(7,0x100000000). To divide by 14, you'd
right-shift first, and then divide by 7. Cool!

I wonder if D actually does things this way?


 However, (array.length == 0) is a real easy thing for an optimizer to see.
 It certainly doesn't justify adding a special construct to the language for
 it.

I was thinking ahead. I was assuming we'd want the same semantics for
all collections, not just arrays. C++ STL has an empty() function as
well as a size() function for exactly that reason. For some
collections, calculating .length() will be an O(N) operation, so for
those collections, an isEmpty() functions would make sense. I agree
it's less necessary for a plain old array, but it would help template
metaprogramming enormously if something like isEmpty was available, so
you wouldn't need to know what kind of collection you were dealing
with.

That said, it's easy enough to add as a standalone function if it's
not built in.

Oct 20 2007

David Brown <dlang davidb.org> writes:

On Sat, Oct 20, 2007 at 06:12:07PM +0100, Janice Caron wrote:

I wonder if D actually does things this way?

The gcc backend certainly does.  I don't know about dmd.  I don't have it
handy to check, but it wouldn't be hard to disassemble the output and look.

David

Oct 20 2007

=?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:

Walter Bright wrote:

 This will break some of my old code, but none of my new code.

 What you could do to fix that is to break it even more (ironically).
 Make "printf" a reserved word, or an alias for writef, or /something/
 which will cause any use of it to become a compile error.

 
 Just add 'deprecate' <g>.

Or maybe just move it from object.d and hello.d, over to std.c.stdio ?

:-P

--anders

Oct 20 2007

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Walter Bright wrote:
 Currently, arrays are represented under the hood as:
 
     size_t lengthOfArray;
     void* ptrToStartOfArray;
 
 Which works out reasonably well. The problem is if you want to use array 
 types as the basis of iterators, and you want to step through the array. 
 There's no escaping it being two operations:
 
     decrement the length
     increment the pointer

This has been nagging me too. Doing a = a[1..$]; has never felt very 
efficient. Will there be a different syntax for such idioms?

 What does this break?
 
 1) Passing dynamic arrays to printf as in:
 
     printf("my string is %*.s\n", str);

I'm glad that breaks. That code was never portable anyway, and has 
caused headaches porting D code to other platforms.

 4) It breaks things like cast(ulong)str, if one was crazy enough to do 
 that anyway.

This is something 64-bit D also breaks.

 So, what do you think?

It really shows the power of the property syntax that one can 
transparently change the underlying representation like this. I am all 
for it (as far as I can see now).

Two related things that have been holding me off D2 are waiting for some 
announced code breaking changes that will have quite some impact on how 
code is written (not mentioning const):

1. T[new]
  - Hopefully T[*] (hint ;) )
  - Hopefully a reference type

2. Static arrays becoming value types
  - Being able to return them from functions
  - No longer having to litter the code with incompatible wrapper structs

Are the ETA for those still far away?

-- 
Oskar

Oct 19 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 10/20/07, Oskar Linde <oskar.lindeREM ovegmail.com> wrote:
 1) Passing dynamic arrays to printf as in:

     printf("my string is %*.s\n", str);

 I'm glad that breaks. That code was never portable anyway

I'd be happy for it to break if the break could be detected at compile
time (or even at runtime with an assert). I'm not terrifically happy
that source code that previously generated code that ran just fine
would now generate code that will simply crash with no explanation.

You have to remember that there was once a time when writef() and
family did not exist, and that printf("%.*s") was the /recommended/
way of doing things, documented as such on the Digital Mars site. So
you cannot blame anyone for using that idiom, say, three or four years
ago.

As for not portable, I'm surprised at that. I thought the C printf()
function was fairly standardised? Ah well, it doesn't matter. I'm sure
everyone here agrees that D folk should not use printf anyway, now
that we've got better things.

I'd be all in favor making any use of printf a compile error - say by
making a global function which takes no parameters (forcing coders to
explicity qualify the function name if they wanted to C printf). Note
that this reasoning also applies to vprintf, fprintf, vfprintf,
sprintf, snprintf, etc., etc...

Oct 20 2007

Daniel Keep <daniel.keep.lists gmail.com> writes:

Janice Caron wrote:
 On 10/20/07, Oskar Linde <oskar.lindeREM ovegmail.com> wrote:
 1) Passing dynamic arrays to printf as in:

     printf("my string is %*.s\n", str);

 I'm glad that breaks. That code was never portable anyway

 
 I'd be happy for it to break if the break could be detected at compile
 time (or even at runtime with an assert). I'm not terrifically happy
 that source code that previously generated code that ran just fine
 would now generate code that will simply crash with no explanation.

You're calling a function that completely bypasses the type system, and
you want compile-time checking on it? :S  If you're going to play with
fire (type-unsafe variadic C functions), you shouldn't be surprised when
you get third-degree burns.  ;)

 You have to remember that there was once a time when writef() and
 family did not exist, and that printf("%.*s") was the /recommended/
 way of doing things, documented as such on the Digital Mars site. So
 you cannot blame anyone for using that idiom, say, three or four years
 ago.

This is true, but remember that this is for D 2.0.  If someone honestly
expects to be able to take code written three years ago when the
language was still in flux (and which, mind you, likely won't even
compile with the current 1.0 compiler) and compile it on 2.0 without
issues is, in my opinion, delusional.

 I'd be all in favor making any use of printf a compile error - say by
 making a global function which takes no parameters (forcing coders to
 explicity qualify the function name if they wanted to C printf). Note
 that this reasoning also applies to vprintf, fprintf, vfprintf,
 sprintf, snprintf, etc., etc...

I think that removing it from object so that it can't be used without
explicitly importing it would be sufficient.

Maybe sticking a deprecated attribute on the printf declaration would
also help.  Maybe with a message saying "No, you want writef.  Seriously."

	-- Daniel

Oct 20 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 10/20/07, Daniel Keep <daniel.keep.lists gmail.com> wrote:
 This is true, but remember that this is for D 2.0.  If someone honestly
 expects to be able to take code written three years ago when the
 language was still in flux (and which, mind you, likely won't even
 compile with the current 1.0 compiler) and compile it on 2.0 without
 issues is, in my opinion, delusional.

I would expect it to fail to compile, and I would also expect to be
able to use the compiler errors (or in extreme cases, runtime asserts)
to tell me what needs to be changed. What I don't expect is no
complaints at all, just a runtime, hard to debug, crash. (That's
especially important if you're using code you didn't write yourself).

 think that removing it from object so that it can't be used without
explicitly importing it would be sufficient.

 Maybe sticking a deprecated attribute on the printf declaration would
 also help.  Maybe with a message saying "No, you want writef.  Seriously."

Both are excellent ideas.

Oct 20 2007

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Janice Caron wrote:
 On 10/20/07, Oskar Linde <oskar.lindeREM ovegmail.com> wrote:
 1) Passing dynamic arrays to printf as in:

     printf("my string is %*.s\n", str);

 I'm glad that breaks. That code was never portable anyway

 
 I'd be happy for it to break if the break could be detected at compile
 time (or even at runtime with an assert). I'm not terrifically happy
 that source code that previously generated code that ran just fine
 would now generate code that will simply crash with no explanation.

I guess there could be a warning (or error) passing a D array as a C 
vararg argument.

 You have to remember that there was once a time when writef() and
 family did not exist, and that printf("%.*s") was the /recommended/
 way of doing things, documented as such on the Digital Mars site. So
 you cannot blame anyone for using that idiom, say, three or four years
 ago.

I am well aware of that, and I dont blame anyone for using the idiom. I 
merely blame the recommendation. ;)

 As for not portable, I'm surprised at that. I thought the C printf()
 function was fairly standardised? Ah well, it doesn't matter. I'm sure
 everyone here agrees that D folk should not use printf anyway, now
 that we've got better things.

Printf is standardized. That is not the problem. The problem is that the 
above idiom is a hack relying on a specific stack layout when passing 
vararg arguments to C functions.

-- 
Oskar

Oct 20 2007

Walter Bright <newshound1 digitalmars.com> writes:

Oskar Linde wrote:
 Two related things that have been holding me off D2 are waiting for some 
 announced code breaking changes that will have quite some impact on how 
 code is written (not mentioning const):
 
 1. T[new]
  - Hopefully T[*] (hint ;) )
  - Hopefully a reference type
 
 2. Static arrays becoming value types
  - Being able to return them from functions
  - No longer having to litter the code with incompatible wrapper structs
 
 Are the ETA for those still far away?

Yes.

Oct 20 2007

=?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:

Walter Bright wrote:

 What does this break?
 
 1) Passing dynamic arrays to printf as in:
 
     printf("my string is %*.s\n", str);

I think you meant to write printf("my string is %.*s\n", str);

Which is something of an argument against printf in itself :-)

 which relied on the under-the-hood representation. This doesn't work on 
 some architectures anyway, and is thoroughly obsolete. One could quickly 
 fix such code by writing it as:
 
     printf("my string is %*.s\n", str.length, str.ptr);

Yes, but that change had to be done sooner or later anyway...

Same as with the cast from char[] to ulong, dirty hacks both.

 So, what do you think?

Good riddance.

--anders

Oct 20 2007

Walter Bright <newshound1 digitalmars.com> writes:

Anders F Bj�rklund wrote:
 Walter Bright wrote:
 
 What does this break?

 1) Passing dynamic arrays to printf as in:

     printf("my string is %*.s\n", str);

 
 I think you meant to write printf("my string is %.*s\n", str);
 
 Which is something of an argument against printf in itself :-)

Yes.

 
 which relied on the under-the-hood representation. This doesn't work 
 on some architectures anyway, and is thoroughly obsolete. One could 
 quickly fix such code by writing it as:

     printf("my string is %*.s\n", str.length, str.ptr);

 
 Yes, but that change had to be done sooner or later anyway...
 
 Same as with the cast from char[] to ulong, dirty hacks both.
 
 So, what do you think?

 
 Good riddance.

LOL!

Oct 20 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 10/20/07, Walter Bright <newshound1 digitalmars.com> wrote:

 2) Doesn't break array.length as rvalue, as this is rewritten by the
 compiler as (array.end - array.start).

That's pointer arithmetic, of course, so looked at in pure assembler
terms, we're talking ((array.end - array.start) / element.sizeof)

Is there any performance cost to the divide, or is optimised to be a
right shift (at least where it can be)?

Oct 20 2007

Walter Bright <newshound1 digitalmars.com> writes:

Janice Caron wrote:
 On 10/20/07, Walter Bright <newshound1 digitalmars.com> wrote:
 
 2) Doesn't break array.length as rvalue, as this is rewritten by the
 compiler as (array.end - array.start).

 
 That's pointer arithmetic, of course, so looked at in pure assembler
 terms, we're talking ((array.end - array.start) / element.sizeof)
 
 Is there any performance cost to the divide, or is optimised to be a
 right shift (at least where it can be)?


Divides are expensive. It's mitigated by:
1) most of the time, it's 1
2) most of the rest of the time, it's a power of 2 and optimized to shift
3) some of the rest of the time, it can be skipped as it divides then
multiples by the same amount.

Oct 20 2007

David Brown <dlang davidb.org> writes:

On Sat, Oct 20, 2007 at 01:16:54AM -0700, Walter Bright wrote:

 Divides are expensive. It's mitigated by:
 1) most of the time, it's 1
 2) most of the rest of the time, it's a power of 2 and optimized to shift
 3) some of the rest of the time, it can be skipped as it divides then
 multiples by the same amount.

At least with the gcc backend, divides by constants almost always turn into
multiplies of strange constants, or weird tricks with shifts and
subtracts/adds, depending on what the compiler things will be better.

David

Oct 20 2007

Walter Bright <newshound1 digitalmars.com> writes:

Janice Caron wrote:
 On 10/20/07, Walter Bright <newshound1 digitalmars.com> wrote:
 
 2) Doesn't break array.length as rvalue, as this is rewritten by the
 compiler as (array.end - array.start).

 
 That's pointer arithmetic, of course, so looked at in pure assembler
 terms, we're talking ((array.end - array.start) / element.sizeof)
 
 Is there any performance cost to the divide, or is optimised to be a
 right shift (at least where it can be)?


Divides are expensive. It's mitigated by:
1) most of the time, it's 1
2) most of the rest of the time, it's a power of 2 and optimized to shift
3) some of the rest of the time, it can be skipped as it divides then 
multiples by the same amount.

Oct 20 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 10/20/07, Walter Bright <newshound1 digitalmars.com> wrote:
 What does this break?

 3) It breaks binary compatibility with libraries already compiled.

It also breaks binary compatibility with object files already
compiled. I don't use dsss do I don't know how that will deal, but
anyone using a makefile or some other build system needs to make sure
that .obj files are dependent, not only on the correstponding .d file,
but also on dmd.exe (or platform equivalent), so that the rebuild
automatically updates the .obj file when dmd changes.

That's not Walter's problem of course - it's just something of which
developers need to be aware. Even if your build script does not have
this dependency built in, a simple one-time "clean" build will fix
all.

Oct 20 2007

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Sat, 20 Oct 2007 12:15:14 +0300, Janice Caron <caron800 googlemail.com>
wrote:

 On 10/20/07, Walter Bright <newshound1 digitalmars.com> wrote:
 What does this break?

 3) It breaks binary compatibility with libraries already compiled.

 It also breaks binary compatibility with object files already
 compiled. I don't use dsss do I don't know how that will deal, but
 anyone using a makefile or some other build system needs to make sure
 that .obj files are dependent, not only on the correstponding .d file,
 but also on dmd.exe (or platform equivalent), so that the rebuild
 automatically updates the .obj file when dmd changes.

I guess that could be fixed by changing the name mangling for dynamic arrays.
Then the users will be forced to rebuild their apps (unless they pass dynamic
arrays hidden in void pointers or arrays).

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

Oct 20 2007

Jascha Wetzel <firstname mainia.de> writes:

Walter Bright wrote:
 What does this break?

Since this is a different ABI, the debug symbols need to indicate which 
version is used.
For CodeView i suggest using the language and/or version fields in the 
S_COMPILE symbol, which aren't used by DMD, atm.

Oct 20 2007

Witold Baryluk <baryluk smp.if.uj.edu.pl> writes:

Dnia Fri, 19 Oct 2007 20:03:02 -0700
Walter Bright <newshound1 digitalmars.com> napisa=B3/a:

 Currently, arrays are represented under the hood as:
=20
 	size_t lengthOfArray;
 	void* ptrToStartOfArray;
=20
 Which works out reasonably well. The problem is if you want to use
 array types as the basis of iterators, and you want to step through
 the array. There's no escaping it being two operations:
=20
 	decrement the length
 	increment the pointer
=20
 This puts a brick in any fast implementation of iterators. To fix
 that, we can change the representation to:
=20
 	void* ptrToStartOfArray;
 	void* ptrPastEndOfArray;
=20
 Then there's just one increment. Some tests show this can improve
 loop performance by up to 70%.

What about garbage collector? And pointer that doesn't points to
real data?

If this will be fixed somehow and there will be strategy for
moving collector to change this obscure second pointer to be proper, for
new location of data, then it can be good choice. (especially if this
change of iterator really will increse iteration, and wouldn't give
regressions).


--=20
Witold Baryluk, aleph0

Oct 20 2007

Sean Kelly <sean f4.ca> writes:

Walter Bright wrote:
 Currently, arrays are represented under the hood as:
 
     size_t lengthOfArray;
     void* ptrToStartOfArray;
 
 Which works out reasonably well. The problem is if you want to use array 
 types as the basis of iterators, and you want to step through the array. 
 There's no escaping it being two operations:
 
     decrement the length
     increment the pointer
 
 This puts a brick in any fast implementation of iterators. To fix that, 
 we can change the representation to:
 
     void* ptrToStartOfArray;
     void* ptrPastEndOfArray;
 
 Then there's just one increment. Some tests show this can improve loop 
 performance by up to 70%.
 
 So, what does this not break?
 
 1) Doesn't break array.ptr, this will still work.
 2) Doesn't break array.length as rvalue, as this is rewritten by the 
 compiler as (array.end - array.start).
 3) Doesn't break array.length as an lvalue, as that is handled by the 
 runtime library anyway.
 4) Won't break anything on D 1.0, as it wouldn't get this change.
 5) Won't break array slices, or any of that stuff we love about D arrays.
 
 What does this break?
 
 1) Passing dynamic arrays to printf as in:
 
     printf("my string is %*.s\n", str);
 
 which relied on the under-the-hood representation. This doesn't work on 
 some architectures anyway, and is thoroughly obsolete. One could quickly 
 fix such code by writing it as:
 
     printf("my string is %*.s\n", str.length, str.ptr);

This isn't a compelling reason to keep the current implementation.  As 
Anders pointed out, it doesn't work on 64-bit anyway.

 2) It breaks the internal library support code, but that's my problem.

Yeah, no big deal here either.

 3) It breaks binary compatibility with libraries already compiled. But 
 we expect to break binary compatibility with D 2.0.

Right.

 4) It breaks things like cast(ulong)str, if one was crazy enough to do 
 that anyway.

Though I know the runtime does this, I've never seen any other code that 
does :-)  And the runtime changes are easily fixed anyway.  Heck, GDC 
already has this implemented.

 5) It breaks anything that tries to look at the underlying 
 representation of dynamic arrays - but such code should be rewritten to 
 use .ptr and .length anyway, or slice notation.

Not a compelling reason to keep the old representation, as this is 
really an implementation detail.

My only potential concern would be the impact on slice operations and 
such, but addition and subtraction are so fast that this change 
shouldn't affect anything.  Also, such operations are rarely, if ever, 
performed in tight loops, unlike iterator operations.

vote++


Sean

Oct 20 2007

Knud Soerensen <4tuu4k002 sneakemail.com> writes:

Walter Bright wrote:
 Then there's just one increment. Some tests show this can improve loop
 performance by up to 70%.
 

I say go for it!
I think it is important to break code as soon as possible.
Waiting to break code will only result in more code being broken
at a later time.

Oct 20 2007

Derek Parnell <derek psych.ward> writes:

On Fri, 19 Oct 2007 20:03:02 -0700, Walter Bright wrote:
 So, what do you think?

My first concern was for the speed of accessing the length, either through
.length property or the $ concept.

I've examined my usage of .length and about 80% of the time I'm comparing
it to zero ...

   if (X.length == 0)
   if (X.length > 0) 

About another 10% is comparing the length of one array to another array ...

   if (X.length == Y.length)

Most of the rest is extracting right hand slices (substrings typically) ...

    result = X[$ - a .. $ - b]

Only occasionally do I set the .length

Most of my array iterations use the foreach() construct.

So it seems that it would be beneficial change for me. It certainly won't
break any of my code because I don't try to access the ABI directly, only
through .ptr and .length.

My test for an empty array is to compare the length. My test for an
unassigned array is to test the .ptr value to null. So I think that a good
built-in property would be an .isEmpty concept. 

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

Oct 21 2007

Don Clugston <dac nospam.com.au> writes:

Walter Bright wrote:
 Currently, arrays are represented under the hood as:
 
     size_t lengthOfArray;
     void* ptrToStartOfArray;
 
 Which works out reasonably well. The problem is if you want to use array 
 types as the basis of iterators, and you want to step through the array. 
 There's no escaping it being two operations:
 
     decrement the length
     increment the pointer
 
 This puts a brick in any fast implementation of iterators. To fix that, 
 we can change the representation to:
 
     void* ptrToStartOfArray;
     void* ptrPastEndOfArray;

This seems to be a more powerful construction than .ptr + .length. If you know 
the length, it's usually trivial to calculate the end, but the converse is not 
necessarily true.
My first reaction is, does it generalise to multi-dimensional arrays? (I'm 
hoping we'll still get them eventually, though not necessarily as built-in
types).
Haven't yet thought through it properly, but I think it's probably _better_.
It means that a 'strided slice' (eg, every 4th element) could be implemented 
without mucking around with the length element. It's easy to test if two such 
arrays overlap.
Many operations can be done entirely at compile time, simply using a cast.

votes++;
I'm less sure about the use of slices as iterators though.

Oct 21 2007

Graham St Jack <grahams acres.com.au> writes:

If we are considering changing the internal representation of dynamic 
arrays, maybe we should also consider adding a pointer to "end of 
allocation", making it three pointers in total.

The extra pointer could be used by the concatenation operators to 
dramatically reduce the number of reallocs that occur when doing ~= 
appends to a dynamic array - by only realloc-ing when capacity runs out, 
and allocating more capacity than immediately needed.

I currently have to use a more heavy-weight container class (like a 
Vector) if I don't want to pay the performance cost of repeated 
reallocs, and it seems to me that cranking up the "out of the box" 
performance of dynamic arrays when concatenating would be good.

Oct 21 2007

David Brown <dlang davidb.org> writes:

On Mon, Oct 22, 2007 at 09:55:12AM +0930, Graham St Jack wrote:

 The extra pointer could be used by the concatenation operators to 
 dramatically reduce the number of reallocs that occur when doing ~= appends 
 to a dynamic array - by only realloc-ing when capacity runs out, and 
 allocating more capacity than immediately needed.

D already makes use of this information in the GC.  In fact, allocations
are always powers of 2, up until the page size, and then in full pages.
Try the following:

import std.stdio;
import std.gc;

void main ()
{
     char[] text;

     for (int i = 0; i < 256; i++) {
         text ~= 'a';
         writefln ("len = %d, alen = %d", text.length, capacity (text.ptr));
     }
}

Oct 21 2007

Graham St Jack <grahams acres.com.au> writes:

David Brown wrote:
 On Mon, Oct 22, 2007 at 09:55:12AM +0930, Graham St Jack wrote:
 
 The extra pointer could be used by the concatenation operators to 
 dramatically reduce the number of reallocs that occur when doing ~= 
 appends to a dynamic array - by only realloc-ing when capacity runs 
 out, and allocating more capacity than immediately needed.

 
 D already makes use of this information in the GC.  In fact, allocations
 are always powers of 2, up until the page size, and then in full pages.
 Try the following:
 
 import std.stdio;
 import std.gc;
 
 void main ()
 {
     char[] text;
 
     for (int i = 0; i < 256; i++) {
         text ~= 'a';
         writefln ("len = %d, alen = %d", text.length, capacity (text.ptr));
     }
 }

Cool - thanks.

Oct 21 2007

"Kris" <foo bar.com> writes:

Can anyone explain why the compiler/iterators/foreach would insist on using 
the two operations Walter notes below (decrement the length, increment the 
pointer) instead of first converting to a pointer pair?

Seem like this is a concern that has more than one way to approach it?

- kris



"Walter Bright" <newshound1 digitalmars.com> wrote in message 
news:ffbr56$r40$1 digitalmars.com...
 Currently, arrays are represented under the hood as:

 size_t lengthOfArray;
 void* ptrToStartOfArray;

 Which works out reasonably well. The problem is if you want to use array 
 types as the basis of iterators, and you want to step through the array. 
 There's no escaping it being two operations:

 decrement the length
 increment the pointer

 This puts a brick in any fast implementation of iterators. To fix that, we 
 can change the representation to:

 void* ptrToStartOfArray;
 void* ptrPastEndOfArray;

 Then there's just one increment. Some tests show this can improve loop 
 performance by up to 70%.

 So, what does this not break?

 1) Doesn't break array.ptr, this will still work.
 2) Doesn't break array.length as rvalue, as this is rewritten by the 
 compiler as (array.end - array.start).
 3) Doesn't break array.length as an lvalue, as that is handled by the 
 runtime library anyway.
 4) Won't break anything on D 1.0, as it wouldn't get this change.
 5) Won't break array slices, or any of that stuff we love about D arrays.

 What does this break?

 1) Passing dynamic arrays to printf as in:

 printf("my string is %*.s\n", str);

 which relied on the under-the-hood representation. This doesn't work on 
 some architectures anyway, and is thoroughly obsolete. One could quickly 
 fix such code by writing it as:

 printf("my string is %*.s\n", str.length, str.ptr);

 2) It breaks the internal library support code, but that's my problem.

 3) It breaks binary compatibility with libraries already compiled. But we 
 expect to break binary compatibility with D 2.0.

 4) It breaks things like cast(ulong)str, if one was crazy enough to do 
 that anyway.

 5) It breaks anything that tries to look at the underlying representation 
 of dynamic arrays - but such code should be rewritten to use .ptr and 
 .length anyway, or slice notation.

 So, what do you think?

Oct 21 2007

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Walter Bright" wrote
 2) Doesn't break array.length as rvalue, as this is rewritten by the 
 compiler as (array.end - array.start).

Yes, but checking the length (which is something I do often in my code) now 
becomes an operation rather than a memory load.  So you are swapping a 
performance hit in one place with one in another.

 What does this break?

 1) Passing dynamic arrays to printf as in:

 printf("my string is %*.s\n", str);

Absolutely could not care less :)

 So, what do you think?

I think this change is unnecessary for iterators, why do we need to change 
the array representation?  An iterator is a pointer in a sequence of values. 
It does not need to know where in the sequence of values it is, so why would 
you represent an iterator with 2 pointers?  If you make the case that it's 
for bounds checking, then you really need 3 pointers to check for reverse 
iterating past the beginning.  I'll also point out that you are now adding 
the cost of copying two values when passing an iterator around instead of 
one pointer, so using pointers will be faster anyways.

Why not have an iterator type, defined as the property of an array, i.e.:

char[].iterator is a type, which is aliased to char *.

Then you could have properties like:

char[].iterator begin() { return ptr}
char[].iterator end() { return ptr + length}

This is very similar to C++, and makes the most sense to me, and provides 
the most speed out of anything.
You could do something similar with AA's, but the type would have to be a 
struct probably.

It wouldn't even pollute the namespace/require keywords because they are 
properties of the arrays themselves.

-Steve

Oct 22 2007

"Kris" <foo bar.com> writes:

Yeah, I'm with Steve on this. There's more than one way to skin this cat, 
and have solid performance all round instead of potentially trading one for 
another.


"Steven Schveighoffer" <schveiguy yahoo.com> wrote in message 
news:ffij0u$290f$1 digitalmars.com...
 "Walter Bright" wrote
 2) Doesn't break array.length as rvalue, as this is rewritten by the 
 compiler as (array.end - array.start).

 Yes, but checking the length (which is something I do often in my code) 
 now becomes an operation rather than a memory load.  So you are swapping a 
 performance hit in one place with one in another.

 What does this break?

 1) Passing dynamic arrays to printf as in:

 printf("my string is %*.s\n", str);

 Absolutely could not care less :)

 So, what do you think?

 I think this change is unnecessary for iterators, why do we need to change 
 the array representation?  An iterator is a pointer in a sequence of 
 values. It does not need to know where in the sequence of values it is, so 
 why would you represent an iterator with 2 pointers?  If you make the case 
 that it's for bounds checking, then you really need 3 pointers to check 
 for reverse iterating past the beginning.  I'll also point out that you 
 are now adding the cost of copying two values when passing an iterator 
 around instead of one pointer, so using pointers will be faster anyways.

 Why not have an iterator type, defined as the property of an array, i.e.:

 char[].iterator is a type, which is aliased to char *.

 Then you could have properties like:

 char[].iterator begin() { return ptr}
 char[].iterator end() { return ptr + length}

 This is very similar to C++, and makes the most sense to me, and provides 
 the most speed out of anything.
 You could do something similar with AA's, but the type would have to be a 
 struct probably.

 It wouldn't even pollute the namespace/require keywords because they are 
 properties of the arrays themselves.

 -Steve

Oct 22 2007

D Programming

C/C++ Programming

Other

digitalmars.D - Change representation of dynamic arrays?