digitalmars.D - 'Aliasing problem' and D

reply Dave <Dave_member pathlink.com> writes:
I'm very new to D (literally as of yesterday), but am very impressed with what
I'm seeing so far.

Being that I want this language to succeed and an important part of that will be
performance potential over C, I'm curious - how does/will D deal with the
pointer 'aliasing problem' that plagues C and C++ compiler developers?

From what little I've seen so far, it seems that this same problem has been
'forced' on D by its backward compatibility with C libraries and C/C++-like
support for pointers.

IMHO, any language that seeks to replace C/C++ should do its best to avoid this
problem, or at least discourage code that introduces it.
Aug 01 2004
next sibling parent reply Sha Chancellor <schancel pacific.net> writes:
In article <ceis7r$23bt$1 digitaldaemon.com>,
 Dave <Dave_member pathlink.com> wrote:

 I'm very new to D (literally as of yesterday), but am very impressed with
 what I'm seeing so far.
 
 Being that I want this language to succeed and an important part of that will
 be performance potential over C, I'm curious - how does/will D deal with the
 pointer 'aliasing problem' that plagues C and C++ compiler developers?
 
 From what little I've seen so far, it seems that this same problem has been
 'forced' on D by its backward compatibility with C libraries and C/C++-like
 support for pointers.
 
 IMHO, any language that seeks to replace C/C++ should do its best to avoid
 this problem, or at least discourage code that introduces it.

What aliasing problem would you be referring to? (No, D does not deal with it with denial; it's an honest question :)
Aug 01 2004
next sibling parent reply Dave <Dave_member pathlink.com> writes:
In article <schancel-8833C7.08301601082004 digitalmars.com>, Sha Chancellor
says...
What aliasing problem would you be referring to? (No, D does not deal
with it with denial; it's an honest question :)

I should have been more specific - sorry. Code like this: extern int i;
Aug 01 2004
parent reply Dave <Dave_member pathlink.com> writes:
In article <ceja1n$28um$1 digitaldaemon.com>, Dave says...
I should have been more specific - sorry.

Code like this:

extern int i;

Bah - this was cut off somehow when I posted it, I'll try again..

Code like this:

;---
#include <stdio.h>

extern int i;
int i;  // definition, so the snippet links on its own

void func( int &ri )
{
    for( int j = 0; j < 10; j++ ) {
        ri++;
        i++;
    }
}

int main()
{
    i = 10;
    func(i);
    printf("%d\n",i);
    return 0;
}
;---

where 'ri' refers to 'i', the compiler has to keep an in-memory copy instead of binding 'i' to a register, for example. I guess other optimization opportunities are lost as well, even if the aliasing doesn't happen in the program, unless the compiler is sophisticated enough to keep track of variables like this (which of course adds compile time, complexity, bugs, etc..).

C99 has the 'restrict' keyword, but many aren't happy with that solution. More on that here:

http://www.cbau.freeserve.co.uk/Compiler/RestrictPointers.html

This 'aliasing problem' is often referred to as a reason that FORTRAN (for example) is easier to write optimizers for.
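
A minimal C sketch of the same situation, with a pointer in place of the C++ reference so that C99 'restrict' can be shown next to it. The name func_restrict is made up for this illustration, and whether a given compiler really keeps the counters in registers depends on the compiler and its flags:

```c
/* Global counter, standing in for the 'i' of the example above. */
int i;

/* Plain pointer: ri may point at i, so after every store the compiler
   has to assume the other counter changed and go back to memory. */
void func(int *ri)
{
    for (int j = 0; j < 10; j++) {
        (*ri)++;
        i++;
    }
}

/* C99 restrict: the caller promises that the object behind ri is reached
   only through ri inside this function, so both counters may be kept in
   registers. Calling func_restrict(&i) breaks that promise and is
   undefined behaviour. */
void func_restrict(int *restrict ri)
{
    for (int j = 0; j < 10; j++) {
        (*ri)++;
        i++;
    }
}
```

With i set to 10, func(&i) leaves i at 30, because each pass increments the same object twice. That is the aliased case the compiler has to allow for in func, and the case the programmer must rule out by hand before using func_restrict.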
Aug 01 2004
parent reply Dave <Dave_member pathlink.com> writes:
Was this that stupid of a question or what? Seriously..

Anyone want to take a crack at it? Walter??

The reason I think this may be important for D is because a) a lot of scientific
computing types aren't real happy with FORTRAN 95, b) they aren't real happy
with C++ for many of the same reasons we aren't, c) they aren't real happy with
Java (say what you will on the Java performance issue, but when you have to
write C code in an OOP language like Java to make it perform decently, it is
not suitable for HPC.) and d) so, all this leaves it wide open for D to step in
and make them happy, what with its good support for native FP and built-in
complex types and all. Plus, it was designed by a compiler developer ;)

IMHO, get the HPC folks (both end-users and tool vendors) interested in an OOP
language that has the potential to crunch numbers fast and you have yourself a
good foothold on the language market.

Thanks..

Aug 03 2004
next sibling parent reply Sean Kelly <sean f4.ca> writes:
In article <ceobns$1akh$1 digitaldaemon.com>, Dave says...
Was this that stupid of a question or what? Seriously..

If I understand your question, the D version would be this:

int i;

void func( inout int ri )
{
    for( int j = 0; j < 10; j++ ) {
        ri++;
        i++;
    }
}

int main()
{
    i = 10;
    func(i);
    printf("%d\n",i);
    return 0;
}
Aug 03 2004
parent reply Dave <Dave_member pathlink.com> writes:
In article <ceoc3l$1apl$1 digitaldaemon.com>, Sean Kelly says...
int i;

void func( inout int ri )
{
for( int j = 0; j < 10; j++ ) {
ri++;
i++;
}
}

int main()
{
i = 10;
func(i);
printf("%d\n",i);
return 0;
}

Yes, and it runs as the C version nominally would.

Walter, back here: D/28904 you mention
 What Fortran has over C is the 'noalias' on function parameters which allows
 for aggressive optimization. What I'm thinking of is writing the spec for D
 functions so that parameters are always 'noalias' (for extern (C) functions
 this would not apply).

Is that or some other resolution to aliasing still under consideration??

Others - how much current code would that break?? Would it have a big effect on the DTL implementation for example?

Thanks..
Aug 06 2004
parent reply Nick <Nick_member pathlink.com> writes:
In article <cf0l3e$q9p$1 digitaldaemon.com>, Dave says...
In article <ceoc3l$1apl$1 digitaldaemon.com>, Sean Kelly says...
int i;

void func( inout int ri )
{
for( int j = 0; j < 10; j++ ) {
ri++;
i++;
}
}

int main()
{
i = 10;
func(i);
printf("%d\n",i);
return 0;
}


Others - how much current code would that break?? Would it have a big effect on
the DTL implementation for example?

I can't say if it would break anything, but I do like what you're proposing. A coding style like the one above is IMO a very bad one.

I was a bit surprised that the above worked as it did; I thought func() would make a local copy and "return" it when done with it. Not doing so could create some unexpected effects, for example:

# import std.stdio;
#
# void foo(inout int a, inout int b)
# {
#   writefln(a);
#   b++;
#   writefln(a); // One might expect a to be unchanged at this point
# }
#
# void main()
# {
#   int i = 10;
#   foo(i,i);
# }

This outputs 10 and 11. I think you should be guaranteed that local variables are not altered by "external" influences during the course of executing a function. And if you really want the above behavior, use pointers.

Nick
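
The two behaviours being contrasted here can be sketched in C, with pointers standing in for D's inout parameters. Both function names are invented for the illustration, and the copy-in/copy-out variant is only one reading of "make a local copy and return it when done":

```c
/* Reference semantics, as the D example actually behaves:
   a and b may name the same object. */
int foo_by_ref(int *a, int *b)
{
    int before = *a;
    (*b)++;
    return *a - before;   /* 1 when a and b alias, 0 otherwise */
}

/* Copy-in/copy-out: work on private copies, write them back at the end. */
int foo_copy_in_out(int *a, int *b)
{
    int la = *a;          /* copy in */
    int lb = *b;
    int before = la;
    lb++;
    int seen = la - before;   /* always 0: la cannot change behind our back */
    *a = la;              /* copy out, a first and b second */
    *b = lb;
    return seen;
}
```

Called as foo_by_ref(&i, &i), the function observes its own increment through a. The copy-in/copy-out version does not, but when both arguments name the same variable the final value of i then depends on the order of the two write-backs, which is one reason such calls are usually forbidden rather than defined.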
Aug 06 2004
next sibling parent Dave <Dave_member pathlink.com> writes:
In article <cf0ocn$sg3$1 digitaldaemon.com>, Nick says...
In article <cf0l3e$q9p$1 digitaldaemon.com>, Dave says...
In article <ceoc3l$1apl$1 digitaldaemon.com>, Sean Kelly says...

I can't say if it would break anything, but I do like what you're proposing. A coding style like the one above is IMO a very bad one.

I agree - and the C code I've been working with recently has quite a few LOC like that, and I hope future D code will not ;)

The biggest thing for me, though, is the compiler optimization opportunities that disappear for many functions that have an argument list like that (if C-style aliasing is allowed, that is). I think what Walter said here D/28904 would do D a lot of good.

Thanks.

PS: I'm new to D and these newsgroups, and I gotta say I am very impressed with both. Here's to a grand weekend for everybody..3..2..1..it is now Miller Time for this cat!
Aug 06 2004
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
In article <cf0ocn$sg3$1 digitaldaemon.com>, Nick says...
I was a bit surprised that the above worked as it did, I thought func() would
make a local copy and "return" it when done with it. Not doing so could create
some unexpected effects, for example:

# import std.stdio;
#
# void foo(inout int a, inout int b)
# {
#   writefln(a);
#   b++;
#   writefln(a); // One might expect a to be unchanged at this point
# }
#
# void main()
# {
#   int i = 10;
#   foo(i,i);
# }

This outputs 10 and 11. I think you should be guaranteed that local variables
are not altered by "external" influences during the course of executing a
function. And if you really want the above behavior, use pointers.

I don't know. I would consider it a very bad idea to pass the same variable as multiple parameters of a function with side effects. This is a case where I don't think it's necessary for the compiler to protect the programmer from himself. Besides, I may not always want to pay for the extra instructions and such that this would require.

Sean
Aug 06 2004
parent Jay <Jay_member pathlink.com> writes:
In article <cf0qis$tlh$1 digitaldaemon.com>, Sean Kelly says...
I don't know.  I would consider it a very bad idea to pass the same variable as
multiple parameters of a function with side effects.  This is a case where I
don't think it's necessary for the compiler to protect the programmer from
himself.  Besides, I may not always want to pay for the extra instructions and
such that this would require.

Potential pointer aliasing crops up in all sorts of mundane situations. Anywhere the compiler cannot maintain strict bounds on a pointer's domain, it has to assume it could be pointing anywhere--your current loop termination variable, for example.

Imagine you've got a performance-critical "for" loop over "0" to "n" that calls a function, which calls a function, which calls a function, which writes to a pointer. Now imagine that the compiler, for whatever reason, cannot maintain strict bounds on the address range of that pointer, which could possibly be pointing at the stack. What should have been a very tight "for" loop has just become much less optimizable, because "n" might have been altered by a three-deep nested function that happens to write to a pointer for which strict bounds cannot be determined.

It's a scary situation, and some compilers have an option to ignore pointer aliasing during optimization, which I always enable (if I remember to). I agree it is a serious problem and it sure would be nice to address somehow, if possible.
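
A cut-down C sketch of the loop-bound case, with the chain of nested calls collapsed into a single store through a pointer. The function names are hypothetical. The bound and the output share the type int on purpose, so that type-based alias rules cannot be used to rule the overlap out:

```c
/* The bound is read through a pointer that the stores might hit,
   so *n has to be re-read before every comparison. */
void zero_reload(int *out, const int *n)
{
    for (int k = 0; k < *n; k++)
        out[k] = 0;
}

/* The bound is copied to a local first. Nothing in the loop can reach
   the local, so the loop stays tight, but the function now behaves
   differently if out really does overlap *n. */
void zero_cached(int *out, const int *n)
{
    const int count = *n;
    for (int k = 0; k < count; k++)
        out[k] = 0;
}
```

If n happens to point at out[0], zero_reload stops after one store because it has just overwritten its own bound, while zero_cached carries on to the original count. Copying the bound by hand is the usual workaround, and it is only safe when the programmer knows the overlap cannot occur.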
Aug 17 2004
prev sibling parent reply Ben Hinkle <bhinkle4 juno.com> writes:
Not a stupid question at all. See previous threads, for example 
 digitalmars.D/5274
Norbet and Walter (and probably others) had a nice discussion about future
possibilities. Walter doesn't like "restrict" much either.

Aug 03 2004
next sibling parent J C Calvarese <jcc7 cox.net> writes:
In article <ceoed5$1btb$1 digitaldaemon.com>, Ben Hinkle says...
Not a stupid question at all. See previous threads, for example 
 digitalmars.D/5274
Norbet and Walter (and probably others) had a nice discussion about future
possibilities. Walter doesn't like "restrict" much either.

Also, I found some even older threads using "aliasing problem" at http://www.digitalmars.com/advancedsearch.html:

http://www.digitalmars.com/d/archives/1913.html
http://www.digitalmars.com/d/archives/333.html

(I think they relate to your topic, but I'm not 100% sure.)

jcc7
Aug 03 2004
prev sibling parent reply Dave <Dave_member pathlink.com> writes:
Doh! CTFA (Check the fargin' Archives). Sorry, and thanks for the responses and
pointers to the archives.

Here are some from back a while.. Can anyone tell me if the non-overlapping
array prohibition is still part of the spec.?

D/348
D/1926
D/18534
D/18543
D/17971

The following deal with exactly what my original post had in mind:

D/28260, to quote Walter:

"Historically, adding in special keywords for such optimizations has not worked
out well. That's why I was thinking of making it implicit for D function
parameters."

I AGREE - let's do that (make it implicit, but warn in debug mode), or is it too
late?

D/28377, to quote Walter again:

"I think it would be better to have the compiler assume they are not aliased
(since that is by far the usual case) and have to say when they are not aliased.
Also, a runtime check that they really are not aliased might be appropriate in
debug mode."

I, again, wholeheartedly AGREE.

I'm with Drew here: D/28904

What is the state of this as far as D goes?

God Bless you Walter, I can see you put a lot of thought into memory layout for
and vectorizing of arrays and such, and also the aliasing issue.

Hopefully these ideas (non-overlapping arrays and implicit no-alias with debug
warning) have stood the test of time and v1 implementation so far. Because if
they have, I think it covers a lot of the issue for not only HPC code, but also
many other things nowadays, like writing high-throughput socket code, database
engines, AI engines, speech synthesis, etc., etc., etc...

It seems like Walter wants to give both us and compiler implementors a great
high-performance base to work with..

Not only that, but consider this: In these days of cool, productive and decent
performance interpreted languages like Perl, Python, etc. not to mention Java,
many people are just not going to switch just because of excellent new features
(most people think "their" language has enough features - after all, they've
been able to "get by with it so-far").

Now, if you give them:

- High-performance on the order of Fortran,
- Intuitive (implicit aliasing like C is NOT intuitive to most people using
Perl, VB, Java, etc. and therefore is the source of a lot of bugs to them),
- True OOP language with all the features of D,

THAT in total is a great reason to switch.

How about the compiler developers?? Many of them would quit en-masse if you told
them they had to implement C++ all over again, except with MORE features ;)

Thanks for all of the pointers to the archived messages.

- Dave

Aug 03 2004
parent reply Ilya Minkov <minkov cs.tum.edu> writes:
Dave wrote:

 Doh! CTFA (Check the fargin' Archives). Sorry, and thanks for the responses and
 pointers to the archives.

No problem. It just takes time and threads get forgotten, because the newsgroup grows too fast for anyone to follow.
 Here are some from back a while.. Can anyone tell me if the non-overlapping
 array prohibition is still part of the spec.?

As far as I know, it is with whole-array operations (they are not implemented yet, iirc). It is not specified in which order the elements would be processed, and thus they must be unaliased... I don't think it holds true of functions in general.

One solution might be a sort of non-overlapping assert, which would be standard and would both keep us safe and enable optimizations. BTW, asserts usually do enable optimizations in D - in release mode, their trueness is assumed for optimization, although the assert code itself is left out.

Even if it's not there, the whole-array operations would probably carry the bulk of performance optimization by utilizing the SIMD units. The check itself is technically very simple and doesn't consume time, since arrays are pointer & length and one only has to check their overlapping. Since C doesn't have a concept of an array, I can hardly imagine how such a check could be done.
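
To show how small the pointer-and-length overlap check is, here is a sketch in C. The struct int_slice is a hypothetical stand-in for a D dynamic array, not the layout of any real D runtime, and the code assumes a platform that provides uintptr_t:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a D dynamic array: a pointer plus a length. */
struct int_slice {
    int   *ptr;
    size_t len;
};

/* True when the two slices share at least one element. The addresses are
   compared as integers because ordering pointers into unrelated objects
   with '<' is not defined in C. */
bool slices_overlap(struct int_slice a, struct int_slice b)
{
    if (a.len == 0 || b.len == 0)
        return false;               /* an empty slice overlaps nothing */

    uintptr_t a_begin = (uintptr_t)a.ptr;
    uintptr_t a_end   = a_begin + a.len * sizeof(int);
    uintptr_t b_begin = (uintptr_t)b.ptr;
    uintptr_t b_end   = b_begin + b.len * sizeof(int);

    return a_begin < b_end && b_begin < a_end;
}
```

The check costs two comparisons per pair of arrays, independent of their length, which fits the remark above that it would not consume noticeable time in a debug build.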
 D/348
 D/1926
 D/18534
 D/18543
 D/17971

Phew, you do some investigation work. :)
 The following deal with exactly what my original post had in mind:
 
 D/28260, to qoute Walter: 
 
 "Historically, adding in special keywords for such optimizations has not worked
 out well. That's why I was thinking of making it implicit for D function
 parameters."
 
 I AGREE - let's do that (make it implicit, but warn in debug mode), or is it
 too late?

No, I don't think things like that are late. The compiler doesn't look like it's approaching the spec very soon, and code gets broken every now and then.
 D/28377, to qoute Walter again:
 
 "I think it would be better to have the compiler assume they are not aliased
 (since that is by far the usual case) and have to say when they are not
 aliased. Also, a runtime check that they really are not aliased might be
 appropriate in debug mode."
 
 I, again, wholeheartedly AGREE.

Hmmm... And what would one do when one is willing to accept aliased arrays? And is there any necessity in this case anyway? What I see in front of my eyes is a function which takes 2 arrays as input and outputs a new one. I understand that there should be no aliasing if it were to output into one of the inputs, but aliasing between the inputs would be OK if the output is a new array.

You mention Fortran doesn't have the problem. How does Fortran deal with aliasing?

If I remember correctly, Sather has some way to identify possible aliasing hazards statically, and to allow aliasing if nothing speaks against it, though I might be wrong - perhaps it was just planned. Sather is a whole-program compiler. D is geared towards whole-program compilation, but should also work without it. C++ is too weak for all that.
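
The "two inputs, one fresh output" case can be expressed in C99 by putting 'restrict' on the output only. This is a sketch with an invented function name, not something taken from the D proposals discussed in this thread:

```c
#include <stddef.h>

/* Only the output is restrict-qualified: everything written through out
   must be reached through out alone. The inputs are only read, so a and b
   are free to overlap each other, or even to be the same array. */
void vec_add(int *restrict out, const int *a, const int *b, size_t n)
{
    for (size_t k = 0; k < n; k++)
        out[k] = a[k] + b[k];
}
```

Calling vec_add(r, x, x, n) to double an array into a separate buffer is therefore fine, while vec_add(x, x, y, n) would break the promise made by 'restrict'.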
Aug 10 2004
parent reply "Carlos Santander B." <carlos8294 msn.com> writes:
"Ilya Minkov" <minkov cs.tum.edu> wrote in message
news:cfasdk$30jv$1 digitaldaemon.com
| You mention Fortran doesn't have the problem. How does Fortran deal with
| aliasing?

Fortran doesn't have pointers, so there's no aliasing.

-----------------------
Carlos Santander Bernal
Aug 11 2004
parent reply Dave <Dave_member pathlink.com> writes:
In article <cfdm6s$126s$1 digitaldaemon.com>, Carlos Santander B. says...
"Ilya Minkov" <minkov cs.tum.edu> wrote in message
news:cfasdk$30jv$1 digitaldaemon.com
| You mention Fortran doesn't have the problem. How does Fortran deal with
| aliasing?

Fortran doesn't have pointers, so there's no aliasing.

No, it's more innocuous than that even..

This f77 compiled with g77 (all I have):

      I=1
      CALL FOO(I,I)
      PRINT *,I
      END

      SUBROUTINE FOO(J,K)
c       D could have a debug build runtime 'assert' placed by the compiler here
c       checking if J & K referenced the same variable, along the same lines as
c       array bounds checking in D
      J = J + K
      K = J * K
      PRINT *, J, K
      END

produces the 'intuitively wrong' results through aliasing w/o any compiler or runtime warnings:

4 4
4

The way (g77 at least) handles it is it expects you to follow the specs. that say not to write code like that.

The proposal I suggested a little earlier today would not 'noalias' pointers or the inside of 'common' data like arrays, only value types and the built-in arrays passed by ref. (with a debug runtime warning as illustrated above, just like what happens with array bounds checking for debug builds now).

- Dave
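
A rough C rendering of the Fortran subroutine, with the proposed debug check written out by hand. In the proposal the compiler would insert the check itself; the assert below only mimics that idea, and both function names are invented for this sketch:

```c
#include <assert.h>

/* Direct translation of FOO: arguments are passed by reference,
   and nothing stops the caller from passing the same variable twice. */
void foo_unchecked(int *j, int *k)
{
    *j = *j + *k;
    *k = *j * *k;
}

/* Same body, guarded the way a debug build might guard it. The assert
   disappears when NDEBUG is defined, much as array bounds checks
   disappear in a release build. */
void foo_checked(int *j, int *k)
{
    assert(j != k && "foo_checked: j and k must not refer to the same variable");
    *j = *j + *k;
    *k = *j * *k;
}
```

With i set to 1, foo_unchecked(&i, &i) leaves i at 4, matching the g77 output above. The same call to foo_checked aborts in a debug build instead of silently producing the surprising value.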
Aug 11 2004
parent reply "Carlos Santander B." <carlos8294 msn.com> writes:
"Dave" <Dave_member pathlink.com> wrote in message
news:cfdr1n$15ba$1 digitaldaemon.com
| No, it's more innocuous than that even..
|
| This f77 compiled with g77 (all I have):
|
| I=1
| CALL FOO(I,I)
| PRINT *,I
| END
|
| SUBROUTINE FOO(J,K)
| c       D could have a debug build runtime 'assert' placed by the compiler here
| c       checking if J & K referenced the same variable, along the same lines as
| c       array bounds checking in D
| J = J + K
| K = J * K
| PRINT *, J, K
| END
|
| produces the 'intuitively wrong' results through aliasing w/o any compiler or
| runtime warnings:
|
| 4 4
| 4
|
| The way (g77 at least) handles it is it expects you to follow the specs. that
| say not to write code like that.
|
| The proposal I suggested a little earlier today would not 'noalias' pointers
| or the inside of 'common' data like arrays, only value types and the built-in
| arrays passed by ref. (with a debug runtime warning as illustrated above, just
| like what happens with array bounds checking for debug builds now).
|
| - Dave

Sorry about what I said: I had read it here on this ng, so I just repeated it.

I have a copy of Salford FTN77 Compiler (4.03, Personal Edition. Used to be
free, not anymore I think), and it produced the exact same result until I used
the "/UNSAFE" flag. From the help file: "/UNSAFE: Used in conjunction with
/OPTIMISE in order to improve the execution speed of certain programs by using
code re-arrangement techniques". With that flag I got 2's instead of 4's.
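
One plausible explanation for the 2's, offered as a guess rather than anything confirmed by the FTN77 help text: an optimizer that assumes J and K never alias may load each argument once and keep it in a register. Written out by hand in C, with a made-up function name, that looks like this:

```c
/* What FOO might effectively become when the compiler assumes its two
   arguments never refer to the same variable: each is loaded once,
   worked on in a local, and stored back at the end. */
void foo_assume_noalias(int *j, int *k)
{
    int jv = *j;      /* single load of J */
    int kv = *k;      /* single load of K */
    jv = jv + kv;     /* J = J + K */
    kv = jv * kv;     /* K = J * K, using the old, cached K */
    *j = jv;
    *k = kv;
}
```

Starting from 1 and passing the same variable for both arguments, jv becomes 2 and kv becomes 2 * 1 = 2, so the variable ends up as 2 instead of 4. For distinct arguments the result is the same as the straightforward version, which is why the assumption is safe for conforming programs.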

-----------------------
Carlos Santander Bernal
Aug 11 2004
parent reply "Carlos Santander B." <carlos8294 msn.com> writes:
"Carlos Santander B." <carlos8294 msn.com> wrote in message
news:cfejqa$1g7i$1 digitaldaemon.com
| I have a copy of Salford FTN77 Compiler (4.03, Personal Edition. Used to be
| free, not anymore I think)

It's still free. And their FTN95 Personal Edition is also free.
Just in case someone's interested...

-----------------------
Carlos Santander Bernal
Aug 11 2004
parent Dave <Dave_member pathlink.com> writes:
In article <cfek28$1gbg$1 digitaldaemon.com>, Carlos Santander B. says...
"Carlos Santander B." <carlos8294 msn.com> wrote in message
news:cfejqa$1g7i$1 digitaldaemon.com
| I have a copy of Salford FTN77 Compiler (4.03, Personal Edition. Used to be
| free, not anymore I think)

It's still free. And their FTN95 Personal Edition is also free.
Just in case someone's interested...

Might be! Thanks.. - Dave
-----------------------
Carlos Santander Bernal

Aug 11 2004
prev sibling parent reply Dave <Dave_member pathlink.com> writes:
In article <schancel-8833C7.08301601082004 digitalmars.com>, Sha Chancellor
says...
In article <ceis7r$23bt$1 digitaldaemon.com>,
 Dave <Dave_member pathlink.com> wrote:

 I'm very new to D (literally as of yesterday), but am very impressed with 
 what I'm seeing so far.
 
 Being that I want this language to succeed and an important part of that will 
 be performance potential over C, I'm curious - how does/will D deal with the
 pointer 'aliasing problem' that plagues C and C++ compiler developers?


What aliasing problem would you be referring to? (No, D does not deal
with it with denial; it's an honest question :)

--- Thump, thump - up on the soapbox --- I hate to say it, but it's appearing more and more like it is being dealt with by denial ;) I've asked the question a couple of times in a couple of different ways and it seems that none of the 'principles' of D design and implementation can or will give me an answer on future direction [or even future plans] for this issue. And all the while more code (that a change may break) is being written, large efforts are underway to develop big libraries (like the DTL, parts of which which may break) and version 1.0 is just around the corner (which more-or-less cements the issue for a while, or maybe forever - we are still living issues like this down with C/C++). Look, FWIW, I'm smitten with D. Absolutely love what I've been able to absorb in a week's worth of late nights. I think Walter is a gutsy genious. Matthew as well - brilliant guy, investing tons of time and stomach lining on developing the DTL and writing about and promoting D. And I'm impressed with all the work and input all the others have put into this language and this newsgroup. What I'm saying here is all in the spirit of trying to make sure D is being used by more than a couple thousand Die harD D Developers in the coming years. I want this language to succeed because I'm sick and tired of basic inadequecies in C and its derivitive languages. Although I personally hate ocaml, I don't want to see D end up like that - lot's of promise, little future (although I'm damn glad I'm not forced by popularity to use it. It may be intuituve if you speak french, but I don't and loath that language [ocaml, not french] ;). I work 70+ hrs. a week, have a family that needs attention and STILL want to help D anyway I can, but I simply will not start working (as opposed to playing) with D until fundamental issues like aliasing behaviour, runtime allocated rectangular arrays, etc. are worked out better. 
All the OOP stuff is great, but it's damn hard to write, improve and _trust the future of_ code developed with a language that hasn't nailed down even more fundamental issues when there are many other good choices out there already that have mature compilers, libraries, etc. It appears to me (given the versioning scheme of dmc, etc.) that Walter has a different - maybe the correct, who knows - take on versioning than the rest of us. In other words, v1.0 seems like more of an increment than a milestone to him. Problem is almost no one else outside of this community sees it that way. If D takes off (and I'm convinced it won't unless some basic issues like 'the aliasing problem' are taken care of) the D community may have to live with features/results of v1.0 for a long time, even if v1.0 itself is obsoleted in a month. I like what I've seen so far, but the aliasing issue is one of the underlying reasons WHY C and C++ need to be REPLACED, and are not used for whole classes of applications. Jagged dynamic arrays being the only built-in choice for runtime allocated arrays is another big one. I hate to rain on the parade here, but I think this language will really suffer in both popularity and somewhat in utility w/o a decent resolution to these two issues (and there may be others as well, but these really stand out). Now a days, most people will simply not adopt another natively compiled language over C/C++/C# and Java until there is a clear performance advantage in doing so and compiler vendors will not support a language where there is primarily C++ style OOP wrapped in different semantics. Face it folks, in the mainstream OOP performance arena C++ is king and Java and C# are making inroads. Ada is the odd man out here, but may well be a better choice than any of those technically. You will win few of those people over, but not many. 
In the C and Fortran worlds, there is just no way that they will switch until D proves itself a better performer, or at least there is now a reason to believe from the language design that it can perform the same (Fortran) or better (C). D fans - you gotta see it as most of the rest of the world will see it when a new project comes along. When you think of it, it is not hard at all to imagine this dialog: Geek: "Boss, there is this new language out there called D that I really like. Howz about we give it a shot for our next project." PHB: "Ummm, well, why?" Geek: "Because it's really cool and the compiler is free!" PHB: "Ok, you've just described about 20 other languages that I can read about and download on the net. Now give me some real good reasons or go back to your C++ IDE, or if you are you still tuning that Java app., finish that.. I have a tee time in a 1/2 hr." Geek: "Ummm, well I can be maybe about 20% more productive with it, unless the unproven compiler and libraries give me problems." PHB: "Hmmm, sounds more risky than being near the fairway when I tee off.. How about performance?" Geek: "Ummm, well the compilers are young yet, but should give me performance on par with Java and maybe reasonably close to C++, at least for some things. But as I say, it's young yet.". PHB: "Ok, I can understand that every great tool has to start somwhere and the productivity sounds enticing. You know how picky some users and the development team can be about stuff like performance.. Is there a potential for this language to provide us with better performance once the tools mature - can we finally get rid of those damn Fortran legacy apps. we're always hiring consultants for"? Geek: "Umm, well, hmmmm, uhhh, well, I guess not, not really anyway. Ok, I'll level with you - it inherited basically about the same performance potential as C++ has, but, but, well it has some cool built-ins and auto-GC". PHB: "Hmph, well Ok, GC is cool. How useful are the built-ins". 
Geek: (Shit, I knew he'd ask that) "Well, for specialized high-performance needs, we can bypass the GC if needed and build our own classes."

PHB: "Sounds like what we've already done and tested with C++."

Geek: "Hmmm, yep."

PHB: (Pats Geek on the shoulder) "Ok, we'll talk about it some other time (once I get counsel from the finger-in-the-wind oracle). In the meantime, get back out there and bust your ass with C++ or finish tuning that Java app. until we sign that contract with that Indian company, would ya."

Geek: (Mumbling under his breath) "[ok - just try not to hit the CEO in the butt with your golf ball again, dork.]"

If you are going to try and win mindshare for a high-performance language, then you have to pay attention to basic issues like this, and do your best to get it right the first time even if the release of the big 1.0 is delayed a while. It doesn't take a rocket scientist to figure out that mimicking C/C++ aliasing and jagged arrays is NOT THE WAY TO GO for a language aspiring to win developers away from it.

I could friggen kick myself for not finding out about D earlier so I could have bitched then. And I'm guessing that many reading this (if any) would like to kick me now.. But anyhow...

PHP is a good example of how to promise and deliver on a language to initially fill a good size niche and then expand to win more mind-share. They didn't try to mimic Java (as used for that class of applications) or ASP on performance or features. What that community started with, promised and delivered on was an easy to learn and use dynamic website scripting language with most of the useful features and decent performance. I think D will need to do something similar as far as expectation management.

To win away developers and tool vendors from C/C++/Java/C# and maybe even Fortran, some basic underlying design of the language has to be different enough to attract them.
IMO, adding more and more, or slightly improving existing, OOP features is not the way to go. Adding high-performance oriented semantics to the language along with /high-performance/ built-ins is the way to go.

In this group and in the archives, I've seen /A LOT/ of talk, argument and mindshare wasted on how this or that OOP feature doesn't exactly coincide with what they expected/wanted/whatever, but comparatively little on the performance of the language and tools or changing the semantics to support high-performance compiler development. The worst part of this, given the squeaky wheel truism, is that Walter has been driven to pay so much attention to OOP nits that the other more basic stuff hasn't gotten the attention it deserves.

Since this is a new language, I really think the most basic needs of the underlying language should be tended to first, don't you??

--- THUMP off the soapbox.. ---

If you made it here, thanks for taking the time to read this..

- Dave

PS: I post more-or-less anonymously to these newsgroups. Rest assured it hasn't been because I'm somehow gutless when it comes to expressing my opinions, defending them or having them attributed to me forever. A family member was recently the victim of ID theft, and I'm a bit wary of any info. I put out into the public domain, including e-mail addresses, etc. (the ID theft modus-operandi included use of an e-mail address).
Aug 08 2004
next sibling parent reply J C Calvarese <jcc7 cox.net> writes:
You obviously have passion and knowledge about the issue of the "Aliasing 
Problem" that far exceeds my interest. I hope that D doesn't disappoint 
you in this respect, but I fear that it will. Nonetheless, I do have a 
few minor points I'd like to make.

Dave wrote:
 In article <schancel-8833C7.08301601082004 digitalmars.com>, Sha Chancellor
 It appears to me (given the versioning scheme of dmc, etc.) that Walter has a
 different - maybe the correct, who knows - take on versioning than the rest of
 us. In other words, v1.0 seems like more of an increment than a milestone to
 him. Problem is almost no one else outside of this community sees it that way.
 If D takes off (and I'm convinced it won't unless some basic issues like 'the
 aliasing problem' are taken care of) the D community may have to live with
 features/results of v1.0 for a long time, even if v1.0 itself is obsoleted in a
 month.

Unless Walter has told you something that he hasn't told anyone else, I don't think anyone knows when DMD 1.0 is coming. I know DMD 0.98 has just been released, but unless Walter slows the releases way down I can't believe we're a couple releases away from DMD 1.0. I think we're going to see a 0.101, 0.102, 0.103, etc. before 1.0 appears. This "floating point" vs. "major.minor" controversy hasn't been commented on by Walter, but Walter hasn't indicated that he wants to release anything other than a polished D 1.0 that will make D look good and stable.
 I like what I've seen so far, but the aliasing issue is one of the underlying
 reasons WHY C and C++ need to be REPLACED, and are not used for whole classes of
 applications. Jagged dynamic arrays being the only built-in choice for runtime
 allocated arrays is another big one. 
 
 I hate to rain on the parade here, but I think this language will really suffer
 in both popularity and somewhat in utility w/o a decent resolution to these two
 issues (and there may be others as well, but these really stand out). Now a
 days, most people will simply not adopt another natively compiled language over
 C/C++/C# and Java until there is a clear performance advantage in doing so and

I'm not particularly concerned about the performance of C/C++. I care about the ease of programming in those languages (or lack thereof). I don't like Java because of the performance issues and the requirement of using OOP, but the people who like Java usually don't mind the performance loss and enjoy the OOP aspects. If they're looking for better performance, they'll either like the D syntax or they won't. I doubt they'd switch to D just because it's x percent faster than C/C++. But I'm just guessing.
 compiler vendors will not support a language where there is primarily C++ style
 OOP wrapped in different semantics. Face it folks, in the mainstream OOP
 performance arena C++ is king and Java and C# are making inroads. Ada is the odd
 man out here, but may well be a better choice than any of those technically. You
 will win few of those people over, but not many. In the C and Fortran worlds,
 there is just no way that they will switch until D proves itself a better
 performer, or at least there is now a reason to believe from the language design
 that it can perform the same (Fortran) or better (C).

Honestly, I don't think D is even targeted at Fortran programmers. Or Cobol programmers. It'd be great if D could appeal to programmers of every language, but I really don't think we're going to get many Visual Basic converts either. Or Caml or Haskell. D is designed to be what C++ should have been. C programmers that don't like C++ because it's too complicated should like D. C++ programmers who don't think that C is powerful enough should like D. That's the niche that D is targeting. If the libraries become powerful enough so that Java, C#, and Delphi programmers come aboard that's great, but the grand vision has never been "one language to rule them all". Just the best all-around language yet (perhaps I overstate the goal?). Yes, performance is important, but so is the ease of programming.
 Since this is a new language, I really think the most basic needs of the
 underlying language should be tended to first, don't you??

If you're building a house and essentially all you have left to do before you sell it is install the carpet, is that a good time to pull all of the incandescent light fixtures and replace them with fluorescent lights? I think you're talking about big changes and recently Walter has been talking about being done with the design. If you were making these suggestions a year ago, you might have found more interest. But even then it could have been too late. Walter's been designing D for several years.
 
 --- THUMP off the soapbox.. ---
 
 If you made it here, thanks for taking the time to read this..

Thanks for posting. Maybe Walter can think of an easy way to adjust the Aliasing to allay your performance fears. We'd all like D to be the best language, but there are different ideas of how that's accomplished. With the increasing speed and memory of new hardware, performance issues aren't always the most important issue. And I think Walter's current goal is for DMD to work right and he'll work on getting it to be "leaner and meaner" later.
 
 - Dave
 
 PS: I post more-or-less anonymously to these newsgroups. Rest assured it hasn't
 been because I'm somehow gutless when it comes to expressing my opinions,
 defending them or having them attributed to me forever. A family member was
 recently the victim of ID theft, and I'm a bit wary of any info. I put out into
 the public domain, including e-mail addresses, etc. (the ID theft modus-operandi
 included use of an e-mail address).

A few people around here post more anonymously than you do; I wouldn't feel bad about it.

-- 
Justin (a/k/a jcc7)
http://jcc_7.tripod.com/d/
Aug 08 2004
parent reply Dave <Dave_member pathlink.com> writes:
In article <cf66ab$350$1 digitaldaemon.com>, J C Calvarese says...
You obviously have passion and knowledge about the issue of the "Aliasing 
Problem" that far exceeds my interest. I hope that D doesn't disappoint 
you in this respect, but I fear that it will. Nonetheless, I do have a 
few minor points I'd like to make.

Passion yes, knowledge, maybe just a little bit. I'm no compiler writer, heck I'm not even a numerics guy. The reason I care about it is because D currently adopts something bad from C that I think could finally be done away with w/o a lot of pain. And I hate paying for something many times (in terms of clean code generation) that is there to accommodate a bare minority of the code.

Who knows - maybe the reason Walter is not commenting on this thread is because he's got it figured out or he no longer thinks it as important as he once did in the grand scheme of things (check the archives or my earlier posts pointing to some of these archives if you're curious about that).
just been released, but unless Walter slows the releases way down I 
can't believe we're a couple releases away from DMD 1.0. I think we're 
going to see a 0.101, 0.102, 0.103, etc. before 1.0 appears. This 
"floating point" vs. "major.minor" controversy hasn't been commented on 
by Walter, but Walter hasn't indicated that he wants to release anything 
other than a polished D 1.0 that will make D look good and stable.

Well, that's encouraging at least. Two things made me think we were heading to 1.0. First, I believe there was some talk of a release date of March 04 in the archives. Obviously we are way past that. Second, the version number scheme that is used for other DM products indicates 1.0 is a couple of releases away unless they start being numbered 0.99.1, 0.99.2, etc. or I guess 1.0 alpha1, 1.0 alpha2, 1.0 beta1, etc. could be used. But you're right - none of this is set in stone.
I'm not particularly concerned about the performance of C/C++. I care 
about the ease of programming in those languages (or lack thereof).

I don't like Java because of the performance issues and the requirement 
of using OOP, but the people who like Java usually don't mind the 

I'm confused - you don't care about C++ performance, but do about Java performance? The truth is that, except for probably some (important) OOP stuff like generics and others, a few Java runtimes are closing the gap with top C++ compilers in both benchmarks and more than just some 'real world' code, from what I've heard from fellow developers who've used both. That's another reason I want D to be superior in performance, or at this stage, at least superior in performance /potential/.
performance loss and enjoy the OOP aspects. If they're looking for 
better performance, they'll either like the D syntax or they won't. I 
doubt they'd switch to D just because it's x percent faster than C/C++. 
But I'm just guessing.

Right now, according to some benchmarks I've run, D seems a good margin slower for some things, especially when the built-in strings and AA's are used. For example char[] concatenation in a tight loop like this:

;---
D:

import std.string;
import std.stream;

int main()
{
    char[] input, output;

    output = "<TABLE><TR><TD>\n";
    File f = new File("some_large_file",FileMode.In);
    while(!f.eof()) {
        input = f.readLine();
        output ~= input;
        output ~= "</TD><TD>\n";
    }
    f.close();

    output ~= "</TD></TR></TABLE>\n";

    printf("output length: %d\n",output.length);
    return(0);
}

output length: 6944735
real    0m37.747s
user    0m31.840s
sys     0m3.180s
;---
C++:

#include <string>
#include <fstream>

using namespace std;

int main()
{
    string input, output;

    output = "<TABLE><TR><TD>\n";
    ifstream f("some_large_file",fstream::in);
    while(getline(f,input)) {
        output += input;
        output += "</TD><TD>\n";
    }
    f.close();

    output += "</TD></TR></TABLE>\n";

    printf("output length: %d\n",output.length());
    return(0);
}

output length: 6944735
real    0m0.311s
user    0m0.260s
sys     0m0.040s
;---

is a lot slower than basic_string<> in C++.

Now I know about OutBuffer, but people will be drawn to using char[] for this, just like they are drawn to Java String instead of StringBuffer or StringBuilder (Java v1.5). Here's the OutBuffer version:

import std.string;
import std.stream;
import std.outbuffer;

int main()
{
    char[] input;
    OutBuffer output = new OutBuffer();

    output.write("<TABLE><TR><TD>\n");
    File f = new File("some_large_file",FileMode.In);
    while(!f.eof()) {
        input = f.readLine();
        output.write(input);
        output.write("</TD><TD>\n");
    }
    f.close();

    output.write("</TD></TR></TABLE>\n");

    printf("output length: %d\n",output.toString().length);
    return(0);
}

output length: 6944735
real    0m5.217s
user    0m2.640s
sys     0m2.280s
;---

C++ is still 10x faster, and the Java version would probably be not far behind C++ from what I've seen.

Like it or not, many people will choose whether or not to give a language a 2nd look based on benchmarks. 
_Especially_ when the competition is C/C++ or Java.
Honestly, I don't think D is even targeted at Fortran programmers. Or 
Cobol programmers.

Maybe not Cobol programmers, but check the archives on the Fortran question. While maybe not an explicit goal, it seems to be a sincere wish of Walter and others that D would be considered as a replacement for Fortran. It is one of the areas C/C++ falls short. One of the reasons why C/C++ falls short here is [you guessed it] 'the aliasing problem'.
It'd be great if D could appeal to programmers of every language, but I 
really don't think we're going to get many Visual Basic converts 
either. Or Caml or Haskell.

Caml or Haskell I agree with, but that is probably a pretty small minority. Visual Basic I strongly disagree with. Apparently many of those people have decided to switch to Java or C# when VB went the .NET route. D would be right up their alley.
D is designed to be what C++ should have been. C programmers that don't 
like C++ because it's too complicated should like D. C++ programmers who 

That's the point - C++ should have handled the aliasing problem (among other C weaknesses) differently, but the principal designers of that language were too damn busy fighting over arcane OOP issues (sound familiar? ;).
If you're building a house and essentially all you have left to do 
before you sell it is install the carpet, is that a good time to pull 
all of the incandescent light fixtures and replace them with 
fluorescent lights? I think you're talking about big changes and 
recently Walter has been talking about being done with the design. If 
you were making these suggestions a year ago, you might have found more 
interest. But even then could have been too late. Walter's been 
designing D for several years.

Funny you should mention that. Something like it happened to a house I had built. It turns out some of the electrical would not pass inspection so they had to change that before they could lay the carpet (true story). And I /may not/ be talking about changes that are all that huge or drastically hard to implement either. It's certainly worth more discussion IMHO, which is why I keep at this thread.
 If you made it here, thanks for taking the time to read this..

Thanks for posting. Maybe Walter can think of an easy way to adjust the Aliasing to allay your performance fears.

I hope so.
of how that's accomplished. With the increasing speed and memory of new 
hardware, performance issues aren't always the most important issue. And 
I think Walter's current goal is for DMD to work right and he'll work on 
getting it to be "leaner and meaner" later.

Sorry, but I have to call you on that.. Many have been saying similar for years and it seems that software complexity always fills the gap, and then some. Java fans are quoted as saying the same ad nauseam in the early days and even now to some extent. In the eight or so years since Java took off, people are still complaining about it and /both/ machines and Java runtimes are quite a bit more performant now than they were then.

I've heard tell that a machine costing under about $5000 hasn't even been built yet that can run the next version of Windows with generally acceptable performance. MS is literally betting the farm that Moore's Law will continue unimpeded so they have something to run what they sell. It's a good bet, but still a bet.

Maybe with a D spec. that takes care of the aliasing issue, they could have written the new Windows with D and I could run it on my trusty 'old' Pentium 4 machine, and spend the $3000 I'd save on a new Harley ;)

- Dave
Aug 09 2004
parent reply Ben Hinkle <bhinkle4 juno.com> writes:
[mega-snip]

 ;---
 D:
 
 import std.string;
 import std.stream;
 
 int main()
 {
 char[] input, output;
 
 output = "<TABLE><TR><TD>\n";
 File f = new File("some_large_file",FileMode.In);
 while(!f.eof()) {
 input = f.readLine();
 output ~= input;
 output ~= "</TD><TD>\n";
 }
 f.close();
 
 output ~= "</TD></TR></TABLE>\n";
 
 printf("output length: %d\n",output.length);
 return(0);
 }

Have you tried using a BufferedFile? The default File is unbuffered. There are a number of posters who think the default File should be buffered and I'm starting to agree - just because people seem to assume it is buffered.
 output length: 6944735
 real    0m37.747s
 user    0m31.840s
 sys     0m3.180s
 ;---

[mega-snip]
Aug 09 2004
next sibling parent reply Berin Loritsch <bloritsch d-haven.org> writes:
Ben Hinkle wrote:
<snip>

 Have you tried using a BufferedFile? The default File is unbuffered. There
 are a number of posters who think the default File should be buffered and
 I'm starting to agree - just because people seem to assume it is buffered.

I would be careful with this. Something I have found with Java IO Streams is that buffering can backfire--and if there is no way to turn it off then the developer is stuck. Let me give an example:

In a web environment you need to handle as many requests as you can at one time. This is key to scalability. Initial testing might suggest that using a BufferedInputStream for file IO would speed up the request/response time on the server. Then later, when you are doing load testing, you find that the extra KB or so of RAM taken up by the buffer is adding up quickly and your system starts falling apart at the seams due to the heavy load on the memory system.

This is something I have been bitten by in the past. I would be surprised to find it only limited to Java. The solution to use unbuffered IO or even a greatly reduced buffer size helped the scalability of the webapp.
Aug 09 2004
next sibling parent reply "Stratus" <vdai spamnet.net> writes:
Really shouldn't waste the effort in replying, but this post is so full of
FUD ... Puhhhleez ! I will, however, not pollute this decent topic further
by pointing out exactly how much fallacy is involved. Instead, I'll try to
assume it's an offbeat attempt at humor.

"Berin Loritsch" <bloritsch d-haven.org> wrote in message
news:cf8c0q$qnr$1 digitaldaemon.com...
 I would be careful with this.  Something I have found with Java IO
 Streams is that buffering can backfire--and if there is no way to
 turn it off then the developer is stuck.  Let me give an example:

 In a web environment you need to handle as many requests as you can
 at one time.  This is key to scalability.  Initial testing might
 suggest that using a BufferedInputStream for file IO would speed up
 the request/response time on the server.  Then later, when you are
 doing load testing, you find that the extra KB or so of RAM taken
 up by the buffer is adding up quickly and your system starts falling
 apart at the seams due to the heavy load on the memory system.

 This is something I have been bitten by in the past.  I would be
 surprised to find it only limited to Java.  The solution to use
 unbuffered IO or even a greatly reduced buffer size helped the
 scalability of the webapp.

Aug 09 2004
parent reply Berin Loritsch <bloritsch d-haven.org> writes:
Stratus wrote:

 Really shouldn't waste the effort in replying, but this post is so full of
 FUD ... Puhhhleez ! I will, however, not pollute this decent topic further
 by pointing out exactly how much fallacy is involved. Instead, I'll try to
 assume it's an offbeat attempt at humor.

Actually, it is an anecdote of history.
 
 "Berin Loritsch" <bloritsch d-haven.org> wrote in message
 news:cf8c0q$qnr$1 digitaldaemon.com...
 
I would be careful with this.  Something I have found with Java IO
Streams is that buffering can backfire--and if there is no way to
turn it off then the developer is stuck.  Let me give an example:

In a web environment you need to handle as many requests as you can
at one time.  This is key to scalability.  Initial testing might
suggest that using a BufferedInputStream for file IO would speed up
the request/response time on the server.  Then later, when you are
doing load testing, you find that the extra KB or so of RAM taken
up by the buffer is adding up quickly and your system starts falling
apart at the seams due to the heavy load on the memory system.

This is something I have been bitten by in the past.  I would be
surprised to find it only limited to Java.  The solution to use
unbuffered IO or even a greatly reduced buffer size helped the
scalability of the webapp.


Aug 09 2004
parent Berin Loritsch <bloritsch d-haven.org> writes:
Berin Loritsch wrote:

 Stratus wrote:
 
  Really shouldn't waste the effort in replying, but this post is so full of
  FUD ... Puhhhleez ! I will, however, not pollute this decent topic further
  by pointing out exactly how much fallacy is involved. Instead, I'll try to
  assume it's an offbeat attempt at humor.

Actually, it is an anecdote of history.

Besides, what's wrong with supplying both an UnbufferedFile and a BufferedFile? You have the flexibility when you need it. I assume you don't want to take the time to discover how little fallacy is involved. Or how that particular premature optimization bit me.
Aug 09 2004
prev sibling parent J C Calvarese <jcc7 cox.net> writes:
Berin Loritsch wrote:

 Ben Hinkle wrote:
 <snip>
 
  Have you tried using a BufferedFile? The default File is unbuffered. There
  are a number of posters who think the default File should be buffered and
  I'm starting to agree - just because people seem to assume it is buffered.

I would be careful with this. Something I have found with Java IO Streams is that buffering can backfire--and if there is no way to turn it off then the developer is stuck. Let me give an example:

I read "the default File should be buffered" to mean both are allowed and it's the developer's choice. Just because one is given a preferential name doesn't mean that the other isn't allowed. No need for alarm. ;)

-- 
Justin (a/k/a jcc7)
http://jcc_7.tripod.com/d/
Aug 09 2004
prev sibling parent Dave <Dave_member pathlink.com> writes:
In article <cf8b1l$qb0$1 digitaldaemon.com>, Ben Hinkle says...
Have you tried using a BufferedFile? The default File is unbuffered. There
are a number of posters who think the default File should be buffered and
I'm starting to agree - just because people seem to assume it is buffered.

Here is that along with a similar Java version. Thanks for the tip on BufferedFile().

;----------
D version:

import std.string;
import std.stream;
import std.outbuffer;

int main()
{
    char[] input;
    OutBuffer output = new OutBuffer();

    output.write("<TABLE><TR><TD>\n");
    BufferedFile f = new BufferedFile("some_large_file",FileMode.In);
    while(!f.eof()) {
        input = f.readLine();
        output.write(input);
        output.write("</TD><TD>\n");
    }
    f.close();

    output.write("</TD></TR></TABLE>\n");

    printf("output length: %d\n",output.toString().length);
    return(0);
}
;---

output length: 6944735
real    0m0.767s
user    0m0.710s
sys     0m0.060s

;;-----------------------
Java version:

import java.io.*;
import java.util.*;
import java.text.*;

public class html
{
    public static void main(String[] args)
    {
        String input;
        StringBuffer output = new StringBuffer();

        output.append("<TABLE><TR><TD>\n");
        try {
            FileReader f = new FileReader("some_large_file");
            BufferedReader in = new BufferedReader(f);
            while ((input = in.readLine()) != null) {
                output.append(input);
                output.append("</TD><TD>\n");
            }
            f.close();
        } catch (IOException e) {
            System.err.println(e);
            return;
        }
        output.append("</TD></TR></TABLE>\n");

        System.out.println("output length: " + output.toString().length());
    }
}
;---

# /usr/java/j2sdk1.4.2_05/bin/javac html.java
# time /usr/java/j2sdk1.4.2_05/bin/java -server html

output length: 6944735
real    0m1.881s
user    0m0.780s
sys     0m0.050s
Aug 09 2004
prev sibling next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Sun, 8 Aug 2004 20:48:36 +0000 (UTC), Dave wrote:

 In article <schancel-8833C7.08301601082004 digitalmars.com>, Sha Chancellor
 says...
In article <ceis7r$23bt$1 digitaldaemon.com>,
 Dave <Dave_member pathlink.com> wrote:

 I'm very new to D (literally as of yesterday), but am very impressed with 
 what I'm seeing so far.
 
 Being that I want this language to succeed and an important part of that will 
 be performance potential over C, I'm curious - how does/will D deal with the
 pointer 'aliasing problem' that plagues C and C++ compiler developers?


What aliasing problem would you be referring too?  ( No D does not deal 
with it with denial, it's an honest question :)

--- Thump, thump - up on the soapbox --- I hate to say it, but it's appearing more and more like it is being dealt with by denial ;)

[big snip]

I think I understand what the "aliasing" issue is. What I can't quite see is what you think the resolution should be. Are you asking for the compiler to detect (potential?) aliasing situations and issue an error message? Or is it a run-time solution you are requiring?

If someone on my team coded something like the example you gave, I'd have them rewrite that potentially dangerous code into something a *lot* more sane. Even if updating a shared variable absolutely had to be coded, I'd insist on a runtime check, along the lines of ...

int i;

void func( inout int ri )
{
    if (&ri != &i)
    {
        for( int j = 0; j < 10; j++ )
        {
            ri++;
            i++;
        }
    }
    else
        throw new Error("Aliased ri/i");
}

int main()
{
    i = 10;
    func(i);
    printf("%d\n",i);
    return 0;
}

//--------------Though this is a safer way...

int i;

int func( in int ri )
{
    for( int j = 0; j < 10; j++ )
    {
        ri++;
        i++;
    }
    return ri;
}

int main()
{
    int res;
    i = 10;
    i = func(i);
    printf("i=%d\n",i);
    return 0;
}
//-----------------

For a compiler to detect aliasing and abort when found, is a dangerous route. It assumes that the compiler absolutely knows what is in the mind of the coder. Who's to say that the aliasing situation is not the desired one in some circumstances? E.g. to demonstrate it to students.

I think Walter saying that it is better to assume that variables are not aliased is the better way to go. And if the coder writes code that can possibly result in it happening, then it is their responsibility to check for it.

-- 
Derek
Melbourne, Australia
9/Aug/04 9:42:21 AM
Aug 08 2004
parent Dave <Dave_member pathlink.com> writes:
In article <cf6f15$557$1 digitaldaemon.com>, Derek Parnell says...
On Sun, 8 Aug 2004 20:48:36 +0000 (UTC), Dave wrote:
[big snip]

I think I understand what the "aliasing" issue is. What I can't quite see is what you think the resolution should be. Are you asking for the compiler to detect (potential?) aliasing situations and issue an error message? Or is it a run-time solution you are requiring?

What I'm specifically thinking of is something very close to what Walter mentions here: D/28215

Strictly speaking of aliasing problems, since I think non-overlapping array slices are covered by the spec. already, non-aliasing function params would give the most bang for the buck because this is where it messes up optimization the most and probably where it is easiest to check.

I think it would be do-able for a compile-time check to be made on out/inout function/method parameters for native, built-in and object data types, but /not/ pointers, and not for extern (C) functions. That way, the compiler could be 'sure' that it could aggressively optimize for these types of functions and not worry about side effects. I think this would also have a lot of utility because D coding won't require pointers so much as C and to some extent C++. And of course, it may keep programmers from stepping on each others' (heck, their own) code quite a bit for large programs with quite a few modules.

I think the above would be a reasonably workable solution for the following types of situations, because the scoping, out/inout params. and 'import' functionality of D gives the compiler the visibility it needs to determine aliasing like this.

Ok, I know the following is simplistic and doesn't cover all situations, but something like this is what I'm talking about.

;---
objx.d:

class ObjX
{
    int varY;
    int varZ;
}

main.d:

import objx;

int i;
ObjX x, y, z;

// The compiler would "only" need to track variables declared outside
// a functions scope of the type(s) passed through out/inout param(s).
// and accessed in a function. The scope of all variables inside a
// function has to be known when parsing the function, right?
// Otherwise an "undefined identifier" error would occur.
void foo(inout int ri)
{
    ri++;
    i++;
}
// Function _D5main3fooFKivz stored in a linked list attribute in the symbol
// table for variable _D5main1ii

void bar(out int ri, inout int rj)
{
    ri--;
    rj++;
}

void baz(inout ObjX ox)
{
    ox.varY *= 10;
    x.varZ--;
}

class A
{
    int j;
}

class B : A
{
    int k;
    this() { this.x = new ObjX; }
    void foobar(out int z)
    {
        k++;
        z = k / 10;
    }
    ObjX x;
}

B b;

void foobaz(inout A a)
{
    b.k / a.j;
}

int a[];

void snafu(inout int arr[])
{
    arr[] = 10;
    for(int i; i < 10; i++)
    {
        a[i] /= arr[i];
    }
}

void main(char[][] args)
{
    foo(i);        // Compile error: i is accessed in foo(); can't be passed byref
    bar(i,i);      // Compile error: i is passed by ref. for more than 1 param.

    x = new ObjX();
    baz(x);        // Error: x is accessed in baz()
    y = x;         // Stored in symbol table for y as currently referring to x
    FuncY();       // Error: x is referred to by y and accessed in baz(), called by FuncY() (see * below)

    z = new ObjX();
    y = z;         // Symbol table says y is now referring to z
    FuncY();       // Ok

    FuncZ();       // Error: a refers to b which is accessed in foobaz

    b.foobar(b.k); // Error: b.k is passed by ref. and accessed in b.foobar()

    a.length = 10;
    snafu(a);      // Error: a is passed by ref. and accessed in snafu()
    for(int i; i < 10; i++)
    {
        printf("a[%d] = %d\n",i,a[i]);
    }

    y = b.x;
    FuncY();       // Ok: y referring to b.x, which is not accessed in baz()
    b.x = x;
    FuncY();       // Error: b.x refers to x, which is accessed in baz(), called by FuncY()
}

// (*) A symbol table lookup would check the scope of y for use in FuncY
// The symbol lookup on y and a check of the 'refers to' attribute could
// tell the compiler that y currently refers to x
// symbol table attribute for x would list baz() as a function with a
// reference parameter of type ObjX inside x's scope, generating an
// error.
// Possible w/o jumping through hoops??
void FuncY()
{
    baz(y);
}

void FuncZ()
{
    b = new B;
    A a = b;
    foobaz(a);
}
;---
If someone on my team coded something like the example you gave, I'd have
them rewrite that potentially dangerous code into something a *lot* more
sane. Even if updating a shared variable absolutely had to be coded, I'd
insist on a runtime check, along the lines of ...

I think you may be talking more of the specifics of the example, whereas I'm
talking more of the generalities exemplified by it. From what I've seen there
is a lot of code out there where file scope vars. could be passed into large
functions by reference, and break things. Those are nasty bugs to track down
also, especially in someone else's code.

With the C and C++ specs. (and right now for D without 'noalias function
reference parameters' in the spec.), the compiler has to produce the
semantically correct results, so the compiler can't safely optimize many
functions even when in actuality they are never used in a way that would be
broken by the optimizations. That's the crux of the issue. Aliasing can
affect the code generated for a lot more than just the few functions where
it may actually apply. It's one of those "a few bad apples spoil the barrel"
type of things.

C has another problem: because extern scope vars. can be accessed in library
functions that are linked in, the programmer and compiler can't reasonably
check for this in a lot of cases. D on the other hand can check for this more
easily, if I'm correct that the import statement gives the compiler
visibility to imported module variable definitions. That's also why it should
be able to inline functions better over an entire program, which is a huge
plus for D (if I'm right about how import works).

I'm guessing Intel spent several man-years on Whole Program Optimization and
alias tracking so they could safely use aggressive optimization techniques
when building their C/C++ compiler. Looks like they succeeded - comparable C
code often (but not always) runs as fast as code from their Fortran compiler,
it seems, even for numerical stuff, at least for artificial benchmark types
of code, and their Fortran compiler is supposed to be quite good.

I don't think the alias fix would take several man-years for Walter. Because
of the language design (and who we have writing the compiler), I suspect it
may be something do-able, maybe even b4 v1.0 is released. The whole point of
my earlier rant is that I think a reasonable amount of effort could pay big
dividends for D.

It's worth some more discussion anyhow, I think..

- Dave
Aug 08 2004
prev sibling parent Dave <Dave_member pathlink.com> writes:
Dave wrote:

[big snip]
 community sees it that way. If D takes off (and I'm convinced it won't
 unless some basic issues like 'the aliasing problem' are taken care of)

For the record, after a bit more experience with D and a lot more thought, I hereby officially "retract" the above statement ;). On the contrary and FWIW, I'm starting to become more and more convinced D will be a hit. I think in many ways it is already given its maturity relative to other languages. It just offers too many other advantages (in terms of optimization opportunities and other major areas) to 'fail' because of this issue. Thanks, - Dave
Aug 17 2004
prev sibling next sibling parent reply "Sampsa Lehtonen" <snlehton cc.hut.fi> writes:
How about introducing a special keyword so that you could mark variables  
that will not alias?

I suggest this because the alias detection is very costly to do. During  
the compilation it is impossible (just imagine an array of  
pointers/objects that is filled runtime). During runtime you need to do it  
all the time for it to be effective - always when utilizing two variables  
of the same kind, or taken to the extreme: when accessing two memory  
locations.

For example:

void mangle(inout int a, inout int b)
{
   b += a + 5;
   a += b;
}

generates something like in pseudo-risc-asm:

// calculate b+a+5
mov reg1, a // reg1 <- a
mov reg2, b
add reg2, reg2, 5 // reg2 <- reg2 + 5
add reg2, reg2, reg1
// store b
mov b, reg2
// calculate a+b
mov reg1, a
add reg1, reg1, reg2
// store a
mov a, reg1

In this example there is one unnecessary read and one unnecessary store if the  
values do not alias. The b needs to be stored in the first statement  
because the value of a would be changed if they aliased. Similarly, we  
need to read the value a again in the latter statement because we don't  
know if the variables aliased or not.

If you know that these variables will never ever alias, you can touch them  
with noalias keyword which will tell the compiler that it can perform some  
optimizations.

void mangle(inout noalias int a, inout noalias int b)
{
   b += a + 5;
   a += b;
}

that would generate:

// calculate b+a+5
mov reg1, a
mov reg2, b
add reg2, reg2, 5
add reg2, reg2, reg1
// calculate a+b
add reg1, reg1, reg2
// store both a and b
mov a, reg1
mov b, reg2

Now the two memory accesses in between are removed and the algorithm  
would run faster. With more complex algorithms the benefits get even  
bigger as more stuff could be stored in the registers.
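
To make the hazard concrete, here is a minimal C++ sketch (not from the
original post) in which '&' stands in for 'inout'. The second function is a
hand-written stand-in for what a compiler might emit under a no-alias
assumption; its name and the check function are hypothetical. Both versions
agree for distinct variables and disagree when the same variable is passed
twice, which is exactly why the extra load and store cannot be dropped
without a guarantee.

```cpp
#include <cassert>

// Straightforward version: every access goes through the references, so the
// result is well defined even when a and b name the same int.
void mangle(int& a, int& b)
{
    b += a + 5;
    a += b;
}

// Hypothetical "optimized" version: load each value once, work in locals
// (the registers of the pseudo-asm), and store both at the end. This is
// only equivalent to mangle() when a and b are different objects.
void mangle_assume_noalias(int& a, int& b)
{
    int ra = a;
    int rb = b;
    rb += ra + 5;
    ra += rb;
    a = ra;
    b = rb;
}

// Self-check showing where the two versions agree and where they diverge.
void check_mangle()
{
    int x = 1, y = 2;
    mangle(x, y);
    int p = 1, q = 2;
    mangle_assume_noalias(p, q);
    assert(x == p && y == q);   // distinct variables: same answer

    int v = 1, w = 1;
    mangle(v, v);
    mangle_assume_noalias(w, w);
    assert(v != w);             // same variable twice: answers differ
}
```

With x = 1 and y = 2 both versions leave x = 9 and y = 8. Passing one
variable holding 1 for both parameters leaves 14 with mangle() and 7 with the
cached version.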

This syntax might be familiar for some of you from compiler called  
VectorC. (http://www.codeplay.com/)

To sum it up:

Automatic alias detection is a very hard task to do - it will involve  
complex data flow analysis compile-time and tedious checking run-time.
At compile-time, the optimizations must be safe. If there is a chance that  
variables might alias, they are expected to alias. This reduces the  
chances to optimize while the calculation time is huge (compare for  
example to inline-optimizations). It works on local variables though  
(which will be optimized anyway). Run-time checks bloat the code so that  
the benefits will vanish. And what would happen if variables alias?  
Exception thrown?

But manually aiding the compiler is very easy to implement and can be  
efficient. Instead of introducing new keyword it could be done with the  
compiler extensions that D supports (this is a compiler design problem,  
after all). There is still risk of human error, but then again if you need  
to optimize your code, you should know what you are doing.

-- texmex/sampsa lehtonen



On Sun, 1 Aug 2004 13:46:35 +0000 (UTC), Dave <Dave_member pathlink.com>  
wrote:

 I'm very new to D (literally as of yesterday), but am very impressed  
 with what
 I'm seeing so far.

 Being that I want this language to succeed and an important part of that  
 will be
 performance potential over C, I'm curious - how does/will D deal with the
 pointer 'aliasing problem' that plagues C and C++ compiler developers?

 From what little I've seen so far, it seems that this same problem has  
 been
 'forced' on D by it's backward compatability with C libraries and  
 C/C++ -like
 support for pointers.

 IMHO, any language that seeks to replace C/C++ should do it's best to  
 avoid this
 problem, or at least discourage code that introduces it.

-- Using Opera's revolutionary e-mail client: http://www.opera.com/m2/
Aug 10 2004
parent reply "Sampsa Lehtonen" <snlehton cc.hut.fi> writes:
Duh!

I should have read all the messages in the thread before posting... never  
mind.

I think that function parameters should be considered to alias, at least  
in the regular builds. In the optimized versions (release), they could be  
expected not to alias... But then again, this might have weird side  
effects, where debug code works as expected and optimized doesn't.

At least it would be good idea to have option for "safe-compile". Coupled  
with compiler extensions for noalias/alias it could prove powerful.

-- texmex/sampsa lehtonen

On Tue, 10 Aug 2004 16:09:57 +0300, Sampsa Lehtonen <snlehton cc.hut.fi>  
wrote:

 How about introducing a special keyword so that you could mark variables  
 that will not alias?

-- Using Opera's revolutionary e-mail client: http://www.opera.com/m2/
Aug 10 2004
parent reply Dave <Dave_member pathlink.com> writes:
In article <opscip3stw35qbu1 macray>, Sampsa Lehtonen says...
Duh!

I should have read all the messages in the thread before posting... never  
mind.

Thanks for posting - your example helped clarify the 'cost' of aliasing on
performance, and why.

Below is a simple example of what I'm talking about and why I think this
issue is important to consider.

#include <cstdio>
// This sample is in C++ because it can demonstrate the large
//  performance difference with an Intel C++ compiler switch.
//  Think of '&' as 'inout' in D
void foo(int& ri, int& rj)
{
    // ri and rj are not related as used and are operated on separately.
    //  But the C/C++ spec. says the compiler has to assume that they may
    //  reference the same var., so this code cannot be optimized aggressively.
    for(int idx = 0; idx < 10000000; idx++) { ri++; rj++; }
}

void bar(int& ri, int& rj, int& rk)
{
    // Same here - successive calls passing around references (very common,
    //  especially in OOP code) only exacerbates the issue.
    foo(ri,rj);
    for(int idx = 0; idx < 10000; idx++) { ri = rj % 10; rk += rj - rk; }
}

int main()
{
    int i = 0, j = 0, k = 0;
    for(int idx = 0; idx < 1000; idx++) bar(i,j,k);
    printf("i = %d, j = %d, k = %d\n",i,j,k);
}

# icc -O3 -static t_alias.cpp -o t_alias
# icc -O3 -fno-alias -static t_alias.cpp -o t_alias_opt

# time t_alias
i = 8, j = 1410065408, k = 1410065408
real    0m17.371s
user    0m17.310s
sys     0m0.010s

# time t_alias_opt
i = 8, j = 1410065408, k = 1410065408
real    0m2.414s
user    0m2.410s
sys     0m0.000s

It's apparent this can make a very large difference in performance, yet the
source code and the printed results are identical. Considering methods
calling methods calling methods, etc... the end-results can be pretty large.
The magnitude of this result actually surprised me also, and I knew it was a
problem.

The whole concern is that more often than not, the aliasing 'de-optimization'
affects common, often used, correctly used code because C/C++ compilers allow
aliasing.

My proposal would be something along the lines of changing out/inout
function/method params to 'noalias' by default, if a compile-time or even
debug run-time check could be done.
How about this for a proposal:

- 'noalias' for out/inout params limited to primitive types, structs and
  arrays of primitive types.
- pointers left as-is for C compatibility, and so code like the above could
  be done if the side effects of param1 and param2 referencing the same
  variable are desired.
- For primitive type params (not pointers to prim. types), warn on passing of
  a de-referenced pointer. For example:

int i;
int* j = &i;
foo(i,*j); //Compiler warning: de-referenced ptr. passed inout.

Among others I'm sure, that leaves this case in D:

class X
{
    int i;
}

int main()
{
    X x = new X();
    x.i = 10;
    X y = x; // y is now a reference to x in D
    foo(x.i,y.i);
}

Anyone think of a solution for that?? Maybe that would be a case for a debug
runtime check. Or could the compiler reasonably check for this at compile
time?

Something like the above would probably satisfy the numerics crowd along with
many other applications of inout params., because often the most performance
sensitive code (i.e.: passed params used in tight loops) deals with primitive
types.

They added the 'restrict' keyword in C (C99), and I guess many lib. function
calls are being re-written to use it. Even functions like fopen are being
changed:

/* C89: */
FILE *fopen(const char *path, const char *mode);
// C99:
FILE *fopen(const char * restrict path, const char * restrict mode);

The difference can be that large I guess. Who would've thought fopen()? It's
used a lot but usually not in tight loops..

Think of UI code passing primitive types around. Think of the innards of a
lot of templates using native types. Think of socket calls de/serializing
structs. Just about anything called in functions passing around references
used repetitively can be affected in a big way by aliasing.

As for your mention of aliasing within arrays, I believe the spec. already
prohibits some 'overlapping' for slicing, so that takes care of part of your
concern for native type arrays, I think.

- Dave
Aug 10 2004
parent reply Dave <Dave_member pathlink.com> writes:
I inadvertently skewed the results when I caused an overflow by bumping the loop
count up to make the test run for a decent amt. of time.

If you change the loop in both foo() and bar() to 1000000 iterations, the
relative difference is even larger (>10x now, compared to >7x before):

# time t_alias_opt # no-alias
i = 0, j = 1000000000, k = 1000000000
real    0m1.010s
user    0m1.000s
sys     0m0.000s

# time t_alias # alias
i = 0, j = 1000000000, k = 1000000000
real    0m11.287s
user    0m10.810s
sys     0m0.020s
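
The overflow and the two ratios can be checked with a little arithmetic. The
sketch below is not from the original post and its helper names are
hypothetical. It assumes a 32-bit int and models the overflow as
two's-complement wraparound; signed overflow is formally undefined in C and
C++, so this only explains the value that happened to be printed.

```cpp
#include <cassert>
#include <cstdint>

// Total number of times rj++ runs: calls to bar() from main() times the
// iteration count of the loop in foo().
constexpr std::uint64_t total_increments(std::uint64_t calls,
                                         std::uint64_t per_call)
{
    return calls * per_call;
}

// Value left in a 32-bit counter after n increments, assuming wraparound
// (reduction modulo 2^32).
constexpr std::uint32_t wrapped32(std::uint64_t n)
{
    return static_cast<std::uint32_t>(n);
}

// First run: 1000 * 10,000,000 = 10^10 does not fit in 32 bits and wraps
// to the j printed in the first set of timings.
static_assert(wrapped32(total_increments(1000, 10000000)) == 1410065408u,
              "first run wraps");

// Second run: 1000 * 1,000,000 = 10^9 fits, so j is the true count.
static_assert(wrapped32(total_increments(1000, 1000000)) == 1000000000u,
              "second run fits");

// Ratio of the 'real' times, aliasing build over -fno-alias build.
double speedup(double alias_seconds, double noalias_seconds)
{
    return alias_seconds / noalias_seconds;
}

void check_speedups()
{
    assert(speedup(17.371, 2.414) > 7.0);    // first measurement
    assert(speedup(11.287, 1.010) > 10.0);   // corrected measurement
}
```

i is rj % 10 on the last iteration, which gives 8 for 1410065408 and 0 for
1000000000, matching both outputs.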

BTW - Just to be clear, my intention is /not/ to suggest a compiler switch for D
like the demonstration C/C++ compiler has.

Thanks,

- Dave

In article <cfaq4t$2v67$1 digitaldaemon.com>, Dave says...
Below is a simple example of what I'm talking about and why I think this issue
is important to consider.

#include <cstdio>
// This sample is in C++ because it can demonstrate the large
//  performance difference with an Intel C++ compiler switch.
//  Think of '&' as 'inout' in D
void foo(int& ri, int& rj)
{
// ri and rj are not related as used and are operated on separately.
//  But the C/C++ spec. says the compiler has to assume that they may
//  reference the same var., so this code cannot be optimized aggressively.
for(int idx = 0; idx < 10000000; idx++) { ri++; rj++; }
}

void bar(int& ri, int& rj, int& rk)
{
// Same here - successive calls passing around references (very common,
//  especially in OOP code) only exacerbates the issue.
foo(ri,rj);
for(int idx = 0; idx < 10000; idx++) { ri = rj % 10; rk += rj - rk; }
}

int main()
{
int i = 0, j = 0, k = 0;
for(int idx = 0; idx < 1000; idx++) bar(i,j,k);
printf("i = %d, j = %d, k = %d\n",i,j,k);
}

# icc -O3 -static t_alias.cpp -o t_alias
# icc -O3 -fno-alias -static t_alias.cpp -o t_alias_opt

# time t_alias
i = 8, j = 1410065408, k = 1410065408
real    0m17.371s
user    0m17.310s
sys     0m0.010s

# time t_alias_opt
i = 8, j = 1410065408, k = 1410065408
real    0m2.414s
user    0m2.410s
sys     0m0.000s

It's apparent this can make a very large difference in performance, yet the
results and code are identical.

Aug 10 2004
parent reply Sean Kelly <sean f4.ca> writes:
Kind of a contrived example, but still applicable I suppose.  Walter had
mentioned defaulting to "noalias" for function parameters.  If the compiler can
enforce this for inout and class parameters then I'm all for it.  I can
understand how this may not be possible for pointers, however.


Sean
Aug 10 2004
parent reply Regan Heath <regan netwin.co.nz> writes:
On Tue, 10 Aug 2004 19:37:13 +0000 (UTC), Sean Kelly <sean f4.ca> wrote:

 Kind of a contrived example, but still applicable I suppose.  Walter had
 mentioned defaulting to "noalias" for function parameters.  If the 
 compiler can
 enforce this for inout and class parameters then I'm all for it.  I can
 understand how this may not be possible for pointers, however.

If checking for aliasing is difficult/time consuming then we could only check
in debug builds. eg.

void foo(inout int a, inout int b)
{
    //check a and b and assert if aliased (only in debug builds).
}

If aliasing is rare then noalias is a good default, then an 'alias' keyword
is required to tell the compiler when a parameter could be aliased:

void bar(alias inout int a, alias inout int b)
{
    //no check and assert (even in debug builds).
}

If aliasing is only a problem with 'inout' parameters instead of a new
keyword, what about a new parameter mode, eg:

void bar(alias int a, alias int b)
{
    //no check and assert (even in debug builds).
}

So we have:
'in'    - (as is currently)
'out'   - (as is currently)
'inout' - (should not be aliased, debug check and assert)
'alias' - (same as inout except could be aliased, no debug check)

Regan

-- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
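
A debug-only check of this kind could look roughly like the following C++
sketch, where '&' again stands in for 'inout'. It is an illustration under
assumptions, not a description of any D compiler: same_object() is a
hypothetical helper, and assert() (compiled out when NDEBUG is defined) plays
the role of the debug build.

```cpp
#include <cassert>
#include <memory>

// Hypothetical helper: true when two references name the same object.
template <typename T>
bool same_object(const T& a, const T& b)
{
    return std::addressof(a) == std::addressof(b);
}

// Stand-in for plain 'inout' parameters under the proposal: they must not
// alias. Debug builds verify that; builds with NDEBUG skip the check.
void foo(int& a, int& b)
{
    assert(!same_object(a, b) && "foo: a and b must not alias");
    int ra = a, rb = b;   // caching in locals is safe once aliasing is ruled out
    rb += ra;
    ra += rb;
    a = ra;
    b = rb;
}

// Stand-in for the proposed 'alias' mode: no check, and every access goes
// through the references so aliased arguments still behave predictably.
void bar(int& a, int& b)
{
    b += a;
    a += b;
}
```

Calling foo(x, x) would trip the assertion in a debug build, while bar(x, x)
is accepted and simply computes with the one shared variable.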
Aug 10 2004
next sibling parent reply "Sampsa Lehtonen" <snlehton cc.hut.fi> writes:
On Wed, 11 Aug 2004 10:28:15 +1200, Regan Heath <regan netwin.co.nz> wrote:

 So we have:
 'in'    - (as is currently)
 'out'   - (as is currently)
 'inout' - (should not be aliased, debug check and assert)
 'alias' - (same as inout except could be aliased, no debug check)

I understand that making the 'noalias' as a default for out and inout
parameters sounds tempting, but because it is then impossible to guarantee
the correctness of the program, it shouldn't be done. I think that the
decision whether to take the risk or not should be given to the user. It
might be a compiler flag ('assume-noalias') or ability to manually mark the
parameters with 'noalias' keyword or compiler extensions. And this doesn't
mean that the compiler wouldn't optimize the non-aliased variables unless it
has been given an order to do so. Of course it can take the optimizations
that it can be sure of. For example, if the variables that are used at the
same time are insulated in a certain scope, the compiler can quite easily see
if aliasing would occur. In my opinion, compilers should produce 100% correct
code unless the user is willing to take risks and try out some optimizations.

Also, I think that a separation between the meaning of the program code and
the optimization performed on it should be made. That's why the keyword
noalias feels a bit bad... How about pragmas?

pragma(noalias, a, b)
void foo(inout int a, inout int b)
{

}

Btw, how can I define multiple pragmas that affect the same declaration?
Should it be

pragma(foo)
{
pragma(bar)
{
void foobar() { }
}
}

? Looks a bit awkward.

pragma(foo)
pragma(bar)
void foobar() {}

looks better.

Also, I find it a bit odd that the compiler must report unknown pragmas as an
error... if we had some optimization pragmas that not all compilers support,
we must wrap them in version statements. But we can't create a pragma that
affects a function AND which would affect only a certain compiler:

version(DigitalMars)
{
   pragma(noalias)
}
void foobar(inout a, inout b) {}

Or does this work? If it does, it doesn't make sense, because isn't the
version statement considered as a block of fully formed statements (which the
pragma currently isn't, it's not terminated with ; but instead it's bound to
the foobar)?

Well, that went a bit OT... sorry :)

-texmex/sampsa lehtonen

-- Using Opera's revolutionary e-mail client: http://www.opera.com/m2/
Aug 11 2004
next sibling parent reply Dave <Dave_member pathlink.com> writes:
In article <opsckkmb1035qbu1 macray>, Sampsa Lehtonen says...
I understand that making the 'noalias' as a default for out and inout  
parameters sounds tempting, but because it is then impossible to guarantee  
the correctness of the program, it shouldn't be done. I think that the  

Please refer to my post just ahead of this one. Debug checks are already done for array bounds and I think would be consistent for noalias for value types and arrays if that is clearly spelled out in the language spec. (like here: http://digitalmars.com/d/arrays.html for Array Bounds Checking spec.).
decision whether to take the risk or not should be given to the user. It  
might be a compiler flag ('assume-noalias') or ability to manually mark  
the parameters with 'noalias' keyword or compiler extensions. And this  
doesn't mean that the compiler wouldn't optimize the non-aliased variables  

I don't think the language needs more keyword/type specifier complexity to
carry this out, especially if it is clearly outlined in the spec. what the
debug build runtime checks would be responsible for.

I'm way against non-spec. compiler extensions. This language is intended to
be portable. And I don't want my CD full of code examples to break because I
change compilers.

Compiler flags like 'assume-noalias' to me cover the shortcomings of a
language and are one of the biggest learning-curve challenges to beginners,
and add a lot of time to optimizing/tuning code (i.e.: let's see, if I code
this and set this flag, it is this fast, hmmmm, what if I code this, and set
this flag, or, oh yea, how about this...). Plus, and this is a very
'real-world' scenario, what if some code is changed that 'assume-noalias'
breaks and the developer forgets to tell the build-master about it (or change
the build him/herself)?

I think flags like this would be one of the biggest nits for people coming
from C# or Java too.

Finally, it is possible, given that some of the infrastructure/architecture
is already there for debug array bounds checking, that the 'noalias' debug
runtime checks would both keep the compiler implementation simpler and be
more consistent with what users expect, as well as warning them of potential
aliasing issues when they are not intended.

Referring to my post just ahead of this one, there is another pitfall to that
idea as outlined there.. I realize that COM Interfaces may be a problem, as
well as extern and export. An addition to that spec. proposal would be to
exempt those from noalias. That again would be consistent with other parts of
the language spec.
unless it has been an order to do so. Of course it can take the  
optimizations that it can be sure of. For example, if the variables that  
are used at the same time are insulated in a certain scope, the compiler  
can quite easily see if aliasing would occur. In my opinion, compilers  

Yes - in this case, the compiler should abort/report/warn and not leave it to the runtime debug checks, as is in the spec. for array bounds checking. But if that is difficult to do or makes things more confusing, I think runtime debug checks are good enough. - Dave
Aug 11 2004
parent reply "Sampsa Lehtonen" <snlehton cc.hut.fi> writes:
On Wed, 11 Aug 2004 15:33:48 +0000 (UTC), Dave <Dave_member pathlink.com>  
wrote:


 I'm way against non-spec. compiler extensions. This language is intended  
 to be
 portable. And I don't want my CD full of code examples to break because  
 I change
 compilers.

Well, my idea was that the compiler extensions (pragmas) would just be hints for the compiler how to optimize the code. If compiler doesn't support them, it would ignore them.
 Compiler flags like 'assume-noalias' to me cover the shortcomings of a  
 language
 and are one of the biggest learning-curve challenges to beginners, and  
 add a lot
 of time to optimizing/tuning code (i.e.: let see, if I code this and set  
 this
 flag, it is this fast, hmmmm, what if I code this, and set this flag,  
 or, oh
 yea, how about this...). Plus, and this is a very 'real-world' scenario,  
 what if
 some code is changed that 'assume-noalias' breaks and the developer  
 forgets to
 tell the build-master about it (or change the build him/herself)?

Been there, done that. I know it should be avoided. However, making compiler that produces optimal code automagically just isn't that easy. The other solution is to tell the aliasing info variable by variable.
 Finally, it is possible, given that some of the  
 infrastructure/archtecture is
 already there for debug array bounds checking, that the 'noalias' debug  
 runtime
 checks would both keep the compiler implementation simpler and be more
 consistent with what users expect, as well as warning them on potential  
 aliasing
 issues when then are not intended.

I hope you understand what these runtime checks would be. They aren't done
just in the header of the function, but they should be done all the time.
For example:

MyClass a = x;
MyClass b = y;
if (random > 0.5)
   b = x;
// check for aliasing here or assume aliasing
a.i += b.i;
b.i += a.i;

In that example, it can't be detected compile-time whether aliasing will
occur or not. The compiler cannot hold a.i in a register but it must flush it
back to the memory if aliasing is assumed OR a check must be inserted to the
code if we want to detect aliasing.

This is just a simple example, with more complex one the amount of checks
would be huge. And the checks must be made against all variables that are
live simultaneously, which would bloat the code even more.
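
One way to picture the "insert a check" option is code versioning: test once
whether the two references are the same object, then run either a
conservative path or a register-friendly path. The C++ sketch below is an
assumption-laden illustration rather than anything a D compiler is known to
do; pointers model D class references and the function names are
hypothetical.

```cpp
#include <cassert>

struct MyClass { int i; };

// Reference behaviour: every access goes through memory, so it is correct
// whether or not a and b point at the same object.
void update_plain(MyClass* a, MyClass* b)
{
    a->i += b->i;
    b->i += a->i;
}

// Versioned form: one pointer comparison selects a path in which the
// fields can be kept in locals without changing the result.
void update_versioned(MyClass* a, MyClass* b)
{
    if (a == b) {
        int v = a->i;     // single object: both statements double it
        v += v;
        v += v;
        a->i = v;
    } else {
        int ai = a->i;    // distinct objects: cache both fields
        int bi = b->i;
        ai += bi;
        bi += ai;
        a->i = ai;
        b->i = bi;
    }
}

// Self-check: both forms agree in the aliased and the non-aliased case.
void check_update()
{
    MyClass x1 = {3}, y1 = {4}, x2 = {3}, y2 = {4};
    update_plain(&x1, &y1);
    update_versioned(&x2, &y2);
    assert(x1.i == x2.i && y1.i == y2.i);

    MyClass s1 = {3}, s2 = {3};
    update_plain(&s1, &s1);
    update_versioned(&s2, &s2);
    assert(s1.i == s2.i);
}
```

The cost is visible even here: two copies of the body for one pair of
references, and the number of pairs grows quickly as more objects are live at
the same time.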
 Referring to my post just ahead of this one, another pitfall to that  
 idea as
 outlined there.. I realize that COM Interfaces may be a problem, as well  
 as
 extern and export. An addition to that spec. proposal would be to exempt  
 those
 from noalias. That again would be consistent with other parts of the  
 language
 spec.

Well all external variables should be considered aliasing. But then again, if the programmer knew that certain variables wouldn't alias, how would he tell this to the compiler?...
 unless it has been an order to do so. Of course it can take the
 optimizations that it can be sure of. For example, if the variables that
 are used at the same time are insulated in a certain scope, the compiler
 can quite easily see if aliasing would occur. In my opinion, compilers

Yes - in this case, the compiler should abort/report/warn and not leave it to the runtime debug checks, as is in the spec. for array bounds checking. But if that is difficult to do or makes things more confusing, I think runtime debug checks are good enough.

Compiler might not be sure whether aliasing will occur and unnecessary warnings would be printed. The aliasing problem isn't as simple as you might think. As I said, it isn't just a matter of function parameters, it's everything that has something to do with pointers. By pointers I mean _actual_ pointers to memory locations, not just * pointers in D or C/C++. Objects in D and Java are pointers as well. There is no magic bullet to this matter. Compiler cannot do everything for the programmer. -texmex/sampsa lehtonen -- Using Opera's revolutionary e-mail client: http://www.opera.com/m2/
Aug 12 2004
parent Dave <Dave_member pathlink.com> writes:
In article <opscl6lhin35qbu1 macray>, Sampsa Lehtonen says...
I hope you understand what these runtime checks would be. They aren't done  
just in the header of the function, but they should be done all the time.  
For example:

MyClass a = x;
MyClass b = y;
if (random > 0.5)
   b = x;
// check for aliasing here or assume aliasing
a.i += b.i;
b.i += a.i;

_All_ this thread has been talking about is aliasing of _function
parameters_. Not necessarily all of them either. For example, the spec. could
be written to /just/ check aliasing on /value types and value type arrays/,
not reference types, not array members (the spec. disallows overlapping
already), not arrays of reference types, maybe leave structs out as well, and
_not_ pointers. extern (C) or extern (Windows) would be left as-is in any
case.

void foo(inout int[] a, inout int[] b)
{
// runtime check here that &a != &b; if they are equal, warn
// Code following is compiled as it is now
MyClass a = x;
MyClass b = y;
if (random > 0.5)
   b = x;
// check for aliasing here or assume aliasing
a.i += b.i;
b.i += a.i;

And the checks must be made against all variables that are live  
simultaneously, which would bloat the code even more.

The run time checks would be debug only, like they are now for array bounds. No release bloat, just smaller and faster release code ;) See above for when the checks would be inserted. This is consistent with array bounds checking.
Well all external variables should be considered aliasing. But then again,  
if the programmer knew that certain variables wouldn't alias, how would he  
tell this to the compiler?...

If the spec. is limited to just function params., externs would be checked,
i.e. (C++ for clarity):

extern int i;
extern int *j;
void foo(int &ri)
{
    ri++;
    if(random > 5)
    {
        // i is declared outside function scope and is accessed in this
        // function and the function has int ref. param.
        // Compiler inserts &ri != &i check here - consistent with array bounds
        i++;
    }
    if(random > 10)
    {
        // pointer involved
        // Code generated here would be the same as now - assume aliasing with j
        *j = ++ri;
    }
}

Ok - so *j = ++ri; might complicate the job of the compiler a bit. If that's
the case, switch back to assume aliasing for the whole function if it
contains pointers declared outside its scope. With D that probably won't
affect nearly as many functions as in C/C++.
Compiler might not be sure whether aliasing will occur and unnecessary  
warnings would be printed.

Fine - debug runtime checks only.. However, I would think that in some cases
the compile-time check would be pretty much foolproof, i.e.:

int i;
foo(i,i); // Warning during type checking of func. params
The aliasing problem isn't as simple as you might think. As I said, it  
isn't just a matter of function parameters, it's everything that has  

I think you're making it harder than it has to be ;) I'm /not/ talking about global aliasing or even operations on pointers, just about value type variables and array function params. Doing this for just function params. for value types (_as explained above_) would probably provide a big bang for the buck and keep 'noalias' manageable and reasonably safe.
Aug 12 2004
prev sibling parent reply Regan Heath <regan netwin.co.nz> writes:
On Wed, 11 Aug 2004 16:19:13 +0300, Sampsa Lehtonen <snlehton cc.hut.fi> 
wrote:
 On Wed, 11 Aug 2004 10:28:15 +1200, Regan Heath <regan netwin.co.nz> 
 wrote:

 So we have:
 'in'    - (as is currently)
 'out'   - (as is currently)
 'inout' - (should not be aliased, debug check and assert)
 'alias' - (same as inout except could be aliased, no debug check)

I understand that making the 'noalias' as a default for out and inout parameters sounds tempting, but because it is then impossible to guarantee the correctness of the program, it shouldn't be done.

Is it impossible? In a debug build couldn't the compiler insert checks to
ensure the variables are not aliased? I understand that for pointers you
cannot determine whether they are aliased or not, but D's arrays can be
checked trivially, and pointers (so far) seem much less important in D than
in C/C++.

So perhaps 2 statements could be made:
- pointers are assumed to be aliased unless 'noalias' is used.
- other variables are assumed not to be aliased, unless 'alias' is used.

And in debug builds the validity of the above could be checked (where
possible).
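
For arrays the check amounts to an interval test on (pointer, length) pairs.
The following C++ sketch shows one way it could be written; IntSlice is a
hypothetical stand-in for a D dynamic array, and nothing here is taken from
an actual D runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical stand-in for a D dynamic array: a pointer plus a length.
struct IntSlice {
    int* ptr;
    std::size_t length;
};

// True when the two slices share at least one element. std::less is used
// because it orders pointers consistently even when they point into
// unrelated arrays, which the built-in < does not guarantee.
bool overlaps(const IntSlice& a, const IntSlice& b)
{
    if (a.length == 0 || b.length == 0)
        return false;
    std::less<const int*> before;
    const int* a_end = a.ptr + a.length;   // one past the last element of a
    const int* b_end = b.ptr + b.length;   // one past the last element of b
    return before(a.ptr, b_end) && before(b.ptr, a_end);
}

// Debug-style use: a function taking two array parameters checks once on
// entry, then treats the arrays as independent.
void scale_into(const IntSlice& dst, const IntSlice& src, int factor)
{
    assert(dst.length == src.length);
    assert(!overlaps(dst, src) && "scale_into: arrays must not overlap");
    for (std::size_t n = 0; n < dst.length; ++n)
        dst.ptr[n] = src.ptr[n] * factor;
}
```

The test is two comparisons per pair of array parameters, independent of the
array lengths, which is why it is cheap next to the general pointer case.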
 I think that the  decision whether to take the risk or not should be 
 given to the user. It  might be a compiler flag ('assume-noalias') or 
 ability to manually mark  the parameters with 'noalias' keyword or 
 compiler extensions. And this  doesn't mean that the compiler wouldn't 
 optimize the non-aliased variables  unless it has been an order to do 
 so. Of course it can take the  optimizations that it can be sure of. For 
 example, if the variables that  are used at the same time are insulated 
 in a certain scope, the compiler  can quite easily see if aliasing would 
 occur. In my opinion, compilers  should produce 100% correct code unless 
 the user is willing to take risks  and try out some optimizations.

I agree with the general statement here. The compiler should produce stable code by default. I think for most variables in D you can verify whether they are aliased or not, pointers being the big exception that comes to mind. So, given that, if a feature was added to the compiler to check them (where it can) in debug builds and optimise them (where it can) in release builds, wouldn't we get stable /and/ fast code?
 Also, I think that a separation between the meaning of the program code  
 and the optimization performed on it should be made. That's why the  
 keyword noalias feels a bit bad... How about pragmas?

I dislike pragmas in general. That said, on one hand I see what you're saying, but on the other, an 'alias' keyword does affect the meaning of the program code, or rather describes a property of the variable which then can have an effect on the program code.
 pragma(noalias, a, b)
 void foo(inout int a, inout int b)
 {

 }

Lastly, I don't want to have to type that much. :) Some call me lazy, I prefer efficient.
 Btw, how can i define multiple pragmas that affect same declaration?  
 Should it be

 pragma(foo)
 {
 pragma(bar)
 {
 void foobar() { }
 }
 }

 ? Looks a bit awkward.

 pragma(foo)
 pragma(bar)
 void foobar() {}

 looks better.

Is foobar missing its parameters foo and bar? If so, I prefer

void foobar(alias inout foo, alias inout bar) { }
 Also, I find it a bit odd that the compiler must report unknown pragmas
 as an error... if we had some optimization pragmas that not all
 compilers support, we would have to wrap them in version statements. But
 we can't write a pragma that affects a function AND applies only to a
 certain compiler:

 version(DigitalMars)
 {
    pragma(noalias)
 }
 void foobar(inout a, inout b) {}

 Or does this work? If it does, it doesn't make sense, because isn't the
 version statement considered a block of fully formed statements
 (which the pragma currently isn't; it's not terminated with ; but
 instead it's bound to foobar)?

 Well, that went a bit OT... sorry :)

NP.. If you want this to get some attention I'd post it as its own topic (if you haven't already.. I have not checked).

Regards,
Regan

--
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Aug 11 2004
parent reply Dave <Dave_member pathlink.com> writes:
In article <opsclc1kpg5a2sq9 digitalmars.com>, Regan Heath says...
On Wed, 11 Aug 2004 16:19:13 +0300, Sampsa Lehtonen <snlehton cc.hut.fi> 
wrote:
 I understand that making the 'noalias' as a default for out and inout  
 parameters sounds tempting, but because it is then impossible to 
 guarantee the correctness of the program, it shouldn't be done.

Is it impossible? In a debug build couldn't the compiler insert checks to ensure the variables are not aliased?

I understand that for pointers you cannot determine whether they are aliased or not, but D's arrays can be checked trivially, and pointers (so far) seem much less important in D than in C/C++.

So perhaps 2 statements could be made:
- pointers are assumed to be aliased unless 'noalias' is used.
- other variables are assumed not to be aliased, unless 'alias' is used.

Just curious..

If:
- D still aliases by default with pointers (and D uses ptrs. less anyhow)
- non-pointers are changed to noalias
- pointers could still be used to get the alias side-effects if desired
- and the debug runtime checks would presumably work like array bounds checking..

Why would D need new keywords, compiler extensions, pragmas or compiler switches?

Thanks,
Dave
Aug 11 2004
parent reply Regan Heath <regan netwin.co.nz> writes:
On Thu, 12 Aug 2004 04:57:24 +0000 (UTC), Dave <Dave_member pathlink.com> 
wrote:

Just curious..

If:
- D still aliases by default with pointers (and D uses ptrs. less anyhow)
- non-pointers are changed to noalias
- pointers could still be used to get the alias side-effects if desired
- and the debug runtime checks would presumably work like array bounds checking..

Why would D need new keywords, compiler extensions, pragmas or compiler switches?

Perhaps not. :) Unless you're forced to use a pointer, know it's not aliased and want to have it optimise.

Regan
Aug 16 2004
parent Dave <Dave_member pathlink.com> writes:
Regan Heath wrote:

 On Thu, 12 Aug 2004 04:57:24 +0000 (UTC), Dave <Dave_member pathlink.com>
 wrote:
 
 Just curious..

 If:
 - D still aliases by default with pointers (and D uses ptrs. less anyhow)
 - non-pointers are changed to noalias
 - pointers could still be used to get the alias side-effects if desired
 - and the debug runtime checks would presumably work like array bounds
 checking..

 Why would D need new keywords, compiler extensions, pragmas or compiler
 switches?

Perhaps not. :) Unless you're forced to use a pointer, know it's not aliased and want to have it optimise.

I was thinking along the lines of not adding complexity (from the user standpoint) to the language or tools if it could be avoided and still cover most cases.

I figure being 'forced' to use pointers in D would almost always happen when calling libs. from other languages, and in that case the rules for the other language inside the lib. function would apply anyhow.

From my understanding of D so far, even things that would ideally run really fast, like iterators, will be implicit references and not explicit pointers, but I'm sure there are other cases.

This and the better code/data visibility of the D compiler through import are two of the reasons I'm so eager to try and fix part of the aliasing overhead with D - I think it can be done w/o complicated kludges like "link-time code generation" as C/C++ compilers are forced to use.

The implications of import on inlining and finalizing, along with other things in D like foreach(...) and first-class arrays to simplify optimization, are pretty large I think, even if aliasing is not directly addressed. This really is a very, very cool language design, IMHO.

BTW - In an earlier rant, I mentioned I thought D would not succeed if the aliasing issue is not addressed. I certainly /do not/ think that way anymore.

Thanks,

- Dave
Aug 17 2004
prev sibling parent reply Dave <Dave_member pathlink.com> writes:
In article <opscjfddfc5a2sq9 digitalmars.com>, Regan Heath says...
If checking for aliasing is difficult/time consuming then we could only
check in debug builds. eg.

void foo(inout int a, inout int b)
{
   //check a and b and assert if aliased (only in debug builds).
}

Is your proposal that the asserts be inserted by the compiler for debug builds? If so, I like it! It's consistent with other runtime checks for D, like array bounds, so that would make sense, be intuitive and presumably more straightforward to implement for the compiler developer. I would guess that array bounds are checked at runtime and not compile time for the same reasons it's hard for aliasing (i.e.: it's very hard for the compiler to resolve if array bounds will be violated or not at compile time).

I think it would also have to check the case where the function accessed a variable outside the function scope that is the same type as 'a':

int b;
Aug 11 2004
parent Dave <Dave_member pathlink.com> writes:
Bahhh! This happened again!! Either my browser or the news server somehow cut my
post short (Ok, quit clapping <g>).

Here it is again:

;---

In article <opscjfddfc5a2sq9 digitalmars.com>, Regan Heath says...
If checking for aliasing is difficult/time consuming then we could only
check in debug builds. eg.

void foo(inout int a, inout int b)
{
   //check a and b and assert if aliased (only in debug builds).
}

Is your proposal that the asserts be inserted by the compiler for debug builds? If so, I like it! It's consistent with other runtime checks for D, like array bounds, so that would make sense and be intuitive.

I think it would also have to check the case where the function accessed a variable outside the function scope that is the same type as 'a':

int b;
int[] arr;
/* more code */
void foo(inout int a)
{
   // (i) debug assert here?
   a++;
   arr.length = 5;
   if(whatever == true)
   {
      // or (ii) debug assert here - depends on a thorough test case?
      b++;
      // I'm for (ii) because that is consistent with debug build array bounds
      // checking now and would be the easiest to implement.
      // runtime array bounds error for debug builds happens here now.
      arr[5] = b;
   }
}
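
For illustration only, option (ii) can be approximated by hand. This is a minimal sketch, assuming a current D compiler, where the 2004 'inout' parameter mode is spelled 'ref'. The assert is written manually here; nothing in this thread says a compiler inserts such a check.

```d
int b; // module-level variable, as in the example above

// Hand-written stand-in for the check a compiler might insert.
void foo(ref int a)
{
    a++;
    // Option (ii): check just before the outside variable is touched.
    // assert is dropped when compiling with -release.
    assert(&a !is &b, "foo: parameter 'a' aliases module variable 'b'");
    b++;
}

void main()
{
    int local = 0;
    foo(local); // fine: 'local' and 'b' are distinct objects
    assert(local == 1 && b == 1);
    // foo(b) would trip the assert in a non-release build.
}
```

As with array bounds checking, the check only fires if a test run actually reaches the aliased call.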
If aliasing is rare then noalias is a good default, then an 'alias'
keyword is required to tell the compiler when a parameter could be aliased:

void bar(alias inout int a, alias inout int b)
{
   //no check and assert (even in debug builds).
}

If aliasing is only a problem with 'inout' parameters instead of a new
keyword, what about a new parameter mode, eg:

It would be 'out' and 'inout' for value types; in, out and inout for reference types.

However, I think just changing how out and inout are handled for value types and in/out/inout for array refs. would be Ok, since that would be pretty consistent with both how value types vs. reference types are expected to be handled in D now, and arrays are 'built-in' just like other value types like int, double, etc.

The reason I say this is partly to remove complexity, but also partly because the performance/memory advantage of passing objects by ref. instead of copying them probably outweighs the advantages of 'noalias' most of the time (for reference types). Again, consistent with how the language is implemented.

When I say debug check for array objects, I'm not proposing it be done for individual elements. For value types that is covered by 'no overlapping' in the spec. already, and for ref. types it wouldn't matter because they are never 'noalias' anyway (so this would be consistent as above).

This would probably even be pretty intuitive for users of other languages like Java too.
void bar(alias int a, alias int b)
{
   //no check and assert (even in debug builds).
}

Maybe we could get by w/o a new keyword at all, since pointers can be used for value types if aliasing is desired. This would also enforce the 'noalias for value types' idea for developers.

I think something close to this just may work! The proposal would be something close to:

- Change the spec. to 'noalias' for out/inout on value types.
- Same for array references, except all of in/out/inout. This is consistent because arrays are built-in's while other things passed by ref. are not strictly built-in's like value types, while arrays are.
- Since non-builtin ref. types are always passed by ref., they act as they do now.
- Mimic the debug build runtime array bounds check for a 'noalias' check.
- (I added this the 2nd time around:) extern and export specifiers would exempt the noalias.

This idea:

- Shares consistency with how D handles array bounds checks now.
- Shares consistency with how D handles value, ref. types and built-in's now, providing a clear delineator that can be followed by developers.
- Much easier to implement than strict compile-time checks.
- Allows D a one-up over other languages:
  - Warns on aliasing issues (I believe that Fortran does not)
  - Allows for aggressive code gen. for these functions (C/C++ does not)
- Shares the by-value/by-ref. func. param. semantics of Java
- Takes into account COM interfaces, I think.

Pitfalls I can think of off-hand:

- pointer struct members or struct members that reference by ref. types.
  - How big a problem would this be out in the real world?
  - Could the runtime check be applied for each member here as well, w/o jumping through hoops?
  - Could the spec. leave this as undefined perhaps?

Other thoughts?

Thanks,

- Dave
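
A rough sketch of what the array part of such a debug check could look like if written by hand, assuming a current D compiler. The helper 'overlaps' and the function 'scale' are hypothetical names made up for this example. The check compares whole slices, not individual elements, as described above.

```d
// Hypothetical helper: true when two slices share at least one element.
bool overlaps(const(int)[] a, const(int)[] b)
{
    if (a.length == 0 || b.length == 0)
        return false;
    return a.ptr < b.ptr + b.length && b.ptr < a.ptr + a.length;
}

// Hypothetical function that relies on its arguments not aliasing.
void scale(int[] dst, const(int)[] src)
{
    // Dropped with -release, in the spirit of the bounds-check analogy.
    assert(!overlaps(dst, src), "scale: dst and src overlap");
    foreach (i, v; src)
        dst[i] = v * 2;
}

void main()
{
    auto buf = [1, 2, 3, 4];
    auto other = new int[4];
    scale(other, buf);
    assert(other == [2, 4, 6, 8]);

    assert(overlaps(buf[0 .. 3], buf[2 .. 4]));  // share buf[2]
    assert(!overlaps(buf[0 .. 2], buf[2 .. 4])); // adjacent, no shared element
}
```

The check is two pointer comparisons per pair of slices, so the cost is in the same range as a bounds check. The struct-member pitfall listed above is not covered by this sketch.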
Aug 11 2004
prev sibling parent Norbert Nemec <Norbert Nemec-online.de> writes:
Hi there,

sorry I did not comment on this before - I just returned home after two
weeks of traveling.

Just a short statement about my current view on aliasing. ("current",
because this view is still evolving...)

First, the problem of aliasing in C was not something that came up because
of some design fault, but because of the existence of references. Fortran
has no references. The only possible cause for different names referring to
the same object in memory would be function arguments. There, aliasing is
prohibited, so Fortran has no aliasing at all, allowing very aggressive
optimizations. In C, function arguments are only one of many possible
causes for aliasing, so the "restrict" does not solve the problem but only
softens it a bit.

As I see it, there are only two ways to rival the performance of Fortran:
either step back into stone-age and create a language without references or
step forward into the future and introduce high-level abstractions that
allow the compiler to know more about the semantics of the code.

Vectorized expressions, as they have been discussed before, will hopefully
solve the problem of aliasing in the most common cases (by allowing the
compiler to freely choose an order of execution). For other cases, we can
only wait for the problems to arise and solve them then (maybe even by
introducing something like the dreaded "restrict"). I strongly doubt that
anyone will have success in finding "the solution" for the problem as a
whole.
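
To make the idea concrete: later versions of D provide array operations along these lines. The following is a minimal sketch assuming a current D compiler; 'combine' is a hypothetical name. As I read the D specification, the destination slice must not overlap the source slices and the order in which elements are computed is left to the implementation, which is the freedom described above.

```d
// Hypothetical example function: c = a + 2*b, element by element.
void combine(double[] c, const(double)[] a, const(double)[] b)
{
    // No loop and no index order is spelled out in the source.
    c[] = a[] + b[] * 2.0;
}

void main()
{
    double[] a = [1.0, 2.0, 3.0];
    double[] b = [10.0, 20.0, 30.0];
    auto c = new double[3];
    combine(c, a, b);
    assert(c == [21.0, 42.0, 63.0]);
}
```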

Ciao,
Norbert


Dave wrote:

 
 I'm very new to D (literally as of yesterday), but am very impressed with
 what I'm seeing so far.
 
 Being that I want this language to succeed and an important part of that
 will be performance potential over C, I'm curious - how does/will D deal
 with the pointer 'aliasing problem' that plagues C and C++ compiler
 developers?
 
 From what little I've seen so far, it seems that this same problem has
 been 'forced' on D by its backward compatibility with C libraries and
 C/C++ -like support for pointers.
 
 IMHO, any language that seeks to replace C/C++ should do its best to
 avoid this problem, or at least discourage code that introduces it.

Aug 14 2004