digitalmars.D - [performance]PreInitializing is an annoyance

Manfred Nowak (15/15) Jan 31 2005 I wrote about this before.

Nick Sabalausky (7/23) Jan 31 2005 By the time we have systems that have >1TB RAM, I'm sure the memory bus

Manfred Nowak (9/10) Jan 31 2005 The bus speeds can go as fast as they want. One cpu needs at least two

Lionello Lunesu (11/27) Feb 01 2005 Hi..

Manfred Nowak (12/16) Feb 01 2005 But that is not portable between OS's.

pragma (19/25) Feb 01 2005 Possibly the most compelling fact, that D's arrays are not for huge chun...

Dawid Ciężarkiewicz (12/33) Feb 01 2005 I was reading documentation of D and found paragraph about initializatio...

Anders F Björklund (18/29) Feb 01 2005 It's for the other 0.1% percent, where forgetting to initialize

Norbert Nemec (9/11) Feb 01 2005 This definitely is not an excuse for any limitation in D. Assembler is

Anders F Björklund (12/15) Feb 01 2005 I see D as a nice replacement for C++ in the (very) long run,

Norbert Nemec (10/20) Feb 01 2005 For Java this is clear - it has a completely different objective, so D d...

Anders F Björklund (8/17) Feb 02 2005 Just some... And the languages are not *that* different, actually ?

Norbert Nemec (4/13) Feb 02 2005 OK, I probably came on a bit too fast with my answer. C certainly has it...

Walter (11/15) Feb 02 2005 Currently, the only cases where C fits better are:

Norbert Nemec (4/10) Feb 02 2005 5) You want your code to look pretty:

Walter (5/15) Feb 02 2005 You might want to check out:

Walter (24/34) Feb 02 2005 You can write "C" code in D. Take a look at the Empire source code .

Anders F Björklund (18/37) Feb 02 2005 Yeah, you can hardly escape NullPointerErrors even in virtual machines.
Dave (16/21) Feb 02 2005 FWIW, that's where I see D making it's biggest inroads initially with th...

Dawid Ciężarkiewicz (14/26) Feb 01 2005 I can agree that setting floats to NAN make sense, but setting forgotten...

Anders F Björklund (4/7) Feb 01 2005 Hard to say in the general case, I recommend playing with -O and disasm?

Walter (8/16) Feb 02 2005 of

Norbert Nemec (4/11) Feb 02 2005 'usually' is the point here. In certain cases, the compiler will not be ...

Walter (6/16) Feb 02 2005 one

Manfred Nowak (5/8) Feb 02 2005 The measure to b taken in the case I put in this discussion is the

Norbert Nemec (15/36) Feb 01 2005 This issue should not need any measurements for justification. It is cle...

pragma (6/15) Feb 01 2005 The nice thing about this is that pragmas in D are not be ignored if not

Dawid Ciężarkiewicz (5/24) Feb 02 2005 And you could always use "else" to initialize memory old way. I like it.

Ben Hinkle (10/25) Feb 01 2005 The ironic part is that the GC itself has malloc but the D interface to ...

J C Calvarese (7/38) Feb 01 2005 As look as we're doing wishful thinking, maybe we could just put a

Norbert Nemec (4/6) Feb 01 2005 You don't really have to go for one 1TB to see the use of that option. W...

Ivan Senji (14/20) Feb 02 2005 Write

Georg Wrede (6/9) Feb 01 2005 Admittedly, I haven't followed processor specs for a while, but

Brian Chapman (30/39) Feb 01 2005 No guys. Cycle counting is as old as MSDOS floppy disks. It's much

Ben Hinkle (8/14) Feb 01 2005 D is aiming to support bare-metal programming, from what I gather from t...

Brian Chapman (11/20) Feb 03 2005 Sorry, Ben. I was out of line with the Java comment. A little moment of

Norbert Nemec (29/35) Feb 02 2005 Writing assembler to get performance is about as outdated as counting

Anders F Björklund (11/19) Feb 02 2005 Of course, *reading* assembler is a good way to help write that good C

Norbert Nemec (15/31) Feb 02 2005 Of course: if you want to exploit your compiler you have to know it, so

Anders F Björklund (14/18) Feb 02 2005 I think the new GCC default on Mac OS X, -Os, is a fair trade-off ?

Dave (13/17) Feb 02 2005 In article ,

Anders F Björklund (4/6) Feb 02 2005 See http://gcc.gnu.org/develop.html#stage3, they're at the final stage.
Norbert Nemec (7/19) Feb 02 2005 Have you tested the current floating point performance of gdc, compared ...

Thomas Kuehne (17/17) Feb 02 2005 -----BEGIN PGP SIGNED MESSAGE-----
Anders F Björklund (10/13) Feb 02 2005 On Mac OS X,
Dave (12/28) Feb 02 2005 There has been some of that for floating point posted here (on the NG) a...

zwang (5/39) Feb 02 2005 In general, rewriting C in assembly doesn't improve much,

Anders F Björklund (4/8) Feb 02 2005 Then again, modern compilers can use those instructions too...

Dave (7/15) Feb 02 2005 If you still have that code handy, would it be possible to run it throug...

Norbert Nemec (10/28) Feb 02 2005 Already found out to my disappointment that I don't have it on my local

Brian Chapman (27/59) Feb 03 2005 I disagree with this statement so very much I wouldn't even know where

Georg Wrede (2/5) Feb 03 2005 Why?

Brian Chapman (4/11) Feb 03 2005 Excellent question. All enlightenment begins with asking a good "why?"

Georg Wrede (6/21) Feb 04 2005 Well, I at least hope the answer would be of general interest in

Norbert Nemec (27/76) Feb 04 2005 I agree - I've discussed with several people on this and hardly ever cam...

Kevin Bealer (74/74) Feb 01 2005 One potential solution: Formalize the existing (or semi-well known) met...
Walter (17/31) Feb 02 2005 There are several ways to create an array. If the array is statically

Vathix (2/6) Feb 02 2005 I like it, but will it work with 'new'? When newing arrays and value typ...

Walter (12/19) Feb 02 2005 been

Andy Friesen (6/25) Feb 02 2005 All this makes me think that the only thing that really needs to be done...

Manfred Nowak <svv1999 hotmail.com> writes:

I wrote about this before.

There is a well-known time/space tradeoff for the preInitialization of
arrays: using about three times the space, one can lazily initialize an
array.

But this technique is useless within D because of the automatic
preInitialization, which currently eats up about 3 cycles per byte.

Please realize that on a 3GHz machine the busy preInitialization of
one GB then lasts one second. And the coming 64-bit machines will have
up to some TB of main memory. Current mainboards can already hold up to
4GB, which is the current main memory limit for win32.

Consider also that to preInitialize one TB you have to wait more than 15
minutes, only to wait at least another 15 minutes until your
video editing can start, if no precautions are taken.

We need an option to switch automatic preInitialization of arrays off.

-manfred

Jan 31 2005
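
The lazy-initialization trade-off mentioned in the post above can be sketched roughly as follows. This is only an illustration, written in Python rather than D, and the names (`LazyArray`, `_where`, `_written`) are hypothetical. It assumes the classic scheme of one data array plus two index arrays, which is where the "about three times the space" comes from. Python cannot hand out uninitialized memory, so random garbage stands in for it here.

```python
import random


class LazyArray:
    """Array whose cells read as `default` until written,
    without relying on a clearing pass over the storage."""

    def __init__(self, size, default=0, seed=None):
        rng = random.Random(seed)
        # Assumption: random garbage simulates uninitialized memory.
        # A real implementation would simply allocate without clearing.
        self._data = [rng.randrange(size) for _ in range(size)]
        # _where[i]: slot in _written that vouches for cell i (if any).
        self._where = [rng.randrange(size) for _ in range(size)]
        # _written[k]: index of the k-th cell that was ever written.
        self._written = [rng.randrange(size) for _ in range(size)]
        self._count = 0  # number of valid slots in _written
        self._default = default

    def _is_set(self, i):
        # Cell i is valid only if its slot is in the used part of
        # _written AND that slot points back to i.
        slot = self._where[i]
        return slot < self._count and self._written[slot] == i

    def __getitem__(self, i):
        return self._data[i] if self._is_set(i) else self._default

    def __setitem__(self, i, value):
        if not self._is_set(i):
            # First write to this cell: register it in O(1).
            self._where[i] = self._count
            self._written[self._count] = i
            self._count += 1
        self._data[i] = value


arr = LazyArray(1000, default=0, seed=1)
arr[5] = 42
print(arr[5], arr[6])
```

Every read and write costs a constant number of extra steps, so the cost of initialization is spread over the accesses instead of being paid up front. The scheme only pays off if the language lets you obtain the three arrays without clearing them, which is the point of the request above.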

"Nick Sabalausky" <z a.a> writes:

By the time we have systems that have >1TB RAM, I'm sure the memory bus
speeds will be much faster than they are now (and if they aren't, then
there are a lot of other things besides initing arrays that would take
insanely long as well). So I don't think it would take nearly as long as 15
minutes. But aside from that, you do raise an interesting point.

Jan 31 2005

Manfred Nowak <svv1999 hotmail.com> writes:

"Nick Sabalausky" wrote: 

[...]
 memory bus speeds will be much faster than they are now

The bus speeds can go as fast as they want. One CPU needs at least two
cycles to store the next value: one cycle for incrementing the address
and one to store the value, i.e. a 4GHz CPU cannot be faster than 0.5s for
initializing one GB of RAM: still more than eight minutes for one TB.

And in standard machines the amount of RAM grew in the last ten years
by a factor of three more than the CPU frequency.

-manfred

Jan 31 2005
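
The timing figures in the two posts above can be reproduced with a few lines of arithmetic. This is a minimal sketch that takes the posts' assumptions at face value (a fixed number of CPU cycles per byte, one byte per store, decimal GB and TB); the function name `init_seconds` is made up for illustration.

```python
def init_seconds(num_bytes, cycles_per_byte, clock_hz):
    """Time to touch every byte once at a fixed cycles-per-byte cost."""
    return num_bytes * cycles_per_byte / clock_hz


GB = 10**9   # decimal units, as the round numbers in the posts suggest
TB = 10**12

# First post: about 3 cycles per byte on a 3GHz machine.
print(init_seconds(GB, 3, 3e9), "s per GB")
print(init_seconds(TB, 3, 3e9) / 60, "min per TB")

# Second post: at least 2 cycles per byte on a 4GHz machine.
print(init_seconds(GB, 2, 4e9), "s per GB")
print(init_seconds(TB, 2, 4e9) / 60, "min per TB")
```

Under these assumptions the first case comes to one second per GB and roughly 17 minutes per TB, the second to half a second per GB and a little over eight minutes per TB, which matches the "more than 15 minutes" and "more than eight minutes" quoted in the posts.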

"Lionello Lunesu" <lionello.lunesu crystalinter.remove.com> writes:

Hi..

I guess you can easily do an OS call in such a case: HeapAlloc (or even
better: VirtualAlloc) in Win32. You should use these functions anyway for
large blocks of data (dynamic arrays are not meant for that, never were). Or
simply call malloc, it doesn't initialize the values either.

This sure is better than any crt_init(bool) or whatever call you're thinking
of to turn on/off array initializations. A compile-time option would be even
worse: it would break a program if compiled with the wrong flag.

Lionello.

Feb 01 2005

Manfred Nowak <svv1999 hotmail.com> writes:

"Lionello Lunesu" wrote: 

 I guess you can easily do an OS call

But that is not portable between OS's.

[...]
 large blocks of data (dynamic arrays are not meant for that, never
 were). 

Where do you have this wisdom from? If so, please explain the
prerequisites for the usage of dynamic arrays and for fixed arrays as
well.

 Or simply call malloc, it doesn't initialize the values either.

[...]

I would use malloc, if arrays at all are not to be used for large 
amounts of memory. Which seems to be a contradiction to the fact that 
memory cells are laid out like an array.
 
-manfred

Feb 01 2005

pragma <pragma_member pathlink.com> writes:

In article <ctohtt$10mr$1 digitaldaemon.com>, Manfred Nowak says...
[...]
 large blocks of data (dynamic arrays are not meant for that, never
 were). 

Where do you have this wisdom from? If so, please explain the
prerequisites for the usage of dynamic arrays and for fixed arrays as
well.

Possibly the most compelling evidence that D's arrays are not for huge chunks of
memory is that they rely on copy-on-write semantics.

With a 1GB chunk of memory in a single D array, modifications to slices and
concatenations would result in a complete realloc and copy; so you could
only use *one-half to a third* of your system's memory... and that's on a good
day with aggressive memory management and no GC.  Also, it's ill-advised to use a
single array directly for something like this simply due to the cache misses
that are likely to result from random accesses on a 1GB structure; the same goes
for virtually every language out there.

To that end D sits firmly in the "trade more space for less running time"
optimization camp, which is fine for the majority of tasks out there.
Superscale blobs of data require a behavior that certainly can be expressed in
D, but is not enshrined in its underlying design.

IMO, the optimal (all-round) solution for working with massive data structures,
would approach the complexity of a memory manager and not a series of simple
array manipulations.  From there, you could implement an array-like interface to
make it more friendly to use, but it would still be a far cry from a true array.


- EricAnderton at yahoo

Feb 01 2005

Dawid Ciężarkiewicz <arael fov.pl> writes:

I was reading the documentation of D and found a paragraph about initialization of
variables. Can somebody tell me what the point is in doing so, and whether it's
done just by setting them after creation (and IMHO wasting CPU cycles) or
is cost free (I don't know how that would be possible, but that is why
I'm asking).

Your post just reminded me of that case. I'm going even further. Why
initialize anything at all? In 99.9% of cases variables are initialized one or
two lines after creation and the default isn't used at all.
-- 
Dawid Ciężarkiewicz | arael
jid: arael fov.pl

Feb 01 2005

Anders F Björklund <afb algonet.se> writes:

Dawid Ciężarkiewicz wrote:

 Your post just remind me that case. I'm going even further. Why to
 initialize anything at all? In 99.9% cases variables are initialized one or
 two lines after creation and default isn't used at all.

It's for the other 0.1%, where forgetting to initialize
a variable causes a subtle bug ? In other languages, such as
Objective-C for instance, these are separate events altogether:

   NewObject *newObject; // newObject will be an instance of the NewObject class
   newObject = [[NewObject alloc] init]; // create and initialize the object
   [newObject doSomethingWith: anotherObject];

But in D, both will be performed when using the "new" keyword.

http://www.digitalmars.com/d/class.html#constructors:
 Members are always initialized to the default initializer for their
 type, which is usually 0 for integer types and NAN for floating point
 types. This eliminates an entire class of obscure problems that come
 from neglecting to initialize a member in one of the constructors.


Of course, in Objective-C you also have to retain/release or use
Autorelease Pools, which is more fuss than D's garbage collection...
(since it uses a simpler manual method of reference counting)

You can still just kill it, of course, similar to "delete" in D:
  [newObject dealloc];

Which has all of the double-free and dangling pointer fun, too.


PreInitializing and GarbageCollecting are a whole lot easier to use.


And for local variables, a reasonably good compiler should be able
to optimize out the .init value, if it's just replaced right away...

If not, then your code probably has other performance problems ;-)

You can still write critical parts in C or even asm, and link them in ?
(or write D code using C-standard functions or DMD inline X86 assembler)

--anders

Feb 01 2005

Norbert Nemec <Norbert Nemec-online.de> writes:

Anders F Björklund wrote:

 You can still write critical parts in C or even asm, and link them in ?
 (or write D code using C-standard functions or DMD inline X86 assembler)

This definitely is not an excuse for any limitation in D. Assembler is
important for systems programming, but trying to beat a modern compiler
with hand-written assembler will work only in very special cases. Simple
code can usually be optimized by the compiler anyway and complicated code
will become a mess written in assembler.

For C - of course, one should be able to bind in existing C code, but in the
long term, there should not be any reason to write new code in C if you
have a D compiler at hand.

Feb 01 2005

Anders F Björklund <afb algonet.se> writes:

Norbert Nemec wrote:

 For C - of course, one should be able to bind in existing C code, but in the
 long term, there should not be any reason to write new code in C if you
 have a D compiler at hand.

I see D as a nice replacement for C++ in the (very) long run,
but I will continue to use either C or Java when they fit better...


But I agree that D is a bit strange in that it's pretty easy to
e.g. dereference null, but hard to e.g. allocate uninited memory ?

So far the performance has been good (just a few Mac OS X quirks still),
and seems to be one of the key points of D. So it's right to address it.


Maybe the auto initialization can be replaced with an error if you
try to actually use the value without setting it first ? Like in Java.

(Java does D-style init of members, but only such errors for local vars.
  Not sure how much work it is for the compiler to catch such errors ?)

--anders

Feb 01 2005

Norbert Nemec <Norbert Nemec-online.de> writes:

Anders F Björklund wrote:

 Norbert Nemec wrote:
 
 For C - of course, one should be able to bind in existing C code, but in
 the long term, there should not be any reason to write new code in C if
 you have a D compiler at hand.

 
 I see D as a nice replacement for C++ in the (very) long run,
 but I will continue to use either C or Java when they fit better...

For Java this is clear - it has a completely different objective, so D does
not even try to compete with it in every respect.

For C on the other hand, D should try to surpass it in every respect, so
that there are no cases left, where C "fits better". This certainly is an
ambitious goal that may never be reached completely, but nevertheless, it
is a goal.

 Maybe the auto initialization can be replaced with an error if you
 try to actually use the value without setting it first ? Like in Java.

This would not make much difference. If the compiler is able to detect this
error, it is also able to optimize away unnecessary initializations. This
is simple in trivial cases but - I believe - impossible in general.

Feb 01 2005

Anders F Björklund <afb algonet.se> writes:

Norbert Nemec wrote:

I see D as a nice replacement for C++ in the (very) long run,
but I will continue to use either C or Java when they fit better...

 
 For Java this is clear - it has a completely different objective, so D does
 not even try to compete with in every respect.

Just some... And the languages are not *that* different, actually ?
(just talking about the Java language, not the JVM or the religion)

 For C on the other hand, D should try to surpass it in every respect, so
 that there are no cases left, where C "fits better". This certainly is an
 ambitious goal that may never be reached completely, but nevertheless, it
 is a goal.

I don't see either D or C++ as a replacement for regular C, more as a
complement? In my world, C is a more portable alternative to assembler.
This does not mean I write everything in it (or in assembler, either)

For me, D fits in nicely between the C and Java language "extremes"...

--anders

Feb 02 2005

Norbert Nemec <Norbert Nemec-online.de> writes:

Anders F Björklund wrote:

 Norbert Nemec wrote:
 For C on the other hand, D should try to surpass it in every respect, so
 that there are no cases left, where C "fits better". This certainly is an
 ambitious goal that may never be reached completely, but nevertheless, it
 is a goal.

 
 I don't see either D or C++ as a replacement for regular C, more as a
 complement? In my world, C is a more portable alternative to assembler.
 This does not mean I write everything in it (or in assembler, either)

OK, I probably came on a bit too fast with my answer. C certainly has its
uses. Personally, I don't use it at all, but then - everybody has a limited
view of the world...

Feb 02 2005

"Walter" <newshound digitalmars.com> writes:

"Norbert Nemec" <Norbert Nemec-online.de> wrote in message
news:ctq038$2de3$1 digitaldaemon.com...
 For C on the other hand, D should try to surpass it in every respect, so
 that there are no cases left, where C "fits better". This certainly is an
 ambitious goal that may never be reached completely, but nevertheless, it
 is a goal.

Currently, the only cases where C fits better are:

1) you need to work with existing C code
2) there isn't a D compiler for the target
3) you're working with a tool that generates C code
4) your staff is content using C and will not try anything else

These are all environmental considerations, not language issues. It's faster
to write code in D, faster to compile it and faster to debug it. If I'm
missing something, if there is something that the C language is a better fit
for, I'd like to know what it is!

Feb 02 2005

Norbert Nemec <Norbert Nemec-online.de> writes:

Walter wrote:
 Currently, the only cases where C fits better are:
 
 1) you need to work with existing C code
 2) there isn't a D compiler for the target
 3) you're working with a tool that generates C code
 4) your staff is content using C and will not try anything else

5) You want your code to look pretty:
 http://www.de.ioccc.org/2004/anonymous.c
:-)

Feb 02 2005

"Walter" <newshound digitalmars.com> writes:

"Norbert Nemec" <Norbert Nemec-online.de> wrote in message
news:ctrj9i$146d$1 digitaldaemon.com...
 Walter wrote:
 Currently, the only cases where C fits better are:

 1) you need to work with existing C code
 2) there isn't a D compiler for the target
 3) you're working with a tool that generates C code
 4) your staff is content using C and will not try anything else

 5) You want your code to look pretty:
  http://www.de.ioccc.org/2004/anonymous.c
 :-)

You might want to check out:



<g>

Feb 02 2005

"Walter" <newshound digitalmars.com> writes:

"Anders F Björklund" <afb algonet.se> wrote in message
news:ctomh9$14jr$1 digitaldaemon.com...
 I see D as a nice replacement for C++ in the (very) long run,
 but I will continue to use either C or Java when they fit better...

You can write "C" code in D. Take a look at the Empire source code <g>.

 But I agree that D is a bit strange in that it's pretty easy to
 e.g. dereference null, but hard to e.g. allocate uninited memory ?

It isn't strange viewed from the perspective that dereferencing null always
generates a seg fault, and so cannot be ignored, overlooked, etc. Allocating
uninitialized memory can lead to erratic, random behavior which sometimes
can *appear* to work successfully, hence the idea that this is a bad thing
that must be stamped out.

Predictable, consistent behavior is what makes for robust, debuggable, error
free programs.

 So far the performance has been good (just a few Mac OS X quirks still),
 and seems to be one of the key points of D. So it's right to address it.

I wish to point out that DMDScript in D is faster than DMDScript in C++,
despite the D version doing the automatic initialization (and the other
safety features in D). The magic dust at work here is the D profiler and the
ease in manipulating the D source code to make it faster. (D code, I've
discovered, is easier than C++ to manipulate source to try to make it run
faster. I spent a lot less time tuning the D code, and got better results.)

 Maybe the auto initialization can be replaced with an error if you
 try to actually use the value without setting it first ? Like in Java.

That only works well if you've got hardware support for it. The compiler
cannot reliably determine this, though some compilers fake it and issue
spurious and irritatingly wrong warnings when they get it wrong.

 (Java does D-style init of members, but only such errors for local vars.
   Not sure how much work it is for the compiler to catch such errors ?)

The same techniques for catching such errors at compile time can be used
instead to eliminate initializations that are not needed, which is done by
DMD. I much prefer the latter approach, as when an initialization is
redundant but such redundancy is not detectable, it "fails safe" by leaving
the initialization in rather than issuing a nuisance error message.

Feb 02 2005

Anders F Björklund <afb algonet.se> writes:

Walter wrote:

But I agree that D is a bit strange in that it's pretty easy to
 e.g. dereference null, but hard to e.g. allocate uninited memory ?


 
 It isn't strange viewed from the perspective that dereferencing null always
 generates a seg fault, and so cannot be ignored, overlooked, etc. Allocating
 uninitialized memory can lead to erratic, random behavior which sometimes
 can *appear* to work successfully, hence the idea that this is a bad thing
 that must be stamped out.

Yeah, you can hardly escape NullPointerErrors even in virtual machines.

Just meant that there are other languages that do more hand-holding?
D has this funny mix of low and high level, that takes some time getting
used to. But I like it :-) At least most of it, save a few rants... ;-)

 I wish to point out that DMDScript in D is faster than DMDScript in C++,
 despite the D version doing the automatic initialization (and the other
 safety features in D). The magic dust at work here is the D profiler and the
 ease in manipulating the D source code to make it faster. (D code, I've
 discovered, is easier than C++ to manipulate source to try to make it run
 faster. I spent a lot less time tuning the D code, and got better results.)

The differences I'm seeing are mostly due to the fact that Apple has 
spent a lot of time tuning their compiler for C, Objective-C and C++
(and even the bastard child Objective-C++) but for D, I need to use
the regular GCC which only has a few of those PowerPC tunings done...

This gets even larger when using vector operations, on the PPC G4/G5.

For DMD platforms, such as Win32 or Linux X86, this is not an issue.
(or maybe less of an issue, as I don't know how the SSE/MMX support is?)

 The same techniques for catching such errors at compile time can be used
 instead to eliminate initializations that are not needed, which is done by
 DMD. I much prefer the latter approach, as when an initialization is
 redundant but such redundancy is not detectable, it "fails safe" by leaving
 the initialization in rather than issuing a nuisance error message.

It's also simpler to code with variables that start with a known value,
rather than getting warnings later (even if they could be done reliably)

I actually like that member fields are inited with known initializers,
usually zeroes, and the locals will be optimized out anyway... Leaving
large arrays and such, which is something that still can be addressed.

--anders

Feb 02 2005

Dave <Dave_member pathlink.com> writes:

In article <ctraod$rm5$1 digitaldaemon.com>, Walter says...
"Anders F Björklund" <afb algonet.se> wrote in message
news:ctomh9$14jr$1 digitaldaemon.com...
 I see D as a nice replacement for C++ in the (very) long run,
 but I will continue to use either C or Java when they fit better...

You can write "C" code in D. Take a look at the Empire source code <g>.

FWIW, that's where I see D making its biggest inroads initially with the
general programming community, especially now that Linux is surging and has a
large number of fluent C programmers who are generally not forced into "vendor
(or language or tool) tie-in". They'll use it a lot like C except maybe actually
start to use OOP because D makes that very straight-forward ;)

That's also why I personally tend to put a lot of stock in run-time performance;
an easier/safer/gc'd "C" that generally performs as well at v1.0 (and
potentially better in the future) sounds like a pretty darn good reason to risk
a switch for me <g>.

I mean the reason a systems programmer 'drops to C' and doesn't do everything in
Perl or Python or whatever is because of performance. That's usually the most
compelling reason anyway. Less than equal performance (or even perceived
performance via common benchmark results) will also be an equally compelling
reason for many C programmers to not try D ;)

- Dave

Feb 02 2005

Dawid Ciężarkiewicz <arael fov.pl> writes:

Anders F Björklund wrote:

 Dawid Ciężarkiewicz wrote:
 
 Your post just remind me that case. I'm going even further. Why to
 initialize anything at all? In 99.9% cases variables are initialized one
 or two lines after creation and default isn't used at all.

 
 It's for the other 0.1% percent, where forgetting to initialize
 a variable causes a subtle bug ? In other languages, such as
 Objective-C for instance, these are separate events altogether:

I can agree that setting floats to NAN makes sense, but setting a forgotten int
to an arbitrary value won't help the program much. It will just make it give the same
errors rather than random ones. :)


 PreInitializing and GarbageCollecting are a whole lot easier to use.

I agree that GarbageCollecting is *necessary* for a modern programming
language. And I agree that there should be a way to disable it if needed. Just as
with variable initialization - sometimes (in critical parts of code) it
would be good to be able to disable this.

 And for local variables, a reasonably good compiler should be able
 to optimize out the .init value, if it's just replaced right away...

This is part of the answer that I expected and I'm glad to hear it, but what
about variables that are not expected to have any value at the start and are
initialized later?

-- 
Dawid Ciężarkiewicz | arael
jid: arael fov.pl

Feb 01 2005

Anders F Björklund <afb algonet.se> writes:

Dawid Ciężarkiewicz wrote:

 This is part of the answer that I expected and I'm glad to hear, but what
 about variables that are not expected to have any value for start and are
 initialized later?

Hard to say in the general case, I recommend playing with -O and disasm?
(if you use GDC, you can compare -O0...-O3, and get asm output with -S)

--anders

Feb 01 2005

"Walter" <newshound digitalmars.com> writes:

"Dawid Ciężarkiewicz" <arael fov.pl> wrote in message
news:ctnv58$c53$1 digitaldaemon.com...
 I was reading documentation of D and found paragraph about initialization of
 variables. Can somebody tell me what is the point in doing so and if it's
 done just by setting them after creation (and IMHO wasting cpu cycles) or
 is it cost free (I don't know how would be that possible, but that is why
 I'm asking).

The point of it is to eliminate a common, and difficult to find, source of
bugs.

 Your post just remind me that case. I'm going even further. Why to
 initialize anything at all? In 99.9% cases variables are initialized one or
 two lines after creation and default isn't used at all.

In those cases, the optimizer will usually eliminate the initializer (since
it is a "dead assignment").

Feb 02 2005

Norbert Nemec <Norbert Nemec-online.de> writes:

Walter wrote:

 Your post just remind me that case. I'm going even further. Why to
 initialize anything at all? In 99.9% cases variables are initialized one or
 two lines after creation and default isn't used at all.

 
 In those cases, the optimizer will usually eliminate the initializer
 (since it is a "dead assignment").

'usually' is the point here. In certain cases, the compiler will not be able
to determine that it actually is a dead assignment and will leave it in. There
should be a compiler pragma to tell the compiler about it in this case.

Feb 02 2005

"Walter" <newshound digitalmars.com> writes:

"Norbert Nemec" <Norbert Nemec-online.de> wrote in message
news:ctr980$pr9$1 digitaldaemon.com...
 Your post just remind me that case. I'm going even further. Why to
 initialize anything at all? In 99.9% cases variables are initialized one
 or two lines after creation and default isn't used at all.

 In those cases, the optimizer will usually eliminate the initializer
 (since it is a "dead assignment").

 'usually' is the point here. In certain cases, the compiler will not be
 able to determine that it actually is a dead assignment and leave it in.
 There should be a compiler pragma to tell the compiler about it in this case.

I honestly think that in a non-trivial program, you'd be very, very hard
pressed to see a measurable difference in program performance from this.

Feb 02 2005

Manfred Nowak <svv1999 hotmail.com> writes:

"Walter" wrote: 

[...]
 I honestly think that in a non-trivial program, you'd be very,
 very hard pressed to see a measurable difference in program
 performance from this. 

The measure to be taken in the case I brought into this discussion is the 
steadiness of the run gained by lazy initializing. This is costly in total.

-manfred

Feb 02 2005

Norbert Nemec <Norbert Nemec-online.de> writes:

This issue should not need any measurements for justification. It is clear
that preinitializing causes some overhead.

Personally, I believe there needs to be some way to deactivate it. One
cannot expect the compiler to optimize away all unnecessary
initializations. Especially for arrays, where initialization really becomes
an issue, the initialization might not happen in one simple loop.

As I understand the philosophy of D, it does allow the user to shoot himself
in the foot if he really wants to. It is ok to default to a safe behaviour,
but experts should be able to deactivate the safety measures by some
explicit command. (A global compiler option is not a good idea! It has to be
specified right in the code in some way.) Any ideas for a possible syntax
specifying "This variable should not be initialized"? It should be possible
both for variables as well as for dynamically allocated memory or class
members.



Manfred Nowak wrote:

 I wrote about this before.
 
 There is a well known time/space-tradeoff for the preInitialization of
 arrays: using about three times the space one can lazy initialize an
 array.
 
 But this technique is useless within D, because of the automatic
 preInitialization, which currently eats up about 3 cycles per byte.
 
 Please be aware that on a 3GHz machine the busy preInitialization of
 one GB then lasts one second. And the coming 64-bit-machine will have
 up to some TB of main memory. Current mainboards can already hold up to
 4GB, which is the current main memory limit for win32.
 
 Check again, that to preInitialize one TB you have to wait more than 15
 minutes only to wait at least another 15 minutes until your
 videoediting can start, if no precautions are taken.
 
 We need an option to switch automatic preInitialization of arrays off.
 
 -manfred
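
For readers who do not know the trick referred to above, here is a rough sketch of the classic lazy-initialization scheme, written in present-day D. It is only an illustration under assumptions: `LazyArray` and its members are made-up names, and `std.array.uninitializedArray` stands in for an allocation facility that did not exist when this thread was written. The idea is to keep two index arrays next to the payload (hence roughly three times the space) so that each slot can prove whether it was ever written.

```d
import std.array : uninitializedArray;

// Hypothetical type: an int array that is never bulk-initialized.
struct LazyArray
{
    private int[] data;     // payload, contents initially indeterminate
    private size_t[] from;  // from[i]: position in `to` claimed by slot i
    private size_t[] to;    // to[k]: the slot that was the k-th one written
    private size_t top;     // number of distinct slots written so far
    private int fill;       // value reported for untouched slots

    this(size_t n, int fillValue = 0)
    {
        data = uninitializedArray!(int[])(n);
        from = uninitializedArray!(size_t[])(n);
        to = uninitializedArray!(size_t[])(n);
        fill = fillValue;
    }

    // A slot counts as written only if from/to point at each other
    // within the first `top` entries; garbage cannot fake that.
    private bool written(size_t i) const
    {
        return from[i] < top && to[from[i]] == i;
    }

    int get(size_t i) const
    {
        return written(i) ? data[i] : fill;
    }

    void set(size_t i, int value)
    {
        if (!written(i))
        {
            from[i] = top;
            to[top] = i;
            ++top;
        }
        data[i] = value;
    }
}

void main()
{
    auto a = LazyArray(1_000_000, -1);
    a.set(42, 7);
    assert(a.get(42) == 7);   // written slot
    assert(a.get(43) == -1);  // untouched slot reports the fill value
}
```

Every access pays for the extra check, which is the time/space tradeoff mentioned in the post: setup is constant time, but the total work is higher than one bulk clear followed by plain accesses.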

Feb 01 2005

pragma <pragma_member pathlink.com> writes:

In article <ctoitr$11pk$1 digitaldaemon.com>, Norbert Nemec says...
Any ideas for a possible syntax
specifying "This variable should not be initialized"? It should be possible
both for variables as well as for dynamically allocated memory or class
members.

This sounds like a job for a compiler pragma.

 pragma(noinit) int a; // a gets a random value now (just like C!).
 int[] b; 
 pragma(noinit){
     b = new int[1024*1024*1024];  // b gets an uninitialized block of 1G ints (4GB).
 }

The nice thing about this is that pragmas in D are not ignored if not
understood.  So while compiler dependent, the code won't compile on D compilers
that don't support 'noinit'.

- EricAnderton at yahoo

Feb 01 2005

Dawid Ciężarkiewicz <arael fov.pl> writes:

pragma wrote:

 In article <ctoitr$11pk$1 digitaldaemon.com>, Norbert Nemec says...
Any ideas for a possible syntax
specifying "This variable should not be initialized"? It should be
possible both for variables as well as for dynamically allocated memory or
class members.

 
 This sounds like a job for a compiler pragma.
 
 pragma(noinit) int a; // a gets a random value now (just like C!).
 int[] b;
 pragma(noinit){
     b = new int[1024*1024*1024];  // b gets an uninitialized block of 1G ints (4GB).
 }

 
 The nice thing about this is that pragmas in D are not ignored if not
 understood.  So while compiler dependent, the code won't compile on D
 compilers that don't support 'noinit'.
 
 - EricAnderton at yahoo

And you could always use "else" to initialize memory the old way. I like it.
-- 
Dawid Ciężarkiewicz | arael
jid: arael fov.pl

Feb 02 2005

"Ben Hinkle" <bhinkle mathworks.com> writes:

"Manfred Nowak" <svv1999 hotmail.com> wrote in message 
news:ctmtne$2car$1 digitaldaemon.com...
I wrote about this before.

 There is a well known time/space-tradeoff for the preInitialization of
 arrays: using about three times the space one can lazy initialize an
 array.

 But this technique is useless within D, because of the automatic
 preInitialization, which currently eats up about 3 cycles per byte.

 Please be aware that on a 3GHz machine the busy preInitialization of
 one GB then lasts one second. And the coming 64-bit-machine will have
 up to some TB of main memory. Current mainboards can already hold up to
 4GB, which is the current main memory limit for win32.

 Check again, that to preInitialize one TB you have to wait more than 15
 minutes only to wait at least another 15 minutes until your
 videoediting can start, if no precautions are taken.

 We need an option to switch automatic preInitialization of arrays off.

 -manfred

The ironic part is that the GC itself has malloc but the D interface to it 
always clears the result. See src/phobos/internal/gc.d routine _d_newarrayi. 
It would be really nice to have the following added to the GC interface:
  void* malloc(size_t len) { return _gc.malloc(len); }
Ah, to have something so close and yet so far away...

And while I'm at it, how about exposing _gc.realloc, _gc.free and 
_gc.capacity, too? Oh, now I'm just dreaming, I know.
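
For comparison, a small sketch of what this wish looks like in present-day D, where `core.memory.GC` exposes `malloc`, `realloc` and `free`, and `std.array.uninitializedArray` builds a typed array without clearing it. Neither existed in this form at the time of the thread, and `rawBlock` is a made-up helper name.

```d
import core.memory : GC;
import std.array : uninitializedArray;

// Hypothetical helper: a GC-owned block whose contents are indeterminate.
// NO_SCAN tells the collector the block holds no pointers.
ubyte[] rawBlock(size_t len)
{
    auto p = cast(ubyte*) GC.malloc(len, GC.BlkAttr.NO_SCAN);
    return p[0 .. len];
}

void main()
{
    auto a = rawBlock(1024);
    a[] = 0xFF;                 // caller fills the block itself

    auto b = uninitializedArray!(int[])(256);
    b[] = 1;                    // likewise for the typed array

    assert(a.length == 1024 && b.length == 256);
    GC.free(a.ptr);             // optional: explicit release
}
```

Reading from either buffer before writing to it yields garbage, which is exactly the safety the default initialization is meant to provide.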

-Ben

Feb 01 2005

J C Calvarese <jcc7 cox.net> writes:

In article <ctokln$13jb$1 digitaldaemon.com>, Ben Hinkle says...
"Manfred Nowak" <svv1999 hotmail.com> wrote in message 
news:ctmtne$2car$1 digitaldaemon.com...
I wrote about this before.

 There is a well known time/space-tradeoff for the preInitialization of
 arrays: using about three times the space one can lazy initialize an
 array.

 But this technique is useless within D, because of the automatic
 preInitialization, which currently eats up about 3 cycles per byte.

 Please be aware that on a 3GHz machine the busy preInitialization of
 one GB then lasts one second. And the coming 64-bit-machine will have
 up to some TB of main memory. Current mainboards can already hold up to
 4GB, which is the current main memory limit for win32.

 Check again, that to preInitialize one TB you have to wait more than 15
 minutes only to wait at least another 15 minutes until your
 videoediting can start, if no precautions are taken.

 We need an option to switch automatic preInitialization of arrays off.

 -manfred

The ironic part is that the GC itself has malloc but the D interface to it 
always clears the result. See src/phobos/internal/gc.d routine _d_newarrayi. 
It would be really nice to have the following added to the GC interface:
  void* malloc(size_t len) { return _gc.malloc(len); }
Ah, to have something so close and yet so far away...

And while I'm at it, how about exposing _gc.realloc, _gc.free and 
_gc.capacity, too? Oh, now I'm just dreaming, I know.

-Ben 

As long as we're doing wishful thinking, maybe we could just put a
readable/writeable property in the GC module called: "noinit". After
_d_newarrayi malloc's the memory, it checks "noinit" to see if it should
initialize or not. The majority of programmers would never touch this setting.
Those that have to clear 1 TB of memory have the option.

jcc7

Feb 01 2005

Norbert Nemec <Norbert Nemec-online.de> writes:

J C Calvarese wrote:

 The majority of programmers would never touch this
 setting. Those that have to clear 1 TB of memory have the option.

You don't really have to go as far as 1TB to see the use of that option. Write
a routine that has a 1KB array locally on the stack and call that routine
repeatedly...

Feb 01 2005

"Ivan Senji" <ivan.senji public.srce.hr> writes:

"Norbert Nemec" <Norbert Nemec-online.de> wrote in message
news:ctq0t2$2e32$1 digitaldaemon.com...
 J C Calvarese wrote:

 The majority of programmers would never touch this
 setting. Those that have to clear 1 TB of memory have the option.

 You don't really have to go for one 1TB to see the use of that option.
 Write a routine that has a 1KB array locally on the stack and call that
 routine repeatedly...

I had a situation like this (but with more than 1KB) and it didn't seem to run
that slowly, but I am sure it would run faster if the array wasn't initialized
every time.
Why not try to persuade Walter to accept some syntax that would allow creating
uninitialized arrays?

noinit int[100000] array;

or any other form that could be used.

This would of course be optional, used only when you need it and are sure
that you will initialize the data later.
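
As a point of comparison, later versions of D provide a void initializer (`= void`) that does what the proposed `noinit` would do for locals. A minimal sketch, with `checksum` as a made-up function that writes every element of its scratch buffer before reading any of them:

```d
// Hypothetical helper: the 4KB stack buffer is not cleared on entry.
int checksum(int seed)
{
    int[1024] scratch = void;     // skip the default initialization
    foreach (i, ref v; scratch)
        v = seed + cast(int) i;   // every element is written before use
    int sum;                      // ordinary default-initialized local
    foreach (v; scratch)
        sum += v;
    return sum;
}

void main()
{
    assert(checksum(0) == 523_776);  // 0 + 1 + ... + 1023
}
```

The responsibility moves to the programmer: reading `scratch` before the first loop would give indeterminate values.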

Feb 02 2005

Georg Wrede <georg.wrede nospam.org> writes:

 "Manfred Nowak" <svv1999 hotmail.com> wrote in message 
But this technique is useless within D, because of the automatic
preInitialization, which currently eats up about 3 cycles per byte.


Admittedly, I haven't followed processor specs for a while, but 
initializing memory to zero should take about 1 cycle per 4 bytes, 
unless I'm totally confused. And on a machine with a 64-bit bus, 1 cycle 
per 8 bytes. (We're talking about long sequences of memory here, not small 
structs or single variables.)

Does someone know this better?

Feb 01 2005

Brian Chapman <nospam-for-brian see-post-for-address.net> writes:

On 2005-02-01 16:17:17 -0600, Georg Wrede <georg.wrede nospam.org> said:

 "Manfred Nowak" <svv1999 hotmail.com> wrote in message
 But this technique is useless within D, because of the automatic
 preInitialization, which currently eats up about 3 cycles per byte.


 
 Admittedly, I haven't followed processor specs for a while, but 
 initializing memory to zero should take about 1 cycle per 4 bytes, 
 unless I'm totally confused. And on a 64 bit bus machine 1 cycle 8 
 bytes. (We're talking about long sequences of memory here, not small 
 structs or single variables.)

No guys. Cycle counting is as old as MSDOS floppy disks. Things are much 
better and more complicated than that now. First of all, memory is 
transferred between RAM and the caches (L1, then the on-chip L2) a cache 
line at a time: 32 or 64 bytes depending on the architecture. 
Various RAM types (ie: DDR) and cache read/write strategies will 
greatly affect how fast this is. Wait states are the problem area. A 
processor can sit there doing nothing for many "cycles" waiting for the 
bus and memory controller to get their ass in gear.

Now, superscalar processors (Pentium onward) can read and write in 
parallel as long as the data does not depend on previous results. 
Clearing memory would be such a case. So two or more write 
instructions could be paired and executed in one "cycle." And if SIMD 
instructions are being used, then we're talking even more throughput. 
There's just no way of counting anymore. It's an old practice that 
doesn't relate to current hardware.

For instance, on the PPC it's all about keeping data in the cache. You 
can have code with an otherwise high "cycle count", but if it keeps cache 
misses to a minimum then it will outperform a piece of code optimized 
by "cycle counting."

The question is how much throughput and bandwidth do you have? The 
processor and its instruction set are not the issue.
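
Rather than counting cycles, the clearing rate can simply be measured. Below is a rough sketch in present-day D; `clearRate` is a made-up name, and the figure it reports depends entirely on the machine, the compiler and the buffer size, so no expected output is given.

```d
import core.stdc.stdlib : free, malloc;
import core.stdc.string : memset;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Hypothetical helper: bytes per second achieved when zeroing `len` bytes.
double clearRate(size_t len)
{
    if (len == 0)
        return 0;
    auto p = cast(ubyte*) malloc(len);
    if (p is null)
        return 0;
    scope (exit) free(p);

    memset(p, 1, len);   // touch the pages first so page faults are not timed

    auto sw = StopWatch(AutoStart.yes);
    memset(p, 0, len);   // the operation being measured
    sw.stop();

    if (p[len / 2] != 0) // read back so the store cannot be discarded
        return 0;

    immutable nsecs = sw.peek.total!"nsecs";
    return nsecs > 0 ? len * 1e9 / nsecs : double.infinity;
}

void main()
{
    writefln("%.2f GB/s", clearRate(256 * 1024 * 1024) / 1e9);
}
```

Running it with buffer sizes below and above the cache sizes should show the effect described above: the rate is set by the memory system, not by the instruction count.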

But all of this is irrelevant to me, because if you're wanting to do 
gigabyte bare-metal memory blits for video editing, it would be beyond 
me why you would be using D and expecting it to do what you want. Why 
not just use Java? That makes about as much sense. You need to know 
YOUR hardware and write bare-metal ASM to get what you need done if 
it's that vital.

I don't use a can opener to peel a grapefruit just because it's brand 
new and shiny.

Feb 01 2005

"Ben Hinkle" <ben.hinkle gmail.com> writes:

 But all of this is irrelevant to me, because if you're wanting to do 
 gigabyte bare-metal memory blits for video editing, it would be beyond me 
 why you would be using D and expecting it to do what you want. Why not 
 just use Java? That makes about as much sense. You need to know YOUR 
 hardware and write bare-metal ASM to get what you need done if it's that 
 vital.

D is aiming to support bare-metal programming, from what I gather from the 
Major Goals section of http://www.digitalmars.com/d/overview.html:
"Provide low level bare metal access as required"
D and Java are lightyears apart in terms of bare-metal access.

The existing way to get gobs of uninitialized memory is to call 
std.c.stdlib.malloc and manage the memory by hand. That's fine and dandy but 
we want more. What was that old Queen song? "I want it all and I want it 
now"? Sounds good to me. :-)
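
A minimal sketch of that manual route, assuming present-day D, where the C library bindings live in `core.stdc.stdlib` rather than `std.c.stdlib`. The helper names `allocRaw` and `freeRaw` are made up for illustration.

```d
import core.stdc.stdlib : free, malloc;

// Hypothetical wrapper: a slice over C-heap memory. The contents are
// indeterminate and the GC neither scans nor frees the block.
ubyte[] allocRaw(size_t len)
{
    auto p = cast(ubyte*) malloc(len);
    return p is null ? null : p[0 .. len];
}

// The caller must release the block by hand.
void freeRaw(ref ubyte[] buf)
{
    free(buf.ptr);
    buf = null;
}

void main()
{
    auto frame = allocRaw(4096);
    assert(frame.length == 4096);
    frame[] = 0;        // initialize only if and when it is needed
    frame[0] = 42;
    freeRaw(frame);
    assert(frame is null);
}
```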

Feb 01 2005

Brian Chapman <nospam-for-brian see-post-for-address.net> writes:

On 2005-02-01 20:59:16 -0600, "Ben Hinkle" <ben.hinkle gmail.com> said:

 D is aiming to support bare-metal programming, from what I gather from 
 the Major Goals section of http://www.digitalmars.com/d/overview.html:
 "Provide low level bare metal access as required"
 D and Java are lightyears apart in terms of bare-metal access.
 
 The existing way to get gobs of uninitialized memory is to call 
 std.c.stdlib.malloc and manage the memory by hand. That's fine and 
 dandy but we want more. What was that old Queen song? "I want it all 
 and I want it now"? Sounds good to me. :-)

Sorry, Ben. I was out of line with the Java comment. A little moment of 
blunt humor got the better of me. ;-) D most certainly can't be 
compared with Java in that (and many other) respects. That wasn't the 
intent of my point.

I'm just saying anyone who thinks they should be able to do a 
"ubyte[1<<30] videoData;" and thinks it *should* be optimal or else "we 
need to fix the compiler" deserves the headache they're going to get. 
;-)

But I'm afraid malloc isn't the answer either. More like direct memory 
mapping, ie: mmap/mlock.
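
A sketch of the mapping approach on POSIX systems, using the bindings in `core.sys.posix.sys.mman` from present-day D. `mapAnon` and `unmap` are made-up helper names. The kernel hands out anonymous pages zero-filled, typically on first touch, so no clearing pass runs in user code; `mlock` is available from the same module but is not called here because it is subject to system limits.

```d
version (Posix)
{
    import core.sys.posix.sys.mman;

    // Hypothetical helper: an anonymous private mapping of `len` bytes.
    ubyte[] mapAnon(size_t len)
    {
        void* p = mmap(null, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANON, -1, 0);
        return p == MAP_FAILED ? null : (cast(ubyte*) p)[0 .. len];
    }

    // Returns the mapping to the operating system.
    void unmap(ubyte[] buf)
    {
        if (buf.ptr !is null)
            munmap(buf.ptr, buf.length);
    }
}

void main()
{
    version (Posix)
    {
        auto buf = mapAnon(1 << 20);
        assert(buf !is null);
        assert(buf[0] == 0);  // anonymous pages arrive zero-filled
        buf[0] = 1;
        unmap(buf);
    }
}
```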

Feb 03 2005

Norbert Nemec <Norbert Nemec-online.de> writes:

Brian Chapman wrote:

 But all of this is irrelevant to me, because if you're wanting to do
 gigabyte bare-metal memory blits for video editing, it would be beyond
 me why you would be using D and expecting it to do what you want. Why
 not just use Java? That makes about as much sense. You need to know
 YOUR hardware and write bare-metal ASM to get what you need done if
 it's that vital.

Writing assembler to get performance is about as outdated as counting
cycles. As I said before: if the code is simple enough to write it in
assembler, it is also simple enough for a reasonable compiler to optimize
it to the same extent.

To exploit the full power of a modern processor, you have to do the right
amount of loop unrolling, loop fusing, command interlacing and so on. You
have to play with the data layout in memory, perhaps chunking arrays into
smaller pieces. There are several more techniques to use, when you want to
make full use of pipelining, branch prediction, cache lines and so on.
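
To make one of these techniques concrete, here is a small sketch of loop unrolling with independent accumulators, written in D. It is not the code from the course mentioned below, only an illustration; `sumUnrolled` is a made-up name, and whether the unrolled form is actually faster has to be measured on the target machine.

```d
// Hypothetical example: four partial sums give the processor four
// independent dependency chains instead of one long one.
double sumUnrolled(const double[] xs)
{
    // Explicit zeros are required: a double defaults to NaN in D.
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= xs.length; i += 4)
    {
        s0 += xs[i];
        s1 += xs[i + 1];
        s2 += xs[i + 2];
        s3 += xs[i + 3];
    }
    double rest = 0;
    for (; i < xs.length; ++i)  // leftover elements
        rest += xs[i];
    return (s0 + s1) + (s2 + s3) + rest;
}

void main()
{
    assert(sumUnrolled([1.0, 2, 3, 4, 5, 6, 7, 8, 9, 10]) == 55.0);
}
```

Note that the summation order differs from a plain loop, so results for general floating-point data can differ in the last bits.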

Languages like Fortran 95 would in principle allow the compiler to do all of
this automatically. (Some good implementations are beginning to emerge.)

Doing all of it by hand in C results in complete spaghetti code, but it is
possible if you know exactly what you are doing. (In a course we did, we
eventually transformed one single loop into an equivalent of ~500 lines of
highly optimized spaghetti. The result was ten times faster than the
original and somewhere around 80% of the absolute theoretical limit of the
processor.)

The result was still pure C and therefore completely portable. The
performance was, of course, tuned to one specific architecture, but there
were basically constants to adjust for tuning it for about any modern
processor.

Doing the same thing in assembler would probably not be much faster. (After
you went from 8% to 80%, the remaining factor of 1.25 probably isn't worth
the effort. 80% peak performance is already well beyond what people usually
go for.)

Furthermore, writing that kind of spaghetti code in C without introducing an
error already needs a lot of discipline. Doing the same thing in
assembler will probably land you in the nearest psychiatric ward...

Feb 02 2005

Anders F Björklund <afb algonet.se> writes:

Norbert Nemec wrote:

 Writing assembler to get performance is about as outdated as counting
 cycles. As I said before: if the code is simple enough to write it in
 assembler, it is also simple enough for a reasonable compiler to optimize
 it to the same extent.

Of course, *reading* assembler is a good way to help write that good C 
code and is also a great help when debugging without the source code?

So I still think learning to read (and write too, just for compliment)
assembler is relevant, just as I think C is... Lots of people disagree*.

 Doing the same thing in assembler would probably not be much faster. (After
 you went from 8% to 80%, the remaining factor of 1.25 probably isn't worth
 the effort. 80% peak performance is already well beyond what people usually
 go for.)

You could be in for a surprise there, though. But I agree that writing
assembly is now a lot harder these days, in the post-RISC CPU era...

These days, assembler and C are more useful for generating *small* code?
Major loop unrolling and load/store reordering are a pain to do in asm.

--anders

* = Those darn Quiche Eaters. C and ASM is for us Real Programmers. :-)

Feb 02 2005

Norbert Nemec <Norbert Nemec-online.de> writes:

Anders F Björklund wrote:

 Norbert Nemec wrote:
 
 Writing assembler to get performance is about as outdated as counting
 cycles. As I said before: if the code is simple enough to write it in
 assembler, it is also simple enough for a reasonable compiler to optimize
 it to the same extent.

 
 Of course, *reading* assembler is a good way to help write that good C
 code and is also a great help when debugging without the source code?

Of course: if you want to exploit your compiler you have to know it, so
reading assembler might be a good idea once in a while...

 Doing the same thing in assembler would probably not be much faster.
 (After you went from 8% to 80%, the remaining factor of 1.25 probably
 isn't worth the effort. 80% peak performance is already well beyond what
 people usually go for.)

 
 You could be in for a surprise there, though.

Not really. In that specific example (which was typical for numerics) the
algorithm was given. It was known that the calculation needed a certain
number of floating point operations. Each processor has some physical limit
of floating point operations per second that it could theoretically achieve
under absolute optimum conditions. No code in the world will ever break
this limit. If you reach 80% of it with plain C code, you know that using
assembler cannot give you much gain. No surprise possible as long as
you stick to the same algorithm.

 These days, assembler and C are more useful for generating *small* code?

Of course. Code-size was never a concern for me yet. I was only talking
about performance. (Be aware though, that excessive code-bloat is bad for
performance as well. The code-cache is limited as well, so excessive
loop-unrolling will kill performance as well.)

Feb 02 2005

Anders F Björklund <afb algonet.se> writes:

Norbert Nemec wrote:

 Of course. Code-size was never a concern for me yet. I was only talking
 about performance. (Be aware though, that excessive code-bloat is bad for
 performance as well. The code-cache is limited as well, so excessive
 loop-unrolling will kill performance as well.)

I think the new GCC default on Mac OS X, -Os, is a fair trade-off ?
It's same as -O2, without the excessive code-heavy optimizations...
http://gcc.gnu.org/onlinedocs/gcc-3.3.5/gcc/Optimize-Options.html

It's a good allround compiler setting. For systems that tune the
output to the present computer, like Gentoo Linux, then other
flags might be in order that more specifically target the CPU.

But it's very hard to "optimize for the general case", which is why
Just-In-Time compilers and recompiling from source code are popular ?
Problem with assembler is that it just isn't portable enough today.

But I think the *performance* of D and DMD is more than good enough.
Right now I'm more concerned about the bugs and ever getting to "1.0"
(and porting GDC to the new GCC 4.0, would also be very interesting)

--anders

Feb 02 2005

Dave <Dave_member pathlink.com> writes:

In article <ctq7qb$2lmg$1 digitaldaemon.com>,
Anders F Björklund says...

<snip>
But I think the *performance* of D and DMD is more than good enough.
Right now I'm more concerned about the bugs and ever getting to "1.0"
(and porting GDC to the new GCC 4.0, would also be very interesting)

I agree, with the exception of DMD floating point which I hope will be given
some attention before 1.0.

It's important to me and I think will actually turn out to be important to the
overall acceptance of the language (and certainly DMD) come 1.0.

I'm not talking about new array semantics, vectorizing, expression templates or
anything like what Norbert has been speaking of lately; just plain old for(...)
{ PI * 2.0 * radius[i]; ...; } type of stuff.
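
A runnable version of the kind of loop meant here, as a sketch; `circumferences` is a made-up function name and the data is arbitrary.

```d
import std.math : PI;

// Hypothetical example of "plain old" floating-point code: one multiply
// chain per element, nothing for a vectorizer or expression template.
double[] circumferences(const double[] radius)
{
    auto result = new double[radius.length];  // elements start out as NaN
    foreach (i, r; radius)
        result[i] = PI * 2.0 * r;
    return result;
}

void main()
{
    auto c = circumferences([1.0, 2.0]);
    assert(c.length == 2);
    assert(c[1] > 12.56 && c[1] < 12.57);  // 4 * PI
}
```

Incidentally, `new double[n]` is one of the places where D's default initialization (to NaN) is immediately overwritten, which ties this loop back to the original topic of the thread.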

BTW - How close are they getting with GCC 4? I have not been following that
lately.

- Dave

--anders

Feb 02 2005

Anders F Björklund <afb algonet.se> writes:

Dave wrote:

 BTW - How close are they getting with GCC 4? I have not been following that
 lately.

See http://gcc.gnu.org/develop.html#stage3, they're at the final stage.

Apple is going to use it as the main system compiler in next Mac OS X.

--anders

Feb 02 2005

Norbert Nemec <Norbert Nemec-online.de> writes:

Dave wrote:

 In article <ctq7qb$2lmg$1 digitaldaemon.com>,
 Anders F Björklund says...

 <snip>
But I think the *performance* of D and DMD is more than good enough.
Right now I'm more concerned about the bugs and ever getting to "1.0"
(and porting GDC to the new GCC 4.0, would also be very interesting)

 
 I agree, with the exception of DMD floating point which I hope will be
 given some attention before 1.0.

Have you tested the current floating point performance of gdc, compared to
gcc/g++? This would give a clue about whether it is a problem of the front
end or the DM backend. 

Are there any general comparisons of the code produced by the different
compilers? (Not only "What *does* work?", like in the stress test, but also
"How well does it work?")

Feb 02 2005

Thomas Kuehne <thomas-dloop kuehne.THISISSPAM.cn> writes:

Norbert Nemec wrote:
| Are there any general comparisons of the code produced by the
| different compilers? (Not only "What *does* work?", like in the
| stress test, but also "How well does it work?")

Most of the comparisons are on the benchmark level

http://www.prowiki.org/wiki4d/wiki.cgi?Benchmarks
http://gcc.gnu.org/benchmarks/
http://shootout.alioth.debian.org/

I'm not aware of any current public compiler dissections.

Thomas

Feb 02 2005

Anders F Björklund <afb algonet.se> writes:

Norbert Nemec wrote:

 Are there any general comparisons of the code produced by the different
 compilers? (Not only "What *does* work?", like in the stress test, but also
 "How well does it work?")

On Mac OS X,
most of it is like "hooray, it compiles" :-)

Benchmarks haven't been too bad, but currently
GCC is quicker for most tasks (and then again
the code gcc generates on PPC is not all that good)

When Mango compiles, and some of the more annoying D
bugs like "void main()" are out, we can do some more
testing. For now, DStress is a pretty good start...

--anders

Feb 02 2005

Dave <Dave_member pathlink.com> writes:

In article <ctr5d5$lfd$1 digitaldaemon.com>, Norbert Nemec says...
Dave wrote:

 In article <ctq7qb$2lmg$1 digitaldaemon.com>,
 Anders F Björklund says...

 <snip>
But I think the *performance* of D and DMD is more than good enough.
Right now I'm more concerned about the bugs and ever getting to "1.0"
(and porting GDC to the new GCC 4.0, would also be very interesting)

 
 I agree, with the exception of DMD floating point which I hope will be
 given some attention before 1.0.

Have you tested the current floating point performance of gdc, compared to
gcc/g++? This would give a clue about whether it is a problem of the front
end or the DM backend. 

There has been some of that for floating point posted here (on the NG) a while
back -- oopack and scimark ported by Thomas Kuehne.

IIRC, generally what it showed was that GDC significantly outperformed DMD and
also in the case of scimark, that GDC actually performed a bit better than GCC
and was very close to Intel, so the frontend doesn't appear to be the issue.

My own experience is that DMD is very good for int. A good example of this is
that the gc generally seems to run faster for DMD than GDC. 

For the FP that I'm familiar with (not cache dependent heavy-duty numerics), it
looks to me like DMD just needs to make as good use of the FP registers as it
does of the GP registers ;)

- Dave

Feb 02 2005

zwang <nehzgnaw gmail.com> writes:

In general, rewriting C in assembly doesn't improve much,
since modern compilers are good at optimizing general-purpose code. 
Where hand-tuned assembly can often boost the performance is with
programs that may exploit MMX & SSE instructions.




Feb 02 2005

Anders F Björklund <afb algonet.se> writes:

zwang wrote:

 In general, rewriting C in assembly doesn't improve much,
 since modern compilers are good at optimizing general-purpose code. 
 Where hand-tuned assembly can often boost the performance is with
 programs that may exploit MMX & SSE instructions.

Then again, modern compilers can use those instructions too...

Here's to GDC being speedily ported to GCC 4.0, which does those.

--anders

Feb 02 2005

Dave <Dave_member pathlink.com> writes:

In article <ctq2di$2gbd$1 digitaldaemon.com>, Norbert Nemec says...

<snip>
The result was still pure C and therefore completely portable. The
performance was, of course, tuned to one specific architecture, but there
were basically constants to adjust for tuning it for about any modern
processor.

Doing the same thing in assembler would probably not be much faster. (After
you went from 8% to 80%, the remaining factor of 1.25 probably isn't worth
the effort. 80% peak performance is already well beyond what people usually
go for.)

If you still have that code handy, would it be possible to run it through DMD
and GDC and post the results vs., say, GCC (and Intel C/++ if it's available)?

Just out of curiosity..

Thanks,

- Dave

Feb 02 2005

Norbert Nemec <Norbert Nemec-online.de> writes:

Dave wrote:

 In article <ctq2di$2gbd$1 digitaldaemon.com>, Norbert Nemec says...

 <snip>
The result was still pure C and therefore completely portable. The
performance was, of course, tuned to one specific architecture, but there
were basically constants to adjust for tuning it for about any modern
processor.

Doing the same thing in assembler would probably not be much faster.
(After you went from 8% to 80%, the remaining factor of 1.25 probably
isn't worth the effort. 80% peak performance is already well beyond what
people usually go for.)

 
 If you still have that code handy, would it be possible to run it through
 DMD and GDC and post the results vs., say, GCC (and Intel C/++ if it's
 available)?

Already found out to my disappointment that I don't have it on my local
harddisk any more. Have to dig up some old backups...

In any case, I would not expect very conclusive results. The code was plain
ANSI C and did not depend on any compiler optimizations. Furthermore, it
was tuned to a specific Alpha processor (which had a comparatively simple
cache structure). The techniques were rather general, but the specifics were
tuned exactly to that one machine, which I don't have any access to any
more.

Anyhow: I'll try to dig up the code and see in which state it is.

Feb 02 2005

Brian Chapman <nospam-for-brian see-post-for-address.net> writes:

On 2005-02-02 02:18:42 -0600, Norbert Nemec <Norbert Nemec-online.de> said:

 Writing assembler to get performance is about as outdated as counting
 cycles.

I disagree with this statement so very much I wouldn't even know where 
to begin. Granted, it all depends on the situation, and I'm not talking 
about writing generalized code. It's pointless to continue this because 
the argument is as old as a PDP-11 rotting away in an MIT basement. We 
could debate this till we're blue in the face: I won't convince you to 
break out an assembler and you won't convince me that a compiler can (or 
should) do it better. We're just going to have to agree to disagree, even 
though I don't even know what the point of this thread is supposed to be 
anymore.


 As I said before: if the code is simple enough to write it in
 assembler, it is also simple enough for a reasonable compiler to optimize
 it to the same extent.
 
 To exploit the full power of a modern processor, you have to do the right
 amount of loop unrolling, loop fusing, command interlacing and so on. You
 have to play with the data layout in memory, perhaps chunking arrays into
 smaller pieces. There are several more techniques to use, when you want to
 make full use of pipelining, branch prediction, cache lines and so on.

Maybe. Or if you had a copy of the CPU's programmer's manual you could 
just inline a nice slick column of opcodes and do what you want exactly 
instead of crossing your fingers when you type make or spending all day 
with a profiler trying various C idioms to various results. I'd rather 
take the compiler's assembly output, grumble once, rewrite it properly 
and inline it back in.


 Languages like Fortran 95 would in principle allow the compiler to do all of
 this automatically. (Some good implementations are beginning to emerge.)

Well you're most certainly never going to convince me to code in Fortran.


 Doing all of it by hand in C results in complete spaghetti code, but it is
 possible if you know exactly what you are doing. (In a course we did, we
 eventually transformed one single loop into an equivalent of ~500 lines of
 highly optimized spaghetti. The result was ten times faster than the
 original and somewhere around 80% of the absolute theoretical limit of the
 processor.)

500 line Duff devices don't impress me. They make me want to do 
everybody a big favor and promptly delete the last copy of the 
offending source file on the spot.


 The result was still pure C and therefore completely portable. The
 performance was, of course, tuned to one specific architecture, but there
 were basically constants to adjust for tuning it for about any modern
 processor.
 
 Doing the same thing in assembler would probably not be much faster. (After
 you went from 8% to 80%, the remaining factor of 1.25 probably isn't worth
 the effort. 80% peak performance is already well beyond what people usually
 go for.)
 
 Furthermore, writing that kind of spaghetti code in C without introducing an
 error already needs a lot of discipline. Doing the same thing in
 assembler will probably land you in the nearest psychiatric ward...

Heh, you obviously don't know machine. If that's the kind of stuff you 
want to write, you go for it man. I'd rather just put some inline SIMD 
code in an asm block. I don't care how good you think your Fortran 
compiler or disciplined spaghetti code is, it's never gonna know how to 
fill up all vector pipelines to normalize 16 vectors for the price of 4 
or fire off a DMA chain to blit gigabytes of data at max utilization.

But it's a free world. You can loop unroll, fuse, and chunk arrays if you want.

Feb 03 2005

Georg Wrede <georg.wrede nospam.org> writes:

Brian Chapman wrote:

 ...it's never gonna know how to
 fill up all vector pipelines to normalize 16 vectors for the price of 4 
 or fire off a DMA chain to blit gigabytes of data at max utilization.

Why?

Feb 03 2005

Brian Chapman <nospam-for-brian see-post-for-address.net> writes:

On 2005-02-03 09:59:01 -0600, Georg Wrede <georg.wrede nospam.org> said:

 Brian Chapman wrote:
 
 ...it's never gonna know how to
 fill up all vector pipelines to normalize 16 vectors for the price of 4 
 or fire off a DMA chain to blit gigabytes of data at max utilization.

 
 Why?

Excellent question. All enlightenment begins with asking a good "why?"

If you really want to understand, I would invite you to start by 
reading some of the great information available at arstechnica.com.

Feb 03 2005

Georg Wrede <georg.wrede nospam.org> writes:

Brian Chapman wrote:
 On 2005-02-03 09:59:01 -0600, Georg Wrede <georg.wrede nospam.org> said:
 
 Brian Chapman wrote:

 ...it's never gonna know how to
 fill up all vector pipelines to normalize 16 vectors for the price of 
 4 or fire off a DMA chain to blit gigabytes of data at max utilization.


 Why?

 
 
 Excellent question. All enlightenment begins with asking a good "why?"

Thank you!

 If you really want to understand, I would invite you to start by reading 
 some of the great information available at arstechnica.com.

Well, I at least hope the answer would be of general interest in
this forum. Also, you seem to have a good idea of "why", based on
the above quote. So, essentially a short(ish) answer would be
appreciated.

Feb 04 2005

Norbert Nemec <Norbert Nemec-online.de> writes:

Brian Chapman wrote:

 On 2005-02-02 02:18:42 -0600, Norbert Nemec <Norbert Nemec-online.de>
 said:
 
 Writing assembler to get performance is about as outdated as counting
 cycles.

 
 I disagree with this statement so very much I wouldn't even know where
 to begin. Granted, it all depends on the situation, and I'm not talking
 about writing generalized code. It's pointless to continue this because
 the argument is as old as a PDP-11 rotting away in an MIT basement. We
 could debate this till we're blue in the face: I won't convince you to
 break out an assembler and you won't convince me that a compiler can (or
 should) do it better. We're just going to have to agree to disagree, even
 though I don't even know what the point of this thread is supposed to be
 anymore.

I agree - I've discussed this with several people and hardly ever came to
a conclusion. High-performance numerics experts would certainly agree on it.
Old-school assemblists would never...

 To exploit the full power of a modern processor, you have to do the right
 amount of loop unrolling, loop fusing, command interlacing and so on. You
 have to play with the data layout in memory, perhaps chunking arrays into
 smaller pieces. There are several more techniques to use, when you want
 to make full use of pipelining, branch prediction, cache lines and so on.

 
 Maybe. Or if you had a copy of the CPU's programmer's manual you could
 just inline a nice slick column of opcodes and do what you want exactly
 instead of crossing your fingers when you type make or spending all day
 with a profiler trying various C idioms to various results. I'd rather
 take the compiler's assembly output, grumble once, rewrite it properly
 and inline it back in.

Well - do so, if you like to, just to realize that once you've spent hours
optimizing your code

 Languages like Fortran 95 would in principle allow the compiler to do all
 of this automatically. (Some good implementations are beginning to emerge.)

 
 Well you're most certainly never going to convince me to code in Fortran.

Me neither, that's why I would like to see the same features in D - so far,
Fortran 95 is the only widespread language with that kind of
performance.


 Doing all of it by hand in C results in complete spaghetti code, but it
 is possible if you know exactly what you are doing. (In a course we did,
 we eventually transformed one single loop into an equivalent of ~500
 lines of highly optimized spaghetti. The result was ten times faster than
 the original and somewhere around 80% of the absolute theoretical limit
 of the processor.)

 
 500 line Duff devices don't impress me. They make me want to do
 everybody a big favor and promptly delete the last copy of the
 offending source file on the spot.

The algorithm was simple but nontrivial: solving partial differential
equations. The original code was not stupidly coded, but just
straightforward, as anyone would write it at the first shot unless they
think of tricky issues of modern processor architecture. Back in the cycle
counting times, the latter version would have been even slower, since it
did many integer operations that the original did not need.

 Heh, you obviously don't know machine. If that's the kind of stuff you
 want to write, you go for it man. I'd rather just put some inline SIMD
 code in an asm block. I don't care how good you think your Fortran
 compiler or disciplined spaghetti code is, it's never gonna know how to
 fill up all vector pipelines to normalize 16 vectors for the price of 4
 or fire off a DMA chain to blit gigabytes of data at max utilization.

Why shouldn't it? As long as the compiler has the chance to reorder the
instructions within certain constraints and has enough intelligence
built in to search for the optimum order, it may do a pretty good job at
crunching the numbers and finding something quite efficient. The behaviour of
the pipeline follows very strict rules that are different for each
architecture. You put all the rules into a file and the compiler will
optimize for a given architecture. Of course, this can only be done if the
language gives the necessary flexibility. This is exactly why I
believe that vectorized expressions in D are essential for high-performance
computing.
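
As an illustration of the kind of vectorized expression meant here: later D2 compilers accept array-wise operations on slices. The sketch below assumes such a compiler. The function names `axpy` and `axpyLoop` are only illustrative, and whether SIMD instructions are actually emitted depends on the implementation and the flags used.

```d
import std.stdio;

// Array-wise form: y[i] += a * x[i] for every i, with no explicit loop.
// The evaluation order across elements is left to the compiler.
void axpy(double a, const(double)[] x, double[] y)
{
    assert(x.length == y.length);
    y[] += x[] * a;
}

// Equivalent hand-written loop, shown for comparison.
void axpyLoop(double a, const(double)[] x, double[] y)
{
    assert(x.length == y.length);
    foreach (i; 0 .. y.length)
        y[i] += x[i] * a;
}

void main()
{
    double[] x = [1.0, 2.0, 3.0, 4.0];
    double[] y = [10.0, 20.0, 30.0, 40.0];
    axpy(0.5, x, y);
    writeln(y);
}
```

The array-wise form states what is computed without fixing an iteration order, which is the flexibility the paragraph above asks the language to give the optimizer.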

 But it's a free world. You can loop unroll, fuse, and chunk arrays if you
 want.

I don't care about doing that myself. I would like to teach it to a
compiler.

Feb 04 2005

Kevin Bealer <Kevin_member pathlink.com> writes:

One potential solution:  Formalize the existing (or semi-well known) method for
"reserving space":  ie. X.reserve(N) => X.length = N; X.length = old_length;

By creating a real method called "reserve" the array could have its cake and eat
it too:  reserve would be required to allocate the memory, but NOT to clear it;
that would STILL happen when the length was bumped up, but now it could be done
lazy-style.

Additional benefit: objects which override [] could also do reserve(), and could
have special behaviour which is smarter than adjusting length() twice.  In most
cases, they would just pass the savings down by using reserve() instead of
length() on *their* underlying data structure.

I'm including a simple memory bandwidth meter quickie.

Kevin

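// Rough memory bandwidth meter: each loop grows a char[] to Z bytes
// (which makes the runtime initialize the new memory) and shrinks it again.
// Usage: prog [loops] [megabytes per loop]
private import std.date;
private import std.stdio;
private import std.conv;

int main(char[][] args)
{
    long N = 100;           // number of loops
    long MB = 1024*1024;
    long Z1 = 64*MB;        // default number of bytes per loop
    long Z = Z1;
    char[] p;
    
    if (args.length > 1) {
        N = toInt(args[1]);
    }
    
    if (args.length > 2) {
        Z = toInt(args[2]) * MB;
    }
    
    // Guard against zero or negative loop counts.
    if (N < 1) {
        N = 1;
    }
    
    // Guard against nonsensical sizes; fall back to 256 MB.
    if (Z < 1024) {
        Z = 256*MB;
    }
    
    writef("Looping %s times.\n", N);
    writef("Writing %s bytes/loop.\n", Z);
    
    d_time t1 = getUTCtime();
    
    for(int i = 0; i<N; i++) {
        // Growing the array is what triggers the initialization being timed.
        p.length = cast(size_t) Z;
        
        // Read two elements so the stores below cannot be optimized away;
        // freshly initialized memory should never contain 'c' or 'q'.
        if (p[p.length / 3] == 'c') {
            writef("Have C\n");
        }
        
        if (p[p.length - 1] == 'q') {
            writef("Have Q\n");
        }
        
        p[p.length / 3] = 'c';
        p[p.length - 1] = 'q';
        
        // Shrink again so the next iteration has to grow (and clear) anew.
        p.length = 1234;
    }
    
    d_time t2 = getUTCtime();
    
    double sec = ((t2-t1) + 0.0)/TicksPerSecond;
    
    writef("Time elapsed = %s [res=%s/s].\n",
           sec, TicksPerSecond);
    
    writef("Mem b/w = %s MB / sec.\n", ((Z/MB)*N)/sec);
    
    return 0;
}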
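
The "reserve" idea described above can be sketched against the `reserve` array property that later D2 runtimes provide. This assumes a D2 compiler and is not an API that was available when this was posted. `reserveExtra` is a hypothetical helper name, and whether the reserved block is cleared eagerly or lazily is left to the runtime.

```d
import std.stdio;

// Reserves room for n more elements without changing the length
// and returns the resulting capacity.
size_t reserveExtra(ref char[] arr, size_t n)
{
    return arr.reserve(arr.length + n);
}

void main()
{
    char[] buf = "abc".dup;
    auto cap = reserveExtra(buf, 1000);   // storage is allocated up front

    foreach (i; 0 .. 1000)
        buf ~= 'x';   // appends should fit in the reserved storage

    writeln(buf.length, " elements, reserved capacity ", cap);
}
```

Unlike setting the length up and back down, the call never exposes elements beyond the current length, so nothing obliges the runtime to clear them at reservation time.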

Feb 01 2005

"Walter" <newshound digitalmars.com> writes:

"Manfred Nowak" <svv1999 hotmail.com> wrote in message
news:ctmtne$2car$1 digitaldaemon.com...
 I wrote about this before.

 There is a well known time/space-tradeoff for the preInitialization of
 arrays: using about three times the space one can lazy initialize an
 array.

 But this technique is useless within D, because of the automatic
 preInitialization, which currently eats up about 3 cycles per byte.

 Please realize that on a 3GHz machine the busy preInitialization of
 one GB then takes one second. And the coming 64-bit machines will have
 up to some TB of main memory. Current mainboards can already hold up to
 4GB, which is the current main memory limit for win32.

 Note also that to preInitialize one TB you have to wait more than 15
 minutes, only to wait at least another 15 minutes until your
 video editing can start, if no precautions are taken.

 We need an option to switch automatic preInitialization of arrays off.

There are several ways to create an array. If the array is statically
initialized, it is initialized when it is demand paged in. There is no code
generated to initialize it (in fact, there is no way to prevent this from
happening!)

Next, one can allocate arrays on the stack. These are normally initialized
at runtime, but this can be turned off using the idiom outlined in
www.digitalmars.com/d/memory.html#uninitializedarrays.

And lastly, one can dynamically allocate arrays using new, in which case
they are initialized, or using std.c.stdlib.malloc, in which case they are
not, or any other allocator one wishes to use.

P.S. there's no way to allocate a TB on the stack anyway <g>

P.P.S. it's been suggested that the special initializer syntax:
    = void;
mean "I know what I'm doing, don't initialize the variable" and I've been
considering implementing it.
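
The `= void` initializer was later adopted by D compilers. A minimal sketch, assuming such a compiler, of a stack array that skips initialization next to a `new`'d array that is always default-initialized. The function names are illustrative only.

```d
import std.stdio;

// Sums 0 .. 1023 through a stack buffer that skips default initialization.
int sumThroughScratch()
{
    int[1024] scratch = void;      // contents are garbage until written
    foreach (i, ref x; scratch)
        x = cast(int) i;           // write every element before reading any
    int total = 0;
    foreach (x; scratch)
        total += x;
    return total;
}

// Arrays created with new are default-initialized (int.init is 0).
bool newIsZeroed(size_t n)
{
    foreach (x; new int[n])
        if (x != 0) return false;
    return true;
}

void main()
{
    writeln(sumThroughScratch());
    writeln(newIsZeroed(4096));
}
```

The `void` form is only safe when every element is written before it is read, which is the "I know what I'm doing" contract described above.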

Feb 02 2005

Vathix <vathix dprogramming.com> writes:

 P.P.S. it's been suggested that the special initializer syntax:
     = void;
 mean "I know what I'm doing, don't initialize the variable" and I've been
 considering implementing it.

I like it, but will it work with 'new'? When newing arrays and value types  
one might also not want to initialize.

Feb 02 2005

"Walter" <newshound digitalmars.com> writes:

"Vathix" <vathix dprogramming.com> wrote in message
news:opslk5o5vckcck4r esi...
 P.P.S. it's been suggested that the special initializer syntax:
     = void;
 mean "I know what I'm doing, don't initialize the variable" and I've been
 considering implementing it.

 I like it, but will it work with 'new'?

No.

 When newing arrays and value types
 one might also not want to initialize.

True, but I don't think that's a good idea. The cases where initialization
of an array *might* make a difference (the critical path in a program tends
to be only in a small part of it) are so unusual it is not worth upsetting
new. And frankly, uninitialized garbage in gc allocated data can cause
problems with the mark/sweep algorithm, and would pull the rug out from
under a future type-aware gc.

Use std.c.stdlib.malloc for allocating uninitialized arrays; if it must be
new'd, instead use a wrapper class that malloc's/free's an internal private
array.
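
A minimal sketch of such a wrapper class, assuming a current D2 compiler, where the C allocator lives in `core.stdc.stdlib` rather than `std.c.stdlib`. The class name `RawBuffer` and its members are hypothetical, chosen only for illustration.

```d
import core.stdc.stdlib : malloc, free;
import std.stdio;

// Hypothetical wrapper: the object itself is new'd, but its payload comes
// from malloc and is therefore neither initialized nor scanned by the GC.
final class RawBuffer
{
    private ubyte[] data;

    this(size_t n)
    {
        auto p = cast(ubyte*) malloc(n);
        if (p is null && n != 0)
            throw new Exception("malloc failed");
        data = p[0 .. n];          // uninitialized contents
    }

    ~this()
    {
        free(data.ptr);            // free(null) is a no-op
        data = null;
    }

    size_t length() const { return data.length; }

    // Bounds-checked element access; the ref return allows assignment.
    ref ubyte opIndex(size_t i) { return data[i]; }
}

void main()
{
    auto buf = new RawBuffer(1024);
    buf[0] = 42;                   // write before reading
    writeln(buf.length, " ", buf[0]);
    destroy(buf);                  // release the malloc'd block deterministically
}
```

Because the payload is raw bytes outside the collector's heap, it must not hold the only reference to any GC-allocated object.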

Feb 02 2005

Andy Friesen <andy ikagames.com> writes:

Walter wrote:
 There are several ways to create an array. If the array is statically
 initialized, it is initialized when it is demand paged in. There is no code
 generated to initialize it (in fact, there is no way to prevent this from
 happening!)
 
 Next, one can allocate arrays on the stack. These are normally initialized
 at runtime, but this can be turned off using the idiom outlined in
 www.digitalmars.com/d/memory.html#uninitializedarrays.
 
 And lastly, one can dynamically allocate arrays using new, in which case
 they are initialized, or using std.c.stdlib.malloc, in which case they are
 not, or any other allocator one wishes to use.
 
 P.S. there's no way to allocate a TB on the stack anyway <g>
 
 P.P.S. it's been suggested that the special initializer syntax:
     = void;
 mean "I know what I'm doing, don't initialize the variable" and I've been
 considering implementing it.

All this makes me think that the only thing that really needs to be done 
is for this to be added to the FAQ.

D's current behaviour is more or less ideal as it stands: uninitialized 
memory can be acquired without fuss, but it won't ever be done by accident.

  -- andy

Feb 02 2005
