
digitalmars.D - [performance]PreInitializing is an annoyance

reply Manfred Nowak <svv1999 hotmail.com> writes:
I wrote about this before.

There is a well-known time/space tradeoff for the preInitialization of 
arrays: by using about three times the space, one can initialize an 
array lazily.

But this technique is useless within D because of the automatic 
preInitialization, which currently eats up about 3 cycles per byte.

Please be aware that on a 3 GHz machine the busy preInitialization of 
one GB then takes one second. And the coming 64-bit machines will have 
up to several TB of main memory. Current mainboards can already hold up 
to 4 GB, which is the current main memory limit for Win32.

Note also that to preInitialize one TB you have to wait more than 15 
minutes, only to wait at least another 15 minutes until your 
video editing can start, if no precautions are taken.

We need an option to switch automatic preInitialization of arrays off.
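
A minimal sketch of the lazy-initialization tradeoff mentioned above, written in present-day D and assuming the classic scheme with two index arrays next to the data. The name LazyArray and its members are hypothetical, chosen only for illustration; memory comes from C's malloc so that nothing is cleared up front.

```d
import core.stdc.stdlib : malloc, free;

/// Sketch of a lazily initialized int array (hypothetical type).
struct LazyArray
{
    private int* data;     // values; garbage until a slot is written
    private size_t* from;  // from[i]: where in `to` slot i was recorded
    private size_t* to;    // to[0 .. count]: slots written so far
    private size_t count;  // number of distinct slots written
    private int fill;      // value reported for untouched slots

    @disable this(this);   // owns raw memory, so forbid copying

    this(size_t n, int fill)
    {
        // malloc does not clear memory, so no element is touched here.
        data = cast(int*) malloc(n * int.sizeof);
        from = cast(size_t*) malloc(n * size_t.sizeof);
        to = cast(size_t*) malloc(n * size_t.sizeof);
        assert(data && from && to, "out of memory");
        this.fill = fill;
    }

    ~this()
    {
        free(data);
        free(from);
        free(to);
    }

    /// A slot counts as written only if `from` and `to` agree on it.
    bool isSet(size_t i) const
    {
        return from[i] < count && to[from[i]] == i;
    }

    int get(size_t i) const
    {
        return isSet(i) ? data[i] : fill;
    }

    void set(size_t i, int value)
    {
        if (!isSet(i))
        {
            from[i] = count;
            to[count] = i;
            ++count;
        }
        data[i] = value;
    }
}
```

Each access pays for one or two extra lookups, and the two index arrays account for the extra space; in exchange, construction does not depend on clearing all n slots.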

-manfred
Jan 31 2005
next sibling parent reply "Nick Sabalausky" <z a.a> writes:
By the time we have systems that have >1TB RAM, I'm sure the memory bus 
speeds will be much faster than they are now (and if they aren't, then 
there are a lot of other things besides initing arrays that would take 
insanely long as well).  So I don't think it would take nearly as long as 15 
minutes. But aside from that, you do raise an interesting point.


"Manfred Nowak" <svv1999 hotmail.com> wrote in message 
news:ctmtne$2car$1 digitaldaemon.com...
I wrote about this before.

 There is a well known time/space-tradeoff for the preInitialization of
 arrays: using about three times the space one can lazy initialize an
 array.

 But this technic is useless within D, because of the automaatic
 preInitialization, which currently eats up about 3 cycles per byte.

 Please awaken to, that on a 3GHz machine the busy preInitalization of
 one GB then lasts one second. And the coming 64-bit-machine will have
 up to some TB of main memory. Current mainboards can already hold up to
 4GB, which ist the current main memory limit for win32.

 Check again, that to preInitialize one TB you have to wait more than 15
 minutes only to wait at least another 15 minutes until your
 videoediting can start, if no precautions are taken.

 We need an option to switch automatic preInitialization of arrays off.

 -manfred
 

Jan 31 2005
parent Manfred Nowak <svv1999 hotmail.com> writes:
"Nick Sabalausky" wrote: 

[...]
 memory bus speeds will be much faster than they are now

The bus speeds can go as fast as they want. One CPU needs at least two cycles to store the next value: one cycle for incrementing the address and one to store the value. That is, a 4 GHz CPU cannot take less than 0.5 s to initialize one GB of RAM: still more than eight minutes for one TB. And in standard machines the amount of RAM has grown over the last ten years by a factor of three more than the CPU frequency.

-manfred
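
The figures quoted in this thread can be checked with a few lines of D. This is only a sketch of the arithmetic, assuming (as the posts do) one byte per store and a fixed number of cycles per byte; initSeconds is a hypothetical helper name.

```d
/// Seconds needed to write `bytes` bytes at `cyclesPerByte` on a `hz` clock.
double initSeconds(double bytes, double cyclesPerByte, double hz)
{
    return bytes * cyclesPerByte / hz;
}

unittest
{
    enum double GB = 1024.0 * 1024 * 1024;
    enum double TB = 1024.0 * GB;

    // 3 cycles per byte at 3 GHz: a bit over 1 s per GB, over 15 min per TB.
    assert(initSeconds(GB, 3, 3e9) > 1.0);
    assert(initSeconds(TB, 3, 3e9) / 60 > 15);

    // 2 cycles per byte at 4 GHz: a bit over 0.5 s per GB, over 8 min per TB.
    assert(initSeconds(GB, 2, 4e9) > 0.5);
    assert(initSeconds(TB, 2, 4e9) / 60 > 8);
}
```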
Jan 31 2005
prev sibling next sibling parent reply "Lionello Lunesu" <lionello.lunesu crystalinter.remove.com> writes:
Hi..

I guess you can easily do an OS call in such a case: HeapAlloc (or even 
better: VirtualAlloc) in Win32. You should use these functions anyway for 
large blocks of data (dynamic arrays are not meant for that, never were). Or 
simply call malloc, it doesn't initialize the values either.
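
A small sketch of the malloc route in present-day D, assuming the C runtime bindings in core.stdc.stdlib. The helper names allocInts and freeInts are hypothetical. The slice is only a view over C heap memory, so it has to be released by hand and should not be appended to.

```d
import core.stdc.stdlib : malloc, free;

/// Returns n uninitialized ints as a slice over C heap memory,
/// or null if the allocation fails. Release with freeInts.
int[] allocInts(size_t n)
{
    auto p = cast(int*) malloc(n * int.sizeof);
    return p is null ? null : p[0 .. n];
}

/// Frees a slice obtained from allocInts.
void freeInts(int[] a)
{
    free(a.ptr);
}

unittest
{
    auto a = allocInts(1024);
    assert(a.length == 1024);
    a[] = 7;               // contents are garbage until written
    assert(a[1023] == 7);
    freeInts(a);
}
```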

This sure is better than any crt_init(bool) or whatever call you're thinking 
of to turn on/off array initializations. A compile-time option would be even 
worse: it would break a program if compiled with the wrong flag.

Lionello.

"Manfred Nowak" <svv1999 hotmail.com> wrote in message 
news:ctmtne$2car$1 digitaldaemon.com...
I wrote about this before.

 There is a well known time/space-tradeoff for the preInitialization of
 arrays: using about three times the space one can lazy initialize an
 array.

 But this technic is useless within D, because of the automaatic
 preInitialization, which currently eats up about 3 cycles per byte.

 Please awaken to, that on a 3GHz machine the busy preInitalization of
 one GB then lasts one second. And the coming 64-bit-machine will have
 up to some TB of main memory. Current mainboards can already hold up to
 4GB, which ist the current main memory limit for win32.

 Check again, that to preInitialize one TB you have to wait more than 15
 minutes only to wait at least another 15 minutes until your
 videoediting can start, if no precautions are taken.

 We need an option to switch automatic preInitialization of arrays off.

 -manfred
 

Feb 01 2005
parent reply Manfred Nowak <svv1999 hotmail.com> writes:
"Lionello Lunesu" wrote: 

 I guess you can easily do an OS call

But that is not portable between OSes.

[...]
 large blocks of data (dynamic arrays are not meant for that, never
 were). 

Where do you get this wisdom from? If that is so, please explain the prerequisites for the usage of dynamic arrays, and for fixed arrays as well.

 Or simply call malloc, it doesn't initialize the values either.

I would use malloc if arrays are not to be used for large amounts of memory at all. But that seems to contradict the fact that memory cells are laid out like an array.

-manfred
Feb 01 2005
parent pragma <pragma_member pathlink.com> writes:
In article <ctohtt$10mr$1 digitaldaemon.com>, Manfred Nowak says...
[...]
 large blocks of data (dynamic arrays are not meant for that, never
 were). 

From where do you have this wisdom? If so please explain the prereqiesites for the usage of dynamic arrays and for fixed arrays as well.

Possibly the most compelling reason that D's arrays are not for huge chunks of memory is that they rely on copy-on-write semantics. With a 1GB chunk of memory in a single D array, modifications to slices and concatenations would result in a complete realloc and copy; so you could only use *one-half to a third* of your system's memory... and that's on a good day with aggressive memory management and no GC.

Also, it's ill-advised to use a single array directly for something like this simply due to the cache misses that are likely to result from random accesses on a 1GB structure; the same goes for virtually every language out there.

To that end D sits firmly in the "trade more space for less running time" optimization camp, which is fine for the majority of tasks out there. Superscale blobs of data require a behavior that certainly can be expressed in D, but is not enshrined in its underlying design.

IMO, the optimal (all-round) solution for working with massive data structures would approach the complexity of a memory manager and not a series of simple array manipulations. From there, you could implement an array-like interface to make it more friendly to use, but it would still be a far cry from a true array.

- EricAnderton at yahoo
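
A short sketch of where copies come from, as observed in present-day D: a slice is a view onto the same memory, while concatenation with ~ produces a freshly allocated array holding a copy of every element. Whether code that modifies a slice copies it first is a matter of convention in the code doing the modification, which is an assumption about how the copy-on-write mentioned above is meant.

```d
unittest
{
    int[] a = [1, 2, 3, 4];

    // A slice shares memory with `a`; nothing is copied.
    int[] s = a[1 .. 3];
    s[0] = 20;
    assert(a[1] == 20);

    // Concatenation allocates a new array and copies the elements,
    // so for a 1GB array a second 1GB block is needed.
    int[] c = a ~ [5];
    c[0] = 100;
    assert(a[0] == 1);
    assert(c.ptr !is a.ptr);
    assert(c.length == 5);
}
```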
Feb 01 2005
prev sibling next sibling parent reply Dawid Ciężarkiewicz <arael fov.pl> writes:
Manfred Nowak wrote:

 I wrote about this before.
 
 There is a well known time/space-tradeoff for the preInitialization of
 arrays: using about three times the space one can lazy initialize an
 array.
 
 But this technic is useless within D, because of the automaatic
 preInitialization, which currently eats up about 3 cycles per byte.
 
 Please awaken to, that on a 3GHz machine the busy preInitalization of
 one GB then lasts one second. And the coming 64-bit-machine will have
 up to some TB of main memory. Current mainboards can already hold up to
 4GB, which ist the current main memory limit for win32.
 
 Check again, that to preInitialize one TB you have to wait more than 15
 minutes only to wait at least another 15 minutes until your
 videoediting can start, if no precautions are taken.
 
 We need an option to switch automatic preInitialization of arrays off.
 
 -manfred

I was reading the D documentation and found a paragraph about initialization of variables. Can somebody tell me what the point of doing so is, and whether it's done just by setting them after creation (IMHO wasting CPU cycles) or is cost-free (I don't know how that would be possible, but that is why I'm asking)?

Your post just reminded me of that case. I'm going even further: why initialize anything at all? In 99.9% of cases variables are initialized one or two lines after creation and the default isn't used at all.

-- Dawid Ciężarkiewicz | arael jid: arael fov.pl
Feb 01 2005
next sibling parent reply Anders F Björklund <afb algonet.se> writes:
Dawid Ciężarkiewicz wrote:

 Your post just remind me that case. I'm going even further. Why to
 initialize anything at all? In 99.9% cases variables are initialized one or
 two lines after creation and default isn't used at all.

It's for the other 0.1%, where forgetting to initialize a variable causes a subtle bug ?

In other languages, such as Objective-C for instance, these are separate events altogether:
   NewObject *newObject; // newObject will be an instance of the NewObject class
   newObject = [[NewObject alloc] init]; // create and initialize the object
   [newObject doSomethingWith: anotherObject];

But in D, both will be performed when using the "new" keyword.

http://www.digitalmars.com/d/class.html#constructors:
 Members are always initialized to the default initializer for their
 type, which is usually 0 for integer types and NAN for floating point
 types. This eliminates an entire class of obscure problems that come
 from neglecting to initialize a member in one of the constructors.
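
A minimal sketch of the default initializers described in the quoted documentation, as they behave in present-day D. The class Sample is hypothetical and exists only to show the member defaults.

```d
import std.math : isNaN;

class Sample
{
    int count;     // starts at int.init, which is 0
    double ratio;  // starts at double.init, which is NaN
    char letter;   // starts at char.init, which is 0xFF
}

unittest
{
    auto s = new Sample;   // allocation and initialization in one step
    assert(s.count == 0);
    assert(isNaN(s.ratio));
    assert(s.letter == 0xFF);

    // Elements of a new dynamic array get the element default as well.
    auto xs = new int[8];
    foreach (x; xs)
        assert(x == 0);
}
```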

Of course, in Objective-C you also have to retain/release or use Autorelease Pools, which is more fuss than D's garbage collection... (since it uses a simpler manual method of reference counting)

You can still just kill it, of course, similar to "delete" in D:
  [newObject dealloc];

PreInitializing and GarbageCollecting are a whole lot easier to use.

And for local variables, a reasonably good compiler should be able to optimize out the .init value, if it's just replaced right away... If not, then your code probably has other performance problems ;-)

You can still write critical parts in C or even asm, and link them in ? (or write D code using C-standard functions or DMD inline X86 assembler)

--anders
Feb 01 2005
next sibling parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Anders F Björklund wrote:

 You can still write critical parts in C or even asm, and link them in ?
 (or write D code using C-standard functions or DMD inline X86 assembler)

This definitely is not an excuse for any limitation in D.

Assembler is important for systems programming, but trying to beat a modern compiler with hand-written assembler will work only in very special cases. Simple code can usually be optimized by the compiler anyway, and complicated code will become a mess when written in assembler.

For C - of course, one should be able to bind in existing C code, but in the long term, there should not be any reason to write new code in C if you have a D compiler at hand.
Feb 01 2005
parent reply Anders F Björklund <afb algonet.se> writes:
Norbert Nemec wrote:

 For C - of course, one should be able to bind in existing C code, but in the
 long term, there should not be any reason to write new code in C if you
 have a D compiler at hand.

I see D as a nice replacement for C++ in the (very) long run, but I will continue to use either C or Java when they fit better...

But I agree that D is a bit strange in that it's pretty easy to e.g. dereference null, but hard to e.g. allocate uninited memory ?

So far the performance has been good (just a few Mac OS X quirks still), and seems to be one of the key points of D. So it's right to address it.

Maybe the auto initialization can be replaced with an error if you try to actually use the value without setting it first ? Like in Java. (Java does D-style init of members, but only such errors for local vars. Not sure how much work it is for the compiler to catch such errors ?)

--anders
Feb 01 2005
next sibling parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Anders F Björklund wrote:

 Norbert Nemec wrote:
 
 For C - of course, one should be able to bind in existing C code, but in
 the long term, there should not be any reason to write new code in C if
 you have a D compiler at hand.

I see D as a nice replacement for C++ in the (very) long run, but I will continue to use either C or Java when they fit better...

For Java this is clear - it has a completely different objective, so D does not even try to compete with it in every respect.

For C on the other hand, D should try to surpass it in every respect, so that there are no cases left where C "fits better". This certainly is an ambitious goal that may never be reached completely, but nevertheless, it is a goal.
 Maybe the auto initialization can be be replaced with an error if you
 try to actually use the value without setting it first ? Like in Java.

This would not make much difference. If the compiler is able to detect this error, it is also able to optimize away unnecessary initializations. This is simple in trivial cases but - I believe - impossible in general.
Feb 01 2005
next sibling parent reply Anders F Björklund <afb algonet.se> writes:
Norbert Nemec wrote:

I see D as a nice replacement for C++ in the (very) long run,
but I will continue to use either C or Java when they fit better...

For Java this is clear - it has a completely different objective, so D does not even try to compete with in every respect.

Just some... And the languages are not *that* different, actually ? (just talking about the Java language, not the JVM or the religion)
 For C on the other hand, D should try to surpass it in every respect, so
 that there are no cases left, where C "fits better". This certainly is an
 ambitious goal that may never be reached completely, but nevertheless, it
 is a goal.

I don't see either D or C++ as a replacement for regular C, more as a complement? In my world, C is a more portable alternative to assembler. This does not mean I write everything in it (or in assembler, either)

For me, D fits in nicely between the C and Java language "extremes"...

--anders
Feb 02 2005
parent Norbert Nemec <Norbert Nemec-online.de> writes:
Anders F Björklund wrote:

 Norbert Nemec wrote:
 For C on the other hand, D should try to surpass it in every respect, so
 that there are no cases left, where C "fits better". This certainly is an
 ambitious goal that may never be reached completely, but nevertheless, it
 is a goal.

I don't see either D or C++ as a replacement for regular C, more as a compliment? In my world, C is a more portable alternative to assembler. This does not mean I write everything in it (or in assembler, either)

OK, I probably came on a bit too fast with my answer. C certainly has its uses. Personally, I don't use it at all, but then - everybody has a limited view of the world...
Feb 02 2005
prev sibling parent reply "Walter" <newshound digitalmars.com> writes:
"Norbert Nemec" <Norbert Nemec-online.de> wrote in message
news:ctq038$2de3$1 digitaldaemon.com...
 For C on the other hand, D should try to surpass it in every respect, so
 that there are no cases left, where C "fits better". This certainly is an
 ambitious goal that may never be reached completely, but nevertheless, it
 is a goal.

Currently, the only cases where C fits better are:

1) you need to work with existing C code
2) there isn't a D compiler for the target
3) you're working with a tool that generates C code
4) your staff is content using C and will not try anything else

These are all environmental considerations, not language issues. It's faster to write code in D, faster to compile it and faster to debug it. If I'm missing something, if there is something that the C language is a better fit for, I'd like to know what it is!
Feb 02 2005
parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Walter wrote:
 Currently, the only cases where C fits better are:
 
 1) you need to work with existing C code
 2) there isn't a D compiler for the target
 3) you're working with a tool that generates C code
 4) your staff is content using C and will not try anything else

5) You want your code to look pretty: http://www.de.ioccc.org/2004/anonymous.c :-)
Feb 02 2005
parent "Walter" <newshound digitalmars.com> writes:
"Norbert Nemec" <Norbert Nemec-online.de> wrote in message
news:ctrj9i$146d$1 digitaldaemon.com...
 Walter wrote:
 Currently, the only cases where C fits better are:

 1) you need to work with existing C code
 2) there isn't a D compiler for the target
 3) you're working with a tool that generates C code
 4) your staff is content using C and will not try anything else

5) You want your code to look pretty: http://www.de.ioccc.org/2004/anonymous.c :-)

You might want to check out: http://fly.srk.fer.hr/ioccc/years.html#1986_bright <g>
Feb 02 2005
prev sibling parent reply "Walter" <newshound digitalmars.com> writes:
"Anders F Björklund" <afb algonet.se> wrote in message
news:ctomh9$14jr$1 digitaldaemon.com...
 I see D as a nice replacement for C++ in the (very) long run,
 but I will continue to use either C or Java when they fit better...

You can write "C" code in D. Take a look at the Empire source code <g>.
 But I agree that D is a bit strange in that it's pretty easy to
 e.g. dereference null, but hard to e.g. allocate uninited memory ?

It isn't strange viewed from the perspective that dereferencing null always generates a seg fault, and so cannot be ignored, overlooked, etc. Allocating uninitialized memory can lead to erratic, random behavior which sometimes can *appear* to work successfully, hence the idea that this is a bad thing that must be stamped out. Predictable, consistent behavior is what makes for robust, debuggable, error free programs.
 So far the performance has been good (just a few Mac OS X quirks still),
 and seems to be one of the key points of D. So it's right to address it.

I wish to point out that DMDScript in D is faster than DMDScript in C++, despite the D version doing the automatic initialization (and the other safety features in D). The magic dust at work here is the D profiler and the ease in manipulating the D source code to make it faster. (D code, I've discovered, is easier than C++ to manipulate source to try to make it run faster. I spent a lot less time tuning the D code, and got better results.)
 Maybe the auto initialization can be be replaced with an error if you
 try to actually use the value without setting it first ? Like in Java.

That only works well if you've got hardware support for it. The compiler cannot reliably determine this, though some compilers fake it and issue spurious and irritatingly wrong warnings when they get it wrong.
 (Java does D-style init of members, but only such errors for local vars.
   Not sure how much work it is for the compiler to catch such errors ?)

The same techniques for catching such errors at compile time can be used instead to eliminate initializations that are not needed, which is done by DMD. I much prefer the latter approach, as when an initialization is redundant but such redundancy is not detectable, it "fails safe" by leaving the initialization in rather than issuing a nuisance error message.
Feb 02 2005
next sibling parent Anders F Björklund <afb algonet.se> writes:
Walter wrote:

But I agree that D is a bit strange in that it's pretty easy to
 e.g. dereference null, but hard to e.g. allocate uninited memory ?


It isn't strange viewed from the perspective that dereferencing null always generates a seg fault, and so cannot be ignored, overlooked, etc. Allocating uninitialized memory can lead to erratic, random behavior which sometimes can *appear* to work successfully, hence the idea that this is a bad thing that must be stamped out.

Yeah, you can hardly escape NullPointerErrors even in virtual machines. Just meant that there are other languages that do more hand-holding?

D has this funny mix of low and high level, that takes some time getting used to. But I like it :-) At least most of it, save a few rants... ;-)
 I wish to point out that DMDScript in D is faster than DMDScript in C++,
 despite the D version doing the automatic initialization (and the other
 safety features in D). The magic dust at work here is the D profiler and the
 ease in manipulating the D source code to make it faster. (D code, I've
 discovered, is easier than C++ to manipulate source to try to make it run
 faster. I spent a lot less time tuning the D code, and got better results.)

The differences I'm seeing are mostly due to the fact that Apple has spent a lot of time tuning their compiler for C, Objective-C and C++ (and even the bastard child Objective-C++) but for D, I need to use the regular GCC which only has a few of those PowerPC tunings done... This gets even larger when using vector operations, on the PPC G4/G5.

For DMD platforms, such as Win32 or Linux X86, this is not an issue. (or maybe less of an issue, as I don't know how the SSE/MMX support is?)
 The same techniques for catching such errors at compile time can be used
 instead to eliminate initializations that are not needed, which is done by
 DMD. I much prefer the latter approach, as when an initialization is
 redundant but such redundancy is not detectable, it "fails safe" by leaving
 the initialization in rather than issuing a nuisance error message.

It's also simpler to code with variables that start with a known value, rather than getting warnings later (even if they could be done reliably)

I actually like that member fields are inited with known initializers, usually zeroes, and the locals will be optimized out anyway... Leaving large arrays and such, which is something that still can be addressed.

--anders
Feb 02 2005
prev sibling parent Dave <Dave_member pathlink.com> writes:
In article <ctraod$rm5$1 digitaldaemon.com>, Walter says...
"Anders F Björklund" <afb algonet.se> wrote in message
news:ctomh9$14jr$1 digitaldaemon.com...
 I see D as a nice replacement for C++ in the (very) long run,
 but I will continue to use either C or Java when they fit better...

You can write "C" code in D. Take a look at the Empire source code <g>.

FWIW, that's where I see D making its biggest inroads initially with the general programming community, especially now that Linux is surging and has a large number of fluent C programmers who are generally not forced into "vendor (or language or tool) tie-in". They'll use it a lot like C except maybe actually start to use OOP because D makes that very straightforward ;)

That's also why I personally tend to put a lot of stock in run-time performance; an easier/safer/gc'd "C" that generally performs as well at v1.0 (and potentially better in the future) sounds like a pretty darn good reason to risk a switch for me <g>.

I mean the reason a systems programmer 'drops to C' and doesn't do everything in Perl or Python or whatever is because of performance. That's usually the most compelling reason anyway. Less than equal performance (or even perceived performance via common benchmark results) will also be an equally compelling reason for many C programmers to not try D ;)

- Dave
Feb 02 2005
prev sibling parent reply Dawid Ciężarkiewicz <arael fov.pl> writes:
Anders F Björklund wrote:

 Dawid Ciężarkiewicz wrote:
 
 Your post just remind me that case. I'm going even further. Why to
 initialize anything at all? In 99.9% cases variables are initialized one
 or two lines after creation and default isn't used at all.

It's for the other 0.1% percent, where forgetting to initialize a variable causes a subtle bug ? In other languages, such as Objective-C for instance, these are separate events altogether:

I can agree that setting floats to NAN makes sense, but setting a forgotten int to an arbitrary value won't help the program much. It will just make it give the same errors rather than random ones. :)
 PreInitializing and GarbageCollecting are a whole lot easier to use.

I agree that garbage collection is *necessary* for a modern programming language. And I agree that there should be a way to disable it if needed. It's the same with variable initialization: sometimes (in critical parts of code) it would be good to be able to disable it.
 And for local variables, a reasonably good compiler should be able
 to optimize out the .init value, if it's just replaced right away...

This is part of the answer that I expected and I'm glad to hear it, but what about variables that are not expected to have any value at the start and are initialized later?

-- Dawid Ciężarkiewicz | arael jid: arael fov.pl
Feb 01 2005
parent Anders F Björklund <afb algonet.se> writes:
Dawid Ciężarkiewicz wrote:

 This is part of the answer that I expected and I'm glad to hear, but what
 about variables that are not expected to have any value for start and are
 initialized later?

Hard to say in the general case, I recommend playing with -O and disasm? (if you use GDC, you can compare -O0...-O3, and get asm output with -S)

--anders
Feb 01 2005
prev sibling parent reply "Walter" <newshound digitalmars.com> writes:
"Dawid Ciê¿arkiewicz" <arael fov.pl> wrote in message
news:ctnv58$c53$1 digitaldaemon.com...
 I was reading documentation of D and found paragraph about initialization
 of variables. Can somebody tell me what is the point in doing so and if it's
 done just by setting them after creation (and IMHO wasting cpu cycles) or
 is it cost free (I don't know how would be that possible, but that is why
 I'm asking).

The point of it is to eliminate a common, and difficult to find, source of bugs.
 Your post just remind me that case. I'm going even further. Why to
 initialize anything at all? In 99.9% cases variables are initialized one
 or two lines after creation and default isn't used at all.

In those cases, the optimizer will usually eliminate the initializer (since it is a "dead assignment").
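
To make the "dead assignment" point concrete, here is a small sketch with two hypothetical functions. In the first, the implicit zeroing is overwritten before any read, which is the kind of store an optimizer can drop; in the second, the default value can reach the return statement, so the initialization has to stay. Whether a given compiler actually removes the store in the first case is not something this example proves.

```d
int twice(int a)
{
    int x;      // implicit x = 0, overwritten before any read
    x = a * 2;  // so the zeroing above is a dead store
    return x;
}

int pick(bool flag, int a)
{
    int x;      // this zeroing is live: it is the result when flag is false
    if (flag)
        x = a;
    return x;
}
```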
Feb 02 2005
parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Walter wrote:

 Your post just remind me that case. I'm going even further. Why to
 initialize anything at all? In 99.9% cases variables are initialized one
 or two lines after creation and default isn't used at all.

In those cases, the optimizer will usually eliminate the initializer (since it is a "dead assignment").

'usually' is the point here. In certain cases, the compiler will not be able to determine that it actually is a dead assignment and leave it in. There should be a compiler pragma to tell the compiler about it in this case.
Feb 02 2005
parent reply "Walter" <newshound digitalmars.com> writes:
"Norbert Nemec" <Norbert Nemec-online.de> wrote in message
news:ctr980$pr9$1 digitaldaemon.com...
 Your post just remind me that case. I'm going even further. Why to
 initialize anything at all? In 99.9% cases variables are initialized one
 or two lines after creation and default isn't used at all.

In those cases, the optimizer will usually eliminate the initializer (since it is a "dead assignment").

'usually' is the point here. In certain cases, the compiler will not be
 able to determine that it actually is a dead assignment and leave it in. There
 should be a compiler pragma to tell the compiler about it in this case.

I honestly think that in a non-trivial program, you'd be very, very hard pressed to see a measurable difference in program performance from this.
Feb 02 2005
parent Manfred Nowak <svv1999 hotmail.com> writes:
"Walter" wrote: 

[...]
 I honestly think that in a non-trivial program, you'd be very,
 very hard pressed to see a measurable difference in program
 performance from this. 

The measure that matters in the case I brought into this discussion is the steadiness of the run, which lazy initialization provides. It is costly in total.

-manfred
Feb 02 2005
prev sibling next sibling parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
This issue should not need any measurements for justification. It is clear
that preinitializing causes some overhead.

Personally, I believe there needs to be some way to deactivate it. One
cannot expect the compiler to optimize away all unnecessary
initializations. Especially for arrays, where initialization really becomes
an issue, the initialization might not happen in one simple loop.

As I understand the philosophy of D, it does allow the user to shoot himself
in the foot if he really wants to. It is ok to default to a safe behaviour,
but experts should be able to deactivate the safety measures by some
explicit command. (A global compiler option is not a good idea! It has to be
specified right in the code in some way.) Any ideas for a possible syntax
specifying "This variable should not be initialized"? It should be possible
for variables as well as for dynamically allocated memory or class
members.



Manfred Nowak wrote:

 I wrote about this before.
 
 There is a well known time/space-tradeoff for the preInitialization of
 arrays: using about three times the space one can lazy initialize an
 array.
 
 But this technic is useless within D, because of the automaatic
 preInitialization, which currently eats up about 3 cycles per byte.
 
 Please awaken to, that on a 3GHz machine the busy preInitalization of
 one GB then lasts one second. And the coming 64-bit-machine will have
 up to some TB of main memory. Current mainboards can already hold up to
 4GB, which ist the current main memory limit for win32.
 
 Check again, that to preInitialize one TB you have to wait more than 15
 minutes only to wait at least another 15 minutes until your
 videoediting can start, if no precautions are taken.
 
 We need an option to switch automatic preInitialization of arrays off.
 
 -manfred

Feb 01 2005
parent reply pragma <pragma_member pathlink.com> writes:
In article <ctoitr$11pk$1 digitaldaemon.com>, Norbert Nemec says...
Any ideas for a possible syntax
specifying "This variable should not be initialized"? It should be possible
both for variables as well as for dynamically allocated memory or class
members.

This sounds like a job for a compiler pragma.
 pragma(noinit) int a; // a gets a random value now (just like C!).
 int[] b; 
 pragma(noinit){
     b = new int[256*1024*1024];  // b gets an uninitialized 1GB block.
 }

The nice thing about this is that pragmas in D are not ignored if not understood. So while compiler dependent, the code won't compile on D compilers that don't support 'noinit'.

- EricAnderton at yahoo
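
For comparison with the proposed pragma, present-day D offers two ways to opt out of default initialization in the code itself: the `= void` initializer for variables and std.array.uninitializedArray for GC-allocated arrays. The sketch below assumes a current compiler and standard library; the function names are hypothetical.

```d
import std.array : uninitializedArray;

/// Sums i*i for i in 0 .. n using an uninitialized stack buffer.
int sumOfSquares(int n)
{
    enum capacity = 64;
    int[capacity] scratch = void;  // skip the default zeroing
    assert(n >= 0 && n <= capacity);
    foreach (i; 0 .. n)
        scratch[i] = i * i;        // write each slot before it is read
    int total = 0;
    foreach (v; scratch[0 .. n])
        total += v;
    return total;
}

/// Returns a GC-allocated array whose elements are left uninitialized.
int[] makeBuffer(size_t n)
{
    return uninitializedArray!(int[])(n);
}
```

Both forms put the decision next to the declaration or allocation it affects, rather than in a global compiler switch.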
Feb 01 2005
parent Dawid Ciężarkiewicz <arael fov.pl> writes:
pragma wrote:

 In article <ctoitr$11pk$1 digitaldaemon.com>, Norbert Nemec says...
Any ideas for a possible syntax
specifying "This variable should not be initialized"? It should be
possible both for variables as well as for dynamically allocated memory or
class members.

This sounds like a job for a compiler pragma.
 pragma(noinit) int a; // a gets a random value now (just like C!).
 int[] b;
 pragma(noinit){
     b = new int[1024*1024*1024]  // b gets an uninitalized 1GB block.
 }

The nice thing about this is that pragmas in D are not be ignored if not understood. So while compiler dependent, the code won't compile on D compilers that don't support 'noinit'. - EricAnderton at yahoo

And you could always use "else" to initialize memory the old way. I like it.

-- Dawid Ciężarkiewicz | arael jid: arael fov.pl
Feb 02 2005
prev sibling next sibling parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"Manfred Nowak" <svv1999 hotmail.com> wrote in message 
news:ctmtne$2car$1 digitaldaemon.com...
I wrote about this before.

 There is a well known time/space-tradeoff for the preInitialization of
 arrays: using about three times the space one can lazy initialize an
 array.

 But this technique is useless within D, because of the automatic
 preInitialization, which currently eats up about 3 cycles per byte.

 Please be aware that on a 3GHz machine the busy preInitialization of
 one GB then lasts one second. And the coming 64-bit machines will have
 up to some TB of main memory. Current mainboards can already hold up to
 4GB, which is the current main memory limit for win32.

 Check again: to preInitialize one TB you have to wait more than 15
 minutes, only to wait at least another 15 minutes until your
 video editing can start, if no precautions are taken.

 We need an option to switch automatic preInitialization of arrays off.

 -manfred

The ironic part is that the GC itself has malloc but the D interface to it always clears the result. See src/phobos/internal/gc.d routine _d_newarrayi. It would be really nice to have the following added to the GC interface:
 void* malloc(size_t len) { return _gc.malloc(len); }
Ah, to have something so close and yet so far away...

And while I'm at it, how about exposing _gc.realloc, _gc.free and _gc.capacity, too? Oh, now I'm just dreaming, I know.

-Ben
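For comparison, plain C already offers both flavours: malloc returns memory as-is, calloc returns it cleared. Here is a minimal sketch of that split. The helper names alloc_ints_uninit and alloc_ints_zeroed are made up for illustration and are not part of Phobos or the D GC.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: room for n ints, NOT cleared. The contents are
   indeterminate until the caller writes them. Returns NULL if the size
   computation would overflow or the allocation fails. */
int *alloc_ints_uninit(size_t n)
{
    if (n > SIZE_MAX / sizeof(int))
        return NULL;
    return malloc(n * sizeof(int));
}

/* Hypothetical helper: room for n ints, cleared to zero. This is roughly
   the behaviour the thread attributes to D's `new int[n]`. */
int *alloc_ints_zeroed(size_t n)
{
    return calloc(n, sizeof(int));
}
```

The caller pays for the clearing only when it asks for it; whether that saving is measurable depends on the allocator and the operating system.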
Feb 01 2005
next sibling parent reply J C Calvarese <jcc7 cox.net> writes:
In article <ctokln$13jb$1 digitaldaemon.com>, Ben Hinkle says...
"Manfred Nowak" <svv1999 hotmail.com> wrote in message 
news:ctmtne$2car$1 digitaldaemon.com...
I wrote about this before.

 There is a well known time/space-tradeoff for the preInitialization of
 arrays: using about three times the space one can lazy initialize an
 array.

 But this technique is useless within D, because of the automatic
 preInitialization, which currently eats up about 3 cycles per byte.

 Please be aware that on a 3GHz machine the busy preInitialization of
 one GB then lasts one second. And the coming 64-bit machines will have
 up to some TB of main memory. Current mainboards can already hold up to
 4GB, which is the current main memory limit for win32.

 Check again: to preInitialize one TB you have to wait more than 15
 minutes, only to wait at least another 15 minutes until your
 video editing can start, if no precautions are taken.

 We need an option to switch automatic preInitialization of arrays off.

 -manfred

The ironic part is that the GC itself has malloc but the D interface to it always clears the result. See src/phobos/internal/gc.d routine _d_newarrayi. It would be really nice to have the following added to the GC interface:
 void* malloc(size_t len) { return _gc.malloc(len); }
Ah, to have something so close and yet so far away...

And while I'm at it, how about exposing _gc.realloc, _gc.free and _gc.capacity, too? Oh, now I'm just dreaming, I know.

-Ben

As long as we're doing wishful thinking, maybe we could just put a readable/writeable property in the GC module called "noinit". After _d_newarrayi malloc's the memory, it checks "noinit" to see if it should initialize or not.

The majority of programmers would never touch this setting. Those that have to clear 1 TB of memory have the option.

jcc7
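The idea can be sketched in C as an allocator wrapper that consults a global switch before clearing. This is only an illustration under assumptions: gc_noinit and new_array are hypothetical names, and the real _d_newarrayi is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical global switch: when true, new_array skips the clearing. */
bool gc_noinit = false;

/* Hypothetical stand-in for an array allocation routine. Allocates
   count * elem_size bytes and clears them unless gc_noinit is set.
   Returns NULL on overflow or allocation failure. */
void *new_array(size_t count, size_t elem_size)
{
    size_t bytes;
    void *p;

    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return NULL;
    bytes = count * elem_size;
    p = malloc(bytes != 0 ? bytes : 1);
    if (p != NULL && !gc_noinit)
        memset(p, 0, bytes);
    return p;
}
```

A global switch like this affects every allocation made while it is set, so code that relies on cleared memory would have to restore it before allocating.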
Feb 01 2005
parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
J C Calvarese wrote:

 The majority of programmers would never touch this
 setting. Those that have to clear 1 TB of memory have the option.

You don't really have to go for 1TB to see the use of that option. Write a routine that has a 1KB array locally on the stack and call that routine repeatedly...
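A minimal C sketch of that situation, with made-up function names: both routines use a 1KB stack buffer, but only the first clears it on every call. Both read back only the bytes they have just written, so the uncleared version never touches indeterminate data.

```c
#include <stddef.h>
#include <string.h>

enum { BUF_SIZE = 1024 };

/* Clears the whole 1KB buffer on every call, the way a
   default-initializing language would. */
unsigned checksum_cleared(const unsigned char *src, size_t n)
{
    unsigned char buf[BUF_SIZE];
    unsigned total = 0;
    size_t i;

    memset(buf, 0, sizeof buf);
    if (n > BUF_SIZE)
        n = BUF_SIZE;
    memcpy(buf, src, n);
    for (i = 0; i < n; i++)
        total += buf[i];
    return total;
}

/* Same work without the clearing step: only the first n bytes are
   written, and only those bytes are read back. */
unsigned checksum_uncleared(const unsigned char *src, size_t n)
{
    unsigned char buf[BUF_SIZE];
    unsigned total = 0;
    size_t i;

    if (n > BUF_SIZE)
        n = BUF_SIZE;
    memcpy(buf, src, n);
    for (i = 0; i < n; i++)
        total += buf[i];
    return total;
}
```

An optimizing C compiler may well remove the redundant memset by itself; whether a D compiler of the time did the same for default initialization is exactly the open question.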
Feb 01 2005
parent "Ivan Senji" <ivan.senji public.srce.hr> writes:
"Norbert Nemec" <Norbert Nemec-online.de> wrote in message
news:ctq0t2$2e32$1 digitaldaemon.com...
 J C Calvarese wrote:

 The majority of programmers would never touch this
 setting. Those that have to clear 1 TB of memory have the option.

 You don't really have to go for 1TB to see the use of that option. Write
 a routine that has a 1KB array locally on the stack and call that routine
 repeatedly...

I had a situation like this (but more than 1KB) and it didn't seem to work that slowly, but I am sure it would work faster if it wasn't initialized every time.

Why not try to persuade Walter of some syntax that would allow creating uninitialized arrays?
 noinit int[100000] array;
or any other form that could be used. This would of course be an option, used only when you need it and are sure that you will initialize the data later.

Feb 02 2005
prev sibling parent reply Georg Wrede <georg.wrede nospam.org> writes:
 "Manfred Nowak" <svv1999 hotmail.com> wrote in message 
But this technique is useless within D, because of the automatic
preInitialization, which currently eats up about 3 cycles per byte.


Admittedly, I haven't followed processor specs for a while, but initializing memory to zero should take about 1 cycle per 4 bytes, unless I'm totally confused. And on a machine with a 64-bit bus, 1 cycle per 8 bytes. (We're talking about long sequences of memory here, not small structs or single variables.)

Does someone know this better?
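To put numbers on the two estimates, here is the arithmetic as a small C function. It is pure clock arithmetic and assumes the memory system can keep up, which is a big assumption; init_seconds is a made-up name. With 3 cycles per byte at 3GHz, 10^12 bytes come out at 1000 seconds (roughly 17 minutes), while 1 cycle per 4 bytes would give about 83 seconds.

```c
/* Estimated time in seconds to initialize `bytes` bytes when the CPU
   needs `cycles_per_byte` cycles per byte at a clock of `clock_hz`.
   Memory bandwidth and cache effects are deliberately ignored. */
double init_seconds(double bytes, double cycles_per_byte, double clock_hz)
{
    return bytes * cycles_per_byte / clock_hz;
}
```

For example, init_seconds(1e12, 3.0, 3e9) models the original claim and init_seconds(1e12, 0.25, 3e9) models the 1-cycle-per-4-bytes guess.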
Feb 01 2005
parent reply Brian Chapman <nospam-for-brian see-post-for-address.net> writes:
On 2005-02-01 16:17:17 -0600, Georg Wrede <georg.wrede nospam.org> said:

 "Manfred Nowak" <svv1999 hotmail.com> wrote in message
 But this technique is useless within D, because of the automatic
 preInitialization, which currently eats up about 3 cycles per byte.


Admittedly, I haven't followed processor specs for a while, but initializing memory to zero should take about 1 cycle per 4 bytes, unless I'm totally confused. And on a machine with a 64-bit bus, 1 cycle per 8 bytes. (We're talking about long sequences of memory here, not small structs or single variables.)

No guys. Cycle counting is as old as MS-DOS floppy disks. Things are much better, and more complicated, than that nowadays.

First of all, memory is transferred from RAM to L1 cache in a cache-line fill: 32- or 64-byte lines depending on the architecture. Then to L2 (on-chip) cache. Various RAM types (e.g. DDR) and cache read/write strategies will greatly affect how fast this is. Wait states are the problem area. A processor can sit there doing nothing for many "cycles" waiting for the bus and memory controller to get their ass in gear.

Now, superscalar processors (Pentium onward) can read and write in parallel as long as the data does not depend on the previous results. Clearing memory would be such a case. So two or more write instructions could be paired and executed in one "cycle." And if SIMD instructions are being used, then we're talking even more throughput.

There's just no way of counting anymore. It's an old practice that doesn't relate to current hardware. For instance, on the PPC it's all about keeping data in the cache. You can have an otherwise high "cycle count" of code, but if it keeps cache misses to a minimum then it will outperform a piece of code optimized by "cycle counting." The question is how much throughput and bandwidth do you have? The processor and its instruction set are not the issue.

But all of this is irrelevant to me, because if you're wanting to do gigabyte bare-metal memory blits for video editing, it would be beyond me why you would be using D and expecting it to do what you want. Why not just use Java? That makes about as much sense. You need to know YOUR hardware and write bare-metal ASM to get what you need done if it's that vital. I don't use a can opener to peel a grapefruit just because it's brand new and shiny.
Feb 01 2005
next sibling parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
 But all of this is irrelevant to me, because if you're wanting to do 
 gigabyte bare-metal memory blits for video editing, it would be beyond me 
 why you would be using D and expecting it to do what you want. Why not 
 just use Java? That makes about as much sense. You need to know YOUR 
 hardware and write bare-metal ASM to get what you need done if it's that 
 vital.

D is aiming to support bare-metal programming, from what I gather from the Major Goals section of http://www.digitalmars.com/d/overview.html:
 "Provide low level bare metal access as required"
D and Java are light-years apart in terms of bare-metal access.

The existing way to get gobs of uninitialized memory is to call std.c.stdlib.malloc and manage the memory by hand. That's fine and dandy but we want more. What was that old Queen song? "I want it all and I want it now"? Sounds good to me. :-)
Feb 01 2005
parent Brian Chapman <nospam-for-brian see-post-for-address.net> writes:
On 2005-02-01 20:59:16 -0600, "Ben Hinkle" <ben.hinkle gmail.com> said:

 D is aiming to support bare-metal programming, from what I gather from 
 the Major Goals section of http://www.digitalmars.com/d/overview.html:
 "Provide low level bare metal access as required"
 D and Java are lightyears apart in terms of bare-metal access.
 
 The existing way to get gobs of uninitialized memory is to call 
 std.c.stdlib.malloc and manage the memory by hand. That's fine and 
 dandy but we want more. What was that old Queen song? "I want it all 
 and I want it now"? Sounds good to me. :-)

Sorry, Ben. I was out of line with the Java comment. A little moment of blunt humor got the better of me. ;-) D most certainly can't be compared with Java in that (and many other) respects. That wasn't the intent of my point.

I'm just saying anyone who thinks they should be able to do a "ubyte[1<<30] videoData;" and thinks it *should* be optimal or else "we need to fix the compiler" deserves the headache they're going to get. ;-)

But I'm afraid malloc isn't the answer either. More like direct memory mapping, i.e. mmap/mlock.
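A minimal sketch of what that could look like in C on a Linux or BSD style system. It assumes MAP_ANONYMOUS is available (it is not part of strict POSIX), and the wrapper names map_buffer, pin_buffer and unmap_buffer are made up. Anonymous mappings read as zero, and the kernel typically hands out the pages only when they are first touched, so nothing is cleared up front by the program.

```c
#define _DEFAULT_SOURCE /* asks glibc to expose MAP_ANONYMOUS */
#include <stddef.h>
#include <sys/mman.h>

/* Map len bytes of private anonymous memory. Returns NULL on failure. */
void *map_buffer(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Try to pin the buffer in RAM. Returns 0 on success, -1 on failure;
   this can fail when the process's locked-memory limit is too low. */
int pin_buffer(void *p, size_t len)
{
    return mlock(p, len);
}

/* Release the mapping. Returns 0 on success. */
int unmap_buffer(void *p, size_t len)
{
    return munmap(p, len);
}
```

Pinning with mlock forces the pages to be resident, so it gives up the lazy behaviour in exchange for predictable access times.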
Feb 03 2005
prev sibling parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Brian Chapman wrote:

 But all of this is irrelevant to me, because if you're wanting to do
 gigabyte bare-metal memory blits for video editing, it would be beyond
 me why you would be using D and expecting it to do what you want. Why
 not just use Java? That makes about as much sense. You need to know
 YOUR hardware and write bare-metal ASM to get what you need done if
 it's that vital.

Writing assembler to get performance is about as outdated as counting cycles. As I said before: if the code is simple enough to write it in assembler, it is also simple enough for a reasonable compiler to optimize it to the same extent.

To exploit the full power of a modern processor, you have to do the right amount of loop unrolling, loop fusing, command interlacing and so on. You have to play with the data layout in memory, perhaps chunking arrays into smaller pieces. There are several more techniques to use when you want to make full use of pipelining, branch prediction, cache lines and so on.

Languages like Fortran 95 would in principle allow the compiler to do all of this automatically. (Some good implementations begin to emerge.)

Doing all of it by hand in C results in complete spaghetti code, but it is possible if you know exactly what you are doing. (In a course we did, we eventually transformed one single loop into an equivalent of ~500 lines of highly optimized spaghetti. The result was ten times faster than the original and somewhere around 80% of the absolute theoretical limit of the processor.)

The result was still pure C and therefore completely portable. The performance was, of course, tuned to one specific architecture, but there were basically constants to adjust for tuning it for about any modern processor.

Doing the same thing in assembler would probably not be much faster. (After you went from 8% to 80%, the remaining factor of 1.25 probably isn't worth the effort. 80% peak performance is already well beyond what people usually go for.)

Furthermore, writing that kind of spaghetti code in C without introducing an error already needs a lot of discipline. Doing the same thing in assembler will probably land you in the nearest psychiatric ward...
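As a small illustration of just one of these techniques, here is a dot product written straightforwardly and then unrolled by four with independent accumulators. It is a sketch with made-up names, not the course code mentioned above. Note that the unrolled version adds the terms in a different order, so for general floating-point data the last bits of the result can differ.

```c
#include <stddef.h>

/* Straightforward version: one accumulator, one element per iteration. */
double dot_simple(const double *a, const double *b, size_t n)
{
    double s = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

/* Unrolled by four with independent accumulators, so consecutive
   additions do not have to wait on each other. A cleanup loop handles
   the elements left over when n is not a multiple of four. */
double dot_unrolled4(const double *a, const double *b, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; i++)
        s0 += a[i] * b[i];
    return (s0 + s1) + (s2 + s3);
}
```

The right unrolling factor depends on the processor, which is the kind of constant that has to be retuned per architecture.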
Feb 02 2005
next sibling parent reply Anders F Björklund <afb algonet.se> writes:
Norbert Nemec wrote:

 Writing assembler to get performance is about as outdated as counting
 cycles. As I said before: if the code is simple enough to write it in
 assembler, it is also simple enough for a reasonable compiler to optimize
 it to the same extent.

Of course, *reading* assembler is a good way to help write that good C code, and it is also a great help when debugging without the source code?

So I still think learning to read (and write too, as a complement) assembler is relevant, just as I think C is... Lots of people disagree*.
 Doing the same thing in assembler would probably not be much faster. (After
 you went from 8% to 80%, the remaining factor of 1.25 probably isn't worth
 the effort. 80% peak performance is already well beyond what people usually
 go for.)

You could be in for a surprise there, though. But I agree that writing assembly is a lot harder these days, in the post-RISC CPU era...

These days, assembler and C are more useful for generating *small* code? Major loop unrolling and load/store reordering are a pain to do in asm.

--anders

* = Those darn Quiche Eaters. C and ASM are for us Real Programmers. :-)
Feb 02 2005
parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Anders F Björklund wrote:

 Norbert Nemec wrote:
 
 Writing assembler to get performance is about as outdated as counting
 cycles. As I said before: if the code is simple enough to write it in
 assembler, it is also simple enough for a reasonable compiler to optimize
 it to the same extent.

Of course, *reading* assembler is a good way to help write that good C code and is also a great help when debugging without the source code?

Of course: if you want to exploit your compiler you have to know it, so reading assembler might be a good idea once in a while...
 Doing the same thing in assembler would probably not be much faster.
 (After you went from 8% to 80%, the remaining factor of 1.25 probably
 isn't worth the effort. 80% peak performance is already well beyond what
 people usually go for.)

You could be in for a surprise there, though.

Not really. In that specific example (which was typical for numerics) the algorithm was given. It was known that the calculation needed a certain number of floating point operations. Each processor has some physical limit of floating point operations per second that it could theoretically achieve under absolute optimum conditions. No code in the world will ever break this limit. If you reach 80% of it with plain C code, you know that using assembler cannot give you much gain. No surprise possible as long as you stick to the same algorithm.
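The bound behind that argument is simple enough to write down. As a sketch, with a made-up function name: if a program already runs at a given fraction of the processor's theoretical peak, the best any rewrite of the same algorithm can still gain is the reciprocal of that fraction. At 80% that is a factor of 1.25; at 8% it is a factor of 12.5.

```c
/* Upper bound on the remaining speedup for code that already reaches
   fraction_of_peak of the theoretical peak rate.
   fraction_of_peak must lie in (0, 1]. */
double max_speedup_left(double fraction_of_peak)
{
    return 1.0 / fraction_of_peak;
}
```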
 These days, assembler and C are more useful for generating *small* code?

Of course. Code-size was never a concern for me yet. I was only talking about performance. (Be aware though, that excessive code-bloat is bad for performance as well. The code-cache is limited as well, so excessive loop-unrolling will kill performance as well.)
Feb 02 2005
parent reply Anders F Björklund <afb algonet.se> writes:
Norbert Nemec wrote:

 Of course. Code-size was never a concern for me yet. I was only talking
 about performance. (Be aware though, that excessive code-bloat is bad for
 performance as well. The code-cache is limited as well, so excessive
 loop-unrolling will kill performance as well.)

I think the new GCC default on Mac OS X, -Os, is a fair trade-off? It's the same as -O2, without the excessive code-heavy optimizations...
http://gcc.gnu.org/onlinedocs/gcc-3.3.5/gcc/Optimize-Options.html

It's a good all-round compiler setting. For systems that tune the output to the present computer, like Gentoo Linux, other flags that target the CPU more specifically might be in order.

But it's very hard to "optimize for the general case", which is why Just-In-Time compilers and recompiling from source code are popular? The problem with assembler is that it just isn't portable enough today.

But I think the *performance* of D and DMD is more than good enough. Right now I'm more concerned about the bugs and ever getting to "1.0" (and porting GDC to the new GCC 4.0 would also be very interesting).

--anders
Feb 02 2005
parent reply Dave <Dave_member pathlink.com> writes:
In article <ctq7qb$2lmg$1 digitaldaemon.com>,
Anders F Björklund says...

But I think the *performance* of D and DMD is more than good enough.
Right now I'm more concerned about the bugs and ever getting to "1.0"
(and porting GDC to the new GCC 4.0, would also be very interesting)

I agree, with the exception of DMD floating point, which I hope will be given some attention before 1.0. It's important to me and I think it will actually turn out to be important to the overall acceptance of the language (and certainly DMD) come 1.0.

I'm not talking about new array semantics, vectorizing, expression templates or anything like what Norbert has been speaking of lately; just plain old
 for(...) { PI * 2.0 * radius[i]; ...; }
type of stuff.

BTW - How close are they getting with GCC 4? I have not been following that lately.

- Dave
--anders

Feb 02 2005
next sibling parent Anders F Björklund <afb algonet.se> writes:
Dave wrote:

 BTW - How close are they getting with GCC 4? I have not been following that
 lately.

See http://gcc.gnu.org/develop.html#stage3, they're at the final stage. Apple is going to use it as the main system compiler in next Mac OS X. --anders
Feb 02 2005
prev sibling parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Dave wrote:

 In article <ctq7qb$2lmg$1 digitaldaemon.com>,
 Anders F Björklund says...

But I think the *performance* of D and DMD is more than good enough.
Right now I'm more concerned about the bugs and ever getting to "1.0"
(and porting GDC to the new GCC 4.0, would also be very interesting)

I agree, with the exception of DMD floating point which I hope will be given some attention before 1.0.

Have you tested the current floating point performance of gdc, compared to gcc/g++? This would give a clue about whether it is a problem of the front end or the DM backend.

Are there any general comparisons of the code produced by the different compilers? (Not only "What *does* work?", like in the stress test, but also "How well does it work?")
Feb 02 2005
next sibling parent Thomas Kuehne <thomas-dloop kuehne.THISISSPAM.cn> writes:
Norbert Nemec wrote:
| Are there any general comparisons of the code produced by the
| different compilers? (Not only "What *does* work?", like in the
| stress test, but also "How well does it work?")

Most of the comparisons are on the benchmark level

http://www.prowiki.org/wiki4d/wiki.cgi?Benchmarks
http://gcc.gnu.org/benchmarks/
http://shootout.alioth.debian.org/

I'm not aware of any current public compiler dissections.

Thomas

Feb 02 2005
prev sibling next sibling parent Anders F Björklund <afb algonet.se> writes:
Norbert Nemec wrote:

 Are there any general comparisons of the code produced by the different
 compilers? (Not only "What *does* work?", like in the stress test, but also
 "How well does it work?")

On Mac OS X, most of it is like "hooray, it compiles" :-)

Benchmarks haven't been too bad, but currently GCC is quicker for most tasks (and then again, gcc code generated on PPC is not all that good).

When Mango compiles, and some of the more annoying D bugs like "void main()" are out, we can do some more testing. For now, DStress is a pretty good start...

--anders
Feb 02 2005
prev sibling parent Dave <Dave_member pathlink.com> writes:
In article <ctr5d5$lfd$1 digitaldaemon.com>, Norbert Nemec says...
Dave wrote:

 In article <ctq7qb$2lmg$1 digitaldaemon.com>,
 Anders F Björklund says...

But I think the *performance* of D and DMD is more than good enough.
Right now I'm more concerned about the bugs and ever getting to "1.0"
(and porting GDC to the new GCC 4.0, would also be very interesting)

I agree, with the exception of DMD floating point which I hope will be given some attention before 1.0.

Have you tested the current floating point performance of gdc, compared to gcc/g++? This would give a clue about whether it is a problem of the front end or the DM backend.

There has been some of that for floating point posted here (on the NG) a while back: oopack and scimark, ported by Thomas Kuehne. IIRC, generally what it showed was that GDC significantly outperformed DMD and also, in the case of scimark, that GDC actually performed a bit better than GCC and was very close to Intel, so the frontend doesn't appear to be the issue.

My own experience is that DMD is very good for int. A good example of this is that the gc generally seems to run faster for DMD than GDC.

For the FP that I'm familiar with (not cache dependent heavy-duty numerics), it looks to me like DMD just needs to make as good use of the FP registers as it does of the GP registers ;)

- Dave
Feb 02 2005
prev sibling next sibling parent reply zwang <nehzgnaw gmail.com> writes:
In general, rewriting C in assembly doesn't improve much,
since modern compilers are good at optimizing general-purpose code. 
Where hand-tuned assembly can often boost the performance is with
programs that may exploit MMX & SSE instructions.



Norbert Nemec wrote:
 Writing assembler to get performance is about as outdated as counting
 cycles. As I said before: if the code is simple enough to write it in
 assembler, it is also simple enough for a reasonable compiler to optimize
 it to the same extent.
 
 To exploit the full power of a modern processor, you have to do the right
 amount of loop unrolling, loop fusing, command interlacing and so on. You
 have to play with the data layout in memory, perhaps chunking arrays into
 smaller pieces. There are several more techniques to use, when you want to
 make full use of pipelining, branch prediction, cache lines and so on.
 
 Languages like Fortran 95 would in principle allow the compiler to do all of
 this automatically (Some good implementations begin to emerge.)
 
 Doing all of it by hand in C results in complete spaghetti code, but it is
 possible if you know exactly what you are doing. (In a course we did, we
 eventually transformed one single loop into an equivalent of ~500 lines of
 highly optimized spaghetti. The result was ten times faster than the
 original and somewhere around 80% of the absolute theoretical limit of the
 processor.
 
 The result was still pure C and therefore completely portable. The
 performance was, of course, tuned to one specific architecture, but there
 were basically constants to adjust for tuning it for about any modern
 processor.
 
 Doing the same thing in assembler would probably not be much faster. (After
 you went from 8% to 80%, the remaining factor of 1.25 probably isn't worth
 the effort. 80% peak performance is already well beyond what people usually
 go for.)
 
 Furthermore, writing that kind of spaghetti code in C without getting an
 error in already needs a lot of discipline. Doing the same thing in
 assembler will probably land you in the next psychiatry...

Feb 02 2005
parent Anders F Björklund <afb algonet.se> writes:
zwang wrote:

 In general, rewriting C in assembly doesn't improve much,
 since modern compilers are good at optimizing general-purpose code. 
 Where hand-tuned assembly can often boost the performance is with
 programs that may exploit MMX & SSE instructions.

Then again, modern compilers can use those instructions too... Here's to GDC being ported speedily to GCC 4.0, which does that.

--anders
Feb 02 2005
prev sibling next sibling parent reply Dave <Dave_member pathlink.com> writes:
In article <ctq2di$2gbd$1 digitaldaemon.com>, Norbert Nemec says...

The result was still pure C and therefore completely portable. The
performance was, of course, tuned to one specific architecture, but there
were basically constants to adjust for tuning it for about any modern
processor.

Doing the same thing in assembler would probably not be much faster. (After
you went from 8% to 80%, the remaining factor of 1.25 probably isn't worth
the effort. 80% peak performance is already well beyond what people usually
go for.)

If you still have that code handy, would it be possible to run it through DMD and GDC and post the results vs., say, GCC (and Intel C/C++ if it's available)? Just out of curiosity...

Thanks,

- Dave
Feb 02 2005
parent Norbert Nemec <Norbert Nemec-online.de> writes:
Dave wrote:

 In article <ctq2di$2gbd$1 digitaldaemon.com>, Norbert Nemec says...

The result was still pure C and therefore completely portable. The
performance was, of course, tuned to one specific architecture, but there
were basically constants to adjust for tuning it for about any modern
processor.

Doing the same thing in assembler would probably not be much faster.
(After you went from 8% to 80%, the remaining factor of 1.25 probably
isn't worth the effort. 80% peak performance is already well beyond what
people usually go for.)

If you still have that code handy, would it be possible to run it through DMD and GDC and post the results vs., say, GCC (and Intel C/++ if it's available)?

Already found out to my disappointment that I don't have it on my local harddisk any more. Have to dig up some old backups...

In any case, I would not expect very conclusive results. The code was plain ANSI C and did not depend on any compiler optimizations. Furthermore, it was tuned to a specific Alpha processor (which had a comparatively simple cache structure). The techniques were rather general, but the specifics were tuned exactly to that one machine, which I don't have any access to any more.

Anyhow: I'll try to dig up the code and see in which state it is.
Feb 02 2005
prev sibling parent reply Brian Chapman <nospam-for-brian see-post-for-address.net> writes:
On 2005-02-02 02:18:42 -0600, Norbert Nemec <Norbert Nemec-online.de> said:

 Writing assembler to get performance is about as outdated as counting
 cycles.

I disagree with this statement so very much I wouldn't even know where to begin. Granted, it all depends on the situation, and I'm not talking about writing generalized code.

It's pointless to continue this because the argument is as old as a PDP-11 rotting away in an MIT basement. We could debate this till we're blue in the face; I won't convince you to break out an assembler and you won't convince me that a compiler can (or should) do it better. We're just going to have to agree to disagree, even though I don't even know what the point of this thread is supposed to be anymore.
 As I said before: if the code is simple enough to write it in
 assembler, it is also simple enough for a reasonable compiler to optimize
 it to the same extent.
 
 To exploit the full power of a modern processor, you have to do the right
 amount of loop unrolling, loop fusing, command interlacing and so on. You
 have to play with the data layout in memory, perhaps chunking arrays into
 smaller pieces. There are several more techniques to use, when you want to
 make full use of pipelining, branch prediction, cache lines and so on.

Maybe. Or if you had a copy of the CPU's programmer's manual you could just inline a nice slick column of opcodes and do what you want exactly, instead of crossing your fingers when you type make or spending all day with a profiler trying various C idioms to various results. I'd rather take the compiler's assembly output, grumble once, rewrite it properly and inline it back in.
 Languages like Fortran 95 would in principle allow the compiler to do all of
 this automatically (Some good implementations begin to emerge.)

Well you're most certainly never going to convince me to code in Fortran.
 Doing all of it by hand in C results in complete spaghetti code, but it is
 possible if you know exactly what you are doing. (In a course we did, we
 eventually transformed one single loop into an equivalent of ~500 lines of
 highly optimized spaghetti. The result was ten times faster than the
 original and somewhere around 80% of the absolute theoretical limit of the
 processor.

500 line Duff devices don't impress me. They make me want to do everybody a big favor and promptly delete the last copy of the offending source file on the spot.
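For anyone who hasn't met the term: a Duff's device is a hand-unrolled loop in which a switch statement jumps into the middle of the loop body to deal with the leftover iterations. Below is a minimal sketch of the classic copy form. The name duff_copy is made up, and this is only an illustration of the idiom, not the course code being discussed.

```c
#include <stddef.h>

/* Copy count bytes from `from` to `to`, eight per loop pass. The switch
   enters the unrolled body part-way through so that the first pass
   handles the count % 8 leftover bytes. Every case falls through to
   the next one on purpose. */
void duff_copy(unsigned char *to, const unsigned char *from, size_t count)
{
    size_t passes;

    if (count == 0)
        return;
    passes = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--passes > 0);
    }
}
```

On current compilers a plain loop or memcpy is usually at least as fast, which is part of why the idiom has a mixed reputation.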
 The result was still pure C and therefore completely portable. The
 performance was, of course, tuned to one specific architecture, but there
 were basically constants to adjust for tuning it for about any modern
 processor.
 
 Doing the same thing in assembler would probably not be much faster. (After
 you went from 8% to 80%, the remaining factor of 1.25 probably isn't worth
 the effort. 80% peak performance is already well beyond what people usually
 go for.)
 
 Furthermore, writing that kind of spaghetti code in C without getting an
 error in already needs a lot of discipline. Doing the same thing in
 assembler will probably land you in the next psychiatry...

Heh, you obviously don't know the machine. If that's the kind of stuff you want to write, you go for it, man. I'd rather just put some inline SIMD code in an asm block.

I don't care how good you think your Fortran compiler or disciplined spaghetti code is, it's never gonna know how to fill up all vector pipelines to normalize 16 vectors for the price of 4, or fire off a DMA chain to blit gigabytes of data at max utilization.

But it's a free world. You can loop unroll, fuse, and chunk arrays if you want.
Feb 03 2005
next sibling parent reply Georg Wrede <georg.wrede nospam.org> writes:
Brian Chapman wrote:

 ...it's never gonna know how to
 fill up all vector pipelines to normalize 16 vectors for the price of 4 
 or fire off a DMA chain to blit gigabytes of data at max utilization.

Why?
Feb 03 2005
parent reply Brian Chapman <nospam-for-brian see-post-for-address.net> writes:
On 2005-02-03 09:59:01 -0600, Georg Wrede <georg.wrede nospam.org> said:

 Brian Chapman wrote:
 
 ...it's never gonna know how to
 fill up all vector pipelines to normalize 16 vectors for the price of 4 
 or fire off a DMA chain to blit gigabytes of data at max utilization.

Why?

Excellent question. All enlightenment begins with asking a good "why?" If you really want to understand, I would invite you to start by reading some of the great information available at arstechnica.com.
Feb 03 2005
parent Georg Wrede <georg.wrede nospam.org> writes:
Brian Chapman wrote:
 On 2005-02-03 09:59:01 -0600, Georg Wrede <georg.wrede nospam.org> said:
 
 Brian Chapman wrote:

 ...it's never gonna know how to
 fill up all vector pipelines to normalize 16 vectors for the price of 
 4 or fire off a DMA chain to blit gigabytes of data at max utilization.

Why?

Excellent question. All enlightenment begins with asking a good "why?"

Thank you!
 If you really want to understand, I would invite you to start by reading 
 some of the great information available at arstechnica.com.

Well, I at least hope the answer would be of general interest in this forum. Also, you seem to have a good idea of "why", based on the above quote. So, essentially a short(ish) answer would be appreciated.
Feb 04 2005
prev sibling parent Norbert Nemec <Norbert Nemec-online.de> writes:
Brian Chapman wrote:

 On 2005-02-02 02:18:42 -0600, Norbert Nemec <Norbert Nemec-online.de>
 said:
 
 Writing assembler to get performance is about as outdated as counting
 cycles.

I disagree with this statement so very much I wouldn't even know where to begin. Granted, it all depends on the situation, and I'm not talking about writing generalized code.

It's pointless to continue this because the argument is as old as a PDP-11 rotting away in an MIT basement. We could debate this till we're blue in the face; I won't convince you to break out an assembler and you won't convince me that a compiler can (or should) do it better. We're just going to have to agree to disagree, even though I don't even know what the point of this thread is supposed to be anymore.

I agree - I've discussed this with several people and hardly ever came to a conclusion. High-performance numerics experts would certainly agree on it. Old-school assemblists would never...
 To exploit the full power of a modern processor, you have to do the right
 amount of loop unrolling, loop fusing, command interlacing and so on. You
 have to play with the data layout in memory, perhaps chunking arrays into
 smaller pieces. There are several more techniques to use, when you want
 to make full use of pipelining, branch prediction, cache lines and so on.

Maybe. Or, if you had a copy of the CPU's programmer's manual, you could just inline a nice slick column of opcodes and do exactly what you want, instead of crossing your fingers when you type make or spending all day with a profiler trying various C idioms to various results. I'd rather take the compiler's assembly output, grumble once, rewrite it properly and inline it back in.

Well - do so, if you like to, just to realize that once you've spent hours optimizing your code
 Languages like Fortran 95 would in principle allow the compiler to do all
 of this automatically (Some good implementations begin to emerge.)

Well you're most certainly never going to convince me to code in Fortran.

Me neither, that's why I would like to see the same features in D - so far, Fortran 95 is the only widespread language with that kind of performance.
 Doing all of it by hand in C results in complete spaghetti code, but it
 is possible if you know exactly what you are doing. (In a course we did,
 we eventually transformed one single loop into an equivalent of ~500
 lines of highly optimized spaghetti. The result was ten times faster than
 the original and somewhere around 80% of the absolute theoretical limit
 of the processor.)

500 line Duff devices don't impress me. They make me want to do everybody a big favor and promptly delete the last copy of the offending source file on the spot.

The algorithm was simple but nontrivial: solving partial differential equations. The original code was not stupidly coded, but just straightforward, as anyone would write it at the first shot unless they think of tricky issues of modern processor architecture. Back in the cycle counting times, the latter version would have been even slower, since it did many integer operations that the original did not need.
 Heh, you obviously don't know machine. If that's the kind of stuff you
 want to write, you go for it man. I'd rather just put some inline SIMD
 code in an asm block. I don't care how good you think your Fortran
 compiler or disciplined spaghetti code is, it's never gonna know how to
 fill up all vector pipelines to normalize 16 vectors for the price of 4
 or fire off a DMA chain to blit gigabytes of data at max utilization.

Why shouldn't it? As long as the compiler has the chance to reorder the instructions within certain constraints and has enough intelligence built-in to search for the optimum order it may do a pretty good job at crunching the numbers and find something quite efficient. The behaviour of the pipeline follows very strict rules that are different for each architecture. You put all the rules into a file and the compiler will optimize for a given architecture. Of course, this can only be done if the language gives the necessary flexibility. This is exactly the point why I believe that vectorized expressions in D are essential for high-performance computing.
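
As a rough illustration of what a vectorized expression looks like: current D compilers accept whole-array (slice) operations, which did not exist in this form when the thread was written. The sketch below uses an assumed helper name, `axpy`, and states the loop as one expression, leaving the element order to the compiler. Whether a given compiler actually turns it into SIMD code is an implementation matter and is not claimed here.

```d
/// Hypothetical helper: computes y = a*x + y element-wise.
/// The slice expression has no explicit loop, so the compiler is free
/// to pick the evaluation order (and, possibly, to vectorize it).
void axpy(double a, const(double)[] x, double[] y)
{
    assert(x.length == y.length, "operands must have equal length");
    y[] += x[] * a;
}

unittest
{
    double[] y = [1.0, 2.0, 3.0];
    axpy(2.0, [1.0, 1.0, 1.0], y);
    assert(y == [3.0, 4.0, 5.0]);
}
```

The `unittest` block doubles as a usage example; it can be run with `dmd -unittest -main -run file.d`.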
 But it's a free world. You can loop unroll, fuse, and chunk arrays if you
 want.

I don't care about doing that myself. I would like to teach it to a compiler.
Feb 04 2005
prev sibling next sibling parent Kevin Bealer <Kevin_member pathlink.com> writes:
One potential solution:  Formalize the existing (or semi-well known) method for
"reserving space":  ie. X.reserve(N) => X.length = N; X.length = old_length;

By creating a real method called "reserve" the array could have its cake and eat
it too:  reserve would be required to allocate the memory, but NOT to clear it;
that would STILL happen when the length was bumped up, but now it could be done
lazy-style.

Additional benefit: objects which override [] could also do reserve(), and could
have special behaviour which is smarter than adjusting length() twice.  In most
cases, they would just pass the savings down by using reserve() instead of
length() on *their* underlying data structure.
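
For comparison, current D has a built-in `reserve` for dynamic arrays, together with a `capacity` property. Whether it behaves exactly as proposed above is an assumption; the sketch only checks the part that can be observed directly: after `reserve(n)` the length is unchanged, and appending up to `n` elements does not move the array. The function name `appendWithoutMoving` is made up for this example.

```d
/// Hypothetical check: reserve room for n ints, append n values,
/// and report whether the array stayed at the same address throughout.
bool appendWithoutMoving(size_t n)
{
    int[] arr;
    arr.reserve(n);              // allocate room, length stays 0
    assert(arr.length == 0);
    assert(arr.capacity >= n);

    const start = arr.ptr;       // address of the reserved block
    foreach (i; 0 .. n)
        arr ~= cast(int) i;      // fits into the reserved space

    return arr.ptr is start && arr.length == n;
}

unittest
{
    assert(appendWithoutMoving(1000));
}
```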

I'm including a simple memory bandwidth meter quickie.

Kevin

:
:private import std.date;
:private import std.stdio;
:private import std.conv;
:
:int main(char[][] args)
:{
:    long N = 100;
:    long MB = 1024*1024;
:    long Z1 = 64*MB;
:    long Z = Z1;
:    char[] p;
:    
:    if (args.length > 1) {
:        N = toInt(args[1]);
:    }
:    
:    if (args.length > 2) {
:        Z = toInt(args[2]) * MB;
:    }
:    
:    if (! N) {
:        N = 1;
:    }
:    
:    if (Z < 1024) {
:        Z = 256*MB;
:    }
:    
:    writef("Looping %s times.\n", N);
:    writef("Writing %s bytes/loop.\n", Z);
:    
:    d_time t1 = getUTCtime();
:    
:    for(int i = 0; i<N; i++) {
:        p.length = Z;
:        if (p[p.length / 3] == 'c') {
:            writef("Have C\n");
:        }
:        
:        if (p[p.length - 1] == 'q') {
:            writef("Have Q\n");
:        }
:        
:        p[p.length / 3] = 'c';
:        p[p.length - 1] = 'q';
:        
:        p.length = 1234;
:    }
:    
:    d_time t2 = getUTCtime();
:    
:    double sec = ((t2-t1) + 0.0)/TicksPerSecond;
:    
:    writef("Time elapsed = %s [res=%s/s].\n",
:           sec, TicksPerSecond);
:    
:    writef("Mem b/w = %s MB / sec.\n", ((Z/MB)*N)/sec);
:    
:    return 0;
:}
:
Feb 01 2005
prev sibling parent reply "Walter" <newshound digitalmars.com> writes:
"Manfred Nowak" <svv1999 hotmail.com> wrote in message
news:ctmtne$2car$1 digitaldaemon.com...
 I wrote about this before.

 There is a well known time/space-tradeoff for the preInitialization of
 arrays: using about three times the space one can lazy initialize an
 array.

 But this technique is useless within D, because of the automatic
 preInitialization, which currently eats up about 3 cycles per byte.

 Please awaken to, that on a 3GHz machine the busy preInitialization of
 one GB then lasts one second. And the coming 64-bit-machine will have
 up to some TB of main memory. Current mainboards can already hold up to
 4GB, which is the current main memory limit for win32.

 Check again, that to preInitialize one TB you have to wait more than 15
 minutes only to wait at least another 15 minutes until your
 videoediting can start, if no precautions are taken.

 We need an option to switch automatic preInitialization of arrays off.
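
For readers who have not met it, the time/space trade-off mentioned in the quote is presumably the classic textbook trick of keeping two index arrays next to the data, which is where "about three times the space" comes from. The sketch below is an illustration of that trick, not code from the thread: the type `LazyArray` and its members are invented names, and it uses `core.stdc.stdlib.malloc` from current D to obtain memory that is deliberately left uncleared.

```d
import core.exception : onOutOfMemoryError;
import core.stdc.stdlib : free, malloc;

/// Hypothetical illustration: an int array whose creation costs O(1),
/// because none of its three blocks is ever cleared.
struct LazyArray
{
    private int* data;     // values; garbage until a slot is written
    private size_t* from;  // from[i]: position in `to` that vouches for slot i
    private size_t* to;    // to[k]: index of the k-th slot that was written
    private size_t top;    // number of slots written so far
    private size_t len;
    private int fill;      // value reported for slots never written

    @disable this(this);   // owns malloc'd memory, so forbid copies

    this(size_t n, int fillValue)
    {
        assert(n > 0);
        data = cast(int*) malloc(n * int.sizeof);
        from = cast(size_t*) malloc(n * size_t.sizeof);
        to = cast(size_t*) malloc(n * size_t.sizeof);
        if (data is null || from is null || to is null)
            onOutOfMemoryError();
        len = n;
        fill = fillValue;
    }

    ~this()
    {
        free(data);
        free(from);
        free(to);
    }

    // A slot counts as written only if from[i] points into the used part
    // of `to` and that entry points back at i. Garbage cannot pass both.
    private bool isSet(size_t i)
    {
        immutable k = from[i];
        return k < top && to[k] == i;
    }

    int opIndex(size_t i)
    {
        assert(i < len);
        return isSet(i) ? data[i] : fill;
    }

    void opIndexAssign(int value, size_t i)
    {
        assert(i < len);
        if (!isSet(i))
        {
            from[i] = top;   // register the slot on first write
            to[top] = i;
            ++top;
        }
        data[i] = value;
    }
}

unittest
{
    auto a = LazyArray(1_000_000, -1); // nothing is cleared here
    assert(a[123] == -1);
    a[123] = 42;
    assert(a[123] == 42);
    assert(a[124] == -1);
}
```

The trick reads uninitialized memory on purpose, so memory checkers are likely to flag it, and it only pays off when most of the array is never touched.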

There are several ways to create an array. If the array is statically initialized, it is initialized when it is demand paged in. There is no code generated to initialize it (in fact, there is no way to prevent this from happening!)

Next, one can allocate arrays on the stack. These are normally initialized at runtime, but this can be turned off using the idiom outlined in www.digitalmars.com/d/memory.html#uninitializedarrays.

And lastly, one can dynamically allocate arrays using new, in which case they are initialized, or using std.c.stdlib.malloc, in which case they are not, or any other allocator one wishes to use.

P.S. there's no way to allocate a TB on the stack anyway <g>

P.P.S. it's been suggested that the special initializer syntax:
    = void;
mean "I know what I'm doing, don't initialize the variable" and I've been considering implementing it.
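
A minimal sketch of two of the routes described above, written against current D rather than the compiler of the time: the `= void` initializer is accepted by today's compilers, and `malloc` now lives in `core.stdc.stdlib` instead of `std.c.stdlib`. The function names are invented for the example, and every element is written before it is read.

```d
import core.stdc.stdlib : free, malloc;

/// Stack array that is not default-initialized.
int sumViaVoidBuffer()
{
    int[256] buf = void;           // no initialization code is generated
    foreach (i, ref x; buf)
        x = cast(int) i;           // write every element before reading it

    int total = 0;
    foreach (x; buf)
        total += x;
    return total;
}

/// Heap array from malloc, which does not clear the memory either.
/// Returns -1 if the allocation fails.
int sumViaMalloc(size_t n)
{
    auto p = cast(int*) malloc(n * int.sizeof);
    if (p is null)
        return -1;
    scope (exit) free(p);          // release on every exit path

    int[] a = p[0 .. n];           // view the block as a D slice
    foreach (i, ref x; a)
        x = cast(int) i;

    int total = 0;
    foreach (x; a)
        total += x;
    return total;
}

unittest
{
    assert(sumViaVoidBuffer() == 32_640); // 0 + 1 + ... + 255
    assert(sumViaMalloc(100) == 4950);    // 0 + 1 + ... + 99
}
```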
Feb 02 2005
next sibling parent reply Vathix <vathix dprogramming.com> writes:
 P.P.S. it's been suggested that the special initializer syntax:
     = void;
 mean "I know what I'm doing, don't initialize the variable" and I've been
 considering implementing it.

I like it, but will it work with 'new'? When newing arrays and value types one might also not want to initialize.
Feb 02 2005
parent "Walter" <newshound digitalmars.com> writes:
"Vathix" <vathix dprogramming.com> wrote in message
news:opslk5o5vckcck4r esi...
 P.P.S. it's been suggested that the special initializer syntax:
     = void;
 mean "I know what I'm doing, don't initialize the variable" and I've been
 considering implementing it.

I like it, but will it work with 'new'?

No.
 When newing arrays and value types
 one might also not want to initialize.

True, but I don't think that's a good idea. The cases where initialization of an array *might* make a difference (the critical path in a program tends to be only in a small part of it) are so unusual it is not worth upsetting new. And frankly, uninitialized garbage in gc allocated data can cause problems with the mark/sweep algorithm, and would pull the rug out from doing a future type-aware gc. Use std.c.malloc for allocating uninitialized arrays; if it must be new'd, instead use a wrapper class that malloc's/free's an internal private array.
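
One way such a wrapper might look, as a sketch in current D (so `core.stdc.stdlib` rather than `std.c`); the class name `RawBuffer` and its members are invented for the example. The garbage collector only ever sees the small class object, while the uninitialized bytes live in malloc'd memory that it does not scan.

```d
import core.exception : onOutOfMemoryError;
import core.stdc.stdlib : free, malloc;

/// Hypothetical wrapper: can be new'd like any class, but the payload
/// comes from malloc and is therefore neither initialized nor GC-scanned.
final class RawBuffer
{
    private ubyte* data;
    private size_t len;

    this(size_t n)
    {
        assert(n > 0);
        data = cast(ubyte*) malloc(n);
        if (data is null)
            onOutOfMemoryError();
        len = n;
    }

    ~this()
    {
        free(data);        // free(null) is harmless
        data = null;
    }

    /// The payload as a slice; valid only while the object is alive.
    ubyte[] bytes()
    {
        return data[0 .. len];
    }
}

unittest
{
    auto buf = new RawBuffer(16);
    buf.bytes[3] = 7;
    assert(buf.bytes.length == 16);
    assert(buf.bytes[3] == 7);
    destroy(buf);          // run the destructor now instead of at collection
}
```

Because the collector decides when a class destructor runs, calling `destroy` explicitly is the safer choice when the memory should be returned promptly.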
Feb 02 2005
prev sibling parent Andy Friesen <andy ikagames.com> writes:
Walter wrote:
 There are several ways to create an array. If the array is statically
 initialized, it is initialized when it is demand paged in. There is no code
 generated to initialize it (in fact, there is no way to prevent this from
 happening!)
 
 Next, one can allocate arrays on the stack. These are normally initialized
 at runtime, but this can be turned off using the idiom outlined in
 www.digitalmars.com/d/memory.html#uninitializedarrays.
 
 And lastly, one can dynamically allocate arrays using new, in which case
 they are initialized, or using std.c.stdlib.malloc, in which case they are
 not, or any other allocator one wishes to use.
 
 P.S. there's no way to allocate a TB on the stack anyway <g>
 
 P.P.S. it's been suggested that the special initializer syntax:
     = void;
 mean "I know what I'm doing, don't initialize the variable" and I've been
 considering implementing it.

All this makes me think that the only thing that really needs to be done is for this to be added to the FAQ. D's current behaviour is more or less ideal as it stands: uninitialized memory can be acquired without fuss, but it won't ever be done by accident.

 -- andy
Feb 02 2005