
digitalmars.D - BLADE 0.2Alpha: Vector operations with mixins, expression templates,

reply Don Clugston <dac nospam.com.au> writes:
I have been trying to come up with a convincing use case for the new 
mixins (and for metaprogramming in general). My best effort to date can 
be found at:
http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d

It generates near-optimal x87 asm code for BLAS1-style basic vector 
operations. 32, 64 and 80 bit vectors are all supported.

Compile with -version=BladeDebug to see the asm code which is generated.

Typical usage:

void main()
{
     auto p = Vec([1.0L, 2, 18]);    // a vector of 80-bit reals.
     auto q = Vec([3.5L, 1.1, 3.8]);  // ditto
     auto r = Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
     auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
     real d = dot(r, p+r+r);
     ireal e = dot(r, z);
     q -= ((r+p)*18.0L*314.1L - (p-r))* 35;
     d = dot(r, p+r+r);
}

Notice that mixed-length operations (real[] + float[] - double[]) are 
supported.

Like the C++ Blitz++ library, expression templates are used to convert 
vector expressions into efficient element-wise operations. Unlike that 
library, however, there is no reliance on the compiler's optimiser. 
Instead, the expression template is manipulated as text, converted into 
postfix, and then passed to a simple CTFE compile-time assembler, which 
creates highly efficient asm code which is used as a mixin.
To understand the later parts of the code, you need some knowledge of 
x87 assembler. In fact, you probably need to have read Agner Fog's 
superb Pentium optimisation manual (www.agner.org).
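
Roughly, the middle step boils down to something like this (just a sketch of the idea with invented names -- not the actual Blade.d code, and parentheses are omitted for brevity):

char[] toPostfix(char[] expr)
{
    // Shunting-yard over single-character operands: "a+b*c+d" -> "abc*+d+".
    // Runs at compile time via CTFE, so the result can drive code generation.
    char[] output, stack;
    for (size_t i = 0; i < expr.length; ++i)
    {
        char c = expr[i];
        if (c == '*' || c == '/')
        {
            // pop any waiting operators of equal precedence
            while (stack.length && (stack[$-1] == '*' || stack[$-1] == '/'))
            {
                output ~= stack[$-1];
                stack = stack[0 .. $-1];
            }
            stack ~= c;
        }
        else if (c == '+' || c == '-')
        {
            // lowest precedence: flush everything pending
            while (stack.length)
            {
                output ~= stack[$-1];
                stack = stack[0 .. $-1];
            }
            stack ~= c;
        }
        else
            output ~= c;    // operand: emit directly
    }
    while (stack.length)
    {
        output ~= stack[$-1];
        stack = stack[0 .. $-1];
    }
    return output;
}

static assert(toPostfix("a+b*c+d") == "abc*+d+");

The real code then walks the postfix string and emits roughly one x87 instruction per token; that asm string is what finally gets mixed in.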

Some observations:
* I was amazed at how simple the expression template code is (it is 
somewhat cluttered by the code to check for real/imaginary type mismatch 
errors).
* I've often read that the x87 floating-point stack is notoriously 
difficult for compilers to write code for, but it works quite well in 
this case.
* The major workarounds are:

- inability to use a tuple element directly from asm code.
- inability to define operators for built-in arrays (hence the use of 
'Vec' wrappers; a rough sketch follows this list).
- inability to index through a tuple in a CTFE function (solved by 
converting types into a string).
* There have been mutterings about how unhygienic/dangerous the new 
mixins are. In this case, the mixin forms the _entire_ body of the 
function. This is an interesting situation which I think a language 
purist will find more palatable.
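
The 'Vec' workaround is conceptually something like this (a hand-waved sketch with invented names and the old opAdd operator name -- not the actual Blade.d declarations):

struct Expr(char[] text, Operands...)
{
    // The expression string travels in the template parameter; the operand
    // values travel in this tuple field. Nothing is evaluated here.
    Operands operands;
}

struct Vec
{
    real[] data;

    // '+' returns a description of the computation instead of a result.
    // A real implementation combines the operands' strings recursively;
    // evaluation only happens when the finished expression is mixed in.
    Expr!("a+b", real[], real[]) opAdd(Vec rhs)
    {
        Expr!("a+b", real[], real[]) e;
        e.operands[0] = data;
        e.operands[1] = rhs.data;
        return e;
    }
}

Something like dot() or the -= in the example above is then the point where the accumulated string gets converted to postfix and the generated asm is mixed in.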

Enjoy.
Apr 03 2007
next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Don Clugston wrote:
 I have been trying to come up with a convincing use case for the new 
 mixins (and for metaprogramming in general). My best effort to date can 
 be found at:
 http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d
This is pretty incredible. Can I beg you to write an article about it?
Apr 03 2007
parent reply Don Clugston <dac nospam.com.au> writes:
Walter Bright wrote:
 Don Clugston wrote:
 I have been trying to come up with a convincing use case for the new 
 mixins (and for metaprogramming in general). My best effort to date 
 can be found at:
 http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d
This is pretty incredible.
I thought you'd like it. The sheer power of D these days... "Flies like a rocket, steers like a bicycle."
 Can I beg you to write an article about it?
OK. Are you thinking somewhere like DDJ? It's nicely focused (central idea being something like "D's metaprogramming is so powerful that compiler back-end optimisation can easily be performed in compile-time libraries") -- but it would require loads of explanation of D's coolest features. Could be a challenge to make it easily readable. But with the proposed changes to 'const', macros, and previous discussion about whether tuples should be automatically flattened, I have doubts over how stable that part of the language is. I wouldn't want to write something that didn't compile any more, or looked hopelessly obsolete by the time it was published. I've been bitten before -- CTFE blasted almost all of my previous metaprogramming code into the stone age <g>. What's the shelf life of this stuff?
Apr 04 2007
next sibling parent Georg Wrede <georg nospam.org> writes:
Don Clugston wrote:
 Could be a challenge to make it easily readable.
I've some experience proofreading and commenting similar articles. Only too happy to oblige! Also, DDJ readers may not be the CUJ crowd, but still, they can handle advanced stuff. Besides, some similar magazines help you by rewriting the article anyway, because then it has the same overall "feel" as their other articles. Articles in SciAm for example feel the same, no matter who the author is or what continent they're from.
Apr 04 2007
prev sibling next sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Don Clugston wrote:
 Walter Bright wrote:
 Can I beg you to write an article about it?
OK. Are you thinking somewhere like DDJ?
Anywhere!
 It's nicely focused (central idea being something like "D's 
 metaprogramming is so powerful, that compiler back-end optimisation can 
 easily be performed in compile-time libraries") -- but it would require 
 loads of explanation of D's coolest features. Could be a challenge to 
 make it easily readable.
 
 But with the proposed changes to 'const', macros, and previous 
 discussion about whether tuples should be automatically flattened, I 
 have doubts over how stable that part of the language is. I wouldn't 
 want to write something that didn't compile any more, or looked 
 hopelessly obsolete by the time it was published.
 I've been bitten before -- CTFE blasted almost all of my previous 
 metaprogramming code into the stone age <g>. What's the shelf life of 
 this stuff?.
6 months from now, it's still a decade ahead of any other language.
Apr 04 2007
prev sibling parent reply "David B. Held" <dheld codelogicconsulting.com> writes:
Don Clugston wrote:
 [...]
 But with the proposed changes to 'const', macros, and previous 
 discussion about whether tuples should be automatically flattened, I 
 have doubts over how stable that part of the language is. I wouldn't 
 want to write something that didn't compile any more, or looked 
 hopelessly obsolete by the time it was published.
 I've been bitten before -- CTFE blasted almost all of my previous 
 metaprogramming code into the stone age <g>. What's the shelf life of 
 this stuff?.
Well, hopefully, if Walter breaks your code again, it will be to make it even more elegant (although, that will be a bit of a trick, I admit). Eventually, the language will be so powerful that you can't make your code any more compact, and then you won't have to worry about it getting broken. To be honest, I'd like to see an article about this in CUJ. I don't know if they would publish it or not, but if they did, I guarantee it would make more than a few people take a hard look at D. Dave
Apr 05 2007
parent reply Walter Bright <newshound1 digitalmars.com> writes:
David B. Held wrote:
 Don Clugston wrote:
 [...]
 But with the proposed changes to 'const', macros, and previous 
 discussion about whether tuples should be automatically flattened, I 
 have doubts over how stable that part of the language is. I wouldn't 
 want to write something that didn't compile any more, or looked 
 hopelessly obsolete by the time it was published.
 I've been bitten before -- CTFE blasted almost all of my previous 
 metaprogramming code into the stone age <g>. What's the shelf life of 
 this stuff?.
Well, hopefully, if Walter breaks your code again, it will be to make it even more elegant (although, that will be a bit of a trick, I admit). Eventually, the language will be so powerful that you can't make your code any more compact, and then you won't have to worry about it getting broken. To be honest, I'd like to see an article about this in CUJ. I don't know if they would publish it or not, but if they did, I guarantee it would make more than a few people take a hard look at D.
I'm pretty sure DDJ would publish it, and it would reach a much wider audience than CUJ, which has disappeared <g>. Anyhow, Don, this kind of stuff needs to reach a much wider audience than a random posting in this n.g. would get.
Apr 05 2007
next sibling parent reply Dan <murpsoft hotmail.com> writes:
Current compilers tend to dump IRs to a buffer and then adjust them according
to parser rules.  The wrong approach.

I totally want to see this kind of programming get implemented into the
compiler itself.  I know it's too much to ask, but if DMD could generate
optimal floating point math and cache pipelines, it would rend the gaming
industry from C++'s hands REAL quick like.

A C++ programmer can say:
"oh yeah, it's a minor upgrade made for my convenience, but I'm already
conveniently comfortable with C++"...

Until a decision maker sees that D performs 20% better, dramatically reduces
debugging time, cuts source code volume, and has better version control.
Apr 05 2007
parent janderson <askme me.com> writes:
Dan wrote:
 Current compilers tend to dump IR's to a buffer and then adjust them according
to parser rules.  The wrong approach.
 
 I totally want to see this kind of programming get implemented into the
compiler itself.  I know it's too much to ask, but if DMD could generate
optimal floating point math and cache pipelines, it would rend the gaming
industry from C++'s hands REAL quick like.
 
 A C++ programmer can say:
 "oh yeah, it's a minor upgrade made for my convenience, but I'm already
conveniently comfortable with C++"...
 
 Until a decision maker sees that D performs 20% better, dramatically reduces
debugging time, cuts source code volume, and has better version control.
That would be nice and would help, however I think it'll take more than that to convince a game company to switch to D, particularly ones using XNA. It would have to be a startup company that is not working on the 360 -- perhaps one targeting Nintendo, Sony, Linux or perhaps Windows. I say perhaps, because if a company is going to switch to anything it's probably XNA. Now if we had D for .NET, maybe the story would be slightly different. I do like the idea of using D for games, I just see it as being a hard sell. Scripting and tools are probably the main gateway for D into the game programming world. Once a few game companies start using it in their main project and can demonstrate productivity/speed improvements, then the wall may indeed start to crumble. -Joel
Apr 07 2007
prev sibling parent reply Don Clugston <dac nospam.com.au> writes:
Walter Bright wrote:
 David B. Held wrote:
 Don Clugston wrote:
 [...]
 But with the proposed changes to 'const', macros, and previous 
 discussion about whether tuples should be automatically flattened, I 
 have doubts over how stable that part of the language is. I wouldn't 
 want to write something that didn't compile any more, or looked 
 hopelessly obsolete by the time it was published.
 I've been bitten before -- CTFE blasted almost all of my previous 
 metaprogramming code into the stone age <g>. What's the shelf life of 
 this stuff?.
Well, hopefully, if Walter breaks your code again, it will be to make it even more elegant (although, that will be a bit of a trick, I admit). Eventually, the language will be so powerful that you can't make your code any more compact, and then you won't have to worry about it getting broken. To be honest, I'd like to see an article about this in CUJ. I don't know if they would publish it or not, but if they did, I guarantee it would make more than a few people take a hard look at D.
I'm pretty sure DDJ would publish it, and it would reach a much wider audience than CUJ, which has disappeared <g>. Anyhow, Don, this kind of stuff needs to reach a much wider audience than a random posting in this n.g. would get.
Ooooooh.... I just updated the code, adding a code generator for D code, for when x87 is unavailable. I could not believe how easy it was. It was about 30 lines of trivial code. Yet directly generating a 'for' loop is still better than what Blitz++ does. I reckon it'd take less than a week to implement everything those libraries do. The language synergy of tuples, mixins, CTFE, builtin strings, the ~ operator, constant folding, array slicing, template string parameters, type deduction, defined inline asm, static if, is() expressions... is astonishing. *All* of them are required in order for this to work.
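
For the curious, that fallback generator is conceptually no more than this -- a toy sketch with invented names, nowhere near the real thing:

char[] generateLoop(char[] elementExpr)
{
    // Build an ordinary element-wise loop as a string, ready for mixin(),
    // e.g. elementExpr = "a[i] + b[i]*c[i]" built from the postfix form.
    return "for (size_t i = 0; i < result.length; ++i) {\n"
         ~ "    result[i] = " ~ elementExpr ~ ";\n"
         ~ "}";
}

// usage, roughly, with arrays result, a, b, c in scope:
//   mixin(generateLoop("a[i] + b[i]*c[i]"));

The idea being that the same postfix string can drive either back end; only the last step differs.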
Apr 05 2007
next sibling parent Georg Wrede <georg nospam.org> writes:
Don Clugston wrote:
 Ooooooh....
 I just updated the code, adding a code generator for D code, for when 
 x87 is unavailable. I could not believe how easy it was. It was about 30 
 lines of trivial code. Yet directly generating a 'for' loop is still 
 better than what Blitz++ does.
 I reckon it'd take less than a week to implement everything those 
 libraries do.
 
 The language synergy of tuples, mixins, CTFE, builtin strings, the ~ 
 operator, constant folding, array slicing, template string parameters, 
 type deduction, defined inline asm, static if, is() expressions... is 
 astonishing. *All* of them are required in order for this to work.
Gees, Don, the above quote would be the perfect ingress (or at least a "quote box") in a DDJ article!!!! I'm clenching the chair so I wouldn't buy the tickets to come to your place and write it together with you, right now!
Apr 05 2007
prev sibling next sibling parent Georg Wrede <georg nospam.org> writes:
Quoting from the page:

 If you have an article idea, you should first send us an article
 proposal. This is a short statement (about one page) that says what your
 article will be about. You can mail it to our <a href="address.htm">office</a>
 or e-mail it to <a href="mailto:editors ddj.com">editors ddj.com</a>.
 Be sure to give us several ways to contact you (such as phone number,
 e-mail address, fax number, and mailing address). We'll look at your
 idea and get back to you, usually within a month.</p>
...
 <p>Although many people simply write their entire article and send it
 to us, there are several *advantages* to sending us a proposal. We can
 look at your idea and possibly suggest a particular slant you hadn't
 considered. We can also help you to organize the article.
Later, when you start writing the article itself:
 Write as if you are giving a brief, informal talk to a group of
 coworkers. You want to be concise and well organized because you don't
 want to waste their time, but you don't want to be too stuffy or
 formal because you know them.
Ah, and to give you more of the "feel" of being an article author, here's another quote from the same page:
 Illustrations and screen captures offer an opportunity to lure
 anyone looking through an issue into reading your article. They
 can also provide succinct summaries of complex ideas. Since we
 redraw most figures anyway, to conform to our size and style
 requirements,
 you should not worry about making your images very high quality.
 Many authors prefer to sketch them by hand and fax or mail them.
Heh, especially the last sentence. The same goes for the text itself. The point being, from the magazine's point of view, most stuff sent to them is by teenagers, idiots, mentals, over-the-hill old-timers -- and only a handful is stuff they'd be interested in. Now, for those few, they'd kill, lick your boots, or even sharpen your pencil for you. As to whether this would interest them, all you have to send them is the above mentioned Ingress. Let them take over from that. Simply naming Walter as a reference should get you a definite YES/NO from them. And once you get the YES, they'll hold your hand through and through. --- For you personally, having your article in DDJ is of course "nice". But the real benefit is that then you've done it, been there. So, by the time D advances yet another bit, and your "template jujitsu" turns into "template nuke" (as perceived from the audience -- you'll never notice the change yourself!), it's gonna be a lot easier to jot up something for them again. Or for the next magazine. (Ahem, I'm ashamed to say, but at least as I experienced, my first article was a bit like "doing it for the first time with a woman". Once you get over it, the next one is a piece of cake.) Thusly, both now and in the future, you're in a position to do D a /lot/ of good -- with effort that's actually less than some of your better D forages to date. PS, if they get back to you within /less/ than 4 weeks, they're drooling. Unless they've read this post, of course. ;-)
Apr 05 2007
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Don Clugston wrote:
 Ooooooh....
 I just updated the code, adding a code generator for D code, for when 
 x87 is unavailable. I could not believe how easy it was. It was about 30 
 lines of trivial code. Yet directly generating a 'for' loop is still 
 better than what Blitz++ does.
 I reckon it'd take less than a week to implement everything those 
 libraries do.
 
 The language synergy of tuples, mixins, CTFE, builtin strings, the ~ 
 operator, constant folding, array slicing, template string parameters, 
 type deduction, defined inline asm, static if, is() expressions... is 
 astonishing. *All* of them are required in order for this to work.
Just wait 'till you get AST macros! But this stuff is *awesome*. Scratch DDJ, I don't want to wait 6 months. This needs to get out now, - I suggest www.artima.com, followed up with dzone, slashdot, and digg. It would also be a real treat to have you present this at Amazon's D conference in August.
Apr 05 2007
parent reply Dan <murpsoft hotmail.com> writes:
Walter Bright Wrote:
 Just wait 'till you get AST macros!
 
 But this stuff is *awesome*. Scratch DDJ, I don't want to wait 6 months. 
 This needs to get out now, - I suggest www.artima.com, followed up with 
 dzone, slashdot, and digg.
 
 It would also be a real treat to have you present this at Amazon's D 
 conference in August.
Wow. That's the most excited I've seen Walter. : ) I'm going to have to read and understand your code. I can tell.
Apr 05 2007
parent reply Georg Wrede <georg nospam.org> writes:
Dan wrote:
 Walter Bright Wrote:
 
Just wait 'till you get AST macros!

But this stuff is *awesome*. Scratch DDJ, I don't want to wait 6 months. 
This needs to get out now, - I suggest www.artima.com, followed up with 
dzone, slashdot, and digg.

It would also be a real treat to have you present this at Amazon's D 
conference in August.
Wow. That's the most excited I've seen Walter. : )
LOL, have to admit I agree!!!
Apr 05 2007
parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Georg Wrede wrote:
 Dan wrote:
 Walter Bright Wrote:

 Just wait 'till you get AST macros!

 But this stuff is *awesome*. Scratch DDJ, I don't want to wait 6 
 months. This needs to get out now, - I suggest www.artima.com, 
 followed up with dzone, slashdot, and digg.

 It would also be a real treat to have you present this at Amazon's D 
 conference in August.
Wow. That's the most excited I've seen Walter. : )
LOL, have to admit I agree!!!
It really is very cool and all, but did I just miss the benchmarks? I need numbers, man! You can generate all the assembly you want, but at the end of the day it's the numbers that people care about. --bb
Apr 05 2007
parent Don Clugston <dac nospam.com.au> writes:
Bill Baxter wrote:
 Georg Wrede wrote:
 Dan wrote:
 Walter Bright Wrote:

 Just wait 'till you get AST macros!

 But this stuff is *awesome*. Scratch DDJ, I don't want to wait 6 
 months. This needs to get out now, - I suggest www.artima.com, 
 followed up with dzone, slashdot, and digg.

 It would also be a real treat to have you present this at Amazon's D 
 conference in August.
Wow. That's the most excited I've seen Walter. : )
LOL, have to admit I agree!!!
It really is very cool and all, but did I just miss the benchmarks? I need numbers, man! You can generate all the assembly you want, but at the end of the day its the numbers that people care about. --bb
You're right, but it wasn't posted as a finished product! I post this sort of thing every now and again, largely to give Walter some feedback as to what the language is doing at the bleeding edge. In this case, I thought it would be relevant to AST macro development. Also I posted this to get an idea if there was much interest in this sort of thing -- I don't have a great deal of free time, so I want to make good use of it. I only got the code working a few days ago. The speed of the code will likely be affected by how well DMD optimises the tuple copying. I haven't even looked at what code it's actually generating.
Apr 06 2007
prev sibling next sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Don Clugston wrote:
 * I've often read that the x87 floating-point stack is notoriously 
 difficult for compilers to write code for, but it works quite well in 
 this case.
The hard part is doing register allocation. Generating code for the stack machine is trivial. There are some complications for dealing with 87 stack overflows.
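
For instance, the postfix form maps straight onto the stack machine: each operand becomes an fld, each binary operator an arithmetic-and-pop. A hand-written sketch (not compiler output; assumes DMD-style x86 inline asm and 80-bit reals):

real example(real a, real b, real c)
{
    // computes (a+b)*c, i.e. the postfix sequence "a b + c *"
    real result;
    asm
    {
        fld a;              // push a
        fld b;              // push b
        faddp ST(1), ST;    // a+b
        fld c;              // push c
        fmulp ST(1), ST;    // (a+b)*c
        fstp result;        // pop the result into memory
    }
    return result;
}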
Apr 03 2007
prev sibling next sibling parent reply Pragma <ericanderton yahoo.removeme.com> writes:
Don Clugston wrote:
 I have been trying to come up with a convincing use case for the new 
 mixins (and for metaprogramming in general). My best effort to date can 
 be found at:
 http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d
 
 It generates near-optimal x87 asm code for BLAS1-style basic vector 
 operations. 32, 64 and 80 bit vectors are all supported.
 
 Compile with -version=BladeDebug to see the asm code which is generated.
 
 Typical usage:
 
 void main()
 {
     auto p = Vec([1.0L, 2, 18]);    // a vector of 80-bit reals.
     auto q = Vec([3.5L, 1.1, 3.8]);  // ditto
     auto r = Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
     auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
     real d = dot(r, p+r+r);
     ireal e = dot(r, z);
     q -= ((r+p)*18.0L*314.1L - (p-r))* 35;
     d = dot(r, p+r+r);
 }
 
 Notice that mixed-length operations (real[] + float[] - double[]) are 
 supported.
 
 Like the C++ Blitz++ library, expression templates are used to convert 
 vector expressions into efficient element-wise operations. Unlike that 
 library, however, there is no reliance on the compiler's optimiser. 
 Instead, the expression template is manipulated as text, converted into 
 postfix, and then passed to a simple CTFE compile-time assembler, which 
 creates highly efficient asm code which is used as a mixin.
 To understand the later parts of the code, you need some knowledge of 
 x87 assembler. In fact, you probably need to have read Agner Fog's 
 superb Pentium optimisation manual (www.agner.org).
 
 Some observations:
 * I was amazed at how simple the expression template code is (it is 
 somewhat cluttered by the code to check for real/imaginary type mismatch 
 errors).
 * I've often read that the x87 floating-point stack is notoriously 
 difficult for compilers to write code for, but it works quite well in 
 this case.
 * The major workarounds are:

 - inability to define operators for built-in arrays (hence the use of 
 'Vec' wrappers).
 - inability to index through a tuple in a CTFE function (solved by 
 converting types into a string).
 * There have been mutterings about how unhygenic/dangerous the new 
 mixins are. In this case, the mixin forms the _entire_ body of the 
 function. This is an interesting situation which I think a language 
 purist will find more palatable.
 
 Enjoy.
This is a work of art Don - it's practically a compiler extension. Nice job. :)

For others that are interested in how this actually gets the job done, I found this in your documentation:

 * THEORY:
 * Expression templates are used to create an expression string of the form "(a+b*c)+d"
 * and a tuple, the entries of which correspond to a, b, c, d, ...
 * This string is converted to postfix. The postfix string is converted to
 * a string containing x87 asm, which is then mixed into a function which accepts the tuple.

-- 
- EricAnderton at yahoo
Apr 03 2007
parent reply Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Pragma wrote:
 Don Clugston wrote:
 I have been trying to come up with a convincing use case for the new 
 mixins (and for metaprogramming in general). My best effort to date 
 can be found at:
 http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d

 It generates near-optimal x87 asm code for BLAS1-style basic vector 
 operations. 32, 64 and 80 bit vectors are all supported.

 Compile with -version=BladeDebug to see the asm code which is generated.

 Typical usage:

 void main()
 {
     auto p = Vec([1.0L, 2, 18]);    // a vector of 80-bit reals.
     auto q = Vec([3.5L, 1.1, 3.8]);  // ditto
     auto r = Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
     auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
     real d = dot(r, p+r+r);
     ireal e = dot(r, z);
     q -= ((r+p)*18.0L*314.1L - (p-r))* 35;
     d = dot(r, p+r+r);
 }

 Notice that mixed-length operations (real[] + float[] - double[]) are 
 supported.

 Like the C++ Blitz++ library, expression templates are used to convert 
 vector expressions into efficient element-wise operations. Unlike that 
 library, however, there is no reliance on the compiler's optimiser. 
 Instead, the expression template is manipulated as text, converted 
 into postfix, and then passed to a simple CTFE compile-time assembler, 
 which creates highly efficient asm code which is used as a mixin.
 To understand the later parts of the code, you need some knowledge of 
 x87 assembler. In fact, you probably need to have read Agner Fog's 
 superb Pentium optimisation manual (www.agner.org).

 Some observations:
 * I was amazed at how simple the expression template code is (it is 
 somewhat cluttered by the code to check for real/imaginary type 
 mismatch errors).
 * I've often read that the x87 floating-point stack is notoriously 
 difficult for compilers to write code for, but it works quite well in 
 this case.
 * The major workarounds are:

 - inability to define operators for built-in arrays (hence the use of 
 'Vec' wrappers).
 - inability to index through a tuple in a CTFE function (solved by 
 converting types into a string).
 * There have been mutterings about how unhygenic/dangerous the new 
 mixins are. In this case, the mixin forms the _entire_ body of the 
 function. This is an interesting situation which I think a language 
 purist will find more palatable.

 Enjoy.
This is a work of art Don - it's practically a compiler extension. Nice job. :) For others that are interested in how this actually gets the job done, I found this in your documentation: * THEORY: * Expression templates are used to create an expression string of the form "(a+b*c)+d" * and a tuple, the entries of which correspond to a, b, c, d, ... * This string is converted to postfix. The postfix string is converted to * a string containing x87 asm, which is then mixed into a function which accepts the tuple.
Hum, a minor question, is a string representation of the expressions better (easier to use, manipulate, etc.) than a tree representation? -- Bruno Medeiros - MSc in CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Apr 07 2007
parent reply Pragma <ericanderton yahoo.removeme.com> writes:
Bruno Medeiros wrote:
 Pragma wrote:
 Don Clugston wrote:
 I have been trying to come up with a convincing use case for the new 
 mixins (and for metaprogramming in general). My best effort to date 
 can be found at:
 http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d 


 It generates near-optimal x87 asm code for BLAS1-style basic vector 
 operations. 32, 64 and 80 bit vectors are all supported.

 Compile with -version=BladeDebug to see the asm code which is generated.

 Typical usage:

 void main()
 {
     auto p = Vec([1.0L, 2, 18]);    // a vector of 80-bit reals.
     auto q = Vec([3.5L, 1.1, 3.8]);  // ditto
     auto r = Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
     auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
     real d = dot(r, p+r+r);
     ireal e = dot(r, z);
     q -= ((r+p)*18.0L*314.1L - (p-r))* 35;
     d = dot(r, p+r+r);
 }

 Notice that mixed-length operations (real[] + float[] - double[]) are 
 supported.

 Like the C++ Blitz++ library, expression templates are used to 
 convert vector expressions into efficient element-wise operations. 
 Unlike that library, however, there is no reliance on the compiler's 
 optimiser. Instead, the expression template is manipulated as text, 
 converted into postfix, and then passed to a simple CTFE compile-time 
 assembler, which creates highly efficient asm code which is used as a 
 mixin.
 To understand the later parts of the code, you need some knowledge of 
 x87 assembler. In fact, you probably need to have read Agner Fog's 
 superb Pentium optimisation manual (www.agner.org).

 Some observations:
 * I was amazed at how simple the expression template code is (it is 
 somewhat cluttered by the code to check for real/imaginary type 
 mismatch errors).
 * I've often read that the x87 floating-point stack is notoriously 
 difficult for compilers to write code for, but it works quite well in 
 this case.
 * The major workarounds are:

 - inability to define operators for built-in arrays (hence the use of 
 'Vec' wrappers).
 - inability to index through a tuple in a CTFE function (solved by 
 converting types into a string).
 * There have been mutterings about how unhygenic/dangerous the new 
 mixins are. In this case, the mixin forms the _entire_ body of the 
 function. This is an interesting situation which I think a language 
 purist will find more palatable.

 Enjoy.
This is a work of art Don - it's practically a compiler extension. Nice job. :) For others that are interested in how this actually gets the job done, I found this in your documentation: * THEORY: * Expression templates are used to create an expression string of the form "(a+b*c)+d" * and a tuple, the entries of which correspond to a, b, c, d, ... * This string is converted to postfix. The postfix string is converted to * a string containing x87 asm, which is then mixed into a function which accepts the tuple.
Hum, a minor question, is a string representation of the expressions better (easier to use, manipulate, etc.) than a tree representation?
I know Don wrote this lib, but I think I can answer this. In short: no. But it's really an unintentionally loaded question: there are some rather severe limitations as to what kind of data structures you can create at compile time. You're basically limited to tuples and strings, each of which has drawbacks of its own. You can create a tree using tuples, but then you run into problems with over-running the limitations on identifier length built into the compiler. Strings are nowhere near as flexible out of the box, but pose no storage limitations, which gives a slight edge to string-based representations of data. With some clever coding, strings can be made to store just about any structure imaginable (even if it does make for some ugly code).

-- 
- EricAnderton at yahoo
Apr 09 2007
next sibling parent reply Dan <murpsoft hotmail.com> writes:
Pragma Wrote:
 In short: no.  But it's really an unintentionally loaded question: there are
some rather severe limitations as to what 
 kind of data structures you can create at compile time.  You're basically
limited to tuples and strings, each of which 
 have drawbacks of their own.
 
 You can create a tree using tuples, but then you run into problems with
over-running the limitations on identifier 
 length built into the compiler.  Strings are no where near as flexible out of
the box, but pose no storage limitations, 
 which gives a slight edge to string-based representations of data.  With some
clever coding, strings can be made to 
 store just about any structure imaginable (even if it does make for some ugly
code).
Well said. Once the AST is reflected at compile time, I'm sure the code can be made vastly more refined; and perhaps additional code generators can be developed and *hopefully* integrated into D. Particular generators that spark my interest tend to have to do with x87, SSE extensions, and x86-64. Most compilers to date don't properly use these functionalities. Having them integrated into D Agner Fog-optimally would hugely replace a lot of C++ gaming, video, rendering and graphics engine code. *bigsmile
Apr 09 2007
next sibling parent Dave <Dave_member pathlink.com> writes:
Dan wrote:
 Pragma Wrote:
 In short: no.  But it's really an unintentionally loaded question: there are
some rather severe limitations as to what 
 kind of data structures you can create at compile time.  You're basically
limited to tuples and strings, each of which 
 have drawbacks of their own.

 You can create a tree using tuples, but then you run into problems with
over-running the limitations on identifier 
 length built into the compiler.  Strings are no where near as flexible out of
the box, but pose no storage limitations, 
 which gives a slight edge to string-based representations of data.  With some
clever coding, strings can be made to 
 store just about any structure imaginable (even if it does make for some ugly
code).
Well said. Once the AST is reflected at compile time, I'm sure the code can be made vastly more refined; and perhaps additional code generators can be developed and *hopefully* integrated into D. Particular generators that spark my interest tend to have to do with x87, SSE extensions, and x86-64. Most compilers to date don't properly use these functionalities. Having them integrated into D Agner Fog-optimally would hugely replace alot of C++ gaming, video, rendering and graphics engine code. *bigsmile
Hmmm - I wonder if a lot of what Don's been doing is serving as a "proof of concept" for the D 2.0 compiler, to be written in D of course <g> (It just seems like a lot of the new D features fit so well with compiler development. Especially for things like "proving constness", etc.) - Dave
Apr 09 2007
prev sibling next sibling parent Pragma <ericanderton yahoo.removeme.com> writes:
Dan wrote:
 Pragma Wrote:
 In short: no.  But it's really an unintentionally loaded question: there are
some rather severe limitations as to what 
 kind of data structures you can create at compile time.  You're basically
limited to tuples and strings, each of which 
 have drawbacks of their own.

 You can create a tree using tuples, but then you run into problems with
over-running the limitations on identifier 
 length built into the compiler.  Strings are no where near as flexible out of
the box, but pose no storage limitations, 
 which gives a slight edge to string-based representations of data.  With some
clever coding, strings can be made to 
 store just about any structure imaginable (even if it does make for some ugly
code).
Well said. Once the AST is reflected at compile time, I'm sure the code can be made vastly more refined; and perhaps additional code generators can be developed and *hopefully* integrated into D.
Thanks. But evaluating the AST will only take things so far. There is a huge gulf of functionality that stands between where we are now (at compile time) and what D would be like as a full-on scripting language (not that I'm advocating that it be pushed that far). Much of what sits in the middle (namely structs, multi-dimensional arrays and maps) can go a long way towards *analyzing* that tree for code generation and more.
 
 Particular generators that spark my interest tend to have to do with x87, SSE
extensions, and x86-64.  Most compilers to date don't properly use these
functionalities.  Having them integrated into D Agner Fog-optimally would
hugely replace alot of C++ gaming, video, rendering and graphics engine code.
 
Anyway, I'm grinning too. I can't wait to see what comes up next. So far, converting Enki to a compile-time generator is proving to be a huge challenge, but workable. -- - EricAnderton at yahoo
Apr 09 2007
prev sibling parent reply Paul Findlay <r.lph50+d gmail.com> writes:
 Particular generators that spark my interest tend to have to do with x87,
 SSE extensions, and x86-64.  Most compilers to date don't properly use
 these functionalities.  Having them integrated into D Agner Fog-optimally
 would hugely replace alot of C++ gaming, video, rendering and graphics
 engine code.
And it's dawning on me that some text processing can take advantage of doing 64-bit/128-bit chunks at a time - Paul
Apr 09 2007
parent Dan <murpsoft hotmail.com> writes:
Paul Findlay Wrote:

 Particular generators that spark my interest tend to have to do with x87,
 SSE extensions, and x86-64.  Most compilers to date don't properly use
 these functionalities.  Having them integrated into D Agner Fog-optimally
 would hugely replace alot of C++ gaming, video, rendering and graphics
 engine code.
And its dawning on me that some text processing can take advantage of doing 64-bit/128-bit chunks at a time
Yeah, pretty much any large (>1KB) buffer copy algorithm runs fastest through SSE2 ever since 128-bit came out. The raw power overcomes the need for tweaking the loop instead of just using rep movsd. I think the asm guys have the best code we could hope for already written up on a few pages: Agner Fog, Paul Hsieh et al. If we could leech their strategy (legally, I mean) then I can see a good use for such a thing. I guess we just ought to wait for AST reflection.
Apr 10 2007
prev sibling parent reply Don Clugston <dac nospam.com.au> writes:
Pragma wrote:
 Bruno Medeiros wrote:
 Pragma wrote:
 Don Clugston wrote:
 I have been trying to come up with a convincing use case for the new 
 mixins (and for metaprogramming in general). My best effort to date 
 can be found at:
 http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d 


 It generates near-optimal x87 asm code for BLAS1-style basic vector 
 operations. 32, 64 and 80 bit vectors are all supported.

 Compile with -version=BladeDebug to see the asm code which is 
 generated.

 Typical usage:

 void main()
 {
     auto p = Vec([1.0L, 2, 18]);    // a vector of 80-bit reals.
     auto q = Vec([3.5L, 1.1, 3.8]);  // ditto
     auto r = Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
     auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
     real d = dot(r, p+r+r);
     ireal e = dot(r, z);
     q -= ((r+p)*18.0L*314.1L - (p-r))* 35;
     d = dot(r, p+r+r);
 }

 Notice that mixed-length operations (real[] + float[] - double[]) 
 are supported.

 Like the C++ Blitz++ library, expression templates are used to 
 convert vector expressions into efficient element-wise operations. 
 Unlike that library, however, there is no reliance on the compiler's 
 optimiser. Instead, the expression template is manipulated as text, 
 converted into postfix, and then passed to a simple CTFE 
 compile-time assembler, which creates highly efficient asm code 
 which is used as a mixin.
 To understand the later parts of the code, you need some knowledge 
 of x87 assembler. In fact, you probably need to have read Agner 
 Fog's superb Pentium optimisation manual (www.agner.org).

 Some observations:
 * I was amazed at how simple the expression template code is (it is 
 somewhat cluttered by the code to check for real/imaginary type 
 mismatch errors).
 * I've often read that the x87 floating-point stack is notoriously 
 difficult for compilers to write code for, but it works quite well 
 in this case.
 * The major workarounds are:

 - inability to define operators for built-in arrays (hence the use 
 of 'Vec' wrappers).
 - inability to index through a tuple in a CTFE function (solved by 
 converting types into a string).
 * There have been mutterings about how unhygenic/dangerous the new 
 mixins are. In this case, the mixin forms the _entire_ body of the 
 function. This is an interesting situation which I think a language 
 purist will find more palatable.

 Enjoy.
This is a work of art Don - it's practically a compiler extension. Nice job. :) For others that are interested in how this actually gets the job done, I found this in your documentation: * THEORY: * Expression templates are used to create an expression string of the form "(a+b*c)+d" * and a tuple, the entries of which correspond to a, b, c, d, ... * This string is converted to postfix. The postfix string is converted to * a string containing x87 asm, which is then mixed into a function which accepts the tuple.
Hum, a minor question, is a string representation of the expressions better (easier to use, manipulate, etc.) than a tree representation?
I know Don wrote this lib, but I think I can answer this. In short: no. But it's really an unintentionally loaded question: there are some rather severe limitations as to what kind of data structures you can create at compile time. You're basically limited to tuples and strings, each of which have drawbacks of their own. You can create a tree using tuples, but then you run into problems with over-running the limitations on identifier length built into the compiler.
Actually I don't know how to make a tree nicely with tuples. Strings are no where near as flexible out of the box, but
 pose no storage limitations, which gives a slight edge to string-based 
 representations of data.  With some clever coding, strings can be made 
 to store just about any structure imaginable (even if it does make for 
 some ugly code).
There are a couple of minor things, though:
(1) In the case of generating D code to mix in, no tree is required (the string just gets mixed back in again, so it might as well stay as a string).
(2) The "separate string and tuple" structure means the expression can be optimised without needing to move any of the values around; this creates less garbage for the compiler to optimise away.
(3) Some patterns are probably easier to find with a string than with a tree.
(4) It's more natural for D coders to think in terms of arrays than lists (and there is much more language support).
(5) A string representation can easily express things like c=a+3*(a+b); where 'a' appears twice. (At present, there's no way to generate it, but theoretically the compiler could provide it in many cases.)
So -- if trees were available, I'm not sure if I'd use them, or not.
Apr 09 2007
parent reply Pragma <ericanderton yahoo.removeme.com> writes:
Don Clugston wrote:
 Pragma wrote:
 Bruno Medeiros wrote:
 Pragma wrote:
 Don Clugston wrote:
 I have been trying to come up with a convincing use case for the 
 new mixins (and for metaprogramming in general). My best effort to 
 date can be found at:
 http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d 


 It generates near-optimal x87 asm code for BLAS1-style basic vector 
 operations. 32, 64 and 80 bit vectors are all supported.

 Compile with -version=BladeDebug to see the asm code which is 
 generated.

 Typical usage:

 void main()
 {
     auto p = Vec([1.0L, 2, 18]);    // a vector of 80-bit reals.
     auto q = Vec([3.5L, 1.1, 3.8]);  // ditto
     auto r = Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
     auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
     real d = dot(r, p+r+r);
     ireal e = dot(r, z);
     q -= ((r+p)*18.0L*314.1L - (p-r))* 35;
     d = dot(r, p+r+r);
 }

 Notice that mixed-length operations (real[] + float[] - double[]) 
 are supported.

 Like the C++ Blitz++ library, expression templates are used to 
 convert vector expressions into efficient element-wise operations. 
 Unlike that library, however, there is no reliance on the 
 compiler's optimiser. Instead, the expression template is 
 manipulated as text, converted into postfix, and then passed to a 
 simple CTFE compile-time assembler, which creates highly efficient 
 asm code which is used as a mixin.
 To understand the later parts of the code, you need some knowledge 
 of x87 assembler. In fact, you probably need to have read Agner 
 Fog's superb Pentium optimisation manual (www.agner.org).

 Some observations:
 * I was amazed at how simple the expression template code is (it is 
 somewhat cluttered by the code to check for real/imaginary type 
 mismatch errors).
 * I've often read that the x87 floating-point stack is notoriously 
 difficult for compilers to write code for, but it works quite well 
 in this case.
 * The major workarounds are:

 - inability to define operators for built-in arrays (hence the use 
 of 'Vec' wrappers).
 - inability to index through a tuple in a CTFE function (solved by 
 converting types into a string).
 * There have been mutterings about how unhygenic/dangerous the new 
 mixins are. In this case, the mixin forms the _entire_ body of the 
 function. This is an interesting situation which I think a language 
 purist will find more palatable.

 Enjoy.
This is a work of art Don - it's practically a compiler extension. Nice job. :) For others that are interested in how this actually gets the job done, I found this in your documentation: * THEORY: * Expression templates are used to create an expression string of the form "(a+b*c)+d" * and a tuple, the entries of which correspond to a, b, c, d, ... * This string is converted to postfix. The postfix string is converted to * a string containing x87 asm, which is then mixed into a function which accepts the tuple.
Hum, a minor question, is a string representation of the expressions better (easier to use, manipulate, etc.) than a tree representation?
I know Don wrote this lib, but I think I can answer this. In short: no. But it's really an unintentionally loaded question: there are some rather severe limitations as to what kind of data structures you can create at compile time. You're basically limited to tuples and strings, each of which have drawbacks of their own. You can create a tree using tuples, but then you run into problems with over-running the limitations on identifier length built into the compiler.
Actually I don't know how to make a tree nicely with tuples.
Here's an example (for grins):

import std.stdio;

template Node(char[] Data,Nodes...){
    alias Data data;
    alias Nodes nodes;
}

template PrintData(char[] parent){
    const char[] PrintData = "";
}

template PrintData(char[] parent,alias Node){
    static if(parent == ""){
        const char[] PrintData = Node.data ~ "\n" ~ PrintData!(Node.data,Node.nodes);
    }
    else{
        const char[] PrintData = parent ~ "." ~ Node.data ~ "\n"
            ~ PrintData!(parent ~ "." ~ Node.data,Node.nodes);
    }
}

template PrintData(char[] parent,alias Node,V...){
    const char[] PrintData = PrintData!(parent,Node) ~ PrintData!(parent,V);
}

// example "tree" structure
alias Node!("1",
    Node!("1"),
    Node!("2",
        Node!("1"),
        Node!("2")
    )
) dataset;

void main(){
    writefln("%s",PrintData!("",dataset));
}
 
  Strings are no where near as flexible out of the box, but
 pose no storage limitations, which gives a slight edge to string-based 
 representations of data.  With some clever coding, strings can be made 
 to store just about any structure imaginable (even if it does make for 
 some ugly code).
There's a couple of minor things, though: (1) in the case of generating D code to mixin, no tree is required (the string just gets mixed back in again, so it might as well stay as a string). (2) The "separate string and tuple" structure means the expression can be optimised without needing to move any of the values around; this creates less garbage for the compiler to optimise away. (3) Some patterns are probably easier to find with a string, than a tree. (4) It's more natural for D coders to think in terms of arrays, than lists. (and there is much more language support). (5) A string representation can easily express things like c=a+3*(a+b); where 'a' appears twice. (At present, there's no way to generate it, but theoretically the compiler could provide it many cases). So -- if trees were available, I'm not sure if I'd use them, or not.
They're really handy.  I've found some uses for them already:

/*
ZeroOrMoreExpr
    = generate.ZeroOrMoreExpr()
    ::= "["  "{"  Expression:expr  "}"  "]"  [RangeSetExpr:ranges ] [Binding:binding ]
        ["*" SubExpressionTerm:delim ] [SubExpressionTerm:term];
*/
bool ZeroOrMoreExpr(inout char[] value,inout char[] bindings,inout char[] tokens,inout char[] err){
    return Rule!(
        generate.ZeroOrMoreExpr,
        And!(
            Literal!("["),
            Literal!("{"),
            Bind!(Expression,"expr"),
            Literal!("}"),
            Literal!("]"),
            Optional!(
                Bind!(RangeSetExpr,"ranges")
            ),
            Optional!(
                Bind!(Binding,"binding")
            ),
            Optional!(
                And!(
                    Literal!("*"),
                    Bind!(SubExpressionTerm,"delim")
                )
            ),
            Bind!(SubExpressionTerm,"term")
        )
    )(value,bindings,tokens,err);
}

There's a *lot* going on in this sample.  Basically what you see here is a chunk of the as-of-yet-experimental compile-time Enki parser.  This piece parses the Zero-or-more expression part of the EBNF variant that Enki supports.

The templates evaluate to CTFE's that in turn make up the parser when it's executed.  So there's several layers of compile-time evaluation going on here.  Also, what's not obvious is that "char[] tokens" is actually an *array of strings* that is stored in a single char[] array; each string's length data is encoded as a size_t mapped onto the appropriate number of chars.  The Bind!() expressions also map parsed out data to a key/value set which is stored in a similar fashion (char[] bindings).

The goals here were to side-step the limitations of D's identifier name-length while providing a user-readable toolkit for parser composition; the only limit now is placed on the length of any given rule.  I haven't found a way to unify the compile-time and run-time parser generation yet (one backend with two targets), but I anticipate using something very similar to the above by providing a CT and RT version of the template library.  At a minimum I plan on having this consume a .bnf file at compile-time, in order to create a run-time parser via mixin.

-- 
- EricAnderton at yahoo
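
The length-prefix packing looks roughly like this (a simplified sketch with invented names, not Enki's actual code):

char[] packString(char[] list, char[] s)
{
    // append s to the packed list, preceded by its length spread
    // across size_t.sizeof chars (low byte first)
    char[] lenChars;
    size_t len = s.length;
    for (int i = 0; i < size_t.sizeof; i++)
    {
        lenChars ~= cast(char)(len & 0xFF);
        len >>= 8;
    }
    return list ~ lenChars ~ s;
}

char[] unpackFirst(char[] list)
{
    // recover the first string by decoding its length prefix
    size_t len = 0;
    for (int i = size_t.sizeof - 1; i >= 0; i--)
        len = (len << 8) | cast(size_t)list[i];
    return list[size_t.sizeof .. size_t.sizeof + len];
}

It's ugly, but it works at compile time, and it dodges the identifier-length problem entirely.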
Apr 09 2007
next sibling parent reply Don Clugston <dac nospam.com.au> writes:
Pragma wrote:
 Don Clugston wrote:
 Pragma wrote:
 Bruno Medeiros wrote:
 Pragma wrote:
 Don Clugston wrote:
 I have been trying to come up with a convincing use case for the 
 new mixins (and for metaprogramming in general). My best effort to 
 date can be found at:
 http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d 


 It generates near-optimal x87 asm code for BLAS1-style basic 
 vector operations. 32, 64 and 80 bit vectors are all supported.

 Compile with -version=BladeDebug to see the asm code which is 
 generated.

 Typical usage:

 void main()
 {
     auto p = Vec([1.0L, 2, 18]);    // a vector of 80-bit reals.
     auto q = Vec([3.5L, 1.1, 3.8]);  // ditto
     auto r = Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
     auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
     real d = dot(r, p+r+r);
     ireal e = dot(r, z);
     q -= ((r+p)*18.0L*314.1L - (p-r))* 35;
     d = dot(r, p+r+r);
 }

 Notice that mixed-length operations (real[] + float[] - double[]) 
 are supported.

 Like the C++ Blitz++ library, expression templates are used to 
 convert vector expressions into efficient element-wise operations. 
 Unlike that library, however, there is no reliance on the 
 compiler's optimiser. Instead, the expression template is 
 manipulated as text, converted into postfix, and then passed to a 
 simple CTFE compile-time assembler, which creates highly efficient 
 asm code which is used as a mixin.
 To understand the later parts of the code, you need some knowledge 
 of x87 assembler. In fact, you probably need to have read Agner 
 Fog's superb Pentium optimisation manual (www.agner.org).

 Some observations:
 * I was amazed at how simple the expression template code is (it 
 is somewhat cluttered by the code to check for real/imaginary type 
 mismatch errors).
 * I've often read that the x87 floating-point stack is notoriously 
 difficult for compilers to write code for, but it works quite well 
 in this case.
 * The major workarounds are:
 - inability to use a tuple element directly from asm code (bug 

 - inability to define operators for built-in arrays (hence the use 
 of 'Vec' wrappers).
 - inability to index through a tuple in a CTFE function (solved by 
 converting types into a string).
 * There have been mutterings about how unhygenic/dangerous the new 
 mixins are. In this case, the mixin forms the _entire_ body of the 
 function. This is an interesting situation which I think a 
 language purist will find more palatable.

 Enjoy.
This is a work of art Don - it's practically a compiler extension. Nice job. :) For others that are interested in how this actually gets the job done, I found this in your documentation: * THEORY: * Expression templates are used to create an expression string of the form "(a+b*c)+d" * and a tuple, the entries of which correspond to a, b, c, d, ... * This string is converted to postfix. The postfix string is converted to * a string containing x87 asm, which is then mixed into a function which accepts the tuple.
Hum, a minor question, is a string representation of the expressions better (easier to use, manipulate, etc.) than a tree representation?
I know Don wrote this lib, but I think I can answer this. In short: no. But it's really an unintentionally loaded question: there are some rather severe limitations as to what kind of data structures you can create at compile time. You're basically limited to tuples and strings, each of which have drawbacks of their own. You can create a tree using tuples, but then you run into problems with over-running the limitations on identifier length built into the compiler.
Actually I don't know how to make a tree nicely with tuples.
 Here's an example (for grins):
 
 import std.stdio;
 
 template Node(char[] Data,Nodes...){
     alias Data data;
     alias Nodes nodes;
 }
 
 // example "tree" structure
 alias Node!("1",
     Node!("1"),
     Node!("2",
         Node!("1"),
         Node!("2")
     )
 ) dataset;
The problem I was referring to, is: how to store both values, and functions/operators, inside the tree. It seems to get messy very quickly.
 So -- if trees were available, I'm not sure if I'd use them, or not.
They're really handy. I've found some uses for them already:
I meant for this application. There's no doubt they're indispensable in other contexts.
 
 /*
 ZeroOrMoreExpr
     = generate.ZeroOrMoreExpr()
     ::= "["  "{"  Expression:expr  "}"  "]"  [RangeSetExpr:ranges ] 
 [Binding:binding ]
         ["*" SubExpressionTerm:delim ] [SubExpressionTerm:term];
 */
 bool ZeroOrMoreExpr(inout char[] value,inout char[] bindings,inout 
 char[] tokens,inout char[] err){
     return Rule!(
         generate.ZeroOrMoreExpr,
         And!(
             Literal!("["),
             Literal!("{"),
             Bind!(Expression,"expr"),
             Literal!("}"),
             Literal!("]"),
             Optional!(
                 Bind!(RangeSetExpr,"ranges")
             ),           
             Optional!(
                 Bind!(Binding,"binding")
             ),
             Optional!(
                 And!(
                     Literal!("*"),
                     Bind!(SubExpressionTerm,"delim")
                 )
             ),
             Bind!(SubExpressionTerm,"term")
         )
     )(value,bindings,tokens,err);       
 }
 
 There's a *lot* going on in this sample.  Basically what you see here is 
 a chunk of the as-of-yet-experimental compile-time Enki parser.  This 
 piece parses the Zero-or-more expression part of the EBNF variant that 
 Enki supports.
 
 The templates evaluate to CTFE's that in turn make up the parser when 
 it's executed.  So there's several layers of compile-time evaluation 
 going on here.  Also, what's not obvious is that "char[] tokens" is 
 actually an *array of strings* that is stored in a single char[] array; 
 each string's length data is encoded as a size_t mapped onto the 
 appropriate number of chars.  The Bind!() expressions also map parsed 
 out data to a key/value set which is stored in a similar fashion (char[] 
 bindings).
Seriously cool! Seems like you're generating a tree of nested mixins? Anyway, I suspect this will really benefit from any compiler improvements, eg CTFE support for AAs or nested functions. And obviously AST macros.
 The goals here were to side-step the limitations of D's identifier 
 name-length while providing a user-readable toolkit for parser 
 composition; the only limit now is placed on the length of any given 
 rule.  I haven't found a way to unify the compile-time and run-time 
 parser generation yet (one backend with two targets), but I anticipate 
 using something very similar to the above by providing a CT and RT 
 version of the template library.  At a minimum I plan on having this 
 consume a .bnf file at compile-time, in order to create a run-time 
 parser via mixin.
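(That last step would presumably reduce to a one-liner like the following sketch. makeParser and grammar.bnf are made-up names, and it assumes string imports are enabled with the -J switch.)

// grammar.bnf is read at compile time; makeParser is assumed to be a
// CTFE function returning D source for the generated parser
mixin(makeParser(import("grammar.bnf")));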
I really think the new mixins + CTFE was a breakthrough. Publish some of this stuff, and I think we'll see an exodus of many Boost developers into D.
Apr 10 2007
next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Don Clugston wrote:
 I really think the new mixins + CTFE was a breakthrough. Publish some of 
 this stuff, and I think we'll see an exodus of many Boost developers 
 into D.
You guys gotta publish!
Apr 10 2007
next sibling parent reply Don Clugston <dac nospam.com.au> writes:
Walter Bright wrote:
 Don Clugston wrote:
 I really think the new mixins + CTFE was a breakthrough. Publish some 
 of this stuff, and I think we'll see an exodus of many Boost 
 developers into D.
You guys gotta publish!
I've begun a draft. A historical question for you --

On this page,
http://www.artima.com/cppsource/top_cpp_software.html
Scott Meyers says that g++ was the first C++ compiler to generate native code. But I thought Zortech was older than g++. Is that correct?
Apr 10 2007
parent Walter Bright <newshound1 digitalmars.com> writes:
Don Clugston wrote:
 I've begun a draft. A historical question for you --
 
 On this page,
 http://www.artima.com/cppsource/top_cpp_software.html
 Scott Meyers says that g++ was the first C++ compiler to generate native 
 code. But I thought Zortech was older than g++. Is that correct?
Depends on how you look at it. g++ was first 'released' in December, 1987. But the release notes for it say: "The GNU C++ Compiler is still in test release, and is NOT ready for everyday use" so I don't consider that a real release. Oregon Software released their native C++ compiler (not for the PC) in Jan or Feb 1988, nobody seems to recall exactly. Zortech's first release was in Jun 1988. Michael Tiemann, author of g++, calls the Sep 1988 release the "first really stable version."
Apr 10 2007
prev sibling parent Pragma <ericanderton yahoo.removeme.com> writes:
Walter Bright wrote:
 Don Clugston wrote:
 I really think the new mixins + CTFE was a breakthrough. Publish some 
 of this stuff, and I think we'll see an exodus of many Boost 
 developers into D.
You guys gotta publish!
Baby steps Walter, baby steps... ;) -- - EricAnderton at yahoo
Apr 10 2007
prev sibling parent Pragma <ericanderton yahoo.removeme.com> writes:
Don Clugston wrote:
 The problem I was referring to, is: how to store both values, and 
 functions/operators, inside the tree. It seems to get messy very quickly.
It can, especially with the additional code you need to treat all those templates as specialized data types.
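(To make that concrete, a value/operator tree might look something like the sketch below. This is purely illustrative, not code from BLADE or Enki; Leaf and BinOp are made-up names.)

// every node carries a 'kind' tag so that consumers can static-if on it
template Leaf(int v)
{
    const char[] kind = "value";
    const int value = v;
}

template BinOp(char[] op, alias Left, alias Right)
{
    const char[] kind = "binop";
    const char[] operator = op;
    alias Left  left;
    alias Right right;
}

// (2 + 3) * 4
alias BinOp!("*", BinOp!("+", Leaf!(2), Leaf!(3)), Leaf!(4)) expr;

Walking such a tree means writing a separate recursive template for every pass over it, which is where the extra bookkeeping code comes from.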
 I meant for this application. There's no doubt they're indispensable in 
 other contexts.
Oh, my bad. Yea, it would probably be overkill for BLADE. :)
 Basically what you see here
 is a chunk of the as-of-yet-experimental compile-time Enki parser.  
 This piece parses the Zero-or-more expression part of the EBNF variant 
 that Enki supports.

 The templates evaluate to CTFE's that in turn make up the parser when 
 it's executed.  So there's several layers of compile-time evaluation 
 going on here.  Also, what's not obvious is that "char[] tokens" is 
 actually an *array of strings* that is stored in a single char[] 
 array; each string's length data is encoded as a size_t mapped onto 
 the appropriate number of chars.  The Bind!() expressions also map 
 parsed out data to a key/value set which is stored in a similar 
 fashion (char[] bindings).
Seriously cool! Seems like you're generating a tree of nested mixins?
Almost. The generate.ZeroOrMore returns an arbitrary string that may be another "map" or a chunk of runtime code; the value is placed in the "value" string that's passed in. The *rootmost* generator is what compiles all this stuff into an actual chunk of mixin-able code. It's analogous to how the current rendition of Enki works. It seems overkill, but it's needed so I can do some basic semantic analysis, and do things like declare binding vars. I'm still looking for ways to simplify the process.
 Anyway, I suspect this will really benefit from any compiler 
 improvements, eg CTFE support for AAs or nested functions. And obviously 
 AST macros.
Definitely for AA's but not so much for AST macros - I get the impression that AST manipulation will only be useful for D code, and not for completely arbitrary grammars like EBNF. Getting compile-time AA support would cut Enki's code size down by almost a third:

const char[] hackedArray = "\x00\x00\x00\x05hello\x00\x00\x00\x05cruel\x00\x00\x00\x05world";

Under the hood, this is what most of my data looks like. Dropping the support routines for manipulating such structures would help make things a lot less cumbersome. :(
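(A sketch of the kind of support routine that implies -- hypothetical code, assuming the four-char big-endian length prefix shown above. With real compile-time AAs, none of this bookkeeping would be needed.)

// fetch the index-th string out of a packed char[] block
char[] getEntry(char[] packed, uint index)
{
    uint pos = 0;
    for (uint i = 0; pos + 4 <= packed.length; i++)
    {
        uint len = (cast(uint)packed[pos]   << 24)
                 | (cast(uint)packed[pos+1] << 16)
                 | (cast(uint)packed[pos+2] << 8)
                 |  cast(uint)packed[pos+3];
        pos += 4;
        if (i == index)
            return packed[pos .. pos + len];
        pos += len;
    }
    return "";
}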
 I really think the new mixins + CTFE was a breakthrough. Publish some of 
 this stuff, and I think we'll see an exodus of many Boost developers 
 into D.
Agreed. CTFE frees us from many limitations imposed by templates, and the added flexibility of mixin() pretty much gives us the same strength as JavaScript's eval() statement, at compile time. It's a huge deal. It's possible that we might find a handful of interested devs in their camp, but I think we're still back to the same old problem as with the rest of C++: momentum. I sincerely doubt we'll see a mass exodus from one camp to this one regardless of how good a job is done here. Of course, I'll be happy to be wrong about that. :) -- - EricAnderton at yahoo
Apr 10 2007
prev sibling parent reply KlausO <oberhofer users.sf.net> writes:
Pragma wrote:
 Don Clugston wrote:
 Pragma wrote:
 Bruno Medeiros wrote:
 Pragma wrote:
This is a work of art Don - it's practically a compiler extension. Nice job. :)

For others that are interested in how this actually gets the job done, I found this in your documentation:

* THEORY:
* Expression templates are used to create an expression string of the form "(a+b*c)+d"
* and a tuple, the entries of which correspond to a, b, c, d, ...
* This string is converted to postfix. The postfix string is converted to
* a string containing x87 asm, which is then mixed into a function which accepts the tuple.
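(The infix-to-postfix step is the most conventional part. The sketch below is not BLADE's converter -- it only handles single-character operands and the four binary operators -- but it shows that a perfectly ordinary CTFE function is enough.)

// operator precedence: higher binds tighter
int prec(char c)
{
    if (c == '+' || c == '-') return 1;
    if (c == '*' || c == '/') return 2;
    return 0;
}

// convert an infix expression with single-char operands to postfix
char[] toPostfix(char[] expr)
{
    char[] output = "";
    char[] stack  = "";
    for (int i = 0; i < expr.length; i++)
    {
        char c = expr[i];
        if (c == '(')
            stack ~= c;
        else if (c == ')')
        {
            while (stack.length > 0 && stack[stack.length-1] != '(')
            {
                output ~= stack[stack.length-1];
                stack = stack[0 .. stack.length-1];
            }
            stack = stack[0 .. stack.length-1];   // discard the '('
        }
        else if (prec(c) > 0)
        {
            while (stack.length > 0 && prec(stack[stack.length-1]) >= prec(c))
            {
                output ~= stack[stack.length-1];
                stack = stack[0 .. stack.length-1];
            }
            stack ~= c;
        }
        else
            output ~= c;   // an operand: emit it directly
    }
    while (stack.length > 0)
    {
        output ~= stack[stack.length-1];
        stack = stack[0 .. stack.length-1];
    }
    return output;
}

// toPostfix("(a+b*c)+d") returns "abc*+d+"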
Hum, a minor question, is a string representation of the expressions better (easier to use, manipulate, etc.) than a tree representation?
I know Don wrote this lib, but I think I can answer this. In short: no. But it's really an unintentionally loaded question: there are some rather severe limitations as to what kind of data structures you can create at compile time. You're basically limited to tuples and strings, each of which have drawbacks of their own. You can create a tree using tuples, but then you run into problems with over-running the limitations on identifier length built into the compiler.
Actually I don't know how to make a tree nicely with tuples.
Here's an example (for grins):

import std.stdio;

template Node(char[] Data,Nodes...){
    alias Data data;
    alias Nodes nodes;
}

template PrintData(char[] parent){
    const char[] PrintData = "";
}

template PrintData(char[] parent,alias Node){
    static if(parent == ""){
        const char[] PrintData = Node.data ~ "\n" ~
            PrintData!(Node.data,Node.nodes);
    }
    else{
        const char[] PrintData = parent ~ "." ~ Node.data ~ "\n" ~
            PrintData!(parent ~ "." ~ Node.data,Node.nodes);
    }
}

template PrintData(char[] parent,alias Node,V...){
    const char[] PrintData = PrintData!(parent,Node) ~ PrintData!(parent,V);
}

// example "tree" structure
alias Node!("1",
         Node!("1"),
         Node!("2",
            Node!("1"),
            Node!("2")
         )
      ) dataset;

void main(){
    writefln("%s",PrintData!("",dataset));
}
  Strings are no where near as flexible out of the box, but
 pose no storage limitations, which gives a slight edge to 
 string-based representations of data.  With some clever coding, 
 strings can be made to store just about any structure imaginable 
 (even if it does make for some ugly code).
There's a couple of minor things, though:
(1) in the case of generating D code to mixin, no tree is required (the string just gets mixed back in again, so it might as well stay as a string).
(2) The "separate string and tuple" structure means the expression can be optimised without needing to move any of the values around; this creates less garbage for the compiler to optimise away.
(3) Some patterns are probably easier to find with a string than a tree.
(4) It's more natural for D coders to think in terms of arrays than lists (and there is much more language support).
(5) A string representation can easily express things like c=a+3*(a+b); where 'a' appears twice. (At present, there's no way to generate it, but theoretically the compiler could provide it in many cases.)

So -- if trees were available, I'm not sure if I'd use them, or not.
Hey pragma,

really cool, I've come up with a similar structure while experimenting with a PEG parser in D (see attachment) after I've read this article series on Codeproject:

http://www.codeproject.com/cpp/crafting_interpreter_p1.asp
http://www.codeproject.com/cpp/crafting_interpreter_p2.asp
http://www.codeproject.com/cpp/crafting_interpreter_p3.asp

The template system of D does an awesome job in keeping templated PEG grammars readable.

BTW: If you turn Enki into a PEG style parser I definitely throw my attempts into the dustbin :-)

Greets
KlausO
Apr 10 2007
parent reply Pragma <ericanderton yahoo.removeme.com> writes:
KlausO wrote:
 
 Hey pragma,
 
 really cool, I've come up with a similar structure while experimenting
 with a PEG parser in D (see attachment)
 after I've read this article series on
 Codeproject:
 
 http://www.codeproject.com/cpp/crafting_interpreter_p1.asp
 http://www.codeproject.com/cpp/crafting_interpreter_p2.asp
 http://www.codeproject.com/cpp/crafting_interpreter_p3.asp
 
 The template system of D does an awesome job in keeping
 templated PEG grammars readable.
Yes it does! Thanks for posting this - I didn't even know that article was there.
 BTW: If you turn Enki into a PEG style parser I definitely throw
 my attempts into the dustbin :-)
 Greets
Wow, that's one heck of an endorsement. Thanks, but don't throw anything out yet. This rendition of Enki is still a ways off though. FYI I plan on keeping Enki's internals as human-readable as possible by keeping it self-hosting. So there'll be two ways to utilize the toolkit: EBNF coding and "by hand".
 alias   Action!(
           And!(
             PlusRepeat!(EmailChar),
             Char!(' '), 
             PlusRepeat!(
               And!(
                 Or!(
                   In!(
                     Range!('a', 'z'),
                     Range!('A', 'Z'),
                     Range!('0', '9'),
                     Char!('_'),
                     Char!('%'),
                     Char!('-')
                   ),
                   Char!('.')
                 ),
                 Not!(EmailSuffix)
               )
             ),
             Char!('.'),
             EmailSuffix
           ),
           delegate void(char[] email) { writefln("<email:", email, ">"); }
         )
         Email;
I had an earlier cut that looked a lot like this. :) But there's a very subtle problem lurking in there. By making your entire grammar one monster-sized template instance, you'll run into DMD's identifier-length limit *fast*. As a result it failed when I tried to transcribe Enki's ENBF definition. That's why I wrap each rule as a CTFE-style function as it side-steps that issue rather nicely, without generating too many warts. -- - EricAnderton at yahoo
Apr 10 2007
next sibling parent reply KlausO <oberhofer users.sf.net> writes:
Pragma wrote:
 KlausO wrote:
 Hey pragma,

 really cool, I've come up with a similar structure while experimenting
 with a PEG parser in D (see attachment)
 after I've read this article series on
 Codeproject:

 http://www.codeproject.com/cpp/crafting_interpreter_p1.asp
 http://www.codeproject.com/cpp/crafting_interpreter_p2.asp
 http://www.codeproject.com/cpp/crafting_interpreter_p3.asp

 The template system of D does an awesome job in keeping
 templated PEG grammars readable.
Yes it does! Thanks for posting this - I didn't even know that article was there.
 BTW: If you turn Enki into a PEG style parser I definitely throw
 my attempts into the dustbin :-)
 Greets
Wow, that's one heck of an endorsement. Thanks, but don't throw anything out yet. This rendition of Enki is still a ways off though. FYI I plan on keeping Enki's internals as human-readable as possible by keeping it self-hosting. So there'll be two ways to utilize the toolkit: EBNF coding and "by hand".
Nice to hear that I hit your taste :-)
 
 alias   Action!(
           And!(
             PlusRepeat!(EmailChar),
             Char!(' '),             PlusRepeat!(
               And!(
                 Or!(
                   In!(
                     Range!('a', 'z'),
                     Range!('A', 'Z'),
                     Range!('0', '9'),
                     Char!('_'),
                     Char!('%'),
                     Char!('-')
                   ),
                   Char!('.')
                 ),
                 Not!(EmailSuffix)
               )
             ),
             Char!('.'),
             EmailSuffix
           ),
           delegate void(char[] email) { writefln("<email:", email, 
 ">"); }
         )
         Email;
I had an earlier cut that looked a lot like this. :) But there's a very subtle problem lurking in there. By making your entire grammar one monster-sized template instance, you'll run into DMD's identifier-length limit *fast*. As a result it failed when I tried to transcribe Enki's EBNF definition. That's why I wrap each rule as a CTFE-style function, as it side-steps that issue rather nicely, without generating too many warts.
Another issue I ran into is circular template dependencies. You could get nasty error messages like

dpeg.d(282): Error: forward reference to 'And!(Alternative,OptRepeat!(And!(WS,Char!('/'),WS,Alternative)),WS)'

Any idea how they could be resolved ?

FYI: Other good PEG references:
http://pdos.csail.mit.edu/~baford/packrat/

http://www.codeproject.com/csharp/cat.asp
very interesting had been
http://code.google.com/p/cat-language/wiki/HowTheInterpreterWorks

C++ templated parsers
http://www.codeproject.com/cpp/yard-xml-parser.asp
http://www.codeproject.com/cpp/biscuit.asp
Apr 10 2007
parent reply Pragma <ericanderton yahoo.removeme.com> writes:
KlausO wrote:
 Pragma wrote:
 KlausO wrote:

 alias   Action!(
           And!(
             PlusRepeat!(EmailChar),
             Char!(' '),             PlusRepeat!(
               And!(
                 Or!(
                   In!(
                     Range!('a', 'z'),
                     Range!('A', 'Z'),
                     Range!('0', '9'),
                     Char!('_'),
                     Char!('%'),
                     Char!('-')
                   ),
                   Char!('.')
                 ),
                 Not!(EmailSuffix)
               )
             ),
             Char!('.'),
             EmailSuffix
           ),
           delegate void(char[] email) { writefln("<email:", email, 
 ">"); }
         )
         Email;
I had an earlier cut that looked a lot like this. :) But there's a very subtle problem lurking in there. By making your entire grammar one monster-sized template instance, you'll run into DMD's identifier-length limit *fast*. As a result it failed when I tried to transcribe Enki's ENBF definition. That's why I wrap each rule as a CTFE-style function as it side-steps that issue rather nicely, without generating too many warts.
Another issue I ran into is circular template dependencies. You could get nasty error messages like dpeg.d(282): Error: forward reference to 'And!(Alternative,OptRepeat!(And!(WS,Char!('/'),WS,Alternative)),WS)' Any idea how they could be resolved ?
Yep, that one bit me too; just use the CTFE trick I mentioned. Wrapping each rule fixes this since DMD can resolve forward references to functions, but not with template instances.
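(In other words, the pattern is roughly the sketch below. The rule names are made up, and the And!/Optional!/Literal! combinators are assumed to have the same shape as in the ZeroOrMoreExpr sample earlier in the thread.)

// Each grammar rule is a plain function; the combinator template is only
// instantiated inside the body.  'Expression' is referenced before it is
// declared, which DMD accepts for functions -- the same reference buried
// in one big template identifier would trip the forward-reference error.
bool Alternative(inout char[] value, inout char[] bindings,
                 inout char[] tokens, inout char[] err)
{
    return And!(
        Expression,
        Optional!( And!( Literal!("/"), Alternative ) )
    )(value, bindings, tokens, err);
}

bool Expression(inout char[] value, inout char[] bindings,
                inout char[] tokens, inout char[] err)
{
    return Literal!("x")(value, bindings, tokens, err);
}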
 
 FYI:
 
 Other good PEG references:
 http://pdos.csail.mit.edu/~baford/packrat/
 

 http://www.codeproject.com/csharp/cat.asp
 very interesting had been
 http://code.google.com/p/cat-language/wiki/HowTheInterpreterWorks
 
 C++ templated parsers
 http://www.codeproject.com/cpp/yard-xml-parser.asp
 http://www.codeproject.com/cpp/biscuit.asp
Thanks. I can always use a little more research to lean on. -- - EricAnderton at yahoo
Apr 10 2007
parent BCS <BCS pathlink.com> writes:
 KlausO wrote:
 
 Pragma schrieb:
 I had an earlier cut that looked a lot like this. :)  But there's a 
 very subtle problem lurking in there.  By making your entire grammar 
 one monster-sized template instance, you'll run into DMD's 
 identifier-length limit *fast*.  As a result it failed when I tried 
 to transcribe Enki's ENBF definition.  That's why I wrap each rule as 
 a CTFE-style function as it side-steps that issue rather nicely, 
 without generating too many warts.
Another issue I ran into is circular template dependencies. You could get nasty error messages like
The way my dparse sidesteps this (both the id length limit and forward references) is to have the parser functions specialized only on the name of the reduction; the grammar is carried in an outer scope (the mixed-in template that is never used as an identifier). This also allows the inner template to specialize on anything that is defined.
Apr 10 2007
prev sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Pragma wrote:
 By making your entire grammar one 
 monster-sized template instance, you'll run into DMD's identifier-length 
 limit *fast*.  As a result it failed when I tried to transcribe Enki's 
 ENBF definition.  That's why I wrap each rule as a CTFE-style function 
 as it side-steps that issue rather nicely, without generating too many 
 warts.
One of the motivations for CTFE was to address that exact problem.
Apr 10 2007
prev sibling next sibling parent reply Witold Baryluk <baryluk mpi.int.pl> writes:
Hello,

Very good work. I'm impressed. Actually I was developing a very
similar program, but yours is practically all I was needing.
I'm developing a linear algebra package in D, and now I'm optimising
it for vector machines, so your BLAS1-like package will be
helpful. :)

Thx.

-- 
Witold Baryluk
Apr 03 2007
parent Don Clugston <dac nospam.com.au> writes:
Witold Baryluk wrote:
 Hello,
 
 Very good work. I'm impressed. Actually I was developing a very
 similar program, but yours is practically all I was needing.
 I'm developing a linear algebra package in D, and now I'm optimising
 it for vector machines, so your BLAS1-like package will be
 helpful. :)
Do you intend to implement BLAS2 and BLAS3-like functionality? I feel that I don't know enough about cache-efficient matrix blocking techniques to be confident in presenting code. I don't know how sophisticated the techniques are in libraries like ATLAS, but the ones used in Blitz++ should be easy to compete with.

Blade is still mostly proof-of-concept rather than being industrial-strength -- there are so many possibilities for improvement, it's not well tested, and I haven't even put any effort into making the code look nice. But as Davidl said, it shows that hard-core back-end optimisation can now be done in a library at compile time.

The code is quite modularised, so it would be straightforward to add a check for 'is it SSE-able' (ie, are all the vectors floats) and 'is it SSE2-able' (are all the vectors doubles), and if so, send them off into specialised assemblers, otherwise send them to the x87 one. The postfix-ing step will involve searching for *+ combinations where fma can be used.

The ability to know the vector length in many cases can be a huge advantage for SSE code, where loop unrolling by a factor of 2 or 4 is always necessary. I haven't got that far yet. I'm pretty sure that this type of technique can blow almost everything else out of the water. <g>
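(For what it's worth, the dispatch test itself is cheap to express. The following is only a sketch, not BLADE code; it assumes the tuple holds the vector operands, and makeSSE/makeX87 are hypothetical generator names.)

// true if every operand in the tuple is a float[] -- i.e. the whole
// expression could be routed to an SSE code generator
template allFloat(V...)
{
    static if (V.length == 0)
        const bool allFloat = true;
    else
        const bool allFloat = is(typeof(V[0][0]) == float)
                              && allFloat!(V[1 .. V.length]);
}

// inside the code generator:
//   static if (allFloat!(operands))  mixin(makeSSE(...));   // hypothetical
//   else                             mixin(makeX87(...));   // hypothetical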
Apr 04 2007
prev sibling next sibling parent Davidl <Davidl 126.com> writes:
I think compilers have felt a great challenge from compile-time back-end-like power.

Apr 03 2007
prev sibling next sibling parent Jascha Wetzel <"[firstname]" mainia.de> writes:
This is really neat.
I like that this basically is a compiler extension and it's still
a) well readable
b) produces simple error messages during compilation (if any)
c) introduces non-artificial syntax for the task

c) will actually depend more on the versatility of operator overloading
than the CTFE stuff.
a) and b) are so much better than what c++ template magic gave us (as
can be observed in boost).

Now I think there's only one requirement missing to make it a perfect
compiler extension, and that is to produce debuggable code.
Actually in this case, as the extension is supposed to generate mostly
single line expression statements, it might not be that important. But
if we consider multi-line DSL statements, a way to mark generated code
as belonging to one of the DSL source lines would be great.

I think a way for a CTFE function to receive the file+line where it was called
from would do the trick. In BLADE that information could be added to the
expression string, passed through to the postfix notation, and finally be
inserted into the ASM code with #line directives.
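(On the generator side that could be as little as the sketch below. itoa and withLineInfo are made-up helpers; the caller would have to pass __FILE__ and __LINE__ in explicitly, since a CTFE function can't currently discover its call site by itself.)

// turn a non-negative int into a string at compile time
char[] itoa(int n)
{
    char[] s = "";
    do {
        s = cast(char)('0' + n % 10) ~ s;
        n /= 10;
    } while (n > 0);
    return s;
}

// prefix generated code with a #line directive pointing at the DSL call site
char[] withLineInfo(char[] generatedCode, char[] file, int line)
{
    return "#line " ~ itoa(line) ~ " \"" ~ file ~ "\"\n" ~ generatedCode;
}

// at the call site, something like:
//   mixin(withLineInfo(makeAsm(expr), __FILE__, __LINE__));   // makeAsm is hypothetical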

Apr 05 2007
prev sibling next sibling parent reply Mikola Lysenko <mclysenk mtu.edu> writes:
Excellent work!  I have to say that I've had my misgivings about mixin 
statements, but I'm glad to see that BLADE and Don's hard work have 
proven me wrong.  Compile time vector expressions are desperately 
needed, and this library goes a long way to sorting things out.  As far 
as meta-type tricks go, this is by far the coolest I've ever seen in any 
  programming language.

One very interesting application I would like to see with this technique 
would be a compile time geometric algebra library.  If no one writes one 
by this summer, I will probably do it myself.  (Actually, when I heard 
the name BLADE, my heart skipped a beat thinking it was a GA library, 
but alas it is not...)  Maybe it is my unrestrained exuberance, I would 
guess that a strong D implementation should easily crush the leading C++ 
GA lib, GAIGEN in both performance and ease of use.

For reference, here is the GAIGEN homepage:
http://www.science.uva.nl/ga/gaigen/content_gaigen.html

-Mikola Lysenko
Apr 05 2007
next sibling parent torhu <fake address.dude> writes:
Mikola Lysenko wrote:
<snip>
 One very interesting application I would like to see with this technique 
 would be a compile time geometric algebra library.  If no one writes one 
 by this summer, I will probably do it myself.  (Actually, when I heard 
 the name BLADE, my heart skipped a beat thinking it was a GA library, 
 but alas it is not...)  Maybe it is my unrestrained exuberance, I would 
 guess that a strong D implementation should easily crush the leading C++ 
 GA lib, GAIGEN in both performance and ease of use.
 
Oh my GAD? Get it? ;)
Apr 05 2007
prev sibling parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Mikola Lysenko wrote:
 Excellent work!  I have to say that I've had my misgivings about mixin 
 statements, but I'm glad to see that BLADE and Don's hard work have 
 proven me wrong.  Compile time vector expressions are desperately 
 needed, and this library goes a long way to sorting things out.  As far 
 as meta-type tricks go, this is by far the coolest I've ever seen in any 
  programming language.
 
 One very interesting application I would like to see with this technique 
 would be a compile time geometric algebra library.  If no one writes one 
 by this summer, I will probably do it myself.  (Actually, when I heard 
 the name BLADE, my heart skipped a beat thinking it was a GA library, 
 but alas it is not...)  Maybe it is my unrestrained exuberance, I would 
 guess that a strong D implementation should easily crush the leading C++ 
 GA lib, GAIGEN in both performance and ease of use.
 
 For reference, here is the GAIGEN homepage:
 http://www.science.uva.nl/ga/gaigen/content_gaigen.html
 
 -Mikola Lysenko
Wow, that looks really sweet. Last I heard on GA was that it was good for working out the math, but too slow for real use. Now it's fast too apparently! Now I've really got to learn GA. And a D compile-time fu version would be neat.

One question: from the GAIGEN page it sounds almost like GAIGEN generates off-line a set of primitive operations that are efficient for the chosen specific GA domain (3D euclidean or whatever). If so, then that sounds pretty good to me. What's wrong with using an off-line approach? Are there some expressions that are just too difficult to optimize ahead of time without seeing the whole expression as it will actually be used?

Or is it more like GAIGEN creates the primitive operations for a particular GA like dot product, cross product, and matrix multiplies? If so then it still maybe leaves a lot of optimization opportunities on the floor. Like evaluating A*B*C*vec as ((A*B)*C)*vec vs. (A*(B*(C*vec))) can make a big difference if ABC are big matrices and vec is a small vector.

--bb
Apr 05 2007
parent Mikola Lysenko <mclysenk mtu.edu> writes:
Bill Baxter wrote:
 Mikola Lysenko wrote:
 Excellent work!  I have to say that I've had my misgivings about mixin 
 statements, but I'm glad to see that BLADE and Don's hard work have 
 proven me wrong.  Compile time vector expressions are desperately 
 needed, and this library goes a long way to sorting things out.  As 
 far as meta-type tricks go, this is by far the coolest I've ever seen 
 in any  programming language.

 One very interesting application I would like to see with this 
 technique would be a compile time geometric algebra library.  If no 
 one writes one by this summer, I will probably do it myself.  
 (Actually, when I heard the name BLADE, my heart skipped a beat 
 thinking it was a GA library, but alas it is not...)  Maybe it is my 
 unrestrained exuberance, I would guess that a strong D implementation 
 should easily crush the leading C++ GA lib, GAIGEN in both performance 
 and ease of use.

 For reference, here is the GAIGEN homepage:
 http://www.science.uva.nl/ga/gaigen/content_gaigen.html

 -Mikola Lysenko
Wow, that looks really sweet. Last I heard on GA was that it was good for working out the math, but too slow for real use. Now it's fast too apparently! Now I've really got to learn GA. And a D compile-time fu version would be neat. One question: from the GAIGEN page it sounds almost like GAIGEN generates off-line a set of primitive operations that are efficient for the chosen specific GA domain (3D euclidean or whatever). If so, then that sounds pretty good to me. What's wrong with using an off-line approach? Are there some expressions things that are just too difficult to optimize ahead of time without seeing the whole expression as it will actually used? Or is it more like GAIGEN creates the primitives operations for a particular GA like dot product, cross product, and matrix multiplies? If so then it still maybe leaves a lot of optimization opportunities on the floor. Like evaluating A*B*C*vec as ((A*B)*C)*vec vs. (A*(B*(C*vec))) can make a big difference if ABC are big matrices and vec is a small vector. --bb
As I understand it, GAIGEN uses an external program to generate the implementation, and the implementation uses template expressions to actually perform the optimizations. The process seems a bit contrived to me, but I'm sure they have their reasons. Also if you look at the performance statistics on that site, you will see that the geometric algebra implementation is actually SLOWER than the standard linear algebra (but not by much.) The main advantage of GA when programming is that the code is shorter and simpler.

Recently, GA has been getting some more widespread acceptance and commercial success. Take Geomerics for example; they're a small company that specializes in graphics and physics software based on GA-inspired algorithms. Here is their website:

http://www.geomerics.com

They've got a very impressive realtime radiosity engine that seems to have gotten some press in the last few weeks. There's also some interesting stuff in the technology section on their website.

Writing a GA library for D is definitely on my todo list (right after I get done with this semester of classes.) I will probably work on it this summer unless someone beats me to it.

-Mik
Apr 05 2007
prev sibling next sibling parent reply Dan <murpsoft hotmail.com> writes:
Don Clugston Wrote:
 this case, I thought it would be relevant to AST macro development.
 Also I posted this to get an idea if there was much interest in this 
 sort of thing -- I don't have a great deal of free time, so I want to 
 make good use of it. I only got the code working a few days ago.
 The speed of the code will likely be affected by how well DMD optimises 
 the tuple copying. I even haven't looked at what code it's actually 
 generating.
 
In essence, it's a *demonstration* that with all of D's features combined, one can write a fwicked-optimized compiler at compile time in D. He's also working towards it being Agner Fog's ideal of optimal; and is taking the right steps to do it. Agner Fog is quite respected for his work in the x86 family of assembly; and previously I haven't seen any program that can *generate* code sufficiently well to approach his optimal assembly source.
Apr 06 2007
parent Don Clugston <dac nospam.com.au> writes:
Dan wrote:
 Don Clugston Wrote:
 this case, I thought it would be relevant to AST macro development.
 Also I posted this to get an idea if there was much interest in this 
 sort of thing -- I don't have a great deal of free time, so I want to 
 make good use of it. I only got the code working a few days ago.
 The speed of the code will likely be affected by how well DMD optimises 
 the tuple copying. I even haven't looked at what code it's actually 
 generating.
In essence, it's a *demonstration* that with all of D's features combined, one can write a fwicked-optimized compiler at compile time in D.
Exactly.
 He's also working towards it being Agner Fog's ideal of optimal; and is taking
the right steps to do it.  Agner Fog is quite respected for his work in the x86
family of assembly; and previously I haven't seen any program that can
*generate* code sufficiently well to approach his optimal assembly source.
It might be because no-one has really tried. Agner told me that he has *never* been emailed by a compiler writer, even though he's tried to get their attention; I find that astonishing. Someone recently posted a link to an article by Michael Abrash about his software DirectX 3D renderer. There are some similarities to this: he's developed a basic, efficient code structure, and his renderer takes little fragments of code and inserts them into it. Working out what the code structure should be (which is what compilers generally try to do) is a fantastically difficult problem, but splicing fragments together is pretty easy. OTOH it must be pretty hard to identify patterns like this after the initial code generation step.
Apr 07 2007
prev sibling parent Dan <murpsoft hotmail.com> writes:
janderson Wrote:

 That would be nice and would help, however I think it'll take more than 
 that to convince a game company to switch to D, particularly ones using 
 XNA.
Umm... A runtime-hosted Microsoft product? For anything high-performance, that's suicide! No wonder the games of the last couple of years have come out bloated to 8Gb.
 It would have to be a startup company that is not working on the 360, perhaps a 
 Nintendo, Sony, Linux or perhaps Windows one.  I say perhaps, because if a 
 company is going to switch to anything it's probably XNA.  Now if we had 
 D for .NET, maybe the story would be slightly different.
Oh! DotNet? Another runtime-hosted Microsoft environment? Brilliant. I would hope you've suggested this *only* because programmers would want to target the XBox360. Anything running on Linux would *hugely* suffer in performance running on DotNet compared to D; as would anything on the x86-64 platform - the performance hit would still be considerable on Windows, but nonetheless tolerable on modern computers.
 I do like the idea of using D for games, I just see it as being a hard 
 sell.  Scripting and tools are probably the main gateway for D into the 
 game programming world.
Well, with Walnut 2.x en route and Blade slowly evolving... I'd say that within the next year we ought to see a stupidly easy-to-edit scripting engine along with incredibly fast vector and floating point math. If someone eventually provides runtime AST reflection, we may see SSE2 and provable security follow shortly thereafter.
 
 Once a few game companies start using it in their main project and can 
 demonstrate productivity/speed improvements then the wall may indeed 
 start to crumble.
 
 -Joel
That's probably a fair assessment.
Apr 10 2007