|
Archives
D Programming
D
D.gnu
digitalmars.D
digitalmars.D.bugs
digitalmars.D.dtl
digitalmars.D.dwt
digitalmars.D.announce
digitalmars.D.learn
digitalmars.D.debugger
C/C++ Programming
c++
c++.announce
c++.atl
c++.beta
c++.chat
c++.command-line
c++.dos
c++.dos.16-bits
c++.dos.32-bits
c++.idde
c++.mfc
c++.rtl
c++.stl
c++.stl.hp
c++.stl.port
c++.stl.sgi
c++.stlsoft
c++.windows
c++.windows.16-bits
c++.windows.32-bits
c++.wxwindows
digitalmars.empire
digitalmars.DMDScript
electronics
|
digitalmars.D - BLADE 0.2Alpha: Vector operations with mixins, expression templates,
↑ ↓ ← → Don Clugston <dac nospam.com.au> writes:
I have been trying to come up with a convincing use case for the new
mixins (and for metaprogramming in general). My best effort to date can
be found at:
http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d
It generates near-optimal x87 asm code for BLAS1-style basic vector
operations. 32, 64 and 80 bit vectors are all supported.
Compile with -version=BladeDebug to see the asm code which is generated.
Typical usage:
void main()
{
auto p = Vec([1.0L, 2, 18]); // a vector of 80-bit reals.
auto q = Vec([3.5L, 1.1, 3.8]); // ditto
auto r = Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
real d = dot(r, p+r+r);
ireal e = dot(r, z);
q -= ((r+p)*18.0L*314.1L - (p-r))* 35;
d = dot(r, p+r+r);
}
Notice that mixed-length operations (real[] + float[] - double[]) are
supported.
Like the C++ Blitz++ library, expression templates are used to convert
vector expressions into efficient element-wise operations. Unlike that
library, however, there is no reliance on the compiler's optimiser.
Instead, the expression template is manipulated as text, converted into
postfix, and then passed to a simple CTFE compile-time assembler, which
creates highly efficient asm code which is used as a mixin.
To understand the later parts of the code, you need some knowledge of
x87 assembler. In fact, you probably need to have read Agner Fog's
superb Pentium optimisation manual (www.agner.org).
Some observations:
* I was amazed at how simple the expression template code is (it is
somewhat cluttered by the code to check for real/imaginary type mismatch
errors).
* I've often read that the x87 floating-point stack is notoriously
difficult for compilers to write code for, but it works quite well in
this case.
* The major workarounds are:
- inability to use a tuple element directly from asm code (bug #1028);
- inability to define operators for built-in arrays (hence the use of
'Vec' wrappers).
- inability to index through a tuple in a CTFE function (solved by
converting types into a string).
* There have been mutterings about how unhygenic/dangerous the new
mixins are. In this case, the mixin forms the _entire_ body of the
function. This is an interesting situation which I think a language
purist will find more palatable.
Enjoy.
↑ ↓ ← → Walter Bright <newshound1 digitalmars.com> writes:
Don Clugston wrote:
I have been trying to come up with a convincing use case for the new
mixins (and for metaprogramming in general). My best effort to date can
be found at:
http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d
This is pretty incredible.
Can I beg you to write an article about it?
↑ ↓ ← → Don Clugston <dac nospam.com.au> writes:
Walter Bright wrote:
Don Clugston wrote:
I have been trying to come up with a convincing use case for the new
mixins (and for metaprogramming in general). My best effort to date
can be found at:
http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d
This is pretty incredible.
I thought you'd like it. The sheer power of D these days...
"Flies like a rocket, steers like a bicycle."
Can I beg you to write an article about it?
OK. Are you thinking somewhere like DDJ?
It's nicely focused (central idea being something like "D's
metaprogramming is so powerful, that compiler back-end optimisation can
easily be performed in compile-time libraries") -- but it would require
loads of explanation of D's coolest features. Could be a challenge to
make it easily readable.
But with the proposed changes to 'const', macros, and previous
discussion about whether tuples should be automatically flattened, I
have doubts over how stable that part of the language is. I wouldn't
want to write something that didn't compile any more, or looked
hopelessly obsolete by the time it was published.
I've been bitten before -- CTFE blasted almost all of my previous
metaprogramming code into the stone age <g>. What's the shelf life of
this stuff?.
↑ ↓ ← → Georg Wrede <georg nospam.org> writes:
Don Clugston wrote:
Could be a challenge to make it easily readable.
I've some experience proofreading and commenting similar articles. Only
too happy to oblige!
Also, DDJ readers may not be the CUJ crowd, but still, they can handle
advanced stuff. Besides, some similar magazines help you by rewriting
the article anyway, because then it has the same overall "feel" that
their other articles. Articles in SciAm for example feel the same, no
matter who or from what continent is presented as the author.
↑ ↓ ← → Walter Bright <newshound1 digitalmars.com> writes:
Don Clugston wrote:
Walter Bright wrote:
Can I beg you to write an article about it?
OK. Are you thinking somewhere like DDJ?
Anywhere!
It's nicely focused (central idea being something like "D's
metaprogramming is so powerful, that compiler back-end optimisation can
easily be performed in compile-time libraries") -- but it would require
loads of explanation of D's coolest features. Could be a challenge to
make it easily readable.
But with the proposed changes to 'const', macros, and previous
discussion about whether tuples should be automatically flattened, I
have doubts over how stable that part of the language is. I wouldn't
want to write something that didn't compile any more, or looked
hopelessly obsolete by the time it was published.
I've been bitten before -- CTFE blasted almost all of my previous
metaprogramming code into the stone age <g>. What's the shelf life of
this stuff?.
Good question. I've already fixed #1028. But even if it looks stone age
6 months from now, it's still a decade ahead of any other language.
↑ ↓ ← → "David B. Held" <dheld codelogicconsulting.com> writes:
Don Clugston wrote:
[...]
But with the proposed changes to 'const', macros, and previous
discussion about whether tuples should be automatically flattened, I
have doubts over how stable that part of the language is. I wouldn't
want to write something that didn't compile any more, or looked
hopelessly obsolete by the time it was published.
I've been bitten before -- CTFE blasted almost all of my previous
metaprogramming code into the stone age <g>. What's the shelf life of
this stuff?.
Well, hopefully, if Walter breaks your code again, it will be to make it
even more elegant (although, that will be a bit of a trick, I admit).
Eventually, the language will be so powerful that you can't make your
code any more compact, and then you won't have to worry about it getting
broken. To be honest, I'd like to see an article about this in CUJ. I
don't know if they would publish it or not, but if they did, I guarantee
it would make more than a few people take a hard look at D.
Dave
↑ ↓ ← → Walter Bright <newshound1 digitalmars.com> writes:
David B. Held wrote:
Don Clugston wrote:
[...]
But with the proposed changes to 'const', macros, and previous
discussion about whether tuples should be automatically flattened, I
have doubts over how stable that part of the language is. I wouldn't
want to write something that didn't compile any more, or looked
hopelessly obsolete by the time it was published.
I've been bitten before -- CTFE blasted almost all of my previous
metaprogramming code into the stone age <g>. What's the shelf life of
this stuff?.
Well, hopefully, if Walter breaks your code again, it will be to make it
even more elegant (although, that will be a bit of a trick, I admit).
Eventually, the language will be so powerful that you can't make your
code any more compact, and then you won't have to worry about it getting
broken. To be honest, I'd like to see an article about this in CUJ. I
don't know if they would publish it or not, but if they did, I guarantee
it would make more than a few people take a hard look at D.
I'm pretty sure DDJ would publish it, and it would reach a much wider
audience than CUJ, which has disappeared <g>.
Anyhow, Don, this kind of stuff needs to reach a much wider audience
than a random posting in this n.g. would get.
↑ ↓ ← → Dan <murpsoft hotmail.com> writes:
Current compilers tend to dump IR's to a buffer and then adjust them according
to parser rules. The wrong approach.
I totally want to see this kind of programming get implemented into the
compiler itself. I know it's too much to ask, but if DMD could generate
optimal floating point math and cache pipelines, it would rend the gaming
industry from C++'s hands REAL quick like.
A C++ programmer can say:
"oh yeah, it's a minor upgrade made for my convenience, but I'm already
conveniently comfortable with C++"...
Until a decision maker sees that D performs 20% better, dramatically reduces
debugging time, cuts source code volume, and has better version control.
↑ ↓ ← → janderson <askme me.com> writes:
Dan wrote:
Current compilers tend to dump IR's to a buffer and then adjust them according
to parser rules. The wrong approach.
I totally want to see this kind of programming get implemented into the
compiler itself. I know it's too much to ask, but if DMD could generate
optimal floating point math and cache pipelines, it would rend the gaming
industry from C++'s hands REAL quick like.
A C++ programmer can say:
"oh yeah, it's a minor upgrade made for my convenience, but I'm already
conveniently comfortable with C++"...
Until a decision maker sees that D performs 20% better, dramatically reduces
debugging time, cuts source code volume, and has better version control.
That would be nice and would help however I think it'll take more then
that to convince a Game company to switch to D, particularly ones using
XNA.
It would have to be a startup company that is not working 360, perhaps a
Nintendo, sony, Linux or perhaps windows. I say perhaps, because if a
company is going to switch to anything its probably XNA. Now if we had
D for Net, maybe the story would be slightly different.
I do like the idea of using D for games, I just see it as being a hard
sell. Scripting and tools are probably the main gateway for D into the
game programming world.
Once a few game companies start using it in their main project and can
demonstrate productivity/speed improvements then the wall may indeed
start to crumble.
-Joel
↑ ↓ ← → Don Clugston <dac nospam.com.au> writes:
Walter Bright wrote:
David B. Held wrote:
Don Clugston wrote:
[...]
But with the proposed changes to 'const', macros, and previous
discussion about whether tuples should be automatically flattened, I
have doubts over how stable that part of the language is. I wouldn't
want to write something that didn't compile any more, or looked
hopelessly obsolete by the time it was published.
I've been bitten before -- CTFE blasted almost all of my previous
metaprogramming code into the stone age <g>. What's the shelf life of
this stuff?.
Well, hopefully, if Walter breaks your code again, it will be to make
it even more elegant (although, that will be a bit of a trick, I
admit). Eventually, the language will be so powerful that you can't
make your code any more compact, and then you won't have to worry
about it getting broken. To be honest, I'd like to see an article
about this in CUJ. I don't know if they would publish it or not, but
if they did, I guarantee it would make more than a few people take a
hard look at D.
I'm pretty sure DDJ would publish it, and it would reach a much wider
audience than CUJ, which has disappeared <g>.
Anyhow, Don, this kind of stuff needs to reach a much wider audience
than a random posting in this n.g. would get.
Ooooooh....
I just updated the code, adding a code generator for D code, for when
x87 is unavailable. I could not believe how easy it was. It was about 30
lines of trivial code. Yet directly generating a 'for' loop is still
better than what Blitz++ does.
I reckon it'd take less than a week to implement everything those
libraries do.
The language synergy of tuples, mixins, CTFE, builtin strings, the ~
operator, constant folding, array slicing, template string parameters,
type deduction, defined inline asm, static if, is() expressions... is
astonishing. *All* of them are required in order for this to work.
↑ ↓ ← → Georg Wrede <georg nospam.org> writes:
Don Clugston wrote:
Ooooooh....
I just updated the code, adding a code generator for D code, for when
x87 is unavailable. I could not believe how easy it was. It was about 30
lines of trivial code. Yet directly generating a 'for' loop is still
better than what Blitz++ does.
I reckon it'd take less than a week to implement everything those
libraries do.
The language synergy of tuples, mixins, CTFE, builtin strings, the ~
operator, constant folding, array slicing, template string parameters,
type deduction, defined inline asm, static if, is() expressions... is
astonishing. *All* of them are required in order for this to work.
Gees, Don,
the above quote would be the perfect ingress (or at least a "quote box")
in a DDJ article!!!!
I'm clenching the chair so I wouldn't buy the tickets to come to your
place and write it together with you, right now!
↑ ↓ ← → Georg Wrede <georg nospam.org> writes:
Quoting from the page:
If you have an article idea, you should first send us an article
proposal. This is a short statement (about one page) that says what your
article will be about. You can mail it to our <a href="address.htm">office</a>
or e-mail it to <a href="mailto:editors ddj.com">editors ddj.com</a>.
Be sure to give us several ways to contact you (such as phone number,
e-mail address, fax number, and mailing address). We'll look at your
idea and get back to you, usually within a month.</p>
<p>Although many people simply write their entire article and send it
to us, there are several *advantages* to sending us a proposal. We can
look at your idea and possibly suggest a particular slant you hadn't
considered. We can also help you to organize the article.
Later, when you start writing the article itself:
Write as if you are giving a brief, informal talk to a group of
coworkers. You want to be concise and well organized because you don't
want to waste their time, but you don't want to be too stuffy or
formal because you know them.
Ah, and to give you more of the "feel" of being an article author,
here's another quote from the same page:
Illustrations and screen captures offer an opportunity to lure
anyone looking through an issue into reading your article. They
can also provide succinct summaries of complex ideas. Since we
redraw most figures anyway, to conform to our size and style
requirements,
you should not worry about making your images very high quality.
Many authors prefer to sketch them by hand and fax or mail them.
Heh, especially the last sentence. The same goes for the text itself.
The point being, from the magazine's point of view, most stuff sent to
them is by teenagers, idiots, mentals, over-the-hill old-timers -- and
only a handful is stuff they'd be interested in. Now, for those few,
they'd kill, lick your boots, or even sharpen your pencil for you.
As to whether this would interest them, all you have to send them is the
above mentioned Ingress. Let them take over from that. Simply naming
Walter as a reference should get you a definite YES/NO from them. And
once you get the YES, they'll hold your hand through and through.
---
For you personally, having your article in DDJ is of course "nice". But
the real benefit is that then you've done it, been there. So, by the
time D advances yet another bit, and your "template jujitsu" turns into
"template nuke" (as perceived from the audience -- you'll never notice
the change yourself!), it's gonna be a lot easier to jot up something
for them again. Or for the next magazine. (Ahem, I'm ashamed to say, but
at least as I experienced, my first article was a bit like "doing it for
the first time with a woman". Once you get over it, the next one is a
piece of cake.)
Thusly, both now and in the future, you're in a position to do D a /lot/
of good -- with effort that's actually less than some of your better D
forages to date.
PS, if they get back to you within /less/ than 4 weeks, they're
drooling. Unless they've read this post, of course. ;-)
↑ ↓ ← → Walter Bright <newshound1 digitalmars.com> writes:
Don Clugston wrote:
Ooooooh....
I just updated the code, adding a code generator for D code, for when
x87 is unavailable. I could not believe how easy it was. It was about 30
lines of trivial code. Yet directly generating a 'for' loop is still
better than what Blitz++ does.
I reckon it'd take less than a week to implement everything those
libraries do.
The language synergy of tuples, mixins, CTFE, builtin strings, the ~
operator, constant folding, array slicing, template string parameters,
type deduction, defined inline asm, static if, is() expressions... is
astonishing. *All* of them are required in order for this to work.
Just wait 'till you get AST macros!
But this stuff is *awesome*. Scratch DDJ, I don't want to wait 6 months.
This needs to get out now, - I suggest www.artima.com, followed up with
dzone, slashdot, and digg.
It would also be a real treat to have you present this at Amazon's D
conference in August.
↑ ↓ ← → Dan <murpsoft hotmail.com> writes:
Walter Bright Wrote:
Just wait 'till you get AST macros!
But this stuff is *awesome*. Scratch DDJ, I don't want to wait 6 months.
This needs to get out now, - I suggest www.artima.com, followed up with
dzone, slashdot, and digg.
It would also be a real treat to have you present this at Amazon's D
conference in August.
Wow. That's the most excited I've seen Walter. : )
I'm going to have to read and understand your code. I can tell.
↑ ↓ ← → Georg Wrede <georg nospam.org> writes:
Dan wrote:
Walter Bright Wrote:
Just wait 'till you get AST macros!
But this stuff is *awesome*. Scratch DDJ, I don't want to wait 6 months.
This needs to get out now, - I suggest www.artima.com, followed up with
dzone, slashdot, and digg.
It would also be a real treat to have you present this at Amazon's D
conference in August.
Wow. That's the most excited I've seen Walter. : )
LOL, have to admit I agree!!!
↑ ↓ ← → Bill Baxter <dnewsgroup billbaxter.com> writes:
Georg Wrede wrote:
Dan wrote:
Walter Bright Wrote:
Just wait 'till you get AST macros!
But this stuff is *awesome*. Scratch DDJ, I don't want to wait 6
months. This needs to get out now, - I suggest www.artima.com,
followed up with dzone, slashdot, and digg.
It would also be a real treat to have you present this at Amazon's D
conference in August.
Wow. That's the most excited I've seen Walter. : )
LOL, have to admit I agree!!!
It really is very cool and all, but did I just miss the benchmarks? I
need numbers, man! You can generate all the assembly you want, but at
the end of the day its the numbers that people care about.
--bb
↑ ↓ ← → Don Clugston <dac nospam.com.au> writes:
Bill Baxter wrote:
Georg Wrede wrote:
Dan wrote:
Walter Bright Wrote:
Just wait 'till you get AST macros!
But this stuff is *awesome*. Scratch DDJ, I don't want to wait 6
months. This needs to get out now, - I suggest www.artima.com,
followed up with dzone, slashdot, and digg.
It would also be a real treat to have you present this at Amazon's D
conference in August.
Wow. That's the most excited I've seen Walter. : )
LOL, have to admit I agree!!!
It really is very cool and all, but did I just miss the benchmarks? I
need numbers, man! You can generate all the assembly you want, but at
the end of the day its the numbers that people care about.
--bb
You're right, but it wasn't posted as a finished product!
I post this sort of thing every now and again, largely to give Walter
some feedback as to what the language is doing at the bleeding edge. In
this case, I thought it would be relevant to AST macro development.
Also I posted this to get an idea if there was much interest in this
sort of thing -- I don't have a great deal of free time, so I want to
make good use of it. I only got the code working a few days ago.
The speed of the code will likely be affected by how well DMD optimises
the tuple copying. I even haven't looked at what code it's actually
generating.
↑ ↓ ← → Walter Bright <newshound1 digitalmars.com> writes:
Don Clugston wrote:
* I've often read that the x87 floating-point stack is notoriously
difficult for compilers to write code for, but it works quite well in
this case.
The hard part is doing register allocation. Generating code for the
stack machine is trivial. There are some complications for dealing with
87 stack overflows.
↑ ↓ ← → Pragma <ericanderton yahoo.removeme.com> writes:
Don Clugston wrote:
I have been trying to come up with a convincing use case for the new
mixins (and for metaprogramming in general). My best effort to date can
be found at:
http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d
It generates near-optimal x87 asm code for BLAS1-style basic vector
operations. 32, 64 and 80 bit vectors are all supported.
Compile with -version=BladeDebug to see the asm code which is generated.
Typical usage:
void main()
{
auto p = Vec([1.0L, 2, 18]); // a vector of 80-bit reals.
auto q = Vec([3.5L, 1.1, 3.8]); // ditto
auto r = Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
real d = dot(r, p+r+r);
ireal e = dot(r, z);
q -= ((r+p)*18.0L*314.1L - (p-r))* 35;
d = dot(r, p+r+r);
}
Notice that mixed-length operations (real[] + float[] - double[]) are
supported.
Like the C++ Blitz++ library, expression templates are used to convert
vector expressions into efficient element-wise operations. Unlike that
library, however, there is no reliance on the compiler's optimiser.
Instead, the expression template is manipulated as text, converted into
postfix, and then passed to a simple CTFE compile-time assembler, which
creates highly efficient asm code which is used as a mixin.
To understand the later parts of the code, you need some knowledge of
x87 assembler. In fact, you probably need to have read Agner Fog's
superb Pentium optimisation manual (www.agner.org).
Some observations:
* I was amazed at how simple the expression template code is (it is
somewhat cluttered by the code to check for real/imaginary type mismatch
errors).
* I've often read that the x87 floating-point stack is notoriously
difficult for compilers to write code for, but it works quite well in
this case.
* The major workarounds are:
- inability to use a tuple element directly from asm code (bug #1028);
- inability to define operators for built-in arrays (hence the use of
'Vec' wrappers).
- inability to index through a tuple in a CTFE function (solved by
converting types into a string).
* There have been mutterings about how unhygenic/dangerous the new
mixins are. In this case, the mixin forms the _entire_ body of the
function. This is an interesting situation which I think a language
purist will find more palatable.
Enjoy.
This is a work of art Don - it's practically a compiler extension. Nice job. :)
For others that are interested in how this actually gets the job done, I found
this in your documentation:
* THEORY:
* Expression templates are used to create an expression string of the form
"(a+b*c)+d"
* and a tuple, the entries of which correspond to a, b, c, d, ...
* This string is converted to postfix. The postfix string is converted to
* a string containing x87 asm, which is then mixed into a function which
accepts the tuple.
--
- EricAnderton at yahoo
↑ ↓ ← → Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Pragma wrote:
Don Clugston wrote:
I have been trying to come up with a convincing use case for the new
mixins (and for metaprogramming in general). My best effort to date
can be found at:
http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d
It generates near-optimal x87 asm code for BLAS1-style basic vector
operations. 32, 64 and 80 bit vectors are all supported.
Compile with -version=BladeDebug to see the asm code which is generated.
Typical usage:
void main()
{
auto p = Vec([1.0L, 2, 18]); // a vector of 80-bit reals.
auto q = Vec([3.5L, 1.1, 3.8]); // ditto
auto r = Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
real d = dot(r, p+r+r);
ireal e = dot(r, z);
q -= ((r+p)*18.0L*314.1L - (p-r))* 35;
d = dot(r, p+r+r);
}
Notice that mixed-length operations (real[] + float[] - double[]) are
supported.
Like the C++ Blitz++ library, expression templates are used to convert
vector expressions into efficient element-wise operations. Unlike that
library, however, there is no reliance on the compiler's optimiser.
Instead, the expression template is manipulated as text, converted
into postfix, and then passed to a simple CTFE compile-time assembler,
which creates highly efficient asm code which is used as a mixin.
To understand the later parts of the code, you need some knowledge of
x87 assembler. In fact, you probably need to have read Agner Fog's
superb Pentium optimisation manual (www.agner.org).
Some observations:
* I was amazed at how simple the expression template code is (it is
somewhat cluttered by the code to check for real/imaginary type
mismatch errors).
* I've often read that the x87 floating-point stack is notoriously
difficult for compilers to write code for, but it works quite well in
this case.
* The major workarounds are:
- inability to use a tuple element directly from asm code (bug #1028);
- inability to define operators for built-in arrays (hence the use of
'Vec' wrappers).
- inability to index through a tuple in a CTFE function (solved by
converting types into a string).
* There have been mutterings about how unhygenic/dangerous the new
mixins are. In this case, the mixin forms the _entire_ body of the
function. This is an interesting situation which I think a language
purist will find more palatable.
Enjoy.
This is a work of art Don - it's practically a compiler extension. Nice
job. :)
For others that are interested in how this actually gets the job done, I
found this in your documentation:
* THEORY:
* Expression templates are used to create an expression string of the
form "(a+b*c)+d"
* and a tuple, the entries of which correspond to a, b, c, d, ...
* This string is converted to postfix. The postfix string is converted to
* a string containing x87 asm, which is then mixed into a function which
accepts the tuple.
Hum, a minor question, is a string representation of the expressions
better (easier to use, manipulate, etc.) than a tree representation?
--
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
↑ ↓ ← → Pragma <ericanderton yahoo.removeme.com> writes:
Bruno Medeiros wrote:
Pragma wrote:
Don Clugston wrote:
I have been trying to come up with a convincing use case for the new
mixins (and for metaprogramming in general). My best effort to date
can be found at:
http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d
It generates near-optimal x87 asm code for BLAS1-style basic vector
operations. 32, 64 and 80 bit vectors are all supported.
Compile with -version=BladeDebug to see the asm code which is generated.
Typical usage:
void main()
{
auto p = Vec([1.0L, 2, 18]); // a vector of 80-bit reals.
auto q = Vec([3.5L, 1.1, 3.8]); // ditto
auto r = Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
real d = dot(r, p+r+r);
ireal e = dot(r, z);
q -= ((r+p)*18.0L*314.1L - (p-r))* 35;
d = dot(r, p+r+r);
}
Notice that mixed-length operations (real[] + float[] - double[]) are
supported.
Like the C++ Blitz++ library, expression templates are used to
convert vector expressions into efficient element-wise operations.
Unlike that library, however, there is no reliance on the compiler's
optimiser. Instead, the expression template is manipulated as text,
converted into postfix, and then passed to a simple CTFE compile-time
assembler, which creates highly efficient asm code which is used as a
mixin.
To understand the later parts of the code, you need some knowledge of
x87 assembler. In fact, you probably need to have read Agner Fog's
superb Pentium optimisation manual (www.agner.org).
Some observations:
* I was amazed at how simple the expression template code is (it is
somewhat cluttered by the code to check for real/imaginary type
mismatch errors).
* I've often read that the x87 floating-point stack is notoriously
difficult for compilers to write code for, but it works quite well in
this case.
* The major workarounds are:
- inability to use a tuple element directly from asm code (bug #1028);
- inability to define operators for built-in arrays (hence the use of
'Vec' wrappers).
- inability to index through a tuple in a CTFE function (solved by
converting types into a string).
* There have been mutterings about how unhygenic/dangerous the new
mixins are. In this case, the mixin forms the _entire_ body of the
function. This is an interesting situation which I think a language
purist will find more palatable.
Enjoy.
This is a work of art Don - it's practically a compiler extension.
Nice job. :)
For others that are interested in how this actually gets the job done,
I found this in your documentation:
* THEORY:
* Expression templates are used to create an expression string of the
form "(a+b*c)+d"
* and a tuple, the entries of which correspond to a, b, c, d, ...
* This string is converted to postfix. The postfix string is converted to
* a string containing x87 asm, which is then mixed into a function
which accepts the tuple.
Hum, a minor question, is a string representation of the expressions
better (easier to use, manipulate, etc.) than a tree representation?
I know Don wrote this lib, but I think I can answer this.
In short: no. But it's really an unintentionally loaded question: there are
some rather severe limitations as to what
kind of data structures you can create at compile time. You're basically
limited to tuples and strings, each of which
have drawbacks of their own.
You can create a tree using tuples, but then you run into problems with
over-running the limitations on identifier
length built into the compiler. Strings are no where near as flexible out of
the box, but pose no storage limitations,
which gives a slight edge to string-based representations of data. With some
clever coding, strings can be made to
store just about any structure imaginable (even if it does make for some ugly
code).
--
- EricAnderton at yahoo
↑ ↓ ← → Dan <murpsoft hotmail.com> writes:
Pragma Wrote:
In short: no. But it's really an unintentionally loaded question: there are
some rather severe limitations as to what
kind of data structures you can create at compile time. You're basically
limited to tuples and strings, each of which
have drawbacks of their own.
You can create a tree using tuples, but then you run into problems with
over-running the limitations on identifier
length built into the compiler. Strings are no where near as flexible out of
the box, but pose no storage limitations,
which gives a slight edge to string-based representations of data. With some
clever coding, strings can be made to
store just about any structure imaginable (even if it does make for some ugly
code).
Well said. Once the AST is reflected at compile time, I'm sure the code can be
made vastly more refined; and perhaps additional code generators can be
developed and *hopefully* integrated into D.
Particular generators that spark my interest tend to have to do with x87, SSE
extensions, and x86-64. Most compilers to date don't properly use these
functionalities. Having them integrated into D Agner Fog-optimally would
hugely replace alot of C++ gaming, video, rendering and graphics engine code.
*bigsmile
↑ ↓ ← → Dave <Dave_member pathlink.com> writes:
Dan wrote:
Pragma Wrote:
In short: no. But it's really an unintentionally loaded question: there are
some rather severe limitations as to what
kind of data structures you can create at compile time. You're basically
limited to tuples and strings, each of which
have drawbacks of their own.
You can create a tree using tuples, but then you run into problems with
over-running the limitations on identifier
length built into the compiler. Strings are no where near as flexible out of
the box, but pose no storage limitations,
which gives a slight edge to string-based representations of data. With some
clever coding, strings can be made to
store just about any structure imaginable (even if it does make for some ugly
code).
Well said. Once the AST is reflected at compile time, I'm sure the code can
be made vastly more refined; and perhaps additional code generators can be
developed and *hopefully* integrated into D.
Particular generators that spark my interest tend to have to do with x87, SSE
extensions, and x86-64. Most compilers to date don't properly use these
functionalities. Having them integrated into D Agner Fog-optimally would
hugely replace alot of C++ gaming, video, rendering and graphics engine code.
*bigsmile
Hmmm - I wonder if a lot of what Don's been doing is serving as a "proof of
concept" for the D 2.0
compiler, to be written in D of course <g>
(It just seems like a lot of the new D features fit so well with compiler
development. Especially
for things like "proving constness", etc.)
- Dave
↑ ↓ ← → Pragma <ericanderton yahoo.removeme.com> writes:
Dan wrote:
Pragma Wrote:
In short: no. But it's really an unintentionally loaded question: there are
some rather severe limitations as to what
kind of data structures you can create at compile time. You're basically
limited to tuples and strings, each of which
have drawbacks of their own.
You can create a tree using tuples, but then you run into problems with
over-running the limitations on identifier
length built into the compiler. Strings are no where near as flexible out of
the box, but pose no storage limitations,
which gives a slight edge to string-based representations of data. With some
clever coding, strings can be made to
store just about any structure imaginable (even if it does make for some ugly
code).
Well said. Once the AST is reflected at compile time, I'm sure the code can
be made vastly more refined; and perhaps additional code generators can be
developed and *hopefully* integrated into D.
Thanks. But evaluating the AST will only take things so far. There is a huge
gulf of functionality that stands between
where we are now (at compile time) and what D would be like as a full-on
scripting language (not that I'm advocating
that it be pushed that far). Much of what sits in the middle (namely structs,
multi-dimensional arrays and maps) can go
a long way towards *analyzing* that tree for code generation and more.
Particular generators that spark my interest tend to have to do with x87, SSE
extensions, and x86-64. Most compilers to date don't properly use these
functionalities. Having them integrated into D Agner Fog-optimally would
hugely replace alot of C++ gaming, video, rendering and graphics engine code.
Anyway, I'm grinning too. I can't wait to see what comes up next. So far,
converting Enki to a compile-time generator
is proving to be a huge challenge, but workable.
--
- EricAnderton at yahoo
↑ ↓ ← → Paul Findlay <r.lph50+d gmail.com> writes:
Particular generators that spark my interest tend to have to do with x87,
SSE extensions, and x86-64. Most compilers to date don't properly use
these functionalities. Having them integrated into D Agner Fog-optimally
would hugely replace alot of C++ gaming, video, rendering and graphics
engine code.
64-bit/128-bit chunks at a time
- Paul
↑ ↓ ← → Dan <murpsoft hotmail.com> writes:
Paul Findlay Wrote:
Particular generators that spark my interest tend to have to do with x87,
SSE extensions, and x86-64. Most compilers to date don't properly use
these functionalities. Having them integrated into D Agner Fog-optimally
would hugely replace alot of C++ gaming, video, rendering and graphics
engine code.
64-bit/128-bit chunks at a time
Yeah, pretty much any large (>1kb) buffer copy algorithm runs fastest through
SSE2 ever since 128-bit came out. The raw power overcomes the need for
tweaking the loop instead of just using rep movsd.
I think the asm guys have the best code we could hope for already written up on
a few pages. Agner Fog, Paul Hsieh et al. If we could leech their strategy
(if meaning legal) then I can see a good use for such a thing. I guess we just
ought to wait for AST reflection.
↑ ↓ ← → Don Clugston <dac nospam.com.au> writes:
Pragma wrote:
Bruno Medeiros wrote:
Pragma wrote:
Don Clugston wrote:
I have been trying to come up with a convincing use case for the new
mixins (and for metaprogramming in general). My best effort to date
can be found at:
http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d
It generates near-optimal x87 asm code for BLAS1-style basic vector
operations. 32, 64 and 80 bit vectors are all supported.
Compile with -version=BladeDebug to see the asm code which is
generated.
Typical usage:
void main()
{
auto p = Vec([1.0L, 2, 18]); // a vector of 80-bit reals.
auto q = Vec([3.5L, 1.1, 3.8]); // ditto
auto r = Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
real d = dot(r, p+r+r);
ireal e = dot(r, z);
q -= ((r+p)*18.0L*314.1L - (p-r))* 35;
d = dot(r, p+r+r);
}
Notice that mixed-length operations (real[] + float[] - double[])
are supported.
Like the C++ Blitz++ library, expression templates are used to
convert vector expressions into efficient element-wise operations.
Unlike that library, however, there is no reliance on the compiler's
optimiser. Instead, the expression template is manipulated as text,
converted into postfix, and then passed to a simple CTFE
compile-time assembler, which creates highly efficient asm code
which is used as a mixin.
To understand the later parts of the code, you need some knowledge
of x87 assembler. In fact, you probably need to have read Agner
Fog's superb Pentium optimisation manual (www.agner.org).
Some observations:
* I was amazed at how simple the expression template code is (it is
somewhat cluttered by the code to check for real/imaginary type
mismatch errors).
* I've often read that the x87 floating-point stack is notoriously
difficult for compilers to write code for, but it works quite well
in this case.
* The major workarounds are:
- inability to use a tuple element directly from asm code (bug #1028);
- inability to define operators for built-in arrays (hence the use
of 'Vec' wrappers).
- inability to index through a tuple in a CTFE function (solved by
converting types into a string).
* There have been mutterings about how unhygenic/dangerous the new
mixins are. In this case, the mixin forms the _entire_ body of the
function. This is an interesting situation which I think a language
purist will find more palatable.
Enjoy.
This is a work of art Don - it's practically a compiler extension.
Nice job. :)
For others that are interested in how this actually gets the job
done, I found this in your documentation:
* THEORY:
* Expression templates are used to create an expression string of the
form "(a+b*c)+d"
* and a tuple, the entries of which correspond to a, b, c, d, ...
* This string is converted to postfix. The postfix string is
converted to
* a string containing x87 asm, which is then mixed into a function
which accepts the tuple.
Hum, a minor question, is a string representation of the expressions
better (easier to use, manipulate, etc.) than a tree representation?
I know Don wrote this lib, but I think I can answer this.
In short: no. But it's really an unintentionally loaded question: there
are some rather severe limitations as to what kind of data structures
you can create at compile time. You're basically limited to tuples and
strings, each of which have drawbacks of their own.
You can create a tree using tuples, but then you run into problems with
over-running the limitations on identifier length built into the
compiler.
Strings are no where near as flexible out of the box, but
pose no storage limitations, which gives a slight edge to string-based
representations of data. With some clever coding, strings can be made
to store just about any structure imaginable (even if it does make for
some ugly code).
There's a couple of minor things, though: (1) in the case of generating
D code to mixin, no tree is required (the string just gets mixed back in
again, so it might as well stay as a string).
(2) The "separate string and tuple" structure means the expression can
be optimised without needing to move any of the values around; this
creates less garbage for the compiler to optimise away.
(3) Some patterns are probably easier to find with a string, than a tree.
(4) It's more natural for D coders to think in terms of arrays, than
lists. (and there is much more language support).
(5) A string representation can easily express things like c=a+3*(a+b);
where 'a' appears twice. (At present, there's no way to generate it, but
theoretically the compiler could provide it many cases).
So -- if trees were available, I'm not sure if I'd use them, or not.
↑ ↓ ← → Pragma <ericanderton yahoo.removeme.com> writes:
Don Clugston wrote:
Pragma wrote:
Bruno Medeiros wrote:
Pragma wrote:
Don Clugston wrote:
I have been trying to come up with a convincing use case for the
new mixins (and for metaprogramming in general). My best effort to
date can be found at:
http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d
It generates near-optimal x87 asm code for BLAS1-style basic vector
operations. 32, 64 and 80 bit vectors are all supported.
Compile with -version=BladeDebug to see the asm code which is
generated.
Typical usage:
void main()
{
auto p = Vec([1.0L, 2, 18]); // a vector of 80-bit reals.
auto q = Vec([3.5L, 1.1, 3.8]); // ditto
auto r = Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
real d = dot(r, p+r+r);
ireal e = dot(r, z);
q -= ((r+p)*18.0L*314.1L - (p-r))* 35;
d = dot(r, p+r+r);
}
Notice that mixed-length operations (real[] + float[] - double[])
are supported.
Like the C++ Blitz++ library, expression templates are used to
convert vector expressions into efficient element-wise operations.
Unlike that library, however, there is no reliance on the
compiler's optimiser. Instead, the expression template is
manipulated as text, converted into postfix, and then passed to a
simple CTFE compile-time assembler, which creates highly efficient
asm code which is used as a mixin.
To understand the later parts of the code, you need some knowledge
of x87 assembler. In fact, you probably need to have read Agner
Fog's superb Pentium optimisation manual (www.agner.org).
Some observations:
* I was amazed at how simple the expression template code is (it is
somewhat cluttered by the code to check for real/imaginary type
mismatch errors).
* I've often read that the x87 floating-point stack is notoriously
difficult for compilers to write code for, but it works quite well
in this case.
* The major workarounds are:
- inability to use a tuple element directly from asm code (bug #1028);
- inability to define operators for built-in arrays (hence the use
of 'Vec' wrappers).
- inability to index through a tuple in a CTFE function (solved by
converting types into a string).
* There have been mutterings about how unhygenic/dangerous the new
mixins are. In this case, the mixin forms the _entire_ body of the
function. This is an interesting situation which I think a language
purist will find more palatable.
Enjoy.
This is a work of art Don - it's practically a compiler extension.
Nice job. :)
For others that are interested in how this actually gets the job
done, I found this in your documentation:
* THEORY:
* Expression templates are used to create an expression string of
the form "(a+b*c)+d"
* and a tuple, the entries of which correspond to a, b, c, d, ...
* This string is converted to postfix. The postfix string is
converted to
* a string containing x87 asm, which is then mixed into a function
which accepts the tuple.
Hum, a minor question, is a string representation of the expressions
better (easier to use, manipulate, etc.) than a tree representation?
I know Don wrote this lib, but I think I can answer this.
In short: no. But it's really an unintentionally loaded question:
there are some rather severe limitations as to what kind of data
structures you can create at compile time. You're basically limited
to tuples and strings, each of which have drawbacks of their own.
You can create a tree using tuples, but then you run into problems
with over-running the limitations on identifier length built into the
compiler.
Here's an example (for grins):
import std.stdio;
template Node(char[] Data,Nodes...){
alias Data data;
alias Nodes nodes;
}
template PrintData(char[] parent){
const char[] PrintData = "";
}
template PrintData(char[] parent,alias Node){
static if(parent == ""){
const char[] PrintData = Node.data ~ "\n" ~
PrintData!(Node.data,Node.nodes);
}
else{
const char[] PrintData = parent ~ "." ~ Node.data ~ "\n" ~
PrintData!(parent ~ "." ~ Node.data,Node.nodes);
}
}
template PrintData(char[] parent,alias Node,V...){
const char[] PrintData = PrintData!(parent,Node) ~ PrintData!(parent,V);
}
// example "tree" structure
alias Node!("1",
Node!("1"),
Node!("2",
Node!("1"),
Node!("2")
)
) dataset;
void main(){
writefln("%s",PrintData!("",dataset));
}
Strings are no where near as flexible out of the box, but
pose no storage limitations, which gives a slight edge to string-based
representations of data. With some clever coding, strings can be made
to store just about any structure imaginable (even if it does make for
some ugly code).
There's a couple of minor things, though: (1) in the case of generating
D code to mixin, no tree is required (the string just gets mixed back in
again, so it might as well stay as a string).
(2) The "separate string and tuple" structure means the expression can
be optimised without needing to move any of the values around; this
creates less garbage for the compiler to optimise away.
(3) Some patterns are probably easier to find with a string, than a tree.
(4) It's more natural for D coders to think in terms of arrays, than
lists. (and there is much more language support).
(5) A string representation can easily express things like c=a+3*(a+b);
where 'a' appears twice. (At present, there's no way to generate it, but
theoretically the compiler could provide it many cases).
So -- if trees were available, I'm not sure if I'd use them, or not.
They're really handy. I've found some uses for them already:
/*
ZeroOrMoreExpr
= generate.ZeroOrMoreExpr()
::= "[" "{" Expression:expr "}" "]" [RangeSetExpr:ranges ]
[Binding:binding ]
["*" SubExpressionTerm:delim ] [SubExpressionTerm:term];
*/
bool ZeroOrMoreExpr(inout char[] value,inout char[] bindings,inout char[]
tokens,inout char[] err){
return Rule!(
generate.ZeroOrMoreExpr,
And!(
Literal!("["),
Literal!("{"),
Bind!(Expression,"expr"),
Literal!("}"),
Literal!("]"),
Optional!(
Bind!(RangeSetExpr,"ranges")
),
Optional!(
Bind!(Binding,"binding")
),
Optional!(
And!(
Literal!("*"),
Bind!(SubExpressionTerm,"delim")
)
),
Bind!(SubExpressionTerm,"term")
)
)(value,bindings,tokens,err);
}
There's a *lot* going on in this sample. Basically what you see here is a
chunk of the as-of-yet-experimental
compile-time Enki parser. This piece parses the Zero-or-more expression part
of the EBNF variant that Enki supports.
The templates evaluate to CTFE's that in turn make up the parser when it's
executed. So there's several layers of
compile-time evaluation going on here. Also, what's not obvious is that
"char[] tokens" is actually an *array of
strings* that is stored in a single char[] array; each string's length data is
encoded as a size_t mapped onto the
appropriate number of chars. The Bind!() expressions also map parsed out data
to a key/value set which is stored in a
similar fashion (char[] bindings).
The goals here were to side-step the limitations of D's identifier name-length
while providing a user-readable toolkit
for parser composition; the only limit now is placed on the length of any given
rule. I haven't found a way to unify
the compile-time and run-time parser generation yet (one backend with two
targets), but I anticipate using something
very similar to the above by providing a CT and RT version of the template
library. At a minimum I plan on having this
consume a .bnf file at compile-time, in order to create a run-time parser via
mixin.
--
- EricAnderton at yahoo
↑ ↓ ← → Don Clugston <dac nospam.com.au> writes:
Pragma wrote:
Don Clugston wrote:
Pragma wrote:
Bruno Medeiros wrote:
Pragma wrote:
Don Clugston wrote:
I have been trying to come up with a convincing use case for the
new mixins (and for metaprogramming in general). My best effort to
date can be found at:
http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d
It generates near-optimal x87 asm code for BLAS1-style basic
vector operations. 32, 64 and 80 bit vectors are all supported.
Compile with -version=BladeDebug to see the asm code which is
generated.
Typical usage:
void main()
{
auto p = Vec([1.0L, 2, 18]); // a vector of 80-bit reals.
auto q = Vec([3.5L, 1.1, 3.8]); // ditto
auto r = Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
real d = dot(r, p+r+r);
ireal e = dot(r, z);
q -= ((r+p)*18.0L*314.1L - (p-r))* 35;
d = dot(r, p+r+r);
}
Notice that mixed-length operations (real[] + float[] - double[])
are supported.
Like the C++ Blitz++ library, expression templates are used to
convert vector expressions into efficient element-wise operations.
Unlike that library, however, there is no reliance on the
compiler's optimiser. Instead, the expression template is
manipulated as text, converted into postfix, and then passed to a
simple CTFE compile-time assembler, which creates highly efficient
asm code which is used as a mixin.
To understand the later parts of the code, you need some knowledge
of x87 assembler. In fact, you probably need to have read Agner
Fog's superb Pentium optimisation manual (www.agner.org).
Some observations:
* I was amazed at how simple the expression template code is (it
is somewhat cluttered by the code to check for real/imaginary type
mismatch errors).
* I've often read that the x87 floating-point stack is notoriously
difficult for compilers to write code for, but it works quite well
in this case.
* The major workarounds are:
- inability to use a tuple element directly from asm code (bug
#1028);
- inability to define operators for built-in arrays (hence the use
of 'Vec' wrappers).
- inability to index through a tuple in a CTFE function (solved by
converting types into a string).
* There have been mutterings about how unhygenic/dangerous the new
mixins are. In this case, the mixin forms the _entire_ body of the
function. This is an interesting situation which I think a
language purist will find more palatable.
Enjoy.
This is a work of art Don - it's practically a compiler extension.
Nice job. :)
For others that are interested in how this actually gets the job
done, I found this in your documentation:
* THEORY:
* Expression templates are used to create an expression string of
the form "(a+b*c)+d"
* and a tuple, the entries of which correspond to a, b, c, d, ...
* This string is converted to postfix. The postfix string is
converted to
* a string containing x87 asm, which is then mixed into a function
which accepts the tuple.
Hum, a minor question, is a string representation of the expressions
better (easier to use, manipulate, etc.) than a tree representation?
I know Don wrote this lib, but I think I can answer this.
In short: no. But it's really an unintentionally loaded question:
there are some rather severe limitations as to what kind of data
structures you can create at compile time. You're basically limited
to tuples and strings, each of which have drawbacks of their own.
You can create a tree using tuples, but then you run into problems
with over-running the limitations on identifier length built into the
compiler.
Here's an example (for grins):
import std.stdio;
template Node(char[] Data,Nodes...){
alias Data data;
alias Nodes nodes;
}
// example "tree" structure
alias Node!("1",
Node!("1"),
Node!("2",
Node!("1"),
Node!("2")
)
) dataset;
The problem I was referring to, is: how to store both values, and
functions/operators, inside the tree. It seems to get messy very quickly.
So -- if trees were available, I'm not sure if I'd use them, or not.
They're really handy. I've found some uses for them already:
I meant for this application. There's no doubt they're indispensable in
other contexts.
/*
ZeroOrMoreExpr
= generate.ZeroOrMoreExpr()
::= "[" "{" Expression:expr "}" "]" [RangeSetExpr:ranges ]
[Binding:binding ]
["*" SubExpressionTerm:delim ] [SubExpressionTerm:term];
*/
bool ZeroOrMoreExpr(inout char[] value,inout char[] bindings,inout
char[] tokens,inout char[] err){
return Rule!(
generate.ZeroOrMoreExpr,
And!(
Literal!("["),
Literal!("{"),
Bind!(Expression,"expr"),
Literal!("}"),
Literal!("]"),
Optional!(
Bind!(RangeSetExpr,"ranges")
),
Optional!(
Bind!(Binding,"binding")
),
Optional!(
And!(
Literal!("*"),
Bind!(SubExpressionTerm,"delim")
)
),
Bind!(SubExpressionTerm,"term")
)
)(value,bindings,tokens,err);
}
There's a *lot* going on in this sample. Basically what you see here is
a chunk of the as-of-yet-experimental compile-time Enki parser. This
piece parses the Zero-or-more expression part of the EBNF variant that
Enki supports.
The templates evaluate to CTFE's that in turn make up the parser when
it's executed. So there's several layers of compile-time evaluation
going on here. Also, what's not obvious is that "char[] tokens" is
actually an *array of strings* that is stored in a single char[] array;
each string's length data is encoded as a size_t mapped onto the
appropriate number of chars. The Bind!() expressions also map parsed
out data to a key/value set which is stored in a similar fashion (char[]
bindings).
Seriously cool! Seems like you're generating a tree of nested mixins?
Anyway, I suspect this will really benefit from any compiler
improvements, eg CTFE support for AAs or nested functions. And obviously
AST macros.
The goals here were to side-step the limitations of D's identifier
name-length while providing a user-readable toolkit for parser
composition; the only limit now is placed on the length of any given
rule. I haven't found a way to unify the compile-time and run-time
parser generation yet (one backend with two targets), but I anticipate
using something very similar to the above by providing a CT and RT
version of the template library. At a minimum I plan on having this
consume a .bnf file at compile-time, in order to create a run-time
parser via mixin.
I really think the new mixins + CTFE was a breakthrough. Publish some of
this stuff, and I think we'll see an exodus of many Boost developers into D.
↑ ↓ ← → Walter Bright <newshound1 digitalmars.com> writes:
Don Clugston wrote:
I really think the new mixins + CTFE was a breakthrough. Publish some of
this stuff, and I think we'll see an exodus of many Boost developers
into D.
You guys gotta publish!
↑ ↓ ← → Don Clugston <dac nospam.com.au> writes:
Walter Bright wrote:
Don Clugston wrote:
I really think the new mixins + CTFE was a breakthrough. Publish some
of this stuff, and I think we'll see an exodus of many Boost
developers into D.
You guys gotta publish!
I've begun a draft. A historical question for you --
On this page,
http://www.artima.com/cppsource/top_cpp_software.html
Scott Meyers says that g++ was the first compiler to generate native
code. But I thought Zortech was older than g++. Is that correct?
↑ ↓ ← → Walter Bright <newshound1 digitalmars.com> writes:
Don Clugston wrote:
I've begun a draft. A historical question for you --
On this page,
http://www.artima.com/cppsource/top_cpp_software.html
Scott Meyers says that g++ was the first compiler to generate native
code. But I thought Zortech was older than g++. Is that correct?
Depends on how you look at it. g++ was first 'released' in December,
1987. But the release notes for it say: "The GNU C++ Compiler is still
in test release, and is NOT ready for everyday use" so I don't consider
that a real release. Oregon Software released their native C++ compiler
(not for the PC) in Jan or Feb 1988, nobody seems to recall exactly.
Zortech's first release was in Jun 1988. Michael Tiemann, author of g++,
calls the Sep 1988 release the "first really stable version."
↑ ↓ ← → Pragma <ericanderton yahoo.removeme.com> writes:
Walter Bright wrote:
Don Clugston wrote:
I really think the new mixins + CTFE was a breakthrough. Publish some
of this stuff, and I think we'll see an exodus of many Boost
developers into D.
You guys gotta publish!
Baby steps Walter, baby steps... ;)
--
- EricAnderton at yahoo
↑ ↓ ← → Pragma <ericanderton yahoo.removeme.com> writes:
Don Clugston wrote:
The problem I was referring to, is: how to store both values, and
functions/operators, inside the tree. It seems to get messy very quickly.
It can, especially with the additional code you need to treat all those
templates as specialized data types.
I meant for this application. There's no doubt they're indispensable in
other contexts.
Oh, my bad. Yea, it would probably be overkill for BLADE. :)
Basically what you see here
is a chunk of the as-of-yet-experimental compile-time Enki parser.
This piece parses the Zero-or-more expression part of the EBNF variant
that Enki supports.
The templates evaluate to CTFE's that in turn make up the parser when
it's executed. So there's several layers of compile-time evaluation
going on here. Also, what's not obvious is that "char[] tokens" is
actually an *array of strings* that is stored in a single char[]
array; each string's length data is encoded as a size_t mapped onto
the appropriate number of chars. The Bind!() expressions also map
parsed out data to a key/value set which is stored in a similar
fashion (char[] bindings).
Seriously cool! Seems like you're generating a tree of nested mixins?
Almost. The generate.ZeroOrMore returns an arbitrary string, that may be
another "map" or a chunk of runtime code; the
value is placed in the "value" string that's passed in. The *rootmost*
generator is what compiles all this stuff into
an actual chunk of mixin-able code. It's analogous to how the current
rendition of Enki works.
It seems overkill, but it's needed so I can do some basic semantic analysis,
and do things like declare binding vars.
I'm still looking for ways to simplify the process.
Anyway, I suspect this will really benefit from any compiler
improvements, eg CTFE support for AAs or nested functions. And obviously
AST macros.
Definitely for AA's but not so much for AST macros - I get the impression that
AST manipulation will only be useful for
D code, and not for completely arbitrary grammars like EBNF. Getting
compile-time AA support would cut Enki's code size
down by almost a third:
const char[] hackedArray =
"\x00\x00\x00\x05hello\x00\x00\x00\x05cruel\x00\x00\x00\x05world";
Under the hood, this is what most of my data looks like. Dropping the support
routines for manipulating such structures
would help make things a lot less cumbersome. :(
I really think the new mixins + CTFE was a breakthrough. Publish some of
this stuff, and I think we'll see an exodus of many Boost developers
into D.
Agreed. CTFE frees us from many limitations imposed by templates, and the
added flexibility of mixin() pretty much
gives us the same strength of javascript's eval() statement at compile-time.
It's a huge deal.
It's possible that we might find a handful of interested devs in their camp,
but I think we're still back to the same
old problem as with the rest of C++: momentum. I sincerely doubt we'll see a
mass exodus from one camp to this one
regardless of how good a job is done here. Of course, I'll be happy to be
wrong about that. :)
--
- EricAnderton at yahoo
↑ ↓ ← → KlausO <oberhofer users.sf.net> writes:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Pragma schrieb:
Don Clugston wrote:
Pragma wrote:
Bruno Medeiros wrote:
Pragma wrote:
Don Clugston wrote:
I have been trying to come up with a convincing use case for the
new mixins (and for metaprogramming in general). My best effort to
date can be found at:
http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d
It generates near-optimal x87 asm code for BLAS1-style basic
vector operations. 32, 64 and 80 bit vectors are all supported.
Compile with -version=BladeDebug to see the asm code which is
generated.
Typical usage:
void main()
{
auto p = Vec([1.0L, 2, 18]); // a vector of 80-bit reals.
auto q = Vec([3.5L, 1.1, 3.8]); // ditto
auto r = Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
real d = dot(r, p+r+r);
ireal e = dot(r, z);
q -= ((r+p)*18.0L*314.1L - (p-r))* 35;
d = dot(r, p+r+r);
}
Notice that mixed-length operations (real[] + float[] - double[])
are supported.
Like the C++ Blitz++ library, expression templates are used to
convert vector expressions into efficient element-wise operations.
Unlike that library, however, there is no reliance on the
compiler's optimiser. Instead, the expression template is
manipulated as text, converted into postfix, and then passed to a
simple CTFE compile-time assembler, which creates highly efficient
asm code which is used as a mixin.
To understand the later parts of the code, you need some knowledge
of x87 assembler. In fact, you probably need to have read Agner
Fog's superb Pentium optimisation manual (www.agner.org).
Some observations:
* I was amazed at how simple the expression template code is (it
is somewhat cluttered by the code to check for real/imaginary type
mismatch errors).
* I've often read that the x87 floating-point stack is notoriously
difficult for compilers to write code for, but it works quite well
in this case.
* The major workarounds are:
- inability to use a tuple element directly from asm code (bug
#1028);
- inability to define operators for built-in arrays (hence the use
of 'Vec' wrappers).
- inability to index through a tuple in a CTFE function (solved by
converting types into a string).
* There have been mutterings about how unhygenic/dangerous the new
mixins are. In this case, the mixin forms the _entire_ body of the
function. This is an interesting situation which I think a
language purist will find more palatable.
Enjoy.
This is a work of art Don - it's practically a compiler extension.
Nice job. :)
For others that are interested in how this actually gets the job
done, I found this in your documentation:
* THEORY:
* Expression templates are used to create an expression string of
the form "(a+b*c)+d"
* and a tuple, the entries of which correspond to a, b, c, d, ...
* This string is converted to postfix. The postfix string is
converted to
* a string containing x87 asm, which is then mixed into a function
which accepts the tuple.
Hum, a minor question, is a string representation of the expressions
better (easier to use, manipulate, etc.) than a tree representation?
I know Don wrote this lib, but I think I can answer this.
In short: no. But it's really an unintentionally loaded question:
there are some rather severe limitations as to what kind of data
structures you can create at compile time. You're basically limited
to tuples and strings, each of which have drawbacks of their own.
You can create a tree using tuples, but then you run into problems
with over-running the limitations on identifier length built into the
compiler.
Here's an example (for grins):
import std.stdio;
template Node(char[] Data,Nodes...){
alias Data data;
alias Nodes nodes;
}
template PrintData(char[] parent){
const char[] PrintData = "";
}
template PrintData(char[] parent,alias Node){
static if(parent == ""){
const char[] PrintData = Node.data ~ "\n" ~
PrintData!(Node.data,Node.nodes);
}
else{
const char[] PrintData = parent ~ "." ~ Node.data ~ "\n" ~
PrintData!(parent ~ "." ~ Node.data,Node.nodes);
}
}
template PrintData(char[] parent,alias Node,V...){
const char[] PrintData = PrintData!(parent,Node) ~
PrintData!(parent,V);
}
// example "tree" structure
alias Node!("1",
Node!("1"),
Node!("2",
Node!("1"),
Node!("2")
)
) dataset;
void main(){
writefln("%s",PrintData!("",dataset));
}
Strings are no where near as flexible out of the box, but
pose no storage limitations, which gives a slight edge to
string-based representations of data. With some clever coding,
strings can be made to store just about any structure imaginable
(even if it does make for some ugly code).
There's a couple of minor things, though: (1) in the case of
generating D code to mixin, no tree is required (the string just gets
mixed back in again, so it might as well stay as a string).
(2) The "separate string and tuple" structure means the expression can
be optimised without needing to move any of the values around; this
creates less garbage for the compiler to optimise away.
(3) Some patterns are probably easier to find with a string, than a tree.
(4) It's more natural for D coders to think in terms of arrays, than
lists. (and there is much more language support).
(5) A string representation can easily express things like c=a+3*(a+b);
where 'a' appears twice. (At present, there's no way to generate it,
but theoretically the compiler could provide it many cases).
So -- if trees were available, I'm not sure if I'd use them, or not.
They're really handy. I've found some uses for them already:
/*
ZeroOrMoreExpr
= generate.ZeroOrMoreExpr()
::= "[" "{" Expression:expr "}" "]" [RangeSetExpr:ranges ]
[Binding:binding ]
["*" SubExpressionTerm:delim ] [SubExpressionTerm:term];
*/
bool ZeroOrMoreExpr(inout char[] value,inout char[] bindings,inout
char[] tokens,inout char[] err){
return Rule!(
generate.ZeroOrMoreExpr,
And!(
Literal!("["),
Literal!("{"),
Bind!(Expression,"expr"),
Literal!("}"),
Literal!("]"),
Optional!(
Bind!(RangeSetExpr,"ranges")
),
Optional!(
Bind!(Binding,"binding")
),
Optional!(
And!(
Literal!("*"),
Bind!(SubExpressionTerm,"delim")
)
),
Bind!(SubExpressionTerm,"term")
)
)(value,bindings,tokens,err);
}
There's a *lot* going on in this sample. Basically what you see here is
a chunk of the as-of-yet-experimental compile-time Enki parser. This
piece parses the Zero-or-more expression part of the EBNF variant that
Enki supports.
The templates evaluate to CTFE's that in turn make up the parser when
it's executed. So there's several layers of compile-time evaluation
going on here. Also, what's not obvious is that "char[] tokens" is
actually an *array of strings* that is stored in a single char[] array;
each string's length data is encoded as a size_t mapped onto the
appropriate number of chars. The Bind!() expressions also map parsed
out data to a key/value set which is stored in a similar fashion (char[]
bindings).
The goals here were to side-step the limitations of D's identifier
name-length while providing a user-readable toolkit for parser
composition; the only limit now is placed on the length of any given
rule. I haven't found a way to unify the compile-time and run-time
parser generation yet (one backend with two targets), but I anticipate
using something very similar to the above by providing a CT and RT
version of the template library. At a minimum I plan on having this
consume a .bnf file at compile-time, in order to create a run-time
parser via mixin.
Hey pragma,
really cool, I've come up with a similar structure while experimenting
with a PEG parser in D (see attachment)
after I've read this article series on
Codeproject:
http://www.codeproject.com/cpp/crafting_interpreter_p1.asp
http://www.codeproject.com/cpp/crafting_interpreter_p2.asp
http://www.codeproject.com/cpp/crafting_interpreter_p3.asp
The template system of D does an awesome job in keeping
templated PEG grammars readable.
BTW: If you turn Enki into a PEG style parser I definitely throw
my attempts into the dustbin :-)
Greets
KlausO
↑ ↓ ← → Pragma <ericanderton yahoo.removeme.com> writes:
KlausO wrote:
Hey pragma,
really cool, I've come up with a similar structure while experimenting
with a PEG parser in D (see attachment)
after I've read this article series on
Codeproject:
http://www.codeproject.com/cpp/crafting_interpreter_p1.asp
http://www.codeproject.com/cpp/crafting_interpreter_p2.asp
http://www.codeproject.com/cpp/crafting_interpreter_p3.asp
The template system of D does an awesome job in keeping
templated PEG grammars readable.
Yes it does! Thanks for posting this - I didn't even know that article was
there.
BTW: If you turn Enki into a PEG style parser I definitely throw
my attempts into the dustbin :-)
Greets
Wow, that's one heck of an endorsement. Thanks, but don't throw anything out
yet. This rendition of Enki is still a
ways off though. FYI I plan on keeping Enki's internals as human-readable as
possible by keeping it self-hosting. So
there'll be two ways to utilize the toolkit: EBNF coding and "by hand".
alias Action!(
And!(
PlusRepeat!(EmailChar),
Char!(' '),
PlusRepeat!(
And!(
Or!(
In!(
Range!('a', 'z'),
Range!('A', 'Z'),
Range!('0', '9'),
Char!('_'),
Char!('%'),
Char!('-')
),
Char!('.')
),
Not!(EmailSuffix)
)
),
Char!('.'),
EmailSuffix
),
delegate void(char[] email) { writefln("<email:", email, ">"); }
)
Email;
I had an earlier cut that looked a lot like this. :) But there's a very subtle
problem lurking in there. By making
your entire grammar one monster-sized template instance, you'll run into DMD's
identifier-length limit *fast*. As a
result it failed when I tried to transcribe Enki's ENBF definition. That's why
I wrap each rule as a CTFE-style
function as it side-steps that issue rather nicely, without generating too many
warts.
--
- EricAnderton at yahoo
↑ ↓ ← → KlausO <oberhofer users.sf.net> writes:
Pragma schrieb:
KlausO wrote:
Hey pragma,
really cool, I've come up with a similar structure while experimenting
with a PEG parser in D (see attachment)
after I've read this article series on
Codeproject:
http://www.codeproject.com/cpp/crafting_interpreter_p1.asp
http://www.codeproject.com/cpp/crafting_interpreter_p2.asp
http://www.codeproject.com/cpp/crafting_interpreter_p3.asp
The template system of D does an awesome job in keeping
templated PEG grammars readable.
Yes it does! Thanks for posting this - I didn't even know that article
was there.
BTW: If you turn Enki into a PEG style parser I definitely throw
my attempts into the dustbin :-)
Greets
Wow, that's one heck of an endorsement. Thanks, but don't throw
anything out yet. This rendition of Enki is still a ways off though.
FYI I plan on keeping Enki's internals as human-readable as possible by
keeping it self-hosting. So there'll be two ways to utilize the
toolkit: EBNF coding and "by hand".
Nice to hear that I hit your taste :-)
alias Action!(
And!(
PlusRepeat!(EmailChar),
Char!(' '), PlusRepeat!(
And!(
Or!(
In!(
Range!('a', 'z'),
Range!('A', 'Z'),
Range!('0', '9'),
Char!('_'),
Char!('%'),
Char!('-')
),
Char!('.')
),
Not!(EmailSuffix)
)
),
Char!('.'),
EmailSuffix
),
delegate void(char[] email) { writefln("<email:", email,
">"); }
)
Email;
I had an earlier cut that looked a lot like this. :) But there's a very
subtle problem lurking in there. By making your entire grammar one
monster-sized template instance, you'll run into DMD's identifier-length
limit *fast*. As a result it failed when I tried to transcribe Enki's
ENBF definition. That's why I wrap each rule as a CTFE-style function
as it side-steps that issue rather nicely, without generating too many
warts.
Another issue I ran into is circular template dependencies. You could
get nasty error messages like
dpeg.d(282): Error: forward reference to
'And!(Alternative,OptRepeat!(And!(WS,Char!('/'),WS,Alternative)),WS)'
Any idea how they could be resolved ?
FYI:
Other good PEG references:
http://pdos.csail.mit.edu/~baford/packrat/
Cat is a open source interpreter which utilizes a rule based parser in C#
http://www.codeproject.com/csharp/cat.asp
very interesting had been
http://code.google.com/p/cat-language/wiki/HowTheInterpreterWorks
C++ templated parsers
http://www.codeproject.com/cpp/yard-xml-parser.asp
http://www.codeproject.com/cpp/biscuit.asp
↑ ↓ ← → Pragma <ericanderton yahoo.removeme.com> writes:
KlausO wrote:
Pragma schrieb:
KlausO wrote:
alias Action!(
And!(
PlusRepeat!(EmailChar),
Char!(' '), PlusRepeat!(
And!(
Or!(
In!(
Range!('a', 'z'),
Range!('A', 'Z'),
Range!('0', '9'),
Char!('_'),
Char!('%'),
Char!('-')
),
Char!('.')
),
Not!(EmailSuffix)
)
),
Char!('.'),
EmailSuffix
),
delegate void(char[] email) { writefln("<email:", email,
">"); }
)
Email;
I had an earlier cut that looked a lot like this. :) But there's a
very subtle problem lurking in there. By making your entire grammar
one monster-sized template instance, you'll run into DMD's
identifier-length limit *fast*. As a result it failed when I tried to
transcribe Enki's ENBF definition. That's why I wrap each rule as a
CTFE-style function as it side-steps that issue rather nicely, without
generating too many warts.
Another issue I ran into is circular template dependencies. You could
get nasty error messages like
dpeg.d(282): Error: forward reference to
'And!(Alternative,OptRepeat!(And!(WS,Char!('/'),WS,Alternative)),WS)'
Any idea how they could be resolved ?
Yep, that one bit me too; just use the CTFE trick I mentioned. Wrapping each
rule fixes this since DMD can resolve
forward references to functions, but not with template instances.
FYI:
Other good PEG references:
http://pdos.csail.mit.edu/~baford/packrat/
Cat is a open source interpreter which utilizes a rule based parser in C#
http://www.codeproject.com/csharp/cat.asp
very interesting had been
http://code.google.com/p/cat-language/wiki/HowTheInterpreterWorks
C++ templated parsers
http://www.codeproject.com/cpp/yard-xml-parser.asp
http://www.codeproject.com/cpp/biscuit.asp
Thanks. I can always use a little more research to lean on.
--
- EricAnderton at yahoo
↑ ↓ ← → BCS <BCS pathlink.com> writes:
KlausO wrote:
Pragma schrieb:
I had an earlier cut that looked a lot like this. :) But there's a
very subtle problem lurking in there. By making your entire grammar
one monster-sized template instance, you'll run into DMD's
identifier-length limit *fast*. As a result it failed when I tried
to transcribe Enki's ENBF definition. That's why I wrap each rule as
a CTFE-style function as it side-steps that issue rather nicely,
without generating too many warts.
Another issue I ran into is circular template dependencies. You could
get nasty error messages like
The way my dparse sidesteps this (both the id length limit and forward
references) is to have the parser functions specialized only on the name
of the reduction, the grammar is carried in an outer scope (the mixed in
template that is never used as an identifier). This also allows the
inner template to specialize on anything that is defined.
↑ ↓ ← → Walter Bright <newshound1 digitalmars.com> writes:
Pragma wrote:
By making your entire grammar one
monster-sized template instance, you'll run into DMD's identifier-length
limit *fast*. As a result it failed when I tried to transcribe Enki's
ENBF definition. That's why I wrap each rule as a CTFE-style function
as it side-steps that issue rather nicely, without generating too many
warts.
One of the motivations for CTFE was to address that exact problem.
↑ ↓ ← → Witold Baryluk <baryluk mpi.int.pl> writes:
Hello,
Very good work. I'm impressed. Actually i was developing very
similar program, but yours is practicly all I was needing.
I'm developing linear algebra package in D, and now I'm optimising
it for vector machines, so your BLAS1-like pacakge will be
helpful. :)
Thx.
--
Witold Baryluk
↑ ↓ ← → Don Clugston <dac nospam.com.au> writes:
Witold Baryluk wrote:
Hello,
Very good work. I'm impressed. Actually i was developing very
similar program, but yours is practicly all I was needing.
I'm developing linear algebra package in D, and now I'm optimising
it for vector machines, so your BLAS1-like pacakge will be
helpful. :)
Do you intend to implement BLAS2 and BLAS3-like functionality? I feel
that I don't know enough about cache-efficient matrix blocking
techniques to be confident in presenting code. I don't know how
sophisticated the techniques are in libraries like ATLAS, but the ones
used in Blitz++ should be easy to compete with.
Blade is still mostly proof-of-concept rather than being
industrial-strength -- there are so many possibilities for improvement,
it's not well tested, and I haven't even put any effort into making the
code look nice. But as Davidl said, it shows that hard-core back-end
optimisation can now be done in a library at compile time.
The code is quite modularised, so it would be straightforward to add a
check for 'is it SSE-able' (ie, are all the vectors floats) and 'is it
SSE2-able' (are all the vectors doubles), and if so, send them off into
specialised assemblers, otherwise send them to the x87 one. The
postfix-ing step will involve searching for *+ combinations where fma
can be used. The ability to know the vector length in many cases can be
a huge advantage for SSE code, where loop unrolling by a factor of 2 or
4 is always necessary. I haven't got that far yet.
I'm pretty sure that this type of technique can blow almost everything
else out of the water. <g>
↑ ↓ ← → Davidl <Davidl 126.com> writes:
i think compilers have feeled a great challenge from compile-time back-e=
nd =
like power
I have been trying to come up with a convincing use case for the new =
mixins (and for metaprogramming in general). My best effort to date ca=
be found at:
http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blad=
It generates near-optimal x87 asm code for BLAS1-style basic vector =
operations. 32, 64 and 80 bit vectors are all supported.
Compile with -version=3DBladeDebug to see the asm code which is genera=
Typical usage:
void main()
{
auto p =3D Vec([1.0L, 2, 18]); // a vector of 80-bit reals.
auto q =3D Vec([3.5L, 1.1, 3.8]); // ditto
auto r =3D Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
auto z =3D Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idouble=
real d =3D dot(r, p+r+r);
ireal e =3D dot(r, z);
q -=3D ((r+p)*18.0L*314.1L - (p-r))* 35;
d =3D dot(r, p+r+r);
}
Notice that mixed-length operations (real[] + float[] - double[]) are =
supported.
Like the C++ Blitz++ library, expression templates are used to convert=
vector expressions into efficient element-wise operations. Unlike that=
library, however, there is no reliance on the compiler's optimiser. =
Instead, the expression template is manipulated as text, converted int=
postfix, and then passed to a simple CTFE compile-time assembler, whic=
creates highly efficient asm code which is used as a mixin.
To understand the later parts of the code, you need some knowledge of =
x87 assembler. In fact, you probably need to have read Agner Fog's =
superb Pentium optimisation manual (www.agner.org).
Some observations:
* I was amazed at how simple the expression template code is (it is =
somewhat cluttered by the code to check for real/imaginary type mismat=
errors).
* I've often read that the x87 floating-point stack is notoriously =
difficult for compilers to write code for, but it works quite well in =
this case.
* The major workarounds are:
- inability to use a tuple element directly from asm code (bug #1028);=
- inability to define operators for built-in arrays (hence the use of =
'Vec' wrappers).
- inability to index through a tuple in a CTFE function (solved by =
converting types into a string).
* There have been mutterings about how unhygenic/dangerous the new =
mixins are. In this case, the mixin forms the _entire_ body of the =
function. This is an interesting situation which I think a language =
purist will find more palatable.
Enjoy.
↑ ↓ ← → Jascha Wetzel <"[firstname]" mainia.de> writes:
This is really neat.
I like that this basically is a compiler extension and it's still
a) well readable
b) produces simple error messages during compilation (if any)
c) introduces non-artificial syntax for the task
c) will actually depend more on the versatility of operator overloading
than the CTFE stuff.
a) and b) are so much better than what c++ template magic gave us (as
can be observed in boost).
Now i think there's only one requirement missing to make it a perfect
compiler extension and that is to produce debuggable code.
Actually in this case, as the extension is supposed to generate mostly
single line expression statements, it might not be that important. But
if we consider multi-line DSL statements, a way to mark generated code
as belonging to one of the DSL source lines would be great.
I think, a way for a CTF to recieve the file+line where it was called
from would do the trick. In BLADE that information could be added to the
expression string, passed through to the postfix notation and finally be
inserted into the ASM code with #line directives.
Don Clugston wrote:
I have been trying to come up with a convincing use case for the new
mixins (and for metaprogramming in general). My best effort to date can
be found at:
http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d
It generates near-optimal x87 asm code for BLAS1-style basic vector
operations. 32, 64 and 80 bit vectors are all supported.
Compile with -version=BladeDebug to see the asm code which is generated.
Typical usage:
void main()
{
auto p = Vec([1.0L, 2, 18]); // a vector of 80-bit reals.
auto q = Vec([3.5L, 1.1, 3.8]); // ditto
auto r = Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
real d = dot(r, p+r+r);
ireal e = dot(r, z);
q -= ((r+p)*18.0L*314.1L - (p-r))* 35;
d = dot(r, p+r+r);
}
Notice that mixed-length operations (real[] + float[] - double[]) are
supported.
Like the C++ Blitz++ library, expression templates are used to convert
vector expressions into efficient element-wise operations. Unlike that
library, however, there is no reliance on the compiler's optimiser.
Instead, the expression template is manipulated as text, converted into
postfix, and then passed to a simple CTFE compile-time assembler, which
creates highly efficient asm code which is used as a mixin.
To understand the later parts of the code, you need some knowledge of
x87 assembler. In fact, you probably need to have read Agner Fog's
superb Pentium optimisation manual (www.agner.org).
Some observations:
* I was amazed at how simple the expression template code is (it is
somewhat cluttered by the code to check for real/imaginary type mismatch
errors).
* I've often read that the x87 floating-point stack is notoriously
difficult for compilers to write code for, but it works quite well in
this case.
* The major workarounds are:
- inability to use a tuple element directly from asm code (bug #1028);
- inability to define operators for built-in arrays (hence the use of
'Vec' wrappers).
- inability to index through a tuple in a CTFE function (solved by
converting types into a string).
* There have been mutterings about how unhygenic/dangerous the new
mixins are. In this case, the mixin forms the _entire_ body of the
function. This is an interesting situation which I think a language
purist will find more palatable.
Enjoy.
↑ ↓ ← → Mikola Lysenko <mclysenk mtu.edu> writes:
Excellent work! I have to say that I've had my misgivings about mixin
statements, but I'm glad to see that BLADE and Don's hard work have
proven me wrong. Compile time vector expressions are desperately
needed, and this library goes a long way to sorting things out. As far
as meta-type tricks go, this is by far the coolest I've ever seen in any
programming language.
One very interesting application I would like to see with this technique
would be a compile time geometric algebra library. If no one writes one
by this summer, I will probably do it myself. (Actually, when I heard
the name BLADE, my heart skipped a beat thinking it was a GA library,
but alas it is not...) Maybe it is my unrestrained exuberance, I would
guess that a strong D implementation should easily crush the leading C++
GA lib, GAIGEN in both performance and ease of use.
For reference, here is the GAIGEN homepage:
http://www.science.uva.nl/ga/gaigen/content_gaigen.html
-Mikola Lysenko
↑ ↓ ← → torhu <fake address.dude> writes:
Mikola Lysenko wrote:
<snip>
One very interesting application I would like to see with this technique
would be a compile time geometric algebra library. If no one writes one
by this summer, I will probably do it myself. (Actually, when I heard
the name BLADE, my heart skipped a beat thinking it was a GA library,
but alas it is not...) Maybe it is my unrestrained exuberance, I would
guess that a strong D implementation should easily crush the leading C++
GA lib, GAIGEN in both performance and ease of use.
Oh my GAD? Get it? ;)
↑ ↓ ← → Bill Baxter <dnewsgroup billbaxter.com> writes:
Mikola Lysenko wrote:
Excellent work! I have to say that I've had my misgivings about mixin
statements, but I'm glad to see that BLADE and Don's hard work have
proven me wrong. Compile time vector expressions are desperately
needed, and this library goes a long way to sorting things out. As far
as meta-type tricks go, this is by far the coolest I've ever seen in any
programming language.
One very interesting application I would like to see with this technique
would be a compile time geometric algebra library. If no one writes one
by this summer, I will probably do it myself. (Actually, when I heard
the name BLADE, my heart skipped a beat thinking it was a GA library,
but alas it is not...) Maybe it is my unrestrained exuberance, I would
guess that a strong D implementation should easily crush the leading C++
GA lib, GAIGEN in both performance and ease of use.
For reference, here is the GAIGEN homepage:
http://www.science.uva.nl/ga/gaigen/content_gaigen.html
-Mikola Lysenko
Wow, that looks really sweet. Last I heard on GA was that it was good
for working out the math, but too slow for real use. Now it's fast too
apparently! Now I've really got to learn GA. And a D compile-time fu
version would be neat.
One question: from the GAIGEN page it sounds almost like GAIGEN
generates off-line a set of primitive operations that are efficient for
the chosen specific GA domain (3D euclidean or whatever). If so, then
that sounds pretty good to me. What's wrong with using an off-line
approach? Are there some expressions things that are just too difficult
to optimize ahead of time without seeing the whole expression as it will
actually used? Or is it more like GAIGEN creates the primitives
operations for a particular GA like dot product, cross product, and
matrix multiplies? If so then it still maybe leaves a lot of
optimization opportunities on the floor. Like evaluating A*B*C*vec as
((A*B)*C)*vec vs. (A*(B*(C*vec))) can make a big difference if ABC are
big matrices and vec is a small vector.
--bb
↑ ↓ ← → Mikola Lysenko <mclysenk mtu.edu> writes:
Bill Baxter wrote:
Mikola Lysenko wrote:
Excellent work! I have to say that I've had my misgivings about mixin
statements, but I'm glad to see that BLADE and Don's hard work have
proven me wrong. Compile time vector expressions are desperately
needed, and this library goes a long way to sorting things out. As
far as meta-type tricks go, this is by far the coolest I've ever seen
in any programming language.
One very interesting application I would like to see with this
technique would be a compile time geometric algebra library. If no
one writes one by this summer, I will probably do it myself.
(Actually, when I heard the name BLADE, my heart skipped a beat
thinking it was a GA library, but alas it is not...) Maybe it is my
unrestrained exuberance, I would guess that a strong D implementation
should easily crush the leading C++ GA lib, GAIGEN in both performance
and ease of use.
For reference, here is the GAIGEN homepage:
http://www.science.uva.nl/ga/gaigen/content_gaigen.html
-Mikola Lysenko
Wow, that looks really sweet. Last I heard on GA was that it was good
for working out the math, but too slow for real use. Now it's fast too
apparently! Now I've really got to learn GA. And a D compile-time fu
version would be neat.
One question: from the GAIGEN page it sounds almost like GAIGEN
generates off-line a set of primitive operations that are efficient for
the chosen specific GA domain (3D euclidean or whatever). If so, then
that sounds pretty good to me. What's wrong with using an off-line
approach? Are there some expressions things that are just too difficult
to optimize ahead of time without seeing the whole expression as it will
actually used? Or is it more like GAIGEN creates the primitives
operations for a particular GA like dot product, cross product, and
matrix multiplies? If so then it still maybe leaves a lot of
optimization opportunities on the floor. Like evaluating A*B*C*vec as
((A*B)*C)*vec vs. (A*(B*(C*vec))) can make a big difference if ABC are
big matrices and vec is a small vector.
--bb
As I understand it, GAIGEN uses an external program to generate the
implementation, and the implementation uses template expressions to
actually perform the optimizations. The process seems a bit contrived
to me, but I'm sure they have their reasons. Also if you look at the
performance statistics on that site, you will see that the geometric
algebra implementation is actually SLOWER than the standard linear
algebra (but not by much.) The main advantage of GA when programming is
that the code is shorter and simpler.
Recently, GA has been getting some more widespread acceptance and
commercial success. Take Geomerics for example; they're a small company
that specializes in graphics and physics software based on GA inspired
algorithms. Here is there website:
http://www.geomerics.com
They've got a very impressive realtime radiosity engine that seems to
have gotten some press in the last few weeks. There's also some
interesting stuff in the technology section on their website.
Writing a GA library for D is definitely on my todo list (right after I
get done with this semester of classes.) I will probably work on it
this summer unless someone beats me to it.
-Mik
|
|