
digitalmars.D - Tokenizing D at compile time?

reply dsimcha <dsimcha yahoo.com> writes:
I'm working on a parallel array ops implementation for 
std.parallel_algorithm.  (For the latest work in progress see 
https://github.com/dsimcha/parallel_algorithm/blob/master/parallel_algorithm.d 
).

To make it (somewhat) pretty, I need to be able to tokenize a single 
statement worth of D source code at compile time.  Right now, the syntax 
requires manual tokenization:

mixin(parallelArrayOp(
    "lhs[]", "=", "op1[]", "*", "op2[]", "/", "op3[]"
));

where lhs, op1, op2, op3 are arrays.
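For readers unfamiliar with the approach: parallelArrayOp is a CTFE function that returns D source code as a string, which the mixin then compiles in place. A hypothetical sketch of such a generator (the chunk size and the use of std.parallelism's parallel foreach are assumptions here, not the actual std.parallel_algorithm implementation) might look like:

```d
// Hypothetical sketch of a string-mixin generator for parallel array ops.
// Assumes the mixin site has `parallel`, `iota`, and `chunks` in scope
// (std.parallelism / std.range) and that every "name[]" operand has the
// same length as lhs.
string parallelArrayOp(string[] tokens...)
{
    string expr;
    foreach (t; tokens)
    {
        // Rewrite each slice operand "name[]" as an indexed access "name[i]".
        if (t.length > 2 && t[$ - 2 .. $] == "[]")
            expr ~= t[0 .. $ - 2] ~ "[i]";
        else
            expr ~= " " ~ t ~ " ";
    }
    // Distribute index ranges across the task pool, 4096 elements per chunk.
    return "foreach (r; parallel(iota(0, lhs.length).chunks(4096)))
                foreach (i; r) " ~ expr ~ ";";
}
```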

I'd like it to be something like:

mixin(parallelArrayOp(
     "lhs[] = op1[] * op2[] / op3[]"
));

Does anyone have/is there any easy way to write a compile-time D tokenizer?
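For comparison, even a bare-bones splitter like the one below runs fine under CTFE. This is an illustrative sketch only, nowhere near a conforming D lexer: it handles no string literals, comments, or multi-character operators.

```d
// Minimal CTFE-friendly splitter: separates identifiers/numbers from
// single-character operator tokens and drops whitespace.
string[] tokenize(string s)
{
    import std.ascii : isAlphaNum, isWhite;

    string[] toks;
    size_t i;
    while (i < s.length)
    {
        if (isWhite(s[i])) { ++i; continue; }
        size_t start = i;
        if (isAlphaNum(s[i]) || s[i] == '_')
            while (i < s.length && (isAlphaNum(s[i]) || s[i] == '_')) ++i;
        else
            ++i;                       // one-character operator token
        toks ~= s[start .. i];
    }
    return toks;
}

static assert(tokenize("lhs[] = op1[] * op2[]")
    == ["lhs", "[", "]", "=", "op1", "[", "]", "*", "op2", "[", "]"]);
```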
Aug 25 2011
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 08/26/2011 03:08 AM, dsimcha wrote:
 I'm working on a parallel array ops implementation for
 std.parallel_algorithm. (For the latest work in progress see
 https://github.com/dsimcha/parallel_algorithm/blob/master/parallel_algorithm.d
 ).

 To make it (somewhat) pretty, I need to be able to tokenize a single
 statement worth of D source code at compile time. Right now, the syntax
 requires manual tokenization:

 mixin(parallelArrayOp(
 "lhs[]", "=", "op1[]", "*", "op2[]", "/", "op3[]"
 ));

That is not real tokenization. Can you go with "lhs", "[", "]", "=", ... ?
 where lhs, op1, op2, op3 are arrays.

 I'd like it to be something like:

 mixin(parallelArrayOp(
 "lhs[] = op1[] * op2[] / op3[]"
 ));

 Does anyone have/is there any easy way to write a compile-time D tokenizer?

I have written an almost complete tokenizer in D. Making it compile-time should be rather trivial; I will give it a try. (It will also convert embedded numerals to the correct type, etc.)

If you don't need a complete tokenizer: what are the tokenizer features you need?
Aug 25 2011
prev sibling parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 26.08.2011 03:08, dsimcha wrote:
 I'm working on a parallel array ops implementation for
 std.parallel_algorithm. (For the latest work in progress see
 https://github.com/dsimcha/parallel_algorithm/blob/master/parallel_algorithm.d
 ).

 To make it (somewhat) pretty, I need to be able to tokenize a single
 statement worth of D source code at compile time. Right now, the syntax
 requires manual tokenization:

 mixin(parallelArrayOp(
 "lhs[]", "=", "op1[]", "*", "op2[]", "/", "op3[]"
 ));

 where lhs, op1, op2, op3 are arrays.

 I'd like it to be something like:

 mixin(parallelArrayOp(
 "lhs[] = op1[] * op2[] / op3[]"
 ));

 Does anyone have/is there any easy way to write a compile-time D tokenizer?

The lexer used by Visual D is also CTFE capable: http://www.dsource.org/projects/visuald/browser/trunk/vdc/lexer.d

As Timon pointed out, it will separate into D tokens, not the more combined elements in your array.

Here's my small CTFE test:

///////////////////////////////////////////////////////////////////////
int[] ctfeLexer(string s)
{
    Lexer lex;
    int state;
    uint pos;
    int[] ids;

    while(pos < s.length)
    {
        uint prevpos = pos;
        int id;
        int type = lex.scan(state, s, pos, id);
        assert(prevpos < pos);
        if(!Lexer.isCommentOrSpace(type, s[prevpos .. pos]))
            ids ~= id;
    }
    return ids;
}

unittest
{
    static assert(ctfeLexer(q{int /* comment to skip */ a;})
                  == [ TOK_int, TOK_Identifier, TOK_semicolon ]);
}

If you want the tokens as strings rather than just the token ID, you can collect "s[prevpos .. pos]" instead of "id" into an array.
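Following Rainer's closing remark, the string-collecting variant (a sketch assuming the same vdc.lexer API as his example: Lexer.scan and Lexer.isCommentOrSpace) could look like:

```d
// Variant of ctfeLexer that collects token text instead of token IDs.
// Assumes the vdc.lexer module's Lexer type is in scope.
string[] ctfeLexStrings(string s)
{
    Lexer lex;
    int state;
    uint pos;
    string[] toks;

    while (pos < s.length)
    {
        uint prevpos = pos;
        int id;
        int type = lex.scan(state, s, pos, id);
        // Keep the slice of source text covered by this token.
        if (!Lexer.isCommentOrSpace(type, s[prevpos .. pos]))
            toks ~= s[prevpos .. pos];
    }
    return toks;
}
```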
Aug 26 2011
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Rainer Schuetze (r.sagitario gmx.de)'s article
 The lexer used by Visual D is also CTFE capable:
 http://www.dsource.org/projects/visuald/browser/trunk/vdc/lexer.d
 As Timon pointed out, it will separate into D tokens, not the more
 combined elements in your array.
 Here's my small CTFE test:

Thanks, but I've come to the conclusion that this lexer is way too big a dependency for something as small as parallel array ops, unless it were to be integrated into Phobos by itself. I'll just stick with the ugly syntax. Unfortunately, according to my benchmarks array ops may be so memory bandwidth-bound that parallelization doesn't yield very good speedups anyhow.
Aug 26 2011
parent reply Don <nospam nospam.com> writes:
dsimcha wrote:
 == Quote from Rainer Schuetze (r.sagitario gmx.de)'s article
 The lexer used by Visual D is also CTFE capable:
 http://www.dsource.org/projects/visuald/browser/trunk/vdc/lexer.d
 As Timon pointed out, it will separate into D tokens, not the more
 combined elements in your array.
 Here's my small CTFE test:

Thanks, but I've come to the conclusion that this lexer is way too big a dependency for something as small as parallel array ops, unless it were to be integrated into Phobos by itself. I'll just stick with the ugly syntax. Unfortunately, according to my benchmarks array ops may be so memory bandwidth-bound that parallelization doesn't yield very good speedups anyhow.

Totally. Anything below BLAS3 is memory-limited, not CPU-limited. Even then, cache prefetching has as big an impact as the number of processors.
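A back-of-envelope calculation (with an assumed bandwidth figure) illustrates why: lhs[] = op1[] * op2[] / op3[] moves four doubles of memory traffic per element but performs only two floating-point operations, so the memory bus caps throughput no matter how many cores run the loop.

```d
void main()
{
    // Per element: read op1, op2, op3 and write lhs -> 4 * 8 = 32 bytes,
    // for one multiply and one divide -> 2 FLOPs.
    enum bytesPerElement = 4 * double.sizeof;   // 32 bytes
    enum flopsPerElement = 2;

    // Assumed sustained memory bandwidth for a circa-2011 desktop (GB/s).
    enum memBandwidthGB = 10.0;

    // Ceiling the memory bus imposes, regardless of core count.
    enum gflopsCeiling = memBandwidthGB * flopsPerElement / bytesPerElement;
    static assert(gflopsCeiling == 0.625);      // well under one GFLOPS
}
```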
Aug 27 2011
parent dsimcha <dsimcha yahoo.com> writes:
On 8/27/2011 5:37 AM, Don wrote:
 dsimcha wrote:
 == Quote from Rainer Schuetze (r.sagitario gmx.de)'s article
 The lexer used by Visual D is also CTFE capable:
 http://www.dsource.org/projects/visuald/browser/trunk/vdc/lexer.d
 As Timon pointed out, it will separate into D tokens, not the more
 combined elements in your array.
 Here's my small CTFE test:

Thanks, but I've come to the conclusion that this lexer is way too big a dependency for something as small as parallel array ops, unless it were to be integrated into Phobos by itself. I'll just stick with the ugly syntax. Unfortunately, according to my benchmarks array ops may be so memory bandwidth-bound that parallelization doesn't yield very good speedups anyhow.

Totally. Anything below BLAS3 is memory-limited, not CPU-limited. Even then, cache prefetching has as big an impact as the number of processors.

I think the "memory bandwidth-bound" statement actually applies to a lot of what I tried to do in std.parallel_algorithm. Much of it shows far-below-linear speedups, but this can't be explained by communication overhead, because the speedup relative to the serial algorithm doesn't improve when I make the problem and work-unit sizes huge.
Aug 27 2011