
digitalmars.D - Tokenizing D at compile time?

reply dsimcha <dsimcha yahoo.com> writes:
I'm working on a parallel array ops implementation for 
std.parallel_algorithm.  (For the latest work in progress see 
https://github.com/dsimcha/parallel_algorithm/blob/master/parallel_algorithm.d 
).

To make it (somewhat) pretty, I need to be able to tokenize a single 
statement worth of D source code at compile time.  Right now, the syntax 
requires manual tokenization:

mixin(parallelArrayOp(
    "lhs[]", "=", "op1[]", "*", "op2[]", "/", "op3[]"
));

where lhs, op1, op2, op3 are arrays.
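For readers unfamiliar with the approach: parallelArrayOp is a CTFE function that returns D source code as a string, which the mixin then compiles in place. A hypothetical sketch of such a generator (the chunk size and the use of std.parallelism's parallel foreach are assumptions here, not the actual std.parallel_algorithm implementation) might look like:

```d
// Hypothetical sketch of a string-mixin generator for parallel array ops.
// Assumes the mixin site has `parallel`, `iota`, and `chunks` in scope
// (std.parallelism / std.range) and that every "name[]" operand has the
// same length as lhs.
string parallelArrayOp(string[] tokens...)
{
    string expr;
    foreach (t; tokens)
    {
        // Rewrite each slice operand "name[]" as an indexed access "name[i]".
        if (t.length > 2 && t[$ - 2 .. $] == "[]")
            expr ~= t[0 .. $ - 2] ~ "[i]";
        else
            expr ~= " " ~ t ~ " ";
    }
    // Distribute index ranges across the task pool, 4096 elements per chunk.
    return "foreach (r; parallel(iota(0, lhs.length).chunks(4096)))
                foreach (i; r) " ~ expr ~ ";";
}
```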

I'd like it to be something like:

mixin(parallelArrayOp(
     "lhs[] = op1[] * op2[] / op3[]"
));

Does anyone have/is there any easy way to write a compile-time D tokenizer?
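For comparison, even a bare-bones splitter like the one below runs fine under CTFE. This is an illustrative sketch only, nowhere near a conforming D lexer: it handles no string literals, comments, or multi-character operators.

```d
// Minimal CTFE-friendly splitter: separates identifiers/numbers from
// single-character operator tokens and drops whitespace.
string[] tokenize(string s)
{
    import std.ascii : isAlphaNum, isWhite;

    string[] toks;
    size_t i;
    while (i < s.length)
    {
        if (isWhite(s[i])) { ++i; continue; }
        size_t start = i;
        if (isAlphaNum(s[i]) || s[i] == '_')
            while (i < s.length && (isAlphaNum(s[i]) || s[i] == '_')) ++i;
        else
            ++i;                       // one-character operator token
        toks ~= s[start .. i];
    }
    return toks;
}

static assert(tokenize("lhs[] = op1[] * op2[]")
    == ["lhs", "[", "]", "=", "op1", "[", "]", "*", "op2", "[", "]"]);
```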
Aug 25 2011
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 08/26/2011 03:08 AM, dsimcha wrote:
 I'm working on a parallel array ops implementation for
 std.parallel_algorithm. (For the latest work in progress see
 https://github.com/dsimcha/parallel_algorithm/blob/master/parallel_algorithm.d
 ).

 To make it (somewhat) pretty, I need to be able to tokenize a single
 statement worth of D source code at compile time. Right now, the syntax
 requires manual tokenization:

 mixin(parallelArrayOp(
 "lhs[]", "=", "op1[]", "*", "op2[]", "/", "op3[]"
 ));

That is not real tokenization. Can you go with "lhs", "[", "]", "=", ... ?
 where lhs, op1, op2, op3 are arrays.

 I'd like it to be something like:

 mixin(parallelArrayOp(
 "lhs[] = op1[] * op2[] / op3[]"
 ));

 Does anyone have/is there any easy way to write a compile-time D tokenizer?

I have written an almost complete tokenizer in D. Making it compile-time should be rather trivial; I will give it a try. (It will also convert embedded numerals to the correct type, etc.)

If you don't need a complete tokenizer: what are the tokenizer features you need?
Aug 25 2011
prev sibling parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 26.08.2011 03:08, dsimcha wrote:
 I'm working on a parallel array ops implementation for
 std.parallel_algorithm. (For the latest work in progress see
 https://github.com/dsimcha/parallel_algorithm/blob/master/parallel_algorithm.d
 ).

 To make it (somewhat) pretty, I need to be able to tokenize a single
 statement worth of D source code at compile time. Right now, the syntax
 requires manual tokenization:

 mixin(parallelArrayOp(
 "lhs[]", "=", "op1[]", "*", "op2[]", "/", "op3[]"
 ));

 where lhs, op1, op2, op3 are arrays.

 I'd like it to be something like:

 mixin(parallelArrayOp(
 "lhs[] = op1[] * op2[] / op3[]"
 ));

 Does anyone have/is there any easy way to write a compile-time D tokenizer?

The lexer used by Visual D is also CTFE capable: http://www.dsource.org/projects/visuald/browser/trunk/vdc/lexer.d

As Timon pointed out, it will separate into D tokens, not the more combined elements in your array.

Here's my small CTFE test:

///////////////////////////////////////////////////////////////////////
int[] ctfeLexer(string s)
{
    Lexer lex;
    int state;
    uint pos;
    int[] ids;

    while(pos < s.length)
    {
        uint prevpos = pos;
        int id;
        int type = lex.scan(state, s, pos, id);
        assert(prevpos < pos);
        if(!Lexer.isCommentOrSpace(type, s[prevpos .. pos]))
            ids ~= id;
    }
    return ids;
}

unittest
{
    static assert(ctfeLexer(q{int /* comment to skip */ a;})
                  == [ TOK_int, TOK_Identifier, TOK_semicolon ]);
}

If you want the tokens as strings rather than just the token ID, you can collect "s[prevpos .. pos]" instead of "id" into an array.
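Following Rainer's closing remark, the string-collecting variant (a sketch assuming the same vdc.lexer API as his example: Lexer.scan and Lexer.isCommentOrSpace) could look like:

```d
// Variant of ctfeLexer that collects token text instead of token IDs.
// Assumes the vdc.lexer module's Lexer type is in scope.
string[] ctfeLexStrings(string s)
{
    Lexer lex;
    int state;
    uint pos;
    string[] toks;

    while (pos < s.length)
    {
        uint prevpos = pos;
        int id;
        int type = lex.scan(state, s, pos, id);
        // Keep the slice of source text covered by this token.
        if (!Lexer.isCommentOrSpace(type, s[prevpos .. pos]))
            toks ~= s[prevpos .. pos];
    }
    return toks;
}
```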
Aug 26 2011
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Rainer Schuetze (r.sagitario gmx.de)'s article
 The lexer used by Visual D is also CTFE capable:
 http://www.dsource.org/projects/visuald/browser/trunk/vdc/lexer.d
 As Timon pointed out, it will separate into D tokens, not the more
 combined elements in your array.
 Here's my small CTFE test:

Thanks, but I've come to the conclusion that this lexer is way too big a dependency for something as small as parallel array ops, unless it were to be integrated into Phobos by itself. I'll just stick with the ugly syntax. Unfortunately, according to my benchmarks array ops may be so memory bandwidth-bound that parallelization doesn't yield very good speedups anyhow.
Aug 26 2011
parent reply Don <nospam nospam.com> writes:
dsimcha wrote:
 == Quote from Rainer Schuetze (r.sagitario gmx.de)'s article
 The lexer used by Visual D is also CTFE capable:
 http://www.dsource.org/projects/visuald/browser/trunk/vdc/lexer.d
 As Timon pointed out, it will separate into D tokens, not the more
 combined elements in your array.
 Here's my small CTFE test:

Thanks, but I've come to the conclusion that this lexer is way too big a dependency for something as small as parallel array ops, unless it were to be integrated into Phobos by itself. I'll just stick with the ugly syntax. Unfortunately, according to my benchmarks array ops may be so memory bandwidth-bound that parallelization doesn't yield very good speedups anyhow.

Totally. Anything below BLAS3 is memory-limited, not CPU-limited. Even then, cache prefetching has as big an impact as the number of processors.
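A back-of-envelope calculation (with an assumed bandwidth figure) illustrates why: lhs[] = op1[] * op2[] / op3[] moves four doubles of memory traffic per element but performs only two floating-point operations, so the memory bus caps throughput no matter how many cores run the loop.

```d
void main()
{
    // Per element: read op1, op2, op3 and write lhs -> 4 * 8 = 32 bytes,
    // for one multiply and one divide -> 2 FLOPs.
    enum bytesPerElement = 4 * double.sizeof;   // 32 bytes
    enum flopsPerElement = 2;

    // Assumed sustained memory bandwidth for a circa-2011 desktop (GB/s).
    enum memBandwidthGB = 10.0;

    // Ceiling the memory bus imposes, regardless of core count.
    enum gflopsCeiling = memBandwidthGB * flopsPerElement / bytesPerElement;
    static assert(gflopsCeiling == 0.625);      // well under one GFLOPS
}
```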
Aug 27 2011
parent dsimcha <dsimcha yahoo.com> writes:
On 8/27/2011 5:37 AM, Don wrote:
 dsimcha wrote:
 == Quote from Rainer Schuetze (r.sagitario gmx.de)'s article
 The lexer used by Visual D is also CTFE capable:
 http://www.dsource.org/projects/visuald/browser/trunk/vdc/lexer.d
 As Timon pointed out, it will separate into D tokens, not the more
 combined elements in your array.
 Here's my small CTFE test:

Thanks, but I've come to the conclusion that this lexer is way too big a dependency for something as small as parallel array ops, unless it were to be integrated into Phobos by itself. I'll just stick with the ugly syntax. Unfortunately, according to my benchmarks array ops may be so memory bandwidth-bound that parallelization doesn't yield very good speedups anyhow.

Totally. Anything below BLAS3 is memory-limited, not CPU-limited. Even then, cache prefetching has as big an impact as the number of processors.

I think the "memory bandwidth-bound" statement actually applies to a lot of what I tried to do in std.parallel_algorithm. Much of it shows far-below-linear speedups, but this can't be explained by communication overhead, because the speedup relative to the serial algorithm doesn't improve when I make the problem and work-unit sizes huge.
Aug 27 2011