digitalmars.D.learn - struct vs class for a simple token in my d lexer

Roman D. Boiko (5/5) May 14 2012 (Subj.) I'm in doubt which to choose for my case, but this is a

Tobias Pankrath (9/14) May 14 2012 struct Token

Roman D. Boiko (10/27) May 14 2012 Yes, except that the array of tokens for big files requires a lot

Tobias Pankrath (13/32) May 14 2012 You'll still have count the space the tokens claim on the heap.

Roman D. Boiko (17/49) May 14 2012 I use sorted array of tokens for efficient 0(log N) lookup by its

Jonathan M Davis (23/30) May 14 2012 For the question in general, there are primarily 2 questions to consider...

Roman D. Boiko (9/54) May 14 2012 Semantics is clearly of value type. But it would be nice to

Dmitry Olshansky (24/83) May 14 2012 Use the damn pointer ;)

Dmitry Olshansky (11/94) May 14 2012 if(!head)

Roman D. Boiko (2/5) May 14 2012 Thanks, most likely I'll go your way.

Era Scarecrow (7/12) May 14 2012 If you don't mind doing some lite informational reading, read up
Era Scarecrow (7/12) May 14 2012 If you don't mind doing some lite informational reading, read up

Roman D. Boiko (3/15) May 14 2012 I've got flex & bison edition (ISBN: 0596155972), will check
Roman D. Boiko (5/11) May 15 2012 Looks like I can introduce several lex/flex concepts in my lexer,

Dmitry Olshansky (46/51) May 14 2012 On 14.05.2012 19:10, Roman D. Boiko wrote:

Tobias Pankrath (2/19) May 14 2012 I wouldn't convert the string representation to a value unless

Dmitry Olshansky (14/34) May 14 2012 Thx for snipping out the gist of it :)

Roman D. Boiko (3/59) May 14 2012 Thanks Dmitry, I think I'll need to contact you privately about

"Roman D. Boiko" <rb d-coding.com> writes:

(Subj.) I'm in doubt which to choose for my case, but this is a 
generic question.

http://forum.dlang.org/post/odcrgqxoldrktdtarskf forum.dlang.org

Cross-posting here. I would appreciate any feedback. (Whether to 
reply in this or that thread is up to you.) Thanks

May 14 2012

"Tobias Pankrath" <tobias pankrath.net> writes:

On Monday, 14 May 2012 at 15:10:25 UTC, Roman D. Boiko wrote:
 (Subj.) I'm in doubt which to choose for my case, but this is a 
 generic question.

 http://forum.dlang.org/post/odcrgqxoldrktdtarskf forum.dlang.org

 Cross-posting here. I would appreciate any feedback. (Whether 
 to reply in this or that thread is up to you.) Thanks


struct Token
{
     TokenType type;
     string content; // type may as well be template parameter
     Area loc;   // basically: ((x1, y1), (x2, y2))
}

That's how my Token look like. The 2 or 3 times every token get's 
copied isn't worth a heap allocation with small type like this.

May 14 2012

"Roman D. Boiko" <rb d-coding.com> writes:

On Monday, 14 May 2012 at 15:44:54 UTC, Tobias Pankrath wrote:
 On Monday, 14 May 2012 at 15:10:25 UTC, Roman D. Boiko wrote:
 (Subj.) I'm in doubt which to choose for my case, but this is 
 a generic question.

 http://forum.dlang.org/post/odcrgqxoldrktdtarskf forum.dlang.org

 Cross-posting here. I would appreciate any feedback. (Whether 
 to reply in this or that thread is up to you.) Thanks


 struct Token
 {
     TokenType type;
     string content; // type may as well be template parameter
     Area loc;   // basically: ((x1, y1), (x2, y2))
 }

 That's how my Token look like. The 2 or 3 times every token 
 get's copied isn't worth a heap allocation with small type like 
 this.

Yes, except that the array of tokens for big files requires a lot 
of **continuous** space (13.5M for std.datetime, up to 2M for a 
few others in phobos & druntime). The array of references would 4 
times less. Out of memory usually happens when the system cannot 
find a big enough chunck of memory, not when there is no 
available memory.

On the contrary, I think that 10-20M on rare cases is not too 
much.

Why I use the array I'll describe in the next post.

May 14 2012

"Tobias Pankrath" <tobias pankrath.net> writes:

Quoting your post in another thread:

On Monday, 14 May 2012 at 15:10:25 UTC, Roman D. Boiko wrote:
 Making it a class would give several benefits:

* allow not to worry about allocating a big array of tokens. 
E.g., on 64-bit OS the largest module in Phobos (IIRC, the 
std.datetime) consumes 13.5MB in an array of almost 500K tokens. 
It would require 4 times smaller chunk of contiguous memory if 
it was an array of class objects, because each would consume 
only 8 bytes instead of 32.

You'll still have count the space the tokens claim on the heap. 
So it's
basically the 500k tokens plus 500k references. I'm not sure, why 
you would need such a big array of tokens, though.

Aren't they produced by the lexer to be directly consumed and 
discarded by the parser?

 * allow subclassing, for example, for storing strongly typed 
 literal values; this flexibility could also facilitate future 
 extensibility (but it's difficult to predict which kind of 
 extension may be needed)

If performance matters, why would you subclass and risk a virtual 
method call for something as basic as tokens?


 * there would be no need to copy data from tokens into AST, 
 passing an object would be enough (again, copy 8 instead of 32 
 bytes); the same applies to passing into methods - no need to 
 pass by ref to minimise overhead

I'm using string to store source content in tokens. Because of 
the way string in D works, there is no need for data copies.

 These considerations are mostly about performance. I think 
 there is also some impact on design, but couldn't find anything 
 significant (given that currently I see a token as merely a 
 datastructure without associated behavior).

IMO token are value types.

May 14 2012

"Roman D. Boiko" <rb d-coding.com> writes:

On Monday, 14 May 2012 at 15:53:34 UTC, Tobias Pankrath wrote:
 Quoting your post in another thread:

 On Monday, 14 May 2012 at 15:10:25 UTC, Roman D. Boiko wrote:
 Making it a class would give several benefits:

* allow not to worry about allocating a big array of tokens. 
E.g., on 64-bit OS the largest module in Phobos (IIRC, the 
std.datetime) consumes 13.5MB in an array of almost 500K 
tokens. It would require 4 times smaller chunk of contiguous 
memory if it was an array of class objects, because each would 
consume only 8 bytes instead of 32.

 You'll still have count the space the tokens claim on the heap. 
 So it's
 basically the 500k tokens plus 500k references. I'm not sure, 
 why you would need such a big array of tokens, though.

 Aren't they produced by the lexer to be directly consumed and 
 discarded by the parser?

I use sorted array of tokens for efficient 0(log N) lookup by its 
index (the first code unit of token). (Since tokens are created 
in increasing order of start indices, no further sorting is 
needed.) Lookup is used for two purposes:
* find the token corresponding to location of cursor (e.g., for 
auto-complete)
* combined with 0(log M) lookup in the ordered array of first 
line code unit indices, calculate the Location (line number and 
column number) for start / end of a token on demand (they are not 
pre-calculated because not used frequently); this approach also 
makes it easy to calculate Location either taking into account 
special token sequences (#line 3 "ab/c.d"), or ignoring them.

 * allow subclassing, for example, for storing strongly typed 
 literal values; this flexibility could also facilitate future 
 extensibility (but it's difficult to predict which kind of 
 extension may be needed)

 If performance matters, why would you subclass and risk a 
 virtual method call for something as basic as tokens?

Agree, but not sure. That's why I created this thread.

 * there would be no need to copy data from tokens into AST, 
 passing an object would be enough (again, copy 8 instead of 32 
 bytes); the same applies to passing into methods - no need to 
 pass by ref to minimise overhead

 I'm using string to store source content in tokens. Because of 
 the way string in D works, there is no need for data copies.

The same do I. But size of string field is still 16 bytes (half 
of my token size).

 These considerations are mostly about performance. I think 
 there is also some impact on design, but couldn't find 
 anything significant (given that currently I see a token as 
 merely a datastructure without associated behavior).

 IMO token are value types.

The value type might be implemented as struct or class.

May 14 2012

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Monday, May 14, 2012 17:10:23 Roman D. Boiko wrote:
 (Subj.) I'm in doubt which to choose for my case, but this is a
 generic question.
 
 http://forum.dlang.org/post/odcrgqxoldrktdtarskf forum.dlang.org
 
 Cross-posting here. I would appreciate any feedback. (Whether to
 reply in this or that thread is up to you.) Thanks

For the question in general, there are primarily 2 questions to consider:

1. Do you need inheritance/polymorphism?

2. Do you need a value type or a reference type?

If you need inheritance/polymorphism, you have to use classes, because structs 
don't have inheritance.

If you want a value type, you have to use structs, because classes are 
reference types.

If you want a reference type, then you can use either a class or a struct, but 
it's easier with classes, because classes are always reference types, whereas 
structs are naturally value types, so it can take more work to make them 
reference types depending on what its member variables are.

One other thing to consider is deterministic destruction. The only way to get 
deterministic destruction is to have a struct on the stack. Value types will 
naturally get that, but if you want a reference type with determinstic 
destruction, then you're probably going to have to use a ref-counted struct.

In the vast majority of cases, that should be enough to decide whether you 
need a class or a struct. The main case that it doesn't is the case where you 
want a reference type and don't necessarily want a class, and that decision 
can get complicated.

In the case of a token in a lexer, I would definitely expect that to be a value 
type, which means using a struct.

- Jonathan M Davis

May 14 2012

"Roman D. Boiko" <rb d-coding.com> writes:

On Monday, 14 May 2012 at 16:41:39 UTC, Jonathan M Davis wrote:
 On Monday, May 14, 2012 17:10:23 Roman D. Boiko wrote:
 (Subj.) I'm in doubt which to choose for my case, but this is a
 generic question.
 
 http://forum.dlang.org/post/odcrgqxoldrktdtarskf forum.dlang.org
 
 Cross-posting here. I would appreciate any feedback. (Whether 
 to
 reply in this or that thread is up to you.) Thanks

 For the question in general, there are primarily 2 questions to 
 consider:

 1. Do you need inheritance/polymorphism?
 don't have inheritance.

 2. Do you need a value type or a reference type?

 If you need inheritance/polymorphism, you have to use classes, 
 because structs

 If you want a value type, you have to use structs, because 
 classes are
 reference types.

 If you want a reference type, then you can use either a class 
 or a struct, but
 it's easier with classes, because classes are always reference 
 types, whereas
 structs are naturally value types, so it can take more work to 
 make them
 reference types depending on what its member variables are.

 One other thing to consider is deterministic destruction. The 
 only way to get
 deterministic destruction is to have a struct on the stack. 
 Value types will
 naturally get that, but if you want a reference type with 
 determinstic
 destruction, then you're probably going to have to use a 
 ref-counted struct.

 In the vast majority of cases, that should be enough to decide 
 whether you
 need a class or a struct. The main case that it doesn't is the 
 case where you
 want a reference type and don't necessarily want a class, and 
 that decision
 can get complicated.

 In the case of a token in a lexer, I would definitely expect 
 that to be a value
 type, which means using a struct.

 - Jonathan M Davis

Semantics is clearly of value type. But it would be nice to
minimize size of instance to be copied. (I could simply split it
into two, if some part is not used in most scenarios.)

Just curious: how to implement a struct as ref type? I could add
indirection (but then why not just use a class?), or simply pass
by ref into all methods. Are there any other options?

Thanks to everybody for your feedback, it looks like I'll stay
with struct.

May 14 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 14.05.2012 21:20, Roman D. Boiko wrote:
 On Monday, 14 May 2012 at 16:41:39 UTC, Jonathan M Davis wrote:
 On Monday, May 14, 2012 17:10:23 Roman D. Boiko wrote:
 (Subj.) I'm in doubt which to choose for my case, but this is a
 generic question.

 http://forum.dlang.org/post/odcrgqxoldrktdtarskf forum.dlang.org

 Cross-posting here. I would appreciate any feedback. (Whether to
 reply in this or that thread is up to you.) Thanks

 For the question in general, there are primarily 2 questions to consider:

 1. Do you need inheritance/polymorphism?
 don't have inheritance.

 2. Do you need a value type or a reference type?

 If you need inheritance/polymorphism, you have to use classes, because
 structs

 If you want a value type, you have to use structs, because classes are
 reference types.

 If you want a reference type, then you can use either a class or a
 struct, but
 it's easier with classes, because classes are always reference types,
 whereas
 structs are naturally value types, so it can take more work to make them
 reference types depending on what its member variables are.

 One other thing to consider is deterministic destruction. The only way
 to get
 deterministic destruction is to have a struct on the stack. Value
 types will
 naturally get that, but if you want a reference type with determinstic
 destruction, then you're probably going to have to use a ref-counted
 struct.

 In the vast majority of cases, that should be enough to decide whether
 you
 need a class or a struct. The main case that it doesn't is the case
 where you
 want a reference type and don't necessarily want a class, and that
 decision
 can get complicated.

 In the case of a token in a lexer, I would definitely expect that to
 be a value
 type, which means using a struct.

 - Jonathan M Davis

 Semantics is clearly of value type. But it would be nice to
 minimize size of instance to be copied. (I could simply split it
 into two, if some part is not used in most scenarios.)

 Just curious: how to implement a struct as ref type? I could add
 indirection (but then why not just use a class?), or simply pass
 by ref into all methods. Are there any other options?

Use the damn pointer ;)

In a few words (lines) overwrite struct contents (first 4-8 bytes) with 
pointer to next free token:

//setup:
void get_some_memory()
{
	Token pool = new Token[block_size];
	for(size_t i=0;i<pool.length-1; i++)
		*(void**)(tok.ptr+i) =   tok.ptr+i+1;
	*(void**)&tok[$-1]  = null; //end of list
}
Token* head = pool.ptr;//this guy in TLS

//usage:
Token* new_tok(){
	return head ? get_some_memory(), head : head;
}

void free_tok(Token* tok){
	*(void**)tok =  head;
	head = tok;
}

Untested but trustworthy ;)

 Thanks to everybody for your feedback, it looks like I'll stay
 with struct.


-- 
Dmitry Olshansky

May 14 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 14.05.2012 21:33, Dmitry Olshansky wrote:
 On 14.05.2012 21:20, Roman D. Boiko wrote:
 On Monday, 14 May 2012 at 16:41:39 UTC, Jonathan M Davis wrote:
 On Monday, May 14, 2012 17:10:23 Roman D. Boiko wrote:
 (Subj.) I'm in doubt which to choose for my case, but this is a
 generic question.

 http://forum.dlang.org/post/odcrgqxoldrktdtarskf forum.dlang.org

 Cross-posting here. I would appreciate any feedback. (Whether to
 reply in this or that thread is up to you.) Thanks

 For the question in general, there are primarily 2 questions to
 consider:

 1. Do you need inheritance/polymorphism?
 don't have inheritance.

 2. Do you need a value type or a reference type?

 If you need inheritance/polymorphism, you have to use classes, because
 structs

 If you want a value type, you have to use structs, because classes are
 reference types.

 If you want a reference type, then you can use either a class or a
 struct, but
 it's easier with classes, because classes are always reference types,
 whereas
 structs are naturally value types, so it can take more work to make them
 reference types depending on what its member variables are.

 One other thing to consider is deterministic destruction. The only way
 to get
 deterministic destruction is to have a struct on the stack. Value
 types will
 naturally get that, but if you want a reference type with determinstic
 destruction, then you're probably going to have to use a ref-counted
 struct.

 In the vast majority of cases, that should be enough to decide whether
 you
 need a class or a struct. The main case that it doesn't is the case
 where you
 want a reference type and don't necessarily want a class, and that
 decision
 can get complicated.

 In the case of a token in a lexer, I would definitely expect that to
 be a value
 type, which means using a struct.

 - Jonathan M Davis

 Semantics is clearly of value type. But it would be nice to
 minimize size of instance to be copied. (I could simply split it
 into two, if some part is not used in most scenarios.)

 Just curious: how to implement a struct as ref type? I could add
 indirection (but then why not just use a class?), or simply pass
 by ref into all methods. Are there any other options?

 Use the damn pointer ;)

 In a few words (lines) overwrite struct contents (first 4-8 bytes) with
 pointer to next free token:

 //setup:
 void get_some_memory()
 {
 Token pool = new Token[block_size];
 for(size_t i=0;i<pool.length-1; i++)
 *(void**)(tok.ptr+i) = tok.ptr+i+1;
 *(void**)&tok[$-1] = null; //end of list
 }
 Token* head = pool.ptr;//this guy in TLS

 //usage:
 Token* new_tok(){

	if(!head)
		get_some_memory();
	Token* r = head;
	head = *(token**)head;
	return r;
 }

Not so trustworthy :o)
But hopefully you get the idea. See something simillar in std.regex, 
though pointer in there also serves for intrusive linked-list.

 void free_tok(Token* tok){
 *(void**)tok = head;
 head = tok;
 }

 Untested but trustworthy ;)

 Thanks to everybody for your feedback, it looks like I'll stay
 with struct.



-- 
Dmitry Olshansky

May 14 2012

"Roman D. Boiko" <rb d-coding.com> writes:

On Monday, 14 May 2012 at 17:37:02 UTC, Dmitry Olshansky wrote:
 But hopefully you get the idea. See something simillar in 
 std.regex, though pointer in there also serves for intrusive 
 linked-list.

Thanks, most likely I'll go your way.

May 14 2012

"Era Scarecrow" <rtcvb32 yahoo.com> writes:

On Monday, 14 May 2012 at 18:00:42 UTC, Roman D. Boiko wrote:
 On Monday, 14 May 2012 at 17:37:02 UTC, Dmitry Olshansky wrote:
 But hopefully you get the idea. See something simillar in 
 std.regex, though pointer in there also serves for intrusive 
 linked-list.


 Thanks, most likely I'll go your way.

  If you don't mind doing some lite informational reading, read up 
on 'lex and yacc'. ISBN: 1-56592-000-7; Hope I got that right. 
I'm not telling you to use C (or the other tools mind you), but 
it does go into handling tokens of multiple types as well as 
managing them using pointers and unions. A good portion of that 
can carry over for your lexer and project.

May 14 2012

"Era Scarecrow" <rtcvb32 yahoo.com> writes:

On Monday, 14 May 2012 at 18:00:42 UTC, Roman D. Boiko wrote:
 On Monday, 14 May 2012 at 17:37:02 UTC, Dmitry Olshansky wrote:
 But hopefully you get the idea. See something simillar in 
 std.regex, though pointer in there also serves for intrusive 
 linked-list.


 Thanks, most likely I'll go your way.

  If you don't mind doing some lite informational reading, read up 
on 'lex and yacc'. ISBN: 1-56592-000-7; Hope I got that right. 
I'm not telling you to use C (or the other tools mind you), but 
it does go into handling tokens of multiple types as well as 
managing them using pointers and unions. A good portion of that 
can carry over for your lexer and project.

May 14 2012

"Roman D. Boiko" <rb d-coding.com> writes:

On Tuesday, 15 May 2012 at 06:17:31 UTC, Era Scarecrow wrote:
 On Monday, 14 May 2012 at 18:00:42 UTC, Roman D. Boiko wrote:
 On Monday, 14 May 2012 at 17:37:02 UTC, Dmitry Olshansky wrote:
 But hopefully you get the idea. See something simillar in 
 std.regex, though pointer in there also serves for intrusive 
 linked-list.


 Thanks, most likely I'll go your way.

  If you don't mind doing some lite informational reading, read 
 up on 'lex and yacc'. ISBN: 1-56592-000-7; Hope I got that 
 right. I'm not telling you to use C (or the other tools mind 
 you), but it does go into handling tokens of multiple types as 
 well as managing them using pointers and unions. A good portion 
 of that can carry over for your lexer and project.

I've got flex & bison edition (ISBN: 0596155972), will check 
there, thanks :)

May 14 2012

"Roman D. Boiko" <rb d-coding.com> writes:

On Tuesday, 15 May 2012 at 06:17:31 UTC, Era Scarecrow wrote:
  If you don't mind doing some lite informational reading, read 
 up on 'lex and yacc'. ISBN: 1-56592-000-7; Hope I got that 
 right. I'm not telling you to use C (or the other tools mind 
 you), but it does go into handling tokens of multiple types as 
 well as managing them using pointers and unions. A good portion 
 of that can carry over for your lexer and project.

Looks like I can introduce several lex/flex concepts in my lexer,
thank you!
I've created a task for myself to do this investigation:
https://github.com/roman-d-boiko/dct/issues/42

May 15 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 14.05.2012 19:10, Roman D. Boiko wrote:
 (Subj.) I'm in doubt which to choose for my case, but this is a generic
 question.

 http://forum.dlang.org/post/odcrgqxoldrktdtarskf forum.dlang.org

 Cross-posting here. I would appreciate any feedback. (Whether to reply
 in this or that thread is up to you.) Thanks

On 14.05.2012 19:10, Roman D. Boiko wrote:
Oops, sorry I meant to post to NG only :) Repost:

Clearly you are puting too much pressure on Token.

In my mind it should be real simple:

struct Token{
     uint col, line;
     uint flags;//indicated info about token, serves as both type tag 
and flag set;
//indicates proper type once token was cooked (like "31.415926" -> 
3.145926e1) i.e. values are calculated
     union {
         string     chars;
         float     f_val;
         double     d_val;
         uint     uint_val;
         long     long_val;
         ulnog     ulong_val;
         //... anything else you may need (8 bytes are plenty)
     }//even then you may use up to 12bytes
     //total size == 24 or 20
};

Where:
     Each raw token at start has chars == slice of characters in text 
(or if not UTF-8 source = copy of source). Except for keywords and 
operators.
     Cooking is a process of calculating constant values and such (say 
populating symbols table will putting symbol id into token instead of 
leaving string slice). Do it on the fly or after the whole source - let 
the user choose.

Value types have nice property of being real fast, I suggest you to do 
at least some syntetic tests before going with ref-based stuff. Pushing 
4 word is cheap, indirection never is. Classes also have hidden mutex 
_monitor_ field so using 'class' can be best described as suicide.

Yet you may go with freelist of tokens (leaving them as structs). It's 
an old and proven way.


About row/col - if this stuff is encoded into Finite Automation (and you 
sure want to do something like DFA) it comes out almost at no cost.
The only disadvantage is complicating DFA tables with some 
irregularities of "true Unicode line ending sequences". It's more of 
nuisance then real problem though.

P.S. if you real bend on performance, I suggest to run sparate 
Aho-Corassic-style thing for keywords, it would greatly simplify (=speed 
up) DFA structure if keywords are not hardcoded into automation.


-- 
Dmitry Olshansky

May 14 2012

"Tobias Pankrath" <tobias pankrath.net> writes:

 struct Token{
     uint col, line;
     uint flags;//indicated info about token, serves as both 
 type tag and flag set;
 //indicates proper type once token was cooked (like "31.415926" 
 -> 3.145926e1) i.e. values are calculated
     union {
         string     chars;
         float     f_val;
         double     d_val;
         uint     uint_val;
         long     long_val;
         ulnog     ulong_val;
         //... anything else you may need (8 bytes are plenty)
     }//even then you may use up to 12bytes
     //total size == 24 or 20
 };

I wouldn't convert the string representation to a value unless 
needed. Even the parser does not need to know.

May 14 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 14.05.2012 21:16, Tobias Pankrath wrote:
 struct Token{
 uint col, line;
 uint flags;//indicated info about token, serves as both type tag and
 flag set;
 //indicates proper type once token was cooked (like "31.415926" ->
 3.145926e1) i.e. values are calculated
 union {
 string chars;
 float f_val;
 double d_val;
 uint uint_val;
 long long_val;
 ulnog ulong_val;
 //... anything else you may need (8 bytes are plenty)
 }//even then you may use up to 12bytes
 //total size == 24 or 20
 };

 I wouldn't convert the string representation to a value unless needed.
 Even the parser does not need to know.

Thx for snipping out the gist of it :)
See last sentence, flexibility is the king:
<<<
Where:
     Each raw token at start has chars == slice of characters in text 
(or if not UTF-8 source = copy of source). Except for keywords and 
operators.
     Cooking is a process of calculating constant values and such (say 
populating symbols table will putting symbol id into token instead of 
leaving string slice). Do it on the fly or after the whole source - let 
the user choose.



-- 
Dmitry Olshansky

May 14 2012

"Roman D. Boiko" <rb d-coding.com> writes:

On Monday, 14 May 2012 at 17:05:17 UTC, Dmitry Olshansky wrote:
 On 14.05.2012 19:10, Roman D. Boiko wrote:
 (Subj.) I'm in doubt which to choose for my case, but this is 
 a generic
 question.

 http://forum.dlang.org/post/odcrgqxoldrktdtarskf forum.dlang.org

 Cross-posting here. I would appreciate any feedback. (Whether 
 to reply
 in this or that thread is up to you.) Thanks

 On 14.05.2012 19:10, Roman D. Boiko wrote:
 Oops, sorry I meant to post to NG only :) Repost:

 Clearly you are puting too much pressure on Token.

 In my mind it should be real simple:

 struct Token{
     uint col, line;
     uint flags;//indicated info about token, serves as both 
 type tag and flag set;
 //indicates proper type once token was cooked (like "31.415926" 
 -> 3.145926e1) i.e. values are calculated
     union {
         string     chars;
         float     f_val;
         double     d_val;
         uint     uint_val;
         long     long_val;
         ulnog     ulong_val;
         //... anything else you may need (8 bytes are plenty)
     }//even then you may use up to 12bytes
     //total size == 24 or 20
 };

 Where:
     Each raw token at start has chars == slice of characters in 
 text (or if not UTF-8 source = copy of source). Except for 
 keywords and operators.
     Cooking is a process of calculating constant values and 
 such (say populating symbols table will putting symbol id into 
 token instead of leaving string slice). Do it on the fly or 
 after the whole source - let the user choose.

 Value types have nice property of being real fast, I suggest 
 you to do at least some syntetic tests before going with 
 ref-based stuff. Pushing 4 word is cheap, indirection never is. 
 Classes also have hidden mutex _monitor_ field so using 'class' 
 can be best described as suicide.

 Yet you may go with freelist of tokens (leaving them as 
 structs). It's an old and proven way.


 About row/col - if this stuff is encoded into Finite Automation 
 (and you sure want to do something like DFA) it comes out 
 almost at no cost.
 The only disadvantage is complicating DFA tables with some 
 irregularities of "true Unicode line ending sequences". It's 
 more of nuisance then real problem though.

 P.S. if you real bend on performance, I suggest to run sparate 
 Aho-Corassic-style thing for keywords, it would greatly 
 simplify (=speed up) DFA structure if keywords are not 
 hardcoded into automation.

Thanks Dmitry, I think I'll need to contact you privately about 
some details later.

May 14 2012

D Programming

C/C++ Programming

Other

digitalmars.D.learn - struct vs class for a simple token in my d lexer