www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Context-Free Grammars? What about arrays?

reply %u <wfunction hotmail.com> writes:
Hi,

I think I'm having a little trouble understanding what's meant by context-free
grammar. I've read that D is context-free, but is it really? What about an
expression like:

int[U] s;

? You can't tell -- without looking at the context -- whether U is a data type
or a number, and so because associative arrays and regular arrays are
syntactically different elements of the language, the syntax of D is tied in
with its semantics, just like in C++.

So is D really context-free? Or am I misunderstanding the meaning of the term?

Thanks!
Feb 11 2011
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday 11 February 2011 01:03:21 %u wrote:
 Hi,
 
 I think I'm having a little trouble understanding what's meant by
 context-free grammar. I've read that D is context-free, but is it really?
 What about an expression like:
 
 int[U] s;
 
 ? You can't tell -- without looking at the context -- whether U is a data
 type or a number, and so because associative arrays and regular arrays are
 syntactically different elements of the language, the syntax of D is tied
 in with its semantics, just like in C++.
 
 So is D really context-free? Or am I misunderstanding the meaning of the
 term?

Yes. You're misunderstanding. It's actually quite difficult to parse a language if its grammar isn't context-free. You really should read up in context-free grammars if you want to understand it properly. Context-free means that it's unambiguous as to which grammar rule is to be applied at any point during the parsing process. In this case, U is a symbol. The parser probably doesn't care about much more than that. That symbol could end up being a number or a type. And actually, it probably isn't even that hard. U is a class, struct, alias, or template argument. The template argument and alias would be substituted with a real type or value before the compiler had to determine what type int[U] was. Not to mention, the context-free grammar is entirely for parsing tokens into an abstract syntax tree. The compiler then has to do whatever it does with the AST to actually generate assembly from it. So, as long as the parser can parse it into the AST, it could be the next stage's problem to determine what that really means. Regardless, I suggest that you read a compiler book if you really want to understand context-free grammars. I have "Compiler Construction: Principles and Practices" by Kenneth C. Louden, which is quite good, but the most popular one is the so-called "dragon" book. I'm not sure what it's actual title is though. - Jonathan M Davis
Feb 11 2011
parent Jacob Carlborg <doob me.com> writes:
On 2011-02-11 10:37, Jonathan M Davis wrote:
 On Friday 11 February 2011 01:03:21 %u wrote:
 Hi,

 I think I'm having a little trouble understanding what's meant by
 context-free grammar. I've read that D is context-free, but is it really?
 What about an expression like:

 int[U] s;

 ? You can't tell -- without looking at the context -- whether U is a data
 type or a number, and so because associative arrays and regular arrays are
 syntactically different elements of the language, the syntax of D is tied
 in with its semantics, just like in C++.

 So is D really context-free? Or am I misunderstanding the meaning of the
 term?

Yes. You're misunderstanding. It's actually quite difficult to parse a language if its grammar isn't context-free. You really should read up in context-free grammars if you want to understand it properly. Context-free means that it's unambiguous as to which grammar rule is to be applied at any point during the parsing process. In this case, U is a symbol. The parser probably doesn't care about much more than that. That symbol could end up being a number or a type. And actually, it probably isn't even that hard. U is a class, struct, alias, or template argument. The template argument and alias would be substituted with a real type or value before the compiler had to determine what type int[U] was. Not to mention, the context-free grammar is entirely for parsing tokens into an abstract syntax tree. The compiler then has to do whatever it does with the AST to actually generate assembly from it. So, as long as the parser can parse it into the AST, it could be the next stage's problem to determine what that really means. Regardless, I suggest that you read a compiler book if you really want to understand context-free grammars. I have "Compiler Construction: Principles and Practices" by Kenneth C. Louden, which is quite good, but the most popular one is the so-called "dragon" book. I'm not sure what it's actual title is though. - Jonathan M Davis

The title of the so called "dragon" book is "Compilers: Principles, Techniques, and Tools" and can be found here: http://www.amazon.com/Compilers-Principles-Techniques-Tools-2nd/dp/0321486811/ref=sr_1_1?s=books&ie=UTF8&qid=1297440057&sr=1-1 It's also listed on the bookshelf page on the DigitalMars site. -- /Jacob Carlborg
Feb 11 2011
prev sibling next sibling parent spir <denis.spir gmail.com> writes:
On 02/11/2011 10:03 AM, %u wrote:
 Hi,

 I think I'm having a little trouble understanding what's meant by context-free
 grammar. I've read that D is context-free, but is it really? What about an
 expression like:

 int[U] s;

 ? You can't tell -- without looking at the context -- whether U is a data type
 or a number, and so because associative arrays and regular arrays are
 syntactically different elements of the language, the syntax of D is tied in
 with its semantics, just like in C++.

 So is D really context-free? Or am I misunderstanding the meaning of the term?

Have you tried it? int[U] cannot be a plain array because the size must be constant. But I agree there is some context in play for the compiler (or rather the linker?) to determine /that/: --> some error if U undefined --> some other error if defined, but neither uint nore type --> some other error if uint but not constant --> ass array if type (I guess) Denis -- _________________ vita es estrany spir.wikidot.com
Feb 11 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday 11 February 2011 03:22:54 spir wrote:
 On 02/11/2011 10:03 AM, %u wrote:
 Hi,
 
 I think I'm having a little trouble understanding what's meant by
 context-free grammar. I've read that D is context-free, but is it
 really? What about an expression like:
 
 int[U] s;
 
 ? You can't tell -- without looking at the context -- whether U is a data
 type or a number, and so because associative arrays and regular arrays
 are syntactically different elements of the language, the syntax of D is
 tied in with its semantics, just like in C++.
 
 So is D really context-free? Or am I misunderstanding the meaning of the
 term?

Have you tried it? int[U] cannot be a plain array because the size must be constant. But I agree there is some context in play for the compiler (or rather the linker?) to determine /that/: --> some error if U undefined --> some other error if defined, but neither uint nore type --> some other error if uint but not constant --> ass array if type (I guess)

Actually, it _can_ be a plain, static array. U could be a template argument which is a number. - Jonathan M Davis
Feb 11 2011
prev sibling parent reply Ary Manzana <ary esperanto.org.ar> writes:
On 2/11/11 6:03 AM, %u wrote:
 Hi,

 I think I'm having a little trouble understanding what's meant by context-free
 grammar. I've read that D is context-free, but is it really? What about an
 expression like:

 int[U] s;

 ? You can't tell -- without looking at the context -- whether U is a data type
 or a number, and so because associative arrays and regular arrays are
 syntactically different elements of the language, the syntax of D is tied in
 with its semantics, just like in C++.

 So is D really context-free? Or am I misunderstanding the meaning of the term?

 Thanks!

That will always parse to an associative array. Then in the semantic pass, if U is a constant expression that turns out to be an integer it is reinterpreted as a static array. Take a look: https://github.com/D-Programming-Language/dmd/blob/master/src/mtype.c#L3924
Feb 11 2011
parent reply %u <wfunction hotmail.com> writes:
 That will always parse to an associative array. Then in the semantic pass, if U

static array. Ah, interesting. But that describes the implementation (i.e. how the compiler copes with the ambiguity) and not the language itself.
 Context-free means that it's unambiguous as to which grammar rule is to be

The problem is that it _is_ ambiguous what rule to apply. To me, just because static arrays and associative arrays happen to have similar _looks_ doesn't make parsing them context-free -- they're defined with completely separate syntaxes, and they just happen to look identical. The fact that the compiler has to do a semantic analysis in order to figure this out (which was originally a syntax problem) really makes me wonder why you mention that D is still context-free.
 Not to mention, the context-free grammar is entirely for parsing tokens into an

Again, just because the AST's _happen_ to _look_ the same for static and associative arrays, does that mean that this makes D context-free? The meaning of the expression is still ambiguous, whether or not it fits well within the tree. Furthermore, I would say that a struct like this: struct Temp(T...) { int[T] arr; } should work perfectly if T.length == 0, since it would then be a dynamic array. But the parser is forced to treat it as an associative array, and so _both_ of these break: Temp!() a; Temp!(2) b; //Why does this break? Why should the second one break, in any case?
Feb 11 2011
next sibling parent %u <wfunction hotmail.com> writes:
Sorry, please ignore the very last line:
   Temp!(2) b;      //Why does this break?

I totally wasn't thinking about the fact that T is a tuple in "int[T] arr;". (I
would intuitively think that it should still work, given that a tuple is just a
bunch of types, but I can see why it wouldn't.)

However, why the previous statement breaks:
    Temp!() a;
is still puzzling me, since giving nothing to the tuple should be equivalent to
nothing being in the array.

Thanks.
Feb 11 2011
prev sibling next sibling parent "Nick Sabalausky" <a a.a> writes:
"%u" <wfunction hotmail.com> wrote in message 
news:ij3la9$t53$1 digitalmars.com...
 The problem is that it _is_ ambiguous what rule to apply. To me, just 
 because
 static arrays and associative arrays happen to have similar _looks_ 
 doesn't make
 parsing them context-free -- they're defined with completely separate 
 syntaxes,
 and they just happen to look identical.

Yes, they just happen to look identical: but that's the crux: How it "looks" is the *only* thing that parser/grammar are concerned with. The fact that it could be one of two different *types* is irrelevent since type systems are purely a semantic matter and (at least in D) what type something is never affects how the code's *structure* is interpreted. True, there are times when different types are distinguished via different syntaxes (for instance, function vs integer vs array), but that's incidental. The important thing is that that semantic-level type information never actually needs to feed back into the parser. In the case of "int[U] s;", the *only* thing the parser is concerned with is "Ok, this is a variable declaration, and a variable declaration is a statement, and a statement can occur in x, y and z places". All of that stays exactly the same no matter what "U" is and no matter what type "int[U]" is. Anything beyond that is not the parser's job. Consider this: double a; bool b; Two different variables that are treated very differently. But the *structure* is exactly the same: {type} {identifier} {semicolon}. In all cases, that means "variable declaration". And a variable declaration is a statement, etc. For both variables, that structure is identical, and that's all that the parser is concerned with. In a grammar-definition that would look something like this: <Var Decl> ::= <Type> <Ident> ';' <Statement> ::= <Var Decl> | any other type of statement here | etc Likewise, in the case of "int[U] s;" we have: <Var Decl> ::= <Type> '[' <Ident> ']' <Ident> ';' Note that everything about that is true no matter what the ident inside the brackets is. Also note that there is *nothing* in there the pulls in any information from the semantic phase. If it did, it wouldn't be context-free. If it worked like this: <Array Decl> ::= <Type> '[' <Ident> ']' <Ident> ';' <Assoc Array Decl> ::= <Type> '[' <Ident> ']' <Ident> ';' ...*Then* it would no longer be context-free because the parser would have to rely on the semantics of 'U' to determine how to parse it. But D doesn't do that. It doesn't even bother distinguishing between array declarations and assoc array declarations until the semantic phase. Therefore the grammar (ie, the parsing) is still context-free. One other thing that would prevent it from being context-free would be if the left-side of a production rule had more than one token: <A> <B> ::= etc...
 The fact that the compiler has to do a
 semantic analysis in order to figure this out (which was originally a 
 syntax
 problem) really makes me wonder why you mention that D is still 
 context-free.

It's not a syntax problem unless the grammar defines it to be a syntax problem. Anything that's defined in the grammar is syntax; anything that isn't defined in the grammar is not syntax. Also, remember that grammar only apples to lexing/parsing and does not specify anything about semantics (and "sementics" effectively means "anything that isn't handled by the lexer or the parser").
Feb 14 2011
prev sibling parent reply "Nick Sabalausky" <a a.a> writes:
"%u" <wfunction hotmail.com> wrote in message 
news:ij3la9$t53$1 digitalmars.com...
 Again, just because the AST's _happen_ to _look_ the same for static and
 associative arrays, does that mean that this makes D context-free?

Grammar is *just* about how it *looks*, not what it means.
 The meaning of
 the expression is still ambiguous, whether or not it fits well within the 
 tree.

"Meaning" is semanitcs, not syntax. Therefore, whether or not the *meaning* is ambiguous is irrelevent to the grammar. The parser/grammar does *not* deal with meaning, just structure.
Feb 14 2011
parent %u <wfunction hotmail.com> writes:
 Yes, they just happen to look identical: but that's the crux: How it

 If it worked like this:
 <Array Decl> ::= <Type> '[' <Ident> ']' <Ident> ';'
 <Assoc Array Decl> ::= <Type> '[' <Ident> ']' <Ident> ';'
 ...*Then* it would no longer be context-free because the parser

it. Ahh that makes sense. Thank you for the great explanation! :)
Feb 16 2011