digitalmars.D - why are types all keywords?

reply Greg Smith <greg siliconoptix.com> writes:
One of the problems with C/C++ is that you can't parse it unless
you know what words are  type names, or typedef/classes; and for
this reason C/C++ type names need to be keywords. D has modified
the syntax so that a parser does not need to know in advance that
certain identifiers represent user-defined types. I think this
is a great step forward.

The question is: why are all the built-in type names (and there
are a lot of them) still keywords? They don't need to be, and I don't
see how it can do any good to make them keywords. I count about
24 keywords which are types. These could all be predefined identifiers.

So why is this bad? Part of it is just a personal bias - if I plot
a chart of all the languages I've used, with 'niceness of language'
vs. 'number of keywords', there is a strong inverse correlation -
python is in one corner, and Dec Compiled BASIC (yes, I'm that old)
is far into the other corner.

But there are some good reasons to avoid superfluous
keywords. Keywords by  definition have the enforced meaning
everywhere - if you add new
keywords, you will break code which has any local, global, struct
member, or anything with the same name. I remember a long time
ago, a buddy was baffled that his C code wouldn't compile in
C++, it turned out he had a struct member called 'this' or 'catch'
or something (this was before the days of syntax coloring).
In D, if new types are ever added - or new predefined values such
as 'true' and 'false' - they can be added as predefined identifiers
without breaking anything. So, why not do it that way from the beginning?
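
To make the breakage concrete, here is a minimal sketch in Python (not D, and not taken from any real compiler) of a lexer that classifies words against a keyword set. The keyword sets, the `classify` helper and the sample source line are all made up for illustration. The same source text lexes differently once 'this' is added to the keyword set, which is what bit my buddy's C code.

```python
import re

# Hypothetical keyword sets: the "old" language, and a "new" one that adds two words.
OLD_KEYWORDS = {"struct", "int", "return"}
NEW_KEYWORDS = OLD_KEYWORDS | {"this", "catch"}

# A word (identifier-shaped) or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\S")

def classify(source, keywords):
    """Return (kind, text) pairs; a word is a KEYWORD only if it is in `keywords`."""
    tokens = []
    for text in TOKEN_RE.findall(source):
        if text in keywords:
            kind = "KEYWORD"
        elif text[0].isalpha() or text[0] == "_":
            kind = "IDENT"
        else:
            kind = "PUNCT"
        tokens.append((kind, text))
    return tokens

SOURCE = "struct node { int this; };"
old = classify(SOURCE, OLD_KEYWORDS)   # 'this' is an ordinary member name
new = classify(SOURCE, NEW_KEYWORDS)   # 'this' is now a keyword token
```

Under the old set the member name comes out as an IDENT and a declaration rule can match it; under the new set the parser sees a KEYWORD where only an identifier is allowed, and all it can report is a syntax error.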

Languages which implement built-in types (and constants) as predefined 
identifiers include Pascal and VHDL, and python (to the extent that it 
has type names, they are __builtin__ type objects and not keywords).

D does not define property names as keywords, why are 'true'
and 'false' and all those type names keywords?
It could be argued that 'this' doesn't need to be a keyword. I might 
want to have a struct member called 'this'; syntactically, it could be a 
predefined local variable.

In D as currently implemented,

	i = int + 2;

.. is a syntax error, whereas

         alias int myint;
          i = myint + 2;

  ... is syntactically legal, but disallowed at the semantic level.
Is this difference important or desirable?
Making 'int' a predefined identifier would cause these two to be
treated the same way in terms of compiler diagnostics.

It might be argued that it would be very dangerous to allow
functions to define a local variable called 'float'. In C, this
could break code which is secretly inserted by macros or #include.
But (a) D doesn't have these (b) *anything* can be broken in C by these
things. In any case, you can always make it illegal to redefine float as
a variable, while still allowing it in, say, struct namespaces. With
a keyword, no such distinction is possible.

- greg
Jul 08 2005
next sibling parent =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Greg Smith wrote:

 D does not define property names as keywords, why are 'true'
 and 'false' and all those type names keywords?

Good question, and I was asking the same thing actually...
(but didn't get any answers, so I eventually gave up on it)

Another strange thing is that "bool" is *not* a keyword,
as you would have expected the type of those two to be?

I wouldn't mind if bool, true and false were all moved to
e.g. std.stdbool (which could still be included by default?)
Just like in C99:
http://www.opengroup.org/onlinepubs/009695399/basedefs/stdbool.h.html

Then again, I'm secretly plotting the swift demise of the "bit" type,
so you probably shouldn't pay any attention to me. ;-)

That is: it would be even better if bool, true and false were
a new type of their own - but that isn't ever going to happen.

--anders
Jul 09 2005
prev sibling next sibling parent AJG <AJG_member pathlink.com> writes:
Hi,

# class Int { public int max = 1337; }
#
# int someFunc() {
#     Int int = new Int();
#     return (int.max); // Is this 1337 or 2147483647?
# } 

I think making intrinsic types keywords is a Good Thing™, but perhaps I'm not
getting your proposal correctly. Would you want something like the above to be
legal?

--AJG.



Jul 09 2005
prev sibling next sibling parent reply Hasan Aljudy <hasan.aljudy gmail.com> writes:
Greg Smith wrote:
 One of the problems with C/C++ is that you can't parse it unless
 you know what words are  type names, or typedef/classes; and for
 this reason C/C++ type names need to be keywords. D has modified
 the syntax so that a parser does not need to know in advance that
 certain identifiers represent user-defined types. I think this
 is a great step forward.
 
 The question is: why are all the built-in type names (and there
 are a lot of them) still keywords? They don't need to be, and I don't
 see how it can do any good to make them keywords. I count about
 24 keywords which are types. These could all be predefined identifiers.

I just don't get it ...

What's the point of making something like "int" not a keyword?

#int int; //wth?
#class int
#{
#    static int max = 1337; //wtf is int here? variable? type? class?
#}
#float double = int.max; //go figure
#double bit = cast(typeof( double )) int;
 
 So why is this bad? Part of it is just a personal bias - if I plot
 a chart of all the languages I've used, with 'niceness of language'
 vs. 'number of keywords', there is a strong inverse correlation -
 python is in one corner, and Dec Compiled BASIC (yes, I'm that old)
 is far into the other corner.

I hope the method in which you measure "niceness of language" doesn't include "number of keywords" ...
 
 But there are some good reasons to avoid superfluous
 keywords. Keywords by  definition have the enforced meaning
 everywhere - if you add new
 keywords, you will break code which has any local, global, struct
 member, or anything with the same name. 

A good compiler will quickly point out to you the error and hopefully it can be easily fixed, find and replace in files :)
 I remember a long time
 ago, a buddy was baffled that his C code wouldn't compile in
 C++, it turned out he had a struct member called 'this' or 'catch'
 or something (this was before the days of syntax coloring).

Why didn't his compiler tell him that "this" is a keyword?
 In D, if new types are ever added - or new predefined values such
 as 'true' and 'false' - they can be added as predefined identifiers
 without breaking anything. So, why not do it that way from the beginning?
 
 Languages which implement built-in types (and constants) as predefined 
 identifiers include Pascal and VHDL, and python (to the extent that it 
 has type names, they are __builtin__ type objects and not keywords).
 

 D does not define property names as keywords, why are 'true'
 and 'false' and all those type names keywords?

I think true and false are not just aliases for 0 and 1 (or at least, I hope so)
 It could be argued that 'this' doesn't need to be a keyword. I might 
 want to have a struct member called 'this'; syntactically, it could be a 
 predefined local variable.

yeah, that's a bad argument.
 
 In D as currently implemented,
 
     i = int + 2;
 
 .. is a syntax error, whereas
 
         alias int myint;
          i = myint + 2;
 
  ... is syntactically legal, but disallowed at the semantic level.
 Is this difference important or desirable?

I don't see your point .. both are errors.
 Making 'int' a predefined identifier would cause these two to be
 treated the same way in terms of compiler diagnostics.
 
 It might be argued that it would be very dangerous to allow
 functions to define a local variable called 'float'. In C, this
 could break code which is secretly inserted by macros or #include.
 But (a) D doesn't have these (b) *anything* can be broken in C by these
 things. In any case, you can always make it illegal to redefine float as
 a variable, while still allowing it in, say, struct namespaces. With
 a keyword, no such distinction is possible.
 
 - greg
 
 

I don't see one single real problem with the issue. If it ain't broken, don't fix it.
Jul 10 2005
next sibling parent =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Hasan Aljudy wrote:

 D does not define property names as keywords, why are 'true'
 and 'false' and all those type names keywords?

I think true and false are not just aliases for 0 and 1 (or atleast, I hope so)

Sorry, but in D: "true" is a constant bit of 1,
and "false" is a constant bit of 0.

    const bit true = 1;
    const bit false = 0;

They just happen to be implemented inside the D compiler itself...

    case TOKtrue:
        e = new IntegerExp(loc, 1, Type::tbit);
        nextToken();
        break;

    case TOKfalse:
        e = new IntegerExp(loc, 0, Type::tbit);
        nextToken();
        break;

See http://www.prowiki.org/wiki4d/wiki.cgi?BitsAndBools

--anders
Jul 11 2005
prev sibling parent reply Greg Smith <greg siliconoptix.com> writes:
Hasan Aljudy wrote:
 Greg Smith wrote:
 
 The question is: why are all the built-in type names (and there
 are a lot of them) still keywords? They don't need to be, and I don't
 see how it can do any good to make them keywords. I count about
 24 keywords which are types. These could all be predefined identifiers.

I just don't get it ...

What's the point of making something like "int" not a keyword?

#int int; //wth?
#class int
#{
#    static int max = 1337; //wtf is int here? variable? type? class?
#}
#float double = int.max; //go figure
#double bit = cast(typeof( double )) int;

Yes, this change would allow you to redefine int. It's possible in
other languages, and they haven't self-destructed as a result.
If this is a problem, you could make it illegal to redefine built-in
names in certain scopes. If they are keywords, then
this level of control is not possible.

My point is, there's no reason to make it a keyword, unless you want
it to always be (effectively) a special punctuation mark, in *all*
possible contexts, and you want to extend that to *all* the built-in
types, despite the fact that user-defined types don't have or need this
special treatment, and you don't mind putting in extra grammar rules to
deal with the fact that type names could be these keywords *or*
identifiers.
 
 So why is this bad? Part of it is just a personal bias - if I plot
 a chart of all the languages I've used, with 'niceness of language'
 vs. 'number of keywords', there is a strong inverse correlation -
 python is in one corner, and Dec Compiled BASIC (yes, I'm that old)
 is far into the other corner.

I hope the method in which you measure "niceness of language" doesn't include "number of keywords" ...

Actually, that was a contributing factor for the Compiled Basic. There
were several pages of keywords, and every word that had anything to do
with computing was in there somewhere. So you had to make variable names
with spelling errors in them, 'rekord'.

But, no, in general languages with a large number of keywords seem to be
designed along the principle that you should cram as much as possible
into the core language, and that shows in other ways as well. Also,
languages with a lot of keywords often have big, clumsy, bloated
grammars, and need to have a lot of keywords to direct the parser.

Keywords were invented for the purpose of adding extra punctuation to
the token set, to help the grammar. I don't see any point in making more
keywords than are needed for this purpose. D has >20 keywords which are
not needed for the grammar, and therefore the grammar is more
complicated than it needs to be, and error messages are generally less
informative as a side-effect.
 
 But there are some good reasons to avoid superfluous
 keywords. Keywords by  definition have the enforced meaning
 everywhere - if you add new
 keywords, you will break code which has any local, global, struct
 member, or anything with the same name. 

A good compiler will quickly point out to you the error and hopefully it can be easily fixed, find and replace in files :)

after they added 'xor' and 'and', etc, to the C++ keyword list, without checking first if it was OK with me :-).
 
 I remember a long time
 ago, a buddy was baffled that his C code wouldn't compile in
 C++, it turned out he had a struct member called 'this' or 'catch'
 or something (this was before the days of syntax coloring).

Why didn't his compiler tell him that "this" is a keyword?

Why on earth would it do that? It reported a syntax error,
since a keyword appeared in a position where it was not
allowed by the grammar. A lot of tokens other than 'identifier'
are allowed there - so you wouldn't even get something as helpful as
"error at 'try' - expected 'identifier'"
Try it with your favourite C++ compiler.
 
 In D, if new types are ever added - or new predefined values such
 as 'true' and 'false' - they can be added as predefined identifiers
 without breaking anything. So, why not do it that way from the beginning?


 
 D does not define property names as keywords, why are 'true'
 and 'false' and all those type names keywords?

I think true and false are not just aliases for 0 and 1 (or atleast, I hope so)

in contexts where they could otherwise be locally redefined.
 
 
 In D as currently implemented,

     i = int + 2;

 .. is a syntax error, whereas

         alias int myint;
          i = myint + 2;

  ... is syntactically legal, but disallowed at the semantic level.
 Is this difference important or desirable?

I don't see your point .. both are errors.

Here's the point:

 (1) they are both, essentially, the same error, why should they
     produce completely different error messages?
 (2) the error message you get for the second one,
     "can't do that to a type", is much more useful than the one you get
     for the first, "syntax error".
 (3) The compiler writer's job is more complex for no benefit:
     you have a grammar with two different ways of matching types, which
     may be certain keywords or any identifier; and that grammar rejects
     certain erroneous constructs, such as adding things to built-in
     types, but you *still* need to check that expressions are
     well formed since any identifier could in fact be a type name.

It should be quite clear that the D grammar[*] would be simpler if type
names were not keywords; and that the semantic checking required to
compensate is already in place to deal with user-defined types.

[*] by this I mean the one in the compiler, not the one in the
documentation; the latter tends to assume you know a priori what names
are types, whereas you don't in practice.
 I don't see one single real problem with the issue.
 
 If it ain't broken, don't fix it.

Hey, then why bother with D? Use C/C++. It seems to work OK, people use
it for a lot of stuff.

How about this: the language is in its early development. It is still
possible to make changes like this. It will be much, much harder in the
future. I still don't see one reason why there *should* be so many
keywords (other than the fact that's already done that way) and I've
pointed out a few reasons why IMHO it's better, and cleaner, not to.

The fact that C does the same thing does not qualify as a reason, since
it's a stated goal of D to eliminate the very reason C needs to do that.
So, having gone to the trouble to eliminate the need for keywords... why
are they still there???
Jul 18 2005
parent reply Hasan Aljudy <hasan.aljudy gmail.com> writes:
Greg Smith wrote:
 Hasan Aljudy wrote:

 I just don't get it ...

 What's the point of making something like "int" not a keyword?

 #int int; //wth?
 #class int
 #{
 # static int max = 1337; //wtf is int here? variable? type? class?
 #}
 #float double = int.max; //go figure
 #double bit = cast(typeof( double )) int;

Yes, this change would allow you to redefine int. it's possible in other languages, and they haven't self-destructed as a result.

Sorry, all the languages I've worked with are from the C family (C, C++, Java, D) with the exception of Pascal. How do other languages implement that?
  If this is a problem, you could make it illegal to redefine built-in 
 names in certain scopes. If they are keywords, then
 this level of control is not possible.

The ability to use "int" or "float" or "this" for one's own purposes is not really an advantage.
 
 My point is, there's no reason to make it a keyword, unless you want
 it to always be (effectively) a special punctuation mark, in *all*
 possible contexts, and you want to extend that to *all* the built-in
 types, despite the fact that user-defined types don't have or need this
 special treatment, and you don't mind putting in extra grammar rules to
 deal with the fact that type names could be these keywords *or* 
 identifiers.

I still don't get your point ....

It's a keyword because, well, how do you define a variable to be of a
certain type? Well, you use a "type name" to specify the type of a
variable.

    type_name variable_name;

You can define your own types, but your own types will always be defined
in terms of other types.

    typedef newtype oldtype;

    struct new_type
    {
        some_known_type field1;
        some_other_known_type field2;
        //.. etc
    }

Every new type is defined in terms of other type(s); there must be in
the end a type which isn't defined in terms of anything. int is such a
type.

If it's not a keyword, then it can be turned on and off. Well, how do
you turn it "on"? And what would be the point of having it turned off?

[snip]
 I remember a long time
 ago, a buddy was baffled that his C code wouldn't compile in
 C++, it turned out he had a struct member called 'this' or 'catch'
 or something (this was before the days of syntax coloring).

Why didn't his compiler tell him that "this" is a keyword?

Why on earth would it do that? It reported a syntax error, since a keyword appeared in a position where it was not allowed by the grammar. A lot of tokens other than 'identifier' are allowed there - so you wouldn't even get something as helpful as "error at 'try' - expected 'identifier'" Try it with your favourite C++ compiler.

I'm just saying the problem here is the error message, not the keyword.
 In D as currently implemented,

     i = int + 2;

 .. is a syntax error, whereas

         alias int myint;
          i = myint + 2;

  ... is syntactically legal, but disallowed at the semantic level.
 Is this difference important or desirable?

I don't see your point .. both are errors.

Here's the point: (1) they are both, essentially, the same error, why should they produce completely different error messages?

because .. they can be treated differently.

for
#int + 2
there is no way around using something other than int.

but for
#myint + 2
you can redefine myint to be a variable, or you can use something other than myint.
  (2) the error message you get for the second one,
    "can't do that to a type", is much more useful than the one you get
    for the first, "syntax error".

so? ask the compiler writer to produce a more informative error message!
  (3) The compiler writer's job is more complex for no benefit:
    you have a grammar with two different ways of matching types, which
    may be certain keywords or any identifier; and that grammar rejects
    certain erroneous constructs, such as adding things to built-in
    types, but you *still* need to check that  expressions are
    well formed since any identifier could in fact be a type name.
 
 It should be quite clear that the D grammar[*] would be simpler if type
 names were not keywords; and that the semantic checking required to
 compensate is already in place to deal with user-defined types.
 
 [*] by this I mean the one in the compiler, not the one in the 
 documentation; the latter tends to assume you know a priori what names
 are types, whereas you don't in practice.

Ok, how would that help the language user?

I never wrote a compiler, and I have no bit of clue about what you are
talking about.

But, assuming that you are correct, and that it does indeed make writing
the compiler easier .. your point still doesn't stand.

The compiler has already been written!

I think it would be much easier for the compiler author to use what he
had already written than to rewrite the compiler to compensate for your
suggestion.
 
 I don't see one single real problem with the issue.

 If it ain't broken, don't fix it.

Hey, then why bother with D? Use C/C++. It seems to work OK, people use it for a lot of stuff.

Because C++ is broken. And Java is broken too.
 
 How about this: the language is in its early development. It is still
 possible to make changes like this. It will be much, much harder in the
 future. I still don't see one reason why there *should* be so many 
 keywords (other than the fact that's already done that way)  and I've 
 pointed out a few reasons why IMHO it's better, and cleaner, not to.

Where are those reasons? I didn't see them.

The only reasons were:
1- so you can use "int" as a variable name or something else.
2- easier to implement in a compiler.

but #1 is not really a practical reason. and I already answered #2
 The fact that C does the same thing does not qualify as a reason, since
 it's a stated goal of D to eliminate the very reason C needs to do that.
 So, having gone to the trouble to eliminate the need for keywords... why
 are they still there???

Where does the documentation state that D's goal is to eliminate C's need for keywords?
Jul 18 2005
parent reply Greg Smith <greg siliconoptix.com> writes:
Hasan Aljudy wrote:
 
 Greg Smith wrote:
 
 Hasan Aljudy wrote:

 I just don't get it ...

 What's the point of making something like "int" not a keyword?

 #int int; //wth?
 #class int
 #{
 # static int max = 1337; //wtf is int here? variable? type? class?
 #}
 #float double = int.max; //go figure
 #double bit = cast(typeof( double )) int;

Yes, this change would allow you to redefine int. it's possible in other languages, and they haven't self-destructed as a result.

Sorry, all the languages I've worked with are from the C family (C, C++, Java, D) with the exception of Pascal. How do other languages implement that?

Very simple. You go into the symbol table at startup -- the same one into which the user names go - and you predefine the names there as types. Pascal does this, and you've probably never noticed. See? it doesn't hurt at all.
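
As a rough sketch of what that looks like (written in Python for brevity; the `Scope` class, `make_root_scope` and the handful of names it preloads are hypothetical and not how any particular Pascal or D compiler is laid out):

```python
class Scope:
    """One level of a symbol table; lookups fall back to the enclosing scope."""

    def __init__(self, parent=None):
        self.parent = parent
        self.symbols = {}

    def declare(self, name, entry):
        self.symbols[name] = entry

    def lookup(self, name):
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        raise KeyError(name)


def make_root_scope():
    """Preload the outermost scope at startup, before any user code is read."""
    root = Scope()
    for name in ("int", "float", "char"):    # illustrative subset of built-in types
        root.declare(name, ("type", name))
    root.declare("true", ("const", 1))       # predefined values, same mechanism
    root.declare("false", ("const", 0))
    return root


root = make_root_scope()
module = Scope(root)
module.declare("myint", root.lookup("int"))  # roughly: alias int myint;
```

After this, 'int' and 'myint' are found by exactly the same lookup and come back as exactly the same kind of entry; nothing in the lexer or grammar has to know about either of them.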
 
  If this is a problem, you could make it illegal to redefine built-in 
 names in certain scopes. If they are keywords, then
 this level of control is not possible.

The ability to use "int" or "float" or "this" for one's own purposes is not really an advantage.

these. What's the difference between making it illegal to redefine them
and making them keywords?

 (1) by making them keywords, you complicate the grammar and gain no
     advantage by doing so; the grammar must still support type names
     which are identifiers.
 (2) by making them keywords, you cause them to be treated differently,
     in the parser and semantic passes, from user-defined types.
     Functionality needs to be replicated in the compiler, since 'int'
     is discovered to be a type in the parser, while 'myint' is seen as
     an identifier in the parser, and is discovered to be a type in the
     semantic processing. This means more complexity than needed, and
     leads to inconsistent, and less useful, diagnostics.
 (3) New built-in types can be added in future to the language as
     predefined identifiers, with much less likelihood of breaking old
     code than if they are added as new keywords.
 (4) if they are defined as identifiers, you can make it illegal to
     redefine them in specific contexts. With keywords there is no such
     control.

To appeal to the KISS principle: if the built-in types can be
implemented in the same way as the user-defined types, why not do so??

If you want to make it illegal to redefine these, fine - but why chisel
them into stone in the parser when the grammar doesn't need this, and
would be simpler without it?
 My point is, there's no reason to make it a keyword, unless you want
 it to always be (effectively) a special punctuation mark, in *all*
 possible contexts, and you want to extend that to *all* the built-in
 types, despite the fact that user-defined types don't have or need this
 special treatment, and you don't mind putting in extra grammar rules to
 deal with the fact that type names could be these keywords *or* 
 identifiers.

I still don't get your point ....

It's a keyword because, well, how do you define a variable to be of a
certain type? Well, you use a "type name" to specify the type of a
variable.

    type_name variable_name;

You can define your own types, but your own types will always be defined
in terms of other types.

    typedef newtype oldtype;

    struct new_type
    {
        some_known_type field1;
        some_other_known_type field2;
        //.. etc
    }

Every new type is defined in terms of other type(s); there must be in
the end a type which isn't defined in terms of anything. int is such a
type.

If it's not a keyword, then it can be turned on and off. Well, how do
you turn it "on"? And what would be the point of having it turned off?

to whether the built-in types are defined in the grammar as keywords, or
in the symbol table as predefined names, as in pascal.

You are saying this: because there is no point in redefining them, they
should be cast in stone in the parser. I mildly disagree with the
premise, and I utterly disagree with the conclusion.

Regarding the premise, as I have pointed out, what if you want to add a
new built-in type -- if you define it as a new keyword, it might
conflict with a local variable name in some existing code.

If you want it to be illegal to redefine certain names, this is fine,
but this does not by any means mean they need to be keywords!! IMHO this
should be done in the symbol table, not by making keywords that are not
required by the grammar. This is much simpler in the long run; it leads
to better error messages, e.g. "can't redefine 'int' in this name space"
vs. "Syntax error"; and allows control by scope, e.g. you might want to
allow some names to be used in struct members.
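
A minimal sketch of that kind of per-scope control, again in Python. The policy shown (predefined names may be reused as struct members but nowhere else), the scope kinds and the `declare` helper are assumptions chosen only to illustrate the idea; they are not a proposal for D's actual rules.

```python
PREDEFINED = {"int", "float", "true", "false"}   # illustrative subset

# Hypothetical policy: the kinds of scope in which a predefined name may be reused.
MAY_SHADOW_PREDEFINED = {"struct"}


class RedefinitionError(Exception):
    pass


def declare(scope_kind, table, name, entry):
    """Add `name` to `table`, enforcing the redefinition policy for this scope kind."""
    if name in PREDEFINED and scope_kind not in MAY_SHADOW_PREDEFINED:
        raise RedefinitionError(
            "can't redefine '%s' in this name space (%s scope)" % (name, scope_kind))
    table[name] = entry


struct_members = {}
declare("struct", struct_members, "float", "member")      # allowed by this policy

local_vars = {}
try:
    declare("function", local_vars, "float", "variable")  # rejected, with a reason
    message = None
except RedefinitionError as err:
    message = str(err)
```

The check lives in one place in the semantic pass, so the diagnostic can name both the identifier and the scope. With a keyword, the same mistake never gets past the parser.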
 
 [snip]
 
 I remember a long time
 ago, a buddy was baffled that his C code wouldn't compile in
 C++, it turned out he had a struct member called 'this' or 'catch'
 or something (this was before the days of syntax coloring).




 Why didn't his compiler tell him that "this" is a keyword?



 Why on earth would it do that? it reported a syntax error,
 since a keyword appeared in a position where it was not
 allowed by the grammar. A lot of tokens other than 'identifier'
 are allowed there - so you wouldn't even get something as helpful as
 "error at 'try' - expected 'identifier'"
  Try it with your favourite C++ compiler.

I'm just saying the problem here is the error message, not the keyword.

I fully agree. And the best way to get better error messages is to allow the semantic pass to see these errors, rather than making them syntax errors, which is what happens when keywords are defined.
 
 In D as currently implemented,

     i = int + 2;

 .. is a syntax error, whereas

         alias int myint;
          i = myint + 2;

  ... is syntactically legal, but disallowed at the semantic level.
 Is this difference important or desirable?

I don't see your point .. both are errors.

Here's the point: (1) they are both, essentially, the same error, why should they produce completely different error messages?

because .. they can be treated differently. for #int + 2 there is no way around using something other than int. but for #myint + 2 you can redefine myint to be a variable, or you can use something other than myint.

example. Of course they can be, and are, treated differently; I know why
the behavior occurs. I'm saying there's no advantage to this and there
are disadvantages; which are eliminated by eliminating the keywords.

They are treated differently because the *parser* knows 'int' is a type
name, and has no rule allowing it to add a type to something; but the
parser has a rule saying an identifier can be added to something.

What I'm saying is: if int was *not* a keyword, we could eliminate the
first rule, simplify the grammar, get better diagnostics, shorten the
keyword table (and thus speed up the lexer) ... the compiler code which
rejects 'int + 2' would then be the same code which rejects "myint+2".
Is there any advantage to treating them differently?
  (2) the error message you get for the second one,
    "can't do that to a type", is much more useful than the one you get
    for the first, "syntax error".

so? ask the compiler writer to produce a more informative error message!

You say later that you aren't familiar with compilers, and no offence, but that's showing here. By far the easiest way to improve the error message is to do away with the unnecessary keywords.

It's very hard to produce helpful messages for errors which arise because no grammar rule is applicable. A syntax error is basically the parser saying "huh?". At best, it can tell you where it became irrevocably confused, and tell what kinds of tokens are legal at that point.

It is possible to add additional grammar rules, solely for the purpose of matching specific illegal constructs, so that they can be given more meaningful error messages. This gets rather messy. And in this case, the desirable grammar rules already exist -- with 'identifier' in them, so that they don't apply when types happen to be built-in types.

It is far easier in the semantic phase to provide a guess at what you think the programmer was trying to do, and produce a useful error message. Imagine a language which allows array declarations sized by integer constants, or expressions formed of integer constants. It would be possible to make 'int a[-3]' a syntax error in such a language, by contriving the grammar so that no rule matched it. Far better to make it syntactically legal, so the message is "error: negative array dimension for 'a'", rather than "syntax error". The test would be needed anyhow, since the grammar can't make "int a[7-10]" illegal.

Actually, we could get this improvement in D by modifying the grammar as such:

    identifier_or_type::
          IDENTIFIER  { $$ = lookup_ident($1); }
        | INT         { $$ = /* .. type obj for 'int' */ }
        | BYTE        { $$ = /* .. type obj for 'byte' */ }
        ...

... and eliminating all other rules referencing the type keywords, which, by D charter, are actually redundant. And, using 'identifier_or_type' in place of most IDENTIFIER references (not the ones where IDENTIFIER is assigned a meaning). Thus, 'int + 2' would be caught by the same code as 'myint + 2'.

This change obtains most of the improvement I'm looking for while still preventing the names from being redefined. It's then a relatively small step to eliminate this one weird bit of grammar and provide predefined symbols.
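To make the "caught by the same code" idea concrete, here is a minimal Python sketch. It is not taken from the D compiler; it assumes a toy symbol table in which the built-in type names are simply predefined entries, and the names `TypeSym`, `VarSym`, `make_global_scope` and `check_add_operand`, as well as the wording of the diagnostic, are hypothetical and invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeSym:
    """A symbol-table entry that denotes a type (built-in or user-defined)."""
    name: str


@dataclass(frozen=True)
class VarSym:
    """A symbol-table entry that denotes a variable of some type."""
    name: str
    of_type: TypeSym


def make_global_scope():
    """Predefine the built-in type names in the same table user names go into."""
    scope = {}
    for name in ("int", "byte", "float"):
        scope[name] = TypeSym(name)
    return scope


def check_add_operand(scope, ident):
    """Semantic check for an identifier used as an operand of '+'.

    Returns an error string, or None if the operand is acceptable.
    The same branch handles 'int' and an alias such as 'myint'.
    """
    sym = scope.get(ident)
    if sym is None:
        return f"error: undefined identifier '{ident}'"
    if isinstance(sym, TypeSym):
        return f"error: '{ident}' is a type and cannot be used as a value"
    return None


scope = make_global_scope()
scope["myint"] = scope["int"]           # stands in for: alias int myint;
scope["i"] = VarSym("i", scope["int"])  # stands in for: int i;

print(check_add_operand(scope, "int"))    # i = int + 2;
print(check_add_operand(scope, "myint"))  # i = myint + 2;
print(check_add_operand(scope, "i"))      # i = i + 2;
```

In this sketch both 'int + 2' and 'myint + 2' reach the same `isinstance` test, so they get the same kind of message; only the name in the message differs.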
 
 
 Ok, how would that help the language user?
 
 I never wrote a compiler, and I have no bit of clue about what you are 
 talking about.
 

 But, assuming that you are correct, and that it does indeed make writing 
 the compiler easier .. your point still doesn't stand.
 
 The compiler has already been written!
 
 I think it would be much easier for the compiler author to use what he 
 had already written than to rewrite the compiler to compensate for your 
 suggestion.

This is a valid point in general, but there are times, and precious few of them, when there is an opportunity to get things right even if it means changing something which already works as it is. D is, by charter, in such a situation. All such opportunities should be considered in the long-term view, since there will *never* be an easier time to make such a change. The cost of the change will be short-lived, the benefit will stay on.
 
 How about this: the language is in its early development. It is still
 possible to make changes like this. It will be much, much harder in the
 future. I still don't see one reason why there *should* be so many 
 keywords (other than the fact that's already done that way)  and I've 
 pointed out a few reasons why IMHO it's better, and cleaner, not to.

Where are those reasons? I didn't see them. The only reasons were: 1- so you can use "int" as a variable name or something else. 2- easier to implement in a compiler.

 
 but #1 is not really a practical reason. and I already answered #2

Regarding 2, the only reason you gave is the pre-existing code. Look at the trouble Bill Gates got us all into with that thinking in the early 80's. Do you really think the current D compiler will be the only one ever written? Also, you keep missing, or dismissing, 3 - more consistent, useful error checking/error messages, by eliminating replication of semantic checking in the parser.
 
 The fact that C does the same thing does not qualify as a reason, since
 it's a stated goal of D to eliminate the very reason C needs to do that.
 So, having gone to the trouble to eliminate the need for keywords... why
 are they still there???

Where does the documentation state that D's goal is to eliminate C's need for keywords?

in C, for the parser to know which identifiers are previously defined as typedefs (or classes in C++), since C cannot be parsed otherwise. This gives D a 'context-free grammar': you don't need to feed information back to the parser from the symbol table. C defines 'int' etc as keywords for the same purpose, they must be distinguished (to the parser) from regular identifiers. (also, because C has idioms like 'unsigned char' which do not apply to typedefs, and have likewise been eliminated in D). So, making D's grammar context-free has, as a direct result, eliminated the need for type names to be keywords.

-----------------
http://www.digitalmars.com/d/index.html
Major Goals of D
...
* Make D substantially easier to implement a compiler for than C++.
...
* Have a context-free grammar.
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-----------------

When I first encountered D, after reading the article on printf in Jan/05 Dr. Dobb's, I read the 'context-free grammar' part in the goals, and my first thought was "Great!" and my second was "... and so built-in types aren't keywords any more..." but that turned out to be untrue, for reasons no-one has been able to supply.

I don't think I've encountered any other well-thought-out language (and D clearly is one) which defines a whole bunch of keywords which are not actually necessary to the parsing process.

Thank you for helping me clarify my argument.

BTW, I feel like I'm telling someone, "It's summer, you don't need to wear a snowsuit any more, you'll be more comfortable without it", and I keep getting back "you haven't really given me a strong enough reason to not wear it; the grocery store is a bit chilly, for instance; and I'm already wearing it and I know it fits..."

- greg
Jul 19 2005
next sibling parent Derek Parnell <derek psych.ward> writes:
On Tue, 19 Jul 2005 15:45:12 -0400, Greg Smith wrote:

[snip] 
 I don't think I've encountered any other well-thought-out language (and 
 D clearly is one) which defines a whole bunch of keywords which are not 
 actually necessary to the parsing process.

Greg, you have written a whole lot of sensible things here. You have my support, for what it's worth.

-- 
Derek Parnell
Melbourne, Australia
20/07/2005 8:02:37 AM
Jul 19 2005
prev sibling parent Hasan Aljudy <hasan.aljudy gmail.com> writes:
Ah well, I guess I did help you make your point.

You're talking about compiler implementation, I have nothing to do with 
that.

My only concern is, well, I don't wanna be reading some code and keep 
wondering to myself whether this "int" here is the real "int" or some 
variable name or class name defined by the user.

If it's merely a matter of compiler implementation then I don't care, as 
it's clearly not my business.

However, if what you are proposing would allow people to say "int" when 
they don't really mean the "int" that we currently know, then I have a 
problem with that.


Greg Smith wrote:
 Hasan Aljudy wrote:
 
 Greg Smith wrote:

 Hasan Aljudy wrote:

[snip]
 I just don't get it ...

 What's the point of making something like "int" not a keyword?

 #int int; //wth?
 #class int
 #{
 # static int max = 1337; //wtf is int here? variable? type? class?
 #}
 #float double = int.max; //go figure
 #double bit = cast(typeof( double )) int;

Yes, this change would allow you to redefine int. it's possible in other languages, and they haven't self-destructed as a result.

Sorry, all the languages I've worked with are from the C family (C, C++, Java, D) with the exception of Pascal. How do other languages implement that?

Very simple. You go into the symbol table at startup -- the same one into which the user names go - and you predefine the names there as types. Pascal does this, and you've probably never noticed. See? it doesn't hurt at all.
  If this is a problem, you could make it illegal to redefine built-in 
 names in certain scopes. If they are keywords, then
 this level of control is not possible.

The ability to use "int" or "float" or "this" for one's own purposes is not really an advantage.

these. What's the difference between making it illegal to redefine them and making them keywords?

(1) By making them keywords, you complicate the grammar and gain no advantage by doing so; the grammar must still support type names which are identifiers.

(2) By making them keywords, you cause them to be treated differently, in the parser and semantic passes, from user-defined types. Functionality needs to be replicated in the compiler, since 'int' is discovered to be a type in the parser, while 'myint' is seen as an identifier in the parser, and is discovered to be a type in the semantic processing. This means more complexity than needed, and leads to inconsistent, and less useful, diagnostics.

(3) New built-in types can be added in future to the language as predefined identifiers, with much less likelihood of breaking old code than if they are added as new keywords.

(4) If they are defined as identifiers, you can make it illegal to redefine them in specific contexts. With keywords there is no such control.

To appeal to the KISS principle: if the built-in types can be implemented in the same way as the user-defined types, why not do so? If you want to make it illegal to redefine these, fine - but why chisel them into stone in the parser when the grammar doesn't need this, and would be simpler without it?
 My point is, there's no reason to make it a keyword, unless you want
 it to always be (effectively) a special punctuation mark, in *all*
 possible contexts, and you want to extend that to *all* the built-in
 types, despite the fact that user-defined types don't have or need this
 special treatment, and you don't mind putting in extra grammar rules to
 deal with the fact that type names could be these keywords *or* 
 identifiers.

I still don't get your point ....

It's a keyword because, well, how do you define a variable to be of a certain type? Well, you use a "type name" to specify the type of a variable.

    type_name variable_name;

You can define your own types, but your own types will always be defined in terms of other types.

    typedef oldtype newtype;

    struct new_type
    {
        some_known_type field1;
        some_other_known_type field2;
        //.. etc
    }

Every new type is defined in terms of other type(s); there must be in the end a type which isn't defined in terms of anything. int is such a type.

If it's not a keyword, then it can be turned on and off. Well, how do you turn it "on"? And what would be the point of having it turned off?

to whether the built-in types are defined in the grammar as keywords, or in the symbol table as predefined names, as in pascal.

You are saying this: because there is no point in redefining them, they should be cast in stone in the parser. I mildly disagree with the premise, and I utterly disagree with the conclusion.

Regarding the premise, as I have pointed out, what if you want to add a new built-in type -- if you define it as a new keyword, it might conflict with a local variable name in some existing code.

If you want it to be illegal to redefine certain names, this is fine, but this does not by any means mean they need to be keywords!! IMHO this should be done in the symbol table, not by making keywords that are not required by the grammar. This is much simpler in the long run; it leads to better error messages, e.g. "can't redefine 'int' in this name space" vs. "Syntax error"; and allows control by scope, e.g. you might want to allow some names to be used in struct members.
 [snip]

 I remember a long time
 ago, a buddy was baffled that his C code wouldn't compile in
 C++, it turned out he had a struct member called 'this' or 'catch'
 or something (this was before the days of syntax coloring).




..
 Why didn't his compiler tell him that "this" is a keyword?



..
 Why on earth would it do that? it reported a syntax error,
 since a keyword appeared in a position where it was not
 allowed by the grammar. A lot of tokens other than 'identifier'
 are allowed there - so you wouldn't even get something as helpful as
 "error at 'try' - expected 'identifier'"
  Try it with your favourite C++ compiler.

I'm just saying the problem here is the error message, not the keyword.

I fully agree. And the best way to get better error messages is to allow the semantic pass to see these errors, rather than making them syntax errors, which is what happens when keywords are defined.
 In D as currently implemented,

     i = int + 2;

 .. is a syntax error, whereas

         alias int myint;
          i = myint + 2;

  ... is syntactically legal, but disallowed at the semantic level.
 Is this difference important or desirable?

I don't see your point .. both are errors.

Here's the point: (1) they are both, essentially, the same error, why should they produce completely different error messages?

because .. they can be treated differently. for #int + 2 there is no way around using something other than int. but for #myint + 2 you can redefine myint to be a variable, or you can use something other than myint.

example. Of course they can be, and are, treated differently; I know why the behavior occurs. I'm saying there's no advantage to this and there are disadvantages; which are eliminated by eliminating the keywords. They are treated differently because the *parser* knows 'int' is a type name, and has no rule allowing it to add a type to something; but the parser has a rule saying an identifier can be added to something. What I'm saying is: if int was *not* a keyword, we could eliminate the first rule, simplify the grammar, get better diagnostics, shorten the keyword table (and thus speed up the lexer) ... the compiler code which rejects 'int + 2' would then be the same code which rejects "myint+2". Is there any advantage to treating them differently?
  (2) the error message you get for the second one,
    "can't do that to a type", is much more useful than the one you get
    for the first, "syntax error".

so? ask the compiler writer to produce a more informative error message!

You say later that you aren't familiar with compilers, and no offence, but that's showing here. By far the easiest way to improve the error message is to do away with the unnecessary keywords.

It's very hard to produce helpful messages for errors which arise because no grammar rule is applicable. A syntax error is basically the parser saying "huh?". At best, it can tell you where it became irrevocably confused, and tell what kinds of tokens are legal at that point.

It is possible to add additional grammar rules, solely for the purpose of matching specific illegal constructs, so that they can be given more meaningful error messages. This gets rather messy. And in this case, the desirable grammar rules already exist -- with 'identifier' in them, so that they don't apply when types happen to be built-in types.

It is far easier in the semantic phase to provide a guess at what you think the programmer was trying to do, and produce a useful error message. Imagine a language which allows array declarations sized by integer constants, or expressions formed of integer constants. It would be possible to make 'int a[-3]' a syntax error in such a language, by contriving the grammar so that no rule matched it. Far better to make it syntactically legal, so the message is "error: negative array dimension for 'a'", rather than "syntax error". The test would be needed anyhow, since the grammar can't make "int a[7-10]" illegal.

Actually, we could get this improvement in D by modifying the grammar as such:

    identifier_or_type::
          IDENTIFIER  { $$ = lookup_ident($1); }
        | INT         { $$ = /* .. type obj for 'int' */ }
        | BYTE        { $$ = /* .. type obj for 'byte' */ }
        ...

... and eliminating all other rules referencing the type keywords, which, by D charter, are actually redundant. And, using 'identifier_or_type' in place of most IDENTIFIER references (not the ones where IDENTIFIER is assigned a meaning). Thus, 'int + 2' would be caught by the same code as 'myint + 2'.

This change obtains most of the improvement I'm looking for while still preventing the names from being redefined. It's then a relatively small step to eliminate this one weird bit of grammar and provide predefined symbols.
 Ok, how would that help the language user?

 I never wrote a compiler, and I have no bit of clue about what you are 
 talking about.

 But, assuming that you are correct, and that it does indeed make 
 writing the compiler easier .. your point still doesn't stand.

 The compiler has already been written!

 I think it would be much easier for the compiler author to use what he 
 had already written than to rewrite the compiler to compensate for 
 your suggestion.

This is a valid point in general, but there are times, and precious few of them, when there is an opportunity to get things right even if it means changing something which already works as it is. D is, by charter, in such a situation. All such opportunities should be considered in the long-term view, since there will *never* be an easier time to make such a change. The cost of the change will be short-lived, the benefit will stay on.
 How about this: the language is in its early development. It is still
 possible to make changes like this. It will be much, much harder in the
 future. I still don't see one reason why there *should* be so many 
 keywords (other than the fact that's already done that way)  and I've 
 pointed out a few reasons why IMHO it's better, and cleaner, not to.

Where are those reasons? I didn't see them. The only reasons were: 1- so you can use "int" as a variable name or something else. 2- easier to implement in a compiler.

 but #1 is not really a practical reason. and I already answered #2

Regarding 2, the only reason you gave is the pre-existing code. Look at the trouble Bill Gates got us all into with that thinking in the early 80's. Do you really think the current D compiler will be the only one ever written? Also, you keep missing, or dismissing, 3 - more consistent, useful error checking/error messages, by eliminating replication of semantic checking in the parser.
 The fact that C does the same thing does not qualify as a reason, since
 it's a stated goal of D to eliminate the very reason C needs to do that.
 So, having gone to the trouble to eliminate the need for keywords... why
 are they still there???

Where does the documentation state that D's goal is to eliminate C's need for keywords?

in C, for the parser to know which identifiers are previously defined as typedefs (or classes in C++), since C cannot be parsed otherwise. This gives D a 'context-free grammar': you don't need to feed information back to the parser from the symbol table. C defines 'int' etc as keywords for the same purpose, they must be distinguished (to the parser) from regular identifiers. (also, because C has idioms like 'unsigned char' which do not apply to typedefs, and have likewise been eliminated in D). So, making D's grammar context-free has, as a direct result, eliminated the need for type names to be keywords.

-----------------
http://www.digitalmars.com/d/index.html
Major Goals of D
...
* Make D substantially easier to implement a compiler for than C++.
...
* Have a context-free grammar.
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-----------------

When I first encountered D, after reading the article on printf in Jan/05 Dr. Dobb's, I read the 'context-free grammar' part in the goals, and my first thought was "Great!" and my second was "... and so built-in types aren't keywords any more..." but that turned out to be untrue, for reasons no-one has been able to supply.

I don't think I've encountered any other well-thought-out language (and D clearly is one) which defines a whole bunch of keywords which are not actually necessary to the parsing process.

Thank you for helping me clarify my argument.

BTW, I feel like I'm telling someone, "It's summer, you don't need to wear a snowsuit any more, you'll be more comfortable without it", and I keep getting back "you haven't really given me a strong enough reason to not wear it; the grocery store is a bit chilly, for instance; and I'm already wearing it and I know it fits..."

- greg

Jul 19 2005
prev sibling next sibling parent reply "Unknown W. Brackets" <unknown simplemachines.org> writes:
I don't agree; the reason I don't is completely for the thing you first 
speak of:

Parsing it.

Wait, you say: but, we already made it clear that you don't need to know 
type names to parse it (in D.)  Quite astute, yes... but, consider the 
following code:

int dumb()
{
    static int i = 0;

    return 42 + i++;
}

Now, let's say that int isn't a keyword.  Let's even say, for the sake 
of argument, I can name a function int as well:

int int()
{
    return 42;
}

Now, again, this should be parseable because we know some things here... 
but, most basic highlighters will not do this correctly.

In fact, I would argue that the fact we can do this:

int bool()
{
    return 1;
}

(which we can...) is the problem here.  Not only is that confusing, but 
the only editor I can think of offhand which has highlighting powerful 
enough to handle that is Microsoft Visual Studio - and even then, it 
would be *fun* to make it tell the difference, based on the way it works.

I will agree that if D is going to make every text editor/IDE/etc. 
developer's heads ache, it should do it from the start... otherwise, not 
at all (not for types!)

As it is, in languages like PHP... int can't really be highlighted by 
most highlighters - because you can use it as a type *AND* as a function 
name.  This is horrible, in my opinion, and detrimental - although its 
negative effects are limited, in this case, because variables have a $ 
prefix.

-[Unknown]


 One of the problems with C/C++ is that you can't parse it unless
 you know what words are  type names, or typedef/classes; and for
 this reason C/C++ type names need to be keywords. D has modified
 the syntax so that a parser does not need to know in advance that
 certain identifiers represent user-defined types. I think this
 is a great step forward.
 
 The question is: why are all the built-in type names (and there
 are a lot of them) still keywords? They don't need to be, and I don't
 see how it can do any good to make them keywords. I count about
 24 keywords which are types. These could all be predefined identifiers.
 
 So why is this bad? Part of it is just a personal bias - if I plot
 a chart of all the languages I've used, with 'niceness of language'
 vs. 'number of keywords', there is a strong inverse correlation -
 python is in one corner, and Dec Compiled BASIC (yes, I'm that old)
 is far into the other corner.
 
 But there are some good reasons to avoid superfluous
 keywords. Keywords by  definition have the enforced meaning
 everywhere - if you add new
 keywords, you will break code which has any local, global, struct
 member, or anything with the same name. I remember a long time
 ago, a buddy was baffled that his C code wouldn't compile in
 C++, it turned out he had a struct member called 'this' or 'catch'
 or something (this was before the days of syntax coloring).
 In D, if new types are ever added - or new predefined values such
 as 'true' and 'false' - they can be added as predefined identifiers
 without breaking anything. So, why not do it that way from the beginning?
 
 Languages which implement built-in types (and constants) as predefined 
 identifiers include Pascal and VHDL, and python (to the extent that it 
 has type names, they are __builtin__ type objects and not keywords).
 
 D does not define property names as keywords, why are 'true'
 and 'false' and all those type names keywords?
 It could be argued that 'this' doesn't need to be a keyword. I might 
 want to have a struct member called 'this'; syntactically, it could be a 
 predefined local variable.
 
 In D as currently implemented,
 
     i = int + 2;
 
 .. is a syntax error, whereas
 
         alias int myint;
          i = myint + 2;
 
  ... is syntactically legal, but disallowed at the semantic level.
 Is this difference important or desirable?
 Making 'int' a predefined identifier would cause these two to be
 treated the same way in terms of compiler diagnostics.
 
 It might be argued that it would be very dangerous to allow
 functions to define a local variable called 'float'. In C, this
 could break code which is secretly inserted by macros or #include.
 But (a) D doesn't have these (b) *anything* can be broken in C by these
 things. In any case, you can always make it illegal to redefine float as
 a variable, while still allowing it in, say, struct namespaces. With
 a keyword, no such distinction is possible.
 
 - greg

Jul 19 2005
parent Greg Smith <greg siliconoptix.com> writes:
Unknown W. Brackets wrote:

 
 Now, let's say that int isn't a keyword.  Let's even say, for the sake 
 of argument, I can name a function int as well:
 
 int int()
 {
    return 42;
 }
 
 Now, again, this should be parseable because we know some things here...
 but, most basic highlighters will not do this correctly.

Firstly, 'int' etc can be protected from redefinition (on a scope-selective basis, if needed) without being a keyword; see my earlier post for some reasons why this is an advantage over just using keywords.

Secondly, even if 'int' can be redefined at file scope, I don't think it's a problem that 'int' would always appear colored as a builtin-type in your editor. This would probably be preferred, since it would let you know you are doing something dubious.

If you wanted to go further, and make an editor which parses the whole file so that local variables, e.g. are displayed in a different color than globals, and references to undefined variables are displayed in red, and you can hover over any variable and see its type... that's much, much easier for D than for C. The same process would allow 'int' to be highlighted properly (and/or let you know specifically that you were redefining a built-in type).

I have experience in the user side of this kind of issue, since I do a lot of Python coding. In python you can write

    def add_dot(str):
        return str + '.'

... and it works, but it's poor practice, since 'str' is a predefined (__builtin__) name which corresponds to the string type. You only get in trouble when you modify it and fail to notice the conflict:

    def add_dot_num(str, num):
        return str + '.' + str(num)   # oops!

The second 'str' refers to the local parameter rather than the builtin 'str' which converts 2 to '2'.

However, this doesn't cause anywhere near as much trouble as you might think:

- many editors color 'str' differently, since it's in __builtin__; this makes it harder to redefine it by mistake.
- automatic code checkers can easily determine that this code is redefining 'str' and warn you;
- new builtins added to the language (and they are often added) do not break any code that happens to already use the same name as a variable. The resulting 'dubious usage' can be 'fixed' at your leisure.
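For anyone who wants to try the 'str' example, here is a runnable Python 3 version. The two functions are the ones from the post; `add_dot_num_fixed` and `shadowed_builtins` are hypothetical helpers added for illustration, the latter being only a rough stand-in for what an automatic code checker might do, not the implementation of any real tool.

```python
import builtins


def add_dot(str):
    # Works, but the parameter shadows the builtin 'str' inside this function.
    return str + '.'


def add_dot_num(str, num):
    # Fails when called: here 'str' is the string argument, not the builtin,
    # so str(num) tries to call a string.
    return str + '.' + str(num)


def add_dot_num_fixed(text, num):
    # Renaming the parameter leaves the builtin 'str' visible again.
    return text + '.' + str(num)


def shadowed_builtins(func):
    """List the parameter/local names of func that are also builtin names."""
    return sorted(n for n in func.__code__.co_varnames if hasattr(builtins, n))


print(add_dot("a"))
print(add_dot_num_fixed("a", 2))
print(shadowed_builtins(add_dot_num))

try:
    add_dot_num("a", 2)
except TypeError as exc:
    print("TypeError:", exc)
```

Note that the mistake is reported when the function is called, not when it is defined, and the fix is a local rename; no other code is affected.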
Jul 19 2005
prev sibling parent reply Charles Hixson <charleshixsn earthlink.net> writes:
Greg Smith wrote:
...

D doesn't have all the syntax that some languages (I'm thinking of Ada here) have which would allow you to specify how many bits a particular type should have, what value range it should allow, etc. As a result all of the basic space allocating words need to be keywords.

A type basically means:
1) reserve this space.
2) define these operations over this space

Things get a bit more complex when we start thinking about where the space is allocated, how it interacts with other types, and how we pass it as a parameter, but those are the basics.

D has a simple (relatively simple) syntax. As a result, it needs a large number of keywords.
Jul 19 2005
parent reply Greg Smith <greg siliconoptix.com> writes:
Charles Hixson wrote:
 Greg Smith wrote:
 
 D doesn't have all the syntax that some languages (I'm thinking of Ada 
 here) have which would allow you to specify how many bits a particular 
 type should have, what value range it should allow, etc.  As a result 
 all of the basic space allocating words need to be keywords.
 
 A type basically means:
 1) reserve this space.
 2) define these operations over this space
 
 Things get a bit more complex when we start thinking about where the 
 space is allocated, how it interacts with other types, and how we pass 
 it as a parameter, but those are the basics.

These are semantic issues which have absolutely nothing to do with whether the type names are keywords.
 
 D has a simple (relatively simple) syntax.  As a result, it needs a 
 large number of keywords.

'keyword'. Languages with simpler syntax generally need fewer keywords (example: python; extreme example: lisp). Furthermore, the grammar of D would be simpler still if the type names became built-in identifiers (I mean the grammar in the compiler; the one known to the user would be effectively unchanged).

Ada has a lot of keywords too; but the type names are not among them, since (as in pascal and D) they don't need to be. C is actually an anomaly in this sense, and C++ inherited the anomaly. D has inherited the practice[*], while specifically shaking off the necessity; this is what puzzles me.

A keyword is a specific combination of letters (e.g. 'if', 'goto') which is recognized by the lexical scanner as having a distinct significance no matter where it appears in the token sequence, despite the fact that it follows the general rule defining how an 'identifier' is formed. Keywords are assigned significance before their situation relative to the other tokens is analyzed (i.e. prior to the parser), whereas other identifiers are assigned meaning after the parsing process. When meaning is assigned later, it is possible to apply sophisticated rules to the process (e.g. 'mtype' might be a function name at global scope, but also defined as an alias type inside a function, and at the same time be the name of members of several structs).

[None of this is immutable law of language design, it's just the way modern languages are designed and parsed, and the terminology which is used. C and C++, in fact, require bending of these rules, which is generally viewed as a problem: Once a typedef is defined, references to it must be identified as such *prior* to the parser; since this requires scopes to be considered, and scopes are defined by the parsing process, this can be tricky].

When a word is a keyword, it's a keyword everywhere.

So why define keywords at all? The conventional language-design practice is that you define keywords as needed to make the language parseable. The 'if' keyword tells the parser to expect the structure of an 'if' statement.

An example of the opposite approach is FORTRAN, which was designed well before formal grammars had found their way into computer programming. In FORTRAN there are no reserved keywords, and spaces have no significance. As a result, FORTRAN is quite difficult to parse, even though the process needs to be done only on one line at a time. Consider:

    100 FORMAT(I2,I3)
    100 FORMAT(I2,I3)=0
        DO 100 I=1,20
        DO 100 I=1.20

The first is a 'format' statement for output formatting, and the second is an assignment to an element of a 2d array called 'FORMAT'. The third is a do loop, and the fourth is an assignment to 'DO100I'. In order to distinguish these, a fortran compiler basically has to dither back and forth over the entire line, trying to figure out what the heck the thing is. The analogous behaviour in a language like D, which is not split into lines, would be to dither over the entire source file, making guesses about what things are and checking if those guesses still work when inner levels are analyzed. Ugh.

By having strategically positioned keywords, you can parse powerful grammars, with complex nested structure, in a more-or-less left-to-right fashion. Whenever you see 'if' sitting there, what follows is either an if statement or invalid input; you don't need to go find the other end of it to see if it might be something else.

This design is very clear in pascal, where every definition of anything starts with a keyword indicating exactly what you are defining: procedure or function, or variable, constant, or type; and that in turn tells the parser what to expect next [Ada too, I think]. In D, the parser needs to work a little harder to figure things out, but you have less clutter.

So the question is, why define a bunch of keywords which are not only unnecessary to the parsing process, but actually complicate the grammar? Anywhere in D where I can use 'int', I can also use 'myint', which is an identifier that I have aliased to 'int'. So the parser needs to understand every possible such construct where the type name is an identifier, and it needs additional rules to understand them when they are keywords. As I've mentioned previously, this doesn't just lead to a more complex parser, it also leads to inferior diagnostic messages.

[*] I've been reading the manual a bit more, and I've found that D already has a built-in type implemented as a predefined identifier: Object. So why are all the other ones keywords?
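The alternative argued for in this thread (structural words stay keywords, built-in type names become predefined identifiers, and redefinition is rejected only in chosen scopes) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any D compiler is organised: the `KEYWORDS` and `BUILTIN_TYPES` sets are abbreviated, and `classify`, `Scope` and `make_global_scope` are hypothetical names. The rule "struct scopes may reuse built-in names, other scopes may not" is just one possible policy.

```python
# Only words the parser needs for structure are keywords (abbreviated list).
KEYWORDS = frozenset({"if", "else", "while", "return", "alias", "struct"})

# Built-in type names are ordinary identifiers, predefined in the symbol table.
BUILTIN_TYPES = frozenset({"int", "byte", "float"})


def classify(word):
    """Lexer step: a word is a keyword token only if the grammar needs it."""
    return "KEYWORD" if word in KEYWORDS else "IDENTIFIER"


class Scope:
    def __init__(self, kind, parent=None):
        self.kind = kind      # "global", "function" or "struct"
        self.parent = parent
        self.symbols = {}

    def declare(self, name, meaning):
        # Scope-selective protection: a semantic rule, not a grammar rule.
        if name in BUILTIN_TYPES and self.kind != "struct":
            raise NameError(f"can't redefine '{name}' in this name space")
        self.symbols[name] = meaning

    def lookup(self, name):
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        return None


def make_global_scope():
    """Create the global scope with the built-in types already entered."""
    scope = Scope("global")
    for name in BUILTIN_TYPES:
        scope.symbols[name] = ("type", name)
    return scope


globals_ = make_global_scope()
func = Scope("function", globals_)
struct = Scope("struct", globals_)

struct.declare("int", ("member", "int"))   # allowed by this sketch's policy
try:
    func.declare("int", ("variable", "int"))
except NameError as exc:
    print("error:", exc)
```

The point of the sketch is that the prohibition lives in `declare`, so the diagnostic can name the symbol and the scope, and the policy can differ per scope kind; neither is possible when the lexer has already turned 'int' into a keyword token.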
Jul 20 2005
next sibling parent "Ben Hinkle" <ben.hinkle gmail.com> writes:
 Ada has a lot of keywords too; but the type names are not among them, 
 since (as in pascal and D) they don't need to be. C is actually an anomaly 
 in this sense, and C++ inherited the anomaly. D has inherited the 
 practice[*], while specifically shaking off the necessity, this is what 
 puzzles me.

Java and C# retain the basic types as keywords, though I don't know if they need to or not. It could be that they remain keywords to be more compatible with C/C++ tools - though that is just a guess. For example I'm not sure if the emacs mode and syntax highlighter would color 'int' correctly if it wasn't on the keyword list.
Jul 20 2005
prev sibling parent reply Charles Hixson <charleshixsn earthlink.net> writes:
Greg Smith wrote:
 Charles Hixson wrote:
 Greg Smith wrote:

 D doesn't have all the syntax that some languages (I'm thinking of Ada 
 here) have which would allow you to specify how many bits a particular 
 type should have, what value range it should allow, etc.  As a result 
 all of the basic space allocating words need to be keywords.

 A type basically means:
 1) reserve this space.
 2) define these operations over this space

 Things get a bit more complex when we start thinking about where the 
 space is allocated, how it interacts with other types, and how we pass 
 it as a parameter, but those are the basics.

These are semantic issues which have absolutely nothing to do with whether the type names are keywords.
 D has a simple (relatively simple) syntax.  As a result, it needs a 
 large number of keywords.

This makes no sense at all; you must be using an unusual definition of 'keyword'. Languages with simpler syntax generally need fewer keywords (example: python; extreme example: lisp). Furthermore, the grammar of D would be simpler still if the type names became built-in identifiers (I mean the grammar in the compiler; the one known to the user would be effectively unchanged).

Ada has a lot of keywords too; but the type names are not among them, since (as in pascal and D) they don't need to be. C is actually an anomaly in this sense, and C++ inherited the anomaly. D has inherited the practice[*] while specifically shaking off the necessity; this is what puzzles me.

A keyword is a specific combination of letters (e.g. 'if', 'goto') which is recognized by the lexical scanner as having a distinct significance no matter where it appears in the token sequence, despite the fact that it follows the general rule defining how an 'identifier' is formed. Keywords are assigned significance before their situation relative to the other tokens is analyzed (i.e. prior to the parser), whereas other identifiers are assigned meaning after the parsing process. When meaning is assigned later, it is possible to apply sophisticated rules to the process (e.g. 'mtype' might be a function name at global scope, but also defined as an alias type inside a function, and at the same time be the name of members of several structs).

[None of this is an immutable law of language design; it's just the way modern languages are designed and parsed, and the terminology which is used. C and C++, in fact, require bending of these rules, which is generally viewed as a problem: once a typedef is defined, references to it must be identified as such *prior* to the parser; since this requires scopes to be considered, and scopes are defined by the parsing process, this can be tricky.]

When a word is a keyword, it's a keyword everywhere.

So why define keywords at all? The conventional language-design practice is that you define keywords as needed to make the language parseable.

Perhaps I am using an unusual definition. E.g., I consider all of the words built into Forth to be keywords. Note that you can, at your own risk, override any of them. Forth has almost no syntax; it's all subsumed into the definitions of the words.

I consider a keyword to be anything whose meaning the compiler (or interpreter) already knows. Examples from D include not only things like int and uint, but also import, struct, etc. With more syntax you need fewer keywords. Perhaps Snobol is an example here. (I don't really remember it clearly, but my impression was that it has LOTS of syntax, and few keywords.)

Note that this "Syntax" isn't a unified thing. Ada has lots of syntax around storage allocation, but relatively few keywords, even though it allows you to specify such things as "This type denotes things that take up 37 bits and are floating point numbers with 3 digits of precision." Just imagine the amount of work it would take to create such a type in D. (Well, also imagine just how often it would be needed.)

D has chosen to PREDEFINE several "types" as keywords. The other types are created by combining the primitive types. One could argue, perhaps, that complex is a redundant type...but it can be very convenient.

For that matter, I occasionally wish that D had a bit more syntax around building types. I'd like to be able to define a string class that has string literals. (Others have uttered similar wishes, with perhaps a different idea of precisely what a string class would look like.)

What did you mean by keyword?
Jul 20 2005
parent Greg Smith <greg siliconoptix.com> writes:
Charles Hixson wrote:

 Greg Smith wrote:
 
 Charles Hixson wrote:

 Greg Smith wrote:

 D doesn't have all the syntax that some languages (I'm thinking of 
 Ada here) have which would allow you to specify how many bits a 
 particular type should have, what value range it should allow, etc.  
 As a result all of the basic space allocating words need to be keywords.

 A type basically means:
 1) reserve this space.
 2) define these operations over this space

 Things get a bit more complex when we start thinking about where the 
 space is allocated, how it interacts with other types, and how we 
 pass it as a parameter, but those are the basics.

These are semantic issues which have absolutely nothing to do with whether the type names are keywords.
 D has a simple (relatively simple) syntax.  As a result, it needs a 
 large number of keywords.

This makes no sense at all; you must be using an unusual definition of 'keyword'. Languages with simpler syntax generally need fewer keywords (example: python; extreme example: lisp). Furthermore, the grammar of D would be simpler still if the type names became built-in identifiers (I mean the grammar in the compiler; the one known to the user would be effectively unchanged).

Ada has a lot of keywords too; but the type names are not among them, since (as in pascal and D) they don't need to be. C is actually an anomaly in this sense, and C++ inherited the anomaly. D has inherited the practice[*] while specifically shaking off the necessity; this is what puzzles me.

A keyword is a specific combination of letters (e.g. 'if', 'goto') which is recognized by the lexical scanner as having a distinct significance no matter where it appears in the token sequence, despite the fact that it follows the general rule defining how an 'identifier' is formed. Keywords are assigned significance before their situation relative to the other tokens is analyzed (i.e. prior to the parser), whereas other identifiers are assigned meaning after the parsing process. When meaning is assigned later, it is possible to apply sophisticated rules to the process (e.g. 'mtype' might be a function name at global scope, but also defined as an alias type inside a function, and at the same time be the name of members of several structs).

[None of this is an immutable law of language design; it's just the way modern languages are designed and parsed, and the terminology which is used. C and C++, in fact, require bending of these rules, which is generally viewed as a problem: once a typedef is defined, references to it must be identified as such *prior* to the parser; since this requires scopes to be considered, and scopes are defined by the parsing process, this can be tricky.]

When a word is a keyword, it's a keyword everywhere. So why define keywords at all?

Perhaps I am using an unusual definition. E.g., I consider all of the words built into Forth to be keywords. Note that you can, at your own risk, override any of them. Forth has almost no syntax; it's all subsumed into the definitions of the words.

I consider a keyword to be anything whose meaning the compiler (or interpreter) already knows. Examples from D include not only things like int and uint, but also import, struct, etc. With more syntax you need fewer keywords. Perhaps Snobol is an example here. (I don't really remember it clearly, but my impression was that it has LOTS of syntax, and few keywords.)

Note that this "Syntax" isn't a unified thing. Ada has lots of syntax around storage allocation, but relatively few keywords, even though it allows you to specify such things as "This type denotes things that take up 37 bits and are floating point numbers with 3 digits of precision." Just imagine the amount of work it would take to create such a type in D. (Well, also imagine just how often it would be needed.)

D has chosen to PREDEFINE several "types" as keywords. The other types are created by combining the primitive types. One could argue, perhaps, that complex is a redundant type...but it can be very convenient.

proposing changing 'wchar' etc. from a keyword to a predefined identifier. So it's still predefined; this doesn't affect anything you've discussed in the previous paragraph. All existing D code would be unaffected.
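A toy Python model of the difference, for what it's worth. This is not how the D compiler is organised; the tiny keyword set, the `Scope` class and the names in it are all invented for illustration. The only point is that a keyword is rejected by the lexer before any context exists, while a predefined identifier is just an entry in the outermost scope that ordinary lookup finds.

```python
KEYWORDS = {"if", "while", "alias"}  # hypothetical, deliberately tiny


def lex_kind(word):
    """Lexer-level decision, made before any context is known."""
    return "keyword" if word in KEYWORDS else "identifier"


class Scope:
    """A symbol table with a link to the enclosing scope."""

    def __init__(self, parent=None):
        self.names = {}
        self.parent = parent

    def declare(self, name, meaning):
        if lex_kind(name) == "keyword":
            raise SyntaxError(f"'{name}' is a keyword and cannot be declared")
        self.names[name] = meaning

    def lookup(self, name):
        scope = self
        while scope is not None:
            if name in scope.names:
                return scope.names[name]
            scope = scope.parent
        raise NameError(name)


# Predefined identifiers: declared once, in a scope outside every module.
predefined = Scope()
for type_name in ("int", "uint", "wchar"):
    predefined.declare(type_name, f"built-in type {type_name}")

module = Scope(predefined)
print(module.lookup("wchar"))        # code that uses 'wchar' as a type still works

inner = Scope(module)
inner.declare("wchar", "local variable")   # allowed only because it is not a keyword
print(inner.lookup("wchar"))
```

Code that never declares its own 'wchar' sees the built-in meaning in both schemes, which is why existing code would be unaffected in this model.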
 For that matter, I occasionally wish that D had a bit more syntax around 
 building types.  I'd like to be able to define a string class that has 
 string literals.  (Others have uttered similar wishes, with perhaps a 
 different idea of precisely what a string class would look like.)
 
 What did you mean by keyword?
 

try again. Conventional terminology is that a keyword is a sequence of letters taken away from the allowed set of identifiers, or names, and effectively used as a nicely readable punctuation mark. You cannot redefine a keyword in any context, since it's recognized as a keyword before its context is considered.

By contrast, you can have identifiers which are reserved in specific contexts without being keywords. For instance, in C++, it would be possible to remove 'this' from the keyword list, so that it could be used in other contexts, such as a parameter in a non-member function, or a struct member name. In member functions, 'this' would be an implicitly declared parameter. I'm not suggesting this is a good idea; my point is that the language would be no harder to parse, since 'this' is syntactically allowed only in places where you can use an identifier, and changing 'this' to a local variable name doesn't change the meaning of any construct to the point where a different parse would be desirable. Such a change would not break any existing code, but it would allow code which is currently illegal C++ (including some legal C code).

By this definition, Forth (like PostScript) has no keywords at all, and (also like PostScript) virtually no grammar, thus no need for keywords. [So it's quite possible that the term 'keyword' could take on a different meaning in various discussions of Forth...]

Side note: anybody remember 'small c'? This was a sort-of-C compiler for the 8080 which, by negligence rather than intent, let you freely redefine most keywords, since it didn't really have a lexer separate from the parser. The parser had things like this:

    /* expect a statement */
    if( next_token("{") ) {
        /* compound statement */
        ...
    }else if ( next_word_is("while") ){
        /* it's a while statement */
        expect("(");
        ...

So, you could define 'int while' as a variable, and reference it, etc., but any statement starting with 'while' (e.g. while=0;) would be disallowed because the code above would detect it. I discovered this after using 'switch' as a global variable. A very 'interesting' compiler in many ways.
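A small Python sketch of that behaviour, reconstructed from the description above rather than from the actual small c source. The token lists and the `statement_kind` function are hypothetical; the point is only that the word 'while' is checked at the start of a statement and nowhere else.

```python
def statement_kind(tokens):
    """Classify a statement by its first token only, with no separate lexer."""
    first = tokens[0]
    if first == "{":
        return "compound"
    if first == "while":
        # Once 'while' is seen in statement position, '(' is mandatory.
        if len(tokens) < 2 or tokens[1] != "(":
            raise SyntaxError("expected '(' after 'while'")
        return "while"
    # Anything else is treated as an expression statement, so 'while'
    # appearing later in the token list is just another name.
    return "expression"


print(statement_kind(["x", "=", "while", "+", "1", ";"]))
print(statement_kind(["while", "(", "x", ")", "x", "=", "0", ";"]))
try:
    statement_kind(["while", "=", "0", ";"])
except SyntaxError as err:
    print("rejected:", err)
```

With a conventional lexer the first call would fail as well, because 'while' would already have been turned into a keyword token before the parser ever saw it.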
Jul 21 2005