digitalmars.D - why are types all keywords?

Greg Smith (51/51) Jul 08 2005 One of the problems with C/C++ is that you can't parse it unless

Anders F Björklund (14/16) Jul 09 2005 Good question, and I was asking the same thing actually...
AJG (12/63) Jul 09 2005 Hi,
Hasan Aljudy (21/87) Jul 10 2005 I just don't get it ...

Anders F Björklund (16/21) Jul 11 2005 Sorry, but in D:
Greg Smith (70/150) Jul 18 2005 What's the point of *making* it a keyword???

Hasan Aljudy (55/154) Jul 18 2005 Sorry, all the languages I'v worked with are from the C family (C, C++,

Greg Smith (150/319) Jul 19 2005 Very simple. You go into the symbol table at startup -- the same one

Derek Parnell (8/11) Jul 19 2005 Greg, you have written a whole lot of sensible things here. You have my
Hasan Aljudy (12/396) Jul 19 2005 Ah well, I guess I did help you make your point.

Unknown W. Brackets (37/100) Jul 19 2005 I don't agree; the reason I don't is completely for the thing you first

Greg Smith (35/46) Jul 19 2005 Firstly, 'int' etc can be protected from redefinition (on a

Charles Hixson (14/15) Jul 19 2005 D doesn't have all the syntax that some languages (I'm thinking

Greg Smith (74/91) Jul 20 2005 These are semantic issues which have absolutely nothing to do with

Ben Hinkle (5/10) Jul 20 2005 Java and C# retain the basic types as keywords, though I don't know if t...
Charles Hixson (28/129) Jul 20 2005 Perhaps I am using an unusual definition. E.g., I consider all

Greg Smith (46/187) Jul 21 2005 I think you are making this much more complicated than it is. I'm

Greg Smith <greg siliconoptix.com> writes:

One of the problems with C/C++ is that you can't parse it unless
you know what words are  type names, or typedef/classes; and for
this reason C/C++ type names need to be keywords. D has modified
the syntax so that a parser does not need to know in advance that
certain identifiers represent user-defined types. I think this
is a great step forward.

The question is: why are all the built-in type names (and there
are a lot of them) still keywords? They don't need to be, and I don't
see how it can do any good to make them keywords. I count about
24 keywords which are types. These could all be predefined identifiers.

So why is this bad? Part of it is just a personal bias - if I plot
a chart of all the languages I've used, with 'niceness of language'
vs. 'number of keywords', there is a strong inverse correlation -
python is in one corner, and Dec Compiled BASIC (yes, I'm that old)
is far into the other corner.

But there are some good reasons to avoid superfluous keywords.
Keywords by definition have their enforced meaning everywhere - if
you add new keywords, you will break code which has any local, global,
struct member, or anything else with the same name. I remember a long
time ago, a buddy was baffled that his C code wouldn't compile in
C++; it turned out he had a struct member called 'this' or 'catch'
or something (this was before the days of syntax coloring).
In D, if new types are ever added - or new predefined values such
as 'true' and 'false' - they can be added as predefined identifiers
without breaking anything. So, why not do it that way from the beginning?
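
To make the breakage concrete, here is a minimal sketch in Python (not D, and
not how any real compiler is organised). All names in it (is_declarable,
broken_declarations, resolve, the keyword sets) are made up for illustration.
It assumes a keyword can never be declared, while a predefined identifier is
just a fallback entry that an inner scope may shadow.

```python
def is_declarable(name, keywords):
    """A name can only be declared if the lexer does not treat it as a keyword."""
    return name not in keywords


def broken_declarations(declared_names, keywords):
    """Return the existing declarations that stop compiling under this keyword set."""
    return [n for n in declared_names if not is_declarable(n, keywords)]


def resolve(name, local_scope, predefined):
    """Innermost scope wins; predefined names are only a fallback."""
    if name in local_scope:
        return local_scope[name]
    return predefined.get(name)


# Hypothetical old code: a struct with members named 'catch' and 'count'.
old_members = {"catch": "int member", "count": "int member"}

old_keywords = {"if", "while", "struct", "return"}
new_keywords = old_keywords | {"catch"}  # the language grows a keyword

# Adding 'catch' as a keyword breaks the old struct member.
broken = broken_declarations(old_members, new_keywords)
print(broken)  # ['catch']

# Adding 'catch' as a predefined identifier does not: the member shadows it.
predefined = {"catch": "new built-in"}
resolved = resolve("catch", old_members, predefined)
print(resolved)  # int member
```

The only point of the sketch is the asymmetry: the keyword set is consulted
before any scope exists, so it cannot make exceptions, whereas the predefined
table is consulted last.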

Languages which implement built-in types (and constants) as predefined 
identifiers include Pascal and VHDL, and python (to the extent that it 
has type names, they are __builtin__ type objects and not keywords).

D does not define property names as keywords, why are 'true'
and 'false' and all those type names keywords?
It could be argued that 'this' doesn't need to be a keyword. I might 
want to have a struct member called 'this'; syntactically, it could be a 
predefined local variable.

In D as currently implemented,

	i = int + 2;

.. is a syntax error, whereas

         alias int myint;
          i = myint + 2;

  ... is syntactically legal, but disallowed at the semantic level.
Is this difference important or desirable?
Making 'int' a predefined identifier would cause these two to be
treated the same way in terms of compiler diagnostics.

It might be argued that it would be very dangerous to allow
functions to define a local variable called 'float'. In C, this
could break code which is secretly inserted by macros or #include.
But (a) D doesn't have these (b) *anything* can be broken in C by these
things. In any case, you can always make it illegal to redefine float as
a variable, while still allowing it in, say, struct namespaces. With
a keyword, no such distinction is possible.

- greg

Jul 08 2005

Anders F Björklund <afb algonet.se> writes:

Greg Smith wrote:

 D does not define property names as keywords, why are 'true'
 and 'false' and all those type names keywords?

Good question, and I was asking the same thing actually...
(but didn't get any answers so I eventually gave up on it)


Another strange thing is that "bool" is *not* a keyword,
as you would have expected the type of those two to be ?

I wouldn't mind if bool, true and false were all moved to
e.g. std.stdbool (which could still be included by default ?)

Just like in C99:
http://www.opengroup.org/onlinepubs/009695399/basedefs/stdbool.h.html

Then again, I'm secretly plotting the swift demise of the
"bit" type so you probably shouldn't pay any attention to me. ;-)

That is: it would be even better if bool, true and false were
a new type of their own - but that isn't ever going to happen.

--anders

Jul 09 2005

AJG <AJG_member pathlink.com> writes:

Hi,








I think making intrinsic types keywords is a Good Thing™, but perhaps I'm not
getting your proposal correctly. Would you want something like the above to be
legal?

--AJG.




Jul 09 2005

Hasan Aljudy <hasan.aljudy gmail.com> writes:

Greg Smith wrote:
 One of the problems with C/C++ is that you can't parse it unless
 you know what words are  type names, or typedef/classes; and for
 this reason C/C++ type names need to be keywords. D has modified
 the syntax so that a parser does not need to know in advance that
 certain identifiers represent user-defined types. I think this
 is a great step forward.
 
 The question is: why are all the built-in type names (and there
 are a lot of them) still keywords? They don't need to be, and I don't
 see how it can do any good to make them keywords. I count about
 24 keywords which are types. These could all be predefined identifiers.

I just don't get it ...

What's the point of making something like "int" not a keyword?

#int int; //wth?
#class int



#float double = int.max; //go figure
#double bit = cast(typeof( double )) int;


 
 So why is this bad? Part of it is just a personal bias - if I plot
 a chart of all the languages I've used, with 'niceness of language'
 vs. 'number of keywords', there is a strong inverse correlation -
 python is in one corner, and Dec Compiled BASIC (yes, I'm that old)
 is far into the other corner.

I hope the method in which you measure "niceness of language" doesn't 
include "number of keywords" ...

 
 But there are some good reasons to avoid superfluous
 keywords. Keywords by  definition have the enforced meaning
 everywhere - if you add new
 keywords, you will break code which has any local, global, struct
 member, or anything with the same name. 

A good compiler will quickly point out to you the error and hopefully it 
can be easily fixed, find and replace in files :)

 I remember a long time
 ago, a buddy was baffled that his C code wouldn't compile in
 C++, it turned out he had a struct member called 'this' or 'catch'
 or something (this was before the days of syntax coloring).

Why didn't his compiler tell him that "this" is a keyword?

 In D, if new types are ever added - or new predefined values such
 as 'true' and 'false' - they can be added as predefined identifiers
 without breaking anything. So, why not do it that way from the beginning?
 
 Languages which implement built-in types (and constants) as predefined 
 identifiers include Pascal and VHDL, and python (to the extent that it 
 has type names, they are __builtin__ type objects and not keywords).
 


 D does not define property names as keywords, why are 'true'
 and 'false' and all those type names keywords?

I think true and false are not just aliases for 0 and 1 (or at least, I 
hope so)

 It could be argued that 'this' doesn't need to be a keyword. I might 
 want to have a struct member called 'this'; syntactically, it could be a 
 predefined local variable.

yeah, that's a bad argument.

 
 In D as currently implemented,
 
     i = int + 2;
 
 .. is a syntax error, whereas
 
         alias int myint;
          i = myint + 2;
 
  ... is syntactically legal, but disallowed at the semantic level.
 Is this difference important or desirable?

I don't see your point .. both are errors.

 Making 'int' a predefined identifier would cause these two to be
 treated the same way in terms of compiler diagnostics.
 
 It might be argued that it would be very dangerous to allow
 functions to define a local variable called 'float'. In C, this
 could break code which is secretly inserted by macros or #include.
 But (a) D doesn't have these (b) *anything* can be broken in C by these
 things. In any case, you can always make it illegal to redefine float as
 a variable, while still allowing it in, say, struct namespaces. With
 a keyword, no such distinction is possible.
 
 - greg
 
 

I don't see one single real problem with the issue.

If it ain't broken, don't fix it.

Jul 10 2005

Anders F Björklund <afb algonet.se> writes:

Hasan Aljudy wrote:

 D does not define property names as keywords, why are 'true'
 and 'false' and all those type names keywords?

 
 I think true and false are not just aliases for 0 and 1 (or at least, I 
 hope so)

Sorry, but in D:
"true" is a constant bit of 1, and "false" is a constant bit of 0.

const bit true = 1;
const bit false = 0;

They just happen to be implemented inside the D compiler itself...

	case TOKtrue:
	    e = new IntegerExp(loc, 1, Type::tbit);
	    nextToken();
	    break;

	case TOKfalse:
	    e = new IntegerExp(loc, 0, Type::tbit);
	    nextToken();
	    break;

See http://www.prowiki.org/wiki4d/wiki.cgi?BitsAndBools

--anders

Jul 11 2005

Greg Smith <greg siliconoptix.com> writes:

Hasan Aljudy wrote:
 Greg Smith wrote:
 
 The question is: why are all the built-in type names (and there
 are a lot of them) still keywords? They don't need to be, and I don't
 see how it can do any good to make them keywords. I count about
 24 keywords which are types. These could all be predefined identifiers.

 
 
 I just don't get it ...
 
 What's the point of making something like "int" not a keyword?
 
 #int int; //wth?
 #class int



 #float double = int.max; //go figure
 #double bit = cast(typeof( double )) int;
 

What's the point of *making* it a keyword???

Yes, this change would allow you to redefine int. it's possible in
other languages, and they haven't self-destructed as a result.
  If this is a problem, you could make it illegal to redefine built-in 
names in certain scopes. If they are keywords, then
this level of control is not possible.

My point is, there's no reason to make it a keyword, unless you want
it to always be (effectively) a special punctuation mark, in *all*
possible contexts, and you want to extend that to *all* the built-in
types, despite the fact that user-defined types don't have or need this
special treatment, and you don't mind putting in extra grammar rules to
deal with the fact that type names could be these keywords *or* identifiers.
 
 So why is this bad? Part of it is just a personal bias - if I plot
 a chart of all the languages I've used, with 'niceness of language'
 vs. 'number of keywords', there is a strong inverse correlation -
 python is in one corner, and Dec Compiled BASIC (yes, I'm that old)
 is far into the other corner.

 I hope the method in which you measure "niceness of language" doesn't 
 include "number of keywords" ...

Actually, that was a contributing factor for the Compiled Basic.
There were several pages of keywords, and every word that had
anything to do with computing was in there somewhere. So you had
to make variable names with spelling errors in them, like 'rekord'.
But, no, in general languages with a large number of keywords
seem to be designed along the principle that you should cram
as much as possible into the core language, and that shows in
other ways as well.

Also, languages with a lot of keywords often have big, clumsy, bloated
grammars, and need to have a lot of keywords to direct the parser. 
Keywords were invented for the purpose of adding extra punctuation to
the token set, to help the grammar. I don't see any point in making more 
keywords than are needed for this purpose. D has >20 keywords which are 
not needed for the grammar, and therefore the grammar is more 
complicated than it needs to be, and error messages are generally less 
informative as a side-effect.
 
 But there are some good reasons to avoid superfluous
 keywords. Keywords by  definition have the enforced meaning
 everywhere - if you add new
 keywords, you will break code which has any local, global, struct
 member, or anything with the same name. 

 
 
 A good compiler will quickly point out to you the error and hopefully it 
 can be easily fixed, find and replace in files :)

In practice, you get baffling error messages. I've been through that,
after they added 'xor' and 'and', etc., to the C++ keyword list, without
checking first if it was OK with me :-).

 
 I remember a long time
 ago, a buddy was baffled that his C code wouldn't compile in
 C++, it turned out he had a struct member called 'this' or 'catch'
 or something (this was before the days of syntax coloring).

 
 
 Why didn't his compiler tell him that "this" is a keyword?

Why on earth would it do that? it reported a syntax error,
since a keyword appeared in a position where it was not
allowed by the grammar. A lot of tokens other than 'identifier'
are allowed there - so you wouldn't even get something as helpful as
"error at 'try' - expected 'identifier'"
  Try it with your favourite C++ compiler.
 
 In D, if new types are ever added - or new predefined values such
 as 'true' and 'false' - they can be added as predefined identifiers
 without breaking anything. So, why not do it that way from the beginning?


...
 
 D does not define property names as keywords, why are 'true'
 and 'false' and all those type names keywords?

 
 I think true and false are not just aliases for 0 and 1 (or at least, I 
 hope so)

The point is, they are keywords, therefore these words are not available 
in contexts where they could otherwise be locally redefined.
 
 
 In D as currently implemented,

     i = int + 2;

 .. is a syntax error, whereas

         alias int myint;
          i = myint + 2;

  ... is syntactically legal, but disallowed at the semantic level.
 Is this difference important or desirable?

 
 
 I don't see your point .. both are errors.

Here's the point:
  (1) they are both, essentially, the same error, why should they
      produce completely different error messages?
  (2) the error message you get for the second one,
    "can't do that to a type", is much more useful than the one you get
    for the first, "syntax error".
  (3) The compiler writer's job is more complex for no benefit:
    you have a grammar with two different ways of matching types, which
    may be certain keywords or any identifier; and that grammar rejects
    certain erroneous constructs, such as adding things to built-in
    types, but you *still* need to check that  expressions are
    well formed since any identifier could in fact be a type name.

It should be quite clear that the D grammar[*] would be simpler if type
names were not keywords; and that the semantic checking required to
compensate is already in place to deal with user-defined types.

[*] by this I mean the one in the compiler, not the one in the 
documentation; the latter tends to assume you know a priori what names
are types, whereas you don't in practice.

 I don't see one single real problem with the issue.
 
 If it ain't broken, don't fix it.

Hey, then why bother with D? Use C/C++. It seems to work OK, people
use it for a lot of stuff.

How about this: the language is in its early development. It is still
possible to make changes like this. It will be much, much harder in the
future. I still don't see one reason why there *should* be so many 
keywords (other than the fact that's already done that way)  and I've 
pointed out a few reasons why IMHO it's better, and cleaner, not to.

The fact that C does the same thing does not qualify as a reason, since
it's a stated goal of D to eliminate the very reason C needs to do that.
So, having gone to the trouble to eliminate the need for keywords... why
are they still there???

Jul 18 2005

Hasan Aljudy <hasan.aljudy gmail.com> writes:

Greg Smith wrote:
 Hasan Aljudy wrote:

[snip]
 I just don't get it ...

 What's the point of making something like "int" not a keyword?

 #int int; //wth?
 #class int



 #float double = int.max; //go figure
 #double bit = cast(typeof( double )) int;

 What's the point of *making* it a keyword???
 
 Yes, this change would allow you to redefine int. it's possible in
 other languages, and they haven't self-destructed as a result.

Sorry, all the languages I've worked with are from the C family (C, C++, 
Java, D) with the exception of Pascal.

How do other languages implement that?

  If this is a problem, you could make it illegal to redefine built-in 
 names in certain scopes. If they are keywords, then
 this level of control is not possible.

The ability to use "int" or "float" or "this" for one's own purposes is 
not really an advantage.

 
 My point is, there's no reason to make it a keyword, unless you want
 it to always be (effectively) a special punctuation mark, in *all*
 possible contexts, and you want to extend that to *all* the built-in
 types, despite the fact that user-defined types don't have or need this
 special treatment, and you don't mind putting in extra grammar rules to
 deal with the fact that type names could be these keywords *or* 
 identifiers.

I still don't get your point ....
It's a keyword because, well, how do you define a variable to be of a 
certain type? Well, you use a "type name" to specify the type of a variable.

type_name variable_name;

You can define your own types, but your own types will always be defined 
in terms of other types.

typedef oldtype newtype;

struct new_type
{
     some_known_type field1;
     some_other_known_type field2;
     //.. etc
}

every new type is defined in terms of other type(s), there must be in 
the end a type which isn't defined in terms of anything.

int is such a type.

if it's not a keyword, then it can be turned on and off.
well, how do you turn it "on"? and what would be the point of having 
turned off?


[snip]
 I remember a long time
 ago, a buddy was baffled that his C code wouldn't compile in
 C++, it turned out he had a struct member called 'this' or 'catch'
 or something (this was before the days of syntax coloring).



 Why didn't his compiler tell him that "this" is a keyword?

 
 Why on earth would it do that? it reported a syntax error,
 since a keyword appeared in a position where it was not
 allowed by the grammar. A lot of tokens other than 'identifier'
 are allowed there - so you wouldn't even get something as helpful as
 "error at 'try' - expected 'identifier'"
  Try it with your favourite C++ compiler.


I'm just saying the problem here is the error message, not the keyword.

 In D as currently implemented,

     i = int + 2;

 .. is a syntax error, whereas

         alias int myint;
          i = myint + 2;

  ... is syntactically legal, but disallowed at the semantic level.
 Is this difference important or desirable?



 I don't see your point .. both are errors.

 

 Here's the point:
  (1) they are both, essentially, the same error, why should they
      produce completely different error messages?

because .. they can be treated differently.
for
#int + 2
there is no way around using something other than int.
but for
#myint + 2
you can redefine myint to be a variable, or you can use something other 
than myint.

  (2) the error message you get for the second one,
    "can't do that to a type", is much more useful than the one you get
    for the first, "syntax error".

So? Ask the compiler writer to produce a more informative error message!

  (3) The compiler writer's job is more complex for no benefit:
    you have a grammar with two different ways of matching types, which
    may be certain keywords or any identifier; and that grammar rejects
    certain erroneous constructs, such as adding things to built-in
    types, but you *still* need to check that  expressions are
    well formed since any identifier could in fact be a type name.
 
 It should be quite clear that the D grammar[*] would be simpler if type
 names were not keywords; and that the semantic checking required to
 compensate is already in place to deal with user-defined types.
 
 [*] by this I mean the one in the compiler, not the one in the 
 documentation; the latter tends to assume you know a priori what names
 are types, whereas you don't in practice.

Ok, how would that help the language user?

I never wrote a compiler, and I have no clue what you are 
talking about.

But, assuming that you are correct, and that it does indeed make writing 
the compiler easier .. your point still doesn't stand.

The compiler has already been written!

I think it would be much easier for the compiler author to use what he 
had already written than to rewrite the compiler to accommodate your 
suggestion.

 
 I don't see one single real problem with the issue.

 If it ain't broken, don't fix it.

 
 
 Hey, then why bother with D? Use C/C++. It seems to work OK, people
 use it for a lot of stuff.

Because C++ is broken.
And Java is broken too.

 
 How about this: the language is in its early development. It is still
 possible to make changes like this. It will be much, much harder in the
 future. I still don't see one reason why there *should* be so many 
 keywords (other than the fact that's already done that way)  and I've 
 pointed out a few reasons why IMHO it's better, and cleaner, not to.

Where are those reasons? I didn't see them.
The only reasons were:
1- so you can use "int" as a variable name or something else.
2- easier to implement in a compiler.



 The fact that C does the same thing does not qualify as a reason, since
 it's a stated goal of D to eliminate the very reason C needs to do that.
 So, having gone to the trouble to eliminate the need for keywords... why
 are they still there???

Where does the documentation state that D's goal is to eliminate C's 
need for keywords?

Jul 18 2005

Greg Smith <greg siliconoptix.com> writes:

Hasan Aljudy wrote:
 
 Greg Smith wrote:
 
 Hasan Aljudy wrote:

 [snip]
 I just don't get it ...

 What's the point of making something like "int" not a keyword?

 #int int; //wth?
 #class int



 #float double = int.max; //go figure
 #double bit = cast(typeof( double )) int;

 What's the point of *making* it a keyword???

 Yes, this change would allow you to redefine int. it's possible in
 other languages, and they haven't self-destructed as a result.

 
 Sorry, all the languages I've worked with are from the C family (C, C++, 
 Java, D) with the exception of Pascal.
 
 How do other languages implement that?

Very simple. You go into the symbol table at startup -- the same one
into which the user names go -- and you predefine the names there as 
types. Pascal does this, and you've probably never noticed. See?
It doesn't hurt at all.
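
As a rough sketch of what "predefine the names in the symbol table" means
(in Python, not D; the SymbolTable class, the entry format and the helper
is_type are invented for this example and are not taken from any real
compiler):

```python
class SymbolTable:
    """One table holds both the built-in names and the user's names."""

    def __init__(self):
        self.symbols = {}
        # Done once at startup: the built-in types go into the same table
        # that user declarations will later go into.
        for name in ("int", "float", "char"):
            self.symbols[name] = ("type", name)

    def declare(self, name, kind, info):
        self.symbols[name] = (kind, info)

    def lookup(self, name):
        return self.symbols.get(name)


def is_type(table, name):
    """The same lookup answers the question for built-in and user types."""
    entry = table.lookup(name)
    return entry is not None and entry[0] == "type"


table = SymbolTable()
table.declare("myint", "type", "int")   # roughly: alias int myint;
table.declare("i", "variable", "int")   # roughly: int i;

print(is_type(table, "int"))    # True
print(is_type(table, "myint"))  # True
print(is_type(table, "i"))      # False
```

The lexer in this arrangement only ever produces 'identifier' for these
names; everything it needs to know about 'int' comes from the lookup.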

 
  If this is a problem, you could make it illegal to redefine built-in 
 names in certain scopes. If they are keywords, then
 this level of control is not possible.

 
 
 The ability to use "int" or "float" or "this" for one's own purposes is 
 not really an advantage.
 

No, that's not the point. You can still make it illegal to redefine 
these. What's the difference between making it illegal to redefine them 
and making them keywords?
   (1) by making them keywords, you complicate the grammar and gain no 
advantage by doing so; the grammar must still support type names which 
are identifiers.
   (2) by making them keywords, you cause them to be treated 
differently, in the parser and semantic passes, from user-defined types. 
Functionality needs to be replicated in the compiler, since 'int' is 
discovered to be a type in the parser, while 'myint' is seen as an 
identifier in the parser, and is discovered to be a type in the semantic
processing. This means more complexity than needed, and leads to 
inconsistent, and less useful, diagnostics.
   (3) New built-in types can be added in future to the language as 
predefined identifiers, with much less likelihood of breaking old code
than if they are added as new keywords.
   (4) if they are defined as identifiers, you can make it illegal to
redefine them in specific contexts. With keywords there is no such control.

To appeal to the KISS principle:
   - If the built-in types can be implemented in the same way as the 
user-defined types, why not do so ?? If you want to make it illegal
to redefine these, fine - but why chisel them into stone in the parser
when the grammar doesn't need this, and would be simpler without it?

 My point is, there's no reason to make it a keyword, unless you want
 it to always be (effectively) a special punctuation mark, in *all*
 possible contexts, and you want to extend that to *all* the built-in
 types, despite the fact that user-defined types don't have or need this
 special treatment, and you don't mind putting in extra grammar rules to
 deal with the fact that type names could be these keywords *or* 
 identifiers.

 
 
 I still don't get your point ....
 It's a keyword because, well, how do you define a variable to be of a 
 certain type? Well, you use a "type name" to specify the type of a variable.
 
 type_name variable_name;
 
 You can define your own types, but your own types will always be defined 
 in terms of other types.
 
 typedef oldtype newtype;
 
 struct new_type
 {
     some_known_type field1;
     some_other_known_type field2;
     //.. etc
 }
 
 every new type is defined in terms of other type(s), there must be in 
 the end a type which isn't defined in terms of anything.
 
 int is such a type.
 
 if it's not a keyword, then it can be turned on and off.
 well, how do you turn it "on"? and what would be the point of having 
 turned off?
 

Clearly, all types have to start from built-in types. This is immaterial
to whether the built-in types are defined in the grammar as keywords, or
in the symbol table as predefined names, as in Pascal.

You are saying this: because there is no point in redefining them, they
should be cast in stone in the parser. I mildly disagree with the 
premise, and I utterly disagree with the conclusion.

Regarding the premise, as I have pointed out, what if you want to add a 
new built-in type -- if you define it as a new keyword, it might 
conflict with a local variable name in some existing code.

If you want it to be illegal to redefine certain names, this is fine, 
but this does not by any means mean they need to be keywords!!
IMHO this should be done in the symbol table, not by making keywords 
that are not required by the grammar. This is much simpler in the long 
run; it leads to better error messages, e.g. "can't redefine 'int' in 
this name space " vs. "Syntax error"; and allows control by scope, e.g. 
you might want to allow some names to be used in struct members.
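
Here is a small sketch of that kind of per-scope control, in Python rather
than D. The policy (built-in names may be reused for struct members but not
for function locals), the scope kinds, and every name in the code are
assumptions made up for the example, not a description of any existing
compiler.

```python
BUILTIN_TYPES = {"int", "float", "char"}

# Hypothetical policy: the kinds of scope in which a built-in name may be reused.
SCOPES_ALLOWING_REUSE = {"struct"}


class RedefinitionError(Exception):
    pass


def declare(scope_kind, scope, name, info):
    """Add a name to a scope, refusing built-in names where the policy forbids it."""
    if name in BUILTIN_TYPES and scope_kind not in SCOPES_ALLOWING_REUSE:
        raise RedefinitionError(
            f"can't redefine '{name}' in this name space ({scope_kind})"
        )
    scope[name] = info


# A struct member called 'float' is allowed under this policy.
struct_members = {}
declare("struct", struct_members, "float", "member of type int")

# A local variable called 'float' is rejected, with a specific message.
function_locals = {}
message = None
try:
    declare("function", function_locals, "float", "local variable")
except RedefinitionError as err:
    message = str(err)

print(message)  # can't redefine 'float' in this name space (function)
```

With 'float' as a keyword, neither declaration would get as far as this
check: both would fail in the parser, with the same generic syntax error.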


 
 [snip]
 
 I remember a long time
 ago, a buddy was baffled that his C code wouldn't compile in
 C++, it turned out he had a struct member called 'this' or 'catch'
 or something (this was before the days of syntax coloring).




..
 Why didn't his compiler tell him that "this" is a keyword?



  ..
 Why on earth would it do that? it reported a syntax error,
 since a keyword appeared in a position where it was not
 allowed by the grammar. A lot of tokens other than 'identifier'
 are allowed there - so you wouldn't even get something as helpful as
 "error at 'try' - expected 'identifier'"
  Try it with your favourite C++ compiler.

 
 I'm just saying the problem here is the error message, not the keyword.

I fully agree. And the best way to get better error messages is to allow 
the semantic pass to see these errors, rather than making them syntax 
errors, which is what happens when keywords are defined.
 
 In D as currently implemented,

     i = int + 2;

 .. is a syntax error, whereas

         alias int myint;
          i = myint + 2;

  ... is syntactically legal, but disallowed at the semantic level.
 Is this difference important or desirable?




 I don't see your point .. both are errors.



 Here's the point:
  (1) they are both, essentially, the same error, why should they
      produce completely different error messages?

 
 
 because .. they can be treated differently.
 for
 #int + 2
 there is no way around using something other than int.
 but for
 #myint + 2
 you can redefine myint to be a variable, or you can use something other 
 than myint.
 

Please step back and think about what I am trying to say with this 
example. Of course they can be, and are, treated differently; I know
why the behavior occurs.
I'm saying there's no advantage to this and there are disadvantages;
which are eliminated by eliminating the keywords.
They are treated differently because the *parser* knows 'int' is a type 
name, and has no rule allowing it to add a type to something; but the 
parser has a rule saying an identifier can be added to something. What 
I'm saying is: if int
was *not* a keyword, we could eliminate the first rule, simplify the
grammar, get better diagnostics, shorten the keyword table (and thus
speed up the lexer) ... the compiler code which rejects 'int + 2' would
then be the same code which rejects "myint+2".
Is there any advantage to treating them differently?

  (2) the error message you get for the second one,
    "can't do that to a type", is much more useful than the one you get
    for the first, "syntax error".

 
 
 so? ask the compiler writer to produce a more informative error message!

You say later that you aren't familiar with compilers, and no offence,
but that's showing here.
By far the easiest way to improve the error message is to do away with 
the unnecessary keywords. It's very hard to produce helpful messages for 
errors which arise because no grammar rule is applicable. A syntax error 
is basically the parser saying "huh?". At best, it can tell you where it 
became irrevocably confused, and tell what kinds of tokens are legal at 
that point. It is possible to add additional grammar rules, solely for 
the purpose of matching specific illegal constructs, so that they can be 
given more meaningful error messages. This gets rather messy. And in 
this case, the desirable grammar rules already exist -- with 
'identifier' in them, so that they don't apply when types happen to
be built-in types.

It is far easier in the semantic phase to provide a guess at what you 
think the programmer was trying to do, and produce a useful error 
message. Imagine a language which allows array declarations sized by 
integer constants, or expressions formed of integer constants. It would 
be possible to make 'int a[-3]' a syntax error in such a language, by 
contriving the grammar so that no rule matched it. Far better to make it 
syntactically legal, so the message is "error: negative array dimension 
for 'a'", rather than "syntax error". The test would be needed anyhow, 
since the grammar can't make "int a[7-10]" illegal.
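
As a rough sketch of that semantic check, the Python below folds a constant dimension expression and reports a negative result by name. It borrows Python's own `ast` module as a stand-in expression parser, which is an assumption of the example and not part of the imagined language; `fold`, `check_array_dimension` and `SemanticError` are hypothetical names.

```python
import ast


class SemanticError(Exception):
    pass


def fold(node):
    """Constant-fold an expression built from integer literals, + - * and unary minus."""
    if (isinstance(node, ast.Constant) and isinstance(node.value, int)
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -fold(node.operand)
    if isinstance(node, ast.BinOp):
        left, right = fold(node.left), fold(node.right)
        if isinstance(node.op, ast.Add):
            return left + right
        if isinstance(node.op, ast.Sub):
            return left - right
        if isinstance(node.op, ast.Mult):
            return left * right
    raise SemanticError("array dimension is not an integer constant expression")


def check_array_dimension(name, dim_source):
    """Accept any syntactically valid dimension, then judge its value."""
    size = fold(ast.parse(dim_source, mode="eval").body)
    if size < 0:
        raise SemanticError("negative array dimension for '%s'" % name)
    return size
```

Both '-3' and '7-10' go through the same test and produce the same message, which is the point: the grammar accepts them and the semantic pass explains what is wrong.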

Actually, we could get this improvement in D by modifying the grammar as 
such:
    identifier_or_type::
              IDENTIFIER  { $$ = lookup_ident($1); }
           |  INT     { $$ = /* .. type obj for 'int' */ }
           |  BYTE    { $$ = /* .. type obj for 'byte' */ }
         ...

... and eliminating all other rules referencing the type keywords, 
which, by D charter, are actually redundant; and using 
'identifier_or_type' in place of most IDENTIFIER references (not the ones 
where IDENTIFIER is assigned a meaning).

Thus, 'int + 2' would be caught by the same code as 'myint + 2'.

This change obtains most of the improvement I'm looking
for while still preventing the names from being redefined. It's then a 
relatively small step to eliminate this one weird bit of grammar and 
provide predefined symbols.
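
A small Python sketch of what "caught by the same code" means in practice. This is an illustration under simplifying assumptions (a flat symbol table, and '+' applied to a named operand and a number); `TypeObj`, `SYMBOLS`, `declare_alias` and `check_add` are made-up names, not anything from the D compiler.

```python
class TypeObj:
    """Stands for a type in the symbol table, built-in or user-defined."""
    def __init__(self, name):
        self.name = name


class SemanticError(Exception):
    pass


# Built-in types are predefined entries, just like anything the user adds.
SYMBOLS = {"int": TypeObj("int"), "byte": TypeObj("byte")}


def declare_alias(new_name, existing):
    """Model 'alias existing new_name;' by sharing the type object."""
    target = SYMBOLS.get(existing)
    if not isinstance(target, TypeObj):
        raise SemanticError("'%s' is not a type" % existing)
    SYMBOLS[new_name] = target


def check_add(left_name, right_value):
    """Semantic check for 'left_name + right_value'."""
    if left_name not in SYMBOLS:
        raise SemanticError("undefined identifier '%s'" % left_name)
    operand = SYMBOLS[left_name]
    if isinstance(operand, TypeObj):
        # One code path, one message, whether the name is 'int' or 'myint'.
        raise SemanticError(
            "can't use type '%s' as an operand of '+'" % left_name)
    return operand + right_value
```

After `declare_alias("myint", "int")`, both `check_add("int", 2)` and `check_add("myint", 2)` fail in the same place with the same kind of message, while an ordinary variable goes through.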

 
 
 Ok, how would that help the language user?
 
 I never wrote a compiler, and I have no bit of clue about what you are 
 talking about.
 


 But, assuming that you are correct, and that it does indeed make writing 
 the compiler easier .. your point still doesn't stand.
 
 The compiler has already been written!
 
 I think it would be much easier for the compiler author to use what he 
 had already written than to rewrite the compiler to compensate for your 
 suggestion.

This is a valid point in general, but there are times, and precious few 
of them, when there is an opportunity to get things right even if it means 
changing something which already works as it is. D is, by charter, in 
such a situation. All such opportunities should be considered in the 
long-term view, since there will *never* be an easier time to make such
a change. The cost of the change will be short-lived, the benefit will
stay on.

 
 How about this: the language is in its early development. It is still
 possible to make changes like this. It will be much, much harder in the
 future. I still don't see one reason why there *should* be so many 
 keywords (other than the fact that it's already done that way) and I've 
 pointed out a few reasons why IMHO it's better, and cleaner, not to.

 
 
 Where are those reasons? I didn't see them.
 The only reasons were:
 1- so you can use "int" as a variable name or something else.
 2- easier to implement in a compiler.

 


Regarding 2, the only reason you gave is the pre-existing code. Look
at the trouble Bill Gates got us all into with that thinking in the
early 80's. Do you really think the current D compiler will be the only 
one ever written?

  Also, you keep missing, or dismissing,

   3 - more consistent, useful error checking/error messages, by 
eliminating replication of semantic checking in the parser.

 
 The fact that C does the same thing does not qualify as a reason, since
 it's a stated goal of D to eliminate the very reason C needs to do that.
 So, having gone to the trouble to eliminate the need for keywords... why
 are they still there???

 
 
 Where does the documentation state that D's goal is to eliminate C's 
 need for keywords?
 

Not quite that. The stated goal is to eliminate the need, which exists
in C, for the parser to know which identifiers are previously defined
as typedefs (or classes in C++), since C cannot be parsed otherwise.


This gives D a 'context-free grammar': you don't need to feed
information back to the parser from the symbol table.
C defines 'int' etc as keywords for the same purpose, they must
be distinguished (to the parser) from regular identifiers.
(also, because C has idioms like 'unsigned char' which do not apply to 
typedefs, and have likewise been eliminated in D). So, making D's 
grammar context-free has, as a direct result, eliminated the need
for type names to be keywords.

-----------------

http://www.digitalmars.com/d/index.html

Major Goals of D
  ...
     * Make D substantially easier to implement a compiler for than C++.
  ...
     * Have a context-free grammar.
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-----------------

When I first encountered D, after reading the article on printf in 
Jan/05 Dr. Dobbs, I read the 'context-free grammar' part in the goals, 
and my first thought was "Great!" and my second was "... and so built-in 
types aren't keywords any more..." but that turned out to be untrue, for 
reasons no-one has been able to supply.

I don't think I've encountered any other well-thought-out language (and 
D clearly is one) which defines a whole bunch of keywords which are not 
actually necessary to the parsing process.


Thank you for helping me clarify my argument.


BTW, I feel like I'm telling someone, "It's summer, you don't need to 
wear a snowsuit any more, you'll be more comfortable without it", and I 
keep getting back "you haven't really given me a strong enough reason to 
not wear it; the grocery store is a bit chilly, for instance; and I'm 
already wearing it and I know it fits..."

- greg

Jul 19 2005

Derek Parnell <derek psych.ward> writes:

On Tue, 19 Jul 2005 15:45:12 -0400, Greg Smith wrote:

[snip] 
 I don't think I've encountered any other well-thought-out language (and 
 D clearly is one) which defines a whole bunch of keywords which are not 
 actually necessary to the parsing process.

Greg, you have written a whole lot of sensible things here. You have my
support for what it's worth.

-- 
Derek Parnell
Melbourne, Australia
20/07/2005 8:02:37 AM

Jul 19 2005

Hasan Aljudy <hasan.aljudy gmail.com> writes:

Ah well, I guess I did help you make your point.

You're talking about compiler implementation, I have nothing to do with 
that.

My only concern is, well, I don't wanna be reading some code and keep 
wondering to myself whether this "int" here is the real "int" or some 
variable name or class name defined by the user.

If it's merely a matter of compiler implementation then I don't care, as 
it's clearly not my business.

However, if what you are proposing would allow people to say "int" when 
they don't really mean the "int" that we currently know, then I have a 
problem with that.



Jul 19 2005

"Unknown W. Brackets" <unknown simplemachines.org> writes:

I don't agree; the reason I don't is completely for the thing you first 
speak of:

Parsing it.

Wait, you say: but, we already made it clear that you don't need to know 
type names to parse it (in D.)  Quite astute, yes... but, consider the 
following code:

int dumb()
{
    static int i = 0;

    return 42 + i++;
}

Now, let's say that int isn't a keyword.  Let's even say, for the sake 
of argument, I can name a function int as well:

int int()
{
    return 42;
}

Now, again, this should be parseable because we know some things here... 
but, most basic highlighters will not do this correctly.

In fact, I would argue that the fact we can do this:

int bool()
{
    return 1;
}

(which we can...) is the problem here.  Not only is that confusing, but 
the only editor I can think of offhand which has highlighting powerful 
enough to handle that is Microsoft Visual Studio - and even then, it 
would be *fun* to make it tell the difference, based on the way it works.

I will agree that if D is going to make every text editor/IDE/etc. 
developer's heads ache, it should do it from the start... otherwise, not 
at all (not for types!)

As it is, in languages like PHP... int can't really be highlighted by 
most highlighters - because you can use it as a type *AND* as a function 
name.  This is horrible, in my opinion, and detrimental - although its 
negative effects are limited, in this case, because variables have a $ 
prefix.

-[Unknown]



Jul 19 2005

Greg Smith <greg siliconoptix.com> writes:

Unknown W. Brackets wrote:

 
 Now, let's say that int isn't a keyword.  Let's even say, for the sake 
 of argument, I can name a function int as well:
 
 int int()
 {
    return 42;
 }
 
 Now, again, this should be parseable because we know some things here...
 but, most basic highlighters will not do this correctly.

Firstly, 'int' etc can be protected from redefinition (on a 
scope-selective basis, if needed) without being a keyword, see my 
temporally preceding post for some reasons why this is an advantage over 
just using keywords.

Secondly, even if 'int' can be redefined at file scope,
I don't think it's a problem that 'int' would always appear colored as
a builtin-type in your editor. This would probably be preferred, since 
it would let you know you are doing something dubious.

If you wanted to go further, and make an editor which parses the whole 
file so that local variables, e.g. are displayed in a different color 
than globals, and references to undefined variables are displayed in 
red, and you can hover over any variable and see its type...  that's 
much, much easier for D  than for C. The same process would allow 'int' 
to be highlighted properly (and/or let you know specifically that you 
were redefining a built-in type).

I have experience in the user side of this kind of issue, since I do a 
lot of Python coding. In python you can write

def add_dot(str):
	return str + '.'

... and it works, but it's poor practice, since 'str' is a predefined
(__builtin__) name which corresponds to the string type. You only get
in trouble when you modify it and fail to notice the conflict:

def add_dot_num(str,num):
	return str + '.' + str(num)

The second 'str' refers to the local parameter rather than the builtin
'str' which converts 2 to '2'. However, this doesn't cause anywhere near
as much trouble as you might think:

    - many editors color 'str' differently, since it's in __builtin__; 
this makes it harder to redefine it by mistake.
    - automatic code checkers can easily determine that this code is 
redefining 'str' and warn you;
    - new builtins added to the language (and they are often added) do 
not break any code that happens to already use the same name as a 
variable. The resulting 'dubious usage' can be 'fixed' at your leisure.
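
For what it's worth, a checker of the kind mentioned in the second point fits in a few lines. The sketch below uses the standard `ast` and `builtins` modules; `find_shadowed_builtins` is a name invented here, and it only looks at plain parameters and simple assignments (it ignores *args, **kwargs, imports and nested definitions used as names), so treat it as a toy rather than a real lint tool.

```python
import ast
import builtins


def find_shadowed_builtins(source):
    """Return sorted (line, name) pairs where a builtin name is rebound."""
    builtin_names = set(dir(builtins))
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Parameters that reuse a builtin name, e.g. def add_dot(str).
            params = (node.args.posonlyargs + node.args.args
                      + node.args.kwonlyargs)
            for param in params:
                if param.arg in builtin_names:
                    findings.append((param.lineno, param.arg))
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            # Assignments that rebind a builtin name, e.g. list = [1, 2].
            if node.id in builtin_names:
                findings.append((node.lineno, node.id))
    return sorted(findings)
```

Because the builtin names are just entries in a module, the checker needs no knowledge of the grammar beyond what the parser already hands it.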

Jul 19 2005

Charles Hixson <charleshixsn earthlink.net> writes:

Greg Smith wrote:
...

D doesn't have all the syntax that some languages (I'm thinking 
of Ada here) have which would allow you to specify how many bits 
a particular type should have, what value range it should allow, 
etc.  As a result all of the basic space allocating words need to 
be keywords.

A type basically means:
1) reserve this space.
2) define these operations over this space

Things get a bit more complex when we start thinking about where 
the space is allocated, how it interacts with other types, and 
how we pass it as a parameter, but those are the basics.

D has a simple (relatively simple) syntax.  As a result, it needs 
a large number of keywords.

Jul 19 2005

Greg Smith <greg siliconoptix.com> writes:

Charles Hixson wrote:
 Greg Smith wrote:
 
 D doesn't have all the syntax that some languages (I'm thinking of Ada 
 here) have which would allow you to specify how many bits a particular 
 type should have, what value range it should allow, etc.  As a result 
 all of the basic space allocating words need to be keywords.
 
 A type basically means:
 1) reserve this space.
 2) define these operations over this space
 
 Things get a bit more complex when we start thinking about where the 
 space is allocated, how it interacts with other types, and how we pass 
 it as a parameter, but those are the basics.

These are semantic issues which have absolutely nothing to do with 
whether the type names are keywords.
 
 D has a simple (relatively simple) syntax.  As a result, it needs a 
 large number of keywords.

This makes no sense at all; you must be using an unusual definition of 
'keyword'. Languages with simpler syntax generally need
fewer keywords (example: python; extreme example: lisp). Furthermore,
the grammar of D would be simpler still if the type names became 
built-in identifiers (I mean the grammar in the compiler; the one known 
to the user would be effectively unchanged). Ada has a lot of keywords 
too; but the type names are not among them, since (as in pascal and D) 
they don't need to be. C is actually an anomaly in this sense, and C++ 
inherited the anomaly. D has inherited the practice[*], while 
specifically shaking off the necessity; this is what puzzles me.

A keyword is a specific combination of letters (e.g. 'if', 'goto') which 
is recognized by the lexical scanner as having a distinct significance 
no matter where it appears in the token sequence, despite the fact that 
it follows the general rule defining how an 'identifier' is formed.

Keywords are assigned significance before their situation relative to 
the other tokens is analyzed (i.e. prior to the parser), whereas other
identifiers are assigned meaning after the parsing process. When meaning
is assigned later, it is possible to apply sophisticated rules to the 
process (e.g. 'mtype' might be a function name at global scope, but also
defined as an alias type inside a function, and at the same time be the
name of members of several structs).
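
A minimal sketch of that split, written in Python rather than D. The keyword set, the token pattern and the names `lex` and `Scope` are all assumptions made up for illustration; none of this is taken from any real D compiler. The lexer decides what is a keyword without looking at neighbouring tokens, while the meaning of an identifier is looked up later, scope by scope.

```python
import re

# Hypothetical keyword set for illustration; this is not D's actual list.
KEYWORDS = {"if", "else", "while", "goto", "return"}

# Identifiers, then integer literals, then any single non-blank character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def lex(source):
    """Classify each token on its own, without looking at its neighbours."""
    tokens = []
    for text in TOKEN_RE.findall(source):
        if text in KEYWORDS:
            kind = "keyword"      # decided here, before any parsing happens
        elif text[0].isalpha() or text[0] == "_":
            kind = "identifier"   # its meaning is decided later, by scope
        elif text.isdigit():
            kind = "number"
        else:
            kind = "punct"
        tokens.append((kind, text))
    return tokens


class Scope:
    """A symbol table with a link to the enclosing scope."""

    def __init__(self, parent=None):
        self.parent = parent
        self.symbols = {}

    def declare(self, name, meaning):
        self.symbols[name] = meaning

    def lookup(self, name):
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        return None


# 'mtype' is a function at global scope but an alias type inside a function.
global_scope = Scope()
global_scope.declare("mtype", "function")
function_scope = Scope(global_scope)
function_scope.declare("mtype", "alias type")
```

The same spelling 'mtype' gets two different answers depending on which scope asks, which is exactly what a keyword can never do.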

[None of this is immutable law of language design, it's just the way
modern languages are designed and parsed, and the terminology which is 
used. C and C++, in fact, require bending of these rules, which is 
generally viewed as a problem: Once a typedef is defined, references to 
it must be identified as such *prior* to the parser; since this requires 
scopes to be considered, and scopes are defined by the parsing process, 
this can be tricky].
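
The usual illustration of the typedef problem is the C statement `a * b;`. A rough Python sketch follows; the function `classify` and its `typedef_names` argument are hypothetical, and a real C front end has to keep that set in step with scopes while it parses, which is the tricky part.

```python
def classify(statement, typedef_names):
    """Classify a C statement of the form 'X * Y;'.

    typedef_names is the set of names currently known to be typedefs.
    """
    first = statement.split()[0]
    if first in typedef_names:
        return "declaration"   # Y is declared as a pointer to X
    return "expression"        # X multiplied by Y, result discarded
```

With `a` in the set, `classify("a * b;", {"a"})` gives "declaration"; with an empty set the same text is an "expression". The token sequence alone cannot settle it.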

When a word is a keyword, it's a keyword everywhere. So why define
keywords at all? The conventional language-design practice is that you 
define keywords as needed to make the language parseable. The 'if' 
keyword tells the parser to expect the structure of an 'if' statement.
An example of the opposite approach is FORTRAN, which was designed well 
before formal grammars had found their way into computer programming. In
FORTRAN there are no keywords, and spaces have no significance. As a 
result, FORTRAN is quite difficult to parse, even though the process 
needs to be done only on one line at a time. Consider:

100 FORMAT(I2,I3)
100 FORMAT(I2,I3)=0
     DO 100 I=1,20
     DO 100 I=1.20

The first is a 'format' statement for output formatting, and the second 
is an assignment to an element of a 2d array called 'FORMAT'.  The third
is a do loop, and the fourth is an assignment to 'DO100I'. In order to 
distinguish these, a FORTRAN compiler basically has to dither back and 
forth over the entire line, trying to figure out what the heck the thing 
is. The analogous behaviour in a language like D, which is not split 
into lines, would be to dither over the entire source file, making 
guesses about what things are and checking if those guesses still work 
when inner levels are analyzed. Ugh.
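
To make the dithering concrete, here is a small Python sketch that tells the two DO lines apart. The function name `classify_do` is made up, and the rule it uses (a DO loop needs a comma outside parentheses after the '=') only covers these two shapes; it ignores character constants, continuation lines and everything else a real compiler must handle. Note that the decision cannot be made until the whole line has been scanned, and that with blanks removed the assignment target reads DO100I.

```python
import re


def classify_do(line):
    """Classify a fixed-form FORTRAN statement that starts with 'DO'.

    Blanks are not significant, so they are removed first. The statement is
    treated as a DO loop only if a comma appears outside parentheses after
    the '='; otherwise everything left of the '=' is one variable name.
    """
    text = line.replace(" ", "").upper()
    match = re.match(r"DO(\d+)([A-Z][A-Z0-9]*)=(.*)$", text)
    if match:
        depth = 0
        for ch in match.group(3):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == "," and depth == 0:
                return ("do-loop", match.group(1), match.group(2))
    target = text.split("=", 1)[0]
    return ("assignment", target)
```

`classify_do("     DO 100 I=1,20")` returns `("do-loop", "100", "I")`, while `classify_do("     DO 100 I=1.20")` returns `("assignment", "DO100I")`.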

By having strategically positioned keywords, you can parse powerful 
grammars, with complex nested structure, in a more-or-less left-to-right 
fashion. Whenever you see 'if' sitting there, what follows is either an 
if statement or invalid input; you don't need to go find the other end 
of it to see if it might be something else. This design is very clear in 
Pascal, where every definition of anything starts with a keyword 
indicating exactly what you are defining: procedure or function, or 
variable, constant, or type; and that in turn tells the parser what to 
expect next [Ada too, I think]. In D, the parser needs to work a little 
harder to figure things out, but you have less clutter.
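
A sketch of that left-to-right choice, in Python. The table is loosely modelled on Pascal's definition keywords, but the rule names and the function `predict_rule` are assumptions for illustration only. The point is that the first token alone picks the rule; nothing further along the input is consulted.

```python
# Hypothetical rule names, loosely modelled on Pascal's definition sections.
RULE_FOR_KEYWORD = {
    "procedure": "procedure definition",
    "function": "function definition",
    "var": "variable definitions",
    "const": "constant definitions",
    "type": "type definitions",
}


def predict_rule(tokens):
    """Choose a grammar rule by looking at the first token only."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    rule = RULE_FOR_KEYWORD.get(tokens[0])
    if rule is None:
        raise SyntaxError(f"expected a definition keyword, got {tokens[0]!r}")
    return rule
```

Anything that does not start with one of the listed keywords is rejected on the spot, which is the "either an if statement or invalid input" behaviour in miniature.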

So the question is, why define a bunch of keywords which are not only 
unnecessary to the parsing process, but actually complicate the grammar? 
Anywhere in D where I can use 'int', I can also use 'myint', which is an 
identifier that I have aliased to 'int'. So the parser needs to 
understand every possible such construct where the type name is an 
identifier, and it needs additional rules to understand them when they 
are keywords. As I've mentioned previously, this doesn't just lead to a 
more complex parser, it also leads to inferior diagnostic messages.

[*] I've been reading the manual a bit more, and I've found that D 
already has a built-in type implemented as a predefined identifier: 
Object. So why are all the other ones keywords?

Jul 20 2005

"Ben Hinkle" <ben.hinkle gmail.com> writes:

 Ada has a lot of keywords too, but the type names are not among them, 
 since (as in Pascal and D) they don't need to be. C is actually an anomaly 
 in this sense, and C++ inherited the anomaly. D has inherited the 
 practice[*] while specifically shaking off the necessity; this is what 
 puzzles me.


Java and C# retain the basic types as keywords, though I don't know if they 
need to or not. It could be that they remain keywords to be more compatible 
with C/C++ tools - though that is just a guess. For example I'm not sure if 
the emacs mode and syntax highlighter would color 'int' correctly if it 
wasn't on the keyword list.

Jul 20 2005

Charles Hixson <charleshixsn earthlink.net> writes:

Greg Smith wrote:
 This makes no sense at all; you must be using an unusual definition of 
 'keyword'.

Perhaps I am using an unusual definition.  E.g., I consider all 
of the words built into Forth to be keywords.  Note that you can, 
at your own risk, override any of them.  Forth has almost no 
syntax; it's all subsumed into the definitions of the words.  I 
consider a keyword to be anything whose meaning the compiler (or 
interpreter) already knows.  Examples from D include not only 
things like int and uint, but also import, struct, etc.  With 
more syntax you need fewer keywords.  Perhaps Snobol is an 
example here.  (I don't really remember it clearly, but my 
impression was that it has LOTS of syntax, and few keywords.)

Note that this "Syntax" isn't a unified thing.  Ada has lots of 
syntax around storage allocation, but relatively few keywords, 
even though it allows you to specify such things as "This type 
denotes things that take up 37 bits and are floating point 
numbers with 3 digits of precision."  Just imagine the amount of 
work it would take to create such a type in D.  (Well, also 
imagine just how often it would be needed.)  D has chosen to 
PREDEFINE several "types" as keywords.  The other types are 
created by combining the primitive types.  One could argue, 
perhaps, that complex is a redundant type...but it can be very 
convenient.

For that matter, I occasionally wish that D had a bit more syntax 
around building types.  I'd like to be able to define a string 
class that has string literals.  (Others have uttered similar 
wishes, with perhaps a different idea of precisely what a string 
class would look like.)

What did you mean by keyword?

Jul 20 2005

Greg Smith <greg siliconoptix.com> writes:

Charles Hixson wrote:

 Perhaps I am using an unusual definition.  E.g., I consider all of the 
 words built into Forth to be keywords.  Note that you can, at your own 
 risk, override any of them.  Forth has almost no syntax; it's all 
 subsumed into the definitions of the words.  I consider a keyword to be 
 anything whose meaning the compiler (or interpreter) already knows.  Examples 
 from D include not only things like int and uint, but also import, 
 struct, etc.  With more syntax you need fewer keywords.  Perhaps Snobol 
 is an example here.  (I don't really remember it clearly, but my 
 impression was that it has LOTS of syntax, and few keywords.)
 
 Note that this "Syntax" isn't a unified thing.  Ada has lots of syntax 
 around storage allocation, but relatively few keywords, even though it 
 allows you to specify such things as "This type denotes things that take 
 up 37 bits and are floating point numbers with 3 digits of precision."  
 Just imagine the amount of work it would take to create such a type in 
 D.  (Well, also imagine just how often it would be needed.)  D has 
 chosen to PREDEFINE several "types" as keywords.  The other types are 
 created by combining the primitive types.  One could argue, perhaps, 
 that complex is a redundant type...but it can be very convenient.
 

I think you are making this much more complicated than it is. I'm 
proposing changing 'wchar' etc. from a keyword to a predefined 
identifier. It's still predefined, so this doesn't affect anything 
you've discussed in the previous paragraph. All existing D code would be 
unaffected.
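
Roughly what "predefined identifier" means in practice, as a Python sketch. The class `SymbolTable`, its `protect_builtins` flag and the tuple shapes are all made up for illustration; this is not how any actual D compiler is organised. The built-in type names are entered into the outermost scope at startup and are found by the same lookup as a user alias, and they can optionally be protected from redefinition.

```python
class SymbolTable:
    """Scoped symbol table whose outermost scope is preloaded with built-ins."""

    def __init__(self, builtins=(), protect_builtins=True):
        self.protected = set(builtins) if protect_builtins else set()
        # Built-in type names are ordinary entries in the outermost scope.
        self.scopes = [{name: ("builtin type", name) for name in builtins}]

    def push_scope(self):
        self.scopes.append({})

    def pop_scope(self):
        self.scopes.pop()

    def declare(self, name, meaning):
        if name in self.protected:
            raise ValueError(f"cannot redefine built-in type '{name}'")
        self.scopes[-1][name] = meaning

    def lookup(self, name):
        # Innermost scope first, so local names shadow outer ones.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise KeyError(name)


table = SymbolTable(builtins=("int", "uint", "wchar"))
table.declare("myint", ("alias", "int"))
```

Here `table.lookup("int")` and `table.lookup("myint")` go through exactly the same code path, so the grammar needs only the identifier case.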

 For that matter, I occasionally wish that D had a bit more syntax around 
 building types.  I'd like to be able to define a string class that has 
 string literals.  (Others have uttered similar wishes, with perhaps a 
 different idea of precisely what a string class would look like.)
 
 What did you mean by keyword?
 

I think I made that pretty clear in my last post, quoted above, but I'll 
try again.
Conventional terminology is that a keyword is a sequence of letters 
taken away from the allowed set of identifiers, or names, and 
effectively used as a nicely readable punctuation mark.  You cannot 
redefine a keyword in any context, since it's recognized as a keyword 
before its context is considered.

By contrast, you can have identifiers which are reserved in specific 
contexts without being keywords. For instance, in C++, it would be 
possible to remove 'this' from the keyword list, so that it could be 
used in other contexts, such as a parameter in a non-member function,
or a struct member name. In member functions, 'this' would be an 
implicitly declared parameter.
I'm not suggesting this is a good idea; my point is that the language 
would be no harder to parse, since 'this' is syntactically allowed only
in places where you can use an identifier; and changing 'this' to a 
local variable name doesn't change the meaning of any construct to the 
point where a different parse would be desirable. Such a change would 
not break any existing code, but it would allow code which is currently 
illegal C++ (including some legal C code).

By this definition, Forth (like PostScript) has no keywords at all, and 
(also like PostScript) virtually no grammar, thus no need for keywords. 
[So it's quite possible that the term 'keyword' could take on a 
different meaning in various discussions of Forth...]

Side note: anybody remember 'Small-C'? This was a sort-of-C compiler for 
the 8080 which, by negligence rather than intent, let you freely redefine 
most keywords, since it didn't really have a lexer separate from the 
parser. The parser had things like this:

     /* expect a statement */
     if( next_token("{") ) { /* compound statement */
     ...
     }else if ( next_word_is("while") ){
             /* it's a while statement */
             expect("(");
             ...

So, you could define 'int while' as a variable, and reference it, etc., 
but any statement starting with 'while' (e.g. while=0;) would be 
disallowed because the code above would detect it.  I discovered this 
after using 'switch' as a global variable. A very 'interesting' compiler 
in many ways.
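
For anyone who wants to play with the effect, here is a toy reconstruction in Python. It is not the Small-C source: only the helper names `next_token` and `next_word_is` come from the fragment above, and the class `TinyParser`, `read_name` and `parse_statement` are invented for the sketch. Because names are read straight from the text and never checked against a keyword list, a declaration accepts 'while', but a statement that starts with 'while' is always taken to be a loop.

```python
class TinyParser:
    """Scannerless parser fragment: it matches words directly in the text."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def skip_blanks(self):
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1

    def _is_name_char(self, index):
        return index < len(self.text) and (
            self.text[index].isalnum() or self.text[index] == "_"
        )

    def next_token(self, token):
        """Consume a punctuation token if it comes next."""
        self.skip_blanks()
        if self.text.startswith(token, self.pos):
            self.pos += len(token)
            return True
        return False

    def next_word_is(self, word):
        """Consume a whole word if it comes next."""
        self.skip_blanks()
        end = self.pos + len(word)
        if self.text.startswith(word, self.pos) and not self._is_name_char(end):
            self.pos = end
            return True
        return False

    def read_name(self):
        """Read any run of name characters; no keyword list is consulted."""
        self.skip_blanks()
        start = self.pos
        while self._is_name_char(self.pos):
            self.pos += 1
        return self.text[start:self.pos]


def parse_statement(text):
    parser = TinyParser(text)
    if parser.next_word_is("int"):
        return ("declaration", parser.read_name())  # 'while' is accepted here
    if parser.next_word_is("while"):
        if not parser.next_token("("):
            return ("error", "expected '(' after 'while'")
        return ("while statement",)
    return ("expression statement", parser.read_name())
```

`parse_statement("int while;")` is accepted as a declaration of 'while', whereas `parse_statement("while=0;")` is rejected because the statement code commits to a loop as soon as it sees the word.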

Jul 21 2005
