digitalmars.D - A solution to the keyword problem

Greg Smith <greg siliconoptix.com> writes:
What problem? IMHO, there are too many keywords, and more to the point, 
more than 20 of them simply don't need to be keywords.
A couple of weeks ago I made the suggestion that all the built-in types
(int, double, wchar etc) should be predefined identifiers, like
'Object'; likewise, 'true' and 'false' do not need to be keywords. 
There were some responses suggesting that there was no need for such a 
change.

I just believe at a gut level that it's a bad idea to put things in the 
keyword table when they can be done as predefined identifiers. I don't 
know of any other well-thought-out language which does this. But, that's 
not a very strong argument. So, I've been looking through the spec, and 
have found some specific things which support a reduction of the number 
of keywords.  Also, I have a simple suggestion which will allow a 
workaround for the issues caused by the keywords.

So, weakest first:

(1) Version identifiers are in a separate namespace. If I write some 
code with version(remote){...}, and later on, 'remote' becomes a 
keyword, then not only will I have to change all the code, I'll need to 
change the build scripts, too.

(2) The import statement, and module names, are tied to the file system.
I might create modules called 'supercomm.local' and 'supercomm.remote'; 
if 'local' or 'remote' are added to the language as keywords, I have
to rename the directories to fix it. If this is 3rd-party code supplied 
as a library, I'm skunked.

(3) D can interface to C code, unless the C code uses identifiers which 
are D keywords. At particular risk of being C function or variable 
names: align, bit, byte, delegate, export, final, interface, version.

Not all of these problems can be really solved by eliminating the 
roughly 20% of keywords which stand in for predefined types or values, 
of course; you just reduce the risk of a collision.
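
A minimal sketch of point (1), written in present-day D. The function name `transportName` is made up for the illustration, and `remote` is simply the example identifier used above. The identifier has to be spelled both in the source and on the compiler command line (for DMD, `dmd -version=remote app.d`), which is why a rename would reach into the build scripts as well.

```d
import std.stdio;

// Returns a label for the transport selected at compile time.
// `remote` is a user-chosen version identifier; it is switched on
// from the command line, e.g. `dmd -version=remote app.d`.
string transportName()
{
    version (remote)
        return "remote";
    else
        return "local";
}

void main()
{
    writeln("built for: ", transportName());
}
```

Built without the flag, the `else` branch is compiled in; built with it, the first branch is.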

So the suggestion is this: provide a lexical convention for quoting 
identifiers, to prevent them from being recognized as keywords:

import supercomm.'local';  /* ... assuming 'local' became a keyword */

   /*
    * access to C function called 'final', which is a D keyword
    */
extern (C) int 'final'( struct foodle *p );


The proposed extension to the D lexer is that an 'identifier' of at 
least two characters can be enclosed in single quotes; this will not 
change its meaning as an identifier but will prevent it from being 
recognized as a keyword. VHDL has this feature; it's very important to 
VHDL since it's often necessary to give certain names to ports in VHDL 
(even if those names are VHDL keywords), to interface to external 
things. You can even have names starting with digits in VHDL, but they 
must be quoted. The VHDL quoting convention is to use slashes:

     variable /7up/: integer;  -- variable name starts with digit
     variable /if/: integer;   -- variable name same as keyword
...
     if /if/ > 0 then
        /7up/ := /7up/ + 3;
     end if;

The problem with slashes is you need to properly handle things like 
"a=b/c/d;" I suspect, in VHDL, that the set of tokens which can legally 
precede an identifier is disjoint from the set which can legally precede 
the '/' operator, so the lexer can just look at the previous token. 
However, in C/C++/D, ')', for one, can legally precede either of these.

Another possibility is to put a single letter in front of the string 
quote; this is consistent with r"foo\bar" and x"0D0A", i.e.

import supercomm.w"local";

     extern (C) int w"final"( struct foodle *p );

Personally I prefer the single-quote approach, but I don't think it 
matters too much, as long as there's a way to do it. Another 
possibility: <final>, but that presumes that expressions like a<b>c are 
useless enough that you don't mind encumbering them with the need for 
spaces.
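
The single-quote rule proposed above can be sketched as a toy classifier in present-day D. This is an illustration under stated assumptions, not a description of how the DMD lexer is organized: the keyword table is a tiny sample, the names `Kind`, `Token`, and `classify` are made up for the example, and treating a one-character quoted form as a character literal is just one reading of the "at least two characters" requirement.

```d
import std.ascii : isAlpha, isAlphaNum;
import std.stdio;

enum Kind { keyword, identifier, charLiteral, other }

struct Token
{
    Kind kind;
    string text;
}

// Tiny sample of the keyword table; the real list is much longer.
immutable string[] keywords = ["final", "if", "int", "version", "while"];

bool isKeyword(string word)
{
    foreach (k; keywords)
        if (k == word)
            return true;
    return false;
}

// True if `s` has the shape [A-Za-z_][A-Za-z0-9_]*
bool isIdentifierShape(string s)
{
    if (s.length == 0 || !(isAlpha(s[0]) || s[0] == '_'))
        return false;
    foreach (char c; s[1 .. $])
        if (!(isAlphaNum(c) || c == '_'))
            return false;
    return true;
}

// Classifies one already-delimited lexeme (hypothetical helper).
Token classify(string lexeme)
{
    // Quoted form: 'name'
    if (lexeme.length >= 3 && lexeme[0] == '\'' && lexeme[$ - 1] == '\'')
    {
        string inner = lexeme[1 .. $ - 1];
        if (inner.length >= 2 && isIdentifierShape(inner))
            return Token(Kind.identifier, inner); // keyword lookup is skipped
        return Token(Kind.charLiteral, inner);
    }
    if (isIdentifierShape(lexeme))
        return Token(isKeyword(lexeme) ? Kind.keyword : Kind.identifier, lexeme);
    return Token(Kind.other, lexeme);
}

void main()
{
    foreach (lexeme; ["final", "'final'", "foodle", "'foodle'", "'x'"])
    {
        Token t = classify(lexeme);
        writeln(lexeme, " -> ", t.kind, " ", t.text);
    }
}
```

In this sketch the quoted and unquoted spellings of a non-keyword produce the same token, so quoting only changes the outcome when the name is in the keyword table.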

Scripts which automatically translate C headers to D could detect the 
reserved D words and automatically escape them; or even escape any C 
identifier which is in danger of becoming a D keyword, since there's no 
harm in escaping something that isn't actually a keyword.
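
As a rough sketch of what such a translation script might do, here is a small helper in present-day D. Everything in it is hypothetical: the function name `escapeIdentifier`, the sample word list (taken from point (3) above, so it reflects the 2005 keyword set), and above all the single-quote escape itself, which is the syntax proposed in this post and not something a D compiler accepts.

```d
import std.stdio;

// Sample subset of reserved words; a real tool would carry the full list.
immutable string[] reservedWords =
    ["align", "bit", "byte", "delegate", "export", "final", "interface", "version"];

// Wraps `name` in the proposed single-quote escape when it collides
// with a reserved word; other names pass through unchanged.
string escapeIdentifier(string name)
{
    foreach (word; reservedWords)
        if (word == name)
            return "'" ~ name ~ "'";
    return name;
}

void main()
{
    // A C prototype `int final(struct foodle *p);` would be emitted
    // with the escape, an ordinary name without it.
    writeln("extern (C) int ", escapeIdentifier("final"), "(foodle* p);");
    writeln("extern (C) int ", escapeIdentifier("frobnicate"), "(foodle* p);");
}
```

A more cautious variant could escape every translated identifier unconditionally, which matches the observation that escaping a non-keyword does no harm.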

Another example of a separate namespace which is unnecessarily harmed by 
defining built-in types as keywords: goto labels. If I have some code 
where I do 'goto pchar', and pchar later becomes a new type and a new 
keyword, this code will need to be changed. If 'pchar' is added as a new 
pre-defined identifier, no harm will be done.

A while ago I was trying to build some old C++ code; a class had been 
defined with member functions called 'and', 'or' and 'xor'. These have 
been, fairly recently, added to the C++ language as keywords, presumably 
for folks who don't have '^', '|' and '&'. (There are some more, to cover 
all the operators with these characters in them.) I got baffling error 
messages and ended up looking at the preprocessor output to see if there 
was something screwy in there; I eventually found the problem and 
created a header file like this:

#define xor ident_xor
#define and ident_and
...


But D doesn't have #define. Also, if the code I was working with had 
been in the form of a 3rd party object module, it would have been 
impossible to link to it without getting the compiler to recognize 
'xor' etc. as identifiers.
(It turns out there's a gcc compiler option, -fno-operator-names, to 
suppress recognition of these silly keywords, BTW.)

So to summarize the suggestion:

   - keywords can get you into trouble, especially when you are 
interfacing with other environments. D interfaces with C, and the file 
system (module names), in this manner.

   - the more keywords you have, the more likely you are to get into 
this trouble. Keywords not necessary for parsing should be removed and 
implemented as predefined identifiers, like 'Object' is already. If you 
don't want it to be possible to redefine them in certain contexts, then 
implement that as needed (there is no need to prohibit 'wchar' from 
being used as a goto label, for instance, which is currently 
impossible). If builtin types were identifiers, I could 
"import mmedia.real.realaudio;" without redefining 'real' in any 
harmful way.

  - a lexical convention should be added to allow the construction of 
identifiers whose patterns are otherwise reserved as keywords; as a 
workaround when you run into a problem.


One more thing: if the lexical convention is adopted, then it makes a 
big difference whether or not the built-in types become identifiers. 
Because, for instance 'Object' and unquoted Object would mean the same 
thing; likewise 'creal' and unquoted creal would mean the same thing if 
creal were a predefined identifier; if creal were a keyword, 'creal' and 
creal would of course be completely independent.

-- Greg
Jul 25 2005
Charles Hixson <charleshixsn earthlink.net> writes:
I could see doing this if it were a part of a more wide-reaching change
which would turn the built-in types into Objects (well, magic objects,
that were literally present rather than pointed to...same internal
representation, but different language mapping), resulting in such
features as, say, 10.mul(7) being the equivalent of 10 * 7, and allowing
one to create classes (NON-magic classes) descending from them.

This would also need to somehow harmonize array so that it could be a
normal class, probably by allowing things like optAssign and
optSliceAssign (do I remember those names correctly...I always need to
look them up) to be redefined and used on classes that AREN'T strictly
arrays as we now know them. The trick would be doing all that without
paying an unacceptable penalty in terms of either ambiguity or
performance. (Or development time.)

In other words...I don't see the utility of what you are proposing as an
isolated change, and including it where it appears to fit won't happen
until AFTER version 1.0 is out. (Probably not then, as I don't think
that Walter likes the "Everything is an object" school of language
design. I may like it, but I consider it a lot less important than I do
many other features.)
Jul 25 2005
AJG <AJG_member pathlink.com> writes:
In article <dc40t4$2o2k$1 digitaldaemon.com>, Charles Hixson says...
I second this thought. Change alone: not worth it. Change with added benefit of complete type homogeneity: very much worth it. --AJG.
Jul 25 2005
Greg Smith <greg siliconoptix.com> writes:
I can't speak to what you are suggesting; this is a 'type/class
unification', and I just don't know enough yet about D class semantics
to comment. A similar change was made to Python in versions 2.2 and 2.3,
and it was a great enhancement. It also took a lot of planning and
implementing in order to work well without breaking too much existing
code.

However, I don't see how this major change is connected at all to the
minor change I am suggesting. Currently, the built-in type 'int' is
indicated by a keyword 'int'; I am suggesting that the keyword (and all
the other 'type' keywords) be removed, and pre-defined identifiers be
implemented instead. This would have *zero* effect on existing D code.
This change could be done without changing the semantics of the types.
It would also be perfectly possible and reasonable to change the
semantics of the types as you suggest, while leaving the 20 keywords in
place. This would not, however, address the issues I raised in my post.
I would appreciate it if someone could address these specifically rather
than just saying, "I don't see the utility". I could be utterly wrong
about the problems with the C interface, for instance, and I'd
appreciate knowing why.

I can see how my mention of 'Object' could cause confusion, since
'Object' can be used as a base class, and 'int' can't; and 'Object' is a
pre-defined identifier and 'int' isn't. But they're both predefined
types. I'm *not* suggesting that int, real, etc. be changed into
predefined identifiers so that they can be used as base classes; I'm
suggesting that it be done because:

(a) There's absolutely no reason whatsoever, none, that anyone has
given, or that I can think of, that they should be keywords as opposed
to predefined identifiers, except for "because it's already done that
way". If there is a reason it was done that way, I'd be interested in
hearing it.

(b) It would not break any code whatsoever. There is no effect on the
semantics of any currently legal D code, since the new identifier 'int'
has the same semantics as the old keyword 'int'. The only downside is
the effort to change the compiler.

(c) There are at least a few good reasons why types should *not* be
keywords: it would make more identifiers available for unrelated
namespaces such as goto labels and module (directory path) names. It
would provide a way to add new built-in types in a uniform manner,
without breaking old code. Certain error messages are much clearer with
this change. Refer to previous posts.

(d) "Because that's the way it's already done", to my mind, should not
apply to 'tabula rasa' designs which are in such a preliminary stage,
especially for relatively minor changes to the compiler.

I'm really guessing at how big a change it is; it's certainly not
trivial. But the fact that the language already has a predefined
identifier for the 'Object' type suggests to me that this is mostly a
matter of deleting a bunch of code in the parser and lexer, and adding
some new code, near where 'Object' is defined, to predefine the new
identifiers. Since you can define 'alias int myint;' and then use the
identifier 'myint' in place of keyword 'int' everywhere, it is clear
that the parser can already handle identifiers for ints, so it's largely
a matter of deleting the grammar rules which recognize the keyword
tokens.

And again, the suggestion for lexically escaping identifiers is almost
completely independent of all of this stuff, and is, I feel, a decent
workaround for problems caused by keywords clashing with external names.

-- Greg
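
The `alias` observation can be tried directly. The sketch below uses the present-day spelling `alias myint = int;` in place of the 2005 form `alias int myint;` quoted above. The function `twice` is made up for the example, and the snippet says nothing about how the compiler represents either name internally.

```d
import std.stdio;

// `myint` is an ordinary identifier that names the built-in type `int`.
alias myint = int;

// The alias is usable wherever the keyword would be.
myint twice(myint x)
{
    return x * 2;
}

void main()
{
    myint n = 21;
    // Checked at compile time: the alias and the keyword denote one type.
    static assert(is(myint == int));
    writeln(twice(n));
}
```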
Jul 26 2005
"Regan Heath" <regan netwin.co.nz> writes:
For what it's worth, I like your idea. I have nothing further to add to
the argument in either direction, however. I wonder what Walter thinks;
it appears to me that he is the only person in a position to have all
the information for a decision on this.

Regan

On Tue, 26 Jul 2005 10:45:55 -0400, Greg Smith <greg siliconoptix.com>  
wrote:
Jul 26 2005
Hasan Aljudy <hasan.aljudy gmail.com> writes:
I have nothing against that.

Maybe you would want to change the terminology you are using, because in 
the other thread you made, I got totally the wrong idea.

keyword: for me (and I'm an idiot, btw) means an identifier that has a 
special meaning. >>> therefore it makes perfect sense that things like 
"int" and "this" be keywords.

keyword: when you speak of it, you actually mean how the compiler deals 
with this "special identifier".

So when you say remove keywords, you aren't suggesting to make D 
dynamically typed or anything like that.

That's what confused me in the previous thread. Maybe it's just because 
I don't know a lot about compilers.


Although one concern remains: syntax highlighting!
I'm thinking most editors with D syntax highlighting would color 'real' in:

    import something.real.somethingelse;



Jul 28 2005
Greg Smith <greg siliconoptix.com> writes:
Hasan Aljudy wrote:

 I have nothing against that.
 
 Maybe you would want to change the terminology you are using, because in 
 the other thread you made, I got totally the wrong idea.
 
 keyword: for me (and I'm an idiot, btw) means an identifier that has a 
 special meaning. >>> therefore it makes perfect sense that things like 
 "int" and "this" be keywords.
 
It's a question of what exactly that special meaning is, and where it's
implemented. By your broad definition, 'Object' should be a keyword in
D. But if you look in the list of keywords
( http://www.digitalmars.com/d/lex.html#keyword ), you won't find it
there.

'Keyword' is pretty standard terminology in the compiler business, and
it is tied to the almost universal concept of analyzing source code by a
two-stage process of (a) lexical analysis (lexing) and (b) grammatical
analysis (parsing). A 'keyword' is recognized in the lexer, prior to
parsing. Predefined identifiers like 'Object' are recognized after
parsing, and, unlike keywords, their interpretation may depend on name
scoping rules. Since name scopes are generally defined by the structure
of the program, which is inferred by the parser, it's very cumbersome to
apply scoping rules before parsing.

Because of this distinction, you can have a D local variable called
'Object' but not one called 'while'.

I suspect that if I used different terminology, the result would be
greater confusion overall.

- Greg
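
The two stages described above can be sketched as a toy model in present-day D. It is only an illustration of the distinction, not a picture of how DMD is built: the class `Scope`, its `lookup` method, the sample keyword list, and the description strings are all assumptions made for the example.

```d
import std.stdio;

// Stage 1 (lexer): a fixed table, consulted with no knowledge of scopes.
bool isKeyword(string word)
{
    switch (word)
    {
        case "while", "if", "int": // tiny sample of the real table
            return true;
        default:
            return false;
    }
}

// Stage 2 (after parsing): names are resolved through nested scopes.
class Scope
{
    Scope parent;
    string[string] symbols; // name -> description of what it denotes

    this(Scope parent) { this.parent = parent; }

    // The innermost declaration wins; returns null for an unknown name.
    string lookup(string name)
    {
        for (Scope s = this; s !is null; s = s.parent)
            if (auto p = name in s.symbols)
                return *p;
        return null;
    }
}

void main()
{
    auto global = new Scope(null);
    global.symbols["Object"] = "predefined class Object";

    auto local = new Scope(global);
    assert(local.lookup("Object") == "predefined class Object");

    // A local declaration can shadow the predefined name...
    local.symbols["Object"] = "local variable Object";
    writeln(local.lookup("Object"));
    writeln(global.lookup("Object"));

    // ...while a keyword is claimed by stage 1 and is never looked up as a name.
    writeln(isKeyword("while"), " ", isKeyword("Object"));
}
```

The shadowing in `main` is the toy counterpart of declaring a local variable called 'Object': the outer meaning is untouched and simply hidden inside the inner scope.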
Aug 03 2005
parent reply J C Calvarese <technocrat7 gmail.com> writes:
In article <dcr2se$1hv2$1 digitaldaemon.com>, Greg Smith says...
Hasan Aljudy wrote:

 I have nothing against that.
 
 Maybe you would want to change the terminology you are using, because in 
 the other thread you made, I got totally the wrong idea.
 
 keyword: for me (and I'm an idiot, btw) means an identifier that has a 
 special meaning. >>> therefor it makes perfect sense that things like 
 "int" and "this" be keywords.
 
It's a question of what exactly that special meaning is, and where it's implemented. By your broad definition, 'Object' should be a keyword in D. But if you look in the list of keywords ( http://www.digitalmars.com/d/lex.html#keyword ) , you won't find it there. 'Keyword' is pretty standard terminology in the compiler business, and it is tied to the almost universal concept of analyzing source code by a two-stage process of (a) lexical analysis (lexing) and (b) grammatical analysis (parsing). A 'keyword' is recognized in the lexer, prior to parsing. Predefined identifiers like 'Object' are recognized after parsing, and, unlike keywords, their interpretation may depend on name scoping rules. Since name scopes are generally defined by the structure of the program, which is inferred by the parser, it's very cumbersome to apply scoping rules before parsing.
I'm not a compiler writer, so examples help me understand what you're suggesting.
Because of this distinction, you can have a D local variable called 
'Object' but not one called 'while'.
I didn't even realize that we could do this:

    import std.stdio;
    int main()
    {
        int Object;
        Object = 1;
        writefln(Object);
        return true;
    }

It seems like a clever way of coding that I'd probably end up shooting myself in the foot with.

So now, you want to be able to do this:

    import std.stdio;
    int main()
    {
        double int;
        int = 1;
        writefln(int);
        return true;
    }

Seems like a bad idea to me. I don't see a big problem with the compiler prohibiting this. Yes, it could be a pain if you're porting a library written in another language that uses "int" as an identifier, but I don't expect that happens too often.
I suspect that if I used different terminology, the result would be 
greater confusion overall.
Probably. It's a confusing topic anyway.
- Greg
That's my 1.99999 cents. jcc7
Aug 03 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Wed, 3 Aug 2005 20:01:21 +0000 (UTC), J C Calvarese  
<technocrat7 gmail.com> wrote:
 Maybe you would want to change the terminology you are using, because  
 in
 the other thread you made, I got totally the wrong idea.

 keyword: for me (and I'm an idiot, btw) means an identifier that has a
 special meaning. >>> therefor it makes perfect sense that things like
 "int" and "this" be keywords.
It's a question of what exactly that special meaning is, and where it's implemented. By your broad definition, 'Object' should be a keyword in D. But if you look in the list of keywords ( http://www.digitalmars.com/d/lex.html#keyword ) , you won't find it there. 'Keyword' is pretty standard terminology in the compiler business, and it is tied to the almost universal concept of analyzing source code by a two-stage process of (a) lexical analysis (lexing) and (b) grammatical analysis (parsing). A 'keyword' is recognized in the lexer, prior to parsing. Predefined identifiers like 'Object' are recognized after parsing, and, unlike keywords, their interpretation may depend on name scoping rules. Since name scopes are generally defined by the structure of the program, which is inferred by the parser, it's very cumbersome to apply scoping rules before parsing.
I'm not a compiler writer, so examples help me understand what your suggesting.
I've written a simple C parser. I recommend it to anyone who wants an interesting and challenging (assuming no experience) exercise.
 Because of this distinction, you can have a D local variable called
 'Object' but not one called 'while'.
I didn't even realize that we could do this: import std.stdio; int main() { int Object; Object = 1; writefln(Object); return true; } It seems like a clever way coding that I'd probably end up shooting myself in the foot with. So now, you want to be able to do this: import std.stdio; int main() { double int; int = 1; writefln(int); return true; } Seems like a bad idea to me. I don't see a big problem with the compiler prohibiting this. Yes, it could be a pain if you're porting a library written in another language that uses "int" as an identifier, but I don't expect that happens too often.
One of the points made was that you can still prohibit the use above, if you want, as part of the 'parser' (not the lexer), the advantage being that you have more control over where you allow it and where you prohibit it. Personally I don't have a problem with:
 double int;
 int = 1;
 writefln(int);
because it's clear when you read any of those lines, together or standalone, that 'int' is not a type but a variable. This is because you, like the parser, can take the context of 'int' into consideration when you read the lines.

About the worst bug I can think of would be if you meant to type "int a = 1;" and accidentally missed the "a", getting "int = 1;". But that would cause an error in all situations except where there was a variable called "int" present. So it's really no different from any other typo, except that it can occur in a variable declaration (which can occur just about anywhere, anyway). Can anyone think of a potential bug which becomes 'more' likely because of this change?

If the keywords were removed then yes, code like that shown above with 'int' could become legal, unless Walter decided to prohibit it in the parser. The advantage would be a change to the D grammar: it would get smaller and simpler.

Another point made was that as a result you get more descriptive error messages for less work (you don't have to code special cases in the lexer). Some examples were given in a previous thread.

The fact that conversion from another language, that allows "this" or any other D keyword, would become easier is another advantage.

In short, more control, better errors, other advantages and no disadvantages (except some work for Walter to implement the change) that I can see.

Regan
Aug 03 2005
parent reply J C Calvarese <technocrat7 gmail.com> writes:
In article <opsuydw2lk23k2f5 nrage.netwin.co.nz>, Regan Heath says...
On Wed, 3 Aug 2005 20:01:21 +0000 (UTC), J C Calvarese  
<technocrat7 gmail.com> wrote:
..
 I'm not a compiler writer, so examples help me understand what your  
 suggesting.
I've writen a simple C parser. I recommend it to anyone who wants an interesting and challenging (assuming no experience) challenge.
I'm much too lazy for that. ;)
 Because of this distinction, you can have a D local variable called
 'Object' but not one called 'while'.
I didn't even realize that we could do this: import std.stdio; int main() { int Object; Object = 1; writefln(Object); return true; } It seems like a clever way coding that I'd probably end up shooting myself in the foot with. So now, you want to be able to do this: import std.stdio; int main() { double int; int = 1; writefln(int); return true; } Seems like a bad idea to me. I don't see a big problem with the compiler prohibiting this. Yes, it could be a pain if you're porting a library written in another language that uses "int" as an identifier, but I don't expect that happens too often.
One of the points made was that you can still prohibit the use above, if you want, as part of the 'parser' (not the lexer), the advantage being that you have more control over where you allow it and where you prohibit it.
Well, if we're not prohibiting this kind of crazy, I don't understand why we'd undertake the effort. Are we trying to speed up a compiler that's already blazingly fast?
Personally I don't have a problem with:

 double int;
 int = 1;
 writefln(int);
because it's clear when you read any of those lines, together or stand alone, that 'int' is not a type but a variable. This is because, you, like the parser can take the context of 'int' into consideration when you read the lines.
It's clear to me that whoever wrote that example was a masochist (oops, I wrote that). The context shows that something fishy is going on. :)
About the worst bug I can think of would be if you meant to type "int a =  
1;" and accidently missed the "a" getting "int = 1;".  But, that would  
cause an error in all situations except where there was a variable called  
"int" present.  So, it's really no different to any other typo except that  
Right. Well, I'm worried about when a variable called "int" is present.
it can occur in a variable declaration (which can occur just about  
anywhere, anyway). Can anyone think of a potential bug which becomes  
'more' likely because of this change?
Hey, we could make semi-colons optional, too. :)
If the keywords were removed then yes, code like that shown above with  
'int' could become legal, unless Walter decided to prohibit it in the  
parser. The advantage would would be a change to the D grammar, it would  
get smaller and simpler.

Another point made was that as a result you get more descriptive error  
messages for less work (you don't have to code special cases in the  
lexer). Some examples were given in a previous thread.
There are always cost/benefit issues. Are you sure that the benefit of possibly better error messages outweighs the cost of changing the innards of the compiler? I'm sure I don't know. Even if we do get better error messages, though, it seems like a detour taking us farther away from D 1.0.
The fact that conversion from another language, that allows "this" or any  
other D keyword, would become easier is another advantage.
So then we could call a method "this". Could we use the "this" of that method's class? :x I think keywords serve a purpose ("This identifier is off-limits"). It hurts my head to consider all of the possible pitfalls.
In short, more control, better errors, other advantages and no  
disadvantages (except some work for Walter to implement the change) that I  
can see.

Regan
Well, you might be right, but I'm unconvinced. If Walter wants to do it, he can do it. If someone else wants to try it out, that's what GDC is for: http://www.prowiki.org/wiki4d/wiki.cgi?GdcHacking jcc7
Aug 03 2005
parent reply Greg Smith <greg siliconoptix.com> writes:
J C Calvarese wrote:


 I think keywords serve a purpose ("This identifier is off-limits"). It
 hurts my head to consider all of the possible pitfalls.
 
But off-limits *everywhere*? I can't have a module called 'mmedia.real.realaudio', simply because 'real' is a keyword and the parser would balk. If 'real' were a predefined identifier, this would work, and would not create a harmful redefinition of 'real'.

In my view, "This identifier is off-limits" is precisely the collateral damage *caused* by keywords, *not* their purpose. They serve to provide a set of readable punctuation marks to guide the parser, and the downside is that they are reserved in all possible contexts, whereas normal identifiers are scoped.

This is more of an issue when you consider the likely possibility that new built-in types will be added to the language in the future; if they are added as keywords, they will break existing code; but if they are added as predefined identifiers, they won't.

Yes, it's easy to laugh at silly examples where 'int' is used as a local variable of type 'char'. We could sit here all day writing dangerously misleading code, without having to propose language changes to make that possible. "alias wchar rea1;" comes to mind... You'd feel less silly, and more p-d off, if you had used 'rchar' as a perfectly good local variable name, or struct member name, and it later became a new built-in type (and a new keyword).

And by the way, it's possible to redefine 'integer' in Pascal, and nobody has ever seemed to complain of their head hurting, or even notice, for that matter. They just don't do it. [Ada too, I think]. I have never once proposed that anybody should be encouraged to redefine int; in fact, this can be prohibited without making it a keyword.

There's IMHO an issue in D, arising from the fact that there are a lot of keywords; and while I believe that more than 20 of them are unnecessary (all the type names, and true/false), even if they are eliminated there will still be a problem as follows:

    extern(C) int delegate( foo*p, int id ); // Can't do it!

I simply can't interface to this existing C function since its name is a D keyword. Almost 100 C identifiers are simply 'off-limits', and there is no real solution to this unless you can modify the C code.

I've previously proposed that a lexical convention be added to escape identifiers from keyword recognition, e.g.

    extern(C) int 'delegate'( foo*p, int id );
    ...
    if( can_delegate ) 'delegate' ( foop, 0); // call C function
    int 'if' = 0;
    'if'++; // now possible, if you really want to.
    char x = 'c'; // still a char constant if only 1 char
    import commlib.'interface'.localapi; // interface is a keyword

Or, w"delegate".

This suggestion is separate from the suggestion that type names should not be keywords; either could be adopted without the other.
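To make the quoting convention concrete, here is a rough C++ sketch of how a lexer might classify a single token under it. Everything in it is an assumption made for illustration: the names `Kind` and `classify_token` are invented, the keyword table holds only three entries, and escape handling is reduced to "anything starting with a backslash is a char constant". It is not taken from any real D compiler.

```cpp
#include <set>
#include <string>

// Hypothetical token kinds; a real lexer would have many more.
enum class Kind { Keyword, Identifier, CharLiteral };

// Classify one already-delimited token under the proposed convention:
// 'x' (one character) stays a char constant, 'word' becomes an identifier
// that bypasses the keyword table, and an unquoted word is checked as usual.
inline Kind classify_token(const std::string& text) {
    static const std::set<std::string> keywords = {"delegate", "if", "interface"};
    const bool quoted = text.size() >= 2 &&
                        text.front() == '\'' && text.back() == '\'';
    if (quoted) {
        const std::string inner = text.substr(1, text.size() - 2);
        // One character, or a backslash escape such as \n, is a char constant.
        // (Empty quotes and malformed escapes are not diagnosed in this sketch.)
        if (inner.size() == 1 || (!inner.empty() && inner[0] == '\\')) {
            return Kind::CharLiteral;
        }
        return Kind::Identifier;  // escaped from keyword recognition
    }
    return keywords.count(text) ? Kind::Keyword : Kind::Identifier;
}
```

Under these assumptions `delegate` is a keyword, `'delegate'` is an ordinary identifier, `'c'` is still a char constant, and `'foo'` names the same identifier as an unquoted `foo` would.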
 So then we could call a method "this". Could we use the "this" of
 that method's class? :x
FWIW, if 'this' were converted from a D keyword to a predefined identifier, it would not be so simple, since 'this' in the class namespace would be the constructor. I don't know enough about all the details of D classes to comment further.

In C++, it would be possible to remove 'this' from the keyword table and implement it as a predefined formal parameter of member funcs. And the answer to your question would be "Yes, and with no more trouble than is already required in some cases". It would work like this:

    class CLS {  // mutant C++, with 'this' not a keyword
        int f1(CLS *);
        int f2();
        int f3();
        int this();  // this has no special meaning outside a mem func body
    };

    int CLS ::f1(CLS *f3)
    // note: parameter named f3 hides CLS::f3 as per normal C++
    // implicit parameter 'this' hides member 'this' in the same way
    {
        f2();          // member func call
        this->f3();    // can't just use f3(), it's hidden by parameter
        this->this();  // likewise, when calling CLS::this()
        f3->this();    // call member func of other CLS object
        f3->f3();
    }

The call to 'CLS::this()' has the same issue, and solves it in the same way, as the call to CLS::f3. My rather convoluted point is that whatever difficulty is presented here is already present in the language and can be avoided in the same way you are already avoiding it. I.e., you probably try to avoid giving class mem func params the same name as class members, so don't go using 'this' as a member name just because you can.

But if you had some old C code that defined a struct member 'this', you wouldn't have to change that to convert it into our mutant C++, whereas with real C++, you do.

-- Greg
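For comparison, the f3 half of the example above is already legal in standard C++. A minimal sketch follows; the return values are made up so the calls can be told apart, and the member named 'this' is left out because real C++ rejects it.

```cpp
// Standard C++: a parameter may hide a member of the same name,
// and the member stays reachable through this->.
struct CLS {
    int f2() { return 2; }
    int f3() { return 3; }
    int f1(CLS* f3);
};

int CLS::f1(CLS* f3) {
    // Inside this body the bare name f3 is the parameter, so a plain f3()
    // would not compile; the member must be called as this->f3().
    return f2()          // member func call
         + this->f3()    // member of this object, hidden by the parameter
         + f3->f3();     // member func of the other CLS object
}
```

Calling `f1` on one object with a pointer to another returns 2 + 3 + 3, which shows that the hiding is resolved by ordinary scoping rules rather than by any keyword.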
Aug 04 2005
next sibling parent reply AJG <AJG_member pathlink.com> writes:
Hi,

 I think keywords serve a purpose ("This identifier is off-limits"). It
 hurts my head to consider all of the possible pitfalls.
 
But off-limits *everywhere*? I can't have a module called 'mmedia.real.realaudio', simply because 'real' is a keyword and the parser would balk. If 'real' were a predefined identifier, this would work, and would not create a harmful redefinition of 'real'.
There are reasons for that. First of all, IMHO, it would be fairly confusing to allow such a thing. Second, what happens if you declare something like min or max in your new 'real' module:

    ulong m = real.max; // Is this your module's or the primitive's?
In my view, "This identifier is off-limits" is precisely the collateral 
damage *caused* by keywords, *not* their purpose. They serve to provide 
a set of readable punctuation marks to guide the parser, and the 
downside is that they are reserved in all possible contexts, whereas 
normal identifiers are scoped.
Keywords are scoped too. Their scope is universal. Where, pray tell, would declaring a variable called "int" be a good idea? IMO nowhere, thus the scope of the keyword is universal for a good reason.
This is more of an issue when you consider the likely possibility that 
new built-in types will be added to the language in the future; if they 
are added as keywords, they will break existing code; but if they are 
added as predefined identifiers, they won't.
You can't guarantee this.
Yes, it's easy to laugh at silly examples where 'int' is used as a local 
variable of type 'char'.  We could sit here all day writing dangerously 
misleading code, without having to propose language changes to make that 
possible.
One of the goals of any good language should be to reduce the possibility of writing such dangerous code. Just because you can use _other_ features to write dangerous code doesn't mean we need more such features. Two wrongs don't make a right. Declaring a variable named "int" or "char" is just plain wrong IMO. If the language prevents it, all the better.
"alias wchar rea1;" comes to mind... You'd fill less silly, 
and more p-d off, if you had used 'rchar' as a perfectly good local 
variable name, or struct member name, and it later became a new built-in 
type (and a new keyword).
Since this is all about identifiers, you can very easily do a global search and replace. Normally, such replaces are not that simple if it's something like a string, or a construct, or what have you. But since it's an identifier, it's exceedingly simple to replace it.
And by the way, it's possible to redefine 
'integer' in Pascal, and nobody has ever seemed to complain of their 
head hurting, or even notice, for that matter. They just don't do it.
"Nobody" has ever complained? Is this a fact? "They" just don't do it? Can you guarantee this hasn't happened and caused bugs?
[Ada too, I think]. I have never once proposed that anybody should be 
encouraged to redefine int; in fact, this can be prohibited without 
making it a keyword.
You are contradicting yourself here. Do you want "int" to be a valid variable name or not? If yes, then that's a bad idea. If no, then what's with all the "evidence" that this is not an abominable idea?
There's IMHO an issue in D, arising from the fact that there are a lot 
of keywords; and while I believe that more than 20 of them are 
unnecessary (all the type names, and true/false), even if they are 
eliminated there will still be a problem as follows:

     extern(C) int delegate( foo*p, int id );	// Can't do it!

I simply can't interface to this existing C function since its name is a 
D keyword. Almost 100 C identifiers are simply 'off-limits', and there 
is no real solution to this unless you can modify the C code.
A language has keywords. This is a fact of life. I understand your desire to reduce the number of keywords, but at what cost? I see a minuscule benefit (slightly higher compatibility with C), at a huge cost (ambiguity, Walter's time, potential for dangerous code, etc.).
I've previously proposed that a lexical convention be added
to escape identifiers from keyword recognition, e.g.
This suggestion is separate from the suggestion that type names should 
not be keywords; either could be adopted without the other.
This makes a _lot_ more sense than allowing people to name their functions "delegate". You should push for this suggestion instead.
 So then we could call a method "this". Could we use the "this" of
 that method's class? :x
<snip> Allowing redefinition of "this" is insane IMO.
But if you had some  old C code that defined a struct member 'this', you 
wouldn't have to change that to convert it into our mutant C++, whereas 
with real C++, you do.
What about the simple suggestion to allow quoted identifiers?

    extern (C) 'this'(int foo); // for a function called 'this'.

Cheers,
--AJG.
Aug 04 2005
parent reply Greg Smith <greg siliconoptix.com> writes:
AJG wrote:

 Hi,
 
 
I think keywords serve a purpose ("This identifier is off-limits"). It
hurts my head to consider all of the possible pitfalls.
But off-limits *everywhere*? I can't have a module called 'mmedia.real.realaudio', simply because 'real' is a keyword and the parser would balk. If 'real' were a predefined identifier, this would work, and would not create a harmful redefinition of 'real'.
There are reasons for that. First of all, IMHO, it would be fairly confusing to allow such a thing. Second, what happens if you declare something like min or max in your new 'real' module: ulong m = real.max; // Is this your module's or the primitive's?
What's wrong with

    import mmedia.real.realaudio;
    ulong m = real.max; // Existing real
    ulong m2 = mmedia.real.realaudio.max; // from module

or just "m2=max;", as I understand import to work. I have not redefined real in the global scope.

This is what scopes are for.

It's not confusing to *allow* something, what's confusing is what people do with it. Do you think this change is going to turn D from something in which no confusing code is possible into something which is a morass of unavoidable confusion?
 
In my view, "This identifier is off-limits" is precisely the collateral 
damage *caused* by keywords, *not* their purpose. They serve to provide 
a set of readable punctuation marks to guide the parser, and the 
downside is that they are reserved in all possible contexts, whereas 
normal identifiers are scoped.
Keywords are scoped too. Their scope is universal. Where, pray tell, would declaring a variable called "int" be a good idea? IMO nowhere, thus the scope of the keyword is universal for a good reason.
A keyword is actually scoped more universally than a variable. You can't do version(keyword){}, or extern(keyword) int foo();, whereas normal identifiers can be used in those contexts without being defined or even mentioned anywhere else; and even if they are mentioned in other contexts there is no connection between a version tag 'xyz' and a language identifier (type, variable, etc) 'xyz'. I can't see how using

    version(real){ }
    version(emulation){ }
    ...

should be particularly harmful. And again, it's not the existing keywords that are the issue, it's the new ones that will be defined later.
 
This is more of an issue when you consider the likely possibility that 
new built-in types will be added to the language in the future; if they 
are added as keywords, they will break existing code; but if they are 
added as predefined identifiers, they won't.
You can't guarantee this.
I think you can, for practical purposes. The user definition of the name will hide the predefined definition. This is what scoping was designed to do; to allow local namespaces to be protected from changes to global namespaces.
 
 
Yes, it's easy to laugh at silly examples where 'int' is used as a local 
variable of type 'char'.  We could sit here all day writing dangerously 
misleading code, without having to propose language changes to make that 
possible.
One of the goals of any good language should be to reduce the possibility of writing such dangerous code. Just because you can use _other_ features to write dangerous code doesn't mean we need more such features. Two wrongs don't make a right. Declaring a variable named "int" or "char" is just plain wrong IMO. If the language prevents it, all the better.
The *only* argument you are making is that this change adds the ability to do a few additional confusing things. Therefore it shouldn't be done, despite the benefits. That doesn't wash. Besides, you can still make it illegal to redefine 'int'.
 
 
"alias wchar rea1;" comes to mind... You'd fill less silly, 
and more p-d off, if you had used 'rchar' as a perfectly good local 
variable name, or struct member name, and it later became a new built-in 
type (and a new keyword).
You are contradicting yourself here. Do you want "int" to be a valid variable name or not? If yes, then that's a bad idea. If no, then what's with all the "evidence" that this is not an abominable idea?
I DON'T CARE if you can redefine int or not. I DON'T CARE. I would never do it. It's a very bad idea to redefine it. But if the language lets it be done, the language will still work. I have no evidence that it would be useful to redefine 'int'. I have presented evidence that it would be useful to redefine *less* *central* type names, which may not have even been in existence at the time you wrote your code.

If you want it to be illegal to redefine int, you can *still* make int a predefined identifier, not a keyword, and make it illegal to redefine it, in whatever contexts you think it might be harmful, including all of them. I'm not going to repeat all the reasons why this is different from int being a keyword, you can go check out the other thread "Why are type names keywords".

If you insist on using 'int' as an example and fail to recognize that the language has both a past (C code compatibility) and a future (possible new types to be added) to deal with, then I am sure you will never understand my point. You seem to be assuming that keywords can never conflict with identifiers in any troublesome way. It would be great if that were true.

Here's a worst-case scenario:

(1) I have code like this:
        import commlib.rchar.api;
(2) 'rchar' becomes a new type and a new keyword.
(3) I have to change all my code, renaming 'rchar' to, say, 'rrchar'; and worse, my directory structure (possibly throwing my version control into havoc), then modify the build scripts, and rebuild the libraries. If the library came as object code from a 3rd party, I'm skunked, I need to get a new build of it with a different name, because I can't link with the old unless I can use 'rchar' as an identifier.

If, on the other hand:

(2a) 'rchar' becomes a new type and a new predefined identifier
(3a) I don't have to do anything.
(4a) At my leisure, I could rename things to avoid confusion. It could be a lot of work, since I'd have to rename the directories and possibly change the build scripts for the module. If I think the effort is worth it, I'll do it. But I'm not forced to do it.

I don't know what else I can say. Please stop talking about redefining 'int'.
I've previously proposed that a lexical convention be added
to escape identifiers from keyword recognition, e.g.
This suggestion is separate from the suggestion that type names should 
not be keywords; either could be adopted without the other.
This makes a _lot_ more sense than allowing people to name their functions "delegate". You should push for this suggestion instead.
Actually, I *am* exactly proposing that people be allowed to name their functions "delegate". I'm just proposing a method of doing it despite the presence of a keyword "delegate". Note that foo() and 'foo'() would be identical.

This suggestion provides a method of dealing with keyword conflicts; the other suggestion reduces the likelihood of them happening in the first place.

In the worst-case scenario discussed above, with

    import commlib.rchar.api;

and 'rchar' becoming a new keyword, I wouldn't be nearly as screwed if I could use this quote thing. I would still need to edit the source, but I could keep the directory name. I could continue to use the 3rd-party compiled library.
 
So then we could call a method "this". Could we use the "this" of
that method's class? :x
<snip> Allowing redefinition of "this" is insane IMO.
I agree that it's no huge improvement. 'Insane', no. In fact, 'this' in C++ (not D) is a perfect example of something that doesn't need to be a keyword. It is *only* meaningful in contexts where a member function parameter has scope, and has exactly the same grammar treatment as an identifier. Nothing is gained by making it a keyword. In D the 'this' keyword is doing more work, in constructor declarations, etc, so the situation is different.
 
But if you had some  old C code that defined a struct member 'this', you 
wouldn't have to change that to convert it into our mutant C++, whereas 
with real C++, you do.
What about the simple suggestion to allow quoted identifiers? extern (C) 'this'(int foo); // for a function called 'this'.
I was discussing converting old C source code into C++, not linking old C code to D. If C++ 'this' were *not* a keyword, I wouldn't need to change any 'this' identifiers in the C code, since none of them could conflict with the C++ meaning. This is not a hypothetical example, by the way - I've run into this exact case. - Greg
Aug 04 2005
parent reply AJG <AJG_member pathlink.com> writes:
Hi,

I think keywords serve a purpose ("This identifier is off-limits"). It
hurts my head to consider all of the possible pitfalls.
But off-limits *everywhere*? I can't have a module called 'mmedia.real.realaudio', simply because 'real' is a keyword and the parser would balk. If 'real' were a predefined identifier, this would work, and would not create a harmful redefinition of 'real'.
There are reasons for that. First of all, IMHO, it would be fairly confusing to allow such a thing. Second, what happens if you declare something like min or max in your new 'real' module: ulong m = real.max; // Is this your module's or the primitive's?
What's wrong with import mmedia.real.realaudio; ulong m = real.max; // Existing real ulong m2 = mmedia.real.realaudio.max; // from module
My example did not use "realaudio.max," but rather "real.max." There's the difference. Do you not see a problem with that usage?
or just "m2=max;", as I understand import to work. I have not redefined
real in the global scope.

This is what scopes are for.

It's not confusing to *allow* something, what's confusing is what people 
do with it. Do you think this change is going to turn D from something 
in which no confusing code is possible into something which is a morass 
of unavoidable confusion?
No. Mere introduction of the "feature" would be unlikely to wreak havoc immediately. What I mean is that it creates the potential for confusion. In addition, the feature has _very_ _little_ _benefit_. In other words, it is not worth introducing the great potential for confusion for such a minuscule gain in functionality. The juice is not worth the squeeze.

On the other hand, let me present to you another such potentially-abused feature: goto. Now, some will say goto is the source of all evil, but for all its evil, goto is a very powerful tool. So in that respect it is different, thus D allows goto.
 Keywords are scoped too. Their scope is universal. Where, pray tell, would
 declaring a variable called "int" be a good idea? IMO nowhere, thus the scope
of
 the keyword is universal for a good reason.
A keyword is actually scoped more universally than a variable. You can't do version(keyword){}, or extern(keyword) int foo();, whereas normal identifers can be used in those contexts without being defined or even mentioned anywhere else; and even if they are mentioned in other contexts there is no connection between a version tag 'xyz' and a language identifier (type, variable, etc) 'xyz'. I can't see how using version(real){ } version(emulation){ } ... should be particularly harmful.
Well, this is all relative. Would it cause great physical harm to the user? Unlikely. Will it create confusion? Yes. If a user only knows the D language and sees: version (real) what is he supposed to think? It kinda looks like "use this version if D is compiled with real number support." Or perhaps "use this version if you want real number precision."
And again, it's not the existing 
keywords that are the issue, it's the new ones that will be defined later.
Frankly, I don't see a flood of new primitive types and keywords. I'd speculate maybe a couple per year, since Walter likes to reuse his keywords. That's just a risk you're gonna have to take. Btw, perhaps you shouldn't be naming your things so close to already existing keywords. There's char, there's dchar, there's wchar. It doesn't take a genius to figure out the naming convention here. Therefore it also doesn't take a genius to figure out that rchar is particularly prone. Also, here's another quick tip: keywords in D are all lowercase. Hint, hint.
This is more of an issue when you consider the likely possibility that 
new built-in types will be added to the language in the future; if they 
are added as keywords, they will break existing code; but if they are 
added as predefined identifiers, they won't.
You can't guarantee this.
I think you can, for practical purposes. The user definition of the name will hide the predefined definition. This is what scoping was designed to do; to allow local namespaces to be protected from changes to global namespaces.
Hm... actually, it's the other way around. The purpose is to protect the global namespace from the local ones. A local namespace cannot override the global one:

int x;
{
    float x; //IIRC, error.
}

If a mere variable is prevented from redefinition, I think it's safe to say a predefined type/identifier/keyword would be too. And for good reason. I wouldn't want subtle redefinitions to happen in my code without so much as a warning.
The *only* argument you are making is that this change adds the ability 
to do a few additional confusing things. Therefore it shouldn't be done, 
despite the benefits. That doesn't wash. Besides, you can still make it 
illegal to redefine 'int'.
No. You seem to ignore the costs of your little operation: it would probably take valuable time to implement. It would require complex scoping rules to be created. In addition to the potential for abuse. Moreover, what doesn't wash is that all that work is not worth it. The quotes proposal already takes care of that at a fraction of the cost.
If you want it to be illegal to redefine int, you can *still* make int a 
predefined identifier, not a keyword, and make it illegal to redefine 
it, in whatever contexts you think it might be harmful, including all of 
them. I'm not going to repeat all the reasons why this is different from 
int being a keyword, you can go check out the other thread "Why are type 
names keywords".
So then you need to introduce complex scoping rules for predefined identifiers just so you can use language keywords in certain special scopes where they are not meaningful. Does this make sense over a simple "no use" rule across the language?
If you insist on using 'int' as an example and fail to recognize that 
the language has both a past (C code compatibility)
D does not have C code compatibility. It would be hazardous to your mental health to keep believing that. If you fail to recognize that then you should work on it.
and a future 
(possible new types to be added) to deal with, then I am sure you will 
never understand my point.
I understand your point very well, I just don't happen to agree with it.
This suggestion is separate from the suggestion that type names should 
not be keywords; either could be adopted without the other.
This makes a _lot_ more sense than allowing people to name their functions "delegate". You should push for this suggestion instead.
Actually, I *am* exactly proposing that people be allowed to name their functions "delegate". I'm just proposing a method of doing it despite the presence of a keyword "delegate". Note that foo() and 'foo'() ...would be identical.
Well, no they wouldn't (if foo were a keyword), because you'd have to always use quotes to access foo. In other words, it's almost as if the quotes became part of the identifier and they were taken off only for C-compat.
This suggestion provides a method of dealing with keyword conflicts; the 
other suggestion reduces the likelihood of them happening in the first 
place.

In the worst-case scenario discussed above, with
	import commlib.rchar.api;
... and 'rchar' becoming a new keyword, I wouldn't be nearly as screwed 
if I could use this quote thing. I would still need to edit the source, 
but I could keep the directory name. I could continue to use the 
3rd-party compiled library.
I have already said that the quote suggestion is valid, and would (at little cost) bring about the benefit you suggest. That's what makes the feature good: It's a small cost.
 Allowing redefinition of "this" is insane IMO.
 
I agree that it's no huge improvement. 'Insane', no. In fact, 'this' in C++ (not D) is a perfect example of something that doesn't need to be a keyword. It is *only* meaningful in contexts where a member function parameter has scope, and has exactly the same grammar treatment as an identifier. Nothing is gained by making it a keyword. In D the 'this' keyword is doing more work, in constructor declarations, etc, so the situation is different.
Unfortunately, D is not C++ and in D, this has almost universal meaning. In fact, at the base scope (module-level, I assume), 'this' is relevant, due to static ctors/dtors. My suggestion that redefining 'this' is insane applies to D, not C++, or I would have said so. Since apparently you didn't like my redefining int that much, let me propose redefining this then ;) I suppose you would have no problem with the following: Nope, no confusion in sight.
But if you had some  old C code that defined a struct member 'this', you 
wouldn't have to change that to convert it into our mutant C++, whereas 
with real C++, you do.
What about the simple suggestion to allow quoted identifiers? extern (C) 'this'(int foo); // for a function called 'this'.
I was discussing converting old C source code into C++, not linking old C code to D. If C++ 'this' were *not* a keyword, I wouldn't need to change any 'this' identifiers in the C code, since none of them could conflict with the C++ meaning.
If you are converting old C code to D chances are you are already doing some heavy lifting. I don't think ONE additional global search and replace will be particularly troublesome. Furthermore, with the quotes proposal the problem would be even less relevant. Finally, what is with all this C compat glorification? D is not C. D is not a superset of C. D already breaks all sorts of things that prevent C-code compat. Relatively speaking, the keywords are a very small issue compared to the other things. Creating all sorts of complex keyword/predefined-identifier scoping rules just so you can have your almighty 'this' structure in C is horribly misguided, IMHO. Cheers, --AJG.
Aug 04 2005
next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Thu, 4 Aug 2005 21:00:29 +0000 (UTC), AJG wrote:

[snip]

 int x;
 {
 float x; //IIRC, error.
 }
Actually, this is only an error at the module level. The code below is quite okay ...

void main()
{
    int x;
    {
        float x; // okay
    }
}

This too is alright ...

void x()
{
    int x;
    {
        float x; //Okay too.
    }
}

[snip]
 Unfortunately, D is not C++ 
LOL. And I think exactly the opposite. Thank Bob that D is not like C++! My take on this issue is that if people want to write stupid and confusing code, then they should be allowed to. They might not stay employed for very long in any of my teams but that's natural selection at work. -- Derek Parnell Melbourne, Australia 5/08/2005 7:26:30 AM
Aug 04 2005
parent reply AJG <AJG_member pathlink.com> writes:
Hi,

[snip]

 int x;
 {
 float x; //IIRC, error.
 }
Actually, this is only an error at the module level. The code below is quite okay ...
That's certainly not what the docs say...: "A block statement introduces a new scope for local symbols. A local symbol's name, however, must be unique within the function." http://www.digitalmars.com/d/statement.html The examples there contradict yours. I don't have a compiler at hand, though, so I can't say. I prefer what the docs say, frankly.
[snip]

 Unfortunately, D is not C++ 
LOL. And I think exactly the opposite. Thank Bob that D is not like C++!
Thank Bob indeed. You knew I was going for the sarcastic effect! ;)
My take on this issue is that if people want to write stupid and confusing
code, then they should be allowed to. They might not stay employed for very
long in any of my teams but that's natural selection at work.
I don't agree. This kind of "remedial" attitude is no good. What we need (just like in real life) is preventative care. Take a proactive approach so that things don't become fatal. Having been assigned "maintenance" duties on occasion, I can say the ones burned ("skunked," to quote Greg) are not generally the original doofuses but the subsequent maintainers. <often quoted possibly completely wrong statistic> <take this with a grain of salt> Around 80% of the total cost of software is not the original development. </take this with a grain of salt> </often quoted possibly completely wrong statistic> Cheers, --AJG.
Aug 04 2005
parent reply Derek Parnell <derek psych.ward> writes:
On Fri, 5 Aug 2005 00:01:30 +0000 (UTC), AJG wrote:

 Hi,
 
[snip]

 int x;
 {
 float x; //IIRC, error.
 }
Actually, this is only an error at the module level. The code below is quite okay ...
That's certainly not what the docs say...: "A block statement introduces a new scope for local symbols. A local symbol's name, however, must be unique within the function." http://www.digitalmars.com/d/statement.html The examples there contradict yours. I don't have a compiler at hand, though, so I can't say. I prefer what the docs say, frankly.
Yes, I knew about the docs, but I believe Walter has said that the docs are wrong in this case. The compiler does compile the examples I gave and this is the intended behaviour. I just can't find the Walter-quote right now.
My take on this issue is that if people want to write stupid and confusing
code, then they should be allowed to. They might not stay employed for very
long in any of my teams but that's natural selection at work.
I don't agree. This kind of "remedial" attitude is no good. What we need (just like in real life) is preventative care. Take a proactive approach so that things don't become fatal. Having been assigned "maintenance" duties on occasion, I can say the ones burned ("skunked," to quote Greg) are not generally the original doofuses but the subsequent maintainers.
Yes, I can understand this. We have defined coding standards here and all code is peer-reviewed and non-standard code is identified and corrected before being allowed to go out into the world. But under more common development methodologies, I can understand that you'd like to have a mechanized method of enforcing coding decency - i.e. the compiler.
 <often quoted possibly completely wrong statistic> 
 <take this with a grain of salt>
 Around 80% of the total cost of software is not the original development.
 </take this with a grain of salt>
 </often quoted possibly completely wrong statistic>
I think I heard once that 94.37% of all statistics are misused, and the remaining 84.12% are actually a bit suspect too. -- Derek Parnell Melbourne, Australia Download BUILD from ... http://www.dsource.org/projects/build/ v2.08 released 29/May/2005 http://www.prowiki.org/wiki4d/wiki.cgi?FrontPage 5/08/2005 11:04:09 AM
Aug 04 2005
next sibling parent J C Calvarese <technocrat7 gmail.com> writes:
In article <1l0oh1ebg8svr.13hqjnzi2kbus$.dlg 40tude.net>, Derek Parnell says...
On Fri, 5 Aug 2005 00:01:30 +0000 (UTC), AJG wrote:
..
 <often quoted possibly completely wrong statistic> 
 <take this with a grain of salt>
 Around 80% of the total cost of software is not the original development.
 </take this with a grain of salt>
 </often quoted possibly completely wrong statistic>
I think I heard once that 94.37% of all statistics are misused, and the remaining 84.12% are actually a bit suspect too.
8 out of 10 statistics are completely made up. ;) jcc7
Aug 04 2005
prev sibling parent reply AJG <AJG_member pathlink.com> writes:
Hi,

 The examples there contradict yours. I don't have a compiler at hand, though, so
 I can't say. I prefer what the docs say, frankly.
Yes, I knew about the docs, but I believe Walter has said that the docs are wrong in this case. The compiler does compile the examples I gave and this is the intended behaviour. I just can't find the Walter-quote right now.
Hm... interesting. I hope Walter brings back the behaviour in the docs.
Yes, I can understand this. We have defined coding standards here and all
code is peer-reviewed and non-standard code is identified and corrected
before being allowed to go out into the world. But under more common
development methodologies, I can understand that you'd like to have a
mechanized method of enforcing coding decency - i.e. the compiler.
Exactly! Thank you. I'm glad somebody here understands.
 <often quoted possibly completely wrong statistic> 
 <take this with a grain of salt>
 Around 80% of the total cost of software is not the original development.
 </take this with a grain of salt>
 </often quoted possibly completely wrong statistic>
I think I heard once that 94.37% of all statistics are misused, and the remaining 84.12% are actually a bit suspect too.
"There are three types of lies - lies, damn lies, and documentation." Hehehe... --AJG.
Aug 05 2005
parent reply Derek Parnell <derek psych.ward> writes:
On Fri, 5 Aug 2005 13:45:14 +0000 (UTC), AJG wrote:


[snip]

Yes, I can understand this. We have defined coding standards here and all
code is peer-reviewed and non-standard code is identified and corrected
before being allowed to go out into the world. But under more common
development methodologies, I can understand that you'd like to have a
mechanized method of enforcing coding decency - i.e. the compiler.
Exactly! Thank you. I'm glad somebody here understands.
But the problem is that we are then stuck with somebody else's interpretation of what decency is (i.e. the language designer) and we don't have the freedom to use another interpretation. A general purpose language should be just that - general purpose. Inserting artificial restrictions that are only there to support a specific coding philosophy is fine - just don't call it a general purpose language in that case.

I actually support the idea of reducing the number of keywords in D. The current built-in types really shouldn't be keywords. And they are inconsistent ... 'Object' is not a keyword, 'bool' is not a keyword.

void main()
{
    int Object = 2; // okay
    int bool = 3;   // okay
    int real = 4;   // error
}

The other keywords in the language are punctuation symbols that the compiler uses, and this is the reasonable use for keywords.

-- Derek Parnell Melbourne, Australia 6/08/2005 6:01:37 AM
Aug 05 2005
parent AJG <AJG_member pathlink.com> writes:
Hi,

Yes, I can understand this. We have defined coding standards here and all
code is peer-reviewed and non-standard code is identified and corrected
before being allowed to go out into the world. But under more common
development methodologies, I can understand that you'd like to have a
mechanized method of enforcing coding decency - i.e. the compiler.
Exactly! Thank you. I'm glad somebody here understands.
But the problem is that we are then stuck with somebody else's interpretation of what decency is (i.e. the language designer) and we don't have the freedom to use another interpretation. A general purpose language should be just that - general purpose. Inserting artificial restrictions that are only there to support a specific coding philosophy is fine - just don't call it a general purpose language in that case.
_Some_ decency is better than _no_ decency. The "interpretations" are generally linear in that there is usually "more" and "less." I'll always want more. And of course, I wouldn't want the "zero" interpretation.
And they are
inconsistent ... 'Object' is not a keyword, 'bool' is not a keyword.
Hey, that inconsistency is a problem by itself. I'd treat it as a bug.
I actually support the idea of reducing the number of keywords in D. 
The other keywords in the language are punctuation symbols that the
compiler uses, and this is the reasonable use for keywords.
Ceteris paribus, I also support the idea of fewer keywords. But not when it reduces the "decency," if you will, and also not when it leads the syntax to become like Perl's. Cheers, --AJG.
Aug 05 2005
prev sibling parent reply Greg Smith <greg siliconoptix.com> writes:
AJG wrote:

ulong m = real.max; // Is this your module's or the primitive's?
My example did not use "realaudio.max," but rather "real.max." There's the difference. Do you not see a problem with that usage?
Yes, you can construct cases where there is confusion. This does not change the fact that there are useful implications.
 
 No. Mere introduction of the "feature" would be unlikely to wreak havoc immediately.
 What I mean is that it creates the potential for confusion. 
 
Same comment.
 
 In addition, the feature has _very_ _little_ _benefit_. In other words, it is
 not worth introducing the great potential for confusion for such a minuscule
 gain in functionality. The juice is not worth the squeeze.
We'll have to agree to differ on that. I keep bringing up C++, not because I'm confusing D with C++, but because, more than once, I've been bitten by C++ adding new keywords (either relative to C, or to its own former self).
 
 Well, this is all relative. Would it cause great physical harm to the user?
 Unlikely. Will it create confusion? Yes.
 
 If a user only knows the D language and sees:
 
 version (real) 
 
 what is he supposed to think? It kinda looks like "use this version if D is
 compiled with real number support." Or perhaps "use this version if you want
 real number precision."
 
Or you could read the manual. The relevant page is a google and about 3 clicks away :-). I had no idea at all how 'version' worked until I read the manual.
 
And again, it's not the existing 
keywords that are the issue, it's the new ones that will be defined later.
Frankly, I don't see a flood of new primitive types and keywords. I'd speculate maybe a couple per year, since Walter likes to reuse his keywords. That's just a risk you're gonna have to take. Btw, perhaps you shouldn't be naming your things so close to already existing keywords.
I had trouble with C++, at a much lower rate of keyword growth.
 
 There's char, there's dchar, there's wchar. It doesn't take a genius to figure
 out the naming convention here. Therefore it also doesn't take a genius to
 figure out that rchar is particularly prone. Also, here's another quick tip:
 keywords in D are all lowercase. Hint, hint.
 
Right. But the problems I had with C++ were with code others had written. If you have full control over all of the code you work with, none of this matters nearly as much. Lucky you. I don't intend to stop using local variable names consisting entirely of lowercase letters. And, the set of likely new type names is (a) not quite as predictable as that and (b) not utterly disjoint from the set of likely names for local vars, parameter names, and member names, all of which are lowercase by many coding conventions.
 
...
I think you can, for practical purposes. The user definition of the name 
will hide the predefined definition. This is what scoping was designed 
to do; to allow local namespaces to be protected from changes to global 
namespaces.
Hm... actually, it's the other way around. The purpose is to protect the global namespace from the local ones. A local namespace cannot override the global one:

int x;
{
    float x; //IIRC, error.
}
I think you are mistaken, or just being evasive. I remember seeing something about how a D local isn't allowed to hide a local in an enclosing scope, which is fine; but a local variable *can* hide a global. I just tried it. I agree there is a risk of a global namespace being affected by modifications to the local namespace, but the effect of that is contained to the locality of the change, so it's reasonable to expect that to be addressed by diligence (and via variable naming conventions).
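The "local can hide a global" case can be sketched as follows, assuming a present-day D compiler. The names `counter`, `useLocal` and `useGlobal` are hypothetical. The sketch deliberately leaves out the disputed case of a nested block re-declaring a local of the enclosing function, since the thread does not settle how that one is treated.

```d
import std.stdio;

int counter = 1; // module-level ("global") variable

int useLocal()
{
    int counter = 10; // hides the module-level counter inside this function
    counter += 5;     // changes only the local
    return counter;
}

int useGlobal()
{
    return .counter; // a leading dot looks the name up at module scope
}

void main()
{
    writeln(useLocal(), " ", useGlobal());
}
```

The module-level `counter` is untouched by `useLocal`, which is the containment property argued for above: the hiding is limited to the scope that declares the new name.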
 
 If a mere variable is prevented from redefinition, I think it's safe to say a
 predefined type/identifier/keyword would be too. And for good reason. I wouldn't
 want subtle redefinitions to happen in my code without so much as a warning.
 
Flawed premise, see above. The main point is, if you are redefining the new built-in type 'rchar', you are doing it because it wasn't there when you wrote the code. So, you want the new 'rchar' type to be hidden in all places where the user's 'rchar' is in scope. Which is what the existing scope rules do already.
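As a rough illustration of a user definition hiding a predefined identifier, the sketch below uses `string` as a stand-in for the hypothetical 'rchar'. It assumes a present-day D compiler, where `string` is an alias supplied by the implicitly imported `object` module rather than a keyword (that alias is not something this thread mentions). The struct and the function `userStringLength` are hypothetical.

```d
import std.stdio;

// User code that shares its spelling with the predefined alias `string`.
// Module-level declarations are found before implicitly imported ones.
struct string
{
    int length;
}

int userStringLength()
{
    string s = string(7); // resolves to the struct above, not to text
    return s.length;
}

void main()
{
    writeln(userStringLength());
}
```

Only this module sees the user's `string`; other modules keep the predefined meaning. The same module could not do this with `int` or `real`, because those are keywords.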
 
The *only* argument you are making is that this change adds the ability 
to do a few additional confusing things. Therefore it shouldn't be done, 
despite the benefits. That doesn't wash. Besides, you can still make it 
illegal to redefine 'int'.
No. You seem to ignore the costs of your little operation: it would probably take valuable time to implement. It would require complex scoping rules to be created. In addition to the potential for abuse.
IMHO not as bad as you think, and will never be any easier to implement than now. D already has a bunch of predefined identifiers (Object, max, nan, infinity, sizeof...) - does that bother you?
 
If you want it to be illegal to redefine int, you can *still* make int a 
predefined identifier, not a keyword, and make it illegal to redefine 
it, in whatever contexts you think it might be harmful, including all of 
them. I'm not going to repeat all the reasons why this is different from 
int being a keyword, you can go check out the other thread "Why are type 
names keywords".
So then you need to introduce complex scoping rules for predefined identifiers just so you can use language keywords in certain special scopes where they are not meaningful. Does this make sense over a simple "no use" rule across the language?
The existing scope rules are fine. The 'no use' rule has the problems I have mentioned, it's not future-proof.
 
 
If you insist on using 'int' as an example and fail to recognize that 
the language has both a past (C code compatibility)
D does not have C code compatibility. It would be hazardous to your mental health to keep believing that. If you fail to recognize that then you should work on it.
You can link to external C code. This is what I meant. This compatibility is degraded by having too many keywords.
 
 
This suggestion is separate from the suggestion that type names should 
not be keywords; either could be adopted without the other.
This makes a _lot_ more sense than allowing people to name their functions "delegate". You should push for this suggestion instead.
Actually, I *am* exactly proposing that people be allowed to name their functions "delegate". I'm just proposing a method of doing it despite the presence of a keyword "delegate". Note that foo() and 'foo'() ...would be identical.
Well, no they wouldn't (if foo were a keyword), because you'd have to always use quotes to access foo. In other words, it's almost as if the quotes became part of the identifier and they were taken off only for C-compat.
I'm just saying that by declaring int 'delegate'(), you are naming a function "delegate", plain and simple. You said 'this makes more sense than allowing people to name their functions "delegate"'. I found that statement very confusing and hoped to clarify things. I see now what you were getting at; but I have never proposed removal of any keywords other than those which are manifestly unnecessary for parsing; 'delegate' isn't in that list.
Allowing redefinition of "this" is insane IMO.
I agree that it's no huge improvement. 'Insane', no. In fact, 'this' in C++ (not D) is a perfect example of something that doesn't need to be a keyword. It is *only* meaningful in contexts where a member function parameter has scope, and has exactly the same grammar treatment as an identifier. Nothing is gained by making it a keyword. In D the 'this' keyword is doing more work, in constructor declarations, etc, so the situation is different.
Unfortunately, D is not C++ and in D, this has almost universal meaning. In fact, at the base scope (module-level, I assume), 'this' is relevant, due to static ctors/dtors. My suggestion that redefining 'this' is insane applies to D, not C++, or I would have said so. Since apparently you didn't like my redefining int that much, let me propose redefining this then ;)
Well, that caused me a lot of confusion since I explicitly said I was talking about removing keyword 'this' in C++ and not D. The parallel is that 'wchar' in D, like 'this' in C++, and 'Object' in D, doesn't actually need to be a keyword. Of these three, only 'Object' has the benefit of being a predefined identifier rather than a keyword.
 
 I suppose you would have no problem with the following:
 

 
 Nope, no confusion in sight.
Do you have a problem with ... because I think that's perfectly legal, and isn't any less confusing. Construct all the confusing examples you like.
 
 
But if you had some  old C code that defined a struct member 'this', you 
wouldn't have to change that to convert it into our mutant C++, whereas 
with real C++, you do.
What about the simple suggestion to allow quoted identifiers? extern (C) 'this'(int foo); // for a function called 'this'.
I was discussing converting old C source code into C++, not linking old C code to D. If C++ 'this' were *not* a keyword, I wouldn't need to change any 'this' identifiers in the C code, since none of them could conflict with the C++ meaning.
If you are converting old C code to D chances are you are already doing some heavy lifting.
Again, I'm not talking about converting C to D; I'm talking about the general effects of adding new keywords. The task of converting C source to C++ is just a real world example of where these effects show up. They will also show up if/when new keywords are added to D. -Greg
Aug 04 2005
parent reply AJG <AJG_member pathlink.com> writes:
Hi,

Yes, you can construct cases where there is confusion. This does not 
change the fact that there are useful implications.
The minimal useful "implications" are overshadowed. Moreover, even these implications can be achieved thru far simpler, far cheaper solutions: Either the quotes proposal, or a simple global search and replace.
 In addition, the feature has _very_ _little_ _benefit_. In other words, it is
 not worth introducing the great potential for confusion for such a minuscule
 gain in functionality. The juice is not worth the squeeze.
We'll have to agree to differ on that. I keep bringing up C++, not because I'm confusing D with C++, but because, more than once, I've been bitten by C++ adding new keywords (either relative to C, or to its own former self).
Yes, we'll have to differ. I haven't once run into a single keyword collision, in any language I've used. Maybe I've just been lucky. :p
 what is he supposed to think? It kinda looks like "use this version if D is
 compiled with real number support." Or perhaps "use this version if you want
 real number precision."
 
Or you could read the manual. The relevant page is a google and about 3 clicks away :-). I had no idea at all how 'version' worked until I read the manual.
The more intuitive you make things, the better overall. That's just common sense.
And again, it's not the existing 
keywords that are the issue, it's the new ones that will be defined later.
Frankly, I don't see a flood of new primitive types and keywords. I'd speculate maybe a couple per year, since Walter likes to reuse his keywords. That's just a risk you're gonna have to take. Btw, perhaps you shouldn't be naming your things so close to already existing keywords.
I had trouble with C++, at a much lower rate of keyword growth.
When you say you "had" trouble, does this mean: "Well, there was this one time where one identifier collided with a keyword;" or "Heck, I ran into keywords left and right. It seemed the very language evolved against my code!"
 There's char, there's dchar, there's wchar. It doesn't take a genius to figure
 out the naming convention here. Therefore it also doesn't take a genius to
 figure out that rchar is particularly prone. Also, here's another quick tip:
 keywords in D are all lowercase. Hint, hint.
 
Right. But the problems I had with C++ were with code others had written. If you have full control over all of the code you work with, none of this matters nearly as much. Lucky you.
As a matter of fact I don't. Despite this fact it's never been a problem.
I don't intend to stop using local variable names consisting entirely of 
lowercase letters. And, the set of likely new type names is (a) not 
quite as predictable as that and (b) not utterly disjoint from the set 
of likely names for local vars, parameter names, and member names, all 
of which are lowercase by many coding conventions.
In D, it's actually predictable to a certain degree:

stmt is unlikely to become a keyword. statement could. expr, wouldn't. expression, could.
* The one exception I can think of is "const" which I believe was kept due to C. Had Walter had his choice, I think he would have gone with "constant."

Verbs/prepositions: in/do/is/for/a/as/be.
Things like: while/when/where/what/why.

Moreover, if you are talking about _types_, it is a good naming convention to capitalize those (and keywords are all lowercase): Database; Statement; Result;

I think following those simple conventions almost guarantees not running into a keyword. Language designers (D included) are not stupid. "i" and "j" won't become keywords.
flawed premise, see above. The main point is, if you are redefining the 
new built-in type 'rchar', you are doing it because it wasn't there when 
you wrote the code. So, you want the new 'rchar' type to be hidden in 
all places where the user's 'rchar' is in scope. Which is what the 
existing scope rules do already.
AND
I think you are mistaken, or just being evasive. I remember seeing 
something about how a D local isn't allowed to hide a local in an 
enclosing scope,which is fine; but a local variable *can* hide a global. 
I just tried it.
I'm sorry about this. I'm confused ATM about scopes/identifiers because apparently the D docs say one thing and Walter "said" another. I'm not sure what I'm supposed to conclude.
I agree there is a risk of a global namespace being affected by 
modifications to the local namespace, but the effect of that is 
contained to the locality of the change, so it's reasonable to expect 
that to be addressed by diligence (and via variable naming conventions).
The "containment" you speak of is very leaky. Throughout the code that redefines rchar, the original rchar would be subtly hidden. So if a programmer with no knowledge of this comes along (since, as you said, you are not the only one involved), and uses rchar, then there will be trouble. Whereas as it stands, simple knowledge of D itself (not of whatever "variations" the specific code imposes) is enough to understand unambiguously what (the hypothetical) rchar is. IMHO it is a much safer assumption to think a programmer will know the language itself before whatever else is written with it.
 No. You seem to ignore the costs of your little operation: it would probably
 take valuable time to implement. It would require complex scoping rules to be
 created. In addition to the potential for abuse.
IMHO not as bad as you think, and will never be any easier to implement than now. D already has a bunch of predefined identifiers (Object, max, nan, infinity, sizeof...) - does that bother you?
No, because right now I couldn't accidentally redefine those to hide the real ones. For instance, "real.max" will always be real.max, not some user defined thing.
If you want it to be illegal to redefine int, you can *still* make int a 
predefined identifier, not a keyword, and make it illegal to redefine 
it, in whatever contexts you think it might be harmful, including all of 
them. I'm not going to repeat all the reasons why this is different from 
int being a keyword, you can go check out the other thread "Why are type 
names keywords".
So then you need to introduce complex scoping rules for predefined identifiers just so you can use language keywords in certain special scopes where they are not meaningful. Does this make sense over a simple "no use" rule across the language?
The existing scope rules are fine. The 'no use' rule has the problems I have mentioned, it's not future-proof.
Are you saying the variable scoping rules could be applied _identically_ to type redefinition? That would yield some interesting results. Moreover, the language will _never_ be future-proof. If you name your type something that becomes a non-type keyword, you are skunked too my friend.
If you insist on using 'int' as an example and fail to recognize that 
the language has both a past (C code compatibility)
D does not have C code compatibility. It would be hazardous to your mental health to keep believing that. If you fail to recognize that then you should work on it.
You can link to external C code. This is what I meant.
You can't "link" to code. You link to objects, which are no longer code. Once again, D does not have C code compatibility. I don't know why you keep bringing this up. D has C library _link_ compatibility, which is quite a different beast. Most languages can link to C too. D's syntax being _similar_ to C has little relevance in this respect.
This compatibility is degraded by having too many keywords.
Furthermore, clearly the definition of "too many" is quite subjective and relative. If you want to see some languages with truly "too many" keywords, then I can show you plenty.
This suggestion is separate from the suggestion that type names should 
not be keywords; either could be adopted without the other.
This makes a _lot_ more sense than allowing people to name their functions "delegate". You should push for this suggestion instead.
Actually, I *am* exactly proposing that people be allowed to name their functions "delegate". I'm just proposing a method of doing it despite the presence of a keyword "delegate". Note that foo() and 'foo'() ...would be identical.
Well, no they wouldn't (if foo were a keyword), because you'd have to always use quotes to access foo. In other words, it's almost as if the quotes became part of the identifier and they were taken off only for C-compat.
I'm just saying that by declaring "int 'delegate'()", you are naming a function "delegate", plain and simple. You said 'this makes more sense than allowing people to name their functions "delegate"'. I found that statement very confusing and hoped to clarify things. I see now what you were getting at; but I have never proposed removal of any keywords other than those which are manifestly unnecessary for parsing; 'delegate' isn't in that list.
Then why did your example have delegate expressly in it??? Now who's being evasive? ;)
Allowing redefinition of "this" is insane IMO.
I agree that it's no huge improvement. 'Insane', no. In fact, 'this' in C++ (not D) is a perfect example of something that doesn't need to be a keyword. It is *only* meaningful in contexts where a member function parameter has scope, and has exactly the same grammar treatment as an identifier. Nothing is gained by making it a keyword. In D the 'this' keyword is doing more work, in constructor declarations, etc, so the situation is different.
Unfortunately, D is not C++ and in D, this has almost universal meaning. In fact, at the base scope (module-level, I assume), 'this' is relevant, due to static ctors/dtors. My suggestion that redefining 'this' is insane applies to D, not C++, or I would have said so. Since apparently you didn't like my redefining int that much, let me propose redefining this then ;)
Well, that caused me a lot of confusion, since I explicitly said I was talking about removing keyword 'this' in C++ and not D. The parallel is that 'wchar' in D, like 'this' in C++, and 'Object' in D, doesn't actually need to be a keyword. Of these three, only 'Object' has the benefit of being a predefined identifier rather than a keyword.
Mostly because Object is not a primitive. That's a big difference.
But if you had some  old C code that defined a struct member 'this', you 
wouldn't have to change that to convert it into our mutant C++, whereas 
with real C++, you do.
What about the simple suggestion to allow quoted identifiers? extern (C) 'this'(int foo); // for a function called 'this'.
I was discussing converting old C source code into C++, not linking old C code to D. If C++ 'this' were *not* a keyword, I wouldn't need to change any 'this' identifiers in the C code, since none of them could conflict with the C++ meaning.
If you are converting old C code to D chances are you are already doing some heavy lifting.
Again, I'm not talking about converting C to D; I'm talking about the general effects of adding new keywords.
You keep going back and forth using C code as a point and then saying you are not "talking about" it. But you _were_ talking about C code just there. Also, I'm sure converting C code to COBOL would pose some difficulties, and I'm sure adding new keywords to COBOL has its own set of problems, but I wouldn't bring it up as proof of my point.
The task of converting C source 
to C++ is just a real world example of where these effects show up. They 
  will also show up if/when new keywords are added to D.
If the conversion is going to take place, then you are back to what I said about the "heavy lifting." You can't have your cake and eat it. Are you going to convert code to D or not? Cheers, --AJG.
Aug 05 2005
parent Greg Smith <greg siliconoptix.com> writes:
AJG wrote:

And again, it's not the existing 
keywords that are the issue, it's the new ones that will be defined later.
Frankly, I don't see a flood of new primitive types and keywords. I'd speculate maybe a couple per year, since Walter likes to reuse his keywords. That's just a risk you're gonna have to take. Btw, perhaps you shouldn't be naming your things so close to already existing keywords.
I had trouble with C++, at a much lower rate of keyword growth.
When you say you "had" trouble, does this mean: "Well, there was this one time where one identifier collided with a keyword;" or "Heck, I ran into keywords left and right. It seemed the very language evolved against my code!"
One time it was more than one. But it makes no difference; what matters is the difference between "I have these 10K lines of code which work, out of the box" and "I have to spend a bunch of time changing things, and now I possibly have to maintain a local mod of 3rd-party source, because the other guys prefer to use the old compiler which I can't use for some other reason". The maintenance issue can easily overshadow the effort of making the change.
 
 
 The "containment" you speak of is very leaky. Throughout the code that
 redefines rchar, the original rchar would be subtly hidden. So if a programmer
 with no knowledge of this comes along (since, as you said, you are not the only
 one involved), and uses rchar, then there will be trouble.
 
Yes. But this difficulty is only encountered when you change the code, not when you upgrade the compiler; and it's local to the change. I prefer it to the other difficulty.
 
 No, because right now I couldn't accidentally redefine those to hide the real
 ones. For instance, "real.max" will always be real.max, not some user defined
 thing.
 
What does a.max mean? 'a' could be from 'alias int a', or a class object. My point is, the required scoping rules are clearly already in place.
So then you need to introduce complex scoping rules for predefined identifiers
just so you can use language keywords in certain special scopes where they are
not meaningful. Does this make sense over a simple "no use" rule across the
language?
The existing scope rules are fine. The 'no use' rule has the problems I have mentioned, it's not future-proof.
Are you saying the variable scoping rules could be applied _identically_ to type redefinition? That would yield some interesting results.
The scoping rules already apply to types! Every time you define an alias, or a typedef, or a class, or you import a type from a module, you are making some identifier refer to a type. There are scoping rules for that.
 
 Moreover, the language will _never_ be future-proof. If you name your type
 something that becomes a non-type keyword, you are skunked too my friend.
 
The point is to reduce the probability. New type names are more likely than other kinds of keywords. There are a lot of specialized C compilers (for DSPs, etc) which define special types for the environment; the same may happen to D; the new types should be predefined identifiers, not keywords, and this will be easier if the existing ones are too.
You can link to external C code. This is what I meant.
You can't "link" to code. You link to objects, which are no longer code. Once again, D does not have C code compatibility. I don't know why you keep bringing this up. D has C library _link_ compatibility, which is quite a different beast. Most languages can link to C too. D's syntax being _similar_ to C has little relevance in this respect.
?? If I write "int is_it_prime(int n);" in C, and leave it in a .c file, then I can write "extern (C) int is_it_prime( int n);" in D, and link them together and call the func from D. No? That's what I mean by 'linking to external C code'. I use the linker. The external code is in C. The linker obviously cannot process C source, so yes, it needs to be compiled, but it's still 'C code' due to its origins. I know how things like this can be confusing; when I first read your 'You link to objects', I thought you meant 'class' objects rather than .o files, and was temporarily baffled. But I took the time to understand what you actually meant. I've been linking object modules (not objects) using linkers since about 1980, so please try to give me the benefit of the doubt for any confusing phraseology.
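To make the example concrete, here is a minimal sketch of what the C side might look like. Only the prototype comes from the post; the file name and the trial-division body are assumptions for illustration.

```c
/* is_it_prime.c (hypothetical file name): the C half of the example.
   The signature matches the post; the body is an illustrative assumption. */
int is_it_prime(int n)
{
    if (n < 2)
        return 0;                       /* 0, 1 and negatives are not prime */
    for (int d = 2; d <= n / d; d++) {  /* same as d*d <= n, without overflow */
        if (n % d == 0)
            return 0;                   /* found a divisor */
    }
    return 1;
}
```

Compiled to an object module, this is what the D declaration extern (C) int is_it_prime(int n); would be linked against. Nothing in the C source needs to know D exists, which is why a name collision on the D side cannot be fixed from the C side without editing the library.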
I'm just saying that by declaring "int 'delegate'()", you are naming a 
function "delegate", plain and simple. You said 'this makes more sense 
than allowing people to name their functions "delegate"' . I found that 
statement very confusing and hoped to clarify things. I see now what you 
were getting at; but I have never proposed removal of any keywords other 
than those which are manifestly unnecessary for parsing; 'delegate' 
isn't in that list.
Then why did your example have delegate expressly in it??? Now who's being evasive? ;)
Why? Exactly for that reason! I was trying to avoid confusion (ha!). Stay with me here. I didn't want to use a type name in the example, since I've proposed that they *not* be keywords, and I wanted it to be very clear that the word used in the example was definitely a keyword, hopefully avoiding any confusion with the other proposal. Also, 'delegate' seems fairly prone to a collision with a C function in some weird library.
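As a rough sketch of the kind of collision meant here, the following is perfectly valid C. The library is hypothetical and every name in it is invented for illustration; the point is only that each identifier is ordinary in C yet is a D keyword from the list given earlier in the thread.

```c
/* Hypothetical third-party C library code. All names are invented;
   each is a legal C identifier that happens to be a D keyword. */
struct foo {
    int version;   /* 'version' is a D keyword */
    int final;     /* so is 'final' */
};

/* A C function whose name is the D keyword 'delegate'. */
int delegate(struct foo *p, int id)
{
    if (p == 0)
        return -1;     /* reject a null pointer */
    p->version = id;   /* record which id was delegated */
    p->final = 1;      /* mark the record as complete */
    return id;
}
```

A C compiler accepts this without complaint. On the D side, neither the function name nor the two struct members can be written in a declaration as things stand, which is the gap the quoted-identifier proposal is meant to close.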
 
 Mostly because Object is not a primitive. That's a big difference.
Sure, but I don't see it as that big that it needs to impact the grammar. They're all built-in types.
Again, I'm not talking about converting C to D; I'm talking about the 
general effects of adding new keywords. 
You keep going back and forth using C code as a point and then saying you are not "talking about" it. But you _were_ talking about C code just there. Also, I'm sure converting C code to COBOL would pose some difficulties, and I'm sure adding new keywords to COBOL has its own set of problems, but I wouldn't bring it up as proof of my point.
You are getting very close by bringing up COBOL. If I had an example of how an unnecessary keyword could be easily removed from the Arcturian language Zlatwold-II, without degrading it in any way, and how that would make converting code from the similar language Zlatwold-I (*not* C) easier, and if everybody knew the languages enough to make sense of the example, then maybe I would have used that example to illustrate my point. I've used C/C++ instead of Zlatwold-I/II. Any similarity between C/C++ and D is not relevant. I'm not talking about *converting* *C* *code* *to* *D*, but I was, as you say, talking about C. I don't see any "going back and forth" there. The relevance is to when you have to someday convert D-I to D-II, and D-II may have additional, unnecessary, keywords.
 If the conversion is going to take place, then you are back to what I said
 about the "heavy lifting." You can't have your cake and eat it. Are you going
 to convert code to D or not?
Why is this relevant??? The only relationship between C and D I've discussed is via the extern(C) linking process. -Greg
Aug 05 2005
prev sibling parent J C Calvarese <technocrat7 gmail.com> writes:
In article <dctfjq$gs3$1 digitaldaemon.com>, Greg Smith says...
J C Calvarese wrote:


 I think keywords serve a purpose ("This identifier is off-limits"). It
 hurts my head to consider all of the possible pitfalls.
 
But off-limits *everywhere*? I can't have a module called 'mmedia.real.realaudio', simply because 'real' is a keyword and the parser would balk. If 'real' were a predefined identifier, this would work, and would not create a harmful redefinition of 'real'.
It'd be nice if module names were more flexible, but all kinds of practical effects of such a naming would either have to be specifically prohibited by the compiler or the programmer would have to watch out for any clashes (see my "real.max" discussion below). I think Walter has decided it's easier just to prohibit the name altogether and make it a keyword than come up with more complex rules about what happens to be barely allowed and barely disallowed. (You can compile a module "int.module.import.real.typeof.3d", but if you try to import it, your computer will crash.)
In my view, "This identifier is off-limits" is precisely the collateral 
damage *caused* by keywords, *not* their purpose. They serve to provide 
a set of readable punctuation marks to guide the parser, and the 
downside is that they are reserved in all possible contexts, whereas 
normal identifiers are scoped.
Well, I'm not sure about the whole cause-and-effect issue, but the reality is we have to program with the allowable syntax of the language. If we want the benefit of having the functionality provided by a keyword, we can't use it as an identifier, too.
This is more of an issue when you consider the likely possibility that 
new built-in types will be added to the language in the future; if they 
are added as keywords, they will break existing code; but if they are 
added as predefined identifiers, they won't.

Yes, it's easy to laugh at silly examples where 'int' is used as a local 
variable of type 'char'.  We could sit here all day writing dangerously 
misleading code, without having to propose language changes to make that 
I'm not trying to be humorous. I'm exploring the possible ramifications of your proposal. It's not a matter to be taken lightly. I'm getting a vibe from your many lengthy posts that you think this is definitely the way to go, but I see many downsides and not much (if any) upside. Throwing away all of those oh-so-pesky rules would have real consequences. Even opening up module names could allow all kinds of disturbing code. If you have a module called real, what would happen if you had a function called max? These days "real.max" means "largest representable value that's not infinity for a real type". Isn't it ambiguous with a "real.max" function thrown into the mix?
possible. "alias wchar rea1;" comes to mind... You'd feel less silly, 
Don't change the subject. ;)
and more p-d off, if you had used 'rchar' as a perfectly good local 
variable name, or struct member name, and it later became a new built-in 
type (and a new keyword). And by the way, it's possible to redefine 
I think Walter would agree that adding a new keyword after 1.0 is established would only be done after significant deliberation. That's why cent (signed 128 bits) and ucent (unsigned 128 bits) are already reserved for future use even though it's unclear when they'll be implemented.
'integer' in Pascal, and nobody has ever seemed to complain of their 
head hurting, or even notice, for that matter. They just don't do it. 
[Ada too, I think]. I have never once proposed that anybody should be 
encouraged to redefine int; in fact, this can be prohibited without 
making it a keyword.
So why does it matter that it's not technically a "keyword" if we should prohibit this anyway?
There's IMHO an issue in D, arising from the fact that there are a lot 
of keywords; and while I believe that more than 20 of them are 
unnecessary (all the type names, and true/false), even if they are 
eliminated there will still be a problem as follows:
The true and false keywords are a different situation than the types, but I can see a reason to make them keywords: to compel consistency. If they aren't keywords, one programmer can say "true = 0" and another can say "false = 99". If everyone got creative with defining their own true and false, it'd be that much harder to understand someone else's code. Someone can use True and False (or TRUE and FALSE) if a need arises.
     extern(C) int delegate( foo*p, int id );	// Can't do it!

I simply can't interface to this existing C function since its name is a 
D keyword. Almost 100 C identifiers are simply 'off-limits', and there 
is no real solution to this unless you can modify the C code.
Are they commonly used? (No hypotheticals, how often have you seen these actually used?)
I've previously proposed that a lexical convention be added
to escape identifiers from keyword recognition, e.g.

     extern(C) int 'delegate'( foo*p, int id );	

	...
    if( can_delegate )  'delegate' ( foop, 0); // call C function

    int 'if' = 0;   'if'++;   // now possible, if you really want to.
    char x = 'c';	// still a char constant if only 1 char

    import commlib.'interface'.localapi;	// interface is a keyword

Or, w"delegate".

This suggestion is separate from the suggestion that type names should 
not be keywords; either could be adopted without the other.
This is a whole other argument. I don't see a problem with this except that apostrophes are already used for character literals. I guess they could be reused here, but it might be worth it to use another symbol. Could we go to Unicode for a purpose such as this?
 So then we could call a method "this". Could we use the "this" of
 that method's class? :x
FWIW, if 'this' were converted from a D keyword to a predefined identifier, it would not be so simple, since 'this' in the class namespace would be the constructor. I don't know enough about all the details of D classes to comment further.
I agree with this. Well, it seems we still mostly disagree and I don't expect you to suddenly agree with me on everything, so maybe we should just agree to disagree. jcc7
Aug 04 2005