digitalmars.D - Octal literals: who uses this?
- Christopher Wright <dhasenan gmail.com> Mar 14 2009
- Sean Kelly <sean invisibleduck.org> Mar 14 2009
- Stewart Gordon <smjg_1998 yahoo.com> Mar 14 2009
- Jarrett Billingsley <jarrett.billingsley gmail.com> Mar 14 2009
- Stewart Gordon <smjg_1998 yahoo.com> Mar 14 2009
- Walter Bright <newshound1 digitalmars.com> Mar 14 2009
- Stewart Gordon <smjg_1998 yahoo.com> Mar 14 2009
- BCS <none anon.com> Mar 14 2009
- Walter Bright <newshound1 digitalmars.com> Mar 14 2009
- Don <nospam nospam.com> Mar 17 2009
- BCS <ao pathlink.com> Mar 17 2009
- Walter Bright <newshound1 digitalmars.com> Mar 17 2009
- BCS <ao pathlink.com> Mar 17 2009
- Walter Bright <newshound1 digitalmars.com> Mar 17 2009
- BCS <none anon.com> Mar 17 2009
- Walter Bright <newshound1 digitalmars.com> Mar 17 2009
- Don <nospam nospam.com> Mar 18 2009
- Christopher Wright <dhasenan gmail.com> Mar 14 2009
I've been looking at dil and lexing D. Lexing character literals and string literals is not quite so easy as I thought it would be, but overall not difficult either. One thing I'm curious about:

There are three forms of hex literals:
\x: 2 digits
\u: 4 digits
\U: 8 digits

There is one form of octal literal:
\: 1 to 3 digits

Why? With hex literals, each option is a fixed width. That is sensible. Octal literals aren't necessary with hex literals, but they might be convenient. However, making them variable width seems like it opens up the possibility for obscure bugs.

I would not recommend that anyone use octal literals, and I don't think they're an advantage to the language. Even if they were, their current representation is not. Can we just remove this?
Mar 14 2009
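To make the width difference concrete, here is a minimal sketch in C of how a lexer might decode the escape forms listed in the post above. The names `decode_escape` and `hex_value` are hypothetical and are not taken from dil or the D compiler. The sketch assumes the D widths as the post describes them (`\x` exactly 2 hex digits, `\u` 4, `\U` 8); C itself does not fix the width of `\x`. The octal branch greedily takes 1 to 3 octal digits, which is the C rule.

```c
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decode one numeric escape starting at s[0] == '\\'.
   On success, stores the value in *out and returns the number of source
   characters consumed. Returns 0 if the escape is malformed or is not a
   numeric escape at all. */
size_t decode_escape(const char *s, unsigned long *out) {
    size_t width = 0;
    unsigned long v = 0;

    if (s[0] != '\\') return 0;

    /* Fixed-width hex forms, as listed in the post. */
    switch (s[1]) {
        case 'x': width = 2; break;
        case 'u': width = 4; break;
        case 'U': width = 8; break;
        default: break;
    }
    if (width != 0) {
        for (size_t i = 0; i < width; i++) {
            int d = hex_value(s[2 + i]);
            if (d < 0) return 0;      /* too few digits (also stops at the terminator) */
            v = v * 16 + (unsigned long)d;
        }
        *out = v;
        return 2 + width;
    }

    /* Variable-width octal form: take up to 3 octal digits, greedily. */
    size_t n = 0;
    while (n < 3 && s[1 + n] >= '0' && s[1 + n] <= '7') {
        v = v * 8 + (unsigned long)(s[1 + n] - '0');
        n++;
    }
    if (n == 0) return 0;
    *out = v;
    return 1 + n;
}
```

Under these assumptions `\0`, `\00` and `\000` all decode to the same value but consume different amounts of source text, so what an octal escape means depends on the characters that happen to follow it. The hex forms do not have that property.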
Christopher Wright wrote: <snip> Can we just remove this?
All the escaped literals are going away, I believe.
Mar 14 2009
Sean Kelly wrote: <snip> All the escaped literals are going away, I believe.
I think all that's happening there is the removal of escaped characters not enclosed in quotes. Stewart.
Mar 14 2009
On Sat, Mar 14, 2009 at 9:13 AM, Christopher Wright <dhasenan gmail.com> wrote: <snip> I would not recommend that anyone use octal literals, and I don't think they're an advantage to the language. Even if they were, their current representation is not.
People use octal? Agreed.
Mar 14 2009
Christopher Wright wrote: <snip> Octal literals aren't necessary with hex literals, but they might be convenient. However, making them variable width seems like it opens up the possibility for obscure bugs. I would not recommend that anyone use octal literals, and I don't think they're an advantage to the language. Even if they were, their current representation is not. Can we just remove this?
One octal literal is very commonly used: \0. At least save this one. Just don't go allowing things like "\012" to mean ['\0', '1', '2']. Stewart.
Mar 14 2009
Christopher Wright wrote: Can we just remove this?
The octal literals are done the way C does them. The reason they are there is so that, when translating C code to D code, obscure bugs are not introduced.
Mar 14 2009
Walter Bright wrote: <snip> The octal literals are done the way C does them. The reason they are there is so that, when translating C code to D code, obscure bugs are not introduced.
How would making them illegal not achieve this aim? Stewart.
Mar 14 2009
Hello Stewart,
Walter Bright wrote: <snip> The octal literals are done the way C does them. The reason they are there is so that, when translating C code to D code, obscure bugs are not introduced.
Stewart Gordon wrote: How would making them illegal not achieve this aim? Stewart.
Unless you also drop \0, any octal literal starting with 0 will get incorrectly lexed.
Mar 14 2009
Stewart Gordon wrote: Walter Bright wrote: <snip> The octal literals are done the way C does them. The reason they are there is so that, when translating C code to D code, obscure bugs are not introduced.
How would making them illegal not achieve this aim?
The only point to making them illegal would be to eventually remove them completely, which puts us back to \00 meaning something different in D than in C.
Mar 14 2009
Walter Bright wrote: Stewart Gordon wrote: Walter Bright wrote: <snip> The octal literals are done the way C does them. The reason they are there is so that, when translating C code to D code, obscure bugs are not introduced.
How would making them illegal not achieve this aim?
The only point to making them illegal would be to eventually remove them completely, which puts us back to \00 meaning something different in D than in C.
The "Obscure bugs during translation from C" argument presumes that such errors are more likely than ones such as:

int powersOfTen[] = {
    0001, // okay
    0010, // error: this is 8, not 10
    0100, // error: this is 64, not 100
    1000, // okay
};

and what the heck does "\000000\000000000\000\0000" mean?

I doubt there is much extant C code which uses octal. Automated translations of octal literals can be done accurately, and you're even supplying the 'htod' converter!

Note that C# doesn't have octal literals, but does include \0. So there's a precedent for dropping them. This also means that right now, converting code from C# to D can also introduce obscure bugs. I'd argue that that's a scenario that is at least as likely as bugs from C.

I think the argument for octal is very, very weak.
Mar 17 2009
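For readers who want to check the two examples in the post above, here is a small sketch of how a C compiler reads them. The array is the one from the post; the name `puzzle` is a hypothetical label for the quoted string. The comments spell out the split under the C rule that an octal escape takes at most three digits.

```c
#include <stddef.h>

/* A leading zero makes an integer constant octal in C. */
const int powersOfTen[] = {
    0001, /* octal 1    = decimal 1  */
    0010, /* octal 10   = decimal 8  */
    0100, /* octal 100  = decimal 64 */
    1000, /* no leading zero: decimal 1000 */
};

/* Each "\000" is one NUL; any further zeros are ordinary '0' characters:
     \000 000      -> NUL '0' '0' '0'
     \000 000000   -> NUL '0' '0' '0' '0' '0' '0'
     \000          -> NUL
     \000 0        -> NUL '0'
   That is 14 characters, plus the implicit terminator. */
const char puzzle[] = "\000000\000000000\000\0000";

/* Number of characters in the literal, not counting the terminator. */
size_t puzzle_length(void) {
    return sizeof puzzle - 1;
}
```

Note that `strlen(puzzle)` would report 0, because the first character is already a NUL; `sizeof` is used so the embedded NULs are counted.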
Reply to don,
Don wrote: I think the argument for octal is very, very weak.
OTOH even if I grant that, I don't see much reason for dropping them.
Mar 17 2009
Don wrote: and what the heck does "\000000\000000000\000\0000" mean?
It doesn't matter, because if you're translating C code to D, the code is probably correct even if you don't know what it means.
Don wrote: I doubt there is much extant C code which uses octal. Automated translations of octal literals can be done accurately, and you're even supplying the 'htod' converter!
htod is not intended for creating implementation source code. It's just for headers. I expect most C translations will be done by hand.
Don wrote: Note that C# doesn't have octal literals, but does include \0. So there's a precedent for dropping them. This also means that right now, converting code from C# to D can also introduce obscure bugs. I'd argue that that's a scenario that is at least as likely as bugs from C.
It is a good point, but I don't see people translating C# to D. But I do see translating C to D (I do it myself!).
Don wrote: I think the argument for octal is very, very weak.
The issue is really the cost of it being in vs the benefit of pulling it out. I see very little cost of leaving it in, so it doesn't need much benefit to make it worthwhile.
Mar 17 2009
Reply to Walter,
Walter Bright wrote: It is a good point, but I don't see people translating C# to D. But I do see translating C to D (I do it myself!).
I am working with a ~11KLOC C# code base and a tool to automatically translate it to D.
"I had a problem, I decided to solve it with reg-ex, now I have 200 problems" <g>
Mar 17 2009
BCS wrote: Reply to Walter, It is a good point, but I don't see people translating C# to D. But I do see translating C to D (I do it myself!).
I am working with a ~11KLOC c# code base and a tool to automatically translate it to D
Color me wrong, then!
Mar 17 2009
Hello Walter,
BCS wrote: Reply to Walter, It is a good point, but I don't see people translating C# to D. But I do see translating C to D (I do it myself!).
I am working with a ~11KLOC c# code base and a tool to automatically translate it to D
Walter Bright wrote: Color me wrong, then!
Not too far off, you just forgot to qualify it with "sane". That said, D is a lot like C#, near enough that most things in console apps translate well. We have plans to release the translator "at some point".
Mar 17 2009
BCS wrote: We have plans to release the translator "at some point".
That'll be cool!
Mar 17 2009
Don wrote: and what the heck does "\000000\000000000\000\0000" mean?
Walter Bright wrote: It doesn't matter, because if you're translating C code to D, the code is probably correct even if you don't know what it means.
Note that in C, you can't reasonably have \0 embedded in a string. But in both D and C# you can. So the "\0000" case isn't really a problem for C. It's far more likely in D that someone would write: "1st\02nd\03rd\04th\0"; and expect it to work.

Don wrote: I doubt there is much extant C code which uses octal. Automated translations of octal literals can be done accurately, and you're even supplying the 'htod' converter!
Walter Bright wrote: htod is not intended for creating implementation source code. It's just for headers. I expect most C translations will be done by hand.
The point is that a reasonable fraction of the few remaining instances of octal literals will be machine translated, and will therefore be free from these errors.

Don wrote: Note that C# doesn't have octal literals, but does include \0. So there's a precedent for dropping them. This also means that right now, converting code from C# to D can also introduce obscure bugs. I'd argue that that's a scenario that is at least as likely as bugs from C.
Walter Bright wrote: It is a good point, but I don't see people translating C# to D. But I do see translating C to D (I do it myself!).

Don wrote: I think the argument for octal is very, very weak.
Walter Bright wrote: The issue is really the cost of it being in vs the benefit of pulling it out. I see very little cost of leaving it in, so it doesn't need much benefit to make it worthwhile.
Inertia is the strongest argument, I think. Octal-related bugs may occur:
(1) when translating from ancient C code, if octal is removed.
(2) when translating from C#, if octal is retained.
(3) when writing new D code, if octal is retained.
IMHO, (2) and (3) are more probable than (1). However, all 3 cases are quite unlikely. It's extremely low on the list of priorities.
Mar 18 2009
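The "1st\02nd\03rd\04th\0" example from the post above can be checked with a short C sketch. The names `as_written` and `as_intended` are hypothetical. The second literal shows one possible way to get the separators the writer presumably meant, by ending each piece before the digit so that `\0` cannot absorb it; this relies on C's adjacent string literal concatenation and is only an illustration, not a recommendation from the thread.

```c
#include <stddef.h>

/* As written: "\02", "\03" and "\04" are octal escapes with values 2, 3
   and 4, so the digits vanish into the escapes:
     '1' 's' 't' 0x02 'n' 'd' 0x03 'r' 'd' 0x04 't' 'h' NUL   (13 characters) */
const char as_written[] = "1st\02nd\03rd\04th\0";

/* Presumably intended: four NUL-terminated pieces, 16 characters in total.
   Splitting the literal keeps each "\0" from swallowing the next digit. */
const char as_intended[] = "1st\0" "2nd\0" "3rd\0" "4th\0";

/* Character counts, excluding the implicit terminator. */
size_t written_length(void)  { return sizeof as_written - 1; }
size_t intended_length(void) { return sizeof as_intended - 1; }
```

Writing each separator with all three digits, as in "1st\0002nd", has the same effect, since an octal escape stops after the third digit.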
Walter Bright wrote: Christopher Wright wrote: Can we just remove this?
The octal literals are done the way C does them. The reason they are there is so that, when translating C code to D code, obscure bugs are not introduced.
Okay, that makes sense. Removing it would be an option; \0 would have to change to \x00. But it's not a big deal, just an annoying blemish.
Mar 14 2009