digitalmars.D - Octal literals: who uses this?

Christopher Wright <dhasenan gmail.com> writes:
I've been looking at dil and lexing D. Lexing character literals and 
string literals is not quite so easy as I thought it would be, but 
overall not difficult either.

One thing I'm curious about:
There are three forms of hex literals:
\x: 2 digits
\u: 4 digits
\U: 8 digits

There is one form of octal literal:
\: 1 to 3 digits

Why? With hex literals, each option is a fixed width. That is sensible.

Octal literals aren't necessary with hex literals, but they might be 
convenient. However, making them variable width seems like it opens up 
the possibility for obscure bugs. I would not recommend that anyone use 
octal literals, and I don't think they're an advantage to the language. 
Even if they were, their current representation is not.

Can we just remove this?
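
To make the width difference concrete, here is a rough C sketch of how a lexer might read the two kinds of escape. The function names are made up for illustration and are not taken from dil or dmd; the only facts assumed are the widths listed above (2, 4 or 8 hex digits, 1 to 3 octal digits).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration; not taken from dil or dmd.
   `s` points at the first character after the backslash (octal)
   or after the x/u/U letter (hex). */

/* Returns the value of a hex digit, or -1 if c is not one. */
int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Fixed width: exactly `width` hex digits, or the escape is malformed.
   Returns the number of digits consumed, 0 on failure. */
size_t lex_hex_escape(const char *s, size_t width, unsigned long *out)
{
    unsigned long value = 0;
    assert(width == 2 || width == 4 || width == 8);
    for (size_t i = 0; i < width; i++) {
        int d = hex_digit_value(s[i]);
        if (d < 0) return 0;
        value = value * 16 + (unsigned long)d;
    }
    *out = value;
    return width;
}

/* Variable width: greedily takes up to three octal digits, so where the
   escape ends depends on whatever characters happen to follow it.
   Returns the number of digits consumed, 0 if there are none. */
size_t lex_octal_escape(const char *s, unsigned *out)
{
    size_t n = 0;
    unsigned value = 0;
    while (n < 3 && s[n] >= '0' && s[n] <= '7') {
        value = value * 8 + (unsigned)(s[n] - '0');
        n++;
    }
    *out = value;
    return n;
}
```

With these, \0, \01 and \012 consume one, two and three digits respectively, while \08 stops after the 0 and leaves the 8 as an ordinary character. The hex reader never has to look at what follows the escape.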
Mar 14 2009
Sean Kelly <sean invisibleduck.org> writes:
Christopher Wright wrote:
 I've been looking at dil and lexing D. Lexing character literals and 
 string literals is not quite so easy as I thought it would be, but 
 overall not difficult either.
 
 One thing I'm curious about:
 There are three forms of hex literals:
 \x: 2 digits
 \u: 4 digits
 \U: 8 digits
 
 There is one form of octal literal:
 \: 1 to 3 digits
 
 Why? With hex literals, each option is a fixed width. That is sensible.
 
 Octal literals aren't necessary with hex literals, but they might be 
 convenient. However, making them variable width seems like it opens up 
 the possibility for obscure bugs. I would not recommend that anyone use 
 octal literals, and I don't think they're an advantage to the language. 
 Even if they were, their current representation is not.
 
 Can we just remove this?

All the escaped literals are going away, I believe.
Mar 14 2009
Stewart Gordon <smjg_1998 yahoo.com> writes:
Sean Kelly wrote:
<snip>
 All the escaped literals are going away, I believe.

I think all that's happening there is the removal of escaped characters not enclosed in quotes.

Stewart.
Mar 14 2009
Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Sat, Mar 14, 2009 at 9:13 AM, Christopher Wright <dhasenan gmail.com> wrote:
 I've been looking at dil and lexing D. Lexing character literals and string
 literals is not quite so easy as I thought it would be, but overall not
 difficult either.

 One thing I'm curious about:
 There are three forms of hex literals:
 \x: 2 digits
 \u: 4 digits
 \U: 8 digits

 There is one form of octal literal:
 \: 1 to 3 digits

 Why? With hex literals, each option is a fixed width. That is sensible.

 Octal literals aren't necessary with hex literals, but they might be
 convenient. However, making them variable width seems like it opens up the
 possibility for obscure bugs. I would not recommend that anyone use octal
 literals, and I don't think they're an advantage to the language. Even if
 they were, their current representation is not.

People use octal? Agreed.
Mar 14 2009
Stewart Gordon <smjg_1998 yahoo.com> writes:
Christopher Wright wrote:
<snip>
 Octal literals aren't necessary with hex literals, but they might be 
 convenient. However, making them variable width seems like it opens up 
 the possibility for obscure bugs. I would not recommend that anyone use 
 octal literals, and I don't think they're an advantage to the language. 
 Even if they were, their current representation is not.
 
 Can we just remove this?

One octal literal is very commonly used: \0. At least save this one. Just don't go allowing things like "\012" to mean ['\0', '1', '2'].

Stewart.
Mar 14 2009
Walter Bright <newshound1 digitalmars.com> writes:
Christopher Wright wrote:
 Can we just remove this?

The octal literals are done the way C does them. The reason they are there are for when translating C code to D code, obscure bugs are not introduced.
Mar 14 2009
Stewart Gordon <smjg_1998 yahoo.com> writes:
Walter Bright wrote:
<snip>
 The octal literals are done the way C does them. The reason they are 
 there are for when translating C code to D code, obscure bugs are not 
 introduced.

How would making them illegal not achieve this aim?

Stewart.
Mar 14 2009
BCS <none anon.com> writes:
Hello Stewart,

 Walter Bright wrote:
 <snip>
 The octal literals are done the way C does them. The reason they are
 there are for when translating C code to D code, obscure bugs are not
 introduced.
 

 How would making them illegal not achieve this aim?
 Stewart.

Unless you also drop \0, any octal literal starting with 0 will get lexed incorrectly.
Mar 14 2009
Walter Bright <newshound1 digitalmars.com> writes:
Stewart Gordon wrote:
 Walter Bright wrote:
 <snip>
 The octal literals are done the way C does them. The reason they are 
 there are for when translating C code to D code, obscure bugs are not 
 introduced.

 How would making them illegal not achieve this aim?

The only point to making them illegal would be to eventually remove them completely, which puts us back to \00 meaning something different in D than in C.
Mar 14 2009
Don <nospam nospam.com> writes:
Walter Bright wrote:
 Stewart Gordon wrote:
 Walter Bright wrote:
 <snip>
 The octal literals are done the way C does them. The reason they are 
 there are for when translating C code to D code, obscure bugs are not 
 introduced.

 How would making them illegal not achieve this aim?

 The only point to making them illegal would be to eventually remove them completely, which puts us back to \00 meaning something different in D than in C.

The "Obscure bugs during translation from C" argument presumes that such errors are more likely than ones such as:

int powersOfTen[] = {
    0001, // okay
    0010, // error: this is 8, not 10
    0100, // error: this is 64, not 100
    1000, // okay
};

and what the heck does "\000000\000000000\000\0000" mean?

I doubt there is much extant C code which uses octal. Automated translations of octal literals can be done accurately, and you're even supplying the 'htod' converter!

Note that C# doesn't have octal literals, but does include \0. So there's a precedent for dropping them. This also means that right now, converting code from C# to D can also introduce obscure bugs. I'd argue that that's a scenario that is at least as likely as bugs from C.

I think the argument for octal is very, very weak.
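
For anyone who wants to check the two examples in this post, the following is a small C sketch of what a C compiler makes of them. It assumes nothing beyond standard C literal rules: a leading 0 makes an integer literal octal, and a backslash followed by digits takes at most three octal digits. The helper puzzle_length is a made-up name for illustration.

```c
#include <stddef.h>

/* A leading 0 makes a C integer literal octal, so the zero padding
   changes the value of the two middle entries. */
const int powersOfTen[] = {
    0001, /* octal 1   == 1 */
    0010, /* octal 10  == 8, not 10 */
    0100, /* octal 100 == 64, not 100 */
    1000, /* decimal 1000 */
};

/* Each backslash greedily takes at most three octal digits; any zeros
   left over are ordinary '0' characters. */
const char puzzle[] = "\000000\000000000\000\0000";

/* Number of bytes in puzzle, excluding the implicit terminator. */
size_t puzzle_length(void)
{
    return sizeof puzzle - 1;
}
```

The array holds 1, 8, 64 and 1000, and puzzle_length() returns 14: a NUL, three '0' characters, a NUL, six '0' characters, two NULs and a final '0'.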
Mar 17 2009
BCS <ao pathlink.com> writes:
Reply to don,


 I think the argument for octal is very, very weak.
 

OTOH even if I grant that, I don't see much reason for dropping them.
Mar 17 2009
Walter Bright <newshound1 digitalmars.com> writes:
Don wrote:
 and what the heck does "\000000\000000000\000\0000" mean?

It doesn't matter, because if you're translating C code to D, the code is probably correct even if you don't know what it means.
 I doubt there is much extant C code which uses octal. Automated 
 translations of octal literals can be done accurately, and you're even 
 supplying the 'htod' converter!

htod is not intended for creating implementation source code. It's just for headers. I expect most C translations will be done by hand.
 Note that C# doesn't have octal literals, but does include \0. So 
 there's a precedent for dropping them. This also means that right now, 
 converting code from C# to D can also introduce obscure bugs. I'd argue 
 that that's a scenario that is at least as likely as bugs from C.

It is a good point, but I don't see people translating C# to D. But I do see translating C to D (I do it myself!).
 I think the argument for octal is very, very weak.

The issue is really the cost of it being in vs the benefit of pulling it out. I see very little cost of leaving it in, so it doesn't need much benefit to make it worthwhile.
Mar 17 2009
BCS <ao pathlink.com> writes:
Reply to Walter,

 It is a good point, but I don't see people translating C# to D. But I
 do see translating C to D (I do it myself!).
 

I am working with a ~11KLOC C# code base and a tool to automatically translate it to D.

"I had a problem, I decided to solve it with reg-ex, now I have 200 problems" <g>
Mar 17 2009
Walter Bright <newshound1 digitalmars.com> writes:
BCS wrote:
 Reply to Walter,
 
 It is a good point, but I don't see people translating C# to D. But I
 do see translating C to D (I do it myself!).

 I am working with a ~11KLOC c# code base and a tool to automatically translate it to D

Color me wrong, then!
Mar 17 2009
BCS <none anon.com> writes:
Hello Walter,

 BCS wrote:
 
 Reply to Walter,
 
 It is a good point, but I don't see people translating C# to D. But
 I do see translating C to D (I do it myself!).
 

 I am working with a ~11KLOC c# code base and a tool to automatically
 translate it to D

 Color me wrong, then!


Not too far off, you just forgot to qualify it with "sane". That said, D is a lot like C#, near enough that most things in console apps translate well. We have plans to release the translator "at some point".
Mar 17 2009
Walter Bright <newshound1 digitalmars.com> writes:
BCS wrote:
 We have plans to release the translator "at some point".

That'll be cool!
Mar 17 2009
Don <nospam nospam.com> writes:
Walter Bright wrote:
 Don wrote:
 and what the heck does "\000000\000000000\000\0000" mean?

 It doesn't matter, because if you're translating C code to D, the code is probably correct even if you don't know what it means.

Note that in C, you can't reasonably have \0 embedded in a string. But in both D and C# you can. So the "\0000" case isn't really a problem for C. It's far more likely in D that someone would write: "1st\02nd\03rd\04th\0"; and expect it to work.
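
As a rough illustration of that last string, here is what the C escape rules do with it. This is a C sketch; it assumes D lexes the escapes the same way, as Walter says earlier in the thread. The names swallowed, separated and count_nuls are made up for the example.

```c
#include <stddef.h>

/* \02, \03 and \04 each swallow the digit that follows the intended NUL,
   giving control bytes 2, 3 and 4 instead of NUL followed by a digit. */
const char swallowed[] = "1st\02nd\03rd\04th\0";

/* One way to keep the digits: end the literal after each \0 and rely on
   adjacent-literal concatenation, which C performs only after the escape
   sequences have been processed. */
const char separated[] = "1st\0" "2nd\0" "3rd\0" "4th\0";

/* Counts the NUL bytes in the first n bytes of p. */
size_t count_nuls(const char *p, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (p[i] == '\0') count++;
    }
    return count;
}
```

Counting over the whole array, swallowed is 14 bytes with only 2 NULs (the explicit \0 and the implicit terminator), while separated is 17 bytes with 5 NULs, which is what the author of such a string presumably wanted.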
 I doubt there is much extant C code which uses octal. Automated 
 translations of octal literals can be done accurately, and you're even 
 supplying the 'htod' converter!

 htod is not intended for creating implementation source code. It's just for headers. I expect most C translations will be done by hand.

The point is that a reasonable fraction of the few remaining instances of octal literals will be machine translated, and will therefore be free from these errors.
 
 Note that C# doesn't have octal literals, but does include \0. So 
 there's a precedent for dropping them. This also means that right now, 
 converting code from C# to D can also introduce obscure bugs. I'd 
 argue that that's a scenario that is at least as likely as bugs from C.

 It is a good point, but I don't see people translating C# to D. But I do see translating C to D (I do it myself!).
 I think the argument for octal is very, very weak.

 The issue is really the cost of it being in vs the benefit of pulling it out. I see very little cost of leaving it in, so it doesn't need much benefit to make it worthwhile.

Inertia is the strongest argument, I think. Octal-related bugs may occur
(1) when translating from ancient C code, if octal is removed;
(2) when translating from C#, if octal is retained;
(3) when writing new D code, if octal is retained.

IMHO, (2) and (3) are more probable than (1). However, all 3 cases are quite unlikely. It's extremely low on the list of priorities.
Mar 18 2009
Christopher Wright <dhasenan gmail.com> writes:
Walter Bright wrote:
 Christopher Wright wrote:
 Can we just remove this?

 The octal literals are done the way C does them. The reason they are there are for when translating C code to D code, obscure bugs are not introduced.

Okay, that makes sense. Removing it would be an option; \0 would have to change to \x00. But it's not a big deal, just an annoying blemish.
Mar 14 2009