digitalmars.D - C locale

"Luís Marques" <luis luismarques.eu> writes:
On my OS X SDK, locale.h has:

     #define	LC_ALL		0
     #define	LC_COLLATE	1
     #define	LC_CTYPE	2
     #define	LC_MONETARY	3
     #define	LC_NUMERIC	4
     #define	LC_TIME		5
     #define	LC_MESSAGES	6

     #define	_LC_LAST	7		/* marks end */

In std.c.locale it has:

     enum LC_CTYPE          = 0;
     enum LC_NUMERIC        = 1;
     enum LC_TIME           = 2;
     enum LC_COLLATE        = 3;
     enum LC_MONETARY       = 4;
     enum LC_ALL            = 6;
     enum LC_PAPER          = 7;  // non-standard
     enum LC_NAME           = 8;  // non-standard
     enum LC_ADDRESS        = 9;  // non-standard
     enum LC_TELEPHONE      = 10; // non-standard
     enum LC_MEASUREMENT    = 11; // non-standard
     enum LC_IDENTIFICATION = 12; // non-standard

The mismatch of course causes problems.

Are the locale enum values supposed to be only source
compatible (not binary compatible)? Should I send a patch for
OS X? For what system are the current D values: Linux? (The
author is Sean Kelly.)
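A quick way to see the mismatch without reading druntime's source is to print the constants as D sees them and compare them by hand with the host's <locale.h>. This is a minimal sketch, assuming the six standard C categories are declared in core.stdc.locale (the module Jacob Carlborg points to further down) for the platform you build on:

```d
// Dump druntime's view of the standard LC_* categories so each value can be
// checked by hand against the #defines in the system's <locale.h>.
import core.stdc.locale : LC_ALL, LC_COLLATE, LC_CTYPE, LC_MONETARY,
                          LC_NUMERIC, LC_TIME;
import std.stdio : writefln;

void main()
{
    writefln("LC_ALL      = %s", LC_ALL);
    writefln("LC_COLLATE  = %s", LC_COLLATE);
    writefln("LC_CTYPE    = %s", LC_CTYPE);
    writefln("LC_MONETARY = %s", LC_MONETARY);
    writefln("LC_NUMERIC  = %s", LC_NUMERIC);
    writefln("LC_TIME     = %s", LC_TIME);
}
```

Against the OS X SDK header above, a correct binding would print 0, 1, 2, 3, 4, 5 in that order; the druntime values quoted above would print 6, 3, 0, 4, 1, 2 instead.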
Sep 26 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 9/26/2013 7:38 PM, "Luís Marques" <luis luismarques.eu> wrote:
 The mismatch of course causes problems.

std.c.locale must match the values in the host system's <locale.h>. If it doesn't, it's a bug.
Sep 26 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 9/27/2013 10:25 AM, "Luís Marques" <luis luismarques.eu> wrote:
 - I asked for what OS the current values were, but for now I assumed they were
 for Linux only. Does anyone besides Sean Kelly know? Is it reasonable to assert
 for the other systems? If not, what's the alternative? Let the compilation fail
 and people wonder why LC_* are not defined?

The idea is:

     version (linux)
     {
         ...
     }
     else version (Windows)
     {
         ...
     }
     else version (OSX)
     {
         ...
     }
     else
     {
         static assert(0);
     }

I.e. the values should be POSITIVELY set for each system, NOT defaulted. The advantages are:

1. when you port to a new system, you get compile-time errors for every place where you need to check/fix the values
2. if you want to fix the values for one system, you don't muck up the values for any other system
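Spelled out for the two systems whose values appear in this thread, the pattern might look like the sketch below. The Linux numbers are the ones quoted from druntime (later checked by H. S. Teoh against /usr/include/locale.h on Debian), and the OS X numbers come from the SDK header at the top. Every other platform, Windows included, deliberately falls through to the static assert until someone verifies its header. The assert message is optional; Walter prefers a bare static assert(0).

```d
// Sketch of "positively set" per-OS constants: there is no shared default block.
version (linux)
{
    // Values quoted from druntime; checked against glibc's <locale.h>.
    enum LC_CTYPE    = 0;
    enum LC_NUMERIC  = 1;
    enum LC_TIME     = 2;
    enum LC_COLLATE  = 3;
    enum LC_MONETARY = 4;
    enum LC_ALL      = 6;
}
else version (OSX)
{
    // Values from the OS X SDK's <locale.h>.
    enum LC_ALL      = 0;
    enum LC_COLLATE  = 1;
    enum LC_CTYPE    = 2;
    enum LC_MONETARY = 3;
    enum LC_NUMERIC  = 4;
    enum LC_TIME     = 5;
    enum LC_MESSAGES = 6;
}
else
{
    // Porting to a new OS stops the build here instead of silently
    // reusing another system's numbers.
    static assert(0, "LC_* values not verified for this platform");
}

void main()
{
    import std.stdio : writeln;
    writeln("LC_ALL = ", LC_ALL);
}
```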
 - Why, oh why, is "linux" the only OS version() identifier that is not
 capitalized?

Because "linux" is what gcc predefines for Linux. (gcc also sets __gnu_linux, __linux__, and __linux, none of which are capitalized. It's the Linux way, not some nefarious plot of mine to disparage it.)
Sep 27 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 9/27/2013 5:28 PM, "Luís Marques" <luis luismarques.eu> wrote:
 BTW, I have more than once wondered why there was no way to specify more
 than one version identifier (is there?), like this:

      version(Windows, OSX)
      {

For the reason you mentioned earlier. If you are changing the OSX values, you'll likely mess up the Windows ones. I've been at this for 30 years, and am quite fed up with the bugs from attempts to save a few keystrokes. The practice of separating the os sections into distinct ones has been a big win in reliability.
Sep 27 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 9/27/2013 6:36 PM, Walter Bright wrote:
 On 9/27/2013 5:28 PM, "Luís Marques" <luis luismarques.eu> wrote:
 BTW, I have more than once wondered why there was no way to specify more
 than one version identifier (is there?), like this:

      version(Windows, OSX)
      {

For the reason you mentioned earlier. If you are changing the OSX values, you'll likely mess up the Windows ones. I've been at this for 30 years, and am quite fed up with the bugs from attempts to save a few keystrokes. The practice of separating the os sections into distinct ones has been a big win in reliability.

And, of course, as you discovered, when they are defaulted they are usually wrong, and they are wrong in a most pernicious, hard to discover way. The code looks right, and may even sort of behave itself. The only way you can tell if it's wrong is to laboriously and tediously go through the system's .h files.
Sep 27 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 9/27/2013 7:18 PM, "Luís Marques" <luis luismarques.eu> wrote:
 My point here is not to argue against your choice for the
 standard library. My issues of version(X, Y) arose in client code
 (non-lib), where it seems to me that the second kind of bug is
 probably more likely to occur than the first kind. So, I politely
 ask, are you sure the language should not support something like
 version(X, Y), for cases where developers think that it is a
 better trade-off than multiple version() blocks?

I understand your point and reasoning, and it has come up repeatedly. It's not obvious why, but that feature (in C and C++) leads to wretched, unmaintainable, buggy, coding horrors. You don't have to take my word for it - do a grep for #if across some C or C++ code that's been maintained by multiple people over a period of years. I include the source code of dmd itself as a (bad) example of #if hell. I've gone to some effort to beat that disease out of the front end, but like barnacles on a boat, they always come back and threaten to sink the ship. The back end is far worse. There are other ways of doing versioning that do not have this result, and I recommend giving them a chance long before reaching for this particular one.
Sep 27 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 9/27/2013 4:44 PM, "Luís Marques" <luis luismarques.eu> wrote:
 On Friday, 27 September 2013 at 19:23:12 UTC, Walter Bright wrote:
     static assert(0);

Do you prefer assert(0) instead of assert(false)?

Do whichever you prefer.
 Is it not worth putting a
 message after the 0/false? (static assert(0, "foo missing"); )

I find assert messages to be redundant, pointlessly repeating what is obvious from the context, and saying things an extra time. But I'm in the minority with that opinion.
 BTW, does that mean that gcc also defines capitalized "OSX", "Posix", etc.?
 (otherwise I don't understand your argument)

If you're looking for gcc naming consistency, you'll be badly disappointed.
Sep 27 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 9/27/2013 7:50 PM, "Luís Marques" <luis luismarques.eu> wrote:
 It really is not my intention to start an argument over something
 inconsequential, but I understood your point was: "'linux' is the
 only OS identifier that is not capitalized because that's what's
 consistent with gcc". But if the other OS identifiers are not
 consistent with gcc then I don't understand your point.

If you're looking for consistency, there isn't any consistent consistency. It's all just a bikeshed issue. I explained why it's "linux", and for better or worse, it is not worth all the grief & disruption changing it.
Sep 27 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 9/27/2013 9:46 PM, "Luís Marques" <luis luismarques.eu> wrote:
 Let's use some lateral thinking. How about a compiler warning if,
 say, a version statement does not match any defined version
 identifier but it would if a case-insensitive comparison was made?

Having some of the language be case sensitive and others case insensitive? Please no.
Sep 27 2013
Jacob Carlborg <doob me.com> writes:
On 2013-09-28 06:46, "Luís Marques" <luis luismarques.eu> wrote:

 Let's use some lateral thinking. How about a compiler warning if,
 say, a version statement does not match any defined version
 identifier but it would if a case-insensitive comparison was made?

It's possible to have user-defined version identifiers:

     module foo;

     version (Foo)
     {
         // when Foo
     }
     else
     {
         // else
     }

     dmd foo.d -version=Foo

--
/Jacob Carlborg
Sep 28 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/27/13 6:52 PM, Walter Bright wrote:
 Is it not worth putting a
 message after the 0/false? (static assert(0, "foo missing"); )

I find assert messages to be redundant, pointlessly repeating what is obvious from the context, and saying things an extra time. But I'm in the minority with that opinion.

On my team we found this to be the case for static asserts. Dynamic asserts are more often preceded by an explanatory comment. A couple of quick examples from a grep search, revealing a grab bag:

     // missing token!
     always_assert(false);

     // should be handled in onfunction / onmethod
     always_assert(false);

     // where do we output n_HEREDOC?
     always_assert(false);

     // unexpected
     assert(IS_STRING_TYPE(cell->m_type));

     assert(IsValidKey(k));

     // Array escalation must not happen during these reserved
     // initializations.
     assert(newp == m_data);

     static_assert(!(KindOfBoolean & KindOfStringBit), "");
     static_assert(keyType != KeyType::Any,
                   "KeyType::Any is not supported in arraySetMImpl");

Andrei
Sep 28 2013
Jacob Carlborg <doob me.com> writes:
On 2013-09-28 03:52, Walter Bright wrote:

 If you're looking for gcc naming consistency, you'll be badly disappointed.

Same for D. It's not consistent with GCC, nor is it consistent within itself. But we have already had this discussion several times before. There's no point in having it again.

--
/Jacob Carlborg
Sep 28 2013
Jacob Carlborg <doob me.com> writes:
On 2013-09-28 01:44, "Luís Marques" <luis luismarques.eu> wrote:

 I can send a pull request with the values filled-in for Windows and OS X.

You need FreeBSD as well.
 Haha :-) I understand, my remark was lighthearted. Still, it seems a bit
 inconsistent and error prone, given the other identifiers. I mean, I'm
 all in favor of using "darwin" for OS X (more technically correct, and
 allows your code to compile in a pure Darwin environment), but if you
 changed it to "OSX" because it was more discoverable then... that's the
 kind of usability issue I'm talking about.

Yes, it is inconsistent.
 BTW, does that mean that gcc also defines capitalized "OSX", "Posix",
 etc.? (otherwise I don't understand your argument)

No, GCC defines __APPLE__ for Mac OS X and it does not define Posix.

--
/Jacob Carlborg
Sep 28 2013
Jacob Carlborg <doob me.com> writes:
On 2013-09-28 02:28, "Luís Marques" <luis luismarques.eu> wrote:
 BTW, I have more than once wondered why there was no way to specify
 more than one version identifier (is there?), like this:

      version(Windows, OSX)
      {
          enum LC_ALL            = 0;
          enum LC_COLLATE        = 1;
          enum LC_CTYPE          = 2;
          enum LC_MONETARY       = 3;
          enum LC_NUMERIC        = 4;
          enum LC_TIME           = 5;
      }
      version(OSX)
      {

          enum LC_MESSAGES       = 6;
      }

 Is there a way to use version() that avoids having to repeat the equal
 declarations for both Windows and OSX?

If you really want to do this, despite what Walter has said, you can use manifest constants and static ifs:

     version (OSX)
         enum OSX = true;
     else
         enum OSX = false;

     version (linux)
         enum linux = true;
     else
         enum linux = false;

     // And so on

     static if (linux || OSX)
     {
     }
     else
     {
     }

--
/Jacob Carlborg
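D also has version specifications (`version = Identifier;` at module scope), which let a module derive its own identifier from the predefined ones, so the shared block is written once without introducing bool constants. A minimal sketch, where `SharedLocaleLayout` is a made-up name and the Windows values are the ones quoted above; Walter's objection applies equally here, since editing the shared block changes both systems at once.

```d
// Hypothetical identifier covering the platforms that share one layout.
// Version specifications go at module scope and must precede any use.
version (Windows) version = SharedLocaleLayout;
version (OSX)     version = SharedLocaleLayout;

version (SharedLocaleLayout)
{
    enum LC_ALL      = 0;
    enum LC_COLLATE  = 1;
    enum LC_CTYPE    = 2;
    enum LC_MONETARY = 3;
    enum LC_NUMERIC  = 4;
    enum LC_TIME     = 5;
}
else version (linux)
{
    enum LC_CTYPE    = 0;
    enum LC_NUMERIC  = 1;
    enum LC_TIME     = 2;
    enum LC_COLLATE  = 3;
    enum LC_MONETARY = 4;
    enum LC_ALL      = 6;
}
else
{
    static assert(0, "LC_* values not verified for this platform");
}

// OS X-only addition, kept in its own block.
version (OSX)
{
    enum LC_MESSAGES = 6;
}

void main()
{
    import std.stdio : writeln;
    writeln("LC_TIME = ", LC_TIME);
}
```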
Sep 28 2013
Jacob Carlborg <doob me.com> writes:
On 2013-09-27 19:25, "Luís Marques" <luis luismarques.eu> wrote:

 What should I do? Do you want me to submit this? Does anyone have
 another Posix system lying around and want to check the locale constants?

Any file dealing with platform-specific functionality needs to support at least the following platforms:

- Mac OS X
- Linux
- Windows
- FreeBSD

Both 32- and 64-bit on all of the above platforms. It should be possible to find headers or documentation for all of these online.

--
/Jacob Carlborg
Sep 28 2013
Jacob Carlborg <doob me.com> writes:
On 2013-09-27 04:38, "Luís Marques" <luis luismarques.eu> wrote:

 In std.c.locale it has:

You should be using core.stdc.locale, not that it makes any difference in this case.

--
/Jacob Carlborg
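For reference, here is a minimal sketch of using the category constants through core.stdc.locale. It also shows why a wrong constant is so hard to spot: setlocale takes a plain int, so a number that belongs to a different category on the host still compiles and may even succeed.

```d
// Query and adjust the C locale through druntime's C bindings.
import core.stdc.locale : setlocale, LC_ALL, LC_NUMERIC;
import std.stdio : writeln;
import std.string : fromStringz;

void main()
{
    // A null locale argument only queries the current setting.
    const(char)* current = setlocale(LC_ALL, null);
    writeln("current locale: ", current ? current.fromStringz : "(null)");

    // Reset just the numeric category to the portable "C" locale. If
    // LC_NUMERIC held another category's number on this OS, a different
    // category would be changed and nothing would complain.
    if (setlocale(LC_NUMERIC, "C") is null)
        writeln("setlocale(LC_NUMERIC, \"C\") failed");
}
```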
Sep 26 2013
"Luís Marques" <luis luismarques.eu> writes:
On Friday, 27 September 2013 at 06:43:02 UTC, Jacob Carlborg 
wrote:
 You should be using core.stdc.locale, not that it does any 
 difference in this case.

Sorry, that's what I meant.
Sep 27 2013
"Luís Marques" <luis luismarques.eu> writes:
On Friday, 27 September 2013 at 04:54:45 UTC, Walter Bright wrote:
 std.c.locale must match the values in the host system's 
 <locale.h>. If it doesn't, it's a bug.

Well, I was trying to assess how exactly I should fix it. For instance, in my local copy I changed it to:

     version(linux)
     {
         enum LC_CTYPE          = 0;
         enum LC_NUMERIC        = 1;
         enum LC_TIME           = 2;
         enum LC_COLLATE        = 3;
         enum LC_MONETARY       = 4;
         enum LC_ALL            = 6;
         enum LC_PAPER          = 7;  // non-standard
         enum LC_NAME           = 8;  // non-standard
         enum LC_ADDRESS        = 9;  // non-standard
         enum LC_TELEPHONE      = 10; // non-standard
         enum LC_MEASUREMENT    = 11; // non-standard
         enum LC_IDENTIFICATION = 12; // non-standard
     }
     else version(OSX)
     {
         enum LC_ALL            = 0;
         enum LC_COLLATE        = 1;
         enum LC_CTYPE          = 2;
         enum LC_MONETARY       = 3;
         enum LC_NUMERIC        = 4;
         enum LC_TIME           = 5;
         enum LC_MESSAGES       = 6;
     }
     else version(all)
     {
         static assert(false, "locales not specified for this system");
     }

I have that ready to push in my git repo, but I'm not very happy about it:

- I asked for what OS the current values were, but for now I assumed they were for Linux only. Does anyone besides Sean Kelly know? Is it reasonable to assert for the other systems? If not, what's the alternative? Let the compilation fail and people wonder why LC_* are not defined?

- Why, oh why, is "linux" the only OS version() identifier that is not capitalized? :-) I mean, if you get it wrong the compiler won't even warn you about it...

What should I do? Do you want me to submit this? Does anyone have another Posix system lying around and want to check the locale constants?
Sep 27 2013
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Sep 27, 2013 at 07:25:14PM +0200, digitalmars-d-bounces puremagic.com
wrote:
 On Friday, 27 September 2013 at 04:54:45 UTC, Walter Bright wrote:
std.c.locale must match the values in the host system's
<locale.h>. If it doesn't, it's a bug.

 Well, I was trying to assess how exactly I should fix it. For instance, in my local copy I changed it to:

      version(linux)
      {
          enum LC_CTYPE          = 0;
          enum LC_NUMERIC        = 1;
          enum LC_TIME           = 2;
          enum LC_COLLATE        = 3;
          enum LC_MONETARY       = 4;
          enum LC_ALL            = 6;
          enum LC_PAPER          = 7;  // non-standard
          enum LC_NAME           = 8;  // non-standard
          enum LC_ADDRESS        = 9;  // non-standard
          enum LC_TELEPHONE      = 10; // non-standard
          enum LC_MEASUREMENT    = 11; // non-standard
          enum LC_IDENTIFICATION = 12; // non-standard
      }
      else version(OSX)
      {
          enum LC_ALL            = 0;
          enum LC_COLLATE        = 1;
          enum LC_CTYPE          = 2;
          enum LC_MONETARY       = 3;
          enum LC_NUMERIC        = 4;
          enum LC_TIME           = 5;
          enum LC_MESSAGES       = 6;
      }
      else version(all)
      {
          static assert(false, "locales not specified for this system");
      }

 I have that ready to push in my git repo, but I'm not very happy about it:

 - I asked for what OS the current values were, but for now I assumed they were for Linux only. Does anyone besides Sean Kelly know? Is it reasonable to assert for the other systems? If not, what's the alternative? Let the compilation fail and people wonder why LC_* are not defined?

Walter's stance, IIRC, is that this code should only compile for the platforms on which the values have been verified, and static assert(0) for everything else. Every other platform will break, of course, but that's what tells us what other OSes we need to verify the values for.

FWIW, I just verified the correctness of the version(linux) block above against /usr/include/locale.h on my system (Linux 64-bit Debian/unstable). Sadly, I don't have access to other Posix OSes so I can't say much beyond this.


T

--
I'm still trying to find a pun for "punishment"...
Sep 27 2013
"Luís Marques" <luis luismarques.eu> writes:
On Friday, 27 September 2013 at 19:23:12 UTC, Walter Bright wrote:
     static assert(0);

Do you prefer assert(0) instead of assert(false)? Is it not worth putting a message after the 0/false? (static assert(0, "foo missing");)

I can send a pull request with the values filled in for Windows and OS X.
 - Why, oh why, is "linux" the only OS version() identifier 
 that is not
 capitalized?

Because "linux" is what gcc predefines for Linux. (gcc also sets __gnu_linux, __linux__, and __linux, none of which are capitalized. It's the Linux way, not some nefarious plot of mine to disparage it.)

Haha :-) I understand, my remark was lighthearted. Still, it seems a bit inconsistent and error prone, given the other identifiers. I mean, I'm all in favor of using "darwin" for OS X (more technically correct, and allows your code to compile in a pure Darwin environment), but if you changed it to "OSX" because it was more discoverable then... that's the kind of usability issue I'm talking about. BTW, does that mean that gcc also defines capitalized "OSX", "Posix", etc.? (otherwise I don't understand your argument)
Sep 27 2013
"Luís Marques" <luis luismarques.eu> writes:
BTW, I have more than once wondered why there was no way to
specify more than one version identifier (is there?), like this:

     version(Windows, OSX)
     {
         enum LC_ALL            = 0;
         enum LC_COLLATE        = 1;
         enum LC_CTYPE          = 2;
         enum LC_MONETARY       = 3;
         enum LC_NUMERIC        = 4;
         enum LC_TIME           = 5;
     }
     version(OSX)
     {

         enum LC_MESSAGES       = 6;
     }

Is there a way to use version() that avoids having to repeat the 
equal declarations for both Windows and OSX?
Sep 27 2013
"Luís Marques" <luis luismarques.eu> writes:
On Saturday, 28 September 2013 at 01:42:52 UTC, Walter Bright
wrote:
 For the reason you mentioned earlier. If you are changing the 
 OSX values, you'll
 likely mess up the Windows ones.

 I've been at this for 30 years, and am quite fed up with the 
 bugs from attempts
 to save a few keystrokes. The practice of separating the os 
 sections into
 distinct ones has been a big win in reliability.

And, of course, as you discovered, when they are defaulted they are usually wrong, and they are wrong in a most pernicious, hard to discover way. The code looks right, and may even sort of behave itself. The only way you can tell if it's wrong is to laboriously and tediously go through the system's .h files.

Sure, I think it was unwise to "default" the locale enums if the values do not have standard definitions across all systems (although I still thank Sean Kelly for the headers; without them, porting code would have been an even greater chore). But the issue of version(X, Y) is not a case of default: it is something where you say explicitly that you want the same code for two explicitly stated systems, without having to copy-paste the block of code for both systems.

You mention that one disadvantage of such a construct is that it will increase the probability of developers changing both X and Y when they only want to change X. But with two version blocks (and copy-pasted code) you increase the probability that you only change X when you should change X and Y. And with version() blocks that is even less likely to be noticed, because the code in an inactive block only has to be syntactically correct.

My point here is not to argue against your choice for the standard library. My issues with version(X, Y) arose in client code (non-lib), where it seems to me that the second kind of bug is probably more likely to occur than the first kind. So, I politely ask: are you sure the language should not support something like version(X, Y), for cases where developers think that it is a better trade-off than multiple version() blocks?
Sep 27 2013
"Luís Marques" <luis luismarques.eu> writes:
On Saturday, 28 September 2013 at 01:52:10 UTC, Walter Bright
wrote:
 BTW, does that mean that gcc also defines capitalized "OSX", 
 "Posix", etc.?
 (otherwise I don't understand your argument)

If you're looking for gcc naming consistency, you'll be badly disappointed.

It really is not my intention to start an argument over something inconsequential, but I understood your point was: "'linux' is the only OS identifier that is not capitalized because that's what's consistent with gcc". But if the other OS identifiers are not consistent with gcc then I don't understand your point.
Sep 27 2013
"Luís Marques" <luis luismarques.eu> writes:
On Saturday, 28 September 2013 at 04:31:41 UTC, Walter Bright
wrote:
 If you're looking for consistency, there isn't any consistent 
 consistency. It's all just a bikeshed issue. I explained why 
 it's "linux", and for better or worse, it is not worth all the 
 grief & disruption changing it.

Let's use some lateral thinking. How about a compiler warning if, say, a version statement does not match any defined version identifier but it would if a case-insensitive comparison was made?
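The hole such a warning would close is easy to reproduce. In the sketch below, `Linux` (capital L) is not a predefined version identifier on any platform, so its branch is silently skipped everywhere, while the lowercase spelling works as intended on Linux. The names `misspelledLinux` and `realLinux` are only for illustration.

```d
// "Linux" is not predefined anywhere, so this always takes the else branch,
// and the compiler gives no diagnostic for the misspelling.
version (Linux)
    enum misspelledLinux = true;
else
    enum misspelledLinux = false;

// The identifier the compiler actually predefines on Linux.
version (linux)
    enum realLinux = true;
else
    enum realLinux = false;

void main()
{
    import std.stdio : writeln;
    writeln("version(Linux): ", misspelledLinux);
    writeln("version(linux): ", realLinux);
}
```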
Sep 27 2013
"Luís Marques" <luis luismarques.eu> writes:
http://d.puremagic.com/issues/show_bug.cgi?id=11293
https://github.com/D-Programming-Language/druntime/pull/641
Oct 18 2013