digitalmars.D - std.locale

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Sooner or later that will need to be defined. I know next to nothing 
about locales. (I know I dislike the design C++ uses.)

I was thinking of a design along the following lines. There are standards 
dedicated to locale nomenclature:

http://tools.ietf.org/html/rfc4646 for language names
http://www.unicode.org/cldr/ for various locale names

So we know the basic names we want to follow, which is one less burden. 
Then what I want to do is to define a hierarchical string table that 
fills the appropriate names.

This is in opposition to defining an actual class hierarchy that mimics 
the localization table. I think a hierarchical string table is better 
because it allows simple extensibility.

The type stored by each slot of a locale is:

Algebraic!(
     int,
     string,
     Variant delegate(Variant),
     This[string]);

meaning that a locale could store one of these types. (What else should 
go in there?)

The access pattern goes like:

// Get the date display pattern
auto pat = myLocale.get("calendars", "calendar=default",
     "dateFormats", "dateFormatLength=medium", "pattern");

This will return an Algebraic with a string in it. The string looks like 
e.g. "yyyy-MM-dd".

The access is rather verbose because the corresponding locale names tree 
is equally (actually more) verbose, see 
http://unicode.org/Public/cldr/1.6.1/core.zip. But the flexibility and 
the standards-compliance are there. We may add later some convenience 
functions for frequently-used stuff such as dates, times, and numbers.

Extension is obvious:

myLocale.put("my-category", "my-slot", "whatever");

Later, getting the value stored at "my-category", "my-slot" will return an 
Algebraic containing the string "whatever".
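
A minimal sketch of the get/put access pattern described above. To stay short it assumes string leaves only and uses a plain node class instead of the Algebraic slot type; LocaleNode and this Locale class are hypothetical stand-ins, not an existing Phobos API.

```d
import std.exception : enforce;

/// Hypothetical node: a leaf holds a string, an inner node holds children.
final class LocaleNode
{
    string value;                 // meaningful for leaves
    LocaleNode[string] children;  // meaningful for inner nodes
}

/// Hypothetical stand-in for the proposed Locale class.
final class Locale
{
    private LocaleNode root;

    this() { root = new LocaleNode; }

    /// All arguments but the last form the path; the last is the value.
    /// Intermediate nodes are created on demand.
    void put(string[] args...)
    {
        enforce(args.length >= 2, "put needs at least one key and a value");
        auto node = root;
        foreach (key; args[0 .. $ - 1])
        {
            if (auto existing = key in node.children)
                node = *existing;
            else
            {
                auto child = new LocaleNode;
                node.children[key] = child;
                node = child;
            }
        }
        node.value = args[$ - 1];
    }

    /// Walk the path and return the stored string; throw on a missing key.
    string get(string[] path...)
    {
        auto node = root;
        foreach (key; path)
        {
            auto next = key in node.children;
            enforce(next !is null, "no locale entry named " ~ key);
            node = *next;
        }
        return node.value;
    }
}
```

Extension needs no new types: any path that does not exist yet is simply created by put.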

There will be a global reference to a Locale class, e.g. defaultLocale. 
By default the reference will be null, implying the C locale should be 
in effect. Applications can assign to it as they find fit, and also pass 
around multiple locale variables.

So I wanted to gather some good ideas about locale design. Is a 
string-and-Algebraic design good for all uses? What kind of locale 
functionality does it not capture? I must have missed a ton of details, 
so if you don't understand what I mean by the above, it must be me.



Andrei

Mar 01 2009

Walter Bright <newshound1 digitalmars.com> writes:

Andrei Alexandrescu wrote:
 There will be a global reference to a Locale class, e.g. defaultLocale. 
 By default the reference will be null, implying the C locale should be 
 in effect. Applications can assign to it as they find fit, and also pass 
 around multiple locale variables.

I disagree with being able to assign to the global defaultLocale. This 
is going to cause endless problems. Just one is that any function that 
uses locale can no longer be pure. defaultLocale should be immutable.

Any function that is locale aware should be parameterized with a locale 
parameter. (Not only is that better design, it self-documents the 
dependency.)
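
A sketch of what such a parameterized function could look like, assuming a deliberately tiny, hypothetical Locale struct that carries only a decimal separator. The names Locale, cLocale and formatAmount are illustrative and not part of any library.

```d
import std.conv : to;

/// Hypothetical, minimal locale: only the decimal separator.
struct Locale
{
    string decimalSeparator = ".";
}

/// Immutable default standing in for the C locale; nobody can reassign it.
immutable Locale cLocale = Locale(".");

/// The locale is an explicit parameter, so the dependency shows up in the
/// signature, and the function can be pure because it reads no mutable
/// global state. Assumes a non-negative amount.
string formatAmount(int cents, const Locale loc = cLocale) pure
{
    immutable whole = cents / 100;
    immutable frac = cents % 100;
    return to!string(whole) ~ loc.decimalSeparator
        ~ (frac < 10 ? "0" : "") ~ to!string(frac);
}
```

Callers that do not care pass nothing and get the immutable default; callers that do care pass their own Locale value.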

Mar 01 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Walter Bright wrote:
 Andrei Alexandrescu wrote:
 There will be a global reference to a Locale class, e.g. 
 defaultLocale. By default the reference will be null, implying the C 
 locale should be in effect. Applications can assign to it as they find 
 fit, and also pass around multiple locale variables.

 
 I disagree with being able to assign to the global defaultLocale. This 
 is going to cause endless problems. Just one is that any function that 
 uses locale can no longer be pure. defaultLocale should be immutable.
 
 Any function that is locale aware should be parameterized with a locale 
 parameter. (Not only is that better design, it self-documents the 
 dependency.)

I don't understand this. That means there's no more default locale. 
Here's what I had in mind:

class Locale { ... }

// function parameterized with an optional locale
void foo(Data d, Locale loc = null);

So there's no more default locale. If you pass in null, that's the 
default locale.


Andrei

Mar 01 2009

Walter Bright <newshound1 digitalmars.com> writes:

Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
 There will be a global reference to a Locale class, e.g. 
 defaultLocale. By default the reference will be null, implying the C 
 locale should be in effect. Applications can assign to it as they 
 find fit, and also pass around multiple locale variables.

 I disagree with being able to assign to the global defaultLocale. This 
 is going to cause endless problems. Just one is that any function that 
 uses locale can no longer be pure. defaultLocale should be immutable.

 Any function that is locale aware should be parameterized with a 
 locale parameter. (Not only is that better design, it self-documents 
 the dependency.)

 
 I don't understand this. That means there's no more default locale. 
 Here's what I had in mind:
 
 class Locale { ... }
 
 // function parameterized with an optional locale
 void foo(Data d, Locale loc = null);
 
 So there's no more default locale. If you pass in null, that's the 
 default locale.

That's fine, I was thrown off by your reference to a "global reference".

Mar 01 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Walter Bright wrote:
 Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
 There will be a global reference to a Locale class, e.g. 
 defaultLocale. By default the reference will be null, implying the C 
 locale should be in effect. Applications can assign to it as they 
 find fit, and also pass around multiple locale variables.

 I disagree with being able to assign to the global defaultLocale. 
 This is going to cause endless problems. Just one is that any 
 function that uses locale can no longer be pure. defaultLocale should 
 be immutable.

 Any function that is locale aware should be parameterized with a 
 locale parameter. (Not only is that better design, it self-documents 
 the dependency.)

 I don't understand this. That means there's no more default locale. 
 Here's what I had in mind:

 class Locale { ... }

 // function parameterized with an optional locale
 void foo(Data d, Locale loc = null);

 So there's no more default locale. If you pass in null, that's the 
 default locale.

 
 That's fine, I was thrown off by your reference to a "global reference".

Well I was thinking a global reference might be handy for people who 
e.g. want to set the locale once and then be done with it. I think only 
a few apps actually manipulate multiple locales simultaneously. Most 
would just want to load the locale present on the user's computer and 
then use it.

Andrei

Mar 01 2009

Walter Bright <newshound1 digitalmars.com> writes:

Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
 There will be a global reference to a Locale class, e.g. 
 defaultLocale. By default the reference will be null, implying the 
 C locale should be in effect. Applications can assign to it as they 
 find fit, and also pass around multiple locale variables.

 I disagree with being able to assign to the global defaultLocale. 
 This is going to cause endless problems. Just one is that any 
 function that uses locale can no longer be pure. defaultLocale 
 should be immutable.

 Any function that is locale aware should be parameterized with a 
 locale parameter. (Not only is that better design, it self-documents 
 the dependency.)

 I don't understand this. That means there's no more default locale. 
 Here's what I had in mind:

 class Locale { ... }

 // function parameterized with an optional locale
 void foo(Data d, Locale loc = null);

 So there's no more default locale. If you pass in null, that's the 
 default locale.

 That's fine, I was thrown off by your reference to a "global reference".

 
 Well I was thinking a global reference might be handy for people who 
 e.g. want to set the locale once and then be done with it.

That's what I was objecting to!

 I think only 
 a few apps actually manipulate multiple locales simultaneously. Most 
 would just want to load the locale present on the user's computer and 
 then use it.

User settable global state is eeevil.

Mar 01 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Walter Bright wrote:
 Andrei Alexandrescu wrote:
 Well I was thinking a global reference might be handy for people who 
 e.g. want to set the locale once and then be done with it.

 
 That's what I was objecting to!
 
 I think only a few apps actually manipulate multiple locales 
 simultaneously. Most would just want to load the locale present on the 
 user's computer and then use it.

 
 User settable global state is eeevil.

I am thinking of a better form using scope-based locale usage. Consider:

class Locale { ... }
struct LocaleContext {
     this(Locale value);
     ~this();
     private Locale value();
     alias value this;
     ...
}

People wouldn't have access to a global Locale object. They can, 
however, create LocaleContext objects. Such objects set the current 
locale to the user's locale in the constructor and restore the previous 
locale in the destructor.

That way use of locales follows use of scopes and the long-distance 
dependency created by globals is largely diminished.

An application just needing to create a LocaleContext upon loading and 
be done with it can create its own LocaleContext inside e.g. main(). A 
more sophisticated app may manage multiple locale contexts and put them 
in action as it needs. It's really flexible, and without promoting bad 
programming styles.
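
One way the scope-based idea might be fleshed out, as a sketch only. It assumes a module-private "current" reference that only LocaleContext may change, and a hypothetical currentLocaleName() helper for observing it; neither appears in the outline above.

```d
/// Hypothetical stand-in for the proposed Locale class.
final class Locale
{
    string name;
    this(string name) { this.name = name; }
}

// Module-private and thread-local: code outside this module cannot
// assign to it. null stands for the C locale.
private Locale current;

/// Name of the locale currently in effect (illustrative helper).
string currentLocaleName()
{
    return current is null ? "C" : current.name;
}

struct LocaleContext
{
    private Locale previous;

    @disable this();       // a context must always be given a locale
    @disable this(this);   // a copy would restore the old locale twice

    /// Install `value` and remember what was in effect before.
    this(Locale value)
    {
        previous = current;
        current = value;
    }

    /// Put the previously active locale back when the scope ends.
    ~this()
    {
        current = previous;
    }
}
```

Nested contexts unwind in reverse order, so an inner scope can switch locales without the outer one noticing once it ends.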


Andrei

Mar 01 2009

BCS <none anon.com> writes:

Hello Walter,
 User settable global state is eeevil.
 

User *alterable* global state is eeevil. I can see a good argument for
immutable WORM (write once, read many) variables that can be assigned to
exactly once, very early in the program load process.
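
In D2 something close to this can be sketched with an immutable module-level variable initialized in a shared static constructor. The variable name and the choice of the LANG environment variable are assumptions made for illustration.

```d
import std.process : environment;

/// Written exactly once, before main() runs; read-only afterwards.
immutable string startupLocaleName;

shared static this()
{
    // Assumption: take the name from LANG and fall back to "C" when unset.
    startupLocaleName = environment.get("LANG", "C");
}
```

Any later assignment to startupLocaleName is rejected at compile time, so functions that read it do not depend on mutable global state.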

Mar 01 2009

Walter Bright <newshound1 digitalmars.com> writes:

BCS wrote:
 Hello Walter,
 User settable global state is eeevil.

 
 User *alterable* global state is eeevil. I can see a good argument for 
 immutable WORM variables that can be assigned to exactly once very early 
 in the program load process.

Sure, I meant global state once initialized.

Mar 01 2009

Georg Wrede <georg.wrede iki.fi> writes:

Walter Bright wrote:
 Andrei Alexandrescu wrote:
 There will be a global reference to a Locale class, e.g. 
 defaultLocale. By default the reference will be null, implying the C 
 locale should be in effect. Applications can assign to it as they find 
 fit, and also pass around multiple locale variables.

 
 I disagree with being able to assign to the global defaultLocale. This 
 is going to cause endless problems. Just one is that any function that 
 uses locale can no longer be pure. defaultLocale should be immutable.

The two programs that are most "locale aware" are usually spreadsheets 
and word processors.

It is usual that the user needs to write, say, in Swedish or in Russian, 
while in a Finnish setting. Or that one wants to use a decimal separator 
other than what is "proper" for the country.

For example, a lot of people use "." instead of the official "," in 
Finland, and many use time as "18:23" instead of "18.23".


For this purpose, these programs let the users define these any way they 
want.

I think the notion of locales is, slowly but steadily, going away.

It was a nice idea at the time, but with two problems: users don't use 
it, and programmers don't use it.


Of course, eventually we will want to "do something" about this. But 
that should be left to the day when real issues are all sorted out in D. 
This is a non-urgent, low-priority thing.

Mar 01 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Georg Wrede wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
 There will be a global reference to a Locale class, e.g. 
 defaultLocale. By default the reference will be null, implying the C 
 locale should be in effect. Applications can assign to it as they 
 find fit, and also pass around multiple locale variables.

 I disagree with being able to assign to the global defaultLocale. This 
 is going to cause endless problems. Just one is that any function that 
 uses locale can no longer be pure. defaultLocale should be immutable.

 
 The two programs that are most "locale aware" are usually spread sheets 
 and word processors.
 
 It is usual that the user needs to write, say, in Swedish or in Russian, 
 while in a Finnish setting. Or that one wants to use a decimal separator 
 other than what is "proper" for the country.
 
 For example, a lot of people use "." instead of the official "," in 
 Finland, and many use time as "18:23" instead of "18.23".
 
 
 For this purpose, these programs let the users define these any way they 
 want.

That's exactly what my proposal is doing. People can start with the 
defaults of the Finnish locale and then overwrite whichever parts they want.

 I think the notion of locales is, slowly but steadily, going away.

Do you have any data backing this up?

 It was a nice idea at the time, but with two problems: users don't use 
 it, and programmers don't use it.

Is it because it hasn't been properly packaged?

 Of course, eventually we will want to "do something" about this. But 
 that should be left to the day when real issues are all sorted out in D. 
 This is a non-urgent, low-priority thing.

I guess. Now please tell me how I print arrays in D.


Andrei

Mar 01 2009

Georg Wrede <georg.wrede iki.fi> writes:

Andrei Alexandrescu wrote:
 Georg Wrede wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
 There will be a global reference to a Locale class, e.g. 
 defaultLocale. By default the reference will be null, implying the C 
 locale should be in effect. Applications can assign to it as they 
 find fit, and also pass around multiple locale variables.

 I disagree with being able to assign to the global defaultLocale. 
 This is going to cause endless problems. Just one is that any 
 function that uses locale can no longer be pure. defaultLocale should 
 be immutable.

 The two programs that are most "locale aware" are usually spread 
 sheets and word processors.

 It is usual that the user needs to write, say, in Swedish or in 
 Russian, while in a Finnish setting. Or that one wants to use a 
 decimal separator other than what is "proper" for the country.

 For example, a lot of people use "." instead of the official "," in 
 Finland, and many use time as "18:23" instead of "18.23".

 For this purpose, these programs let the users define these any way 
 they want.

 
 That's exactly what my proposal is doing. People can start with the 
 defaults of the Finnish locale and then overwrite whichever parts they 
 want.

From java.util.Locale (J2SE 1.4.2): "A Locale object represents a 
specific geographical, political, or cultural region."

Nice. If those three were orthogonal, then you'd choose each once and be 
done with it. Unfortunately, they blend. And they blend in a different 
way in every area. That creates "continuums" of needs for settings, and 
these can't really be predicted easily.

A GUI user can rely on the settings having been made at OS install by 
himself or the local vendor. But the console is different. (See below.)

 I think the notion of locales is, slowly but steadily, going away.

 
 Do you have any data backing this up?

For instance, in the old days, the operating system used to define the 
variable LC_LOCAL for the user. It signified the locale, usually the 
user's country.

Today, I see no such thing. The only variables related to such are for 
the GUI:

LANG=en_US.UTF-8
GDM_LANG=en_US.UTF-8

One is the console input language and the other is the GUI input 
language. No locale stuff anywhere.

 It was a nice idea at the time, but with two problems: users don't use 
 it, and programmers don't use it.

 
 Is it because it hasn't been properly packaged?

No. Imagine for a moment that we had a Perfect Locale Implementation 
(which I say is not even possible, but still).

If a programmer wanted to use locale dependent printing, then he'd have 
to get familiar with all the possible ways his string may get printed if 
someone uses his program in a far away country. And there are a few 
different ways, believe me.

Would you imagine anybody actually bothering to do that? Would you?? So 
what the programmer does, is, he prints things the way he wants, and 
caters only to the specific things he feels he needs to. And creates a 
solution that behaves *predictably*, from his point of view.

He may want folks in France and Finland to use his program. And since he 
doesn't write the UI strings in any other language, the program will be 
unusable to folks in Afghanistan anyway.

Or he writes an English UI, whereupon people accept that it may not 
cater for all kinds of exotic needs.

 Of course, eventually we will want to "do something" about this. But 
 that should be left to the day when real issues are all sorted out in 
 D. This is a non-urgent, low-priority thing.


Had there been any need for locales, believe me, the "foreigners" in 
this NG would have asked for it.

 I guess. Now please tell me how I print arrays in D.

Think about it for a moment. We have two kinds of programs, those 
written for the console, and those written for a GUI. It's natural for 
the GUI programs to be locale aware, but with the console apps, it 
simply is not possible to do properly. I'll explain, but first:

Let's split this into two separate issues, the console and the GUI.

The GUI is aware of your preferences.
You don't use writefln with the GUI.
You use the GUI API for any I/O, right?

Now, wouldn't it be natural to assume that the GUI API takes care of all 
of this? Print a date, and it prints it with the user's preferred 
format. *The same with your array*.


And then let's look at the console.

A proper internationalisation would mean that the Chinese could use the 
console, and all character mode apps in Chinese. Problem is, there 
simply aren't enough pixels on many consoles to render the Chinese 
character set.

So we're off track already. And with the ubiquitous GUIs around, people 
are increasingly accepting that a GUI is for nationalised stuff, and the 
console is for "technical" stuff.

Haven't you noticed: in the last decade it has become all the more 
evident that the reason to write a non-GUI app, is very specifically 
just to get rid of all kinds of hassles, and simply concentrate on what 
the program is supposed to do!

(You know, a few years ago we had a major conversation here about 
whether non-ASCII variable names should be accepted in D. The end result 
is, yes. (I just tried it.) Now, how can an international team cowork on 
a project where variable names are written so the other folks can't even 
type them with their keyboards??? -- All very nice, but no cigar. That's 
about as smart as letting people define *unlimited* length variable names!)


*** How to print arrays ***

You print arrays in a predictable and expected way.

D array printing is for non-GUI stuff. Hence, you use the C locale, period.

A mathematician seriously doesn't want his arrays to have commas instead 
of decimal points. He sure as heck doesn't want the numbers to all of a 
sudden turn to Klingon-like hieroglyphs just because he is showing his 
results in an overseas seminar, on the local computer!!!!!


And what about the programmer who wants his array to go into another 
program? What do you think happens to parsing when the decimal point is 
suddenly a comma??
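
The parsing concern can be illustrated with a small sketch. parseRow and parsesCleanly are hypothetical helpers; the only library behavior relied on is that std.conv.to!double accepts "." as the decimal separator and throws a ConvException when part of the input is left unconverted.

```d
import std.algorithm : map;
import std.array : array, split;
import std.conv : ConvException, to;

/// Parse a whitespace-separated row of numbers as another program
/// would receive it on its standard input.
double[] parseRow(string row)
{
    return row.split().map!(field => field.to!double).array;
}

/// True if every field in the row converts to a double.
bool parsesCleanly(string row)
{
    try
    {
        parseRow(row);
        return true;
    }
    catch (ConvException)
    {
        return false;
    }
}
```

A row written with decimal points, such as "1.5 2.25", parses; the same row written with decimal commas, "1,5 2,25", is rejected by the consumer.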

We've had Walter make nice features to D that were laborious to create, 
only to see nobody use them. It's happened, ask him. *Now* is not the 
time to do that again.

Mar 01 2009

Rainer Deyke <rainerd eldwood.com> writes:

Georg Wrede wrote:
 Let's split this into two separate issues, the console and the GUI.
 
 The GUI is aware of your preferences.
 You don't use writefln with the GUI.
 You use the GUI API for any I/O, right?

There's a third faction: graphical apps that don't use the underlying
GUI API.  Most games fall in this category.

When writing cross-platform apps (whether gui, non-gui-but-graphical, or
console), you need some layer of abstraction over the underlying
platform localization API.  This abstraction can be provided by the
programming language, or a third-party library.

 A proper internationalisation would mean that the Chinese could use the
 console, and all character mode apps in Chinese. Problem is, there
 simply aren't enough pixels on many consoles to render the Chinese
 character set.

I have Windows configured to use a Japanese text encoding for command
windows.  I can and do run Japanese console applications, but console
applications that assume CP437 or Latin-1 don't work for me.


-- 
Rainer Deyke - rainerd eldwood.com

Mar 02 2009

Walter Bright <newshound1 digitalmars.com> writes:

Georg Wrede wrote:
 We've had Walter make nice features to D that were laborious to create, 
 only to see nobody use them. It's happened, ask him.

Sure. Often the only way to see if a feature is useful is to actually 
implement it and see what happens. Some features have succeeded and 
found uses far beyond my expectations (CTFE, string mixins) while others 
have pretty much languished (design by contract, complex numbers).


 *Now* is not the time to do that again.

To some extent, we can't predict that. But I did find your arguments 
pretty strong.

Mar 02 2009

Georg Wrede <georg.wrede iki.fi> writes:

Walter Bright wrote:
 Georg Wrede wrote:
 We've had Walter make nice features to D that were laborious to 
 create, only to see nobody use them. It's happened, ask him.

 
 Sure. Often the only way to see if a feature is useful is to actually 
 implement it and see what happens. Some features have succeeded and 
 found uses far beyond my expectations (CTFE, string mixins) while others 
 have pretty much languished (design by contract, complex numbers).
 
 
 *Now* is not the time to do that again.

 
 To some extent, we can't predict that. But I did find your arguments 
 pretty strong.

LOL :-)

Mar 02 2009

Gide Nwawudu <gide btinternet.com> writes:

On Mon, 02 Mar 2009 01:37:55 -0800, Walter Bright
<newshound1 digitalmars.com> wrote:

Georg Wrede wrote:
 We've had Walter make nice features to D that were laborious to create, 
 only to see nobody use them. It's happened, ask him.

Sure. Often the only way to see if a feature is useful is to actually 
implement it and see what happens. Some features have succeeded and 
found uses far beyond my expectations (CTFE, string mixins) while others 
have pretty much languished (design by contract, complex numbers).

I think DbC would be widely used if it worked with inheritance and
could possibly be applied to interfaces. There is an entry in Bugzilla
that has been voted up to sixth place.

http://d.puremagic.com/issues/show_bug.cgi?id=302

Gide

Mar 03 2009

Daniel Keep <daniel.keep.lists gmail.com> writes:

Walter Bright wrote:
 Georg Wrede wrote:
 We've had Walter make nice features to D that were laborious to
 create, only to see nobody use them. It's happened, ask him.

 
 Sure. Often the only way to see if a feature is useful is to actually
 implement it and see what happens. Some features have succeeded and
 found uses far beyond my expectations (CTFE, string mixins) while others
 have pretty much languished (design by contract, complex numbers).

I've used complex numbers before, but only when rendering fractals.
Sorry :P

As for design by contract, my problem has always been this:

Contracts let you ensure that your assumptions about program state are
never violated.  That means checking pre- and post-conditions on
functions, and invariants for classes.  Which is great.

So I put contracts on everything.  Fantastic.  I do a release compile,
and all that safety disappears.  So only the debug build has contracts
enabled.  But it's the release build, if it crashes, that I need help
diagnosing.

There's also libraries.  If you put contracts on public APIs, then your
library is only checking arguments in debug builds.  This makes release
builds faster, but also less safe.  So do you only put contracts on
internal APIs and do manual exception testing on public APIs?

I like DbC, I really do.  I just have trouble figuring out where and how
to use it properly.
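
For readers who have not seen the syntax under discussion, here is a minimal sketch of a function with pre- and post-conditions. The function isqrt is a made-up example, not taken from any library, and the `do` keyword is the spelling used by current compilers (at the time of this thread it was `body`). Building with -release drops the `in` and `out` blocks, which is the loss of safety described above.

```d
/// Integer square root, used only to illustrate contract syntax.
/// isqrt is a hypothetical example function.
int isqrt(int x)
in
{
    // Precondition: a violation is a bug in the caller.
    assert(x >= 0, "isqrt needs a non-negative argument");
}
out (r)
{
    // Postcondition: r is the largest integer whose square fits in x.
    assert(cast(long) r * r <= x);
    assert(cast(long) (r + 1) * (r + 1) > x);
}
do
{
    int r = 0;
    // long arithmetic avoids overflow near int.max
    while (cast(long) (r + 1) * (r + 1) <= x)
        ++r;
    return r;
}

unittest
{
    assert(isqrt(10) == 3);
    assert(isqrt(16) == 4);
}
```

In a non-release build a call such as isqrt(-1) fails the precondition; in a -release build the same call silently returns 0.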

  -- Daniel

Mar 03 2009

bearophile <bearophileHUGS lycos.com> writes:

Daniel Keep:
 So I put contracts on everything.  Fantastic.  I do a release compile,
 and all that safety disappears.  So only the debug build has contracts
 enabled.  But it's the release build, if it crashes, that I need help
 diagnosing.

A simple solution is to not use -release for the final version of the code, but
this keeps array bounds checks too.
LDC may have already solved your problem, with extra compilation arguments that
you can use to disable such checks independently from each other.
It's not a fault of design by contract, it's just that the D compiler switches
are lumped together. It seems a simple problem to solve.

Bye,
bearophile

Mar 03 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

bearophile wrote:
 Daniel Keep:
 So I put contracts on everything.  Fantastic.  I do a release compile,
 and all that safety disappears.  So only the debug build has contracts
 enabled.  But it's the release build, if it crashes, that I need help
 diagnosing.

 
 A simple solution is to not use -release for the final version of the code,
but this keeps array bound controls too.
 LDC may have already solved your problem, with extra compilation arguments
that you can use to disable such controls independently from each other.
 It's not a fault of design by contract, it's just that the D compiler switches
are lumped together. It seems a simple to solve problem.
 
 Bye,
 bearophile

I agree. I'm having the same problem: I put a contract in there, I know 
it's as good as assert. So I can't do e.g. input validation because in 
most functions input must always be validated. I also know that 
contracts are doing the wrong thing with inheritance and can't apply to 
interfaces, which is exactly the (only?) place they'd be interesting. So 
I send the contracts home and use assert, enforce, and unittest.

Andrei

Mar 03 2009

Sergey Gromov <snake.scaly gmail.com> writes:

Tue, 03 Mar 2009 07:05:51 -0800, Andrei Alexandrescu wrote:

 bearophile wrote:
 Daniel Keep:
 So I put contracts on everything.  Fantastic.  I do a release compile,
 and all that safety disappears.  So only the debug build has contracts
 enabled.  But it's the release build, if it crashes, that I need help
 diagnosing.

 
 A simple solution is to not use -release for the final version of the code,
but this keeps array bound controls too.
 LDC may have already solved your problem, with extra compilation arguments
that you can use to disable such controls independently from each other.
 It's not a fault of design by contract, it's just that the D compiler switches
are lumped together. It seems a simple to solve problem.
 
 Bye,
 bearophile

 
 I agree. I'm having the same problem: I put a contract in there, I know 
 it's as good as assert. So I can't do e.g. input validation because in 
 most functions input must always be validated. I also know that 
 contracts are doing the wrong thing with inheritance and can't apply to 
 interfaces, which is exactly the (only?) place they'd be interesting. So 
 I send the contracts home and use assert, enforce, and unittest.

I'd really like to see enforce() as a built-in language feature.
assert() doesn't help in way too many situations.

Mar 03 2009

Walter Bright <newshound1 digitalmars.com> writes:

Andrei Alexandrescu wrote:
 I agree. I'm having the same problem: I put a contract in there, I know 
 it's as good as assert. So I can't do e.g. input validation because in 
 most functions input must always be validated. I also know that 
 contracts are doing the wrong thing with inheritance and can't apply to 
 interfaces, which is exactly the (only?) place they'd be interesting. So 
 I send the contracts home and use assert, enforce, and unittest.

Contracts are not for input validation! They are checking if the logic 
of your program is correct or not. Think of it this way - your program 
should behave exactly the same with or without the contracts turned on.

Contracts should NOT be used for scrubbing user input, checking for 
errors from other components, or validating any input from external to 
the dll.

If you feel the need to leave them on in a release build, then:
1) your testing is inadequate
2) you are using them incorrectly

For example, Windows API functions check all their input. This is not 
contract programming - it's validating user input over which Microsoft 
has no control.
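
The split described here can be sketched roughly as follows: enforce (from std.exception) guards data that arrives from outside and stays active in every build, while the `in` contract only documents and checks what the rest of the program is supposed to guarantee. The names parsePercent and scale are hypothetical, chosen for this illustration.

```d
import std.conv : to;
import std.exception : assertThrown, enforce;
import std.string : strip;

/// External input: validated in every build, failures are exceptions.
/// parsePercent is a hypothetical helper.
int parsePercent(string text)
{
    const value = text.strip.to!int;   // throws ConvException on "abc"
    enforce(value >= 0 && value <= 100, "percentage must be within 0..100");
    return value;
}

/// Internal logic: the contract checks the program, not the user.
/// scale is a hypothetical helper.
int scale(int total, int percent)
in
{
    assert(percent >= 0 && percent <= 100, "caller passed a bad percentage");
}
do
{
    return total * percent / 100;
}

unittest
{
    assert(scale(200, parsePercent(" 50 ")) == 100);
    assertThrown(parsePercent("150"));  // out of range
    assertThrown(parsePercent("abc"));  // not a number
}
```

With this arrangement the program behaves the same with contracts on or off, as long as every percentage reaching scale has gone through parsePercent.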

Mar 03 2009

Max Samukha <samukha voliacable.com.removethis> writes:

On Tue, 03 Mar 2009 11:00:36 -0800, Walter Bright
<newshound1 digitalmars.com> wrote:

Andrei Alexandrescu wrote:
 I agree. I'm having the same problem: I put a contract in there, I know 
 it's as good as assert. So I can't do e.g. input validation because in 
 most functions input must always be validated. I also know that 
 contracts are doing the wrong thing with inheritance and can't apply to 
 interfaces, which is exactly the (only?) place they'd be interesting. So 
 I send the contracts home and use assert, enforce, and unittest.

Contracts are not for input validation! They are checking if the logic 
of your program is correct or not. Think of it this way - your program 
should behave exactly the same with or without the contracts turned on.

Contracts should NOT be used for scrubbing user input, checking for 
errors from other components, or validating any input from external to 
the dll.

If you feel the need to leave them on in a release build, then:
1) your testing is inadequate
2) you are using them incorrectly

For example, Windows API functions check all their input. This is not 
contract programming - it's validating user input over which Microsoft 
has no control.

This is exactly how I look at them. However I've never tried to use
pre/post conditions. I guess it's because of the syntax.

By the way, about that image on the contracts page. Is the bullet
flying away from the D-man because it's disgusted by his extreme
ugliness? :)

Mar 03 2009

Sean Kelly <sean invisibleduck.org> writes:

== Quote from Walter Bright (newshound1 digitalmars.com)'s article
 Andrei Alexandrescu wrote:
 I agree. I'm having the same problem: I put a contract in there, I know
 it's as good as assert. So I can't do e.g. input validation because in
 most functions input must always be validated. I also know that
 contracts are doing the wrong thing with inheritance and can't apply to
 interfaces, which is exactly the (only?) place they'd be interesting. So
 I send the contracts home and use assert, enforce, and unittest.

 Contracts are not for input validation! They are checking if the logic
 of your program is correct or not. Think of it this way - your program
 should behave exactly the same with or without the contracts turned on.
 Contracts should NOT be used for scrubbing user input, checking for
 errors from other components, or validating any input from external to
 the dll.

Why should contracts be limited to parameter checking of internally used
functions only?  If I write a function and document parameter constraints
then I certainly expect those constraints to be followed regardless of
whether I'm calling the function or someone else is calling the function.
Checking these via a contract simply provides an optional means of
ensuring that a logic error didn't occur within the program as a whole.

If you're talking about application input however, then I agree completely.
ie. stuff typed in by the user, read from a file, etc, should never be validated
within a contract because an input failure at that level doesn't represent
a program logic error but rather user error.  An assertion failure isn't
a terribly good way of notifying the user that they shouldn't have put an
alphabetic character in a box intended to receive an integer :-)


Sean

Mar 03 2009

Georg Wrede <georg.wrede iki.fi> writes:

Sean Kelly wrote:
 == Quote from Walter Bright (newshound1 digitalmars.com)'s article
 Andrei Alexandrescu wrote:
 I agree. I'm having the same problem: I put a contract in there, I know
 it's as good as assert. So I can't do e.g. input validation because in
 most functions input must always be validated. I also know that
 contracts are doing the wrong thing with inheritance and can't apply to
 interfaces, which is exactly the (only?) place they'd be interesting. So
 I send the contracts home and use assert, enforce, and unittest.

 Contracts are not for input validation! They are checking if the logic
 of your program is correct or not. Think of it this way - your program
 should behave exactly the same with or without the contracts turned on.
 Contracts should NOT be used for scrubbing user input, checking for
 errors from other components, or validating any input from external to
 the dll.

 
 Why should contracts be limited to parameter checking of internally used
 functions only?  If I write a function and document parameter constraints
 then I certainly expect those constraints to be followed regardless of
 whether I'm calling the function or someone else is calling the function.
 Checking these via a contract simply provides an optional means of
 ensuring that a logic error didn't occur within the program as a whole.

The distinction is not whether you or others write stuff. It's about 
whether it is for debugging *only*, as opposed to general input validation.

It's sort of like assert: it's not prudent to put an assert anywhere other 
than where only the source code (that is, a bug or goof by the programmer) 
can cause the assert to fire.

 If you're talking about application input however, then I agree completely.
 ie. stuff typed in by the user, read from a file, etc, should never be
validated
 within a contract because an input failure at that level doesn't represent
 a program logic error but rather user error.  An assertion failure isn't
 a terribly good way of notifying the user that they shouldn't have put an
 alphabetic character in a box intended to receive an integer :-)
 
 
 Sean

Mar 04 2009

Sean Kelly <sean invisibleduck.org> writes:

Georg Wrede wrote:
 Sean Kelly wrote:
 == Quote from Walter Bright (newshound1 digitalmars.com)'s article
 Andrei Alexandrescu wrote:
 I agree. I'm having the same problem: I put a contract in there, I know
 it's as good as assert. So I can't do e.g. input validation because in
 most functions input must always be validated. I also know that
 contracts are doing the wrong thing with inheritance and can't apply to
 interfaces, which is exactly the (only?) place they'd be 
 interesting. So
 I send the contracts home and use assert, enforce, and unittest.

 Contracts are not for input validation! They are checking if the logic
 of your program is correct or not. Think of it this way - your program
 should behave exactly the same with or without the contracts turned on.
 Contracts should NOT be used for scrubbing user input, checking for
 errors from other components, or validating any input from external to
 the dll.

 Why should contracts be limited to parameter checking of internally used
 functions only?  If I write a function and document parameter constraints
 then I certainly expect those constraints to be followed regardless of
 whether I'm calling the function or someone else is calling the function.
 Checking these via a contract simply provides an optional means of
 ensuring that a logic error didn't occur within the program as a whole.

 
 The distinction is not whether you or others write stuff. It's about 
 whether it is for debugging *only*, as opposed to general input validation.

So I guess the real question is whether a function is expected to 
validate its parameters.  I'd argue that it isn't, but then I'm from a 
C/C++ background.  For me, validation is a debugging tool, or at least 
an optional feature for applications that want the added insurance.


Sean

Mar 04 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Sean Kelly wrote:
 Georg Wrede wrote:
 The distinction is not whether you or others write stuff. It's about 
 whether it is for debugging *only*, as opposed to general input 
 validation.

 
 So I guess the real question is whether a function is expected to 
 validate its parameters.  I'd argue that it isn't, but then I'm from a 
 C/C++ background.  For me, validation is a debugging tool, or at least 
 an optional feature for applications that want the added insurance.

Interesting. My policy is to favor validation whenever it doesn't impact 
performance. Imagine for example that strlen() validated its input for 
non-null. Would that show on the profiling chart of any C application? 
No, unless the application's core loop only called strlen() on a 
1-character string or so.

One simple case that clarifies the necessary tradeoff is binary search. 
That assumes the range to be searched is sorted. If you actually checked 
for that, it would render binary search useless as a linear search would 
be in fact faster. So you need to assume. One way to do so is in the 
documentation. You write in the docs that findSorted expects a sorted 
range. Another way is to encode this information in the type of the 
sorted range. But that's onerous as most of the time you have an array 
you just sorted, not a SortedArray value.

The approach I took with the new phobos is:

int[] haystack;
int[] needle;
...
auto pos1 = find(haystack, needle); // linear
sort(haystack);
auto pos2 = find(assumeSorted(haystack), needle);

The assumeSorted function wraps the haystack in an AssumeSorted!(int[]) 
type without adding members or running extra code. It's there to clarify 
to everyone what's going on. And it's usable with other arguments or 
functions too, e.g.

auto pos3 = find(haystack, assumeSorted(needle));
setIntersection(assumeSorted(haystack), assumeSorted(needle));
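
As a point of comparison, the Phobos that eventually shipped has an assumeSorted in std.range, but its interface differs from the calls proposed above: the wrapper it returns exposes the binary-search operations as members such as contains and lowerBound. A minimal sketch, with inSorted as a hypothetical helper name:

```d
import std.algorithm.searching : find;
import std.algorithm.sorting : sort;
import std.range : assumeSorted;

/// Membership test on data the caller promises is already sorted.
/// inSorted is a hypothetical helper for this sketch.
bool inSorted(int[] sortedData, int needle)
{
    return assumeSorted(sortedData).contains(needle); // binary search
}

unittest
{
    int[] haystack = [5, 3, 9, 1, 7];
    assert(find(haystack, 9) == [9, 1, 7]);   // linear, no assumption needed

    sort(haystack);                           // now [1, 3, 5, 7, 9]
    auto sorted = assumeSorted(haystack);     // wraps the array, no copy
    assert(sorted.contains(7));
    assert(!sorted.contains(4));
    assert(sorted.lowerBound(5).length == 2); // the elements before 5: 1 and 3
}
```

The assumption is visible at the call site in the same way the post describes: the reader sees assumeSorted and knows which precondition the caller has taken responsibility for.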

Interestingly, assumeSorted can actually do checking without impacting 
the complexity of the search. In debug mode, it can arrange to run an 
isSorted test at random on about one call in N, where N is the average 
length of the incoming arrays, so its complexity impact is amortized constant.
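
One way to picture the amortized check is the sketch below. It is an illustration only: it uses a deterministic call counter instead of a random draw, works on plain int arrays, and the name sampledContains is invented here. The full isSorted pass costs O(n) but runs on roughly one call in n, so the extra cost per call averages out to a constant.

```d
import std.algorithm.sorting : isSorted;

/// Binary search over data the caller promises is sorted.
/// sampledContains and its counter are hypothetical, for illustration.
bool sampledContains(const int[] data, int needle)
{
    debug
    {
        // Compiled in only with -debug. One O(n) pass per roughly n calls
        // keeps the added cost at amortized O(1) per call.
        static size_t calls;
        if (data.length != 0 && ++calls % data.length == 0)
            assert(isSorted(data), "sampledContains: input is not sorted");
    }

    // Plain lower-bound binary search.
    size_t lo = 0, hi = data.length;
    while (lo < hi)
    {
        const mid = lo + (hi - lo) / 2;
        if (data[mid] < needle)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo < data.length && data[lo] == needle;
}
```

A sampled check can miss an unsorted input on any given call; it trades certainty for keeping the search at O(log n).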


Andrei

Mar 04 2009

Max Samukha <samukha voliacable.com.removethis> writes:

On Wed, 04 Mar 2009 08:47:50 -0800, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

Sean Kelly wrote:
 Georg Wrede wrote:
 The distinction is not whether you or others write stuff. It's about 
 whether it is for debugging *only*, as opposed to general input 
 validation.

 
 So I guess the real question is whether a function is expected to 
 validate its parameters.  I'd argue that it isn't, but then I'm from a 
 C/C++ background.  For me, validation is a debugging tool, or at least 
 an optional feature for applications that want the added insurance.

Interesting. My policy is to favor validation whenever it doesn't impact 
performance. Imagine for example that strlen() validated its input for 
non-null. Would that show on the profiling chart of any C application? 
No, unless the application's core loop only called strlen() on a 
1-character string or so.

One simple case that clarifies the necessary tradeoff is binary search. 
That assumes the range to be searched is sorted. If you actually checked 
for that, it would render binary search useless as a linear search would 
be in fact faster. So you need to assume. One way to do so is in the 
documentation. You write in the docs that findSorted expects a sorted 
range. Another way is to encode this information in the type of the 
sorted range. But that's onerous as most of the time you have an array 
you just sorted, not a SortedArray value.

The approach I took with the new phobos is:

int[] haystack;
int[] needle;
...
auto pos1 = find(haystack, needle); // linear
sort(haystack);
auto pos2 = find(assumeSorted(haystack), needle);

The assumeSorted function wraps the haystack in an AssumeSorted!(int[]) 
type without adding members or running extra code. It's there to clarify 
to everyone what's going on. And it's usable with other arguments or 
functions too, e.g.

auto pos3 = find(haystack, assumeSorted(needle));
setIntersection(assumeSorted(haystack), assumeSorted(needle));

Interestingly, assumeSorted can actually do checking without impacting 
the complexity of the search. In debug mode, it can arrange to run 
random isSorted tests every 1/N calls, where N is the average length of 
the incoming arrays, then its complexity impact is amortized constant.


Andrei

If you introduce a dummy type, why not make it perform validation in a
debug build when something like debug=slowButSafe is set?

Mar 04 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Max Samukha wrote:
 
 If you introduce a dummy type, why not make it perform validation in a
 debug build when something like debug=slowButSafe is set?

Because in the case of binarySearch slowButSafe quickly becomes 
slowAndUseless. It's happened to me - I had an assert(isSorted) in 
binary search (I guess it's in one of the older phobos releases!) and 
when I was using the debug version, my program would take forever to run.

A debug build should at most change the constant multiplying the 
complexity, not the complexity.


Andrei

Mar 04 2009

Max Samukha <samukha voliacable.com.removethis> writes:

On Wed, 04 Mar 2009 10:27:55 -0800, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

Max Samukha wrote:
 
 If you introduce a dummy type, why not make it perform validation in a
 debug build when something like debug=slowButSafe is set?

Because in the case of binarySearch slowButSafe quickly becomes 
slowAndUseless. It's happened to me - I had an assert(isSorted) in 
binary search (I guess it's in one of the older phobos releases!) and 
when I was using the debug version, my program would take forever to run.

A debug build should at most change the constant multiplying the 
complexity, not the complexity.


Andrei

I intentionally proposed a special debug mode, not regular asserts,
which are on in any debug build. I would like, knowing that I can wait
a couple of days to make sure my program is correct, to be able to
turn on validation.
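
A rough sketch of what such an opt-in mode could look like, using D's debug identifiers. The identifier slowButSafe comes from the suggestion above; checkedContains is a made-up name, and the sketch uses a free function where the proposal talks about a wrapper type. The full check is compiled in when the program is built with -debug=slowButSafe, and because it is an assert it also needs asserts enabled (no -release).

```d
import std.algorithm.sorting : isSorted;
import std.range : assumeSorted;

/// Binary search with an optional, deliberately expensive sanity check.
/// checkedContains is a hypothetical helper for this sketch.
bool checkedContains(int[] data, int needle)
{
    // O(n) validation, compiled in when built with -debug=slowButSafe.
    debug (slowButSafe)
        assert(isSorted(data), "checkedContains: input is not sorted");

    return assumeSorted(data).contains(needle);
}

unittest
{
    assert(checkedContains([2, 4, 6, 8], 6));
    assert(!checkedContains([2, 4, 6, 8], 5));
}
```

This turns every O(log n) search into an O(n) one while the identifier is set, which is exactly the cost being argued about in this exchange.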

Mar 04 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Max Samukha wrote:
 On Wed, 04 Mar 2009 10:27:55 -0800, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Max Samukha wrote:
 If you introduce a dummy type, why not make it perform validation in a
 debug build when something like debug=slowButSafe is set?

 Because in the case of binarySearch slowButSafe quickly becomes 
 slowAndUseless. It's happened to me - I had an assert(isSorted) in 
 binary search (I guess it's in one of the older phobos releases!) and 
 when I was using the debug version, my program would take forever to run.

 A debug build should at most change the constant multiplying the 
 complexity, not the complexity.


 Andrei

 
 I intentionally proposed a special debug mode, not regular asserts,
 which are on in any debug build. I would like, knowing that I can wait
 a couple of days to make sure my program is correct, to be able to
 turn on validation.

I am waiting a couple of days in release mode with all optimizations 
turned on and wind from behind.

Andrei

Mar 04 2009

Max Samukha <samukha voliacable.com.removethis> writes:

On Wed, 04 Mar 2009 12:14:53 -0800, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

Max Samukha wrote:
 On Wed, 04 Mar 2009 10:27:55 -0800, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Max Samukha wrote:
 If you introduce a dummy type, why not make it perform validation in a
 debug build when something like debug=slowButSafe is set?

 Because in the case of binarySearch slowButSafe quickly becomes 
 slowAndUseless. It's happened to me - I had an assert(isSorted) in 
 binary search (I guess it's in one of the older phobos releases!) and 
 when I was using the debug version, my program would take forever to run.

 A debug build should at most change the constant multiplying the 
 complexity, not the complexity.


 Andrei

 
 I intentionally proposed a special debug mode, not regular asserts,
 which are on in any debug build. I would like, knowing that I can wait
 a couple of days to make sure my program is correct, to be able to
 turn on validation.

I am waiting a couple of days in release mode with all optimizations 
turned on and wind from behind.
Andrei

Ok

Mar 04 2009

Sean Kelly <sean invisibleduck.org> writes:

Andrei Alexandrescu wrote:
 Sean Kelly wrote:
 Georg Wrede wrote:
 The distinction is not whether you or others write stuff. It's about 
 whether it is for debugging *only*, as opposed to general input 
 validation.

 So I guess the real question is whether a function is expected to 
 validate its parameters.  I'd argue that it isn't, but then I'm from a 
 C/C++ background.  For me, validation is a debugging tool, or at least 
 an optional feature for applications that want the added insurance.

 
 Interesting. My policy is to favor validation whenever it doesn't impact 
 performance. Imagine for example that strlen() validated its input for 
 non-null. Would that show on the profiling chart of any C application? 
 No, unless the application's core loop only called strlen() on a 
 1-character string or so.

Interesting.  So the inexpensive checks would go in the function body 
itself, with the exhaustive extra stuff in contracts.  That does seem 
reasonable, though I still like the visual separation that the 'in' 
clause provides, and I'd love to be able to use the proposed inheritance 
feature of contracts, which seems like it might necessitate duplicating 
these inexpensive checks in the contract and in the function body itself.

 One simple case that clarifies the necessary tradeoff is binary search. 
 That assumes the range to be searched is sorted. If you actually checked 
 for that, it would render binary search useless as a linear search would 
 be in fact faster. So you need to assume. One way to do so is in the 
 documentation. You write in the docs that findSorted expects a sorted 
 range. Another way is to encode this information in the type of the 
 sorted range. But that's onerous as most of the time you have an array 
 you just sorted, not a SortedArray value.
 
 The approach I took with the new phobos is:
 
 int[] haystack;
 int[] needle;
 ...
 auto pos1 = find(haystack, needle); // linear
 sort(haystack);
 auto pos2 = find(assumeSorted(haystack), needle);
 
 The assumeSorted function wraps the haystack in an AssumeSorted!(int[]) 
 type without adding members or running extra code. It's there to clarify 
 to everyone what's going on. And it's usable with other arguments or 
 functions too, e.g.
 
 auto pos3 = find(haystack, assumeSorted(needle));
 setIntersection(assumeSorted(haystack), assumeSorted(needle));
 
 Interestingly, assumeSorted can actually do checking without impacting 
 the complexity of the search. In debug mode, it can arrange to run 
 random isSorted tests every 1/N calls, where N is the average length of 
 the incoming arrays, then its complexity impact is amortized constant.

One thing I've always really liked about pointer arguments is that they 
tend to document what's happening at the call site as well (because of 
the address-of operator typically needed to obtain the address of a 
variable).  I tend to avoid boolean parameters for similar reasons, 
unless the meaning can be communicated clearly at the call point.  It 
seems like this serves a similar purpose, and I like it despite the 
potential for a user accidentally calling the slow overload when he 
could actually use the fast one--better it be correct than fast, after all.

I'm not terribly fond of the added verbosity however, or that this seems 
like I couldn't use the property form:

     assumeSorted("abcd").find('c')

Truth be told, my initial inclination would be to repackage the binary 
search as a one-liner with a different name, which kind of sabotages the 
whole idea.  But I'll try to resist this urge.

Mar 04 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Sean Kelly wrote:
 I'm not terribly fond of the added verbosity however, or that this seems 
 like I couldn't use the property form:
 
     assumeSorted("abcd").find('c')
 
 Truth be told, my initial inclination would be to repackage the binary 
 search as a one-liner with a different name, which kind of sabotages the 
 whole idea.  But I'll try to resist this urge.

I understand. This is rather new, but I found it irresistibly cool to 
unify find() routines under one name and specify structure in the 
arguments' types. Usage is very easy, there's little to remember, and 
every piece of structure is where it should be. Consider:

int[] a = [ 1, 2, 3, 4 ];
int[] b = [ 2. 3 ];

Each of these calls performs the search in a different way because each is 
informed differently about the structure of its arguments:

find(a, b);
find(assumeSorted(a), b);
find(a, assumeSorted(b));
find(assumeSorted(a), assumeSorted(b));
find(a, boyerMooreFinder(b));

There are three names to remember that compose modularly. The 
run-of-the-mill approach is:

find(a, b);
binaryFind(a, b);
findRhsSorted(a, b);
binaryFindRhsSorted(a, b);
boyerMooreFind(a, b);

To add insult to injury, boyerMooreFind is not enough because it hides 
the structure created around b. So there's also need for one extra type 
e.g. BoyerMooreFinder!(int[]) for cases when there are multiple searches 
of the same thing. It's just onerous.
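
Part of this design can be tried against the Phobos that later shipped, where std.algorithm.searching provides both find and boyerMooreFinder. The sketch below covers only the plain and Boyer-Moore forms; in the released library the sorted case is spelled through the wrapper's own members rather than through find, so that part is not the syntax shown above.

```d
import std.algorithm.searching : boyerMooreFinder, find;
import std.range : assumeSorted;

unittest
{
    int[] a = [1, 2, 3, 4];
    int[] b = [2, 3];

    // Plain subrange search: returns a, advanced to the first match.
    assert(find(a, b) == [2, 3, 4]);

    // Same name, different strategy, selected by the argument's type.
    assert(find(a, boyerMooreFinder(b)) == [2, 3, 4]);

    // The finder is a value, so the preprocessing of b can be reused
    // across several searches.
    auto finder = boyerMooreFinder(b);
    assert(find([0, 2, 3, 9], finder) == [2, 3, 9]);
    assert(find([7, 8, 9], finder).length == 0);

    // Sorted data: the wrapper carries the assumption.
    assert(assumeSorted(a).contains(3));
}
```

The reusable finder value is the "extra type" mentioned above: with a single boyerMooreFind(a, b) function there would be nowhere to keep the tables built for b between calls.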


Andrei

P.S. By the way, this is the running example used in Chapter 4 of TDPL.

Mar 04 2009

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:
 int[] a = [ 1, 2, 3, 4 ];
 int[] b = [ 2. 3 ];

I guess you meant:

int[] b = [ 2, 3 ];

 find(a, b);
 find(assumeSorted(a), b);
 find(a, assumeSorted(b));
 find(assumeSorted(a), assumeSorted(b));
 find(a, boyerMooreFinder(b));

Are you talking about finding the position of a subarray into a bigger array?
Then the two useful cases are:
a.index(b);
a.indexBoyerMoore(b);
The other cases aren't common enough, I think.

Bye,
bearophile

Mar 04 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

bearophile wrote:
 Andrei Alexandrescu:
 int[] a = [ 1, 2, 3, 4 ];
 int[] b = [ 2. 3 ];

 
 I guess you meant:
 
 int[] b = [ 2, 3 ];
 
 find(a, b);
 find(assumeSorted(a), b);
 find(a, assumeSorted(b));
 find(assumeSorted(a), assumeSorted(b));
 find(a, boyerMooreFinder(b));

 
 Are you talking about finding the position of a subarray into a bigger array?
Then the two useful cases are:
 a.index(b);
 a.indexBoyerMoore(b);
 The other cases aren't common enough, I think.
 
 Bye,
 bearophile

Binary search is rather common.

As an aside, your use of "index" suggests you return integrals out of 
the function. IMHO that's strongly unrecommended.


Andrei

Mar 04 2009

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:
 Binary search is rather common.

Oh, yes, sorry, I meant among the ones you have listed there...


 As an aside, your use of "index" suggests you return integrals out of 
 the function. IMHO that's strongly unrecommended.

I don't want to use too much of your time (which may be better spent with
your new child), but I don't understand what you mean.
That index() function is meant to return the index position of the item or
sub-sequence in the bigger array (or iterable), and it returns -1 if not found.
This is a usual design.
Some people think that such checks for the -1 value aren't always done, so to
avoid that and some bugs, it's better to raise something like IndexException
when the needle isn't found.

Bye,
bearophile

Mar 04 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

bearophile wrote:
 Andrei Alexandrescu:
 Binary search is rather common.

 
 Oh, yes, sorry, I meant among the ones you have listed there...

Of five, three are frequent (linear, binary, Boyer-Moore), one is a form
of set intersection (find sorted in sorted), and the odd one is:

find(a, assumeSorted(b));

This is rare but has excellent best-case complexity (is it O(a.length /
b.length)?) and is easy to add for completeness.

 As an aside, your use of "index" suggests you return integrals out
 of the function. IMHO that's strongly unrecommended.

 
 I don't want to use too much of your time (that it may be better
 spent with your new child), but I don't understand what you mean. 
 That index() function is meant the index position of the item or
 sub-sequence into the bigger array (or iterable), and it returns -1
 if not found. This is an usual design.

This is an extremely sloppy design. That it is usual doesn't make things
any better!

 Some people think that such controls for -1 value aren't always done,
 so to avoid that and some bugs, it's better to raise something like
 IndexException when the needle isn't found.

Yah, this is the subject of a long rant but in short: returning int from 
find means essentially that find is unusable with anything except 
random access structures. This in turn means you'd have to have 
different means, APIs, and user code to deal with e.g. lists, in spite 
of the fact that linear search is the same boring thing for all: look at 
the current thing, yes/no, move on to the next thing. IMHO ever since 
the STL has seen the light of day there is no excuse, not even sheer 
ignorance, to ever traffic in integers as a means to access elements in 
containers, in any language that has even the most modest parameterized 
types capability.

Returning int from find is an insult. To add injury to it, have find 
also return an int for a list.

"Is this item in this list?"
"Yeppers. It's the 538th element. Took me a hike to find it."
"Well I wanted to do something with it."
"Then go get it. I'm telling you, it'll take exactly 538 steps."


Andrei

Mar 04 2009
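
A minimal sketch of the point made in the post above, written against present-day Phobos rather than the 2009 library under discussion. The helper name replaceFirst is hypothetical. Because find returns a range positioned at the match instead of an integer, the same code can use the element directly on an array and on a singly linked list, without walking the list a second time.

```d
import std.algorithm.comparison : equal;
import std.algorithm.searching : find;
import std.container.slist : SList;
import std.range.primitives : empty, front;

/// Hypothetical helper: overwrite the first element equal to `oldValue`.
/// Works for any range whose `front` is assignable.
bool replaceFirst(R, T)(R haystack, T oldValue, T newValue)
{
    auto hit = find(haystack, oldValue); // range that starts at the match
    if (hit.empty)
        return false;                    // "not found" is just an empty range
    hit.front = newValue;                // use the element where we already stand
    return true;
}

unittest
{
    int[] arr = [3, 1, 4, 1, 5];
    assert(replaceFirst(arr, 4, 40));
    assert(arr == [3, 1, 40, 1, 5]);

    auto list = SList!int(3, 1, 4, 1, 5);
    assert(replaceFirst(list[], 4, 40)); // same call, no index arithmetic
    assert(equal(list[], [3, 1, 40, 1, 5]));

    assert(!replaceFirst(arr, 42, 0));
}
```

With an index-returning search, the list case would need a second traversal to reach the element again, which is the "538 steps" complaint.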

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:
 Of five, three are frequent (linear, binary, Boyer-Moore), one is a form
 of set intersection (find sorted in sorted),

Yeah. Sorted-array-based sets are probably common enough in C++ code (but I
generally use something equivalent to hash-sets for this purpose; I have even
mostly implemented such a class for D1 in the dlibs, with a nice API).
I see sorted-array-based sets as an optimization to use in special cases,
while hash-sets are for the general case when you want to manage sets. (In D,
hashing needs a comparison too because the chains are external and
tree-based, but in some other languages hash-sets need just hashability
and not sortability of items.)


 and the odd one is:
 find(a, assumeSorted(b));
 This is rare but has excellent best-case complexity (is it O(a.length /
 b.length)?) and is easy to add for completeness.

I suggest you not add this uncommon case, and add it only if later there are
enough people asking for it. Less code to write and maintain.
Creating functions and algorithms is a matter of balance between
over-generalization (that often leads to useless complexity and longer syntax)
and too much "special casing" that has other problems. Neither extreme is good.
Life isn't easy, I guess we are asking you to be a cross between Alexander
Stepanov and Guido van Rossum :o)


 Returning int from find is an insult. To add injury to it, have find 
 also return an int for a list.

So you return an iterator, and no exception is raised, I see. I like this
enough.

Bye,
bearophile

Mar 04 2009

Steve Schveighoffer <schveiguy yahoo.com> writes:

On Wed, 04 Mar 2009 13:55:10 -0800, Andrei Alexandrescu wrote:

 bearophile wrote:
 Andrei Alexandrescu:
 Binary search is rather common.

 
 Oh, yes, sorry, I meant among the ones you have listed there...

 
 Of five, three are frequent (linear, binary, Boyer-Moore), one is a form
 of set intersection (find sorted in sorted), and the odd one is:
 
 find(a, assumeSorted(b));
 
 This is rare but has excellent best-case complexity (is it O(a.length /
 b.length)?) and is easy to add for completeness.
 
 As an aside, your use of "index" suggests you return integrals out of
 the function. IMHO that's strongly unrecommended.

 
 I don't want to take up too much of your time (it may be better spent
 with your new child), but I don't understand what you mean. That
 index() function is meant to return the index position of the item or
 sub-sequence within the bigger array (or iterable), and it returns -1 if
 not found. This is a usual design.

 
 This is an extremely sloppy design. That it is usual doesn't make things
 any better!
 
 Some people think that such checks for the -1 value aren't always done,
 so to avoid that and some bugs, it's better to raise something like
 IndexException when the needle isn't found.

 
 Yah, this is the subject of a long rant but in short: returning int from
 find means that find is essentially unusable with anything except
 random access structures. This in turn means you'd have to have
 different means, APIs, and user code to deal with e.g. lists, in spite
 of the fact that linear search is the same boring thing for all: look at
 the current thing, yes/no, move on to the next thing. IMHO ever since
 the STL has seen the light of day there is no excuse, not even sheer
 ignorance, to ever traffic in integers as a means to access elements in
 containers, in any language that has even the most modest parameterized
 types capability.
 
 Returning int from find is an insult. To add injury to it, have find
 also return an int for a list.
 
 "Is this item in this list?"
 "Yeppers. It's the 538th element. Took me a hike to find it." "Well I
 wanted to do something with it." "Then go get it. I'm telling you, it'll
 take exactly 538 steps."

BTW, what happens if you pass a sorted list into find?  Intuitively, 
you'd assume you can pass as assumeSorted?  But you can't really do 
anything but linear search?

Then what if you pass a tree structure into find?  It's sorted, but not 
exactly random access...

I think it's fine for find to return an iterator/pointer, but I don't think 
there's a find template that can be used on all possible container 
types.  Probably the best solution is to implement find inside 
the container itself, and have the global find function advance a range 
to the point of the element (or return an empty range if not found), with 
"quick" searches reserved for sorted random-access ranges.  Note that the STL 
works this way.

-Steve

Mar 04 2009
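
The split proposed in the post above (a generic find that walks a range, plus a search implemented by the container itself) can be sketched with today's std.container.rbtree. This is only an illustration under that assumption; the helper names linearHas and nativeHas are hypothetical, and the 2009 library being discussed may have looked different.

```d
import std.algorithm.searching : find;
import std.container.rbtree : redBlackTree;
import std.range.primitives : empty;

/// Generic algorithm: visits the tree's elements one by one, O(n).
bool linearHas(Tree)(Tree tree, int x)
{
    return !find(tree[], x).empty;
}

/// Container-provided search: follows the tree structure, O(log n).
bool nativeHas(Tree)(Tree tree, int x)
{
    return x in tree;
}

unittest
{
    auto tree = redBlackTree(50, 20, 80, 10, 30);

    assert(linearHas(tree, 30) && nativeHas(tree, 30));
    assert(!linearHas(tree, 99) && !nativeHas(tree, 99));

    // The container can also hand back a range of matching elements.
    assert(!tree.equalRange(30).empty);
    assert(tree.equalRange(99).empty);
}
```

Both calls give the same answer; only the container's own method can exploit a structure that is sorted but not random access.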

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steve Schveighoffer wrote:
 BTW, what happens if you pass a sorted list into find?  Intuitively, 
 you'd assume you can pass as assumeSorted?  But you can't really do 
 anything but linear search?

It's up to the designer of the API. Passing a sorted forward range into 
find will cut the average search time in half, which does not improve 
complexity.

 Then what if you pass a tree structure into find?  It's sorted, but not 
 exactly random access...
 
 I think it's fine for find to return an iterator/pointer, but I don't think 
 there's a find template that can be used on all possible container 
 types.  Probably the best solution is to implement find inside 
 the container itself, and have the global find function advance a range 
 to the point of the element (or return an empty range if not found), with 
 "quick" searches reserved for sorted random-access ranges.  Note that the STL 
 works this way.

Yah, that's right. The coolness of the technique comes mostly when the 
structure that can help searching is not obvious at the type system 
level (e.g. "is sorted") or is present in the needle, not the haystack 
(Boyer-Moore). I believe this approach is superior to STL's.

I still think it's cool to have find operate on a variety of haystacks 
and needles. That way users don't need to fiddle with details of 
changing call syntax etc.


Andrei

Mar 04 2009

Christopher Wright <dhasenan gmail.com> writes:

Andrei Alexandrescu wrote:
 Steve Schveighoffer wrote:
 BTW, what happens if you pass a sorted list into find?  Intuitively, 
 you'd assume you can pass as assumeSorted?  But you can't really do 
 anything but linear search?

 
 It's up to the designer of the API. Passing a sorted forward range into 
 find will cut the average search time in half, which does not improve 
 complexity.

How does this work? find in a sorted linked list has the same expected 
runtime as in an unsorted one -- on average, 0.5 * length. Assuming that 
comparisons and iterating are equally expensive, that is. If you assume 
that comparison is more expensive, you can do better, though cheaper 
comparisons don't benefit you at all.

Mar 05 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Christopher Wright wrote:
 Andrei Alexandrescu wrote:
 Steve Schveighoffer wrote:
 BTW, what happens if you pass a sorted list into find?  Intuitively, 
 you'd assume you can pass as assumeSorted?  But you can't really do 
 anything but linear search?

 It's up to the designer of the API. Passing a sorted forward range 
 into find will cut the average search time in half, which does not 
 improve complexity.

 
 How does this work? find in a sorted linked list has the same expected 
 runtime as in an unsorted one -- on average, 0.5 * length. Assuming that 
 comparisons and iterating are equally expensive, that is. If you assume 
 that comparison is more expensive, you can do better, though cheaper 
 comparisons don't benefit you at all.

We're saying the same thing.

Andrei

Mar 05 2009
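
A small sketch of what sortedness can and cannot buy on a forward-only structure, as debated in the two posts above. The function containsSorted and its step counter are hypothetical, introduced only for illustration. A hit still costs as many steps as in an unsorted list; the saving is that a miss can stop as soon as a larger element is seen. The search remains O(n) either way.

```d
import std.container.slist : SList;
import std.range.primitives : isInputRange;

/// Linear search over a range assumed to be sorted ascending.
/// `steps` reports how many elements were examined.
bool containsSorted(R, T)(R haystack, T needle, out size_t steps)
if (isInputRange!R)
{
    foreach (x; haystack)
    {
        ++steps;
        if (x == needle)
            return true;
        if (x > needle)
            return false; // we are past the spot where needle would sit
    }
    return false;
}

unittest
{
    auto list = SList!int(1, 3, 5, 7, 9);
    size_t steps;

    // Miss: stops at 5 instead of walking all five elements.
    assert(!containsSorted(list[], 4, steps) && steps == 3);

    // Hit on the last element: no saving compared with an unsorted list.
    assert(containsSorted(list[], 9, steps) && steps == 5);
}
```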

Sean Kelly <sean invisibleduck.org> writes:

Andrei Alexandrescu wrote:
 
 Returning int from find is an insult. To add injury to it, have find 
 also return an int for a list.

tango.core.Array deals exclusively with indexes, but its aim is to make 
D's built-in arrays look more like a robust type than to provide a 
general set of algorithms usable with containers, etc.  So in that 
instance, I think the decision is justifiable.  It certainly makes for 
some nifty code when combined with the slice syntax.

Mar 04 2009
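
For built-in arrays specifically, an index does combine neatly with slices, as the post above says. The sketch below uses Phobos' countUntil as a stand-in; it does not reproduce the tango.core.Array API, and the helper names beforeFirst and afterFirst are hypothetical.

```d
import std.algorithm.searching : countUntil;

/// Everything before the first `sep`, or the whole array if `sep` is absent.
int[] beforeFirst(int[] data, int sep)
{
    immutable i = countUntil(data, sep); // -1 when not found
    return i < 0 ? data : data[0 .. i];
}

/// Everything after the first `sep`, or an empty slice if `sep` is absent.
int[] afterFirst(int[] data, int sep)
{
    immutable i = countUntil(data, sep);
    return i < 0 ? null : data[i + 1 .. $];
}

unittest
{
    int[] data = [1, 2, 0, 3, 4];
    assert(beforeFirst(data, 0) == [1, 2]);
    assert(afterFirst(data, 0) == [3, 4]);
    assert(afterFirst([1, 2], 0).length == 0);
}
```

This works because array indexing is O(1); the same pattern on a linked list would pay for the traversal twice.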

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:
 The approach I took with the new phobos is:
 
 int[] haystack;
 int[] needle;
 ...
 auto pos1 = find(haystack, needle); // linear
 sort(haystack);
 auto pos2 = find(assumeSorted(haystack), needle);

In my dlibs I do it in a simpler and shorter way:

auto pos1 = haystack.index(needle);
haystack.sort();
auto pos2 = haystack.bisect(needle);

Here there's no need to give the same name to two very different functions.

If you really like to use a single function name, with named arguments you may
also do (bisect is false by default):

auto pos1 = haystack.index(needle);
haystack.sort();
auto pos2 = haystack.index(needle, bisect=true);

Bye,
bearophile

Mar 04 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

bearophile wrote:
 Andrei Alexandrescu:
 The approach I took with the new phobos is:
 
 int[] haystack;
 int[] needle;
 ...
 auto pos1 = find(haystack, needle); // linear
 sort(haystack);
 auto pos2 = find(assumeSorted(haystack), needle);

 
 In my dlibs I do it in a simpler and shorter way:
 
 auto pos1 = haystack.index(needle);
 haystack.sort();
 auto pos2 = haystack.bisect(needle);
 
 Here there's no need to give the same name to two very different
 functions.

They do the exact same thing. Unifying them under the same name is good
abstraction.

 If you really like to use a single function name, with named
 arguments you may also do (bisect is false by default):
 
 auto pos1 = haystack.index(needle);
 haystack.sort();
 auto pos2 = haystack.index(needle, bisect=true);

I prefer encoding structural information in types.


Andrei

Mar 04 2009
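
One way to read "encoding structural information in types", sketched with present-day Phobos: sort and assumeSorted both return a SortedRange, and a single function can pick its search strategy from the argument type at compile time. The names has and strategy are hypothetical, and this sketch uses SortedRange.contains rather than the find overload mentioned in the thread.

```d
import std.algorithm.searching : canFind;
import std.algorithm.sorting : sort;
import std.range : SortedRange, assumeSorted;
import std.traits : isInstanceOf;

/// Names the strategy chosen for a given range type (illustration only).
enum strategy(R) = isInstanceOf!(SortedRange, R) ? "binary" : "linear";

/// One name for "is needle in haystack"; the type decides how to search.
bool has(R)(R haystack, int needle)
{
    static if (isInstanceOf!(SortedRange, R))
        return haystack.contains(needle); // binary search
    else
        return haystack.canFind(needle);  // linear search
}

unittest
{
    int[] data = [5, 3, 9, 1];
    static assert(strategy!(typeof(data)) == "linear");
    assert(has(data, 9));

    auto sorted = sort(data);             // result type records sortedness
    static assert(strategy!(typeof(sorted)) == "binary");
    assert(has(sorted, 9) && !has(sorted, 4));

    // The caller vouches for data that is already known to be sorted.
    assert(has(assumeSorted([1, 2, 3]), 2));
}
```

Compared with a bisect=true flag, a caller cannot request the binary path by accident on data that was never declared sorted.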

Derek Parnell <derek psych.ward> writes:

On Wed, 04 Mar 2009 08:12:48 -0800, Sean Kelly wrote:

 So I guess the real question is whether a function is expected to 
 validate its parameters.  I'd argue that it isn't, but then I'm from a 
 C/C++ background.  For me, validation is a debugging tool, or at least 
 an optional feature for applications that want the added insurance.

The rule-of-thumb that I use is that a function needs to validate a
parameter if that parameter /can/ come from user input and /may not/ have
been previously validated and is /critical/ to the success of the
function's behaviour.

If all of these are true, it means that the function has a potential to
fail if it doesn't take the responsibility of parameter validation.

If a parameter can only come from other functions, which are already
guaranteed to only emit validated data, the parameter data does not need
re-validation. However, even for some of these functions a 'contract'
validation of input parameters might be needed if you are attempting to
validate the logic or data flow, rather than the contents of the data
itself.

Contract validation of function results is not the same thing as input
validation. Output validation is an attempt to prove that the function's
logic is correct.

Input validation is not a debugging tool. It is a chance to inform the
program's user that they might have given the program some wrong
information to work with.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

Mar 04 2009

Walter Bright <newshound1 digitalmars.com> writes:

Sean Kelly wrote:
 Why should contracts be limited to parameter checking of internally used
 functions only?  If I write a function and document parameter constraints
 then I certainly expect those constraints to be followed regardless of
 whether I'm calling the function or someone else is calling the function.
 Checking these via a contract simply provides an optional means of
 ensuring that a logic error didn't occur within the program as a whole.
 
 If you're talking about application input however, then I agree completely.
 ie. stuff typed in by the user, read from a file, etc, should never be
 validated within a contract because an input failure at that level doesn't
 represent a program logic error but rather user error.  An assertion
 failure isn't a terribly good way of notifying the user that they shouldn't
 have put an alphabetic character in a box intended to receive an
 integer :-)

Your "users" are anyone external to your built binary. That means that 
dll's should not use contracts to validate arguments passed to the dll's 
entry points.

If you're doing a library to be statically linked, it is debatable, and 
a decision you (as the library developer) need to make.

Mar 04 2009
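
The distinction drawn in the posts above can be sketched as follows. This is an illustration, not anyone's actual code: percentOf and parseWhole are hypothetical names, and the syntax is current D (in ... do), whereas in 2009 the body keyword was used instead of do. The in contract documents a rule for callers inside the program and disappears when compiling with -release; the checks on user-supplied text use exceptions and stay in every build.

```d
import std.conv : ConvException, to;
import std.exception : assertThrown, enforce;

/// Internal helper: a zero `whole` here would be a program logic error.
int percentOf(int part, int whole)
in
{
    assert(whole != 0, "whole must be non-zero");
}
do
{
    return part * 100 / whole;
}

/// Boundary function: text typed by a user is always validated.
int parseWhole(string text)
{
    int value;
    try
    {
        value = to!int(text);
    }
    catch (ConvException)
    {
        throw new Exception("expected an integer, got: " ~ text);
    }
    enforce(value != 0, "the total must not be zero");
    return value;
}

unittest
{
    assert(percentOf(1, parseWhole("4")) == 25);
    assertThrown!Exception(parseWhole("abc")); // user error, reported as such
    assertThrown!Exception(parseWhole("0"));
}
```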

Derek Parnell <derek psych.ward> writes:

On Tue, 03 Mar 2009 11:00:36 -0800, Walter Bright wrote:

 Contracts are not for input validation!

Hear! Hear! This is exactly correct.


-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

Mar 03 2009

Christopher Wright <dhasenan gmail.com> writes:

Walter Bright wrote:
 Georg Wrede wrote:
 We've had Walter make nice features to D that were laborious to 
 create, only to see nobody use them. It's happened, ask him.

 
 Sure. Often the only way to see if a feature is useful is to actually 
 implement it and see what happens. Some features have succeeded and 
 found uses far beyond my expectations (CTFE, string mixins) while others 
 have pretty much languished (design by contract, complex numbers).

I fucking love contracts. I need to use them more, but I do use them.

Mar 03 2009

Christopher Wright <dhasenan gmail.com> writes:

Georg Wrede wrote:
 (You know, a few years ago we had a major conversation here about 
 whether non-ASCII variable names should be accepted in D. The end result 
 is, yes. (I just tried it.) Now, how can an international team cowork on 
 a project where variable names are written so the other folks can't even 
 type them with their keyboards???

On the other hand, if you have a Chinese development team, why should 
they be limited to ASCII variable names? It doesn't make sense for them.

 -- All very nice, but no cigar. That's 
 about as smart as letting people define *unlimited* length variable names!)

I recently dealt with a programming language that specified a limit of 
63 characters for identifier names. This wouldn't have been a 
significant problem, except that I was generating code automatically, 
and some of my identifiers were over 90 characters. Identifier length 

identifiers to 4096 characters).

Mar 02 2009

Walter Bright <newshound1 digitalmars.com> writes:

Christopher Wright wrote:
 -- All very nice, but no cigar. That's about as smart as letting 
 people define *unlimited* length variable names!)

 
 I recently dealt with a programming language that specified a limit of 
 63 characters for identifier names. This wouldn't have been a 
 significant problem, except that I was generating code automatically, 
 and some of my identifiers were over 90 characters. Identifier length 

 identifiers to 4096 characters).

As soon as you put in a limit on identifier name length, sooner or later 
you'll get a bug report on it.

For example, C++ can be compiled to C code. C++ templates encode their 
entire state into the template instance identifier, and these can easily 
reach 10,000 characters or more. So if your C compiler has a length 
limit on identifiers, then C++ templates become severely limited.

Another thing to consider is it's actually *more* work to put a limit 
on, where you have to document it, explain it, detect it, diagnose it, 
recover from it, than if you just make it unlimited.

There are really only 3 numbers in computer programming: 0, 1, and 
unlimited. I always chuckle when I see an ad for like, an editor, that 
says "up to 5 files open at once!".

Mar 02 2009
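
The post above talks about C++ templates compiled to C, but the same effect can be observed in D through the mangleof property. The sketch below is only an analogy under that assumption; Wrap and the alias names are hypothetical. Each level of template nesting is encoded into the symbol name, so the name grows with the nesting depth and a fixed identifier limit would eventually be hit by generated code.

```d
/// A trivial wrapper template, used only to build nested instantiations.
struct Wrap(T)
{
    T value;
}

alias One = Wrap!int;
alias Three = Wrap!(Wrap!(Wrap!int));

/// Length of the mangled (linker-level) name of a type.
size_t mangledLength(T)()
{
    return T.mangleof.length;
}

unittest
{
    // The exact lengths depend on the compiler's mangling scheme,
    // so only the direction of growth is checked here.
    assert(mangledLength!Three > mangledLength!One);
}
```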

Georg Wrede <georg.wrede iki.fi> writes:

Walter Bright wrote:
 Christopher Wright wrote:
 Georg Wrede wrote:
 -- All very nice, but no cigar. That's about as smart as letting 
 people define *unlimited* length variable names!)

 I recently dealt with a programming language that specified a limit of 
 63 characters for identifier names. This wouldn't have been a 
 significant problem, except that I was generating code automatically, 
 and some of my identifiers were over 90 characters. Identifier length 

 limits identifiers to 4096 characters).

 
 As soon as you put in a limit on identifier name length, sooner or later 
 you'll get a bug report on it.
 
 For example, C++ can be compiled to C code. C++ templates encode their 
 entire state into the template instance identifier, and these can easily 
 reach 10,000 characters or more. So if your C compiler has a length 
 limit on identifiers, then C++ templates become severely limited.
 
 Another thing to consider is it's actually *more* work to put a limit 
 on, where you have to document it, explain it, detect it, diagnose it, 
 recover from it, than if you just make it unlimited.
 
 There are really only 3 numbers in computer programming: 0, 1, and 
 unlimited. I always chuckle when I see an ad for like, an editor, that 
 says "up to 5 files open at once!".

I take it back.

Mar 02 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Georg Wrede wrote:
 *** How to print arrays ***
 
 You print arrays in a predictable and expected way.
 
 D array printing is for non-GUI stuff. Hence, you use the C locale, period.

I think the C locale (or any predefined locale) tells what left bracket 
I should use for array, what separator, and what right bracket. For now 
the left and right brackets were eliminated because the user can easily 
add them on the caller side. The separator is a space simply because it 
looks the least harmful. But for example I don't have a good solution 
for what to print as the separator between a hash key and a hash value. 
A simple, extensible locale support would have allowed me to stop 
worrying about that.

Also, D array printing is not only for console - a GUI may use to!string 
with arrays.

But overall I guess I'll let myself be bludgeoned into complacency...


Andrei

Mar 02 2009

grauzone <none example.net> writes:

What is language specific about how an array is formatted? I think 
you're abusing the locale stuff as some kind of user customization 
mechanism for format().

Mar 02 2009

Daniel Keep <daniel.keep.lists gmail.com> writes:

Andrei Alexandrescu wrote:
 Georg Wrede wrote:
 *** How to print arrays ***

 You print arrays in a predictable and expected way.

 D array printing is for non-GUI stuff. Hence, you use the C locale,
 period.

 
 I think the C locale (or any predefined locale) tells what left bracket
 I should use for array, what separator, and what right bracket. For now
 the left and right brackets were eliminated because the user can easily
 add them on the caller side. The separator is a space simply because it
 looks the least harmful. But for example I don't have a good solution
 for what to print as the separator between a hash key and a hash value.
 A simple, extensible locale support would have allowed me to stop
 worrying about that.
 
 Also, D array printing is not only for console - a GUI may use to!string
 with arrays.
 
 But overall I guess I'll let myself be bludgeoned into complacency...
 
 
 Andrei

As far as I'm concerned, an array should be printed as close to how it
would be represented in the language as possible.  If the user needs to
format the array, then they need to format the array, not the runtime.

  -- Daniel

Mar 02 2009
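
For reference, a sketch of how this trade-off looks in present-day Phobos, which postdates this thread: the default %s output is close to the literal syntax, and the caller picks other brackets and separators with a compound format specifier rather than through a locale. The helper name bracketed is hypothetical.

```d
import std.format : format;

/// Caller-side formatting: brackets and separator chosen explicitly.
string bracketed(int[] values)
{
    return format("[%(%s, %)]", values);
}

unittest
{
    // Default: close to how the array would be written in source code.
    assert(format("%s", [1, 2, 3]) == "[1, 2, 3]");

    // Caller-chosen separators and brackets, no locale involved.
    assert(format("%(%s; %)", [1, 2, 3]) == "1; 2; 3");
    assert(format("<%(%s|%)>", [1, 2, 3]) == "<1|2|3>");
    assert(bracketed([1, 2, 3]) == "[1, 2, 3]");

    // Key/value separator for an associative array (one entry, so the
    // unspecified iteration order does not matter).
    assert(format("%(%s=%s; %)", [1: 10]) == "1=10");
}
```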

Sergey Gromov <snake.scaly gmail.com> writes:

Mon, 02 Mar 2009 09:34:32 +0200, Georg Wrede wrote:

 Of course, eventually we will want to "do something" about this. But 
 that should be left to the day when real issues are all sorted out in 
 D. This is a non-urgent, low-priority thing.


 
 Had there been any need for locales, believe me, the "foreigners" in 
 this NG would have asked for it.

I'm Russian.  For me, encoding problems are a PITA of such epic
proportions that little format inconsistencies simply fade away.  Yes
it's sometimes hard to decipher what 02/03/08 means since our custom is
to put the day first and separate with dots.  But compare this to Adobe Flex
SDK, which prints half of its compiler error messages in Russian (thank you
Adobe!) using the system default code page, 1251, while the default /console/
code page is actually so-called IBM 866.  Whenever I use MXML compiler
from console I get rubbish for error messages.  And there is no way to
disable translation--I've found none.  Phobos is no better.  Any
exception resulting from an invalid OS call dumps UTF-8 garbage instead
of an error message.  std.file.read("non-existent") for instance.

I think games are not an issue.  I've worked for a company producing
cell phone games for a long time.  I've localized my game for Chinese
market, too.  The thing is, game interfaces are always custom, always
ad-hoc.  They *never* work in untested locales.  Well, with some
experience you can make them work most of the time in languages you are
familiar with, from a localization perspective.  Anyway, all you need to
know is an ID of a supported locale so that you can replace text and
locale-specific images accordingly.  Then you have correctors and native
testing to make sure the localization works.

Mar 02 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Sergey Gromov wrote:
 Mon, 02 Mar 2009 09:34:32 +0200, Georg Wrede wrote:
 
 Of course, eventually we will want to "do something" about this. But 
 that should be left to the day when real issues are all sorted out in 
 D. This is a non-urgent, low-priority thing.


 Had there been any need for locales, believe me, the "foreigners" in 
 this NG would have asked for it.

 
 Phobos is no better.  Any
 exception resulting from an invalid OS call dumps UTF-8 garbage instead
 of an error message.  std.file.read("non-existent") for instance.

This is serendipitous. I just posted an example involving throwing a 
localized "File not found" exception. Please let me know whether that 
would help.

Andrei

Mar 02 2009

Yigal Chripun <yigal100 gmail.com> writes:

Sergey Gromov wrote:
 Mon, 02 Mar 2009 09:34:32 +0200, Georg Wrede wrote:

 Of course, eventually we will want to "do something" about this. But
 that should be left to the day when real issues are all sorted out in
 D. This is a non-urgent, low-priority thing.


 Had there been any need for locales, believe me, the "foreigners" in
 this NG would have asked for it.

 I'm Russian.  For me, encoding problems are a PITA of such epic
 proportions that little format inconsistencies simply fade away.  Yes
 it's sometimes hard to decipher what 02/03/08 means since our custom is
 to put day first and separate with dots.  But compare this to Adobe Flex
 SDK which prints half compiler error messages in Russian (thank you
 Adobe!) using system default code page, 1251, while default /console/
 code page is actually so-called IBM 866.  Whenever I use MXML compiler
 from console I get rubbish for error messages.  And there is no way to
 disable translation--I've found none.  Phobos is no better.  Any
 exception resulting from an invalid OS call dumps UTF-8 garbage instead
 of an error message.  std.file.read("non-existent") for instance.

 I think games are not an issue.  I've worked for a company producing
 cell phone games for a long time.  I've localized my game for Chinese
 market, too.  The thing is, game interfaces are always custom, always
 ad-hoc.  They *never* work in untested locales.  Well, with some
 experience you can make them work most of the time in languages you are
 familiar with, from localization perspective.  Anyway, all you need to
 know is an ID of a supported locale so that you can replace text and
 locale-specific images accordingly.  Then you have correctors and native
 testing to make sure the localization works.

Encoding isn't that hard compared to other issues.
For instance, have you ever tried to make a website work in both text
directions (left-to-right and right-to-left)?

Mar 02 2009

Don <nospam nospam.com> writes:

Georg Wrede wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
 There will be a global reference to a Locale class, e.g. 
 defaultLocale. By default the reference will be null, implying the C 
 locale should be in effect. Applications can assign to it as they 
 find fit, and also pass around multiple locale variables.

 I disagree with being able to assign to the global defaultLocale. This 
 is going to cause endless problems. Just one is that any function that 
 uses locale can no longer be pure. defaultLocale should be immutable.

 
 The two programs that are most "locale aware" are usually spread sheets 
 and word processors.

And Microsoft products do "locale awareness" so badly, I'm pretty sure 
there's no simple solution. (<gripe> They could at least recognize that 
outside the US, everyone uses A4-size paper, not that bizarro 
letter/legal stuff </gripe>).

 It is usual that the user needs to write, say, in Swedish or in Russian, 
 while in a Finnish setting. Or that one wants to use a decimal separator 
 other than what is "proper" for the country.
 
 For example, a lot of people use "." instead of the official "," in 
 Finland, and many use time as "18:23" instead of "18.23".

This is my experience as well. There's an awful lot of expats in the world.

 For this purpose, these programs let the users define these any way they 
 want.
 
 I think the notion of locales is, slowly but steadily, going away.
 
 It was a nice idea at the time, but with two problems: users don't use 
 it, and programmers don't use it.

I think the whole idea is based on a fallacy: that there IS a locale.
The idea that you can choose which currency symbol to use, based on 
where the computer is, is utterly absurd. Surely these days, nearly 
everyone has to deal with the Euro, the US dollar, the Pound, and the 
Yen, as well as their local currency.

The world is international now, not local.

I nearly always end up setting the locale to "Antarctica", it turns off 
most of the locale logic <g>. There are so many programs that try to be too 
clever.

 Of course, eventually we will want to "do something" about this. But 
 that should be left to the day when real issues are all sorted out in D. 
 This is a non-urgent, low-priority thing.

Mar 02 2009
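
Two points from this exchange can be combined in a sketch: users want to pick individual conventions (decimal separator, time separator) regardless of country, and functions that read a mutable global locale cannot be pure. Everything below is hypothetical: FormatPrefs is not a Phobos type and only stands in for whatever a std.locale might provide. Preferences are passed explicitly as an immutable value, so the formatting functions stay pure. Amounts are assumed to be non-negative.

```d
import std.format : format;

/// Hypothetical user-chosen preferences, independent of any country.
struct FormatPrefs
{
    char decimalSeparator = '.';
    char timeSeparator = ':';
}

/// Pure: the result depends only on the arguments, not on global state.
string formatTime(int hours, int minutes, in FormatPrefs prefs) pure @safe
{
    return format("%02d%s%02d", hours, prefs.timeSeparator, minutes);
}

/// Formats a non-negative amount given in hundredths.
string formatAmount(long cents, in FormatPrefs prefs) pure @safe
{
    return format("%d%s%02d", cents / 100, prefs.decimalSeparator, cents % 100);
}

unittest
{
    immutable official = FormatPrefs(',', '.'); // the "proper" Finnish style
    immutable personal = FormatPrefs('.', ':'); // what many people type anyway

    assert(formatTime(18, 23, official) == "18.23");
    assert(formatTime(18, 23, personal) == "18:23");
    assert(formatAmount(1250, official) == "12,50");
    assert(formatAmount(1250, personal) == "12.50");
}
```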

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Don wrote:
 there's no simple solution. (<gripe> They could at least recognize that 
 outside the US, everyone uses A4-size paper, not that bizarro 
 letter/legal stuff </gripe>).

Amen!

 It is usual that the user needs to write, say, in Swedish or in 
 Russian, while in a Finnish setting. Or that one wants to use a 
 decimal separator other than what is "proper" for the country.

 For example, a lot of people use "." instead of the official "," in 
 Finland, and many use time as "18:23" instead of "18.23".

 
 This is my experience as well. There's an awful lot of expats in the world.

Not just expats.
For example: I was born & raised in the Netherlands, but even though 
officially we use a decimal comma here, I almost always use a decimal 
point instead. This may have been caused by use of the US keyboard 
layout (and its numeric keypad in particular), but I now even catch 
myself using it when writing with a pen...

 I nearly always end up setting the locale to "Antarctica", it turns off 
 most of the locale logic <g>. There are so many programs that try to be too 
 clever.

lol :)

Mar 02 2009

Georg Wrede <georg.wrede iki.fi> writes:

Andrei Alexandrescu wrote:
 Sooner or later that will need to be defined. I know next to nothing 
 about locales. (I know I dislike the design C++ uses.)


D uses Utf-8, and that is *good enough*!

This lets my programs "understand" Finnish, and doesn't give me undue 
headaches.


Seriously tending to locale issues would be an *endless swamp*. Just for 
this, I looked up something suitable to read:

http://www.manpagez.com/man/1/perllocale/

It may even be that you would find the time, but think about Walter and 
us, please. There *really are* other things to do.


An excellent string hierarchy without the entire rest of i18n, is only 
going to look like a Ferrari with a Trabant engine. Which is worse than 
nothing at all.

Besides, there's more to this than just designing the perfect, or even a 
good locale system in a language. *Somebody should actually use it*.

Now, the non-English programmer, what does he really want? He wants to 
be able to type stuff into his program in his native character set. D 
already does that, by way of Utf-8.

What else? Well, it is conceivable that he wants his program to print 
dates and times the way it's done over there. He simply writes the 
program "by hand" so it does dates and times like he wants. Even if 
there was a locale thing in the language, he wouldn't bother with the 
hassle. And he couldn't care less about Urdu.

The hypothetical Ambitious Programmer might want to use locale. He could 
then have the dates and times (and currencies, etc.) follow the country. 
Now, that might sound commendable, but in practice it *crumbles*.
He can't possibly know how to deal with languages that are written 
backwards, languages where several characters make one letter, exotic 
ways of writing dates, etc.

So, his fancy i18n project is doomed to be, at most, as usable as the 
"normal" D program. Probably less, since his decisions will actually 
worsen the user experience -- for users in another culture.


And, any project big enough to tackle this, will implement its own 
locale handling anyway. I'm sorry to say.

----

Yes, locales are nice and all.
For D 3.5 that is.
Honestly.

Mar 01 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Georg Wrede wrote:
 Andrei Alexandrescu wrote:
 Sooner or later that will need to be defined. I know next to nothing 
 about locales. (I know I dislike the design C++ uses.)

 
 
 D uses Utf-8, and that is *good enough*!
 
 This lets my programs "understand" Finnish, and doesn't give me undue 
 headaches.
 
 
 Seriously tending to locale issues would be an *endless swamp*. Just for 
 this, I looked up something suitable to read:
 
 http://www.manpagez.com/man/1/perllocale/
 
 It may even be that you would find the time, but think about Walter and 
 us, please. There *really are* other things to do.

I don't find that scary at all. It's quite what I expected. We should 
phase it in, after we do a good design. Also I don't plan to sit down 
and write locale definition files, I want to parse the XML in that 
locale repository I referred to.

 An excellent string hierarchy without the entire rest of i18n, is only 
 going to look like a Ferrari with a Trabant engine. Which is worse than 
 nothing at all.

I don't understand this. What is the rest of i18n?

 Besides, there's more to this than just designing the perfect, or even a 
 good locale system in a language. *Somebody should actually use it*.
 
 Now, the non-English programmer, what does he really want? He wants to 
 be able to type stuff into his program in his native character set. D 
 already does that, by way of Utf-8.
 
 What else? Well, it is conceivable that he wants his program to print 
 dates and times the way it's done over there. He simply writes the 
 program "by hand" so it does dates and times like he wants. Even if 
 there was a locale thing in the language, he wouldn't bother with the 
 hassle. And he couldn't care less about Urdu.

If we come up with a good design, then they will be compelled to use it. 
Applications meant to be used across multiple countries have fumbled 
with locale support because there's no good support in most languages. 
So then why not offer a compelling support in D?

 The hypothetical Ambitious Programmer might want to use locale. He could 
 then have the dates and times (and currencies, etc.) follow the country. 
 Now, that might sound commendable, but in practice it *crumbles*.
 He can't possibly know how to deal with languages that are written 
 backwards, languages where several characters make one letter, exotic 
 ways of writing dates, etc.

Well my understanding is that the guys who wrote those RFCs and whatnot 
spent time figuring out the right abstractions. Why not use them?

 So, his fancy i18n project is doomed to be, at most, as usable as the 
 "normal" D program. Probably less, since his decisions will actually 
 worsen the user experience -- for users in another culture.
 
 
 And, any project big enough to tackle this, will implement its own 
 locale handling anyway. I'm sorry to say.

They will implement their own because the language doesn't offer an 
extensible framework that they can build on.

 Yes, locales are nice and all.
 For D 3.5 that is.
 Honestly.

I just don't see where the big problem is. I'm talking about a blessed 
hierarchical hashtable to begin with. My initial desire is to be able to 
customize the array separators in writeln.


Andrei

Mar 01 2009

Georg Wrede <georg.wrede iki.fi> writes:

Andrei Alexandrescu wrote:
 Georg Wrede wrote:
 Andrei Alexandrescu wrote:
 Sooner or later that will need to be defined. I know next to nothing 
 about locales. (I know I dislike the design C++ uses.)

 D uses Utf-8, and that is *good enough*!

 This lets my programs "understand" Finnish, and doesn't give me undue 
 headaches.

 Seriously tending to locale issues would be an *endless swamp*. Just 
 for this, I looked up something suitable to read:

 http://www.manpagez.com/man/1/perllocale/

 It may even be that you would find the time, but think about Walter 
 and us, please. There *really are* other things to do.

 
 I don't find that scary at all.

Maybe a quick skim doesn't let the issues sink in. :-)

 It's quite what I expected. We should 
 phase it in, after we do a good design. Also I don't plan to sit down 
 and write locale definition files, I want to parse the XML in that 
 locale repository I referred to.

My ex wife has this GPS thing in her car. Very nice. But once on the 
road, it's too much hassle to type in a street address. And you're 
always in a hurry, so you don't have time to type it in before driving, 
while you're stuffing the kids in the car.

 An excellent string hierarchy without the entire rest of i18n, is only 
 going to look like a Ferrari with a Trabant engine. Which is worse 
 than nothing at all.

 
 I don't understand this. What is the rest of i18n?

i18n stands for internationalisation. The word was too long to type.

Ah, or you meant the rest? That is, if there is this shiny repository 
right inside the language for storing these i18n preferences, then that 
does oblige us to have writefln, regexp, sort, and other stuff to 
recognise those values, right? Otherwise people will ask how come we 
have a car but no engine. And that is a job bigger than it looks like. 
But not doing it fully will have people feel D is less good than if we 
never had the repository at all!

Oh, and who wants writefln, regexp, sort, and the others to become 
slower? Hands up.

 Besides, there's more to this than just designing the perfect, or even 
 a good locale system in a language. *Somebody should actually use it*.

 Now, the non-English programmer, what does he really want? He wants to 
 be able to type stuff into his program in his native character set. D 
 already does that, by way of Utf-8.

 What else? Well, it is conceivable that he wants his program to print 
 dates and times the way it's done over there. He simply writes the 
 program "by hand" so it does dates and times like he wants. Even if 
 there was a locale thing in the language, he wouldn't bother with the 
 hassle. And he couldn't care less about Urdu.

 
 If we come up with a good design, then they will be compelled to use it. 

Gnome and KDE are both GUIs designed by "foreigners". i18n has been a 
*top priority* from the outset. Start a default project, and you have 
i18n "inbuilt" in your app.

And still, my default clock applet only lets me choose between 12 and 24 
hour clock, but the date is always "Mon Mar 2", and I can't get it to 
"020309", which I want. Or change it at all.

And while there are simply excellent provisions for having all your app 
strings in the local language, hardly any application actually has more 
than a couple language choices.

 Applications meant to be used across multiple countries have fumbled 
 with locale support because there's no good support in most languages. 
 So then why not offer a compelling support in D?

Nobody will use it. (People buy all these expensive workout machines 
they see on TV, and they never use them after two weeks.) i18n support 
is more than having your arrays print in peculiar ways overseas.

Ideally, you would translate the UI to several languages, take in 
consideration some cultural differences, and then have the library muck 
your strings and variables into the "local" representation.

Won't happen in a non-GUI program.

 The hypothetical Ambitious Programmer might want to use locale. He 
 could then have the dates and times (and currencies, etc.) follow the 
 country. Now, that might sound commendable, but in practice it 
 *crumbles*.

 He can't possibly know how to deal with languages that are written 
 backwards, languages where several characters make one letter, exotic 
 ways of writing dates, etc.

 
 Well my understanding is that the guys who wrote those RFCs and whatnot 
 spent time figuring out the right abstractions. Why not use them?

Because we don't have infinite time. Urgent, much asked for, 
technologically imperative, and other stuff should be done instead. 
There are both mundane and interesting tasks. Nice-to-haves come later.

 So, his fancy i18n project is doomed to be, at most, as usable as the 
 "normal" D program. Probably less, since his decisions will actually 
 worsen the user experience -- for users in another culture.


 And, any project big enough to tackle this, will implement its own 
 locale handling anyway. I'm sorry to say.

 
 They will implement their own because the language doesn't offer an 
 extensible framework that they can build on.

No, it's because they will only implement the parts that they're 
interested in. That's pretty easy to do for a big project. (If there 
will be one for a non-GUI purpose.)

 Yes, locales are nice and all.
 For D 3.5 that is.
 Honestly.

 
 I just don't see where the big problem is. I'm talking about a blessed 
 hierarchical hashtable to begin with. 

The big problem is, SOMEONE will have to tell your XML table what 
values the user wants. Where is this knowledge stored in a way that 
every D app can get to it? And how do you force the user to populate the 
XML table with his choices to begin with?

What I'm saying is, it's debatable whether this stuff belongs to "the 
programming language itself" at all. Rather, it should be an external 
library, provided by someone else than us. It belongs to SourceForge or 
Dsource, not here.

And definitely all this should be deferred to not 2.0, but to 2.5 or 
preferably 3.0. If by that time we have seen that there actually is any 
use for such a thing, then we can decide whether to outsource it to 
anybody interested, or to actually try to make it part of the language.


I'm not saying it's impossible to do, or to do well. But I am saying it 
is *way* too insignificant to deserve any attention at this time.

 My initial desire is to be able to customize the array separators in
 writeln.

One might want to print arrays in different ways, even in the same 
program. Why not let the programmer customise the array printing the 
same as he does with integers and floats? Just a little addition to the 
syntax?

Or why not just have a print function that takes an array and a format? 
Arrays are different enough to not comfortably fit into writefln 
semantics anyway. Clean and practical, in a practical language.
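
A minimal sketch of the "print function that takes an array and a format" idea, for illustration only. The name formatArray and its parameters are assumptions made for this example; nothing like it is proposed by name in the thread.

```d
import std.array : join;
import std.format : format;
import std.stdio : writeln;

/// Hypothetical helper (not part of Phobos): formats every element with
/// elemFormat and joins the results with separator.
string formatArray(T)(const(T)[] items, string elemFormat, string separator)
{
    string[] parts;
    foreach (item; items)
        parts ~= format(elemFormat, item);
    return parts.join(separator);
}

void main()
{
    // The same program can print arrays in different ways.
    writeln(formatArray([1, 2, 3], "%d", "; "));
    writeln(formatArray([1.5, 2.25], "%.2f", " | "));
}
```

Keeping the separator an explicit argument means no locale lookup is involved at all, which is the separation argued for here.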

Whatever you do, don't mix this with any internationalisation, please.

Mar 02 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Georg Wrede wrote:
 Andrei Alexandrescu wrote:
 An excellent string hierarchy without the entire rest of i18n, is 
 only going to look like a Ferrari with a Trabant engine. Which is 
 worse than nothing at all.

 I don't understand this. What is the rest of i18n?

 
 i18n stands for internationalisation. The word was too long to type.
 
 Ah, or you meant the rest? That is, if there is this shiny repository 
 right inside the language for storing these i18n preferences, then that 
 does oblige us to have writefln, regexp, sort, and other stuff to 
 recognise those values, right? Otherwise people will ask how come we 
 have a car but no engine. And that is a job bigger than it looks like. 
 But not doing it fully will have people feel D is less good than if we 
 never had the repository at all!
 
 Oh, and who wants writefln, regexp, sort, and the others to become 
 slower? Hands up.

They will only be slower, by necessity, for people who want them 
localized, not for anyone else.

 Well my understanding is that the guys who wrote those RFCs and 
 whatnot spent time figuring out the right abstractions. Why not use them?

 
 Because we don't have infinite time. Urgent, much asked for, 
 technologically imperative, and other stuff should be done instead. 
 There are both mundane and interesting tasks. Nice-to-haves come later.

This is a misunderstanding. I am talking about a few dozen lines of 
code that capitalize on Algebraic to structure the locale space. For 
starters I just want to e.g. allow people to configure how they 
stringize and print stuff from D. Hardcoding that kind of stuff, or the 
strings thrown in exceptions, does not sound too good.

 I just don't see where the big problem is. I'm talking about a blessed 
 hierarchical hashtable to begin with. 

 
 The big problem is, SOMEONE will have to tell your XML table what 
 values the user wants. Where is this knowledge stored in a way that 
 every D app can get to it? And how do you force the user to populate the 
 XML table with his choices to begin with?

You see, we're not communicating. I sent this link:

http://www.unicode.org/cldr/

Did you look at it? It is essentially a database of locale information 
in a highly structured format. All I want is to define a structure 
expressive enough to gobble the part of that database that is of 
interest. The Phobos documentation will say, we just adopt their schema. 
If users don't want to load any, then fine - everything is just like today.

 What I'm saying is, it's debatable whether this stuff belongs to "the 
 programming language itself" at all. Rather, it should be an external 
 library, provided by someone else than us. It belongs to SourceForge or 
 Dsource, not here.

http://www.unicode.org/cldr/

We just need to load it if there is such a need.

 And definitely all this should be deferred to not 2.0, but to 2.5 or 
 preferably 3.0. If by that time we have seen that there actually is any 
 use for such a thing, then we can decide whether to outsource it to 
 anybody interested, or to actually try to make it part of the language.
 
 
 I'm not saying it's impossible to do, or to do well. But I am saying it 
 is *way* too insignificant to deserve any attention at this time.

You and I have completely different understandings of the level of 
effort needed. It's not like I don't have anything to do. :o)

Let me try again: I don't want to define locale support. I want to 
provide the basics for people to roll it out themselves.



Andrei

Mar 02 2009

Georg Wrede <georg.wrede iki.fi> writes:

Andrei Alexandrescu wrote:
 Georg Wrede wrote:
 You see, we're not communicating. I sent this link:
 
 http://www.unicode.org/cldr/
 
 Did you look at it? It is essentially a database of locale information 
 in a highly structured format. All I want is to define a structure 
 expressive enough to gobble the part of that database that is of 
 interest. The Phobos documentation will say, we just adopt their schema. 
 If users don't want to load any, then fine - everything is just like today.

I read the page. It says "This data is used by a wide spectrum of 
companies for their software internationalization and localization".

The first link in the text part is to the CLDR Overview ppt. I read it. 
On page 5 it says:

"Companies / Organizations
Adobe, Apple (Mac OS X), abas Software, Ascential Software, Avaya, BEA, 
BluePhoenix Solutions, BMC Software (Remedy), Business Objects, caris, 
CERN, ClearCommerce, Cognos, Debian Linux, D programming language, 
Gentoo Linux, GNU Classpath, HP, Hyperion, IBM, Inktomi, Innodata 
Isogen, Isogon, Informatica, Intel, Interlogics, IONA, IXOS, Macromedia, 
Mathworks, OpenOffice, Language Analysis Systems, Lawson Software, Leica 
Geosystems GIS & Mapping LLC, Mandrake Linux, Novell (SuSE), Optio 
Software, PayPal, Progress Software, Python, QNX, Quark, Rogue Wave, 
SAP, Siebel, SIL, SPSS, Software AG, Sun Microsystems (Solaris, Java), 
Sybase, Teradata (NCR), Trados, Trend Micro, Virage, webMethods, WMS 
Gaming, Xerox, Yahoo!, and many more..."

One sees here major companies, operating systems, and three languages: 
D, Python and Java. The page is from 2005.

So D "has had this since at least 2005". What can I say? I guess we have 
to implement it then...

 What I'm saying is, it's debatable whether this stuff belongs to "the 
 programming language itself" at all. Rather, it should be an external 
 library, provided by someone else than us. It belongs to SourceForge 
 or Dsource, not here.

 
 http://www.unicode.org/cldr/
 
 We just need to load it if there is such a need.

In another post you sounded as if there is a connection between this 
stuff and printing arrays. I'm not sure I see the connection.

 Let me try again: I don't want to define locale support. I want to 
 provide the basics for people to roll it out themselves.

I downloaded the files in http://unicode.org/Public/cldr/1.6.1/ which 
were core.zip, posix.zip, tests.zip and tools.zip. They unzipped to 
140MB, containing some 200 java files and some 800 xml files, among others.

The readme.txt in tools.zip says:

"The code is very preliminary, so don't expect stability from the APIs 
(or documentation!), since we still have to work out how we want to do 
the architecture."

The main web page says "CLDR 1.7 Tentative Schedule: 2008-09", but it 
still isn't on the download page. The last version is 2008-07-23 
Version 1.6.1.

==============

My take:

  * This is still a moving target
  * Using this is a major hassle for the programmer
  * With D2 itself a moving target, nobody is going to invest enough time 
in this to actually use it for something worthwhile in the next 6 to 12 
months anyway
  * This is more application level stuff than language level stuff
  * Doing this now will steal time from you, Walter, and many of us, 
both directly, and indirectly by leeching bandwidth in the newsgroup -- 
time that should be spent on more urgent or more important things, or 
even documentation
  * If it's so easy to do, then why not do it a week before the release 
of final D2

I really can't help it, but this is how I see it.

Mar 02 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Georg Wrede wrote:
 Andrei Alexandrescu wrote:
 Georg Wrede wrote:
 You see, we're not communicating. I sent this link:

 http://www.unicode.org/cldr/

 Did you look at it? It is essentially a database of locale information 
 in a highly structured format. All I want is to define a structure 
 expressive enough to gobble the part of that database that is of 
 interest. The Phobos documentation will say, we just adopt their 
 schema. If users don't want to load any, then fine - everything is 
 just like today.

 
 I read the page. It says "This data is used by a wide spectrum of 
 companies for their software internationalization and localization".
 
 The first link in the text part is to the CLDR Overview ppt. I read it. 
 On page 5 it says:
 
 "Companies / Organizations
 Adobe, Apple (Mac OS X), abas Software, Ascential Software, Avaya, BEA, 
 BluePhoenix Solutions, BMC Software (Remedy), Business Objects, caris, 
 CERN, ClearCommerce, Cognos, Debian Linux, D programming language, 
 Gentoo Linux, GNU Classpath, HP, Hyperion, IBM, Inktomi, Innodata 
 Isogen, Isogon, Informatica, Intel, Interlogics, IONA, IXOS, Macromedia, 
 Mathworks, OpenOffice, Language Analysis Systems, Lawson Software, Leica 
 Geosystems GIS & Mapping LLC, Mandrake Linux, Novell (SuSE), Optio 
 Software, PayPal, Progress Software, Python, QNX, Quark, Rogue Wave, 
 SAP, Siebel, SIL, SPSS, Software AG, Sun Microsystems (Solaris, Java), 
 Sybase, Teradata (NCR), Trados, Trend Micro, Virage, webMethods, WMS 
 Gaming, Xerox, Yahoo!, and many more..."
 
 One sees here major companies, operating systems, and three languages: 
 D, Python and Java. The page is from 2005.
 
 So D "has had this since at least 2005". What can I say? I guess we have 
 to implement it then...

Hehe, didn't see that.

 What I'm saying is, it's debatable whether this stuff belongs to "the 
 programming language itself" at all. Rather, it should be an external 
 library, provided by someone else than us. It belongs to SourceForge 
 or Dsource, not here.

 http://www.unicode.org/cldr/

 We just need to load it if there is such a need.

 
 In another post you sounded as if there is a connection between this 
 stuff and printing arrays. I'm not sure I see the connection.

Very simple. If we have a locale table, I am thinking of dedicating a 
branch "std" in it to stuff that's in std. For example, I can use 
currentLocale.get("std", "array-separator") or something.
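
A rough sketch of what such a lookup could look like, under stated assumptions: LocaleTable, its set/get methods, printArray, and the ", " fallback are all invented for illustration. Only the "std" branch and the "array-separator" key come from the post.

```d
import std.array : join;
import std.conv : to;
import std.stdio : writeln;

/// Hypothetical locale table: branch name -> key -> value.
struct LocaleTable
{
    string[string][string] branches;

    void set(string branch, string key, string value)
    {
        branches[branch][key] = value;
    }

    /// Returns the stored value, or fallback when the branch or key is absent.
    string get(string branch, string key, string fallback = null)
    {
        if (auto b = branch in branches)
            if (auto v = key in *b)
                return *v;
        return fallback;
    }
}

/// Prints an array using the separator stored under "std"/"array-separator".
/// With an empty table the hardcoded default is used, so nothing changes
/// for programs that never load a locale.
string printArray(const(int)[] items, ref LocaleTable locale)
{
    string sep = locale.get("std", "array-separator", ", ");
    string[] parts;
    foreach (item; items)
        parts ~= to!string(item);
    return "[" ~ parts.join(sep) ~ "]";
}

void main()
{
    LocaleTable currentLocale;
    writeln(printArray([1, 2, 3], currentLocale));   // default separator
    currentLocale.set("std", "array-separator", "; ");
    writeln(printArray([1, 2, 3], currentLocale));   // configured separator
}
```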

 Let me try again: I don't want to define locale support. I want to 
 provide the basics for people to roll it out themselves.

 
 I downloaded the files in http://unicode.org/Public/cldr/1.6.1/ which 
 were core.zip, posix.zip, tests.zip and tools.zip. They unzipped to 
 140MB, containing some 200 java files and some 800 xml files, among others.
 
 The readme.txt in tools.zip says:
 
 "The code is very preliminary, so don't expect stability from the APIs 
 (or documentation!), since we still have to work out how we want to do 
 the architecture."
 
 The main web page says "CLDR 1.7 Tentative Schedule: 2008-09", but it 
 still isn't on the download page. The last version is 2008-07-23 
 Version 1.6.1.
 
 ==============
 
 My take:
 
  * This is still a moving target
  * Using this is a major hassle for the programmer
  * With D2 itself a moving target, nobody is going to invest enough time 
 in this to actually use it for something worthwhile in the next 6 to 12 
 months anyway
  * This is more application level stuff than language level stuff
  * Doing this now will steal time from you, Walter, and many of us, both 
 directly, and indirectly by leeching bandwidth in the newsgroup -- time 
 that should be spent on more urgent or more important things, or even 
 documentation
  * If it's so easy to do, then why not do it a week before the release 
 of final D2
 
 I really can't help it, but this is how I see it.

I understand.


Andrei

Mar 02 2009

Georg Wrede <georg.wrede iki.fi> writes:

Andrei Alexandrescu wrote:
 In another post you sounded as if there is a connection between this 
 stuff and printing arrays. I'm not sure I see the connection.

 
 Very simple. If we have a locale table, I am thinking of dedicating a 
 branch "std" in it to stuff that's in std. For example, I can use 
 currentLocale.get("std", "array-separator") or something.

Let's disconnect the two.

What if you were to document the data structure Algebraic, and mention 
that D itself uses an instance of it already to store array printing 
parameters?

You could also mention that an Algebraic would be a great place to store 
Unicode CLDR data, and other (Windows) registry type things.

And then, in the examples directory one could find a program that reads 
the CLDR xml (or of course a suitable snippet of it) and populates an 
instance of Algebraic. And then this example program would do something 
small but useful with the data.

Mar 02 2009

Walter Bright <newshound1 digitalmars.com> writes:

Georg Wrede wrote:
 So D "has had this since at least 2005". What can I say? I guess we have 
 to implement it then...

Wow, D usually gets slammed for not having a feature that even a cursory 
glance at the documentation shows it has. This is the first vaporware 
feature!

Mar 02 2009

Jarrett Billingsley <jarrett.billingsley gmail.com> writes:

On Mon, Mar 2, 2009 at 1:52 PM, Georg Wrede <georg.wrede iki.fi> wrote:
 My take:

  * This is still a moving target
  * Using this is a major hassle for the programmer
  * With D2 itself a moving target, nobody is going to invest enough time in
 this to actually use it for something worthwhile in the next 6 to 12 months
 anyway
  * This is more application level stuff than language level stuff
  * Doing this now will steal time from you, Walter, and many of us, both
 directly, and indirectly by leeching bandwidth in the newsgroup -- time that
 should be spent on more urgent or more important things, or even
 documentation
  * If it's so easy to do, then why not do it a week before the release of
 final D2

I agree entirely.  Localization and internationalization seem like
things that should be at a much higher level than a standard library.
Everyone's going to want to do it differently.  Providing a thin,
cross-platform wrapper over what the OS exposes is fine, but creating
a proper i18n/l10n framework is a huge project in and of itself (I
think the 140MB Java package makes that abundantly clear).

I'd much rather see a rewritten std.stream and proper Unicode support
in std.string (support for types other than string, functions for
indexing and slicing on character boundaries) before this.

Mar 02 2009

Walter Bright <newshound1 digitalmars.com> writes:

Jarrett Billingsley wrote:
 functions for
 indexing and slicing on character boundaries) before this.

These already exist in std.uni.

Mar 02 2009

Jarrett Billingsley <jarrett.billingsley gmail.com> writes:

On Mon, Mar 2, 2009 at 3:48 PM, Walter Bright
<newshound1 digitalmars.com> wrote:
 Jarrett Billingsley wrote:
 functions for
 indexing and slicing on character boundaries) before this.

 These already exist in std.uni.

It's std.utf, but good to know.
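
For reference, a short sketch of slicing on character boundaries with std.utf. The functions stride, decode, and toUTFindex are real std.utf functions; the helper takeChars is an assumption added for this example.

```d
import std.stdio : writeln;
import std.utf : decode, stride, toUTFindex;

/// Hypothetical helper: returns the first n characters (code points) of s,
/// cutting on a character boundary rather than a byte offset.
string takeChars(string s, size_t n)
{
    return s[0 .. toUTFindex(s, n)];
}

void main()
{
    string s = "äiti";            // 'ä' occupies two UTF-8 code units
    assert(s.length == 5);        // .length counts code units, not characters

    // stride: number of code units used by the character starting at index 0
    assert(stride(s, 0) == 2);

    // decode: read one code point and advance the index past it
    size_t i = 0;
    dchar first = decode(s, i);
    assert(first == 'ä' && i == 2);

    writeln(takeChars(s, 2));     // the first two characters
}
```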

Mar 02 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Jarrett Billingsley wrote:
 On Mon, Mar 2, 2009 at 1:52 PM, Georg Wrede <georg.wrede iki.fi> wrote:
 My take:

  * This is still a moving target
  * Using this is a major hassle for the programmer
  * With D2 itself a moving target, nobody is going to invest enough time in
 this to actually use it for something worthwhile in the next 6 to 12 months
 anyway
  * This is more application level stuff than language level stuff
  * Doing this now will steal time from you, Walter, and many of us, both
 directly, and indirectly by leeching bandwidth in the newsgroup -- time that
 should be spent on more urgent or more important things, or even
 documentation
  * If it's so easy to do, then why not do it a week before the release of
 final D2

 
 I agree entirely.  Localization and internationalization seem like
 things that should be at a much higher level than a standard library.
 Everyone's going to want to do it differently.  Providing a thin,
 cross-platform wrapper over what the OS exposes is fine, but creating
 a proper i18n/l10n framework is a huge project in and of itself (I
 think the 140MB Java package makes that abundantly clear).

I must be missing something huge because I keep on misunderestimating 
(sic :o)) the scope of this project.

Let me try to state my point again: I don't want to provide 
locale-specific strings, collation orders, date, time, and number 
formatters, or class hierarchies that do all of the above. Zip. Nada. Zilch.

I want to put together a string-based hierarchical string table that 
allows depositing ALL OF THE ABOVE in it, without initially putting 
ANYTHING in it. What's nice is that others have already defined the keys 
and the possible values used by that table.

Possibly you are missing one or more of the following points:

1) The existence of a hierarchical nomenclature for localization;

2) The existence of a large database containing localized values for 
said nomenclature;

3) The power of Algebraic, which allows depositing data, functions, and 
subtables alike in a uniform format.
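
To make the idea concrete, here is a sketch of a table whose nodes can hold data, a function, or a subtable. It deliberately uses a plain class as a hand-rolled stand-in rather than std.variant's Algebraic, so it shows the shape of the table, not the proposed implementation. The class Node, its members, and the keys "numbers/ordinal" and "dates/short" are assumptions for illustration and are not taken from the CLDR schema.

```d
import std.array : split;
import std.conv : to;
import std.stdio : writeln;

/// Stand-in for an Algebraic-based node: a node is used as a string leaf,
/// a function leaf, or a subtable, depending on which member is set.
class Node
{
    string text;                       // set for data leaves
    string function(long) formatter;   // set for function leaves
    Node[string] children;             // set for subtables

    /// Walks a '/'-separated path; returns null if any part is missing.
    Node lookup(string path)
    {
        Node current = this;
        foreach (part; path.split("/"))
        {
            auto next = part in current.children;
            if (next is null)
                return null;
            current = *next;
        }
        return current;
    }

    /// Creates missing subtables along the path and returns the final node.
    Node ensure(string path)
    {
        Node current = this;
        foreach (part; path.split("/"))
        {
            if (part !in current.children)
                current.children[part] = new Node;
            current = current.children[part];
        }
        return current;
    }
}

/// Example function value: ordinals written with a trailing period.
string ordinal(long n)
{
    return to!string(n) ~ ".";
}

void main()
{
    auto root = new Node;                 // the table starts out empty
    root.ensure("std/array-separator").text = "; ";
    root.ensure("numbers/ordinal").formatter = &ordinal;

    writeln(root.lookup("std/array-separator").text);
    writeln(root.lookup("numbers/ordinal").formatter(2));
    assert(root.lookup("dates/short") is null);   // nothing deposited there
}
```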

 I'd much rather see a rewritten std.stream and proper Unicode support
 in std.string (support for types other than string, functions for
 indexing and slicing on character boundaries) before this.

That, incidentally, is more complicated :o).


Andrei

Mar 02 2009

Michel Fortin <michel.fortin michelf.com> writes:

On 2009-03-02 16:42:37 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 I want to put together a string-based hierarchical string table that 
 allows depositing ALL OF THE ABOVE in it, without initially putting 
 ANYTHING in it. What's nice is that others have already defined the 
 keys and the possible values used by that table.
 
 Possibly you are missing one or more of the following points:
 
 1) The existence of a hierarchical nomenclature for localization;
 
 2) The existence of a large database containing localized values for 
 said nomenclature;
 
 3) The power of Algebraic, which allows depositing data, functions, and 
 subtables alike in a uniform format.

What I'm missing is a justification as to why you need all this data in 
a common deposit in the first place. How do you justify the need for 
that? Which function needs this data, and why does using an Algebraic 
make it better than other approaches?

As for the large database, I have nothing against using an existing large 
database, but I'd rather see my app use whatever is part of the 
underlying OS first, then rely on an external database if that is 
insufficient.

Your approach seems to be this: Unicode defines a huge database 
containing all kinds of locale information, let's expose that, allow 
other people to plug their own data inside, and use that as the 
standard format for passing locale data to various functions.

I only oppose the last part -- the "use that as the standard format for 
passing locale data to various functions" part. That you're using 
Algebraic does not change the fact that various functions will search 
for data at certain places in the structure. If the data isn't there, 
because you want to use some other formatting system, you'll get wrong 
results.

Perhaps you should explain more how you see this used in the context 
where we want to localize some data, how we can use it to define our 
own data, etc. Because this discussion is lost in generalities and vague 
ideas right now.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Mar 02 2009

Georg Wrede <georg.wrede iki.fi> writes:

Andrei Alexandrescu wrote:
 Jarrett Billingsley wrote:
 On Mon, Mar 2, 2009 at 1:52 PM, Georg Wrede <georg.wrede iki.fi> wrote:
 My take:

  * This is still a moving target
  * Using this is a major hassle for the programmer
  * With D2 itself a moving target, nobody is going to invest enough 
 time in
 this to actually use it for something worthwhile in the next 6 to 12 
 months
 anyway
  * This is more application level stuff than language level stuff
  * Doing this now will steal time from you, Walter, and many of us, both
 directly, and indirectly by leeching bandwidth in the newsgroup -- 
 time that
 should be spent on more urgent or more important things, or even
 documentation
  * If it's so easy to do, then why not do it a week before the 
 release of
 final D2

 I agree entirely.  Localization and internationalization seem like
 things that should be at a much higher level than a standard library.
 Everyone's going to want to do it differently.  Providing a thin,
 cross-platform wrapper over what the OS exposes is fine, but creating
 a proper i18n/l10n framework is a huge project in and of itself (I
 think the 140MB Java package makes that abundantly clear).

 
 I must be missing something huge because I keep on misunderestimating 
 (sic :o)) the scope of this project.

I agree. :-)

 Let me try to state my point again: I don't want to provide 
 locale-specific strings, collation orders, date, time, and number 
 formatters, or class hierarchies that do all of the above. Zip. Nada. 
 Zilch.
 
 I want to put together a string-based hierarchical string table that 
 allows depositing ALL OF THE ABOVE in it, without initially putting 
 ANYTHING in it. What's nice is that others have already defined the keys 
 and the possible values used by that table.

One of the problems is, people start expecting something if they find 
this string repository. They'd expect some of the work you said you 
don't provide, done. And if the table isn't even *prepopulated*, then 
people really feel stranded. It doesn't help much to state in the docs 
"if you need to fill it go to http://whatever, and hope the format hasn't 
changed".

Besides, on that site, what exactly should be downloaded is unobvious 
enough that the new user will probably not bother. Nor the normal app 
programmer.

 Possibly you are missing one or more of the following points:
 
 1) The existence of a hierarchical nomenclature for localization;

With a hammer in hand, everything looks like a nail. With a swiss army 
knife in your hand, nothing in the house is safe.

 2) The existence of a large database containing localized values for 
 said nomenclature;

So where will this be stored? In a .dmdrc directory in the user's home? 
One per system? Or every app stores it in a .ini file? Is this per app 
or common to all of the user's apps?

And when it's updated (by who?), will all his own settings vanish? Or is 
there a mechanism (or does he have to invent one?) for reattaching his 
own settings after the update?

 3) The power of Algebraic, which allows depositing data, functions, and 
 subtables alike in a uniform format.

Seriously however, Algebraic does sound cool! No question.

Mar 02 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Georg Wrede wrote:
[snip]

Well I guess what I'll do is take the path of least resistance - 
nothing. Looks like locales are rather unpopular...

Actually I will do something. I'll start removing some of the silly 
Exception derivees from std.


Andrei

Mar 02 2009

Michel Fortin <michel.fortin michelf.com> writes:

On 2009-03-02 23:27:49 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Georg Wrede wrote:
 [snip]
 
 Well I guess what I'll do is take the path of least resistance - 
 nothing. Looks like locales are rather unpopular...

Sad.

Seriously, I think if D could have locales as a standard feature it'd 
be great. Supporting various locales is often a must when you deploy an 
application, and when libraries try to do it differently you find 
yourself in a mess.

One thing I dislike in your approach is that you're designing the 
underlying data storage system before considering the API we're going 
to use. What we need the most is a standard API for localizing the 
display and input of data, and I somewhat fail to see how storing all 
the localization parameters in an Algebraic solves this problem.

I mean, let's say you want to output a localized message, perhaps we 
could do this:

	writefln("Hello number %f", 123456.44);
	writefln(localize("Hello number %f"), 123456.44); // default locale

	Locale fr = locale("fr");
	writefln(localize("Hello number %f", fr), 123456.44);

and expect this output:

	Hello number 123456.44
	Hello number 123,456.44
	Bonjour numéro 123 456,44

?

That'd be an interesting feature. But as of yet I have no idea how 
we're supposed to use all that locale information you want to keep in 
your algebraic type; you haven't provided many examples like the one 
above where all you want is to format a string and a number. Perhaps 
it's clear in your head, but to me it's vague. Exposing all this locale 
data is useless if it isn't supported by the library's functions.

What I wrote above could work that way: localize(...) returns a 
FormattedString!(...) struct template containing a string and a 
templated function "format" for formatting its arguments. writefln 
being a templated function, it'll call toString on its first argument 
and check if it provides a "format" function, and if it does it passes 
all the other arguments through it before output.

The "localize" function could be overloaded to accept various types of 
locales. Including, but not limited to, your Algebraic locale data.

The downside of this approach is that it requires functions accepting a 
locale or a localized formatted string to be templates.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Mar 03 2009

Leandro Lucarella <llucax gmail.com> writes:

Andrei Alexandrescu, on March 1 at 19:40, you wrote to me:
 Georg Wrede wrote:
Andrei Alexandrescu wrote:
Sooner or later that will need to be defined. I know next to nothing about
locales. (I know I dislike the design C++ uses.)

D uses Utf-8, and that is *good enough*!
This lets my programs "understand" Finnish, and doesn't give me undue headaches.
Seriously tending to locale issues would be an *endless swamp*. Just for this,
I looked up something suitable to read:
http://www.manpagez.com/man/1/perllocale/
It may even be that you would find the time, but think about Walter and us,
please. There *really are* other things to do.

 
 I don't find that scary at all. It's quite what I expected. We should phase it
in, after we do a good design. Also I don't plan to sit down and write locale 
 definition files, I want to parse the XML in that locale repository I referred
to.

I'm not following this thread carefully and I don't know if this is what
you are implying, but: please don't even think of duplicating the
locale stuff. At least on Unix there is a very nice database that sometimes
needs to be updated very often (due to stupid presidents like the one
I have now, who changes the daylight saving time all the time).

PHP, for example, maintains a copy of this locale data and it is a real PITA.

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
The average person laughs 13 times a day

Mar 02 2009

Michel Fortin <michel.fortin michelf.com> writes:

On 2009-03-02 08:32:40 -0500, Leandro Lucarella <llucax gmail.com> said:

 I'm not following this thread carefully and I don't know if this is what
 you are implying, but: please don't even think of duplicating the
 locale stuff. At least on Unix there is a very nice database that sometimes
 needs to be updated very often (due to stupid presidents like the one
 I have now, who changes the daylight saving time all the time).
 
 PHP, for example, maintains a copy of this locale data and it is a real PITA.

I do agree.

In another post I proposed we create formatter classes for numbers and 
dates. This way, you can use a formatter binding to the UNIX database 
and APIs, or the Windows APIs, or Cocoa, etc., or you can build your 
own. All you need is a generic front end formatter interface you can 
bind to anything (and a common internal representation for dates) 
something like:

	interface DateFormatter
	{
		string timestampToString(int timestamp);
		int stringToTimestamp(string date);
	}

	DateFormatter defaultDateFormatter();
	DateFormatter dateFormatterForLocale(string localeName);

	interface NumberFormatter
	{
		string intToString(int number);
		int stringToInt(string number);
	}

	NumberFormatter defaultNumberFormatter();
	NumberFormatter numberFormatterForLocale(string localeName);
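
To make the idea concrete, here is a minimal sketch of one implementation behind the NumberFormatter interface above. GroupingNumberFormatter and the "de" mapping are made up for illustration; a real binding would ask the UNIX database, the Windows APIs, or Cocoa instead of hard-coding a separator.

```d
import std.conv : to;
import std.stdio : writeln;

interface NumberFormatter
{
    string intToString(int number);
    int stringToInt(string number);
}

// Hypothetical formatter: groups digits in threes with one separator char.
class GroupingNumberFormatter : NumberFormatter
{
    private char separator;

    this(char separator) { this.separator = separator; }

    string intToString(int number)
    {
        // Work on a long so that int.min can be negated safely.
        long magnitude = number;
        if (magnitude < 0) magnitude = -magnitude;
        string digits = magnitude.to!string;
        string result = number < 0 ? "-" : "";
        foreach (i, c; digits)
        {
            // Insert a separator before every group of three digits.
            if (i > 0 && (digits.length - i) % 3 == 0)
                result ~= separator;
            result ~= c;
        }
        return result;
    }

    int stringToInt(string number)
    {
        // Drop the separators, then parse what is left.
        string cleaned;
        foreach (char c; number)
            if (c != separator)
                cleaned ~= c;
        return cleaned.to!int;
    }
}

// Assumed mapping for illustration only, not taken from any locale database.
NumberFormatter numberFormatterForLocale(string localeName)
{
    return new GroupingNumberFormatter(localeName == "de" ? '.' : ',');
}

void main()
{
    NumberFormatter de = numberFormatterForLocale("de");
    writeln(de.intToString(1234567));     // 1.234.567
    writeln(de.stringToInt("1.234.567")); // 1234567
}
```

Calling code only ever sees the interface, so swapping in a platform-backed formatter would not change the call sites.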

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Mar 02 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Michel Fortin wrote:
 On 2009-03-02 08:32:40 -0500, Leandro Lucarella <llucax gmail.com> said:
 
 I'm not following this thread carefully and I don't know if this is what
 you are implying, but: please don't even think of duplicating the
 locale stuff. At least on Unix there is a very nice database that sometimes
 needs to be updated very often (due to stupid presidents like the one
 I have now, who changes the daylight saving time all the time).

 PHP, for example, maintains a copy of this locale data and it is a real PITA.

 
 I do agree.
 
 In another post I proposed we create formatter classes for numbers and 
 dates. This way, you can use a formatter binding to the UNIX database 
 and APIs, or the Windows APIs, or Cocoa, etc., or you can build your 
 own. All you need is a generic front end formatter interface you can 
 bind to anything (and a common internal representation for dates) 
 something like:
 
     interface DateFormatter
     {
         string timestampToString(int timestamp);
         int stringToTimestamp(string date);
     }
 
     DateFormatter defaultDateFormatter();
     DateFormatter dateFormatterForLocale(string localeName);
 
     interface NumberFormatter
     {
         string intToString(int number);
         int stringToInt(string number);
     }
 
     NumberFormatter defaultNumberFormatter();
     NumberFormatter numberFormatterForLocale(string localeName);
 

This is exactly one thing I want to avoid for Phobos: defining class 
hierarchies for locales.

No.

If you want to provide a specific date formatter, you plant a delegate 
in the locale table. The code in Phobos doing formatting will detect 
that and call your delegate passing in the date. You do whatever you 
want on your side (format on the spot, use your own class hierarchy etc.)

Again: mechanism only. Not policy.
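
A rough sketch of what "plant a delegate in the locale table" could look like. This is not the proposed std.locale design: LocaleTable, DateRoutine, formatDate, and the "date" key are all hypothetical names, and plain associative arrays stand in for the Algebraic-based table.

```d
import std.format : format;
import std.stdio : writeln;

// Hypothetical signature for a user-planted date routine.
alias DateRoutine = string delegate(int year, int month, int day);

// Simplified stand-in for the locale table: strings and delegates only.
struct LocaleTable
{
    string[string] strings;
    DateRoutine[string] routines;
}

// Library-side formatting: use the planted delegate if there is one,
// otherwise fall back to a fixed ISO-style default.
string formatDate(ref LocaleTable table, int year, int month, int day)
{
    if (auto planted = "date" in table.routines)
        return (*planted)(year, month, day);
    return format("%04d-%02d-%02d", year, month, day);
}

void main()
{
    LocaleTable table;
    writeln(formatDate(table, 2009, 3, 2)); // 2009-03-02

    // User code plants its own routine; what it does inside is its business.
    table.routines["date"] = delegate string(int y, int m, int d) {
        return format("%02d.%02d.%04d", d, m, y);
    };
    writeln(formatDate(table, 2009, 3, 2)); // 02.03.2009
}
```

The library only looks up a key and calls whatever it finds there, which is the "mechanism, not policy" split.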


Andrei

Mar 02 2009

Christopher Wright <dhasenan gmail.com> writes:

Andrei Alexandrescu wrote:
 If you want to provide a specific date formatter, you plant a delegate 
 in the locale table. The code in Phobos doing formatting will detect 
 that and call your delegate passing in the date. You do whatever you 
 want on your side (format on the spot, use your own class hierarchy etc.)
 
 Again: mechanism only. Not policy.
 
 
 Andrei

Weak typing for the win!

Mar 02 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Christopher Wright wrote:
 Andrei Alexandrescu wrote:
 If you want to provide a specific date formatter, you plant a delegate 
 in the locale table. The code in Phobos doing formatting will detect 
 that and call your delegate passing in the date. You do whatever you 
 want on your side (format on the spot, use your own class hierarchy etc.)

 Again: mechanism only. Not policy.


 Andrei

 
 Weak typing for the win!

Yes. Sometimes it's exactly what the doctor prescribed, as I believe it is 
in this case.

Andrei

Mar 02 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Leandro Lucarella wrote:
 Andrei Alexandrescu, on March 1 at 19:40, you wrote to me:
 Georg Wrede wrote:
 Andrei Alexandrescu wrote:
 Sooner or later that will need to be defined. I know next to nothing about
locales. (I know I dislike the design C++ uses.)

 D uses Utf-8, and that is *good enough*!
 This lets my programs "understand" Finnish, and doesn't give me undue
headaches.
 Seriously tending to locale issues would be an *endless swamp*. Just for this,
I looked up something suitable to read:
 http://www.manpagez.com/man/1/perllocale/
 It may even be that you would find the time, but think about Walter and us,
please. There *really are* other things to do.

 I don't find that scary at all. It's quite what I expected. We should phase it
in, after we do a good design. Also I don't plan to sit down and write locale 
 definition files, I want to parse the XML in that locale repository I referred
to.

 
 I'm not following this thread carefully and I don't know if this is what
 you are implying, but: please don't even think of duplicating the
 locale stuff. At least on Unix there is a very nice database that sometimes
 needs to be updated very often (due to stupid presidents like the one
 I have now, who changes the daylight saving time all the time).
 
 PHP, for example, maintains a copy of this locale data and it is a real PITA.
 

You're right, we won't engage in the business of maintaining locale 
databases. We provide mechanism, not policy.

Andrei

Mar 02 2009

Derek Parnell <derek psych.ward> writes:

On Mon, 02 Mar 2009 06:28:12 -0800, Andrei Alexandrescu wrote:


 You're right, we won't engage in the business of maintaining locale 
 databases. We provide mechanism, not policy.

Ok, for a while there I thought you were attempting to duplicate the work
that operating systems already do. 

I see locale support in D as being a platform-independent method of
invoking existing operating system functionality.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

Mar 02 2009

Walter Bright <newshound1 digitalmars.com> writes:

Georg Wrede wrote:
 What else? Well, it is conceivable that he wants his program to print 
 dates and times the way it's done over there. He simply writes the 
 program "by hand" so it does dates and times like he wants. Even if 
 there was a locale thing in the language, he wouldn't bother with the 
 hassle. And he couldn't care less about Urdu.

I've attempted to use locales, but the reason I'd always wind up doing 
it by hand is because the existing libraries to do it are obtuse, 
impenetrable, execrable, and pretty much unusable.

So it may be that it's an insoluble problem, or maybe nobody has come up 
with the right abstraction yet. I don't have nearly enough experience 
with it to know the answer.

Mar 01 2009

"Joel C. Salomon" <joelcsalomon gmail.com> writes:

Walter Bright wrote:
 I've attempted to use locales, but the reason I'd always wind up doing
 it by hand is because the existing libraries to do it are obtuse,
 impenetrable, execrable, and pretty much unusable.
 
 So it may be that it's an insoluble problem, or maybe nobody has come up
 with the right abstraction yet. I don't have nearly enough experience
 with it to know the answer.

Sounds like it’s not yet suitable for D2, then, at least not in std.
Perhaps put an experimental interface in ext?

—Joel Salomon

Mar 01 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Joel C. Salomon wrote:
 Walter Bright wrote:
 I've attempted to use locales, but the reason I'd always wind up doing
 it by hand is because the existing libraries to do it are obtuse,
 impenetrable, execrable, and pretty much unusable.

 So it may be that it's an insoluble problem, or maybe nobody has come up
 with the right abstraction yet. I don't have nearly enough experience
 with it to know the answer.

 
 Sounds like it’s not yet suitable for D2, then, at least not in std.
 Perhaps put an experimental interface in ext?

Good idea. But before we do so, I was hoping I'd pick the brains of 
people who have used locales in other languages and understand the 
burning points. Somehow, however, I'm doing a lousy job at eliciting 
contributions from people on this newsgroup (guess I'd be a lousy 
salesman). I tried a couple of times and all I got was a few new keyword 
proposals and a few new syntax proposals :o). What am I doing wrong?

Andrei

Mar 01 2009

Michel Fortin <michel.fortin michelf.com> writes:

On 2009-03-02 01:04:47 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Good idea. But before we do so, I was hoping I'd pick the brains of 
 people who have used locales in other languages and understand the 
 burning points. Somehow, however, I'm doing a lousy job at eliciting 
 contributions from people on this newsgroup (guess I'd be a lousy 
 salesman). I tried a couple of times and all I got was a few new 
 keyword proposals and a few new syntax proposals :o). What am I doing 
 wrong?

I think there are three aspects to localization. One is date and number 
formatting. Another is offering a facility for translating all the 
messages an application can give. And the last one is the configuration 
part, where you know which format to use.

The only problem I've seen addressed by you right now is the 
configuration part; I believe it's the wrong end to start with.

We should start by defining how to perform the tasks I enumerated 
above: translating date and number formats, selecting strings for a 
given language. After that we can figure out how to pass the proper 
default configuration around. And then you're done.

For date and number formatting, I like very much the NSDateFormatter 
and NSNumberFormatter approach in Cocoa for instance: you have a base 
class to format dates, another for numbers; you can easily create your 
own subclass if you want, and there's a way to get the default 
formatter instance.

This is extensible, because if you wanted to go further, you could add 
formatter classes for various units (length, mass...), or anything else.

Translating strings is a little harder because 1) strings are 
application-defined, 2) strings are often not available in the user's 
preferred language, adding the need for a fallback mechanism, and 3) 
different applications will want to store those strings in different 
ways. Perhaps we could define a base class for getting translated 
strings, then allow the program to use whatever subclass it wants.

Notice how I'm not using the word "locale" to talk about these things. 
"Locale" is a concept too abstract to be able to do something good with 
it. Since you could only define it using an Algebraic type and a loosely 
defined tree of strings, that seems to confirm my view. Call the module 
std.locale if you want, but keep in mind that the most important task 
at hand is facilitating localization, not defining what constitutes a 
locale; that can wait.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Mar 02 2009

Leandro Lucarella <llucax gmail.com> writes:

Michel Fortin, on March 2 at 07:30, you wrote to me:
 On 2009-03-02 01:04:47 -0500, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> said:
 
Good idea. But before we do so, I was hoping I'd pick the brains of people who
have used locales in other languages and understand the burning points.
Somehow, 
however, I'm doing a lousy job at eliciting contributions from people on this
newsgroup (guess I'd be a lousy salesman). I tried a couple of times and all I
got 
was a few new keyword proposals and a few new syntax proposals :o). What am I
doing wrong?

 
 I think there are three aspects to localization. One is date and number
 formatting. Another is offering a facility for translating all the
 messages an application can give. And the last one is the configuration
 part, where you know which format to use.

I think you are confusing localization (l10n) with internationalization
(i18n)[1]. Locales are about l10n: number and date formats, time zones,
etc. i18n is about translations.

I've used the standard C API for localization and I found it quite simple
and good. What's wrong with it?

I've used gettext[2] too (which is almost a de facto standard on Unix),
and even though it could be improved I think it does a pretty good job, and
it has a lot of very subtle problems solved.

I think l10n and i18n should be handled with a lot of care, because they are
very hard to get right (like concurrency ;). There are a lot of rough
edges and exceptions to things that at first sight look so universal that
it is very easy to end up with a bad design (plural forms[3], for example). The
gettext manual[4] is a great source to see how big this is. Gettext is supported
in most major programming languages, so I think D could greatly benefit
from using it too.

[1] http://en.wikipedia.org/wiki/Internationalization_and_localization
[2] http://www.gnu.org/software/gettext/
[3] http://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms
[4] http://www.gnu.org/software/gettext/manual/gettext.html

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
I always get the feeling that when lesbians look at me, they're thinking,
'*That's* why I'm not a heterosexual.'
	-- George Constanza

Mar 02 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Michel Fortin wrote:
 On 2009-03-02 01:04:47 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 Good idea. But before we do so, I was hoping I'd pick the brains of 
 people who have used locales in other languages and understand the 
 burning points. Somehow, however, I'm doing a lousy job at eliciting 
 contributions from people on this newsgroup (guess I'd be a lousy 
 salesman). I tried a couple of times and all I got was a few new 
 keyword proposals and a few new syntax proposals :o). What am I doing 
 wrong?

 
 I think there are three aspects to localization. One is date and number 
 formatting. Another is offering a facility for translating all the 
 messages an application can give. And the last one is the configuration 
 part, where you know which format to use.

Sounds like a good start.

 The only problem I've seen addressed by you right now is the 
 configuration part; I believe it's the wrong end to start with.
 
 We should start by defining how to perform the tasks I enumerated above: 
 translating date and number formats, selecting strings for a given 
 language. After that we can figure out how to pass the proper default 
 configuration around. And then you're done.
 
 For date and number formatting, I like very much the NSDateFormatter and 
 NSNumberFormatter approach in Cocoa for instance: you have a base class 
 to format dates, another for numbers; you can easily create your own 
 subclass if you want, and there's a way to get the default formatter 
 instance.

Well I was thinking of passing the buck around. Instead of std.locale 
defining a hierarchy for formatting numbers and dates, it provides a 
means for user code to plant a routine in the locale object that knows 
how to format numbers and dates. Of course, with time default localized 
routine implementations will show up (hopefully contributed to by 
people), but the basic mechanism is simple - there exists a locale table 
that allows you to store a delegate in it.

 This is extensible, because if you wanted to go further, you could add 
 formatter classes for various units (length, mass...), or anything else.

This I want to avoid, at least for the time being. I want to define a 
table that can contain strings, integers, delegates, and other 
sub-tables. This is it. The path to extensibility will not be Phobos 
defining new classes to format various things. This could go on forever. 
Phobos will use the table consistently, and users who do want to format 
various things will simply plant their delegates in the table.

 Translating strings is a little harder because 1) strings are 
 application-defined, 2) strings are often not available in the user's 
 preferred language, adding the need for a fallback mechanism, and 3) 
 different applications will want to store those strings in different 
 ways. Perhaps we could define a base class for getting translated 
 strings, then allow the program to use whatever subclass it wants.

There's no need for classes and subclasses. It's all data. Why should we 
replace data with code? Data is easier.

Consider some code in phobos that must throw an exception:

throw Exception("File `%s' not found, system error is %s.",
     filename, errnomsg);

The localized version will look like this:

auto format = "File `%s' not found, system error is %s.";
auto localFormat = currentLocale ? currentLocale.peek(format) : null;
if (!localFormat) localFormat = format;
throw Exception(localFormat, filename, errnomsg);

What happens is that the default format string _is_ the key for looking 
up the localized strings. If there's no value for that string, the 
default format string stays in effect. Note that on the default path, 
currentLocale is null so there is hardly any inefficiency.
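
For reference, a self-contained sketch of the same lookup-with-fallback idea that actually compiles. It assumes a plain string[string] associative array in place of the locale object; MessageTable and localized are hypothetical names, and the French text is only an illustration.

```d
import std.format : format;
import std.stdio : writeln;

// Hypothetical stand-in for the locale table: default (English) message
// mapped to its translation.
alias MessageTable = string[string];

// Returns the translation if the table has one, otherwise the key itself.
// A null table (no locale installed) always takes the fallback path.
string localized(MessageTable table, string defaultFormat)
{
    if (auto translated = defaultFormat in table)
        return *translated;
    return defaultFormat;
}

void main()
{
    auto key = "File `%s' not found, system error is %s.";

    MessageTable fr = [
        "File `%s' not found, system error is %s.":
        "Fichier `%s' introuvable, erreur système : %s."
    ];
    MessageTable none; // no locale installed

    writeln(format(localized(fr, key), "a.txt", "ENOENT"));
    writeln(format(localized(none, key), "a.txt", "ENOENT"));
}
```

The second call prints the English default unchanged, since nothing is planted for that key.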

 Notice how I'm not using the word "locale" to talk about these things. 
 "Locale" is a concept too abstract to be able to do something good with 
 it. Since you could only define it using Algebraic type and a loosely 
 defined tree of strings, that seems to confirm my view. Call the module 
 std.locale if you want, but keep in mind that the most important task at 
 hand is facilitating localization, not defining what constitutes a 
 locale, that can wait.
 

What should I call it?


Andrei

Mar 02 2009

Sergey Gromov <snake.scaly gmail.com> writes:

Mon, 02 Mar 2009 07:02:10 -0800, Andrei Alexandrescu wrote:

 Consider some code in phobos that must throw an exception:
 
 throw Exception("File `%s' not found, system error is %s.",
      filename, errnomsg);
 
 The localized version will look like this:
 
 auto format = "File `%s' not found, system error is %s.";
 auto localFormat = currentLocale ? currentLocale.peek(format) : null;
 if (!localFormat) localFormat = format;
 throw Exception(localFormat, filename, errnomsg);

This example does not address the encoding problem.  Currently, errnomsg
is in Russian, UTF-8 encoded.  So I get "system error is <garbage>" on
the console.  If you adopt locales I'll get garbage not only for the
system error but for the rest of the exception message as well.

To actually solve this problem the default exception handler must be
fixed to convert any UTF-8 into the current OEM code page before
printing.  It would also help if default stdin and stdout performed such
a conversion.

 What happens is that the default format string _is_ the key for looking 
 up the localized strings.

Nice.  This means that error messages become a part of API and are
subject to backward and forward compatibility issues.  Isn't it too
much?

Mar 02 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Sergey Gromov wrote:
 Mon, 02 Mar 2009 07:02:10 -0800, Andrei Alexandrescu wrote:
 
 Consider some code in phobos that must throw an exception:

 throw Exception("File `%s' not found, system error is %s.",
      filename, errnomsg);

 The localized version will look like this:

 auto format = "File `%s' not found, system error is %s.";
 auto localFormat = currentLocale ? currentLocale.peek(format) : null;
 if (!localFormat) localFormat = format;
 throw Exception(localFormat, filename, errnomsg);

 
 This example does not address the encoding problem.  Currently, errnomsg
 is in Russian, UTF-8 encoded.  So I get "system error is <garbage>" on
 the console.  If you adopt locales I'll get garbage not only for the
 system error but for the rest of the exception message as well.
 
 To actually solve this problem the default exception handler must be
 fixed to convert any UTF-8 into the current OEM code page before
 printing.  It would also help if default stdin and stdout performed such
 a conversion.

I see.

 What happens is that the default format string _is_ the key for looking 
 up the localized strings.

 
 Nice.  This means that error messages become a part of API and are
 subject to backward and forward compatibility issues.  Isn't it too
 much?

I think it isn't too much, considering the sorry state of affairs of 
today's exceptions. You can't even answer the question: "Given this 
FileException object, what file name was concerned?" And each module 
defines its own exception class that is equally useless. It's 
ridiculous. 95% of them must be removed. And we must have systematic 
formatting of all strings initiated by Phobos.


Andrei

Mar 02 2009

Rainer Deyke <rainerd eldwood.com> writes:

Sergey Gromov wrote:
 To actually solve this problem the default exception handler must be
 fixed to convert any UTF-8 into the current OEM code page before
 printing.  It would also help if default stdin and stdout performed such
 a conversion.

No, stdin/stdout *must* perform this conversion.  It is a serious bug if
they don't.

The conversion cannot be performed at any other level.  D uses unicode
internally.  The console uses a specific encoding.  Therefore all data
passing between D and the console must be encoded/decoded.


-- 
Rainer Deyke - rainerd eldwood.com

Mar 02 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Rainer Deyke wrote:
 Sergey Gromov wrote:
 To actually solve this problem the default exception handler must be
 fixed to convert any UTF-8 into the current OEM code page before
 printing.  It would also help if default stdin and stdout performed such
 a conversion.

 
 No, stdin/stdout *must* perform this conversion.  It is a serious bug if
 they don't.
 
 The conversion cannot be performed at any other level.  D uses unicode
 internally.  The console uses a specific encoding.  Therefore all data
 passing between D and the console must be encoded/decoded.
 
 

What API to use to detect the encoding used by the console?

Andrei

Mar 02 2009

Daniel Keep <daniel.keep.lists gmail.com> writes:

Andrei Alexandrescu wrote:
 Rainer Deyke wrote:
 Sergey Gromov wrote:
 To actually solve this problem the default exception handler must be
 fixed to convert any UTF-8 into the current OEM code page before
 printing.  It would also help if default stdin and stdout performed such
 a conversion.

 No, stdin/stdout *must* perform this conversion.  It is a serious bug if
 they don't.

 The conversion cannot be performed at any other level.  D uses unicode
 internally.  The console uses a specific encoding.  Therefore all data
 passing between D and the console must be encoded/decoded.

 
 What API to use to detect the encoding used by the console?
 
 Andrei

According to <http://markmail.org/message/neu2pllqz3sst4tq>, it's uint
GetConsoleOutputCP()
<http://msdn.microsoft.com/en-us/library/ms683169%28VS.85%29.aspx>.

Interestingly, there's a SetConsoleOutputCP
<http://msdn.microsoft.com/en-us/library/ms686036(VS.85).aspx> function.
 Check this out:

 module utf;

 import tango.io.Stdout;

 extern(Windows) int SetConsoleOutputCP(uint wCodePageID);

 void main()
 {
     SetConsoleOutputCP(65001);
     Stdout("Не∟└Ω").newline;
 }

FYI, "65001" is how Windows spells "UTF-8".  Also note that this won't
work in anything earlier than Windows 2000, but then, even that's not
supported any more.

Note that you MUST change the console's font to Lucida Console
(right-click title, properties, font tab) for this to actually display,
but that's not something D can control.  :P

  -- Daniel

Mar 02 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Daniel Keep wrote:
 
 Andrei Alexandrescu wrote:
 Rainer Deyke wrote:
 Sergey Gromov wrote:
 To actually solve this problem the default exception handler must be
 fixed to convert any UTF-8 into the current OEM code page before
 printing.  It would also help if default stdin and stdout performed such
 a conversion.

 No, stdin/stdout *must* perform this conversion.  It is a serious bug if
 they don't.

 The conversion cannot be performed at any other level.  D uses unicode
 internally.  The console uses a specific encoding.  Therefore all data
 passing between D and the console must be encoded/decoded.

 What API to use to detect the encoding used by the console?

 Andrei

 
 According to <http://markmail.org/message/neu2pllqz3sst4tq>, it's uint
 GetConsoleOutputCP()
 <http://msdn.microsoft.com/en-us/library/ms683169%28VS.85%29.aspx>.
 
 Interestingly, there's a SetConsoleOutputCP
 <http://msdn.microsoft.com/en-us/library/ms686036(VS.85).aspx> function.
  Check this out:
 
 module utf;

 import tango.io.Stdout;

 extern(Windows) int SetConsoleOutputCP(uint wCodePageID);

 void main()
 {
     SetConsoleOutputCP(65001);
     Stdout("Не∟└Ω").newline;
 }

 
 FYI, "65001" is how Windows spells "UTF-8".  Also note that this won't
 work in anything earlier than Windows 2000, but then, even that's not
 supported any more.
 
 Note that you MUST change the console's font to Lucida Console
 (right-click title, properties, font tab) for this to actually display,
 but that's not something D can control.  :P
 
   -- Daniel

Ahhhh... Windows you mean? Ehm. I need to get to a Windows machine. If 
you could paste this into a bug report that would be great.

Thanks,

Andrei

Mar 02 2009

Sergey Gromov <snake.scaly gmail.com> writes:

Mon, 02 Mar 2009 12:53:48 -0800, Andrei Alexandrescu wrote:

 Rainer Deyke wrote:
 Sergey Gromov wrote:
 To actually solve this problem the default exception handler must be
 fixed to convert any UTF-8 into the current OEM code page before
 printing.  It would also help if default stdin and stdout performed such
 a conversion.

 
 No, stdin/stdout *must* perform this conversion.  It is a serious bug if
 they don't.
 
 The conversion cannot be performed at any other level.  D uses unicode
 internally.  The console uses a specific encoding.  Therefore all data
 passing between D and the console must be encoded/decoded.
 

 
 What API to use to detect the encoding used by the console?

There is std.windows.charset.toMBSz(str, 1) which does the right thing.

Mar 02 2009

Christopher Wright <dhasenan gmail.com> writes:

Andrei Alexandrescu wrote:
 The localized version will look like this:
 
 auto format = "File `%s' not found, system error is %s.";
 auto localFormat = currentLocale ? currentLocale.peek(format) : null;
 if (!localFormat) localFormat = format;
 throw Exception(localFormat, filename, errnomsg);

This short example suggests:
Locale.peek(T)(char[] key, T ifNotFound = T.init)

auto localFormat = currentLocale ? currentLocale.peek(format, format) : 
format;
throw new Exception(localFormat);

Mar 02 2009

Derek Parnell <derek psych.ward> writes:

On Mon, 02 Mar 2009 07:02:10 -0800, Andrei Alexandrescu wrote:

 Consider some code in phobos that must throw an exception:
 
 throw Exception("File `%s' not found, system error is %s.",
      filename, errnomsg);
 
 The localized version will look like this:
 
 auto format = "File `%s' not found, system error is %s.";
 auto localFormat = currentLocale ? currentLocale.peek(format) : null;
 if (!localFormat) localFormat = format;
 throw Exception(localFormat, filename, errnomsg);

One problem with this approach is that we meet the limitation of the
formatting string's micro-syntax. Currently, there is no way to reorder the
tokens in a message string, and that is required for /some/ messages in
/some/ languages.

I have used my own text formatting routine rather than Phobos' because it
allows the implementer to develop messages whose word order is correct for
their target language.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

Mar 02 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Derek Parnell wrote:
 On Mon, 02 Mar 2009 07:02:10 -0800, Andrei Alexandrescu wrote:
 
 Consider some code in phobos that must throw an exception:

 throw Exception("File `%s' not found, system error is %s.",
      filename, errnomsg);

 The localized version will look like this:

 auto format = "File `%s' not found, system error is %s.";
 auto localFormat = currentLocale ? currentLocale.peek(format) : null;
 if (!localFormat) localFormat = format;
 throw Exception(localFormat, filename, errnomsg);

 
 One problem with this approach is that we meet the limitation of the
 formatting string's micro-syntax. Currently, there is no way to reorder the
 tokens in a message string, and that is required for /some/ messages in
 /some/ languages.
 
 I have used my own text formatting routine rather than Phobos' because it
 allows the implementer to develop messages whose word order is correct for
 their target language.
 

Phobos has supported Posix positional syntax since 2.006.

http://digitalmars.com/d/2.0/phobos/std_stdio.html


Andrei
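
As a small illustration of the positional syntax mentioned above, here is a sketch using `std.format.format` from current Phobos. The helper `fileNotFound` and the reordered message text are assumptions made up for the example.

```d
import std.format : format;
import std.stdio : writeln;

/// Formats the "file not found" message with a caller-supplied pattern.
/// (Hypothetical helper, not part of Phobos.)
string fileNotFound(string pattern, string filename, string errnomsg)
{
    return format(pattern, filename, errnomsg);
}

void main()
{
    // Default word order: file name first, then the system error.
    writeln(fileNotFound("File `%1$s' not found, system error is %2$s.",
        "a.txt", "ENOENT"));

    // A translation may need the arguments in a different order;
    // the call site stays the same, only the pattern changes.
    writeln(fileNotFound("System error %2$s: file `%1$s' not found.",
        "a.txt", "ENOENT"));
}
```

Because the argument order lives in the pattern, a translator can reorder the tokens without touching the code that throws.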

Mar 02 2009

Derek Parnell <derek psych.ward> writes:

On Mon, 02 Mar 2009 18:36:09 -0800, Andrei Alexandrescu wrote:

 Phobos has supported Posix positional syntax since 2.006.
 
 http://digitalmars.com/d/2.0/phobos/std_stdio.html

Thank you. I was behind the times (again). 

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

Mar 02 2009

Michel Fortin <michel.fortin michelf.com> writes:

On 2009-03-02 10:02:10 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Michel Fortin wrote:
 I think there are three aspects to localization. One is date and number 
 formatting. Another is offering a facility for translating all the 
 messages an application can give. And the last one is the configuration 
 part, where you know which format to use.

 
 Sounds like a good start.
 
 The only problem I've seen addressed by you right now is the 
 configuration part; I believe it's the wrong end to start with.
 
 We should start by defining how to perform the tasks I enumerated 
 above: translating date and number formats, selecting strings for a 
 given language. After that we can figure out how to pass the proper 
 default configuration around. And then you're done.
 
 For date and number formatting, I like very much the NSDateFormatter 
 and NSNumberFormatter approach in Cocoa for instance: you have a base 
 class to format dates, another for numbers; you can easily create your 
 own subclass if you want, and there's a way to get the default 
 formatter instance.

 
 Well I was thinking of passing the buck around. Instead of std.locale 
 defining a hierarchy for formatting numbers and dates, it provides a 
 means for user code to plant a routine in the locale object that knows 
 how to format numbers and dates. Of course, with time default localized 
 routine implementations will show up (hopefully contributed to by 
 people), but the basic mechanism is simple - there exists a locale 
 table that allows you to store a delegate in it.

Looks somewhat like what I proposed. But the point I was trying to make 
is that you don't need to regroup all these in one big object called a 
"locale".

Instead of seeing a locale as a central object for localizing every 
kind of data, I'm suggesting that we have different kinds of formatters 
capable of localizing different kinds of data. Each formatter would 
have its own definition of a locale that suits its needs. All you need 
is a standardized naming scheme for locales compatible between 
formatters, but that we have.

Note that while I've proposed that formatters be classes, I have no 
problem in them being structs which could be accepted in template 
functions.

What's good about a class, or a struct, is that it can regroup a bunch 
of related functions. For instance, you could have a number formatter 
help you display the right string, read a formatted string, and 
validate a formatted string. And you could configure the formatter for 
a fixed number of decimals, specific rounding behaviour, negative 
format, etc.
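
A rough sketch of what such a formatter could look like as a struct. Everything here is hypothetical (`NumberFormatter`, `toText`, `fromText` and the two settings are invented for illustration); it only shows how related operations and their configuration can live in one small object.

```d
import std.array : replace;
import std.conv : to;
import std.format : format;
import std.stdio : writeln;

/// Hypothetical number formatter: groups the settings with the
/// operations that depend on them.
struct NumberFormatter
{
    string decimalSeparator = ".";
    int decimals = 2;

    /// Renders `value` with a fixed number of decimals and the configured separator.
    string toText(double value)
    {
        auto plain = format("%.*f", decimals, value);
        return plain.replace(".", decimalSeparator);
    }

    /// Reads back a string written with the configured separator.
    /// Throws std.conv.ConvException if the text is not a number.
    double fromText(string s)
    {
        return s.replace(decimalSeparator, ".").to!double;
    }
}

void main()
{
    auto finnish = NumberFormatter(",", 2);
    writeln(finnish.toText(1234.5));
    writeln(finnish.fromText("0,25"));
}
```

Since it is a plain struct, it could equally be passed to a template function, as noted above.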


 This is extensible, because if you wanted to go further, you could add 
 formatter classes for various units (length, mass...), or anything else.

 
 This I want to avoid, at least for the time being. I want to define a 
 table that can contain strings, integers, delegates, and other 
 sub-tables. This is it. The path to extensibility will not be Phobos 
 defining new classes to format various things. This could go on 
 forever. Phobos will use the table consistently, and users who do want 
 to format various things will simply plant their delegates in the table.

Well, when I said "you", I really meant anyone, and not necessarily 
inside Phobos. That was just to point out that the design is 
extensible. Sorry, it was confusing.


 Translating strings is a little harder because 1) strings are 
 application-defined, 2) strings are often not available in the user's 
 preferred language, adding the need for a fallback mechanism, and 3) 
 different applications will want to store those strings in different 
 ways. Perhaps we could define a base class for getting translated 
 strings, then allow the program to use whatever subclass it wants.

 
 There's no need for classes and subclasses. It's all data. Why should 
 we replace data with code? Data is easier.
 
 Consider some code in phobos that must throw an exception:
 
 throw Exception("File `%s' not found, system error is %s.",
      filename, errnomsg);
 
 The localized version will look like this:
 
 auto format = "File `%s' not found, system error is %s.";
 auto localFormat = currentLocale ? currentLocale.peek(format) : null;
 if (!localFormat) localFormat = format;
 throw Exception(localFormat, filename, errnomsg);
 
 What happens is that the default format string _is_ the key for looking 
 up the localized strings. If there's no value for that string, the 
 default format string is in vigor. Note that on the default path, 
 currentLocale is null so there is hardly any inefficiency.

Firstly, while you and I both agree that it's good that the key for 
searching a localized string be a readable message, not everyone does. 
It often doesn't work well when you want to translate small words 
having an overloaded meaning in English for instance.

Secondly, always falling back to English (or the developer's locale) 
when the currentLocale is not available isn't flexible enough. On Mac 
OS X for instance, you can select a number of languages for 
applications to use in order of preference. When the first isn't 
available, it looks for the second (skipping some details).

Thirdly, I hope you don't expect everyone to write the above each time. 
We should provide a nice function to do the localization, say 
"localize"? This function should really be an overridable delegate.

	auto format = "File `%s' not found, system error is %s.";
	throw Exception(localize(format), filename, errnomsg);

Fourthly, various libraries are likely to provide their own translation 
tables (perhaps even in various formats). Unless you merge them all 
(risking some clashes), you may want a second argument for specifying 
the translation table to use.

	auto format = "File `%s' not found, system error is %s.";
	throw Exception(localize(format, PHOBOS), filename, errnomsg);
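
To make the last three points concrete, here is a minimal sketch under stated assumptions: `Localizer`, its `add` and `lookup` methods, and the string domain names are all hypothetical. It keeps one table per library ("domain") and language, tries the user's languages in order of preference, and exposes `localize` as a delegate that an application can replace.

```d
import std.stdio : writeln;

/// Hypothetical localizer: tables are kept per domain (one per library)
/// and per language; languages are tried in order of preference.
struct Localizer
{
    string[] preferredLanguages;           // most preferred first
    private string[string][string] tables; // "domain/language" -> (message -> translation)

    void add(string domain, string language, string message, string translated)
    {
        auto key = domain ~ "/" ~ language;
        string[string] table;
        if (auto existing = key in tables)
            table = *existing;
        table[message] = translated;
        tables[key] = table;
    }

    /// Tries each preferred language in turn; the message itself is the last resort.
    string lookup(string message, string domain)
    {
        foreach (language; preferredLanguages)
        {
            if (auto table = (domain ~ "/" ~ language) in tables)
                if (auto translated = message in *table)
                    return *translated;
        }
        return message;
    }
}

/// Overridable hook: applications may assign their own delegate.
string delegate(string message, string domain) localize;

void main()
{
    Localizer texts;
    texts.preferredLanguages = ["fr", "de"];
    texts.add("phobos", "de", "File not found.", "Datei nicht gefunden.");

    localize = &texts.lookup;

    writeln(localize("File not found.", "phobos")); // no "fr" entry, so "de" is used
    writeln(localize("File not found.", "mylib"));  // unknown domain: message unchanged
}
```

Keeping the domain in the key avoids clashes when two libraries use the same English text with different meanings.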

Finally, no current library addresses this, but it'd be great if there 
was a way to correctly manage plurals in all languages. Perhaps making 
a word parametrizable depending on a number...
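
One possible shape for this, sketched with invented names (`PluralRule`, `PluralMessage`, `englishRule`, `frenchRule`): each language supplies a rule that maps a count to a form index, and the message carries one format string per form. The two rules shown follow the usual conventions (English uses the singular only for exactly one; French also uses it for zero).

```d
import std.format : format;
import std.stdio : writeln;

/// Hypothetical per-language plural rule: maps a count to a form index.
alias PluralRule = size_t function(long n);

/// English: singular only for exactly one.
size_t englishRule(long n) { return n == 1 ? 0 : 1; }

/// French: zero and one both take the singular.
size_t frenchRule(long n) { return n > 1 ? 1 : 0; }

/// A message parametrized by a number: one format string per plural form.
struct PluralMessage
{
    PluralRule rule;
    string[] forms;

    string render(long n)
    {
        return format(forms[rule(n)], n);
    }
}

void main()
{
    auto en = PluralMessage(&englishRule, ["%s file", "%s files"]);
    auto fr = PluralMessage(&frenchRule, ["%s fichier", "%s fichiers"]);

    foreach (n; [0L, 1L, 2L])
    {
        writeln(en.render(n));
        writeln(fr.render(n));
    }
}
```

Languages with more than two forms would simply return a larger index and supply more format strings.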


 Notice how I'm not using the word "locale" to talk about these things. 
 "Locale" is a concept too abstract to be able to do something good with 
 it. Since you could only define it using Algebraic type and a loosely 
 defined tree of strings, that seems to confirm my view. Call the module 
 std.locale if you want, but keep in mind that the most important task 
 at hand is facilitating localization, not defining what constitutes a 
 locale, that can wait.

 
 How should I call it?

My point was that there shouldn't be a class/struct/thing representing 
a locale. Having a collection of formatters, each knowing where to get 
their locale information (when given a locale name) would work better 
in my opinion.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Mar 02 2009

Walter Bright <newshound1 digitalmars.com> writes:

Michel Fortin wrote:
 Translating strings is a little harder because 1) strings are 
 application-defined, 2) strings are often not available in the user's 
 preferred language, adding the need for a fallback mechanism, and 3) 
 different applications will want to store those strings in different 
 ways. Perhaps we could define a base class for getting translated 
 strings, then allow the program to use whatever subclass it wants.

It's a silly thing, but I love the little google widget you can add to a 
web page to automatically translate the pages. All the D site pages have 
it in the left column.

Mar 02 2009

Michel Fortin <michel.fortin michelf.com> writes:

On 2009-03-02 14:58:26 -0500, Walter Bright <newshound1 digitalmars.com> said:

 It's a silly thing, but I love the little google widget you can add to 
 a web page to automatically translate the pages. All the D site pages 
 have it in the left column.

It's not a silly thing, it's hilarious. Look, Google has invented the 
D-French language:

-	import std.stdio;
+	std.stdio importation;
----------
-	delete cl;
+	supprimer cl;
----------
-	s.allocated += argv.length * typeof (argv[0]).sizeof;
+	s.allocated + = * argv.length typeof (argv [0]). sizeof;
----------
-	writefln( "argc = %d, "  ~ "allocated = %d" ,
-	  argspecs().count, argspecs().allocated);
+	Writefln ( "argc =% d," ~ "attribués =% d",
+	  argspecs (). count, argspecs (). alloué);
----------
-	this ( int  argc, string argv) // constructor
+	ce (int argc, string argv) / / constructeur

Funny French that is. Perhaps DMD should make its identifiers and 
keywords localizable, the result would be much better. :-)

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Mar 02 2009

Walter Bright <newshound1 digitalmars.com> writes:

Michel Fortin wrote:
 On 2009-03-02 14:58:26 -0500, Walter Bright <newshound1 digitalmars.com> 
 said:
 
 It's a silly thing, but I love the little google widget you can add to 
 a web page to automatically translate the pages. All the D site pages 
 have it in the left column.

 
 It's not a silly thing, it's hilarious. Look, Google has invented the 
 D-French language:

A bug in Google's translator is there's no way to tell it to ignore a 
section, like a code section.

Mar 02 2009

Sean Kelly <sean invisibleduck.org> writes:

Michel Fortin wrote:
 On 2009-03-02 14:58:26 -0500, Walter Bright <newshound1 digitalmars.com> 
 said:
 
 It's a silly thing, but I love the little google widget you can add to 
 a web page to automatically translate the pages. All the D site pages 
 have it in the left column.

 
 It's not a silly thing, it's hilarious. Look, Google has invented the 
 D-French language:
 
 -    import std.stdio;
 +    std.stdio importation;
 ----------
 -    delete cl;
 +    supprimer cl;
 ----------
 -    s.allocated += argv.length * typeof (argv[0]).sizeof;
 +    s.allocated + = * argv.length typeof (argv [0]). sizeof;

Wow, I didn't know the standard French form for formulas was prefix 
notation :-)


Sean

Mar 03 2009

Daniel Keep <daniel.keep.lists gmail.com> writes:

Sean Kelly wrote:
 Michel Fortin wrote:
 On 2009-03-02 14:58:26 -0500, Walter Bright
 <newshound1 digitalmars.com> said:

 It's a silly thing, but I love the little google widget you can add
 to a web page to automatically translate the pages. All the D site
 pages have it in the left column.

 It's not a silly thing, it's hilarious. Look, Google has invented the
 D-French language:

 -    import std.stdio;
 +    std.stdio importation;
 ----------
 -    delete cl;
 +    supprimer cl;
 ----------
 -    s.allocated += argv.length * typeof (argv[0]).sizeof;
 +    s.allocated + = * argv.length typeof (argv [0]). sizeof;

 
 Wow, I didn't know the standard French form for formulas was prefix
 notation :-)
 
 
 Sean

Wait, does this mean the French speak LISP?

No wonder I could never understand them! :O

  -- Daniel

Mar 03 2009

Georg Wrede <georg.wrede iki.fi> writes:

Walter Bright wrote:
 Georg Wrede wrote:
 What else? Well, it is conceivable that he wants his program to print 
 dates and times the way it's done over there. He simply writes the 
 program "by hand" so it does dates and times like he wants. Even if 
 there was a locale thing in the language, he wouldn't bother with the 
 hassle. And he couldn't care less about Urdu.

 
 I've attempted to use locales, but the reason I'd always wind up doing 
 it by hand is because the existing libraries to do it are obtuse, 
 impenetrable, execrable, and pretty much unusable.

I'd venture to say, it's not only the libraries -- the stuff itself is 
obtuse. In most countries there's no *real* consensus on what and how 
folks want their settings, and often the Official Settings (as dictated 
by either a real or imagined authority) are less than practical.

A case in point, in Finland, what I get when trying to type a dollar 
sign, is a ¤, which is a circle with four spokes. This sign is not used 
for absolutely anything, anywhere. Ever. (And I've been at this for more 
than 25 years.)

 So it may be that it's an insoluble problem, or maybe nobody has come up 
 with the right abstraction yet. I don't have nearly enough experience 
 with it to know the answer.

National pride, anti-imperialism, you name it. The numeric keyboard 
around here has a comma instead of the decimal point. Just guess if it's 
nice to try to do spread sheets, where you have to use a decimal point just 
because this spread sheet goes to company correspondence overseas.

Folks are all eager about locales, until they get their hands dirty.

IMHO, it actually is an insoluble problem -- at least as far as a 
*programming language* is concerned.

Mar 01 2009

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

Georg Wrede wrote:
 Andrei Alexandrescu wrote:
 Sooner or later that will need to be defined. I know next to nothing 
 about locales. (I know I dislike the design C++ uses.)

 
 
 D uses Utf-8, and that is *good enough*!
 
 This lets my programs "understand" Finnish, and doesn't give me undue 
 headaches.
 
 
 Seriously tending to locale issues would be an *endless swamp*. Just for 
 this, I looked up something suitable to read:
 
 http://www.manpagez.com/man/1/perllocale/
 
 It may even be that you would find the time, but think about Walter and 
 us, please. There *really are* other things to do.
 
 
 An excellent string hierarchy without the entire rest of i18n, is only 
 going to look like a Ferrari with a Trabant engine. Which is worse than 
 nothing at all.
 
 Besides, there's more to this than just designing the perfect, or even a 
 good locale system in a language. *Somebody should actually use it*.
 
 Now, the non-English programmer, what does he really want? He wants to 
 be able to type stuff into his program in his native character set. D 
 already does that, by way of Utf-8.
 
 What else? Well, it is conceivable that he wants his program to print 
 dates and times the way it's done over there. He simply writes the 
 program "by hand" so it does dates and times like he wants. Even if 
 there was a locale thing in the language, he wouldn't bother with the 
 hassle. And he couldn't care less about Urdu.
 
 The hypothetical Ambitious Programmer might want to use locale. He could 
 then have the dates and times (and currencies, etc.) follow the country. 
 Now, that might sound commendable, but in practice it *crumbles*.
 He can't possibly know how to deal with languages that are written 
 backwards, languages where several characters make one letter, exotic 
 ways of writing dates, etc.
 
 So, his fancy i18n project is doomed to be, at most, as usable as the 
 "normal" D program. Probably less, since his decisions will actually 
 worsen the user experience -- for users in another culture.
 
 
 And, any project big enough to tackle this, will implement its own 
 locale handling anyway. I'm sorry to say.
 
 ----
 
 Yes, locales are nice and all.
 For D 3.5 that is.
 Honestly.

If you don't use it, you don't use it; but please don't ruin it for the 
sake of those of us who will.

I will use it (go Andrei!)
people who have to muck with spreadsheet libraries might use it
people who write spreadsheet libraries might use it

wish I had some good ideas for Andrei, but I can't say as I do.

Mar 01 2009

Christopher Wright <dhasenan gmail.com> writes:

Georg Wrede wrote:
 The hypothetical Ambitious Programmer might want to use locale. He could 
 then have the dates and times (and currencies, etc.) follow the country. 
 Now, that might sound commendable, but in practice it *crumbles*.
 He can't possibly know how to deal with languages that are written 
 backwards, languages where several characters make one letter, exotic 
 ways of writing dates, etc.

*cough*tango.time*cough*

Mar 02 2009
