
digitalmars.D - std.locale

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sooner or later that will need to be defined. I know next to nothing 
about locales. (I know I dislike the design C++ uses.)

I was thinking of a design along the following lines. There are RFCs and 
related standards dedicated to locale nomenclature:

http://tools.ietf.org/html/rfc4646 for language names
http://www.unicode.org/cldr/ for various locale names

So we know the basic names we want to follow, which is one less burden. 
Then what I want to do is to define a hierarchical string table that is 
populated with the appropriate names.

This is in opposition to defining an actual class hierarchy that mimics 
the localization table. I think a hierarchical string table is better 
because it allows simple extensibility.

The type stored by each slot of a locale is:

Algebraic!(
     int,
     string,
     Variant delegate(Variant),
     This[string]);

meaning that a locale could store one of these types. (What else should 
go in there?)

The access pattern goes like:

// Get the date display pattern
auto pat = myLocale.get("calendars", "calendar=default",
     "dateFormats", "dateFormatLength=medium", "pattern");

This will return an Algebraic with a string in it. The string would look 
like, e.g., "yyyy-MM-dd".

The access is rather verbose because the corresponding locale names tree 
is equally (actually more) verbose, see 
http://unicode.org/Public/cldr/1.6.1/core.zip. But the flexibility and 
the standards-compliance are there. We may add later some convenience 
functions for frequently-used stuff such as dates, times, and numbers.

Extension is obvious:

myLocale.put("my-category", "my-slot", "whatever");

Later, getting "my-category", "my-slot" will return an Algebraic 
containing the string "whatever".
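A minimal D sketch of the get/put access pattern over a hierarchical string table. To keep it short, every slot here holds a plain `string` rather than the proposed `Algebraic`, and the class name `LocaleTable` and its variadic `get`/`put` signatures are hypothetical, not an existing Phobos API.

```d
import std.exception : enforce;

/// Hypothetical hierarchical string table. Every node may hold a string
/// leaf and any number of named children, so new paths can be added freely.
final class LocaleTable
{
    private string leaf;
    private bool hasLeaf;
    private LocaleTable[string] children;

    /// put("a", "b", ..., value): stores `value` under the key path,
    /// creating intermediate nodes on demand.
    void put(string[] keysThenValue...)
    {
        enforce(keysThenValue.length >= 2,
                "put needs at least one key and a value");
        LocaleTable node = this;
        foreach (key; keysThenValue[0 .. $ - 1])
        {
            if (auto child = key in node.children)
                node = *child;
            else
            {
                auto fresh = new LocaleTable;
                node.children[key] = fresh;
                node = fresh;
            }
        }
        node.leaf = keysThenValue[$ - 1];
        node.hasLeaf = true;
    }

    /// get("a", "b", ...): returns the stored string, or null if the
    /// path does not exist or ends at a node without a value.
    string get(string[] keys...)
    {
        LocaleTable node = this;
        foreach (key; keys)
        {
            auto child = key in node.children;
            if (child is null)
                return null;
            node = *child;
        }
        return node.hasLeaf ? node.leaf : null;
    }
}
```

Calling `put("my-category", "my-slot", "whatever")` and then `get("my-category", "my-slot")` returns `"whatever"`; a path that was never stored yields `null`.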

There will be a global reference to a Locale class, e.g. defaultLocale. 
By default the reference will be null, implying the C locale should be 
in effect. Applications can assign to it as they see fit, and also pass 
around multiple locale variables.

So I wanted to gather some good ideas about locale design. Is a 
string-and-Algebraic design good for all uses? What kind of locale 
functionality does it not capture? I must have missed a ton of details, 
so if you don't understand what I mean by the above, it must be me.



Andrei
Mar 01 2009
Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 There will be a global reference to a Locale class, e.g. defaultLocale. 
 By default the reference will be null, implying the C locale should be 
 in effect. Applications can assign to it as they find fit, and also pass 
 around multiple locale variables.

I disagree with being able to assign to the global defaultLocale. This is going to cause endless problems. Just one is that any function that uses locale can no longer be pure. defaultLocale should be immutable. Any function that is locale aware should be parameterized with a locale parameter. (Not only is that better design, it self-documents the dependency.)
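To illustrate the purity argument, a sketch with a hypothetical `Locale` class that holds only a digit-group separator. D's `pure` functions may not read mutable globals but may read `immutable` ones, so an immutable default keeps locale-aware code pure, and the explicit `loc` parameter documents the dependency. The names `defaultLocale` and `groupDigits` are illustrative only.

```d
/// Hypothetical minimal locale: only a digit-group separator.
final class Locale
{
    string groupSeparator;
    this(string sep) immutable pure { groupSeparator = sep; }
}

/// Process-wide default: assigned once below, never reassignable.
immutable Locale defaultLocale;

shared static this()
{
    defaultLocale = new immutable(Locale)(",");
}

/// Groups the decimal digits of `n` in threes using the separator of
/// `loc`. The locale dependency is explicit and the function is pure.
string groupDigits(ulong n, immutable Locale loc) pure
{
    char[] reversed;
    size_t count;
    do
    {
        if (count > 0 && count % 3 == 0)
            foreach_reverse (c; loc.groupSeparator)
                reversed ~= c;
        reversed ~= cast(char)('0' + n % 10);
        n /= 10;
        ++count;
    } while (n != 0);

    // Digits were collected least-significant first; flip them back.
    auto result = new char[reversed.length];
    foreach (i, c; reversed)
        result[$ - 1 - i] = c;
    return result.idup;
}

/// Convenience overload. Reading the immutable global keeps it pure;
/// reading a mutable global here would be rejected by the compiler.
string groupDigits(ulong n) pure
{
    return groupDigits(n, defaultLocale);
}
```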
Mar 01 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
I disagree with being able to assign to the global defaultLocale. This is going to cause endless problems. Just one is that any function that uses locale can no longer be pure. defaultLocale should be immutable. Any function that is locale aware should be parameterized with a locale parameter. (Not only is that better design, it self-documents the dependency.)

I don't understand this. That means there's no more default locale. Here's what I had in mind:

class Locale { ... }
// function parameterized with an optional locale
void foo(Data d, Locale loc = null);

So there's no more default locale. If you pass in null, that's the default locale.

Andrei
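A sketch of the optional-parameter approach, assuming a hypothetical `Locale` reduced to a decimal separator; passing `null` (or nothing) falls back to C-locale conventions. `formatHundredths` is an illustrative name, and amounts are kept as integer hundredths to sidestep floating-point formatting.

```d
import std.format : format;

/// Hypothetical minimal locale: only a decimal separator.
final class Locale
{
    string decimalSeparator;
    this(string sep) { decimalSeparator = sep; }
}

/// Formats an amount given in hundredths (1234 -> "12.34").
/// Passing no locale (null) selects C-locale conventions.
string formatHundredths(long hundredths, Locale loc = null)
{
    immutable sep = loc is null ? "." : loc.decimalSeparator;
    immutable sign = hundredths < 0 ? "-" : "";
    immutable magnitude = hundredths < 0 ? -hundredths : hundredths;
    return format("%s%d%s%02d", sign, magnitude / 100, sep, magnitude % 100);
}
```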
Mar 01 2009
Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
I don't understand this. That means there's no more default locale. Here's what I had in mind:

class Locale { ... }
// function parameterized with an optional locale
void foo(Data d, Locale loc = null);

So there's no more default locale. If you pass in null, that's the default locale.

That's fine, I was thrown off by your reference to a "global reference".
Mar 01 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
That's fine, I was thrown off by your reference to a "global reference".

Well I was thinking a global reference might be handy for people who e.g. want to set the locale once and then be done with it. I think only a few apps actually manipulate multiple locales simultaneously. Most would just want to load the locale present on the user's computer and then use it.

Andrei
Mar 01 2009
Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
Well I was thinking a global reference might be handy for people who e.g. want to set the locale once and then be done with it.

That's what I was objecting to!
 I think only 
 a few apps actually manipulate multiple locales simultaneously. Most 
 would just want to load the locale present on the user's computer and 
 then use it.

User settable global state is eeevil.
Mar 01 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Andrei Alexandrescu wrote:
 Well I was thinking a global reference might be handy for people who 
 e.g. want to set the locale once and then be done with it.

That's what I was objecting to!
 I think only a few apps actually manipulate multiple locales 
 simultaneously. Most would just want to load the locale present on the 
 user's computer and then use it.

User settable global state is eeevil.

I am thinking of a better form using scope-based locale usage. Consider:

class Locale { ... }

struct LocaleContext
{
    this(Locale value);
    ~this();
    private Locale value();
    alias value this;
    ...
}

People wouldn't have access to a global Locale object. They can, however, create LocaleContext objects. Such objects set the current locale to the user's locale in the constructor and restore the previous locale in the destructor. That way, locale usage follows scopes, and the long-distance dependency created by globals is greatly reduced. An application that just needs to set a locale once at startup can create its own LocaleContext inside, e.g., main(). A more sophisticated app may manage multiple locale contexts and put them into action as it needs. It's really flexible, and it doesn't promote bad programming styles.

Andrei
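A minimal D sketch of the scope-based idea, under stated assumptions: `Locale` is a hypothetical class reduced to a name, the current locale lives in a module-level variable (thread-local by default in D), `null` stands for the C locale, and the `alias value this` forwarding from the post is omitted. The constructor saves and replaces the current locale; the destructor restores it when the scope ends.

```d
/// Hypothetical locale, reduced to its name.
final class Locale
{
    string name;
    this(string name) { this.name = name; }
}

// Module-level variables are thread-local in D by default, so each
// thread has its own current locale. null stands for the C locale.
private Locale current;

/// Read-only access to the locale in effect on this thread.
Locale currentLocale() { return current; }

/// Installs a locale for the lifetime of the enclosing scope.
struct LocaleContext
{
    private Locale previous;

    this(Locale value)
    {
        previous = current;
        current = value;
    }

    ~this()
    {
        current = previous; // restore whatever was active before
    }

    @disable this();     // a context must always be given a locale
    @disable this(this); // a copy would restore the old locale twice
}
```

Because destructors run in reverse order of construction, nested contexts unwind correctly: leaving an inner scope reinstates the outer scope's locale.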
Mar 01 2009
BCS <none anon.com> writes:
Hello Walter,
 User settable global state is eeevil.
 

User *alterable* global state is eeevil. I can see a good argument for immutable WORM variables that can be assigned to exactly once very early in the program load process.
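In D, one way to get the write-once variable BCS describes is an `immutable` module-level variable assigned in a `shared static this()` module constructor, which runs once before `main`. A sketch, assuming the locale name is taken from the `LANG` environment variable with a `"C"` fallback; `startupLocaleName` is a hypothetical name.

```d
import std.process : environment;

/// Write-once global: assigned exactly once, in the module constructor
/// that runs before main(); any later assignment fails to compile.
immutable string startupLocaleName;

shared static this()
{
    // Falls back to "C" when LANG is not set.
    startupLocaleName = environment.get("LANG", "C");
}
```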
Mar 01 2009
Walter Bright <newshound1 digitalmars.com> writes:
BCS wrote:
 Hello Walter,
 User settable global state is eeevil.

User *alterable* global state is eeevil. I can see a good argument for immutable WORM variables that can be assigned to exactly once very early in the program load process.

Sure, I meant global state once initialized.
Mar 01 2009
Georg Wrede <georg.wrede iki.fi> writes:
Walter Bright wrote:
 Andrei Alexandrescu wrote:
 There will be a global reference to a Locale class, e.g. 
 defaultLocale. By default the reference will be null, implying the C 
 locale should be in effect. Applications can assign to it as they find 
 fit, and also pass around multiple locale variables.

I disagree with being able to assign to the global defaultLocale. This is going to cause endless problems. Just one is that any function that uses locale can no longer be pure. defaultLocale should be immutable.

The two programs that are most "locale aware" are usually spreadsheets and word processors. It is common for the user to need to write, say, in Swedish or in Russian, while in a Finnish setting. Or one may want to use a decimal separator other than what is "proper" for the country. For example, a lot of people use "." instead of the official "," in Finland, and many write times as "18:23" instead of "18.23". For this purpose, these programs let the users define these any way they want.

I think the notion of locales is, slowly but steadily, going away. It was a nice idea at the time, but with two problems: users don't use it, and programmers don't use it. Of course, eventually we will want to "do something" about this. But that should be left to the day when real issues are all sorted out in D. This is a non-urgent, low-priority thing.
Mar 01 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Georg Wrede wrote:
The two programs that are most "locale aware" are usually spread sheets and word processors. It is usual that the user needs to write, say, in Swedish or in Russian, while in a Finnish setting. Or that one wants to use a decimal separator other than what is "proper" for the country. For example, a lot of people use "." instead of the official "," in Finland, and many use time as "18:23" instead of "18.23". For this purpose, these programs let the users define these any way they want.

That's exactly what my proposal is doing. People can start with the defaults of the Finnish locale and then overwrite whichever parts they want.
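A sketch of starting from locale defaults and overriding individual settings. `NumberTimeFormat` and `finnishDefaults` are hypothetical; the Finnish values simply follow the conventions described in this thread (decimal comma, "18.23" times), not CLDR data.

```d
import std.format : format;

/// Hypothetical per-user settings that start from locale defaults.
struct NumberTimeFormat
{
    string decimalSeparator;
    string timeSeparator;
}

/// Hypothetical Finnish defaults, following the conventions mentioned
/// above (decimal comma, "18.23"-style times); not taken from CLDR.
enum NumberTimeFormat finnishDefaults = NumberTimeFormat(",", ".");

string formatTime(int hour, int minute, NumberTimeFormat f)
{
    return format("%02d%s%02d", hour, f.timeSeparator, minute);
}
```

A user who prefers "18:23" copies the defaults and overrides only `timeSeparator`; every other setting keeps its Finnish value.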
 I think the notion of locales is, slowly but steadily, going away.

Do you have any data backing this up?
 It was a nice idea at the time, but with two problems: users don't use 
 it, and programmers don't use it.

Is it because it hasn't been properly packaged?
 Of course, eventually we will want to "do something" about this. But 
 that should be left to the day when real issues are all sorted out in D. 
 This is a non-urgent, low-priority thing.

I guess. Now please tell me how I print arrays in D.

Andrei
Mar 01 2009
Georg Wrede <georg.wrede iki.fi> writes:
Andrei Alexandrescu wrote:
That's exactly what my proposal is doing. People can start with the defaults of the Finnish locale and then overwrite whichever parts they want.

From the Java documentation for java.util.Locale (J2SE 1.4.2): "A Locale object represents a specific geographical, political, or cultural region." Nice. If those three were orthogonal, then you'd choose each once and be done with it. Unfortunately, they blend. And they blend in a different way in every area. That creates "continuums" of needs for settings, and these can't really be predicted easily. A GUI user can rely on the settings having been made at OS install by himself or the local vendor. But the console is different. (See below.)
 I think the notion of locales is, slowly but steadily, going away.

Do you have any data backing this up?

For instance, in the old days, the operating system used to define the variable LC_LOCAL for the user. It signified the locale, usually the user's country. Today, I see no such thing. The only variables related to such are for the GUI:

LANG=en_US.UTF-8
GDM_LANG=en_US.UTF-8

One is the console input language and the other is the GUI input language. No locale stuff anywhere.
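For reference, values such as `en_US.UTF-8` follow the POSIX naming convention `language[_territory][.codeset][@modifier]`. A minimal parsing sketch in D; `LocaleName` and `parseLocaleName` are hypothetical names, and the sketch does no validation of the parts.

```d
import std.string : indexOf;

/// Hypothetical record for the parts of a POSIX-style locale name.
struct LocaleName
{
    string language;
    string territory;
    string codeset;
    string modifier;
}

/// Splits e.g. "en_US.UTF-8" or "fi_FI@euro" into its parts.
/// Missing parts are left empty.
LocaleName parseLocaleName(string s)
{
    LocaleName r;
    auto at = s.indexOf('@');
    if (at >= 0) { r.modifier = s[at + 1 .. $]; s = s[0 .. at]; }
    auto dot = s.indexOf('.');
    if (dot >= 0) { r.codeset = s[dot + 1 .. $]; s = s[0 .. dot]; }
    auto us = s.indexOf('_');
    if (us >= 0) { r.territory = s[us + 1 .. $]; s = s[0 .. us]; }
    r.language = s;
    return r;
}
```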
 It was a nice idea at the time, but with two problems: users don't use 
 it, and programmers don't use it.

Is it because it hasn't been properly packaged?

No. Imagine for a moment that we had a Perfect Locale Implementation (which I say is not even possible, but still). If a programmer wanted to use locale dependent printing, then he'd have to get familiar with all the possible ways his string may get printed if someone uses his program in a far away country. And there are a few different ways, believe me. Would you imagine anybody actually bothering to do that? Would you?? So what the programmer does, is, he prints things the way he wants, and caters only to the specific things he feels he needs to. And creates a solution that behaves *predictably*, from his point of view. He may want folks in France and Finland to use his program. And since he doesn't write the UI strings in any other language, the program will be unusable to folks in Afghanistan anyway. Or he writes an English UI, whereupon people accept that it may not cater for all kinds of exotic needs.
 Of course, eventually we will want to "do something" about this. But 
 that should be left to the day when real issues are all sorted out in 
 D. This is a non-urgent, low-priority thing.


Had there been any need for locales, believe me, the "foreigners" in this NG would have asked for it.
 I guess. Now please tell me how I print arrays in D.

Think about it for a moment. We have two kinds of programs, those written for the console, and those written for a GUI. It's natural for the GUI programs to be locale aware, but with the console apps, it simply is not possible to do properly. I'll explain, but first:

Let's split this into two separate issues, the console and the GUI.

The GUI is aware of your preferences. You don't use writefln with the GUI. You use the GUI API for any I/O, right? Now, wouldn't it be natural to assume that the GUI API takes care of all of this? Print a date, and it prints it with the user's preferred format. *The same with your array*.

And then let's look at the console. A proper internationalisation would mean that the Chinese could use the console, and all character mode apps, in Chinese. Problem is, there simply aren't enough pixels on many consoles to render the Chinese character set. So we're off track already. And with the ubiquitous GUIs around, people are increasingly accepting that a GUI is for nationalised stuff, and the console is for "technical" stuff. Haven't you noticed: in the last decade it has become all the more evident that the reason to write a non-GUI app is very specifically to get rid of all kinds of hassles, and simply concentrate on what the program is supposed to do!

(You know, a few years ago we had a major conversation here about whether non-ASCII variable names should be accepted in D. The end result is, yes. (I just tried it.) Now, how can an international team cowork on a project where variable names are written so the other folks can't even type them with their keyboards??? -- All very nice, but no cigar. That's about as smart as letting people define *unlimited* length variable names!)

*** How to print arrays ***

You print arrays in a predictable and expected way. D array printing is for non-GUI stuff. Hence, you use the C locale, period. A mathematician seriously doesn't want his arrays to have commas instead of decimal points. He sure as heck doesn't want the numbers to all of a sudden turn into Klingon-like hieroglyphs just because he is showing his results at an overseas seminar, on the local computer!!!!! And what about the programmer who wants his array to go into another program? What do you think happens to parsing when the decimal point is suddenly a comma??

We've had Walter make nice features to D that were laborious to create, only to see nobody use them. It's happened, ask him. *Now* is not the time to do that again.
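A small sketch of the parsing problem: when array elements are separated by `", "` and the decimal separator is also a comma, the printed output no longer round-trips unambiguously. `formatArray` is a hypothetical helper that prints with `.` unless another separator is requested.

```d
import std.algorithm.iteration : map;
import std.array : join, replace;
import std.format : format;

/// Hypothetical helper: formats a double[] as "[a, b, ...]",
/// substituting `decimalSep` for the decimal point in each element.
string formatArray(const(double)[] values, string decimalSep = ".")
{
    return "[" ~ values.map!(v => format("%s", v).replace(".", decimalSep))
                       .join(", ") ~ "]";
}
```

`formatArray([1.5, 2.25])` yields `[1.5, 2.25]`, while `formatArray([1.5, 2.25], ",")` yields `[1,5, 2,25]`, where a reader can no longer tell element boundaries from decimal commas.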
Mar 01 2009
Rainer Deyke <rainerd eldwood.com> writes:
Georg Wrede wrote:
 Let's split this into two separate issues, the console and the GUI.
 
 The GUI is aware of your preferences.
 You don't use writefln with the GUI.
 You use the GUI API for any I/O, right?

There's a third faction: graphical apps that don't use the underlying GUI API. Most games fall in this category. When writing cross-platform apps (whether gui, non-gui-but-graphical, or console), you need some layer of abstraction over the underlying platform localization API. This abstraction can be provided by the programming language, or a third-party library.
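A sketch of such an abstraction layer, assuming a hypothetical `LocaleBackend` interface that application code depends on; only a portable C-locale fallback is shown, and per-platform implementations are left out.

```d
/// Hypothetical abstraction over platform localization APIs:
/// application code depends only on this interface.
interface LocaleBackend
{
    string localeName();
    string decimalSeparator();
}

/// Portable fallback that always reports C-locale conventions.
/// A real program would add per-platform implementations (not shown).
final class CLocaleBackend : LocaleBackend
{
    string localeName() { return "C"; }
    string decimalSeparator() { return "."; }
}

/// Example of application code written against the interface only.
string describe(LocaleBackend backend)
{
    return backend.localeName() ~ " uses '" ~ backend.decimalSeparator()
        ~ "' as decimal separator";
}
```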
 A proper internationalisation would mean that the Chinese could use the
 console, and all character mode apps in Chinese. Problem is, there
 simply aren't enough pixels on many consoles to render the Chinese
 character set.

I have Windows configured to use a Japanese text encoding for command windows. I can and do run Japanese console applications, but console applications that assume CP437 or Latin-1 don't work for me.

-- 
Rainer Deyke - rainerd eldwood.com
Mar 02 2009
Walter Bright <newshound1 digitalmars.com> writes:
Georg Wrede wrote:
 We've had Walter make nice features to D that were laborious to create, 
 only to see nobody use them. It's happened, ask him.

Sure. Often the only way to see if a feature is useful is to actually implement it and see what happens. Some features have succeeded and found uses far beyond my expectations (CTFE, string mixins) while others have pretty much languished (design by contract, complex numbers).
 *Now* is not the time to do that again.

To some extent, we can't predict that. But I did find your arguments pretty strong.
Mar 02 2009
Georg Wrede <georg.wrede iki.fi> writes:
Walter Bright wrote:
 Georg Wrede wrote:
 We've had Walter make nice features to D that were laborious to 
 create, only to see nobody use them. It's happened, ask him.

Sure. Often the only way to see if a feature is useful is to actually implement it and see what happens. Some features have succeeded and found uses far beyond my expectations (CTFE, string mixins) while others have pretty much languished (design by contract, complex numbers).
 *Now* is not the time to do that again.

To some extent, we can't predict that. But I did find your arguments pretty strong.

LOL :-)
Mar 02 2009
Gide Nwawudu <gide btinternet.com> writes:
On Mon, 02 Mar 2009 01:37:55 -0800, Walter Bright
<newshound1 digitalmars.com> wrote:

Georg Wrede wrote:
 We've had Walter make nice features to D that were laborious to create, 
 only to see nobody use them. It's happened, ask him.

Sure. Often the only way to see if a feature is useful is to actually implement it and see what happens. Some features have succeeded and found uses far beyond my expectations (CTFE, string mixins) while others have pretty much languished (design by contract, complex numbers).

I think DbC would be widely used if it worked with inheritance and could possibly be applied to interfaces. There is an entry in Bugzilla for this, and it has been voted up to sixth place.
http://d.puremagic.com/issues/show_bug.cgi?id=302

Gide
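For comparison, here is roughly how contract inheritance works in present-day D (the `do` keyword postdates this thread). An overriding function can only loosen the precondition: the call is accepted if either its own `in` block or the base one passes, while base `out` contracts still apply. The `Account` classes are hypothetical.

```d
class Account
{
    protected long balance;

    long current() const { return balance; }

    /// Base precondition: deposits must be positive.
    void deposit(long amount)
    in { assert(amount > 0, "amount must be positive"); }
    out { assert(balance >= 0); }
    do { balance += amount; }
}

/// The override loosens the precondition to also accept zero; the
/// base class's out contract is still checked after the call.
class LenientAccount : Account
{
    override void deposit(long amount)
    in { assert(amount >= 0, "amount must not be negative"); }
    do { balance += amount; }
}
```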
Mar 03 2009
Daniel Keep <daniel.keep.lists gmail.com> writes:
Walter Bright wrote:
 Georg Wrede wrote:
 We've had Walter make nice features to D that were laborious to
 create, only to see nobody use them. It's happened, ask him.

Sure. Often the only way to see if a feature is useful is to actually implement it and see what happens. Some features have succeeded and found uses far beyond my expectations (CTFE, string mixins) while others have pretty much languished (design by contract, complex numbers).

I've used complex numbers before, but only when rendering fractals. Sorry :P

As for design by contract, my problem has always been this: Contracts let you ensure that your assumptions about program state are never violated. That means checking pre- and post-conditions on functions, and invariants for classes. Which is great.

So I put contracts on everything. Fantastic. I do a release compile, and all that safety disappears. So only the debug build has contracts enabled. But it's the release build, if it crashes, that I need help diagnosing.

There are also libraries. If you put contracts on public APIs, then your library is only checking arguments in debug builds. This makes release builds faster, but also less safe. So do you only put contracts on internal APIs and do manual exception testing on public APIs?

I like DbC, I really do. I just have trouble figuring out where and how to use it properly.

-- 
Daniel
Mar 03 2009
bearophile <bearophileHUGS lycos.com> writes:
Daniel Keep:
 So I put contracts on everything.  Fantastic.  I do a release compile,
 and all that safety disappears.  So only the debug build has contracts
 enabled.  But it's the release build, if it crashes, that I need help
 diagnosing.

A simple solution is to not use -release for the final version of the code, but this keeps array bounds checks too. LDC may have already solved your problem, with extra compilation arguments that you can use to disable such checks independently of each other. It's not a fault of design by contract; it's just that the D compiler switches are lumped together. It seems a simple problem to solve.

Bye,
bearophile
Mar 03 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
A simple solution is to not use -release for the final version of the code, but this keeps array bound controls too. LDC may have already solved your problem, with extra compilation arguments that you can use to disable such controls independently from each other. It's not a fault of design by contract, it's just that the D compiler switches are lumped together. It seems a simple to solve problem. Bye, bearophile

I agree. I'm having the same problem: I put a contract in there, I know it's as good as assert. So I can't do e.g. input validation because in most functions input must always be validated. I also know that contracts are doing the wrong thing with inheritance and can't apply to interfaces, which is exactly the (only?) place they'd be interesting. So I send the contracts home and use assert, enforce, and unittest.

Andrei
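A sketch of the assert/enforce/unittest combination with a hypothetical `parsePort` function. `std.exception.enforce` throws an `Exception` and, unlike `assert`, is not removed by `-release`, which is why it suits input the function cannot trust.

```d
import std.conv : to;
import std.exception : enforce, assertThrown;

/// Validates untrusted input; these checks survive -release builds.
ushort parsePort(string text)
{
    enforce(text.length > 0, "empty port number");
    immutable value = text.to!uint; // throws ConvException on non-digits
    enforce(value >= 1 && value <= 65_535, "port out of range: " ~ text);
    return cast(ushort) value;
}

unittest
{
    assert(parsePort("8080") == 8080);
    assertThrown(parsePort(""));
    assertThrown(parsePort("70000"));
    assertThrown(parsePort("http"));
}
```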
Mar 03 2009
Sergey Gromov <snake.scaly gmail.com> writes:
Tue, 03 Mar 2009 07:05:51 -0800, Andrei Alexandrescu wrote:

I agree. I'm having the same problem: I put a contract in there, I know it's as good as assert. So I can't do e.g. input validation because in most functions input must always be validated. I also know that contracts are doing the wrong thing with inheritance and can't apply to interfaces, which is exactly the (only?) place they'd be interesting. So I send the contracts home and use assert, enforce, and unittest.

I'd really like to see enforce() as a built-in language feature. assert() doesn't help in way too many situations.
Mar 03 2009
Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 I agree. I'm having the same problem: I put a contract in there, I know 
 it's as good as assert. So I can't do e.g. input validation because in 
 most functions input must always be validated. I also know that 
 contracts are doing the wrong thing with inheritance and can't apply to 
 interfaces, which is exactly the (only?) place they'd be interesting. So 
 I send the contracts home and use assert, enforce, and unittest.

Contracts are not for input validation! They are checking if the logic of your program is correct or not. Think of it this way - your program should behave exactly the same with or without the contracts turned on. Contracts should NOT be used for scrubbing user input, checking for errors from other components, or validating any input from external to the dll.

If you feel the need to leave them on in a release build, then:

1) your testing is inadequate
2) you are using them incorrectly

For example, Windows API functions check all their input. This is not contract programming - it's validating user input over which Microsoft has no control.
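A sketch of the distinction, with hypothetical names: the public `medianOf` validates external data with `enforce`, which stays active in every build, while the private helper `median` states its assumptions as an `in` contract. For correct programs, compiling out the contract with `-release` does not change behaviour.

```d
import std.exception : enforce;

/// Internal helper: callers inside the program guarantee a sorted,
/// non-empty array, so the precondition documents program logic.
/// Returns the upper median for even lengths.
private int median(const(int)[] sorted)
in
{
    assert(sorted.length > 0);
    foreach (i; 1 .. sorted.length)
        assert(sorted[i - 1] <= sorted[i]);
}
do
{
    return sorted[sorted.length / 2];
}

/// Public entry point: data arrives from outside, so it is validated.
int medianOf(int[] data)
{
    import std.algorithm.sorting : sort;
    enforce(data.length > 0, "medianOf: no data");
    auto copy = data.dup;
    copy.sort();
    return median(copy);
}
```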
Mar 03 2009
Max Samukha <samukha voliacable.com.removethis> writes:
On Tue, 03 Mar 2009 11:00:36 -0800, Walter Bright
<newshound1 digitalmars.com> wrote:

Contracts are not for input validation! They are checking if the logic of your program is correct or not. Think of it this way - your program should behave exactly the same with or without the contracts turned on. Contracts should NOT be used for scrubbing user input, checking for errors from other components, or validating any input from external to the dll. If you feel the need to leave them on in a release build, then: 1) your testing is inadequate 2) you are using them incorrectly For example, Windows API functions check all their input. This is not contract programming - it's validating user input over which Microsoft has no control.

This is exactly how I look at them. However I've never tried to use pre/post conditions. I guess it's because of the syntax. By the way, about that image on the contracts page. Is the bullet flying away from the D-man because it's disgusted by his extreme ugliness? :)
Mar 03 2009
Sean Kelly <sean invisibleduck.org> writes:
== Quote from Walter Bright (newshound1 digitalmars.com)'s article
Contracts are not for input validation! They are checking if the logic of your program is correct or not. Think of it this way - your program should behave exactly the same with or without the contracts turned on. Contracts should NOT be used for scrubbing user input, checking for errors from other components, or validating any input from external to the dll.

Why should contracts be limited to parameter checking of internally used functions only? If I write a function and document parameter constraints then I certainly expect those constraints to be followed regardless of whether I'm calling the function or someone else is. Checking these via a contract simply provides an optional means of ensuring that a logic error didn't occur within the program as a whole.

If you're talking about application input, however, then I agree completely, i.e. stuff typed in by the user, read from a file, etc. should never be validated within a contract, because an input failure at that level doesn't represent a program logic error but rather user error. An assertion failure isn't a terribly good way of notifying the user that they shouldn't have put an alphabetic character in a box intended to receive an integer :-)

Sean
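A sketch of handling Sean's example without assertions: the text typed by the user is converted with `std.conv.to`, which throws `ConvException` for input such as `"12a"`, and the failure becomes a message for the user. `readQuantity` and its return convention (null on success) are hypothetical.

```d
import std.conv : to, ConvException;
import std.string : strip;

/// Returns null on success and stores the value in `result`;
/// otherwise returns a message suitable for showing to the user.
string readQuantity(string input, out int result)
{
    try
    {
        result = input.strip.to!int;
        return null;
    }
    catch (ConvException)
    {
        return "Please enter a whole number, not \"" ~ input ~ "\".";
    }
}
```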
Mar 03 2009
Georg Wrede <georg.wrede iki.fi> writes:
Sean Kelly wrote:
 == Quote from Walter Bright (newshound1 digitalmars.com)'s article
 Andrei Alexandrescu wrote:
 I agree. I'm having the same problem: I put a contract in there, I know
 it's as good as assert. So I can't do e.g. input validation because in
 most functions input must always be validated. I also know that
 contracts are doing the wrong thing with inheritance and can't apply to
 interfaces, which is exactly the (only?) place they'd be interesting. So
 I send the contracts home and use assert, enforce, and unittest.

of your program is correct or not. Think of it this way - your program should behave exactly the same with or without the contracts turned on. Contracts should NOT be used for scrubbing user input, checking for errors from other components, or validating any input from external to the dll.

Why should contracts be limited to parameter checking of internally used functions only? If I write a function and document parameter constraints then I certainly expect those constraints to be followed regardless of whether I'm calling the function or someone else is calling the function. Checking these via a contract simply provides an optional means of ensuring that a logic error didn't occur within the program as a whole.

The distinction is not whether you or others write stuff. It's about whether it is for debugging *only*, as opposed to general input validation. It's a bit like how it's not prudent to put an assert anywhere other than where only the source code itself (that is, a bug or goof by the programmer) can cause the assert to fire.
 If you're talking about application input however, then I agree completely.
 ie. stuff typed in by the user, read from a file, etc, should never be validated
 within a contract because an input failure at that level doesn't represent
 a program logic error but rather user error.  An assertion failure isn't
 a terribly good way of notifying the user that they shouldn't have put an
 alphabetic character in a box intended to receive an integer :-)
 
 
 Sean

Mar 04 2009
Sean Kelly <sean invisibleduck.org> writes:
Georg Wrede wrote:
 Sean Kelly wrote:
 == Quote from Walter Bright (newshound1 digitalmars.com)'s article
 Andrei Alexandrescu wrote:
 I agree. I'm having the same problem: I put a contract in there, I know
 it's as good as assert. So I can't do e.g. input validation because in
 most functions input must always be validated. I also know that
 contracts are doing the wrong thing with inheritance and can't apply to
 interfaces, which is exactly the (only?) place they'd be interesting. So
 I send the contracts home and use assert, enforce, and unittest.

of your program is correct or not. Think of it this way - your program should behave exactly the same with or without the contracts turned on. Contracts should NOT be used for scrubbing user input, checking for errors from other components, or validating any input from external to the dll.

Why should contracts be limited to parameter checking of internally used functions only? If I write a function and document parameter constraints then I certainly expect those constraints to be followed regardless of whether I'm calling the function or someone else is calling the function. Checking these via a contract simply provides an optional means of ensuring that a logic error didn't occur within the program as a whole.

The distinction is not whether you or others write stuff. It's about whether it is for debugging *only*, as opposed to general input validation.

So I guess the real question is whether a function is expected to validate its parameters. I'd argue that it isn't, but then I'm from a C/C++ background. For me, validation is a debugging tool, or at least an optional feature for applications that want the added insurance.

Sean
Mar 04 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 Georg Wrede wrote:
 The distinction is not whether you or others write stuff. It's about 
 whether it is for debugging *only*, as opposed to general input 
 validation.

So I guess the real question is whether a function is expected to validate its parameters. I'd argue that it isn't, but then I'm from a C/C++ background. For me, validation is a debugging tool, or at least an optional feature for applications that want the added insurance.

Interesting. My policy is to favor validation whenever it doesn't impact performance. Imagine for example that strlen() validated its input for non-null. Would that show on the profiling chart of any C application? No, unless the application's core loop only called strlen() on a 1-character string or so.

One simple case that clarifies the necessary tradeoff is binary search, which assumes the range to be searched is sorted. If you actually checked for that, it would render binary search useless, as a linear search would in fact be faster. So you need to assume. One way to do so is in the documentation: you write in the docs that findSorted expects a sorted range. Another way is to encode this information in the type of the sorted range. But that's onerous, as most of the time you have an array you just sorted, not a SortedArray value.

The approach I took with the new phobos is:

    int[] haystack;
    int[] needle;
    ...
    auto pos1 = find(haystack, needle); // linear
    sort(haystack);
    auto pos2 = find(assumeSorted(haystack), needle);

The assumeSorted function wraps the haystack in an AssumeSorted!(int[]) type without adding members or running extra code. It's there to clarify to everyone what's going on. And it's usable with other arguments or functions too, e.g.

    auto pos3 = find(haystack, assumeSorted(needle));
    setIntersection(assumeSorted(haystack), assumeSorted(needle));

Interestingly, assumeSorted can actually do checking without impacting the complexity of the search. In debug mode, it can arrange to run an isSorted test on a randomly chosen 1/N of the calls, where N is the average length of the incoming arrays; then its complexity impact is amortized constant.

Andrei
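To illustrate the type-encoding idea, here is a self-contained sketch that deliberately does not use Phobos: `AssumeSorted`, `assumeSorted` and `findElem` below are hypothetical stand-ins, not the library's API. The wrapper carries no extra data; it only steers overload resolution, so one function name picks a linear scan for a plain array and a binary search for an array the caller has vouched for. Both overloads follow `find`'s convention of returning the suffix that starts at the first match, or an empty slice.

```d
import std.algorithm.searching : find;

/// Hypothetical marker type: wraps an array the caller promises is sorted.
/// It adds no members and runs no code; it only changes which overload is chosen.
struct AssumeSorted(R)
{
    R range;
}

AssumeSorted!(T[]) assumeSorted(T)(T[] r)
{
    return AssumeSorted!(T[])(r);
}

/// Plain array: linear search, O(n).
T[] findElem(T)(T[] haystack, T needle)
{
    return find(haystack, needle);
}

/// Array marked as sorted: binary search (lower bound), O(log n).
T[] findElem(T)(AssumeSorted!(T[]) haystack, T needle)
{
    auto r = haystack.range;
    size_t lo = 0, hi = r.length;
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;
        if (r[mid] < needle)
            lo = mid + 1;
        else
            hi = mid;
    }
    // lo is the first index whose element is >= needle, i.e. the first match if any.
    if (lo < r.length && r[lo] == needle)
        return r[lo .. $];
    return r[$ .. $];
}
```

Because the lower bound lands on the first equal element, both overloads return the same suffix for a sorted array; only the cost differs.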
Mar 04 2009
Max Samukha <samukha voliacable.com.removethis> writes:
On Wed, 04 Mar 2009 08:47:50 -0800, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

The assumeSorted function wraps the haystack in an AssumeSorted!(int[]) type without adding members or running extra code. It's there to clarify to everyone what's going on.

If you introduce a dummy type, why not make it perform validation in a debug build when something like debug=slowButSafe is set?
Mar 04 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Max Samukha wrote:
 
 If you introduce a dummy type, why not make it perform validation in a
 debug build when something like debug=slowButSafe is set?

Because in the case of binarySearch slowButSafe quickly becomes slowAndUseless. It's happened to me: I had an assert(isSorted) in binary search (I guess it's in one of the older phobos releases!), and when I was using the debug version, my program would take forever to run. A debug build should at most change the constant multiplying the complexity, not the complexity itself.

Andrei
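A rough sketch of the sampling idea mentioned earlier in the thread, under the assumption that "1/N" means each call runs the full check with probability 1/n for an array of length n. The expected extra work per call is then n * (1/n) = O(1), so an O(log n) search stays O(log n) on average. `sampledSortCheck` is a hypothetical helper; it returns whether the full check ran so that the behaviour can be observed.

```d
import std.algorithm.sorting : isSorted;
import std.random : uniform;

/// Hypothetical debug aid: run the O(n) isSorted check with probability 1/n,
/// where n = r.length. Returns true if the full check ran on this call.
bool sampledSortCheck(T)(const(T)[] r)
{
    if (r.length < 2)
        return false;                    // nothing meaningful to check
    if (uniform(0, r.length) != 0)
        return false;                    // skipped this time (probability 1 - 1/n)
    assert(isSorted(r), "range passed as sorted is not sorted");
    return true;
}
```

An unsorted input is caught only on the calls that happen to sample it, so this trades certainty for a bounded slowdown, which is the constant-factor-only property argued for above.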
Mar 04 2009
Max Samukha <samukha voliacable.com.removethis> writes:
On Wed, 04 Mar 2009 10:27:55 -0800, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

Max Samukha wrote:
 
 If you introduce a dummy type, why not make it perform validation in a
 debug build when something like debug=slowButSafe is set?

Because in the case of binarySearch slowButSafe quickly becomes slowAndUseless. It's happened to me - I had an assert(isSorted) in binary search (I guess it's in one of the older phobos releases!) and when I was using the debug version, my program would take forever to run. A debug build should at most change the constant multiplying the complexity, not the complexity. Andrei

I intentionally proposed a special debug mode, not regular asserts, which are on in any debug build. Knowing that I can wait a couple of days to make sure my program is correct, I would like to be able to turn on validation.
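D already has a mechanism for an opt-in mode like this: code under `debug (identifier)` is compiled only when the program is built with `-debug=identifier`, so plain `-debug` builds and `-release` builds leave it out. A small sketch, with `sortedContains` as a hypothetical name and the binary search delegated to Phobos's `std.range.assumeSorted(...).contains`:

```d
import std.algorithm.sorting : isSorted;
import std.range : assumeSorted;

/// Hypothetical helper: O(log n) membership test on an array the caller
/// promises is sorted. The O(n) verification exists only in builds
/// compiled with -debug=slowButSafe.
bool sortedContains(T)(T[] r, T x)
{
    debug (slowButSafe)
        assert(isSorted(r), "sortedContains: input is not sorted");
    return assumeSorted(r).contains(x);   // binary search on a SortedRange
}
```

Someone willing to wait days for a run can build with `-debug=slowButSafe`; everyone else keeps the logarithmic cost.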
Mar 04 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Max Samukha wrote:
 On Wed, 04 Mar 2009 10:27:55 -0800, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Max Samukha wrote:
 If you introduce a dummy type, why not make it perform validation in a
 debug build when something like debug=slowButSafe is set?

Because in the case of binarySearch slowButSafe quickly becomes slowAndUseless. It's happened to me - I had an assert(isSorted) in binary search (I guess it's in one of the older phobos releases!) and when I was using the debug version, my program would take forever to run. A debug build should at most change the constant multiplying the complexity, not the complexity. Andrei

I intentionally proposed a special debug mode, not regular asserts, which are on in any debug build. I would like, knowing that I can wait a couple of days to make sure my program is correct, to be able to turn on validation.

I am waiting a couple of days in release mode with all optimizations turned on and wind from behind.

Andrei
Mar 04 2009
Max Samukha <samukha voliacable.com.removethis> writes:
On Wed, 04 Mar 2009 12:14:53 -0800, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

Max Samukha wrote:
 On Wed, 04 Mar 2009 10:27:55 -0800, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Max Samukha wrote:
 If you introduce a dummy type, why not make it perform validation in a
 debug build when something like debug=slowButSafe is set?

Because in the case of binarySearch slowButSafe quickly becomes slowAndUseless. It's happened to me - I had an assert(isSorted) in binary search (I guess it's in one of the older phobos releases!) and when I was using the debug version, my program would take forever to run. A debug build should at most change the constant multiplying the complexity, not the complexity. Andrei

I intentionally proposed a special debug mode, not regular asserts, which are on in any debug build. I would like, knowing that I can wait a couple of days to make sure my program is correct, to be able to turn on validation.

I am waiting a couple of days in release mode with all optimizations turned on and wind from behind. Andrei

Ok
Mar 04 2009
Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu wrote:
 Sean Kelly wrote:
 Georg Wrede wrote:
 The distinction is not whether you or others write stuff. It's about 
 whether it is for debugging *only*, as opposed to general input 
 validation.

So I guess the real question is whether a function is expected to validate its parameters. I'd argue that it isn't, but then I'm from a C/C++ background. For me, validation is a debugging tool, or at least an optional feature for applications that want the added insurance.

Interesting. My policy is to favor validation whenever it doesn't impact performance. Imagine for example that strlen() validated its input for non-null. Would that show on the profiling chart of any C application? No, unless the application's core loop only called strlen() on a 1-character string or so.

Interesting. So the inexpensive checks would go in the function body itself, with the exhaustive extra stuff in contracts. That does seem reasonable, though I still like the visual separation that the 'in' clause provides, and I'd love to be able to use the proposed inheritance feature of contracts, which seems like it might necessitate duplicating these inexpensive checks in the contract and in the function body itself.
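A sketch of that split, assuming a modern D compiler; `median` is a hypothetical function, not something from the thread. The O(1) emptiness check stays in the body, where it always runs, and the O(n) sortedness check sits in the `in` contract, where it documents the requirement and can be dropped from `-release` builds. For an even number of elements it returns the upper of the two middle values.

```d
import std.algorithm.sorting : isSorted;
import std.exception : enforce;

/// Hypothetical: middle element of a sorted array (upper median for even lengths).
T median(T)(const(T)[] r)
in
{
    // O(n): too costly for release builds, so it lives in the contract.
    assert(isSorted(r), "median: input must be sorted");
}
do
{
    // O(1): cheap enough to keep even with -release.
    enforce(r.length > 0, "median: empty input");
    return r[r.length / 2];
}
```

The catch Sean mentions remains: if contracts are inherited, the cheap check would still need to be written in the body, since the body is the only place guaranteed to run.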
 One simple case that clarifies the necessary tradeoff is binary search. 
 That assumes the range to be searched is sorted. If you actually checked 
 for that, it would render binary search useless as a linear search would 
 be in fact faster. So you need to assume. One way to do so is in the 
 documentation. You write in the docs that findSorted expects a sorted 
 range. Another way is to encode this information in the type of the 
 sorted range. But that's onerous as most of the time you have an array 
 you just sorted, not a SortedArray value.
 
 The approach I took with the new phobos is:
 
 int[] haystack;
 int[] needle;
 ...
 auto pos1 = find(haystack, needle); // linear
 sort(haystack);
 auto pos2 = find(assumeSorted(haystack), needle);
 
 The assumeSorted function wraps the haystack in an AssumeSorted!(int[]) 
 type without adding members or running extra code. It's there to clarify 
 to everyone what's going on. And it's usable with other arguments or 
 functions too, e.g.
 
 auto pos3 = find(haystack, assumeSorted(needle));
 setIntersection(assumeSorted(haystack), assumeSorted(needle));
 
 Interestingly, assumeSorted can actually do checking without impacting 
 the complexity of the search. In debug mode, it can arrange to run 
 random isSorted tests every 1/N calls, where N is the average length of 
 the incoming arrays, then its complexity impact is amortized constant.

One thing I've always really liked about pointer arguments is that they tend to document what's happening at the call site as well (because of the address-of operator typically needed to obtain the address of a variable). I tend to avoid boolean parameters for similar reasons, unless the meaning can be communicated clearly at the call point. It seems like this serves a similar purpose, and I like it despite the potential for a user accidentally calling the slow overload when he could actually use the fast one; better it be correct than fast, after all. I'm not terribly fond of the added verbosity, however, or that this seems like I couldn't use the property form:

    assumeSorted("abcd").find('c')

Truth be told, my initial inclination would be to repackage the binary search as a one-liner with a different name, which kind of sabotages the whole idea. But I'll try to resist this urge.
Mar 04 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 I'm not terribly fond of the added verbosity however, or that this seems 
 like I couldn't use the property form:
 
     assumeSorted("abcd").find('c')
 
 Truth be told, my initial inclination would be to repackage the binary 
 search as a one-liner with a different name, which kind of sabotages the 
 whole idea.  But I'll try to resist this urge.

I understand. This is rather new, but I found it irresistibly cool to unify find() routines under one name and specify structure in the arguments' types. Usage is very easy, there's little to remember, and every piece of structure is where it should be. Consider:

    int[] a = [ 1, 2, 3, 4 ];
    int[] b = [ 2. 3 ];

These algorithms each perform the search a different way, because each is informed differently about the structure of its arguments:

    find(a, b);
    find(assumeSorted(a), b);
    find(a, assumeSorted(b));
    find(assumeSorted(a), assumeSorted(b));
    find(a, boyerMooreFinder(b));

There are three names to remember, and they compose modularly. The run-of-the-mill approach is:

    find(a, b);
    binaryFind(a, b);
    findRhsSorted(a, b);
    binaryFindRhsSorted(a, b);
    boyerMooreFind(a, b);

To add insult to injury, boyerMooreFind is not enough, because it hides the structure created around b. So there's also a need for one extra type, e.g. BoyerMooreFinder!(int[]), for cases when there are multiple searches of the same thing. It's just onerous.

Andrei

P.S. By the way, this is the running example used in Chapter 4 of TDPL.
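Phobos today does provide `std.algorithm.searching.boyerMooreFinder`, and `find` accepts the resulting finder as its needle. The sketch below (the helper `containsEach` is hypothetical) builds the finder once and reuses it across several haystacks, which is the multiple-searches case described above; the skip tables are computed a single time rather than per search.

```d
import std.algorithm.searching : boyerMooreFinder, find;

/// Hypothetical helper: report which haystacks contain `needle`,
/// preprocessing the needle once and reusing the finder for every search.
bool[] containsEach(int[][] haystacks, int[] needle)
{
    auto finder = boyerMooreFinder(needle);   // Boyer-Moore tables built here, once
    bool[] result;
    foreach (h; haystacks)
        result ~= (find(h, finder).length != 0);   // find returns the suffix at the match
    return result;
}
```

Copying `finder` into each `find` call is cheap: the struct holds references to its tables, so nothing is recomputed.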
Mar 04 2009
bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 int[] a = [ 1, 2, 3, 4 ];
 int[] b = [ 2. 3 ];

I guess you meant:

    int[] b = [ 2, 3 ];
 find(a, b);
 find(assumeSorted(a), b);
 find(a, assumeSorted(b));
 find(assumeSorted(a), assumeSorted(b));
 find(a, boyerMooreFinder(b));

Are you talking about finding the position of a subarray within a bigger array? Then the two useful cases are:

    a.index(b);
    a.indexBoyerMoore(b);

The other cases aren't common enough, I think.

Bye,
bearophile
Mar 04 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 int[] a = [ 1, 2, 3, 4 ];
 int[] b = [ 2. 3 ];

I guess you meant: int[] b = [ 2, 3 ];
 find(a, b);
 find(assumeSorted(a), b);
 find(a, assumeSorted(b));
 find(assumeSorted(a), assumeSorted(b));
 find(a, boyerMooreFinder(b));

Are you talking about finding the position of a subarray into a bigger array? Then the two useful cases are: a.index(b); a.indexBoyerMoore(b); The other cases aren't common enough, I think. Bye, bearophile

Binary search is rather common.

As an aside, your use of "index" suggests you return integrals out of the function. IMHO that's strongly unrecommended.

Andrei
Mar 04 2009
bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 Binary search is rather common.

Oh, yes, sorry, I meant among the ones you have listed there...
 As an aside, your use of "index" suggests you return integrals out of 
 the function. IMHO that's strongly unrecommended.

I don't want to use too much of your time (which may be better spent with your new child), but I don't understand what you mean. That index() function is meant to return the index position of the item or sub-sequence in the bigger array (or iterable), and it returns -1 if not found. This is a usual design. Some people think that such checks for the -1 value aren't always done, so to avoid that and some bugs, it's better to raise something like IndexException when the needle isn't found.

Bye,
bearophile
Mar 04 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 Binary search is rather common.

Oh, yes, sorry, I meant among the ones you have listed there...

Of the five, three are frequent (linear, binary, Boyer-Moore), one is a form of set intersection (find sorted in sorted), and the odd one is:

    find(a, assumeSorted(b));

This is rare but has excellent best-case complexity (is it O(a.length / b.length)?) and is easy to add for completeness.
 As an aside, your use of "index" suggests you return integrals out
 of the function. IMHO that's strongly unrecommended.

I don't want to use too much of your time (which may be better spent with your new child), but I don't understand what you mean. That index() function is meant to return the index position of the item or sub-sequence in the bigger array (or iterable), and it returns -1 if not found. This is a usual design.

This is an extremely sloppy design. That it is usual doesn't make things any better!
 Some people think that such checks for the -1 value aren't always done,
 so to avoid that and some bugs, it's better to raise something like
 IndexException when the needle isn't found.

Yah, this is the subject of a long rant, but in short: returning int from find means that find is essentially unusable with anything except random-access structures. This in turn means you'd have to have different means, APIs, and user code to deal with e.g. lists, in spite of the fact that linear search is the same boring thing for all: look at the current thing, yes/no, move on to the next thing.

IMHO, ever since the STL has seen the light of day there is no excuse, not even sheer ignorance, to ever traffic in integers as a means to access elements in containers, in any language that has even the most modest parameterized types capability.

Returning int from find is an insult. To add injury to it, have find also return an int for a list.

"Is this item in this list?"
"Yeppers. It's the 538th element. Took me a hike to find it."
"Well I wanted to do something with it."
"Then go get it. I'm telling you, it'll take exactly 538 steps."

Andrei
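In today's Phobos, `std.algorithm.searching.find` follows the range-returning design: it returns the rest of the range starting at the match, and "not found" is an empty range rather than -1. A sketch, where `replaceFirst` is a hypothetical helper: the same call works on an array slice and on a `std.container.slist.SList` range, and the caller can use the element at once without a second walk. For arrays the index is still recoverable as `arr.length - find(arr, x).length` when the element is present.

```d
import std.algorithm.searching : find;
import std.range.primitives : empty, front;

/// Hypothetical helper: overwrite the first element equal to `from`.
/// Works for any range whose elements are assignable through `front`,
/// e.g. an array slice or an SList range. Returns false if `from` is absent.
bool replaceFirst(R, T)(R r, T from, T replacement)
{
    auto hit = find(r, from);   // the remaining range, positioned at the match
    if (hit.empty)
        return false;           // "not found" is an empty range, not -1
    hit.front = replacement;    // already standing on the element: no 538-step walk
    return true;
}
```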
Mar 04 2009
bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 Of five, three are frequent (linear, binary, Boyer-Moore), one is a form
 of set intersection (find sorted in sorted),

Yeah. Sorted-array-based sets are probably common enough in C++ code (but I generally use something equivalent to hash sets for this purpose; I have even mostly implemented such a class for D1 in the dlibs, with a nice API). I see sorted-array-based sets as an optimization to use in special cases, while hash sets are for the general case when you want to manage sets. (In D, hashing needs comparison too, because the collision chains are external and tree-based; in some other languages, hash sets need only hashability, not sortability, of items.)
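The two approaches can be compared side by side in a short sketch; `intersectSorted` and `intersectHashed` are hypothetical names. The first relies on Phobos's `std.algorithm.setops.setIntersection`, which walks two already-sorted inputs in one merge-like pass. The second emulates a hash set with a built-in associative array (`bool[int]`), so the inputs need not be sorted; its output follows the order of `b`.

```d
import std.algorithm.setops : setIntersection;
import std.array : array;

/// Sorted-array sets: one linear merge pass, O(a.length + b.length).
/// Both inputs must already be sorted.
int[] intersectSorted(int[] a, int[] b)
{
    return setIntersection(a, b).array;
}

/// Hash-set style: inputs may be in any order; expected linear time.
int[] intersectHashed(int[] a, int[] b)
{
    bool[int] seen;
    foreach (x; a)
        seen[x] = true;
    int[] result;
    foreach (x; b)
    {
        if (x in seen)
        {
            result ~= x;
            seen.remove(x);   // report each common value only once
        }
    }
    return result;
}
```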
 and the odd one is:
 find(a, assumeSorted(b));
 This is rare but has excellent best-case complexity (is it O(a.length /
 b.length)?) and is easy to add for completeness.

I suggest you not add this uncommon case, and add it only if later there are enough people asking for it. Less code to write and maintain. Creating functions and algorithms is a matter of balance between over-generalization (which often leads to useless complexity and longer syntax) and too much "special casing", which has other problems. Neither extreme is good. Life isn't easy; I guess we are asking you to be a cross between Alexander Stepanov and Guido van Rossum :o)
 Returning int from find is an insult. To add injury to it, have find 
 also return an int for a list.

So you return an iterator, and no exception is raised, I see. I like this enough.

Bye,
bearophile
Mar 04 2009
Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu wrote:
 
 Returning int from find is an insult. To add injury to it, have find 
 also return an int for a list.

tango.core.Array deals exclusively with indexes, but its aim is to make D's built-in arrays look more like a robust type than to provide a general set of algorithms usable with containers, etc. So in that instance, I think the decision is justifiable. It certainly makes for some nifty code when combined with the slice syntax.
Mar 04 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steve Schveighoffer wrote:
 BTW, what happens if you pass a sorted list into find?  Intuitively, 
 you'd assume you can pass it as assumeSorted?  But you can't really do
 anything but linear search?

It's up to the designer of the API. Passing a sorted forward range into find will cut the average search time in half, which does not improve complexity.
 Then what if you pass a tree structure into find?  It's sorted, but not 
 exactly random access...
 
 I think find should return an iterator/pointer fine, but I don't think 
 there's a find template that can be used on all possible container 
 types.  Probably the best solution I think is to implement find inside 
 the container itself, and have the global find function advance a range 
 to the point of the element (or return an empty range if not found), with 
 "quick" searches reserved for sorted random-access ranges.  Note that STL
 works this way.

Yah, that's right. The coolness of the technique comes mostly when the structure that can help searching is not obvious at the type system level (e.g. "is sorted") or is present in the needle, not the haystack (Boyer-Moore). I believe this approach is superior to STL's. I still think it's cool to have find operate on a variety of haystacks and needles. That way users don't need to fiddle with details of changing call syntax etc.

Andrei
Mar 04 2009
Christopher Wright <dhasenan gmail.com> writes:
Andrei Alexandrescu wrote:
 Steve Schveighoffer wrote:
 BTW, what happens if you pass a sorted list into find?  Intuitively, 
 you'd assume you can pass it as assumeSorted?  But you can't really do
 anything but linear search?

It's up to the designer of the API. Passing a sorted forward range into find will cut the average search time in half, which does not improve complexity.

How does this work? find in a sorted linked list has the same expected runtime as in an unsorted one -- on average, 0.5 * length. Assuming that comparisons and iterating are equally expensive, that is. If you assume that comparison is more expensive, you can do better, though cheaper comparisons don't benefit you at all.
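Where sortedness does help a singly linked list is the unsuccessful case: a linear scan can stop at the first element greater than the key instead of walking to the end. A minimal sketch, with `sortedLinearContains` as a hypothetical name, for any forward range sorted ascending (such as an `SList` range). The complexity is still O(n); only the constant for misses improves, which matches the "half the time, same complexity" point above.

```d
/// Hypothetical helper: membership test for a forward range sorted ascending.
/// A miss stops at the first element greater than `x` rather than at the end.
bool sortedLinearContains(R, T)(R r, T x)
{
    foreach (e; r)
    {
        if (e == x)
            return true;
        if (x < e)
            return false;   // passed x's would-be position: it cannot appear later
    }
    return false;
}
```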
Mar 05 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Christopher Wright wrote:
 Andrei Alexandrescu wrote:
 Steve Schveighoffer wrote:
 BTW, what happens if you pass a sorted list into find?  Intuitively, 
 you'd assume you can pass as assumeSorted?  But you can't really do 
 anything but linear search?

It's up to the designer of the API. Passing a sorted forward range into find will cut the average search time in half, which does not improve complexity.

How does this work? find in a sorted linked list has the same expected runtime as in an unsorted one -- on average, 0.5 * length. Assuming that comparisons and iterating are equally expensive, that is. If you assume that comparison is more expensive, you can do better, though cheaper comparisons don't benefit you at all.

We're saying the same thing.

Andrei
Mar 05 2009
bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 The approach I took with the new phobos is:
 
 int[] haystack;
 int[] needle;
 ...
 auto pos1 = find(haystack, needle); // linear
 sort(haystack);
 auto pos2 = find(assumeSorted(haystack), needle);

In my dlibs I do it in a simpler and shorter way:

    auto pos1 = haystack.index(needle);
    haystack.sort();
    auto pos2 = haystack.bisect(needle);

Here there's no need to give the same name to two very different functions. If you really like to use a single function name, with named arguments you may also do (bisect is false by default):

    auto pos1 = haystack.index(needle);
    haystack.sort();
    auto pos2 = haystack.index(needle, bisect=true);

Bye,
bearophile
Mar 04 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 The approach I took with the new phobos is:
 
 int[] haystack;
 int[] needle;
 ...
 auto pos1 = find(haystack, needle); // linear
 sort(haystack);
 auto pos2 = find(assumeSorted(haystack), needle);

In my dlibs I do it in a simpler and shorter way:

    auto pos1 = haystack.index(needle);
    haystack.sort();
    auto pos2 = haystack.bisect(needle);

Here there's no need to give the same name to two very different functions.

They do the exact same thing. Unifying them under the same name is good abstraction.
 If you really like to use a single function name, with named
 arguments you may also do (bisect is false by default):
 
 auto pos1 = haystack.index(needle); haystack.sort(); 
 auto pos2 = haystack.index(needle, bisect=true);

I prefer encoding structural information in types.

Andrei
Mar 04 2009
Derek Parnell <derek psych.ward> writes:
On Wed, 04 Mar 2009 08:12:48 -0800, Sean Kelly wrote:

 So I guess the real question is whether a function is expected to 
 validate its parameters.  I'd argue that it isn't, but then I'm from a 
 C/C++ background.  For me, validation is a debugging tool, or at least 
 an optional feature for applications that want the added insurance.

The rule of thumb that I use is that a function needs to validate a parameter if that parameter /can/ come from user input, /may not/ have been previously validated, and is /critical/ to the success of the function's behaviour. If all of these are true, the function has a potential to fail if it doesn't take responsibility for parameter validation. If a parameter can only come from other functions, which are already guaranteed to emit only validated data, the parameter data does not need re-validation. However, even for some of these functions a 'contract' validation of input parameters might be needed if you are attempting to validate the logic or data flow, rather than the contents of the data itself.

Contract validation of function results is not the same thing as input validation. Output validation is an attempt to prove that the function's logic is correct. Input validation is not a debugging tool. It is a chance to inform the program's user that they might have given the program some wrong information to work with.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
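A small illustration of output validation as a logic check, using D's `out` contract; `isqrt` is a hypothetical example, not from the thread. The contract states what a correct result must satisfy (the largest `r` with `r*r <= n`) and checks the function's own binary search, not anything the caller supplied. Products are widened to `ulong` so the check itself cannot overflow.

```d
/// Hypothetical: integer square root, the largest r with r*r <= n.
uint isqrt(uint n)
out (r)
{
    // Checks the implementation's logic, not the caller's input.
    assert(cast(ulong) r * r <= n && (cast(ulong) r + 1) * (r + 1) > n,
           "isqrt produced a wrong result");
}
do
{
    // Binary search over [0, 65535]; floor(sqrt(uint.max)) is 65535.
    uint lo = 0, hi = 65_535;
    while (lo < hi)
    {
        uint mid = lo + (hi - lo + 1) / 2;   // round up so the loop always shrinks
        if (cast(ulong) mid * mid <= n)
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}
```

Every `uint` is a valid input here, so there is nothing to validate on the way in; the only thing that can go wrong is the algorithm, which is exactly what the `out` block guards.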
Mar 04 2009
Walter Bright <newshound1 digitalmars.com> writes:
Sean Kelly wrote:
 Why should contracts be limited to parameter checking of internally used
 functions only?  If I write a function and document parameter constraints
 then I certainly expect those constraints to be followed regardless of
 whether I'm calling the function or someone else is calling the function.
 Checking these via a contract simply provides an optional means of
 ensuring that a logic error didn't occur within the program as a whole.
 
 If you're talking about application input however, then I agree completely.
 ie. stuff typed in by the user, read from a file, etc, should never be validated
 within a contract because an input failure at that level doesn't represent
 a program logic error but rather user error.  An assertion failure isn't
 a terribly good way of notifying the user that they shouldn't have put an
 alphabetic character in a box intended to receive an integer :-)

Your "users" are anyone external to your built binary. That means that DLLs should not use contracts to validate arguments passed to the DLL's entry points. If you're doing a library to be statically linked, it is debatable, and a decision you (as the library developer) need to make.
Mar 04 2009
Derek Parnell <derek psych.ward> writes:
On Tue, 03 Mar 2009 11:00:36 -0800, Walter Bright wrote:

 Contracts are not for input validation!

Hear! Hear! This is exactly correct.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Mar 03 2009
Christopher Wright <dhasenan gmail.com> writes:
Walter Bright wrote:
 Georg Wrede wrote:
 We've had Walter make nice features to D that were laborious to 
 create, only to see nobody use them. It's happened, ask him.

Sure. Often the only way to see if a feature is useful is to actually implement it and see what happens. Some features have succeeded and found uses far beyond my expectations (CTFE, string mixins) while others have pretty much languished (design by contract, complex numbers).

I fucking love contracts. I need to use them more, but I do use them.
Mar 03 2009
Christopher Wright <dhasenan gmail.com> writes:
Georg Wrede wrote:
 (You know, a few years ago we had a major conversation here about 
 whether non-ASCII variable names should be accepted in D. The end result 
 is, yes. (I just tried it.) Now, how can an international team cowork on 
 a project where variable names are written so the other folks can't even 
 type them with their keyboards???

On the other hand, if you have a Chinese development team, why should they be limited to ASCII variable names? It doesn't make sense for them.
 -- All very nice, but no cigar. That's 
 about as smart as letting people define *unlimited* length variable names!)

I recently dealt with a programming language that specified a limit of 63 characters for identifier names. This wouldn't have been a significant problem, except that I was generating code automatically, and some of my identifiers were over 90 characters. Identifier length limits are evil, unless they're ridiculously large (C#, I think, limits identifiers to 4096 characters).
Mar 02 2009
Walter Bright <newshound1 digitalmars.com> writes:
Christopher Wright wrote:
 -- All very nice, but no cigar. That's about as smart as letting 
 people define *unlimited* length variable names!)

I recently dealt with a programming language that specified a limit of 63 characters for identifier names. This wouldn't have been a significant problem, except that I was generating code automatically, and some of my identifiers were over 90 characters. Identifier length limits are evil, unless they're ridiculously large (C#, I think, limits identifiers to 4096 characters).

As soon as you put in a limit on identifier name length, sooner or later you'll get a bug report on it. For example, C++ can be compiled to C code. C++ templates encode their entire state into the template instance identifier, and these can easily reach 10,000 characters or more. So if your C compiler has a length limit on identifiers, then C++ templates become severely limited. Another thing to consider is it's actually *more* work to put a limit on, where you have to document it, explain it, detect it, diagnose it, recover from it, than if you just make it unlimited. There are really only 3 numbers in computer programming: 0, 1, and unlimited. I always chuckle when I see an ad for like, an editor, that says "up to 5 files open at once!".
Mar 02 2009
Georg Wrede <georg.wrede iki.fi> writes:
Walter Bright wrote:
 Christopher Wright wrote:
 Georg Wrede wrote:
 -- All very nice, but no cigar. That's about as smart as letting 
 people define *unlimited* length variable names!)

I recently dealt with a programming language that specified a limit of 63 characters for identifier names. This wouldn't have been a significant problem, except that I was generating code automatically, and some of my identifiers were over 90 characters. Identifier length limits are evil, unless they're ridiculously large (C#, I think, limits identifiers to 4096 characters).

As soon as you put in a limit on identifier name length, sooner or later you'll get a bug report on it. For example, C++ can be compiled to C code. C++ templates encode their entire state into the template instance identifier, and these can easily reach 10,000 characters or more. So if your C compiler has a length limit on identifiers, then C++ templates become severely limited. Another thing to consider is it's actually *more* work to put a limit on, where you have to document it, explain it, detect it, diagnose it, recover from it, than if you just make it unlimited. There are really only 3 numbers in computer programming: 0, 1, and unlimited. I always chuckle when I see an ad for like, an editor, that says "up to 5 files open at once!".

I take it back.
Mar 02 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Georg Wrede wrote:
 *** How to print arrays ***
 
 You print arrays in a predictable and expected way.
 
 D array printing is for non-GUI stuff. Hence, you use the C locale, period.

I think the C locale (or any predefined locale) tells what left bracket I should use for arrays, what separator, and what right bracket. For now the left and right brackets were eliminated because the user can easily add them on the caller side. The separator is a space simply because it looks the least harmful. But, for example, I don't have a good solution for what to print as the separator between a hash key and a hash value. Simple, extensible locale support would have allowed me to stop worrying about that.

Also, D array printing is not only for the console - a GUI may use to!string with arrays.

But overall I guess I'll let myself be bludgeoned into complacency...

Andrei
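Here is a rough sketch of what "let the locale decide the separators" could look like. It is written in Python, not D. The table keys `arraySeparator` and `keyValueSeparator` are hypothetical, and so is the `:` default for key/value pairs. With no locale installed, the functions skip the table lookup entirely, so non-localized callers pay nothing extra.

```python
from typing import Any, Iterable, Mapping, Optional

# Built-in choices used when no locale is installed (the "C locale" case).
DEFAULT_SEPARATOR = " "
DEFAULT_KEY_VALUE_SEPARATOR = ":"  # arbitrary placeholder choice


def _lookup(locale: Optional[Mapping[str, Any]], key: str, default: str) -> str:
    # Fast path: with no locale installed there is no table lookup at all.
    if locale is None:
        return default
    return locale.get(key, default)


def array_to_string(items: Iterable[Any],
                    locale: Optional[Mapping[str, Any]] = None) -> str:
    sep = _lookup(locale, "arraySeparator", DEFAULT_SEPARATOR)
    return sep.join(str(x) for x in items)


def map_to_string(mapping: Mapping[Any, Any],
                  locale: Optional[Mapping[str, Any]] = None) -> str:
    sep = _lookup(locale, "arraySeparator", DEFAULT_SEPARATOR)
    kv = _lookup(locale, "keyValueSeparator", DEFAULT_KEY_VALUE_SEPARATOR)
    return sep.join(f"{k}{kv}{v}" for k, v in mapping.items())
```

A caller who wants commas passes `{"arraySeparator": ", "}`. Everyone else gets the built-in space.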
Mar 02 2009
grauzone <none example.net> writes:
What is language specific about how an array is formatted? I think 
you're abusing the locale stuff as some kind of user customization 
mechanism for format().
Mar 02 2009
Daniel Keep <daniel.keep.lists gmail.com> writes:
Andrei Alexandrescu wrote:
 Georg Wrede wrote:
 *** How to print arrays ***

 You print arrays in a predictable and expected way.

 D array printing is for non-GUI stuff. Hence, you use the C locale,
 period.

I think the C locale (or any predefined locale) tells what left bracket I should use for arrays, what separator, and what right bracket. For now the left and right brackets were eliminated because the user can easily add them on the caller side. The separator is a space simply because it looks the least harmful. But, for example, I don't have a good solution for what to print as the separator between a hash key and a hash value. Simple, extensible locale support would have allowed me to stop worrying about that. Also, D array printing is not only for the console - a GUI may use to!string with arrays. But overall I guess I'll let myself be bludgeoned into complacency... Andrei

As far as I'm concerned, an array should be printed as close to how it would be represented in the language as possible. If the user needs to format the array, then they need to format the array, not the runtime. -- Daniel
Mar 02 2009
Sergey Gromov <snake.scaly gmail.com> writes:
Mon, 02 Mar 2009 09:34:32 +0200, Georg Wrede wrote:

 Of course, eventually we will want to "do something" about this. But 
 that should be left to the day when real issues are all sorted out in 
 D. This is a non-urgent, low-priority thing.


Had there been any need for locales, believe me, the "foreigners" in this NG would have asked for it.

I'm Russian. For me, encoding problems are a PITA of such epic proportions that little format inconsistencies simply fade away. Yes, it's sometimes hard to decipher what 02/03/08 means, since our custom is to put the day first and separate with dots. But compare this to the Adobe Flex SDK, which prints half the compiler error messages in Russian (thank you Adobe!) using the system default code page, 1251, while the default /console/ code page is actually the so-called IBM 866. Whenever I use the MXML compiler from the console I get rubbish for error messages. And there is no way to disable the translation; I've found none.

Phobos is no better. Any exception resulting from an invalid OS call dumps UTF-8 garbage instead of an error message. std.file.read("non-existent"), for instance.

I think games are not an issue. I've worked for a company producing cell phone games for a long time. I've localized my game for the Chinese market, too. The thing is, game interfaces are always custom, always ad hoc. They *never* work in untested locales. Well, with some experience you can make them work most of the time in languages you are familiar with, from a localization perspective. Anyway, all you need to know is the ID of a supported locale so that you can replace text and locale-specific images accordingly. Then you have correctors and native testing to make sure the localization works.
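The code-page mismatch described above is easy to reproduce. This small Python sketch uses Python's standard codec names `cp1251` and `cp866`; the sample message is made up. The program encodes text with the Windows ANSI code page, but the console decodes the bytes as the OEM code page. The same thing happens when UTF-8 bytes reach a cp866 console.

```python
# A short Russian message ("Error: file not found").
message = "Ошибка: файл не найден"

# The tool encodes it with the Windows ANSI code page (cp1251)...
raw = message.encode("cp1251")

# ...but the console interprets the bytes as the DOS/OEM code page (cp866).
shown = raw.decode("cp866")

# UTF-8 bytes fed to the same console are garbled in the same way.
shown_utf8 = message.encode("utf-8").decode("cp866")

print(shown)       # mojibake
print(shown_utf8)  # different mojibake

# Decoding with the code page that was actually used restores the text.
assert raw.decode("cp1251") == message
```

Every byte decodes without error, so nothing signals a problem. The user just sees rubbish, which matches the symptom described.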
Mar 02 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sergey Gromov wrote:
 Mon, 02 Mar 2009 09:34:32 +0200, Georg Wrede wrote:
 
 Of course, eventually we will want to "do something" about this. But 
 that should be left to the day when real issues are all sorted out in 
 D. This is a non-urgent, low-priority thing.


Had there been any need for locales, believe me, the "foreigners" in this NG would have asked for it.

Phobos is no better. Any exception resulting from an invalid OS call dumps UTF-8 garbage instead of an error message. std.file.read("non-existent") for instance.

This is serendipitous. I just posted an example involving throwing a localized "File not found" exception. Please let me know whether that would help. Andrei
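The example Andrei refers to is not part of this excerpt. As an independent sketch, in Python, a localized "File not found" message could come from a small catalog with an English fallback. `MESSAGES`, `localized` and `open_config` are hypothetical names. In the proposed design the catalog would live somewhere in the locale table.

```python
# Hypothetical message catalog keyed by language tag.
MESSAGES = {
    "en": {"fileNotFound": "File not found: {path}"},
    "ru": {"fileNotFound": "Файл не найден: {path}"},
}


def localized(key: str, lang: str = "en", **fields: str) -> str:
    # Fall back to English when the language or the key is missing.
    table = MESSAGES.get(lang, MESSAGES["en"])
    template = table.get(key, MESSAGES["en"][key])
    return template.format(**fields)


def open_config(path: str, lang: str = "en"):
    try:
        return open(path, encoding="utf-8")
    except FileNotFoundError:
        # Re-raise the same exception type with a localized message.
        raise FileNotFoundError(localized("fileNotFound", lang, path=path)) from None
```

The exception type stays the same, so callers that catch `FileNotFoundError` are unaffected. Only the text changes with the language.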
Mar 02 2009
Yigal Chripun <yigal100 gmail.com> writes:
Sergey Gromov wrote:
 Mon, 02 Mar 2009 09:34:32 +0200, Georg Wrede wrote:

 Of course, eventually we will want to "do something" about this. But
 that should be left to the day when real issues are all sorted out in
 D. This is a non-urgent, low-priority thing.


Had there been any need for locales, believe me, the "foreigners" in this NG would have asked for it.

I'm Russian. For me, encoding problems are a PITA of such epic proportions that little format inconsistencies simply fade away. Yes, it's sometimes hard to decipher what 02/03/08 means, since our custom is to put the day first and separate with dots. But compare this to the Adobe Flex SDK, which prints half the compiler error messages in Russian (thank you Adobe!) using the system default code page, 1251, while the default /console/ code page is actually the so-called IBM 866. Whenever I use the MXML compiler from the console I get rubbish for error messages. And there is no way to disable the translation; I've found none.

Phobos is no better. Any exception resulting from an invalid OS call dumps UTF-8 garbage instead of an error message. std.file.read("non-existent"), for instance.

I think games are not an issue. I've worked for a company producing cell phone games for a long time. I've localized my game for the Chinese market, too. The thing is, game interfaces are always custom, always ad hoc. They *never* work in untested locales. Well, with some experience you can make them work most of the time in languages you are familiar with, from a localization perspective. Anyway, all you need to know is the ID of a supported locale so that you can replace text and locale-specific images accordingly. Then you have correctors and native testing to make sure the localization works.

Encoding isn't that hard compared to other issues. For instance, have you ever tried to make a website work in both directions, left-to-right and right-to-left?
Mar 02 2009
Don <nospam nospam.com> writes:
Georg Wrede wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
 There will be a global reference to a Locale class, e.g. 
 defaultLocale. By default the reference will be null, implying the C 
 locale should be in effect. Applications can assign to it as they 
 find fit, and also pass around multiple locale variables.

I disagree with being able to assign to the global defaultLocale. This is going to cause endless problems. Just one is that any function that uses locale can no longer be pure. defaultLocale should be immutable.
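A Python approximation of Walter's suggestion follows. Python has no `pure`, so this only mirrors the idea. The default locale is built once and exposed read-only through `types.MappingProxyType`. Any other locale is passed in explicitly. `DEFAULT_LOCALE`, `format_decimal` and the two key names are hypothetical.

```python
from types import MappingProxyType

# Built once, exposed read-only; no other reference to the dict is kept,
# so nothing can change formatting behind another function's back.
DEFAULT_LOCALE = MappingProxyType({"decimalSeparator": ".", "groupSeparator": ","})


def format_decimal(value: float, locale=DEFAULT_LOCALE, digits: int = 2) -> str:
    # The result depends only on the arguments, never on mutable globals,
    # which is the property that would keep a D version eligible for `pure`.
    text = f"{value:,.{digits}f}"  # ',' groups thousands, '.' marks decimals
    return (text.replace(",", "\0")
                .replace(".", locale["decimalSeparator"])
                .replace("\0", locale["groupSeparator"]))
```

A program that wants Finnish-style output passes its own table, e.g. `format_decimal(1234.5, {"decimalSeparator": ",", "groupSeparator": " "})`. Assigning to `DEFAULT_LOCALE` raises `TypeError`.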

The two programs that are most "locale aware" are usually spreadsheets and word processors.

And Microsoft products do "locale awareness" so badly, I'm pretty sure there's no simple solution. (<gripe> They could at least recognize that outside the US, everyone uses A4-size paper, not that bizarro letter/legal stuff </gripe>).
 It is usual that the user needs to write, say, in Swedish or in Russian, 
 while in a Finnish setting. Or that one wants to use a decimal separator 
 other than what is "proper" for the country.
 
 For example, a lot of people use "." instead of the official "," in 
 Finland, and many use time as "18:23" instead of "18.23".

This is my experience as well. There's an awful lot of expats in the world.
 For this purpose, these programs let the users define these any way they 
 want.
 
 I think the notion of locales is, slowly but steadily, going away.
 
 It was a nice idea at the time, but with two problems: users don't use 
 it, and programmers don't use it.

I think the whole idea is based on a fallacy: that there IS a locale. The idea that you can choose which currency symbol to use, based on where the computer is, is utterly absurd. Surely these days, nearly everyone has to deal with the Euro, the US dollar, the Pound, and the Yen, as well as their local currency. The world is international now, not local. I nearly always end up setting the locale to "Antarctica", it turns off most of the locale logic <g>. There are so many programs that try to be too clever.
 Of course, eventually we will want to "do something" about this. But 
 that should be left to the day when real issues are all sorted out in D. 
 This is a non-urgent, low-priority thing.

Mar 02 2009
Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Don wrote:
 there's no simple solution. (<gripe> They could at least recognize that 
 outside the US, everyone uses A4-size paper, not that bizarro 
 letter/legal stuff </gripe>).

Amen!
 It is usual that the user needs to write, say, in Swedish or in 
 Russian, while in a Finnish setting. Or that one wants to use a 
 decimal separator other than what is "proper" for the country.

 For example, a lot of people use "." instead of the official "," in 
 Finland, and many use time as "18:23" instead of "18.23".

This is my experience as well. There's an awful lot of expats in the world.

Not just expats. For example: I was born & raised in the Netherlands, but even though officially we use a decimal comma here, I almost always use a decimal point instead. This may have been caused by use of the US keyboard layout (and its numeric keypad in particular), but I now even catch myself using it when writing with a pen...
 I nearly always end up setting the locale to "Antarctica", it turns off 
 most of the locale logic <g>. There are so many programs that try to be too 
 clever.

lol :)
Mar 02 2009
Steve Schveighoffer <schveiguy yahoo.com> writes:
On Wed, 04 Mar 2009 13:55:10 -0800, Andrei Alexandrescu wrote:

 bearophile wrote:
 Andrei Alexandrescu:
 Binary search is rather common.

Oh, yes, sorry, I meant among the ones you have listed there...

Of five, three are frequent (linear, binary, Boyer-Moore), one is a form of set intersection (find sorted in sorted), and the odd one is: find(a, assumeSorted(b)); This is rare but has excellent best-case complexity (is it O(a.length / b.length)?) and is easy to add for completeness.
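A Python sketch of the `assumeSorted` idea, covering only two of the five variants (linear and binary search). `AssumeSorted` and `find` are hypothetical stand-ins. The wrapper carries the caller's promise that the data is sorted, and `find` uses that promise to switch to binary search. In the spirit of Phobos' range-returning find, the result is the remaining suffix rather than an index.

```python
from bisect import bisect_left
from typing import Sequence


class AssumeSorted:
    """Hypothetical wrapper: the caller promises `data` is sorted ascending."""

    def __init__(self, data: Sequence):
        self.data = data


def find(haystack, needle):
    """Return the suffix of `haystack` starting at `needle`, or an empty slice."""
    if isinstance(haystack, AssumeSorted):
        data = haystack.data
        i = bisect_left(data, needle)  # O(log n) thanks to the sortedness promise
        if i < len(data) and data[i] == needle:
            return data[i:]
        return data[len(data):]
    for i, x in enumerate(haystack):   # plain O(n) linear search
        if x == needle:
            return haystack[i:]
    return haystack[len(haystack):]
```

The call site barely changes. The wrapper alone tells `find` it may use the faster algorithm.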
 As an aside, your use of "index" suggests you return integrals out of
 the function. IMHO that's strongly unrecommended.

I don't want to use too much of your time (it may be better spent with your new child), but I don't understand what you mean. That index() function is meant to return the index position of the item or sub-sequence in the bigger array (or iterable), and it returns -1 if not found. This is a usual design.

This is an extremely sloppy design. That it is usual doesn't make things any better!
 Some people think that such controls for -1 value aren't always done,
 so to avoid that and some bugs, it's better to raise something like
 IndexException when the needle isn't found.

Yah, this is the subject of a long rant, but in short: returning int from find means, essentially, that find is unusable with anything except random-access structures. This in turn means you'd have to have different means, APIs, and user code to deal with e.g. lists, in spite of the fact that linear search is the same boring thing for all: look at the current thing, yes/no, move on to the next thing. IMHO ever since the STL has seen the light of day there is no excuse, not even sheer ignorance, to ever traffic in integers as a means to access elements in containers, in any language that has even the most modest parameterized types capability. Returning int from find is an insult. To add injury to it, have find also return an int for a list.

"Is this item in this list?"
"Yeppers. It's the 538th element. Took me a hike to find it."
"Well I wanted to do something with it."
"Then go get it. I'm telling you, it'll take exactly 538 steps."
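The list dialogue can be made concrete with a small Python sketch. `Node`, `from_values`, `find_node` and `values` are hypothetical names. The search returns the node itself, a position the caller can use directly, so nothing has to be walked a second time. An index of 538 would force the caller to walk the list again.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional


@dataclass
class Node:
    value: Any
    next: "Optional[Node]" = None


def from_values(*items: Any) -> Optional[Node]:
    head = None
    for v in reversed(items):
        head = Node(v, head)
    return head


def find_node(node: Optional[Node], needle: Any) -> Optional[Node]:
    """Linear search returning the node holding `needle`, or None."""
    while node is not None and node.value != needle:
        node = node.next
    return node


def values(node: Optional[Node]) -> Iterator[Any]:
    while node is not None:
        yield node.value
        node = node.next
```

With `hit = find_node(lst, "c")`, the caller can write `hit.value = "C"` right away. There is no second 538-step walk.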

BTW, what happens if you pass a sorted list into find? Intuitively, you'd assume you can pass it as assumeSorted? But you can't really do anything but linear search. Then what if you pass a tree structure into find? It's sorted, but not exactly random access...

I think having find return an iterator/pointer is fine, but I don't think there's a find template that can be used on all possible container types. Probably the best solution is to implement find inside the container itself, and have the global find function advance a range to the point of the element (or return an empty range if not found), with "quick" searches reserved for sorted random-access ranges. Note that the STL works this way.

-Steve
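Steve's suggestion can be sketched in Python. The global `find` first asks the container for its own search, which the hypothetical `SortedList` provides via binary search, and otherwise falls back to a linear walk. The method name `locate` is an assumed protocol, not an existing API.

```python
from bisect import bisect_left


class SortedList:
    """Toy sorted container that knows a faster way to search itself."""

    def __init__(self, items):
        self._items = sorted(items)

    def __iter__(self):
        return iter(self._items)

    def locate(self, needle):
        # Container-specific search: binary search, returns the remainder.
        i = bisect_left(self._items, needle)
        if i < len(self._items) and self._items[i] == needle:
            return self._items[i:]
        return []


def find(container, needle):
    """Prefer the container's own search; otherwise walk it linearly."""
    own = getattr(container, "locate", None)  # hypothetical protocol name
    if callable(own):
        return own(needle)
    it = iter(container)
    for x in it:
        if x == needle:
            return [x, *it]  # the match plus everything after it
    return []
```

Callers always write `find(c, x)`. Whether that is a binary search or a linear walk is decided by the container, which is the only party that knows its own structure.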
Mar 04 2009
Georg Wrede <georg.wrede iki.fi> writes:
Andrei Alexandrescu wrote:
 Sooner or later that will need to be defined. I know next to nothing 
 about locales. (I know I dislike the design C++ uses.)

D uses UTF-8, and that is *good enough*! This lets my programs "understand" Finnish, and doesn't give me undue headaches.

Seriously tending to locale issues would be an *endless swamp*. Just for this, I looked up something suitable to read: http://www.manpagez.com/man/1/perllocale/

It may even be that you would find the time, but think about Walter and us, please. There *really are* other things to do.

An excellent string hierarchy without the entire rest of i18n is only going to look like a Ferrari with a Trabant engine. Which is worse than nothing at all.

Besides, there's more to this than just designing the perfect, or even a good, locale system in a language. *Somebody should actually use it*.

Now, the non-English programmer, what does he really want? He wants to be able to type stuff into his program in his native character set. D already does that, by way of UTF-8.

What else? Well, it is conceivable that he wants his program to print dates and times the way it's done over there. He simply writes the program "by hand" so it does dates and times like he wants. Even if there were a locale thing in the language, he wouldn't bother with the hassle. And he couldn't care less about Urdu.

The hypothetical Ambitious Programmer might want to use locale. He could then have the dates and times (and currencies, etc.) follow the country. Now, that might sound commendable, but in practice it *crumbles*.

He can't possibly know how to deal with languages that are written backwards, languages where several characters make one letter, exotic ways of writing dates, etc.

So, his fancy i18n project is doomed to be, at most, as usable as the "normal" D program. Probably less, since his decisions will actually worsen the user experience -- for users in another culture.

And any project big enough to tackle this will implement its own locale handling anyway. I'm sorry to say.

----

Yes, locales are nice and all.
For D 3.5 that is.
Honestly.
Mar 01 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Georg Wrede wrote:
 Andrei Alexandrescu wrote:
 Sooner or later that will need to be defined. I know next to nothing 
 about locales. (I know I dislike the design C++ uses.)

D uses Utf-8, and that is *good enough*! This lets my programs "understand" Finnish, and doesn't give me undue headaches. Seriously tending to locale issues would be an *endless swamp*. Just for this, I looked up something suitable to read: http://www.manpagez.com/man/1/perllocale/ It may even be that you would find the time, but think about Walter and us, please. There *really are* other things to do.

I don't find that scary at all. It's quite what I expected. We should phase it in, after we do a good design. Also, I don't plan to sit down and write locale definition files; I want to parse the XML in that locale repository I referred to.
 An excellent string hierarchy without the entire rest of i18n, is only 
 going to look like a Ferrari with a Trabant engine. Which is worse than 
 nothing at all.

I don't understand this. What is the rest of i18n?
 Besides, there's more to this than just designing the perfect, or even a 
 good locale system in a language. *Somebody should actually use it*.
 
 Now, the non-English programmer, what does he really want? He wants to 
 be able to type stuff into his program in his native character set. D 
 already does that, by way of Utf-8.
 
 What else? Well, it is conceivable that he wants his program to print 
 dates and times the way it's done over there. He simply writes the 
 program "by hand" so it does dates and times like he wants. Even if 
 there was a locale thing in the language, he wouldn't bother with the 
 hassle. And he couldn't care less about Urdu.

If we come up with a good design, then they will be compelled to use it. Applications meant to be used across multiple countries have fumbled with locale support because there's no good support in most languages. So then why not offer a compelling support in D?
 The hypothetical Ambitious Programmer might want to use locale. He could 
 then have the dates and times (and currencies, etc.) follow the country. 
 Now, that might sound commendable, but in practice it *crumbles*.
 He can't possibly know how to deal with languages that are written 
 backwards, languages where several characters make one letter, exotic 
 ways of writing dates, etc.

Well my understanding is that the guys who wrote those RFCs and whatnot spent time figuring out the right abstractions. Why not use them?
 So, his fancy i18n project is doomed to be, at most, as usable as the 
 "normal" D program. Probably less, since his decisions will actually 
 worsen the user experience -- for users in another culture.
 
 
 And, any project big enough to tackle this, will implement its own 
 locale handling anyway. I'm sorry to say.

They will implement their own because the language doesn't offer an extensible framework that they can build on.
 Yes, locales are nice and all.
 For D 3.5 that is.
 Honestly.

I just don't see where the big problem is. I'm talking about a blessed hierarchical hashtable to begin with. My initial desire is to be able to customize the array separators in writeln. Andrei
Mar 01 2009
Georg Wrede <georg.wrede iki.fi> writes:
Andrei Alexandrescu wrote:
 Georg Wrede wrote:
 Andrei Alexandrescu wrote:
 Sooner or later that will need to be defined. I know next to nothing 
 about locales. (I know I dislike the design C++ uses.)

D uses Utf-8, and that is *good enough*! This lets my programs "understand" Finnish, and doesn't give me undue headaches. Seriously tending to locale issues would be an *endless swamp*. Just for this, I looked up something suitable to read: http://www.manpagez.com/man/1/perllocale/ It may even be that you would find the time, but think about Walter and us, please. There *really are* other things to do.

I don't find that scary at all.

Maybe a quick skim doesn't let the issues sink in. :-)
 It's quite what I expected. We should 
 phase it in, after we do a good design. Also I don't plan to sit down 
 and write locale definition files, I want to parse the XML in that 
 locale repository I referred to.

My ex-wife has this GPS thing in her car. Very nice. But once on the road, it's too much hassle to type in a street address. And you're always in a hurry, so you don't have time to type it in before driving, while you're stuffing the kids in the car.
 An excellent string hierarchy without the entire rest of i18n, is only 
 going to look like a Ferrari with a Trabant engine. Which is worse 
 than nothing at all.

I don't understand this. What is the rest of i18n?

i18n stands for internationalisation. The word was too long to type.

Ah, or you meant the rest? That is, if there is this shiny repository right inside the language for storing these i18n preferences, then that does oblige us to have writefln, regexp, sort, and other stuff recognise those values, right? Otherwise people will ask how come we have a car but no engine. And that is a job bigger than it looks. But not doing it fully will make people feel D is worse off than if we had never had the repository at all!

Oh, and who wants writefln, regexp, sort, and the others to become slower? Hands up.
 Besides, there's more to this than just designing the perfect, or even 
 a good locale system in a language. *Somebody should actually use it*.

 Now, the non-English programmer, what does he really want? He wants to 
 be able to type stuff into his program in his native character set. D 
 already does that, by way of Utf-8.

 What else? Well, it is conceivable that he wants his program to print 
 dates and times the way it's done over there. He simply writes the 
 program "by hand" so it does dates and times like he wants. Even if 
 there was a locale thing in the language, he wouldn't bother with the 
 hassle. And he couldn't care less about Urdu.

If we come up with a good design, then they will be compelled to use it.

Gnome and KDE are both GUIs designed by "foreigners". i18n has been a *top priority* from the outset. Start a default project, and you have i18n "inbuilt" in your app. And still, my default clock applet only lets me choose between 12 and 24 hour clock, but the date is always "Mon Mar 2", and I can't get it to "020309", which I want. Or change it at all. And while there are simply excellent provisions for having all your app strings in the local language, hardly any application actually has more than a couple language choices.
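The clock-applet complaint comes down to letting a user-supplied pattern override the locale's one. This Python sketch implements only a tiny subset of LDML date-pattern letters (`yyyy`, `yy`, `MM`, `dd`), with no quoting and no month or day names. `format_date` is a hypothetical helper, not an existing API.

```python
import re
from datetime import date

# Tiny subset of LDML date-pattern letters; anything else is copied literally.
_FIELDS = {
    "yyyy": lambda d: f"{d.year:04d}",
    "yy": lambda d: f"{d.year % 100:02d}",
    "MM": lambda d: f"{d.month:02d}",
    "dd": lambda d: f"{d.day:02d}",
}
_TOKEN = re.compile("yyyy|yy|MM|dd")  # longest alternative listed first


def format_date(d: date, pattern: str) -> str:
    return _TOKEN.sub(lambda m: _FIELDS[m.group(0)](d), pattern)
```

Given the locale-style pattern `"yyyy-MM-dd"`, the date 2 March 2009 formats as `2009-03-02`. If the user's own preference `"ddMMyy"` wins instead, the output is `020309`.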
 Applications meant to be used across multiple countries have fumbled 
 with locale support because there's no good support in most languages. 
 So then why not offer a compelling support in D?

Nobody will use it. (People buy all these expensive workout machines they see on TV, and they never use them after two weeks.) i18n support is more than having your arrays print in peculiar ways overseas. Ideally, you would translate the UI to several languages, take in consideration some cultural differences, and then have the library muck your strings and variables into the "local" representation. Won't happen in a non-GUI program.
 The hypothetical Ambitious Programmer might want to use locale. He 
 could then have the dates and times (and currencies, etc.) follow the 
 country. Now, that might sound commendable, but in practice it 
 *crumbles*.

 He can't possibly know how to deal with languages that are written 
 backwards, languages where several characters make one letter, exotic 
 ways of writing dates, etc.

Well my understanding is that the guys who wrote those RFCs and whatnot spent time figuring out the right abstractions. Why not use them?

Because we don't have infinite time. Urgent, much asked for, technologically imperative, and other stuff should be done instead. There are both mundane and interesting tasks. Nice-to-haves come later.
 So, his fancy i18n project is doomed to be, at most, as usable as the 
 "normal" D program. Probably less, since his decisions will actually 
 worsen the user experience -- for users in another culture.


 And, any project big enough to tackle this, will implement its own 
 locale handling anyway. I'm sorry to say.

They will implement their own because the language doesn't offer an extensible framework that they can build on.

No, it's because they will only implement the parts that they're interested in. That's pretty easy to do for a big project. (If there will be one for a non-GUI purpose.)
 Yes, locales are nice and all.
 For D 3.5 that is.
 Honestly.

I just don't see where the big problem is. I'm talking about a blessed hierarchical hashtable to begin with.

The big problem is, SOMEONE will have to tell your XML table what values the user wants. Where is this knowledge stored in a way that every D app can get to it? And how do you force the user to populate the XML table with his choices to begin with?

What I'm saying is, it's debatable whether this stuff belongs to "the programming language itself" at all. Rather, it should be an external library, provided by someone other than us. It belongs on SourceForge or Dsource, not here. And definitely all this should be deferred not to 2.0, but to 2.5 or preferably 3.0. If by that time we have seen that there actually is any use for such a thing, then we can decide whether to outsource it to anybody interested, or to actually try to make it part of the language.

I'm not saying it's impossible to do, or to do well. But I am saying it is *way* too insignificant to deserve any attention at this time.
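One common answer to "where do the user's choices live" is a layered lookup: user overrides first, then locale data, then built-in defaults. Python's `collections.ChainMap` expresses this directly. All table names and values here are hypothetical; the locale values are illustrative and not taken from CLDR. Where the user overrides come from, such as a config file, is left open.

```python
from collections import ChainMap

# Built-in fallback: what every program gets with no locale data loaded.
BUILTIN = {"datePattern": "yyyy-MM-dd", "decimalSeparator": "."}

# What a loaded locale might supply (illustrative values only).
locale_data = {"datePattern": "d.M.yyyy", "decimalSeparator": ","}

# The user's own overrides, e.g. read from a config file (source unspecified).
user_prefs = {"decimalSeparator": "."}

# Lookups search left to right: user, then locale, then built-in defaults.
settings = ChainMap(user_prefs, locale_data, BUILTIN)
```

Here `settings["decimalSeparator"]` is `"."` because the user wins. `settings["datePattern"]` comes from the locale. Drop the locale layer and every key falls back to `BUILTIN`.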
 My initial desire is to be able to customize the array separators in
 writeln.

One might want to print arrays in different ways, even in the same program. Why not let the programmer customise the array printing the same as he does with integers and floats? Just a little addition to the syntax? Or why not just have a print function that takes an array and a format? Arrays are different enough to not comfortably fit into writefln semantics anyway. Clean and practical, in a practical language. Whatever you do, don't mix this with any internationalisation, please.
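Georg's alternative, a print helper that takes the format explicitly at the call site, might look like this in Python. `format_array` and its parameter names are hypothetical. The defaults produce a literal-like `[1, 2, 3]`, close to Daniel's "as it would be represented in the language" suggestion.

```python
def format_array(items, sep=", ", left="[", right="]", item_fmt="{}"):
    """Format a sequence with caller-chosen brackets, separator and item format."""
    return left + sep.join(item_fmt.format(x) for x in items) + right
```

For example, `format_array([7, 42], sep=" | ", left="", right="", item_fmt="{:03d}")` gives `007 | 042`. Different calls in the same program can use different styles without touching any global or locale state.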
Mar 02 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Georg Wrede wrote:
 Andrei Alexandrescu wrote:
 An excellent string hierarchy without the entire rest of i18n, is 
 only going to look like a Ferrari with a Trabant engine. Which is 
 worse than nothing at all.

I don't understand this. What is the rest of i18n?

i18n stands for internationalisation. The word was too long to type. Ah, or you meant the rest? That is, if there is this shiny repository right inside the language for storing these i18n preferences, then that does oblige us to have writefln, regexp, sort, and other stuff to recognise those values, right? Otherwise people will ask how come we have a car but no engine. And that is a job bigger than it looks like. But not doing it fully will have people feel D is less good than if we never had the repository at all! Oh, and who wants writefln, regexp, sort, and the others to become slower? Hands up.

They will only be slower, by necessity, for people who want them localized, not for anyone else.
 Well my understanding is that the guys who wrote those RFCs and 
 whatnot spent time figuring out the right abstractions. Why not use them?

Because we don't have infinite time. Urgent, much asked for, technologically imperative, and other stuff should be done instead. There are both mundane and interesting tasks. Nice-to-haves come later.

This is a misunderstanding. I am talking about a few dozens of lines of code that capitalize on Algebraic to structure the locale space. For starters I just want to e.g. allow people to configure how they stringize and print stuff from D. Hardcoding that kind of stuff, or the strings thrown in exceptions, does not sound too good.
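Here is a rough Python analogue of the "few dozen lines" Andrei describes. Each slot holds an int, a string, a callable, or a nested table, loosely mirroring `Algebraic!(int, string, Variant delegate(Variant), This[string])`. The `Locale` class and its `get`/`put` methods follow the proposal's call shape but are otherwise hypothetical.

```python
from typing import Any, Callable, Dict, Union

# Slot types, loosely mirroring the proposed Algebraic.
Slot = Union[int, str, Callable[[Any], Any], Dict[str, "Slot"]]


class Locale:
    def __init__(self) -> None:
        self._root: Dict[str, Slot] = {}

    def put(self, *path_and_value: Any) -> None:
        """put("a", "b", value): creates intermediate tables as needed."""
        if len(path_and_value) < 2:
            raise ValueError("need at least one key and a value")
        *path, value = path_and_value
        node = self._root
        for key in path[:-1]:
            child = node.setdefault(key, {})
            if not isinstance(child, dict):
                raise KeyError(f"{key!r} is a leaf, not a table")
            node = child
        node[path[-1]] = value

    def get(self, *path: str) -> Slot:
        """get("a", "b"): walks the tree; raises KeyError if a step is missing."""
        node: Any = self._root
        for key in path:
            if not isinstance(node, dict) or key not in node:
                raise KeyError("/".join(path))
            node = node[key]
        return node
```

Both access patterns from the original proposal work unchanged. One is the long date-pattern path ending in `"pattern"`, which returns `"yyyy-MM-dd"` once stored. The other is `put("my-category", "my-slot", "whatever")`, after which `get("my-category", "my-slot")` returns `"whatever"`.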
 I just don't see where the big problem is. I'm talking about a blessed 
 hierarchical hashtable to begin with. 

The big problem is, SOMEONE will have to tell your XML table what values the user wants. Where is this knowledge stored in a way that every D app can get to it? And how do you force the user to populate the XML table with his choices to begin with?

You see, we're not communicating. I sent this link:

http://www.unicode.org/cldr/

Did you look at it? It is essentially a database of locale information in a highly structured format. All I want is to define a structure expressive enough to gobble the part of that database that is of interest. The Phobos documentation will say, we just adopt their schema. If users don't want to load any, then fine - everything is just like today.
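To show how XML could be "gobbled" into a nested table, this Python sketch uses `xml.etree.ElementTree` to turn an LDML-like fragment into nested dicts. The fragment is heavily simplified and is not a real CLDR file; real files have more levels and attributes. Elements with a `type` attribute become `tag=type` keys, matching the key style of the proposal.

```python
import xml.etree.ElementTree as ET

# Heavily simplified, LDML-like fragment (not a real CLDR file).
SAMPLE = """
<ldml>
  <calendars>
    <calendar type="default">
      <dateFormats>
        <dateFormatLength type="medium">
          <pattern>yyyy-MM-dd</pattern>
        </dateFormatLength>
      </dateFormats>
    </calendar>
  </calendars>
</ldml>
"""


def key_for(elem: ET.Element) -> str:
    # <calendar type="default"> becomes "calendar=default"; untyped
    # elements keep their bare tag name.
    t = elem.get("type")
    return f"{elem.tag}={t}" if t is not None else elem.tag


def to_table(elem: ET.Element):
    children = list(elem)
    if not children:
        return (elem.text or "").strip()  # leaf: the slot's string value
    # Note: siblings that map to the same key overwrite each other here.
    return {key_for(c): to_table(c) for c in children}


table = to_table(ET.fromstring(SAMPLE.strip()))
```

Walking `table` along `"calendars"`, `"calendar=default"`, `"dateFormats"`, `"dateFormatLength=medium"` and `"pattern"` yields `"yyyy-MM-dd"`, the same path used in the original proposal.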
 What I'm saying is, it's debatable whether this stuff belongs to "the 
 programming language itself" at all. Rather, it should be an external 
 library, provided by someone else than us. It belongs to SourceForge or 
 Dsource, not here.

http://www.unicode.org/cldr/ We just need to load it if there is such a need.
 And definitely all this should be deferred to not 2.0, but to 2.5 or 
 preferably 3.0. If by that time we have seen that there actually is any 
 use for such a thing, then we can decide whether to outsource it to 
 anybody interested, or to actually try to make it part of the language.
 
 
 I'm not saying it's impossible to do, or to do well. But I am saying it 
 is *way* too insignificant to deserve any attention at this time.

You and I have completely different understandings of the level of effort needed. It's not like I don't have anything to do. :o) Let me try again: I don't want to define locale support. I want to provide the basics for people to roll it out themselves. Andrei
Mar 02 2009
parent reply Georg Wrede <georg.wrede iki.fi> writes:
Andrei Alexandrescu wrote:
 Georg Wrede wrote:
 You see, we're not communicating. I sent this link:
 
 http://www.unicode.org/cldr/
 
 Did you look at it? It is essentially a database of locale information 
 in a highly structured format. All I want is to define a structure 
 expressive enough to gobble the part of that database that is of 
 interest. The Phobos documentation will say, we just adopt their schema. 
 If users don't want to load any, then fine - everything is just like today.

I read the page. It says "This data is used by a wide spectrum of companies for their software internationalization and localization". The first link in the text part is to the CLDR Overview ppt. I read it. On page 5 it says: "Companies / Organizations Adobe, Apple (Mac OS X), abas Software, Ascential Software, Avaya, BEA, BluePhoenix Solutions, BMC Software (Remedy), Business Objects, caris, CERN, ClearCommerce, Cognos, Debian Linux, D programming language, Gentoo Linux, GNU Classpath, HP, Hyperion, IBM, Inktomi, Innodata Isogen, Isogon, Informatica, Intel, Interlogics, IONA, IXOS, Macromedia, Mathworks, OpenOffice, Language Analysis Systems, Lawson Software, Leica Geosystems GIS & Mapping LLC, Mandrake Linux, Novell (SuSE), Optio Software, PayPal, Progress Software, Python, QNX, Quark, Rogue Wave, SAP, Siebel, SIL, SPSS, Software AG, Sun Microsystems (Solaris, Java), Sybase, Teradata (NCR), Trados, Trend Micro, Virage, webMethods, WMS Gaming, Xerox, Yahoo!, and many more" One sees here major companies, operating systems, and three languages: D, Python and Java. The page is from 2005. So D "has had this since at least 2005". What can I say? I guess we have to implement it then...
 What I'm saying is, it's debatable whether this stuff belongs to "the 
 programming language itself" at all. Rather, it should be an external 
 library, provided by someone else than us. It belongs to SourceForge 
 or Dsource, not here.

http://www.unicode.org/cldr/ We just need to load it if there is such a need.

In another post you sounded as if there is a connection between this stuff and printing arrays. I'm not sure I see the connection.
 Let me try again: I don't want to define locale support. I want to 
 provide the basics for people to roll it out themselves.

I downloaded the files in http://unicode.org/Public/cldr/1.6.1/, which were core.zip, posix.zip, tests.zip and tools.zip. They unzipped to 140MB, containing some 200 Java files and some 800 XML files, among others. The readme.txt in tools.zip says: "The code is very preliminary, so don't expect stability from the APIs (or documentation!), since we still have to work out how we want to do the architecture." The main web page says "CLDR 1.7 Tentative Schedule: 2008-09", but it still isn't on the download page. The latest version is 1.6.1, from 2008-07-23.

==============

My take:

* This is still a moving target
* Using this is a major hassle for the programmer
* With D2 itself a moving target, nobody is going to invest enough time in this to actually use it for something worthwhile in the next 6 to 12 months anyway
* This is more application-level stuff than language-level stuff
* Doing this now will steal time from you, Walter, and many of us, both directly, and indirectly by leeching bandwidth in the newsgroup -- time that should be spent on more urgent or more important things, or even documentation
* If it's so easy to do, then why not do it a week before the release of final D2

I really can't help it, but this is how I see it.
Mar 02 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Georg Wrede wrote:
 Andrei Alexandrescu wrote:
 Georg Wrede wrote:
 You see, we're not communicating. I sent this link:

 http://www.unicode.org/cldr/

 Did you look at it? It is essentially a database of locale information 
 in a highly structured format. All I want is to define a structure 
 expressive enough to gobble the part of that database that is of 
 interest. The Phobos documentation will say, we just adopt their 
 schema. If users don't want to load any, then fine - everything is 
 just like today.

I read the page. It says "This data is used by a wide spectrum of companies for their software internationalization and localization". The first link in the text part is to the CLDR Overview ppt. I read it. On page 5 it says: "Companies / Organizations Adobe, Apple (Mac OS X), abas Software, Ascential Software, Avaya, BEA, BluePhoenix Solutions, BMC Software (Remedy), Business Objects, caris, CERN, ClearCommerce, Cognos, Debian Linux, D programming language, Gentoo Linux, GNU Classpath, HP, Hyperion, IBM, Inktomi, Innodata Isogen, Isogon, Informatica, Intel, Interlogics, IONA, IXOS, Macromedia, Mathworks, OpenOffice, Language Analysis Systems, Lawson Software, Leica Geosystems GIS & Mapping LLC, Mandrake Linux, Novell (SuSE), Optio Software, PayPal, Progress Software, Python, QNX, Quark, Rogue Wave, SAP, Siebel, SIL, SPSS, Software AG, Sun Microsystems (Solaris, Java), Sybase, Teradata (NCR), Trados, Trend Micro, Virage, webMethods, WMS Gaming, Xerox, Yahoo!, and many more" One sees here major companies, operating systems, and three languages: D, Python and Java. The page is from 2005. So D "has had this since at least 2005". What can I say? I guess we have to implement it then...

Hehe, didn't see that.
 What I'm saying is, it's debatable whether this stuff belongs to "the 
 programming language itself" at all. Rather, it should be an external 
 library, provided by someone else than us. It belongs to SourceForge 
 or Dsource, not here.

http://www.unicode.org/cldr/ We just need to load it if there is such a need.

In another post you sounded as if there is a connection between this stuff and printing arrays. I'm not sure I see the connection.

Very simple. If we have a locale table, I am thinking of dedicating a branch "std" in it to stuff that's in std. For example, I can use currentLocale.get("std", "array-separator") or something.
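To make the "std" branch idea concrete, here is a minimal D sketch. `LocaleTable`, `currentLocale`, and `formatArray` are hypothetical names, not Phobos APIs, and the table stores only strings (no Algebraic) to keep it short. The point is that library code reads an optional setting and falls back to today's behaviour when nothing has been deposited.

```d
import std.array : join;
import std.conv : to;

// Hypothetical hierarchical string table; not a Phobos API.
final class LocaleTable
{
    private string[string] leaves;          // values stored at this level
    private LocaleTable[string] branches;   // nested sub-tables

    // put("std", "array-separator", "; "): all arguments but the last form
    // the path; the last argument is the value stored at that path.
    void put(string[] pathAndValue...)
    {
        assert(pathAndValue.length >= 2, "need at least one key and a value");
        LocaleTable node = this;
        foreach (key; pathAndValue[0 .. $ - 2])
        {
            LocaleTable next;
            if (auto p = key in node.branches)
                next = *p;
            else
            {
                next = new LocaleTable;
                node.branches[key] = next;
            }
            node = next;
        }
        node.leaves[pathAndValue[$ - 2]] = pathAndValue[$ - 1];
    }

    // Returns the value stored at the path, or null if any key is missing.
    string get(string[] path...)
    {
        assert(path.length >= 1, "need at least one key");
        LocaleTable node = this;
        foreach (key; path[0 .. $ - 1])
        {
            auto p = key in node.branches;
            if (p is null)
                return null;
            node = *p;
        }
        auto leaf = path[$ - 1] in node.leaves;
        return leaf is null ? null : *leaf;
    }
}

// Hypothetical global; null means "no locale loaded", i.e. today's behaviour.
LocaleTable currentLocale;

// Example consumer of the "std" branch: prints an array using the configured
// separator, falling back to ", " when nothing has been deposited.
string formatArray(T)(T[] items)
{
    string sep = ", ";
    if (currentLocale !is null)
    {
        auto configured = currentLocale.get("std", "array-separator");
        if (configured !is null)
            sep = configured;
    }
    string[] parts;
    foreach (item; items)
        parts ~= to!string(item);
    return "[" ~ parts.join(sep) ~ "]";
}
```

With no table loaded, `formatArray([1, 2, 3])` keeps the default separator; after `currentLocale.put("std", "array-separator", "; ")` the same call uses "; ".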
 Let me try again: I don't want to define locale support. I want to 
 provide the basics for people to roll it out themselves.

I downloaded the files in http://unicode.org/Public/cldr/1.6.1/, which were core.zip, posix.zip, tests.zip and tools.zip. They unzipped to 140MB, containing some 200 Java files and some 800 XML files, among others. The readme.txt in tools.zip says: "The code is very preliminary, so don't expect stability from the APIs (or documentation!), since we still have to work out how we want to do the architecture." The main web page says "CLDR 1.7 Tentative Schedule: 2008-09", but it still isn't on the download page. The latest version is 1.6.1, from 2008-07-23.

==============

My take:

* This is still a moving target
* Using this is a major hassle for the programmer
* With D2 itself a moving target, nobody is going to invest enough time in this to actually use it for something worthwhile in the next 6 to 12 months anyway
* This is more application-level stuff than language-level stuff
* Doing this now will steal time from you, Walter, and many of us, both directly, and indirectly by leeching bandwidth in the newsgroup -- time that should be spent on more urgent or more important things, or even documentation
* If it's so easy to do, then why not do it a week before the release of final D2

I really can't help it, but this is how I see it.

I understand. Andrei
Mar 02 2009
parent Georg Wrede <georg.wrede iki.fi> writes:
Andrei Alexandrescu wrote:
 In another post you sounded as if there is a connection between this 
 stuff and printing arrays. I'm not sure I see the connection.

Very simple. If we have a locale table, I am thinking of dedicating a branch "std" in it to stuff that's in std. For example, I can use currentLocale.get("std", "array-separator") or something.

Let's disconnect the two. What if you were to document the data structure Algebraic, and mention that D itself uses an instance of it already to store array printing parameters. You could also mention that an Algebraic would be a great place to store Unicode CLDR data, and other (Windows) registry type things. And then, in the examples directory one could find a program that reads the CLDR xml (or of course a suitable snippet of it) and populates an instance of Algebraic. And then this example program would do something small but useful with the data.
Mar 02 2009
prev sibling next sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Georg Wrede wrote:
 So D "has had this since at least 2005". What can I say? I guess we have 
 to implement it then...

Wow, D usually gets slammed for not having a feature that even a cursory glance at the documentation shows it has. This is the first vaporware feature!
Mar 02 2009
prev sibling next sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Jarrett Billingsley wrote:
 functions for
 indexing and slicing on character boundaries) before this.

These already exist in std.uni.
Mar 02 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Jarrett Billingsley wrote:
 On Mon, Mar 2, 2009 at 1:52 PM, Georg Wrede <georg.wrede iki.fi> wrote:
 My take:

  * This is still a moving target
  * Using this is a major hassle for the programmer
  * With D2 itself a moving target, nobody is going to invest enough time in
 this to actually use it for something worthwhile in the next 6 to 12 months
 anyway
  * This is more application level stuff than language level stuff
  * Doing this now will steal time from you, Walter, and many of us, both
 directly, and indirectly by leeching bandwidth in the newsgroup -- time that
 should be spent on more urgent or more important things, or even
 documentation
  * If it's so easy to do, then why not do it a week before the release of
 final D2

I agree entirely. Localization and internationalization seem like things that should be at a much higher level than a standard library. Everyone's going to want to do it differently. Providing a thin, cross-platform wrapper over what the OS exposes is fine, but creating a proper i18n/l10n framework is a huge project in and of itself (I think the 140MB Java package makes that abundantly clear).

I must be missing something huge because I keep on misunderestimating (sic :o)) the scope of this project. Let me try to state my point again: I don't want to provide locale-specific strings, collation orders, date, time, and number formatters, or class hierarchies that do all of the above. Zip. Nada. Zilch.

I want to put together a string-based hierarchical string table that allows depositing ALL OF THE ABOVE in it, without initially putting ANYTHING in it. What's nice is that others have already defined the keys and the possible values used by that table.

Possibly you are missing one or more of the following points:

1) The existence of a hierarchical nomenclature for localization;

2) The existence of a large database containing localized values for said nomenclature;

3) The power of Algebraic, which allows depositing data, functions, and subtables alike in a uniform format.
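A minimal D sketch of "data, functions, and subtables alike" in one table. The thread's proposal is `Algebraic!(int, string, Variant delegate(Variant), This[string])`; to keep the sketch short this swaps in the unrestricted `std.variant.Variant` with `Variant[string]` subtables, so it gives up the compile-time restriction to those four types. `Table` and `lookup` are hypothetical names.

```d
import std.variant : Variant;

// A table maps keys to Variants; a Variant may hold plain data (int, string),
// a function (Variant delegate(Variant)), or another Table (a subtable).
alias Table = Variant[string];

// Follows a path of keys through nested subtables. Returns a pointer to the
// slot found, or null if a key is missing or a leaf is hit too early.
Variant* lookup(ref Table root, string[] path...)
{
    Table* table = &root;
    foreach (i, key; path)
    {
        auto slot = key in *table;
        if (slot is null)
            return null;
        if (i + 1 == path.length)
            return slot;
        table = slot.peek!Table;   // null unless this slot holds a subtable
        if (table is null)
            return null;
    }
    return null; // empty path
}
```

A caller then inspects what it got back (`get!string`, `get!int`, or `get!(Variant delegate(Variant))` followed by a call), which is exactly the uniform-but-dynamically-typed access the proposal describes.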
 I'd much rather see a rewritten std.stream and proper Unicode support
 in std.string (support for types other than string, functions for
 indexing and slicing on character boundaries) before this.

That, incidentally, is more complicated :o). Andrei
Mar 02 2009
next sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2009-03-02 16:42:37 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 I want to put together a string-based hierarchical string table that 
 allows depositing ALL OF THE ABOVE in it, without initially putting 
 ANYTHING in it. What's nice is that others have already defined the 
 keys and the possible values used by that table.
 
 Possibly you are missing one or more of the following points:
 
 1) The existence of a hierarchical nomenclature for localization;
 
 2) The existence of a large database containing localized values for 
 said nomenclature;
 
 3) The power of Algebraic, which allows depositing data, functions, and 
 subtables alike in a uniform format.

What I'm missing is a justification as to why you need all this data in a common deposit in the first place. How do you justify the need for that? Which functions need this data, and why does using an Algebraic make it better than other approaches?

As for the large database, I have nothing against using an existing large database, but I'd rather see my app use whatever is part of the underlying OS first, then rely on an external database if that is insufficient.

Your approach seems to be this: Unicode defines a huge database containing all kinds of locale information, let's expose that, allow other people to plug their own data inside, and use that as the standard format for passing locale data to various functions. I only oppose the last part -- the "use that as the standard format for passing locale data to various functions" part. That you're using Algebraic does not change the fact that various functions will search for data at some places in the structure. If the data isn't there, because you want to use some other formatting system, you'll get wrong results.

Perhaps you should explain more how you see this used in the context where we want to localize some data, how we can use it to define our own data, etc. Because this discussion is lost in generalities and vague ideas right now.

-- Michel Fortin michel.fortin michelf.com http://michelf.com/
Mar 02 2009
prev sibling parent reply Georg Wrede <georg.wrede iki.fi> writes:
Andrei Alexandrescu wrote:
 Jarrett Billingsley wrote:
 On Mon, Mar 2, 2009 at 1:52 PM, Georg Wrede <georg.wrede iki.fi> wrote:
 My take:

  * This is still a moving target
  * Using this is a major hassle for the programmer
  * With D2 itself a moving target, nobody is going to invest enough time in
 this to actually use it for something worthwhile in the next 6 to 12 months
 anyway
  * This is more application level stuff than language level stuff
  * Doing this now will steal time from you, Walter, and many of us, both
 directly, and indirectly by leeching bandwidth in the newsgroup -- time that
 should be spent on more urgent or more important things, or even
 documentation
  * If it's so easy to do, then why not do it a week before the release of
 final D2

I agree entirely. Localization and internationalization seem like things that should be at a much higher level than a standard library. Everyone's going to want to do it differently. Providing a thin, cross-platform wrapper over what the OS exposes is fine, but creating a proper i18n/l10n framework is a huge project in and of itself (I think the 140MB Java package makes that abundantly clear).

I must be missing something huge because I keep on misunderestimating (sic :o)) the scope of this project.

I agree. :-)
 Let me try to state my point again: I don't want to provide 
 locale-specific strings, collation orders, date, time, and number 
 formatters, or class hierarchies that do all of the above. Zip. Nada. 
 Zilch.
 
 I want to put together a string-based hierarchical string table that 
 allows depositing ALL OF THE ABOVE in it, without initially putting 
 ANYTHING in it. What's nice is that others have already defined the keys 
 and the possible values used by that table.

One of the problems is, people start expecting something if they find this string repository. They'd expect some of the work you said you don't provide, done. And if the table isn't even *prepopulated*, then people really feel stranded. It doesn't help much to state in the docs "if you need to fill it goto http://whatever, and hope the format hasn't changed". Besides, on that site, what exactly should be downloaded is unobvious enough that the new user will probably not bother. Nor the normal app programmer.
 Possibly you are missing one or more of the following points:
 
 1) The existence of a hierarchical nomenclature for localization;

With a hammer in hand, everything looks like a nail. With a swiss army knife in your hand, nothing in the house is safe.
 2) The existence of a large database containing localized values for 
 said nomenclature;

So where will this be stored? In a .dmdrc directory in the user's home? One per system? Or does every app store it in a .ini file? Is this per app or common to all of the user's apps? And when it's updated (by whom?), will all his own settings vanish? Or is there a mechanism (or does he have to invent one?) for reattaching his own settings after the update?
 3) The power of Algebraic, which allows depositing data, functions, and 
 subtables alike in a uniform format.

Seriously however, Algebraic does sound cool! No question.
Mar 02 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Georg Wrede wrote:
[snip]

Well I guess what I'll do is take the path of least resistance - 
nothing. Looks like locales are rather unpopular...

Actually I will do something. I'll start removing some of the silly 
Exception derivees from std.


Andrei
Mar 02 2009
parent Michel Fortin <michel.fortin michelf.com> writes:
On 2009-03-02 23:27:49 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Georg Wrede wrote:
 [snip]
 
 Well I guess what I'll do is take the path of least resistance - 
 nothing. Looks like locales are rather unpopular...

Sad. Seriously, I think if D could have locales as a standard feature it'd be great. Supporting various locales is often a must when you deploy an application, and when libraries try to do it differently you find yourself in a mess.

One thing I dislike in your approach is that you're designing the underlying data storage system before considering the API we're going to use. What we need the most is a standard API for localizing the display and input of data, and I somewhat fail to see how storing all the localization parameters in an Algebraic solves this problem.

I mean, let's say you want to output a localized message; perhaps we could do this:

    writefln("Hello number %f", 123456.44);
    writefln(localize("Hello number %f"), 123456.44); // default locale
    Locale fr = locale("fr");
    writefln(localize("Hello number %f", fr), 123456.44);

and expect this output:

    Hello number 123456.44
    Hello number 123,456.44
    Bonjour numéro 123456,44

That'd be an interesting feature. But as of yet I have no idea how we're supposed to use all that locale information you want to keep in your algebraic type; you haven't provided many examples like the one above where all you want is to format a string and a number. Perhaps it's clear in your head, but to me it's vague. Exposing all this locale data is useless if it isn't supported by the library's functions.

What I wrote above could work that way: localize(...) returns a FormattedString!(...) struct template containing a string and a templated function "format" for formatting its arguments. writefln being a templated function, it'll call toString on its first argument and check whether it provides a "format" function, and if it does, it passes all the other arguments through it before output. The "localize" function could be overloaded to accept various types of locales, including, but not limited to, your Algebraic locale data.

The downside of this approach is that it requires functions accepting a locale or a localized formatted string to be templates.

-- Michel Fortin michel.fortin michelf.com http://michelf.com/
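One way to realize the `localize(...)` / `format` idea, as a minimal D sketch. Everything here (`catalogue`, `localize`, `FormattedString`, `localizedText`) is hypothetical. Unlike the proposal, `writefln` is not taught to detect the struct; the caller invokes `format` directly. Patterns use `%s`, and number rendering is reduced to an assumed rule (two decimals, decimal comma for "fr", no digit grouping) purely for illustration.

```d
import std.array : replace;
import std.conv : to;
import std.meta : staticMap;
import std.traits : isFloatingPoint;
import stdfmt = std.format;

// Hypothetical translation catalogue: locale name -> (message -> translation).
string[string][string] catalogue;

// Renders one argument for a locale. Assumption for this sketch: floating
// point values get two decimals, and French uses a decimal comma.
string localizedText(T)(T value, string localeName)
{
    static if (isFloatingPoint!T)
    {
        string s = stdfmt.format("%.2f", value);
        return localeName == "fr" ? s.replace(".", ",") : s;
    }
    else
        return to!string(value);
}

private alias AsString(T) = string;

// Hypothetical FormattedString: a (possibly translated) pattern plus the
// locale it was selected for. Patterns must use %s placeholders, because
// every argument is turned into text before std.format sees it.
struct FormattedString
{
    string pattern;
    string localeName;

    string format(Args...)(Args args)
    {
        staticMap!(AsString, Args) shown;   // one string per argument
        foreach (i, ref s; shown)
            s = localizedText(args[i], localeName);
        return stdfmt.format(pattern, shown);
    }
}

// Looks the message up for the locale, falling back to the original text.
FormattedString localize(string message, string localeName = "")
{
    if (auto table = localeName in catalogue)
    {
        if (auto translated = message in *table)
            return FormattedString(*translated, localeName);
    }
    return FormattedString(message, localeName);
}
```

With `catalogue["fr"] = ["Hello number %s": "Bonjour numéro %s"]`, the call `localize("Hello number %s", "fr").format(123456.44)` yields "Bonjour numéro 123456,44", while an untranslated message simply passes through.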
Mar 03 2009
prev sibling parent reply Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, on March 1 at 19:40, you wrote:
 Georg Wrede wrote:
Andrei Alexandrescu wrote:
Sooner or later that will need to be defined. I know next to nothing about
locales. (I know I dislike the design C++ uses.)

This lets my programs "understand" Finnish, and doesn't give me undue headaches. Seriously tending to locale issues would be an *endless swamp*. Just for this, I looked up something suitable to read: http://www.manpagez.com/man/1/perllocale/ It may even be that you would find the time, but think about Walter and us, please. There *really are* other things to do.

I don't find that scary at all. It's quite what I expected. We should phase it in, after we do a good design. Also I don't plan to sit down and write locale definition files, I want to parse the XML in that locale repository I referred to.

I'm not following this thread carefully and I don't know if this is what you are implying, but: please don't even think of duplicating the locale stuff. At least on Unix there is a very nice database that sometimes needs to be updated very often (due to stupid presidents like the one I have now, who changes daylight saving time all the time). PHP, for example, maintains its own copy of this locale data, and it is a real PITA.

-- 
Leandro Lucarella (luca) | Collective blog: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
The average person laughs 13 times a day
Mar 02 2009
next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2009-03-02 08:32:40 -0500, Leandro Lucarella <llucax gmail.com> said:

 I'm not following this thread carefully and I don't know if this is what
 you are implying, but: please don't even think of duplicating the
 locale stuff. At least on Unix there is a very nice database that sometimes
 needs to be updated very often (due to stupid presidents like the one
 I have now, who changes daylight saving time all the time).
 
 PHP, for example, maintains its own copy of this locale data, and it is a real PITA.

I do agree. In another post I proposed we create formatter classes for numbers and dates. This way, you can use a formatter binding to the UNIX database and APIs, or the Windows APIs, or Cocoa, etc., or you can build your own. All you need is a generic front-end formatter interface you can bind to anything (and a common internal representation for dates), something like:

    interface DateFormatter
    {
        string timestampToString(int timestamp);
        int stringToTimestamp(string date);
    }

    DateFormatter defaultDateFormatter();
    DateFormatter dateFormatterForLocale(string localeName);

    interface NumberFormatter
    {
        string intToString(int number);
        int stringToInt(string number);
    }

    NumberFormatter defaultNumberFormatter();
    NumberFormatter numberFormatterForLocale(string localeName);

-- Michel Fortin michel.fortin michelf.com http://michelf.com/
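As a sketch of how one implementation of the proposed `NumberFormatter` interface might look, here is a hypothetical `GroupingNumberFormatter` that groups digits in threes. The separator choices in `numberFormatterForLocale` are assumptions for illustration (French gets a plain space for simplicity), not data taken from any OS or CLDR.

```d
import std.conv : to;

// Same shape as the NumberFormatter interface proposed above.
interface NumberFormatter
{
    string intToString(int number);
    int stringToInt(string number);
}

// Hypothetical implementation: groups digits in threes using a separator.
class GroupingNumberFormatter : NumberFormatter
{
    private char separator;

    this(char separator)
    {
        this.separator = separator;
    }

    string intToString(int number)
    {
        long magnitude = number;          // widen so -int.min does not overflow
        immutable negative = magnitude < 0;
        if (negative)
            magnitude = -magnitude;
        immutable digits = to!string(magnitude);
        char[] result;
        foreach (i, c; digits)
        {
            // Insert a separator before each remaining group of three digits.
            if (i > 0 && (digits.length - i) % 3 == 0)
                result ~= separator;
            result ~= c;
        }
        return (negative ? "-" : "") ~ result.idup;
    }

    int stringToInt(string number)
    {
        char[] plain;
        foreach (c; number)
            if (c != separator)
                plain ~= c;
        return to!int(plain);
    }
}

// Hypothetical lookup by locale name; a real version would consult the OS.
NumberFormatter numberFormatterForLocale(string localeName)
{
    switch (localeName)
    {
        case "fr":
            return new GroupingNumberFormatter(' '); // plain space for simplicity
        default:
            return new GroupingNumberFormatter(',');
    }
}
```

Because callers only see the interface, an OS-backed or CLDR-backed formatter could replace this class without changing calling code, which is the extensibility argument made above.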
Mar 02 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2009-03-02 08:32:40 -0500, Leandro Lucarella <llucax gmail.com> said:
 
 I'm not following this thread carefully and I don't know if this is what
 you are implying, but: please don't even think of duplicating the
 locale stuff. At least on Unix there is a very nice database that sometimes
 needs to be updated very often (due to stupid presidents like the one
 I have now, who changes daylight saving time all the time).

 PHP, for example, maintains its own copy of this locale data, and it is a real PITA.

I do agree. In another post I proposed we create formatter classes for numbers and dates. This way, you can use a formatter binding to the UNIX database and APIs, or the Windows APIs, or Cocoa, etc., or you can build your own. All you need is a generic front-end formatter interface you can bind to anything (and a common internal representation for dates), something like:

    interface DateFormatter
    {
        string timestampToString(int timestamp);
        int stringToTimestamp(string date);
    }

    DateFormatter defaultDateFormatter();
    DateFormatter dateFormatterForLocale(string localeName);

    interface NumberFormatter
    {
        string intToString(int number);
        int stringToInt(string number);
    }

    NumberFormatter defaultNumberFormatter();
    NumberFormatter numberFormatterForLocale(string localeName);

This is exactly one thing I want to avoid for Phobos: defining class hierarchies for locales. No. If you want to provide a specific date formatter, you plant a delegate in the locale table. The code in Phobos doing formatting will detect that and call your delegate passing in the date. You do whatever you want on your side (format on the spot, use your own class hierarchy etc.) Again: mechanism only. Not policy. Andrei
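A minimal D sketch of "plant a delegate in the locale table; the library detects it and calls it". The flat `localeTable` of `std.variant.Variant` slots, the slot name "dates/format", and the `DateFormatter` delegate type are all hypothetical; `formatDate` stands in for the Phobos-side formatting code. Note the weak typing involved: a value stored under a different type is simply not found by `peek`, and the built-in default is used instead.

```d
import std.format : format;
import std.variant : Variant;

// Hypothetical delegate type an application may plant for date formatting.
alias DateFormatter = string delegate(int year, int month, int day);

// Hypothetical locale table (slot name -> anything); empty by default.
Variant[string] localeTable;

// Library-side code: mechanism only. If a DateFormatter was planted under
// "dates/format", defer to it; otherwise fall back to a built-in default.
string formatDate(int year, int month, int day)
{
    if (auto slot = "dates/format" in localeTable)
    {
        if (auto dg = slot.peek!DateFormatter)
            return (*dg)(year, month, day);
    }
    return format("%04d-%02d-%02d", year, month, day);   // ISO-style default
}
```

The application keeps full control of policy (it can call into its own class hierarchy from inside the delegate), while the library only knows where to look.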
Mar 02 2009
parent reply Christopher Wright <dhasenan gmail.com> writes:
Andrei Alexandrescu wrote:
 If you want to provide a specific date formatter, you plant a delegate 
 in the locale table. The code in Phobos doing formatting will detect 
 that and call your delegate passing in the date. You do whatever you 
 want on your side (format on the spot, use your own class hierarchy etc.)
 
 Again: mechanism only. Not policy.
 
 
 Andrei

Weak typing for the win!
Mar 02 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Christopher Wright wrote:
 Andrei Alexandrescu wrote:
 If you want to provide a specific date formatter, you plant a delegate 
 in the locale table. The code in Phobos doing formatting will detect 
 that and call your delegate passing in the date. You do whatever you 
 want on your side (format on the spot, use your own class hierarchy etc.)

 Again: mechanism only. Not policy.


 Andrei

Weak typing for the win!

Yes. Sometimes it's exactly what the doctor prescribed, as I believe it is in this case. Andrei
Mar 02 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Leandro Lucarella wrote:
 Andrei Alexandrescu, on March 1 at 19:40, you wrote:
 Georg Wrede wrote:
 Andrei Alexandrescu wrote:
 Sooner or later that will need to be defined. I know next to nothing about
locales. (I know I dislike the design C++ uses.)

This lets my programs "understand" Finnish, and doesn't give me undue headaches. Seriously tending to locale issues would be an *endless swamp*. Just for this, I looked up something suitable to read: http://www.manpagez.com/man/1/perllocale/ It may even be that you would find the time, but think about Walter and us, please. There *really are* other things to do.

Also I don't plan to sit down and write locale definition files; I want to parse the XML in that locale repository I referred to.

I'm not following this thread carefully and I don't know if this is what you are implying, but: please don't even think of duplicating the locale stuff. At least on Unix there is a very nice database that sometimes needs to be updated very often (due to stupid presidents like the one I have now, who changes daylight saving time all the time). PHP, for example, maintains its own copy of this locale data, and it is a real PITA.

You're right, we won't engage in the business of maintaining locale databases. We provide mechanism, not policy. Andrei
Mar 02 2009
parent Derek Parnell <derek psych.ward> writes:
On Mon, 02 Mar 2009 06:28:12 -0800, Andrei Alexandrescu wrote:


 You're right, we won't engage in the business of maintaining locale 
 databases. We provide mechanism, not policy.

Ok, for a while there I thought you were attempting to duplicate the work that the operating systems already do. I see locale support in D as being a platform-independent method of invoking existing operating system functionality.

-- Derek Parnell Melbourne, Australia skype: derek.j.parnell
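As a minimal sketch of "invoking existing operating system functionality", D code can already reach the C library's locale machinery through druntime's `core.stdc.locale` bindings (`setlocale`, `localeconv`). The wrapper name `decimalPointFor` is hypothetical. Note that `setlocale` changes process-wide state, and which locale names exist beyond "C" depends on what is installed on the system.

```d
import core.stdc.locale : LC_NUMERIC, localeconv, setlocale;
import std.string : fromStringz, toStringz;

// Hypothetical thin wrapper: switch the C runtime's numeric locale and report
// the decimal point it uses. Returns null if the locale is not available.
string decimalPointFor(string localeName)
{
    if (setlocale(LC_NUMERIC, localeName.toStringz) is null)
        return null;
    return localeconv().decimal_point.fromStringz.idup;
}
```

For example, `decimalPointFor("C")` returns ".", while an installed French locale would typically report ",".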
Mar 02 2009
prev sibling next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Georg Wrede wrote:
 What else? Well, it is conceivable that he wants his program to print 
 dates and times the way it's done over there. He simply writes the 
 program "by hand" so it does dates and times like he wants. Even if 
 there was a locale thing in the language, he wouldn't bother with the 
 hassle. And he couldn't care less about Urdu.

I've attempted to use locales, but the reason I'd always wind up doing it by hand is because the existing libraries to do it are obtuse, impenetrable, execrable, and pretty much unusable. So it may be that it's an insoluble problem, or maybe nobody has come up with the right abstraction yet. I don't have nearly enough experience with it to know the answer.
Mar 01 2009
next sibling parent reply "Joel C. Salomon" <joelcsalomon gmail.com> writes:
Walter Bright wrote:
 I've attempted to use locales, but the reason I'd always wind up doing
 it by hand is because the existing libraries to do it are obtuse,
 impenetrable, execrable, and pretty much unusable.
 
 So it may be that it's an insoluble problem, or maybe nobody has come up
 with the right abstraction yet. I don't have nearly enough experience
 with it to know the answer.

Sounds like it’s not yet suitable for D2, then, at least not in std. Perhaps put an experimental interface in ext? —Joel Salomon
Mar 01 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Joel C. Salomon wrote:
 Walter Bright wrote:
 I've attempted to use locales, but the reason I'd always wind up doing
 it by hand is because the existing libraries to do it are obtuse,
 impenetrable, execrable, and pretty much unusable.

 So it may be that it's an insoluble problem, or maybe nobody has come up
 with the right abstraction yet. I don't have nearly enough experience
 with it to know the answer.

Sounds like it's not yet suitable for D2, then, at least not in std. Perhaps put an experimental interface in ext?

Good idea. But before we do so, I was hoping I'd pick the brains of people who have used locales in other languages and understand the burning points. Somehow, however, I'm doing a lousy job at eliciting contributions from people on this newsgroup (guess I'd be a lousy salesman). I tried a couple of times and all I got was a few new keyword proposals and a few new syntax proposals :o). What am I doing wrong? Andrei
Mar 01 2009
parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2009-03-02 01:04:47 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Good idea. But before we do so, I was hoping I'd pick the brains of 
 people who have used locales in other languages and understand the 
 burning points. Somehow, however, I'm doing a lousy job at eliciting 
 contributions from people on this newsgroup (guess I'd be a lousy 
 salesman). I tried a couple of times and all I got was a few new 
 keyword proposals and a few new syntax proposals :o). What am I doing 
 wrong?

I think there are three aspects to localization. One is date and number formatting. Another is offering a facility for translating all the messages an application can give. And the last one is the configuration part, where you know which format to use.

The only problem I've seen addressed by you right now is the configuration part; I believe it's the wrong end to start with. We should start by defining how to perform the tasks I enumerated above: formatting dates and numbers, and selecting strings for a given language. After that we can figure out how to pass the proper default configuration around. And then you're done.

For date and number formatting, I like very much the NSDateFormatter and NSNumberFormatter approach in Cocoa, for instance: you have a base class to format dates, another for numbers; you can easily create your own subclass if you want, and there's a way to get the default formatter instance. This is extensible, because if you wanted to go further, you could add formatter classes for various units (length, mass...), or anything else.

Translating strings is a little harder because 1) strings are application-defined, 2) strings are often not available in the user's preferred language, adding the need for a fallback mechanism, and 3) different applications will want to store those strings in different ways. Perhaps we could define a base class for getting translated strings, then allow the program to use whatever subclass it wants.

Notice how I'm not using the word "locale" to talk about these things. "Locale" is a concept too abstract to be able to do something good with it. Since you could only define it using an Algebraic type and a loosely defined tree of strings, that seems to confirm my view. Call the module std.locale if you want, but keep in mind that the most important task at hand is facilitating localization, not defining what constitutes a locale; that can wait.

-- Michel Fortin michel.fortin michelf.com http://michelf.com/
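The fallback mechanism mentioned for translated strings can be sketched in a few lines of D. The `Catalogue` type and `translate` function are hypothetical, and the policy (try each preferred language in order, drop a trailing region subtag such as "fr-CA" to "fr", finally return the untranslated message) is an assumption chosen for illustration.

```d
import std.string : lastIndexOf;

// Hypothetical message catalogue: language tag -> (message -> translation).
alias Catalogue = string[string][string];

// Tries each preferred language in order, dropping region subtags as it goes
// ("fr-CA" then "fr"); returns the untranslated message if nothing matches.
string translate(Catalogue catalogue, string message, string[] preferred...)
{
    foreach (lang; preferred)
    {
        for (string tag = lang; tag.length > 0; )
        {
            if (auto table = tag in catalogue)
            {
                if (auto text = message in *table)
                    return *text;
            }
            immutable dash = tag.lastIndexOf('-');
            tag = dash < 0 ? null : tag[0 .. dash];
        }
    }
    return message;
}
```

Where the catalogue comes from (files, OS resources, gettext) stays outside this function, matching the idea of letting each program plug in its own storage.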
Mar 02 2009
Leandro Lucarella <llucax gmail.com> writes:
Michel Fortin, on March 2 at 07:30, you wrote:
 On 2009-03-02 01:04:47 -0500, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> said:
 
Good idea. But before we do so, I was hoping I'd pick the brains of people who
have used locales in other languages and understand the burning points.
Somehow, however, I'm doing a lousy job at eliciting contributions from people
on this newsgroup (guess I'd be a lousy salesman). I tried a couple of times
and all I got was a few new keyword proposals and a few new syntax proposals
:o). What am I doing wrong?

I think there are three aspects to localization. One is date and number formatting. Another is offering a facility for translating all the messages an application can give. And the last one is the configuration part, where you know which format to use.

I think you are confusing localization (l10n) with internationalization (i18n)[1]. Locales are about l10n: number and date formats, time zones, etc. i18n is about translations.

I've used the standard C API for localization and I found it quite simple and good. What's wrong with it?

I've used gettext[2] too (which is almost a de facto standard in Unix), and even though it could be improved, I think it does a pretty good job, and it has a lot of very subtle problems solved.

I think l10n and i18n should be handled with a lot of care, because they're very hard to get right (like concurrency ;). There are a lot of rough edges and exceptions to things that at first sight look so universal that it's very easy to end up with a bad design (like plural forms[3]). The gettext manual[4] is a great source to see how big this is.

Gettext is supported in most major programming languages, so I think D could greatly benefit from using it too.

[1] http://en.wikipedia.org/wiki/Internationalization_and_localization
[2] http://www.gnu.org/software/gettext/
[3] http://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms
[4] http://www.gnu.org/software/gettext/manual/gettext.html

--
Leandro Lucarella (luca) | Collective blog: http://www.mazziblog.com.ar/blog/
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
I always get the feeling that when lesbians look at me, they're thinking, '*That's* why I'm not a heterosexual.' -- George Costanza
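To see why plural forms are one of those subtle problems, here is a minimal D sketch. It assumes each message is stored as one format string per plural form and that a language's rule is a plain function returning the form index; `russianPlural`, `englishPlural` and `pluralMessage` are hypothetical names, not gettext's API. The Russian rule is transcribed from the formula the gettext manual gives for Russian.

```d
import std.format : format;
import std.stdio : writeln;

/// Plural-form selector for Russian, transcribed from the gettext manual:
///   nplurals=3; plural=n%10==1 && n%100!=11 ? 0 :
///       n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
size_t russianPlural(ulong n)
{
    if (n % 10 == 1 && n % 100 != 11)
        return 0;
    if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20))
        return 1;
    return 2;
}

/// English needs only two forms: singular for exactly one, plural otherwise.
size_t englishPlural(ulong n)
{
    return n == 1 ? 0 : 1;
}

/// Pick the form for `n` with the language's rule, then format it.
/// `forms` is a hypothetical per-language list of message variants.
string pluralMessage(ulong n, size_t function(ulong) rule, string[] forms)
{
    return format(forms[rule(n)], n);
}

void main()
{
    auto en = ["%d file", "%d files"];
    auto ru = ["%d файл", "%d файла", "%d файлов"];
    foreach (n; [1UL, 3, 5, 11, 21, 22])
        writeln(pluralMessage(n, &englishPlural, en), " / ",
                pluralMessage(n, &russianPlural, ru));
}
```

English gets away with "n == 1", but Russian needs three forms and looks at the last one or two digits, so 21 takes the singular form while 11 does not.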
Mar 02 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2009-03-02 01:04:47 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 Good idea. But before we do so, I was hoping I'd pick the brains of 
 people who have used locales in other languages and understand the 
 burning points. Somehow, however, I'm doing a lousy job at eliciting 
 contributions from people on this newsgroup (guess I'd be a lousy 
 salesman). I tried a couple of times and all I got was a few new 
 keyword proposals and a few new syntax proposals :o). What am I doing 
 wrong?

I think there are three aspects to localization. One is date and number formatting. Another is offering a facility for translating all the messages an application can give. And the last one is the configuration part, where you know which format to use.

Sounds like a good start.
 The only problem I've seen addressed by you right now is the 
 configuration part; I believe it's the wrong end to start with.
 
 We should start by defining how to perform the tasks I enumerated above: 
 translating date and number formats, selecting strings for a given 
 language. After that we can figure out how to pass the proper default 
 configuration around. And then you're done.
 
 For date and number formatting, I like very much the NSDateFormatter and 
 NSNumberFormatter approach in Cocoa for instance: you have a base class 
 to format dates, another for numbers; you can easily create your own 
 subclass if you want, and there's a way to get the default formatter 
 instance.

Well I was thinking of passing the buck around. Instead of std.locale defining a hierarchy for formatting numbers and dates, it provides a means for user code to plant a routine in the locale object that knows how to format numbers and dates. Of course, with time default localized routine implementations will show up (hopefully contributed to by people), but the basic mechanism is simple - there exists a locale table that allows you to store a delegate in it.
 This is extensible, because if you wanted to go further, you could add 
 formatter classes for various units (length, mass...), or anything else.

This I want to avoid, at least for the time being. I want to define a table that can contain strings, integers, delegates, and other sub-tables. This is it. The path to extensibility will not be Phobos defining new classes to format various things. This could go on forever. Phobos will use the table consistently, and users who do want to format various things will simply plant their delegates in the table.
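A minimal D sketch of such a table, for illustration only: it uses std.variant.Variant for the slots and a class for sub-tables rather than a recursive Algebraic, to keep the recursion simple. `LocaleTable`, its `put(path, value)` / `get(path...)` signatures and the slot names are all assumptions, not an existing Phobos API.

```d
import std.array : replace;
import std.format : format;
import std.stdio : writeln;
import std.variant : Variant;

/// Hypothetical hierarchical locale table. Each slot holds a Variant that may
/// contain an int, a string, a delegate, or a nested LocaleTable.
final class LocaleTable
{
    private Variant[string] slots;

    /// Store `value` at `path`, creating sub-tables along the way (an existing
    /// non-table value on the way is replaced by a new sub-table).
    void put(T)(string[] path, T value)
    {
        assert(path.length > 0, "empty path");
        LocaleTable node = this;
        foreach (key; path[0 .. $ - 1])
        {
            auto slot = key in node.slots;
            LocaleTable* child = slot ? slot.peek!LocaleTable : null;
            if (child is null)
            {
                auto fresh = new LocaleTable;
                node.slots[key] = Variant(fresh);
                node = fresh;
            }
            else
                node = *child;
        }
        node.slots[path[$ - 1]] = Variant(value);
    }

    /// Return the Variant stored at `path`, or an empty Variant if absent.
    Variant get(string[] path...)
    {
        LocaleTable node = this;
        foreach (i, key; path)
        {
            auto slot = key in node.slots;
            if (slot is null)
                return Variant.init;
            if (i + 1 == path.length)
                return *slot;
            auto child = slot.peek!LocaleTable;
            if (child is null)
                return Variant.init;
            node = *child;
        }
        return Variant.init; // empty path
    }
}

void main()
{
    auto table = new LocaleTable;
    table.put(["calendars", "calendar=default", "dateFormats",
               "dateFormatLength=medium", "pattern"], "yyyy-MM-dd");

    // User code plants a number-formatting routine (decimal comma).
    string delegate(double) formatNumber =
        (double x) => format("%.2f", x).replace(".", ",");
    table.put(["numbers", "format"], formatNumber);

    writeln(table.get("calendars", "calendar=default", "dateFormats",
                      "dateFormatLength=medium", "pattern").get!string);
    auto fmt = table.get("numbers", "format").get!(string delegate(double));
    writeln(fmt(3.14159));
}
```

Storing a delegate is how user code would plant a routine in the table; library code only needs to know the path under which to look it up.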
 Translating strings is a little harder because 1) strings are 
 application-defined, 2) strings are often not available in the user's 
preferred language, adding the need for a fallback mechanism, and 3) 
different applications will want to store those strings in different 
 ways. Perhaps we could define a base class for getting translated 
 strings, then allow the program to use whatever subclass it wants.

There's no need for classes and subclasses. It's all data. Why should we replace data with code? Data is easier. Consider some code in Phobos that must throw an exception:

throw Exception("File `%s' not found, system error is %s.",
     filename, errnomsg);

The localized version will look like this:

auto format = "File `%s' not found, system error is %s.";
auto localFormat = currentLocale ? currentLocale.peek(format) : null;
if (!localFormat) localFormat = format;
throw Exception(localFormat, filename, errnomsg);

What happens is that the default format string _is_ the key for looking up the localized strings. If there's no value for that string, the default format string remains in effect. Note that on the default path, currentLocale is null, so there is hardly any inefficiency.
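A runnable D version of this lookup, as a sketch: it assumes a hypothetical `Locale` class holding a plain `string[string]` table, and it builds the message with std.format.format before passing it to `new Exception`, since D's `Exception` constructor takes a ready-made message rather than format arguments.

```d
import std.format : format;
import std.stdio : writeln;

/// Hypothetical locale: translations keyed by the English format string.
final class Locale
{
    string[string] messages;

    /// Translation of `key`, or null when the table has none.
    string peek(string key)
    {
        auto p = key in messages;
        return p is null ? null : *p;
    }
}

/// Null by default: no localization, the built-in English text is used.
Locale currentLocale;

/// The default format string doubles as the lookup key and the fallback.
string localized(string fmt)
{
    auto localFormat = currentLocale ? currentLocale.peek(fmt) : null;
    return localFormat is null ? fmt : localFormat;
}

void throwFileNotFound(string filename, string errnomsg)
{
    enum fmt = "File `%s' not found, system error is %s.";
    throw new Exception(format(localized(fmt), filename, errnomsg));
}

void main()
{
    try
    {
        throwFileNotFound("data.txt", "No such file or directory");
    }
    catch (Exception e)
    {
        writeln(e.msg); // English: no locale installed
    }

    currentLocale = new Locale;
    // French for "File `%s' not found, system error: %s."
    currentLocale.messages["File `%s' not found, system error is %s."] =
        "Fichier `%s' introuvable, erreur système : %s.";
    try
    {
        throwFileNotFound("data.txt", "No such file or directory");
    }
    catch (Exception e)
    {
        writeln(e.msg); // the French template, same arguments
    }
}
```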
 Notice how I'm not using the word "locale" to talk about these things. 
 "Locale" is a concept too abstract to be able to do something good with 
 it. Since you could only define it using Algebraic type and a loosely 
 defined tree of strings, that seems to confirm my view. Call the module 
 std.locale if you want, but keep in mind that the most important task at 
 hand is facilitating localization, not defining what constitutes a 
 locale, that can wait.
 

What should I call it?

Andrei
Mar 02 2009
Sergey Gromov <snake.scaly gmail.com> writes:
Mon, 02 Mar 2009 07:02:10 -0800, Andrei Alexandrescu wrote:

 Consider some code in phobos that must throw an exception:
 
 throw Exception("File `%s' not found, system error is %s.",
      filename, errnomsg);
 
 The localized version will look like this:
 
 auto format = "File `%s' not found, system error is %s.";
 auto localFormat = currentLocale ? currentLocale.peek(format) : null;
 if (!localFormat) localFormat = format;
 throw Exception(localFormat, filename, errnomsg);

This example does not address the encoding problem. Currently, errnomsg is in Russian, UTF-8 encoded. So I get "system error is <garbage>" on the console. If you adopt locales I'll get garbage not only for the system error but for the rest of the exception message as well. To actually solve this problem the default exception handler must be fixed to convert any UTF-8 into the current OEM code page before printing. It would also help if default stdin and stdout performed such a conversion.
 What happens is that the default format string _is_ the key for looking 
 up the localized strings.

Nice. This means that error messages become a part of API and are subject to backward and forward compatibility issues. Isn't it too much?
Mar 02 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sergey Gromov wrote:
 Mon, 02 Mar 2009 07:02:10 -0800, Andrei Alexandrescu wrote:
 
 Consider some code in phobos that must throw an exception:

 throw Exception("File `%s' not found, system error is %s.",
      filename, errnomsg);

 The localized version will look like this:

 auto format = "File `%s' not found, system error is %s.";
 auto localFormat = currentLocale ? currentLocale.peek(format) : null;
 if (!localFormat) localFormat = format;
 throw Exception(localFormat, filename, errnomsg);

This example does not address the encoding problem. Currently, errnomsg is in Russian, UTF-8 encoded. So I get "system error is <garbage>" on the console. If you adopt locales I'll get garbage not only for the system error but for the rest of the exception message as well. To actually solve this problem the default exception handler must be fixed to convert any UTF-8 into the current OEM code page before printing. It would also help if default stdin and stdout performed such a conversion.

I see.
 What happens is that the default format string _is_ the key for looking 
 up the localized strings.

Nice. This means that error messages become a part of API and are subject to backward and forward compatibility issues. Isn't it too much?

I think it isn't too much, considering the sorry state of affairs of today's exceptions. You can't even answer the question: "Given this FileException object, what file name was concerned?" And each module defines its own exception class that is equally useless. It's ridiculous. 95% of them must be removed. And we must have systematic formatting of all strings initiated by Phobos.

Andrei
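As an illustration of exceptions that carry their data, here is a minimal D sketch; `FileNotFoundException` and its fields are hypothetical, not a Phobos class. The file name and the OS message stay available as fields, so callers can inspect them, and a localizer could re-render the text from them instead of parsing `msg`.

```d
import std.format : format;
import std.stdio : writeln;

/// Hypothetical exception that keeps the data, not just the rendered text.
class FileNotFoundException : Exception
{
    string fileName;      /// the file that could not be opened
    string systemMessage; /// the OS error text

    this(string fileName, string systemMessage,
         string file = __FILE__, size_t line = __LINE__)
    {
        super(format("File `%s' not found, system error is %s.",
                     fileName, systemMessage), file, line);
        this.fileName = fileName;
        this.systemMessage = systemMessage;
    }
}

void main()
{
    try
    {
        throw new FileNotFoundException("config.ini", "No such file or directory");
    }
    catch (FileNotFoundException e)
    {
        // "What file name was concerned?" is answered without parsing msg.
        writeln(e.fileName);
        writeln(e.msg);
    }
}
```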
Mar 02 2009
Rainer Deyke <rainerd eldwood.com> writes:
Sergey Gromov wrote:
 To actually solve this problem the default exception handler must be
 fixed to convert any UTF-8 into the current OEM code page before
 printing.  It would also help if default stdin and stdout performed such
 a conversion.

No, stdin/stdout *must* perform this conversion. It is a serious bug if they don't. The conversion cannot be performed at any other level. D uses Unicode internally. The console uses a specific encoding. Therefore all data passing between D and the console must be encoded/decoded.

--
Rainer Deyke - rainerd eldwood.com
Mar 02 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Rainer Deyke wrote:
 Sergey Gromov wrote:
 To actually solve this problem the default exception handler must be
 fixed to convert any UTF-8 into the current OEM code page before
 printing.  It would also help if default stdin and stdout performed such
 a conversion.

No, stdin/stdout *must* perform this conversion. It is a serious bug if they don't. The conversion cannot be performed at any other level. D uses unicode internally. The console uses a specific encoding. Therefore all data passing between D and the console must be encoded/decoded.

What API to use to detect the encoding used by the console?

Andrei
Mar 02 2009
Daniel Keep <daniel.keep.lists gmail.com> writes:
Andrei Alexandrescu wrote:
 Rainer Deyke wrote:
 Sergey Gromov wrote:
 To actually solve this problem the default exception handler must be
 fixed to convert any UTF-8 into the current OEM code page before
 printing.  It would also help if default stdin and stdout performed such
 a conversion.

No, stdin/stdout *must* perform this conversion. It is a serious bug if they don't. The conversion cannot be performed at any other level. D uses unicode internally. The console uses a specific encoding. Therefore all data passing between D and the console must be encoded/decoded.

What API to use to detect the encoding used by the console? Andrei

According to <http://markmail.org/message/neu2pllqz3sst4tq>, it's uint GetConsoleOutputCP() <http://msdn.microsoft.com/en-us/library/ms683169%28VS.85%29.aspx>. Interestingly, there's a SetConsoleOutputCP <http://msdn.microsoft.com/en-us/library/ms686036(VS.85).aspx> function. Check this out:
 module utf;

 import tango.io.Stdout;

 extern(Windows) int SetConsoleOutputCP(uint wCodePageID);

 void main()
 {
     SetConsoleOutputCP(65001);
     Stdout("Не∟└Ω").newline;
 }

FYI, "65001" is how Windows spells "UTF-8". Also note that this won't work in anything earlier than Windows 2000, but then, even that's not supported any more. Note that you MUST change the console's font to Lucida Console (right-click title, properties, font tab) for this to actually display, but that's not something D can control. :P

--
Daniel
Mar 02 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Daniel Keep wrote:
 
 Andrei Alexandrescu wrote:
 Rainer Deyke wrote:
 Sergey Gromov wrote:
 To actually solve this problem the default exception handler must be
 fixed to convert any UTF-8 into the current OEM code page before
 printing.  It would also help if default stdin and stdout performed such
 a conversion.

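No, stdin/stdout *must* perform this conversion. It is a serious bug if they don't. The conversion cannot be performed at any other level. D uses unicode internally. The console uses a specific encoding. Therefore all data passing between D and the console must be encoded/decoded.

What API to use to detect the encoding used by the console? Andrei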

According to <http://markmail.org/message/neu2pllqz3sst4tq>, it's uint GetConsoleOutputCP() <http://msdn.microsoft.com/en-us/library/ms683169%28VS.85%29.aspx>. Interestingly, there's a SetConsoleOutputCP <http://msdn.microsoft.com/en-us/library/ms686036(VS.85).aspx> function. Check this out:
 module utf;

 import tango.io.Stdout;

 extern(Windows) int SetConsoleOutputCP(uint wCodePageID);

 void main()
 {
     SetConsoleOutputCP(65001);
     Stdout("Не∟└Ω").newline;
 }

FYI, "65001" is how Windows spells "UTF-8". Also note that this won't work in anything earlier than Windows 2000, but then, even that's not supported any more. Note that you MUST change the console's font to Lucida Console (right-click title, properties, font tab) for this to actually display, but that's not something D can control. :P -- Daniel

Ahhhh... Windows you mean? Ehm. I need to get to a Windows machine. If you could paste this into a bug report that would be great.

Thanks,

Andrei
Mar 02 2009
Sergey Gromov <snake.scaly gmail.com> writes:
Mon, 02 Mar 2009 12:53:48 -0800, Andrei Alexandrescu wrote:

 Rainer Deyke wrote:
 Sergey Gromov wrote:
 To actually solve this problem the default exception handler must be
 fixed to convert any UTF-8 into the current OEM code page before
 printing.  It would also help if default stdin and stdout performed such
 a conversion.

No, stdin/stdout *must* perform this conversion. It is a serious bug if they don't. The conversion cannot be performed at any other level. D uses unicode internally. The console uses a specific encoding. Therefore all data passing between D and the console must be encoded/decoded.

What API to use to detect the encoding used by the console?

There is std.windows.charset.toMBSz(str, 1) which does the right thing.
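A minimal D sketch of the conversion Sergey and Rainer describe, assuming the console expects the OEM code page on Windows and UTF-8 elsewhere (both are assumptions about the environment). It relies on std.windows.charset.toMBSz with code page 1 (CP_OEMCP); `toConsoleEncoding` is a hypothetical helper name.

```d
import std.stdio : stdout;
import std.string : fromStringz;

version (Windows)
    import std.windows.charset : toMBSz;

/// Re-encode UTF-8 text for the console before writing raw bytes.
/// On Windows: convert to the OEM code page (toMBSz, codePage 1); text that
/// code page cannot represent will not survive the conversion.
/// Elsewhere: assume the terminal is UTF-8 and pass the text through.
const(char)[] toConsoleEncoding(string text)
{
    version (Windows)
        return fromStringz(toMBSz(text, 1));
    else
        return text;
}

void main()
{
    // Russian for "File not found"
    stdout.rawWrite(toConsoleEncoding("Файл не найден\n"));
}
```

Doing this once inside the standard output path, rather than at each call site, is what would keep exception messages and errno strings readable on such consoles.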
Mar 02 2009
Christopher Wright <dhasenan gmail.com> writes:
Andrei Alexandrescu wrote:
 The localized version will look like this:
 
 auto format = "File `%s' not found, system error is %s.";
 auto localFormat = currentLocale ? currentLocale.peek(format) : null;
 if (!localFormat) localFormat = format;
 throw Exception(localFormat, filename, errnomsg);

This short example suggests:

Locale.peek(T)(char[] key, T ifNotFound = T.init)

auto localFormat = currentLocale ? currentLocale.peek(format, format) : format;
throw new Exception(localFormat);
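Christopher's signature can be sketched in D as follows. This assumes a hypothetical flat `Locale` whose slots are `std.variant.Variant`, so that `peek` can be generic in the stored type; a slot holding a value of a different type is treated as not found.

```d
import std.format : format;
import std.stdio : writeln;
import std.variant : Variant;

/// Hypothetical flat locale whose slots are Variants, so peek can be generic.
final class Locale
{
    private Variant[string] slots;

    void put(T)(string key, T value)
    {
        slots[key] = Variant(value);
    }

    /// Value stored under `key` if it has type T; otherwise `ifNotFound`.
    T peek(T)(string key, T ifNotFound = T.init)
    {
        if (auto slot = key in slots)
            if (auto value = slot.peek!T)
                return *value;
        return ifNotFound;
    }
}

Locale currentLocale; // null: no localization installed

string fileNotFoundMessage(string filename, string errnomsg)
{
    enum fmt = "File `%s' not found, system error is %s.";
    // The default format string is both the key and the fallback.
    auto localFormat = currentLocale ? currentLocale.peek(fmt, fmt) : fmt;
    return format(localFormat, filename, errnomsg);
}

void main()
{
    writeln(fileNotFoundMessage("data.txt", "ENOENT"));
    currentLocale = new Locale;
    // French for "File `%s' not found, system error: %s."
    currentLocale.put("File `%s' not found, system error is %s.",
                      "Fichier `%s' introuvable, erreur système : %s.");
    writeln(fileNotFoundMessage("data.txt", "ENOENT"));
}
```

Passing the key as its own default collapses the three-line null dance into a single expression at each call site.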
Mar 02 2009
Derek Parnell <derek psych.ward> writes:
On Mon, 02 Mar 2009 07:02:10 -0800, Andrei Alexandrescu wrote:

 Consider some code in phobos that must throw an exception:
 
 throw Exception("File `%s' not found, system error is %s.",
      filename, errnomsg);
 
 The localized version will look like this:
 
 auto format = "File `%s' not found, system error is %s.";
 auto localFormat = currentLocale ? currentLocale.peek(format) : null;
 if (!localFormat) localFormat = format;
 throw Exception(localFormat, filename, errnomsg);

One problem with this approach is that we run into the limitations of the format string's micro-syntax. Currently, there is no way to reorder the tokens in a message string, and that is required for /some/ messages in /some/ languages. I have used my own text formatting routine rather than Phobos' because it allows the implementer to develop messages whose word order is correct for their target language.

--
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
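For illustration, positional specifiers (`%1$s`, `%2$s`) in std.format let a translated template consume its arguments in a different order than the English one. The reordered template below is an invented example, not an actual translation.

```d
import std.format : format;
import std.stdio : writeln;

void main()
{
    // The English template names its arguments by position...
    string en = "File `%1$s' not found, system error is %2$s.";
    // ...so a translated template can use them in another order without any
    // change to the calling code.
    string reordered = "System error %2$s: file `%1$s' was not found.";

    writeln(format(en, "data.txt", "ENOENT"));
    writeln(format(reordered, "data.txt", "ENOENT"));
}
```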
Mar 02 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Derek Parnell wrote:
 On Mon, 02 Mar 2009 07:02:10 -0800, Andrei Alexandrescu wrote:
 
 Consider some code in phobos that must throw an exception:

 throw Exception("File `%s' not found, system error is %s.",
      filename, errnomsg);

 The localized version will look like this:

 auto format = "File `%s' not found, system error is %s.";
 auto localFormat = currentLocale ? currentLocale.peek(format) : null;
 if (!localFormat) localFormat = format;
 throw Exception(localFormat, filename, errnomsg);

One problem with this approach is that we meet the limitation of the formatting string's micro-syntax. Currently, there is no way to reorder the tokens in a message string, and that is required for /some/ messages in /some/ languages. I have used my own text formatting routine rather than Phobos' because it allows the implementer to develop messages whose word order is correct for their target language.

Phobos has supported Posix positional syntax since 2.006.

http://digitalmars.com/d/2.0/phobos/std_stdio.html

Andrei
Mar 02 2009
Derek Parnell <derek psych.ward> writes:
On Mon, 02 Mar 2009 18:36:09 -0800, Andrei Alexandrescu wrote:

 Phobos has supported Posix positional syntax since 2.006.
 
 http://digitalmars.com/d/2.0/phobos/std_stdio.html

Thank you. I was behind the times (again).

--
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Mar 02 2009
Michel Fortin <michel.fortin michelf.com> writes:
On 2009-03-02 10:02:10 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Michel Fortin wrote:
 I think there are three aspects to localization. One is date and number 
 formatting. Another is offering a facility for translating all the 
 messages an application can give. And the last one is the configuration 
 part, where you know which format to use.

Sounds like a good start.
 The only problem I've seen addressed by you right now is the 
 configuration part; I believe it's the wrong end to start with.
 
 We should start by defining how to perform the tasks I enumerated 
 above: translating date and number formats, selecting strings for a 
 given language. After that we can figure out how to pass the proper 
 default configuration around. And then you're done.
 
 For date and number formatting, I like very much the NSDateFormatter 
 and NSNumberFormatter approach in Cocoa for instance: you have a base 
 class to format dates, another for numbers; you can easily create your 
 own subclass if you want, and there's a way to get the default 
 formatter instance.

Well I was thinking of passing the buck around. Instead of std.locale defining a hierarchy for formatting numbers and dates, it provides a means for user code to plant a routine in the locale object that knows how to format numbers and dates. Of course, with time default localized routine implementations will show up (hopefully contributed to by people), but the basic mechanism is simple - there exists a locale table that allows you to store a delegate in it.

Looks somewhat like what I proposed. But the point I was trying to make is that you don't need to regroup all these in one big object called a "locale". Instead of seeing a locale as a central object for localizing every kind of data, I'm suggesting that we have different kinds of formatters capable of localizing different kinds of data. Each formatter would have its own definition of a locale that suits its needs. All you need is a standardized naming scheme for locales compatible between formatters, but that we have.

Note that while I've proposed that formatters be classes, I have no problem in them being structs which could be accepted in template functions. What's good about a class, or a struct, is that it can regroup a bunch of related functions. For instance, you could have a number formatter help you display the right string, read a formatted string, and validate a formatted string. And you could configure the formatter for a fixed number of decimals, specific rounding behaviour, negative format, etc.
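Michel's formatter idea can be sketched as a small D struct. `NumberFormatter`, `toText` and `tryParse` are hypothetical names, and the sketch handles only a decimal separator and a fixed number of fraction digits (no digit grouping, rounding modes, or negative formats); the point is that display, parsing, and validation live together in one configurable value.

```d
import std.algorithm.searching : canFind;
import std.array : replace;
import std.conv : ConvException, to;
import std.format : format;
import std.stdio : writeln;

/// Hypothetical number formatter in the spirit of NSNumberFormatter: one
/// value that displays, parses, and validates numbers for one convention.
struct NumberFormatter
{
    string decimalSeparator = ".";
    uint fractionDigits = 2;

    /// Render `x` with a fixed number of decimals and the local separator.
    string toText(double x) const
    {
        auto spec = format("%%.%sf", fractionDigits); // e.g. "%.2f"
        return format(spec, x).replace(".", decimalSeparator);
    }

    /// Parse text written with this formatter's separator; false if invalid.
    bool tryParse(string text, out double result) const
    {
        // Reject the "foreign" separator so "2.5" is not taken as 2,5.
        if (decimalSeparator != "." && text.canFind('.'))
            return false;
        try
        {
            result = text.replace(decimalSeparator, ".").to!double;
            return true;
        }
        catch (ConvException)
            return false;
    }
}

void main()
{
    auto us = NumberFormatter(".", 2);
    auto fi = NumberFormatter(",", 2); // decimal comma
    writeln(us.toText(1234.5)); // 1234.50
    writeln(fi.toText(1234.5)); // 1234,50
    double v;
    writeln(fi.tryParse("2,75", v), " ", v);
}
```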
 This is extensible, because if you wanted to go further, you could add 
 formatter classes for various units (length, mass...), or anything else.

This I want to avoid, at least for the time being. I want to define a table that can contain strings, integers, delegates, and other sub-tables. This is it. The path to extensibility will not be Phobos defining new classes to format various things. This could go on forever. Phobos will use the table consistently, and users who do want to format various things will simply plant their delegates in the table.

Well, when I said "you", I really meant anyone, and not necessarily inside Phobos. That was just to point out that the design is extensible. Sorry, it was confusing.
 Translating strings is a little harder because 1) strings are 
 application-defined, 2) strings are often not available in the user's 
 preferred language, adding the need for a fallback mechanism, and 3) 
 different applications will want to store those strings in different 
 ways. Perhaps we could define a base class for getting translated 
 strings, then allow the program to use whatever subclass it wants.

There's no need for classes and subclasses. It's all data. Why should we replace data with code? Data is easier. Consider some code in Phobos that must throw an exception:

throw Exception("File `%s' not found, system error is %s.",
     filename, errnomsg);

The localized version will look like this:

auto format = "File `%s' not found, system error is %s.";
auto localFormat = currentLocale ? currentLocale.peek(format) : null;
if (!localFormat) localFormat = format;
throw Exception(localFormat, filename, errnomsg);

What happens is that the default format string _is_ the key for looking up the localized strings. If there's no value for that string, the default format string remains in effect. Note that on the default path, currentLocale is null, so there is hardly any inefficiency.

Firstly, while you and I both agree that it's good for the key used to search for a localized string to be a readable message, not everyone does. It often doesn't work well when you want to translate short words that have overloaded meanings in English, for instance.

Secondly, always falling back to English (or the developer's locale) when the currentLocale is not available isn't flexible enough. On Mac OS X, for instance, you can select a number of languages for applications to use in order of preference. When the first isn't available, it looks for the second (skipping some details).

Thirdly, I hope you don't expect everyone to write the above each time. We should provide a nice function to do the localization, say "localize"? This function should really be an overridable delegate.

auto format = "File `%s' not found, system error is %s.";
throw Exception(localize(format), filename, errnomsg);

Fourthly, various libraries are likely to provide their own translation tables (perhaps even in various formats). Unless you merge them all (risking some clashes), you may want a second argument for specifying the translation table to use.

auto format = "File `%s' not found, system error is %s.";
throw Exception(localize(format, PHOBOS), filename, errnomsg);

Finally, no current library addresses this, but it'd be great if there was a way to correctly manage plurals in all languages. Perhaps by making a word parametrizable depending on a number...
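Several of these points (an overridable `localize` delegate, a preference-ordered language list, and a per-library table selected by a second argument) can be combined in a short D sketch. The catalog layout, `addTranslation`, `lookupTranslation`, and the language list are assumptions made for illustration; nothing here is an existing Phobos API.

```d
import std.format : format;
import std.functional : toDelegate;
import std.stdio : writeln;

/// Hypothetical catalog: domain -> language -> (English message -> translation).
string[string][string][string] catalog;

/// The user's languages in order of preference (illustrative values).
string[] preferredLanguages = ["fr", "de"];

/// Default lookup: try each preferred language in order for the given
/// domain, then fall back to the untranslated key itself.
string lookupTranslation(string key, string domain)
{
    if (auto byLanguage = domain in catalog)
    {
        foreach (lang; preferredLanguages)
        {
            if (auto table = lang in *byLanguage)
                if (auto text = key in *table)
                    return *text;
        }
    }
    return key;
}

/// Overridable hook, so an application can plug in its own storage.
string delegate(string key, string domain) localize;

static this()
{
    localize = toDelegate(&lookupTranslation);
}

void addTranslation(string domain, string lang, string key, string text)
{
    catalog.require(domain).require(lang)[key] = text;
}

void main()
{
    enum fmt = "File `%s' not found, system error is %s.";
    // German for "File `%s' not found, system error: %s."
    addTranslation("phobos", "de", fmt,
                   "Datei `%s' nicht gefunden, Systemfehler: %s.");
    // French is preferred but missing, so the German entry is used.
    writeln(format(localize(fmt, "phobos"), "data.txt", "ENOENT"));
    // Unknown table: the English key comes back unchanged.
    writeln(format(localize(fmt, "mylib"), "data.txt", "ENOENT"));
}
```

Replacing `localize` with another delegate swaps the whole lookup strategy without touching any call site.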
 Notice how I'm not using the word "locale" to talk about these things. 
 "Locale" is a concept too abstract to be able to do something good with 
 it. Since you could only define it using Algebraic type and a loosely 
 defined tree of strings, that seems to confirm my view. Call the module 
 std.locale if you want, but keep in mind that the most important task 
 at hand is facilitating localization, not defining what constitutes a 
 locale, that can wait.

How should I call it?

My point was that there shouldn't be a class/struct/thing representing a locale. Having a collection of formatters, each knowing where to get its locale information (when given a locale name), would work better in my opinion.

--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Mar 02 2009
Walter Bright <newshound1 digitalmars.com> writes:
Michel Fortin wrote:
 Translating strings is a little harder because 1) strings are 
 application-defined, 2) strings are often not available in the user's 
 preferred language, adding the need for a fallback mechanism, and 3) 
 different applications will want to store those strings in different 
 ways. Perhaps we could define a base class for getting translated 
 strings, then allow the program to use whatever subclass it wants.

It's a silly thing, but I love the little Google widget you can add to a web page to automatically translate the pages. All the D site pages have it in the left column.
Mar 02 2009
Michel Fortin <michel.fortin michelf.com> writes:
On 2009-03-02 14:58:26 -0500, Walter Bright <newshound1 digitalmars.com> said:

 It's a silly thing, but I love the little google widget you can add to 
 a web page to automatically translate the pages. All the D site pages 
 have it in the left column.

It's not a silly thing, it's hilarious. Look, Google has invented the D-French language:

- import std.stdio;
+ std.stdio importation;
----------
- delete cl;
+ supprimer cl;
----------
- s.allocated += argv.length * typeof (argv[0]).sizeof;
+ s.allocated + = * argv.length typeof (argv [0]). sizeof;
----------
- writefln( "argc = %d, " ~ "allocated = %d" ,
-     argspecs().count, argspecs().allocated);
+ Writefln ( "argc =% d," ~ "attribus =% d",
+     argspecs (). count, argspecs (). allou);
----------
- this ( int argc, string argv) // constructor
+ ce (int argc, string argv) / / constructeur

Funny French that is. Perhaps DMD should make its identifiers and keywords localizable; the result would be much better. :-)

--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Mar 02 2009
Walter Bright <newshound1 digitalmars.com> writes:
Michel Fortin wrote:
 On 2009-03-02 14:58:26 -0500, Walter Bright <newshound1 digitalmars.com> 
 said:
 
 It's a silly thing, but I love the little google widget you can add to 
 a web page to automatically translate the pages. All the D site pages 
 have it in the left column.

It's not a silly thing, it's hilarious. Look, Google has invented the D-French language:

A bug in Google's translator is there's no way to tell it to ignore a section, like a code section.
Mar 02 2009
Sean Kelly <sean invisibleduck.org> writes:
Michel Fortin wrote:
 On 2009-03-02 14:58:26 -0500, Walter Bright <newshound1 digitalmars.com> 
 said:
 
 It's a silly thing, but I love the little google widget you can add to 
 a web page to automatically translate the pages. All the D site pages 
 have it in the left column.

It's not a silly thing, it's hilarious. Look, Google has invented the D-French language:

- import std.stdio;
+ std.stdio importation;
----------
- delete cl;
+ supprimer cl;
----------
- s.allocated += argv.length * typeof (argv[0]).sizeof;
+ s.allocated + = * argv.length typeof (argv [0]). sizeof;

Wow, I didn't know the standard French form for formulas was prefix notation :-)

Sean
Mar 03 2009
Daniel Keep <daniel.keep.lists gmail.com> writes:
Sean Kelly wrote:
 Michel Fortin wrote:
 On 2009-03-02 14:58:26 -0500, Walter Bright
 <newshound1 digitalmars.com> said:

 It's a silly thing, but I love the little google widget you can add
 to a web page to automatically translate the pages. All the D site
 pages have it in the left column.

It's not a silly thing, it's hilarious. Look, Google has invented the D-French language:

- import std.stdio;
+ std.stdio importation;
----------
- delete cl;
+ supprimer cl;
----------
- s.allocated += argv.length * typeof (argv[0]).sizeof;
+ s.allocated + = * argv.length typeof (argv [0]). sizeof;

Wow, I didn't know the standard French form for formulas was prefix notation :-)

Sean

Wait, does this mean the French speak LISP? No wonder I could never understand them! :O

--
Daniel
Mar 03 2009
Georg Wrede <georg.wrede iki.fi> writes:
Walter Bright wrote:
 Georg Wrede wrote:
 What else? Well, it is conceivable that he wants his program to print 
 dates and times the way it's done over there. He simply writes the 
 program "by hand" so it does dates and times like he wants. Even if 
 there was a locale thing in the language, he wouldn't bother with the 
 hassle. And he couldn't care less about Urdu.

I've attempted to use locales, but the reason I'd always wind up doing it by hand is because the existing libraries to do it are obtuse, impenetrable, execrable, and pretty much unusable.

I'd venture to say, it's not only the libraries -- the stuff itself is obtuse. In most countries there's no *real* consensus on which settings folks want and how, and often the Official Settings (as dictated by either a real or imagined authority) are less than practical. A case in point: in Finland, what I get when trying to type a dollar sign is a ¤, which is a circle with four spokes. This sign is not used for absolutely anything, anywhere. Ever. (And I've been at this for more than 25 years.)
 So it may be that it's an insoluble problem, or maybe nobody has come up 
 with the right abstraction yet. I don't have nearly enough experience 
 with it to know the answer.

National pride, anti-imperialism, you name it. The numeric keyboard around here has a comma instead of the decimal point. Just guess whether it's nice to work on spreadsheets where you have to use a decimal point just because the spreadsheet goes to company correspondents overseas. Folks are all eager about locales, until they get their hands dirty.

IMHO, it actually is an insoluble problem -- at least as far as a *programming language* is concerned.
Mar 01 2009
Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
Georg Wrede wrote:
 Andrei Alexandrescu wrote:
 Sooner or later that will need to be defined. I know next to nothing 
 about locales. (I know I dislike the design C++ uses.)

D uses UTF-8, and that is *good enough*! This lets my programs "understand" Finnish, and doesn't give me undue headaches. Seriously tending to locale issues would be an *endless swamp*. Just for this, I looked up something suitable to read:

http://www.manpagez.com/man/1/perllocale/

It may even be that you would find the time, but think about Walter and us, please. There *really are* other things to do. An excellent string hierarchy without the entire rest of i18n is only going to look like a Ferrari with a Trabant engine. Which is worse than nothing at all.

Besides, there's more to this than just designing the perfect, or even a good, locale system in a language. *Somebody should actually use it*.

Now, the non-English programmer, what does he really want? He wants to be able to type stuff into his program in his native character set. D already does that, by way of UTF-8. What else? Well, it is conceivable that he wants his program to print dates and times the way it's done over there. He simply writes the program "by hand" so it does dates and times like he wants. Even if there was a locale thing in the language, he wouldn't bother with the hassle. And he couldn't care less about Urdu.

The hypothetical Ambitious Programmer might want to use locale. He could then have the dates and times (and currencies, etc.) follow the country. Now, that might sound commendable, but in practice it *crumbles*. He can't possibly know how to deal with languages that are written backwards, languages where several characters make one letter, exotic ways of writing dates, etc. So, his fancy i18n project is doomed to be, at most, as usable as the "normal" D program. Probably less, since his decisions will actually worsen the user experience -- for users in another culture.

And any project big enough to tackle this will implement its own locale handling anyway. I'm sorry to say.

----

Yes, locales are nice and all. For D 3.5, that is. Honestly.

If you don't use it, you don't use it; but please don't ruin it for the sake of those of us who will.

* I will use it (go Andrei!)
* People who have to muck with spreadsheet libraries might use it
* People who write spreadsheet libraries might use it

Wish I had some good ideas for Andrei, but I can't say as I do.
Mar 01 2009
Christopher Wright <dhasenan gmail.com> writes:
Georg Wrede wrote:
 The hypothetical Ambitious Programmer might want to use locale. He could 
 then have the dates and times (and currencies, etc.) follow the country. 
 Now, that might sound commendable, but in practice it *crumbles*.
 He can't possibly know how to deal with languages that are written 
 backwards, languages where several characters make one letter, exotic 
 ways of writing dates, etc.

*cough*tango.time*cough*
Mar 02 2009
Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Mon, Mar 2, 2009 at 1:52 PM, Georg Wrede <georg.wrede iki.fi> wrote:
 My take:

  * This is still a moving target
  * Using this is a major hassle for the programmer
  * With D2 itself a moving target, nobody is going to invest enough time in this to actually use it for something worthwhile in the next 6 to 12 months anyway
  * This is more application level stuff than language level stuff
  * Doing this now will steal time from you, Walter, and many of us, both directly, and indirectly by leeching bandwidth in the newsgroup -- time that should be spent on more urgent or more important things, or even documentation
  * If it's so easy to do, then why not do it a week before the release of final D2

I agree entirely. Localization and internationalization seem like things that should be at a much higher level than a standard library. Everyone's going to want to do it differently. Providing a thin, cross-platform wrapper over what the OS exposes is fine, but creating a proper i18n/l10n framework is a huge project in and of itself (I think the 140MB Java package makes that abundantly clear). I'd much rather see a rewritten std.stream and proper Unicode support in std.string (support for types other than string, functions for indexing and slicing on character boundaries) before this.
Mar 02 2009
Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Mon, Mar 2, 2009 at 3:48 PM, Walter Bright
<newshound1 digitalmars.com> wrote:
 Jarrett Billingsley wrote:
 functions for
 indexing and slicing on character boundaries) before this.

 These already exist in std.uni.

It's std.utf, but good to know.
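For readers wondering what "indexing and slicing on character boundaries" looks like with `std.utf`, here is a minimal sketch against present-day Phobos. It is an assumption that these exact functions (`count`, `toUTFindex`, `stride`) match what the 2009 D2 Phobos offered; the thread only names the module. The sample string is arbitrary: each "ä" takes two bytes in UTF-8, so byte offsets and character offsets diverge.

```d
import std.stdio : writeln;
import std.utf : count, stride, toUTFindex;

void main()
{
    // 12 code points; the five 'ä' letters are 2 bytes each in UTF-8.
    string s = "Hyvää päivää";

    // .length counts UTF-8 code units (bytes), not characters.
    assert(s.length == 17);
    // std.utf.count counts code points.
    assert(count(s) == 12);

    // Translate a code-point index into a code-unit index, so the slice
    // never cuts a multi-byte sequence in half. Code point 6 is 'p'.
    size_t start = toUTFindex(s, 6);
    assert(start == 8);
    string word = s[start .. $];
    assert(word == "päivää");

    // stride gives the byte length of the code point starting at a byte index.
    assert(stride(s, 3) == 2); // the first 'ä' starts at byte 3

    writeln(word);
}
```

Slicing `s[6 .. $]` directly would instead start at a byte offset, landing inside the second "ä" and producing an invalid UTF-8 string.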
Mar 02 2009