D - Internationalization

reply Juarez Rudsatz <juarez correio.com> writes:
People,

See the following function :

void printValue(int value){
    	printf("The value is :");
    	printf('%d', value);
}

    Internationalization is a big issue these days. One of the
strategies adopted is the use of translation tools, like the GNU gettext
package, which extract strings from source code and generate a new file for
translation. This file is recombined later during compilation.
    This may not be the best way, but it is simple and very common.
    The problem is that not all strings must be translated! And there is no
way of distinguishing "text" from utility strings.
    Why not give double quotes and single quotes different meanings for
this purpose?

double quotes (") : text for translation.
single quotes (') : constants for general use.


    Maybe someone has a nicer idea.

Juarez
Jun 13 2002
parent reply "Walter" <walter digitalmars.com> writes:
That's a great idea, but the ' and " functionality is pretty well settled in
D by now. How about, say, putting a unique comment before the strings to be
translated, like:
    print(/*T*/"this string gets translated");
Jun 13 2002
parent reply Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
Might there be a place for language support of this?  Something like version,
maybe?
    print( language( "EN_US" -> "this string gets translated",
                     "LATIN_PIG" -> "isthay ingsytray etsgay anslatedtray") );
I don't like the inline syntax, though.  Maybe a simple tag on the string:
    print( translate "this string gets translated" );
With a table elsewhere in the source code:
    translate-table "this string gets translated"
    {
        "EN_US" -> "this string gets translated",
        ...
    }

-- The Villagers are Online! villagersonline.com .[ (the fox.(quick,brown)) jumped.over(the dog.lazy) ] .[ (a version.of(English).(precise.more)) is(possible) ] ?[ you want.to(help(develop(it))) ]
Jun 13 2002
parent reply "Alix Pexton" <Alix seven-point-star.co.uk> writes:
>     print( translate "this string gets translated" );
> With a table elsewhere in the source code:
>     translate-table "this string gets translated"
>     {
>         "EN_US" -> "this string gets translated",
>         ...
>     }

I like the syntax, but I think the translation table would be better in a
separate file, to avoid clutter, though that introduces a load of linking
problems, or something... Java's internationalization with resource bundles
works like this, but relies on being able to dynamically load classes,
which, as I seem to remember reading, D can't do. Java's resource bundles
did, however, allow people to write message texts in any language without
altering the program... I'm sure there is a middle ground...

On the whole I think internationalization and localization are better
performed outside of the actual language...

Alix Pexton...
Jun 13 2002
parent reply "Martin M. Pedersen" <mmp www.moeller-pedersen.dk> writes:
Hi,

"Alix Pexton" <Alix seven-point-star.co.uk> wrote in message
news:01c2130b$8818f460$90ac7ad5 jpswm...

> I like the syntax, but I think the translation table would be better in a
> separate file

Absolutely. The best translations are done by experts in spoken languages,
not programming languages. They should never need to see the source code.
Also, using resource files allows you to develop in one language, and let
someone else do the translation - someone who is not entitled to see the
source code, e.g. a customer or system integrator of a commercial product.
Another benefit is that you can ship resource files for multiple languages
with the product and let the end user choose among them. Then you have only
one product. If you choose the language at compile time (or build time), it
will add an extra dimension to the product, making it more costly to
maintain and support.

Another thing to consider is positional parameters like the format
specifiers of printf(). The order of the parameters will vary from language
to language, and well-designed internationalization support will handle
this as well. Windows' FormatMessage() does this, and so does GNU's
printf() - see
http://www.gnu.org/manual/gettext-0.10.35/html_node/gettext_17.html

I'm not sure anything is gained by building support into the D language,
though. A library solution would do just as well. Consider Walter's
proposal:

    > print(/*T*/"this string gets translated");

It could just as well be:

    print(T("this string gets translated"));

It is even shorter, and it could easily support resource files. If it was
supplemented with a GNU style sprintf() without buffer allocation problems,
everything above could be achieved. In D, I would like an sprintf() having
a prototype like this:

    char[] sprintf(char[] format, ...);

> On the whole I think internationalization and localization are better
> performed outside of the actual language...

Agreed :-)

Regards,
Martin M. Pedersen
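
For readers who want to see the library approach concretely, here is a minimal sketch in C of a T() lookup combined with positional format parameters. The catalog layout, the German text, and the helper format_inbox() are assumptions made for illustration, not anything proposed in the thread; the `%1$s` / `%2$d` positional syntax is a POSIX extension to printf (supported by glibc), not part of ISO C.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical catalog entry: source-language key and its translation. */
struct msg { const char *key; const char *text; };

/* Illustrative in-memory catalog; a real one would be loaded from a
   resource file chosen at run time. */
static const struct msg catalog_de[] = {
    { "%1$s has %2$d new messages", "%2$d neue Nachrichten: %1$s" },
};

/* T(): return the translation for key, or key itself when none is found. */
static const char *T(const char *key)
{
    size_t n = sizeof catalog_de / sizeof catalog_de[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(catalog_de[i].key, key) == 0)
            return catalog_de[i].text;
    return key;
}

/* Format a message; the translated format string decides the order in
   which the arguments appear (POSIX positional parameters). */
static int format_inbox(char *buf, size_t cap, const char *user, int count)
{
    return snprintf(buf, cap, T("%1$s has %2$d new messages"), user, count);
}
```

Calling format_inbox(buf, sizeof buf, "anna", 3) with this catalog puts the count first, although the call site passes the user name first. A string with no catalog entry falls back to the source text, so untranslated messages still print.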
Jun 13 2002
next sibling parent reply "Matthew Wilson" <matthew thedjournal.com> writes:
Martin

You are right in it being a big issue, and that probably should not be
solely left to code.


alternate printf syntax, such as

    if(language == 1)
    {
        fmtStr    =    "This {0} is {2} the {3} format {1} text";
    }
    else
    {
        fmtStr    =    "Bonjour {1} nous {0} sommes {3} discarde {2} le cup de monde";    // Spot the linguist!
    }

    Console.WriteLine(fmtStr, arg0, arg1, arg2, arg3);

such that one can handle ordering in combination with resources strings.

Is there any such support in D?

Matthew
Jun 13 2002
next sibling parent "Sean L. Palmer" <seanpalmer earthlink.net> writes:
No doubt.  printf can *so* be improved upon;  I don't see it as the greatest
thing since sliced bread.  For one, many times you just want to format a
string instead of doing actual printing, and for this you use sprintf.  But
sprintf (like printf) is C, and was designed over 20 years ago.  D doesn't
work well with it.  It's not "native" D.

Surely we can do a wee bit better;  something array-based, like the way the
rest of D strings work.  Maybe something with ~ array concatenation syntax
and some kinds of conversion functions that return strings.

Perhaps we can have some standard way to pipe arrays to / from streams.  And
to convert variables to / from arrays of char.

I'm no expert in localization so I'll leave that detail to someone else.

Sean
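
As an illustration of a formatting function that returns the finished string instead of writing into a caller-supplied buffer (the shape asked for earlier in the thread with `char[] sprintf(char[] format, ...)`), here is a minimal C99 sketch. The name format_alloc() is hypothetical; the technique is the standard two-pass use of vsnprintf, first to measure and then to write.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format into a freshly allocated string.
   Returns NULL on failure; the caller frees the result. */
static char *format_alloc(const char *fmt, ...)
{
    va_list ap, ap2;
    va_start(ap, fmt);
    va_copy(ap2, ap);          /* the argument list is consumed twice */

    /* First pass: with a NULL buffer of size 0, vsnprintf only reports
       how many characters the output would need. */
    int len = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);

    char *out = NULL;
    if (len >= 0 && (out = malloc((size_t)len + 1)) != NULL)
        vsnprintf(out, (size_t)len + 1, fmt, ap2);   /* second pass writes */
    va_end(ap2);
    return out;
}
```

For example, format_alloc("The value is: %d", 42) yields a heap string of exactly the right size. In a garbage-collected language with array-based strings, the explicit free would disappear, which is presumably part of the appeal of a native D version.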
Jun 14 2002
prev sibling next sibling parent reply Burton Radons <loth users.sourceforge.net> writes:
Matthew Wilson wrote:
[snip]

> alternate printf syntax, such as
>
>     if(language == 1)
>     {
>         fmtStr    =    "This {0} is {2} the {3} format {1} text";
>     }
>     else
>     {
>         fmtStr    =    "Bonjour {1} nous {0} sommes {3} discarde {2} le cup de monde";    // Spot the linguist!
>     }
>
>     Console.WriteLine(fmtStr, arg0, arg1, arg2, arg3);
>
> such that one can handle ordering in combination with resources strings.
>
> Is there any such support in D?

Since va_list gives no information about the type, size, or number of
parameters, indexed conversion specifiers would have to depend upon some
bizarre format string analysis to figure out what the parameters are, a
prefixed dictionary, as in "ssds:This %s is %:2d the %:3s format %:1s text",
or preprocessing, as in gettext. All three solutions are bug-inducing, and
the first requires that all parameters up to the highest one indexed get
mentioned in the format string.

One stone can kill this bird and many others - I described a solution that
fixes the problem more readily than generic objects (which don't address
other problems that this solution does, such as constructing and destructing
arguments to normal functions) in the "float -> double conversion" thread in
early May. Then, as with Python, most of the time you'd just use %s
regardless of the type.

[snip]
Jun 14 2002
next sibling parent "Matthew Wilson" <dmd synesis.com.au> writes:
Good points.

Far out, I seem to say a great many stupid things in my morning posts. :)
Might have to save it all for the evenings ...

Matthew
Jun 14 2002
prev sibling parent Juarez Rudsatz <juarez correio.co> writes:
 
> > Is there any such support in D?
>
> Since va_list gives no information about the type, size, or number of
> parameters, indexed conversion specifiers would have to depend upon some
> bizarre format string analysis to figure out what the parameters are, a
> prefixed dictionary, as in "ssds:This %s is %:2d the %:3s format %:1s
> text", or preprocessing, as in gettext. All three solutions are
> bug-inducing, and the first requires that all parameters up to the highest
> one indexed get mentioned in the format string.
>
> One stone can kill this bird and many others - I described a solution that
> fixes the problem more readily than generic objects (which don't address
> other problems that this solution does, such as constructing and
> destructing arguments to normal functions) in the "float -> double
> conversion" thread in early May. Then, as with Python, most of the time
> you'd just use %s regardless of the type. [snip]

Yes! There are many problems with message output! But there is also the
need for a standard way of translating a program. I know a program must be
designed with internationalization in mind. But without simple conventions,
each program will have its own form of translation, and the state of D
translation will be the same as that of C. For now, maybe the
resource/constants/replace_token approach could be the best.

I don't understand why giving double quotes and single quotes distinct
formal meanings would be problematic. Is there some reason?
Jun 14 2002
prev sibling parent reply "Walter" <walter digitalmars.com> writes:
"Matthew Wilson" <matthew thedjournal.com> wrote in message
news:aebaa2$1ctn$1 digitaldaemon.com...

> alternate printf syntax, such as
>
>     if(language == 1)
>     {
>         fmtStr    =    "This {0} is {2} the {3} format {1} text";
>     }
>     else
>     {
>         fmtStr    =    "Bonjour {1} nous {0} sommes {3} discarde {2} le cup de monde";    // Spot the linguist!
>     }

The version statement in D can serve to handle things like this.
Jun 18 2002
next sibling parent reply Pavel Minayev <evilone omen.ru> writes:
On Tue, 18 Jun 2002 15:47:54 -0700 "Walter" <walter digitalmars.com> wrote:

> "Matthew Wilson" <matthew thedjournal.com> wrote in message
> news:aebaa2$1ctn$1 digitaldaemon.com...
>
> > alternate printf syntax, such as
> >
> >     if(language == 1)
> >     {
> >         fmtStr    =    "This {0} is {2} the {3} format {1} text";
> >     }
> >     else
> >     {
> >         fmtStr    =    "Bonjour {1} nous {0} sommes {3} discarde {2} le cup de monde";    // Spot the linguist!
> >     }
>
> The version statement in D can serve to handle things like this.

As long as D doesn't have a printing (or formatting) function which allows
arguments to be put in arbitrary order, version() would do nothing.
Jun 19 2002
parent Burton Radons <loth users.sourceforge.net> writes:
Pavel Minayev wrote:
> On Tue, 18 Jun 2002 15:47:54 -0700 "Walter" <walter digitalmars.com> wrote:
>
> > "Matthew Wilson" <matthew thedjournal.com> wrote in message
> > news:aebaa2$1ctn$1 digitaldaemon.com...
> >
> > > alternate printf syntax, such as
> > >
> > >     if(language == 1)
> > >     {
> > >         fmtStr    =    "This {0} is {2} the {3} format {1} text";
> > >     }
> > >     else
> > >     {
> > >         fmtStr    =    "Bonjour {1} nous {0} sommes {3} discarde {2} le cup de monde";    // Spot the linguist!
> > >     }
> >
> > The version statement in D can serve to handle things like this.
>
> As long as D doesn't have a printing (or formatting) function which allows
> arguments to be put in arbitrary order, version() would do nothing.

This could be hacked around. The original string is in normal printf
format, such as: "%s(%08X): Expected data to be in the range (0 <= %d <
%d)". Translations start with "%1(%2): Expected data to be in the range (0
<= %3 < %4)" and can move the arguments around. During translation, you take
the original formatting string and convert it into "%s\0%08X\0%d\0%d",
format the arguments with that, split the result up at the separators, and
then format the translated string with the correct indices using a special
function.
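
The hack described above can be sketched in C roughly as follows. This is one reading of the description, under stated assumptions: the function names, the fixed buffer sizes, and the limit of nine arguments are all invented for the sketch, and the separator is the ASCII unit separator (0x1F) rather than "\0", because a NUL byte inside the format string would end it early.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define SEP '\x1f'   /* assumed never to occur in a formatted argument */
#define MAX_ARGS 9   /* the translated template can name %1..%9 */

/* Copy only the conversion specifications of fmt into specs, joined by
   SEP ("%%" is skipped). Returns the number of specifications found. */
static int extract_specs(const char *fmt, char *specs, size_t cap)
{
    int n = 0;
    size_t o = 0;
    for (const char *p = fmt; *p; p++) {
        if (*p != '%') continue;
        if (p[1] == '%') { p++; continue; }
        const char *q = p + 1;               /* scan to the conversion letter */
        while (*q && !strchr("diouxXeEfgGcsp", *q)) q++;
        if (!*q) break;                      /* malformed tail: stop */
        size_t len = (size_t)(q - p) + 1;
        if (o + len + 2 > cap) break;        /* no room left: stop */
        if (n > 0) specs[o++] = SEP;
        memcpy(specs + o, p, len);
        o += len;
        n++;
        p = q;
    }
    specs[o] = '\0';
    return n;
}

/* Format the arguments as orig dictates, then lay them out as directed by
   translated, where %1..%9 refer to the arguments of orig by position. */
static void format_translated(char *out, size_t cap, const char *orig,
                              const char *translated, ...)
{
    char specs[128], vals[512];
    char *field[MAX_ARGS];
    if (cap == 0) return;
    int n = extract_specs(orig, specs, sizeof specs);

    va_list ap;
    va_start(ap, translated);
    vsnprintf(vals, sizeof vals, specs, ap);  /* one pass formats them all */
    va_end(ap);

    int nf = 0;                               /* split the values at SEP */
    char *s = vals;
    while (nf < n && nf < MAX_ARGS) {
        field[nf++] = s;
        char *e = strchr(s, SEP);
        if (!e) break;
        *e = '\0';
        s = e + 1;
    }

    size_t o = 0;                             /* substitute %N in the template */
    for (const char *p = translated; *p && o + 1 < cap; p++) {
        if (*p == '%' && p[1] >= '1' && p[1] <= '9' && p[1] - '1' < nf) {
            const char *f = field[p[1] - '1'];
            size_t len = strlen(f);
            if (len > cap - 1 - o) len = cap - 1 - o;
            memcpy(out + o, f, len);
            o += len;
            p++;
        } else {
            out[o++] = *p;
        }
    }
    out[o] = '\0';
}
```

The original format string still controls how each argument is rendered (width, padding, base), while the translator only decides where each rendered piece goes. The weak point is the separator: an argument that happens to contain it would be split in the wrong place.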
Jun 19 2002
prev sibling parent reply Juarez Rudsatz <juarez correio.com> writes:
"Walter" <walter digitalmars.com> wrote in
news:aeoeoo$92j$1 digitaldaemon.com:

> "Matthew Wilson" <matthew thedjournal.com> wrote in message
> news:aebaa2$1ctn$1 digitaldaemon.com...
>
> > alternate printf syntax, such as
> >
> >     if(language == 1)
> >     {
> >         fmtStr    =    "This {0} is {2} the {3} format {1} text";
> >     }
> >     else
> >     {
> >         fmtStr    =    "Bonjour {1} nous {0} sommes {3} discarde {2} le cup de monde";    // Spot the linguist!
> >     }
>
> The version statement in D can serve to handle things like this.

Yes. In some cases this could be used. But it breaks some things:

 o Translators shouldn't need to know programming to translate.
 o All strings will be spread across ALL files and not centralized in one.
 o The source is larger and more difficult to read.

A simple program which parses the input D file, replaces the strings with
constants, and puts all the strings in another file would kill the majority
of the problems. Such a program is not too difficult to write, but it needs
a convention saying what is a resource string and what is not translatable.

For a solution like this I have proposed the convention:

    "foo" is a normal string.
    'foo' is a string used in translation.
    ( or the inverse, I don't remember )

Maybe a better solution, using it with versioning, could be:

    module messages;

    version(language == pt_BR){
        import messages_pt_BR;
    }
    else version(language == pt_PT){
        import messages_pt;
    }
    else
        import messages_en;

A standard table of constants for versioning and languages could be
valuable in such cases.
Jun 19 2002
next sibling parent reply Pavel Minayev <evilone omen.ru> writes:
On Wed, 19 Jun 2002 16:47:10 +0000 (UTC) Juarez Rudsatz <juarez correio.com> 
wrote:

> For a solution like this I have proposed the convention:
>
> "foo" is a normal string.
> 'foo' is a string used in translation. ( or the inverse, I don't remember )

Don't forget that D already differentiates single and double quotes: strings
enclosed in the former don't support escape characters.
Jun 19 2002
parent Juarez Rudsatz <juarez nowhere.com> writes:
Pavel Minayev <evilone omen.ru> wrote in
news:CFN37426909090787 news.digitalmars.com: 

 
> Don't forget that D already differentiates single and double quotes:
> strings enclosed in the former don't support escape-characters.

I can't tell which is the better choice. Maybe the second, because the
string can contain newlines and tabs. But then there are problems when
writing characters like "\\".
Jun 19 2002
prev sibling parent "Nic Tiger" <nictiger pt.comcor.ru> writes:
"Juarez Rudsatz" <juarez correio.com> wrote in message
news:Xns92328D38A1172juarezcom 63.105.9.61...
> A simple program which parses the input D file, replaces the strings with
> constants, and puts all the strings in another file would kill the
> majority of the problems. Such a program is not too difficult to write,
> but it needs a convention saying what is a resource string and what is
> not translatable.

I've developed a program of this kind to localize my Russian DOSX program
into Chinese. It treated strings starting with L" as localizable, while
strings starting with " were treated as internal names, filenames and so
on. The utility replaced L"..." strings with elements of wchar_t *Strs[];
and the strings themselves were placed in an external file. I wrote a
special application for editing (localizing) the strings. With it, the
Chinese engineers translated everything from Russian into Chinese
themselves.

The problem which arises when replacing L"..." with Strs[NNN] is
declarations like this:

    wchar_t msg[10] = L"Dummy";

The same applies to "wchar_t [N]"-like members of structs, when
initializing a struct with

    = { ..., L"Dummy", ... };

I had to manually convert these declarations into

    wchar_t msg[10]; wcscpy ( msg, L"Dummy" );

before running the utility.

Nic Tiger.
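
To make the array-initializer problem concrete, here is a small C sketch. The contents of Strs[] and the helper init_msg() are illustrative assumptions; only the names Strs and msg come from the post above. The sketch uses a bounded wcsncpy instead of the wcscpy mentioned in the post, since a translated string may be longer than the original.

```c
#include <wchar.h>

/* Illustrative string table, as an extraction utility might generate it;
   in the setup described above the contents came from an external file. */
static const wchar_t *Strs[] = {
    L"Dummy",      /* Strs[0], originally the literal L"Dummy" */
};

/* Before extraction this was:  wchar_t msg[10] = L"Dummy";
   After extraction, `wchar_t msg[10] = Strs[0];` does not compile, because
   an array can be initialized from a string literal but not from a pointer.
   The declaration therefore has to become a declaration plus a copy. */
static void init_msg(wchar_t msg[10])
{
    wcsncpy(msg, Strs[0], 9);   /* bounded copy into the 10-element array */
    msg[9] = L'\0';             /* guarantee termination for long strings */
}
```

The same rewrite is needed for fixed-size array members of structs, which is why such declarations had to be converted by hand before the utility could run.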
Jun 21 2002
prev sibling parent "Sean L. Palmer" <seanpalmer earthlink.net> writes:
This problem is easily solved by an enum index into an externally linked
array of static constant strings.

The problem with that is maintaining the parallel arrays.

Can the core problem with such arrays be eased at all?

Sean
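
One common way to ease the maintenance of an enum and its parallel string arrays in C is the "X-macro" technique, sketched below. It is offered as one possible answer to the question above, not as something proposed in the thread; the message identifiers and texts are invented for the example.

```c
/* Single list of messages: identifier, English text, Portuguese text.
   Adding a row here extends the enum and every table at once. */
#define MESSAGES(X) \
    X(MSG_DONE,   "Done",   "Pronto")   \
    X(MSG_CANCEL, "Cancel", "Cancelar")

/* Expand the list once into the enum of indices... */
#define AS_ENUM(id, en, pt) id,
enum msg_id { MESSAGES(AS_ENUM) MSG_COUNT };

/* ...and once per language into a string table in the same order. */
#define AS_EN(id, en, pt) en,
#define AS_PT(id, en, pt) pt,
const char *const messages_en[MSG_COUNT] = { MESSAGES(AS_EN) };
const char *const messages_pt[MSG_COUNT] = { MESSAGES(AS_PT) };

/* Look up a message in whichever table was selected at run time. */
const char *message(const char *const *table, enum msg_id id)
{
    return table[id];
}
```

Because the enum and the tables are generated from the same list, they cannot drift out of step. The trade-off is that the translations live in the source again, which runs against the earlier point that translators should not need to see the code; generating the list from external resource files at build time would be one way to reconcile the two.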
Jun 14 2002