digitalmars.D - Character set conversions


Adam D. Ruppe <destructionator gmail.com> writes:

I've encountered some problems with other charsets recently. Phobos has
a std.encoding that can do some useful stuff, but there's some
encodings I've seen in the wild that it can't handle (indeed, it's
a fairly short list that it does support)

I used gnu iconv for one of my projects and it works for me, but
I wonder:

Is anyone planning to add more charset support to Phobos?
(alternatively, am I missing something already there?)


If no, maybe I'll do a few myself. I've never actually written code
to do this, but it can't be rocket science. I suspect it's more
tedious than anything else.

May 29 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On 2011-05-29 19:21, Adam D. Ruppe wrote:
 I've encountered some problems with other charsets recently. Phobos has
 a std.encoding that can do some useful stuff, but there's some
 encodings I've seen in the wild that it can't handle (indeed, it's
 a fairly short list that it does support)
 
 I used gnu iconv for one of my projects and it works for me, but
 I wonder:
 
 Is anyone planning to add more charset support to Phobos?
 (alternatively, am I missing something already there?)
 
 
 If no, maybe I'll do a few myself. I've never actually written code
 to do this, but it can't be rocket science. I suspect it's more
 tedious than anything else.

Well, generally the idea is that you just use UTF-8, UTF-16, or UTF-32, and 
for the most part, I wouldn't really expect people to be using UTF-16 unless 
they need to interface with Windows system functions which require it. By 
definition, char is supposed to be UTF-8, wchar is supposed to be UTF-16, and 
dchar is supposed to be UTF-32. I don't really think that it's expected that 
you be using any other encodings within your typical D program. Sometimes it 
may be necessary to translate from another encoding to UTF-8, UTF-16, or 
UTF-32 when getting input from somewhere, and sometimes it may be necessary to 
translate to another encoding from UTF-8, UTF-16, or UTF-32 when outputting 
somewhere, but it certainly isn't the norm. It may be that we need better 
support for dealing with those cases, but they should really only be for 
converting on input or output. So, if you want to improve std.encoding to 
handle more charsets, then feel free, but don't expect the rest of Phobos to 
work with anything beyond UTF-8, UTF-16, and UTF-32. It's going to be throwing 
UtfExceptions if you do.

- Jonathan M Davis

May 29 2011

Adam D. Ruppe <destructionator gmail.com> writes:

Jonathan M Davis wrote:
 Sometimes it may be necessary to translate from another encoding to
 UTF-8, UTF-16, or UTF-32 when getting input from somewhere, and
 sometimes it may be necessary to translate to another encoding from
 UTF-8, UTF-16, or UTF-32 when outputting somewhere, but it
 certainly isn't the norm.

Translation is all I want. Internally, everything is utf8 strings,
but sometimes the program is fed files in another encoding and it
needs to handle them too.

May 29 2011

Daniel Gibson <metalcaedes gmail.com> writes:

Am 30.05.2011 05:03, schrieb Adam D. Ruppe:
 Jonathan M Davis wrote:
 Sometimes it may be necessary to translate from another encoding to
 UTF-8, UTF-16, or UTF-32 when getting input from somewhere, and
 sometimes it may be necessary to translate to another encoding from
 UTF-8, UTF-16, or UTF-32 when outputting somewhere, but it
 certainly isn't the norm.

 
 Translation is all I want. Internally, everything is utf8 strings,
 but sometimes the program is fed files in another encoding and it
 needs to handle them too.

Hmm on the one hand iconv already does this for a plethora of
encodings.. on the other hand AFAIK there is no iconv implementation
that could be shipped with Phobos, so if a module for translating
between encodings should become part of Phobos there seems to be no other
way than writing one from scratch :/
(And I think having this in Phobos would make sense)

May 29 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On 2011-05-29 20:03, Adam D. Ruppe wrote:
 Jonathan M Davis wrote:
 Sometimes it may be necessary to translate from another encoding to
 UTF-8, UTF-16, or UTF-32 when getting input from somewhere, and
 sometimes it may be necessary to translate to another encoding from
 UTF-8, UTF-16, or UTF-32 when outputting somewhere, but it
 certainly isn't the norm.

 
 Translation is all I want. Internally, everything is utf8 strings,
 but sometimes the program is fed files in another encoding and it
 needs to handle them too.

Well, likely no one has done it yet because none of the Phobos developers have 
needed it enough to implement it, and no one outside of them has taken the 
time to do so and tried to get it into Phobos. And with everything else there 
is to do, it's the sort of thing that's likely not to get done anytime soon - 
especially with no feature requests or bug reports on the matter. Personally, 
I wasn't even aware that it was an issue. Pure UTF-8 has always worked just 
fine for me. Presumably, you're running into issues with it because you're 
actually using D at work.

So, you can either implement it yourself and create a pull request for it, or 
you can create an enhancement request, and it'll probably get done eventually, 
but with everything else that needs doing, I don't know how quickly it'll get 
done.

- Jonathan M Davis

May 29 2011

Kagamin <spam here.lot> writes:

Jonathan M Davis Wrote:

 especially with no feature requests or bug reports on the matter. Personally, 
 I wasn't even aware that it was an issue. Pure UTF-8 has always worked just 
 fine for me. Presumably, you're running into issues with it because you're 
 actually using D at work.

Maybe it's his cgi lib? :)
The client is free to send requests in any encoding, I suppose.

May 30 2011

Adam D. Ruppe <destructionator gmail.com> writes:

Kagamin wrote:
 May be, it's his cgi lib? :)
 Client is free to send requests in any encoding, I suppose.

In practice, that hasn't been a problem because browsers tend to
send requests in the same encoding as the html you served.

Since the D code always outputs utf8, the browsers all send back utf8
too.


The first problem I had was users can upload csv files, which they
generally make in Excel... which apparently outputs Windows-1252.
Fine for 99% of text, but then someone puts in a curly quote or
an em dash and it throws an invalid utf 8 sequence.

Converting that is easy enough though.
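
For reference, a minimal sketch of that conversion in C. This is not code from the thread: the function name cp1252_to_utf8 and the choice to map the five bytes that Windows-1252 leaves undefined to U+FFFD are assumptions made for illustration. Bytes 0x00-0x7F and 0xA0-0xFF map to the code point with the same value; only 0x80-0x9F need a table.

```c
#include <stddef.h>
#include <stdint.h>

/* Code points for Windows-1252 bytes 0x80..0x9F. 0xFFFD marks the five
   bytes the code page leaves undefined (a choice made for this sketch). */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178
};

/* Converts n Windows-1252 bytes to UTF-8 and NUL-terminates the result.
   dst must have room for 3 * n + 1 bytes (worst case: 3 bytes per input).
   Returns the number of bytes written, not counting the NUL. */
size_t cp1252_to_utf8(const unsigned char *src, size_t n, char *dst)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned cp = src[i];
        if (cp >= 0x80 && cp <= 0x9F)
            cp = cp1252_high[cp - 0x80];

        if (cp < 0x80) {                       /* 1-byte sequence */
            dst[o++] = (char)cp;
        } else if (cp < 0x800) {               /* 2-byte sequence */
            dst[o++] = (char)(0xC0 | (cp >> 6));
            dst[o++] = (char)(0x80 | (cp & 0x3F));
        } else {                               /* 3-byte sequence */
            dst[o++] = (char)(0xE0 | (cp >> 12));
            dst[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    dst[o] = '\0';
    return o;
}
```

Every code point in the table is below U+10000, so no 4-byte sequences are needed.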


Second problem is now I want to fetch and process random websites
on the internet, and they come in a variety of encodings... again,
utf covers a big majority, but not all of them.

May 30 2011

"Jérôme M. Berger" <jeberger free.fr> writes:

Adam D. Ruppe wrote:
 Kagamin wrote:
 May be, it's his cgi lib? :)
 Client is free to send requests in any encoding, I suppose.

 In practice, that hasn't been a problem because browser tend to
 send requests in the same encoding as the html you served.

 Since the D always outputs utf8, the browsers all send back utf8
 too.

 The first problem I had was users can upload csv files, which they
 generally make in Excel... which apparently outputs Windows-1252.
 Fine for 99% of text, but then someone puts in a curly quote or
 an em dash and it throws an invalid utf 8 sequence.

 Converting that is easy enough though.

	Fun fact about Excel generated CSV files: quite apart from encoding
issues, the separator used between cells depends on the locale: for
example, in English locales it uses a comma but in French locales it
uses a semicolon...

	Just thought I'd point it out in case you did not know.

		Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

May 30 2011

"Nick Sabalausky" <a a.a> writes:

"Jérôme M. Berger" <jeberger free.fr> wrote in message 
news:is0m2h$1s32$1 digitalmars.com...
Fun fact about Excel generated CSV files: quite apart from encoding
issues, the separator used between cells depends on the locale: for
example, in English locales it uses a comma but in French locales it
uses a semicolon...

Just thought I'd point it out in case you did not know.

Heh, that's just wonderful: localized file format specs...

May 30 2011

"Simen Kjaeraas" <simen.kjaras gmail.com> writes:

On Mon, 30 May 2011 19:57:32 +0200, Jérôme M. Berger <jeberger free.fr>
wrote:

 	Fun fact about Excel generated CSV files: quite apart from encoding
 issues, the separator used between cells depends on the locale: for
 example, in English locales it uses a comma but in French locales it
 uses a semicolon...

 	Just thought I'd point it out in case you did not know.

Fun? Gods, it's the most horrible idea I've witnessed in computing.
If only they'd call it something other than CSV, at least - Comma Separated
Values separated by semicolons? WTF?
And the fantastic joy of opening one of those abominations in some other
program... *shiver*

-- 
   Simen

May 30 2011

Daniel Gibson <metalcaedes gmail.com> writes:

Am 30.05.2011 22:20, schrieb Simen Kjaeraas:
 On Mon, 30 May 2011 19:57:32 +0200, Jérôme M. Berger <jeberger free.fr>
 wrote:
 
     Fun fact about Excel generated CSV files: quite apart from encoding
 issues, the separator used between cells depends on the locale: for
 example, in English locales it uses a comma but in French locales it
 uses a semicolon...

     Just thought I'd point it out in case you did not know.

 
 Fun? Gods, it's the most horrible idea I've witnessed in computing.
 If only they'd call it something other than CSV, at least - Comma Separated
 Values separated by semicolons? WTF?
 And the fantastic joy of opening one of those abominations in some other
 program... *shiver*
 

CSV in Excel is totally misleading anyway. At least in the German
version, if you want to import a CSV file, the standard separator is
tab, not comma. If you use File->Open this is all you can get;
importing with custom separators is hidden somewhere else IIRC.
(This refers to Office XP, dunno if newer versions are better in this
regard.)

In plain C (at least on Linux) you have fun locale-dependent in/output
as well: printf and scanf are locale dependent, so if you use sprintf
to generate a string you'll write into a file (or fprintf directly) with
one locale, reading it with scanf functions with another locale will fail.
Pretty fucking stupid IMHO.
This was/is(?) a bug in GtkRadiant, a level editor for Quake like games,
which uses printf or something to write the map files. The map compiler
will reject them if decimals use a , instead of a . and stuff like that.
(The workaround is to always use the standard LOCALE, i.e. "LC_ALL=C
gtkradiant" to start it).
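
For a program that has to run under the user's locale but still write portable files, one option is to normalize the decimal separator on output. The sketch below is illustrative only (the name format_double_portable is made up here and this is not GtkRadiant's actual fix): it formats with snprintf and then swaps whatever localeconv() reports as the decimal point for a plain '.'.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Formats v into buf with '.' as the decimal separator, regardless of the
   current LC_NUMERIC setting. Returns 0 on success, -1 if buf is too small
   or formatting fails. */
int format_double_portable(char *buf, size_t size, double v)
{
    int len = snprintf(buf, size, "%.17g", v);
    if (len < 0 || (size_t)len >= size)
        return -1;

    /* The separator printf just used, e.g. "," under de_DE. */
    const char *dp = localeconv()->decimal_point;
    size_t dplen = strlen(dp);

    char *p = strstr(buf, dp);
    if (p != NULL && strcmp(dp, ".") != 0) {
        /* Overwrite the separator with '.' and close any gap left by a
           multi-byte separator. */
        *p = '.';
        memmove(p + 1, p + dplen, strlen(p + dplen) + 1);
    }
    return 0;
}
```

The reading side needs the same care, since scanf and strtod also follow LC_NUMERIC.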


Cheers,
- Daniel

May 30 2011

"Jérôme M. Berger" <jeberger free.fr> writes:

Daniel Gibson wrote:
 In plain C (at least on Linux) you have fun locale-dependent in/output
 as well: printf and scanf are locale dependent, so if you use sprintf
 to generate a string you'll write into a file (or fprintf directly) with
 one locale, reading it with scanf functions with another locale will fail.
 Pretty fucking stupid IMHO.
 This was/is(?) a bug in GtkRadiant, a level editor for Quake like games,
 which uses printf or something to write the map files. The map compiler
 will reject them if decimals use a , instead of a . and stuff like that.
 (The workaround is to always use the standard LOCALE, i.e. "LC_ALL=C
 gtkradiant" to start it).

	Actually, that is the same issue: Excel outputs numbers to CSV in a
locale dependent way (probably using printf), which means that in
some locales the decimal point is a comma, which prevents using it as
a field separator. Braindead of course, and a real pain when you
want to interface with other software.

		Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

May 30 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On 2011-05-30 14:40, Jérôme M. Berger wrote:
 Daniel Gibson wrote:
 In plain C (at least on Linux) you have fun locale-dependent in/output
 as well: printf and scanf are locale dependent, so if you use sprintf
 to generate a string you'll write into a file (or fprintf directly) with
 one locale, reading it with scanf functions with another locale will
 fail. Pretty fucking stupid IMHO.
 This was/is(?) a bug in GtkRadiant, a level editor for Quake like games,
 which uses printf or something to write the map files. The map compiler
 will reject them if decimals use a , instead of a . and stuff like that.
 (The workaround is to always use the standard LOCALE, i.e. "LC_ALL=C
 gtkradiant" to start it).

 	Actually, that is the same issue: Excel outputs numbers to CSV in a
 locale dependent way (probably using printf), which means that in
 some locales the decimal point is a comma, which prevents using it as
 a field separator. Braindead of course, and a real pain when you
 want to interface with other software.

Well, knowing Microsoft, they probably did it with printf (or fprintf or
whatever), not realizing that it had locale issues, but once they figured out,
they wouldn't fix it because that would break backwards compatibility.

- Jonathan M Davis

May 30 2011

Kagamin <spam here.lot> writes:

Daniel Gibson Wrote:

 In plain C (at least on Linux) you have fun locale-dependent in/output
 as well: printf and scanf are locale dependent, so if you use sprintf
 to generate a string you'll write into a file (or fprintf directly) with
 one locale, reading it with scanf functions with another locale will fail.
 Pretty fucking stupid IMHO.

Doesn't the C standard specify the locale to be "C" until you set it explicitly?

May 31 2011

Daniel Gibson <metalcaedes gmail.com> writes:

Am 31.05.2011 09:02, schrieb Kagamin:
 Daniel Gibson Wrote:
 
 In plain C (at least on Linux) you have fun locale-dependent in/output
 as well: printf and scanf are locale dependent, so if you use sprintf
 to generate a string you'll write into a file (or fprintf directly) with
 one locale, reading it with scanf functions with another locale will fail.
 Pretty fucking stupid IMHO.

 
 Doesn't C standard specify the locale to be "C" until you set it explicitly?

At least on Linux it is usually set to whatever you specified on
installation (usually you just say "I want a german/english/whatever
installation" and the installer then sets the locales to de_DE.UTF8 or
whatever).
Applications use these settings to decide the language of their menus etc

May 31 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On 2011-05-30 13:20, Simen Kjaeraas wrote:
 On Mon, 30 May 2011 19:57:32 +0200, Jérôme M. Berger <jeberger free.fr>
 wrote:
 	Fun fact about Excel generated CSV files: quite apart from encoding
 issues, the separator used between cells depends on the locale: for
 example, in English locales it uses a comma but in French locales it
 uses a semicolon...

 	Just thought I'd point it out in case you did not know.

 Fun? Gods, it's the most horrible idea I've witnessed in computing.
 If only they'd call it something other than CSV, at least - Comma Separated
 Values separated by semicolons? WTF?
 And the fantastic joy of opening one of those abominations in some other
 program... *shiver*

Well, then it isn't really CSV anymore. They definitely screwed the French on
that one. Oh, you wanted your supposedly universal format to work with other
programs? Sorry, no can do. But you can keep using Excel! See, no reason to be
unhappy about it. :P

- Jonathan M Davis

May 30 2011

Adam D. Ruppe <destructionator gmail.com> writes:

  Fun fact about Excel generated CSV files: quite apart from encoding
 issues, the separator used between cells depends on the locale: for
 example, in English locales it uses a comma but in French locales it
 uses a semicolon...

Yeah, I've seen the semicolon in the wild before too, though I didn't
know it was a locale thing.

My program solves it by confirming with the user. When you upload a
file, it tries to parse it with a few different assumptions. The
one that looks best is presented back to the user. (Looks best means
it has headings that roughly match what we expect and number of
columns that's more or less consistent).
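
The "consistent number of columns" part of that heuristic could be sketched in C roughly as follows. This is a guess at the general idea, not the program described in this post: the names count_fields and guess_delimiter, the candidate list, and the scoring rule (lines that agree with the first line's field count, times that field count) are all assumptions, and the heading check is left out.

```c
#include <stddef.h>

/* Counts the fields on one line for a given delimiter, ignoring delimiters
   inside double-quoted fields. Advances *pos past the line's newline. */
static size_t count_fields(const char *text, size_t *pos, char delim)
{
    size_t fields = 1;
    int quoted = 0;
    while (text[*pos] != '\0' && text[*pos] != '\n') {
        char c = text[*pos];
        if (c == '"')
            quoted = !quoted;
        else if (c == delim && !quoted)
            fields++;
        (*pos)++;
    }
    if (text[*pos] == '\n')
        (*pos)++;
    return fields;
}

/* Tries ',', ';' and tab on NUL-terminated text and returns the one whose
   lines agree best with the first line's field count. Returns '\0' if no
   candidate splits the first line at all. */
char guess_delimiter(const char *text)
{
    static const char candidates[] = { ',', ';', '\t' };
    char best = '\0';
    size_t best_score = 0;

    for (size_t c = 0; c < sizeof candidates; c++) {
        size_t pos = 0, first = 0, matching = 0;
        while (text[pos] != '\0') {
            size_t fields = count_fields(text, &pos, candidates[c]);
            if (first == 0)
                first = fields;      /* field count of the header line */
            if (fields == first)
                matching++;
        }
        /* A delimiter that never splits the header is not a candidate. */
        size_t score = (first > 1) ? matching * first : 0;
        if (score > best_score) {
            best_score = score;
            best = candidates[c];
        }
    }
    return best;
}
```

With a French-locale export such as "name;price" followed by "widget;1,50", the comma never splits the header line, so the semicolon wins. The result is still only a guess, which is why confirming it with the user makes sense.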

It does charset the same way, actually. First, guess UTF-8. If that
doesn't validate, assume it's Windows-1252 unless told otherwise.
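
The "guess UTF-8 first" step works because text in a legacy 8-bit encoding almost never happens to be well-formed UTF-8 once it contains non-ASCII bytes. A minimal validity check in C might look like this (an illustrative sketch; the name is_valid_utf8 is made up here, and in D the equivalent job is done by std.utf.validate):

```c
#include <stddef.h>

/* Returns 1 if the n bytes at s are well-formed UTF-8, otherwise 0.
   Rejects truncated sequences, stray continuation bytes, overlong forms,
   surrogates and code points above U+10FFFF. */
int is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char b = s[i];
        size_t extra;            /* continuation bytes expected */
        unsigned long cp, min;   /* decoded value, smallest legal value */

        if (b < 0x80) { i++; continue; }
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
        else return 0;           /* stray continuation or invalid lead byte */

        if (extra > n - i - 1)
            return 0;            /* sequence runs past the end of input */

        for (size_t k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;        /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;

        i += extra + 1;
    }
    return 1;
}
```

A Windows-1252 curly quote (byte 0x93) followed by ASCII fails this check immediately, because 0x93 on its own is a continuation byte with no lead byte before it.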

The user then confirms the guesses and organizes the final data
import.


It's worked out pretty well so far aside from unsupported charsets;
the users seem to like it.

May 30 2011

Jacob Carlborg <doob me.com> writes:

On 2011-05-30 19:57, "Jérôme M. Berger" wrote:
 Adam D. Ruppe wrote:
 Kagamin wrote:
 May be, it's his cgi lib? :)
 Client is free to send requests in any encoding, I suppose.

 In practice, that hasn't been a problem because browser tend to
 send requests in the same encoding as the html you served.

 Since the D always outputs utf8, the browsers all send back utf8
 too.


 The first problem I had was users can upload csv files, which they
 generally make in Excel... which apparently outputs Windows-1252.
 Fine for 99% of text, but then someone puts in a curly quote or
 an em dash and it throws an invalid utf 8 sequence.

 Converting that is easy enough though.

 	Fun fact about Excel generated CSV files: quite apart from encoding
 issues, the separator used between cells depends on the locale: for
 example, in English locales it uses a comma but in French locales it
 uses a semicolon...

 	Just thought I'd point it out in case you did not know.

 		Jerome

Yeah, that is a nightmare. I tried SYLK, symbolic link as well, it's 
something like CSV but more advanced, didn't work out that well either. 
I ended up using real Excel documents with the help of the rubygem 
"spreadsheet".

-- 
/Jacob Carlborg

May 30 2011

Sean Kelly <sean invisibleduck.org> writes:

I suggest looking into ICU if you're doing this stuff.  I believe
there's even a wrapper somewhere in the Mango tree on DSource.

On May 29, 2011, at 7:21 PM, Adam D. Ruppe wrote:

 I've encountered some problems with other charsets recently. Phobos has
 a std.encoding that can do some useful stuff, but there's some
 encodings I've seen in the wild that it can't handle (indeed, it's
 a fairly short list that it does support)

 I used gnu iconv for one of my projects and it works for me, but
 I wonder:

 Is anyone planning to add more charset support to Phobos?
 (alternatively, am I missing something already there?)


 If no, maybe I'll do a few myself. I've never actually written code
 to do this, but it can't be rocket science. I suspect it's more
 tedious than anything else.

May 30 2011

Kagamin <spam here.lot> writes:

Adam D. Ruppe Wrote:

 The first problem I had was users can upload csv files, which they
 generally make in Excel... which apparently outputs Windows-1252.

I suppose it's the system ANSI encoding, which is locale-dependent; you can see the
list of ANSI encodings for different locales somewhere in MSDN.

May 31 2011

Kagamin <spam here.lot> writes:

Adam D. Ruppe Wrote:

 The first problem I had was users can upload csv files, which they
 generally make in Excel... which apparently outputs Windows-1252.
 Fine for 99% of text, but then someone puts in a curly quote or
 an em dash and it throws an invalid utf 8 sequence.

The client usually sends information about its locale; from this info you can
infer the ANSI encoding.

May 31 2011

Kagamin <spam here.lot> writes:

Daniel Gibson Wrote:

 Am 31.05.2011 09:02, schrieb Kagamin:
 Daniel Gibson Wrote:
 
 In plain C (at least on Linux) you have fun locale-dependent in/output
 as well: printf and scanf are locale dependent, so if you use sprintf
 to generate a string you'll write into a file (or fprintf directly) with
 one locale, reading it with scanf functions with another locale will fail.
 Pretty fucking stupid IMHO.

 
 Doesn't C standard specify the locale to be "C" until you set it explicitly?

 
 At least on Linux it is usually set to whatever you specified on
 installation (usually you just say "I want a german/english/whatever
 installation" and the installer then sets the locales to de_DE.UTF8 or
 whatever).
 Applications use these settings to decide the language of their menus etc

according to N1425
7.11.1.1
4. At program startup, the equivalent of
setlocale(LC_ALL, "C");
is executed.

Fun fact is MS conforms with this specification.
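
The two observations fit together: a C program starts in the "C" locale and only picks up the user's settings if it asks for them with setlocale(LC_ALL, ""), which localized applications commonly do at startup. A small sketch (the helper names current_locale and adopt_environment_locale are made up for illustration):

```c
#include <locale.h>

/* Returns the name of the locale currently in effect. Passing NULL makes
   setlocale a pure query. At program startup this is "C". */
const char *current_locale(void)
{
    return setlocale(LC_ALL, NULL);
}

/* Switches to the locale named by the environment (LANG, LC_ALL, ...).
   Returns the new locale name, or NULL if the request cannot be honored.
   After this call printf and scanf follow the user's conventions. */
const char *adopt_environment_locale(void)
{
    return setlocale(LC_ALL, "");
}
```

A program that never makes the second call keeps '.' as its decimal point no matter what locale the system was installed with.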

May 31 2011

Daniel Gibson <metalcaedes gmail.com> writes:

Am 31.05.2011 09:12, schrieb Kagamin:
 Daniel Gibson Wrote:
 
 Am 31.05.2011 09:02, schrieb Kagamin:
 Daniel Gibson Wrote:

 In plain C (at least on Linux) you have fun locale-dependent in/output
 as well: printf and scanf are locale dependent, so if you use sprintf
 to generate a string you'll write into a file (or fprintf directly) with
 one locale, reading it with scanf functions with another locale will fail.
 Pretty fucking stupid IMHO.

 Doesn't C standard specify the locale to be "C" until you set it explicitly?

 At least on Linux it is usually set to whatever you specified on
 installation (usually you just say "I want a german/english/whatever
 installation" and the installer then sets the locales to de_DE.UTF8 or
 whatever).
 Applications use these settings to decide the language of their menus etc

 
 according to N1425
 7.11.1.1
 4. At program startup, the equivalent of
 setlocale(LC_ALL, "C");
 is executed.
 
 Fun fact is MS conforms with this specification.

So they break it deliberately in Excel? Smart.

May 31 2011

Kagamin <spam here.lot> writes:

Daniel Gibson Wrote:

 according to N1425
 7.11.1.1
 4. At program startup, the equivalent of
 setlocale(LC_ALL, "C");
 is executed.
 
 Fun fact is MS conforms with this specification.

 
 So they break it deliberately in Excel? Smart.

Excel deliberately localizes data presented to the user. Wouldn't it be strange
for the user to work with the C locale (Excel users aren't programmers)? It even
translates builtin function names :)
As this presentation is quite customizable, I doubt it's done by the C runtime. I
think it just gets string values from cells during CSV generation.

May 31 2011

Daniel Gibson <metalcaedes gmail.com> writes:

Am 31.05.2011 13:12, schrieb Kagamin:
 Daniel Gibson Wrote:
 
 according to N1425
 7.11.1.1
 4. At program startup, the equivalent of
 setlocale(LC_ALL, "C");
 is executed.

 Fun fact is MS conforms with this specification.

 So they break it deliberately in Excel? Smart.

 
 Excel deliberately localizes data presented to the user. Wouldn't it be
strange for user to work with C locale (Excel users aren't programmers)? It
even translates builtin function names :)

I'm not talking about representing the values on the screen - I'm
talking about the format of CSV files.
And I find translated function names pretty strange.. I'm wondering how
well that works when opening a file with another locale etc.

 As this presentation is quite customizable, I doubt it's done by c runtime. I
think it just gets string values from cells during CSV generation.

May 31 2011

Kagamin <spam here.lot> writes:

Daniel Gibson Wrote:

 I'm not talking about representing the values on the screen - I'm
 talking about the format of CSV files.
 And I find translated function names pretty strange.. I'm wondering how
 well that works when opening a file with another locale etc.

Isn't it natural to get string from a cell and put it into the output CSV
stream? UI also gets string from a cell and presents it.

 As this presentation is quite customizable, I doubt it's done by c runtime. I
think it just gets string values from cells during CSV generation.

May 31 2011

Daniel Gibson <metalcaedes gmail.com> writes:

Am 31.05.2011 14:54, schrieb Kagamin:
 Daniel Gibson Wrote:
 
 I'm not talking about representing the values on the screen - I'm
 talking about the format of CSV files.
 And I find translated function names pretty strange.. I'm wondering how
 well that works when opening a file with another locale etc.

 
 Isn't it natural to get string from a cell and put it into the output CSV
stream? UI also gets string from a cell and presents it.

It's natural to have an internal representation of the value of a field
and different representations in the UI (this could be dependent on
locale settings etc) and for saving it on the disk (this should really
not depend on a locale, but be portable). Ideally the different
representations can be converted to each other in a lossless way.

So the representation on the disk (as CSV, .xls, XML, whatever) doesn't
have to match the screen representation.

 
 As this presentation is quite customizable, I doubt it's done by c runtime. I
think it just gets string values from cells during CSV generation.

May 31 2011

Kagamin <spam here.lot> writes:

Daniel Gibson Wrote:

 And I find translated function names pretty strange.. I'm wondering how
 well that works when opening a file with another locale etc.

I've checked Excel 2007; it seems to store (in xlsx) numbers and function
names in a locale-independent form. I don't know how it works in older versions.

May 31 2011

Daniel Gibson <metalcaedes gmail.com> writes:

Am 31.05.2011 15:02, schrieb Kagamin:
 Daniel Gibson Wrote:
 
 And I find translated function names pretty strange.. I'm wondering how
 well that works when opening a file with another locale etc.

 
 I've checked excel 2007, seems like it stores (in xlsx) numbers and function
names in locale independent form. Don't know, how it works in older versions.

Ok. Everything else would be a really unusable mess, especially for
international companies.

May 31 2011
