digitalmars.D - iconv

Ben Hinkle <bhinkle4 juno.com> writes:
I just stumbled across the iconv routines in POSIX to convert between
charsets. I think they look promising since they appear to be lighter weight
than ICU - not that I object to having the ICU interface around. Does
anyone know of a free win32 port that could be used for D projects?
Here's an example of how to use iconv in D on Linux:

module iconv;

// converter datatype
typedef void *iconv_t;

// allocate a converter between charsets fromcode and tocode
extern (C) iconv_t iconv_open (char *tocode, char *fromcode);

// convert inbuf to outbuf and set inbytesleft to unused input and
// outbuf to unused output and return number of non-reversible
// conversions or -1 on error.
extern (C) size_t iconv (iconv_t cd, void **inbuf,
                         size_t *inbytesleft,
                         void **outbuf,
                         size_t *outbytesleft);

// close converter
extern (C) int iconv_close (iconv_t cd);

private import std.stdio;

int main() {
  iconv_t cd = iconv_open("UTF-16LE","UTF-8");

  char[] str = "this is a test";
  void* inp = str;
  size_t in_len = str.length;

  wchar[256] outstr; // some giant buffer
  void* outp=outstr;
  size_t out_len = outstr.length;

  size_t res = iconv(cd,&inp,&in_len,&outp,&out_len);

  writefln(outstr[0..str.length]);
  iconv_close(cd);
  return 0;
}
Nov 28 2004
Anders F Björklund <afb algonet.se> writes:
Ben Hinkle wrote:

 I just stumbled across the iconv routines in POSIX to convert between
 charsets. I think they look promising since it appears to be lighter weight
 than ICU - not that I object to having the ICU interface around. Does
 anyone know of a free win32 port that could be used for D projects?
I believe http://gettext.sourceforge.net/ has some Win32 stuff:
 The official GNU gettext 0.13+ and libiconv-1.9+ are win32 ready
 out-of-the-box, and the GNU releases now include "woe32" binaries
--anders
Nov 28 2004
"Kris" <fu bar.com> writes:
Hey Ben,

I've also been looking at iconv, for the same reasons ... if you don't need
all the other ICU stuff (which is really great, btw) then iconv offers a
reasonable alternative for character-conversion only. Was intending to wrap
it with a D class, and then hook it up to Mango.io via an adapter. For those
who care, iconv is GPL'd.

Here's a link to a page where win32 DLLs are made available:
http://gettext.sourceforge.net/

Also, your example looks like it perhaps has a typo: should that say
"outstr[0..out_len]" instead of "outstr[0..str.length]" ?

If you do this, I'd certainly like to leverage it!

- Kris

Nov 28 2004
Ben Hinkle <bhinkle4 juno.com> writes:
Kris wrote:

 Hey Ben,
 
 I've also been looking at iconv, for the same reasons ... if you don't
 need all the other ICU stuff (which is really great, btw) then iconv
 offers a reasonable alternative for character-conversion only. Was
 intending to wrap it with a D class, and then hook it up to Mango.io via
 an adapter. For those who care, iconv is GPL'd.

 Here's a link to a page where win32 DLLs are made available:
 http://gettext.sourceforge.net/

OK, thanks for the link. I've found libiconv, too: http://www.gnu.org/software/libiconv/

 Also, your example looks like it perhaps has a typo: should that say
 "outstr[0..out_len]" instead of "outstr[0..str.length]" ?

I could be wrong, but I think out_len is decremented for each byte used up from the output buffer. I was printing out_len and it had values in the 200s or so after iconv returned. The iconv API is a tad weird :-)

 If you do this, I'd certainly like to leverage it!

Cool - I'll keep goofing around with it.

 - Kris
Nov 28 2004
"Kris" <fu bar.com> writes:
"Ben Hinkle" <bhinkle4 juno.com> wrote in message
news:codb2u$dp2$1 digitaldaemon.com...
| I could be wrong but I think out_len is decremented for each byte used up
| from the output buffer. I was printing out_len and it had values in the
| 200's or so after iconv returned. The iconv API is a tad wierd :-)

I followed your link to the documentation, and 'out_len' is indeed the number
of bytes left in the output buffer. My mistake. So would
outstr[0 .. outstr.length - out_len] be correct?
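
A minimal sketch of that slice, written in present-day D rather than the 2004 dialect used in the original post, and assuming glibc's iconv on a little-endian Linux host (other platforms may need to link a separate libiconv). The helper name utf8ToUtf16 is made up for this illustration, and the binding uses char** as in the POSIX prototype where the original post used void**. Because iconv counts both buffers in bytes, the leftover byte count has to be converted back into a number of wchar elements before slicing:

```d
import std.stdio : writeln;

// Hand-written bindings in the spirit of the original post (POSIX iconv).
alias iconv_t = void*;
extern (C) nothrow @nogc
{
    iconv_t iconv_open(const(char)* tocode, const(char)* fromcode);
    size_t iconv(iconv_t cd, char** inbuf, size_t* inbytesleft,
                 char** outbuf, size_t* outbytesleft);
    int iconv_close(iconv_t cd);
}

// Hypothetical helper: convert UTF-8 to UTF-16LE through a fixed buffer.
wstring utf8ToUtf16(const(char)[] src)
{
    iconv_t cd = iconv_open("UTF-16LE", "UTF-8");
    if (cd == cast(iconv_t) size_t.max)          // (iconv_t)-1 signals failure
        throw new Exception("iconv_open failed");
    scope (exit) iconv_close(cd);

    wchar[256] buf;
    char* inp = cast(char*) src.ptr;             // iconv does not modify the input
    size_t inLeft = src.length;                  // bytes of input remaining
    char* outp = cast(char*) buf.ptr;
    size_t outLeft = buf.length * wchar.sizeof;  // capacity in BYTES, not wchars

    if (iconv(cd, &inp, &inLeft, &outp, &outLeft) == size_t.max)
        throw new Exception("iconv failed");

    // outLeft now holds the unused bytes; turn bytes written into wchar count.
    size_t bytesWritten = buf.length * wchar.sizeof - outLeft;
    return buf[0 .. bytesWritten / wchar.sizeof].idup;
}

void main()
{
    writeln(utf8ToUtf16("this is a test"));
}
```

So the slice needs both corrections: the capacity passed in must be outstr.length * wchar.sizeof, and the leftover must be divided by wchar.sizeof. With out_len initialised to outstr.length, as in the original example, iconv is told the buffer holds 256 bytes, and the 14-character test string uses 28 of them, which matches the leftover values "in the 200s" reported above.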

The 64 million dollar question is this: when iconv fills the output buffer to
the point where there's no room for more, does it convey an accurate count
for the bytes consumed thus far from the input? I would imagine so, but
there are some niggling edge conditions there, related to stateful,
streaming conversions.

Will look forward to your success with this lib.
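
On the buffer-full question: POSIX specifies that when the output runs out of room, iconv returns (size_t)-1 with errno set to E2BIG, stops just before the input that would overflow, and leaves inbuf and inbytesleft describing exactly the unconverted remainder. Any shift state is kept in the conversion descriptor between calls, so the call can simply be repeated with a fresh output buffer. The sketch below illustrates this in present-day D, assuming glibc's iconv on a little-endian Linux host; the helper name convertChunked and the deliberately tiny 4-element chunk are made up for the illustration:

```d
import core.stdc.errno : errno, E2BIG;
import std.stdio : writeln;

// Hand-written bindings in the spirit of the original post (POSIX iconv).
alias iconv_t = void*;
extern (C) nothrow @nogc
{
    iconv_t iconv_open(const(char)* tocode, const(char)* fromcode);
    size_t iconv(iconv_t cd, char** inbuf, size_t* inbytesleft,
                 char** outbuf, size_t* outbytesleft);
    int iconv_close(iconv_t cd);
}

// Hypothetical helper: convert UTF-8 to UTF-16LE a small chunk at a time.
wstring convertChunked(const(char)[] src)
{
    iconv_t cd = iconv_open("UTF-16LE", "UTF-8");
    if (cd == cast(iconv_t) size_t.max)
        throw new Exception("iconv_open failed");
    scope (exit) iconv_close(cd);

    wchar[] result;
    wchar[4] chunk;                       // tiny on purpose, to force E2BIG
    char* inp = cast(char*) src.ptr;
    size_t inLeft = src.length;

    while (inLeft > 0)
    {
        char* outp = cast(char*) chunk.ptr;
        size_t outLeft = chunk.length * wchar.sizeof;

        size_t res = iconv(cd, &inp, &inLeft, &outp, &outLeft);
        immutable err = errno;            // capture before anything else runs

        // Keep whatever was produced, even when the call stopped early.
        size_t produced = (chunk.length * wchar.sizeof - outLeft) / wchar.sizeof;
        result ~= chunk[0 .. produced];

        // E2BIG only means "output full": inp/inLeft already point at the
        // remaining input, so looping again continues where iconv stopped.
        if (res == size_t.max && err != E2BIG)
            throw new Exception("iconv failed");
    }
    return result.idup;
}

void main()
{
    writeln(convertChunked("héllo wörld, iconv"));
}
```

Other errors (EILSEQ for invalid input, EINVAL for an incomplete sequence at the end of the input) are treated as fatal here; a streaming wrapper would need to carry the incomplete tail over to the next read instead.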
Nov 28 2004
Anders F Björklund <afb algonet.se> writes:
Kris wrote:

 For those who care, iconv is GPL'd.

GNU libiconv 1.9.1 is released under the *LGPL*, not the GPL...

There is a difference. (http://www.gnu.org/copyleft/lesser.html)

--anders
Nov 28 2004
"Kris" <fu bar.com> writes:
"Anders F Björklund" <afb algonet.se> wrote ...
| GNU libiconv 1.9.1 is released under the *LGPL*, not the GPL...
|
| There is a difference. (http://www.gnu.org/copyleft/lesser.html)

There is indeed! Sorry, and thank-you for the correction ... I'm clearly
'distracted' today :-(
Nov 28 2004
"Simon Buchan" <currently no.where> writes:
On Mon, 29 Nov 2004 01:00:23 +0100, Anders F Björklund <afb algonet.se>  
wrote:

 Kris wrote:

 For those who care, iconv is GPL'd.
 GNU libiconv 1.9.1 is released under the *LGPL*, not the GPL...
 There is a difference. (http://www.gnu.org/copyleft/lesser.html)
 --anders

Thought LGPL was the Library GPL, as opposed to Lesser GPL, or is there no difference? (Or worse, there is, but not in the acronym?)
Nov 28 2004
Bill Cox <Bill_member pathlink.com> writes:
In article <opsh7n88qyjccy7t simon.homenet>, Simon Buchan says...

 Thought LGPL was the Library GPL, as opposed to Lesser GPL, or is there no difference? (Or worse, there is, but not in the acronym?)

They're basically the same thing. RMS changed the word 'library' to 'lesser', and also changed the license to allow anyone to convert their copy of the source to the full GPL at any later point without asking the author. This basically reflects his feeling that LGPL'd code is bad for the free-source movement (he dislikes the term 'open-source') in that it helps commercial software companies. AFAIK, there isn't any major difference between the new 'lesser' and the old 'library' license.
Nov 28 2004
"Simon Buchan" <currently no.where> writes:
On Mon, 29 Nov 2004 06:39:26 +0000 (UTC), Bill Cox  
<Bill_member pathlink.com> wrote:

 They're basically the same thing. RMS changed the word 'library' to 'lesser', and also changed the license to allow anyone to convert their copy of the source to the full GPL at any later point without asking the author. This basically reflects his feeling that LGPL'd code is bad for the free-source movement (he dislikes the term 'open-source') in that it helps commercial software companies. AFAIK, there isn't any major difference between the new 'lesser' and the old 'library' license.

Plus, the naming makes you want the full GPL, cause it MUST be better, right? Right? Wrong.

I don't really want to start a religious war here, but RMS seems to take the same stance toward using looser licences that record companies take toward piracy, the whole "lost sale" pile of crud, i.e.:

Theory:
  pirate can't steal -> pirate pays
  developer can't make proprietary with free -> developer makes free
Practice:
  pirate steals something else
  developer makes it by his/her self

Obviously there are differences (like the law, for one :P) but the concept is the same. (RMS would probably freak if he knew he was being compared to IP companies :D)
Nov 28 2004
Anders F Björklund <afb algonet.se> writes:
Simon Buchan wrote:

 Thought LGPL was the Library GPL, as opposed to Lesser GPL, or is
 there no difference? (Or worse, there is, but the not in the acronym?)
It was, but they (GNU) changed the name in "version 2.1" of the license:
 "This license was formerly called the Library GPL, but we changed the name,
 because the old name encouraged people to use this license more often than
 it really ought to be used."

See http://www.gnu.org/licenses/why-not-lgpl.html

--anders
Nov 29 2004