digitalmars.D - ICU D Wrapper

"Jake" <asdf asdf.com> writes:
I'm not sure if anyone has noticed, but the D wrapper for C's ICU 
library is far from working it seems. mango.icu is its technical 
name. I've read articles on the forum about how excited people 
were to get ICU usable in D, but whoever made mango.icu hasn't 
made any updates on it or even documented it online.

I'm just letting you all know about this. D seems to already have 
a bunch of ICU's functionality though, so maybe the wrapper died 
on purpose. I really just wanted to use its time zone and date 
features since std.datetime doesn't make it very easy to use the 
TZ database on Windows.
Dec 12 2014
"Trent Forkert" <trentforkert+d gmail.com> writes:
On Friday, 12 December 2014 at 16:51:43 UTC, Jake wrote:
 I'm not sure if anyone has noticed, but the D wrapper for C's 
 ICU library is far from working it seems. mango.icu is its 
 technical name. I've read articles on the forum about how 
 excited people were to get ICU usable in D, but whoever made 
 mango.icu hasn't made any updates on it or even documented it 
 online.

 I'm just letting you all know about this. D seems to already 
 have a bunch of ICU's functionality though, so maybe the 
 wrapper died on purpose. I really just wanted to use its time 
 zone and date features since std.datetime doesn't make it very 
 easy to use the TZ database on Windows.
I've looked into writing a binding for ICU recently, but ultimately decided to abandon that idea in favor of writing a replacement for it in D. The reasons for this are:

* ICU breaks its ABI with every release, meaning a D binding would only work for one version of ICU, and would need pragma(mangle) to have a hope of easy updating. Alternatively, what mango.icu seems to have done is load ICU at runtime in order to figure out which library to bind.
* ICU's data and APIs use UTF-16. I'd rather everything be UTF-8.
* ICU's API is incredibly inconvenient for (if not impossible to access from) D. For example, some of the functionality requires binding C++ classes that use multiple inheritance.
* A decent chunk (though not all) of ICU is actually generated from CLDR, meaning I can do the same.

It looks to me like mango.icu hasn't been updated since ICU v38 (it is up to v54 now), and it made extensive use of wrappers in order to hide the C-API nastiness. It also doesn't support any of the functionality that requires C++.

Binding ICU would be very nice, but this is one of the few cases where I actually think we'd be better off rolling our own. I'm still a little ways off from having my work ready for public release, but I've been making good progress recently. If you can point out which ICU APIs you need, I'll make sure to include equivalent APIs in my library.

- Trent
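
For readers unfamiliar with the pragma(mangle) point above, here is a minimal sketch of the idea rather than code from any actual binding. ICU appends its major version to exported C symbols (for example u_errorName_54 in ICU 54), so a binding has to map a stable D name onto a versioned linker name. The names icuSuffix, icuSymbol and cLength are hypothetical. To keep the sketch runnable without ICU installed, it binds libc's strlen under a different D name and leaves the ICU declaration as a comment.

```d
import std.stdio : writeln;

// Assumption: the ICU major version whose symbols would be linked.
enum icuSuffix = "_54";

// Hypothetical helper: builds the versioned linker name of an ICU function.
string icuSymbol(string name)
{
    return name ~ icuSuffix;
}

// Evaluated at compile time, so the result can feed pragma(mangle).
static assert(icuSymbol("u_errorName") == "u_errorName_54");

extern (C) nothrow @nogc
{
    // Runnable stand-in: the D name cLength is linked against the C symbol
    // "strlen", showing that the D name and the linker name can differ.
    pragma(mangle, "strlen")
    size_t cLength(const(char)* s);

    // The ICU analogue would look like this (requires linking against ICU,
    // so it is left commented out here):
    // pragma(mangle, icuSymbol("u_errorName"))
    // const(char)* u_errorName(int code);
}

void main()
{
    // D string literals are zero-terminated, so they can be passed to C.
    writeln(cLength("ICU"));
}
```

With this approach, moving to a new ICU release would mean changing icuSuffix in one place, but every release still needs its own build of the binding, which is the maintenance burden described above.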
Dec 12 2014
"Sean Kelly" <sean invisibleduck.org> writes:
On Friday, 12 December 2014 at 17:57:41 UTC, Trent Forkert wrote:
 I've looked into writing a binding for ICU recently, but 
 ultimately decided to abandon that idea in favor of writing a 
 replacement for it in D.
Wow... really? You're actually going to write transcoders for all available encodings? Plus the conversion and parsing tools, plus expand our calendar functionality to handle the things it doesn't do now, plus... I mean I'd love it, but the scope of the project can be measured in tens of man-years.
Dec 13 2014
"Trent Forkert" <trentforkert+d gmail.com> writes:
On Saturday, 13 December 2014 at 15:44:59 UTC, Sean Kelly wrote:
 On Friday, 12 December 2014 at 17:57:41 UTC, Trent Forkert 
 wrote:
 I've looked into writing a binding for ICU recently, but 
 ultimately decided to abandon that idea in favor of writing a 
 replacement for it in D.
 Wow... really? You're actually going to write transcoders for all available encodings? Plus the conversion and parsing tools, plus expand our calendar functionality to handle the things it doesn't do now, plus... I mean I'd love it, but the scope of the project can be measured in tens of man-years.
Running down the icu4c API listing:

* Basic Types and Constants - only as needed
* Strings and character iteration - just use D strings, std.string
* Unicode character properties and names - I think std.uni handles this
* Sets of Unicode Code Points and Strings - ditto
* Codepage conversion - ignoring, at least for now. See below.
* Unicode text compression - again, I think std.uni handles this
* Locales - yes
* Resource Bundles - will offer equivalent functionality, just not identical
* Normalization - std.uni
* Calendars - see below
* Date and time formatting - yes
* Message formatting - yes
* Number formatting / spell-out - yes
* Transliteration - yes, but may be delayed until after initial release
* Bidirectional Algorithm - not at first, is this in std.uni?
* Arabic shaping - not at first, is this in std.uni?
* Collation - I'm delaying this until after the initial release to get it out faster
* String searching - depends on Collation
* Index characters - depends on Collation
* Text Boundary analysis - depends on Collation
* Regular Expression - use std.regex
* StringPrep - not initially, is this in std.uni?
* IDNA - not initially, is this in Phobos?
* Identifier spoofing and confusability - not initially
* Layout engine - delayed, looks like ICU is removing this and pointing to another library
* Universal Time Scale - see below
* ICU I/O - use Phobos

There are very few things above that are not possible to generate from CLDR data. Of those, most are RFC-defined algorithms, several of which I believe are already part of Phobos.

If I add codepage conversion, it will likely be in terms of iconv on POSIX and MultiByteToWideChar and friends on Windows. Alternatively, I could "borrow" the IBM CDRA/UCM data the way I'm getting almost everything else from CLDR data.

Support of other calendar systems is up in the air at the moment. I had thought CLDR contained what I needed, but it looks like it might not. It has locale-specific formatting and display info for calendars, and mappings to when other calendars' eras begin in terms of the Gregorian calendar, but I don't see further breakdown of information. So, initially it looks like I'll only be supporting the Gregorian calendar, but I may add the others in the future.

It is a lot of work, yes, but the Unicode Consortium already does a significant chunk of it with CLDR.

- Trent
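
As a rough illustration of two entries in the list above that are delegated to Phobos (normalization via std.uni and regular expressions via std.regex), the following sketch shows what the existing modules already provide. The sample strings are made up for illustration and are not taken from the library under discussion.

```d
import std.regex : matchFirst, regex;
import std.uni : NFC, NFD, normalize;

void main()
{
    // "e" followed by U+0301 COMBINING ACUTE ACCENT composes to the single
    // code point U+00E9 under NFC, and decomposes back again under NFD.
    string decomposed = "e\u0301";
    string composed = normalize!NFC(decomposed);
    assert(composed == "\u00E9");
    assert(normalize!NFD(composed) == decomposed);

    // std.regex in place of ICU's regular expression API:
    // capture the digits that follow a "v".
    auto m = matchFirst("ICU v54", regex(`v(\d+)`));
    assert(!m.empty);
    assert(m[1] == "54");
}
```

Both calls work directly on UTF-8 D strings, so no conversion to UTF-16 is needed for these two features.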
Dec 13 2014
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
12-Dec-2014 19:51, Jake wrote:
 I'm not sure if anyone has noticed, but the D wrapper for C's ICU
 library is far from working it seems. mango.icu is its technical name.
 I've read articles on the forum about how excited people were to get ICU
 usable in D, but whoever made mango.icu hasn't made any updates on it or
 even documented it online.

 I'm just letting you all know about this. D seems to already have a
 bunch of ICU's functionality though, so maybe the wrapper died on
 purpose. I really just wanted to use its time zone and date features
 since std.datetime doesn't make it very easy to use the TZ database on
 Windows.
Well, I collect ideas/enhancements for std.uni, so feel free to list what's missing and what primitives you need for the TZ database.

-- 
Dmitry Olshansky
Dec 12 2014
Jacob Carlborg <doob me.com> writes:
On 2014-12-12 17:51, Jake wrote:
 I'm not sure if anyone has noticed, but the D wrapper for C's ICU
 library is far from working it seems. mango.icu is its technical name.
That library is very old, from the days of D1, and is not maintained anymore. There's also this version, which might be more up to date [1].

[1] https://github.com/d-widget-toolkit/com.ibm.icu

-- 
/Jacob Carlborg
Dec 12 2014