digitalmars.D - Planning to migrate SDWF to Unicode
- Stewart Gordon <smjg_1998 yahoo.com> Jan 21 2012
Those who've been following SDWF will by now have realised that it abuses char for ANSI strings, whereas D strings are meant to be in Unicode. It's high time I did something about this.

When I started on it, I was still using Windows 98, which has very limited Unicode support. But that was years ago now, and it must be coming on 7 years since MS discontinued support for it. So I might as well drop Windows 9x support, just as 16-bit support was dropped with the creation of D (which was only 4 years after Windows 95, after all).

As such, I plan to change SDWF to work in Unicode, probably using UTF-16 internally, but possibly giving the programmer the choice between UTF-8 and UTF-16.

But this raises the question of what to do with the existing char-based API. Possibilities I've thought of:

(a) Just get rid of it. Programmers upgrading to the new SDWF version will be forced to change instances of char to wchar; what more there is to do depends on what else the program does with character/string data.

(b) Keep functions that take a char or char[] parameter, make them convert from ANSI to UTF-16, but deprecate them. Thinking about it now, there are problems:
- In order to have versions of each function that return an ANSI string and that return a Unicode string, I would need to name them differently, which could get ugly.
- When returning ANSI, what would happen to characters outside the code page?
- Mixing ANSI and Unicode could also have adverse effects on the interpretation of string literals.
So maybe this isn't a good plan at all.

(c) Use versioning to give the programmer the choice of an ANSI API or a UTF-16 API, rather like the WindowsAPI bindings themselves.

(d) Change char functions to use UTF-8. This would break any code that relies on the characters being ANSI, or even that manipulates text on a one character, one byte basis. As with (c), versioning could be used to give a choice between UTF-8 and UTF-16.
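To make the versioning idea in (c) and (d) concrete, here is a minimal sketch in D2 of how a version identifier could select the string type of the API. The names SDWF_UTF8, SdwfString and toInternal are invented for illustration and are not part of SDWF; only std.utf.toUTF16 is a real Phobos function. It also shows why code that assumes one char per character would break under (d).

```d
import std.utf : toUTF16;

// SDWF_UTF8, SdwfString and toInternal are hypothetical names for this sketch.
version (SDWF_UTF8)
    alias SdwfString = string;   // API takes UTF-8; build with -version=SDWF_UTF8
else
    alias SdwfString = wstring;  // API takes UTF-16 (the default in this sketch)

// Convert API text to the UTF-16 form that would be used internally.
wstring toInternal(SdwfString text)
{
    return toUTF16(text);
}

void main()
{
    // "naïve": the ï is one code unit in UTF-16 but two in UTF-8.
    string  utf8  = "na\u00EFve";
    wstring utf16 = "na\u00EFve"w;

    assert(utf8.length == 6);   // one char per character no longer holds
    assert(utf16.length == 5);
    assert(toUTF16(utf8) == utf16);

    // An untyped string literal converts to whichever type the alias names.
    SdwfString title = "na\u00EFve";
    assert(toInternal(title) == utf16);
}
```

With this arrangement the caller's code stays the same under either setting as long as it sticks to the alias, but anything that indexes or slices the text by code unit still has to know which encoding is in force.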
If path (b) or (c) is taken, the ANSI API could later be removed. Once this is done, or if path (a) is taken, we could add UTF-8 support, thereby ending up at (d).

It's early days yet, but the thread I started a few hours ago ("D1, D2 and the future of libraries") could still lead to my migrating SDWF to D2. If it does, I'll likely combine the migration to Unicode with this.

Thoughts?

Stewart.