
digitalmars.D - Planning to migrate SDWF to Unicode

Those who've been following SDWF will by now have realised that it abuses char[] for ANSI strings, whereas D strings are meant to be in Unicode.  It's high time I did something about this.

When I started on it, I was still using Windows 98, which has very limited Unicode support.  But that was years ago now.  And it must be coming on 7 years now since MS discontinued support for it.  So I might as well drop Windows 9x support, just as 16-bit support was dropped with the creation of D (which was only 4 years after Windows 95, after all).

As such, I plan to change SDWF to work in Unicode.  Probably using UTF-16 internally, but possibly giving the programmer the choice between UTF-8 and UTF-16.

But this raises the question of what to do with the existing char-based API.  Possibilities I've thought of:

(a) Just get rid of it.  Programmers upgrading to the new SDWF version will be forced to change instances of char to wchar; what more there is to do depends on what else the program does with character/string data.

(b) Keep functions that take a char or char[] parameter, make them convert from ANSI to UTF-16, but deprecate them.  Thinking about it now, there are problems:
- In order to have versions of each function that return an ANSI string and that return a Unicode string, I would need to name them differently, which could get ugly.
- When returning ANSI, what would happen to characters outside the code page?
- Mixing ANSI and Unicode could also have adverse effects on the interpretation of string literals.
So maybe this isn't a good plan at all.

(c) Use versioning to give the programmer the choice of an ANSI API or a UTF-16 API, rather like the WindowsAPI bindings themselves.

(d) Change char functions to use UTF-8.  This would break any code that relies on the characters being ANSI, or even that manipulates text on a one-character-per-byte basis.  As with (c), versioning could be used to give a choice between UTF-8 and UTF-16.
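
To illustrate the breakage mentioned in (d), here is a minimal sketch of how the same text measures differently once char[] holds UTF-8 rather than ANSI.  It is written against D2's Phobos (std.utf and std.range), which is an assumption since SDWF currently targets D1, and codePointCount is a hypothetical helper, not part of SDWF.

```d
import std.range : walkLength;
import std.utf : toUTF16;

/// Hypothetical helper: number of code points in a UTF-8 string.
size_t codePointCount(string s)
{
    return s.walkLength;   // decodes UTF-8 while walking the string
}

void main()
{
    string s = "na\u00EFve";   // U+00EF takes two bytes in UTF-8
    wstring w = toUTF16(s);    // and a single code unit in UTF-16

    assert(s.length == 6);             // UTF-8 code units (bytes)
    assert(w.length == 5);             // UTF-16 code units
    assert(codePointCount(s) == 5);    // actual characters

    // Indexing yields a code unit, here only the first byte of U+00EF.
    assert(s[2] == 0xC3);
}
```

Code that assumes s.length is the character count, or that s[i] is a whole character, would silently go wrong here for anything outside ASCII.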


If path (b) or (c) is taken, the ANSI API could later be removed.  Once this is done, or if path (a) is taken, we could add UTF-8 support, thereby ending up at (d).
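
For what it's worth, the versioning choice between UTF-8 and UTF-16 might be sketched roughly like this.  Everything named here is hypothetical: SDWF_UTF8, SdwfChar, SdwfString and toSdwf are made up for illustration and don't exist in SDWF, and the code assumes D2 with std.conv.

```d
import std.conv : to;

// Hypothetical version identifier, selected with: dmd -version=SDWF_UTF8
version (SDWF_UTF8)
    alias SdwfChar = char;     // UTF-8 API
else
    alias SdwfChar = wchar;    // default: UTF-16, as the Windows W functions use

alias SdwfString = immutable(SdwfChar)[];

/// Hypothetical helper: converts any D string type to the chosen encoding.
SdwfString toSdwf(S)(S text)
{
    return to!SdwfString(text);
}

void main()
{
    SdwfString title = toSdwf("Caf\u00E9");

    version (SDWF_UTF8)
        assert(title.length == 5);   // U+00E9 is two UTF-8 code units
    else
        assert(title.length == 4);   // one UTF-16 code unit per character here
}
```

The library would then declare its functions in terms of SdwfString, so that client code picks the encoding once at compile time rather than per call.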

It's early days yet, but the thread I started a few hours ago ("D1, D2 and the future of libraries") could still lead to my migrating SDWF to D2.  If it does, I'll likely combine the migration to Unicode with this.

Thoughts?

Stewart.
Jan 21 2012