digitalmars.D - Let's schedule WinAPI ASCII functions for deprecation!

Denis Shelomovskij <verylonglogin.reg gmail.com> writes:
Since Win9x isn't supported any more why do we have ASCII WinAPI 
functions in druntime's core.sys.windows.windows (and, possibly, other 
places)?

Reasons against *A functions:
* using any such function is unsafe (with very rare exceptions like
LoadLibraryA("ntdll")), because the inability to encode non-ASCII
characters in the OEM encoding will almost always give results the
programmer cannot predict (simple test: you, reader, do you know what
will happen?);
* in D it's too easy to make a mistake by passing a UTF-8 string pointer
to such a function, because D has no string types other than UTF ones,
and eliminating such functions is the only solution unless an ASCII
string type is created;
* they perform worse, because Windows has to convert the ASCII string to
UTF-16 first.

And yes, druntime already has encoding bugs because of using such functions.
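
A minimal sketch of the first two bullets, written in Python rather than D so that it runs anywhere. cp1251 is only an assumed stand-in for whatever legacy code page a given Windows installation uses, and both function names are hypothetical helpers for this illustration, not WinAPI or druntime calls.

```python
# Sketch: what a code-page-based API sees when it is handed UTF-8 bytes,
# and what is lost when text is narrowed to a legacy code page.

LEGACY_CODE_PAGE = "cp1251"  # assumption: stand-in for the system code page


def seen_by_legacy_api(text: str) -> str:
    """Return the text a code-page-based API would see if given UTF-8 bytes."""
    utf8_bytes = text.encode("utf-8")
    # Each byte is reinterpreted as one character of the legacy code page.
    return utf8_bytes.decode(LEGACY_CODE_PAGE, errors="replace")


def narrowed_to_code_page(text: str) -> bytes:
    """Encode text into the legacy code page; unencodable characters become '?'."""
    return text.encode(LEGACY_CODE_PAGE, errors="replace")
```

Under these assumptions, seen_by_legacy_api("ntdll") comes back unchanged because pure ASCII bytes mean the same thing in both encodings, seen_by_legacy_api("Привет") comes back as twelve unrelated characters, and narrowed_to_code_page("日本語") yields b"???".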

P.S.
Let's finally solve encoding problems that should have been solved 10 years
ago! By the way, Git+TortoiseGit still has encoding problems on Windows,
and it is awful (see its changelog).

-- 
Денис В. Шеломовский
Denis V. Shelomovskij
May 22 2012
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
Yes, let them burn! Burn, burn, burn!

Seriously. For those that are bent on compatibility, *A functions also are:
- security disasters
- limited in more than just one way: 256 max path, and so on and so forth

And last but not least:
- *W were supported on Win98+ Second Edition with an official add-on - Unicode Layer for Windows ;)

Not to mention that the OEM encoding was never supported properly by D.

 P.S.
 Let's finally solve encoding problems that should be solved 10 years
 ago! By the way, Git+TurtoiseGit still has encoding problems on Windows
 and it is awful (see its changelog).

-- Dmitry Olshansky
May 22 2012
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
 P.S.
 Let's finally solve encoding problems that should be solved 10 years
 ago! By the way, Git+TurtoiseGit still has encoding problems on Windows
 and it is awful (see its changelog).


Forgot to mention that my GSOC project has support for legacy encodings as its secondary goal. Check out:

TODOs, synopsis & status:
https://github.com/blackwhale/phobos/wiki/GSOC-Unicode-support/tree/gsoc-uni

Original proposal:
http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/dolsh/20002#

-- 
Dmitry Olshansky
May 22 2012
"Roman D. Boiko" <rb d-coding.com> writes:
On Tuesday, 22 May 2012 at 18:39:46 UTC, Dmitry Olshansky wrote:
 P.S.
 Let's finally solve encoding problems that should be solved 
 10 years
 ago! By the way, Git+TurtoiseGit still has encoding problems 
 on Windows
 and it is awful (see its changelog).


 Forgot to mention that my GSOC project has support for legacy encodings as its secondary goal. Check out:

 TODOs, synopsis & status:
 https://github.com/blackwhale/phobos/wiki/GSOC-Unicode-support/tree/gsoc-uni

 Original proposal:
 http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/dolsh/20002#

Dmitry, your project looks really cool. As for the topic, I would vote for that, too, but don't have enough knowledge to understand all possible consequences...
May 22 2012
"Roman D. Boiko" <rb d-coding.com> writes:
On Tuesday, 22 May 2012 at 18:43:58 UTC, Roman D. Boiko wrote:
 Dmitry, your project looks really cool.

 As for the topic, I would vote for that, too, but don't have 
 enough knowledge to understand all possible consequences...

relevant tradeoffs".
May 22 2012
Stewart Gordon <smjg_1998 yahoo.com> writes:
On 22/05/2012 19:24, Dmitry Olshansky wrote:
<snip>
 * in D it's too easy to make a mistake by passing UTF-8 string pointer
 to such function


That's just as easy in almost any language. It's part of why so many websites have character encoding bugs.

<snip>
 And last but not least:
 - *W were supported on Win98+ Second Edition with official addon - Unicode Layer for Windows ;)

I've heard of MS Layer for Unicode - don't know if that's what you meant or you're talking about something else.

From what I recall reading, MSLU had the problem that EXEs have to be explicitly built to depend on it. So a typical app targeted at Win2000 and above wouldn't work with it, and you can't (at least easily) make an app detect whether Unicode is available and use it if it's there.

Stewart.
May 23 2012
Denis Shelomovskij <verylonglogin.reg gmail.com> writes:
The LPTSTR issue (it aliases char*) has already been filed by Martin Nowak:
Issue 8132 - LPTSTR always aliases to LPSTR
http://d.puremagic.com/issues/show_bug.cgi?id=8132


-- 
Денис В. Шеломовский
Denis V. Shelomovskij
May 22 2012
"Martin Nowak" <dawg dawgfoto.de> writes:
 * it performs worse because Windows has to convert ASCII string to  
 UTF-16 first

 P.S.
 Let's finally solve encoding problems that should be solved 10 years  
 ago! By the way, Git+TurtoiseGit still has encoding problems on Windows  
 and it is awful (see its changelog).

Given that it only requires a 'w' suffix for literals it's a good choice.
May 22 2012
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 22.05.2012 23:32, Martin Nowak wrote:
 * it performs worse because Windows has to convert ASCII string to
 UTF-16 first

 P.S.
 Let's finally solve encoding problems that should be solved 10 years
 ago! By the way, Git+TurtoiseGit still has encoding problems on
 Windows and it is awful (see its changelog).

Given that it only requires a 'w' suffix for literals it's a good choice.

http://stackoverflow.com/questions/7950271/windows-uses-utf-16-as-its-internal-encoding-what-exactly-does-this-mean

The second answer sheds some light on the topic. From what I know of Windows NT, the kernel doesn't even use Z-strings (zero-terminated) most of the time. Everything that can be called a syscall uses a variation of L-strings (length-prefixed) with 16-bit wide chars.

-- 
Dmitry Olshansky
May 22 2012
Walter Bright <newshound2 digitalmars.com> writes:
On 5/22/2012 12:32 PM, Martin Nowak wrote:
 * it performs worse because Windows has to convert ASCII string to UTF-16 first


Yes. Windows internally is all 16 bit Unicode.
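
To make the conversion cost concrete, here is a hedged Python sketch of the UTF-16 form that the *W entry points take: little-endian 16-bit code units followed by a 16-bit zero terminator. The helper names are hypothetical, and the sketch only illustrates the encoding step, not any actual Windows internals.

```python
# Sketch: building the zero-terminated UTF-16LE buffer a *W function expects.


def to_wide_z(text: str) -> bytes:
    """Encode text as UTF-16LE followed by a 16-bit zero terminator."""
    if "\x00" in text:
        # An embedded NUL would silently truncate a zero-terminated string.
        raise ValueError("embedded NUL in text")
    return text.encode("utf-16-le") + b"\x00\x00"


def utf16_units(text: str) -> int:
    """Number of 16-bit code units in text, excluding the terminator."""
    return len(text.encode("utf-16-le")) // 2
```

A six-letter Cyrillic word takes 12 bytes in UTF-8 but 6 code units in UTF-16, and a character outside the Basic Multilingual Plane takes 2 code units (a surrogate pair), so lengths have to be recomputed rather than carried over from the UTF-8 string.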
May 22 2012
Trass3r <un known.com> writes:
Yeah let 'em burn!
May 22 2012
Walter Bright <newshound2 digitalmars.com> writes:
First off, I agree that druntime and phobos must not use the A functions without a very, very good reason.

Secondly, as a matter of principle, we are not going to fix, improve, refactor, or re-engineer the Windows API, nor any other operating system API, nor the C Standard Library, no matter how tempting that may be. The job of the D interface modules is to simply provide an interface to them, as thin and direct as possible, without editorial comment. The user can decide what to use or not use from it.
May 22 2012
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 23.05.2012 0:41, Walter Bright wrote:
 First off, I agree that druntime and phobos must not use the A functions
 without a very, very good reason.

Right.
 Secondly, as a matter of principle, we are not going to fix, improve,
 refactor, or re-engineer the Windows API, nor any other operating system
 API, nor the C Standard Library, no matter how tempting that may be. The
 job of the D interface modules is to simply provide an interface to
 them, as thin and direct as possible, without editorial comment. The
 user can decide what to use or not use from it.

Again correct. The trick is that the way *A functions are provided is in fact a wrong edit! Their signatures are basically saying "hello, I'm an explicit Win32 API multi-byte string binding and I accept a UTF-8 string"... WTF?!

The fact that they are horribly outdated makes this the perfect moment to both fix the issue and get rid of junk.

-- 
Dmitry Olshansky
May 22 2012
Denis Shelomovskij <verylonglogin.reg gmail.com> writes:
On 23.05.2012 0:41, Walter Bright wrote:
 Secondly, as a matter of principle, we are not going to fix, improve,
 refactor, or re-engineer the Windows API, nor any other operating system
 API, nor the C Standard Library, no matter how tempting that may be. The
 job of the D interface modules is to simply provide an interface to
 them, as thin and direct as possible, without editorial comment. The
 user can decide what to use or not use from it.

The key point is: what does "interface" mean?

An ability to load a DLL and get symbols from it is enough to use every function. Is it an interface? You say "no".

It's common in C/C++ to use WinAPI functions without A/W postfixes because the preprocessor defines them according to your preferences. Is it an interface? You say "no".

Functions like C's memmove are deprecated in VC headers on Windows because they are unsafe. Is it an interface? You say "no".

WinAPI functions are more than just C definitions; they have IDL to allow the user to avoid pointers and exit code checking. Is it an interface? You say "no".

There are no such macros in Windows headers even for dmc, and there is no talk at all of generating good D wrappers for WinAPI functions based on their IDL.

*A functions are in WinAPI headers obviously for backward compatibility only. Are their definitions an interface? You say "yes".

And I completely disagree with the last 2 points. I just want to show that this "principle" isn't as well-shaped as it can look at first sight.

-- 
Денис В. Шеломовский
Denis V. Shelomovskij
May 24 2012
Gor Gyolchanyan <gor.f.gyolchanyan gmail.com> writes:
On Wed, May 23, 2012 at 12:31 AM, Trass3r <un known.com> wrote:

 Yeah let 'em burn!

Kill it! Kill it with fire!!!
+1

-- 
Bye,
Gor Gyolchanyan.
May 22 2012
"Mehrdad" <wfunction hotmail.com> writes:
I hope this includes SNN.lib, which also uses ANSI functions...
May 22 2012
Stewart Gordon <smjg_1998 yahoo.com> writes:
On 23/05/2012 15:16, Kagamin wrote:
<snip>
 Well, you can't fix C because C explicitly ignores string encoding and
 thoughtlessly passes strings around without any transcoding. Though, D
 bindings suggest that C functions accept utf-8 strings

A lot of C functions do. Indeed, this is one of the considerations made in the design of UTF-8.

 which leads to assumption that those functions will act properly on
 utf-8 strings. I'd say that's a bug in bindings: C strings are specified
 to be in C encoding,

What is "C encoding"?

 not utf-8 encoding. I think, conversion from D string to C string should
 require at least a cast.

Several people have dealt with this by using byte or ubyte as D's equivalent of the C char type.

Stewart.
May 23 2012
Jacob Carlborg <doob me.com> writes:
On 2012-05-23 20:34, Stewart Gordon wrote:

 What is "C encoding"?

Since C doesn't really have a concept of encodings it would be whatever a given application/library decides it is.

-- 
/Jacob Carlborg
May 23 2012
"Kagamin" <spam here.lot> writes:
On Wednesday, 23 May 2012 at 04:01:05 UTC, Mehrdad wrote:
 I hope this includes SNN.lib, which also uses ANSI functions...

Well, you can't fix C because C explicitly ignores string encoding and thoughtlessly passes strings around without any transcoding. Though, D bindings suggest that C functions accept UTF-8 strings, which leads to the assumption that those functions will act properly on UTF-8 strings. I'd say that's a bug in the bindings: C strings are specified to be in the C encoding, not in UTF-8. I think conversion from a D string to a C string should require at least a cast.
May 23 2012
"Michael" <pr m1xa.com> writes:
In WinAPI we have LoadLibraryA/W, but not GetProcAddressA/W,
because of PE COFF limitations.

Walter Bright
The user can decide what to use or not use from it.

 256 max path

May 23 2012
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 23.05.2012 23:29, Michael wrote:
 In WinAPI we have: LoadLibraryA/W, but not GetProcAddressA/W because PE
 COFF limitations exists.

 Walter Bright
 The user can decide what to use or not use from it.

 256 max path


Nope. Quoting a random top hit from Google:

 Individual components of a filename (i.e. each subdirectory along the path, and the final filename) are limited to 255 characters, and the total path length is limited to approximately 32,000 characters. However, you should generally try to limit path lengths to below 260 characters (MAX_PATH) when possible.

See http://msdn.microsoft.com/en-us/library/aa365247.aspx for full details.

-- 
Dmitry Olshansky
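
The two limits quoted above can be illustrated with a small Python sketch. It relies on the documented extended-length prefix (\\?\ for local paths, \\?\UNC\ for network shares), which only the wide (*W) functions accept. The function name is hypothetical, and the sketch assumes the path is already absolute and uses backslashes, because Windows does no normalization on prefixed paths.

```python
# Sketch: opting a long absolute path into the extended-length form.

MAX_PATH = 260  # classic Win32 limit, counting the terminating NUL
EXTENDED_PREFIX = "\\\\?\\"  # the four characters: backslash backslash ? backslash


def to_extended_length(path: str) -> str:
    """Add the extended-length prefix when the path would not fit in MAX_PATH."""
    if path.startswith(EXTENDED_PREFIX) or len(path) < MAX_PATH:
        # Already prefixed, or short enough (path plus NUL fits in MAX_PATH).
        return path
    if path.startswith("\\\\"):
        # UNC path: \\server\share\... becomes \\?\UNC\server\share\...
        return EXTENDED_PREFIX + "UNC\\" + path[2:]
    return EXTENDED_PREFIX + path
```

Short paths pass through untouched, so the helper can be applied unconditionally before calling a wide function. Paths that are relative or use forward slashes would need to be made absolute and normalized first, which this sketch deliberately leaves out.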
May 23 2012
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 24.05.2012 0:13, Michael wrote:
 approximately 32,000 characters...

 I know it ;) But it's platform specific kung-fu.

It's the only game in M$ town ;)

-- 
Dmitry Olshansky
May 23 2012
"Michael" <pr m1xa.com> writes:
 approximately 32,000 characters...

I know it ;) But it's platform specific kung-fu.
May 23 2012
"Regan Heath" <regan netmail.co.nz> writes:
On Wed, 23 May 2012 20:54:44 +0100, Jacob Carlborg <doob me.com> wrote:

 On 2012-05-23 20:34, Stewart Gordon wrote:

 What is "C encoding"?

 Since C doesn't really have a concept of encodings it would be whatever a given application/library decides it is.

All the more reason to use byte/ubyte as D's equivalent to C's char.

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
May 24 2012
"Regan Heath" <regan netmail.co.nz> writes:
On Wed, 23 May 2012 21:13:47 +0100, Michael <pr m1xa.com> wrote:

 approximately 32,000 characters...

 I know it ;) But it's platform specific kung-fu.

And, if you start to dig a bit, things can get a bit hairy in places:

http://blogs.msdn.com/b/bclteam/archive/2007/02/13/long-paths-in-net-part-1-of-3-kim-hamilton.aspx
http://blogs.msdn.com/b/bclteam/archive/2007/03/26/long-paths-in-net-part-2-of-3-long-path-workarounds-kim-hamilton.aspx
http://blogs.msdn.com/b/bclteam/archive/2008/07/07/long-paths-in-net-part-3-of-3-redux-kim-hamilton.aspx

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
May 24 2012
"Michael" <pr m1xa.com> writes:
I knew about it back before the .NET era. The main point is that even
Windows itself may handle it in a wrong way.

WinAPI is an interface "as is". So let the user decide whether to use it or not.
May 24 2012