
digitalmars.D - A hat-trick of File bugs

Arcane Jill <Arcane_member pathlink.com> writes:
Hi,

I encountered three bugs yesterday. One of them crept in with DMD 0.93. For
reference, my file opening procedure is:



BUG 1 - Introduced in DMD 0.93. It appears that if the filename is an exact
multiple of sixteen bytes, then a spurious 'ÿ' character is appended to the
filename. That is, you think you're opening "foo". The function actually tries
to open "fooÿ". (You need a longer filename to see the effect, but you get the
idea). I believe this bug is caused by appending an uninitialized char to the
end of the filename, in an attempt to make it null-terminated for the benefit of
underlying C-functions. Uninitialized chars now contain 0xFF. In combination
with BUG 2, this would cause the observed symptoms.
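A minimal C sketch of the failure mode described above. C is used because the problem occurs where a D string is handed to a C function that expects a terminating NUL. It assumes a toStringz-style copy that allocates one extra element and relies on that element's default value being 0. `CHAR_INIT`, `to_cstring_buggy` and `to_cstring_fixed` are hypothetical names, and this is not the actual Phobos code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for D's char.init, which became 0xFF in DMD 0.93. */
#define CHAR_INIT 0xFF

/* Allocate len + 1 bytes pre-filled the way a new D char[] now would be. */
static char *alloc_like_d(size_t len)
{
    char *buf = malloc(len + 1);
    if (buf != NULL)
        memset(buf, CHAR_INIT, len + 1);
    return buf;
}

/* Buggy: copies the characters but trusts the default contents of the
   extra slot to be the terminator. With CHAR_INIT == 0xFF it is not. */
char *to_cstring_buggy(const char *s, size_t len)
{
    char *copy = alloc_like_d(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy; /* copy[len] is still 0xFF: not null-terminated */
}

/* Fixed: write the terminator explicitly, whatever the default value is. */
char *to_cstring_fixed(const char *s, size_t len)
{
    char *copy = alloc_like_d(len);
    if (copy != NULL) {
        memcpy(copy, s, len);
        copy[len] = '\0';
    }
    return copy;
}
```

Writing the terminator explicitly works no matter what the element type's default value happens to be.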

BUG 2 - Opening a file whose filename contains non-ASCII characters AGAIN
attempts to open the wrong file, but this time for a different reason. The
reason is that the path parameter to new File() is passed in UTF-8, but
Windows filenames are in UTF-16. It is a bug for new File() /not/ to convert
from UTF-8 to UTF-16 (on Windows) before attempting the open. The filename
parameter to CreateFile(...) is an LPCTSTR, which really should use 16-bit wide
characters in this case.

Please note that the REASON why the default value for uninitialized chars was
changed to 0xFF was /precisely/ to catch situations like this. 0xFF is illegal
in UTF-8. *IF* you had attempted to convert from UTF-8 to UTF-16, and a trailing
0xFF happened to be found, it WOULD rightly have caused a UTF-8 conversion
exception. As it was, Windows interpreted the 0xFF as a Windows-1252 encoded
character (which is why we got 'ÿ' - its codepoint equals U+00FF).
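The conversion being asked for can be sketched in C as a strict UTF-8 to UTF-16 decoder. The name `utf8_to_utf16` is hypothetical. Real Windows code would more likely call MultiByteToWideChar with CP_UTF8 and MB_ERR_INVALID_CHARS, then pass the result to CreateFileW. The sketch shows why a trailing 0xFF cannot slip through: that byte is never valid anywhere in UTF-8, so the decoder reports an error instead of producing a filename.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode UTF-8 into UTF-16 code units.
   Returns the number of code units written, or -1 if the input is not
   valid UTF-8 or dst is too small. A stray 0xFF byte is always rejected. */
long utf8_to_utf16(const unsigned char *src, size_t len,
                   uint16_t *dst, size_t cap)
{
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char b = src[i];
        uint32_t cp, min;
        size_t extra;

        if (b < 0x80)                    { cp = b;        extra = 0; min = 0; }
        else if (b >= 0xC2 && b <= 0xDF) { cp = b & 0x1F; extra = 1; min = 0x80; }
        else if (b >= 0xE0 && b <= 0xEF) { cp = b & 0x0F; extra = 2; min = 0x800; }
        else if (b >= 0xF0 && b <= 0xF4) { cp = b & 0x07; extra = 3; min = 0x10000; }
        else return -1; /* 0x80-0xC1 and 0xF5-0xFF (including 0xFF) never start a sequence */

        if (extra > len - i - 1)
            return -1; /* truncated sequence */
        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = src[i + k];
            if ((c & 0xC0) != 0x80)
                return -1; /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1; /* overlong, out of range, or a surrogate */
        i += extra + 1;

        if (cp < 0x10000) {
            if (n + 1 > cap) return -1;
            dst[n++] = (uint16_t)cp;
        } else {
            /* Encode as a UTF-16 surrogate pair. */
            if (n + 2 > cap) return -1;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 + (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 + (cp & 0x3FF));
        }
    }
    return (long)n;
}
```

Failing loudly here, instead of passing the bytes through to an ANSI API, is what turns the 0xFF default into a detectable error.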

BUG 3 - Even though I was attempting to open a file for READING, an empty file
got CREATED. Like - you try to open "foo" for reading, and "fooÿ" gets created.
Surely FileMode.In ought to mean "do not create"?
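On Windows, whether CreateFile may create a file is controlled by its dwCreationDisposition argument. Here is a minimal C sketch of choosing a disposition so that a read-only open uses OPEN_EXISTING and therefore fails on a missing file instead of creating one. It uses hypothetical `MODE_IN`/`MODE_OUT` bits in place of the real FileMode values. The constant values match the Windows SDK headers but are redefined so the sketch compiles on any platform.

```c
/* Creation-disposition values as defined in the Windows SDK headers.
   Redefined here only so this sketch compiles anywhere; real code would
   include <windows.h> instead. */
#define CREATE_NEW        1
#define CREATE_ALWAYS     2
#define OPEN_EXISTING     3
#define OPEN_ALWAYS       4
#define TRUNCATE_EXISTING 5

/* Hypothetical mode bits standing in for FileMode.In / FileMode.Out. */
enum { MODE_IN = 1, MODE_OUT = 2 };

/* Choose the dwCreationDisposition argument for CreateFile.
   Only a mode that writes may create the file. A read-only open uses
   OPEN_EXISTING, so CreateFile fails (typically with ERROR_FILE_NOT_FOUND)
   instead of creating an empty file. */
unsigned long creation_disposition(int mode)
{
    if (mode & MODE_OUT)
        return OPEN_ALWAYS; /* create if missing; truncation is a separate choice */
    return OPEN_EXISTING;   /* read-only: never create */
}
```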

Arcane Jill

PS. I don't have newsgroup access, so I can't report these anywhere else, unless
the bug-reporting forum has a web interface too. But these would seem to be
sufficiently pervasive that they are likely to affect a whole lot of other
people too. Particularly given that much discussion of late has been about
streams, it seemed relevant to mention it here.
Jun 26 2004
Ben Hinkle <bhinkle4 juno.com> writes:
Arcane Jill wrote:

 Hi,
 
 I encountered three bugs yesterday. One of them crept in with DMD 0.93.
 For reference, my file opening procedure is:
 

 
 BUG 1 - Introduced in DMD 0.93. It appears that if the filename is an
 exact multiple of sixteen bytes, then a spurious 'ÿ' character is appended
 to the filename. That is, you think you're opening "foo". The function
 actually tries to open "fooÿ". (You need a longer filename to see the
 effect, but you get the idea). I believe this bug is caused by appending
 an uninitialized char to the end of the filename, in an attempt to make it
 null-terminated for the benefit of underlying C-functions. Uninitialized
 chars now contain 0xFF. In combination with BUG 2, this would cause the
 observed symptoms.
Now I see why the std.string unittests fail in toStringz. I had assumed my Phobos was messed up, but the failures were exactly about tacking on a trailing FF. Lines 227 and 228 in string.d allocate a new char[], copy the string over, and assume char.init is 0. I had to add copy[string.length] = 0; and even then I think the string unittests still failed later on (at that point I stopped trying to fix std.string, assuming something else was wrong).
 BUG 2 - Opening a file whose filename contains non-ASCII characters AGAIN
 attempts to open the wrong file, but this time for a different reason. The
 reason is that the path parameter to new File() is passed in UTF-8, but
 Windows filenames are in UTF-16. It is a bug for new File() /not/ to
 convert from UTF-8 to UTF-16 (on Windows) before attempting the open. The
 filename parameter to CreateFile(...) is an LPCTSTR, which really should
 use 16-bit wide characters in this case.
 
 Please note that the REASON why the default value for uninitialized chars
 was changed to 0xFF was /precisely/ to catch situations like this. 0xFF is
 illegal in UTF-8. *IF* you had attempted to convert from UTF-8 to UTF-16,
 and a trailing 0xFF happened to be found, it WOULD rightly have caused a
 UTF-8 conversion exception. As it was, Windows interpreted the 0xFF as a
 Windows-1252 encoded character (which is why we got 'ÿ' - its codepoint
 equals U+00FF).
You are right. Line 1404 in std.stream should call something like file.toMBSz instead of toStringz.
 BUG 3 - Even though I was attempting to open a file for READING, an empty
 file got CREATED. Like - you try to open "foo" for reading, and "fooÿ" gets
 created. Surely FileMode.In ought to mean "do not create"?
Interesting. It looks like on Linux it errors. It seems reasonable to have Windows do the same.
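The Linux behavior follows from the POSIX open() call: without O_CREAT, opening a missing file fails with ENOENT and nothing is created. A small C sketch (the helper name is hypothetical, and this is not the std.stream source):

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open an existing file read-only. O_CREAT is deliberately not passed,
   so a missing file is reported as an error and never created.
   Returns a descriptor on success, or -errno on failure (e.g. -ENOENT). */
int open_existing_for_reading(const char *path)
{
    int fd = open(path, O_RDONLY);
    return fd >= 0 ? fd : -errno;
}
```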
 Arcane Jill
 
 PS. I don't have newsgroup access, so I can't report these anywhere else,
 unless the bug-reporting forum has a web interface too. But these would
 seem to be sufficiently pervasive that they are likely to affect a whole
 lot of other people too. Particularly given that much discussion of late
 has been about streams, it seemed relevant to mention it here.
In general, bug reports should go to the bugs newsgroup.
Jun 26 2004
Sean Kelly <sean f4.ca> writes:
In article <cbj8kn$1pvm$1 digitaldaemon.com>, Arcane Jill says...
BUG 1 - Introduced in DMD 0.93. It appears that if the filename is an exact
multiple of sixteen bytes, then a spurious 'ÿ' character is appended to the
filename. That is, you think you're opening "foo". The function actually tries
to open "fooÿ". (You need a longer filename to see the effect, but you get the
idea). I believe this bug is caused by appending an uninitialized char to the
end of the filename, in an attempt to make it null-terminated for the benefit of
underlying C-functions. Uninitialized chars now contain 0xFF. In combination
with BUG 2, this would cause the observed symptoms.
Already fixed in my updated version.
BUG 2 - Opening a file whose filename contains non-ASCII characters AGAIN
attempts to open the wrong file, but this time for a different reason. The
reason is that the path parameter to new File() is passed in UTF-8, but
Windows filenames are in UTF-16. It is a bug for new File() /not/ to convert
from UTF-8 to UTF-16 (on Windows) before attempting the open. The filename
parameter to CreateFile(...) is an LPCTSTR, which really should use 16-bit wide
characters in this case.
So far I'm only calling CreateFileA (I can't use CreateFile because it is a macro, and macros don't work in D). I'll add a wchar version that calls CreateFileW.
BUG 3 - Even though I was attempting to open a file for READING, an empty file
got CREATED. Like - you try to open "foo" for reading, and "fooÿ" gets created.
Surely FileMode.In ought to mean "do not create"?
I haven't addressed the truncate, etc., flags yet. But you're right, this operation should fail. To that end, would it make more sense to throw an exception, or to return a bit and add an isOpen method?

Sean
Jun 26 2004
"Carlos Santander B." <carlos8294 msn.com> writes:
"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:cbj8kn$1pvm$1 digitaldaemon.com
|
| ...
|
| BUG 2 - Opening a file whose filename contains non-ASCII characters AGAIN
| attempts to open the wrong file, but this time for a different reason. The
| reason is that the path parameter to new File() is passed in UTF-8, but
| Windows filenames are in UTF-16. It is a bug for new File() /not/ to convert
| from UTF-8 to UTF-16 (on Windows) before attempting the open. The filename
| parameter to CreateFile(...) is an LPCTSTR, which really should use 16-bit wide
| characters in this case.
|
| Please note that the REASON why the default value for uninitialized chars was
| changed to 0xFF was /precisely/ to catch situations like this. 0xFF is illegal
| in UTF-8. *IF* you had attempted to convert from UTF-8 to UTF-16, and a trailing
| 0xFF happened to be found, it WOULD rightly have caused a UTF-8 conversion
| exception. As it was, Windows interpreted the 0xFF as a Windows-1252 encoded
| character (which is why we got 'ÿ' - its codepoint equals U+00FF).
|
| ...
|
| Arcane Jill
|

See my post "(fix) Re: unicode filenames: std.stream.File and
std.path.listdir" in digitalmars.D.bugs on June 8th. There I attached a
fixed stream.d which addresses that situation for Windows. I didn't know
there could be such a problem on Linux, and I certainly don't know how to
fix it. It's up to Walter now to fix std.stream in Phobos.

| PS. I don't have newsgroup access, so I can't report these anywhere else, unless
| the bug-reporting forum has a web interface too. But these would seem to be
| sufficiently pervasive that they are likely to affect a whole lot of other
| people too. Particularly given that much discussion of late has been about
| streams, it seemed relevant to mention it here.

Of course there's a web interface for the bugs newsgroup.

-----------------------
Carlos Santander Bernal
Jun 26 2004
J C Calvarese <jcc7 cox.net> writes:
Arcane Jill wrote:
...
 
 PS. I don't have newsgroup access, so I can't report these anywhere else, unless
 the bug-reporting forum has a web interface too. But these would seem to be
 sufficiently pervasive that they are likely to affect a whole lot of other
 people too. Particularly given that much discussion of late has been about
 streams, it seemed relevant to mention it here.
Try this: http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs

Each Digital Mars newsgroup has a corresponding web interface. Here's a list:
http://www.digitalmars.com/drn-bin/wwwnews?*

-- 
Justin (a/k/a jcc7)
http://jcc_7.tripod.com/d/
Jun 26 2004