
digitalmars.D.bugs - DMD allows invalid UTF-8, in comments

Anders F Björklund <afb algonet.se> writes:

Currently the lexer allows invalid UTF-8 in comments.
Here is a patch to make it check all comments as well.

It's just a crude copy-and-paste, since that seemed
to be the rule in the current lexer.c source code? :-)

The check is added in all three places where the lexer skips over comment characters:
//
/*
/+
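
For readers without the patch at hand, the kind of check involved can be sketched roughly as follows. This is not the code from lexer.c or from the patch: the function names utf8_seq_len and skip_line_comment are hypothetical, and the sketch only covers the // case, assuming the comment body is scanned byte by byte up to the next newline. The same idea would apply to the /* and /+ loops.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the length (1-4) of the well-formed UTF-8
   sequence starting at s, or 0 if the bytes there are not valid UTF-8.
   n is the number of bytes available from s. */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    assert(s != NULL);
    if (n == 0)
        return 0;

    unsigned char c = s[0];
    if (c < 0x80)
        return 1;                       /* plain ASCII */

    size_t len;
    unsigned long cp, min;
    if ((c & 0xE0) == 0xC0)      { len = 2; cp = c & 0x1F; min = 0x80; }
    else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; min = 0x800; }
    else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; min = 0x10000; }
    else
        return 0;                       /* stray continuation byte or 0xF8..0xFF */

    if (n < len)
        return 0;                       /* sequence truncated by end of input */

    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                   /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    if (cp < min)
        return 0;                       /* overlong encoding */
    if (cp > 0x10FFFF)
        return 0;                       /* beyond the Unicode range */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* UTF-16 surrogate */
    return len;
}

/* Hypothetical scanner for the body of a // comment. *pos is the index of
   the first byte after "//". Returns 1 and leaves *pos at the newline (or
   at len) if the body is valid UTF-8; returns 0 and leaves *pos at the
   offending byte otherwise. */
int skip_line_comment(const unsigned char *src, size_t len, size_t *pos)
{
    assert(src != NULL && pos != NULL);
    while (*pos < len && src[*pos] != '\n') {
        size_t n = utf8_seq_len(src + *pos, len - *pos);
        if (n == 0)
            return 0;                   /* invalid UTF-8 inside the comment */
        *pos += n;
    }
    return 1;
}
```

A real lexer would report an error with the source location rather than return a flag; the flag is used here only to keep the sketch self-contained.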

Seems to be working OK with GDC, as far as I can tell.
(haven't run the regression suite just yet, but anyway)

--anders

PS. Walter, here's some neat tools:
     http://unxutils.sourceforge.net/
Jan 29 2005
Thomas Kuehne <thomas-dloop kuehne.thisisspam.cn> writes:

Anders F Björklund wrote on Sat, 29 Jan 2005 23:08:39 +0100:
> Currently the lexer allows invalid UTF-8 in comments.

I've added a bunch of invalid UTF tests to DStress:
http://dstress.kuehne.cn/nocompile/invalid_utf_01.d
...
http://dstress.kuehne.cn/nocompile/invalid_utf_43.d

Note: The invalid UTF tests aren't complete yet.

Thomas
Feb 01 2005