digitalmars.D - UTF bug+proposed fix, please check

I filed a bug report, https://issues.dlang.org/show_bug.cgi?id=12923 ("UTF
exception in stride even though passes validate"), and then proposed a fix.

The fix still passes the unittests and also solves the bug, but I'm not sure
whether it is correct: is the behavior of strideImpl correct, or is the old
behavior of decodeImpl correct?

TL;DR:
auto fst = str.front;
immutable msbs = 7 - bsr(~fst);
if (msbs < 2 || msbs > 6) throw invalidUTF();
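
To make the check above concrete, here is a minimal, self-contained sketch of how counting the leading 1 bits of the first byte yields the sequence length. The helper name leadByteLength is hypothetical (it is not a Phobos function), and the explicit 0xFF guard and the 8-bit mask are assumptions added so the sketch does not depend on how ~ promotes a char or on bsr(0), which is undefined.

```d
import core.bitop : bsr;
import std.utf : UTFException;

/// Hypothetical helper (not part of Phobos): returns the sequence length
/// implied by a UTF-8 lead byte, using the same msbs test as the TL;DR.
uint leadByteLength(char c)
{
    if (c < 0x80)
        return 1; // plain ASCII, no leading 1 bits

    // bsr(0) is undefined, so reject 0xFF (all bits set) up front.
    if (c == 0xFF)
        throw new UTFException("Invalid UTF-8 lead byte");

    // Invert and mask to 8 bits; the highest set bit of the result marks
    // the first 0 bit of c, so 7 - bsr(...) counts the leading 1 bits.
    immutable msbs = 7 - bsr(~uint(c) & 0xFF);

    // 1 leading bit means a continuation byte; 7 means 0xFE.
    if (msbs < 2 || msbs > 6)
        throw new UTFException("Invalid UTF-8 lead byte");

    return msbs;
}
```

For example, 0xC3 (binary 11000011) inverts to 00111100, bsr gives 5, and msbs is 2. Note that the upper bound of 6 still accepts the old 5- and 6-byte lead bytes (0xF8 to 0xFD); RFC 3629 limits UTF-8 to 4-byte sequences, so a stricter bound of 4 may be what validate effectively enforces.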

Jun 14 2014