digitalmars.D.bugs - [Issue 11350] New: libphobos2 regex match segfaults when a rare HTTP header is received

d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=11350

           Summary: libphobos2 regex match segfaults when a rare HTTP
                    header is received
           Product: D
           Version: D2
          Platform: x86
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: sha0 badchecksum.net



A simple std.net.curl.get() is performed against a remote host that returns
some rare HTTP headers. I don't define the onReceiveHeader callback, so
libphobos2 calls the default onReceiveHeader(), which applies a regex to the
header and then crashes.

I connect this way:

    import core.time : dur;
    import std.net.curl;

    auto conn = HTTP();
    conn.connectTimeout(dur!"seconds"(4));
    conn.addRequestHeader("User-agent", "Mozilla/5.0 (Windows NT 6.1; rv:20.0) Gecko/20100101 Firefox/20.0");
    char[] html = get(url, conn);


It seems the bug is at:

/usr/include/dmd/phobos/std/regex.d  line 6348

6537 public auto match(R, RegEx)(R input, RegEx re)
6538     if(isSomeString!R && is(RegEx == Regex!(BasicElementOf!R)))
6539 {
6540     return RegexMatch!(Unqual!(typeof(input)),ThompsonMatcher)(re, input);
6541 }

Maybe it is an encoding problem; the input seems to be:
 print "%c%c%c%c%c%c%c%c%c" % (0x64,0x61,0x97,0x48,0x34,0x53,0x54,0x65,0x46)
da�H4STeF

(gdb) bt

/usr/lib/i386-linux-gnu/libphobos2.so.0.63

_D3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch43__T6__ctorTS3std5regex12__T5RegexTaZ5RegexZ6__ctorMFNcNeS3std5regex12__T5RegexTaZ5RegexAaZS3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch
(this=0x95ac0774, input=646197483453546546, prog=...)
    at /usr/include/dmd/phobos/std/regex.d:6348

_D3std5regex45__T5matchTAaTS3std5regex12__T5RegexTaZ5RegexZ5matchFNfAaS3std5regex12__T5RegexTaZ5RegexZS3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch
(__HID46=0x95ac0b18, re=..., input=646197483453546546)
    at /usr/include/dmd/phobos/std/regex.d:6540

/usr/lib/i386-linux-gnu/libphobos2.so.0.63
/usr/lib/i386-linux-gnu/libphobos2.so.0.63
/usr/lib/i386-linux-gnu/libphobos2.so.0.63
/usr/lib/i386-linux-gnu/libcurl.so.4
/usr/lib/i386-linux-gnu/libcurl.so.4
/usr/lib/i386-linux-gnu/libcurl.so.4
/usr/lib/i386-linux-gnu/libphobos2.so.0.63
/usr/lib/i386-linux-gnu/libphobos2.so.0.63
/usr/lib/i386-linux-gnu/libphobos2.so.0.63

_D3std3net4curl18__T10_basicHTTPTaZ10_basicHTTPFAxaAxvS3std3net4curl4HTTPZAa
(client=..., sendData=579669917507256320, url=10576998119117946914)
    at /usr/include/dmd/phobos/std/net/curl.d:762

_D3std3net4curl30__T3getTS3std3net4curl4HTTPTaZ3getFAxaS3std3net4curl4HTTPZAa
(conn=..., url=10576998119117946914)
    at /usr/include/dmd/phobos/std/net/curl.d:364

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Oct 25 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=11350


Dmitry Olshansky <dmitry.olsh gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmitry.olsh gmail.com



11:21:26 PDT ---

 
 It seems the bug is at:
 
 /usr/include/dmd/phobos/std/regex.d  line 6348
 
 6537 public auto match(R, RegEx)(R input, RegEx re)
 6538     if(isSomeString!R && is(RegEx == Regex!(BasicElementOf!R)))
 6539 {
 6540     return RegexMatch!(Unqual!(typeof(input)),ThompsonMatcher)(re, input);
 6541 }
 
 Maybe is an encoding problem, it seems the input is:
 print "%c%c%c%c%c%c%c%c%c" % (0x64,0x61,0x97,0x48,0x34,0x53,0x54,0x65,0x46)
da�H4STeF
Would be nice to see what that pattern is and exactly what the argument to it looks like.

I tried to reproduce with this:

    void main()
    {
        import std.regex;
        ubyte[] header = [0x64,0x61,0x97,0x48,0x34,0x53,0x54,0x65,0x46];
        auto m = match(cast(char[]) header, regex("(.*?): (.*)$"));
        assert(m.empty);
    }

I get:

    std.utf.UTFException C:\dmd2\windows\bin\..\..\src\phobos\std\utf.d(1113): Invalid UTF-8 sequence (at index 1)

No crashes. Now it may have to do with shared object / PIC code for all I know, as I'm testing on Win32. But without a smaller, or at least complete, reproducible test case there is nothing to work on.
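
As a side note on why this input is rejected: 0x97 is 0b10010111, which is a UTF-8 continuation byte, and here it follows the ASCII byte 0x61 with no lead byte before it. The sketch below checks the same bytes without involving std.regex at all. It assumes only std.utf.validate from Phobos; the helper name isValidUtf8 is made up for illustration.

```d
import std.utf : validate, UTFException;

/// Hypothetical helper: true if `bytes` are well-formed UTF-8.
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        // validate() throws UTFException on the first malformed sequence.
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    import std.stdio : writeln;

    // The bytes from the report; 0x97 is a stray continuation byte.
    const(ubyte)[] header = [0x64, 0x61, 0x97, 0x48, 0x34, 0x53, 0x54, 0x65, 0x46];

    writeln(isValidUtf8(header));         // whole header, including 0x97
    writeln(isValidUtf8(header[0 .. 2])); // only the ASCII prefix "da"
}
```

A check like this could be run on a header before handing it to a regex, but whether rejecting or sanitizing such headers is the right policy is a separate question.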
Oct 25 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=11350




11:40:08 PDT ---

 It seems the bug is at:
No, and I think I know what it is.
 Maybe is an encoding problem, it seems the input is:
 print "%c%c%c%c%c%c%c%c%c" % (0x64,0x61,0x97,0x48,0x34,0x53,0x54,0x65,0x46)
da�H4STeF
Yes, this is broken UTF-8 and hence...
 
 
 
 (gdb) bt

 /usr/lib/i386-linux-gnu/libphobos2.so.0.63

it throws an exception ...

 _D3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch43__T6__ctorTS3std5regex12__T5RegexTaZ5RegexZ6__ctorMFNcNeS3std5regex12__T5RegexTaZ5RegexAaZS3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch
 (this=0x95ac0774, input=646197483453546546, prog=...)
     at /usr/include/dmd/phobos/std/regex.d:6348
... inside of std.regex.match. But the thing is, we are doing it inside a callback invoked by the C library libcurl (browse the call stack to curl_easy_perform). It HAS NO IDEA what to do with a D exception, hence the crash. So the fix would be to insulate it with try/catch inside that onReceive callback.

 _D3std5regex45__T5matchTAaTS3std5regex12__T5RegexTaZ5RegexZ5matchFNfAaS3std5regex12__T5RegexTaZ5RegexZS3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch
 (__HID46=0x95ac0b18, re=..., input=646197483453546546) at
 /usr/include/dmd/phobos/std/regex.d:6540

 /usr/lib/i386-linux-gnu/libphobos2.so.0.63

 /usr/lib/i386-linux-gnu/libphobos2.so.0.63

 /usr/lib/i386-linux-gnu/libphobos2.so.0.63

 /usr/lib/i386-linux-gnu/libcurl.so.4

 /usr/lib/i386-linux-gnu/libcurl.so.4



 /usr/lib/i386-linux-gnu/libcurl.so.4

 /usr/lib/i386-linux-gnu/libphobos2.so.0.63

 /usr/lib/i386-linux-gnu/libphobos2.so.0.63

 /usr/lib/i386-linux-gnu/libphobos2.so.0.63

 _D3std3net4curl18__T10_basicHTTPTaZ10_basicHTTPFAxaAxvS3std3net4curl4HTTPZAa
 (client=..., sendData=579669917507256320,
     url=10576998119117946914) at /usr/include/dmd/phobos/std/net/curl.d:762

 _D3std3net4curl30__T3getTS3std3net4curl4HTTPTaZ3getFAxaS3std3net4curl4HTTPZAa
 (conn=..., url=10576998119117946914)
     at /usr/include/dmd/phobos/std/net/curl.d:364
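
The try/catch insulation suggested above could look roughly like the following. This is only a sketch under stated assumptions, not the actual Phobos code or the eventual fix: handleHeader and headerCallback are hypothetical names, std.utf.validate stands in for the regex-based parsing that throws, and the callback signature merely mirrors the shape of a libcurl header callback. The point is that no D exception is allowed to propagate out of an extern (C) function into C frames.

```d
import std.utf : validate;

// Hypothetical header handler. validate() throws UTFException on malformed
// UTF-8, standing in here for the regex match done by the default handler.
void handleHeader(const(char)[] line)
{
    validate(line);
    // ... parse "Name: value" here ...
}

// Hypothetical callback with the C calling convention, shaped like a libcurl
// header callback. It returns the number of bytes handled; returning a
// different amount is how such a callback reports an error to libcurl.
extern (C) size_t headerCallback(const(char)* ptr, size_t size,
                                 size_t nmemb, void* userData) nothrow
{
    immutable total = size * nmemb;
    try
    {
        handleHeader(ptr[0 .. total]);
        return total;
    }
    catch (Exception)
    {
        // Swallow the exception at the C boundary and report failure instead.
        return 0;
    }
}

void main()
{
    import std.stdio : writeln;

    string good = "Server: x\r\n";
    writeln(headerCallback(good.ptr, 1, good.length, null));

    // Starts with the bytes from the report, including the stray 0x97.
    const(ubyte)[] bad = [0x64, 0x61, 0x97, 0x48];
    writeln(headerCallback(cast(const(char)*) bad.ptr, 1, bad.length, null));
}
```

Marking the callback nothrow makes the compiler reject any code path that could let an Exception escape, which is the property the C caller depends on. A real fix would also need to decide whether a bad header should abort the transfer or simply be skipped.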
Oct 25 2013