
digitalmars.D.bugs - [Issue 20676] New: regex backtracking memory leak

https://issues.dlang.org/show_bug.cgi?id=20676

          Issue ID: 20676
           Summary: regex backtracking memory leak
           Product: D
           Version: D2
          Hardware: x86
                OS: Linux
            Status: NEW
          Severity: major
          Priority: P1
         Component: druntime
          Assignee: nobody puremagic.com
          Reporter: ritaka3368 provamail.com

With this code:

```
import core.memory: GC;
import std.regex;

void f()
{
   match("", ctRegex!(`^(?!\.)a$`));
}

void main()
{
   f();
   GC.collect();
}
```

compiled with this version of LDC:

```
$ ldc2 --version
LDC - the LLVM D compiler (1.20.1):
  based on DMD v2.090.1 and LLVM 9.0.1
  built with LDC - the LLVM D compiler (1.20.1)
  Default target: x86_64-unknown-linux-gnu
  Host CPU: nocona
```

Valgrind reports an 81,936-byte block as "definitely lost":

```
$ valgrind --track-origins=yes --num-callers=50 --leak-check=full ./simple
==26749== Memcheck, a memory error detector
==26749== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==26749== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==26749== Command: ./simple
==26749== 
==26749== Conditional jump or move depends on uninitialised value(s)
==26749==    at 0x19966B: _D2gc4impl12conservativeQw3Gcx12collectRootsMFNbNlPvQcZv (in /home/ritaka/regex-leak/simple)
==26749==    by 0x17E992: _D4core6thread8osthread15scanAllTypeImplFNbMDFNbEQBvQBtQBp8ScanTypePvQcZvQgZv (in /home/ritaka/regex-leak/simple)
==26749==    by 0x17D4E8: _D4core6thread8osthread18callWithStackShellFNbMDFNbPvZvZv (in /home/ritaka/regex-leak/simple)
==26749==    by 0x17EA74: thread_scanAll (in /home/ritaka/regex-leak/simple)
==26749==    by 0x19A2BD: _D2gc4impl12conservativeQw3Gcx12markParallelMFNbbZv (in /home/ritaka/regex-leak/simple)
==26749==    by 0x19633E: _D2gc4impl12conservativeQw3Gcx11fullcollectMFNbbZm (in /home/ritaka/regex-leak/simple)
==26749==    by 0x197D42: _D2gc4impl12conservativeQw3Gcx10smallAllocMFNbmKmkxC8TypeInfoZPv (in /home/ritaka/regex-leak/simple)
==26749==    by 0x1949EF: _D2gc4impl12conservativeQw14ConservativeGC__T9runLockedS_DQCeQCeQCcQCnQBs12mallocNoSyncMFNbmkKmxC8TypeInfoZPvS_DQEgQEgQEeQEp10mallocTimelS_DQFiQFiQFgQFr10numMallocslTmTkTmTxQCzZQFcMFNbKmKkKmKxQDsZQDl (in /home/ritaka/regex-leak/simple)
==26749==    by 0x196E04: _DThn16_2gc4impl12conservativeQw14ConservativeGC6mallocMFNbmkxC8TypeInfoZPv (in /home/ritaka/regex-leak/simple)
==26749==    by 0x129A46: _D3std5regex8internal12backtracking__T19BacktrackingMatcherTaTSQCjQCiQCf2ir__T5InputTaZQjZQCa10fwdMatcherMFNaNbNeKxSQEkQEjQEgQCb__T5RegexTaZQjAvZCQFoQFnQFkQFe__TQEtTaTQEbZQFd (in /home/ritaka/regex-leak/simple)
==26749==    by 0x129254: _D3std5regex__T11ctRegexImplVAyaa9_5e283f215c2e296124VQzA0Z4funcFNeCQCoQCn8internal12backtracking__T19BacktrackingMatcherTaTSQEtQEsQCf2ir__T5InputTaZQjZQCaZb (in /home/ritaka/regex-leak/simple)
==26749==    by 0x129B4D: _D3std5regex8internal12backtracking__T19BacktrackingMatcherTaTSQCjQCiQCf2ir__T5InputTaZQjZQCa9matchImplMFNeZi (in /home/ritaka/regex-leak/simple)
==26749==    by 0x13A07B: _D3std5regex8internal12backtracking__T19BacktrackingMatcherTaTSQCjQCiQCf2ir__T5InputTaZQjZQCa5matchMFNeASQDzQDyQDvQBq__T5GroupTmZQjZi (in /home/ritaka/regex-leak/simple)
==26749==    by 0x1544F1: _D3std5regex__T10RegexMatchTAyaZQr__T6__ctorTSQBsQBr__T14CTRegexWrapperTaZQtZQBoMFNcNeQCgQBsZ__T9__lambda3TASQEdQEc8internal2ir__T5GroupTmZQjZQBuMFQBoZv (in /home/ritaka/regex-leak/simple)
==26749==    by 0x153F0B: _D3std5regex8internal2ir__T15SmallFixedArrayTSQBsQBrQBoQBi__T5GroupTmZQjVki3ZQBy6mutateMFMDFAQBwZvZv (in /home/ritaka/regex-leak/simple)
==26749==    by 0x153DBC: _D3std5regex__T10RegexMatchTAyaZQr__T6__ctorTSQBsQBr__T14CTRegexWrapperTaZQtZQBoMFNcNeQCgQBsZSQDoQDn__TQDkTQDbZQDs (in /home/ritaka/regex-leak/simple)
==26749==    by 0x128F41: _D3std5regex__T5matchTAyaTSQzQx__T14CTRegexWrapperTaZQtZQBpFNfQBoQBnZSQCqQCp__T10RegexMatchTQCsZQr (in /home/ritaka/regex-leak/simple)
==26749==    by 0x128E95: _D6simple1fFZv (in /home/ritaka/regex-leak/simple)
==26749==    by 0x128FA8: _Dmain (in /home/ritaka/regex-leak/simple)
==26749==    by 0x183D2F: _D2rt6dmain212_d_run_main2UAAamPUQgZiZ6runAllMFZv (in /home/ritaka/regex-leak/simple)
==26749==    by 0x183C1E: _d_run_main2 (in /home/ritaka/regex-leak/simple)
==26749==    by 0x183A7D: _d_run_main (in /home/ritaka/regex-leak/simple)
==26749==    by 0x153C34: main (in /home/ritaka/regex-leak/simple)
==26749==  Uninitialised value was created by a stack allocation
==26749==    at 0x17EA40: thread_scanAll (in /home/ritaka/regex-leak/simple)
==26749== 
==26749== 
==26749== HEAP SUMMARY:
==26749==     in use at exit: 82,040 bytes in 4 blocks
==26749==   total heap usage: 558 allocs, 554 frees, 351,784 bytes allocated
==26749== 
==26749== 32 bytes in 1 blocks are possibly lost in loss record 2 of 4
==26749==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==26749==    by 0x193C3A: _D2gc4impl12conservativeQw10initializeFZC4coreQBs11gcinterface2GC (in /home/ritaka/regex-leak/simple)
==26749==    by 0x191569: _D4core2gc8registry16createGCInstanceFAyaZCQBpQBn11gcinterface2GC (in /home/ritaka/regex-leak/simple)
==26749==    by 0x17F2CC: gc_init (in /home/ritaka/regex-leak/simple)
==26749==    by 0x17F355: gc_init_nothrow (in /home/ritaka/regex-leak/simple)
==26749==    by 0x19C8C0: _DThn16_2gc4impl5protoQo7ProtoGC6mallocMFNbmkxC8TypeInfoZPv (in /home/ritaka/regex-leak/simple)
==26749==    by 0x129A46: _D3std5regex8internal12backtracking__T19BacktrackingMatcherTaTSQCjQCiQCf2ir__T5InputTaZQjZQCa10fwdMatcherMFNaNbNeKxSQEkQEjQEgQCb__T5RegexTaZQjAvZCQFoQFnQFkQFe__TQEtTaTQEbZQFd (in /home/ritaka/regex-leak/simple)
==26749==    by 0x129254: _D3std5regex__T11ctRegexImplVAyaa9_5e283f215c2e296124VQzA0Z4funcFNeCQCoQCn8internal12backtracking__T19BacktrackingMatcherTaTSQEtQEsQCf2ir__T5InputTaZQjZQCaZb (in /home/ritaka/regex-leak/simple)
==26749==    by 0x129B4D: _D3std5regex8internal12backtracking__T19BacktrackingMatcherTaTSQCjQCiQCf2ir__T5InputTaZQjZQCa9matchImplMFNeZi (in /home/ritaka/regex-leak/simple)
==26749==    by 0x13A07B: _D3std5regex8internal12backtracking__T19BacktrackingMatcherTaTSQCjQCiQCf2ir__T5InputTaZQjZQCa5matchMFNeASQDzQDyQDvQBq__T5GroupTmZQjZi (in /home/ritaka/regex-leak/simple)
==26749==    by 0x1544F1: _D3std5regex__T10RegexMatchTAyaZQr__T6__ctorTSQBsQBr__T14CTRegexWrapperTaZQtZQBoMFNcNeQCgQBsZ__T9__lambda3TASQEdQEc8internal2ir__T5GroupTmZQjZQBuMFQBoZv (in /home/ritaka/regex-leak/simple)
==26749==    by 0x153F0B: _D3std5regex8internal2ir__T15SmallFixedArrayTSQBsQBrQBoQBi__T5GroupTmZQjVki3ZQBy6mutateMFMDFAQBwZvZv (in /home/ritaka/regex-leak/simple)
==26749==    by 0x153DBC: _D3std5regex__T10RegexMatchTAyaZQr__T6__ctorTSQBsQBr__T14CTRegexWrapperTaZQtZQBoMFNcNeQCgQBsZSQDoQDn__TQDkTQDbZQDs (in /home/ritaka/regex-leak/simple)
==26749==    by 0x128F41: _D3std5regex__T5matchTAyaTSQzQx__T14CTRegexWrapperTaZQtZQBpFNfQBoQBnZSQCqQCp__T10RegexMatchTQCsZQr (in /home/ritaka/regex-leak/simple)
==26749==    by 0x128E95: _D6simple1fFZv (in /home/ritaka/regex-leak/simple)
==26749==    by 0x128FA8: _Dmain (in /home/ritaka/regex-leak/simple)
==26749==    by 0x183D2F: _D2rt6dmain212_d_run_main2UAAamPUQgZiZ6runAllMFZv (in /home/ritaka/regex-leak/simple)
==26749==    by 0x183C1E: _d_run_main2 (in /home/ritaka/regex-leak/simple)
==26749==    by 0x183A7D: _d_run_main (in /home/ritaka/regex-leak/simple)
==26749==    by 0x153C34: main (in /home/ritaka/regex-leak/simple)
==26749== 
==26749== 81,936 bytes in 1 blocks are definitely lost in loss record 4 of 4
==26749==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==26749==    by 0x1291F4: _D3std5regex__T11ctRegexImplVAyaa9_5e283f215c2e296124VQzA0Z4funcFNeCQCoQCn8internal12backtracking__T19BacktrackingMatcherTaTSQEtQEsQCf2ir__T5InputTaZQjZQCaZb (in /home/ritaka/regex-leak/simple)
==26749==    by 0x129B4D: _D3std5regex8internal12backtracking__T19BacktrackingMatcherTaTSQCjQCiQCf2ir__T5InputTaZQjZQCa9matchImplMFNeZi (in /home/ritaka/regex-leak/simple)
==26749==    by 0x13A07B: _D3std5regex8internal12backtracking__T19BacktrackingMatcherTaTSQCjQCiQCf2ir__T5InputTaZQjZQCa5matchMFNeASQDzQDyQDvQBq__T5GroupTmZQjZi (in /home/ritaka/regex-leak/simple)
==26749==    by 0x1544F1: _D3std5regex__T10RegexMatchTAyaZQr__T6__ctorTSQBsQBr__T14CTRegexWrapperTaZQtZQBoMFNcNeQCgQBsZ__T9__lambda3TASQEdQEc8internal2ir__T5GroupTmZQjZQBuMFQBoZv (in /home/ritaka/regex-leak/simple)
==26749==    by 0x153F0B: _D3std5regex8internal2ir__T15SmallFixedArrayTSQBsQBrQBoQBi__T5GroupTmZQjVki3ZQBy6mutateMFMDFAQBwZvZv (in /home/ritaka/regex-leak/simple)
==26749==    by 0x153DBC: _D3std5regex__T10RegexMatchTAyaZQr__T6__ctorTSQBsQBr__T14CTRegexWrapperTaZQtZQBoMFNcNeQCgQBsZSQDoQDn__TQDkTQDbZQDs (in /home/ritaka/regex-leak/simple)
==26749==    by 0x128F41: _D3std5regex__T5matchTAyaTSQzQx__T14CTRegexWrapperTaZQtZQBpFNfQBoQBnZSQCqQCp__T10RegexMatchTQCsZQr (in /home/ritaka/regex-leak/simple)
==26749==    by 0x128E95: _D6simple1fFZv (in /home/ritaka/regex-leak/simple)
==26749==    by 0x128FA8: _Dmain (in /home/ritaka/regex-leak/simple)
==26749==    by 0x183D2F: _D2rt6dmain212_d_run_main2UAAamPUQgZiZ6runAllMFZv (in /home/ritaka/regex-leak/simple)
==26749==    by 0x183C1E: _d_run_main2 (in /home/ritaka/regex-leak/simple)
==26749==    by 0x183A7D: _d_run_main (in /home/ritaka/regex-leak/simple)
==26749==    by 0x153C34: main (in /home/ritaka/regex-leak/simple)
==26749== 
==26749== LEAK SUMMARY:
==26749==    definitely lost: 81,936 bytes in 1 blocks
==26749==    indirectly lost: 0 bytes in 0 blocks
==26749==      possibly lost: 32 bytes in 1 blocks
==26749==    still reachable: 72 bytes in 2 blocks
==26749==         suppressed: 0 bytes in 0 blocks
==26749== Reachable blocks (those to which a pointer was found) are not shown.
==26749== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==26749== 
==26749== For counts of detected and suppressed errors, rerun with: -v
==26749== ERROR SUMMARY: 473 errors from 3 contexts (suppressed: 0 from 0)
```

Is this a bug in the regex library?

--
Mar 16