digitalmars.D.learn - regex error

reply Jabba Laci <jabba.laci gmail.com> writes:
I'm working on an Advent of Code problem (2015, Day 5, Part 2), 
and my code doesn't work in D. In Python I get the correct result.

Here is a string:

     const s = "xdwduffwgcptfwad";

The instructions say: "It contains a pair of any two letters 
that appears at least twice in the string without overlapping, 
like xyxy (xy) or aabcdefgaa (aa), but not like aaa (aa, but it 
overlaps)."

Here, "fw" appears twice. My D code:

     auto m1 = matchFirst(s, regex(r"(..).*\1"));

returns an empty m1 object. What am I doing wrong? The same regex 
works in Python.
Nov 23
next sibling parent Paul Backus <snarwin gmail.com> writes:
On Sunday, 23 November 2025 at 12:43:05 UTC, Jabba Laci wrote:
     auto m1 = matchFirst(s, regex(r"(..).*\1"));

 returns an empty m1 object. What am I doing wrong? The same 
 regex works in Python.
This is a bug, first reported in 2015: https://issues.dlang.org/show_bug.cgi?id=15489
Nov 23
prev sibling next sibling parent Sergey <kornburn yandex.ru> writes:
On Sunday, 23 November 2025 at 12:43:05 UTC, Jabba Laci wrote:
 returns an empty m1 object. What am I doing wrong? The same 
 regex works in Python.
Nothing is wrong with the code. I think the same AoC problem already led to an earlier report :) https://forum.dlang.org/thread/lqjffwcpzayznqljxsuu forum.dlang.org
Nov 23
prev sibling next sibling parent reply user1234 <user1234 12.de> writes:
On Sunday, 23 November 2025 at 12:43:05 UTC, Jabba Laci wrote:
 [...]
 What am I doing wrong? The same regex works in Python.
Nothing. You've hit a documented issue; see https://github.com/dlang/phobos/issues/10152.
Nov 23
parent reply Jabba Laci <jabba.laci gmail.com> writes:
On Sunday, 23 November 2025 at 14:36:44 UTC, user1234 wrote:
 Nothing, you've hit a documented issue see 
 https://github.com/dlang/phobos/issues/10152.
Thanks. I added a comment here: https://github.com/dlang/phobos/issues/10152

I understand that this is an open source project, but it's very strange that this bug hasn't been fixed for 10 years. It doesn't give a good impression about the future of the language.
Nov 23
next sibling parent monkyyy <crazymonkyyy gmail.com> writes:
On Sunday, 23 November 2025 at 14:46:51 UTC, Jabba Laci wrote:
 On Sunday, 23 November 2025 at 14:36:44 UTC, user1234 wrote:
 Nothing, you've hit a documented issue see 
 https://github.com/dlang/phobos/issues/10152.
 Thanks. I added a comment here: https://github.com/dlang/phobos/issues/10152
 I understand that this is an open source project, but it's very strange that this bug hasn't been fixed for 10 years. It doesn't give a good impression about the future of the language.
As much as I criticize documenting bugs via technically correct indirect declarations like "using Thompson NFA matching scheme", or 300 pages of arguments about what is correct by referencing a spec, and a fuck ton of complexity rather than just working, perfect coverage of regex edge cases would not make my list of priorities for important libs. Stds should be for data structures and algorithms; let us judge the health of the language by how well that's going.
Nov 23
prev sibling next sibling parent reply Sergey <kornburn yandex.ru> writes:
On Sunday, 23 November 2025 at 14:46:51 UTC, Jabba Laci wrote:
 I understand that this is an open source project, but it's very 
 strange that this bug hasn't been fixed for 10 years. It 
 doesn't give a good impression about the future of the language.
Nothing to do with the language (even though it's not production ready in any case). It's Phobos functionality. The author of std.regex is busy with other things. He also has some concepts for a next-gen regex engine, though it's not ready.
Nov 23
parent Mike Parker <aldacron gmail.com> writes:
On Sunday, 23 November 2025 at 15:04:53 UTC, Sergey wrote:
 On Sunday, 23 November 2025 at 14:46:51 UTC, Jabba Laci wrote:
 I understand that this is an open source project, but it's 
 very strange that this bug hasn't been fixed for 10 years. It 
 doesn't give a good impression about the future of the 
 language.
 Nothing to do with language (even though it's not production ready in any case).
I guess someone had better tell the people using it in production.
Nov 23
prev sibling next sibling parent user1234 <user1234 12.de> writes:
On Sunday, 23 November 2025 at 14:46:51 UTC, Jabba Laci wrote:
 On Sunday, 23 November 2025 at 14:36:44 UTC, user1234 wrote:
 Nothing, you've hit a documented issue see 
 https://github.com/dlang/phobos/issues/10152.
 Thanks. I added a comment here: https://github.com/dlang/phobos/issues/10152
 I understand that this is an open source project, but it's very strange that this bug hasn't been fixed for 10 years. It doesn't give a good impression about the future of the language.
I don't think this is a significant indication, given that it relates to a piece of sub-functionality that's rarely used. Also, it's very common for a programming language or its main library to have very old issues.
Nov 23
prev sibling next sibling parent Mike Parker <aldacron gmail.com> writes:
On Sunday, 23 November 2025 at 14:46:51 UTC, Jabba Laci wrote:
 On Sunday, 23 November 2025 at 14:36:44 UTC, user1234 wrote:
 Nothing, you've hit a documented issue see 
 https://github.com/dlang/phobos/issues/10152.
 Thanks. I added a comment here: https://github.com/dlang/phobos/issues/10152
 I understand that this is an open source project, but it's very strange that this bug hasn't been fixed for 10 years. It doesn't give a good impression about the future of the language.
Countless other bugs have been fixed in the meantime. The future of the language is not threatened by old ones. You'll find a lot of older ones still open because no one was able to or interested in fixing them at the time, and they haven't popped up as a priority since they were reported.

These days we have two Pull Request & Issue Managers who review new issues. We didn't have them in 2015. It's less likely today that issues will remain open so long.

And when you do encounter old bugs like this one, the thing to do is exactly what you've done here: bring it to our attention. If it's a straightforward fix, then it probably will be fixed. Though bear in mind that some old issues are still open because they were complicated to resolve. This one may be in that camp, given this comment from the author of the compile-time regex engine: 'You might have found a case where Thompson engine simply cannot produce the right result'. Regardless, I'll make sure our issue managers are aware of it.
Nov 23
prev sibling parent reply Dejan Lekic <dejan.lekic gmail.com> writes:
On Sunday, 23 November 2025 at 14:46:51 UTC, Jabba Laci wrote:
 I understand that this is an open source project, but it's very 
 strange that this bug hasn't been fixed for 10 years. It 
 doesn't give a good impression about the future of the language.
If it is only 10 years, then you should consider yourself lucky!
Nov 24
parent Kapendev <alexandroskapretsos gmail.com> writes:
On Monday, 24 November 2025 at 12:42:20 UTC, Dejan Lekic wrote:
 On Sunday, 23 November 2025 at 14:46:51 UTC, Jabba Laci wrote:
 I understand that this is an open source project, but it's 
 very strange that this bug hasn't been fixed for 10 years. It 
 doesn't give a good impression about the future of the 
 language.
 If it is only 10 years, then you should consider yourself lucky!
Let me translate this into gamer words. Godot has "bugs" and open issues that are more than 3 years old. People still use it.
Nov 24
prev sibling next sibling parent matheus <matheus gmail.com> writes:
On Sunday, 23 November 2025 at 12:43:05 UTC, Jabba Laci wrote:
 I'm working on an Advent of Code problem (2015, Day 5, Part 2), 
 ...
Other users have already established what's going on, and Paul Backus even replied with a solution in Bugzilla. Now I wonder whether, for this kind of problem (as with Project Euler), it is "OK" to just use built-in functions, or whether people should write the algorithm themselves. I'm not trying to make an excuse for the bug; I'm genuinely curious. Matheus.
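
For reference, writing the algorithm by hand for this particular rule could look roughly like the sketch below. It is not taken from any post in the thread or from the bug report. It assumes ASCII input (it slices the string by code unit), and `hasRepeatedPair` is a hypothetical name chosen for illustration.

```d
import std.algorithm.searching : canFind;

/// True if some two-letter pair occurs at least twice without overlapping.
/// Assumes ASCII input, because it slices by code unit.
bool hasRepeatedPair(string s)
{
    if (s.length < 4)
        return false; // two non-overlapping pairs need at least 4 letters

    foreach (i; 0 .. s.length - 3)
    {
        // Search only from i + 2 onward, so the second pair cannot overlap the first.
        if (s[i + 2 .. $].canFind(s[i .. i + 2]))
            return true;
    }
    return false;
}

unittest
{
    assert(hasRepeatedPair("xdwduffwgcptfwad")); // "fw" occurs twice
    assert(hasRepeatedPair("xyxy"));
    assert(hasRepeatedPair("aabcdefgaa"));
    assert(!hasRepeatedPair("aaa"));             // "aa" only overlaps itself
}
```

This avoids std.regex entirely, so it sidesteps the backreference issue at the cost of being specific to this one rule.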
Nov 23
prev sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Sunday, 23 November 2025 at 12:43:05 UTC, Jabba Laci wrote:
 I'm working on an Advent of Code problem (2015, Day 5, Part 2), 
 and my code doesn't work in D. In Python I get the correct 
 result.

 Here is a string:

     const s = "xdwduffwgcptfwad";

 The instructions says: "It contains a pair of any two letters 
 that appears at least twice in the string without overlapping, 
 like xyxy (xy) or aabcdefgaa (aa), but not like aaa (aa, but it 
 overlaps)."

 Here, "fw" appears twice. My D code:

     auto m1 = matchFirst(s, regex(r"(..).*\1"));

 returns an empty m1 object. What am I doing wrong? The same 
 regex works in Python.
Author of std.regex here. Indeed, it's unfortunate that the main engine doesn't support all cases of backreferences. It's been a while since I touched std, but the fix should be mostly straightforward: use simple backtracking for these cases. std.regex has backtracking, but I think it's also augmented with certain tricks to avoid exponential behaviour. Those need to be reverted; then std.regex could just select simple backtracking for patterns that have backreferences. -- Dmitry Olshansky
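
To illustrate what "simple backtracking" means for a pattern with a backreference, here is a minimal hand-rolled sketch for the single pattern `(..).*\1`. It is not std.regex code and does not reflect how Phobos is implemented; `matchPairBackref` is a hypothetical name, and the function assumes ASCII input. It tries each start position, captures two characters, then tries the greedy `.*` from its longest length down to zero until the captured text reappears.

```d
/// Hand-rolled backtracking for the one pattern (..).*\1.
/// Returns the leftmost match as a slice of s, or null when nothing matches.
/// Assumes ASCII input, because it slices by code unit.
string matchPairBackref(string s)
{
    foreach (start; 0 .. s.length)
    {
        if (start + 2 > s.length)
            break;                           // (..) needs two characters

        auto group = s[start .. start + 2];  // capture group 1
        auto rest = s[start + 2 .. $];

        // Greedy .* : try the longest gap first, then back off one character at a time.
        foreach_reverse (gap; 0 .. rest.length + 1)
        {
            // \1 must match the captured text right after the gap.
            if (gap + 2 <= rest.length && rest[gap .. gap + 2] == group)
                return s[start .. start + 2 + gap + 2];
        }
    }
    return null;
}

unittest
{
    assert(matchPairBackref("xdwduffwgcptfwad") == "fwgcptfw");
    assert(matchPairBackref("xyxy") == "xyxy");
    assert(matchPairBackref("aaa").length == 0); // the only "aa" pairs overlap
}
```

The inner loop is where the backtracking happens: each failed comparison against the captured group shrinks the `.*` gap and retries, which is the step the quoted post says a non-backtracking engine cannot always do.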
Nov 23