digitalmars.D - Looking for a new maintainer for std.uni/std.regex

Dmitry Olshansky <dmitry.olsh gmail.com> writes:
Time flies by, and my work on D's std library halted a long 
time ago, mostly due to personal health issues.

Since lots of people ask what they can do to help push the D language 
forward, I thought one great way is to take on the responsibility 
for std modules that have lost their maintainers.

In particular, I am willing to guide a volunteer into the low-level 
pits of std.regex and std.uni and hopefully let him or her 
continue the work I once envisioned for them, or maybe choose a 
different track of evolution altogether. Anyhow, I'm willing to 
spend the time to transfer the knowledge so that at minimum there 
is someone more active than me to hold the line. std.regex is 
a 2011 product with all of the language bugs and quirks of that time; 
std.uni is from 2012 and pretty much in the same position.

Anyway reply to this message or mail me

dmitry at olshansky dot me

--
Dmitry Olshansky
Aug 24 2022
user1234 <user1234 12.de> writes:
On Wednesday, 24 August 2022 at 18:20:53 UTC, Dmitry Olshansky 
wrote:
> Time flies by and my work on D's std library has halted a long 
> time ago mostly due to personal health issues.
>
> [...]

courage bro.
Oct 01 2022
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Wednesday, 24 August 2022 at 18:20:53 UTC, Dmitry Olshansky 
wrote:
> Time flies by and my work on D's std library has halted a long 
> time ago mostly due to personal health issues.
>
> [...]

I'll take a look. If I survive more than 15 minutes I'll let you know
Oct 02 2022
Walter Bright <newshound2 digitalmars.com> writes:
On 10/2/2022 3:14 PM, Imperatorn wrote:
> On Wednesday, 24 August 2022 at 18:20:53 UTC, Dmitry Olshansky wrote:
>> Time flies by and my work on D's std library has halted a long time ago mostly 
>> due to personal health issues.
>
> I'll take a look. If I survive more than 15 minutes I'll let you know

I doubt Dmitry is looking at this thread 6 weeks later. Probably best to email him. Thanks for picking up the flag!
Oct 02 2022
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Monday, 3 October 2022 at 00:13:54 UTC, Walter Bright wrote:
> On 10/2/2022 3:14 PM, Imperatorn wrote:
>> I'll take a look. If I survive more than 15 minutes I'll let you know
>
> I doubt Dmitry is looking at this thread 6 weeks later. Probably best to email him. Thanks for picking up the flag!

Oops, I missed that 😅
Oct 03 2022
monkyyy <crazymonkyyy gmail.com> writes:
On Sunday, 2 October 2022 at 22:14:49 UTC, Imperatorn wrote:
> I'll take a look. If I survive more than 15 minutes I'll let you know

How bad was it?
Oct 03 2022
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Monday, 3 October 2022 at 22:15:18 UTC, monkyyy wrote:
> On Sunday, 2 October 2022 at 22:14:49 UTC, Imperatorn wrote:
>> I'll take a look. If I survive more than 15 minutes I'll let you know
>
> How bad was it?

Unfortunately I died
Oct 04 2022
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Tuesday, 4 October 2022 at 07:58:57 UTC, Imperatorn wrote:
> On Monday, 3 October 2022 at 22:15:18 UTC, monkyyy wrote:
>> How bad was it?
>
> Unfortunately I died

Sorry for not picking up on this earlier. Anyhow, what was it that you found the most appalling?
Oct 31 2022
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Monday, 31 October 2022 at 16:33:49 UTC, Dmitry Olshansky 
wrote:
> On Tuesday, 4 October 2022 at 07:58:57 UTC, Imperatorn wrote:
>> Unfortunately I died
>
> Sorry for not picking up on this earlier. Anyhow, what was it that you found the most appalling?

Hehe, I didn't die. I'm just exploring it atm 😅
Oct 31 2022
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Monday, 31 October 2022 at 16:33:49 UTC, Dmitry Olshansky 
wrote:
> On Tuesday, 4 October 2022 at 07:58:57 UTC, Imperatorn wrote:
>> Unfortunately I died
>
> Sorry for not picking up on this earlier. Anyhow, what was it that you found the most appalling?

If I find more time, I will actually look into it. My current situation just does not allow me to spend that much time :(
Nov 01 2022
rikki cattermole <rikki cattermole.co.nz> writes:
Hey,

As you know, I'm currently working on the table generator trying to get 
that into Phobos, and it's been a fair amount of work for what should 
have already been working (oh wells).

I did have to recreate some logic for the symbols toUpperSimpleIndex and 
friends.

My general feeling is I'm missing something as I'm getting:

https://dev.azure.com/dlanguage/Phobos/_build/results?buildId=33558&view=logs&j=4fbced83-508e-5fe0-c978-5c71ec0fc506&t=efea9dc6-8b7a-5cfd-995a-4727b0e8449d&l=4640

I did that logic by hand, I'm pretty certain it should be working, my 
suspicion is you had it do the decomposing as well. I could do with some 
pointers on what is probably missing for [...] never 
made it into the repo for the generator.
Nov 01 2022
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Tuesday, 1 November 2022 at 10:22:17 UTC, rikki cattermole 
wrote:
> Hey,
>
> As you know, I'm currently working on the table generator 
> trying to get that into Phobos, and it's been a fair amount of 
> work for what should have already been working (oh wells).

Totally understand the feeling. Keep in mind that I'm in the same position today.

> I did have to recreate some logic for the symbols 
> toUpperSimpleIndex and friends.

When I tried to restore the tables before, I opted to remove them; not sure if it was the right move. https://github.com/dlang/phobos/pull/7469

> My general feeling is I'm missing something as I'm getting:
>
> https://dev.azure.com/dlanguage/Phobos/_build/results?buildId=33558&view=logs&j=4fbced83-508e-5fe0-c978-5c71ec0fc506&t=efea9dc6-8b7a-5cfd-995a-4727b0e8449d&l=4640

Cannot easily decipher what's blowing up there. Seems to be this:

Error: `assert(cmp(s2, "I i\xcc\x87") == 0)` failed

I'm missing the context, but it looks like this uses the title case tables, which are not the same as simple case folding. Since you didn't fiddle with title case, it would be strange for that to break.

> I did that logic by hand, I'm pretty certain it should be 
> working, my suspicion is you had it do the decomposing as well. 
> I could do with some pointers on what is probably missing for 
> [...] generator.

So the issue is with simple case folding vs full case folding. Simple case folding is used mostly in sicmp (simple insensitive cmp), so it should be well confined. In any case I'm happy to help with restoring the generator; could you point me to your repo so I can help you figure out what might be missing?
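
For readers unfamiliar with the distinction, here is a minimal sketch of simple vs full case folding as exposed by std.uni. It assumes only the documented `sicmp` and `icmp` functions; the sample strings are illustrative and are not taken from the generator or from the failing test.

```d
import std.uni : icmp, sicmp;

void main()
{
    // Simple case folding maps one code point to one code point,
    // so ordinary upper/lower pairs compare equal under both functions.
    assert(sicmp("Август", "авгусТ") == 0);
    assert(icmp("Август", "авгусТ") == 0);

    // Full case folding may expand one code point into several:
    // U+00DF (ß) folds to "ss". Only icmp applies such 1:M mappings.
    assert(icmp("Rußland", "Russland") == 0);
    assert(sicmp("Rußland", "Russland") != 0);
}
```

The 1:M mappings are the part that needs multi-code-point table entries, which is why the simple and full tables are kept apart.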
Nov 01 2022
rikki cattermole <rikki cattermole.co.nz> writes:
On 02/11/2022 12:01 AM, Dmitry Olshansky wrote:
> On Tuesday, 1 November 2022 at 10:22:17 UTC, rikki cattermole wrote:
>> Hey,
>>
>> As you know, I'm currently working on the table generator trying to 
>> get that into Phobos, and it's been a fair amount of work for what 
>> should have already been working (oh wells).
>
> Totally understand the feeling. Keep in mind that I'm in the same position today.

I chose not to raise it with you because 2014 was a long time ago, no chance you still have it ;) But I am on my second day, so I'm hoping you'll at least know what I need to do.

>> I did have to recreate some logic for the symbols toUpperSimpleIndex 
>> and friends.
>
> When I tried to restore the tables before, I opted to remove them; not sure if it was the right move. https://github.com/dlang/phobos/pull/7469

>> My general feeling is I'm missing something as I'm getting:
>>
>> https://dev.azure.com/dlanguage/Phobos/_build/results?buildId=33558&view=logs&j=4fbced83-508e-5fe0-c978-5c71ec0fc506&t=efea9dc6-8b7a-5cfd-995a-4727b0e8449d&l=4640
>
> Cannot easily decipher what's blowing up there. Seems to be this: Error: `assert(cmp(s2, "I i\xcc\x87") == 0)` failed. I'm missing the context, but it looks like this uses the title case tables, which are not the same as simple case folding. Since you didn't fiddle with title case, it would be strange for that to break.

Yes, this has something to do with casing rather than case folding. It does need SpecialCasing.txt. The check in question shouldn't have anything to do with case folding, as it is ``cmp``.

>> I did that logic by hand, I'm pretty certain it should be working, my 
>> suspicion is you had it do the decomposing as well. I could do with 
>> [...] never made it into the repo for the generator.
>
> So the issue is with simple case folding vs full case folding. Simple case folding is used mostly in sicmp (simple insensitive cmp), so it should be well confined. In any case I'm happy to help with restoring the generator; could you point me to your repo so I can help you figure out what might be missing?

https://github.com/rikkimax/phobos/blob/unicode_tables/std/internal/unicode_table_generator.d#L575
https://github.com/rikkimax/phobos/blob/unicode_tables/std/internal/unicode_table_generator.d#L868
Nov 01 2022
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Tuesday, 1 November 2022 at 11:11:05 UTC, rikki cattermole 
wrote:
> On 02/11/2022 12:01 AM, Dmitry Olshansky wrote:
>> In any case I'm happy to help with restoring the generator, 
>> could you point me to your repo so I can help you figure out 
>> what might be missing?
>
> https://github.com/rikkimax/phobos/blob/unicode_tables/std/internal/unicode_table_generator.d#L575
> https://github.com/rikkimax/phobos/blob/unicode_tables/std/internal/unicode_table_generator.d#L868

Okay, that's a start) Just in case: I pushed an important tiny fix to my repo that you should absolutely take. It has to do with property name aliases being incorrectly swapped.

In general, I think we should take this discussion to some messenger so as to not flood the forums. I recall you had Discord or something.
Nov 01 2022
rikki cattermole <rikki cattermole.co.nz> writes:
On 02/11/2022 1:21 AM, Dmitry Olshansky wrote:
> Okay, that's a start) Just in case: I pushed an important tiny fix to my 
> repo that you should absolutely take. It has to do with property name 
> aliases being incorrectly swapped.

Yeah, already grabbed it as soon as I saw it.

> In general, I think we should take this discussion to some messenger so 
> as to not flood the forums. I recall you had Discord or something.

Yeah Discord works.
Nov 01 2022
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Tuesday, 1 November 2022 at 12:27:13 UTC, rikki cattermole 
wrote:
> On 02/11/2022 1:21 AM, Dmitry Olshansky wrote:
>> In general, I think we should take this discussion to some 
>> messenger so as to not flood the forums. I recall you had 
>> Discord or something.
>
> Yeah Discord works.

Please elaborate on how I can find you there; it's been ages since I used it.
Nov 01 2022
rikki cattermole <rikki cattermole.co.nz> writes:
On 02/11/2022 1:40 AM, Dmitry Olshansky wrote:
> On Tuesday, 1 November 2022 at 12:27:13 UTC, rikki cattermole wrote:
>> Yeah Discord works.
>
> Please elaborate on how I can find you there; it's been ages since I used it.

For future reference: dlang.org -> community -> Community Discord https://discord.gg/bMZk9Q4
Nov 01 2022
Hipreme <msnmancini hotmail.com> writes:
On Wednesday, 24 August 2022 at 18:20:53 UTC, Dmitry Olshansky 
wrote:
> Time flies by and my work on D's std library has halted a long 
> time ago mostly due to personal health issues.
>
> [...]

The greatest bug in std.regex is that it is too slow to compile. Do you have any idea what the cause could be right now? Are you looking for fixes or an entire rework of it?
Nov 02 2022
rikki cattermole <rikki cattermole.co.nz> writes:
On 03/11/2022 12:20 AM, Hipreme wrote:
> The greatest bug in std.regex is that it is too slow to compile. Do you 
> have any idea what the cause could be right now? Are you looking for fixes 
> or an entire rework of it?

A feature that is known to have been useless is ctRegex; it needs to be deprecated. Perhaps that'll help things once removed?
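
For context, a minimal sketch of the two entry points under discussion, assuming only the documented std.regex API (`regex`, `ctRegex`, `matchFirst`). The pattern and input are made up for illustration. Both calls match identically; they differ in when the pattern is compiled.

```d
import std.regex : ctRegex, matchFirst, regex;

void main()
{
    // The pattern is parsed and compiled when the program runs.
    auto runtimeRe = regex(`(\d+)-(\d+)`);

    // The pattern is processed by the D compiler via CTFE, which
    // generates a specialized matcher at compile time.
    auto staticRe = ctRegex!(`(\d+)-(\d+)`);

    auto m1 = matchFirst("range 10-20", runtimeRe);
    auto m2 = matchFirst("range 10-20", staticRe);

    // Both engines report the same captures.
    assert(m1[1] == "10" && m1[2] == "20");
    assert(m2[1] == "10" && m2[2] == "20");
}
```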
Nov 02 2022
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Wednesday, 2 November 2022 at 11:34:11 UTC, rikki cattermole 
wrote:
> On 03/11/2022 12:20 AM, Hipreme wrote:
>> The greatest bug in std.regex is that it is too slow to compile. 
>> Do you have any idea what the cause could be right now? Are you 
>> looking for fixes or an entire rework of it?
>
> A feature that is known to have been useless is ctRegex; it needs to be deprecated. Perhaps that'll help things once removed?

Yeah, that should be removed
Nov 02 2022
"H. S. Teoh" <hsteoh qfbox.info> writes:
On Thu, Nov 03, 2022 at 12:34:11AM +1300, rikki cattermole via Digitalmars-d
wrote:
> On 03/11/2022 12:20 AM, Hipreme wrote:
>> The greatest bug in std.regex is that it is too slow to compile. Do
>> you have any idea what the cause could be right now? Are you looking for
>> fixes or an entire rework of it?
>
> A feature that is known to have been useless is ctRegex; it needs to be deprecated. Perhaps that'll help things once removed?

While ctRegex probably should be removed, I don't think that's the problem. Even when you don't use ctRegex, using regex() alone slows down compile times by 2-3 seconds. I think it may be the excessive use of nested templates / CTFE deep inside std.regex's internal implementation.

I'm not sure if this can be fixed without rewriting from scratch (which we don't want to do -- that would be too big of an effort), but perhaps some careful profiling of the compiler might help pinpoint the most egregious parts of the code that could be improved.


T

-- 
Designer clothes: how to cover less by paying more.
Nov 02 2022
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Wednesday, 2 November 2022 at 15:32:56 UTC, H. S. Teoh wrote:
> On Thu, Nov 03, 2022 at 12:34:11AM +1300, rikki cattermole via 
> Digitalmars-d wrote:
>> A feature that is known to have been useless is ctRegex; it needs to be deprecated.

I guess CTFEing big tables didn’t work, since it’s been 10 years and we are exactly where it started: a proof of concept that is incredibly slow to compile, with minor speed gains at run-time.

>> Perhaps that'll help things once removed?
>
> While ctRegex probably should be removed, I don't think that's the problem. Even when you don't use ctRegex, using regex() alone slows down compile times by 2-3 seconds. I think it may be the excessive use of nested templates / CTFE deep inside std.regex's internal implementation.

Regex is fairly simple in its use of templates: the whole thing is templated on Char, which is hardly a big problem considering that Phobos is made of templates.

> I'm not sure if this can be fixed without rewriting from 
> scratch (which we don't want to do -- that would be too big of 
> an effort), but perhaps some careful profiling of the compiler 
> might help pinpoint the most egregious parts of the code that 
> could be improved.

I do not think one needs to go that deep; just look for immutable globals, since that’s where the CTFE is, which is a synonym for slow.
Nov 02 2022
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Wednesday, 2 November 2022 at 18:57:27 UTC, Dmitry Olshansky 
wrote:
> On Wednesday, 2 November 2022 at 15:32:56 UTC, H. S. Teoh wrote:
>> I'm not sure if this can be fixed without rewriting from scratch (which we don't want to do -- that would be too big of an effort), but perhaps some careful profiling of the compiler might help pinpoint the most egregious parts of the code that could be improved.
>
> I do not think one needs to go that deep; just look for immutable globals, since that’s where the CTFE is, which is a synonym for slow.

Like this thingie: https://github.com/dlang/phobos/blob/bf3ff35b8f1d40cb70a7584a563dc731a2c3ddad/std/regex/internal/ir.d#L52
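
To illustrate the pattern being described, here is a minimal sketch contrasting an immutable global initialized through CTFE with a lazily built equivalent. `buildTable`, `eagerTable` and `lazyTable` are hypothetical names invented for this example; none of this is the actual std.regex code, and the thread-local cache is just one assumed way to defer the work.

```d
// Hypothetical stand-in for an expensive table builder; not from std.regex.
uint[] buildTable() pure nothrow @safe
{
    auto table = new uint[](256);
    foreach (i, ref entry; table)
        entry = cast(uint)(i * i);
    return table;
}

// Eager: a static initializer must be a compile-time constant, so the
// compiler runs buildTable() through CTFE when it compiles this module.
static immutable uint[] eagerTable = buildTable();

// Lazy: nothing is evaluated at compile time. Each thread builds its own
// copy on first use, because function-level statics are thread-local in D.
immutable(uint)[] lazyTable() nothrow @safe
{
    static immutable(uint)[] cache;
    if (cache is null)
        cache = buildTable().idup;
    return cache;
}

void main()
{
    assert(eagerTable[16] == 256);
    assert(lazyTable()[16] == 256);
    assert(lazyTable() is lazyTable()); // built once per thread, then reused
}
```

The lazy variant trades a one-time run-time cost per thread for less compile-time work; sharing a single table across threads would need synchronization, which this sketch leaves out.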
Nov 02 2022
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Wednesday, 2 November 2022 at 11:20:52 UTC, Hipreme wrote:
> On Wednesday, 24 August 2022 at 18:20:53 UTC, Dmitry Olshansky 
> wrote:
>> Time flies by and my work on D's std library has halted a long 
>> time ago mostly due to personal health issues.
>>
>> [...]
>
> The greatest bug in std.regex is that it is too slow to compile. Do you have any idea what the cause could be right now?

Basically this most likely has to do with static immutable tables initialized at compile-time and hence invoking heavy CTFE. Lazy initialization could be an option. Again, someone has to look into it to be certain.

> Are you looking for fixes or an entire rework of it?

What would be the point of an entire rework? It took about 4 months to write it from scratch, plus the bug fixes found after that. If anything, I had a plan to support JITing regexes, which should work faster than CTFE static regex codegen while avoiding the compile-time penalty. Even that doesn’t require an entire rework; in fact it would fit nicely into what is there.
Nov 02 2022
rikki cattermole <rikki cattermole.co.nz> writes:
On 03/11/2022 7:46 AM, Dmitry Olshansky wrote:
>> Are you looking for fixes or an entire rework of it?
>
> What would be the point of an entire rework? It took about 4 months to write it from scratch, plus the bug fixes found after that. If anything, I had a plan to support JITing regexes, which should work faster than CTFE static regex codegen while avoiding the compile-time penalty. Even that doesn’t require an entire rework; in fact it would fit nicely into what is there.

One of the libraries I listed from day one that ImportC had to support is sljit. They have their own regex implementation which of course is JIT'd. It would be a good candidate to be included in Phobos.

https://github.com/zherczeg/sljit
Nov 02 2022
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Thursday, 3 November 2022 at 05:39:17 UTC, rikki cattermole 
wrote:
> One of the libraries I listed from day one that ImportC had to support is sljit. They have their own regex implementation which of course is JIT'd. It would be a good candidate to be included in Phobos. https://github.com/zherczeg/sljit

Having a JIT in Phobos would be fantastic. On the other hand, if doing it in std is not a requirement, a regex dub package that depends on e.g. this JIT library should work as well.
Nov 03 2022
Dave P. <dave287091 gmail.com> writes:
On Thursday, 3 November 2022 at 05:39:17 UTC, rikki cattermole 
wrote:
> One of the libraries I listed from day one that ImportC had to support is sljit. They have their own regex implementation which of course is JIT'd. It would be a good candidate to be included in Phobos. https://github.com/zherczeg/sljit

Uses inline assembly so it’s pretty unlikely.
Nov 03 2022
rikki cattermole <rikki cattermole.co.nz> writes:
On 04/11/2022 5:58 AM, Dave P. wrote:
> Uses inline assembly so it’s pretty unlikely.

We could upstream disables and use our own implementation to replace it. It shouldn't be a problem.
Nov 03 2022