digitalmars.D - Looking for a new maintainer for std.uni/std.regex
- Dmitry Olshansky (17/17) Aug 24 2022 Time flies by and my work on D's std library has halted a long
- user1234 (3/21) Oct 01 2022 courage bro.
- Imperatorn (4/7) Oct 02 2022 I'll take a look. If I survive more than 15 minutes I'll let you
- Walter Bright (3/7) Oct 02 2022 I doubt Dmitry is looking at this thread 6 weeks later. Probably best to...
- Imperatorn (2/12) Oct 03 2022 Oops, I missed that 😅
- monkyyy (2/10) Oct 03 2022 How bad was it?
- Imperatorn (2/13) Oct 04 2022 Unfortunately I died
- Dmitry Olshansky (3/17) Oct 31 2022 Sorry for not picking on this earlier. Anyhow what was that you
- Imperatorn (3/21) Oct 31 2022 Hehe, I didn't die. I'm just exploring it atm 😅
- Imperatorn (4/22) Nov 01 2022 If I find more time, I will actually look into it. Just my
- rikki cattermole (12/12) Nov 01 2022 Hey,
- Dmitry Olshansky (18/31) Nov 01 2022 Totally understand the feeling. Keep in mind that I'm in the same
- rikki cattermole (10/51) Nov 01 2022 I chose not to raise it with you because 2014 was a long time ago, no
- Dmitry Olshansky (8/18) Nov 01 2022 Okay that's a start) Just in case - I pushed important tiny fix
- rikki cattermole (3/8) Nov 01 2022 Yeah Discord works.
- Dmitry Olshansky (4/13) Nov 01 2022 Please elaborate how do I find you there, been ages since I used
- rikki cattermole (3/17) Nov 01 2022 For future reference: dlang.org -> community -> Community Discord
- Hipreme (5/23) Nov 02 2022 The greatest bug on std.regex is it being too slow to compile, do
- rikki cattermole (3/6) Nov 02 2022 A feature that is known to have been useless is ctRegex, that needs to
- Imperatorn (3/10) Nov 02 2022 Yeah, that should be removed
- H. S. Teoh (12/19) Nov 02 2022 While ctRegex probably should be removed, I don't think that's the
- Dmitry Olshansky (9/30) Nov 02 2022 I guess CTFEing big tables didn’t work since it’s been 10 years
- Dmitry Olshansky (4/17) Nov 02 2022 Like this thingie:
- Dmitry Olshansky (11/40) Nov 02 2022 Basically this most likely has to do with static immutable tables
- rikki cattermole (6/14) Nov 02 2022 One of the libraries I listed from day one that ImportC had to support
- Dmitry Olshansky (5/21) Nov 03 2022 Having a JIT in Phobos would be fantastic. On the other hand if
- Dave P. (3/19) Nov 03 2022 Uses inline assembly so it’s pretty unlikely.
- rikki cattermole (3/4) Nov 03 2022 We could upstream disables and use our own implementation to replace it.
Time flies by and my work on D's std library halted a long time ago, mostly due to personal health issues. Since lots of people ask what they can do to help push the D language forward, I thought one great way is to take on the responsibility for std modules that have lost their maintainers. In particular, I am willing to guide a volunteer into the low-level pits of std.regex and std.uni and hopefully let him or her continue the work I once envisioned for them, or maybe choose a different track of evolution altogether. Anyhow, I'm willing to spend the time to transfer the knowledge so that at minimum there is someone more active than me to hold the line. std.regex is 2011's product with all of the language bugs and quirks of that time; std.uni is from 2012 and pretty much in the same position. Anyway, reply to this message or mail me: dmitry at olshansky dot me

-- Dmitry Olshansky
Aug 24 2022
On Wednesday, 24 August 2022 at 18:20:53 UTC, Dmitry Olshansky wrote:
> Time flies by and my work on D's std library halted a long time ago, mostly due to personal health issues. Since lots of people ask what they can do to help push the D language forward, I thought one great way is to take on the responsibility for std modules that have lost their maintainers. In particular, I am willing to guide a volunteer into the low-level pits of std.regex and std.uni and hopefully let him or her continue the work I once envisioned for them, or maybe choose a different track of evolution altogether. Anyhow, I'm willing to spend the time to transfer the knowledge so that at minimum there is someone more active than me to hold the line. std.regex is 2011's product with all of the language bugs and quirks of that time; std.uni is from 2012 and pretty much in the same position. Anyway, reply to this message or mail me: dmitry at olshansky dot me
> -- Dmitry Olshansky

courage bro.
Oct 01 2022
On Wednesday, 24 August 2022 at 18:20:53 UTC, Dmitry Olshansky wrote:
> Time flies by and my work on D's std library has halted a long time ago mostly due to personal health issues. [...]

I'll take a look. If I survive more than 15 minutes I'll let you know
Oct 02 2022
On 10/2/2022 3:14 PM, Imperatorn wrote:
> On Wednesday, 24 August 2022 at 18:20:53 UTC, Dmitry Olshansky wrote:
>> Time flies by and my work on D's std library has halted a long time ago mostly due to personal health issues.
> I'll take a look. If I survive more than 15 minutes I'll let you know

I doubt Dmitry is looking at this thread 6 weeks later. Probably best to email him. Thanks for picking up the flag!
Oct 02 2022
On Monday, 3 October 2022 at 00:13:54 UTC, Walter Bright wrote:
> On 10/2/2022 3:14 PM, Imperatorn wrote:
>> On Wednesday, 24 August 2022 at 18:20:53 UTC, Dmitry Olshansky wrote:
>>> Time flies by and my work on D's std library has halted a long time ago mostly due to personal health issues.
>> I'll take a look. If I survive more than 15 minutes I'll let you know
> I doubt Dmitry is looking at this thread 6 weeks later. Probably best to email him. Thanks for picking up the flag!

Oops, I missed that 😅
Oct 03 2022
On Sunday, 2 October 2022 at 22:14:49 UTC, Imperatorn wrote:
> On Wednesday, 24 August 2022 at 18:20:53 UTC, Dmitry Olshansky wrote:
>> Time flies by and my work on D's std library has halted a long time ago mostly due to personal health issues. [...]
> I'll take a look. If I survive more than 15 minutes I'll let you know

How bad was it?
Oct 03 2022
On Monday, 3 October 2022 at 22:15:18 UTC, monkyyy wrote:
> On Sunday, 2 October 2022 at 22:14:49 UTC, Imperatorn wrote:
>> On Wednesday, 24 August 2022 at 18:20:53 UTC, Dmitry Olshansky wrote:
>>> Time flies by and my work on D's std library has halted a long time ago mostly due to personal health issues. [...]
>> I'll take a look. If I survive more than 15 minutes I'll let you know
> How bad was it?

Unfortunately I died
Oct 04 2022
On Tuesday, 4 October 2022 at 07:58:57 UTC, Imperatorn wrote:
> On Monday, 3 October 2022 at 22:15:18 UTC, monkyyy wrote:
>> On Sunday, 2 October 2022 at 22:14:49 UTC, Imperatorn wrote:
>>> On Wednesday, 24 August 2022 at 18:20:53 UTC, Dmitry Olshansky wrote:
>>>> Time flies by and my work on D's std library has halted a long time ago mostly due to personal health issues. [...]
>>> I'll take a look. If I survive more than 15 minutes I'll let you know
>> How bad was it?
> Unfortunately I died

Sorry for not picking up on this earlier. Anyhow, what was it that you found the most appalling?
Oct 31 2022
On Monday, 31 October 2022 at 16:33:49 UTC, Dmitry Olshansky wrote:
> On Tuesday, 4 October 2022 at 07:58:57 UTC, Imperatorn wrote:
>> On Monday, 3 October 2022 at 22:15:18 UTC, monkyyy wrote:
>>> On Sunday, 2 October 2022 at 22:14:49 UTC, Imperatorn wrote:
>>>> On Wednesday, 24 August 2022 at 18:20:53 UTC, Dmitry Olshansky wrote:
>>>>> Time flies by and my work on D's std library has halted a long time ago mostly due to personal health issues. [...]
>>>> I'll take a look. If I survive more than 15 minutes I'll let you know
>>> How bad was it?
>> Unfortunately I died
> Sorry for not picking on this earlier. Anyhow what was that you found the most appalling?

Hehe, I didn't die. I'm just exploring it atm 😅
Oct 31 2022
On Monday, 31 October 2022 at 16:33:49 UTC, Dmitry Olshansky wrote:
> On Tuesday, 4 October 2022 at 07:58:57 UTC, Imperatorn wrote:
>> On Monday, 3 October 2022 at 22:15:18 UTC, monkyyy wrote:
>>> On Sunday, 2 October 2022 at 22:14:49 UTC, Imperatorn wrote:
>>>> On Wednesday, 24 August 2022 at 18:20:53 UTC, Dmitry Olshansky wrote:
>>>>> Time flies by and my work on D's std library has halted a long time ago mostly due to personal health issues. [...]
>>>> I'll take a look. If I survive more than 15 minutes I'll let you know
>>> How bad was it?
>> Unfortunately I died
> Sorry for not picking on this earlier. Anyhow what was that you found the most appalling?

If I find more time, I will actually look into it. It's just that my current situation does not allow me to spend so much time :(
Nov 01 2022
Hey,

As you know, I'm currently working on the table generator trying to get that into Phobos, and it's been a fair amount of work for what should have already been working (oh wells). I did have to recreate some logic for the symbols toUpperSimpleIndex and friends.

My general feeling is I'm missing something as I'm getting:
https://dev.azure.com/dlanguage/Phobos/_build/results?buildId=33558&view=logs&j=4fbced83-508e-5fe0-c978-5c71ec0fc506&t=efea9dc6-8b7a-5cfd-995a-4727b0e8449d&l=4640

I did that logic by hand, I'm pretty certain it should be working, my suspicion is you had it do the decomposing as well. I could do with some pointers on what is probably missing [...] made it into the repo for the generator.
Nov 01 2022
On Tuesday, 1 November 2022 at 10:22:17 UTC, rikki cattermole wrote:
> Hey, As you know, I'm currently working on the table generator trying to get that into Phobos, and its been a fair amount of work for what should have already been working (oh wells).

Totally understand the feeling. Keep in mind that I'm in the same position today.

> I did have to recreate some logic for the symbols toUpperSimpleIndex and friends.

When I tried to restore tables before I opted to remove them, not sure if it was the right move.
https://github.com/dlang/phobos/pull/7469

> My general feeling is I'm missing something as I'm getting:
> https://dev.azure.com/dlanguage/Phobos/_build/results?buildId=33558&view=logs&j=4fbced83-508e-5fe0-c978-5c71ec0fc506&t=efea9dc6-8b7a-5cfd-995a-4727b0e8449d&l=4640

Cannot easily decipher what's blowing up there. Seems to be this:

Error: `assert(cmp(s2, "I i\xcc\x87") == 0)` failed

I'm missing the context, but it looks like this uses title case tables, which are not the same as simple case folding. Since you didn't fiddle with title case, it would be strange for that to break.

> I did that logic by hand, I'm pretty certain it should be working, my suspicion is you had it do the decomposing as well. I could do with some pointers on what is probably missing for generator.

So the issue is simple case folding vs full case folding. Simple case folding is used mostly in sicmp (simple insensitive cmp), so it should be well confined. In any case, I'm happy to help with restoring the generator. Could you point me to your repo so I can help you figure out what might be missing?
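For readers following along, the difference between simple and full case folding mentioned above can be seen through the two public comparison functions in std.uni. This is only a minimal sketch built from the kind of examples the std.uni documentation gives; it says nothing about the table generator itself or about the failing assertion in the CI log.

```d
import std.uni : icmp, sicmp;

void main()
{
    // Simple case folding maps one code point to one code point,
    // which is enough for ordinary letters.
    assert(sicmp("Август", "авгусТ") == 0);

    // Full case folding also covers 1:M mappings such as ß -> "ss".
    assert(icmp("Rußland", "Russland") == 0);

    // U+0390 folds to three code points, so only the full
    // comparison treats the two spellings as equal.
    assert(sicmp("\u0390", "\u03B9\u0308\u0301") != 0);
    assert(icmp("\u0390", "\u03B9\u0308\u0301") == 0);
}
```

Note that both functions compare under case folding; casing functions such as toUpper and toLower use separate mappings, which is the distinction raised in the reply that follows.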
Nov 01 2022
On 02/11/2022 12:01 AM, Dmitry Olshansky wrote:
> On Tuesday, 1 November 2022 at 10:22:17 UTC, rikki cattermole wrote:
>> Hey, As you know, I'm currently working on the table generator trying to get that into Phobos, and its been a fair amount of work for what should have already been working (oh wells).
> Totally understand the feeling. Keep in mind that I'm in the same position today.

I chose not to raise it with you because 2014 was a long time ago, no chance you still have it ;) But I am on my second day, so I'm hoping you'll at least know what I need to do.

>> I did have to recreate some logic for the symbols toUpperSimpleIndex and friends.
> When I tried to restore tables before I opted to remove them, not sure if it was the right move.
> https://github.com/dlang/phobos/pull/7469
>> My general feeling is I'm missing something as I'm getting:
>> https://dev.azure.com/dlanguage/Phobos/_build/results?buildId=33558&view=logs&j=4fbced83-508e-5fe0-c978-5c71ec0fc506&t=efea9dc6-8b7a-5cfd-995a-4727b0e8449d&l=4640
> Cannot easily decipher what's blowing up there. Seems to be this:
> Error: `assert(cmp(s2, "I i\xcc\x87") == 0)` failed
> I'm missing the context but it looks like this uses title case tables which are not the same as simple case folding, but since you didn't fiddle with title case that would be strange to break.

Yes this has something to do with casing, rather than case folding. It does need SpecialCasing.txt. The check in question shouldn't have anything to do with case folding as it is ``cmp``.

>> I did that logic by hand, I'm pretty certain it should be working, my suspicion is you had it do the decomposing as well. I could do with [...] never made it into the repo for the generator.
> So the issue with simple case folding vs full case folding. Simple case folding is used mostly in sicmp (simple insensitive cmp) so should be well confined. In any case I'm happy to help with restoring the generator, could you point me to your repo so I can help you figure out what might be missing?

https://github.com/rikkimax/phobos/blob/unicode_tables/std/internal/unicode_table_generator.d#L575
https://github.com/rikkimax/phobos/blob/unicode_tables/std/internal/unicode_table_generator.d#L868
Nov 01 2022
On Tuesday, 1 November 2022 at 11:11:05 UTC, rikki cattermole wrote:
> On 02/11/2022 12:01 AM, Dmitry Olshansky wrote:
>> On Tuesday, 1 November 2022 at 10:22:17 UTC, rikki cattermole
>> In any case I'm happy to help with restoring the generator, could you point me to your repo so I can help you figure out what might be missing?
> https://github.com/rikkimax/phobos/blob/unicode_tables/std/internal/unicode_table_generator.d#L575
> https://github.com/rikkimax/phobos/blob/unicode_tables/std/internal/unicode_table_generator.d#L868

Okay, that's a start) Just in case: I pushed an important tiny fix to my repo that you should absolutely take. It has to do with property name aliases being incorrectly swapped. In general, I think we should take this discussion to some messenger so as to not flood the forums. I recall you had Discord or something.
Nov 01 2022
On 02/11/2022 1:21 AM, Dmitry Olshansky wrote:
> Okay that's a start) Just in case - I pushed important tiny fix to my repo you should absolutely take it, has to do with property name aliases incorrectly swapped.

Yeah, already grabbed it as soon as I saw it.

> In general, I think we should take this discussion to some messenger so as to not flood the forums. I recall you had discord or something.

Yeah Discord works.
Nov 01 2022
On Tuesday, 1 November 2022 at 12:27:13 UTC, rikki cattermole wrote:
> On 02/11/2022 1:21 AM, Dmitry Olshansky wrote:
>> Okay that's a start) Just in case - I pushed important tiny fix to my repo you should absolutely take it, has to do with property name aliases incorrectly swapped.
> Yeah, already grabbed it as soon as I saw it.
>> In general, I think we should take this discussion to some messenger so as to not flood the forums. I recall you had discord or something.
> Yeah Discord works.

Please elaborate on how I find you there; it's been ages since I used it.
Nov 01 2022
On 02/11/2022 1:40 AM, Dmitry Olshansky wrote:
> On Tuesday, 1 November 2022 at 12:27:13 UTC, rikki cattermole wrote:
>> On 02/11/2022 1:21 AM, Dmitry Olshansky wrote:
>>> Okay that's a start) Just in case - I pushed important tiny fix to my repo you should absolutely take it, has to do with property name aliases incorrectly swapped.
>> Yeah, already grabbed it as soon as I saw it.
>>> In general, I think we should take this discussion to some messenger so as to not flood the forums. I recall you had discord or something.
>> Yeah Discord works.
> Please elaborate how do I find you there, been ages since I used it.

For future reference: dlang.org -> community -> Community Discord
https://discord.gg/bMZk9Q4
Nov 01 2022
On Wednesday, 24 August 2022 at 18:20:53 UTC, Dmitry Olshansky wrote:
> Time flies by and my work on D's std library halted a long time ago, mostly due to personal health issues. Since lots of people ask what they can do to help push the D language forward, I thought one great way is to take on the responsibility for std modules that have lost their maintainers. In particular, I am willing to guide a volunteer into the low-level pits of std.regex and std.uni and hopefully let him or her continue the work I once envisioned for them, or maybe choose a different track of evolution altogether. Anyhow, I'm willing to spend the time to transfer the knowledge so that at minimum there is someone more active than me to hold the line. std.regex is 2011's product with all of the language bugs and quirks of that time; std.uni is from 2012 and pretty much in the same position. Anyway, reply to this message or mail me: dmitry at olshansky dot me
> -- Dmitry Olshansky

The greatest bug in std.regex is that it is too slow to compile. Do you have any idea what the cause could be right now? Are you looking for fixes or an entire rework of it?
Nov 02 2022
On 03/11/2022 12:20 AM, Hipreme wrote:
> The greatest bug on std.regex is it being too slow to compile, do you have any idea on what it could be right now? Are you looking for fixes or an entire rework on it?

A feature that is known to have been useless is ctRegex, that needs to be deprecated. Perhaps that'll help things once removed?
Nov 02 2022
On Wednesday, 2 November 2022 at 11:34:11 UTC, rikki cattermole wrote:
> On 03/11/2022 12:20 AM, Hipreme wrote:
>> The greatest bug on std.regex is it being too slow to compile, do you have any idea on what it could be right now? Are you looking for fixes or an entire rework on it?
> A feature that is known to have been useless is ctRegex, that needs to be deprecated. Perhaps that'll help things once removed?

Yeah, that should be removed
Nov 02 2022
On Thu, Nov 03, 2022 at 12:34:11AM +1300, rikki cattermole via Digitalmars-d wrote:
> On 03/11/2022 12:20 AM, Hipreme wrote:
>> The greatest bug on std.regex is it being too slow to compile, do you have any idea on what it could be right now? Are you looking for fixes or an entire rework on it?
> A feature that is known to have been useless is ctRegex, that needs to be deprecated. Perhaps that'll help things once removed?

While ctRegex probably should be removed, I don't think that's the problem. Even when you don't use ctRegex, using regex() alone slows down compile times by 2-3 seconds. I think it may be the excessive use of nested templates / CTFE deep inside std.regex's internal implementation.

I'm not sure if this can be fixed without rewriting from scratch (which we don't want to do -- that would be too big of an effort), but perhaps some careful profiling of the compiler might help pinpoint the most egregious parts of the code that could be improved.

T

-- Designer clothes: how to cover less by paying more.
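For context, the two entry points being compared above look like this in use. A minimal sketch, assuming nothing beyond the public std.regex API: regex() parses the pattern when the program runs, while ctRegex asks the compiler to generate the matcher during compilation via CTFE. The pattern and input string are made up for illustration.

```d
import std.regex : ctRegex, matchFirst, regex;

void main()
{
    // Pattern is parsed and compiled when this line executes.
    auto rt = regex(r"(\d+)-(\d+)");

    // Matcher code is generated while the program is being compiled.
    auto ct = ctRegex!(r"(\d+)-(\d+)");

    auto m1 = matchFirst("range 10-20", rt);
    auto m2 = matchFirst("range 10-20", ct);

    // Both engines report the same captures for the same input.
    assert(m1[1] == "10" && m1[2] == "20");
    assert(m2[1] == "10" && m2[2] == "20");
}
```

Both variants pull in the same std.regex templates, which fits the observation above that dropping ctRegex alone may not remove the compile-time cost.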
Nov 02 2022
On Wednesday, 2 November 2022 at 15:32:56 UTC, H. S. Teoh wrote:
> On Thu, Nov 03, 2022 at 12:34:11AM +1300, rikki cattermole via Digitalmars-d wrote:
>> On 03/11/2022 12:20 AM, Hipreme wrote:
>>> The greatest bug on std.regex is it being too slow to compile, do you have any idea on what it could be right now? Are you looking for fixes or an entire rework on it?
>> A feature that is known to have been useless is ctRegex, that needs to be deprecated.

I guess CTFEing big tables didn’t work, since it’s been 10 years and we are exactly where it started: a proof of concept that is incredibly slow to compile, with minor speed gains at run-time.

>> Perhaps that'll help things once removed?
> While ctRegex probably should be removed, I don't think that's the problem. Even when you don't use ctRegex, using regex() alone slows down compile times by 2-3 seconds. I think it may be the excessive use of nested templates / CTFE deep inside std.regex's internal implementation.

Regex is fairly simple in its use of templates: the whole thing is templated by Char, which is hardly a big problem considering that Phobos is made of templates.

> I'm not sure if this can be fixed without rewriting from scratch (which we don't want to do -- that would be too big of an effort), but perhaps some careful profiling of the compiler might help pinpoint the most egregious parts of the code that could be improved.

I do not think one needs to go that deep. Just look for immutable globals, since that’s where the CTFE is, which is a synonym for slow.

> T
Nov 02 2022
On Wednesday, 2 November 2022 at 18:57:27 UTC, Dmitry Olshansky wrote:
> On Wednesday, 2 November 2022 at 15:32:56 UTC, H. S. Teoh wrote:
>> On Thu, Nov 03, 2022 at 12:34:11AM +1300, rikki cattermole via Digitalmars-d wrote:
>>> On 03/11/2022 12:20 AM, Hipreme wrote:
>> I'm not sure if this can be fixed without rewriting from scratch (which we don't want to do -- that would be too big of an effort), but perhaps some careful profiling of the compiler might help pinpoint the most egregious parts of the code that could be improved.
> I do not think one needs to go that deep, just look for immutable globals since that’s where the CTFE is which is synonym for slow.

Like this thingie:
https://github.com/dlang/phobos/blob/bf3ff35b8f1d40cb70a7584a563dc731a2c3ddad/std/regex/internal/ir.d#L52
Nov 02 2022
On Wednesday, 2 November 2022 at 11:20:52 UTC, Hipreme wrote:
> On Wednesday, 24 August 2022 at 18:20:53 UTC, Dmitry Olshansky wrote:
>> Time flies by and my work on D's std library halted a long time ago, mostly due to personal health issues. Since lots of people ask what they can do to help push the D language forward, I thought one great way is to take on the responsibility for std modules that have lost their maintainers. In particular, I am willing to guide a volunteer into the low-level pits of std.regex and std.uni and hopefully let him or her continue the work I once envisioned for them, or maybe choose a different track of evolution altogether. Anyhow, I'm willing to spend the time to transfer the knowledge so that at minimum there is someone more active than me to hold the line. std.regex is 2011's product with all of the language bugs and quirks of that time; std.uni is from 2012 and pretty much in the same position. Anyway, reply to this message or mail me: dmitry at olshansky dot me
>> -- Dmitry Olshansky
> The greatest bug on std.regex is it being too slow to compile, do you have any idea on what it could be right now?

Basically, this most likely has to do with static immutable tables being initialized at compile time and hence invoking heavy CTFE. Lazy initialization could be an option. Again, someone has to look into it to be certain.

> Are you looking for fixes or an entire rework on it?

What would be the point of an entire rework? It took about 4 months to write it from scratch, plus the bug fixes found after that. If anything, I had a plan to support JITing regexes, which should work faster than CTFE static regex codegen while avoiding the compile-time penalty. Even that doesn’t require an entire rework; in fact it would fit nicely into what is there.
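The contrast between compile-time table initialization and the lazy initialization suggested above can be sketched as follows. This is an illustration only: buildTable, eagerTable and lazyTable are hypothetical names, not Phobos symbols, and the cache here is thread-local rather than shared across the process.

```d
// Hypothetical stand-in for an expensive table builder.
uint[] buildTable(size_t n) pure nothrow @safe
{
    auto t = new uint[n];
    foreach (i, ref e; t)
        e = cast(uint)(i * i);
    return t;
}

// Eager: a static immutable initializer must be known at compile time,
// so buildTable runs under CTFE whenever this module is compiled.
static immutable uint[] eagerTable = buildTable(256);

// Lazy: the table is built on the first call at run time,
// so the compiler never has to interpret buildTable.
immutable(uint)[] lazyTable() nothrow @safe
{
    static immutable(uint)[] cache; // thread-local, empty until first use
    if (cache is null)
        cache = buildTable(256);    // unique result of a pure function converts to immutable
    return cache;
}

void main()
{
    assert(eagerTable.length == 256);
    assert(lazyTable() == eagerTable);  // same contents either way
    assert(lazyTable() is lazyTable()); // later calls reuse the cached slice
}
```

The trade-off is a one-time run-time cost per thread in exchange for less work in the compiler; whether that actually helps std.regex would still need the measurement called for above.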
Nov 02 2022
On 03/11/2022 7:46 AM, Dmitry Olshansky wrote:
>> Are you looking for fixes or an entire rework on it?
> What would be the point in the entire rework? It took about 4 months to write it from scratch, plus the bug fixes found after that. If anything I had a plan to support JITing regexes which should work faster than CTFE static regex codegen plus avoiding the compile-time penalty. Even that doesn’t require an entire rework, in fact it would nicely fit into what is there.

One of the libraries I listed from day one that ImportC had to support is sljit. They have their own regex implementation which of course is JIT'd. It would be a good candidate to be included in Phobos.
https://github.com/zherczeg/sljit
Nov 02 2022
On Thursday, 3 November 2022 at 05:39:17 UTC, rikki cattermole wrote:
> On 03/11/2022 7:46 AM, Dmitry Olshansky wrote:
>>> Are you looking for fixes or an entire rework on it?
>> What would be the point in the entire rework? It took about 4 months to write it from scratch, plus the bug fixes found after that. If anything I had a plan to support JITing regexes which should work faster than CTFE static regex codegen plus avoiding the compile-time penalty. Even that doesn’t require an entire rework, in fact it would nicely fit into what is there.
> One of the libraries I listed from day one that ImportC had to support is sljit. They have their own regex implementation which of course is JIT'd. It would be a good candidate to be included in Phobos.
> https://github.com/zherczeg/sljit

Having a JIT in Phobos would be fantastic. On the other hand, if doing it in std is not a requirement, a regex dub package that depends on e.g. this JIT library should work as well.
Nov 03 2022
On Thursday, 3 November 2022 at 05:39:17 UTC, rikki cattermole wrote:
> On 03/11/2022 7:46 AM, Dmitry Olshansky wrote:
>>> Are you looking for fixes or an entire rework on it?
>> What would be the point in the entire rework? It took about 4 months to write it from scratch, plus the bug fixes found after that. If anything I had a plan to support JITing regexes which should work faster than CTFE static regex codegen plus avoiding the compile-time penalty. Even that doesn’t require an entire rework, in fact it would nicely fit into what is there.
> One of the libraries I listed from day one that ImportC had to support is sljit. They have their own regex implementation which of course is JIT'd. It would be a good candidate to be included in Phobos.
> https://github.com/zherczeg/sljit

Uses inline assembly so it’s pretty unlikely.
Nov 03 2022
On 04/11/2022 5:58 AM, Dave P. wrote:
> Uses inline assembly so it’s pretty unlikely.

We could upstream disables and use our own implementation to replace it. It shouldn't be a problem.
Nov 03 2022