
digitalmars.D - Why is std.regex slow, well here is one reason!

reply Richard (Rikki) Andrew Cattermole <richard cattermole.co.nz> writes:
As we all know, std.regex slows down our builds even if all 
you're doing is importing it.

So we were chatting on Discord and I got annoyed enough about it 
to look into it (which, as we all know, is a good way to make me 
do something about it).

To start off with, let's do some baseline timings with dmd.

Here is my test module, disable the regex call as required.

```d
import std.regex;

void main() {
     auto r = regex(`[a-z]`); // remove me
}
```

To compile this takes 2.2s; to compile it without the regex call 
takes 1.2s.

Okay, that's quite a big jump, but at least we're using it. Now 
on to modifying std.regex, or should I say std.uni.

That's right, we will be modifying std.uni not std.regex!

All we need to do is add ``-version=std_uni_bootstrap`` to our 
call to dmd to get this working and apply the changes at the end 
of this post.

Now the times are 1.2s and 0.9s.

Why does turning on the bootstrap version in std.uni decrease 
compile times so significantly? This is almost certainly because 
of the Unicode tables being compressed. std.regex is triggering 
decompression, pulling in a whole pile of logic that wouldn't be 
required otherwise, which costs an awful lot of CPU and RAM 
during CTFE. newCTFE anyone?





If you want to repeat this, you'll need the changes to std.uni 
below (just add them at the bottom of the file).

```d
public:
version(std_uni_bootstrap) {
    int icmp(S1, S2)(S1 r1, S2 r2) { return 0; }
    dchar toLower()(dchar c) { return c; }
    dchar toUpper()(dchar c) { return c; }
    void toLowerInPlace(C)(ref C[] s) {}
    void toUpperInPlace(C)(ref C[] s) {}
    size_t graphemeStride(C)(const scope C[] input, size_t index) { return 0; }
    bool isGraphical()(dchar c) { return false; }

    struct unicode {
        static @property auto opDispatch(string name)() {
            return CodepointSet.init;
        }

        static CodepointSet parseSet(Range)(ref Range range, bool casefold = false) {
            return CodepointSet.init;
        }

        static CodepointSet parsePropertySpec(Range)(ref Range p,
                bool negated, bool casefold) {
            return CodepointSet.init;
        }

        static dchar parseControlCode(Parser)(ref Parser p) {
            return 0;
        }
    }

    alias Escapables = AliasSeq!('[', ']', '\\', '^', '$', '.', '|', '?', ',', '-',
        ')', '{', '}', '~');

    struct Stack(T) {
    @safe:
        T[] data;

        @property bool empty() { return data.empty; }

        @property size_t length() { return data.length; }

        void push(T val) { data ~= val; }

        @trusted T pop()
        {
            assert(!empty);
            auto val = data[$ - 1];
            data = data[0 .. $ - 1];
            if (!__ctfe)
                cast(void) data.assumeSafeAppend();
            return val;
        }

        @property ref T top()
        {
            assert(!empty);
            return data[$ - 1];
        }
    }

    bool isAlpha()(dchar c) { return false; }
    CodepointSet wordCharacter()() { return CodepointSet.init; }
    dchar parseUniHex(Range)(ref Range str, size_t maxDigit) {
        return 0;
    }

    auto simpleCaseFoldings()(dchar ch) {
        static struct Range
        {
        @safe pure nothrow:
            uint idx; // if == uint.max, then read c.
            union
            {
                dchar c; // == 0 - empty range
                uint len;
            }

            @property bool isSmall() const { return idx == uint.max; }

            this(dchar ch)
            {
                idx = uint.max;
                c = ch;
            }

            this(uint start, uint size)
            {
                idx = start;
                len = size;
            }

            @property dchar front() const
            {
                return 0;
            }

            @property bool empty() const
            {
                if (isSmall)
                    return c == 0;
                return len == 0;
            }

            @property size_t length() const
            {
                if (isSmall)
                    return c == 0 ? 0 : 1;
                return len;
            }

            void popFront()
            {
                if (isSmall)
                    c = 0;
                else
                {
                    idx++;
                    len--;
                }
            }
        }
        return Range.init;
    }
}
```
Feb 23 2023
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
 std.regex is triggering decompression and bringing a whole pile of logic 
 that wouldn't be required otherwise

This is good detective work. At minimum, please file a bugzilla issue with your analysis.

How about making the decompression code lazy?
Feb 23 2023
next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 24/02/2023 9:26 AM, Walter Bright wrote:
  > std.regex is triggering decompression and bringing a whole pile of 
 logic that wouldn't be required otherwise
 
 This is good detective work. At minimum, please file a bugzilla issue 
 with your analysis.
Sounds good, it's just not the best idea to do it when you should be asleep ;)
 How about making the decompression code lazy?
It should already be lazy. It's just the wrong kind of lazy.

Everything about std.uni and its tables is about tradeoffs. It is designed to be opt-in and to be small in the binary. If you didn't care about binary sizes, it would be easy enough to have it all in ROM ready to go, but it'll be over 8 MB if you did that (mine is).

On that note, I recently looked at Unicode symbols for identifiers; we can shrink the isAlpha LUT in dmd to ~1/9th its current size by updating to C11 :)

Unicode keeps growing, which is good for compilers but horrible for standard libraries!
Feb 23 2023
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/23/2023 12:50 PM, Richard (Rikki) Andrew Cattermole wrote:
 Everything about std.uni and its tables is about tradeoffs. It is designed 
 to be opt-in and to be small in binary. If you didn't care about binary 
 sizes it would be easy enough to have it all in ROM ready to go, but it'll 
 be over 8mb if you did that (mine is).
Another way is to generate the tables into a separate file when Phobos is built, and import that file.
 On that note, I recently looked at Unicode symbols for identifiers; we can 
 shrink the is alpha LUT in dmd to ~1/9th its current size by updating to C11 :)
Let's do it!
 Unicode keeps growing, which is good for compilers, but horrible for standard 
 libraries!
Unicode is a brilliant idea, but its doom comes from the execrable decision to apply semantic meaning to glyphs.
Feb 23 2023
next sibling parent reply Max Samukha <maxsamukha gmail.com> writes:
On Thursday, 23 February 2023 at 23:11:56 UTC, Walter Bright 
wrote:

 Unicode is a brilliant idea, but its doom comes from the 
 execrable decision to apply semantic meaning to glyphs.
Unicode did not start that. For example, all Cyrillic encodings encode Latin А, K, H, etc. differently than the similarly looking Cyrillic counterparts. Whether that decision was execrable is highly debatable.
Feb 23 2023
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/23/2023 11:28 PM, Max Samukha wrote:
 On Thursday, 23 February 2023 at 23:11:56 UTC, Walter Bright wrote:
 Unicode is a brilliant idea, but its doom comes from the execrable decision to 
 apply semantic meaning to glyphs.
Unicode did not start that. For example, all Cyrillic encodings encode Latin А, K, H, etc. differently than the similarly looking Cyrillic counterparts. Whether that decision was execrable is highly debatable.
Let's say I write "x". Is that the letter x, or the math symbol x? I know which it is from the context. But in Unicode, there's a letter x and the math symbol x, although they look identical.

There is no end to semantic meanings for "x", and so any attempt to encode semantics into Unicode is doomed from the outset.

Printed media do not seem to require these hidden semantics, why should Unicode? If you print the Unicode on paper, thereby losing its meaning, what again is the purpose of Unicode?

Equally stupid are:

1. encoding of various fonts

2. multiple encodings of the same character, leading to "normalization" problems

3. encodings to enable/disable the direction the glyphs are to be read

Implementing all this stuff is hopelessly complex, which is why Unicode had to introduce "levels" of Unicode support.
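The "normalization" problem in point 2 is easy to see in code. A minimal sketch in Python (stdlib unicodedata only), showing the two encodings of é:

```python
import unicodedata

# "é" can be encoded two ways: one precomposed codepoint, or 'e' followed
# by a combining acute accent. They render identically but compare unequal.
precomposed = "\u00e9"   # é as a single codepoint
decomposed = "e\u0301"   # 'e' + COMBINING ACUTE ACCENT

print(precomposed == decomposed)  # False
# Normalizing to NFC folds the pair into the precomposed form:
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```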
Feb 24 2023
next sibling parent reply Max Samukha <maxsamukha gmail.com> writes:
On Friday, 24 February 2023 at 18:34:42 UTC, Walter Bright wrote:

 Let's say I write "x". Is that the letter x, or the math symbol 
 x? I know which it is from the context. But in Unicode, there's 
 a letter x and the math symbol x, although they look identical.
Same as 'A' in KOI8 or Windows-1251? Latin and Cyrillic 'A' look identical but have different codes. Not that I disagree with you, but Unicode just upheld the tradition.
 There is no end to semantic meanings for "x", and so any 
 attempt to encode semantics into Unicode is doomed from the 
 outset.
That is similar to attempts to encode semantics in, say, binary operators - they are nothing but functions, but...
 Printed media do not seem to require these hidden semantics, 
 why should Unicode? If you print the Unicode on paper, thereby 
 losing its meaning, what again is the purpose of Unicode?
Looks like another case of caching, one of the two hard problems in computing. The meaning of a code point can be inferred without the need to keep track of the context.
 Equally stupid are:

 1. encoding of various fonts

 2. multiple encodings of the same character, leading to 
 "normalization" problems
I agree that multiple encodings for the same abstract character is not a great idea, but "same character" is unfortunately not well defined. Is Latin 'A' the same character as Cyrillic 'A'? Should they have the same code?
 3. encodings to enable/disable the direction the glyphs are to 
 be read

 Implementing all this stuff is hopelessly complex, which is why 
 Unicode had to introduce "levels" of Unicode support.
That's true.
Feb 24 2023
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/24/2023 12:05 PM, Max Samukha wrote:
 On Friday, 24 February 2023 at 18:34:42 UTC, Walter Bright wrote:
 
 Let's say I write "x". Is that the letter x, or the math symbol x? I know 
 which it is from the context. But in Unicode, there's a letter x and the math 
 symbol x, although they look identical.
Same as 'A' in KOI8 or Windows-1251? Latin and Cyrillic 'A' look identical but have different codes. Not that I disagree with you, but Unicode just upheld the tradition.
Is 'A' in German different from the 'A' in English? Yes. Do they have different keys on the keyboard? No. Do they have different Unicode code points? No. How do you tell a German 'A' from an English 'A'? By the context. The same for the word "die". Is it the German "the"? Or is it the English "expire"? Should we embed this in the letters themselves? Of course not.
 Not that I disagree with you, but Unicode just upheld the
 tradition.
Inventing a new code encoding needn't follow tradition, or take tradition to such an extreme that it makes everyone who uses Unicode miserable.
 There is no end to semantic meanings for "x", and so any attempt to encode 
 semantics into Unicode is doomed from the outset.
That is similar to attempts to encode semantics in, say, binary operators - they are nothing but functions, but...
We know the meaning by context.
 The meaning of a code point can be inferred without the need to keep track of 
 the context.
Meaning in a character set simply should not exist outside the visual appearance.
 Is Latin 'A' the 
 same character as Cyrillic 'A'? Should they have the same code?
It's the same glyph, and so should have the same code. The definitive test is, when printed out or displayed, can you see a difference? If the answer is "no" then they should be the same code. It's fine if one wishes to develop another layer over Unicode which encodes semantics, style, font, language, emphasis, bold face, italics, etc. But these just do not belong in Unicode. They belong in a separate markup language.
Feb 24 2023
next sibling parent Adam D Ruppe <destructionator gmail.com> writes:
On Friday, 24 February 2023 at 20:44:17 UTC, Walter Bright wrote:
 It's the same glyph, and so should have the same code. The 
 definitive test is, when printed out or displayed, can you see 
 a difference? If the answer is "no" then they should be the 
 same code.
1lI 5S i guess it depends on the font
Feb 24 2023
prev sibling next sibling parent Andrea Fontana <nospam example.org> writes:
On Friday, 24 February 2023 at 20:44:17 UTC, Walter Bright wrote:
 It's the same glyph, and so should have the same code. The 
 definitive test is, when printed out or displayed, can you see 
 a difference? If the answer is "no" then they should be the 
 same code.
It sounds like you're saying that since "piano" in Italian means both "slow" and "plane", we could merge "plane" and "slow" in English as well.

Andrea
Feb 24 2023
prev sibling next sibling parent reply Herbie Melbourne <herbmel23268 gmail.com> writes:
On Friday, 24 February 2023 at 20:44:17 UTC, Walter Bright wrote:
 Is 'A' in German different from the 'A' in English? Yes. Do 
 they have different keys on the keyboard? No. Do they have 
 different Unicode code points? No. How do you tell a German 'A' 
 from an English 'A'? By the context.
But it is the same Latin 'A' just like '0' is the same digit (which may look like an O) only it's pronounced differently.
 The same for the word "die". Is it the German "the"? Or is it 
 the English "expire"? Should we embed this in the letters 
 themselves? Of course not.
Some languages use pictograms for words instead of letters, like Chinese, but whether or not they are encoded with different code points for each language idk. Also Chinese has traditional and simplified - so, multiple code points for the same word? My understanding of Unicode has always been that it's merely a mapping of a number, a code point, to a letter, word, symbol, icon, an idea and nothing more. Unicode is agnostic to layout. That's defined in a font.
Feb 25 2023
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/25/2023 6:26 AM, Herbie Melbourne wrote:
 But it is the same Latin 'A'
When it's printed, how do you know the difference?
 My understanding of Unicode has always been that it's merely a mapping of a 
 number, a code point, to a letter, word, symbol, icon, an idea and nothing
more. 
 Unicode is agnostic to layout. That's defined in a font.
It started out that way, but it is no more. There are Fraktur fonts embedded in Unicode. There are also direction instructions to turn the rendering right-to-left.
Mar 02 2023
parent Kagamin <spam here.lot> writes:
On Thursday, 2 March 2023 at 20:06:38 UTC, Walter Bright wrote:
 On 2/25/2023 6:26 AM, Herbie Melbourne wrote:
 But it is the same Latin 'A'
When it's printed, how do you know the difference?
Heuristically, from context. For example, we know "6:26 AM" is Latin because it's an abbreviation of "ante meridiem". You would need pretty heavy AI to do this programmatically.
Mar 03 2023
prev sibling next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Friday, 24 February 2023 at 20:44:17 UTC, Walter Bright wrote:
 On 2/24/2023 12:05 PM, Max Samukha wrote:
 Is Latin 'A' the same character as Cyrillic 'A'? Should they 
 have the same code?
It's the same glyph, and so should have the same code. The definitive test is, when printed out or displayed, can you see a difference? If the answer is "no" then they should be the same code.
You’d be surprised but there are typesets where Cyrillic A is visually different from ASCII A. — Dmitry Olshansky
Mar 01 2023
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Thursday, 2 March 2023 at 07:35:06 UTC, Dmitry Olshansky wrote:
 On Friday, 24 February 2023 at 20:44:17 UTC, Walter Bright 
 wrote:
 On 2/24/2023 12:05 PM, Max Samukha wrote:
 Is Latin 'A' the same character as Cyrillic 'A'? Should they 
 have the same code?
It's the same glyph, and so should have the same code. The definitive test is, when printed out or displayed, can you see a difference? If the answer is "no" then they should be the same code.
You’d be surprised but there are typesets where Cyrillic A is visually different from ASCII A.
Also, your idea of “what it looks like on paper” is basically NFKC or NFKD, which is compatibility normalization that folds lookalikes into the same canonical codepoint. I would insist that there are times when “looks the same” is not a good option. Typically programs do not have the context that we as humans use to disambiguate.
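For instance, Python's stdlib unicodedata shows the distinction: NFKC folds compatibility variants (styled letters) into the plain letter, but deliberately does not merge cross-script lookalikes:

```python
import unicodedata

# MATHEMATICAL BOLD SMALL X (U+1D431) carries a compatibility decomposition:
# NFKC folds it into the plain Latin 'x'.
math_x = "\U0001d431"
print(unicodedata.normalize("NFKC", math_x) == "x")  # True

# Cyrillic А (U+0410) merely *looks like* Latin A; no normalization form
# merges them, because they are different letters from different scripts.
cyrillic_a = "\u0410"
print(unicodedata.normalize("NFKC", cyrillic_a) == "A")  # False
```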
 —
 Dmitry Olshansky
Mar 01 2023
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/1/2023 11:49 PM, Dmitry Olshansky wrote:
 I would insist that there are times when “looks the same” is not a good
option. 
 Typically programs do not have the context, that we as humans use to
disambiguate.
Programs can't tell if "die" means "the" or "expire" without context, either. The point is, once invisible semantic meaning is added, an infinite number of Unicode code points is required.
 You’d be surprised
Not at all. People use different fonts to assert different meanings all the time.
 but there are typesets where Cyrillic A is visually different from ASCII A.
Yes, and there are italic fonts, and people embed them in text using markup, not different code points.
Mar 02 2023
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Thursday, 2 March 2023 at 20:11:14 UTC, Walter Bright wrote:
 On 3/1/2023 11:49 PM, Dmitry Olshansky wrote:
 I would insist that there are times when “looks the same” is 
 not a good option. Typically programs do not have the context, 
 that we as humans use to disambiguate.
Programs can't tell if "die" means "the" or "expire" without context, either.
We are talking about characters. Yes we can’t tell the meaning but we can upper/lowercase or word break it at ease.
 The point is, once invisible semantic meaning is added, an 
 infinite number of Unicode code points is required.
 You’d be surprised
Not at all. People use different fonts to assert different meanings all the time.
 but there are typesets where Cyrillic A is visually different
from ASCII A. Yes, and there are italic fonts, and people embed them in text using markup, not different code points.
Let’s see another example. Cyrillic letter ‘В’ looks the same as ASCII ‘B’ when capitalized, hence by your reasoning it’s the same codepoint. Now lowercase ‘в’ and ‘b’ don’t look the same, hence different codepoints. Voila, you just made lowercasing/uppercasing impossible without some external context, so <cyrillic>В</cyrillic>?

I’d rather live in a world where codepoints represent a particular alphabet, allowing us to generically manipulate text according to the language standards even if we do not know the semantics of words. Context, if required, is for high-level meaning.

—
Dmitry Olshansky
Mar 02 2023
prev sibling parent reply GrimMaple <grimmaple95 gmail.com> writes:
On Friday, 24 February 2023 at 20:44:17 UTC, Walter Bright wrote:

 Is 'A' in German different from the 'A' in English? Yes.
Except they are literally the same Latin A and in no way are different.
 Is Latin 'A' the same character as Cyrillic 'A'? Should they 
 have the same code?
It's the same glyph, and so should have the same code.
Except they aren't, and it's a mere coincidence that in this particular font they look the same way. Cyrillic А is traditionally written more as c\ with c being tilted to the left about 45 degrees. Even in fonts with Cyrillic A looking more like Latin A, a lot of fonts put extra emphasis on the right stroke, making it wider than the left.
 The definitive test is, when printed out or displayed, can you 
 see a difference? If the answer is "no" then they should be the 
 same code.
The definitive test would be understanding what you're talking about
Mar 03 2023
next sibling parent Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Friday, 3 March 2023 at 11:29:34 UTC, GrimMaple wrote:
 The definitive test is, when printed out or displayed, can you 
 see a difference? If the answer is "no" then they should be 
 the same code.
The definitive test would be understanding what you're talking about
Indeed, it's a stupid argument, since printed glyphs are even more likely to differ than those on screen, given the screen's lower resolution (600 dpi printers are nowadays standard; screens with more than 200 dpi are not so frequent).
Mar 03 2023
prev sibling parent Max Samukha <maxsamukha gmail.com> writes:
On Friday, 3 March 2023 at 11:29:34 UTC, GrimMaple wrote:
 On Friday, 24 February 2023 at 20:44:17 UTC, Walter Bright 
 wrote:

 Is 'A' in German different from the 'A' in English? Yes.
Except they are literally the same Latin A and in no way are different.
According to Wikipedia, all A's, including the Cyrillic one, derive from aleph through the Greek alpha. At a sufficiently high level of abstraction, they are the same character. We ought to assign a single code to all aleph derivatives. Using the ox head consistently as the glyph would also be a fantastic idea.
Mar 04 2023
prev sibling parent "H. S. Teoh" <hsteoh qfbox.info> writes:
On Fri, Feb 24, 2023 at 10:34:42AM -0800, Walter Bright via Digitalmars-d wrote:
 On 2/23/2023 11:28 PM, Max Samukha wrote:
 On Thursday, 23 February 2023 at 23:11:56 UTC, Walter Bright wrote:
 Unicode is a brilliant idea, but its doom comes from the execrable
 decision to apply semantic meaning to glyphs.
Unicode did not start that. For example, all Cyrillic encodings encode Latin А, K, H, etc. differently than the similarly looking Cyrillic counterparts. Whether that decision was execrable is highly debatable.
Let's say I write "x". Is that the letter x, or the math symbol x? I know which it is from the context. But in Unicode, there's a letter x and the math symbol x, although they look identical.
Actually x and × are *not* identical if you're using a sane font. They have different glyph shapes (that though very similar are actually different -- × for example will never have serifs even in a serif font), and different font metrics (× has more space around it on either side; x may be kerned against an adjacent letter). If you print them they will have a different representation of dots on the paper, even if the difference is fine enough you don't notice it.

With all due respect, writing systems aren't as simple as you think. Sometimes what to you seems like a lookalike glyph may be something completely different. For example, in English if you see:

	m

you can immediately tell that it's a lowercase M. So it makes sense to have just one Unicode codepoint to encode this, right? Now take the lowercase Cyrillic letter т. Completely different glyph, so completely different Unicode codepoint, right? The problem is, the *cursive* version of this letter looks like this:

	m

According to your logic, we should encode this exactly the same way you encode the English lowercase M. But now you have two completely different codepoints for the same letter, which makes no sense because it implies that changing the display font (from upright to cursive) requires re-encoding your string.

This isn't the only instance of this. Another example is lowercase Cyrillic П, which looks like this in upright font:

	п

but in cursive:

	n

Again, you have the same problem. It's not reasonable to expect that changing your display font requires reencoding the string. But then you must admit that the English lowercase n must be encoded differently from the Cyrillic cursive n. Which means that you must encode the *logical* symbol rather than the physical representation of it. I.e., semantics.
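The point that lookalike glyphs can be entirely different characters is easy to verify; a small Python sketch (stdlib unicodedata only):

```python
import unicodedata

# 'x' and '×' look alike but are distinct codepoints with distinct names.
print(hex(ord("x")), unicodedata.name("x"))            # 0x78 LATIN SMALL LETTER X
print(hex(ord("\u00d7")), unicodedata.name("\u00d7"))  # 0xd7 MULTIPLICATION SIGN

# Cyrillic т keeps its own codepoint even though its cursive form looks like 'm'.
print(hex(ord("\u0442")), unicodedata.name("\u0442"))  # 0x442 CYRILLIC SMALL LETTER TE
```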
 There is no end to semantic meanings for "x", and so any attempt to
 encode semantics into Unicode is doomed from the outset.
If we were to take your suggestion that "x" and "×" should be encoded identically, we would quickly run into readability problems with English text that contains mathematical fragments, say, the text talks about 3×3 matrices. How will your email reader render the ×? Not knowing any better, it sees the exact same codepoint as x and prints it as an English letter x, say in a serif font. Which looks out-of-place in a mathematical expression.

To fix that, you have to explicitly switch to a different font in order to have a nicer symbol. The computer can't do this for you, because, as you said, the interpretation of a symbol is context-dependent -- and computers are bad at context-dependent stuff. So you'll need complex information outside of the text itself (e.g. use HTML or some other markup) to tell the computer which meaning of "x" is intended here. The *exact same kind of complex information* that Unicode currently deals with.

So you're not really solving anything, just pushing the complexity from one place to another. And not having this information directly encoded in the string means that you're now going back to the bad ole days where there is no standard for marking semantics in a piece of text; everybody does it differently, and copy-n-pasting text from one program to another will almost guarantee the loss of this information (that you then have to re-input in the target software).
 Implementing all this stuff is hopelessly complex, which is why
 Unicode had to introduce "levels" of Unicode support.
Human writing systems are hopelessly complex. It's just par for the course. :-D T -- You have to expect the unexpected. -- RL
Feb 24 2023
prev sibling next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 24/02/2023 12:11 PM, Walter Bright wrote:
 On 2/23/2023 12:50 PM, Richard (Rikki) Andrew Cattermole wrote:
 Everything about std.uni and its tables is about tradeoffs. It is 
 designed to be opt-in and to be small in binary. If you didn't care 
 about binary sizes it would be easy enough to have it all in ROM ready 
 to go, but it'll be over 8mb if you did that (mine is).
Another way is to generate the tables into a separate file when Phobos is built, and import that file.
We already do this. But instead we just commit the generated files, which is a lot better than having to modify them by hand... *shudder*
 On that note, I recently looked at Unicode symbols for identifiers; we 
 can shrink the is alpha LUT in dmd to ~1/9th its current size by 
 updating to C11 :)
Let's do it!
I probably should've looked at the C23 draft spec to see how they were doing it before saying something like this, because they are not doing the ranges anymore. It's a lot more complicated with TR31.

https://open-std.org/JTC1/SC22/WG14/www/docs/n3054.pdf

This will be in the realm of a rewrite I think, and certainly DIP territory (guess I'm on it for the export/symbol/shared library DIP). That DIP keeps getting larger and larger... with the same scope. Who knew those innocent-looking symbols, all in their tables, could be so complicated!
Feb 24 2023
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/24/2023 2:27 AM, Richard (Rikki) Andrew Cattermole wrote:
 who knew those 
 innocent looking symbols, all in their tables could be so complicated!
Because the Unicode designers are in love with complexity (like far too many engineers).
Feb 24 2023
next sibling parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 25/02/2023 7:39 AM, Walter Bright wrote:
 On 2/24/2023 2:27 AM, Richard (Rikki) Andrew Cattermole wrote:
 who knew those innocent looking symbols, all in their tables could be 
 so complicated!
Because the Unicode designers are in love with complexity (like far too many engineers).
Not entirely. Humans have made pretty much every form of writing system imaginable. If there is an assumption to be had in Latin, there is another script that violates it so hard that you now need a table for it.

I find Unicode to be pretty impressive. It is composed of some of the hardest parts of human society to represent, it does so with support for thousands of years of content, and not only that, it is backwards and forwards compatible!

Date/time is easy in comparison lol!
Feb 24 2023
prev sibling parent reply Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Friday, 24 February 2023 at 18:39:02 UTC, Walter Bright wrote:
 On 2/24/2023 2:27 AM, Richard (Rikki) Andrew Cattermole wrote:
 who knew those innocent looking symbols, all in their tables 
 could be so complicated!
Because the Unicode designers are in love with complexity (like far too many engineers).
Languages are complex and often contradictory. The moment you want, f.ex., to handle letter cases, you're in for the complexity. Uppercase i is different in Turkish than in any other language. ß does not have an uppercase (uppercase is SS) but has a titlecase (titlecase is not the same thing as uppercase). Changing cases is not reversible in general (Greek has two lowercase sigmas but only one uppercase; German again with ß, which becomes SS in uppercase, but not all SS can be ß when lowercased). These were just some simple examples in Latin scripts.

Unicode is complex because language is complex. Is it perfect? No. Is it bad? Far from it.
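A couple of these quirks can be demonstrated with Python's built-in case mappings; note the Turkish tailoring specifically is *not* shown, since str.upper()/str.lower() implement only Unicode's default, locale-independent mappings (a locale-aware library such as ICU would be needed for that):

```python
# ß uppercases to SS under Unicode's default full case mapping...
print("ß".upper())          # SS
# ...so case conversion is not reversible:
print("ß".upper().lower())  # ss

# There is also a capital sharp s, U+1E9E ẞ, which lowercases back to ß:
print("\u1e9e".lower() == "ß")  # True
```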
Feb 25 2023
next sibling parent reply Herbie Melbourne <herbmel23268 gmail.com> writes:
On Saturday, 25 February 2023 at 13:19:55 UTC, Patrick Schluter 
wrote:
 ß does not have uppercase (uppercase is SS) but has a titlecase 
 (titlecase is not the same thing as uppercase) ß.
It does. See for example https://www.sueddeutsche.de/bildung/rechtschreibung-das-alphabet-bekommt-einen-neuen-buchstaben-1.3566309
Feb 25 2023
parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 26/02/2023 3:31 AM, Herbie Melbourne wrote:
 On Saturday, 25 February 2023 at 13:19:55 UTC, Patrick Schluter wrote:
 ß does not have uppercase (uppercase is SS) but has a titlecase 
 (titlecase is not the same thing as uppercase) ß.
It does. See for example https://www.sueddeutsche.de/bildung/rechtschreibung-das-alphabet-bekommt-einen-neuen-buchstaben-1.3566309
Both of you are correct.

00DF;LATIN SMALL LETTER SHARP S;Ll;0;L;;;;;N;;;;;

No uppercase for simple casing, but it does have one under special casing.
Feb 25 2023
prev sibling parent FeepingCreature <feepingcreature gmail.com> writes:
On Saturday, 25 February 2023 at 13:19:55 UTC, Patrick Schluter 
wrote:
 Languages are complex and often contradictory. The moment you 
 want, f.ex. taking letter cases you're in for the complexity. 
 Uppercase i is different in Turkish than in any other language. 
 ß does not have uppercase (uppercase is SS) but has a titlecase 
 (titlecase is not the same thing as uppercase) ß. Changing 
 cases is not reversible in general (Greek has two lower case 
 sigma but only one uppercase, German again with ß, which 
 becomes SS in uppercase, but not all SS can be ß wenn 
 lowercased). This were just some simple example in Latin 
 scripts.
 Unicode is complex because language is complex. Is it perfect? 
 No. Is it bad, far from it.
Note: ß has an official uppercase version in German, ẞ, that can be used in parallel to SS since 2017, and is preferred since 2020.
Feb 27 2023
prev sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Thursday, 23 February 2023 at 23:11:56 UTC, Walter Bright 
wrote:
 Unicode keeps growing, which is good for compilers, but 
 horrible for standard libraries!
Unicode is a brilliant idea, but its doom comes from the execrable decision to apply semantic meaning to glyphs.
Its doom comes from its success. The initial design was simple enough, and 16 bits should have been enough for everyone. Then gradually it got extended towards more and more writing systems. The marvel here is that it managed to:

- remain compatible with earlier versions
- accommodate the vast complexity with fairly few algorithms and concepts
- handle technical debt, which is probably what you dislike about it, but at the scale of the project it’s inevitable

—
Dmitry Olshansky
Feb 25 2023
parent reply Robert Schadek <rschadek symmetryinvestments.com> writes:
Good seeing you around
Mar 03 2023
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Friday, 3 March 2023 at 15:36:03 UTC, Robert Schadek wrote:
 Good seeing you around
Yeah, glad you guys keep pushing forward. It’s getting tight in the language wars. Me, I’m mostly out. Couldn’t resist to chime in on “std.regex slow” thread though, especially the Unicode rant. — Dmitry Olshansky
Mar 03 2023
parent Robert Schadek <rschadek symmetryinvestments.com> writes:
Good to see you are lurking then +1
Mar 03 2023
prev sibling parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 24/02/2023 9:26 AM, Walter Bright wrote:
 At minimum, please file a bugzilla issue with your analysis.
https://issues.dlang.org/show_bug.cgi?id=23737

One fix (removal of the formattedWrite call): https://github.com/dlang/phobos/pull/8698

This function is taking something like 700ms! https://github.com/dlang/phobos/blob/master/std/regex/internal/ir.d#L52

I don't know how to minimize it, but it does need to be memoized based upon the std.uni tables. There is getMatcher above it, but yeah, wordMatcher also needs to be pure, so that isn't a solution.
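As a general illustration of the memoization idea (a hedged sketch, not the actual std.regex fix): std.functional provides a `memoize` wrapper that caches a function's result per argument. The `expensiveTable` function here is a hypothetical stand-in for the expensive matcher construction:

```d
import std.functional : memoize;

// Hypothetical stand-in for an expensive computation; the real
// wordMatcher builds a CharMatcher from the std.uni tables.
int expensiveTable(int seed)
{
    int sum = 0;
    foreach (i; 0 .. 1_000)
        sum += seed + i;
    return sum;
}

// memoize caches the result per argument, so repeated calls are cheap.
alias cachedTable = memoize!expensiveTable;

void main()
{
    auto a = cachedTable(1); // computed once
    auto b = cachedTable(1); // served from the cache
    assert(a == b);
}
```

Note that this only helps at run time; for the compile-time cost discussed here, the construction itself has to get cheaper or move into a `static immutable` initializer.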
Feb 24 2023
prev sibling parent reply Johan <j j.nl> writes:
On Thursday, 23 February 2023 at 17:06:30 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 As well all know std.regex slows down our builds even if all 
 you're doing is importing it.

 So on Discord we were chatting and I got annoyed about it 
 enough to look into it (which as we all know is a good way to 
 make me do something about it).

 To start off with lets do some base timings with dmd.

 Here is my test module, disable the regex call as required.

 ```d
 import std.regex;

 void main() {
     auto r = regex(`[a-z]`); // remove me
 }
 ```

 To compile this its 2.2s, to compile it without the regex call 
 its 1.2s.
Can you try compiling this with LDC's `--ftime-trace`?

On my machine, it takes much less than 2.2s, and `std.uni.unicode.parseSet!(Parser!(string, CodeGen)).parseSet` (the only big std.uni piece) takes about 1/6th of the total semantic-analysis time (`-o-`).

I'm curious to hear whether `--ftime-trace` would have helped you for this or not :) Thanks!

Cheers,
Johan
Feb 23 2023
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
I'm going to be totally honest, I have no idea how to use that information.

It's not in a format that is easy to figure out.

What I would want is stuff like this:

```
module
|- function
| |- initialize template module thing  ~200ms
| | |- ran CTFE on thingie  ~150us
```

Give me that, and this type of hunting would be a cake walk I think.

Seeing what triggers a template to instantiate or CTFE is just as 
important as knowing how long it took.
Feb 24 2023
next sibling parent reply Johan <j j.nl> writes:
On Friday, 24 February 2023 at 10:52:37 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 I'm going to be totally honest, I have no idea how to use that 
 information.

 Its not in a format that is easy to figure out.
The format is Chromium's: https://www.chromium.org/developers/how-tos/trace-event-profiling-tool/

If you use the Chrome browser, go to `about:tracing`. See here for some screenshots of what it looks like: https://aras-p.info/blog/2019/01/16/time-trace-timeline-flame-chart-profiler-for-Clang/

I have not yet seen a tool that converts it into ASCII as you propose; might be a nice project for someone to work on :) (The Chromium time-trace format is used by other tools too, notably Clang, so the ASCII tool would be appreciated by more than just LDC users.)

cheers,
Johan
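For reference, each complete (`"ph":"X"`) event in such a trace file is a JSON object whose `ts` field is the start time and `dur` the duration, both in microseconds. A minimal sketch of pulling those fields out with std.json, using a hypothetical single event:

```d
import std.json;

void main()
{
    // A single hypothetical "complete" (ph == "X") trace event,
    // shaped like the ones dmd/LDC emit.
    auto ev = parseJSON(`{"ph":"X","name":"Sema1: Module object","ts":26825,"dur":1477}`);

    assert(ev["ph"].str == "X");
    auto start = ev["ts"].get!ulong;      // microseconds since trace start
    auto duration = ev["dur"].get!ulong;  // microseconds
    assert(start + duration == 28302);    // the event ends at ts + dur
}
```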
Feb 24 2023
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
Okay I have found something that can be improved very easily!

std.regex.internal.parser:

Inside of Parser struct:

```d
     @trusted void error(string msg)
     {
         import std.array : appender;
         import std.format.write : formattedWrite;
         auto app = appender!string();

         app ~= msg;
         app ~= "\nPattern with error: `";
         app ~= origin[0..$-pat.length];
         app ~= "` <--HERE-- `";
         app ~= pat;
         app ~= "`";

         throw new RegexException(app.data);
     }
```

That'll cut out ~100ms by removing formattedWrite!


Oooo, ``static immutable CharMatcher matcher = 
CharMatcher(wordCharacter);`` is causing 541ms of slowness. And that 
line isn't showing up in the profile, so that could be improved; we 
need CTFE initialization of globals to show up in it too.

Next big jump for the above:

```d
@property auto wordMatcher()()
{
     return CharMatcher(unicode.Alphabetic | unicode.Mn | unicode.Mc
         | unicode.Me | unicode.Nd | unicode.Pc);
}
```

Add some pure annotations to CharMatcher and BitTable constructors.

These two things take out a good 700ms!
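To illustrate why the `pure` annotations matter, here is a hedged sketch with a made-up `BitTable` (not the real std.regex internal type): a `pure`, CTFE-able constructor lets a `static immutable` value be computed once at compile time and baked into the binary, instead of being rebuilt at startup:

```d
// Hedged sketch: a stand-in BitTable, not the real std.regex type.
struct BitTable
{
    uint[4] bits;

    this(uint seed) pure
    {
        foreach (i, ref b; bits)
            b = seed + cast(uint) i;
    }
}

// Because the constructor is pure and CTFE-able, this table is
// computed at compile time; the result converts to immutable
// since the compiler can prove it is a unique value.
static immutable BitTable table = BitTable(10);

void main()
{
    assert(table.bits[0] == 10);
    assert(table.bits[3] == 13);
}
```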

Looks like constructors are not showing up at all. KickStart from 
std.regex.internal.kickstart is not showing up for postprocess from 
std.regex.internal.parser. Not that we can do much there just by 
removing stuff (the text call shows up, but removing it doesn't help much).

Okay, looks like I'm at the 62ms mark. There are certainly more things 
to do, but individually they're getting into premature-optimization 
territory. I'll do a PR for the above set of changes.
Feb 24 2023
parent =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 2/24/23 05:07, Richard (Rikki) Andrew Cattermole wrote:

 Okay looks like I'm at the 62ms mark.
Too good to be true! :p Thank you for working on this. This kind of effort improves everybody's life. Ali
Feb 24 2023
prev sibling parent reply Johan <j j.nl> writes:
On Friday, 24 February 2023 at 10:52:37 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 I'm going to be totally honest, I have no idea how to use that 
 information.

 Its not in a format that is easy to figure out.

 What I would want is stuff like this:

 ```
 module
 |- function
 | |- initialize template module thing  ~200ms
 | | |- ran CTFE on thingie  ~150us
 ```

 Give me that, and this type of hunting would be a cake walk I 
 think.
This was a nice small project. See here: https://gist.github.com/JohanEngelen/907c7681e4740f82d37fd2f2244ba7bf Looking forward to your feedback about improvements to --ftime-trace ;-) Cheers, Johan
Feb 24 2023
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
So there is a problem with the time-trace handling: it doesn't escape 
Windows paths, so you end up with an exception on \T rather than \\T.

I've gone ahead and modified the tool, did some cleaning up, added a 
second output file that allows for consumption in a spreadsheet 
application, sorted by duration automatically.

I'd love to see the time trace switch upstreamed into dmd. We can then 
distribute this tool for an out of the box visualization experience that 
doesn't require a web browser. And of course globals need work, not just 
Windows path escaping ;)

It is an absolutely lovely tool that will ease a lot of people's 
concerns over debugging compile times. Gonna be worth a blog article!

I'll put my version here also:


// Run using: rdmd timeTraceTree.d <your .time-trace file>
// Outputs timetrace.txt in current dir
module timeTraceTree;
import std.stdio;
import std.file;
import std.json;
import std.range;
import std.conv;
import std.algorithm;

File outputTextFile, outputTSVFile;
static string duration_format_string = "%13.3f ";

JSONValue sourceFile;
JSONValue[] metadata; // "M"
JSONValue[] counterdata; // "C"
JSONValue[] processes; // "X"

ulong lineNumberCounter = 1;

int main(string[] args)
{
     if (args.length < 2)
         return 1;

     auto input_json = read(args[1]).to!string;
     outputTextFile = File("timetrace.txt", "w");
     outputTSVFile = File("timetrace.tsv", "w");

     {
         sourceFile = parseJSON(input_json);
         readMetaData;
         constructTree;
         constructList;
     }

     {
         outputTextFile.writeln("Timetrace: ", args[1]);
         lineNumberCounter++;

         outputTextFile.writeln("Metadata:");
         lineNumberCounter++;

         foreach (node; metadata)
         {
             outputTextFile.write("  ");
             outputTextFile.writeln(node);
             lineNumberCounter++;
         }

         outputTextFile.writeln("Duration (ms)");
         lineNumberCounter++;
     }

     foreach (i, ref child; Node.root.children)
         child.print(0, false);

     outputTSVFile.writeln("Duration\tText Line Number\tName\tLocation\tDetail");
     foreach (node; Node.all)
         outputTSVFile.writeln(node.duration, "\t", node.lineNumber, "\t",
                 node.name, "\t", node.location, "\t", node.detail);

     return 0;
}

void readMetaData()
{
     auto beginningOfTime = sourceFile["beginningOfTime"].get!ulong;
     auto traceEvents = sourceFile["traceEvents"].get!(JSONValue[]);

     // Read meta data
     foreach (value; traceEvents)
     {
         switch (value["ph"].get!string)
         {
         case "M":
             metadata ~= value;
             break;
         case "C":
             counterdata ~= value;
             break;
         case "X":
             processes ~= value;
             break;
         default: //drop
         }
     }

     // process node = {"ph":"X","name": "Sema1: Module object","ts":26825,"dur":1477,"loc":"<no file>","args":{"detail": "","loc":"<no file>"},"pid":101,"tid":101},
     // Sort processes by start time (earliest first), then by duration (longest first)
     multiSort!(q{a["ts"].get!ulong < b["ts"].get!ulong}, q{a["dur"].get!ulong > b["dur"].get!ulong})(
             processes);
}

void constructTree()
{
     // Build tree (to get nicer looking structure lines)
     Node*[] parent_stack = [&Node.root]; // each stack item represents the first uncompleted node of that level in the tree

     foreach (ref process; processes)
     {
         auto last_ts = process["ts"].get!ulong + process["dur"].get!ulong;
         size_t parent_idx = 0; // index in parent_stack at which this item should be added.

         foreach (i; 0 .. parent_stack.length)
         {
             if (last_ts > parent_stack[i].last_ts)
             {
                 // The current process outlasts stack item i. Stop traversing; the parent is i-1.
                 parent_idx = i - 1;
                 parent_stack.length = i;
                 break;
             }

             parent_idx = i;
         }

         parent_stack[parent_idx].children ~= Node(&process, last_ts);
         parent_stack ~= &parent_stack[parent_idx].children[$ - 1];
         Node.count++;
     }
}

void constructList()
{
     size_t offset;

     Node.all.length = Node.count - 1;

     void handle(Node* root)
     {
         Node.all[offset++] = root;

         foreach (ref child; root.children)
             handle(&child);
     }

     foreach (ref child; Node.root.children)
         handle(&child);

     Node.all.sort!((a, b) => a.duration > b.duration);
}

struct Node
{
     Node[] children;
     JSONValue* json;
     ulong last_ts; // the last timestamp of this node (i.e. ts + dur)
     ulong lineNumber;

     string name;
     ulong duration;
     string location;
     string detail;

     this(JSONValue* json, ulong last_ts)
     {
         this.json = json;
         this.last_ts = last_ts;

         if ((*json).type == JSONType.object && "dur" in (*json))
         {
             this.duration = (*json)["dur"].get!ulong;
             this.name = (*json)["name"].get!string;
             this.location = (*json)["args"]["loc"].get!string;
             this.detail = (*json)["args"]["detail"].get!string;
         }
     }

     void print(uint indentLevel, bool last_child)
     {
         char[] identPrefix = getIdentPrefix(indentLevel, last_child);

         import std.stdio;

         if (last_child)
         {
             identPrefix[$ - 4] = ' ';
             identPrefix[$ - 3 .. $] = "\u2514";
         }
         else
             identPrefix[$ - 2 .. $] = " |";

         outputTextFile.writef(duration_format_string,
                 cast(double)(*this.json)["dur"].get!ulong / 1000);

         outputTextFile.write(identPrefix);
         outputTextFile.write("- ", this.name);
         outputTextFile.write(", ", this.detail);
         outputTextFile.writeln(", ", this.location);

         this.lineNumber = lineNumberCounter;
         lineNumberCounter++;

         if (last_child)
             identPrefix[$ - 4 .. $] = ' ';

         foreach (i, ref child; this.children)
             child.print(indentLevel + 1, i == this.children.length - 1);
     }

     static Node root = Node(new JSONValue("Tree root"), ulong.max);
     static Node*[] all;
     static size_t count = 1;
}

char[] getIdentPrefix(uint indentLevel, bool last_child)
{
     static char[] buffer;

     size_t needed = ((indentLevel + 1) * 2) + (last_child * 2);

     if (buffer.length < needed)
         buffer.length = needed;

     return buffer;
}
Feb 25 2023
parent reply Johan <j j.nl> writes:
On Saturday, 25 February 2023 at 13:55:00 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 So there is a problem with time trace handling, it doesn't 
 escape Windows paths so you end up with an exception on \T 
 rather than \\T.
I don't quite understand what you mean.
 I've gone ahead and modified the tool, did some cleaning up, 
 added a second output file that allows for consumption in a 
 spreadsheet application, sorted by duration automatically.

 I'd love to see the time trace switch upstreamed into dmd.
https://github.com/dlang/dmd/pull/13965
 We can then distribute this tool for an out of the box 
 visualization experience that doesn't require a web browser. 
 And of course globals need work, not just Windows path escaping 
 ;)
I'll add the tool to LDC.
 It is an absolutely lovely tool that will ease a lot of peoples 
 concerns over debugging compile times. Gonna be worth a blog 
 article!
Thanks. Looking forward. I don't remember adding CTFE times to the traces, so that sounds like a clear improvement point? Or was it still useful for you to tackle the issue of the OP?
 I'll put my version here also:
Thanks :)
Feb 25 2023
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 26/02/2023 5:49 AM, Johan wrote:
 On Saturday, 25 February 2023 at 13:55:00 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 So there is a problem with time trace handling, it doesn't escape 
 Windows paths so you end up with an exception on \T rather than \\T.
I don't quite understand what you mean.
{"ph":"M","ts":0,"args":{"name":"C:\Tools\D\ldc2-1.30.0-beta1-windows-multilib\bin\ldc2.exe"},"name":"process_name","pid":101,"tid":101},

Needs to be:

{"ph":"M","ts":0,"args":{"name":"C:\\Tools\\D\\ldc2-1.30.0-beta1-windows-multilib\\bin\\ldc2.exe"},"name":"process_name","pid":101,"tid":101},
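A minimal fix on the emitter side would be to escape backslashes (and quotes) before writing paths into the JSON. A hypothetical sketch of such a helper, not LDC's actual code:

```d
// Hypothetical helper: escape a string for embedding inside a
// JSON string literal. LDC's real fix lives in its --ftime-trace
// writer; this only sketches the idea.
string jsonEscape(string s)
{
    string result;
    foreach (ch; s)
    {
        switch (ch)
        {
        case '\\': result ~= `\\`; break;
        case '"':  result ~= `\"`; break;
        default:   result ~= ch;
        }
    }
    return result;
}

void main()
{
    assert(jsonEscape(`C:\Tools\D\bin\ldc2.exe`)
            == `C:\\Tools\\D\\bin\\ldc2.exe`);
}
```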
 I've gone ahead and modified the tool, did some cleaning up, added a 
 second output file that allows for consumption in a spreadsheet 
 application, sorted by duration automatically.

 I'd love to see the time trace switch upstreamed into dmd.
https://github.com/dlang/dmd/pull/13965
 We can then distribute this tool for an out of the box visualization 
 experience that doesn't require a web browser. And of course globals 
 need work, not just Windows path escaping ;)
I'll add the tool to LDC.
 It is an absolutely lovely tool that will ease a lot of peoples 
 concerns over debugging compile times. Gonna be worth a blog article!
Thanks. Looking forward. I don't remember adding CTFE times to the traces, so that sounds like a clear improvement point? Or was it still useful for you to tackle the issue of the OP?
Basically right now globals are not leading to anything in the output.

```
void func() {
    static immutable Thing thing = Thing(123);
}
```

The constructor call for Thing won't show up. This is the big one for std.regex basically.
Feb 25 2023
parent Johan <j j.nl> writes:
On Sunday, 26 February 2023 at 07:25:45 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 Basically right now globals are not leading to anything in the 
 output.

 ```
 void func() {
 	static immutable Thing thing = Thing(123);
 }
 ```

 The constructor call for Thing won't show up. This is the big 
 one for std.regex basically.
https://github.com/ldc-developers/ldc/pull/4339 -Johan
Mar 01 2023