digitalmars.D.announce - Breaking news: std.uni changes!

Richard (Rikki) Andrew Cattermole <richard cattermole.co.nz> writes:
Hello one and all on this merriest of all days!

Today, unfortunately, I bring anything but joy, for std.uni has had a 
bout of work!

- Unicode tables have been updated to 15 from 6.2 (and with that 
the generator is now in Phobos!).
- The Unicode category C (aka Other) has been brought in line with 
the TR44 specification, e.g. ``unicode.C``.

In both cases, if you use std.uni directly or indirectly (say, via 
std.regex), you may find yourself with code breakage in the next 
release.

If you do run into problems, first check whether you are 
referencing the C category. If you are, here is some code to 
mitigate the problem; however, it would be better to avoid the 
need for it.

```d
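// Rebuilds the pre-update meaning of the "C"/"Other" category;
// every other property name is forwarded to std.uni unchanged.
@property auto loadPropertyOriginal(string name)() pure
{
    import std.uni : unicode;

    static if (name == "C" || name == "c" || name == "other" || name == "Other")
    {
        // Old behaviour: the union of Co, Lo, No, So and Po.
        auto target = unicode.Co;
        target |= unicode.Lo;
        target |= unicode.No;
        target |= unicode.So;
        target |= unicode.Po;
        return target;
    }
    else
        return unicode.opDispatch!name;
}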
```
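
To find out which definition of `unicode.C` a given Phobos build uses, a small probe can help. This is only a sketch and not part of the update itself: it assumes the old definition matches the union used in the mitigation above and that the TR44 definition contains control characters (Cc). The helper names `oldStyleOther` and `usesTr44Other` are made up for illustration.

```d
import std.stdio : writeln;
import std.uni : unicode;

/// Hypothetical helper: the old-style "Other" set, built explicitly.
auto oldStyleOther()
{
    return unicode.Co | unicode.Lo | unicode.No | unicode.So | unicode.Po;
}

/// Hypothetical helper: true when unicode.C contains control characters.
/// TR44 defines C as Cc | Cf | Cs | Co | Cn, so a line feed (Cc) belongs
/// to it; the old-style union above does not contain it.
bool usesTr44Other()
{
    auto other = unicode.C;
    return other['\n'];
}

void main()
{
    auto old = oldStyleOther();
    assert(old['\u00A9']); // COPYRIGHT SIGN is So, part of the old union
    assert(!old['\n']);    // LINE FEED is Cc, not part of the old union

    // The result depends on the Phobos version in use.
    writeln("unicode.C follows TR44: ", usesTr44Other());
}
```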

Lastly, the table update has already brought much joy to MIR, 
in the form of a broken test. A character that was being tested 
wasn't allocated in 6.2 but was in 7, so the results were different. 
If your test suite is not part of the Phobos runners, please be 
aware that once you update you may experience failed tests. These 
are not avoidable, because of the external specification it's based 
upon. In even worse news, the table generator was not kept in 
working condition over the last 10 years, so there is a chance that 
something has been missed.
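
One way to see whether a test depends on a recently allocated character is to check its assignment status against the tables in use. The following is a minimal sketch, assuming the Unassigned category is reachable as `unicode.Cn`; the helper name `isAssigned` is hypothetical, and whether a test should branch on this at all depends on what it is meant to verify.

```d
import std.stdio : writeln;
import std.uni : unicode;

/// Hypothetical helper: true if the code point is allocated according
/// to the Unicode tables this Phobos was built with.
bool isAssigned(dchar c)
{
    auto unassigned = unicode.Cn;
    return !unassigned[c];
}

void main()
{
    // Long-standing characters are assigned in every table version.
    assert(isAssigned('A'));

    // U+20BD RUBLE SIGN was added in Unicode 7.0, so the answer here
    // depends on which tables your Phobos ships.
    immutable dchar candidate = 0x20BD;
    writeln("U+20BD assigned: ", isAssigned(candidate));
}
```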

In all cases, please do contact me if you need assistance. I'm 
available on Discord, OFTC #d and of course N.G. or even email if 
you really need it (firstname lastname.co.nz).

--- Happy holidays to those that are currently enjoying them or 
about to!
Dec 24 2022
Dom Disc <dominikus scherkl.de> writes:
On Saturday, 24 December 2022 at 21:26:40 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 - Unicode tables have been updated to 15 from 6.2 (and with 
 that the generator is now in Phobos!).
Hurray! Whatever problems this may cause, they're problems in very, very outdated code that would already need an overhaul, so what. But it's super to finally have tables that are (at least now) up to date!
Dec 25 2022
Robert Schadek <rburners gmail.com> writes:
Awesome work, thank you
Dec 26 2022
Walter Bright <newshound2 digitalmars.com> writes:
A big thank you!
Dec 26 2022
Dukc <ajieskola gmail.com> writes:
On Saturday, 24 December 2022 at 21:26:40 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 Hello one and all on this merry of all days!

 Today unfortunately I bring all but joy. For std.uni has had a 
 bout of work!

 - Unicode tables have been updated to 15 from 6.2 (and with 
 that the generator is now in Phobos!).
 - Unicode categories C aka Other have been brought in line with 
 TR44 specification. E.g. ``unicode.C``.
This is a big service for us at Symmetry. Getting Unicode support up to date was needed, we would have had to switch libraries at some point or update it ourselves. But now, nothing to do except perhaps dealing with a bit of breakage. Thank you!

I see it's not quite Unicode 15 though. `graphemeStride` does not take Emoji sequences and prepend characters into account. I'm going to contribute a bit now since it's holiday, and this is a good task for me. PR coming soon unless I run into issues!
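
A quick way to observe the grapheme behaviour on a given Phobos is sketched below. It uses `byGrapheme` and `graphemeStride` from std.uni; the helper name `graphemeCount` is made up for illustration, and the count printed for the emoji sequence is deliberately not asserted, since it depends on the Phobos version.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme, graphemeStride;

/// Hypothetical helper: number of grapheme clusters in a UTF-8 string.
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    // 'a' followed by U+0301 COMBINING ACUTE ACCENT forms one grapheme
    // that occupies three UTF-8 code units (1 + 2).
    assert(graphemeCount("a\u0301") == 1);
    assert(graphemeStride("a\u0301b", 0) == 3);

    // WOMAN + ZERO WIDTH JOINER + PERSONAL COMPUTER is an emoji ZWJ
    // sequence. Current grapheme cluster rules treat it as one cluster;
    // an implementation without emoji sequence support reports more.
    string zwj = "\U0001F469\u200D\U0001F4BB";
    writeln("grapheme count for ZWJ sequence: ", graphemeCount(zwj));
}
```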
Dec 27 2022
"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 28/12/2022 12:13 AM, Dukc wrote:
 This is a big service for us at Symmetry. Getting Unicode support up to 
 date was needed, we would have had to switch libraries at some point or 
 update it ourselves. But now, nothing to do except perhaps dealing with 
 a bit of breakage. Thank you!
I had no idea that this was becoming an issue for you guys. It wasn't in any of the meeting notes and I haven't seen it brought up anywhere. So if there is anything more like this, please talk about it!
 I see it's not quite Unicode 15 though. `graphemeStride` does not take 
 Emoji sequences and prepend characters into account. I'm going to 
 contribute a bit now since it's holiday, and this is a good task for me. 
 PR coming soon unless I run into issues!
Yeah, there will be tons of small stuff that got missed because of such a big jump. And of course ping me (rikkimax) when you have something to review. Loads of other work is available, such as culling all the version-specific information out of the docs :)
Dec 27 2022
Dukc <ajieskola gmail.com> writes:
(Sorry for the late answer)

On Wednesday, 28 December 2022 at 00:10:36 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 On 28/12/2022 12:13 AM, Dukc wrote:
 This is a big service for us at Symmetry. Getting Unicode 
 support up to date was needed, we would have had to switch 
 libraries at some point or update it ourselves. But now, 
 nothing to do except perhaps dealing with a bit of breakage. 
 Thank you!
I had no idea that this was becoming an issue for you guys. It wasn't in any of the meeting notes and I haven't seen it brought up anywhere. So if there is anything more like this, please talk about it!
Yes, I should have done that.
 I see it's not quite Unicode 15 though. `graphemeStride` does 
 not take Emoji sequences and prepend characters into account. 
 I'm going to contribute a bit now since it's holiday, and this 
 is a good task for me. PR coming soon unless I run into issues!
Yeah, there will be tons of small stuff currently missed out due to such a big jump and of course ping me rikkimax, when you have something to review. Loads of other work available such as culling all the version specific information out of the docs :)
Other things coming to mind: Bidirectional grapheme iteration, Word break and line break algorithms, lazy normalisation. Indeed, lots of improvement potential.
Jan 02 2023
"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 03/01/2023 10:24 AM, Dukc wrote:
 Other things coming to mind: Bidirectional grapheme iteration, Word 
 break and line break algorithms, lazy normalisation. Indeed, lots of 
 improvement potential.
I've done word break, "lazy" normalization (so can stop at any point), and lazy case insensitive comparison with normalization. But: Bidirectional grapheme iteration makes my eye twitch lol. My main concern for adding new features is increasing the size of Phobos binary for the tables. Most people don't need a lot of these optional algorithms, but they do need things like casing to work correctly (which makes increased size worth it).
Jan 02 2023
"H. S. Teoh" <hsteoh qfbox.info> writes:
On Tue, Jan 03, 2023 at 05:13:53PM +1300, Richard (Rikki) Andrew Cattermole via
Digitalmars-d-announce wrote:
 On 03/01/2023 10:24 AM, Dukc wrote:
 Other things coming to mind: Bidirectional grapheme iteration,
 Word break and line break algorithms, lazy normalisation. Indeed,
 lots of improvement potential.
I've done word break, "lazy" normalization (so can stop at any point), and lazy case insensitive comparison with normalization. But: Bidirectional grapheme iteration makes my eye twitch lol. My main concern for adding new features is increasing the size of Phobos binary for the tables. Most people don't need a lot of these optional algorithms, but they do need things like casing to work correctly (which makes increased size worth it).
Is there a way to make these tables pay-as-you-go? As in, if you never call a function that depends on a table, it would not be pulled into the binary?

T

-- 
They say that "guns don't kill people, people kill people." Well I think the gun helps. If you just stood there and yelled BANG, I don't think you'd kill too many people. -- Eddie Izzard, Dressed to Kill
Jan 02 2023
"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 03/01/2023 6:13 PM, H. S. Teoh wrote:
 Is there a way to make these tables pay-as-you-go? As in, if you never
 call a function that depends on a table, it would not be pulled into the
 binary?
This should already be the case. I saw some work from about 10 years ago involving Rainer, who helped improve it along these lines. The main concern would be shared libraries, which Phobos should be able to be distributed as on all platforms by all compilers.
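
As a rough illustration of the pay-as-you-go idea (not a description of how std.uni is actually organised), a table can live inside a template so that it is only compiled when something instantiates it. The function name and the tiny table below are hypothetical. In a shared library build, anything the library itself instantiates is shipped regardless of what the final program calls, which is the concern mentioned above.

```d
/// Hypothetical lookup. The empty template parameter list makes this a
/// template, so the function and its table are only code-generated when
/// some code instantiates it by calling it.
bool isAsciiLetterFromTable()(dchar c)
{
    // Stand-in for a real Unicode table: inclusive code point ranges.
    static immutable uint[2][] ranges = [[0x41, 0x5A], [0x61, 0x7A]];

    foreach (r; ranges)
        if (c >= r[0] && c <= r[1])
            return true;
    return false;
}

void main()
{
    // This call instantiates the template; without it, nothing from
    // isAsciiLetterFromTable would be generated for this module.
    assert(isAsciiLetterFromTable('q'));
    assert(!isAsciiLetterFromTable('1'));
}
```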
Jan 02 2023
Adam D Ruppe <destructionator gmail.com> writes:
On Tuesday, 3 January 2023 at 05:23:55 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 The main concern would be shared libraries, which Phobos should 
 be able to be distributed as on all platforms by all compilers.
I said this on the Discord chat, but you should really just dynamically load the system ICU if it is available.
Jan 03 2023
"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 04/01/2023 2:58 AM, Adam D Ruppe wrote:
 On Tuesday, 3 January 2023 at 05:23:55 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 The main concern would be shared libraries, which Phobos should be 
 able to be distributed as on all platforms by all compilers.
I said this on the discord chat but you should really just dynamic load the system icu if it is available.
Ideally. We still need an implementation for CTFE though. It's just a lot of work to shoehorn it in now.
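
To make the CTFE point concrete, here is a minimal sketch of the shape such a design might take. Everything in it is an assumption for illustration: the helper names `canLoad` and `upper` are hypothetical, the library name `libicuuc.so` may differ or carry a version suffix on a given system, and the run-time branch still calls std.uni rather than binding any ICU function.

```d
import std.stdio : writeln;
import std.uni : toUpper;

version (Posix)
{
    import core.sys.posix.dlfcn : dlclose, dlopen, RTLD_LAZY;

    /// Hypothetical probe: true if the named shared library can be loaded.
    bool canLoad(const(char)* name)
    {
        void* handle = dlopen(name, RTLD_LAZY);
        if (handle is null)
            return false;
        dlclose(handle);
        return true;
    }
}
else
{
    /// Non-Posix platforms are not covered by this sketch.
    bool canLoad(const(char)* name) { return false; }
}

/// Hypothetical wrapper showing where the two code paths would split.
dchar upper(dchar c)
{
    if (__ctfe)
        return toUpper(c); // compile time: shared libraries cannot be called
    // Run time: a real implementation could dispatch to a system library
    // here; this sketch keeps using the built-in tables.
    return toUpper(c);
}

void main()
{
    enum dchar atCompileTime = upper('a'); // forces evaluation through CTFE
    static assert(atCompileTime == 'A');
    assert(upper('b') == 'B');

    // Assumed library name; the result depends on the system.
    writeln("system ICU loadable: ", canLoad("libicuuc.so"));
}
```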
Jan 03 2023
Dukc <ajieskola gmail.com> writes:
On Tuesday, 3 January 2023 at 04:13:53 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 On 03/01/2023 10:24 AM, Dukc wrote:
 Other things coming to mind: Bidirectional grapheme iteration, 
 Word break and line break algorithms, lazy normalisation. 
 Indeed, lots of improvement potential.
I've done word break, "lazy" normalization (so can stop at any point), and lazy case insensitive comparison with normalization.
Can't wait to see them in master!
 But: Bidirectional grapheme iteration makes my eye twitch lol.
I did write a reverse grapheme iterator for Symmetry. It isn't fit for Phobos as-is since it only accepts UTF-8 strings (not other ranges) and is modeled after the Phobos grapheme walker, not the 15.0 standard. But I could ask for permission to give it to you if it'd help.
Jan 03 2023
"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 04/01/2023 2:51 AM, Dukc wrote:
 On Tuesday, 3 January 2023 at 04:13:53 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 On 03/01/2023 10:24 AM, Dukc wrote:
 Other things coming to mind: Bidirectional grapheme iteration, Word 
 break and line break algorithms, lazy normalisation. Indeed, lots of 
 improvement potential.
I've done word break, "lazy" normalization (so can stop at any point), and lazy case insensitive comparison with normalization.
Can't wait to see them in master!
 But: Bidirectional grapheme iteration makes my eye twitch lol.
I did write a reverse grapheme iterator for Symmetry. It isn't fit for Phobos as-is since it only accepts UTF-8 strings (not other ranges) and is modeled after the Phobos grapheme walker, not the 15.0 standard. But I could ask for permission to give it to you if it'd help.
I probably won't be adding any new features to std.uni, only finishing off the things that annoy me and reviewing other people's work. I've got enough on my plate just building my own "standard library" https://github.com/Project-Sidero/basic_memory :)
Jan 03 2023