digitalmars.D.learn - GREETINGS FROM iSTANBUL

Salih Dincer <salihdb hotmail.com> writes:
Greetings from istanbul...

In our language (Turkish), the capital form of the letter 'i' keeps its 
dot ('İ'), just like the lower case.  But in this example:
```d
// D 2.0.83

import std.stdio, std.uni;

void main()
{
   auto message = "Greetings from istanbul"d;

   message.asUpperCase.writeln; // GREETINGS FROM ISTANBUL

   /* D is very talented at this,
    * except for one letter: 'i'
    * ref: https://en.m.wikipedia.org/wiki/Istanbul
    */
}
```
I have to code a custom solution.  Is it possible to solve the 
problem from within std.uni?

We are discussing the issue in our own community.  I also saw: 
https://forum.dlang.org/post/vxnnykllgxsghlludpqv forum.dlang.org

Thanks...
Aug 01 2021
rikki cattermole <rikki cattermole.co.nz> writes:
It appears you are using the wrong lowercase character.

https://en.wikipedia.org/wiki/Dotted_and_dotless_I

 From a quick experiment, it appears std.uni is treating the upper case 
dotted I's lower case as a grapheme. Which it probably shouldn't be as 
there is an actual character for that.

We might need to update our unicode database... or something.
Aug 01 2021
Paul Backus <snarwin gmail.com> writes:
On Sunday, 1 August 2021 at 17:56:00 UTC, rikki cattermole wrote:
 It appears you are using the wrong lowercase character.

 https://en.wikipedia.org/wiki/Dotted_and_dotless_I

 From a quick experiment, it appears std.uni is treating the 
 upper case dotted I's lower case as a grapheme. Which it 
 probably shouldn't be as there is an actual character for that.

 We might need to update our unicode database... or something.
It's not the wrong lower-case character. Turkish uses U+0069 (a.k.a. ASCII 'i') for lower-case dotted I, but has a non-default case mapping that pairs U+0069 with U+0130 ('İ') rather than U+0049 (ASCII 'I'). Phobos' std.uni uses the default case mapping for its toUpper function, so it does not produce the correct result for Turkish text.

Source: https://www.unicode.org/faq/casemap_charprop.html#1

A common solution to this in other languages is to have a version of toUpper that takes a locale as an argument. Some examples:

- Javascript: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/toLocaleUpperCase
- Go: https://pkg.go.dev/strings#ToUpperSpecial
- Java: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/String.html#toUpperCase(java.util.Locale)
- .NET: https://docs.microsoft.com/en-US/dotnet/api/system.string.toupper?view=net-5.0
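
For illustration only, the Turkish pairing described above (U+0069 with U+0130, and U+0131 with U+0049) could be layered on top of std.uni roughly as follows. This is a minimal sketch, not a Phobos API: `turkishUpperChar` and `toUpperTurkish` are hypothetical helper names, and every code point other than the two I letters is assumed to be fine under the default mapping.

```d
import std.stdio : writeln;
import std.uni : toUpper;

/// Hypothetical helper (not in Phobos): upper-cases one code point,
/// special-casing the two Turkish I letters.
dchar turkishUpperChar(dchar c)
{
   switch (c)
   {
      case 'i':      return '\u0130';   // dotted i  -> İ (U+0130)
      case '\u0131': return 'I';        // dotless ı -> I (U+0049)
      default:       return toUpper(c); // default mapping for everything else
   }
}

/// Hypothetical helper: applies turkishUpperChar to every code point.
dstring toUpperTurkish(dstring text)
{
   dchar[] result;
   result.reserve(text.length);
   foreach (dchar c; text)
      result ~= turkishUpperChar(c);
   return result.idup;
}

void main()
{
   writeln("istanbul"d.toUpperTurkish); // İSTANBUL
}
```

Note that this applies the Turkish rule to every 'i' in the input, so English words in the same string are affected too ("Greetings" becomes "GREETİNGS"). That is why the libraries linked above take the locale as an explicit argument rather than guessing it.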
Aug 01 2021
Salih Dincer <salihdb hotmail.com> writes:
On Sunday, 1 August 2021 at 18:22:05 UTC, Paul Backus wrote:
 On Sunday, 1 August 2021 at 17:56:00 UTC, rikki cattermole 
 wrote:
 It appears you are using the wrong lowercase character.
I think so too, here's the proof:
```d
import std.string, std.stdio;

void main()
{
   auto istanbul = "\u0131stanbul";
   enum capitalized = "Istanbul";

   assert(istanbul.capitalize == capitalized);
   assert("istanbul".capitalize == capitalized);
}
```
Different characters, but the same seamless result...
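
The reason both assertions hold can be seen at the code-point level. The sketch below assumes only std.uni's default, locale-independent `toUpper(dchar)`: under that mapping both lower-case letters land on the same upper-case code point, so the dotted/dotless distinction is lost.

```d
import std.stdio : writefln;
import std.uni : toUpper;

void main()
{
   dchar dotted  = 'i';      // U+0069
   dchar dotless = '\u0131'; // U+0131, Turkish dotless ı

   // Default case mapping: both map to ASCII 'I' (U+0049).
   assert(toUpper(dotted)  == 'I');
   assert(toUpper(dotless) == 'I');

   writefln("U+%04X -> U+%04X", cast(uint) dotted,  cast(uint) toUpper(dotted));
   writefln("U+%04X -> U+%04X", cast(uint) dotless, cast(uint) toUpper(dotless));
}
```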
Aug 01 2021
Salih Dincer <salihdb hotmail.com> writes:
On Sunday, 1 August 2021 at 18:22:05 UTC, Paul Backus wrote:
 A common solution to this in other languages is to have a 
 version of toUpper that takes a locale as an argument. Some 
 examples:

 - Javascript: 
 https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/toLocaleUpperCase
I did not know that; it is exactly what I wanted to talk about. Such clean code... Thank you, Paul.
Aug 01 2021