digitalmars.D.learn - Best approach to handle accented letters

Alfred Newman <alfredonewman gmail.com> writes:
Hello,

I'm having trouble replacing the accented letters in a given
string with their unaccented counterparts.

Let's say I have the input string "très élégant" and I need to
write a function that returns just "tres elegant". Considering
that we need to handle Unicode characters, what is the best way
to write D code for that?

Cheers
Oct 28 2016
Chris <wendlec tcd.ie> writes:
On Friday, 28 October 2016 at 11:24:28 UTC, Alfred Newman wrote:
 [...]
You could try something like this. It works for the accented letters listed in the table; I haven't tested it on other characters yet.

import std.stdio;
import std.algorithm;
import std.array;
import std.conv;

enum
{
    dchar[dchar] _accent = ['á':'a', 'é':'e', 'è':'e', 'í':'i',
                            'ó':'o', 'ú':'u', 'Á':'A', 'É':'E',
                            'Í':'I', 'Ó':'O', 'Ú':'U']
}

void main()
{
    auto str = "très élégant";
    auto removed = to!string(str.map!(a => (a in _accent) ? _accent[a] : a));
    writeln(removed); // prints "tres elegant"
}
Oct 28 2016
Alfred Newman <alfredonewman gmail.com> writes:
On Friday, 28 October 2016 at 11:40:37 UTC, Chris wrote:
 [...]
Chris,

As a new guy in the D community, I am not sure, but I think the line below is something like a Python lambda, right?

auto removed = to!string(str.map!(a => (a in _accent) ? _accent[a] : a));

Can you please rewrite the line in a more didactic way? Sorry, but I'm still learning the basics.

Thanks in advance
Oct 28 2016
Chris <wendlec tcd.ie> writes:
On Friday, 28 October 2016 at 13:50:24 UTC, Alfred Newman wrote:
 On Friday, 28 October 2016 at 11:40:37 UTC, Chris wrote:
 [...]
It boils down to something like:

if (c in _accent)
  return _accent[c];
else
  return c;

The body is just a normal ternary expression, (condition) ? yes : no, and the `a => ...` part around it is the lambda.

I'd recommend that you use Marc's approach, though.
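
For readers who want the lambda spelled out completely, here is a minimal sketch of the same per-character logic as an explicit loop. The function name `replaceAccents` is hypothetical and the table is copied from the earlier post, so it only handles the letters listed there.

```d
import std.stdio;

/// Hypothetical helper: same logic as the `map` lambda, written as a loop.
string replaceAccents(string input)
{
    // Lookup table from the earlier post; it covers only these letters.
    dchar[dchar] accent = ['á':'a', 'é':'e', 'è':'e', 'í':'i',
                           'ó':'o', 'ú':'u', 'Á':'A', 'É':'E',
                           'Í':'I', 'Ó':'O', 'Ú':'U'];

    string result;
    foreach (dchar c; input)       // decode the UTF-8 string one code point at a time
    {
        if (auto p = c in accent)  // `in` yields a pointer to the value, or null
            result ~= *p;          // known accented letter: append its replacement
        else
            result ~= c;           // anything else: append unchanged
    }
    return result;
}

void main()
{
    writeln(replaceAccents("très élégant")); // prints "tres elegant"
}
```

Characters missing from the table, such as 'ñ', pass through unchanged, which is the main limitation of a hand-written table.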
Oct 28 2016
Chris <wendlec tcd.ie> writes:
On Friday, 28 October 2016 at 14:31:47 UTC, Chris wrote:
 [...]
What you basically do is pass the logic on to `map`, and `map` applies it to each item in the range (cf. [1]):

map!(myLogic)(range);

or (more idiomatic)

range.map!(myLogic);

This is true of a lot of functions, or rather templates, in the Phobos standard library, especially those in std.algorithm (like find [2], canFind, filter etc.). Instead of writing for-loops with if-else statements, you pass the logic to be applied in the `!()` part of the template.

// Filter out the letter 'l'
auto result = "Hello, world!".filter!(a => a != 'l'); // lazily yields "Heo, word!"

However, what is returned is not a string. So this won't work:

writeln("Result is " ~ result);
// Error: incompatible types for (("Result is ") ~ (result)): 'string' and
// 'FilterResult!(__lambda2, string)'

It returns a `FilterResult`, which is a lazy range. To fix this, you can either write:

import std.conv;
auto result = "Hello, world!".filter!(a => a != 'l').to!string;

which converts it into a string, or you can do this:

import std.array;
auto result = "Hello, world!".filter!(a => a != 'l').array;

which collects the elements into an array (a `dchar[]` here, because strings are iterated by code point). With the `to!string` version,

writeln("Result is " ~ result);

works. Just bear that in mind, because you will get the above error sometimes.

Marc's example is idiomatic D and you should become familiar with it asap.

import std.stdio;
import std.algorithm;
import std.uni;
import std.conv;

void main()
{
    auto str = "très élégant";
    immutable accents = unicode.Diacritic;
    auto removed = str
        // decompose each character into base letter + combining marks
        .normalize!NFD
        // drop the diacritics, keeping only the base letters
        .filter!(c => !accents[c])
        // convert the FilterResult back to a string
        .to!string;
    writeln(removed); // prints "tres elegant"
}
Oct 28 2016
Alfred Newman <alfredonewman gmail.com> writes:
On Friday, 28 October 2016 at 15:08:59 UTC, Chris wrote:
 On Friday, 28 October 2016 at 14:31:47 UTC, Chris wrote:
 [...]
Life is beautiful! Thanks.
Oct 28 2016
Marc Schütz <schuetzm gmx.net> writes:
On Friday, 28 October 2016 at 11:24:28 UTC, Alfred Newman wrote:
 [...]
import std.stdio;
import std.algorithm;
import std.uni;
import std.conv;

void main()
{
    auto str = "très élégant";
    immutable accents = unicode.Diacritic;
    auto removed = str
        .normalize!NFD
        .filter!(c => !accents[c])
        .to!string;
    writeln(removed); // prints "tres elegant"
}

This first decomposes all characters into base and diacritic, and then removes the latter.
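
Since the original question asked for a function, the same pipeline could be wrapped up roughly like this. This is a sketch only: the name `removeAccents` is hypothetical, and it uses `auto` instead of `immutable` for the code point set, which is an arbitrary choice here.

```d
import std.stdio;
import std.algorithm;
import std.uni;
import std.conv;

/// Hypothetical helper: strips diacritics by decomposing and filtering.
string removeAccents(string input)
{
    // Set of code points that carry the Unicode "Diacritic" property.
    auto diacritics = unicode.Diacritic;

    return input
        .normalize!NFD                 // split "é" into "e" + U+0301
        .filter!(c => !diacritics[c])  // drop the diacritic code points
        .to!string;                    // collect the lazy range into a string
}

void main()
{
    writeln(removeAccents("très élégant")); // prints "tres elegant"
}
```

Letters that have no canonical decomposition, such as 'ø' or 'ß', are not split by NFD and therefore pass through unchanged.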
Oct 28 2016
Chris <wendlec tcd.ie> writes:
On Friday, 28 October 2016 at 12:52:04 UTC, Marc Schütz wrote:
 On Friday, 28 October 2016 at 11:24:28 UTC, Alfred Newman wrote:
 [...]
Cool. That looks pretty neat and it should cover all cases.
Oct 28 2016