digitalmars.D.learn - Best approach to handle accented letters

Alfred Newman <alfredonewman gmail.com> writes:

Hello,

I'm having trouble replacing the accented letters in a 
given string with their unaccented counterparts.

Let's say I have the input string "très élégant" and I 
need to write a function that returns just "tres elegant". 
Considering we need to take care of Unicode characters, what is the 
best way to write D code to handle that?

Cheers

Oct 28 2016

Chris <wendlec tcd.ie> writes:

On Friday, 28 October 2016 at 11:24:28 UTC, Alfred Newman wrote:
 Hello,

 I'm having trouble replacing the accented letters in a 
 given string with their unaccented counterparts.

 Let's say I have the input string "très élégant" and 
 I need to write a function that returns just "tres elegant". 
 Considering we need to take care of Unicode characters, what is 
 the best way to write D code to handle that?

 Cheers

You could try something like this. It works for accents. I 
haven't tested it on other characters yet.

import std.stdio;
import std.algorithm;
import std.array;
import std.conv;

enum
{
   dchar[dchar] _accent = ['á':'a', 'é':'e', 'è':'e', 'í':'i', 
'ó':'o', 'ú':'u', 'Á':'A', 'É':'E', 'Í':'I', 'Ó':'O', 'Ú':'U']
}

void main()
{
   auto str = "très élégant";
   auto removed = to!string(str.map!(a => (a in _accent) ? 
_accent[a] : a));
   writeln(removed);  // prints "tres elegant"
}

Oct 28 2016

Alfred Newman <alfredonewman gmail.com> writes:

On Friday, 28 October 2016 at 11:40:37 UTC, Chris wrote:
 On Friday, 28 October 2016 at 11:24:28 UTC, Alfred Newman wrote:
 Hello,

 I'm having trouble replacing the accented letters in a 
 given string with their unaccented counterparts.

 Let's say I have the input string "très élégant" and 
 I need to write a function that returns just "tres elegant". 
 Considering we need to take care of Unicode characters, what is 
 the best way to write D code to handle that?

 Cheers

 You could try something like this. It works for accents. I 
 haven't tested it on other characters yet.

 import std.stdio;
 import std.algorithm;
 import std.array;
 import std.conv;

 enum
 {
   dchar[dchar] _accent = ['á':'a', 'é':'e', 'è':'e', 'í':'i', 
 'ó':'o', 'ú':'u', 'Á':'A', 'É':'E', 'Í':'I', 'Ó':'O', 'Ú':'U']
 }

 void main()
 {
   auto str = "très élégant";
   auto removed = to!string(str.map!(a => (a in _accent) ? 
 _accent[a] : a));
   writeln(removed);  // prints "tres elegant"
 }

 Chris

As a new guy in the D community, I am not sure, but I think the 
line below is something like a Python lambda, right?

auto removed = to!string(str.map!(a => (a in _accent) ? 
_accent[a] : a));

Can you please rewrite the line in a more didactic way? Sorry, 
but I'm still learning the basics.

Thanks in advance

Oct 28 2016

Chris <wendlec tcd.ie> writes:

On Friday, 28 October 2016 at 13:50:24 UTC, Alfred Newman wrote:
 On Friday, 28 October 2016 at 11:40:37 UTC, Chris wrote:
 [...]

  Chris

 As a new guy in the D community, I am not sure, but I think the 
 line below is something like a Python lambda, right?

 auto removed = to!string(str.map!(a => (a in _accent) ? 
 _accent[a] : a));

 Can you please rewrite the line in a more didactic way? Sorry, 
 but I'm still learning the basics.

 Thanks in advance

It boils down to something like:

if (c in _accent)
   return _accent[c];
else
   return c;

Just a normal lambda whose body is a ternary expression: 
(condition) ? yes : no;

I'd recommend using Marc's approach, though.
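For a more step-by-step view, here is a rough sketch of the same lookup written as an ordinary function with a `foreach` loop instead of `map` and a lambda. The function name `replaceAccents` and the shortened table are only illustrative assumptions, not part of the code above.

```d
import std.stdio : writeln;

// Hypothetical helper: replaces accented letters using a small lookup table.
string replaceAccents(string input)
{
    // Deliberately short table for illustration; extend it as needed.
    dchar[dchar] accents = ['á':'a', 'é':'e', 'è':'e', 'í':'i', 'ó':'o', 'ú':'u'];

    string result;
    // Iterating with an explicit dchar decodes the UTF-8 string
    // one code point at a time.
    foreach (dchar c; input)
    {
        if (auto replacement = c in accents)
            result ~= *replacement; // found: append the unaccented letter
        else
            result ~= c;            // not found: keep the character as it is
    }
    return result;
}

void main()
{
    writeln(replaceAccents("très élégant")); // prints "tres elegant"
}
```

The `if`/`else` inside the loop is exactly what the ternary in the lambda does for each character; `map` just hides the loop.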

Oct 28 2016

Chris <wendlec tcd.ie> writes:

On Friday, 28 October 2016 at 14:31:47 UTC, Chris wrote:
 On Friday, 28 October 2016 at 13:50:24 UTC, Alfred Newman wrote:

 It boils down to something like:

 if (c in _accent)
   return _accent[c];
 else
   return c;

 Just a normal lambda whose body is a ternary expression: 
 (condition) ? yes : no;

 I'd recommend using Marc's approach, though.

What you basically do is you pass the logic on to `map` and `map` 
applies it to each item in the range (cf. [1]):

map!(myLogic)(range);

or (more idiomatic)

range.map!(myLogic);

This is true of a lot of functions, or rather templates, in the 
Phobos standard library, especially functions in std.algorithm 
(like find [2], canFind, filter etc.). In this way, instead of 
writing for-loops with if-else statements, you pass the logic to 
be applied within the `!()`-part of the template.

// Filter the letter 'l'
// returns "Heo, word!"
auto result = "Hello, world!".filter!(a => a != 'l');

However, what is returned is not a string. So this won't work:

`writeln("Result is " ~ result);`

// Error: incompatible types for (("Result is ") ~ (result)): 
'string' and
// 'FilterResult!(__lambda2, string)'

It returns a `FilterResult`.

To fix this, you can either write:
`
import std.conv;
auto result = "Hello, world!".filter!(a => a != 'l').to!string;
`
which converts it into a string.

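or you do this:

`
import std.array;
auto result = "Hello, world!".filter!(a => a != 'l').array;
`

Note that `.array` gives you a `dchar[]` here, not a `string`, 
because `filter` walks a string code point by code point. You can 
print it with `writeln(result)`, but for

`
writeln("Result is " ~ result);
`
to work you need the `to!string` version above.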
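To check which type each conversion actually produces, here is a small sketch. The variable names are just for illustration. As far as I can tell, `.array` on a filtered string yields a `dchar[]`, while `.to!string` yields a `string` that can be concatenated directly.

```d
import std.algorithm : filter;
import std.array : array;
import std.conv : to;
import std.stdio : writeln;

void main()
{
    // Eagerly collect the filtered code points into an array.
    auto asArray = "Hello, world!".filter!(a => a != 'l').array;
    // Convert the lazy FilterResult into a UTF-8 string.
    auto asString = "Hello, world!".filter!(a => a != 'l').to!string;

    // Compile-time checks of the resulting types.
    static assert(is(typeof(asArray) == dchar[]));
    static assert(is(typeof(asString) == string));

    writeln(asArray);                 // a dchar[] can be printed directly
    writeln("Result is " ~ asString); // string ~ string concatenation
}
```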

Just bear that in mind, because you will get the above error 
sometimes. Marc's example is idiomatic D and you should become 
familiar with it asap.

import std.stdio;
import std.algorithm;
import std.uni;
import std.conv;

void main()
{
     auto str = "très élégant";
     immutable accents = unicode.Diacritic;
     auto removed = str
         // decompose each accented character into base letter + combining mark
         .normalize!NFD
         // drop the combining marks (diacritics), keep the base letters
         .filter!(c => !accents[c])
         // convert the FilterResult back to a string
         .to!string;
     writeln(removed);  // prints "tres elegant"
}

Oct 28 2016

Alfred Newman <alfredonewman gmail.com> writes:

On Friday, 28 October 2016 at 15:08:59 UTC, Chris wrote:
 On Friday, 28 October 2016 at 14:31:47 UTC, Chris wrote:
 [...]

 What you basically do is you pass the logic on to `map` and 
 `map` applies it to each item in the range (cf. [1]):

 [...]

Life is beautiful!
Thx.

Oct 28 2016

Marc Schütz <schuetzm gmx.net> writes:

On Friday, 28 October 2016 at 11:24:28 UTC, Alfred Newman wrote:
 Hello,

 I'm having trouble replacing the accented letters in a 
 given string with their unaccented counterparts.

 Let's say I have the input string "très élégant" and 
 I need to write a function that returns just "tres elegant". 
 Considering we need to take care of Unicode characters, what is 
 the best way to write D code to handle that?

 Cheers

import std.stdio;
import std.algorithm;
import std.uni;
import std.conv;

void main()
{
     auto str = "très élégant";
     immutable accents = unicode.Diacritic;
     auto removed = str
         .normalize!NFD
         .filter!(c => !accents[c])
         .to!string;
     writeln(removed);  // prints "tres elegant"
}

This first decomposes all characters into base and diacritic, and 
then removes the latter.
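Since the original question asked for a function, the same pipeline could be wrapped up roughly like this. The name `removeDiacritics` is just an assumption for illustration; the body is the code above, unchanged.

```d
import std.algorithm : filter;
import std.conv : to;
import std.stdio : writeln;
import std.uni : NFD, normalize, unicode;

// Hypothetical wrapper around the normalize/filter pipeline.
string removeDiacritics(string input)
{
    // Set of code points carrying the Unicode "Diacritic" property.
    immutable marks = unicode.Diacritic;
    return input
        .normalize!NFD            // split letters into base + combining marks
        .filter!(c => !marks[c])  // drop the marks, keep everything else
        .to!string;               // collect the result into a string
}

void main()
{
    writeln(removeDiacritics("très élégant")); // prints "tres elegant"
}
```

One thing to keep in mind: this only helps for letters that actually decompose under NFD, and, if I read the Unicode property list correctly, the Diacritic set also contains a few standalone characters such as '^' and '`', so those would be dropped from the input as well.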

Oct 28 2016

Chris <wendlec tcd.ie> writes:

On Friday, 28 October 2016 at 12:52:04 UTC, Marc Schütz wrote:
 On Friday, 28 October 2016 at 11:24:28 UTC, Alfred Newman wrote:
 [...]

 import std.stdio;
 import std.algorithm;
 import std.uni;
 import std.conv;

 void main()
 {
     auto str = "très élégant";
     immutable accents = unicode.Diacritic;
     auto removed = str
         .normalize!NFD
         .filter!(c => !accents[c])
         .to!string;
     writeln(removed);  // prints "tres elegant"
 }

 This first decomposes all characters into base and diacritic, 
 and then removes the latter.

Cool. That looks pretty neat and it should cover all cases.

Oct 28 2016
