digitalmars.D.bugs - [Issue 5543] New: to!int to see a char as a single-char string


d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=5543

           Summary: to!int to see a char as a single-char string
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: bearophile_hugs eml.cc



In DMD 2.051 to!int acts as cast(int) on chars:

import std.conv: to;
void main() {
    assert(to!int("1") == 1);
    assert(cast(int)'1' == 49);
    assert(to!int('1') == 49);
}


But I think this is more handy:

import std.conv: to;
void main() {
    assert(to!int("1") == 1);
    assert(cast(int)'1' == 49);
    assert(to!int('1') == 1);
}
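
A minimal sketch of the requested behaviour, using only conversions that already exist. The helper name charToInt is hypothetical and not part of Phobos; it routes the char through a one-character string so that to!int parses it instead of returning its code unit:

```d
import std.conv : to, ConvException;

/// Hypothetical helper (not in Phobos): parse a single char as a decimal number.
int charToInt(char c)
{
    // to!string('1') yields "1", which to!int then parses as the number 1.
    return to!int(to!string(c));
}

unittest
{
    assert(charToInt('1') == 1);  // the behaviour asked for in this report
    assert(to!int('1') == 49);    // what to!int does with a char today

    // A non-digit is rejected with a ConvException, as for any bad string.
    bool threw = false;
    try charToInt('x');
    catch (ConvException) threw = true;
    assert(threw);
}
```

Like to!int on strings, this throws on bad input rather than returning an error code.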

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Feb 07 2011

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=5543


Andrej Mitrovic <andrej.mitrovich gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |pull
                 CC|                            |andrej.mitrovich gmail.com
         AssignedTo|nobody puremagic.com        |andrej.mitrovich gmail.com



09:58:10 PST ---
https://github.com/D-Programming-Language/phobos/pull/1017


Dec 18 2012

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=5543




06:37:34 PST ---
 bear: Please see the comments here:
https://github.com/D-Programming-Language/phobos/pull/1017

The feature can be implemented but to!() was rejected, so we need to come up
with some alternative function names and put them somewhere other than
std.conv. 

Personally I don't see how people will be expected to find an obscure function
name like 'codePointIdx'. This isn't related to Unicode representation at all;
there should be no confusion with Unicode when it comes to representing 0-9,
since it's always the same regardless of encoding.


Dec 21 2012

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=5543


monarchdodra gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |monarchdodra gmail.com




  bear: Please see the comments here:
 https://github.com/D-Programming-Language/phobos/pull/1017
 
 The feature can be implemented but to!() was rejected, so we need to come up
 with some alternative function names and put them somewhere other than
 std.conv. 
 
 Personally I don't see how people will be expected to find an obscure function
 name like 'codePointIdx'. This isn't related unicode representation at all,
 there should be no confusion with Unicode when it comes to representing 0-9,
 it's always the same regardless of encoding.

Well, that's why we have std.ascii, no? For all char operations when we don't
care about unicode.

In all fairness, unicode defines "is numeric" (which we already have) and
"numeric value" (which we *should* have).


C# and Java both provide this. Java even implements two overloads, one taking
chars, and another taking int (dchar):
http://msdn.microsoft.com/en-us/library/system.char.getnumericvalue.aspx
http://docs.oracle.com/javase/1.4.2/docs/api/java/lang/Character.html

I'd say we should just add:
std.ascii.getNumericValue
std.uni.getNumericValue
(or plain numericValue)

I already wrote the ascii version (easy as pie), and support for the [Nd]
group, using a binary search, followed by an offset from the lower bound.

[Nl] and [No] require a straight up mapping of codepoint to value, but I'm
still writing the parser that extracts the data from the raw UCD
(http://www.unicode.org/Public/6.2.0/ucdxml/).

The file is too large for std.xml to handle, so it's back to C++ for me :/

The only questions I have are:
Return value: int or double?
Input is not numeric: -1 or exception?


Dec 21 2012

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=5543




07:08:12 PST ---

 Well, that's why we have std.ascii, no? For all char operations when we don't
 care about unicode.
 
 In all fairness, unicode defines "is numeric" (which we already have) and
 "numeric value" (which we *should* have).

Damn Unicode, why does it need to have 10 different ways to represent
something? :)

 The only questions I have is:
 Return value: int or double?

int, because int is implicitly convertible to double, not vice-versa. At least
for the ascii part. If Unicode has code points that represent floating-point
values, then I really don't understand what Unicode is about anymore.

 Input is not numeric: -1 or exception?

Hmm, although exceptions are preferred, I think for performance reasons we
might consider using -1.


Dec 21 2012

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=5543


Dmitry Olshansky <dmitry.olsh gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmitry.olsh gmail.com



07:17:53 PST ---
Java even implements
 one taking chars, and another taking int (dchar)

That's because Java folks used to have only 16-bit chars. Now true code points
are passed in the form of 'int'.

 http://msdn.microsoft.com/en-us/library/system.char.getnumericvalue.aspx
 http://docs.oracle.com/javase/1.4.2/docs/api/java/lang/Character.html
 
 I'd say we should just add:
 std.ascii.getNumericValue
 std.uni.getNumericValue
 (or plain numericValue)
 

Agreed and the name should be numericValue.

 I already wrote the ascii version (easy as pie), and support for the [Nd]
 group, using a binary search, followed by an offset from the lower bound.
 
 [Nl] and [Po] require a straight up mapping of codepoint to value, but I'm
 still writing the parser that extract the data for the raw UCD
 (http://www.unicode.org/Public/6.2.0/ucdxml/).
 

I'm wrapping up a revamp of std.uni that makes it a piece of cake to create
character sets. And maps are converted to multi-staged tables that are faster
than binary search on a large set. I'd suggest waiting a bit on it (so as not
to duplicate work) and introducing only the std.ascii version as the most useful.

The ongoing polishing, fixing and testing against ICU is going on here:
https://github.com/blackwhale/gsoc-bench-2012

 The file is too large for std.xml to handle, so it's back to C++ for me :/
 

http://www.unicode.org/Public/UNIDATA/UnicodeData.txt

Same thing but no useless XML trash. Description of fields is somewhere in the
middle of this document 
http://www.unicode.org/reports/tr44/

 The only questions I have is:
 Return value: int or double?

Should be rational to accurately represent things like the "1/5" character ;)
I do suspect some simple custom type could do (2 shorts packed in one struct
etc.).
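
As a rough illustration of the "two shorts packed in one struct" idea, here is a minimal sketch. The type name Rational and its members are hypothetical and chosen only for this example; a short tops out at 32767, so this layout cannot hold very large values:

```d
/// Hypothetical type sketched from the "2 shorts in one struct" suggestion.
struct Rational
{
    short num;     // numerator, carries the sign
    short den = 1; // denominator, defaults to 1 for whole numbers

    /// Converts to floating point for callers that do not need exactness.
    double toDouble() const @safe pure nothrow
    {
        return cast(double) num / den;
    }
}

unittest
{
    auto oneFifth = Rational(1, 5); // e.g. U+2155 VULGAR FRACTION ONE FIFTH
    assert(oneFifth.num == 1 && oneFifth.den == 5);

    // Whole numbers only need the numerator.
    assert(Rational(7).toDouble() == 7.0);
    assert(Rational(-1, 2).toDouble() == -0.5);
}
```

The struct is 4 bytes, the same size as an int, which is presumably the appeal of packing.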

 Input is not numeric: -1 or exception?

-1 is fine I think, as this is rather low level (per character) and it's not at
all convenient to throw (and then catch).


Dec 21 2012

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=5543




07:26:08 PST ---
Ok I think there are two enhancements here, one for the simple ascii int->char,
char->int, and the other more complicated Unicode implementation which
monarch/dmitry know more about.

I think we should split up the Unicode enhancement into a new bugzilla entry
since the ASCII one can be implemented right now so this issue can be closed
soon.


Dec 21 2012

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=5543


hsteoh quickfur.ath.cx changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hsteoh quickfur.ath.cx



It would be nice to have a separate issue filed for tracking Unicode support
progress. It can maybe include things like issue 9173 too.


Dec 21 2012

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=5543




07:32:31 PST ---

 It would be nice to have a separate issue filed for tracking Unicode support
 progress. It can maybe include things like issue 9173 too.

Reporters could add "Unicode" into the Keywords box for these types of issues
so we can filter them out.


Dec 21 2012

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=5543




 Ok I think there are two enhancements here, one for the simple ascii int->char,
 char->int, and the other more complicated Unicode implementation which
 monarch/dmitry know more about.
 
 I think we should split up the Unicode enhancement into a new bugzilla entry
 since the ASCII one can be implemented right now so this issue can be closed
 soon.

I'm a bit too busy to do the actual pull, but I wrote code, doc and test for
this already.

//----
/++
    If $(D c) is an ASCII digit, returns the
    corresponding numeric value. Returns -1 otherwise.
  +/
int numericValue(dchar c) @safe pure nothrow
{
    return ('0' <= c && c <= '9') ? (c - '0') : -1;
}
unittest
{
    int counter = 0;
    foreach (char c; 0 .. 80)
    {
        if (isDigit(c))
            assert(numericValue(c) == counter++);
        else
            assert(numericValue(c) == -1);
    }
}
//----

Not much, but there is never any reason to do the same work twice...


Dec 21 2012

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=5543





 
 I'm wrapping up a revamp of std.uni that makes it piece of cake to create
 character sets. And maps are converted to multi-staged tables that are faster
 the binary search on a large set. I'd suggest to wait a bit on it (so as to not
 duplicate work) and introduce only std.ascii version as the most useful.
 
 The ongoing polishing, fixing and testing against ICU is going on here:
 https://github.com/blackwhale/gsoc-bench-2012

OK. The thing I was having trouble with, though, is that the existing binary
search returns a bool, whereas I need the actual entry, so I can do "value -
entry[0]", eg:

//----
    static immutable dchar[2][] table1 = [
    [ 0x0030,  0x0039], //
    [ 0x0660,  0x0669], //ARABIC-INDIC
    [ 0x06F0,  0x06F9], //EXTENDED ARABIC-INDIC

...
//---
That's because all the entries in [Nd] are consecutive numerals starting at 0.
I can also cram a select couple of entries from [Nl] and [No] that also use
this scheme.

So if I have the unicode 0x0665 (The ARABIC-INDIC numeral '5'), I'd want to
find [ 0x0660,  0x0669], and then "return 0x0665 - 0x0660".

Well, I don't need the entire pair, but at least the lhs of the pair.
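
The lookup described here can be sketched as follows. This is only an illustration under stated assumptions: the table holds just the three ranges quoted in this message, and the names digitRanges and decimalDigitValue are hypothetical, not anything from std.uni:

```d
// Illustrative excerpt only: [first, last] code point of each digit run,
// sorted by code point. A real table would cover every [Nd] range.
immutable dchar[2][] digitRanges = [
    [0x0030, 0x0039], // ASCII digits
    [0x0660, 0x0669], // ARABIC-INDIC
    [0x06F0, 0x06F9], // EXTENDED ARABIC-INDIC
];

/// Hypothetical helper: binary search for the range containing c,
/// then return the offset from the lower bound. Returns -1 if not found.
int decimalDigitValue(dchar c) @safe pure nothrow
{
    size_t lo = 0, hi = digitRanges.length;
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;
        if (c < digitRanges[mid][0])
            hi = mid;          // c lies before this range
        else if (c > digitRanges[mid][1])
            lo = mid + 1;      // c lies after this range
        else
            return cast(int)(c - digitRanges[mid][0]); // offset from the lhs
    }
    return -1;
}

unittest
{
    assert(decimalDigitValue(0x0665) == 5); // 0x0665 - 0x0660
    assert(decimalDigitValue('7') == 7);
    assert(decimalDigitValue('x') == -1);
}
```

Only the lower bound is needed to compute the value; the upper bound is used just to decide which way the search goes.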

If you could keep that in mind during your re-write. Or not. Just throwing it
out there.

For all other entries in [Nl] and [No], I'd have:
    static immutable dchar[2][] table2 = [
    [ 0x216D,  100], //ROMAN NUMERAL ONE HUNDRED

So that's just a basic dictionary. But I don't think you can statically
allocate an AA. So yeah, just throwing that in your direction too.
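
One way to get dictionary behaviour without an AA is a sorted array of pairs searched with a binary search. The sketch below assumes that approach; the names NumericEntry, sparseValues and sparseNumericValue are hypothetical, and the table is a tiny excerpt rather than real generated data:

```d
/// Hypothetical entry type for characters whose value is not a simple offset.
struct NumericEntry
{
    dchar codePoint;
    int value;
}

// Illustrative excerpt, sorted by code point; not the full Unicode data.
immutable NumericEntry[] sparseValues = [
    NumericEntry(0x216C, 50),   // ROMAN NUMERAL FIFTY
    NumericEntry(0x216D, 100),  // ROMAN NUMERAL ONE HUNDRED
    NumericEntry(0x216E, 500),  // ROMAN NUMERAL FIVE HUNDRED
    NumericEntry(0x216F, 1000), // ROMAN NUMERAL ONE THOUSAND
];

/// Lower-bound binary search on the code point; returns -1 when absent.
int sparseNumericValue(dchar c) @safe pure nothrow
{
    size_t lo = 0, hi = sparseValues.length;
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;
        if (sparseValues[mid].codePoint < c)
            lo = mid + 1;
        else
            hi = mid;
    }
    // lo is now the first entry whose code point is >= c, if any.
    if (lo < sparseValues.length && sparseValues[lo].codePoint == c)
        return sparseValues[lo].value;
    return -1;
}

unittest
{
    assert(sparseNumericValue(0x216D) == 100);
    assert(sparseNumericValue('a') == -1);
}
```

The table lives in static immutable data, so nothing is allocated at run time.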

 The file is too large for std.xml to handle, so it's back to C++ for me :/
 

 http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
 
 Same thing but no useless XML trash. Description of fields is somewhere in the
 middle of this document 
 http://www.unicode.org/reports/tr44/

Nice, TY.

 The only questions I have is:
 Return value: int or double?

 
 Should be rational to acurately represent things like "1/5" character ;)
 I do suspect some simple custom type could do (2 shorts packed in one struct
 etc.).
 
 Input is not numeric: -1 or exception?

 
 -1 is fine I think as this rather low level (per character) and it's not at all
 convenient to throw (and then catch).

The only issue I have with returning -1 is that it is a magic value. The fact
that no Unicode character has the value -1 is pure coincidence, and not by design. In
particular, any attempt to write "if (numericValue(c) < 0) fail" would also be
wrong because:
http://unicode.org/cldr/utility/character.jsp?a=0F33
The TIBETAN DIGIT HALF ZERO returns -0.5

Do we *really* want to standardize the syntax of "if (numericValue(c) < -0.7)"
?
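
One possible way around the magic value, sketched here purely as an assumption and not something proposed in this thread, is to make "no value" explicit with std.typecons.Nullable. The function name numericValueOrNull is hypothetical and handles only ASCII digits plus the Tibetan character above:

```d
import std.typecons : Nullable;

/// Hypothetical sketch: only ASCII digits and U+0F33 are handled here.
Nullable!double numericValueOrNull(dchar c)
{
    if ('0' <= c && c <= '9')
        return Nullable!double(cast(double)(c - '0'));
    if (c == 0x0F33) // TIBETAN DIGIT HALF ZERO
        return Nullable!double(-0.5);
    return Nullable!double.init; // the null state: not numeric
}

unittest
{
    // A negative value is a legitimate result, not a failure...
    auto half = numericValueOrNull(0x0F33);
    assert(!half.isNull && half.get == -0.5);

    // ...while a non-numeric character is reported separately.
    assert(numericValueOrNull('x').isNull);
}
```

With this shape, callers test isNull instead of comparing against a threshold such as -0.7.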

...

Damn you unicode!


Dec 21 2012

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=5543




08:00:56 PST ---


 
 I'm wrapping up a revamp of std.uni that makes it piece of cake to create
 character sets. And maps are converted to multi-staged tables that are faster
 the binary search on a large set. I'd suggest to wait a bit on it (so as to not
 duplicate work) and introduce only std.ascii version as the most useful.
 
 The ongoing polishing, fixing and testing against ICU is going on here:
 https://github.com/blackwhale/gsoc-bench-2012

 
 OK: The thing I was having trouble though is that existing binary search
 returns a bool, whereas I need the actual entry, so I can do "value -
 entry[0]", eg:
 
 //----
     static immutable dchar[2][] table1 = [
     [ 0x0030,  0x0039], //
     [ 0x0660,  0x0669], //ARABIC-INDIC
     [ 0x06F0,  0x06F9], //EXTENDED ARABIC-INDIC
 
 ...
 //---
 That's because all the entries in [Nd] are consecutive numerals starting at 0.
 I can also cram a select couple of entries from [Nl] and [Po] that also use
 this scheme.
 

Sometimes I was able to abuse the natural format of the data and sometimes I
failed. But what proved to be quite good is varying the sizes of the
multi-staged table to match the "periods" of the data. In the end, if the data
has a lot of common "rows", a multi-staged table of a certain size per stage is
bound to hit a sweet spot.

 So if I have the unicode 0x0665 (The ARABIC-INDIC numeral '5'), I'd want to
 find [ 0x0660,  0x0669], and then "return 0x0665 - 0x0660".
 
 Well, I don't need the entire pair, but at least the lhs of the pair.
 
 If you could keep that in mind during your re-write. Or not. Just throwing it
 out there.
 
 For all other entries in [Nl] and [Po], I'd have:
     static immutable dchar[2][] table1 = [
     [ 0x216D,  100], //ROMAN NUMERAL ONE HUNDRED
 
 So that's just basic dictionary. But I don't think you can statically allocate
 an AA. So yeah, just throwing that your direction too.
 

Well, AA is a fat pig w.r.t RAM usage. But thanks anyway.

 The file is too large for std.xml to handle, so it's back to C++ for me :/
 

 http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
 
 Same thing but no useless XML trash. Description of fields is somewhere in the
 middle of this document 
 http://www.unicode.org/reports/tr44/

 
 Nice, TY.
 
 The only questions I have is:
 Return value: int or double?

 
 Should be rational to acurately represent things like "1/5" character ;)
 I do suspect some simple custom type could do (2 shorts packed in one struct
 etc.).
 
 Input is not numeric: -1 or exception?

 
 -1 is fine I think as this rather low level (per character) and it's not at all
 convenient to throw (and then catch).

 
 The only issue I have with returning -1 is that it is a magic value. The fact
 that there is no unicode for -1 is pure coincidence, and not by design. In
 particular, any attempt to write "if (numericValue(c) < 0) fail" would also be
 wrong because:
 http://unicode.org/cldr/utility/character.jsp?a=0F33
 The TIBETAN DIGIT HALF ZERO returns -0.5
 
 Do we *really* want to standardize the syntax of "if (numericValue(c) < -0.7)"
 ?
 
 ...
 
 Damn you unicode!

Aye, and given there are things like "1e12" I don't think packing it would work
any better... some kind of custom type is required.


Dec 21 2012

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=5543




08:04:19 PST ---

 int numericValue(dchar c) @safe pure nothrow

What about int->dchar?

We could call it toNumericChar or something, but it would probably have to
throw on invalid input? Or can we also return -1? E.g.

char toNumericChar(int i) @safe pure nothrow
{
    return cast(char)((0 <= i && i <= 9) ? (i + '0') : -1);
}


Dec 21 2012

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=5543






 int numericValue(dchar c) @safe pure nothrow

 
 What about int->dchar?
 
 We could call it toNumericChar or something, but it would probably have to
 throw on invalid input? Or can we also return -1? E.g.
 
 char toNumericChar(int i) @safe pure nothrow
 {
     return cast(char)((0 <= i && i <= 9) ? (i + '0') : -1);
 }

-1 is char.init, so seems good to me. Although I'd go and write it as
"char.init" explicitly in the code actually, so as to limit any possible
confusion.
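
A sketch of that variant, with the sentinel spelled out as char.init instead of relying on cast(char)(-1) happening to equal it:

```d
/// int -> char direction; returns char.init (0xFF) for anything outside 0 .. 9.
char toNumericChar(int i) @safe pure nothrow
{
    return (0 <= i && i <= 9) ? cast(char)('0' + i) : char.init;
}

unittest
{
    // cast(char)(-1) is 0xFF, which is exactly char.init.
    static assert(char.init == 0xFF);
    assert(cast(char)(-1) == char.init);

    assert(toNumericChar(5) == '5');
    assert(toNumericChar(10) == char.init);
}
```

Since 0xFF is never a valid UTF-8 code unit, the sentinel cannot collide with a real digit character.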


Dec 21 2012

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=5543





 
 Aye, and given there are things like "1e12" I don't think packing it would work
 any better... some kind of custom type is required.

Really? According to:

http://unicode.org/cldr/utility/properties.jsp?a=Numeric_Value#Numeric_Value

They only go from
-0.5 // TIBETAN DIGIT HALF ZERO
to
100_000 // ROMAN NUMERAL ONE HUNDRED THOUSAND

So I figured we were in the number range where there is a perfect "int
<=> double" correlation. If this is not the case...


Dec 21 2012

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=5543




Having functions in std.ascii (and elsewhere) seems acceptable. But I think the
names of such functions shouldn't be too long.


to!int raises exceptions. Returning -1 in case of errors can cause
problems. One common use case for the char->int conversion:

auto s = "123x456";
auto digits = s.map!numericValue().array();

Now I have to scan digits again looking for any -1.


Dec 21 2012

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=5543




10:10:37 PST ---

 Having functions in std.ascii (and elsewhere) seems acceptable. But I think the
 name of such functions shouldn't be too much long.
 
 
 to!int raises exceptions. Returning -1 in case of errors seems able to cause
 some problems. One common use case for the char->int conversion:
 
 auto s = "123x456";
 auto digits = s.map!numericValue().array();
 
 Now I have to scan digits again looking for any -1.

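*But* you can wrap it inside a function which throws on -1 (sketch; assumes
std.algorithm.map, std.array.array and std.conv.ConvException are imported, and
a compiler recent enough to accept throw as an expression):

auto s = "123x456";
auto thr = (int a) => a == -1 ? throw new ConvException("not a digit") : a;
auto digits = s.map!numericValue.map!thr.array();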

Whereas if it threw to begin with you're forced to catch exceptions.
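
A fuller sketch of the wrapping idea, written with an ordinary function so it does not depend on any particular compiler version. numericValue is the ASCII-only function proposed earlier in this thread; checkedNumericValue is a hypothetical name for the wrapper:

```d
import std.algorithm : map;
import std.array : array;
import std.conv : ConvException;

/// ASCII-only lookup as proposed in this thread: -1 for non-digits.
int numericValue(dchar c) @safe pure nothrow
{
    return ('0' <= c && c <= '9') ? cast(int)(c - '0') : -1;
}

/// Hypothetical wrapper that turns the -1 sentinel into an exception.
int checkedNumericValue(dchar c)
{
    immutable v = numericValue(c);
    if (v == -1)
        throw new ConvException("not an ASCII digit");
    return v;
}

unittest
{
    // The nothrow version leaves a -1 in the result for the caller to find.
    assert("123x456".map!numericValue.array == [1, 2, 3, -1, 4, 5, 6]);

    // The wrapper stops at the first bad character instead.
    bool threw = false;
    try "123x456".map!checkedNumericValue.array;
    catch (ConvException) threw = true;
    assert(threw);
}
```

The non-throwing function stays usable in nothrow code, and the throwing behaviour is opt-in.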


Dec 21 2012

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=5543




10:20:15 PST ---


 
 Aye, and given there are things like "1e12" I don't think packing it would work
 any better... some kind of custom type is required.

 
 Really? According to:
 
 http://unicode.org/cldr/utility/properties.jsp?a=Numeric_Value#Numeric_Value
 
 They only go from
 -0.5 // TIBETAN DIGIT HALF ZERO
 to
 100_000 // ROMAN NUMERAL ONE HUNDRED THOUSAND
 
 So I figured though we were in the number plane where there is a perfect "int
 <=> double" correlation. If this is not the case...

You missed the nice and cool 1.0e12 !

http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%3AnumericValue%3D1.0E12%3A%5D&g=


Dec 21 2012

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=5543






 Whereas if it threw to begin with you're forced to catch exceptions.

There is no perfect solution. Exceptions are safer than error codes because if
you forget to test for a negative result, your program stops. On the other hand
exceptions are less efficient, less handy to use in nothrow functions, and
often require some try-catch wrapping.

In this enhancement request I was originally asking for an overload of to!(),
this means a solution that throws exceptions when the input is wrong.

Efficiency is not a significant problem for me here because where I need to
convert char digits to numerical digits with maximum efficiency I use a '0'
subtraction (or a vectorized version of it). So with this overload of to!() I
was looking for safety.


Dec 21 2012

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=5543







 
 Aye, and given there are things like "1e12" I don't think packing it would work
 any better... some kind of custom type is required.

 
 Really? According to:
 
 http://unicode.org/cldr/utility/properties.jsp?a=Numeric_Value#Numeric_Value
 
 They only go from
 -0.5 // TIBETAN DIGIT HALF ZERO
 to
 100_000 // ROMAN NUMERAL ONE HUNDRED THOUSAND
 
 So I figured though we were in the number plane where there is a perfect "int
 <=> double" correlation. If this is not the case...

 
 You missed the nice and cool 1.0e12 !
 
 http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%3AnumericValue%3D1.0E12%3A%5D&g=

Well, that still fits in both a long, and in a double with no loss, so we're
still good. Crisis averted.
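
A quick check of that claim, as a sketch: a double represents every integer up to 2^53 exactly, and 1.0e12 is well below that bound, although it is far too large for an int. The helper name roundTripsExactly is hypothetical:

```d
/// Hypothetical helper: true if n survives a long -> double -> long round trip.
bool roundTripsExactly(long n) @safe pure nothrow
{
    return cast(long) cast(double) n == n;
}

unittest
{
    immutable long trillion = 1_000_000_000_000L; // 1.0e12

    assert(roundTripsExactly(trillion));
    assert(cast(long) 1.0e12 == trillion);

    assert(trillion < (1L << 53)); // inside double's exact-integer range
    assert(trillion > int.max);    // but it would not fit in an int
}
```

So a double return type covers the value without loss, while an int return type would not.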


Dec 21 2012

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=5543






 
 Whereas if it threw to begin with you're forced to catch exceptions.

 
 There is no perfect solution. Exceptions are safer than error codes because if
 you forget to test for a negative result, your program stops. On the other hand
 exceptions are less efficient, less handy to use in nothrow functions, and
 often require some try-catch wrapping.
 
 In this enhancement request I was originally asking for an overload of to!(),
 this means a solution that throws exceptions when the input is wrong.
 
 Efficiency is not a significant problem for me here because where I need to
 convert char digits to numerical digits with max efficientcy I use a '0'
 subtraction (or a vectorized version of it). So with this overload of to!() I
 was looking for safety.

I think a good solution is to accept having different semantics:

std.ascii.numericValue:
int numericValue(dchar c) @safe nothrow pure; // returns -1 on failure

std.uni.numericValue:
double numericValue(dchar c) @safe pure; // throws on failure

If you are doing anything with unicode, the exception's overhead will be mostly
moot compared to the cost of the lookup itself (I think). When operating with
ASCII, then it's a different ballgame (IMO).

That's my opinion anyways.


Dec 21 2012
