
digitalmars.D.bugs - [Issue 9821] New: Smarter conversion of strings to enums

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9821

           Summary: Smarter conversion of strings to enums
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: jared economicmodeling.com



13:01:24 PDT ---
Currently std.conv.to requires a string to be the *member name* of an enum in
order to convert it. However, a standard use case when sharing data is to
serialize enum variables using the underlying type of the enum, since different
programs should not be expected to use the same enum naming scheme internally.
The std.conv.to template currently cannot handle such a conversion (from string
version of underlying type to the enum type). This also requires a workaround
going the other direction (i.e., converting enum values to strings). In order
to serialize data in a portable manner, you shouldn't emit enum values as the
string representation of the symbols used in the source code.

This is a significant annoyance that surfaces in std.csv.csvReader, which
requires all data going into an enum to be serialized as the enum member name,
not the string representation of its underlying type.
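
One possible way around the csvReader limitation, sketched under the assumption that the enum column holds the underlying integer: read that column as int and convert afterwards with to!. The enum Color, the helper readColors, and the sample input are hypothetical names made up for this illustration.

```d
import std.conv : to;
import std.csv : csvReader;
import std.typecons : Tuple;

enum Color { Red = 1, Blue = 7 }

// Hypothetical helper: parse "name,value" rows where the second column
// holds the enum's underlying integer rather than its member name.
Color[] readColors(string text)
{
    Color[] result;
    // Read the enum column as int first...
    foreach (record; csvReader!(Tuple!(string, int))(text))
        result ~= to!Color(record[1]); // ...then convert; to! rejects non-member values
    return result;
}

void main()
{
    assert(readColors("apple,1\nsky,7") == [Color.Red, Color.Blue]);
}
```

This keeps the membership check, but the record type no longer mentions the enum, which is the special-casing the report complains about.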

Example and my current workaround:

-------------
import std.algorithm, std.conv, std.stdio, std.string, std.traits;

enum MyEnum {
    Foo = 1,
    Baz = 7
}

void main()
{
    writeln( to!MyEnum(7) );              // ok.
    writeln( to!MyEnum("Baz") );          // ok.
    try {
        writeln( to!MyEnum("7") );        // throws
    }
    catch(ConvException  e) {
        writeln( e.msg );
    }

    writeln( strToEnum!MyEnum("7") );     // ok.
    writeln( strToEnum!MyEnum("Baz") );   // ok.
}

/*
 * Current workaround for a smarter conversion.
 */
E strToEnum(E, S)(S str)
    if(is(E == enum) && isSomeString!S)
{
    if(countUntil([__traits(allMembers,E)], str) > -1)
        return to!E(str);
    else {
        auto underlyingValue = to!(OriginalType!E)(str);
        if(countUntil([EnumMembers!E], underlyingValue) > -1)
            return cast(E)(underlyingValue);
        else
            throw new ConvException(format(
                    "Value '%s' cannot be converted to enum %s",
                    underlyingValue, E.stringof));            
    }
}
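
For the other direction mentioned in the report (emitting an enum as its underlying value rather than its member name), a minimal sketch might look like the following. The helper name enumToValueString is hypothetical and is not part of Phobos.

```d
import std.conv : to;
import std.traits : OriginalType;

enum MyEnum { Foo = 1, Baz = 7 }

// Hypothetical helper: serialize an enum as the string form of its
// underlying value by casting to the base type before converting.
string enumToValueString(E)(E e)
    if (is(E == enum))
{
    return to!string(cast(OriginalType!E) e);
}

void main()
{
    assert(to!string(MyEnum.Baz) == "Baz");        // default: member name
    assert(enumToValueString(MyEnum.Baz) == "7");  // underlying value
}
```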

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 26 2013
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9821


Andrej Mitrovic <andrej.mitrovich gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrej.mitrovich gmail.com



14:01:11 PDT ---
It would have to become a new function, not std.conv.to, see
https://github.com/D-Programming-Language/phobos/pull/897

Mar 26 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9821




16:43:27 PDT ---
Perhaps I'm being naive, but why not modify the current string-to-enum parse()
overload so that it (1) first tries to convert using enum member names, as it
currently does, and *only if* that fails, then (2) tries to convert the string
to the enum base type?

The current functionality would remain the same, except that the conversion
would succeed instead of failing in those cases where a string => base type =>
enum conversion is possible. (Sorry if this isn't the formal way of submitting
a patch; it's more of an explanation of what I mean.)

2126,2129c2126,2148
< 
<     throw new ConvException(
<         Target.stringof ~ " does not have a member named '"
<         ~ to!string(s) ~ "'");
---
>     else
>     {
>         OriginalType!Target baseVal;
>         try {
>             baseVal = to!(OriginalType!Target)(s);
>         }
>         catch(ConvException e) {
>             throw new ConvException(
>                 "'" ~ to!string(s) ~ "' is not a member name of "
>                 ~ Target.stringof ~ " and is not convertible to "
>                 ~ (OriginalType!Target).stringof );
>         }
>         if(countUntil([EnumMembers!Target], baseVal) != -1)
>         {
>             return cast(Target)(baseVal);
>         }
>         else
>         {
>             throw new ConvException(
>                 "'" ~ to!string(s) ~ "' is not a member name or value of "
>                 ~ Target.stringof);
>         }
>     }

If this is not desirable, I would be okay with closing this issue and filing
one for std.csv.csvReader to work around it, since that's mainly where it
really causes problems (and possibly in other deserialization code like
std.json too). At my workplace we've hit the same issue with reading from
databases and have a special case to handle enums.

Mar 26 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9821


bearophile_hugs eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bearophile_hugs eml.cc




 Perhaps I'm being naive, but why not modify the current string-to-enum parse()
 overload so that it (1) first tries to convert using enum member names, as it
 currently does, and *only if* that fails, then (2) tries to convert the string
 to the enum base type?
This is a bad idea. It's much better to keep the semantics tidy, to avoid
troubles down the line.

Mar 26 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9821




17:36:17 PDT ---
 This is a bad idea. It's much better to keep the semantics tidy, to avoid
 troubles down the line.
Sure, I understand if there's reluctance to change the meaning of to() in this
case. It's just unfortunate that for enums the original decision makes it
harder to work with real-world data (I admit it's nice to read on-screen). But
there should be at least some alternative function for converting strings to
enums using the base type, and it should be a standard option for any
(de)serialization code in Phobos. Considering how frequently string
serialization comes up in the real world, it will be well worth it.

Mar 26 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9821






 Sure, I understand if there's reluctance to change the meaning of to() in this
 case.
What I meant to say is that this semantics is bad:
 first tries to convert using enum member names, as it
 currently does, and *only if* that fails, then (2) tries to convert the string
 to the enum base type?
You try a conversion, and if it fails, then you _stop_. Otherwise you are going into a swamp.
 But there should be at least some alternative function for converting strings
 to enums using the base type,
I think I have asked for such a function in another Bugzilla issue.

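
A separate by-value function of the kind discussed here could be sketched as follows: it tries one conversion path only and fails without falling back to member names. The name enumFromValueString is hypothetical and does not refer to an existing Phobos function.

```d
import std.conv : ConvException, to;
import std.traits : OriginalType;

enum MyEnum { Foo = 1, Baz = 7 }

// Hypothetical helper: convert the string to the base type, then to the enum.
// Both steps throw ConvException on failure; there is no by-name fallback.
E enumFromValueString(E)(string str)
    if (is(E == enum))
{
    return to!E(to!(OriginalType!E)(str));
}

void main()
{
    assert(enumFromValueString!MyEnum("7") == MyEnum.Baz);
}
```

Because member names are never consulted, the string "Baz" is rejected here, and a value such as "3" that matches no member is rejected as well.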
Mar 26 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9821


Jonathan M Davis <jmdavisProg gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jmdavisProg gmx.com



PDT ---
 But there should be at least some alternative function for converting strings
 to enums using the base type
Then just use to with the base type. And if you're dealing with generic code or
don't want to hard code what the base type is, then use
std.traits.OriginalType:

to!(OriginalType!MyEnum)(str);

Mar 26 2013
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9821




12:11:26 PDT ---

 Then just use to with the base type. And if you're dealing with generic code or
 don't want to hard code what the base type is, then use
 std.traits.OriginalType:
 
 to!(OriginalType!MyEnum)(str);
Right, that gets me to the base type. Eventually I want the enum type, so at
minimum I need to do:

    // Naive, unless I check that the value is a member
    MyEnum e = cast(MyEnum)(to!(OriginalType!MyEnum)(str));

    // Strict
    MyEnum e = to!MyEnum(to!(OriginalType!MyEnum)(str));

So, it's not hard, it's just always a special case. Some "expected" things
don't work:

    enum MyEnum { Foo = 1, Bar = 7 }
    MyEnum e2 = to!MyEnum("7");    // throws
    readf("%d", &e1);              // error: no matching unformatValue
    readf("%d", &cast(int)e1);     // ok, works like C, avoids proper checks
    writef("%d", e1);              // ok

Are my expectations just odd? Very possible :) I deal with a lot of data
munging and this little enum quirk ends up requiring special handling in every
bit of generic read/write code, unless I just ban enums entirely from all data
conversion code. C/C++ always treat them as ints and D always treats them as
member-name strings in std.conv -- you see the potential incompatibility. I
realize D has to deal with more complexity since more base types are allowed,
but it does break with tradition for integral base types.

[...] (like D), but Enum.Parse takes strings representing *either* member name
or value.

There could be a toImpl overload that's restricted to integral enums and takes
a required flag for "by name" or "by value". Also it'd be handy to have enum
overloads of unformatValue. Mainly what I don't like to see is the default
member-name conversion creeping into other components like csvReader.

So I guess what I'm looking for is either:

(1) "No, we've thought it over and enums, regardless of base type, should
nearly always be (de)serialized by member name -- doing it by value is a rare
use case so you should write your own wrappers for anything that uses
std.conv", or

(2) "Yes, parsing string/char to integral enums by value is common enough that
components in Phobos should offer that option where appropriate."

Thanks and I hope I've made my issue clearer.
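
The "required flag" idea can be sketched as a free-standing function rather than a toImpl overload. Everything named here (EnumParse, parseEnum) is hypothetical and only illustrates the shape such an API might take; it is restricted to enums with an integral base type, as suggested above.

```d
import std.conv : ConvException, to;
import std.traits : OriginalType, isIntegral;

enum MyEnum { Foo = 1, Bar = 7 }

// Hypothetical flag selecting how the input string is interpreted.
enum EnumParse { byName, byValue }

// Hypothetical function: the caller must state which interpretation is
// wanted, so there is no implicit fallback from one to the other.
E parseEnum(E)(string s, EnumParse mode)
    if (is(E == enum) && isIntegral!(OriginalType!E))
{
    final switch (mode)
    {
        case EnumParse.byName:
            return to!E(s);                       // existing member-name parse
        case EnumParse.byValue:
            return to!E(to!(OriginalType!E)(s));  // string -> base type -> enum
    }
}

void main()
{
    assert(parseEnum!MyEnum("Bar", EnumParse.byName) == MyEnum.Bar);
    assert(parseEnum!MyEnum("7", EnumParse.byValue) == MyEnum.Bar);
}
```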
Mar 27 2013