digitalmars.D - Add duration parsing to core.time?

reply Justin Whear <justin economicmodeling.com> writes:
While working on a configuration file parser, I found myself trying to 
decide which units to use for various time variables (e.g. 
`expireInterval`) which is silly because we have an excellent Duration 
structure in core.time.  I was pleased to discover that Duration has a 
toString method which prints a nice, human-readable description.  
Unfortunately, there appears to be no corresponding parse method.  Turns 
out that it's surprisingly easy to write thanks to the existing 
functionality in std.conv: http://dpaste.dzfl.pl/1500b834

It appears that DPaste stumbles over the unicode 'μs' in the units enum, 
so here's a test invocation and output:

$ dmd -unittest test_duration.d && ./test_duration '12 hours, 30 minutes' 
'1w2d20m12h5m2s'
12 hours and 30 minutes
1 week, 2 days, 12 hours, 25 minutes, and 2 secs

I've made the implementation more flexible than simply parsing the very 
standard output of Duration.toString by adding more unit synonyms and 
making whitespace, commas, and 'and' optional.  All this really requires 
is a sequence of digits followed by a unit name, possibly repeated, which 
allows the very compact form used in '1w2d20m12h5m2s'.
All validation is performed by the two calls to std.conv.parse, so 
invalid strings should fail (e.g. 'four madeupunits').
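
For readers who cannot reach the paste, the approach described above might look roughly like the following. This is a minimal sketch and not the code from the paste: the `Unit` enum, its synonym list, `skipSeparators`, and `parseDuration` are all assumed names, and sub-second units are left out. It relies on `std.conv.parse` consuming its input by reference and, for enums, matching the longest member name at the front of the string.

```d
import core.time : Duration, dur;
import std.algorithm.searching : startsWith;
import std.conv : ConvException, parse;
import std.exception : assertThrown;
import std.string : stripLeft;

// Unit lengths in hnsecs (100 ns ticks), the resolution Duration uses.
private enum long perSecond = 10_000_000;
private enum long perMinute = 60 * perSecond;
private enum long perHour   = 60 * perMinute;
private enum long perDay    = 24 * perHour;
private enum long perWeek   = 7 * perDay;

// Hypothetical unit table: synonyms share a value, so parse!Unit
// maps any accepted spelling straight to a length in hnsecs.
enum Unit : long
{
    weeks = perWeek, week = perWeek, w = perWeek,
    days = perDay, day = perDay, d = perDay,
    hours = perHour, hour = perHour, h = perHour,
    minutes = perMinute, minute = perMinute, mins = perMinute, m = perMinute,
    seconds = perSecond, second = perSecond, secs = perSecond, sec = perSecond, s = perSecond,
}

// Drops optional whitespace, commas, and the word "and" between terms.
private void skipSeparators(ref string s)
{
    for (;;)
    {
        s = s.stripLeft;
        if (s.startsWith(","))
            s = s[1 .. $];
        else if (s.startsWith("and"))
            s = s["and".length .. $];
        else
            break;
    }
}

// Consumes one or more "<digits><unit>" terms from the front of s.
// Both parse calls throw ConvException on input they cannot match.
Duration parseDuration(ref string s)
{
    Duration total; // Duration.init is zero
    skipSeparators(s);
    do
    {
        immutable count = parse!long(s);
        s = s.stripLeft;
        immutable unit = parse!Unit(s);
        total += dur!"hnsecs"(count * unit);
        skipSeparators(s);
    } while (s.length);
    return total;
}

unittest
{
    string a = "12 hours, 30 minutes";
    assert(parseDuration(a) == dur!"hours"(12) + dur!"minutes"(30));

    string b = "1w2d20m12h5m2s";
    assert(parseDuration(b) == dur!"weeks"(1) + dur!"days"(2) + dur!"hours"(12)
                             + dur!"minutes"(25) + dur!"seconds"(2));

    string bad = "four madeupunits";
    assertThrown!ConvException(parseDuration(bad));
}
```

Compiling with `dmd -unittest -main` runs the checks. The sketch does not guard against overflow for very large counts.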

One possible improvement is to support written-out numbers such as 
"seven" and "forty-two", but I suspect this would entail a much more 
involved implementation.

Thoughts on including something like this in core.time?  My thought is that 
Duration could have a `this(string)` with a non-consuming version of this 
function for automatic to! support in addition to providing parse.

Justin
Aug 20 2013
next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, August 20, 2013 17:57:19 Justin Whear wrote:
 Thoughts on including something like this in core.time? My thought is that
 Duration could have a `this(string)` with a non-consuming version of this
 function for automatic to! support in addition to providing parse.

If such a function were added, it would be fromString on Duration, and it would accept the exact format that toString uses (and only that format). Anything more complicated would have to be part of the functionality for user-defined format strings, which I haven't finished yet. That'll probably end up in std.datetime.format at some point after I've finished splitting std.datetime.

- Jonathan M Davis
Aug 20 2013
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, August 20, 2013 15:35:20 Jonathan M Davis wrote:
 If such a function were added, it would be fromString on Duration, and it
 would accept the exact format that toString uses (and only that format).
 Anything more complicated would have to be part of a functionality relating
 to user-defined format strings, which I haven't finished yet. That'll
 probably end up in std.datetime.format at some point after I've finished
 splitting std.datetime.

And actually, I really don't like the idea of adding a function for parsing the result of Duration's toString. Duration's toString was intended for human legibility, not for being written out and then read in again. std.datetime has several to*String functions with corresponding from*String functions, but they're all in standard formats, whereas Duration's toString is not. So, if any kind of from*String is going to be added to Duration, then a standard format needs to be used and a corresponding to*String function created. There are several standard formats for dates and times, so I assume that there's one for durations as well, but I'd have to look into it. Preferably something from ISO 8601 would be used if it has a standard string format for durations, since that's the main ISO standard for time-related stuff.

In general, I'm very much opposed to functions which try and parse arbitrary strings as they're incredibly error-prone and have to guess at what you mean. In pretty much any case where the string was emitted by a computer in the first place rather than a human, that's just plain sloppy, and ideally, a human would be required to put a string in a standard format when inputting it (or input the values separately rather than as a string) in order to avoid interpretation errors.

- Jonathan M Davis
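
ISO 8601 does define a duration notation of the form `PnDTnHnMnS` (with `PnW` for weeks). As a rough illustration of what a standard-format pair could look like, here is a minimal sketch. The function names `toISODurationString` and `fromISODurationString` are hypothetical and not part of core.time or std.datetime. The sketch assumes non-negative durations, drops any sub-second part, rejects year and month designators because they have no fixed length, and is more lenient than the standard about designator order.

```d
import core.time : Duration, dur;
import std.conv : parse;
import std.exception : enforce;
import std.format : format;

// Hypothetical helper: writes days and smaller units only.
string toISODurationString(Duration d)
{
    enforce(d >= Duration.zero, "negative durations are not handled in this sketch");
    immutable total = d.total!"seconds"; // truncates any sub-second part
    immutable days = total / 86_400;
    immutable hours = total % 86_400 / 3_600;
    immutable minutes = total % 3_600 / 60;
    immutable seconds = total % 60;
    return format("P%sDT%sH%sM%sS", days, hours, minutes, seconds);
}

// Hypothetical helper: accepts W and D before the 'T', and H, M, S after it.
Duration fromISODurationString(string s)
{
    enforce(s.length >= 3 && s[0] == 'P', "expected an ISO 8601 duration such as P1DT2H");
    enforce(s[$ - 1] != 'T', "'T' must be followed by a time component");
    s = s[1 .. $];

    Duration result;
    bool inTime = false;
    while (s.length)
    {
        if (s[0] == 'T')
        {
            enforce(!inTime, "duplicate 'T'");
            inTime = true;
            s = s[1 .. $];
            continue;
        }
        immutable n = parse!uint(s); // throws ConvException if no digits
        enforce(s.length, "number without a designator");
        immutable designator = s[0];
        s = s[1 .. $];
        switch (designator)
        {
            case 'W': enforce(!inTime, "W belongs before T"); result += dur!"weeks"(n); break;
            case 'D': enforce(!inTime, "D belongs before T"); result += dur!"days"(n); break;
            case 'H': enforce(inTime, "H belongs after T"); result += dur!"hours"(n); break;
            case 'M': enforce(inTime, "months are not supported"); result += dur!"minutes"(n); break;
            case 'S': enforce(inTime, "S belongs after T"); result += dur!"seconds"(n); break;
            default: throw new Exception("unsupported designator");
        }
    }
    return result;
}

unittest
{
    immutable d = dur!"hours"(12) + dur!"minutes"(30);
    assert(toISODurationString(d) == "P0DT12H30M0S");
    assert(fromISODurationString(toISODurationString(d)) == d);
    assert(fromISODurationString("P1W") == dur!"weeks"(1));
}
```

Because both directions agree on a single format, a string written by one function always reads back to the same whole-second duration with the other.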
Aug 20 2013
prev sibling parent "Brad Anderson" <eco gnuk.net> writes:
On Wednesday, 21 August 2013 at 06:46:49 UTC, Jonathan M Davis wrote:
 In general, I'm very much opposed to functions which try and parse
 arbitrary strings as they're incredibly error-prone and have to guess at
 what you mean. In pretty much any case where the string was emitted by a
 computer in the first place rather than a human, that's just plain sloppy,
 and ideally, a human would be required to put a string in a standard
 format when inputting it (or input the values separately rather than as a
 string) in order to avoid interpretation errors.

 - Jonathan M Davis

I agree completely and can speak from experience. For years at work we used wxWidgets' wxDateTime class and its ParseDateTime method, which accepts free-format strings. It was a source of never-ending problems for us until we finally stopped using it. The implementation was fine; it's just that dates are not amenable to unstructured reading. Date strings with locale information embedded in them might be workable, but they are basically nonexistent. Date strings are a lot like string encodings: they are unsafe to use without knowing a definitive format/encoding.
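
To illustrate the point about definitive formats with something already in Phobos: `Date.fromISOExtString` in std.datetime accepts only the ISO 8601 extended form (YYYY-MM-DD) and throws rather than guessing. The slash-separated input below is just an assumed example of an ambiguous date string.

```d
import std.datetime : Date, DateTimeException;
import std.exception : assertThrown;

unittest
{
    // ISO 8601 extended format has exactly one reading: year-month-day.
    assert(Date.fromISOExtString("2013-08-21") == Date(2013, 8, 21));

    // "08/09/2013" could mean 8 September or 9 August depending on locale;
    // the strict parser rejects it instead of picking one.
    assertThrown!DateTimeException(Date.fromISOExtString("08/09/2013"));
}
```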
Aug 21 2013