digitalmars.D - unFormat marginally complete

Sean Kelly <sean f4.ca> writes:
http://home.f4.ca/sean/d/unformat.d

The D compiler is currently a bit weird with templates and stdarg, so to 
use unformat.d in 0.97 you have to compile in std.format.d as well.  If 
anyone feels inclined to play with it, please let me know if stuff is 
broken, if you'd like the exceptions to match doFormat, etc.


Prototypes:

int unFormat( bit delegate( out dchar ) getc,
               bit delegate( dchar ) ungetc,
               TypeInfo[] arguments,
               void* argptr );
int sreadf( ... ); // first va_arg is string, second is format
int freadf( FILE* buf, ... ); // first va_arg is format
int readf( ... ); // first va_arg is format (console input)
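
To make the getc/ungetc contract in the unFormat prototype concrete, here is a minimal sketch of a string-backed input source. It is written against present-day D, so bool stands in for the bit type above, and StringSource with its members is a hypothetical name that is not taken from unformat.d.

```d
// Sketch only: a string-backed source shaped like unFormat's getc/ungetc
// delegates. StringSource and its members are hypothetical names, and
// bool stands in for the bit type used in the 2004 prototype.
struct StringSource
{
    dstring data;   // input, already decoded to UTF-32
    size_t pos;     // index of the next character to hand out

    // Stores the next character in c; returns false at end of input.
    bool get(out dchar c)
    {
        if (pos >= data.length)
            return false;
        c = data[pos++];
        return true;
    }

    // Steps back one position so the same character is read again.
    // The argument is ignored because the data is still in the buffer.
    bool unget(dchar c)
    {
        if (pos == 0)
            return false;
        --pos;
        return true;
    }
}

void main()
{
    auto src = StringSource("42 x"d);
    bool delegate(out dchar) getc = &src.get;
    bool delegate(dchar) ungetc = &src.unget;

    dchar c;
    assert(getc(c) && c == '4');
    assert(ungetc(c));            // push the '4' back
    assert(getc(c) && c == '4');  // and read it again
}
```

A parser needs the unget half because it only learns that a number has ended by reading one character too many.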


Ways in which unFormat differs from vscanf (and possibly doFormat):

- The format string can be either UTF-8, UTF-16, or UTF-32.
- If there is a mismatch between the arguments and the format 
specification, the function will return and will not evaluate the rest 
of the format string.
- unFormat will return prematurely on an input failure (if getc returns 
false), an argument mismatch, or a UTF conversion error.  UtfError 
exceptions will not be passed out of the function.


For reference, the conversion specifiers are:

d, u: An optionally signed decimal integer.
i: An optionally signed integer.  Base can be decimal, hex, or octal and
    will be detected automatically.  If the input is preceded by 0x or 0X
    then the number will be interpreted as hex.  If the input is preceded
    only by 0 then the number will be interpreted as octal.  Any other
    value will be interpreted as decimal.
o: An optionally signed octal integer.
x, X: An optionally signed hex integer.
a, e, f, g
A, E, F, G: An optionally signed floating point number, infinity,
             or NaN.
   Examples:   1
               -5.6
               1.2e5
               0x3p-2
               0X1234
               NAN
               INF
               infinity
c: A single UTF-32 character, or sequence of characters if the width
    modifier is present.
s: A sequence of non-whitespace characters.
[: Defines a scanset.  Contents can be single characters or a range
    indicated by a hyphen.
    Examples:   [a-z]    indicates the set of characters whose values lie
                         between a and z, inclusive.
                [abc123] indicates the characters a, b, c, 1, 2, and 3.
p: A pointer in hex format without the leading 0x.
n: Returns the number of UTF-32 characters read from the input stream.
%: Matches a single % character.
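
The base detection described for the i specifier can be sketched as follows. This is an illustration of the rule as stated in the list, not the code in unformat.d; parseAuto is a hypothetical name, and overflow handling is left out.

```d
// Sketch only: automatic base detection as described for the i specifier.
// parseAuto is a hypothetical helper; overflow is not checked.
long parseAuto(const(char)[] s)
{
    size_t i = 0;
    bool negative = false;

    // Optional sign.
    if (i < s.length && (s[i] == '+' || s[i] == '-'))
    {
        negative = s[i] == '-';
        ++i;
    }

    // 0x or 0X selects hex, a lone leading 0 selects octal,
    // anything else is decimal.
    int radix = 10;
    if (i + 1 < s.length && s[i] == '0' &&
        (s[i + 1] == 'x' || s[i + 1] == 'X'))
    {
        radix = 16;
        i += 2;
    }
    else if (i < s.length && s[i] == '0')
    {
        radix = 8;
        ++i;
    }

    // Accumulate digits until one is not valid in the chosen base.
    long value = 0;
    for (; i < s.length; ++i)
    {
        char c = s[i];
        int digit;
        if (c >= '0' && c <= '9')
            digit = c - '0';
        else if (c >= 'a' && c <= 'f')
            digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F')
            digit = c - 'A' + 10;
        else
            break;
        if (digit >= radix)
            break;
        value = value * radix + digit;
    }
    return negative ? -value : value;
}

void main()
{
    assert(parseAuto("0x1F") == 31);   // hex
    assert(parseAuto("017") == 15);    // octal
    assert(parseAuto("42") == 42);     // decimal
    assert(parseAuto("-0X10") == -16); // signed hex
}
```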
Jul 29 2004
Sean Kelly <sean f4.ca> writes:
Sean Kelly wrote:
 http://home.f4.ca/sean/d/unformat.d

I just realized I'd misread a part of the scanf spec. I've fixed the code and re-uploaded it with another unit test.

Sean
Aug 04 2004
pragma <EricAnderton at yahoo dot com> <pragma_member pathlink.com> writes:
In article <ces7ck$s6f$1 digitaldaemon.com>, Sean Kelly says...
Sean Kelly wrote:
 http://home.f4.ca/sean/d/unformat.d

I just realized I'd misread a part of the scanf spec. I've fixed the code and re-uploaded it with another unit test.

Looks pretty useful. I like it. I haven't had a chance to run with it myself, so I'll have to ask: do you have any provisions for reading or handling whitespace?

One critique though: why check all your exception instances (Underflow, BadFmt, etc) for each call of unFormat? You can set all these up ahead of time in a static block outside your function, without breaking encapsulation too badly.

# private class Underflow: Exception{ this(){ super("Underflow"); }}
# private static Underflow    underflow;
# static this(){
#     underflow = new Underflow();
# }

That way you can prevent redundant allocations (which you've already done) plus eliminate all those extra "if" statements. :)

- Pragma
Aug 05 2004
Sean Kelly <sean f4.ca> writes:
In article <cetftt$1nd2$1 digitaldaemon.com>, pragma <EricAnderton at yahoo dot
com> says...
In article <ces7ck$s6f$1 digitaldaemon.com>, Sean Kelly says...
Sean Kelly wrote:
 http://home.f4.ca/sean/d/unformat.d

I just realized I'd misread a part of the scanf spec. I've fixed the code and re-uploaded it with another unit test.

Looks pretty useful. I like it. I haven't had a chance to run with it myself, so I'll have to ask: do you have any provisions for reading or handling whitespace?

Everything is done internally in terms of dchars, so hopefully the functions will be able to correctly recognize all whitespace chars. I know there may also be some locale dependent whitespace sequences (Jill?) but as D doesn't have any concept of locales yet, that will have to wait.
One critique though: why check all your exception instances (Underflow, BadFmt,
etc) for each call of unFormat?  You can set all these up ahead of time in a
static block outside your function, without breaking encapsulation too badly.

# private class Underflow: Exception{ this(){ super("Underflow"); }}
# private static Underflow    underflow;
# static this(){
#     underflow = new Underflow();
# }

That way you can prevent redundant allocations (which you've already done) plus
eliminate all those extra "if" statements. :)

Good point. I think I'm still in a C++ mindset as far as statics are concerned. I'll make this change today :)

Sean
Aug 05 2004
Arcane Jill <Arcane_member pathlink.com> writes:
In article <ceti16$1oa9$1 digitaldaemon.com>, Sean Kelly says...

Everything is done internally in terms of dchars, so hopefully the functions
will be able to correctly recognize all whitespace chars.  I know there may also
be some locale dependent whitespace sequences (Jill?)

Nope, whitespace is locale independent. You only have to import etc.unicode.unicode and call isWhitespace(dchar). But I'd suggest waiting until next week because I'm planning to finally get the linkable library + header files together this weekend, which will make things somewhat easier for you.
but as D doesn't have any
concept of locales yet, that will have to wait.

It will have soon, but as I said, it's not relevant to whitespace.

Arcane Jill
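
As a rough illustration of locale-independent whitespace handling on dchars, the sketch below counts leading whitespace in a UTF-32 string. It assumes present-day Phobos and uses std.uni.isWhite as a stand-in for the isWhitespace(dchar) function from etc.unicode.unicode mentioned above; countLeadingWhite is a hypothetical name.

```d
import std.uni : isWhite;

// Sketch only: std.uni.isWhite is swapped in for the isWhitespace(dchar)
// function named in the post. countLeadingWhite is a hypothetical helper.
size_t countLeadingWhite(const(dchar)[] input)
{
    size_t n = 0;
    while (n < input.length && isWhite(input[n]))
        ++n;
    return n;
}

void main()
{
    // Space, tab, no-break space (U+00A0) and em space (U+2003),
    // followed by a non-whitespace character.
    dstring s = " \t\u00A0\u2003x"d;
    assert(countLeadingWhite(s) == 4);
}
```

No locale is consulted anywhere: the classification depends only on the code point.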
Aug 05 2004
Sean Kelly <sean f4.ca> writes:
In article <cetftt$1nd2$1 digitaldaemon.com>, pragma <EricAnderton at yahoo dot
com> says...
In article <ces7ck$s6f$1 digitaldaemon.com>, Sean Kelly says...
Sean Kelly wrote:
 http://home.f4.ca/sean/d/unformat.d

I just realized I'd misread a part of the scanf spec. I've fixed the code and re-uploaded it with another unit test.

Looks pretty useful. I like it. I haven't had a chance to run with it myself, so I'll have to ask: do you have any provisions for reading or handling whitespace?

By the way, I like that doFormat doesn't require a format string at all. Since I was working off the scanf spec I didn't do anything about that with unFormat. I assume that doFormat can handle things like this:

    doFormat( &get, "hello world", 1, "%d", 2 );

and would print:

    hello world12

I suppose the equivalent bit for unFormat would be:

    char[] buf; int x, y; float f;
    unFormat( &get, &unget, &buf, &x, "%2d", &y, &f );

which would read a string, an integer, an int with width 2, and a float. The only thing I don't know offhand is if I can tell a char** from a char* using TypeInfo (for %p). In any case, would people like this syntax rather than having to specify a format string? I think I may start on it today just to see how it goes.

Sean
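
The format-less call above depends on telling argument kinds apart by TypeInfo alone. The sketch below, written against present-day D, shows one way that dispatch could look; classify and its labels are hypothetical, and it only inspects _arguments without reading any input. It also suggests an answer to the char** question: the two pointer types carry distinct TypeInfo.

```d
// Sketch only: classify D-style variadic arguments by their TypeInfo.
// classify and the labels it returns are hypothetical, not unformat.d code.
string[] classify(...)
{
    string[] kinds;
    foreach (ti; _arguments)
    {
        if (ti == typeid(string))
            kinds ~= "format";      // an embedded spec such as "%2d"
        else if (ti == typeid(char[]*))
            kinds ~= "string";      // target for a string
        else if (ti == typeid(int*))
            kinds ~= "int";
        else if (ti == typeid(float*))
            kinds ~= "float";
        else if (ti == typeid(char**))
            kinds ~= "pointer";     // target for a %p-style read
        else
            kinds ~= "unsupported";
    }
    return kinds;
}

void main()
{
    char[] buf;
    int x, y;
    float f;
    assert(classify(&buf, &x, "%2d", &y, &f)
           == ["string", "int", "format", "int", "float"]);

    // char* and char** are distinguishable by TypeInfo.
    assert(typeid(char*) != typeid(char**));
}
```

One design question this leaves open is how an embedded spec binds to its target; the sketch simply assumes a string argument applies to the next pointer argument.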
Aug 06 2004