digitalmars.D - unFormat marginally complete

Sean Kelly (52/52) Jul 29 2004 http://home.f4.ca/sean/d/unformat.d

Sean Kelly (4/5) Aug 04 2004 I just realized I'd misread a part of the scanf spec. I've fixed the

pragma (15/19) Aug 05 2004 Looks pretty useful. I like it. I haven't had a chance to run with it ...

Sean Kelly (9/29) Aug 05 2004 Everything is done internally in terms of dchars, so hopefully the funct...

Arcane Jill (7/12) Aug 05 2004 Nope, whitespace is locale independent. You only have to import

Sean Kelly (19/29) Aug 06 2004 By the way. I like that doFormat doesn't require a format string at all...

Sean Kelly <sean f4.ca> writes:

http://home.f4.ca/sean/d/unformat.d

The D compiler is currently a bit weird with templates and stdarg, so to 
use unformat.d in 0.97 you have to compile in std.format.d as well.  If 
anyone feels inclined to play with it, please let me know if stuff is 
broken, if you'd like the exceptions to match doFormat, etc.


Prototypes:

int unFormat( bit delegate( out dchar ) getc,
               bit delegate( dchar ) ungetc,
               TypeInfo[] arguments,
               void* argptr );
int sreadf( ... ); // first va_arg is string, second is format
int freadf( FILE* buf, ... ); // first va_arg is format
int readf( ... ); // first va_arg is format (console input)
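
For anyone wondering what the getc/ungetc delegate pair might look like in practice, here is a minimal sketch over an in-memory UTF-8 string. It assumes a current D compiler and Phobos (so bool instead of the 2004 bit type), and StringSource is a hypothetical helper that is not part of unformat.d.

```d
import std.utf : decode;

/// Hypothetical helper (not from unformat.d): feeds a UTF-8 string to a
/// reader one code point at a time, with unlimited pushback.
struct StringSource
{
    string data;     // the whole input, UTF-8 encoded
    size_t pos;      // byte offset of the next undecoded code point
    dchar[] pushed;  // characters handed back through ungetc

    /// Yields the next character; returns false once the input is exhausted.
    bool getc(out dchar c)
    {
        if (pushed.length)
        {
            c = pushed[$ - 1];
            pushed = pushed[0 .. $ - 1];
            return true;
        }
        if (pos >= data.length)
            return false;
        c = decode(data, pos); // decodes one code point and advances pos
        return true;
    }

    /// Hands a character back so that the next getc returns it again.
    bool ungetc(dchar c)
    {
        pushed ~= c;
        return true;
    }
}

unittest
{
    auto src = StringSource("5\u00E9");
    bool delegate(out dchar) get = &src.getc;
    bool delegate(dchar) unget = &src.ungetc;

    dchar c;
    assert(get(c) && c == '5');
    assert(unget(c));                  // push the digit back...
    assert(get(c) && c == '5');        // ...and read it a second time
    assert(get(c) && c == '\u00E9');   // two UTF-8 bytes, one dchar
    assert(!get(c));                   // end of input
}
```

The two delegates have the shapes the unFormat prototype asks for, so a reader written against that prototype could presumably be pointed at a string, a file, or the console just by swapping the source.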


Ways in which unFormat differs from vscanf (and possibly doFormat):

- The format string can be either UTF-8, UTF-16, or UTF-32.
- If there is a mismatch between the arguments and the format 
specification, the function will return and will not evaluate the rest 
of the format string.
- unFormat will return prematurely on an input failure (if getc returns 
false), an argument mismatch, or a UTF conversion error.  UtfError 
exceptions will not be passed out of the function.
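
As a rough illustration of the first point, a format string of any UTF width can be reduced to the same sequence of UTF-32 code points before it is interpreted. This is only a sketch in current D; toCodePoints is a hypothetical name, and unformat.d may well do the conversion differently.

```d
/// Decodes a format string of any UTF width into UTF-32 code points.
/// S may be string (UTF-8), wstring (UTF-16), or dstring (UTF-32).
dchar[] toCodePoints(S)(S fmt)
{
    dchar[] result;
    foreach (dchar c; fmt) // foreach decodes char and wchar arrays on the fly
        result ~= c;
    return result;
}

unittest
{
    // The same format text in all three encodings, including one
    // character outside the Basic Multilingual Plane.
    auto utf8  = toCodePoints("%d \u20AC \U0001F600");
    auto utf16 = toCodePoints("%d \u20AC \U0001F600"w);
    auto utf32 = toCodePoints("%d \u20AC \U0001F600"d);

    assert(utf8 == utf16 && utf16 == utf32);
    assert(utf8.length == 6);
}
```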


For reference, the conversion specifiers are:

d, u: An optionally signed decimal integer.
i: An optionally signed integer.  Base can be decimal, hex, or octal and
    will be detected automatically.  If the input is preceded by 0x or 0X
    then the number will be interpreted as hex.  If the input is preceded
    only by 0 then the number will be interpreted as octal.  Any other
    value will be interpreted as decimal.
o: An optionally signed octal integer.
x, X: An optionally signed hex integer.
a, e, f, g
A, E, F, G: An optionally signed floating point number, infinity,
             or NaN.
   Examples:   1
               -5.6
               1.2e5
               0x3p-2
               0X1234
               NAN
               INF
               infinity
c: A single UTF-32 character, or sequence of characters if the width
    modifier is present.
s: A sequence of non-whitespace characters.
[: Defines a scanset.  Contents can be single characters or a range
    indicated by a hyphen.
    Examples:   [a-z]    indicates the set of characters whose values lie
                         between a and z, inclusive.
                [abc123] indicates the characters a, b, c, 1, 2, and 3.
p: A pointer in hex format without the leading 0x.
n: Returns the number of UTF-32 characters read from the input stream.
%: Matches a single % character.
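
Two of the rules above are easier to follow in code: the base detection described for i, and the membership test implied by a scanset. The sketch below is written for current D and Phobos; parseAutoBase and inScanset are hypothetical names, and neither is taken from unformat.d.

```d
import std.conv : parse;

/// Parses an optionally signed integer and picks the base from its prefix,
/// following the description of i: 0x or 0X means hex, a leading 0 followed
/// by more digits means octal, anything else is decimal.
long parseAutoBase(string s)
{
    bool negative = false;
    if (s.length && (s[0] == '+' || s[0] == '-'))
    {
        negative = s[0] == '-';
        s = s[1 .. $];
    }

    uint radix = 10;
    if (s.length > 2 && s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
    {
        radix = 16;
        s = s[2 .. $];
    }
    else if (s.length > 1 && s[0] == '0')
    {
        radix = 8;
        s = s[1 .. $];
    }

    // std.conv.parse throws ConvException if no digits follow.
    long value = parse!long(s, radix);
    return negative ? -value : value;
}

/// Tests a character against the body of a scanset (the text between the
/// brackets), for example "a-z" or "abc123".
bool inScanset(dchar c, dstring set)
{
    for (size_t i = 0; i < set.length; ++i)
    {
        if (i + 2 < set.length && set[i + 1] == '-')
        {
            // "lo-hi" is an inclusive range of code points.
            if (c >= set[i] && c <= set[i + 2])
                return true;
            i += 2;
        }
        else if (c == set[i])
        {
            return true;
        }
    }
    return false;
}

unittest
{
    assert(parseAutoBase("0x1F") == 31);
    assert(parseAutoBase("017") == 15);
    assert(parseAutoBase("42") == 42);
    assert(parseAutoBase("-0X10") == -16);

    assert(inScanset('m', "a-z"));
    assert(!inScanset('M', "a-z"));
    assert(inScanset('2', "abc123"));
    assert(!inScanset('4', "abc123"));
}
```

Negated scansets and the exact handling of a literal hyphen are left out here, since the list above does not describe them.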

Jul 29 2004

Sean Kelly <sean f4.ca> writes:

Sean Kelly wrote:
> http://home.f4.ca/sean/d/unformat.d

I just realized I'd misread a part of the scanf spec.  I've fixed the 
code and re-uploaded it with another unit test.


Sean

Aug 04 2004

pragma <EricAnderton at yahoo dot com> <pragma_member pathlink.com> writes:

In article <ces7ck$s6f$1 digitaldaemon.com>, Sean Kelly says...
> Sean Kelly wrote:
>> http://home.f4.ca/sean/d/unformat.d
>
> I just realized I'd misread a part of the scanf spec.  I've fixed the 
> code and re-uploaded it with another unit test.

Looks pretty useful.  I like it.  I haven't had a chance to run with it myself,
so I'll have to ask: do you have any provisions for reading or handling
whitespace?

One critique though: why check all your exception instances (Underflow, BadFmt,
etc) for each call of unFormat?  You can set all these up ahead of time in a
static block outside your function, without breaking encapsulation too badly.

That way you can prevent redundant allocations (which you've already done) plus
eliminate all those extra "if" statements. :)
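
A minimal sketch of that static-block idea in current D follows. The names Underflow and BadFmt come from the critique above; the class hierarchy, the messages, and the requireInput helper are assumptions made for illustration and are not taken from unformat.d.

```d
/// Base class for the reader's errors (hypothetical).
class UnformatError : Exception
{
    this(string msg) { super(msg); }
}

class Underflow : UnformatError
{
    this() { super("unexpected end of input"); }
}

class BadFmt : UnformatError
{
    this() { super("malformed format string"); }
}

// One instance of each, created up front, so the reading code never has to
// allocate or check whether the instance exists yet.
private Underflow underflowError;
private BadFmt badFmtError;

static this()
{
    // Module constructor: runs once per thread, before main() in the
    // main thread.
    underflowError = new Underflow;
    badFmtError = new BadFmt;
}

/// Hypothetical caller: throws the preallocated instance on failure.
void requireInput(bool haveInput)
{
    if (!haveInput)
        throw underflowError;
}

unittest
{
    requireInput(true); // no throw

    try
    {
        requireInput(false);
        assert(false, "expected Underflow");
    }
    catch (Underflow e)
    {
        assert(e is underflowError); // same object every time
    }
}
```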

- Pragma

Aug 05 2004

Sean Kelly <sean f4.ca> writes:

In article <cetftt$1nd2$1 digitaldaemon.com>, pragma <EricAnderton at yahoo dot
com> says...
> In article <ces7ck$s6f$1 digitaldaemon.com>, Sean Kelly says...
>> Sean Kelly wrote:
>>> http://home.f4.ca/sean/d/unformat.d
>>
>> I just realized I'd misread a part of the scanf spec.  I've fixed the 
>> code and re-uploaded it with another unit test.
>
> Looks pretty useful.  I like it.  I haven't had a chance to run with it myself,
> so I'll have to ask: do you have any provisions for reading or handling
> whitespace?

Everything is done internally in terms of dchars, so hopefully the functions
will be able to correctly recognize all whitespace chars.  I know there may also
be some locale dependent whitespace sequences (Jill?) but as D doesn't have any
concept of locales yet, that will have to wait.

> One critique though: why check all your exception instances (Underflow, BadFmt,
> etc) for each call of unFormat?  You can set all these up ahead of time in a
> static block outside your function, without breaking encapsulation too badly.
>
> That way you can prevent redundant allocations (which you've already done) plus
> eliminate all those extra "if" statements. :)

Good point.  I think I'm still in a C++ mindset as far as statics are concerned.
I'll make this change today :)


Sean

Aug 05 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <ceti16$1oa9$1 digitaldaemon.com>, Sean Kelly says...

> Everything is done internally in terms of dchars, so hopefully the functions
> will be able to correctly recognize all whitespace chars.  I know there may also
> be some locale dependent whitespace sequences (Jill?)

Nope, whitespace is locale independent. You only have to import
etc.unicode.unicode and call isWhitespace(dchar). But I'd suggest waiting until
next week because I'm planning to finally get the linkable library + header
files together this weekend, which will make things somewhat easier for you.
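
For readers on a current D toolchain: the closest equivalent I know of in today's Phobos is std.uni.isWhite, which is likewise locale independent. The sketch below uses it in place of the etc.unicode.unicode isWhitespace call mentioned above (that substitution is an assumption, and skipWhite is a hypothetical helper).

```d
import std.range.primitives : empty, front, popFront;
import std.uni : isWhite;

/// Returns the input with leading Unicode whitespace removed.
/// Iterating with front/popFront decodes the UTF-8 into dchars, so
/// multi-byte whitespace such as U+3000 is recognised as well.
string skipWhite(string s)
{
    while (!s.empty && isWhite(s.front))
        s.popFront();
    return s;
}

unittest
{
    assert(skipWhite(" \t\u3000x y") == "x y"); // ideographic space
    assert(skipWhite("\u00A0n") == "n");        // no-break space
    assert(skipWhite("") == "");
}
```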


> but as D doesn't have any
> concept of locales yet, that will have to wait.

It will have soon, but as I said, it's not relevant to whitespace.

Arcane Jill

Aug 05 2004

Sean Kelly <sean f4.ca> writes:

In article <cetftt$1nd2$1 digitaldaemon.com>, pragma <EricAnderton at yahoo dot
com> says...
> In article <ces7ck$s6f$1 digitaldaemon.com>, Sean Kelly says...
>> Sean Kelly wrote:
>>> http://home.f4.ca/sean/d/unformat.d
>>
>> I just realized I'd misread a part of the scanf spec.  I've fixed the 
>> code and re-uploaded it with another unit test.
>
> Looks pretty useful.  I like it.  I haven't had a chance to run with it myself,
> so I'll have to ask: do you have any provisions for reading or handling
> whitespace?

By the way, I like that doFormat doesn't require a format string at all.  Since
I was working off the scanf spec, I didn't do anything about that with unFormat.
I assume that doFormat can handle things like this:

doFormat( &get, "hello world", 1, "%d", 2 );

and would print:

hello world12

I suppose the equivalent bit for unFormat would be:

char[] buf;
int x, y;
float f;
unFormat( &get, &unget, &buf, &x, "%2d", &y, &f );

which would read a string, an integer, an int with width 2, and a float.  The
only thing I don't know offhand is if I can tell a char** from a char* using
TypeInfo (for %p).  In any case, would people like this syntax rather than
having to specify a format string?  I think I may start on it today just to see
how it goes.
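
To give a feel for type-driven reading without a format string, here is a rough sketch in current D. It uses variadic templates rather than the TypeInfo[] and void* pair from the unFormat prototype, it does not handle interleaved format strings such as "%2d", and readEach is a hypothetical name, so treat it as an illustration of the idea rather than the planned implementation.

```d
import std.conv : ConvException, parse;
import std.range.primitives : empty, front, popFront;
import std.string : stripLeft;
import std.uni : isWhite;

/// Fills each argument from whitespace-separated input, choosing the
/// conversion from the argument's static type. Returns the number of
/// arguments filled and stops at the first failure.
size_t readEach(Args...)(ref string input, ref Args args)
{
    size_t filled;
    foreach (ref arg; args)
    {
        input = input.stripLeft();
        if (input.empty)
            break;

        alias T = typeof(arg);
        static if (is(T == string))
        {
            // Like %s: take everything up to the next whitespace character.
            auto rest = input;
            while (!rest.empty && !isWhite(rest.front))
                rest.popFront();
            arg = input[0 .. $ - rest.length];
            input = rest;
        }
        else
        {
            // Numeric types go through std.conv.parse, which consumes
            // the text it converts.
            try
            {
                arg = parse!T(input);
            }
            catch (ConvException)
            {
                break;
            }
        }
        ++filled;
    }
    return filled;
}

unittest
{
    string text = "widget 12 3.5 rest";
    string name;
    int count;
    double price;

    assert(readEach(text, name, count, price) == 3);
    assert(name == "widget" && count == 12 && price == 3.5);
    assert(text == " rest"); // unread input stays available

    string bad = "x y";
    int a, b;
    assert(readEach(bad, a, b) == 0); // stops at the first mismatch
}
```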


Sean

Aug 06 2004
