D - new streams

"Pavel Minayev" <evilone omen.ru> writes:

You can find the new stream module at my site, http://int19h.tamb.ru.

By far the most interesting addition is scanf(). Even better,
it can read D strings!

    char[] s;
    stdin.scanf("%.*s", &s);

Yes, this really works! Of course, you can still read C strings
(%s), but who needs it anymore? Note, however, that scanf wasn't
tested much, so it might contain bugs. Be careful!

Going further, readLine() has learnt to read lines terminated with
a single CR (aka It Came From Mac). And writeLine() now follows
Windows conventions, and writes CR/LF terminated lines. On Linux,
it should write a single LF, and a CR on Mac, whenever D gets
there - a bit of underlying platform transparency.

Unicode strings now work - no, really! =) readStringW(),
writeStringW(), readLineW(), and writeLineW() do their job
no worse than their ANSI counterparts.

Generic read() and write() can now handle strings as well. Unlike
readString() and writeString(), these also store the length in
the stream:

    char[] s;
    ...
    file.write(s);   // writes s.length, followed by s
    ...
    file.read(s);    // reads length, then string of that length

Two new functions: getc() and ungetc(). I guess you know what
these are for. =) They also have Unicode versions, getcw() and
ungetcw().

Enumerations changed names again:

    enum SeekPos
    {
        Set,
        Current,
        End
    }

    enum FileMode
    {
        In,
        Out
    }

Now these have proper case, and should be more consistent with other
Phobos modules.

And finally, the module is NO LONGER DEPENDENT on my windows.d import,
and can be used with the one that comes with D. Thus, it can easily
replace the old and outdated stream module in Phobos.

By the way, Walter, could you pleeease replace the old version, that
you'd put into Phobos, with this new one? It's much better, has
fewer bugs, and since it is now self-sufficient (no need for my crappy
win32 import module), it should be easy to do...

May 10 2002

"Walter" <walter digitalmars.com> writes:

"Pavel Minayev" <evilone omen.ru> wrote in message
news:abglum$2kij$1 digitaldaemon.com...
 You can find the new stream module at my site, http://int19h.tamb.ru.

Cool!

 Going further, readLine() has learnt to read lines terminated with
 a single CR (aka It Came From Mac). And writeLine() now follows
 Windows conventions, and writes CR/LF terminated lines. On Linux,
 it should write a single LF, and a CR on Mac, whenever D gets
 there - a bit of underlying platform transparency.

What I do is treat as "newline" any of the following:

1) CR
2) CR LF
3) LF

It requires a bit of lookahead to distinguish case 1 from case 2, but it
works with files generated by Windows, linux, and Mac.

 By the way, Walter, could you pleeease replace the old version, that
 you'd put into Phobos, with this new one?

Sure!

May 10 2002

"Pavel Minayev" <evilone omen.ru> writes:

"Walter" <walter digitalmars.com> wrote in message
news:abgshp$2qbp$1 digitaldaemon.com...

 What I do is treat as "newline" any of the following:

 1) CR
 2) CR LF
 3) LF

 It requires a bit of lookahead to distinguish case 1 from case 2, but it
 works with files generated by Windows, linux, and Mac.

That's exactly what I did in the new version. It requires ungetc()
(for the case when CR is not followed by LF) though, so I had to
add it as well.

May 10 2002

Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:

Walter wrote:

 What I do is treat as "newline" any of the following:

 1) CR
 2) CR LF
 3) LF

 It requires a bit of lookahead to distinguish case 1 from case 2, but it
 works with files generated by Windows, linux, and Mac.

This has caused me some HUGE headaches doing streaming on UNIX boxes.  At
least some of the tools do "lookahead", so they don't echo a line out until
you have printed 1 character AFTER the newline...in some cases, it has
caused my programs to hang for minutes or hours (while, say, a long find
command runs) until either another (unnecessary) line is printed, or the
stream runs into EOF.

IMHO, you should immediately interpret CR as a newline, but put a marker on
the stream such that if another character is read and that character is a
LF, then it will be consumed LATER.  DON'T lookahead for it :(
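
The marker idea can be sketched in D roughly as follows. This is only a minimal illustration under stated assumptions, not code from the stream module: `LineSplitter`, `put`, and `splitAll` are hypothetical names, and characters are pushed in one at a time so that a line is reported the moment its CR or LF arrives, with no read-ahead.

```d
/// Hypothetical helper, not part of the stream module discussed here.
/// Splits pushed characters into lines without ever waiting for lookahead.
struct LineSplitter
{
    private char[] pending;      // characters of the line being assembled
    private bool skipLF = false; // set after CR: a following LF belongs to it

    /// Feed one character. Returns true when a line has just been completed;
    /// the finished line is then stored in `line`.
    bool put(char c, out string line)
    {
        if (skipLF)
        {
            skipLF = false;
            if (c == '\n')
                return false;    // second half of CR LF, consumed silently
        }
        if (c == '\r' || c == '\n')
        {
            skipLF = (c == '\r'); // remember to swallow a LF that may follow
            line = pending.idup;
            pending.length = 0;
            return true;
        }
        pending ~= c;
        return false;
    }
}

/// Convenience wrapper for experiments: split a whole string.
string[] splitAll(string text)
{
    LineSplitter splitter;
    string[] lines;
    foreach (char c; text)
    {
        string line;
        if (splitter.put(c, line))
            lines ~= line;
    }
    return lines;
}
```

Because the line is handed back as soon as the CR is seen, a reader built this way never has to block on the character after the terminator; the cost is one extra flag of state per stream.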

--
The Villagers are Online! villagersonline.com

.[ (the fox.(quick,brown)) jumped.over(the dog.lazy) ]
.[ (a version.of(English).(precise.more)) is(possible) ]
?[ you want.to(help(develop(it))) ]

May 10 2002

"Pavel Minayev" <evilone omen.ru> writes:

"Russ Lewis" <spamhole-2001-07-16 deming-os.org> wrote in message
news:3CDBF812.D68ED4F6 deming-os.org...

 IMHO, you should immediately interpret CR as a newline, but put a marker on
 the stream such that if another character is read and that character is a
 LF, then it will be consumed LATER.  DON'T lookahead for it :(

I do a lookahead, but I have ungetc() implemented and working...

May 10 2002

Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:

Pavel Minayev wrote:

 "Russ Lewis" <spamhole-2001-07-16 deming-os.org> wrote in message
 news:3CDBF812.D68ED4F6 deming-os.org...

 IMHO, you should immediately interpret CR as a newline, but put a marker on
 the stream such that if another character is read and that character is a
 LF, then it will be consumed LATER.  DON'T lookahead for it :(

 I do a lookahead, but I have ungetc() implemented and working...

Ungetc doesn't help the problem I was talking about.  If you do lookahead but
there is not a character available, then your library will block until one more
character is available to read (or you detect EOF)...which could be a LONG time
from now.

--
The Villagers are Online! villagersonline.com

.[ (the fox.(quick,brown)) jumped.over(the dog.lazy) ]
.[ (a version.of(English).(precise.more)) is(possible) ]
?[ you want.to(help(develop(it))) ]

May 10 2002

Andrew Feldstein <Andrew_member pathlink.com> writes:

I agree that Russ's way is better, but it is still not ideal.  The user should
be able to set some sort of library flag to determine how to handle end of lines
*correctly* given the needs of the program.  This flag could control both
writing as well as reading, knowing how to handle \n, for example.  For example,
under *nix, it is incorrect to treat CR as part of a newline, and on the Mac, I
believe, the same is true of LF.  Of course, any implementation should default
to the text model used by the underlying operating system and should handle the
oddball cases cleanly.  Of course reading and writing don't *have* to be the
same....

Pavel, how would your new function read, say, a file containing nothing but
three <CR>'s followed by two <LF>'s?  Under various text models this could be
interpreted as any of 1, 2, 3, 4, or 5 blank lines.

In article <3CDBFA09.5F8DD76D deming-os.org>, Russ Lewis says...
Pavel Minayev wrote:

 "Russ Lewis" <spamhole-2001-07-16 deming-os.org> wrote in message
 news:3CDBF812.D68ED4F6 deming-os.org...

 IMHO, you should immediately interpret CR as a newline, but put a marker on
 the stream such that if another character is read and that character is a
 LF, then it will be consumed LATER.  DON'T lookahead for it :(

 I do a lookahead, but I have ungetc() implemented and working...

Ungetc doesn't help the problem I was talking about.  If you do lookahead but
there is not a character available, then your library will block until one more
character is available to read (or you detect EOF)...which could be a LONG time
from now.

--
The Villagers are Online! villagersonline.com

.[ (the fox.(quick,brown)) jumped.over(the dog.lazy) ]
.[ (a version.of(English).(precise.more)) is(possible) ]
?[ you want.to(help(develop(it))) ]

May 10 2002

"Pavel Minayev" <evilone omen.ru> writes:

"Andrew Feldstein" <Andrew_member pathlink.com> wrote in message
news:abh3e0$3031$1 digitaldaemon.com...

 I agree that Russ's way is better, but it is still not ideal.  The user should
 be able to set some sort of library flag to determine how to handle end of lines
 *correctly* given the needs of the program.  This flag could control both
 writing as well as reading, knowing how to handle \n, for example.  For example,
 under *nix, it is incorrect to treat CR as part of a newline, and under MAC, I
 believe, the LF the same.  Of course, any implementation should default
 to the text model used by the underlying operating system and should handle the
 oddball cases cleanly.  Of course reading and writing don't *have* to be the
 same....

Under *nix, CR is a control character, and thus it is NOT supposed to
be seen in ASCII files, which is what readLine() is designed for. However, if
it occasionally comes across a file made in a Windows or Mac text editor,
it will still be able to read it properly. The same is true for Mac:
text files SHOULDN'T contain LF. The stream's ability to handle it is an
advantage, not a bug.

 Pavel, how would your new function read, say, a file containing nothing but
 three <CR>'s followed by two <LF>'s?  Under various text models this could be
 interpreted as any of 1, 2, 3, 4, or 5 blank lines.

It will treat it as CR, CR, CR+LF, LF - 4 lines.
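
That count can be checked with a small sketch. It is an illustration only, assuming the whole text is already in memory so that lookahead is free; `countLineBreaks` is a hypothetical name and not a function of the stream module.

```d
/// Count line terminators in `text`, treating CR, CR LF and LF
/// each as a single terminator. Hypothetical helper for illustration.
size_t countLineBreaks(const(char)[] text)
{
    size_t count = 0;
    for (size_t i = 0; i < text.length; ++i)
    {
        if (text[i] == '\r')
        {
            // Look ahead: a LF right after CR is part of the same terminator.
            if (i + 1 < text.length && text[i + 1] == '\n')
                ++i;
            ++count;
        }
        else if (text[i] == '\n')
        {
            ++count;
        }
    }
    return count;
}
```

For the input of three CRs followed by two LFs, the third CR pairs with the first LF, so the terminators are CR, CR, CR+LF, LF.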

May 10 2002

"Walter" <walter digitalmars.com> writes:

"Andrew Feldstein" <Andrew_member pathlink.com> wrote in message
news:abh3e0$3031$1 digitaldaemon.com...
 I agree that Russ's way is better, but it is still not ideal.  The user should
 be able to set some sort of library flag to determine how to handle end of lines
 *correctly* given the needs of the program.  This flag could control both
 writing as well as reading, knowing how to handle \n, for example.  For example,
 under *nix, it is incorrect to treat CR as part of a newline, and under MAC, I
 believe, the LF the same.  Of course, any implementation should default
 to the text model used by the underlying operating system and should handle the
 oddball cases cleanly.  Of course reading and writing don't *have* to be the
 same....

The problem is that files are transferred from machine to machine, and a
program cannot reliably know the source of a file.

 Pavel, how would your new function read, say, a file containing nothing but
 three <CR>'s followed by two <LF>'s?  Under various text models this could be
 interpreted as any of 1, 2, 3, 4, or 5 blank lines.

That would be CR, CR, CR LF, LF, or 4 lines.

May 10 2002

"Robert W. Cunningham" <rcunning acm.org> writes:

Russ Lewis wrote:

 Pavel Minayev wrote:

 "Russ Lewis" <spamhole-2001-07-16 deming-os.org> wrote in message
 news:3CDBF812.D68ED4F6 deming-os.org...

 IMHO, you should immediately interpret CR as a newline, but put a marker on
 the stream such that if another character is read and that character is a
 LF, then it will be consumed LATER.  DON'T lookahead for it :(

 I do a lookahead, but I have ungetc() implemented and working...

 Ungetc doesn't help the problem I was talking about.  If you do lookahead but
 there is not a character available, then your library will block until one more
 character is available to read (or you detect EOF)...which could be a LONG time
 from now.

On serial device drivers I've written, and on at least one of the many RTOS
systems I've used, we had peekc() and/or lookc() calls that would, without
side-effects, look at the next character in the device driver's buffer, and if
that buffer was empty, the call would wait a single character time and sneak a
nondestructive look at the uart buffer (a tricky thing to do on some uarts).

I have no idea if Windows has similar capabilities.


-BobC

May 10 2002

"Walter" <walter digitalmars.com> writes:

"Russ Lewis" <spamhole-2001-07-16 deming-os.org> wrote in message
news:3CDBF812.D68ED4F6 deming-os.org...
 This has caused me some HUGE headaches doing streaming on UNIX boxes.  At
 least some of the tools do "lookahead", so they don't echo a line out until
 you have printed 1 character AFTER the newline...in some cases, it has
 caused my programs to hang for minutes or hours (while, say, a long find
 command runs) until either another (unnecessary) line is printed, or the
 stream runs into EOF.

One solution is to use isatty() and, if it is a stream, not a file, time out
instead of blocking for the lookahead. I've used similar tricks when reading
escape sequences from terminals.

May 10 2002

Burton Radons <loth users.sourceforge.net> writes:

On Fri, 10 May 2002 18:42:01 +0400, "Pavel Minayev" <evilone omen.ru>
wrote:

You can find the new stream module at my site, http://int19h.tamb.ru.

Far the most interesting addition is scanf(). What is even better,
it can read D strings!

    char[] s;
    stdin.scanf("%.*s", &s);

Yes, this really works! Of course, you can still read C strings
(%s), but who needs it anymore? Note, however, that scanf wasn't
tested much, so it might contain bugs. Be careful!

I think we should get the scanf and fmt format codes aligned.  My
method is "%s" for char[], "%S" for wchar[], "%+s" for char*, and
"%+S" for wchar*.  Different semantics for what looks like the same
thing is bad city.

[snip]
Generic read() and write() can now handle strings as well. Unlike
readString() and writeString(), these also store the length in
the stream:

    char[] s;
    ...
    file.write(s);   // writes s.length, followed by s
    ...
    file.read(s);    // reads length, then string of that length

Since this format is our own (that is to say, there's no standard for
counted strings -- some are 32-bit, some are 16-bit, some are 8-bit,
with varying rules on NUL termination and alignment), we may as well
use dynamic-sized integers for this.  For each byte we take the first
seven bits and read another byte if the eighth bit is set, like:

    /* Write an unsigned long using the minimum number of bytes */
    void dwrite(ulong value)
    {
        do
        {
            /* low seven bits, plus a continuation flag if more follow */
            write (cast(ubyte) ((value & 127) | (value > 127 ? 128 : 0)));
            value = value >> 7;
        }
        while (value);
    }

    /* Read an unsigned long using the minimum number of bytes */
    void dread(out ulong value)
    {
        ulong shift = 0;
        ubyte buffer;

        value = 0;

        do
        {
            if (shift >= 64)
                throw new ReadError("integer overflow on reading value");
            read (buffer);
            value |= cast(ulong) (buffer & 127) << shift;
            shift += 7;
        }
        while (buffer & 128);
    }

When writing a uint you'll usually save two or three bytes, which
really adds up when writing meshes, and you have your future covered,
and it's endian neutral.
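
To make the size claim concrete, here is a self-contained sketch of the same seven-bits-per-byte scheme that works on an in-memory buffer instead of a stream. The names `putVarint` and `takeVarint` are hypothetical, and the use of a plain `Exception` is an assumption made only to keep the example standalone.

```d
/// Append `value` to `buf` as a base-128 integer, least significant group
/// first: seven payload bits per byte, high bit set when more bytes follow.
void putVarint(ref ubyte[] buf, ulong value)
{
    do
    {
        ubyte b = cast(ubyte)(value & 0x7F);
        value >>= 7;
        if (value != 0)
            b |= 0x80;           // continuation flag
        buf ~= b;
    }
    while (value != 0);
}

/// Decode one integer from the front of `buf` and advance the slice past it.
ulong takeVarint(ref const(ubyte)[] buf)
{
    ulong value = 0;
    uint shift = 0;
    ubyte b;
    do
    {
        if (buf.length == 0)
            throw new Exception("truncated variable-length integer");
        if (shift >= 64)
            throw new Exception("variable-length integer overflows ulong");
        b = buf[0];
        buf = buf[1 .. $];
        value |= cast(ulong)(b & 0x7F) << shift;
        shift += 7;
    }
    while (b & 0x80);
    return value;
}
```

With this layout, values below 128 take one byte and values below 16384 take two, compared with four bytes for a fixed-width uint; the largest ulong takes ten bytes.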

Signed values can be written by preprocessing them for writing:

    if (value < 0)
        ovalue = (-value << 1) | 1;
    else
        ovalue = value << 1;

and postprocessing them after reading:

    ovalue = (value >> 1);
    if (value & 1)
        ovalue = -ovalue;

Uh, except that you can't write the minimum value of long then.  Think
of the byte case - you start with a range of -128 to 127 and end with
a range of -127 to 127 if you kept just to byte.  If they existed I'd
cast to a bignum and save that, although real bignums should be saved
like counted strings.
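
One way around the minimum-value problem is a different mapping, usually called ZigZag encoding (Protocol Buffers uses it for its signed varint types). This is a swapped-in technique rather than the sign-bit scheme shown above, and the function names are hypothetical. It interleaves negative and positive values so that every long, including the minimum, has a distinct unsigned image.

```d
/// Map a signed value to an unsigned one so that small magnitudes stay small:
/// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
ulong zigzagEncode(long n)
{
    // n >> 63 is an arithmetic shift: all ones for negative n, zero otherwise.
    return (cast(ulong) n << 1) ^ cast(ulong)(n >> 63);
}

/// Inverse of zigzagEncode.
long zigzagDecode(ulong u)
{
    // The low bit carries the sign; XOR with 0 or -1 restores the value.
    return cast(long)(u >> 1) ^ -cast(long)(u & 1);
}
```

The encoded value can then be written with the variable-length unsigned routine, so small negative numbers stay as short as small positive ones.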

For my code I won't be able to use the class if it doesn't handle
endian properly - I'm just too ethically opposed to blindly writing
values.  It's just one step down from writing structs, IMO.  Standard
read/write could use little endian, with bread/bwrite for big endian.
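
As a rough sketch of what writing with a fixed byte order means, independent of the host CPU: the helpers below build the byte sequence with shifts instead of copying memory. All three function names are hypothetical and are not part of the stream module.

```d
/// Serialize `value` least significant byte first, whatever the host order is.
ubyte[4] toLittleEndian(uint value)
{
    ubyte[4] bytes;
    foreach (i; 0 .. 4)
        bytes[i] = cast(ubyte)(value >> (8 * i));
    return bytes;
}

/// Serialize `value` most significant byte first.
ubyte[4] toBigEndian(uint value)
{
    ubyte[4] bytes;
    foreach (i; 0 .. 4)
        bytes[i] = cast(ubyte)(value >> (8 * (3 - i)));
    return bytes;
}

/// Rebuild a uint from four bytes stored least significant byte first.
uint fromLittleEndian(const ubyte[4] bytes)
{
    uint value = 0;
    foreach (i; 0 .. 4)
        value |= cast(uint) bytes[i] << (8 * i);
    return value;
}
```

Because the bytes are computed arithmetically, the output is the same on little-endian and big-endian machines. Present-day Phobos offers `nativeToLittleEndian` and `nativeToBigEndian` in `std.bitmanip` for the same purpose.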

[snip]
Enumerations changed names again:

    enum SeekPos
    {
        Set,
        Current,
        End
    }

Why not Cur?  "Set" is already nonsensical; Start or Beginning would
be more appropriate, so we may as well use the convenient nonsense
we're used to.

Hm.  I don't like writing the name of the enumeration when there's
only one type that can fit in the argument.  How about we have this:

    file.seek (x, .Current);
    file.seek (x, .Set);
    file.seek (x, .End);

Minimise namespace pollution and too much writingitis at the same
time.  Of course, it means that you can't find the enumeration value
until after the function has been decided upon, but it shouldn't be
ambiguous; it's clearly an enumeration of some sort.

[snip]

May 10 2002

"Pavel Minayev" <evilone omen.ru> writes:

"Burton Radons" <loth users.sourceforge.net> wrote in message
news:sjunduovt39c2tcntmkv6rp23cn8thmk9g 4ax.com...

 I think we should get the scanf and fmt format codes aligned.  My
 method is "%s" for char[], "%S" for wchar[], "%+s" for char*, and
 "%+S" for wchar*.  Different semantics for what looks like the same
 thing is bad city.

Agreed, but I think we should first have Walter agree to this,
so it'd become "official". Once it is, I will be happy to standardize
streams appropriately.

 Since this format is our own (that is to say, there's no standard for
 counted strings -- some are 32-bit, some are 16-bit, some are 8-bit,
 with varying rules on NUL termination and alignment), we may as well
 use dynamic-sized integers for this.  For each byte we take the first
 seven bits and read another byte if the eighth bit is set, like:

...
 When writing uint you'll usually get three or two bytes savings, which
 really adds up when writing meshes, and you have your future covered,
 and it's endian neutral.

But at a cost of speed... and I wonder if it is really needed? Is
file size so important?

 For my code I won't be able to use the class if it doesn't handle
 endian properly - I'm just too ethically opposed to blindly writing
 values.  It's just one step down from writing structs, IMO.  Standard
 read/write could use little endian, with bread/bwrite for big endian.

I would prefer read() and write() to operate in "current endianness"
(because often you just don't care - all you want is for your
program to be able to read data it previously wrote on that
computer: savegames etc.). If you really care about endianness, you'll
have to use functions like bread() and lread().

 Why not Cur?  "Set" is already nonsensical; Start or Beginning would
 be more appropriate, so we may as well use the convenient nonsense
 we're used to.

Because "Set" is a word, and so is "Current", but not "Cur".
But if you really think that "Start" looks better, I'll probably
change it...

May 10 2002

Burton Radons <loth users.sourceforge.net> writes:

On Fri, 10 May 2002 22:20:55 +0400, "Pavel Minayev" <evilone omen.ru>
wrote:

"Burton Radons" <loth users.sourceforge.net> wrote in message
news:sjunduovt39c2tcntmkv6rp23cn8thmk9g 4ax.com...

 I think we should get the scanf and fmt format codes aligned.  My
 method is "%s" for char[], "%S" for wchar[], "%+s" for char*, and
 "%+S" for wchar*.  Different semantics for what looks like the same
 thing is bad city.

Agreed, but I think we should first have Walter to agree with this,
so it'd become "official". Once it is, I will be happy to standartize
streams appropriately.

Sure, but I'm leaving the option open to kick his ass if he decides to
go with "%format-a-string;".  ;-)

 Since this format is our own (that is to say, there's no standard for
 counted strings -- some are 32-bit, some are 16-bit, some are 8-bit,
 with varying rules on NUL termination and alignment), we may as well
 use dynamic-sized integers for this.  For each byte we take the first
 seven bits and read another byte if the eighth bit is set, like:

...
 When writing uint you'll usually get three or two bytes savings, which
 really adds up when writing meshes, and you have your future covered,
 and it's endian neutral.

But at a cost of speed... and I wonder if it is really needed? Is
file size so important?

It should be a little faster on a competent compiler.  We have to
buffer the data anyway; flushing the buffer takes a long time; loops
can be unrolled; dynamic-sized integers lower the incidence of
flushing; dynamic-sized integers are faster.  But this is splitting
hairs in any case.  Endian independence and a much smaller normal case
are far more important.

 For my code I won't be able to use the class if it doesn't handle
 endian properly - I'm just too ethically opposed to blindly writing
 values.  It's just one step down from writing structs, IMO.  Standard
 read/write could use little endian, with bread/bwrite for big endian.

I would prefer read() and write() to operate in "current endianness"
(because often you just don't care - all you want is that your
program should be able to read data it previously written, on that
computer, savegames etc). If you really care about endianness, you'll
have to use functions like bread() and lread().

Uh, if you don't care, then it can default to little endian.  :-)

 Why not Cur?  "Set" is already nonsensical; Start or Beginning would
 be more appropriate, so we may as well use the convenient nonsense
 we're used to.

Because "Set" is a word, and so is "Current", but not "Cur".
But if you really think that "Start" looks better, I'll probably
change it...

It's a word, but so is "Catholicity", and it's as appropriate as
"Set".  My dictionary gives 125 meanings for set.  The only thing that
could be related is in the context of "setting sun", which is quite
the opposite.

Besides which, cur is a word.  Uh, perhaps not in your part of the
world.  It means a worthless dog, or contemptible scoundrel.

May 10 2002

"Pavel Minayev" <evilone omen.ru> writes:

"Burton Radons" <loth users.sourceforge.net> wrote in message
news:3f4oduspb6fuiseeg7a4a025c92pnnfl1e 4ax.com...

 It's a word, but so is "Catholicity", and it's as appropriate as
 "Set".  My dictionary gives 125 meanings for set.  The only thing that
 could be related is in the context of "setting sun", which is quite
 the opposite.

Hmm.. I always thought that "set" is a short form of "just set that

May 10 2002

"OddesE" <OddesE_XYZ hotmail.com> writes:

"Pavel Minayev" <evilone omen.ru> wrote in message
news:abha3v$4f7$1 digitaldaemon.com...
 "Burton Radons" <loth users.sourceforge.net> wrote in message
 news:3f4oduspb6fuiseeg7a4a025c92pnnfl1e 4ax.com...

 It's a word, but so is "Catholicity", and it's as appropriate as
 "Set".  My dictionary gives 125 meanings for set.  The only thing that
 could be related is in the context of "setting sun", which is quite
 the opposite.

 Hmm.. I always thought that "set" is a short form of "just set that


LOL  :)

--
Stijn
OddesE_XYZ hotmail.com
http://OddesE.cjb.net
_________________________________________________
Remove _XYZ from my address when replying by mail

May 11 2002

"Walter" <walter digitalmars.com> writes:

"Pavel Minayev" <evilone omen.ru> wrote in message
news:abh2p1$2vfr$1 digitaldaemon.com...
 "Burton Radons" <loth users.sourceforge.net> wrote in message
 news:sjunduovt39c2tcntmkv6rp23cn8thmk9g 4ax.com...
 I think we should get the scanf and fmt format codes aligned.  My
 method is "%s" for char[], "%S" for wchar[], "%+s" for char*, and
 "%+S" for wchar*.  Different semantics for what looks like the same
 thing is bad city.

 Agreed, but I think we should first have Walter to agree with this,
 so it'd become "official". Once it is, I will be happy to standartize
 streams appropriately.

But that means I have to think about it <g>. In any case, I think it is just
a matter of reviewing the C printf and scanf format strings, and coming up
with something as equivalent as practical that still supports the full set of D
types. Note that D enables some cool things like a format specifier for
Objects, too, which will cast the argument to an Object and call toString()
on it.
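
For illustration, present-day Phobos behaves much like this: `std.format.format` with `%s` formats a class instance through its `toString()`. The sketch below assumes a modern D toolchain rather than the 2002 library, and `Point` and `describe` are hypothetical names.

```d
import std.format : format;

/// Hypothetical class used only for illustration.
class Point
{
    int x, y;

    this(int x, int y)
    {
        this.x = x;
        this.y = y;
    }

    override string toString()
    {
        return format("Point(%d, %d)", x, y);
    }
}

/// Formats any object through "%s", which ends up calling its toString().
string describe(Object o)
{
    return format("value = %s", o);
}
```

The caller only sees an `Object`, yet the text comes from the most derived `toString()`, which is the behaviour described above.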

May 10 2002

"Pavel Minayev" <evilone omen.ru> writes:

"Walter" <walter digitalmars.com> wrote in message
news:abhjha$c6h$1 digitaldaemon.com...

 But that means I have to think about it <g>. In any case, I think it is just
 a matter of reviewing the C printf and scanf format strings, and coming up
 with something as equivalent as practical but still support the full D

Yes, exactly. But you don't want anarchy here, do you?

May 10 2002

"Walter" <walter digitalmars.com> writes:

"Pavel Minayev" <evilone omen.ru> wrote in message
news:abi475$qe3$1 digitaldaemon.com...
 "Walter" <walter digitalmars.com> wrote in message
 news:abhjha$c6h$1 digitaldaemon.com...
 But that means I have to think about it <g>. In any case, I think it is just
 a matter of reviewing the C printf and scanf format strings, and coming up
 with something as equivalent as practical but still support the full D

 Yes, exactly. But you don't want anarchy here, do you?

No, but it's a matter of getting spread too thin making it hard to give each
issue the attention it needs. I'm currently trying to finish another project
(and get paid for it) so I can spend more time on D.

May 11 2002

"Martin M. Pedersen" <mmp www.moeller-pedersen.dk> writes:

Hi,

"Burton Radons" <loth users.sourceforge.net> wrote in message
news:sjunduovt39c2tcntmkv6rp23cn8thmk9g 4ax.com...
 Since this format is our own (that is to say, there's no standard for
 counted strings -- some are 32-bit, some are 16-bit, some are 8-bit,
 with varying rules on NUL termination and alignment), we may as well
 use dynamic-sized integers for this.  For each byte we take the first
 seven bits and read another byte if the eighth bit is set, like:

This is very much like the ASN.1/DER encoding of lengths, but not exactly.
We might consider that encoding, see:
ftp://ftp.rsasecurity.com/pub/pkcs/ascii/layman.asc

 When writing uint you'll usually get three or two bytes savings, which
 really adds up when writing meshes, and you have your future covered,
 and it's endian neutral.

These are good properties. ASN.1/DER also specifies how to encode type
information used to distinguish between ASCII and Unicode strings - that
might be usable too.

Regards,
Martin M. Pedersen

May 11 2002

"Walter" <walter digitalmars.com> writes:

"Burton Radons" <loth users.sourceforge.net> wrote in message
news:sjunduovt39c2tcntmkv6rp23cn8thmk9g 4ax.com...
 I think we should get the scanf and fmt format codes aligned.  My
 method is "%s" for char[], "%S" for wchar[], "%+s" for char*, and
 "%+S" for wchar*.  Different semantics for what looks like the same
 thing is bad city.

I've been thinking about the problem of writing wide characters vs ascii
characters. Embedding it in the format string isn't going to work too well,
as what happens with:

    printf("foo %S bar");

Are foo and bar written as unicode or ascii? I don't know many applications
that would want to mix the two. The practical solution I see is to have two
printf's, one for ascii and one for unicode. I.e. printf and wprintf.

May 11 2002
