digitalmars.D - streaming redux

Andrei Alexandrescu (8/8) Dec 27 2010 I've put together over the past days an embryonic streaming interface.

Vladimir Panteleev (48/79) Dec 28 2010 Here are my humble observations:

Andrei Alexandrescu (14/92) Dec 28 2010 I think static polymorphism is great for ranges, which have fine

Michel Fortin (21/33) Dec 28 2010 You're assuming streams will deal with I/O operations. What if I have a

Andrei Alexandrescu (6/33) Dec 28 2010 That too.

Michel Fortin (21/30) Dec 28 2010 Well, theoretically, adding a buffer between us and the stream should

Sean Kelly (21/35) Dec 28 2010 This one is really difficult to get right. JSON, for example, has named...

Andrei Alexandrescu (12/51) Dec 28 2010 named members of its object type. How could the name of a field be

Robert Jacques (5/69) Dec 28 2010 By the way, JSON doesn't support associative arrays in general. It only ...

Andrei Alexandrescu (4/72) Dec 29 2010 Yah, I meant AAs keyed on string types and with values that in turn are
Sean Kelly (3/7) Dec 29 2010 Or something like this:

Vladimir Panteleev (15/20) Dec 28 2010 Ah, OK. For some reason I thought it was only for binary data. Still, I ...

Daniel Gibson (51/70) Dec 28 2010 I think I mostly like the proposal. I think it should be done this way,

jovo (3/12) Dec 28 2010 Interfaces should abstract only common things. You can
jovo (3/12) Dec 28 2010 Interfaces should abstract only common things. You can
Dmitry Olshansky (5/20) Dec 30 2010 This, I guess, would be provided by free functions in the same module,

Andrei Alexandrescu (3/28) Dec 30 2010 What's wrong with void[]?

Dmitry Olshansky (15/46) Dec 30 2010 Nothing, in fact I was replying to
Daniel Gibson (13/44) Dec 31 2010 For example:

Steven Schveighoffer (7/51) Dec 31 2010 This can be significantly shortened:

so (5/7) Jan 01 2011 Wow, i didn't know that!

Steven Schveighoffer (3/9) Jan 03 2011 http://www.digitalmars.com/d/2.0/arrays.html#implicit-conversions

so (7/18) Jan 03 2011 Thanks.

Steven Schveighoffer (16/35) Jan 03 2011 type of (&i)[0..1] is int[]. int[] is implicitly convertible to void[] ...

so (6/7) Jan 03 2011 I see what the topic is all about.

Steven Schveighoffer (10/15) Jan 03 2011 Oh that:

so (6/23) Jan 03 2011 Thanks a ton!

Daniel Gibson (2/57) Jan 03 2011 This is indeed a very cool trick :-)

Daniel Gibson (11/36) Dec 31 2010 Maybe for the convenience functions, but what about readFully()?

Dmitry Olshansky (18/58) Dec 31 2010 I meant something like this (assuming call to t.read blocks):

Andrej Mitrovic (2/3) Dec 28 2010 What exactly is the difference between an interface and an abstract inte...

Andrei Alexandrescu (3/6) Dec 28 2010 Just an artifact of ddoc.

SHOO (34/42) Dec 28 2010 I hope that this argument grows warm. For Phobos, the I/O is a very

Andrei Alexandrescu (13/57) Dec 28 2010 With dynamically polymorphic interface, client code need not be

Steven Schveighoffer (13/42) Dec 29 2010 Consider this scenario:

spir (14/24) Dec 29 2010

SHOO (19/52) Dec 30 2010 I think interface is good for stream. Basically, handling of I/O is

Michel Fortin (59/68) Dec 28 2010 One of my concerns is the number of virtual calls required in actual

Sean Kelly (2/14) Dec 28 2010 I like it. There needs to be some way to hold format-specific state inf...

Michel Fortin (12/30) Dec 28 2010 The 'F' formatter can be anything, it can be a class, a delegate, a

Sean Kelly (2/18) Dec 28 2010 And I guess writeTo could just call formatter.write(MyClass c). You're ...

Michel Fortin (34/55) Dec 28 2010 Well, not exactly. I'd expect formatter.write(Object) to be the one

Sean Kelly (4/20) Dec 28 2010 This is what I meant. Sorry for the confusion.

Michel Fortin (8/14) Dec 28 2010 I never pretended I had overcome the problem of the lack of runtime

Andrei Alexandrescu (8/18) Dec 28 2010 After some thought, I think we should confine the charter of the current...

Michel Fortin (7/16) Dec 29 2010 Seems reasonable to me.
Sean Kelly (2/7) Dec 29 2010 Can formatters be chained? Data available from Bloomberg, for example, ...

Andrei Alexandrescu (3/10) Dec 29 2010 I think Transports are supposed to be chained.

sclytrack (11/25) Dec 28 2010 class SerializableObject
Andrei Alexandrescu (9/26) Dec 28 2010 This design prevents new formatters from working with existing class

Andrei Alexandrescu (24/85) Dec 28 2010 I think that's a very rare situation. When you pick a certain formatter,...

Michel Fortin (20/22) Dec 29 2010 It seems we're approaching the problem from different angles. What you

Jonathan M Davis (23/32) Dec 28 2010 The fact that streams here are interfaces and yet Phobos uses templates ...

Andrei Alexandrescu (19/50) Dec 28 2010 Dynamic polymorphism is a well understood style of coding and is

Haruki Shigemori (5/13) Dec 28 2010 I've waited so long for this day.

Andrei Alexandrescu (6/23) Dec 28 2010 There isn't one. The source code is just support for documentation, and

SHOO (38/62) Dec 28 2010 Like this?

Andrei Alexandrescu (4/35) Dec 28 2010 [snip]

Robert Jacques (68/76) Dec 28 2010 Here are my initial thoughts and responses to the questions. Now to go
Sean Kelly (6/36) Dec 29 2010 I think a distinction should be drawn between unstructured formats (comp...
Steven Schveighoffer (142/149) Dec 29 2010 Without reading any other comments, here is my take on just the streamin...

Dmitry Olshansky (35/51) Dec 30 2010 Maybe it's only me but I would prefer non-blocking IO not mixed with

Steven Schveighoffer (9/56) Dec 30 2010 On Linux, you set the file descriptor to blocking or non-blocking, and

Dmitry Olshansky (10/68) Dec 30 2010 The only general thing I can think of would be to suspend thread
Johannes Pfau (13/71) Dec 31 2010 I think it's possible (libev: "If you cannot use non-blocking mode,

Steven Schveighoffer (8/15) Dec 29 2010 One thing I just realized, the streams have no shared methods. This mea...

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

I've put together over the past days an embryonic streaming interface. 
It separates transport from formatting, input from output, and buffered 
from unbuffered operation.

http://erdani.com/d/phobos/std_stream2.html

There are a number of questions interspersed. It would be great to start 
a discussion using that design as a baseline. Please voice any related 
thoughts - thanks!


Andrei

Dec 27 2010

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Tue, 28 Dec 2010 09:02:29 +0200, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 I've put together over the past days an embryonic streaming interface.  
 It separates transport from formatting, input from output, and buffered  
 from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to start  
 a discussion using that design as a baseline. Please voice any related  
 thoughts - thanks!

Here are my humble observations:

First of all: was ranges-like duck typing considered for streams? The  
language allows on-demand runtime polymorphism, and static typing allows  
compile-time detection of stream features for abstraction. Not sure how  
useful this is in practice, but it allows some optimizations (e.g. the
code can be really fast when working with memory streams, due to inlining  
and lack of vcalls).

Also, why should there be support for unopened streams? While a stream  
should be flush-able and close-able, opening and reopening streams should  
be done at a higher level IMO.

 Question: Should we offer an open primitive at this level? If so, what  
 parameter(s) should it take?

I don't see how this would be implemented at the lowest level, taking into  
consideration all the possible stream types (network connections, pipes,  
etc.)

 Question: Should we offer a primitive rewind that takes the stream back  
 to the beginning? That might be supported even by some streams that  
 don't support general seek calls. Alternatively, some streams might  
 support seek(0, SeekAnchor.start) but not other calls to seek.

If seek support is determined at runtime by whether the call throws an  
exception or not, then I see no difference in having a rewind method or  
having non-zero seek throw.

 Question: May we eliminate seekFromCurrent and seekFromEnd and just have  
 seek with absolute positioning? I don't know of streams that allow seek  
 without allowing tell. Even if some stream doesn't, it's easy to add  
 support for tell in a wrapper. The marginal cost of calling tell is  
 small enough compared to the cost of seek.

Does anyone ever use seekFromEnd in practice (except the rare case of  
supporting certain file formats)? seekFromCurrent is a nice convenience, but
every abstract method increases the burden for implementers.
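
(Editorial sketch, not part of the original post.) One way to keep the implementer's burden low is to require only absolute `seek`, `tell`, and a size query, and to write the relative seeks once as free functions. The interface name `Seekable`, the `size()` method, and the `MemoryStream` class below are assumptions made for illustration; they are not taken from the std.stream2 draft.

```d
// Hypothetical minimal interface: absolute seek, tell, and size only.
interface Seekable
{
    ulong tell();
    ulong size();
    void seek(ulong position); // absolute, measured from the start
}

// Apply a signed offset to an unsigned base without wrapping around.
private ulong offsetFrom(ulong base, long offset)
{
    if (offset >= 0)
        return base + offset;
    immutable back = cast(ulong)(-(offset + 1)) + 1; // |offset| without overflow
    if (back > base)
        throw new Exception("seek before start of stream");
    return base - back;
}

// Relative seeks written once, on top of the primitives.
void seekFromCurrent(Seekable s, long offset)
{
    s.seek(offsetFrom(s.tell(), offset));
}

void seekFromEnd(Seekable s, long offset)
{
    s.seek(offsetFrom(s.size(), offset));
}

// Hypothetical in-memory stream used only to exercise the helpers.
class MemoryStream : Seekable
{
    private ubyte[] data;
    private ulong pos;

    this(ubyte[] data) { this.data = data; }

    ulong tell() { return pos; }
    ulong size() { return data.length; }

    void seek(ulong position)
    {
        if (position > data.length)
            throw new Exception("seek past end of stream");
        pos = position;
    }
}

void main()
{
    auto s = new MemoryStream(new ubyte[100]);
    s.seek(10);
    seekFromCurrent(s, 5);
    assert(s.tell() == 15);
    seekFromEnd(s, -20);
    assert(s.tell() == 80);
}
```

With this split, a new stream type implements three methods and gets the relative variants for free, at the cost of one extra `tell` or `size` call per relative seek.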

 Buffered*Transport

I always thought that a perfect stream library would have buffering as an  
additional layer. For example: auto f = new Buffered!FileStream(...);
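
(Editorial sketch, not part of the original post.) Buffering as a separate layer could look roughly like the template below. It is a struct rather than the class implied by `new Buffered!FileStream(...)`, and `MemorySink`, the `write`/`flush` signatures, and the `capacity` parameter are all assumptions made for the example.

```d
import std.algorithm : min;

// Hypothetical unbuffered sink; it counts how many writes reach it.
struct MemorySink
{
    ubyte[] data;
    size_t writes;

    void write(const(ubyte)[] bytes)
    {
        data ~= bytes;
        ++writes;
    }
}

// Buffering layer: wraps any type that offers write(const(ubyte)[]).
struct Buffered(Sink, size_t capacity = 4096)
{
    Sink sink;
    private ubyte[capacity] buf;
    private size_t used;

    void write(const(ubyte)[] bytes)
    {
        while (bytes.length)
        {
            if (used == capacity)
                flush();
            immutable n = min(bytes.length, capacity - used);
            buf[used .. used + n] = bytes[0 .. n];
            used += n;
            bytes = bytes[n .. $];
        }
    }

    // Push whatever is buffered down to the wrapped sink.
    void flush()
    {
        if (used)
        {
            sink.write(buf[0 .. used]);
            used = 0;
        }
    }
}

void main()
{
    Buffered!(MemorySink, 8) f;
    foreach (i; 0 .. 20)
    {
        ubyte[1] one = [cast(ubyte) i];
        f.write(one[]); // one byte at a time
    }
    f.flush();
    assert(f.sink.data.length == 20);
    assert(f.sink.writes == 3); // chunks of 8, 8 and 4 bytes
}
```

Because the sink type is a template parameter, calls into it are resolved at compile time; a dynamically polymorphic variant would take an interface reference instead.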

 abstract interface Formatter;

I'm really not sure about this interface. I can see at most three  
implementations of it (native, big-endian and little-endian variants),
everything else being too obscure to count. I think it should be  
implemented as static structs instead. Also, having an abstract method for  
each native type is quite ugly for D standards, I'm sure there's a better  
solution.

 Question: Should all formatters require buffered transport? Otherwise  
 they might need to keep their own buffering, which ends up being less  
 efficient with buffered transports.

Ideally buffering would be optional, and constructing a buffer-enabled  
stream should be so easy it'd be an easily adoptable habit (see above).  

3-4 classes before I could read from a file. D can do better.

 Question: Should we also define putln that writes the string and then an  
 a line terminator?

But then you're mixing together text and binary streams into the same  
interface. I don't think this is a good idea.

 Question: Should we define a more involved protocol?

"A more involved protocol" would really be proper serialization. Calling  
toString can work as a convenience, similar to writefln's behavior.

 This final function writes a customizable "header" and a customizable  
 "footer".

What is the purpose of this? TypeInfo doesn't contain the field names, so  
it can't be used for protobuf-like serialization. Compile-time reflection  
would be much more useful.

 Question: Should we pass the size in advance, or make the stream  
 responsible for inferring it?

Code that needs to handle allocation itself can make the small effort of  
writing the lengths as well. A possible solution is to make string length  
encoding part of the interface specification, then the user can read the  
length and the contents separately themselves.

 Question: How to handle associative arrays?

Not a problem with static polymorphism.

-- 
Best regards,
  Vladimir                            mailto:vladimir thecybershadow.net

Dec 28 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/28/10 5:09 AM, Vladimir Panteleev wrote:
 On Tue, 28 Dec 2010 09:02:29 +0200, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 I've put together over the past days an embryonic streaming interface.
 It separates transport from formatting, input from output, and
 buffered from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to
 start a discussion using that design as a baseline. Please voice any
 related thoughts - thanks!

 Here are my humble observations:

 First of all: was ranges-like duck typing considered for streams? The
 language allows on-demand runtime polymorphism, and static typing allows
 compile-time detection of stream features for abstraction. Not sure how
 useful this is in practice, but it allows some optimizations (e.g. the
 code can be really fast when working with memory streams, due to
 inlining and lack of vcalls).

I think static polymorphism is great for ranges, which have fine 
granularity, but not for streams, which have coarse granularity. One 
read/write operation on a stream is likely to do enough work for the 
dynamic dispatch overhead to not matter.

 Also, why should there be support for unopened streams? While a stream
 should be flush-able and close-able, opening and reopening streams
 should be done at a higher level IMO.

OK.

 Question: Should we offer an open primitive at this level? If so, what
 parameter(s) should it take?

 I don't see how this would be implemented at the lowest level, taking
 into consideration all the possible stream types (network connections,
 pipes, etc.)

It could take a Variant.

 Question: Should we offer a primitive rewind that takes the stream
 back to the beginning? That might be supported even by some streams
 that don't support general seek calls. Alternatively, some streams
 might support seek(0, SeekAnchor.start) but not other calls to seek.

 If seek support is determined at runtime by whether the call throws an
 exception or not, then I see no difference in having a rewind method or
 having non-zero seek throw.

 Question: May we eliminate seekFromCurrent and seekFromEnd and just
 have seek with absolute positioning? I don't know of streams that
 allow seek without allowing tell. Even if some stream doesn't, it's
 easy to add support for tell in a wrapper. The marginal cost of
 calling tell is small enough compared to the cost of seek.

 Does anyone ever use seekFromEnd in practice (except the rare case of
 supporting certain file formats)? seekFromCurrent is a nice convenience,
 but every abstract method increases the burden for implementers.

 Buffered*Transport

 I always thought that a perfect stream library would have buffering as
 an additional layer. For example: auto f = new Buffered!FileStream(...);

So Buffered would be a template? Cool idea. Let me think of it a bit more.

 abstract interface Formatter;

 I'm really not sure about this interface. I can see at most three
 implementations of it (native, big-endian and little-endian variants),
 everything else being too obscure to count. I think it should be
 implemented as static structs instead. Also, having an abstract method
 for each native type is quite ugly for D standards, I'm sure there's a
 better solution.

Nonono. Perhaps I chose the wrong name, but Formatter is really anything 
that takes typed data and encodes it in raw bytes suitable for 
transporting. That includes e.g. json, csv, and also a variety of binary 
formats.

 Question: Should all formatters require buffered transport? Otherwise
 they might need to keep their own buffering, which ends up being less
 efficient with buffered transports.

 Ideally buffering would be optional, and constructing a buffer-enabled
 stream should be so easy it'd be an easily adoptable habit (see above).

 3-4 classes before I could read from a file. D can do better.

 Question: Should we also define putln that writes the string and then
 a line terminator?

 But then you're mixing together text and binary streams into the same
 interface. I don't think this is a good idea.

 Question: Should we define a more involved protocol?

 "A more involved protocol" would really be proper serialization. Calling
 toString can work as a convenience, similar to writefln's behavior.

 This final function writes a customizable "header" and a customizable
 "footer".

 What is the purpose of this? TypeInfo doesn't contain the field names,
 so it can't be used for protobuf-like serialization. Compile-time
 reflection would be much more useful.
 Question: Should we pass the size in advance, or make the stream
 responsible for inferring it?

 Code that needs to handle allocation itself can make the small effort of
 writing the lengths as well. A possible solution is to make string
 length encoding part of the interface specification, then the user can
 read the length and the contents separately themselves.

 Question: How to handle associative arrays?

 Not a problem with static polymorphism.

Yah, but that precludes dynamic polymorphism...


Andrei

Dec 28 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-12-28 11:09:01 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 On 12/28/10 5:09 AM, Vladimir Panteleev wrote:
 First of all: was ranges-like duck typing considered for streams? The
 language allows on-demand runtime polymorphism, and static typing allows
 compile-time detection of stream features for abstraction. Not sure how
 useful this is in practice, but it allows some optimizations (e.g. the
 code can be really fast when working with memory streams, due to
 inlining and lack of vcalls).

 
 I think static polymorphism is great for ranges, which have fine 
 granularity, but not for streams, which have coarse granularity. One 
 read/write operation on a stream is likely to do enough work for the 
 dynamic dispatch overhead to not matter.

You're assuming streams will deal with I/O operations. What if I have a 
pipe between two processes on the same machine? What if I'm serializing 
an object before passing it to another thread? What if I just want to 
calculate the checksum for a serialized object without writing it 
anywhere? Should I create my own stream system for these cases?

As for fine/coarse granularity, that's somewhat true when the stream is 
buffered before the virtual calls, but do you realize that using 
Formatter to output bytes can easily result in two virtual calls per 
byte for calling mostly trivial functions that could easily be inlined 
otherwise? First virtual call: Formatter.put(byte), which then calls 
UnbufferedOutputTransport.write(byte[1]).

If you restrict virtual calls so they only happen when flushing the 
buffer, then you have a coarse granularity, as you're passing many-byte 
buffers through those virtual functions. But otherwise it's quite 
wasteful.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Dec 28 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/28/10 11:34 AM, Michel Fortin wrote:
 On 2010-12-28 11:09:01 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 On 12/28/10 5:09 AM, Vladimir Panteleev wrote:
 First of all: was ranges-like duck typing considered for streams? The
 language allows on-demand runtime polymorphism, and static typing allows
 compile-time detection of stream features for abstraction. Not sure how
 useful this is in practice, but it allows some optimizations (e.g. the
 code can be really fast when working with memory streams, due to
 inlining and lack of vcalls).

 I think static polymorphism is great for ranges, which have fine
 granularity, but not for streams, which have coarse granularity. One
 read/write operation on a stream is likely to do enough work for the
 dynamic dispatch overhead to not matter.

 You're assuming streams will deal with I/O operations. What if I have a
 pipe between two processes on the same machine?

That's solid work all right.

 What if I'm serializing
 an object before passing it to another thread?

That too.

 What if I just want to
 calculate the checksum for a serialized object without writing it
 anywhere? Should I create my own stream system for these cases?

I'd guess so. I'm not getting your drift.

 As for fine/coarse granularity, that's somewhat true when the stream is
 buffered before the virtual calls, but do you realize that using
 Formatter to output bytes can easily result in two virtual calls per
 byte for calling mostly trivial functions that could easily be inlined
 otherwise? First virtual call: Formatter.put(byte), which then calls
 UnbufferedOutputTransport.write(byte[1]).

Ideas on how to mitigate that?


Andrei

Dec 28 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-12-28 12:47:26 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 On 12/28/10 11:34 AM, Michel Fortin wrote:
 As for fine/coarse granularity, that's somewhat true when the stream is
 buffered before the virtual calls, but do you realize that using
 Formatter to output bytes can easily result in two virtual calls per
 byte for calling mostly trivial functions that could easily be inlined
 otherwise? First virtual call: Formatter.put(byte), which then calls
 UnbufferedOutputTransport.write(byte[1]).

 
 Ideas on how to mitigate that?

Well, theoretically, adding a buffer between us and the stream should 
allow us to play with our buffer and flush only when we have a big 
chunk of data, making the virtual call overhead irrelevant. But for 
this to work, we need to manipulate the buffer free of virtual calls; 
this doesn't really work with BufferedOutputTransport as an interface.

One way you could achieve this is by making BufferedOutputTransport an 
abstract class that implements UnbufferedOutputTransport's write method 
as a final function and leaves abstract (and virtual) only the buffer 
flushing method (and other things related to the underlying stream). 
This will make BufferedOutputTransport's implementation of buffering 
hard to change, but how many buffer implementations do we really need?

So this should eliminate about half of the virtual calls, provided 
Formatter knows at compile time it is speaking to a buffered stream. As 
for the other half, when calling Formatter's functions, see my earlier 
post.
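
(Editorial sketch, not part of the original post.) The abstract-class arrangement described above might look like this in D. The class name follows the thread, but `flushImpl`, the constructor, and `CountingTransport` are assumptions invented for the example; the real draft may differ.

```d
// Sketch only: buffering is fixed in the base class, flushing is virtual.
abstract class BufferedOutputTransport
{
    private ubyte[] buf;
    private size_t used;

    this(size_t capacity = 4096)
    {
        assert(capacity > 0);
        buf = new ubyte[capacity];
    }

    // Final: never overridden, so it can be called directly and inlined.
    final void write(const(ubyte)[] bytes)
    {
        foreach (b; bytes)
        {
            if (used == buf.length)
                flush();
            buf[used++] = b;
        }
    }

    final void flush()
    {
        if (used)
        {
            flushImpl(buf[0 .. used]); // the only virtual call
            used = 0;
        }
    }

    // Implemented by each concrete transport (file, socket, memory, ...).
    protected abstract void flushImpl(const(ubyte)[] chunk);
}

// Hypothetical transport that just counts how often it is reached.
class CountingTransport : BufferedOutputTransport
{
    size_t virtualCalls;
    size_t total;

    this(size_t capacity) { super(capacity); }

    protected override void flushImpl(const(ubyte)[] chunk)
    {
        ++virtualCalls;
        total += chunk.length;
    }
}

void main()
{
    auto t = new CountingTransport(1024);
    ubyte[1] one = [42];
    foreach (i; 0 .. 3000)
        t.write(one[]); // 3000 one-byte writes
    t.flush();
    assert(t.total == 3000);
    assert(t.virtualCalls == 3); // 1024 + 1024 + 952 bytes
}
```

Here 3000 single-byte writes cost three virtual calls instead of 3000, which is the coarse granularity the post is after. The trade-off is the one already noted: the buffering strategy is baked into the base class.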


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Dec 28 2010

Sean Kelly <sean invisibleduck.org> writes:

Andrei Alexandrescu Wrote:

 On 12/28/10 5:09 AM, Vladimir Panteleev wrote:
 abstract interface Formatter;

 I'm really not sure about this interface. I can see at most three
 implementations of it (native, big-endian and little-endian variants),
 everything else being too obscure to count. I think it should be
 implemented as static structs instead. Also, having an abstract method
 for each native type is quite ugly for D standards, I'm sure there's a
 better solution.

 
 Nonono. Perhaps I chose the wrong name, but Formatter is really anything 
 that takes typed data and encodes it in raw bytes suitable for 
 transporting. That includes e.g. json, csv, and also a variety of binary 
 formats.

This one is really difficult to get right.  JSON, for example, has named
members of its object type.  How could the name of a field be communicated to
the formatter?  The best I was able to do with C++ iostreams was to create an
abstract formatter class that knew about the types I needed to format and have
protocol-specific derived classes do the work.  Here's some of the dispatching
code:

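    // Look up the printer attached to this stream, creating one on first use.
    printer* get_printer( std::ios_base& str )
    {
        // pword() gives access to a per-stream void* slot for the given index.
        void*& ptr = str.pword( printer::stream_index() );

        if( ptr == NULL )
        {
            // The callback is notified of stream events (e.g. erase, copyfmt).
            str.register_callback( &printer_callback, printer::stream_index() );
            ptr = new xml_printer();
        }
        return static_cast<printer*>( ptr );
    }

    // Route formatting of a message_header through the stream's printer.
    std::ostream& operator<<( std::ostream& os, const message_header& val )
    {
        printer* ptr = get_printer( os );
        return (*ptr)( os, val );
    }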

Actually using this code to write data to a stream looks great:

    ostr << header << someobj << anotherobj << end_msg;

but I'm not happy about how much specialized underlying code needs to exist.

I guess what I'm saying is that a generic formatter may be great for simple
formats like zip streams, CSV files, etc, but not so much for more structured
output.  That may be a sufficient goal for std.stream2, but if so I'd remove
JSON from your list of possible output formats :-)

Dec 28 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/28/10 11:54 AM, Sean Kelly wrote:
 Andrei Alexandrescu Wrote:

 On 12/28/10 5:09 AM, Vladimir Panteleev wrote:
 abstract interface Formatter;

 I'm really not sure about this interface. I can see at most three
 implementations of it (native, big-endian and little-endian variants),
 everything else being too obscure to count. I think it should be
 implemented as static structs instead. Also, having an abstract method
 for each native type is quite ugly for D standards, I'm sure there's a
 better solution.

 Nonono. Perhaps I chose the wrong name, but Formatter is really anything
 that takes typed data and encodes it in raw bytes suitable for
 transporting. That includes e.g. json, csv, and also a variety of binary
 formats.

 This one is really difficult to get right. JSON, for example, has
 named members of its object type. How could the name of a field be
 communicated to the formatter? The best I was able to do with C++
 iostreams was to create an abstract formatter class that knew about the
 types I needed to format and have protocol-specific derived classes do
 the work. Here's some of the dispatching code:

      printer* get_printer( std::ios_base& str )
      {
          void*& ptr = str.pword( printer::stream_index() );

          if( ptr == NULL )
          {
              str.register_callback( &printer_callback, printer::stream_index() );
              ptr = new xml_printer();
          }
          return static_cast<printer*>( ptr );
      }

      std::ostream& operator<<( std::ostream& os, const message_header& val )
      {
          printer* ptr = get_printer( os );
          return (*ptr)( os, val );
      }

 Actually using this code to write data to a stream looks great:

      ostr << header << someobj << anotherobj << end_msg;

 but I'm not happy about how much specialized underlying code needs to exist.

 I guess what I'm saying is that a generic formatter may be great for
 simple formats like zip streams, CSV files, etc, but not so much for
 more structured output.  That may be a sufficient goal for
 std.stream2, but if so I'd remove JSON from your list of possible
 output formats :-)

I agree with the spirit. In brief, I think it's fine to have a Json 
formatter as long as data is provided to it as Json-friendly types 
(ints, strings, arrays, associative arrays). In other words, I need to 
simplify the interface to not attempt to format class and struct types - 
only built-in types.
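
(Editorial sketch, not part of the original post.) A formatter restricted to JSON-friendly built-in types could be approximated as below. The function names `toJson` and `quote` are hypothetical, keys are sorted only to make the output deterministic, and the escaping is deliberately minimal (quotes and backslashes only, no control characters).

```d
import std.algorithm : sort;
import std.conv : to;
import std.traits : isArray, isAssociativeArray, isIntegral, isSomeString, KeyType;

// Wrap a string in quotes, escaping only '"' and '\' (a simplification).
private string quote(string s)
{
    string r = "\"";
    foreach (char c; s)
    {
        if (c == '"' || c == '\\')
            r ~= '\\';
        r ~= c;
    }
    return r ~ "\"";
}

// Accepts bools, integers, strings, arrays, and string-keyed AAs;
// anything else (classes, structs) is rejected at compile time.
string toJson(T)(T value)
{
    static if (is(T == bool))
        return value ? "true" : "false";
    else static if (isIntegral!T)
        return to!string(value);
    else static if (isSomeString!T)
        return quote(to!string(value));
    else static if (isAssociativeArray!T)
    {
        static assert(isSomeString!(KeyType!T), "JSON object keys must be strings");
        auto keys = value.keys;
        sort(keys); // deterministic member order
        string r = "{";
        foreach (i, k; keys)
        {
            if (i) r ~= ",";
            r ~= quote(to!string(k)) ~ ":" ~ toJson(value[k]);
        }
        return r ~ "}";
    }
    else static if (isArray!T)
    {
        string r = "[";
        foreach (i, e; value)
        {
            if (i) r ~= ",";
            r ~= toJson(e);
        }
        return r ~ "]";
    }
    else
        static assert(0, "not a JSON-friendly built-in type: " ~ T.stringof);
}

void main()
{
    int[][string] table = ["b": [1, 2], "a": [3]];
    assert(toJson(table) == `{"a":[3],"b":[1,2]}`);
    assert(toJson(`say "hi"`) == `"say \"hi\""`);
}
```

This version is statically dispatched for brevity; an interface-based Formatter would expose one method per supported built-in type instead.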


Andrei

Dec 28 2010

"Robert Jacques" <sandford jhu.edu> writes:

On Tue, 28 Dec 2010 23:34:42 -0700, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 I agree with the spirit. In brief, I think it's fine to have a Json  
 formatter as long as data is provided to it as Json-friendly types  
 (ints, strings, arrays, associative arrays). In other words, I need to  
 simplify the interface to not attempt to format class and struct types -  
 only built-in types.

By the way, JSON doesn't support associative arrays in general. It only  
supports AA in the sense that JSON objects are an array of string:value  
pairs.

Dec 28 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/29/10 1:37 AM, Robert Jacques wrote:
 On Tue, 28 Dec 2010 23:34:42 -0700, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 I agree with the spirit. In brief, I think it's fine to have a Json
 formatter as long as data is provided to it as Json-friendly types
 (ints, strings, arrays, associative arrays). In other words, I need to
 simplify the interface to not attempt to format class and struct types
 - only built-in types.

 By the way, JSON doesn't support associative arrays in general. It only
 supports AA in the sense that JSON objects are an array of string:value
 pairs.

Yah, I meant AAs keyed on string types and with values that in turn are 
JSON-friendly.

Andrei

Dec 29 2010

Sean Kelly <sean invisibleduck.org> writes:

Robert Jacques Wrote:
 
 By the way, JSON doesn't support associative arrays in general. It only  
 supports AA in the sense that JSON objects are an array of string:value  
 pairs.

Or something like this:

[{"key":123,"val":"foo"},{"key":456,"val":"bar"}]

Dec 29 2010

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Tue, 28 Dec 2010 18:09:01 +0200, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Nonono. Perhaps I chose the wrong name, but Formatter is really anything  
 that takes typed data and encodes it in raw bytes suitable for  
 transporting. That includes e.g. json, csv, and also a variety of binary  
 formats.

Ah, OK. For some reason I thought it was only for binary data. Still, I  
can't shake off the idea that having an interface method for each native  
type is not the perfect solution.

 Yah, but that precludes dynamic polymorphism...

Hmm. I seem to have somehow reached the conclusion that due to recent  
developments D has breached the barrier when we can easily wrap dynamic  
polymorphism around static polymorphism. Of course, that would require the  
compiler to know beforehand all the types used with the various templated  
methods when constructing the interface VMT, so it's back to another kind  
of static polymorphism. (Perhaps in a JIT-ed language, there would be no  
need for this distinction...) Well, ignore my crazed ramblings then :P
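
For a concrete case of wrapping dynamic polymorphism around static polymorphism, Phobos offers inputRangeObject in std.range. The sketch below erases two different static range types to the same run-time interface. As noted above, the element type still has to be fixed at compile time; the function name sum is only illustrative.

```d
import std.algorithm.iteration : map;
import std.range : InputRange, inputRangeObject, iota;

// Non-templated client code: it sees only the run-time interface.
int sum(InputRange!int r)
{
    int total = 0;
    for (; !r.empty; r.popFront())
        total += r.front;
    return total;
}

void main()
{
    // Two different static range types, both erased to InputRange!int.
    InputRange!int a = inputRangeObject(iota(1, 4));
    InputRange!int b = inputRangeObject(iota(1, 4).map!(x => x * 10));
    assert(sum(a) == 6);
    assert(sum(b) == 60);
}
```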

-- 
Best regards,
  Vladimir                            mailto:vladimir thecybershadow.net

Dec 28 2010

Daniel Gibson <metalcaedes gmail.com> writes:

Am 28.12.2010 08:02, schrieb Andrei Alexandrescu:
 I've put together over the past days an embryonic streaming interface.
 It separates transport from formatting, input from output, and buffered
 from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to start
 a discussion using that design as a baseline. Please voice any related
 thoughts - thanks!


 Andrei

I think I mostly like the proposal. I think it should be done this way, 
i.e. in OOP style, not with ranges-like duck-typing.
Here are my comments:


 Question: May we eliminate seekFromCurrent and  seekFromEnd and just
 have seek with absolute positioning? I don't know of streams that
 allow seek without allowing tell. Even if some stream doesn't, it's
 easy to add support for tell in a wrapper. The marginal cost of
 calling tell is small enough compared to the cost of  seek.

No, seekFromCurrent may be convenient e.g. in network streams (just skip 
some bytes), where other seek operations don't make that much sense.
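
For streams that do support tell, the wrapper mentioned in the question could look roughly like the sketch below. The Seekable interface and the Cursor class are hypothetical stand-ins, not the std.stream2 API. A network stream with no meaningful position could not implement tell this way, which is the case raised in the answer above.

```d
// Hypothetical minimal interface; the names mirror the discussion, but the
// real std.stream2 signatures may differ.
interface Seekable
{
    ulong tell();
    void seek(ulong position);
}

// Relative seek built from tell and absolute seek.
void seekFromCurrent(Seekable s, long offset)
{
    immutable current = s.tell();
    if (offset < 0)
    {
        immutable back = cast(ulong)(-offset);
        assert(back <= current, "cannot seek before start of stream");
        s.seek(current - back);
    }
    else
        s.seek(current + cast(ulong) offset);
}

// In-memory stand-in used only to exercise the helper.
class Cursor : Seekable
{
    private ulong pos;
    ulong tell() { return pos; }
    void seek(ulong position) { pos = position; }
}

void main()
{
    auto c = new Cursor;
    c.seek(10);
    seekFromCurrent(c, 5);
    assert(c.tell() == 15);
    seekFromCurrent(c, -12);
    assert(c.tell() == 3);
}
```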

 Question: Should this [close()] throw on an unopened stream?

No, close()ing a closed stream should just do nothing.



I'd like "void readFully(ubyte[] buffer)" which reads buffer.length 
bytes or throws an exception if that is not possible.
This would also fix busy-waiting (it'd block until buffer.length bytes 
are available).

Also "size_t read(void* buffer, size_t length)" (and the same for 
readFully()) would be nice, so one can read from the stream to buffers 
of arbitrary type without too much casting. This is probably especially 
handy when used with (data from) extern(C) functions and such.

Also, for convenience: "ubyte[] read(size_t length)" (does something 
like "return read(new ubyte[length]);")
and "ubyte[] readFully(size_t length)".



I'd like "void write(void *buffer, size_t length)" - for the same reason 
as read(void* buffer, size_t length).



 Question: Should all formatters require buffered transport? Otherwise
 they might need to keep their own buffering, which ends up being less
 efficient with buffered transports.

No, I don't think so. But readFully() would come in handy for that case.



Why is "abstract void put(Object obj);" here and not in Formatter?

*Please* provide not only "void read(ref <PRIMITIVE_TYPE> value)" but 
also "<PRIMITIVE_TYPE> read<TYPENAME>()". I found this design in the old 
std.stream quite annoying.

Just compare:
int i;
tr.read(i);
foo(i);
with:
foo(tr.readInt());

Or maybe even only this alternative (I don't think you gain anything by 
passing a reference to a variable to read() instead of just returning 
the variable).
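
If the interface ends up exposing only the ref-based form, the value-returning form could be layered on top with a single generic free function. This is a sketch under that assumption; the Unformatter class below is a hypothetical stand-in that returns canned values, and readValue is an invented name.

```d
// Hypothetical unformatter exposing only ref-based reads, as in old std.stream.
class Unformatter
{
    private int nextInt = 41;
    private double nextDouble = 1.5;

    void read(ref int value) { value = ++nextInt; }
    void read(ref double value) { value = nextDouble; }
}

// One generic free function gives the value-returning form for every
// overload of read.
T readValue(T)(Unformatter u)
{
    T value;
    u.read(value);
    return value;
}

void main()
{
    auto tr = new Unformatter;
    assert(tr.readValue!int() == 42);
    assert(tr.readValue!double() == 1.5);
}
```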


"abstract void read(ref char[] value);" etc:
 Question: Should we pass the size in advance, or make the stream
 responsible for inferring it?

Reading a string without knowing its length doesn't make much sense in 
95% of cases (assuming the other 5% read only fixed-length 
strings), so the user would have to make sure the length is prepended 
himself, so that he knows how long "value[]" is.
So it may make sense to write the string's length in front of the string 
itself, as a uint or ulong or something (but *not* a size_t as in the old 
std.stream, because that is not portable between i386 and amd64!).
Otherwise one could just use "abstract void read(ref void[] value, TypeInfo 
elementType);" instead.
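
A sketch of the length-prefix idea with a fixed-width uint, so the encoding is the same on 32-bit and 64-bit targets. The function names and the choice of little-endian byte order are assumptions made for illustration; nothing in the proposal fixes either.

```d
import std.bitmanip : littleEndianToNative, nativeToLittleEndian;

// Encodes a string as a 4-byte little-endian length followed by the bytes.
ubyte[] encodeString(const(char)[] s)
{
    assert(s.length <= uint.max, "string too long for a uint length prefix");
    ubyte[4] prefix = nativeToLittleEndian(cast(uint) s.length);
    ubyte[] result;
    result ~= prefix[];
    result ~= cast(const(ubyte)[]) s;
    return result;
}

// Decodes one length-prefixed string from the front of bytes.
const(char)[] decodeString(const(ubyte)[] bytes)
{
    ubyte[4] prefix = bytes[0 .. 4];
    immutable len = littleEndianToNative!uint(prefix);
    return cast(const(char)[]) bytes[4 .. 4 + len];
}

void main()
{
    auto encoded = encodeString("hi");
    assert(encoded.length == 6);          // 4-byte prefix + 2 payload bytes
    assert(decodeString(encoded) == "hi");
}
```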

Cheers,
- Daniel

Dec 28 2010

jovo <jovo home.com> writes:

Daniel Gibson Wrote:
  > Question: May we eliminate seekFromCurrent and  seekFromEnd and just
  > have seek with absolute positioning? I don't know of streams that
  > allow seek without allowing tell. Even if some stream doesn't, it's
  > easy to add support for tell in a wrapper. The marginal cost of
  > calling tell is small enough compared to the cost of  seek.
 
 No, seekFromCurrent may be convenient e.g. in network streams (just skip 
 some bytes), where other seek operations don't make that much sense.
 

Interfaces should abstract only common things. You can
always directly use concrete class that implements this.

Dec 28 2010

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 28.12.2010 16:08, Daniel Gibson wrote:
[snip]


 I'd like "void readFully(ubyte[] buffer)" which reads buffer.length 
 bytes or throws an exception if that is not possible
 This would also fix busy-waiting (it'd block until buffer.length bytes 
 are available).

 Also "size_t read(void* buffer, size_t length)" (and the same for 
 readFully()) would be nice, so one can read from the stream to buffers 
 of arbitrary type without too much casting. Is probably especially 
 handy when used with (data from) extern(C) functions and such.

 Also, for convenience: "ubyte[] read(size_t length)" (does something 
 like "return read(new ubyte[length]);"
 and "ubyte[] readFully(size_t length)"

This, I guess, would be provided by free functions in the same module, 
there is no point in requiring to implement them inside the stream itself.


 I'd like "void write(void *buffer, size_t length)" - for the same 
 reason as read(void* buffer, size_t length).

Ditto

Dec 30 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/30/10 3:59 PM, Dmitry Olshansky wrote:
 On 28.12.2010 16:08, Daniel Gibson wrote:
 [snip]


 I'd like "void readFully(ubyte[] buffer)" which reads buffer.length
 bytes or throws an exception if that is not possible
 This would also fix busy-waiting (it'd block until buffer.length bytes
 are available).

 Also "size_t read(void* buffer, size_t length)" (and the same for
 readFully()) would be nice, so one can read from the stream to buffers
 of arbitrary type without too much casting. Is probably especially
 handy when used with (data from) extern(C) functions and such.

 Also, for convenience: "ubyte[] read(size_t length)" (does something
 like "return read(new ubyte[length]);"
 and "ubyte[] readFully(size_t length)"

 This, I guess, would be provided by free functions in the same module,
 there is no point in requiring to implement them inside the stream itself.


 I'd like "void write(void *buffer, size_t length)" - for the same
 reason as read(void* buffer, size_t length).

 Ditto

What's wrong with void[]?

Andrei

Dec 30 2010

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 31.12.2010 1:17, Andrei Alexandrescu wrote:
 On 12/30/10 3:59 PM, Dmitry Olshansky wrote:
 On 28.12.2010 16:08, Daniel Gibson wrote:
 [snip]


 I'd like "void readFully(ubyte[] buffer)" which reads buffer.length
 bytes or throws an exception if that is not possible
 This would also fix busy-waiting (it'd block until buffer.length bytes
 are available).

 Also "size_t read(void* buffer, size_t length)" (and the same for
 readFully()) would be nice, so one can read from the stream to buffers
 of arbitrary type without too much casting. Is probably especially
 handy when used with (data from) extern(C) functions and such.

 Also, for convenience: "ubyte[] read(size_t length)" (does something
 like "return read(new ubyte[length]);"
 and "ubyte[] readFully(size_t length)"

 This, I guess, would be provided by free functions in the same module,
 there is no point in requiring to implement them inside the stream 
 itself.


 I'd like "void write(void *buffer, size_t length)" - for the same
 reason as read(void* buffer, size_t length).

 Ditto

 What's wrong with void[]?

 Andrei

Nothing, in fact I was replying to
---
I'd like "void readFully(ubyte[] buffer)" which reads buffer.length
bytes or throws an exception if that is not possible
This would also fix busy-waiting (it'd block until buffer.length bytes
are available).
[snip]
Also, for convenience: "ubyte[] read(size_t length)" (does something
like "return read(new ubyte[length]);"
and "ubyte[] readFully(size_t length)" ...
---
I should have made it clearer.

-- 
Dmitry Olshansky

Dec 30 2010

Daniel Gibson <metalcaedes gmail.com> writes:

Am 30.12.2010 23:17, schrieb Andrei Alexandrescu:
 On 12/30/10 3:59 PM, Dmitry Olshansky wrote:
 On 28.12.2010 16:08, Daniel Gibson wrote:
 [snip]


 I'd like "void readFully(ubyte[] buffer)" which reads buffer.length
 bytes or throws an exception if that is not possible
 This would also fix busy-waiting (it'd block until buffer.length bytes
 are available).

 Also "size_t read(void* buffer, size_t length)" (and the same for
 readFully()) would be nice, so one can read from the stream to buffers
 of arbitrary type without too much casting. Is probably especially
 handy when used with (data from) extern(C) functions and such.

 Also, for convenience: "ubyte[] read(size_t length)" (does something
 like "return read(new ubyte[length]);"
 and "ubyte[] readFully(size_t length)"

 This, I guess, would be provided by free functions in the same module,
 there is no point in requiring to implement them inside the stream
 itself.


 I'd like "void write(void *buffer, size_t length)" - for the same
 reason as read(void* buffer, size_t length).

 Ditto

 What's wrong with void[]?

 Andrei

For example:

void put(int i) {
	write(&i, int.sizeof);
}

is shorter and easier than

void put(int i) {
	void *tmp = cast(void*)(&i);
	void[] arr = tmp[0..int.sizeof];
	write(arr);
}

Cheers,
- Daniel

Dec 31 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 31 Dec 2010 03:28:12 -0500, Daniel Gibson <metalcaedes gmail.com>  
wrote:

 Am 30.12.2010 23:17, schrieb Andrei Alexandrescu:
 On 12/30/10 3:59 PM, Dmitry Olshansky wrote:
 On 28.12.2010 16:08, Daniel Gibson wrote:
 [snip]


 I'd like "void readFully(ubyte[] buffer)" which reads buffer.length
 bytes or throws an exception if that is not possible
 This would also fix busy-waiting (it'd block until buffer.length bytes
 are available).

 Also "size_t read(void* buffer, size_t length)" (and the same for
 readFully()) would be nice, so one can read from the stream to buffers
 of arbitrary type without too much casting. Is probably especially
 handy when used with (data from) extern(C) functions and such.

 Also, for convenience: "ubyte[] read(size_t length)" (does something
 like "return read(new ubyte[length]);"
 and "ubyte[] readFully(size_t length)"

 This, I guess, would be provided by free functions in the same module,
 there is no point in requiring to implement them inside the stream
 itself.


 I'd like "void write(void *buffer, size_t length)" - for the same
 reason as read(void* buffer, size_t length).

 Ditto

 What's wrong with void[]?

 Andrei

 For example:

 void put(int i) {
 	write(&i, int.sizeof);
 }

 is shorter and easier than

 void put(int i) {
 	void *tmp = cast(void*)(&i);
 	void[] arr = tmp[0..int.sizeof];
 	write(arr);
 }

This can be significantly shortened:

write((&i)[0..1]);

Remember, all arrays implicitly cast to void[], which is why you use it  
for input parameters.

-Steve

Dec 31 2010

so <so so.do> writes:

 This can be significantly shortened:

 write((&i)[0..1]);

Wow, i didn't know that!
Could you point me to doc, please?

Thanks.

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

Jan 01 2011

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sat, 01 Jan 2011 10:59:47 -0500, so <so so.do> wrote:

 This can be significantly shortened:

 write((&i)[0..1]);

 Wow, i didn't know that!
 Could you point me to doc, please?

 Thanks.

http://www.digitalmars.com/d/2.0/arrays.html#implicit-conversions

-Steve

Jan 03 2011

so <so so.do> writes:

On Mon, 03 Jan 2011 14:58:22 +0200, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Sat, 01 Jan 2011 10:59:47 -0500, so <so so.do> wrote:

 This can be significantly shortened:

 write((&i)[0..1]);

 Wow, i didn't know that!
 Could you point me to doc, please?

 Thanks.

 http://www.digitalmars.com/d/2.0/arrays.html#implicit-conversions

 -Steve

Thanks.
I know those 3, but they don't have much to do with your example, or most  
likely i didn't get it...

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

Jan 03 2011

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Mon, 03 Jan 2011 15:15:27 -0500, so <so so.do> wrote:

 On Mon, 03 Jan 2011 14:58:22 +0200, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 On Sat, 01 Jan 2011 10:59:47 -0500, so <so so.do> wrote:

 This can be significantly shortened:

 write((&i)[0..1]);

 Wow, i didn't know that!
 Could you point me to doc, please?

 Thanks.

 http://www.digitalmars.com/d/2.0/arrays.html#implicit-conversions

 -Steve

 Thanks.
 I know those 3, but they don't have much to do with your example, or  
 most likely i didn't get it...

The type of (&i)[0..1] is int[].  int[] is implicitly convertible to void[]  
per those rules, so there is no need to cast.

The original post implying that void[] would make things more difficult  
stated that with write taking a (void[]) argument instead of (void *,  
size_t length) you would have to write put like this:


void put(int i) {
	void *tmp = cast(void*)(&i);
	void[] arr = tmp[0..int.sizeof];
	write(arr);
}

But you do not need to do this, all you need is what I wrote (which is  
actually simpler I think than the void*, size_t function).

That was my point.  If there is something else you are looking for, maybe  
you can ask a different question?
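
A self-contained sketch of the point, with a hypothetical module-level write and sink standing in for a transport. Slicing the pointer gives an int[] of length 1, and the conversion to a void array yields a view whose length is counted in bytes.

```d
import std.stdio : writeln;

// Hypothetical sink standing in for a transport's write(void[]) method.
ubyte[] sink;

void write(const(void)[] data)
{
    sink ~= cast(const(ubyte)[]) data;
}

void put(int i)
{
    // &i is int*; slicing it yields an int[] of length 1, which converts
    // implicitly to const(void)[] spanning int.sizeof bytes.
    write((&i)[0 .. 1]);
}

void main()
{
    put(42);
    assert(sink.length == int.sizeof);
    writeln(sink.length);
}
```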

-Steve

Jan 03 2011

so <so so.do> writes:

 type of (&i)[0..1] is int[]

I see what the topic is all about.
The trouble is this syntax. You say it is int[], but i couldn't find  
anything in D reference that explains this.
Sorry if i am overlooking something.

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

Jan 03 2011

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Mon, 03 Jan 2011 15:33:10 -0500, so <so so.do> wrote:

 type of (&i)[0..1] is int[]

 I see what the topic is all about.
 The trouble is this syntax. You say it is int[], but i couldn't find  
 anything in D reference that explains this.
 Sorry if i am overlooking something.

Oh that:

http://www.digitalmars.com/d/2.0/arrays.html#slicing

quoted from there:


Slicing is not only handy for referring to parts of other arrays, but for  
converting pointers into bounds-checked arrays:

    int* p;
    int[] b = p[0..8];

The type of &i is int*, so there you go.

-Steve

Jan 03 2011

so <so so.do> writes:

On Mon, 03 Jan 2011 23:02:20 +0200, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Mon, 03 Jan 2011 15:33:10 -0500, so <so so.do> wrote:

 type of (&i)[0..1] is int[]

 I see what the topic is all about.
 The trouble is this syntax. You say it is int[], but i couldn't find  
 anything in D reference that explains this.
 Sorry if i am overlooking something.

 Oh that:

 http://www.digitalmars.com/d/2.0/arrays.html#slicing

 quoted from there:


 Slicing is not only handy for referring to parts of other arrays, but  
 for converting pointers into bounds-checked arrays:

     int* p;
     int[] b = p[0..8];

 The type of &i is int*, so there you go.

 -Steve

Thanks a ton!
Another small but very important feature.

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

Jan 03 2011

Daniel Gibson <metalcaedes gmail.com> writes:

Am 31.12.2010 15:43, schrieb Steven Schveighoffer:
 On Fri, 31 Dec 2010 03:28:12 -0500, Daniel Gibson
 <metalcaedes gmail.com> wrote:

 Am 30.12.2010 23:17, schrieb Andrei Alexandrescu:
 On 12/30/10 3:59 PM, Dmitry Olshansky wrote:
 On 28.12.2010 16:08, Daniel Gibson wrote:
 [snip]


 I'd like "void readFully(ubyte[] buffer)" which reads buffer.length
 bytes or throws an exception if that is not possible
 This would also fix busy-waiting (it'd block until buffer.length bytes
 are available).

 Also "size_t read(void* buffer, size_t length)" (and the same for
 readFully()) would be nice, so one can read from the stream to buffers
 of arbitrary type without too much casting. Is probably especially
 handy when used with (data from) extern(C) functions and such.

 Also, for convenience: "ubyte[] read(size_t length)" (does something
 like "return read(new ubyte[length]);"
 and "ubyte[] readFully(size_t length)"

 This, I guess, would be provided by free functions in the same module,
 there is no point in requiring to implement them inside the stream
 itself.


 I'd like "void write(void *buffer, size_t length)" - for the same
 reason as read(void* buffer, size_t length).

 Ditto

 What's wrong with void[]?

 Andrei

 For example:

 void put(int i) {
 write(&i, int.sizeof);
 }

 is shorter and easier than

 void put(int i) {
 void *tmp = cast(void*)(&i);
 void[] arr = tmp[0..int.sizeof];
 write(arr);
 }

 This can be significantly shortened:

 write((&i)[0..1]);

 Remember, all arrays implicitly cast to void[], which is why you use it
 for input parameters.

 -Steve

This is indeed a very cool trick :-)

Jan 03 2011

Daniel Gibson <metalcaedes gmail.com> writes:

Am 30.12.2010 22:59, schrieb Dmitry Olshansky:
 On 28.12.2010 16:08, Daniel Gibson wrote:
 [snip]


 I'd like "void readFully(ubyte[] buffer)" which reads buffer.length
 bytes or throws an exception if that is not possible
 This would also fix busy-waiting (it'd block until buffer.length bytes
 are available).

 Also "size_t read(void* buffer, size_t length)" (and the same for
 readFully()) would be nice, so one can read from the stream to buffers
 of arbitrary type without too much casting. Is probably especially
 handy when used with (data from) extern(C) functions and such.

 Also, for convenience: "ubyte[] read(size_t length)" (does something
 like "return read(new ubyte[length]);"
 and "ubyte[] readFully(size_t length)"

 This, I guess, would be provided by free functions in the same module,
 there is no point in requiring to implement them inside the stream itself.

Maybe for the convenience functions, but what about readFully()?
Could the stream not support a non-blocking read that reads up-to 
buffer.length bytes and a blocking read that blocks until buffer.length 
bytes are read?
If you want to read whole ints, floats, shorts, ... you need something 
like that anyway, because only one byte of an int doesn't help you at 
all. But because the stream may support something like this natively, it 
makes sense to have readFully() here and not in Unformatter.



 I'd like "void write(void *buffer, size_t length)" - for the same
 reason as read(void* buffer, size_t length).

 Ditto

Cheers,
- Daniel

Dec 31 2010

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 31.12.2010 11:35, Daniel Gibson wrote:
 Am 30.12.2010 22:59, schrieb Dmitry Olshansky:
 On 28.12.2010 16:08, Daniel Gibson wrote:
 [snip]


 I'd like "void readFully(ubyte[] buffer)" which reads buffer.length
 bytes or throws an exception if that is not possible
 This would also fix busy-waiting (it'd block until buffer.length bytes
 are available).

 Also "size_t read(void* buffer, size_t length)" (and the same for
 readFully()) would be nice, so one can read from the stream to buffers
 of arbitrary type without too much casting. Is probably especially
 handy when used with (data from) extern(C) functions and such.

 Also, for convenience: "ubyte[] read(size_t length)" (does something
 like "return read(new ubyte[length]);"
 and "ubyte[] readFully(size_t length)"

 This, I guess, would be provided by free functions in the same module,
 there is no point in requiring to implement them inside the stream 
 itself.

 Maybe for the convenience functions, but what about readFully()?
 Could the stream not support a non-blocking read that reads up-to 
 buffer.length bytes and a blocking read that blocks until 
 buffer.length bytes are read?

I meant something like this (assuming the call to t.read blocks):
// Reads exactly buf.length bytes, not counting any extra that might
// reside in the internal buffer.
// Signature changed to take the destination buffer, to prevent allocations.
// Assumes t.read returns the slice of dst it filled; .empty on arrays
// comes from std.range.
ubyte[] readFully(BufferedInputTransport t, ubyte[] buf)
{
    auto dst = buf[];
    while (!dst.empty) {
        auto res = t.read(dst, dst.length);
        dst = dst[res.length .. $];
    }
    return buf;
}
Also that would be pig slow without buffering. The internal 
implementation of BufferedTransport may use non-blocking IO to keep 
reasonable buffer fill rate.
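
A runnable variant of the same idea, as a sketch only. It assumes a hypothetical InputTransport whose read returns the number of bytes read (0 at end of stream), which may not match the proposed signatures. ChunkedSource is a test double that delivers short reads, and the loop throws instead of spinning forever when the stream ends early.

```d
import std.exception : enforce;

// Hypothetical transport interface; the real std.stream2 signatures may differ.
interface InputTransport
{
    // Reads up to buf.length bytes; returns how many were read (0 at end).
    size_t read(ubyte[] buf);
}

// Test double that hands out at most `chunk` bytes per call (short reads).
class ChunkedSource : InputTransport
{
    private const(ubyte)[] data;
    private size_t chunk;

    this(const(ubyte)[] data, size_t chunk)
    {
        this.data = data;
        this.chunk = chunk;
    }

    size_t read(ubyte[] buf)
    {
        import std.algorithm.comparison : min;
        immutable n = min(buf.length, chunk, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Free function: fills buf completely or throws if the stream ends first.
ubyte[] readFully(InputTransport t, ubyte[] buf)
{
    auto dst = buf;
    while (dst.length > 0)
    {
        immutable n = t.read(dst);
        enforce(n > 0, "stream ended before buffer was filled");
        dst = dst[n .. $];
    }
    return buf;
}

void main()
{
    auto src = new ChunkedSource([1, 2, 3, 4, 5], 2);
    auto buf = new ubyte[4];
    ubyte[] expected = [1, 2, 3, 4];
    assert(readFully(src, buf) == expected);
}
```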
 If you want to read whole ints, floats, shorts, ... you need something 
 like that anyway, because only one byte of an int doesn't help you at 
 all. But because the stream may support something like this natively, 
 it makes sense to have readFully() here and not in Unformatter.



 I'd like "void write(void *buffer, size_t length)" - for the same
 reason as read(void* buffer, size_t length).

 Ditto

 Cheers,
 - Daniel

-- 
Dmitry Olshansky

Dec 31 2010

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 12/28/10, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 http://erdani.com/d/phobos/std_stream2.html

What exactly is the difference between an interface and an abstract interface..?

Dec 28 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/28/10 9:45 AM, Andrej Mitrovic wrote:
 On 12/28/10, Andrei Alexandrescu<SeeWebsiteForEmail erdani.org>  wrote:
 http://erdani.com/d/phobos/std_stream2.html

 What exactly is the difference between an interface and an abstract
interface..?

Just an artifact of ddoc.

Andrei

Dec 28 2010

SHOO <zan77137 nifty.com> writes:

(2010/12/28 16:02), Andrei Alexandrescu wrote:
 I've put together over the past days an embryonic streaming interface.
 It separates transport from formatting, input from output, and buffered
 from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to start
 a discussion using that design as a baseline. Please voice any related
 thoughts - thanks!


 Andrei

I hope this discussion heats up. For Phobos, I/O is a very 
important component.


I have some doubts about this interface.

1.
It seems to be based on inheritance.
Compared with the current std.stream, what is the advantage?

2.
I think there are two advantages to having I/O in the Phobos 
_standard_ library: one for the person defining a device and one for 
the user of the device.

For the person defining an I/O device, the standard library provides a 
guideline for what interface to implement. By following this guideline, 
the device can be used with the various helpers in Phobos. This 
resembles the relationship between ranges and algorithms.
In this case, it is important that the definition is simple.
A range is very simple. It gets by with as few as three 
definitions (front, popFront, empty). Likewise, it is desirable for the 
base of the I/O interface to get by with a minimal definition.
However, TransportBase needs more definitions.
Can't you offer a simpler interface, as duck typing would allow?

For users of devices, on the other hand, the advantage is a unified 
interface and set of helpers: any device can be handled in the same 
way, whatever it is.
On this point I think Input/OutputRange is good (like 
Unformatter/Formatter). Users can apply the various algorithms by 
turning a device into a range.

3.
Formatter has writef, but I think this is unnecessary.
Because the destination is binary data, writef, which writes text 
data, should be a function of TextFormatter. The same goes for readf 
in Unformatter.

Thanks.

--
SHOO

Dec 28 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/28/10 10:57 AM, SHOO wrote:
 (2010/12/28 16:02), Andrei Alexandrescu wrote:
 I've put together over the past days an embryonic streaming interface.
 It separates transport from formatting, input from output, and buffered
 from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to start
 a discussion using that design as a baseline. Please voice any related
 thoughts - thanks!


 Andrei

 I hope this discussion heats up. For Phobos, I/O is a very
 important component.


 I have some doubts about this interface.

 1.
 It seems to be based on inheritance.
 Compared with the current std.stream, what is the advantage?

With a dynamically polymorphic interface, client code need not be 
templated in order to accommodate any implementation of the interface. 
Also, there is more opportunity for layering interface implementations 
during run time.

This argument used to be stronger in e.g. C++ because defining a 
template function was noisier than defining a regular one. I'm glad to 
see that this particular problem is not as acute today in D.

 2.
 I think there are two advantages to having I/O in the Phobos
 _standard_ library: one for the person defining a device and one for
 the user of the device.

 For the person defining an I/O device, the standard library provides a
 guideline for what interface to implement. By following this guideline,
 the device can be used with the various helpers in Phobos. This
 resembles the relationship between ranges and algorithms.
 In this case, it is important that the definition is simple.
 A range is very simple. It gets by with as few as three
 definitions (front, popFront, empty). Likewise, it is desirable for the
 base of the I/O interface to get by with a minimal definition.
 However, TransportBase needs more definitions.
 Can't you offer a simpler interface, as duck typing would allow?

I guess we can, but then let's not forget that to many people 
implementing interfaces is a well-learned lesson.

 For users of devices, on the other hand, the advantage is a unified
 interface and set of helpers: any device can be handled in the same
 way, whatever it is.
 On this point I think Input/OutputRange is good (like
 Unformatter/Formatter). Users can apply the various algorithms by
 turning a device into a range.

Right.

 3.
 Formatter has writef, but I think this is unnecessary.
 Because the destination is binary data, writef, which writes text
 data, should be a function of TextFormatter. The same goes for readf
 in Unformatter.

Destination may be text or binary data.


Andrei

Dec 28 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Wed, 29 Dec 2010 01:01:09 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 12/28/10 10:57 AM, SHOO wrote:
 (2010/12/28 16:02), Andrei Alexandrescu wrote:
 I've put together over the past days an embryonic streaming interface.
 It separates transport from formatting, input from output, and buffered
 from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to  
 start
 a discussion using that design as a baseline. Please voice any related
 thoughts - thanks!


 Andrei

 I hope this discussion heats up. For Phobos, I/O is a very
 important component.


 I have some doubts about this interface.

 1.
 It seems to be based on inheritance.
 Compared with the current std.stream, what is the advantage?

 With a dynamically polymorphic interface, client code need not be  
 templated in order to accommodate any implementation of the interface.  
 Also, there is more opportunity for layering interface implementations  
 during run time.

Consider this scenario:

stdout is currently implemented via C's FILE * to allow interleaving of C  
output and D output.  However, FILE * has some limitations that may hinder  
performance.  If you don't care about interleaving C and D I/O, you could  
replace stdout with a D-based output stream to achieve higher  
performance.  But this is only possible if stdout is *runtime* switchable,  
which means both the C-based stdout and the D-based stdout have a common  
base and implement polymorphism.

I think the right call in I/O is to use interfaces/classes and not  
compile-time interfaces.
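
A sketch of the run-time switch described in the scenario above. OutputStream, CStdoutStream, MemoryStream and greet are hypothetical names, not Phobos types; the second class collects text in memory as a stand-in for a D-native stream. The client function is not templated, so the stream behind it can be replaced while the program runs.

```d
// Hypothetical minimal output interface; names are illustrative only.
interface OutputStream
{
    void write(const(char)[] text);
}

// Stand-in for a stream backed by C's FILE*.
class CStdoutStream : OutputStream
{
    void write(const(char)[] text)
    {
        import core.stdc.stdio : fwrite, stdout;
        fwrite(text.ptr, 1, text.length, stdout);
    }
}

// Stand-in for a D-native stream; here it just collects text in memory.
class MemoryStream : OutputStream
{
    char[] contents;
    void write(const(char)[] text) { contents ~= text; }
}

// Client code is not templated; it sees only the interface.
void greet(OutputStream os)
{
    os.write("hello\n");
}

void main()
{
    OutputStream output = new CStdoutStream;
    greet(output);          // goes through C stdio

    auto mem = new MemoryStream;
    output = mem;           // swapped at run time, greet is unchanged
    greet(output);
    assert(mem.contents == "hello\n");
}
```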

-Steve

Dec 29 2010

spir <denis.spir gmail.com> writes:

On Wed, 29 Dec 2010 11:02:23 -0500
"Steven Schveighoffer" <schveiguy yahoo.com> wrote:

 stdout is currently implemented via C's FILE * to allow interleaving of C
 output and D output.  However, FILE * has some limitations that may hinder
 performance.  If you don't care about interleaving C and D I/O, you could
 replace stdout with a D-based output stream to achieve higher
 performance.  But this is only possible if stdout is *runtime* switchable,
 which means both the C-based stdout and the D-based stdout have a common
 base and implement polymorphism.

 I think the right call in I/O is to use interfaces/classes and not
 compile-time interfaces.

+++ (IIUC) What about logging? Or even output to several 'streams' in parallel?
One can use an external lib's output functionality and redirect it simply by
reassigning stdout. How are such features currently written using D?

Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com

Dec 29 2010

SHOO <zan77137 nifty.com> writes:

(2010/12/29 15:01), Andrei Alexandrescu wrote:
 1.
 It seems to be based on inheritance.
 Compared with the current std.stream, what is the advantage?

 With a dynamically polymorphic interface, client code need not be
 templated in order to accommodate any implementation of the interface.
 Also, there is more opportunity for layering interface implementations
 during run time.

 This argument used to be stronger in e.g. C++ because defining a
 template function was noisier than defining a regular one. I'm glad to
 see that this particular problem is not as acute today in D.

I think an interface is a good fit for streams. I/O is basically a 
very time-consuming operation, so I think function call overhead can 
be ignored because it is much smaller than the cost of the I/O itself.
Since inheritance has considerable advantages, I have no objection 
to it.


 2.
 I think there are two advantages to having I/O in the Phobos
 _standard_ library: one for the person defining a device and one for
 the user of the device.

 For the person defining an I/O device, the standard library provides a
 guideline for what interface to implement. By following this guideline,
 the device can be used with the various helpers in Phobos. This
 resembles the relationship between ranges and algorithms.
 In this case, it is important that the definition is simple.
 A range is very simple. It gets by with as few as three
 definitions (front, popFront, empty). Likewise, it is desirable for the
 base of the I/O interface to get by with a minimal definition.
 However, TransportBase needs more definitions.
 Can't you offer a simpler interface, as duck typing would allow?

 I guess we can, but then let's not forget that to many people
 implementing interfaces is a well-learned lesson.

Using an interface is no problem as long as I can define a device 
easily.
The current Transport requires many definitions. For example, even if a 
device cannot seek, the implementer still has to consider seek. Since seek and 
buffering are optional, I would like an interface where they need not be 
defined or considered by default.

 3.
 Formatter has writef, but I think this is unnecessary.
 Because the destination is binary data, writef, which writes text data,
 should become a function of TextFormatter. And I can say a similar
 thing about readf of Unformatter.

 Destination may be text or binary data.

Is Formatter one of several interfaces for users of the devices? Or is 
Formatter aimed at replacing all the Range-based interfaces, such 
as ByLine and ByChunk?
I think it would be better to make different interfaces for different 
things.
For formatted text, I think Formatter is useful. For 
binary data, however, I think another interface is necessary.

Dec 30 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-12-28 02:02:29 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 I've put together over the past days an embryonic streaming interface. 
 It separates transport from formatting, input from output, and buffered 
 from unbuffered operation.
 
 http://erdani.com/d/phobos/std_stream2.html
 
 There are a number of questions interspersed. It would be great to 
 start a discussion using that design as a baseline. Please voice any 
 related thoughts - thanks!

One of my concerns is the number of virtual calls required in actual 
usage, because virtual calls prevent inlining. I know it's necessary to 
have virtual calls in the formatter to serialize objects (which 
requires double dispatch), but in your design the underlying transport 
layer too wants to be called virtually. How many virtual calls will be 
necessary to serialize an array of 10 objects, each having 10 fields? 
Let's see:

	  10 calls to Formatter.put(Object)
	+ 10 calls to Object.toString(Formatter)
	+ 10 objects * 10 calls per object to Formatter.put(<some field type>)
	+ 10 objects * 10 calls per object to UnbufferedOutputTransport.write(in ubyte[])

Total: 220 virtual calls, for 10 objects with 10 fields each. Most of 
the functions called virtually here are pretty trivial and would 
normally be inlined if the context allowed it. Assuming those fields 
are 4-byte integers and are stored as is in the stream, the result will 
be between 400 and 500 bytes long once we add the object's class name. 
We end up having almost 1 virtual call for every two bytes of emitted 
data; is this overhead really acceptable? How much inlining does it 
prevent?
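
The tally can be checked mechanically. The sketch below only restates the numbers from this post as compile-time constants; the constant names are made up for the illustration.

```d
// Figures from the post: 10 objects, 10 fields each, 4-byte fields.
enum objects = 10;
enum fieldsPerObject = 10;
enum bytesPerField = 4;

enum virtualCalls =
      objects                      // Formatter.put(Object)
    + objects                      // Object.toString(Formatter)
    + objects * fieldsPerObject    // Formatter.put(<some field type>)
    + objects * fieldsPerObject;   // UnbufferedOutputTransport.write

// Field data only; class names add up to roughly 100 more bytes.
enum payloadBytes = objects * fieldsPerObject * bytesPerField;

static assert(virtualCalls == 220);
static assert(payloadBytes == 400);
```

With 400 to 500 bytes emitted over 220 calls, that is roughly 1.8 to 2.3 bytes per virtual call.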

My second concern is that your approach to Formatter is too rigid. For 
instance, what if an object needs to write different fields depending 
on the output format, or write them in a different order? It'll have to 
check at runtime which kind of formatter it got (through casts 
probably). Or what if I have a formatter that wants to expose an XML 
tree instead of bytes? It'll need a totally different interface that 
deals with XML elements, attributes, and character data, not bytes.

So because of all this virtual dispatch and all this rigidity, I think 
Formatter needs to be rethought a little. My preference obviously goes 
to statically-typed formatters. But what I'd like to see is something 
like this:

	interface Serializable(F) {
		void writeTo(F formatter);
	}

Any object can implement a serialization for a given formatter by 
implementing the interface above parametrized with the formatter type. 
(Struct types could have a similar writeTo function too, they just 
don't need to implement an interface.) The formatter type can expose 
the interface it wants and use or not use virtual functions, it could 
be an XML writer interface (something with openElement, 
writeCharacterData, closeElement, etc), it could be a JSON interface; 
it could even be your Formatter as proposed, we just wouldn't be 
limited by it.

So basically, I'm not proposing you dump Formatter, just that you make 
it part of a reusable pattern for 
formatting/serializing/unformatting/unserializing things using other 
things than your Formatter interface.
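
To make the pattern concrete, here is a minimal sketch of one class supporting two unrelated formatter types. ListFormatter, KeyValueFormatter and Point are hypothetical names invented for the example; only Serializable(F) and writeTo come from the proposal above.

```d
interface Serializable(F)
{
    void writeTo(F formatter);
}

/// Hypothetical formatter with a positional API.
final class ListFormatter
{
    string[] items;
    void write(int value)
    {
        import std.conv : to;
        items ~= value.to!string;
    }
}

/// Hypothetical formatter with a key/value API.
final class KeyValueFormatter
{
    string[string] pairs;
    void writeKeyValue(string key, int value)
    {
        import std.conv : to;
        pairs[key] = value.to!string;
    }
}

/// One class, two serializations, each written against the API of
/// its own formatter type.
class Point : Serializable!ListFormatter, Serializable!KeyValueFormatter
{
    int x, y;
    this(int x, int y) { this.x = x; this.y = y; }

    void writeTo(ListFormatter f)
    {
        f.write(x);
        f.write(y);
    }

    void writeTo(KeyValueFormatter f)
    {
        f.writeKeyValue("x", x);
        f.writeKeyValue("y", y);
    }
}
```

Each writeTo overload is checked at compile time against the formatter it targets, so neither formatter has to share an interface with the other.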

As for the transport layer, I don't mind it much if it's an interface. 
Unlike Formatter, nothing prevents you from creating a 'final' class 
and using it directly when you can to avoid virtual dispatch. This 
doesn't work so well for Formatter however because it requires double 
dispatch when it encounters a class, which washes away all static 
information.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Dec 28 2010

Sean Kelly <sean invisibleduck.org> writes:

Michel Fortin Wrote:
 
 So because of all this virtual dispatch and all this rigidity, I think 
 Formatter needs to be rethought a little. My preference obviously goes 
 to statically-typed formatters. But what I'd like to see is something 
 like this:
 
 	interface Serializable(F) {
 		void writeTo(F formatter);
 	}
 
 Any object can implement a serialization for a given formatter by 
 implementing the interface above parametrized with the formatter type.

I like it.  There needs to be some way to hold format-specific state info for a
stream though.  I guess this could be done via an external hash (stream address
to formatter state), but it would be nicer if this could be stored in the
stream itself somehow.

Dec 28 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-12-28 13:07:56 -0500, Sean Kelly <sean invisibleduck.org> said:

 Michel Fortin Wrote:
 
 So because of all this virtual dispatch and all this rigidity, I think
 Formatter needs to be rethought a little. My preference obviously goes
 to statically-typed formatters. But what I'd like to see is something
 like this:
 
 	interface Serializable(F) {
 		void writeTo(F formatter);
 	}
 
 Any object can implement a serialization for a given formatter by
 implementing the interface above parametrized with the formatter type.

 
 I like it.  There needs to be some way to hold format-specific state 
 info for a stream though.  I guess this could be done via an external 
 hash (stream address to formatter state), but it would be nicer if this 
 could be stored in the stream itself somehow.

The 'F' formatter can be anything, it can be a class, a delegate, a 
struct (although for a struct you might want to pass it as 'ref')... so 
it *can* hold a state. Or am I missing something?

If we want to specify additional parameters to writeTo for a given 
formatter, such as a format string, then the Serializable interface 
template could introspect type F to find what additional arguments it 
wants writeTo to have.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Dec 28 2010

Sean Kelly <sean invisibleduck.org> writes:

Michel Fortin Wrote:

 On 2010-12-28 13:07:56 -0500, Sean Kelly <sean invisibleduck.org> said:
 
 Michel Fortin Wrote:
 
 So because of all this virtual dispatch and all this rigidity, I think
 Formatter needs to be rethought a little. My preference obviously goes
 to statically-typed formatters. But what I'd like to see is something
 like this:
 
 	interface Serializable(F) {
 		void writeTo(F formatter);
 	}


 
 The 'F' formatter can be anything, it can be a class, a delegate, a 
 struct (although for a struct you might want to pass it as 'ref')... so 
 it *can* hold a state. Or am I missing something?

And I guess writeTo could just call formatter.write(MyClass c).  You're right,
that works.

Dec 28 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-12-28 17:19:01 -0500, Sean Kelly <sean invisibleduck.org> said:

 Michel Fortin Wrote:
 
 On 2010-12-28 13:07:56 -0500, Sean Kelly <sean invisibleduck.org> said:
 
 Michel Fortin Wrote:
 
 So because of all this virtual dispatch and all this rigidity, I think
 Formatter needs to be rethought a little. My preference obviously goes
 to statically-typed formatters. But what I'd like to see is something
 like this:
 
 	interface Serializable(F) {
 		void writeTo(F formatter);
 	}


 
 The 'F' formatter can be anything, it can be a class, a delegate, a
 struct (although for a struct you might want to pass it as 'ref')... so
 it *can* hold a state. Or am I missing something?

 
 And I guess writeTo could just call formatter.write(MyClass c).  You're 
 right, that works.

Well, not exactly. I'd expect formatter.write(Object) to be the one 
calling writeTo. Here's what a similar function in my own code does 
(with a few things renamed to match this discussion):

	void write(T)(in T value) if (is(T == class)) {
		write('O'); // identifying an object type
		writeMappedString(value.classinfo.name); // class name
		
		// cast to interface and call writeTo
		auto s = cast(Serializable!Formatter)value;
		assert(s);
		s.writeTo(this);
		
		write('Z'); // end of object
	}

A typical writeTo might look like this:

	void writeTo(Formatter formatter) {
		formatter.write(member1);
		formatter.write(member2);
	}

or like this:

	void writeTo(Formatter formatter) {
		formatter.writeKeyValue("member1", member1);
		formatter.writeKeyValue("member2", member2);
	}

or anything else that fits how a specific formatter type wants to 
receive its data. This writeTo function could be generated with a mixin 
that'd introspect the type. The only thing is that you need to define 
writeTo (or use the mixin) with any class and subclass you want to 
serialize.
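
For illustration, a mixin along those lines might look like the sketch below. AutoWriteTo, KeyValueFormatter and Person are hypothetical names; the sketch assumes a formatter that accepts key/value pairs, and it only sees the fields declared directly in the class that mixes it in, not inherited ones.

```d
interface Serializable(F)
{
    void writeTo(F formatter);
}

/// Hypothetical formatter that records "name=value" strings.
final class KeyValueFormatter
{
    string[] lines;

    void writeKeyValue(T)(string key, T value)
    {
        import std.conv : to;
        lines ~= key ~ "=" ~ value.to!string;
    }

    /// Entry point for objects: dispatches to the object's own writeTo.
    void write(Object value)
    {
        auto s = cast(Serializable!KeyValueFormatter) value;
        assert(s !is null, "class does not support this formatter");
        s.writeTo(this);
    }
}

/// Generates writeTo for formatter type F by walking the fields of the
/// enclosing class at compile time.
mixin template AutoWriteTo(F)
{
    void writeTo(F formatter)
    {
        foreach (i, field; this.tupleof)
            formatter.writeKeyValue(
                __traits(identifier, typeof(this).tupleof[i]), field);
    }
}

class Person : Serializable!KeyValueFormatter
{
    string name;
    int age;
    this(string name, int age) { this.name = name; this.age = age; }
    mixin AutoWriteTo!KeyValueFormatter;
}
```

A subclass that adds fields would need its own mixin (and a call to the base writeTo), which is the per-class burden mentioned above.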


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Dec 28 2010

Sean Kelly <sean invisibleduck.org> writes:

Michel Fortin Wrote:

 On 2010-12-28 17:19:01 -0500, Sean Kelly <sean invisibleduck.org> said:
 
 And I guess writeTo could just call formatter.write(MyClass c).  You're 
 right, that works.

 
 Well, not exactly. I'd expect formatter.write(Object) to be the one 
 calling writeTo.

...
 A typical writeTo might look like this:
 
 	void writeTo(Formatter formatter) {
 		formatter.write(member1);
 		formatter.write(member2);
 	}

This is what I meant.  Sorry for the confusion.

 The only thing is that you need to define 
 writeTo (or use the mixin) with any class and subclass you want to 
 serialize.

Similar to what I did in C++ then.  Gotcha.

Dec 28 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-12-28 18:58:40 -0500, Sean Kelly <sean invisibleduck.org> said:

 Michel Fortin Wrote:
 The only thing is that you need to define
 writeTo (or use the mixin) with any class and subclass you want to
 serialize.

 
 Similar to what I did in C++ then.  Gotcha.

I never pretended I had overcome the problem of the lack of runtime 
reflection. Without better runtime reflection, we're limited in many 
ways just like C++ is.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Dec 28 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/28/10 6:14 PM, Michel Fortin wrote:
 On 2010-12-28 18:58:40 -0500, Sean Kelly <sean invisibleduck.org> said:

 Michel Fortin Wrote:
 The only thing is that you need to define
 writeTo (or use the mixin) with any class and subclass you want to
 serialize.

 Similar to what I did in C++ then. Gotcha.

 I never pretended I had overcome the problem of the lack of runtime
 reflection. Without better runtime reflection, we're limited in many
 ways just like C++ is.

After some thought, I think we should confine the charter of the current 
library like this:

* Transport only transports untyped bits

* Formatter only formats primitive types

We will build more sophisticated superstructure on top of these, but 
let's not embellish Formatter too much right now.


Andrei

Dec 28 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-12-29 01:55:29 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 After some thought, I think we should confine the charter of the 
 current library like this:
 
 * Transport only transports untyped bits
 
 * Formatter only formats primitive types
 
 We will build more sophisticated superstructure on top of these, but 
 let's not embellish Formatter too much right now.

Seems reasonable to me.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Dec 29 2010

Sean Kelly <sean invisibleduck.org> writes:

Andrei Alexandrescu Wrote:
 
 * Formatter only formats primitive types
 
 We will build more sophisticated superstructure on top of these, but 
 let's not embellish Formatter too much right now.

Can formatters be chained?  Data available from Bloomberg, for example, is
triple DES encoded, gzipped, and uuencoded.

Dec 29 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/29/10 8:45 AM, Sean Kelly wrote:
 Andrei Alexandrescu Wrote:
 * Formatter only formats primitive types

 We will build more sophisticated superstructure on top of these, but
 let's not embellish Formatter too much right now.

 Can formatters be chained?  Data available from Bloomberg, for example, is
 triple DES encoded, gzipped, and uuencoded.

I think Transports are supposed to be chained.
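
A minimal sketch of what chaining could look like on the output side, assuming a transport is nothing more than a sink for untyped bytes. OutputTransport, MemoryTransport, XorTransport and HexTransport are hypothetical names, and the XOR and hex stages are toy stand-ins for real encryption and encoding.

```d
/// Hypothetical minimal output transport: untyped bytes only.
interface OutputTransport
{
    void write(const(ubyte)[] data);
}

/// Terminal stage: keeps the bytes in memory.
final class MemoryTransport : OutputTransport
{
    ubyte[] data;
    void write(const(ubyte)[] chunk) { data ~= chunk; }
}

/// Filter stage: XORs each byte with a key, then forwards the result.
final class XorTransport : OutputTransport
{
    private OutputTransport next;
    private ubyte key;
    this(OutputTransport next, ubyte key) { this.next = next; this.key = key; }

    void write(const(ubyte)[] chunk)
    {
        auto copy = chunk.dup;
        foreach (ref b; copy)
            b ^= key;
        next.write(copy);
    }
}

/// Filter stage: expands each byte to two lowercase hex digits.
final class HexTransport : OutputTransport
{
    private OutputTransport next;
    this(OutputTransport next) { this.next = next; }

    private static ubyte hexDigit(ubyte nibble)
    {
        return cast(ubyte)(nibble < 10 ? '0' + nibble : 'a' + nibble - 10);
    }

    void write(const(ubyte)[] chunk)
    {
        ubyte[] encoded;
        foreach (b; chunk)
        {
            encoded ~= hexDigit(cast(ubyte)(b >> 4));
            encoded ~= hexDigit(cast(ubyte)(b & 0x0F));
        }
        next.write(encoded);
    }
}
```

Because every stage both implements and wraps the same interface, stages can be stacked in any order at run time, for example new XorTransport(new HexTransport(sink), key).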

Andrei

Dec 29 2010

sclytrack <sclytrack fake.com> writes:

class SerializableObject
{
  void describe( PropertyDescription d )
  {
     d.addProperty(...)
  }
}


== Quote from Sean Kelly (sean invisibleduck.org)'s article
 Michel Fortin Wrote:
 So because of all this virtual dispatch and all this rigidity, I think
 Formatter needs to be rethought a little. My preference obviously goes
 to statically-typed formatters. But what I'd like to see is something
 like this:

 	interface Serializable(F) {
 		void writeTo(F formatter);
 	}

 Any object can implement a serialization for a given formatter by
 implementing the interface above parametrized with the formatter type.

 I like it.  There needs to be some way to hold format-specific state info for a
 stream though.  I guess this could be done via an external hash (stream address
 to formatter state), but it would be nicer if this could be stored in the stream
 itself somehow.

Dec 28 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/28/10 12:07 PM, Sean Kelly wrote:
 Michel Fortin Wrote:
 So because of all this virtual dispatch and all this rigidity, I think
 Formatter needs to be rethought a little. My preference obviously goes
 to statically-typed formatters. But what I'd like to see is something
 like this:

 	interface Serializable(F) {
 		void writeTo(F formatter);
 	}

 Any object can implement a serialization for a given formatter by
 implementing the interface above parametrized with the formatter type.

 I like it.  There needs to be some way to hold format-specific state
 info for a stream though.  I guess this could be done via an external
 hash (stream address to formatter state), but it would be nicer if
 this could be stored in the stream itself somehow.

This design prevents new formatters from working with existing class 
hierarchies, unless they themselves obey a hierarchy which undoes the 
very advantage of the design.

It also forces the person defining a class hierarchy to statically 
commit to a specific formatter for the entire hierarchy.

As a corollary this design forces the designer of a hierarchy to make 
early and big decisions.


Andrei

Dec 28 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/28/10 11:39 AM, Michel Fortin wrote:
 On 2010-12-28 02:02:29 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 I've put together over the past days an embryonic streaming interface.
 It separates transport from formatting, input from output, and
 buffered from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to
 start a discussion using that design as a baseline. Please voice any
 related thoughts - thanks!

 One of my concerns is the number of virtual calls required in actual
 usage, because virtual calls prevent inlining. I know it's necessary to
 have virtual calls in the formatter to serialize objects (which requires
 double dispatch), but in your design the underlying transport layer too
 wants to be called virtually. How many virtual calls will be necessary
 to serialize an array of 10 objects, each having 10 fields? Let's see:

 10 calls to Formatter.put(Object)
 + 10 calls to Object.toString(Formatter)
 + 10 objects * 10 calls per object to Formatter.put(<some field type>)
 + 10 objects * 10 calls per object to UnbufferedOutputTransport.write(in
 ubyte[])

 Total: 220 virtual calls, for 10 objects with 10 fields each. Most of
 the functions called virtually here are pretty trivial and would
 normally be inlined if the context allowed it. Assuming those fields are
 4 byte integers and are stored as is in the stream, the result will be
 between 400 and 500 byte long once we add the object's class name. We
 end up having almost 1 virtual call for each two byte of emitted data;
 is this overhead really acceptable? How much inlining does it prevent?

That overhead may indeed be quite large.

 My second concern is that your approach to Formatter is too rigid. For
 instance, what if an object needs to write different fields depending on
 the output format, or write them in a different order? It'll have to
 check at runtime which kind of formatter it got (through casts
 probably). Or what if I have a formatter that wants to expose an XML
 tree instead of bytes? It'll need a totally different interface that
 deals with XML elements, attributes, and character data, not bytes.

I think that's a very rare situation. When you pick a certain formatter, 
you commit to a certain representation, period. It's poor design to have 
the object object (sic) to that representation.

To some extent representation can be tweaked via format specifiers, 
which are a language spoken by both the formatter and the formatted.

 So because of all this virtual dispatch and all this rigidity, I think
 Formatter needs to be rethought a little. My preference obviously goes
 to statically-typed formatters.

It's heartwarming to see so much interest in static polymorphism. Only a 
couple of years ago I would've had trouble convincing people of that; 
now I need to preach the advantages of dynamic polymorphism.

 But what I'd like to see is something
 like this:

 interface Serializable(F) {
 void writeTo(F formatter);
 }

Let me make sure I understand correctly. So when I define a class I 
commit to its possible representations? Doesn't seem good design to me. 
What if I later come with a new Formatter? I'd need to change my entire 
class hierarchy too.

 Any object can implement a serialization for a given formatter by
 implementing the interface above parametrized with the formatter type.

If only one formatter would be allowed that would be even worse. But you 
can allow several:

class Widget : Serializable!Json, Serializable!Binary {
   ...
}

Sorry, I think this is poor design.

 (Struct types could have a similar writeTo function too, they just don't
 need to implement an interface.) The formatter type can expose the
 interface it wants and use or not use virtual functions, it could be an
 XML writer interface (something with openElement, writeCharacterData,
 closeElement, etc), it could be a JSON interface; it could even be your
 Formatter as proposed, we just wouldn't be limited by it.

 So basically, I'm not proposing you dump Formatter, just that you make
 it part of a reusable pattern for
 formatting/serializing/unformatting/unserializing things using other
 things than your Formatter interface.

I may be misunderstanding, but to me it seems that this design brings 
more problems than it solves.

 As for the transport layer, I don't mind it much if it's an interface.
 Unlike Formatter, nothing prevents you from creating a 'final' class and
 using it directly when you can to avoid virtual dispatch. This doesn't
 work so well for Formatter however because it requires double dispatch
 when it encounters a class, which washes away all static information.

I agree that Transport is fine with the dynamic interface.


Andrei

Dec 28 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-12-29 01:32:17 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 I may be misunderstanding, but to me it seems that this design brings 
 more problems than it solves.

It seems we're approaching the problem from different angles. What you 
seem to want is a general way to serialize objects and data structures. 
For this task, your concept of Formatter is fine, except perhaps the 
virtual dispatch overhead might be unacceptable in some cases.

What I want is a way to serialize specific objects to specific formats. 
I don't need all of my objects to be serializable to a RSS feed, but 
for those who do I want them to output things correctly, and nothing's 
better for that than a formatter class that just takes some values as 
function arguments and transform them to a RSS feed, encapsulating the 
format within the formatter. This RSS formatter could in turn use an 
XML formatter to write the XML output, which in turn could use some 
kind a text formatter to convert the text to the desired encoding 
before sending it to the transport layer.

So our formatters have different purposes, but they can share the same pattern.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Dec 29 2010

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Monday 27 December 2010 23:02:29 Andrei Alexandrescu wrote:
 I've put together over the past days an embryonic streaming interface.
 It separates transport from formatting, input from output, and buffered
 from unbuffered operation.
 
 http://erdani.com/d/phobos/std_stream2.html
 
 There are a number of questions interspersed. It would be great to start
 a discussion using that design as a baseline. Please voice any related
 thoughts - thanks!

The fact that streams here are interfaces and yet Phobos uses templates most 
everywhere rather than interfaces makes me wonder about what exactly the reasons 
for that are and what the pros and cons of both approaches really are. My first 
reaction at this point tends to be that if it's an interface, it can be a 
template, but that doesn't work as well if it's expected that it'll be normal 
for streams to be fed to virtual functions.

In any case, the one thing about this which immediately concerned me was the 
fact that TransportBase throws on some functions if the class implementing the 
interface doesn't support them. I _hate_ the fact that Java does that on some of 
their stream functions, and I'd hate to see that in D, let alone in Phobos. If 
it doesn't support them, _then it doesn't properly implement the interface_. How 
would you use such functions in real code? If you rely on their behavior, you're 
going to get an exception at runtime rather than being able to determine at 
compile time that that behavior isn't supported. And if you try to use them and 
they aren't supported, but you _can_ do what you need to do without them, then 
you have to catch an exception and then have a different code path.

I'd _strongly_ suggest splitting TransportBase into two interfaces if it has 
functions which aren't necessarily really implemented in the classes that 
implement it. And if for some reason, that isn't reasonable, at least add a 
function which is supposed to return whether the positioning primitives are 
properly implemented.
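
A minimal sketch of the split, with hypothetical names (Transport, SeekableTransport, PipeTransport, FileLikeTransport, tryRewind) that are not taken from the std.stream2 draft:

```d
/// Hypothetical base: operations every transport supports.
interface Transport
{
    void close();
}

/// Hypothetical extension: only seekable transports implement it.
interface SeekableTransport : Transport
{
    ulong tell();
    void seek(ulong position);
}

/// A pipe-like transport simply does not implement the seek primitives.
final class PipeTransport : Transport
{
    void close() {}
}

final class FileLikeTransport : SeekableTransport
{
    private ulong position;
    void close() {}
    ulong tell() { return position; }
    void seek(ulong position) { this.position = position; }
}

/// Rewinds if the transport can seek and reports whether it did.
/// The capability is queried with a cast instead of a caught exception.
bool tryRewind(Transport t)
{
    if (auto s = cast(SeekableTransport) t)
    {
        s.seek(0);
        return true;
    }
    return false;
}
```

Code that requires seeking can take a SeekableTransport parameter and get a compile-time guarantee; code that merely prefers it can test at run time as tryRewind does.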

- Jonathan M Davis

Dec 28 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/28/10 4:06 PM, Jonathan M Davis wrote:
 On Monday 27 December 2010 23:02:29 Andrei Alexandrescu wrote:
 I've put together over the past days an embryonic streaming interface.
 It separates transport from formatting, input from output, and buffered
 from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to start
 a discussion using that design as a baseline. Please voice any related
 thoughts - thanks!

 The fact that streams here are interfaces and yet Phobos uses templates most
 everywhere rather than interfaces makes me wonder about what exactly the reasons
 for that are and what the pros and cons of both approaches really are. My first
 reaction at this point tends to be that if it's an interface, it can be a
 template, but that doesn't work as well if it's expected that it'll be normal
 for streams to be fed to virtual functions.

Dynamic polymorphism is a well understood style of coding and is 
somewhat simpler syntactically even in D. Conventional wisdom has it 
that one should use dynamic polymorphism if possible and resort to 
static polymorphism when considerations of e.g. type information or 
efficiency require it.

A statically polymorphic design of streams would define various 
Formatter structs parameterized on the type of transport, all offering 
the same implicit interface. Then, code that wants to use formatters 
would be parameterized by the type of the formatter. In fact this is 
what std.format does today.
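
For comparison, a statically polymorphic version might look roughly like the sketch below. It is not how std.format is implemented; MemoryTransport, TextFormatter and writePair are hypothetical names chosen for the illustration.

```d
/// Hypothetical transport; any type with a compatible write() would do.
struct MemoryTransport
{
    ubyte[] data;
    void write(const(ubyte)[] chunk) { data ~= chunk; }
}

/// Formatter parameterized on the transport type. Calls to write() are
/// resolved at compile time, so no virtual dispatch is involved.
struct TextFormatter(Transport)
{
    Transport* transport;

    void put(T)(T value)
    {
        import std.conv : to;
        transport.write(cast(const(ubyte)[]) value.to!string);
    }
}

/// Client code is in turn parameterized on the formatter type.
void writePair(Formatter)(ref Formatter fmt, int a, int b)
{
    fmt.put(a);
    fmt.put(",");
    fmt.put(b);
}
```

The cost is visible in writePair: every function that touches a formatter becomes a template, which is the syntactic overhead mentioned above.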

 In any case, the one thing about this which immediately concerned me was the
 fact that TransportBase throws on some functions if the class implementing the
 interface doesn't support them. I _hate_ the fact that Java does that on some of
 their stream functions, and I'd hate to see that in D, let alone in Phobos. If
 it doesn't support them, _then it doesn't properly implement the interface_. How
 would you use such functions in real code? If you rely on their behavior, you're
 going to get an exception at runtime rather than being able to determine at
 compile time that that behavior isn't supported. And if you try to use them and
 they aren't supported, but you _can_ do what you need to do without them, then
 you have to catch an exception and then have a different code path.

I think that's a minor concern. Some files are seekable and some aren't. 
We're well used to that. The file passed to prog here is seekable:

prog <foo.txt

This is not:

cat foo.txt | prog

It's a dynamically decided capability as cut and dried as it gets.

 I'd _strongly_ suggest splitting TransportBase into two interfaces if it has
 functions which aren't necessarily really implemented in the classes that
 implement it. And if for some reason, that isn't reasonable, at least add a
 function which is supposed to return whether the positioning primitives are
 properly implemented.

I think the latter is doable.


Andrei

Dec 28 2010

Haruki Shigemori <rayerd.wiz gmail.com> writes:

(2010/12/28 16:02), Andrei Alexandrescu wrote:
 I've put together over the past days an embryonic streaming interface.
 It separates transport from formatting, input from output, and buffered
 from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to start
 a discussion using that design as a baseline. Please voice any related
 thoughts - thanks!


 Andrei

I've waited so long for this day.
Excuse me, could you give me some user-side code and library-side code 
using std.stream2?
I don't know of a concrete implementation of the std.stream2 interfaces.

Dec 28 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/28/10 5:14 PM, Haruki Shigemori wrote:
 (2010/12/28 16:02), Andrei Alexandrescu wrote:
 I've put together over the past days an embryonic streaming interface.
 It separates transport from formatting, input from output, and buffered
 from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to start
 a discussion using that design as a baseline. Please voice any related
 thoughts - thanks!


 Andrei

 I've waited so long for this day.
 Excuse me, would you give me a user side code and librarian side code
 using std.stream2?
 I don't know a concrete implementation of the std.stream2 interfaces.

There isn't one. The source code is just support for documentation, and 
I attach it with this message.

Thanks for participating! I know there has been some good stream-related 
activity in the Japanese D community.


Andrei

Dec 28 2010

SHOO <zan77137 nifty.com> writes:

(2010/12/29 8:41), Andrei Alexandrescu wrote:
 On 12/28/10 5:14 PM, Haruki Shigemori wrote:
 (2010/12/28 16:02), Andrei Alexandrescu wrote:
 I've put together over the past days an embryonic streaming interface.
 It separates transport from formatting, input from output, and buffered
 from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to start
 a discussion using that design as a baseline. Please voice any related
 thoughts - thanks!


 Andrei

 I've waited so long for this day.
 Excuse me, would you give me a user side code and librarian side code
 using std.stream2?
 I don't know a concrete implementation of the std.stream2 interfaces.

 There isn't one. The source code is just support for documentation, and
 I attach it with this message.

 Thanks for participating! I know there has been some good stream-related
 activity in the Japanese D community.


 Andrei

Like this?

-----
import std.stdio;   // for writeln
import std.stream2;

void main()
{
     /*
     <data>
         <int>123</int>
         <double>55.98</double>
         <string>aabbccddee</string>
     </data>
     */
     auto infile = new BufferedFileTransport("intest.xml");
     auto unfmt  = new XmlUnformatter(infile);

     int a;
     double b;
     string c;

     unfmt.read(a);
     unfmt.read(b);
     unfmt.read(c);

     writeln(a); // 123
     writeln(b); // 55.98
     writeln(c); // aabbccddee

     auto outfile = new UnbufferedFileTransport("outtest.dat");
     auto fmt     = new BinaryFormatter(outfile);

     fmt.put(a);
     fmt.put(b);
     fmt.put(c);
     /*
         |  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
     ----+------------------------------------------------
     0000| 7B-00-00-00-00-00-00-00-3D-0A-D7-A3-70-FD-4B-40
     0001| 0A-00-00-00-61-61-62-62-63-63-64-64-65-65
     */
}

--
SHOO

Dec 28 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/28/10 8:33 PM, SHOO wrote:
 (2010/12/29 8:41), Andrei Alexandrescu wrote:
 On 12/28/10 5:14 PM, Haruki Shigemori wrote:
 (2010/12/28 16:02), Andrei Alexandrescu wrote:
 I've put together over the past days an embryonic streaming interface.
 It separates transport from formatting, input from output, and buffered
 from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to
 start
 a discussion using that design as a baseline. Please voice any related
 thoughts - thanks!


 Andrei

 I've waited so long for this day.
 Excuse me, would you give me a user side code and librarian side code
 using std.stream2?
 I don't know a concrete implementation of the std.stream2 interfaces.

 There isn't one. The source code is just support for documentation, and
 I attach it with this message.

 Thanks for participating! I know there has been some good stream-related
 activity in the Japanese D community.


 Andrei

 Like this?

[snip]

Looks promising!

Andrei

Dec 28 2010

"Robert Jacques" <sandford jhu.edu> writes:

On Tue, 28 Dec 2010 00:02:29 -0700, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 I've put together over the past days an embryonic streaming interface.  
 It separates transport from formatting, input from output, and buffered  
 from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to start  
 a discussion using that design as a baseline. Please voice any related  
 thoughts - thanks!


 Andrei

Here are my initial thoughts and responses to the questions. Now to go  
read everyone else's.

Re: TransportBase
Q1: Internally, I think it is a good idea for transport to support lazy  
opening, but I'm not sure of the hassle/benefit reward for exposing this  
to user code. If open is supported, I don't think it should take any  
parameters.
Q2: If seek isn't considered universal, having an isSeekable and rewind
might be beneficial. But while I know of transports where seeking might be
slow, I'm not sure which ones wouldn't support it at all, or would only
support rewind.
Q3: Yes, to seek + tell and getting rid of seekFromXXX.

Re: UnbufferedInputTransport
Q1: I think that read should be allowed to return less than the buffer
length, but since the transport should know the most efficient way to
block on an input, I don't think returning a zero-length array is valid.

Re: BufferedInputTransport
Q1: I think it's valid for the front of a buffer input to be empty: an  
empty front simply means that popFront should be called. popFront should  
be required to fill at least some of front (See UnbufferedInputTransport  
Q1)

Q2: Semantically, 'advance' feels to me like popFront: I want to advance my
input and I'm intending to work with it. The seek routines, on the other
hand, feel more like indexing: I want to do something with that index, but
I do not necessarily need everything in between. In particular, I'd expect
long seeks to reduce the front array to zero elements, while I'd expect
advance to enlarge the internal buffer if necessary.

Re: Formatter
Q1: I don't think formatters should be responsible for buffering, but  
certain formats require rather extensive buffering that can't be provided  
by the current buffer transport classes. (BSON comes to mind). My initial  
impression is that seek, etc should be able to handle these use cases, but  
adding a buffer hint setter/getter might be a good idea. The idea being  
that if the formatter knows that it will come back to this part of the  
stream, it can set a hint, so the buffer can make a more intelligent  
choice of when/where to flush internally.
Q2: putln only makes sense in terms of text based streams, plus it adds a  
large number of methods to implement. So I'm a bit on the fence about it.  
I think writefln would be a better solution to a similar problem.
Q3: The issue I see with a reflection-based solution is that the runtime  
reflection system should respect the visibility of the member: i.e.  
private variables shouldn't be accessible. But to do effective  
serialization, private members are generally required. As for the more  
technical aspects, combining __traits(derivedMembers,T) and  
BaseClassesTuple!T can determine which objects overload toString, etc.
Q4: Reading/writing the same sub-object is an internal matter, in my
opinion. The really important aspect is handling slices, etc. nicely for
formats that support cyclic graphs. For that, the only thing missing is
put(void*) to handle pointers (I think).
Q5: I think handling AA's with hooks is the best case with this design,  
though I only see a need for start and end. The major issue is that  
reading should be done as a tuple, which basically breaks the interface  
idiom. Alternatively, callbacks could be used to set read's mode: i.e.  
readKeyMode, readValueMode & putKeyMode, putValueMode.
Q6: Well, toString and cast(int/double/etc) should go a long way to
covering most of the printf specifiers.
Q7: Yes, writefln should probably be supported for text based transport.
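
The compile-time check mentioned under Q3 could look roughly like the sketch below. It is only an illustration: the helper names `declares` and `customizesToString` and the three example classes are made up here, and it uses `std.meta`, which postdates this thread.

```d
import std.meta : AliasSeq, staticIndexOf;
import std.traits : BaseClassesTuple;

// True if class C itself declares a member called `name`
// (members inherited from base classes do not count).
enum bool declares(C, string name) =
    staticIndexOf!(name, __traits(derivedMembers, C)) != -1;

// True if T or one of its bases other than Object declares toString,
// i.e. the default Object.toString was replaced somewhere in the chain.
bool customizesToString(T)()
{
    bool found = false;
    static foreach (C; AliasSeq!(T, BaseClassesTuple!T))
    {
        static if (!is(C == Object) && declares!(C, "toString"))
            found = true;
    }
    return found;
}

// Example hierarchy, used only for illustration.
class Plain {}
class Named : Plain { override string toString() { return "Named"; } }
class Derived : Named {}

static assert(!customizesToString!Plain);
static assert( customizesToString!Named);
static assert( customizesToString!Derived);
```

Note that this only answers "is there an override"; it does not address the visibility concern raised in the same answer.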

Re: Unformatter
Q1: Implementations should be free (and indeed encouraged) to minimize  
allocations by returning a reusable buffer for arrays. So the stream  
should be responsible for inferring the size of an array.
Q2: See Formatter Q3.
Q3: See Formatter Q5.


Other Formatter/Unformatter thoughts:
For objects, several formats also require additional meta information  
(i.e. a unique string id, member offset position, etc), while others don't.

Dec 28 2010

Sean Kelly <sean invisibleduck.org> writes:

Andrei Alexandrescu Wrote:

 On 12/28/10 11:39 AM, Michel Fortin wrote:

 But what I'd like to see is something
 like this:

 interface Serializable(F) {
 void writeTo(F formatter);
 }

 
 Let me make sure I understand correctly. So when I define a class I 
 commit to its possible representations? Doesn't seem good design to me. 
 What if I later come with a new Formatter? I'd need to change my entire 
 class hierarchy too.

What about defining this trait externally?  It would limit the formatter to
accessing public data members though.
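
One way to read "defining this trait externally" is a free function template that lives outside the serialized type. The sketch below assumes a made-up `TextFormatter` and `Point`; neither comes from std.stream2.

```d
import std.conv : to;

// A type we do not want to (or cannot) modify.
struct Point { int x; int y; }

// Hypothetical formatter invented for this sketch: emits name=value pairs.
struct TextFormatter
{
    string output;

    void field(T)(string name, T value)
    {
        output ~= name ~ "=" ~ value.to!string ~ ";";
    }
}

// The serialization "trait" is a free function template outside Point.
// If it lived in another module, it could only reach Point's public members.
void writeTo(F)(ref F formatter, const Point p)
{
    formatter.field("x", p.x);
    formatter.field("y", p.y);
}

unittest
{
    TextFormatter f;
    writeTo(f, Point(3, 4));
    assert(f.output == "x=3;y=4;");
}
```

A new formatter type only needs a compatible `field` method; `Point` itself stays untouched.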

 Any object can implement a serialization for a given formatter by
 implementing the interface above parametrized with the formatter type.

 
 If only one formatter would be allowed that would be even worse. But you 
 can allow several:
 
 class Widget : Serializable!Json, Serializable!Binary {
    ...
 }
 
 Sorry, I think this is poor design.

I think a distinction should be drawn between unstructured formats
(compression, encryption) and structured formats (json, xml, csv).  In the
latter case, each piece of data written may need a label, there may be some
context-specific separation between elements, etc.  One obviously knows the
desired serialization structure at design time, so the issue is how to achieve
it within the streaming mechanism.

I've encountered two cases: first, where I'm serializing a set of objects in
code I own that has a direct relation to the serialized structure, and second,
where I don't have such distinct chunks of serializable data in memory and I
assemble the output from more granular data.  Steve's formatter works very well
for the first case, and this is by far the most common (I can't think of a
single case where I've needed to format data objects that I can't alter, ie.
from a third-party library).  The latter is mostly an issue when a distinct
serialized element is quite large and/or the app is generating the output
somehow.

I tend to work entirely with the last category of data and so don't expect any
in-stream formatter to work for me, but the closer it can get the better :-) 
This could be tabular data where each row is quite large (a CSV stream needs
some way to denote the end of a row for the newline, for example), input
translated dynamically from another source and pumped to an output stream in
some structured format like XML or JSON, etc.  The problem with supporting this
design with a stream formatter is that it often only works for
output--unformatting the input typically requires a parser of some sort.  It
makes for one task a novice programmer can do (writing output), but the
asymmetry is a bit weird from a stream design perspective.

 So basically, I'm not proposing you dump Formatter, just that you make
 it part of a reusable pattern for
 formatting/serializing/unformatting/unserializing things using other
 things than your Formatter interface.

 
 I may be misunderstanding, but to me it seems that this design brings 
 more problems than it solves.

It solves the (common) first problem above of reading/writing structured data
formats where the data is available in-memory.  That covers quite a lot.

Dec 29 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 28 Dec 2010 02:02:29 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 I've put together over the past days an embryonic streaming interface.  
 It separates transport from formatting, input from output, and buffered  
 from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to start  
 a discussion using that design as a baseline. Please voice any related  
 thoughts - thanks!

Without reading any other comments, here is my take on just the streaming  
part (not formatting).

Everything looks good except for two problems:

1. BufferedX should not inherit UnbufferedX.  The main reason for this is  
because both Buffered *and* Unbuffered can be desirable properties.  For  
example, you may want to *require* that you have a raw stream as a  
parameter without a buffer.  The perfect example is a class which wraps an  
Unbuffered stream, and adds a buffer to it (which is what I'd expect as a  
class design).  You don't want to accept a stream that's already buffered,  
or you are double-buffering.  You can deal with this at runtime by  
throwing an exception, but I think it's better to disallow this to even  
compile.
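
A minimal sketch of point 1, using simplified, hypothetical interfaces (`RawInput`, `BufferedInput`, `Buffer`, `ArraySource` are illustration names, not the std.stream2 declarations). Because the buffered interface does not derive from the raw one, wrapping an already buffered stream is rejected by the compiler.

```d
// Hypothetical, simplified interfaces; not the std.stream2 declarations.
interface RawInput
{
    // Reads up to buf.length bytes; returns how many were read (0 at end).
    size_t read(ubyte[] buf);
}

// Deliberately NOT derived from RawInput.
interface BufferedInput
{
    // Returns the buffered bytes, refilling the buffer if it is empty.
    const(ubyte)[] peek();
    // Discards n bytes from the front of the buffer.
    void consume(size_t n);
}

// Adds a buffer to a raw stream. The constructor takes RawInput only,
// so passing an already buffered stream is a compile-time error.
final class Buffer : BufferedInput
{
    private RawInput src;
    private ubyte[] store;
    private size_t lo, hi; // valid data is store[lo .. hi]

    this(RawInput src, size_t capacity = 4096)
    {
        this.src = src;
        store = new ubyte[capacity];
    }

    const(ubyte)[] peek()
    {
        if (lo == hi)
        {
            lo = 0;
            hi = src.read(store);
        }
        return store[lo .. hi];
    }

    void consume(size_t n)
    {
        assert(n <= hi - lo);
        lo += n;
    }
}

// In-memory raw source, used only to exercise the sketch.
final class ArraySource : RawInput
{
    private const(ubyte)[] data;

    this(const(ubyte)[] data) { this.data = data; }

    size_t read(ubyte[] buf)
    {
        immutable n = buf.length < data.length ? buf.length : data.length;
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

unittest
{
    auto b = new Buffer(new ArraySource([1, 2, 3]), 2);
    assert(b.peek() == [1, 2]);
    // Double buffering does not compile:
    static assert(!__traits(compiles, new Buffer(b)));
}
```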

Now, this removes the possibility of having a function which accepts  
either an unbuffered or buffered stream.  I stipulate that this is not a  
valid requirement -- your code will work best with one of them, but not  
both.  If you really need to accept either, you can use templates, but I  
think you will find you always use one or the other even there.

2. I think it's a mistake to put a range interface directly in the  
interface.  A range can be built with the buffered stream as its core if  
need be.  I have long voiced my opinion that I/O should not implement  
ranges, and reference types should never be ranges.  For example, you are  
going to implement byLine based not on the range interface, but based on  
the other parts.  Why must byLine be an external range, but "byBuffer" is  
builtin to the stream?  In particular, I think popFront is an odd function  
for all buffered streams to have to implement.

To voice my opinions on the questions:


-----
Question: Should we offer an open primitive at this level? If so, what  
parameter(s) should it take?

No, if you need a new stream, create a new instance.  The OS processing  
required to open a file is going to dwarf any performance degradation of  
creating a new class on the heap.
For types that may open quickly (say, an Array input stream), you can
provide a function to re-open another array that doesn't have to go in the
base interface.

Also note that opening a network stream requires quite different  
parameters than opening a file.  Putting it at the interface level would  
require some sort of parsed-string parameter, which puts undue  
responsibility on such a basic interface.

-----
Question: Should we offer a primitive rewind that takes the stream back to  
the beginning? That might be supported even by some streams that don't  
support general seek calls. Alternatively, some streams might support  
seek(0, SeekAnchor.start) but not other calls to seek.

Considering that seek is already callable, even if the stream doesn't  
support it (because the interface defines it), I don't think it's  
unreasonable to selectively throw exceptions if the seek isn't possible.   
In other words, I think seek(0) is acceptable as an alternative to rewind().

However, you may also implement:

final void rewind() { seek(0);}

directly in the interface if necessary
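
Spelled out a little further, assuming a hypothetical `Seekable` interface and a made-up `RewindOnly` stream that supports nothing but seek(0):

```d
interface Seekable
{
    // Absolute positioning; implementations throw if the request
    // is not supported.
    void seek(ulong pos);

    // Final interface methods carry a body and cannot be overridden.
    final void rewind() { seek(0); }
}

// Hypothetical stream that can only go back to the start.
final class RewindOnly : Seekable
{
    bool atStart = false;

    void seek(ulong pos)
    {
        if (pos != 0)
            throw new Exception("only seek(0) is supported");
        atStart = true;
    }
}

unittest
{
    auto r = new RewindOnly;
    Seekable s = r;
    s.rewind();
    assert(r.atStart);
}
```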

-----
Question: May we eliminate seekFromCurrent and seekFromEnd and just have  
seek with absolute positioning? I don't know of streams that allow seek  
without allowing tell. Even if some stream doesn't, it's easy to add  
support for tell in a wrapper. The marginal cost of calling tell is small  
enough compared to the cost of seek.

I don't think the cost of tell is marginal.  Support what the OS supports:
all OSes support seeking from the current position, and reducing the
number of system calls is preferable.

Also, how to implement seekFromEnd with just tell?

-----
Question: Should this throw on an unopened stream? I don't think so,  
because throwing does not offer any additional information that user code  
didn't have, and the idiom if (s.isOpen) s.close() is verbose and  
frequently encountered.

I agree, don't throw on an unopened stream.

-----
Question: Should we allow read to return an empty slice even if atEnd is  
false? If we do, we allow non-blocking streams with burst transfer.  
However, naive client code on non-blocking streams will be inefficient  
because it would essentially implement busy-waiting.

Why not return an integer so different situations could be designated?
That's how the read system call works: it lets you tell "no data was read
because the stream is non-blocking" apart from EOF.

I realize it's sexy to return the data again so it can be used  
immediately, but in practice it's more useful to return an integer.

For example, if you want to fill a buffer, you need a loop anyways  
(there's no guarantee that the first read will fill the buffer), and at  
that point, you are just going to use the length member of the return  
value to advance your loop.

I'd say, return -1 if a non-blocking stream returns no data, 0 on EOF,  
positive on data read, and throw an exception on error.
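
A sketch of the fill loop under that convention. The names `ReadFn`, `fill` and `BurstSource` are invented for the example, and the extra `wouldBlock` out-parameter is just one possible way to report why the loop stopped.

```d
// Convention proposed above: -1 = would block, 0 = end of stream,
// > 0 = number of bytes placed in the buffer. Errors would be exceptions.
alias ReadFn = ptrdiff_t delegate(ubyte[] buf);

// Fills buf as far as possible and returns the number of bytes stored.
// Stops early at end of stream or when the source would block.
size_t fill(scope ReadFn read, ubyte[] buf, out bool wouldBlock)
{
    size_t total = 0;
    wouldBlock = false;
    while (total < buf.length)
    {
        immutable n = read(buf[total .. $]);
        if (n == 0)
            break;                 // end of stream
        if (n < 0)
        {
            wouldBlock = true;     // no data available right now
            break;
        }
        total += n;
    }
    return total;
}

// Test double: delivers at most `burst` bytes per call and reports
// "would block" once after each burst while data remains.
final class BurstSource
{
    private const(ubyte)[] data;
    private size_t burst;
    private bool stallNext;

    this(const(ubyte)[] data, size_t burst)
    {
        this.data = data;
        this.burst = burst;
    }

    ptrdiff_t read(ubyte[] buf)
    {
        if (stallNext) { stallNext = false; return -1; }
        if (data.length == 0) return 0;
        size_t n = burst;
        if (n > data.length) n = data.length;
        if (n > buf.length) n = buf.length;
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        stallNext = data.length != 0;
        return n;
    }
}
```

The caller only ever looks at the integer, which is the point made above: the loop never needs the returned slice itself.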

-----
Question: Should we allow an empty front on a non-empty stream? This goes  
back to handling non-blocking streams.

Well, streams shouldn't have a range interface anyways, but to answer this  
specific question, I'd say no.  front should fill the buffer if it's  
empty.  This follows the nature of all other ranges, where front is  
available on creation.

-----
Question: Should we eliminate this function? Theoretically calling  
advance(n) is equivalent with seekFromCurrent(n). However, in practice a  
file-based stream will have to implement advance even though the  
underlying file is not seekable.

I think it's good to have this function.  At first I didn't, but now I
realize it's good because advance(n) may be low-performance (it may use
read to advance the stream).  If you eliminate this function but put its
functionality into seekFromCurrent, that makes seekFromCurrent
low-performance.

I think you should change the requirements, however, and follow the same  
return type as I specified above for read (-1 for wouldblock, 0 for EOF,  
positive for number of bytes 'advanced').  Otherwise, you have issues with  
non-blocking streams.

====================

OK, now that I've voiced my opinions on what's there, I'll push the
interface I specified some time ago (incidentally, I am building an I/O
library based on it).  From my current skeleton:


     /**
      * Read data until a condition is satisfied.
      *
      * Buffers data from the input stream until the delegate returns other
      * than ~0.  The delegate is passed the data read so far, and the start
      * of the data just read.  The delegate should return ~0 if the
      * condition is not satisfied, or the number of bytes that should be
      * returned otherwise.
      *
      * Any data that satisfies the condition will be considered consumed
      * from the stream.
      *
      * params: process = A delegate to determine satisfaction of a
      * condition per the terms above.
      *
      * returns: the data identified by the delegate that satisfies the
      * condition.  Note that this data may be owned by the buffer and so
      * shouldn't be written to or stored for later use without duping.
      */
     ubyte[] readUntil(uint delegate(ubyte[] data, uint start) process);

The advantage of such an interface is that it creates a very efficient way  
to specify how to buffer the data based on the data (i.e. byLine comes to  
mind).

Here is a second function that does the same as above but appends it  
directly into a user-supplied buffer:

     size_t appendUntil(uint delegate(ubyte[] data, uint start) process,
                        ref ubyte[] arr);
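
To make the delegate contract concrete, here is a rough sketch of a line-finding delegate driving a toy readUntil. `ToyStream` is a stand-in invented for this example (it "reads" from memory in small chunks), and returning the leftover bytes at end of input is an assumption, not part of the description above.

```d
// Returns the length of the first line (including '\n'), or ~0 if no
// newline has arrived yet. Only the newly read part is scanned.
uint untilNewline(ubyte[] data, uint start)
{
    foreach (i; start .. data.length)
        if (data[i] == '\n')
            return cast(uint)(i + 1);
    return ~0u;
}

// Toy stand-in for a buffered stream: pulls from an in-memory source in
// fixed-size chunks until the delegate reports a match.
struct ToyStream
{
    ubyte[] source;   // bytes not yet "read from the device"
    ubyte[] buffer;   // bytes read but not yet consumed
    size_t chunk = 4;

    ubyte[] readUntil(scope uint delegate(ubyte[] data, uint start) process)
    {
        uint start = 0;
        for (;;)
        {
            immutable r = process(buffer, start);
            if (r != ~0u)
            {
                auto result = buffer[0 .. r];
                buffer = buffer[r .. $];
                return result;
            }
            if (source.length == 0)
            {
                // Assumption: at end of input, hand back whatever is left.
                auto rest = buffer;
                buffer = null;
                return rest;
            }
            start = cast(uint) buffer.length;
            immutable n = chunk < source.length ? chunk : source.length;
            buffer ~= source[0 .. n];
            source = source[n .. $];
        }
    }
}

unittest
{
    auto s = ToyStream(cast(ubyte[]) "ab\ncdefg\nh".dup);
    assert(cast(string) s.readUntil((d, st) => untilNewline(d, st)) == "ab\n");
}
```

A byLine range could then be a thin wrapper that calls readUntil with this delegate on every popFront.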

-Steve

Dec 29 2010

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

[snip]
 -----
 Question: Should we allow read to return an empty slice even if atEnd 
 is false? If we do, we allow non-blocking streams with burst transfer. 
 However, naive client code on non-blocking streams will be inefficient 
 because it would essentially implement busy-waiting.

 Why not return an integer so different situations could be 
 designated?  It's how the system call read works so you can tell no 
 data was read but that's because it's a non-blocking stream.

 I realize it's sexy to return the data again so it can be used 
 immediately, but in practice it's more useful to return an integer. 

 For example, if you want to fill a buffer, you need a loop anyways 
 (there's no guarantee that the first read will fill the buffer), and 
 at that point, you are just going to use the length member of the 
 return value to advance your loop.

 I'd say, return -1 if a non-blocking stream returns no data, 0 on EOF, 
 positive on data read, and throw an exception on error.

Maybe it's only me but I would prefer non-blocking IO not mixed with
blocking in such a way. Imagine a function that takes an
UnbufferedInputTransport: how should it indicate that it expects only a
non-blocking IO capable transport? Or the other way around. Checking
return codes hardly helps anything, and it means supporting both types
everywhere, which is a source of all kinds of weird problems.
From my (somewhat limited) experience, code paths for blocking and
non-blocking IO are quite different; the latter are performed by
*special* asynchronous calls which are supported by all modern OSes for
things like files/sockets.

Then my position would be:
1) All read/write methods are *blocking*, returning empty slices on EOF.
2) Transports that support asynchronous IO should implement extended
interfaces like

interface AsyncInputTransport : UnbufferedInputTransport {
     void asyncRead(ubyte[] buffer,
                    void delegate(ubyte[] data) callback = null);
}
interface AsyncOutputTransport : UnbufferedOutputTransport {
     void asyncWrite(ubyte[] buffer,
                     void delegate(ubyte[] data) callback = null);
}

Where callback (if not null) is called with a slice of buffer containing
the bytes actually read/written on IO completion.
Any calls to read/asyncRead while an asynchronous IO operation is in
progress should throw, of course.
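
A small in-memory sketch of those rules, for illustration only. The `UnbufferedInputTransport` declared here is a cut-down stand-in, and `MemoryAsyncInput` with its `complete()` method (which plays the role of the OS signalling completion) is invented for the example.

```d
// Cut-down stand-ins for the interfaces discussed above.
interface UnbufferedInputTransport
{
    ubyte[] read(ubyte[] buffer); // blocking; empty slice at EOF
}

interface AsyncInputTransport : UnbufferedInputTransport
{
    void asyncRead(ubyte[] buffer,
                   void delegate(ubyte[] data) callback = null);
}

final class MemoryAsyncInput : AsyncInputTransport
{
    private const(ubyte)[] src;
    private ubyte[] pendingBuf;
    private void delegate(ubyte[]) pendingCb;
    private bool busy;

    this(const(ubyte)[] src) { this.src = src; }

    ubyte[] read(ubyte[] buffer)
    {
        if (busy) throw new Exception("asynchronous read in progress");
        return transfer(buffer);
    }

    void asyncRead(ubyte[] buffer,
                   void delegate(ubyte[] data) callback = null)
    {
        if (busy) throw new Exception("asynchronous read in progress");
        busy = true;
        pendingBuf = buffer;
        pendingCb = callback;
    }

    // Simulates the OS reporting that the outstanding request finished.
    void complete()
    {
        if (!busy) return;
        busy = false;
        auto got = transfer(pendingBuf);
        auto cb = pendingCb;
        pendingBuf = null;
        pendingCb = null;
        if (cb !is null) cb(got);
    }

    // Copies as much as fits and returns the filled part of buffer.
    private ubyte[] transfer(ubyte[] buffer)
    {
        immutable n = buffer.length < src.length ? buffer.length : src.length;
        buffer[0 .. n] = src[0 .. n];
        src = src[n .. $];
        return buffer[0 .. n];
    }
}
```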

Regarding buffering transports I agree with Steven, they shouldn't be
interfaces *derived* from Unbuffered...Transport.
On that note, the ubyte[] front property of whatever buffered
range-like construct we settle on should IMHO be blocking, since you
gain nothing by making front return empty slices or throw
exceptions in "would block" situations.

-- 
Dmitry Olshansky

Dec 30 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 30 Dec 2010 16:49:15 -0500, Dmitry Olshansky  
<dmitry.olsh gmail.com> wrote:

 [snip]
 -----
 Question: Should we allow read to return an empty slice even if atEnd  
 is false? If we do, we allow non-blocking streams with burst transfer.  
 However, naive client code on non-blocking streams will be inefficient  
 because it would essentially implement busy-waiting.

 Why not return an integer so different situations could be designated?   
 It's how the system call read works so you can tell no data was read  
 but that's because it's a non-blocking stream.

 I realize it's sexy to return the data again so it can be used  
 immediately, but in practice it's more useful to return an integer. For  
 example, if you want to fill a buffer, you need a loop anyways (there's  
 no guarantee that the first read will fill the buffer), and at that  
 point, you are just going to use the length member of the return value  
 to advance your loop.

 I'd say, return -1 if a non-blocking stream returns no data, 0 on EOF,  
 positive on data read, and throw an exception on error.

 Maybe it's only me but I would prefer non-blocking IO not mixed with  
 blocking in such a way. Imagine function that takes an  
 UnbufferedInputTransport, how should it indicate that it expects only a  
 non-blocking IO capable transport? Or the other way around. Checking  
 return codes hardly helps anything, and it means supporting both types  
 everywhere, which is a source of all kind of  weird problems.
  From my (somewhat limited) experience, code paths for blocking and  
 non-blocking IO are quite different, the latter are performed by  
 *special* asynchronous calls which are supported by all modern OSes for  
 things like files/sockets.

 Then my position would be:
 1) All read/write methods are *blocking*, returning empty slices on EOF.
 2) Transport that supports asynchronous IO should implement extended  
 interfaces like
 interface AsyncInputTransport: UnbufferedInputTransport{
      void asyncRead(ubyte[] buffer, void delegate(ubyte[] data)  
 callback=null);
 }
 interface AsyncOutputTransport: UnbufferedOutputTransport{
      void asyncWrite(ubyte[] buffer, void delegate(ubyte[] data)  
 callback=null);
 }
 Where callback (if not null) is called with a slice of buffer containing  
 actual read/written bytes on IO completion.
 Any calls to read/asyncRead while there is asynchronous IO operation  
 going on should throw, of course.

On Linux, you set the file descriptor to blocking or non-blocking, and
read(fd) fails with errno=EWOULDBLOCK when no data is available.  How does
this fit into your scheme?  I.e. if you call read() on an
AsyncInputTransport, what does it do when it gets this error?

It's quite possible that there is some API I'm unaware of for doing  
non-blocking and blocking I/O interleaved, but this has been my experience.

-Steve

Dec 30 2010

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 31.12.2010 1:14, Steven Schveighoffer wrote:
 On Thu, 30 Dec 2010 16:49:15 -0500, Dmitry Olshansky 
 <dmitry.olsh gmail.com> wrote:

 [snip]
 -----
 Question: Should we allow read to return an empty slice even if 
 atEnd is false? If we do, we allow non-blocking streams with burst 
 transfer. However, naive client code on non-blocking streams will be 
 inefficient because it would essentially implement busy-waiting.

 Why not return an integer so different situations could be 
 designated?  It's how the system call read works so you can tell no 
 data was read but that's because it's a non-blocking stream.

 I realize it's sexy to return the data again so it can be used 
 immediately, but in practice it's more useful to return an integer. 
 For example, if you want to fill a buffer, you need a loop anyways 
 (there's no guarantee that the first read will fill the buffer), and 
 at that point, you are just going to use the length member of the 
 return value to advance your loop.

 I'd say, return -1 if a non-blocking stream returns no data, 0 on 
 EOF, positive on data read, and throw an exception on error.

 Maybe it's only me but I would prefer non-blocking IO not mixed with 
 blocking in such a way. Imagine function that takes an 
 UnbufferedInputTransport, how should it indicate that it expects only 
 a non-blocking IO capable transport? Or the other way around. 
 Checking return codes hardly helps anything, and it means supporting 
 both types everywhere, which is a source of all kind of  weird problems.
  From my (somewhat limited) experience, code paths for blocking and 
 non-blocking IO are quite different, the latter are performed by 
 *special* asynchronous calls which are supported by all modern OSes 
 for things like files/sockets.

 Then my position would be:
 1) All read/write methods are *blocking*, returning empty slices on EOF.
 2) Transport that supports asynchronous IO should implement extended 
 interfaces like
 interface AsyncInputTransport: UnbufferedInputTransport{
      void asyncRead(ubyte[] buffer, void delegate(ubyte[] data) 
 callback=null);
 }
 interface AsyncOutputTransport: UnbufferedOutputTransport{
      void asyncWrite(ubyte[] buffer, void delegate(ubyte[] data) 
 callback=null);
 }
 Where callback (if not null) is called with a slice of buffer 
 containing actual read/written bytes on IO completion.
 Any calls to read/asyncRead while there is asynchronous IO operation 
 going on should throw, of course.

 On Linux, you set the file descriptor to blocking or non-blocking, and 
 read(fd) returns errno=EWOULDBLOCK when no data is available.  How 
 does this fit into your scheme?  I.e. if you call read() on a 
 AsyncInputTransport, what does it do when it gets this error?

The only general thing I can think of would be to suspend the thread
(Thread.yield from core.thread), then re-attempt. There may be better
platform-specific ways (in Win32 there is GetOverlappedResult with the
wait flag).

 It's quite possible that there is some API I'm unaware of for doing 
 non-blocking and blocking I/O interleaved, but this has been my 
 experience.

 -Steve

You are right, I was referring to the Win32 API, but I should have
revisited that part of the API docs. Just checked - indeed you should
specify the intent in the CreateFile call.  So if asynchronous IO is
chosen, then blocking IO could only be emulated as suggested above.
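
The yield-and-retry emulation could be sketched as follows. `TryRead`, `blockingRead` and `StallingSource` are made-up names, and the -1/0/positive convention is the one Steven proposed earlier in the thread. Note that this is still a polite form of busy-waiting.

```d
import core.thread : Thread;

// -1 = would block, 0 = end of stream, > 0 = bytes read.
alias TryRead = ptrdiff_t delegate(ubyte[] buf);

// Emulates a blocking read on top of a non-blocking one by yielding
// the time slice and retrying until data or end of stream arrives.
ptrdiff_t blockingRead(scope TryRead tryRead, ubyte[] buf)
{
    for (;;)
    {
        immutable n = tryRead(buf);
        if (n >= 0)
            return n;
        Thread.yield();
    }
}

// Test double: reports "would block" a fixed number of times,
// then delivers a single byte.
final class StallingSource
{
    int stalls;
    int calls;

    this(int stalls) { this.stalls = stalls; }

    ptrdiff_t tryRead(ubyte[] buf)
    {
        ++calls;
        if (stalls > 0) { --stalls; return -1; }
        buf[0] = 42;
        return 1;
    }
}
```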

-- 
Dmitry Olshansky

Dec 30 2010

Johannes Pfau <spam example.com> writes:

Steven Schveighoffer wrote:
On Thu, 30 Dec 2010 16:49:15 -0500, Dmitry Olshansky
<dmitry.olsh gmail.com> wrote:

 [snip]
 -----
 Question: Should we allow read to return an empty slice even if
 atEnd is false? If we do, we allow non-blocking streams with burst
 transfer. However, naive client code on non-blocking streams will
 be inefficient because it would essentially implement busy-waiting.

 Why not return an integer so different situations could be
 designated? It's how the system call read works so you can tell no
 data was read but that's because it's a non-blocking stream.

 I realize it's sexy to return the data again so it can be used
 immediately, but in practice it's more useful to return an integer.
 For example, if you want to fill a buffer, you need a loop anyways
 (there's no guarantee that the first read will fill the buffer),
 and at that point, you are just going to use the length member of
 the return value to advance your loop.

 I'd say, return -1 if a non-blocking stream returns no data, 0 on
 EOF, positive on data read, and throw an exception on error.

 Maybe it's only me but I would prefer non-blocking IO not mixed
 with blocking in such a way. Imagine function that takes an
 UnbufferedInputTransport, how should it indicate that it expects
 only a non-blocking IO capable transport? Or the other way around.
 Checking return codes hardly helps anything, and it means supporting
 both types everywhere, which is a source of all kind of  weird
 problems. From my (somewhat limited) experience, code paths for
 blocking and non-blocking IO are quite different, the latter are
 performed by *special* asynchronous calls which are supported by all
 modern OSes for things like files/sockets.

 Then my position would be:
 1) All read/write methods are *blocking*, returning empty slices on
 EOF. 2) Transport that supports asynchronous IO should implement
 extended interfaces like
 interface AsyncInputTransport: UnbufferedInputTransport{
      void asyncRead(ubyte[] buffer, void delegate(ubyte[] data)
 callback=null);
 }
 interface AsyncOutputTransport: UnbufferedOutputTransport{
      void asyncWrite(ubyte[] buffer, void delegate(ubyte[] data)
 callback=null);
 }
 Where callback (if not null) is called with a slice of buffer
 containing actual read/written bytes on IO completion.
 Any calls to read/asyncRead while there is asynchronous IO
 operation going on should throw, of course.

On Linux, you set the file descriptor to blocking or non-blocking,
and read(fd) returns errno=EWOULDBLOCK when no data is available.  How
does this fit into your scheme?  I.e. if you call read() on a
AsyncInputTransport, what does it do when it gets this error?

It's quite possible that there is some API I'm unaware of for doing
non-blocking and blocking I/O interleaved, but this has been my
experience.

-Steve

I think it's possible (libev: "If you cannot use non-blocking mode,
then force the use of a known-to-be-good backend (at the time of this
writing, this includes only EVBACKEND_SELECT and EVBACKEND_POLL)") but
it's usually not a good idea.
I wonder if the Async*Transport should inherit
Unbuffered*Transport or maybe just TransportBase. A transport which
supports asynchronous and synchronous IO could then inherit both
interfaces. If Async*Transport always inherits Unbuffered*Transport
we'll need some other way to check whether the transport really
supports synchronous reading.

-- 
Johannes Pfau

Dec 31 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 28 Dec 2010 02:02:29 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 I've put together over the past days an embryonic streaming interface.  
 It separates transport from formatting, input from output, and buffered  
 from unbuffered operation.

 http://erdani.com/d/phobos/std_stream2.html

 There are a number of questions interspersed. It would be great to start  
 a discussion using that design as a baseline. Please voice any related  
 thoughts - thanks!

One thing I just realized, the streams have no shared methods.  This means  
they cannot be used as e.g. stdout...

What are your thoughts on solving this?  I firmly believe that unshared  
streams should be a priority, but would there be some way to wrap an  
unshared stream as a shared stream with some added layer of locking?

-Steve

Dec 29 2010
