www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Streaming parsers in D a novel design

reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
A streaming parser is doing 2 things:
- waiting for input
- processing the current input and spitting out Events

This maps to D beautifully:

1. Waiting for input means it's an OutputRange with explicit put 
method!

2. Processing events means it's an InputRange of events. It may 
even be ForwardRange, mine first of this kind is Forward range.

Destroy!

--
Olshansky Dmitry
Apr 18 2023
next sibling parent reply Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Tuesday, 18 April 2023 at 08:13:05 UTC, Dmitry Olshansky wrote:
 A streaming parser is doing 2 things:
 - waiting for input
 - processing the current input and spitting out Events

 This maps to D beautifully:

 1. Waiting for input means it's an OutputRange with explicit 
 put method!

 2. Processing events means it's an InputRange of events. It may 
 even be ForwardRange, mine first of this kind is Forward range.

 Destroy!

 --
 Olshansky Dmitry
Sounds like a natural fit! I think it's interesting to explore related concepts that other languages have like their versions of iterators/ranges, but also streams, observables, interaction with promises (observables vs async iterables). Another related topic is validation vs parsing (converting data from unstructured input to a more structured form). I don't have time to discuss in detail at the moment, so I'll just share a few links. https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/ https://serokell.io/blog/parser-combinators-in-haskell http://csl.stanford.edu/~christos/pldi2010.fit/meijer.duality.pdf https://www.youtube.com/watch?v=sTSQlYX5DU0&ab_channel=ReactConference https://dev.solita.fi/2021/10/14/grokking-clojure-transducers.html https://www.youtube.com/watch?v=6mTbuzafcII https://hypirion.com/musings/recursive-transducers https://hypirion.com/musings/haskell-transducers https://www.youtube.com/watch?v=vohGJjGxtJQ&ab_channel=CppCon https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2300r5.html
Apr 18 2023
parent Sebastiaan Koppe <mail skoppe.eu> writes:
On Tuesday, 18 April 2023 at 13:01:40 UTC, Petar Kirov 
[ZombineDev] wrote:
 https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2300r5.html
For an implementation in D, see: https://github.com/symmetryinvestments/concurrency
Apr 18 2023
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Tue, Apr 18, 2023 at 08:13:05AM +0000, Dmitry Olshansky via Digitalmars-d
wrote:
 A streaming parser is doing 2 things:
 - waiting for input
 - processing the current input and spitting out Events
 
 This maps to D beautifully:
 
 1. Waiting for input means it's an OutputRange with explicit put
 method!
 
 2. Processing events means it's an InputRange of events. It may even
 be ForwardRange, mine first of this kind is Forward range.
[...] In practice, the input would be an input range consumed by the parser which returns an input range of tokens. IOW the parser is a filter that transforms an input range of characters into a range of tokens. You'd only need an output range interface if you needed to explicitly feed characters to the parser, such as if you were using an async I/O API and you just received a block of fresh input, and now you need to stuff the input into the parser. But even then, I wouldn't do it this way, because on the other end, the input range wouldn't be able to return a result if, say, the next block of input hasn't arrived yet. So .empty would have to block, which is an ugly design for something with an input range API. At the end of the day, I think treating a parser as a transformation (range of char -> range of tokens) is a better abstraction than trying to shoehorn it into something involved output ranges. T -- Nearly all men can stand adversity, but if you want to test a man's character, give him power. -- Abraham Lincoln
Apr 18 2023
next sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/18/23 7:24 AM, H. S. Teoh wrote:

 At the end of the day, I think treating a parser as a transformation
 (range of char -> range of tokens) is a better abstraction than trying
 to shoehorn it into something involved output ranges.
This is exactly how jsoniopipe works (though the input is not a "range" but an iopipe of course). The resulting thing is essentially a "range", but not with the range functions. The implementation doesn't quite fit the range paradigm. https://github.com/schveiguy/jsoniopipe -Steve
Apr 18 2023
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/18/2023 7:24 AM, H. S. Teoh wrote:
 In practice, the input would be an input range consumed by the parser
 which returns an input range of tokens.  IOW the parser is a filter that
 transforms an input range of characters into a range of tokens.
The lexer produces a stream of tokens, not the parser. The output of the parser is an AST.
Apr 18 2023
prev sibling next sibling parent reply Sebastiaan Koppe <mail skoppe.eu> writes:
On Tuesday, 18 April 2023 at 08:13:05 UTC, Dmitry Olshansky wrote:
 A streaming parser is doing 2 things:
 - waiting for input
 - processing the current input and spitting out Events

 This maps to D beautifully:

 1. Waiting for input means it's an OutputRange with explicit 
 put method!

 2. Processing events means it's an InputRange of events. It may 
 even be ForwardRange, mine first of this kind is Forward range.

 Destroy!

 --
 Olshansky Dmitry
I think it will work beautifully until the day you want to do it asynchronously.
Apr 18 2023
next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Tuesday, 18 April 2023 at 14:26:00 UTC, Sebastiaan Koppe wrote:
 On Tuesday, 18 April 2023 at 08:13:05 UTC, Dmitry Olshansky 
 wrote:
 A streaming parser is doing 2 things:
 - waiting for input
 - processing the current input and spitting out Events

 This maps to D beautifully:

 1. Waiting for input means it's an OutputRange with explicit 
 put method!

 2. Processing events means it's an InputRange of events. It 
 may even be ForwardRange, mine first of this kind is Forward 
 range.

 Destroy!

 --
 Olshansky Dmitry
I think it will work beautifully until the day you want to do it asynchronously.
Photon takes care of that: https://github.com/DmitryOlshansky/photon
Apr 18 2023
prev sibling parent reply Jacob Shtokolov <jacob.100205 gmail.com> writes:
On Tuesday, 18 April 2023 at 14:26:00 UTC, Sebastiaan Koppe wrote:
 On Tuesday, 18 April 2023 at 08:13:05 UTC, Dmitry Olshansky 
 wrote:
 A streaming parser is doing 2 things:
 - waiting for input
 - processing the current input and spitting out Events
I think it will work beautifully until the day you want to do it asynchronously.
Ranges can be async, in this case, you just need to return the Result type (pending|error|success) and provide hints to something that will be pulling from this range. It will look like an infinite range, but not fully infinite: it returns "success" once the result is ready. The sync version will always return `error|success` without the 'pending' phase. That kind of interface is similar to Rust's futures, providing poll-based primitives with some wakeup hints (callbacks).
Apr 19 2023
parent reply Sebastiaan Koppe <mail skoppe.eu> writes:
On Wednesday, 19 April 2023 at 16:07:18 UTC, Jacob Shtokolov 
wrote:
 On Tuesday, 18 April 2023 at 14:26:00 UTC, Sebastiaan Koppe 
 wrote:
 I think it will work beautifully until the day you want to do 
 it asynchronously.
Ranges can be async, in this case, you just need to return the Result type (pending|error|success) and provide hints to something that will be pulling from this range.
Maybe. But not without major hurdles. And you forgot (async) cancellation, which is a big deal, ask Rust.
Apr 19 2023
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Wednesday, 19 April 2023 at 19:13:25 UTC, Sebastiaan Koppe 
wrote:
 On Wednesday, 19 April 2023 at 16:07:18 UTC, Jacob Shtokolov 
 wrote:
 On Tuesday, 18 April 2023 at 14:26:00 UTC, Sebastiaan Koppe 
 wrote:
 I think it will work beautifully until the day you want to do 
 it asynchronously.
Ranges can be async, in this case, you just need to return the Result type (pending|error|success) and provide hints to something that will be pulling from this range.
Maybe. But not without major hurdles. And you forgot (async) cancellation, which is a big deal, ask Rust.
Cancelation is trivial, you can break a fiber at any safe point (where it yields) -- Dmitry Olshansky
Apr 19 2023
parent reply Sebastiaan Koppe <mail skoppe.eu> writes:
On Wednesday, 19 April 2023 at 19:50:35 UTC, Dmitry Olshansky 
wrote:
 Cancelation is trivial, you can break a fiber at any safe point 
 (where it yields)
Interrupting a fiber and cancelling work aren't the same thing. The latter might involve some cleanup, which itself might be async. Quote from Rust's wg-async https://rust-lang.github.io/wg-async/vision/roadmap/scopes.html#cancellation:
 In today's Rust, any async function can be synchronously 
 cancelled at any await point: the code simply stops executing, 
 and destructors are run for any extant variables. This leads to 
 a lot of bugs.
Also a blog post on uring and rust, https://without.boats/blog/io-uring/ Suffice to say its tricky.
Apr 19 2023
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Wednesday, 19 April 2023 at 20:46:23 UTC, Sebastiaan Koppe 
wrote:
 On Wednesday, 19 April 2023 at 19:50:35 UTC, Dmitry Olshansky 
 wrote:
 Cancelation is trivial, you can break a fiber at any safe 
 point (where it yields)
Interrupting a fiber and cancelling work aren't the same thing. The latter might involve some cleanup, which itself might be async. Quote from Rust's wg-async https://rust-lang.github.io/wg-async/vision/roadmap/scopes.html#cancellation:
 In today's Rust, any async function can be synchronously 
 cancelled at any await point: the code simply stops executing, 
 and destructors are run for any extant variables. This leads 
 to a lot of bugs.
I think your structured concurrency stuff would fit nicely in photon, maybe we should get in touch and discuss it.
 Also a blog post on uring and rust, 
 https://without.boats/blog/io-uring/

 Suffice to say its tricky.
Apr 19 2023
parent Sebastiaan Koppe <mail skoppe.eu> writes:
On Thursday, 20 April 2023 at 00:17:05 UTC, Dmitry Olshansky 
wrote:
 I think your structured concurrency stuff would fit nicely in 
 photon, maybe we should get in touch and discuss it.
I actually considered going with photon before starting to implement C++'s Senders/Receivers. Its an awesome idea, but what held me back was the portability concerns around fibers. WebAssembly doesn't support them for instance. We also needed support for async algorithms. Racing TCP connections is a classic example, where you initiate 2 connections in parallel and continue with whoever responds first. Key is doing that in a structured way, so there is no possibility of leaking something. There is probably a way we could integrate the 2 libs, I already have a MR for supporting D fibers, so we can probably extend it to whatever executor photon has. I bet there is a way to autowrap any regular blocking libraries that way, and get both structured concurrency and async syscalls. Interesting.
Apr 20 2023
prev sibling next sibling parent reply Adam D Ruppe <destructionator gmail.com> writes:
On Tuesday, 18 April 2023 at 08:13:05 UTC, Dmitry Olshansky wrote:
 This maps to D beautifully:
I recently wrote a new stream class that works kinda like this and thought about doing put() and front etc, but I wanted heterogeneous types. Still, it kinda did work. https://github.com/adamdruppe/arsd/blob/master/core.d#L5040 there's the test rn to show how it works
Apr 19 2023
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Wednesday, 19 April 2023 at 16:15:53 UTC, Adam D Ruppe wrote:
 On Tuesday, 18 April 2023 at 08:13:05 UTC, Dmitry Olshansky 
 wrote:
 This maps to D beautifully:
I recently wrote a new stream class that works kinda like this and thought about doing put() and front etc, but I wanted heterogeneous types. Still, it kinda did work. https://github.com/adamdruppe/arsd/blob/master/core.d#L5040 there's the test rn to show how it works
Cool stuff, as usual! -- Dmitry Olshansky
Apr 19 2023
prev sibling parent ikod <igor.khasilev gmail.com> writes:
On Tuesday, 18 April 2023 at 08:13:05 UTC, Dmitry Olshansky wrote:
 A streaming parser is doing 2 things:
 - waiting for input
 - processing the current input and spitting out Events

 This maps to D beautifully:

 1. Waiting for input means it's an OutputRange with explicit 
 put method!

 2. Processing events means it's an InputRange of events. It may 
 even be ForwardRange, mine first of this kind is Forward range.
This is how `requests` handle input stream from the remote end and convert it to "input stream" of http response body chunks. Input is a stream of bytes, arbitrary splitted by remote end or by network, etc. Moreover,response content can be gzipped, chunk-encoded and whatever else. So you have to process each network input with several "processors", like "inflate", "chunk-encoded decoder" and so on. +------------+ +--------------+ +---------+ +-----------------+ |input range | | chunk-encoded| | gzip | | input range | Net->-| of byte[] |->-| decoder |->-| decoder |->-| of response body|->- App | | | | | | | chunks: byte[] | +------------+ +--------------+ +---------+ +-----------------+ There is some problems like push all unprocessed data to the right end of the picture at the end of the network stream and maybe some other... But essentially it works. Immutability of underlying data on every step of this processing stream helps very much.
Apr 20 2023