digitalmars.D - stdx.data.json needs a layer on top

reply "Laeeth Isharc" <laeethnospam nospamlaeeth.com> writes:
It's great, but it's not quite a replacement for std.json, as I 
see it.

The stream parser is fast, and it's valuable to be able to access 
it at a low level.

However, it was consciously designed to be low-level, and for 
something else to go on top.

As I understand it, there is a gap between what you can currently 
do with std.json (and indeed vibe.d's JSON) and what you can do 
with stdx.data.json.  And the capability falls short of what can 
be done in other standard libraries such as Python's.

So since we are going for a nuclear-power-station-included 
approach, does that not mean that we need to specify what this 
layer should do, and somebody should start to work on it?
Jun 23 2015
next sibling parent reply Rikki Cattermole <alphaglosined gmail.com> writes:
On 24/06/2015 12:17 a.m., Laeeth Isharc wrote:
 It's great, but it's not quite a replacement for std.json, as I see it.

 The stream parser is fast, and it's valuable to be able to access it at
 a low level.

 However, it was consciously designed to be low-level, and for something
 else to go on top.

 As I understand it, there is a gap between what you can currently do
 with std.json (and indeed vibed json) and what you can do with
 stdx.data.json.  And the capability falls short of what can be done in
 other standard libraries such as the ones for python.

 So since we are going for a nuclear-power station included approach,
 does that not mean that we need to specify what this layer should do,
 and somebody should start to work on it?
Please come onto https://www.livecoding.tv/alphaglosined/ and hang out for half an hour. I want to show you something related.
Jun 23 2015
parent reply "Laeeth Isharc" <laeethnospam nospamlaeeth.com> writes:
On Tuesday, 23 June 2015 at 12:28:00 UTC, Rikki Cattermole wrote:
 Please come onto https://www.livecoding.tv/alphaglosined/ and 
 hang out for half an hour. I want to show you something related.
What times GMT or BST are good for you?
Jun 23 2015
parent Rikki Cattermole <alphaglosined gmail.com> writes:
On 24/06/2015 7:05 a.m., Laeeth Isharc wrote:
 On Tuesday, 23 June 2015 at 12:28:00 UTC, Rikki Cattermole wrote:
 Please come onto https://www.livecoding.tv/alphaglosined/ and hang out
 for half an hour. I want to show you something related.
 What times GMT or BST are good for you?
12pm UTC+0 is when I aim to stream, and hopefully I'll stream again tonight, although I'm getting a bit tired after streaming for three days straight (usually it's only twice a week). Follow me, or keep an eye on livecodingtv on Twitter, to know when I start.
Jun 23 2015
prev sibling parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 23.06.2015 at 14:17, Laeeth Isharc wrote:
 It's great, but it's not quite a replacement for std.json, as I see it.

 The stream parser is fast, and it's valuable to be able to access it at
 a low level.

 However, it was consciously designed to be low-level, and for something
 else to go on top.

 As I understand it, there is a gap between what you can currently do
 with std.json (and indeed vibed json) and what you can do with
 stdx.data.json.  And the capability falls short of what can be done in
 other standard libraries such as the ones for python.

 So since we are going for a nuclear-power station included approach,
 does that not mean that we need to specify what this layer should do,
 and somebody should start to work on it?
One thing, which I consider the most important missing building block, is Jacob's anticipated std.serialization module [1]*. Skipping the data representation layer and going straight for statically typed access to the data is the way to go in a language such as D, at least in most situations.

Another part is a high-level layer on top of the stream parser that has existed for a while (albeit with room for improvement), but that I forgot to update the documentation for. I've now caught up on that and it can be found under [2] - see the read[...] and skip[...] functions (a rough usage sketch follows at the end of this message).

Do you, or anyone else, have further ideas for higher-level functionality, or any concrete examples in other standard libraries?

[1]: https://github.com/jacob-carlborg/orange
[2]: http://s-ludwig.github.io/std_data_json/stdx/data/json/parser.html

* Or any other suitable replacement, if that doesn't work out for some reason. The vibe.data.serialization module to me is not a suitable candidate as it stands, because it lacks some features of Jacob's solution, such as proper handling of (duplicate/interior) references. But it's a perfect fit for my own class of problems, so I currently can't justify putting work into this either.
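To make that concrete, here is a rough sketch of how code against that layer might look. parseJSONStream and the read[...]/skip[...] family are from [2], but the exact signatures used here (readObject with a key delegate, readArray with an element delegate, readDouble) are assumptions for illustration, not the documented API:

    import stdx.data.json.parser;

    double sumPrices(string text)
    {
        auto nodes = parseJSONStream(text);
        double total = 0;

        // Walk the top-level object, consuming only the one field we
        // care about and skipping everything else without building a DOM.
        nodes.readObject((string key) {
            if (key == "prices")   // field name invented for the example
                nodes.readArray({ total += nodes.readDouble(); });
            else
                nodes.skipValue();
        });

        return total;
    }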
Jun 23 2015
next sibling parent reply "Laeeth Isharc" <laeethnospam nospamlaeeth.com> writes:
On Tuesday, 23 June 2015 at 14:06:38 UTC, Sönke Ludwig wrote:
 As I understand it, there is a gap between what you can currently
 do with std.json (and indeed vibe.d's JSON) and what you can do
 with stdx.data.json.  And the capability falls short of what can
 be done in other standard libraries such as Python's.

 So since we are going for a nuclear-power-station-included
 approach, does that not mean that we need to specify what this
 layer should do, and somebody should start to work on it?
 One thing, which I consider the most important missing building block, is Jacob's anticipated std.serialization module [1]*. Skipping the data representation layer and going straight for statically typed access to the data is the way to go in a language such as D, at least in most situations.
Thanks, Sönke. I appreciate your taking the time to reply, and I hope I represented my understanding of things correctly.

I think things often get stuck in limbo because people don't know what's most useful, so I do think a central list of "things that need to be done" in the D ecosystem might be nice, if it doesn't become excessively structured and bureaucratic. (I ain't volunteering to maintain it, as I can't commit to it.)

Thing is, there are different use cases. For example, I pull data from Quandl - the metadata is standard and won't change in format often, but the data for a particular series will. For example, if I pull volatility data, that will have different fields from price or economic data. And I don't know beforehand the total set of possibilities. This must be quite a common use case, and indeed I just hit another one recently with a poorly-documented internal corporate database for securities.

Maybe it's fine to generate the static typing in response to reading the data, but then it ought (ultimately) to be easy to do so. Because otherwise you hack something up in Python because it's just easier, and that hack job becomes the basis for something larger than you ever intended or wanted, and it's never worth rewriting given the other stuff you need. But even if you prefer static typing generated on the fly (which maybe becomes useful via introspection a la Alexandrescu's talk), sometimes one will prefer dynamic typing, and since it's easy to provide in a way that doesn't destroy the elegance and coherence of the whole project, why not give people the option?

It seems to me that Guido painted a target on Python by saying "it's fast enough, and you are usually I/O etc. bound", because the numerical computing people have different needs. So BLAS and the like may be part of that, but having something like pandas - and the ability to get data in and out of it - would also be an important part of making it easy and fun to use D for this purpose, and it's not so hard to do, just a fair bit of work. Not that it makes sense to undergo a death march to duplicate Python functionality, but there are some things that are relatively easy and have a high payoff - like John Colvin's pydmagic.

(The link here, which may not be so obvious, is that pandas is in a way a kind of replacement for a spreadsheet, and being able to just pull stuff in without minding your 'p's and 'q's to get a quick result lends itself to the kind of iterative exploration that keeps spreadsheets overused even today. And that's the link to JSON and (de)serialization.)
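Coming back to generating the static typing from the data itself: as a toy sketch (using today's std.json; the "Volatility" name and the field names are invented for illustration), one could derive a struct declaration from a sample object like this:

    import std.json : JSONValue, JSON_TYPE, parseJSON;
    import std.stdio : writefln, writeln;

    // Print a D struct declaration inferred from one sample JSON object.
    void printStructFor(string name, JSONValue sample)
    {
        writefln("struct %s {", name);
        foreach (key, val; sample.object)
        {
            string t;
            switch (val.type)
            {
                case JSON_TYPE.STRING:  t = "string"; break;
                case JSON_TYPE.INTEGER: t = "long";   break;
                case JSON_TYPE.FLOAT:   t = "double"; break;
                case JSON_TYPE.TRUE:
                case JSON_TYPE.FALSE:   t = "bool";   break;
                default:                t = "JSONValue"; break; // nested/unknown
            }
            writefln("    %s %s;", t, key);
        }
        writeln("}");
    }

    void main()
    {
        printStructFor("Volatility",
            parseJSON(`{"date":"2015-06-23","implied_vol":17.5,"contracts":1200}`));
    }

A real tool would need to handle nesting and look at more than one sample, but even this much would remove a layer of friction.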
 Another part is a high level layer on top of the stream parser 
 that exists for a while (albeit with room for improvement), but 
 that I forgot to update the documentation for. I've now caught 
 up on that and it can be found under [2] - see the read[...] 
 and skip[...] functions.
Thank you for the link.
 Do you, or anyone else, have further ideas for higher level 
 functionality, or any concrete examples in other standard 
 libraries?
Will think it through and try to come up with some simple examples. Paging John Colvin and Russell Winder, too.
 * Or any other suitable replacement, if that doesn't work out 
 for some reason. The vibe.data.serialization module to me is 
 not a suitable candidate as it stands, because it lacks some 
 features of Jacob's solution, such as proper handling of 
 (duplicate/interior) references. But it's a perfect fit for my 
 own class of problems, so I currently can't justify to put work 
 into this either.
Is it worth you or someone else articulating what it does well that is missing from stdx.data.json?
Jun 23 2015
parent reply Jacob Carlborg <doob me.com> writes:
On 23/06/15 21:22, Laeeth Isharc wrote:

 Thing is there are different use cases.  For example, I pull data from
 Quandl - the metadata is standard and won't change in format often; but
 the data for a particular series will.  For example if I pull volatility
 data that will have different fields to price or economic data.  And I
 don't know beforehand the total set of possibilities.  This must be
 quite a common use case, and indeed I just hit another one recently with
 a poorly-documented internal corporate database for securities.
If the data can change between calls or is not consistent, my serialization library is not a good fit. But if the data is consistent and only changes over time, something like once a month, my serialization library could work, provided you update the data structures when the data changes. It can also work with optional fields if custom serialization is used.

--
/Jacob Carlborg
Jun 24 2015
parent reply "Laeeth Isharc" <nospamlaeeth nospam.laeeth.com> writes:
On Wednesday, 24 June 2015 at 13:15:52 UTC, Jacob Carlborg wrote:
 On 23/06/15 21:22, Laeeth Isharc wrote:

 Thing is, there are different use cases.  For example, I pull data
 from Quandl - the metadata is standard and won't change in format
 often, but the data for a particular series will.  For example, if
 I pull volatility data, that will have different fields from price
 or economic data.  And I don't know beforehand the total set of
 possibilities.  This must be quite a common use case, and indeed I
 just hit another one recently with a poorly-documented internal
 corporate database for securities.
 If the data can change between calls or is not consistent, my serialization library is not a good fit. But if the data is consistent and only changes over time, something like once a month, my serialization library could work, provided you update the data structures when the data changes. It can also work with optional fields if custom serialization is used.
Thanks, Jacob.

Some series shouldn't change too often. On the other hand, just with Quandl that is 10 million data series, taken from a whole range of different sources, some of them rather unfinished, and it's hard to know. My needs are not relevant for the library, except that I think people often want to explore new data sets iteratively (over the course of weeks and months). Of course it doesn't take long to write the struct (or make something that will write it given the data and some guidance), but that's one more layer of friction.

So from the perspective of D succeeding, I would think giving people the option of using static or dynamic typing as they prefer would pay off - within a coherent framework, so not using one library here and another there, when in other language ecosystems this is not fragmented.

I don't know if you have looked at pandas and the IPython notebook much. But now that one can call D code from the IPython notebook (again, a 'trivial' piece of glue, but ingenious - removing this small friction makes getting work done much easier), maybe having the option of dynamic types with JSON will have more value. See here, as one simple example:
http://nbviewer.ipython.org/gist/wesm/4757075/PandasTour.ipynb

So it would be nice to be able to do something like Adam Ruppe does here:
https://github.com/adamdruppe/arsd/blob/master/jsvar.d

    var j = json!q{
        "hello": {
            "data":[1,2,"giggle",4]
        },
        "world":20
    };

    writeln(j.hello.data[2]);

Obviously the scope is outside a serialization library, but I'm just thinking about the broader integrated and coherent library offering we should have.
Jun 24 2015
next sibling parent Jacob Carlborg <doob me.com> writes:
On 24/06/15 15:48, Laeeth Isharc wrote:

 So it would be nice to be able to something like Adam Ruppe does here:
 https://github.com/adamdruppe/arsd/blob/master/jsvar.d

 var j = json!q{
          "hello": {
              "data":[1,2,"giggle",4]
          },
          "world":20
      };

      writeln(j.hello.data[2]);

 Obviously the scope is outside a serialization library, but just
 thinking about the broader integrated and coherent library offering we
 should have.
I understand, and I agree it would be nice to have.

--
/Jacob Carlborg
Jun 24 2015
prev sibling parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 24.06.2015 at 15:48, Laeeth Isharc wrote:
 So it would be nice to be able to something like Adam Ruppe does here:
 https://github.com/adamdruppe/arsd/blob/master/jsvar.d

 var j = json!q{
          "hello": {
              "data":[1,2,"giggle",4]
          },
          "world":20
      };

      writeln(j.hello.data[2]);

 Obviously the scope is outside a serialization library, but just
 thinking about the broader integrated and coherent library offering we
 should have.
This is very close to what I had done initially for vibe.d's "Json" struct. However, this approach requires adding opDispatch with an unbounded input domain. This in turn means that *any* change of the normal members in the Json struct is a potential silent breaking change. In particular, it then de facto becomes impossible to add new methods.

Another issue that has come up is that such a struct passes all kinds of duck-typing tests, so that for example it was considered to be an input range when it really isn't. This can be an issue for things like a serialization library that doesn't include a special case for this type.

Finally, although this is partially a matter of taste, I personally found that using the member access syntax can lead to the wrong (subconscious) impression that these members were statically declared. I suspect that this makes it more likely that bugs caused by missing field existence checks slip into the source, as well as making it more difficult for the developer to detect typos (member access *looks* like normal statically checked code, so it's easy to overlook typos there).

For these reasons, the code with the proposed JSONValue [1] becomes a little more verbose, requiring index-based access instead:

    auto j = parseJSONValue(q{
        "hello": {
            "data":[1,2,"giggle",4]
        },
        "world":20
    });

    writeln(j["hello"]["data"][2]);

There is also a method to safely (without causing exceptions) iterate down a path within the JSON DOM, when parts of the path might be missing:

    writeln(j.opt("hello", "data")[2]);

JSONValue is backed by a std.variant.Algebraic, which has the advantage of getting the operators for free. It also means that JSONValue will automatically be compatible with other similar value types, such as a potential BSONValue (which has more types to choose from).

[1]: http://s-ludwig.github.io/std_data_json/stdx/data/json/value/JSONValue.html
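To make the duck-typing problem concrete, here is a minimal reduction (not vibe.d's actual Json code) showing how an unbounded opDispatch plus a bool conversion fools compile-only member checks:

    struct Var
    {
        // Unbounded member access: any name resolves and yields another Var.
        Var opDispatch(string name, Args...)(Args args) { return Var.init; }

        // A JSON variant type typically converts to bool (e.g. for `if (v)`).
        bool opCast(T : bool)() const { return false; }
    }

    // empty, front and popFront all "exist" via opDispatch, so duck-typed
    // checks that only test whether the range primitives compile (as
    // std.range's isInputRange did at the time) are satisfied, even though
    // Var is not a range at all:
    static assert(__traits(compiles, {
        Var v;
        if (v.empty) {}
        auto f = v.front;
        v.popFront();
    }));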
Jun 24 2015
parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 25.06.2015 at 07:52, Sönke Ludwig wrote:
 (...)
      auto j = parseJSONValue(q{
Should have been "toJSONValue".
Jun 24 2015
prev sibling parent reply Martin Nowak <code+news.digitalmars dawg.eu> writes:
On 06/23/2015 04:06 PM, Sönke Ludwig wrote:
 
 Do you, or anyone else, have further ideas for higher level
 functionality, or any concrete examples in other standard libraries?
Being able to lazily foreach over elements would be nice.

    foreach (elem; nodes.readArray) {
        // each elem would be a bounded node stream (range)
        foreach (key, value; elem.readObject) {
        }
    }
Jun 24 2015
parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 24.06.2015 at 23:50, Martin Nowak wrote:
 On 06/23/2015 04:06 PM, Sönke Ludwig wrote:
 Do you, or anyone else, have further ideas for higher level
 functionality, or any concrete examples in other standard libraries?
 Being able to lazily foreach over elements would be nice.

     foreach (elem; nodes.readArray) {
         // each elem would be a bounded node stream (range)
         foreach (key, value; elem.readObject) {
         }
     }
An initial version of readArray is up for discussion:
https://github.com/s-ludwig/std_data_json/blob/3efc0600b4f8598dd6ccf897d6140d3351b5ee84/source/stdx/data/json/parser.d#L955

Unfortunately it is @system, because the reference to the input range gets escaped. The "VR" struct also has to be non-copyable, to avoid its "depth" field getting out of sync when it gets copied around.
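For illustration, a hypothetical reduction of the copying problem (names invented; this is not the actual code behind the link above):

    // Two copies sharing one parser would each track their own depth and
    // silently desynchronize as soon as one of them is advanced.
    struct BoundedNodeStream(Input)
    {
        private Input* m_input;   // escaped reference to the input -> @system
        private size_t m_depth = 1;

        @disable this(this);      // copying would fork m_depth while both
                                  // copies still advance the same *m_input

        bool empty() const { return m_depth == 0; }

        // front/popFront would read nodes from *m_input, incrementing
        // m_depth at object/array start nodes and decrementing at end nodes.
    }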
Jun 25 2015