www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - associative arrays: to sort or not to sort?

reply Mario Kroeplin <kroeplin.d googlemail.com> writes:
Have a look at the unittest of std.json: the only non-trivial test is commented
out as "currently broken".
Smells like std.json has deep problems when it comes to real-world examples.
But then: running the unittest shows that the actual result is
{"goodbye":[true,"or",false,["test",42,{"nested":{"a":23.54,"b":0.0012}}]],"hello":{"json":"is
great","array":[12,null,{}]}}.
The two name/value pairs just appear in different order than expected.
That's because "goodbye" < "hello", if you make an illegal assumption for
associative arrays, as "the key/value slots are iterated in an unspecified
order." [tdpl]

Now the question:

Shall the "currently broken" unittest be "fixed" by expecting the actual
result, with just the order of "goodbye" and "hello" changed?
That is, shall we silently make the illegal assumption, that foreach provides
the keys in sorted order.
Or, shall the foreach (key, value; aa) be replaced with a more clumsy foreach
(key; aa.keys.sort)?
That is, shall we produce "canonical" JSON text at the price of efficiency.
Or, shall the perfect implementation of JSON objects as associative arrays be
dropped?
Or, what else?

By the way: the example before the "currently broken" one is {"a":1,"b":null};
also broken in theory, but not (yet) in practice!
Jul 18 2010
parent reply BCS <none anon.com> writes:
Hello Mario,


 That is, shall we produce "canonical" JSON text at the price of
 efficiency.
 
 Or, shall the perfect implementation of JSON objects as associative
 arrays be dropped?
 
 Or, what else?
 

Unless JSON requiers that the keys be in some order, the correct solution is to make the check order independent. For example: string s = CallReturningJSON(); s = replace(s,`"a":23.54`, `X`); s = replace(s,`"b":0.0012`, `X`); s = replace(s,`{"nested":{X,X}}`, `X`); s = replace(s,`"goodbye":[true,"or",false,["test",42,X]]`, `X`); s = replace(s,`"array":[12,null,{}]`, `Y`); s = replace(s,"is\n\ngreat", `Z`); s = replace(s,`"json":Z`, `Y`); s = replace(s,`"hello":{Y,Y}`, `X`); assert(s == `{X,X}`);. -- ... <IXOYE><
Jul 18 2010
parent reply Mario Kroeplin <kroeplin.d googlemail.com> writes:
 Unless JSON requiers that the keys be in some order,

No, JSON does not require the names of an object to be in alphabetical order.
 the correct solution is to make the check order independent. For example:
 string s = CallReturningJSON();
 s = replace(s,`"a":23.54`, `X`);
 s = replace(s,`"b":0.0012`, `X`);
 s = replace(s,`{"nested":{X,X}}`, `X`);
 s = replace(s,`"goodbye":[true,"or",false,["test",42,X]]`, `X`);
 s = replace(s,`"array":[12,null,{}]`, `Y`);
 s = replace(s,"is\n\ngreat", `Z`);
 s = replace(s,`"json":Z`, `Y`);
 s = replace(s,`"hello":{Y,Y}`, `X`);
 assert(s == `{X,X}`);.

But this is lengthy and seems to lack additional assert checks. One way out could be to avoid real-world examples in the unittest. For the simple example of std.json that is only "broken in theory", the correct solution could be: assert(s == `{"a":1,"b":null}` || s == `{"b":null,"a":1}`) But now assume, I want to implement a JSON-RPC encoder that uses std.json. A JSON-RPC request object has up to four name/value pairs ("jsonrpc", "method", "params", "id"). Instead of only 2!, I now have 4! (too many) possible orders to check in the unittest of the JSON-RPC encoder. The unspecified order of the foreach iteration does not only affect the unittest of std.json, but also leaks out to far away implementations using std.json. The same is true for std.xml: assert(s == `<tag goodbye="1" hello="0"/>`); may pass today, but is broken in theory! You have to notice that XML attributes are stored in an associative array and that toString is implemented using foreach. Until then, broken unit tests pass... This is a problem, isn't it?
Jul 20 2010
parent reply BCS <none anon.com> writes:
Hello Mario,

 Unless JSON requiers that the keys be in some order,
 

order.
 the correct solution is to make the check order independent. For
 example:
 string s = CallReturningJSON();
 s = replace(s,`"a":23.54`, `X`);
 s = replace(s,`"b":0.0012`, `X`);
 s = replace(s,`{"nested":{X,X}}`, `X`);
 s = replace(s,`"goodbye":[true,"or",false,["test",42,X]]`, `X`);
 s = replace(s,`"array":[12,null,{}]`, `Y`);
 s = replace(s,"is\n\ngreat", `Z`);
 s = replace(s,`"json":Z`, `Y`);
 s = replace(s,`"hello":{Y,Y}`, `X`);
 assert(s == `{X,X}`);.

But this is lengthy and seems to lack additional assert checks.

It's a little wordy but it has one line per things it is checking. And adjusting it to give more asserts is trivial.
 
 Until then, broken unit tests pass...
 

Spot on. The unittest is broken. Not the thing it's testing and not foreach. -- ... <IXOYE><
Jul 21 2010
parent reply Mario Kroeplin <kroeplin.d googlemail.com> writes:
 the correct solution is to make the check order independent. For
 example:
 string s = CallReturningJSON();
 s = replace(s,`"a":23.54`, `X`);
 s = replace(s,`"b":0.0012`, `X`);
 s = replace(s,`{"nested":{X,X}}`, `X`);
 s = replace(s,`"goodbye":[true,"or",false,["test",42,X]]`, `X`);
 s = replace(s,`"array":[12,null,{}]`, `Y`);
 s = replace(s,"is\n\ngreat", `Z`);
 s = replace(s,`"json":Z`, `Y`);
 s = replace(s,`"hello":{Y,Y}`, `X`);
 assert(s == `{X,X}`);.

But this is lengthy and seems to lack additional assert checks.


The order of name/value pairs in JSON objects is as unspecified as the order of foreach iteration for associative arrays. So, agreed, you would have to unittest a serialization against all permutations. But then, JSON has a jew more unspecified gaps like "whitespace can be inserted between any pair of tokens". Shall we rely on the fact that the implementation currently does not insert whitespace between tokens? This would work for the unittest of std.json, but what's with the unittest of an implementation using std.json? Or, shall we implement a tokenizer in the unittest to get rid of extra whitespace between tokens?
 Until then, broken unit tests pass...


Maybe, the thing under test is not broken. But there seems to be something wrong with the thing when it forces you to examine the unspecified result of toString in a unittest. I mean, it would have been easy for the author of the code to provide toCanonicalJSON, and a lot easier for users of the code. And, of course, foreach is not broken. But D removes a lot of unspecified gaps that make life hard in C or C++. And you have to provide opCmp in order to put your own keys into an associative array, so why don't you get them out in order? However, let's get things done: can the attached unittest be an acceptatble replacement for the broken one of std.json?
Jul 21 2010
parent BCS <none anon.com> writes:
Hello Mario,

 But then, JSON has a jew more unspecified gaps like "whitespace can be
 inserted between any pair of tokens".
 

That can be dealt with by just being consistent.
 Shall we rely on the fact that the implementation currently does not
 insert whitespace between tokens?

On the output side, why not? The unit test is for this implementation after all.
 And you have to provide opCmp in order to put your own keys into an
 associative array, so why don't you get them out in order?

That's because AA's use closed hashing with tree used to deal with col3sions.
 
 However, let's get things done: can the attached unittest be an
 acceptatble replacement for the broken one of std.json?
 

Move line 35 and 38 up one place each and you can drop the check for permutations. -- ... <IXOYE><
Jul 21 2010