digitalmars.D - Request for pre-review: std.serialization/orange

Jacob Carlborg (38/38) Sep 29 2011 I would like to have some form of pre-review of my serialization library...

Tobias Pankrath (7/10) Sep 29 2011 There should be something in Phobos. This should involve two steps.

Jacob Carlborg (6/16) Sep 29 2011 My unit test framework does two things, provides a some form of context

Jonathan M Davis (14/21) Sep 29 2011 I very much doubt that it would be accepted. assertPred failed to make i...

Jacob Carlborg (19/40) Sep 30 2011 That's why I'm asking. BTW, if we had a proper unit test framework it

Jonathan M Davis (17/68) Sep 30 2011 That can be done easily enough with a version block with a version speci...

Jacob Carlborg (9/20) Sep 30 2011 Ok, I see.

Robert Jacques (12/14) Sep 30 2011 [snip]

Jacob Carlborg (17/42) Sep 30 2011 I don't know why but I thought Type.stringof == typeid(Type).toString

Robert Jacques (16/38) Sep 30 2011 I agree, which is why I suggested lookup should have some granuality. i....

Jacob Carlborg (13/34) Oct 01 2011 Aah, now I get it. That's a good idea. The question is what to name the

Robert Jacques (4/35) Oct 02 2011 How about overrideSerializer or overloadSerializer?

Jacob Carlborg (10/23) Oct 03 2011 registerSerializer for the static method and

Robert Jacques (2/18) Oct 03 2011 Yes. Sorry for being unclear. The concept being that at the instance lev...

Jacob Carlborg (4/25) Oct 03 2011 Yes, thanks, that makes sense.

Robert Jacques (17/19) Sep 30 2011 [snip]

Jacob Carlborg (25/57) Oct 01 2011 I was hoping the Archive interface and the Base abstract class would be

Robert Jacques (28/49) Oct 01 2011 For the pre-review its okay. But you'll need it for the actual review.

Jacob Carlborg (22/78) Oct 02 2011 Ok, it will be the same documentation as for Archive and Base. Ddoc

Robert Jacques (7/76) Oct 03 2011 So, in essence, you are saying that by the time archiving occurs, isSlic...

Jacob Carlborg (42/70) Oct 03 2011 No, I'm not saying that. Example:

Robert Jacques (57/126) Oct 03 2011 Regarding design, I agree, although I'd go one further and define Array ...

Jacob Carlborg (26/98) Oct 04 2011 Any suggestion how to fix this, how to properly detect if an array is a

Robert Jacques (16/55) Oct 04 2011 Well, there are two problems. First, there is an API issue: When you pas...

Jacob Carlborg (8/36) Oct 04 2011 Hmm, D1 and D2 behaves differently in this case. In D1 "a" is not

Robert Jacques (2/19) Oct 04 2011 Appending has never mandated reallocation. Maybe you're confusing it wit...

Jacob Carlborg (4/27) Oct 04 2011 Yeah, maybe you're right.

Jacob Carlborg <doob me.com> writes:

I would like to have some form of pre-review of my serialization library 
Orange for later inclusion in Phobos as std.serialization (or similar).

The reason for why I would like to have a pre-review is that Orange, in 
its current state, supports both D1/Tango and D2/Phobos. I don't want to 
spend a lot of time in interacting Orange with Phobos if it have no 
chance of getting accepted into Phobos.

I'm hoping with can do like a regular review with the assumption that if 
it gets accepted into Phobos I will remove all D1/Tango specific code, 
integrate it properly with Phobos and do a second regular review.

A couple of notes for the review:

* The most important packages are: orange.serialization and 
orange.serialization.archives

* I will not accept having just one module for this library. I'm hoping 
to have a package called "std.serialization" or similar. The question is 
if there needs to be a (sub)package for the archives as well.

* For the unit tests I've used my own kind of micro unit test framework 
(that is included). Is that something we want to have in general in 
Phobos so other modules can take advantage of that? Or should I just rip 
out the framework?

* The unit tests are located in its own package, I'm not very happy 
about putting the unit tests in the same module as the rest of the code, 
i.e. the serialization module

* I'm using some utility functions located in the "util" and "core" 
packages, what should we do about those, where to put them?

* I'm not using all of the functions/modules in the above mentioned 
utility packages

For usage examples, see the project page: 
http://dsource.org/projects/orange/wiki/Tutorials

For more extended usage examples, see the unit tests: 
https://github.com/jacob-carlborg/orange/tree/master/tests

Sources: https://github.com/jacob-carlborg/orange
Project page: http://dsource.org/projects/orange/
Documentation: 
http://dl.dropbox.com/u/18386187/orange_docs/orange.serialization.Serializer.html

(Don't forget clicking the "Package" tab in the top corner to see the 
documentation for the rest of the modules)

-- 
/Jacob Carlborg

Sep 29 2011

Tobias Pankrath <tobias pankrath.net> writes:

 * For the unit tests I've used my own kind of micro unit test framework
 (that is included). Is that something we want to have in general in
 Phobos so other modules can take advantage of that? 

There should be something in Phobos. This should involve two steps.

First define a protocol, so that multiple unittest libraries can
be used in one project (i.e a library uses A and another B). Than 
write a library on top of this.

My suggestion for a protocol would be something simple as:

1. There should only be one approach to unittest used per module
2. If you replace the test runner, call the old one first.

Sep 29 2011

Jacob Carlborg <doob me.com> writes:

On 2011-09-29 21:42, Tobias Pankrath wrote:
 * For the unit tests I've used my own kind of micro unit test framework
 (that is included). Is that something we want to have in general in
 Phobos so other modules can take advantage of that?

 There should be something in Phobos. This should involve two steps.

 First define a protocol, so that multiple unittest libraries can
 be used in one project (i.e a library uses A and another B). Than
 write a library on top of this.

 My suggestion for a protocol would be something simple as:

 1. There should only be one approach to unittest used per module
 2. If you replace the test runner, call the old one first.

My unit test framework does two things, provides a some form of context 
for the tests; and it catches all assert exceptions to let other tests 
continue. It also gives a quite nice report of what tests failed.

-- 
/Jacob Carlborg

Sep 29 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Thursday, September 29, 2011 20:58:30 Jacob Carlborg wrote:
 * For the unit tests I've used my own kind of micro unit test framework
 (that is included). Is that something we want to have in general in
 Phobos so other modules can take advantage of that? Or should I just rip
 out the framework?

I very much doubt that it would be accepted. assertPred failed to make it in 
spite of the various benefits that it provided, and a number of folks seem to 
be against more complicated unit testing features making it into Phobos. Not 
to mention, there's already some discussion of the unit tests taking too long. 
We aren't going to want anything that adds yet more overhead.

 * The unit tests are located in its own package, I'm not very happy
 about putting the unit tests in the same module as the rest of the code,
 i.e. the serialization module

Well, that's the way that Phobos does it, and it's essentially how D's unit 
tests are intended to work (though obviously, you can work around that if you 
really want to). Phobos' makefiles are set up with the idea that each module's 
unit tests are included in that module. I'm not sure that it's guaranteed that 
we wouldn't allow the unit tests to be separate, but that's not how any of the 
other Phobos unit tests works. And personally, I find that the unit tests are 
far more maintainable when they're right next to the functions that they test.

- Jonathan M Davis

Sep 29 2011

Jacob Carlborg <doob me.com> writes:

On 2011-09-30 08:39, Jonathan M Davis wrote:
 On Thursday, September 29, 2011 20:58:30 Jacob Carlborg wrote:
 * For the unit tests I've used my own kind of micro unit test framework
 (that is included). Is that something we want to have in general in
 Phobos so other modules can take advantage of that? Or should I just rip
 out the framework?

 I very much doubt that it would be accepted. assertPred failed to make it in
 spite of the various benefits that it provided, and a number of folks seem to
 be against more complicated unit testing features making it into Phobos. Not
 to mention, there's already some discussion of the unit tests taking too long.
 We aren't going to want anything that adds yet more overhead.

That's why I'm asking. BTW, if we had a proper unit test framework it 
could be possible to run a smaller set of unit tests to keep the running 
time down to a minimum (not something my framework can do).

 * The unit tests are located in its own package, I'm not very happy
 about putting the unit tests in the same module as the rest of the code,
 i.e. the serialization module

 Well, that's the way that Phobos does it, and it's essentially how D's unit
 tests are intended to work (though obviously, you can work around that if you
 really want to). Phobos' makefiles are set up with the idea that each module's
 unit tests are included in that module. I'm not sure that it's guaranteed that
 we wouldn't allow the unit tests to be separate, but that's not how any of the
 other Phobos unit tests works. And personally, I find that the unit tests are
 far more maintainable when they're right next to the functions that they test.

 - Jonathan M Davis

The thing is that just to do a simple test, like serializing an int, it 
requires some extra code. It's not like calling a single function. It 
requires:

* Something to serialize
* A serializer
* An archive
* Actually doing the serialization
* Performing several checks on the serialized data

BTW I don't think it scales to have the unit tests in the same module if 
you want to perform extensive unit tests or if it's a complicated module.

The unit test code should be equally well organized, readable and 
maintainable as the regular code. I think keeping it in the same module 
makes that harder.

-- 
/Jacob Carlborg

Sep 30 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Friday, September 30, 2011 16:25:47 Jacob Carlborg wrote:
 On 2011-09-30 08:39, Jonathan M Davis wrote:
 On Thursday, September 29, 2011 20:58:30 Jacob Carlborg wrote:
 * For the unit tests I've used my own kind of micro unit test
 framework
 (that is included). Is that something we want to have in general in
 Phobos so other modules can take advantage of that? Or should I just
 rip
 out the framework?

 
 I very much doubt that it would be accepted. assertPred failed to make
 it in spite of the various benefits that it provided, and a number of
 folks seem to be against more complicated unit testing features making
 it into Phobos. Not to mention, there's already some discussion of the
 unit tests taking too long. We aren't going to want anything that adds
 yet more overhead.

 
 That's why I'm asking. BTW, if we had a proper unit test framework it
 could be possible to run a smaller set of unit tests to keep the running
 time down to a minimum (not something my framework can do).

That can be done easily enough with a version block with a version specific to 
running longer or shorter tests. No additional framework is necessary.  Don 
has already brought it up in the Phobos newsgroup, but it's unclear whether 
we're going to do anything about it. The main problem is the compile time, not 
the run time, so it's ultimately a compiler issue. Templates and CTFE in 
particular make the whole thing slower.

 * The unit tests are located in its own package, I'm not very happy
 about putting the unit tests in the same module as the rest of the
 code,
 i.e. the serialization module

 
 Well, that's the way that Phobos does it, and it's essentially how D's
 unit tests are intended to work (though obviously, you can work around
 that if you really want to). Phobos' makefiles are set up with the idea
 that each module's unit tests are included in that module. I'm not sure
 that it's guaranteed that we wouldn't allow the unit tests to be
 separate, but that's not how any of the other Phobos unit tests works.
 And personally, I find that the unit tests are far more maintainable
 when they're right next to the functions that they test.
 
 - Jonathan M Davis

 
 The thing is that just to do a simple test, like serializing an int, it
 requires some extra code. It's not like calling a single function. It
 requires:
 
 * Something to serialize
 * A serializer
 * An archive
 * Actually doing the serialization
 * Performing several checks on the serialized data
 
 BTW I don't think it scales to have the unit tests in the same module if
 you want to perform extensive unit tests or if it's a complicated module.
 
 The unit test code should be equally well organized, readable and
 maintainable as the regular code. I think keeping it in the same module
 makes that harder.s

In my experience, it's _much_ easier to have the tests right next to what 
they're testing. I've found it _very_ annoying when I've had to put unit tests 
after a templated type, because the unit tests coludn't be _in_ the templated 
type where they would be instantiated with it. The Interval unit tests in 
std.datetime are way less manageable because of that.

Now, if you have unit testing each function individually doesn't work very 
well, and if you have to do a bunch of setup for any testing, then the 
situation is a bit different, but you could probably just stick all of the 
tests at the end of the file then, where they're then separate from the code.

- Jonathan M Davis

Sep 30 2011

Jacob Carlborg <doob me.com> writes:

On 2011-09-30 18:09, Jonathan M Davis wrote:
 That can be done easily enough with a version block with a version specific to
 running longer or shorter tests. No additional framework is necessary.  Don
 has already brought it up in the Phobos newsgroup, but it's unclear whether
 we're going to do anything about it. The main problem is the compile time, not
 the run time, so it's ultimately a compiler issue. Templates and CTFE in
 particular make the whole thing slower.

Ok, I see.

Phobos is like Boost, everything is templates, that's what we get. 
Compiling my unit tests with D1 takes about 0.7 seconds, using D2 it 
takes about 1.4 seconds.

 Now, if you have unit testing each function individually doesn't work very
 well, and if you have to do a bunch of setup for any testing, then the
 situation is a bit different, but you could probably just stick all of the
 tests at the end of the file then, where they're then separate from the code.

 - Jonathan M Davis

I had all the unit tests in one module at first, but I thought it was 
too much to have in one module and now I have several more unit tests.

-- 
/Jacob Carlborg

Sep 30 2011

"Robert Jacques" <sandford jhu.edu> writes:

On Thu, 29 Sep 2011 14:58:30 -0400, Jacob Carlborg <doob me.com> wrote:
 I would like to have some form of pre-review of my serialization library
 Orange for later inclusion in Phobos as std.serialization (or similar).

[snip]

This is a quick note on the API design; I'm planning on doing a deeper review
of the code + API later.

Re: registerSerializer
Type.stringof is not unique and can't be used by your serializer. Furthermore,
allowing users to manually set the lookup string is going to be a major source
of silent errors / bugs and exposes a large portions of your internals. Doing
it in this way prevents you from updating how the back-end looks up types.

I'd recommend using: typeid(A).toString internally instead as this is unique
and registerSerializer's API to

void registerSerializer(Derived,Base)(void delegate(Base) dg) if(
is(Derived:Base) ) {}

which would be called via:

registerSerializer!Foo(dg);


The method should also be static: If I'm registering a custom serialization
method, I don't want to duplicate that mapping everywhere a serializer is
instanced. I don't even want to duplicate it for every type of serializer. I
think there needs to be some granularity for this: i.e. instance -> type ->
global.

Also, repeat the above for registerDeserializer.


Re: deserialize some of the example don't look like they're correct.

Sep 30 2011

Jacob Carlborg <doob me.com> writes:

On 2011-09-30 15:03, Robert Jacques wrote:
 On Thu, 29 Sep 2011 14:58:30 -0400, Jacob Carlborg <doob me.com> wrote:
 I would like to have some form of pre-review of my serialization library
 Orange for later inclusion in Phobos as std.serialization (or similar).

 [snip]

 This is a quick note on the API design; I'm planning on doing a deeper
 review of the code + API later.

 Re: registerSerializer
 Type.stringof is not unique and can't be used by your serializer.

I don't know why but I thought Type.stringof == typeid(Type).toString 
for everything that wasn't a class.

 Furthermore, allowing users to manually set the lookup string is going
 to be a major source of silent errors / bugs and exposes a large
 portions of your internals. Doing it in this way prevents you from
 updating how the back-end looks up types.

 I'd recommend using: typeid(A).toString internally instead as this is
 unique and registerSerializer's API to

 void registerSerializer(Derived,Base)(void delegate(Base) dg) if(
 is(Derived:Base) ) {}

 which would be called via:

 registerSerializer!Foo(dg);

Didn't thought of that, thanks.

 The method should also be static: If I'm registering a custom
 serialization method, I don't want to duplicate that mapping everywhere
 a serializer is instanced. I don't even want to duplicate it for every
 type of serializer. I think there needs to be some granularity for this:
 i.e. instance -> type -> global.

"register" is static, "registerSerializer" is not because I'm not 
entirely sure how I want the API to behave.

What if I want to serialize a class in two different places. In one 
place I want to serialize it by default and in the other I want to do 
custom serialization?

"I don't even want to duplicate it for every type of serializer". I'm 
not sure I quite understand, there's only one type of serializer.

 Also, repeat the above for registerDeserializer.


 Re: deserialize some of the example don't look like they're correct.

They look correct to me. Note that two of the three "deserialize" 
methods should only be called when performing custom deserialization of 
a class/strut. This method will then, most likely, be called from within 
the class to manually deserialize the fields.

-- 
/Jacob Carlborg

Sep 30 2011

"Robert Jacques" <sandford jhu.edu> writes:

On Fri, 30 Sep 2011 10:41:48 -0400, Jacob Carlborg <doob me.com> wrote:
 On 2011-09-30 15:03, Robert Jacques wrote:
 On Thu, 29 Sep 2011 14:58:30 -0400, Jacob Carlborg <doob me.com> wrote:


[snip]

 The method should also be static: If I'm registering a custom
 serialization method, I don't want to duplicate that mapping everywhere
 a serializer is instanced. I don't even want to duplicate it for every
 type of serializer. I think there needs to be some granularity for this:
 i.e. instance -> type -> global.

 "register" is static, "registerSerializer" is not because I'm not
 entirely sure how I want the API to behave.

 What if I want to serialize a class in two different places. In one
 place I want to serialize it by default and in the other I want to do
 custom serialization?

I agree, which is why I suggested lookup should have some granuality. i.e. that
there is both a global store of serialization methods and a per instance store
of serialization methods. Lookup would first look in the local store before
defaulting to the global store. But this should be a separate pair of functions.

 "I don't even want to duplicate it for every type of serializer". I'm
 not sure I quite understand, there's only one type of serializer.

I'm sorry, I was thinking about archive types (i.e. JSON vs XML) and somehow
thinking that the Serializers would be different for each. (I was thinking that
the serializer was templated on the archive for some reason.)

 Also, repeat the above for registerDeserializer.


 Re: deserialize some of the example don't look like they're correct.

 They look correct to me. Note that two of the three "deserialize"
 methods should only be called when performing custom deserialization of
 a class/strut. This method will then, most likely, be called from within
 the class to manually deserialize the fields.

Both

T deserialize  (T)();
T deserialize  (T)(string key);

have the following example:

  class Foo
  {
  	int a;

  	void fromData (Serializer serializer, Serializer.Data key)
  	{
  		a = serializer!(int)("a");
  	}
  }

Sep 30 2011

Jacob Carlborg <doob me.com> writes:

On 2011-10-01 05:00, Robert Jacques wrote:
 I agree, which is why I suggested lookup should have some granuality.
 i.e. that there is both a global store of serialization methods and a
 per instance store of serialization methods. Lookup would first look in
 the local store before defaulting to the global store. But this should
 be a separate pair of functions.

Aah, now I get it. That's a good idea. The question is what to name the 
two functions. Yet another use case for overloading methods on static.

 I'm sorry, I was thinking about archive types (i.e. JSON vs XML) and
 somehow thinking that the Serializers would be different for each. (I
 was thinking that the serializer was templated on the archive for some
 reason.)

Ok, no problem.

 Both

 T deserialize (T)();
 T deserialize (T)(string key);

 have the following example:

 class Foo
 {
 int a;

 void fromData (Serializer serializer, Serializer.Data key)
 {
 a = serializer!(int)("a");
 }
 }


No, "T deserialize (T)(string key)" has:

a = serializer!(int)("a");

And "T deserialize (T)()" has:

a = serializer!(int)();

Both are correct. This is a complete example of using one of these 
methods: https://github.com/jacob-carlborg/orange/blob/master/tests/Custom.d

Let me know if anything is confusing.

-- 
/Jacob Carlborg

Oct 01 2011

"Robert Jacques" <sandford jhu.edu> writes:

On Sat, 01 Oct 2011 06:50:52 -0400, Jacob Carlborg <doob me.com> wrote:

 On 2011-10-01 05:00, Robert Jacques wrote:
 I agree, which is why I suggested lookup should have some granuality.
 i.e. that there is both a global store of serialization methods and a
 per instance store of serialization methods. Lookup would first look in
 the local store before defaulting to the global store. But this should
 be a separate pair of functions.

 Aah, now I get it. That's a good idea. The question is what to name the
 two functions. Yet another use case for overloading methods on static.

How about overrideSerializer or overloadSerializer?

[snip]

 Both

 T deserialize (T)();
 T deserialize (T)(string key);

 have the following example:

 class Foo
 {
 int a;

 void fromData (Serializer serializer, Serializer.Data key)
 {
 a = serializer!(int)("a");
 }
 }


 No, "T deserialize (T)(string key)" has:

 a = serializer!(int)("a");

 And "T deserialize (T)()" has:

 a = serializer!(int)();

 Both are correct. This is a complete example of using one of these
 methods: https://github.com/jacob-carlborg/orange/blob/master/tests/Custom.d

 Let me know if anything is confusing.

Umm... example code for the deserialize method should contain 'deserialize'
somewhere inside it.

Oct 02 2011

Jacob Carlborg <doob me.com> writes:

On 2011-10-03 05:50, Robert Jacques wrote:
 On Sat, 01 Oct 2011 06:50:52 -0400, Jacob Carlborg <doob me.com> wrote:

 On 2011-10-01 05:00, Robert Jacques wrote:
 I agree, which is why I suggested lookup should have some granuality.
 i.e. that there is both a global store of serialization methods and a
 per instance store of serialization methods. Lookup would first look in
 the local store before defaulting to the global store. But this should
 be a separate pair of functions.

 Aah, now I get it. That's a good idea. The question is what to name the
 two functions. Yet another use case for overloading methods on static.

 How about overrideSerializer or overloadSerializer?

registerSerializer for the static method and 
overloadSerializer/overrideSerializer for the instance method?

 Umm... example code for the deserialize method should contain
 'deserialize' somewhere inside it.


You are completley right

a = serializer!(int)("a");

Should be

a = deserialize!(int)("a");

My bad.

-- 
/Jacob Carlborg

Oct 03 2011

"Robert Jacques" <sandford jhu.edu> writes:

On Mon, 03 Oct 2011 03:06:36 -0400, Jacob Carlborg <doob me.com> wrote:

 On 2011-10-03 05:50, Robert Jacques wrote:
 On Sat, 01 Oct 2011 06:50:52 -0400, Jacob Carlborg <doob me.com> wrote:

 On 2011-10-01 05:00, Robert Jacques wrote:
 I agree, which is why I suggested lookup should have some granuality.
 i.e. that there is both a global store of serialization methods and a
 per instance store of serialization methods. Lookup would first look in
 the local store before defaulting to the global store. But this should
 be a separate pair of functions.

 Aah, now I get it. That's a good idea. The question is what to name the
 two functions. Yet another use case for overloading methods on static.

 How about overrideSerializer or overloadSerializer?

 registerSerializer for the static method and
 overloadSerializer/overrideSerializer for the instance method?

Yes. Sorry for being unclear. The concept being that at the instance level, you
are overriding default behavior.

Oct 03 2011

Jacob Carlborg <doob me.com> writes:

On 2011-10-03 15:39, Robert Jacques wrote:
 On Mon, 03 Oct 2011 03:06:36 -0400, Jacob Carlborg <doob me.com> wrote:

 On 2011-10-03 05:50, Robert Jacques wrote:
 On Sat, 01 Oct 2011 06:50:52 -0400, Jacob Carlborg <doob me.com> wrote:

 On 2011-10-01 05:00, Robert Jacques wrote:
 I agree, which is why I suggested lookup should have some granuality.
 i.e. that there is both a global store of serialization methods and a
 per instance store of serialization methods. Lookup would first
 look in
 the local store before defaulting to the global store. But this should
 be a separate pair of functions.

 Aah, now I get it. That's a good idea. The question is what to name the
 two functions. Yet another use case for overloading methods on static.

 How about overrideSerializer or overloadSerializer?

 registerSerializer for the static method and
 overloadSerializer/overrideSerializer for the instance method?

 Yes. Sorry for being unclear. The concept being that at the instance
 level, you are overriding default behavior.

Yes, thanks, that makes sense.

-- 
/Jacob Carlborg

Oct 03 2011

"Robert Jacques" <sandford jhu.edu> writes:

On Thu, 29 Sep 2011 14:58:30 -0400, Jacob Carlborg <doob me.com> wrote:

 I would like to have some form of pre-review of my serialization library
 Orange for later inclusion in Phobos as std.serialization (or similar).

[snip]

(1)
The first example in class Serializer is:

auto serializer = new Serializer;

Shouldn't it be

auto serializer = new Serializer(archive);

(2)
orange.serialization.archives.XmlArchive need to be documented.

(3)
if Archive.Array (which is poorly named btw) "is a type independent
representation of an array" then why does it contain an elementSize field?
(3a)
Also by the time archiving is called, isSliceOf should always return false.
That this function exists speaks to design problems both large and small. On
the small scale, isSliceOf indicates that you are testing every array against
every other array for slicing, which I hope is not the case. On the large
scale, all the alias/object/pointer/arrays/etc resolutions should be done by
the serializer not the archive. The archive should only be responsible for
converting the internal representation of the serializer to JSON/XML/YAML/etc.
(3b)
Given that Slice has an ID field, why doesn't array.

(4)
Why have an Archive Interface and a Base class with common functionality? Why
not an abstract class? Also, 'Base' isn't an acceptable class name for a Phobos
module. Use 'ArchiveBase' or something more unique instead.

Sep 30 2011

Jacob Carlborg <doob me.com> writes:

On 2011-10-01 06:29, Robert Jacques wrote:
 On Thu, 29 Sep 2011 14:58:30 -0400, Jacob Carlborg <doob me.com> wrote:

 I would like to have some form of pre-review of my serialization library
 Orange for later inclusion in Phobos as std.serialization (or similar).

 [snip]

 (1)
 The first example in class Serializer is:

 auto serializer = new Serializer;

 Shouldn't it be

 auto serializer = new Serializer(archive);

You're right, thanks.

 (2)
 orange.serialization.archives.XmlArchive need to be documented.

I was hoping the Archive interface and the Base abstract class would be 
enough.

 (3)
 if Archive.Array (which is poorly named btw) "is a type independent
 representation of an array" then why does it contain an elementSize field?

Suggestions for other names are welcome. Perhaps it was poorly worded, 
but what I mean is that this type can represent all array types.

 (3a)
 Also by the time archiving is called, isSliceOf should always return
 false.

Why is that?

 That this function exists speaks to design problems both large
 and small. On the small scale, isSliceOf indicates that you are testing
 every array against every other array for slicing, which I hope is not
 the case.

I do. How would I otherwise discover if an array is a slice of another 
array or not?

 On the large scale, all the alias/object/pointer/arrays/etc
 resolutions should be done by the serializer not the archive. The
 archive should only be responsible for converting the internal
 representation of the serializer to JSON/XML/YAML/etc.

isSliceOf is never called by any existing archive. Perhaps Slice and 
Array should be moved to the serializer module. I can also remove 
isSliceOf from Base. I think I was trying to keep the archives 
independent of the serializer, in the sense that the archives  never 
should have to import the serializer. That's not currently that case so 
these structs could be moved.

 (3b)
 Given that Slice has an ID field, why doesn't array.

That's a good question. I don't think it's used at all. archiveSlice 
takes a Slice and a sliceId. This field should be removed.

 (4)
 Why have an Archive Interface and a Base class with common
 functionality? Why not an abstract class? Also, 'Base' isn't an
 acceptable class name for a Phobos module. Use 'ArchiveBase' or
 something more unique instead.

As you can see Base is a template class and therefore is dependent on 
the archive type. My goal was that Serializer should not be a template 
class.

I thought it was unnecessary to repeat the module name in the class 
name. But in this case it might be a good idea. Or perhaps call it 
AbstractArchive.

-- 
/Jacob Carlborg

Oct 01 2011

"Robert Jacques" <sandford jhu.edu> writes:

On Sat, 01 Oct 2011 07:18:59 -0400, Jacob Carlborg <doob me.com> wrote:
 On 2011-10-01 06:29, Robert Jacques wrote:
 On Thu, 29 Sep 2011 14:58:30 -0400, Jacob Carlborg <doob me.com> wrote:


[snip]

 (2)
 orange.serialization.archives.XmlArchive need to be documented.

 I was hoping the Archive interface and the Base abstract class would be
 enough.

For the pre-review its okay. But you'll need it for the actual review.

 (3)
 if Archive.Array (which is poorly named btw) "is a type independent
 representation of an array" then why does it contain an elementSize field?

 Suggestions for other names are welcome. Perhaps it was poorly worded,
 but what I mean is that this type can represent all array types.

 (3a)
 Also by the time archiving is called, isSliceOf should always return
 false.

 Why is that?

If isSliceOf can return true, then that means that the archive is responsible
for alias detection, management, etc. That means that every single archive
format must implement an alias resolution algorithm. This results in a lot of
copy/paste boiler plate which has to be maintained. It also makes it more
difficult to get extra formats supported. Worse if someone forgets to either
include this functionality or to apply a patch, silent bugs and/or slowdowns
are introduced. All and archiver should be responsible for is taking some
decorated data structure and converting it to XML/JSON/YAML/etc and back again.
Anything more complex than that should be shared functionality located in the
serializer / de-serializer objects.

 That this function exists speaks to design problems both large
 and small. On the small scale, isSliceOf indicates that you are testing
 every array against every other array for slicing, which I hope is not
 the case.

 I do. How would I otherwise discover if an array is a slice of another
 array or not?

Okay, first some rational. Consider:

assert(!a.isSliceOf(b));
assert(!b.isSliceOf(a));
assert( c.isSliceOf(a));
assert( c.isSliceOf(b));

and

class Foo {
	float x;
	float[3] point;
}

void main() {
	auto foo = new Foo;
	auto ptr = &foo.x;
	auto slice = point[0..2];
}

In the first case, a, b and c are all slices of a common root array, but the
root array may not be serialized. In the second case, first you have a pointer
to the inside of an object and second you have a slice of a static array inside
an object, all three of which may be serialized together. My impression from
your API (so this might not be correct) is that currently, you can't handle the
above use cases. Even if you can, an O(N^2) algorithm is rather inefficient.

The solution, in my mind, is to think in terms of memory blocks/chucks. Every
reference can be thought as pointing to a memory chunk defined by two pointers
and a flag:

{
	void* head;			// Start of the memory chunk
	void* tail;			// End of the memory chunk
	bool  hasAliases;	// True if there are more than one reference to this chunk
}

For alias detection / resolution, you build a balanced tree of memory chunks,
widening the chunk and flagging hasAliases as appropriate. Which should give
you O(N log(N)) performance.

As an optimization, the user should be able to 'finalize' the serialization by
pruning away all memory chunks without aliases. (i.e. a serializeAndClear
method)

Oct 01 2011

Jacob Carlborg <doob me.com> writes:

On 2011-10-02 00:52, Robert Jacques wrote:
 On Sat, 01 Oct 2011 07:18:59 -0400, Jacob Carlborg <doob me.com> wrote:
 For the pre-review its okay. But you'll need it for the actual review.

Ok, it will be the same documentation as for Archive and Base. Ddoc 
really needs to get better at this, I mean, why can't it just inherit 
the documentation.

 Also by the time archiving is called, isSliceOf should always return
 false.

 Why is that?

 If isSliceOf can return true, then that means that the archive is
 responsible for alias detection, management, etc.

No, who says that. You can take this struct and use it outside of this 
library, it knows nothing about archiving or serialization. If the 
isSliceOf method should return false when archiving has been called I 
would need to add logic to detect when serialization and archiving has 
begun and ended.

 That means that every
 single archive format must implement an alias resolution algorithm.

No, the serializer performs this task.

 This  results in a lot of copy/paste boiler plate which has to be maintained.
 It also makes it more difficult to get extra formats supported. Worse if
 someone forgets to either include this functionality or to apply a
 patch, silent bugs and/or slowdowns are introduced. All and archiver
 should be responsible for is taking some decorated data structure and
 converting it to XML/JSON/YAML/etc and back again. Anything more complex
 than that should be shared functionality located in the serializer /
 de-serializer objects.

It's the only thing the archive does.

 I do. How would I otherwise discover if an array is a slice of another
 array or not?

 Okay, first some rational. Consider:

 assert(!a.isSliceOf(b));
 assert(!b.isSliceOf(a));
 assert( c.isSliceOf(a));
 assert( c.isSliceOf(b));

 and

 class Foo {
 float x;
 float[3] point;
 }

 void main() {
 auto foo = new Foo;
 auto ptr = &foo.x;
 auto slice = point[0..2];
 }

 In the first case, a, b and c are all slices of a common root array, but
 the root array may not be serialized. In the second case, first you have
 a pointer to the inside of an object and second you have a slice of a
 static array inside an object, all three of which may be serialized
 together. My impression from your API (so this might not be correct) is
 that currently, you can't handle the above use cases. Even if you can,
 an O(N^2) algorithm is rather inefficient.

This is how it works: As the first step all arrays are serialized as 
regular arrays and not slices. After all serialization is done I loop 
over all arrays and check if they are a slice of some other array. If I 
found a match I replace the serialized array with a slice instead.

These arrays are stored as an associative array with the type Array[Id]. 
I don't know if there's a better data structure for this.

 The solution, in my mind, is to think in terms of memory blocks/chucks.
 Every reference can be thought as pointing to a memory chunk defined by
 two pointers and a flag:

 {
 void* head; // Start of the memory chunk
 void* tail; // End of the memory chunk
 bool hasAliases; // True if there are more than one reference to this chunk
 }

 For alias detection / resolution, you build a balanced tree of memory
 chunks, widening the chunk and flagging hasAliases as appropriate. Which
 should give you O(N log(N)) performance.

I'm not sure I understand. That would require that the arrays are stored 
in a continues block of memory? Won't "head" and "tail" always point to 
start of the array and the end of the array?

 As an optimization, the user should be able to 'finalize' the
 serialization by pruning away all memory chunks without aliases. (i.e. a
 serializeAndClear method)


-- 
/Jacob Carlborg

Oct 02 2011

"Robert Jacques" <sandford jhu.edu> writes:

On Mon, 03 Oct 2011 02:38:22 -0400, Jacob Carlborg <doob me.com> wrote:
 On 2011-10-02 00:52, Robert Jacques wrote:
 On Sat, 01 Oct 2011 07:18:59 -0400, Jacob Carlborg <doob me.com> wrote:


[snip]

 Also by the time archiving is called, isSliceOf should always return
 false.

 Why is that?

 If isSliceOf can return true, then that means that the archive is
 responsible for alias detection, management, etc.

 No, who says that. You can take this struct and use it outside of this
 library, it knows nothing about archiving or serialization. If the
 isSliceOf method should return false when archiving has been called I
 would need to add logic to detect when serialization and archiving has
 begun and ended.

So, in essence, you are saying that by the time archiving occurs, isSliceOf
will always return false? Then why is it part of the public API?

 That means that every
 single archive format must implement an alias resolution algorithm.

 No, the serializer performs this task.

Okay.

[snip]

 I do. How would I otherwise discover if an array is a slice of another
 array or not?

 Okay, first some rational. Consider:

 assert(!a.isSliceOf(b));
 assert(!b.isSliceOf(a));
 assert( c.isSliceOf(a));
 assert( c.isSliceOf(b));

 and

 class Foo {
 float x;
 float[3] point;
 }

 void main() {
 auto foo = new Foo;
 auto ptr = &foo.x;
 auto slice = point[0..2];
 }

 In the first case, a, b and c are all slices of a common root array, but
 the root array may not be serialized. In the second case, first you have
 a pointer to the inside of an object and second you have a slice of a
 static array inside an object, all three of which may be serialized
 together. My impression from your API (so this might not be correct) is
 that currently, you can't handle the above use cases. Even if you can,
 an O(N^2) algorithm is rather inefficient.

 This is how it works: As the first step all arrays are serialized as
 regular arrays and not slices. After all serialization is done I loop
 over all arrays and check if they are a slice of some other array. If I
 found a match I replace the serialized array with a slice instead.

 These arrays are stored as an associative array with the type Array[Id].
 I don't know if there's a better data structure for this.

I presented one below.

 The solution, in my mind, is to think in terms of memory blocks/chucks.
 Every reference can be thought as pointing to a memory chunk defined by
 two pointers and a flag:

 {
 void* head; // Start of the memory chunk
 void* tail; // End of the memory chunk
 bool hasAliases; // True if there are more than one reference to this chunk
 }

 For alias detection / resolution, you build a balanced tree of memory
 chunks, widening the chunk and flagging hasAliases as appropriate. Which
 should give you O(N log(N)) performance.

 I'm not sure I understand. That would require that the arrays are stored
 in a continues block of memory? Won't "head" and "tail" always point to
 start of the array and the end of the array?

Most of the time yes. But not all of the time. It would be fair to say that
'head' and 'tail' would be inside the GC memory region of the array and are <=
or >= of the start/end of an array, respectively. More importantly, both
objects and pointers would resolve to memory chunks as well and be included in
the alias resolution algorithm. I'm assuming you currently separate object and
array alias resolution and don't handle pointers at all.

Oct 03 2011

Jacob Carlborg <doob me.com> writes:

On 2011-10-03 15:57, Robert Jacques wrote:
 So, in essence, you are saying that by the time archiving occurs,
 isSliceOf will always return false? Then why is it part of the public API?

No, I'm not saying that. Example:

struct Array
{
	void* ptr;
	size_t length;
	size_t elementSize;
	
	bool isSliceOf (Array b)
	{
		return ptr >= b.ptr && ptr + length * elementSize <= b.ptr + b.length 
* b.elementSize;
	}
}

void main ()
{
	auto a = [0, 1, 2, 3, 4];
	auto b = a[2 .. 4];
	
	auto aArr = Array(a.ptr, a.length, 4);
	auto bArr = Array(b.ptr, b.length, 4);
	
	assert(bArr.isSliceOf(aArr));
	assert(!aArr.isSliceOf(bArr));
}

Both the asserts in the above code passes as expected. See, no 
serialization or archiving in sight. Is there something I'm missing?

Actually it does not need to be part of the public API when I think 
about it. I can move it into Serializer. Array would still need to be 
public since both Serailzer and Archive need access to it and the 
package attribute doesn't work very well.

 The solution, in my mind, is to think in terms of memory blocks/chucks.
 Every reference can be thought as pointing to a memory chunk defined by
 two pointers and a flag:

 {
 void* head; // Start of the memory chunk
 void* tail; // End of the memory chunk
 bool hasAliases; // True if there are more than one reference to this
 chunk
 }

 For alias detection / resolution, you build a balanced tree of memory
 chunks, widening the chunk and flagging hasAliases as appropriate. Which
 should give you O(N log(N)) performance.

 I'm not sure I understand. That would require that the arrays are stored
 in a continues block of memory? Won't "head" and "tail" always point to
 start of the array and the end of the array?

 Most of the time yes. But not all of the time. It would be fair to say
 that 'head' and 'tail' would be inside the GC memory region of the array
 and are <= or >= of the start/end of an array, respectively. More
 importantly, both objects and pointers would resolve to memory chunks as
 well and be included in the alias resolution algorithm.

Now I think I start to understand.

 I'm assuming you
 currently separate object and array alias resolution and don't handle
 pointers at all.

Yes. Pointers are handled as well. It's handled similar to 
arrays/slices. First it always serializes what the pointer points to. 
Then in a post process step, after all serialization is done, it 
replaces all serialized pointers, that points to a value that has been 
serialized, with a reference. If a given pointer doesn't point to a 
value that has be serialized it's left as is.

I can see if I this memory chunk approach can be used instead. How will 
this be used with the balanced tree, could you give a simple example?

-- 
/Jacob Carlborg

Oct 03 2011

"Robert Jacques" <sandford jhu.edu> writes:

On Mon, 03 Oct 2011 14:10:52 -0400, Jacob Carlborg <doob me.com> wrote:
 On 2011-10-03 15:57, Robert Jacques wrote:
 So, in essence, you are saying that by the time archiving occurs,
 isSliceOf will always return false? Then why is it part of the public API?

 No, I'm not saying that. Example:

 struct Array
 {
 	void* ptr;
 	size_t length;
 	size_t elementSize;
 	
 	bool isSliceOf (Array b)
 	{
 		return ptr >= b.ptr && ptr + length * elementSize <= b.ptr + b.length
 * b.elementSize;
 	}
 }

 void main ()
 {
 	auto a = [0, 1, 2, 3, 4];
 	auto b = a[2 .. 4];
 	
 	auto aArr = Array(a.ptr, a.length, 4);
 	auto bArr = Array(b.ptr, b.length, 4);
 	
 	assert(bArr.isSliceOf(aArr));
 	assert(!aArr.isSliceOf(bArr));
 }

 Both the asserts in the above code passes as expected. See, no
 serialization or archiving in sight. Is there something I'm missing?

That putting isSliceOf in the public API, implies its usage by the archiver.

 Actually it does not need to be part of the public API when I think
 about it. I can move it into Serializer. Array would still need to be
 public since both Serailzer and Archive need access to it and the
 package attribute doesn't work very well.

Regarding design, I agree, although I'd go one further and define Array as a
public type inside the Serializer class. However, this concept of an 'Array' is
fundamentally flawed. Consider:

auto c = a[1..3];

auto cArr = Array(c.ptr,c.length,4);

assert(!cArr.isSliceOf(bArr));
assert(!bArr.isSliceOf(cArr));

// and

b ~= 5;

bArr = Array(b.ptr, b.length, 4);

assert(!bArr.isSliceOf(aArr));

In short, a serializer must be capable of handling overlapping arrays, not just
strict slices. The array representation therefore needs to parameterize both
the underlying array common to all slices and the actual slice that was
serialized for that key.

 The solution, in my mind, is to think in terms of memory blocks/chucks.
 Every reference can be thought as pointing to a memory chunk defined by
 two pointers and a flag:

 {
 void* head; // Start of the memory chunk
 void* tail; // End of the memory chunk
 bool hasAliases; // True if there are more than one reference to this
 chunk
 }

 For alias detection / resolution, you build a balanced tree of memory
 chunks, widening the chunk and flagging hasAliases as appropriate. Which
 should give you O(N log(N)) performance.

 I'm not sure I understand. That would require that the arrays are stored
 in a continues block of memory? Won't "head" and "tail" always point to
 start of the array and the end of the array?

 Most of the time yes. But not all of the time. It would be fair to say
 that 'head' and 'tail' would be inside the GC memory region of the array
 and are <= or >= of the start/end of an array, respectively. More
 importantly, both objects and pointers would resolve to memory chunks as
 well and be included in the alias resolution algorithm.

 Now I think I start to understand.

 I'm assuming you
 currently separate object and array alias resolution and don't handle
 pointers at all.

 Yes. Pointers are handled as well. It's handled similar to
 arrays/slices. First it always serializes what the pointer points to.
 Then in a post process step, after all serialization is done, it
 replaces all serialized pointers, that points to a value that has been
 serialized, with a reference. If a given pointer doesn't point to a
 value that has be serialized it's left as is.

Can a pointer point to the interior of an object? To an element of an array?

 I can see if I this memory chunk approach can be used instead. How will
 this be used with the balanced tree, could you give a simple example?

Well, balanced trees need a comparison function so:

struct Node {
     void* head; // Start of the memory chunk
     void* tail; // End of the memory chunk
     bool hasAliases; // True if there are more than one reference to this chunk
     //... Other meta-data, i.e. ID,

     int opCmp(const ref Node b) {
         if(  tail < b.head) return -1;
         if(b.tail <   head) return 1;
         return 0;
     }
}

On equality / assignment, one just has to combine the heads and tail with
min/max, respectively, and update hasAliases, etc. The difficulty is when a new
node 'bridges the gap' between two existing nodes. This has to handled
explicitly as part of the tree re-balancing, but you may want to consider
making the merging of nodes part of the comparison operator for
simplicity/efficiency:

head = min(head,b.head);
tail = max(tail,b.tail);
hasAliases = true;

After pruning, updating meta-data, etc, the aliased memory chunk for any given
pointer can be found using a separate comparison operator:

     int opCmp(const ref void* b) {
         if(tail < b   ) return -1;
         if(b    < head) return 1;
         return 0;
     }

// Which you'd probably use like:

if( auto node = arr.ptr in setOfAliases ) {
     auto offset = arr.ptr - node.head;
     //...
} else {//...


So at the pseudo-code level, it would be something like:

foreach(obj; [objects,pointers,arrays]) {
    auto node = Node(obj);
    setOfAliases.insert(node);
    // Convert obj into an intermediate form
}

setOfAliases.pruneUnaliasedNodes;
setOfAliases.postProcessMetadata; // i.e. assign unique alias ID, etc

foreach(obj; [objects,pointers,arrays]) {
     if( auto node = arr.ptr in setOfAliases ) {
         auto offset = arr.ptr - node.head;
         // Perform an aliased archival
     } else {
         // Perform an unaliased archival
     }
}

I hope that helped, though I'm not sure if I really answered your question or
not.

Oct 03 2011

Jacob Carlborg <doob me.com> writes:

On 2011-10-04 07:21, Robert Jacques wrote:
 That putting isSliceOf in the public API, implies its usage by the
 archiver.

Ok, then I'll remove it.

 Actually it does not need to be part of the public API when I think
 about it. I can move it into Serializer. Array would still need to be
 public since both Serailzer and Archive need access to it and the
 package attribute doesn't work very well.

 Regarding design, I agree, although I'd go one further and define Array
 as a public type inside the Serializer class. However, this concept of
 an 'Array' is fundamentally flawed. Consider:

 auto c = a[1..3];

 auto cArr = Array(c.ptr,c.length,4);

 assert(!cArr.isSliceOf(bArr));
 assert(!bArr.isSliceOf(cArr));

Any suggestion how to fix this, how to properly detect if an array is a 
slice of some other array?

 // and

 b ~= 5;

 bArr = Array(b.ptr, b.length, 4);

 assert(!bArr.isSliceOf(aArr));

Appending to "b" will reallocate "b" making it a regular array and not a 
slice:

b ~= 5;
b[] = 100;
assert(a == [0, 1, 2, 3, 4]);

"a" is not modified and the assert passes.

 Can a pointer point to the interior of an object? To an element of an
 array?

Yes, have a look at: 
https://github.com/jacob-carlborg/orange/blob/master/tests/Pointer.d

Not sure about the array part though, but I think so. Each element of an 
array is serialized just like all the other types.

 I can see if I this memory chunk approach can be used instead. How will
 this be used with the balanced tree, could you give a simple example?

 Well, balanced trees need a comparison function so:

 struct Node {
 void* head; // Start of the memory chunk
 void* tail; // End of the memory chunk
 bool hasAliases; // True if there are more than one reference to this chunk
 //... Other meta-data, i.e. ID,

 int opCmp(const ref Node b) {
 if( tail < b.head) return -1;
 if(b.tail < head) return 1;
 return 0;
 }
 }

 On equality / assignment, one just has to combine the heads and tail
 with min/max, respectively, and update hasAliases, etc. The difficulty
 is when a new node 'bridges the gap' between two existing nodes. This
 has to handled explicitly as part of the tree re-balancing, but you may
 want to consider making the merging of nodes part of the comparison
 operator for simplicity/efficiency:

 head = min(head,b.head);
 tail = max(tail,b.tail);
 hasAliases = true;

 After pruning, updating meta-data, etc, the aliased memory chunk for any
 given pointer can be found using a separate comparison operator:

 int opCmp(const ref void* b) {
 if(tail < b ) return -1;
 if(b < head) return 1;
 return 0;
 }

 // Which you'd probably use like:

 if( auto node = arr.ptr in setOfAliases ) {
 auto offset = arr.ptr - node.head;
 //...
 } else {//...


 So at the pseudo-code level, it would be something like:

 foreach(obj; [objects,pointers,arrays]) {
 auto node = Node(obj);
 setOfAliases.insert(node);
 // Convert obj into an intermediate form
 }

 setOfAliases.pruneUnaliasedNodes;
 setOfAliases.postProcessMetadata; // i.e. assign unique alias ID, etc

 foreach(obj; [objects,pointers,arrays]) {
 if( auto node = arr.ptr in setOfAliases ) {
 auto offset = arr.ptr - node.head;
 // Perform an aliased archival
 } else {
 // Perform an unaliased archival
 }
 }

 I hope that helped, though I'm not sure if I really answered your
 question or not.

Yes, I think it at least helped somewhat. The thing is that I'm not very 
familiar with using trees like these.

Would this something similar to: 
https://github.com/jacob-carlborg/orange/blob/master/orange/serializatio
/Serializer.d#L1520 
?

What is the advantage with using a tree? Is the advantage that you loop 
over the elements once in the pseudo-code compared to that I loop over 
them twice, as in: 
https://github.com/jacob-carlborg/orange/blob/master/orange/serializatio
/Serializer.d#L1495 
?

-- 
/Jacob Carlborg

Oct 04 2011

"Robert Jacques" <sandford jhu.edu> writes:

On Tue, 04 Oct 2011 03:22:35 -0400, Jacob Carlborg <doob me.com> wrote:
 On 2011-10-04 07:21, Robert Jacques wrote:

[snip]

 Actually it does not need to be part of the public API when I think
 about it. I can move it into Serializer. Array would still need to be
 public since both Serailzer and Archive need access to it and the
 package attribute doesn't work very well.

 Regarding design, I agree, although I'd go one further and define Array
 as a public type inside the Serializer class. However, this concept of
 an 'Array' is fundamentally flawed. Consider:

 auto c = a[1..3];

 auto cArr = Array(c.ptr,c.length,4);

 assert(!cArr.isSliceOf(bArr));
 assert(!bArr.isSliceOf(cArr));

 Any suggestion how to fix this, how to properly detect if an array is a
 slice of some other array?

Well, there are two problems. First, there is an API issue: When you pass an
aliased array to the archiver, you must pass both an array and a slice of that
array.
Second, is detection. The comparison operator I suggested below covers this
issue. Just remember that with a loop based approach, you can't terminate the
search early: you have to check all existing arrays for possible matches.

 // and

 b ~= 5;

 bArr = Array(b.ptr, b.length, 4);

 assert(!bArr.isSliceOf(aArr));

 Appending to "b" will reallocate "b" making it a regular array and not a
 slice:

 b ~= 5;
 b[] = 100;
 assert(a == [0, 1, 2, 3, 4]);

 "a" is not modified and the assert passes.

I'm sorry, you're right. In my mind b extended to the end of the a array, for
some reason. However, if you do define b to extend to the end of the a array,
then it can append without allocating:

	auto a = [0, 1, 2, 3, 4];
	auto b = a[2 .. $];
	b ~= 5;
	assert(b[0]==2);
	a[2] = 10;
	assert(b[0]==10);

So please don't dismiss this point.

[snip]

 Would this something similar to:
 https://github.com/jacob-carlborg/orange/blob/master/orange/serialization/Serializer.d#L1520
 ?

I'm not sure, that routine seems to be filtering pointers between those with
aliases and those without aliases, which would be similar in effect to:

if( auto node = arr.ptr in setOfAliases ) {} else {}

 What is the advantage with using a tree? Is the advantage that you loop
 over the elements once in the pseudo-code compared to that I loop over
 them twice, as in:
 https://github.com/jacob-carlborg/orange/blob/master/orange/serialization/Serializer.d#L1495
 ?

Primarily, it's O(N logN) vs O(N^2). Also, it solves the isSliceOf problem we
discussed above and puts arrays and objects into the same framework, as objects
containing fixed sized arrays can have slices.

Oct 04 2011

Jacob Carlborg <doob me.com> writes:

On 2011-10-04 17:14, Robert Jacques wrote:
 I'm sorry, you're right. In my mind b extended to the end of the a
 array, for some reason. However, if you do define b to extend to the end
 of the a array, then it can append without allocating:

 auto a = [0, 1, 2, 3, 4];
 auto b = a[2 .. $];
 b ~= 5;
 assert(b[0]==2);
 a[2] = 10;
 assert(b[0]==10);

 So please don't dismiss this point.

Hmm, D1 and D2 behaves differently in this case. In D1 "a" is not 
changed when "b" is changed. Since you append to "b" I though that it 
would always require a reallocation of "b".

 [snip]

 Would this something similar to:
 https://github.com/jacob-carlborg/orange/blob/master/orange/serialization/Serializer.d#L1520

 ?

 I'm not sure, that routine seems to be filtering pointers between those
 with aliases and those without aliases, which would be similar in effect
 to:

 if( auto node = arr.ptr in setOfAliases ) {} else {}

 What is the advantage with using a tree? Is the advantage that you loop
 over the elements once in the pseudo-code compared to that I loop over
 them twice, as in:
 https://github.com/jacob-carlborg/orange/blob/master/orange/serialization/Serializer.d#L1495

 ?

 Primarily, it's O(N logN) vs O(N^2). Also, it solves the isSliceOf
 problem we discussed above and puts arrays and objects into the same
 framework, as objects containing fixed sized arrays can have slices.

I guess I just have to try this and see how it works out. Anyway, thank 
you for your review and your patience.

-- 
/Jacob Carlborg

Oct 04 2011

"Robert Jacques" <sandford jhu.edu> writes:

On Tue, 04 Oct 2011 12:54:27 -0400, Jacob Carlborg <doob me.com> wrote:

 On 2011-10-04 17:14, Robert Jacques wrote:
 I'm sorry, you're right. In my mind b extended to the end of the a
 array, for some reason. However, if you do define b to extend to the end
 of the a array, then it can append without allocating:

 auto a = [0, 1, 2, 3, 4];
 auto b = a[2 .. $];
 b ~= 5;
 assert(b[0]==2);
 a[2] = 10;
 assert(b[0]==10);

 So please don't dismiss this point.

 Hmm, D1 and D2 behaves differently in this case. In D1 "a" is not
 changed when "b" is changed. Since you append to "b" I though that it
 would always require a reallocation of "b".

Appending has never mandated reallocation. Maybe you're confusing it with
concatenation (i.e. b = b ~ 5;), which does always reallocate. Or the period in
time when the anti-array-stomping parts of druntime was overly conservative and
would have prevented b from appending in place.

Oct 04 2011

Jacob Carlborg <doob me.com> writes:

On 2011-10-05 03:49, Robert Jacques wrote:
 On Tue, 04 Oct 2011 12:54:27 -0400, Jacob Carlborg <doob me.com> wrote:

 On 2011-10-04 17:14, Robert Jacques wrote:
 I'm sorry, you're right. In my mind b extended to the end of the a
 array, for some reason. However, if you do define b to extend to the end
 of the a array, then it can append without allocating:

 auto a = [0, 1, 2, 3, 4];
 auto b = a[2 .. $];
 b ~= 5;
 assert(b[0]==2);
 a[2] = 10;
 assert(b[0]==10);

 So please don't dismiss this point.

 Hmm, D1 and D2 behaves differently in this case. In D1 "a" is not
 changed when "b" is changed. Since you append to "b" I though that it
 would always require a reallocation of "b".

 Appending has never mandated reallocation. Maybe you're confusing it
 with concatenation (i.e. b = b ~ 5;), which does always reallocate. Or
 the period in time when the anti-array-stomping parts of druntime was
 overly conservative and would have prevented b from appending in place.

Yeah, maybe you're right.

-- 
/Jacob Carlborg

Oct 04 2011

D Programming

C/C++ Programming

Other

digitalmars.D - Request for pre-review: std.serialization/orange