digitalmars.D - Request for pre-review: std.serialization/orange

reply Jacob Carlborg <doob me.com> writes:
I would like to have some form of pre-review of my serialization library 
Orange for later inclusion in Phobos as std.serialization (or similar).

The reason I would like to have a pre-review is that Orange, in 
its current state, supports both D1/Tango and D2/Phobos. I don't want to 
spend a lot of time integrating Orange with Phobos if it has no 
chance of getting accepted into Phobos.

I'm hoping we can do something like a regular review, with the assumption 
that if it gets accepted into Phobos I will remove all D1/Tango specific code, 
integrate it properly with Phobos and do a second regular review.

A couple of notes for the review:

* The most important packages are: orange.serialization and 
orange.serialization.archives

* I will not accept having just one module for this library. I'm hoping 
to have a package called "std.serialization" or similar. The question is 
if there needs to be a (sub)package for the archives as well.

* For the unit tests I've used my own kind of micro unit test framework 
(that is included). Is that something we want to have in general in 
Phobos so other modules can take advantage of that? Or should I just rip 
out the framework?

* The unit tests are located in their own package; I'm not very happy 
about putting the unit tests in the same module as the rest of the code, 
i.e. the serialization module

* I'm using some utility functions located in the "util" and "core" 
packages, what should we do about those, where to put them?

* I'm not using all of the functions/modules in the above mentioned 
utility packages

For usage examples, see the project page: 
http://dsource.org/projects/orange/wiki/Tutorials

For more extended usage examples, see the unit tests: 
https://github.com/jacob-carlborg/orange/tree/master/tests

Sources: https://github.com/jacob-carlborg/orange
Project page: http://dsource.org/projects/orange/
Documentation: 
http://dl.dropbox.com/u/18386187/orange_docs/orange.serialization.Serializer.html

(Don't forget clicking the "Package" tab in the top corner to see the 
documentation for the rest of the modules)

-- 
/Jacob Carlborg
Sep 29 2011
next sibling parent reply Tobias Pankrath <tobias pankrath.net> writes:
 * For the unit tests I've used my own kind of micro unit test framework
 (that is included). Is that something we want to have in general in
 Phobos so other modules can take advantage of that? 

There should be something in Phobos. This should involve two steps. First, define a protocol so that multiple unittest libraries can be used in one project (i.e. one library uses A and another uses B). Then write a library on top of this. My suggestion for a protocol would be something as simple as:

1. There should only be one approach to unittest used per module.
2. If you replace the test runner, call the old one first.
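
As a rough illustration of rule 2, here is a minimal sketch that chains test runners through druntime's `Runtime.moduleUnitTester` hook. The names `previousRunner` and `chainedRunner` are hypothetical, and the body of the custom runner is left as a placeholder; this is one possible reading of the proposed protocol, not an agreed design.

```d
import core.runtime : ModuleUnitTester, Runtime;

// Hypothetical: whichever runner was installed before ours (null if none).
__gshared ModuleUnitTester previousRunner;

// Hypothetical replacement runner for a second unit test library.
bool chainedRunner()
{
    // Rule 2 of the proposed protocol: call the old runner first.
    if (previousRunner !is null && !previousRunner())
        return false;

    // Library-specific test execution and reporting would go here.
    return true;
}

shared static this()
{
    previousRunner = Runtime.moduleUnitTester; // remember the old runner
    Runtime.moduleUnitTester = &chainedRunner; // install ours
}
```

Because the hook is a plain function pointer rather than a delegate, the previous runner has to be kept in a global; that is a limitation of this sketch, not part of the suggestion above.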
Sep 29 2011
parent Jacob Carlborg <doob me.com> writes:
On 2011-09-29 21:42, Tobias Pankrath wrote:
 * For the unit tests I've used my own kind of micro unit test framework
 (that is included). Is that something we want to have in general in
 Phobos so other modules can take advantage of that?

There should be something in Phobos. This should involve two steps. First, define a protocol so that multiple unittest libraries can be used in one project (i.e. one library uses A and another uses B). Then write a library on top of this. My suggestion for a protocol would be something as simple as:

1. There should only be one approach to unittest used per module.
2. If you replace the test runner, call the old one first.

My unit test framework does two things: it provides some form of context for the tests, and it catches all assert exceptions to let other tests continue. It also gives a quite nice report of which tests failed. -- /Jacob Carlborg
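
The "catch assert exceptions and keep going" idea can be sketched in a few lines of D. This is not Orange's framework; `NamedTest`, `TestResult` and `runTests` are hypothetical names, and the sketch assumes asserts are enabled (i.e. the code is not compiled with -release).

```d
import core.exception : AssertError;

/// A test with a descriptive name (hypothetical type for this sketch).
struct NamedTest
{
    string name;
    void delegate() run;
}

/// Outcome of one test, used for the final report.
struct TestResult
{
    string name;
    bool passed;
    string message;
}

/// Runs every test, catching assertion failures so later tests still run.
TestResult[] runTests(NamedTest[] tests)
{
    TestResult[] results;
    foreach (t; tests)
    {
        try
        {
            t.run();
            results ~= TestResult(t.name, true, "");
        }
        catch (AssertError e)
        {
            results ~= TestResult(t.name, false, e.msg);
        }
    }
    return results;
}

unittest
{
    int[] empty;
    auto results = runTests([
        NamedTest("passes", delegate() { assert(empty.length == 0); }),
        NamedTest("fails", delegate() { assert(empty.length == 1, "expected one element"); }),
        NamedTest("still runs", delegate() { assert(empty is null); }),
    ]);
    assert(results.length == 3);
    assert(!results[1].passed && results[2].passed);
}
```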
Sep 29 2011
prev sibling next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, September 29, 2011 20:58:30 Jacob Carlborg wrote:
 * For the unit tests I've used my own kind of micro unit test framework
 (that is included). Is that something we want to have in general in
 Phobos so other modules can take advantage of that? Or should I just rip
 out the framework?

I very much doubt that it would be accepted. assertPred failed to make it in spite of the various benefits that it provided, and a number of folks seem to be against more complicated unit testing features making it into Phobos. Not to mention, there's already some discussion of the unit tests taking too long. We aren't going to want anything that adds yet more overhead.
 * The unit tests are located in its own package, I'm not very happy
 about putting the unit tests in the same module as the rest of the code,
 i.e. the serialization module

Well, that's the way that Phobos does it, and it's essentially how D's unit tests are intended to work (though obviously, you can work around that if you really want to). Phobos' makefiles are set up with the idea that each module's unit tests are included in that module. I'm not sure that it's guaranteed that we wouldn't allow the unit tests to be separate, but that's not how any of the other Phobos unit tests work. And personally, I find that the unit tests are far more maintainable when they're right next to the functions that they test. - Jonathan M Davis
Sep 29 2011
parent reply Jacob Carlborg <doob me.com> writes:
On 2011-09-30 08:39, Jonathan M Davis wrote:
 On Thursday, September 29, 2011 20:58:30 Jacob Carlborg wrote:
 * For the unit tests I've used my own kind of micro unit test framework
 (that is included). Is that something we want to have in general in
 Phobos so other modules can take advantage of that? Or should I just rip
 out the framework?

I very much doubt that it would be accepted. assertPred failed to make it in spite of the various benefits that it provided, and a number of folks seem to be against more complicated unit testing features making it into Phobos. Not to mention, there's already some discussion of the unit tests taking too long. We aren't going to want anything that adds yet more overhead.

That's why I'm asking. BTW, if we had a proper unit test framework it could be possible to run a smaller set of unit tests to keep the running time down to a minimum (not something my framework can do).
 * The unit tests are located in its own package, I'm not very happy
 about putting the unit tests in the same module as the rest of the code,
 i.e. the serialization module

Well, that's the way that Phobos does it, and it's essentially how D's unit tests are intended to work (though obviously, you can work around that if you really want to). Phobos' makefiles are set up with the idea that each module's unit tests are included in that module. I'm not sure that it's guaranteed that we wouldn't allow the unit tests to be separate, but that's not how any of the other Phobos unit tests works. And personally, I find that the unit tests are far more maintainable when they're right next to the functions that they test. - Jonathan M Davis

The thing is that just to do a simple test, like serializing an int, it requires some extra code. It's not like calling a single function. It requires:

* Something to serialize
* A serializer
* An archive
* Actually doing the serialization
* Performing several checks on the serialized data

BTW I don't think it scales to have the unit tests in the same module if you want to perform extensive unit tests or if it's a complicated module. The unit test code should be equally well organized, readable and maintainable as the regular code. I think keeping it in the same module makes that harder.

-- /Jacob Carlborg
Sep 30 2011
parent Jacob Carlborg <doob me.com> writes:
On 2011-09-30 18:09, Jonathan M Davis wrote:
 That can be done easily enough with a version block with a version specific to
 running longer or shorter tests. No additional framework is necessary.  Don
 has already brought it up in the Phobos newsgroup, but it's unclear whether
 we're going to do anything about it. The main problem is the compile time, not
 the run time, so it's ultimately a compiler issue. Templates and CTFE in
 particular make the whole thing slower.

Ok, I see. Phobos is like Boost, everything is templates, that's what we get. Compiling my unit tests with D1 takes about 0.7 seconds, using D2 it takes about 1.4 seconds.
 Now, if unit testing each function individually doesn't work very
 well, and if you have to do a bunch of setup for any testing, then the
 situation is a bit different, but you could probably just stick all of the
 tests at the end of the file then, where they're then separate from the code.

 - Jonathan M Davis

I had all the unit tests in one module at first, but I thought it was too much to have in one module and now I have several more unit tests. -- /Jacob Carlborg
Sep 30 2011
prev sibling next sibling parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Thu, 29 Sep 2011 14:58:30 -0400, Jacob Carlborg <doob me.com> wrote:
 I would like to have some form of pre-review of my serialization library
 Orange for later inclusion in Phobos as std.serialization (or similar).

[snip]

This is a quick note on the API design; I'm planning on doing a deeper review of the code + API later.

Re: registerSerializer

Type.stringof is not unique and can't be used by your serializer. Furthermore, allowing users to manually set the lookup string is going to be a major source of silent errors / bugs and exposes a large portion of your internals. Doing it in this way prevents you from updating how the back-end looks up types.

I'd recommend using typeid(A).toString internally instead, as this is unique, and changing registerSerializer's API to

    void registerSerializer(Derived,Base)(void delegate(Base) dg) if( is(Derived:Base) ) {}

which would be called via:

    registerSerializer!Foo(dg);

The method should also be static: If I'm registering a custom serialization method, I don't want to duplicate that mapping everywhere a serializer is instanced. I don't even want to duplicate it for every type of serializer. I think there needs to be some granularity for this: i.e. instance -> type -> global.

Also, repeat the above for registerDeserializer.

Re: deserialize

Some of the examples don't look like they're correct.
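
A minimal sketch of the registerSerializer signature suggested in this post, keyed on typeid(...).toString(). The module-level `serializers` table, the type-erasing wrapper and the extra `is(Base : Object)` constraint are assumptions made only to keep the example self-contained; Orange's real internals may look quite different.

```d
/// Hypothetical registry: runtime type name -> type-erased handler.
void delegate(Object)[string] serializers;

/// Registers dg for Derived; Base is deduced from the delegate's parameter.
void registerSerializer(Derived, Base)(void delegate(Base) dg)
    if (is(Derived : Base) && is(Base : Object))
{
    // The key is derived from the type itself, so callers never pass a string.
    serializers[typeid(Derived).toString()] = (Object o) {
        if (auto d = cast(Derived) o)
            dg(d);
    };
}

/// Hypothetical class used only for the demonstration.
class Foo { int a; }

unittest
{
    int seen;
    void delegate(Foo) dg = (Foo f) { seen = f.a; };
    registerSerializer!Foo(dg);

    auto foo = new Foo;
    foo.a = 7;
    serializers[typeid(foo).toString()](foo); // lookup by runtime type
    assert(seen == 7);
}
```

Since the lookup string never appears in the public signature, the back-end stays free to change how it keys types later.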
Sep 30 2011
parent reply Jacob Carlborg <doob me.com> writes:
On 2011-09-30 15:03, Robert Jacques wrote:
 On Thu, 29 Sep 2011 14:58:30 -0400, Jacob Carlborg <doob me.com> wrote:
 I would like to have some form of pre-review of my serialization library
 Orange for later inclusion in Phobos as std.serialization (or similar).

[snip] This is a quick note on the API design; I'm planning on doing a deeper review of the code + API later. Re: registerSerializer Type.stringof is not unique and can't be used by your serializer.

I don't know why but I thought Type.stringof == typeid(Type).toString for everything that wasn't a class.
 Furthermore, allowing users to manually set the lookup string is going
 to be a major source of silent errors / bugs and exposes a large
 portions of your internals. Doing it in this way prevents you from
 updating how the back-end looks up types.

 I'd recommend using: typeid(A).toString internally instead as this is
 unique and registerSerializer's API to

 void registerSerializer(Derived,Base)(void delegate(Base) dg) if(
 is(Derived:Base) ) {}

 which would be called via:

 registerSerializer!Foo(dg);

Didn't think of that, thanks.
 The method should also be static: If I'm registering a custom
 serialization method, I don't want to duplicate that mapping everywhere
 a serializer is instanced. I don't even want to duplicate it for every
 type of serializer. I think there needs to be some granularity for this:
 i.e. instance -> type -> global.

"register" is static, "registerSerializer" is not because I'm not entirely sure how I want the API to behave. What if I want to serialize a class in two different places. In one place I want to serialize it by default and in the other I want to do custom serialization? "I don't even want to duplicate it for every type of serializer". I'm not sure I quite understand, there's only one type of serializer.
 Also, repeat the above for registerDeserializer.


 Re: deserialize some of the example don't look like they're correct.

They look correct to me. Note that two of the three "deserialize" methods should only be called when performing custom deserialization of a class/struct. This method will then, most likely, be called from within the class to manually deserialize the fields. -- /Jacob Carlborg
Sep 30 2011
parent reply Jacob Carlborg <doob me.com> writes:
On 2011-10-01 05:00, Robert Jacques wrote:
 I agree, which is why I suggested lookup should have some granularity.
 i.e. that there is both a global store of serialization methods and a
 per instance store of serialization methods. Lookup would first look in
 the local store before defaulting to the global store. But this should
 be a separate pair of functions.

Aah, now I get it. That's a good idea. The question is what to name the two functions. Yet another use case for overloading methods on static.
 I'm sorry, I was thinking about archive types (i.e. JSON vs XML) and
 somehow thinking that the Serializers would be different for each. (I
 was thinking that the serializer was templated on the archive for some
 reason.)

Ok, no problem.
 Both

 T deserialize (T)();
 T deserialize (T)(string key);

 have the following example:

 class Foo
 {
 int a;

 void fromData (Serializer serializer, Serializer.Data key)
 {
 a = serializer!(int)("a");
 }
 }

No, "T deserialize (T)(string key)" has:

    a = serializer!(int)("a");

And "T deserialize (T)()" has:

    a = serializer!(int)();

Both are correct. This is a complete example of using one of these methods: https://github.com/jacob-carlborg/orange/blob/master/tests/Custom.d

Let me know if anything is confusing.

-- /Jacob Carlborg
Oct 01 2011
parent reply Jacob Carlborg <doob me.com> writes:
On 2011-10-03 05:50, Robert Jacques wrote:
 On Sat, 01 Oct 2011 06:50:52 -0400, Jacob Carlborg <doob me.com> wrote:

 On 2011-10-01 05:00, Robert Jacques wrote:
 I agree, which is why I suggested lookup should have some granularity.
 i.e. that there is both a global store of serialization methods and a
 per instance store of serialization methods. Lookup would first look in
 the local store before defaulting to the global store. But this should
 be a separate pair of functions.

Aah, now I get it. That's a good idea. The question is what to name the two functions. Yet another use case for overloading methods on static.

How about overrideSerializer or overloadSerializer?

registerSerializer for the static method and overloadSerializer/overrideSerializer for the instance method?
 Umm... example code for the deserialize method should contain
 'deserialize' somewhere inside it.

You are completely right.

    a = serializer!(int)("a");

should be

    a = deserialize!(int)("a");

My bad.

-- /Jacob Carlborg
Oct 03 2011
parent Jacob Carlborg <doob me.com> writes:
On 2011-10-03 15:39, Robert Jacques wrote:
 On Mon, 03 Oct 2011 03:06:36 -0400, Jacob Carlborg <doob me.com> wrote:

 On 2011-10-03 05:50, Robert Jacques wrote:
 On Sat, 01 Oct 2011 06:50:52 -0400, Jacob Carlborg <doob me.com> wrote:

 On 2011-10-01 05:00, Robert Jacques wrote:
 I agree, which is why I suggested lookup should have some granularity.
 i.e. that there is both a global store of serialization methods and a
 per instance store of serialization methods. Lookup would first
 look in
 the local store before defaulting to the global store. But this should
 be a separate pair of functions.

Aah, now I get it. That's a good idea. The question is what to name the two functions. Yet another use case for overloading methods on static.

How about overrideSerializer or overloadSerializer?

registerSerializer for the static method and overloadSerializer/overrideSerializer for the instance method?

Yes. Sorry for being unclear. The concept being that at the instance level, you are overriding default behavior.
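
The lookup order discussed here (per-instance store first, then the global store) could look roughly like the following. `SketchSerializer`, `Handler`, `lookup` and `Point` are hypothetical names for illustration; only registerSerializer and overrideSerializer come from the discussion above.

```d
alias Handler = void delegate(Object);

/// Hypothetical stand-in for a serializer with two handler stores.
class SketchSerializer
{
    private static Handler[string] globalHandlers; // defaults shared by all instances
    private Handler[string] localHandlers;         // per-instance overrides

    /// Static: registers the default handler for T.
    static void registerSerializer(T : Object)(Handler h)
    {
        globalHandlers[typeid(T).toString()] = h;
    }

    /// Instance: overrides the default for this serializer only.
    void overrideSerializer(T : Object)(Handler h)
    {
        localHandlers[typeid(T).toString()] = h;
    }

    /// Looks in the local store first, then falls back to the global one.
    Handler lookup(Object value)
    {
        auto key = typeid(value).toString();
        if (auto h = key in localHandlers)
            return *h;
        if (auto h = key in globalHandlers)
            return *h;
        return null; // no custom handler: use default serialization
    }
}

/// Hypothetical class used only for the demonstration.
class Point {}

unittest
{
    string hit;
    SketchSerializer.registerSerializer!Point((Object o) { hit = "global"; });

    auto plain = new SketchSerializer;
    auto custom = new SketchSerializer;
    custom.overrideSerializer!Point((Object o) { hit = "local"; });

    auto p = new Point;
    plain.lookup(p)(p);
    assert(hit == "global");
    custom.lookup(p)(p);
    assert(hit == "local");
}
```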

Yes, thanks, that makes sense. -- /Jacob Carlborg
Oct 03 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday, September 30, 2011 16:25:47 Jacob Carlborg wrote:
 On 2011-09-30 08:39, Jonathan M Davis wrote:
 On Thursday, September 29, 2011 20:58:30 Jacob Carlborg wrote:
 * For the unit tests I've used my own kind of micro unit test
 framework
 (that is included). Is that something we want to have in general in
 Phobos so other modules can take advantage of that? Or should I just
 rip
 out the framework?

I very much doubt that it would be accepted. assertPred failed to make it in spite of the various benefits that it provided, and a number of folks seem to be against more complicated unit testing features making it into Phobos. Not to mention, there's already some discussion of the unit tests taking too long. We aren't going to want anything that adds yet more overhead.

That's why I'm asking. BTW, if we had a proper unit test framework it could be possible to run a smaller set of unit tests to keep the running time down to a minimum (not something my framework can do).

That can be done easily enough with a version block with a version specific to running longer or shorter tests. No additional framework is necessary. Don has already brought it up in the Phobos newsgroup, but it's unclear whether we're going to do anything about it. The main problem is the compile time, not the run time, so it's ultimately a compiler issue. Templates and CTFE in particular make the whole thing slower.
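
A small sketch of the version-block approach described here. The version identifier `LongTests` and the function `square` are made up for the example; the slow tests are only compiled in when `-version=LongTests` is passed alongside `-unittest`.

```d
/// Hypothetical function under test.
int square(int x)
{
    return x * x;
}

// Always compiled with -unittest: a quick sanity check.
unittest
{
    assert(square(3) == 9);
    assert(square(-4) == 16);
}

// Compiled only with -unittest -version=LongTests: the slower, exhaustive run.
version (LongTests) unittest
{
    foreach (i; 0 .. 10_000)
        assert(square(i) == i * i);
}
```

Note that this trims run time and, to a degree, compile time for the excluded blocks, but it does nothing for templates instantiated by the tests that remain.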
 * The unit tests are located in its own package, I'm not very happy
 about putting the unit tests in the same module as the rest of the
 code,
 i.e. the serialization module

Well, that's the way that Phobos does it, and it's essentially how D's unit tests are intended to work (though obviously, you can work around that if you really want to). Phobos' makefiles are set up with the idea that each module's unit tests are included in that module. I'm not sure that it's guaranteed that we wouldn't allow the unit tests to be separate, but that's not how any of the other Phobos unit tests works. And personally, I find that the unit tests are far more maintainable when they're right next to the functions that they test. - Jonathan M Davis

The thing is that just to do a simple test, like serializing an int, it requires some extra code. It's not like calling a single function. It requires:

* Something to serialize
* A serializer
* An archive
* Actually doing the serialization
* Performing several checks on the serialized data

BTW I don't think it scales to have the unit tests in the same module if you want to perform extensive unit tests or if it's a complicated module. The unit test code should be equally well organized, readable and maintainable as the regular code. I think keeping it in the same module makes that harder.

In my experience, it's _much_ easier to have the tests right next to what they're testing. I've found it _very_ annoying when I've had to put unit tests after a templated type, because the unit tests couldn't be _in_ the templated type where they would be instantiated with it. The Interval unit tests in std.datetime are way less manageable because of that. Now, if unit testing each function individually doesn't work very well, and if you have to do a bunch of setup for any testing, then the situation is a bit different, but you could probably just stick all of the tests at the end of the file then, where they're then separate from the code. - Jonathan M Davis
Sep 30 2011
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Fri, 30 Sep 2011 10:41:48 -0400, Jacob Carlborg <doob me.com> wrote:
 On 2011-09-30 15:03, Robert Jacques wrote:
 On Thu, 29 Sep 2011 14:58:30 -0400, Jacob Carlborg <doob me.com> wrote:


[snip]
 The method should also be static: If I'm registering a custom
 serialization method, I don't want to duplicate that mapping everywhere
 a serializer is instanced. I don't even want to duplicate it for every
 type of serializer. I think there needs to be some granularity for this:
 i.e. instance -> type -> global.

"register" is static, "registerSerializer" is not because I'm not entirely sure how I want the API to behave. What if I want to serialize a class in two different places. In one place I want to serialize it by default and in the other I want to do custom serialization?

I agree, which is why I suggested lookup should have some granularity, i.e. that there is both a global store of serialization methods and a per instance store of serialization methods. Lookup would first look in the local store before defaulting to the global store. But this should be a separate pair of functions.
 "I don't even want to duplicate it for every type of serializer". I'm
 not sure I quite understand, there's only one type of serializer.

I'm sorry, I was thinking about archive types (i.e. JSON vs XML) and somehow thinking that the Serializers would be different for each. (I was thinking that the serializer was templated on the archive for some reason.)
 Also, repeat the above for registerDeserializer.


 Re: deserialize some of the example don't look like they're correct.

They look correct to me. Note that two of the three "deserialize" methods should only be called when performing custom deserialization of a class/struct. This method will then, most likely, be called from within the class to manually deserialize the fields.

Both

    T deserialize (T)();
    T deserialize (T)(string key);

have the following example:

    class Foo
    {
        int a;

        void fromData (Serializer serializer, Serializer.Data key)
        {
            a = serializer!(int)("a");
        }
    }
Sep 30 2011
prev sibling next sibling parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Thu, 29 Sep 2011 14:58:30 -0400, Jacob Carlborg <doob me.com> wrote:

 I would like to have some form of pre-review of my serialization library
 Orange for later inclusion in Phobos as std.serialization (or similar).

[snip]

(1) The first example in class Serializer is:

    auto serializer = new Serializer;

Shouldn't it be

    auto serializer = new Serializer(archive);

(2) orange.serialization.archives.XmlArchive needs to be documented.

(3) If Archive.Array (which is poorly named btw) "is a type independent representation of an array", then why does it contain an elementSize field?

(3a) Also, by the time archiving is called, isSliceOf should always return false. That this function exists speaks to design problems both large and small. On the small scale, isSliceOf indicates that you are testing every array against every other array for slicing, which I hope is not the case. On the large scale, all the alias/object/pointer/arrays/etc resolutions should be done by the serializer, not the archive. The archive should only be responsible for converting the internal representation of the serializer to JSON/XML/YAML/etc.

(3b) Given that Slice has an ID field, why doesn't Array?

(4) Why have an Archive interface and a Base class with common functionality? Why not an abstract class? Also, 'Base' isn't an acceptable class name for a Phobos module. Use 'ArchiveBase' or something more unique instead.
Sep 30 2011
next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2011-10-01 06:29, Robert Jacques wrote:
 On Thu, 29 Sep 2011 14:58:30 -0400, Jacob Carlborg <doob me.com> wrote:

 I would like to have some form of pre-review of my serialization library
 Orange for later inclusion in Phobos as std.serialization (or similar).

[snip]

(1) The first example in class Serializer is:

    auto serializer = new Serializer;

Shouldn't it be

    auto serializer = new Serializer(archive);

You're right, thanks.
 (2)
 orange.serialization.archives.XmlArchive need to be documented.

I was hoping the Archive interface and the Base abstract class would be enough.
 (3)
 if Archive.Array (which is poorly named btw) "is a type independent
 representation of an array" then why does it contain an elementSize field?

Suggestions for other names are welcome. Perhaps it was poorly worded, but what I mean is that this type can represent all array types.
 (3a)
 Also by the time archiving is called, isSliceOf should always return
 false.

Why is that?
 That this function exists speaks to design problems both large
 and small. On the small scale, isSliceOf indicates that you are testing
 every array against every other array for slicing, which I hope is not
 the case.

I do. How would I otherwise discover if an array is a slice of another array or not?
 On the large scale, all the alias/object/pointer/arrays/etc
 resolutions should be done by the serializer not the archive. The
 archive should only be responsible for converting the internal
 representation of the serializer to JSON/XML/YAML/etc.

isSliceOf is never called by any existing archive. Perhaps Slice and Array should be moved to the serializer module. I can also remove isSliceOf from Base. I think I was trying to keep the archives independent of the serializer, in the sense that the archives should never have to import the serializer. That's not currently the case, so these structs could be moved.
 (3b)
 Given that Slice has an ID field, why doesn't array.

That's a good question. I don't think it's used at all. archiveSlice takes a Slice and a sliceId. This field should be removed.
 (4)
 Why have an Archive Interface and a Base class with common
 functionality? Why not an abstract class? Also, 'Base' isn't an
 acceptable class name for a Phobos module. Use 'ArchiveBase' or
 something more unique instead.

As you can see Base is a template class and therefore is dependent on the archive type. My goal was that Serializer should not be a template class. I thought it was unnecessary to repeat the module name in the class name. But in this case it might be a good idea. Or perhaps call it AbstractArchive. -- /Jacob Carlborg
Oct 01 2011
parent reply Jacob Carlborg <doob me.com> writes:
On 2011-10-02 00:52, Robert Jacques wrote:
 On Sat, 01 Oct 2011 07:18:59 -0400, Jacob Carlborg <doob me.com> wrote:
 For the pre-review its okay. But you'll need it for the actual review.

Ok, it will be the same documentation as for Archive and Base. Ddoc really needs to get better at this, I mean, why can't it just inherit the documentation.
 Also by the time archiving is called, isSliceOf should always return
 false.

Why is that?

If isSliceOf can return true, then that means that the archive is responsible for alias detection, management, etc.

No, who says that. You can take this struct and use it outside of this library, it knows nothing about archiving or serialization. If the isSliceOf method should return false when archiving has been called I would need to add logic to detect when serialization and archiving has begun and ended.
 That means that every
 single archive format must implement an alias resolution algorithm.

No, the serializer performs this task.
 This  results in a lot of copy/paste boiler plate which has to be maintained.
 It also makes it more difficult to get extra formats supported. Worse if
 someone forgets to either include this functionality or to apply a
 patch, silent bugs and/or slowdowns are introduced. All an archiver
 should be responsible for is taking some decorated data structure and
 converting it to XML/JSON/YAML/etc and back again. Anything more complex
 than that should be shared functionality located in the serializer /
 de-serializer objects.

It's the only thing the archive does.
 I do. How would I otherwise discover if an array is a slice of another
 array or not?

Okay, first some rationale. Consider:

    assert(!a.isSliceOf(b));
    assert(!b.isSliceOf(a));
    assert( c.isSliceOf(a));
    assert( c.isSliceOf(b));

and

    class Foo
    {
        float x;
        float[3] point;
    }

    void main()
    {
        auto foo = new Foo;
        auto ptr = &foo.x;
        auto slice = foo.point[0..2];
    }

In the first case, a, b and c are all slices of a common root array, but the root array may not be serialized. In the second case, first you have a pointer to the inside of an object and second you have a slice of a static array inside an object, all three of which may be serialized together. My impression from your API (so this might not be correct) is that currently, you can't handle the above use cases. Even if you can, an O(N^2) algorithm is rather inefficient.

This is how it works: As the first step, all arrays are serialized as regular arrays and not slices. After all serialization is done, I loop over all arrays and check if they are a slice of some other array. If I find a match, I replace the serialized array with a slice instead. These arrays are stored as an associative array with the type Array[Id]. I don't know if there's a better data structure for this.
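
The post-processing step described here can be sketched as a pairwise scan over the Array[Id] table, which is where the O(N^2) cost comes from. The `Array` struct mirrors the one posted later in this thread; `Id`, `findSlices` and the guard against two identical ranges are assumptions of this sketch, not Orange's actual code.

```d
alias Id = size_t; // hypothetical identifier type

struct Array
{
    void* ptr;
    size_t length;
    size_t elementSize;

    /// True if this array's memory lies entirely within b's memory.
    bool isSliceOf(Array b)
    {
        return ptr >= b.ptr &&
            ptr + length * elementSize <= b.ptr + b.length * b.elementSize;
    }
}

/// Maps each array that is a slice to the id of an array containing it.
/// Every array is compared against every other one, hence O(N^2).
Id[Id] findSlices(Array[Id] arrays)
{
    Id[Id] parentOf;
    foreach (id, arr; arrays)
    {
        foreach (otherId, other; arrays)
        {
            // Skip self-comparison and identical ranges (assumption of this sketch).
            if (id != otherId && arr.isSliceOf(other) && !other.isSliceOf(arr))
            {
                parentOf[id] = otherId;
                break;
            }
        }
    }
    return parentOf;
}

unittest
{
    auto a = [0, 1, 2, 3, 4];
    auto b = a[2 .. 4];

    Array[Id] arrays;
    arrays[0] = Array(a.ptr, a.length, int.sizeof);
    arrays[1] = Array(b.ptr, b.length, int.sizeof);

    auto parentOf = findSlices(arrays);
    assert(parentOf.length == 1 && parentOf[1] == 0);
}
```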
 The solution, in my mind, is to think in terms of memory blocks/chunks.
 Every reference can be thought as pointing to a memory chunk defined by
 two pointers and a flag:

 {
 void* head; // Start of the memory chunk
 void* tail; // End of the memory chunk
 bool hasAliases; // True if there are more than one reference to this chunk
 }

 For alias detection / resolution, you build a balanced tree of memory
 chunks, widening the chunk and flagging hasAliases as appropriate. Which
 should give you O(N log(N)) performance.

I'm not sure I understand. That would require that the arrays are stored in a contiguous block of memory? Won't "head" and "tail" always point to the start of the array and the end of the array?
 As an optimization, the user should be able to 'finalize' the
 serialization by pruning away all memory chunks without aliases. (i.e. a
 serializeAndClear method)

-- /Jacob Carlborg
Oct 02 2011
parent reply Jacob Carlborg <doob me.com> writes:
On 2011-10-03 15:57, Robert Jacques wrote:
 So, in essence, you are saying that by the time archiving occurs,
 isSliceOf will always return false? Then why is it part of the public API?

No, I'm not saying that. Example:

    struct Array
    {
        void* ptr;
        size_t length;
        size_t elementSize;

        bool isSliceOf (Array b)
        {
            return ptr >= b.ptr && ptr + length * elementSize <= b.ptr + b.length * b.elementSize;
        }
    }

    void main ()
    {
        auto a = [0, 1, 2, 3, 4];
        auto b = a[2 .. 4];

        auto aArr = Array(a.ptr, a.length, 4);
        auto bArr = Array(b.ptr, b.length, 4);

        assert(bArr.isSliceOf(aArr));
        assert(!aArr.isSliceOf(bArr));
    }

Both the asserts in the above code pass as expected. See, no serialization or archiving in sight. Is there something I'm missing?

Actually it does not need to be part of the public API when I think about it. I can move it into Serializer. Array would still need to be public since both Serializer and Archive need access to it and the package attribute doesn't work very well.
 The solution, in my mind, is to think in terms of memory blocks/chunks.
 Every reference can be thought as pointing to a memory chunk defined by
 two pointers and a flag:

 {
 void* head; // Start of the memory chunk
 void* tail; // End of the memory chunk
 bool hasAliases; // True if there are more than one reference to this
 chunk
 }

 For alias detection / resolution, you build a balanced tree of memory
 chunks, widening the chunk and flagging hasAliases as appropriate. Which
 should give you O(N log(N)) performance.

I'm not sure I understand. That would require that the arrays are stored in a contiguous block of memory? Won't "head" and "tail" always point to the start of the array and the end of the array?

Most of the time yes. But not all of the time. It would be fair to say that 'head' and 'tail' would be inside the GC memory region of the array and are <= or >= of the start/end of an array, respectively. More importantly, both objects and pointers would resolve to memory chunks as well and be included in the alias resolution algorithm.
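
To make the memory-chunk idea concrete, here is a minimal sketch. It swaps the balanced tree for a plain sort followed by a single sweep, which is also O(N log N) but only works once all references have been collected. `Chunk`, `chunkOf` and `mergeChunks` are hypothetical names, and addresses are stored as size_t for easy comparison.

```d
import std.algorithm : sort;

/// A region of memory that at least one reference points into.
struct Chunk
{
    size_t head;     // address of the first byte
    size_t tail;     // address one past the last byte
    bool hasAliases; // true if more than one reference touches this chunk
}

/// Builds the chunk covered by a slice.
Chunk chunkOf(T)(T[] arr)
{
    auto start = cast(size_t) arr.ptr;
    return Chunk(start, start + arr.length * T.sizeof, false);
}

/// Merges overlapping chunks, widening them and flagging hasAliases.
Chunk[] mergeChunks(Chunk[] refs)
{
    if (refs.length == 0)
        return null;

    auto sorted = refs.dup;
    sort!((x, y) => x.head < y.head)(sorted);

    Chunk[] merged = [sorted[0]];
    foreach (c; sorted[1 .. $])
    {
        auto last = &merged[$ - 1];
        if (c.head < last.tail) // overlaps the current chunk
        {
            if (c.tail > last.tail)
                last.tail = c.tail; // widen
            last.hasAliases = true;
        }
        else
            merged ~= c;
    }
    return merged;
}

unittest
{
    auto a = [0, 1, 2, 3, 4];
    auto b = a[2 .. 4];
    auto c = a[1 .. 3]; // overlaps b, but neither is a slice of the other
    auto d = [9, 9];    // unrelated allocation

    auto merged = mergeChunks([chunkOf(b), chunkOf(c), chunkOf(d)]);
    assert(merged.length == 2);
    foreach (m; merged)
        if (m.hasAliases)
            assert(m.head == cast(size_t) &a[1] && m.tail == m.head + 3 * int.sizeof);
}
```

Here b and c collapse into one chunk spanning a[1 .. 4] even though a itself was never handed to the algorithm, which is the case a pairwise isSliceOf check misses.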

Now I think I start to understand.
 I'm assuming you
 currently separate object and array alias resolution and don't handle
 pointers at all.

Yes. Pointers are handled as well. They're handled similarly to arrays/slices. First it always serializes what the pointer points to. Then, in a post-processing step after all serialization is done, it replaces every serialized pointer that points to a value that has been serialized with a reference. If a given pointer doesn't point to a value that has been serialized, it's left as is. I can see if this memory chunk approach can be used instead. How will this be used with the balanced tree, could you give a simple example? -- /Jacob Carlborg
Oct 03 2011
parent reply Jacob Carlborg <doob me.com> writes:
On 2011-10-04 07:21, Robert Jacques wrote:
 That putting isSliceOf in the public API, implies its usage by the
 archiver.

Ok, then I'll remove it.
 Actually it does not need to be part of the public API when I think
 about it. I can move it into Serializer. Array would still need to be
 public since both Serializer and Archive need access to it and the
 package attribute doesn't work very well.

Regarding design, I agree, although I'd go one further and define Array as a public type inside the Serializer class. However, this concept of an 'Array' is fundamentally flawed. Consider:

auto c = a[1..3];
auto cArr = Array(c.ptr,c.length,4);
assert(!cArr.isSliceOf(bArr));
assert(!bArr.isSliceOf(cArr));

Any suggestion how to fix this, how to properly detect if an array is a slice of some other array?
 // and

 b ~= 5;

 bArr = Array(b.ptr, b.length, 4);

 assert(!bArr.isSliceOf(aArr));

Appending to "b" will reallocate "b" making it a regular array and not a slice:

b ~= 5;
b[] = 100;
assert(a == [0, 1, 2, 3, 4]);

"a" is not modified and the assert passes.
 Can a pointer point to the interior of an object? To an element of an
 array?

Yes, have a look at: https://github.com/jacob-carlborg/orange/blob/master/tests/Pointer.d Not sure about the array part though, but I think so. Each element of an array is serialized just like all the other types.
 I can see if this memory chunk approach can be used instead. How will
 this be used with the balanced tree, could you give a simple example?

Well, balanced trees need a comparison function so:

struct Node
{
    void* head; // Start of the memory chunk
    void* tail; // End of the memory chunk
    bool hasAliases; // True if there is more than one reference to this chunk
    //... Other meta-data, i.e. ID,

    int opCmp(const ref Node b)
    {
        if( tail < b.head) return -1;
        if(b.tail < head) return 1;
        return 0;
    }
}

On equality / assignment, one just has to combine the heads and tails with min/max, respectively, and update hasAliases, etc. The difficulty is when a new node 'bridges the gap' between two existing nodes. This has to be handled explicitly as part of the tree re-balancing, but you may want to consider making the merging of nodes part of the comparison operator for simplicity/efficiency:

head = min(head,b.head);
tail = max(tail,b.tail);
hasAliases = true;

After pruning, updating meta-data, etc, the aliased memory chunk for any given pointer can be found using a separate comparison operator:

int opCmp(const ref void* b)
{
    if(tail < b ) return -1;
    if(b < head) return 1;
    return 0;
}

// Which you'd probably use like:
if( auto node = arr.ptr in setOfAliases )
{
    auto offset = arr.ptr - node.head;
    //...
} else {
    //...

So at the pseudo-code level, it would be something like:

foreach(obj; [objects,pointers,arrays])
{
    auto node = Node(obj);
    setOfAliases.insert(node);
    // Convert obj into an intermediate form
}

setOfAliases.pruneUnaliasedNodes;
setOfAliases.postProcessMetadata; // i.e. assign unique alias ID, etc

foreach(obj; [objects,pointers,arrays])
{
    if( auto node = arr.ptr in setOfAliases )
    {
        auto offset = arr.ptr - node.head;
        // Perform an aliased archival
    }
    else
    {
        // Perform an unaliased archival
    }
}

I hope that helped, though I'm not sure if I really answered your question or not.

Yes, I think it at least helped somewhat. The thing is that I'm not very familiar with using trees like these. Would this be something similar to: https://github.com/jacob-carlborg/orange/blob/master/orange/serialization/Serializer.d#L1520 ? What is the advantage of using a tree? Is the advantage that you loop over the elements once in the pseudo-code, whereas I loop over them twice, as in: https://github.com/jacob-carlborg/orange/blob/master/orange/serialization/Serializer.d#L1495 ? -- /Jacob Carlborg
Oct 04 2011
parent reply Jacob Carlborg <doob me.com> writes:
On 2011-10-04 17:14, Robert Jacques wrote:
 I'm sorry, you're right. In my mind b extended to the end of the a
 array, for some reason. However, if you do define b to extend to the end
 of the a array, then it can append without allocating:

 auto a = [0, 1, 2, 3, 4];
 auto b = a[2 .. $];
 b ~= 5;
 assert(b[0]==2);
 a[2] = 10;
 assert(b[0]==10);

 So please don't dismiss this point.

Hmm, D1 and D2 behave differently in this case. In D1 "a" is not changed when "b" is changed. Since you append to "b" I thought that it would always require a reallocation of "b".
 [snip]

 Would this be something similar to:
 https://github.com/jacob-carlborg/orange/blob/master/orange/serialization/Serializer.d#L1520

 ?

I'm not sure, that routine seems to be filtering pointers between those with aliases and those without aliases, which would be similar in effect to: if( auto node = arr.ptr in setOfAliases ) {} else {}
 What is the advantage with using a tree? Is the advantage that you loop
 over the elements once in the pseudo-code compared to that I loop over
 them twice, as in:
 https://github.com/jacob-carlborg/orange/blob/master/orange/serialization/Serializer.d#L1495

 ?

Primarily, it's O(N logN) vs O(N^2). Also, it solves the isSliceOf problem we discussed above and puts arrays and objects into the same framework, as objects containing fixed sized arrays can have slices.

I guess I just have to try this and see how it works out. Anyway, thank you for your review and your patience. -- /Jacob Carlborg
Oct 04 2011
parent Jacob Carlborg <doob me.com> writes:
On 2011-10-05 03:49, Robert Jacques wrote:
 On Tue, 04 Oct 2011 12:54:27 -0400, Jacob Carlborg <doob me.com> wrote:

 On 2011-10-04 17:14, Robert Jacques wrote:
 I'm sorry, you're right. In my mind b extended to the end of the a
 array, for some reason. However, if you do define b to extend to the end
 of the a array, then it can append without allocating:

 auto a = [0, 1, 2, 3, 4];
 auto b = a[2 .. $];
 b ~= 5;
 assert(b[0]==2);
 a[2] = 10;
 assert(b[0]==10);

 So please don't dismiss this point.

Hmm, D1 and D2 behave differently in this case. In D1 "a" is not changed when "b" is changed. Since you append to "b" I thought that it would always require a reallocation of "b".

Appending has never mandated reallocation. Maybe you're confusing it with concatenation (i.e. b = b ~ 5;), which does always reallocate. Or with the period in time when the anti-array-stomping parts of druntime were overly conservative and would have prevented b from appending in place.

Yeah, maybe you're right. -- /Jacob Carlborg
Oct 04 2011
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 01 Oct 2011 07:18:59 -0400, Jacob Carlborg <doob me.com> wrote:
 On 2011-10-01 06:29, Robert Jacques wrote:
 On Thu, 29 Sep 2011 14:58:30 -0400, Jacob Carlborg <doob me.com> wrote:


[snip]
 (2)
 orange.serialization.archives.XmlArchive need to be documented.

I was hoping the Archive interface and the Base abstract class would be enough.

For the pre-review it's okay. But you'll need it for the actual review.
 (3)
 if Archive.Array (which is poorly named btw) "is a type independent
 representation of an array" then why does it contain an elementSize field?

Suggestions for other names are welcome. Perhaps it was poorly worded, but what I mean is that this type can represent all array types.
 (3a)
 Also by the time archiving is called, isSliceOf should always return
 false.

Why is that?

If isSliceOf can return true, then that means that the archive is responsible for alias detection, management, etc. That means that every single archive format must implement an alias resolution algorithm. This results in a lot of copy/paste boilerplate which has to be maintained. It also makes it more difficult to get extra formats supported. Worse, if someone forgets to either include this functionality or to apply a patch, silent bugs and/or slowdowns are introduced. All an archiver should be responsible for is taking some decorated data structure and converting it to XML/JSON/YAML/etc and back again. Anything more complex than that should be shared functionality located in the serializer / de-serializer objects.
 That this function exists speaks to design problems both large
 and small. On the small scale, isSliceOf indicates that you are testing
 every array against every other array for slicing, which I hope is not
 the case.

I do. How would I otherwise discover if an array is a slice of another array or not?

Okay, first some rationale. Consider:

assert(!a.isSliceOf(b));
assert(!b.isSliceOf(a));
assert( c.isSliceOf(a));
assert( c.isSliceOf(b));

and

class Foo
{
    float x;
    float[3] point;
}

void main()
{
    auto foo = new Foo;
    auto ptr = &foo.x;
    auto slice = foo.point[0..2];
}

In the first case, a, b and c are all slices of a common root array, but the root array may not be serialized. In the second case, first you have a pointer to the inside of an object and second you have a slice of a static array inside an object, all three of which may be serialized together. My impression from your API (so this might not be correct) is that currently, you can't handle the above use cases. Even if you can, an O(N^2) algorithm is rather inefficient.

The solution, in my mind, is to think in terms of memory blocks/chunks. Every reference can be thought of as pointing to a memory chunk defined by two pointers and a flag:

{
    void* head; // Start of the memory chunk
    void* tail; // End of the memory chunk
    bool hasAliases; // True if there is more than one reference to this chunk
}

For alias detection / resolution, you build a balanced tree of memory chunks, widening the chunk and flagging hasAliases as appropriate. Which should give you O(N log(N)) performance. As an optimization, the user should be able to 'finalize' the serialization by pruning away all memory chunks without aliases. (i.e. a serializeAndClear method)
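
The second case above can be checked directly. The following is a minimal sketch, assuming the object's memory chunk runs from its instance address for `__traits(classInstanceSize, Foo)` bytes; `inChunk` is a hypothetical helper and not part of Orange.

```d
class Foo
{
    float x;
    float[3] point;
}

/// True if `p` lies inside the half-open byte range [head, tail).
/// Hypothetical helper, for illustration only.
bool inChunk(const(void)* p, const(void)* head, const(void)* tail)
{
    return head <= p && p < tail;
}

void main()
{
    auto foo = new Foo;
    float* ptr = &foo.x;               // pointer into the object
    float[] slice = foo.point[0 .. 2]; // slice of a static array field

    // Memory chunk occupied by the object instance itself.
    auto head = cast(void*) foo;
    auto tail = head + __traits(classInstanceSize, Foo);

    // Both references resolve to the same chunk as the object.
    assert(inChunk(ptr, head, tail));
    assert(inChunk(slice.ptr, head, tail));
    assert(inChunk(slice.ptr + slice.length - 1, head, tail));
}
```

All three references (the object, the pointer and the slice) land in one chunk, which is why they would need to share one alias resolution pass.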
Oct 01 2011
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Mon, 03 Oct 2011 02:38:22 -0400, Jacob Carlborg <doob me.com> wrote:
 On 2011-10-02 00:52, Robert Jacques wrote:
 On Sat, 01 Oct 2011 07:18:59 -0400, Jacob Carlborg <doob me.com> wrote:


[snip]
 Also by the time archiving is called, isSliceOf should always return
 false.

Why is that?

If isSliceOf can return true, then that means that the archive is responsible for alias detection, management, etc.

No, who says that. You can take this struct and use it outside of this library, it knows nothing about archiving or serialization. If the isSliceOf method should return false when archiving has been called I would need to add logic to detect when serialization and archiving has begun and ended.

So, in essence, you are saying that by the time archiving occurs, isSliceOf will always return false? Then why is it part of the public API?
 That means that every
 single archive format must implement an alias resolution algorithm.

No, the serializer performs this task.

Okay. [snip]
 I do. How would I otherwise discover if an array is a slice of another
 array or not?

Okay, first some rationale. Consider:

assert(!a.isSliceOf(b));
assert(!b.isSliceOf(a));
assert( c.isSliceOf(a));
assert( c.isSliceOf(b));

and

class Foo
{
    float x;
    float[3] point;
}

void main()
{
    auto foo = new Foo;
    auto ptr = &foo.x;
    auto slice = foo.point[0..2];
}

In the first case, a, b and c are all slices of a common root array, but the root array may not be serialized. In the second case, first you have a pointer to the inside of an object and second you have a slice of a static array inside an object, all three of which may be serialized together. My impression from your API (so this might not be correct) is that currently, you can't handle the above use cases. Even if you can, an O(N^2) algorithm is rather inefficient.

This is how it works: As the first step all arrays are serialized as regular arrays and not slices. After all serialization is done I loop over all arrays and check if they are a slice of some other array. If I find a match I replace the serialized array with a slice instead. These arrays are stored as an associative array with the type Array[Id]. I don't know if there's a better data structure for this.

I presented one below.
 The solution, in my mind, is to think in terms of memory blocks/chunks.
 Every reference can be thought as pointing to a memory chunk defined by
 two pointers and a flag:

 {
 void* head; // Start of the memory chunk
 void* tail; // End of the memory chunk
 bool hasAliases; // True if there are more than one reference to this chunk
 }

 For alias detection / resolution, you build a balanced tree of memory
 chunks, widening the chunk and flagging hasAliases as appropriate. Which
 should give you O(N log(N)) performance.

I'm not sure I understand. Would that require that the arrays are stored in a contiguous block of memory? Won't "head" and "tail" always point to the start of the array and the end of the array?

Most of the time yes. But not all of the time. It would be fair to say that 'head' and 'tail' would be inside the GC memory region of the array and are <= or >= of the start/end of an array, respectively. More importantly, both objects and pointers would resolve to memory chunks as well and be included in the alias resolution algorithm. I'm assuming you currently separate object and array alias resolution and don't handle pointers at all.
Oct 03 2011
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Mon, 03 Oct 2011 14:10:52 -0400, Jacob Carlborg <doob me.com> wrote:
 On 2011-10-03 15:57, Robert Jacques wrote:
 So, in essence, you are saying that by the time archiving occurs,
 isSliceOf will always return false? Then why is it part of the public API?

No, I'm not saying that. Example:

struct Array
{
    void* ptr;
    size_t length;
    size_t elementSize;

    bool isSliceOf (Array b)
    {
        return ptr >= b.ptr && ptr + length * elementSize <= b.ptr + b.length * b.elementSize;
    }
}

void main ()
{
    auto a = [0, 1, 2, 3, 4];
    auto b = a[2 .. 4];

    auto aArr = Array(a.ptr, a.length, 4);
    auto bArr = Array(b.ptr, b.length, 4);

    assert(bArr.isSliceOf(aArr));
    assert(!aArr.isSliceOf(bArr));
}

Both asserts in the above code pass as expected. See, no serialization or archiving in sight. Is there something I'm missing?

That putting isSliceOf in the public API, implies its usage by the archiver.
 Actually it does not need to be part of the public API when I think
 about it. I can move it into Serializer. Array would still need to be
 public since both Serializer and Archive need access to it and the
 package attribute doesn't work very well.

Regarding design, I agree, although I'd go one further and define Array as a public type inside the Serializer class. However, this concept of an 'Array' is fundamentally flawed. Consider:

auto c = a[1..3];
auto cArr = Array(c.ptr,c.length,4);
assert(!cArr.isSliceOf(bArr));
assert(!bArr.isSliceOf(cArr));

// and

b ~= 5;
bArr = Array(b.ptr, b.length, 4);
assert(!bArr.isSliceOf(aArr));

In short, a serializer must be capable of handling overlapping arrays, not just strict slices. The array representation therefore needs to parameterize both the underlying array common to all slices and the actual slice that was serialized for that key.
 The solution, in my mind, is to think in terms of memory blocks/chunks.
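
To make the difference between a strict slice and an overlap concrete, here is a small sketch. `ByteSpan` and `overlaps` are names invented for this illustration; they are not taken from Orange.

```d
/// Type-erased view of an array as a half-open byte range.
/// Hypothetical type, for illustration only.
struct ByteSpan
{
    const(void)* head; // first byte
    const(void)* tail; // one past the last byte

    static ByteSpan of(T)(const(T)[] arr)
    {
        return ByteSpan(arr.ptr, arr.ptr + arr.length);
    }

    /// Same containment test as the isSliceOf discussed in the thread.
    bool isSliceOf(ByteSpan b) const
    {
        return head >= b.head && tail <= b.tail;
    }

    /// True if the two ranges share at least one byte.
    bool overlaps(ByteSpan b) const
    {
        return head < b.tail && b.head < tail;
    }
}

void main()
{
    auto a = [0, 1, 2, 3, 4];
    auto b = a[2 .. 4];
    auto c = a[1 .. 3];

    auto bSpan = ByteSpan.of(b);
    auto cSpan = ByteSpan.of(c);

    // Neither is a slice of the other ...
    assert(!bSpan.isSliceOf(cSpan));
    assert(!cSpan.isSliceOf(bSpan));
    // ... yet they share the element a[2].
    assert(bSpan.overlaps(cSpan));
}
```

A containment-only test misses this pair, so both arrays would be archived in full and the sharing of a[2] would be lost on deserialization.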
 Every reference can be thought as pointing to a memory chunk defined by
 two pointers and a flag:

 {
 void* head; // Start of the memory chunk
 void* tail; // End of the memory chunk
 bool hasAliases; // True if there are more than one reference to this
 chunk
 }

 For alias detection / resolution, you build a balanced tree of memory
 chunks, widening the chunk and flagging hasAliases as appropriate. Which
 should give you O(N log(N)) performance.

I'm not sure I understand. Would that require that the arrays are stored in a contiguous block of memory? Won't "head" and "tail" always point to the start of the array and the end of the array?

Most of the time yes. But not all of the time. It would be fair to say that 'head' and 'tail' would be inside the GC memory region of the array and are <= or >= of the start/end of an array, respectively. More importantly, both objects and pointers would resolve to memory chunks as well and be included in the alias resolution algorithm.

Now I think I start to understand.
 I'm assuming you
 currently separate object and array alias resolution and don't handle
 pointers at all.

Yes. Pointers are handled as well. They're handled similarly to arrays/slices. First it always serializes what the pointer points to. Then, in a post-processing step after all serialization is done, it replaces every serialized pointer that points to a value that has been serialized with a reference. If a given pointer doesn't point to a value that has been serialized, it's left as is.

Can a pointer point to the interior of an object? To an element of an array?
 I can see if this memory chunk approach can be used instead. How will
 this be used with the balanced tree, could you give a simple example?

Well, balanced trees need a comparison function so:

struct Node
{
    void* head; // Start of the memory chunk
    void* tail; // End of the memory chunk
    bool hasAliases; // True if there is more than one reference to this chunk
    //... Other meta-data, i.e. ID,

    int opCmp(const ref Node b)
    {
        if( tail < b.head) return -1;
        if(b.tail < head) return 1;
        return 0;
    }
}

On equality / assignment, one just has to combine the heads and tails with min/max, respectively, and update hasAliases, etc. The difficulty is when a new node 'bridges the gap' between two existing nodes. This has to be handled explicitly as part of the tree re-balancing, but you may want to consider making the merging of nodes part of the comparison operator for simplicity/efficiency:

head = min(head,b.head);
tail = max(tail,b.tail);
hasAliases = true;

After pruning, updating meta-data, etc, the aliased memory chunk for any given pointer can be found using a separate comparison operator:

int opCmp(const ref void* b)
{
    if(tail < b ) return -1;
    if(b < head) return 1;
    return 0;
}

// Which you'd probably use like:
if( auto node = arr.ptr in setOfAliases )
{
    auto offset = arr.ptr - node.head;
    //...
} else {
    //...

So at the pseudo-code level, it would be something like:

foreach(obj; [objects,pointers,arrays])
{
    auto node = Node(obj);
    setOfAliases.insert(node);
    // Convert obj into an intermediate form
}

setOfAliases.pruneUnaliasedNodes;
setOfAliases.postProcessMetadata; // i.e. assign unique alias ID, etc

foreach(obj; [objects,pointers,arrays])
{
    if( auto node = arr.ptr in setOfAliases )
    {
        auto offset = arr.ptr - node.head;
        // Perform an aliased archival
    }
    else
    {
        // Perform an unaliased archival
    }
}

I hope that helped, though I'm not sure if I really answered your question or not.
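
For readers unfamiliar with interval trees, the same widening and flagging can be sketched without a tree at all. The code below swaps in a plain sort-and-sweep over all chunks, which is also O(N log N) but needs every chunk up front. `Chunk` and `mergeChunks` are illustrative names, and addresses are stored as `size_t` purely to keep the example easy to test.

```d
import std.algorithm : max, sort;

/// Memory chunk as described above, with addresses stored as integers.
struct Chunk
{
    size_t head;     // address of the first byte
    size_t tail;     // address one past the last byte
    bool hasAliases; // true once two or more references were merged
}

/// Merges overlapping chunks. Sorts `chunks` in place, then sweeps once,
/// so the cost is dominated by the O(N log N) sort.
Chunk[] mergeChunks(Chunk[] chunks)
{
    sort!((a, b) => a.head < b.head)(chunks);

    Chunk[] merged;
    foreach (c; chunks)
    {
        if (merged.length && c.head < merged[$ - 1].tail)
        {
            // Overlaps the previous chunk: widen it and flag the alias.
            merged[$ - 1].tail = max(merged[$ - 1].tail, c.tail);
            merged[$ - 1].hasAliases = true;
        }
        else
        {
            merged ~= Chunk(c.head, c.tail, false);
        }
    }
    return merged;
}

void main()
{
    // b = [108, 116) and c = [104, 112) overlap; d = [200, 208) stands alone.
    auto merged = mergeChunks([Chunk(108, 116), Chunk(104, 112), Chunk(200, 208)]);
    assert(merged == [Chunk(104, 116, true), Chunk(200, 208, false)]);
}
```

A balanced tree gives the same result incrementally, which matters if chunks are discovered one at a time during serialization.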
Oct 03 2011
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 04 Oct 2011 03:22:35 -0400, Jacob Carlborg <doob me.com> wrote:
 On 2011-10-04 07:21, Robert Jacques wrote:

[snip]
 Actually it does not need to be part of the public API when I think
 about it. I can move it into Serializer. Array would still need to be
 public since both Serializer and Archive need access to it and the
 package attribute doesn't work very well.

Regarding design, I agree, although I'd go one further and define Array as a public type inside the Serializer class. However, this concept of an 'Array' is fundamentally flawed. Consider:

auto c = a[1..3];
auto cArr = Array(c.ptr,c.length,4);
assert(!cArr.isSliceOf(bArr));
assert(!bArr.isSliceOf(cArr));

Any suggestion how to fix this, how to properly detect if an array is a slice of some other array?

Well, there are two problems. First, there is an API issue: When you pass an aliased array to the archiver, you must pass both an array and a slice of that array. Second, is detection. The comparison operator I suggested below covers this issue. Just remember that with a loop based approach, you can't terminate the search early: you have to check all existing arrays for possible matches.
 // and

 b ~= 5;

 bArr = Array(b.ptr, b.length, 4);

 assert(!bArr.isSliceOf(aArr));

Appending to "b" will reallocate "b" making it a regular array and not a slice:

b ~= 5;
b[] = 100;
assert(a == [0, 1, 2, 3, 4]);

"a" is not modified and the assert passes.

I'm sorry, you're right. In my mind b extended to the end of the a array, for some reason. However, if you do define b to extend to the end of the a array, then it can append without allocating:

auto a = [0, 1, 2, 3, 4];
auto b = a[2 .. $];
b ~= 5;
assert(b[0]==2);
a[2] = 10;
assert(b[0]==10);

So please don't dismiss this point. [snip]
 Would this be something similar to:
 https://github.com/jacob-carlborg/orange/blob/master/orange/serialization/Serializer.d#L1520
 ?

I'm not sure, that routine seems to be filtering pointers between those with aliases and those without aliases, which would be similar in effect to: if( auto node = arr.ptr in setOfAliases ) {} else {}
 What is the advantage with using a tree? Is the advantage that you loop
 over the elements once in the pseudo-code compared to that I loop over
 them twice, as in:
 https://github.com/jacob-carlborg/orange/blob/master/orange/serialization/Serializer.d#L1495
 ?

Primarily, it's O(N logN) vs O(N^2). Also, it solves the isSliceOf problem we discussed above and puts arrays and objects into the same framework, as objects containing fixed sized arrays can have slices.
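
The per-lookup half of that cost can be illustrated with a binary search over merged chunks, which plays the role of `arr.ptr in setOfAliases`. This is a sketch under the assumption that the chunks are already sorted by head and do not overlap; `findChunk` is a hypothetical name.

```d
/// Merged memory chunk; addresses stored as integers for illustration.
struct Chunk
{
    size_t head;     // address of the first byte
    size_t tail;     // address one past the last byte
    bool hasAliases;
}

/// Returns the index of the chunk containing `addr`, or -1 if none does.
/// `chunks` must be sorted by head and non-overlapping. O(log N) per lookup.
ptrdiff_t findChunk(const(Chunk)[] chunks, size_t addr)
{
    size_t lo = 0, hi = chunks.length;
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;
        if (chunks[mid].tail <= addr)
            lo = mid + 1; // addr lies after this chunk
        else if (addr < chunks[mid].head)
            hi = mid;     // addr lies before this chunk
        else
            return cast(ptrdiff_t) mid;
    }
    return -1;
}

void main()
{
    auto chunks = [Chunk(100, 120, true), Chunk(200, 208, false)];

    assert(findChunk(chunks, 108) == 0); // offset into chunk 0 is 108 - 100
    assert(findChunk(chunks, 200) == 1);
    assert(findChunk(chunks, 150) == -1);
}
```

With N references each resolved in O(log N), the whole pass is O(N log N), compared with testing every array against every other array.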
Oct 04 2011
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 04 Oct 2011 12:54:27 -0400, Jacob Carlborg <doob me.com> wrote:

 On 2011-10-04 17:14, Robert Jacques wrote:
 I'm sorry, you're right. In my mind b extended to the end of the a
 array, for some reason. However, if you do define b to extend to the end
 of the a array, then it can append without allocating:

 auto a = [0, 1, 2, 3, 4];
 auto b = a[2 .. $];
 b ~= 5;
 assert(b[0]==2);
 a[2] = 10;
 assert(b[0]==10);

 So please don't dismiss this point.

Hmm, D1 and D2 behave differently in this case. In D1 "a" is not changed when "b" is changed. Since you append to "b" I thought that it would always require a reallocation of "b".

Appending has never mandated reallocation. Maybe you're confusing it with concatenation (i.e. b = b ~ 5;), which does always reallocate. Or with the period in time when the anti-array-stomping parts of druntime were overly conservative and would have prevented b from appending in place.
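
The distinction can be observed in D2 with the built-in `capacity` property. The sketch below only asserts the in-place case when the runtime reports spare room, since whether a block has room depends on the allocator.

```d
void main()
{
    auto a = [0, 1, 2, 3, 4];
    auto b = a[2 .. $]; // b ends where a ends

    // Concatenation always produces a freshly allocated array.
    auto c = b ~ 5;
    assert(c.ptr !is b.ptr);

    // Appending extends in place only if the block has spare capacity.
    if (b.capacity > b.length)
    {
        auto before = b.ptr;
        b ~= 5;
        assert(b.ptr is before); // no reallocation happened
        a[2] = 10;
        assert(b[0] == 10);      // a and b still share memory
    }
}
```

For a serializer this means a slice that has been appended to may still alias the original array, so it cannot be assumed to be independent.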
Oct 04 2011
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 01 Oct 2011 06:50:52 -0400, Jacob Carlborg <doob me.com> wrote:

 On 2011-10-01 05:00, Robert Jacques wrote:
 I agree, which is why I suggested lookup should have some granularity.
 i.e. that there is both a global store of serialization methods and a
 per instance store of serialization methods. Lookup would first look in
 the local store before defaulting to the global store. But this should
 be a separate pair of functions.

Aah, now I get it. That's a good idea. The question is what to name the two functions. Yet another use case for overloading methods on static.

How about overrideSerializer or overloadSerializer? [snip]
 Both

 T deserialize (T)();
 T deserialize (T)(string key);

 have the following example:

 class Foo
 {
 int a;

 void fromData (Serializer serializer, Serializer.Data key)
 {
 a = serializer!(int)("a");
 }
 }

No, "T deserialize (T)(string key)" has: a = serializer!(int)("a"); And "T deserialize (T)()" has: a = serializer!(int)(); Both are correct. This is a complete example of using one of these methods: https://github.com/jacob-carlborg/orange/blob/master/tests/Custom.d Let me know if anything is confusing.

Umm... example code for the deserialize method should contain 'deserialize' somewhere inside it.
Oct 02 2011
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Mon, 03 Oct 2011 03:06:36 -0400, Jacob Carlborg <doob me.com> wrote:

 On 2011-10-03 05:50, Robert Jacques wrote:
 On Sat, 01 Oct 2011 06:50:52 -0400, Jacob Carlborg <doob me.com> wrote:

 On 2011-10-01 05:00, Robert Jacques wrote:
 I agree, which is why I suggested lookup should have some granularity.
 i.e. that there is both a global store of serialization methods and a
 per instance store of serialization methods. Lookup would first look in
 the local store before defaulting to the global store. But this should
 be a separate pair of functions.

Aah, now I get it. That's a good idea. The question is what to name the two functions. Yet another use case for overloading methods on static.

How about overrideSerializer or overloadSerializer?

registerSerializer for the static method and overloadSerializer/overrideSerializer for the instance method?

Yes. Sorry for being unclear. The concept being that at the instance level, you are overriding default behavior.
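
The two-level lookup agreed on here could look roughly like the following. This is only a sketch: `Registry`, `Handler` and `lookup` are invented for the example, the handler signature is simplified to a single `int`, and none of it reflects Orange's actual API. Only the names registerSerializer and overrideSerializer come from the discussion.

```d
/// Simplified handler type, for illustration only.
alias Handler = string delegate(int value);

class Registry
{
    private static Handler[string] global; // shared by all instances
    private Handler[string] local;         // per-instance overrides

    /// Registers a default handler for all instances.
    static void registerSerializer(string type, Handler h)
    {
        global[type] = h;
    }

    /// Overrides the default for this instance only.
    void overrideSerializer(string type, Handler h)
    {
        local[type] = h;
    }

    /// Looks in the instance store first, then falls back to the global store.
    Handler lookup(string type)
    {
        if (auto h = type in local)
            return *h;
        if (auto h = type in global)
            return *h;
        return null;
    }
}

void main()
{
    Registry.registerSerializer("int", (int v) => "global");

    auto r1 = new Registry;
    auto r2 = new Registry;
    r1.overrideSerializer("int", (int v) => "local");

    assert(r1.lookup("int")(1) == "local");  // instance override wins
    assert(r2.lookup("int")(1) == "global"); // other instances see the default
}
```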
Oct 03 2011