digitalmars.D - Implementing serialisation with minimal boilerplate and template

reply Stefan Koch <uplink.coder googlemail.com> writes:
Good Day everyone,

A friend asked me recently if core.reflect could be used for
serialization.

The code to do that generically is about 100 lines of code.
Since it's so little I am just going to post it here verbatim.

```D
class C { @("NoSerialize") ubyte a; uint b; @("NoSerialize") ulong c;
     this(ubyte a, uint b, ulong c)
     {
         this.a = a; this.b = b; this.c = c;
     }
}
struct S { @("NoSerialize") ubyte a; uint b; ubyte c; @("NoSerialize") ulong d; }

import core.reflect.reflect;
/// if a type is an aggregate type return the aggregate declaration
/// otherwise return null
AggregateDeclaration aggregateFromType(const Type type)
{
     AggregateDeclaration result = null;
     if (auto ts = cast(TypeStruct)type)
     {
         result = ts.sym;
     } else if (auto tc = cast(TypeClass)type)
     {
         result = tc.sym;
     }
     return result;
}

bool hasNoSerializeAttrib(const Declaration d)
{
     bool NoSerialize = false;
     foreach(attr;d.attributes)
     {
         auto se = cast(StringLiteral)attr;
         if (se && se.string_ == "NoSerialize")
         {
           NoSerialize = true;
           break;
         }
     }
     return NoSerialize;
}
   /// the only template in here. only one instance per serialized-root-type
   const(ubyte[]) serialize(T)(T value) {
     static immutable Type type = cast(immutable Type) nodeFromName("T");
     return serializeType(cast(ubyte*)&value, type);
   }

   const(ubyte[]) serializeType(const ubyte* ptr, const Type type)
   {
     // writeShallowTypeDescriptor(Type);
     // printf("Serializing type: %s\n", type.identifier.ptr);
     if (auto sa = cast (TypeArray) type)
     {
         ulong length = sa.dim;
         const void* values = ptr;
         auto elemType = sa.nextOf;
         return serializeArray(length, ptr, elemType);
     }
     else if (auto da = cast(TypeSlice) type)
     {
         ulong length = *cast(size_t*) ptr;
         const void* values = *cast(const ubyte**)(ptr + size_t.sizeof);
         auto elemType = da.nextOf;
         return serializeArray(length, ptr, elemType);
     }
     else if (auto ts = cast(TypeStruct) type)
     {
         auto fields = ts.sym.fields;
         return serializeAggregate(ptr, fields);
     }
     else if (auto tc = cast(TypeClass) type)
     {
         auto fields = tc.sym.fields;
         return serializeAggregate(*cast(const ubyte**)ptr, fields);
     }
     else if (auto tb = cast(TypeBasic) type)
     {
         // writeTypeTag ?
         return ptr[0 .. type.size];
     }
     else assert(0, "Serialisation for " ~ type.identifier ~ " not implemented .. " ~ (cast()type).toString());
   }

   ubyte[] serializeArray(ulong length, const ubyte* ptr, const Type elemType)
   {
     ubyte[] result;
     // writeLength(length)
     foreach(i; 0 .. length)
     {
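       // step to the i-th element; without the offset every iteration re-reads element 0
       result ~= serializeType(ptr + cast(size_t)(i * elemType.size), elemType);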
     }
     return result;
   }

   ubyte[] serializeAggregate(const ubyte* ptr, VariableDeclaration[] fields)
   {
     ubyte[] result;
     foreach(f; fields)
     {
       if (hasNoSerializeAttrib(f))
       {
         // skip fields which are annotated with noSerialize;
         continue;
       }
       result ~= serializeType(ptr + f.offset, f.type);
     }
     return result;
   }


void main()
{
     S s = S(72, 19992034, 98);
     C c = new C(72, 19992039, 98);
     auto buffer = serialize(s);
     assert(buffer.length == 5);
     assert ((buffer[0] | buffer[1] << 8 | buffer[2] << 16 | buffer[3] << 24) == 19992034);
     assert (buffer[4] == 98);
     auto buffer_c = serialize(c);
     assert(buffer_c.length == 4);
     assert ((buffer_c[0] | buffer_c[1] << 8 | buffer_c[2] << 16 | buffer_c[3] << 24) == 19992039);
     // we can see the fields annotated with @("NoSerialize") are skipped
}
```

I would like to know what you think about this.

In my next example I will show how you can modify serialization
for library types whose source code you can't control.

However for that to work `core.reflect` needs to be extended a 
little ;)

Cheers and have a nice day,

Stefan
Aug 15
next sibling parent Stefan Koch <uplink.coder googlemail.com> writes:
On Sunday, 15 August 2021 at 11:18:57 UTC, Stefan Koch wrote:
 Good Day everyone,

 A friend asked me recently if core.reflect could be used for
 serialization.
 [ ... ]

 I would like to know what you think about this
There was a bug in the code I posted. I should have run every path before calling it a day ;)

The code for dynamic array serialization

```d
ulong length = *cast(size_t*) ptr;
const void* values = *cast(const ubyte**)(ptr + size_t.sizeof);
auto elemType = da.nextOf;
return serializeArray(length, ptr, elemType);
```

has to be

```d
ulong length = *cast(size_t*) ptr;
const ubyte* values = *cast(const ubyte**)(ptr + size_t.sizeof);
auto elemType = da.nextOf;
return serializeArray(length, values, elemType);
```

and of course serializeArray has to write the length as well for this to work, as without the length information you cannot de-serialize.
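For illustration, the corrected dynamic-array path can be sketched in plain D, without core.reflect. This is only a minimal sketch under stated assumptions: the helper names `serializeRawArray` and `serializeSliceAt` are hypothetical, the element size is passed in by hand instead of coming from a reflected `Type`, and the length prefix is written as a `ulong` in native byte order. Like the snippet above, it relies on a D slice being stored as a length followed by a pointer.

```d
/// Hypothetical helper: append `length` elements of `elemSize` bytes each,
/// preceded by the element count so that a reader can de-serialize.
ubyte[] serializeRawArray(size_t length, const(ubyte)* ptr, size_t elemSize)
{
    ubyte[] result;
    // Length prefix: fixed-width ulong, native byte order (an assumption).
    ulong prefix = length;
    result ~= (cast(const(ubyte)*) &prefix)[0 .. ulong.sizeof];
    foreach (i; 0 .. length)
    {
        // Step forward by one element per iteration.
        result ~= (ptr + i * elemSize)[0 .. elemSize];
    }
    return result;
}

/// Hypothetical helper: `slicePtr` points at a slice header
/// (length first, then the data pointer).
ubyte[] serializeSliceAt(const(ubyte)* slicePtr, size_t elemSize)
{
    size_t length = *cast(const(size_t)*) slicePtr;
    // The elements live behind the data pointer, not behind slicePtr itself.
    const(ubyte)* values = *cast(const(ubyte*)*) (slicePtr + size_t.sizeof);
    return serializeRawArray(length, values, elemSize);
}

unittest
{
    uint[] data = [1, 2, 3];
    auto bytes = serializeSliceAt(cast(const(ubyte)*) &data, uint.sizeof);
    assert(bytes.length == ulong.sizeof + 3 * uint.sizeof);
}
```

The point of the fix is visible in `serializeSliceAt`: the element bytes are read through `values`, while the pointer to the slice header is used only to find the length and the data pointer.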
Aug 15
prev sibling parent reply drug <drug2004 bk.ru> writes:
15.08.2021 14:18, Stefan Koch wrote:
 [...]
I'm impressed by your work. But I have one question: when will ordinary people like me be able to use all these amazing things you did? Iterating over aggregate members while considering their type, attributes and runtime value comes up so often in my practice that I'm really interested in your core.reflect, but I can only use the official compiler without any customization.
Aug 15
parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Sunday, 15 August 2021 at 12:27:44 UTC, drug wrote:
 I'm impressed by your work. But I have one question: when will
 ordinary people like me be able to use all these amazing
 things you did? Iterating over aggregate members while considering
 their type, attributes and runtime value comes up so often in my
 practice that I'm really interested in your core.reflect, but I
 can only use the official compiler without any customization.
I have to write a DIP and have it approved. I am already starting, as this one is much simpler than the type function project. Essentially, the changes to the core language spec are just a few lines. I suspect the runtime documentation will be more challenging. But for that, at least, I have already specified a data structure, so there is not much variability to take into account. I cannot give a definitive timeline, but I would hope for this to go in before 2022 is over.
Aug 15
parent reply Temtaime <temtaime gmail.com> writes:
On Sunday, 15 August 2021 at 14:21:00 UTC, Stefan Koch wrote:
 On Sunday, 15 August 2021 at 12:27:44 UTC, drug wrote:
 [...]
Hello.

Take a look at https://github.com/Temtaime/utile/blob/main/source/utile/binary/tests.d :)

Maybe someone will find this library of mine useful.
Aug 16
parent reply Bruce Carneal <bcarneal gmail.com> writes:
On Monday, 16 August 2021 at 16:53:54 UTC, Temtaime wrote:
 On Sunday, 15 August 2021 at 14:21:00 UTC, Stefan Koch wrote:
 On Sunday, 15 August 2021 at 12:27:44 UTC, drug wrote:
 [...]
Dlang sure seems to inspire run-time serializers: https://code.dlang.org/search?q=serialization

My home-brew serialization is not as sophisticated as what you've written, let alone what Stefan is doing at compile time.
Aug 16
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 8/16/21 5:18 PM, Bruce Carneal wrote:
 On Monday, 16 August 2021 at 16:53:54 UTC, Temtaime wrote:
 On Sunday, 15 August 2021 at 14:21:00 UTC, Stefan Koch wrote:
 On Sunday, 15 August 2021 at 12:27:44 UTC, drug wrote:
 [...]
[D is for (de)serialization](https://dconf.org/2019/talks/schveighoffer.html) ;)

There's at least one talk about serialization at almost every DConf, I think.

-Steve
Aug 16
parent reply russhy <russhy gmail.com> writes:
Is this runtime reflection?

Will this depend on the GC? If so, does it add pressure to the GC?

We already have compile-time type introspection. I don't think
it's wise to move things to runtime: we have a poor GC, and adding
more pressure to it is just bad.

Compile-time reflection has already proved to be superior in heavy
workloads.


examples to follow

Also, if it uses the GC, I'm not sure the "core" package is the way to go;
it should be put in "std", or shipped as a library, imo.
Aug 17
next sibling parent reply 12345swordy <alexanderheistermann gmail.com> writes:
On Tuesday, 17 August 2021 at 13:52:19 UTC, russhy wrote:
 [...]
[...] because of the basic principles of OOP.

-Alex
Aug 17
next sibling parent reply russhy <russhy gmail.com> writes:
On Tuesday, 17 August 2021 at 14:16:52 UTC, 12345swordy wrote:
 On Tuesday, 17 August 2021 at 13:52:19 UTC, russhy wrote:
 [...]
because of the basic principles of OOP. -Alex
As well as other atrocities such as runtime code generation / runtime dependencies
 It's for introspecting over code at compile time, not at
 runtime. Stefan and I have been mulling over this for ages, and
 I think we both think it can do more than reflection as
 we currently know it, at least. This lets you drink from the
 firehose, so to speak.
Oh, I see, so the goal is not what I was thinking. My bad! I guess I will have to read the DIP to know more about it. That is interesting; I'm curious now.
Aug 17
next sibling parent Adam Ruppe <destructionator gmail.com> writes:
On Tuesday, 17 August 2021 at 14:52:22 UTC, russhy wrote:
 I guess I will have to read the DIP to know more about it. That
 is interesting; I'm curious now.
Basically, it provides the same reflection we have now, but as CTFE classes instead of compiler `__traits` and `is`-expression matching.
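For comparison, this is roughly what the traits-and-`is`-expression style looks like with today's official compiler. It is a sketch only: `hasNoSerialize` and `serializeCT` are hypothetical names, only structs and scalar fields are handled, and bytes are emitted in native byte order. Unlike the core.reflect version, the serializer here is a template that gets instantiated for every type it touches.

```d
/// Hypothetical helper: true if any of the given UDAs is the string "NoSerialize".
template hasNoSerialize(attrs...)
{
    static if (attrs.length == 0)
        enum bool hasNoSerialize = false;
    else static if (is(typeof(attrs[0]) == string) && attrs[0] == "NoSerialize")
        enum bool hasNoSerialize = true;
    else
        enum bool hasNoSerialize = hasNoSerialize!(attrs[1 .. $]);
}

/// Hypothetical serializer: structs are walked field by field,
/// scalars are copied out as raw bytes.
ubyte[] serializeCT(T)(auto ref const T value)
{
    ubyte[] result;
    static if (is(T == struct))
    {
        foreach (i, ref field; value.tupleof)
        {
            // Skip fields annotated with @("NoSerialize").
            static if (!hasNoSerialize!(__traits(getAttributes, T.tupleof[i])))
                result ~= serializeCT(field);
        }
    }
    else static if (__traits(isScalar, T))
    {
        result ~= (cast(const(ubyte)*) &value)[0 .. T.sizeof];
    }
    else
        static assert(0, "serialization of " ~ T.stringof ~ " is not sketched here");
    return result;
}

struct S { @("NoSerialize") ubyte a; uint b; ubyte c; @("NoSerialize") ulong d; }

unittest
{
    auto buffer = serializeCT(S(72, 19992034, 98));
    assert(buffer.length == 5); // uint b + ubyte c; a and d are skipped
    assert(buffer[4] == 98);
}
```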
Aug 17
prev sibling parent 12345swordy <alexanderheistermann gmail.com> writes:
On Tuesday, 17 August 2021 at 14:52:22 UTC, russhy wrote:
 On Tuesday, 17 August 2021 at 14:16:52 UTC, 12345swordy wrote:
 On Tuesday, 17 August 2021 at 13:52:19 UTC, russhy wrote:
 [...]
because of the basic principles of OOP. -Alex
As well as other atrocities such as runtime code generation / runtime dependencies
Which is not inherently an evil thing, so there is no need for hyperbolic language such as the word "atrocities". It's all about the trade-offs.

- Alex
Aug 17
prev sibling parent reply Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Tuesday, 17 August 2021 at 14:16:52 UTC, 12345swordy wrote:

 because of the basic principles of OOP.

 -Alex
How do the basic principles of OOP force Java implementations to use runtime reflection??? It would mean that D also has to use runtime reflection, because it has OOP support.

Regards,
Alexandru.
Aug 17
parent reply 12345swordy <alexanderheistermann gmail.com> writes:
On Tuesday, 17 August 2021 at 18:10:48 UTC, Alexandru Ermicioi 
wrote:
 On Tuesday, 17 August 2021 at 14:16:52 UTC, 12345swordy wrote:

 because of the basic principles of OOP.

 -Alex
 How do the basic principles of OOP force Java implementations to use runtime reflection???
https://en.wikipedia.org/wiki/Inheritance_(object-oriented_programming)

You cannot obtain information such as "How many child classes does this class currently have?" without compiling all the code and every library that you use, which isn't feasible, as not every library shares its source code.

- Alex
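To make the "how many child classes" question concrete, here is a hedged D sketch of answering it at run time from the class information that druntime keeps for each module. The classes `Animal`, `Dog` and `Cat` and the function `countDirectSubclasses` are made up for illustration; `ModuleInfo`, `localClasses` and `TypeInfo_Class.base` are existing druntime facilities. Whether this is the kind of runtime information Alex had in mind is not stated in the thread.

```d
class Animal {}
class Dog : Animal {}
class Cat : Animal {}

/// Hypothetical helper: count the classes known to the running program
/// whose direct base class is `parent`.
size_t countDirectSubclasses(const TypeInfo_Class parent)
{
    size_t count = 0;
    foreach (m; ModuleInfo)
    {
        if (m is null)
            continue;
        // localClasses lists the classes declared in module m.
        foreach (c; m.localClasses)
        {
            if (c.base is parent)
                ++count;
        }
    }
    return count;
}

unittest
{
    assert(countDirectSubclasses(typeid(Animal)) == 2);
}
```

The answer depends on which modules end up in the final program, which is why code that only sees `Animal` at compile time cannot compute it.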
Aug 17
parent reply Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Tuesday, 17 August 2021 at 20:23:48 UTC, 12345swordy wrote:
 On Tuesday, 17 August 2021 at 18:10:48 UTC, Alexandru Ermicioi 
 wrote:
 On Tuesday, 17 August 2021 at 14:16:52 UTC, 12345swordy wrote:

 because of the basic principles of OOP.

 -Alex
I still fail to see the relation between OOP and the use of runtime info. The part about source code sharing is also true for C and D libs, which can both ship only header files and declare a pointer to a struct without the struct's declaration (i.e. an opaque struct type; I'm not completely sure about D though).

The thing is that runtime reflection in Java is more or less usable compared to D (sigh), and can be used to design libs that use that data to do their job. But this doesn't mean that there are no options for using compile-time info to generate or alter compiled code. Take for example Project Lombok, the JPA model generator from Hibernate, or the MapStruct library, which is a mapper from one Java type to another (kinda close to serializers). All of them run at compile time to either generate new code or alter existing code, based on the annotation processor plugin feature offered by the Java compiler, not to mention bytecode enhancement capabilities and the libs using them.

Regards,
Alexandru.
Aug 17
parent Arafel <er.krali gmail.com> writes:
On 17/8/21 22:50, Alexandru Ermicioi wrote:
 On Tuesday, 17 August 2021 at 20:23:48 UTC, 12345swordy wrote:
 On Tuesday, 17 August 2021 at 18:10:48 UTC, Alexandru Ermicioi wrote:
 On Tuesday, 17 August 2021 at 14:16:52 UTC, 12345swordy wrote:

 the basic principles of OOP.

 -Alex
There's at least one use case that can only be solved with runtime reflection: dynamically loaded code (so, dlopen or similar) where no source code is available at all, for example if you want to implement a plugin system.

For sure you can have the code "register" itself and add some kind of "description" or metadata, but if you want, for instance, to implement a persistence / serialization system in such an environment, you end up all but implementing your own version of runtime reflection.

I don't have that much experience with this, but I think that separate compilation could also have issues, for instance if you want to distribute your program in binary form and still allow people to compile plugins / extensions for it. A workaround might be to distribute enough of the source code in the .di files, but I'm not sure how workable this would be.
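The self-registration idea could look something like the following D sketch. Everything here is hypothetical and only illustrates the pattern: `Serializer`, `registry`, `registerSerializer`, `serializeByName` and the `Point` plugin type are invented names, and a real plugin would run its registration from its own module constructor after being loaded.

```d
/// A serializer takes an untyped pointer to an instance and returns its bytes.
alias Serializer = ubyte[] function(const(void)* instance);

/// Type name -> serializer. Module-level variables are thread-local in D.
Serializer[string] registry;

void registerSerializer(string typeName, Serializer fn)
{
    registry[typeName] = fn;
}

/// The host only knows a type name and a pointer, not the static type.
ubyte[] serializeByName(string typeName, const(void)* instance)
{
    auto fn = typeName in registry;
    assert(fn !is null, "no serializer registered for " ~ typeName);
    return (*fn)(instance);
}

// What a plugin might contribute:
struct Point { int x; int y; }

ubyte[] serializePoint(const(void)* instance)
{
    auto p = cast(const(Point)*) instance;
    ubyte[] result;
    result ~= (cast(const(ubyte)*) &p.x)[0 .. int.sizeof];
    result ~= (cast(const(ubyte)*) &p.y)[0 .. int.sizeof];
    return result;
}

// The plugin registers itself when its module is initialized.
static this()
{
    registerSerializer("Point", &serializePoint);
}

unittest
{
    auto p = Point(3, 4);
    auto bytes = serializeByName("Point", &p);
    assert(bytes.length == 2 * int.sizeof);
}
```

The registry is, in effect, a tiny hand-written piece of runtime type information, which is the point being made above.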
Aug 18
prev sibling parent max haughton <maxhaton gmail.com> writes:
On Tuesday, 17 August 2021 at 13:52:19 UTC, russhy wrote:
 [...]
It's for introspecting over code at compile time, not at runtime. Stefan and I have been mulling over this for ages, and I think we both think it can do more than reflection as we currently know it, at least. This lets you drink from the firehose, so to speak.
Aug 17