www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Beyond the veil: What's after type functions

reply Stefan Koch <uplink.coder googlemail.com> writes:
Good Evening,

I recently had a programming task in my hobby project where I 
needed to interface with a C library that uses manual closures 
using void**.
This quickly lead to bugs where one would forget to deref twice 
for read or accidental double deference on write (causing the 
written value to be invisible).

After that casting to the correct structure was an issue, since 
(void**) trows away typing information.

So I wanted a discriminated union (a "sumtype") to which I would 
use for all closures where the discriminant would tell me the 
rumtime type.

This is what I ended up with:

---
struct UploadClosure
{
     MHD_PostProcessor* pp;
     const (char)* filename;
     FILE* f;
     size_t totalSize;
}

struct KeyServerClosure
{
     bool processing_request;
}

enum ConnectionClosureKind
{
     Invalid,
     UploadClosure,
     KeyServerClosure,
}

ConnectionClosure* newClosure(ConnectionClosureKind kind)
{
     alias Kind = ConnectionClosureKind;
     const closureSize = closureSize(kind);

     ConnectionClosure* result = /*new ConnectionClosure(); //*/ 
cast(ConnectionClosure*)new ubyte[](closureSize).ptr;

     result.kind = kind;

     final switch (kind)
     {
         case Kind.UploadClosure :
         {
             // result.uploadClosure = new UploadClosure();
             result.uploadClosure = cast(UploadClosure*) 
((cast(void*)result) + (*result).sizeof);
         } break;

         case Kind.KeyServerClosure : {
             result.keyServerClosure = cast(KeyServerClosure*) 
((cast(void*)result) + (*result).sizeof);
         } break;

         case Kind.Invalid : {};
     }

     // import std.stdio; writeln(*result);

     return result;
}

struct ConnectionClosure
{
     ConnectionClosureKind kind;
     union
     {
         UploadClosure* uploadClosure;
         KeyServerClosure* keyServerClosure;
     }
}

size_t closureSize (ConnectionClosureKind kind) pure nothrow  nogc
{
     size_t result = ConnectionClosure.sizeof;

     final switch (kind)
     {
         case ConnectionClosureKind.UploadClosure : result += 
UploadClosure.sizeof;
         case ConnectionClosureKind.KeyServerClosure : result += 
KeyServerClosure.sizeof;
         case ConnectionClosureKind.Invalid : {}
     }

     return result;
}
---

You can easily Imagine the 'struct ConnectionClosure' and the 
`new Closure` function to be generate from the `enum 
ConnectionClosureKind` which would reduce the code snippet I 
posted by 54 lines (which is over 70% of the 75 line snippet).

Traditionally in D one would use templates and static foreach, 
which are known to scale awfully with regards to compile time, 
once you base an entire system on them.
Also they will most likely use string mixins in the 
implementation which means you can run into visibility/protection 
and namespace issues.

So I started to sketch what I would want to the compiler to do. 
(And I know it can do it, because these operations are performed 
internally)

And here was what this would look like:

---
alias type = __type__;

/// Compiler interal API looks up a typename in the given 
scope($D lookupScope)
/// Returns: the type named ($D typeName) if found
///          __emptyType otherwise.
type __lookupTypeFromScope (string typeName, __scope lookupScope);

/// compiler internal API returns the current scope
/// Returns: the current scope
__scope __currentScope();

bool hasInvalidMember(type Enum)
{
     static immutable members = __traits(allMembers, Enum);
     return members.length && members[0] == "Invalid";
}

/// given an enum ($ closureEnumType) where all members are 
type-name (execpt for the first member iff that member is called 
'Invalid' return an array of type values which corrospond the 
type names in the enum. By default the names are looked up in the 
current scope, but this can be customized setting optional the 
lookupScope parameter)
type[] getClosureTypes(type closureEnumType, __scope lookupScope 
= __currentScope())
{
     assert(is(closureEnumType == enum), __FUNCTION__ ~ " expects 
an enum to be passed in" ~
         " which has member named equivalent to the types stored 
in the closure union");

     // type function traits don't produce compiler tuples since 
that would mean a shape change.
     immutable string[] enumMembers =
         __traits(allMembers, 
closureEnumType)[hasInvalidMember(closureEnumType)
                  /*strip the first invalid member if there is 
one*/ .. $];

     __type__[] result;
     result.length = enumMembers.length;

     foreach(i,m;enumMembers)
     {
         __type__ t = __lookupTypeFromScope(m, lookupScope);
         // assert that t is not the __emptyType
         debug { assert(is(t), "Could not find a type named: " ~ 
m); }
         else { if (!is(t))  return null; }
         result[i] = t;
     }

     return result;
}

string genSizeEnumerationCases(type closureEnumType, string 
targetVaribleName)
{
     string result;
     auto closureTypes = getClosureTypes(closureEnumType, 
__currentScope());
     assert(closureTypes != null,
         "An error occurred when getting closure types, perhaps 
the types are not visible from current scope");

     immutable string case_prefix = "case " ~ __traits(identifier, 
closureEnumType) ~ ".";
     foreach(t;closureTypes)
     {
         result ~= case_prefix ~ __traits(identifier, t) ~ ": " ~ 
targetVaribleName ~ " = " ~ t.sizeof ~ "; break;\n";
     }

     if (hasInvalidMember(closureEnumType))
         result ~= case_prefix ~ "Invalid : " ~ targetVaribleName 
~ " = 0;\n";
}

size_t closureSize (ConnectionClosureKind kind) pure nothrow  nogc
{
     size_t result = ConnectionClosure.sizeof;

     size_t size;

     final switch (kind)
     {
         mixin(genSizeEnumerationCases(typeof(kind), "size"));
     }

     result += size;

     return result;
}
---

Look at the concise imperative code above.
It does not rely on any library functionality (except if you want 
to call the compiler internal api a 'library').

Now let's write a template which would do the same thing trying 
to reduce compile times as much as we can:

---
template hasInvalidMember(EnumT)
{
     enum hasInvalidMember = __traits(allMembers, EnumT).length > 
1 && __traits(allMembers, EnumT)[0] == "Invaild";
}

/// Note: Be careful about the instantiation context.
/// This can go very poorly of the type names in EnumT are 
locally shadowed, we can't pass a scope in here so we can't fix 
this!
template getClosureTypes(EnumT)
{
     alias getClosureTypes = mixin(() {
     char[] result = cast(char[])"AliasSeq!("; // let's hope this 
template is defined in the destination of the mixin otherwise we 
will error
     // note: we could fix this my nesting a couple more mixins 
but then the code get's even more bloated
     static assert(is(EnumT == enum), __FUNCTION__ ~ " expects an 
enum to be passed in" ~
         " which has member named equivalent to the types stored 
in the closure union");
     // this will be a statically unrolled tuple foreach bloating 
the size of this
     // since we need to do another mixin within here
     // this might be removed before codegen but you do pay the 
price of building the function before codegen even though it's 
only use once by ctfe.
     // and it will be one function per instance of 
getClosureTypes bloating the internal symbol table.
     foreach(i,m;__traits(allMembers, 
enumT)[hasInvalidMember!(enumT) .. $])
     {
         static assert(is(mixin(m)), m ~ " is either not a type, 
or it's not visible here");
         result ~= m ~ ",";
     }
     // replace the last useless ',' with a ')' to close the 
AliasSeq.
     result[$-1] = ')';
} ());
}

template genSizeEnumerationCases(closureEnumType, string 
targetVaribleName)
{
     enum genSizeEnumerationCases = (() {
     string result;
     alias closureTypes = getClosureTypes!(closureEnumType);
     // also note how we can't assert here that getClosureTypes 
not error ...
     // thereby lengthening the instantiation trace if something 
does go wrong
     immutable string case_prefix = "case " ~ __traits(identifier, 
closureEnumType) ~ ".";
     foreach(t;closureTypes)
     // again unrolled tuple foreach bloating this function ...
     // and again we actually have to build an internal function 
and then we have to interpret it. So we pay for the template 
instance and for CTFE...
     {
         result ~= case_prefix ~ __traits(identifier, t) ~ ": " ~ 
targetVaribleName ~ " = " ~ t.sizeof ~ "; break;\n";
     }

     if (hasInvalidMember!(closureEnumType))
         result ~= case_prefix ~ "Invalid : " ~ targetVaribleName 
~ " = 0;\n";

      return result;
}());
}

size_t closureSize (ConnectionClosureKind kind) pure nothrow  nogc
{
     size_t result = ConnectionClosure.sizeof;

     size_t size;

     final switch (kind)
     {
         mixin(genSizeEnumerationCases!(typeof(kind), "size"));
     }

     result += size;

     return result;
}
---

Without the comments the template would probably be somewhat 
shorter.
But again they are dependent on the external template AliasSeq to 
be visible in the template instantiation context.
CTFE can't cache them because they are new symbols every time.
They use unrolled tuple foreaches increasing the size of the body 
that ctfe and semantic has to deal with.
And they have to generate new symbols persistently ... even 
though those might get stripped the compiler has to allocate 
memory for them and spent time to generate them.
Thereby memory consumption and further reducing scalability.

whereas the type function with extended internal compiler API has 
none of these problems.
And does not rely on library functionality.

With this in mind, please tell me what you think :)

Cheers,
Stefan
Jan 04 2021
next sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Monday, 4 January 2021 at 19:30:31 UTC, Stefan Koch wrote:
 Good Evening, [....]
I just realized one of the key differences between type functions and templates. type functions are more declarative than templates which are more generative. templates build up stuff and to see what it does you have to instantiate it. templates generally don't allow higher level reasoning. type functions are by nature pure and state transformations or reductions. Which you can reason about in isolation.
Jan 04 2021
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Monday, 4 January 2021 at 20:37:46 UTC, Stefan Koch wrote:
 On Monday, 4 January 2021 at 19:30:31 UTC, Stefan Koch wrote:
 Good Evening, [....]
I just realized one of the key differences between type functions and templates. type functions are more declarative than templates which are more generative.
Hm... I would have thought that a template would be more declarative? Anyway, if you allow loops then it isn't declarative. Meaning, it will cause problems if you use it for deduction.
Jan 06 2021
parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Wednesday, 6 January 2021 at 14:12:57 UTC, Ola Fosheim Grøstad 
wrote:
 Hm... I would have thought that a template would be more 
 declarative?

 Anyway, if you allow loops then it isn't declarative. Meaning, 
 it will cause problems if you use it for deduction.
Unless you constrain the loops in such a way that you can easily turn them into some kind of recursion.
Jan 06 2021
prev sibling next sibling parent reply sighoya <sighoya gmail.com> writes:
On Monday, 4 January 2021 at 19:30:31 UTC, Stefan Koch wrote:
 With this in mind, please tell me what you think :)
I like the idea of type functions for type calculus only, because it's easier to get the work done and feels more natural as it is pure D-code, not a foreign language to talk with. Though I wouldn't choose them to introduce generics in D, I think templates are enough for that part.
templates build up stuff and to see what it does you have to 
instantiate it.
templates generally don't allow higher level reasoning.
type functions are by nature pure and state transformations or 
reductions.
Which you can reason about in isolation.
Both templates and type functions are functions, the former takes code and maps them to other code internally over AST Nodes, the latter takes first class values and maps them to first class values. Though, type functions are easier to handle than templates as templates aren't really first class functions in D.
Jan 06 2021
parent Stefan Koch <uplink.coder googlemail.com> writes:
On Wednesday, 6 January 2021 at 14:08:15 UTC, sighoya wrote:
 On Monday, 4 January 2021 at 19:30:31 UTC, Stefan Koch wrote:
 With this in mind, please tell me what you think :)
I like the idea of type functions for type calculus only, because it's easier to get the work done and feels more natural as it is pure D-code, not a foreign language to talk with. Though I wouldn't choose them to introduce generics in D, I think templates are enough for that part.
Indeed templates are enough for that. The issue comes when you want a typefunction to instantiate a template given a type value, since those values are runtime arguments, (even though they are evaluated by CTFE) they cannot be a compile time argument to a template. As that could change the shape of the function body (which would make caching impossible and require us to copy the function body on every call). I don't see that much of an issue to have 2 ways of doing the same thing. We have that all the time when a lower-level feature gets supplemented with a higher level one. You just chose the right way for your abstraction level and performance requirements. In the end there is not wanted wanting to convince people without data. I am going to run my experiments and we'll see what comes out :)
Jan 07 2021
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2021-01-04 20:30, Stefan Koch wrote:

 With this in mind, please tell me what you think :)
I think this looks like yet another new set of features just to avoid AST macros. What everyone is currently doing is inventing many highly specialized features just in the name of avoiding AST macros, regardless of the cost. Instead we could have a very generalized features that would solve a whole bunch of problems and even more problems we haven't thought of yet. When it comes to the actual API, again, it looks highly specialized for the purpose of the example. It also combines a lot of different language features to do its job, instead of providing a more unified API. Also, I don't like the idea of adding more magic symbols to the compiler, we already have too many of these. D has an excellent module system, but we're not using it. Everything is stuffed into object.d or directly as magic symbols into the compiler. BTW, the first example is 72 lines. The second is 75 and the last one is 65 (all lines between triple dashes). -- /Jacob Carlborg
Jan 06 2021
next sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Wednesday, 6 January 2021 at 16:30:25 UTC, Jacob Carlborg 
wrote:
 On 2021-01-04 20:30, Stefan Koch wrote:

 With this in mind, please tell me what you think :)
I think this looks like yet another new set of features just to avoid AST macros. What everyone is currently doing is inventing many highly specialized features just in the name of avoiding AST macros, regardless of the cost.
But the DMD AST model isn't really suitable for exposure. So you would need to design a completely new AST... not sure if this is feasible in this decade? :)
Jan 06 2021
next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2021-01-06 17:34, Ola Fosheim Grøstad wrote:

 But the DMD AST model isn't really suitable for exposure. 
Why not? It would be code like any other code that can be abstracted over and helpers can be created for it.
 So you would need to design a completely new AST... not sure if this is
feasible in 
 this decade? :)
I would not expose it exactly as it is. But I don't think a completely new design is required. If you look carefully at my design [1] you'll see that it doesn't map exactly (but more or less) to the internal AST in DMD. [1] https://github.com/jacob-carlborg/druntime/blob/517dafcf54ad73049fb35d1ed5fa2ad6619b9ac4/src/core/ast/expression.d -- /Jacob Carlborg
Jan 06 2021
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Wednesday, 6 January 2021 at 16:43:26 UTC, Jacob Carlborg 
wrote:
 I would not expose it exactly as it is. But I don't think a 
 completely new design is required. If you look carefully at my 
 design [1] you'll see that it doesn't map exactly (but more or 
 less) to the internal AST in DMD.
Ok, but it would have to map exactly since it would be modified after a succession of template evaluations? Or maybe not.
Jan 06 2021
parent reply Jacob Carlborg <doob me.com> writes:
On 2021-01-06 20:36, Ola Fosheim Grøstad wrote:

 Ok, but it would have to map exactly since it would be modified after a 
 succession of template evaluations? Or maybe not.
I'm not sure if I follow. But the compiler will convert between the internal AST and external AST on macro boundaries, i.e. when something is passed to a macro and when something is returned. -- /Jacob Carlborg
Jan 07 2021
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Thursday, 7 January 2021 at 10:39:02 UTC, Jacob Carlborg wrote:
 On 2021-01-06 20:36, Ola Fosheim Grøstad wrote:

 Ok, but it would have to map exactly since it would be 
 modified after a succession of template evaluations? Or maybe 
 not.
I'm not sure if I follow. But the compiler will convert between the internal AST and external AST on macro boundaries, i.e. when something is passed to a macro and when something is returned.
But you are able to traverse the whole external AST tree? So you maintain what is equivalent of a virtual AST? Kinda like React does with a virtual DOM? That's ok if only the virtual AST changes, then you can propagate changes to the real AST. UNLESS the compiler is caching information in other structures... At that point things get really hairy. But how to detect changes in the real AST, if the compiler modifies it? You need to add a tracking mechanism, dirty-marking or some such.
Jan 07 2021
parent Jacob Carlborg <doob me.com> writes:
On 2021-01-07 15:05, Ola Fosheim Grøstad wrote:

 But you are able to traverse the whole external AST tree?
Yes.
 So you maintain what is equivalent of a virtual AST? Kinda like React 
 does with a virtual DOM?
 
 That's ok if only the virtual AST changes, then you can propagate 
 changes to the real AST. UNLESS the compiler is caching information in 
 other structures... At that point things get really hairy.
 
 But how to detect changes in the real AST, if the compiler modifies it?
 
 You need to add a tracking mechanism, dirty-marking or some such.
I haven't thought of that. I have not been able to come up with any other way to expose the AST at least. Otherwise I think there would need to be some significant changes to how CTFE works in the compiler. -- /Jacob Carlborg
Jan 09 2021
prev sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Wednesday, 6 January 2021 at 16:34:14 UTC, Ola Fosheim Grøstad 
wrote:
 On Wednesday, 6 January 2021 at 16:30:25 UTC, Jacob Carlborg 
 wrote:
 On 2021-01-04 20:30, Stefan Koch wrote:

 With this in mind, please tell me what you think :)
I think this looks like yet another new set of features just to avoid AST macros. What everyone is currently doing is inventing many highly specialized features just in the name of avoiding AST macros, regardless of the cost.
But the DMD AST model isn't really suitable for exposure. So you would need to design a completely new AST... not sure if this is feasible in this decade? :)
You don't need a stable AST API for AST macros, you just need quasi-quoting and unquoting, à la Common Lisp. There's an old talk by Walter and Andrei that includes a sketch of what AST macros might look like in D [1], which contains the following example: macro foo(e) { e = 3; } In Common Lisp, the equivalent would be (defmacro foo (e) `(setf ,e 3)) You will notice that nowhere in either example is the AST exposed directly. [1] https://www.youtube.com/watch?v=FRfTk44nuWE&t=1h5m37s
Jan 06 2021
parent Jacob Carlborg <doob me.com> writes:
On 2021-01-06 17:58, Paul Backus wrote:

 You don't need a stable AST API for AST macros, you just need 
 quasi-quoting and unquoting, à la Common Lisp. There's an old talk by 
 Walter and Andrei that includes a sketch of what AST macros might look 
 like in D [1], which contains the following example:
 
      macro foo(e) { e = 3; }
 
 In Common Lisp, the equivalent would be
 
     (defmacro foo (e) `(setf ,e 3))
 
 You will notice that nowhere in either example is the AST exposed directly.
 
 [1] https://www.youtube.com/watch?v=FRfTk44nuWE&t=1h5m37s
Yes. Here's my suggestion [1] and bottom of [2]. [1] https://wiki.dlang.org/DIP50#Quasi-Quoting [2] https://forum.dlang.org/post/rt4df4$2n8g$1 digitalmars.com -- /Jacob Carlborg
Jan 06 2021
prev sibling parent Max Haughton <maxhaton gmail.com> writes:
On Wednesday, 6 January 2021 at 16:30:25 UTC, Jacob Carlborg 
wrote:
 On 2021-01-04 20:30, Stefan Koch wrote:

 With this in mind, please tell me what you think :)
I think this looks like yet another new set of features just to avoid AST macros. What everyone is currently doing is inventing many highly specialized features just in the name of avoiding AST macros, regardless of the cost. Instead we could have a very generalized features that would solve a whole bunch of problems and even more problems we haven't thought of yet. When it comes to the actual API, again, it looks highly specialized for the purpose of the example. It also combines a lot of different language features to do its job, instead of providing a more unified API. Also, I don't like the idea of adding more magic symbols to the compiler, we already have too many of these. D has an excellent module system, but we're not using it. Everything is stuffed into object.d or directly as magic symbols into the compiler. BTW, the first example is 72 lines. The second is 75 and the last one is 65 (all lines between triple dashes).
I think we should have AST macros but it should be a data structure like any other as opposed to how other languages do it. We have all this CTFE power so it should just be something like the work you began previously. Typefunctions would be a good demo run for that, as they would have to be part of the AST anyway. I think AST-anything would have to read only to start with. An API like __traits(getAST, ...) would be fine.
Jan 06 2021