www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - imports and a data structure (any critique welcome)

reply "Jonathan" <jdgall84 gmail.com> writes:
I come from Haskell, so please excuse any functional programming 
idiosyncracies I have :)

In Haskell, I have a datatype for representing so called terms 
which looks like:

     data Term = Var Char | Op Char [Term]

i.e. a Term is a variable (denoted by an Int) or an operation 
(whose name is also denoted by an int), applied to a list of its 
arguments.  For example,

     f(g(x,y),a(),x) is represented by Op 'f' [Op 'g' [Var 'x',Var 
'y'],Op 'a' [], Var 'x]

Now, the reason I am writing in D is twofold.  First, I want to 
learn the D language, and second I really need pointers to reduce 
certain complexities in the operations (Haskell does not have 
observable sharing).

Okay, so the first thing I need to do is represent this as a data 
structure in D.  My choices are to represent the structure as a 
class or as a struct.  I choose struct because it is simpler (in 
terms of what is going on), and because the entire program takes 
one struct and makes changes to it.

Here is my first attempt at the datastructure.

struct Term {
	enum NodeType
	{
		Var, Op
	}
	NodeType node;
	union TermBody
	{
		int VarName;
		struct OpSymbol
		{
			int ConsName;	
			Term*[] Subterms;
		}
		OpSymbol terms;
	}
	TermBody term;
	this(int nodeValue)
	{
		node = NodeType.Var;
		term.VarName = nodeValue;
	}
	this(int OpSymb,Term*[] otherTerms)
	{
		node = NodeType.Op;
		term.terms.ConsName = OpSymb;
		term.terms.Subterms = otherTerms;
	}
}

I strongly encourage you guys to try to destroy my datastructure 
and critique anything and everything about it!!

But for now I will move on.  As a sanity check, I want to make 
sure I can print this datastructure out.

string termToString(Term* term)
{
	if ((*term).node == Term.NodeType.Var)
	{
		return [convertNum((*term).term.VarName)];
	}
	else
	{
		string retString = 
[convertNum((*term).term.terms.ConsName),'('];
		retString ~= termToString((*term).term.terms.Subterms[0]);
		for(size_t i = 1; i < (*term).term.terms.Subterms.length;i++)
		{
			retString ~= ',';
			retString ~= termToString((*term).term.terms.Subterms[i]);
		}
		retString ~= ')';
		return retString;
	}
}

For now, convertNum is rather silly,

char convertNum(int x){
	switch(x)
	{
		case 1: return 'x';
		case 2: return 'y';
		case 3: return 'z';
		case 4: return 'f';
		case 5: return 'g';
		case 6: return 'h';
		default: return 'e';
	}
}


Okay, here is some code to produce a simple example of a 
datastructure:

Term makeExTerm(){
	Term var1 = Term(1);//x
	Term var2 = Term(2);//y
	Term var3 = Term(3);//z
	Term fxx = Term(4, [&var1,&var1]);//f(x,x)
	Term gfxxy = Term(5, [&fxx,&var2]);//g(f(x,x),y)
	Term hzg = Term(6, [&gfxxy,&gfxxy]);//h(g(f(x,x),y),g(f(x,x),y))
	return hzg;
}

Using the above, I wrote a test program,

void main(string[] args)
{
	Term exTerm = makeExTerm();
	string printable = termToString(&exTerm);
	writeln(printable);
	Term var = Term(219);
	(*exTerm.term.terms.Subterms[0]).term.terms.Subterms[0] = &var;
         // it is now  (and should be) h(g(e,y),g(e,y))
	string printable2 = termToString(&exTerm);
	writeln(printable2);
}

If I put this in a file, compile it (with ldc or gdc), and run 
it, everything works just superbly.

However, if I extract everything except main to a different file 
and import that file, compilation passes, but running gives a 
segmentation fault.  I ran the program in a debugger, and the 
first line of main passes no problem, we are able to construct 
the term.  I checked the memory, and everything seems to work 
just fine.  However, when I step into the second line of main, 
the call to termToString, I get the segfault.

Can anyone tell me what is going on?
Dec 26 2013
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 12/27/2013 01:40 AM, Jonathan wrote:
 Term makeExTerm(){
      Term var1 = Term(1);//x
      Term var2 = Term(2);//y
      Term var3 = Term(3);//z
      Term fxx = Term(4, [&var1,&var1]);//f(x,x)
      Term gfxxy = Term(5, [&fxx,&var2]);//g(f(x,x),y)
      Term hzg = Term(6, [&gfxxy,&gfxxy]);//h(g(f(x,x),y),g(f(x,x),y))
      return hzg;
 }

 Using the above, I wrote a test program,
...

 Can anyone tell me what is going on?
You are taking the address of stack-allocated variables. [1] Accessing the terms after the function has returned results in undefined behaviour. In the case where it worked you just were lucky. You may allocate your terms on the heap using 'new': --- Term* makeExTerm(){ auto var1 = new Term(1);//x auto var2 = new Term(2);//y auto var3 = new Term(3);//z auto fxx = new Term(4, [var1,var1]);//f(x,x) auto gfxxy = new Term(5, [fxx,var2]);//g(f(x,x),y) auto hzg = new Term(6, [gfxxy,gfxxy]);//h(g(f(x,x),y),g(f(x,x),y)) return hzg; } void main(string[] args){ auto exTerm = makeExTerm(); string printable = termToString(exTerm); writeln(printable); exTerm.term.terms.Subterms[0].term.terms.Subterms[0] = new Term(219); // it is now (and should be) h(g(e,y),g(e,y)) string printable2 = termToString(exTerm); writeln(printable2); } --- prints: --- h(g(f(x,x),y),g(f(x,x),y)) h(g(e,y),g(e,y)) --- [1] This would be disallowed in the memory safe subset. You might want to put ' safe:' on top of your files.
Dec 26 2013
prev sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Jonathan:

 I come from Haskell, so please excuse any functional 
 programming idiosyncracies I have :)
Welcome. You are the first Haskell programmer that I know willing to learn and try D :-)
 In Haskell, I have a datatype for representing so called terms 
 which looks like:

     data Term = Var Char | Op Char [Term]

 i.e. a Term is a variable (denoted by an Int) or an operation 
 (whose name is also denoted by an int), applied to a list of 
 its arguments.  For example,

     f(g(x,y),a(),x) is represented by Op 'f' [Op 'g' [Var 
 'x',Var 'y'],Op 'a' [], Var 'x]
In almost-theory D should allow code like: import std.stdio, std.typecons, std.variant; alias Term = Algebraic!(string, Tuple!(string, This[])); void main() { const expr = Term(tuple("f", [Term(tuple("g", ["x".Term, "y".Term])), Term(tuple("a", [])), "x".Term])); } In practice std.variant module needs a This defined as "struct This {}", plus another improvement to support recursive types better, so that code doesn't work. That code also can't work because the second type of the "tuple("a", [])" literal is void[], so it's not compatible. I am not sure a good enough library-defined Algebraic type can be defined.
 Okay, so the first thing I need to do is represent this as a 
 data structure in D.  My choices are to represent the structure 
 as a class or as a struct.  I choose struct because it is 
 simpler (in terms of what is going on), and because the entire 
 program takes one struct and makes changes to it.
Raw pointers are like razors, they can be useful, but they should must also handled with a good amount of discipline and safeties added by you and your experience, otherwise you cut yourself often. For a first version of your code you can try to use class instances instead, that are easer to handle (especially in garbage-collected language. But don't forget to add to functions all the pre-conditions to assert the class references are not null). I see tons of things that can be improved in your code. Like the switch, "(*term).node", the lack of immutable/const/pure/nothrow (you should be able to write good enough functional code in D), but better to think about the larger things first. Bye, bearophile
Dec 26 2013
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 12/27/2013 02:49 AM, bearophile wrote:
 In almost-theory D should allow code like:


 import std.stdio, std.typecons, std.variant;

 alias Term = Algebraic!(string, Tuple!(string, This[]));

 void main() {
      const expr = Term(tuple("f", [Term(tuple("g", ["x".Term,
 "y".Term])), Term(tuple("a", [])), "x".Term]));
 }


 In practice std.variant module needs a This defined as "struct This {}",
 plus another improvement to support recursive types better, so that code
 doesn't work.

 That code also can't work because the second type of the "tuple("a",
 [])" literal is void[], so it's not compatible.

 I am not sure a good enough library-defined Algebraic type can be defined.
mixin ADT!q{ Term: Var char | Op char Term[] }; void main(){ const expr = Op('f', [Op('g',[Var('x'), Var('y')]), Op('a',[]), Var('x')]); }
Dec 26 2013
next sibling parent reply "Jonathan" <jdgall84 gmail.com> writes:
Thanks to everyone that has replied.

 Timon Gehr:
Thank you.  In reply to [1]: that is an interesting issue that I 
don't really understand right now.  Can you explain exactly where 
I invoke undefined behavior?  It was my understanding that 
structs are passed and returned by value, and the assignment
   x = some struct
makes a copy of some struct and stores it in x.  Perhaps I don't 
understand the type system of D?  If a function returns T where T 
is some struct, does the function result in a struct?  I am 
having trouble finding these subtleties in the documentation (but 
I also acknowledge that this is probably my fault and not the 
documentation's).

Does auto do a simple type inference to find out that Term* 
should be there?  Is there an advantage to using auto?

I can't seem to put  safe: at the top of the file -- is it the 
same as compiling with -safe.  Is there a way to add this as a 
source annotation?

Is ! an operator in e.g. ADT!q ...

What is the mixin bit doing?  Can you point me towards any 
tutorials, or should I buy the book?

 bearophile:
Thank you for the welcome.  That is odd; I find that D shares a 
philosophy closer to Haskell than many of the "lower level" 
languages like C.

So let me just say, I really do need to build a structure that 
can have a shared reference to a substructure.  I will be 
rewriting these terms, and the problem is that e.g.

     f(t) -> g(t,t)

I don't want to copy t, but store a reference to t, so that if t 
can be re-written, it will only be rewritten once.  However, this 
is really *not* an algebraic type since sharing is not algebraic, 
so I figured that this would not be a good solution.

The object oriented approach would be really nice.  Since objects 
are passed by reference, I should get sharing for free.  The only 
reason I didn't use an object oriented approach off the bat is 
that I don't think I completely understand when to use a struct 
versus an object.  Using structs seem to fit the story better, 
because the structure isn't really emitting any functionality, I 
just want to perform some functions to it.  So using structs feel 
like a better fit.  Feel free to correct me here.  What are some 
of the drawbacks to using object.  How much more time overhead is 
there once the object is created?

You are right, I probably look like an idiomless noob!  I totally 
am!  Hopefully an understanding of the appropriate keywords to 
use and where will come (there is quite a lot of syntax in D).  
For now, can you tell me what I could do better about the switch 
statement and what is wrong with (*term).node?
Dec 26 2013
next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Jonathan:

 Thank you.  In reply to [1]: that is an interesting issue that 
 I don't really understand right now.  Can you explain exactly 
 where I invoke undefined behavior?
It can be shown with this function: int*[] foo() { int x; int*[] arr = [&x]; return arr; } Here the value 'x' is allocated on the stack, in the stack frame of the function foo(). arr is a heap-allocated array, so you can return it safely from foo. But arr contains a pointer to x, that points still to the stack frame of foo(). When foo() ends, the stack space used by the stack frame of foo gets reused for other purposes, so now the pointer inside the array arr points to data unrelated to the original x, essentially garbage.
 It was my understanding that structs are passed and returned by 
 value, and the assignment
   x = some struct
 makes a copy of some struct and stores it in x.
This is right. Bit the problem is different.
 Does auto do a simple type inference to find out that Term* 
 should be there?
Right.
 Is there an advantage to using auto?
It makes writing the code faster, the code less cluttered, and sometimes the type is "obvious" for the person that reads the code. Just don't abuse it.
 I can't seem to put  safe: at the top of the file -- is it the 
 same as compiling with -safe.  Is there a way to add this as a 
 source annotation?
There is a writeln in the main, so you have to use: void main(string[] args) system { But still the code doesn't compile: temp.d(30): Error: field TermBody.terms cannot be accessed in safe code because it overlaps with a pointer ... So safe can't be used here.
 Is ! an operator in e.g. ADT!q ...
! is the "not" operator, but there it's used to instantiate a template.
 What is the mixin bit doing?
It probably mixes-in (statically) some code generated at compile-time.
 Can you point me towards any tutorials, or should I buy the 
 book?
The book is useful if you want to get serious about learning D. Otherwise the online docs, Rosettacode and the book by a person that posts often in D.learn could suffice.
 That is odd; I find that D shares a philosophy closer to 
 Haskell than many of the "lower level" languages like C.
In D you can write functional-style code, look at Rosettacode D entries for many examples. (Also regarding your first post, D1 language used to have literary programming built-in, as GHC-Haskell, but this feature was dropped in D2).
 However, this is really *not* an algebraic type since sharing
 is not algebraic,
If you share something that is immutable, and you assure there are no null pointers, then the fact that is shared is transparent, so why is it not algebraic?
  The only reason I didn't use an object oriented approach off 
 the bat is that I don't think I completely understand when to 
 use a struct versus an object.  Using structs seem to fit the 
 story better, because the structure isn't really emitting any 
 functionality, I just want to perform some functions to it.  So 
 using structs feel like a better fit.  Feel free to correct me 
 here.  What are some of the drawbacks to using object.  How 
 much more time overhead is there once the object is created?
Class instances have some overhead, +2 words of memory, plus a bit of time to initialize those fields. Class instances allocated on the heap are a bit simpler to use compared to structs handled by pointers.
 You are right, I probably look like an idiomless noob!  I 
 totally am!
Don't worry, you are going to learn.
 (there is quite a lot of syntax in D).
D is a system language designed to be simpler to use, and it tries to be safer, and it tries to have many of the most useful tools. All this leads to a complex language. And the syntax reflects the varied semantics of all the things it contains. It's not just syntax sugar of the same bits of lambda calculus.
 For now, can you tell me what I could do better about the 
 switch statement
That switch can become an array. Or you can use enumerations with "final switch", and so on.
 and what is wrong with (*term).node?
D is a bit smarter than C, so "term.node" suffices. Bye, bearophile
Dec 27 2013
prev sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 12/27/2013 05:54 AM, Jonathan wrote:
 Thanks to everyone that has replied.

  Timon Gehr:
 Thank you.  In reply to [1]: that is an interesting issue that I don't
 really understand right now.  Can you explain exactly where I invoke
 undefined behavior?  It was my understanding that structs are passed and
 returned by value, and the assignment
    x = some struct
 makes a copy of some struct and stores it in x.  Perhaps I don't
 understand the type system of D?
The original code takes the address of variables that are allocated on the stack. Function-local variables (fortunately/unfortunately) have limited lifetime unless they are closed over. If their address is dereferenced after they have gone out of scope, you are accessing locations on the stack that may have been populated with new data of different type in the meantime. Eg. any usage of the '&' operator in the original code is wrong, because the addresses are part of the structure that the function returns.
 If a function returns T where T is
 some struct, does the function result in a struct?  I am having trouble
 finding these subtleties in the documentation (but I also acknowledge
 that this is probably my fault and not the documentation's).
 ...
The documentation can indeed be a little sparse. This may help you: http://ddili.org/ders/d.en/index.html (I have not read a lot of it, but it appears to put special focus on subtleties as it is targeted also at programming novices.)
 Does auto do a simple type inference to find out that Term* should be
 there?
No, auto is just a place holder used to show that what follows is a declaration. What is important to invoke type deduction is that there is no type specified. (It indeed is very simple, all it does is copying the type of the initializer.)
 Is there an advantage to using auto?
 ...
Less redundancy. Eg. I would have been able to remove the undefined behaviour faster if type deduction was already in use.
 I can't seem to put  safe: at the top of the file
Yes, I missed the union. (It is not generally a memory safe construct.)
 -- is it the same as compiling with -safe.
I think this compiler switch no longer exists.
 Is there a way to add this as a source annotation?
 ...
Yes, safe: at the top of the file is the source annotation.
 Is ! an operator in e.g. ADT!q ...
 ...
It is notation for template instantiation. http://dlang.org/template.html http://ddili.org/ders/d.en/templates.html http://ddili.org/ders/d.en/templates_more.html
 What is the mixin bit doing?
It imports the declarations within the template body into the current scope.
 Can you point me towards any tutorials, or  should I buy the book?
 ...
http://ddili.org/ders/d.en/mixin.html http://dlang.org/template-mixin.html I don't think the book discusses mixins in depth.
  bearophile:
 Thank you for the welcome.  That is odd;
Maybe he just does not know a lot of Haskell programmers. :o)
 I find that D shares a
 philosophy closer to Haskell than many of the "lower level" languages
 like C.
 ...
I guess some missing type system features can turn off Haskell programmers. Most importantly, the type system is monomorphic. (Templates cover some use cases of a polymorphic type system.)
 So let me just say, I really do need to build a structure that can have
 a shared reference to a substructure.  I will be rewriting these terms,
 and the problem is that e.g.

      f(t) -> g(t,t)

 I don't want to copy t, but store a reference to t, so that if t can be
 re-written, it will only be rewritten once.  However, this is really
 *not* an algebraic type since sharing is not algebraic, so I figured
 that this would not be a good solution.

 The object oriented approach would be really nice.  Since objects are
 passed by reference, I should get sharing for free.  The only reason I
 didn't use an object oriented approach off the bat is that I don't think
 I completely understand when to use a struct versus an object.
Objects are often used for convenient virtual function tables, for reference semantics, and sometimes to emulate closed sum types in a memory safe fashion. structs are typically used for more manual control and for implementing types with (part-wise) value semantics.
 Using
 structs seem to fit the story better, because the structure isn't really
 emitting any functionality, I just want to perform some functions to
 it.
Some might claim that the structure emits visitor functionality. :o)
 So using structs feel like a better fit.  Feel free to correct me
 here.  What are some of the drawbacks to using object.
They may waste some memory.
 How much more time overhead is there once the object is created?
 ...
Virtual function calls have some overhead compared to static function calls, but it is less bad on x86/x64 than on some other architectures. How performance-critical is your application?
 You are right, I probably look like an idiomless noob!  I totally am!
 Hopefully an understanding of the appropriate keywords to use and where
 will come (there is quite a lot of syntax in D). For now, can you tell
 me what I could do better about the switch statement
I don't know what he was targeting at, but convertNum could be written more compactly as: char convertNum(int x){ return "exyzfgh"[x<$?x:0]; }
 and what is wrong with (*term).node?
The explicit dereference is redundant. term.node suffices.
Dec 27 2013
prev sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Timon Gehr:

 mixin ADT!q{ Term: Var char | Op char Term[] };

 void main(){
     const expr = Op('f', [Op('g',[Var('x'), Var('y')]), 
 Op('a',[]), Var('x')]);
 }
Where is ADT defined? (And why isn't it a Phobos patch on GitHub?)
 convertNum could be written more compactly as:
 char convertNum(int x){
     return "exyzfgh"[x<$?x:0];
That's too much compact :-) Humans put a space around operators, and sometimes some parentheses don't hurt: char convertNum(in int x) pure nothrow in { assert(x >= 0); } body { return "exyzfgh"[(x < $) ? x : 0]; } Bye, bearophile
Dec 27 2013
next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Am Fri, 27 Dec 2013 13:45:10 +0000
schrieb "bearophile" <bearophileHUGS lycos.com>:

 Timon Gehr:
 
 mixin ADT!q{ Term: Var char | Op char Term[] };

 void main(){
     const expr = Op('f', [Op('g',[Var('x'), Var('y')]), 
 Op('a',[]), Var('x')]);
 }
Where is ADT defined? (And why isn't it a Phobos patch on GitHub?)
 convertNum could be written more compactly as:
 char convertNum(int x){
     return "exyzfgh"[x<$?x:0];
That's too much compact :-) Humans put a space around operators, and sometimes some parentheses don't hurt: char convertNum(in int x) pure nothrow in { assert(x >= 0); } body { return "exyzfgh"[(x < $) ? x : 0]; } Bye, bearophile
Before you go _that_ far, just write: char convertNum(in size_t x) pure nothrow { return "exyzfgh"[(x < $) ? x : 0]; } :-) -- Marco
Dec 27 2013
prev sibling next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 12/27/2013 02:45 PM, bearophile wrote:
 Timon Gehr:

 mixin ADT!q{ Term: Var char | Op char Term[] };

 void main(){
     const expr = Op('f', [Op('g',[Var('x'), Var('y')]), Op('a',[]),
 Var('x')]);
 }
Where is ADT defined?
https://d.puremagic.com/issues/show_bug.cgi?id=10431
 (And why isn't it a Phobos patch on GitHub?)
 ...
https://d.puremagic.com/issues/show_bug.cgi?id=11558
 That's too much compact :-)  Humans put a space around
 operators,
Feel free to reformat, but implying that you are somehow ethically superior and that I am not human seems to go a little far. :o)
 and sometimes some parentheses don't hurt:
They hurt there because they add noise and don't help.
Dec 27 2013
parent "bearophile" <bearophileHUGS lycos.com> writes:
Timon Gehr:

 https://d.puremagic.com/issues/show_bug.cgi?id=11558
We are getting there :-)
 but implying that you are somehow ethically superior and that I 
 am not human seems to go a little far. :o)
Sorry. In my opinion that kid of code is not very readable, spaces help to eye to chunk the code better.
 They hurt there because they add noise and don't help.
The parentheses in "(x < $) ? x : 0" are not necessary, but they help chunk the code, the first chunk is the condition. This is especially useful when that condition gets a little longer. Bye, bearophile
Dec 27 2013
prev sibling parent reply "Jonathan" <jdgall84 gmail.com> writes:
Let me just check my understanding:  If a function says it 
returns a thing of type T, it really does return something whose 
outermost shape is T; however, if it contains pointers to other 
things, and these were stack allocated, the pointers might be 
readdressed.

 Bearophile: in your example, why is the array heap allocated?  
For arrays do you not need to use new?

 From the documentation:
"BUGS:
Currently, Algebraic does not allow recursive data types."
... So maybe in the future, I can refactor to that.

It makes sense that union is not type safe.  If I have a struct 
like this

struct F {
   enum Case {case1, case2}
   Case case;
   int x;
   string y;
   this(int x_in)
   {
     x = x_in;
     case = case1;
   }
   this(string y_in)
   {
   y = y_in;
   case = case2;
   }
}

That seems like a bad practice leaving one of the fields 
uninstantiated.  Is this is a sign that I should be using an 
object oriented approach, or is there a way to clean this up.

I have to admit, I don't understand the mixin/template stuff 
right now.  However the mixin ADT thing seems pretty sexy, so it 
might be a good idea to learn enough to understand what is going 
on there.  The problem I have with this is if it ends up 
describing a struct in the background, will I have to keep a 
bunch of conventions straight in my head, or are there lots of 
utilities for working with this kind of thing (i.e. can I do a 
case operation, and recurse on subterms)?  Are templates 
considered a good practice in D?

Also, would

mixin ADT!q{ Term: Var char | Op char Term[] | Ptr Term*};

be considered valid.  If so, then it would allow me to create a 
term t get its pointer, p, and then have
   Op 'g' (Ptr p, Ptr p)
so that in rewriting g(t,t), I only need to rewrite t once.

Suppose a seasoned D programmer were thinking about this problem: 
would (s)he opt for an object oriented approach or the use of 
structs.  The main point of this data structure is to implement 
term rewriting.  There will probably be a lot of object creation 
-- especially in building and applying substitution lists.  I 
don't see any real benefit of one of the other for this 
application.

I tend not to worry too much about being performance critical 
i.e. cutting corners to shave off constants at the expense of 
losing safety ... I tend to prefer a simpler approach as long as 
I can guarantee that the big-O is the same -- however, I try to 
avoid even logarithmic "blowups" in comparable approaches ...
Dec 27 2013
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 12/27/2013 05:08 PM, Jonathan wrote:
 Let me just check my understanding:  If a function says it returns a
 thing of type T, it really does return something whose outermost shape
 is T; however, if it contains pointers to other things, and these were
 stack allocated, the pointers might be readdressed.
 ...
Well, the it is returned as-is, but the value of something that contains indirections is in general only meaningful in the context of a corresponding heap, stack and static data segment. Here the stack is changed in a way that renders the pointers in the returned value meaningless, so type safety does not hold. (Outside of safe code, type safety is conditional on the programmer making sure that code does not violate certain invariants.)
  Bearophile: in your example, why is the array heap allocated?
This is just how the language feature works. The array literal allocates an array on the heap and gives you a slice to it. (If you assign such a literal to a static variable during compilation, it will actually be stored in the static data segment.)
 For arrays do you not need to use new?
 ...
You can also allocate a new array using new: auto x = new int[](4); // allocate array of 4 elements
 ...

 It makes sense that union is not type safe.  If I have a struct like this
 ...
 }

 That seems like a bad practice leaving one of the fields
 uninstantiated.  Is this is a sign that I should be using an object
 oriented approach, or is there a way to clean this up.
 ...
The only way to clean it up is to hide it behind a type safe interface. (Besides concise syntax, that's all the ADT mixin template is doing essentially.)
 I have to admit, I don't understand the mixin/template stuff right now.
 However the mixin ADT thing seems pretty sexy, so it might be a good
 idea to learn enough to understand what is going on there.
It parses the given specification (q{this is a string}), generates D code as a string using the built-in D interpreter that the compiler comes with. ("Using CTFE".) Then the generated D code is "mixed in" and compiled.
 The problem
 I have with this is if it ends up describing a struct in the background,
 will I have to keep a bunch of conventions straight in my head, or are
 there lots of utilities for working with this kind of thing (i.e. can I
 do a case operation, and recurse on subterms)?
Well, it is implemented in D code, so that's up to you. The proof-of-concept implementation supports pattern matching.
 Are templates considered a good practice in D?
 ...
Yes, but mixins less so. (They are considered good if there is no other way to get what you want.)
 Also, would

 mixin ADT!q{ Term: Var char | Op char Term[] | Ptr Term*};

 be considered valid.  If so, then it would allow me to create a term t
 get its pointer, p, and then have
    Op 'g' (Ptr p, Ptr p)
 so that in rewriting g(t,t), I only need to rewrite t once.
 ...
A full-blown ADT implementation probably would not tolerate mutable aliasing.
 Suppose a seasoned D programmer were thinking about this problem: would
 (s)he opt for an object oriented approach  or the use of structs.  The
 main point of this data structure is to implement term rewriting.  There
 will probably be a lot of object creation -- especially in building and
 applying substitution lists.  I don't see any real benefit of one of the
 other for this application.

 I tend not to worry too much about being performance critical i.e.
 cutting corners to shave off constants at the expense of losing safety
 ... I tend to prefer a simpler approach as long as I can guarantee that
 the big-O is the same -- however, I try to avoid even logarithmic
 "blowups" in comparable approaches ...
You should probably just go with classes then. (It's what I've done for a recent toy implementation of the calculus of constructions.)
Dec 27 2013
prev sibling parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Friday, 27 December 2013 at 16:08:09 UTC, Jonathan wrote:
 Let me just check my understanding:  If a function says it 
 returns a thing of type T, it really does return something 
 whose outermost shape is T; however, if it contains pointers to 
 other things, and these were stack allocated, the pointers 
 might be readdressed.
The pointers remain exactly how they were, but stack allocated memory is released when the function that allocated it returns (that's just how the stack works, it's temporary scratch space for functions) so the pointers are left pointing to garbage.
Dec 27 2013