digitalmars.D - Wider tuple design discussion

bearophile <bearophileHUGS lycos.com> writes:

Walter has asked for a more global vision on tuples before implementing their
unpacking, so I need to show the discussion here again:
http://d.puremagic.com/issues/show_bug.cgi?id=6365#c26

Walter:
 My main reservation about this is not that I can see anything obviously wrong
 about it, but I am nervous we are building a special case for tuples. I think
 any expansion of tuple support should be part of a more comprehensive design
 for tuples.
 
 If we grow tuple support by adding piecemeal special cases for this and that,
 without thinking about them more globally, we are liable to build ourselves
 into a box we can't get out of.
 
 I can give examples in other languages that did this - the most infamous being
 the great C idea of conflating arrays and pointers. It seems to be a great idea
 in the small, but only much larger experience showed what a disastrous mistake
 it was.
 
 We should step back and figure out what we want to do with tuples in the much
 more general case.


More stuff I've written:
http://d.puremagic.com/issues/show_bug.cgi?id=6367
http://d.puremagic.com/issues/show_bug.cgi?id=6383


A global design for tuples in D is exactly what I have tried to avoid, because I
thought that Andrei was against a full built-in tuple design and that
generally there was not much desire for large changes in D2 now. With that
syntax sugar suggestion I tried to find the simplest and most essential D
change that would let me use tuples in a handier way. Now I'm back at the
start. And I still fear that a better and more comprehensive tuple design will
be refused in D. So I fear there is no way out: if you design a minimal change,
Walter fears corner cases and myopic design, and if you design a
comprehensive tuple, it risks being too big a change for this stage of D's
evolution. In both cases the end result is not having good enough tuples in
D.

In the end Walter is right. It is better to have no built-in support for tuples
than to have some wrongly designed built-in syntax sugar for tuples. So let's think
about the only viable alternative, a wider design, and hope.

When you design something you need first of all to think about what your tool
will be used for.

Tuples are used all the time in functional-style languages, usually once every
line of code or so. In Python and Haskell I use them all the time. In D I
already use tuples, but they are not so handy.

Tuples are not essential in D. You are able to write large programs without
tuples. But they are sometimes handy, especially if you program in a more
functional style.

Tuples do have some disadvantages. Their fields are often anonymous, which goes
against the typical well-defined style of D/Java/Ada programs, where you know
the name of what you are using. So tuples work better if you use them locally,
or if they have only a few items. Long tuples become hard to use because you
risk forgetting what each field is, so the most common tuples have between 2
and 5 items. In Python you are also allowed to define a tuple with zero items,
and a tuple with one item, but in my experience they are not so useful. (In
Python you sometimes use longer tuples made of items of the same type, because
you are using them essentially as immutable arrays. This is considered not
pythonic.) D tuples allow names for the fields, which is useful at the point of
use, because instead of "[2]" you write (for example) ".isOpen", which is more
descriptive.
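A minimal sketch of the named-field point, assuming the Phobos std.typecons.Tuple template; the alias Door and the field names id and isOpen are made up for illustration:

```d
import std.typecons : Tuple;

void main() {
    // Field names are given as string template arguments after each type.
    // "Door", "id" and "isOpen" are hypothetical names for this sketch.
    alias Door = Tuple!(int, "id", bool, "isOpen");
    auto d = Door(7, true);

    assert(d[1] == true);      // positional access
    assert(d.isOpen == true);  // the same field, accessed by name

    d.isOpen = false;          // named access is writable too
    assert(d[1] == false);
}
```

Both spellings refer to the same field, so the name costs nothing at run time and only helps the reader.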

A tuple isn't an object, it's more like a record, it's almost like a more handy
POD. So tuples are data structures that you use with code (functions) that they
don't contain. OOP design is against this. But I am not fond of pure OO design.
There are many situations where you want something more handy, more functional
style, or more in "abstract data structure"-style. Python, D, Scala and other
languages recognize this.

In some languages (like Python) tuples are immutable, while in D they are the
opposite: you currently can't give them immutable fields. In D tuples are
values, while in Python they are used by a kind of reference (by name). In
functional languages you can't tell how they are managed, because there is
referential transparency (so even if there is a reference, it's invisible, and
the compiler is free to implement them as it wants and optimize as much as it
can).
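To make the value semantics concrete, here is a small sketch assuming the Phobos std.typecons.tuple helper. Copying a tuple copies its fields, so the copy can be modified without touching the original:

```d
import std.typecons : tuple;

void main() {
    auto a = tuple(1, "foo");
    auto b = a;      // copies every field; b does not alias a

    b[0] = 42;       // fields are mutable

    assert(a[0] == 1);   // the original is unchanged
    assert(b[0] == 42);
}
```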

In some languages tuples are managed with a nominal type system, while in most
other situations they are managed by a structural type system (that means that
two tuples are seen as having different types only if the types of their fields
differ). (Some time ago I even suggested a @structural attribute to be used in
the Phobos code that defines D tuples.)
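A sketch of how this looks with the Phobos Tuple, assuming std.typecons. Unnamed tuples with the same field types are the same type, while field names become part of the type:

```d
import std.typecons : Tuple, tuple;

void main() {
    auto a = tuple(1, "foo");

    // Same field types, no names: the very same type.
    static assert(is(typeof(a) == Tuple!(int, string)));
    Tuple!(int, string) b = a;
    assert(b[1] == "foo");

    // Field names take part in the type, so these are all distinct.
    // "x" and "y" are arbitrary names chosen for this sketch.
    static assert(!is(Tuple!(int, "x") == Tuple!(int, "y")));
    static assert(!is(Tuple!(int, "x") == Tuple!(int)));
}
```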

There are many operations you want to perform on tuples, the most basic ones
are:
0) define the type of a tuple
1) create a tuple
2) copy a tuple
3) Read a tuple field
4) Write a tuple field
5) Unpack tuple fields into variables, this is useful in 3 or more different
situations:
5.1) The most basic one is unpacking a tuple locally, inside a function.
5.2) Another common situation is unpacking a small tuple inside a foreach
statement, when you iterate on a collection of tuples.
5.3) Another usage (quite common in Haskell and similar languages) is to unpack
a tuple in the signature of a function. Haskell even lets you name both the
fields and the whole tuple at the same time.

Other operations are:
6) Slice a tuple like a Python/D array: tup[1..3]. (This is already supported
in D, but you can't use the normal slice syntax, you need to use tup.slice!(1,
3) ).
7) Concat two or more tuples (using ~ and ~= operators).
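
For reference, a sketch of how points 6 and 7 can be spelled with the library Tuple alone, assuming the slice and expand members of std.typecons.Tuple. The field types are all 4 bytes wide on purpose, to stay clear of any alignment restriction on slicing:

```d
import std.typecons : Tuple, tuple;

void main() {
    auto t = tuple(1, 2.5f, 3u, 4);

    // Point 6: slicing with the library spelling, fields 1 and 2.
    auto mid = t.slice!(1, 3);
    static assert(is(typeof(mid) == Tuple!(float, uint)));
    assert(mid[0] == 2.5f && mid[1] == 3u);

    // Point 7: no ~ operator, but expanding both operands into
    // tuple() builds the concatenation.
    auto joined = tuple(t.expand, mid.expand);
    static assert(is(typeof(joined) ==
                     Tuple!(int, float, uint, int, float, uint)));
    assert(joined[0] == 1 && joined[5] == 3u);
}
```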

There are many other operations (remove a field, rename a field, etc), but they
are not commonly useful. If you know a common need that I have not listed
please show it.

It's better to show some examples of those 5.1, 5.2 and 5.3.

5.1 is useful for functions that return more than one value too:
auto (x, y) = tuple(1, 2);
(int x, int y) = foo(100);
(int x, auto y, auto z) = bar("hello");
The proposal in 6365 was to implement this very common case, that currently is
not good enough in D.
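
For comparison, this is roughly what case 5.1 takes today with the library Tuple. It is only a sketch; divMod is a hypothetical function invented for the example:

```d
import std.typecons : Tuple, tuple;

// Hypothetical function that returns two values packed in a Tuple.
Tuple!(int, int) divMod(int a, int b) {
    return tuple(a / b, a % b);
}

void main() {
    // No unpacking syntax: keep the tuple, then copy each field by index.
    auto r = divMod(17, 5);
    int quotient = r[0];
    int remainder = r[1];

    assert(quotient == 3);
    assert(remainder == 2);
}
```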

5.2 (a different syntax is possible, this is just an example):
auto array = [tuple(1,"foo"), tuple(2,"bar")];
foreach (tuple(id, name); array) {}
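
Without such syntax, the loop has to name the whole tuple and pull the fields out by hand. A minimal sketch, assuming std.typecons.tuple:

```d
import std.typecons : tuple;

void main() {
    auto array = [tuple(1, "foo"), tuple(2, "bar")];

    int idSum;
    string names;
    foreach (t; array) {
        // Manual unpacking of each element.
        auto id = t[0];
        auto name = t[1];
        idSum += id;
        names ~= name;
    }

    assert(idSum == 3);
    assert(names == "foobar");
}
```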

5.3) This is already possible in D using the typetuple coming from a tuple
using the (undocumented?) .tupleof. But this has the problem of losing some
safety, this compiles with no errors:

import std.typecons;
void bar(int i, string s) {}
void main() {
    auto t1 = tuple(1, "foo");
    bar(t1.tupleof);
    auto t2 = tuple(1);
    auto t3 = tuple("foo");
    bar(t2.tupleof, t3.tupleof);
}

This is not allowed in Python2 and Haskell, because tuples are true solid
tuples at the unpacking point too, in the function signature.

To implement feature 5.3 well you need something that enforces the tuple
length at the unpacking point too, something more like this:

void bar(Tuple!(int i, string s)) {
    // here use variables s and i
}


In Python there are more features similar to point (5). You are allowed to
unpack a dynamic array into variables too, and even a lazy generator. This is
so handy and useful that I have suggested allowing the same thing in D too:
http://d.puremagic.com/issues/show_bug.cgi?id=6383


Some usage examples, assign from a dynamic array, in D2:

import std.string;
void main() {
    auto s = " foo bar ";
    auto ss = s.split();
    auto sa = ss[0];
    auto sb = ss[1];
}


In Python2.6:

>>> sa, sb = " foo bar ".split()
>>> sa
'foo'
>>> sb
'bar'


Proposed:

import std.string;
void main() {
    auto s = " foo bar ";
    auto (sa, sb) = s.split();
}




From lazy range, D2:


import std.algorithm, std.conv, std.string;
void main() {
    string s = " 15 27";

    // splitter doesn't work yet for this
    // (here it doesn't strip the leading space)
    //auto xy = map!(to!int)(splitter(s));

    auto xy = map!(to!int)(split(s));
    int x = xy[0];
    int y = xy[1];
}



In Python:

>>> x, y = map(int, " 15 27".split())
>>> x
15
>>> y
27
>>> from itertools import imap
>>> x, y = imap(int, " 15 27".split())
>>> x
15
>>> y
27


Proposed:

import std.algorithm, std.conv, std.string;
void main() {
    string s = " 15 27";
    (int x, int y) = map!(to!int)(splitter(s));
}


Another example of unpacking a lazy range is to unpack the result of match(x,
r"...").captures.


In my opinion unpacking a dynamic array into variables is natural in D because D
already contains something that logically is very similar or almost equal:
int[] a1 = [1, 2];
int[2] a2 = a1;

This performs a run-time test on the length of a1.
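
The same check can be wrapped in a small helper to get close to the proposed unpacking with today's language. This is only a sketch: toFixed is a hypothetical name, not a Phobos function, and the length check is written as an explicit assert:

```d
import std.array : split;

// Hypothetical helper: copies a slice into a fixed-size array after
// checking at run time that the number of items is exactly N.
T[N] toFixed(size_t N, T)(T[] items) {
    assert(items.length == N, "wrong number of items to unpack");
    T[N] result = items[0 .. N];
    return result;
}

void main() {
    auto parts = toFixed!2(" foo bar ".split());
    auto sa = parts[0];
    auto sb = parts[1];

    assert(sa == "foo");
    assert(sb == "bar");
}
```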

In Python3 there are even ways to unpack only part of a tuple:
a, b, *rest = (1, 2, 3, 4)

More thinking is needed. Please help me answer what Walter was asking for.
And please Walter, if you have more specific questions, this is a good moment to ask.

Bye,
bearophile

Aug 02 2011

Michel Fortin <michel.fortin michelf.com> writes:

On 2011-08-02 13:15:52 +0000, bearophile <bearophileHUGS lycos.com> said:

 A tuple isn't an object, it's more like a record, it's almost like a 
 more handy POD. So tuples are data structures that you use with code 
 (functions) that they don't contain.

Actually, that's up for debate in my opinion. Andrei's Tuple in phobos 
is a struct, so it's a POD like you say. Walter's tuples in the 
language are just collections of variables.

A language tuple does not have an address in itself, it essentially has 
one address per field. This actually makes tuple packing/unpacking very 
efficient across function calls (because there's no real 
packing/unpacking taking place), but it makes it impossible to take the 
address of a tuple.

We need a way to make those two concepts work together, I think that's 
the hard part.


 Tuples do have some disadvantages. Their fields often are anonymous, 
 this goes against the typical well defined style of D/Java/Ada 
 programs, where you know the name of what you are using. So tuples are 
 better if you use them locally, or if they have only few items. Long 
 tuples become hard to use because you risk forgetting what each field 
 is, so the most common tuples have between 2 and 5 items.

Named tuple elements in a language tuple will be implemented when/if 
someone implements named arguments, something I might do eventually.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Aug 02 2011

bearophile <bearophileHUGS lycos.com> writes:

Michel Fortin:

 We need a way to make those two concepts work together, I think that's 
 the hard part.

I think they are incompatible.

Bye,
bearophile

Aug 02 2011

Alex Rønne Petersen <xtzgzorex gmail.com> writes:

On 02-08-2011 16:28, bearophile wrote:
 Michel Fortin:

 We need a way to make those two concepts work together, I think that's
 the hard part.

 I think they are incompatible.

 Bye,
 bearophile

I'm not sure it would make much sense to actually allow low-level 
operations on tuples. They're meant to be a language abstraction, and 
freely taking the address of them seems useless in practice and really 
just smells like array pointers (or similar) to me.

- Alex

Aug 02 2011

Don <nospam nospam.com> writes:

Michel Fortin wrote:
 On 2011-08-02 13:15:52 +0000, bearophile <bearophileHUGS lycos.com> said:
 
 A tuple isn't an object, it's more like a record, it's almost like a 
 more handy POD. So tuples are data structures that you use with code 
 (functions) that they don't contain.

 
 Actually, that's up to debate in my opinion. Andrei's Tuple in phobos is 
 a struct, so it's a POD like you say. Walter's tuples in the language 
 are just collections of variables.

In fact they're collections of symbols, which can be types, variables, 
or values...


 
 A language tuple does not have an address in itself, it essentially has 
 one address per field. This actually makes tuple packing/unpacking very 
 efficient across function calls (because there's no real 
 packing/unpacking taking place), but it makes it impossible to take the 
 address of a tuple.
 
 We need a way to make those two concepts work together, I think that's 
 the hard part.
 
 
 Tuples do have some disadvantages. Their fields often are anonymous, 
 this goes against the typical well defined style of D/Java/Ada 
 programs, where you know the name of what you are using. So tuples are 
 better if you use them locally, or if they have only few items. Long 
 tuples become hard to use because you risk forgetting what each field 
 is, so the most common tuples have between 2 and 5 items.

 
 Named tuple elements in a language tuple will be implemented when/if 
 someone implement named arguments, something I might do eventually.

Aug 02 2011

Pelle <pelle.mansson gmail.com> writes:

On Tue, 02 Aug 2011 15:15:52 +0200, bearophile <bearophileHUGS lycos.com>  
wrote:

 5.1) The most basic one is unpacking a tuple locally, inside a function.
 5.2) Another common situation is unpacking a small tuple inside a  
 foreach statement, when you iterate on a collection of tuples.

I would argue these two are the most important ones.

 5.3) Another usage (quite common in Haskell and similar languages) is to  
 unpack a tuple in the signature of a function. Haskell even allows to  
 give names at the same time to both fields and to the whole tuple.

This is less important, I think.

 Other operations are:
 6) Slice a tuple like a Python/D array: tup[1..3]. (This is already  
 supported in D, but you can't use the normal slice syntax, you need to  
 use tup.slice!(1, 3) ).
 7) Concat two or more tuples (using ~ and ~= operators).

Note that ~= cannot work, as the result of the concatenation has a  
different type than the left hand side. (Unless, of course, the right hand  
side is the empty tuple :-)
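
A quick sketch of the type argument, assuming std.typecons and building the concatenation by expanding both operands into tuple():

```d
import std.typecons : Tuple, tuple;

void main() {
    auto a = tuple(1, "foo");
    auto b = tuple(2.5);

    // The concatenation has one more field than a ...
    auto c = tuple(a.expand, b.expand);
    static assert(is(typeof(c) == Tuple!(int, string, double)));

    // ... so its type differs from a's, and an in-place "a ~= b"
    // would have nowhere to store the result.
    static assert(!is(typeof(c) == typeof(a)));

    assert(c[2] == 2.5);
}
```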

 5.3) This is already possible in D using the typetuple coming from a  
 tuple using the (undocumented?) .tupleof. But this has the problem of  
 losing some safety, this compiles with no errors:

 import std.typecons;
 void bar(int i, string s) {}
 void main() {
     auto t1 = tuple(1, "foo");
     bar(t1.tupleof);
     auto t2 = tuple(1);
     auto t3 = tuple("foo");
     bar(t2.tupleof, t3.tupleof);
 }

This is because of the conflation of TypeTuple (the compiler thing) and  
Tuple (the sane type thing). That code works because .tupleof is a  
TypeTuple.

We need this functionality because of things like  
std.traits.ParameterTypeTuple (or indeed Tuple itself uses a TypeTuple  
internally).

The built-in compiler tuple badly needs a new name, to dispel the  
confusion with sane tuples. AliasSequence or something :-)

 In Python there are more features similar to the point (5). You are  
 allowed to unpack a dynamic array too into variables, and even a lazy  
 generator. This is so handy and useful that I have suggested to allow  
 the same thing in D too:
 http://d.puremagic.com/issues/show_bug.cgi?id=6383

I think that as well as tuples, any range should be unpackable, and I  
think it should be implemented via syntactic rewrite.

Translate this:

auto (a, rest...) = myRange;

into

static assert (isInputRange!(typeof(myRange)));  // the trait takes a type
assert (!myRange.empty);
auto a = myRange.front;
myRange.popFront();
auto rest = myRange;

Equivalently with bidirectional ranges, and any number of unpacked  
elements.
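
Written out by hand for two unpacked elements, the rewrite could look like the sketch below. It assumes std.range and uses iota only as a convenient lazy range for the example:

```d
import std.algorithm : equal;
import std.range;

void main() {
    // What "auto (a, b, rest...) = myRange;" might lower to.
    auto myRange = iota(1, 5);   // lazily yields 1, 2, 3, 4
    static assert(isInputRange!(typeof(myRange)));

    assert(!myRange.empty);
    auto a = myRange.front;
    myRange.popFront();

    assert(!myRange.empty);
    auto b = myRange.front;
    myRange.popFront();

    auto rest = myRange;         // whatever is left over

    assert(a == 1 && b == 2);
    assert(equal(rest, [3, 4]));
}
```

Each unpacked name costs one emptiness check, one front and one popFront, so the rewrite stays lazy and never needs the range's length.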


I wonder how the auto(...) syntax works with existing variables. For  
example, in python:

Aug 02 2011
