
digitalmars.D - Wider tuple design discussion

bearophile <bearophileHUGS lycos.com> writes:
Walter has asked for a more global vision of tuples before implementing their
unpacking, so I need to bring the discussion here again:
http://d.puremagic.com/issues/show_bug.cgi?id=6365#c26

Walter:
 My main reservation about this is not that I can see anything obviously wrong
 about it, but I am nervous we are building a special case for tuples. I think
 any expansion of tuple support should be part of a more comprehensive design
 for tuples.
 
 If we grow tuple support by adding piecemeal special cases for this and that,
 without thinking about them more globally, we are liable to build ourselves
 into a box we can't get out of.
 
 I can give examples in other languages that did this - the most infamous being
 the great C idea of conflating arrays and pointers. It seems to be a great idea
 in the small, but only much larger experience showed what a disastrous mistake
 it was.
 
 We should step back and figure out what we want to do with tuples in the much
 more general case.

More stuff I've written:
http://d.puremagic.com/issues/show_bug.cgi?id=6367
http://d.puremagic.com/issues/show_bug.cgi?id=6383

A global design for tuples in D was exactly what I tried to avoid, because I thought that Andrei was against a full built-in tuple design, and that in general there was not much desire for large changes in D2 now. With that syntax sugar suggestion I tried to find the simplest and most essential D change that lets me use tuples in a handier way. Now I'm back to the start.

And I still fear that a better and more comprehensive tuple design will be refused in D. So I fear there is no way out: if you design a minimal change, Walter fears corner cases and myopic design, and if you design a comprehensive tuple, it risks being too big a change for this stage of D's evolution. In both cases the end result is not having good enough tuples in D.

In the end Walter is right: it's better to have no built-in support for tuples than to have some wrongly designed built-in syntax sugar for them. So let's think about the only viable alternative, a wider design, and hope.

When you design something, first of all you need to think about what your tool will be used for. Tuples are used all the time in functional-style languages, often on almost every line of code. In Python and Haskell I use them all the time. In D I already use tuples, but they are not so handy.

Tuples are not essential in D; you can write large programs without them. But they are sometimes handy, especially if you use a more functional programming style.

Tuples do have some disadvantages. Their fields are often anonymous, which goes against the typical well-defined style of D/Java/Ada programs, where you know the name of what you are using. So tuples are better if you use them locally, or if they have only a few items. Long tuples become hard to use because you risk forgetting what each field is, so the most common tuples have between 2 and 5 items.
In Python you are also allowed to define a tuple with zero items and a tuple with one item, but in my experience they are not so useful. (In Python you sometimes use longer tuples made of items of the same type, essentially as immutable arrays, but this is considered not Pythonic.)

D tuples allow names for the fields. This is useful at the usage point, because instead of "[2]" you use (for example) ".isOpen", which is more descriptive.

A tuple isn't an object; it's more like a record, almost like a handier POD. So tuples are data structures that you use with code (functions) that they don't contain. OOP design is against this, but I am not fond of pure OO design. There are many situations where you want something handier, more functional in style, or more in an "abstract data structure" style. Python, D, Scala and other languages recognize this.

In some languages (like Python) tuples are immutable, while in D they are the opposite: you currently can't give them immutable fields. In D tuples are values, while in Python they are handled by reference (through names). In functional languages you can't see how they are managed, because there is referential transparency: even if there is a reference, it's invisible, so the compiler is free to implement them as it wants and optimize as much as it can.

In some languages tuples are managed with a nominal type system, while in most others they are managed with a structural type system (meaning that two tuples have different types only if the types of their fields differ). (Some time ago I even suggested a structural attribute to be used in the Phobos code that defines D tuples.)
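For comparison, Python's `collections.namedtuple` shows named fields, immutability and plain-tuple behaviour coexisting. This is a minimal Python 3 sketch for illustration, not a proposal for D. The `Door` type and its field names are invented. Python's equality is by value at run time, which is related to structural typing in a static type system but is not the same thing.

```python
from collections import namedtuple

# A tuple type with named fields (the ".isOpen" style of access mentioned above).
# "Door", "name" and "is_open" are hypothetical names used only for this sketch.
Door = namedtuple("Door", ["name", "is_open"])

d = Door("front", True)

# Named access and positional access refer to the same field.
named_equals_positional = d.is_open is d[1]

# Python tuples are immutable: assigning to a field raises AttributeError.
try:
    d.is_open = False
except AttributeError:
    mutated = False
else:
    mutated = True

# "Updating" a field builds a new tuple; the original is left untouched.
d2 = d._replace(is_open=False)

# Equality compares field values, so a named tuple equals a plain tuple
# holding the same items.
same_as_plain = d == ("front", True)
```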
There are many operations you want to perform on tuples. The most basic ones are:

0) define the type of a tuple
1) create a tuple
2) copy a tuple
3) read a tuple field
4) write a tuple field
5) unpack tuple fields into variables. This is useful in at least 3 different situations:
   5.1) The most basic one is unpacking a tuple locally, inside a function.
   5.2) Another common situation is unpacking a small tuple inside a foreach statement, when you iterate on a collection of tuples.
   5.3) Another usage (quite common in Haskell and similar languages) is to unpack a tuple in the signature of a function. Haskell even allows you to give names at the same time to both the fields and the whole tuple.

Other operations are:

6) Slice a tuple like a Python/D array: tup[1..3]. (This is already supported in D, but you can't use the normal slice syntax; you need to use tup.slice!(1, 3).)
7) Concat two or more tuples (using the ~ and ~= operators).

There are many other operations (remove a field, rename a field, etc.), but they are not commonly useful. If you know a common need that I have not listed, please show it.

Here are some examples of 5.1, 5.2 and 5.3.

5.1 is useful for functions that return more than one value too:

    auto (x, y) = tuple(1, 2);
    (int x, int y) = foo(100);
    (int x, auto y, auto z) = bar("hello");

The proposal in 6365 was to implement this very common case, which is currently not handled well enough in D.

5.2 (a different syntax is possible, this is just an example):

    auto array = [tuple(1,"foo"), tuple(2,"bar")];
    foreach (tuple(id, name); array) {}

5.3) This is already possible in D using the TypeTuple obtained from a tuple through the (undocumented?) .tupleof.
But this has the problem of losing some safety; this compiles with no errors:

    import std.typecons;
    void bar(int i, string s) {}
    void main() {
        auto t1 = tuple(1, "foo");
        bar(t1.tupleof);
        auto t2 = tuple(1);
        auto t3 = tuple("foo");
        bar(t2.tupleof, t3.tupleof);
    }

This is not allowed in Python 2 and Haskell, because there tuples are true, solid tuples at the unpacking point too, in the function signature. To implement feature 5.3 well you need something that enforces the tuple length at the unpacking point too, something more like this:

    void bar(Tuple!(int i, string s)) {
        // here use variables s and i
    }

In Python there are more features similar to point (5). You are also allowed to unpack a dynamic array into variables, and even a lazy generator. This is so handy and useful that I have suggested allowing the same thing in D too:
http://d.puremagic.com/issues/show_bug.cgi?id=6383

Some usage examples. Assigning from a dynamic array, in D2:

    import std.string;
    void main() {
        auto s = " foo bar ";
        auto ss = s.split();
        auto sa = ss[0];
        auto sb = ss[1];
    }

In Python 2.6:
    >>> sa, sb = " foo bar ".split()
    >>> sa
    'foo'
    >>> sb
    'bar'

Proposed:

    import std.string;
    void main() {
        auto s = " foo bar ";
        auto (sa, sb) = s.split();
    }

From a lazy range, in D2:

    import std.algorithm, std.conv, std.string;
    void main() {
        string s = " 15 27";
        // splitter doesn't work yet for this
        // (here it doesn't strip the leading space)
        //auto xy = map!(to!int)(splitter(s));
        auto xy = map!(to!int)(split(s));
        int x = xy[0];
        int y = xy[1];
    }

In Python:
    >>> x, y = map(int, " 15 27".split())
    >>> x
    15
    >>> y
    27
    >>> from itertools import imap
    >>> x, y = imap(int, " 15 27".split())
    >>> x
    15
    >>> y
    27

Proposed:

    import std.algorithm, std.conv, std.string;
    void main() {
        string s = " 15 27";
        (int x, int y) = map!(to!int)(splitter(s));
    }

Another example of unpacking a lazy range is unpacking the result of match(x, r"...").captures.

In my opinion unpacking a dynamic array into variables is natural in D, because D already contains something logically very similar, almost equal:

    int[] a1 = [1, 2];
    int[2] a2 = a1;

This performs a run-time test on the length of a1.

In Python 3 there are even ways to unpack only part of a tuple:

    a, b, *rest = (1, 2, 3, 4)

More thinking is needed. Please help me answer what Walter was asking for. And please, Walter, if you have more specific questions, now is a good moment to ask.

Bye,
bearophile
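The Python behaviour this post relies on can be checked with a short Python 3 sketch. It covers points 5.1 and 5.2, length-checked unpacking of a list, unpacking of a lazy iterator, and starred unpacking. In Python 3 `map` is already lazy, so `itertools.imap` is no longer needed (it was removed). Note also that `rest` comes out as a list, not a tuple.

```python
# 5.1: unpack the multiple return values of a function (divmod returns a pair).
q, r = divmod(17, 5)

# 5.2: unpack each tuple in a foreach-style loop.
array = [(1, "foo"), (2, "bar")]
names = {}
for id_, name in array:
    names[id_] = name

# Unpacking a list checks its length at run time,
# similar in spirit to `int[2] a2 = a1;` in D.
sa, sb = " foo bar ".split()
try:
    x, y = [1, 2, 3]
except ValueError:
    length_checked = True
else:
    length_checked = False

# Unpacking a lazy iterator: map() is lazy in Python 3.
x, y = map(int, " 15 27".split())

# Python 3 starred unpacking: bind the first items, collect the rest in a list.
a, b, *rest = (1, 2, 3, 4)
```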
Aug 02 2011
Michel Fortin <michel.fortin michelf.com> writes:
On 2011-08-02 13:15:52 +0000, bearophile <bearophileHUGS lycos.com> said:

 A tuple isn't an object, it's more like a record, it's almost like a 
 more handy POD. So tuples are data structures that you use with code 
 (functions) that they don't contain.

Actually, that's open to debate in my opinion. Andrei's Tuple in Phobos is a struct, so it's a POD like you say. Walter's tuples in the language are just collections of variables. A language tuple does not have an address of its own; it essentially has one address per field. This makes tuple packing/unpacking very efficient across function calls (because no real packing/unpacking takes place), but it makes it impossible to take the address of a tuple.

We need a way to make those two concepts work together, I think that's the hard part.
 Tuples do have some disadvantages. Their fields often are anonymous, 
 this goes against the typical well defined style of D/Java/Ada 
 programs, where you know the name of what you are using. So tuples are 
 better if you use them locally, or if they have only few items. Long 
 tuples become hard to use because you risk forgetting what each field 
 is, so the most common tuples have between 2 and 5 items.

Named tuple elements in a language tuple will be implemented when/if someone implements named arguments, something I might do eventually.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Aug 02 2011
bearophile <bearophileHUGS lycos.com> writes:
Michel Fortin:

 We need a way to make those two concepts work together, I think that's 
 the hard part.

I think they are incompatible. Bye, bearophile
Aug 02 2011
Alex Rønne Petersen <xtzgzorex gmail.com> writes:
On 02-08-2011 16:28, bearophile wrote:
 Michel Fortin:

 We need a way to make those two concepts work together, I think that's
 the hard part.

 I think they are incompatible. Bye, bearophile

I'm not sure it would make much sense to actually allow low-level operations on tuples. They're meant to be a language abstraction, and freely taking the address of them seems useless in practice and really just smells like array pointers (or similar) to me. - Alex
Aug 02 2011
Don <nospam nospam.com> writes:
Michel Fortin wrote:
 On 2011-08-02 13:15:52 +0000, bearophile <bearophileHUGS lycos.com> said:
 
 A tuple isn't an object, it's more like a record, it's almost like a 
 more handy POD. So tuples are data structures that you use with code 
 (functions) that they don't contain.

 Actually, that's open to debate in my opinion. Andrei's Tuple in Phobos is a struct, so it's a POD like you say. Walter's tuples in the language are just collections of variables.

In fact they're collections of symbols, which can be types, variables, or values...
 
 A language tuple does not have an address in itself, it essentially has 
 one address per field. This actually makes tuple packing/unpacking very 
 efficient across function calls (because there's no real 
 packing/unpacking taking place), but it makes it impossible to take the 
 address of a tuple.
 
 We need a way to make those two concepts work together, I think that's 
 the hard part.
 
 
 Tuples do have some disadvantages. Their fields often are anonymous, 
 this goes against the typical well defined style of D/Java/Ada 
 programs, where you know the name of what you are using. So tuples are 
 better if you use them locally, or if they have only few items. Long 
 tuples become hard to use because you risk forgetting what each field 
 is, so the most common tuples have between 2 and 5 items.

 Named tuple elements in a language tuple will be implemented when/if someone implements named arguments, something I might do eventually.

Aug 02 2011
Pelle <pelle.mansson gmail.com> writes:
On Tue, 02 Aug 2011 15:15:52 +0200, bearophile <bearophileHUGS lycos.com> wrote:

 5.1) The most basic one is unpacking a tuple locally, inside a function.
 5.2) Another common situation is unpacking a small tuple inside a  
 foreach statement, when you iterate on a collection of tuples.

I would argue these two are the most important ones.
 5.3) Another usage (quite common in Haskell and similar languages) is to  
 unpack a tuple in the signature of a function. Haskell even allows to  
 give names at the same time to both fields and to the whole tuple.

This is less important, I think.
 Other operations are:
 6) Slice a tuple like a Python/D array: tup[1..3]. (This is already  
 supported in D, but you can't use the normal slice syntax, you need to  
 use tup.slice!(1, 3) ).
 7) Concat two or more tuples (using ~ and ~= operators).

Note that ~= cannot work, as the result of the concatenation has a different type than the left hand side. (Unless, of course, the right hand side is the empty tuple :-)
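Python avoids the `~=` typing problem only because its variables are not statically typed. There, `t += x` on a tuple builds a new tuple and rebinds the name. The small Python 3 illustration below shows this. By contrast, a D variable of type `Tuple!(int, string)` could not be rebound to a three-element tuple this way.

```python
t = (1, "foo")
original = t

# Concatenation builds a new, longer tuple.
u = t + (2.5,)

# += on a tuple does not modify it in place: it creates a new tuple and
# rebinds the name t to it, leaving the old tuple object unchanged.
t += (2.5,)
```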
 5.3) This is already possible in D using the typetuple coming from a  
 tuple using the (undocumented?) .tupleof. But this has the problem of  
 losing some safety, this compiles with no errors:

 import std.typecons;
 void bar(int i, string s) {}
 void main() {
     auto t1 = tuple(1, "foo");
     bar(t1.tupleof);
     auto t2 = tuple(1);
     auto t3 = tuple("foo");
     bar(t2.tupleof, t3.tupleof);
 }

This is because of the conflation of TypeTuple (the compiler thing) and Tuple (the sane type thing). That code works because .tupleof is a TypeTuple. We need this functionality because of things like std.traits.ParameterTypeTuple (indeed, Tuple itself uses a TypeTuple internally). The built-in compiler tuple badly needs a new name, to dispel the confusion with sane tuples. AliasSequence or something :-)
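Python 3.5 and later has the same loophole when arguments are splatted with `*`. The usual way to keep the length check is to pass the tuple as a single parameter and unpack it inside the function. Python 2's `def bar((i, s)):` signature unpacking was removed in Python 3 by PEP 3113. The sketch below is minimal, and `bar_pair` is a hypothetical name.

```python
def bar(i, s):
    return f"{i}:{s}"


t1 = (1, "foo")
t2 = (1,)
t3 = ("foo",)

# Splatting flattens tuple boundaries, like .tupleof in the D example:
# the second call succeeds even though neither t2 nor t3 is an (int, str) pair.
r1 = bar(*t1)
r2 = bar(*t2, *t3)


# Hypothetical variant: take the tuple whole and unpack it inside, so the
# length is checked at the unpacking point (what point 5.3 asks for).
def bar_pair(pair):
    i, s = pair
    return f"{i}:{s}"


r3 = bar_pair(t1)
try:
    bar_pair(t2)
except ValueError:
    rejected = True
else:
    rejected = False
```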
 In Python there are more features similar to the point (5). You are
 allowed to unpack a dynamic array too into variables, and even a lazy  
 generator. This is so handy and useful that I have suggested to allow  
 the same thing in D too:
 http://d.puremagic.com/issues/show_bug.cgi?id=6383

I think that, as well as tuples, any range should be unpackable, and I think it should be implemented via a syntactic rewrite. Translate this:

    auto (a, rest...) = myRange;

into:

    static assert (isInputRange!(typeof(myRange)));
    assert (!myRange.empty);
    auto a = myRange.front;
    myRange.popFront();
    auto rest = myRange;

Equivalently with bidirectional ranges, and with any number of unpacked elements.

I wonder how the auto(...) syntax works with existing variables. For example, in Python:
    a, b = b, a  # swap!
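The rewrite proposed above can be mimicked in Python 3 with an iterator. The idea is to take the first n items, fail if the range runs out early, and hand back the unconsumed remainder. `unpack_head` is a hypothetical helper, not an existing library function. The last lines answer the swap question for Python: the whole right-hand side is evaluated before any name is assigned. Presumably a D rewrite that works on existing variables would need a similar temporary.

```python
from itertools import islice


def unpack_head(iterable, n):
    """Hypothetical helper mirroring `auto (a, ..., rest...) = range`:
    return the first n items and the unconsumed remainder."""
    it = iter(iterable)
    head = tuple(islice(it, n))
    # Counterpart of assert(!range.empty) before each front/popFront.
    if len(head) < n:
        raise ValueError(f"expected at least {n} items, got {len(head)}")
    return head, it


# map() is lazy, so only the first item is consumed here.
(a,), rest = unpack_head(map(int, "15 27 42".split()), 1)

# Too few items is an error, not a silent default.
try:
    unpack_head([1], 2)
except ValueError:
    short_rejected = True
else:
    short_rejected = False

# Swap: the right-hand side (y, x) is fully evaluated before either
# name is assigned, so no explicit temporary is needed in Python.
x, y = 1, 2
x, y = y, x
```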



Aug 02 2011