D - Ideas, thoughts, and criticisms, part two. About functions.

Antti Sykäri <jsykari cc.hut.fi> writes:
Ideas, thoughts, and criticisms (part 2)
----------------------------------------

(The topic says functions. Today's rant is actually about a bit more
than that: multiple-value expressions, the unification of expressions
and statements, and a sidenote about how to implement generics like I
would do it. (The C++ way that is. *g*))

Here we go again.

I thought a bit about functions.

Let's start from an example.

http://www.digitalmars.com/d/function.html contains the following
example:

int foo(int x, out int y, inout int z, int q);

To me it just doesn't seem right. Something is wrong. Maybe it is the
words "inout" and "out", which I just feel should not be there.
(Sorry, all due respect to IDL, but I don't use it, I don't like it, and
I don't think it's worthwhile modeling a language to match IDL syntax.
Just my opinion. Just my two cents, or something.)

Probably it's just my view that there should not be _two_ places where a
function can return values - the "ordinary" return value and then the
"out" return value.  Could not the function return multiple values?

int, int foo(int x, inout int z, int q);

Now something is still wrong. I moved the second parameter of the function,
"int y", to the left of the function name. Everything's fine except that we
can't see its name there. We cannot document the meaning of the function's
return values, because we cannot give them names.

But it has always been so. A function's return value has always been
nameless. In C, that is. But I don't see why it should be that way. Let's
give names to the return values of the function:

int val, int q
foo(int x, inout int z, int q);

Now that's pretty. We have documented our return values (they have
names), functions can return multiple values, and they are put in one
place. Now we need to see what we can do to the poor inout variable.
Maybe a reference à la C++?

int val, int q
foo(int x, int& z, int q);

IMHO this syntax is nicer to read because there are no "inout" words
there, and the "in" and "out" values are grouped nicely.

This would also get rid of the little annoyance that in C++, if you want
a function that takes no arguments, you use just:

int f();        // function taking no arguments, returning int

but if you want a function that returns nothing, you have to use void:

void f(int);    // function taking int, returning nothing

Now, if you want to return nothing, just use:

f();

This return value thing has some other implications, too. It doesn't
come cheap. Grab a good hold on your chairs.

We need a new type expression altogether, let's call it a multivalued
expression. Like I said, I'm no grammar expert so I'm not going to
concentrate on that side. But it seems we'd have to scrap the
well-served comma expression. Let's see what we can figure out to
replace it later, and now concentrate on the multivalued function thing.

int val1, int val2 f();

g()
{
    // you could, of course, do this:
    int x, y;
    x, y = f();

    // and naturally this:
    x, y = y, x;

    // and maybe this:
    int a, b = f();

    // now if you would want to ignore one of the return values, what to
    // do then?
    x = f();    // only take the first one
    , y = f();  // only take the second one (but this does not fulfill my
                // aesthetic needs, so maybe not like this...)
    // so perhaps something else is needed (this _could_ be interpreted
    // as assignment to an unnamed variable of type void, or just as
    // syntactic sugar for "int dummy, y = f();"):
    void, y = f();
}
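
For comparison, the unpacking cases above can be approximated in C++17
with std::tuple. This is only a sketch of the idea, not the syntax
proposed here; the body of f and the name demo_unpacking are made up for
illustration.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical stand-in for "int val1, int val2 f();".
std::tuple<int, int> f() { return {1, 2}; }

void demo_unpacking()
{
    // "x, y = f();" : assign to existing variables through std::tie.
    int x = 0, y = 0;
    std::tie(x, y) = f();
    assert(x == 1 && y == 2);

    // "x, y = y, x;" : make_tuple copies both values before the assignment.
    std::tie(x, y) = std::make_tuple(y, x);
    assert(x == 2 && y == 1);

    // "int a, b = f();" : declare and initialize with structured bindings.
    auto [a, b] = f();
    assert(a == 1 && b == 2);

    // "void, y = f();" : std::ignore swallows the value we do not want.
    std::tie(std::ignore, y) = f();
    assert(y == 2);
}
```

std::tie handles assignment to variables that already exist, while
structured bindings handle the declare-and-initialize case.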

While we're at it, why not exploit the naming of the return values a
bit. We could actually represent returning multiple values as returning
a struct. Suppose that we have two f's, so we need to specify one in
order to get a type out of it:

int   val1,   int val2 f(int x)     { return x, x*2;    }
float val1, float val2 f(float x)   { return x, x*2.0f; }

// assuming that the form f() is unevaluated, and only its type is
// requested via the .type property:
// for practical purposes, f_int_return_type is a struct { int val1; int val2; }
// f(int) is there just for disambiguation purposes.
alias f(int).type f_int_return_type;

g()
{
    f_int_return_type ret = f(5);
    assert(ret.val2 == 10);
}

With this, and the "with" expression (or even without this, but with
"with" expression and some sweet syntactic sugar) we could also do

g()
{
    with (f(5))
    {
        do_whatever_with(val1, val2);
    }
}
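
A rough C++17 analogue of the two ideas above, assuming the named return
values are modeled as small structs. IntPair, FloatPair and
demo_named_results are invented names; decltype stands in for the
proposed .type property and structured bindings for the "with" block.

```cpp
#include <cassert>
#include <type_traits>

// Named results: each struct plays the role of "T val1, T val2".
struct IntPair   { int   val1; int   val2; };
struct FloatPair { float val1; float val2; };

IntPair   f(int x)   { return {x, x * 2}; }
FloatPair f(float x) { return {x, x * 2.0f}; }

// Counterpart of "alias f(int).type f_int_return_type;". decltype does not
// evaluate the call; the int argument only selects the overload.
using f_int_return_type = decltype(f(0));
static_assert(std::is_same_v<f_int_return_type, IntPair>,
              "the int overload was selected");

void demo_named_results()
{
    f_int_return_type ret = f(5);
    assert(ret.val2 == 10);

    // Closest thing to "with (f(5)) { ... }": bind the members to local names.
    auto [val1, val2] = f(5);
    assert(val1 == 5 && val2 == 10);
}
```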

Now, there is just no end for the syntactic sugar if that's the way to
go. Now wouldn't it taste good if you could do the following:

int, int f(int x) {           // of course, you don't need to name them if you
    return x, x+1;            // don't want to. I just didn't bother to
}                             // invent names right now

g(int first, int second, int third)
{
    printf("%d, %d, %d\n", first, second, third);
}

main()
{
    g(1, f(2));     // will print "1, 2, 3"
}
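
In C++17 the same effect needs an explicit step: the leading argument and
the returned tuple are glued together with std::tuple_cat and then
expanded into the call with std::apply. A sketch, in which g returns its
text instead of printing it (so the result can be checked) and
call_g_with_one_and_f_of_two is a made-up helper name:

```cpp
#include <cassert>
#include <string>
#include <tuple>

std::tuple<int, int> f(int x) { return {x, x + 1}; }

// Formats its three arguments the way the printf call above would.
std::string g(int first, int second, int third)
{
    return std::to_string(first) + ", " + std::to_string(second) + ", " +
           std::to_string(third);
}

// Equivalent of "g(1, f(2));": build the tuple (1, 2, 3) and expand it
// into g's parameter list.
std::string call_g_with_one_and_f_of_two()
{
    return std::apply(g, std::tuple_cat(std::make_tuple(1), f(2)));
}

void demo_apply()
{
    assert(call_g_with_one_and_f_of_two() == "1, 2, 3");
}
```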

Now how does this sound?

And of course, we wouldn't have to list the return values in a "return"
statement if we didn't want to. We could assign to the named return values
instead, as they do in Pascal (I'm not one of them, mind you, but anyway I
could imagine this style of returning values being easier to optimize or
whatever. I'm no premature optimizer either. No way, not me. *g*).

For example, suppose that we have a function that calculates a sine and
a cosine. That's not really out of this world: the x87 math coprocessor has
an instruction for it. It's called fsincos (see intel:ia32_vol1) and it
produces the sine and cosine of an argument faster than an fsin and an fcos
in succession. So, if we would like to commit a little sin (no pun intended)
and premature-optimize a bit:

module intrinsic; // or whatever

float sin_ret,
float cos_ret
inline sincos(float argument)
{
    asm
    {
        // now, here we assume that the compiler can handle putting
        // argument and the sine and cosine in the right places...
        // I might be wrong in my assembly, I just learned it yesterday.
        // This is just an example.
        fsincos argument;
        mov     cos_ret, st(0)  // fsincos leaves the cosine on top of the stack
        mov     sin_ret, st(1)  // and the sine just below it
    }
}
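
Without inline assembly, the two-result shape of this function can be
sketched in C++ by returning a small struct. The names SinCos and
sincos_pair are invented for this example, and whether the two library
calls end up as a single fsincos is left entirely to the compiler; that
part is an assumption, not a guarantee.

```cpp
#include <cassert>
#include <cmath>

// Both results travel together, and each one has a name.
struct SinCos { float sin_ret; float cos_ret; };

inline SinCos sincos_pair(float argument)
{
    return {std::sin(argument), std::cos(argument)};
}

void demo_sincos()
{
    const SinCos r = sincos_pair(0.0f);
    assert(r.sin_ret == 0.0f);
    assert(r.cos_ret == 1.0f);
}
```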

Or, for a more common example, everyone probably knows that the ordinary DIV
instruction on the IA-32 architecture (intel:ia32_vol2) produces not only the
quotient but also the remainder as a side effect; so we could actually
write

int quotient, int remainder inline op.div(int dividend, int divisor)
{
    // assembly code to DIV dividend by divisor,
    // and move EAX to quotient (which probably is EAX anyway)
    // and move EDX to remainder.
    // If one of the return values turns out not to be used, the
    // compiler is smart enough to drop the unneeded MOV from EDX/EAX.
}
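
The C and C++ standard libraries already expose this pairing through
std::div, which returns the quotient and the remainder together. A small
sketch that wraps it in a struct with the names used above (QuotRem and
div_mod are made-up names):

```cpp
#include <cassert>
#include <cstdlib>

struct QuotRem { int quotient; int remainder; };

// One call yields both results; callers take whichever member they need.
inline QuotRem div_mod(int dividend, int divisor)
{
    const std::div_t r = std::div(dividend, divisor);
    return {r.quot, r.rem};
}

void demo_div_mod()
{
    const QuotRem r = div_mod(17, 5);
    assert(r.quotient == 3);
    assert(r.remainder == 2);
}
```

Note that std::div truncates toward zero, so div_mod(-7, 2) gives a
quotient of -3 and a remainder of -1.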

and then we could have (since modulo is in no way special)

int remainder, int quotient inline op.mod(int dividend, int divisor)
{
    quotient, remainder = dividend / divisor; // will call op.div()
}

Oh yeah, I forgot what to do with the poor comma operator.

My idea would be to simply do away with it and replace it with the block
expression, so that

expression1, expression2;

would yield expression1 in a single-valued context
(and (expression1, expression2) in a multivalued context)

and

{ expression1;
  expression2; }

would yield expression2, and that would be more general anyway.

Of course, that would require a serious rearrangement of things.
An expression would be a statement, and a statement would be an expression.
And everything would have to be functional, not just half-functional as
it is now.

Besides, I like the idea of functional-style "if" where if is an
expression, not a statement:

// the return value need not be named, since the name of the function
// documents it clearly enough
int max(int lhs, int rhs)
{
     return
        if (lhs < rhs)
            rhs;
        else
            lhs;
}

Of course, that can be taken further by making the last
statement/expression of the function yield the return value:

int max(int lhs, int rhs)
{
    // if yields one of its statements
    if (lhs < rhs)
        rhs;
    else
        lhs;
}

(This of course renders needless the weird trophy of the last century,
the infamous ?: operator, which is a bit annoying to use repeatedly.
You have possibly seen this, and run away from it as fast as you could:

    int x = test1() ? something
                    : test2() ? somethingElse
                              : testEvenMore() ? yetanotherthing
                                               : lastTest() ? awwintired
                                                            : phew;
)
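
C++ has no "if" expression, but an immediately invoked lambda gets fairly
close: each branch returns its value and the whole thing can be used as an
initializer. This is a sketch of that workaround rather than of the
proposal itself; max_of and describe are invented names.

```cpp
#include <cassert>
#include <string>

// "if" used as an expression: the lambda is defined and called on the spot.
inline int max_of(int lhs, int rhs)
{
    return [&] {
        if (lhs < rhs)
            return rhs;
        else
            return lhs;
    }();
}

// A chain of tests that would otherwise be a ladder of nested ?: operators.
inline std::string describe(int n)
{
    const std::string text = [&]() -> std::string {
        if (n < 0)       return "negative";
        else if (n == 0) return "zero";
        else if (n < 10) return "small";
        else             return "large";
    }();
    return text;
}

void demo_if_expression()
{
    assert(max_of(2, 7) == 7);
    assert(describe(0) == "zero");
}
```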

If you went so far as to make tail recursion optimization a feature of
the language (as in Scheme), we are sure to see code like:

// recursive helper function
private int inline max(int[] array, int maxSoFar, uint idx)
{
    if (idx == array.length)
        maxSoFar;
    else
        max(array, max(array[idx], maxSoFar), idx + 1);
}

// the main entrance to the helper function.
// return maximum entry in the array.
// in case of empty array, return int.min
int inline max(int[] array)
{
    return max(array, int.min, 0);
}
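
The same accumulator-passing shape written as compilable C++, for
reference. The names larger, max_from and max_in are made up to avoid
overloading a single name three ways. C++ does not promise tail-call
elimination, so for very long arrays this version depends on the optimizer.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

inline int larger(int lhs, int rhs) { return lhs < rhs ? rhs : lhs; }

// Recursive helper: maxSoFar carries the result, so the recursive call
// is the very last thing the function does (a tail call).
inline int max_from(const std::vector<int>& array, int maxSoFar,
                    std::size_t idx)
{
    if (idx == array.size())
        return maxSoFar;
    return max_from(array, larger(array[idx], maxSoFar), idx + 1);
}

// Entry point: maximum entry of the array, or the smallest int if it is empty.
inline int max_in(const std::vector<int>& array)
{
    return max_from(array, std::numeric_limits<int>::min(), 0);
}

void demo_max_in()
{
    assert(max_in({3, 9, -2}) == 9);
}
```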

Quelle élégance! Well, that's a question of opinion but this kind of
stuff _would_ make it easier to use functional-style elements in D and
make it even more multi-paradigm than C++ is. (For better or
worse.)

Now while we're at it, and we've invented multivalued expressions
(tuples, if you like that name better) on the way, let's go generic. But
let's not go totally generic quite yet. Only generic when it comes to
the argument values. See, let's first introduce a new keyword, "rest",
our friend on the way to the general handling of the parameters:

int inline sum()
{
    return 0;
}

// calculate the sum of its arguments.
int inline sum(int head, rest tail)
{
    // head is the first argument,
    // and "tail", formed with the help of the "rest" keyword,
    // is the tuple containing the rest of them. tail may be
    // empty, which is as good as "void".
    // Of course, you'd better inline this function.
    return head + sum(tail);
}

Now what happens when the compiler starts to generate the code for, for
example, sum(1, 2, 3), is:

sum(1, 2, 3)
-> sum(head = 1, tail = (2, 3))
   -> 1
    + sum(2, 3)
      -> 2
       + sum(3)
         -> 3
          + sum()
            -> 0
-> 1 + 2 + 3
-> 6

And then it stops. If the user had provided something like (sum(1,x,2))
in between, it would've generated the code to calculate 1 + x + 2. Et
cetera. Of course, there might be a way to do stuff like this in
run-time too - but it just might not be needed because of arrays. But
let's see something that arrays are not able to do. To the generics.
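
The head/rest recursion for sum can be written in C++ with a parameter
pack, which plays the role of the "rest" keyword here. A sketch, assuming
all arguments are ints; the demo function name is invented.

```cpp
#include <cassert>

// Base case: no arguments left.
constexpr int sum() { return 0; }

// head is the first argument, tail is the (possibly empty) pack of the rest.
template <typename... Rest>
constexpr int sum(int head, Rest... tail)
{
    return head + sum(tail...);
}

// With constant arguments the whole expansion happens at compile time.
static_assert(sum(1, 2, 3) == 6, "1 + 2 + 3");

// With a run-time value in between, the expansion is simply 1 + x + 2.
inline int sum_around(int x) { return sum(1, x, 2); }

void demo_sum()
{
    assert(sum() == 0);
    assert(sum_around(10) == 13);
}
```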

I'd really really really very much like to have a generic function
syntax which I could use like I can in C++. Only if it can be fitted
into the existing D language, of course. 

Like this:

print() { }

print(int i)    {    printf("%d", i);    }
print(float f)  {    printf("%f", f);    }
print(String s) {  /* print a string */  }
print(...)      { /* you get the point*/ }

// note we don't need a template parameter for tail since it's
// implicitly matched to be the right type
template<type T>
inline print(T head, rest tail)
{
    print(head);
    print(tail);
}

Then you could do, type-safely,

print("value of i is = ", i, "\n", /* as many arguments as you like */);

and the compiler will generate you nice code for that. Inline-only. Of
course.
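
For reference, here is how the head/rest print can be spelled with C++
variadic templates. It is a sketch under a few assumptions: the text is
collected in a made-up Output struct instead of going to printf (so it can
be checked), only int, double and C strings are supported, and the
template demands at least two values so that a single value always goes to
one of the plain overloads.

```cpp
#include <cassert>
#include <string>

// Hypothetical sink that collects everything printed so far.
struct Output { std::string text; };

inline void print(Output&) {}
inline void print(Output& out, int i)         { out.text += std::to_string(i); }
inline void print(Output& out, double d)      { out.text += std::to_string(d); }
inline void print(Output& out, const char* s) { out.text += s; }

// Two or more values: print the head, then recurse on the rest.
template <typename T, typename U, typename... Rest>
inline void print(Output& out, T head, U next, Rest... tail)
{
    print(out, head);
    print(out, next, tail...);
}

void demo_print()
{
    Output out;
    int i = 42;
    print(out, "value of i is = ", i, "\n");
    assert(out.text == "value of i is = 42\n");
}
```

Every argument is matched against the overloads at compile time, so
passing a type that has no print overload is a compile error rather than
a run-time surprise.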

Of course, it goes without mentioning that C++ did this ages ago with
its << syntax, which is IMHO kind of a cool thing but it turns out some
people don't like it. Maybe this syntax is friendlier. It can probably
be made better. Give me your best shots.

And please apply the same C++-style generic function syntax to the
examples above - you probably did - and see how it shines in comparison
to any other generic syntax ever invented. It would be very nice to see
that in D, too.

Oh yeah, let me warn you (regarding multiple return values): functions
returning multiple values might (or might not) also mess with
function pointer syntax and/or overloading issues.  (I'm thinking now
that since an i32 (*f)() must return just one value, an i32, i32 (*f)()
could not be used as one because it might clobber the stack or registers
or whatever.  But on the other hand, if it uses an unused part of the
stack, what the hell - just give the caller the first value and leave
to the programmer the problem that he has a function which does unused
computation.)
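
As a data point on the function pointer worry: in C++, where the two
results are packed into a std::tuple, the two pointer types are simply
unrelated, and dropping the second value takes an explicit adapter. A
sketch with invented names (OneValueFn, TwoValueFn, two_values,
first_value_only):

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>

using OneValueFn = int (*)();
using TwoValueFn = std::tuple<int, int> (*)();

// Hypothetical function with two results.
std::tuple<int, int> two_values() { return {7, 8}; }

// A two-value function pointer cannot silently pass for a one-value one.
static_assert(!std::is_convertible_v<TwoValueFn, OneValueFn>,
              "no implicit conversion between the pointer types");

// Explicit adapter: a captureless lambda converts to a plain function
// pointer and keeps only the first value.
inline OneValueFn first_value_only()
{
    return [] { return std::get<0>(two_values()); };
}

void demo_function_pointers()
{
    OneValueFn fn = first_value_only();
    assert(fn() == 7);
}
```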

Say what you think.

Antti.

References:

(intel:ia32_vol1)
http://www.intel.com/design/pentium4/manuals/245470.htm

(intel:ia32_vol2)
http://www.intel.com/design/pentium4/manuals/245471.htm
Aug 26 2002
Pavel Minayev <evilone omen.ru> writes:
On Tue, 27 Aug 2002 02:52:42 +0000 (UTC) Antti Sykäri
<jsykari cc.hut.fi> wrote:

> int val, int q
> foo(int x, inout int z, int q);

In other words, you want tuples. A nice feature. But out parameters are a
bit more than that - don't forget that you can overload functions based on
types of their parameters, and not by return type! For example, in
stream.d, I wrote:

    void read(out byte x);
    void read(out short x);
    void read(out int x);
    ...

And then you can write:

    int n;
    file.read(n);

And correct version will be called.
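
The same overload-by-destination pattern can be sketched in C++ with
reference parameters. FakeStream is a made-up stand-in for the stream
class; it stores fixed values instead of reading real data, so that the
chosen overload is easy to observe.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stream: each overload fills in a value of its own type
// and records how many bytes it would have consumed.
struct FakeStream {
    std::size_t bytes_read = 0;

    void read(signed char& x) { x = 1; bytes_read += sizeof x; }
    void read(short& x)       { x = 2; bytes_read += sizeof x; }
    void read(int& x)         { x = 3; bytes_read += sizeof x; }
};

void demo_overloaded_read()
{
    FakeStream file;
    int n = 0;
    file.read(n);  // the type of n selects read(int&)
    assert(n == 3);
    assert(file.bytes_read == sizeof(int));
}
```

A function returning a tuple could not be selected this way, since
overload resolution does not look at the return type.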
 
> Maybe a reference à la C++?
> 
> int val, int q
> foo(int x, int& z, int q);
> 
> IMHO this syntax is nicer to read because there are no "inout" words
> there, and the "in" and "out" values are grouped nicely.

I still like "out" more.
 
> Now, if you want to return nothing, just use:
> 
> f();

This makes it harder to parse. Also, when I look at it, my mind sees a
function call, and not a declaration. "void" can be treated as "procedure"
in other languages, so I don't really see a problem here.
 
> We need a new type expression altogether, let's call it a multivalued
> expression. Like I said, I'm no grammar expert so I'm not going to
> concentrate on that side. But it seems we'd have to scrap the
> well-served comma expression. Let's see what we can figure out to
> replace it later, and now concentrate on the multivalued function thing.

It is called a tuple.

 
> If you went so far as to make tail recursion optimization a feature of
> the language (as in Scheme), we are sure to see code like:
> 
> // recursive helper function
> private int inline max(int[] array, int maxSoFar, uint idx)
> {
>     if (idx == array.length)
>         maxSoFar;
>     else
>         max(array, max(array[idx], maxSoFar), idx + 1);
> }

Yes, and then D will become a functional language...
No, thanks! =)
 
Aug 27 2002
"Sean L. Palmer" <seanpalmer earthlink.net> writes:
I think I, for one, am with you on all these issues.

Sean

"Antti Sykäri" <jsykari cc.hut.fi> wrote in message
news:akephp$sbc$1 digitaldaemon.com...
> Ideas, thoughts, and criticisms (part 2)
> ----------------------------------------
>
> (The topic says functions. Today's rant is actually about a bit more
> than that: multiple-value expressions, the unification of expressions
> and statements, and a sidenote about how to implement generics like I
> would do it. (The C++ way that is. *g*))

Aug 27 2002