D - Ideas for language

Marko Tintor marko pkj.co.yu (49/49) Mar 01 2003 1) shorter relational expressions

Ilya Minkov (43/92) Mar 01 2003 Hm. Requires some thought on whether that's implementable.

Antti Sykari (101/152) Mar 01 2003 Many of the suggestions below have been given earlier.

Achillefs Margaritis (15/170) Mar 02 2003 Ada has "smart unions" named "discriminants". In the declaration of th...
Ilya Minkov (75/192) Mar 02 2003 And it has been implemented in Python because of "being pretty obvious"....

Daniel Yokomiso (62/254) Mar 03 2003 Hi,

Ilya Minkov (18/33) Mar 03 2003 It's not all that clear to me why. You have to scan every function, so

Bill Cox (38/45) Mar 03 2003 I can help you there. Breaking loops in a directed graph requires only
Daniel Yokomiso (12/45) Mar 03 2003 As I said "if the compiler CAN'T annotate the object code with purity

Mike Wynn (3/9) Mar 03 2003 look at LUA (www.lua.org) the only thing that is require is a change in ...

Marko Tintor marko pkj.co.yu writes:

1) shorter relational expressions
it is easier to write
a <= b < c == d
instead of
a <= b && b < c && c == d
+ b and c are evaluated only once

2) multiple assignment
a,b,c = A,B,C;
is executed like this:
A,B,C is evaluated from left to right
then values are assigned to a,b,c from left to right
(a=A, b=B, c=C)

3) ^ power, =>, <= and <=> logical operators
8^4 ... 8*8*8*8
a => b ... !a || b
a <= b ... b => a
a <=> b ... (a => b) && (a <= b)

4) optimization idea
if function F has no side effect and its arguments are constants
it can be computed at compile time

5) better switch
old switch:
switch(exp0)
{  
case constexpr1: command1
case constexpr2: command2
case constexpr3: command3
default: command4
}
is expanded:
if(exp0 == constexpr1) goto label1;
if(exp0 == constexpr2) goto label2;
if(exp0 == constexpr3) goto label3;
goto label4;
label1: command1;
label2: command2;
label3: command3;
label4: command4;

better switch:
switch(exp0)
case(exp1) command1
case(exp2) command2
case(exp3) command3
else command4
is expanded:
if(exp0 == exp1) command1 else
if(exp0 == exp2) command2 else
if(exp0 == exp3) command3 else
command4
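
For illustration, a small Python sketch of the two expansions above. It only models control flow: the function names, the "break" convention and the idea of passing commands as callables are assumptions made for this sketch, not part of the proposal.

```python
def c_style_switch(value, cases, default):
    """Old switch: jump to the matching label, then fall through.

    cases is an ordered list of (constant, command) pairs. A command
    returns the string "break" to leave the switch, anything else to
    fall through to the next label (a convention invented for this sketch).
    """
    labels = cases + [(None, default)]
    # Find the label to jump to; no match means jumping to the default label.
    start = next(
        (i for i, (constant, _) in enumerate(cases) if constant == value),
        len(cases),
    )
    for _, command in labels[start:]:
        if command() == "break":
            break


def better_switch(value, cases, default):
    """Proposed switch: an if/else chain, so exactly one command runs."""
    for expression, command in cases:
        if value == expression:
            command()
            return
    default()


fallthrough_log = []
c_style_switch(
    1,
    [(1, lambda: fallthrough_log.append("command1")),
     (2, lambda: fallthrough_log.append("command2"))],
    lambda: fallthrough_log.append("command4"),
)

chain_log = []
better_switch(
    1,
    [(1, lambda: chain_log.append("command1")),
     (2, lambda: chain_log.append("command2"))],
    lambda: chain_log.append("command4"),
)
```

With no "break" returned, the old form runs every command from the matching label onward, while the proposed form runs only the first match.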

Mar 01 2003

Ilya Minkov <midiclub 8ung.at> writes:

Welcome stranger!

Marko wrote:
 1) shorter relational expressions
 it is easier to write
 a <= b < c == d
 instead of
 a <= b && b < c && c == d
 + b and c are evaluated only once

Hm. Requires some thought on whether that's implementable.

RelationalExp -> [Exp RelOp]* Exp.

Hm.

 2) multiple assignment
 a,b,c = A,B,C;
 is executed like this:
 A,B,C is evaluated from left to right
 then values are assigned to a,b,c from left to right
 (a=A, b=B, c=C)

We've gone through such things a couple of times. The question is: can
you (or anyone) find examples which would make this feature useful?

 3) ^ power, =>, <= and <=> logical operators
 8^4 ... 8*8*8*8
 a => b ... !a || b
 a <= b ... b => a
 a <=> b ... (a => b) && (a <= b)

Yes. However ^ is taken by XOR. How about ** ?
I also wanted division int/int=float, and a separate integer division
operator. This would decimate the number of stupid numeric bugs, and
would not yield many new bugs because a compiler would warn about a
type mismatch.
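
Python 3 is one existing language that takes this route, so a short Python sketch can show how the operators would read. The variable names are only for illustration.

```python
true_quotient = 7 / 2      # int / int gives a float
floor_quotient = 7 // 2    # separate, explicit integer division operator
negative_floor = -7 // 2   # Python rounds toward negative infinity here
power = 8 ** 4             # exponentiation, same as 8 * 8 * 8 * 8
xor = 8 ^ 4                # ^ keeps its bitwise XOR meaning
```

Note that Python's `//` floors, whereas C99 integer division truncates toward zero, so the two differ for negative operands.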

 4) optimization idea
 if function F has no side effect and its arguments are constants
 it can be computed at compile time

Many, many people have had this good idea. :)
This would also give a number of other interesting things. Example: 
regexp could be saved in a program in a compiled form, instead of 
translating them at run-time. Though it doesn't buy much with regexp, it 
would with larger interpreted sub-languages.

I guess there's some problem checking purity of functions though. It
would mean a need for recursive analysis of (almost) the whole program.
A function would qualify only if it accesses no globals other than
constant ones, and calls only functions that also qualify. Can a
recursive set of functions be qualified?

 5) better switch
 old switch:
 switch(exp0)
 {  
 case constexpr1: command1
 case constexpr2: command2
 case constexpr3: command3
 default: command4
 }
 is expanded:
 if(exp0 == constexpr1) goto label1;
 if(exp0 == constexpr2) goto label2;
 if(exp0 == constexpr3) goto label3;
 goto label4;
 label1: command1;
 label2: command2;
 label3: command3;
 label4: command4;

 better switch:
 switch(exp0)
 case(exp1) command1
 case(exp2) command2
 case(exp3) command3
 else command4
 is expanded:
 if(exp0 == exp1) command1 else
 if(exp0 == exp2) command2 else
 if(exp0 == exp3) command3 else
 command4

One thing: a switch is NEVER CONVERTED TO IF's!!! It is a means to create
a jump table: "take an input value, do some math on it which yields an
index, do a table lookup, jump to the address noted in the table".
That's why switch is so darned efficient - only a few CPU cycles!!! And
you see why it has such semantics in C.
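
A rough Python model of that jump-table idea, under the assumption that case values are small integers. `make_jump_table` and its parameters are hypothetical names; a real compiler emits an indirect jump rather than a function call.

```python
def make_jump_table(cases, default):
    """cases maps integer case values to zero-argument handlers."""
    low, high = min(cases), max(cases)
    # One slot per value in [low, high]; holes point at the default handler.
    table = [cases.get(value, default) for value in range(low, high + 1)]

    def dispatch(value):
        index = value - low              # the "math" that yields an index
        if 0 <= index < len(table):      # bounds check, the only comparison
            return table[index]()        # table lookup, then "jump"
        return default()

    return dispatch


dispatch = make_jump_table(
    {10: lambda: "ten", 11: lambda: "eleven", 13: lambda: "thirteen"},
    default=lambda: "other",
)
```

The table has one slot per value between the smallest and largest case, which is why very wide ranges would bloat it.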

What you mean here is that a "break" is implicit. Walter doesn't want
it. He's not exactly a young person, and I guess he suspects that if he
did it, he'd be having bugs because it doesn't work the "good old C way".
And so would many other programmers. He'd rather give you a better
compiler which will tell you whenever you're missing a "break". I guess
that'd also be his argument about division. A compiler cannot warn you
about division though.

There's another problem with it: how do you check for ranges in "switch"?
Now you can write "case a: case b: case c"; otherwise you would have to
think up a new syntax for it, which could be like array slicing, or
inclusive... That would in any case lead programmers to write huge
slices and bloat the jump table, which is not a good idea at all.

One thing I find very important to implement is a "smart union", which
knows its current state. It would fix the unsafety of the normal union
and make serialisation possible.
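
A minimal Python sketch of what such a "smart union" could look like: a value stored together with a tag naming the active member. The class and method names are invented for illustration only.

```python
class SmartUnion:
    """A value tagged with the name of the member that is currently active."""

    def __init__(self, **members):
        self._types = members      # member name -> allowed type
        self._active = None        # tag: which member holds the value now
        self._value = None

    def set(self, member, value):
        if not isinstance(value, self._types[member]):
            raise TypeError(f"{member} expects {self._types[member].__name__}")
        self._active, self._value = member, value

    def get(self, member):
        # Reading a member that is not active is rejected instead of
        # silently reinterpreting the stored bits.
        if member != self._active:
            raise ValueError(f"{member} is not the active member")
        return self._value

    def serialise(self):
        # The tag tells the writer which representation to emit.
        return {"member": self._active, "value": self._value}


u = SmartUnion(number=int, text=str)
u.set("number", 42)
record = u.serialise()
```

Because the tag travels with the value, both the safety check and the serialisation fall out of the same piece of state.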

-i.

Mar 01 2003

Antti Sykari <jsykari gamma.hut.fi> writes:

Many of the suggestions below have been given earlier.

However, perhaps it's good to repeat why some of them are worth
considering.

Ilya Minkov <midiclub 8ung.at> writes:
 Marko wrote:
 1) shorter relational expressions
 it is easier to write
 a <= b < c == d
 instead of
 a <= b && b < c && c == d
 + b and c are evaluated only once


First of all, I think this is a useful feature. At least for the
relational (<, <=, >=, >) expression part.

"a < b < c" is semantically obvious to everyone.  It's compact and
more readable than "a < b && b < c", and causes fewer bugs: people do
write things like "1 < a < 5" by accident, and get bitten.

Most importantly, it's comfortable to use.

The only downside IMO is that there are some subtle issues to decide
and hence the feature requires some time to design and implement.
Such as:

- Is the operation short-circuited?  (Adding another short-circuited
"operator" in the language -- but why not?)
- Are the operations evaluated only once?  (Logically, yes, since they
are written in the source code only once)
- How does this interact with operator overloading?  (My guess would
be "translate to a<b && b<c && ..." first, then do overloading)
- Apply that to == and != (and === and !==), too?  How about
"a < b == x < y"?  (I wouldn't)

Someone might also say that all extra features are bad, because they
require time and effort to learn and teach and write about.  For
example, the old complaint that there are 4 ways to increment a
variable in C.  But this applies only to features which aren't
obviously easy to use, and this one is.  I could also envision it
being close to obvious to implement.

 Hm. Requires some thought on whether that's implementable.

 RelationalExp -> [Exp RelOp]* Exp.

Most probably everything is implementable :) Maybe not in a
straightforward manner (like direct translation from a<b<c to a<b&&b<c
-- that would cause the duplication of the expressions, which is
probably not desired), but implementable anyway.  Probably it needs a
change in the back-end/intermediate representation of the compiler to
take into account sequential CmpExps and do things like "eval the first
and second; compare; if false, quit; eval the third; compare with the
second; etc."
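
The evaluation scheme in quotes above can be sketched in Python roughly as follows. Operands are passed as zero-argument callables to stand in for unevaluated subexpressions; `chained_compare` and `traced` are hypothetical helpers, not anything a compiler actually exposes.

```python
import operator


def chained_compare(operands, ops):
    """Evaluate a chain such as a <= b < c == d.

    operands: zero-argument callables, one per operand.
    ops:      binary predicates, one fewer than operands.
    """
    left = operands[0]()                  # eval the first
    for thunk, op in zip(operands[1:], ops):
        right = thunk()                   # eval the next operand exactly once
        if not op(left, right):
            return False                  # quit; later operands never run
        left = right                      # reuse the value for the next compare
    return True


calls = []


def traced(name, value):
    """Wrap a value so that each evaluation is recorded in calls."""
    def thunk():
        calls.append(name)
        return value
    return thunk


# 1 <= 5 < 3 == 3: the chain fails at "5 < 3", so d is never evaluated.
result = chained_compare(
    [traced("a", 1), traced("b", 5), traced("c", 3), traced("d", 3)],
    [operator.le, operator.lt, operator.eq],
)
```

This gives both properties discussed above: short-circuiting, and each operand evaluated at most once.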

 2) multiple assignment
 a,b,c = A,B,C;
 is executed like this:
 A,B,C is evaluated from left to right
 then values are assigned to a,b,c from left to right
 (a=A, b=B, c=C)


This also has subtle issues.  Joe R. Newbie will try this (or, as swap
is likely to be implemented as a library routine, something similar):

void swap(inout int a, inout int b) {
    a, b = b, a;
}

which translates to:

    a = b;
    b = a;

and leads to problems. Should this be the default behavior or
something which first assigns to temporary values?
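
To make the difference concrete, here is a small Python sketch of the two possible translations, with variables modelled as entries in a dictionary. The helper names are made up for this example.

```python
def assign_sequentially(env, targets, sources):
    """Naive translation: a = b; b = a; each source is read at its turn."""
    for target, source in zip(targets, sources):
        env[target] = env[source]


def assign_via_temporaries(env, targets, sources):
    """Evaluate every right-hand side first, then store the results."""
    values = [env[source] for source in sources]   # the temporaries
    for target, value in zip(targets, values):
        env[target] = value


naive = {"a": 1, "b": 2}
assign_sequentially(naive, ["a", "b"], ["b", "a"])     # both end up as 2

safe = {"a": 1, "b": 2}
assign_via_temporaries(safe, ["a", "b"], ["b", "a"])   # values are swapped
```

Only the second variant makes `a, b = b, a` behave as a swap.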

Other things to consider:

- should functions return multiple values?
- if f returns two values, what does "a, b = f()" mean?
- if f returns two values, what does "a = f()" mean?
- if f returns one value, what does "a, b = f(), y()" mean?
- do these features fit into the grammar seamlessly?

 We've gone through such things a couple of times. The question is:
 can you (or anyone) find examples which would make this feature useful?

With the presence of "out" parameters, I'm not sure that assigning
multiple values is that useful.  But hey, at least it looks cool :-)

 3) ^ power, =>, <= and <=> logical operators
 8^4 ... 8*8*8*8
 a => b ... !a || b
 a <= b ... b => a
 a <=> b ... (a => b) && (a <= b)

 Yes. However ^ is taken by XOR. How about ** ?

If I were implementing a language from scratch, I'd probably use ^ as
an exponentiation operator. ^ doesn't particularly look like "xor" if
you haven't done much bit-level C programming. (Nor does | look like
"or" but that's another issue altogether...)

** clashes with multiplication combined with pointer dereference.

<= also clashes with the "less-or-equal" operator.

"<-", "->", and "<->" could be used if logical operators were
required.  (At least if -> pointer syntax were to be demolished)
But still, there's the issue of:

a<-b; // a <- b  or a < -b ?

 4) optimization idea
 if function F has no side effect and its arguments are constants
 it can be computed at compile time

 Many, many people have had this good idea. :)
 This would also give a number of other interesting things. Example:
 regexp could be saved in a program in a compiled form, instead of
 translating them at run-time. Though it doesn't buy much with regexp,
 it would with larger interpreted sub-languages.

 I guess there's some problem checking purity of functions though. It
 would mean a need for recursive analysis of the (almost) whole
 program. All functions would qualify, which only access constant
 globals if any, and only functions also qualified if any. Can a
 recursive set of functions be qualified?

Having pure compile-time functions would be very neat, but effectively
it would require a D compiler to be a D interpreter at the same time.
Dunno about the purity checking.  I suppose you will very soon find
out the purity of a function when the compiler starts interpreting a
function and eventually tries to read a global variable, format your
hard disk, or send a network packet, or something else "impure" ;)

An alternative would be to require some different kind of syntax for
pure functions, effectively making them a different type of
function.  And all functions that are pure should be declared pure,
or we have yet again the annoying problem I mentioned in the other
post: we really know we have pure functions, but we can't use them
because they are not declared pure.


[switch]
 One thing: a switch is NEVER CONVERTED TO IF's!!! It is a means to
 create a jump table: "take an input value, do some math on it which
 yields an index, do a table lookup, jump to the address noted in the
 table". That's why switch is so darned efficient - only a few CPU
 cycles!!! And you see why it has such semantics in C.

It just *might* be converted to if statements, though, and you'll
never know unless you disassemble the object code ;)

 I guess that'd also be his argument to division. A compiler cannot
 warn you with division though.

Sometimes it sounds like a nice idea to make all arithmetic operators
like "div: Int x Int -> Int" for each integral type Int.  Only it
isn't so.

Case 1:  current machine architectures have operations like
"mul: Int32 x Int32 -> Int64" and
"div: Int64 x Int32 -> Int32 x Int32" (both quotient and remainder
computed);

Case 2: integer division actually produces a rational number, and
probably the technically most feasible solution would be using floats.
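
The operation shapes in Case 1 can be modelled in Python, whose integers are unbounded, so the widths are imposed by hand here. This is a sketch of the signatures only, assuming unsigned operands; `mul32` and `div64_by_32` are hypothetical names and no particular instruction set is implied.

```python
MASK32 = 0xFFFFFFFF


def mul32(a, b):
    """Unsigned 32 x 32 -> 64 bit multiply; the product always fits in 64 bits."""
    return (a & MASK32) * (b & MASK32)


def div64_by_32(dividend, divisor):
    """Unsigned 64 / 32 division yielding quotient and remainder together."""
    return divmod(dividend, divisor & MASK32)


widest_product = mul32(0xFFFFFFFF, 0xFFFFFFFF)
quotient, remainder = div64_by_32(mul32(100000, 100000) + 7, 100000)
```

Neither operation has the shape "Int x Int -> Int", which is the point being made above.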

But I guess that some C compatibility is in order sometimes.  At least
it might make the language more comfortable to some people.

 There's another problem to it: how do you check for ranges in
 "switch"? Now you can write "case a: case b: case c" else you would be
 thinking out a new syntax on it. Which could be like array slicing, or
 inclusive... Which would anyway cause programmers making huge slices
 and bloating a jump table, which is not a good idea at all.

Sounds like a good place for syntax "case 1..5:"

 One thing I find very important to implement is a "smart union", which
 knows its current state. It would fix the unsafety of the normal union
 and make serialisation possible.

In C++, this can be achieved with metaprogramming (
http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?Variant)
although it isn't very simple.  An ideal language would provide the
right abstractions for implementing things like a smart union, and
then include it in the standard library.  (I can dream, can't I...)

D gets better and better with time (by which I mean delegates and
everything)... but there are still a couple of feature requests that
keep coming up.  Perhaps some kind of wiki, bugzilla or similar would
be suitable for tracking feature requests and storing comments,
possibly even voting for them.

-Antti

Mar 01 2003

"Achillefs Margaritis" <axilmar b-online.gr> writes:

Ada has "smart unions": variant records selected by "discriminants". In
the declaration of the object the programmer can declare a record that is
parameterised according to a passed value. For example:

record variant(value:integer) is
    case value
        when 0 =>
            param1:string;
        when 1 =>
            param1:integer;
    end case;
end record;

stringvar: variant(0);
intvar: variant(1);

"Antti Sykari" <jsykari gamma.hut.fi> wrote in message
news:87el5q8srb.fsf hoastest1-8c.hoasnet.inet.fi...

Mar 02 2003

Ilya Minkov <ilminkov planet-interkom.de> writes:

Antti Sykari wrote:
 Many of the suggestions below have been given earlier.
 
 However, perhaps it's good to repeat why some of them are worth
 considering.
 
 First of all, I think that here is a useful feature. At least for the
 relational (<, <=, >=, >) expression part.
 
 "a < b < c" is semantically obvious to everyone.  It's compact and
 more readable than "a < b && b < c", and causes less bugs: people do
 write things like "1 < a < 5" in accident, and get bitten.
 
 Most importantly, it's comfortable to use.

And it has been implemented in Python because of "being pretty obvious".
Python is a language that lets you do obvious things the obvious way.
I'll write an article summarizing all the unusual decisions made in its
design and post it here. D could learn something from it.

 The only downside IMO is that there are some subtle issues to decide
 and hence the feature requires some time to design and implement.
 Such as:
 
 - Is the operation short-circuited?  (Adding another short-circuited
 "operator" in the language -- but why not?)
 - Are the operations evaluated only once?  (Logically, yes, since they
 are written in the source code only once)

Evaluation order has always been undefined. I guess the language also
doesn't state how many times a function is called if it's in the same
sequence group (C defines "sequence points", remember?). You could only
safely use pure functions in expressions so far, and that's how it has
to stay.

 - How does this interact with operator overloading?  (My guess would
 be "translate to a<b && b<c && ... first, then do overloading)
 - Apply that to == and != (and === and !==), too?  How about
 "a < b == x < y"?  (I wouldn't)

Ouch.

 Someone might also say that all extra features are bad, because they
 require time and effort to learn and teach and write about.  For
 example, the old complaint that there are 4 ways to increment a
 variable in C.  But this applies only to features which aren't
 obviously easy to use, and this one is.  Also I could envision it to
 be close-to obvious to implement.

No, this point has not really been criticised, because it's usually
obvious when you increment vars, one way or another. What has been
criticised is that there are generally too many ways to do obvious
things, making them unobvious at first sight. All almost equally bad.

 2) multiple assignment
 a,b,c = A,B,C;
 is executed like this:
 A,B,C is evaluated from left to right
 then values are assigned to a,b,c from left to right
 (a=A, b=B, c=C)


 
 
 This has also subtle issues.  Joe R. Newbie will try this (or as swap
 is likely to be implemented as a library routine, something similar):
 
 void swap(inout int a, inout int b) {
     a, b = b, a;
 }
 
 which translates to:
 
     a = b;
     b = a;
 
 and leads to problems. Should this be the default behavior or
 something which first assigns to temporary values?

No way should it work like that!!! Such a feature, if introduced, should 
work the *obvious* way. This feature has been known as tuples in other 
languages, and should work the same way.

 Other things to consider:

They have to mean obvious things. In general, Daniel Yokomiso has
already given this topic some thought, and he has probably come up
with some suitable solution. I'll have to take a look at it, or we could
simply ask him. He is developing his own impure functional language,
which could supersede Haskell and OCaml. :)

 - should functions return multiple values

Yes. Tuples.

 - if f returns two values, what does "a, b = f()" mean?

The obvious.

 - if f returns two values, what does "a = f()" mean?

Error. Discarded values have already been a major plague in C. You
forget the function call parentheses - and whoops! I guess you know that.
If someone means to use only one return value, then he should state
that, like
"a, null = f()"

 - if f returns one value, what does "a, b = f(), y()" mean?

Error.
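
Python's tuples give a feel for how these rules play out in practice; the sketch below uses a made-up function `min_max`. One difference from the rules proposed here: in Python, "a = f()" is not an error, it simply binds the whole tuple to a.

```python
def min_max(values):
    """Return two results at once, as a tuple."""
    return min(values), max(values)


low, high = min_max([3, 1, 4])       # both values used
low_only, _ = min_max([3, 1, 4])     # second value discarded on purpose

try:
    a, b, c = min_max([3, 1, 4])     # wrong number of targets
except ValueError as error:
    mismatch = str(error)            # the mismatch is reported, not ignored
```

The `_` target plays the role of the "null" placeholder suggested above.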

 - do these features fit into the grammar seamlessly?

Dunno. There should be a way to make them fit into the grammar. I have
not read the compiler sources yet, but I'm going to. And I might then do
what Burton promised, but has not done so far: write documentation on
them.

 Yes. However ^ is taken by XOR. How about ** ?

 If I were implementing a language from scratch, I'd probably use ^ as
 an exponentiation operator. ^ doesn't particularly look like "xor" if
 you haven't done much bit-level C programming. (Nor does | look like
 "or" but that's another issue altogether...)
 
 ** clashes with multiplication combined with pointer dereference.

It's no problem. You can't write "x+++++y", you have to separate it with
spaces - now, you could require that "* *" is mul and dereference, and
"**" is power. I also proposed once that if you have a function with 2
parameters and one return value, it can be called like "y = a 'fun'
b" which expands to "y = fun(a, b)". Some lexical means is probably
required to recognise infix functions.

 <= also clashes with the "less-or-equal" operator.

I'm a fool, I noticed that too late.

 
 "<-", "->", and "<->" could be used if logical operators were
 required.  (At least if -> pointer syntax were to be demolished)
 But still, there's the issue of:
 
 a<-b; // a <- b  or a < -b ?

It's not a problem, it's a matter of decision. It's resolved the same
way such cases have been resolved so far.

 Having pure compile-time functions would be very neat, but effectively
 it would require a D compiler to be a D interpreter at the same time.
 Dunno about the purity checking.  I suppose you will very soon find
 out the purity of a function when the compiler starts interpreting a
 function and eventually tries to read a global variable, format your
 hard disk, or send a network packet, or something else "impure" ;)

No, that's really not a problem. Do you know how a compiler's semantic
analyser and the constant wrappers work? They are not exactly
interpreters, but they are very close to that. I have mentioned some
very clean criteria for the purity of a function; here I reformulate
them in a more straightforward form:

  - if a function is external (i.e. no source is available for it), it is
impure, unless it's explicitly stated otherwise somewhere (e.g. a const
qualifier in the declaration?)
  - the function body is scanned for variable accesses. If it accesses any
global variables that are not constant, it is impure. Even if it only
reads them, because some other function might have modified them, making
the function yield inconsistent results.
  - the function body is scanned for function calls. All these functions
must also qualify as pure, else this one isn't.
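
The three criteria above could be sketched in Python as a fixed-point computation over a call graph. The data layout (a dictionary with "external", "declared_pure", "globals" and "calls" fields) is invented for this sketch, and starting from the optimistic assumption is one possible answer to the earlier question about recursive sets of functions, not the only one.

```python
def find_pure_functions(functions):
    """Return the names of functions that satisfy all three criteria.

    functions maps a name to a dict with optional keys:
      "external":      True if no source is available,
      "declared_pure": True if an external is explicitly marked pure,
      "globals":       non-constant globals the body reads or writes,
      "calls":         names of the functions the body calls.
    """
    # Start optimistic: keep everything the first two criteria allow.
    pure = set()
    for name, info in functions.items():
        if info.get("external") and not info.get("declared_pure"):
            continue                      # criterion 1: unknown external
        if info.get("globals"):
            continue                      # criterion 2: touches mutable globals
        pure.add(name)

    # Criterion 3: drop callers of impure or unknown functions until stable.
    changed = True
    while changed:
        changed = False
        for name in sorted(pure):
            callees = functions[name].get("calls", ())
            if any(callee not in pure for callee in callees):
                pure.discard(name)
                changed = True
    return pure


program = {
    "is_even": {"calls": ["is_odd"]},
    "is_odd": {"calls": ["is_even"]},
    "log": {"globals": ["log_level"]},
    "noisy_square": {"calls": ["log"]},
    "printf": {"external": True},
    "sin": {"external": True, "declared_pure": True},
    "wave": {"calls": ["sin"]},
}
pure_names = find_pure_functions(program)
```

Because nothing ever disqualifies them, the mutually recursive `is_even` and `is_odd` stay in the pure set.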

 An alternative would be to require some different kind of syntax for
 pure functions, effectively making them to be of different type of
 functions.  And all functions that are pure should be declared pure,
 or we have yet again the annoying problem I mentioned in the other
 post: we really know we have pure functions, but we can't use them
 because they are not declared pure.

const qualifier on return type? I can't imagine anything else it could
mean.

 
 [switch]
 
 It just *might* be converted to if statements, though, and you'll
 never know unless you disassemble the object code ;)

Urgh, well, the initial math contains IFs, but the switch body generally
doesn't. Though sure it might, it's not the basic idea of it. But when
you allow for range syntax a..b, there would be a real need for a
compiler to split these switches into several jump tables connected with
IFs. Though this might be good because it could make the source more
terse and reduce the probability of a bug... BUT THEN it would also be a
good idea to introduce a Pascal-style range/set type.
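
As a rough illustration of range cases without one table slot per value, here is a Python sketch that uses a binary search over the range bounds. This swaps in a different technique from the "several jump tables connected with IFs" described above, and `make_range_switch` is a hypothetical name.

```python
import bisect


def make_range_switch(ranges, default):
    """ranges: list of (low, high, handler), inclusive and non-overlapping."""
    ordered = sorted(ranges, key=lambda entry: entry[0])
    lows = [low for low, _, _ in ordered]

    def dispatch(value):
        # Find the last range that starts at or below value.
        position = bisect.bisect_right(lows, value) - 1
        if position >= 0:
            low, high, handler = ordered[position]
            if value <= high:
                return handler()
        return default()

    return dispatch


range_dispatch = make_range_switch(
    [(1, 5, lambda: "small"), (100, 100000, lambda: "large")],
    default=lambda: "other",
)
```

The cost grows with the number of ranges rather than with their width, so "case 100..100000" needs no hundred-thousand-entry table.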

 Sometimes it sounds like a nice idea to make all arithmetic operators
 like "div: Int x Int -> Int" for each integral type Int.  Only it
 isn't so.
 
 Case 1:  current machine architectures have operations like 
 "mul: Int32 x Int32 -> Int64" and
 "div: Int64 x Int32 -> Int32 x Int32" (both dividend and remainder
 computed);

Basically yes.

 Case 2: integer division actually produces a rational number, and
 probably the technically most feasible solution would be using floats.

Eek, I'd been looking for a new word for "extended" and got the wrong
one; I meant "real". "float" is very limited and should not be used for
intermediates.

 But I guess that some C compatibility is in order sometimes.  At least
 it might make the language more comfortable to some people.

"seem to be more comfortable" :)
But for a C successor it's kind of vital.

 One thing I find very important to implement is a "smart union", which
 knows its current state. It would fix the unsafety of the normal union
 and make serialisation possible.

 
 
 In C++, this can be achieved with metaprogramming (
 http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?Variant)
 although it isn't very simple.  An ideal language would provide the
 right abstractions for implementing things like a smart union, and
 then include it in the standard library.  (I can dream, can't I...)

Nope. Some languages don't include a union at all, but instead a smart
union. Dynamic languages usually don't have stuff like that at all, since
you can always change a variable's type and check types.

 D gets better and better with time (by which I mean delegates and
 everything)... but there still are couple of feature requests that
 keep coming up.  Perhaps some kind of wiki, bugzilla or similar would
 be suitable for tracking feature requests and storing comments,
 possibly even voting for them.

No one reads these. That's why we have this newsgroup :)

-i.

Mar 02 2003

Daniel Yokomiso <Daniel_member pathlink.com> writes:

Hi,

Comments embedded.

In article <b3ub9r$shi$1 digitaldaemon.com>, Ilya Minkov says...
Antti Sykari wrote:
 Many of the suggestions below have been given earlier.
 
 However, perhaps it's good to repeat why some of them are worth
 considering.
 
 First of all, I think that here is a useful feature. At least for the
 relational (<, <=, >=, >) expression part.
 
 "a < b < c" is semantically obvious to everyone.  It's compact and
 more readable than "a < b && b < c", and causes fewer bugs: people do
 write things like "1 < a < 5" by accident, and get bitten.
 
 Most importantly, it's comfortable to use.

And it has been implemented in Python because of "being pretty obvious". 
Python is a language that lets you do obvious things the obvious way. 
I'll write an article, summarizing all unusual decisions made in its 
design and post it here. D could learn something from it.
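
For reference, a small sketch of how Python's chained comparison behaves: the middle operand is evaluated once, and evaluation stops as soon as one link fails. The `traced` helper is made up here just to record which operands get evaluated.

```python
calls = []

def traced(name, value):
    """Record that an operand was evaluated, then return its value."""
    calls.append(name)
    return value

# Chained form: every operand, including the middle one, is evaluated once.
chained = traced("a", 1) < traced("b", 5) < traced("c", 3)
chained_calls = list(calls)

# Short-circuit: once a < b fails, c is never evaluated.
calls.clear()
short = traced("a", 9) < traced("b", 5) < traced("c", 7)
short_calls = list(calls)

print(chained, chained_calls)
print(short, short_calls)
```

So in Python at least, both open questions quoted below (short-circuiting, single evaluation) are answered "yes".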

 The only downside IMO is that there are some subtle issues to decide
 and hence the feature requires some time to design and implement.
 Such as:
 
 - Is the operation short-circuited?  (Adding another short-circuited
 "operator" in the language -- but why not?)
 - Are the operations evaluated only once?  (Logically, yes, since they
 are written in the source code only once)

Evaluation order has always been undefined. I guess it also doesn't
state how many times a function is called if it's in the same sequence
group (C defines "sequence points", remember?). You could only safely
use pure functions in expressions so far, and that's how it has to stay.

 - How does this interact with operator overloading?  (My guess would
 be "translate to a<b && b<c && ... first", then do overloading)
 - Apply that to == and != (and === and !==), too?  How about
 "a < b == x < y"?  (I wouldn't)

Ouch.


Hmmm, IMO we could just copy Icon ( http://www.cs.arizona.edu/icon/ )
generators, at least a piece of them, to implement this correctly, without
leaving semantic problems. It works like this:


if a < b == x < y then write ("Ok!")


This evaluates left to right:

1 - if a < b it returns b, else it fails (failure is a generator thing).
2 - using the b value returned, it compares it to x. If they're equal, it
returns the x value, else it fails.
3 - using the x value returned, it compares it to y. If it is less than y, it
returns y, else it fails.
4 - if a value was returned it continues to the then part.

The nice thing about this is that using success/failure instead of true vs.
false for relational expressions leads to better syntax and semantics. In Icon
one can write:


if y < (x | 5) then write("y=", y)


and it'll work correctly, comparing y with x and with 5. Also more powerful
stuff can be written:


if (a | b | c) = (d | e | f) then write("Ok!")


Of course this would lead to big changes in D, but it would make many obvious
things possible (like the "y < (x | 5)" stuff). It can also be used to implement
multiple return values of the same type.
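
A rough Python sketch of the success/failure idea, assuming `None` stands in for Icon's failure. The helper names (`lt`, `eq`, `chain_ok`, `lt_any`) are invented for this illustration, and real Icon backtracks through alternatives in a far more general way than this does.

```python
def lt(left, right):
    """Succeed with the right operand if left < right, else fail (None)."""
    if left is None or right is None:
        return None  # a failed operand makes the whole comparison fail
    return right if left < right else None

def eq(left, right):
    """Succeed with the right operand if left == right, else fail (None)."""
    if left is None or right is None:
        return None
    return right if left == right else None

def chain_ok(a, b, x, y):
    # Mirrors "a < b == x < y": each step hands its right operand to the next.
    return lt(eq(lt(a, b), x), y) is not None

def lt_any(left, *alternatives):
    """Mirrors "y < (x | 5)": succeed with the first alternative that works."""
    for candidate in alternatives:
        if left < candidate:
            return candidate
    return None

print(chain_ok(1, 2, 2, 3))
print(lt_any(4, 3, 5))
```

Because a comparison yields its right operand rather than a boolean, the chained form falls out for free; the price is that failure has to be a distinct notion from any ordinary value.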


 Someone might also say that all extra features are bad, because they
 require time and effort to learn and teach and write about.  For
 example, the old complaint that there are 4 ways to increment a
 variable in C.  But this applies only to features which aren't
 obviously easy to use, and this one is.  Also I could envision it to
 be close-to obvious to implement.

No, this point has not really been criticised, because it's usually 
obvious when you increment vars, one way or another. What has been 
criticised is that there are generally too many ways to do obvious 
things, making them unobvious at first sight. All almost equally bad.

2) multiple assignment
a,b,c = A,B,C;
is executed like this:
A,B,C is evaluated from left to right
then values are assigned to a,b,c from left to right
(a=A, b=B, c=C)


 
 
 This has also subtle issues.  Joe R. Newbie will try this (or as swap
 is likely to be implemented as a library routine, something similar):
 
 void swap(inout int a, inout int b) {
     a, b = b, a;
 }
 
 which translates to:
 
     a = b;
     b = a;
 
 and leads to problems. Should this be the default behavior or
 something which first assigns to temporary values?

No way should it work like that!!! Such a feature, if introduced, should 
work the *obvious* way. This feature has been known as tuples in other 
languages, and should work the same way.
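
To show the difference, a short Python sketch: the first function spells out the naive sequential expansion from the quoted text, the second uses Python's tuple assignment, which evaluates the whole right-hand side before assigning anything. Both function names are made up for the comparison.

```python
def swap_sequential(a, b):
    # What "a = b; b = a;" does: the old value of a is lost.
    a = b
    b = a
    return a, b

def swap_tuple(a, b):
    # The right-hand side is packed into a temporary tuple first,
    # then unpacked into a and b.
    a, b = b, a
    return a, b

print(swap_sequential(1, 2))
print(swap_tuple(1, 2))
```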

 Other things to consider:

They have to mean obvious things. In general, Daniel Yokomiso has 
already given this topic some thought, and he has probably come up 
with some suitable solution. I'll have to take a look at it, or we could 
simply ask him. He develops his own impure functional language, which 
could supersede Haskell and OCaml. :)


I cheat ;-) It has some simple solutions for this, like letting the compiler
create all the temporary variables and deal with any evaluation order problems,
like "int x, y = i++, i++;". Eon has no side-effects in expressions, so it can
get away with tuples. D has to deal with these problems. But I think that tuples
are a nice thing to have, including tuple constructors (bind them together) and
tuple deconstructors (tear them apart). Using an iterative Fibonacci solution:

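int fib(int n)
in {
    assert(n > 0);
} body {
    int a, b = 0, 1;        // proposed tuple initialisation, not current D
    for (int i = 0; i < n; i++) {
        a, b = b, a + b;    // whole right-hand side is evaluated before assigning
    }
    return a;
}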


tuples lead to cleaner syntax, without temp variables. This is toy code, but
it's pretty :-)


 - should functions return multiple values

Yes. Tuples.

 - if f returns two values, what does "a, b = f()" mean?

The obvious.

 - if f returns two values, what does "a = f()" mean?

Error. Discarded values have already been a major plague in C. You 
forget the function call parentheses - and whoops! I guess you know that. If 
someone means to use only one return value, then he should state that, like
"a, null = f()"


Unless a is of "(int, int)" type.


 - if f returns one value, what does "a, b = f(), y()" mean?

Error.


Depends on a and b types.


int f();
(int, int, int) y();

int a;
(int, int, int) b;

a, b = f(), y();


should compile and run ok. At least it's "pretty obvious".
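
For comparison, here is roughly how Python's tuples answer the questions above. The functions `f`, `y` and `g` are stand-ins with made-up return values. Note that Python is more permissive than what is proposed here: "a = g()" is not an error, it simply binds the whole tuple.

```python
def f():
    return 1

def y():
    return (10, 20, 30)

def g():
    return (7, 8)

# One target per right-hand expression: b receives the whole triple.
a, b = f(), y()

# Explicitly discarding one of two return values.
first, _ = g()

# A count mismatch between targets and values is rejected at run time.
try:
    p, q = y()
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(a, b, first, mismatch_rejected)
```

A statically typed language could make the same decisions at compile time from the declared types, which is what the example with "(int, int, int) b" above relies on.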

 - do these features fit into the grammar seamlessly?

Dunno. There should be a way to make them fit into the grammar. I have not 
read the compiler sources yet, and i'm going to. And i might then do what 
Burton promised, but has not done so far: write documentation on them.

Yes. However ^ is taken by XOR. How about ** ?

 If I were implementing a language from scratch, I'd probably use ^ as
 an exponentiation operator. ^ doesn't particularly look like "xor" if
 you haven't done much bit-level C programming. (Nor does | look like
 "or" but that's another issue altogether...)
 
 ** clashes with multiplication combined with pointer dereference.

It's no problem. You can't write "x+++++y", you have to separate it with 
spaces - now, you could require that "* *" is mul and dereference, and 
"**" is power. I also proposed once that if you have a function with 2 
parameters and one return value, it can be called like "y = a 'fun' 
b", which expands to "y = fun(a, b)". Some lexical means is probably 
required to recognise in-fix functions.

 <= also clashes with the "less-or-equal" operator.

I'm a fool, i've noticed that too late.

 
 "<-", "->", and "<->" could be used if logical operators were
 required.  (At least if -> pointer syntax were to be demolished)
 But still, there's the issue of:
 
 a<-b; // a <- b  or a < -b ?

It's not a problem, it's a decision question. It's solved the same way 
as so far.

 Having pure compile-time functions would be very neat, but effectively
 it would require a D compiler to be a D interpreter at the same time.
 Dunno about the purity checking.  I suppose you will very soon find
 out the purity of a function when the compiler starts interpreting a
 function and eventually tries to read a global variable, format your
 hard disk, or send a network packet, or something else "impure" ;)

No, that's really not a problem. Do you know how a compiler's semantic 
analyser and the constant wrappers work? They are not exactly interpreters, 
but are very close to that. I have mentioned the very clean criteria 
for the purity of a function; here i re-formulate them in a more 
straightforward form:

  - if a function is external (ie no source is available for it), it is 
impure. Unless it's somewhere explicitly stated otherwise. (ie const 
qualifier in declaration?)
  - function body is scanned for variable accesses. If it accesses any 
global variables that are not constant, it is impure. Even if it only 
reads them, because some other function might have modified them, making 
the function yield inconsistent results.
  - function body is scanned for function calls. All these functions 
must also qualify as pure, else this one isn't.
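
The three criteria can be sketched as a small analysis over function summaries. This is only an illustration in Python: the `Func` record and its fields are invented here, and a real compiler would fill them in from its syntax tree. One assumption to flag: the sketch starts by treating every locally clean function as pure and then strikes out violators until nothing changes, so mutually recursive functions that are otherwise clean come out pure.

```python
from dataclasses import dataclass, field

@dataclass
class Func:
    name: str
    external: bool = False                # no source available
    declared_pure: bool = False           # explicit annotation on an external
    touches_mutable_global: bool = False  # reads or writes a non-constant global
    calls: list = field(default_factory=list)

def infer_purity(funcs):
    """Return {name: bool}, applying the three criteria to a fixpoint."""
    pure = {}
    for f in funcs:
        if f.external:
            pure[f.name] = f.declared_pure               # criterion 1
        else:
            pure[f.name] = not f.touches_mutable_global  # criterion 2
    changed = True
    while changed:
        changed = False
        for f in funcs:
            if pure[f.name] and not f.external:
                # Criterion 3: every callee must be pure; unknown callees are not.
                if any(not pure.get(callee, False) for callee in f.calls):
                    pure[f.name] = False
                    changed = True
    return pure

example = [
    Func("sin", external=True, declared_pure=True),
    Func("printf", external=True),
    Func("square_sin", calls=["sin"]),
    Func("log_value", calls=["printf"]),
]
print(infer_purity(example))
```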


As a side note, this is similar to the Hindley-Milner type inference algorithm.
Its complexity is big (IIRC it's greater than exponential space), so it may
lead to longer compile times if the compiler can't annotate the object code with
purity marks.


 An alternative would be to require some different kind of syntax for
 pure functions, effectively making them to be of different type of
 functions.  And all functions that are pure should be declared pure,
 or we have yet again the annoying problem I mentioned in the other
 post: we really know we have pure functions, but we can't use them
 because they are not declared pure.

const qualifier on return type? I can't imagine anything else it could 
mean.

 
 [switch]
 
 It just *might* be converted to if statements, though, and you'll
 never know unless you disassemble the object code ;)

Urgh, well, initial math contains IFs, but the switch body generally 
doesn't. Though sure it might, it's not the basic idea of it. But when 
you allow for range syntax a..b, there would be a real need for a 
compiler to split these switches into several jump tables connected with 
IFs. Though this might be good because it could make the source more terse 
and reduce the probability of a bug... BUT THEN it would also be a good 
idea to introduce a Pascal-style range/set type.

 Sometimes it sounds like a nice idea to make all arithmetic operators
 like "div: Int x Int -> Int" for each integral type Int.  Only it
 isn't so.
 
 Case 1:  current machine architectures have operations like 
 "mul: Int32 x Int32 -> Int64" and
 "div: Int64 x Int32 -> Int32 x Int32" (both quotient and remainder
 computed);



Another nice case for tuples.
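
As a small illustration of operators that naturally return tuples, here is a Python sketch. `divmod` is a real built-in; `widening_mul` is a made-up helper that models the 32x32->64 multiply as a (high, low) pair of 32-bit words, assuming unsigned operands.

```python
# Division yielding quotient and remainder together.
quotient, remainder = divmod(17, 5)

def widening_mul(x, y):
    """Model "mul: Int32 x Int32 -> Int64" as a (high, low) pair of 32-bit words."""
    product = (x * y) & 0xFFFFFFFFFFFFFFFF  # keep 64 bits
    return product >> 32, product & 0xFFFFFFFF

high, low = widening_mul(0xFFFFFFFF, 2)
print(quotient, remainder, hex(high), hex(low))
```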


Basically yes.

 Case 2: integer division actually produces a rational number, and
 probably the technically most feasible solution would be using floats.

ieek, i've been looking for a new word for "extended" and got the wrong 
one, i meant "real". "float" is very limited and should not be used for 
intermediates.

 But I guess that some C compatibility is in order sometimes.  At least
 it might make the language more comfortable to some people.

"seem to be more comfortable" :)
But for a C successor it's kind of vital.

One thing i find very important to implement is a "smart union", which
knows its current state. It would fix the unsafety of the normal union
and make serialisation possible.

 
 
 In C++, this can be achieved with metaprogramming (
 http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?Variant)
 although it isn't very simple.  An ideal language would provide the
 right abstractions for implementing things like a smart union, and
 then include it in the standard library.  (I can dream, can't I...)

Nope. Some languages don't include a union at all, but instead a smart 
union. Dynamic languages usually don't have stuff like that at all, since 
you can always change a variable's type and check types.

 D gets better and better with time (by which I mean delegates and
 everything)... but there still are couple of feature requests that
 keep coming up.  Perhaps some kind of wiki, bugzilla or similar would
 be suitable for tracking feature requests and storing comments,
 possibly even voting for them.

No one reads these. That's why we have this newsgroup :)

-i.


I second the wiki suggestion (I've suggested that some time ago). At least we
could keep related discussions together. Right now we keep having the same
discussions about certain stuff.

Best regards,
Daniel Yokomiso.

"Beware of bugs in the above code; I have only proved it correct, not tried it."
 - Donald Knuth (in a memo to Peter van Emde Boas)

Mar 03 2003

Ilya Minkov <midiclub 8ung.at> writes:

Daniel Yokomiso wrote:
 In article <b3ub9r$shi$1 digitaldaemon.com>, Ilya Minkov says...
 - if a function is external (ie no source is available for it), it
 is impure. Unless it's somewhere explicitly stated otherwise. (ie
 const qualifier in declaration?)
 - function body is scanned for variable accesses. If it accesses any
 global variables that are not constant, it is impure. Even if it only
 reads them, because some other function might have modified them,
 making the function yield inconsistent results.
 - function body is scanned for function calls. All these functions
 must also qualify as pure, else this one isn't.

 
 As a side note, this is similar to the Hindley-Milner type inference
 algorithm. Its complexity is big (IIRC it's greater than exponential
 space), so it may lead to longer compile times if the compiler can't
 annotate the object code with purity marks.

It's not all that clear to me why. You have to scan every function, so 
that a compiled unit has all safe functions qualified as being safe. You 
just have to scan them in the correct order, which is identified by 
recursion.
The only real problem would be to resolve mutually recursive functions. 
As long as there are no mutually recursive functions, or they are left 
"impure", i assume the time complexity would be near linear.

Well, someone might one day come up with additional pass of analysis for 
mutually-recursive functions, and that would also resolve that 
problem... With an impact on complexity. I'm not sure, what complexity 
does it take to identify loops in a directed graph? (HUGE?)
I assume that mutual recursion analysis can be limited to groups of only 
2-3 functions, and that would already cover usual cases. It doesn't have 
to be exhaustive you know, as opposed to type inference.

BTW, doesn't OCaml somehow circumvent doing the complete Hindley-Milner 
analysis for types?

-i.

Mar 03 2003

Bill Cox <bill viasic.com> writes:

Ilya Minkov wrote:


...


 Well, someone might one day come up with additional pass of analysis for 
 mutually-recursive functions, and that would also resolve that 
 problem... With an impact on complexity. I'm not sure, what complexity 
 does it take to identify loops in a directed graph? (HUGE?)
 I assume that mutual recursion analysis can be limited to groups of only 
 2-3 functions, and that would already cover usual cases. It doesn't have 
 to be exhaustive you know, as opposed to type inference.

I can help you there.  Breaking loops in a directed graph requires only 
a couple of simple linear passes.  Here's some pseudo code:

breakLoops(Graph graph) {
     Node node;
     clearNodeFlags(graph);
     foreach(graph, node) {
         if(!node.visited) {
             breakLoopsFromNode(node);
         }
     }
}

clearNodeFlags(Graph graph)
{
     Node node;

     foreach(graph, node) {
         node.visited = false;
         node.marked = false;
     }
}

breakLoopsFromNode(Node node)
{
     Node otherNode;
     Edge edge;

     node.visited = true;
     node.marked = true;
     foreach(node.outEdges, edge) {
         otherNode = edge.toNode;
         if(otherNode.marked) {
             edge.isLoopEdge = true; // Here's where you break loops
         } else if(!otherNode.visited) {
             breakLoopsFromNode(otherNode);
         }
     }
     node.marked = false;
}
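
For anyone who wants to try it, the same depth-first idea as a runnable Python sketch. The representation is an assumption: the graph is a dict mapping each node to a list of successors, and instead of flagging edges in place the function returns the set of loop-closing edges.

```python
def find_loop_edges(graph):
    """graph maps each node to a list of successor nodes.

    Returns the set of (from, to) edges that close a cycle (DFS back edges).
    """
    visited = set()     # nodes whose traversal has been started
    on_path = set()     # nodes on the current DFS path ("marked" above)
    loop_edges = set()

    def visit(node):
        visited.add(node)
        on_path.add(node)
        for succ in graph.get(node, ()):
            if succ in on_path:
                loop_edges.add((node, succ))  # here's where a loop is broken
            elif succ not in visited:
                visit(succ)
        on_path.discard(node)

    for node in graph:
        if node not in visited:
            visit(node)
    return loop_edges

call_graph = {
    "main": ["even", "helper"],
    "even": ["odd"],
    "odd": ["even"],
    "helper": [],
}
print(find_loop_edges(call_graph))
```

Every node and edge is handled once, so the cost is linear in the size of the call graph. The recursive `visit` would hit Python's recursion limit on very deep graphs; an explicit stack avoids that.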

-- Bill

Mar 03 2003

"Daniel Yokomiso" <daniel_yokomiso yahoo.com.br> writes:

"Ilya Minkov" <midiclub 8ung.at> wrote in message
news:b40g4b$25gf$1 digitaldaemon.com...
 Daniel Yokomiso wrote:
 In article <b3ub9r$shi$1 digitaldaemon.com>, Ilya Minkov says...
 - if a function is external (ie no source is available for it), it
 is impure. Unless it's somewhere explicitly stated otherwise. (ie
 const qualifier in declaration?)
 - function body is scanned for variable accesses. If it accesses any
 global variables that are not constant, it is impure. Even if it only
 reads them, because some other function might have modified them,
 making the function yield inconsistent results.
 - function body is scanned for function calls. All these functions
 must also qualify as pure, else this one isn't.

 As a side note, this is similar to the Hindley-Milner type inference
 algorithm. Its complexity is big (IIRC it's greater than exponential
 space), so it may lead to longer compile times if the compiler can't
 annotate the object code with purity marks.

 It's not all that clear to me why. You have to scan every function, so
 that a compiled unit has all safe functions qualified as being safe. You
 just have to scan them in the correct order, which is identified by
 recursion.

As I said "if the compiler CAN'T annotate the object code with purity
marks". If it annotates every module, then it's ok. There's still a problem
with delegate parameters: a function may be pure if a delegate parameter is
pure, but if the parameter isn't, then it can't be pure. This applies to
higher-order functions like map, fold, filter, etc.

 The only real problem would be to resolve mutually recursive functions.
 As long as there are no mutually recursive functions, or they are left
 "impure", i assume the time complexity would be near linear.

 Well, someone might one day come up with additional pass of analysis for
 mutually-recursive functions, and that would also resolve that
 problem... With an impact on complexity. I'm not sure, what complexity
 does it take to identify loops in a directed graph? (HUGE?)
 I assume that mutual recursion analysis can be limited to groups of only
 2-3 functions, and that would already cover usual cases. It doesn't have
 to be exhaustive you know, as opposed to type inference.

 BTW, doesn't OCaml somehow circumvent doing the complete Hindley-Milner
 analysis for types?

I don't know about this, but I guess they don't.

 -i.



Mar 03 2003

"Mike Wynn" <mike.wynn l8night.co.uk> writes:

 Other things to consider:

 - should functions return multiple values
 - if f returns two values, what does "a, b = f()" mean?
 - if f returns two values, what does "a = f()" mean?
 - if f returns one value, what does "a, b = f(), y()" mean?
 - do these features fit into the grammar seamlessly?

look at Lua (www.lua.org): the only thing that is required is a change in the
comma operator,
and as order is not important, why have an operator that enforces order :)

Mar 03 2003
