D - Ideas for language

Marko Tintor <marko pkj.co.yu> writes:
1) shorter relational expressions
it is easier to write
a <= b < c == d
instead of
a <= b && b < c && c == d
+ b and c are evaluated only once

2) multiple assignment
a,b,c = A,B,C;
is executed like this:
A,B,C is evaluated from left to right
then values are assigned to a,b,c from left to right
(a=A, b=B, c=C)

3) ^ power, =>, <= and <=> logical operators
8^4 ... 8*8*8*8
a => b ... !a || b
a <= b ... b => a
a <=> b ... (a => b) && (a <= b)

4) optimization idea
if function F has no side effect and its arguments are constants
it can be computed at compile time

5) better switch
old switch:
switch(exp0)
{  
case constexpr1: command1
case constexpr2: command2
case constexpr3: command3
default: command4
}
is expanded:
if(exp0 == constexpr1) goto label1;
if(exp0 == constexpr2) goto label2;
if(exp0 == constexpr3) goto label3;
goto label4;
label1: command1;
label2: command2;
label3: command3;
label4: command4;

better switch:
switch(exp0)
case(exp1) command1
case(exp2) command2
case(exp3) command3
else command4
is expanded:
if(exp0 == exp1) command1 else
if(exp0 == exp2) command2 else
if(exp0 == exp3) command3 else
command4
Mar 01 2003
Ilya Minkov <midiclub 8ung.at> writes:
Welcome stranger!

Marko wrote:
 1) shorter relational expressions
 it is easier to write
 a <= b < c == d
 instead of
 a <= b && b < c && c == d
 + b and c are evaluated only once

Hm. Requires some thought on whether that's implementable. RelationalExp -> [Exp RelOp]* Exp. Hm.
 2) multiple assignment
 a,b,c = A,B,C;
 is executed like this:
 A,B,C is evaluated from left to right
 then values are assigned to a,b,c from left to right
 (a=A, b=B, c=C)

We've gone through such things a couple of times. The question is: can you (or anyone) find examples which would make this feature useful?
 3) ^ power, =>, <= and <=> logical operators
 8^4 ... 8*8*8*8
 a => b ... !a || b
 a <= b ... b => a
 a <=> b ... (a => b) && (a <= b)

Yes. However, ^ is taken by XOR. How about ** ? I also wanted division int/int=float, with a separate integer division operator. This would decimate the number of stupid numeric bugs, and would not lead to many new bugs, because a compiler would warn about a type mismatch.
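For comparison, Python 3 happens to use this split of operators. The sketch below is not D and is only meant to show the proposed behaviour in a language where it already exists: `**` is power, `^` stays XOR, `/` on two ints gives a float, and `//` is the separate integer division.

```python
# Python 3 semantics, shown only as a point of comparison for the proposal.
power = 8 ** 4        # exponentiation: 8*8*8*8
xor = 8 ^ 4           # ^ remains bitwise XOR
true_div = 7 / 2      # int / int yields a float
floor_div = 7 // 2    # separate integer (floor) division operator

print(power, xor, true_div, floor_div)  # 4096 12 3.5 3
```

Note that `//` rounds toward negative infinity, so it differs from C's truncating division for negative operands.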
 4) optimization idea
 if function F has no side effect and its arguments are constants
 it can be computed at compile time

Many, many people have had this good idea. :) This would also give a number of other interesting things. Example: regexps could be saved in a program in compiled form, instead of being translated at run-time. Though it doesn't buy much with regexps, it would with larger interpreted sub-languages. I guess there's some problem checking the purity of functions, though. It would mean a need for recursive analysis of (almost) the whole program. A function would qualify if it accesses only constant globals, if any, and calls only functions that also qualify, if any. Can a recursive set of functions be qualified?
 5) better switch
 old switch:
 switch(exp0)
 {  
 case constexpr1: command1
 case constexpr2: command2
 case constexpr3: command3
 default: command4
 }
 is expanded:
 if(exp0 == constexpr1) goto label1;
 if(exp0 == constexpr2) goto label2;
 if(exp0 == constexpr3) goto label3;
 goto label4;
 label1: command1;
 label2: command2;
 label3: command3;
 label4: command4;

 better switch:
 switch(exp0)
 case(exp1) command1
 case(exp2) command2
 case(exp3) command3
 else command4
 is expanded:
 if(exp0 == exp1) command1 else
 if(exp0 == exp2) command2 else
 if(exp0 == exp3) command3 else
 command4

One thing: a switch is NEVER CONVERTED TO IF's!!! It is a means of creating a jump table: "take an input value, do some math on it which yields an index, do a table lookup, jump to the address noted in the table". That's why switch is so darned efficient - only a few CPU cycles!!! And you see why it has such semantics in C.

What you mean here is that a "break" is implicit. Walter doesn't want it. He's not exactly a young person, and i guess he suspects that if he does it, he'd be having bugs because it doesn't work the "good old C way". And many other programmers as well. He'd rather give you a better compiler which will tell you whenever you're missing a "break". I guess that'd also be his argument on division. A compiler cannot warn you about division, though.

There's another problem with it: how do you check for ranges in "switch"? Now you can write "case a: case b: case c"; otherwise you would have to think up a new syntax for it. It could be like array slicing, or inclusive... which would anyway lead programmers to make huge slices and bloat the jump table, which is not a good idea at all.

One thing i find very important to implement is a "smart union", which knows its current state. It would fix the unsafety of the normal union and make serialisation possible.

-i.
Mar 01 2003
Antti Sykari <jsykari gamma.hut.fi> writes:
Many of the suggestions below have been given earlier.

However, perhaps it's good to repeat why some of them are worth
considering.

Ilya Minkov <midiclub 8ung.at> writes:
 Marko wrote:
 1) shorter relational expressions
 it is easier to write
 a <= b < c == d
 instead of
 a <= b && b < c && c == d
 + b and c are evaluated only once


First of all, I think that here is a useful feature. At least for the relational (<, <=, >=, >) expression part.

"a < b < c" is semantically obvious to everyone. It's compact and more readable than "a < b && b < c", and causes fewer bugs: people do write things like "1 < a < 5" by accident, and get bitten.

Most importantly, it's comfortable to use.

The only downside IMO is that there are some subtle issues to decide and hence the feature requires some time to design and implement. Such as:

- Is the operation short-circuited? (Adding another short-circuited "operator" in the language -- but why not?)
- Are the operands evaluated only once? (Logically, yes, since they are written in the source code only once)
- How does this interact with operator overloading? (My guess would be "translate to a<b && b<c && ... first, then do overloading")
- Apply that to == and != (and === and !==), too? How about "a < b == x < y"? (I wouldn't)

Someone might also say that all extra features are bad, because they require time and effort to learn and teach and write about. For example, the old complaint that there are 4 ways to increment a variable in C. But this applies only to features which aren't obviously easy to use, and this one is. Also I could envision it to be close to obvious to implement.
 Hm. Requires some thought on whether that's implementable.

 RelationalExp -> [Exp RelOp]* Exp.

Most probably everything is implementable :) Maybe not in a straightforward manner (like direct translation from a<b<c to a<b&&b<c -- that would duplicate the expressions, which is probably not desired), but implementable anyway. It would probably take a change in the back-end/intermediate representation of the compiler to take into account sequential CmpExps and do things like "eval the first and second; compare; if false, quit; eval the third; compare with the second; etc."
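The "eval, compare, quit or continue" scheme can be sketched outside the compiler. This is a minimal illustration in Python, not a description of any D back-end: the helper names `chain_compare` and `traced` are hypothetical, and operands are passed as zero-argument callables so that it is visible how often each one is evaluated.

```python
import operator

def chain_compare(operands, ops):
    """Evaluate a chain like a < b < c with each operand computed once.

    operands: zero-argument callables (thunks), one per operand
    ops: binary predicates, one fewer than operands
    """
    left = operands[0]()              # eval the first
    for op, thunk in zip(ops, operands[1:]):
        right = thunk()               # eval the next operand exactly once
        if not op(left, right):       # compare; if false, quit
            return False
        left = right                  # reuse the value for the next comparison
    return True

calls = []

def traced(name, value):
    """Build a thunk that records its own evaluation in calls."""
    def thunk():
        calls.append(name)
        return value
    return thunk

# 1 <= 5 < 3 == 3: the second comparison fails, so "d" is never evaluated.
result = chain_compare(
    [traced("a", 1), traced("b", 5), traced("c", 3), traced("d", 3)],
    [operator.le, operator.lt, operator.eq],
)
print(result, calls)  # False ['a', 'b', 'c']
```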
 2) multiple assignment
 a,b,c = A,B,C;
 is executed like this:
 A,B,C is evaluated from left to right
 then values are assigned to a,b,c from left to right
 (a=A, b=B, c=C)


This also has subtle issues. Joe R. Newbie will try this (or, as swap is likely to be implemented as a library routine, something similar):

    void swap(inout int a, inout int b) { a, b = b, a; }

which translates to:

    a = b; b = a;

and leads to problems. Should this be the default behavior, or should the values first be assigned to temporaries?

Other things to consider:

- should functions return multiple values
- if f returns two values, what does "a, b = f()" mean?
- if f returns two values, what does "a = f()" mean?
- if f returns one value, what does "a, b = f(), y()" mean?
- do these features fit into the grammar seamlessly?
 We've gone through such things a couple of times. The question is:
 can you (or anyone) find examples which would make this feature useful.

With the presence of "out" parameters, I'm not sure that assigning multiple values is that useful. But hey, at least it looks cool :-)
 3) ^ power, =>, <= and <=> logical operators
 8^4 ... 8*8*8*8
 a => b ... !a || b
 a <= b ... b => a
 a <=> b ... (a => b) && (a <= b)


If I were implementing a language from scratch, I'd probably use ^ as an exponentiation operator. ^ doesn't particularly look like "xor" if you haven't done much bit-level C programming. (Nor does | look like "or", but that's another issue altogether...)

** clashes with multiplication combined with pointer dereference.

<= also clashes with the "less-or-equal" operator.

"<-", "->", and "<->" could be used if logical operators were required. (At least if the -> pointer syntax were to be demolished.) But still, there's the issue of:

    a<-b; // a <- b  or a < -b ?
 4) optimization idea
 if function F has no side effect and its arguments are constants
 it can be computed at compile time

 Many, many people have had this good idea. :) This would also give a number of other interesting things. Example: regexps could be saved in a program in compiled form, instead of being translated at run-time. Though it doesn't buy much with regexps, it would with larger interpreted sub-languages. I guess there's some problem checking the purity of functions, though. It would mean a need for recursive analysis of (almost) the whole program. A function would qualify if it accesses only constant globals, if any, and calls only functions that also qualify, if any. Can a recursive set of functions be qualified?

Having pure compile-time functions would be very neat, but effectively it would require a D compiler to be a D interpreter at the same time. Dunno about the purity checking. I suppose you will very soon find out the purity of a function when the compiler starts interpreting it and it eventually tries to read a global variable, format your hard disk, send a network packet, or do something else "impure" ;)

An alternative would be to require some different kind of syntax for pure functions, effectively making them a different type of function. And all functions that are pure should be declared pure, or we have yet again the annoying problem I mentioned in the other post: we really know we have pure functions, but we can't use them because they are not declared pure.

[switch]
 One thing: a switch is NEVER CONVERTED TO IF's!!! It is a means to
 create a jump table: "take an input value, do some math on it which
 yields an index, do a table lookup, jump to the address noted in the
 table". That's why switch is so darned efficient - only a few CPU
 cycles!!! And you see why it has such semantics in C.

It just *might* be converted to if statements, though, and you'll never know unless you disassemble the object code ;)
 I guess that'd also be his argument to division. A compiler cannot
 warn you with division though.

Sometimes it sounds like a nice idea to make all arithmetic operators like "div: Int x Int -> Int" for each integral type Int. Only it isn't so.

Case 1: current machine architectures have operations like "mul: Int32 x Int32 -> Int64" and "div: Int64 x Int32 -> Int32 x Int32" (both quotient and remainder computed);

Case 2: integer division actually produces a rational number, and probably the technically most feasible solution would be using floats.

But I guess that some C compatibility is in order sometimes. At least it might make the language more comfortable to some people.
 There's another problem to it: how do you check for ranges in
 "switch"? Now you can write "case a: case b: case c" else you would be
 thinking out a new syntax on it. Which could be like array slicing, or
 inclusive... Which would anyway cause programmers making huge slices
 and bloating a jump table, which is not a good idea at all.

Sounds like a good place for syntax "case 1..5:"
 One thing i find very important to implement is a "smart union", which
 knows its current state. It would fix the unsafety of the normal union
 and make serialisation possible.

In C++, this can be achieved with metaprogramming ( http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?Variant ) although it isn't very simple. An ideal language would provide the right abstractions for implementing things like a smart union, and then include it in the standard library. (I can dream, can't I...)

D gets better and better with time (by which I mean delegates and everything)... but there are still a couple of feature requests that keep coming up. Perhaps some kind of wiki, bugzilla or similar would be suitable for tracking feature requests and storing comments, possibly even voting for them.

-Antti
Mar 01 2003
"Achillefs Margaritis" <axilmar b-online.gr> writes:
Ada has "smart unions": records with "discriminants". In the declaration of the
object, the programmer can declare a record that is parameterised according
to a passed value. For example:

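type Variant (Value : Integer) is record
    case Value is
        when 0 =>
            Str_Param : String (1 .. 16);  -- a String component needs explicit bounds
        when 1 =>
            Int_Param : Integer;           -- component names must differ between variants
        when others =>
            null;                          -- the case must cover every Integer value
    end case;
end record;

String_Var : Variant (0);
Int_Var    : Variant (1);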

Mar 02 2003
Ilya Minkov <ilminkov planet-interkom.de> writes:
Antti Sykari wrote:
 Many of the suggestions below have been given earlier.
 
 However, perhaps it's good to repeat why some of them are worth
 considering.
 
 First of all, I think that here is a useful feature. At least for the
 relational (<, <=, >=, >) expression part.
 
 "a < b < c" is semantically obvious to everyone.  It's compact and
 more readable than "a < b && b < c", and causes fewer bugs: people do
 write things like "1 < a < 5" by accident, and get bitten.
 
 Most importantly, it's comfortable to use.

And it has been implemented in Python because of "being pretty obvious". Python is a language that lets you do obvious things the obvious way. I'll write an article summarizing all the unusual decisions made in its design and post it here. D could learn something from it.
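A short sketch of how Python's chained comparisons behave, which answers two of the open questions for that language at least: each operand is evaluated once, and the chain short-circuits. The functions `b` and `c` are hypothetical and exist only to record when they are called.

```python
calls = []

def b():
    calls.append("b")
    return 5

def c():
    calls.append("c")
    return 10

# b() is called once even though it takes part in two comparisons.
in_order = 1 < b() < c()
first_calls = list(calls)

calls.clear()
# 9 < b() is false, so the chain short-circuits and c() is never called.
short_circuited = 9 < b() < c()
second_calls = list(calls)

print(in_order, first_calls)          # True ['b', 'c']
print(short_circuited, second_calls)  # False ['b']
```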
 The only downside IMO is that there are some subtle issues to decide
 and hence the feature requires some time to design and implement.
 Such as:
 
 - Is the operation short-circuited?  (Adding another short-circuited
 "operator" in the language -- but why not?)
 - Are the operations evaluated only once?  (Logically, yes, since they
 are written in the source code only once)

Evaluation order has always been undefined. I guess C also doesn't state how many times a function is called if it's in the same sequence group (C defines "sequence points", remember?). So far you could only safely use pure functions in expressions, and that's how it has to stay.
 - How does this interact with operator overloading?  (My guess would
 be "translate to a<b && b<c && ... first, then do overloading)
 - Apply that to == and != (and === and !==), too?  How about
 "a < b == x < y"?  (I wouldn't)

Ouch.
 Someone might also say that all extra features are bad, because they
 require time and effort to learn and teach and write about.  For
 example, the old complaint that there are 4 ways to increment a
 variable in C.  But this applies only to features which aren't
 obviously easy to use, and this one is.  Also I could envision it to
 be close-to obvious to implement.

No, this point has not really been criticised, because it's usually obvious when you increment vars, one way or another. What has been criticised is that there are generally too many ways to do obvious things, making them unobvious at first sight. All almost equally bad.
2) multiple assignment
a,b,c = A,B,C;
is executed like this:
A,B,C is evaluated from left to right
then values are assigned to a,b,c from left to right
(a=A, b=B, c=C)


This also has subtle issues. Joe R. Newbie will try this (or, as swap is likely to be implemented as a library routine, something similar):

    void swap(inout int a, inout int b) { a, b = b, a; }

which translates to:

    a = b; b = a;

and leads to problems. Should this be the default behavior, or should the values first be assigned to temporaries?

No way should it work like that!!! Such a feature, if introduced, should work the *obvious* way. This feature has been known as tuples in other languages, and should work the same way.
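The difference between the two readings can be shown with Python's tuple assignment, used here only as an example of the "tuple" behaviour; the function names are hypothetical.

```python
def swap_sequential(a, b):
    # The naive expansion: a = b; b = a; the old value of a is lost.
    a = b
    b = a
    return a, b

def swap_tuple(a, b):
    # Tuple-style: the right-hand side (b, a) is built first, then unpacked.
    a, b = b, a
    return a, b

print(swap_sequential(1, 2))  # (2, 2)
print(swap_tuple(1, 2))       # (2, 1)
```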
 Other things to consider:

They have to mean obvious things. In general, Daniel Yokomiso has already given this topic some thought, and he has probably come up with some suitable solution. I'll have to take a look at it, or we could simply ask him. He is developing his own impure functional language, which could supersede Haskell and OCaml. :)
 - should functions return multiple values

Yes. Tuples.
 - if f returns two values, what does "a, b = f()" mean?

The obvious.
 - if f returns two values, what does "a = f()" mean?

Error. Discarded values have already been a major plague in C. You forget the function call parentheses - and whoops! I guess you know that. If someone means to use only one return value, then he should state that, like "a, null = f()".
 - if f returns one value, what does "a, b = f(), y()" mean?

Error.
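As a rough comparison, Python's tuple unpacking already rejects a mismatch between the number of targets and the number of values, and uses `_` by convention for an explicitly discarded value. It is looser than the rules proposed here in one respect: `a = f()` is not an error there, it simply binds the whole tuple. The function `f` below is hypothetical.

```python
def f():
    return 1, 2   # two return values, packed as a tuple

a, _ = f()        # keep the first value, discard the second explicitly

try:
    x, y, z = f() # arity mismatch between targets and values
except ValueError as exc:
    mismatch = str(exc)

print(a)          # 1
print(mismatch)
```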
 - do these features fit into the grammar seamlessly?

Dunno. There should be a way to make them fit into grammar. I have not read compiler sources yet, and i'm going to. And i might then do what Burton promised, but has not done so far: write documentation on them.
Yes. However ^ is taken by XOR. How about ** ?

an exponentiation operator. ^ doesn't particularly look like "xor" if you haven't done much bit-level C programming. (Nor does | look like "or" but that's another issue altogether...) ** clashes with multiplication combined with pointer dereference.

It's no problem. You can't write "x+++++y", you have to separate it with spaces - now, you could require that "* *" is mul and dereference, and "**" is power. I also proposed once that if you have a function with 2 parameters and one return value, it can be called like "y = a 'fun' b", which expands to "y = fun(a, b)". Some lexical means is probably required to recognise in-fix functions.
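The rule being relied on is longest-match ("maximal munch") tokenizing. A toy lexer in Python can illustrate it; this is an assumption-laden sketch with a hypothetical `tokenize` function and a token set limited to `**`, `++`, `*`, `+` and identifiers, not a model of the D lexer.

```python
import re

# Longer operators are listed first so the lexer always takes the longest match.
TOKEN = re.compile(r"\s*(\*\*|\+\+|[*+]|\w+)")

def tokenize(source):
    tokens = []
    pos = 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

print(tokenize("a**b"))     # ['a', '**', 'b']
print(tokenize("a* *b"))    # ['a', '*', '*', 'b']
print(tokenize("x+++++y"))  # ['x', '++', '++', '+', 'y']
```

The last line shows why "x+++++y" cannot be written without spaces: the lexer never produces the `x ++ + ++ y` reading.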
 <= also clashes with the "less-or-equal" operator.

I'm a fool, i noticed that too late.
 
 "<-", "->", and "<->" could be used if logical operators were
 required.  (At least if -> pointer syntax were to be demolished)
 But still, there's the issue of:
 
 a<-b; // a <- b  or a < -b ?

It's not a problem, it's a matter of decision. It's solved the same way as similar cases so far.
 Having pure compile-time functions would be very neat, but effectively
 it would require a D compiler to be a D interpreter at the same time.
 Dunno about the purity checking.  I suppose you will very soon find
 out the purity of a function when the compiler starts interpreting a
 function and eventually tries to read a global variable, format your
 hard disk, or send a network packet, or something else "impure" ;)

No, that's really not a problem. Do you know how a compiler's semantic analyser and the constant wrappers work? They are not exactly interpreters, but they are very close to that. I have mentioned the very clean criteria for the purity of a function; here i re-formulate them in a more straightforward form:

- if a function is external (i.e. no source is available for it), it is impure, unless it is explicitly stated otherwise somewhere (i.e. a const qualifier in the declaration?)
- the function body is scanned for variable accesses. If it accesses any global variables that are not constant, it is impure. Even if it only reads them, because some other function might have modified them, making the function yield inconsistent results.
- the function body is scanned for function calls. All these functions must also qualify as pure, else this one isn't.
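The three criteria can be sketched as an analysis over a call graph. This is a minimal illustration in Python under stated assumptions, not how any D compiler works: the program is given as a dictionary with hypothetical keys (`external`, `declared_pure`, `globals`, `calls`), and the analysis starts optimistic and removes violators until nothing changes, which is one possible way to let a mutually recursive set of functions qualify.

```python
def find_pure_functions(functions, constant_globals):
    """Return the set of function names that satisfy the three criteria.

    functions maps a name to a dict with hypothetical keys:
      "external": no source available
      "declared_pure": explicit annotation on an external function
      "globals": names of global variables the body reads or writes
      "calls": names of functions the body calls
    """
    pure = set()
    for name, info in functions.items():
        if info.get("external", False):
            # Criterion 1: external functions are impure unless stated otherwise.
            if info.get("declared_pure", False):
                pure.add(name)
        elif set(info.get("globals", ())) <= constant_globals:
            # Criterion 2: only constant globals may be accessed.
            pure.add(name)

    # Criterion 3: drop any candidate that calls a non-pure function, and
    # repeat until nothing changes. Starting optimistic lets a set of mutually
    # recursive functions qualify together.
    changed = True
    while changed:
        changed = False
        for name in sorted(pure):
            callees = functions[name].get("calls", ())
            if any(callee not in pure for callee in callees):
                pure.discard(name)
                changed = True
    return pure

program = {
    "sin": {"external": True, "declared_pure": True},
    "printf": {"external": True},
    "is_even": {"calls": ["is_odd"]},
    "is_odd": {"calls": ["is_even"]},
    "scale": {"globals": ["FACTOR"], "calls": ["sin"]},
    "counter": {"globals": ["count"]},
    "log_scaled": {"calls": ["scale", "printf"]},
}
result = find_pure_functions(program, constant_globals={"FACTOR"})
print(sorted(result))  # ['is_even', 'is_odd', 'scale', 'sin']
```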
 An alternative would be to require some different kind of syntax for
 pure functions, effectively making them to be of different type of
 functions.  And all functions that are pure should be declared pure,
 or we have yet again the annoying problem I mentioned in the other
 post: we really know we have pure functions, but we can't use them
 because they are not declared pure.

const qualifier on return type? I can't imagine anything else it could mean.
 
 [switch]
 
 It just *might* be converted to if statements, though, and you'll
 never know unless you disassemble the object code ;)

Urgh, well, the initial math contains IFs, but the switch body generally doesn't. Though sure it might, it's not the basic idea of it. But when you allow for the range syntax a..b, there would be a real need for a compiler to split these switches into several jump tables connected with IFs. Though this might be good, because it could make the source more terse and reduce the probability of a bug... BUT THEN it would also be a good idea to introduce a Pascal-style range/set type.
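The idea of "several jump tables connected with IFs" can be sketched as follows. This is only an illustration in Python: `build_tables`, `dispatch` and the `max_gap` density threshold are hypothetical, the case ranges are assumed not to overlap, and handlers are plain labels rather than code addresses.

```python
def build_tables(cases, max_gap=8):
    """Group (low, high, handler) case ranges into dense jump tables.

    A new table is started whenever the gap to the previous range exceeds
    max_gap, a hypothetical density threshold.
    """
    tables = []  # each entry: (base, [handler or None, ...])
    for low, high, handler in sorted(cases, key=lambda case: case[0]):
        if tables and low - (tables[-1][0] + len(tables[-1][1])) <= max_gap:
            base, slots = tables[-1]
            slots.extend([None] * (low - (base + len(slots))))  # pad the gap
        else:
            base, slots = low, []
            tables.append((base, slots))
        slots.extend([handler] * (high - low + 1))
    return tables

def dispatch(tables, value, default):
    for base, slots in tables:                 # the IFs connecting the tables
        if base <= value < base + len(slots):
            handler = slots[value - base]      # the table lookup
            return handler if handler is not None else default
    return default

# case 1..5, case 7, case 1000..1003
tables = build_tables([(1, 5, "small"), (7, 7, "seven"), (1000, 1003, "big")])
print([len(slots) for _, slots in tables])  # [7, 4]
print(dispatch(tables, 3, "default"))       # small
print(dispatch(tables, 6, "default"))       # default
print(dispatch(tables, 1002, "default"))    # big
```

With a single table the same cases would need 1003 slots; splitting keeps it at 11, at the cost of one range check per table.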
 Sometimes it sounds like a nice idea to make all arithmetic operators
 like "div: Int x Int -> Int" for each integral type Int.  Only it
 isn't so.
 
 Case 1:  current machine architectures have operations like 
 "mul: Int32 x Int32 -> Int64" and
 "div: Int64 x Int32 -> Int32 x Int32" (both quotient and remainder
 computed);

Basically yes.
 Case 2: integer division actually produces a rational number, and
 probably the technically most feasible solution would be using floats.

ieek, i've been looking for a new word for "extended" and got the wrong one, i meant "real". "float" is very limited and should not be used for intermediates.
 But I guess that some C compatibility is in order sometimes.  At least
 it might make the language more comfortable to some people.

"seem to be more comfortable" :) But for a C successor it's kind of vital.
One thing i find very important to implement is a "smart union", which
knows its current state. It would fix the unsafety of the normal union
and make serialisation possible.

In C++, this can be achieved with metaprogramming ( http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?Variant) although it isn't very simple. An ideal language would provide the right abstractions for implementing things like a smart union, and then include it in the standard library. (I can dream, can't I...)

Nope. Some languages don't include a union at all, but instead a smart union. Dynamic languages usually don't have stuff like that at all, since you can always change a variable's type and check types.
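What "knows its current state" could mean is sketched below in Python. The class `SmartUnion` and its methods are hypothetical and only illustrate the idea of a union that tracks its active member; they are not a proposal for D syntax.

```python
class SmartUnion:
    """A union that remembers which member is currently active."""

    def __init__(self, **members):
        self._types = dict(members)   # member name -> expected type
        self._active = None           # name of the member last written
        self._value = None

    def set(self, name, value):
        if not isinstance(value, self._types[name]):
            raise TypeError(f"{name} expects {self._types[name].__name__}")
        self._active, self._value = name, value

    def get(self, name):
        # Reading a member that was not the last one written is rejected.
        if name != self._active:
            raise ValueError(f"{name} is not the active member")
        return self._value

    def serialize(self):
        # Knowing the active member is what makes serialisation possible.
        return (self._active, self._value)

u = SmartUnion(text=str, number=int)
u.set("number", 42)
print(u.get("number"))   # 42
print(u.serialize())     # ('number', 42)

try:
    u.get("text")        # the unsafe access a plain union would allow
except ValueError as exc:
    wrong_member = str(exc)
print(wrong_member)      # text is not the active member
```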
 D gets better and better with time (by which I mean delegates and
 everything)... but there still are couple of feature requests that
 keep coming up.  Perhaps some kind of wiki, bugzilla or similar would
 be suitable for tracking feature requests and storing comments,
 possibly even voting for them.

No one reads these. That's why we have this newsgroup :) -i.
Mar 02 2003
Daniel Yokomiso <Daniel_member pathlink.com> writes:
Hi,

Comments embedded.

In article <b3ub9r$shi$1 digitaldaemon.com>, Ilya Minkov says...
Antti Sykari wrote:
 Many of the suggestions below have been given earlier.
 
 However, perhaps it's good to repeat why some of them are worth
 considering.
 
 First of all, I think that here is a useful feature. At least for the
 relational (<, <=, >=, >) expression part.
 
 "a < b < c" is semantically obvious to everyone.  It's compact and
 more readable than "a < b && b < c", and causes fewer bugs: people do
 write things like "1 < a < 5" by accident, and get bitten.
 
 Most importantly, it's comfortable to use.

And it has been implemented in Python because of "being pretty obvious". Python is a language that lets you do obvious things the obvious way. I'll write an article summarizing all the unusual decisions made in its design and post it here. D could learn something from it.
 The only downside IMO is that there are some subtle issues to decide
 and hence the feature requires some time to design and implement.
 Such as:
 
 - Is the operation short-circuited?  (Adding another short-circuited
 "operator" in the language -- but why not?)
 - Are the operations evaluated only once?  (Logically, yes, since they
 are written in the source code only once)

Evaluation order has always been undefined. I guess it also doesn't state how many times a function is called if it's in the same sequence group (C defines "sequence points", remember?). You could only safely use pure functions in expressions so far, and that's how it has to stay.
 - How does this interact with operator overloading?  (My guess would
 be "translate to a<b && b<c && ... first, then do overloading)
 - Apply that to == and != (and === and !==), too?  How about
 "a < b == x < y"?  (I wouldn't)

Ouch.

Hmmm, IMO we could just copy Icon ( http://www.cs.arizona.edu/icon/ ) generators, at least a piece of them, to implement this correctly, without leaving semantic problems. It works like this:

    if a < b == x < y then write ("Ok!")

This evaluates left to right:

1 - if a < b it returns b, else it fails (failure is a generator thing).
2 - using the b value returned it compares to x. If they're equal, it returns the x value, else it fails.
3 - using the x value returned it compares to y. If less than, it returns y, else it fails.
4 - if a value was returned it continues to the then part.

The nice thing about this is that using success/failure instead of true vs. false for relational expressions leads to better syntax and semantics. In Icon one can write:

    if y < (x | 5) then write("y=", y)

and it'll work correctly, comparing y with x and with 5. Also more powerful stuff can be written:

    if (a | b | c) = (d | e | f) then write("Ok!")

Of course this would lead to big changes in D, but it would make many obvious things possible (like the "y < (x | 5)" stuff). It can also be used to implement multiple return values of the same type.
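A rough Python emulation of the success/failure idea, using generators: a comparison yields its right operand for every pair that satisfies it and yields nothing on failure. This is only a sketch of the semantics described above, not Icon's actual implementation; `compare` and `succeeds` are names invented here.

```python
import operator


def compare(op, lefts, rights):
    """Yield each right operand b for which op(a, b) holds."""
    rights = list(rights)      # the right side is re-scanned per left value
    for a in lefts:
        for b in rights:
            if op(a, b):
                yield b


def succeeds(results):
    """True if the expression produced at least one value."""
    for _ in results:
        return True
    return False


# if y < (x | 5) ...
x, y = 1, 3
print(succeeds(compare(operator.lt, [y], [x, 5])))   # True, via 3 < 5

# if a < b == x < y ..., evaluated left to right
a, b, x, y = 1, 2, 2, 3
chain = compare(operator.lt,
                compare(operator.eq, compare(operator.lt, [a], [b]), [x]),
                [y])
print(list(chain))                                   # [3]
```

An empty result plays the role of failure, so no boolean is ever passed from one comparison to the next.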
 Someone might also say that all extra features are bad, because they
 require time and effort to learn and teach and write about.  For
 example, the old complaint that there are 4 ways to increment a
 variable in C.  But this applies only to features which aren't
 obviously easy to use, and this one is.  Also I could envision it to
 be close-to obvious to implement.

No, this point has not really been criticised, because it's usually obvious when you increment vars, one way or another. What has been criticised is that there are generally too many ways to do obvious things, which makes them unobvious at first sight. All almost equally bad.
2) multiple assignment
a,b,c = A,B,C;
is executed like this:
A,B,C is evaluated from left to right
then values are assigned to a,b,c from left to right
(a=A, b=B, c=C)


This also has subtle issues. Joe R. Newbie will try this (or, as swap is likely to be implemented as a library routine, something similar):

    void swap(inout int a, inout int b)
    {
        a, b = b, a;
    }

which translates to:

    a = b;
    b = a;

and leads to problems. Should this be the default behavior or something which first assigns to temporary values?

No way should it work like that!!! Such a feature, if introduced, should work the *obvious* way. This feature has been known as tuples in other languages, and should work the same way.
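Python's tuple assignment shows the "obvious" behaviour: the whole right-hand side is evaluated first, and only then are the targets assigned. The sketch below contrasts it with the naive sequential translation; both function names are made up for the comparison.

```python
def swap_naive(a, b):
    """The sequential translation: a = b; b = a."""
    a = b
    b = a
    return a, b


def swap_tuple(a, b):
    """Tuple assignment: the right side is evaluated before any store."""
    a, b = b, a
    return a, b


print(swap_naive(1, 2))   # (2, 2)
print(swap_tuple(1, 2))   # (2, 1)
```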
 Other things to consider:

They have to mean obvious things. In general, Daniel Yokomiso has already given this topic some thought, and he has probably come up with some suitable solution. I'll have to take a look at it, or we could simply ask him. He develops his own impure functional language, which could supersede Haskell and OCaml. :)

I cheat ;-) It has some simple solutions for this, like letting the compiler create all the temporary variables and deal with any evaluation order problems, like "int x, y = i++, i++;". Eon has no side-effects in expressions, so it can get away with tuples. D has to deal with these problems. But I think that tuples are a nice thing to have, including tuple constructors (bind them together) and tuple deconstructors (tear them apart). Using an iterative Fibonacci solution:

    int fib(int n)
    in {
        assert(n > 0);
    }
    {
        int a, b = 0, 1;
        for (int i = 0; i < n; i++) {
            a, b = b, a + b;
        }
        return a;
    }

tuples lead to cleaner syntax, without temp variables. This is toy code, but it's pretty :-)
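The same toy function runs as-is in a language that already has tuple assignment. A Python version, keeping the precondition as a plain assert:

```python
def fib(n):
    """Iterative Fibonacci using tuple assignment instead of a temp variable."""
    assert n > 0
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b   # both right-hand values use the old a and b
    return a


print([fib(n) for n in range(1, 11)])   # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```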
 - should functions return multiple values

Yes. Tuples.
 - if f returns two values, what does "a, b = f()" mean?

The obvious.
 - if f returns two values, what does "a = f()" mean?

Error. Discarded values have already been a major plague in C. You forget the function call parentheses - and whoops! I guess you know that. If someone means to use only one return value, then he should state that, like "a, null = f()"

Unless a is of "(int, int)" type.
 - if f returns one value, what does "a, b = f(), y()" mean?

Error.

Depends on the types of a and b.

    int f();
    (int, int, int) y();
    int a;
    (int, int, int) b;
    a, b = f(), y();

should compile and run ok. At least it's "pretty obvious".
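For comparison, this is how the same questions play out with Python tuples. Python is dynamically typed, so the cases called "Error" above only fail at run time there, and "a = f()" simply keeps the whole tuple. The functions `f`, `g` and `y` are stand-ins for the ones in the questions.

```python
def f():
    return 1, 2            # "returns two values": really one tuple


def g():
    return 7               # returns one value


def y():
    return 3, 4, 5


a, b = f()                 # a == 1, b == 2
pair = f()                 # pair == (1, 2), the "(int, int)" case
first, _ = f()             # explicitly discard the second value
single, triple = g(), y()  # single == 7, triple == (3, 4, 5)

try:
    p, q, r = f()          # wrong arity is an error, but only at run time
except ValueError as error:
    print("unpacking failed:", error)
```

A statically typed language could reject the last case at compile time, which is what the "Error" answers above ask for.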
 - do these features fit into the grammar seamlessly?

Dunno. There should be a way to make them fit into grammar. I have not read compiler sources yet, and i'm going to. And i might then do what Burton promised, but has not done so far: write documentation on them.
Yes. However ^ is taken by XOR. How about ** ?

an exponentiation operator. ^ doesn't particularly look like "xor" if you haven't done much bit-level C programming. (Nor does | look like "or" but that's another issue altogether...) ** clashes with multiplication combined with pointer dereference.

It's no problem. You can't write "x+++++y", you have to separate it with spaces - now, you could require that "* *" is mul and dereference, and "**" is power. I also proposed once that if you have a function with 2 parameters and one return value, it can be called like "y = a 'fun' b", which expands to "y = fun(a, b)". Some lexical means is probably required to recognise in-fix functions.
 <= also clashes with the "less-or-equal" operator.

I'm a fool, i've noticed that too late.
 
 "<-", "->", and "<->" could be used if logical operators were
 required.  (At least if -> pointer syntax were to be demolished)
 But still, there's the issue of:
 
 a<-b; // a <- b  or a < -b ?

It's not a problem, it's a decision question. It's solved the same way as so far.
 Having pure compile-time functions would be very neat, but effectively
 it would require a D compiler to be a D interpreter at the same time.
 Dunno about the purity checking.  I suppose you will very soon find
 out the purity of a function when the compiler starts interpreting a
 function and eventually tries to read a global variable, format your
 hard disk, or send a network packet, or something else "impure" ;)

No, that's really not a problem. Do you know how a compiler's semantic analyser and the constant wrappers work? They are not exactly interpreters, but they come very close. I have mentioned the very clean criteria for the purity of a function; here i re-formulate them in a more straightforward form:

- if a function is external (ie no source is available for it), it is impure. Unless it's somewhere explicitly stated otherwise. (ie const qualifier in declaration?)
- the function body is scanned for variable accesses. If it accesses any global variables that are not constant, it is impure. Even if it only reads them, because some other function might have modified them, making the function yield inconsistent results.
- the function body is scanned for function calls. All these functions must also qualify as pure, else this one isn't.
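The three criteria can be sketched in a few lines of Python. This assumes the compiler front end has already collected, per function, whether source is available, whether it touches a non-constant global, and which functions it calls; the record layout and the names `function_info` and `infer_purity` are invented for this sketch. Recursive call chains are simply treated as impure here, which is a conservative simplification.

```python
def function_info(has_source=True, declared_pure=False,
                  touches_mutable_global=False, calls=()):
    """Facts a (hypothetical) front end would record for one function."""
    return {
        "has_source": has_source,
        "declared_pure": declared_pure,
        "touches_mutable_global": touches_mutable_global,
        "calls": list(calls),
    }


def infer_purity(functions):
    """Map each function name to True (pure) or False (impure)."""
    purity = {}
    in_progress = set()

    def is_pure(name):
        if name in purity:
            return purity[name]
        if name in in_progress:
            return False                      # recursion: give up, impure
        info = functions[name]
        in_progress.add(name)
        if not info["has_source"]:
            result = info["declared_pure"]    # rule 1: external functions
        elif info["touches_mutable_global"]:
            result = False                    # rule 2: non-constant globals
        else:
            result = all(is_pure(c) for c in info["calls"])   # rule 3
        in_progress.discard(name)
        purity[name] = result
        return result

    for name in functions:
        is_pure(name)
    return purity


functions = {
    "sqrt": function_info(has_source=False, declared_pure=True),
    "print": function_info(has_source=False),
    "square": function_info(),
    "hypot": function_info(calls=["square", "sqrt"]),
    "next_id": function_info(touches_mutable_global=True),
    "report": function_info(calls=["hypot", "print"]),
}
purity = infer_purity(functions)
print(purity)
```

Each function is analysed once and the result is cached, so the work is proportional to the number of functions plus the number of call edges.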

As a side note, this is similar to the Hindley-Milner type inference algorithm. Its complexity is big (IIRC it's greater than exponential space), so it may lead to longer compile times if the compiler can't annotate the object code with purity marks.
 An alternative would be to require some different kind of syntax for
 pure functions, effectively making them to be of different type of
 functions.  And all functions that are pure should be declared pure,
 or we have yet again the annoying problem I mentioned in the other
 post: we really know we have pure functions, but we can't use them
 because they are not declared pure.

const qualifier on return type? I can't imagine anything else it could mean.
 
 [switch]
 
 It just *might* be converted to if statements, though, and you'll
 never know unless you disassemble the object code ;)

Urgh, well, initial math contains IFs, but the switch body generally doesn't. Though sure it might, it's not the basic idea of it. But when you allow for range syntax a..b, there would be a real need for a compiler to split these switches into several jump tables connected with IFs. Though this might be good cause it could make the source more terse and reduce the probability of a bug... BUT THEN it would also be a good idea to introduce pascalese range/set type.
 Sometimes it sounds like a nice idea to make all arithmetic operators
 like "div: Int x Int -> Int" for each integral type Int.  Only it
 isn't so.
 
 Case 1:  current machine architectures have operations like 
 "mul: Int32 x Int32 -> Int64" and
 "div: Int64 x Int32 -> Int32 x Int32" (both quotient and remainder
 computed);


Another nice case for tuples.
Basically yes.
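Python's built-in `divmod` is an existing example of exactly this shape: one operation that returns quotient and remainder together as a tuple.

```python
quotient, remainder = divmod(17, 5)
print(quotient, remainder)                 # 3 2
print(quotient * 5 + remainder == 17)      # True
```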

 Case 2: integer division actually produces a rational number, and
 probably the technically most feasible solution would be using floats.

ieek, i've been looking for a new word for "extended" and got the wrong one, i meant "real". "float" is very limited and should not be used for intermediates.
 But I guess that some C compatibility is in order sometimes.  At least
 it might make the language more comfortable to some people.

"seem to be more comfortable" :) But for a C successor it's kind of vital.
One thing i find very important to implement is a "smart union", which
knows its current state. It would fix the unsafety of the normal union
and make serialisation possible.

In C++, this can be achieved with metaprogramming ( http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?Variant) although it isn't very simple. An ideal language would provide the right abstractions for implementing things like a smart union, and then include it in the standard library. (I can dream, can't I...)

Nope. Some languages don't include a union at all, but instead a smart union. Dynamic languages usually don't have stuff like that at all, since you can always change a variable's type and check types.
 D gets better and better with time (by which I mean delegates and
 everything)... but there still are couple of feature requests that
 keep coming up.  Perhaps some kind of wiki, bugzilla or similar would
 be suitable for tracking feature requests and storing comments,
 possibly even voting for them.

No one reads these. That's why we have this newsgroup :) -i.

I second the wiki suggestion (I suggested that some time ago). At least we could keep related discussions together. Right now we keep having the same discussions about certain stuff.

Best regards,
Daniel Yokomiso.

"Beware of bugs in the above code; I have only proved it correct, not tried it."
- Donald Knuth (in a memo to Peter van Emde Boas)
Mar 03 2003
parent reply Ilya Minkov <midiclub 8ung.at> writes:
Daniel Yokomiso wrote:
 In article <b3ub9r$shi$1 digitaldaemon.com>, Ilya Minkov says...
 - if a function is external (ie no source is available for it), it
 is impure. Unless it's somewhere explicitly stated otherwise. (ie
 const qualifier in declaration?)
 - function body is scanned for variable accesses. If it accesses any
 global variables that are not constant, it is impure. Even if it only
 reads them, because some other function might have modified them,
 making a function yield inconsistent results.
 - function body is scanned for function calls. All these functions
 must also qualify to be pure, else this one isn't.

As a side note, this is similar to the Hindley-Milner type inference algorithm. Its complexity is big (IIRC it's greater than exponential space), so it may lead to longer compile times if the compiler can't annotate the object code with purity marks.

It's not all that clear to me why. You have to scan every function, so that a compiled unit has all safe functions qualified as being safe. You just have to scan them in the correct order, which is identified by recursion.

The only real problem would be to resolve mutually recursive functions. As long as there are no mutually recursive functions, or they are left "impure", i assume the time complexity would be near linear.

Well, someone might one day come up with an additional pass of analysis for mutually-recursive functions, and that would also resolve that problem... With an impact on complexity. I'm not sure, what complexity does it take to identify loops in a directed graph? (HUGE?) I assume that mutual recursion analysis can be limited to groups of only 2-3 functions, and that would already cover usual cases. It doesn't have to be exhaustive you know, as opposed to type inference.

BTW, doesn't OCaml somehow circumvent doing the complete Hindley-Milner analysis for types?

-i.
Mar 03 2003
next sibling parent Bill Cox <bill viasic.com> writes:
Ilya Minkov wrote:


...


 Well, someone might one day come up with additional pass of analysis for 
 mutually-recursive functions, and that would also resolve that 
 problem... With an impact on complexity. I'm not sure, what complexity 
 does it take to identify loops in a directed graph? (HUGE?)
 I assume that mutual recursion analysis can be limited to groups of only 
 2-3 functions, and that would already cover usual cases. It doesn't have 
 to be exhaustive you know, as opposed to type inference.

I can help you there. Breaking loops in a directed graph requires only a couple of simple linear passes. Here's some pseudo code:

    breakLoops(Graph graph)
    {
        Node node;

        clearNodeFlags(graph);
        foreach(graph, node) {
            if(!node.visited) {
                breakLoopsFromNode(node);
            }
        }
    }

    clearNodeFlags(Graph graph)
    {
        Node node;

        foreach(graph, node) {
            node.visited = false;
            node.marked = false;
        }
    }

    breakLoopsFromNode(Node node)
    {
        Node otherNode;
        Edge edge;

        node.visited = true;
        node.marked = true;   // node is on the current search path
        foreach(node.outEdges, edge) {
            otherNode = edge.toNode;
            if(otherNode.marked) {
                edge.isLoopEdge = true; // Here's where you break loops
            } else if(!otherNode.visited) {
                breakLoopsFromNode(otherNode);
            }
        }
        node.marked = false;
    }

-- Bill
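A runnable Python rendering of the same depth-first idea, for anyone who wants to try it. The graph representation (a dict mapping each node to a list of successors) and the name `find_loop_edges` are choices made for this sketch. It is recursive, so very deep call graphs would need an explicit stack instead.

```python
def find_loop_edges(graph):
    """Return the edges that close a loop, found by depth-first search."""
    visited = set()      # nodes that have been entered at least once
    on_path = set()      # nodes on the current search path ("marked")
    loop_edges = []

    def visit(node):
        visited.add(node)
        on_path.add(node)
        for target in graph.get(node, ()):
            if target in on_path:
                loop_edges.append((node, target))   # break the loop here
            elif target not in visited:
                visit(target)
        on_path.discard(node)

    for node in graph:
        if node not in visited:
            visit(node)
    return loop_edges


call_graph = {
    "a": ["b"],
    "b": ["c"],
    "c": ["a", "d"],   # c -> a closes the loop a -> b -> c -> a
    "d": [],
}
print(find_loop_edges(call_graph))   # [('c', 'a')]
```

Every node is entered once and every edge is examined once, so the cost is linear in the size of the graph.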
Mar 03 2003
prev sibling parent "Daniel Yokomiso" <daniel_yokomiso yahoo.com.br> writes:
"Ilya Minkov" <midiclub 8ung.at> wrote in message
news:b40g4b$25gf$1 digitaldaemon.com...
 Daniel Yokomiso wrote:
 In article <b3ub9r$shi$1 digitaldaemon.com>, Ilya Minkov says...
 - if a function is external (ie no source is available for it), it
 is impure. Unless it's somewhere explicitly stated otherwise. (ie
 const qualifier in declaration?)
 - function body is scanned for variable accesses. If it accesses any
 global variables that are not constant, it is impure. Even if it only
 reads them, because some other function might have modified them,
 making a function yield inconsistent results.
 - function body is scanned for function calls. All these functions
 must also qualify to be pure, else this one isn't.

As a side note, this is similar to the Hindley-Milner type inference algorithm. Its complexity is big (IIRC it's greater than exponential space), so it may lead to longer compile times if the compiler can't annotate the object code with purity marks.

It's not all that clear to me why. You have to scan every function, so that a compiled unit has all safe functions qualified as being safe. You just have to scan them in the correct order, which is identified by recursion.

As I said "if the compiler CAN'T annotate the object code with purity marks". If it annotates every module, then it's ok. There's still a problem with delegate parameters: a function may be pure if a delegate parameter is pure, but if the parameter isn't, then it can't be pure. This applies to higher-order functions like map, fold, filter, etc.
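A small Python illustration of the delegate problem: the same higher-order function behaves purely or impurely depending only on the callable it is given. All names here (`my_map`, `double`, `make_numbering`) are made up for the example.

```python
def my_map(fn, items):
    """Apply fn to every item; has no side effects of its own."""
    return [fn(item) for item in items]


def double(x):
    return 2 * x              # pure: depends only on x


def make_numbering():
    """Return an impure callable that hides a mutable counter."""
    state = {"n": 0}

    def numbered(x):
        state["n"] += 1       # side effect: result depends on call history
        return (state["n"], x)

    return numbered


numbered = make_numbering()

# With a pure argument, repeated calls give the same answer.
print(my_map(double, [1, 2]) == my_map(double, [1, 2]))       # True

# With an impure argument, the very same my_map is not repeatable.
print(my_map(numbered, ["a"]) == my_map(numbered, ["a"]))     # False
```

So the purity of a call to such a function can only be decided per call site, once the actual delegate is known.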
 The only real problem would be to resolve mutually recursive functions.
 As long as there are no mutually recursive functions, or they are left
 "impure", i assume the time complexity would be near linear.

 Well, someone might one day come up with additional pass of analysis for
 mutually-recursive functions, and that would also resolve that
 problem... With an impact on complexity. I'm not sure, what complexity
 does it take to identify loops in a directed graph? (HUGE?)
 I assume that mutual recursion analysis can be limited to groups of only
 2-3 functions, and that would already cover usual cases. It doesn't have
 to be exhaustive you know, as opposed to type inference.

 BTW, doesn't OCaml somehow circumvent doing the complete Hindley-Milner
 analysis for types?

I don't know about this, but I guess they don't.
 -i.

Mar 03 2003
prev sibling parent "Mike Wynn" <mike.wynn l8night.co.uk> writes:
 Other things to consider:

 - should functions return multiple values
 - if f returns two values, what does "a, b = f()" mean?
 - if f returns two values, what does "a = f()" mean?
 - if f returns one value, what does "a, b = f(), y()" mean?
 - do these features fit into the grammar seamlessly?

comma operator and as order is not important why have an operator that enforces order :)
Mar 03 2003