www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Generalized switch statement (a la pattern matching)

reply =?ISO-8859-1?Q?Alex_R=F8nne_Petersen?= <xtzgzorex gmail.com> writes:
Hi,

Many functional languages have what is called pattern matching. That is, 
matching a value against a set of expressions, testing for exact equality.

For example, in F# (ML-ish) code:

let x = fooOrBarOrBaz()
match x with
| "foo" -> printf "got foo"
| "bar" -> printf "got bar"
| "baz" -> printf "got baz"

It's dead simple, and works more or less exactly like a switch: Walk 
through each case until a value matching x is found. Then, execute that 
branch.

There are three major differences, however:

1) If no match could be found, an exception (in D's case, this would be 
an Error-derived type) is thrown.
2) A 'default' branch is specified by matching against a plain variable 
(typically _ if unused). This always matches, since it's just a plain 
binding of the matched-against value to another variable.
3) The cases in the pattern match can be *any* expression. They don't 
have to be compile-time constants. This is extremely flexible, and some 
would argue that it is what the switch statement always should have 
been. Of course, the values in each case have to be compatible or 
implicitly convertible to the type of the value being matched against.

It is worth noting that pattern matching does not ruin compiler 
optimization of switches. It merely takes some extra effort to determine 
that all cases are constant.

Furthermore, point (2) naturally leads to the question: What happens if 
you match a value against an already-bound variable? In functional 
languages, what happens is typically shadowing (i.e. the already-bound 
variable becomes 'hidden'). In D, I think it would make more sense to 
match against the variable's value, since D doesn't have shadowing 
anywhere else AFAIK.

Additionally, binding the value to a new variable in the 'default' case 
might not make sense in D. When you hit the 'default:' label, you 
already have the switched-upon value in scope anyway.

Another trait of pattern matching in functional languages is that a 
pattern match is usually an expression. This means that whatever value 
is returned from the matched case (if any) is the result of the pattern 
match expression. I don't think that this would 'fit in' in D. Then we'd 
have to make if-then-else, while, for, foreach, do-while, etc 
expressions too, for consistency, which doesn't really seem at home in 
an imperative language.

Furthermore, there is the issue of introducing a new keyword. I don't 
think this is a good idea, especially not a common word like "match" or 
something along those lines (and __match would just be outright ugly).

So, I propose the following changes:

1) The switch statement should be generalized to allow any expression in 
cases. If the expression is an existing variable, the switched-upon 
value will be matched against the value of that variable. If the 
expression is an unbound variable, it is a compile-time error.
2) If no case is matched, and no 'default' case is present in the 
switch, an Error (say, SwitchCaseError or whatever) should be raised. I 
don't think this is a bad idea, since we're already moving towards 
deprecating lack of 'default' in non-final switches.

Point (1) is the most important one here. (2) is not crucial for good 
pattern matching capabilities, and is more of an aid in debugging.

What do you folks think? Would this have a place in D?

- Alex
Oct 27 2011
parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Where is the fundamental advantage compared to the following?

if(x == "foo") writeln("got foo")
else if(x == "foo") writeln("got foo")
else if(x == "foo") writeln("got foo")
else assert(false);




On 27.10.2011 14:05, Alex Rønne Petersen wrote:
 Hi,

 Many functional languages have what is called pattern matching. That is,
 matching a value against a set of expressions, testing for exact equality.

 For example, in F# (ML-ish) code:

 let x = fooOrBarOrBaz()
 match x with
 | "foo" -> printf "got foo"
 | "bar" -> printf "got bar"
 | "baz" -> printf "got baz"

 It's dead simple, and works more or less exactly like a switch: Walk
 through each case until a value matching x is found. Then, execute that
 branch.

 There are three major differences, however:

 1) If no match could be found, an exception (in D's case, this would be
 an Error-derived type) is thrown.
 2) A 'default' branch is specified by matching against a plain variable
 (typically _ if unused). This always matches, since it's just a plain
 binding of the matched-against value to another variable.
 3) The cases in the pattern match can be *any* expression. They don't
 have to be compile-time constants. This is extremely flexible, and some
 would argue that it is what the switch statement always should have
 been. Of course, the values in each case have to be compatible or
 implicitly convertible to the type of the value being matched against.

 It is worth noting that pattern matching does not ruin compiler
 optimization of switches. It merely takes some extra effort to determine
 that all cases are constant.

 Furthermore, point (2) naturally leads to the question: What happens if
 you match a value against an already-bound variable? In functional
 languages, what happens is typically shadowing (i.e. the already-bound
 variable becomes 'hidden'). In D, I think it would make more sense to
 match against the variable's value, since D doesn't have shadowing
 anywhere else AFAIK.

 Additionally, binding the value to a new variable in the 'default' case
 might not make sense in D. When you hit the 'default:' label, you
 already have the switched-upon value in scope anyway.

 Another trait of pattern matching in functional languages is that a
 pattern match is usually an expression. This means that whatever value
 is returned from the matched case (if any) is the result of the pattern
 match expression. I don't think that this would 'fit in' in D. Then we'd
 have to make if-then-else, while, for, foreach, do-while, etc
 expressions too, for consistency, which doesn't really seem at home in
 an imperative language.

 Furthermore, there is the issue of introducing a new keyword. I don't
 think this is a good idea, especially not a common word like "match" or
 something along those lines (and __match would just be outright ugly).

 So, I propose the following changes:

 1) The switch statement should be generalized to allow any expression in
 cases. If the expression is an existing variable, the switched-upon
 value will be matched against the value of that variable. If the
 expression is an unbound variable, it is a compile-time error.
 2) If no case is matched, and no 'default' case is present in the
 switch, an Error (say, SwitchCaseError or whatever) should be raised. I
 don't think this is a bad idea, since we're already moving towards
 deprecating lack of 'default' in non-final switches.

 Point (1) is the most important one here. (2) is not crucial for good
 pattern matching capabilities, and is more of an aid in debugging.

 What do you folks think? Would this have a place in D?

 - Alex

Oct 27 2011
parent bearophile <bearophileHUGS lycos.com> writes:
Norbert Nemec:

 Where is the fundamental advantage compared to the following?
 
 if(x == "foo") writeln("got foo")
 else if(x == "foo") writeln("got foo")
 else if(x == "foo") writeln("got foo")
 else assert(false);

The advantages are visible even with the D final switch, but are more generalized. The various cases are formatted nicely with less syntax noise, this allows you to see them better. And the compiler gives an error if you duplicate cases, or if some cases are missing. This is also useful if you modify the data structures themselves (adding or removing cases); the compiler gives errors for all the patterns you have to fix after the change. This helps write correct code both the first time, and later during code updates. Regarding pattern matching in D, I'd like a "just" improvement of D switches: http://d.puremagic.com/issues/show_bug.cgi?id=596 Bye, bearophile
Oct 28 2011