digitalmars.D - More on Rust language

bearophile <bearophileHUGS lycos.com> writes:
Through Reddit I've found two introductions to the system language Rust being
developed by Mozilla. This is one of them:

http://marijnhaverbeke.nl/rust_tutorial/

This is an alpha-state tutorial, so some parts are unfinished and some parts
will probably change, in the language too.

Unfortunately this first tutorial doesn't discuss typestates and syntax macros
(yet), two of the most significant features of Rust. The second tutorial
discusses typestates a bit.

Currently the Rust compiler is written in Rust and it's based on the LLVM
back-end. This allows it to eat its own dog food (there are a few descriptions
of typestate usage in the compiler itself) and the backend is efficient enough.
Compared to DMD the Rust compiler is in an earlier stage of development: it
works and it's able to compile itself, but I think it's not usable yet for
practical purposes.

On the GitHub page the Rust project has 547 "Watch" and 52 "Fork", while DMD
has 159 and 49 of them, even though Rust is a much younger compiler than
D/DMD. So it seems enough people are interested in Rust.

Most of the text below is quotations from the tutorials.

---------------------------

http://marijnhaverbeke.nl/rust_tutorial/control.html

Pattern matching

Rust's alt construct is a generalized, cleaned-up version of C's switch
construct. You provide it with a value and a number of arms, each labelled with
a pattern, and it will execute the arm that matches the value.

alt my_number {
  0       { std::io::println("zero"); }
  1 | 2   { std::io::println("one or two"); }
  3 to 10 { std::io::println("three to ten"); }
  _       { std::io::println("something else"); }
}

There is no 'falling through' between arms, as in C—only one arm is executed,
and it doesn't have to explicitly break out of the construct when it is
finished.

The part to the left of each arm is called the pattern. Literals are valid
patterns, and will match only their own value. The pipe operator (|) can be
used to assign multiple patterns to a single arm. Ranges of numeric literal
patterns can be expressed with to. The underscore (_) is a wildcard pattern
that matches everything.

If the arm with the wildcard pattern was left off in the above example, running
it on a number greater than ten (or negative) would cause a run-time failure.
When no arm matches, alt constructs do not silently fall through—they blow up
instead.
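
For comparison, here is a minimal sketch of the same dispatch in present-day
Rust, where alt has since become match, arms are written with =>, and an
inclusive range is written 3..=10. The function name describe is made up for
this illustration. Current compilers also reject a non-exhaustive match at
compile time, instead of failing at run time as described above.

```rust
// Sketch in modern Rust syntax (not the 2011 syntax quoted above).
// `describe` is a hypothetical helper name.
fn describe(n: i32) -> &'static str {
    match n {
        0 => "zero",
        1 | 2 => "one or two",
        3..=10 => "three to ten",
        // Without this wildcard arm the match would be non-exhaustive,
        // which modern rustc reports as a compile-time error.
        _ => "something else",
    }
}
```

Calling describe(7) selects the range arm; only one arm ever runs.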

A powerful application of pattern matching is destructuring, where you use the
matching to get at the contents of data types. Remember that (float, float) is
a tuple of two floats:

fn angle(vec: (float, float)) -> float {
    alt vec {
      (0f, y) when y < 0f { 1.5 * std::math::pi }
      (0f, y) { 0.5 * std::math::pi }
      (x, y) { std::math::atan(y / x) }
    }
}

A variable name in a pattern matches everything, and binds that name to the
value of the matched thing inside of the arm block. Thus, (0f, y) matches any
tuple whose first element is zero, and binds y to the second element. (x, y)
matches any tuple, and binds both elements to a variable.

Any alt arm can have a guard clause (written when EXPR), which is an expression
of type bool that determines, after the pattern is found to match, whether the
arm is taken or not. The variables bound by the pattern are available in this
guard expression.
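
A rough modern-Rust rendering of the angle function above, as a sketch only:
guards are now written if EXPR rather than when EXPR, and float became f64.
The zero test is moved into the guards here (an assumption of this sketch, to
avoid floating-point literals in patterns).

```rust
use std::f64::consts::PI;

// Modern-Rust sketch of the tutorial's `angle` function.
// Arms are tried top to bottom; a guard can use the names the pattern binds.
fn angle(vec: (f64, f64)) -> f64 {
    match vec {
        // First element is zero and the vector points down.
        (x, y) if x == 0.0 && y < 0.0 => 1.5 * PI,
        // First element is zero, any other second element.
        (x, _) if x == 0.0 => 0.5 * PI,
        // Any other tuple: bind both elements.
        (x, y) => (y / x).atan(),
    }
}
```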


Record patterns

Records can be destructured in alt patterns. The basic syntax is {fieldname:
pattern, ...}, but the pattern for a field can be omitted as a shorthand for
simply binding the variable with the same name as the field.

alt mypoint {
    {x: 0f, y: y_name} { /* Provide sub-patterns for fields */ }
    {x, y}             { /* Simply bind the fields */ }
}

The field names of a record do not have to appear in a pattern in the same
order they appear in the type. When you are not interested in all the fields of
a record, a record pattern may end with , _ (as in {field1, _}) to indicate
that you're ignoring all other fields.
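
In current Rust, records became named structs and the trailing , _ became ..
The sketch below shows the same three ideas (sub-patterns, shorthand binding,
ignoring the rest); Point3 and describe_point are hypothetical names chosen
for the example.

```rust
// Modern-Rust sketch of record (struct) patterns.
struct Point3 {
    x: i32,
    y: i32,
    z: i32,
}

fn describe_point(p: &Point3) -> String {
    match p {
        // Sub-pattern for x, y bound under another name, other fields ignored.
        Point3 { x: 0, y: y_name, .. } => format!("x is zero, y = {}", y_name),
        // Shorthand bindings; field order need not match the declaration.
        Point3 { z, x, .. } => format!("x = {}, z = {}", x, z),
    }
}
```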


Tags

Tags [FIXME terminology] are datatypes that have several different
representations. For example, the type shown earlier:

tag shape {
    circle(point, float);
    rectangle(point, point);
}

A value of this type is either a circle, in which case it contains a point
record and a float, or a rectangle, in which case it contains two point
records. The run-time representation of such a value includes an identifier of
the actual form that it holds, much like the 'tagged union' pattern in C, but
with better ergonomics.


Tag patterns

For tag types with multiple variants, destructuring is the only way to get at
their contents. All variant constructors can be used as patterns, as in this
definition of area:

fn area(sh: shape) -> float {
    alt sh {
        circle(_, size) { std::math::pi * size * size }
        rectangle({x, y}, {x: x2, y: y2}) { (x2 - x) * (y2 - y) }
    }
}
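
The tag keyword was later renamed enum. A sketch of the same shape type and
area function in modern syntax, assuming a point is a struct with two f64
fields (the tutorial excerpt does not show the definition of point):

```rust
use std::f64::consts::PI;

// Assumed layout of the tutorial's `point` record.
struct Point {
    x: f64,
    y: f64,
}

// A value is exactly one of the variants, tagged at run time.
enum Shape {
    Circle(Point, f64),
    Rectangle(Point, Point),
}

fn area(sh: &Shape) -> f64 {
    match sh {
        // Ignore the centre, bind the radius.
        Shape::Circle(_, size) => PI * size * size,
        // Destructure both corner points in a single pattern.
        Shape::Rectangle(Point { x, y }, Point { x: x2, y: y2 }) => (x2 - x) * (y2 - y),
    }
}
```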

------------------------------

// The type of this vector will be inferred based on its use.
let x = [];

// Explicitly say this is a vector of integers.
let y: [int] = [];

---------------------------

Tuples

Tuples in Rust behave exactly like records, except that their fields do not
have names (and can thus not be accessed with dot notation). Tuples can have
any arity except for 0 or 1 (though you may see nil, (), as the empty tuple if
you like).

let mytup: (int, int, float) = (10, 20, 30.0);
alt mytup {
  (a, b, c) { log a + b + (c as int); }
}

---------------------------

Pointers

Rust supports several types of pointers. The simplest is the unsafe pointer,
written *TYPE, which is a completely unchecked pointer type only used in unsafe
code (and thus, in typical Rust code, very rarely). The safe pointer types are
@TYPE for shared, reference-counted boxes, and ~TYPE, for uniquely-owned
pointers.

All pointer types can be dereferenced with the * unary operator.
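
In later versions of Rust these sigils were replaced by library types. As a
rough sketch, assuming the usual correspondence (the uniquely-owned pointer to
Box<T>, the shared reference-counted box to Rc<T>, and the unsafe pointer to
*const T), dereferencing still uses the unary * operator. The function name
deref_all is made up for the example.

```rust
use std::rc::Rc;

// Reads one value through each of the three pointer kinds and sums them.
fn deref_all() -> i32 {
    let unique: Box<i32> = Box::new(1); // uniquely owned heap value
    let shared: Rc<i32> = Rc::new(2); // reference-counted, shareable
    let alias = Rc::clone(&shared); // second owner of the same box
    let raw: *const i32 = &*unique; // unchecked pointer into the Box

    // Raw pointers can only be dereferenced inside an `unsafe` block.
    let via_raw = unsafe { *raw };
    *unique + *shared + *alias + via_raw
}
```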

---------------------------

When inserting an implicit copy for something big, the compiler will warn, so
that you know that the code is not as efficient as it looks.

---------------------------

Argument passing styles

...

Another style is by-move, which will cause the argument to become
de-initialized on the caller side, and give ownership of it to the called
function. This is written -.

Finally, the default passing styles (by-value for non-structural types,
by-reference for structural ones) are written + for by-value and && for
by(-immutable)-reference. It is sometimes necessary to override the defaults.
We'll talk more about this when discussing generics.
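
The mode sigils did not survive into current Rust. As a sketch of how the
by-move and by-reference styles look today (passing a non-Copy value by value
moves it, and & borrows it), with all function names hypothetical:

```rust
// By-move: the caller gives up ownership of the vector.
fn consume(v: Vec<i32>) -> usize {
    v.len()
}

// By-immutable-reference: the caller keeps ownership.
fn inspect(v: &[i32]) -> usize {
    v.len()
}

fn demo() -> (usize, usize) {
    let data = vec![1, 2, 3];
    let seen = inspect(&data); // `data` is still usable afterwards
    let taken = consume(data); // `data` is moved into `consume` here
    // Any further use of `data` would be rejected at compile time.
    (seen, taken)
}
```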

==============================================

The second introduction I have found:
https://github.com/graydon/rust/wiki/

---------------------------

https://github.com/graydon/rust/wiki/Unit-testing

Rust has built in support for simple unit testing. Functions can be marked as
unit tests using the 'test' attribute.

#[test]
fn return_none_if_empty() {
   ... test code ...
}

A test function's signature must have no arguments and no return value. To run
the tests in a crate, it must be compiled with the '--test' flag: rustc
myprogram.rs --test -o myprogram-tests. Running the resulting executable will
run all the tests in the crate. A test is considered successful if its function
returns; if the task running the test fails, through a call to fail, a failed
check or assert, or some other means, then the test fails.

When compiling a crate with the '--test' flag, '--cfg test' is also implied, so
that tests can be conditionally compiled.

#[cfg(test)]
mod tests {
  #[test]
  fn return_none_if_empty() {
    ... test code ...
  }
}

Note that attaching the 'test' attribute to a function does not imply the
'cfg(test)' attribute. Test items must still be explicitly marked for
conditional compilation (though this could change in the future).

Tests that should not be run can be annotated with the 'ignore' attribute. The
existence of these tests will be noted in the test runner output, but the test
will not be run.

A test runner built with the '--test' flag supports a limited set of arguments
to control which tests are run: the first free argument passed to a test runner
specifies a filter used to narrow down the set of tests being run; the
'--ignored' flag tells the test runner to run only tests with the 'ignore'
attribute.
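
A small self-contained sketch of these attributes as they look in current
Rust. The function first_or_none and the test slow_case are hypothetical,
invented for the example; the attributes themselves are the ones described
above.

```rust
// Hypothetical function under test.
fn first_or_none(items: &[i32]) -> Option<i32> {
    items.first().copied()
}

// Compiled only when the crate is built with `--test`.
#[cfg(test)]
mod tests {
    use super::first_or_none;

    #[test]
    fn return_none_if_empty() {
        assert_eq!(first_or_none(&[]), None);
    }

    #[test]
    #[ignore] // reported by the runner, but only run with `--ignored`
    fn slow_case() {
        assert_eq!(first_or_none(&[7, 8]), Some(7));
    }
}
```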

Parallelism

By default, tests are run in parallel, which can make interpreting failure
output difficult. In these cases you can set the RUST_THREADS environment
variable to 1 to make the tests run sequentially.

Examples

Typical test run

 mytests

running 30 tests
running driver::tests::mytest1 ... ok
running driver::tests::mytest2 ... ignored
... snip ...
running driver::tests::mytest30 ... ok

result: ok. 28 passed; 0 failed; 2 ignored

Test run with failures

 mytests

running 30 tests
running driver::tests::mytest1 ... ok
running driver::tests::mytest2 ... ignored
... snip ...
running driver::tests::mytest30 ... FAILED

result: FAILED. 27 passed; 1 failed; 2 ignored

Running ignored tests

 mytests --ignored

running 2 tests
running driver::tests::mytest2 ... failed
running driver::tests::mytest10 ... ok

result: FAILED. 1 passed; 1 failed; 0 ignored

Running a subset of tests

 mytests mytest1

running 11 tests
running driver::tests::mytest1 ... ok
running driver::tests::mytest10 ... ignored
... snip ...
running driver::tests::mytest19 ... ok

result: ok. 11 passed; 0 failed; 1 ignored

---------------------------

https://github.com/graydon/rust/wiki/Error-reporting

Incorrect use of numeric literals.

auto i = 0u;
i += 3; // suggest "3u"

Use of for where for each was meant.

for (v in foo.iter()) // suggest "for each"

This is something I'd like in D too:
http://d.puremagic.com/issues/show_bug.cgi?id=6638

---------------------------

https://github.com/graydon/rust/wiki/Attribute-notes

Crate Linkage Attributes

A crate's version is determined by the link attribute, which is a list meta
item containing metadata about the crate. This metadata can, in turn, be used
in providing partial matching parameters to syntax extension loading and crate
importing directives, denoted by the syntax and use keywords respectively.

All meta items within a link attribute contribute to the versioning of a crate,
and two meta items, name and vers, have special meaning and must be present in
all crates compiled as shared libraries.

An example of a typical crate link attribute:

#[link(name = "std",
       vers = "0.1",
       uuid = "122bed0b-c19b-4b82-b0b7-7ae8aead7297",
       url = "http://rust-lang.org/src/std")];

==============================================

Regarding different kinds of pointers in D, I have recently found this:
http://herbsutter.com/2011/10/25/garbage-collection-synopsis-and-c/

From what I understand in this comment by Herb Sutter, I was right when about
three years ago I was asking for a second pointer type in D:

>Mark-compact (aka moving) collectors, where live objects are moved together to
make allocated memory more compact. Note that doing this involves updating
pointers’ values on the fly. This category includes semispace collectors as
well as the more efficient modern ones like the .NET CLR’s that don’t use up
half your memory or address space. C++ cannot support this without at least a
new pointer type, because C/C++ pointer values are required to be stable (not
change their values), so that you can cast them to an int and back, or write
them to a file and back; this is why we created the ^ pointer type for C++/CLI
which can safely point into #3-style compacting GC heaps. See section 3.3 of my
paper (http://www.gotw.ca/publications/C++CLIRationale.pdf ) A Design Rationale
for C++/CLI for more rationale about ^ and gcnew.<

Tell me if I am wrong still. How do you implement a moving GC in D if D has raw
pointers? D semantics doesn't allow the GC to automatically modify those
pointers when the GC moves the data.

--------------------------

As you see, this post of mine doesn't discuss typestates or syntax macros. I
have not found enough info about them in the Rust docs.

Even if Rust does not become widespread, it will introduce typestates into the
cauldron of features known by future language designers (and maybe future
programmers too), or it will show why typestates are not a good idea. In all
three cases Rust will be useful.

Some comments regarding D:
- I'd like the better error messages I have discussed in bug 6638.
- Tuple de-structuring syntax would be good to have in D too. There is a patch
  for this. If the ideas of the patch are not developed enough, then I suggest
  presenting the design problems, then discussing and solving them.
- I'd like a slightly more flexible switch in D, discussion:
  http://d.puremagic.com/issues/show_bug.cgi?id=596
  This is just an additive change, I think it causes no breaking changes.
- Tag patterns used inside the switch-like "alt": syntax-wise this looks less
  easy to implement in D.
- I think unit testing in D needs more improvements. Rust is in a less
  developed state compared to D, yet its unit testing features already seem
  better designed. I think this is not complex stuff to design and implement.

Bye,
bearophile
Nov 03 2011
bearophile <bearophileHUGS lycos.com> writes:
I have found a slide deck, Rust All Hands Winter 2011, with some notes on
typestates too:
http://www.slideshare.net/pcwalton/rust-all-hands-winter-2011

And here are some tests about macros too (search for the word "macro"):
https://github.com/graydon/rust/tree/master/src/test/run-pass

Bye,
bearophile
Nov 03 2011
Walter Bright <newshound2 digitalmars.com> writes:
On 11/3/2011 8:14 PM, bearophile wrote:
 Mark-compact (aka moving) collectors, where live objects are moved together
 to make allocated memory more compact. Note that doing this involves
 updating pointers’ values on the fly. This category includes semispace
 collectors as well as the more efficient modern ones like the .NET CLR’s
 that don’t use up half your memory or address space. C++ cannot support
 this without at least a new pointer type, because C/C++ pointer values are
 required to be stable (not change their values), so that you can cast them
 to an int and back, or write them to a file and back; this is why we
 created the ^ pointer type for C++/CLI which can safely point into #3-style
 compacting GC heaps. See section 3.3 of my paper
 (http://www.gotw.ca/publications/C++CLIRationale.pdf ) A Design Rationale
 for C++/CLI for more rationale about ^ and gcnew.<

 Tell me if I am wrong still.

You're wrong still :-)

 How do you implement a moving GC in D if D has
 raw pointers?

It can be done if the D compiler emits full runtime type info. It's a solved
problem with GCs.

 D semantics doesn't allow the GC to automatically modify those
 pointers when the GC moves the data.

Yes, it does. I've implemented a moving collector before designing D, and I
carefully defined the semantics so that it could be done for D.

Besides, having two pointer types in D would be disastrously complex. C++/CLI
does, and C++/CLI is a failure in the marketplace. (I've dealt with multiple
pointer types from the DOS daze, and believe me it is a BAD BAD BAD idea.)
Nov 03 2011
bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:

 You're wrong still :-)

In this newsgroup I am used to being wrong several times every day :-)

 It can be done if the D compiler emits full runtime type info. It's a solved
 problem with GCs.

I see, I will have to read more on this solution.

 Besides, having two pointer types in D would be disastrously complex.

Rust has three pointer types! :-) In Ada too I think there are three types of
pointers.

 (I've dealt with multiple
 pointer types from the DOS daze, and believe me it is a BAD BAD BAD idea.)

I am not sure, but I think the situation is very different here. Here it's only
the type system that tells those pointers apart, and restricts the kinds of
operations you are allowed to do with them or changes the things they do. In
Rust it's not the kind of memory they point to that tells what they are (as I
presume was the case in DOS); you are allowed to use any of the three kinds of
pointers, as you like, for each kind of data you want. The difference is all in
their semantics. I think this is very different from the DOS pointers
situation.

From the examples of Rust code I've read, I have not seen any disaster
regarding the design of its pointers. They have implemented a non-trivial
compiler with the language, so I think the pointer situation is not awful.

Regarding pointer types, in D there are function pointers and function
delegates, they are kind of two different kinds of pointers already. They
increase language complexity, its usage, and require some conversion code, but
they are not a disaster to use.

Thank you for your answers,
bye,
bearophile
Nov 03 2011
Walter Bright <newshound2 digitalmars.com> writes:
On 11/3/2011 9:14 PM, bearophile wrote:
 Regarding pointer types, in D there are function pointers and function
 delegates, they are kind of two different kinds of pointers already.

And their only saving grace is they are not used that often, so the complexity is tolerable. This is not so for pointers.
Nov 03 2011