digitalmars.D - [GSoC Proposal] Statically Checked Measurement Units

reply Cristi Cobzarenco <cristi.cobzarenco gmail.com> writes:
First, let me apologize for this very late entry, it's the end of university
and it's been a very busy period, I hope you will still consider it.

Note this email is best read using a fixed font.

PS: I'm really sorry if this is the wrong mailing list to post and I hope
you'll forgive me if that's the case.

======= Google Summer of Code Proposal: Statically Checked Units =======


Abstract
-------------

Measurement units allow one to statically check the correctness of
assignments and expressions at virtually no performance cost and with very
little extra effort. When it comes to physics the advantages are obvious – if
you try to assign a force to a variable measuring distance, you've most
certainly got a formula wrong somewhere along the way. Also, showing a sensor
measurement in gallons on a litre display that keeps track of the remaining
fuel of a plane (a big no-no) is easily avoidable with this technique. What
this translates to is that one more of the many hidden assumptions in source
code is made visible: units naturally complement other contract checking
techniques, like assertions, invariants and the like. After all, the unit that
a value is measured in is part of the contract.

The scope of measurement units is not limited to physics calculations,
however, and if the feature is properly implemented and very easy to use,
creating very domain-specific units helps a great deal with checking code
correctness at compile time. Static typing doesn't cut it sometimes: imagine
two variables counting different things – Gadgets and Widgets. While both
values should be ints, one of them should probably not be assignable to the
other. Or imagine a website calculating the number of downloads per second
using a timer that counts milliseconds. When one thinks about it this way,
there are a great many cases where units can help prevent common errors.

Given D's focus on contract based design and language features supporting
it, I think statically checked measurement units fit very naturally into the
standard library, and the language's metaprogramming features would make it
very clean to implement (as opposed to a similar effort in C++). I think a
[…]
provided by Boost.Units:
1. Defining unit systems, as Boost.Units requires, is extra effort, so
units for counting Widgets or Gadgets would be awkward to use and we would
lose the safety checks there.
2. […] them.
3. The sort of silent conversion that Boost.Units performs is undesirable in
many cases since it is a recipe for precision disasters: imagine accidentally
assigning a variable measured in billions of years to one measured in
picoseconds. Boost.Units would silently convert one to the other, since they
measure the same dimension. This probably results in the value being set to
+INF, and even if it doesn't, one very rarely actually intends to perform
this conversion.
4. Setting numerical ids to units and dimensions is cumbersome. […]

Thus, the requirements for the unit system would be:
1. One line definition of new units.
2. Simple, yet safe and explicit conversion between units.
3. Zero runtime overhead.
4. Minimal extra coding effort to use units.


Interface Overview
---------------------------
A Boost-style approach to the library interface would be:

struct Metre  : SomeUnitBaseType!(...) {}
struct Second : SomeUnitBaseType!(...) {}

alias DerivedUnit!(Metre,1,Second,-1) MetresPerSecond;
alias DerivedUnit!(Metre,2)           MetresSquared;

Metre           metre, metres;
Second          second, seconds;
MetresPerSecond metresPerSecond;
MetresSquared   metreSquared, metresSquared;

void f() {
    Quantity!(metre)           dist1 = 3.0 * metres;
    Quantity!(metreSquared)    area  = dist1 * dist1;
    Quantity!(metresPerSecond) speed = dist1 / (2.0*seconds);
}


This is very cumbersome and fails on the one line requirement. I propose
using types for base units and mixins to define derived units. One can use
the typenames of the units in arithmetic operations this way:

struct metre  {}
struct second {}


void f() {
    Quantity!("metre")        dist1 = quantity!(3.0, "metre");
    Quantity!("metre^2")      area  = dist1 * dist1;
    Quantity!("metre/second") speed = dist1 / quantity!(2.0, "second");
}


Conversion between units can be done by specifying a single factor with the
proper unit:

template conversion( alias unit : "kilometre/metre" ) {
    // 1 metre = 0.001 kilometre
    immutable Quantity!(unit) conversion = quantity!(0.001,unit);
}

void f() {
    Quantity!("metre")     d1 = quantity!(123.0,"metre");
    // convert calls conversion! with the right argument
    Quantity!("kilometre") d2 = convert!(d1,"kilometre");
}
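As a rough illustration of what explicit-only conversion could look like, here
is a minimal sketch. It is not the string-based `conversion!`/`convert!`
interface proposed above: `Quantity`, `factor` and `convert` are hypothetical
names, units are plain tag types, and all factors live in one module, which
sidesteps the question of how user-defined factors would be looked up.

```d
import std.math : fabs;

// Hypothetical unit tags and a minimal quantity wrapper, for illustration only.
struct Metre {}
struct Kilometre {}

struct Quantity(Unit)
{
    double value;
}

// Factor by which a value in From is multiplied to obtain a value in To.
// Unsupported pairs are rejected at compile time.
template factor(From, To)
{
    static if (is(From == To))
        enum double factor = 1.0;
    else static if (is(From == Metre) && is(To == Kilometre))
        enum double factor = 0.001;
    else static if (is(From == Kilometre) && is(To == Metre))
        enum double factor = 1000.0;
    else
        static assert(false,
            "no conversion from " ~ From.stringof ~ " to " ~ To.stringof);
}

// Explicit conversion: the target unit must always be spelled out.
Quantity!To convert(To, From)(Quantity!From q)
{
    return Quantity!To(q.value * factor!(From, To));
}

void main()
{
    auto d1 = Quantity!Metre(123.0);
    auto d2 = convert!Kilometre(d1);
    static assert(is(typeof(d2) == Quantity!Kilometre));
    assert(fabs(d2.value - 0.123) < 1e-12);

    // No implicit conversion between units of the same dimension.
    static assert(!__traits(compiles, d2 = d1));
}
```

Because the factor is an `enum`, the multiplication is the only work left at
run time.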

Also, notice this approach imposes no restriction on the types that define
units, therefore our Widget/Gadget counters could be defined without any
extra work:

class Widget { /* complicated class definition */ }
class Gadget { /* complicated class definition */ }

Quantity!("Widget",int) nWidgets;
Quantity!("Gadget",int) nGadgets;
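
To make the Widget/Gadget point concrete, the following is a minimal sketch
that uses the unit type itself as a template argument instead of a string. The
`Quantity` struct shown here is a hypothetical stand-in for the proposed one,
and `Widget`/`Gadget` are empty structs rather than the "complicated classes"
above; only addition and subtraction are defined.

```d
// Hypothetical sketch: Quantity, Widget and Gadget are illustrative only.
struct Widget {}
struct Gadget {}

struct Quantity(Unit, T = double)
{
    T value;

    // Addition and subtraction are only defined between identical units.
    Quantity opBinary(string op)(Quantity rhs) const
        if (op == "+" || op == "-")
    {
        return Quantity(mixin("value " ~ op ~ " rhs.value"));
    }
}

void main()
{
    auto nWidgets = Quantity!(Widget, int)(3);
    auto nGadgets = Quantity!(Gadget, int)(5);

    auto total = nWidgets + Quantity!(Widget, int)(4);
    assert(total.value == 7);

    // Mixing units is rejected at compile time.
    static assert(!__traits(compiles, nWidgets = nGadgets));
    static assert(!__traits(compiles, nWidgets + nGadgets));

    // The wrapper adds no storage beyond the underlying value.
    static assert(Quantity!(Widget, int).sizeof == int.sizeof);
}
```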


About Me
------------
I am an undergraduate student at the University of Edinburgh in Scotland
doing a degree in Computer Science and Artificial Intelligence, originally
from Romania where I finished a specialised Computer Science high school
(Colegiul National de Informatica "Tudor Vianu").
My first language is C++, which I started learning when I was 9, and as a
result I have a very good understanding of template metaprogramming. I also
know Haskell and Python well, which helps me draw from multiple paradigms
when designing a system. I started learning D about a year ago and I
instantly fell in love with it. The fact that it does away with all the
annoying C backwards compatibility, improves on the features that make C++
unique (the template system, performance, low level memory access etc.) and
adds modern language features (garbage collection, lambda functions etc.)
makes me very optimistic about the project.
In terms of working experience, other than a myriad of personal projects, I
worked for my former high school for a summer, implementing an automated
testing system, and last year I led a team that won a software
development competition organised by our computing society. This year I took
part in the Scottish Game Jam, where my team ended up in 8th place.


--
(Cristi Cobzarenco)
Profile: http://www.google.com/profiles/cristi.cobzarenco
Mar 28 2011
next sibling parent reply David Nadlinger <see klickverbot.at> writes:
On 3/28/11 5:43 PM, Cristi Cobzarenco wrote:
 First, let me apologize for this very late entry, it's the end
 of university and it's been a very busy period, I hope you will still
 consider it.
This is by no means a late proposal – the application period has not even
formally opened yet.

I was somewhat surprised to see your post, because I had been playing
around with a dimensional algebra/unit system implementation for a while
before the whole GSoC thing started. I even have an unfinished project
proposal for it lying around, but as I thought that I was rather alone with
my fascination for unit systems, I decided to finish the Thrift one first.
Well, seems like I was wrong… :)

A few things that came to my mind while reading your proposal:

- The need for numerical ids is not inherent to a dimension-aware model
like Boost.Units – it's just needed because there is no way to get a strict
total order of types in C++. You will need to find a solution for this in D
as well, because you want to make sure that »1.0 * second * metre« is of the
same type as »1.0 * metre * second«.

- I think that whether disallowing implicit conversion of e.g. millimeters
to meters is a good idea has to be decided once the rest of the API has been
implemented and one can actually test how the API »feels« – although I'd
probably disallow them in my design by default as well, I'm not too sure if
this actually works out in practice, or if it is just cumbersome to use.
Also, I'd like to note that whether to allow this kind of implicit
conversion doesn't necessarily depend on whether a system has the notion of
dimensions.

- Not that I would be too fond of the Boost.Units design, but
»convenience« functions for constructing units from strings like in your
second example could be implemented for it just as well.

- You have probably already thought of this and omitted it for the sake of
brevity, but I don't quite see how your current, entirely string-based
proposal would work when units and/or conversion factors are defined in a
module different from the one the implementations of Quantity/convert/… are
in. Contrary to C++, D doesn't have ADL, so I am not sure how a custom base
unit would be in scope for Quantity to find, or how a custom conversion
template could be found from convert().

Anyway, I am not sure whether I should submit my own units proposal as
well, but if you should end up working on this project, I'd be happy to
discuss any design or implementation issue you'd like to.

David
Mar 28 2011
next sibling parent reply Cristi Cobzarenco <cristi.cobzarenco gmail.com> writes:
- I too was playing around with a units project before GSoC, that is why I
thought doing this project was a good idea. The way I was doing it without
numerical IDs was simply by having more complicated algorithms for equality,
multiplications etc. For example, equality would be implemented as:
template UnitTuple(P...) {
  alias P Units;
}

template contains( Unit, UT ) {
  /* do a linear search for Unit in UT.Units (since UT is a UnitTuple) - O(n) */
}

template includes( UT1, UT2 ) {
  /* check for each Unit in UT1 that it is also in UT2 (using contains) - O(n^2) */
}

template equals( UT1, UT2 ) {
  immutable bool equals = includes!(UT1,UT2) && includes!(UT2, UT1);
}
Granted this means that each check takes O(n^2) where n is the number of
different units, but it might be worth it - or not. On the small tests I've
done it didn't seem to increase compile time significantly, but more
research needs to be done. I think that as long as there aren't values with
_a lot_ of units (like ten), the extra compile time shouldn't be noticeable.
The biggest problem I have with adding IDs is that one will have to manage
systems afterwards or have to deal with collisions. Neither one is very
nice.
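
For reference, the outline above can be filled in roughly as follows. This is
only a sketch under a few assumptions: it relies on `staticIndexOf` from
today's `std.meta`, it turns `UnitTuple` into a struct so that a unit list is
a type and can be passed as a plain type argument, and it ignores exponents
and repeated units entirely.

```d
import std.meta : staticIndexOf;

struct Metre {}
struct Second {}
struct Kilogram {}

// A struct (rather than a bare template) so that a unit list is itself a
// type and can be passed as an ordinary template type argument.
struct UnitTuple(P...)
{
    alias Units = P;
}

// Linear search for Unit in UT.Units: O(n).
enum contains(Unit, UT) = staticIndexOf!(Unit, UT.Units) != -1;

// Every unit in UT1 also occurs in UT2: O(n^2).
template includes(UT1, UT2)
{
    alias U = UT1.Units;
    static if (U.length == 0)
        enum includes = true;
    else
        enum includes = contains!(U[0], UT2)
            && includes!(UnitTuple!(U[1 .. $]), UT2);
}

// Set equality: each list is contained in the other.
enum equals(UT1, UT2) = includes!(UT1, UT2) && includes!(UT2, UT1);

static assert( equals!(UnitTuple!(Metre, Second), UnitTuple!(Second, Metre)));
static assert(!equals!(UnitTuple!(Metre, Second), UnitTuple!(Metre, Kilogram)));
static assert(!equals!(UnitTuple!(Metre), UnitTuple!(Metre, Second)));

void main() {}
```

Note that this only answers whether two unit lists are equivalent; the two
lists remain distinct types as far as the compiler is concerned.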

- You're right, you don't need dimensions for implicit conversions, of
course. And you're also right about possibly making the decision later about
[…] popular, only has explicit conversions, and I was trying to steer more
towards that model.

- I seem not to have been too clear about the way I would like to use
strings. The names of the units in the strings have to be the type names
that determine the units. Then one needs a function that would convert a
string like "Meter/Second" to Division!(Meter, Second), I'm not sure how you
would do that in C++. Maybe I'm wrong, but I can't see it.

- I hope it is by now clear that my proposal is not, in fact, string based
at all. The strings are just there to be able to write derived units in
infix notation, something boost solves by using dummy objects with
overloaded operators. The lack of ADL is a problem which I completely
missed; I have immersed myself in C++ completely lately and I've gotten used
to specializing templates in different scopes. These are the solutions I can
come up with, but I will have to think some more:
1. There is an intrusive way of solving this, by making the conversion
factors static members of the unit types, but this would not allow, for
example, having a Widget/Gadget counter the way I intended.
2. […] factors, that one manually uses. That actually is not bad at all. The
only problem was that I was hoping that conversion between derived units
could automatically be done using the conversion factors of the fundamental
units: (meter/second) -> (kilometer/hour) knowing meter->kilometer and
second->hour.

Again I will have to think some more about the latter point. And I'll do
some more tests on the performance of doing linear searches. Is there a way
to get the name of a type (as a string) at compile time (not the mangled name
you get at runtime)? I wasn't able to find any way to do this. My original
idea was actually to use the fully qualified typenames to create the
ordering.

Thanks a lot for your feedback, it's been very helpful, especially in
pointing out the lack of ADL. Hope to hear from you again.

Mar 28 2011
parent reply David Nadlinger <see klickverbot.at> writes:
I am in a slight dilemma, because although I would love to share my work 
and ideas with you, right now this would automatically weaken my own 
units proposal in comparison to yours. However, as this would be grossly 
against the open source spirit, and the point of GSoC certainly can't be 
to encourage that, I'll just do it anyway.

Regarding IDs: As I wrote in my previous post, the only point of the 
unit IDs in Boost.Units is to provide a strict total order over the set 
of units. If you can achieve it without that (see below), you won't need 
any artificial numbers which you have to manage.

But why would you need to be able to sort the base units in the first 
place? The answer is simple: To define a single type representation for 
each possible unit, i.e. to implement type canonicalization. To 
illustrate this point, consider the following (pseudocode) example:

auto force = 5.0 * newton;
auto distance = 3.0 * meter;
Quantity!(Newton, Meter) torque = force * distance;
torque = distance * force;

Both of the assignments to »torque« should obviously work, because the 
types of »force * distance« and »distance * force« are semantically the 
same. In a naïve implementation, however, the actual types would be 
different because the pairs of base units and exponents would be 
arranged in a different order, so at least one of the assignments would 
lead to type mismatch – because a tuple of units is, well, a tuple and 
not an (unordered) set.

And this is exactly where the strictly ordered IDs enter the scheme. By 
using them to sort the base unit/exponent pairs, you can guarantee that 
quantities semantically equivalent always end up with the same 
»physical« type.

Luckily, there is no need to require the user to manually assign 
sortable, unique IDs to each base type because we can access the mangled 
names of types at compile time, which fulfill these requirements. There 
are probably other feasible approaches as well, but using them worked 
out well for me (you can't rely on .stringof to give unique strings). 
When implementing the type sorting code, you might probably run into 
some difficulties and/or CTFE bugs, feel free to contact me for related 
questions (as I have already wasted enough time on this to get a working 
solution…^^).
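
A minimal sketch of the canonicalization idea, assuming the `staticSort`
template from today's `std.meta` (which was not available when this was
written) and a deliberately simplified `Quantity` that carries only a list of
unit types, without exponents. `mangledLess` and `Canonical` are hypothetical
names.

```d
import std.meta : staticSort;

struct Metre {}
struct Newton {}

// Order two types by their mangled names, which are known at compile time.
enum mangledLess(A, B) = A.mangleof < B.mangleof;

// Simplified quantity: just a value tagged with a list of unit types.
struct Quantity(Units...)
{
    double value;
}

// Sort the unit list before instantiating Quantity, so that every ordering
// of the same units maps to one and the same type.
alias Canonical(Units...) = Quantity!(staticSort!(mangledLess, Units));

// Without sorting, the order of the units leaks into the type...
static assert(!is(Quantity!(Newton, Metre) == Quantity!(Metre, Newton)));
// ...with sorting, both spellings name a single type.
static assert(is(Canonical!(Newton, Metre) == Canonical!(Metre, Newton)));

void main()
{
    Canonical!(Newton, Metre) torque = Canonical!(Metre, Newton)(15.0);
    assert(torque.value == 15.0);
}
```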

Regarding strings: I might not have expressed my doubts clearly, but I 
didn't assume that your proposed system would use strings as internal 
representation at all. What I meant is that I don't see a way how, given 
»Quantity!("Widgets/Gadgets")«, to get the Widget and Gadget types in 
scope inside Quantity. Incidentally, this is exactly the reason for 
which you can't use arbitrary functions/types in the »string lambdas« 
from std.algorithm.

David


Mar 29 2011
parent reply Cristi Cobzarenco <cristi.cobzarenco gmail.com> writes:
Surely, .mangleof returns unique strings? Thanks for your offer, but in my
prototype I already have sorting and operators working. You're right, again,
about the scope of the types, I have a few ideas on how to work around that,
but I don't like any of them too much, I'll play around with them and tell
you more. Thanks a lot for your feedback, I feel this collaboration will
help D in the end, no matter whose proposal gets accepted (if any). I am a
bit confused regarding your GSoC proposal, aren't you a mentor?

Mar 29 2011
parent David Nadlinger <see klickverbot.at> writes:
On 3/29/11 2:33 PM, Cristi Cobzarenco wrote:
 Surely, .mangleof returns unique strings?
Yes, .mangleof returns unique strings for types. The stringof property which was suggested by other people here on the NG, however, is not unique.
 […] Thanks a lot for your feedback, I feel this
 collaboration will help D in the end, no matter whose proposal gets
 accepted (if any). I am a bit confused regarding your GSoC proposal,
 aren't you a mentor?
No, I'm just hoping to participate in this GSoC as a student as well.

To clarify the situation: Having experienced how incredibly useful
dimensional analysis is in many areas of science, I have long been
interested in possible ways of using unit systems in programming to gain
additional type safety. Earlier this year, before a possible application to
GSoC was even brought up in the D community, I started to work on a D
implementation of a unit system. I finished a working prototype, but didn't
have the time yet to implement a flexible unit conversion scheme and, more
importantly, extend the documentation and examples so I could put it up for
discussion at the D NG.

Then, it was announced that Digital Mars would participate in this Google
Summer of Code, and surprisingly it didn't take long until someone added a
unit system to the ideas page. As I was considering applying to GSoC anyway,
this seemed like a natural fit. However, Andrei also put up the idea of a D
implementation of Apache Thrift, which caught my attention as I have been
waiting for the opportunity to have an in-depth look at it for quite some
time now.

As I am equally interested in both topics, and students are allowed to
submit a large number of proposals (20?), I decided to just write project
proposals for both of them and let Walter/Andrei/… choose which one they
like better, if any. I decided to start with the Thrift one, and planned to
submit my units proposal later in the application period. After publishing
my first draft here at the NG, I also contacted Andrei for his opinion on
whether it would make sense to submit a second proposal, given that he
seemed quite interested in the Thrift idea.

Now, back to topic: I am absolutely sure that collaborating on this project
will lead to better results (I mean, that's how open source software works
after all), but there is a problem: By the GSoC rules, it's not possible for
students to work in teams on a single project. The dilemma I hinted at is
that if we start working together right now, we'll probably end up with two
almost identical proposals/applications for the same project, which doesn't
really seem desirable.

Also, I'm increasingly doubtful that a units library would be a good fit
for a Summer of Code project in the first place, which is also why I
finished my other proposal first: As Don said, I think that while it
certainly is a nice demonstration of the metaprogramming capabilities/type
system expressiveness of a language, it might not be too useful for the
»general public«, compared to other features.

Don't get me wrong here, I'm personally very enthusiastic about the idea,
and I can imagine many possible ways in which a flexible unit system could
be used to avoid bugs or to clarify interfaces. But: The concept isn't new
at all – for example, during my research I stumbled over papers dedicated to
units in programming languages dating back to 1985 –, but I have yet to see
units actually being used in production code.

My second concern is the extent of the project: After spending two weekends
on it, I have a working prototype of a units library, and, if I understood
you correctly, you have one as well. They surely both lack some features and
a lot of polish and documentation, but I think it would probably take
neither of us three full months of work to get them into a state suitable
for inclusion in the Phobos review queue.

For these reasons, I really started to wonder if it wouldn't be the better
idea to just merge our projects and work on getting the result into shape
independent of GSoC when I saw your proposal – even more so since our
design/implementation ideas have shown to be quite similar. I don't want to
discourage you from applying at all, and I will probably still submit a
proposal for it nevertheless, but I think this should be discussed.

David
Mar 29 2011
prev sibling parent Jens Mueller <jens.k.mueller gmx.de> writes:
Cristi Cobzarenco wrote:
 Again I will have to think some more about the latter point. And I'll do
 some more tests on the performance of doing linear searches. Is there way to
 get the name of a type (as a string) at compile time (not the mangled name
 you get at runtime)? I wasn't able to find any way to do this. My original
 idea was actually to use the fully qualified typenames to create the
 ordering.
T.stringof, where T is some type, gives you the name of the type at compile time.

Jens
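A minimal sketch of the property mentioned above, assuming two empty unit structs like the ones used elsewhere in this thread. It only illustrates that `.stringof` is available at compile time and can therefore be used inside `static assert` or to order types; `meterBeforeSecond` is a name made up for the illustration.

```d
struct Meter {}
struct Second {}

// .stringof yields the name of the type as a compile-time string.
static assert(Meter.stringof == "Meter");
static assert(Second.stringof == "Second");

// Because the names are compile-time constants, they can be compared
// lexicographically, e.g. to impose an order on unit types.
enum bool meterBeforeSecond = Meter.stringof < Second.stringof;
static assert(meterBeforeSecond);

void main() {}
```

Note that `.stringof` is not qualified with the module name, so two types with the same name in different modules would compare equal under this ordering.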
Mar 28 2011
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/28/11 10:43 AM, Cristi Cobzarenco wrote:
 First, let me apologize for this very late entry, it's the end
 of university and it's been a very busy period, I hope you will still
 consider it.

 Note this email is best read using a fixed font.

 PS: I'm really sorry if this is the wrong mailing list to post and I
 hope you'll forgive me if that's the case.

 ======= Google Summer of Code Proposal: Statically Checked Units =======
[snip]

This is a good place to discuss pre-submission proposals. To submit, go to http://d-programming-language.org/gsoc2011.html later today.

This is a strong draft proposal that I am likely to back up when complete. A few notes:

* There is a good overview of existing work, which puts the proponent in the right position to make the best choices for an implementation in D.

* Human-readable strings as means to generate types is a fertile direction. One issue is canonicalization, e.g. "meters^2" would be a different type from "meters ^ 2" (and btw that should mimic D's operators, so it should use "^^"), and both are different from the semantically equivalent "meters*meters". I think this is avoidable by a function that brings all strings to a canonical form. This needs to be discussed in the proposal.

* The approach to quantities of discrete objects (widgets, gadgets and I hope to see examples with degrees, radians etc.) is very promising. I'm also looking forward to a "Categorical" type - an integer-based quantity that describes a bounded enumeration of objects, for example "CityID". Categorical measures are not supposed to support arithmetic; they simply identify distinct objects in an unrelated space.

* In the final proposal the scope of the library should be clarified (e.g. what kinds of units and related idioms will be supported, and what kinds you chose not to support and why).

* At best the proposal could define and project a relationship with std.datetime, which defines a few units itself. Wonder whether it's possible to simplify std.datetime by using the future units library.

Thanks for your interest, and good luck!

Andrei
Mar 28 2011
next sibling parent Cristi Cobzarenco <cristi.cobzarenco gmail.com> writes:
Thanks for your answer!

- I agree that using strings to represent units is not a particularly good
idea. Since many people have noted related things, I seem not to have been
particularly clear about the way I intend to use strings. Let me try to
explain it in detail:

There is a type that determines the unit:
struct Meter {}

Then every quantity is parametrised with two aliases:
Quantity!(UnitList, ValueType)

UnitList represents a list of pairs (UnitType,Exponent), where UnitType is a
typename (like Meter) and Exponent is a static rational type. Therefore, the
following would be a valid quantity type:
Quantity!( UnitList!( UnitPair!(Meter,1) ), double )

The strings are parsed at compile time and converted (using mixins) into the
UnitList. For example:
ParseUnitString!("meters/second") -> UnitListDivision!( UnitList!(
UnitPair!(Meter,1) ), UnitList!( UnitPair!(Second,1) ) ) -> UnitList!(
UnitPair!(Meter,1), UnitPair!(Second,-1) ).

Therefore there is no need to convert all strings to a canonical form; they
are all converted to an alias tuple (UnitList). To check whether two
UnitLists are the same, one can check double inclusion. What do you think,
does this make sense?
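The scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposal's actual code: exponents are plain `int`s instead of static rationals, the helper templates `contains`, `includedIn` and `sameUnits` are names invented here, and a list is assumed never to mention the same unit twice.

```d
struct Meter {}
struct Second {}

// One (unit, exponent) entry; integer exponents only, for brevity.
struct UnitPair(U, int e) {}

// A unit list wraps a compile-time sequence of pairs.
struct UnitList(P...)
{
    alias Pairs = P;
}

// True if the pair P occurs somewhere in Pairs.
template contains(P, Pairs...)
{
    static if (Pairs.length == 0)
        enum contains = false;
    else static if (is(P == Pairs[0]))
        enum contains = true;
    else
        enum contains = contains!(P, Pairs[1 .. $]);
}

// True if every pair of list A also occurs in list B.
template includedIn(A, B)
{
    static if (A.Pairs.length == 0)
        enum includedIn = true;
    else
        enum includedIn = contains!(A.Pairs[0], B.Pairs)
            && includedIn!(UnitList!(A.Pairs[1 .. $]), B);
}

// Double inclusion: A is included in B and B is included in A.
enum sameUnits(A, B) = includedIn!(A, B) && includedIn!(B, A);

struct Quantity(Units, T)
{
    T value;

    // Addition is accepted only when both unit lists contain the same
    // pairs, in whatever order they were written.
    auto opBinary(string op, OtherUnits)(Quantity!(OtherUnits, T) rhs) const
        if (op == "+" && sameUnits!(Units, OtherUnits))
    {
        return Quantity!(Units, T)(value + rhs.value);
    }
}

alias MS1 = UnitList!(UnitPair!(Meter, 1), UnitPair!(Second, -1));
alias MS2 = UnitList!(UnitPair!(Second, -1), UnitPair!(Meter, 1));
alias M = UnitList!(UnitPair!(Meter, 1));

void main()
{
    auto a = Quantity!(MS1, double)(3.0);
    auto b = Quantity!(MS2, double)(4.0);
    auto c = a + b; // same units, different order: accepted
    assert(c.value == 7.0);

    // Adding a speed to a length is rejected at compile time.
    static assert(!__traits(compiles, a + Quantity!(M, double)(1.0)));
}
```

With this approach `Quantity!(MS1, double)` and `Quantity!(MS2, double)` remain distinct D types that merely interoperate; making them the same type requires some canonical ordering of the pairs.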

 - The Categorical type sounds like a great idea. I think they could be
passed on as a ValueType to a quantity:
typedef Quantity!(City, BoundedInt!(0,100)) CityID;

And BoundedInt is just a type implicitly-convertible to and from int, that
supports assignment and equality and throws on an out-of-bounds assignment.
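A possible shape for such a type is sketched below. `BoundedInt` is the hypothetical name used above; the member names (`value_`, `get`) and the use of a plain `Exception` are assumptions made for the illustration.

```d
struct BoundedInt(int min, int max)
{
    private int value_ = min;

    // Construction from int goes through the same range check.
    this(int v) { opAssign(v); }

    // Assignment from int throws if the value is out of bounds.
    void opAssign(int v)
    {
        if (v < min || v > max)
            throw new Exception("BoundedInt: value out of bounds");
        value_ = v;
    }

    int get() const { return value_; }

    // Implicit conversion to int; this also makes `id == 42` work.
    alias get this;
}

void main()
{
    alias CityID = BoundedInt!(0, 100);

    CityID id = 42;  // calls this(int)
    int raw = id;    // implicit conversion via alias this
    assert(raw == 42);

    id = 100;        // in range
    assert(id == 100);

    bool threw = false;
    try
        id = 101;    // out of range
    catch (Exception e)
        threw = true;
    assert(threw && id == 100); // the old value is kept
}
```

One caveat: because of `alias this`, arithmetic on the underlying `int` remains reachable, so a truly categorical type would probably want to expose only equality instead of the implicit conversion.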

What do you think?





-- (Cristi Cobzarenco) Profile: http://www.google.com/profiles/cristi.cobzarenco
Mar 28 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On 2011-03-28 12:53, Andrei Alexandrescu wrote:
 * At best the proposal could define and project a relationship with
 std.datetime, which defines a few units itself. Wonder whether it's
 possible to simplify std.datetime by using the future units library.
Well, I can't say what's possible before we actually have a proposed units module, but I doubt that there's much in std.datetime which could be simplified by having a units library. The units portion of it is a fairly small piece. There are functions templatized on time units, but that's all so generic that it's not exactly much code. So, it'll be interesting to see how a units module might relate to it, but I doubt that it would really do much to simplify it.

- Jonathan M Davis
Mar 28 2011
prev sibling parent reply spir <denis.spir gmail.com> writes:
On 03/28/2011 10:13 PM, Cristi Cobzarenco wrote:
   - The Categorical type sounds like a great idea. I think they could be
 passed on as a ValueType to a quantity:
 typedef Quantity!(City, BoundedInt!(0,100)) CityID;

 And BoundedInt is just a type implicitly-convertible to and from int, that
 supports assignment and equality and throws on an out-of-bounds assignment.
I would implement something like Categorical in a language that has no enum. But I do not see the point in D. What advantage does it bring? Denis -- _________________ vita es estrany spir.wikidot.com
Mar 28 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/28/11 6:09 PM, spir wrote:
 On 03/28/2011 10:13 PM, Cristi Cobzarenco wrote:
 - The Categorical type sounds like a great idea. I think they could be
 passed on as a ValueType to a quantity:
 typedef Quantity!(City, BoundedInt!(0,100)) CityID;

 And BoundedInt is just a type implicitly-convertible to and from int,
 that
 supports assignment and equality and throws on an out-of-bounds
 assignment.
I would implement something like Categorical in a language that has no enum. But I do not see the point in D. What advantage does it bring? Denis
A categorical type may not have a name for each value (userid, cityid, countryid...) Andrei
Mar 28 2011
prev sibling parent reply Don <nospam nospam.com> writes:
Cristi Cobzarenco wrote:
 First, let me apologize for this very late entry, it's the end 
 of university and it's been a very busy period, I hope you will still 
 consider it.
 
 Note this email is best read using a fixed font.
 
 PS: I'm really sorry if this is the wrong mailing list to post and I 
 hope you'll forgive me if that's the case.
 
 ======= Google Summer of Code Proposal: Statically Checked Units =======
 
 
 Abstract
 -------------
 
 Measurement units allow to statically check the correctness of 
 assignments and expressions at virtually no performance cost and very 
 little extra effort. When it comes to physics the advantages are obvious 
 – if you try to assign a force a variable measuring distance, you've 
 most certainly got a formula wrong somewhere along the way. Also, 
 showing a sensor measurement in gallons on a litre display that keeps 
 track of the remaining fuel of a plane (a big no-no) is easily avoidable 
 with this technique. What this translates is that one more of the many 
 hidden assumptions in source code is made visible: units naturally 
 complement other contract checking techniques, like assertions, 
 invariants and the like. After all the unit that a value is measured in 
 is part of the contract.
This is one of those features that gets proposed frequently in multiple languages. It's a great example for metaprogramming. But are there examples of this idea being seriously *used* in production code in ANY language? (For example, does anybody actually use Boost.Units?)
Mar 29 2011
next sibling parent reply Cristi Cobzarenco <cristi.cobzarenco gmail.com> writes:
To Don:
That is a very good point and I agree that one shouldn't implement features
just because they're popular. There don't seem to be many (if any projects)

But I think the reason Boost.Units isn't used has less to do with the idea
than with the implementation. Using units in Boost is very cumbersome. Adding
new units (measuring different dimensions) on the fly is virtually impossible.
I think that Boost.Units misses the point of units. They should be a natural
extension of the type system of a language, not something limited to the area
of natural sciences. D is a new language and we should be pushing the envelope;
just because Boost.Units failed (if it did; it may very well take off later)
doesn't mean we shouldn't do it. Since it is such a new feature, I think we
should talk about its potential rather than its acceptance.



and people are still trying to figure out exactly how to use it. I feel that in

agree on some conventions and good practices.
As I said in the abstract, I think the feature fits snugly with other
mechanisms in D and seems to be a natural part of a contract-based design,
so D programmers should have a predisposition (that C++ programmers might
not have) of adopting such a feature.

I really hope this doesn't come off as rude; as I said, you make a very good
point, one that needs answering. I guess what I'm saying can be summed up
as: it is a new feature; there have been mistakes; it has a lot of potential
and we can make it better. I'd be curious to hear what you think.

To spir:
Calling the string representation a small domain-specific language is
perfect. It is just that, a way of writing arithmetic expressions between
types, something we couldn't do inside D grammar. It's much like the lambda
definitions in functional. I too am queasy about using strings to represent
code, but I think that small DSLs that save effort and improve readability
are one place where it's OK.
Parsing the expressions at compile time will be fun; thankfully one only
needs a stack to do that (Dijkstra's shunting-yard algorithm), which is very
easy to implement in functional-style metaprogramming land.
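To make the shunting-yard remark concrete, here is a small sketch that converts a unit expression to postfix (RPN) order at compile time. It is an assumption-laden illustration rather than the proposal's parser: it is written as an ordinary function evaluated through CTFE instead of as recursive templates, the function names are invented, and it only understands `*`, `/`, `^`, parentheses, unit names and (possibly negative) integer exponents.

```d
import std.ascii : isAlpha, isDigit;

// '^' binds tighter than '*' and '/'.
int precedence(char op) { return op == '^' ? 2 : 1; }

string[] toRpn(string src)
{
    string[] output; // operands and operators in postfix order
    char[] ops;      // the operator stack
    size_t i = 0;
    while (i < src.length)
    {
        immutable c = src[i];
        if (c == ' ')
        {
            ++i;
        }
        else if (isAlpha(c) || isDigit(c) || c == '-')
        {
            // Unit names and integer exponents are operands.
            immutable start = i++;
            while (i < src.length && (isAlpha(src[i]) || isDigit(src[i])))
                ++i;
            output ~= src[start .. i];
        }
        else if (c == '(')
        {
            ops ~= c;
            ++i;
        }
        else if (c == ')')
        {
            // Pop operators until the matching '(' is found.
            while (ops.length && ops[$ - 1] != '(')
            {
                output ~= [ops[$ - 1]].idup;
                ops = ops[0 .. $ - 1];
            }
            assert(ops.length, "unbalanced parentheses");
            ops = ops[0 .. $ - 1]; // drop the '('
            ++i;
        }
        else
        {
            assert(c == '*' || c == '/' || c == '^', "unexpected character");
            // Pop operators that bind at least as tightly; '^' is
            // right-associative, so equal precedence does not pop for it.
            while (ops.length && ops[$ - 1] != '('
                && (precedence(ops[$ - 1]) > precedence(c)
                    || (precedence(ops[$ - 1]) == precedence(c) && c != '^')))
            {
                output ~= [ops[$ - 1]].idup;
                ops = ops[0 .. $ - 1];
            }
            ops ~= c;
            ++i;
        }
    }
    while (ops.length)
    {
        assert(ops[$ - 1] != '(', "unbalanced parentheses");
        output ~= [ops[$ - 1]].idup;
        ops = ops[0 .. $ - 1];
    }
    return output;
}

// Forcing evaluation at compile time by initialising an enum.
enum rpn = toRpn("Metre/Second * Kg^-1");
static assert(rpn == ["Metre", "Second", "/", "Kg", "-1", "^", "*"]);

void main() {}
```

A second pass over the postfix sequence could then combine unit types with a small evaluation stack. Implicit multiplication by juxtaposition (as in "m^2 kg") is not handled by this sketch.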

To David:
Using T.stringof, we can define a total order on types, based on their
typenames. I'm still thinking about conversion.

To Andrei and Jens:
std.datetime won't be simplified _that_ much, but it will probably require
some work so that it uses the same unit system as the future units library.
I would, of course, take care of this as well.

-- (Cristi Cobzarenco) Profile: http://www.google.com/profiles/cristi.cobzarenco
Mar 29 2011
parent reply Don <nospam nospam.com> writes:
Cristi Cobzarenco wrote:
 To Don:
 That is a very good point and I agree that one shouldn't implement 
 features just because they're popular. There don't seem to be many (if 

 But, I think the reason Boost.Units isn't use hasn't got much to do with 
 the idea as much as it does with the implementation. Using units in 
 Boost is very cumbersome. Adding new units (measuring different 
 dimensions) on-the-fly is virtually impossible. I think that that 
 Boost.Units misses the point of units. They should be a natural 
 extension of the type system of a language, not something so limited to 
 the area of natural sciences. D is a new language and we should be 
 pushing the envelope, just because the Boost failed (if it did, it may 
 very well kick-off later) doesn't mean we shouldn't do it. Since it is 
 such a new feature, I think we should talk about its potential rather 
 than its acceptance.



 and people are still trying to figure out exactly how to use it. I feel 

 agree on some conventions and good practices.
 As I said in the abstract, I think the feature fits snugly with other 
 mechanisms in D and seems to be a natural part of a contract-based 
 design, so D programmers should have a predisposition (that C++ 
 programmers might not have) of adopting such a feature.
 
 I really hope this doesn't come off as rude; as I said, you make a very 
 good point, one that needs answering. I guess what I'm saying can be 
 summed up as: it is a new feature; there have been mistakes; it has a 
 lot of potential and we can make it better. I'd be curious to hear what 
 you think.
I'm a physicist and most of my programming involves quantities which have units. Yet, I can't really imagine myself using a units library. A few observations from my own code:

* For each dimension, choose a unit, and use it throughout the code. For example, my code always uses mm because it's a natural size for the work I do. Mixing (say) cm and m is always a design mistake. Scaling should happen only at input and output, not in internal calculations. (So my feeling is that the value of a units library would come from keeping track of dimension rather than scale.)

* Most errors involving units can, in my experience, easily be flushed out with a couple of unit tests. This is particularly true of scale errors. The important use cases would be situations where that isn't true.

* Arrays are very important. Although an example may have force = mass * acceleration, in real code mass won't be a double, it'll be an array of doubles.
 Since it is
 such a new feature, I think we should talk about its potential rather
 than its acceptance.
I'm really glad you've said that. It's important to be clear that doing a perfect job on this project does not necessarily mean that we end up with a widely used library. You might be right that the implementations have held back widespread use -- I just see a significant risk that we end up with an elegant, well written library that never gets used. If the author is aware of that risk, it's OK. If not, it would be a very depressing thing to discover after the project was completed.
Mar 29 2011
next sibling parent reply Cristi Cobzarenco <cristi.cobzarenco gmail.com> writes:
To David:
Ok, right now I have two working versions, one sorting by .mangleof and one
performing a double-inclusion test on the tuples. Both work, and I can't see any
performance increase in the .mangleof one, but if .mangleof returns a unique
string, I say we use it this way.
Regarding my little string DSL, I have 3 solutions right now:
1. Drop the DSL altogether, right now my system would work perfectly fine
with boost-like tuples (a list of units alternating with exponents):
Quantity!(Metre,1,Second,-1) speed = distance/time;
While less readable, this doesn't have the disadvantages of the following 2.

2. Use a mixin template to declare the expression parser in the current
scope:
mixin DeclareExprQuantity!();

struct Metre {}
struct Second {}
struct Kg {}

void f() {
       ExprQuantity!("Metre/Second * Kg^-1") q = speed / mass;
}
This works, is readable, but it uses C-preprocessor-like behaviour (read:
black voodoo) - a library declaring something in your scope isn't very nice.

3. Abandon using types as units and just use strings all the way. This
doesn't guarantee unit name uniqueness, and a misspelled unit name is a new
unit. One could use an algorithm to convert all strings to a canonical form
(like Andrei suggested) and then use string equality for unit equality.

What do you think? I'm personally quite divided:
1. I like that this is simple and it works. It makes writing derived units
unnatural though.
2. I actually like this one, despite the obvious ugliness. It's just one
extra line at the beginning of your code, and you can then use arithmetic
operations and use type-uniqueness to guarantee unit-uniqueness.
3. This is a bit dangerous. It works very well as long as there isn't more
than one system of units. I still like it a bit.

The only completely clean alternative would be the abominable:
Quantity!( mixin(Expr!("Metre/Second")) ) q;



To Don:
* Choosing one unit and using it is still a very good idea. As I said, there
are to be no implicit conversions, so this system would ensure you don't, by
mistake, violate this convention. Also, somebody else using your library
may assume everything is in meters when in fact you use millimeters.
Sure, they should check the documentation, but it's better if they get a nice
error message like "Inferred unit Meter doesn't match expected Millimeter".
* True, scale errors can be figured out easily, but multiplying something by
an acceleration instead of a velocity, or forgetting to multiply an acceleration
by a timestep, isn't as easily checked. Multiplying instead of dividing in a
formula, or forgetting to divide by a normalisation constant, are other
mistakes that are caught instantly by unit checking.
* Arrays & vectors are very important, I agree. The Quantity! type is
parametrised both by a unit and a value type, therefore, if one wants a
vector whose components are of the same unit, using "Quantity!(Metre,
Vector!(double))" would work. Vector!(Quantity!(Metre,double)) would also
work. As long as the value type has arithmetic operations defined everything
works out. Same goes for arrays.

There is a risk that it never gets used, sure. But I think that units will
become commonplace, some time in the future, so while it won't get wide
acceptance very soon, at some point people will be looking on Wikipedia for
"Languages supporting measurement units" and it will be good for D to show
up there.




-- (Cristi Cobzarenco) Profile: http://www.google.com/profiles/cristi.cobzarenco
Mar 29 2011
parent reply David Nadlinger <see klickverbot.at> writes:
On 3/29/11 3:49 PM, Cristi Cobzarenco wrote:
 To David:
 Ok, right now, I got two working versions, one sorting by .mangleof and
 one performing a double-inclusion test on the tuples. Both work, I can't
 see any performance increase in the .mangleof one, but if .mangleof
 returns unique string, I say we use it this way.
To be honest, I still don't see how you are able to get away without canonicalization in the first place; would you mind elaborating on how you solve the issue of differently ordered expressions yielding different types? This is not about the algorithm to determine whether two types are semantically equivalent, where your algorithm would work fine as well, but about the actual D types. If you don't sort them, Quantity!(BaseUnitExp!(Meter, 1), BaseUnitExp!(Second, -2)) and Quantity!(BaseUnitExp!(Second, -2), BaseUnitExp!(Meter, 1)) would be different types, which is not desirable for obvious reasons.
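The sorting approach can be illustrated with a short sketch. It assumes a present-day Phobos, whose `std.meta.staticSort` did not exist when this thread was written; `mangleLess` and `QuantityImpl` are names invented for the example, and `BaseUnitExp` is reduced to an empty struct.

```d
import std.meta : staticSort;

struct Meter {}
struct Second {}

// One base unit raised to an integer exponent.
struct BaseUnitExp(U, int exponent) {}

// Orders two types by their mangled names, which are compile-time strings.
enum mangleLess(A, B) = A.mangleof < B.mangleof;

// The struct that actually gets instantiated; it only ever sees sorted lists.
struct QuantityImpl(Units...)
{
    double value;
}

// The user-facing name sorts its arguments first, so every ordering of the
// same unit/exponent pairs resolves to one and the same D type.
alias Quantity(Units...) = QuantityImpl!(staticSort!(mangleLess, Units));

alias A = Quantity!(BaseUnitExp!(Meter, 1), BaseUnitExp!(Second, -2));
alias B = Quantity!(BaseUnitExp!(Second, -2), BaseUnitExp!(Meter, 1));
static assert(is(A == B));

void main()
{
    A acceleration = B(9.81); // no conversion needed: A and B are one type
    assert(acceleration.value == 9.81);
}
```

One limitation of routing through an alias template is that function templates cannot deduce `Units` from a `Quantity!(...)` parameter; they would have to match on `QuantityImpl` instead.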
 Regarding my string little DSL. I have 3 solutions right now:
 1. Drop the DSL altogether, right now my system would work perfectly
 fine with boost-like tuples (a list of units alternating with exponents):
 Quantity!(Metre,1,Second,-1) speed = distance/time;
 While less readable, this doesn't have the disadvantages of the following 2.

 2. Use a mixin template to declare the expression parser in the current
 scope:
 mixin DeclareExprQuantity!();

 struct Metre {}
 struct Second {}
 struct Kg {}

 void f() {
         ExprQuantity!("Metre/Second * Kg^-1") q = speed / mass;
 }
 This works, is readable, but it uses C-preprocessor like behaviour
 (read: black vodoo) - a library declaring something in your scope isn't
 very nice.

 […]

 The only completely clean alternative would be the abominable:
 Quantity!( mixin(Expr!("Metre/Second")) ) q;
Get out of my head! Immediately! ;) Just kidding – incidentally, I considered exactly the same options when designing my current prototype. My current approach would be a mix between 1 and 2: I don't think the Boost approach of using »dummy« instances of units is any less readable than your proposed one when you don't deal with a lot of units. For example, consider

enum widgetCount = quantity!("Widget")(2);
vs.
enum widgetCount = 2 * widgets;

This could also be extended to type definitions to avoid having to manually write the template instantiation:

Quantity!("meter / second", float) speed;
vs.
typeof(1.f * meter / second) speed;

There are situations, though, where using unit strings could considerably improve readability, namely when using lots of units with exponents. In these cases, a mixin could be used to bring all the types in scope for the »parsing template«, similar to the one you suggested. A user of the library could also use an additional mixin identifier to clarify the code (e.g. »mixin UnitStringParser U; […] U.unit!"m/s"«).

But a more attractive solution would exploit the fact that you would mostly use units with a lot of exponents when working with a »closed« unit system without the need for ad-hoc extensions, like the SI system. This would allow you to use unit symbols instead of the full names; the symbols wouldn't need to be globally unique and wouldn't pollute the namespace (directly defining a type »m« to express meters would probably render the module unusable without static imports).

It would essentially work by instantiating a parser template with all the named units. Thus, the parser would know all the types and could query them for additional properties like short names/symbols, etc. In code:

---
module units.si;
[…]
alias UnitSystem!(Meter, Second, …) Si;
---
module client;
import units.si;
auto inductance = 5.0 * Si.u!"m^2 kg/(s^2 A^2)";
---

This could also be combined with the mixin parser approach like this:

---
import units.si;
mixin UnitStringParser!(Si) U;
---

But to reiterate my point, I don't think a way to parse unit strings is terribly important, at least not if it isn't coupled with other things like the ability to add shorthand symbols.

David
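The »dummy instance« style can be sketched in a few lines. This is a deliberately simplified model and not the prototype discussed here: a unit is described by two integer exponents (length and time) rather than by a list of unit types, and the names `Unit`, `Quantity`, `meter` and `second` are chosen for the illustration only.

```d
// A unit carries no data; its exponents live entirely in the type.
struct Unit(int len, int time)
{
    // number * unit -> quantity carrying that unit
    auto opBinaryRight(string op, T)(T v) const
        if (op == "*" && is(T : double))
    {
        return Quantity!(T, len, time)(v);
    }
}

struct Quantity(T, int len, int time)
{
    T value;

    // quantity * unit and quantity / unit adjust the exponents in the type
    // and leave the numeric value untouched.
    auto opBinary(string op, int l2, int t2)(Unit!(l2, t2) u) const
        if (op == "*" || op == "/")
    {
        static if (op == "*")
            return Quantity!(T, len + l2, time + t2)(value);
        else
            return Quantity!(T, len - l2, time - t2)(value);
    }
}

// »Dummy« instances whose only job is to carry a type.
enum meter = Unit!(1, 0)();
enum second = Unit!(0, 1)();

void main()
{
    auto distance = 3.0f * meter;

    // typeof(...) spells out the quantity type without naming the template.
    typeof(1.0f * meter / second) speed = distance / second;

    static assert(is(typeof(speed) == Quantity!(float, 1, -1)));
    static assert(!is(typeof(speed) == typeof(distance)));
    assert(speed.value == 3.0f);
}
```

Operations between two quantities (for example distance / time) would need further `opBinary` overloads, which are left out to keep the sketch short.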
Mar 29 2011
parent reply Cristi Cobzarenco <cristi.cobzarenco gmail.com> writes:
Well, they don't _have_ to be the same type as long as they're convertible to
one another, and one can make sure they're convertible based on the result
of the double-inclusion test. It does make more sense for them to be the same
type, I agree, so I'll be sticking to the .mangleof version. Dummy
objects are fine; the only problem is that one has to define the
extra objects (and, when one wants to count objects, you'll need to define a
different type). I was also considering shorthand symbols, since they seem
like a natural addition. I'll have to think a bit more about how exactly to do
that, to avoid collisions.
Regarding whether this is appropriate for GSoC: it doesn't take me 12 weeks to
write a prototype, sure. But making a library fit to be part of the
standard library of a language takes a lot of laborious testing, going through
use cases and making sure it is highly usable. Also, there should be an
effort to make sure other libraries make use of units when appropriate (like
Andrei suggested for std.datetime), etc. As we all know, it takes 90% of the
time to code 10% of the code. I think all of this extra polish, reliability and
usability is very important and takes the extra 11 weeks. It's not the most
glorified kind of work, but I really think it's worth it.

On 29 March 2011 23:40, David Nadlinger <see klickverbot.at> wrote:

 On 3/29/11 3:49 PM, Cristi Cobzarenco wrote:

 To David:
 Ok, right now, I got two working versions, one sorting by .mangleof and
 one performing a double-inclusion test on the tuples. Both work, I can't
 see any performance increase in the .mangleof one, but if .mangleof
 returns unique string, I say we use it this way.
To be honest, I still don't see how you are able to get away without canonicalization in the first place; would you mind to elaborate on how you
 solve the issue of different ordering of expression yielding types? This is
 not about the algorithm to determine whether two types are semantically
 equivalent, where your algorithm would work fine as well, but about the
 actual D types. If you don't sort them, Quantity!(BaseUnitExp!(Meter, 1),
 BaseUnitExp!(Second, -2)) and Quantity!(BaseUnitExp!(Second, -2),
 BaseUnitExp!(Meter, 1)) would be different types, which is not desirable for
 obvious reasons.


  Regarding my string little DSL. I have 3 solutions right now:
 1. Drop the DSL altogether, right now my system would work perfectly
 fine with boost-like tuples (a list of units alternating with exponents):
 Quantity!(Metre,1,Second,-1) speed = distance/time;
 While less readable, this doesn't have the disadvantages of the following
 2.

 2. Use a mixin template to declare the expression parser in the current
 scope:
 mixin DeclareExprQuantity!();

 struct Metre {}
 struct Second {}
 struct Kg {}

 void f() {
        ExprQuantity!("Metre/Second * Kg^-1") q = speed / mass;
 }
 This works, is readable, but it uses C-preprocessor like behaviour
 (read: black vodoo) - a library declaring something in your scope isn't
 very nice.

 […]


 The only completely clean alternative would be the abominable:
 Quantity!( mixin(Expr!("Metre/Second")) ) q;
 Get out of my head! Immediately! ;) Just kidding - incidentally I
 considered exactly the same options when designing my current prototype.
 My current approach would be a mix between 1 and 2: I don't think the
 Boost approach of using »dummy« instances of units is any less readable
 than your proposed one when you don't deal with a lot of units. For
 example, consider
 enum widgetCount = quantity!("Widget")(2);
 vs.
 enum widgetCount = 2 * widgets;

 This could also be extended to type definitions to avoid having to
 manually write the template instantiation:

 Quantity!("meter / second", float) speed;
 vs.
 typeof(1.f * meter / second) speed;
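
The »dummy instance« style above can be sketched with operator overloading. The following is a minimal illustration, not the prototype discussed in this thread: it assumes only two base dimensions, encoded as integer exponents in a hypothetical `Unit!(m, s)` template, rather than the tuple-of-types design, and `Quantity`, `meter` and `second` are placeholder names.

```d
// Hypothetical unit type: exponents of meter (m) and second (s).
struct Unit(int m, int s)
{
    // unit * unit and unit / unit add or subtract the exponents.
    auto opBinary(string op, int m2, int s2)(Unit!(m2, s2)) const
        if (op == "*" || op == "/")
    {
        static if (op == "*")
            return Unit!(m + m2, s + s2)();
        else
            return Unit!(m - m2, s - s2)();
    }

    // scalar * unit yields a quantity carrying that unit.
    auto opBinaryRight(string op, T)(T lhs) const
        if (op == "*" && __traits(isArithmetic, T))
    {
        return Quantity!(Unit, T)(lhs);
    }
}

// Hypothetical quantity: a value of type T tagged with unit type U.
struct Quantity(U, T)
{
    T value;

    // quantity * unit and quantity / unit only change the unit tag.
    auto opBinary(string op, int m2, int s2)(Unit!(m2, s2) u) const
        if (op == "*" || op == "/")
    {
        alias R = typeof(mixin("U.init " ~ op ~ " u"));
        return Quantity!(R, T)(value);
    }
}

// Dummy instances used purely for their types.
enum meter = Unit!(1, 0)();
enum second = Unit!(0, 1)();

// The type is derived from an expression instead of being spelled out.
alias Speed = typeof(1.0f * meter / second);
static assert(is(Speed == Quantity!(Unit!(1, -1), float)));

void main()
{
    Speed v = 3.0f * meter / second;
    assert(v.value == 3.0f);
}
```

The design choice here is that the unit never exists at run time: `Unit` has no fields, so a `Quantity` is just its value plus a compile-time tag.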

 There are situations, though, where using unit strings could considerably
 improve readability, namely when using lots of units with exponents. In
 these cases, a mixin could be used to bring all the types in scope for the
 »parsing template«, similar to the one you suggested. A user of the
 library could use an additional mixin identifier to clarify the code
 (e.g. »mixin UnitStringParser U; […] U.unit!"m/s"«).

 But a more attractive solution would exploit the fact that you would
 mostly use units with a lot of exponents when working with a »closed« unit
 system without the need for ad-hoc extensions, like the SI system, which
 would allow you to use unit symbols instead of the full name, which
 wouldn't need to be globally unique and wouldn't pollute the namespace
 (directly defining a type »m« to express meters would probably render the
 module unusable without static imports).

 It would essentially work by instantiating a parser template with all the
 named units. Thus, the parser would know all the types and could query
 them for additional properties like short names/symbols, etc. In code:

 ---
 module units.si;
 […]
 alias UnitSystem!(Meter, Second, …) Si;
 ---
 module client;
 import units.si;
 auto inductance = 5.0 * Si.u!"m^2 kg/(s^2 A^2)";
 ---

 This could also be combined with the mixin parser approach like this:
 ---
 import units.si;
 mixin UnitStringParser!(Si) U;
 ---

 But to reiterate my point, I don't think a way to parse unit strings is
 terribly important, at least not if it isn't coupled with other things
 like the ability to add shorthand symbols.

 David
--
(Cristi Cobzarenco)
Profile: http://www.google.com/profiles/cristi.cobzarenco
Mar 29 2011
parent reply David Nadlinger <see klickverbot.at> writes:
On 3/30/11 12:20 AM, Cristi Cobzarenco wrote:
 Well they don't _have_ to be the same type as long as they're convertible
 to one another, and one can make sure they're convertible based on the
 result of the double-inclusion.
But how would you make them _implicitly_ convertible then?

David
Mar 29 2011
parent reply Cristi Cobzarenco <cristi.cobzarenco gmail.com> writes:
By making the operators on quantity templates:

ref Quantity opAssign(U)( Quantity!(U) u2 ) {
    static assert( SameUnit!(U,Unit) );
    this.value = u2.value;
    return this;
}

Same for addition, subtraction and equality. Multiplication and division
will have to have a different return type.
Seems right to me, am I missing something?

--
(Cristi Cobzarenco)
Profile: http://www.google.com/profiles/cristi.cobzarenco
Mar 30 2011
parent reply David Nadlinger <see klickverbot.at> writes:
On 3/30/11 11:21 AM, Cristi Cobzarenco wrote:
 Seems right to me, am I missing something?
opAssign isn't taken into consideration when initializing variables or
passing values to functions. An example probably says more than a thousand
words:

---
struct Test {
    ref Test opAssign(int i) {
        value = i;
        return this;
    }
    int value;
}

void foo(Test t) {}

void main() {
    // Neither of the following two lines compiles, IIRC:
    Test t = 4; // (1)
    foo(4); // (2)
}
---

You can make case (1) work by defining a static opCall taking an int, which
will be called due to property syntax, but I can't think of any solution
for (2).

David
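
The two cases above can be checked mechanically with `__traits(compiles, ...)`. The sketch below assumes a reasonably recent D compiler and uses hypothetical names (`WithAssign`, `WithCtor`, `takesCtor`); note that it makes case (1) work with an ordinary constructor rather than the static opCall mentioned above.

```d
// Only opAssign: assignment works, initialization from an int does not.
struct WithAssign
{
    int value;
    ref WithAssign opAssign(int i) { value = i; return this; }
}

// A constructor taking an int makes initialization (case 1) work.
struct WithCtor
{
    int value;
    this(int i) { value = i; }
}

void takesCtor(WithCtor t) {}

static assert(!__traits(compiles, { WithAssign a = 4; }));  // case (1) fails
static assert( __traits(compiles, { WithCtor c = 4; }));    // case (1) works
static assert(!__traits(compiles, takesCtor(4)));           // case (2) still fails
static assert( __traits(compiles, takesCtor(WithCtor(4)))); // explicit construction

void main()
{
    WithAssign a;
    a = 4;              // plain assignment goes through opAssign
    assert(a.value == 4);

    WithCtor c = 4;     // initialization is rewritten to a constructor call
    assert(c.value == 4);
}
```

Case (2) stays rejected either way, since D does not apply user-defined conversions to function arguments; that is the reason two differently ordered quantity types cannot be made implicitly interchangeable.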
Mar 30 2011
next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
Maybe OT, but here's some hackish wizardry you can do with classes:

class Test
{
    int value;

    this(int x)
    {
        value = x;
    }
}

ref Test foo(Test t ...)
{
    return tuple(t)[0];
}

void main()
{
    auto result = foo(4);
    assert(result.value == 4);
}

The Tuple is used to trick DMD into escaping the local `t` reference.
This won't work with structs. And `t` should be constructed on the
stack, but it seems the destructor gets called only after the exit
from main. The docs do say that construction of classes in variadic
arguments depend on the implementation.

I asked on the newsgroups whether Typesafe Variadic Functions
automatically calling a constructor was a good thing at all. Read
about this feature here under "Typesafe Variadic Functions":
http://www.digitalmars.com/d/2.0/function.html
Mar 30 2011
prev sibling next sibling parent Cristi Cobzarenco <cristi.cobzarenco gmail.com> writes:
Yeah, you're right (case (1) also works with a template ctor - in C++ this
would allow for implicit conversions as well, that's why I thought about
using it this way). As I said, I had already abandoned this approach and
decided on using .mangleof sorting anyway for elegance. I think my proposal
write-up is almost ready, will submit it today or tomorrow.



(Cristi Cobzarenco)
Profile: http://www.google.com/profiles/cristi.cobzarenco


Mar 30 2011
prev sibling next sibling parent Cristi Cobzarenco <cristi.cobzarenco gmail.com> writes:
Hmmm, the only problem with this is that we would have to require the
library users to do this to their functions. Thanks for the suggestion but
I'll stick with .mangleof sorting.
(Cristi Cobzarenco)
Profile: http://www.google.com/profiles/cristi.cobzarenco


Mar 30 2011
prev sibling parent Cristi Cobzarenco <cristi.cobzarenco gmail.com> writes:
Ok, my proposal is up, I'm looking forward to feedback.

*fingers crossed*
(Cristi Cobzarenco)
Profile: http://www.google.com/profiles/cristi.cobzarenco


Apr 01 2011
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 03/29/2011 07:36 AM, Don wrote:
 I'm a physicist and most of my programming involves quantities which
 have units. Yet, I can't really imagine myself using a units library. A
 few observations from my own code:
 * For each dimension, choose a unit, and use it throughout the code. For
 example, my code always uses mm because it's a natural size for the work
 I do. Mixing (say) cm and m is always a design mistake. Scaling should
 happen only at input and output, not in internal calculations. (So my
 feeling is, that the value of a units library would come from keeping
 track of dimension rather than scale).
Many of my bugs in numeric code come from mixing scalars with units, not
from mixing units of different scale.

Andrei
Mar 29 2011
prev sibling parent spir <denis.spir gmail.com> writes:
On 03/29/2011 03:49 PM, Cristi Cobzarenco wrote:
 To David:
 Ok, right now, I got two working versions, one sorting by .mangleof and one
 performing a double-inclusion test on the tuples. Both work, I can't see any
 performance increase in the .mangleof one, but if .mangleof returns unique
 string, I say we use it this way.
 Regarding my little string DSL. I have 3 solutions right now:
 1. Drop the DSL altogether, right now my system would work perfectly fine
 with boost-like tuples (a list of units alternating with exponents):
 Quantity!(Metre,1,Second,-1) speed = distance/time;
 While less readable, this doesn't have the disadvantages of the following 2.

 2. Use a mixin template to declare the expression parser in the current
 scope:
 mixin DeclareExprQuantity!();

 struct Metre {}
 struct Second {}
 struct Kg {}

 void f() {
         ExprQuantity!("Metre/Second * Kg^-1") q = speed / mass;
 }
 This works, is readable, but it uses C-preprocessor like behaviour (read:
 black voodoo) - a library declaring something in your scope isn't very nice.

 3. Abandon using types as units and just use strings all the way. This
 doesn't guarantee unit name uniqueness and a misspelled unit name is a new
 unit. One could use an algorithm to convert all strings to a canonical form
 (like Andrei suggested) and then use string equality for unit equality.

 What do you think, I'm personally quite divided:
 1. I like that this is simple and it works. It makes writing derived units
 unnatural though.
 2. I actually like this one, despite the obvious ugliness. It's just one
 extra line at the beginning of your code and you can then use arithmetic
 operations and use type-uniqueness to guarantee unit-uniqueness.
 3. This is a bit dangerous. It works very well as long as there isn't more
 than one system of units. I still like it a bit.
Have you considered

0. Derived units are declared?

After all, relative to the size of an app, and the amount of work it
represents, declaring the derived units actually used is a very small
burden. This means that instead of:

    struct meter {}
    struct second {}
    auto dist = Quantity!"meter"(3.0);
    auto time = Quantity!"second"(2.0);
    auto speed = Quantity!"meter/second"(dist/time);
    auto surface = Quantity!"meter2"(dist*dist);

one would write:

    struct meter {}
    struct second {}
    alias FractionUnit!(meter,second) meterPerSecond;
    alias PowerUnit!(meter,2) squareMeter;
    auto dist = Quantity!meter(3.0);
    auto time = Quantity!second(2.0);
    auto speed = Quantity!meterPerSecond(dist/time);
    auto surface = Quantity!squareMeter(dist*dist);

This means you use struct templates as unit-id factories, for the user's
convenience. The constructor would then generate the metadata needed for
unit-type checking, stored on the struct itself (this is far easier with
such struct templates than by parsing a string).

In addition to the 2 struct templates above, there should be

    struct ProductUnit(Units...) {...}

(accepting n base units); and I guess that's all, isn't it? The only
drawback is that very complicated derived units need to be constructed step
by step. But this can also be seen as an advantage. An alternative may be
to have a single, but more sophisticated and more difficult to use, struct
template.

I find several advantages to this approach:
* Simplicity (also of implementation, I guess).
* Unit identifiers are structs all along (both in code and in semantics).
* No string mixin black voodoo.

I guess even if this is not ideal, you could start with something similar,
because it looks easier and cleaner (to me).

A similar system may be used for units of different scales in the same
dimension:

    alias ScaleUnit!(mm,1_000_000) km;

By the way, have you considered unit-less (pseudo-)magnitudes (I mean
ratios, including %)? I would have one declared and exported as a constant;
then:

    alias ScaleUnit!(voidUnit,0.001) perthousand;
 To Don:
 * Choosing one unit and using it is still a very good idea. As I said there
 are to be no implicit conversions, so this system would ensure you don't
 break this convention by mistake. Also, if somebody else uses your library
 maybe they assume everything is in meters when in fact you use millimeters.
 Sure they should check the documentation, but it's better if they get a nice
 error message "Inferred unit Meter doesn't match expected Millimeter", or
 something like that.
I agree with this.

Denis
--
_________________
vita es estrany
spir.wikidot.com
Mar 29 2011
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 03/29/2011 02:06 AM, Don wrote:
 Cristi Cobzarenco wrote:
 First, let me apologize for this very late entry, it's the end of
 university and it's been a very busy period, I hope you will still
 consider it.

 Note this email is best read using a fixed font.

 PS: I'm really sorry if this is the wrong mailing list to post and I
 hope you'll forgive me if that's the case.

 ======= Google Summer of Code Proposal: Statically Checked Units =======


 Abstract
 -------------

 Measurement units allow to statically check the correctness of
 assignments and expressions at virtually no performance cost and very
 little extra effort. When it comes to physics the advantages are
 obvious – if you try to assign a force a variable measuring distance,
 you've most certainly got a formula wrong somewhere along the way.
 Also, showing a sensor measurement in gallons on a litre display that
 keeps track of the remaining fuel of a plane (a big no-no) is easily
 avoidable with this technique. What this translates is that one more
 of the many hidden assumptions in source code is made visible: units
 naturally complement other contract checking techniques, like
 assertions, invariants and the like. After all the unit that a value
 is measured in is part of the contract.
This is one of those features that gets proposed frequently in multiple
languages. It's a great example for metaprogramming. But, are there
examples of this idea being seriously *used* in production code in ANY
language? (For example, does anybody actually use Boost.Units?)
At work we use C++ enums for categorical types to great effect. The way it
works is:

    enum UserId { min = 0, max = 1 << 31 };
    enum AppId { min = 0, max = 1 << 31 };

then we express data in terms of UserId, AppId instead of an integral type,
and we cast to it when we read it off the wire or the database. The beauty
of it is that you can never pass by mistake an AppId instead of a UserId or
vice versa, or even a raw int as one without explicitly stating intent.
It's saved us a lot of bugs (I know because I found some when converting
raw ints to enums) and presumably potential bugs.

If we used quantities probably a similar benefit would emerge from using
dimensional analysis. I know that in my machine learning code it's very
difficult to spot bugs because "it's all numbers". If I used a sort of a
double "enum" that could only be a probability, I'm sure I'd save myself a
ton of bugs.

Andrei
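
The same categorical-type idea can be sketched in D with plain wrapper structs. This is an illustration only, not the C++ code described above: `UserId`, `AppId` and `describeUser` are hypothetical, and a one-field struct is assumed as the simplest way to get a distinct type with no arithmetic.

```d
// Hypothetical wrappers: each id is its own type with no arithmetic defined.
struct UserId { int value; }
struct AppId  { int value; }

// Hypothetical lookup that only accepts a UserId.
string describeUser(UserId id)
{
    import std.conv : to;
    return "user #" ~ id.value.to!string;
}

static assert( __traits(compiles, describeUser(UserId(7))));
static assert(!__traits(compiles, describeUser(AppId(7))));   // wrong category
static assert(!__traits(compiles, describeUser(7)));          // raw int
static assert(!__traits(compiles, UserId(1) + UserId(2)));    // no arithmetic

void main()
{
    // Conversion from raw data is explicit, e.g. when reading off the wire.
    int raw = 42;
    auto id = UserId(raw);
    assert(describeUser(id) == "user #42");
}
```

A units library extends this from "different category" to "different dimension", with the extra step that multiplication and division produce new types instead of being forbidden.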
Mar 29 2011
parent reply spir <denis.spir gmail.com> writes:
On 03/29/2011 04:45 PM, Andrei Alexandrescu wrote:
 At work we use C++ enums for categorical types to great effect. The way it
 works is:

     enum UserId { min = 0, max = 1 << 31 };
     enum AppId { min = 0, max = 1 << 31 };

 then we express data in terms of UserId, AppId instead of an integral type,
 and we cast to it when we read it off the wire or the database. The beauty
 of it is that you can never pass by mistake an AppId instead of a UserId or
 vice versa, or even a raw int as one without explicitly stating intent.
 It's saved us a lot of bugs (I know because I found some when converting
 raw ints to enums) and presumably potential bugs.

 If we used quantities probably a similar benefit would emerge from using
 dimensional analysis. I know that in my machine learning code it's very
 difficult to spot bugs because "it's all numbers". If I used a sort of a
 double "enum" that could only be a probability, I'm sure I'd save myself a
 ton of bugs.
Waow, this is a great explanation of expected benefits of units, I guess.
Also, isn't this precisely the power of true typedefs?

Denis
--
_________________
vita es estrany
spir.wikidot.com
Mar 29 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/29/11 2:17 PM, spir wrote:
 On 03/29/2011 04:45 PM, Andrei Alexandrescu wrote:
 Waow, this is a great explanation of expected benefits of units, I guess.
 Also, isn't this precisely the power of true typedefs?
Typedefs would not allow defining categorical types (e.g. no arithmetic).
Fortunately there are already means in the language for defining such
types.

Andrei
Mar 29 2011