digitalmars.D - Checking function parameters in Phobos

Andrei Alexandrescu (31/31) Nov 19 2013 There's been recent discussion herein about what parameter validation

growler (26/60) Nov 19 2013 I'm not a Phobos dev. but as a user of Phobos and coming from
bearophile (32/52) Nov 19 2013 I think Phobos should rely much more on Contract Programming

Brad Anderson (3/15) Nov 19 2013 Is that not what phobo's AsciiString is?
Walter Bright (3/6) Nov 20 2013 Which ones? The ones I coded up originally were designed so they weren't...

Jacob Carlborg (10/39) Nov 19 2013 Would we accompany the assumeSorted with an assert in the function

Marco Leise (7/13) Nov 20 2013 That is what LDC does and with the -defaultlib switch it is
Timon Gehr (9/11) Nov 20 2013 We do in any case:

Jacob Carlborg (5/13) Nov 20 2013 I don't understand what this is supposed to show. That the type is

Timon Gehr (2/16) Nov 20 2013 Yes, hence SortedRange being sorted is just a convention in any case.

Andrei Alexandrescu (6/7) Nov 20 2013 That's right. In particular we can't have assumeSorted check for
Meta (6/28) Nov 20 2013 Couldn't we have an overload of each of the mutating functions in

Meta (3/33) Nov 20 2013 That is, a mutating function that takes a sorted range strips the
Andrei Alexandrescu (3/28) Nov 20 2013 That wouldn't help much - people have access to the underlying range any...

Meta (8/11) Nov 20 2013 You're right, I forgot about that. However, people generally

Joseph Rushton Wakeling (7/17) Nov 19 2013 Regarding enforce() vs. assert(), a good rule that I remember having sug...
Walter Bright (15/17) Nov 20 2013 Important is deciding upon the notions of "validated data" and "untruste...

Jacob Carlborg (8/23) Nov 20 2013 How should we accomplish this? We can't replace:

Jonathan M Davis (13/45) Nov 20 2013 You'd do it the other way around by having something like

Jacob Carlborg (7/18) Nov 20 2013 If not just if the string is valid UTF-8. There can be many other types

Marco Leise (15/36) Nov 20 2013 None of that is feasible. We can only hope that we simply

Jacob Carlborg (6/17) Nov 20 2013 I don't know how getopt behaves but using them as a filename will most

Simen Kjærås (21/44) Nov 20 2013 May I suggest:

Meta (4/61) Nov 20 2013 I was having the exact same thought. I think this could be very

Dicebot (6/6) Nov 20 2013 I also think this is very powerful and under-explored approach

Dmitry Olshansky (13/19) Nov 20 2013 I think the obstacles are mostly:

Dicebot (7/13) Nov 20 2013 This is the very reason why I am saying it makes much more sense

inout (19/76) Nov 21 2013 What if you have more that just one validation, e.g. Positive and

Meta (11/17) Nov 21 2013 Allow multiple validation functions. Then a Validated type is

Simen Kjærås (10/22) Nov 21 2013 I believe inout's point was this, though:

Marco Leise (10/35) Nov 25 2013

Simen Kjærås (15/47) Nov 25 2013 Do you mean this?

inout (11/65) Nov 26 2013 I find this to be too verbose to be useful. And you also need to

Simen Kjærås (19/30) Nov 26 2013 This I understand. It is actually the best argument I can find in favor
Meta (4/6) Nov 26 2013 It isn't surprising that any operation that expects int will get

Marco Leise (6/32) Nov 26 2013

Simen Kjærås (12/34) Nov 24 2013 I've created a version of Validated now that takes 1 or more

Meta (84/114) Nov 24 2013 Awesome, I was messing around with something similar but you beat

Meta (2/11) Nov 24 2013 "//Fails" should be "//Passes" as well.
Simen Kjærås (24/99) Nov 25 2013 Even better - test if 'if (fn(value)) {}' compiles. Fixed.

Meta (7/31) Nov 25 2013 What about a version flag, then, that can be passed to specify

Simen Kjærås (5/11) Nov 26 2013 That's already in:

Simen Kjærås (5/53) Nov 20 2013 Uh-hm. Add this:

Dmitry Olshansky (6/31) Nov 20 2013 And it decays to the naked type in a blink of an eye. And some function

Meta (8/39) Nov 20 2013 Yes. It is very important not to allow direct access to the

Jonathan M Davis (5/11) Nov 20 2013 It's arguably pretty pointless to put a nullable type in

Meta (5/18) Nov 20 2013 See the discussion from the other thread for why it can be useful

Jonathan M Davis (5/27) Nov 20 2013 I know. And I still think that it's pointless - and it incurs extra over...

Jacob Carlborg (6/12) Nov 20 2013 In that case all string functionality needs to be provided inside the

Jonathan M Davis (6/17) Nov 20 2013 You could use alias this and alias the Validated struct to the underlyin...

Jacob Carlborg (5/9) Nov 21 2013 Yeah, that's what needs to be avoided and is the reason "alias this" or

Meta (21/36) Nov 20 2013 This is tricky business. Unfortunately, having the wrapper be

Simen Kjærås (25/42) Nov 20 2013 And guess what? That's (often) ok. It's better to do the validation once...

Jacob Carlborg (4/19) Nov 20 2013 It's still accessible via "value".

Simen Kjærås (10/30) Nov 21 2013 Indeed it is. If we want to make it perfectly impossible to get at the

Daniel Davidson (6/8) Nov 21 2013 Not if that function down the road only accepted validated in the

Walter Bright (8/17) Nov 20 2013 Utf validation isn't the only form of validation for strings. You could,...

Jonathan M Davis (29/48) Nov 20 2013 Yes, but we seemed to be discussing the possibility of having some kind ...
Marco Leise (15/23) Nov 25 2013 A checked type for database access goes a bit beyond the scope

Walter Bright (3/8) Nov 20 2013 Use a different type for the validated string, validated means your prog...

Jonathan M Davis (74/75) Nov 20 2013 In general, I favor using defensive programming in library APIs and usin...

Jacob Carlborg (16/38) Nov 20 2013 I think Walter suggestion requires the use of asserts:

Timon Gehr (4/10) Nov 20 2013 void process(Data data)in{ assert(isValid(data)); }body{

Jacob Carlborg (4/7) Nov 20 2013 Right, forgot about contracts.

Lars T. Kyllingstad (30/33) Nov 20 2013 I think it is fair to always assume that a char[] is a valid

Jonathan M Davis (12/27) Nov 20 2013 That doesn't work when strings are being created via concatenation and t...
Dmitry Olshansky (9/25) Nov 20 2013 Sadly it's horrifically slow to do so. Above all practicality must take
Lionello Lunesu (9/16) Nov 26 2013 +1

Jonathan M Davis (25/51) Nov 20 2013 When an assertion fails, it's a bug in your code. Assertions should _nev...

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

There's been recent discussion herein about what parameter validation 
method would be best for Phobos to adhere to.

Currently we are using a mix of approaches:

1. Some functions enforce()

2. Some functions just assert()

3. Some (fewer I think) functions assert(0)

4. Some functions don't do explicit checking, relying instead on 
lower-level enforcement such as null dereference and bounds checking to 
ensure safety.

Each method has its place. The question is what guidelines we put 
forward for Phobos code to follow; we're a bit loose about that right now.
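For readers comparing the four approaches listed above, here is a minimal sketch of how each one looks in D. The function names are hypothetical and chosen only for illustration; none of them is Phobos code.

```d
import std.exception : enforce;

// 1. enforce(): checked in every build; throws an Exception on failure.
int checkedPercent(int value)
{
    enforce(value >= 0 && value <= 100, "percent must be in 0 .. 100");
    return value;
}

// 2. assert(): states a precondition; compiled out with -release.
int ratioPercent(int part, int whole)
{
    assert(whole != 0, "whole must be nonzero");
    return part * 100 / whole;
}

// 3. assert(0): marks a path that must never execute; kept even in -release.
string describeSign(int x)
{
    if (x < 0) return "negative";
    if (x == 0) return "zero";
    if (x > 0) return "positive";
    assert(0, "unreachable: every int is covered above");
}

// 4. No explicit check: rely on the language's built-in bounds checking.
int firstElement(const int[] values)
{
    return values[0]; // fails with a range error on an empty slice when bounds checks are on
}
```

The practical difference is who is blamed on failure: enforce() reports bad input to the caller as a recoverable exception, while the assert forms treat the failure as a bug in the program.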

A second, just as interesting topic, is how to design abstractions for 
speed and safety. There are cases in which spurious checking is 
prohibitively expensive and unnecessary, so it should be avoided where 
possible. Examples:

(a) FracSecs(long x) validates x to be within range. The cost of the 
validation itself is about as high as the payload itself (which is one 
assignment).

(b) sort() offers a SortedRange with its goodies. We also have 
assumeSorted that also offers a SortedRange, but relies on the user to 
validate that assumption.

(c) A variety of text functions currently suffer because we don't 
distinguish between validated UTF strings and potentially invalid ones.

Walter and I are thinking of fostering the idiom in which types (or 
attributes?) are used as information about validation, similar to how 
assumeSorted works. Building on that, we'd have a function like "static 
FracSecs assumeValid(long)" inside FracSecs (no need for a different 
type here). Then, we'd have a CleanUTF type or something that would 
guarantee the string stored within has been validated.
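A minimal sketch of the assumeValid idiom described above, under stated assumptions: FracSecsSketch is a hypothetical stand-in and not the real type, and the range used (strictly within one second, counted in hecto-nanoseconds) is assumed for illustration.

```d
import std.exception : enforce;

// Hypothetical stand-in for the FracSecs type discussed in the post.
struct FracSecsSketch
{
    private long hnsecs_;

    // Validating path: pays for the range check in every build.
    this(long hnsecs)
    {
        enforce(isInRange(hnsecs), "value must be strictly within one second");
        hnsecs_ = hnsecs;
    }

    // Trusting path: the caller vouches for the value, so only a
    // debug-time assert remains and release builds pay for one assignment.
    static FracSecsSketch assumeValid(long hnsecs)
    {
        assert(isInRange(hnsecs), "assumeValid called with an invalid value");
        FracSecsSketch result;
        result.hnsecs_ = hnsecs;
        return result;
    }

    // Assumed range: 10_000_000 hecto-nanoseconds make one second.
    static bool isInRange(long hnsecs) pure nothrow @safe
    {
        return hnsecs > -10_000_000 && hnsecs < 10_000_000;
    }

    long value() const { return hnsecs_; }
}
```

Both paths return the same type, which matches the remark that no separate type is needed here; the choice of entry point records whether validation happened.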


Please chime in with ideas!

Andrei

Nov 19 2013

"growler" <growlercab gmail.com> writes:

On Wednesday, 20 November 2013 at 00:01:00 UTC, Andrei 
Alexandrescu wrote:
 There's been recent discussion herein about what parameter 
 validation method would be best for Phobos to adhere to.

 Currently we are using a mix of approaches:

 1. Some functions enforce()

 2. Some functions just assert()

 3. Some (fewer I think) functions assert(0)

 4. Some functions don't do explicit checking, relying instead 
 on lower-level enforcement such as null dereference and bounds 
 checking to ensure safety.

 Each method has its place. The question is what guidelines we 
 put forward for Phobos code to follow; we're a bit loose about 
 that right now.

 A second, just as interesting topic, is how to design 
 abstractions for speed and safety. There are cases in which 
 spurious checking is prohibitively expensive if not necessary, 
 so it should be avoided where necessary. Examples:

 (a) FracSecs(long x) validates x to be within range. The cost 
 of the validation itself is about as high as the payload itself 
 (which is one assignment).

 (b) sort() offers a SortedRange with its goodies. We also have 
 assumeSorted that also offers a SortedRange, but relies on the 
 user to validate that assumption.

 (c) A variety of text functions currently suffer because we 
 don't make the difference between validated UTF strings and 
 potentially invalid ones.

 Walter and I are thinking of fostering the idiom in which types 
 (or attributes?) are used as information about validation, 
 similar to how assumeSorted works. Building on that, we'd have 
 a function like "static FracSecs assumeValid(long)" inside 
 FracSecs (no need for a different type here). Then, we'd have a 
 CleanUTF type or something that would guarantee the string 
 stored within has been validated.


 Please chime in with ideas!

 Andrei

I'm not a Phobos dev, but as a user of Phobos coming from 
C/C++ I'd like to see...

Less enforce and more debug-only contracts in the std lib, with 
opt-in run-time checks for release builds.

That way I can decide on a function-by-function basis or globally 
at compile time whether the run-time checks occur in release 
builds.

For example, given:

1. FracSecs(long x)
2. FracSecs!Args.verify(long x)

In debug 1. would always have full run-time checking enabled. In 
release builds 1. would only have essential run-time checks, 
preferably none. I can then opt-in for run-time checks in release 
builds using 2.

There would also be a version(ArgsVerify) so I can turn on 
run-time checks globally at compile time in release builds (maybe 
the --debug flag allows this already, not sure).
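A rough sketch of the opt-in switch described above. The version identifier ArgsVerify comes from the post; the constant verifyArgs and the function safeSqrt are hypothetical names used only for illustration. Note that the `debug` condition is enabled by the -debug compiler flag.

```d
// Decide once, at compile time, whether argument checks are compiled in.
version (ArgsVerify)
    enum bool verifyArgs = true;      // opted in, e.g. with -version=ArgsVerify
else
{
    debug
        enum bool verifyArgs = true;  // builds compiled with -debug
    else
        enum bool verifyArgs = false; // everything else: no checks
}

// Hypothetical library function using the switch.
double safeSqrt(double x)
{
    static if (verifyArgs)
    {
        import std.exception : enforce;
        enforce(x >= 0, "safeSqrt requires a non-negative argument");
    }
    import std.math : sqrt;
    return sqrt(x);
}
```

Because the decision is made with static if, builds that do not opt in contain no trace of the check.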

Of course this unfortunately requires even more work from Phobos 
devs and I'm not a D expert so I don't know how viable it would 
be.

Whatever is decided, I'm looking forward to seeing what you guys come 
up with because I'm currently using Phobos as my "Idiomatic D" 
reference guide.

Thanks G.

Nov 19 2013

"bearophile" <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:

 There's been recent discussion herein about what parameter 
 validation method would be best for Phobos to adhere to.

 Currently we are using a mix of approaches:

 1. Some functions enforce()

 2. Some functions just assert()

 3. Some (fewer I think) functions assert(0)

 4. Some functions don't do explicit checking, relying instead 
 on lower-level enforcement such as null dereference and bounds 
 checking to ensure safety.

 Each method has its place. The question is what guidelines we 
 put forward for Phobos code to follow; we're a bit loose about 
 that right now.

I think Phobos should rely much more on Contract Programming 
based on asserts. This could mean DMD automatically using a 
Phobos compiled with asserts when you compile your D code 
normally, and automatically using an assert-stripped version of 
Phobos libs when you compile with -release and similar.

In other situations enforce and exceptions are still useful.


 (b) sort() offers a SortedRange with its goodies. We also have 
 assumeSorted that also offers a SortedRange, but relies on the 
 user to validate that assumption.

I'd like another function, which could be named validateSorted(), 
that returns a SortedRange and always fully verifies that its range 
argument is actually sorted, and throws an exception otherwise. 
So it doesn't assume its input is sorted. It's like an isSorted + 
assumeSorted.
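A minimal sketch of what such a function could look like, built from the existing isSorted, assumeSorted and enforce. validateSorted is not a Phobos function; this is only one possible shape for it.

```d
import std.algorithm : isSorted;
import std.exception : enforce;
import std.range : assumeSorted;

// Hypothetical helper: verifies sortedness in O(n), then wraps the range.
auto validateSorted(alias pred = "a < b", R)(R range)
{
    enforce(isSorted!pred(range), "range is not sorted");
    return assumeSorted!pred(range);
}
```

A caller would write `auto s = validateSorted(data);` and then use the SortedRange operations such as `s.contains(x)`. The linear check is paid once, at the point where untrusted data enters.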


 (c) A variety of text functions currently suffer because we 
 don't make the difference between validated UTF strings and 
 potentially invalid ones.

Often I have genomic data or other text data that is surely ASCII 
(and I can accept a run-time exception at loading time if it's 
not ASCII). Once such text is in memory I'd like to not pay for 
UTF on it. Sometimes you can do this with 
std.string.representation, but there is no opposite function 
(http://d.puremagic.com/issues/show_bug.cgi?id=10162 ). Also in 
Phobos there are several string/char functions that could be made 
faster if the input is assumed to be ASCII. To solve this problem, 
languages such as Haskell usually introduce a new type like 
AsciiString. In the past I have suggested introducing such a string 
wrapper in Phobos.
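As an illustration of the wrapper idea, here is a small sketch of a type that checks for ASCII once at load time and afterwards indexes bytes without any UTF decoding. AsciiText is a hypothetical name invented for this sketch; it is not an existing Phobos type.

```d
import std.algorithm : all;
import std.exception : enforce;
import std.string : representation;

// Hypothetical wrapper: validated once, then treated as plain bytes.
struct AsciiText
{
    private immutable(ubyte)[] bytes_;

    // Throws if any byte is outside the 7-bit ASCII range.
    this(string s)
    {
        auto raw = s.representation;
        enforce(raw.all!(b => b < 0x80), "input contains non-ASCII data");
        bytes_ = raw;
    }

    size_t length() const { return bytes_.length; }

    // O(1) access; no decoding is needed because every byte is one character.
    char opIndex(size_t i) const { return cast(char) bytes_[i]; }
}
```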


 Then, we'd have a CleanUTF type or something that would 
 guarantee the string stored within has been validated.

In recent talks Bjarne Stroustrup has been strongly advocating such 
use of types for safety in C++11/C++14, and functional 
programmers have used it for a long time. OCaml programmers use 
this style of coding to write "safer" code all the time.

Too many types make the code harder to work with (also because D 
doesn't have destructuring syntax in function signatures and so on), 
but a few strategically designed structs can help.

Bye,
bearophile

Nov 19 2013

"Brad Anderson" <eco gnuk.net> writes:

On Wednesday, 20 November 2013 at 00:48:40 UTC, bearophile wrote:
 [snip]
 Often I have genomic data or other text data that is surely 
 ASCII (and I can accept a run-time exception at loading time if 
 it's not ASCII). Once such text is in memory I'd like to not 
 pay for UTF on it. Sometimes you can do this with 
 std.string.representation, but there is no opposite function 
 (http://d.puremagic.com/issues/show_bug.cgi?id=10162 ). Also in 
 Phobos there are several string/char functions that could be 
 made faster if the input is assumed to be ASCII. To solve this 
 problem in languages as Haskell they usually introduce a new 
 type like AsciiString. In past I have suggested to introduce 
 such string wrapper in Phobos.

Is that not what Phobos' AsciiString is?

Nov 19 2013

Walter Bright <newshound2 digitalmars.com> writes:

On 11/19/2013 4:48 PM, bearophile wrote:
 Also in Phobos there are
 several string/char functions that could be made faster if the input is assumed
 to be ASCII.

Which ones? The ones I coded up originally were designed so they weren't 
degraded by utf.

Nov 20 2013

Jacob Carlborg <doob me.com> writes:

On 2013-11-20 01:01, Andrei Alexandrescu wrote:
 There's been recent discussion herein about what parameter validation
 method would be best for Phobos to adhere to.

 Currently we are using a mix of approaches:

 1. Some functions enforce()

 2. Some functions just assert()

 3. Some (fewer I think) functions assert(0)

 4. Some functions don't do explicit checking, relying instead on
 lower-level enforcement such as null dereference and bounds checking to
 ensure safety.

 Each method has its place. The question is what guidelines we put
 forward for Phobos code to follow; we're a bit loose about that right now.

 A second, just as interesting topic, is how to design abstractions for
 speed and safety. There are cases in which spurious checking is
 prohibitively expensive if not necessary, so it should be avoided where
 necessary. Examples:

 (a) FracSecs(long x) validates x to be within range. The cost of the
 validation itself is about as high as the payload itself (which is one
 assignment).

 (b) sort() offers a SortedRange with its goodies. We also have
 assumeSorted that also offers a SortedRange, but relies on the user to
 validate that assumption.

 (c) A variety of text functions currently suffer because we don't make
 the difference between validated UTF strings and potentially invalid ones.

 Walter and I are thinking of fostering the idiom in which types (or
 attributes?) are used as information about validation, similar to how
 assumeSorted works. Building on that, we'd have a function like "static
 FracSecs assumeValid(long)" inside FracSecs (no need for a different
 type here). Then, we'd have a CleanUTF type or something that would
 guarantee the string stored within has been validated.

Would we accompany the assumeSorted with an assert in the function 
assuming something is sorted? We probably don't want to rely on convention.

What about distributing a version of druntime and Phobos with asserts 
enabled that is used by default (or with the -debug flag)? Then a 
version with asserts disabled would be used when the -release flag is used.

We probably also want it to be possible to use Phobos with asserts 
enabled even in release mode.

-- 
/Jacob Carlborg

Nov 19 2013

Marco Leise <Marco.Leise gmx.de> writes:

Am Wed, 20 Nov 2013 08:49:28 +0100
schrieb Jacob Carlborg <doob me.com>:

 What about distributing a version of druntime and Phobos with asserts 
 enabled that is used by default (or with the -debug flag). Then a 
 version with asserts disabled is used when the -release flag is used.
 
 We probably also want it to be possible to use Phobos with asserts 
 enabled even in release mode.

That is what LDC does and with the -defaultlib switch it is
easy to use the debug Phobos in release builds. Currently this
flag is mostly used to link against the shared phobos2.so.

-- 
Marco

Nov 20 2013

Timon Gehr <timon.gehr gmx.ch> writes:

On 11/20/2013 08:49 AM, Jacob Carlborg wrote:
 Would we accompany the assumeSorted with an assert in the function
 assuming something is sorted? We probably don't want to rely on convention.

We do in any case:

import std.algorithm, std.range;

void main(){
     auto a = [1,2,3,4,5];
     auto s = sort(a);
     swap(a[0],a[$-1]);
     assert(is(typeof(s)==SortedRange!(int[])) && !s.isSorted());
}

Nov 20 2013

Jacob Carlborg <doob me.com> writes:

On 2013-11-20 13:56, Timon Gehr wrote:

 We do in any case:

 import std.algorithm, std.range;

 void main(){
      auto a = [1,2,3,4,5];
      auto s = sort(a);
      swap(a[0],a[$-1]);
      assert(is(typeof(s)==SortedRange!(int[])) && !s.isSorted());
 }

I don't understand what this is supposed to show. That the type is 
"SortedRange" but it's actually not sorted?

-- 
/Jacob Carlborg

Nov 20 2013

Timon Gehr <timon.gehr gmx.ch> writes:

On 11/20/2013 02:52 PM, Jacob Carlborg wrote:
 On 2013-11-20 13:56, Timon Gehr wrote:

 We do in any case:

 import std.algorithm, std.range;

 void main(){
      auto a = [1,2,3,4,5];
      auto s = sort(a);
      swap(a[0],a[$-1]);
      assert(is(typeof(s)==SortedRange!(int[])) && !s.isSorted());
 }

 I don't understand what this is supposed to show. That the type is
 "SortedRange" but it's actually not sorted?

Yes, hence SortedRange being sorted is just a convention in any case.

Nov 20 2013

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 11/20/13 6:14 AM, Timon Gehr wrote:
 Yes, hence SortedRange being sorted is just a convention in any case.

That's right. In particular we can't have assumeSorted check for 
isSorted even at the point of creation, and even with debug-only 
asserts. This is because checking would change the complexity of binary 
search and related algorithms, which is often prohibitive.

Andrei

Nov 20 2013

"Meta" <jared771 gmail.com> writes:

On Wednesday, 20 November 2013 at 14:14:28 UTC, Timon Gehr wrote:
 On 11/20/2013 02:52 PM, Jacob Carlborg wrote:
 On 2013-11-20 13:56, Timon Gehr wrote:

 We do in any case:

 import std.algorithm, std.range;

 void main(){
     auto a = [1,2,3,4,5];
     auto s = sort(a);
     swap(a[0],a[$-1]);
     assert(is(typeof(s)==SortedRange!(int[])) && 
 !s.isSorted());
 }

 I don't understand what this is supposed to show. That the 
 type is
 "SortedRange" but it's actually not sorted?

 Yes, hence SortedRange being sorted is just a convention in any 
 case.

Couldn't we have an overload of each of the mutating functions in 
std.algorithm that takes a SortedRange and does static assert(0, 
"Cannot modify a sorted range")? I suppose there are cases where 
we *want* to mutate a sorted range... Unwrap the inner type, 
maybe?

Nov 20 2013

"Meta" <jared771 gmail.com> writes:

On Wednesday, 20 November 2013 at 17:56:22 UTC, Meta wrote:
 On Wednesday, 20 November 2013 at 14:14:28 UTC, Timon Gehr 
 wrote:
 On 11/20/2013 02:52 PM, Jacob Carlborg wrote:
 On 2013-11-20 13:56, Timon Gehr wrote:

 We do in any case:

 import std.algorithm, std.range;

 void main(){
    auto a = [1,2,3,4,5];
    auto s = sort(a);
    swap(a[0],a[$-1]);
    assert(is(typeof(s)==SortedRange!(int[])) && 
 !s.isSorted());
 }

 I don't understand what this is supposed to show. That the 
 type is
 "SortedRange" but it's actually not sorted?

 Yes, hence SortedRange being sorted is just a convention in 
 any case.

 Couldn't we have an overload of each of the mutating functions 
 in std.algorithm that takes a SortedRange and does static 
 assert(0, "Cannot modify a sorted range")? I suppose there are 
 cases where we *want* to mutate a sorted range... Unwrap the 
 inner type, maybe?

That is, a mutating function that takes a sorted range strips the 
SortedRange wrapper and returns the underlying type.

Nov 20 2013

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 11/20/13 9:56 AM, Meta wrote:
 On Wednesday, 20 November 2013 at 14:14:28 UTC, Timon Gehr wrote:
 On 11/20/2013 02:52 PM, Jacob Carlborg wrote:
 On 2013-11-20 13:56, Timon Gehr wrote:

 We do in any case:

 import std.algorithm, std.range;

 void main(){
     auto a = [1,2,3,4,5];
     auto s = sort(a);
     swap(a[0],a[$-1]);
     assert(is(typeof(s)==SortedRange!(int[])) && !s.isSorted());
 }

 I don't understand what this is supposed to show. That the type is
 "SortedRange" but it's actually not sorted?

 Yes, hence SortedRange being sorted is just a convention in any case.

 Couldn't we have an overload of each of the mutating functions in
 std.algorithm that takes a SortedRange and does static assert(0, "Cannot
 modify a sorted range")? I suppose there are cases where we *want* to
 mutate a sorted range... Unwrap the inner type, maybe?

That wouldn't help much - people have access to the underlying range anyway.

Andrei

Nov 20 2013

"Meta" <jared771 gmail.com> writes:

On Wednesday, 20 November 2013 at 20:06:47 UTC, Andrei 
Alexandrescu wrote:
 That wouldn't help much - people have access to the underlying 
 range anyway.

 Andrei

You're right, I forgot about that. However, people generally 
won't be modifying a SortedRange in place, will they? Even if 
they do, it'll probably be using one of the mutating functions in 
std.algorithm. Also, somewhat related, couldn't 
std.algorithm.sort simply return the passed-in range if that 
range is already wrapped with SortedRange?

Nov 20 2013

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 20/11/13 01:01, Andrei Alexandrescu wrote:
 There's been recent discussion herein about what parameter validation method
 would be best for Phobos to adhere to.

 Currently we are using a mix of approaches:

 1. Some functions enforce()

 2. Some functions just assert()

 3. Some (fewer I think) functions assert(0)

 4. Some functions don't do explicit checking, relying instead on lower-level
 enforcement such as null dereference and bounds checking to ensure safety.

 Each method has its place. The question is what guidelines we put forward for
 Phobos code to follow; we're a bit loose about that right now.

Regarding enforce() vs. assert(), a good rule that I remember having suggested 
to me was that enforce() should be used for actual runtime checking (e.g. 
checking that the input to a public API function has correct properties), while 
assert() should be used to test logical failures (i.e. checking that cases which 
should never arise really don't arise).

I've always followed that as a rule of thumb ever since.

Nov 19 2013

Walter Bright <newshound2 digitalmars.com> writes:

On 11/19/2013 4:01 PM, Andrei Alexandrescu wrote:
 There's been recent discussion herein about what parameter validation method
 would be best for Phobos to adhere to.

It is important to decide what the notions of "validated data" and 
"untrusted data" are.

1. Validated data should get asserts if it is found to be invalid.

2. Untrusted data should get exceptions thrown if it is found to be invalid (or 
return errors).

For example, consider a utf string. If it has passed a validation check, then 
it becomes trusted data. Further processing on it should assert if it turns out 
to be invalid (because then you've got a programming bug).

File open failures should always throw, and never assert, because the file is 
not part of the program and so is inherently not trusted.

One way to distinguish validated from untrusted data is by using different 
types (or a naming convention, see Joel Spolsky's 
http://www.joelonsoftware.com/articles/Wrong.html).

It is of major importance in a program to think about what APIs get validated 
arguments and what APIs get untrusted arguments.
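The two rules above can be sketched with the utf string example. The function names are hypothetical; std.utf.validate is the existing Phobos call that throws UTFException on malformed input.

```d
import std.range : walkLength;
import std.utf : validate, UTFException;

// Boundary API: the argument is untrusted, so invalid data throws.
string acceptUntrusted(string raw)
{
    validate(raw); // throws UTFException on malformed UTF-8
    return raw;
}

// Helper used only inside assertions.
bool isValidUtf(string s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

// Internal API: documented to receive validated data only, so a failure
// here is a bug in the program and is reported with assert.
size_t countCodePoints(string trusted)
{
    assert(isValidUtf(trusted), "caller passed data that was never validated");
    return trusted.walkLength;
}
```

In a -release build the assert, and with it the second validation pass, disappears, so validated data is only checked once at the boundary.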

Nov 20 2013

Jacob Carlborg <doob me.com> writes:

On 2013-11-20 09:50, Walter Bright wrote:

 Important is deciding upon the notions of "validated data" and
 "untrusted data" is.

 1. Validated data should get asserts if it is found to be invalid.

 2. Untrusted data should get exceptions thrown if it is found to be
 invalid (or return errors).

 For example, consider a utf string. If it has passed a validation check,
 then it becomes trusted data. Further processing on it should assert if
 it turns out to be invalid (because then you've got a programming bug).

 File open failures should always throw, and never assert, because the
 file is not part of the program and so is inherently not trusted.

 One way to distinguish validated from untrusted data is by using
 different types (or a naming convention, see Joel Spolsky's
 http://www.joelonsoftware.com/articles/Wrong.html).

 It is of major importance in a program to think about what APIs get
 validated arguments and what APIs get untrusted arguments.

How should we accomplish this? We can't replace:

void main (string[] args)

With

void main (UnsafeString[] args)

And break every application out there.

-- 
/Jacob Carlborg

Nov 20 2013

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Wednesday, November 20, 2013 11:49:32 Jacob Carlborg wrote:
 On 2013-11-20 09:50, Walter Bright wrote:
 Important is deciding upon the notions of "validated data" and
 "untrusted data" is.
 
 1. Validated data should get asserts if it is found to be invalid.
 
 2. Untrusted data should get exceptions thrown if it is found to be
 invalid (or return errors).
 
 For example, consider a utf string. If it has passed a validation check,
 then it becomes trusted data. Further processing on it should assert if
 it turns out to be invalid (because then you've got a programming bug).
 
 File open failures should always throw, and never assert, because the
 file is not part of the program and so is inherently not trusted.
 
 One way to distinguish validated from untrusted data is by using
 different types (or a naming convention, see Joel Spolsky's
 http://www.joelonsoftware.com/articles/Wrong.html).
 
 It is of major importance in a program to think about what APIs get
 validated arguments and what APIs get untrusted arguments.

 
 How should we accomplish this? We can't replace:
 
 void main (string[] args)
 
 With
 
 void main (UnsafeString[] args)
 
 And break every application out there.

You'd do it the other way around by having something like

ValidatedString!char s = validateString("hello world");

ValidatedString would then avoid any extra validation when iterating over the 
characters, though I don't know how much of an efficiency gain that would 
actually be given that much of the validation occurs naturally when decoding 
or using stride. It would have the downside that any function which 
specializes on strings would likely have to then specialize on ValidatedString 
as well. So, while I agree with the idea in concept, I'd propose that we 
benchmark the difference in decoding and striding without the checks and see if 
there actually is much difference. Because if there isn't, then I don't think 
that it's worth going to the trouble of adding something like ValidatedString.
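To make the ValidatedString idea concrete, here is one minimal way the type and validateString from the line above could be written. This is an assumption-laden sketch rather than a proposed Phobos design: the member names are invented, and it does not address the specialization cost mentioned above.

```d
import std.utf : validate;

// Hypothetical wrapper whose type records that validation already happened.
struct ValidatedString(Char)
{
    private immutable(Char)[] payload_;

    // Private: code outside this module has to go through validateString.
    private this(immutable(Char)[] s) { payload_ = s; }

    // Read-only access to the underlying string.
    immutable(Char)[] get() const { return payload_; }
}

// The only public way in: validates once, throwing UTFException on failure.
ValidatedString!Char validateString(Char)(immutable(Char)[] s)
{
    validate(s);
    return ValidatedString!Char(s);
}
```

A function that takes a ValidatedString!char parameter can then skip its own checks, because the type guarantees that validate() already ran on the contents.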

- Jonathan M Davis

Nov 20 2013

Jacob Carlborg <doob me.com> writes:

On 2013-11-20 12:16, Jonathan M Davis wrote:

 You'd do it the other way around by having something like

 ValidatedString!char s = validateString("hello world");

Right.

 ValidatedString would then avoid any extra validation when iterating over the
 characters, though I don't know how much of an efficiency gain that would
 actually be given that much of the validation occurs naturally when decoding
 or using stride. It would have the downside that any function which
 specializes on strings would likely have to then specialize on ValidatedString
 as well. So, while I agree with the idea in concept, I'd propose that we
 benchmark the difference in decoding and striding without the checks and see if
 there actually is much difference. Because if there isn't, then I don't think
 that it's worth going to the trouble of adding something like ValidatedString.

It's not just a question of whether the string is valid UTF-8. There can be 
many other types of valid strings, or rather other functions that have 
additional requirements, like sanitized filenames, HTML/SQL escaped strings 
and so on.

-- 
/Jacob Carlborg

Nov 20 2013

Marco Leise <Marco.Leise gmx.de> writes:

Am Wed, 20 Nov 2013 12:49:20 +0100
schrieb Jacob Carlborg <doob me.com>:

 On 2013-11-20 12:16, Jonathan M Davis wrote:
 
 You'd do it the other way around by having something like

 ValidatedString!char s = validateString("hello world");

 
 Right.
 
 ValidatedString would then avoid any extra validation when iterating over the
 characters, though I don't know how much of an efficiency gain that would
 actually be given that much of the validation occurs naturally when decoding
 or using stride. It would have the downside that any function which
 specializes on strings would likely have to then specialize on ValidatedString
 as well. So, while I agree with the idea in concept, I'd propose that we
 benchmark the difference in decoding and striding without the checks and see if
 there actually is much difference. Because if there isn't, then I don't think
 that it's worth going to the trouble of adding something like ValidatedString.

 
 If not just if the string is valid UTF-8. There can be many other types 
 of valid strings. Or rather other functions that have additional 
 requirements. Like sanitized filenames, HTML/SQL escaped strings and so on.

None of that is feasible. We can only hope that we simply
catch every case of user input (or untrusted data) and check
it before passing it to Phobos APIs. That's why there are
functions to validate and also to sanitize UTF strings on a
best effort basis in Phobos.

So in my opinion Phobos should continue forward with assert
instead of enforce. I/O functions, of course, have to use
exceptions.

That said, I never thought of validating args[] before passing
it to getopt or using them as a filename. Lesson learned, I
guess?

-- 
Marco

Nov 20 2013

Jacob Carlborg <doob me.com> writes:

On 2013-11-20 13:22, Marco Leise wrote:

 None of that is feasible. We can only hope that we simply
 catch every case of user input (or untrusted data) and check
 it before passing it to Phobos APIs. That's why there are
 functions to validate and also to sanitize UTF strings on a
 best effort basis in Phobos.

 So in my opinion Phobos should continue forward with assert
 instead of enforce. I/O functions, of course, have to use
 exceptions.

 That said, I never thought of validating args[] before passing
 it to getopt or using them as a filename. Lesson learned, I
 guess?

I don't know how getopt behaves but using them as a filename will most 
likely end up calling a system function, which will hopefully take care 
of the checking.

-- 
/Jacob Carlborg

Nov 20 2013

Simen Kjærås <simen.kjaras gmail.com> writes:

On 20.11.2013 12:49, Jacob Carlborg wrote:
 On 2013-11-20 12:16, Jonathan M Davis wrote:

 You'd do it the other way around by having something like

 ValidatedString!char s = validateString("hello world");

 Right.

 ValidatedString would then avoid any extra validation when iterating
 over the
 characters, though I don't know how much of an efficiency gain that would
 actually be given that much of the validation occurs naturally when
 decoding
 or using stride. It would have the downside that any function which
 specializes on strings would likely have to then specialize on
 ValidatedString
 as well. So, while I agree with the idea in concept, I'd propose that we
 benchmark the difference in decoding and striding without the checks
 and see if
 there actually is much difference. Because if there isn't, then I
 don't think
 that it's worth going to the trouble of adding something like
 ValidatedString.

 If not just if the string is valid UTF-8. There can be many other types
 of valid strings. Or rather other functions that have additional
 requirements. Like sanitized filenames, HTML/SQL escaped strings and so on.

May I suggest:

struct Validated(alias fn, T) {
     private T value;
     @property inout
     T get() {
         return value;
     }
}

Validated!(fn, T) validate(alias fn, T)(T value) {
     Validated!(fn, T) result;
     fn(value);
     result.value = value;
     return result;
}

void functionThatTakesSanitizedFileNames(Validated!(sanitizeFileName, 
string) path) {
    // Do stuff
}
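
To make the idea a bit more concrete, here is a minimal sketch of how the wrapper could be used end to end. It is only an illustration under assumptions: isSaneFileName and nameLength are hypothetical names made up for the example, the accessor returns inout(T) rather than T, and validate throws through enforce instead of relying on fn itself to throw.

```d
import std.exception : enforce;

// Wrapper whose type records which check the value has passed.
struct Validated(alias fn, T)
{
    private T value;

    @property inout(T) get() inout
    {
        return value;
    }
}

// Runs the check once, at the boundary, and throws if it fails.
Validated!(fn, T) validate(alias fn, T)(T value)
{
    enforce(fn(value), "validation failed");
    Validated!(fn, T) result;
    result.value = value;
    return result;
}

// Hypothetical predicate, for illustration only.
bool isSaneFileName(string s)
{
    import std.algorithm.searching : canFind;
    return s.length > 0 && !s.canFind('/') && !s.canFind('\0');
}

// Hypothetical consumer: it cannot be called with a plain string.
size_t nameLength(Validated!(isSaneFileName, string) path)
{
    return path.get.length;
}

unittest
{
    import std.exception : assertThrown;

    auto ok = validate!isSaneFileName("notes.txt");
    assert(nameLength(ok) == 9);
    assertThrown(validate!isSaneFileName("a/b"));
}
```

The point of the design is that the check runs in exactly one place, and every function further down only has to state in its signature which check it relies on.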

-- 
   Simen

Nov 20 2013

"Meta" <jared771 gmail.com> writes:

On Wednesday, 20 November 2013 at 17:45:43 UTC, Simen Kjærås 
wrote:
 On 20.11.2013 12:49, Jacob Carlborg wrote:
 On 2013-11-20 12:16, Jonathan M Davis wrote:

 You'd do it the other way around by having something like

 ValidatedString!char s = validateString("hello world");

 Right.

 ValidatedString would then avoid any extra validation when 
 iterating
 over the
 characters, though I don't know how much of an efficiency 
 gain that would
 actually be given that much of the validation occurs 
 naturally when
 decoding
 or using stride. It would have the downside that any function 
 which
 specializes on strings would likely have to then specialize on
 ValidatedString
 as well. So, while I agree with the idea in concept, I'd 
 propose that we
 benchmark the difference in decoding and striding without the 
 checks
 and see if
 there actually is much difference. Because if there isn't, 
 then I
 don't think
 that it's worth going to the trouble of adding something like
 ValidatedString.

 If not just if the string is valid UTF-8. There can be many 
 other types
 of valid strings. Or rather other functions that have 
 additional
 requirements. Like sanitized filenames, HTML/SQL escaped 
 strings and so on.

 May I suggest:

 struct Validated(alias fn, T) {
     private T value;
      @property inout
     T get() {
         return value;
     }
 }

 Validated!(fn, T) validate(alias fn, T)(T value) {
     Validated!(fn, T) result;
     fn(value);
     result.value = value;
     return result;
 }

 void 
 functionThatTakesSanitizedFileNames(Validated!(sanitizeFileName, 
 string) path) {
    // Do stuff
 }

I was having the exact same thought. I think this could be very 
powerful if done correctly.

Nov 20 2013

"Dicebot" <public dicebot.lv> writes:

I also think this is a very powerful and under-explored approach, 
but it really belongs in a domain-specific framework rather than in 
the standard library. One example I keep thinking about is 
re-declaring vibe.d string functions in terms of 
EscapedString!(SQL), EscapedString!(HTML) and so on, for better 
application safety and correctness. I have no idea how that would 
work in practice, though.

Nov 20 2013

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

20-Nov-2013 22:28, Dicebot wrote:
 I also think this is very powerful and under-explored approach but it
 really better belongs to certain domain framework than to stdlib. One
 example I keep thinking about is to re-declare vibe.d string functions
 in terms of EscapedString!(SQL), EscapedString!(HTML) and so on for
 better application safety and correctness. No idea how that may work in
 practice though.

I think the obstacles are mostly:

1. There is a non-zero intersection between validated subsets. Some kind 
of NiceStringWithNoPunctuation fits practically every 
EscapedString!(XYZ). There must be a way to cascade and mix/match these 
classes.

2. Template bloat! It would be really hard to fight IFTI duplicating 
function bodies behind your back. Or, if you dumb these escaped types 
down so that they no longer fit most templates, it may become a 
usability problem.

3. This kind of thing is viral. With an escape hatch, though, it can 
be done step by step.

-- 
Dmitry Olshansky

Nov 20 2013

"Dicebot" <public dicebot.lv> writes:

On Wednesday, 20 November 2013 at 20:19:28 UTC, Dmitry Olshansky 
wrote:
 2. Template bloatZ! It would be real hard to fight the IFTI 
 duping functions bodies behind your back. Or if you dumb down 
 these escaped types to not fit the most of templates, it may 
 become a usability problem.

 3. This kind of thing is viral. With escape hatch though, it 
 may be done step by step.

This is the very reason why I am saying it makes much more sense 
as part of an application framework, as those tend to have a 
clearer separation between internal and external infrastructure 
and strict expectations about API usage. So it is not a usability 
problem, it is a usability feature :)

Nov 20 2013

"inout" <inout gmail.com> writes:

On Wednesday, 20 November 2013 at 17:45:43 UTC, Simen Kjærås 
wrote:
 On 20.11.2013 12:49, Jacob Carlborg wrote:
 On 2013-11-20 12:16, Jonathan M Davis wrote:

 You'd do it the other way around by having something like

 ValidatedString!char s = validateString("hello world");

 Right.

 ValidatedString would then avoid any extra validation when 
 iterating
 over the
 characters, though I don't know how much of an efficiency 
 gain that would
 actually be given that much of the validation occurs 
 naturally when
 decoding
 or using stride. It would have the downside that any function 
 which
 specializes on strings would likely have to then specialize on
 ValidatedString
 as well. So, while I agree with the idea in concept, I'd 
 propose that we
 benchmark the difference in decoding and striding without the 
 checks
 and see if
 there actually is much difference. Because if there isn't, 
 then I
 don't think
 that it's worth going to the trouble of adding something like
 ValidatedString.

 If not just if the string is valid UTF-8. There can be many 
 other types
 of valid strings. Or rather other functions that have 
 additional
 requirements. Like sanitized filenames, HTML/SQL escaped 
 strings and so on.

 May I suggest:

 struct Validated(alias fn, T) {
     private T value;
      @property inout
     T get() {
         return value;
     }
 }

 Validated!(fn, T) validate(alias fn, T)(T value) {
     Validated!(fn, T) result;
     fn(value);
     result.value = value;
     return result;
 }

 void 
 functionThatTakesSanitizedFileNames(Validated!(sanitizeFileName, 
 string) path) {
    // Do stuff
 }

What if you have more than just one validation, e.g. Positive and 
LessThan42?
Is Positive!LessThan42!int the same type as 
LessThan42!Positive!int? Implicitly convertible?

I feel that it might be better to use attributes here instead. 
Something like:

@positive int validatePositive(int value) {
   assert(value > 0);
   return value;
}

@lessThan42 int validateLessThan42(int value) {
   assert(value < 42);
   return value;
}

Now you can have @positive @lessThan42 int value = 
validatePositive(validateLessThan42(x));

It also doesn't involve creating new types.

Nov 21 2013

"Meta" <jared771 gmail.com> writes:

On Thursday, 21 November 2013 at 22:51:43 UTC, inout wrote:
 What if you have more that just one validation, e.g. Positive 
 and LessThan42?
 Is Positive!LessThan42!int the same type as 
 LessThan42!Positive!int? Implicitly convertible?

Allow multiple validation functions. Then a Validated type is 
only valid if validationFunction1(val) && 
validationFunction2(val) &&...

Validated!(isPositive, lessThan42, int) validatedInt = 
validate!(isPositive, lessThan42)(34);
//Do stuff with validatedInt

Or just pass a function that validates that the int is both 
positive and less than 42, which would be much simpler.
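
A rough sketch of that second option, assuming nothing beyond plain predicates: allOf is a hypothetical helper (not something in Phobos) that folds several checks into a single one, so the wrapper only ever has to carry one validation function.

```d
bool isPositive(int x) { return x > 0; }
bool lessThan42(int x) { return x < 42; }

// Hypothetical helper: combines any number of predicates into one
// that is true only if every one of them accepts the value.
template allOf(preds...)
{
    bool allOf(T)(T value)
    {
        foreach (pred; preds)   // unrolled at compile time
        {
            if (!pred(value))
                return false;
        }
        return true;
    }
}

unittest
{
    alias inRange = allOf!(isPositive, lessThan42);
    assert(inRange(34));
    assert(!inRange(0));
    assert(!inRange(42));
}
```

The catch is that the combined predicate is opaque: the type no longer says which individual checks were applied, so nothing can be converted to a weaker set of constraints.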

 ...

 It also doesn't involve creating new types.

Creating new types is what allows us to provide static, 
compiler-verified guarantees.

Nov 21 2013

Simen Kjærås <simen.kjaras gmail.com> writes:

On 22.11.2013 00:50, Meta wrote:
 On Thursday, 21 November 2013 at 22:51:43 UTC, inout wrote:
 What if you have more that just one validation, e.g. Positive and
 LessThan42?
 Is Positive!LessThan42!int the same type as LessThan42!Positive!int?
 Implicitly convertible?

 Allow multiple validation functions. Then a Validated type is only valid
 if validationFunction1(val) && validationFunction2(val) &&...

 Validated!(isPositive, lessThan42, int) validatedInt =
 validate!(isPositive, lessThan42)(34);
 //Do stuff with validatedInt

I believe inout's point was this, though:

   Validated!(isPositive, lessThan42, int) i = foo();

   Validated!(isPositive, int) n = i; // Fails.
   Validated!(lessThan42, isPositive, int) r = i; // Fails.

This is of course less than optimal.

If a type such as Validated is to be added to Phobos, these problems 
need to be fixed first.


 Or just pass a function that validates that the int is both positive and
 less than 42, which would be much simpler.



-- 
   Simen

Nov 21 2013

Marco Leise <Marco.Leise gmx.de> writes:

On Fri, 22 Nov 2013 02:55:44 +0100,
Simen Kjærås <simen.kjaras gmail.com> wrote:

 On 22.11.2013 00:50, Meta wrote:
 On Thursday, 21 November 2013 at 22:51:43 UTC, inout wrote:
 What if you have more that just one validation, e.g. Positive and
 LessThan42?
 Is Positive!LessThan42!int the same type as LessThan42!Positive!int?
 Implicitly convertible?

 Allow multiple validation functions. Then a Validated type is only valid
 if validationFunction1(val) && validationFunction2(val) &&...

 Validated!(isPositive, lessThan42, int) validatedInt =
 validate!(isPositive, lessThan42)(34);
 //Do stuff with validatedInt

 I believe inout's point was this, though:

    Validated!(isPositive, lessThan42, int) i = foo();

    Validated!(isPositive, int) n = i; // Fails.
    Validated!(lessThan42, isPositive, int) r = i; // Fails.

 This is of course less than optimal.

 If a type such as Validate is to be added to Phobos, these problems need
 to be fixed first.

Can you write a templated assignment operator that
accepts any Validated!* instance and builds the set difference
of validation functions that are missing on the assigned value?
E.g. in the case of n = i: {isPositive} / {isPositive,
lessThan42} = empty set.
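
Roughly what I have in mind, as a sketch only. The names revalidate, Type and Constraints are made up here, the conversion is an explicit function rather than an assignment operator, and the counters exist purely to make the skipped checks visible.

```d
import std.exception : enforce;
import std.meta : staticIndexOf;

struct Validated(T, constraints...)
{
    alias Type = T;
    alias Constraints = constraints;
    private T _value;
    @property inout(T) value() inout { return _value; }
}

// Converts to the constraint set `wanted`, running only those checks
// that the source type does not already guarantee.
template revalidate(wanted...)
{
    Validated!(Source.Type, wanted) revalidate(Source)(Source src)
    {
        static foreach (pred; wanted)
        {
            // Set difference: check pred only if the source lacks it.
            static if (staticIndexOf!(pred, Source.Constraints) == -1)
                enforce(pred(src._value), "constraint not satisfied");
        }
        Validated!(Source.Type, wanted) result;
        result._value = src._value;
        return result;
    }
}

// Entry point: starts from a value with no constraints established.
template validate(wanted...)
{
    auto validate(T)(T value)
    {
        Validated!T raw;
        raw._value = value;
        return revalidate!wanted(raw);
    }
}

int positiveChecks, lessThan42Checks; // counters for the demo only
bool isPositive(int x) { ++positiveChecks; return x > 0; }
bool lessThan42(int x) { ++lessThan42Checks; return x < 42; }

unittest
{
    positiveChecks = 0;
    lessThan42Checks = 0;
    auto a = validate!(isPositive, lessThan42)(13);  // runs both checks
    auto b = revalidate!isPositive(a);               // subset: runs nothing
    auto c = revalidate!(lessThan42, isPositive)(b); // runs lessThan42 only
    assert(positiveChecks == 1);
    assert(lessThan42Checks == 2);
    assert(c.value == 13);
}
```

Since membership is tested by symbol identity, the order of the constraints stops mattering for conversions, although the two orderings are still distinct types.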

-- 
Marco

Nov 25 2013

Simen Kjærås <simen.kjaras gmail.com> writes:

On 2013-11-25 13:00, Marco Leise wrote:
 On Fri, 22 Nov 2013 02:55:44 +0100,
 Simen Kjærås <simen.kjaras gmail.com> wrote:

 On 22.11.2013 00:50, Meta wrote:
 On Thursday, 21 November 2013 at 22:51:43 UTC, inout wrote:
 What if you have more that just one validation, e.g. Positive and
 LessThan42?
 Is Positive!LessThan42!int the same type as LessThan42!Positive!int?
 Implicitly convertible?

 Allow multiple validation functions. Then a Validated type is only valid
 if validationFunction1(val) && validationFunction2(val) &&...

 Validated!(isPositive, lessThan42, int) validatedInt =
 validate!(isPositive, lessThan42)(34);
 //Do stuff with validatedInt

 I believe inout's point was this, though:

     Validated!(isPositive, lessThan42, int) i = foo();

     Validated!(isPositive, int) n = i; // Fails.
     Validated!(lessThan42, isPositive, int) r = i; // Fails.

 This is of course less than optimal.

 If a type such as Validate is to be added to Phobos, these problems need
 to be fixed first.

 Can you write a templated assignment operator that
 accepts any Validated!* instance and builds the set difference
 of validation functions that are missing on the assigned value?
 E.g. in the case of n = i: {isPositive} / {isPositive,
 lessThan42} = emtpy set.

Do you mean this?

Validated!(int, isPositive, lessThan42) a =
     validated!(isPositive, lessThan42)(13);
Validated!(int, isPositive) b = a;
a = b; // Only tests lessThan42

If so, you're mostly right that this should be done. I am however of the 
opinion that conversions that may throw should be marked appropriately, 
so this will be the right way:

a = validated!(isPositive, lessThan42)(b); // Only tests lessThan42

New version now available on GitHub:
http://git.io/hEe0MA
http://git.io/QEP-kQ

--
   Simen

Nov 25 2013

"inout" <inout gmail.com> writes:

On Monday, 25 November 2013 at 13:01:43 UTC, Simen Kjærås wrote:
 On 2013-11-25 13:00, Marco Leise wrote:
 On Fri, 22 Nov 2013 02:55:44 +0100,
 Simen Kjærås <simen.kjaras gmail.com> wrote:

 On 22.11.2013 00:50, Meta wrote:
 On Thursday, 21 November 2013 at 22:51:43 UTC, inout wrote:
 What if you have more that just one validation, e.g. 
 Positive and
 LessThan42?
 Is Positive!LessThan42!int the same type as 
 LessThan42!Positive!int?
 Implicitly convertible?

 Allow multiple validation functions. Then a Validated type 
 is only valid
 if validationFunction1(val) && validationFunction2(val) &&...

 Validated!(isPositive, lessThan42, int) validatedInt =
 validate!(isPositive, lessThan42)(34);
 //Do stuff with validatedInt

 I believe inout's point was this, though:

    Validated!(isPositive, lessThan42, int) i = foo();

    Validated!(isPositive, int) n = i; // Fails.
    Validated!(lessThan42, isPositive, int) r = i; // Fails.

 This is of course less than optimal.

 If a type such as Validate is to be added to Phobos, these 
 problems need
 to be fixed first.

 Can you write a templated assignment operator that
 accepts any Validated!* instance and builds the set difference
 of validation functions that are missing on the assigned value?
 E.g. in the case of n = i: {isPositive} / {isPositive,
 lessThan42} = emtpy set.

 Do you mean this?

 Validated!(int, isPositive, lessThan42) a =
     validated!(isPositive, lessThan42)(13);
 Validated!(int, isPositive) b = a;
 a = b; // Only tests lessThan42

 If so, you're mostly right that this should be done. I am 
 however of the opinion that conversions that may throw should 
 be marked appropriately, so this will be the right way:

 a = validated!(isPositive, lessThan42)(b); // Only tests 
 lessThan42

 New version now available on GitHub:
 http://git.io/hEe0MA
 http://git.io/QEP-kQ

 --
   Simen

I find this to be too verbose to be useful. You also need to
be very careful not to discard any existing qualifiers on input,
and to carry them over. This essentially forces any function that
uses them to be templated, while all the instances will be the
same (yet each gets its own copy of the body, since no D compiler
merges identical functions).

I still think that wrapping int in a type just to add a tag to it,
without adding any methods, is not a great idea: it doesn't scale
well with composition and tag propagation. Any operation that
expects int will essentially discard all the qualifiers.

Nov 26 2013

Simen Kjærås <simen.kjaras gmail.com> writes:

On 26.11.2013 21:14, inout wrote:
 I find this to be too verbose to be useful.

This I understand. It is actually the best argument I can find in favor 
of doing constraints checking upon construction, rather than in a 
separate construction function. This allows you to use one alias instead 
of two.


 And you also need to
 be very careful not to discard any existing qualifiers on input
 and carry them over. This will essentially make any function that
 uses them to be templated, while all the instances will be the
 same (yet have a different body since no D compiler merges
 identical functions).

Could you give an example of this? It's a bit unclear to me what you 
mean. Is it this sort of thing:

   auto doPrimeStuff(Validated!(int, isPrime) a){return a;}
   auto doLessThan42Stuff(Validated!(int, lessThan42) a){return a;}

   Validated!(int, isPrime, lessThan42) i = 13;

   i.doPrimeStuff().doLessThan42Stuff();

Where the second chained function call fails due to lessThan42 being 
removed from the constraints? (There's also the problem that this 
wouldn't work in the first place due to D's lack of implicit conversions)

 I still find wrapping int with some type to add a tag to it
 without adding any methods is not a great idea - it doesn't scale
 well with composition and tag propagation. Any operation that
 expects int will essentially discard all the qualifiers.

And any operation of the kind you describe is likely to change the value 
so the constraints need to be checked again. abs(Validated!(int, 
isNegative)) cannot possibly return the same type it received.

-- 
   Simen

Nov 26 2013

"Meta" <jared771 gmail.com> writes:

On Tuesday, 26 November 2013 at 20:14:15 UTC, inout wrote:
 Any operation that expects int will essentially discard all the 
 qualifiers.

It isn't surprising that any operation that expects int will get 
int. To take advantage of Validated, an operation has to expect 
Validated.

Nov 26 2013

Marco Leise <Marco.Leise gmx.de> writes:

On Mon, 25 Nov 2013 14:01:28 +0100,
Simen Kjærås <simen.kjaras gmail.com> wrote:

 On 2013-11-25 13:00, Marco Leise wrote:
 Can you write a templated assignment operator that
 accepts any Validated!* instance and builds the set difference
 of validation functions that are missing on the assigned value?
 E.g. in the case of n = i: {isPositive} / {isPositive,
 lessThan42} = empty set.

 Do you mean this?

 Validated!(int, isPositive, lessThan42) a =
      validated!(isPositive, lessThan42)(13);
 Validated!(int, isPositive) b = a;
 a = b; // Only tests lessThan42

 If so, you're mostly right that this should be done. I am however of the
 opinion that conversions that may throw should be marked appropriately,
 so this will be the right way:

 a = validated!(isPositive, lessThan42)(b); // Only tests lessThan42

 New version now available on GitHub:
 http://git.io/hEe0MA
 http://git.io/QEP-kQ

 --
    Simen

Yes, that is what I had in mind.

-- 
Marco

Nov 26 2013

Simen Kjærås <simen.kjaras gmail.com> writes:

On 22.11.2013 02:55, Simen Kjærås wrote:
 On 22.11.2013 00:50, Meta wrote:
 On Thursday, 21 November 2013 at 22:51:43 UTC, inout wrote:
 What if you have more that just one validation, e.g. Positive and
 LessThan42?
 Is Positive!LessThan42!int the same type as LessThan42!Positive!int?
 Implicitly convertible?

 Allow multiple validation functions. Then a Validated type is only valid
 if validationFunction1(val) && validationFunction2(val) &&...

 Validated!(isPositive, lessThan42, int) validatedInt =
 validate!(isPositive, lessThan42)(34);
 //Do stuff with validatedInt

 I believe inout's point was this, though:

    Validated!(isPositive, lessThan42, int) i = foo();

    Validated!(isPositive, int) n = i; // Fails.
    Validated!(lessThan42, isPositive, int) r = i; // Fails.

 This is of course less than optimal.

 If a type such as Validate is to be added to Phobos, these problems need
 to be fixed first.


 Or just pass a function that validates that the int is both positive and
 less than 42, which would be much simpler.


I've created a version of Validated now that takes 1 or more 
constraints, and where a type whose constraints are a superset of 
another's, is implicitly convertible to that. Sadly, because of D's lack 
of certain implicit conversions, there are limits.

Attached is source (validation.d), and some utility functions that are 
necessary for it to compile (utils.d).

Is this worth working more on? Should it be in Phobos? Other critique?

Oh, sorry about those stupid questions, we have a term for that:

Destroy!

-- 
   Simen

Nov 24 2013

"Meta" <jared771 gmail.com> writes:

On Sunday, 24 November 2013 at 17:35:51 UTC, Simen Kjærås wrote:
 I believe inout's point was this, though:

    Validated!(isPositive, lessThan42, int) i = foo();

    Validated!(isPositive, int) n = i; // Fails.
    Validated!(lessThan42, isPositive, int) r = i; // Fails.

 This is of course less than optimal.

 If a type such as Validate is to be added to Phobos, these 
 problems need
 to be fixed first.


 Or just pass a function that validates that the int is both 
 positive and
 less than 42, which would be much simpler.


 I've created a version of Validated now that takes 1 or more
 constraints, and where a type whose constraints are a superset 
 of
 another's, is implicitly convertible to that. Sadly, because of 
 D's lack
 of certain implicit conversions, there are limits.

 Attached is source (validation.d), and some utility functions 
 that are
 necessary for it to compile (utils.d).

 Is this worth working more on? Should it be in Phobos? Other 
 critique?

 Oh, sorry about those stupid questions, we have a term for that:

 Detroy!

Awesome, I was messing around with something similar but you beat 
me to the punch. A couple things:

- The function validated would probably be better named validate, 
since it actually performs validation and returns a validated 
type. The struct's name is fine.

- I think it'd be better to change "static if 
(is(typeof(fn(value)) == bool))" to "static if 
(is(typeof(fn(value)) : bool))". Rather than checking that the 
return type is exactly bool, this only checks that it's 
implicitly convertible to bool, AKA "truthy".

- It might be a good idea to have a version(AlwaysValidate) block 
in assumeValidated for people who don't care about code speed and 
want maximum safety, that would always run the validation 
functions. Also, it might be a good idea to mark assumeValidated 
@system, because it blatantly breaks the underlying assumptions 
being made in the first place. Code that wants to be rock-solid 
@safe will be restricted to using only validate. Or maybe that's 
going too far.

- Validated doesn't work very well with reference types. The 
following fails:

class CouldBeNull
{
}

bool notNull(T)(T t)
if (is(T == class))
{
	return t !is null;
}

//Error: cannot implicitly convert expression (this._value) of 
type inout(CouldBeNull) to f505.CouldBeNull
void takesNonNull(Validated!(CouldBeNull, notNull) validatedT)
{
}

- On the subject of reference types, I don't think Validated 
handles them quite correctly. This is a problem I ran into, and 
it's not an easy one. Assume for a second that there's a class 
FortyTwo that *does* work with Validated:

	class FortyTwo
	{
		int i = 42;
	}
	
	bool containsFortyTwo(FortyTwo ft)
	{
		return ft.i == 42;
	}
	
	void mutateFortyTwo(Validated!(FortyTwo, containsFortyTwo) 
fortyTwo)
	{
		fortyTwo.i = 43;
	}
	
	auto a = validated!containsFortyTwo(new FortyTwo());
	auto b = a;
	//Passes
	assert(a.i == 42);
	assert(b.i == 42);
	mutateFortyTwo(a);
	//Fails
	assert(a.i == 43);
	assert(b.i == 43);

This is an extremely contrived example, but it illustrates the 
problem of using reference types with Validated. It gets even 
hairier if i itself were a reference type, like a slice:

	void mutateCopiedValue(Validated!(FortyTwo, containsFortyTwo) 
fortyTwo)
	{
		//We're not out of the woods yet
		int[] arr = fortyTwo.i;
		arr[0] += 1;
	}

         //Continuing from previous example,
         //except i is now an array
	mutateCopiedValue(b);
	assert(a.i[0] == 44);
	assert(b.i[0] == 44);

Obviously in this case you could just .dup i, but what if i were 
a class itself? It'd be extremely easy to accidentally invalidate 
every Validated!(FortyTwo, ...) in the program in a single swipe. 
It gets even worse if i were some class reference to which other, 
non-validated references existed. Changing those naked references 
would change i, and possibly invalidate it.

Nov 24 2013

"Meta" <jared771 gmail.com> writes:

On Monday, 25 November 2013 at 07:24:10 UTC, Meta wrote:
 	auto a = validated!containsFortyTwo(new FortyTwo());
 	auto b = a;
 	//Passes
 	assert(a.i == 42);
 	assert(b.i == 42);
 	mutateFortyTwo(a);
 	//Fails
 	assert(a.i == 43);
 	assert(b.i == 43);

"//Fails" should be "//Passes" as well.

Nov 24 2013

Simen Kjærås <simen.kjaras gmail.com> writes:

On 2013-11-25 08:24, Meta wrote:
 - The function validated would probably be better named validate, since
 it actually performs validation and returns a validated type. The
 struct's name is fine.

Yeah, I was somewhat torn there, but I think you're right. Fixed.


 - I think it'd be better to change "static if (is(typeof(fn(value)) ==
 bool))" to "static if (is(typeof(fn(value)) : bool))", which rather than
 checking that the return type is exactly bool, it only checks that it's
 implicitly convertible to bool, AKA "truthy".

Even better - test if 'if (fn(value)) {}' compiles. Fixed.


 - It might be a good idea to have a version(AlwaysValidate) block in
 assumeValidated for people who don't care about code speed and want
 maximum safety, that would always run the validation functions. Also, it
 might be a good idea to mark assumeValidated @system, because it
 blatantly breaks the underlying assumptions being made in the first
 place. Code that wants to be rock-solid @safe will be restricted to
 using only validate. Or maybe that's going too far.

@safe is only for memory safety, which this is not. I agree it would be 
nice to mark assumeValidated as 'warning, may not do what it claims', 
but @safe is not really the correct indicator of that.


 - Validated doesn't work very well with reference types. The following
 fails:

 class CouldBeNull
 {
 }

 bool notNull(T)(T t)
 if (is(T == class))
 {
      return t !is null;
 }

 //Error: cannot implicitly convert expression (this._value) of type
 inout(CouldBeNull) to f505.CouldBeNull
 void takesNonNull(Validated!(CouldBeNull, notNull) validatedT)
 {
 }

Yeah, found that. It's a bug in value(), which should return inout(T), 
not T. Fixed.


 - On the subject of reference types, I don't think Validated handles
 them quite correctly. This is a problem I ran into, and it's not an easy
 one. Assume for a second that there's a class FourtyTwo that *does* work
 with Validated:

      class FortyTwo
      {
          int i = 42;
      }

      bool containsFortyTwo(FortyTwo ft)
      {
          return ft.i == 42;
      }

      void mutateFortyTwo(Validated!(FortyTwo, containsFortyTwo) fortyTwo)
      {
          fortyTwo.i = 43;
      }

      auto a = validated!containsFortyTwo(new FortyTwo());
      auto b = a;
      //Passes
      assert(a.i == 42);
      assert(b.i == 42);
      mutateFortyTwo(a);
      //Fails
      assert(a.i == 43);
      assert(b.i == 43);

 This is an extremely contrived example, but it illustrates the problem
 of using reference types with Validated. It gets even hairier if i
 itself were a reference type, like a slice:

      void mutateCopiedValue(Validated!(FortyTwo, containsFortyTwo)
 fortyTwo)
      {
          //We're not out of the woods yet
          int[] arr = fortyTwo.i;
          arr[0] += 1;
      }

          //Continuing from previous example,
          //except i is now an array
      mutateCopiedValue(b);
      assert(a.i[0] == 44);
      assert(b.i[0] == 44);

 Obviously in this case you could just .dup i, but what if i were a class
 itself? It'd be extremely easy to accidentally invalidate every
 Validated!(FortyTwo, ...) in the program in a single swipe. It gets even
 worse if i were some class reference to which other, non-validated
 references existed. Changing those naked references would change i, and
 possibly invalidate it.

This is a known shortcoming for which I see no good workaround. It would 
be possible to use std.traits.hasAliasing to see which types can be 
safely .dup'ed and only allow those types, but this is not a solution I 
like.

I guess it could print a warning when used with unsafe types. If I were 
to do that, I would still want some way to turn that message off. Eh. 
Maybe there is no good solution.
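
For reference, a small sketch of what such a hasAliasing gate would accept and reject. canStayValid is a hypothetical name for the check, and FortyTwo and Point are just stand-in types for the example.

```d
import std.traits : hasAliasing;

class FortyTwo { int i = 42; }
struct Point { int x, y; }

// Hypothetical gate: accept only payloads that cannot be changed
// behind the wrapper's back through some other reference.
enum bool canStayValid(T) = !hasAliasing!T;

static assert( canStayValid!int);
static assert( canStayValid!Point);
static assert( canStayValid!string);   // immutable(char)[]: sharing is harmless
static assert(!canStayValid!(int[]));  // mutable slice
static assert(!canStayValid!FortyTwo); // mutable class reference
```

This shows why I don't like it much: every mutable class and every mutable array is rejected outright, which rules out a large share of the types people would want to validate.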


What else is new?
- Better error messages for invalid constraints (testing if an int is
   null, a string is divisible by 3 or an array has a database
   connection, e.g.)
- Fixed a bug in opCast (I love that word - in Norwegian it [oppkast]
   means puke. ...anyways...) when converting to an incompatible wrapped
   value.

--
   Simen

Nov 25 2013

"Meta" <jared771 gmail.com> writes:

On Monday, 25 November 2013 at 08:52:14 UTC, Simen Kjærås wrote:
 @safe is only for memory safety, which this is not. I agree it 
 would be
 nice to mark assumeValidated as 'warning, may not do what it 
 claims',
 but @safe is not really the correct indicator of that.

What about a version flag, then, that can be passed to specify 
that the user wants assumeValidated() to run the validation 
functions as well?

 This is a known shortcoming for which I see no good workaround. 
 It would
 be possible to use std.traits.hasAliasing to see which types 
 can be
 safely .dup'ed and only allow those types, but this is not a 
 solution I
 like.

It's a hard problem. This is a case where a Unique!T type would 
be really useful.

 What else is new?
 - Better error messages for invalid constraints (testing if an 
 int is
    null, a string is divisible by 3 or an array has a database
    connection, e.g.)
 - Fixed a bug in opCast (I love that word - in Norwegian it 
 [oppkast]
    means puke. ...anyways...) when converting to an 
 incompatible wrapped
    value.

 --
    Simen

Keep up the good work!

Nov 25 2013

Simen Kjærås <simen.kjaras gmail.com> writes:

On 2013-11-26 06:37, Meta wrote:
 On Monday, 25 November 2013 at 08:52:14 UTC, Simen Kjærås wrote:
 @safe is only for memory safety, which this is not. I agree it would be
 nice to mark assumeValidated as 'warning, may not do what it claims',
 but @safe is not really the correct indicator of that.

 What about a version flag, then, that can be passed to specify that the
 user wants assumeValidated() to run the validation functions as well?

That's already in:
http://git.io/EdHw8A

--
   Simen

Nov 26 2013

Simen Kjærås <simen.kjaras gmail.com> writes:

On 20.11.2013 18:45, Simen Kjærås wrote:
 On 20.11.2013 12:49, Jacob Carlborg wrote:
 On 2013-11-20 12:16, Jonathan M Davis wrote:

 You'd do it the other way around by having something like

 ValidatedString!char s = validateString("hello world");

 Right.

 ValidatedString would then avoid any extra validation when iterating
 over the
 characters, though I don't know how much of an efficiency gain that
 would
 actually be given that much of the validation occurs naturally when
 decoding
 or using stride. It would have the downside that any function which
 specializes on strings would likely have to then specialize on
 ValidatedString
 as well. So, while I agree with the idea in concept, I'd propose that we
 benchmark the difference in decoding and striding without the checks
 and see if
 there actually is much difference. Because if there isn't, then I
 don't think
 that it's worth going to the trouble of adding something like
 ValidatedString.

 It's not just whether the string is valid UTF-8. There can be many other types
 of valid strings. Or rather other functions that have additional
 requirements. Like sanitized filenames, HTML/SQL escaped strings and
 so on.

 May I suggest:

 struct Validated(alias fn, T) {
      private T value;
      @property inout
      T get() {
          return value;
      }

Uh-hm. Add this:
        alias get this;

 }

 Validated!(fn, T) validate(alias fn, T)(T value) {
      Validated!(fn, T) result;
      fn(value);
      result.value = value;
      return result;
 }

 void functionThatTakesSanitizedFileNames(Validated!(sanitizeFileName,
 string) path) {
     // Do stuff
 }


-- 
   Simen

Nov 20 2013

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

20-Nov-2013 22:01, Simen Kjærås wrote:
 On 20.11.2013 18:45, Simen Kjærås wrote:

[snip]
 May I suggest:

 struct Validated(alias fn, T) {
      private T value;
      @property inout
      T get() {
          return value;
      }

 Uh-hm. Add this:
         alias get this;

And it decays to the naked type in a blink of an eye. And some function 
down the road will do the validation again...

 }

 Validated!(fn, T) validate(alias fn, T)(T value) {
      Validated!(fn, T) result;
      fn(value);
      result.value = value;
      return result;
 }

 void functionThatTakesSanitizedFileNames(Validated!(sanitizeFileName,
 string) path) {
     // Do stuff
 }



-- 
Dmitry Olshansky

Nov 20 2013

"Meta" <jared771 gmail.com> writes:

On Wednesday, 20 November 2013 at 18:30:58 UTC, Dmitry Olshansky 
wrote:
 20-Nov-2013 22:01, Simen Kjærås wrote:
 On 20.11.2013 18:45, Simen Kjærås wrote:

 [snip]
 May I suggest:

 struct Validated(alias fn, T) {
     private T value;
     @property inout
     T get() {
         return value;
     }

 Uh-hm. Add this:
        alias get this;

 And it decays to the naked type in a blink of an eye. And some 
 function down the road will do the validation again...

 }

 Validated!(fn, T) validate(alias fn, T)(T value) {
     Validated!(fn, T) result;
     fn(value);
     result.value = value;
     return result;
 }

 void 
 functionThatTakesSanitizedFileNames(Validated!(sanitizeFileName,
 string) path) {
    // Do stuff
 }



Yes. It is very important not to allow direct access to the 
underlying value. This is important for ensuring that it is not 
put in an invalid state. This is a mistake that was made with 
std.typecons.Nullable, making it useless for anything other than 
giving a non-nullable type a null state (which, in fairness, is 
probably all that it was originally intended for).

Nov 20 2013

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Wednesday, November 20, 2013 19:53:43 Meta wrote:
 Yes. It is very important not to allow direct access to the
 underlying value. This is important for ensuring that it is not
 put in an invalid state. This is a mistake that was made with
 std.typecons.Nullable, making it useless for anything other than
 giving a non-nullable type a null state (which, in fairness, is
 probably all that it was originally intended for).

It's arguably pretty pointless to put a nullable type in 
std.typecons.Nullable. If you want a nullable type to be null, just set it to 
null.

- Jonathan M Davis

Nov 20 2013

"Meta" <jared771 gmail.com> writes:

On Wednesday, 20 November 2013 at 19:23:32 UTC, Jonathan M Davis 
wrote:
 On Wednesday, November 20, 2013 19:53:43 Meta wrote:
 Yes. It is very important not to allow direct access to the
 underlying value. This is important for ensuring that it is not
 put in an invalid state. This is a mistake that was made with
 std.typecons.Nullable, making it useless for anything other 
 than
 giving a non-nullable type a null state (which, in fairness, is
 probably all that it was originally intended for).

 It's arguably pretty pointless to put a nullable type in
 std.typecons.Nullable. If you want a nullable type to be null, 
 just set it to
 null.

 - Jonathan M Davis

See the discussion from the other thread for why it can be useful 
to wrap a nullable reference in an option type (nullable is a 
pseudo-option type).

Nov 20 2013

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Wednesday, November 20, 2013 20:40:40 Meta wrote:
 On Wednesday, 20 November 2013 at 19:23:32 UTC, Jonathan M Davis
 
 wrote:
 On Wednesday, November 20, 2013 19:53:43 Meta wrote:
 Yes. It is very important not to allow direct access to the
 underlying value. This is important for ensuring that it is not
 put in an invalid state. This is a mistake that was made with
 std.typecons.Nullable, making it useless for anything other
 than
 giving a non-nullable type a null state (which, in fairness, is
 probably all that it was originally intended for).

 
 It's arguably pretty pointless to put a nullable type in
 std.typecons.Nullable. If you want a nullable type to be null,
 just set it to
 null.
 
 - Jonathan M Davis

 
 See the discussion from the other thread for why it can be useful
 to wrap a nullable reference in an option type (nullable is a
 pseudo-option type).

I know. And I still think that it's pointless - and it incurs extra overhead 
to boot, making it _worse_ than pointless. But clearly there's disagreement on 
the matter.

- Jonathan M Davis

Nov 20 2013

Jacob Carlborg <doob me.com> writes:

On 2013-11-20 19:53, Meta wrote:

 Yes. It is very important not to allow direct access to the underlying
 value. This is important for ensuring that it is not put in an invalid
 state. This is a mistake that was made with std.typecons.Nullable,
 making it useless for anything other than giving a non-nullable type a
 null state (which, in fairness, is probably all that it was originally
 intended for).

In that case all string functionality needs to be provided inside the 
Validated struct. In addition to that we lose the beauty of UFCS, at 
least for functions expecting plain "string".

-- 
/Jacob Carlborg

Nov 20 2013

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Thursday, November 21, 2013 08:36:37 Jacob Carlborg wrote:
 On 2013-11-20 19:53, Meta wrote:
 Yes. It is very important not to allow direct access to the underlying
 value. This is important for ensuring that it is not put in an invalid
 state. This is a mistake that was made with std.typecons.Nullable,
 making it useless for anything other than giving a non-nullable type a
 null state (which, in fairness, is probably all that it was originally
 intended for).

 
 In that case all string functionality needs to be provided inside the
 Validated struct. In addition to that we lose the beauty of UFCS, at
 least for functions expecting plain "string".

You could use alias this and alias the Validated struct to the underlying 
string, but if you did that, you'd probably end up having it escape the struct 
and used as a naked string the vast majority of the time, which would 
essentially defeat the purpose of the Validated struct.

- Jonathan M Davis

Nov 20 2013

Jacob Carlborg <doob me.com> writes:

On 2013-11-21 08:46, Jonathan M Davis wrote:

 You could use alias this and alias the Validated struct to the underlying
 string, but if you did that, you'd probably end up having it escape the struct
 and used as a naked string the vast majority of the time, which would
 essentially defeat the purpose of the Validated struct.

Yeah, that's what needs to be avoided and is the reason "alias this" or 
a property returning the raw string cannot be used.

-- 
/Jacob Carlborg

Nov 21 2013

"Meta" <jared771 gmail.com> writes:

On Thursday, 21 November 2013 at 07:36:38 UTC, Jacob Carlborg 
wrote:
 On 2013-11-20 19:53, Meta wrote:

 Yes. It is very important not to allow direct access to the 
 underlying
 value. This is important for ensuring that it is not put in an 
 invalid
 state. This is a mistake that was made with 
 std.typecons.Nullable,
 making it useless for anything other than giving a 
 non-nullable type a
 null state (which, in fairness, is probably all that it was 
 originally
 intended for).

 In that case all string functionality needs to be provided 
 inside the Validated struct. In addition to that we lose the 
 beauty of UFCS, at least for functions expecting plain "string".

This is tricky business. Unfortunately, having the wrapper be 
able to degrade to its base type is at odds with providing 
compiler-enforced guarantees. We can't allow direct access to the 
underlying string, because the user could purposely or 
inadvertently put it in an invalid state. On the other hand, 
these opaque wrapper types can no longer be transparently 
substituted into existing code. One solution is copying the 
validated string to do arbitrary operations on, leaving the 
original validated string unchanged.

auto validatedString = validate!isValidUTF(someString);
//Doesn't work; Validated!string does not expose the string interface
//auto invalidString = validatedString.map!(c => c - cast(char)int.max);
//Also doesn't work
//validatedString ~= cast(char)0xFFFF
auto validatedCopy = validatedString.duplicate();
//Do bad things with validatedCopy. validatedString remains unchanged and valid
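
The copy-out idea above might look roughly like the following. This is only a minimal sketch: `ValidatedUTF`, `validateUTF` and `duplicate` are hypothetical names chosen for illustration and are not Phobos API; the only real library call is `std.utf.validate`.

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

/// Hypothetical opaque wrapper: no `alias this` and no accessor to the payload.
struct ValidatedUTF
{
    private char[] _value;

    /// Hands out an independent mutable copy; the wrapped data is never exposed.
    char[] duplicate() const
    {
        return _value.dup;
    }
}

/// Validates once, then stores a private copy so the caller keeps no alias to it.
ValidatedUTF validateUTF(const(char)[] s)
{
    validate(s); // std.utf.validate throws UTFException on malformed UTF-8
    return ValidatedUTF(s.dup);
}

unittest
{
    auto v = validateUTF("hello");
    auto copy = v.duplicate();
    copy[0] = cast(char) 0xFF;         // do bad things to the copy
    assert(v.duplicate() == "hello");  // the validated original is unchanged

    const(char)[] bad = [cast(char) 0xFF];
    assertThrown!UTFException(validateUTF(bad));
}
```

Note that `private` in D is module-level, so the guarantee only holds against code outside the module that defines the wrapper. The price of the design is one allocation per `duplicate()` call.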

Nov 20 2013

Simen Kjærås <simen.kjaras gmail.com> writes:

On 20.11.2013 19:30, Dmitry Olshansky wrote:
 20-Nov-2013 22:01, Simen Kjærås wrote:
 On 20.11.2013 18:45, Simen Kjærås wrote:

 [snip]
 May I suggest:

 struct Validated(alias fn, T) {
      private T value;
      @property inout
      T get() {
          return value;
      }

 Uh-hm. Add this:
         alias get this;

 And it decays to the naked type in a blink of an eye. And some function
 down the road will do the validation again...

And guess what? That's (often) ok. It's better to do the validation once 
too many than missing it once.

The point (at least in the cases I've used it) is to enforce that only 
validated values are passed to functions that require validated strings, 
not that validated values never be passed to functions that don't really 
care. Doing it like this also lets you call functions that take the 
unadorned type, because that might be just as important.

The result of re-validating is performance loss. The result of missed 
validation is a bug. Also, in just a few lines, you can make a version 
that will *not* decay to the original type:

   struct Validated(alias fn, T) {
       private T _value;
       @property inout
       T value() {
           return _value;
       }
   }

   // validate() is identical to before.

Sure, using it is a bit more verbose than using the unadorned type, 
which is why I chose to make the original version automatically decay. 
This is a judgment where sensible people may disagree, even with 
themselves on a case-by-case basis.

-- 
   Simen

Nov 20 2013

Jacob Carlborg <doob me.com> writes:

On 2013-11-21 01:16, Simen Kjærås wrote:

 The result of re-validating is performance loss. The result of missed
 validation is a bug. Also, in just a few lines, you can make a version
 that will *not* decay to the original type:

    struct Validated(alias fn, T) {
        private T _value;
        @property inout
        T value() {
            return _value;
        }
    }

    // validate() is identical to before.

 Sure, using it is a bit more verbose than using the unadorned type,
 which is why I chose to make the original version automatically decay.
 This is a judgment where sensible people may disagree, even with
 themselves on a case-by-case basis.

It's still accessible via "value".

-- 
/Jacob Carlborg

Nov 20 2013

Simen Kjærås <simen.kjaras gmail.com> writes:

On 2013-11-21 08:38, Jacob Carlborg wrote:
 On 2013-11-21 01:16, Simen Kjærås wrote:

 The result of re-validating is performance loss. The result of missed
 validation is a bug. Also, in just a few lines, you can make a version
 that will *not* decay to the original type:

    struct Validated(alias fn, T) {
        private T _value;
        @property inout
        T value() {
            return _value;
        }
    }

    // validate() is identical to before.

 Sure, using it is a bit more verbose than using the unadorned type,
 which is why I chose to make the original version automatically decay.
 This is a judgment where sensible people may disagree, even with
 themselves on a case-by-case basis.

 It's still accessible via "value".

Indeed it is. If we want to make it perfectly impossible to get at the 
contents, so as to hinder all possible use of the data, I suggest this 
solution:

struct Validated {}

Validated validate() {
     return Validated.init;
}

--
   Simen

Nov 21 2013

"Daniel Davidson" <nospam spam.com> writes:

On Wednesday, 20 November 2013 at 18:30:58 UTC, Dmitry Olshansky 
wrote:
 And it decays to the naked type in a blink of an eye. And some 
 function down the road will do the validation again...

Not if that function down the road only accepted the validated type in 
the first place, because that is what it needed. Follow the rule: if 
you need a validated instance, accept only the validated type and do 
not try to validate again.
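
A rough sketch of that rule, reusing the decaying `Validated` wrapper proposed earlier in the thread. `checkFileName`, `FileName` and `nameLength` are hypothetical names made up for the example; the point is only that the wrapper converts implicitly *to* the raw type but never *from* it, so a function that asks for the validated type cannot be handed an unchecked string.

```d
import std.algorithm.searching : canFind;
import std.exception : assertThrown, enforce;

/// Decaying wrapper along the lines proposed in this thread.
struct Validated(alias fn, T)
{
    private T _value;

    @property inout(T) get() inout
    {
        return _value;
    }

    alias get this; // implicit conversion to T, never from T
}

Validated!(fn, T) validate(alias fn, T)(T value)
{
    fn(value); // fn is expected to throw on invalid input
    return Validated!(fn, T)(value);
}

/// Hypothetical validator, for illustration only.
void checkFileName(string name)
{
    enforce(name.length > 0 && !name.canFind('/'), "not a plain file name");
}

alias FileName = Validated!(checkFileName, string);

/// Needs a validated name, so it accepts only the validated type.
size_t nameLength(FileName name)
{
    return name.length; // decays to string here
}

unittest
{
    auto name = validate!checkFileName("notes.txt");
    assert(nameLength(name) == 9);

    // A raw string is rejected at compile time.
    static assert(!__traits(compiles, nameLength("notes.txt")));

    assertThrown(validate!checkFileName("a/b"));
}
```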

Nov 21 2013

Walter Bright <newshound2 digitalmars.com> writes:

On 11/20/2013 3:16 AM, Jonathan M Davis wrote:
 ValidatedString would then avoid any extra validation when iterating over the
 characters, though I don't know how much of an efficiency gain that would
 actually be given that much of the validation occurs naturally when decoding
 or using stride. It would have the downside that any function which
 specializes on strings would likely have to then specialize on ValidatedString
 as well. So, while I agree with the idea in concept, I'd propose that we
 benchmark the difference in decoding and striding without the checks and see if
 there actually is much difference. Because if there isn't, then I don't think
 that it's worth going to the trouble of adding something like ValidatedString.

Utf validation isn't the only form of validation for strings. You could, for 
example, validate that the string doesn't contain SQL injection code, or 
contains a correctly formatted date, or has a name that is guaranteed to be in 
your employee database, or is a valid phone number, or is a correct email 
address, etc.

Again, validation is not defined by D, it is defined by the constraints YOUR 
PROGRAM puts on it.

Nov 20 2013

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Wednesday, November 20, 2013 16:26:59 Walter Bright wrote:
 On 11/20/2013 3:16 AM, Jonathan M Davis wrote:
 ValidatedString would then avoid any extra validation when iterating over
 the characters, though I don't know how much of an efficiency gain that
 would actually be given that much of the validation occurs naturally when
 decoding or using stride. It would have the downside that any function
 which specializes on strings would likely have to then specialize on
 ValidatedString as well. So, while I agree with the idea in concept, I'd
 propose that we benchmark the difference in decoding and striding without
 the checks and see if there actually is much difference. Because if there
 isn't, then I don't think that it's worth going to the trouble of adding
 something like ValidatedString.

 Utf validation isn't the only form of validation for strings. You could, for
 example, validate that the string doesn't contain SQL injection code, or
 contains a correctly formatted date, or has a name that is guaranteed to be
 in your employee database, or is a valid phone number, or is a correct
 email address, etc.
 
 Again, validation is not defined by D, it is defined by the constraints YOUR
 PROGRAM puts on it.

Yes, but we seemed to be discussing the possibility of having some kind of 
type in Phobos which indicated that the string had been validated for UTF 
correctness. I wouldn't expect other types of string validation to end up in 
Phobos.

And without the type for UTF validation being in Phobos and specialized on in 
Phobos functions, I don't think that I would ever want to use it, because in 
such a case, you lose out on all of the specialization that Phobos does for 
strings and are stuck with a range of dchar, which will force a lot of extra 
decoding even if some of the validation can be skipped, since it was already 
validated, whereas a number of Phobos functions are able to specialize on 
narrow strings and avoid decoding altogether. That performance boost would be 
lost if a string was wrapped in a UTFValidatedString without Phobos 
specializing on UTFValidatedString, and based on how decode and stride work, 
it looks to me like the decoding costs way more than the little bit of extra 
validation that is currently done as part of that such that avoiding the 
decoding is likely to be a much greater performance boost than avoiding those 
checks. And if that is indeed the case, I don't see much point to something 
like UTFValidatedString unless Phobos specializes for it like it specializes 
for narrow strings.

Other types of string validation might very well be worth doing without Phobos 
knowing about them, but having the wrapper type which indicates that that 
validation has been done still needs to be worth more than the performance hit 
of not being able to use naked strings anymore and losing any performance 
gains that come from the functions which specialize for narrow strings. And 
that's probably true for strings that just get passed around but probably 
isn't true for strings that end up being processed by range-based functions a 
lot.

- Jonathan M Davis

Nov 20 2013

Marco Leise <Marco.Leise gmx.de> writes:

On Wed, 20 Nov 2013 16:26:59 -0800,
Walter Bright <newshound2 digitalmars.com> wrote:

 Utf validation isn't the only form of validation for strings. You could, for 
 example, validate that the string doesn't contain SQL injection code, or 
 contains a correctly formatted date, or has a name that is guaranteed to be in 
 your employee database, or is a valid phone number, or is a correct email 
 address, etc.
 
 Again, validation is not defined by D, it is defined by the constraints YOUR 
 PROGRAM puts on it.

A checked type for database access goes a bit beyond the scope
of the proposal. You'd need to encapsulate a transaction that
needs to be working on a snapshot of the database state and
fail if data changed in another transaction.
Otherwise you could validate a name against the database just
before someone else deletes it and thus invalidates the string.

With a DB transaction wrapped in the validation, assignment
between two "validated" strings becomes a pretty sophisticated
runtime action, while the original proposals evolved around
validation functions that can be pure. /This allows us to assign
one validated string type to another with no runtime overhead./

-- 
Marco

Nov 25 2013

Walter Bright <newshound2 digitalmars.com> writes:

On 11/20/2013 2:49 AM, Jacob Carlborg wrote:
 How should we accomplish this? We can't replace:

 void main (string[] args)

 With

 void main (UnsafeString[] args)

 And break every application out there.

Use a different type for the validated string. "Validated" means your program has 
guaranteed that it has a certain form defined by that program.

Nov 20 2013

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Tuesday, November 19, 2013 16:01:00 Andrei Alexandrescu wrote:
 Please chime in with ideas!

In general, I favor using defensive programming in library APIs and using 
enforce to validate the input to functions. Doing so makes it much harder to 
misuse the library and makes it much less likely that programs will run into 
weird and/or undefined behavior or other types of bugs. I then favor using DbC 
within a library or application for its own code and asserting that input is 
valid in those cases, because in that case, the caller is essentially part of 
the same code that's doing the asserting and is maintained by the same people.

The problem with that is of course that there are cases where performance 
degrades when you use defensive programming and always check input - 
especially when the caller can know that the data is valid without having to 
check it first. So, having a way to use an API that doesn't involve it always 
defensively checking its input can be useful for the sake of efficiency.

Unfortunately, I don't think that it scales at all to take the approach that 
Walter has suggested of having the API normally assert on input and provide 
helper functions which the caller can use to validate input when they deem 
appropriate. That has the advantage of giving the caller control over what is 
and isn't checked and avoiding unnecessary checks, but it also makes it much 
easier to misuse the API, and I would expect the average programmer to skip 
the checks in most cases. It very quickly becomes like using error codes 
instead of exceptions, except that in this case, instead of an error code 
being ignored, the data's validity wouldn't have even been checked in the first 
place, resulting in the function being called doing who-knows-what. And the 
resulting bugs could be very obvious, or they could be insidiously hard to 
detect.

So, if we can find a way to default to checking validity and throwing on bad 
input but still provide a way for the caller to avoid the checks when 
appropriate, I think that that would be ideal. That way, we default to 
correctness and user-friendliness (in that the API is harder to silently use 
incorrectly that way), but we still provide a more performant route for those 
who know what they're doing, are willing to take the time to make sure that 
they truly do know how to use the API correctly, and take 
responsibility for ensuring that they don't feed bad input to the API.

Now, how we do that, I don't know. In some cases, creating a wrapper type 
would solve the problem (e.g. some kind of wrapper for strings which 
guaranteed UTF-correctness). But I don't think that it scales to use wrapper 
types for all such situations. One alternative is to essentially duplicate a 
lot of functions with one function validating the input for you and throwing 
on failure, and the other asserting that the input is valid. But that could 
result in a lot of code duplication, which isn't terribly desirable either.

The assumeSorted or FracSec.assumeValid solutions seem to go either with a 
wrapper type or with essentially being a second function which does the same 
thing but without the validation depending on the types involved and what the 
function is doing.

Another alternative would be to provide an argument (probably a template 
argument, though it could be a function argument if that makes more sense) 
which told the function whether it should assert or enforce on its input. That 
would at least localize the code duplication, but again, that could get a bit 
verbose, and I do like how assumeXYZ makes it abundantly clear that the caller 
is taking responsibility for the correctness in that case.
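
As a minimal sketch of the template-argument variant, assuming a hypothetical `Check` enum and a toy `checkedSqrt` function (neither exists in Phobos):

```d
import std.exception : assertThrown, enforce;
import std.math : sqrt;

/// Hypothetical policy switch; not an existing Phobos type.
enum Check { validate, assume }

/// Square root that either throws on bad input or merely asserts.
double checkedSqrt(Check check = Check.validate)(double x)
{
    static if (check == Check.validate)
        enforce(x >= 0, "checkedSqrt: negative input"); // always checked
    else
        assert(x >= 0, "checkedSqrt: negative input");  // compiled out with -release

    return sqrt(x);
}

unittest
{
    assert(checkedSqrt(4.0) == 2.0);                 // default: validated
    assertThrown(checkedSqrt(-1.0));                 // bad input throws
    assert(checkedSqrt!(Check.assume)(9.0) == 3.0);  // caller takes responsibility
}
```

Because the function is a template, the `assert` branch is compiled with the caller's flags rather than those used to build the library, which sidesteps part of the release-build problem for this particular shape of API.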

And in some situations, I think that it would clearly be the case that it 
wouldn't make any sense to do anything else other than enforce on the input 
(e.g. string parsing functions have a tendency to have to do almost the same 
work in the validation function as the actual parsing function, making it 
almost pointless to have a separate validation function).

So, I think that what we end up doing is definitely going to depend on what the 
code in question is for and what it's doing, but I agree that it would be 
valuable to come up with some common idioms for handling validation and error 
checking, and assumeXYZ would be one such idiom and one which documents things 
nicely when it can be used.

Still, the most important point that I'd like to make is that I think we 
should lean towards validating input with enforce by default and then provide 
alternative means to avoid that validation rather than using assertions and 
DbC by default, because leaving the validation up to the caller in release and 
asserting in debug is going to lead to _far_ more bugs in code using Phobos, 
particularly when the result isn't immediately and obviously wrong when bad 
input is given. And the fact that by default, the assertions in Phobos won't 
be hit in calling code unless the Phobos function is templatized (because 
Phobos will have been compiled in release) makes using assertions that much 
worse.

But I'll definitely have to think about idioms that we could use to do separate 
validation where appropriate and yet validate arguments via enforce by 
default.

- Jonathan M Davis

Nov 20 2013

Jacob Carlborg <doob me.com> writes:

On 2013-11-20 11:38, Jonathan M Davis wrote:

 Unfortunately, I don't think that it scales at all to take the approach that
 Walter has suggested of having the API normally assert on input and provide
 helper functions which the caller can use to validate input when they deem
 appropriate. That has the advantage of giving the caller control over what is
 and isn't checked and avoiding unnecessary checks, but it also makes it much
 easier to misuse the API, and I would expect the average programmer to skip
 the checks in most cases. It very quickly becomes like using error codes
 instead of exceptions, except that in this case, instead of an error code
 being ignored, the data's validity wouldn't have even been checked in the first
 place, resulting in the function being called doing who-knows-what. And the
 resulting bugs could be very obvious, or they could be insidiously hard to
 detect.

I think Walter's suggestion requires the use of asserts:

bool isValid (Data data);

void process (Data data)
{
     assert(isValid(data));
     // process
}

The asserts should be on by default and removed in release builds. This 
would require DMD shipping two versions of Phobos, one with asserts 
enabled and one where they're disabled. Then only when the -release flag 
is used will the version of Phobos with disabled asserts be used.

 Still, the most important point that I'd like to make is that I think we
 should lean towards validating input with enforce by default and then provide
 alternative means to avoid that validation rather than using assertions and
 DbC by default, because leaving the validation up to the caller in release and
 asserting in debug is going to lead to _far_ more bugs in code using Phobos,
 particularly when the result isn't immediately and obviously wrong when bad
 input is given. And the fact that by default, the assertions in Phobos won't
 be hit in calling code unless the Phobos function is templatized (because
 Phobos will have been compiled in release) makes using assertions that much
 worse.

DMD needs to ship with two versions of Phobos, one with assertions on and 
one with them disabled.

-- 
/Jacob Carlborg

Nov 20 2013

Timon Gehr <timon.gehr gmx.ch> writes:

On 11/20/2013 12:57 PM, Jacob Carlborg wrote:
 bool isValid (Data data);

 void process (Data data)
 {
      assert(isValid(data));
      // process
 }


void process(Data data)in{ assert(isValid(data)); }body{
     // process
}
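
For readers on later compiler releases: the `body` keyword used above was subsequently replaced by `do`, and a shorter expression-based contract form was added. A sketch of the same precondition in that newer form, with a hypothetical `Data` type and `isValid` check standing in for the real ones:

```d
/// Hypothetical payload type for illustration.
struct Data
{
    int[] items;
}

/// Hypothetical validity check: data must hold at least one item.
bool isValid(const Data data)
{
    return data.items.length > 0;
}

/// Expression-based `in` contract; checked unless compiled with -release.
int process(Data data)
in (isValid(data), "process: data must not be empty")
{
    return data.items[0];
}

unittest
{
    assert(process(Data([7, 8])) == 7);
    assert(!isValid(Data(null)));
}
```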

Nov 20 2013

Jacob Carlborg <doob me.com> writes:

On 2013-11-20 14:01, Timon Gehr wrote:

 void process(Data data)in{ assert(isValid(data)); }body{
      // process
 }

Right, forgot about contracts.

-- 
/Jacob Carlborg

Nov 20 2013

"Lars T. Kyllingstad" <public kyllingen.net> writes:

On Wednesday, 20 November 2013 at 00:01:00 UTC, Andrei 
Alexandrescu wrote:
 (c) A variety of text functions currently suffer because we 
 don't make the difference between validated UTF strings and 
 potentially invalid ones.

I think it is fair to always assume that a char[] is a valid 
UTF-8 string, and instead perform the validation when 
creating/filling the string from a non-validated source.

Take std.file.read() as an example; it returns void[], but has a 
validating counterpart in std.file.readText().

I think we should use ubyte[] to a greater extent for data which 
is potentially *not* valid UTF.  Examples include interfacing 
with C functions, where I think there is a tendency towards 
always translating C char to D char, when they are in fact not 
equivalent.  Another example is, again, std.file.read(), which 
currently returns void[].  I guess it is a matter of taste, but I 
think ubyte[] would be more appropriate here, since you can 
actually use it for something without casting it first.

The transition from string to ubyte[] is already made simple by 
std.string.representation.  We should offer an equally simple and 
convenient way to do the opposite transformation.  In one of my 
current projects, I am using this function:

   inout(char)[] asString(inout(ubyte)[] data) @safe pure
   {
     auto s = cast(typeof(return)) data;
     import std.utf: validate;
     validate(s);
     return s;
   }

This could easily be written as a template, to accept wider 
encodings as well, and I think it would be a nice addition to 
Phobos.
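
One possible shape for such a template, as a sketch only. The `CharFor` helper is a hypothetical name introduced here to map a code-unit integer type to the matching character type; `std.utf.validate` is the only library call involved.

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

/// Hypothetical mapping from a code-unit integer type to a character type.
template CharFor(Unit)
{
    static if (is(Unit == ubyte)) alias CharFor = char;
    else static if (is(Unit == ushort)) alias CharFor = wchar;
    else static if (is(Unit == uint)) alias CharFor = dchar;
    else static assert(0, "no character type for " ~ Unit.stringof);
}

/// Reinterprets raw code units as text, throwing UTFException if malformed.
inout(CharFor!Unit)[] asString(Unit)(inout(Unit)[] data)
{
    auto s = cast(typeof(return)) data; // same element size, so no copy
    const(CharFor!Unit)[] view = s;
    validate(view);
    return s;
}

unittest
{
    immutable(ubyte)[] bytes = [0x68, 0x69];
    string s = asString(bytes);
    assert(s == "hi");

    ushort[] units = [0x68, 0x69];
    wchar[] w = asString(units);
    assert(w == "hi"w);

    immutable(ubyte)[] bad = [0xFF];
    assertThrown!UTFException(asString(bad));
}
```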

Lars

Nov 20 2013

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Wednesday, November 20, 2013 11:45:57 Lars T. Kyllingstad wrote:
 On Wednesday, 20 November 2013 at 00:01:00 UTC, Andrei
 
 Alexandrescu wrote:
 (c) A variety of text functions currently suffer because we
 don't make the difference between validated UTF strings and
 potentially invalid ones.

 
 I think it is fair to always assume that a char[] is a valid
 UTF-8 string, and instead perform the validation when
 creating/filling the string from a non-validated source.

That doesn't work when strings are being created via concatenation and the 
like inside the program rather than simply coming from outside the program.

 Take std.file.read() as an example; it returns void[], but has a
 validating counterpart in std.file.readText().
 
 I think we should use ubyte[] to a greater extent for data which
 is potentially *not* valid UTF.

Well, we've already discussed the possibility of using ubyte[] to indicate 
ASCII strings, and that makes a lot more sense IMHO, because then no decoding 
occurs (which is precisely what you want for ASCII), whereas with a string 
that's potentially invalid UTF, it's not that we don't want to decode it. It's 
just that we need to validate it when decoding it.

So, I'd argue that ubyte[] should be used when you want to operate on code 
units instead of code points, not as something that has anything to do with 
validating code points.
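
To make the code unit versus code point distinction concrete, here is a small sketch. The helper names `codePoints` and `codeUnits` are hypothetical; `walkLength`, `representation` and `byCodeUnit` are taken from current Phobos, and `byCodeUnit` may not have been available when this thread was written.

```d
import std.range : walkLength;
import std.string : representation;
import std.utf : byCodeUnit;

/// Counts code points; iterating a string as a range decodes it.
size_t codePoints(string s)
{
    return s.walkLength;
}

/// Counts UTF-8 code units without decoding.
size_t codeUnits(string s)
{
    return s.byCodeUnit.walkLength;
}

unittest
{
    string s = "\u00E6\u00F8\u00E5"; // "æøå", two UTF-8 code units each
    assert(codePoints(s) == 3);
    assert(codeUnits(s) == 6);

    immutable(ubyte)[] raw = s.representation; // same memory, viewed as bytes
    assert(raw.length == 6);
    assert(raw[0] == 0xC3);
}
```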

- Jonathan M Davis

Nov 20 2013

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

20-Nov-2013 14:45, Lars T. Kyllingstad wrote:
 On Wednesday, 20 November 2013 at 00:01:00 UTC, Andrei Alexandrescu wrote:
 (c) A variety of text functions currently suffer because we don't make
 the difference between validated UTF strings and potentially invalid
 ones.

 I think it is fair to always assume that a char[] is a valid UTF-8
 string, and instead perform the validation when creating/filling the
 string from a non-validated source.

 Take std.file.read() as an example; it returns void[], but has a
 validating counterpart in std.file.readText().

Sadly it's horrifically slow to do so. Above all practicality must take 
precedence. Would you like to validate the whole file just to later 
re-scan it anew to, say, tokenize a source file?

 I think we should use ubyte[] to a greater extent for data which is
 potentially *not* valid UTF.  Examples include interfacing with C
 functions, where I think there is a tendency towards always translating
 C char to D char, when they are in fact not equivalent.  Another example
 is, again, std.file.read(), which currently returns void[].  I guess it
 is a matter of taste, but I think ubyte[] would be more appropriate
 here, since you can actually use it for something without casting it first.

Otherwise I think it's a good idea to encode high-level invariants in 
types. The only problem is inadvertent template bloat then.

[snip]

-- 
Dmitry Olshansky

Nov 20 2013

Lionello Lunesu <lionello lunesu.remove.com> writes:

On 11/20/13, 18:45, Lars T. Kyllingstad wrote:
 I think we should use ubyte[] to a greater extent for data which is
 potentially *not* valid UTF.  Examples include interfacing with C
 functions, where I think there is a tendency towards always translating
 C char to D char, when they are in fact not equivalent.  Another example
 is, again, std.file.read(), which currently returns void[].  I guess it
 is a matter of taste, but I think ubyte[] would be more appropriate
 here, since you can actually use it for something without casting it first.

+1

Especially the Windows APIs: they never take UTF-8(*) but consistently 
get translated to taking D char :(

In fact, if we want a good translation from C to D, we should be using D 
byte. On most platforms I've run into, C char is signed. (To be 
honest, you don't see 'byte' much in D code, so it would make the ported 
code stand out even more.)

* except for MultiByteToWideChar

Nov 26 2013

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Wednesday, November 20, 2013 08:51:16 Joseph Rushton Wakeling wrote:
 On 20/11/13 01:01, Andrei Alexandrescu wrote:
 There's been recent discussion herein about what parameter validation
 method would be best for Phobos to adhere to.
 
 Currently we are using a mix of approaches:
 
 1. Some functions enforce()
 
 2. Some functions just assert()
 
 3. Some (fewer I think) functions assert(0)
 
 4. Some functions don't do explicit checking, relying instead on
 lower-level enforcement such as null dereference and bounds checking to
 ensure safety.
 
 Each method has its place. The question is what guidelines we put forward
 for Phobos code to follow; we're a bit loose about that right now.

 
 Regarding enforce() vs. assert(), a good rule that I remember having
 suggested to me was that enforce() should be used for actual runtime
 checking (e.g. checking that the input to a public API function has correct
 properties), assert() should be used to test logical failures (i.e.
 checking that cases which should never arise, really don't arise).
 
 I've always followed that as a rule of thumb ever since.

When an assertion fails, it's a bug in your code. Assertions should _never_ be 
used for validating user input. So, if your function is asserting on the state 
of its input, then it is requiring that the caller give input which follows 
that contract, and it's a bug in the caller when they violate that contract by 
passing in bad input.

When your function uses enforce to validate its input, it is _not_ considered 
a bug when bad input is given. It _could_ be a bug in the caller, but they are 
not required to give valid input. When they give invalid input, they then get 
to react to the exception that was thrown and handle the error appropriately. 
This works both when the input came from outside the program (e.g. a user or a 
file) and when it doesn't make sense for the caller to have validated 
the input before calling the function (e.g. because the validator function and 
the function doing the work end up having to do almost the same work, making it 
cheaper to just have the function validate its input and not have a separate 
validator function). It also makes it so that the function will _never_ have 
to operate on invalid input as invalid input will always be checked and 
rejected, which then makes it much harder to use the function incorrectly.

But ultimately, whether you use assertions or exceptions comes down to whether 
it's considered to always be a bug in the caller if the input is bad. DbC uses 
assertions and considers it a bug in the caller (since they violated their 
part of the contract), whereas defensive programming has the function protect 
itself and always check and throw on invalid input rather than assuming that 
the caller is going to provide valid input.
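
A minimal sketch of the two styles side by side. parsePercent and fraction are hypothetical functions invented for this illustration; enforce, assertThrown and to are existing Phobos symbols.

```d
import std.conv : to;
import std.exception : assertThrown, enforce;

// Defensive style: bad input is not a bug. The check stays in release
// builds and throws an Exception the caller can catch and handle.
int parsePercent(string text)
{
    auto value = text.to!int; // throws ConvException on non-numeric text
    enforce(value >= 0 && value <= 100, "percentage out of range");
    return value;
}

// Contract style: the caller must pass 0 .. 100. A violation is a bug in
// the caller, and the check disappears when compiled with -release.
double fraction(int percent)
{
    assert(percent >= 0 && percent <= 100, "caller passed a bad percentage");
    return percent / 100.0;
}

void main()
{
    assert(parsePercent("42") == 42);
    assertThrown(parsePercent("142")); // recoverable: an Exception is thrown
    assert(fraction(parsePercent("50")) == 0.5);
}
```

Here the untrusted text goes through the enforce-based function first, so by the time fraction is called its assertion only guards against programming errors.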

- Jonathan M Davis

Nov 20 2013
