digitalmars.D - Checking function parameters in Phobos

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
There's been recent discussion herein about what parameter validation 
method would be best for Phobos to adhere to.

Currently we are using a mix of approaches:

1. Some functions enforce()

2. Some functions just assert()

3. Some (fewer I think) functions assert(0)

4. Some functions don't do explicit checking, relying instead on 
lower-level enforcement such as null dereference and bounds checking to 
ensure safety.

Each method has its place. The question is what guidelines we put 
forward for Phobos code to follow; we're a bit loose about that right now.
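
For readers comparing the four styles listed above, here is a minimal sketch with one hypothetical function per style. None of these functions come from Phobos; the names and checks are assumptions made purely for illustration.

```d
import std.exception : enforce;

// 1. enforce(): throws an Exception on failure; the check stays in -release builds.
int parsePort(int value)
{
    enforce(value > 0 && value < 65_536, "port out of range");
    return value;
}

// 2. assert(): written here as an in-contract, which -release compiles out.
int half(int even)
in { assert(even % 2 == 0, "argument must be even"); }
do
{
    return even / 2;
}

// 3. assert(0): marks a point that must never be reached.
string sign(int x)
{
    if (x < 0) return "negative";
    if (x == 0) return "zero";
    if (x > 0) return "positive";
    assert(0, "unreachable");
}

// 4. No explicit check: array bounds checking catches an empty argument.
int first(const int[] data)
{
    return data[0];
}
```

The practical difference is who pays for the check and when: style 1 always costs something, style 2 costs nothing in release builds, and style 4 delegates to whatever the language already verifies.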

A second, just as interesting topic, is how to design abstractions for 
speed and safety. There are cases in which spurious checking is 
prohibitively expensive, if not outright unnecessary, so it should be 
avoided where possible. Examples:

(a) FracSecs(long x) validates x to be within range. The cost of the 
validation itself is about as high as the payload itself (which is one 
assignment).

(b) sort() offers a SortedRange with its goodies. We also have 
assumeSorted that also offers a SortedRange, but relies on the user to 
validate that assumption.

(c) A variety of text functions currently suffer because we don't 
distinguish between validated UTF strings and potentially invalid ones.

Walter and I are thinking of fostering the idiom in which types (or 
attributes?) are used as information about validation, similar to how 
assumeSorted works. Building on that, we'd have a function like "static 
FracSecs assumeValid(long)" inside FracSecs (no need for a different 
type here). Then, we'd have a CleanUTF type or something that would 
guarantee the string stored within has been validated.
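
A rough sketch of the assumeValid idiom described above, using a hypothetical stand-in type rather than the real FracSecs. The type name, the range bound, and the accessor are all assumptions for illustration only.

```d
import std.exception : enforce;

// Hypothetical stand-in for a FracSecs-like type; the bound is illustrative.
struct Fraction
{
    private long value;

    // Checked path: validates the argument on every construction.
    this(long v)
    {
        enforce(v > -10_000_000 && v < 10_000_000, "value out of range");
        value = v;
    }

    // Unchecked path: the caller vouches for the argument, so the
    // constructor (and its check) is bypassed.
    static Fraction assumeValid(long v)
    {
        Fraction result;      // default-initialized, no constructor call
        result.value = v;
        return result;
    }

    long get() const { return value; }
}
```

Both paths produce the same type, so downstream code does not need to know which one was taken; only the call site documents whether the value was checked or assumed.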


Please chime in with ideas!

Andrei
Nov 19 2013
next sibling parent "growler" <growlercab gmail.com> writes:
On Wednesday, 20 November 2013 at 00:01:00 UTC, Andrei 
Alexandrescu wrote:
 There's been recent discussion herein about what parameter 
 validation method would be best for Phobos to adhere to.

 Currently we are using a mix of approaches:

 1. Some functions enforce()

 2. Some functions just assert()

 3. Some (fewer I think) functions assert(0)

 4. Some functions don't do explicit checking, relying instead 
 on lower-level enforcement such as null dereference and bounds 
 checking to ensure safety.

 Each method has its place. The question is what guidelines we 
 put forward for Phobos code to follow; we're a bit loose about 
 that right now.

 A second, just as interesting topic, is how to design 
 abstractions for speed and safety. There are cases in which 
 spurious checking is prohibitively expensive if not necessary, 
 so it should be avoided where necessary. Examples:

 (a) FracSecs(long x) validates x to be within range. The cost 
 of the validation itself is about as high as the payload itself 
 (which is one assignment).

 (b) sort() offers a SortedRange with its goodies. We also have 
 assumeSorted that also offers a SortedRange, but relies on the 
 user to validate that assumption.

 (c) A variety of text functions currently suffer because we 
 don't make the difference between validated UTF strings and 
 potentially invalid ones.

 Walter and I are thinking of fostering the idiom in which types 
 (or attributes?) are used as information about validation, 
 similar to how assumeSorted works. Building on that, we'd have 
 a function like "static FracSecs assumeValid(long)" inside 
 FracSecs (no need for a different type here). Then, we'd have a 
 CleanUTF type or something that would guarantee the string 
 stored within has been validated.


 Please chime in with ideas!

 Andrei
I'm not a Phobos dev, but as a user of Phobos coming from C/C++ I'd like to see less enforce and more debug-only contracts in the std lib, with opt-in run-time checks for release builds. That way I can decide, on a function-by-function basis or globally at compile time, whether the run-time checks occur in release builds.

For example, given:

1. FracSecs(long x)
2. FracSecs!Args.verify(long x)

In debug builds, 1. would always have full run-time checking enabled. In release builds, 1. would only have essential run-time checks, preferably none. I can then opt in to run-time checks in release builds using 2.

There would also be a version(ArgsVerify) so I can turn on run-time checks globally at compile time in release builds (maybe the --debug flag allows this already, not sure).

Of course this unfortunately requires even more work from Phobos devs, and I'm not a D expert so I don't know how viable it would be.

Whatever is decided, I'm looking forward to seeing what you guys come up with, because I'm currently using Phobos as my "Idiomatic D" reference guide.

Thanks
G.
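
One way the global opt-in suggested above might look, as a sketch only. The Ticks type is hypothetical, and ArgsVerify is simply the version identifier named in the post; Phobos does not define it.

```d
// Hypothetical type; build with -version=ArgsVerify to keep the check in release.
struct Ticks
{
    long value;

    this(long x)
    {
        version (ArgsVerify)
        {
            // Opt-in path: a real run-time check that survives -release.
            if (x < 0) throw new Exception("negative tick count");
        }
        else
        {
            // Default path: debug-only check, compiled out by -release.
            assert(x >= 0, "negative tick count");
        }
        value = x;
    }
}
```

This only covers the "globally at compile time" half of the request; a per-call opt-in would still need a separate entry point along the lines of the verify function in the post.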
Nov 19 2013
prev sibling next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 There's been recent discussion herein about what parameter 
 validation method would be best for Phobos to adhere to.

 Currently we are using a mix of approaches:

 1. Some functions enforce()

 2. Some functions just assert()

 3. Some (fewer I think) functions assert(0)

 4. Some functions don't do explicit checking, relying instead 
 on lower-level enforcement such as null dereference and bounds 
 checking to ensure safety.

 Each method has its place. The question is what guidelines we 
 put forward for Phobos code to follow; we're a bit loose about 
 that right now.
I think Phobos should rely much more on Contract Programming based on asserts. This could mean Dmd automatically using a Phobos compiled with asserts when you compile your D code normally, and automatically using an assert-stripped version of the Phobos libs when you compile with -release and similar. In other situations enforce and exceptions are still useful.
 (b) sort() offers a SortedRange with its goodies. We also have 
 assumeSorted that also offers a SortedRange, but relies on the 
 user to validate that assumption.
I'd like another function, which could be named validateSorted(), that returns a SortedRange and always fully verifies that its range argument is actually sorted, throwing an exception otherwise. So it doesn't assume its input is sorted. It's like isSorted + assumeSorted.
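
A sketch of what such a helper could look like, built from the existing isSorted and assumeSorted. The name validateSorted is the one proposed in the post; it is not a Phobos function, and the error message is an assumption.

```d
import std.algorithm : isSorted;
import std.exception : enforce;
import std.range : assumeSorted;

// Hypothetical helper: verifies in O(n), then wraps the range as a SortedRange.
auto validateSorted(alias pred = "a < b", R)(R range)
{
    enforce(isSorted!pred(range), "range is not sorted");
    return assumeSorted!pred(range);
}
```

Usage would mirror assumeSorted, for example `auto s = validateSorted(data);` followed by `s.contains(x)`, with the difference that unsorted input throws instead of silently producing wrong answers.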
 (c) A variety of text functions currently suffer because we 
 don't make the difference between validated UTF strings and 
 potentially invalid ones.
Often I have genomic data or other text data that is surely ASCII (and I can accept a run-time exception at loading time if it's not ASCII). Once such text is in memory I'd like to not pay for UTF on it. Sometimes you can do this with std.string.representation, but there is no opposite function (http://d.puremagic.com/issues/show_bug.cgi?id=10162 ).

Also in Phobos there are several string/char functions that could be made faster if the input is assumed to be ASCII. To solve this problem, languages such as Haskell usually introduce a new type like AsciiString. In the past I have suggested introducing such a string wrapper in Phobos.
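
To make the idea concrete, here is a small sketch of a validate-once ASCII wrapper. AsciiText is a hypothetical name chosen to avoid implying any existing Phobos type, and its interface is an assumption.

```d
import std.algorithm : all;
import std.exception : enforce;
import std.string : representation;

// Hypothetical wrapper: checks for ASCII once at load time, then serves
// raw bytes with no UTF decoding on access.
struct AsciiText
{
    private immutable(ubyte)[] bytes;

    static AsciiText fromString(string s)
    {
        auto raw = s.representation;          // view as immutable(ubyte)[]
        enforce(raw.all!(b => b < 0x80), "input contains a non-ASCII byte");
        return AsciiText(raw);
    }

    size_t length() const { return bytes.length; }

    // Indexing yields a char directly; every stored byte is known to be ASCII.
    char opIndex(size_t i) const { return cast(char) bytes[i]; }
}
```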
 Then, we'd have a CleanUTF type or something that would 
 guarantee the string stored within has been validated.
In recent talks Bjarne Stroustrup has been advocating such usage of types for safety in C++11/C++14 a lot, and functional programmers have used it often for a long time. OCaml programmers use this style of coding to write "safer" code all the time. Too many types make the code harder (also because D doesn't have de-structuring syntax in function signatures and so on), but a few strategically designed structs can help.

Bye,
bearophile
Nov 19 2013
next sibling parent "Brad Anderson" <eco gnuk.net> writes:
On Wednesday, 20 November 2013 at 00:48:40 UTC, bearophile wrote:
 [snip]
 Often I have genomic data or other text data that is surely 
 ASCII (and I can accept a run-time exception at loading time if 
 it's not ASCII). Once such text is in memory I'd like to not 
 pay for UTF on it. Sometimes you can do this with 
 std.string.representation, but there is no opposite function 
 (http://d.puremagic.com/issues/show_bug.cgi?id=10162 ). Also in 
 Phobos there are several string/char functions that could be 
 made faster if the input is assumed to be ASCII. To solve this 
 problem in languages as Haskell they usually introduce a new 
 type like AsciiString. In past I have suggested to introduce 
 such string wrapper in Phobos.
Is that not what Phobos' AsciiString is?
Nov 19 2013
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 11/19/2013 4:48 PM, bearophile wrote:
 Also in Phobos there are
 several string/char functions that could be made faster if the input is assumed
 to be ASCII.
Which ones? The ones I coded up originally were designed so they weren't degraded by utf.
Nov 20 2013
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2013-11-20 01:01, Andrei Alexandrescu wrote:
 There's been recent discussion herein about what parameter validation
 method would be best for Phobos to adhere to.

 Currently we are using a mix of approaches:

 1. Some functions enforce()

 2. Some functions just assert()

 3. Some (fewer I think) functions assert(0)

 4. Some functions don't do explicit checking, relying instead on
 lower-level enforcement such as null dereference and bounds checking to
 ensure safety.

 Each method has its place. The question is what guidelines we put
 forward for Phobos code to follow; we're a bit loose about that right now.

 A second, just as interesting topic, is how to design abstractions for
 speed and safety. There are cases in which spurious checking is
 prohibitively expensive if not necessary, so it should be avoided where
 necessary. Examples:

 (a) FracSecs(long x) validates x to be within range. The cost of the
 validation itself is about as high as the payload itself (which is one
 assignment).

 (b) sort() offers a SortedRange with its goodies. We also have
 assumeSorted that also offers a SortedRange, but relies on the user to
 validate that assumption.

 (c) A variety of text functions currently suffer because we don't make
 the difference between validated UTF strings and potentially invalid ones.

 Walter and I are thinking of fostering the idiom in which types (or
 attributes?) are used as information about validation, similar to how
 assumeSorted works. Building on that, we'd have a function like "static
 FracSecs assumeValid(long)" inside FracSecs (no need for a different
 type here). Then, we'd have a CleanUTF type or something that would
 guarantee the string stored within has been validated.
Would we accompany the assumeSorted with an assert in the function assuming something is sorted? We probably don't want to rely on convention.

What about distributing a version of druntime and Phobos with asserts enabled that is used by default (or with the -debug flag)? Then a version with asserts disabled is used when the -release flag is used.

We probably also want it to be possible to use Phobos with asserts enabled even in release mode.

-- 
/Jacob Carlborg
Nov 19 2013
next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Am Wed, 20 Nov 2013 08:49:28 +0100
schrieb Jacob Carlborg <doob me.com>:

 What about distributing a version of druntime and Phobos with asserts 
 enabled that is used by default (or with the -debug flag). Then a 
 version with asserts disabled is used when the -release flag is used.
 
 We probably also want it to be possible to use Phobos with asserts 
 enabled even in release mode.
That is what LDC does and with the -defaultlib switch it is easy to use the debug Phobos in release builds. Currently this flag is mostly used to link against the shared phobos2.so. -- Marco
Nov 20 2013
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 11/20/2013 08:49 AM, Jacob Carlborg wrote:
 Would we accompany the assumeSorted with an assert in the function
 assuming something is sorted? We probably don't want to rely on convention.
We do in any case:

import std.algorithm, std.range;

void main(){
    auto a = [1,2,3,4,5];
    auto s = sort(a);
    swap(a[0],a[$-1]);
    assert(is(typeof(s)==SortedRange!(int[])) && !s.isSorted());
}
Nov 20 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-11-20 13:56, Timon Gehr wrote:

 We do in any case:

 import std.algorithm, std.range;

 void main(){
      auto a = [1,2,3,4,5];
      auto s = sort(a);
      swap(a[0],a[$-1]);
      assert(is(typeof(s)==SortedRange!(int[])) && !s.isSorted());
 }
I don't understand what this is supposed to show. That the type is "SortedRange" but it's actually not sorted? -- /Jacob Carlborg
Nov 20 2013
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 11/20/2013 02:52 PM, Jacob Carlborg wrote:
 On 2013-11-20 13:56, Timon Gehr wrote:

 We do in any case:

 import std.algorithm, std.range;

 void main(){
      auto a = [1,2,3,4,5];
      auto s = sort(a);
      swap(a[0],a[$-1]);
      assert(is(typeof(s)==SortedRange!(int[])) && !s.isSorted());
 }
I don't understand what this is supposed to show. That the type is "SortedRange" but it's actually not sorted?
Yes, hence SortedRange being sorted is just a convention in any case.
Nov 20 2013
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/20/13 6:14 AM, Timon Gehr wrote:
 Yes, hence SortedRange being sorted is just a convention in any case.
That's right. In particular we can't have assumeSorted check for isSorted even at the point of creation, and even with debug-only asserts. This is because checking would change the complexity of binary search and related algorithms, which is often prohibitive. Andrei
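
The complexity argument can be illustrated with a short sketch. The two function names are hypothetical; only isSorted, assumeSorted, and contains are Phobos calls.

```d
import std.algorithm : isSorted;
import std.range : assumeSorted;

// Wrapping costs O(1) and the search O(log n): the caller vouches for order.
bool fastLookup(int[] data, int needle)
{
    return assumeSorted(data).contains(needle);
}

// Validating first costs O(n), which dominates the O(log n) search.
bool checkedLookup(int[] data, int needle)
{
    return isSorted(data) && assumeSorted(data).contains(needle);
}
```

Calling checkedLookup in a loop over the same data repeats the linear scan every time, which is the cost being ruled out for assumeSorted.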
Nov 20 2013
prev sibling parent reply "Meta" <jared771 gmail.com> writes:
On Wednesday, 20 November 2013 at 14:14:28 UTC, Timon Gehr wrote:
 On 11/20/2013 02:52 PM, Jacob Carlborg wrote:
 On 2013-11-20 13:56, Timon Gehr wrote:

 We do in any case:

 import std.algorithm, std.range;

 void main(){
     auto a = [1,2,3,4,5];
     auto s = sort(a);
     swap(a[0],a[$-1]);
     assert(is(typeof(s)==SortedRange!(int[])) && 
 !s.isSorted());
 }
I don't understand what this is supposed to show. That the type is "SortedRange" but it's actually not sorted?
Yes, hence SortedRange being sorted is just a convention in any case.
Couldn't we have an overload of each of the mutating functions in std.algorithm that takes a SortedRange and does static assert(0, "Cannot modify a sorted range")? I suppose there are cases where we *want* to mutate a sorted range... Unwrap the inner type, maybe?
Nov 20 2013
next sibling parent "Meta" <jared771 gmail.com> writes:
On Wednesday, 20 November 2013 at 17:56:22 UTC, Meta wrote:
 On Wednesday, 20 November 2013 at 14:14:28 UTC, Timon Gehr 
 wrote:
 On 11/20/2013 02:52 PM, Jacob Carlborg wrote:
 On 2013-11-20 13:56, Timon Gehr wrote:

 We do in any case:

 import std.algorithm, std.range;

 void main(){
    auto a = [1,2,3,4,5];
    auto s = sort(a);
    swap(a[0],a[$-1]);
    assert(is(typeof(s)==SortedRange!(int[])) && 
 !s.isSorted());
 }
I don't understand what this is supposed to show. That the type is "SortedRange" but it's actually not sorted?
Yes, hence SortedRange being sorted is just a convention in any case.
Couldn't we have an overload of each of the mutating functions in std.algorithm that takes a SortedRange and does static assert(0, "Cannot modify a sorted range")? I suppose there are cases where we *want* to mutate a sorted range... Unwrap the inner type, maybe?
That is, a mutating function that takes a sorted range strips the SortedRange wrapper and returns the underlying type.
Nov 20 2013
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/20/13 9:56 AM, Meta wrote:
 On Wednesday, 20 November 2013 at 14:14:28 UTC, Timon Gehr wrote:
 On 11/20/2013 02:52 PM, Jacob Carlborg wrote:
 On 2013-11-20 13:56, Timon Gehr wrote:

 We do in any case:

 import std.algorithm, std.range;

 void main(){
     auto a = [1,2,3,4,5];
     auto s = sort(a);
     swap(a[0],a[$-1]);
     assert(is(typeof(s)==SortedRange!(int[])) && !s.isSorted());
 }
I don't understand what this is supposed to show. That the type is "SortedRange" but it's actually not sorted?
Yes, hence SortedRange being sorted is just a convention in any case.
Couldn't we have an overload of each of the mutating functions in std.algorithm that takes a SortedRange and does static assert(0, "Cannot modify a sorted range")? I suppose there are cases where we *want* to mutate a sorted range... Unwrap the inner type, maybe?
That wouldn't help much - people have access to the underlying range anyway. Andrei
Nov 20 2013
parent "Meta" <jared771 gmail.com> writes:
On Wednesday, 20 November 2013 at 20:06:47 UTC, Andrei 
Alexandrescu wrote:
 That wouldn't help much - people have access to the underlying 
 range anyway.

 Andrei
You're right, I forgot about that. However, people generally won't be modifying a SortedRange in place, will they? Even if they do, it'll probably be using one of the mutating functions in std.algorithm. Also, somewhat related, couldn't std.algorithm.sort simply return the passed-in range if that range is already wrapped with SortedRange?
Nov 20 2013
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 20/11/13 01:01, Andrei Alexandrescu wrote:
 There's been recent discussion herein about what parameter validation method
 would be best for Phobos to adhere to.

 Currently we are using a mix of approaches:

 1. Some functions enforce()

 2. Some functions just assert()

 3. Some (fewer I think) functions assert(0)

 4. Some functions don't do explicit checking, relying instead on lower-level
 enforcement such as null dereference and bounds checking to ensure safety.

 Each method has its place. The question is what guidelines we put forward for
 Phobos code to follow; we're a bit loose about that right now.
Regarding enforce() vs. assert(), a good rule that I remember having suggested to me was that enforce() should be used for actual runtime checking (e.g. checking that the input to a public API function has correct properties), assert() should be used to test logical failures (i.e. checking that cases which should never arise, really don't arise). I've always followed that as a rule of thumb ever since.
Nov 19 2013
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/19/2013 4:01 PM, Andrei Alexandrescu wrote:
 There's been recent discussion herein about what parameter validation method
 would be best for Phobos to adhere to.
It is important to decide what the notions of "validated data" and "untrusted data" are.

1. Validated data should get asserts if it is found to be invalid.

2. Untrusted data should get exceptions thrown if it is found to be invalid (or return errors).

For example, consider a utf string. If it has passed a validation check, then it becomes trusted data. Further processing on it should assert if it turns out to be invalid (because then you've got a programming bug).

File open failures should always throw, and never assert, because the file is not part of the program and so is inherently not trusted.

One way to distinguish validated from untrusted data is by using different types (or a naming convention, see Joel Spolsky's http://www.joelonsoftware.com/articles/Wrong.html).

It is of major importance in a program to think about what APIs get validated arguments and what APIs get untrusted arguments.
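
A sketch of that split for UTF text, under the assumption that validation happens once at the program boundary. All three function names are hypothetical; std.utf.validate and UTFException are the real Phobos pieces.

```d
import std.utf : UTFException, validate;

// Helper for the debug-only contract below.
bool isValidUtf(const(char)[] s)
{
    try { validate(s); return true; }
    catch (UTFException) { return false; }
}

// Boundary: input is untrusted, so invalid data throws (UTFException).
string acceptUntrusted(const(char)[] raw)
{
    validate(raw);
    return raw.idup;
}

// Interior: input is already validated, so invalid data is a program bug
// and is reported with an assert that -release removes.
size_t countCodePoints(string trusted)
in { assert(isValidUtf(trusted), "caller passed unvalidated text"); }
do
{
    import std.range : walkLength;
    return trusted.walkLength;
}
```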
Nov 20 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-11-20 09:50, Walter Bright wrote:

 Important is deciding upon the notions of "validated data" and
 "untrusted data" is.

 1. Validated data should get asserts if it is found to be invalid.

 2. Untrusted data should get exceptions thrown if it is found to be
 invalid (or return errors).

 For example, consider a utf string. If it has passed a validation check,
 then it becomes trusted data. Further processing on it should assert if
 it turns out to be invalid (because then you've got a programming bug).

 File open failures should always throw, and never assert, because the
 file is not part of the program and so is inherently not trusted.

 One way to distinguish validated from untrusted data is by using
 different types (or a naming convention, see Joel Spolsky's
 http://www.joelonsoftware.com/articles/Wrong.html).

 It is of major importance in a program to think about what APIs get
 validated arguments and what APIs get untrusted arguments.
How should we accomplish this? We can't replace:

void main (string[] args)

With

void main (UnsafeString[] args)

And break every application out there.

-- 
/Jacob Carlborg
Nov 20 2013
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, November 20, 2013 11:49:32 Jacob Carlborg wrote:
 On 2013-11-20 09:50, Walter Bright wrote:
 Important is deciding upon the notions of "validated data" and
 "untrusted data" is.
 
 1. Validated data should get asserts if it is found to be invalid.
 
 2. Untrusted data should get exceptions thrown if it is found to be
 invalid (or return errors).
 
 For example, consider a utf string. If it has passed a validation check,
 then it becomes trusted data. Further processing on it should assert if
 it turns out to be invalid (because then you've got a programming bug).
 
 File open failures should always throw, and never assert, because the
 file is not part of the program and so is inherently not trusted.
 
 One way to distinguish validated from untrusted data is by using
 different types (or a naming convention, see Joel Spolsky's
 http://www.joelonsoftware.com/articles/Wrong.html).
 
 It is of major importance in a program to think about what APIs get
 validated arguments and what APIs get untrusted arguments.
How should we accomplish this? We can't replace: void main (string[] args) With void main (UnsafeString[] args) And break every application out there.
You'd do it the other way around by having something like

ValidatedString!char s = validateString("hello world");

ValidatedString would then avoid any extra validation when iterating over the characters, though I don't know how much of an efficiency gain that would actually be given that much of the validation occurs naturally when decoding or using stride. It would have the downside that any function which specializes on strings would likely have to then specialize on ValidatedString as well.

So, while I agree with the idea in concept, I'd propose that we benchmark the difference in decoding and striding without the checks and see if there actually is much difference. Because if there isn't, then I don't think that it's worth going to the trouble of adding something like ValidatedString.

- Jonathan M Davis
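
For reference, a minimal sketch of what the ValidatedString line above might expand to. Neither ValidatedString nor validateString exists in Phobos; the layout and accessor here are assumptions, and only std.utf.validate is a real call.

```d
import std.utf : validate;

// Hypothetical wrapper whose only job is to record that validation happened.
struct ValidatedString(Char)
{
    private immutable(Char)[] data;

    immutable(Char)[] get() const { return data; }
}

// Validates once (throwing UTFException on bad input), then wraps.
ValidatedString!Char validateString(Char)(immutable(Char)[] s)
{
    validate(s);
    return ValidatedString!Char(s);
}
```

This sketch does not itself make decoding faster; that would require decode and stride variants that skip their checks for the wrapped type, which is exactly the part the post suggests benchmarking first.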
Nov 20 2013
next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2013-11-20 12:16, Jonathan M Davis wrote:

 You'd do it the other way around by having something like

 ValidatedString!char s = validateString("hello world");
Right.
 ValidatedString would then avoid any extra validation when iterating over the
 characters, though I don't know how much of an efficiency gain that would
 actually be given that much of the validation occurs naturally when decoding
 or using stride. It would have the downside that any function which
 specializes on strings would likely have to then specialize on ValidatedString
 as well. So, while I agree with the idea in concept, I'd propose that we
 benchmark the difference in decoding and striding without the checks and see if
 there actually is much difference. Because if there isn't, then I don't think
 that it's worth going to the trouble of adding something like ValidatedString.
It's not just whether the string is valid UTF-8. There can be many other types of valid strings. Or rather, other functions that have additional requirements, like sanitized filenames, HTML/SQL escaped strings and so on.

-- 
/Jacob Carlborg
Nov 20 2013
next sibling parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Wed, 20 Nov 2013 12:49:20 +0100
schrieb Jacob Carlborg <doob me.com>:

 On 2013-11-20 12:16, Jonathan M Davis wrote:
 
 You'd do it the other way around by having something like

 ValidatedString!char s = validateString("hello world");
Right.
 ValidatedString would then avoid any extra validation when iterating over the
 characters, though I don't know how much of an efficiency gain that would
 actually be given that much of the validation occurs naturally when decoding
 or using stride. It would have the downside that any function which
 specializes on strings would likely have to then specialize on ValidatedString
 as well. So, while I agree with the idea in concept, I'd propose that we
 benchmark the difference in decoding and striding without the checks and see if
 there actually is much difference. Because if there isn't, then I don't think
 that it's worth going to the trouble of adding something like ValidatedString.
If not just if the string is valid UTF-8. There can be many other types of valid strings. Or rather other functions that have additional requirements. Like sanitized filenames, HTML/SQL escaped strings and so on.
None of that is feasible. We can only hope that we simply catch every case of user input (or untrusted data) and check it before passing it to Phobos APIs. That's why there are functions to validate and also to sanitize UTF strings on a best effort basis in Phobos.

So in my opinion Phobos should continue forward with assert instead of enforce. I/O functions, of course, have to use exceptions.

That said, I never thought of validating args[] before passing it to getopt or using them as a filename. Lesson learned, I guess?

-- 
Marco
Nov 20 2013
parent Jacob Carlborg <doob me.com> writes:
On 2013-11-20 13:22, Marco Leise wrote:

 None of that is feasible. We can only hope that we simply
 catch every case of user input (or untrusted data) and check
 it before passing it to Phobos APIs. That's why there are
 functions to validate and also to sanitize UTF strings on a
 best effort basis in Phobos.

 So in my opinion Phobos should continue forward with assert
 instead of enforce. I/O functions, of course, have to use
 exceptions.

 That said, I never thought of validating args[] before passing
 it to getopt or using them as a filename. Lesson learned, I
 guess?
I don't know how getopt behaves but using them as a filename will most likely end up calling a system function, which will hopefully take care of the checking. -- /Jacob Carlborg
Nov 20 2013
prev sibling next sibling parent reply =?UTF-8?B?U2ltZW4gS2rDpnLDpXM=?= <simen.kjaras gmail.com> writes:
On 20.11.2013 12:49, Jacob Carlborg wrote:
 On 2013-11-20 12:16, Jonathan M Davis wrote:

 You'd do it the other way around by having something like

 ValidatedString!char s = validateString("hello world");
Right.
 ValidatedString would then avoid any extra validation when iterating
 over the
 characters, though I don't know how much of an efficiency gain that would
 actually be given that much of the validation occurs naturally when
 decoding
 or using stride. It would have the downside that any function which
 specializes on strings would likely have to then specialize on
 ValidatedString
 as well. So, while I agree with the idea in concept, I'd propose that we
 benchmark the difference in decoding and striding without the checks
 and see if
 there actually is much difference. Because if there isn't, then I
 don't think
 that it's worth going to the trouble of adding something like
 ValidatedString.
If not just if the string is valid UTF-8. There can be many other types of valid strings. Or rather other functions that have additional requirements. Like sanitized filenames, HTML/SQL escaped strings and so on.
May I suggest:

// Wraps a T that has passed the validator fn.
struct Validated(alias fn, T) {
    private T value;
    @property inout(T) get() inout { return value; }
}

// Runs fn on the value (fn is expected to throw on bad input), then wraps it.
Validated!(fn, T) validate(alias fn, T)(T value) {
    Validated!(fn, T) result;
    fn(value);
    result.value = value;
    return result;
}

void functionThatTakesSanitizedFileNames(Validated!(sanitizeFileName, string) path) {
    // Do stuff
}

-- 
Simen
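
To show how the suggestion might be used end to end, here is a self-contained sketch. The body of sanitizeFileName is invented for illustration (the post does not define it), and the consumer returns a length only so that the example has something observable.

```d
import std.algorithm : canFind;
import std.exception : enforce;

struct Validated(alias fn, T)
{
    private T value;
    @property inout(T) get() inout { return value; }
}

Validated!(fn, T) validate(alias fn, T)(T value)
{
    fn(value);                       // expected to throw on invalid input
    Validated!(fn, T) result;
    result.value = value;
    return result;
}

// Hypothetical validator: the rules here are assumptions, not a real policy.
void sanitizeFileName(string name)
{
    enforce(name.length > 0, "empty file name");
    enforce(!name.canFind('/') && !name.canFind('\0'),
            "illegal character in file name");
}

// The parameter type documents, and the compiler enforces, that the
// argument went through sanitizeFileName.
size_t functionThatTakesSanitizedFileNames(Validated!(sanitizeFileName, string) path)
{
    return path.get.length;
}
```

A caller would write `functionThatTakesSanitizedFileNames(validate!sanitizeFileName("notes.txt"))`; passing a plain string does not compile, which is the point of the wrapper.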
Nov 20 2013
next sibling parent reply "Meta" <jared771 gmail.com> writes:
On Wednesday, 20 November 2013 at 17:45:43 UTC, Simen Kjærås 
wrote:
 On 20.11.2013 12:49, Jacob Carlborg wrote:
 On 2013-11-20 12:16, Jonathan M Davis wrote:

 You'd do it the other way around by having something like

 ValidatedString!char s = validateString("hello world");
Right.
 ValidatedString would then avoid any extra validation when 
 iterating
 over the
 characters, though I don't know how much of an efficiency 
 gain that would
 actually be given that much of the validation occurs 
 naturally when
 decoding
 or using stride. It would have the downside that any function 
 which
 specializes on strings would likely have to then specialize on
 ValidatedString
 as well. So, while I agree with the idea in concept, I'd 
 propose that we
 benchmark the difference in decoding and striding without the 
 checks
 and see if
 there actually is much difference. Because if there isn't, 
 then I
 don't think
 that it's worth going to the trouble of adding something like
 ValidatedString.
If not just if the string is valid UTF-8. There can be many other types of valid strings. Or rather other functions that have additional requirements. Like sanitized filenames, HTML/SQL escaped strings and so on.
May I suggest: struct Validated(alias fn, T) { private T value; property inout T get() { return value; } } Validated!(fn, T) validate(alias fn, T)(T value) { Validated!(fn, T) result; fn(value); result.value = value; return result; } void functionThatTakesSanitizedFileNames(Validated!(sanitizeFileName, string) path) { // Do stuff }
I was having the exact same thought. I think this could be very powerful if done correctly.
Nov 20 2013
parent reply "Dicebot" <public dicebot.lv> writes:
I also think this is very powerful and under-explored approach 
but it really better belongs to certain domain framework than to 
stdlib. One example I keep thinking about is to re-declare vibe.d 
string functions in terms of EscapedString!(SQL), 
EscapedString!(HTML) and so on for better application safety and 
correctness. No idea how that may work in practice though.
Nov 20 2013
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
20-Nov-2013 22:28, Dicebot пишет:
 I also think this is very powerful and under-explored approach but it
 really better belongs to certain domain framework than to stdlib. One
 example I keep thinking about is to re-declare vibe.d string functions
 in terms of EscapedString!(SQL), EscapedString!(HTML) and so on for
 better application safety and correctness. No idea how that may work in
 practice though.
I think the obstacles are mostly:

1. There is a non-zero intersection between validated subsets. Some kind of NiceStringWithNoPunctuation fits practically every EscapedString!(XYZ). There must be a way to cascade and mix/match these classes.

2. Template bloatZ! It would be real hard to fight the IFTI duping functions bodies behind your back. Or if you dumb down these escaped types to not fit the most of templates, it may become a usability problem.

3. This kind of thing is viral. With escape hatch though, it may be done step by step.

-- 
Dmitry Olshansky
Nov 20 2013
parent "Dicebot" <public dicebot.lv> writes:
On Wednesday, 20 November 2013 at 20:19:28 UTC, Dmitry Olshansky 
wrote:
 2. Template bloatZ! It would be real hard to fight the IFTI 
 duping functions bodies behind your back. Or if you dumb down 
 these escaped types to not fit the most of templates, it may 
 become a usability problem.

 3. This kind of thing is viral. With escape hatch though, it 
 may be done step by step.
This is the very reason why I am saying it makes much more sense as part of certain application framework as those tends to have more clear separation between internal and external infrastructure and strict usage API expectations. So it is not a usability problem, it is a usability feature :)
Nov 20 2013
prev sibling parent reply "inout" <inout gmail.com> writes:
On Wednesday, 20 November 2013 at 17:45:43 UTC, Simen Kjærås 
wrote:
 On 20.11.2013 12:49, Jacob Carlborg wrote:
 On 2013-11-20 12:16, Jonathan M Davis wrote:

 You'd do it the other way around by having something like

 ValidatedString!char s = validateString("hello world");
Right.
 ValidatedString would then avoid any extra validation when 
 iterating
 over the
 characters, though I don't know how much of an efficiency 
 gain that would
 actually be given that much of the validation occurs 
 naturally when
 decoding
 or using stride. It would have the downside that any function 
 which
 specializes on strings would likely have to then specialize on
 ValidatedString
 as well. So, while I agree with the idea in concept, I'd 
 propose that we
 benchmark the difference in decoding and striding without the 
 checks
 and see if
 there actually is much difference. Because if there isn't, 
 then I
 don't think
 that it's worth going to the trouble of adding something like
 ValidatedString.
It's not just whether the string is valid UTF-8. There can be many other kinds of valid strings, or rather other functions that have additional requirements: sanitized filenames, HTML/SQL-escaped strings and so on.
May I suggest:

struct Validated(alias fn, T) {
    private T value;
    @property inout
    T get() {
        return value;
    }
}

Validated!(fn, T) validate(alias fn, T)(T value) {
    Validated!(fn, T) result;
    fn(value);
    result.value = value;
    return result;
}

void functionThatTakesSanitizedFileNames(Validated!(sanitizeFileName, string) path) {
    // Do stuff
}
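A compilable variant of the suggestion above, as a rough sketch only. It assumes the validation function returns bool and is checked with enforce (the quoted code simply calls fn and presumably relies on it throwing), and it returns inout(T) from the getter. isSaneFileName and nameLength are hypothetical names made up for the illustration.

```d
import std.exception : assertThrown, enforce;

/// Wrapper recording, in the type, that `fn` accepted the stored value.
struct Validated(alias fn, T)
{
    private T value;

    /// Read-only access; there is deliberately no setter.
    @property inout(T) get() inout { return value; }
}

/// Runs `fn` once and wraps the value; throws if `fn` rejects it.
Validated!(fn, T) validate(alias fn, T)(T value)
{
    enforce(fn(value), "validation failed");
    return Validated!(fn, T)(value);
}

/// Hypothetical predicate: non-empty, no path separator, no NUL.
bool isSaneFileName(string s)
{
    import std.algorithm.searching : canFind;
    return s.length > 0 && !s.canFind('/') && !s.canFind('\0');
}

/// Accepts only names that went through validate!isSaneFileName.
size_t nameLength(Validated!(isSaneFileName, string) path)
{
    return path.get.length;
}

// Build with: dmd -main -unittest -run thisfile.d
unittest
{
    auto p = validate!isSaneFileName("notes.txt");
    assert(nameLength(p) == 9);
    assertThrown(validate!isSaneFileName("a/b"));
}
```

Passing a plain string to nameLength does not compile, which is the static guarantee the wrapper is meant to buy.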
What if you have more than just one validation, e.g. Positive and LessThan42? Is Positive!LessThan42!int the same type as LessThan42!Positive!int? Implicitly convertible?

I feel that it might be better to use attributes here instead. Something like:

@positive int validatePositive(int value) {
    assert(value > 0);
    return value;
}

@lessThan42 int validateLessThan42(int value) {
    assert(value < 42);
    return value;
}

Now you can have

@positive @lessThan42 int value = validatePositive(validateLessThan42(x));

It also doesn't involve creating new types.
Nov 21 2013
parent reply "Meta" <jared771 gmail.com> writes:
On Thursday, 21 November 2013 at 22:51:43 UTC, inout wrote:
 What if you have more that just one validation, e.g. Positive 
 and LessThan42?
 Is Positive!LessThan42!int the same type as 
 LessThan42!Positive!int? Implicitly convertible?
Allow multiple validation functions. Then a Validated type is only valid if validationFunction1(val) && validationFunction2(val) && ...

Validated!(isPositive, lessThan42, int) validatedInt = validate!(isPositive, lessThan42)(34);
//Do stuff with validatedInt

Or just pass a function that validates that the int is both positive and less than 42, which would be much simpler.
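Both options can be sketched in a few lines. This is an illustration under assumptions, not anyone's posted code: the value type is placed first in the parameter list (a D variadic template parameter has to come last), constraints are assumed to return bool, and positiveAndLessThan42 is a made-up name for the "single combined function" alternative.

```d
import std.exception : enforce;

struct Validated(T, constraints...)
{
    private T value;
    @property inout(T) get() inout { return value; }
}

/// validate!(c1, c2, ...)(v): runs every constraint, then wraps v.
template validate(constraints...)
{
    auto validate(T)(T v)
    {
        static foreach (i; 0 .. constraints.length)
            enforce(constraints[i](v), "constraint failed");
        return Validated!(T, constraints)(v);
    }
}

bool isPositive(int v) { return v > 0; }
bool lessThan42(int v) { return v < 42; }

/// The simpler alternative: one function that checks both conditions.
bool positiveAndLessThan42(int v) { return isPositive(v) && lessThan42(v); }

unittest
{
    auto a = validate!(isPositive, lessThan42)(34);
    auto b = validate!positiveAndLessThan42(34);
    assert(a.get == 34 && b.get == 34);

    // Same guarantee, but two unrelated types as far as the compiler knows.
    static assert(!is(typeof(a) == typeof(b)));
}
```

The combined function is less code, but it hides from the type system which individual properties hold.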
 ...

 It also doesn't involve creating new types.
Creating new types is what allows us to provide static, compiler-verified guarantees.
Nov 21 2013
next sibling parent reply Simen Kjærås <simen.kjaras gmail.com> writes:
On 22.11.2013 00:50, Meta wrote:
 On Thursday, 21 November 2013 at 22:51:43 UTC, inout wrote:
 What if you have more that just one validation, e.g. Positive and
 LessThan42?
 Is Positive!LessThan42!int the same type as LessThan42!Positive!int?
 Implicitly convertible?
Allow multiple validation functions. Then a Validated type is only valid if validationFunction1(val) && validationFunction2(val) &&... Validated!(isPositive, lessThan42, int) validatedInt = validate!(isPositive, lessThan42)(34); //Do stuff with validatedInt
I believe inout's point was this, though:

    Validated!(isPositive, lessThan42, int) i = foo();

    Validated!(isPositive, int) n = i; // Fails.
    Validated!(lessThan42, isPositive, int) r = i; // Fails.

This is of course less than optimal.

If a type such as Validate is to be added to Phobos, these problems need to be fixed first.
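The failure is easy to reproduce with a bare-bones wrapper. The sketch below assumes a Validated that takes the value type first and does nothing beyond storing the value; the alias names are invented for the example.

```d
struct Validated(T, constraints...)
{
    private T value;
    @property inout(T) get() inout { return value; }
}

bool isPositive(int v) { return v > 0; }
bool lessThan42(int v) { return v < 42; }

alias PosSmall = Validated!(int, isPositive, lessThan42);
alias SmallPos = Validated!(int, lessThan42, isPositive);
alias Pos      = Validated!(int, isPositive);

// Each distinct argument list instantiates a distinct struct type...
static assert(!is(PosSmall == SmallPos));

// ...and with no conversion machinery there is no implicit conversion,
// neither to a reordered list nor to a subset of the constraints.
static assert(!is(PosSmall : SmallPos));
static assert(!is(PosSmall : Pos));
```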
 Or just pass a function that validates that the int is both positive and
 less than 42, which would be much simpler.
-- Simen
Nov 21 2013
parent reply Marco Leise <Marco.Leise gmx.de> writes:
On Fri, 22 Nov 2013 02:55:44 +0100,
Simen Kjærås <simen.kjaras gmail.com> wrote:

 On 22.11.2013 00:50, Meta wrote:
 On Thursday, 21 November 2013 at 22:51:43 UTC, inout wrote:
 What if you have more that just one validation, e.g. Positive and
 LessThan42?
 Is Positive!LessThan42!int the same type as LessThan42!Positive!int?
 Implicitly convertible?
Allow multiple validation functions. Then a Validated type is only valid if validationFunction1(val) && validationFunction2(val) && ...

Validated!(isPositive, lessThan42, int) validatedInt = validate!(isPositive, lessThan42)(34);
//Do stuff with validatedInt

I believe inout's point was this, though:

    Validated!(isPositive, lessThan42, int) i = foo();

    Validated!(isPositive, int) n = i; // Fails.
    Validated!(lessThan42, isPositive, int) r = i; // Fails.

This is of course less than optimal.

If a type such as Validate is to be added to Phobos, these problems need to be fixed first.

Can you write a templated assignment operator that accepts any Validated!* instance and builds the set difference of validation functions that are missing on the assigned value? E.g. in the case of n = i: {isPositive} / {isPositive, lessThan42} = empty set.

--
Marco
Nov 25 2013
parent reply Simen Kjærås <simen.kjaras gmail.com> writes:
On 2013-11-25 13:00, Marco Leise wrote:
 On Fri, 22 Nov 2013 02:55:44 +0100,
 Simen Kjærås <simen.kjaras gmail.com> wrote:

 On 22.11.2013 00:50, Meta wrote:
 On Thursday, 21 November 2013 at 22:51:43 UTC, inout wrote:
 What if you have more that just one validation, e.g. Positive and
 LessThan42?
 Is Positive!LessThan42!int the same type as LessThan42!Positive!int?
 Implicitly convertible?
Allow multiple validation functions. Then a Validated type is only valid if validationFunction1(val) && validationFunction2(val) &&... Validated!(isPositive, lessThan42, int) validatedInt = validate!(isPositive, lessThan42)(34); //Do stuff with validatedInt
I believe inout's point was this, though: Validated!(isPositive, lessThan42, int) i = foo(); Validated!(isPositive, int) n = i; // Fails. Validated!(lessThan42, isPositive, int) r = i; // Fails. This is of course less than optimal. If a type such as Validate is to be added to Phobos, these problems need to be fixed first.
Can you write a templated assignment operator that accepts any Validated!* instance and builds the set difference of validation functions that are missing on the assigned value? E.g. in the case of n = i: {isPositive} / {isPositive, lessThan42} = emtpy set.
Do you mean this?

    Validated!(int, isPositive, lessThan42) a = validated!(isPositive, lessThan42)(13);
    Validated!(int, isPositive) b = a;

    a = b; // Only tests lessThan42

If so, you're mostly right that this should be done. I am however of the opinion that conversions that may throw should be marked appropriately, so this will be the right way:

    a = validated!(isPositive, lessThan42)(b); // Only tests lessThan42

New version now available on GitHub:
http://git.io/hEe0MA
http://git.io/QEP-kQ

--
Simen
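For readers who want to see the "only test what is missing" idea in isolation, here is a small sketch. It is not the code behind the links above; revalidate, the Value and Constraints aliases and the call counters are all assumptions made for the illustration. Conversions go through an explicit function call, in the spirit of marking conversions that may throw.

```d
import std.exception : enforce;
import std.meta : staticIndexOf;

struct Validated(T, constraints...)
{
    alias Value = T;
    alias Constraints = constraints;

    private T value;
    @property inout(T) get() inout { return value; }
}

/// Wraps a plain value after running every constraint on it.
template validate(constraints...)
{
    auto validate(T)(T v)
    {
        static foreach (i; 0 .. constraints.length)
            enforce(constraints[i](v), "constraint failed");
        return Validated!(T, constraints)(v);
    }
}

/// Converts between Validated types, running only those constraints
/// that the source type does not already carry.
template revalidate(wanted...)
{
    auto revalidate(Src)(Src src)
    {
        static foreach (i; 0 .. wanted.length)
        {
            static if (staticIndexOf!(wanted[i], Src.Constraints) == -1)
                enforce(wanted[i](src.get), "constraint failed");
        }
        return Validated!(Src.Value, wanted)(src.get);
    }
}

// Counters exist only to make the skipped checks observable.
int positiveCalls, smallCalls;
bool isPositive(int v) { ++positiveCalls; return v > 0; }
bool lessThan42(int v) { ++smallCalls; return v < 42; }

unittest
{
    positiveCalls = smallCalls = 0;
    auto a = validate!(isPositive, lessThan42)(13);  // runs both checks
    auto b = revalidate!isPositive(a);               // subset: nothing to run
    auto c = revalidate!(isPositive, lessThan42)(b); // runs lessThan42 only
    assert(positiveCalls == 1 && smallCalls == 2);
    static assert(is(typeof(c) == typeof(a)));
}
```

The set difference is computed entirely at compile time, so narrowing to a subset costs nothing at run time.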
Nov 25 2013
next sibling parent reply "inout" <inout gmail.com> writes:
On Monday, 25 November 2013 at 13:01:43 UTC, Simen Kjærås wrote:
 On 2013-11-25 13:00, Marco Leise wrote:
 On Fri, 22 Nov 2013 02:55:44 +0100,
 Simen Kjærås <simen.kjaras gmail.com> wrote:

 On 22.11.2013 00:50, Meta wrote:
 On Thursday, 21 November 2013 at 22:51:43 UTC, inout wrote:
 What if you have more that just one validation, e.g. 
 Positive and
 LessThan42?
 Is Positive!LessThan42!int the same type as 
 LessThan42!Positive!int?
 Implicitly convertible?
Allow multiple validation functions. Then a Validated type is only valid if validationFunction1(val) && validationFunction2(val) &&... Validated!(isPositive, lessThan42, int) validatedInt = validate!(isPositive, lessThan42)(34); //Do stuff with validatedInt
I believe inout's point was this, though: Validated!(isPositive, lessThan42, int) i = foo(); Validated!(isPositive, int) n = i; // Fails. Validated!(lessThan42, isPositive, int) r = i; // Fails. This is of course less than optimal. If a type such as Validate is to be added to Phobos, these problems need to be fixed first.
Can you write a templated assignment operator that accepts any Validated!* instance and builds the set difference of validation functions that are missing on the assigned value? E.g. in the case of n = i: {isPositive} / {isPositive, lessThan42} = emtpy set.
Do you mean this? Validated!(int, isPositive, lessThan42) a = validated!(isPositive, lessThan42)(13); Validated!(int, isPositive) b = a; a = b; // Only tests lessThan42 If so, you're mostly right that this should be done. I am however of the opinion that conversions that may throw should be marked appropriately, so this will be the right way: a = validated!(isPositive, lessThan42)(b); // Only tests lessThan42 New version now available on GitHub: http://git.io/hEe0MA http://git.io/QEP-kQ -- Simen
I find this to be too verbose to be useful. And you also need to be very careful not to discard any existing qualifiers on input, and to carry them over. This will essentially force any function that uses them to be templated, while all the instances will be the same (yet each will have its own body, since no D compiler merges identical functions).

I still find that wrapping int with some type to add a tag to it, without adding any methods, is not a great idea: it doesn't scale well with composition and tag propagation. Any operation that expects int will essentially discard all the qualifiers.
Nov 26 2013
next sibling parent Simen Kjærås <simen.kjaras gmail.com> writes:
On 26.11.2013 21:14, inout wrote:
 I find this to be too verbose to be useful.
This I understand. It is actually the best argument I can find in favor of doing constraints checking upon construction, rather than in a separate construction function. This allows you to use one alias instead of two.
 And you also need to
 be very careful not to discard any existing qualifiers on input
 and carry them over. This will essentially make any function that
 uses them to be templated, while all the instances will be the
 same (yet have a different body since no D compiler merges
 identical functions).
Could you give an example of this? It's a bit unclear to me what you mean. Is it this sort of thing:

    auto doPrimeStuff(Validated!(int, isPrime) a){return a;}
    auto doLessThan42Stuff(Validated!(int, lessThan42) a){return a;}

    Validated!(int, isPrime, lessThan42) i = 13;

    i.doPrimeStuff().doLessThan42Stuff();

Where the second chained function call fails due to lessThan42 being removed from the constraints? (There's also the problem that this wouldn't work in the first place due to D's lack of implicit conversions.)
 I still find wrapping int with some type to add a tag to it
 without adding any methods is not a great idea - it doesn't scale
 well with composition and tag propagation. Any operation that
 expects int will essentially discard all the qualifiers.
And any operation of the kind you describe is likely to change the value so the constraints need to be checked again. abs(Validated!(int, isNegative)) cannot possibly return the same type it received. -- Simen
Nov 26 2013
prev sibling parent "Meta" <jared771 gmail.com> writes:
On Tuesday, 26 November 2013 at 20:14:15 UTC, inout wrote:
 Any operation that expects int will essentially discard all the 
 qualifiers.
It isn't surprising that any operation that expects int will get int. To take advantage of Validated, an operation has to expect Validated.
Nov 26 2013
prev sibling parent Marco Leise <Marco.Leise gmx.de> writes:
On Mon, 25 Nov 2013 14:01:28 +0100,
Simen Kjærås <simen.kjaras gmail.com> wrote:

 On 2013-11-25 13:00, Marco Leise wrote:
 Can you write a templated assignment operator that
 accepts any Validated!* instance and builds the set difference
 of validation functions that are missing on the assigned value?
 E.g. in the case of n = i: {isPositive} / {isPositive,
 lessThan42} = empty set.
 Do you mean this?

     Validated!(int, isPositive, lessThan42) a = validated!(isPositive, lessThan42)(13);
     Validated!(int, isPositive) b = a;

     a = b; // Only tests lessThan42

 If so, you're mostly right that this should be done. I am however of the
 opinion that conversions that may throw should be marked appropriately,
 so this will be the right way:

     a = validated!(isPositive, lessThan42)(b); // Only tests lessThan42

 New version now available on GitHub:
 http://git.io/hEe0MA
 http://git.io/QEP-kQ

 --
    Simen
Yes, that is what I had in mind.

--
Marco
Nov 26 2013
prev sibling parent reply Simen Kjærås <simen.kjaras gmail.com> writes:
On 22.11.2013 02:55, Simen Kjærås wrote:
 On 22.11.2013 00:50, Meta wrote:
 On Thursday, 21 November 2013 at 22:51:43 UTC, inout wrote:
 What if you have more that just one validation, e.g. Positive and
 LessThan42?
 Is Positive!LessThan42!int the same type as LessThan42!Positive!int?
 Implicitly convertible?
Allow multiple validation functions. Then a Validated type is only valid if validationFunction1(val) && validationFunction2(val) &&... Validated!(isPositive, lessThan42, int) validatedInt = validate!(isPositive, lessThan42)(34); //Do stuff with validatedInt
I believe inout's point was this, though: Validated!(isPositive, lessThan42, int) i = foo(); Validated!(isPositive, int) n = i; // Fails. Validated!(lessThan42, isPositive, int) r = i; // Fails. This is of course less than optimal. If a type such as Validate is to be added to Phobos, these problems need to be fixed first.
 Or just pass a function that validates that the int is both positive and
 less than 42, which would be much simpler.
I've created a version of Validated now that takes 1 or more constraints, and where a type whose constraints are a superset of another's is implicitly convertible to that. Sadly, because of D's lack of certain implicit conversions, there are limits.

Attached is source (validation.d), and some utility functions that are necessary for it to compile (utils.d).

Is this worth working more on? Should it be in Phobos? Other critique?

Oh, sorry about those stupid questions, we have a term for that:

Destroy!

--
Simen
Nov 24 2013
parent reply "Meta" <jared771 gmail.com> writes:
On Sunday, 24 November 2013 at 17:35:51 UTC, Simen Kjærås wrote:
 I believe inout's point was this, though:

    Validated!(isPositive, lessThan42, int) i = foo();

    Validated!(isPositive, int) n = i; // Fails.
    Validated!(lessThan42, isPositive, int) r = i; // Fails.

 This is of course less than optimal.

 If a type such as Validate is to be added to Phobos, these 
 problems need
 to be fixed first.


 Or just pass a function that validates that the int is both 
 positive and
 less than 42, which would be much simpler.
I've created a version of Validated now that takes 1 or more constraints, and where a type whose constraints are a superset of another's, is implicitly convertible to that. Sadly, because of D's lack of certain implicit conversions, there are limits. Attached is source (validation.d), and some utility functions that are necessary for it to compile (utils.d). Is this worth working more on? Should it be in Phobos? Other critique? Oh, sorry about those stupid questions, we have a term for that: Detroy!
Awesome, I was messing around with something similar but you beat me to the punch. A couple things:

- The function validated would probably be better named validate, since it actually performs validation and returns a validated type. The struct's name is fine.

- I think it'd be better to change "static if (is(typeof(fn(value)) == bool))" to "static if (is(typeof(fn(value)) : bool))", which, rather than checking that the return type is exactly bool, only checks that it's implicitly convertible to bool, AKA "truthy".

- It might be a good idea to have a version(AlwaysValidate) block in assumeValidated for people who don't care about code speed and want maximum safety, that would always run the validation functions. Also, it might be a good idea to mark assumeValidated @system, because it blatantly breaks the underlying assumptions being made in the first place. Code that wants to be rock-solid @safe will be restricted to using only validate. Or maybe that's going too far.

- Validated doesn't work very well with reference types. The following fails:

    class CouldBeNull
    {
    }

    bool notNull(T)(T t)
    if (is(T == class))
    {
        return t !is null;
    }

    //Error: cannot implicitly convert expression (this._value) of type
    //inout(CouldBeNull) to f505.CouldBeNull
    void takesNonNull(Validated!(CouldBeNull, notNull) validatedT)
    {
    }

- On the subject of reference types, I don't think Validated handles them quite correctly. This is a problem I ran into, and it's not an easy one. Assume for a second that there's a class FortyTwo that *does* work with Validated:

    class FortyTwo
    {
        int i = 42;
    }

    bool containsFortyTwo(FortyTwo ft)
    {
        return ft.i == 42;
    }

    void mutateFortyTwo(Validated!(FortyTwo, containsFortyTwo) fortyTwo)
    {
        fortyTwo.i = 43;
    }

    auto a = validated!containsFortyTwo(new FortyTwo());
    auto b = a;
    //Passes
    assert(a.i == 42);
    assert(b.i == 42);
    mutateFortyTwo(a);
    //Fails
    assert(a.i == 43);
    assert(b.i == 43);

This is an extremely contrived example, but it illustrates the problem of using reference types with Validated. It gets even hairier if i itself were a reference type, like a slice:

    void mutateCopiedValue(Validated!(FortyTwo, containsFortyTwo) fortyTwo)
    {
        //We're not out of the woods yet
        int[] arr = fortyTwo.i;
        arr[0] += 1;
    }

    //Continuing from previous example,
    //except i is now an array
    mutateCopiedValue(b);
    assert(a.i[0] == 44);
    assert(b.i[0] == 44);

Obviously in this case you could just .dup i, but what if i were a class itself? It'd be extremely easy to accidentally invalidate every Validated!(FortyTwo, ...) in the program in a single swipe. It gets even worse if i were some class reference to which other, non-validated references existed. Changing those naked references would change i, and possibly invalidate it.
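The last scenario, a naked reference that outlives validation, can be reproduced in a self-contained form. This is a sketch under assumptions: the minimal Validated and validate below are stand-ins written for the example, not the implementation being reviewed.

```d
import std.exception : enforce;

class FortyTwo { int i = 42; }

bool containsFortyTwo(const FortyTwo ft)
{
    return ft !is null && ft.i == 42;
}

struct Validated(alias fn, T)
{
    private T value;
    @property inout(T) get() inout { return value; }
}

Validated!(fn, T) validate(alias fn, T)(T value)
{
    enforce(fn(value), "validation failed");
    return Validated!(fn, T)(value);
}

unittest
{
    auto raw = new FortyTwo;
    auto a = validate!containsFortyTwo(raw);
    auto b = a;              // copies the reference, not the object
    assert(containsFortyTwo(a.get));

    raw.i = 43;              // mutation through the naked reference

    // Both wrappers still claim "validated", yet the invariant is gone.
    assert(!containsFortyTwo(a.get));
    assert(!containsFortyTwo(b.get));
}
```

Nothing in the wrapper is bypassed here; the validated object is simply reachable through another path.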
Nov 24 2013
next sibling parent "Meta" <jared771 gmail.com> writes:
On Monday, 25 November 2013 at 07:24:10 UTC, Meta wrote:
 	auto a = validated!containsFortyTwo(new FortyTwo());
 	auto b = a;
 	//Passes
 	assert(a.i == 42);
 	assert(b.i == 42);
 	mutateFortyTwo(a);
 	//Fails
 	assert(a.i == 43);
 	assert(b.i == 43);
"//Fails" should be "//Passes" as well.
Nov 24 2013
prev sibling parent reply Simen Kjærås <simen.kjaras gmail.com> writes:
On 2013-11-25 08:24, Meta wrote:
 - The function validated would probably be better named validate, since
 it actually performs validation and returns a validated type. The
 struct's name is fine.
Yeah, I was somewhat torn there, but I think you're right. Fixed.
 - I think it'd be better to change "static if (is(typeof(fn(value)) ==
 bool))" to "static if (is(typeof(fn(value)) : bool))", which rather than
 checking that the return type is exactly bool, it only checks that it's
 implicitly convertible to bool, AKA "truthy".
Even better - test if 'if (fn(value)) {}' compiles. Fixed.
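A sketch of how the two checks differ, assuming the test is written with __traits(compiles). The helper names usableAsCondition and convertsToBool, and the Verdict and Opaque types, are invented for the illustration.

```d
/// True when `fn(T.init)` can stand as an `if` condition.
enum bool usableAsCondition(alias fn, T) =
    __traits(compiles, { if (fn(T.init)) {} });

/// The stricter "implicitly convertible to bool" test, for comparison.
enum bool convertsToBool(alias fn, T) = is(typeof(fn(T.init)) : bool);

/// A result type that is truthy only through opCast!bool.
struct Verdict
{
    bool ok;
    bool opCast(B : bool)() const { return ok; }
}

/// A result type with no boolean meaning at all.
struct Opaque {}

bool    isPositive(int v)    { return v > 0; }
Verdict checkRange(int v)    { return Verdict(v >= 0 && v < 100); }
Opaque  notAPredicate(int v) { return Opaque(); }

static assert( usableAsCondition!(isPositive, int));
static assert( convertsToBool!(isPositive, int));

// Accepted by the `if` test, rejected by the implicit-conversion test.
static assert( usableAsCondition!(checkRange, int));
static assert(!convertsToBool!(checkRange, int));

// Rejected by both.
static assert(!usableAsCondition!(notAPredicate, int));
```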
 - It might be a good idea to have a version(AlwaysValidate) block in
 assumeValidated for people who don't care about code speed and want
 maximum safety, that would always run the validation functions. Also, it
 might be a good idea to mark assumeValidated @system, because it
 blatantly breaks the underlying assumptions being made in the first
 place. Code that wants to be rock-solid @safe will be restricted to
 using only validate. Or maybe that's going too far.
@safe is only for memory safety, which this is not. I agree it would be nice to mark assumeValidated as 'warning, may not do what it claims', but @safe is not really the correct indicator of that.
 - Validated doesn't work very well with reference types. The following
 fails:

 class CouldBeNull
 {
 }

 bool notNull(T)(T t)
 if (is(T == class))
 {
      return t !is null;
 }

 //Error: cannot implicitly convert expression (this._value) of type
 inout(CouldBeNull) to f505.CouldBeNull
 void takesNonNull(Validated!(CouldBeNull, notNull) validatedT)
 {
 }
Yeah, found that. It's a bug in value(), which should return inout(T), not T. Fixed.
 - On the subject of reference types, I don't think Validated handles
 them quite correctly. This is a problem I ran into, and it's not an easy
 one. Assume for a second that there's a class FourtyTwo that *does* work
 with Validated:

      class FortyTwo
      {
          int i = 42;
      }

      bool containsFortyTwo(FortyTwo ft)
      {
          return ft.i == 42;
      }

      void mutateFortyTwo(Validated!(FortyTwo, containsFortyTwo) fortyTwo)
      {
          fortyTwo.i = 43;
      }

      auto a = validated!containsFortyTwo(new FortyTwo());
      auto b = a;
      //Passes
      assert(a.i == 42);
      assert(b.i == 42);
      mutateFortyTwo(a);
      //Fails
      assert(a.i == 43);
      assert(b.i == 43);

 This is an extremely contrived example, but it illustrates the problem
 of using reference types with Validated. It gets even hairier if i
 itself were a reference type, like a slice:

      void mutateCopiedValue(Validated!(FortyTwo, containsFortyTwo)
 fortyTwo)
      {
          //We're not out of the woods yet
          int[] arr = fortyTwo.i;
          arr[0] += 1;
      }

          //Continuing from previous example,
          //except i is now an array
      mutateCopiedValue(b);
      assert(a.i[0] == 44);
      assert(b.i[0] == 44);

 Obviously in this case you could just .dup i, but what if i were a class
 itself? It'd be extremely easy to accidentally invalidate every
 Validated!(FortyTwo, ...) in the program in a single swipe. It gets even
 worse if i were some class reference to which other, non-validated
 references existed. Changing those naked references would change i, and
 possibly invalidate it.
This is a known shortcoming for which I see no good workaround. It would be possible to use std.traits.hasAliasing to see which types can be safely .dup'ed and only allow those types, but this is not a solution I like.

I guess it could print a warning when used with unsafe types. If I were to do that, I would still want some way to turn that message off. Eh. Maybe there is no good solution.

What else is new?
- Better error messages for invalid constraints (testing if an int is null, a string is divisible by 3 or an array has a database connection, e.g.)
- Fixed a bug in opCast (I love that word - in Norwegian it [oppkast] means puke. ...anyways...) when converting to an incompatible wrapped value.

--
Simen
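For reference, this is roughly what std.traits.hasAliasing reports for a few representative types. The struct names and the canWrapSafely gate are hypothetical and only show how such a restriction could be phrased.

```d
import std.traits : hasAliasing;

struct Point  { int x, y; }            // plain values only
struct Named  { string name; }         // refers to immutable data only
struct Holder { int id; Object obj; }  // mutable class reference
struct Buffer { int* data; }           // raw pointer to mutable data

static assert(!hasAliasing!int);
static assert(!hasAliasing!Point);
static assert(!hasAliasing!Named);
static assert( hasAliasing!Holder);
static assert( hasAliasing!Buffer);

/// Hypothetical gate a Validated-like wrapper could use to reject
/// (or warn about) types that can be mutated behind its back.
enum bool canWrapSafely(T) = !hasAliasing!T;

static assert( canWrapSafely!Point);
static assert(!canWrapSafely!Holder);
```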
Nov 25 2013
parent reply "Meta" <jared771 gmail.com> writes:
On Monday, 25 November 2013 at 08:52:14 UTC, Simen Kjærås wrote:
 @safe is only for memory safety, which this is not. I agree it 
 would be
 nice to mark assumeValidated as 'warning, may not do what it 
 claims',
 but @safe is not really the correct indicator of that.
What about a version flag, then, that can be passed to specify that the user wants assumeValidated() to run the validation functions as well?
 This is a known shortcoming for which I see no good workaround. 
 It would
 be possible to use std.traits.hasAliasing to see which types 
 can be
 safely .dup'ed and only allow those types, but this is not a 
 solution I
 like.
It's a hard problem. This is a case where a Unique!T type would be really useful.
 What else is new?
 - Better error messages for invalid constraints (testing if an 
 int is
    null, a string is divisible by 3 or an array has a database
    connection, e.g.)
 - Fixed a bug in opCast (I love that word - in Norwegian it 
 [oppkast]
    means puke. ...anyways...) when converting to an 
 incompatible wrapped
    value.

 --
    Simen
Keep up the good work!
Nov 25 2013
parent Simen Kjærås <simen.kjaras gmail.com> writes:
On 2013-11-26 06:37, Meta wrote:
 On Monday, 25 November 2013 at 08:52:14 UTC, Simen Kjærås wrote:
 @safe is only for memory safety, which this is not. I agree it would be
 nice to mark assumeValidated as 'warning, may not do what it claims',
 but @safe is not really the correct indicator of that.
What about a version flag, then, that can be passed to specify that the user wants assumeValidated() to run the validation functions as well?
That's already in: http://git.io/EdHw8A -- Simen
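For readers without access to the linked source, a version-flag switch of this kind might look roughly like the sketch below. It is an assumption about the general shape, not a copy of the linked code; only the names assumeValidated and AlwaysValidate come from the discussion.

```d
import std.exception : enforce;

struct Validated(alias fn, T)
{
    private T value;
    @property inout(T) get() inout { return value; }
}

/// Wraps a value the caller promises is valid. Compiling with
/// -version=AlwaysValidate turns the promise into a real check.
Validated!(fn, T) assumeValidated(alias fn, T)(T value)
{
    version (AlwaysValidate)
        enforce(fn(value), "assumeValidated: value failed validation");
    return Validated!(fn, T)(value);
}

bool isPositive(int v) { return v > 0; }

unittest
{
    // Valid input behaves the same with or without the flag.
    auto ok = assumeValidated!isPositive(5);
    assert(ok.get == 5);
}
```

With the flag off, assumeValidated!isPositive(-1) silently produces a wrapper whose type makes a false claim, which is the trade-off being debated.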
Nov 26 2013
prev sibling parent reply Simen Kjærås <simen.kjaras gmail.com> writes:
On 20.11.2013 18:45, Simen Kjærås wrote:
 On 20.11.2013 12:49, Jacob Carlborg wrote:
 On 2013-11-20 12:16, Jonathan M Davis wrote:

 You'd do it the other way around by having something like

 ValidatedString!char s = validateString("hello world");
Right.
 ValidatedString would then avoid any extra validation when iterating
 over the
 characters, though I don't know how much of an efficiency gain that
 would
 actually be given that much of the validation occurs naturally when
 decoding
 or using stride. It would have the downside that any function which
 specializes on strings would likely have to then specialize on
 ValidatedString
 as well. So, while I agree with the idea in concept, I'd propose that we
 benchmark the difference in decoding and striding without the checks
 and see if
 there actually is much difference. Because if there isn't, then I
 don't think
 that it's worth going to the trouble of adding something like
 ValidatedString.
It's not just whether the string is valid UTF-8. There can be many other kinds of valid strings, or rather other functions that have additional requirements: sanitized filenames, HTML/SQL-escaped strings and so on.
May I suggest:

struct Validated(alias fn, T) {
    private T value;
    @property inout
    T get() {
        return value;
    }

Uh-hm. Add this:

    alias get this;
 }

 Validated!(fn, T) validate(alias fn, T)(T value) {
      Validated!(fn, T) result;
      fn(value);
      result.value = value;
      return result;
 }

 void functionThatTakesSanitizedFileNames(Validated!(sanitizeFileName,
 string) path) {
     // Do stuff
 }
-- Simen
Nov 20 2013
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
20-Nov-2013 22:01, Simen Kjærås wrote:
 On 20.11.2013 18:45, Simen Kjærås wrote:
[snip]
 May I suggest:

 struct Validated(alias fn, T) {
      private T value;
      @property inout
      T get() {
          return value;
      }
Uh-hm. Add this: alias get this;
And it decays to the naked type in a blink of an eye. And some function down the road will do the validation again...
 }

 Validated!(fn, T) validate(alias fn, T)(T value) {
      Validated!(fn, T) result;
      fn(value);
      result.value = value;
      return result;
 }

 void functionThatTakesSanitizedFileNames(Validated!(sanitizeFileName,
 string) path) {
     // Do stuff
 }
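The decay can be made concrete with a short sketch. It assumes a minimal Validated with the alias this line added; isSaneFileName, withBackupSuffix and openSanitized are hypothetical names used only for this illustration.

```d
import std.exception : enforce;

struct Validated(alias fn, T)
{
    private T value;
    @property inout(T) get() inout { return value; }
    alias get this; // the addition under discussion
}

Validated!(fn, T) validate(alias fn, T)(T value)
{
    enforce(fn(value), "validation failed");
    return Validated!(fn, T)(value);
}

bool isSaneFileName(string s) { return s.length > 0; }

/// A function written against plain string.
string withBackupSuffix(string name) { return name ~ ".bak"; }

/// A function that insists on a validated name.
void openSanitized(Validated!(isSaneFileName, string) path) {}

unittest
{
    auto p = validate!isSaneFileName("notes.txt");

    // Compiles thanks to alias this...
    auto q = withBackupSuffix(p);
    assert(q == "notes.txt.bak");

    // ...but the result is a naked string again, so it cannot be
    // handed back to openSanitized without another validation.
    static assert(is(typeof(q) == string));
    static assert(!__traits(compiles, openSanitized(q)));
}
```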
-- Dmitry Olshansky
Nov 20 2013
next sibling parent reply "Meta" <jared771 gmail.com> writes:
On Wednesday, 20 November 2013 at 18:30:58 UTC, Dmitry Olshansky 
wrote:
 20-Nov-2013 22:01, Simen Kjærås wrote:
 On 20.11.2013 18:45, Simen Kjærås wrote:
[snip]
 May I suggest:

 struct Validated(alias fn, T) {
     private T value;
      @property inout
     T get() {
         return value;
     }
Uh-hm. Add this: alias get this;
And it decays to the naked type in a blink of an eye. And some function down the road will do the validation again...
 }

 Validated!(fn, T) validate(alias fn, T)(T value) {
     Validated!(fn, T) result;
     fn(value);
     result.value = value;
     return result;
 }

 void 
 functionThatTakesSanitizedFileNames(Validated!(sanitizeFileName,
 string) path) {
    // Do stuff
 }
Yes. It is very important not to allow direct access to the underlying value. This is important for ensuring that it is not put in an invalid state. This is a mistake that was made with std.typecons.Nullable, making it useless for anything other than giving a non-nullable type a null state (which, in fairness, is probably all that it was originally intended for).
Nov 20 2013
next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Wednesday, November 20, 2013 19:53:43 Meta wrote:
 Yes. It is very important not to allow direct access to the
 underlying value. This is important for ensuring that it is not
 put in an invalid state. This is a mistake that was made with
 std.typecons.Nullable, making it useless for anything other than
 giving a non-nullable type a null state (which, in fairness, is
 probably all that it was originally intended for).
It's arguably pretty pointless to put a nullable type in std.typecons.Nullable. If you want a nullable type to be null, just set it to null. - Jonathan M Davis
Nov 20 2013
parent reply "Meta" <jared771 gmail.com> writes:
On Wednesday, 20 November 2013 at 19:23:32 UTC, Jonathan M Davis 
wrote:
 On Wednesday, November 20, 2013 19:53:43 Meta wrote:
 Yes. It is very important not to allow direct access to the
 underlying value. This is important for ensuring that it is not
 put in an invalid state. This is a mistake that was made with
 std.typecons.Nullable, making it useless for anything other 
 than
 giving a non-nullable type a null state (which, in fairness, is
 probably all that it was originally intended for).
It's arguably pretty pointless to put a nullable type in std.typecons.Nullable. If you want a nullable type to be null, just set it to null. - Jonathan M Davis
See the discussion from the other thread for why it can be useful to wrap a nullable reference in an option type (Nullable is a pseudo-option type).
Nov 20 2013
parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Wednesday, November 20, 2013 20:40:40 Meta wrote:
 On Wednesday, 20 November 2013 at 19:23:32 UTC, Jonathan M Davis
 
 wrote:
 On Wednesday, November 20, 2013 19:53:43 Meta wrote:
 Yes. It is very important not to allow direct access to the
 underlying value. This is important for ensuring that it is not
 put in an invalid state. This is a mistake that was made with
 std.typecons.Nullable, making it useless for anything other
 than
 giving a non-nullable type a null state (which, in fairness, is
 probably all that it was originally intended for).
It's arguably pretty pointless to put a nullable type in std.typecons.Nullable. If you want a nullable type to be null, just set it to null. - Jonathan M Davis
See the discussion from the other thread for why it can be useful to wrap a nullable reference in an option type (Nullable is a pseudo-option type).
I know. And I still think that it's pointless - and it incurs extra overhead to boot, making it _worse_ than pointless. But clearly there's disagreement on the matter. - Jonathan M Davis
Nov 20 2013
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2013-11-20 19:53, Meta wrote:

 Yes. It is very important not to allow direct access to the underlying
 value. This is important for ensuring that it is not put in an invalid
 state. This is a mistake that was made with std.typecons.Nullable,
 making it useless for anything other than giving a non-nullable type a
 null state (which, in fairness, is probably all that it was originally
 intended for).
In that case all string functionality needs to be provided inside the Validated struct. In addition to that we lose the beauty of UFCS, at least for functions expecting plain "string". -- /Jacob Carlborg
Nov 20 2013
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, November 21, 2013 08:36:37 Jacob Carlborg wrote:
 On 2013-11-20 19:53, Meta wrote:
 Yes. It is very important not to allow direct access to the underlying
 value. This is important for ensuring that it is not put in an invalid
 state. This is a mistake that was made with std.typecons.Nullable,
 making it useless for anything other than giving a non-nullable type a
 null state (which, in fairness, is probably all that it was originally
 intended for).
In that case all string functionality needs to be provided inside the Validated struct. In addition to that we lose the beauty of UFCS, at least for functions expecting plain "string".
You could use alias this and alias the Validated struct to the underlying string, but if you did that, you'd probably end up having it escape the struct and used as a naked string the vast majority of the time, which would essentially defeat the purpose of the Validated struct. - Jonathan M Davis
Nov 20 2013
parent Jacob Carlborg <doob me.com> writes:
On 2013-11-21 08:46, Jonathan M Davis wrote:

 You could use alias this and alias the Validated struct to the underlying
 string, but if you did that, you'd probably end up having it escape the struct
 and used as a naked string the vast majority of the time, which would
 essentially defeat the purpose of the Validated struct.
Yeah, that's what needs to be avoided and is the reason "alias this" or a property returning the raw string cannot be used. -- /Jacob Carlborg
Nov 21 2013
prev sibling parent "Meta" <jared771 gmail.com> writes:
On Thursday, 21 November 2013 at 07:36:38 UTC, Jacob Carlborg 
wrote:
 On 2013-11-20 19:53, Meta wrote:

 Yes. It is very important not to allow direct access to the 
 underlying
 value. This is important for ensuring that it is not put in an 
 invalid
 state. This is a mistake that was made with 
 std.typecons.Nullable,
 making it useless for anything other than giving a 
 non-nullable type a
 null state (which, in fairness, is probably all that it was 
 originally
 intended for).
In that case all string functionality needs to be provided inside the Validated struct. In addition to that we lose the beauty of UFCS, at least for functions expecting plain "string".
This is tricky business. Unfortunately, having the wrapper be able to degrade to its base type is at odds with providing compiler-enforced guarantees. We can't allow direct access to the underlying string, because the user could purposely or inadvertently put it in an invalid state. On the other hand, these opaque wrapper types can no longer be transparently substituted into existing code. One solution is copying the validated string to do arbitrary operations on, leaving the original validated string unchanged.

    auto validatedString = validate!isValidUTF(someString);

    //Doesn't work; Validated!string does not expose the string interface
    //auto invalidString = validatedString.map!(c => c - cast(char)int.max);

    //Also doesn't work
    //validatedString ~= cast(char)0xFFFF

    auto validatedCopy = validatedString.duplicate();
    //Do bad things with validatedCopy. validatedString remains unchanged and valid
Nov 20 2013
prev sibling next sibling parent reply Simen Kjærås <simen.kjaras gmail.com> writes:
On 20.11.2013 19:30, Dmitry Olshansky wrote:
 20-Nov-2013 22:01, Simen Kjærås wrote:
 On 20.11.2013 18:45, Simen Kjærås wrote:
[snip]
 May I suggest:

 struct Validated(alias fn, T) {
      private T value;
      @property inout
      T get() {
          return value;
      }
Uh-hm. Add this: alias get this;
And it decays to the naked type in a blink of an eye. And some function down the road will do the validation again...
And guess what? That's (often) ok. It's better to do the validation once too many than missing it once. The point (at least in the cases I've used it) is to enforce that only validated values are passed to functions that require validated strings, not that validated values never be passed to functions that don't really care. Doing it like this also lets you call functions that take the unadorned type, because that might be just as important.

The result of re-validating is performance loss. The result of missed validation is a bug. Also, in just a few lines, you can make a version that will *not* decay to the original type:

    struct Validated(alias fn, T) {
        private T _value;
        @property inout
        T value() {
            return _value;
        }
    }

    // validated() is identical to before.

Sure, using it is a bit more verbose than using the unadorned type, which is why I chose to make the original version automatically decay. This is a judgment where sensible people may disagree, even with themselves on a case-by-case basis.

-- Simen
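
For readers following along, here is a minimal self-contained sketch of the non-decaying variant. It is not the code from the earlier post: the `validated` factory, the private constructor, the `isValidUTF` predicate and the `countBytes` consumer are all assumptions made for illustration.

```d
import std.exception : enforce;
import std.utf : validate, UTFException;

/// Wrapper whose intended way in is validated(), so holding one
/// implies that `fn` accepted the payload at construction time.
struct Validated(alias fn, T)
{
    private T _value;

    // Private constructor: code outside this module must go through validated().
    private this(T value) { _value = value; }

    /// Read access to the payload; there is no `alias this`, so no implicit decay.
    @property inout(T) value() inout { return _value; }
}

/// Hypothetical factory: runs the check once and throws on failure.
Validated!(fn, T) validated(alias fn, T)(T value)
{
    enforce(fn(value), "validation failed");
    return Validated!(fn, T)(value);
}

/// Example predicate (an assumption): true when `s` is well-formed UTF-8.
bool isValidUTF(string s)
{
    try { validate(s); return true; }
    catch (UTFException) { return false; }
}

/// A consumer that states its requirement in the signature instead of re-checking.
size_t countBytes(Validated!(isValidUTF, string) s)
{
    return s.value.length;
}
```

A call such as `countBytes(validated!isValidUTF(input))` then pays for the check exactly once, and passing a naked `string` to `countBytes` is a compile error. Note that `Validated!(...).init` is still reachable, so the guarantee is only as strong as the discipline around default-constructed values.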
Nov 20 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-11-21 01:16, Simen Kjærås wrote:

 The result of re-validating is performance loss. The result of missed
 validation is a bug. Also, in just a few lines, you can make a version
 that will *not* decay to the original type:

    struct Validated(alias fn, T) {
        private T _value;
        @property inout
        T value() {
            return _value;
        }
    }

    // validated() is identical to before.

 Sure, using it is a bit more verbose than using the unadorned type,
 which is why I chose to make the original version automatically decay.
 This is a judgment where sensible people may disagree, even with
 themselves on a case-by-case basis.
It's still accessible via "value". -- /Jacob Carlborg
Nov 20 2013
parent Simen Kjærås <simen.kjaras gmail.com> writes:
On 2013-11-21 08:38, Jacob Carlborg wrote:
 On 2013-11-21 01:16, Simen Kjærås wrote:

 The result of re-validating is performance loss. The result of missed
 validation is a bug. Also, in just a few lines, you can make a version
 that will *not* decay to the original type:

    struct Validated(alias fn, T) {
        private T _value;
        @property inout
        T value() {
            return _value;
        }
    }

    // validated() is identical to before.

 Sure, using it is a bit more verbose than using the unadorned type,
 which is why I chose to make the original version automatically decay.
 This is a judgment where sensible people may disagree, even with
 themselves on a case-by-case basis.
It's still accessible via "value".
Indeed it is. If we want to make it perfectly impossible to get at the contents, so as to hinder all possible use of the data, I suggest this solution:

    struct Validated {}

    Validated validate() {
        return Validated.init;
    }

-- Simen
Nov 21 2013
prev sibling parent "Daniel Davidson" <nospam spam.com> writes:
On Wednesday, 20 November 2013 at 18:30:58 UTC, Dmitry Olshansky 
wrote:
 And it decays to the naked type in a blink of an eye. And some 
 function down the road will do the validation again...
Not if that function down the road only accepted the validated type in the first place because that is what it needed. Follow the rule: if you need a validated instance, accept only the validated type and do not try to validate it yourself.
Nov 21 2013
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/20/2013 3:16 AM, Jonathan M Davis wrote:
 ValidatedString would then avoid any extra validation when iterating over the
 characters, though I don't know how much of an efficiency gain that would
 actually be given that much of the validation occurs naturally when decoding
 or using stride. It would have the downside that any function which
 specializes on strings would likely have to then specialize on ValidatedString
 as well. So, while I agree with the idea in concept, I'd propose that we
 benchmark the difference in decoding and striding without the checks and see if
 there actually is much difference. Because if there isn't, then I don't think
 that it's worth going to the trouble of adding something like ValidatedString.
Utf validation isn't the only form of validation for strings. You could, for example, validate that the string doesn't contain SQL injection code, or contains a correctly formatted date, or has a name that is guaranteed to be in your employee database, or is a valid phone number, or is a correct email address, etc. Again, validation is not defined by D, it is defined by the constraints YOUR PROGRAM puts on it.
Nov 20 2013
next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Wednesday, November 20, 2013 16:26:59 Walter Bright wrote:
 On 11/20/2013 3:16 AM, Jonathan M Davis wrote:
 ValidatedString would then avoid any extra validation when iterating over
 the characters, though I don't know how much of an efficiency gain that
 would actually be given that much of the validation occurs naturally when
 decoding or using stride. It would have the downside that any function
 which specializes on strings would likely have to then specialize on
 ValidatedString as well. So, while I agree with the idea in concept, I'd
 propose that we benchmark the difference in decoding and striding without
 the checks and see if there actually is much difference. Because if there
 isn't, then I don't think that it's worth going to the trouble of adding
 something like ValidatedString.
Utf validation isn't the only form of validation for strings. You could, for example, validate that the string doesn't contain SQL injection code, or contains a correctly formatted date, or has a name that is guaranteed to be in your employee database, or is a valid phone number, or is a correct email address, etc. Again, validation is not defined by D, it is defined by the constraints YOUR PROGRAM puts on it.
Yes, but we seemed to be discussing the possibility of having some kind of type in Phobos which indicated that the string had been validated for UTF correctness. I wouldn't expect other types of string validation to end up in Phobos. And without the type for UTF validation being in Phobos and specialized on in Phobos functions, I don't think that I would ever want to use it, because in such a case, you lose out on all of the specialization that Phobos does for strings and are stuck with a range of dchar, which will force a lot of extra decoding even if some of the validation can be skipped, since it was already validated, whereas a number of Phobos functions are able to specialize on narrow strings and avoid decoding altogether. That performance boost would be lost if a string was wrapped in a UTFValidatedString without Phobos specializing on UTFValidatedString, and based on how decode and stride work, it looks to me like the decoding costs way more than the little bit of extra validation that is currently done as part of that such that avoiding the decoding is likely to be a much greater performance boost than avoiding those checks. And if that is indeed the case, I don't see much point to something like UTFValidatedString unless Phobos specializes for it like it specializes for narrow strings. Other types of string validation might very well be worth doing without Phobos knowing about them, but having the wrapper type which indicates that that validation has been done still needs to be worth more than the performance hit of not being able to use naked strings anymore and losing any performance gains that come from the functions which specialize for narrow strings. And that's probably true for strings that just get passed around but probably isn't true for strings that end up being processed by range-based functions a lot. - Jonathan M Davis
Nov 20 2013
prev sibling parent Marco Leise <Marco.Leise gmx.de> writes:
On Wed, 20 Nov 2013 16:26:59 -0800
Walter Bright <newshound2 digitalmars.com> wrote:

 Utf validation isn't the only form of validation for strings. You could, for 
 example, validate that the string doesn't contain SQL injection code, or 
 contains a correctly formatted date, or has a name that is guaranteed to be in 
 your employee database, or is a valid phone number, or is a correct email 
 address, etc.
 
 Again, validation is not defined by D, it is defined by the constraints YOUR 
 PROGRAM puts on it.
A checked type for database access goes a bit beyond the scope of the proposal. You'd need to encapsulate a transaction that needs to be working on a snapshot of the database state and fail if data changed in another transaction. Otherwise you could validate a name against the database just before someone else deletes it and thus invalidates the string. With a DB transaction wrapped in the validation, assignment between two "validated" strings becomes a pretty sophisticated runtime action, while the original proposals revolved around validation functions that can be pure. /This allows us to assign one validated string type to another with no runtime overhead./ -- Marco
Nov 25 2013
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 11/20/2013 2:49 AM, Jacob Carlborg wrote:
 How should we accomplish this? We can't replace:

 void main (string[] args)

 With

 void main (UnsafeString[] args)

 And break every application out there.
Use a different type for the validated string; validated means your program has guaranteed it has a certain form defined by that program.
Nov 20 2013
prev sibling next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, November 19, 2013 16:01:00 Andrei Alexandrescu wrote:
 Please chime in with ideas!
In general, I favor using defensive programming in library APIs and using enforce to validate the input to functions. Doing so makes it much harder to misuse the library and makes it much less likely that programs will run into weird and/or undefined behavior or other types of bugs. I then favor using DbC within a library or application for its own code and asserting that input is valid in those cases, because in that case, the caller is essentially part of the same code that's doing the asserting and is maintained by the same people. The problem with that is of course that there are cases where performance degrades when you use defensive programming and always check input - especially when the caller can know that the data is valid without having to check it first. So, having a way to use an API that doesn't involve it always defensively checking its input can be useful for the sake of efficiency. Unfortunately, I don't think that it scales at all to take the approach that Walter has suggested of having the API normally assert on input and provide helper functions which the caller can use to validate input when they deem appropriate. That has the advantage of giving the caller control over what is and isn't checked and avoiding unnecessary checks, but it also makes it much easier to misuse the API, and I would expect the average programmer to skip the checks in most cases. It very quickly becomes like using error codes instead of exceptions, except that in this case, instead of an error code being ignored, the data's validity wouldn't have even been checked in the first place, resulting in the function being called doing who-knows-what. And the resulting bugs could be very obvious, or they could be insidiously hard to detect. So, if we can find a way to default to checking validity and throwing on bad input but still provide a way for the caller to avoid the checks when appropriate, I think that that would be ideal. 
That way, we default to correctness and user-friendliness (in that the API is harder to silently use incorrectly that way), but we still provide a more performant route for those who know what they're doing and are willing to take the time to make sure that they are sure that they truly do know how to use the API correctly and take responsibility for ensuring that they don't feed bad input to the API. Now, how we do that, I don't know. In some cases, creating a wrapper type would solve the problem (e.g. some kind of wrapper for strings which guaranteed UTF-correctness). But I don't think that it scales to use wrapper types for all such situations. One alternative is to essentially duplicate a lot of functions with one function validating the input for you and throwing on failure, and the other asserting that the input is valid. But that could result in a lot of code duplication, which isn't terribly desirable either. The assumeSorted or FracSec.assumeValid solutions seem to go either with a wrapper type or with essentially being a second function which does the same thing but without the validation depending on the types involved and what the function is doing. Another alternative would be to provide an argument (probably a template argument, though it could be a function argument if that makes more sense) which told the function whether it should assert or enforce on its input. That would at least localize the code duplication, but again, that could get a bit verbose, and I do like how assumeXYZ makes it abundantly clear that the caller is taking responsibility for the correctness in that case. And in some situations, I think that it would clearly be the case that it wouldn't make any sense to do anything else other than enforce on the input (e.g. string parsing functions have a tendency to have to do almost the same work in the validation function as the actual parsing function, making it almost pointless to have a separate validation function). 
So, I think that what we end up doing is definitely going to depend on what the code in question is for and what it's doing, but I agree that it would be valuable to come up with some common idioms for handling validation and error checking, and assumeXYZ would be one such idiom and one which documents things nicely when it can be used. Still, the most important point that I'd like to make is that I think we should lean towards validating input with enforce by default and then provide alternative means to avoid that validation rather than using assertions and DbC by default, because leaving the validation up to the caller in release and asserting in debug is going to lead to _far_ more bugs in code using Phobos, particularly when the result isn't immediately and obviously wrong when bad input is given. And the fact that by default, the assertions in Phobos won't be hit in calling code unless the Phobos function is templatized (because Phobos will have been compiled in release) makes using assertions that much worse. But I'll definitely have to think about idioms that we could use to do separate validation where appropriate and yet validate arguments via enforce by default. - Jonathan M Davis
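
The template-argument alternative mentioned above might look roughly like the following sketch. The `Check` enum and `fractionToNanos` are hypothetical names, not existing Phobos API; the range is only an example constraint.

```d
import std.exception : enforce;

/// Hypothetical policy switch (not an existing Phobos type).
enum Check { validate, assume }

/// Hypothetical function: accepts a nanosecond count in [0, 1_000_000_000).
/// By default bad input throws; with Check.assume the caller takes
/// responsibility and only an assert (removed by -release) remains.
long fractionToNanos(Check check = Check.validate)(long nanos)
{
    static if (check == Check.validate)
        enforce(nanos >= 0 && nanos < 1_000_000_000, "nanoseconds out of range");
    else
        assert(nanos >= 0 && nanos < 1_000_000_000, "nanoseconds out of range");
    return nanos;
}
```

Plain `fractionToNanos(n)` gets the defensive behaviour, while `fractionToNanos!(Check.assume)(n)` is the opt-out. It is more verbose at the call site than an `assumeValid`-style function, but keeps both paths in one body.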
Nov 20 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-11-20 11:38, Jonathan M Davis wrote:

 Unfortunately, I don't think that it scales at all to take the approach that
 Walter has suggested of having the API normally assert on input and provide
 helper functions which the caller can use to validate input when they deem
 appropriate. That has the advantage of giving the caller control over what is
 and isn't checked and avoiding unnecessary checks, but it also makes it much
 easier to misuse the API, and I would expect the average programmer to skip
 the checks in most cases. It very quickly becomes like using error codes
 instead of exceptions, except that in this case, instead of an error code
 being ignored, the data's validity wouldn't have even been checked in the first
 place, resulting in the function being called doing who-knows-what. And the
 resulting bugs could be very obvious, or they could be insidiously hard to
 detect.
I think Walter's suggestion requires the use of asserts:

    bool isValid (Data data);

    void process (Data data)
    {
        assert(isValid(data));
        // process
    }

The asserts should be on by default and removed in release builds. This would require DMD shipping two versions of Phobos, one with asserts enabled and one where they're disabled. Then, only when the -release flag is used will the version of Phobos with disabled asserts be used.
 Still, the most important point that I'd like to make is that I think we
 should lean towards validating input with enforce by default and then provide
 alternative means to avoid that validation rather than using assertions and
 DbC by default, because leaving the validation up to the caller in release and
 asserting in debug is going to lead to _far_ more bugs in code using Phobos,
 particularly when the result isn't immediately and obviously wrong when bad
 input is given. And the fact that by default, the assertions in Phobos won't
 be hit in calling code unless the Phobos function is templatized (because
 Phobos will have been compiled in release) makes using assertions that much
 worse.
DMD needs to ship with two versions of Phobos, one with assertions on and one with them disabled. -- /Jacob Carlborg
Nov 20 2013
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 11/20/2013 12:57 PM, Jacob Carlborg wrote:
 bool isValid (Data data);

 void process (Data data)
 {
      assert(isValid(data));
      // process
 }
    void process(Data data)in{ assert(isValid(data)); }body{
        // process
    }
Nov 20 2013
parent Jacob Carlborg <doob me.com> writes:
On 2013-11-20 14:01, Timon Gehr wrote:

 void process(Data data)in{ assert(isValid(data)); }body{
      // process
 }
Right, forgot about contracts. -- /Jacob Carlborg
Nov 20 2013
prev sibling next sibling parent reply "Lars T. Kyllingstad" <public kyllingen.net> writes:
On Wednesday, 20 November 2013 at 00:01:00 UTC, Andrei 
Alexandrescu wrote:
 (c) A variety of text functions currently suffer because we 
 don't make the difference between validated UTF strings and 
 potentially invalid ones.
I think it is fair to always assume that a char[] is a valid UTF-8 string, and instead perform the validation when creating/filling the string from a non-validated source. Take std.file.read() as an example; it returns void[], but has a validating counterpart in std.file.readText().

I think we should use ubyte[] to a greater extent for data which is potentially *not* valid UTF. Examples include interfacing with C functions, where I think there is a tendency towards always translating C char to D char, when they are in fact not equivalent. Another example is, again, std.file.read(), which currently returns void[]. I guess it is a matter of taste, but I think ubyte[] would be more appropriate here, since you can actually use it for something without casting it first.

The transition from string to ubyte[] is already made simple by std.string.representation. We should offer an equally simple and convenient way to do the opposite transformation. In one of my current projects, I am using this function:

    inout(char)[] asString(inout(ubyte)[] data) @safe pure
    {
        auto s = cast(typeof(return)) data;
        import std.utf: validate;
        validate(s);
        return s;
    }

This could easily be written as a template, to accept wider encodings as well, and I think it would be a nice addition to Phobos.

Lars
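
One possible shape for the templated version suggested here is sketched below. It is an illustration only: the `CharOf` helper is a made-up name, and the sketch narrows the `inout` signature to immutable input to keep the example simple.

```d
import std.utf : validate;

/// Hypothetical helper: maps an unsigned code-unit type to its character type.
template CharOf(U)
{
    static if (is(U == ubyte))       alias CharOf = char;
    else static if (is(U == ushort)) alias CharOf = wchar;
    else static if (is(U == uint))   alias CharOf = dchar;
    else static assert(0, "unsupported code unit type: " ~ U.stringof);
}

/// Reinterprets raw code units as a string of the matching width and
/// validates it; std.utf.validate throws UTFException on malformed input.
immutable(CharOf!U)[] asString(U)(immutable(U)[] data)
{
    auto s = cast(typeof(return)) data;
    validate(s);
    return s;
}
```

With this, `immutable(ubyte)[]` yields a `string`, `immutable(ushort)[]` a `wstring` and `immutable(uint)[]` a `dstring`, mirroring `std.string.representation` in the opposite direction.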
Nov 20 2013
next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, November 20, 2013 11:45:57 Lars T. Kyllingstad wrote:
 On Wednesday, 20 November 2013 at 00:01:00 UTC, Andrei
 
 Alexandrescu wrote:
 (c) A variety of text functions currently suffer because we
 don't make the difference between validated UTF strings and
 potentially invalid ones.
I think it is fair to always assume that a char[] is a valid UTF-8 string, and instead perform the validation when creating/filling the string from a non-validated source.
That doesn't work when strings are being created via concatenation and the like inside the program rather than simply coming from outside the program.
 Take std.file.read() as an example; it returns void[], but has a
 validating counterpart in std.file.readText().
 
 I think we should use ubyte[] to a greater extent for data which
 is potentially *not* valid UTF.
Well, we've already discussed the possibility of using ubyte[] to indicate ASCII strings, and that makes a lot more sense IMHO, because then no decoding occurs (which is precisely what you want for ASCII), whereas with a string that's potentially invalid UTF, it's not that we don't want to decode it. It's just that we need to validate it when decoding it. So, I'd argue that ubyte[] should be used when you want to operate on code units rather than code points rather than it having anything to do with validating code points. - Jonathan M Davis
Nov 20 2013
prev sibling next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
20-Nov-2013 14:45, Lars T. Kyllingstad wrote:
 On Wednesday, 20 November 2013 at 00:01:00 UTC, Andrei Alexandrescu wrote:
 (c) A variety of text functions currently suffer because we don't make
 the difference between validated UTF strings and potentially invalid
 ones.
I think it is fair to always assume that a char[] is a valid UTF-8 string, and instead perform the validation when creating/filling the string from a non-validated source. Take std.file.read() as an example; it returns void[], but has a validating counterpart in std.file.readText().
Sadly it's horrifically slow to do so. Above all, practicality must take precedence. Would you like to validate the whole file just to later re-scan it anew to, say, tokenize a source file?
 I think we should use ubyte[] to a greater extent for data which is
 potentially *not* valid UTF.  Examples include interfacing with C
 functions, where I think there is a tendency towards always translating
 C char to D char, when they are in fact not equivalent.  Another example
 is, again, std.file.read(), which currently returns void[].  I guess it
 is a matter of taste, but I think ubyte[] would be more appropriate
 here, since you can actually use it for something without casting it first.
Otherwise I think it's a good idea to encode high-level invariants in types. The only problem is inadvertent template bloat then. [snip] -- Dmitry Olshansky
Nov 20 2013
prev sibling parent Lionello Lunesu <lionello lunesu.remove.com> writes:
On 11/20/13, 18:45, Lars T. Kyllingstad wrote:
 I think we should use ubyte[] to a greater extent for data which is
 potentially *not* valid UTF.  Examples include interfacing with C
 functions, where I think there is a tendency towards always translating
 C char to D char, when they are in fact not equivalent.  Another example
 is, again, std.file.read(), which currently returns void[].  I guess it
 is a matter of taste, but I think ubyte[] would be more appropriate
 here, since you can actually use it for something without casting it first.
+1 Especially the Windows APIs: they never take UTF-8(*) but consistently get translated to taking D char :( In fact, if we want a good translation from C to D, we should be using D byte. On most platforms I've run into, C char is signed. (To be honest, you don't see 'byte' much in D code, so it would make the ported code stand out even more.) * except for MultiByteToWideChar
Nov 26 2013
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, November 20, 2013 08:51:16 Joseph Rushton Wakeling wrote:
 On 20/11/13 01:01, Andrei Alexandrescu wrote:
 There's been recent discussion herein about what parameter validation
 method would be best for Phobos to adhere to.
 
 Currently we are using a mix of approaches:
 
 1. Some functions enforce()
 
 2. Some functions just assert()
 
 3. Some (fewer I think) functions assert(0)
 
 4. Some functions don't do explicit checking, relying instead on
 lower-level enforcement such as null dereference and bounds checking to
 ensure safety.
 
 Each method has its place. The question is what guidelines we put forward
 for Phobos code to follow; we're a bit loose about that right now.
Regarding enforce() vs. assert(), a good rule that I remember having suggested to me was that enforce() should be used for actual runtime checking (e.g. checking that the input to a public API function has correct properties), assert() should be used to test logical failures (i.e. checking that cases which should never arise, really don't arise). I've always followed that as a rule of thumb ever since.
When an assertion fails, it's a bug in your code. Assertions should _never_ be used for validating user input. So, if your function is asserting on the state of its input, then it is requiring that the caller give input which follows that contract, and it's a bug in the caller when they violate that contract by passing in bad input. When your function uses enforce to validate its input, it is _not_ considered a bug when bad input is given. It _could_ be a bug in the caller, but they are not required to give valid input. When they give invalid input, they then get to react to the exception that was thrown and handle the error appropriately. This works when the input came from outside the program (e.g. a user or a file) as well as when it doesn't make sense for the caller to have validated the input before calling the function (e.g. because the validator function and the function doing the work end up having to do almost the same work, making it cheaper to just have the function validate its input and not have a separate validator function). It also makes it so that the function will _never_ have to operate on invalid input as invalid input will always be checked and rejected, which then makes it much harder to use the function incorrectly. But ultimately, whether you use assertions or exceptions comes down to whether it's considered to always be a bug in the caller if the input is bad. DbC uses assertions and considers it a bug in the caller (since they violated their part of the contract), whereas defensive programming has the function protect itself and always check and throw on invalid input rather than assuming that the caller is going to provide valid input. - Jonathan M Davis
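
To make the distinction concrete, here is a small sketch of the two styles side by side. `parsePercent` and `asRatio` are invented for illustration; the contract uses `do`, the current spelling of what this thread writes as `body`.

```d
import std.conv : to;
import std.exception : enforce;

/// Defensive style: the text may come from a user or a file, so bad input
/// is an expected runtime condition and is reported with an exception.
int parsePercent(string text)
{
    immutable value = text.to!int; // throws ConvException on malformed text
    enforce(value >= 0 && value <= 100, "percentage must be in 0 .. 100");
    return value;
}

/// DbC style: passing an out-of-range value is a bug in the caller.
/// The check is an assertion and disappears when compiled with -release.
double asRatio(int percent)
in { assert(percent >= 0 && percent <= 100); }
do
{
    return percent / 100.0;
}
```

A caller can wrap `parsePercent` in a `try`/`catch` and recover, whereas a failing contract in `asRatio` signals a programming error that should be fixed rather than handled.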
Nov 20 2013