digitalmars.D - The review of std.hash package

Dmitry Olshansky (28/28) Aug 07 2012 Since the review queue has been mostly silent again I've decided to

David (19/19) Aug 07 2012 Is the API already set in stone?

David (2/5) Aug 07 2012 Well, it's there to implement the Range interface, but still, put
Jonathan M Davis (5/6) Aug 07 2012 No. That's the main point of the review process. The API needs to be rev...
Johannes Pfau (66/94) Aug 07 2012 No, as Jonathan already mentioned reviewing the API is an important

David (45/134) Aug 07 2012 Ok this point with the one above makes sense (I implemented my OpenSSL

Johannes Pfau (50/145) Aug 08 2012 You mean an external function which constructs the digest context and

Piotr Szturmaj (14/15) Aug 07 2012 I tried it before but I wanted to create whole crypto package at once,

Johannes Pfau (43/61) Aug 08 2012 I'm sorry, I didn't want to conceal your work. What I meant with

Piotr Szturmaj (21/68) Aug 08 2012 Yes, there should be bcrypt, scrypt and PBKDF2.

Johannes Pfau (25/53) Aug 08 2012 Great! I always tried the *endianToNative and nativeTo*Endian functions.

Jonathan M Davis (5/16) Aug 08 2012 What's wrong with the *endianToNative and nativeTo*Endian functions? The...

Johannes Pfau (6/12) Aug 08 2012 in CTFE?

Jonathan M Davis (15/30) Aug 08 2012 No. It wouldn't work in CTFE, because it uses a union. But what it's try...

Johannes Pfau (7/34) Aug 09 2012 I completely agree, but this is true for hashes. Once the final hash

Johannes Pfau (3/5) Aug 08 2012 Wow, I didn't know about scrypt. Seems to be pretty cool.

Ary Manzana (10/21) Aug 07 2012 I think "std.crypto" is a better name for the package. At first I

deadalnix (6/31) Aug 07 2012 You'll find very hard to convince anyone that crc32 is a cryptographic

Johannes Pfau (15/20) Aug 07 2012 And there will hopefully be more hashes in std.hash a some point.

Jonathan M Davis (4/7) Aug 07 2012 That doesn't fly, because crc32 is going to be in there, and while it's ...

Regan Heath (6/12) Aug 08 2012 std.digest then?

H. S. Teoh (7/18) Aug 08 2012 [...]

Tobias Pankrath (7/27) Aug 08 2012 -1

Regan Heath (11/34) Aug 08 2012 That's exactly what it's supposed to suggest. The algorithm does digest...

Tobias Pankrath (9/17) Aug 08 2012 So at least this implies that hash function is the more general

Regan Heath (14/31) Aug 08 2012 I could have, but I didn't read that far :p I knew what I was looking f...

travert phare.normalesup.org (Christophe Travert) (5/9) Aug 08 2012 I think the question is: is std.hash going to contain only

Chris Cain (6/12) Aug 08 2012 Even if that were the case, I'd say they should be kept separate.

Regan Heath (12/22) Aug 08 2012 I don't think there is any reason to separate them. People should know ...
travert phare.normalesup.org (Christophe Travert) (14/27) Aug 08 2012 They should not be categorized the same. I don't expect a regular hash

Chris Cain (38/46) Aug 08 2012 In this case, I'm not suggesting keep them separate to not

travert phare.normalesup.org (Christophe Travert) (23/23) Aug 08 2012 "Chris Cain" , dans le message (digitalmars.D:174477), a écrit :

Regan Heath (30/51) Aug 08 2012 t
Chris Cain (17/33) Aug 08 2012 Actually, maybe I'm the one not doing a good job of explaining.

Johannes Pfau (4/11) Aug 08 2012 std.hash.digest doesn't sound too bad. We could have std.hash.func (or

Jonathan M Davis (3/14) Aug 08 2012 I say just keep at simple and leave it at std.hash. It's plenty clear IM...

Andrei Alexandrescu (5/6) Aug 08 2012 Not clear to quite a few of us. IMHO it just makes us seem (to the

Jonathan M Davis (7/14) Aug 08 2012 I prefer std.hash to std.digest, but I don't necessarily care all that m...

RivenTheMage (11/14) Aug 15 2012 Three basic types of hash functions are:

David Nadlinger (6/13) Aug 15 2012 Why? 1) might have a different interface than the others, but 2)

RivenTheMage (10/15) Aug 15 2012 The "only" difference between 2) and 3) is a big difference.

RivenTheMage (6/6) Aug 15 2012 Another example is a systematic error-correcting codes. The

=?ISO-8859-1?Q?Jos=E9_Armando_Garc=EDa_Sancio?= (14/19) Aug 15 2012 Some people's point is that MD5 was consider a cryptographic digest

ReneSac (10/31) Aug 15 2012 I agree that MD5 isn't cryptographically secure anymore, but it

=?ISO-8859-1?Q?Jos=E9_Armando_Garc=EDa_Sancio?= (13/15) Aug 15 2012 Thats because it is a "password module" and nobody or a small

RivenTheMage (19/23) Aug 15 2012 In turn, that's because CRC is not not a crytographic hash and

RivenTheMage (2/5) Aug 15 2012 I forgot that this case is already covered by reduce!(...)

Dmitry Olshansky (11/38) Aug 08 2012 You still can use say crc32 as normal hash function for some binary

Dmitry Olshansky (4/13) Aug 08 2012 Damned spellcheckers: desperate -> disparate

Andrei Alexandrescu (3/4) Aug 08 2012 Yes please.

Dmitry Olshansky (11/13) Aug 07 2012 There is std.container, so unambiguous for me.

Walter Bright (3/6) Aug 07 2012 The hash functions must use a Range interface, not a file interface.

Johannes Pfau (7/15) Aug 08 2012 I guess this is meant as a general statement and not specifically

Walter Bright (20/35) Aug 08 2012 It should accept an input range. But using an Output Range confuses me. ...

Piotr Szturmaj (6/10) Aug 08 2012 Suppose you have a callback that will give you blocks of bytes to hash.

Walter Bright (2/13) Aug 08 2012 Have the callback supply a range interface to call the hash with.

Martin Nowak (17/18) Aug 08 2012 --------

Walter Bright (2/5) Aug 08 2012 See the discussion on using reduce().

Johannes Pfau (39/47) Aug 09 2012 I just don't understand it. Let's take the example by Martin Nowak and

Walter Bright (5/52) Aug 09 2012 That isn't a problem, the internal state can be private data for the str...
deadalnix (3/50) Aug 16 2012 I'm pretty sure it is possible to pad and finish when a result is

Regan Heath (13/72) Aug 17 2012 d

deadalnix (2/40) Aug 08 2012 That is a really good point. +1
Martin Nowak (3/16) Aug 08 2012 I think sha1Of/digest!SHA1 should do this.

Walter Bright (4/25) Aug 08 2012 Take a look at the reduce function in

Johannes Pfau (17/24) Aug 08 2012 This can only work if the final state is valid as an initial state.

Walter Bright (5/7) Aug 08 2012 The idea is to have hash act like a component - not with special added c...

Johannes Pfau (32/43) Aug 09 2012 Please explain that. Nobody's going to simply replace a call to reduce
David Nadlinger (18/22) Aug 09 2012 Sorry, but I think this is a meaningless statement without

Regan Heath (51/71) Aug 09 2012 On Thu, 09 Aug 2012 10:59:47 +0100, David Nadlinger ...

Dmitry Olshansky (20/30) Aug 09 2012 struct ShaState

Dmitry Olshansky (6/22) Aug 09 2012 Too fast.. should have been:

David Nadlinger (8/11) Aug 09 2012 I have been thinking about using AliasThis as well, but the

Jonathan M Davis (10/22) Aug 09 2012 Yeah. alias this can be very useful, but it's very dangerous when it com...

Walter Bright (15/35) Aug 09 2012 It is not a meaningless statement in that components have a predictable ...

travert phare.normalesup.org (Christophe Travert) (21/21) Aug 09 2012 If a has is a range, it's an output range, because it's something you

Walter Bright (3/3) Aug 11 2012 See the new thread Andrei started entitled "finish function for output r...

Kagamin (3/9) Aug 15 2012 An example of stateless hash in .net:

Dmitry Olshansky (5/14) Aug 15 2012 AFAIK it'a method of HashAlgorithm Object.

Kagamin (6/7) Aug 15 2012 It's a minor design detail, see the example: the method is called

David Nadlinger (10/14) Aug 15 2012 No, it's not a »minor design detail«, at least not regarding
Dmitry Olshansky (6/12) Aug 15 2012 Brrr. It's how convenience wrapper works :)

Kagamin (10/15) Aug 15 2012 Well there was a wish for stateless hash, Walter even posted the

David Nadlinger (5/8) Aug 15 2012 And our point is that such an interface is trivial to implement
Dmitry Olshansky (6/17) Aug 15 2012 auto result = file.byChunk(4096 * 1025).joiner.digest();

David Nadlinger (3/10) Aug 15 2012 http://msdn.microsoft.com/en-us/library/system.security.cryptography.has...

Kagamin (5/16) Aug 15 2012 Ok, but HashAlgorithm still supports stateless interface which

Johannes Pfau (69/98) Aug 08 2012 auto crc = crc32Of(data);

Johannes Pfau (16/19) Aug 08 2012 I implemented the function, it's actually quite simple:

travert phare.normalesup.org (Christophe Travert) (3/5) Aug 08 2012 Don't overload the function taking a void[][]. Remplace it. void[][] is
Walter Bright (3/22) Aug 08 2012 I don't know what you mean, it takes a range, not a void[][] as input.

Johannes Pfau (6/37) Aug 08 2012 So the post in D.learn for a detailed description. Yes the code I

Walter Bright (3/7) Aug 08 2012 Have the templated version with overloads simply call the single version...

Johannes Pfau (10/21) Aug 09 2012 Well that's possible, but I don't like the template bloat it causes.

Walter Bright (2/3) Aug 09 2012 It's more the user API that matters, not how it works under the hood.
Walter Bright (4/5) Aug 09 2012 The Range argument - is it an InputRange, an OutputRange? While it's jus...

Johannes Pfau (7/14) Aug 09 2012 It's an InputRange (of bytes) or an InputRange of some byte buffer

Andrei Alexandrescu (6/7) Aug 09 2012 What have you measured, and what is your dislike based upon?

Johannes Pfau (16/25) Aug 09 2012 What annoys me is that as long the function only supported arrays, it

Jacob Carlborg (7/14) Aug 09 2012 A workaround is to make the non-template function to a template, with no...

Denis Shelomovskij (9/9) Aug 08 2012 The question about module names.

Johannes Pfau (12/21) Aug 08 2012 They're supposed to contain more implementations in the future. I

travert phare.normalesup.org (Christophe Travert) (32/32) Aug 08 2012 I'm not familiar with hash functions in general.
Vladimir Panteleev (6/9) Aug 09 2012 Is it too late to ask to include MurmurHash 2 and/or 3? It's

Johannes Pfau (8/20) Aug 09 2012 To be honest I didn't even know that MurmurHash can be used

Regan Heath (6/26) Aug 09 2012 Once the API is formalised I can contribute the hashes I have also :)

Johannes Pfau (4/7) Aug 09 2012 great! with all those contributions we'll probably have a rather

Johannes Pfau (30/30) Aug 10 2012 I implemented some of the suggestions, here's the list of changes:

Johannes Pfau (15/15) Aug 20 2012 Changelog:

Jesse Phillips (10/10) Aug 28 2012 All this discussion on the use of auto in the docs made me notice

Johannes Pfau (11/25) Sep 04 2012 I had a look at how std.range documents the range interfaces, but the

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

	Since the review queue has been mostly silent again I've decided to 
jump in and manage the one that's ready to go :)

	Today starts the the review of std.hash package by Johannes Pfau. We go 
with the usual cycle of two weeks for review and one week for voting. 
Thus review ends on 22th of August, followed by voting that ends on 29th 
of August.

Description:

std.hash.hash is a new module for Phobos defining an uniform interface 
for hashes and checksums. It also provides some useful helper functions 
to deal with this new API.

The std.hash package also includes:

	- MD5 implementation deprecating std.md5 (in std.hash.md, adapted from 
std.md5);

	- new SHA1 implementation by redstar (in std.hash.sha);

	- CRC32 implementation (in std.hash.crc) based on and deprecating the 
crc32 module (that's shipped with phobos but not documented).

It only covers hashes which can process data incrementally (in
smaller buffers as opposed to all data at once).

Code:
https://github.com/jpf91/phobos/tree/newHash/std/hash
https://github.com/jpf91/phobos/compare/master...newHash

Docs:
http://dl.dropbox.com/u/24218791/d/phobos/std_hash_hash.html
http://dl.dropbox.com/u/24218791/d/phobos/std_hash_md.html
http://dl.dropbox.com/u/24218791/d/phobos/std_hash_sha.html
http://dl.dropbox.com/u/24218791/d/phobos/std_hash_crc.html

-- 
Dmitry Olshansky

Aug 07 2012

David <d dav1d.de> writes:

Is the API already set in stone?

Using .start and .finish, feels like the use of OpenSSL. Why not making 
a Hash object not "restartable" (since e.g. MD5.start() sets `this` just 
to MD5.init) and making finish private and implementing a digest and 
hexdigest property which calls finish and returns an ubyte[] array (or 
string for hexdigest), this would also eliminate the need of the 
`digest` wrapper (e.g. you can mixin these properties with template 
mixins, so no code duplication)

Also I am not sure if we want to use `hash.put` instead of `hash.update` 
(which is used by python and openssl). But that's just a minor point 
(e.g. perl uses .add)

Furthermore, I don't like the `sha1Of` and `md5Of` etc. functions, why 
not having a static method? e.g. MD5.hexdigest("input"), which returns 
the hexdigest and not a MD5 object.

One more thing, `.reset` does the same as `start`? If so, why do both 
exist? (I also find the examples quite confusing, why do you want to 
reset the hash onject? I would documentate the function but wouldn't use 
it in the examples)

Well, that's it from my side :)

Aug 07 2012

David <d dav1d.de> writes:

 Also I am not sure if we want to use `hash.put` instead of `hash.update`
 (which is used by python and openssl). But that's just a minor point
 (e.g. perl uses .add)

Well, it's there to implement the Range interface, but still, put 
doesn't make too much sense for me (maybe personal preference).

Aug 07 2012

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Tuesday, August 07, 2012 20:07:07 David wrote:
 Is the API already set in stone?

No. That's the main point of the review process. The API needs to be reviewed 
and revised as appropriate (as does the implementation, but the API is much 
harder to fix later, since that tends to cause breakin changes).

- Jonathan M Davis

Aug 07 2012

Johannes Pfau <nospam example.com> writes:

Am Tue, 07 Aug 2012 20:07:07 +0200
schrieb David <d dav1d.de>:

 Is the API already set in stone?

No, as Jonathan already mentioned reviewing the API is an important
part of the review. (Actually the most important part for this review,
as we define one API for many implementations)

 Using .start and .finish, feels like the use of OpenSSL. Why not
 making a Hash object not "restartable" (since e.g. MD5.start() sets
 `this` just to MD5.init)

I'm not an expert in the hash/digest area. We had incompatible APIs for
CRC32, MD5 and SHA1 and I proposed we should have a unique interface.
As I had some free time and nobody else stepped up I implemented it,
asking for expert help in the newsgroup. This probably also explains
why this API isn't revolutionary, it's inspired by other designs
(tango, .net, ...).

BTW: making it possible to wrap OpenSSL and other C hash libraries was
also a goal. This is one reason why a start() function is necessary.
OK, that was the intro ;-)

start is here as structs can't have default constructors. For all
current hashes it's really mostly useless (except it can be used as a
simple way to reset a hash, which is also why the hash structs don't
have a reset member), but I just don't know if there are hashes which
would require a more complex (runtime) initialization.

Classes don't have a start method as they can do that initialization in
the default constructor. But the default constructor can't be used as a
cheap way to reset a hash, therefore the reset function was added to
the OOP interface.

 and making finish private and implementing a
 digest and hexdigest property

property is not a good idea, as finish must reset the internal state
for some objects, so you can't access the property repeatedly. It'd
have to be a function.

which calls finish and returns an
 ubyte[] array (or string for hexdigest)

This could be done and is probably a matter of taste. I prefer the free
function in this case, it makes clear that digest() really is a generic
function which can be used with any hash. It's also one function less
which must be implemented by hash writers. I'd like to keep the number
of members required to implement a hash as low as possible.

(This is however unrelated to the actual naming of the functions. If
you think 'digest' is not a good name, please let me know)

, this would also eliminate
 the need of the `digest` wrapper (e.g. you can mixin these properties
 with template mixins, so no code duplication)

there's actually no code duplication. 'digest' is the only
implementation, sha1Of, md5Of are just aliases.

 
 Also I am not sure if we want to use `hash.put` instead of
 `hash.update` (which is used by python and openssl). But that's just
 a minor point (e.g. perl uses .add)

I initially called it update, I agree it's a better name. But phobos is
all about ranges, so we need the put member anyway. Providing an
'update' alias just for the name isn't worth it, imho (see above,
keeping the number of hash members low).

 
 Furthermore, I don't like the `sha1Of` and `md5Of` etc. functions,
 why not having a static method? e.g. MD5.hexdigest("input"), which
 returns the hexdigest and not a MD5 object.

Same reason as above, sha1Of, md5Of are just convenience aliases, it
could be argued you don't need those at all. You can use 'digest'
directly and it will provide exactly the same result. So hash writers
don't have to implement anything to support digest, providing an alias
is optional and just 1 line of code. A static method would have to be
implemented by every hash. Yes you could use mixins to make that
painless, but I think it's still to much work. (And mixins can't be
documented in DDOC, but that's an unrelated compiler issue)

BTW: it is implemented as a member for the OOP API, as we can just
implement it in the base interface, so the actual implementations don't
have to care about it.

 
 One more thing, `.reset` does the same as `start`? If so, why do both 
 exist?

See above. It's because start is used in the template/struct API and
structs can't have default constructors, so we need start.

The OOP/class api can use a default constructor. But unlike the start
function it can't work as a reset function, so there's an explicit
reset function.

(I also find the examples quite confusing, why do you want to 
 reset the hash onject?

My approach was to document how to use the functions, not as much when
it makes sense to use them. Maybe I should care more about that?

It's more efficient than allocating a new hash with 'new' (or do you
mean why use reset if finish resets anyway? It's useful if you put some
data into a hash, then decide you don't want the hash (because the
user aborts the operation, network connection is lost,...) but you still
need the hash object later on.) You could just call finish and
disregard the result, but the reset implementation is faster. I agree
there's probably no _strong_ need for reset, but I think it doesn't do
any harm.

I would documentate the function but wouldn't
 use it in the examples)
 
 Well, that's it from my side :)

Thanks for your review!

Aug 07 2012

David <d dav1d.de> writes:

Am 07.08.2012 21:53, schrieb Johannes Pfau:
 Am Tue, 07 Aug 2012 20:07:07 +0200
 schrieb David <d dav1d.de>:

 Is the API already set in stone?

 No, as Jonathan already mentioned reviewing the API is an important
 part of the review. (Actually the most important part for this review,
 as we define one API for many implementations)

 Using .start and .finish, feels like the use of OpenSSL. Why not
 making a Hash object not "restartable" (since e.g. MD5.start() sets
 `this` just to MD5.init)

 I'm not an expert in the hash/digest area. We had incompatible APIs for
 CRC32, MD5 and SHA1 and I proposed we should have a unique interface.
 As I had some free time and nobody else stepped up I implemented it,
 asking for expert help in the newsgroup. This probably also explains
 why this API isn't revolutionary, it's inspired by other designs
 (tango, .net, ...).

 BTW: making it possible to wrap OpenSSL and other C hash libraries was
 also a goal. This is one reason why a start() function is necessary.
 OK, that was the intro ;-)

 start is here as structs can't have default constructors. For all
 current hashes it's really mostly useless (except it can be used as a
 simple way to reset a hash, which is also why the hash structs don't
 have a reset member), but I just don't know if there are hashes which
 would require a more complex (runtime) initialization.

Ok this point with the one above makes sense (I implemented my OpenSSL 
hashing wrapper as a class, initilaization is done in the constructor), 
it still doesn't feel right, if you have to call .start first. What 
about a struct-constructor which calls .start internally, so you get a 
bit more of a modern API (imo) and you're still able to implement the 
same interface for any kind of wrapper/own implementation/whatever.

 Classes don't have a start method as they can do that initialization in
 the default constructor. But the default constructor can't be used as a
 cheap way to reset a hash, therefore the reset function was added to
 the OOP interface.

Ok. (more below)

 and making finish private and implementing a
 digest and hexdigest property

 property is not a good idea, as finish must reset the internal state
 for some objects, so you can't access the property repeatedly. It'd
 have to be a function.

Well, you could store the result internally, property generates on the 
first call the digest, stores it (that's not too much, 16 byte for a 
md5) and returns the already stored value on the 2nd call.

 which calls finish and returns an
 ubyte[] array (or string for hexdigest)

 This could be done and is probably a matter of taste. I prefer the free
 function in this case, it makes clear that digest() really is a generic
 function which can be used with any hash. It's also one function less
 which must be implemented by hash writers. I'd like to keep the number
 of members required to implement a hash as low as possible.

With digest you mean: 
http://dl.dropbox.com/u/24218791/d/phobos/std_hash_hash.html#digest ?

You normally always want the hexdigest (you barely need "real" digest), 
so it's a matter of convenience. Well, thanks to UFCS, you can call it 
like a method.

 (This is however unrelated to the actual naming of the functions. If
 you think 'digest' is not a good name, please let me know)

I think that's a fitting name, but I am not a hashing expert.

 , this would also eliminate
 the need of the `digest` wrapper (e.g. you can mixin these properties
 with template mixins, so no code duplication)

 there's actually no code duplication. 'digest' is the only
 implementation, sha1Of, md5Of are just aliases.

Yeah, I meant, you would have to implement these properties for each 
hash-type and in the end, all properties do the same (calling finish and 
maybe setting some internal flags, buffers), so you can put them into a 
template mixin and mixin them for each hash-type.

 Also I am not sure if we want to use `hash.put` instead of
 `hash.update` (which is used by python and openssl). But that's just
 a minor point (e.g. perl uses .add)

 I initially called it update, I agree it's a better name. But phobos is
 all about ranges, so we need the put member anyway. Providing an
 'update' alias just for the name isn't worth it, imho (see above,
 keeping the number of hash members low).

I have to agree, an alias would produce more confusion (what two methods 
which do the same?)

 Furthermore, I don't like the `sha1Of` and `md5Of` etc. functions,
 why not having a static method? e.g. MD5.hexdigest("input"), which
 returns the hexdigest and not a MD5 object.

 Same reason as above, sha1Of, md5Of are just convenience aliases, it
 could be argued you don't need those at all. You can use 'digest'
 directly and it will provide exactly the same result. So hash writers
 don't have to implement anything to support digest, providing an alias
 is optional and just 1 line of code. A static method would have to be
 implemented by every hash.

Yes, you would have to implement it for every hash, but that's 3 lines:

static string hexdigest(void[] data) {
	return toHexString(digest!(typeof(this))("yay"));
}

 Yes you could use mixins to make that
 painless, but I think it's still to much work. (And mixins can't be
 documented in DDOC, but that's an unrelated compiler issue)

 BTW: it is implemented as a member for the OOP API, as we can just
 implement it in the base interface, so the actual implementations don't
 have to care about it.

I've seen that, but I am wondering why it's not a static method, the 
creation of the Hash Object could be hidden inside. You might say, this 
would allocate a class instance just for a single use, without the 
possibility to reuse it. Correct, but I think that's the use of such a 
function, if you just need a Hash, nothing else, otherwise you could use 
the class "directly", with all its other functionality.

 One more thing, `.reset` does the same as `start`? If so, why do both
 exist?

 See above. It's because start is used in the template/struct API and
 structs can't have default constructors, so we need start.

 The OOP/class api can use a default constructor. But unlike the start
 function it can't work as a reset function, so there's an explicit
 reset function.

I am not sure what do think about that. On the one side it's useful if 
you really need that speed, but then I think, is it really worth it 
resetting the state hidden in a function. If you really want to avoid 
another allocation you could use emplace (if I didn't misunderstand the 
use of emplace). To quote from the Python Zen:
"Explicit is better than implicit."

 (I also find the examples quite confusing, why do you want to
 reset the hash onject?

 My approach was to document how to use the functions, not as much when
 it makes sense to use them. Maybe I should care more about that?

When I saw the documentation/examples, I was confused, why do you want 
to reset a hash object, so does it really reset the state? So I had to 
take a look into the code before I was sure, it surely does reset the 
whole thingy (is it called context?).

 It's more efficient than allocating a new hash with 'new' (or do you
 mean why use reset if finish resets anyway? It's useful if you put some
 data into a hash, then decide you don't want the hash (because the
 user aborts the operation, network connection is lost,...) but you still
 need the hash object later on.) You could just call finish and
 disregard the result, but the reset implementation is faster. I agree
 there's probably no _strong_ need for reset, but I think it doesn't do
 any harm.

"Explicit is better than implicit."

 Well, that's it from my side :)

 Thanks for your review!

Thanks for writing std.hash, I think lots of people need it.

Aug 07 2012

Johannes Pfau <nospam example.com> writes:

Am Wed, 08 Aug 2012 00:55:47 +0200
schrieb David <d dav1d.de>:

 Am 07.08.2012 21:53, schrieb Johannes Pfau:
 Am Tue, 07 Aug 2012 20:07:07 +0200
 schrieb David <d dav1d.de>:

 start is here as structs can't have default constructors. For all
 current hashes it's really mostly useless (except it can be used as
 a simple way to reset a hash, which is also why the hash structs
 don't have a reset member), but I just don't know if there are
 hashes which would require a more complex (runtime) initialization.

 
 Ok this point with the one above makes sense (I implemented my
 OpenSSL hashing wrapper as a class, initilaization is done in the
 constructor), it still doesn't feel right, if you have to call .start
 first. What about a struct-constructor which calls .start internally,
 so you get a bit more of a modern API (imo) and you're still able to
 implement the same interface for any kind of wrapper/own
 implementation/whatever.

You mean an external function which constructs the digest context and
calls start? That's similar to the appender function in std.array and
probably a good idea. We'd need a good name for it though.
* initContext
* initializeContext
* context
* startContext
* startHash / startDigest

 property is not a good idea, as finish must reset the internal state
 for some objects, so you can't access the property repeatedly. It'd
 have to be a function.

 
 Well, you could store the result internally, property generates on
 the first call the digest, stores it (that's not too much, 16 byte
 for a md5) and returns the already stored value on the 2nd call.
 

yes, that could be done but to be honest I don't think it's worth it.

 which calls finish and returns an
 ubyte[] array (or string for hexdigest)

 This could be done and is probably a matter of taste. I prefer the
 free function in this case, it makes clear that digest() really is
 a generic function which can be used with any hash. It's also one
 function less which must be implemented by hash writers. I'd like
 to keep the number of members required to implement a hash as low
 as possible.

 
 With digest you mean: 
 http://dl.dropbox.com/u/24218791/d/phobos/std_hash_hash.html#digest ?
 
 You normally always want the hexdigest (you barely need "real"
 digest),

really? I though it's the other way round and comparing/verifying
hashes is the common case?

 so it's a matter of convenience. Well, thanks to UFCS, you
 can call it like a method.
 

a hexDigest which works like digest could probably be useful, although
it's very easy to implement this in user code as well:
string hexDigest(Hash, Order order = Order.increasing)(scope
const(void[])[] data...) if(isDigest!Hash)
{
    return digest!Hash(data).toHexString!order();
}

 A static
 method would have to be implemented by every hash.

 
 Yes, you would have to implement it for every hash, but that's 3
 lines:
 
 static string hexdigest(void[] data) {
 	return toHexString(digest!(typeof(this))("yay"));
 }

The implementation is indeed quite simple, so it probably is a matter
of taste whether hexDigest is implemented as a member or as a free
function.

 BTW: it is implemented as a member for the OOP API, as we can just
 implement it in the base interface, so the actual implementations
 don't have to care about it.

 
 I've seen that, but I am wondering why it's not a static method, the 
 creation of the Hash Object could be hidden inside. You might say,
 this would allocate a class instance just for a single use, without
 the possibility to reuse it. Correct, but I think that's the use of
 such a function, if you just need a Hash, nothing else, otherwise you
 could use the class "directly", with all its other functionality.

I have to think about this. It's too bad we can't have static & member
functions with the same name in D.

I'd probably argue if you want the behavior of a static function, use
the struct API: digest!MD5(data);
Doesn't this do everything you need, without the GC allocation the OOP
api would cause? I mean the OOP API is mostly about polymorphism (and
about reference semantics). If you use a static method as proposed,
you can't use polymorphism and reference semantics are not useful
either, as you never get a reference to the context?

 The OOP/class api can use a default constructor. But unlike the
 start function it can't work as a reset function, so there's an
 explicit reset function.

 
 I am not sure what do think about that. On the one side it's useful
 if you really need that speed, but then I think, is it really worth
 it resetting the state hidden in a function. If you really want to
 avoid another allocation you could use emplace (if I didn't
 misunderstand the use of emplace). To quote from the Python Zen:
 "Explicit is better than implicit."

Yes, it should be possible to use emplace for that. Emplace is not
well-known though and I'm not sure how well it works now. Also I'm not
sure if emplace is enough in this case, I guess you'd have to destroy
the old instance as well (so destructors are run)?

But I'm not really sure about the importance of reset either. As long
as the main implementation is in the struct API and WrapperDigest is
used to wrap it into the OOP API there's no extra work to implement
reset. If the main implementation is in the OOP interface, the reset
function probably looks a lot like the constructor, so simply calling
reset from the constructor should avoid code duplication.
 
 (I also find the examples quite confusing, why do you want to
 reset the hash onject?

 My approach was to document how to use the functions, not as much
 when it makes sense to use them. Maybe I should care more about
 that?

 
 When I saw the documentation/examples, I was confused, why do you
 want to reset a hash object, so does it really reset the state? So I
 had to take a look into the code before I was sure, it surely does
 reset the whole thingy (is it called context?).

Yes, the whole terminology is probably a little confusing. The old API
called the contexts like this: MD5_CTX, but that doesn't fit the phobos
naming conventions. MD5CTX, MD5Ctx, Md5Ctx and Md5CTX all look ugly
(and only the first fits our naming conventions). So MD5Context is the
next best choice, but it's a little long. It'd more precise though.

 
 It's more efficient than allocating a new hash with 'new' (or do you
 mean why use reset if finish resets anyway? It's useful if you put
 some data into a hash, then decide you don't want the hash (because
 the user aborts the operation, network connection is lost,...) but
 you still need the hash object later on.) You could just call
 finish and disregard the result, but the reset implementation is
 faster. I agree there's probably no _strong_ need for reset, but I
 think it doesn't do any harm.

 
 "Explicit is better than implicit."
 

I'm just not sure if reset can be implemented in a reasonable way
outside of the class in all cases. But then it's probably not important
enough to justify an extra member...

Aug 08 2012

Piotr Szturmaj <bncrbme jadamspam.pl> writes:

Johannes Pfau wrote:
 As I had some free time and nobody else stepped up I implemented it,

I tried it before but I wanted to create whole crypto package at once, 
and I guess it was a huge mistake. I know you have used my code in your 
uuid module, you can use it to add it to std.hash if you want. There are 
all SHA implementations, MD5 and a Tiger (both versions). They all 
support bit hashing and work with CTFE.

I made a quick look at your proposal, but currently don't have time to 
do a deep investigation :). One note though. I don't think it's 
neccessary to maintain two different API sets. Sure, classes need heap 
allocation but usually that's only one allocation, it's more important 
to not allocate during algorithm computations. Fortunately crypto 
algorithms don't do this. So, struct allocation efficency is negligible 
here. I think separate struct based API set is not worth it (or I'm 
missing something not related to allocation cost).

Aug 07 2012

Johannes Pfau <nospam example.com> writes:

Am Wed, 08 Aug 2012 01:27:45 +0200
schrieb Piotr Szturmaj <bncrbme jadamspam.pl>:

 Johannes Pfau wrote:
 As I had some free time and nobody else stepped up I implemented it,

 
 I tried it before but I wanted to create whole crypto package at
 once,

I'm sorry, I didn't want to conceal your work. What I meant with
'nobody stepped up' is this: We had a pull request which moved the crc
code into std.crc and it was already merged in for 2.060. I
complained that it had yet another API and that we should have a common
API. In the end that pull request was reverted but I felt kinda guilty
as that left us without a public crc module. In that discussion, nobody
else had time to implement that API, so I thought we'd need a
solution soon and started implementing it.

 and I guess it was a huge mistake.

We still need your crypto work ;-)
I'd personally like to see a bcrypt implementation. There's no
widespread bcrypt C library which could be used, so there's no painless
way to use bcrypt in D. (Although there _is_ public domain C source
code)

 I know you have used my code
 in your uuid module, you can use it to add it to std.hash if you
 want. There are all SHA implementations, MD5 and a Tiger (both
 versions). They all support bit hashing and work with CTFE.

Great, I'll have a look if this review works out (or as soon as we have
a standardized hash API in phobos).

BTW: How does it work in CTFE? Don't you have to do endianness
conversions at some time? According to Don that's not really supported.

Another problem with prevents CTFE for my proposal would be that the
internal state is currently implemented as an array of uints, but the
API uses ubyte[] as a return type. That sort of reinterpret cast is not
supposed to work in CTFE though. I wonder how you avoided that issue?

And another problem is that void[][] (as used in the 'digest' function)
doesn't work in CTFE (and it isn't supposed to work). But that's a
problem specific to this API.

 I made a quick look at your proposal, but currently don't have time
 to do a deep investigation :). One note though. I don't think it's 
 neccessary to maintain two different API sets. Sure, classes need
 heap allocation but usually that's only one allocation, it's more
 important to not allocate during algorithm computations. Fortunately
 crypto algorithms don't do this. So, struct allocation efficency is
 negligible here. I think separate struct based API set is not worth
 it (or I'm missing something not related to allocation cost).

I'm not sure, I guess some people would disagree. I initially asked
whether we should use a struct based API or a class based API (+emplace
where necessary). At that time the struct API was favored.

It's about the allocation cost (and the additional GC work) although
people could probably also complain about the indirection classes bring
along.

There's one important example where you really don't want any
allocation to happen (and especially no GC allocation): You have a
function which always calculates the same type of hash (e.g. md5) and
you don't care about the implementation. You only care about the
final hash, the hash context is never used outside of that function.
This is an perfect fit for stack allocation, especially if the
function is used a lot (GC). With classes you'd have to do
some caching to do it efficiently (which then gives problems with
const).
It would be possible to use a class API + emplace, but that was deemed
to be too cumbersome.

Aug 08 2012

Piotr Szturmaj <bncrbme jadamspam.pl> writes:

Johannes Pfau wrote:
 Am Wed, 08 Aug 2012 01:27:45 +0200
 schrieb Piotr Szturmaj <bncrbme jadamspam.pl>:

 Johannes Pfau wrote:
 As I had some free time and nobody else stepped up I implemented it,

 I tried it before but I wanted to create whole crypto package at
 once,

 I'm sorry, I didn't want to conceal your work. What I meant with
 'nobody stepped up' is this: We had a pull request which moved the crc
 code into std.crc and it was already merged in for 2.060. I
 complained that it had yet another API and that we should have a common
 API. In the end that pull request was reverted but I felt kinda guilty
 as that left us without a public crc module. In that discussion, nobody
 else had time to implement that API, so I thought we'd need a
 solution soon and started implementing it.

Hey, I didn't say you did :-) I'm fine with your work :-)

 and I guess it was a huge mistake.

 We still need your crypto work ;-)
 I'd personally like to see a bcrypt implementation. There's no
 widespread bcrypt C library which could be used, so there's no painless
 way to use bcrypt in D. (Although there _is_ public domain C source
 code)

Yes, there should be bcrypt, scrypt and PBKDF2.

 I know you have used my code
 in your uuid module, you can use it to add it to std.hash if you
 want. There are all SHA implementations, MD5 and a Tiger (both
 versions). They all support bit hashing and work with CTFE.

 Great, I'll have a look if this review works out (or as soon as we have
 a standardized hash API in phobos).

 BTW: How does it work in CTFE? Don't you have to do endianness
 conversions at some time? According to Don that's not really supported.

std.bitmanip.swapEndian() works for me

 Another problem with prevents CTFE for my proposal would be that the
 internal state is currently implemented as an array of uints, but the
 API uses ubyte[] as a return type. That sort of reinterpret cast is not
 supposed to work in CTFE though. I wonder how you avoided that issue?

There is set of functions that abstract some operations to work with 
CTFE and at runtime: 
https://github.com/pszturmaj/phobos/blob/master/std/crypto/hash/base.d#L66. 
Particularly memCopy().

 And another problem is that void[][] (as used in the 'digest' function)
 doesn't work in CTFE (and it isn't supposed to work). But that's a
 problem specific to this API.

Yes, that's why I use ubyte[].

I think if it's possible to do it all with CTFE by using hacks, it 
should be rather implemented in the compiler, assuming the endiannes of 
the CPU it's running on.

[...cut...]
 There's one important example where you really don't want any
 allocation to happen (and especially no GC allocation): You have a
 function which always calculates the same type of hash (e.g. md5) and
 you don't care about the implementation. You only care about the
 final hash, the hash context is never used outside of that function.
 This is an perfect fit for stack allocation, especially if the
 function is used a lot (GC). With classes you'd have to do
 some caching to do it efficiently (which then gives problems with
 const).
 It would be possible to use a class API + emplace, but that was deemed
 to be too cumbersome.

I don't think std.typecons.scoped is cumbersome:

auto sha = scoped!SHA1(); // allocates on the stack
auto digest = sha.digest("test");

Why I think classes should be supported is the need of polymorphism. One 
should be able to accept Digest base class as a function parameter or a 
field/property. Without polymorphism we need to resort to C-like 
switches, ifs and so on. Templates are no go, because hash function may 
be determined at runtime (it may depend on security protocol handshake).

Aug 08 2012

Johannes Pfau <nospam example.com> writes:

Am Wed, 08 Aug 2012 11:27:49 +0200
schrieb Piotr Szturmaj <bncrbme jadamspam.pl>:

 BTW: How does it work in CTFE? Don't you have to do endianness
 conversions at some time? According to Don that's not really
 supported.

 
 std.bitmanip.swapEndian() works for me

Great! I always tried the *endianToNative and nativeTo*Endian functions.
So I didn't expect swapEndian to work.
 
 Another problem with prevents CTFE for my proposal would be that the
 internal state is currently implemented as an array of uints, but
 the API uses ubyte[] as a return type. That sort of reinterpret
 cast is not supposed to work in CTFE though. I wonder how you
 avoided that issue?

 
 There is set of functions that abstract some operations to work with 
 CTFE and at runtime: 
 https://github.com/pszturmaj/phobos/blob/master/std/crypto/hash/base.d#L66. 
 Particularly memCopy().

I should definitely look at this later. Would be great if hashes worked
in CTFE.

 And another problem is that void[][] (as used in the 'digest'
 function) doesn't work in CTFE (and it isn't supposed to work). But
 that's a problem specific to this API.

 
 Yes, that's why I use ubyte[].

But then you can't even hash a string in CTFE. I wanted to special case
strings, but for various reasons it didn't work out in the end.
 
 I don't think std.typecons.scoped is cumbersome:
 
 auto sha = scoped!SHA1(); // allocates on the stack
 auto digest = sha.digest("test");

Yes I'm not sure about this. But a class only based interface probably
hasn't high chances of being accepted into phobos. And I think the
struct interface+wrappers approach isn't bad.

 
 Why I think classes should be supported is the need of polymorphism.

And ABI compatibility and switching the backend (OpenSSL, native D,
windows crypto) at runtime. I know it's very useful, this is why we
have the OOP api. It's very easy to wrap the OOP api onto the struct
api. These are the implementations of MD5Digest, CRC32Digest and
SHA1Digest:

alias WrapperDigest!CRC32 CRC32Digest;
alias WrapperDigest!MD5 MD5Digest;
alias WrapperDigest!SHA1 SHA1Digest;

with the support code in std.hash.hash 1LOC is enough to implement the
OOP interface if a struct interface is available, so I don't think
maintaining two APIs is a problem.

A bigger problem is that the real implementation must be the struct
interface, so you can't use polymorphism there. I hope alias this is
enough.

Aug 08 2012

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Wednesday, August 08, 2012 18:12:23 Johannes Pfau wrote:
 Am Wed, 08 Aug 2012 11:27:49 +0200
 
 schrieb Piotr Szturmaj <bncrbme jadamspam.pl>:
 BTW: How does it work in CTFE? Don't you have to do endianness
 conversions at some time? According to Don that's not really
 supported.

 
 std.bitmanip.swapEndian() works for me

 
 Great! I always tried the *endianToNative and nativeTo*Endian functions.
 So I didn't expect swapEndian to work.

What's wrong with the *endianToNative and nativeTo*Endian functions? They work 
just fine as far as I know. swapEndian works too if you want it to use that, but
there should be nothing wrong with the endian-specific ones.

- Jonathan M Davis

Aug 08 2012

Johannes Pfau <nospam example.com> writes:

Am Wed, 08 Aug 2012 14:36:32 -0400
schrieb "Jonathan M Davis" <jmdavisProg gmx.com>:

 What's wrong with the *endianToNative and nativeTo*Endian functions?
 They work just fine as far as I know. swapEndian works too if you
 want it to use that, but there should be nothing wrong with the
 endian-specific ones.
 
 - Jonathan M Davis

in CTFE?
http://dpaste.dzfl.pl/0503b8af

According to Don reinterpret casts (even if done through unions) won't
be supported in CTFE. So you can't convert from uint-->ubyte[4]

Aug 08 2012

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Wednesday, August 08, 2012 20:55:19 Johannes Pfau wrote:
 Am Wed, 08 Aug 2012 14:36:32 -0400
 
 schrieb "Jonathan M Davis" <jmdavisProg gmx.com>:
 What's wrong with the *endianToNative and nativeTo*Endian functions?
 They work just fine as far as I know. swapEndian works too if you
 want it to use that, but there should be nothing wrong with the
 endian-specific ones.
 
 - Jonathan M Davis

 
 in CTFE?
 http://dpaste.dzfl.pl/0503b8af
 
 According to Don reinterpret casts (even if done through unions) won't
 be supported in CTFE. So you can't convert from uint-->ubyte[4]

No. It wouldn't work in CTFE, because it uses a union. But what it's trying to 
doesn't really make sense in CTFE in most cases anyway, because the endianness 
of the target machine may not be the same endianness as the machine doing the 
compilation. Any computations which cared about endianness must be in a state 
where they don't care about endianness anymore once CTFE has completed, or 
you're going to have bugs.

Though if the issue is std.hash being CTFEable, I don't know why anyone would 
even care. It's cool if it's CTFEable, but the sorts of things that you hash 
pretty much always require user or file input of some kind (which you can't do 
with CTFE). You'd have to have a use case where something within the program 
itself needed to be hashed for some reason for it to matter whether std.hash 
was CTFEable or not, and it wouldn't surprise me at all if it were typical in 
hash functions to do stuff that isn't CTFEable anyway.

- Jonathan M Davis

Aug 08 2012

Johannes Pfau <nospam example.com> writes:

Am Wed, 08 Aug 2012 16:44:03 -0400
schrieb "Jonathan M Davis" <jmdavisProg gmx.com>:

 
 in CTFE?
 http://dpaste.dzfl.pl/0503b8af
 
 According to Don reinterpret casts (even if done through unions)
 won't be supported in CTFE. So you can't convert from
 uint-->ubyte[4]

 
 No. It wouldn't work in CTFE, because it uses a union.But what it's
 trying to doesn't really make sense in CTFE in most cases anyway,
 because the endianness of the target machine may not be the same
 endianness as the machine doing the compilation. Any computations
 which cared about endianness must be in a state where they don't care
 about endianness anymore once CTFE has completed, or you're going to
 have bugs.

I completely agree, but this is true for hashes. Once the final hash
value is produced it doesn't depend on the endianness.

 
 Though if the issue is std.hash being CTFEable, I don't know why
 anyone would even care. It's cool if it's CTFEable, but the sorts of
 things that you hash pretty much always require user or file input of
 some kind (which you can't do with CTFE).

Yeah it's not that useful, that's why I didn't care about CTFE support
right now. The only usecase I can think of is to hash a string in CTFE,
for example UUID could use it to support name based UUID literals.

 You'd have to have a use
 case where something within the program itself needed to be hashed
 for some reason for it to matter whether std.hash was CTFEable or
 not, and it wouldn't surprise me at all if it were typical in hash
 functions to do stuff that isn't CTFEable anyway.
 
 - Jonathan M Davis

Aug 09 2012

Johannes Pfau <nospam example.com> writes:

Am Wed, 08 Aug 2012 11:27:49 +0200
schrieb Piotr Szturmaj <bncrbme jadamspam.pl>:

 
 Yes, there should be bcrypt, scrypt and PBKDF2.

Wow, I didn't know about scrypt. Seems to be pretty cool.

Aug 08 2012

Ary Manzana <ary esperanto.org.ar> writes:

On 8/7/12 14:39 , Dmitry Olshansky wrote:
 Since the review queue has been mostly silent again I've decided to jump
 in and manage the one that's ready to go :)

 Today starts the the review of std.hash package by Johannes Pfau. We go
 with the usual cycle of two weeks for review and one week for voting.
 Thus review ends on 22th of August, followed by voting that ends on 29th
 of August.

 Description:

 std.hash.hash is a new module for Phobos defining an uniform interface
 for hashes and checksums. It also provides some useful helper functions
 to deal with this new API.

 The std.hash package also includes:

I think "std.crypto" is a better name for the package. At first I 
thought it contained an implementation of a Hash table.

Also note these entries in wikipedia:

http://en.wikipedia.org/wiki/Hash_function
http://en.wikipedia.org/wiki/Cryptographic_hash_function

Your package provides the later, not just any hash functions, but 
*crypto*graphic hash functions. :-)

(and yes, I know I'm just discussing the name here, but names *are* 
important)

Aug 07 2012

deadalnix <deadalnix gmail.com> writes:

Le 07/08/2012 20:31, Ary Manzana a �crit :
 On 8/7/12 14:39 , Dmitry Olshansky wrote:
 Since the review queue has been mostly silent again I've decided to jump
 in and manage the one that's ready to go :)

 Today starts the the review of std.hash package by Johannes Pfau. We go
 with the usual cycle of two weeks for review and one week for voting.
 Thus review ends on 22th of August, followed by voting that ends on 29th
 of August.

 Description:

 std.hash.hash is a new module for Phobos defining an uniform interface
 for hashes and checksums. It also provides some useful helper functions
 to deal with this new API.

 The std.hash package also includes:

 I think "std.crypto" is a better name for the package. At first I
 thought it contained an implementation of a Hash table.

 Also note these entries in wikipedia:

 http://en.wikipedia.org/wiki/Hash_function
 http://en.wikipedia.org/wiki/Cryptographic_hash_function

 Your package provides the later, not just any hash functions, but
 *crypto*graphic hash functions. :-)

 (and yes, I know I'm just discussing the name here, but names *are*
 important)

You'll find very hard to convince anyone that crc32 is a cryptographic 
hash function.

And this API is suited for both cryptographic hash and regular hash. 
Many of them can be added in the future if need is met. I definitively 
am for std.hash .

Aug 07 2012

Johannes Pfau <nospam example.com> writes:

Am Tue, 07 Aug 2012 20:46:44 +0200
schrieb deadalnix <deadalnix gmail.com>:

 You'll find very hard to convince anyone that crc32 is a
 cryptographic hash function.

And there will hopefully be more hashes in std.hash a some point.

BTW: I also considered splitting hashes into cryptographic and
non-cryptographic/checksums. But as we also have some generic parts
(currently in std.hash.hash) this would pose the question where to put
the generic part? Put it in std.checksum and std.crypto users will
complain, put it in std.crypto and std.checksum users won't be happy.

And as was discussed previously what's considered a safe cryptographic
hash might change as time goes by.

 And this API is suited for both cryptographic hash and regular hash. 
 Many of them can be added in the future if need is met. I
 definitively am for std.hash .

We had this package name discussion a few times on the newsgroup and
I think on github as well.

I personally don't care about the package name and I'll just choose
what the majority thinks is best. Last time it seemed std.hash was the
favorite (although hash and digest can probably be used interchangeably)

Aug 07 2012

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Tuesday, August 07, 2012 15:31:57 Ary Manzana wrote:
 The std.hash package also includes:

 I think "std.crypto" is a better name for the package. At first I
 thought it contained an implementation of a Hash table.

That doesn't fly, because crc32 is going to be in there, and while it's a hash, 
it's no good for cryptography.

- Jonathan M Davis

Aug 07 2012

"Regan Heath" <regan netmail.co.nz> writes:

On Tue, 07 Aug 2012 19:41:12 +0100, Jonathan M Davis <jmdavisProg gmx.com>  
wrote:

 On Tuesday, August 07, 2012 15:31:57 Ary Manzana wrote:
 The std.hash package also includes:

 I think "std.crypto" is a better name for the package. At first I
 thought it contained an implementation of a Hash table.

 That doesn't fly, because crc32 is going to be in there, and while it's  
 a hash, it's no good for cryptography.

std.digest then?

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

Aug 08 2012

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Aug 08, 2012 at 11:37:35AM +0100, Regan Heath wrote:
 On Tue, 07 Aug 2012 19:41:12 +0100, Jonathan M Davis
 <jmdavisProg gmx.com> wrote:
 
On Tuesday, August 07, 2012 15:31:57 Ary Manzana wrote:
I think "std.crypto" is a better name for the package. At first I
thought it contained an implementation of a Hash table.

That doesn't fly, because crc32 is going to be in there, and while
it's a hash, it's no good for cryptography.

 
 std.digest then?

[...]

+1. I think std.hash is needlessly confusing (I thought it was another
hashtable implementation until I read this thread more carefully).


T

-- 
Two wrongs don't make a right; but three rights do make a left...

Aug 08 2012

"Tobias Pankrath" <tobias pankrath.net> writes:

On Wednesday, 8 August 2012 at 12:00:42 UTC, H. S. Teoh wrote:
 On Wed, Aug 08, 2012 at 11:37:35AM +0100, Regan Heath wrote:
 On Tue, 07 Aug 2012 19:41:12 +0100, Jonathan M Davis
 <jmdavisProg gmx.com> wrote:
 
On Tuesday, August 07, 2012 15:31:57 Ary Manzana wrote:
I think "std.crypto" is a better name for the package. At 
first I
thought it contained an implementation of a Hash table.

That doesn't fly, because crc32 is going to be in there, and 
while
it's a hash, it's no good for cryptography.

 
 std.digest then?

 [...]

 +1. I think std.hash is needlessly confusing (I thought it was 
 another
 hashtable implementation until I read this thread more 
 carefully).


 T

-1
std.digest let's me think of 
http://en.wikipedia.org/wiki/Digestion
digest is just not common if you mean hash in my cycles.

I didn't think of an hash table implementation, maybe you are 
spoiled by writing one at the moment? (no offence).

Aug 08 2012

"Regan Heath" <regan netmail.co.nz> writes:

On Wed, 08 Aug 2012 13:11:43 +0100, Tobias Pankrath <tobias pankrath.net>  
wrote:

 On Wednesday, 8 August 2012 at 12:00:42 UTC, H. S. Teoh wrote:
 On Wed, Aug 08, 2012 at 11:37:35AM +0100, Regan Heath wrote:
 On Tue, 07 Aug 2012 19:41:12 +0100, Jonathan M Davis
 <jmdavisProg gmx.com> wrote:
  >On Tuesday, August 07, 2012 15:31:57 Ary Manzana wrote:
I think "std.crypto" is a better name for the package. At >>first I
thought it contained an implementation of a Hash table.

That doesn't fly, because crc32 is going to be in there, and >while
it's a hash, it's no good for cryptography.

  std.digest then?

 [...]

 +1. I think std.hash is needlessly confusing (I thought it was another
 hashtable implementation until I read this thread more carefully).


 T

 -1
 std.digest let's me think of http://en.wikipedia.org/wiki/Digestion

That's exactly what it's supposed to suggest.  The algorithm does digest  
the input (AKA message) and output .. something else :p

 digest is just not common if you mean hash in my cycles.

Like it or not, Digest is the correct term:
http://en.wikipedia.org/wiki/MD5
"The MD5 Message-Digest Algorithm .."

 I didn't think of an hash table implementation, maybe you are spoiled by  
 writing one at the moment? (no offence).

"Hash" has too many meanings, we should avoid it.

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

Aug 08 2012

"Tobias Pankrath" <tobias pankrath.net> writes:

On Wednesday, 8 August 2012 at 12:55:04 UTC, Regan Heath wrote:

 Like it or not, Digest is the correct term:
 http://en.wikipedia.org/wiki/MD5
 "The MD5 Message-Digest Algorithm .."

You could have cited the hole sentence

 The MD5 Message-Digest Algorithm is a widely used cryptographic 
 hash function

So at least this implies that hash function is the more general 
term here and the corresponding wiki article is named "hash 
function" and does not even mention digest.

 I didn't think of an hash table implementation, maybe you are 
 spoiled by writing one at the moment? (no offence).

 "Hash" has too many meanings, we should avoid it.

At least hash table does not use a different meaning of the term 
hash.

But I'm not that deep into it, I'd just say that digest is not 
clearly better than hash.

Aug 08 2012

"Regan Heath" <regan netmail.co.nz> writes:

On Wed, 08 Aug 2012 14:03:32 +0100, Tobias Pankrath <tobias pankrath.net>  
wrote:

 On Wednesday, 8 August 2012 at 12:55:04 UTC, Regan Heath wrote:

 Like it or not, Digest is the correct term:
 http://en.wikipedia.org/wiki/MD5
 "The MD5 Message-Digest Algorithm .."

 You could have cited the hole sentence

I could have, but I didn't read that far :p  I knew what I was looking for  
and I copy/pasted it.

 The MD5 Message-Digest Algorithm is a widely used cryptographic hash  
 function

 So at least this implies that hash function is the more general term  
 here and the corresponding wiki article is named "hash function" and  
 does not even mention digest.

"Message-Digest Algorithm" is the proper term, "hash" is another, correct,  
more general term.

"hash" has other meanings, "Message-Digest Algorithm" does not.

std.message-digest-algorithm is a bit wordy.

std.digest is not.

std.digest cannot be confused with anything else.

 I didn't think of an hash table implementation, maybe you are spoiled  
 by writing one at the moment? (no offence).

 "Hash" has too many meanings, we should avoid it.

 At least hash table does not use a different meaning of the term hash.

 But I'm not that deep into it, I'd just say that digest is not clearly  
 better than hash.

I think it is.

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

Aug 08 2012

travert phare.normalesup.org (Christophe Travert) writes:

"Regan Heath" , dans le message (digitalmars.D:174462), a écrit :
 "Message-Digest Algorithm" is the proper term, "hash" is another, correct,  
 more general term.
 
 "hash" has other meanings, "Message-Digest Algorithm" does not.

I think the question is: is std.hash going to contain only 
message-digest algorithm, or could it also contain other hash functions?
I think there is enough room in a package to have both message-digest 
algorithm and other kinds of hash functions.

Aug 08 2012

"Chris Cain" <clcain uncg.edu> writes:

On Wednesday, 8 August 2012 at 13:38:26 UTC, 
travert phare.normalesup.org (Christophe Travert) wrote:
 I think the question is: is std.hash going to contain only
 message-digest algorithm, or could it also contain other hash 
 functions?
 I think there is enough room in a package to have both 
 message-digest
 algorithm and other kinds of hash functions.

Even if that were the case, I'd say they should be kept separate. 
Cryptographic hash functions serve extremely different purposes 
from regular hash functions. There is no reason they should be 
categorized the same.

Aug 08 2012

"Regan Heath" <regan netmail.co.nz> writes:

On Wed, 08 Aug 2012 14:50:22 +0100, Chris Cain <clcain uncg.edu> wrote:

 On Wednesday, 8 August 2012 at 13:38:26 UTC,  
 travert phare.normalesup.org (Christophe Travert) wrote:
 I think the question is: is std.hash going to contain only
 message-digest algorithm, or could it also contain other hash functions?
 I think there is enough room in a package to have both message-digest
 algorithm and other kinds of hash functions.

 Even if that were the case, I'd say they should be kept separate.  
 Cryptographic hash functions serve extremely different purposes from  
 regular hash functions. There is no reason they should be categorized  
 the same.

I don't think there is any reason to separate them.  People should know  
which digest algorithm they want, they're not going to pick one at random  
and assume it's "super secure!"(tm).  And if they do, well tough, they  
deserve what they get.

"std.digest" can encompass all message digest algorithms, whether secure  
or not.

We could create a 2nd level below "secure" or "crypto" or similar if we  
really want, but I don't see much point TBH.

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

Aug 08 2012

travert phare.normalesup.org (Christophe Travert) writes:

"Chris Cain" , dans le message (digitalmars.D:174466), a écrit :
 On Wednesday, 8 August 2012 at 13:38:26 UTC, 
 travert phare.normalesup.org (Christophe Travert) wrote:
 I think the question is: is std.hash going to contain only
 message-digest algorithm, or could it also contain other hash 
 functions?
 I think there is enough room in a package to have both 
 message-digest
 algorithm and other kinds of hash functions.

 
 Even if that were the case, I'd say they should be kept separate. 
 Cryptographic hash functions serve extremely different purposes 
 from regular hash functions. There is no reason they should be 
 categorized the same.

They should not be categorized the same. I don't expect a regular hash 
function to pass the isDigest predicate. But they have many 
similarities, which explains they are all called hash functions. There 
is enough room in a package to put several related concepts!

Here, we have a package for 4 files, with a total number of line that is 
about one third of the single std.algorithm file (which is probably too 
big, I conceed). There aren't hundreds of message-digest functions to 
add here.

If it where me, I would have the presently reviewed module std.hash.hash 
be called std.hash.digest, and leave room here for regular hash 
functions. In any case, I think regular hash HAVE to be in a std.hash 
module or package, because people looking for a regular hash function 
will look here first.

Aug 08 2012

"Chris Cain" <clcain uncg.edu> writes:

On Wednesday, 8 August 2012 at 14:14:29 UTC, Regan Heath wrote:
 I don't think there is any reason to separate them.  People 
 should know which digest algorithm they want, they're not going 
 to pick one at random and assume it's "super secure!"(tm).  And 
 if they do, well tough, they deserve what they get.

In this case, I'm not suggesting keep them separate to not 
confuse those who don't know better. They're simply disparate in 
actual use.

What do you use a traditional hash function for? Usually to turn 
a large multibyte stream into some finite size so that you can 
use a lookup table or maybe to decrease wasted time in 
comparisons.

What do you use a cryptographic hash function for? Almost always 
it's to verify the integrity of some data (usually files) or 
protect the original form from prying eyes (passwords ... though, 
there are better approaches for that now).

You'd _never_ use a cryptographic hash function in place of a 
traditional hash function and vice versa because they designed 
for completely different purposes. At a cursory glance, they bare 
only one similarity and that's the fact that they turn a big 
chunk of data into a smaller form that has a fixed size.

On Wednesday, 8 August 2012 at 14:16:40 UTC, 
travert phare.normalesup.org (Christophe Travert) wrote:
 function to pass the isDigest predicate. But they have many
 similarities, which explains they are all called hash 
 functions. There
 is enough room in a package to put several related concepts!

Crytographic hash functions are also known as "one-way 
compression functions." They also have similarities to file 
compression algorithms. After all, both of them turn large files 
into smaller data. However, the actual use of them is completely 
different and you wouldn't use one in place of the other. I 
wouldn't put the Burrows-Wheeler transform in the same package.



It's just my opinion of course, but I just feel it wouldn't be 
right to intermingle normal hash functions and cryptographic hash 
functions in the same package. If we had to make a compromise and 
group them with something else, I'd really like to see 
cryptographic hash functions put in the same place we'd put other 
cryptography (such as AES) ... in a std.crypto package. But 
std.digest is good if they can exist in their own package.


It also occurs to me that a lot of people are confounding 
cryptographic hash functions and normal hash functions enough 
that they think that a normal hash function has a "digest" ... 
I'm 99% sure that's exclusive to the cryptographic hash functions 
(at least, I've never heard of a normal hash function producing a 
digest).

Aug 08 2012

travert phare.normalesup.org (Christophe Travert) writes:

"Chris Cain" , dans le message (digitalmars.D:174477), a écrit :

I think you misunderstood me (and it's probably my fault, since I don't 
know much of hash functions), I was wanted to compare two kind of 
concepts:

1/ message digest functions, like md5, or sha1, used on large files,
which is what is covered by this std.hash proposal.
2/ small hash function. Like what are use in an associative array, and 
are called toHash when used a member function.

And I didn't thought of:
3/ cryptographic hash functions

My opinion was that in a module or package called hash, I expect tools 


would rather have it named std.hash.digest, leaving room in the hash 
package to other concepts, like small hash functions that can be used in 




crypto package makes sense.




-- 
Christophe

Aug 08 2012

"Regan Heath" <regan netmail.co.nz> writes:

On Wed, 08 Aug 2012 18:33:01 +0100, Christophe Travert  =

<travert phare.normalesup.org> wrote:

 "Chris Cain" , dans le message (digitalmars.D:174477), a =E9crit :

 I think you misunderstood me (and it's probably my fault, since I don'=

t
 know much of hash functions), I was wanted to compare two kind of
 concepts:

 1/ message digest functions, like md5, or sha1, used on large files,
 which is what is covered by this std.hash proposal.
 2/ small hash function. Like what are use in an associative array, and=

 are called toHash when used a member function.

 And I didn't thought of:
 3/ cryptographic hash functions

 My opinion was that in a module or package called hash, I expect tools=



 would rather have it named std.hash.digest, leaving room in the hash
 package to other concepts, like small hash functions that can be used =

in





 crypto package makes sense.




Here is a perfect example of why we need to avoid using "hash", it has t=
oo  =

many meanings to different people.

I suggest:

std.digest <- cryptographic "hash" algorithms
std.crc    <- crc "hash" algorithms
std.uuid   <- identity "hash" algorithms

This is assuming we cannot have more levels of depth in the package/modu=
le  =

tree, otherwise you could group them all under the package "hash":

std.hash.digest
std.hash.crc
std.hash.uuid

Some people are going to argue it should be:

std.crypto.digest or..
std.crypto.hash

But that leads us to something like:

std.crypto.hash
std.crc.hash
std.uuid.hash

And that seems back-to-front to me, and more importantly would  =

assume/suggest/require we have more packages to put in std.crc and  =

std.uuid, which I suspect we wont.

R

-- =

Using Opera's revolutionary email client: http://www.opera.com/mail/

Aug 08 2012

"Chris Cain" <clcain uncg.edu> writes:

On Wednesday, 8 August 2012 at 17:33:01 UTC, 
travert phare.normalesup.org (Christophe Travert) wrote:
 I think you misunderstood me (and it's probably my fault, since 
 I don't
 know much of hash functions), I was wanted to compare two kind 
 of
 concepts:

 1/ message digest functions, like md5, or sha1, used on large 
 files,
 which is what is covered by this std.hash proposal.
 2/ small hash function. Like what are use in an associative 
 array, and
 are called toHash when used a member function.

 And I didn't thought of:
 3/ cryptographic hash functions

Actually, maybe I'm the one not doing a good job of explaining.

1 and 3 are the same things (what you're calling "message digest" 
functions are cryptographic hash functions). I'm saying that even 
though similar in name, cryptographic hash functions really can't 
(IMO, I suppose I should make clear) be put in the same place as 
normal hash functions because they barely have anything in 
common. You can't use on in the place of another nor are they 
really used in similar manners.

 My opinion was that in a module or package called hash, I 
 expect tools


I agree. I'd think similarly (I'd assume std.hash has something 
to do with hash tables or hash functions used for hash tables).


If I were looking to use a cryptographic hash function like SHA1 
or (eh) MD5, I'd look for std.crypto first, and probably pick 
std.digest if I saw that. As a last resort I'd look in std.hash 
and vomit profusely after seeing it grouped with the "times 33" 
hash.

Aug 08 2012

Johannes Pfau <nospam example.com> writes:

Am Wed, 8 Aug 2012 14:16:40 +0000 (UTC)
schrieb travert phare.normalesup.org (Christophe Travert):

 If it where me, I would have the presently reviewed module
 std.hash.hash be called std.hash.digest, and leave room here for
 regular hash functions. In any case, I think regular hash HAVE to be
 in a std.hash module or package, because people looking for a regular
 hash function will look here first.
 
 

std.hash.digest doesn't sound too bad. We could have std.hash.func (or
a better named module ;-) for general hash functions later.

Aug 08 2012

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Wednesday, August 08, 2012 18:47:34 Johannes Pfau wrote:
 Am Wed, 8 Aug 2012 14:16:40 +0000 (UTC)
 
 schrieb travert phare.normalesup.org (Christophe Travert):
 If it where me, I would have the presently reviewed module
 std.hash.hash be called std.hash.digest, and leave room here for
 regular hash functions. In any case, I think regular hash HAVE to be
 in a std.hash module or package, because people looking for a regular
 hash function will look here first.

 
 std.hash.digest doesn't sound too bad. We could have std.hash.func (or
 a better named module ;-) for general hash functions later.

I say just keep at simple and leave it at std.hash. It's plenty clear IMHO.

- Jonathan M Davis

Aug 08 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 8/8/12 4:34 PM, Jonathan M Davis wrote:
 I say just keep at simple and leave it at std.hash. It's plenty clear IMHO.

Not clear to quite a few of us. IMHO it just makes us seem (to the 
larger community) clever about a petty point. There's plenty of other 
better names, and std.digest is very adequate.

Andrei

Aug 08 2012

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Wednesday, August 08, 2012 18:47:04 Andrei Alexandrescu wrote:
 On 8/8/12 4:34 PM, Jonathan M Davis wrote:
 I say just keep at simple and leave it at std.hash. It's plenty clear
 IMHO.

 
 Not clear to quite a few of us. IMHO it just makes us seem (to the
 larger community) clever about a petty point. There's plenty of other
 better names, and std.digest is very adequate.

I prefer std.hash to std.digest, but I don't necessarily care all that much. 
What I was objecting to in particular was the suggestion to split it into 
std.hash.digest and std.hash.func. I think that all of the hashing algorithms 
should just go in the one package. Adding another layer is an unnecessary 
complication IMHO.

- Jonathan M Davis

Aug 08 2012

"RivenTheMage" <riven-mage id.ru> writes:

On Wednesday, 8 August 2012 at 16:47:35 UTC, Johannes Pfau wrote:

 std.hash.digest doesn't sound too bad. We could have 
 std.hash.func (or
 a better named module ;-) for general hash functions later.

Three basic types of hash functions are:

1) Hash - for fast searching and indexing in data structures
2) Checksum - detects the accidental errors in files, archives, 
streams
3) Message digest code - prevents the intentional modification of 
data

They should not be mixed IMHO.

1) should go into std.container or (maybe) std.algorithm
2) std.checksum
3) std.crypto.mdc or std.crypto.digest

Aug 15 2012

"David Nadlinger" <see klickverbot.at> writes:

On Wednesday, 15 August 2012 at 08:49:26 UTC, RivenTheMage wrote:
 Three basic types of hash functions are:

 1) Hash - for fast searching and indexing in data structures
 2) Checksum - detects the accidental errors in files, archives, 
 streams
 3) Message digest code - prevents the intentional modification 
 of data

 They should not be mixed IMHO.

Why? 1) might have a different interface than the others, but 2) 
and 3) only differ in their cryptological properties, the 
interface will likely be just the same – or what are you 
thinking about?

David

Aug 15 2012

"RivenTheMage" <riven-mage id.ru> writes:

On Wednesday, 15 August 2012 at 08:55:30 UTC, David Nadlinger
wrote:

 Why? 1) might have a different interface than the others, but 
 2) and 3) only differ in their cryptological properties, the 
 interface will likely be just the same – or what are you 
 thinking about?

 David

The "only" difference between 2) and 3) is a big difference.
CRC32, Adler, etc. are NOT a cryptographic hash fuctions. Their
purpose is to detect accidental errors caused by malfunction of
hardware or software, nothing more.

For me, it's weird and confusing to mix checksums and MDCs.

It's about organizing the standard library for the better
usability. That is the whole point of modules. After all, you can
place all the standard library in one module, why not? :-)

Aug 15 2012

"RivenTheMage" <riven-mage id.ru> writes:

Another example is a systematic error-correcting codes. The
"only" difference between them and checksums is the ability to
correct errors, not just detect them. CRC or MD5 can be viewed as
systematic code with zero error-correcting ability.

Should we mix Reed-Solomon codes and MD5 in one module? I don't 
think so.

Aug 15 2012

=?ISO-8859-1?Q?Jos=E9_Armando_Garc=EDa_Sancio?= <jsancio gmail.com> writes:

On Wed, Aug 15, 2012 at 2:40 AM, RivenTheMage <riven-mage id.ru> wrote:
 Another example is a systematic error-correcting codes. The
 "only" difference between them and checksums is the ability to
 correct errors, not just detect them. CRC or MD5 can be viewed as
 systematic code with zero error-correcting ability.

 Should we mix Reed-Solomon codes and MD5 in one module? I don't think so.

Some people's point is that MD5 was consider a cryptographic digest
function 16 years ago. It is not consider cryptographically secure
today. So why make any design assumption today on how the landscape
will look tomorrow? Specially on a field that is always changing. Why
not lumped them all together and explain the current situation and
recommendation in the comments.

Looks at Python's passlib module for example. They enumerate every
password encoding scheme under the sun (except for scrypt :() and give
a recommendation on the appropriate algorithm to use in the current
computing landscape.
http://packages.python.org/passlib/lib/passlib.hash.html#module-passlib.hash

Thanks,
-Jose

Aug 15 2012

"ReneSac" <reneduani yahoo.com.br> writes:

On Wednesday, 15 August 2012 at 14:36:00 UTC, José Armando 
García Sancio wrote:
 Some people's point is that MD5 was consider a cryptographic 
 digest
 function 16 years ago. It is not consider cryptographically 
 secure
 today. So why make any design assumption today on how the 
 landscape
 will look tomorrow? Specially on a field that is always 
 changing. Why
 not lumped them all together and explain the current situation 
 and
 recommendation in the comments.

 Looks at Python's passlib module for example. They enumerate 
 every
 password encoding scheme under the sun (except for scrypt :() 
 and give
 a recommendation on the appropriate algorithm to use in the 
 current
 computing landscape.
 http://packages.python.org/passlib/lib/passlib.hash.html#module-passlib.hash

 Thanks,
 -Jose

I agree that MD5 isn't cryptographically secure anymore, but it 
was designed as a cryptographic hash algorithm, and it shows. 
It's statistical and performance proprieties are completely 
different from CRCs, and no matter how broken, it still has a 
little of cryptographic strength (no practical preimage attack 
was found till this date, for example).

Note that in the Python passlib, there is no mention to CRC, FNV, 
ROT13, etc. Their place is different.

Aug 15 2012

=?ISO-8859-1?Q?Jos=E9_Armando_Garc=EDa_Sancio?= <jsancio gmail.com> writes:

On Wed, Aug 15, 2012 at 8:11 AM, ReneSac <reneduani yahoo.com.br> wrote:
 Note that in the Python passlib, there is no mention to CRC, FNV, ROT13,
 etc. Their place is different.

Thats because it is a "password module" and nobody or a small
percentage of the population uses CRC for password digest. Note that
the Python passlib module also has archaic plaintext encodings mainly
for interacting with legacy systems.

The basic point is that std.digest/std.hash (whatever people decide)
should probably just have generic digesting algorithm. The user can
decided which one to use given their requirements. Also, it would be
beneficial if the module also includes a section where it recommends
digest based on the current landscape of computing. High-level
documentation and suggestions are easy to change; APIs are not.

Thanks,
-Jose

Aug 15 2012

"RivenTheMage" <riven-mage id.ru> writes:

On Wednesday, 15 August 2012 at 19:38:34 UTC, José Armando
García Sancio wrote:

 Thats because it is a "password module" and nobody or a small
 percentage of the population uses CRC for password digest.

In turn, that's because CRC is not not a crytographic hash and
not suited for password hashing :)

 The basic point is that std.digest/std.hash (whatever people 
 decide) should probably just have generic digesting algorithm.

Generic digesting algorithm should probably go into std.algorithm.

It could be used like that:

------------
import std.algorithm;
import std.checksum;
import std.crypto.mdc;

ushort num = 1234;
auto hash1 = hash!("(a >>> 20) ^ (a >>> 12) ^ (a >>> 7) ^ (a >>>
4) ^ a")(str); // indexing hash

string str = "abcd";
auto hash3 = hash!(CRC32)(str); // checksum
auto hash2 = hash!(MD5)(str); // crytographic hash
------------

CRC32 and MD5 are ranges and/or classes, derived from
HashAlgorithm interface.

Aug 15 2012

"RivenTheMage" <riven-mage id.ru> writes:

On Thursday, 16 August 2012 at 03:02:59 UTC, RivenTheMage wrote:

 ushort num = 1234;
 auto hash1 = hash!("(a >>> 20) ^ (a >>> 12) ^ (a >>> 7) ^ (a >>>
 4) ^ a")(str); // indexing hash

I forgot that this case is already covered by reduce!(...)

Aug 15 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 08-Aug-12 18:16, Christophe Travert wrote:
 "Chris Cain" , dans le message (digitalmars.D:174466), a écrit :
 On Wednesday, 8 August 2012 at 13:38:26 UTC,
 travert phare.normalesup.org (Christophe Travert) wrote:
 I think the question is: is std.hash going to contain only
 message-digest algorithm, or could it also contain other hash
 functions?
 I think there is enough room in a package to have both
 message-digest
 algorithm and other kinds of hash functions.

 Even if that were the case, I'd say they should be kept separate.
 Cryptographic hash functions serve extremely different purposes
 from regular hash functions. There is no reason they should be
 categorized the same.

 They should not be categorized the same. I don't expect a regular hash
 function to pass the isDigest predicate. But they have many
 similarities, which explains they are all called hash functions. There
 is enough room in a package to put several related concepts!

You still can use say crc32 as normal hash function for some binary 
object. The notions are not as desperate as some designers would want 
them to be.

 Here, we have a package for 4 files, with a total number of line that is
 about one third of the single std.algorithm file (which is probably too
 big, I conceed). There aren't hundreds of message-digest functions to
 add here.

I'd rather see clean by family separation, as importing one huge digest 
module only to use SHA is kind of creepy. On the other hand as all of 
code is templated it's not a big deal.
 If it where me, I would have the presently reviewed module std.hash.hash
 be called std.hash.digest, and leave room here for regular hash
 functions. In any case, I think regular hash HAVE to be in a std.hash
 module or package, because people looking for a regular hash function
 will look here first.

I thing concerns me: if incremental digest hashes are all in one module
what are the (would be) other modules in std.hash?


-- 
Dmitry Olshansky

Aug 08 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 08-Aug-12 21:00, Dmitry Olshansky wrote:
 On 08-Aug-12 18:16, Christophe Travert wrote:

 They should not be categorized the same. I don't expect a regular hash
 function to pass the isDigest predicate. But they have many
 similarities, which explains they are all called hash functions. There
 is enough room in a package to put several related concepts!

 You still can use say crc32 as normal hash function for some binary
 object. The notions are not as desperate as some designers would want
 them to be.

Damned spellcheckers: desperate -> disparate

-- 
Dmitry Olshansky

Aug 08 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 8/8/12 8:54 AM, Regan Heath wrote:
 "Hash" has too many meanings, we should avoid it.

Yes please.

Andrei

Aug 08 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 07-Aug-12 22:31, Ary Manzana wrote:
 I think "std.crypto" is a better name for the package. At first I
 thought it contained an implementation of a Hash table.

There is  std.container, so unambiguous for me.
As for std.crypto it's been discussed to death before. Short answer - 
no, because it's assumed to include other useful hash functions (like 
crc32, later I expect adler and whatnot).

Also keep in mind that cryptographic hash is more of a status then 
permanent property. Now that md5 was cracked, it is typically used only 
as normal hash (e.g. checksum). Same thing would one day happen to SHA 
family.


-- 
Dmitry Olshansky

Aug 07 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 8/7/2012 10:39 AM, Dmitry Olshansky wrote:
 std.hash.hash is a new module for Phobos defining an uniform interface for
 hashes and checksums. It also provides some useful helper functions to deal
with
 this new API.

The hash functions must use a Range interface, not a file interface.

This is extremely important.

Aug 07 2012

Johannes Pfau <nospam example.com> writes:

Am Tue, 07 Aug 2012 17:39:15 -0700
schrieb Walter Bright <newshound2 digitalmars.com>:

 On 8/7/2012 10:39 AM, Dmitry Olshansky wrote:
 std.hash.hash is a new module for Phobos defining an uniform
 interface for hashes and checksums. It also provides some useful
 helper functions to deal with this new API.

 
 The hash functions must use a Range interface, not a file interface.
 
 This is extremely important.

I guess this is meant as a general statement and not specifically
targeted at my std.hash proposal?

I'm a little confused as all hashes already are OutputRanges in my
proposal. It's probably not explicit enough in the documentation, but
it's mentioned in one example and in the documentation for 'put';

Aug 08 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 8/8/2012 1:44 AM, Johannes Pfau wrote:
 Am Tue, 07 Aug 2012 17:39:15 -0700
 schrieb Walter Bright <newshound2 digitalmars.com>:

 On 8/7/2012 10:39 AM, Dmitry Olshansky wrote:
 std.hash.hash is a new module for Phobos defining an uniform
 interface for hashes and checksums. It also provides some useful
 helper functions to deal with this new API.

 The hash functions must use a Range interface, not a file interface.

 This is extremely important.

 I guess this is meant as a general statement and not specifically
 targeted at my std.hash proposal?

Both.

 I'm a little confused as all hashes already are OutputRanges in my
 proposal. It's probably not explicit enough in the documentation, but
 it's mentioned in one example and in the documentation for 'put';

It should accept an input range. But using an Output Range confuses me. A hash 
function is a reduce algorithm - it accepts a sequence of input values, and 
produces a single value. You should be able to write code like:

   ubyte[] data;
   ...
   auto crc = data.crc32();

For example, the hash example given is:

   foreach (buffer; file.byChunk(4096 * 1024))
       hash.put(buffer);
   auto result = hash.finish();

Instead it should be something like:

   auto result = file.byChunk(4096 * 1025).joiner.hash();

The magic is that any input range that produces bytes could be used, and that 
byte producing input range can be hooked up to the input of any reducing
function.

The use of a member finish() is not what any other reduce algorithm has, and so 
the interface is not a general component interface.

I know the documentation on ranges in Phobos is incomplete and confusing.

I appreciate the effort and care you're putting into this.

Aug 08 2012

Piotr Szturmaj <bncrbme jadamspam.pl> writes:

Walter Bright wrote:
    auto result = file.byChunk(4096 * 1025).joiner.hash();

 The magic is that any input range that produces bytes could be used, and
 that byte producing input range can be hooked up to the input of any
 reducing function.

Suppose you have a callback that will give you blocks of bytes to hash. 
Blocks of bytes come from a socket, but not a blocking one. Instead, 
socket uses eventing mechanism (libevent) to get notifications about its 
readiness.

How would you use the hash API in this situation?

Aug 08 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 8/8/2012 3:12 AM, Piotr Szturmaj wrote:
 Walter Bright wrote:
    auto result = file.byChunk(4096 * 1025).joiner.hash();

 The magic is that any input range that produces bytes could be used, and
 that byte producing input range can be hooked up to the input of any
 reducing function.

 Suppose you have a callback that will give you blocks of bytes to hash. Blocks
 of bytes come from a socket, but not a blocking one. Instead, socket uses
 eventing mechanism (libevent) to get notifications about its readiness.

 How would you use the hash API in this situation?

Have the callback supply a range interface to call the hash with.

Aug 08 2012

"Martin Nowak" <dawg dawgfoto.de> writes:

--------
Hash hash;

void onData(void[] data)
{
     hash.put(data);
}

void main()
{
     hash.start();
     auto stream = new EventTcpStream("localhost", 80);
     stream.onData = &onData;
     hash.finish();
}
--------

 Have the callback supply a range interface to call the hash with.

That hardly works for event based programming without using coroutines.
It's the classical inversion-of-control dilemma of event based programming  
that forces you to save/restore your state with every event.

Aug 08 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 8/8/2012 12:14 PM, Martin Nowak wrote:
 That hardly works for event based programming without using coroutines.
 It's the classical inversion-of-control dilemma of event based programming that
 forces you to save/restore your state with every event.

See the discussion on using reduce().

Aug 08 2012

Johannes Pfau <nospam example.com> writes:

Am Wed, 08 Aug 2012 12:31:29 -0700
schrieb Walter Bright <newshound2 digitalmars.com>:

 On 8/8/2012 12:14 PM, Martin Nowak wrote:
 That hardly works for event based programming without using
 coroutines. It's the classical inversion-of-control dilemma of
 event based programming that forces you to save/restore your state
 with every event.

 
 See the discussion on using reduce().
 

I just don't understand it. Let's take the example by Martin Nowak and
port it to reduce: (The code added as comments is the same code for
hashes, working with the current API)

int state; //Hash state;

void onData(void[] data)
{
     state = reduce(state, data); //copy(data, state);
     //state = copy(data, state); //also valid, but not necessary
     //state.put(data); //simple way, doesn't work for ranges
}

void main()
{
     state = 0; //state.start();
     auto stream = new EventTcpStream("localhost", 80);
     stream.onData = &onData;
     //auto result = hash.finish();
}

There are only 2 differences:

1:
the order of the arguments passed to copy and reduce is swapped. This
kinda makes sense (if copy is interpreted as copyTo). Solution: Provide
a method copyInto with swapped arguments if consistency is really so
important.

2:
We need an additional call to finish. I can't say it often enough, I
don't see a sane way to avoid it. Hashes work on blocks, if you didn't
pass enough data finish will have to fill the rest of the block with
zeros before you can get the hash value. This operation can't be
undone. To get a valid result with every call to copy, you'd have to
always call finish. This is
* inefficient, you calculate intermediate values you don't need at all
* you have to copy the hashes state, as you can't continue hashing
  after finish has been called

and both, the state and the result would have to fit into the one value
(called seed for reduce). But then it's still not 100% consistent, as
reduce will return a single value, not some struct including internal
state.

Aug 09 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 8/9/2012 2:48 AM, Johannes Pfau wrote:
 Am Wed, 08 Aug 2012 12:31:29 -0700
 schrieb Walter Bright <newshound2 digitalmars.com>:

 On 8/8/2012 12:14 PM, Martin Nowak wrote:
 That hardly works for event based programming without using
 coroutines. It's the classical inversion-of-control dilemma of
 event based programming that forces you to save/restore your state
 with every event.

 See the discussion on using reduce().

 I just don't understand it. Let's take the example by Martin Nowak and
 port it to reduce: (The code added as comments is the same code for
 hashes, working with the current API)

 int state; //Hash state;

 void onData(void[] data)
 {
       state = reduce(state, data); //copy(data, state);
       //state = copy(data, state); //also valid, but not necessary
       //state.put(data); //simple way, doesn't work for ranges
 }

 void main()
 {
       state = 0; //state.start();
       auto stream = new EventTcpStream("localhost", 80);
       stream.onData = &onData;
       //auto result = hash.finish();
 }

 There are only 2 differences:

 1:
 the order of the arguments passed to copy and reduce is swapped. This
 kinda makes sense (if copy is interpreted as copyTo). Solution: Provide
 a method copyInto with swapped arguments if consistency is really so
 important.

Consistency is important so that disparate components can fit together.

 2:
 We need an additional call to finish. I can't say it often enough, I
 don't see a sane way to avoid it. Hashes work on blocks, if you didn't
 pass enough data finish will have to fill the rest of the block with
 zeros before you can get the hash value. This operation can't be
 undone. To get a valid result with every call to copy, you'd have to
 always call finish. This is
 * inefficient, you calculate intermediate values you don't need at all
 * you have to copy the hashes state, as you can't continue hashing
    after finish has been called

 and both, the state and the result would have to fit into the one value
 (called seed for reduce). But then it's still not 100% consistent, as
 reduce will return a single value, not some struct including internal
 state.

That isn't a problem, the internal state can be private data for the struct,
and 
the "finish value" can be the result of overloading operator() on that struct. 
I'm not sure if that would work, but it's worth investigating.

Aug 09 2012

deadalnix <deadalnix gmail.com> writes:

Le 09/08/2012 11:48, Johannes Pfau a �crit :
 Am Wed, 08 Aug 2012 12:31:29 -0700
 schrieb Walter Bright<newshound2 digitalmars.com>:

 On 8/8/2012 12:14 PM, Martin Nowak wrote:
 That hardly works for event based programming without using
 coroutines. It's the classical inversion-of-control dilemma of
 event based programming that forces you to save/restore your state
 with every event.

 See the discussion on using reduce().

 I just don't understand it. Let's take the example by Martin Nowak and
 port it to reduce: (The code added as comments is the same code for
 hashes, working with the current API)

 int state; //Hash state;

 void onData(void[] data)
 {
       state = reduce(state, data); //copy(data, state);
       //state = copy(data, state); //also valid, but not necessary
       //state.put(data); //simple way, doesn't work for ranges
 }

 void main()
 {
       state = 0; //state.start();
       auto stream = new EventTcpStream("localhost", 80);
       stream.onData =&onData;
       //auto result = hash.finish();
 }

 There are only 2 differences:

 1:
 the order of the arguments passed to copy and reduce is swapped. This
 kinda makes sense (if copy is interpreted as copyTo). Solution: Provide
 a method copyInto with swapped arguments if consistency is really so
 important.

 2:
 We need an additional call to finish. I can't say it often enough, I
 don't see a sane way to avoid it. Hashes work on blocks, if you didn't
 pass enough data finish will have to fill the rest of the block with
 zeros before you can get the hash value. This operation can't be
 undone. To get a valid result with every call to copy, you'd have to
 always call finish. This is
 * inefficient, you calculate intermediate values you don't need at all
 * you have to copy the hashes state, as you can't continue hashing
    after finish has been called

 and both, the state and the result would have to fit into the one value
 (called seed for reduce). But then it's still not 100% consistent, as
 reduce will return a single value, not some struct including internal
 state.

I'm pretty sure it is possible to pad and finish when a result is 
required without messing up the internal state.

Aug 16 2012

"Regan Heath" <regan netmail.co.nz> writes:

On Thu, 16 Aug 2012 21:25:55 +0100, deadalnix <deadalnix gmail.com> wrot=
e:

 Le 09/08/2012 11:48, Johannes Pfau a =E9crit :
 Am Wed, 08 Aug 2012 12:31:29 -0700
 schrieb Walter Bright<newshound2 digitalmars.com>:

 On 8/8/2012 12:14 PM, Martin Nowak wrote:
 That hardly works for event based programming without using
 coroutines. It's the classical inversion-of-control dilemma of
 event based programming that forces you to save/restore your state
 with every event.

 See the discussion on using reduce().

 I just don't understand it. Let's take the example by Martin Nowak an=


d
 port it to reduce: (The code added as comments is the same code for
 hashes, working with the current API)

 int state; //Hash state;

 void onData(void[] data)
 {
       state =3D reduce(state, data); //copy(data, state);
       //state =3D copy(data, state); //also valid, but not necessary
       //state.put(data); //simple way, doesn't work for ranges
 }

 void main()
 {
       state =3D 0; //state.start();
       auto stream =3D new EventTcpStream("localhost", 80);
       stream.onData =3D&onData;
       //auto result =3D hash.finish();
 }

 There are only 2 differences:

 1:
 the order of the arguments passed to copy and reduce is swapped. This=


 kinda makes sense (if copy is interpreted as copyTo). Solution: Provi=


de
 a method copyInto with swapped arguments if consistency is really so
 important.

 2:
 We need an additional call to finish. I can't say it often enough, I
 don't see a sane way to avoid it. Hashes work on blocks, if you didn'=


t
 pass enough data finish will have to fill the rest of the block with
 zeros before you can get the hash value. This operation can't be
 undone. To get a valid result with every call to copy, you'd have to
 always call finish. This is
 * inefficient, you calculate intermediate values you don't need at al=


l
 * you have to copy the hashes state, as you can't continue hashing
    after finish has been called

 and both, the state and the result would have to fit into the one val=


ue
 (called seed for reduce). But then it's still not 100% consistent, as=


 reduce will return a single value, not some struct including internal=


 state.

 I'm pretty sure it is possible to pad and finish when a result is  =

 required without messing up the internal state.

Without copying it?  AFAICR padding/finishing mutates the state, I mean,=
  =

that's the whole point of it.

R

-- =

Using Opera's revolutionary email client: http://www.opera.com/mail/

Aug 17 2012

deadalnix <deadalnix gmail.com> writes:

Le 08/08/2012 11:49, Walter Bright a �crit :
 On 8/8/2012 1:44 AM, Johannes Pfau wrote:
 Am Tue, 07 Aug 2012 17:39:15 -0700
 schrieb Walter Bright <newshound2 digitalmars.com>:

 On 8/7/2012 10:39 AM, Dmitry Olshansky wrote:
 std.hash.hash is a new module for Phobos defining an uniform
 interface for hashes and checksums. It also provides some useful
 helper functions to deal with this new API.

 The hash functions must use a Range interface, not a file interface.

 This is extremely important.

 I guess this is meant as a general statement and not specifically
 targeted at my std.hash proposal?

 Both.

 I'm a little confused as all hashes already are OutputRanges in my
 proposal. It's probably not explicit enough in the documentation, but
 it's mentioned in one example and in the documentation for 'put';

 It should accept an input range. But using an Output Range confuses me.
 A hash function is a reduce algorithm - it accepts a sequence of input
 values, and produces a single value. You should be able to write code like:

 ubyte[] data;
 ...
 auto crc = data.crc32();

 For example, the hash example given is:

 foreach (buffer; file.byChunk(4096 * 1024))
 hash.put(buffer);
 auto result = hash.finish();

 Instead it should be something like:

 auto result = file.byChunk(4096 * 1025).joiner.hash();

 The magic is that any input range that produces bytes could be used, and
 that byte producing input range can be hooked up to the input of any
 reducing function.

 The use of a member finish() is not what any other reduce algorithm has,
 and so the interface is not a general component interface.

 I know the documentation on ranges in Phobos is incomplete and confusing.

 I appreciate the effort and care you're putting into this.

That is a really good point. +1

Aug 08 2012

"Martin Nowak" <dawg dawgfoto.de> writes:

 It should accept an input range. But using an Output Range confuses me.  
 A hash function is a reduce algorithm - it accepts a sequence of input  
 values, and produces a single value. You should be able to write code  
 like:

    ubyte[] data;
    ...
    auto crc = data.crc32();

 For example, the hash example given is:

    foreach (buffer; file.byChunk(4096 * 1024))
        hash.put(buffer);
    auto result = hash.finish();

 Instead it should be something like:

    auto result = file.byChunk(4096 * 1025).joiner.hash();

I think sha1Of/digest!SHA1 should do this.
It's also important to have a stateful hash implementation that can be  
updated incrementally, e.g. from a callback.

Aug 08 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 8/8/2012 5:13 AM, Martin Nowak wrote:
 It should accept an input range. But using an Output Range confuses me. A hash
 function is a reduce algorithm - it accepts a sequence of input values, and
 produces a single value. You should be able to write code like:

    ubyte[] data;
    ...
    auto crc = data.crc32();

 For example, the hash example given is:

    foreach (buffer; file.byChunk(4096 * 1024))
        hash.put(buffer);
    auto result = hash.finish();

 Instead it should be something like:

    auto result = file.byChunk(4096 * 1025).joiner.hash();

 I think sha1Of/digest!SHA1 should do this.
 It's also important to have a stateful hash implementation that can be updated
 incrementally, e.g. from a callback.

Take a look at the reduce function in 
http://dlang.org/phobos/std_algorithm.html#reduce

It has provision for an initial state that can be the current running total.

Aug 08 2012

Johannes Pfau <nospam example.com> writes:

Am Wed, 08 Aug 2012 11:40:10 -0700
schrieb Walter Bright <newshound2 digitalmars.com>:

 
 Take a look at the reduce function in 
 http://dlang.org/phobos/std_algorithm.html#reduce
 
 It has provision for an initial state that can be the current running
 total.
 

This can only work if the final state is valid as an initial state.
This is just not true for some hash algorithms.

---
auto sum = reduce!("a + b")(0, range);
auto sum2 = reduce!("a + b")(sum, range2);
---

---
MD5 hash;
hash.start();

auto sum = copy(range, hash);
auto sum2 = copy(range2, sum);

auto result = hash.finish();
---

No where's the difference, except that for hashes the context ('hash')
has to be setup and finished manually?

Aug 08 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 8/8/2012 12:08 PM, Johannes Pfau wrote:
 No where's the difference, except that for hashes the context ('hash')
 has to be setup and finished manually?


The idea is to have hash act like a component - not with special added code the 
user has to write.

In this case, it needs to work like a reduce algorithm, because it is a reduce 
algorithm. Need to find a way to make this work.

Aug 08 2012

Johannes Pfau <nospam example.com> writes:

Am Wed, 08 Aug 2012 12:27:39 -0700
schrieb Walter Bright <newshound2 digitalmars.com>:

 On 8/8/2012 12:08 PM, Johannes Pfau wrote:
 No where's the difference, except that for hashes the context
 ('hash') has to be setup and finished manually?

 
 
 The idea is to have hash act like a component - not with special
 added code the user has to write.

Please explain that. Nobody's going to simply replace a call to reduce
with a call to a fictional 'hashReduce'. Why is it so important that
reduce + hash API's match 100% even if the API doesn't fit hashes?

 
 In this case, it needs to work like a reduce algorithm, because it is
 a reduce algorithm. Need to find a way to make this work.
 

We could do some ugly, performance killing hacks to make it possible,
but I just don't see why this is necessary.

----
struct InterHash
{
    MD5 ctx;
    ubyte[16] finished;
    alias finished this;
}

InterHash hashReduce(Range)(Range data)
{
    InterHash hash;
    hash.ctx.start();
    return hashReduce(hash, data);
}

InterHash hashReduce(Range)(InterHash hash, Range data)
{
    copy(data, hash);
    auto ctxCopy = hash.ctx;
    hash.finished = ctxCopy.finish();
    return hash;
}

auto a = hashReduce([1,2,3]);
auto b = hashReduce(a, [3,4]);
----

However, a and b are still not really valid hash values. I just don't
see why we should force an interface onto hashes which just doesn't fit.

Aug 09 2012

"David Nadlinger" <see klickverbot.at> writes:

On Wednesday, 8 August 2012 at 19:27:54 UTC, Walter Bright wrote:
 The idea is to have hash act like a component - not with 
 special added code the user has to write.

Sorry, but I think this is a meaningless statement without 
specifying what kind of interface the component should adhere to. 
In my opinion, the proposed std.hash design would be a perfectly 
valid interface for »accumulate stuff and at some point get a 
result«-type components.

 In this case, it needs to work like a reduce algorithm, because 
 it is a reduce algorithm. Need to find a way to make this work.

Hash functions are _not_ analogous to reduce(), because the 
operation performed by reduce() is stateless, whereas hash 
functions generally have some internal state.

»Continuing« a reduce() operation by repeatedly calling it with 
the last partial result as the starting value is only possible 
because there is no additional state to carry over. To make this 
work with hashes, you'd have to return something encapsulating 
the internal state from your hash function. But then, you again 
need to obtain the actual result from that return value from that 
result somehow, which defeats the original intent of making it 
work like reduce – and incidentally is what finish() does.

David

Aug 09 2012

"Regan Heath" <regan netmail.co.nz> writes:

On Thu, 09 Aug 2012 10:59:47 +0100, David Nadlinger <see klickverbot.at>=
  =

wrote:

 On Wednesday, 8 August 2012 at 19:27:54 UTC, Walter Bright wrote:
 The idea is to have hash act like a component - not with special adde=


d  =

 code the user has to write.

 Sorry, but I think this is a meaningless statement without specifying =

 =

 what kind of interface the component should adhere to. In my opinion, =

 =

 the proposed std.hash design would be a perfectly valid interface for =

 =

 =C2=BBaccumulate stuff and at some point get a result=C2=AB-type compo=

nents.
 In this case, it needs to work like a reduce algorithm, because it is=


 a  =

 reduce algorithm. Need to find a way to make this work.

 Hash functions are _not_ analogous to reduce(), because the operation =

 =

 performed by reduce() is stateless, whereas hash functions generally  =

 have some internal state.

 =C2=BBContinuing=C2=AB a reduce() operation by repeatedly calling it w=

ith the last  =

 partial result as the starting value is only possible because there is=

  =

 no additional state to carry over. To make this work with hashes, you'=

d  =

 have to return something encapsulating the internal state from your ha=

sh  =

 function.

This isn't necessarily a problem.

 But then, you again need to obtain the actual result from that return =

 =

 value from that result somehow, which defeats the original intent of  =

 making it work like reduce =E2=80=93 and incidentally is what finish()=

 does.

But, this is a problem.  finish in most cases pads the remaining data to=
 a  =

boundary of the internal state size, then completes one more iteration o=
f  =

the algorithm to produce the final result.

So, like David has just said, you can have 1 or the other.  Either you c=
an  =

chain hashreduce operations together, but you have to perform a manual  =

finish step to get the actual result, or you cannot chain hashreduce  =

operations together and finish is done automatically when the input rang=
e  =

is consumed.

Wild thought.. and I have to admit I've not followed the proposed or  =

suggested API closely, nor have I used ranges extensively so this may no=
t  =

be possible..

If the range/hash object stores the current state and returns this as th=
e  =

result of hashreduce, it would be chainable.  If it also had a "Digest" =
 =

property/method which performed finish on a /temporary copy/ of the stat=
e  =

it would almost be as automatic as reduce.  There would still be a manua=
l  =

step to get the result, but it would be analogous to calling toString on=
  =

any range object to output it's "value".  The Digest property/method wou=
ld  =

not modify the internal state, and could be called at any time between  =

(not sure there is a point to this) or after chained hashreduce operatio=
ns.

R

-- =

Using Opera's revolutionary email client: http://www.opera.com/mail/

Aug 09 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 09-Aug-12 14:15, Regan Heath wrote:
 On Thu, 09 Aug 2012 10:59:47 +0100, David Nadlinger <see klickverbot.at>

 If the range/hash object stores the current state and returns this as
 the result of hashreduce, it would be chainable.  If it also had a
 "Digest" property/method which performed finish on a /temporary copy/ of
 the state it would almost be as automatic as reduce.  There would still
 be a manual step to get the result, but it would be analogous to calling
 toString on any range object to output it's "value".  The Digest
 property/method would not modify the internal state, and could be called
 at any time between (not sure there is a point to this) or after chained
 hashreduce operations.

struct ShaState
{
	...
	alias ubyte[16] getDidgest();
}

Problem is: too much magic, spoils auto x = reduce(...); idiom.

To be brutally honest I don't see what in std.hash doesn't fit "range 
component" model:
	- it works on input ranges via convenient adaptor (or would work soon)
	- it has output _ranges_.

What's not to like about it? That it doesn't fit general reduce 
algorithm? I'm not convinced it's important. In the same vane you may 
try to shoehorn symmetric cyphers to follow map interface because 
conceptually it does 1:1 conversion.

Now important thing: just look at other output ranges.
They either require extra call (see Appender .data) or horrendously slow 
and ugly see lockingTextWriter/Reader.

-- 
Dmitry Olshansky

Aug 09 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 09-Aug-12 20:32, Dmitry Olshansky wrote:
 On 09-Aug-12 14:15, Regan Heath wrote:
 On Thu, 09 Aug 2012 10:59:47 +0100, David Nadlinger <see klickverbot.at>

 If the range/hash object stores the current state and returns this as
 the result of hashreduce, it would be chainable.  If it also had a
 "Digest" property/method which performed finish on a /temporary copy/ of
 the state it would almost be as automatic as reduce.  There would still
 be a manual step to get the result, but it would be analogous to calling
 toString on any range object to output it's "value".  The Digest
 property/method would not modify the internal state, and could be called
 at any time between (not sure there is a point to this) or after chained
 hashreduce operations.

 struct ShaState
 {
      ...
      alias ubyte[16] getDidgest();
 }

Too fast.. should have been:
	ubyte[16] getDidgest();
	alias getDigest this;

-- 
Dmitry Olshansky

Aug 09 2012

"David Nadlinger" <see klickverbot.at> writes:

On Thursday, 9 August 2012 at 16:37:57 UTC, Dmitry Olshansky 
wrote:
 Too fast.. should have been:
 	ubyte[16] getDidgest();
 	alias getDigest this;

I have been thinking about using AliasThis as well, but the 
problem is that precisely the use case this is meant to enable 
(i.e. »snapping together components«, like Walter said) tends 
to get broken in subtle ways due to the use of template 
functions/type inference.

David

Aug 09 2012

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Thursday, August 09, 2012 18:46:59 David Nadlinger wrote:
 On Thursday, 9 August 2012 at 16:37:57 UTC, Dmitry Olshansky
 
 wrote:
 Too fast.. should have been:
 ubyte[16] getDidgest();
 alias getDigest this;

 
 I have been thinking about using AliasThis as well, but the
 problem is that precisely the use case this is meant to enable
 (i.e. »snapping together components«, like Walter said) tends
 to get broken in subtle ways due to the use of template
 functions/type inference.

Yeah. alias this can be very useful, but it's very dangerous when it comes to 
templated functions, because it's very easy to make assumptions when writing 
those functions which hold great when using the actual type but do funny 
things when alias this comes into play. And unfortunately, I don't think that 
it's something that's at all well understood. It's probably one of those 
things that we as a community need to get a better grip on. Just imagine how 
bad it would be though if D allowed as many implicit conversions as C++ 
does...

- Jonathan M Davis

Aug 09 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 8/9/2012 2:59 AM, David Nadlinger wrote:
 On Wednesday, 8 August 2012 at 19:27:54 UTC, Walter Bright wrote:
 The idea is to have hash act like a component - not with special added code
 the user has to write.

 Sorry, but I think this is a meaningless statement without specifying what kind
 of interface the component should adhere to. In my opinion, the proposed
 std.hash design would be a perfectly valid interface for »accumulate stuff and
 at some point get a result«-type components.

It is not a meaningless statement in that components have a predictable set of 
methods and properties. That's all a range is. Requiring extra methods means 
there's either an error in the component interface design or an error in the 
component instance design.

What I'm trying to get away from is the C library style where every library 
lives in its own world, and when the user tries to connect them he's got a fair 
amount of work to do building a scaffolding between them.

With component programming, the interfaces between disparate things is 
standardized. It does not have unique methods for different instances. For 
example, one component has a finish() method, another has a getResult() method, 
and a third has no method at all. This situation I wish to avoid.


 In this case, it needs to work like a reduce algorithm, because it is a reduce
 algorithm. Need to find a way to make this work.

 Hash functions are _not_ analogous to reduce(), because the operation performed
 by reduce() is stateless, whereas hash functions generally have some internal
 state.

 »Continuing« a reduce() operation by repeatedly calling it with the last
partial
 result as the starting value is only possible because there is no additional
 state to carry over. To make this work with hashes, you'd have to return
 something encapsulating the internal state from your hash function.

Wouldn't that be simply the handle to the hash?

 But then,
 you again need to obtain the actual result from that return value from that
 result somehow, which defeats the original intent of making it work like reduce
 – and incidentally is what finish() does.

I understand what finish() does. The interesting part is trying to figure a way 
out of needing that method. Or perhaps the reduce component design is incorrect.

Aug 09 2012

travert phare.normalesup.org (Christophe Travert) writes:

If a has is a range, it's an output range, because it's something you   
fee data to. Output range have only one method: put. Johannes used this 
method. But it's not sufficient, you need something to start and to 
finish the hash.

To bring consistency in the library, we should not remove this start and 
finish method. We should make all output range of the library use the 
same functions.

In the library, we have really few output range. We have writable input 
range and we have Appender. Is there more? There should be files, 
socket, maybe even signal, but IIRC these don't implement output range 
at the moment. What did I miss?

Appender doesn't use a finish method, but we have to 'get the 
result' of the appender, and for this we use appender.data. This name is 
not appropriate for generically getting a result or terminating an 
output range.

So We need a name that fits most output range use. start/finish sounds 
not bad. open/close fits files and socket, but many not all output 
ranges. Relying solely on constructors, opCall or alias this seems 
dangerous to me.

-- 
Christophe

Aug 09 2012

Walter Bright <newshound2 digitalmars.com> writes:

See the new thread Andrei started entitled "finish function for output ranges". 
I think this discussion has clearly discovered a shortcoming in the current 
range design, and Andrei has a proposed solution.

Aug 11 2012

"Kagamin" <spam here.lot> writes:

On Thursday, 9 August 2012 at 09:59:48 UTC, David Nadlinger wrote:
 In this case, it needs to work like a reduce algorithm, 
 because it is a reduce algorithm. Need to find a way to make 
 this work.

 Hash functions are _not_ analogous to reduce(), because the 
 operation performed by reduce() is stateless, whereas hash 
 functions generally have some internal state.

An example of stateless hash in .net:
http://msdn.microsoft.com/en-us/library/xa627k19.aspx

Aug 15 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 15-Aug-12 11:41, Kagamin wrote:
 On Thursday, 9 August 2012 at 09:59:48 UTC, David Nadlinger wrote:
 In this case, it needs to work like a reduce algorithm, because it is
 a reduce algorithm. Need to find a way to make this work.

 Hash functions are _not_ analogous to reduce(), because the operation
 performed by reduce() is stateless, whereas hash functions generally
 have some internal state.

 An example of stateless hash in .net:
 http://msdn.microsoft.com/en-us/library/xa627k19.aspx

AFAIK it'a method of HashAlgorithm Object.

http://msdn.microsoft.com/en-us/library/c06s9c55

It also includes TransformBlock & TransformFinalBlock. It does contain 
state of course.

Aug 15 2012

"Kagamin" <spam here.lot> writes:

On Wednesday, 15 August 2012 at 08:13:27 UTC, Dmitry Olshansky 
wrote:
 AFAIK it'a method of HashAlgorithm Object.

It's a minor design detail, see the example: the method is called 
on each file without any explicit preparations and without calls 
to methods like TransformBlock. That's how stateless computation 
usually looks - it's just done and that's all.

Aug 15 2012

"David Nadlinger" <see klickverbot.at> writes:

On Wednesday, 15 August 2012 at 08:17:14 UTC, Kagamin wrote:
 On Wednesday, 15 August 2012 at 08:13:27 UTC, Dmitry Olshansky 
 wrote:
 AFAIK it'a method of HashAlgorithm Object.

 It's a minor design detail, see the example […]

No, it's not a »minor design detail«, at least not regarding 
what has been the topic of the discussion here – you can always 
provide a simple wrapper function in the proposed design and call 
it »stateless« as well (in fact, an implementation has already 
been posted, IIRC).

The point is that the ability to execute a hashing operation 
block by block is necessary, and that this operation is not 
analogous to reduce() because it potentially needs internal state.

David

Aug 15 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 15-Aug-12 12:17, Kagamin wrote:
 On Wednesday, 15 August 2012 at 08:13:27 UTC, Dmitry Olshansky wrote:
 AFAIK it'a method of HashAlgorithm Object.

 It's a minor design detail, see the example: the method is called on
 each file without any explicit preparations and without calls to methods
 like TransformBlock. That's how stateless computation usually looks -
 it's just done and that's all.

Brrr. It's how convenience wrapper works :)

And I totally expect this to call the same code and keep the same state 
during the work.

E.g. see std.digest.digest functions digest or hexDigest you could call 
it stateless in the same vane.

Aug 15 2012

"Kagamin" <spam here.lot> writes:

On Wednesday, 15 August 2012 at 08:25:51 UTC, Dmitry Olshansky 
wrote:
 Brrr. It's how convenience wrapper works :)

 And I totally expect this to call the same code and keep the 
 same state during the work.

 E.g. see std.digest.digest functions digest or hexDigest you 
 could call it stateless in the same vane.

Well there was a wish for stateless hash, Walter even posted the 
required interface:
auto result = file.byChunk(4096 * 1025).joiner.hash();

I just pointed out, that possibly stateful implementation doesn't 
prevent stateless interface. Can one even say that the 
implementation is stateful given just a stateless interface?

One can even call reduce stateful because it does keep track of 
the result which is or a part of its state.

Aug 15 2012

"David Nadlinger" <see klickverbot.at> writes:

On Wednesday, 15 August 2012 at 08:45:35 UTC, Kagamin wrote:
 I just pointed out, that possibly stateful implementation 
 doesn't prevent stateless interface. Can one even say that the 
 implementation is stateful given just a stateless interface?

And our point is that such an interface is trivial to implement 
over a »stateful« interface, and even already exists. Maybe you 
want to peruse some of the old posts?

David

Aug 15 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 15-Aug-12 12:45, Kagamin wrote:
 On Wednesday, 15 August 2012 at 08:25:51 UTC, Dmitry Olshansky wrote:
 Brrr. It's how convenience wrapper works :)

 And I totally expect this to call the same code and keep the same
 state during the work.

 E.g. see std.digest.digest functions digest or hexDigest you could
 call it stateless in the same vane.

 Well there was a wish for stateless hash, Walter even posted the
 required interface:
 auto result = file.byChunk(4096 * 1025).joiner.hash();

auto result = file.byChunk(4096 * 1025).joiner.digest();

and is already supported in the proposal, peek at updated docs.

There is no need for additional methods and whatnot.

-- 
Olshansky Dmitry

Aug 15 2012

"David Nadlinger" <see klickverbot.at> writes:

On Wednesday, 15 August 2012 at 07:41:20 UTC, Kagamin wrote:
 On Thursday, 9 August 2012 at 09:59:48 UTC, David Nadlinger 
 wrote:
 Hash functions are _not_ analogous to reduce(), because the 
 operation performed by reduce() is stateless, whereas hash 
 functions generally have some internal state.

 An example of stateless hash in .net:
 http://msdn.microsoft.com/en-us/library/xa627k19.aspx

http://msdn.microsoft.com/en-us/library/system.security.cryptography.hashalgorithm.state

David

Aug 15 2012

"Kagamin" <spam here.lot> writes:

On Wednesday, 15 August 2012 at 08:17:01 UTC, David Nadlinger 
wrote:
 On Wednesday, 15 August 2012 at 07:41:20 UTC, Kagamin wrote:
 On Thursday, 9 August 2012 at 09:59:48 UTC, David Nadlinger 
 wrote:
 Hash functions are _not_ analogous to reduce(), because the 
 operation performed by reduce() is stateless, whereas hash 
 functions generally have some internal state.

 An example of stateless hash in .net:
 http://msdn.microsoft.com/en-us/library/xa627k19.aspx

 http://msdn.microsoft.com/en-us/library/system.security.cryptography.hashalgorithm.state

 David

Ok, but HashAlgorithm still supports stateless interface which 
consists of a single method with a couple of overloads, the 
example in the ComputeHash article speaks for itself.

Aug 15 2012

Johannes Pfau <nospam example.com> writes:

Am Wed, 08 Aug 2012 02:49:00 -0700
schrieb Walter Bright <newshound2 digitalmars.com>:

 
 It should accept an input range. But using an Output Range confuses
 me. A hash function is a reduce algorithm - it accepts a sequence of
 input values, and produces a single value. You should be able to
 write code like:
 
    ubyte[] data;
    ...
    auto crc = data.crc32();

auto crc = crc32Of(data);
auto crc = data.crc32Of(); //ufcs

This doesn't wok with every InputRange and this needs to be fixed.
That's a quite simple fix (max 10 lines of code, one new overload) and
not a inherent problem of the API (see below for more).

 
 For example, the hash example given is:
 
    foreach (buffer; file.byChunk(4096 * 1024))
        hash.put(buffer);
    auto result = hash.finish();
 
 Instead it should be something like:
 
    auto result = file.byChunk(4096 * 1025).joiner.hash();

But it also says this:
//As digests implement OutputRange, we could use std.algorithm.copy
//Let's do it manually for now

You can basically do this with a range interface in 1 line:
----
import std.algorithm : copy;

auto result = copy(file.byChunk(4096 * 1024), hash).finish();
----
or with ufcs:
----
auto result = file.byChunk(4096 * 1024).copy(hash).finish();
----

OK, you have to initialize hash and you have to call finish. With a new
overload for digest it's as simple as this:
----
auto result = file.byChunk(4096 * 1024).digest!CRC32();
auto result = file.byChunk(4096 * 1024).crc32Of(); //with alias
----

The digests are OutputRanges, you can write data to them. There's
absolutely no need to make them InputRanges as they only produce 1
value, and the hash sum is produced at once, so there's no way to
receive the result in a partial way. A digest is very similar to
Appender and it's .data property in this regard.

The put function could accept an InputRange, but I think there was a
thread recently which said this is evil for OutputRanges as the same
feature can be achieved with copy.

There's also no big benefit in doing it that way. If your InputRange is
really unbuffered you could avoid double buffering. But then you
transfer data byte by byte which will be horribly slow.
If your InputRange has an internal buffer copy should just copy from
that internal buffer to the 64 byte buffer used inside the digest
implementation.
This double buffering could only be avoided if the put function
accepted an InputRange and could supply a buffer for that InputRange so
the InputRange could write directly into the 64 byte buffer. But
there's nothing like that in phobos, so this is all speculation.

(Also the internal buffer is only used for the first 64 bytes (or less)
of the supplied data. The rest is processed without copying. It could
probably be optimized so that there's absolutely no copying as long as
the input buffer length is a multiple of 64)

 
 The magic is that any input range that produces bytes could be used,
 and that byte producing input range can be hooked up to the input of
 any reducing function.

See above. Every InputRange with byte element type does work. You just
have to use copy.

 
 The use of a member finish() is not what any other reduce algorithm
 has, and so the interface is not a general component interface.

It's a struct with state, not a simple reduce function so it needs that
finish member. It works like that way in every other language (and this
is not cause those languages don't have ranges; streams and iterators


Let's take a real world example: You want to download a huge file with
std.net.curl and hash it on the fly. Completely reading into a buffer
is not possible (large file!). Now std.net.curl has a callback
interface (which is forced on us by libcurl). How would you map that
into an InputRange? (The byLine range in std.net.curl is eager,
byLineAsync needs an additional thread). A newbie trying to do that
will despair as it would work just fine in every other language, but
D forces that InputRange interface.

Implementing it as an OutputRange is much better. The described
scenario works fine and hashing an InputRange also works fine - just
use copy. OutputRange is much more universal for this usecase.

However, I do agree digest!Hash, md5Of, sha1Of should have an additional
overload which takes a InputRange. It would be implemented with copy
and be a nice convenience function.

 
 I know the documentation on ranges in Phobos is incomplete and
 confusing.

Especially for copy, as the documentation doesn't indicate the line I
posted could work in any way ;-)

Aug 08 2012

Johannes Pfau <nospam example.com> writes:

Am Wed, 8 Aug 2012 17:50:33 +0200
schrieb Johannes Pfau <nospam example.com>:

 However, I do agree digest!Hash, md5Of, sha1Of should have an
 additional overload which takes a InputRange. It would be implemented
 with copy and be a nice convenience function.

I implemented the function, it's actually quite simple:
----
digestType!Hash digestRange(Hash, Range)(Range data) if(isDigest!Hash &&
    isInputRange!Range && __traits(compiles,
    digest!Hash(ElementType!(Range).init)))
{
    Hash hash;
    hash.start();
    copy(data, hash);
    return hash.finish();
}
----

but I don't know how make it an overload. See thread "overloading a
function taking a void[][]" in D.learn for details.

Aug 08 2012

travert phare.normalesup.org (Christophe Travert) writes:

Johannes Pfau , dans le message (digitalmars.D:174478), a écrit :
 but I don't know how make it an overload. See thread "overloading a
 function taking a void[][]" in D.learn for details.

Don't overload the function taking a void[][]. Remplace it. void[][] is 
a range of void[].

Aug 08 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 8/8/2012 9:47 AM, Johannes Pfau wrote:
 Am Wed, 8 Aug 2012 17:50:33 +0200
 schrieb Johannes Pfau <nospam example.com>:

 However, I do agree digest!Hash, md5Of, sha1Of should have an
 additional overload which takes a InputRange. It would be implemented
 with copy and be a nice convenience function.

 I implemented the function, it's actually quite simple:
 ----
 digestType!Hash digestRange(Hash, Range)(Range data) if(isDigest!Hash &&
      isInputRange!Range && __traits(compiles,
      digest!Hash(ElementType!(Range).init)))
 {
      Hash hash;
      hash.start();
      copy(data, hash);
      return hash.finish();
 }
 ----

The finish() should be implicit when the range ends.

 but I don't know how make it an overload. See thread "overloading a
 function taking a void[][]" in D.learn for details.

I don't know what you mean, it takes a range, not a void[][] as input.

Aug 08 2012

Johannes Pfau <nospam example.com> writes:

Am Wed, 08 Aug 2012 11:47:38 -0700
schrieb Walter Bright <newshound2 digitalmars.com>:

 On 8/8/2012 9:47 AM, Johannes Pfau wrote:
 Am Wed, 8 Aug 2012 17:50:33 +0200
 schrieb Johannes Pfau <nospam example.com>:

 However, I do agree digest!Hash, md5Of, sha1Of should have an
 additional overload which takes a InputRange. It would be
 implemented with copy and be a nice convenience function.

 I implemented the function, it's actually quite simple:
 ----
 digestType!Hash digestRange(Hash, Range)(Range data)
 if(isDigest!Hash && isInputRange!Range && __traits(compiles,
      digest!Hash(ElementType!(Range).init)))
 {
      Hash hash;
      hash.start();
      copy(data, hash);
      return hash.finish();
 }
 ----

 
 The finish() should be implicit when the range ends.
 
 but I don't know how make it an overload. See thread "overloading a
 function taking a void[][]" in D.learn for details.

 
 I don't know what you mean, it takes a range, not a void[][] as input.
 
 

So the post in D.learn for a detailed description. Yes the code I
posted takes a range, but digest (as it is now) takes void[][] to
accept all kind of types _without_ template bloat. The difficulty is to
combine those two overloads without causing unnecessary template bloat.

Aug 08 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 8/8/2012 12:05 PM, Johannes Pfau wrote:
 So the post in D.learn for a detailed description. Yes the code I
 posted takes a range, but digest (as it is now) takes void[][] to
 accept all kind of types _without_ template bloat. The difficulty is to
 combine those two overloads without causing unnecessary template bloat.

Have the templated version with overloads simply call the single version (with
a 
different name) with void[][].

Aug 08 2012

Johannes Pfau <nospam example.com> writes:

Am Wed, 08 Aug 2012 12:30:31 -0700
schrieb Walter Bright <newshound2 digitalmars.com>:

 On 8/8/2012 12:05 PM, Johannes Pfau wrote:
 So the post in D.learn for a detailed description. Yes the code I
 posted takes a range, but digest (as it is now) takes void[][] to
 accept all kind of types _without_ template bloat. The difficulty
 is to combine those two overloads without causing unnecessary
 template bloat.

 
 Have the templated version with overloads simply call the single
 version (with a different name) with void[][].
 
 

Well that's possible, but I don't like the template bloat it causes.
AFAIK a function taking a void[][] is just one instance, with that
redirecting approach we'll have one instance per array type. This seems
unnecessary (and maybe the compiler can merge such template instances
in the future), but I can't seem to find a way to avoid it, so we'll
probably have to live with that.

http://dpaste.dzfl.pl/f86717f7

I guess a second function digestRange is not acceptable?

Aug 09 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 8/9/2012 2:05 AM, Johannes Pfau wrote:
 I guess a second function digestRange is not acceptable?

It's more the user API that matters, not how it works under the hood.

Aug 09 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 8/9/2012 2:05 AM, Johannes Pfau wrote:
 http://dpaste.dzfl.pl/f86717f7

The Range argument - is it an InputRange, an OutputRange? While it's just a
type 
name, the name should reflect what kind of range it is from the menagerie of 
ranges in std.range.

Aug 09 2012

Johannes Pfau <nospam example.com> writes:

Am Thu, 09 Aug 2012 02:13:10 -0700
schrieb Walter Bright <newshound2 digitalmars.com>:

 On 8/9/2012 2:05 AM, Johannes Pfau wrote:
 http://dpaste.dzfl.pl/f86717f7

 
 The Range argument - is it an InputRange, an OutputRange? While it's
 just a type name, the name should reflect what kind of range it is
 from the menagerie of ranges in std.range.
 

It's an InputRange (of bytes) or an InputRange of some byte buffer
(ElementType == ubyte[] || ElementType == ubyte[num]). We get the
second version for free, so I just included it ;-)

The documentation would have to make that clear of course. I could also
change the name, it's just a proof of concept right now.

Aug 09 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 8/9/12 5:05 AM, Johannes Pfau wrote:
 Well that's possible, but I don't like the template bloat it causes.

What have you measured, and what is your dislike based upon?

The library function must be generic. Then users worried about bloating 
may use it with a limited number of types.

A digest function only dealing with void[void[]] is unacceptable.


Andrei

Aug 09 2012

Johannes Pfau <nospam example.com> writes:

Am Thu, 09 Aug 2012 08:48:37 -0400
schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:

 On 8/9/12 5:05 AM, Johannes Pfau wrote:
 Well that's possible, but I don't like the template bloat it causes.

 
 What have you measured, and what is your dislike based upon?

What annoys me is that as long the function only supported arrays, it
didn't need templates _at all_. So template bloat for arrays = 0. But
adding range support means the version dealing with arrays now has to
be a template as well(which is probably a bug, can't overload template
and non template function) and will produce extra code for every array
type. I just think adding range support shouldn't cause the array code
to change in any way.

But that's why I said I don't like it, not that it's a show stopper.
The overhead is probably neglectable, but in theory it shouldn't be
there at all.

 
 The library function must be generic. Then users worried about
 bloating may use it with a limited number of types.
 
 A digest function only dealing with void[void[]] is unacceptable.

Sure I agree that range support is necessary, I just forgot to
implement it initially. I'm not against range support / ranges in
general / the template instances needed for ranges. I just dislike
that it affects the array implementation in this specific case.

Aug 09 2012

Jacob Carlborg <doob me.com> writes:

On 2012-08-09 15:02, Johannes Pfau wrote:

 What annoys me is that as long the function only supported arrays, it
 didn't need templates _at all_. So template bloat for arrays = 0. But
 adding range support means the version dealing with arrays now has to
 be a template as well(which is probably a bug, can't overload template
 and non template function) and will produce extra code for every array
 type. I just think adding range support shouldn't cause the array code
 to change in any way.

A workaround is to make the non-template function to a template, with no 
arguments. This should only cause one instantiation:

void foo (T) (T t) if (/* some constraint making it not match "int" */);
void foo () (int x);

-- 
/Jacob Carlborg

Aug 09 2012

Denis Shelomovskij <verylonglogin.reg gmail.com> writes:

The question about module names.

Is it supposed that e.g. std.hash.crc module will contain many CRC 
implementations, not only one CRC-32? If not, it will be better to call 
it std.hash.crc32 because other CRC variants are also in use. Or even 
std.hash.crc.crc32.

Same with std.hash.sha and std.hash.md modules.

-- 
Денис В. Шеломовский
Denis V. Shelomovskij

Aug 08 2012

Johannes Pfau <nospam example.com> writes:

Am Wed, 08 Aug 2012 11:48:54 +0400
schrieb Denis Shelomovskij <verylonglogin.reg gmail.com>:

 The question about module names.
 
 Is it supposed that e.g. std.hash.crc module will contain many CRC 
 implementations, not only one CRC-32? If not, it will be better to
 call it std.hash.crc32 because other CRC variants are also in use. Or
 even std.hash.crc.crc32.
 
 Same with std.hash.sha and std.hash.md modules.
 

They're supposed to contain more implementations in the future. I
basically just wrote the new API, then ported what we already had in
phobos (and in a pull request by redstar) to the new API. The rest of
the SHA functions could probably follow soon as Piotr Szturmaj already
has a properly licensed implementation. The Boost project has a
templated CRC implementation, we could port that as well.

One problem is that the current MD5 and SHA implementations have been
manually optimized for dmds inliner though, so those probably won't be
replaced by a more generic version until we have benchmarks in phobos
which ensure that there won't be performance regressions.

Aug 08 2012

travert phare.normalesup.org (Christophe Travert) writes:

I'm not familiar with hash functions in general.

I think the core of std.hash is the digest function:

digestType!Hash digest(Hash)(scope const(void[])[] data...) 
if(isDigest!Hash)
{
    Hash hash;
    hash.start();
    foreach(datum; data)
        hash.put(cast(const(ubyte[]))datum);
    return hash.finish();
}

That seems to be too restrictive: you can only provide a void[][] or one 
or several void[], but you should be able to give it any range of 
void[] or of ubyte[] like:

auto dig = file.byChunk.digest!MD5;

That's the point of the range interface.

this can be done by templatizing the function, something like 
(untested):

template digest(Hash) if(isDigest!Hash)
{
  auto digest(R)(R data)
    if (isInputRange!R &&  is(ElementType!R : void[])
  {
    Hash hash;
    hash.start();
    data.copy(hash);
    return hash.finish();
  }
}

An interesting overload for range of single ubyte could be provided. 
This overload would fill a buffer of with data from this range,  
feed the hash, and start again.

Aug 08 2012

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Tuesday, 7 August 2012 at 17:39:50 UTC, Dmitry Olshansky wrote:
 std.hash.hash is a new module for Phobos defining an uniform 
 interface for hashes and checksums. It also provides some 
 useful helper functions to deal with this new API.

Is it too late to ask to include MurmurHash 2 and/or 3? It's 
public domain, and great for things like hash tables.

You can steal some code from here:
https://github.com/CyberShadow/ae/blob/master/utils/digest.d
https://github.com/CyberShadow/ae/blob/master/utils/digest_murmurhash3.d

Aug 09 2012

Johannes Pfau <nospam example.com> writes:

Am Thu, 09 Aug 2012 11:32:34 +0200
schrieb "Vladimir Panteleev" <vladimir thecybershadow.net>:

 On Tuesday, 7 August 2012 at 17:39:50 UTC, Dmitry Olshansky wrote:
 std.hash.hash is a new module for Phobos defining an uniform 
 interface for hashes and checksums. It also provides some 
 useful helper functions to deal with this new API.

 
 Is it too late to ask to include MurmurHash 2 and/or 3? It's 
 public domain, and great for things like hash tables.
 
 You can steal some code from here:
 https://github.com/CyberShadow/ae/blob/master/utils/digest.d
 https://github.com/CyberShadow/ae/blob/master/utils/digest_murmurhash3.d
 

To be honest I didn't even know that MurmurHash can be used
incrementally. I could port that code soon, but I think it's
best to do it after the review.
After we have formalized a common API adding new hashes probably won't
require a full review. It should be possible to do that as a pull
request.

Aug 09 2012

"Regan Heath" <regan netmail.co.nz> writes:

On Thu, 09 Aug 2012 10:58:10 +0100, Johannes Pfau <nospam example.com>  
wrote:

 Am Thu, 09 Aug 2012 11:32:34 +0200
 schrieb "Vladimir Panteleev" <vladimir thecybershadow.net>:

 On Tuesday, 7 August 2012 at 17:39:50 UTC, Dmitry Olshansky wrote:
 std.hash.hash is a new module for Phobos defining an uniform
 interface for hashes and checksums. It also provides some
 useful helper functions to deal with this new API.

 Is it too late to ask to include MurmurHash 2 and/or 3? It's
 public domain, and great for things like hash tables.

 You can steal some code from here:
 https://github.com/CyberShadow/ae/blob/master/utils/digest.d
 https://github.com/CyberShadow/ae/blob/master/utils/digest_murmurhash3.d

 To be honest I didn't even know that MurmurHash can be used
 incrementally. I could port that code soon, but I think it's
 best to do it after the review.
 After we have formalized a common API adding new hashes probably won't
 require a full review. It should be possible to do that as a pull
 request.

Once the API is formalised I can contribute the hashes I have also :)

R


-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

Aug 09 2012

Johannes Pfau <nospam example.com> writes:

Am Thu, 09 Aug 2012 11:16:36 +0100
schrieb "Regan Heath" <regan netmail.co.nz>:

 
 Once the API is formalised I can contribute the hashes I have also :)
 

great! with all those contributions we'll probably have a rather
complete set of digests soon.

Aug 09 2012

Johannes Pfau <nospam example.com> writes:

I implemented some of the suggestions, here's the list of changes:

Changelog:
* Add a new overload to the 'digest' function which accepts an
  InputRange
* Add a new convenience function 'hexDigest' which works just like
  'digest', but returns a string (also works with InputRanges)

  An open question is whether md5StringOf/md5HexOf aliases should be
  added (similar to md5Of)?
* Add a new convenience function 'startDigest' which returns an
  initialized digest
* New example for file hashing in idiomatic D
* Documented that Digests are always OutputRanges
* Added new examples using std.algorithm.copy & OutputRange interface
* Small optimization in toHexString & hexDigest: do not allocate if
  possible

TODO:
* move the package to std.digest (unless there are objections):
    std.hash.hash --> std.digest.digest
    std.hash.md   --> std.digest.md
    std.hash.sha  --> std.digest.sha
    std.hash.crc  --> std.digest.crc

* make sure the docs are consistent regarding names (digest vs. hash)


Code:
https://github.com/jpf91/phobos/tree/newHash/std/hash
https://github.com/jpf91/phobos/compare/master...newHash

Docs:
http://dl.dropbox.com/u/24218791/d/phobos/std_hash_hash.html
http://dl.dropbox.com/u/24218791/d/phobos/std_hash_md.html
http://dl.dropbox.com/u/24218791/d/phobos/std_hash_sha.html
http://dl.dropbox.com/u/24218791/d/phobos/std_hash_crc.html

Aug 10 2012

Johannes Pfau <nospam example.com> writes:

Changelog:
* moved the package to std.digest:
    std.hash.hash --> std.digest.digest
    std.hash.md   --> std.digest.md
    std.hash.sha  --> std.digest.sha
    std.hash.crc  --> std.digest.crc

* make sure the docs are consistent regarding names (digest vs. hash)


Code: (location changed!)
https://github.com/jpf91/phobos/tree/newHash/std/digest
https://github.com/jpf91/phobos/compare/master...newHash

Docs: (location changed!)
http://dl.dropbox.com/u/24218791/d/phobos/std_digest_digest.html
http://dl.dropbox.com/u/24218791/d/phobos/std_digest_md.html
http://dl.dropbox.com/u/24218791/d/phobos/std_digest_sha.html
http://dl.dropbox.com/u/24218791/d/phobos/std_digest_crc.html

Aug 20 2012

"Jesse Phillips" <jessekphillips+D gmail.com> writes:

All this discussion on the use of auto in the docs made me notice 
something else about the docs I missed.

I like how ranges are documented and think digest could do the 
same. Instead of an ExampleDigest, just write the details under 
isDigest.

I don't see a need for template the constraint example (D idiom).

This would require changing examples which use ExampleDigest, but 
maybe that should happen anyway since it doesn't exist.

I don't see a reason to change my vote because of this, its all 
documentation.

Aug 28 2012

Johannes Pfau <nospam example.com> writes:

Am Wed, 29 Aug 2012 04:57:32 +0200
schrieb "Jesse Phillips" <jessekphillips+D gmail.com>:

 All this discussion on the use of auto in the docs made me notice 
 something else about the docs I missed.
 
 I like how ranges are documented and think digest could do the 
 same. Instead of an ExampleDigest, just write the details under 
 isDigest.

I had a look at how std.range documents the range interfaces, but the
std.digest API forces more details on the implementation
( trusted, exact parameter types for put, return type of finish,...)
so I think simply writing a text paragraph could get confusing.

But if someone posts a pull request which replaces the ExampleDigest
with something else I'm all for it.
 
 I don't see a need for template the constraint example (D idiom).
 
 This would require changing examples which use ExampleDigest, but 
 maybe that should happen anyway since it doesn't exist.

Yes, I'll change the examples (this also makes them runnable in theory.
Although I have not found any documentation about making examples
runnable on dlang.org)
 
 I don't see a reason to change my vote because of this, its all 
 documentation.

Sep 04 2012

D Programming

C/C++ Programming

Other

digitalmars.D - The review of std.hash package