digitalmars.D - New hash API: Update


Johannes Pfau <nospam example.com> writes:

I'm mostly finished with my hash API proposal. I also ported the
existing crc, md5 and the proposed sha1 hash to this new API.

I changed the namespace to std.util.digest. Andrei once said he thinks
std.digest/std.hash is too narrow a package, and someone else said
putting crc into std.crypto.digest is ridiculous. So I did what Tango
and other libraries do and created a std.util module.

I think std.uuid would also fit well into std.util so it'd become
std.util.uuid.

Here's the documentation:
http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_digest.html
http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_crc.html
http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_md5.html
http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_sha.html

And here's a pull request for the code:
https://github.com/D-Programming-Language/phobos/pull/646

Github branch:
https://github.com/jpf91/phobos/tree/newHash



There are still some open questions:

OOP interface: Digest.finish()

This can only throw if the supplied buffer is too small. Make this
nothrow & throw an Error on too small buffer? Or check buffer only in
debug mode using asserts or preconditions?
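
One way to picture the "precondition" option is the minimal sketch below. It is not the code from the pull request: `DummyDigest` and `digestLength` are hypothetical names invented for illustration, and the digest bytes are placeholders. The `in` contract uses `assert`, which raises an Error rather than an Exception, so the method can be marked `nothrow`; contracts are compiled out with `-release`. The sketch assumes a current compiler, where the contract body keyword is `do` (it was `body` in 2012).

```d
// Hypothetical stand-in for a digest class; only the name `finish`
// comes from the discussion, everything else is invented.
class DummyDigest
{
    enum size_t digestLength = 4; // digest size in bytes (placeholder)

    // Precondition: the caller must supply a large enough buffer.
    // A failing assert raises an Error, not an Exception, so the
    // method can still be declared nothrow.
    ubyte[] finish(ubyte[] buf) nothrow
    in
    {
        assert(buf.length >= digestLength, "buffer too small for digest");
    }
    do
    {
        buf[0 .. digestLength] = 0xAB; // placeholder digest bytes
        return buf[0 .. digestLength];
    }
}

void main()
{
    auto d = new DummyDigest;
    ubyte[8] big;
    auto result = d.finish(big[]);
    assert(result.length == DummyDigest.digestLength);
    // Passing a 2-byte buffer would trip the precondition in builds
    // where contracts are enabled.
}
```

The trade-off is the one raised in the question: the check costs nothing in release builds, but a too-small buffer is then not diagnosed by the contract.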



CRC32:
The current implementation doesn't seem to be compliant
to the 'common' CRC-32-IEEE 802.3 form, at least it doesn't pass these
test vectors:
http://www.febooti.com/products/filetweak/members/hash-and-crc/test-vectors/
http://www.lammertbies.nl/comm/info/crc-calculation.html
http://rosettacode.org/wiki/CRC-32

Turns out we'd need to invert the value and make sure to use LSB first
order. I made those changes, so now we produce the same output as the
rest of the world, but the new std.util.digest.crc will produce
different values than the old std.crc now. Is this OK?
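
For readers who want to check the "inverted, LSB first" form against the test vectors, here is a minimal bitwise sketch. It is not the table-driven code from the pull request; `crc32Ieee` is a hypothetical name. It assumes the usual CRC-32 parameters: reflected polynomial 0xEDB88320, initial value 0xFFFFFFFF, and a final bitwise inversion.

```d
import std.stdio : writefln;

// Bitwise CRC-32 in the common IEEE 802.3 form (hypothetical helper).
// Bits are consumed least significant bit first, and the running value
// is inverted at the start (via the initial value) and at the end.
uint crc32Ieee(const(ubyte)[] data) pure nothrow @nogc @safe
{
    uint crc = 0xFFFF_FFFF;
    foreach (b; data)
    {
        crc ^= b;
        foreach (_; 0 .. 8)
            crc = (crc & 1) ? (crc >> 1) ^ 0xEDB8_8320 : crc >> 1;
    }
    return ~crc;
}

void main()
{
    // "123456789" is the conventional check input; the widely published
    // CRC-32 check value for it is 0xCBF43926.
    auto input = cast(const(ubyte)[]) "123456789";
    assert(crc32Ieee(input) == 0xCBF4_3926);
    writefln("%08X", crc32Ieee(input));
}
```

A table-driven version computes the same values; the bit loop is used here only because it makes the LSB-first order and the two inversions visible.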

I'm also not happy about the crc license:
"provided that the above copyright notice appear in all copies and
that both that copyright notice and this permission notice appear
in supporting documentation."


I'll put all my modifications under the Boost license. The only things left
of the original code are the crc32_table and the implementation of the
put function.

The table and the one line of code are also available as public domain
code here:
http://www.csbruce.com/~csbruce/software/crc32.c

So I think it should be possible to change the license to Boost?

Jun 24 2012

Piotr Szturmaj <bncrbme jadamspam.pl> writes:

Johannes Pfau wrote:
 I'm mostly finished with my hash API proposal. I also ported the
 existing crc, md5 and the proposed sha1 hash to this new API.

 I changed the namespace to std.util.digest. Andrei once said he thinks
 std.digest/std.hash is a too narrow package and someone else said
 putting crc into std.crypto.digest is ridiculous. So I did what tango
 and other libraries do and created a std.util module.

 I think std.uuid would also fit well into std.util so it'd become
 std.util.uuid.

I vote for std.crypto.hash and std.uuid. CRC, Adler and others could fit 
in std.checksum.

Jun 24 2012

Johannes Pfau <nospam example.com> writes:

On Sun, 24 Jun 2012 18:07:53 +0200,
Piotr Szturmaj <bncrbme jadamspam.pl> wrote:

 Johannes Pfau wrote:
 I'm mostly finished with my hash API proposal. I also ported the
 existing crc, md5 and the proposed sha1 hash to this new API.

 I changed the namespace to std.util.digest. Andrei once said he
 thinks std.digest/std.hash is a too narrow package and someone else
 said putting crc into std.crypto.digest is ridiculous. So I did
 what tango and other libraries do and created a std.util module.

 I think std.uuid would also fit well into std.util so it'd become
 std.util.uuid.

 
 I vote for std.crypto.hash and std.uuid. CRC, Adler and others could
 fit in std.checksum.

We could do this as well. Then we also have to decide whether we want
to have a common API for std.checksum and std.crypto.hash. And we have
to decide where to put the common parts, those that are in
std.util.digest.digest right now.

Jun 24 2012

"Masahiro Nakagawa" <repeatedly gmail.com> writes:

On Sunday, 24 June 2012 at 15:23:19 UTC, Johannes Pfau wrote:
 I'm mostly finished with my hash API proposal. I also ported the
 existing crc, md5 and the proposed sha1 hash to this new API.

Great! I will read the docs and source code later.

 I changed the namespace to std.util.digest. Andrei once said he 
 thinks
 std.digest/std.hash is a too narrow package and someone else 
 said
 putting crc into std.crypto.digest is ridiculous. So I did what 
 tango
 and other libraries do and created a std.util module.

 I think std.uuid would also fit well into std.util so it'd 
 become
 std.util.uuid.

I disagree on this point. The name 'util' does not make any sense.
'util' seems to be very subjective.

Jun 24 2012

David <d dav1d.de> writes:

On 24.06.2012 18:13, Masahiro Nakagawa wrote:
 I disagree this point. The name 'util' does not make any sense.
 'util' seems to be very subjective.

Right, `util` can be everything.

Jun 24 2012

Johannes Pfau <nospam example.com> writes:

On Sun, 24 Jun 2012 18:21:06 +0200,
David <d dav1d.de> wrote:

 On 24.06.2012 18:13, Masahiro Nakagawa wrote:
 I disagree this point. The name 'util' does not make any sense.
 'util' seems to be very subjective.

 
 Right `util` can be everything.
 

Yep, that's the idea. You put everything in there which is not important
enough to be in the std. namespace, but which doesn't fit into
another submodule either.

For example I don't think uuid should be in the top-level std namespace.
But where to put it then?

Jun 24 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 24-Jun-12 19:23, Johannes Pfau wrote:
 I'm mostly finished with my hash API proposal. I also ported the
 existing crc, md5 and the proposed sha1 hash to this new API.

 I changed the namespace to std.util.digest. Andrei once said he thinks
 std.digest/std.hash is a too narrow package and someone else said
 putting crc into std.crypto.digest is ridiculous. So I did what tango
 and other libraries do and created a std.util module.

 I think std.uuid would also fit well into std.util so it'd become
 std.util.uuid.


To be frank almost anything could 'nicely fit' into util. That's why we 
should avoid this abomination at all costs.
std.crypto
or
std.checksum

-- 
Dmitry Olshansky

Jun 24 2012

"David Nadlinger" <see klickverbot.at> writes:

On Sunday, 24 June 2012 at 18:10:03 UTC, Dmitry Olshansky wrote:
 To be frank almost anything could 'nicely fit' into util. 
 That's why we should avoid this abomination at all costs.

I tend to agree. A »util« package might make sense in an 
application, where it could e.g. hold smaller self-contained 
helper modules, but not so much in a library.

David

Jun 24 2012

Alex Rønne Petersen <alex lycus.org> writes:

On 24-06-2012 17:23, Johannes Pfau wrote:
 I'm mostly finished with my hash API proposal. I also ported the
 existing crc, md5 and the proposed sha1 hash to this new API.

Thanks.

 I changed the namespace to std.util.digest. Andrei once said he thinks
 std.digest/std.hash is a too narrow package and someone else said
 putting crc into std.crypto.digest is ridiculous. So I did what tango
 and other libraries do and created a std.util module.

I am strongly against this. Java's 'java.util' package should be a fair 
indicator of why; text processing, collections, date/time, ...

 I think std.uuid would also fit well into std.util so it'd become
 std.util.uuid.

I really really don't like where that is going. A package with a name 
like 'util' basically means /nothing/. It's as good as having this in 
the 'std' package in the first place. I understand that you want to 
group things into proper packages (I do this heavily in my projects), 
but we need a better name than 'util'. I really don't think there's 
anything wrong with just plain 'std.hash', especially since we're likely 
to add more algorithms over time.

 Here's the documentation:
 http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_digest.html
 http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_crc.html
 http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_md5.html
 http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_sha.html

These modules aren't quite consistent: There exist many versions of 
SHA, and so the module is named 'sha'. Makes sense. There are several 
MD algorithms, but the module is named 'md5'?

I think we should either have a module per specific algorithm or modules 
with neutral names like 'md'.

 And here's a pull request for the code:
 https://github.com/D-Programming-Language/phobos/pull/646

 Github branch:
 https://github.com/jpf91/phobos/tree/newHash



 There are still some open questions:

 OOP interface: Digest.finish()

 This can only throw if the supplied buffer is too small. Make this
 nothrow & throw an Error on too small buffer? Or check buffer only in
 debug mode using asserts or preconditions?

Error. It's a logic error to pass in a too small buffer.

Also, most (if not all) Digest methods really should be pure nothrow. 
Same for some free functions (I think it's good practice to mark 
template functions as pure nothrow explicitly if you want to guarantee 
this).

I agree with the overall interface. I think it's going in the right 
direction.

 CRC32:
 The current implementation doesn't seem to be compliant
 to the 'common' CRC-32-IEEE 802.3 form, at least it doesn't pass these
 test vectors:
 http://www.febooti.com/products/filetweak/members/hash-and-crc/test-vectors/
 http://www.lammertbies.nl/comm/info/crc-calculation.html
 http://rosettacode.org/wiki/CRC-32

 Turns out we'd need to invert the value and make sure to use LSB first
 order. I made those changes, so now we produce the same output as the
 rest of the world, but the new std.util.digest.crc will produce
 different values than the old std.crc now. Is this OK?

Yes, correctness over *.

 I'm also not happy about the crc license:
 provided that the above copyright notice appear in all copies and
   * that both that copyright notice and this permission notice appear
   * in supporting documentation.

Yes, this annoyed me too.

 I'll put all my modifications under boost license. The only thing left
 of the original code is the crc32_table and the implementation of the
 put function.

 The table and the one line of code is also available as public domain
 code here:
 http://www.csbruce.com/~csbruce/software/crc32.c

 So I think it should be possible to change the license to boost?

Yes IMO. Anyone could have gone and implemented it from that C file.

-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org

Jun 24 2012

Johannes Pfau <nospam example.com> writes:

On Sun, 24 Jun 2012 21:40:53 +0200,
Alex Rønne Petersen <alex lycus.org> wrote:

 Also, most (if not all) Digest methods really should be pure nothrow.
 Same for some free functions (I think it's good practice to mark
 template functions as pure nothrow explicitly if you want to
 guarantee this).

I still don't understand pure on member functions completely. The this
pointer is considered as a function parameter, right?

So even put, start, reset... can be pure as those produce the same
result as long as they receive the same arguments and the same _this_
'state'?

I tried to add pure to the interface, but it seems it's not a good idea
right now. It doesn't work for the SHA1 implementation, for example,
because that uses memcpy.

Jun 25 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Monday, June 25, 2012 11:30:41 Johannes Pfau wrote:
 On Sun, 24 Jun 2012 21:40:53 +0200,
 Alex Rønne Petersen <alex lycus.org> wrote:
 Also, most (if not all) Digest methods really should be pure nothrow.
 Same for some free functions (I think it's good practice to mark
 template functions as pure nothrow explicitly if you want to
 guarantee this).

 I still don't understand pure on member functions completely. The this
 pointer is considered as a function parameter, right?

 So even put, start, reset... can be pure as those produce the same
 result as long as they receive the same arguments and the same _this_
 'state'?

 I tried to add pure to the interface, but it seems it's not a good idea
 right now. It doesn't work for the SHA1 implementation for example
 because that uses memcpy.

_All_ that pure means by itself is that a function does not access any mutable
global or static variables and that it does not call any functions which are
not pure. It means _nothing_ more. There are _zero_ guarantees that the
function will return the same result or anything like that. It's what we
sometimes term a weakly pure function. That's all pure is by itself.

A strongly pure function, on the other hand, is a pure function whose
arguments are all either immutable or implicitly convertible to immutable. And
because that guarantees that the function's arguments won't be mutated, and
because you have the guarantee that that function and any function it calls
cannot access any state which isn't passed to it via those immutable arguments
or which it creates itself, the compiler can guarantee that calling that
function with the same arguments multiple times will result in the same return
value.

So, in general, _all_ that you're really guaranteeing when you mark a function
pure is that it's not accessing mutable global or static state either directly
or via any functions that it calls. There are then cases where the compiler
can generate improved guarantees based on that information (e.g. when the
function is strongly pure), but that's arguably a detail of the compiler's
optimizer.
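
The distinction can be illustrated with a small sketch. `Counter` and `sumOf` are hypothetical names made up for this example and are not part of the proposed hash API. `put` mutates state through `this`, so it is only weakly pure; `sumOf` takes nothing but immutable data, so it is strongly pure even though it calls the weakly pure `put` internally.

```d
import std.stdio : writeln;

// Hypothetical digest-like state, used only for illustration.
struct Counter
{
    uint total;

    // Weakly pure: may mutate the state reachable through `this`,
    // but cannot touch mutable global or static variables.
    void put(const(ubyte)[] data) pure nothrow @nogc @safe
    {
        foreach (b; data)
            total += b;
    }
}

// Strongly pure: the only parameter is immutable data, so the result
// depends solely on the argument values.
uint sumOf(immutable(ubyte)[] data) pure nothrow @nogc @safe
{
    Counter c;
    c.put(data); // calling a weakly pure function keeps sumOf pure
    return c.total;
}

void main()
{
    immutable(ubyte)[] input = [1, 2, 3];
    assert(sumOf(input) == 6);
    assert(sumOf(input) == sumOf(input));
    writeln(sumOf(input));
}
```

Applied to the question about put, start and reset: marking them pure only promises that they leave mutable global state alone; it does not promise repeatable results, because the hidden `this` parameter is mutable.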

David Nadlinger wrote up an article on it not too long ago which may help
clarify matters (though I still haven't gotten around to reading it, so I
don't really know how good it is), so you may want to check that out:

http://klickverbot.at/blog/2012/05/purity-in-d/

- Jonathan M Davis

Jun 25 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 24-Jun-12 19:23, Johannes Pfau wrote:
 There are still some open questions:

 OOP interface: Digest.finish()

 This can only throw if the supplied buffer is too small. Make this
 nothrow & throw an Error on too small buffer? Or check buffer only in
 debug mode using asserts or preconditions?

Yup. I'd suggest to simply assert on it.

 CRC32:
 The current implementation doesn't seem to be compliant
 to the 'common' CRC-32-IEEE 802.3 form, at least it doesn't pass these
 test vectors:
 http://www.febooti.com/products/filetweak/members/hash-and-crc/test-vectors/
 http://www.lammertbies.nl/comm/info/crc-calculation.html
 http://rosettacode.org/wiki/CRC-32

I believe there ought to be CRC32 with *any* kind of license on the web.

I'll throw in some things that looked plain wrong to me:
- calculateDigestOOP. Besides having an awful, awful name, it really should
be a final method of the Digest interface (yes, we have had these for quite
some time).
- digestToString helper. What kind of string? Why not just toString as a
member? Or rather, clarify that it obtains the hexadecimal representation of
the digest.
E.g. toHexString looks far more intuitive to me (again, check out the
toString debate - I hardly believe that hashes have only one possible
string representation).

- calculateDigest (also called calculateHash somewhere) - why not just
digest? In general I'm weary and tired of no-brainer prefixes. They add
extra symbols for no benefit, because, of course, the digest is calculated.
The same holds for sum, for instance, yet we (would) have sum in algorithm,
not calculateSum; same for min/max etc.
(I think one of the hardest goals of std.* is to strike the best balance of
clarity and brevity.)

- The same goes for md5Sum --> md5Of (I'd love to do plain md5, but maybe
that's too much).
- crc32Sum --> crc32Of // basically, the fact that both are sums helps
very little, yet the suffix 'Of' (I think) indicates that the
calculation is to happen right here (i.e. that it's not initialization
or something).

Everything else looks good, though docs may need some proof-reading.
Thanks for pushing this proposal.

-- 
Dmitry Olshansky

Jun 24 2012

Johannes Pfau <nospam example.com> writes:

On Mon, 25 Jun 2012 00:02:00 +0400,
Dmitry Olshansky <dmitry.olsh gmail.com> wrote:

 On 24-Jun-12 19:23, Johannes Pfau wrote:
 There are still some open questions:

 OOP interface: Digest.finish()

 This can only throw if the supplied buffer is too small. Make this
 nothrow & throw an Error on too small buffer? Or check buffer only
 in debug mode using asserts or preconditions?

 
 Yup. I'd suggest to simply assert on it.

OK

 
 CRC32:
 The current implementation doesn't seem to be compliant
 to the 'common' CRC-32-IEEE 802.3 form, at least it doesn't pass
 these test vectors:
 http://www.febooti.com/products/filetweak/members/hash-and-crc/test-vectors/
 http://www.lammertbies.nl/comm/info/crc-calculation.html
 http://rosettacode.org/wiki/CRC-32

 
 I believe there ought to be CRC32 with *any* kind of license on the
 web.
 
 I'll throw in some things that looked plain wrong to me:
 - calculateDigestOOP. Besides having awful, awful name it really
 should be final method of Digest interface (yes, we have these since
 quite some time)

OK.

 - digestToString helper. What kind of string? Why not just toString
 as member? Or rather clarify that it obtains hexadecimal
 representation of digest.
 e.g. toHexString looks far more intuitive for me  (again check out 
 toString debate - I hardly believe that hashes have only one possible 
 string representation)

toString as a member doesn't work well for some hashes. Those hashes
destroy the internal state in finish(). So after a toString() call the
hash would be either invalid or reset to initial state, which is
counterintuitive.

But toHexString sounds like a good solution. (digestToString was the
name used in std.md5, btw)
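
A free function that works on the finished digest bytes avoids the state problem described above, because it never has to call finish() itself. The following is a minimal sketch of that idea; `toHex` is a hypothetical helper written for this example and is not the toHexString from the pull request.

```d
import std.stdio : writeln;

// Hypothetical helper: formats already-finished digest bytes as an
// uppercase hexadecimal string, without touching any hash state.
string toHex(const(ubyte)[] digest) pure nothrow @safe
{
    static immutable hexDigits = "0123456789ABCDEF";
    auto result = new char[digest.length * 2];
    foreach (i, b; digest)
    {
        result[2 * i] = hexDigits[b >> 4];       // high nibble
        result[2 * i + 1] = hexDigits[b & 0x0F]; // low nibble
    }
    return result.idup;
}

void main()
{
    ubyte[] digest = [0xDE, 0xAD, 0xBE, 0xEF];
    assert(toHex(digest) == "DEADBEEF");
    writeln(toHex(digest));
}
```

Because the helper only sees a byte slice, the same function serves every hash, and other representations can be added as further free functions.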

 
 - calculateDigest (also called calculateHash somewhere) - why not
 just digest ? In general I'm weary and tired of no-brainer prefixes.
 They add extra symbols for no benefit, because, of course, digest is
 calculated. And so does sum for instance, yet we (would) have sum in
 algorithm not calculate sum, same for min/max etc.
 (I think one of hardest goals of std.* is to meet the best balance of 
 clarity and brevity.)

OK, I'll rename it to digest.

 
 - same goes for md5Sum --> md5Of (i'd love to do plain md5 but maybe 
 it's much)
 - crc32Sum --> crc32Of //basically the fact that both are sums helps 
 very little, yet the suffix 'Of' (I think) indicates that the 
 calculation is to happen right here (i.e. that it's not
 initialization or smth).

OK. Those were originally called 'sum', but I think it's common to use
multiple hash modules at the same time, so name clashes would happen.
md5Of/crcOf sound great.

 
 Everything else looks good, though docs may need some proof-reading.
 Thanks for pushing this proposal.

Yep, that was only a first draft. I still need to run it through a
spell checker, proofread it, and add some links.

Jun 25 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Sunday, June 24, 2012 17:23:18 Johannes Pfau wrote:
 I'm mostly finished with my hash API proposal. I also ported the
 existing crc, md5 and the proposed sha1 hash to this new API.
 
 I changed the namespace to std.util.digest. Andrei once said he thinks
 std.digest/std.hash is a too narrow package and someone else said
 putting crc into std.crypto.digest is ridiculous. So I did what tango
 and other libraries do and created a std.util module.
 
 I think std.uuid would also fit well into std.util so it'd become
 std.util.uuid.

No, no, no, no, no. util is _useless_ as a name. _Everything_ in Phobos is a 
utility of one sort or another. Just leave it as std.hash and std.uuid.

 Here's the documentation:
 http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_digest.html
 http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_crc.html
 http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_md5.html
 http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_sha.html
 
 And here's a pull request for the code:
 https://github.com/D-Programming-Language/phobos/pull/646
 
 Github branch:
 https://github.com/jpf91/phobos/tree/newHash

I'll have to look it over later, but this is enough of a change that I 
suspect that a proper review cycle is in order rather than simply making the 
tweaks and creating a pull request for it.

 The table and the one line of code is also available as public domain
 code here:
 http://www.csbruce.com/~csbruce/software/crc32.c
 
 So I think it should be possible to change the license to boost?

As long as the only parts that are left of the original with the non-Boost 
license are publicly available, I don't see any reason why we couldn't put a 
Boost license on the version. Ideally, we would have _no_ licenses other than 
Boost in Phobos. The only reason that we do is due to old D1 code when Walter 
was doing most of it and the contributor situation was very different.

- Jonathan M Davis

Jun 24 2012

Johannes Pfau <nospam example.com> writes:

On Sun, 24 Jun 2012 17:58:47 -0700,
Jonathan M Davis <jmdavisProg gmx.com> wrote:

 On Sunday, June 24, 2012 17:23:18 Johannes Pfau wrote:
 I'm mostly finished with my hash API proposal. I also ported the
 existing crc, md5 and the proposed sha1 hash to this new API.
 
 I changed the namespace to std.util.digest. Andrei once said he
 thinks std.digest/std.hash is a too narrow package and someone else
 said putting crc into std.crypto.digest is ridiculous. So I did
 what tango and other libraries do and created a std.util module.
 
 I think std.uuid would also fit well into std.util so it'd become
 std.util.uuid.

 
 No, no, no, no, no. util is _useless_ as a name. _Everything_ in
 Phobos is a utiliity of one sort or another. Just leave it as
 std.hash and std.uuid.

OK, OK I'm convinced.


 
 Here's the documentation:
 http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_digest.html
 http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_crc.html
 http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_md5.html
 http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_sha.html
 
 And here's a pull request for the code:
 https://github.com/D-Programming-Language/phobos/pull/646
 
 Github branch:
 https://github.com/jpf91/phobos/tree/newHash

 
 I'll have to look it over later, but this is enough of a change, that
 I suspect that a proper review cycle is order rather than simply
 making the tweaks and creating a pull request for it.

Yeah probably. We really should disable the new std.crc32 then, though.

 
 The table and the one line of code is also available as public
 domain code here:
 http://www.csbruce.com/~csbruce/software/crc32.c
 
 So I think it should be possible to change the license to boost?

 
 As long as the only parts that are left of the original with the
 non-Boost license are publicly available, I don't see any reason why
 we couldn't put a Boost license on the version. Ideally, we would
 have _no_ licenses other than Boost in Phobos. The only reason that
 we do is due to old D1 code when Walter was doing most of it and the
 contributor situation was very different.
 

Great, I'll change that then :-)

Jun 25 2012

Alex Rønne Petersen <alex lycus.org> writes:

On 25-06-2012 11:13, Johannes Pfau wrote:
 On Sun, 24 Jun 2012 17:58:47 -0700,
 Jonathan M Davis <jmdavisProg gmx.com> wrote:

 On Sunday, June 24, 2012 17:23:18 Johannes Pfau wrote:
 I'm mostly finished with my hash API proposal. I also ported the
 existing crc, md5 and the proposed sha1 hash to this new API.

 I changed the namespace to std.util.digest. Andrei once said he
 thinks std.digest/std.hash is a too narrow package and someone else
 said putting crc into std.crypto.digest is ridiculous. So I did
 what tango and other libraries do and created a std.util module.

 I think std.uuid would also fit well into std.util so it'd become
 std.util.uuid.

 No, no, no, no, no. util is _useless_ as a name. _Everything_ in
 Phobos is a utiliity of one sort or another. Just leave it as
 std.hash and std.uuid.

 OK, OK I'm convinced.


 Here's the documentation:
 http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_digest.html
 http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_crc.html
 http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_md5.html
 http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_sha.html

 And here's a pull request for the code:
 https://github.com/D-Programming-Language/phobos/pull/646

 Github branch:
 https://github.com/jpf91/phobos/tree/newHash

 I'll have to look it over later, but this is enough of a change, that
 I suspect that a proper review cycle is order rather than simply
 making the tweaks and creating a pull request for it.

 Yeah probably. We really should disable the new std.crc32 then, though.

That, or make sure that your changes get into 2.060 (the changes aren't 
huge, so reviewing them shouldn't take /that/ long). But on the other 
hand, people seem to really want to get 2.060 out the door ASAP, so I 
don't really know...

In any case, just comment out std.hash.crc32 in the makefiles and remove the 
deprecation label in the top-level crc32 module (yes, really, it's not 
even in std...).

 The table and the one line of code is also available as public
 domain code here:
 http://www.csbruce.com/~csbruce/software/crc32.c

 So I think it should be possible to change the license to boost?

 As long as the only parts that are left of the original with the
 non-Boost license are publicly available, I don't see any reason why
 we couldn't put a Boost license on the version. Ideally, we would
 have _no_ licenses other than Boost in Phobos. The only reason that
 we do is due to old D1 code when Walter was doing most of it and the
 contributor situation was very different.

 Great, I'll change that then :-)

-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org

Jun 25 2012

Johannes Pfau <nospam example.com> writes:

OK, so I understand std.util is probably not a good idea.

So the candidates for the namespace are:
* std.crypto.hash
* std.checksum
* std.crypto.hash and std.checksum
* std.hash

and the same with hash replaced by digest.
So which one should we use?

Jun 25 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Monday, June 25, 2012 11:35:33 Johannes Pfau wrote:
 OK, so I understand std.util is probably not a good idea.
 
 So the candidates for the namespace are:
 * std.crypto.hash
 * std.checksum
 * std.crypto.hash and std.checksum
 * std.hash
 
 and the same with hash replaced by digest.
 So which one should we use?

The previous discussions on this resulted in us going with std.hash.md5, 
std.hash.sha1, and std.hash.crc32. I don't see any reason to change that, and 
crypto was specifically _not_ chosen, because crc32 isn't cryptographically 
sound. But std.hash encompasses things quite nicely, since they're all hashes.

- Jonathan M Davis

Jun 25 2012

Piotr Szturmaj <bncrbme jadamspam.pl> writes:

Jonathan M Davis wrote:
 On Monday, June 25, 2012 11:35:33 Johannes Pfau wrote:
 OK, so I understand std.util is probably not a good idea.

 So the candidates for the namespace are:
 * std.crypto.hash
 * std.checksum
 * std.crypto.hash and std.checksum
 * std.hash

 and the same with hash replaced by digest.
 So which one should we use?

 The previous discussions on this resulted in us going with std.hash.md5,
 std.hash.sha1, and std.hash.crc32. I don't see any reason to change that, and
 crypto was specifically _not_ chosen, because crc32 isn't cryptographically
 sound. But std.hash encompasses things quite nicely, since they're all hashes.

IMHO crypto should be chosen because besides hashes there are other 
cryptographic primitives (ciphers, PKI, MACs, etc.) and it would be nice 
to have them in one place. std.hash is too narrow because when std gets 
crypto there will be too many namespaces like std.ciphers, std.ssl, 
std.mac. All of them will fit nicely in std.crypto or similar.

As you can see, crypto isn't a good candidate for checksums, so another 
package, std.checksum, is proposed. Likewise, mixing checksums and 
cryptographic hashes under one namespace (std.hash) isn't the right choice 
IMO.

Having cryptographic primitives split between std.hash and std.crypto.* 
isn't a good choice either.

Jun 25 2012

Jacob Carlborg <doob me.com> writes:

On 2012-06-25 12:24, Piotr Szturmaj wrote:
 Jonathan M Davis wrote:
 On Monday, June 25, 2012 11:35:33 Johannes Pfau wrote:
 OK, so I understand std.util is probably not a good idea.

 So the candidates for the namespace are:
 * std.crypto.hash
 * std.checksum
 * std.crypto.hash and std.checksum
 * std.hash

 and the same with hash replaced by digest.
 So which one should we use?

 The previous discussions on this resulted in us going with std.hash.md5,
 std.hash.sha1, and std.hash.crc32. I don't see any reason to change
 that, and
 crypto was specifically _not_ chosen, because crc32 isn't
 cryptographically
 sound. But std.hash encompasses things quite nicely, since they're all
 hashes.

 IMHO crypto should be chosen because beside of hashes there are other
 cryptographic primitives (ciphers, PKI, MACs, etc.) and it would be nice
 to have them in one place. std.hash is too narrow because when std gets
 crypto there will be too many namespaces like std.ciphers, std.ssl,
 std.mac. All of them will nicely fit in std.crypto or similar.

 As you can see, crypto isn't a good candidate for checksums, so another
 package, std.checksum, is proposed. Likewise, mixing checksums and
 cryptographic hashes under one namespace (std.hash) isn't the right choice
 IMO.

 Having cryptographic primitives split between std.hash and std.crypto.*
 isn't a good choice either.

Can't we have two namespaces, one for checksums and one for the rest? Or
one for cryptographically safe primitives and one for the rest?

Is there a general enough name to fit all these into one namespace?

-- 
/Jacob Carlborg

Jun 25 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Monday, June 25, 2012 12:24:44 Piotr Szturmaj wrote:
 Jonathan M Davis wrote:
 On Monday, June 25, 2012 11:35:33 Johannes Pfau wrote:
 OK, so I understand std.util is probably not a good idea.
 
 So the candidates for the namespace are:
 * std.crypto.hash
 * std.checksum
 * std.crypto.hash and std.checksum
 * std.hash
 
 and the same with hash replaced by digest.
 So which one should we use?

 
 The previous discussions on this resulted in us going with std.hash.md5,
 std.hash.sha1, and std.hash.crc32. I don't see any reason to change that,
 and crypto was specifically _not_ chosen, because crc32 isn't
 cryptographically sound. But std.hash encompasses things quite nicely,
 since they're all hashes.

 IMHO crypto should be chosen because besides hashes there are other
 cryptographic primitives (ciphers, PKI, MACs, etc.) and it would be nice
 to have them in one place. std.hash is too narrow because when std gets
 crypto there will be too many namespaces like std.ciphers, std.ssl,
 std.mac. All of them will nicely fit in std.crypto or similar.
 
 As you can see, crypto isn't a good candidate for checksums, so another
 package, std.checksum, is proposed. Likewise, mixing checksums and
 cryptographic hashes under one namespace (std.hash) isn't the right choice
 IMO.
 
 Having cryptographic primitives split between std.hash and std.crypto.*
 isn't a good choice either.

Except that the same hashes could be used for either checksums or crypto stuff.
It makes no sense to split them between two packages. And you could probably
get into arguments over whether any particular hash is cryptographically
sound, particularly since that can change over time: at least part of what
determines whether a hash is considered cryptographically sound is how easy
it is to break. SHA-1 may or may not be considered cryptographically sound
now, but it sure won't be forever, so putting it in std.crypto would become
decreasingly accurate over time.

So, as far as the hashes go, it makes the most sense IMHO to just stuff them 
all in std.hash and be done with it. If we ever end up adding crypto-specific 
stuff to Phobos, then that stuff can go in std.crypto, but the hashes are _not_ 
crypto-specific. They just so happen to be used in cryptography. They aren't 
restricted to it.

- Jonathan M Davis

Jun 25 2012

"Felix Hufnagel" <suicide xited.de> writes:

+1 for
hashes into std.hash
and cryptographic primitives into std.crypto

and we should have a std.net (std.uri, std.socket, std.socketstream,
std.net.curl, ...),
std.io. for (Outbuffer, file, ....)
and probably std.database or something like that for (csv, json, xml, ...)

...


On 25.06.2012 at 17:31, Jonathan M Davis <jmdavisProg gmx.com> wrote:

 On Monday, June 25, 2012 12:24:44 Piotr Szturmaj wrote:
 Jonathan M Davis wrote:
 On Monday, June 25, 2012 11:35:33 Johannes Pfau wrote:
 OK, so I understand std.util is probably not a good idea.

 So the candidates for the namespace are:
 * std.crypto.hash
 * std.checksum
 * std.crypto.hash and std.checksum
 * std.hash

 and the same with hash replaced by digest.
 So which one should we use?

 The previous discussions on this resulted in us going with std.hash.md5,
 std.hash.sha1, and std.hash.crc32. I don't see any reason to change that,
 and crypto was specifically _not_ chosen, because crc32 isn't
 cryptographically sound. But std.hash encompasses things quite nicely,
 since they're all hashes.

 IMHO crypto should be chosen because besides hashes there are other
 cryptographic primitives (ciphers, PKI, MACs, etc.) and it would be nice
 to have them in one place. std.hash is too narrow because when std gets
 crypto there will be too many namespaces like std.ciphers, std.ssl,
 std.mac. All of them will nicely fit in std.crypto or similar.

 As you can see, crypto isn't a good candidate for checksums, so another
 package, std.checksum, is proposed. Likewise, mixing checksums and
 cryptographic hashes under one namespace (std.hash) isn't the right choice
 IMO.

 Having cryptographic primitives split between std.hash and std.crypto.*
 isn't a good choice either.

 Except that the same hashes could be used for either checksums or crypto
 stuff. It makes no sense to split them between two packages. And you could
 probably get into arguments over whether any particular hash is
 cryptographically sound, particularly since that can change over time: at
 least part of what determines whether a hash is considered
 cryptographically sound is how easy it is to break. SHA-1 may or may not be
 considered cryptographically sound now, but it sure won't be forever, so
 putting it in std.crypto would become decreasingly accurate over time.

 So, as far as the hashes go, it makes the most sense IMHO to just stuff
 them all in std.hash and be done with it. If we ever end up adding
 crypto-specific stuff to Phobos, then that stuff can go in std.crypto, but
 the hashes are _not_ crypto-specific. They just so happen to be used in
 cryptography. They aren't restricted to it.

 - Jonathan M Davis



Jun 25 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 25-Jun-12 20:09, Felix Hufnagel wrote:
 +1 for
 hashes into std.hash
 and cryptographic primitives into std.crypto

Another +1 here

 and we should have  std.net (std.uri, std.socket, std.socketstream ,
 std.net.curl, ...),
 std.io for proper I/O framework.
 and probably std.data or something like that for (csv, json, xml, ...)


Fixed :)


-- 
Dmitry Olshansky

Jun 25 2012

"nazriel" <damian dzfl.pl> writes:

On Monday, 25 June 2012 at 16:09:43 UTC, Felix Hufnagel wrote:
 +1 for
 hashes into std.hash
 and cryptographic primitives into std.crypto

 and we should have a std.net (std.uri, std.socket, 
 std.socketstream , std.net.curl, ...),
 std.io. for (Outbuffer, file, ....)
 and probably std.database or something like that for (csv, 
 json, xml, ...)

 ...


I couldn't agree more.

Jun 25 2012

"Jesse Phillips" <Jessekphillips+D gmail.com> writes:

On Monday, 25 June 2012 at 16:09:43 UTC, Felix Hufnagel wrote:
 +1 for
 hashes into std.hash
 and cryptographic primitives into std.crypto

 and we should have a std.net (std.uri, std.socket, 
 std.socketstream , std.net.curl, ...),
 std.io. for (Outbuffer, file, ....)
 and probably std.database or something like that for (csv, 
 json, xml, ...)

I'd be for not being so flat.

Jun 25 2012

Don Clugston <dac nospam.com> writes:

On 25/06/12 20:04, Jesse Phillips wrote:
 On Monday, 25 June 2012 at 16:09:43 UTC, Felix Hufnagel wrote:
 +1 for
 hashes into std.hash
 and cryptographic primitives into std.crypto

 and we should have a std.net (std.uri, std.socket, std.socketstream ,
 std.net.curl, ...),
 std.io. for (Outbuffer, file, ....)
 and probably std.database or something like that for (csv, json, xml,
 ...)

 I'd be for not being so flat.

I reckon, follow biology.
There's kingdom.phylum.class.order.family.genus.species
But in practice, that's far too clumsy. Instead, everyone just uses 
genus.species. And this works even though there are more than a million 
species.

So I reckon two levels of modules is enough. More than that is clumsy.
And, if you're not sure where something should be, because there are two 
or more equally valid alternatives, it should probably be a level closer 
to the root of the tree.

Jun 29 2012

Piotr Szturmaj <bncrbme jadamspam.pl> writes:

Don Clugston wrote:
 On 25/06/12 20:04, Jesse Phillips wrote:
 On Monday, 25 June 2012 at 16:09:43 UTC, Felix Hufnagel wrote:
 +1 for
 hashes into std.hash
 and cryptographic primitives into std.crypto

 and we should have a std.net (std.uri, std.socket, std.socketstream ,
 std.net.curl, ...),
 std.io. for (Outbuffer, file, ....)
 and probably std.database or something like that for (csv, json, xml,
 ...)

 I'd be for not being so flat.

 I reckon, follow biology.
 There's kingdom.phylum.class.order.family.genus.species
 But in practice, that's far too clumsy. Instead, everyone just uses
 genus.species. And this works even though there are more than a million
 species.

 So I reckon two levels of modules is enough. More than that is clumsy.
 And, if you're not sure where something should be, because there are two
 or more equally valid alternatives, it should probably be a level closer
 to the root of the tree.

I'd not generalize that much. Sometimes two levels are enough, sometimes 
there are three or more. I'd say "It depends" :)

And yes, I think we should have a std.net package. Hey, packages were
created for that!

Jun 29 2012
