
digitalmars.D - std.compress

Walter Bright <newshound2 digitalmars.com> writes:
https://github.com/WalterBright/phobos/blob/std_compress/std/compress.d

I wrote this to add components to compress and expand ranges.

Highlights:

1. doesn't do any memory allocation
2. can handle arbitrarily large sets of data
3. it's lazy
4. takes an InputRange, and outputs an InputRange

Comments welcome.
Jun 03 2013
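Walter's highlights describe an adapter that consumes an input range of bytes lazily and is itself an input range, without allocating. The sketch below illustrates only that shape, not the LZW algorithm in std.compress: it swaps in run-length encoding to stay short, and the names RleEncoder and rleEncode are hypothetical. Each run is emitted as a (count, value) byte pair, and the next run is read from the source only after the current pair has been consumed.

```d
import std.range.primitives : ElementType, isInputRange, empty, front, popFront;

// Hypothetical illustration, NOT the LZW code from std.compress:
// a lazy run-length encoder that is an input range of ubyte built on top of
// another input range of ubyte. Output is (count, value) pairs, count 1..255.
struct RleEncoder(R)
    if (isInputRange!R && is(ElementType!R : ubyte))
{
    private R src;
    private ubyte value;     // byte value of the current run
    private ubyte count;     // length of the current run
    private bool emitValue;  // false: front is the count; true: front is the value
    private bool done;       // set once the source is exhausted

    this(R src)
    {
        this.src = src;
        fetchRun();
    }

    // Consume the next run from the source, or mark the range as finished.
    private void fetchRun()
    {
        if (src.empty)
        {
            done = true;
            return;
        }
        value = src.front;
        count = 0;
        while (!src.empty && src.front == value && count < 255)
        {
            ++count;
            src.popFront();
        }
    }

    @property bool empty() const { return done; }
    @property ubyte front() const { return emitValue ? value : count; }

    void popFront()
    {
        if (emitValue)
        {
            emitValue = false;
            fetchRun(); // more input is read only now: the adapter is lazy
        }
        else
        {
            emitValue = true;
        }
    }
}

// Convenience constructor so callers can write input.rleEncode.
auto rleEncode(R)(R src) { return RleEncoder!R(src); }
```

Since the struct holds just the source range and a few scalars, it needs no heap memory, and input of any length can stream through it.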
"Diggory" <diggsey googlemail.com> writes:
On Tuesday, 4 June 2013 at 03:44:05 UTC, Walter Bright wrote:
 https://github.com/WalterBright/phobos/blob/std_compress/std/compress.d

 I wrote this to add components to compress and expand ranges.

 Highlights:

 1. doesn't do any memory allocation
 2. can handle arbitrarily large sets of data
 3. it's lazy
 4. takes an InputRange, and outputs an InputRange

 Comments welcome.
Nice! What happens if R is not a ubyte range?
Jun 03 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/3/2013 10:41 PM, Diggory wrote:
 Nice! What happens if R is not a ubyte range?
It'll work with char and ubyte, too. Anything else you'll need to cast or use an adapter.
Jun 03 2013
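Regarding Diggory's question, the conversion can be done before calling a byte-oriented range function. A sketch, assuming the caller knows the values fit in a byte: std.string.representation views a string's UTF-8 code units as immutable(ubyte)[] without copying, and map with a cast narrows wider integers lazily. The helper names asBytes and narrowToBytes are made up for this example.

```d
import std.algorithm.iteration : map;
import std.string : representation;

// View a string's UTF-8 code units as bytes, without copying or decoding.
immutable(ubyte)[] asBytes(string s)
{
    return s.representation;
}

// Lazily narrow a range of wider integers to ubyte (hypothetical helper).
// The cast keeps only the low 8 bits, so the caller must know values fit.
auto narrowToBytes(R)(R r)
{
    return r.map!(x => cast(ubyte) x);
}
```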
Timothee Cour <thelastmammoth gmail.com> writes:
A)
there already is std.zlib; why not have:
std.compress.zlib: public import std.zlib
std.compress.lzw: put this new module there instead of in std.compress
std.compress.image.png
std.compress.image.jpg

B)
rename:
std.compress.lzwCompress => std.compress.lzw.compress
std.compress.lzwExpand => std.compress.lzw.uncompress

which is more consistent with compress/uncompress from std.zlib

C)
maybe add a link to
https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch or other
source

D)
CircularBuffer belongs somewhere else; maybe std.range or std.container





On Mon, Jun 3, 2013 at 8:44 PM, Walter Bright <newshound2 digitalmars.com> wrote:

 https://github.com/WalterBright/phobos/blob/std_compress/std/compress.d

 I wrote this to add components to compress and expand ranges.

 Highlights:

 1. doesn't do any memory allocation
 2. can handle arbitrarily large sets of data
 3. it's lazy
 4. takes an InputRange, and outputs an InputRange

 Comments welcome.
Jun 03 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/3/2013 11:40 PM, Timothee Cour wrote:
 D)
 CircularBuffer belongs somewhere else; maybe std.range or std.container
I have mixed feelings about that. If you'll notice, std.compress doesn't have any imports! I wanted to make at least one module that doesn't pull in 100% of everything in Phobos (one of my pet peeves).
Jun 04 2013
"Dicebot" <m.strashun gmail.com> writes:
On Tuesday, 4 June 2013 at 08:00:03 UTC, Walter Bright wrote:
 On 6/3/2013 11:40 PM, Timothee Cour wrote:
 D)
 CircularBuffer belongs somewhere else; maybe std.range or 
 std.container
I have mixed feelings about that. If you'll notice, std.compress doesn't have any imports! I wanted to make at least one module that doesn't pull in 100% of everything in Phobos (one of my pet peeves).
If that is an issue, it is an issue in DMD, not in the module. Modules are supposed to use each other extensively; that is the very reason to have them!
Jun 04 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/4/2013 1:00 AM, Walter Bright wrote:
 On 6/3/2013 11:40 PM, Timothee Cour wrote:
 D)
 CircularBuffer belongs somewhere else; maybe std.range or std.container
I have mixed feelings about that. If you'll notice, std.compress doesn't have any imports! I wanted to make at least one module that doesn't pull in 100% of everything in Phobos (one of my pet peeves).
Note also I didn't document it, so it is private and can be moved.
Jun 04 2013
"Peter Alexander" <peter.alexander.au gmail.com> writes:
On Tuesday, 4 June 2013 at 08:03:15 UTC, Walter Bright wrote:
 On 6/4/2013 1:00 AM, Walter Bright wrote:
 On 6/3/2013 11:40 PM, Timothee Cour wrote:
 D)
 CircularBuffer belongs somewhere else; maybe std.range or 
 std.container
I have mixed feelings about that. If you'll notice, std.compress doesn't have any imports! I wanted to make at least one module that doesn't pull in 100% of everything in Phobos (one of my pet peeves).
Note also I didn't document it, so it is private and can be moved.
Then it should be private. You should also mangle the name so that it doesn't pollute the unqualified symbol namespace (either that or fix visibility of private symbols).
Jun 04 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/4/2013 1:15 AM, Peter Alexander wrote:
 On Tuesday, 4 June 2013 at 08:03:15 UTC, Walter Bright wrote:
 On 6/4/2013 1:00 AM, Walter Bright wrote:
 On 6/3/2013 11:40 PM, Timothee Cour wrote:
 D)
 CircularBuffer belongs somewhere else; maybe std.range or std.container
I have mixed feelings about that. If you'll notice, std.compress doesn't have any imports! I wanted to make at least one module that doesn't pull in 100% of everything in Phobos (one of my pet peeves).
Note also I didn't document it, so it is private and can be moved.
Then it should be private.
I agree.
 You should also mangle the name so that it doesn't
 pollute the unqualified symbol namespace (either that or fix visibility of
 private symbols).
If it proves useful, it will be moved into some more proper and public place. I think it's a bad idea to 'mangle' the name. First off, if it is private, it is not visible. And even being public, the anti-hijacking language features make it a non-problem. The whole point is to avoid the wretched C problems with a global name space, by not having a global name space.
Jun 04 2013
"Peter Alexander" <peter.alexander.au gmail.com> writes:
On Tuesday, 4 June 2013 at 08:23:52 UTC, Walter Bright wrote:
 You should also mangle the name so that it doesn't
 pollute the unqualified symbol namespace (either that or fix 
 visibility of
 private symbols).
If it proves useful, it will be moved into some more proper and public place. I think it's a bad idea to 'mangle' the name. First off, if it is private, it is not visible. And even being public, the anti-hijacking language features make it a non-problem. The whole point is to avoid the wretched C problems with a global name space, by not having a global name space.
import std.compress;
import mylib.circularbuffer;

CircularBuffer!(ubyte[1024]) buf;

ERROR: conflicting names, even though std.compress.CircularBuffer is private!

I have to fully qualify CircularBuffer, or use alias to get around the problem.

D may not have a global namespace, but it does have unqualified name lookup, and private symbols still pollute that pseudo-namespace.
Jun 04 2013
"monarch_dodra" <monarchdodra gmail.com> writes:
On Tuesday, 4 June 2013 at 08:33:29 UTC, Peter Alexander wrote:
 On Tuesday, 4 June 2013 at 08:23:52 UTC, Walter Bright wrote:
 You should also mangle the name so that it doesn't
 pollute the unqualified symbol namespace (either that or fix 
 visibility of
 private symbols).
If it proves useful, it will be moved into some more proper and public place. I think it's a bad idea to 'mangle' the name. First off, if it is private, it is not visible. And even being public, the anti-hijacking language features make it a non-problem. The whole point is to avoid the wretched C problems with a global name space, by not having a global name space.
import std.compress;
import mylib.circularbuffer;

CircularBuffer!(ubyte[1024]) buf;

ERROR: conflicting names, even though std.compress.CircularBuffer is private!

I have to fully qualify CircularBuffer, or use alias to get around the problem.
Is this according to the specs though, or a bug? It was my understanding that another module's private symbols should not even be "seen" ?
Jun 04 2013
"Peter Alexander" <peter.alexander.au gmail.com> writes:
On Tuesday, 4 June 2013 at 09:11:49 UTC, monarch_dodra wrote:
 On Tuesday, 4 June 2013 at 08:33:29 UTC, Peter Alexander wrote:
 On Tuesday, 4 June 2013 at 08:23:52 UTC, Walter Bright wrote:
 You should also mangle the name so that it doesn't
 pollute the unqualified symbol namespace (either that or fix 
 visibility of
 private symbols).
If it proves useful, it will be moved into some more proper and public place. I think it's a bad idea to 'mangle' the name. First off, if it is private, it is not visible. And even being public, the anti-hijacking language features make it a non-problem. The whole point is to avoid the wretched C problems with a global name space, by not having a global name space.
import std.compress;
import mylib.circularbuffer;

CircularBuffer!(ubyte[1024]) buf;

ERROR: conflicting names, even though std.compress.CircularBuffer is private!

I have to fully qualify CircularBuffer, or use alias to get around the problem.
Is this according to the specs though, or a bug? It was my understanding that another module's private symbols should not even be "seen" ?
Well, the fix is currently in an unapproved DIP. I have no idea whether Walter intends to accept it or reject it. The discussion thread just seems to have died off. http://wiki.dlang.org/DIP22
Jun 04 2013
Martin Nowak <code dawg.eu> writes:
On 06/04/2013 11:52 AM, Peter Alexander wrote:
 Well, the fix is currently in an unapproved DIP. I have no idea whether
 Walter intends to accept it or reject it. The discussion thread just
 seems to have died off.

 http://wiki.dlang.org/DIP22
I should really submit some ideas from my implementation to the DIP. https://github.com/D-Programming-Language/dmd/pull/739
Jun 04 2013
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
04-Jun-2013 12:23, Walter Bright writes:
 On 6/4/2013 1:15 AM, Peter Alexander wrote:
I agree.
 You should also mangle the name so that it doesn't
 pollute the unqualified symbol namespace (either that or fix
 visibility of
 private symbols).
If it proves useful, it will be moved into some more proper and public place. I think it's a bad idea to 'mangle' the name. First off, if it is private, it is not visible. And even being public, the anti-hijacking language features make it a non-problem. The whole point is to avoid the wretched C problems with a global name space, by not having a global name space.
They are visible and clash with other symbols just like public ones do. Maybe now is the time to fix this bug? -- Dmitry Olshansky
Jun 04 2013
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, June 04, 2013 01:23:52 Walter Bright wrote:
 I think it's a bad idea to 'mangle' the name. First off, if it is private,
 it is not visible. And even being public, the anti-hijacking language
 features make it a non-problem. The whole point is to avoid the wretched C
 problems with a global name space, by not having a global name space.
Not visible? When was that fixed? Last time I checked, access level had zero effect on visibility, just your ability to actually call it. Access level is taken into account after overload resolution. So, if there's another, public symbol with the same name which would be as good a match as this one aside from access level, then you're going to get a compilation error - which is exactly why most of us argue that inaccessible symbols should not be visible. But that requires a language change (which should definitely happen IMHO, but AFAIK, it still hasn't). - Jonathan M Davis
Jun 04 2013
"Jakob Ovrum" <jakobovrum gmail.com> writes:
On Tuesday, 4 June 2013 at 08:00:03 UTC, Walter Bright wrote:
 I have mixed feelings about that. If you'll notice, 
 std.compress doesn't have any imports! I wanted to make at 
 least one module that doesn't pull in 100% of everything in 
 Phobos (one of my pet peeves).
I think this is a workaround, not a proper solution. It probably means Phobos' granularity is horribly wrong.
Jun 04 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/4/2013 1:08 AM, Jakob Ovrum wrote:
 On Tuesday, 4 June 2013 at 08:00:03 UTC, Walter Bright wrote:
 I have mixed feelings about that. If you'll notice, std.compress doesn't have
 any imports! I wanted to make at least one module that doesn't pull in 100% of
 everything in Phobos (one of my pet peeves).
I think this is a workaround, not a proper solution.
Yes, it is.
 It probably means Phobos' granularity is horribly wrong.
Yup. Phobos is hard to work on because of the complexity of everything importing and depending on everything else all in mutually referential cycles. I deliberately set out to create compress as a non-trivial module that did not do that. I hope that splitting things up into packages will improve things.
Jun 04 2013
"Adam D. Ruppe" <destructionator gmail.com> writes:
I actually wish we could have multiple modules in a single file. 
Correct me if I'm wrong, but if I imported something and only used
one type from it, the linker should strip out the others, right?

But this doesn't happen because ModuleInfo references all kinds 
of things, and moduleinfo is referenced for constructors and 
such. This is useful and removing it is probably a bad idea.

Breaking up into packages is one idea but you can't always do it. 
What if you're doing some big string mixins? A single file is 
also a little easier to distribute.

But mixins are the hard case to work around, since they by
definition go into one file. If we could do something like 
mixin("module foo.mixin"~name~" { code }"); you could work around 
it.

Then you could isolate sections of generated code in their own 
logical modules, letting the linker kill those sections if they 
aren't actually used.
Jun 04 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/4/2013 5:17 AM, Adam D. Ruppe wrote:
 I actually wish we could have multiple modules in a single file.
I don't see much point to that in modern file systems.
Jun 04 2013
Marco Leise <Marco.Leise gmx.de> writes:
On Tue, 04 Jun 2013 09:06:22 -0700,
Walter Bright <newshound2 digitalmars.com> wrote:

 On 6/4/2013 5:17 AM, Adam D. Ruppe wrote:
 I actually wish we could have multiple modules in a single file.
I don't see much point to that in modern file systems.
Probably seek time if the files are scattered and not in cache. That's hardly a show stopper unless you have 17,156 files like the Java Runtime. But they 'solved' it by zipping them up. -- Marco
Jun 04 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/4/2013 5:13 PM, Marco Leise wrote:
 Probably seek time if the files are scattered and not in cache.
 That's hardly a show stopper unless you have 17,156 files like
 the Java Runtime. But they 'solved' it by zipping them up.
Actually, I've often thought of making dmd able to read everything it needs out of a zip file.
Jun 04 2013
Marco Leise <Marco.Leise gmx.de> writes:
On Tue, 04 Jun 2013 17:58:01 -0700,
Walter Bright <newshound2 digitalmars.com> wrote:

 On 6/4/2013 5:13 PM, Marco Leise wrote:
 Probably seek time if the files are scattered and not in cache.
 That's hardly a show stopper unless you have 17,156 files like
 the Java Runtime. But they 'solved' it by zipping them up.
Actually, I've often thought of making dmd able to read everything it needs out of a zip file.
That would have been difficult for editors and IDEs that can look up file names from include paths only when they are not zipped up. It is good the way it is. -- Marco
Jun 04 2013
"eles" <eles eles.com> writes:
On Wednesday, 5 June 2013 at 02:23:54 UTC, Marco Leise wrote:
 On Tue, 04 Jun 2013 17:58:01 -0700,
 Walter Bright <newshound2 digitalmars.com> wrote:
 That would have been difficult for editors and IDEs that can
 look up file names from include paths only when they are not
 zipped up. It is good the way it is.
True, but Java also had the same issue with its .jar files and the editors adapted.
Jun 05 2013
"eles" <eles eles.com> writes:
On Wednesday, 5 June 2013 at 00:58:02 UTC, Walter Bright wrote:
 On 6/4/2013 5:13 PM, Marco Leise wrote:
 Actually, I've often thought of making dmd able to read 
 everything it needs out of a zip file.
I support that. It would make distributing source code cleaner. Most of the time you don't need to look at the code, just compile it, while still knowing that you have it available in an archive if you need it. Maybe that kind of support could improve the distribution of closed-source libraries, too: the generated .di files and the binaries could be packaged together in a zip file. Moreover, the zip file could easily be tested for self-containment. It sometimes happens that a folder of code compiles, then when you package the whole thing and ship it, you discover that you forgot to include some file or dependency, and the customers complain about it. With a zip file, you just do a compile check on the final package and, if it passes, it is ready for shipment. Btw, I cannot resist adding my favorite quote in software development: "It compiles. Let's ship it!" :)
Jun 05 2013
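Walter's idea can be prototyped with Phobos' existing std.zip module. The following is only a hypothetical illustration, not anything dmd does: packSources and readSource are invented helpers, and members are left at ArchiveMember's default stored (uncompressed) method to keep the round trip simple.

```d
import std.zip : ArchiveMember, ZipArchive;

// Hypothetical helper: pack (file name -> source text) pairs into an
// in-memory zip archive. Members keep the default stored (uncompressed) method.
ubyte[] packSources(string[string] files)
{
    auto zip = new ZipArchive();
    foreach (name, text; files)
    {
        auto member = new ArchiveMember();
        member.name = name;
        member.expandedData = cast(ubyte[]) text.dup;
        zip.addMember(member);
    }
    return cast(ubyte[]) zip.build();
}

// Hypothetical helper: read one source file back out of the archive, the way
// a zip-aware compiler front end might look up a module.
string readSource(ubyte[] archive, string name)
{
    auto zip = new ZipArchive(archive);
    auto member = zip.directory[name];
    return cast(string) zip.expand(member).idup;
}
```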
Jacob Carlborg <doob me.com> writes:
On 2013-06-05 02:58, Walter Bright wrote:

 Actually, I've often thought of making dmd able to read everything it
 needs out of a zip file.
I think it's better to have a proper package manager. -- /Jacob Carlborg
Jun 05 2013
"Regan Heath" <regan netmail.co.nz> writes:
On Wed, 05 Jun 2013 11:17:06 +0100, Jacob Carlborg <doob me.com> wrote:

 On 2013-06-05 02:58, Walter Bright wrote:

 Actually, I've often thought of making dmd able to read everything it
 needs out of a zip file.
I think it's better to have a proper package manager.
I think it's better to have both :) R
Jun 05 2013
Timothee Cour <thelastmammoth gmail.com> writes:
There's no point in having modules reinvent the wheel every time. CircularBuffer
is clearly usable in other contexts.

Reusing such code makes sure bug fixes and efficiency gains are done once
and for all and work across the board.

 I wanted to make at least one module that doesn't pull in 100% of
 everything in Phobos
That seems like a very artificial exercise leading to unnecessary contortions.

On Tue, Jun 4, 2013 at 1:00 AM, Walter Bright <newshound2 digitalmars.com> wrote:
 On 6/3/2013 11:40 PM, Timothee Cour wrote:

 D)
 CircularBuffer belongs somewhere else; maybe std.range or std.container
I have mixed feelings about that. If you'll notice, std.compress doesn't have any imports! I wanted to make at least one module that doesn't pull in 100% of everything in Phobos (one of my pet peeves).
Jun 04 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/4/13 4:00 AM, Walter Bright wrote:
 On 6/3/2013 11:40 PM, Timothee Cour wrote:
 D)
 CircularBuffer belongs somewhere else; maybe std.range or std.container
I have mixed feelings about that. If you'll notice, std.compress doesn't have any imports! I wanted to make at least one module that doesn't pull in 100% of everything in Phobos (one of my pet peeves).
The downside of that is reinventing everything. I haven't looked at the code yet, but std.range has http://dlang.org/phobos/std_range.html#cycle which implements a circular buffer. Andrei
Jun 04 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/4/2013 5:26 AM, Andrei Alexandrescu wrote:
 The downside of that is reinventing everything. I haven't looked at the code
 yet, but std.range has http://dlang.org/phobos/std_range.html#cycle which
 implements a circular buffer.
cycle only reads from a circular buffer. CircularBuffer can be filled as well as emptied at the same time (it has a put() method).
Jun 04 2013
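The difference Walter describes is that the buffer can be filled with put() while it is being drained as an input range. A minimal sketch of such a buffer follows; it is not the private CircularBuffer from the proposed module. RingBuffer is a hypothetical name, the capacity is a compile-time power of two so the index wraps with a bit mask, and storage is a static array, so it never allocates.

```d
// Hypothetical sketch, NOT the CircularBuffer in std.compress: a fixed-capacity
// FIFO that can be filled with put() and drained as an input range.
struct RingBuffer(T, size_t N)
    if (N > 0 && (N & (N - 1)) == 0) // power of two, so wrap-around is a mask
{
    private T[N] data;   // static storage: no heap allocation
    private size_t head; // index of the oldest element
    private size_t len;  // number of stored elements

    @property bool empty() const { return len == 0; }
    @property bool full() const { return len == N; }
    @property size_t length() const { return len; }

    // Append at the tail; the caller must not overfill the buffer.
    void put(T value)
    {
        assert(!full, "RingBuffer overflow");
        data[(head + len) & (N - 1)] = value;
        ++len;
    }

    // Input-range interface: read and remove from the head.
    @property T front()
    {
        assert(!empty);
        return data[head];
    }

    void popFront()
    {
        assert(!empty);
        head = (head + 1) & (N - 1);
        --len;
    }
}
```

A compressor could put() bytes into such a buffer as it produces them while a consumer pulls them out through front/popFront; overflow is caught by an assert rather than handled.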
Jacob Carlborg <doob me.com> writes:
On 2013-06-04 08:40, Timothee Cour wrote:
 A)
 there already is std.zlib; why not have:
 std.compress.zlib: public import std.zlib
 std.compress.lzw: put this new module there instead of in std.compress
 std.compress.image.png
 std.compress.image.jpg

 B)
 rename:
 std.compress.lzwCompress => std.compress.lzw.compress
 std.compress.lzwExpand => std.compress.lzw.uncompress

 which is more consistent with compress/uncompress from std.zlib

 C)
 maybe add a link to
 https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch or other
 source

 D)
 CircularBuffer belongs somewhere else; maybe std.range or std.container
I agree with all of these. Perhaps it should be put in the review queue as well. -- /Jacob Carlborg
Jun 04 2013
"bearophile" <bearophileHUGS lycos.com> writes:
Timothee Cour:

 D)
 CircularBuffer belongs somewhere else; maybe std.range or 
 std.container
If you are interested in adding a CircularBuffer to Phobos, then I'd like both that fixed sized one and a growing one like this: http://rosettacode.org/wiki/Queue/Usage#Faster_Version Bye, bearophile
Jun 04 2013
"Peter Alexander" <peter.alexander.au gmail.com> writes:
On Tuesday, 4 June 2013 at 11:18:45 UTC, bearophile wrote:
 If you are interested in adding a CircularBuffer to Phobos, 
 then I'd like both that fixed sized one and a growing one like 
 this:
 http://rosettacode.org/wiki/Queue/Usage#Faster_Version
Nitpick:

head = (head + 1) & ((cast(size_t)1 << power2) - 1);

can be

head = (head + 1) & (A.length - 1);

No? power2 seems superfluous. Also, left/right shifts by a variable amount are very slow on some processors.

Anyway, we'll really need allocators before we can add more allocating containers. Andrei? :-)
Jun 04 2013
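bearophile also asks for a growing variant, and Peter's nitpick carries over to it: if the capacity is always a power of two, masking with (data.length - 1) wraps the index without a separate power2 field. The sketch below uses the hypothetical name GrowableQueue and is not the Rosetta Code version. When full, it doubles its storage and copies the live elements to the front in FIFO order, so unlike a fixed-size buffer it does allocate (with the GC).

```d
// Hypothetical growable FIFO queue (not the Rosetta Code version).
// Capacity is always zero or a power of two, so indices wrap with a mask.
struct GrowableQueue(T)
{
    private T[] data;    // ring storage; data.length is the capacity
    private size_t head; // index of the oldest element
    private size_t len;  // number of stored elements

    @property bool empty() const { return len == 0; }
    @property size_t length() const { return len; }

    void push(T value)
    {
        if (len == data.length)
            grow();
        // Peter's mask trick: for a power-of-two capacity,
        // x & (capacity - 1) equals x % capacity.
        data[(head + len) & (data.length - 1)] = value;
        ++len;
    }

    T pop()
    {
        assert(!empty, "pop from an empty queue");
        T value = data[head];
        head = (head + 1) & (data.length - 1);
        --len;
        return value;
    }

    // Double the capacity (starting at 4) and unwrap the live elements
    // into FIFO order at the start of the new storage. This allocates.
    private void grow()
    {
        auto bigger = new T[data.length ? data.length * 2 : 4];
        foreach (i; 0 .. len)
            bigger[i] = data[(head + i) & (data.length - 1)];
        data = bigger;
        head = 0;
    }
}
```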
"bearophile" <bearophileHUGS lycos.com> writes:
Peter Alexander:

 Nitpick;

 head = (head + 1) & ((cast(size_t)1 << power2) - 1);

 can be

 head = (head + 1) & (A.length - 1);

 No? power2 seems superfluous.
I see. Thank you. I will improve it later. Bye, bearophile
Jun 04 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/3/2013 8:44 PM, Walter Bright wrote:
 https://github.com/WalterBright/phobos/blob/std_compress/std/compress.d

 I wrote this to add components to compress and expand ranges.

 Highlights:

 1. doesn't do any memory allocation
 2. can handle arbitrarily large sets of data
 3. it's lazy
 4. takes an InputRange, and outputs an InputRange

 Comments welcome.
BTW, I also wrote this because it is a tricky component to write. There is not a 1:1 correspondence between input and output - the relationship is not predictable. Worse, there are "look backs" on input and "back patches" on output. Hence, sliding buffers have to be used on both input and output. I like to think of it as an example of how to do such. It took me a bit of time to figure out a way to do it that wasn't too numbingly complex.
Jun 04 2013
David <d dav1d.de> writes:
Am 04.06.2013 05:44, schrieb Walter Bright:
 https://github.com/WalterBright/phobos/blob/std_compress/std/compress.d
 
 I wrote this to add components to compress and expand ranges.
 
 Highlights:
 
 1. doesn't do any memory allocation
 2. can handle arbitrarily large sets of data
 3. it's lazy
 4. takes an InputRange, and outputs an InputRange
 
 Comments welcome.
Why do we need that? I would much rather have a deflate implementation which doesn't depend on the C zlib (a proper std.zlib written in 100% D), followed by a less buggy, less pita, less limited std.zip (btw. I think I fixed one of the bugs a while ago but it is still open and listed as a bug on dlang.org). I personally never used LZW compression, and from what I know it is only used in GIF and TIFF (I might be wrong here), in comparison to deflate which is used in a variety of formats. So having std.compress contain only a rarely used compression algorithm feels wrong; having it in std.compress.* would be OK.
Jun 04 2013
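For comparison, the existing std.zlib module (a wrapper over the C zlib library, as David notes) already provides the compress/uncompress pair that Timothee's naming suggestion refers to. A small round-trip sketch; roundTrip is a hypothetical helper, and the original length is passed to uncompress as the expected output size.

```d
import std.zlib : compress, uncompress;

// Hypothetical helper: deflate the data with zlib, then inflate it again.
ubyte[] roundTrip(const(ubyte)[] data)
{
    const(void)[] packed = compress(data);
    return cast(ubyte[]) uncompress(packed, data.length);
}
```

Unlike the range-based design in this thread, both calls work on whole in-memory buffers and allocate their results.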
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, June 04, 2013 14:48:34 David wrote:
 (btw. I think I fixed one
 of the bugs a while ago but it is still open and listed as bug on
 dlang.org).
If you're sure that it's fixed, then close it. - Jonathan M Davis
Jun 04 2013
Jacob Carlborg <doob me.com> writes:
On 2013-06-04 05:44, Walter Bright wrote:
 https://github.com/WalterBright/phobos/blob/std_compress/std/compress.d

 I wrote this to add components to compress and expand ranges.

 Highlights:

 1. doesn't do any memory allocation
 2. can handle arbitrarily large sets of data
 3. it's lazy
 4. takes an InputRange, and outputs an InputRange
I'm wondering if (un)compress can take the compressing algorithm as a template parameter. Does that make sense? Something like:

auto result = data.compress!(LZW);

Then we could pass different compressing algorithms to the compress function. -- /Jacob Carlborg
Jun 04 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/4/2013 6:34 AM, Jacob Carlborg wrote:
 I'm wondering if (un)compress can take the compressing algorithm as a template
 parameter. Does that make sense?

 Something like:

 auto result = data.compress!(LZW);

 Then we could pass different compressing algorithms to the compress function.
I don't see the point. Furthermore, it requires that the compress template know about all the compression algorithms available, which limits future expansion.
Jun 04 2013
"John Colvin" <john.loughran.colvin gmail.com> writes:
On Tuesday, 4 June 2013 at 16:09:09 UTC, Walter Bright wrote:
 On 6/4/2013 6:34 AM, Jacob Carlborg wrote:
 I'm wondering if (un)compress can take the compressing 
 algorithm as a template
 parameter. Does that make sense?

 Something like:

 auto result = data.compress!(LZW);

 Then we could pass different compressing algorithms to the 
 compress function.
I don't see the point. Furthermore, it requires that the compress template know about all the compression algorithms available, which limits future expansion.
Not necessarily. If the compression algorithms were free functions in the module you could just be passing an alias to one, which compress would then call. (which would also allow people to specify their own algorithms)
Jun 04 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/4/2013 9:33 AM, John Colvin wrote:
 On Tuesday, 4 June 2013 at 16:09:09 UTC, Walter Bright wrote:
 On 6/4/2013 6:34 AM, Jacob Carlborg wrote:
 I'm wondering if (un)compress can take the compressing algorithm as a template
 parameter. Does that make sense?

 Something like:

 auto result = data.compress!(LZW);

 Then we could pass different compressing algorithms to the compress function.
I don't see the point. Furthermore, it requires that the compress template know about all the compression algorithms available, which limits future expansion.
Not necessarily. If the compression algorithms were free functions in the module you could just be passing an alias to one, which compress would then call. (which would also allow people to specify their own algorithms)
What value does a function which just passes an alias to another one add?
Jun 04 2013
"John Colvin" <john.loughran.colvin gmail.com> writes:
On Tuesday, 4 June 2013 at 17:50:47 UTC, Walter Bright wrote:
 On 6/4/2013 9:33 AM, John Colvin wrote:
 On Tuesday, 4 June 2013 at 16:09:09 UTC, Walter Bright wrote:
 On 6/4/2013 6:34 AM, Jacob Carlborg wrote:
 I'm wondering if (un)compress can take the compressing 
 algorithm as a template
 parameter. Does that make sense?

 Something like:

 auto result = data.compress!(LZW);

 Then we could pass different compressing algorithms to the 
 compress function.
I don't see the point. Furthermore, it requires that the compress template know about all the compression algorithms available, which limits future expansion.
Not necessarily. If the compression algorithms were free functions in the module you could just be passing an alias to one, which compress would then call. (which would also allow people to specify their own algorithms)
What value does a function which just passes an alias to another one add?
A unified interface called "compress" that takes a compression function as an alias (with e.g. lzwCompress as a default) seems like a nicer way of working, seeing as people don't necessarily care/know about which algorithm they're using; they just want to compress something a bit. Also, it would be cool if a range could remember which algorithm it was compressed with (as its type? I.e. LzwRange), so a generic function "expand" could call the appropriate ***Expand.
Jun 04 2013
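John's second idea, having the compressed range's type record the algorithm so that one generic expand can pick the decoder, can be sketched as follows. Everything here is hypothetical: Tagged, notCompress and expand are illustrative names, and the "codec" is a bitwise NOT of each byte rather than real compression. The decoder is chosen at compile time with static if, so an unsupported codec is a compile error rather than a runtime failure.

```d
import std.algorithm.iteration : map;

// Hypothetical wrapper whose type records which codec produced the payload.
struct Tagged(string codec, R)
{
    enum codecName = codec;
    R payload;
}

// Stand-in "codec": bitwise NOT of every byte. NOT real compression.
auto notCompress(R)(R input)
{
    auto encoded = input.map!(b => cast(ubyte) ~b);
    return Tagged!("not", typeof(encoded))(encoded);
}

// One generic expand(): the matching decoder is picked at compile time
// from the argument's type, so an unknown codec fails to compile.
auto expand(T)(T data)
    if (is(typeof(T.codecName) : string))
{
    static if (T.codecName == "not")
        return data.payload.map!(b => cast(ubyte) ~b);
    else
        static assert(0, "no decoder for codec " ~ T.codecName);
}
```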
Walter Bright <newshound2 digitalmars.com> writes:
On 6/4/2013 11:04 AM, John Colvin wrote:
 On Tuesday, 4 June 2013 at 17:50:47 UTC, Walter Bright wrote:
 What value does a function which just passes an alias to another one add?
A unified interface called "compress" that takes a compression function as an alias (with e.g. lzwCompress as a default) seems like a nicer way of working, seeing as people don't necessarily care/know about which algorithm they're using; they just want to compress something a bit. Also, it would be cool if a range could remember which algorithm it was compressed with (as its type? I.e. LzwRange), so a generic function "expand" could call the appropriate ***Expand.
What is the improvement of typing compress(lzw) over lzwCompress()?
Jun 04 2013
Timothee Cour <thelastmammoth gmail.com> writes:
On Tue, Jun 4, 2013 at 11:37 AM, Walter Bright
<newshound2 digitalmars.com> wrote:

 On 6/4/2013 11:04 AM, John Colvin wrote:

 On Tuesday, 4 June 2013 at 17:50:47 UTC, Walter Bright wrote:

 What value does a function which just passes an alias to another one add?
A unified interface called "compress" that takes a compression function as an alias (with e.g. lzwCompress as a default) seems like a nicer way of working, seeing as people don't necessarily care/know about which algorithm they're using; they just want to compress something a bit. Also, it would be cool if a range could remember which algorithm it was compressed with (as its type? I.e. LzwRange), so a generic function "expand" could call the appropriate ***Expand.
What is the improvement of typing compress(lzw) over lzwCompress()?
Writing generic code. It's the same reason we prefer auto y=to!double(x) over auto y=to_double(x);
Jun 04 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/4/2013 11:43 AM, Timothee Cour wrote:
 writing generic code.
 same reason as why we prefer:
 auto y=to!double(x) over auto y=to_double(x);
The situations aren't comparable. The to!double case is parameterizing with a type, the compress one is not. Secondly, compress(lzw) does ABSOLUTELY NOTHING but turn around and call lzw. It adds nothing.
Jun 04 2013
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, June 04, 2013 11:46:48 Walter Bright wrote:
 On 6/4/2013 11:43 AM, Timothee Cour wrote:
 writing generic code.
 same reason as why we prefer:
 auto y=to!double(x) over auto y=to_double(x);
The situations aren't comparable. The to!double case is parameterizing with a type, the compress one is not. Secondly, compress(lzw) does ABSOLUTELY NOTHING but turn around and call lzw. It adds nothing.
Well, I'd expect it to be compress!lzw(), but in any case, what it buys you is that you can pass the algorithm around without caring what it is so that while code higher up on the stack may have to know that it's lzw, code deeper down doesn't have to care what type of algorithm it's using. Now, whether that flexibility is all that useful in this particular case, I don't know, but it _does_ help with generic code. It's like how a lot of std.algorithm takes its predicate as an alias. - Jonathan M Davis
Jun 04 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/4/2013 11:55 AM, Jonathan M Davis wrote:
 Well, I'd expect it to be compress!lzw(), but in any case, what it buys you is
 that you can pass the algorithm around without caring what it is so that while
 code higher up on the stack may have to know that it's lzw, code deeper down
 doesn't have to care what type of algorithm it's using. Now, whether that
 flexibility is all that useful in this particular case, I don't know, but it
 _does_ help with generic code. It's like how a lot of std.algorithm takes its
 predicate as an alias.
There is zero utility in this:

    auto compress(alias dg) { return dg(); }

Not even for generic code.
Jun 04 2013
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, June 04, 2013 13:15:07 Walter Bright wrote:
 On 6/4/2013 11:55 AM, Jonathan M Davis wrote:
 Well, I'd expect it to be compress!lzw(), but in any case, what it buys
 you is that you can pass the algorithm around without caring what it is
 so that while code higher up on the stack may have to know that it's lzw,
 code deeper down doesn't have to care what type of algorithm it's using.
 Now, whether that flexibility is all that useful in this particular case,
 I don't know, but it _does_ help with generic code. It's like how a lot
 of std.algorithm takes its predicate as an alias.
There is zero utility in this:

    auto compress(alias dg) { return dg(); }

Not even for generic code.
If that's all it's doing, then no, it wouldn't be useful to pass it as an argument. I was just pointing out that there are plenty of cases where passing functions to generic algorithms is an improvement. I haven't looked at what you've done yet, so I can't really comment on the details of this particular case. - Jonathan M Davis
Jun 04 2013
Byron Heads <byron.heads gmail.com> writes:
On Tue, 04 Jun 2013 13:15:07 -0700, Walter Bright wrote:

 On 6/4/2013 11:55 AM, Jonathan M Davis wrote:
 Well, I'd expect it to be compress!lzw(), but in any case, what it buys
 you is that you can pass the algorithm around without caring what it is
 so that while code higher up on the stack may have to know that it's
 lzw, code deeper down doesn't have to care what type of algorithm it's
 using. Now, whether that flexibility is all that useful in this
 particular case, I don't know, but it _does_ help with generic code.
 It's like how a lot of std.algorithm takes its predicate as an alias.
There is zero utility in this:

    auto compress(alias dg) { return dg(); }

Not even for generic code.
but a compress interface would be nice:

    interface Compress
    {
        ubyte[] compress(ubyte[]);
        ubyte[] uncompress(ubyte[]);
    }

that way you can use any compress algorithm

    bool send(Compress)(Socket sock);
Jun 04 2013
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
05-Jun-2013 00:30, Byron Heads пишет:
 On Tue, 04 Jun 2013 13:15:07 -0700, Walter Bright wrote:

 On 6/4/2013 11:55 AM, Jonathan M Davis wrote:
 Well, I'd expect it to be compress!lzw(), but in any case, what it buys
 you is that you can pass the algorithm around without caring what it is
 so that while code higher up on the stack may have to know that it's
 lzw, code deeper down doesn't have to care what type of algorithm it's
 using. Now, whether that flexibility is all that useful in this
 particular case, I don't know, but it _does_ help with generic code.
 It's like how a lot of std.algorithm takes its predicate as an alias.
There is zero utility in this:

    auto compress(alias dg) { return dg(); }

Not even for generic code.
but a compress interface would be nice:

    interface Compress
    {
        ubyte[] compress(ubyte[]);
        ubyte[] uncompress(ubyte[]);
    }

that way you can use any compress algorithm

    bool send(Compress)(Socket sock);
It's a range already, thus composable. Ranged I/O, though, is something to come some time in the near future (Steve?). -- Dmitry Olshansky
Jun 04 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/4/2013 1:30 PM, Byron Heads wrote:
 but a compress interface would be nice:

 interface Compress
 {
      ubyte[] compress(ubyte[]);
      ubyte[] uncompress(ubyte[]);
 }

 that way you can use any compress algorithm
 bool send(Compress)(Socket sock);
That isn't how ranges work. Ranges already define an input and an output interface. We don't need to invent another scheme.
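A sketch of what "ranges already define an input and an output interface" means in practice: instead of a Compress interface, a function can state its requirements as template constraints, so any stage that yields ubytes fits. sendAll is a hypothetical name, an Appender stands in for a socket, and a trivial map stands in for a compression stage; none of this comes from the proposed std.compress.

```d
import std.algorithm : map;
import std.array : appender;
import std.range;  // isInputRange, isOutputRange, ElementType, put
import std.stdio : writeln;

// Hypothetical sink function: accepts any input range of ubytes and any
// output range that can take a ubyte. No interface or virtual calls needed.
void sendAll(R, Sink)(R data, ref Sink sink)
    if (isInputRange!R && is(ElementType!R : ubyte) && isOutputRange!(Sink, ubyte))
{
    foreach (b; data)
        put(sink, b);
}

void main()
{
    ubyte[] raw = [1, 2, 3];
    auto sink = appender!(ubyte[])();
    // A toy map stage takes the place a compressor would occupy.
    sendAll(raw.map!(b => cast(ubyte)(b + 1)), sink);
    writeln(sink.data); // [2, 3, 4]
}
```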
Jun 04 2013
"John Colvin" <john.loughran.colvin gmail.com> writes:
On Tuesday, 4 June 2013 at 18:46:49 UTC, Walter Bright wrote:
 On 6/4/2013 11:43 AM, Timothee Cour wrote:
 writing generic code.
 same reason as why we prefer:
 auto y=to!double(x) over auto y=to_double(x);
The situations aren't comparable. The to!double case is parameterizing with a type, the compress one is not. Secondly, compress(lzw) does ABSOLUTELY NOTHING but turn around and call lzw. It adds nothing.
Currently. However, compress could become more feature-rich in the future. Perhaps there's some scope for automatic algorithm/parameter selection based on the type and length (if available) of what gets passed.
Jun 04 2013
"Peter Alexander" <peter.alexander.au gmail.com> writes:
On Tuesday, 4 June 2013 at 19:00:35 UTC, John Colvin wrote:
 On Tuesday, 4 June 2013 at 18:46:49 UTC, Walter Bright wrote:
 On 6/4/2013 11:43 AM, Timothee Cour wrote:
 writing generic code.
 same reason as why we prefer:
 auto y=to!double(x) over auto y=to_double(x);
The situations aren't comparable. The to!double case is parameterizing with a type, the compress one is not. Secondly, compress(lzw) does ABSOLUTELY NOTHING but turn around and call lzw. It adds nothing.
Currently. However, compress could become more feature-rich in the future. Perhaps there's some scope for automatic algorithm/parameter selection based on the type and length (if available) of what gets passed.
I think this is over-engineering. It's unlikely that an application will need to support multiple compression algorithms in the same piece of code, and even if it did, it would be trivial to implement this on top of the simple interface that Walter is using.
Jun 04 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/4/2013 12:41 PM, Peter Alexander wrote:
 I think this is over-engineering. It's unlikely that an application will need to
 support multiple compression algorithms in the same piece of code, and even if
 it did, it would be trivial to implement this on top of the simple interface
 that Walter is using.
Yup. My experience with abstractions that have no use cases is that all the wrong things get abstracted. And by my experience, I include every one I've seen other people write as well as my own. My favorite is windows.h. It was originally written for 16-bit Windows, and had all kinds of abstractions to make it portable for a future 32-bit Windows. Unfortunately, apparently nobody working on windows.h had any experience with 32-bit code, and the abstractions turned out to be all wrong.
Jun 04 2013
Paulo Pinto <pjmlp progtools.org> writes:
Am 04.06.2013 22:20, schrieb Walter Bright:
 On 6/4/2013 12:41 PM, Peter Alexander wrote:
 I think this is over-engineering. It's unlikely that an application will need to
 support multiple compression algorithms in the same piece of code, and even if
 it did, it would be trivial to implement this on top of the simple interface
 that Walter is using.
Yup. My experience with abstractions that have no use cases is that all the wrong things get abstracted. And by my experience, I include every one I've seen other people write as well as my own. My favorite is windows.h. It was originally written for 16-bit Windows, and had all kinds of abstractions to make it portable for a future 32-bit Windows. Unfortunately, apparently nobody working on windows.h had any experience with 32-bit code, and the abstractions turned out to be all wrong.
Yep, it brings back some memories.
Jun 05 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/4/13 2:46 PM, Walter Bright wrote:
 On 6/4/2013 11:43 AM, Timothee Cour wrote:
 writing generic code.
 same reason as why we prefer:
 auto y=to!double(x) over auto y=to_double(x);
The situations aren't comparable. The to!double case is parameterizing with a type, the compress one is not. Secondly, compress(lzw) does ABSOLUTELY NOTHING but turn around and call lzw. It adds nothing.
Not absolutely nothing. Almost nothing. The distinction is important. Andrei
Jun 04 2013
"Max Samukha" <maxsamukha gmail.com> writes:
On Tuesday, 4 June 2013 at 18:46:49 UTC, Walter Bright wrote:
 On 6/4/2013 11:43 AM, Timothee Cour wrote:
 writing generic code.
 same reason as why we prefer:
 auto y=to!double(x) over auto y=to_double(x);
The situations aren't comparable. The to!double case is parameterizing with a type, the compress one is not. Secondly, compress(lzw) does ABSOLUTELY NOTHING but turn around and call lzw. It adds nothing.
That "absolutely" based on limited personal experience is D's biggest problem.
Jun 04 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/5/13 12:44 AM, Max Samukha wrote:
 On Tuesday, 4 June 2013 at 18:46:49 UTC, Walter Bright wrote:
 On 6/4/2013 11:43 AM, Timothee Cour wrote:
 writing generic code.
 same reason as why we prefer:
 auto y=to!double(x) over auto y=to_double(x);
The situations aren't comparable. The to!double case is parameterizing with a type, the compress one is not. Secondly, compress(lzw) does ABSOLUTELY NOTHING but turn around and call lzw. It adds nothing.
That "absolutely" based on limited personal experience is D's biggest problem.
It's a point, but "biggest" is also kind of too much and based on limited personal experience :o). Andrei
Jun 04 2013
"Max Samukha" <maxsamukha gmail.com> writes:
On Wednesday, 5 June 2013 at 04:54:46 UTC, Andrei Alexandrescu wrote:
 On 6/5/13 12:44 AM, Max Samukha wrote:
 On Tuesday, 4 June 2013 at 18:46:49 UTC, Walter Bright wrote:
 On 6/4/2013 11:43 AM, Timothee Cour wrote:
 writing generic code.
 same reason as why we prefer:
 auto y=to!double(x) over auto y=to_double(x);
The situations aren't comparable. The to!double case is parameterizing with a type, the compress one is not. Secondly, compress(lzw) does ABSOLUTELY NOTHING but turn around and call lzw. It adds nothing.
That "absolutely" based on limited personal experience is D's biggest problem.
It's a point, but "biggest" is also kind of too much and based on limited personal experience :o). Andrei
Yeah, I noticed that.
Jun 04 2013
"Zach the Mystic" <reachzach gggggmail.com> writes:
On Wednesday, 5 June 2013 at 04:54:46 UTC, Andrei Alexandrescu wrote:
 That "absolutely" based on limited personal experience is D's biggest problem.
It's a point, but "biggest" is also kind of too much and based on limited personal experience :o). Andrei
Hey, if you ever need someone who can reliably answer with limited personal experience, I'm available. :-)
Jun 05 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/4/2013 9:44 PM, Max Samukha wrote:
 On Tuesday, 4 June 2013 at 18:46:49 UTC, Walter Bright wrote:
 On 6/4/2013 11:43 AM, Timothee Cour wrote:
 writing generic code.
 same reason as why we prefer:
 auto y=to!double(x) over auto y=to_double(x);
The situations aren't comparable. The to!double case is parameterizing with a type, the compress one is not. Secondly, compress(lzw) does ABSOLUTELY NOTHING but turn around and call lzw. It adds nothing.
That "absolutely" based on limited personal experience is D's biggest problem.
I've seen an awful lot of abstractions over the years that provided zero value.

You need to provide a compelling use case to justify another layer of complexity. "generic code" is not a compelling use case. It's already generic.

Note how these components are to be used:

    src.lzwCompress.copy(dst);

Your proposal is:

    src.compress(lzw).copy(dst);

I.e. zero value, as so far all compress() does is call lzw().

The whole point of range-based pipeline programming is you can just plug in different components. There is no demonstrated use case for adding another layer.

I am actually wrong in saying it has zero value. It has negative value :-)
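The pipeline shape above (src.lzwCompress.copy(dst)) can be illustrated with a toy stage. This is run-length encoding rather than LZW, and RleEncoder/rleEncode are hypothetical names; the sketch only shows how a lazy stage that takes an InputRange and returns an InputRange can avoid allocating and slot into the same copy pipeline.

```d
import std.algorithm : copy;
import std.range;  // isInputRange, ElementType, empty/front/popFront for arrays
import std.stdio : writeln;

// Hypothetical toy stage (run-length encoding, NOT LZW): lazily turns a
// range of ubytes into (count, value) byte pairs. It pulls from `src` only
// when the caller asks for more output and never allocates.
struct RleEncoder(R)
    if (isInputRange!R && is(ElementType!R : ubyte))
{
    private R src;
    private ubyte[2] pair;   // current (count, value) pair being emitted
    private size_t idx;      // index of the next byte of `pair` to emit
    private bool done;

    this(R src)
    {
        this.src = src;
        refill();
    }

    // Consume one run from the source, capped at 255 so the count fits a ubyte.
    private void refill()
    {
        if (src.empty) { done = true; return; }
        immutable ubyte value = src.front;
        ubyte count = 0;
        while (!src.empty && src.front == value && count < ubyte.max)
        {
            ++count;
            src.popFront();
        }
        pair[0] = count;
        pair[1] = value;
        idx = 0;
    }

    @property bool empty() const { return done; }
    @property ubyte front() const { return pair[idx]; }
    void popFront()
    {
        if (++idx == pair.length)
            refill();
    }
}

auto rleEncode(R)(R src) { return RleEncoder!R(src); }

void main()
{
    ubyte[] src = [1, 1, 1, 2, 3, 3];
    auto dst = new ubyte[16];
    // Same shape as src.lzwCompress.copy(dst); copy returns the unfilled tail.
    auto rest = src.rleEncode.copy(dst);
    writeln(dst[0 .. $ - rest.length]); // [3, 1, 1, 2, 2, 3]
}
```

Swapping in a different stage changes only the middle link of the chain, which is the "plug in different components" argument.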
Jun 04 2013
"Max Samukha" <maxsamukha gmail.com> writes:
On Wednesday, 5 June 2013 at 06:18:54 UTC, Walter Bright wrote:
 On 6/4/2013 9:44 PM, Max Samukha wrote:
 On Tuesday, 4 June 2013 at 18:46:49 UTC, Walter Bright wrote:
 On 6/4/2013 11:43 AM, Timothee Cour wrote:
 writing generic code.
 same reason as why we prefer:
 auto y=to!double(x) over auto y=to_double(x);
The situations aren't comparable. The to!double case is parameterizing with a type, the compress one is not. Secondly, compress(lzw) does ABSOLUTELY NOTHING but turn around and call lzw. It adds nothing.
That "absolutely" based on limited personal experience is D's biggest problem.
I've seen an awful lot of abstractions over the years that provided zero value.
I understand. But I've also seen a lot of abstractions over the years that seemed useless initially but were discovered to be extremely useful later (Bayes' theorem is an example - it took 300 years to find a concrete use for it). So "a compelling use case" is not a sufficient criterion for evaluating the usefulness of abstractions.
 You need to provide a compelling use case to justify another 
 layer of complexity. "generic code" is not a compelling use 
 case. It's already generic.

 Note how these components are to be used:

     src.lzwCompress.copy(dst);

 Your proposal is:

     src.compress(lzw).copy(dst);

 I.e. zero value, as so far all compress() does is call lzw().
That's not my proposal. Honestly, I didn't even take a close look at it. I just felt like it was time to attack you - you gave explicit permission for casual trolling.
 The whole point of range-based pipeline programming is you can 
 just plug in different components. There is no demonstrated use 
 case for adding another layer.
Ok.
 I am actually wrong in saying it has zero value. It has 
 negative value :-)
In this particular case, maybe.
Jun 04 2013
Timothee Cour <thelastmammoth gmail.com> writes:
What I suggested in my original post didn't involve any
indirection/abstraction; simply a renaming to be consistent with existing
zlib (see my points A+B in my 1st post on this thread):

std.compress.zlib.compress
std.compress.zlib.uncompress
std.compress.lzw.compress
std.compress.lzw.uncompress

same reason we have std.file.write, std.stdio.write, etc., and not
std.fileWrite, std.stdioWrite.

On Tue, Jun 4, 2013 at 11:18 PM, Walter Bright <newshound2 digitalmars.com> wrote:

 On 6/4/2013 9:44 PM, Max Samukha wrote:

 On Tuesday, 4 June 2013 at 18:46:49 UTC, Walter Bright wrote:

 On 6/4/2013 11:43 AM, Timothee Cour wrote:

 writing generic code.
 same reason as why we prefer:
 auto y=to!double(x) over auto y=to_double(x);
The situations aren't comparable. The to!double case is parameterizing with a type, the compress one is not. Secondly, compress(lzw) does ABSOLUTELY NOTHING but turn around and call lzw. It adds nothing.
That "absolutely" based on limited personal experience is D's biggest problem.
I've seen an awful lot of abstractions over the years that provided zero value.

You need to provide a compelling use case to justify another layer of complexity. "generic code" is not a compelling use case. It's already generic.

Note how these components are to be used:

    src.lzwCompress.copy(dst);

Your proposal is:

    src.compress(lzw).copy(dst);

I.e. zero value, as so far all compress() does is call lzw().

The whole point of range-based pipeline programming is you can just plug in different components. There is no demonstrated use case for adding another layer.

I am actually wrong in saying it has zero value. It has negative value :-)
Jun 04 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/5/13 2:55 AM, Timothee Cour wrote:
 What I suggested in my original post didn't involve any
 indirection/abstraction; simply a renaming to be consistent with
 existing zlib (see my points A+B in my 1st post on this thread):

 std.compress.zlib.compress
 std.compress.zlib.uncompress
 std.compress.lzw.compress
 std.compress.lzw.uncompress
I think that's nice. Andrei
Jun 05 2013
"David Nadlinger" <code klickverbot.at> writes:
On Wednesday, 5 June 2013 at 12:55:50 UTC, Andrei Alexandrescu wrote:
 On 6/5/13 2:55 AM, Timothee Cour wrote:
 What I suggested in my original post didn't involve any
 indirection/abstraction; simply a renaming to be consistent with
 existing zlib (see my points A+B in my 1st post on this thread):

 std.compress.zlib.compress
 std.compress.zlib.uncompress
 std.compress.lzw.compress
 std.compress.lzw.uncompress
I think that's nice.
+1. D has many powerful features for handling module namespacing (e.g. "import lzw = std.compress.lzw"); let's enable people to make use of them. David
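An illustration of the renamed-import approach using modules that exist today, since std.compress.lzw is only a proposal: std.file and std.stdio both define write, and renamed imports keep them apart. The cost, as Daniel points out below, is that calls made through fs. or io. cannot be written in UFCS style.

```d
import fs = std.file;   // renamed import: std.file's symbols only as fs.xxx
import io = std.stdio;  // likewise io.xxx for std.stdio
import std.path : buildPath;

void main()
{
    auto path = buildPath(fs.tempDir, "renamed_import_demo.txt");
    fs.write(path, "hello");        // unambiguously std.file.write
    io.writeln(fs.readText(path));  // prints: hello
    fs.remove(path);
}
```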
Jun 05 2013
"Daniel Murphy" <yebblies nospamgmail.com> writes:
"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:koncgm$9f5$1 digitalmars.com...
 On 6/5/13 2:55 AM, Timothee Cour wrote:
 What I suggested in my original post didn't involve any
 indirection/abstraction; simply a renaming to be consistent with
 existing zlib (see my points A+B in my 1st post on this thread):

 std.compress.zlib.compress
 std.compress.zlib.uncompress
 std.compress.lzw.compress
 std.compress.lzw.uncompress
I think that's nice. Andrei
This has the problem that you now can't import more than one compression module and still use UFCS. The annoying one I keep hitting in Phobos is std.file.write vs std.stdio.write. For range-based APIs it is a huge PITA to have to switch away from UFCS. I think xyzCompress is still pretty sweet, consistent, and completely fixes the problem. It has the added benefit that you can tell which compression algorithm is being used without having to know what is imported. I would not have a problem with each module providing both 'compress' and 'xyzCompress', but that is against Phobos policy.
Jun 09 2013
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, June 09, 2013 17:12:16 Daniel Murphy wrote:
 "Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message
 news:koncgm$9f5$1 digitalmars.com...
 
 On 6/5/13 2:55 AM, Timothee Cour wrote:
 What I suggested in my original post didn't involve any
 indirection/abstraction; simply a renaming to be consistent with
 existing zlib (see my points A+B in my 1st post on this thread):
 
 std.compress.zlib.compress
 std.compress.zlib.uncompress
 std.compress.lzw.compress
 std.compress.lzw.uncompress
I think that's nice. Andrei
This has the problem that you now can't import more than one compression module and still use UFCS. The annoying one I keep hitting in Phobos is std.file.write vs std.stdio.write. For range-based APIs it is a huge PITA to have to switch away from UFCS. I think xyzCompress is still pretty sweet, consistent, and completely fixes the problem. It has the added benefit that you can tell which compression algorithm is being used without having to know what is imported.
That can be fixed by using a local alias, but it's true that it's an extra annoyance. - Jonathan M Davis
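A minimal sketch of the alias workaround, using the std.file.write / std.stdio.write clash mentioned above. fileWrite is a hypothetical name. The alias is placed at module scope on the assumption that UFCS would not pick up an alias declared inside main.

```d
import std.file : readText, tempDir;
static import std.file;   // std.file.write stays reachable only when qualified
import std.path : buildPath;
import std.stdio;         // so plain `write` means std.stdio.write

// Hypothetical module-level alias that gives std.file.write a distinct,
// UFCS-friendly name without clashing with std.stdio.write.
alias fileWrite = std.file.write;

void main()
{
    auto path = buildPath(tempDir, "ufcs_alias_demo.txt");
    path.fileWrite("hello");      // UFCS through the alias: std.file.write
    write("wrote ", path, "\n");  // unqualified: std.stdio.write
    assert(path.readText == "hello");
    std.file.remove(path);
}
```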
Jun 09 2013
Timothee Cour <thelastmammoth gmail.com> writes:
On Sun, Jun 9, 2013 at 12:53 AM, Jonathan M Davis <jmdavisProg gmx.com> wrote:

 On Sunday, June 09, 2013 17:12:16 Daniel Murphy wrote:
 "Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message
 news:koncgm$9f5$1 digitalmars.com...

 On 6/5/13 2:55 AM, Timothee Cour wrote:
 What I suggested in my original post didn't involve any
 indirection/abstraction; simply a renaming to be consistent with
 existing zlib (see my points A+B in my 1st post on this thread):

 std.compress.zlib.compress
 std.compress.zlib.uncompress
 std.compress.lzw.compress
 std.compress.lzw.uncompress
I think that's nice. Andrei
This has the problem that you now can't import more than one compression module and still use UFCS. The annoying one I keep hitting in Phobos is std.file.write vs std.stdio.write. For range-based APIs it is a huge PITA to have to switch away from UFCS. I think xyzCompress is still pretty sweet, consistent, and completely fixes the problem. It has the added benefit that you can tell which compression algorithm is being used without having to know what is imported.
That can be fixed by using a local alias, but it's true that it's an extra annoyance. - Jonathan M Davis
which is why I have suggested supporting UFCS with fully qualified function names:

    auto a="".(std.path.join)("\n");
    myfile.(std.file.write)(text);
    text.(std.stdio.write);

see post: support UFCS with fully qualified function names (was in "digitalmars.D.learn")
http://forum.dlang.org/post/mailman.1453.1369099708.4724.digitalmars-d puremagic.com

it also helps searchability: if one uses local aliases such as import std.stdio:write2=write, naive searching via grep 'write(' will miss such cases. The increase in complexity is minimal, and the feature makes sense with the rest of the language.
Jun 09 2013
"Daniel Murphy" <yebblies nospamgmail.com> writes:
"Timothee Cour" <thelastmammoth gmail.com> wrote in message 
news:mailman.999.1370827257.13711.digitalmars-d puremagic.com...
 which is why I have suggested supporting UFCS with fully qualified function names:

 auto a="".(std.path.join)("\n");
 myfile.(std.file.write)(text);
 text.(std.stdio.write);

 see post: support UFCS with fully qualified function names (was in
 "digitalmars.D.learn")
 http://forum.dlang.org/post/mailman.1453.1369099708.4724.digitalmars-d puremagic.com
I'm not a huge fan of this syntax. If we were adding syntax, I would prefer a new operator with lower precedence than '.', e.g.

    auto a = "" -> std.path.join("\n");

But I'm not sure the problem is big enough to warrant new syntax.
 it also helps searchability: if one uses local aliases such as import
 std.stdio:write2=write, naive searching via grep 'write(' will miss such
 cases. The increase in complexity is minimal, and the feature makes sense
 with the rest of the language.
I agree, renamed imports make code harder to understand, and harder to refactor. In this case we can prevent the problem simply by not giving functions generic names like 'compress'. Ideally you should be able to import the entire standard library with no name conflicts.
Jun 09 2013
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday, June 10, 2013 11:44:56 Daniel Murphy wrote:
 In this case we can prevent the problem simply by not giving functions generic
 names like 'compress'.  Ideally you should be able to import the entire
 standard library with no name conflicts.
We've actually made the opposite choice when discussing this in the past. We've specifically gone for giving functions which do the same thing in different modules the same name (e.g. std.ascii and std.uni), which makes swapping one for the other easy and avoids having to come up with distinct names, though it does obviously create more naming conflicts when you try and mix and match such modules. I'd also point out that it's been argued that it's a failure of the module system if we're specifically trying to avoid having different modules have functions with the same name. It's the module system's job to differentiate such functions, and specifically avoiding naming stuff the same to avoid naming conflicts means that you're pretty much ignoring the module system. So, the general approach has been to name functions differently when they do different things and name them the same when they do the same thing and then let the module system take care of differentiating between the two when you need to. - Jonathan M Davis
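The std.ascii / std.uni pairing can be shown directly: both modules export an isAlpha with the same intent (ASCII-only vs. full Unicode), so generic code can take either one as an alias and switching between them is a one-token change. allAlpha is a hypothetical helper.

```d
static import std.ascii;
static import std.uni;
import std.algorithm : all;
import std.stdio : writeln;

// Hypothetical helper: the classification function is a template alias
// parameter, so the caller picks which module's isAlpha to use.
bool allAlpha(alias isAlphaFn)(string s)
{
    return s.all!isAlphaFn;  // the string is iterated as dchar here
}

void main()
{
    enum word = "caf\u00E9";                      // "café"
    writeln(allAlpha!(std.ascii.isAlpha)(word));  // false: U+00E9 is not ASCII
    writeln(allAlpha!(std.uni.isAlpha)(word));    // true
}
```

With both modules imported unqualified, a bare isAlpha call would be ambiguous, which is the naming-conflict cost mentioned above; the static imports sidestep it here.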
Jun 09 2013
"deadalnix" <deadalnix gmail.com> writes:
On Monday, 10 June 2013 at 01:59:29 UTC, Jonathan M Davis wrote:
 On Monday, June 10, 2013 11:44:56 Daniel Murphy wrote:
 In this case we can prevent the problem simply by not giving functions generic
 names like 'compress'.  Ideally you should be able to import the entire
 standard library with no name conflicts.
We've actually made the opposite choice when discussing this in the past. We've specifically gone for giving functions which do the same thing in different modules the same name (e.g. std.ascii and std.uni), which makes swapping one for the other easy and avoids having to come up with distinct names, though it does obviously create more naming conflicts when you try and mix and match such modules. I'd also point out that it's been argued that it's a failure of the module system if we're specifically trying to avoid having different modules have functions with the same name. It's the module system's job to differentiate such functions, and specifically avoiding naming stuff the same to avoid naming conflicts means that you're pretty much ignoring the module system. So, the general approach has been to name functions differently when they do different things and name them the same when they do the same thing and then let the module system take care of differentiating between the two when you need to. - Jonathan M Davis
You are wise and speak the truth :P
Jun 09 2013
"Daniel Murphy" <yebblies nospamgmail.com> writes:
"Jonathan M Davis" <jmdavisProg gmx.com> wrote in message 
news:mailman.1001.1370829569.13711.digitalmars-d puremagic.com...
 On Monday, June 10, 2013 11:44:56 Daniel Murphy wrote:
 In this case we can prevent the problem simply by not giving functions generic
 names like 'compress'.  Ideally you should be able to import the entire
 standard library with no name conflicts.
We've actually made the opposite choice when discussing this in the past. We've specifically gone for giving functions which do the same thing in different modules the same name (e.g. std.ascii and std.uni), which makes swapping one for the other easy and avoids having to come up with distinct names, though it does obviously create more naming conflicts when you try and mix and match such modules. I'd also point out that it's been argued that it's a failure of the module system if we're specifically trying to avoid having different modules have functions with the same name. It's the module system's job to differentiate such functions, and specifically avoiding naming stuff the same to avoid naming conflicts means that you're pretty much ignoring the module system. So, the general approach has been to name functions differently when they do different things and name them the same when they do the same thing and then let the module system take care of differentiating between the two when you need to. - Jonathan M Davis
The difference here is these are range functions and you lose UFCS. It doesn't make much difference unless you are trying to chain them. Ranges, and call chaining of range-based functions using UFCS, are among the most attractive features of Phobos. Let's define a new general approach, and keep them conflict-free when possible. Also, compress is a ridiculously general name for a function.
Jun 11 2013
"Jakob Ovrum" <jakobovrum gmail.com> writes:
On Tuesday, 11 June 2013 at 13:13:56 UTC, Daniel Murphy wrote:
 Also, compress is a ridiculously general name for a function.
We have module-level functions called "copy" (multiple), "read", "write", "map", etc. already, and it's not a bad thing!

It's OK because the full name is not "compress", but "std.compression.lz77.compress". This way, how specific the code wants to be depends on the user and the particular use-case, instead of one-size-fits-all alternatives like "lz77Compress". There's no redundancy in the name yet we still have the option to be pin-point specific (e.g. static import), and yes, we still get to use UFCS!

To eliminate the UFCS problem - which doesn't happen very often (how often do you want to use two different compression algorithms in the same unit?) - we can (must?) use renamed symbols when importing.

Since any example using multiple "compress" functions would be contrived, I'll use an existing conflict - the case of "copy".

The following program backs up the specified files and writes a nicely formatted message to stdout (OK, so a tiny bit contrived):
----
void main(string[] args)
{
        import std.algorithm : copy, joiner;
        import std.array : empty;
        import std.file : fileCopy = copy; // `fileCopy` is std.file.copy
        import std.range : chain;          // chain lives in std.range
        import std.stdio : stdout;

        auto fileNames = args[1 .. $];

        foreach(fileName; fileNames)
                fileName.fileCopy(fileName ~ ".bak");

        if(!fileNames.empty)
                "Backed up the following files: "
                        .chain(fileNames.joiner(", "))
                        .copy(stdout.lockingTextWriter());
}
----

By eliminating redundancies from symbol names, we empower the user, and the module system offers all the tools necessary to solve conflicts in a variety of ways.
Jun 11 2013
Timothee Cour <thelastmammoth gmail.com> writes:
On Tue, Jun 11, 2013 at 11:22 AM, Jakob Ovrum <jakobovrum gmail.com> wrote:

 On Tuesday, 11 June 2013 at 13:13:56 UTC, Daniel Murphy wrote:

 Also, compress is a ridiculously general name for a function.
We have module-level functions called "copy" (multiple), "read", "write", "map", etc. already, and it's not a bad thing! It's OK because the full name is not "compress", but "std.compression.lz77.compress". This way, how specific the code wants to be depends on the user and the particular use-case, instead of one-size-fits-all alternatives like "lz77Compress". There's no redundancy in the name yet we still have the option to be pin-point specific (e.g. static import), and yes, we still get to use UFCS! To eliminate the UFCS problem - which doesn't happen very often (how often do you want to use two different compression algorithms in the same unit?) - we can (must?) use renamed symbols when importing.
I have found a better way to do that: see http://forum.dlang.org/post/mailman.1002.1370829729.13711.digitalmars-d-learn puremagic.com subject: 'best way to handle UFCS with ambiguous names: using std.typetuple.Alias!' syntax: 'arg1.Alias!(std.file.write).arg2'* see related discussion for reasoning. I'd like to push this as standard way to deal with ambiguities.
 Since any example using multiple "compress" functions would be contrived,
 I'll use an existing conflict - the case of "copy".

 The following program backs up the specified files and writes a nicely
 formatted message to stdout (OK, so a tiny bit contrived):
 ----
 void main(string[] args)
 {
         import std.algorithm : copy, joiner;
         import std.range : chain; // chain lives in std.range
         import std.array : empty;
         import std.file : fileCopy = copy; // `fileCopy` is std.file.copy
         import std.stdio : stdout;

         auto fileNames = args[1 .. $];

         foreach(fileName; fileNames)
                 fileName.fileCopy(fileName ~ ".bak");

         if(!fileNames.empty)
                 "Backed up the following files: "
                         .chain(fileNames.joiner(", "))
                         .copy(stdout.lockingTextWriter());
 }
 ----

 By eliminating redundancies from symbol names, we empower the user, and
 the module system offers all the tools necessary to solve conflicts in a
 variety of ways.
Jun 11 2013
"Jakob Ovrum" <jakobovrum gmail.com> writes:
On Tuesday, 11 June 2013 at 18:43:45 UTC, Timothee Cour wrote:
 I have found a better way to do that: see
 http://forum.dlang.org/post/mailman.1002.1370829729.13711.digitalmars-d-learn puremagic.com
 subject: 'best way to handle UFCS with ambiguous names: using
 std.typetuple.Alias!'
 syntax: 'arg1.Alias!(std.file.write).arg2'*
 see related discussion for reasoning. I'd like to push this as standard way
 to deal with ambiguities.
It's clearly an option, but I think it's too syntactically heavy, causing more harm than good (the idea of UFCS is, of course, readability!). Since these conflicting symbols are in the minority for the vast majority of code units, I think renamed symbols are much, much better.
Jun 11 2013
parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, June 11, 2013 20:50:17 Jakob Ovrum wrote:
 On Tuesday, 11 June 2013 at 18:43:45 UTC, Timothee Cour wrote:
 I have found a better way to do that: see
 http://forum.dlang.org/post/mailman.1002.1370829729.13711.digitalmars-d-le
 arn puremagic.com subject: 'best way to handle UFCS with ambiguous names:
 using
 std.typetuple.Alias!'
 syntax: 'arg1.Alias!(std.file.write).arg2'*
 see related discussion for reasoning. I'd like to push this as
 standard way
 to deal with ambiguities.
It's clearly an option, but I think it's too syntactically heavy, causing more harm than good (the idea of UFCS is, of course, readability!). Since these conflicting symbols are in the minority for the vast majority of code units, I think renamed symbols are much, much better.
Agreed. - Jonathan M Davis
Jun 11 2013
prev sibling parent reply "Daniel Murphy" <yebblies nospamgmail.com> writes:
"Jakob Ovrum" <jakobovrum gmail.com> wrote in message 
news:fjmuuahorgbwkcvygnqq forum.dlang.org...
 On Tuesday, 11 June 2013 at 13:13:56 UTC, Daniel Murphy wrote:
 Also, compress is a ridiculously general name for a function.
We have module-level functions called "copy" (multiple), "read", "write", "map", etc. already, and it's not a bad thing!
It is.
 It's OK because the full name is not "compress", but 
 "std.compression.lz77.compress". This way, how specific the code wants to 
 be depends on the user and the particular use-case, instead of 
 one-size-fits-all alternatives like "lz77Compress". There's no redundancy 
 in the name yet we still have the option to be pin-point specific (e.g. 
 static import), and yes, we still get to use UFCS!
There is a reason we don't call every function in phobos 'process' and let the module name tell us what it actually does - when you see the name in your source code, it is easy to recognize what is being done.
 To eliminate the UFCS problem - which doesn't happen very often (how often 
 do you want to use two different compression algorithms in the same 
 unit?), we can (must?) use renamed symbols when importing.
My workplace has a fire extinguisher, but this doesn't mean lighting fires is a good idea. I know we have the tools to disambiguate, but they come at a syntax and/or clarity cost. Why create a problem when we don't have to?
 Since any example using multiple "compress" functions would be contrived, 
 I'll use an existing conflict - the case of "copy".
E.g. code which implements HTTP compression with support for multiple algorithms.

tl;dr We have great tools to disambiguate when we have to. Let's not have to.
Jun 11 2013
parent reply "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Tuesday, 11 June 2013 at 22:34:55 UTC, Daniel Murphy wrote:
 There is a reason we don't call every function in phobos 
 'process' and let
 the module name tell us what is actually does - when you see 
 the name in
 your source code, it is easy to recognize what is being done.
"copy", "write" and "compress" are perfectly recognizable names.
 tl;dr We have great tools to disambiguate when we have to.  
 Let's not have
 to.
The way I see it, you're asking that all code should pay for the benefit of a minority of cases. I'd choose the inverse.
Jun 12 2013
parent reply "Daniel Murphy" <yebblies nospamgmail.com> writes:
"Jakob Ovrum" <jakobovrum gmail.com> wrote in message 
news:sdgqfozqnysbnumynkvp forum.dlang.org...
 On Tuesday, 11 June 2013 at 22:34:55 UTC, Daniel Murphy wrote:
 There is a reason we don't call every function in phobos 'process' and 
 let
 the module name tell us what is actually does - when you see the name in
 your source code, it is easy to recognize what is being done.
"copy", "write" and "compress" are perfectly recognizable names.
Ok, how exactly is the data compressed in the following snippet? No scrolling up to the top of the module to see what's imported!

newdata = data.compress();
 tl;dr We have great tools to disambiguate when we have to.  Let's not 
 have
 to.
The way I see it, you're asking that all code should pay for the benefit of a minority of cases. I'd choose the inverse.
This is not a function that will be used every few lines. Making the name a little longer for an increase in clarity is usually seen as a good idea.
Jun 13 2013
next sibling parent "Michal Minich" <michal.minich gmail.com> writes:
On Thursday, 13 June 2013 at 11:36:16 UTC, Daniel Murphy wrote:

 Ok, how exactly is the data compressed in the following 
 snippet?  No
 scrolling up to the top of the module to see what's imported!

 newdata = data.compress();
You can have that argument for any single overload and virtual call. At least you know it statically; with virtual you don't know until runtime... In many languages you would have interface ICompressor { Stream compress (Stream s) }...
Jun 13 2013
prev sibling next sibling parent reply "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Thursday, 13 June 2013 at 11:36:16 UTC, Daniel Murphy wrote:
 Ok, how exactly is the data compressed in the following 
 snippet?  No
 scrolling up to the top of the module to see what's imported!

 newdata = data.compress();
If it's not obvious from the context, just be explicit.

newdata = std.compression.lz77.compress(data);

Don't force verbosity on everyone just in case someone wants it.
Jun 13 2013
next sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 6/13/13, Peter Alexander <peter.alexander.au gmail.com> wrote:
 If it's not obvious from the context, just be explicit.

 newdata = std.compression.lz77.compress(data);

 Don't force verbosity on everyone just in case someone wants it.
What happens when we get std.compression.lz78 and you end up accidentally calling compress with lz77 and expand with lz78?

Pseudocoding:

module deserialize;
import std.compression.lz77;
auto readfile(string filename) { return readFile(filename).expand; }

module serialize;
import std.compression.lz78;  // oops!
void writeFile(T)(T[] data, string filename) { writeFile(filename, data.compress); }

Imports are incredibly easy to screw up. But if we used types instead of global modules then we could not only make our calling code clearer (and less buggy), but it would also allow us to use package imports so we can use any compression algorithm:

module deserialize;
import std.compression;  // package import, e.g. imports lz77, lz78, etc. modules
auto readfile(string filename) { return filename.readFile.lz77.expand; }

module serialize;
import std.compression;  // package import
void writeFile(T)(T[] data, string filename) { data.lz77.compress.writeFile(filename); }

"lz77" would be an auto function which takes the buffer and returns a Lz77 struct that has expand/compress methods.
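Andrej's suggestion can be sketched as a small wrapper struct returned by a UFCS-callable factory, so the algorithm's name shows up at every call site. This is a hypothetical sketch: Rle and rle are illustrative names, not a proposed Phobos API, and the codec is a trivial run-length encoding standing in for LZ77 only to keep the example short.

```d
import std.stdio : writeln;

/// Hypothetical wrapper: the algorithm name lives in the type,
/// so the methods can keep the generic names compress/expand.
struct Rle
{
    const(ubyte)[] data;

    /// Encode as (count, byte) pairs, with count in 1 .. 255.
    ubyte[] compress() const
    {
        ubyte[] result;
        size_t i = 0;
        while (i < data.length)
        {
            ubyte b = data[i];
            ubyte run = 1;
            while (i + run < data.length && data[i + run] == b && run < ubyte.max)
                ++run;
            result ~= run;
            result ~= b;
            i += run;
        }
        return result;
    }

    /// Decode the (count, byte) pairs produced by compress().
    ubyte[] expand() const
    {
        ubyte[] result;
        for (size_t i = 0; i + 1 < data.length; i += 2)
        {
            size_t count = data[i];
            ubyte value = data[i + 1];
            foreach (_; 0 .. count)
                result ~= value;
        }
        return result;
    }
}

/// UFCS entry point, so callers write `buffer.rle.compress`.
Rle rle(const(ubyte)[] buffer)
{
    return Rle(buffer);
}

void main()
{
    immutable ubyte[] input = [1, 1, 1, 2, 3, 3];
    auto packed = input.rle.compress;
    auto unpacked = packed.rle.expand;
    writeln(packed); // [3, 1, 1, 2, 2, 3]
    assert(unpacked == input);
}
```

Whether this buys much over a renamed import is what the replies dispute: a wrong wrapper name (lz78 instead of lz77) is as easy to type as a wrong import.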
Jun 13 2013
next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Thursday, 13 June 2013 at 13:15:03 UTC, Andrej Mitrovic wrote:
 What happens when we get std.compression.lz78 and you end up
 accidentally calling compress on with lz77 and expand with lz78?
The exact same typo could happen with your structs. You haven't solved anything:
 module serialize;
 import std.compression;  // package import
 void writeFile(T)(T[] data, string filename) {
 data.lz77.compress.writeFile(filename); }
void writeFile(T)(T[] data, string filename) { data.lz78.compress.writeFile(filename); } oops!
Jun 13 2013
prev sibling parent "David Nadlinger" <code klickverbot.at> writes:
On Thursday, 13 June 2013 at 13:15:03 UTC, Andrej Mitrovic wrote:
 What happens when we get std.compression.lz78 and you end up
 accidentally calling compress on with lz77 and expand with lz78?
 […]
 Imports are incredibly easy to screw up.
I think this argument is invalid: A typo in an import statement is just as likely as in a function call. David
Jun 13 2013
prev sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 6/13/13, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 But if we used types instead of global modules
*global functions*
Jun 13 2013
prev sibling parent reply "David Nadlinger" <code klickverbot.at> writes:
On Thursday, 13 June 2013 at 11:36:16 UTC, Daniel Murphy wrote:
 Ok, how exactly is the data compressed in the following 
 snippet?  No
 scrolling up to the top of the module to see what's imported!
I don't need to scroll to the top of the module, just a few lines up because I'm using function-local imports anyway. :P If you want extra verbosity (which can be good *sometimes*), just write "import lz77 = std.compression.lz77" and you are good to go. David
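Both alternatives raised in this subthread can be shown with a real module standing in for the hypothetical std.compression.lz77: a renamed module import, which (like a static import) exposes symbols only through the new name, and a function-local import. The function names asciiLower and unicodeLower are illustrative.

```d
// Renamed module import: std.ascii's symbols are reached only as `ascii.name`,
// standing in for the hypothetical `import lz77 = std.compression.lz77;`.
import ascii = std.ascii;

/// The call site spells out which module's toLower is meant.
dchar asciiLower(dchar c)
{
    return ascii.toLower(c);
}

/// Function-local import: `toLower` is visible only inside this body.
dchar unicodeLower(dchar c)
{
    import std.uni : toLower;
    return toLower(c);
}

void main()
{
    import std.stdio : writeln;
    // U+00C4 is not an ASCII letter, so only the std.uni version lower-cases it.
    writeln(asciiLower('\u00C4'), " ", unicodeLower('\u00C4'));
}
```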
Jun 13 2013
parent reply "Daniel Murphy" <yebblies nospamgmail.com> writes:
"David Nadlinger" <code klickverbot.at> wrote in message 
news:ahqzxzjbhmfiacwjgfkj forum.dlang.org...
 On Thursday, 13 June 2013 at 11:36:16 UTC, Daniel Murphy wrote:
 Ok, how exactly is the data compressed in the following snippet?  No
 scrolling up to the top of the module to see what's imported!
I don't need to scroll to the top of the module, just a few lines up because I'm using function-local imports anyway. :P If you want extra verbosity (which can be good *sometimes*), just write "import lz77 = std.compression.lz77" and you are good to go.
I don't think 4 characters is a high price to pay for the added clarity. Then there is no ambiguity, no need to rename imports, no problems using ufcs. Every time I see lz77Compress in anybody's code I know exactly what it does! I understand the motivation for shortening function names that will be used frequently... but this is not in that category.
Jun 13 2013
next sibling parent reply "David Nadlinger" <code klickverbot.at> writes:
On Thursday, 13 June 2013 at 23:45:12 UTC, Daniel Murphy wrote:
 I don't think 4 characters is a high price to pay for the added 
 clarity.
 Then there is no ambiguity, no need to rename imports, no 
 problems using
 ufcs.  Every time I see lz77Compress in anybody's code I know 
 exactly what
 it does!
import std.compression : lz77Compress = lz78Compress; ;)
Jun 13 2013
next sibling parent "Daniel Murphy" <yebblies nospamgmail.com> writes:
"David Nadlinger" <code klickverbot.at> wrote in message 
news:gzniyhyeuhjturqffgan forum.dlang.org...
 On Thursday, 13 June 2013 at 23:45:12 UTC, Daniel Murphy wrote:
 I don't think 4 characters is a high price to pay for the added clarity.
 Then there is no ambiguity, no need to rename imports, no problems using
 ufcs.  Every time I see lz77Compress in anybody's code I know exactly 
 what
 it does!
import std.compression : lz77Compress = lz78Compress; ;)
:(
Jun 13 2013
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-06-14 01:53, David Nadlinger wrote:

 import std.compression : lz77Compress = lz78Compress;

 ;)
If you do that, you only have yourself to blame. What if someone uses monkey patching and replaces all your functions at runtime? -- /Jacob Carlborg
Jun 14 2013
prev sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Thursday, 13 June 2013 at 23:45:12 UTC, Daniel Murphy wrote:
 I don't think 4 characters is a high price to pay for the added 
 clarity.
 Then there is no ambiguity, no need to rename imports, no 
 problems using
 ufcs.  Every time I see lz77Compress in anybody's code I know 
 exactly what
 it does!
I recommend you just use local imports if it bothers you that much; then it's obvious:

import std.compression.lz77;
auto newdata = compress(data);

Really, it should be obvious from the context which compression algorithm you are using.
 I understand the motivation for shortening function names that 
 will be used
 frequently... but this is not in that category.
This is not the motivation. The problem with lz77compress is that it is redundant: std.compression.lz77.lz77compress It's bad style to repeat the module name in module identifiers. It completely defeats the purpose of using modules as namespaces. If all the compression algorithms were inside std.compression instead of having their own modules then yes, lz77compress would be a fantastic name, but they're not, so it's not.
Jun 14 2013
prev sibling parent reply Timothee Cour <thelastmammoth gmail.com> writes:
ok I found what I think is the best solution to this problem :-)
see:
http://forum.dlang.org/post/mailman.1002.1370829729.13711.digitalmars-d-learn puremagic.com



On Sun, Jun 9, 2013 at 6:59 PM, Jonathan M Davis <jmdavisProg gmx.com>wrote:

 On Monday, June 10, 2013 11:44:56 Daniel Murphy wrote:
 In this case we can prevent problem simply by not giving functions
generic
 names like 'compress'.  Ideally you should be able to import the entire
 standard library with no name conflicts.
We've actually made the opposite choice when discussing this in the past. We've specifically gone for making functions which do the same thing in different modules having the same name (e.g. std.ascii and std.uni), which makes swapping one for the other easy and avoids having to come up with distinct names, though it does obviously create more naming conflicts when you try and mix and match such modules. I'd also point out that it's been argued that it's a failure of the module system if we're specifically trying to avoid having different modules have functions with the same name. It's the module system's job to differentiate such functions, and specifically avoiding naming stuff the same to avoid naming conflicts means that you're pretty much ignoring the module system. So, the general approach has been to name functions differently when they do different things and name them the same when they do the same thing and then let the module system take care of differentiating between the two when you need to. - Jonathan M Davis
Jun 09 2013
parent "Daniel Murphy" <yebblies nospamgmail.com> writes:
"Timothee Cour" <thelastmammoth gmail.com> wrote in message 
news:mailman.1003.1370829991.13711.digitalmars-d puremagic.com...
 ok I found what I think is the best solution to this problem :-)
 see:
 http://forum.dlang.org/post/mailman.1002.1370829729.13711.digitalmars-d-learn puremagic.com
That's pretty awesome, but still much much much uglier than not having to disambiguate in the first place.
Jun 11 2013
prev sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, June 04, 2013 23:55:05 Timothee Cour wrote:
 What I suggested in my original post didn't involve any
 indirection/abstraction; simply a renaming to be consistent with existing
 zlib (see my points A+B in my 1st post on this thread):
 
 std.compress.zlib.compress
 std.compress.zlib.uncompress
 std.compress.lzw.compress
 std.compress.lzw.uncompress
 
 same reason we have: std.file.write, std.stdio.write, etc, and not
 std.fileWrite, std.stdioWrite.
So, you want to create whole modules for each compression algorithm? That seems like overkill to me. What Walter currently has isn't even 1000 lines long (and that's including the CircularBuffer helper struct). Splitting it up like that seems like over-modularation to me. - Jonathan M Davis
Jun 04 2013
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/4/2013 11:59 PM, Jonathan M Davis wrote:
 So, you want to create whole modules for each compression algorithm?
Yes.
 That seems like overkill to me. What Walter currently has isn't even 1000 lines
 long (and that's including the CircularBuffer helper struct). Splitting it up
 like that seems like over-modularation to me.
When two modules have nothing to do with each other, they should be in separate modules.
Jun 05 2013
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, June 05, 2013 00:14:59 Walter Bright wrote:
 When two modules have nothing to do with each other, they should be in
 separate modules.
Except that they're all compression algorithms, so they _are_ related. Having modules that are only a few hundred lines long is very counterproductive IMHO. It's highly annoying how Java insists on splitting everything up into different files. You end up with a lot of small files to wade through. Fortunately, D doesn't force that, and I don't think that we should go that route by choice. There's no more reason to split all of these up than there is to put each algorithm in std.algorithm in its own module. And yes, I know that you like that idea, but it seems ridiculous to me to try and have only one or two functions per module. We don't want them to be huge, but having them be very small is just as harmful IMHO. - Jonathan M Davis
Jun 05 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/5/2013 12:29 AM, Jonathan M Davis wrote:
 On Wednesday, June 05, 2013 00:14:59 Walter Bright wrote:
 When two modules have nothing to do with each other, they should be in
 separate modules.
Except that they're all compression algorithms, so they _are_ related.
No, they are not related. They don't share code, and it is unlikely more than one would be called in any particular use case. Remember, module contents have private access to other parts of the module. This violates encapsulation when the parts are unrelated.
 Having
 modules that are only a few hundred lines long is very counterproductive IMHO.
Why? On the other hand, when you are trying to understand a module, having thousands of lines of things that have no connection to each other makes it difficult. It also makes debugging them harder than necessary.
 It's highly annoying how Java insists on splitting everything up into different
 files. You end up with a lot of small files to wade through.
Wade through for what? If you're having a problem with the lzw compressor, why would you find it more productive to wade through the huffman compressor to get to it?
 Fortunately, D
 doesn't force that, and I don't think that we should go that route by choice.
 There's no more reason to split all of these up then there is to put each
 algorithm in std.algorithm in its own module. And yes, I know that you like
 that idea, but it seems ridiculous to me to try and have only one or two
 functions per module. We don't want them to be huge, but having them be very
 small is just as harmful IMHO.
You need a better case as to why it is harmful. I've spent many miserable hours trying to find a bug in a phobos module that is a zillion lines of code, trying to strip out what is not necessary to repro the problem. I don't see what problem kitchen sink modules solve - my experience is that smaller, better contained abstractions are more productive than kitchen sinks.
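Walter's encapsulation point rests on D's `private` being scoped to the module rather than to the type. A minimal sketch, with a hypothetical LzwState struct standing in for one compressor's internals:

```d
import std.stdio : writeln;

/// Hypothetical state for one compressor; the field is marked private.
struct LzwState
{
    private size_t nextCode = 256;
}

/// An unrelated function in the same module. It compiles because
/// D's `private` means "private to this module", not "private to this struct".
size_t resetFromOutside(ref LzwState s)
{
    s.nextCode = 0;
    return s.nextCode;
}

void main()
{
    LzwState s;
    writeln(s.nextCode);          // 256
    writeln(resetFromOutside(s)); // 0
}
```

If LzwState lived in a module of its own, resetFromOutside would fail to compile from any other module, which is the isolation a one-algorithm-per-module layout provides.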
Jun 05 2013
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2013-06-05 08:59, Jonathan M Davis wrote:

 So, you want to create whole modules for each compression algorithm? That
 seems like overkill to me. What Walter currently has isn't even 1000 lines
 long (and that's including the CircularBuffer helper struct). Splitting it up
 like that seems like over-modularation to me.
The current modules in Phobos already contain too much. We shouldn't make the same mistake again. -- /Jacob Carlborg
Jun 05 2013
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, June 05, 2013 09:31:01 Jacob Carlborg wrote:
 On 2013-06-05 08:59, Jonathan M Davis wrote:
 So, you want to create whole modules for each compression algorithm? That
 seems like overkill to me. What Walter currently has isn't even 1000 lines
 long (and that's including the CircularBuffer helper struct). Splitting it
 up like that seems like over-modularation to me.
The current modules in Phobos already contains too much. We shouldn't make the same mistake again.
Maybe some do, but many don't, and 1000 lines is _far_ from too much. If we start making modules that small, we're going to end up with tons of them to wade through to find anything. - Jonathan M Davis
Jun 05 2013
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/5/2013 12:38 AM, Jonathan M Davis wrote:
 Maybe some do, but many don't, and 1000 lines is _far_ from too much. If we
 start making modules that small, we're going to end up with tons of them to
 wade through to find anything.
1. It isn't any harder to find things in multiple files than in one file.
2. If there's a ton in one file, you have to wade through the ton to find what you're looking for.

Your argument has merit if you are using a floppy disk drive for storage, as floppies are agonizingly slow to read files off of. But that problem disappeared 30 years ago.

(At an SD conference back in the 80's, I was on a compiler panel with the compiler guys from Microsoft, Borland, etc. We were each asked how our respective compilers worked on floppy systems. The guys would say "well, you set it up this way, configure it that way, juggle what goes on which floppy, and you can do it!" I was the third of five guys, and my response was: "We charge $200 extra for the floppy disk development system, and ship you a hard disk with it." That was the end of that discussion, and I never heard that question again.)
Jun 05 2013
parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 5 June 2013 at 08:11:14 UTC, Walter Bright wrote:
 On 6/5/2013 12:38 AM, Jonathan M Davis wrote:
 Maybe some do, but many don't, and 1000 lines is _far_ from 
 too much. If we
 start making modules that small, we're going to end up with 
 tons of them to
 wade through to find anything.
1. It isn't any harder to find things in multiple files than in one file.
Although I think you're right about having smaller modules, I generally find it easier to browse through a larger file than many smaller files. Multiple files is ok if you know what you're looking for (grep) but when you're just trying to scan across a system to get a feel for how it's working, juggling many files is a real pita.
Jun 05 2013
next sibling parent reply "Diggory" <diggsey googlemail.com> writes:
On Wednesday, 5 June 2013 at 11:30:10 UTC, John Colvin wrote:
 On Wednesday, 5 June 2013 at 08:11:14 UTC, Walter Bright wrote:
 On 6/5/2013 12:38 AM, Jonathan M Davis wrote:
 Maybe some do, but many don't, and 1000 lines is _far_ from 
 too much. If we
 start making modules that small, we're going to end up with 
 tons of them to
 wade through to find anything.
1. It isn't any harder to find things in multiple files than in one file.
Although I think you're right about having smaller modules, I generally find it easier to browse through a larger file than many smaller files. Multiple files is ok if you know what you're looking for (grep) but when you're just trying to scan across a system to get a feel for how it's working, juggling many files is a real pita.
Surely you would know which compression algorithm you wanted to change? If it's a general renaming or something not specific to a particular use then a file search is necessary anyway.
Jun 05 2013
parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 5 June 2013 at 11:57:19 UTC, Diggory wrote:
 On Wednesday, 5 June 2013 at 11:30:10 UTC, John Colvin wrote:
 On Wednesday, 5 June 2013 at 08:11:14 UTC, Walter Bright wrote:
 On 6/5/2013 12:38 AM, Jonathan M Davis wrote:
 Maybe some do, but many don't, and 1000 lines is _far_ from 
 too much. If we
 start making modules that small, we're going to end up with 
 tons of them to
 wade through to find anything.
1. It isn't any harder to find things in multiple files than in one file.
Although I think you're right about having smaller modules, I generally find it easier to browse through a larger file than many smaller files. Multiple files is ok if you know what you're looking for (grep) but when you're just trying to scan across a system to get a feel for how it's working, juggling many files is a real pita.
Surely you would know which compression algorithm you wanted to change? If it's a general renaming or something not specific to a particular use then a file search is necessary anyway.
I was speaking more generally, about Phobos as a whole.
Jun 05 2013
prev sibling parent reply "David Nadlinger" <code klickverbot.at> writes:
On Wednesday, 5 June 2013 at 11:30:10 UTC, John Colvin wrote:
 Although I think you're right about having smaller modules, I 
 generally find it easier to browse through a larger file than 
 many smaller files.

 Multiple files is ok if you know what you're looking for (grep) 
 but when you're just trying to scan across a system to get a 
 feel for how it's working, juggling many files is a real pita.
Use an editor with a file tree sidebar? Quite on the contrary, I find many files to be much preferable, because you automatically have "bookmarks" in the source to come back to, and having the functionality already grouped in manageable logical units saves you from inferring that structure again, as it is the case when scrolling through a huge file. On a lighter note, if it's really a problem for you that module files are too small, what about just concatenating all the files in a given directory using a little shell magic? ;) David
Jun 05 2013
parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 5 June 2013 at 14:17:43 UTC, David Nadlinger wrote:
 On Wednesday, 5 June 2013 at 11:30:10 UTC, John Colvin wrote:
 Although I think you're right about having smaller modules, I 
 generally find it easier to browse through a larger file than 
 many smaller files.

 Multiple files is ok if you know what you're looking for 
 (grep) but when you're just trying to scan across a system to 
 get a feel for how it's working, juggling many files is a real 
 pita.
Use an editor with a file tree sidebar? Quite on the contrary, I find many files to be much preferable, because you automatically have "bookmarks" in the source to come back to, and having the functionality already grouped in manageable logical units saves you from inferring that structure again, as it is the case when scrolling through a huge file. On a lighter note, if it's really a problem for you that module files are too small, what about just concatenating all the files in a given directory using a little shell magic? ;) David
Agreed. To be honest, it's a trivial matter easily solved by a variety of tools, but I'm often just lazy and end up reading code with gedit or similar.
Jun 05 2013
parent Jacob Carlborg <doob me.com> writes:
On 2013-06-05 16:31, John Colvin wrote:

 Agreed.

 To be honest, it's a trivial matter easily solved by a variety of tools,
 but I'm often just lazy and end up reading code with gedit or similar.
Gedit has a file tree sidebar, at least as a plugin. -- /Jacob Carlborg
Jun 09 2013
prev sibling next sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-06-05 09:38, Jonathan M Davis wrote:

 Maybe some do, but many don't, and 1000 lines is _far_ from too much. If we
 start making modules that small, we're going to end up with tons of them to
 wade through to find anything.
I completely agree with Walter and he made my point a lot better than I could. -- /Jacob Carlborg
Jun 05 2013
prev sibling parent reply "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Wednesday, 5 June 2013 at 07:39:12 UTC, Jonathan M Davis wrote:
 Maybe some do, but many don't, and 1000 lines is _far_ from too 
 much. If we
 start making modules that small, we're going to end up with 
 tons of them to
 wade through to find anything.

 - Jonathan M Davis
We have a standard library in disagreement with the language's encapsulation mechanics. The module/package system in D is almost ignored in Phobos (and that's probably why the package system still has all these little things needing ironing out). It seems to owe influence to typical C and C++ library structure, which is simply suboptimal in D's module system.

Third-party libraries tend to do a much better job at this. For example, Tango goes all out and embraces the package and module system, and the result is an extremely organized tree of modules with appropriate granularity. Code isn't hard to find because everything isn't just dumped into (bloated) blobs in a flat structure like in Phobos; it's organized into a tree. It seems like a no-brainer with the D language, and Phobos is the only D library I know that doesn't embrace this style of organization.

The result is awful coupling throughout; with Phobos, we can't even write Hello World without pulling in half of the standard library. It's not just about the actual dependencies a module has, but the perceived dependencies; important from a readability perspective. I know a lot of D programmers embrace selective imports when working with Phobos, because just seeing a plain import statement such as "import std.datetime;" tells you very little about what the importing module actually does, and it's harder to figure out exactly where unqualified symbols come from when reading the module's code.

I think the programmer should have a choice of convenience versus readability/fine dependency management when importing. The current module system does a decent job at enabling this already, and it's bound to get better with improvements like DIP37. Scripts and certain application code may want to prioritize productivity over finely managed dependencies, while library code - especially the *standard* library! - should definitely aim for lean coupling that makes sense. 
To that end, I think a lot of improvements can be made without breaking user code, but I'd be very much willing to see all kinds of breakage if it means we can get rid of the present standard library of substandard quality. The language may have been declared stable, but Phobos is in no laudable state.
Jun 05 2013
next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Wednesday, June 05, 2013 14:02:37 Jakob Ovrum wrote:
 We have a standard library in disagreement with the language's
 encapsulation mechanics. The module/package system in D is almost
 ignored in Phobos (and that's probably why the package system
 still has all these little things needing ironing out). It seems
 to owe influence to typical C and C++ library structure, which is
 simply suboptimal in D's module system.
I honestly don't see how Phobos is in disagreement with the module system. No, it doesn't use hierarchy as much as it should, and there are a few modules that are overly large (like std.algorithm or std.datetime), but for the most part, I don't see any problem with its level of encapsulation. It's mainly just its organization which could have been better. My primary objection here is that it seems ridiculous to me create lots of tiny modules. I hate how Java does that sort of thing, but there you're _forced_ to in many cases, whereas we have the opportunity to actually group things together in a single module where appropriate. And having whole modules with only one or two functions is way too small IMHO, and that seems to be what we're proposing here. - Jonathan M Davis
Jun 05 2013
parent "Diggory" <diggsey googlemail.com> writes:
On Wednesday, 5 June 2013 at 17:21:01 UTC, Jonathan M Davis wrote:
 On Wednesday, June 05, 2013 14:02:37 Jakob Ovrum wrote:
 We have a standard library in disagreement with the language's
 encapsulation mechanics. The module/package system in D is 
 almost
 ignored in Phobos (and that's probably why the package system
 still has all these little things needing ironing out). It 
 seems
 to owe influence to typical C and C++ library structure, which 
 is
 simply suboptimal in D's module system.
I honestly don't see how Phobos is in disagreement with the module system. No, it doesn't use hierarchy as much as it should, and there are a few modules that are overly large (like std.algorithm or std.datetime), but for the most part, I don't see any problem with its level of encapsulation. It's mainly just its organization which could have been better. My primary objection here is that it seems ridiculous to me create lots of tiny modules. I hate how Java does that sort of thing, but there you're _forced_ to in many cases, whereas we have the opportunity to actually group things together in a single module where appropriate. And having whole modules with only one or two functions is way too small IMHO, and that seems to be what we're proposing here. - Jonathan M Davis
I agree that one or two functions is far too small, but I'm in favour of having only one or two top-level classes/structs per module (there will be exceptions, but in general). For example:

std.regex - I think it would be better if each implementation had its own module, plus a separate module for the parts common to all of them. Importing std.regex would publicly import the lot using the new package system.
std.range - a module for tests (isXXX and hasXXX), a module for algorithms (e.g. retro, take, etc.), and a module for class wrappers.
std.datetime - split each class/struct into its own module; SysTime alone is ~8000 lines.
Jun 05 2013
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jun 05, 2013 at 01:20:48PM -0400, Jonathan M Davis wrote:
 On Wednesday, June 05, 2013 14:02:37 Jakob Ovrum wrote:
 We have a standard library in disagreement with the language's
 encapsulation mechanics. The module/package system in D is almost
 ignored in Phobos (and that's probably why the package system
 still has all these little things needing ironing out). It seems
 to owe influence to typical C and C++ library structure, which is
 simply suboptimal in D's module system.
I honestly don't see how Phobos is in disagreement with the module system. No, it doesn't use hierarchy as much as it should, and there are a few modules that are overly large (like std.algorithm or std.datetime), but for the most part, I don't see any problem with its level of encapsulation. It's mainly just its organization which could have been better. My primary objection here is that it seems ridiculous to me to create lots of tiny modules. I hate how Java does that sort of thing, but there you're _forced_ to in many cases, whereas we have the opportunity to actually group things together in a single module where appropriate. And having whole modules with only one or two functions is way too small IMHO, and that seems to be what we're proposing here.
[...]

As Andrei pointed out, I think we need to look at this not from a size perspective (number of lines, number of functions, etc.), but from an API perspective: do these functions/structs belong together, or are they only marginally related? More precisely, if some user code uses function X, is that code equally likely to also use Y? Are there common use cases in which only Y is used, not X? If the use of function X almost always implies the use of function Y (and vice versa), then they belong in the same module. Otherwise, I'd say they are candidates for splitting up.

If function X uses function Z, and function Y also uses function Z, but the use of X does not necessarily imply the use of Y (and vice versa), then I'd argue that X, Y, and Z should be in separate modules to maximize reuse and reduce the amount of code you have to pull in (you shouldn't be forced to pull in Y just because you use X, which calls Z, which Y also happens to call).

This may be a bit heavy-handed for user code, but for Phobos, the standard library, I think the bar should be set higher. After all, one of the stated goals of Phobos is that you shouldn't need to pull in a whole ton of code just because you call a single function. Right now I think we're a bit short of that goal.

T

-- 
All men are mortal. Socrates is mortal. Therefore all men are Socrates.
Jun 05 2013
parent reply "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Wednesday, 5 June 2013 at 18:21:04 UTC, H. S. Teoh wrote:
 On Wed, Jun 05, 2013 at 01:20:48PM -0400, Jonathan M Davis wrote:
 On Wednesday, June 05, 2013 14:02:37 Jakob Ovrum wrote:
 We have a standard library in disagreement with the language's
 encapsulation mechanics. The module/package system in D is almost
 ignored in Phobos (and that's probably why the package system
 still has all these little things needing ironing out). It seems
 to owe influence to typical C and C++ library structure, which is
 simply suboptimal in D's module system.
I honestly don't see how Phobos is in disagreement with the module system. No, it doesn't use hierarchy as much as it should, and there are a few modules that are overly large (like std.algorithm or std.datetime), but for the most part, I don't see any problem with its level of encapsulation. It's mainly just its organization which could have been better. My primary objection here is that it seems ridiculous to me to create lots of tiny modules. I hate how Java does that sort of thing, but there you're _forced_ to in many cases, whereas we have the opportunity to actually group things together in a single module where appropriate. And having whole modules with only one or two functions is way too small IMHO, and that seems to be what we're proposing here.
[...]

As Andrei pointed out, I think we need to look at this not from a size perspective (number of lines, number of functions, etc.), but from an API perspective: do these functions/structs belong together, or are they only marginally related? More precisely, if some user code uses function X, is that code equally likely to also use Y? Are there common use cases in which only Y is used, not X? If the use of function X almost always implies the use of function Y (and vice versa), then they belong in the same module. Otherwise, I'd say they are candidates for splitting up.

If function X uses function Z, and function Y also uses function Z, but the use of X does not necessarily imply the use of Y (and vice versa), then I'd argue that X, Y, and Z should be in separate modules to maximize reuse and reduce the amount of code you have to pull in (you shouldn't be forced to pull in Y just because you use X, which calls Z, which Y also happens to call).

This may be a bit heavy-handed for user code, but for Phobos, the standard library, I think the bar should be set higher. After all, one of the stated goals of Phobos is that you shouldn't need to pull in a whole ton of code just because you call a single function. Right now I think we're a bit short of that goal.
Massive +1

Modules are for grouping functions/types that are commonly used together or have interdependencies, not for grouping things that are in a similar category (although these things can be related).

I don't care if levenshteinDistance is a "classic algorithm", I don't want to have to compile it every time I want to take the minimum of two numbers. Barely anyone is ever going to use it, so it should be off in a module on its own.

There's absolutely nothing wrong with having lots of small modules provided that you don't end up importing the same sets of modules over and over. There are numerous advantages:

1. Makes it easier to manage dependencies.
1a. reduces compile times.
1b. reduces binary size.
1c. benefits incremental and distributed/parallel compilation.
2. Makes version control easier as more files means merge conflicts are less likely.
3. Makes it easier to navigate files.

The only downside is that you may occasionally have to import more modules.
Jun 06 2013
parent "SomeDude" <lovelydear mailmetrash.com> writes:
On Thursday, 6 June 2013 at 14:26:51 UTC, Peter Alexander wrote:
 Modules are for grouping functions/types that are commonly used 
 together or have interdependencies, not for grouping things 
 that are in a similar category (although these things can be 
 related).

 I don't care if levenshteinDistance is a "classic algorithm", I 
 don't want to have to compile it every time I want to take the 
 minimum of two numbers. Barely anyone is ever going to use it, 
 so it should be off in a module on its own.

 There's absolutely nothing wrong with having lots of small 
 modules provided that you don't end up importing the same sets 
 of modules over and over. There are numerous advantages:

 1. Makes it easier to manage dependencies.
 1a. reduces compile times.
 1b. reduces binary size.
 1c. benefits incremental and distributed/parallel compilation.
 2. Makes version control easier as more files means merge 
 conflicts are less likely.
 3. Makes it easier to navigate files.

 The only downside is that you may occasionally have to import 
 more modules.
Wise words!
Jun 06 2013
prev sibling next sibling parent "David Nadlinger" <code klickverbot.at> writes:
On Wednesday, 5 June 2013 at 07:00:14 UTC, Jonathan M Davis wrote:
 So, you want to create whole modules for each compression algorithm? That
 seems like overkill to me. What Walter currently has isn't even 1000 lines
 long (and that's including the CircularBuffer helper struct). Splitting it up
 like that seems like over-modularization to me.
Modules are the unit of encapsulation in D (private is module-level), so they should always be as small as possible. As Andrei would say: Destroyed? David
Jun 05 2013
prev sibling parent reply "SomeDude" <lovelydear mailmetrash.com> writes:
On Wednesday, 5 June 2013 at 07:00:14 UTC, Jonathan M Davis wrote:
 So, you want to create whole modules for each compression algorithm? That
 seems like overkill to me. What Walter currently has isn't even 1000 lines
 long (and that's including the CircularBuffer helper struct). Splitting it up
 like that seems like over-modularization to me.

 - Jonathan M Davis
Well, as the author of a 15,000-line datetime module, I think your opinion is a little biased. *I* think 1,000 lines is a perfect size for a module.
Jun 05 2013
parent reply "Xiaoxi" <xiaoxi 163.com> writes:
On Wednesday, 5 June 2013 at 19:01:28 UTC, SomeDude wrote:
 On Wednesday, 5 June 2013 at 07:00:14 UTC, Jonathan M Davis wrote:
 So, you want to create whole modules for each compression algorithm? That
 seems like overkill to me. What Walter currently has isn't even 1000 lines
 long (and that's including the CircularBuffer helper struct). Splitting it up
 like that seems like over-modularization to me.

 - Jonathan M Davis
Well, as the author of a 15,000-line datetime module, I think your opinion is a little biased. *I* think 1,000 lines is a perfect size for a module.
Is cross-module (cross-file) inlining working on all D compilers? If not, bigger modules are better.
Jun 06 2013
parent "David Nadlinger" <code klickverbot.at> writes:
On Thursday, 6 June 2013 at 13:34:42 UTC, Xiaoxi wrote:
 Is cross-module (cross-file) inlining working on all D compilers? If
 not, bigger modules are better.
This is not at all relevant if either a) the functions in question are templates, as is the case here, or b) the functions in a bigger module don't call each other anyway, as in many kitchen-sink modules that just group vaguely related functionality together. David
Jun 06 2013
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/4/13 2:43 PM, Timothee Cour wrote:
     What is the improvement of typing:

         compress(lzw)

     over:

         lzwCompress()

     ?


 writing generic code.
 same reason as why we prefer:
 auto y=to!double(x) over auto y=to_double(x);
I think the application here is a bit more tenuous. It's natural to think of a type-parameterized algorithm that needs to!T. But it's more of a long shot to think of an algorithm statically parameterized on the compression method. That could certainly come up, but it's not likely to be frequent; and if it's not, a mixin can always take care of it. Andrei
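A minimal sketch of the mixin approach, assuming two hypothetical free functions, storeCompress and storeExpand (a no-op "stored" method standing in for a pair such as lzwCompress/lzwExpand). The generic code takes the method name as a compile-time string, and a string mixin turns that name into the actual calls:

```d
// Hypothetical "stored" method: copies the bytes unchanged. It stands in for
// any pair of free functions named <method>Compress / <method>Expand.
ubyte[] storeCompress(const(ubyte)[] data) { return data.dup; }
ubyte[] storeExpand(const(ubyte)[] data) { return data.dup; }

// Generic code parameterized on the method *name*. For roundTrip!"store" the
// string mixins expand, at compile time, into storeCompress(data) and
// storeExpand(packed); an unknown name simply fails to compile.
ubyte[] roundTrip(string method)(const(ubyte)[] data)
{
    ubyte[] packed = mixin(method ~ "Compress(data)");
    return mixin(method ~ "Expand(packed)");
}
```

Usage would be roundTrip!"store"(bytes). The existing lzwXxx naming scheme needs no change for this to work, which is the point about a mixin covering the rare generic case.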
Jun 04 2013
prev sibling parent reply David <d dav1d.de> writes:
On 04.06.2013 18:09, Walter Bright wrote:
 On 6/4/2013 6:34 AM, Jacob Carlborg wrote:
 I'm wondering if (un)compress can take the compressing algorithm as a
 template
 parameter. Does that make sense?

 Something like:

 auto result = data.compress!(LZW);

 Then we could pass different compressing algorithms to the compress
 function.
I don't see the point. Furthermore, it requires that the compress template know about all the compression algorithms available, which limits future expansion.
No, the compression type only has to provide a certain API.
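A minimal sketch of what such an API-based design could look like. All names here (isCompressor, compress, expand, RLE) are hypothetical illustrations, not std.compress API. The generic templates only check that the type has static encode/decode members, so a new algorithm plugs in without the template knowing it exists. RLE is a toy run-length coder used as a stand-in, not LZW:

```d
// Hypothetical "compressor API": a type with static encode/decode functions
// mapping byte arrays to byte arrays.
enum isCompressor(C) = __traits(compiles, {
    const(ubyte)[] input;
    ubyte[] packed = C.encode(input);
    ubyte[] unpacked = C.decode(packed);
});

// Generic entry points: they know nothing about any particular algorithm,
// only about the API checked by isCompressor.
ubyte[] compress(C)(const(ubyte)[] data) if (isCompressor!C)
{
    return C.encode(data);
}

ubyte[] expand(C)(const(ubyte)[] data) if (isCompressor!C)
{
    return C.decode(data);
}

// Toy run-length coder: output is a sequence of (count, byte) pairs.
struct RLE
{
    static ubyte[] encode(const(ubyte)[] src)
    {
        ubyte[] output;
        size_t i = 0;
        while (i < src.length)
        {
            ubyte run = 1;
            while (i + run < src.length && src[i + run] == src[i] && run < ubyte.max)
                ++run;
            output ~= run;
            output ~= src[i];
            i += run;
        }
        return output;
    }

    static ubyte[] decode(const(ubyte)[] src)
    {
        ubyte[] output;
        for (size_t i = 0; i + 1 < src.length; i += 2)
            foreach (_; 0 .. src[i])
                output ~= src[i + 1];
        return output;
    }
}
```

Thanks to UFCS this reads as input.compress!RLE, matching the data.compress!(LZW) form suggested above.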
Jun 04 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/4/2013 9:34 AM, David wrote:
 On 04.06.2013 18:09, Walter Bright wrote:
 On 6/4/2013 6:34 AM, Jacob Carlborg wrote:
 I'm wondering if (un)compress can take the compressing algorithm as a
 template
 parameter. Does that make sense?

 Something like:

 auto result = data.compress!(LZW);

 Then we could pass different compressing algorithms to the compress
 function.
I don't see the point. Furthermore, it requires that the compress template know about all the compression algorithms available, which limits future expansion.
No, the compression type only has to provide a certain API.
Again, I'm not seeing the added value with this.
Jun 04 2013
prev sibling next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
On Mon, 03 Jun 2013 20:44:04 -0700,
Walter Bright <newshound2 digitalmars.com> wrote:

 Comments welcome.
LZW is a nice and fast general-purpose algorithm, and I welcome its addition to Phobos, both for building file format readers on top of it (MS-DOS compress, GIF, TIFF) and simply for compressing data on the fly in RAM. Most people seem to have moved on to zlib for pretty much everything else, though.

Actually, I just happened to attempt something similar. Influenced by your talk about modularity and bioinfornatic's micro-benchmarking of FASTA file reading, I'm trying to wrap up the concepts of bit streams and of algorithms processing them. But some of my design goals are different:

a) Not-Invented-Here must take precedence. :D
b) There is no measure other than bytes/second.
c) Every algorithm must run in its own thread for maximal parallelism (like Unix process piping).

So it is not about parallel algorithms, but about building processing pipelines that work like Unix pipes, where only circular buffers need to be shared from one algorithm to the next.

On Mon, 3 Jun 2013 23:40:06 -0700, Timothee Cour <thelastmammoth gmail.com> wrote:
 A)
 there already is std.zlib; why not have:
 std.compress.zlib: public import std.zlib
 std.compress.lzw: put this new module there instead of in std.compress
 std.compress.image.png
 std.compress.image.jpg
Yes and no. Compression algorithms should be in std.compress and share the same API, but image file formats belong in std.image.* or std.fileformat.*. You don't look into std.compress when you want to open *.bmps and *.jpgs.

On Tue, 04 Jun 2013 01:00:03 -0700, Walter Bright <newshound2 digitalmars.com> wrote:
 On 6/3/2013 11:40 PM, Timothee Cour wrote:
 D)
 CircularBuffer belongs somewhere else; maybe std.range or std.container
I have mixed feelings about that. If you'll notice, std.compress doesn't have any imports! I wanted to make at least one module that doesn't pull in 100% of everything in Phobos (one of my pet peeves).
I have nothing to add to the discussion on THAT matter, but a compromise should be found between a few massive imports (D) and hundreds of tiny imports (Java). :) -- Marco
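For reference, a fixed-capacity circular buffer can be written with no imports and no heap allocation, the two properties Walter mentions for std.compress. This is a hypothetical sketch (RingBuffer, put, front, popFront), not the CircularBuffer in his module: the storage is a static array inside the struct, and as an assumed design choice, put overwrites the oldest element once the buffer is full, the way a sliding window would:

```d
// Hypothetical sketch: fixed-capacity ring buffer, no imports, no heap use.
struct RingBuffer(T, size_t N)
{
    static assert(N > 0, "capacity must be positive");

    private T[N] items;    // storage lives inside the struct itself
    private size_t head;   // index of the oldest element
    private size_t count;  // number of stored elements

    @property bool empty() const { return count == 0; }
    @property bool full() const { return count == N; }
    @property size_t length() const { return count; }

    // Appends x; when full, overwrites the oldest element.
    void put(T x)
    {
        items[(head + count) % N] = x;
        if (count < N)
            ++count;
        else
            head = (head + 1) % N;
    }

    // i-th element, counting from the oldest.
    ref inout(T) opIndex(size_t i) inout
    {
        assert(i < count, "index out of range");
        return items[(head + i) % N];
    }

    // Input-range primitives: front is the oldest element.
    @property ref inout(T) front() inout
    {
        assert(count > 0, "front on empty buffer");
        return items[head];
    }

    void popFront()
    {
        assert(count > 0, "popFront on empty buffer");
        head = (head + 1) % N;
        --count;
    }
}
```

Because it exposes empty/front/popFront, such a buffer is itself an input range, which fits the range-in, range-out style of the proposed module.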
Jun 04 2013
prev sibling next sibling parent reply "Tiago Martinez" <tiago.martinez gmail.com> writes:
On Tuesday, 4 June 2013 at 03:44:05 UTC, Walter Bright wrote:
 https://github.com/WalterBright/phobos/blob/std_compress/std/compress.d

 I wrote this to add components to compress and expand ranges.

 Highlights:

 1. doesn't do any memory allocation
 2. can handle arbitrarily large sets of data
 3. it's lazy
 4. takes an InputRange, and outputs an InputRange

 Comments welcome.
I may have misunderstood something, but the code does not implement LZW (a variant of LZ78), but a variant of LZ77 (i.e. deflate/ZIP). See https://en.wikipedia.org/wiki/LZ77_and_LZ78
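To make the distinction concrete, here is a naive LZ77 sketch of the sliding-window family. It is an illustration with hypothetical names (Token, lz77Encode, lz77Decode), not Walter's code. Each token means "copy length bytes starting offset bytes back, then emit next", and the encoder only searches the last window bytes. Real encoders replace this brute-force search with hash chains:

```d
// One LZ77 token: copy `length` bytes from `offset` bytes back, then emit `next`.
struct Token { size_t offset; size_t length; ubyte next; }

Token[] lz77Encode(const(ubyte)[] input, size_t window = 4096)
{
    Token[] tokens;
    size_t pos = 0;
    while (pos < input.length)
    {
        size_t bestLen = 0, bestOff = 0;
        size_t start = pos > window ? pos - window : 0;
        foreach (cand; start .. pos)   // brute-force search of the window
        {
            size_t len = 0;
            // Matches may run into the lookahead; keep one byte for `next`.
            while (pos + len + 1 < input.length && input[cand + len] == input[pos + len])
                ++len;
            if (len > bestLen) { bestLen = len; bestOff = pos - cand; }
        }
        tokens ~= Token(bestOff, bestLen, input[pos + bestLen]);
        pos += bestLen + 1;
    }
    return tokens;
}

ubyte[] lz77Decode(const(Token)[] tokens)
{
    ubyte[] output;
    foreach (t; tokens)
    {
        immutable from = output.length - t.offset;
        foreach (i; 0 .. t.length)
            output ~= output[from + i];   // byte by byte, so overlapping copies work
        output ~= t.next;
    }
    return output;
}
```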
Jun 05 2013
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 05-Jun-2013 16:16, Tiago Martinez wrote:
 On Tuesday, 4 June 2013 at 03:44:05 UTC, Walter Bright wrote:
 https://github.com/WalterBright/phobos/blob/std_compress/std/compress.d

 I wrote this to add components to compress and expand ranges.

 Highlights:

 1. doesn't do any memory allocation
 2. can handle arbitrarily large sets of data
 3. it's lazy
 4. takes an InputRange, and outputs an InputRange

 Comments welcome.
I may have misunderstood something, but the code does not implement LZW (a variant of LZ78), but a variant of LZ77 (i.e. deflate/ZIP).
+1. I thought to chime in with this too; the keywords are:
sliding window ===> LZ77
dictionary ===> LZW
 See https://en.wikipedia.org/wiki/LZ77_and_LZ78
-- Dmitry Olshansky
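For contrast with a sliding window, a minimal LZW sketch: there is no window, only a dictionary of phrases that encoder and decoder build identically as they go, so the output is a sequence of dictionary codes. The names (lzwEncode, lzwDecode) are hypothetical. Unlike std.compress, this allocates, and the dictionary grows without bound; real LZW (e.g. in GIF or Unix compress) caps the code width and eventually resets the table:

```d
// Encoder: emits one code per longest phrase already in the dictionary.
uint[] lzwEncode(const(ubyte)[] input)
{
    uint[] output;
    if (input.length == 0)
        return output;
    // Entry "phrase w followed by byte c" is keyed as (w << 8) | c; codes
    // 0..255 implicitly stand for single bytes. The key overflows once codes
    // reach 2^24, so this sketch only suits modest inputs.
    uint[uint] dict;
    uint nextCode = 256;
    uint w = input[0];              // code of the phrase matched so far
    foreach (c; input[1 .. $])
    {
        immutable uint key = (w << 8) | c;
        if (auto p = key in dict)
        {
            w = *p;                 // phrase + c is known: keep extending
        }
        else
        {
            output ~= w;            // emit the longest known phrase
            dict[key] = nextCode++; // learn phrase + c
            w = c;                  // restart from the current byte
        }
    }
    output ~= w;
    return output;
}

// Decoder: rebuilds the same dictionary from the codes alone
// (assumes well-formed input).
ubyte[] lzwDecode(const(uint)[] codes)
{
    ubyte[] output;
    if (codes.length == 0)
        return output;
    ubyte[][] table;                // code -> phrase
    foreach (uint b; 0 .. 256)
        table ~= [cast(ubyte) b];
    ubyte[] prev = table[codes[0]];
    output ~= prev;
    foreach (code; codes[1 .. $])
    {
        // A code may name the entry being defined right now (the "cScSc"
        // case); that phrase is prev plus its own first byte.
        ubyte[] entry = code < table.length ? table[code] : prev ~ prev[0];
        output ~= entry;
        table ~= prev ~ entry[0];
        prev = entry;
    }
    return output;
}
```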
Jun 05 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/5/2013 10:46 AM, Dmitry Olshansky wrote:
 On 05-Jun-2013 16:16, Tiago Martinez wrote:
 I may have misunderstood something, but the code does not
 implement LZW (a variant of LZ78), but a variant of LZ77 (i.e.
 deflate/ZIP).
+1. I thought to chime in with this too; the keywords are:
sliding window ===> LZ77
dictionary ===> LZW
 See https://en.wikipedia.org/wiki/LZ77_and_LZ78
Thanks, you're both right.
Jun 05 2013
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jun 05, 2013 at 04:17:42PM +0200, David Nadlinger wrote:
 On Wednesday, 5 June 2013 at 11:30:10 UTC, John Colvin wrote:
Although I think you're right about having smaller modules, I
generally find it easier to browse through a larger file than many
smaller files.
On the contrary, I find extremely large files (like std.algorithm) very hard to navigate, because it's a hodgepodge of only loosely-related code, most of which is completely independent of the others. Which means there's no logical ordering to the code, they're just in arbitrary random order (and often not the same order they appear in the ddoc index). The only way to find stuff in code like this is to use the search function -- which is no different from looking up a different file in a well-organized module directory hierarchy.
Multiple files is ok if you know what you're looking for (grep) but
when you're just trying to scan across a system to get a feel for how
it's working, juggling many files is a real pita.
Try scanning through std.algorithm and tell me whether you "get a feel for how it's working". I tried doing that before, and got so lost 12% into the file that I've even less clue about how it all fits together than before I looked at the code. After the first 5 seconds or so, I'm just randomly paging up/down without any idea of where I am code-wise.
 Use an editor with a file tree sidebar? Quite on the contrary, I find
 many files to be much preferable, because you automatically have
 "bookmarks" in the source to come back to, and having the
 functionality already grouped in manageable logical units saves you
 from inferring that structure again, as it is the case when scrolling
 through a huge file.
+1.
 On a lighter note, if it's really a problem for you that module
 files are too small, what about just concatenating all the files in
 a given directory using a little shell magic? ;)
cat std/compress/*.d > /tmp/src.d; vim /tmp/src.d

:)

On Wed, Jun 05, 2013 at 04:20:49PM +0200, David Nadlinger wrote:
 On Wednesday, 5 June 2013 at 12:55:50 UTC, Andrei Alexandrescu
 wrote:
On 6/5/13 2:55 AM, Timothee Cour wrote:
What I suggested in my original post didn't involve any
indirection/abstraction; simply a renaming to be consistent with
existing zlib (see my points A+B in my 1st post on this thread):

std.compress.zlib.compress
std.compress.zlib.uncompress
std.compress.lzw.compress
std.compress.lzw.uncompress
I think that's nice.
+1. D has many powerful features for handling module namespacing (e.g. "import lzw = std.compress.lzw"), let's enable people to make use of them.
[...]

+1. Being D's standard library, Phobos really should be the standard example of how module namespacing should work. Right now it's just promulgating the bad practice of throwing a bunch of unrelated (or only loosely related) code into giant monolithic files. C'mon, guys, this isn't 1975. We *have* tools for managing hierarchies of smallish files. There's no compelling reason why we have to stick to monolithic module design (or lack of design thereof) anymore.

The biggest advantage of small modules is that pieces of code that don't depend on each other will not be lumped together in the same file. Why should they be? If you only use function X, why should the compiler do extra, unnecessary work parsing and compiling function Y, just because we arbitrarily lumped X and Y together for aesthetic (or whatever) reasons? Perhaps Phobos will be more palatable to the naysayers if using a single function doesn't, e.g., pull in a 5000-line std.algorithm.

(Actually, std.algorithm currently sits at 11636 lines. I call BS on whoever claims to be able to "skim over" std.algorithm and "get a feel for how it works". Chances are your finger will get so tired of hitting PgDn about 2000 lines into the file that you won't even look at the rest. And most of the code is only superficially related to each other: about 20 functions into the file you'd have lost track of all sense of how things fit together, 'cos they *don't* really fit together! It's the epitome of why we *should* move to smaller modules, rather than the current giant monolithic ones.)

T

-- 
The right half of the brain controls the left half of the body. This means that only left-handed people are in their right mind. -- Manoj Srivastava
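The renamed-import style David mentions can already be tried with the existing std.zlib module. A minimal sketch (zlibRoundTrip is a hypothetical helper): with import zlib = std.zlib, the module's symbols are reachable only as zlib.compress, zlib.uncompress and so on, so a future std.compress.lzw could be imported as lzw alongside it without any name clashes:

```d
// Renamed import: every std.zlib symbol must be qualified as zlib.xxx, so a
// later `import lzw = std.compress.lzw;` could never clash with it.
import zlib = std.zlib;

string zlibRoundTrip(string text)
{
    ubyte[] packed = zlib.compress(text);        // deflate-compressed bytes
    return cast(string) zlib.uncompress(packed); // void[] back to text
}
```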
Jun 05 2013