digitalmars.D - dmdz

Ellery Newcomer (29/29) Mar 11 2010 So I'm toying with a prototype, which is proving nice enough, but there

Andrei Alexandrescu (17/46) Mar 11 2010 To me this looks like a definite V2 thing honed by experience. For now

Ellery Newcomer (24/57) Mar 11 2010 It is. I suppose the name isn't so important, but I really hate zip

Walter Bright (11/15) Mar 11 2010 What I'd like to see is the creation of a library file interface, say:
Nick Sabalausky (20/22) Mar 11 2010 This is a bit of a "vim vs emacs" or "static vs dynamic" sort of issue.

Lars T. Kyllingstad (14/39) Mar 12 2010 I don't really disagree, but it's not always that simple. Take tar, for...
Bernard Helyer (2/9) Mar 12 2010 The right click 'extract here' under GNOME does *exactly* this.

Lutger (4/16) Mar 12 2010 Same under KDE: Dolphin right click 'extract here, autodetect subfolder'...

Chad J (2/20) Mar 13 2010

Bill Baxter (7/12) Mar 12 2010 WinRAR has an option for that if the zip file and the single folder
Ellery Newcomer (5/28) Mar 12 2010 I rarely come across a zip file that doesn't follow that convention, and...

Walter Bright (12/14) Mar 11 2010 How about:
Lars T. Kyllingstad (4/6) Mar 12 2010 Cool! Looking forward to using it. :)

Ellery Newcomer (3/9) Mar 12 2010 I have no idea why it's called dmdz and not zdmd. My guess is so you can...

Ellery Newcomer (12/12) Mar 15 2010 Hello.

Nick Sabalausky (3/15) Mar 15 2010 I'd just require a setting in dmd.conf for that.

Ellery Newcomer (5/26) Mar 15 2010 Of course it turns out to be a screwy zip file. Nevermind..

Ellery Newcomer (11/11) Mar 16 2010 Anyone want to play with dmdz, here it is:

Lutger (3/15) Mar 16 2010 You might like TRAP:
Robert Clipsham (10/21) Mar 16 2010 $SOMECMD
Andrei Alexandrescu (20/31) Mar 16 2010 This is solid work, but I absolutely refuse to believe the solution must...

Ellery Newcomer (25/47) Mar 17 2010 I count 2 modules and about 800 loc. 2 to 300 of which implements

Andrei Alexandrescu (111/160) Mar 17 2010 Thanks for replying to this. I'd been afraid that I was coming off too

Ellery Newcomer (40/88) Mar 17 2010 dang right you are. If you're going to count the antlr runtime, then

Andrei Alexandrescu (28/127) Mar 17 2010 I meant the antlr grammar for the task. I gave two counts, one excluding...

BCS (8/10) Mar 17 2010 The difference in speed between disk IO and CPU /might/ be high enough t...

Andrei Alexandrescu (5/14) Mar 18 2010 That works on zsh, I'm not sure whether it works with other shells.
Walter Bright (4/15) Mar 18 2010 I'd argue that for this case, caching the extracted files is not worth t...

Andrei Alexandrescu (6/21) Mar 18 2010 Of course not, but the typical scenario is to just run a program off its...

Walter Bright (3/4) Mar 18 2010 Caching the executable, sure, but I'm not sure that translates into a ca...

Andrei Alexandrescu (4/8) Mar 18 2010 I see. It should be fine to cache the exe and regenerate only if the

BCS (7/25) Mar 18 2010 The only case I can think of where putting a zip file in the middle of t...

Walter Bright (3/7) Mar 18 2010 It might even be practical to have dmdz compile from a zip file specifie...

Andrei Alexandrescu (3/10) Mar 18 2010 In that case I do think caching would be helpful :o).
Lutger (4/12) Mar 18 2010 Just like dsss did...(and still does for D1 I guess)

Walter Bright (2/6) Mar 18 2010 Anyone can revive it if they're motivated too!

Ellery Newcomer (8/78) Mar 18 2010 Yeah, you're right there.

Robert Clipsham (4/6) Mar 18 2010 That seems like a tad too much for it... Surely it would only take a few...

Ellery Newcomer (4/10) Mar 18 2010 Sure. I could write it in 100 loc. My concern is they would be a buggy

Andrei Alexandrescu (3/15) Mar 18 2010 You could write it in 5 loc.

Andrei Alexandrescu (32/49) Mar 18 2010 My bad for not being able to see that in the code. I read through and

Clemens (2/10) Mar 18 2010 I think it would be a good idea to stay well away from gratuitous portab...

Andrei Alexandrescu (4/18) Mar 18 2010 Yah, I agree. Well `` don't need to be used in the command line, a

Walter Bright (5/11) Mar 18 2010 dmd will already read switches out of a file:
Lionello Lunesu (8/12) Mar 18 2010 and I'm out..

Andrei Alexandrescu (9/21) Mar 18 2010 I looked around. basename and dirname suggest that the ones in phobos

Robert Clipsham (10/13) Mar 18 2010 I'm usually one of those, but seen as you asked... It looks good :) I

Ellery Newcomer (5/18) Mar 18 2010 It would only involve building support for those formats into phobos :)

Andrei Alexandrescu (10/29) Mar 18 2010 Heh, incidentally I just needed a tar reader a few days ago, so I wrote

Walter Bright (10/23) Mar 18 2010 That's great, but I only suggest that this not be added to Phobos until ...

Andrei Alexandrescu (15/39) Mar 18 2010 The archive type should be a D class inheriting ArchiveReader, so no

Walter Bright (16/22) Mar 18 2010 The reasons for reading the file to determine the archive type are:

Andrei Alexandrescu (31/55) Mar 18 2010 It is not necessary, only vital.

Walter Bright (4/6) Mar 18 2010 I understand your point.

Andrei Alexandrescu (7/14) Mar 18 2010 Makes sense.

Walter Bright (8/27) Mar 18 2010 Maybe a better way to do it is to just pass a delegate that encapsulates...

Walter Bright (5/6) Mar 18 2010 Another thing needed for the interface is an associative array that maps...

Andrei Alexandrescu (4/10) Mar 18 2010 Emphatically NO. Archives work with streams. You can build indexing on

Michel Fortin (17/28) Mar 18 2010 Andrei, have you took a look at the Zip file format? It's not streamable...

Andrei Alexandrescu (7/31) Mar 18 2010 Interesting, thank you. I still think generally a random-access

Walter Bright (4/15) Mar 18 2010 Such an interface won't work with .lib or .a archives. Both have an embe...

Andrei Alexandrescu (6/21) Mar 18 2010 Now I understand why linkers thrash the disk.

Walter Bright (8/31) Mar 18 2010 I think this is incorrect. The table of contents in the .lib files was d...

Robert Clipsham (4/5) Mar 18 2010 http://en.wikipedia.org/wiki/Xz - A lot of linux distro's seem to be

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

So I'm toying with a prototype, which is proving nice enough, but there 
be a few things that I'm not quite sure which way to go with.

Currently I have the general pattern

dmdz [global flags] foo1.zip [foo1 local flags] foo2.zip [foo2 local 
flags] ...

although when given multiple zips it just compiles them independently.

My thought was when fooi.zip compiles a lib file, the result should be 
made available to all subsequent zip files, so you could do something like

dmdz lib1.zip lib2.zip main.zip

where lib2 can depend on lib1 and main can depend on either lib. But 
then most if not all of lib1's flags need to be forwarded to lib2 and main.

The other alternative I thought of is all the zip files get extracted 
and then all compiled at once.

Or is multiple zip files even a good idea?



For the more specific case

dmdz [global flags] foo.zip [local flags]

it expects all the relevant content in foo.zip to be located inside 
directory foo, and doesn't extract anything else unless you explicitly 
tell it to.

Also, there can be a file 'cmd' (name?) inside foo.zip which contains 
additional flags for the compile, with local flags overriding global 
flags overriding flags found in cmd. At least for dmdz flags.
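
The precedence described here (local flags override global flags, which override flags read from the file inside the zip) amounts to a layered merge. A minimal sketch, in Python rather than dmdz's own D code; `merge_flags` and the flag names are hypothetical and only illustrate the ordering:

```python
def merge_flags(cmd_file_flags, global_flags, local_flags):
    """Merge three flag layers; later layers win on conflict."""
    merged = {}
    # Lowest priority first, so each later update() overrides earlier values.
    for layer in (cmd_file_flags, global_flags, local_flags):
        merged.update(layer)
    return merged


# Hypothetical dmdz flags, purely for illustration.
from_cmd_file = {"piecemeal": False, "outdir": "build"}
from_global = {"piecemeal": True}
from_local = {"outdir": "bin"}

print(merge_flags(from_cmd_file, from_global, from_local))
```

A flag set in only one layer passes through unchanged, so the file inside the zip can carry project defaults without blocking command-line overrides.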

dmd flags get filtered out and forwarded to dmd.

The current strategy for compiling just involves giving every compilable 
thing extracted to dmd. There's also an option to compile each source 
file separately (which I put in after hitting an odd Out of Memory Error).

Comments?


Also, are there any plans for std.zip, e.g. with regard to ranges, 
input/output streams, etc? The current api seems a smidge spartan.

Mar 11 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 03/11/2010 12:11 PM, Ellery Newcomer wrote:
 So I'm toying with a prototype, which is proving nice enough, but there
 be a few things that I'm not quite sure which way to go with.

I was eagerly waiting for you to get back regarding this project. Thank you!

 Currently I have the general pattern

 dmdz [global flags] foo1.zip [foo1 local flags] foo2.zip [foo2 local
 flags] ...

 although when given multiple zips it just compiles them independently.

 My thought was when fooi.zip compiles a lib file, the result should be
 made available to all subsequent zip files, so you could do something like

 dmdz lib1.zip lib2.zip main.zip

 where lib2 can depend on lib1 and main can depend on either lib. But
 then most if not all of lib1's flags need to be forwarded to lib2 and main.

 The other alternative I thought of is all the zip files get extracted
 and then all compiled at once.

 Or is multiple zip files even a good idea?

To me this looks like a definite V2 thing honed by experience. For now 
the focus is distributing entire programs as one zip file.

 For the more specific case

 dmdz [global flags] foo.zip [local flags]

 it expects all the relevant content in foo.zip to be located inside
 directory foo, and doesn't extract anything else unless you explicitly
 tell it to.

I don't understand this. Does the program foo.zip have to contain an 
actual directory called "foo"? That's a bit restrictive. My initial plan 
revolved around expanding foo.zip somewhere in a unique subdir of the 
temp directory and considering that a full-blown project resides inside 
that subdir.

 Also, there can be a file 'cmd' (name?) inside foo.zip which contains
 additional flags for the compile, with local flags overriding global
 flags overriding flags found in cmd. At least for dmdz flags.

How about dmd.conf?

 dmd flags get filtered out and forwarded to dmd.

 The current strategy for compiling just involves giving every compilable
 thing extracted to dmd. There's also an option to compile each source
 file separately (which I put in after hitting an odd Out of Memory Error).

 Comments?

That sounds about right. One thing I want is to stay reasonably KISS 
(e.g. like rdmd is), i.e. not invent a lot of arcana. rdmd has many 
heuristics and limitations but has the virtue that it gets a specific 
job done without requiring its user to learn most anything. I hope dmdz 
turns out similarly simple.

 Also, are there any plans for std.zip, e.g. with regard to ranges,
 input/output streams, etc? The current api seems a smidge spartan.

I've hoped to rewrite std.zip forever, but found no time to do so.


Andrei

Mar 11 2010

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 03/11/2010 12:29 PM, Andrei Alexandrescu wrote:
 For the more specific case

 dmdz [global flags] foo.zip [local flags]

 it expects all the relevant content in foo.zip to be located inside
 directory foo, and doesn't extract anything else unless you explicitly
 tell it to.

 I don't understand this. Does the program foo.zip have to contain an
 actual directory called "foo"? That's a bit restrictive. My initial plan
 revolved around expanding foo.zip somewhere in a unique subdir of the
 temp directory and considering that a full-blown project resides inside
 that subdir.

It is. I suppose the name isn't so important, but I really hate zip 
files whose contents aren't contained inside a single directory. Also, 
there would be a bit of a dichotomy if

dmdz foo.zip

resulted in a directory 'foo' wherever, but

unzip foo.zip

resulted in what would be the contents of 'foo' above.

Another thing: do you envision this just being a 
build-this-completed-project, or do you see this as an actual 
development tool? Because I've been approaching it more from the latter 
perspective. The zip file is a roadmap: look, all the files you need to 
compile are here, here, here, and here. So use them. Compile.

But if the zip file is a complete project, then you would expect to see 
source code, test code, test data, licenses, documentation, etc, which 
would likely require filtering anyways and possibly multiple compiles 
for different pieces. And you'd expect the result of the compile to end 
up somewhere in the directory you just created.

Alright, I think I'm seeing less and less value in foo.zip/foo as a req.

 Also, there can be a file 'cmd' (name?) inside foo.zip which contains
 additional flags for the compile, with local flags overriding global
 flags overriding flags found in cmd. At least for dmdz flags.

 How about dmd.conf?

Sounds good.

 dmd flags get filtered out and forwarded to dmd.

 The current strategy for compiling just involves giving every compilable
 thing extracted to dmd. There's also an option to compile each source
 file separately (which I put in after hitting an odd Out of Memory
 Error).

 Comments?

 That sounds about right. One thing I want is to stay reasonably KISS
 (e.g. like rdmd is), i.e. not invent a lot of arcana. rdmd has many
 heuristics and limitations but has the virtue that it gets a specific
 job done without requiring its user to learn most anything. I hope dmdz
 turns out similarly simple.

 Also, are there any plans for std.zip, e.g. with regard to ranges,
 input/output streams, etc? The current api seems a smidge spartan.

 I've hoped to rewrite std.zip forever, but found no time to do so.

Well, heck. Maybe I'll see what I can do with it. Do you want it to 
conform to any interface in particular?

Also: test whether a file [path?] is contained within a specific 
directory [path?]. does such functionality exist somewhere in phobos?
 Andrei
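
The containment test asked about above (is this path inside that directory?) can be sketched by resolving both paths and comparing their common prefix. This is a Python illustration only; it says nothing about what Phobos offered at the time, and `is_within` is a hypothetical name:

```python
import os


def is_within(path, directory):
    """Return True if `path` resolves to `directory` or something below it."""
    path = os.path.realpath(path)
    directory = os.path.realpath(directory)
    # commonpath compares whole path components, so "foo" does not match "foobar".
    return os.path.commonpath([path, directory]) == directory


print(is_within("foo/bar/bizz.d", "foo"))
print(is_within("foo/../etc/passwd", "foo"))
```

Resolving first matters for archive extraction: a member named `foo/../etc/passwd` looks like it lives under `foo` textually but escapes it once `..` is applied.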

Mar 11 2010

Walter Bright <newshound1 digitalmars.com> writes:

Ellery Newcomer wrote:
 I've hoped to rewrite std.zip forever, but found no time to do so.

 
 Well, heck. Maybe I'll see what I can do with it. Do you want it to 
 conform to any interface in particular?

What I'd like to see is the creation of a library file interface, say:

    std.archive

and then have implementations of it:

    std.archive.zip
    std.archive.tar
    std.archive.lha
    std.archive.7zip

etc. Pass a file name to a factory method of std.archive, and it figures 
out what kind of archive it is, instantiates the appropriate 
implementation, etc.
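
A factory of this kind can be sketched as a function that inspects the file's contents rather than its extension and returns the matching reader. The sketch below uses Python's standard `zipfile` and `tarfile` modules as stand-ins for the proposed `std.archive.*` implementations; `open_archive` and `member_names` are hypothetical names, not an existing API:

```python
import tarfile
import zipfile


def open_archive(path):
    """Detect the archive type from the file's contents and open it."""
    if zipfile.is_zipfile(path):
        return zipfile.ZipFile(path)
    if tarfile.is_tarfile(path):
        return tarfile.open(path)
    raise ValueError(f"unrecognized archive format: {path}")


def member_names(archive):
    """Uniform member listing across the two archive types handled above."""
    if isinstance(archive, zipfile.ZipFile):
        return archive.namelist()
    return archive.getnames()
```

The caller never names the format; supporting another one means adding a detection branch and a reader, which is the shape the factory proposal describes.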

Mar 11 2010

"Nick Sabalausky" <a a.a> writes:

"Ellery Newcomer" <ellery-newcomer utulsa.edu> wrote in message 
news:hnc4o3$2lms$1 digitalmars.com...
 I suppose the name isn't so important, but I really hate zip files whose 
 contents aren't contained inside a single directory.

This is a bit of a "vim vs emacs" or "static vs dynamic" sort of issue.

Most of the archive programs I've used, including the one I currently use, 
put an "Extract to new directory" option into my file manager's right-click 
menu. I *always* use that, and consider it downright silly not to. But every 
once in a while I'll get an archive that follows the "nothing but one dir" 
convention, so I get a useless extra subfolder that I have to either delete 
or allow it to clutter up my filesystem, and that just irritates the hell 
out of me.

Personally, I'm convinced that any archive program that doesn't allow you to 
automatically create a subfolder by default is a bad archive program. And 
I'm convinced that a convention that places restrictions on the top-level of 
a zip is, well, ridiculous. But obviously there are people that disagree 
with me on that. So, I guess it's a "vim vs emacs" kind of thing.

What I really want is an archive program that automatically makes a 
subfolder by default *but* detects if the top level inside the archive 
contains nothing more than a single folder and intelligently *not* create a 
new folder in that case. But I've yet to see one that does that, and I 
haven't had time to make one.
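
The behaviour wished for here (make a new subfolder unless the archive already consists of exactly one top-level folder) comes down to a check over the member names. A minimal Python sketch; both function names are hypothetical:

```python
import posixpath


def single_top_level_dir(names):
    """Return the sole top-level directory of an archive, or None."""
    tops = set()
    for name in names:
        parts = posixpath.normpath(name).split("/")
        # A bare file at the top level means the archive is not "one dir only".
        if len(parts) == 1 and not name.endswith("/"):
            return None
        tops.add(parts[0])
    return tops.pop() if len(tops) == 1 else None


def extraction_dir(archive_name, names):
    """Pick '.' when the archive already wraps itself, else a new folder."""
    if single_top_level_dir(names) is not None:
        return "."
    return posixpath.splitext(posixpath.basename(archive_name))[0]


print(extraction_dir("foo.zip", ["foo/", "foo/main.d"]))
print(extraction_dir("foo.zip", ["main.d", "util.d"]))
```

This version does not require the inner folder to share the archive's name; adding that comparison would give the stricter variant.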

Mar 11 2010

"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:

Nick Sabalausky wrote:
 "Ellery Newcomer" <ellery-newcomer utulsa.edu> wrote in message 
 news:hnc4o3$2lms$1 digitalmars.com...
 I suppose the name isn't so important, but I really hate zip files whose 
 contents aren't contained inside a single directory.

 
 This is a bit of a "vim vs emacs" or "static vs dynamic" sort of issue.
 
 Most of the archive programs I've used, including the one I currently use, 
 put an "Extract to new directory" option into my file manager's right-click 
 menu. I *always* use that, and consider it downright silly not to. But every 
 once in a while I'll get an archive that follows the "nothing but one dir" 
 convention, so I get a useless extra subfolder that I have to either delete 
 or allow it to clutter up my filesystem, and that just irritates the hell 
 out of me.
 
 Personally, I'm convinced that any archive program that doesn't allow you to 
 automatically create a subfolder by default is a bad archive program. And 
 I'm convinced that a convention that places restrictions on the top-level of 
 a zip is, well, ridiculous. But obviously there are people that disagree 
 with me on that. So, I guess it's a "vim vs emacs" kind of thing.

I don't really disagree, but it's not always that simple.  Take tar, for 
instance, which has been around since forever, and which has a legacy 
you can't drop just like that.  (I wonder if it's even part of the POSIX 
standard?)  There are literally thousands of applications that depend on 
tar working in exactly the same way as it has always done, on all 
systems.  And that way is to automatically extract all files into the 
current directory unless otherwise specified.

As long as tar is the most common archive format on *NIX (and it is, by 
far), one must expect people to be true gentlemen and -women who put 
their files in a subdirectory inside the archive -- i.e. make tarballs 
and not tarbombs. :)


 What I really want is an archive program that automatically makes a 
 subfolder by default *but* detects if the top level inside the archive 
 contains nothing more than a single folder and intelligently *not* create a 
 new folder in that case. But I've yet to see one that does that, and I 
 haven't had time to make one. 

If you do, let me know.  I'd like that too. :)

-Lars

Mar 12 2010

Bernard Helyer <b.helyer gmail.com> writes:

On 12/03/10 18:09, Nick Sabalausky wrote:
 "Ellery Newcomer"<ellery-newcomer utulsa.edu>  wrote in message
 news:hnc4o3$2lms$1 digitalmars.com...

 What I really want is an archive program that automatically makes a
 subfolder by default *but* detects if the top level inside the archive
 contains nothing more than a single folder and intelligently *not* create a
 new folder in that case. But I've yet to see one that does that, and I
 haven't had time to make one.

The right click 'extract here' under GNOME does *exactly* this.

Mar 12 2010

Lutger <lutger.blijdestijn gmail.com> writes:

Bernard Helyer wrote:

 On 12/03/10 18:09, Nick Sabalausky wrote:
 "Ellery Newcomer"<ellery-newcomer utulsa.edu>  wrote in message
 news:hnc4o3$2lms$1 digitalmars.com...

 What I really want is an archive program that automatically makes a
 subfolder by default *but* detects if the top level inside the archive
 contains nothing more than a single folder and intelligently *not* create
 a new folder in that case. But I've yet to see one that does that, and I
 haven't had time to make one.

 
 The right click 'extract here' under GNOME does *exactly* this.

Same under KDE: Dolphin right click 'extract here, autodetect subfolder' 

Perhaps Dolphin will also function under XP, last time I checked KDE was 
still a bit buggy under windows though.

Mar 12 2010

Chad J <chadjoan __spam.is.bad__gmail.com> writes:

Lutger wrote:
 Bernard Helyer wrote:
 
 On 12/03/10 18:09, Nick Sabalausky wrote:
 "Ellery Newcomer"<ellery-newcomer utulsa.edu>  wrote in message
 news:hnc4o3$2lms$1 digitalmars.com...

 What I really want is an archive program that automatically makes a
 subfolder by default *but* detects if the top level inside the archive
 contains nothing more than a single folder and intelligently *not* create
 a new folder in that case. But I've yet to see one that does that, and I
 haven't had time to make one.

 The right click 'extract here' under GNOME does *exactly* this.

 
 Same under KDE: Dolphin right click 'extract here, autodetect subfolder' 

Yes, I love this feature.

 
 Perhaps Dolphin will also function under XP, last time I checked KDE was 
 still a bit buggy under windows though.

Mar 13 2010

Bill Baxter <wbaxter gmail.com> writes:

On Thu, Mar 11, 2010 at 9:09 PM, Nick Sabalausky <a a.a> wrote:

 What I really want is an archive program that automatically makes a
 subfolder by default *but* detects if the top level inside the archive
 contains nothing more than a single folder and intelligently *not* create a
 new folder in that case. But I've yet to see one that does that, and I
 haven't had time to make one.

WinRAR has an option for that if the zip file and the single folder
inside are named the same thing.
So if Foo.zip contains just a top level folder called Foo, then it
just extracts Foo.  Otherwise it makes a "Foo" folder and puts the
contents of Foo.zip into that.

--bb

Mar 12 2010

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 03/11/2010 11:09 PM, Nick Sabalausky wrote:
 "Ellery Newcomer"<ellery-newcomer utulsa.edu>  wrote in message
 news:hnc4o3$2lms$1 digitalmars.com...
 I suppose the name isn't so important, but I really hate zip files whose
 contents aren't contained inside a single directory.

 This is a bit of a "vim vs emacs" or "static vs dynamic" sort of issue.

 Most of the archive programs I've used, including the one I currently use,
 put an "Extract to new directory" option into my file manager's right-click
 menu. I *always* use that, and consider it downright silly not to. But every
 once in a while I'll get an archive that follows the "nothing but one dir"
 convention, so I get a useless extra subfolder that I have to either delete
 or allow it to clutter up my filesystem, and that just irritates the hell
 out of me.

I rarely come across a zip file that doesn't follow that convention, and 
I never extract to new directory, but I do always check the contents of 
the zip file manually.

 Personally, I'm convinced that any archive program that doesn't allow you to
 automatically create a subfolder by default is a bad archive program. And
 I'm convinced that a convention that places restrictions on the top-level of
 a zip is, well, ridiculous. But obviously there are people that disagree
 with me on that. So, I guess it's a "vim vs emacs" kind of thing.

 What I really want is an archive program that automatically makes a
 subfolder by default *but* detects if the top level inside the archive
 contains nothing more than a single folder and intelligently *not* create a
 new folder in that case. But I've yet to see one that does that, and I
 haven't had time to make one.

Yeah, I'm thinking I'm going to do that with dmdz

Mar 12 2010

Walter Bright <newshound1 digitalmars.com> writes:

Ellery Newcomer wrote:
 So I'm toying with a prototype, which is proving nice enough, but there 
 be a few things that I'm not quite sure which way to go with.


How about:

    dmdz ...stuff... foo.zip ...morestuff...

being semantically identical to:

    dmdz ...stuff... (expanded contents of foo.zip) ...morestuff...


In other words, it works just like wildcard expansion:

    dmd ...stuff... *.d ...morestuff...

Just think of foo.zip as a macro that expands to a list of the files 
that are the contents of foo.zip (while ignoring files that are not 
usable as input to dmd).

The neato thing is that, for a user, there's nothing to learn about 
using dmdz.
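
The "zip as a macro" idea can be sketched as a rewrite of the argument list: every zip argument is replaced in place by the names of its members that dmd could take as input, and everything else passes through untouched. This Python sketch only computes the expanded list (a real tool would also have to extract the files), and the extension set is an assumed subset, not dmd's authoritative list:

```python
import zipfile

# Assumed subset of the file types dmd accepts as inputs; adjust as needed.
DMD_INPUT_EXTS = (".d", ".di", ".o", ".obj", ".a", ".lib")


def expand_args(args):
    """Replace each zip argument with the dmd-usable files it contains."""
    expanded = []
    for arg in args:
        if arg.lower().endswith(".zip") and zipfile.is_zipfile(arg):
            with zipfile.ZipFile(arg) as archive:
                expanded.extend(
                    name for name in archive.namelist()
                    if name.lower().endswith(DMD_INPUT_EXTS)
                )
        else:
            # Flags and ordinary source files keep their position.
            expanded.append(arg)
    return expanded
```

Because the expansion happens in place, flags before and after the zip keep their relative order, just as with shell wildcard expansion.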

Mar 11 2010

"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:

Ellery Newcomer wrote:
 So I'm toying with a prototype, which is proving nice enough, but there 
 be a few things that I'm not quite sure which way to go with.

Cool!  Looking forward to using it. :)

But can we please call it zdmd, so there is some consistency with rdmd?

-Lars

Mar 12 2010

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 03/12/2010 06:15 AM, Lars T. Kyllingstad wrote:
 Ellery Newcomer wrote:
 So I'm toying with a prototype, which is proving nice enough, but
 there be a few things that I'm not quite sure which way to go with.

 Cool! Looking forward to using it. :)

 But can we please call it zdmd, so there is some consistency with rdmd?

 -Lars

I have no idea why it's called dmdz and not zdmd. My guess is so you can 
have rdmdz.

Mar 12 2010

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

Hello.

I've run into a problem.

dmd foo/bar/bizz.d

bizz.d:
  module bar.bizz;
  ...

dmd thinks it's looking at module foo.bar.bizz and generally gets 
confused unless supplied with -Ifoo. As a user, I'm not manually 
specifying that -Ifoo. So I need some bare-bones lexing capabilities.

I have an ANTLR lexer grammar, which will do fine, unless the module 
name contains unicode characters.

Any other suggestions?
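
One lightweight alternative to a full lexer is to read only the module declaration and derive the import root by stripping the module's package path from the file path. The Python sketch below is deliberately simplified (it does not skip comments or string literals that might contain the word `module`), and both function names are hypothetical:

```python
import re

# Simplified: ignores comments and string literals before the declaration.
# In Python 3, \w also matches non-ASCII letters, so Unicode names are covered.
MODULE_RE = re.compile(r"^\s*module\s+([\w.]+)\s*;", re.MULTILINE)


def declared_module(source):
    """Return the dotted name from the first `module x.y;` line, or None."""
    match = MODULE_RE.search(source)
    return match.group(1) if match else None


def import_root(path, module_name):
    """Strip the module's package path from the file path to get the -I dir."""
    dirs = path.replace("\\", "/").split("/")[:-1]   # directories only
    packages = module_name.split(".")[:-1]           # packages only
    if packages and dirs[-len(packages):] != packages:
        return None  # file layout does not match the module declaration
    root = dirs[:len(dirs) - len(packages)]
    return "/".join(root) or "."


print(import_root("foo/bar/bizz.d", "bar.bizz"))
```

For the case in this post, `foo/bar/bizz.d` declaring `module bar.bizz;` yields `foo`, which is the directory that would be passed as `-Ifoo`.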

Mar 15 2010

"Nick Sabalausky" <a a.a> writes:

"Ellery Newcomer" <ellery-newcomer utulsa.edu> wrote in message 
news:hnmbkl$2rsj$1 digitalmars.com...
 Hello.

 I've run into a problem.

 dmd foo/bar/bizz.d

 bizz.d:
  module bar.bizz;
  ...

 dmd thinks it's looking at module foo.bar.bizz and generally gets confused 
 unless supplied with -Ifoo. As a user, I'm not manually specifying 
 that -Ifoo. So I need some bare-bones lexing capabilities.

 I have an ANTLR lexer grammar, which will do fine, unless the module name 
 contains unicode characters.

 Any other suggestions?

I'd just require a setting in dmd.conf for that.

Mar 15 2010

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 03/15/2010 10:04 PM, Nick Sabalausky wrote:
 "Ellery Newcomer"<ellery-newcomer utulsa.edu>  wrote in message
 news:hnmbkl$2rsj$1 digitalmars.com...
 Hello.

 I've run into a problem.

 dmd foo/bar/bizz.d

 bizz.d:
   module bar.bizz;
   ...

 dmd thinks it's looking at module foo.bar.bizz and generally gets confused
 unless supplied with -Ifoo. As a user, I'm not manually specifying
 that -Ifoo. So I need some bare-bones lexing capabilities.

 I have an ANTLR lexer grammar, which will do fine, unless the module name
 contains unicode characters.

 Any other suggestions?

 I'd just require a setting in dmd.conf for that.

Of course it turns out to be a screwy zip file. Never mind.

Is dmd.conf really a good name for that file? I'm of the opinion now 
that it isn't, since it isn't the same thing and it does confuse dmd 
when executed in the directory containing it. dmdz.conf?

Mar 15 2010

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

Anyone want to play with dmdz, here it is:

http://personal.utulsa.edu/~ellery-newcomer/dmdz.zip


Haven't tested it much, especially on windows. Don't know what it will 
do with multiple zip files. piecemeal flag doesn't know how to stop when 
you tell it to. dmd's run flag isn't handled correctly (I don't know how 
it's supposed to work).

Does anyone know of a way to tell whether a command in bash or whatever 
segfaults?

And I modified std.path.dirname and std.path.basename, so I just 
included them in dmdz.d.

Otherwise, it should work okay. It can compile itself under 2.040.

Mar 16 2010

Lutger <lutger.blijdestijn gmail.com> writes:

Ellery Newcomer wrote:

 Anyone want to play with dmdz, here it is:
 
 http://personal.utulsa.edu/~ellery-newcomer/dmdz.zip
 
 
 Haven't tested it much, especially on windows. Don't know what it will
 do with multiple zip files. piecemeal flag doesn't know how to stop when
 you tell it to. dmd's run flag isn't handled correctly (I don't know how
 it's supposed to work).
 
 Does anyone know of a way to tell whether a command in bash or whatever
 segfaults?

You might like TRAP:

http://www.davidpashley.com/articles/writing-robust-shell-scripts.html

Mar 16 2010

Robert Clipsham <robert octarineparrot.com> writes:

On 16/03/10 22:55, Ellery Newcomer wrote:
 Anyone want to play with dmdz, here it is:

 http://personal.utulsa.edu/~ellery-newcomer/dmdz.zip


 Haven't tested it much, especially on windows. Don't know what it will
 do with multiple zip files. piecemeal flag doesn't know how to stop when
 you tell it to. dmd's run flag isn't handled correctly (I don't know how
 it's supposed to work).

 Does anyone know of a way to tell whether a command in bash or whatever
 segfaults?

$SOMECMD
if [ $? -eq 139 ]; then
	echo "Segfault: $SOMECMD"
fi


$SOMECMD
if [ $? -ge 1 ]; then
	echo Error
fi
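
The status 139 tested above is the shell's convention of 128 plus the signal number (SIGSEGV is 11 on typical systems). When a tool launches the compiler itself instead of going through a shell, the same information arrives in a different form. A Python sketch of that check on a POSIX system; `run_and_classify` is a hypothetical helper:

```python
import signal
import subprocess
import sys


def run_and_classify(argv):
    """Run a command and report 'ok', 'segfault', or 'error'."""
    code = subprocess.run(argv).returncode
    # POSIX: a negative return code is the number of the fatal signal.
    # 128 + signal number is what a shell reports for the same event.
    if code in (-signal.SIGSEGV, 128 + signal.SIGSEGV):
        return "segfault"
    return "ok" if code == 0 else "error"


print(run_and_classify([sys.executable, "-c", "raise SystemExit(3)"]))
```

Checking both forms covers a child started directly and one wrapped in a shell script that forwards the shell-style status.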

 And I modified std.path.dirname and std.path.basename, so I just
 included them in dmdz.d.

 Otherwise, it should work okay. It can compile itself under 2.040.

Mar 16 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 03/16/2010 05:55 PM, Ellery Newcomer wrote:
 Anyone want to play with dmdz, here it is:

 http://personal.utulsa.edu/~ellery-newcomer/dmdz.zip


 Haven't tested it much, especially on windows. Don't know what it will
 do with multiple zip files. piecemeal flag doesn't know how to stop when
 you tell it to. dmd's run flag isn't handled correctly (I don't know how
 it's supposed to work).

 Does anyone know of a way to tell whether a command in bash or whatever
 segfaults?

 And I modified std.path.dirname and std.path.basename, so I just
 included them in dmdz.d.

 Otherwise, it should work okay. It can compile itself under 2.040.

This is solid work, but I absolutely refuse to believe the solution must 
be as complicated as this. Recall that the baseline is a 30-line 
script. I can't bring myself to believe that a four-module, 
thousand-plus-line solution justifies the added complexity.

Besides, what happened to std.getopt? You don't need to recognize dmd's 
options any more than rdmd does. rdmd dedicates only a few lines to 
argument parsing, dmdz makes it a science.

Don't take this the wrong way, the work is absolutely a tour de force. 
I'm just saying that things could be dramatically simpler with just a 
little loss of features. I'm looking over the code and am puzzled about 
the kind of gunpower that seems to be necessary for achieving the task.

Recall what's needed: someone who is able and willing would like to 
distribute a multi-module solution as a zip file. dmdz must provide a 
means to do so. Simple as that. The "able and willing" part is important 
- you don't need to cope with arbitrarily-formatted archives, you can 
impose on people how the zip must be formatted. If you ask them to 
provide a file called "main.d" in the root of the zip, then so be it if 
it reduces the size of dmdz by a factor of ten.


Andrei

Mar 16 2010

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 03/16/2010 08:13 PM, Andrei Alexandrescu wrote:
 This is solid work, but I absolutely refuse to believe the solution must
 be as complicated as this. Recall that the baseline is a 30-line
 script. I can't bring myself to believe that a four-module,
 thousand-plus-line solution justifies the added complexity.

I count 2 modules and about 800 loc, 200 to 300 of which implement 
functionality which doesn't exist in std.path but should. The ANTLR crap 
could be replaced by a hundred lines of handwritten code, but the 
grammar already existed and took less time.

 Besides, what happened to std.getopt? You don't need to recognize dmd's
 options any more than rdmd does. rdmd dedicates only a few lines to
 argument parsing, dmdz makes it a science.

It started when I said, "huh. when is this thing building an executable, 
and when is it building a library?", and parsing dmd's options seemed 
like the most generally useful way of finding that out. I rather like 
the way it's turned out. eg during development:

$ dmdz dxl.zip -unittest
 ...

$ ./dxl/bin/dxl
 ...

"alright, unittests pass"

$ dmdz dxl.zip
 ...

"now for the release executable"

fwiw, I've never used rdmd due to bug 3860.

 Don't take this the wrong way, the work is absolutely a tour de force.
 I'm just saying that things could be dramatically simpler with just a
 little loss of features. I'm looking over the code and am puzzled about
 the kind of gunpower that seems to be necessary for achieving the task.

Huh. When all you have is a harquebus ..

 Recall what's needed: someone who is able and willing would like to
 distribute a multi-module solution as a zip file. dmdz must provide a
 means to do so. Simple as that. The "able and willing" part is important
 - you don't need to cope with arbitrarily-formatted archives, you can
 impose on people how the zip must be formatted. If you ask them to
 provide a file called "main.d" in the root of the zip, then so be it if
 it reduces the size of dmdz by a factor of ten.


 Andrei

By restricting the format of the zip file a bit and moving the directory 
dmd gets run in, I might save 100 loc. Maybe.

Does adding main.d to root help with the run flag? It doesn't do 
anything for dmdz that I can see.

By introducing path2list et al into std.path or wherever (really, it is 
quite handy) and fixing basename and dirname, I could save 2 - 300 loc.

By removing piecemeal and getting rid of dmd flags, I could quit 2 - 300 
loc plus the ANTLR modules. Except I find both of those features 
occasionally useful. Given the choice, I'd keep them.

Mar 17 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 03/17/2010 03:01 PM, Ellery Newcomer wrote:
 On 03/16/2010 08:13 PM, Andrei Alexandrescu wrote:
 This is solid work, but I absolutely refuse to believe the solution must
 be as complicated as this. Recall that the baseline is a 30-lines
 script. I can't bring myself to believe that a four-modules, over
 thousand lines solution justifies the added complexity.

 I count 2 modules and about 800 loc, 200 to 300 of which implement
 functionality which doesn't exist in std.path but should. The ANTLR crap
 could be replaced by a hundred lines of handwritten code, but the
 grammar already existed and took less time.

Thanks for replying to this. I'd been afraid that I was coming off too 
critical. (I counted the ANTLR files as modules, and I think that's 
fair.) To give you an idea on where I come from, distributing dmdz with 
dmd is also a message to users on how things are getting done in D.

For the problem "Compile a D file and all its dependents, link, and run" 
the solution rdmd has 469 lines. It seems quite much to me, but I 
couldn't find ways to make it much smaller.

For the problem "Given a zip file containing a D program, build it" the 
dmdz solution is quite large. If we count everything:

$ wc --lines dmdz.d import/antlrrt/*.d lexd.g opts.d sed.sh
    782 dmdz.d
    891 import/antlrrt/collections.d
    551 import/antlrrt/exceptions.d
   1253 import/antlrrt/lexing.d
   2085 import/antlrrt/parsing.d
     10 import/antlrrt/runtime.d
    600 import/antlrrt/utils.d
    436 lexd.g
     88 opts.d
     13 sed.sh
   6709 total

Arguably we can discount the import stuff, although I'd already raise 
some objections:

$ wc --lines dmdz.d lexd.g opts.d sed.sh
   782 dmdz.d
   436 lexd.g
    88 opts.d
    13 sed.sh
  1319 total

That would suggest that it's about three times as difficult to build 
stuff present in a zip file than to deduce dependencies and build stuff 
not in a zip file. I find that difficult to swallow because to me 
building stuff in a zip file should  be in some ways easier because 
there are no dependencies to deduce - they can be assumed to be in the 
zip file.

I looked more through the program and it looks like it uses the zip 
library (honestly I would have used system("unzip...")), which does add 
some aggravation for arguably a good reason. (But I also see there's no 
caching, which is an important requirement.)

In my mind it was all about check cache, unzip, and build. True there 
are details such as lib vs. executable that can be messy but I don't 
think anything could blow complexity up too hard.

 Besides, what happened to std.getopt? You don't need to recognize dmd's
 options any more than rdmd does. rdmd dedicates only a few lines to
 argument parsing, dmdz makes it a science.

 It started when I said, "huh. when is this thing building an executable,
 and when is it building a library?", and parsing dmd's options seemed
 like the most generally useful way of finding that out. I rather like
 the way it's turned out. eg during development:

 $ dmdz dxl.zip -unittest
  > ...
 $ ./dxl/bin/dxl
  > ...

 "alright, unittests pass"

 $ dmdz dxl.zip
  > ...

 "now for the release executable"

Nice, but I don't know why you need to understand dmd's flags instead of 
simply forwarding them to dmd. You could define dmdz-specific flags 
which you parse and understand, and then dump everything else to dmd, 
which will figure its own checking and error messages and all that.
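
A minimal sketch of that forwarding idea, assuming (hypothetically) that 
dmdz reserves a "--dmdz-" prefix for its own switches; neither the prefix 
nor the split_args name comes from dmdz or rdmd. Everything without the 
prefix is handed on untouched, so dmd does its own checking:

```sh
# Split a command line into dmdz's own switches and the ones to forward to dmd.
# The "--dmdz-" prefix is a made-up convention for this sketch.
split_args() {
    own=""
    fwd=""
    for arg in "$@"; do
        case "$arg" in
            --dmdz-*) own="$own $arg" ;;   # understood by the wrapper itself
            *)        fwd="$fwd $arg" ;;   # passed through to dmd unchanged
        esac
    done
    printf 'dmdz:%s\ndmd:%s\n' "$own" "$fwd"
}
```

Collecting the words in a plain string keeps the sketch short but would 
break on arguments containing spaces; a real tool would want an array or 
a re-quoting step.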

 fwiw, I've never used rdmd due to bug 3860.

I didn't mean you to use it as much as look through it for examples of 
patterns that may be useful to dmdz (such as the one above).

 Don't take this the wrong way, the work is absolutely a tour de force.
 I'm just saying that things could be dramatically simpler with just a
 little loss of features. I'm looking over the code and am puzzled about
 the kind of gunpower that seems to be necessary for achieving the task.

 Huh. When all you have is a harquebus ..

Hehe :o). Well definitely you need to submit your stdlib additions to 
e.g. bugzilla.

 Recall what's needed: someone who is able and willing would like to
 distribute a multi-module solution as a zip file. dmdz must provide a
 means to do so. Simple as that. The "able and willing" part is important
 - you don't need to cope with arbitrarily-formatted archives, you can
 impose on people how the zip must be formatted. If you ask them to
 provide a file called "main.d" in the root of the zip, then so be it if
 it reduces the size of dmdz by a factor of ten.

 Andrei

 By restricting the format of the zip file a bit and moving the directory
 dmd gets run in, I might save 100 loc. Maybe.

 Does adding main.d to root help with the run flag? It doesn't do
 anything for dmdz that I can see.

 By introducing path2list et al into std.path or wherever (really, it is
 quite handy) and fixing basename and dirname, I could save 2 - 300 loc.

 By removing piecemeal and getting rid of dmd flags, I could quit 2 - 300
 loc plus the ANTLR modules. Except I find both of those features
 occasionally useful. Given the choice, I'd keep them.

I think it would be great to remove all stuff that's not necessary. I 
paste at the end of this message my two baselines: a shell script and a 
D program. They compare poorly with your program, but are extremely 
simple. I think it may be useful to see how much impact each feature 
that these programs lack is adding size to your solution.

Andrei

#!/bin/bash

EXTENSIONS=(d di a o)

ZIP=$1

TGT=/tmp/$ZIP

BIN=${ZIP/.zip/}

if [[ ! -f $ZIP ]]; then
     echo "Zip file missing: \`$ZIP'" >&2
     echo "Usage: dmdz file.zip" >&2
     exit 1
fi

if [[ ! -d $TGT ]] || [[ $ZIP -nt $TGT ]]; then
     mkdir --parents $TGT
     unzip $ZIP -d $TGT >/dev/null
fi

FIND="find . -type f -false "
for EXT in "${EXTENSIONS[@]}"; do
     FIND="$FIND -or -iname '*.$EXT'"
done
(cd $TGT && dmd -of$BIN `eval $FIND`)

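// Imports assumed for the Phobos of the time (March 2010)
import std.date, std.file, std.process, std.stdio, std.string;

// Accepted extensions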
auto extensions = [ "d", "di", "a", "o" ];

int main(string[] args) {
     // The one and only parameter is the zip file
     auto zip = args.length > 1 ? args[1] : "";
     if (!exists(zip)) {
         stderr.writeln("Zip file missing: `", zip, "'");
         stderr.writeln("Usage: dmdz file.zip");
         return 1;
     }
     // Target directory
     auto tgt = "/tmp/" ~ zip;
     // Binary result is the name of the zip without the .zip
     auto bin = replace(zip, ".zip", "");

     // Was the zip file already extracted? If not, extract it
     if (lastModified(zip) >= lastModified(tgt, d_time.min)) {
         system("mkdir --parents " ~ tgt);
         system("unzip " ~ zip ~ " -d " ~ tgt ~ " >/dev/null");
     }

     // Compile all files with accepted extensions
     auto find = "find . -type f -false ";
     foreach (ext; extensions) {
         find ~= " -or -iname '*." ~ ext ~ "'";
     }
     return system("cd " ~ tgt ~ " && dmd -of" ~ bin
         ~ " `eval " ~ find ~ "`");
}

Mar 17 2010

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 03/17/2010 03:53 PM, Andrei Alexandrescu wrote:
 Thanks for replying to this. I'd been afraid that I was coming off too
 critical. (I counted the ANTLR files as modules, and I think that's
 fair.) To give you an idea on where I come from, distributing dmdz with
 dmd is also a message to users on how things are getting done in D.

dang right you are. If you're going to count the antlr runtime, then 
maybe you should also be counting druntime and the sections of phobos 
that I used?

 For the problem "Compile a D file and all its dependents, link, and run"
 the solution rdmd has 469 lines. It seems quite much to me, but I
 couldn't find ways to make it much smaller.

user wouldn't know that from any dmd distribution I've ever seen.

 For the problem "Given a zip file containing a D program, build it" the
 dmdz solution is quite large. If we count everything:

 $ wc --lines dmdz.d import/antlrrt/*.d lexd.g opts.d sed.sh
 782 dmdz.d
 891 import/antlrrt/collections.d
 551 import/antlrrt/exceptions.d
 1253 import/antlrrt/lexing.d
 2085 import/antlrrt/parsing.d
 10 import/antlrrt/runtime.d
 600 import/antlrrt/utils.d
 436 lexd.g
 88 opts.d
 13 sed.sh
 6709 total

forgot generated/*.d

should bump it up to 11 or 12 k.

 Arguably we can discount the import stuff, although I'd already raise
 some objections:

 $ wc --lines dmdz.d lexd.g opts.d sed.sh
 782 dmdz.d
 436 lexd.g
 88 opts.d
 13 sed.sh
 1319 total

lexd.g and sed.sh are only there for reference. I hate it when machine 
generated source code is in a project, but the source grammar isn't.

 That would suggest that it's about three times as difficult to build
 stuff present in a zip file than to deduce dependencies and build stuff
 not in a zip file. I find that difficult to swallow because to me
 building stuff in a zip file should be in some ways easier because there
 are no dependencies to deduce - they can be assumed to be in the zip file.

 I looked more through the program and it looks like it uses the zip
 library (honestly I would have used system("unzip...")), which does add
 some aggravation for arguably a good reason. (But I also see there's no
 caching, which is an important requirement.)

eh?

 Nice, but I don't know why you need to understand dmd's flags instead of
 simply forwarding them to dmd. You could define dmdz-specific flags
 which you parse and understand, and then dump everything else to dmd,
 which will figure its own checking and error messages and all that.

filtering out flags that screw things up for the build in question; 
knowing where the resultant executable is supposed to be;

 I think it would be great to remove all stuff that's not necessary. I
 paste at the end of this message my two baselines: a shell script and a
 D program. They compare poorly with your program, but are extremely
 simple. I think it may be useful to see how much impact each feature
 that these programs lack is adding size to your solution.


 Andrei

You come at this problem like "It should be an eloquent showcase of what 
D has to offer."

I come at it like "I want this to be generally useful. To me."

In my opinion, how well it works trumps how many lines of code it took 
to write. But for the aforementioned bug, I never would have looked at 
rdmd's source, and even then I didn't notice how many lines of code it 
was. The way dmdz was written is based on the needs that presented 
themselves to me at the time. So far I've run it against three different 
projects and I'm happy with it the way it's turned out.

1. dmdz

toy example. not much here.

2. dexcelapi

port of jexcelapi, 90k loc (that thing must have shrunk when I wasn't 
looking, I was sure it was 200k), ~ 400 source files. Big. Dumping 
everything to dmd is easy enough to implement one way or another, but 
when I hit an Out of Memory Error I need what -piecemeal has to offer. I 
found the offending file (still don't know what's up with it), commented 
it out, and I can dump everything to dmd again. Without it, I probably 
would have given up on D for another year and a half.

3. dcrypt

Today, I wanted to play with it, so I checked it out, popped dmdz.conf 
and a main.d in the directory and zipped the whole thing up.

dmdz dcrypt.zip

It worked. Without me doing anything to dmdz or dcrypt (except adding a 
string alias, &&^%^ tango).
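
The per-file fallback described for dexcelapi can be sketched as below. 
This is only a guess at what a -piecemeal style mode boils down to, not 
dmdz's actual code: it assumes every source file can be compiled on its 
own with dmd's -c (compile only) and -o- (write no object file) switches. 
The DMD variable and the find_offenders name are hypothetical.

```sh
# Compile each source file separately and print the ones the compiler rejects.
# DMD can be overridden so the loop can be tried without a compiler installed.
find_offenders() {
    for src in "$@"; do
        "${DMD:-dmd}" -c -o- "$src" >/dev/null 2>&1 || printf '%s\n' "$src"
    done
}
```

Something like "find_offenders src/*.d" would then list only the files 
that fail (or exhaust memory) when compiled alone, which is enough to 
find the one to comment out.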


I was kind of hoping others would try it and give their opinions, but 
apparently nobody else cares. Or they're on vacation, like I should be. 
Or they're giving the infamous 'silent approval'. Who knows.

Mar 17 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 03/17/2010 08:17 PM, Ellery Newcomer wrote:
 On 03/17/2010 03:53 PM, Andrei Alexandrescu wrote:
 Thanks for replying to this. I'd been afraid that I was coming off too
 critical. (I counted the ANTLR files as modules, and I think that's
 fair.) To give you an idea on where I come from, distributing dmdz with
 dmd is also a message to users on how things are getting done in D.

 dang right you are. If you're going to count the antlr runtime, then
 maybe you should also be counting druntime and the sections of phobos
 that I used?

I meant the antlr grammar for the task. I gave two counts, one excluding 
the antlr runtime, and based the rest of my discussion on that. I sadly 
note the irony. There is no need to get defensive, really.

 For the problem "Compile a D file and all its dependents, link, and run"
 the solution rdmd has 469 lines. It seems quite much to me, but I
 couldn't find ways to make it much smaller.

 user wouldn't know that from any dmd distribution I've ever seen.

 For the problem "Given a zip file containing a D program, build it" the
 dmdz solution is quite large. If we count everything:

 $ wc --lines dmdz.d import/antlrrt/*.d lexd.g opts.d sed.sh
 782 dmdz.d
 891 import/antlrrt/collections.d
 551 import/antlrrt/exceptions.d
 1253 import/antlrrt/lexing.d
 2085 import/antlrrt/parsing.d
 10 import/antlrrt/runtime.d
 600 import/antlrrt/utils.d
 436 lexd.g
 88 opts.d
 13 sed.sh
 6709 total

 forgot generated/*.d

Well that's generated. I counted what's needed to get things going. 
Unless you meant that ironically...

 should bump it up to 11 or 12 k.

 Arguably we can discount the import stuff, although I'd already raise
 some objections:

 $ wc --lines dmdz.d lexd.g opts.d sed.sh
 782 dmdz.d
 436 lexd.g
 88 opts.d
 13 sed.sh
 1319 total

 lexd.g and sed.sh are only there for reference. I hate it when machine
 generated source code is in a project, but the source grammar isn't.

My understanding is that lexd.g is your code so it should be included in 
the size of the solution, whereas the generated code should not.

 That would suggest that it's about three times as difficult to build
 stuff present in a zip file than to deduce dependencies and build stuff
 not in a zip file. I find that difficult to swallow because to me
 building stuff in a zip file should be in some ways easier because there
 are no dependencies to deduce - they can be assumed to be in the zip
 file.

 I looked more through the program and it looks like it uses the zip
 library (honestly I would have used system("unzip...")), which does add
 some aggravation for arguably a good reason. (But I also see there's no
 caching, which is an important requirement.)

 eh?

The idea is to not extract the files every time you build. If they are 
in place already, the tool should recognize that.

 Nice, but I don't know why you need to understand dmd's flags instead of
 simply forwarding them to dmd. You could define dmdz-specific flags
 which you parse and understand, and then dump everything else to dmd,
 which will figure its own checking and error messages and all that.

 filtering out flags that screw things up for the build in question;
 knowing where the resultant executable is supposed to be;

 I think it would be great to remove all stuff that's not necessary. I
 paste at the end of this message my two baselines: a shell script and a
 D program. They compare poorly with your program, but are extremely
 simple. I think it may be useful to see how much impact each feature
 that these programs lack is adding size to your solution.


 Andrei

 You come at this problem like "It should be an eloquent showcase of what
 D has to offer."

 I come at it like "I want this to be generally useful. To me."

The tool shouldn't be a showcase. Obviously the primary purpose is for 
the tool to be useful. The shell script and the D script are useful. I 
am sure your tool is useful, but I think it doesn't hit the right 
balance. I simply don't think it takes that much code to achieve what 
the tool needs to achieve.

 In my opinion, how well it works trumps how many lines of code it took
 to write. But for the aforementioned bug, I never would have looked at
 rdmd's source, and even then I didn't notice how many lines of code it
 was. The way dmdz was written is based on the needs that presented
 themselves to me at the time. So far I've run it against three different
 projects and I'm happy with it the way it's turned out.

 1. dmdz

 toy example. not much here.

 2. dexcelapi

 port of jexcelapi, 90k loc (that thing must have shrunk when I wasn't
 looking, I was sure it was 200k), ~ 400 source files. Big. Dumping
 everything to dmd is easy enough to implement one way or another, but
 when I hit an Out of Memory Error I need what -piecemeal has to offer. I
 found the offending file (still don't know what's up with it), commented
 it out, and I can dump everything to dmd again. Without it, I probably
 would have given up on D for another year and a half.

 3. dcrypt

 Today, I wanted to play with it, so I checked it out, popped dmdz.conf
 and a main.d in the directory and zipped the whole thing up.

 dmdz dcrypt.zip

 It worked. Without me doing anything to dmdz or dcrypt (except adding a
 string alias, &&^%^ tango).

I'm not contending the tool is not useful. I'm just saying it is too big 
for what it does, and that that does matter with regard to distributing 
it with dmd.

 I was kind of hoping others would try it and give their opinions, but
 apparently nobody else cares. Or they're on vacation, like I should be.
 Or they're giving the infamous 'silent approval'. Who knows.

It looks like we're getting into a little diatribe, which is very sad 
because you've clearly done a good amount of work and I didn't intend to 
make it look any other way. All I can say is that the tool is very far 
removed from what I think it should look like; for my money, the moment 
it gets larger than one simple module it would mean I took a few wrong 
turns along the way.

BTW Walter made a very nice suggestion: make a .zip file in the command 
line be equivalent to listing all files in that zip in the command line. 
I think it's this kind of idea that greatly simplifies things.
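
A rough sketch of what that expansion could look like in a wrapper 
script, assuming Info-ZIP's unzip is installed ("unzip -Z1" prints the 
member names one per line). The list_archive and expand_args names are 
invented for illustration, and the sketch only rewrites the argument 
list; the files would still have to be extracted before dmd could read 
them.

```sh
# Print the member names of a zip archive, one per line.
list_archive() {
    unzip -Z1 "$1"
}

# Replace every *.zip argument with the D sources it contains;
# all other arguments are printed unchanged, one per line.
expand_args() {
    for arg in "$@"; do
        case "$arg" in
            *.zip) list_archive "$arg" | grep -E '\.(d|di)$' ;;
            *)     printf '%s\n' "$arg" ;;
        esac
    done
}
```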


Andrei

Mar 17 2010

BCS <none anon.com> writes:

Hello Andrei,

 The idea is to not extract the files every time you build. If they are
 in place already, the tool should recognize that.

The difference in speed between disk IO and CPU /might/ be high enough that 
(unless the uncompressed file is cached or you round trip it back to the 
disk) reading from the zip may be faster. I know that on linux there is a 
way to pass a stream as a file name (I forget what happens under the hood, 
but bash uses the ">(cmd)" syntax to do it) so you could work with that.


-- 
... <IXOYE><

Mar 17 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 03/17/2010 10:30 PM, BCS wrote:
 Hello Andrei,

 The idea is to not extract the files every time you build. If they are
 in place already, the tool should recognize that.

 The difference in speed between disk IO and CPU /might/ be high enough
 that (unless the uncompressed file is cached or you round trip it back
 to the disk) reading from the zip may be faster. I know that on linux
 there is a way to pass a stream as a file name (I forget what happens
 under the hood, but bash uses the ">(cmd)" syntax to do it) so you could
 work with that.

That works on zsh, I'm not sure whether it works with other shells. 
Also, dmd refuses to compile such streams because they don't end in .d. 
The file must be written to the file system, so caching would always help.

Andrei

Mar 18 2010

Walter Bright <newshound1 digitalmars.com> writes:

BCS wrote:
 Hello Andrei,
 
 The idea is to not extract the files every time you build. If they are
 in place already, the tool should recognize that.

 
 The difference in speed between disk IO and CPU /might/ be high enough 
 that (unless the uncompressed file is cached or you round trip it back 
 to the disk) reading from the zip may be faster. I know that on linux 
 there is a way to pass a stream as a file name (I forget what happens 
 under the hood, but bash uses the ">(cmd)" syntax to do it) so you could 
 work with that.


I'd argue that for this case, caching the extracted files is not worth the 
effort, complexity, or speed. If you're in an edit/compile/debug loop, I can't 
see working off of a zip file of the sources.

Mar 18 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 03/18/2010 02:36 PM, Walter Bright wrote:
 BCS wrote:
 Hello Andrei,

 The idea is to not extract the files every time you build. If they are
 in place already, the tool should recognize that.

 The difference in speed between disk IO and CPU /might/ be high enough
 that (unless the uncompressed file is cached or you round trip it back
 to the disk) reading from the zip may be faster. I know that on linux
 there is a way to pass a stream as a file name (I forget what happens
 under the hood, but bash uses the ">(cmd)" syntax to do it) so you
 could work with that.


 I'd argue that for this case, caching the extracted files is not worth
 the effort, complexity, or speed. If you're in an edit/compile/debug
 loop, I can't see working off of a zip file of the sources.

Of course not, but the typical scenario is to just run a program off its 
.zip file every so often. In that case, extraction makes for an 
unpleasant latency.

FWIW, for rdmd caching makes a big, big difference.


Andrei

Mar 18 2010

Walter Bright <newshound1 digitalmars.com> writes:

Andrei Alexandrescu wrote:
 FWIW, for rdmd caching makes a big, big difference.

Caching the executable, sure, but I'm not sure that translates into a case for 
caching the intermediate files (i.e. the extracted source).

Mar 18 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 03/18/2010 04:14 PM, Walter Bright wrote:
 Andrei Alexandrescu wrote:
 FWIW, for rdmd caching makes a big, big difference.

 Caching the executable, sure, but I'm not sure that translates into a
 case for caching the intermediate files (i.e. the extracted source).

I see. It should be fine to cache the exe and regenerate only if the 
archive is newer.
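
As a sketch, that check is a single timestamp comparison. The 
needs_rebuild name is made up here, and the function assumes the cached 
executable sits at a known path:

```sh
# Succeed (exit status 0) when the executable is missing or older than the zip.
needs_rebuild() {
    zip=$1
    exe=$2
    # find prints the zip's path only if it was modified after the executable
    [ ! -e "$exe" ] || [ -n "$(find "$zip" -newer "$exe")" ]
}
```

A wrapper would then extract and compile only inside 
"if needs_rebuild prog.zip prog; then ... fi" and run the cached 
executable otherwise.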

Andrei

Mar 18 2010

BCS <none anon.com> writes:

Hello Walter,

 BCS wrote:
 
 Hello Andrei,
 
 The idea is to not extract the files every time you build. If they
 are in place already, the tool should recognize that.
 

 The difference in speed between disk IO and CPU /might/ be high
 enough that (unless the uncompressed file is cached or you round trip
 it back to the disk) reading from the zip may be faster. I know that
 on linux there is a way to pass a stream as a file name (I forget
 what happens under the hood, but bash uses the ">(cmd)" syntax to do
 it) so you could work with that.
 

 I'd argue that for this case, caching the extracted files is not worth
 the effort, complexity, or speed. If you're in an edit/compile/debug
 loop, I can't see working off of a zip file of the sources.
 

The only case I can think of where putting a zip file in the middle of that 
loop is even remotely reasonable would be for a remote build farm. The other 
use cases for build-from-zip are building someone else's code where you aren't 
editing the parts in the zip file.

-- 
... <IXOYE><

Mar 18 2010

Walter Bright <newshound1 digitalmars.com> writes:

BCS wrote:
 The only case I can think of where putting a zip file in the middle of 
 that loop is even remotely reasonable would be for a remote build farm. 
 The other use cases for build-from-zip are building someone else's code 
 where you aren't editing the parts in the zip file.

It might even be practical to have dmdz compile from a zip file specified by a 
URL! That would be cool.

Mar 18 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 03/18/2010 04:15 PM, Walter Bright wrote:
 BCS wrote:
 The only case I can think of where putting a zip file in the middle of
 that loop is even remotely reasonable would be for a remote build
 farm. The other use cases for build-from-zip are building someone
 else's code where you aren't editing the parts in the zip file.

 It might even be practical to have dmdz compile from a zip file
 specified by a URL! That would be cool.

In that case I do think caching would be helpful :o).
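
One way such a download cache might look, purely as an illustration: the 
cache_path and fetch_zip names, the file naming scheme, and the use of 
curl are all assumptions, and nothing here checks whether the remote 
file has changed since it was fetched.

```sh
# Map a URL to a stable file name under the temp directory.
cache_path() {
    key=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
    printf '%s/dmdz-%s.zip\n' "${TMPDIR:-/tmp}" "$key"
}

# Download the zip only if it is not cached yet, then print its local path.
fetch_zip() {
    dest=$(cache_path "$1")
    [ -f "$dest" ] || curl -fsSL -o "$dest" "$1" || return 1
    printf '%s\n' "$dest"
}
```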

Andrei

Mar 18 2010

Lutger <lutger.blijdestijn gmail.com> writes:

Walter Bright wrote:

 BCS wrote:
 The only case I can think of where putting a zip file in the middle of
 that loop is even remotely reasonable would be for a remote build farm.
 The other use cases for build-from-zip are building someone else's code
 where you aren't editing the parts in the zip file.

 
 It might even be practical to have dmdz compile from a zip file specified
 by a URL! That would be cool.

Just like dsss did...(and still does for D1 I guess)

I like dmdz and rdmd, but it's a pity dsss isn't revived yet. I still really 
miss it, always thought it would become the ruby gems / CPAN of D.

Mar 18 2010

Walter Bright <newshound1 digitalmars.com> writes:

Lutger wrote:
 Just like dsss did...(and still does for D1 I guess)
 
 I like dmdz and rdmd, but it's a pity dsss isn't revived yet. I still really 
 miss it, always thought it would become the ruby gems / CPAN of D. 

Anyone can revive it if they're motivated to!

Mar 18 2010

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 03/17/2010 08:49 PM, Andrei Alexandrescu wrote:
 Well that's generated. I counted what's needed to get things going.
 Unless you meant that ironically...

Yes I was speaking in jest up to this point.

 should bump it up to 11 or 12 k.

 Arguably we can discount the import stuff, although I'd already raise
 some objections:

 $ wc --lines dmdz.d lexd.g opts.d sed.sh
 782 dmdz.d
 436 lexd.g
 88 opts.d
 13 sed.sh
 1319 total

 lexd.g and sed.sh are only there for reference. I hate it when machine
 generated source code is in a project, but the source grammar isn't.

 My understanding is that lexd.g is your code so it should be included in
 the size of the solution, whereas the generated code should not.

Yeah, you're right there.

 That would suggest that it's about three times as difficult to build
 stuff present in a zip file than to deduce dependencies and build stuff
 not in a zip file. I find that difficult to swallow because to me
 building stuff in a zip file should be in some ways easier because there
 are no dependencies to deduce - they can be assumed to be in the zip
 file.

 I looked more through the program and it looks like it uses the zip
 library (honestly I would have used system("unzip...")), which does add
 some aggravation for arguably a good reason. (But I also see there's no
 caching, which is an important requirement.)

 eh?

 The idea is to not extract the files every time you build. If they are
 in place already, the tool should recognize that.

It does that, but on a per-file basis.

 Nice, but I don't know why you need to understand dmd's flags instead of
 simply forwarding them to dmd. You could define dmdz-specific flags
 which you parse and understand, and then dump everything else to dmd,
 which will figure its own checking and error messages and all that.

 filtering out flags that screw things up for the build in question;
 knowing where the resultant executable is supposed to be;

 I think it would be great to remove all stuff that's not necessary. I
 paste at the end of this message my two baselines: a shell script and a
 D program. They compare poorly with your program, but are extremely
 simple. I think it may be useful to see how much impact each feature
 that these programs lack is adding size to your solution.


 Andrei

 You come at this problem like "It should be an eloquent showcase of what
 D has to offer."

 I come at it like "I want this to be generally useful. To me."

 The tool shouldn't be a showcase. Obviously the primary purpose is for
 the tool to be useful. The shell script and the D script are useful. I
 am sure your tool is useful, but I think it doesn't hit the right
 balance. I simply don't think it takes that much code to achieve what
 the tool needs to achieve.

All right. I'll try cutting things out and see where I end up.

 I'm not contending the tool is not useful. I'm just saying it is too big
 for what it does, and that that does matter with regard to distributing
 it with dmd.

I still don't see why (other than lexd.g adds ~ 10k loc just to get the 
line 'module foo.bar;' out of a source file)

 BTW Walter made a very nice suggestion: make a .zip file in the command
 line be equivalent to listing all files in that zip in the command line.
 I think it's this kind of idea that greatly simplifies things.


 Andrei

Fair enough.

Mar 18 2010

Robert Clipsham <robert octarineparrot.com> writes:

On 18/03/10 16:28, Ellery Newcomer wrote:
 I still don't see why (other than lexd.g adds ~ 10k loc just to get the
 line 'module foo.bar;' out of a source file)

That seems like a tad too much for it... Surely it would only take a few 
(here meaning far less than 10k) lines to parse away comments/whitespace 
at the start of the file then read the module declaration if there is one?

Mar 18 2010

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 03/18/2010 11:36 AM, Robert Clipsham wrote:
 On 18/03/10 16:28, Ellery Newcomer wrote:
 I still don't see why (other than lexd.g adds ~ 10k loc just to get the
 line 'module foo.bar;' out of a source file)

 That seems like a tad too much for it... Surely it would only take a few
 (here meaning far less than 10k) lines to parse away comments/whitespace
 at the start of the file then read the module declaration if there is one?

Sure. I could write it in 100 loc. My concern is they would be a buggy 
100 loc that would take a good deal of effort to get right. lexd.g 
already existed and has been pretty heavily tested.

Mar 18 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 03/18/2010 11:48 AM, Ellery Newcomer wrote:
 On 03/18/2010 11:36 AM, Robert Clipsham wrote:
 On 18/03/10 16:28, Ellery Newcomer wrote:
 I still don't see why (other than lexd.g adds ~ 10k loc just to get the
 line 'module foo.bar;' out of a source file)

 That seems like a tad too much for it... Surely it would only take a few
 (here meaning far less than 10k) lines to parse away comments/whitespace
 at the start of the file then read the module declaration if there is
 one?

 Sure. I could write it in 100 loc. My concern is they would be a buggy
 100 loc that would take a good deal of effort to get right. lexd.g
 already existed and has been pretty heavily tested.

You could write it in 5 loc.

Andrei

Mar 18 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 03/18/2010 11:28 AM, Ellery Newcomer wrote:
 On 03/17/2010 08:49 PM, Andrei Alexandrescu wrote:
 The idea is to not extract the files every time you build. If they are
 in place already, the tool should recognize that.

 It does that, but on a per-file basis.

My bad for not being able to see that in the code. I read through and 
also searched for "cache", "date", "time"... couldn't find it. I now 
find it by looking for "last".

 I'm not contending the tool is not useful. I'm just saying it is too big
 for what it does, and that that does matter with regard to distributing
 it with dmd.

 I still don't see why (other than lexd.g adds ~ 10k loc just to get the
 line 'module foo.bar;' out of a source file)

If a casual user downloads the dmd distro and says, hey, let me see how 
this rdmd tool is implemented, I wouldn't be afraid. If they take a look 
at dmdz, they may be daunted.

The example you gave is perfect. Right now rdmd runs dmd -v to figure 
out dependencies, but before it was parsing the file for lines that 
begin with "import". That was problematic, so I'm glad I now use the 
compiler. Your task is much simpler - nothing is allowed before the 
module line aside from the shebang line and comments, and you should 
feel free to restrict modules to e.g. not include recursive comments or 
anything that aggravates your job.
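
A minimal sketch of the restricted scan described above, written in Python only for brevity (the real tool would be D). It assumes that only whitespace, a shebang line, `//` comments, and non-nesting `/* */` comments may precede the module line; nesting `/+ +/` comments are deliberately unsupported, in line with the suggestion to restrict what is accepted. The function name `find_module_name` is hypothetical.

```python
import re

# Things allowed before the module line: whitespace, // comments, /* */ comments.
_SKIP = re.compile(r"\s+|//[^\n]*|/\*.*?\*/", re.DOTALL)
# 'module a.b.c;' with optional whitespace around the dots.
_DECL = re.compile(r"module\s+([A-Za-z_]\w*(?:\s*\.\s*[A-Za-z_]\w*)*)\s*;")

def find_module_name(source):
    """Return the declared module name, or None if the file has no module line."""
    pos = 0
    if source.startswith("#!"):            # skip an optional shebang line
        newline = source.find("\n")
        pos = len(source) if newline == -1 else newline + 1
    while True:                            # skip leading comments and whitespace
        skipped = _SKIP.match(source, pos)
        if not skipped:
            break
        pos = skipped.end()
    decl = _DECL.match(source, pos)
    return re.sub(r"\s+", "", decl.group(1)) if decl else None
```

Anything that does not fit these assumptions (for example a nesting comment before the module line) simply yields `None`, which the caller can treat as "no module declaration".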

So, I'm very glad you mentioned it: 10K of code to detect "module" is 
absolute overkill. I now confess that I couldn't figure out why you 
needed the lexer for dmdz and didn't have the time to sift through the 
code and figure that out. I thought there must be some solid reason, and 
so I was ashamed to even ask. I did know you want to find "module", but 
in my naivete, I wasn't thinking that just that would ever inspire you 
to include a lexer.

To be frank, I even think you shouldn't worry at all about "module". 
Just extract the blessed thing with caching and call it a day. I was 
also thinking of simplifying options etc. by requiring a file 
"dmdflags.txt" in the archive and then do this when you run dmd:

dmd `cat dmdflags.txt` stuff morestuff andsomemorestuff

i.e. simply expand the file in the command line. No need for any 
extravaganza. But even dmdflags.txt I'd think would be a bit much. And 
speaking of cmdline stuff, assume find, zip, etc. are present on the 
host system if you need them.

 BTW Walter made a very nice suggestion: make a .zip file in the command
 line be equivalent to listing all files in that zip in the command line.
 I think it's this kind of idea that greatly simplifies things.


 Andrei

 Fair enough.

Thank you for considering changing your program.


Andrei

Mar 18 2010

Clemens <eriatarka84 gmail.com> writes:

Andrei Alexandrescu Wrote:

 To be frank, I even think you shouldn't worry at all about "module". 
 Just extract the blessed thing with caching and call it a day. I was 
 also thinking of simplifying options etc. by requiring a file 
 "dmdflags.txt" in the archive and then do this when you run dmd:
 
 dmd `cat dmdflags.txt` stuff morestuff andsomemorestuff
 
 i.e. simply expand the file in the command line.

I think it would be a good idea to stay well away from gratuitous portability
barriers like this or that system("unzip") suggestion if the portable
alternative isn't too much more work. I don't see why you wouldn't want this
thing to work on Windows too.

Mar 18 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 03/18/2010 12:28 PM, Clemens wrote:
 Andrei Alexandrescu Wrote:

 To be frank, I even think you shouldn't worry at all about
 "module". Just extract the blessed thing with caching and call it a
 day. I was also thinking of simplifying options etc. by requiring a
 file "dmdflags.txt" in the archive and then do this when you run
 dmd:

 dmd `cat dmdflags.txt` stuff morestuff andsomemorestuff

 i.e. simply expand the file in the command line.

 I think it would be a good idea to stay well away from gratuitous
 portability barriers like this or that system("unzip") suggestion if
 the portable alternative isn't too much more work. I don't see why
 you wouldn't want this thing to work on Windows too.

Yah, I agree. Well `` don't need to be used in the command line, a 
std.file.readText("dmdflags") should suffice.

Andrei

Mar 18 2010

Walter Bright <newshound1 digitalmars.com> writes:

Andrei Alexandrescu wrote:
 To be frank, I even think you shouldn't worry at all about "module". 
 Just extract the blessed thing with caching and call it a day. I was 
 also thinking of simplifying options etc. by requiring a file 
 "dmdflags.txt" in the archive and then do this when you run dmd:
 
 dmd `cat dmdflags.txt` stuff morestuff andsomemorestuff

dmd will already read switches out of a file:

    dmd @cmdfile ...

So there's no need to parse the command file or do any shell expansion on it. 
Just pass it, and precede it with an @.
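
The two portable options raised in this exchange can be sketched as follows, in Python for brevity. The helper names are hypothetical, and the file name `dmdflags.txt` is the one proposed earlier in the thread. The first helper reads and splits the flags in-process instead of relying on shell backticks; the second leaves the parsing to dmd by passing the file as a response file.

```python
import shlex
from pathlib import Path

def split_flags(text):
    """Split the contents of a flags file into separate arguments."""
    return shlex.split(text)

def expand_flags(flags_file, sources):
    """In-process replacement for `cat dmdflags.txt`: no shell needed."""
    return ["dmd", *split_flags(Path(flags_file).read_text()), *sources]

def response_file_command(flags_file, sources):
    """Let dmd read the switches itself by prefixing the file name with @."""
    return ["dmd", "@" + str(flags_file), *sources]
```

Either list can be handed to a process-spawning call directly, so nothing depends on `cat` or on shell quoting rules, and the same code works on Windows.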

Mar 18 2010

Lionello Lunesu <lio lunesu.remove.com> writes:

On 19-3-2010 1:18, Andrei Alexandrescu wrote:
 i.e. simply expand the file in the command line. No need for any
 extravaganza. But even dmdflags.txt I'd think would be a bit much. And
 speaking of cmdline stuff, assume find, zip, etc. are present on the
 host system if you need them.

and I'm out..

I'm using Windows and don't have any of those (well, I have MS's
FIND.EXE but that has nothing in common with posix')

Anyway, Ellery is right: general stuff that dmdz needs could probably be
moved into Phobos at some point. As for "module", couldn't dmd include
an option to output these, similar to the way it outputs deps?

L.

Mar 18 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 03/18/2010 06:43 PM, Lionello Lunesu wrote:
 On 19-3-2010 1:18, Andrei Alexandrescu wrote:
 i.e. simply expand the file in the command line. No need for any
 extravaganza. But even dmdflags.txt I'd think would be a bit much. And
 speaking of cmdline stuff, assume find, zip, etc. are present on the
 host system if you need them.

 and I'm out..

 I'm using Windows and don't have any of those (well, I have MS's
 FIND.EXE but that has nothing in common with posix')

You're right.

 Anyway, Ellery is right: general stuff that dmdz needs could probably be
 moved into Phobos at some point.

I looked around. basename and dirname suggest that the ones in phobos 
have issues (what are those?), and some other functions rely on 
path2list which I'd hope to replace with a range so as to not allocate 
memory without necessity.

 As for "module", couldn't dmd include
 an option to output these, similar to the way it outputs deps?

I think that would be a natural thing to ask for. Until then I don't 
think there's a real need for supporting module declarations in dmdz.


Andrei

Mar 18 2010

Robert Clipsham <robert octarineparrot.com> writes:

On 18/03/10 01:17, Ellery Newcomer wrote:
 I was kind of hoping others would try it and give their opinions, but
 apparently nobody else cares. Or they're on vacation, like I should be.
 Or they're giving the infamous 'silent approval'. Who knows.

I'm usually one of those, but seen as you asked... It looks good :) I 
haven't had chance to try it yet, but a simple tool like this could be 
really useful. I don't have the same reservations as Andrei about the 
amount of code/how it's done... If it does its job it's good enough for 
me :)

One thing I would like to know, are there plans for file formats other 
than .zip? You can generally get files less than half the size with 
faster compression/decompression times using other formats... would 
adding support for them (.tar.xz, .tar.gz) be too much extra hassle?

Mar 18 2010

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 03/18/2010 11:32 AM, Robert Clipsham wrote:
 On 18/03/10 01:17, Ellery Newcomer wrote:
 I was kind of hoping others would try it and give their opinions, but
 apparently nobody else cares. Or they're on vacation, like I should be.
 Or they're giving the infamous 'silent approval'. Who knows.

 I'm usually one of those, but seen as you asked... It looks good :) I
 haven't had chance to try it yet, but a simple tool like this could be
 really useful. I don't have the same reservations as Andrei about the
 amount of code/how it's done... If it does its job it's good enough for
 me :)

 One thing I would like to know, are there plans for file formats other
 than .zip? You can generally get files less than half the size with
 faster compression/decompression times using other formats... would
 adding support for them (.tar.xz, .tar.gz) be too much extra hassle?

It would only involve building support for those formats into phobos :)

I actually had the same thought after I saw Walter's suggestion for a 
std.archive. If I have time, I'd like to make it happen.

Wait, what's tar.xz?

Mar 18 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 03/18/2010 11:39 AM, Ellery Newcomer wrote:
 On 03/18/2010 11:32 AM, Robert Clipsham wrote:
 On 18/03/10 01:17, Ellery Newcomer wrote:
 I was kind of hoping others would try it and give their opinions, but
 apparently nobody else cares. Or they're on vacation, like I should be.
 Or they're giving the infamous 'silent approval'. Who knows.

 I'm usually one of those, but seen as you asked... It looks good :) I
 haven't had chance to try it yet, but a simple tool like this could be
 really useful. I don't have the same reservations as Andrei about the
 amount of code/how it's done... If it does its job it's good enough for
 me :)

 One thing I would like to know, are there plans for file formats other
 than .zip? You can generally get files less than half the size with
 faster compression/decompression times using other formats... would
 adding support for them (.tar.xz, .tar.gz) be too much extra hassle?

 It would only involve building support for those formats into phobos :)

 I actually had the same thought after I saw Walter's suggestion for a
 std.archive. If I have time, I'd like to make it happen.

Heh, incidentally I just needed a tar reader a few days ago, so I wrote 
an embryo of a base class etc. I'll add it soon.

The basic interface is:

(a) open the archive

(b) get an input range for it. The range iterates over archive entries.

(c) You can look at archive info, and if you want to extract you can get 
a .byChunk() range to extract it. That's also an input range.

For now I'm only concerned with reading... writing needs to be added.
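
As a rough illustration of steps (a) to (c), here is a sketch in Python using the standard `tarfile` module in streaming mode. It is not the Phobos design; `entries` and `by_chunk` are hypothetical names standing in for the entry range and the `.byChunk()` range.

```python
import io
import tarfile

def by_chunk(member, size=4096):
    """(c) Yield the contents of one entry in fixed-size blocks."""
    while True:
        block = member.read(size)
        if not block:
            return
        yield block

def entries(stream):
    """(a) Open the archive and (b) iterate over its entries, front to back."""
    # Mode "r|*" reads sequentially and never seeks, so pipes work too.
    with tarfile.open(fileobj=stream, mode="r|*") as archive:
        for info in archive:
            if info.isfile():
                yield info, by_chunk(archive.extractfile(info))

# Usage: build a one-file tar in memory, then read it back sequentially.
demo = io.BytesIO()
with tarfile.open(fileobj=demo, mode="w") as out:
    payload = b"hello world"
    header = tarfile.TarInfo("a.txt")
    header.size = len(payload)
    out.addfile(header, io.BytesIO(payload))
demo.seek(0)
listing = [(info.name, b"".join(chunks)) for info, chunks in entries(demo)]
```

Because the stream is never rewound, each entry's chunks have to be consumed before moving on to the next entry, which is exactly the input-range discipline.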


Andrei

Mar 18 2010

Walter Bright <newshound1 digitalmars.com> writes:

Andrei Alexandrescu wrote:
 Heh, incidentally I just needed a tar reader a few days ago, so I wrote 
 an embryo of a base class etc. I'll add it soon.
 
 The basic interface is:
 
 (a) open the archive
 
 (b) get an input range for it. The range iterates over archive entries.
 
 (c) You can look at archive info, and if you want to extract you can get 
 a .byChunk() range to extract it. That's also an input range.
 
 For now I'm only concerned with reading... writing needs to be added.

That's great, but I only suggest that this not be added to Phobos until a 
generic archive interface is also added. That way, we can constantly add 
support for new archive formats without requiring users to change their code.

Some suggestions for that:

1. The archive type should be represented by a string literal, not an enum. 
This way, users can add other archive types without having to touch the Phobos 
source code.

2. The reader should auto-detect the archive type based on the file contents, 
not the file name, and then call the appropriate factory method.

Mar 18 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 03/18/2010 02:49 PM, Walter Bright wrote:
 Andrei Alexandrescu wrote:
 Heh, incidentally I just needed a tar reader a few days ago, so I
 wrote an embryo of a base class etc. I'll add it soon.

 The basic interface is:

 (a) open the archive

 (b) get an input range for it. The range iterates over archive entries.

 (c) You can look at archive info, and if you want to extract you can
 get a .byChunk() range to extract it. That's also an input range.

 For now I'm only concerned with reading... writing needs to be added.

 That's great, but I only suggest that this not be added to Phobos until
 a generic archive interface is also added. That way, we can constantly
 add support for new archive formats without requiring users to change
 their code.

Yah.

 Some suggestions for that:

 1. The archive type should be represented by a string literal, not an
 enum. This way, users can add other archive types without having to
 touch the Phobos source code.
 2. The reader should auto-detect the archive type based on the file
 contents, not the file name, and then call the appropriate factory method.

The archive type should be a D class inheriting ArchiveReader, so no 
enum and no string need be involved. The rest is a matter of registry - 
a new archiver registers itself into a database of archivers that maps 
file header data to (pointers to) factory methods. Typical file 
extensions should help, too, because they'd ease matching.

Reading the file header (e.g. first 512 bytes) and then matching against 
archive signatures is, I think, a very nice touch. (I was only thinking 
of matching by file name.) There is a mild complication - you can't 
close and reopen the archive, so you need to pass those 512 bytes to the 
archiver along with the rest of the stream. This is because the stream 
may not be rewindable, as is the case with pipes.
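
A sketch of that registry idea, in Python for brevity. All names here (`register`, `open_archive`, `PrefixedStream`) are hypothetical, and the factories are placeholders that just tag the stream. The magic numbers are the well-known ones: gzip starts with `1f 8b`, and POSIX tar has `ustar` at offset 257. The header bytes are read once and replayed in front of the remaining stream, so nothing is ever rewound.

```python
import io

HEADER_SIZE = 512
_registry = []                      # (offset, magic bytes, factory) triples

def register(offset, magic, factory):
    """Called by each archiver to announce its signature and factory."""
    _registry.append((offset, magic, factory))

class PrefixedStream:
    """Replays the already-consumed header, then continues with the stream."""
    def __init__(self, head, rest):
        self._head, self._rest = head, rest

    def read(self, n=-1):
        if n is None or n < 0:
            data, self._head = self._head, b""
            return data + self._rest.read()
        data, self._head = self._head[:n], self._head[n:]
        if len(data) < n:
            data += self._rest.read(n - len(data))
        return data

def open_archive(stream):
    # Assumption: one read returns the whole header; a pipe may need a loop.
    head = stream.read(HEADER_SIZE)
    for offset, magic, factory in _registry:
        if head[offset:offset + len(magic)] == magic:
            return factory(PrefixedStream(head, stream))
    raise ValueError("unrecognized archive format")

# Placeholder factories; real ones would construct an archive reader.
register(0, b"\x1f\x8b", lambda s: ("gzip", s))
register(257, b"ustar", lambda s: ("tar", s))
```

A new format only has to call `register`; callers of `open_archive` do not change.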

Sounds great!


Andrei

Mar 18 2010

Walter Bright <newshound1 digitalmars.com> writes:

Andrei Alexandrescu wrote:
 Reading the file header (e.g. first 512 bytes) and then matching against 
 archive signatures is, I think, a very nice touch. (I was only thinking 
 of matching by file name.) There is a mild complication - you can't 
 close and reopen the archive, so you need to pass those 512 bytes to the 
 archiver along with the rest of the stream. This is because the stream 
 may not be rewindable, as is the case with pipes.

The reasons for reading the file to determine the archive type are:

1. Files sometimes lose their extensions when being transferred around. I 
sometimes have this problem when downloading files from the internet - Windows 
will store it without an extension.

2. Sometimes I have to remove the extension when sending a file via email, as 
stupid email readers block certain email messages based on file attachment 
extensions.

3. People don't always put the right extension onto the file.

4. Passing an archive of one type to a reader for another type causes the 
reader to crash (yes, I know, readers should be more robust that way, but 
reality is reality).


Is it really necessary to support streaming archives? The reason I ask is we 
can nicely separate building/reading archives from file I/O. The archives can 
be entirely done in memory. Perhaps if an archive is being streamed, the 
program can simply accumulate it all in memory, then call the archive library 
functions.

Mar 18 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 03/18/2010 03:11 PM, Walter Bright wrote:
 Andrei Alexandrescu wrote:
 Reading the file header (e.g. first 512 bytes) and then matching
 against archive signatures is, I think, a very nice touch. (I was only
 thinking of matching by file name.) There is a mild complication - you
 can't close and reopen the archive, so you need to pass those 512
 bytes to the archiver along with the rest of the stream. This is
 because the stream may not be rewindable, as is the case with pipes.

 The reasons for reading the file to determine the archive type are:

 1. Files sometimes lose their extensions when being transferred around.
 I sometimes have this problem when downloading files from the internet -
 Windows will store it without an extension.

 2. Sometimes I have to remove the extension when sending a file via
 email, as stupid email readers block certain email messages based on
 file attachment extensions.

 3. People don't always put the right extension onto the file.

 4. Passing an archive of one type to a reader for another type causes
 the reader to crash (yes, I know, readers should be more robust that
 way, but reality is reality).

Makes sense.

 Is it really necessary to support streaming archives?

It is not necessary, only vital.

 The reason I ask
 is we can nicely separate building/reading archives from file I/O. The
 archives can be entirely done in memory. Perhaps if an archive is being
 streamed, the program can simply accumulate it all in memory, then call
 the archive library functions.

This is completely nonscalable! 90% of all my archive manipulation 
involves streaming, and I wouldn't dream of thinking of loading most of 
those files in RAM. They are huge!

I paste from a script I'm working on right now:

     if [[ ! -f $D/sentences.num.gz ]]; then

         ./txt2num.d $D/voc.txt \
             < <(pv $D/sentences.txt.gz | gunzip) \
             > >(gzip >$D/sentences.num.tmp.gz)
         mv $D/sentences.num.tmp.gz $D/sentences.num.gz
     fi

That takes a good amount of time to run because the .gz involved is 
2,180,367,456 bytes _after_ compression. Note how zipping is done both 
ways - on reading and writing.

It would be great if we all went to the utmost possible lengths to 
distance ourselves from such nonscalable thinking. It's the root reason 
for which the wc sample program on digitalmars.com is _inappropriate_ 
and _damaging_ to the reputation of the language, and also the reason 
for which hash tables' implementation performs so poorly on large data - 
i.e., exactly when it matters. It's the kind of thinking stemming from 
"But I don't have _one_ file larger than 1GB anywhere on my hard drive!" 
which you repeatedly claimed as if it were a solid argument. Well if you 
don't have one you better get some.

Nobody's going to give us a cookie if we process 50KB files 10 times 
faster than Perl or Python. Where it does matter is large data, and I'd 
be in a much better mood if I didn't feel my beard growing while I'm 
waiting next to a program that uses hashes to build a large index file.


Andrei

Mar 18 2010

Walter Bright <newshound1 digitalmars.com> writes:

Andrei Alexandrescu wrote:
 Is it really necessary to support streaming archives?

 It is not necessary, only vital.

I understand your point.

But I still would like a way to build and read archives entirely in memory. One 
reason is that's how dmd is able to generate libraries so quickly.

Mar 18 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 03/18/2010 04:22 PM, Walter Bright wrote:
 Andrei Alexandrescu wrote:
 Is it really necessary to support streaming archives?

 It is not necessary, only vital.

 I understand your point.

 But I still would like a way to build and read archives entirely in
 memory. One reason is that's how dmd is able to generate libraries so
 quickly.

Makes sense.

(On the read side, reading in memory is not a problem if reading from a 
stream is defined - just use the streaming interface to load stuff in 
memory. For the writing part we need the mythical streaming abstraction 
that replaces current streams...)

Andrei

Mar 18 2010

Walter Bright <newshound1 digitalmars.com> writes:

Andrei Alexandrescu wrote:
 On 03/18/2010 04:22 PM, Walter Bright wrote:
 Andrei Alexandrescu wrote:
 Is it really necessary to support streaming archives?

 It is not necessary, only vital.

 I understand your point.

 But I still would like a way to build and read archives entirely in
 memory. One reason is that's how dmd is able to generate libraries so
 quickly.

 
 Makes sense.
 
 (On the read side, reading in memory is not a problem if reading from a 
 stream is defined - just use the streaming interface to load stuff in 
 memory. For the writing part we need the mythical streaming abstraction 
 that replaces current streams...)
 
 Andrei

Maybe a better way to do it is to just pass a delegate that encapsulates a 
reader, and a delegate for the writing. That way, both streams and in-memory 
buffers will work with the same interface, and the archiver need know nothing 
about streams or memory.

Some default delegates can be provided that interface to streams, files, and 
memory buffers.

Or maybe just pass a range!
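
A small sketch of the delegate idea, in Python for brevity, where plain callables play the role of D delegates. The function `gunzip` is hypothetical; it uses the standard `zlib` module in gzip mode. The decompressor only ever sees a `read` callable and a `write` callable, so it cannot tell whether it is attached to a file, a pipe, or a memory buffer.

```python
import gzip
import io
import zlib

def gunzip(read, write, chunk_size=64 * 1024):
    """Decompress gzip data pulled from `read(n)` and pushed to `write(b)`."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    while True:
        block = read(chunk_size)
        if not block:
            break
        write(decoder.decompress(block))
    write(decoder.flush())

# Usage: an in-memory source and an in-memory sink.
source = io.BytesIO(gzip.compress(b"abc" * 1000))
parts = []
gunzip(source.read, parts.append)
result = b"".join(parts)
```

Passing `sys.stdin.buffer.read` and `sys.stdout.buffer.write` instead would give the streaming behaviour with the same function, and memory use stays bounded by `chunk_size` plus the decompressor's window.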

Mar 18 2010

Walter Bright <newshound1 digitalmars.com> writes:

Andrei Alexandrescu wrote:
 The basic interface is:

Another thing needed for the interface is an associative array that maps a 
string to a member of the archive. Object code libraries do this (the string is 
the unresolved symbol's name, the member is of course the corresponding object 
file).

Mar 18 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 03/18/2010 05:11 PM, Walter Bright wrote:
 Andrei Alexandrescu wrote:
 The basic interface is:

 Another thing needed for the interface is an associative array that maps
 a string to a member of the archive. Object code libraries do this (the
 string is the unresolved symbol's name, the member is of course the
 corresponding object file).

Emphatically NO. Archives work with streams. You can build indexing on 
top of them.

Andrei

Mar 18 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-03-18 18:17:26 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 On 03/18/2010 05:11 PM, Walter Bright wrote:
 Andrei Alexandrescu wrote:
 The basic interface is:

 
 Another thing needed for the interface is an associative array that maps
 a string to a member of the archive. Object code libraries do this (the
 string is the unresolved symbol's name, the member is of course the
 corresponding object file).

 
 Emphatically NO. Archives work with streams. You can build indexing on 
 top of them.

Andrei, have you taken a look at the Zip file format? It's not streamable.

To be exact, zip is not streamable because you need to read the central 
directory at the end of the archive to get the actual file list. This 
has its benefits: it makes it easy to peek at the content without 
loading everything, and it makes it possible to completely change the 
archive's logical content just by appending to the file. It's like a 
mini-database in a way.
<http://en.wikipedia.org/wiki/ZIP_(file_format)#Technical_information>

I agree it is essential to have streaming support for archive formats 
that work with streaming. But offering only that is not a solution for 
archives in general.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Mar 18 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 03/18/2010 05:32 PM, Michel Fortin wrote:
 On 2010-03-18 18:17:26 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 On 03/18/2010 05:11 PM, Walter Bright wrote:
 Andrei Alexandrescu wrote:
 The basic interface is:

 Another thing needed for the interface is an associative array that maps
 a string to a member of the archive. Object code libraries do this (the
 string is the unresolved symbol's name, the member is of course the
 corresponding object file).

 Emphatically NO. Archives work with streams. You can build indexing on
 top of them.

 Andrei, have you taken a look at the Zip file format? It's not streamable.

 To be exact, zip is not streamable because you need to read the central
 directory at the end of the archive to get the actual file list. This
 has its benefits: it makes it easy to peek at the content without
 loading everything, and it makes it possible to completely change the
 archive's logical content just by appending to the file. It's like a
 mini-database in a way.
 <http://en.wikipedia.org/wiki/ZIP_(file_format)#Technical_information>

 I agree it is essential to have streaming support for archive formats
 that work with streaming. But offering only that is not a solution for
 archives in general.

Interesting, thank you. I still think generally a random-access 
interface is not the charter of the Archive interface. A zip archive 
should open the archive, seek to the end of it once, build an index, and 
then rewind the file for sequential access. But we shouldn't ask for 
such miracles from all archives.
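
The "seek to the end once" step can be sketched like this, in Python for brevity. The function name `read_eocd` is hypothetical. It locates the zip end-of-central-directory record (signature `PK\x05\x06`, 22 fixed bytes plus an optional comment of up to 65535 bytes) and then rewinds. Zip64 archives are not handled, and a comment that happens to contain the signature bytes could fool the backward search.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"
EOCD_MIN = 22            # size of the record without its trailing comment
MAX_COMMENT = 0xFFFF

def read_eocd(f):
    """Return (entry count, directory size, directory offset), then rewind."""
    f.seek(0, io.SEEK_END)
    size = f.tell()
    tail_len = min(size, EOCD_MIN + MAX_COMMENT)
    f.seek(size - tail_len)
    tail = f.read(tail_len)
    at = tail.rfind(EOCD_SIG)
    if at < 0 or len(tail) - at < EOCD_MIN:
        raise ValueError("no end-of-central-directory record found")
    # signature, 4 x uint16, 2 x uint32, comment length (uint16), little-endian
    fields = struct.unpack("<4s4H2LH", tail[at:at + EOCD_MIN])
    total_entries, cd_size, cd_offset = fields[4], fields[5], fields[6]
    f.seek(0)            # rewind for sequential access
    return total_entries, cd_size, cd_offset

# Usage: a two-file archive built in memory.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as z:
    z.writestr("a.txt", "alpha")
    z.writestr("b.txt", "beta")
count, cd_size, cd_offset = read_eocd(demo)
```

The index itself would then be built by reading `cd_size` bytes at `cd_offset`. This only works on seekable input, which is the point being made about zip.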

Andrei

Mar 18 2010

Walter Bright <newshound1 digitalmars.com> writes:

Andrei Alexandrescu wrote:
 On 03/18/2010 05:11 PM, Walter Bright wrote:
 Andrei Alexandrescu wrote:
 The basic interface is:

 Another thing needed for the interface is an associative array that maps
 a string to a member of the archive. Object code libraries do this (the
 string is the unresolved symbol's name, the member is of course the
 corresponding object file).

 
 Emphatically NO. Archives work with streams. You can build indexing on 
 top of them.

Such an interface won't work with .lib or .a archives. Both have an embedded 
table of contents that is such an associative array - it's not a list of file 
names, either, that's separate.

Mar 18 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 03/18/2010 06:00 PM, Walter Bright wrote:
 Andrei Alexandrescu wrote:
 On 03/18/2010 05:11 PM, Walter Bright wrote:
 Andrei Alexandrescu wrote:
 The basic interface is:

 Another thing needed for the interface is an associative array that maps
 a string to a member of the archive. Object code libraries do this (the
 string is the unresolved symbol's name, the member is of course the
 corresponding object file).

 Emphatically NO. Archives work with streams. You can build indexing on
 top of them.

 Such an interface won't work with .lib or .a archives. Both have an
 embedded table of contents that is such an associative array - it's not
 a list of file names, either, that's separate.

Now I understand why linkers thrash the disk.

Anyway, my point is: indexing the archive should not be part of the 
basic interface. Such capabilities should be in an enhanced interface 
that builds upon the basic one.
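
One way to picture that layering, sketched in Python with the standard `tarfile` module: a single sequential pass builds a name-to-location map, and random access is a separate helper on top of it. The names `build_index` and `read_member` are hypothetical, and the sketch assumes an uncompressed, seekable tar file.

```python
import io
import tarfile

def build_index(f):
    """Basic interface: one sequential pass that records where each file is."""
    index = {}
    with tarfile.open(fileobj=f, mode="r:") as archive:
        for info in archive:
            if info.isfile():
                index[info.name] = (info.offset_data, info.size)
    return index

def read_member(f, index, name):
    """Enhanced interface: random access by name using the index."""
    offset, size = index[name]
    f.seek(offset)
    return f.read(size)

# Usage: two members, fetched out of order.
demo = io.BytesIO()
with tarfile.open(fileobj=demo, mode="w") as out:
    for name, data in (("one.o", b"first"), ("two.o", b"second")):
        header = tarfile.TarInfo(name)
        header.size = len(data)
        out.addfile(header, io.BytesIO(data))
demo.seek(0)
index = build_index(demo)
second = read_member(demo, index, "two.o")
```

Formats with an embedded table of contents could fill the same map from that table instead of from a full pass, without changing `read_member`.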

Andrei

Mar 18 2010

Walter Bright <newshound1 digitalmars.com> writes:

Andrei Alexandrescu wrote:
 On 03/18/2010 06:00 PM, Walter Bright wrote:
 Andrei Alexandrescu wrote:
 On 03/18/2010 05:11 PM, Walter Bright wrote:
 Andrei Alexandrescu wrote:
 The basic interface is:

 Another thing needed for the interface is an associative array that 
 maps
 a string to a member of the archive. Object code libraries do this (the
 string is the unresolved symbol's name, the member is of course the
 corresponding object file).

 Emphatically NO. Archives work with streams. You can build indexing on
 top of them.

 Such an interface won't work with .lib or .a archives. Both have an
 embedded table of contents that is such an associative array - it's not
 a list of file names, either, that's separate.

 
 Now I understand why linkers thrash the disk.

I think this is incorrect. The table of contents in the .lib files was 
designed to work with a floppy disk system, and to minimize the number of 
disk reads. The design of .a libraries is equivalent.

The thrashing of linkers came about on limited memory systems as the linker's 
in-memory data set often exceeded physical ram. A typical linker run also 
simply needs to read a lot of files.


 Anyway, my point is: indexing the archive should not be part of the 
 basic interface. Such capabilities should be in an enhanced interface 
 that builds upon the basic one.

That would be fine.

Mar 18 2010

Robert Clipsham <robert octarineparrot.com> writes:

On 18/03/10 16:39, Ellery Newcomer wrote:
 Wait, what's tar.xz?

http://en.wikipedia.org/wiki/Xz - A lot of Linux distros seem to be 
moving to it for packaging from .tar.gz. I'm on Arch Linux, and the 
updates are a fraction of the size they used to be :)

Mar 18 2010
