digitalmars.D - Is D right for me?

reply "Gour D." <gour atmarama.net> writes:

Hello!

I'd like some help to resolve the subject's question...

Let me start with my background... I finished Computer Engineering
studies and during that period was introduced to several programming
languages (including some assembly stuff), starting with Fortran & C, a
little Prolog, and did my thesis by programming a simulator for coloured
Petri nets using Zortech C++ - that was in 1990 :-)

I also played with Smalltalk a bit... I mention this only to emphasize
that D would not be my 1st programming language.

Later, my life moved away from programming into a totally different area
and today programming is not my occupation, but should serve my hobby
projects.

After my 'programming come-back', I explored Ruby a bit, briefly checked
OCaml (did not like the syntax) and, wanting something 'new', I ended up
tinkering with Haskell.

For my web purposes I wanted to use Django, but realized that, not
being a web developer, there is no sense in writing so much code when I
can achieve the same by learning some PHP (which I do these days) in
order to tweak and/or write some missing module for my preferred CMS
(Made Simple).

Similarly, to create invoices for our startup 'company', I abandoned
GNUCash which requires tinkering with Scheme (Guile) and decided to
use SimpleInvoices (PHP & MySQL web app).

The keyword here is 'pragmatism', iow. understanding that one needs to
make some compromises in order to "get job done".

Now, we're back at D...Saw a Reddit thread yesterday which inspired
me to think (once again) about Haskell vs D...

So, we want a general programming language to work on our open-source
(we plan GPL) hobby project, which is a desktop GUI application and,
besides the need to develop several libs for it, it needs to use a C lib
(Swiss Ephemeris).

Are there other alternatives?

Well, I do not like Java or VMs (Scala included); I want something
'modern' to avoid manual memory management, pointers etc.,
higher-level...which eliminates C(++).

Scripting languages (Perl, Python) are too slow and I'm aware of some
projects from the same domain which switched from Python to C++.

I'm not interested in the LISP family a la Clojure, nor inspired by
C#, Go...

We want to develop on Linux (running x86_64 i7 cpu) and have the app
working on Mac and possibly Windoze.

For a long time I was thinking about gtk2hs bindings, but since the Mac
platform became important for us (supervisor of the project recently
switched to it), I abandoned GTK. I was even advised by one dev
working on the GTK Mac port that wx(haskell) is a better solution if we
target Mac.

However, we would like to write a kind of 'desktop-lite' app, hence the
idea to use Qt & MeeGo was born, since there is no wxQt port.

I was sorry to discover yesterday that QtD project is suspended. :-(

So, let's recap in regards to Haskell vs D.

a) I like Haskell syntax, its type-system, purity and the concept of
separating pure code from non-pure (e.g. IO), HOF. Community is very
friendly and growing (1st time I visited #haskell it had <100
users, today probably >600), there are a lot of packages available on
Hackage, GHC is going strong, Cabal build system is nice,
QuickCheck...

Otoh, many libs/packages are not adequately documented, there is a joke
that one needs a PhD to use the language, there are a lot of papers but
with strong influence from academia, and one has to encounter a lot of
terminology from category theory etc. although maybe wanting to 'just
get the job done' - iow, Haskell could become more pragmatic. Moreover,
to get better performance, laziness with its non-determinism might be a
problem and/or profiling to discover leaks is not straightforward
and/or code becomes more ugly. :-)

D, from the other side, is a younger language, the community is not so
big, the language is, afaict, evolving very rapidly and it's not easy to
tell which compiler to use, which libs etc.

Moreover, I'm a bit worried about the state of GUI libs available for D,
especially about QtD.

Moreover, 64-bit is not ready (yet), although I'm told it should come
soon. What about ARM if we want to target MeeGo in the future?

I also did not research the state of database support... For now
we're thinking of using sqlite3 as back-end.

Our project will be developed in free time and we want a language which
is easy to maintain because the project (with all desired features)
might evolve into a big one over a period of several years.

I also have the experience that some potential developers did not join
our team since Haskell was too hard to grok for them (coming from C++),
so D might be an easier path with a less steep learning curve, but I also
wonder whether I myself could pick up D quickly enough (I'll buy
a book, of course) after long exposure to Haskell and FP.

I read The Case for D article and saw Andrei's Google talk - it was
funny to see Google people being like little children when questioned
by him :-)

So, can you offer some advice on which would be the better choice between
Haskell & D for our planned project with the following features:

a) maintainable code

b) decent performance

c) higher-level programming and suitable for general programming tasks

d) good library support (database stuff, data structures, Qt GUI...)

e) vibrant community and active development so that there is some
guarantee that the language won't fall in oblivion if some devs leave
the project, iow. 'bus-factor > 2' ?

(It would be nice if someone familiar with both languages can share...)


Sincerely,
Gour

-- 
Gour  | Hlapicina, Croatia  | GPG key: CDBF17CA
----------------------------------------------------------------
Oct 05 2010
next sibling parent reply "Simen kjaeraas" <simen.kjaras gmail.com> writes:
Gour D. <gour atmarama.net> wrote:

 D, from the other side, is younger language, community is not so
 big, language is, afaict, evolving very rapidly and it's not easy to
 tell which compiler to use, which libs etc.

This is a big problem for D at this point. The language is no longer evolving (much), and we're at a point in time where libraries and toolchain parts need to be written.
 Moreover, I'm a bit worried on the state of GUI libs available for D,
 especially about QtD.

 Moreover, 64-bit is not ready (yet), although I'm told it should come
 soon.

It will. Latest news (2 days ago) say it's now getting as far as main(), which is good.
 What about ARM if we want to target MeeGo in the future?

I believe GDC supports ARM.
 I also did not research what is the state of database support...Now
 we're thinking to use sqlite3 as back-end.

There's a list here: http://www.wikiservice.at/d/wiki.cgi?DatabaseBindings However, most of those are for D1, and a large percentage seem to be abandoned. SQLite seems to be well supported, with 7 projects claiming support.
 I also have experience that some potential developers did not join our
 team since Haskell was to hard to grok for them (coming from C++), so
 D might be an easier path with less steep learning curve, but I also
 wonder about myself whether I could pick D quickly enough (I'll buy
 book, of course) after long exposure to Haskell and FP.

I'm sure you can. D also supports programming styles closer to those of FP, making such a transition easier (I hope :p)
 So, can you offer some advice, what could be better choice between
 Haskell & D for our planned project with the following features:

 a) maintainable code

This is likely a bit subjective, and much more dependent upon the programmers themselves than the language used. That said, D supports a variety of features that boost maintainability:

- Contract programming in the form of pre and post contracts for functions[1].
- Class invariants[2].
- Built in unit testing[3].
- Documentation comments[4].

Of course, other features of D may increase maintainability, but those are the ones most directly associated with it.
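
A minimal sketch of how the four features named above can look together in D2. The `Account` class and its members are made up purely for illustration. Current compilers spell the function body keyword `do`; compilers from the time of this thread used `body` instead.

```d
/// A bank account whose balance may never go negative (hypothetical example).
class Account
{
    private long balance; // in cents

    // Class invariant: checked around calls to public member functions.
    invariant()
    {
        assert(balance >= 0, "balance must not be negative");
    }

    /// Deposits `amount` cents and returns the new balance.
    long deposit(long amount)
    in
    {
        assert(amount > 0); // precondition
    }
    do
    {
        balance += amount;
        return balance;
    }

    /**
     * Withdraws `amount` cents.
     * Params: amount = number of cents to withdraw
     * Returns: the new balance
     */
    long withdraw(long amount)
    in
    {
        assert(amount > 0 && amount <= balance); // precondition
    }
    out (result)
    {
        assert(result == balance); // postcondition
    }
    do
    {
        balance -= amount;
        return balance;
    }
}

// Built-in unit test: compiled and run only when -unittest is given.
unittest
{
    auto acct = new Account;
    assert(acct.deposit(500) == 500);
    assert(acct.withdraw(200) == 300);
}
```

Compiling with `dmd -unittest -main` runs the unittest block; contracts and invariants are checked unless `-release` is given, and the `///` and `/** */` comments are what Ddoc picks up.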
 b) decent performance

D is generally as fast as C, though some abstractions of course cost more than others.
 c) higher-level programming and suitable for general programming tasks

My impression (not having used Haskell) is that D wins hands down on the latter, and is a bit weaker on the former.
 d) good library support (database stuff, data structures, Qt GUI...)

Likely Haskell is better here (as noted above, D has some problems in this regard).
 e) vibrant community and active development so that there is some
 guarantee that the language won't fall in oblivion if some devs leave
 the project, iow. 'bus-factor > 2' ?

The bus-factor of D is sadly close to 1. If Walter should choose to leave, we have a problem. On the other hand, I don't think a mere bus would keep him from continuing the project.
 (It would be nice if someone familiar with both languages can share...)

Here I can't help. I don't know Haskell.

In closing,

[1]: http://digitalmars.com/d/2.0/dbc.html
[2]: http://digitalmars.com/d/2.0/class.html#invariants
[3]: http://digitalmars.com/d/2.0/unittest.html
[4]: http://digitalmars.com/d/2.0/ddoc.html

-- 
Simen
Oct 05 2010
parent reply Daniel Gibson <metalcaedes gmail.com> writes:
Gour D. wrote:
 On Tue, 05 Oct 2010 12:52:22 +0200
 "Simen" == "Simen kjaeraas" <simen.kjaras gmail.com> wrote:

 Simen> I believe GDC supports ARM.

 Hmm, based on http://dgcc.sourceforge.net/ it looks like it is not overly active?

try http://bitbucket.org/goshawk/gdc/wiki/Home :-)
 
 Simen> The bus-factor of D is sadly close to 1. If Walter should choose
 Simen> to leave, we have a problem. On the other hand, I don't think a
 Simen> mere bus would keep him from continuing the project.
 
 Uhh...this is almost like a showstopper or, at least, very strong
 anti-adoption pattern. :-(
 

I don't think it's as serious, because afaik Walter is not the only one developing the dmd compiler (and thus familiar with it) and, more importantly, there are alternative D compilers (gdc and ldc, with at least gdc being actively developed). So even if Walter, for whatever reason, stops developing D, there is - IMHO - a good chance that others will continue his efforts and keep D alive.

Cheers,
- Daniel
Oct 05 2010
next sibling parent Amber <hate spam.com> writes:
Daniel Gibson Wrote:

I don't think it's as serious, because afaik Walter is not the only one developing the dmd compiler (and thus familiar with it) and, more importantly, there are alternative D compilers (gdc and ldc, with at least gdc being actively developed). So even if Walter, for whatever reason, stops developing D, there is - IMHO - a good chance that others will continue his efforts and keep D alive. Cheers, - Daniel

I've also heard there is an unannounced compiler in the works.
Oct 05 2010
prev sibling next sibling parent Daniel Gibson <metalcaedes gmail.com> writes:
Gour D. wrote:
 On Tue, 05 Oct 2010 15:39:30 +0200
 "Daniel" == Daniel Gibson <metalcaedes gmail.com> wrote:

 Daniel> try http://bitbucket.org/goshawk/gdc/wiki/Home :-)

 Ahh, this looks much better. Thanks. ;)

 Daniel> I don't think it's as serious, because afaik Walter is not the
 Daniel> only one developing the dmd compiler (and thus familiar with
 Daniel> it) and, more importantly, there are alternative D compilers
 Daniel> (gdc and ldc, with at least gdc being actively developed).

 So, both gdc & ldc are open-source?

Yes.
 What about standard libs?

They're open source, too (Boost license for Phobos, which is said to be even more liberal than the BSD license; there is an alternative standard lib for D1 - Tango[1] - that uses a BSD license and the "Academic Free License v3.0"[2]). Also, there are already several people maintaining/developing Phobos (not just Walter and Andrei Alexandrescu).

Cheers,
- Daniel

[1] http://www.dsource.org/projects/tango/
[2] http://www.dsource.org/projects/tango/wiki/LibraryLicense
Oct 05 2010
prev sibling parent reply Don <nospam nospam.com> writes:
Daniel Gibson wrote:

I don't think it's as serious, because afaik Walter is not the only one developing the dmd compiler (and thus familiar with it) and, more importantly, there are alternative D compilers (gdc and ldc, with at least gdc being actively developed). So even if Walter, for whatever reason, stops developing D, there is - IMHO - a good chance that others will continue his efforts and keep D alive.

Look at the changelog for the last three releases. During that time, Walter has worked almost exclusively on the backend for the 64-bit compiler. If there were no community involvement, there would have been almost no progress on the 32-bit compiler. Yet the rate of compiler bug fixing has not fallen. I would estimate the truck factor as between 2.0 and 2.5. Two years ago, the truck factor was 1.0, but not any more.
Oct 05 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/5/10 9:37 CDT, Gour D. wrote:
 On Tue, 05 Oct 2010 16:01:41 +0200
 "Don" == Don <nospam nospam.com> wrote:

 Don> I would estimate the truck factor as between 2.0 and 2.5. Two
 Don> years ago, the truck factor was 1.0, but not any more.

 Nice, nice...Still SO people say: "Neither Haskell nor D is popular
 enough for it to be at all likely that you will ever attract a single
 other developer to your project..." :-)

If developer attraction is a concern, you're likely better off with D. Programmers who have used at least one Algol-like language (C, C++, Java, C#) will have no problem feeling comfortable in D. With Haskell you'd need to stick with "the choir".
 If just QtD hadn't been suspended...

I agree that's a bummer. I suggest you write the developers and ask what would revive their interest. The perspective of a solid client is bound to be noticeable.

Andrei
Oct 05 2010
next sibling parent bearophile <bearophileHUGS lycps.com> writes:
Denis Koroskin:

 http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=103453

Yes, among the things D2 is designed to support, there's a lack of built-in (or user-defined) basic features to create GUIs. But the good thing is that (I think) all the useful features here are additive changes.

Bye,
bearophile
Oct 05 2010
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/5/10 14:59 CDT, Denis Koroskin wrote:
 http://h3.gd/devlog/?p=22 - increasingly more people are unsatisfied
 with D2 and talking about a fork so I wouldn't be surprised to see one
 sooner or later (!)

I'm confused. Reading through the link reveals exactly two people mentioning that a fork would be good. That doesn't quite qualify for "increasingly more people" in normal conversation.

Andrei
Oct 05 2010
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Gour D. wrote:
 Andrei> I agree that's a bummer. I suggest you write the developers and
 Andrei> ask what would revive their interest. The perspective of a
 Andrei> solid client is bound to be noticeable.
 
 You think that D beginner with a open-source project is "solid client"?
 
 Otoh, there is nothing to lose...

Few things work better than customers letting a company know they are interested in such-and-such a product.
Oct 05 2010
parent reply Walter Bright <newshound2 digitalmars.com> writes:
Gour D. wrote:
 Walter> Few things work better than customers letting a company know
 Walter> they are interested in such-and-such a product.
 Even a non-paying customer in the open-source world?

At least it shows interest. No emails tells the open source developer "nobody cares, so I'll just abandon it".
 Well, I'm going to send email to two people considering them important
 for QtD, based on what I could deduce...

Good!
Oct 09 2010
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Gour D. wrote:
 (64bit dmd, do you hear me?)

Don't I know it. 64 bit support is absolutely essential for D's future.
Oct 10 2010
parent reply bioinfornatics <bioinfornatics fedoraproject.org> writes:
LDC supports 64 bit ;)
Oct 10 2010
next sibling parent reply Daniel Gibson <metalcaedes gmail.com> writes:
bioinfornatics wrote:
 LDC supports 64 bit ;)

as well as GDC. But both currently lack an up-to-date D2 compiler (but the GDC guys are at least working on it, seems like they're currently at 2.029 - which is great - about 3 months ago they were still at 2.018 and in between was the big 2.020 update that introduced druntime).

I agree with Walter that 64bit support in DMD is very important, especially for D2: I started a project a few months ago that might have benefited from D2's features (especially ranges), but I decided to use D1 because no 64bit compiler for D2 was in sight.

I am btw a bit worried about "upgrading" code from D1 to D2 some day because of the heavy (still ongoing) changes, especially in Phobos. I read that for example std.stream is going to be deprecated and replaced by either something with ranges or something like std.stdio (whoever was right about that, maybe even both?). This means that all my (file/network) IO code would have to be rewritten sooner or later.

But, currently lacking an alternative to std.stream/std.socketstream for networking, I would have had the same problems even if I had started the project with D2...
Oct 10 2010
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/10/10 20:59 CDT, Jonathan M Davis wrote:

A stream solution is in the works (it's discussed periodically on the Phobos list), but they haven't sorted out quite what they want to do with it yet. The Phobos API in general is in flux, though pieces of it are likely to stay more or less unchanged from what they currently are. But there's far from any kind of guarantee that much of anything from the D1 Phobos is going to survive in the D2 Phobos. They're looking to make Phobos as good as they can, and they aren't yet worried about keeping its API stable (though they don't make changes unless they think that it's actually beneficial, so things don't change willy-nilly). I'm sure that the time will come, however, when Phobos' API will stabilize, and projects will be able to rely on it staying the same.

I think that the reality of the matter is that porting D1 code to D2 code is going to be just like, if not exactly like, porting code from one library to another rather than an upgrade like you'd get between Qt3 and Qt4 (which had plenty of changes). I'm sure that the split between D1 and D2 is going to cause a lot of problems for people looking to port code from one to the other, but it will be better for newly written code, so it's a definite tradeoff. I wouldn't look forward to porting a project from D1 to D2 though.

I think it's a bit hasty to speak on behalf of all of Phobos' participants. Phobos 2 is indeed different from Phobos 1 but backward-incompatible changes to Phobos 2 are increasingly rare.

Andrei
Oct 10 2010
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2010-10-11 06:54, Jonathan M Davis wrote:
 On Sunday 10 October 2010 21:26:28 Andrei Alexandrescu wrote:
 I think it's a bit hasty to speak on behalf of all of Phobos'
 participants. Phobos 2 is indeed different from Phobos 1 but
 backward-incompatible changes to Phobos 2 are increasingly rare.

Sorry if I overstepped my bounds on that. It's just that from what I've seen, the Phobos devs have been quite willing to make backwards incompatible changes if they thought that they were an improvement, though they aren't done all that frequently. Backwards compatibility is considered, but improvements to the API seem to override it. Regardless, the result is that if you wrote your code for dmd 2.040 or something similar and ended up trying to update it to 2.050, you'd likely have a number of changes to make, though porting from Phobos 1 would be far worse. If Phobos were completely stable or at least never made backwards-compatibility breaking changes, that wouldn't be the case. I fully expect that as Phobos matures, such breaking changes will become quite rare if not outright nonexistent, but they do still happen. Actually deprecating and replacing the modules that are intended to be deprecated and replaced will help a lot with that, but that obviously takes time.

- Jonathan M Davis

I think it's time to separate the compiler, the language and Phobos in the releases. One idea is to use three numbers to indicate a release of Phobos: major, minor and build, like this: 2.4.3. Increment the build number when an implementation detail is changed that doesn't change the API. Increment the minor number when a backwards compatible API change is made, and increment the major number when an API change is made that breaks backwards compatibility.

-- 
/Jacob Carlborg
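
The numbering rule proposed above can be sketched in a few lines of D. `LibVersion` and `Change` are hypothetical names, not part of Phobos, and resetting the lower numbers to zero on a minor or major bump is an assumption the proposal does not spell out.

```d
/// Kinds of change, following the proposed scheme (hypothetical names).
enum Change
{
    implementationOnly, // internals changed, API untouched
    compatibleApi,      // API changed in a backwards compatible way
    breakingApi         // API changed in a way that breaks existing code
}

/// A three-part library version: major.minor.build.
struct LibVersion
{
    uint major, minor, build;

    /// Returns the version that would follow this one for the given change.
    LibVersion next(Change c) const
    {
        LibVersion v = this; // work on a mutable copy
        final switch (c)
        {
            case Change.implementationOnly:
                v.build++;
                break;
            case Change.compatibleApi:
                v.minor++;
                v.build = 0; // assumption: lower parts restart at zero
                break;
            case Change.breakingApi:
                v.major++;
                v.minor = 0;
                v.build = 0;
                break;
        }
        return v;
    }
}

unittest
{
    auto v = LibVersion(2, 4, 3);
    assert(v.next(Change.implementationOnly) == LibVersion(2, 4, 4));
    assert(v.next(Change.compatibleApi) == LibVersion(2, 5, 0));
    assert(v.next(Change.breakingApi) == LibVersion(3, 0, 0));
}
```

With such a rule a project could accept any newer release that keeps the same major number, which is the guarantee being asked for.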
Oct 11 2010
prev sibling parent reply Daniel Gibson <metalcaedes gmail.com> writes:
Andrei Alexandrescu wrote:

I think it's a bit hasty to speak on behalf of all of Phobos' participants. Phobos 2 is indeed different from Phobos 1 but backward-incompatible changes to Phobos 2 are increasingly rare. Andrei

But parts of Phobos are deprecated or will be deprecated and there still is no alternative for them. That may prevent people from writing "real" projects in D2 (or D at all) - who wants to use classes that will be deprecated soon? Sure, that old stuff will not be removed and can still be used, but I personally feel a bit uncomfortable with using deprecated code.

Cheers,
- Daniel
Oct 11 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/11/2010 12:38 PM, Daniel Gibson wrote:
 But parts of phobos are deprecated or will be deprecated and there still
 is no alternative for them.
 That may prevent people from writing "real" projects in D2 (or D at all)
 - who wants to use classes that will be deprecated soon?
 Sure, that old stuff will not be removed and can still be used, but I
 personally feel a bit uncomfortable with using deprecated code.

Agreed. Maybe this is a good time to start making a requirements list for streams. What are the essential features/feature groups?

Andrei
Oct 11 2010
next sibling parent reply Daniel Gibson <metalcaedes gmail.com> writes:
Andrei Alexandrescu wrote:

Agreed. Maybe this is a good time to start making a requirements list for streams. What are the essential features/feature groups?

Andrei

Maybe something like the following (I hope it's not too extensive):

* Input-, Output- and InputAndOutput-Streams
  - having InputStream and OutputStream as an interface like in the old
    design may be a good idea
  - implementing the standard operations that are mostly independent from
    the data source/sink (like read/write for basic types, strings, ...)
    in mixin templates is probably an elegant way to create streams that
    are both Input and Output (one mixin that implements most of
    InputStream and one that implements most of OutputStream)
* Two kinds of streams:
  1. basic streams: reading/writing from/to:
     * network (socket)
     * files
     * just memory (currently MemoryStream)
     * Arrays/Ranges?
     * ...
  2. streams wrapping other streams:
     * for buffering - buffer input/output/both
       - with the possibility to peek?
     * to modify data when it's read/written (e.g. change endianness -
       important for networking!)
     * custom streams, e.g. could parse/create CSV (comma separated
       values) data or similar
* Also there are different types of streams: seekable, resettable (a
  network stream is neither), ...
* functionality/methods needed/desirable:
  - low level access
    * void read(void *buf, size_t len) // read *exactly* len bytes into buf
    * void write(void *buf, size_t len) // write *exactly* len bytes from
      buf to stream
  - convenient methods to read/write basic types in binary (!) from/to
    stream
    * <type> read<Type>() (like int readInt()) or T read(T)() (like int
      read!int())
      - with enforcing T is somehow basic (certainly no Object or pointer)
      - could use read(void *buf, size_t len) like in old implementation
    * void write( <basic type> val ) or void write(T)( T val ) - again T
      should be basic type
      - could use write(void *buf, size_t len) like in old implementation
  - convenient methods to read/write arrays of T (T should again be a
    basic type)
    * T[] readArray(T)( size_t len) // return array of T's containing len T's
      - probably both alternatives make sense - the first one to write
        into an existing array (-slice), the second one for convenience
        if you want a new array anyway
    * void read(T)( T[] array ) // read array.length T's into array
      - maybe name this readArray(T)(..) as well for consistency?
    * void writeArray(T)( T[] array )
  - special cases for strings?
    * void writeString(char[] str) // same for wchar and dchar
      - could write str into the stream with its length (as ushort xor
        uint xor ulong, _not_ size_t!) prepended
    * char[] readString() // same for wchar and dchar
      - read length of the string and then the string itself that will be
        returned
  - all that array/string/low level stuff but reading *at most* len (or
    array.length) values and returning the amount actually read
    ( readUpTo() ?)
    * useful e.g. for parsing http (you don't know how long the header is
      etc)
    * the same for write? don't see much use for that though..
  - some way to determine whether the stream
    * is at its definite end (eof on file, socket closed or something
      like that)
    * currently empty (for input stream) - just doing a read() would
      block ?
  - Output streams need flush()
  - for Input streams skip(size_t noBytes) or even skip(T)(size_t noElems)
    may be handy to just throw away data we're not interested in without
    having it copied around - especially for non-seekable streams
    (network..)

Cheers,
- Daniel
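
To make the difference between the "read *exactly* len" and "read *at most* len" requirements from the list above concrete, here is a rough sketch. All names (`InputStream`, `readUpTo`, `readExact`, `atEnd`, `MemoryInput`) are hypothetical and not an existing Phobos API, and the sketch takes `ubyte[]` slices rather than the `(void *buf, size_t len)` pairs from the list, which is a swapped-in design choice.

```d
/// Hypothetical input-stream interface; not an existing Phobos API.
interface InputStream
{
    /// Reads at most buf.length bytes and returns how many were read
    /// (0 once the stream is at its definite end).
    size_t readUpTo(ubyte[] buf);

    /// True once no more data can ever be read.
    @property bool atEnd();
}

/// Reads *exactly* buf.length bytes or throws; built on top of readUpTo,
/// so every stream type gets it without extra work.
void readExact(InputStream s, ubyte[] buf)
{
    size_t done = 0;
    while (done < buf.length)
    {
        immutable n = s.readUpTo(buf[done .. $]);
        if (n == 0)
            throw new Exception("unexpected end of stream");
        done += n;
    }
}

/// In-memory stream, standing in for a file or socket stream.
class MemoryInput : InputStream
{
    private const(ubyte)[] data;

    this(const(ubyte)[] data) { this.data = data; }

    size_t readUpTo(ubyte[] buf)
    {
        immutable n = buf.length < data.length ? buf.length : data.length;
        buf[0 .. n] = data[0 .. n]; // copy what is available
        data = data[n .. $];        // and consume it
        return n;
    }

    @property bool atEnd() { return data.length == 0; }
}

unittest
{
    ubyte[] src = [1, 2, 3, 4, 5];
    auto s = new MemoryInput(src);

    ubyte[3] head;
    readExact(s, head[]);
    ubyte[3] expected = [1, 2, 3];
    assert(head == expected);

    ubyte[8] rest;
    assert(s.readUpTo(rest[]) == 2); // only two bytes were left
    assert(s.atEnd);
}
```

Making the "at most" call the primitive and layering the "exactly" call on top is only one possible arrangement; a network stream would block inside `readUpTo` where the memory stream returns immediately.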
Oct 11 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/11/2010 07:49 PM, Daniel Gibson wrote:
 Andrei Alexandrescu wrote:
 Agreed. Maybe this is a good time to start making a requirements list
 for streams. What are the essential features/feature groups?

 Andrei

 Maybe something like the following (I hope it's not too extensive):

 * Input- Output- and InputAndOutput- Streams
 - having InputStream and OutputStream as an interface like in the old
 design may be a good idea
 - implementing the standard operations that are mostly independent from
 the data source/sink like read/write for basic types, strings, ... in
 mixin templates is probably elegant to create streams that are both
 Input and Output (one mixin that implements most of InputStream and one
 that implements most of OutputStream)

So far so good. I will point out, however, that the classic read/write routines are not all that good. For example if you want to implement a line-buffered stream on top of a block-buffered stream you'll be forced to write inefficient code. Also, a requirement that I think is essential is separation between formatting and transport. std.stream does not have that. At the top level there are two types of transport: text and binary. On top of that lie various formatters.
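
One way to picture the separation between formatting and transport mentioned above is the sketch below. It is only an illustration under assumptions: `ByteSink`, `MemorySink`, `BinaryFormatter` and `TextFormatter` are invented names, and only a byte-oriented transport is shown. The one library call used, `std.bitmanip.nativeToBigEndian`, is real.

```d
import std.bitmanip : nativeToBigEndian;
import std.conv : to;

/// Transport: moves raw bytes and knows nothing about their meaning.
interface ByteSink
{
    void put(const(ubyte)[] bytes);
}

/// A transport that collects bytes in memory (stand-in for a file or socket).
class MemorySink : ByteSink
{
    ubyte[] data;

    void put(const(ubyte)[] bytes) { data ~= bytes; }
}

/// Formatter: encodes ints as 4 big-endian bytes on top of any transport.
struct BinaryFormatter
{
    ByteSink sink;

    void write(int value)
    {
        ubyte[4] raw = nativeToBigEndian(value);
        sink.put(raw[]);
    }
}

/// Formatter: encodes ints as decimal text lines on the same kind of transport.
struct TextFormatter
{
    ByteSink sink;

    void write(int value)
    {
        string s = to!string(value) ~ "\n";
        sink.put(cast(const(ubyte)[]) s);
    }
}

unittest
{
    auto bin = new MemorySink;
    auto bf = BinaryFormatter(bin);
    bf.write(258); // 0x00000102
    ubyte[4] expectedBin = [0, 0, 1, 2];
    assert(bin.data == expectedBin[]);

    auto txt = new MemorySink;
    auto tf = TextFormatter(txt);
    tf.write(258);
    assert(cast(const(char)[]) txt.data == "258\n");
}
```

Neither formatter knows where the bytes end up, and the sink never learns what an `int` is, so either side can be replaced without touching the other.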
 * Two kinds of streams:
 1. basic streams: reading/writing from/to:
 * network (socket)
 * files
 * just memory (currently MemoryStream)
 * Arrays/Ranges?
 * ...
 2. streams wrapping other streams:
 * for buffering - buffer input/output/both
 - with the possibility to peek?
 * to modify data when it's read/written (e.g. change endianness -
 important for networking!)
 * custom streams.. e.g. could parse/create CSV (comma separated values)
 data or similar

Would these streams differ in their interface?
 * Also there are different types of streams: seekable, resettable (a
 network stream is neither), ...

Agreed. Question: is there a file system that offers resettable but not seekable files? I'm thinking of collapsing the two together.
 * functionality/methods needed/desirable:
 - low level access
 * void read(void *buf, size_t len) // read *exactly* len bytes into buf
 * void write(void *buf, size_t len) // write *exactly* len bytes from
 buf to stream
 - convenient methods to read/write basic types in binary (!) from/to stream

Again, binary vs. text is a capability of the stream. For example, a tty can never transport binary data - programs like gzip refuse to write binary data to a terminal. (Then of course a binary stream can always accommodate text data.)
 * <type> read<Type>() (like int readInt()) or T read(T)() (like int
 read!int())

Templates will be difficult for a class hierarchy.
 - with enforcing T is somehow basic (certainly no Object or pointer)
 - could use read(void *buf, size_t len) like in old implementation
 * void write( <basic type> val ) or void write(T)( T val ) - again T
 should be basic type
 - could use write(void *buf, size_t len) like in old implementation
 - convenient methods to read/write arrays of T (T should again be a
 basic type)
 * T[] readArray(T)( size_t len) // return array of T's containing len T's
 - probably both alternatives make sense - the first one to write into an
 existing
 array (-slice), the second one for convenience if you want a new array
 anyway
 * void read(T)( T[] array ) // read array.length T's into array
 - maybe name this readArray(T)(..) as well for consistency?
 * void writeArray(T)( T[] array )
 - special cases for strings?
 * void writeString(char[] str) // same for wchar and dchar
 - could write str into the stream with its length (as ushort xor uint
 xor ulong,
 _not_ size_t!) prepended
 * char[] readString() // same for wchar and dchar
 - read length of the string and then the string itself that will be
 returned

Many of these capabilities involve template methods. Is a template-based approach preferable to a straight class hierarchy? I tend to think that in the case of streams, classic hierarchies are most adequate.
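
One way to reconcile the two positions, sketched under assumptions: keep a single virtual primitive in the class hierarchy and layer the typed helpers on top as free function templates. Template member functions are not virtual in D, so they cannot be overridden per stream anyway. The names `readValue` and `MemoryInput` below are hypothetical and not part of any proposal in this thread; the value is read in host byte order.

```d
import std.traits : isScalarType;

// Hypothetical interface: the only virtual primitive a concrete stream implements.
interface InputStream
{
    // Reads up to buf.length bytes and returns the number actually read.
    size_t read(ubyte[] buf);
}

// Typed helper kept outside the hierarchy. It works with any InputStream
// because it only relies on the virtual read() above.
T readValue(T)(InputStream s) if (isScalarType!T)
{
    T value;
    // View the value as raw bytes (host byte order, no endianness handling).
    auto bytes = (cast(ubyte*) &value)[0 .. T.sizeof];
    size_t got = 0;
    while (got < bytes.length)
    {
        immutable n = s.read(bytes[got .. $]);
        if (n == 0)
            throw new Exception("unexpected end of stream");
        got += n;
    }
    return value;
}

// Minimal in-memory stream, used only to exercise the sketch.
class MemoryInput : InputStream
{
    private const(ubyte)[] data;

    this(const(ubyte)[] data) { this.data = data; }

    size_t read(ubyte[] buf)
    {
        immutable n = buf.length < data.length ? buf.length : data.length;
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}
```

With this split, `readValue!uint(stream)` works for every stream class, while the constraint rejects objects and pointers at compile time.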
 - all that array/string/low level stuff but reading *at most* len (or
 array.length) values
 and returning the amount actually read ( readUpTo() ?)
 * useful e.g. for parsing http (you don't know how long the header is etc)
 * the same for write? don't see much use for that though..

 - some way to determine whether the stream
 * is at its definite end (eof on file, socket closed or something like
 that)
 * currently empty (for input stream) - just doing a read() would block ?

 - Output streams need flush()
 - for Input streams skip(size_t noBytes) or even skip(T)(size_t noElems)
 may be
 handy to just throw away data we're not interested in without having it
 copied around - especially for non-seekable streams (network..)

OK, that's a good start. Let's toss this back and forth a few times and see what sticks. Andrei
Oct 13 2010
next sibling parent reply Johannes Pfau <spam example.com> writes:
On 13.10.2010 16:32, Andrei Alexandrescu wrote:
 On 10/11/2010 07:49 PM, Daniel Gibson wrote:
 Andrei Alexandrescu schrieb:
 * Also there are different types of streams: seekable, resettable (a
 network stream is neither), ...

Agreed. Question: is there a file system that offers resettable but not seekable files? I'm thinking of collapsing the two together.

Can't think of a file system. But for example a http stream is always resettable, but not always seekable. -- Johannes Pfau
Oct 13 2010
next sibling parent Wayne Anderson <wanderon comcast.net> writes:
Don't forget data alignment.  For example, when streaming structs it may be
useful to stream out to the file
system or network in packed form, then unpack to the preferred alignment of the
destination architecture
on streaming in.
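
A minimal sketch of the packed form described above, assuming a hypothetical `Sample` struct and a little-endian wire format (both are illustrative choices, not something proposed in the thread). Each field is written individually, so the wire layout has no padding regardless of how the compiler aligns the struct in memory.

```d
import std.bitmanip : littleEndianToNative, nativeToLittleEndian;

// Hypothetical record; in memory the compiler may pad it for alignment.
struct Sample
{
    ubyte tag;
    uint id;
    ushort flags;
}

// Packed wire size: 1 + 4 + 2 bytes, no padding.
enum packedSize = 1 + 4 + 2;

// Serialize field by field into the packed layout.
ubyte[packedSize] pack(const Sample s)
{
    ubyte[packedSize] wire;
    ubyte[4] idBytes = nativeToLittleEndian(s.id);
    ubyte[2] flagBytes = nativeToLittleEndian(s.flags);
    wire[0] = s.tag;
    wire[1 .. 5] = idBytes[];
    wire[5 .. 7] = flagBytes[];
    return wire;
}

// Rebuild the naturally aligned struct from the packed bytes.
Sample unpack(const ubyte[packedSize] wire)
{
    Sample s;
    ubyte[4] idBytes = wire[1 .. 5];
    ubyte[2] flagBytes = wire[5 .. 7];
    s.tag = wire[0];
    s.id = littleEndianToNative!uint(idBytes);
    s.flags = littleEndianToNative!ushort(flagBytes);
    return s;
}
```

Fixing the byte order on the wire also covers the endianness concern raised elsewhere in the thread.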
Oct 13 2010
prev sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
  On 13.10.2010 19:37, Johannes Pfau wrote:
 On 13.10.2010 16:32, Andrei Alexandrescu wrote:
 On 10/11/2010 07:49 PM, Daniel Gibson wrote:
 Andrei Alexandrescu schrieb:
 * Also there are different types of streams: seekable, resettable (a
 network stream is neither), ...

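Agreed. Question: is there a file system that offers resettable but not seekable files? I'm thinking of collapsing the two together.

Can't think of a file system. But for example a http stream is always resettable, but not always seekable.

resettable, but not seekable. To seek one would need to decompress the stream and discard data (hardly any faster than reading). -- Dmitry Olshansky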
Oct 13 2010
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/13/10 11:16 CDT, Denis Koroskin wrote:
 On Wed, 13 Oct 2010 18:32:15 +0400, Andrei Alexandrescu
 So far so good. I will point out, however, that the classic read/write
 routines are not all that good. For example if you want to implement a
 line-buffered stream on top of a block-buffered stream you'll be
 forced to write inefficient code.

Never heard of filesystems that allow reading files in lines - they always read in blocks, and that's what streams should do.

http://www.gnu.org/s/libc/manual/html_node/Buffering-Concepts.html I don't think streams must mimic the low-level OS I/O interface.
 That's because
 most of the streams are binary streams, and there is no such thing as a
 "line" in them (e.g. how often do you need to read a line from a
 SocketStream?).

http://www.opengroup.org/onlinepubs/009695399/functions/isatty.html You need a line when e.g. you parse an HTML header or an email header or an FTP response. Again, if at a low level the transfer occurs in blocks, that doesn't mean the API must do the same at all levels.
 I don't think streams should buffer anything either (what an underlying
 OS I/O API caches should suffice), buffered streams adapters can do that
 in a stream-independent way (why duplicate code when you can do that as
 efficiently with external methods?).

Most OS primitives don't give access to their own internal buffers. Instead, they ask user code to provide a buffer and transfer data into it. So clearly buffering on the client side is a must.
 Besides, as you noted, the buffering is redundant for byChunk/byLine
 adapter ranges. It means that byChunk/byLine should operate on
 unbuffered streams.

Chunks keep their own buffer so indeed they could operate on streams that don't do additional buffering. The story with lines is a fair amount more complicated if it needs to be done efficiently.
 I'll explain my I/O streams implementation below in case you didn't read
 my message (I've changed some stuff a little since then).

Honest, I opened it to remember to read it but somehow your fonts are small and make my eyes hurt.
 My Stream
 interface is very simple:

 // A generic stream
 interface Stream
 {
 @property InputStream input();
 @property OutputStream output();
 @property SeekableStream seekable();
 @property bool endOfStream();
 void close();
 }

 You may ask, why separate Input and Output streams?

I think my first question is: why doesn't Stream inherit InputStream and OutputStream? My hypothesis: you want to sometimes return null. Nice.
 Well, that's because
 you either read from them, write to them, or both.
 Some streams are read-only (think Stdin), some write-only (Stdout), some
 support both, like FileStream. Right?

Sounds good. But then where's flush()? Must be in OutputStream.
 Not exactly. Does FileStream support writing when you open file for
 reading? Does it support reading when you open for writing?
 So, you may or may not read from a generic stream, and you also may or
 may not write to a generic stream. With a design like that you can't make
 a mistake: if a stream isn't readable, you have no reference to invoke
 the read() method on.

That is indeed pretty nifty. I hope you would allow us to copy that feature in Phobos (unless you are considering submitting your library wholesale). Let me know.
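
A minimal sketch of how client code might use the capability properties, assuming a trimmed-down `Stream` with only `input` and `output` (the interface quoted above has more members). `MemorySink`, `EmptySource` and `tryWrite` are hypothetical names made up for this illustration.

```d
interface InputStream  { size_t read(ubyte[] buffer); }
interface OutputStream { size_t write(const(ubyte)[] buffer); }

// Trimmed version of the quoted interface: a capability is either
// available (non-null) or absent (null).
interface Stream
{
    @property InputStream input();
    @property OutputStream output();
}

// Hypothetical write-only stream that collects bytes in memory.
class MemorySink : Stream, OutputStream
{
    ubyte[] data;

    @property InputStream input() { return null; }   // not readable
    @property OutputStream output() { return this; }

    size_t write(const(ubyte)[] buffer)
    {
        data ~= buffer;
        return buffer.length;
    }
}

// Hypothetical read-only stream that is always at its end.
class EmptySource : Stream, InputStream
{
    @property InputStream input() { return this; }
    @property OutputStream output() { return null; } // not writable

    size_t read(ubyte[] buffer) { return 0; }
}

// Writes all bytes if the stream is writable; reports false otherwise.
bool tryWrite(Stream s, const(ubyte)[] bytes)
{
    auto sink = s.output;
    if (sink is null)
        return false;            // capability absent, nothing to call
    while (bytes.length)
    {
        immutable n = sink.write(bytes);
        if (n == 0)
            return false;        // sink refused further data
        bytes = bytes[n .. $];
    }
    return true;
}
```

The check for `null` happens once, at the point where the capability is requested, rather than on every read or write call.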
 Similarly, a stream is either seekable, or not. SeekableStreams allow
 stream cursor manipulation:

 interface SeekableStream : Stream
 {
 long getPosition(Anchor whence = Anchor.begin);
 void setPosition(long position, Anchor whence = Anchor.begin);
 }

Makes sense. Why is getPosition signed? Why do you need an anchor for getPosition?
 InputStream doesn't really have many methods:

 interface InputStream
 {
 // reads up to buffer.length bytes from a stream
 // returns number of bytes read
 // throws on error
 size_t read(ubyte[] buffer);

That makes implementation of line buffering inefficient :o).
 // reads from current position
 AsyncReadRequest readAsync(ubyte[] buffer, Mailbox* mailbox = null);
 }

Why doesn't Sean's concurrency API scale for your needs? Can that be fixed? Would you consider submitting some informed bug reports?
 Neither does OutputStream:

 interface OutputStream
 {
 // returns number of bytes written
 // throws on error
 size_t write(const(ubyte)[] buffer);

 // writes from current position
 AsyncWriteRequest writeAsync(const(ubyte)[] buffer, Mailbox* mailbox =
 null);
 }

 They basically support only reading and writing in blocks, nothing else.

I'm surprised there's no flush().
 However, they support asynchronous reads/writes, too (think of mailbox
 as a std.concurrency's Tid).

 Unlike Daniel's proposal, my design reads up to buffer size bytes for
 two reasons:
 - it avoids potential buffering and multiple sys calls

But there's a problem. It's very rare that the user knows what a good buffer size is. And often there are size and alignment restrictions at the low level. So somewhere there is still buffering going on, and also there are potential inefficiencies (if a user reads small buffers).
 - it is the only way to go with SocketStreams. I mean, you often don't
 know how many bytes an incoming socket message contains. You either have
 to read it byte-by-byte, or your application might stall for potentially
 infinite time (if message was shorter than your buffer, and no more
 messages are being sent)

But if you don't know how many bytes are in an incoming socket message, a better design is to do this: void read(ref ubyte[] buffer); and resize the buffer to accommodate the incoming packet. Your design _imposes_ that the socket does additional buffering.
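
A minimal sketch of the `void read(ref ubyte[] buffer)` idea, assuming a message-oriented source where each call delivers exactly one message. `MessageInput` and `QueuedMessages` are hypothetical names; a real socket stream would need to learn the message size from the transport, which is not shown here.

```d
// Hypothetical interface: the callee resizes the caller's buffer to fit.
interface MessageInput
{
    // On return, buffer holds exactly one message; length 0 means end of input.
    void read(ref ubyte[] buffer);
}

// In-memory stand-in for a socket that delivers queued messages.
class QueuedMessages : MessageInput
{
    private ubyte[][] pending;

    this(ubyte[][] pending) { this.pending = pending; }

    void read(ref ubyte[] buffer)
    {
        if (pending.length == 0)
        {
            buffer.length = 0;
            return;
        }
        buffer.length = pending[0].length; // grow or shrink to fit the message
        buffer[] = pending[0][];           // copy the payload
        pending = pending[1 .. $];
    }
}
```

The caller can keep reusing one buffer across calls; it only reallocates when a message is larger than anything seen so far.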
 Why do my streams provide async methods? Because it's the modern
 approach to I/O - blocking I/O (aka one thread per client) doesn't
 scale. E.g. Java adds a second revision of its async I/O API in JDK7 (called
 NIO2; the first revision, NIO, appeared in February 2002), and C# has had
 asynchronous operations as part of its Stream interface since .NET 1.1 (April 2003).

Async I/O is nice, no two ways about that. I have on my list to define byChunkAsync that works exactly like byChunk from the client's perspective, except it does I/O concurrently with client code. [snip]
 I strongly believe we shouldn't ignore this type of API.

 P.S. For threads this deep it's better to fork a new one, especially when
 changing the subject.

I thought I did by changing the title... Andrei
Oct 13 2010
next sibling parent reply Daniel Gibson <metalcaedes gmail.com> writes:
Andrei Alexandrescu schrieb:
 On 10/13/10 11:16 CDT, Denis Koroskin wrote:
 P.S. For threads this deep it's better to fork a new one, especially when
 changing the subject.

I thought I did by changing the title... Andrei

At least on my Thunderbird/Icedove 2.0.0.24 it's still in the old Thread.
Oct 13 2010
parent reply Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
On 13/10/2010 18:48, Daniel Gibson wrote:
 Andrei Alexandrescu schrieb:
 On 10/13/10 11:16 CDT, Denis Koroskin wrote:
 P.S. For threads this deep it's better to fork a new one, especially when
 changing the subject.

I thought I did by changing the title... Andrei

At least on my Thunderbird/Icedove 2.0.0.24 it's still in the old Thread.

Same here on my Thunderbird 3.0. It seems TB cares more about the "References:" field in the NNTP message to determine the parent. In fact, with version 3 of TB, it seems that's all it considers... which means that NG messages with the same title as the parent will not be put in the same thread as the parent if they don't have the references field. That sounds like the right approach, however there are some problems in practice because some clients never put the references field (particularly Webnews I think), so all those messages show up in my TB as new threads. :/ -- Bruno Medeiros - Software Engineer
Oct 29 2010
parent reply Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
On 29/10/2010 12:50, Denis Koroskin wrote:
 On Fri, 29 Oct 2010 15:40:35 +0400, Bruno Medeiros
 <brunodomedeiros+spam com.gmail> wrote:

 On 13/10/2010 18:48, Daniel Gibson wrote:
 Andrei Alexandrescu schrieb:
 On 10/13/10 11:16 CDT, Denis Koroskin wrote:
 P.S. For threads this deep it's better to fork a new one, especially when
 changing the subject.

I thought I did by changing the title... Andrei

At least on my Thunderbird/Icedove 2.0.0.24 it's still in the old Thread.

Same here on my Thunderbird 3.0. It seems TB cares more about the "References:" field in the NNTP message to determine the parent. In fact, with version 3 of TB, it seems that's all it considers... which means that NG messages with the same title as the parent will not be put in the same thread as the parent if they don't have the references field. That sounds like the right approach, however there are some problems in practice because some clients never put the references field (particularly Webnews I think), so all those messages show up in my TB as new threads. :/

Nope, most of the responses through WebNews have correct References in place.

All responses that appear as new threads in my TB (i.e., threads whose title starts with "Re: ") and for which I have looked at the message source, have user agent: User-Agent: Web-News v.1.6.3 (by Terence Yim) and no references field. These messages are common with some posters, like bearophile, Sean Kelly, Kagamin, etc. But some Web-News messages do have a references field, so it's not all Web-News messages that are missing it. -- Bruno Medeiros - Software Engineer
Oct 29 2010
parent Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
On 29/10/2010 18:08, Denis Koroskin wrote:
 On Fri, 29 Oct 2010 16:32:24 +0400, Bruno Medeiros
 <brunodomedeiros+spam com.gmail> wrote:

 On 29/10/2010 12:50, Denis Koroskin wrote:
 On Fri, 29 Oct 2010 15:40:35 +0400, Bruno Medeiros
 <brunodomedeiros+spam com.gmail> wrote:

 On 13/10/2010 18:48, Daniel Gibson wrote:
 Andrei Alexandrescu schrieb:
 On 10/13/10 11:16 CDT, Denis Koroskin wrote:
 P.S. For threads this deep it's better to fork a new one, especially
 when
 changing the subject.

I thought I did by changing the title... Andrei

At least on my Thunderbird/Icedove 2.0.0.24 it's still in the old Thread.

Same here on my Thunderbird 3.0. It seems TB cares more about the "References:" field in the NNTP message to determine the parent. In fact, with version 3 of TB, it seems that's all it considers... which means that NG messages with the same title as the parent will not be put in the same thread as the parent if they don't have the references field. That sounds like the right approach, however there are some problems in practice because some clients never put the references field (particularly Webnews I think), so all those messages show up in my TB as new threads. :/

Nope, most of the responses through WebNews have correct References in place.

All responses that appear as new threads in my TB (i.e., threads whose title starts with "Re: ") and for which I have looked at the message source, have user agent: User-Agent: Web-News v.1.6.3 (by Terence Yim) and no references field. These messages are common with some posters, like bearophile, Sean Kelly, Kagamin, etc. But some Web-News messages do have a references field, so it's not all Web-News messages that are missing it.

That's strange because here is what I get for a typical WebNews message:

Path: digitalmars.com!not-for-mail
From: tls <do notha.ev>
Newsgroups: digitalmars.D
Subject: Re: Lints, Condate and bugs
Date: Fri, 29 Oct 2010 15:54:12 +0400
Organization: Digital Mars
Lines: 48
Message-ID: <iaecl4$9j3$1 digitalmars.com>
References: <ia6hac$15en$1 digitalmars.com> <op.vlbyabdfo7cclz korden-pc> <iae6dh$2u1f$1 digitalmars.com> <mailman.26.1288350233.21107.digitalmars-d puremagic.com>
MIME-Version: 1.0
Content-Type: text/plain
X-Trace: digitalmars.com 1288353252 9827 65.204.18.192 (29 Oct 2010 11:54:12 GMT)
X-Complaints-To: usenet digitalmars.com
NNTP-Posting-Date: Fri, 29 Oct 2010 11:54:12 +0000 (UTC)
User-Agent: Web-News v.1.6.3 (by Terence Yim)
Xref: digitalmars.com digitalmars.D:120649

Well, here's what I get for such a typical unparented message:

Path: digitalmars.com!not-for-mail
From: bearophile <bearophileHUGS lycos.com>
Newsgroups: digitalmars.D
Subject: Re: The Many Faces of D - slides
Date: Sun, 03 Oct 2010 15:44:24 -0400
Organization: Digital Mars
Lines: 14
Message-ID: <i8ameo$299v$1 digitalmars.com>
Mime-Version: 1.0
Content-Type: text/plain
X-Trace: digitalmars.com 1286135064 75071 65.204.18.192 (3 Oct 2010 19:44:24 GMT)
X-Complaints-To: usenet digitalmars.com
NNTP-Posting-Date: Sun, 3 Oct 2010 19:44:24 +0000 (UTC)
User-Agent: Web-News v.1.6.3 (by Terence Yim)
Xref: digitalmars.com digitalmars.D:118239
 Page 30: that little concurrent test program gives me an error:

-- Bruno Medeiros - Software Engineer
Nov 01 2010
prev sibling next sibling parent Daniel Gibson <metalcaedes gmail.com> writes:
Denis Koroskin schrieb:
 On Wed, 13 Oct 2010 20:55:04 +0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 On 10/13/10 11:16 CDT, Denis Koroskin wrote:
 On Wed, 13 Oct 2010 18:32:15 +0400, Andrei Alexandrescu
 So far so good. I will point out, however, that the classic read/write
 routines are not all that good. For example if you want to implement a
 line-buffered stream on top of a block-buffered stream you'll be
 forced to write inefficient code.

Never heard of filesystems that allow reading files in lines - they always read in blocks, and that's what streams should do.

http://www.gnu.org/s/libc/manual/html_node/Buffering-Concepts.html I don't think streams must mimic the low-level OS I/O interface.

I in contrast think that Streams should be a lowest-level possible platform-independent abstraction. No buffering besides what an OS provides, no additional functionality. If you need to be able to read something up to some character (besides, what should be considered a new-line separator: \r, \n, \r\n?), this should be done manually in "byLine".

Platform-independent? OS-independent, yes. But being independent of endianness and the availability of 80-bit real etc. is too much for a simple stream (of course we'd need an EndianStream that can wrap a simple stream and take care of the endianness).
 That's because
 most of the streams are binary streams, and there is no such thing as a
 "line" in them (e.g. how often do you need to read a line from a
 SocketStream?).

http://www.opengroup.org/onlinepubs/009695399/functions/isatty.html

These are special cases I don't like. There is no such thing in Windows anyway.
 You need a line when e.g. you parse an HTML header or an email header or
 an FTP response. Again, if at a low level the transfer occurs in 
 blocks, that doesn't mean the API must do the same at all levels.

BSD sockets transmit in blocks. If you need to find a special sequence in a socket stream, you are forced to fetch a chunk, and manually search for a needed sequence. My position is that you should do it with an external predicate (e.g. read until whitespace).
 I don't think streams should buffer anything either (what an underlying
 OS I/O API caches should suffice), buffered streams adapters can do that
 in a stream-independent way (why duplicate code when you can do that as
 efficiently with external methods?).

Most OS primitives don't give access to their own internal buffers. Instead, they ask user code to provide a buffer and transfer data into it.

Right. This is why Stream may not cache.

Simple streams should not cache, but there must be a BufferedStream wrapping simple streams. When you read from a non-buffered SocketStream each read() (like readInt()) is a syscall - that's really expensive. In my project I got a speedup of about factor 4-5 by replacing std.Streams SocketStream with a custom BufferedSocketStream. I have to do further testing, but I think that shifted the bottleneck from socket-I/O to something else, so in other cases the speedup may be even bigger.
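
A minimal sketch of such a buffering wrapper, assuming the `size_t read(ubyte[] buffer)` primitive discussed in this thread. `CountingSource` is a hypothetical stand-in for a socket that counts how often the underlying read is invoked, and `BufferedInput` is an illustrative name, not an existing Phobos class.

```d
interface InputStream { size_t read(ubyte[] buffer); }

// Stand-in for an expensive source: every read() models one syscall.
class CountingSource : InputStream
{
    private ubyte[] data;
    size_t calls;

    this(ubyte[] data) { this.data = data; }

    size_t read(ubyte[] buffer)
    {
        ++calls;
        immutable n = buffer.length < data.length ? buffer.length : data.length;
        buffer[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Wrapper that fetches whole blocks and serves small reads from memory.
class BufferedInput : InputStream
{
    private InputStream source;
    private ubyte[] block;   // backing storage, refilled as a whole
    private ubyte[] pending; // unread slice of block

    this(InputStream source, size_t blockSize = 4096)
    {
        this.source = source;
        block = new ubyte[blockSize];
    }

    size_t read(ubyte[] buffer)
    {
        // Touch the underlying source only when the block is used up.
        if (pending.length == 0)
            pending = block[0 .. source.read(block)];
        immutable n = buffer.length < pending.length ? buffer.length : pending.length;
        buffer[0 .. n] = pending[0 .. n];
        pending = pending[n .. $];
        return n;
    }
}
```

Reading 100 bytes one at a time through a 64-byte `BufferedInput` reaches the source three times (64 bytes, 36 bytes, then end of input) instead of once per byte.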
 So clearly buffering on the client side is a must.

I don't see how that is implied by the above.
 Besides, as you noted, the buffering is redundant for byChunk/byLine
 adapter ranges. It means that byChunk/byLine should operate on
 unbuffered streams.

Chunks keep their own buffer so indeed they could operate on streams that don't do additional buffering. The story with lines is a fair amount more complicated if it needs to be done efficiently.

Yes. But line-reading is a case that I don't see a need to be handled specially.
 I'll explain my I/O streams implementation below in case you didn't read
 my message (I've changed some stuff a little since then).

Honest, I opened it to remember to read it but somehow your fonts are small and make my eyes hurt.
 My Stream
 interface is very simple:

 // A generic stream
 interface Stream
 {
 @property InputStream input();
 @property OutputStream output();
 @property SeekableStream seekable();
 @property bool endOfStream();
 void close();
 }

 You may ask, why separate Input and Output streams?

I think my first question is: why doesn't Stream inherit InputStream and OutputStream? My hypothesis: you want to sometimes return null. Nice.

Right.
 Well, that's because
 you either read from them, write to them, or both.
 Some streams are read-only (think Stdin), some write-only (Stdout), some
 support both, like FileStream. Right?

Sounds good. But then where's flush()? Must be in OutputStream.

That's probably because unbuffered streams don't need them.

You may need to tell the OS to flush its buffer (fsync()).
 
 I'm surprised there's no flush().

No buffering - no flush.

see above Cheers, - Daniel
Oct 13 2010
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/13/10 14:02 CDT, Denis Koroskin wrote:
 On Wed, 13 Oct 2010 20:55:04 +0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 http://www.gnu.org/s/libc/manual/html_node/Buffering-Concepts.html

 I don't think streams must mimic the low-level OS I/O interface.

I in contrast think that Streams should be a lowest-level possible platform-independent abstraction. No buffering besides what an OS provides, no additional functionality. If you need to be able to read something up to some character (besides, what should be considered a new-line separator: \r, \n, \r\n?), this should be done manually in "byLine".

This aggravates client code for the sake of simplicity in a library that was supposed to make streaming easy. I'm not seeing progress.
 That's because
 most of the streams are binary streams, and there is no such thing as a
 "line" in them (e.g. how often do you need to read a line from a
 SocketStream?).

http://www.opengroup.org/onlinepubs/009695399/functions/isatty.html

These are special cases I don't like. There is no such thing in Windows anyway.

I didn't say I like them. Windows has _isatty: http://msdn.microsoft.com/en-us/library/f4s0ddew(v=VS.80).aspx
 You need a line when e.g. you parse an HTML header or an email header or
 an FTP response. Again, if at a low level the transfer occurs in
 blocks, that doesn't mean the API must do the same at all levels.

BSD sockets transmit in blocks. If you need to find a special sequence in a socket stream, you are forced to fetch a chunk, and manually search for a needed sequence. My position is that you should do it with an external predicate (e.g. read until whitespace).

Problem is how you set up interfaces to avoid inefficiencies and contortions in the client.
 I don't think streams should buffer anything either (what an underlying
 OS I/O API caches should suffice), buffered streams adapters can do that
 in a stream-independent way (why duplicate code when you can do that as
 efficiently with external methods?).

Most OS primitives don't give access to their own internal buffers. Instead, they ask user code to provide a buffer and transfer data into it.

Right. This is why Stream may not cache.

This is a big misunderstanding. If the interface is: size_t read(byte[] buffer); then *I*, the client, need to provide the buffer. It's in client space. This means willing or not I need to do buffering, regardless of whatever internal buffering is going on under the wraps.
 So clearly buffering on the client side is a must.

I don't see how that is implied by the above.

Please implement an abstraction that given this:

 interface InputStream
 {
     size_t read(ubyte[] buf);
 }

defines a line reader.

Andrei
Oct 13 2010
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/13/10 16:05 CDT, Denis Koroskin wrote:
 On Thu, 14 Oct 2010 00:19:45 +0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 10/13/10 14:02 CDT, Denis Koroskin wrote:
 On Wed, 13 Oct 2010 20:55:04 +0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 http://www.gnu.org/s/libc/manual/html_node/Buffering-Concepts.html

 I don't think streams must mimic the low-level OS I/O interface.

I in contrast think that Streams should be a lowest-level possible platform-independent abstraction. No buffering besides what an OS provides, no additional functionality. If you need to be able to read something up to some character (besides, what should be considered a new-line separator: \r, \n, \r\n?), this should be done manually in "byLine".

This aggravates client code for the sake of simplicity in a library that was supposed to make streaming easy. I'm not seeing progress.

This library code needs to be put somewhere. I just believe it belongs to line-reader, not a generic stream. By putting line reading into a stream interface, you want make it more efficient.
 That's because
 most of the streams are binary streams, and there is no such thing as a
 "line" in them (e.g. how often do you need to read a line from a
 SocketStream?).

http://www.opengroup.org/onlinepubs/009695399/functions/isatty.html

These are special cases I don't like. There is no such thing in Windows anyway.

I didn't say I like them. Windows has _isatty: http://msdn.microsoft.com/en-us/library/f4s0ddew(v=VS.80).aspx

I stand corrected. Windows pretends to be Posix compliant, yes, but that's a sad story to tell. I don't see why would
 You need a line when e.g. you parse an HTML header or an email header or
 an FTP response. Again, if at a low level the transfer occurs in
 blocks, that doesn't mean the API must do the same at all levels.

BSD sockets transmit in blocks. If you need to find a special sequence in a socket stream, you are forced to fetch a chunk, and manually search for a needed sequence. My position is that you should do it with an external predicate (e.g. read until whitespace).

Problem is how you set up interfaces to avoid inefficiencies and contortions in the client.
 I don't think streams should buffer anything either (what an
 underlying
 OS I/O API caches should suffice), buffered streams adapters can do
 that
 in a stream-independent way (why duplicate code when you can do
 that as
 efficiently with external methods?).

Most OS primitives don't give access to their own internal buffers. Instead, they ask user code to provide a buffer and transfer data into it.

Right. This is why Stream may not cache.

This is a big misunderstanding. If the interface is: size_t read(byte[] buffer); then *I*, the client, need to provide the buffer. It's in client space. This means willing or not I need to do buffering, regardless of whatever internal buffering is going on under the wraps.

Use BufferedStream adapter if you need buffering, and raw streams if you do the buffering manually. That's the way it's implemented in C#, Java, Tango and many many other APIs.
 So clearly buffering on the client side is a must.

I don't see how that is implied by the above.

Please implement an abstraction that given this: interface InputStream { size_t read(ubyte[] buf); } defines a line reader.

I thought we agreed that byLine/byChunk need to do buffering manually anyway.

 class ByLine
 {
     ubyte[] nextLine()
     {
         ubyte[BUFFER_SIZE] buffer;
         while (!inputStream.endOfStream()) {
             size_t bytesRead = inputStream.read(buffer);
             foreach (i, ubyte c; buffer[0..bytesRead]) {
                 if (c != '\n') {
                     continue;
                 }

                 appender.put(buffer[0..i]);
                 ubyte[] line = appender.data.dup();
                 appender.reset();
                 appender.put(buffer[i+1..$]);
                 return line;
             }

             appender.put(buffer[0..bytesRead]);
         }

         ubyte[] line = appender.data.dup();
         appender.reset();
         return line;
     }

     InputStream inputStream;
     Appender!(ubyte[]) appender;
 }

(I've skipped the range interface for the sake of simplicity and replaced it with a nextLine() function. I also don't remember the proper appender interface, so I've used imaginary function names.)

Once again, what's the point of byLine, if all it does is call stream.readLine()? That's moving code from one place to many unrelated ones. I don't agree with that. I'm not convinced we need a line-based API at the core stream level. I don't think we need to sacrifice performance in the general case in order to avoid a performance hit in a special case. Who even told you it will be any less efficient that way?

The code above. Andrei
Oct 13 2010
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/13/10 16:05 CDT, Denis Koroskin wrote:
 On Thu, 14 Oct 2010 00:19:45 +0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

To substantiate my brief answer:
 This library code needs to be put somewhere. I just believe it belongs
 to line-reader, not a generic stream. By putting line reading into a
 stream interface, you want make it more efficient.

I assume you meant "won't" instead of "want". So here you're saying that line-oriented I/O does not belong in the interface because it won't make things more efficient. But then your line reading code is extremely efficient by using the interface you yourself embraced. Andrei P.S. I think I figured the issue with your fonts: the header Content-type contains "charset=KOI8-R". That charset propagates through all responses. Does anyone know how I can ignore it?
Oct 13 2010
next sibling parent =?UTF-8?B?IkrDqXLDtG1lIE0uIEJlcmdlciI=?= <jeberger free.fr> writes:

Andrei Alexandrescu wrote:
 P.S. I think I figured the issue with your fonts: the header
 Content-type contains "charset=KOI8-R". That charset propagates through
 all responses. Does anyone know how I can ignore it?

In Thunderbird 2, Edit->Preferences->Display->Fonts&Encodings->Use the default character encoding in replies. I assume something similar is available in Thunderbird 3. Jerome -- mailto:jeberger free.fr http://jeberger.free.fr Jabber: jeberger jabber.fr
Oct 13 2010
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/13/10 17:01 CDT, Andrei Alexandrescu wrote:
 On 10/13/10 16:05 CDT, Denis Koroskin wrote:
 On Thu, 14 Oct 2010 00:19:45 +0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

To substantiate my brief answer:
 This library code needs to be put somewhere. I just believe it belongs
 to line-reader, not a generic stream. By putting line reading into a
 stream interface, you want make it more efficient.

I assume you meant "won't" instead of "want". So here you're saying that line-oriented I/O does not belong in the interface because it won't make things more efficient. But then your line reading code is extremely efficient by using the interface you yourself embraced.

I meant "inefficient"...
 P.S. I think I figured the issue with your fonts: the header
 Content-type contains "charset=KOI8-R". That charset propagates through
 all responses. Does anyone know how I can ignore it?

Thanks Jerome. Found the option, played with it, to no avail. Still trying... Andrei
Oct 13 2010
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/13/10 17:23 CDT, Denis Koroskin wrote:
 On Thu, 14 Oct 2010 02:01:24 +0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 10/13/10 16:05 CDT, Denis Koroskin wrote:
 On Thu, 14 Oct 2010 00:19:45 +0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

To substantiate my brief answer:
 This library code needs to be put somewhere. I just believe it belongs
 to line-reader, not a generic stream. By putting line reading into a
 stream interface, you want make it more efficient.

I assume you meant "won't" instead of "want". So here you're saying that line-oriented I/O does not belong in the interface because it won't make things more efficient. But then your line reading code is extremely efficient by using the interface you yourself embraced.

By adding readln() to the stream interface you will only move that code from ByLine to the Stream implementation. The code would still be the same. How can you make it any more efficient? I've read the fgets() source code that comes with the Microsoft CRT, and it does exactly what I did (i.e. fill a buffer, read byte-by-byte, copy to the output string). It also puts restrictions on line size while I didn't (that's why I had to use an Appender). I also did a line copy (dup) so that I could return an immutable string. You see, it's not the Stream interface that makes that code less efficient, it's the additional functionality it provides over the C API.

Gnu offers two specialized routines: http://www.gnu.org/s/libc/manual/html_node/Line-Input.html. It is many times more efficient than anything that can be done in client code using the stdio API. I'm thinking along those lines.
 Andrei

 P.S. I think I figured the issue with your fonts: the header
 Content-type contains "charset=KOI8-R". That charset propagates
 through all responses. Does anyone know how I can ignore it?

I've changed that to utf-8. Did it help?

Yes, looking great. Thanks! Andrei
Oct 13 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/13/2010 06:23 PM, Denis Koroskin wrote:
 On Thu, 14 Oct 2010 03:06:30 +0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Gnu offers two specialized routines:
 http://www.gnu.org/s/libc/manual/html_node/Line-Input.html. It is many
 times more efficient than anything that can be done in client code
 using the stdio API. I'm thinking along those lines.

I can easily implement similar interface on top of chunked read: ubyte[] readLine(ubyte[] lineBuffer); or bool readLine(ref ubyte[] lineBuffer);

You can't.
 I've quickly looked through an implementation, too, and it's still
 filling a buffer first, and then copying character byte-by-byte to the
 output string (making realloc when needed) until a delimiter is found.
 It is exactly as efficient as implemented externally.

Except you don't have an interface to copy byte by byte. Oops...
 It does the same
 amount of copying and memory allocations. "Many times more efficient" is
 just an overestimation.

It's not. I measured because it was important in an application I was working on. It's shocking how some seemingly minor changes can make a big difference in throughput.
 BTW, did you see my message about std.concurrency?

Yes, but I'll need to leave the bulk of it to Sean. Thanks. Andrei
Oct 13 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/13/10 21:20 CDT, Denis Koroskin wrote:
 On Thu, 14 Oct 2010 03:47:12 +0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 10/13/2010 06:23 PM, Denis Koroskin wrote:
 On Thu, 14 Oct 2010 03:06:30 +0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Gnu offers two specialized routines:
 http://www.gnu.org/s/libc/manual/html_node/Line-Input.html. It is many
 times more efficient than anything that can be done in client code
 using the stdio API. I'm thinking along those lines.

I can easily implement similar interface on top of chunked read: ubyte[] readLine(ubyte[] lineBuffer); or bool readLine(ref ubyte[] lineBuffer);

You can't.
 I've quickly looked through an implementation, too, and it's still
 filling a buffer first, and then copying character byte-by-byte to the
 output string (making realloc when needed) until a delimiter is found.
 It is exactly as efficient as implemented externally.

Except you don't have an interface to copy byte by byte. Oops...
 It does the same
 amount of copying and memory allocations. "Many times more efficient" is
 just an overestimation.

It's not. I measured because it was important in an application I was working on. It's shocking how some seemingly minor changes can make a big difference in throughput.
 BTW, did you see my message about std.concurrency?

Yes, but I'll need to leave the bulk of it to Sean. Thanks. Andrei

Okay. Now give me your best and tell me mine is slower (sorry for a lack of comments):

If you're satisfied with this, then my point has been lost in the midstream. I was saying it's impossible to implement a line reader on top of a read(ubyte[]) interface without extra buffering and copying. You provided a careful implementation that at the end of the day inevitably does the extra buffering and copying. Andrei
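To make the buffering point concrete, here is a minimal sketch of a line reader layered on nothing but read(ubyte[]). It is not the code from the elided post; the names ChunkSource, MemorySource and LineReader are hypothetical. Because the transport hands back bytes past the newline, the reader has to park them in its own pending buffer (first copy) and then copy the line out of it (second copy).

```d
import std.algorithm.searching : countUntil;
import std.string : representation;

/// Hypothetical chunk-only transport: copies up to buffer.length bytes
/// into `buffer` and returns how many were written (0 at end of stream).
interface ChunkSource
{
    size_t read(ubyte[] buffer);
}

/// Hypothetical in-memory source delivering at most `burst` bytes per call.
final class MemorySource : ChunkSource
{
    private const(ubyte)[] data;
    private size_t burst;

    this(const(ubyte)[] data, size_t burst)
    {
        this.data = data;
        this.burst = burst;
    }

    size_t read(ubyte[] buffer)
    {
        size_t n = buffer.length;
        if (n > burst) n = burst;
        if (n > data.length) n = data.length;
        buffer[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

/// Line reader built on read(ubyte[]) alone.
struct LineReader
{
    ChunkSource source;
    ubyte[] pending;   // bytes already read but not yet returned
    bool exhausted;    // set once the source reported end of stream

    /// Stores the next line (without '\n') in `line`; returns false when done.
    bool nextLine(out ubyte[] line)
    {
        for (;;)
        {
            auto idx = pending.countUntil(cast(ubyte) '\n');
            if (idx >= 0)
            {
                line = pending[0 .. idx].dup;    // copy out of the extra buffer
                pending = pending[idx + 1 .. $];
                return true;
            }
            if (exhausted)
            {
                if (pending.length == 0) return false;
                line = pending.dup;              // last line, no trailing '\n'
                pending = null;
                return true;
            }
            ubyte[64] chunk;
            auto n = source.read(chunk[]);
            if (n == 0) exhausted = true;
            else pending ~= chunk[0 .. n];       // copy into the extra buffer
        }
    }
}

unittest
{
    auto reader = LineReader(new MemorySource("ab\ncd\nef".representation, 3));
    ubyte[] line;
    assert(reader.nextLine(line) && line == "ab".representation);
    assert(reader.nextLine(line) && line == "cd".representation);
    assert(reader.nextLine(line) && line == "ef".representation);
    assert(!reader.nextLine(line));
}
```

Whether those two copies matter in practice is exactly what the thread disagrees about; the sketch only shows where they come from.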
Oct 13 2010
prev sibling next sibling parent reply Daniel Gibson <metalcaedes gmail.com> writes:
Andrei Alexandrescu schrieb:
 On 10/11/2010 07:49 PM, Daniel Gibson wrote:
 Andrei Alexandrescu schrieb:
 Agreed. Maybe this is a good time to start making a requirements list
 for streams. What are the essential features/feature groups?

 Andrei

Maybe something like the following (I hope it's not too extensive):
* Input- Output- and InputAndOutput- Streams
  - having InputStream and OutputStream as an interface like in the old design may be a good idea
  - implementing the standard operations that are mostly independent from the data source/sink like read/write for basic types, strings, ... in mixin templates is probably elegant to create streams that are both Input and Output (one mixin that implements most of InputStream and one that implements most of OutputStream)

So far so good. I will point out, however, that the classic read/write routines are not all that good. For example if you want to implement a line-buffered stream on top of a block-buffered stream you'll be forced to write inefficient code.

So what's a possible alternative to the classic read/write routines?
 
 Also, a requirement that I think is essential is separation between 
 formatting and transport. std.stream does not have that. At the top 
 level there are two types of transport: text and binary. On top of that 
 lie various formatters.
 

Ok, one should differentiate between text and binary streams. I was mostly focused on binary streams (because that's what I use). So there might be a hierarchy like
* Input/Output-Stream (interface for all those read/write operations)
  - BinaryStream // abstract class implementing writeInt() etc using write(void* buf, size_t len)
    * BinarySocketStream
    * BinaryFileStream
    * ...
  - TextStream // abstract class implementing writeInt() etc using something like to!string and write(char[])
    * TextFileStream
    * ...
(This for both Input- and Outputstreams)
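A minimal sketch of how such a hierarchy could look in D, for the output side only. All names here (OutputStream, BinaryOutputStream, TextOutputStream, MemoryStorage and the two Memory* classes) are hypothetical, and write takes const(ubyte)[] instead of the void*/size_t pair mentioned above, which is an assumption made to keep the example memory-safe. The mixin template follows the earlier suggestion of sharing transport code between streams.

```d
import std.conv : to;
import std.string : representation;

/// Transport primitive plus one formatting operation built on top of it.
interface OutputStream
{
    void write(const(ubyte)[] data);
    void writeInt(int value);
}

/// Binary flavour: writeInt emits the raw in-memory bytes (native byte order).
abstract class BinaryOutputStream : OutputStream
{
    abstract void write(const(ubyte)[] data);

    final void writeInt(int value)
    {
        write((cast(const(ubyte)*) &value)[0 .. int.sizeof]);
    }
}

/// Text flavour: writeInt emits the decimal representation.
abstract class TextOutputStream : OutputStream
{
    abstract void write(const(ubyte)[] data);

    final void writeInt(int value)
    {
        write(cast(const(ubyte)[]) value.to!string);
    }
}

/// Transport shared by both flavours: append everything to an array.
mixin template MemoryStorage()
{
    ubyte[] data;

    override void write(const(ubyte)[] bytes)
    {
        data ~= bytes;
    }
}

final class MemoryBinaryStream : BinaryOutputStream { mixin MemoryStorage; }
final class MemoryTextStream : TextOutputStream { mixin MemoryStorage; }

unittest
{
    auto text = new MemoryTextStream;
    text.writeInt(-42);
    assert(text.data == "-42".representation);

    auto bin = new MemoryBinaryStream;
    bin.writeInt(-1);   // all bits set, so byte order does not matter here
    assert(bin.data == cast(ubyte[]) [0xFF, 0xFF, 0xFF, 0xFF]);
}
```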
 * Two kinds of streams:
 1. basic streams: reading/writing from/to:
 * network (socket)
 * files
 * just memory (currently MemoryStream)
 * Arrays/Ranges?
 * ...
 2. streams wrapping other streams:
 * for buffering - buffer input/output/both
 - with the possibility to peek?
 * to modify data when it's read/written (e.g. change endianness -
 important for networking!)
 * custom streams.. e.g. could parse/create CSV (comma separated values)
 data or similar

Would these streams be different in their interface?

No. I just wanted to point out that it must be possible (and should be easy) to wrap streams.
 
 * Also there are different types of streams: seekable, resettable (a
 network stream is neither), ...

Agreed. Question: is there a file system that offers resettable but not seekable files? I'm thinking of collapsing the two together.

As mentioned before in other branches of this thread: Probably no file system, but maybe archive files (zip, ...)
 
 * functionality/methods needed/desirable:
 - low level access
 * void read(void *buf, size_t len) // read *exactly* len bytes into buf
 * void write(void *buf, size_t len) // write *exactly* len bytes from
 buf to stream
 - convenient methods to read/write basic types in binary (!) from/to 
 stream

Again, binary vs. text is a capability of the stream. For example, a tty can never transport binary data - programs like gzip refuse to write binary data to a terminal. (Then of course a binary stream can always accommodate text data.)
 * <type> read<Type>() (like int readInt()) or T read(T)() (like int
 read!int())

Templates will be difficult for a class hierarchy.

Ok. Another issue, as you mentioned line based streams: of course read(T)()/write(T)() would be quite messy on them (endless static if(is(T==int)) { ... } else static if(is(T==float)) {...} etc). (here were a lot of templated methods)
 * void writeString(char[] str) // same for wchar and dchar
 - could write str into the stream with its length (as ushort xor uint
 xor ulong,
 _not_ size_t!) prepended
 * char[] readString() // same for wchar and dchar
 - read length of the string and then the string itself that will be
 returned
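The reason for a fixed-width prefix rather than size_t is that size_t differs between 32-bit and 64-bit builds, so the two ends of a connection could disagree on the wire format. A minimal sketch, assuming a 4-byte big-endian uint prefix and UTF-8 payload; encodeString and decodeString are hypothetical helper names, not part of any proposed API:

```d
import std.bitmanip : bigEndianToNative, nativeToBigEndian;
import std.exception : enforce;

/// Encodes `str` as a 4-byte big-endian length followed by its UTF-8 bytes.
ubyte[] encodeString(const(char)[] str)
{
    enforce(str.length <= uint.max, "string too long for a uint prefix");
    ubyte[4] prefix = nativeToBigEndian(cast(uint) str.length);
    auto result = new ubyte[4 + str.length];
    result[0 .. 4] = prefix[];
    result[4 .. $] = cast(const(ubyte)[]) str;
    return result;
}

/// Decodes one string from the front of `input` and advances `input` past it.
char[] decodeString(ref const(ubyte)[] input)
{
    enforce(input.length >= 4, "truncated length prefix");
    ubyte[4] prefix = input[0 .. 4];
    immutable size_t len = bigEndianToNative!uint(prefix);
    enforce(input.length - 4 >= len, "truncated string body");
    auto result = cast(char[]) input[4 .. 4 + len].dup;
    input = input[4 + len .. $];
    return result;
}

unittest
{
    const(ubyte)[] wire = encodeString("stream") ~ encodeString("D");
    assert(decodeString(wire) == "stream");
    assert(decodeString(wire) == "D");
    assert(wire.length == 0);
}
```

The byte order of the prefix is a free choice; big-endian is used here only because it is the conventional network order.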

Many of these capabilities involve template methods. Is a template-based approach preferable to a straight class hierarchy? I tend to think that in the case of streams, classic hierarchies are most adequate.

Ok agreed. I forgot that templated methods can't be overridden. As writeArray(T)() etc is hardly possible, at least for strings there should be write(char[]) write(dchar[]) and write(wchar[]) (maybe with some const-stuff added). These should just write the string to the stream, without its length. (Analogous for read(...))
 
 - all that array/string/low level stuff but reading *at most* len (or
 array.length) values
 and returning the amount actually read ( readUpTo() ?)
 * useful e.g. for parsing http (you don't know how long the header is 
 etc)
 * the same for write? don't see much use for that though..

 - some way to determine whether the stream
 * is at its definite end (eof on file, socket closed or something like
 that)
 * currently empty (for input stream) - just doing a read() would block ?

 - Output streams need flush()
 - for Input streams skip(size_t noBytes) or even skip(T)(size_t noElems)
 may be
 handy to just throw away data we're not interested in without having it
 copied around - especially for non-seekable streams (network..)

OK, that's a good start. Let's toss this back and forth a few times and see what sticks. Andrei

Cheers, - Daniel
Oct 13 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/13/10 13:45 CDT, Daniel Gibson wrote:
 Andrei Alexandrescu schrieb:
 On 10/11/2010 07:49 PM, Daniel Gibson wrote:
 Andrei Alexandrescu schrieb:
 Agreed. Maybe this is a good time to start making a requirements list
 for streams. What are the essential features/feature groups?

 Andrei

Maybe something like the following (I hope it's not too extensive):
* Input- Output- and InputAndOutput- Streams
  - having InputStream and OutputStream as an interface like in the old design may be a good idea
  - implementing the standard operations that are mostly independent from the data source/sink like read/write for basic types, strings, ... in mixin templates is probably elegant to create streams that are both Input and Output (one mixin that implements most of InputStream and one that implements most of OutputStream)

So far so good. I will point out, however, that the classic read/write routines are not all that good. For example if you want to implement a line-buffered stream on top of a block-buffered stream you'll be forced to write inefficient code.

So what's a possible alternative to the classic read/write routines?

It's something that ten people implement in eleven ways. Some starters:
1. Get one byte at a time. This may be inefficient.
2. Define this: size_t append(ref ubyte[] buffer); Such an interface allows the stream to append data to a user-maintained buffer. Then the user manages terminators etc.
3. Define this: size_t getDelim(ref ubyte[] buffer, in ubyte[] terminator);
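Options 2 and 3 could look roughly like this on an in-memory stream. This is only a sketch under assumptions: MemoryStream and its burst field are hypothetical, and the post does not say whether getDelim includes the terminator in the output or what it does at end of stream; here it includes the terminator and falls back to delivering the rest of the stream.

```d
import std.algorithm.searching : countUntil;
import std.string : representation;

/// Hypothetical in-memory stream exposing the two proposed primitives.
struct MemoryStream
{
    const(ubyte)[] data;   // bytes not yet consumed
    size_t burst = 4;      // at most this many bytes per append() call

    /// Appends whatever is available to the caller's buffer.
    /// Returns the number of bytes added (0 at end of stream).
    size_t append(ref ubyte[] buffer)
    {
        auto n = data.length < burst ? data.length : burst;
        buffer ~= data[0 .. n];
        data = data[n .. $];
        return n;
    }

    /// Appends bytes up to and including `terminator`, or the rest of the
    /// stream if the terminator never shows up. Returns bytes added.
    size_t getDelim(ref ubyte[] buffer, in ubyte[] terminator)
    {
        auto idx = data.countUntil(terminator);
        auto n = idx < 0 ? data.length : cast(size_t) idx + terminator.length;
        buffer ~= data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

unittest
{
    auto s = MemoryStream("one\r\ntwo".representation);
    ubyte[] buf;
    assert(s.getDelim(buf, "\r\n".representation) == 5);
    assert(buf == "one\r\n".representation);
}
```

With either primitive the stream writes straight into the caller's buffer, so a line reader built on top no longer needs a private holding area for bytes that arrived past the terminator.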
 No. I just wanted to point out that it must be possible (and should be
 easy) to wrap streams.

And efficient too! That means no buffering friction :o).
 Agreed. Question: is there a file system that offers resettable but
 not seekable files? I'm thinking of collapsing the two together.

As mentioned before in other branches of this thread: Probably no file system, but maybe archive files (zip, ...)

I wonder to what extent resetting is really closing and reopening the file/connection. Andrei
Oct 13 2010
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/13/10 16:32 CDT, Steven Schveighoffer wrote:
[snip]

All good points.

 interface InputStream
 {
 // reads up to buffer.length bytes from a stream
 // returns number of bytes read
 // throws on error
 size_t read(ubyte[] buffer);

 // reads from current position
 AsyncReadRequest readAsync(ubyte[] buffer, Mailbox* mailbox = null);
 }

I'd say void[] is better here, since you aren't creating the buffer, you're accepting it. Using ubyte makes for awkward casts when you are reading binary data into specific structures. ditto for OutputStream.

Well casting from void[] is equally awkward isn't it? I'm still undecided on which is better. Andrei
Oct 13 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei:

 Well casting from void[] is equally awkward isn't it? I'm still 
 undecided on which is better.

See also: http://d.puremagic.com/issues/show_bug.cgi?id=4572 Bye, bearophile
Oct 13 2010
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/14/10 8:33 CDT, Denis Koroskin wrote:
 On Thu, 14 Oct 2010 17:24:34 +0400, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:

 On Wed, 13 Oct 2010 18:21:16 -0400, bearophile
 <bearophileHUGS lycos.com> wrote:

 Andrei:

 Well casting from void[] is equally awkward isn't it? I'm still
 undecided on which is better.

See also: http://d.puremagic.com/issues/show_bug.cgi?id=4572 Bye, bearophile

That issue is slightly different because std.file.read actually creates the buffer. In this case, the buffer is not created, dup'd, concatenated, etc. so void[] offers the most flexibility. -Steve

That is also the least safe:

    Object[] objects;
    stream.read(objects); // most likely will fill with garbage
    writeln(objects[0]); // access violation

It's a type subversion that doesn't require casts.

Excellent point. Andrei
Oct 14 2010
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/14/10 11:01 CDT, Steven Schveighoffer wrote:
 On Thu, 14 Oct 2010 09:33:54 -0400, Denis Koroskin <2korden gmail.com>
 wrote:

 On Thu, 14 Oct 2010 17:24:34 +0400, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:

 On Wed, 13 Oct 2010 18:21:16 -0400, bearophile
 <bearophileHUGS lycos.com> wrote:

 Andrei:

 Well casting from void[] is equally awkward isn't it? I'm still
 undecided on which is better.

See also: http://d.puremagic.com/issues/show_bug.cgi?id=4572 Bye, bearophile

That issue is slightly different because std.file.read actually creates the buffer. In this case, the buffer is not created, dup'd, concatenated, etc. so void[] offers the most flexibility. -Steve

That is also the least safe:

    Object[] objects;
    stream.read(objects); // most likely will fill with garbage
    writeln(objects[0]); // access violation

It's a type subversion that doesn't require casts.

Yes, and this is a problem. But on the flip side, requiring casts for non-ubyte value types may be too restrictive. Do we want to require casts when the array being filled is for example utf-8? If so, then won't that disallow such a function in safe D?

I think a solid idea would be to template streaming interfaces on any type T that has no indirections. Andrei
Oct 14 2010
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/14/10 13:21 CDT, Steven Schveighoffer wrote:
 On Thu, 14 Oct 2010 13:42:53 -0400, Denis Koroskin <2korden gmail.com>
 wrote:
 Besides, typed stream needs to be built on top of buffered stream (not
 an unbuffered one). E.g. T might be a big struct, and Stream doesn't
 provide a guarantee that it can read exactly N bytes (think about
 socket stream).

Hm... you are right, you'd have to buffer at least one T in order for this to work. That kind of sucks. This rules out wchar and char. Maybe it's better to allow all T's where T.sizeof is 1 (this rules out pointer types anyways). Andrei?

I was thinking of throwing if the size of data read is not divisible by the item size. This makes it unlikely to read large data from a burst device like a socket, but that's expected. Andrei
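The two ideas from this subthread, restricting T to types with no indirections and throwing on a partial item, can be sketched together. The names ByteSource, ArraySource and readItems are hypothetical; std.traits.hasIndirections is the existing Phobos trait that rejects element types such as Object or pointers at compile time.

```d
import std.exception : assertThrown, enforce;
import std.traits : hasIndirections;

/// Hypothetical byte-level transport.
interface ByteSource
{
    size_t read(ubyte[] buffer);
}

/// Hypothetical in-memory transport used for the example.
final class ArraySource : ByteSource
{
    private const(ubyte)[] data;

    this(const(ubyte)[] data) { this.data = data; }

    size_t read(ubyte[] buffer)
    {
        auto n = buffer.length < data.length ? buffer.length : data.length;
        buffer[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

/// Typed read, accepted only for element types without pointers or references.
/// Returns the number of whole items read; throws on a partial item.
size_t readItems(T)(ByteSource source, T[] items)
    if (!hasIndirections!T)
{
    auto raw = cast(ubyte[]) items;   // reinterpret the items as plain bytes
    auto n = source.read(raw);
    enforce(n % T.sizeof == 0, "partial item read");
    return n / T.sizeof;
}

unittest
{
    // Reference types are rejected at compile time, value types are accepted.
    static assert(!__traits(compiles, readItems(cast(ByteSource) null, (Object[]).init)));
    static assert(__traits(compiles, readItems(cast(ByteSource) null, (int[]).init)));

    auto bytes = new ubyte[8];
    bytes[] = 0xFF;
    uint[2] items;
    assert(readItems(new ArraySource(bytes), items[]) == 2);
    assert(items[0] == uint.max && items[1] == uint.max);

    // Six bytes do not make up a whole number of uints.
    assertThrown(readItems(new ArraySource(bytes[0 .. 6]), items[]));
}
```

As noted above, a burst device such as a socket can easily return a byte count that is not a multiple of T.sizeof, which is the case the enforce call turns into an exception.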
Oct 14 2010
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/14/10 13:46 CDT, Denis Koroskin wrote:
 On Thu, 14 Oct 2010 22:21:37 +0400, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:
 What's the point of having multiple types that only differ in read
 signature? When would you prefer Stream!(byte) over Stream!(ubyte) or
 Stream!(char)? What's wrong with an adapter that allows you to read any
 kind of data off the stream?

 Why do you need to use a Stream directly for reading an array of e.g.
 ints off the stream? Save your time, don't write duplicated code. Use an
 adapter specially provided for that purpose.

Good point. Perhaps indeed it's best to only deal with bytes and characters at transport level. Andrei
Oct 14 2010
parent reply Rainer Deyke <rainerd eldwood.com> writes:
On 10/14/2010 15:49, Andrei Alexandrescu wrote:
 Good point. Perhaps indeed it's best to only deal with bytes and
 characters at transport level.

Make that just bytes. Character data must be encoded into bytes before it is written and decoded before it is read. The low-level OS functions only deal with bytes, not characters. Text encoding is a complicated process - consider different unicode encodings, different non-unicode encodings, byte order markers, and Windows versus Unix line endings. Furthermore, it is often useful to wedge an additional translation layer between the low-level (binary) stream and the high-level text encoding layer, such as an encryption or compression layer. Writing characters directly to streams made sense in the pre-Unicode world where there was a one-to-one correspondence between characters and bytes. In a modern world, text encoding is an important service that deserves its own standalone module. -- Rainer Deyke - rainerd eldwood.com
Oct 14 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/14/10 21:22 CDT, Rainer Deyke wrote:
 On 10/14/2010 15:49, Andrei Alexandrescu wrote:
 Good point. Perhaps indeed it's best to only deal with bytes and
 characters at transport level.

Make that just bytes. Character data must be encoded into bytes before it is written and decoded before it is read. The low-level OS functions only deal with bytes, not characters.

I'm not so sure about that. For example, some code in std.stdio is dedicated to supporting fwide(): http://www.opengroup.org/onlinepubs/000095399/functions/fwide.html As far as I understand, a wide stream is essentially a UCS-2 (or UTF-16? Not sure) stream that is impossible to abstract away as a stream of bytes. I see Windows' commitment to fwide is... odd: http://msdn.microsoft.com/en-us/library/aa985619%28VS.80%29.aspx The ultimate question is whether we want to support that (as well as other dedicated text streams) or not.
 Text encoding is a complicated process - consider different unicode
 encodings, different non-unicode encodings, byte order markers, and
 Windows versus Unix line endings.  Furthermore, it is often useful to
 wedge an additional translation layer between the low-level (binary)
 stream and the high-level text encoding layer, such as an encryption or
 compression layer.

 Writing characters directly to streams made sense in the pre-Unicode
 world where there was a one-to-one correspondence between characters and
 bytes.  In a modern world, text encoding is an important service that
 deserves its own standalone module.

I'd say quite the opposite. Since now encodings are embedded all the way down at the low level (per fwide above), we can't pretend it's all bytes down there and leave characters to upper layers. There _are_ transports that deal with characters directly. So the $1M question is, do we support text transports or not?
- fwide streams
- files for which isatty() returns true (http://www.opengroup.org/onlinepubs/009695399/functions/isatty.html)
- email protocol and probably other Internet protocols
- others?
If we don't support text at the transport level, things can still be made to work but in a more fragile manner: upper-level protocols will need to _know_ that although the API accepts any ubyte[], in fact the results would be weird and malfunctioning if the wrong things are being passed. A text-based transport would clarify at the type level that a text stream accepts only UTF-encoded characters. I think either way is not a catastrophe. We can make it work. Andrei
Oct 14 2010
parent Rainer Deyke <rainerd eldwood.com> writes:
On 10/14/2010 22:24, Andrei Alexandrescu wrote:
 On 10/14/10 21:22 CDT, Rainer Deyke wrote:
 Character data must be encoded into bytes before it is written and
 decoded before it is read.  The low-level OS functions only deal with
 bytes, not characters.

I'm not so sure about that. For example, some code in std.stdio is dedicated to supporting fwide(): http://www.opengroup.org/onlinepubs/000095399/functions/fwide.html

I don't think that's a low-level OS function. But it is true that I may have overstated my case. Still, the underlying file system and the underlying hardware deal in bytes, not chars, on all platforms that matter. Encoded text /is/ bytes.
 So the $1M question is, do we support text transports or not?

All text is encoded, and encoded text is logically bytes, not chars. This distinction is somewhat confused in D because the native string types in D do specify an encoding. However, it would be a mistake to conflate the internal encoding with the external encoding used by text transports. It's also worth noting that some of these text transports are not 8-bit clean. This means that they cannot transport UTF-8 (without transcoding), which means that they cannot transport D strings.
 - email protocol and probably other Internet protocols

All internet protocols ultimately work over IP, and IP is a binary protocol.
 If we don't support text at the transport level, things can still be made
 to work but in a more fragile manner: upper-level protocols will need to
 _know_ that although the API accepts any ubyte[], in fact the results
 would be weird and malfunctioning if the wrong things are being passed.

The situation for text would be no different from the situation for any other structured binary format.
 A text-based transport would clarify at the type level that a text
 stream accepts only UTF-encoded characters.

You can still have that, as a wrapper around the byte stream. -- Rainer Deyke - rainerd eldwood.com
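A minimal sketch of such a wrapper, assuming UTF-8 as the external encoding. ByteSink, MemorySink and Utf8Writer are hypothetical names; std.utf.validate is the existing Phobos function that throws UTFException on malformed input. The type-level guarantee comes from the wrapper accepting only character arrays, while the transport underneath still sees nothing but bytes.

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

/// Hypothetical byte-only transport.
interface ByteSink
{
    void write(const(ubyte)[] bytes);
}

/// Hypothetical in-memory transport used for the example.
final class MemorySink : ByteSink
{
    ubyte[] data;

    void write(const(ubyte)[] bytes)
    {
        data ~= bytes;
    }
}

/// Text layer: accepts only character data and checks that it is
/// well-formed UTF-8 before handing the bytes to the transport.
struct Utf8Writer
{
    ByteSink sink;

    void put(const(char)[] text)
    {
        validate(text);   // throws UTFException on malformed input
        sink.write(cast(const(ubyte)[]) text);
    }
}

unittest
{
    auto sink = new MemorySink;
    auto writer = Utf8Writer(sink);

    writer.put("héllo");             // 'é' occupies two bytes in UTF-8
    assert(sink.data.length == 6);

    char[] bad = [cast(char) 0xFF];  // 0xFF never appears in valid UTF-8
    assertThrown!UTFException(writer.put(bad));
    assert(sink.data.length == 6);   // nothing reached the transport
}
```

A transcoding layer (UTF-16, a legacy code page, line-ending translation) would sit in the same place, between put and sink.write.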
Oct 14 2010
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Denis Koroskin Wrote:
 
 I prefer ubyte[] because that helps GC (void arrays are scanned for  
 pointers).

To be fair, the only thing that matters here is what the type is when the initial "new" occurs. After that, I think bits are preserved for reallocations so if NO_SCAN is set then it will remain.
Oct 13 2010
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
Denis Koroskin Wrote:

 On Thu, 14 Oct 2010 02:38:11 +0400, Sean Kelly <sean invisibleduck.org>  
 wrote:
 
 Denis Koroskin Wrote:
 I prefer ubyte[] because that helps GC (void arrays are scanned for
 pointers).

To be fair, the only thing that matters here is what the type is when the initial "new" occurs. After that, I think bits are preserved for reallocations so if NO_SCAN is set then it will remain.

It also matters when I dup it. Even if you preallocate void[] with NO_SCAN dup'ing it will reset the flag.

Why would you be duping the buffer for reading or writing? -Steve
Oct 13 2010
prev sibling parent Justin Johansson <no spam.com> writes:
On 14/10/2010 1:32 AM, Andrei Alexandrescu wrote:
 On 10/11/2010 07:49 PM, Daniel Gibson wrote:
 Andrei Alexandrescu schrieb:
 Agreed. Maybe this is a good time to start making a requirements list
 for streams. What are the essential features/feature groups?

 Andrei



This sub-sub-sub-sub-thread should be a top-level discussion topic, especially if you want it to be visible to people who are not interested in the topic "Is D right for me". Justin
Oct 14 2010
prev sibling next sibling parent reply pipe dream <buffer io.org> writes:
Andrei Alexandrescu Wrote:

 On 10/11/2010 12:38 PM, Daniel Gibson wrote:
 But parts of phobos are deprecated or will be deprecated and there still
 is no alternative for them.
 That may prevent people from writing "real" projects in D2 (or D at all)
 - who wants to use classes that will be deprecated soon?
 Sure, that old stuff will not be removed and can still be used, but I
 personally feel a bit uncomfortable with using deprecated code.

Agreed. Maybe this is a good time to start making a requirements list for streams. What are the essential features/feature groups?

You could take a look at Tango's I/O by Kris Bell. It seems to have an efficient and clean design.
Oct 12 2010
parent reply Walter Bright <newshound2 digitalmars.com> writes:
Simen kjaeraas wrote:
 Jonathan M Davis <jmdavisProg gmx.com> wrote:
 So, whatever we put in Phobos, 
 we do it
 without looking at Tango.

You know, we might consider asking them for permission. That way, there should be no problems.

I have, on many occasions.
Oct 12 2010
parent reply Jacob Carlborg <doob me.com> writes:
On 2010-10-13 07:29, Walter Bright wrote:
 Simen kjaeraas wrote:
 Jonathan M Davis <jmdavisProg gmx.com> wrote:
 So, whatever we put in Phobos, we do it
 without looking at Tango.

You know, we might consider asking them for permission. That way, there should be no problems.

I have, on many occasions.

Druntime is basically the Tango runtime so apparently that worked out. -- /Jacob Carlborg
Oct 13 2010
parent reply Walter Bright <newshound2 digitalmars.com> writes:
Jacob Carlborg wrote:
 On 2010-10-13 07:29, Walter Bright wrote:
 Simen kjaeraas wrote:
 Jonathan M Davis <jmdavisProg gmx.com> wrote:
 So, whatever we put in Phobos, we do it
 without looking at Tango.

You know, we might consider asking them for permission. That way, there should be no problems.

I have, on many occasions.

Druntime is basically the Tango runtime so apparently that worked out.

The authors of some parts of Tango have graciously agreed to transfer their code into Phobos. This includes the work done by Sean (druntime) and Don (math).
Oct 13 2010
parent reply Daniel Gibson <metalcaedes gmail.com> writes:
Walter Bright schrieb:
 Jacob Carlborg wrote:
 On 2010-10-13 07:29, Walter Bright wrote:
 Simen kjaeraas wrote:
 Jonathan M Davis <jmdavisProg gmx.com> wrote:
 So, whatever we put in Phobos, we do it
 without looking at Tango.

You know, we might consider asking them for permission. That way, there should be no problems.

I have, on many occasions.

Druntime is basically the Tango runtime so apparently that worked out.

The authors of some parts of Tango have graciously agreed to transfer their code into Phobos. This includes the work done by Sean (druntime) and Don (math).

We should probably just ask the author of the stream code (Kris, according to their API doc) :)
Oct 13 2010
parent Sean Kelly <sean invisibleduck.org> writes:
Daniel Gibson Wrote:

 Walter Bright schrieb:
 Jacob Carlborg wrote:
 On 2010-10-13 07:29, Walter Bright wrote:
 Simen kjaeraas wrote:
 Jonathan M Davis <jmdavisProg gmx.com> wrote:
 So, whatever we put in Phobos, we do it
 without looking at Tango.

You know, we might consider asking them for permission. That way, there should be no problems.

I have, on many occasions.

Druntime is basically the Tango runtime so apparently that worked out.

The authors of some parts of Tango have graciously agreed to transfer their code into Phobos. This includes the work done by Sean (druntime) and Don (math).

We should probably just ask the author of the stream code (Kris, according to their API doc) :)

Please excuse me if I don't hold my breath :-)
Oct 13 2010
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Denis Koroskin wrote:
 Either way, it needs some language support, because currently TLS is
 implemented at the compiler level rather than the library level. I proposed this
 change in the past, but no one responded.

Using TLS needs operating system support, and there's special linker support for it and the compiler has to generate TLS references in certain ways in order for it to work. There is no such support in OSX for TLS, and it is done manually by the library. However, TLS access is 10 times slower than on Linux/Windows. Without doing TLS in the operating system standard way, you tend to get hosed if you link with a shared library/DLL that has TLS.
Oct 12 2010
parent reply Jacob Carlborg <doob me.com> writes:
On 2010-10-13 07:17, Walter Bright wrote:
 Denis Koroskin wrote:
 Either way, it needs some language support, because currently TLS is
 implemented at the compiler level rather than the library level. I proposed
 this change in the past, but no one responded.

Using TLS needs operating system support, and there's special linker support for it and the compiler has to generate TLS references in certain ways in order for it to work. There is no such support in OSX for TLS, and it is done manually by the library. However, TLS access is 10 times slower than on Linux/Windows. Without doing TLS in the operating system standard way, you tend to get hosed if you link with a shared library/DLL that has TLS.

I don't know how you have implemented TLS on Mac OS X but it does support TLS via the Posix API pthreads. This is the only page from Apple's documentation I could find for now (I'm certain I've seen a better page) http://developer.apple.com/macosx/multithreadedprogramming.html . According to these: http://lifecs.likai.org/2010/05/mac-os-x-thread-local-storage.html and http://lists.apple.com/archives/darwin-dev/2005/Sep/msg00005.html the implementation of TLS in the Posix API on Mac OS X should be as fast as the ELF implementation. As the blog post mentions, there is an inline version of pthread_getspecific. I also have to add that I have no idea if the pthreads can be used to implement TLS in the compiler. -- /Jacob Carlborg
Oct 13 2010
parent reply Walter Bright <newshound2 digitalmars.com> writes:
Jacob Carlborg wrote:
 I don't know how you have implemented TLS on Mac OS X but it does 
 support TLS via the Posix API pthreads. This is the only page from 
 Apple's documentation I could find for now (I'm certain I've seen a 
 better page) 
 http://developer.apple.com/macosx/multithreadedprogramming.html .

Yeah, I know about pthreads TLS, but that's wholly inadequate.
 According to these: 
 http://lifecs.likai.org/2010/05/mac-os-x-thread-local-storage.html and 
 http://lists.apple.com/archives/darwin-dev/2005/Sep/msg00005.html the 
 implementation of TLS in the Posix API on Mac OS X should be as fast as 
 the ELF implementation. As the blog post mentions, there is an inline 
 version of pthread_getspecific. I also have to add that I have no idea 
 if the pthreads can be used to implement TLS in the compiler.

With gcc on OSX, try this:

    __thread int x;

It will fail. Furthermore, OSX has no documented way to allocate TLS static data in the object file. I spent considerable effort figuring out a way to do this and get around the numerous bugs in the OSX linker that tried to stop me.

There's good reason for Windows, Linux, FreeBSD, etc. to support the declaration of TLS in the C source code.

BTW, the dates on the second article postdate the D TLS implementation by years. Perhaps Apple has improved things. The third article just points out problems with the TLS.

Anyhow, the source is here:

http://www.dsource.org/projects/druntime/browser/trunk/src/core/thread.d

and the function you're interested in is ___tls_get_addr. You're welcome to make improvements to it.
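For comparison with the one-line `__thread int x;` declaration, here is a minimal C sketch of what library-level TLS through the pthreads API looks like. The names `counter_key`, `thread_counter` and `demo_pthread_tls` are made up for illustration; only the `pthread_*` calls are the real POSIX API. It shows why this route is awkward for a compiler: the storage has to be created and looked up at run time through a key, rather than being declared statically.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* One process-wide key; each thread stores its own pointer under it. */
static pthread_key_t counter_key;
static pthread_once_t counter_once = PTHREAD_ONCE_INIT;

static void make_key(void) {
    /* free() is run on a thread's value when that thread exits. */
    int rc = pthread_key_create(&counter_key, free);
    assert(rc == 0);
    (void)rc;
}

/* Return this thread's private counter, allocating it on first use. */
static int *thread_counter(void) {
    pthread_once(&counter_once, make_key);
    int *p = pthread_getspecific(counter_key);
    if (p == NULL) {
        p = calloc(1, sizeof *p);
        assert(p != NULL);
        pthread_setspecific(counter_key, p);
    }
    return p;
}

/* Thread body: bump the private counter n times, then report its value. */
static void *worker(void *arg) {
    int n = *(int *)arg;
    for (int i = 0; i < n; i++)
        ++*thread_counter();
    *(int *)arg = *thread_counter();
    return NULL;
}

/* Run two threads with different iteration counts.
   Returns 1 if each thread saw only its own increments. */
int demo_pthread_tls(void) {
    int a = 3, b = 5;
    pthread_t ta, tb;
    pthread_create(&ta, NULL, worker, &a);
    pthread_create(&tb, NULL, worker, &b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return a == 3 && b == 5;
}
```

Every access goes through `pthread_getspecific`, so the cost of that call (inline or not) is paid on each read or write of the "variable".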
Oct 13 2010
parent reply Jacob Carlborg <doob me.com> writes:
On 2010-10-13 20:18, Walter Bright wrote:
 Jacob Carlborg wrote:
 I don't know how you have implemented TLS on Mac OS X but it does
 support TLS via the Posix API pthreads. This is the only page from
 Apple's documentation I could find for now (I'm certain I've seen a
 better page)
 http://developer.apple.com/macosx/multithreadedprogramming.html .

Yeah, I know about pthreads TLS, but that's wholly inadequate.
 According to these:
 http://lifecs.likai.org/2010/05/mac-os-x-thread-local-storage.html and
 http://lists.apple.com/archives/darwin-dev/2005/Sep/msg00005.html the
 implementation of TLS in the Posix API on Mac OS X should be as fast
 as the ELF implementation. As the blog post mentions, there is an
 inline version of pthread_getspecific. I also have to add that I have
 no idea if the pthreads can be used to implement TLS in the compiler.

With gcc on OSX, try this: __thread int x; It will fail. Furthermore, OSX has no documented way to allocate TLS static data in the object file. I spent considerable effort figuring out a way to do this and get around the numerous bugs in the OSX linker that tried to stop me.

I just read a bit about how TLS is implemented on Linux. Just out of curiosity, what was the problem: the linker, runtime, loader or all of them? On Linux the static TLS data is put in the object file like any other data. The only difference is that it has a different section/segment name and an additional flag. Then of course the linker, runtime and loader know about these sections and make any necessary initializations when the application loads.
 There's good reason for Windows, Linux, FreeBSD, etc. to support the
 declaration of TLS in the C source code.

 BTW, the dates on the second article postdate the D TLS implementation
 by years. Perhaps Apple has improved things. The third article just
 points out problems with the TLS.

 Anyhow, the source is here:

 http://www.dsource.org/projects/druntime/browser/trunk/src/core/thread.d

 and the function you're interested in is ___tls_get_addr. You're welcome
 to make improvements to it.

That doesn't seem to be a lot of code to optimize, is Thread.getThis() slow? -- /Jacob Carlborg
Oct 14 2010
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Jacob Carlborg <doob me.com> wrote:
 On 2010-10-13 20:18, Walter Bright wrote:
 Jacob Carlborg wrote:
 I don't know how you have implemented TLS on Mac OS X but it does
 support TLS via the Posix API pthreads. This is the only page from
 Apple's documentation I could find for now (I'm certain I've seen a
 better page)
 http://developer.apple.com/macosx/multithreadedprogramming.html .

Yeah, I know about pthreads TLS, but that's wholly inadequate.
 According to these:
 http://lifecs.likai.org/2010/05/mac-os-x-thread-local-storage.html
 and
 http://lists.apple.com/archives/darwin-dev/2005/Sep/msg00005.html
 the
 implementation of TLS in the Posix API on Mac OS X should be as fast
 as the ELF implementation. As the blog post mentions, there is an
 inline version of pthread_getspecific. I also have to add that I
 have
 no idea if the pthreads can be used to implement TLS in the
 compiler.

With gcc on OSX, try this: __thread int x; It will fail. Furthermore, OSX has no documented way to allocate TLS static data in the object file. I spent considerable effort figuring out a way to do this and get around the numerous bugs in the OSX linker that tried to stop me.

I just read a bit about how TLS is implemented on Linux. Just out of curiosity, what was the problem: the linker, runtime, loader or all of them? On Linux the static TLS data is put in the object file like any other data. The only difference is that it has a different section/segment name and an additional flag. Then of course the linker, runtime and loader know about these sections and make any necessary initializations when the application loads.

On OSX the object file format lacks a way to specify a TLS data section, and so the linker would need upgrading as well. And the compiler, since it needs to generate the object files.
Oct 14 2010
parent reply Jacob Carlborg <doob me.com> writes:
On 2010-10-14 17:51, Sean Kelly wrote:
 Jacob Carlborg<doob me.com>  wrote:
 On 2010-10-13 20:18, Walter Bright wrote:
 Jacob Carlborg wrote:
 I don't know how you have implemented TLS on Mac OS X but it does
 support TLS via the Posix API pthreads. This is the only page from
 Apple's documentation I could find for now (I'm certain I've seen a
 better page)
 http://developer.apple.com/macosx/multithreadedprogramming.html .

Yeah, I know about pthreads TLS, but that's wholly inadequate.
 According to these:
 http://lifecs.likai.org/2010/05/mac-os-x-thread-local-storage.html
 and
 http://lists.apple.com/archives/darwin-dev/2005/Sep/msg00005.html
 the
 implementation of TLS in the Posix API on Mac OS X should be as fast
 as the ELF implementation. As the blog post mentions, there is an
 inline version of pthread_getspecific. I also have to add that I
 have
 no idea if the pthreads can be used to implement TLS in the
 compiler.

With gcc on OSX, try this: __thread int x; It will fail. Furthermore, OSX has no documented way to allocate TLS static data in the object file. I spent considerable effort figuring out a way to do this and get around the numerous bugs in the OSX linker that tried to stop me.

I just read a bit about how TLS is implemented on Linux. Just out of curiosity, what was the problem: the linker, runtime, loader or all of them? On Linux the static TLS data is put in the object file like any other data. The only difference is that it has a different section/segment name and an additional flag. Then of course the linker, runtime and loader know about these sections and make any necessary initializations when the application loads.

On OSX the object file format lacks a way to specify a TLS data section, and so the linker would need upgrading as well. And the compiler, since it needs to generate the object files.

As I said, the static TLS data is put in the object file like any other data. I can see that the linker could/would be a problem. Of course the compiler needs to be updated, but there shouldn't be any problems updating dmd. I guess you're referring to gcc. I also have to say that I haven't fully understood what the linker does in this case with the TLS data.

-- 
/Jacob Carlborg
Oct 14 2010
parent reply Walter Bright <newshound2 digitalmars.com> writes:
Jacob Carlborg wrote:
 As I said, the static TLS data is put in the object file like any other 
 data.

No, it isn't. It has to go into segregated sections so it can be distinguished from regular static data. Fixup records are different for them, and some special code sequences are used to access them.
 I can see that the linker could/would be a problem. Of course the 
 compiler needs to be updated but there shouldn't be any problems updating 
 dmd. I guess you're referring to gcc. I also have to say that I haven't 
 fully understood what the linker does in this case, with the TLS data.

On Linux, the linker understands the special TLS sections, and the special relocation fixups to reference them. It also patches the specific TLS access code sequences emitted by the compiler differently depending on whether the result is to go into an executable or a shared library. On OSX, none of this happens.
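To make the Linux side concrete, here is a small C sketch of compiler-supported TLS using the same `__thread` keyword mentioned earlier in the thread. On ELF targets initialized thread-local data conventionally goes into the `.tdata` section and zero-initialized data into `.tbss`; the function and variable names below are invented for the example. It assumes a toolchain where `__thread` works (gcc on Linux, for instance).

```c
#include <assert.h>
#include <pthread.h>

/* Initialized TLS: on ELF targets this conventionally lands in .tdata. */
static __thread int tls_init = 42;
/* Zero-initialized TLS: conventionally lands in .tbss. */
static __thread int tls_zero;

static void *worker(void *arg) {
    int *out = arg;
    /* A new thread starts from the initial image, not from the
       values the main thread currently holds. */
    out[0] = tls_init;
    out[1] = tls_zero;
    /* Writes here touch only this thread's copy. */
    tls_init = -1;
    tls_zero = -1;
    return NULL;
}

/* Returns 1 if the worker saw the initial values and the main
   thread's copies were left untouched by the worker's writes. */
int demo_static_tls(void) {
    int seen[2] = {0, 0};
    pthread_t t;
    tls_init = 7;
    tls_zero = 9;
    pthread_create(&t, NULL, worker, seen);
    pthread_join(t, NULL);
    return seen[0] == 42 && seen[1] == 0 && tls_init == 7 && tls_zero == 9;
}
```

No key creation or lookup appears in the source: the per-thread copy is set up by the loader and runtime, and the access sequence is generated by the compiler and fixed up by the linker.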
Oct 14 2010
parent Jacob Carlborg <doob me.com> writes:
On 2010-10-14 23:11, Walter Bright wrote:
 Jacob Carlborg wrote:
 As I said, the static TLS data is put in the object file like any
 other data.

No, it isn't. It has to go into segregated sections so it can be distinguished from regular static data. Fixup records are different for them, and some special code sequences are used to access them.

I don't know if we misunderstand each other here, but I was trying to say that the TLS data is put in the object file like any other data, just in a different section with a different flag. That is at least how I understand it from reading: http://www.akkadia.org/drepper/tls.pdf
 I can see that the linker could/would be a problem. Of course the
 compiler needs to be updated but there shouldn't be any problems updating
 dmd. I guess you're referring to gcc. I also have to say that I
 haven't fully understood what the linker does in this case, with the
 TLS data.

On Linux, the linker understands the special TLS sections, and the special relocation fixups to reference them. It also patches the specific TLS access code sequences emitted by the compiler differently depending on whether the result is to go into an executable or a shared library.

Ok, thanks for the explanation.
 On OSX, none of this happens.

/Jacob Carlborg
Oct 15 2010
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Jacob Carlborg wrote:
 On 2010-10-13 20:18, Walter Bright wrote:
 Furthermore, OSX has no documented way to allocate TLS
 static data in the object file. I spent considerable effort figuring out
 a way to do this and get around the numerous bugs in the OSX linker that
 tried to stop me.

I just read a bit about how TLS is implemented on Linux. Just out of curiosity, what was the problem: the linker, runtime, loader or all of them? On Linux the static TLS data is put in the object file like any other data. The only difference is that it has a different section/segment name and an additional flag. Then of course the linker, runtime and loader know about these sections and make any necessary initializations when the application loads.

The linker, runtime, and loader do not know about any TLS sections under OSX, and do not do any necessary initializations. The linker bugs were because it behaved erratically when laying out named sections that were not the usual sections emitted by gcc.
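As an illustration of what "done manually by the library" can mean, here is a rough C sketch of emulated TLS: a template image stands in for the TLS section, each thread gets its own copy on first use, and an address inside the template is translated into the matching address inside the calling thread's copy. This is not druntime's actual code; `tls_template`, `emu_tls_get_addr` and the other names are hypothetical, and only the general shape (range check, per-thread block found via `pthread_getspecific`, offset arithmetic) follows what is described in this thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the TLS section: a template image holding
   the initial values of all "thread-local" variables. */
struct tls_image { int x; int y; };
static struct tls_image tls_template = { 10, 20 };

static pthread_key_t block_key;
static pthread_once_t block_once = PTHREAD_ONCE_INIT;

static void make_key(void) {
    int rc = pthread_key_create(&block_key, free);
    assert(rc == 0);
    (void)rc;
}

/* Translate the address of a variable inside the template into the
   address of the calling thread's private copy of that variable. */
void *emu_tls_get_addr(void *p) {
    char *beg = (char *)&tls_template;
    char *end = beg + sizeof tls_template;
    assert((char *)p >= beg && (char *)p < end);
    (void)end;

    pthread_once(&block_once, make_key);
    char *block = pthread_getspecific(block_key);
    if (block == NULL) {
        /* First access on this thread: copy the initial image. */
        block = malloc(sizeof tls_template);
        assert(block != NULL);
        memcpy(block, &tls_template, sizeof tls_template);
        pthread_setspecific(block_key, block);
    }
    return block + ((char *)p - beg);
}

static void *worker(void *arg) {
    int *x = emu_tls_get_addr(&tls_template.x);
    *(int *)arg = *x;  /* initial value copied from the template */
    *x = 99;           /* changes only this thread's copy */
    return NULL;
}

/* Returns 1 if the worker started from the template value and the
   main thread's copy and the template itself were left alone. */
int demo_emulated_tls(void) {
    int seen = 0;
    int *mine = emu_tls_get_addr(&tls_template.x);
    *mine = 1;
    pthread_t t;
    pthread_create(&t, NULL, worker, &seen);
    pthread_join(t, NULL);
    return seen == 10 && *mine == 1 && tls_template.x == 10;
}
```

Every access pays for a function call, a key lookup and the offset arithmetic, which gives some feel for why this is slower than the native scheme on Linux or Windows.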
 Anyhow, the source is here:

 http://www.dsource.org/projects/druntime/browser/trunk/src/core/thread.d

 and the function you're interested in is ___tls_get_addr. You're welcome
 to make improvements to it.

That doesn't seem to be a lot of code to optimize, is Thread.getThis() slow?

I've already been 'round the block on that code a few times. If you want to have a go at it, perhaps I missed something and you'll see it.
Oct 14 2010
parent reply Jacob Carlborg <doob me.com> writes:
On 2010-10-14 20:12, Walter Bright wrote:
 Jacob Carlborg wrote:
 On 2010-10-13 20:18, Walter Bright wrote:
 Furthermore, OSX has no documented way to allocate TLS
 static data in the object file. I spent considerable effort figuring out
 a way to do this and get around the numerous bugs in the OSX linker that
 tried to stop me.

I just read a bit about how TLS is implemented on Linux. Just out of curiosity, what was the problem: the linker, runtime, loader or all of them? On Linux the static TLS data is put in the object file like any other data. The only difference is that it has a different section/segment name and an additional flag. Then of course the linker, runtime and loader know about these sections and make any necessary initializations when the application loads.

The linker, runtime, and loader do not know about any TLS sections under OSX, and do not do any necessary initializations. The linker bugs were because it behaved erratically when laying out named sections that were not the usual sections emitted by gcc.

Can't the necessary initializations be done early in the startup process of the D runtime, or is it already too late when the application receives control? Since you have non-standard sections in the object file, I assume you solved it.
 Anyhow, the source is here:

 http://www.dsource.org/projects/druntime/browser/trunk/src/core/thread.d

 and the function you're interested in is ___tls_get_addr. You're welcome
 to make improvements to it.

That doesn't seem to be a lot of code to optimize, is Thread.getThis() slow?

I've already been 'round the block on that code a few times. If you want to have a go at it, perhaps I missed something and you'll see it.

Thread.getThis() calls pthread_getspecific, which is just three instructions on Mac OS X, so I guess that's not why it's so slow. The only thing I can think of is first moving the if statement into the assert and then trying to inline as many of the function calls as possible.

-- 
/Jacob Carlborg
Oct 14 2010
parent reply Sean Kelly <sean invisibleduck.org> writes:
Jacob Carlborg Wrote:
 
 Thread.getThis() calls pthread_getspecific which is just three 
 instructions on Mac OS X, so I guess that's not why it's so slow. The 
 only thing I can think of is first moving the if statement into the 
 assert and then trying to inline as much of the function calls.

Swapping the assert and the executable code would save you a jump, but inlining the call to ___tls_get_addr would be a bit trickier. We'd probably have to expose Thread.sm_this as an extern (C) symbol, move the function to object.d and explicitly do the pthread_getspecific call there. If that would be enough for the compiler to inline the calls then it shouldn't be too hard to test, but I'm worried that the call generation may happen too late. I guess it wouldn't be too hard to figure out from looking at the asm output though (PIC code weirdness notwithstanding).
Oct 14 2010
parent Jacob Carlborg <doob me.com> writes:
On 2010-10-15 01:22, Sean Kelly wrote:
 Jacob Carlborg Wrote:
 Thread.getThis() calls pthread_getspecific which is just three
 instructions on Mac OS X, so I guess that's not why it's so slow. The
 only thing I can think of is first moving the if statement into the
 assert and then trying to inline as much of the function calls.

Swapping the assert and the executable code would save you a jump, but inlining the call to ___tls_get_addr would be a bit trickier. We'd probably have to expose Thread.sm_this as an extern (C) symbol, move the function to object.d and explicitly do the pthread_getspecific call there. If that would be enough for the compiler to inline the calls then it shouldn't be too hard to test, but I'm worried that the call generation may happen too late. I guess it wouldn't be too hard to figure out from looking at the asm output though (PIC code weirdness notwithstanding).

I think it would save more than just a jump. When compiling in release mode the compiler has to generate code for the if statement, but with an assert it can just skip it. See the assembly at the bottom.

I was thinking about inlining Thread.getThis() as a first step, then inlining pthread_getspecific as a second step. I don't know if we can inline pthread_getspecific due to license issues, but at least there is an inline version available. Then of course inlining the call to ___tls_get_addr could help as well.

Both of the following versions are compiled with "dmd -c -O -release".

___tls_get_addr in thread.d compiled with the if statement:

    ___tls_get_addr:
            push    EBP
            mov     EBP,ESP
            push    EAX
            mov     EDX,EAX
            push    EBX
            call    L2000
    L2000:  pop     EBX
            cmp     03AAh[EBX],EDX
            ja      L2011
            cmp     03AEh[EBX],EDX
            ja      L2012
    L2011:  hlt
    L2012:  mov     -4[EBP],EDX
            call    L217E
            mov     EAX,054h[EAX]
            add     EAX,-4[EBP]
            sub     EAX,03AAh[EBX]
            pop     EBX
            mov     ESP,EBP
            pop     EBP
            ret
            nop

22 lines for the above code. The same compiled with an assert instead:

    ___tls_get_addr:
            push    EBP
            mov     EBP,ESP
            sub     ESP,8
            mov     -4[EBP],EAX
            call    L216E
            mov     EAX,054h[EAX]
            add     EAX,-4[EBP]
            call    L200D
    L200D:  pop     ECX
            sub     EAX,038Dh[ECX]
            mov     ESP,EBP
            pop     EBP
            ret

This is just 13 lines of code. I can tell you that I don't know assembly, but I can see that the number of instructions is a lot higher in the version with the if statement than in the one with the assert.

-- 
/Jacob Carlborg
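The same trade-off exists in C, where `assert` is compiled out when `NDEBUG` is defined, much as dmd's `-release` drops ordinary asserts. The sketch below is only an analogy in C, not the druntime source; `tls_section` and both function names are invented for the example. The first function keeps its range check in every build; the second loses it in an `NDEBUG` build, leaving just the subtraction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the bounds of a TLS section. */
char tls_section[64];

/* Range check written as ordinary code: it survives every build mode,
   so the comparisons and the abort path are always emitted. */
size_t offset_checked_if(const char *p) {
    if (p < tls_section || p >= tls_section + sizeof tls_section)
        abort();
    return (size_t)(p - tls_section);
}

/* The same check as an assertion: with NDEBUG defined the whole
   condition disappears and only the subtraction remains. */
size_t offset_checked_assert(const char *p) {
    assert(p >= tls_section && p < tls_section + sizeof tls_section);
    return (size_t)(p - tls_section);
}
```

Compiling both with and without `-DNDEBUG` and comparing the generated assembly (for example with `gcc -S`) is a quick way to see how much code the check costs.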
Oct 15 2010
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/12/10 7:01 CDT, Fawzi Mohamed wrote:
 On 12-ott-10, at 13:04, Denis Koroskin wrote:

 On Tue, 12 Oct 2010 02:32:55 +0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 10/11/2010 12:38 PM, Daniel Gibson wrote:
 But parts of phobos are deprecated or will be deprecated and there
 still
 is no alternative for them.
 That may prevent people from writing "real" projects in D2 (or D at
 all)
 - who wants to use classes that will be deprecated soon?
 Sure, that old stuff will not be removed and can still be used, but I
 personally feel a bit uncomfortable with using deprecated code.

Agreed. Maybe this is a good time to start making a requirements list for streams. What are the essential features/feature groups? Andrei

For me, I/O should be scalable (and thus support async operations), so I came up with my own implementation. I've tried building it on top of std.concurrency, but it doesn't scale either. So, once again, I had to implement my own message passing mechanism. (I can implement the existing std.concurrency interface on top of my own one without sacrificing anything, but not vice versa.)

I very much agree that IO should be scalable. In my opinion this is possible if one has a robust framework for smp parallelization. This is what I have been working on with blip http://dsource.org/blip .

Seeing your recent comments about fixing druntime, I wanted to take a look at this other work but the page seems to be in error. Did you migrate blip somewhere else? Thanks, Andrei
Oct 19 2010
parent Daniel Gibson <metalcaedes gmail.com> writes:
Andrei Alexandrescu wrote:
 On 10/12/10 7:01 CDT, Fawzi Mohamed wrote:
 On 12-ott-10, at 13:04, Denis Koroskin wrote:

 On Tue, 12 Oct 2010 02:32:55 +0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 10/11/2010 12:38 PM, Daniel Gibson wrote:
 But parts of phobos are deprecated or will be deprecated and there
 still
 is no alternative for them.
 That may prevent people from writing "real" projects in D2 (or D at
 all)
 - who wants to use classes that will be deprecated soon?
 Sure, that old stuff will not be removed and can still be used, but I
 personally feel a bit uncomfortable with using deprecated code.

Agreed. Maybe this is a good time to start making a requirements list for streams. What are the essential features/feature groups? Andrei

For me, I/O should be scalable (and thus support async operations), so I came up with my own implementation. I've tried building it on top of std.concurrency, but it doesn't scale either. So, once again, I had to implement my own message passing mechanism. (I can implement the existing std.concurrency interface on top of my own one without sacrificing anything, but not vice versa.)

I very much agree that IO should be scalable. In my opinion this is possible if one has a robust framework for smp parallelization. This is what I have been working on with blip http://dsource.org/blip .

Seeing your recent comments about fixing druntime, I wanted to take a look at this other work but the page seems to be in error. Did you migrate blip somewhere else? Thanks, Andrei

Oct 19 2010
prev sibling parent reply Iain Buclaw <ibuclaw ubuntu.com> writes:
== Quote from Daniel Gibson (metalcaedes gmail.com)'s article
 bioinfornatics wrote:
 LDC support 64 bit ;)

But both currently lack an up-to-date D2 compiler (but the GDC guys are at least working on it, seems like they're currently at 2.029 - which is great - about 3 months ago they were still at 2.018 and in between was the big 2.020 update that introduced druntime).

Merging frontend is one thing, implementing the new features the frontend offers is another (though I think I've so far covered everything there ;). I'm actually rather curious how DMD plans on doing varargs in 64bit. Not least because I don't think their current method of "calculating" the address of the _argptr takes 64bit parameter passing into account.
Oct 11 2010
parent Walter Bright <newshound2 digitalmars.com> writes:
Iain Buclaw wrote:
 I'm actually rather curious how DMD plans on doing varargs in 64bit. Not least
 because I don't think their current method of "calculating" the address of the
 _argptr takes 64bit parameter passing into account.

I'm working on it now. It's a bitch. The way varargs works for the C ABI is inefficient, clumsy, and frankly won't work for what D needs. So I'm going to have two varargs - one for extern (C) which is compatible with the C ABI and one for extern (D) which will push things on the stack as for 32 bits. The only remaining nuisance for the D method is that some types have to be aligned more strictly than the stack. That means TypeInfo will have to be extended to provide an alignment size for the type.
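For readers who have not looked at varargs in a while, here is a small C sketch of the two pieces mentioned above. `sum_doubles` uses the standard `<stdarg.h>` interface, which is what the C ABI side has to stay compatible with. `align_up` is a hypothetical helper showing the rounding needed when arguments are laid out on the stack and a type wants stricter alignment than the stack provides; it is not DMD's implementation, just the arithmetic an alignment value from TypeInfo would feed into.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Round offset up to the next multiple of align.
   align must be a power of two. */
size_t align_up(size_t offset, size_t align) {
    return (offset + align - 1) & ~(align - 1);
}

/* C-ABI varargs: the callee pulls each argument out with va_arg and
   must name the type it expects; nothing in the call checks this. */
double sum_doubles(int count, ...) {
    va_list ap;
    double total = 0.0;
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, double);
    va_end(ap);
    return total;
}
```

With a helper like `align_up`, a callee walking a stack-style argument area would round its current offset up to each argument's alignment before reading it, then advance by the argument's size.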
Oct 11 2010
prev sibling next sibling parent Fawzi Mohamed <fawzi gmx.ch> writes:
On 19-ott-10, at 22:04, Andrei Alexandrescu wrote:

 On 10/12/10 7:01 CDT, Fawzi Mohamed wrote:
 [...]
 I very much agree that IO should be scalable.
 In my opinion this is possible if one has a robust framework for smp
 parallelization.
 This is what I have been working on with blip http://dsource.org/blip .

Seeing your recent comments about fixing druntime, I wanted to take a look at this other work but the page seems to be in error. Did you migrate blip somewhere else?

Sorry, I forgot the "projects" in there. The project is at
	http://dsource.org/projects/blip
the code is actually at
	http://github.com/fawzi/blip
I hope to do a release in the next few days. It needs hwloc (tested with 1.x) and libev (tested with 3.9).

Windows will need some porting, but on Mac and Linux it works correctly (as far as I tested).
I haven't done any real optimization yet (I am waiting to have a more complete test suite, and I have switched off reuse of non-recursive tasks, thus allocating a bit too much at a low level).

feedback is appreciated :)

ciao
Fawzi
Oct 19 2010
prev sibling parent Fawzi Mohamed <fawzi gmx.ch> writes:
On 19-ott-10, at 22:41, Fawzi Mohamed wrote:

 [...]
 sorry I forgot the projects in there, the project is at
 	http://dsource.org/projects/blip
 the code is actually at
 	http://github.com/fawzi/blip
 I hope to do a release in the next days. It needs hwloc (tested with  
 1.x) and libev (tested with 3.9).

forgot to add that by default the build script also needs lapack (for linalg ops on NArray), but that is not needed if one is interested only in the parallel/io part
 Windows will need some porting, but on Mac and Linux it works
 correctly (as far as I tested).
 I haven't done any real optimization yet (waiting to have a more
 complete test suite, and I have switched off reuse of non-recursive
 tasks, thus allocating a bit too much at a low level).

 feedback is appreciated :)

 ciao
 Fawzi

Oct 19 2010
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Jonathan M Davis wrote:
 Of course, projects like QtD suffer from the same sort of problem as a compiler
 does in that it's not necessarily very useful until it's complete. Lots of
 people may be interested in using QtD, but if it's not at least close to done,
 it's not going to be usable enough to use in any major project, so people won't
 use it, they won't report bugs on it, and they won't give any kind of feedback on
 the project. So, the poor QtD people then have to get a _lot_ of code done
 before they see any kind of positive feedback from the community, and when they
 _do_ start getting feedback, much of it is likely to be negative because feature
 X hasn't been implemented yet or feature Y is buggy. A lot of people have given
 up on D for similar reasons. Hopefully enough of the problems that they were
 having with dmd get fixed soon enough that they're able to actually continue
 working on the project without getting too frustrated over it.

Things sure have changed. Back in the 80's, people were able to get real projects done with absolutely *terrible* compilers. Compilers have steadily gotten better, and so have expectations.
Oct 10 2010
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2010-10-05 18:29, Andrei Alexandrescu wrote:
 On 10/5/10 9:37 CDT, Gour D. wrote:
 On Tue, 05 Oct 2010 16:01:41 +0200
 "Don" == Don<nospam nospam.com> wrote:






Don> I would estimate the truck factor as between 2.0 and 2.5. Two
Don> years ago, the truck factor was 1.0, but not any more.

Nice, nice... Still, SO people say: "Neither Haskell nor D is popular enough for it to be at all likely that you will ever attract a single other developer to your project..." :-)

If developer attraction is a concern, you're likely better off with D. Programmers who have used at least one Algol-like language (C, C++, Java, C#) will have no problem feeling comfortable in D. With Haskell you'd need to stick with "the choir".
 If just QtD hadn't been suspended...

I agree that's a bummer. I suggest you write the developers and ask what would revive their interest. The perspective of a solid client is bound to be noticeable. Andrei

Probably a compiler that is working better, one with fewer bugs. -- /Jacob Carlborg
Oct 06 2010
parent reply Jacob Carlborg <doob me.com> writes:
On 2010-10-06 11:43, Simen kjaeraas wrote:
 Jacob Carlborg <doob me.com> wrote:

 Probably a compiler that is working better, one with fewer bugs.

Of course. And knowing which bugs those are, one could perhaps fix them.

Well, that's the problem: fix those bugs and you will encounter new bugs. BTW, I wonder why Andrei asked anyway; he already asked this in the thread "QtD is suspended" in the announce newsgroup and got an answer: http://www.digitalmars.com/d/archives/digitalmars/D/announce/QtD_is_suspended_19259.html#N19265

-- 
/Jacob Carlborg
Oct 06 2010
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/6/10 7:24 CDT, Simen kjaeraas wrote:
 Jacob Carlborg <doob me.com> wrote:

 Well that's the problem, fixing those bugs and you will encounter new
 bugs.

So we should all just give up, then? Yes, there are bugs, and yes, fixing them will reveal new ones. But the only reasonable thing to do is to try and fix those bugs that cause the most headaches, and continue doing so.

I agree that that looked like an unwinnable battle while D was also evolving rapidly. Now that the language is stabilizing I expect the rate and the absolute number of bugs will decrease. Andrei
Oct 06 2010
parent reply Justin Johansson <no spam.com> writes:
On 7/10/2010 1:35 AM, Andrei Alexandrescu wrote:
 On 10/6/10 7:24 CDT, Simen kjaeraas wrote:
 Jacob Carlborg <doob me.com> wrote:

 Well that's the problem, fixing those bugs and you will encounter new
 bugs.

So we should all just give up, then? Yes, there are bugs, and yes, fixing them will reveal new ones. But the only reasonable thing to do is to try and fix those bugs that cause the most headaches, and continue doing so.

I agree that that looked like an unwinnable battle while D was also evolving rapidly. Now that the language is stabilizing I expect the rate and the absolute number of bugs will decrease. Andrei

I think that long gone are the days when a single individual (or corporation) could "own" a language. It will be an unwinnable battle so long as the governance of D is under the auspices of a single mortal. There is too much risk for investors of time, let alone pecuniary investors, under the current regime. Justin
Oct 07 2010
parent bioinfornatics <bioinfornatics fedoraroject.org> writes:
As for me, I will keep using D1 as long as LDC does not support D2, or until
GDC becomes a GCC project and supports D2. Until that happens, a big part of
the community will not move to D2.
Oct 07 2010
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2010-10-06 14:24, Simen kjaeraas wrote:
 Jacob Carlborg <doob me.com> wrote:

 Well that's the problem, fixing those bugs and you will encounter new
 bugs.

So we should all just give up, then? Yes, there are bugs, and yes, fixing them will reveal new ones. But the only reasonable thing to do is to try and fix those bugs that cause the most headaches, and continue doing so.

I just think it has been shown several times that D2 is not ready to be used just yet.

-- 
/Jacob Carlborg
Oct 07 2010
prev sibling next sibling parent "Gour D." <gour atmarama.net> writes:

On Tue, 05 Oct 2010 12:52:22 +0200
 "Simen" == "Simen kjaeraas" <simen.kjaras gmail.com> wrote:

Simen> This is a big problem for D at this point. The language is no
Simen> longer evolving (much), and we're at a point in time where
Simen> libraries and toolchain parts need to be written.

That's nice to hear and it's solvable.

Simen> It will. Latest news (2 days ago) say it's now getting as far as
Simen> main(), which is good.

Great!

Simen> I believe GDC supports ARM.

Hmm, based on http://dgcc.sourceforge.net/ it looks like it is not overly active?

Simen> There's a list here:
Simen> http://www.wikiservice.at/d/wiki.cgi?DatabaseBindings
Simen>
Simen> However, most of those are for D1, and a large percentage seem
Simen> to be abandoned.

:-(

Simen> SQLite seems to be well supported, with 7 projects claiming
Simen> support.

Why so many? It is similar to Haskell, where one can find a bunch of libs doing practically the same thing, but most of them half-baked.

Simen> I'm sure you can. D also supports programming styles closer to
Simen> those of FP, making such a transition easier (I hope :p)

This is certainly a bonus.

Simen> > a) maintainable code
Simen>
Simen> This is likely a bit subjective, and much more dependent upon the
Simen> programmers themselves than the language used.

I agree. Otoh, afaict, D uses modules/packages, so code can be nicely organized, as in Haskell.

Simen> - Contract programming in the form of pre and post contracts for
Simen>   functions[1].
Simen> - Class invariants[2].
Simen> - Built in unit testing[3].
Simen> - Documentation comments[4].
Simen>
Simen> Of course, other features of D may increase maintainability, but
Simen> those are the ones most directly associated with it.

Not bad. ;)

Simen> > b) decent performance
Simen>
Simen> D is generally as fast as C, though some abstractions of course
Simen> cost more than others.

This is probably more than we'd need, but there is definitely no fear as with e.g. Python & co.

Simen> > c) higher-level programming and suitable for general
Simen> > programming tasks
Simen>
Simen> My impression (not having used Haskell), D wins hands down on the
Simen> latter, and is a bit weaker on the former.

Still, I believe D provides a much more comfortable higher-order experience than C++.

Simen> > d) good library support (database stuff, data structures, Qt
Simen> > GUI...)
Simen>
Simen> Likely Haskell is better here (as noted above, D has some
Simen> problems in this regard).

Lack of GUI libs for D2 is a serious concern atm.

Simen> The bus-factor of D is sadly close to 1. If Walter should choose
Simen> to leave, we have a problem. On the other hand, I don't think a
Simen> mere bus would keep him from continuing the project.

Uhh...this is almost like a showstopper or, at least, a very strong anti-adoption pattern. :-( It is even worse than Haskell, where GHC has a bus-factor >=2 and there are other compilers like uhc, lhc, jhc...there is even a Haskell committee working on the Haskell' (prime) standard.

Simen> Here I can't help. I don't know Haskell.

Thanks a lot. It is helpful, although with a slightly discouraging end. :-(

Sincerely,
Gour

-- 
Gour | Hlapicina, Croatia | GPG key: CDBF17CA
Oct 05 2010
prev sibling next sibling parent "Gour D." <gour atmarama.net> writes:
On Tue, 05 Oct 2010 15:39:30 +0200
 "Daniel" == Daniel Gibson <metalcaedes gmail.com> wrote:

Daniel> try http://bitbucket.org/goshawk/gdc/wiki/Home :-)

Ahh, this looks much better. Thanks. ;)

Daniel> I don't think it's as serious, because afaik Walter is not the
Daniel> only one developing the dmd compiler (and thus familiar with
Daniel> it) and, more importantly, there are alternative D compilers
Daniel> (gdc and ldc, with at least gdc being actively developed).

So, both gdc & ldc are open-source? What about standard libs?

Daniel> So even if Walter, for whatever reason, stops developing D,
Daniel> there is - IMHO - a good chance that others will continue his
Daniel> efforts and keep D alive.

This is not so disheartening. :-)

btw, I've asked a similar/same question on SO
http://stackoverflow.com/questions/3863111/haskell-or-d-for-gui-desktop-application
if you want to contribute (I'm not much advised for D, so far). ;)

Sincerely,
Gour

--
Gour | Hlapicina, Croatia | GPG key: CDBF17CA
----------------------------------------------------------------
Oct 05 2010
prev sibling next sibling parent bioinfornatics <bioinfornatics fedoraproject.org> writes:
LDC is a free compiler. The latest revision uses dmdfe 1.063, and it will soon
move to the latest dmdfe.
LDC supports 32-bit, 64-bit and ARM targets and works on Linux, Mac OS X and Windows.
LDC is written in C++ using LLVM; everyone who wants to help this project is
welcome.
There is an experimental D2 branch (you can come and help with that part, too).
Oct 05 2010
prev sibling next sibling parent "Gour D." <gour atmarama.net> writes:
On Tue, 05 Oct 2010 16:00:48 +0200
 "Daniel" == Daniel Gibson <metalcaedes gmail.com> wrote:

Daniel> > So, both gdc & ldc are open-source?
Daniel>
Daniel> Yes.

Nice. It means that, in the future, we could target ARM as well (for MeeGo).

Sincerely,
Gour

--
Gour | Hlapicina, Croatia | GPG key: CDBF17CA
----------------------------------------------------------------
Oct 05 2010
prev sibling next sibling parent "Gour D." <gour atmarama.net> writes:
On Tue, 05 Oct 2010 16:01:41 +0200
 "Don" == Don <nospam nospam.com> wrote:

Don> I would estimate the truck factor as between 2.0 and 2.5. Two
Don> years ago, the truck factor was 1.0, but not any more.

Nice, nice...Still SO people say: "Neither Haskell nor D is popular enough for it to be at all likely that you will ever attract a single other developer to your project..." :-)

If just QtD hadn't been suspended...

Sincerely,
Gour

--
Gour | Hlapicina, Croatia | GPG key: CDBF17CA
----------------------------------------------------------------
Oct 05 2010
prev sibling next sibling parent "Gour D." <gour atmarama.net> writes:
On Tue, 05 Oct 2010 11:29:03 -0500
 "Andrei" == Andrei Alexandrescu wrote:

Andrei> If developer attraction is a concern, you're likely better off
Andrei> with D. Programmers who have used at least one Algol-like
Andrei> language (C, C++, Java, C#) will have no problem feeling
Andrei> comfortable in D. With Haskell you'd need to stick with "the
Andrei> choir".

Yeah, I'm aware of it...

btw, let me personally congratulate you on a great talk at Google!

Andrei> I agree that's a bummer. I suggest you write the developers and
Andrei> ask what would revive their interest. The perspective of a
Andrei> solid client is bound to be noticeable.

You think that a D beginner with an open-source project is a "solid client"? Otoh, there is nothing to lose...

Sincerely,
Gour

--
Gour | Hlapicina, Croatia | GPG key: CDBF17CA
----------------------------------------------------------------
Oct 05 2010
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Tue, 05 Oct 2010 20:29:03 +0400, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 On 10/5/10 9:37 CDT, Gour D. wrote:
 On Tue, 05 Oct 2010 16:01:41 +0200
 "Don" == Don <nospam nospam.com> wrote:

Don> I would estimate the truck factor as between 2.0 and 2.5. Two
Don> years ago, the truck factor was 1.0, but not any more.

Nice, nice...Still SO people say: "Neither Haskell nor D is popular enough for it to be at all likely that you will ever attract a single other developer to your project..." :-)

If developer attraction is a concern, you're likely better off with D. Programmers who have used at least one Algol-like language (C, C++, Java, C#) will have no problem feeling comfortable in D. With Haskell you'd need to stick with "the choir".
 If just QtD hadn't been suspended...

I agree that's a bummer. I suggest you write the developers and ask what would revive their interest. The perspective of a solid client is bound to be noticeable. Andrei

I've heard from one of the developers that among the most frustrating parts were the inability to have struct default ctors (http://d.puremagic.com/issues/show_bug.cgi?id=3852) and dtors that aren't called (http://d.puremagic.com/issues/show_bug.cgi?id=3516). I also know they had *huge* issues with optlink (http://d.puremagic.com/issues/show_bug.cgi?id=2436 and many others), but those hopefully got fixed.

I'll try to talk eldar into sharing his development experience and the issues they came across. Until then you may be interested in reading these posts:

http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=103453
http://h3.gd/devlog/?p=22

Increasingly more people are unsatisfied with D2 and talking about a fork, so I wouldn't be surprised to see one sooner or later (!)
Oct 05 2010
prev sibling next sibling parent "Gour D." <gour atmarama.net> writes:
On Tue, 05 Oct 2010 23:59:07 +0400
 "Denis" == "Denis Koroskin" <2korden gmail.com> wrote:

Denis> ...increasingly more people are unsatisfied with D2 and talking
Denis> about a fork so I wouldn't be surprised to see one sooner or
Denis> later (!)

Unsatisfied that D2 changed (too) much or with bugs and development in general?

To me, it looks very early to fork D2...

Sincerely,
Gour

--
Gour | Hlapicina, Croatia | GPG key: CDBF17CA
----------------------------------------------------------------
Oct 05 2010
prev sibling next sibling parent "Gour D." <gour atmarama.net> writes:
On Tue, 05 Oct 2010 16:06:00 -0700
 "Walter" == Walter Bright <newshound2 digitalmars.com> wrote:

Walter> Few things work better than customers letting a company know
Walter> they are interested in such-and-such a product.

Even a non-paying customer in the open-source world?

Well, I'm going to send email to two people whom I consider important for QtD, based on what I could deduce...

Sincerely,
Gour

--
Gour | Hlapicina, Croatia | GPG key: CDBF17CA
----------------------------------------------------------------
Oct 05 2010
prev sibling next sibling parent "Simen kjaeraas" <simen.kjaras gmail.com> writes:
Jacob Carlborg <doob me.com> wrote:

 Probably a compiler that is working better, one with fewer bugs.

Of course. And knowing which bugs those are, one could perhaps fix them. -- Simen
Oct 06 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 05 Oct 2010 16:24:02 -0400, Gour D. <gour atmarama.net> wrote:

 On Tue, 05 Oct 2010 23:59:07 +0400
 "Denis" == "Denis Koroskin" <2korden gmail.com> wrote:

Denis> ...increasingly more people are unsatisfied with D2 and talking
Denis> about a fork so I wouldn't be surprised to see one sooner or
Denis> later (!)

Unsatisfied that D2 changed (too) much or with bugs and development in general? To me, it looks very early to fork D2...

I think the blog mentions forking D1 and adding certain useful features from D2. Forking D1 might be the only way to add new features to it.

I think many yearn for some of the useful, stable features of D2, but aren't willing to put up with the not-yet-cooked features of D2. You don't have a choice at the moment: if you want D2 features, you must use D2, and the only up-to-date implementation of it -- dmd.

On top of that, phobos is a long way from being finished, but it is progressing rapidly, and there are many good developers working on it (as opposed to DMD, which has 3).

I personally do not share that pessimistic view, and I don't think I'll ever go back to D1 anyway, since the D1 phobos library is pretty underpowered. D2/phobos2 will get there, perhaps in a year or so.

-Steve
Oct 06 2010
prev sibling next sibling parent "Simen kjaeraas" <simen.kjaras gmail.com> writes:
Jacob Carlborg <doob me.com> wrote:

 Well that's the problem, fixing those bugs and you will encounter new  
 bugs.

So we should all just give up, then? Yes, there are bugs, and yes, fixing them will reveal new ones. But the only reasonable thing to do is to try and fix those bugs that cause the most headaches, and continue doing so. -- Simen
Oct 06 2010
prev sibling next sibling parent "Gour D." <gour atmarama.net> writes:
On Fri, 08 Oct 2010 00:24:52 +1100
 "Justin" == Justin Johansson wrote:

Justin> I think that long are the days that a single individual (or
Justin> corporation) can "own" a language. It will be an unwinnable
Justin> battle so long as the governance of D is under the auspices of
Justin> a single mortal.

I agree that, based on my short research about D, I see several places with the label 'room for improvement', like a (more) open development process, use of a DVCS, planned releases, better organized web sites, an up-to-date wiki, more docs etc.

Still, I believe that things are improving or, at least, there are still people enthused with D.

That's why I'm curious how to proceed with my evaluation phase in order to be able to decide properly for our project. (Running 108 variations of "Hello world" is not an adequate test, neither of the language nor of the tools.)

So, my question is whether there is some way to get more complete coverage of the language (D2) besides the TDPL book, which will take some time to arrive here in Croatia (I consider the book a worthy purchase no matter what we eventually decide about D).

Justin> There is too much risk for investors of time let alone pecuniary
Justin> investors under the current regime.

I agree here...this is e.g. one of the reasons I'm cautious about investing my time in learning & using ConTeXt (http://www.pragma-ade.com/), seeing it practically as a one-man band mostly pushed by one developer (Hans Hagen), no matter how talented he is, and I'll continue using LyX/LaTeX.

btw, my question on SO
(http://stackoverflow.com/questions/3863111/haskell-or-d-for-gui-desktop-application
or http://is.gd/fPK36) got one nice response from Don Stewart who wrote:

"Let's tease out some requirements here, and I'll try to make the Haskell case. Perhaps the D fans or others could try to do the same."

So it would be nice, at least for other users, if some more experienced D user wrote The Case for D (no pun intended).

Sincerely,
Gour

--
Gour | Hlapicina, Croatia | GPG key: CDBF17CA
----------------------------------------------------------------
Oct 07 2010
prev sibling next sibling parent "Gour D." <gour atmarama.net> writes:
On Thu, 7 Oct 2010 15:36:36 +0000 (UTC)
 "bioinfornatics" == <bioinfornatics fedoraproject.org> wrote:

bioinfornatics> For me i will use D1 while ldc do not support D2 or if
bioinfornatics> GDC come a GCC project and support D2. Until this is
bioinfornatics> not done a big community part do not go to D2.

Hmm...based on what I saw, I consider D2's features much more compelling when comparing D with Haskell...Hopefully, in a few months some things may change in the D2 arena...

Sincerely,
Gour

--
Gour | Hlapicina, Croatia | GPG key: CDBF17CA
----------------------------------------------------------------
Oct 07 2010
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday 09 October 2010 12:44:37 Walter Bright wrote:
 Gour D. wrote:
 Walter> Few things work better than customers letting a company know
 Walter> they are interested in such-and-such a product.
 Even a non-paying customer in the open-source world?

At least it shows interest. No emails tells the open source developer "nobody cares, so I'll just abandon it".

Yes, a lack of positive feedback can be frustrating even if you have the best code ever. And as much as the developers of QtD likely want to use it for their own stuff, it's likely not worth doing it just for themselves. It's just too much work.

Of course, projects like QtD suffer from the same sort of problem as a compiler does in that it's not necessarily very useful until it's complete. Lots of people may be interested in using QtD, but if it's not at least close to done, it's not going to be usable enough for any major project, so people won't use it, they won't report bugs on it, and they won't give any kind of feedback on the project. So, the poor QtD people then have to get a _lot_ of code done before they see any kind of positive feedback from the community, and when they _do_ start getting feedback, much of it is likely to be negative because feature X hasn't been implemented yet or feature Y is buggy. A lot of people have given up on D for similar reasons.

Hopefully enough of the problems that they were having with dmd get fixed soon enough that they're able to actually continue working on the project without getting too frustrated over it. QtD is a huge service to the D community.

- Jonathan M Davis
Oct 10 2010
prev sibling next sibling parent "Gour D." <gour atmarama.net> writes:
On Sun, 10 Oct 2010 00:46:47 -0700
 "Jonathan" == Jonathan M Davis wrote:

Jonathan> Yes, a lack of positive feedback can be frustrating even if
Jonathan> you have the best code ever. And as much as the developers of
Jonathan> QtD likely want to use it for their own stuff, it's likely
Jonathan> not worth doing it just for themselves. It's just too much
Jonathan> work.

Well, I tried to do my little 'homework'...I wrote to the QtD devs explaining to them that their project is essential to adopting D for our project.

Moreover, I informed them that, just for the sake of trying QtD, I've created a 32bit chroot on my machine (64bit dmd, do you hear me?) and got some help on #qtd in order to build "hello world" (the instructions at http://www.dsource.org/projects/qtd/wiki/BuildLinux are now up-to-date).

Lastly, I told the devs that despite the current status of QtD, we have decided to 'gamble' and will use D/QtD for our project.

The next step is to order Andrei's book, start learning the language, experiment with non-GUI stuff and try to help (in any way) to push QtD further.

Jonathan> QtD is a huge service to the D community.

Indeed! Coming from the Haskell community, where all the GUI libs (bindings) are in the hands of just a few devs, I sincerely hope that users of D will recognize the importance of QtD for the success of the language itself and help the project to become complete and fully usable asap.

Sincerely,
Gour

--
Gour | Hlapicina, Croatia | GPG key: CDBF17CA
----------------------------------------------------------------
Oct 10 2010
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday 10 October 2010 17:27:55 Daniel Gibson wrote:
 bioinfornatics schrieb:
 LDC support 64 bit ;)

as well as GDC. But both currently lack an up-to-date D2 compiler (the GDC guys are at least working on it; it seems they're currently at 2.029, which is great: about 3 months ago they were still at 2.018, and in between was the big 2.020 update that introduced druntime).

I agree with Walter that 64bit support in DMD is very important, especially for D2: I started a project a few months ago that might have benefited from D2's features (especially ranges), but I decided to use D1 because no 64bit compiler for D2 was in sight.

I am btw a bit worried about "upgrading" code from D1 to D2 some day because of the heavy (still ongoing) changes, especially in phobos. I read that, for example, std.stream is going to be deprecated and replaced by either something with ranges or something like std.stdio (whoever was right about that, maybe even both?). This means that all my (file/network) I/O code would have to be rewritten sooner or later. But, currently lacking an alternative to std.stream/std.socketstream for networking, I would have had the same problems even if I had started the project with D2...

A stream solution is in the works (it's discussed periodically on the Phobos list), but they haven't sorted out quite what they want to do with it yet. The Phobos API in general is in flux, though pieces of it are likely to stay more or less unchanged from what they currently are. But there's far from any kind of guarantee that much of anything from the D1 Phobos is going to survive in the D2 Phobos. They're looking to make Phobos as good as they can, and they aren't yet worried about keeping its API stable (though they don't make changes unless they think that it's actually beneficial, so things don't change willy-nilly). I'm sure that the time will come, however, when Phobos' API will stabilize, and projects will be able to rely on it staying the same.

I think that the reality of the matter is that porting D1 code to D2 code is going to be just like, if not exactly like, porting code from one library to another rather than an upgrade like you'd get between Qt3 and Qt4 (which had plenty of changes). I'm sure that the split between D1 and D2 is going to cause a lot of problems for people looking to port code from one to the other, but it will be better for newly written code, so it's a definite tradeoff. I wouldn't look forward to porting a project from D1 to D2 though.

- Jonathan M Davis
Oct 10 2010
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday 10 October 2010 21:26:28 Andrei Alexandrescu wrote:
 A stream solution is in the works (it's discussed periodically on the
 Phobos list), but they haven't sorted out quite what they want to do
 with it yet. The Phobos API in general is in flux, though pieces of it
 are likely to stay more or less unchanged from what they currently are.
 But there's far from any kind of guarantee that much of anything from
 the D1 Phobos is going to survive in the D2 Phobos. They're looking to
 make Phobos as good as they can, and they aren't yet worried about
 keeping its API stable (though they don't make changes unless they think
 that it's actually beneficial, so things don't change willy-nilly). I'm
 sure that the time will come, however, when Phobos' API will stabilize,
 and projects will be able to rely on it staying the same.
 
 I think that the reality of the matter is that porting D1 code to D2 code
 is going to be just like, if not exactly like, porting code from one
 library to another rather than an upgrade like you'd get between Qt3 and
 Qt4 (which had plenty of changes). I'm sure that the split between D1
 and D2 is going to cause a lot of problems for people looking to port
 code from one to the other, but it will be better for newly written
 code, so it's a definite tradeoff. I wouldn't look forward to porting a
 project from D1 to D2 though.

I think it's a bit hasty to speak on behalf of all of Phobos' participants. Phobos 2 is indeed different from Phobos 1 but backward-incompatible changes to Phobos 2 are increasingly rare.

Sorry if I overstepped my bounds on that. It's just that from what I've seen, the Phobos devs have been quite willing to make backwards-incompatible changes if they thought that they were an improvement, though they aren't done all that frequently. Backwards compatibility is considered, but improvements to the API seem to override it.

Regardless, the result is that if you wrote your code for dmd 2.040 or something similar and ended up trying to update it to 2.050, you'd likely have a number of changes to make, though porting from Phobos 1 would be far worse. If Phobos were completely stable or at least never made changes that break backwards compatibility, that wouldn't be the case. I fully expect that as Phobos matures, such breaking changes will become quite rare if not outright nonexistent, but they do still happen. Actually deprecating and replacing the modules that are intended to be deprecated and replaced will help a lot with that, but that obviously takes time.

- Jonathan M Davis
Oct 10 2010
prev sibling next sibling parent Juanjo Alvarez <fake fakeemail.com> writes:
On Sun, 10 Oct 2010 00:46:47 -0700, Jonathan M Davis 
<jmdavisProg gmx.com> wrote:
 working on the project without getting too frustrated over it. QtD
 is a huge service to the D community.

It is. Qt bindings were the first thing I looked for when I started with the language, even with my current project not using any GUI.
Oct 11 2010
prev sibling next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday 12 October 2010 02:45:33 pipe dream wrote:
 Andrei Alexandrescu Wrote:
 On 10/11/2010 12:38 PM, Daniel Gibson wrote:
 But parts of phobos are deprecated or will be deprecated and there
 still is no alternative for them.
 That may prevent people from writing "real" projects in D2 (or D at
 all) - who wants to use classes that will be deprecated soon?
 Sure, that old stuff will not be removed and can still be used, but I
 personally feel a bit uncomfortable with using deprecated code.

Agreed. Maybe this is a good time to start making a requirements list for streams. What are the essential features/feature groups?

You could take a look at Tango's I/O by Kris Bell. It seems to have an efficient and clean design.

Except that copying Tango is taboo. We want to avoid any possible accusation of copying Tango's code or design. There have been issues in the past where Tango devs thought that we might be doing that, and we just don't want to risk any sort of problems with the Tango folks. So, whatever we put in Phobos, we do it without looking at Tango. - Jonathan M Davis
Oct 12 2010
parent Daniel Gibson <metalcaedes gmail.com> writes:
retard schrieb:
 Tue, 12 Oct 2010 10:35:03 -0700, Jonathan M Davis wrote:
 
 On Tuesday, October 12, 2010 04:08:13 Simen kjaeraas wrote:
 Jonathan M Davis <jmdavisProg gmx.com> wrote:
 Except that copying Tango is taboo. We want to avoid any possible
 accusation of
 copying Tango's code or design. There have been issues in the past
 where Tango
 devs thought that we might be doing that, and we just don't want to
 risk any
 sort of problems with the Tango folks. So, whatever we put in Phobos,
 we do it
 without looking at Tango.

You know, we might consider asking them for permission. That way, there should be no problems.

code - or even API - and create something similar, and from what I recall, most cases of trying to get permission have a been a problem (primarily due to there being multiple authors, I think). If there's only one author for the stream code in Tango, that would be easier. Regardless, the point is that we can't just go and look at the Tango API and use it to give ourselves ideas on what to do with Phobos.

I doubt the copyright law can protect API definitions. In that case projects like Wine (winehq.org) couldn't exist. What do you think?

The problem is that for the Windows API you don't see the implementation, so you can't have stolen the code. Tango's code, however, is available to anyone, so it's a lot harder to prove that you only looked at their API documentation and not at their code. And it's quite probable that your implementation will look similar to theirs: for standard stuff most programmers (who may have seen any code doing something similar) will produce similar code. How do you prove that you just did it the way that occurred naturally to you and didn't copy their code?

So to be safe you have two possibilities:
1. don't model your API after Tango's
2. clone their API, look at their code and make sure yours is ridiculously different

Else it may turn out like the SHOO time code disaster...
Oct 13 2010
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Tue, 12 Oct 2010 02:32:55 +0400, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 On 10/11/2010 12:38 PM, Daniel Gibson wrote:
 But parts of phobos are deprecated or will be deprecated and there still
 is no alternative for them.
 That may prevent people from writing "real" projects in D2 (or D at all)
 - who wants to use classes that will be deprecated soon?
 Sure, that old stuff will not be removed and can still be used, but I
 personally feel a bit uncomfortable with using deprecated code.

Agreed. Maybe this is a good time to start making a requirements list for streams. What are the essential features/feature groups? Andrei

For me, I/O should be scalable (and thus support async operations) so I came up with my own implementation.

I've tried building it on top of std.concurrency, but it doesn't scale either. So, once again, I had to implement my own message passing mechanism. (I can implement the existing std.concurrency interface on top of my own one without sacrificing anything, but not vice versa.)

Classes implemented so far: FileStream (file i/o), MemoryStream (e.g. async memcpy) and SocketStream.

None of the streams support the range interface explicitly (by design). Instead, the range interface can be achieved by the StreamReader (InputStream) and StreamWriter (OutputStream) adaptors.

Here it is in case you want to take a look and borrow ideas:

Stream: http://bitbucket.org/korDen/io/src/tip/io/stream.d
Mailbox: http://bitbucket.org/korDen/io/src/tip/io/mailbox.d
AsyncRequest: http://bitbucket.org/korDen/io/src/tip/io/async.d

Unlike std.concurrency, it is easy (and encouraged) to have as many mailboxes as you need. Mailboxes can forward events to other mailboxes, e.g. so that you can poll() only one mailbox (and not every single one of them).

And the main difference from std.concurrency is that it allows event processing in a different thread context. For example, you can process a network message as soon as it arrives (which is done in a background thread), parse it and then dispatch it to the main thread. This is how my HttpRequest (which uses SocketStream) works.
Here is an example:

import io.http;
import io.mailbox;

import std.stdio;
import std.file;

void main()
{
    auto host = "нигма.рф"; // supports international domain names, too
    auto connection = new HttpConnection(host);

    version (Wait)
    {
        auto request = connection.execute(new HttpRequest(host, "/"));
        request.wait(); // blocks until completed
        std.file.write("out.html", request.response.contents);
    }
    else
    {
        // use thread-unique mailbox for event handling, similar to std.concurrency.getTid()
        connection.execute(new HttpRequest(host, "/"), mailbox);

        bool done = false;
        void onComplete(HttpResponseRequest request)
        {
            std.file.write("out.html", request.response.contents);
            done = true;
        }

        mailbox.registerHandler!(HttpResponseRequest)(&onComplete);

        version (Loop)
        {
            while (!done)
            {
                mailbox.poll(); // doesn't block
                // do something useful (e.g. show progress bar) while not done
            }
        }
        else
        {
            mailbox.poll(long.max);
            assert(done);
        }
    }

    connection.close();
}
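For comparison, the same "worker reports back, main thread polls" pattern can be sketched with plain std.concurrency, where each thread has a single message box addressed by its Tid. This is only a rough illustration and not part of Denis's library: `fetch` and `fakeResponse` are hypothetical helpers, and no real network I/O is performed.

```d
import core.time : msecs;
import std.concurrency : ownerTid, receiveTimeout, send, spawn;
import std.stdio : writeln;

// Hypothetical stand-in for a real HTTP response body.
string fakeResponse(string host)
{
    return "<html>" ~ host ~ "</html>";
}

// Worker thread: "fetches" the page and reports back to the thread
// that spawned it. A real version would do network I/O here.
void fetch(string host)
{
    send(ownerTid, fakeResponse(host));
}

void main()
{
    spawn(&fetch, "example.org");

    bool done = false;
    while (!done)
    {
        // Wait at most 10 ms for a string message, then fall through,
        // roughly like a poll() loop that must not block for long.
        receiveTimeout(10.msecs,
            (string contents)
            {
                writeln("got ", contents.length, " bytes");
                done = true;
            }
        );
        // do something useful (e.g. update a progress bar) while not done
    }
}
```

Here the handler always runs in the receiving thread, at the point where it calls receiveTimeout, which is the limitation the post above contrasts with handling events directly in a background thread.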
Oct 12 2010
prev sibling next sibling parent "Simen kjaeraas" <simen.kjaras gmail.com> writes:
Jonathan M Davis <jmdavisProg gmx.com> wrote:


 Except that copying Tango is taboo. We want to avoid any possible  
 accusation of
 copying Tango's code or design. There have been issues in the past where  
 Tango
 devs thought that we might be doing that, and we just don't want to risk  
 any
 sort of problems with the Tango folks. So, whatever we put in Phobos, we  
 do it
 without looking at Tango.

You know, we might consider asking them for permission. That way, there should be no problems. -- Simen
Oct 12 2010
prev sibling next sibling parent Fawzi Mohamed <fawzi gmx.ch> writes:
On 12-ott-10, at 13:04, Denis Koroskin wrote:

 On Tue, 12 Oct 2010 02:32:55 +0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 10/11/2010 12:38 PM, Daniel Gibson wrote:
 But parts of phobos are deprecated or will be deprecated and there  
 still
 is no alternative for them.
 That may prevent people from writing "real" projects in D2 (or D  
 at all)
 - who wants to use classes that will be deprecated soon?
 Sure, that old stuff will not be removed and can still be used,  
 but I
 personally feel a bit uncomfortable with using deprecated code.

Agreed. Maybe this is a good time to start making a requirements list for streams. What are the essential features/feature groups? Andrei

For me, I/O should be scalable (and thus support async operations) so I came up with my own implementation. I've tried building it on top of std.concurrency, but it doesn't scale either. So, once again, I had to implement my own message passing mechanism. I can implement existing std.concurrency interface on top of my own one without sacrificing anything, but not vice-versa).

I very much agree that IO should be scalable. In my opinion this is possible if one has a robust framework for smp parallelization. This is what I have been working on with blip http://dsource.org/blip .

The API is fixed, and seems to work correctly in all the cases I tested. It has not been optimized, so there are still obvious optimizations left to do, but I wanted to have a bit more code using it before thinking about optimization (so that I will be able to catch wrong optimizations with a high probability). Indeed I already have a rather large amount of code using it, and still had cases where bugs were *very* rare and difficult to locate.

Still, I have already used it, for example, to build a socket server that uses all available threads efficiently to implement rpc between processes.

One important thing in my opinion is that idle work should not use resources: waiting for i/o or waiting for events uses basically no cpu.

All this is with D 1.0 and tango (even if I did try to reduce and encapsulate the dependence on tango as much as possible).

For example, when using sockets one does not see the blocking by default (for a webserver one will likely want to add some timeout), and the processor transparently switches to fibers that have work to do. The programmer just has to think about tasks and things that can be executed in parallel; where possible, the optimal scheduling and mapping to threads is done automatically.

Fawzi
Oct 12 2010
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Tue, 12 Oct 2010 16:01:30 +0400, Fawzi Mohamed <fawzi gmx.ch> wrote:

 On 12-ott-10, at 13:04, Denis Koroskin wrote:

 On Tue, 12 Oct 2010 02:32:55 +0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 On 10/11/2010 12:38 PM, Daniel Gibson wrote:
 But parts of phobos are deprecated or will be deprecated and there  
 still
 is no alternative for them.
 That may prevent people from writing "real" projects in D2 (or D at  
 all)
 - who wants to use classes that will be deprecated soon?
 Sure, that old stuff will not be removed and can still be used, but I
 personally feel a bit uncomfortable with using deprecated code.

Agreed. Maybe this is a good time to start making a requirements list for streams. What are the essential features/feature groups? Andrei

For me, I/O should be scalable (and thus support async operations) so I came up with my own implementation. I've tried building it on top of std.concurrency, but it doesn't scale either. So, once again, I had to implement my own message passing mechanism. I can implement existing std.concurrency interface on top of my own one without sacrificing anything, but not vice-versa).

I very much agree that IO should be scalable. In my opinion this is possible if one has a robust framework for SMP parallelization. This is what I have been working on with blip http://dsource.org/blip . The API is fixed, and seems to work correctly in all the cases I tested. It has not been optimized, so there are still obvious optimizations to be made, but I wanted to have a bit more code using it before thinking about optimization (so that I will be able to catch wrong optimizations with a high probability). Indeed I already have a rather large amount of code using it, and still had cases where bugs were *very* rare and difficult to locate. Still, I have already used it, for example, to build a socket server that uses all available threads efficiently to implement RPC between processes. One important thing in my opinion is that idle work should not use resources: waiting for I/O or waiting for events uses basically no CPU. All this is with D 1.0 and Tango (even if I did try to reduce and encapsulate the dependence on Tango as much as possible). For example, when using sockets one does not see the blocking by default (for a webserver one will likely want to add some timeout), and the processor transparently switches to fibers that have work to do. The programmer just has to think about tasks and things that can be executed in parallel if possible; the optimal scheduling and mapping to threads is done automatically. Fawzi

I'm using a thread pool at the moment, but I was recently thinking about implementing some kind of green threads. In D2 there is a local/shared separation, and OS threads should be able to switch the D local context, i.e. forward access to local variables through a custom TLS class reference, and swap that reference to reuse the same thread for many "green threads". Conceptually it is different from Fibers (i.e. you must return instead of just calling yield() at any time), but it is also easier to implement and pretty much portable to any platform. Either way, it needs some language support, because currently TLS is implemented at the compiler level rather than the library level. I proposed this change in the past, but no one responded.
Oct 12 2010
prev sibling next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, October 12, 2010 04:08:13 Simen kjaeraas wrote:
 Jonathan M Davis <jmdavisProg gmx.com> wrote:
 Except that copying Tango is taboo. We want to avoid any possible
 accusation of
 copying Tango's code or design. There have been issues in the past where
 Tango
 devs thought that we might be doing that, and we just don't want to risk
 any
 sort of problems with the Tango folks. So, whatever we put in Phobos, we
 do it
 without looking at Tango.

You know, we might consider asking them for permission. That way, there should be no problems.

That can be done, to be sure, but we definitely can't just look at their code - or even API - and create something similar, and from what I recall, most cases of trying to get permission have been a problem (primarily due to there being multiple authors, I think). If there's only one author for the stream code in Tango, that would be easier. Regardless, the point is that we can't just go and look at the Tango API and use it to give ourselves ideas on what to do with Phobos. - Jonathan M Davis
Oct 12 2010
parent klickverbot <see klickverbot.at> writes:
On 10/13/10 6:56 PM, Jonathan M Davis wrote:
 I wouldn't think that it would be a problem, but I'm no expert, and we've had
 problems in the past because Tango devs thought that proposed Phobos code was
 too similar to Tango. So, as I understand it, unless we get specific permission
 from the Tango devs which wrote a particular module, we're trying to not have
 code in Phobos which is an API which is at all close to Tango's. That way we can
 avoid potential conflicts with the Tango devs.

 - Jonathan M Davis

We had this over and over again, but I still think it should be noted that the disaster around SOHO's code was not entirely made up by »the Tango devs«, but originated from a single developer's phone call to Walter Bright and was then exaggerated by large parts of the D community, including both »sides« – your statement(s) makes it look a bit as if it was all Tango at fault there…
Oct 13 2010
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Wed, 13 Oct 2010 09:17:58 +0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 Denis Koroskin wrote:
 Either way, it needs some language support, because currently TLS  
 implemented on compiler level rather than library level. I proposed  
 this change in past, but no one responded.

Using TLS needs operating system support, and there's special linker support for it and the compiler has to generate TLS references in certain ways in order for it to work. There is no such support in OSX for TLS, and it is done manually by the library. However, TLS access is 10 times slower than on Linux/Windows. Without doing TLS in the operating system standard way, you tend to get hosed if you link with a shared library/DLL that has TLS.

Aha. Thanks for pointing that out, it never occurred to me before. That means I need to investigate Fibers for D2 then (that's the proper way anyway, albeit less portable).
Oct 13 2010
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Wed, 13 Oct 2010 18:32:15 +0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 10/11/2010 07:49 PM, Daniel Gibson wrote:
 Andrei Alexandrescu schrieb:
 Agreed. Maybe this is a good time to start making a requirements list
 for streams. What are the essential features/feature groups?

 Andrei

Maybe something like the following (I hope it's not too extensive):

* Input-, Output- and InputAndOutput-Streams
  - having InputStream and OutputStream as an interface like in the old design may be a good idea
  - implementing the standard operations that are mostly independent from the data source/sink like read/write for basic types, strings, ... in mixin templates is probably elegant to create streams that are both Input and Output (one mixin that implements most of InputStream and one that implements most of OutputStream)

So far so good. I will point out, however, that the classic read/write routines are not all that good. For example if you want to implement a line-buffered stream on top of a block-buffered stream you'll be forced to write inefficient code.

Never heard of filesystems that allow reading files in lines - they always read in blocks, and that's what streams should do. That's because most of the streams are binary streams, and there is no such thing as a "line" in them (e.g. how often do you need to read a line from a SocketStream?).

I don't think streams should buffer anything either (what an underlying OS I/O API caches should suffice); buffered stream adapters can do that in a stream-independent way (why duplicate code when you can do that as efficiently with external methods?).

Besides, as you noted, the buffering is redundant for byChunk/byLine adapter ranges. It means that byChunk/byLine should operate on unbuffered streams.

I'll explain my I/O streams implementation below in case you didn't read my message (I've changed some stuff a little since then). My Stream interface is very simple:

    // A generic stream
    interface Stream
    {
        @property InputStream input();
        @property OutputStream output();
        @property SeekableStream seekable();
        @property bool endOfStream();
        void close();
    }

You may ask, why separate Input and Output streams? Well, that's because you either read from them, write to them, or both. Some streams are read-only (think Stdin), some write-only (Stdout), some support both, like FileStream. Right?

Not exactly. Does FileStream support writing when you open a file for reading? Does it support reading when you open it for writing? So, you may or may not read from a generic stream, and you also may or may not write to a generic stream. With a design like that you can't make a mistake: if a stream isn't readable, you have no reference to invoke the read() method on.

Similarly, a stream is either seekable, or not.
SeekableStreams allow stream cursor manipulation:

    interface SeekableStream : Stream
    {
        long getPosition(Anchor whence = Anchor.begin);
        void setPosition(long position, Anchor whence = Anchor.begin);
    }

InputStream doesn't really have many methods:

    interface InputStream
    {
        // reads up to buffer.length bytes from a stream
        // returns number of bytes read
        // throws on error
        size_t read(ubyte[] buffer);

        // reads from current position
        AsyncReadRequest readAsync(ubyte[] buffer, Mailbox* mailbox = null);
    }

Neither does OutputStream:

    interface OutputStream
    {
        // returns number of bytes written
        // throws on error
        size_t write(const(ubyte)[] buffer);

        // writes from current position
        AsyncWriteRequest writeAsync(const(ubyte)[] buffer, Mailbox* mailbox = null);
    }

They basically support only reading and writing in blocks, nothing else. However, they support asynchronous reads/writes, too (think of mailbox as a std.concurrency's Tid).

Unlike Daniel's proposal, my design reads up to buffer size bytes for two reasons:
- it avoids potential buffering and multiple sys calls
- it is the only way to go with SocketStreams. I mean, you often don't know how many bytes an incoming socket message contains. You either have to read it byte-by-byte, or your application might stall for a potentially infinite time (if the message was shorter than your buffer, and no more messages are being sent)

Why do my streams provide async methods? Because it's the modern approach to I/O - blocking I/O (aka one thread per client) doesn't scale. E.g. Java adds a second revision of its async I/O API in JDK7 (called NIO2; the first revision appeared in February 2002), and C# has had asynchronous operations as part of its Stream interface since .NET 1.1 (April 2003).

With async I/O you can serve many clients with one thread.
Here is an example (pseudo-code, using std.concurrency):

    foreach (connection; networkConnections) {
        connection.receiveMessage(getTid());
    }

    receiveOnly!( (NetworkMessage message) { /* do stuff */ } );

This is still not the most performant solution, but it's a lot better than one thread per client.

Async I/O is not only needed for network stuff. Here is a code snippet from DMD (comments added):

    #define ASYNCREAD 1
    #if ASYNCREAD
        AsyncRead *aw = AsyncRead::create(modules.dim);
        for (i = 0; i < modules.dim; i++)
        {
            m = (Module *)modules.data[i];
            aw->addFile(m->srcfile);
        }
        aw->start(); // executes async request, doesn't block
    #else
        // Single threaded
        for (i = 0; i < modules.dim; i++)
        {
            m = (Module *)modules.data[i];
            m->read(0); // blocks
        }
    #endif

    // Do some other stuff

    for (i = 0; i < modules.dim; i++)
    {
        ...
    #if ASYNCREAD
        aw->read(i); // waits until async operation finishes
    #endif

Walter said that this small change gave quite a speed-up in compilation time.

Also, my async methods return a reference to an AsyncRequest interface that allows waiting for completion (that's what Walter does in DMD), canceling, querying the status (complete, in progress, failed), reporting an error, etc., and that's very useful, too.

I strongly believe we shouldn't ignore this type of API.

P.S. For threads this deep it's better to fork a new one, especially when changing the subject.
Oct 13 2010
prev sibling next sibling parent retard <re tard.com.invalid> writes:
Tue, 12 Oct 2010 10:35:03 -0700, Jonathan M Davis wrote:

 On Tuesday, October 12, 2010 04:08:13 Simen kjaeraas wrote:
 Jonathan M Davis <jmdavisProg gmx.com> wrote:
 Except that copying Tango is taboo. We want to avoid any possible
 accusation of
 copying Tango's code or design. There have been issues in the past
 where Tango
 devs thought that we might be doing that, and we just don't want to
 risk any
 sort of problems with the Tango folks. So, whatever we put in Phobos,
 we do it
 without looking at Tango.

You know, we might consider asking them for permission. That way, there should be no problems.

That can be done, to be sure, but we definitely can't just look at their code - or even API - and create something similar, and from what I recall, most cases of trying to get permission have been a problem (primarily due to there being multiple authors, I think). If there's only one author for the stream code in Tango, that would be easier. Regardless, the point is that we can't just go and look at the Tango API and use it to give ourselves ideas on what to do with Phobos.

I doubt the copyright law can protect API definitions. In that case projects like Wine (winehq.org) couldn't exist. What do you think?
Oct 13 2010
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, October 13, 2010 09:49:19 retard wrote:
 Tue, 12 Oct 2010 10:35:03 -0700, Jonathan M Davis wrote:
 On Tuesday, October 12, 2010 04:08:13 Simen kjaeraas wrote:
 Jonathan M Davis <jmdavisProg gmx.com> wrote:
 Except that copying Tango is taboo. We want to avoid any possible
 accusation of
 copying Tango's code or design. There have been issues in the past
 where Tango
 devs thought that we might be doing that, and we just don't want to
 risk any
 sort of problems with the Tango folks. So, whatever we put in Phobos,
 we do it
 without looking at Tango.

You know, we might consider asking them for permission. That way, there should be no problems.

That can be done, to be sure, but we definitely can't just look at their code - or even API - and create something similar, and from what I recall, most cases of trying to get permission have been a problem (primarily due to there being multiple authors, I think). If there's only one author for the stream code in Tango, that would be easier. Regardless, the point is that we can't just go and look at the Tango API and use it to give ourselves ideas on what to do with Phobos.

I doubt the copyright law can protect API definitions. In that case projects like Wine (winehq.org) couldn't exist. What do you think?

I wouldn't think that it would be a problem, but I'm no expert, and we've had problems in the past because Tango devs thought that proposed Phobos code was too similar to Tango. So, as I understand it, unless we get specific permission from the Tango devs which wrote a particular module, we're trying to not have code in Phobos which is an API which is at all close to Tango's. That way we can avoid potential conflicts with the Tango devs. - Jonathan M Davis
Oct 13 2010
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, October 13, 2010 10:37:43 klickverbot wrote:
 On 10/13/10 6:56 PM, Jonathan M Davis wrote:
 I wouldn't think that it would be a problem, but I'm no expert, and we've
 had problems in the past because Tango devs thought that proposed Phobos
 code was too similar to Tango. So, as I understand it, unless we get
 specific permission from the Tango devs which wrote a particular module,
 we're trying to not have code in Phobos which is an API which is at all
 close to Tango's. That way we can avoid potential conflicts with the
 Tango devs.
 - Jonathan M Davis

We had this over and over again, but I still think it should be noted that the disaster around SOHO's code was not entirely made up by »the Tango devs«, but originated from a single developer's phone call to Walter Bright and was then exaggerated by large parts of the D community, including both »sides« – your statement(s) makes it look a bit as if it was all Tango at fault there…

I never said "the" Tango devs, just Tango devs, so I'm not claiming anything about all Tango devs. I'm not even really saying whether code was or wasn't copied, but there are Tango devs who are very sensitive to anything that looks like it might have been copied from Tango, and we want to avoid any misunderstandings or issues that could arise from any Tango dev thinking that we're swiping their code. So, we avoid doing anything that even looks similar, and many of us never look at the Tango API at all, let alone the code. Whether any copying of any kind has ever taken place is irrelevant at this point. What matters is that we don't want to cause issues between the Phobos and Tango folks, so we need to generally avoid anything that makes it look like we might be swiping anything from Tango. - Jonathan M Davis
Oct 13 2010
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Wed, 13 Oct 2010 20:55:04 +0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 10/13/10 11:16 CDT, Denis Koroskin wrote:
 On Wed, 13 Oct 2010 18:32:15 +0400, Andrei Alexandrescu
 So far so good. I will point out, however, that the classic read/write
 routines are not all that good. For example if you want to implement a
 line-buffered stream on top of a block-buffered stream you'll be
 forced to write inefficient code.

Never heard of filesystems that allow reading files in lines - they always read in blocks, and that's what streams should do.

http://www.gnu.org/s/libc/manual/html_node/Buffering-Concepts.html I don't think streams must mimic the low-level OS I/O interface.

I in contrast think that Streams should be a lowest-level possible platform-independent abstraction. No buffering besides what an OS provides, no additional functionality. If you need to be able to read something up to some character (besides, what should be considered a new-line separator: \r, \n, \r\n?), this should be done manually in "byLine".
 That's because
 most of the streams are binary streams, and there is no such thing as a
 "line" in them (e.g. how often do you need to read a line from a
 SocketStream?).

http://www.opengroup.org/onlinepubs/009695399/functions/isatty.html

These are special cases I don't like. There is no such thing in Windows anyway.
 You need a line when e.g. you parse a HTML header or a email header or  
 an FTP response. Again, if at a low level the transfer occurs in blocks,  
 that doesn't mean the API must do the same at all levels.

BSD sockets transmit in blocks. If you need to find a special sequence in a socket stream, you are forced to fetch a chunk and manually search for the needed sequence. My position is that you should do it with an external predicate (e.g. read until whitespace).
 I don't think streams should buffer anything either (what an underlying
 OS I/O API caches should suffice), buffered streams adapters can do that
 in a stream-independent way (why duplicate code when you can do that as
 efficiently with external methods?).

Most OS primitives don't give access to their own internal buffers. Instead, they ask user code to provide a buffer and transfer data into it.

Right. This is why Stream may not cache.
 So clearly buffering on the client side is a must.

I don't see how that is implied by the above.
 Besides, as you noted, the buffering is redundant for byChunk/byLine
 adapter ranges. It means that byChunk/byLine should operate on
 unbuffered streams.

Chunks keep their own buffer so indeed they could operate on streams that don't do additional buffering. The story with lines is a fair amount more complicated if it needs to be done efficiently.

Yes. But line-reading is a case that I don't see a need to be handled specially.
 I'll explain my I/O streams implementation below in case you didn't read
 my message (I've changed some stuff a little since then).

Honest, I opened it to remember to read it but somehow your fonts are small and make my eyes hurt.
 My Stream
 interface is very simple:

 // A generic stream
 interface Stream
 {
 @property InputStream input();
 @property OutputStream output();
 @property SeekableStream seekable();
 @property bool endOfStream();
 void close();
 }

 You may ask, why separate Input and Output streams?

I think my first question is: why doesn't Stream inherit InputStream and OutputStream? My hypothesis: you want to sometimes return null. Nice.

Right.
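
To make the "return null" idea concrete, here is a minimal sketch of how a read-only stream could expose its capabilities. It is only an illustration under assumptions: the interfaces are trimmed to the methods needed here, and `MemoryReader` is a hypothetical class invented for the example, not part of my library.

```d
// Trimmed-down versions of the interfaces discussed in this thread.
interface InputStream  { size_t read(ubyte[] buffer); }
interface OutputStream { size_t write(const(ubyte)[] buffer); }

interface Stream
{
    @property InputStream input();   // null when the stream is not readable
    @property OutputStream output(); // null when the stream is not writable
}

// Hypothetical read-only stream over an in-memory array.
class MemoryReader : Stream, InputStream
{
    private const(ubyte)[] data;

    this(const(ubyte)[] data) { this.data = data; }

    @property InputStream input() { return this; }
    @property OutputStream output() { return null; } // writing is unsupported

    // Copies up to buffer.length bytes and returns how many were copied.
    size_t read(ubyte[] buffer)
    {
        size_t n = buffer.length < data.length ? buffer.length : data.length;
        buffer[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}
```

A caller checks `stream.output is null` once, up front, instead of discovering at write time that the operation is unsupported.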
 Well, that's because
 you either read from them, write to them, or both.
 Some streams are read-only (think Stdin), some write-only (Stdout), some
 support both, like FileStream. Right?

Sounds good. But then where's flush()? Must be in OutputStream.

That's probably because unbuffered streams don't need them.
 Not exactly. Does FileStream support writing when you open file for
 reading? Does it support reading when you open for writing?
 So, you may or may not read from a generic stream, and you also may or
 may not write to a generic stream. With a design like that you can't make
 a mistake: if a stream isn't readable, you have no reference to invoke
 read() method on.

That is indeed pretty nifty. I hope you would allow us to copy that feature in Phobos (unless you are considering submitting your library wholesale). Let me know.

Would love to contribute with design and implementation.
 Similarly, a stream is either seekable, or not. SeekableStreams allow
 stream cursor manipulation:

 interface SeekableStream : Stream
 {
 long getPosition(Anchor whence = Anchor.begin);
 void setPosition(long position, Anchor whence = Anchor.begin);
 }

Makes sense. Why is getPosition signed? Why do you need an anchor for getPosition?

long is chosen to be consistent with setPosition. Also, getPosition may return a negative value:

    long pos = getPosition(Anchor.end); // how far is it till file end?

This is also how you can get the file size (you need to negate it, though). It is consistent with setPosition:

    setPosition(getPosition(anchor), anchor); // a no-op for any kind of anchor

I just thought, why not? I'm okay with dropping it, but I find it nice.
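
A small sketch of the anchor arithmetic, in case it helps. `Cursor` is a hypothetical stand-in for a seekable stream of known length, and `Anchor.current` is an assumption of this sketch (only `begin` and `end` are mentioned above).

```d
enum Anchor { begin, current, end }

// Hypothetical cursor over a stream of known length; not the real SeekableStream.
struct Cursor
{
    long length; // total size of the stream in bytes
    long offset; // absolute offset from the beginning

    // Position relative to the given anchor; negative relative to the end.
    long getPosition(Anchor whence = Anchor.begin) const
    {
        final switch (whence)
        {
            case Anchor.begin:   return offset;
            case Anchor.current: return 0;
            case Anchor.end:     return offset - length;
        }
    }

    void setPosition(long position, Anchor whence = Anchor.begin)
    {
        final switch (whence)
        {
            case Anchor.begin:   offset = position;          break;
            case Anchor.current: offset += position;         break;
            case Anchor.end:     offset = length + position; break;
        }
    }
}
```

With the cursor at the beginning, `-getPosition(Anchor.end)` is the stream size, and `setPosition(getPosition(a), a)` leaves the offset unchanged for every anchor `a`.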
 InputStream doesn't really have many methods:

 interface InputStream
 {
 // reads up to buffer.length bytes from a stream
 // returns number of bytes read
 // throws on error
 size_t read(ubyte[] buffer);

That makes implementation of line buffering inefficient :o).

There is no way you can do it more efficiently on Windows. Fetch a chunk; search for a line end; found ? return : continue.
 // reads from current position
 AsyncReadRequest readAsync(ubyte[] buffer, Mailbox* mailbox = null);
 }

Why doesn't Sean's concurrency API scale for your needs? Can that be fixed? Would you consider submitting some informed bug reports?

It's rather a design issue than a bug on its own. I'll write a separate letter on that.
 So is OutputStream:

 interface OutputStream
 {
 // returns number of bytes written
 // throws on error
 size_t write(const(ubyte)[] buffer);

 // writes from current position
 AsyncWriteRequest writeAsync(const(ubyte)[] buffer, Mailbox* mailbox =
 null);
 }

 They basically support only reading and writing in blocks, nothing else.

I'm surprised there's no flush().

No buffering - no flush.
 However, they support asynchronous reads/writes, too (think of mailbox
 as a std.concurrency's Tid).

 Unlike Daniel's proposal, my design reads up to buffer size bytes for
 two reasons:
 - it avoids potential buffering and multiple sys calls

But there's a problem. It's very rare that the user knows what a good buffer size is. And often there are size and alignment restrictions at the low level.

I agree, but he can guess. Or a library can give him a hint. E.g. BUFFER_SIZE is a good buffer size to start with :)
 So somewhere there is still buffering going on, and also there are  
 potential inefficiencies (if a user reads small buffers).

 - it is the only way to go with SocketStreams. I mean, you often don't
 know how many bytes an incoming socket message contains. You either have
 to read it byte-by-byte, or your application might stall for potentially
 infinite time (if message was shorter than your buffer, and no more
 messages are being sent)

But if you don't know how many bytes are in an incoming socket message, a better design is to do this: void read(ref ubyte[] buffer);

That could work, too.
 and resize the buffer to accommodate the incoming packet. Your design  
 _imposes_ that the socket does additional buffering.

The socket API does it anyway. I just don't complicate it even further by providing an additional layer of buffering.
 Why do my streams provide async methods? Because it's the modern
 approach to I/O - blocking I/O (aka one thread per client) doesn't
 scale. E.g. Java adds a second revision of Async I/O API in JDK7 (called
 NIO2, first appeared in February, 2002), C# has asynchronous operations
 as part of their Stream interface since .NET 1.1 (April, 2003).

Async I/O is nice, no two ways about that. I have on my list to define byChunkAsync that works exactly like byChunk from the client's perspective, except it does I/O concurrently with client code. [snip]
 I strongly believe we shouldn't ignore this type of API.

 P.S. For threads this deep it's better fork a new one, especially when
 changing the subject.

I thought I did by changing the title... Andrei

No, changing the title isn't enough.
Oct 13 2010
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Wed, 13 Oct 2010 20:55:04 +0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 Why doesn't Sean's concurrency API scale for your needs? Can that be  
 fixed? Would you consider submitting some informed bug reports?

Okay, now I have some extra time, so I can share what I missed in std.concurrency that made me implement my own message passing API:

1) It is next to impossible to create and use multiple message boxes, which is a big pain when writing library code. std.concurrency operates on Tid, and you can't create new Tids (the ctor is private). Tid is a very simple wrapper around a MessageBox class, and while you can create custom MessageBoxes, none of the public std.concurrency APIs work with them directly. As such you are stuck with one Tid per thread, and that's a no-go for me. E.g. I have hundreds of concurrent socket connections, and I'd like to have different event handlers for different event sources (i.e. different SocketStreams).

2) Even if you are able to create N Tids, this is how event handling occurs:

    for (int i = 0; i < messageBoxes.length; ++i) {
        messageBoxes[i].tid.receiveTimeout(0 /* no timeout, blocking is not allowed */, messageBoxes[i].callback);
    }

This doesn't scale well. With hundreds of message boxes, this loop will consume all the CPU time. Besides, the callback must be defined as "void delegate(Variant)" and loses all the type information. It just throws all the events into the same bag. That's not okay for me.

3) I want to bind callbacks to event types in one place, and poll for events in another one. E.g. instead of:

    void foo(FooMessage message) { ... }
    ...
    tid.receiveTimeout(0, &foo, &bar, &baz);

I'd want to be able to

    tid.register(&foo);
    tid.register(&bar);
    tid.register(&baz);
    ...
    tid.poll(0 /* no wait */);

4) Event chaining is impossible to achieve. This is mostly because of #3. Here is a more concrete example: I want to receive an incoming socket message notification, parse that message (extract http headers/contents etc.) and then possibly dispatch a new event. All this needs to be transparent to the user.
E.g.:

    HttpConnection connection = new HttpConnection("google.com"); // HttpConnection is needed to send multiple http requests over the same socket connection
    HttpRequest request = new HttpRequest("/"); // main page
    connection.execute(request, tid); // start

HttpConnection is using SocketStream under the hood. When you call connection.execute(request), it connects to

    tid.receiveOnly( (HttpResponse response) { writeln(response.contents); } );

With std.concurrency this is impossible to implement. The problem is that the HttpResponse event is never sent, because the SocketStream message is never received, because it is never polled for. The following could improve the situation:

    class HttpConnection
    {
        void execute(HttpRequest request)
        {
            ...
            tid.register(&onNewMessage);
            ...
        }
    }

    tid.register(&onHttpResponse);
    tid.poll(); // now it polls for both messages!

When a new message arrives, control is passed to HttpConnection, which passes it on to HttpRequest, which parses the socket message and generates an HttpResponse event, which is then received by the user.

5) std.concurrency doesn't know about ThreadPools, and doesn't allow event processing in threads other than the current one. This prevents code parallelization.

6) Tid can't redirect events to other Tids. Here is an example:

    void onNewEvent(Event e)
    {
        writeln(e.toString());
    }

    Mailbox m1;
    m1.register(&onNewEvent);

    Mailbox m2 = Mailbox(&m1); // m1 is now a parent to m2
    m2.raiseEvent(new Event()); // redirects to m1

    m1.poll(INFINITE); // triggers event handling

This is useful when you have hundreds of mailboxes. Just poll one and all the events will be triggered.

That's pretty much all that I needed (and my Mailbox provides) but std.concurrency lacks. My mailbox implementation is very, very slim; the full source code is available here: http://bitbucket.org/korDen/io/src/tip/io/mailbox.d
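
To give an idea of the register/poll/redirect behaviour from points 3 and 6, here is a rough single-threaded sketch. It is not the code behind the link above: this `Mailbox` is a class rather than a struct, events are plain objects, `poll()` takes no timeout and never blocks, and there is no locking, all of which are simplifying assumptions for illustration.

```d
// Hypothetical event type used only for this sketch.
class Event
{
    string text;
    this(string text) { this.text = text; }
}

class Mailbox
{
    private Mailbox parent;                          // events are forwarded here if set
    private void delegate(Object)[string] handlers;  // keyed by class name
    private Object[] queue;                          // events waiting to be polled

    this(Mailbox parent = null) { this.parent = parent; }

    // Binds a callback to an event type; can happen far away from poll().
    void register(T : Object)(void delegate(T) handler)
    {
        handlers[typeid(T).name] = (Object o) { handler(cast(T) o); };
    }

    // Queues an event, or redirects it to the parent mailbox.
    void raiseEvent(Object event)
    {
        if (parent !is null)
            parent.raiseEvent(event);
        else
            queue ~= event;
    }

    // Dispatches all queued events; returns how many had a registered handler.
    size_t poll()
    {
        auto pending = queue;
        queue = null;
        size_t handled = 0;
        foreach (event; pending)
        {
            if (auto h = typeid(event).name in handlers)
            {
                (*h)(event);
                ++handled;
            }
        }
        return handled;
    }
}
```

With `m2` created as `new Mailbox(m1)`, an event raised on `m2` is only delivered when `m1.poll()` runs, so a single poll can serve any number of child mailboxes.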
Oct 13 2010
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 14 Oct 2010 00:19:45 +0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 10/13/10 14:02 CDT, Denis Koroskin wrote:
 On Wed, 13 Oct 2010 20:55:04 +0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 http://www.gnu.org/s/libc/manual/html_node/Buffering-Concepts.html

 I don't think streams must mimic the low-level OS I/O interface.

I in contrast think that Streams should be a lowest-level possible platform-independent abstraction. No buffering besides what an OS provides, no additional functionality. If you need to be able to read something up to some character (besides, what should be considered a new-line separator: \r, \n, \r\n?), this should be done manually in "byLine".

This aggravates client code for the sake of simplicity in a library that was supposed to make streaming easy. I'm not seeing progress.

This library code needs to be put somewhere. I just believe it belongs in a line reader, not in a generic stream. By putting line reading into a stream interface, you won't make it more efficient.
 That's because
 most of the streams are binary streams, and there is no such thing as a
 "line" in them (e.g. how often do you need to read a line from a
 SocketStream?).

http://www.opengroup.org/onlinepubs/009695399/functions/isatty.html

These are special cases I don't like. There is no such thing in Windows anyway.

I didn't say I like them. Windows has _isatty: http://msdn.microsoft.com/en-us/library/f4s0ddew(v=VS.80).aspx

I stand corrected. Windows pretends to be Posix compliant, yes, but that's a sad story to tell. I don't see why would
 You need a line when e.g. you parse a HTML header or a email header or
 an FTP response. Again, if at a low level the transfer occurs in
 blocks, that doesn't mean the API must do the same at all levels.

BSD sockets transmit in blocks. If you need to find a special sequence in a socket stream, you are forced to fetch a chunk and manually search for the needed sequence. My position is that you should do it with an external predicate (e.g. read until whitespace).

Problem is how you set up interfaces to avoid inefficiencies and contortions in the client.
 I don't think streams should buffer anything either (what an  
 underlying
 OS I/O API caches should suffice), buffered streams adapters can do  
 that
 in a stream-independent way (why duplicate code when you can do that  
 as
 efficiently with external methods?).

Most OS primitives don't give access to their own internal buffers. Instead, they ask user code to provide a buffer and transfer data into it.

Right. This is why Stream may not cache.

This is a big misunderstanding. If the interface is: size_t read(byte[] buffer); then *I*, the client, need to provide the buffer. It's in client space. This means willing or not I need to do buffering, regardless of whatever internal buffering is going on under the wraps.

Use BufferedStream adapter if you need buffering, and raw streams if you do the buffering manually. That's the way it's implemented in C#, Java, Tango and many many other APIs.
 So clearly buffering on the client side is a must.

I don't see how that follows from the above.

Please implement an abstraction that given this: interface InputStream { size_t read(ubyte[] buf); } defines a line reader.

I thought we agreed that byLine/byChunk need to do buffering manually anyway.

    class ByLine
    {
        ubyte[] nextLine()
        {
            ubyte[BUFFER_SIZE] buffer;
            while (!inputStream.endOfStream())
            {
                size_t bytesRead = inputStream.read(buffer);
                foreach (i, ubyte c; buffer[0..bytesRead])
                {
                    if (c != '\n')
                    {
                        continue;
                    }

                    appender.put(buffer[0..i]);
                    ubyte[] line = appender.data.dup();
                    appender.reset();

                    // keep only the bytes actually read after the newline
                    appender.put(buffer[i+1..bytesRead]);
                    return line;
                }

                appender.put(buffer[0..bytesRead]);
            }

            // end of stream: return whatever has been collected
            ubyte[] line = appender.data.dup();
            appender.reset();
            return line;
        }

        InputStream inputStream;
        Appender!(ubyte[]) appender;
    }

(I've skipped the range interface for the sake of simplicity and replaced it with a nextLine() function. I also don't remember the proper appender interface, so I've used imaginary function names.)

Once again, what's the point of byLine, if all it does is call stream.readLine()?

That's moving code from one place to many unrelated ones. I don't agree with that.

I'm not convinced we need a line-based API at the core stream level. I don't think we need to sacrifice performance in the general case in order to avoid a performance hit in a special case. Who even told you it will be any less efficient that way?
 Andrei

Oct 13 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
Before responding directly, I'll say I think this is on the right track.   
IMO, buffering should be transparent when it can be, meaning you should be  
able to have control over the buffering.  The abstraction should look like  
this:

specific application -> buffered streams -> OS abstraction -> Low level  
calls.

Denis' proposal covers up to OS abstraction.  What we need on top of that  
is a buffer layer.

On Wed, 13 Oct 2010 12:16:38 -0400, Denis Koroskin <2korden gmail.com>  
wrote:

 I'll explain my I/O streams implementation below in case you didn't read  
 my message (I've changed some stuff a little since then). My Stream  
 interface is very simple:

 // A generic stream
 interface Stream
 {
      @property InputStream input();
      @property OutputStream output();
      @property SeekableStream seekable();
      @property bool endOfStream();
      void close();
 }

 You may ask, why separate Input and Output streams? Well, that's because  
 you either read from them, write to them, or both.
 Some streams are read-only (think Stdin), some write-only (Stdout), some  
 support both, like FileStream. Right?

I feel we can possibly make this a compile-time decision. Can we do something like this: interface Stream : InputStream, OutputStream {}
 Not exactly. Does FileStream support writing when you open file for  
 reading? Does it support reading when you open for writing?
 So, you may or may not read from a generic stream, and you also may or  
 may not write to a generic stream. With a design like that you can make  
 a mistake: if a stream isn't readable, you have no reference to invoke  
 read() method on.

Essentially, I almost never decide at runtime whether I'm opening a file for reading, writing or both. So why not build that into the type and let the compiler tell us when something can't be used for reading or writing?
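A rough sketch of what "build it into the type" could look like, assuming the interface Stream : InputStream, OutputStream {} split suggested above. MemoryReader, MemoryWriter and copyAll are hypothetical names used only for illustration; the static asserts show that misuse is rejected at compile time rather than at run time.

```d
interface InputStream  { size_t read(ubyte[] buffer); }
interface OutputStream { size_t write(const(ubyte)[] data); }

// A stream opened for both reading and writing satisfies both contracts.
interface Stream : InputStream, OutputStream {}

// Hypothetical read-only stream.
final class MemoryReader : InputStream
{
    private const(ubyte)[] data;
    this(const(ubyte)[] data) { this.data = data; }

    size_t read(ubyte[] buffer)
    {
        immutable n = buffer.length < data.length ? buffer.length : data.length;
        buffer[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Hypothetical write-only stream.
final class MemoryWriter : OutputStream
{
    ubyte[] written;

    size_t write(const(ubyte)[] data)
    {
        written ~= data;
        return data.length;
    }
}

// Copies src to dst; the signature itself states which side is read from.
size_t copyAll(InputStream src, OutputStream dst)
{
    ubyte[256] chunk;
    size_t total;
    for (;;)
    {
        immutable n = src.read(chunk[]);
        if (n == 0)
            break;
        total += dst.write(chunk[0 .. n]);
    }
    return total;
}

// Passing a write-only stream where a readable one is required does not compile,
// and a read-only stream has no write() method at all.
static assert(!__traits(compiles, copyAll(new MemoryWriter, new MemoryWriter)));
static assert(!__traits(compiles, (new MemoryReader(null)).write(null)));
```

The cost of this design is that the open mode must be known when the type is chosen, which is exactly the point being debated in the thread.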
 Similarly, a stream is either seekable, or not. SeekableStreams allow  
 stream cursor manipulation:

 interface SeekableStream : Stream
 {
      long getPosition(Anchor whence = Anchor.begin);
      void setPosition(long position, Anchor whence = Anchor.begin);
 }

A seekable interface is one of those things that's really hard to get right. In Tango, we eventually got rid of the seekable interface and just added seek methods to all the low level stream interfaces. The rationale is that most of the time seekability is not a requirement you can set when opening a file. You open a file for reading or writing, but not for seeking. So it's almost necessary that seekability is a runtime decision (because the OS decides it outside of your control). There are, of course, streams that will not be seekable (network sockets), but you just throw an exception when seeking such a stream. The only thing I'd create is a way to determine seekability without throwing an exception (i.e. a canSeek property). But most of the time you know whether a stream is seekable without having to check.
 InputStream doesn't really have many methods:

 interface InputStream
 {
 	// reads up to buffer.length bytes from a stream
 	// returns number of bytes read
 	// throws on error
 	size_t read(ubyte[] buffer);

 	// reads from current position
 	AsyncReadRequest readAsync(ubyte[] buffer, Mailbox* mailbox = null);
 }

I'd say void[] is better here, since you aren't creating the buffer, you're accepting it. Using ubyte makes for awkward casts when you are reading binary data into specific structures. ditto for OutputStream. -Steve
Oct 13 2010
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 14 Oct 2010 01:57:56 +0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 10/13/10 16:32 CDT, Steven Schveighoffer wrote:
 [snip]

 All good points.

 interface InputStream
 {
 // reads up to buffer.length bytes from a stream
 // returns number of bytes read
 // throws on error
 size_t read(ubyte[] buffer);

 // reads from current position
 AsyncReadRequest readAsync(ubyte[] buffer, Mailbox* mailbox = null);
 }

I'd say void[] is better here, since you aren't creating the buffer, you're accepting it. Using ubyte makes for awkward casts when you are reading binary data into specific structures. ditto for OutputStream.

Well casting from void[] is equally awkward isn't it? I'm still undecided on which is better. Andrei

I prefer ubyte[] because that helps GC (void arrays are scanned for pointers). Besides, ubyte[] is just a sequence of bytes and that's exactly what's being read from a stream.
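To make the two positions concrete, here is a small sketch of reading a fixed-size binary structure both ways. Header, MemoryInput, readHeader and readRaw are hypothetical names chosen for the example; the only interface taken from the thread is size_t read(ubyte[] buffer).

```d
interface InputStream
{
    size_t read(ubyte[] buffer);
}

// Hypothetical in-memory stream so the sketch can run without a file.
final class MemoryInput : InputStream
{
    private const(ubyte)[] data;
    this(const(ubyte)[] data) { this.data = data; }

    size_t read(ubyte[] buffer)
    {
        immutable n = buffer.length < data.length ? buffer.length : data.length;
        buffer[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// A plain value type with no pointers inside.
struct Header
{
    uint magic;
    ushort ver;
    ushort flags;
}

// ubyte[] style: the caller spells out the reinterpretation with a cast.
bool readHeader(InputStream stream, ref Header header)
{
    auto bytes = (cast(ubyte*) &header)[0 .. Header.sizeof];
    return stream.read(bytes) == Header.sizeof;
}

// void[] style: any T[] converts implicitly, so the call site needs no cast.
// The cast moves inside the helper instead.
size_t readRaw(InputStream stream, void[] buffer)
{
    return stream.read(cast(ubyte[]) buffer);
}
```

With the second form a call looks like readRaw(stream, (&header)[0 .. 1]). The convenience comes from the implicit T[] to void[] conversion, which applies to arrays of references just as readily as to arrays of plain values.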
Oct 13 2010
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 14 Oct 2010 02:01:24 +0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 10/13/10 16:05 CDT, Denis Koroskin wrote:
 On Thu, 14 Oct 2010 00:19:45 +0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

To substantiate my brief answer:
 This library code needs to be put somewhere. I just believe it belongs
 to line-reader, not a generic stream. By putting line reading into a
 stream interface, you want make it more efficient.

I assume you meant "won't" instead of "want". So here you're saying that line-oriented I/O does not belong in the interface because it won't make things more efficient. But then your line reading code is extremely efficient by using the interface you yourself embraced.

By adding readln() to the stream interface you will only move that code from ByLine to the Stream implementation. The code would still be the same. How can you make it any more efficient? I've read the fgets() source code that comes with the Microsoft CRT, and it does exactly what I did (i.e. fill a buffer, read byte-by-byte, copy to the output string). It also puts restrictions on line size while I didn't (that's why I had to use an Appender). I also did a line copy (dup) so that I could return an immutable string. You see, it's not the Stream interface that makes that code less efficient, it's the additional functionality it provides over the C API.
 Andrei

 P.S. I think I figured the issue with your fonts: the header  
 Content-type contains "charset=KOI8-R". That charset propagates through  
 all responses. Does anyone know how I can ignore it?

I've changed that to utf-8. Did it help?
Oct 13 2010
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 14 Oct 2010 02:38:11 +0400, Sean Kelly <sean invisibleduck.org>  
wrote:

 Denis Koroskin Wrote:
 I prefer ubyte[] because that helps GC (void arrays are scanned for
 pointers).

To be fair, the only thing that matters here is what the type is when the initial "new" occurs. After that, I think bits are preserved for reallocations so if NO_SCAN is set then it will remain.

It also matters when I dup it. Even if you preallocate void[] with NO_SCAN, dup'ing it will reset the flag.
Oct 13 2010
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 14 Oct 2010 03:06:30 +0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 10/13/10 17:23 CDT, Denis Koroskin wrote:
 On Thu, 14 Oct 2010 02:01:24 +0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 10/13/10 16:05 CDT, Denis Koroskin wrote:
 On Thu, 14 Oct 2010 00:19:45 +0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

To substantiate my brief answer:
 This library code needs to be put somewhere. I just believe it belongs
 to line-reader, not a generic stream. By putting line reading into a
 stream interface, you want make it more efficient.

I assume you meant "won't" instead of "want". So here you're saying that line-oriented I/O does not belong in the interface because it won't make things more efficient. But then your line reading code is extremely efficient by using the interface you yourself embraced.

By adding readln() to the stream interface you will only move that code from ByLine to the Stream implementation. The code would still be the same. How can you make it any more efficient? I've read the fgets() source code that comes with the Microsoft CRT, and it does exactly what I did (i.e. fill a buffer, read byte-by-byte, copy to the output string). It also puts restrictions on line size while I didn't (that's why I had to use an Appender). I also did a line copy (dup) so that I could return an immutable string. You see, it's not the Stream interface that makes that code less efficient, it's the additional functionality it provides over the C API.

Gnu offers two specialized routines: http://www.gnu.org/s/libc/manual/html_node/Line-Input.html. It is many times more efficient than anything that can be done in client code using the stdio API. I'm thinking along those lines.

I can easily implement similar interface on top of chunked read: ubyte[] readLine(ubyte[] lineBuffer); or bool readLine(ref ubyte[] lineBuffer); I've quickly looked through an implementation, too, and it's still filling a buffer first, and then copying character byte-by-byte to the output string (making realloc when needed) until a delimiter is found. It is exactly as efficient as implemented externally. It does the same amount of copying and memory allocations. "Many times more efficient" is just an overestimation. BTW, did you see my message about std.concurrency?
 Andrei

 P.S. I think I figured the issue with your fonts: the header
 Content-type contains "charset=KOI8-R". That charset propagates
 through all responses. Does anyone know how I can ignore it?

I've changed that to utf-8. Did it help?

Yes, looking great. Thanks! Andrei

Oct 13 2010
prev sibling next sibling parent s50 <shinji.igarashi gmail.com> writes:
I think st_blksize is often used to determine the buffer size for I/O.
Calling an I/O syscall many times causes inefficiency.
Buffered read (lookahead) is essentially a gamble.
Reading st_blksize bytes in one call is usually faster than
reading the same amount of data in several smaller syscalls.
But what if you want to read less than st_blksize in the first place?
And what if you then want to seek far ahead in the file from there?
I think no buffering strategy works well in all use cases.
So I believe the library should accept some hints from client code.

Oct 13 2010
prev sibling next sibling parent s50 <shinji.igarashi gmail.com> writes:
I think copy-less interfaces can basically be implemented on
top of chunked (or block-oriented) reads (or buffers) without extra
inefficiencies.
Recently on the Phobos ML, someone suggested an archetypal
buffering interface that allows us to build upper layers which have
copy-less interfaces. If it were polished more, we could probably
find a point of compromise.
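One way to picture such a copy-less layer, purely as an illustration: the buffer owns the bytes and hands out slices of itself, so a line reader built on top never copies into a caller-supplied array. LineBuffer and MemoryInput are hypothetical names; this is not the interface proposed on the Phobos list, only a sketch under the assumption that the returned slice stays valid until the next call.

```d
import core.stdc.string : memmove;

interface InputStream
{
    size_t read(ubyte[] buffer);
}

// Hypothetical in-memory stream so the sketch can run without a file.
final class MemoryInput : InputStream
{
    private const(ubyte)[] data;
    this(const(ubyte)[] data) { this.data = data; }

    size_t read(ubyte[] buffer)
    {
        immutable n = buffer.length < data.length ? buffer.length : data.length;
        buffer[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

struct LineBuffer
{
    private InputStream source;
    private ubyte[] buf;
    private size_t start, end; // pending bytes are buf[start .. end]

    this(InputStream source, size_t capacity = 4096)
    {
        assert(capacity > 0);
        this.source = source;
        buf = new ubyte[capacity];
    }

    // Returns the next line (without '\n') as a slice of the internal buffer.
    // The slice is only valid until the next call. Returns null at end of input.
    const(ubyte)[] nextLine()
    {
        size_t scanned = 0; // pending bytes already searched for '\n'
        for (;;)
        {
            auto window = buf[start .. end];
            foreach (i; scanned .. window.length)
            {
                if (window[i] == '\n')
                {
                    start += i + 1;
                    return window[0 .. i];
                }
            }
            scanned = window.length;
            if (!fill())
            {
                // End of input: hand out the unterminated tail once, then null.
                auto rest = buf[start .. end];
                start = end;
                return rest.length ? rest : null;
            }
        }
    }

    // Slides pending bytes to the front, grows if full, then reads more.
    private bool fill()
    {
        if (start > 0)
        {
            memmove(buf.ptr, buf.ptr + start, end - start);
            end -= start;
            start = 0;
        }
        if (end == buf.length)
            buf.length = buf.length * 2; // line longer than the buffer
        immutable n = source.read(buf[end .. $]);
        end += n;
        return n != 0;
    }
}
```

The buffering has not gone away; it has moved into one reusable layer. The only copies are the stream filling the buffer and the occasional memmove when a line straddles a refill.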

2010/10/14 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:
 On 10/13/2010 06:23 PM, Denis Koroskin wrote:
 On Thu, 14 Oct 2010 03:06:30 +0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Gnu offers two specialized routines:
 http://www.gnu.org/s/libc/manual/html_node/Line-Input.html. It is many
 times more efficient than anything that can be done in client code
 using the stdio API. I'm thinking along those lines.

I can easily implement similar interface on top of chunked read: ubyte[] readLine(ubyte[] lineBuffer); or bool readLine(ref ubyte[] lineBuffer);

You can't.
 I've quickly looked through an implementation, too, and it's still
 filling a buffer first, and then copying character byte-by-byte to the
 output string (making realloc when needed) until a delimiter is found.
 It is exactly as efficient as implemented externally.

Except you don't have an interface to copy byte by byte. Oops...
 It does the same
 amount of copying and memory allocations. "Many times more efficient" is
 just an overestimation.

It's not. I measured because it was important in an application I was working on. It's shocking how some seemingly minor changes can make a big difference in throughput.
 BTW, did you see my message about std.concurrency?

Yes, but I'll need to leave the bulk of it to Sean. Thanks. Andrei

Oct 13 2010
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 14 Oct 2010 03:47:12 +0400, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 On 10/13/2010 06:23 PM, Denis Koroskin wrote:
 On Thu, 14 Oct 2010 03:06:30 +0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Gnu offers two specialized routines:
 http://www.gnu.org/s/libc/manual/html_node/Line-Input.html. It is many
 times more efficient than anything that can be done in client code
 using the stdio API. I'm thinking along those lines.

I can easily implement similar interface on top of chunked read: ubyte[] readLine(ubyte[] lineBuffer); or bool readLine(ref ubyte[] lineBuffer);

You can't.
 I've quickly looked through an implementation, too, and it's still
 filling a buffer first, and then copying character byte-by-byte to the
 output string (making realloc when needed) until a delimiter is found.
 It is exactly as efficient as implemented externally.

Except you don't have an interface to copy byte by byte. Oops...
 It does the same
 amount of copying and memory allocations. "Many times more efficient" is
 just an overestimation.

It's not. I measured because it was important in an application I was working on. It's shocking how some seemingly minor changes can make a big difference in throughput.
 BTW, did you see my message about std.concurrency?

Yes, but I'll need to leave the bulk of it to Sean. Thanks. Andrei

Okay. Now give me your best and tell me mine is slower (sorry for a lack of comments):

    enum BUFFER_SIZE = 16 * 1024;

    import core.stdc.stdio;
    import core.stdc.string;
    import core.memory;

    class InputStream
    {
        this(const char* fileName)
        {
            f = fopen(fileName, "r".ptr);
        }

        size_t read(ubyte[] buffer)
        {
            return .fread(buffer.ptr, 1, buffer.length, f);
        }

        FILE* f;
    }

    struct ByLine
    {
        this(InputStream inputStream, char delim = '\n')
        {
            this.inputStream = inputStream;
            this.delim = delim;
            this.ptr = this.end = buffer.ptr;
        }

        private void refill()
        {
            ptr = buffer.ptr;
            end = ptr + inputStream.read(buffer);
        }

        ubyte[] readLine(ubyte[] line)
        {
            if (ptr is null)
            {
                return null;
            }

            ubyte* lineStart = line.ptr;
            ubyte* linePtr = lineStart;
            ubyte* lineEnd = lineStart + line.length;

            while (true)
            {
                ubyte* pos = cast(ubyte*)memchr(ptr, delim, end - ptr);
                if (pos is null)
                {
                    // no delimiter in the buffered data: take all of it and refill
                    size_t size = end - ptr;
                    ubyte* newLinePtr = linePtr + size;
                    if (newLinePtr > lineEnd)
                    {
                        size_t offset = linePtr - lineStart;
                        lineStart = cast(ubyte*)GC.realloc(lineStart, newLinePtr - lineStart);
                        linePtr = lineStart + offset;
                        newLinePtr = linePtr + size;
                    }

                    memcpy(linePtr, ptr, size);
                    linePtr = newLinePtr;

                    refill();
                    if (ptr !is end)
                    {
                        continue;
                    }

                    // end of file
                    ptr = null;
                    return lineStart[0..linePtr - lineStart];
                }

                // delimiter found: copy up to and including it
                size_t size = pos - ptr + 1;
                ubyte* newLinePtr = linePtr + size;
                if (newLinePtr > lineEnd)
                {
                    size_t offset = linePtr - lineStart;
                    lineStart = cast(ubyte*)GC.realloc(lineStart, newLinePtr - lineStart);
                    linePtr = lineStart + offset;
                    newLinePtr = linePtr + size;
                }

                memcpy(linePtr, ptr, size);
                linePtr = newLinePtr;
                ptr = pos + 1;

                return lineStart[0..linePtr - lineStart];
            }
        }

        InputStream inputStream;
        ubyte* ptr;
        ubyte* end;
        ubyte[BUFFER_SIZE] buffer;
        int delim;
    }

    int main()
    {
        InputStream inputStream = new InputStream("very-large-file.txt");
        ubyte[] line = new ubyte[128];
        ByLine byLine = ByLine(inputStream);

        int numLines = 0;
        int numChars = 0;
        while (true)
        {
            line = byLine.readLine(line);
            if (line.ptr is null)
            {
                break;
            }

            numChars += line.length;
            numLines++;
        }

        printf("numLines: %d\n", numLines);
        printf("numChars: %d\n", numChars);

        return 0;
    }
Oct 13 2010
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 14 Oct 2010 06:36:17 +0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 10/13/10 21:20 CDT, Denis Koroskin wrote:
 On Thu, 14 Oct 2010 03:47:12 +0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 10/13/2010 06:23 PM, Denis Koroskin wrote:
 On Thu, 14 Oct 2010 03:06:30 +0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Gnu offers two specialized routines:
 http://www.gnu.org/s/libc/manual/html_node/Line-Input.html. It is  
 many
 times more efficient than anything that can be done in client code
 using the stdio API. I'm thinking along those lines.

I can easily implement similar interface on top of chunked read: ubyte[] readLine(ubyte[] lineBuffer); or bool readLine(ref ubyte[] lineBuffer);

You can't.
 I've quickly looked through an implementation, too, and it's still
 filling a buffer first, and then copying character byte-by-byte to the
 output string (making realloc when needed) until a delimiter is found.
 It is exactly as efficient as implemented externally.

Except you don't have an interface to copy byte by byte. Oops...
 It does the same
 amount of copying and memory allocations. "Many times more efficient"  
 is
 just an overestimation.

It's not. I measured because it was important in an application I was working on. It's shocking how some seemingly minor changes can make a big difference in throughput.
 BTW, did you see my message about std.concurrency?

Yes, but I'll need to leave the bulk of it to Sean. Thanks. Andrei

Okay. Now give me your best and tell me mine is slower (sorry for a lack of comments):

If you're satisfied with this, then my point has been lost in the midstream. I was saying it's impossible to implement a line reader on top of a read(ubyte[]) interface without extra buffering and copying. You provided a careful implementation that at the end of the day inevitably does the extra buffering and copying. Andrei

You must have missed it somehow, but I did say many times that buffering needs to be done externally (e.g. in byLine and BufferedStream). Can we outline basic Stream interface now so that we could move on?
Oct 13 2010
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 14 Oct 2010 06:53:57 +0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 Denis Koroskin Wrote:

 On Thu, 14 Oct 2010 02:38:11 +0400, Sean Kelly <sean invisibleduck.org>
 wrote:

 Denis Koroskin Wrote:
 I prefer ubyte[] because that helps GC (void arrays are scanned for
 pointers).

To be fair, the only thing that matters here is what the type is when the initial "new" occurs. After that, I think bits are preserved for reallocations so if NO_SCAN is set then it will remain.

It also matters when I dup it. Even if you preallocate void[] with NO_SCAN, dup'ing it will reset the flag.

Why would you be duping the buffer for reading or writing? -Steve

I could imagine idup'ing once a read is complete. Anyway, that was just a note.
Oct 13 2010
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 14 Oct 2010 01:32:49 +0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 Before responding directly, I'll say I think this is on the right  
 track.  IMO, buffering should be transparent when it can be, meaning you  
 should be able to have control over the buffering.  The abstraction  
 should look like this:

 specific application -> buffered streams -> OS abstraction -> Low level  
 calls.

 Denis' proposal covers up to OS abstraction.  What we need on top of  
 that is a buffer layer.

 On Wed, 13 Oct 2010 12:16:38 -0400, Denis Koroskin <2korden gmail.com>  
 wrote:

 I'll explain my I/O streams implementation below in case you didn't  
 read my message (I've changed some stuff a little since then). My  
 Stream interface is very simple:

 // A generic stream
 interface Stream
 {
      @property InputStream input();
      @property OutputStream output();
      @property SeekableStream seekable();
      @property bool endOfStream();
      void close();
 }

 You may ask, why separate Input and Output streams? Well, that's  
 because you either read from them, write to them, or both.
 Some streams are read-only (think Stdin), some write-only (Stdout),  
 some support both, like FileStream. Right?

I feel we can possibly make this a compile-time decision. Can we do something like this: interface Stream : InputStream, OutputStream {}
 Not exactly. Does FileStream support writing when you open file for  
 reading? Does it support reading when you open for writing?
 So, you may or may not read from a generic stream, and you also may or  
 may not write to a generic stream. With a design like that you can make  
 a mistake: if a stream isn't readable, you have no reference to invoke  
 read() method on.

Essentially, I almost never decide at runtime whether I'm opening a file for reading, writing or both. So why not build that into the type and let the compiler tell us when something can't be used for reading or writing?
 Similarly, a stream is either seekable, or not. SeekableStreams allow  
 stream cursor manipulation:

 interface SeekableStream : Stream
 {
      long getPosition(Anchor whence = Anchor.begin);
      void setPosition(long position, Anchor whence = Anchor.begin);
 }

A seekable interface is one of those things that's really hard to get right. In Tango, we eventually got rid of the seekable interface and just added seek methods to all the low level stream interfaces. The rationale is that most of the time seekability is not a requirement you can set when opening a file. You open a file for reading or writing, but not for seeking. So it's almost necessary that seekability is a runtime decision (because the OS decides it outside of your control). There are, of course, streams that will not be seekable (network sockets), but you just throw an exception when seeking such a stream. The only thing I'd create is a way to determine seekability without throwing an exception (i.e. a canSeek property). But most of the time you know whether a stream is seekable without having to check.

Essentially, there is no difference between bool canSeek and SeekableStream seekable(). However, it's almost always a good idea to check whether a stream actually supports seeking before doing so, and seekable() statically enforces you to do so.
 InputStream doesn't really have many methods:

 interface InputStream
 {
 	// reads up to buffer.length bytes from a stream
 	// returns number of bytes read
 	// throws on error
 	size_t read(ubyte[] buffer);

 	// reads from current position
 	AsyncReadRequest readAsync(ubyte[] buffer, Mailbox* mailbox = null);
 }

I'd say void[] is better here, since you aren't creating the buffer, you're accepting it. Using ubyte makes for awkward casts when you are reading binary data into specific structures. ditto for OutputStream. -Steve

Oct 13 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 13 Oct 2010 23:01:38 -0400, Denis Koroskin <2korden gmail.com>  
wrote:

 On Thu, 14 Oct 2010 06:53:57 +0400, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 Denis Koroskin Wrote:

 On Thu, 14 Oct 2010 02:38:11 +0400, Sean Kelly <sean invisibleduck.org>
 wrote:

 Denis Koroskin Wrote:
 I prefer ubyte[] because that helps GC (void arrays are scanned for
 pointers).

To be fair, the only thing that matters here is what the type is when the initial "new" occurs. After that, I think bits are preserved for reallocations so if NO_SCAN is set then it will remain.

It also matters when I dup it. Even if you preallocate void[] with NO_SCAN, dup'ing it will reset the flag.

Why would you be duping the buffer for reading or writing? -Steve

I could imagine idup'ing once a read is complete.

But you don't *create* the buffer as a void[], you just *pass* it as a void[]. When you dup it outside of read, it's back to the type it actually is. As long as read/write don't dup the buffer you are OK. -Steve
Oct 14 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 13 Oct 2010 18:21:16 -0400, bearophile <bearophileHUGS lycos.com>  
wrote:

 Andrei:

 Well casting from void[] is equally awkward isn't it? I'm still
 undecided on which is better.

See also: http://d.puremagic.com/issues/show_bug.cgi?id=4572 Bye, bearophile

That issue is slightly different because std.file.read actually creates the buffer. In this case, the buffer is not created, dup'd, concatenated, etc., so void[] offers the most flexibility. -Steve
Oct 14 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 13 Oct 2010 23:13:43 -0400, Denis Koroskin <2korden gmail.com>  
wrote:

 On Thu, 14 Oct 2010 01:32:49 +0400, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 A seekable interface is one of those things that's really hard to get  
 right.  In Tango, we eventually got rid of the seekable interface and  
 just added seek methods to all the low level stream interfaces.  The  
 rationale is that most of the time seekability is not a requirement you  
 can set when opening a file.  You open a file for reading or writing,  
 but not for seeking.  So it's almost necessary that seekability is a  
 runtime decision (because the OS decides it outside of your control).

 There are, of course, streams that will not be seekable (network  
 sockets), but you just throw an exception when seeking such a stream.   
 The only thing I'd create is a way to determine seekability without  
 throwing an exception (i.e. a canSeek property).  But most of the time  
 you know whether a stream is seekable without having to check.

Essentially, there is no difference between bool canSeek and SeekableStream seekable(). However, it's almost always a good idea to check whether a stream actually supports seeking before doing so, and seekable() statically enforces you to do so.

Yes and no. It's good to have to check once. It's not good to require checking on every call. Once you know seeking is supported, you don't have to keep checking. You may say a solution is to cache the result of seekable. But that's awkward also, now you have to maintain two class references to essentially the same implementation. Not only that, but I consider that canSeek will be a very rarely used function. If you have a method that requires seeking, and you check canSeek, what do you do if it returns false? Probably throw an exception? Then why not just try seeking and let it throw the exception? -Steve
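The two designs being compared can be put side by side in a short sketch. Every name here (SeekException, FileLikeStream, SocketLikeStream, Seekable, CheckedStream, CheckedFile, CheckedSocket) is hypothetical and reduced to the seeking part only; neither half is Tango's or Phobos's actual API.

```d
class SeekException : Exception
{
    this(string msg) { super(msg); }
}

// Style A: seek methods live on every stream, plus an optional canSeek query.
interface Stream
{
    @property bool canSeek();
    @property long position();
    void seek(long offset); // throws SeekException when unsupported
}

final class FileLikeStream : Stream
{
    private long pos;
    @property bool canSeek() { return true; }
    @property long position() { return pos; }
    void seek(long offset) { pos = offset; }
}

final class SocketLikeStream : Stream
{
    @property bool canSeek() { return false; }
    @property long position() { return 0; }
    void seek(long offset) { throw new SeekException("stream is not seekable"); }
}

// Style B: seeking is reached through a separate reference that may be null,
// so the caller has to look at the result before it can seek at all.
interface Seekable
{
    @property long position();
    void seek(long offset);
}

interface CheckedStream
{
    Seekable seekable(); // null when seeking is unsupported
}

final class CheckedFile : CheckedStream, Seekable
{
    private long pos;
    Seekable seekable() { return this; }
    @property long position() { return pos; }
    void seek(long offset) { pos = offset; }
}

final class CheckedSocket : CheckedStream
{
    Seekable seekable() { return null; }
}
```

In style A a caller that needs seeking simply calls seek and lets the exception propagate. In style B the caller keeps a second reference to the same object once the check has passed, which is the awkwardness described above.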
Oct 14 2010
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 14 Oct 2010 17:24:34 +0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Wed, 13 Oct 2010 18:21:16 -0400, bearophile  
 <bearophileHUGS lycos.com> wrote:

 Andrei:

 Well casting from void[] is equally awkward isn't it? I'm still
 undecided on which is better.

See also: http://d.puremagic.com/issues/show_bug.cgi?id=4572 Bye, bearophile

That issue is slightly different because std.file.read actually creates the buffer. In this case, the buffer is not created, dup'd, concatenated, etc., so void[] offers the most flexibility. -Steve

That is also the least safe:

    Object[] objects;
    stream.read(objects); // most likely will fill with garbage
    writeln(objects[0]);  // access violation

It's a type subversion that doesn't require casts.
Oct 14 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 14 Oct 2010 09:33:54 -0400, Denis Koroskin <2korden gmail.com>  
wrote:

 On Thu, 14 Oct 2010 17:24:34 +0400, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 On Wed, 13 Oct 2010 18:21:16 -0400, bearophile  
 <bearophileHUGS lycos.com> wrote:

 Andrei:

 Well casting from void[] is equally awkward isn't it? I'm still
 undecided on which is better.

See also: http://d.puremagic.com/issues/show_bug.cgi?id=4572 Bye, bearophile

That issue is slightly different because std.file.read actually creates the buffer. In this case, the buffer is not created, dup'd, concatenated, etc., so void[] offers the most flexibility. -Steve

That is also the least safe:

    Object[] objects;
    stream.read(objects); // most likely will fill with garbage
    writeln(objects[0]); // access violation

It's a type subversion that doesn't require casts.

Yes, and this is a problem. But on the flip side, requiring casts for non-ubyte value types may be too restrictive. Do we want to require casts when the array being filled is for example UTF-8? If so, then won't that disallow such a function in safe D? I'm unsure which is worse. To be sure, allowing references to be blindly filled in is not a good thing. But disallowing reading a file properly in safe D is not good either. Are there other techniques we can use? I like the use of void[] because it says what it is -- I don't have any knowledge of your typeinfo, I'm just going to fill in whatever you tell me to. What we need is a type that is implicitly castable from pure value types -- a non-pointer-void. Does this make sense? Is it too much? -Steve
Oct 14 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 14 Oct 2010 12:10:58 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 10/14/10 11:01 CDT, Steven Schveighoffer wrote:
 On Thu, 14 Oct 2010 09:33:54 -0400, Denis Koroskin <2korden gmail.com>
 wrote:

 On Thu, 14 Oct 2010 17:24:34 +0400, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:

 On Wed, 13 Oct 2010 18:21:16 -0400, bearophile
 <bearophileHUGS lycos.com> wrote:

 Andrei:

 Well casting from void[] is equally awkward isn't it? I'm still
 undecided on which is better.

See also: http://d.puremagic.com/issues/show_bug.cgi?id=4572 Bye, bearophile

That issue is slightly different because std.file.read actually creates the buffer. In this case, the buffer is not created, dup'd, concatenated, etc. so void[] offers the most flexibility. -Steve

That is also the least safe:

    Object[] objects;
    stream.read(objects); // most likely will fill with garbage
    writeln(objects[0]); // access violation

It's a type subversion that doesn't require casts.

Yes, and this is a problem. But on the flip side, requiring casts for non-ubyte value types may be too restrictive. Do we want to require casts when the array being filled is for example UTF-8? If so, then won't that disallow such a function in safe D?

I think a solid idea would be to template streaming interfaces on any type T that has no indirections.

This is a *great* idea. This allows you to instantly use whatever type you want in all calls you make, and to have the compiler enforce it. So stream becomes: interface Stream(T = ubyte) if (hasNoPointers!T) -Steve
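A hedged sketch of what such a templated interface could look like. The post's hasNoPointers is treated here as a placeholder name; the sketch uses !hasIndirections!T from std.traits instead. The read signature and the ArrayStream class are assumptions made for illustration only.

```d
import std.traits : hasIndirections;

// Sketch of the templated interface; the read signature is an assumption.
interface Stream(T = ubyte) if (!hasIndirections!T)
{
    // Fills buf with up to buf.length elements; returns the count read.
    size_t read(T[] buf);
}

// Hypothetical in-memory implementation used only to exercise the interface.
class ArrayStream(T) : Stream!T
{
    private const(T)[] data;

    this(const(T)[] data) { this.data = data; }

    size_t read(T[] buf)
    {
        import std.algorithm.comparison : min;
        immutable n = min(buf.length, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

void main()
{
    auto s = new ArrayStream!char("hello");
    char[3] buf;
    assert(s.read(buf[]) == 3);
    assert(buf[] == "hel");

    // Types with indirections are rejected at compile time.
    static assert(!__traits(compiles, Stream!Object));
    static assert(!__traits(compiles, Stream!(int*)));
    static assert( __traits(compiles, Stream!float));
}
```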
Oct 14 2010
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 14 Oct 2010 21:31:02 +0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Thu, 14 Oct 2010 12:10:58 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 On 10/14/10 11:01 CDT, Steven Schveighoffer wrote:
 On Thu, 14 Oct 2010 09:33:54 -0400, Denis Koroskin <2korden gmail.com>
 wrote:

 On Thu, 14 Oct 2010 17:24:34 +0400, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:

 On Wed, 13 Oct 2010 18:21:16 -0400, bearophile
 <bearophileHUGS lycos.com> wrote:

 Andrei:

 Well casting from void[] is equally awkward isn't it? I'm still
 undecided on which is better.

See also: http://d.puremagic.com/issues/show_bug.cgi?id=4572 Bye, bearophile

That issue is slightly different because std.file.read actually creates the buffer. In this case, the buffer is not created, dup'd, concatenated, etc. so void[] offers the most flexibility. -Steve

That is also the least safe:

    Object[] objects;
    stream.read(objects); // most likely will fill with garbage
    writeln(objects[0]); // access violation

It's a type subversion that doesn't require casts.

Yes, and this is a problem. But on the flip side, requiring casts for non-ubyte value types may be too restrictive. Do we want to require casts when the array being filled is for example UTF-8? If so, then won't that disallow such a function in safe D?

I think a solid idea would be to template streaming interfaces on any type T that has no indirections.

This is a *great* idea. This allows you to instantly use whatever type you want in all calls you make, and to have the compiler enforce it. So stream becomes: interface Stream(T = ubyte) if (hasNoPointers!T) -Steve

I still favor a generic StreamReader with a templated T read() method. I can't think of an example where I would read only one type of data from a stream (unless that type is a char). In many cases I read int N, followed by an array of floats of size N etc, followed by some other type. Do I need to reopen streams for that or something? Besides, a typed stream needs to be built on top of a buffered stream (not an unbuffered one). E.g. T might be a big struct, and Stream doesn't provide a guarantee that it can read exactly N bytes (think about a socket stream).
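A rough D sketch of the adapter idea: one untyped byte stream, with typed reads layered on top. ByteStream, StreamReader, readExact, readArray, and ChunkedSource are all hypothetical names for this illustration. The loop in readExact stands in for the buffering concern raised above, since a single read may return fewer bytes than one T needs. Values are read in host byte order, which is an assumption of the sketch.

```d
import std.traits : hasIndirections;

// Hypothetical unbuffered byte stream: read may return fewer bytes than requested.
interface ByteStream
{
    size_t read(ubyte[] buf);
}

// Adapter: typed reads on top of an untyped byte stream.
struct StreamReader
{
    ByteStream source;

    // Keeps reading until buf is full, since a single read may be short.
    void readExact(ubyte[] buf)
    {
        while (buf.length > 0)
        {
            immutable n = source.read(buf);
            if (n == 0)
                throw new Exception("unexpected end of stream");
            buf = buf[n .. $];
        }
    }

    // Reads one value of any pointer-free type in host byte order.
    T read(T)() if (!hasIndirections!T)
    {
        T value;
        readExact((cast(ubyte*) &value)[0 .. T.sizeof]);
        return value;
    }

    // Reads n consecutive values of type T.
    T[] readArray(T)(size_t n) if (!hasIndirections!T)
    {
        auto result = new T[n];
        readExact(cast(ubyte[]) result);
        return result;
    }
}

// Test source that hands out at most 3 bytes per call, like a socket might.
class ChunkedSource : ByteStream
{
    private const(ubyte)[] data;

    this(const(ubyte)[] data) { this.data = data; }

    size_t read(ubyte[] buf)
    {
        import std.algorithm.comparison : min;
        immutable n = min(buf.length, data.length, size_t(3));
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

void main()
{
    // Encode "int N, then N floats" in host byte order.
    int count = 2;
    float[] payload = [1.5f, -2.0f];
    auto bytes = (cast(ubyte*) &count)[0 .. int.sizeof].dup
               ~ cast(ubyte[]) payload;

    auto reader = StreamReader(new ChunkedSource(bytes));
    immutable n = reader.read!int();
    auto floats = reader.readArray!float(n);

    assert(n == 2);
    assert(floats == [1.5f, -2.0f]);
}
```

With this shape the same stream serves the int, the float array, and whatever type follows, so nothing has to be reopened between reads.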
Oct 14 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 14 Oct 2010 13:42:53 -0400, Denis Koroskin <2korden gmail.com>  
wrote:

 On Thu, 14 Oct 2010 21:31:02 +0400, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 On Thu, 14 Oct 2010 12:10:58 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 On 10/14/10 11:01 CDT, Steven Schveighoffer wrote:
 On Thu, 14 Oct 2010 09:33:54 -0400, Denis Koroskin <2korden gmail.com>
 wrote:

 On Thu, 14 Oct 2010 17:24:34 +0400, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:

 On Wed, 13 Oct 2010 18:21:16 -0400, bearophile
 <bearophileHUGS lycos.com> wrote:

 Andrei:

 Well casting from void[] is equally awkward isn't it? I'm still
 undecided on which is better.

See also: http://d.puremagic.com/issues/show_bug.cgi?id=4572 Bye, bearophile

That issue is slightly different because std.file.read actually creates the buffer. In this case, the buffer is not created, dup'd, concatenated, etc. so void[] offers the most flexibility. -Steve

That is also the least safe:

    Object[] objects;
    stream.read(objects); // most likely will fill with garbage
    writeln(objects[0]); // access violation

It's a type subversion that doesn't require casts.

Yes, and this is a problem. But on the flip side, requiring casts for non-ubyte value types may be too restrictive. Do we want to require casts when the array being filled is for example UTF-8? If so, then won't that disallow such a function in safe D?

I think a solid idea would be to template streaming interfaces on any type T that has no indirections.

This is a *great* idea. This allows you to instantly use whatever type you want in all calls you make, and to have the compiler enforce it. So stream becomes: interface Stream(T = ubyte) if (hasNoPointers!T) -Steve

I still favor a generic StreamReader with a templated T read() method. I can't think of an example where I would read only one type of data from a stream (unless that type is a char). In many cases I read int N, followed by an array of floats of size N etc, followed by some other type. Do I need to reopen streams for that or something?

No, you just use the default (ubyte) and use casts like you said earlier. But char, wchar, dchar should also be possible types. I can't see a huge need for using something like uint or float, but it's there for free.
 Besides, a typed stream needs to be built on top of a buffered stream (not
 an unbuffered one). E.g. T might be a big struct, and Stream doesn't
 provide a guarantee that it can read exactly N bytes (think about a socket
 stream).

Hm... you are right, you'd have to buffer at least one T in order for this to work. That kind of sucks. This rules out wchar and dchar. Maybe it's better to allow all T's where T.sizeof is 1 (this rules out pointer types anyways). Andrei? -Steve
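The T.sizeof restriction can be checked at compile time with a short D sketch. The interface and its read signature are assumptions for illustration. The static asserts only record which element types such a constraint would admit: char is one byte wide, while wchar and dchar are wider.

```d
// Hypothetical variant of the constraint: accept only one-byte element types.
interface Stream(T = ubyte) if (T.sizeof == 1)
{
    size_t read(T[] buf);
}

// Sizes of the character types, which decide what the constraint admits.
static assert(char.sizeof == 1 && wchar.sizeof == 2 && dchar.sizeof == 4);

static assert( __traits(compiles, Stream!ubyte));
static assert( __traits(compiles, Stream!byte));
static assert( __traits(compiles, Stream!char));
static assert(!__traits(compiles, Stream!wchar));
static assert(!__traits(compiles, Stream!dchar));
static assert(!__traits(compiles, Stream!(int*)));  // pointers are wider than one byte

void main() {}
```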
Oct 14 2010
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 14 Oct 2010 22:21:37 +0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Thu, 14 Oct 2010 13:42:53 -0400, Denis Koroskin <2korden gmail.com>  
 wrote:

 On Thu, 14 Oct 2010 21:31:02 +0400, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 On Thu, 14 Oct 2010 12:10:58 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 On 10/14/10 11:01 CDT, Steven Schveighoffer wrote:
 On Thu, 14 Oct 2010 09:33:54 -0400, Denis Koroskin  
 <2korden gmail.com>
 wrote:

 On Thu, 14 Oct 2010 17:24:34 +0400, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:

 On Wed, 13 Oct 2010 18:21:16 -0400, bearophile
 <bearophileHUGS lycos.com> wrote:

 Andrei:

 Well casting from void[] is equally awkward isn't it? I'm still
 undecided on which is better.

See also: http://d.puremagic.com/issues/show_bug.cgi?id=4572 Bye, bearophile

That issue is slightly different because std.file.read actually creates the buffer. In this case, the buffer is not created, dup'd, concatenated, etc. so void[] offers the most flexibility. -Steve

That is also the least safe:

    Object[] objects;
    stream.read(objects); // most likely will fill with garbage
    writeln(objects[0]); // access violation

It's a type subversion that doesn't require casts.

Yes, and this is a problem. But on the flip side, requiring casts for non-ubyte value types may be too restrictive. Do we want to require casts when the array being filled is for example UTF-8? If so, then won't that disallow such a function in safe D?

I think a solid idea would be to template streaming interfaces on any type T that has no indirections.

This is a *great* idea. This allows you to instantly use whatever type you want in all calls you make, and to have the compiler enforce it. So stream becomes: interface Stream(T = ubyte) if (hasNoPointers!T) -Steve

I still favor a generic StreamReader with a templated T read() method. I can't think of an example where I would read only one type of data from a stream (unless that type is a char). In many cases I read int N, followed by an array of floats of size N etc, followed by some other type. Do I need to reopen streams for that or something?

No, you just use the default (ubyte) and use casts like you said earlier. But char, wchar, dchar should also be possible types. I can't see a huge need for using something like uint or float, but it's there for free.
 Besides, a typed stream needs to be built on top of a buffered stream (not
 an unbuffered one). E.g. T might be a big struct, and Stream doesn't
 provide a guarantee that it can read exactly N bytes (think about a
 socket stream).

Hm... you are right, you'd have to buffer at least one T in order for this to work. That kind of sucks. This rules out wchar and dchar. Maybe it's better to allow all T's where T.sizeof is 1 (this rules out pointer types anyways). Andrei? -Steve

What's the point of having multiple types that differ only in their read signature? When would you prefer Stream!(byte) over Stream!(ubyte) or Stream!(char)? What's wrong with an adapter that allows you to read any kind of data off the stream? Why do you need to use a Stream directly for reading an array of e.g. ints off the stream? Save your time, don't write duplicated code. Use an adapter specially provided for that purpose.
Oct 14 2010
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Fri, 29 Oct 2010 15:40:35 +0400, Bruno Medeiros  
<brunodomedeiros+spam com.gmail> wrote:

 On 13/10/2010 18:48, Daniel Gibson wrote:
 Andrei Alexandrescu schrieb:
 On 10/13/10 11:16 CDT, Denis Koroskin wrote:
 P.S. For threads this deep it's better to fork a new one, especially when
 changing the subject.

I thought I did by changing the title... Andrei

At least on my Thunderbird/Icedove 2.0.0.24 it's still in the old Thread.

Same here on my Thunderbird 3.0. It seems TB cares more about the "References:" field in the NNTP message to determine the parent. In fact, with version 3 of TB, it seems that's all it considers... which means that NG messages with the same title as the parent will not be put in the same thread as the parent if they don't have the references field. That sounds like the right approach, however there are some problems in practice because some clients never put the references field (particularly Webnews I think), so all those messages show up in my TB as new threads. :/

Nope, most of the responses through WebNews have correct References in place.
Oct 29 2010
prev sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Fri, 29 Oct 2010 16:32:24 +0400, Bruno Medeiros  
<brunodomedeiros+spam com.gmail> wrote:

 On 29/10/2010 12:50, Denis Koroskin wrote:
 On Fri, 29 Oct 2010 15:40:35 +0400, Bruno Medeiros
 <brunodomedeiros+spam com.gmail> wrote:

 On 13/10/2010 18:48, Daniel Gibson wrote:
 Andrei Alexandrescu schrieb:
 On 10/13/10 11:16 CDT, Denis Koroskin wrote:
 P.S. For threads this deep it's better to fork a new one, especially
 when
 changing the subject.

I thought I did by changing the title... Andrei

At least on my Thunderbird/Icedove 2.0.0.24 it's still in the old Thread.

Same here on my Thunderbird 3.0. It seems TB cares more about the "References:" field in the NNTP message to determine the parent. In fact, with version 3 of TB, it seems that's all it considers... which means that NG messages with the same title as the parent will not be put in the same thread as the parent if they don't have the references field. That sounds like the right approach, however there are some problems in practice because some clients never put the references field (particularly Webnews I think), so all those messages show up in my TB as new threads. :/

Nope, most of the responses through WebNews have correct References in place.

All responses that appear as new threads in my TB (i.e., threads whose title starts with "Re: ") and for which I have looked at the message source have the user agent "User-Agent: Web-News v.1.6.3 (by Terence Yim)" and no references field. These messages are common with some posters, like bearophile, Sean Kelly, Kagamin, etc. But some Web-News messages do have a references field, so it's not all Web-News messages that are missing it.

That's strange because here is what I get for a typical WebNews message:

Path: digitalmars.com!not-for-mail
From: tls <do notha.ev>
Newsgroups: digitalmars.D
Subject: Re: Lints, Condate and bugs
Date: Fri, 29 Oct 2010 15:54:12 +0400
Organization: Digital Mars
Lines: 48
Message-ID: <iaecl4$9j3$1 digitalmars.com>
References: <ia6hac$15en$1 digitalmars.com> <op.vlbyabdfo7cclz korden-pc> <iae6dh$2u1f$1 digitalmars.com> <mailman.26.1288350233.21107.digitalmars-d puremagic.com>
MIME-Version: 1.0
Content-Type: text/plain
X-Trace: digitalmars.com 1288353252 9827 65.204.18.192 (29 Oct 2010 11:54:12 GMT)
X-Complaints-To: usenet digitalmars.com
NNTP-Posting-Date: Fri, 29 Oct 2010 11:54:12 +0000 (UTC)
User-Agent: Web-News v.1.6.3 (by Terence Yim)
Xref: digitalmars.com digitalmars.D:120649
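For readers wondering how a client could derive the parent from a References header like the one in this message, here is a small D sketch. It relies on the standard convention that the last message ID listed is the direct parent. The function names are hypothetical and this is not how Thunderbird is implemented. IDs are located by their angle brackets because this archive shows each ID with a space where the '@' would be.

```d
import std.string : indexOf;

// Collects the <...> message IDs from a References header value.
// IDs are located by their angle brackets rather than by whitespace,
// because this archive replaces the '@' inside each ID with a space.
string[] messageIds(string references)
{
    string[] ids;
    while (true)
    {
        immutable open = references.indexOf('<');
        if (open < 0) break;
        immutable close = references.indexOf('>', open);
        if (close < 0) break;
        ids ~= references[open .. close + 1];
        references = references[close + 1 .. $];
    }
    return ids;
}

// The direct parent is the last ID listed; null means the message
// carries no references and would start a new thread.
string parentId(string references)
{
    auto ids = messageIds(references);
    return ids.length ? ids[$ - 1] : null;
}

void main()
{
    auto refs = "<ia6hac$15en$1 digitalmars.com> <op.vlbyabdfo7cclz korden-pc> "
              ~ "<iae6dh$2u1f$1 digitalmars.com> "
              ~ "<mailman.26.1288350233.21107.digitalmars-d puremagic.com>";

    assert(messageIds(refs).length == 4);
    assert(parentId(refs) == "<mailman.26.1288350233.21107.digitalmars-d puremagic.com>");
    assert(parentId("") is null);
}
```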
Oct 29 2010