digitalmars.D.bugs - [Issue 1482] New: std.file docs are insufficient

d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1482

           Summary: std.file docs are insufficient
           Product: D
           Version: 2.005
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: jlquinn us.ibm.com


The prototype for read() returns a void[].  The docs say it returns an array of
bytes.  Why does it return void[] instead of byte[] or ubyte[]?  What types is
it safe to cast the return value to?

Does read() return the complete file from a single call?

write() and append() have the same issue.  What types may be passed in safely? 
What exactly gets written out if you pass an array of dchar?

The class (and others) really need overview text at the top.  Look at the Java
API docs for inspiration.


-- 
Sep 07 2007
d-bugmail puremagic.com writes:

thecybershadow gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                URL|                            |http://digitalmars.com/d/phobos/std_file.html




------- Comment #1 from thecybershadow gmail.com  2007-09-07 18:46 -------
It returns an array of the bytes that it read from the file. I think that the
bug here is that the specs don't describe what a void[] really is. The closest
thing is the "Implicit Conversions" section here:

http://digitalmars.com/d/arrays.html#strings
(linked to closest anchor, scroll a bit below)

A void[] is functionally the same as a byte[] or ubyte[], with two differences:
1) you can't access an element of it (since the type of the underlying data is
unknown)
2) any array implicitly converts to void[] - this allows you to work with
functions that take void[] arguments without using explicit casts, except when
you need to pass something that isn't an array, in which case you have to
resort to syntax like write(filename, (&mystruct)[0..1]).
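
To make the two points above concrete, here is a minimal sketch. It is not from the original comment: byteCount and Point are hypothetical names, and it assumes a current D2 compiler. It shows arrays converting to a void[] parameter without a cast, a single variable passed as a one-element slice, and the explicit cast needed to get back to a typed array. Since the length of a void[] counts bytes, a dchar[] presumably reaches write() as 4 bytes per element in the machine's native byte order.

```d
import std.stdio : writeln;

// Hypothetical helper: any array converts implicitly to const(void)[],
// so callers need no cast.
size_t byteCount(const(void)[] data)
{
    return data.length; // for a void[], length is counted in bytes
}

struct Point { int x, y; } // hypothetical example type

void main()
{
    int[] ints = [1, 2, 3];
    dchar[] wide = "abc"d.dup;

    writeln(byteCount(ints)); // 3 * int.sizeof bytes
    writeln(byteCount(wide)); // 3 * dchar.sizeof bytes

    // A single variable is not an array: pass a one-element slice of it.
    Point p = Point(1, 2);
    writeln(byteCount((&p)[0 .. 1])); // Point.sizeof bytes

    // Going from void[] back to a typed array needs an explicit cast.
    void[] raw = ints;
    auto bytes = cast(ubyte[]) raw;
    writeln(bytes.length);
}
```

Casting to ubyte[] is the conservative choice when the contents are unknown; casting to any other element type assumes the bytes really do hold values of that type.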

IMO D or Phobos should have a simple syntax or template that allows you to
convert any variable to a void[], which is essentially a void* and a length.
Currently I use this (pretty crude) template (which likely can be rewritten in
a much better way):

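struct BufferEx
{
        // Overlays a void[] on the D array layout: a length, then a pointer.
        union
        {
                buffer buf;
                struct Fields
                {
                        size_t length;
                        void* ptr;
                }
                Fields fields;
        }
}

// Returns a view of the bytes of `data`. The parameter is taken by
// reference (spelled `inout` in D 1.0), so the view aliases the caller's
// variable rather than a copy of it.
buffer toBuffer(T)(ref T data)
{
        BufferEx b;
        b.fields.ptr = &data;
        b.fields.length = T.sizeof;
        return b.buf;
}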
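
One possible shorter rewrite, offered only as a sketch and not as the author's code: it reuses the one-element slice syntax mentioned earlier in this comment instead of the union, and assumes a current D2 compiler. The demonstration in main is hypothetical.

```d
import std.stdio : writeln;

// Hypothetical rewrite of toBuffer: view one variable as a void[]
// by slicing its address and casting the one-element array.
void[] toBuffer(T)(ref T data)
{
    // The result aliases `data`; it is only valid while `data` is alive.
    return cast(void[]) (&data)[0 .. 1];
}

void main()
{
    int n = 0x01020304;
    void[] view = toBuffer(n);
    writeln(view.length == int.sizeof); // the view covers exactly one int

    // Because the view aliases n, a later change to n shows through it.
    auto bytes = cast(ubyte[]) view;
    n = 0;
    writeln(bytes);
}
```

The trade-off is the same as with the union version: no copy is made, so the buffer must not outlive the variable it points at.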

The std.file functions aren't part of a class. Upside is simple and readable
code for small programs, downside is possible name collisions ("read"/"write"
is a common thing to be found in the global namespace). Yay for
selective/static/renaming imports, I guess. Note that Tango uses a File class
(which makes the code more bloated, as you need two operations - class
instantiation and the operation - for a single file operation).

P.S. It's none of my business, but is IBM interested in D now? :)


-- 
Sep 07 2007
"Stewart Gordon" <smjg_1998 yahoo.com> writes:
<d-bugmail puremagic.com> wrote in message 
news:fbsnsp$q84$1 digitalmars.com...
<snip>
 IMO D or Phobos should have a simple syntax or template that allows you to
 convert any variable to a void[], which is essentially a void* and a length.
 Currently I use this (pretty crude) template (which likely can be rewritten in
 a much better way):

And I use

    cast(void[]) (&var)[0..1]

Of course, you'd need to .dup it if you want it to be valid after var has gone
out of scope.

Stewart.
Sep 08 2007
d-bugmail puremagic.com writes:

------- Comment #2 from thecybershadow gmail.com  2007-09-07 18:52 -------
Oops, in the snippet of code above I forgot to include:

public alias void[] buffer;

(actually it's in a compiler version clause which mixins "public alias
const(void)[] buffer;" for the 2.0 branch)


-- 
Sep 07 2007
d-bugmail puremagic.com writes:

------- Comment #3 from jlquinn us.ibm.com  2007-09-11 09:04 -------
(In reply to comment #1)
 It returns an array of the bytes that it read from the file. I think that the
 bug here is that the specs don't describe what a void[] really is. Closest to
 that is the "Implicit Conversions" sections here: 
 
 http://digitalmars.com/d/arrays.html#strings
 (linked to closest anchor, scroll a bit below)
 
 A void[] is functionally the same as a byte[] or ubyte[], with the differences:

It's essentially the same as char[] as well, then, right? A char is really just
a byte that is given preferential treatment as a UTF-8 string, no?

 The std.file functions aren't part of a class. Upside is simple and readable
 code for small programs, downside is possible name collisions ("read"/"write"
 is a common thing to be found in the global namespace). Yay for
 selective/static/renaming imports, 

The docs don't make this 100% clear. I think it's mostly visual formatting,
combined with a lack of a cohesive summary for the module.

A related doc note ... I find that the docs make it difficult to distinguish
the methods associated with a class when there are multiple classes on a single
page. Again, I'd call it visual formatting.

Another issue I found is that I didn't get a definitive answer on whether
reading files automatically treats them as UTF-8 or not.

 P.S. It's none of my business, but is IBM interested in D now? :)

I can't speak for the company, but I personally find the core language more
pleasant than C++. If it offers better performance than Java then it becomes
more interesting to me :-)
Sep 11 2007
Regan Heath <regan netmail.co.nz> writes:
 A related doc note ...  I find that the docs make it difficult to distinguish
 the methods associated with a class when there are multiple classes on a single
 page.  Again, I'd call it visual formatting.

I find the same thing.

 Another issue I found is that I didn't get a definitive answer on whether
 reading files automatically treats them as UTF-8 or not.

I believe std.file.read simply reads bytes: it doesn't do any conversion, and
it doesn't handle UTF-8, UTF-16, or UTF-32 BOMs or anything else.

 P.S. It's none of my business, but is IBM interested in D now? :)

 I can't speak for the company, but I personally find the core language more
 pleasant than C++. If it offers better performance than Java then it becomes
 more interesting to me :-)

D offers similar if not better performance than C/C++ in some cases.

Regan
Sep 11 2007
d-bugmail puremagic.com writes:

------- Comment #4 from thecybershadow gmail.com  2007-09-12 07:07 -------
(In reply to comment #3)
 It's essentially the same as char[] as well, then, right?  A char is really
 just  a byte that is given preferential treatment as a UTF-8 string, no?

Indeed (also character literals don't implicitly cast to integers, and
vice-versa).

 The docs don't make this 100% clear.  I think it's mostly visual formatting,
 combined with a lack of a cohesive summary for the module.
 
 A related doc note ...  I find that the docs make it difficult to distinguish
 the methods associated with a class when there are multiple classes on a single
 page.  Again, I'd call it visual formatting.

It's true. I got into the habit of just checking the library source most of the
time...

 Another issue I found is that I didn't get a definitive answer on whether
 reading files automatically treats them as UTF-8 or not.

The std.file routines treat the files as raw data. No conversion is performed.
You are free to operate on the data as you please; however, Phobos routines
that take char[] arguments will usually expect them to be encoded in UTF-8.
Pretty sure there was a page about Unicode and UTF-8 in D somewhere...
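
As an illustration of "raw data, no conversion", here is a minimal sketch, assuming a current D2 Phobos. The helper readUtf8 and the scratch file name are hypothetical. The bytes returned by read() are reinterpreted as text and then checked with std.utf.validate, since nothing in read() itself guarantees they are UTF-8.

```d
import std.file : read, write, remove;
import std.stdio : writeln;
import std.utf : validate, UTFException;

// Hypothetical helper: read a file as raw bytes, then check that the
// bytes form valid UTF-8 before treating them as a string.
string readUtf8(string path)
{
    void[] raw = read(path);      // raw bytes, no conversion performed
    auto text = cast(string) raw; // reinterpret only; nothing is decoded
    validate(text);               // throws UTFException on malformed input
    return text;
}

void main()
{
    enum path = "readutf8_demo.tmp"; // hypothetical scratch file
    scope (exit) remove(path);

    write(path, "héllo");
    writeln(readUtf8(path));

    ubyte[] bad = [0xFF, 0xFE, 0x00]; // 0xFF can never start a UTF-8 sequence
    write(path, bad);
    try
        readUtf8(path);
    catch (UTFException e)
        writeln("not valid UTF-8");
}
```

Current Phobos also provides std.file.readText, which as far as I know performs this same read-then-validate step, so a hand-written helper like this is mainly useful for seeing what happens underneath.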
Sep 12 2007
d-bugmail puremagic.com writes:

Andrei Alexandrescu <andrei metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |andrei metalanguage.com
         AssignedTo|nobody puremagic.com        |andrei metalanguage.com


Oct 11 2009
d-bugmail puremagic.com writes:

Andrei Alexandrescu <andrei metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


--- Comment #5 from Andrei Alexandrescu <andrei metalanguage.com> 2010-09-25
15:38:02 PDT ---
http://www.dsource.org/projects/phobos/changeset/2048

Sep 25 2010