www.digitalmars.com

digitalmars.D.learn - Am I doing this right? (File byChunk)

Andrej Mitrovic <none none.none> writes:
Here's a little snippet of code that interfaces with Scintilla (it works btw.):

File file = File("test.txt", "r");
foreach (ubyte[] buf; file.byChunk(4096))
{
	sendEditor(SCI_ADDTEXT, buf.length, (cast(char[])buf).idup);
}

The cast looks ugly, but I *have* to send a copy. Am I doing it right, or can I
make this a bit simpler? Otherwise I can use file.byLine, but I think this
would be much slower (unless D reads the entire contents at once into memory,
but I think it does not).

Actually, I shouldn't even complain. I can't believe how easy it is to open a
file, use buffers and send it all to a C interface. This is a 4-liner, the C
equivalent spans 20+ lines. Ha!
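A minimal, self-contained sketch of the loop above, for experimenting without Scintilla. `MockEditor`, `addText` and `loadByChunks` are hypothetical names, not part of DFL or Scintilla. The one assumption borrowed from later in the thread is that the callee copies the bytes before returning, as SCI_ADDTEXT does; under that assumption each chunk can be passed on directly, with no `.idup`:

```d
import std.stdio : File;

/// Hypothetical stand-in for Scintilla's document buffer. Like SCI_ADDTEXT,
/// addText copies the bytes it receives, so the caller may reuse its buffer.
struct MockEditor
{
    char[] doc;

    void addText(const(char)[] text)
    {
        doc ~= text; // appending copies into storage owned by the editor
    }
}

/// Streams a file into the editor in fixed-size chunks. No .idup is
/// needed: addText has finished copying before byChunk overwrites
/// its buffer with the next chunk.
void loadByChunks(ref MockEditor ed, string path, size_t chunkSize = 4096)
{
    auto file = File(path, "rb"); // binary mode: bytes pass through untranslated
    foreach (ubyte[] buf; file.byChunk(chunkSize))
        ed.addText(cast(const(char)[]) buf);
}
```

A real wrapper would pass `buf.ptr` and `buf.length` rather than a slice, but the ownership question is the same.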
Sep 12 2010
BCS <none anon.com> writes:
Hello Andrej,

 Here's a little snippet of code that interfaces with Scintilla (it
 works btw.):
 
 File file = File("test.txt", "r");
 foreach (ubyte[] buf; file.byChunk(4096))
 {
 sendEditor(SCI_ADDTEXT, buf.length, (cast(char[])buf).idup);
 }
 The cast looks ugly, but I *have* to send a copy.

What do those have to do with each other?
 Am I doing it right,

It doesn't seem right to me either, because you are allocating memory in chunks and passing it to C without keeping any pointers around. Also, what does that function do with them? If it keeps them around, it might be faster to load the whole file in one go and make a single call (fewer allocations, fewer calls, no extra memory usage, etc.)
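A sketch of the single-call alternative, again with a hypothetical `MockEditor` standing in for the C side. `std.file.read` loads the whole file with one allocation, and the editor receives it in one call:

```d
import std.file : read;

/// Hypothetical stand-in for a C-side API that copies what it is given
/// (as SCI_ADDTEXT does); it also counts calls for comparison.
struct MockEditor
{
    char[] doc;
    size_t calls;

    void addText(const(char)[] text)
    {
        doc ~= text; // copy into editor-owned storage
        ++calls;
    }
}

/// Reads the whole file with one allocation, then hands it over in one call.
void loadWhole(ref MockEditor ed, string path)
{
    auto text = cast(const(char)[]) read(path); // std.file.read returns void[]
    ed.addText(text);
}
```

The trade-off is memory: this holds the entire file at once, whereas the byChunk loop only ever holds one chunk.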
 or can I make this a bit simpler? Otherwise I can use file.byLine, but
 I think this would be much slower (unless D reads the entire contents
 at once into memory, but I think it does not).
 
 Actually, I shouldn't even complain. I can't believe how easy it is to
 open a file, use buffers and send it all to a C interface. This is a
 4-liner, the C equivalent spans 20+ lines. Ha!
 

... <IXOYE><
Sep 12 2010
bearophile <bearophileHUGS lycos.com> writes:
Andrej Mitrovic:

 foreach (ubyte[] buf; file.byChunk(4096))
 {
 	sendEditor(SCI_ADDTEXT, buf.length, cast(LPARAM)buf.ptr);
 }
 ...
 SCI_ADDTEXT(int length, const char *s)

Keep in mind that the length of a D array is a size_t, which is a 32- or 64-bit unsigned word.
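One way to make the size_t/int mismatch explicit is a checked conversion. `scintillaLength` is a hypothetical helper; `std.conv.to!int` throws `ConvOverflowException` instead of silently truncating when the length does not fit in an int:

```d
import std.conv : to;

/// SCI_ADDTEXT declares its length as a C int, while a D array's length
/// is size_t. to!int throws ConvOverflowException on overflow.
int scintillaLength(size_t n)
{
    return n.to!int;
}
```

At the call site, `buf.length` would become `scintillaLength(buf.length)`. With 4096-byte chunks the check never fires, but it documents the assumption.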
 The ADDTEXT function creates a copy of the pointed-to contents to its
 internal scintilla buffers, so I don't have to keep any pointers
 around after the call.

In byChunk() the content of the buffer is reused across calls, so you are not wasting allocations. I don't know if it's possible to use a fixed-size char[4096] array to remove the first memory allocation too. I think byChunk() needs a second optional argument, to give it a preallocated buffer (like a slice of a fixed-size array).

Bye,
bearophile
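Because byChunk hands out the same buffer on each iteration, any chunk that has to outlive its iteration needs its own copy. A sketch of the case where the caller, not the C side, keeps the data (the function name `collectChunks` is hypothetical):

```d
import std.stdio : File;

/// byChunk reuses one internal buffer, so a slice that must outlive the
/// current iteration has to be copied first.
immutable(ubyte)[][] collectChunks(string path, size_t chunkSize)
{
    immutable(ubyte)[][] chunks;
    foreach (ubyte[] buf; File(path, "rb").byChunk(chunkSize))
        chunks ~= buf.idup; // without .idup, entries would alias the reused buffer
    return chunks;
}
```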
Sep 13 2010
bearophile <bearophileHUGS lycos.com> writes:
 I think byChunk() needs a second optional argument, to give it a preallocated buffer (like a slice of a fixed-size array).

http://d.puremagic.com/issues/show_bug.cgi?id=4859
Sep 13 2010
Jonathan M Davis <jmdavisprog gmail.com> writes:
On Sunday 12 September 2010 18:04:14 Andrej Mitrovic wrote:
 Here's a little snippet of code that interfaces with Scintilla (it works
 btw.):
 
 File file = File("test.txt", "r");
 foreach (ubyte[] buf; file.byChunk(4096))
 {
 	sendEditor(SCI_ADDTEXT, buf.length, (cast(char[])buf).idup);
 }
 
 The cast looks ugly, but I *have* to send a copy. Am I doing it right, or
 can I make this a bit simpler? Otherwise I can use file.byLine, but I
 think this would be much slower (unless D reads the entire contents at
 once into memory, but I think it does not).

I'd have to check, but I believe that byLine() reads one whole line at a time into memory, but not the whole file.

- Jonathan M Davis
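For comparison, a sketch of the byLine route (the helper name `loadByLines` is hypothetical). Each iteration yields one line; `KeepTerminator.yes` keeps the newline so the text could be forwarded unchanged:

```d
import std.stdio : File, KeepTerminator;

/// Reads one line per iteration; only the current line is held by byLine.
/// Like byChunk, byLine reuses its buffer, so the line is copied on append.
char[] loadByLines(string path, out size_t lineCount)
{
    char[] doc;
    foreach (line; File(path, "r").byLine(KeepTerminator.yes))
    {
        doc ~= line; // appending copies, so the reused line buffer is safe
        ++lineCount;
    }
    return doc;
}
```

Whether this is noticeably slower than byChunk for a given file would need measuring.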
Sep 12 2010
Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
Ah, silly me, disregard that code. I was working around the issue but
I didn't have to, here's the correct code:

foreach (ubyte[] buf; file.byChunk(4096))
{
	sendEditor(SCI_ADDTEXT, buf.length, cast(LPARAM)buf.ptr);
}

Now, the explanation. The sendEditor function takes an opcode (uint),
and 2 arguments. It uses message passing to communicate with
scintilla; the two arguments are aliased to uint (which is usually a
length argument but is sometimes ignored) and LPARAM (a pointer's
address as ULONG).

Scintilla converts the address back to a pointer, and passes it (and
the first argument) to the relevant function based on the opcode.
Here's the header of that function:

SCI_ADDTEXT(int length, const char *s)
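A sketch of the round trip described above: the sender casts the buffer's address to an integer message parameter, and the receiver casts it back and copies `length` bytes. `MockScintilla` and `handle` are hypothetical; `LPARAM` is assumed to be pointer-sized (as Win32's `LONG_PTR` is), and 2001 is the value Scintilla.h gives SCI_ADDTEXT, used here only for illustration:

```d
alias LPARAM = ptrdiff_t;     // assumption: pointer-sized message parameter
enum uint SCI_ADDTEXT = 2001; // illustrative; matches Scintilla.h

/// Hypothetical receiving side: turns the integer back into a pointer and
/// copies `wParam` bytes, as SCI_ADDTEXT(int length, const char *s) would.
struct MockScintilla
{
    char[] doc;

    void handle(uint msg, size_t wParam, LPARAM lParam)
    {
        if (msg == SCI_ADDTEXT)
        {
            auto s = cast(const(char)*) lParam; // integer back to pointer
            doc ~= s[0 .. wParam];              // copy the bytes; keep no pointer
        }
    }
}
```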

The reason my first code was working is because the wrapper has two
overloaded sendEditor functions. With one you can pass an opcode, a
length and pointer address, with the other you can pass an opcode, a
length and a string, and the function makes a copy of the string, adds
a null terminator and uses its address and the other arguments to send
a message.
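A sketch of how two such overloads could look, and why the original `(cast(char[])buf).idup` call compiled: `.idup` yields a `string`, which selects the string overload. The bodies are hypothetical (the thread doesn't show the wrapper's code), and the low-level overload here just records what it receives:

```d
import std.string : toStringz;

alias LPARAM = ptrdiff_t; // assumption: pointer-sized message parameter

/// Hypothetical log of what the low-level overload received.
struct Sent { uint msg; size_t wParam; string text; }
Sent[] sent;

/// Overload 1: opcode, length and a raw address (as in the corrected loop).
void sendEditor(uint msg, size_t wParam, LPARAM lParam)
{
    auto p = cast(const(char)*) lParam;
    sent ~= Sent(msg, wParam, p[0 .. wParam].idup); // copy, as Scintilla does
}

/// Overload 2: opcode, length and a string. toStringz returns a
/// null-terminated version (copying when needed); its address is forwarded.
void sendEditor(uint msg, size_t wParam, string text)
{
    immutable(char)* z = toStringz(text);
    sendEditor(msg, wParam, cast(LPARAM) z);
}
```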

The ADDTEXT function creates a copy of the pointed-to contents to its
internal scintilla buffers, so I don't have to keep any pointers
around after the call.

I hope I've explained this right (and didn't mess something up). :p


On Mon, Sep 13, 2010 at 3:47 AM, BCS <none anon.com> wrote:

Sep 12 2010
Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
I was referring to the open method, or when calling the constructor of
Text. But it doesn't actually *read* the contents until I ask it to
(which is a good thing). In other news it's late and I'm talking crazy
tonight, sorry. :)

On Mon, Sep 13, 2010 at 4:01 AM, Jonathan M Davis <jmdavisprog gmail.com> wrote:

Sep 12 2010
Jonathan M Davis <jmdavisprog gmail.com> writes:
On Sunday 12 September 2010 19:31:23 Andrej Mitrovic wrote:
 I was referring to the open method, or when calling the constructor of
 Text. But it doesn't actually *read* the contents until I ask it to
 (which is a good thing). In other news it's late and I'm talking crazy
 tonight, sorry. :)

Well, you do appear to be 9 hours ahead of me, time zone-wise, and you do appear to be up rather late. Entertainingly, the time difference makes for interesting timestamps in the messages as our respective clients put in the local time that a message is sent:
 On Mon, Sep 13, 2010 at 4:01 AM, Jonathan M Davis <jmdavisprog gmail.com> 

 On Sunday 12 September 2010 18:04:14 Andrej Mitrovic wrote:


Mine also appears to be in 24-hour time, while yours is in 12-hour time, and mine seems to want second precision, while yours only cares about minute precision. The date format is a fair bit different as well. Not that it really matters, but it's interesting nonetheless.

- Jonathan M Davis
Sep 12 2010
Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
I prefer 24-hour timestamps, but I've no idea where to set this in
Gmail. And I'm not using a special newsgroup reader because they're
all clumsy in their own little ways.

On Mon, Sep 13, 2010 at 4:46 AM, Jonathan M Davis <jmdavisprog gmail.com> wrote:

Sep 12 2010
Jonathan M Davis <jmdavisprog gmail.com> writes:
On Sunday 12 September 2010 19:56:28 Andrej Mitrovic wrote:
 I prefer 24-hour timestamps, but I've no idea where to set this in
 Gmail. And I'm not using a special newsgroup reader because they're
 all clumsy in their own little ways.

I'm using a gmail account, but I'm using kmail to manage my mail, so the timestamps and whatnot depend on my system settings. I like knode (KDE's newsgroup reader) overall, but I stopped using it because I had no way of syncing between machines without copying all of knode's files over every time, which is not pleasant. Using gmail/kmail with IMAP keeps things in sync.

Unfortunately, since gmail never delivers list messages that you sent yourself, replying to your own messages doesn't really work properly, and kmail threads replies to my messages poorly (usually associating them with the message that I responded to). So I'd prefer to use knode, but the lack of syncing was too painful.

I suppose it would work better with a non-gmail account, though I wouldn't know what to use, since I'd want free IMAP support. Another provider would also likely use proper folders rather than the annoying labels, which mail programs like kmail don't handle quite properly. So maybe I should look for a better e-mail provider than gmail...

- Jonathan M Davis
Sep 12 2010
Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
Yes, because LPARAM is defined in the DFL library as a long. Actually,
it's hardcoded; there's no static if or versioning. I'll keep an eye
on that for when DMD is able to build 64-bit binaries.
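The thread doesn't show DFL's actual declaration, so this is only a sketch of one way to avoid hardcoding the width: choose LPARAM per target with `version (D_LP64)` (or simply use `ptrdiff_t`, which is pointer-sized everywhere) and guard it with a `static assert`. `roundTrip` is a hypothetical helper:

```d
// Hypothetical declaration; DFL's real one is not shown in the thread.
version (D_LP64)
    alias LPARAM = long; // 64-bit pointers
else
    alias LPARAM = int;  // 32-bit pointers
// Alternative with no version block at all: alias LPARAM = ptrdiff_t;

// Compile-time guard: the build fails if LPARAM cannot hold an address.
static assert(LPARAM.sizeof == (void*).sizeof, "LPARAM must be pointer-sized");

/// Sends an address through LPARAM and back, as message passing does.
const(char)* roundTrip(const(char)* p)
{
    auto lp = cast(LPARAM) p;     // pointer -> integer (sender side)
    return cast(const(char)*) lp; // integer -> pointer (receiver side)
}
```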

On Mon, Sep 13, 2010 at 1:09 PM, bearophile <bearophileHUGS lycos.com> wrote:
 Andrej Mitrovic:

 foreach (ubyte[] buf; file.byChunk(4096))
 {
 	sendEditor(SCI_ADDTEXT, buf.length, cast(LPARAM)buf.ptr);
 }
 ...
 SCI_ADDTEXT(int length, const char *s)

 Keep in mind that the length of a D array is a size_t, which is a 32- or 64-bit unsigned word.

Sep 13 2010
Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
That could be a good idea.

On Mon, Sep 13, 2010 at 1:09 PM, bearophile <bearophileHUGS lycos.com> wrote:
 In byChunk() the content of the buffer is reused across calls, so you are not wasting allocations. I don't know if it's possible to use a fixed-size char[4096] array to remove the first memory allocation too. I think byChunk() needs a second optional argument, to give it a preallocated buffer (like a slice of a fixed-size array).
 Bye,
 bearophile

Sep 13 2010
Jonathan M Davis <jmdavisProg gmx.com> writes:
gmx.com seems to deal with mailing list e-mails correctly (it actually puts the
ones you sent to the list in your inbox when they come back from the list), and
it has free IMAP and lots of disk space just like gmail (not to mention that it
uses proper folders instead of labels), so I've switched over to it. It's much
more pleasant to deal with, at least from an e-mail client; whether the site's
own user interface is better is more debatable. And switching e-mail addresses
gives me a chance to reduce the spam that I get. :)

- Jonathan M Davis
Sep 14 2010
Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
I'll be sure to check it out once I get my phone line back (I forgot to
pay the bills...). I'm currently on expensive wireless. (Why is wireless
so expensive anyway? What's so special about radio waves that were
invented half a century ago?)

Thanks for the heads up!

On 9/14/10, Jonathan M Davis <jmdavisProg gmx.com> wrote:

Oct 14 2010