digitalmars.D.learn - Record separator is being lost after string cast
- Kadir Erdem Demir (33/33) Feb 04 2015 I am opening a .gz file and reading it chunk by chunk for
- Kagamin (2/2) Feb 04 2015 Looks like RS is an unprintable character, that's why you don't
- Kagamin (4/4) Feb 04 2015 You can use C functions in D too:
- ketmar (6/13) Feb 04 2015 nothing is lost in the program. what you see is a quirk in tty output:...
- Kadir Erdem Demir (16/17) Feb 04 2015 I am sorry make a busy community more busy with false alarms.
- ketmar (27/30) Feb 04 2015 don't mind it. ;-) "D.learn" is for *any* questions about language, no...
- Kadir Erdem Demir (5/5) Feb 04 2015 Thanks a lot,
I am opening a .gz file and reading it chunk by chunk to uncompress it. The data in the uncompressed file looks like aRSbRScRSd: there is a record separator (ASCII code 30) between each record (the records in my dummy example are a, b, c).

    File file = File("mylog.gz", "r");
    auto uc = new UnCompress();
    foreach (ubyte[] curChunk; file.byChunk(4096*1024))
    {
        auto uncompressed = cast(string)uc.uncompress(curChunk);
        writeln(uncompressed);
        auto stringRange = uncompressed.splitLines();
        foreach (string line; stringRange)
        {
            // ***************** Do something with line
        }
    }

The result of the code above is:

    abcd

Unfortunately the record separators (ASCII 30) are missing. By examining the data I realized that the record separators go missing after I cast ubyte[] to string.

Now I have two questions.

The urgent one (my boss is already a little disturbed that I started the task with D, so I need to solve this): what should I change in the code to keep the record separators?

The second one: how can I write the code above without for loops? I want to read the gz file line by line.

A more general and understandable example for the first question:

    ubyte[] temp = [ 65, 30, 66, 30, 67];
    writeln(temp);
    string tempStr = cast(string) temp;
    writeln (tempStr);

The result is:

    ABC

which is not desired.

Thanks
Kadir Erdem
Feb 04 2015
Looks like RS is an unprintable character, that's why you don't see it in console.
Feb 04 2015
You can use C functions in D too:

    import core.stdc.stdio;
    ubyte[] temp = [ 65, 30, 66, 30, 67, 0];
    puts(cast(char*)temp.ptr);
Feb 04 2015
On Wed, 04 Feb 2015 08:13:28 +0000, Kadir Erdem Demir wrote:

> A more general and understandable code for first question :
>
> ubyte[] temp = [ 65, 30, 66, 30, 67];
> writeln(temp);
> string tempStr = cast(string) temp;
> writeln (tempStr);
>
> Result is : ABC which is not desired.

nothing is lost in the program. what you see is a quirk in tty output: '\x1e' (ASCII 30) is an unprintable character, so you simply cannot see it. redirect the output to a file and open that file in any hex editor -- and you will find your separators intact.

don't believe what you see! ;-)
Feb 04 2015
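To check the point above without leaving the program, one option is to print the bytes of the string as hex instead of sending them to the terminal as text. The following is only a minimal sketch: `hexDump` is a hypothetical helper name, not something from the thread, and it relies on the range format specifier `%( ... %)` from `std.format`.

```d
import std.format : format;
import std.stdio : writeln;

/// Hypothetical helper: renders every byte as two hex digits,
/// separated by single spaces.
string hexDump(const(ubyte)[] bytes)
{
    return format("%(%02x %)", bytes);
}

void main()
{
    ubyte[] temp = [65, 30, 66, 30, 67];

    // The cast only changes the static type; the bytes stay the same.
    string tempStr = cast(string) temp.idup;

    // Number of bytes in the string after the cast.
    writeln(tempStr.length);

    // Every 1e entry in this dump is a record separator (ASCII 30).
    writeln(hexDump(cast(const(ubyte)[]) tempStr));
}
```

If the separators show up as `1e` in the dump, the cast did not remove anything and only the terminal is hiding them.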
> don't believe what you see! ;-)

I am sorry to make a busy community busier with false alarms. When I wrote the output to a file I saw that the record separators really exist.

I hope my second question is a valid one. How can I write the code below better? How can I reduce the number of foreach statements?

    File file = File("mylog.gz", "r");
    auto uc = new UnCompress();
    foreach (ubyte[] curChunk; file.byChunk(4096*1024))
    {
        auto uncompressed = cast(string)uc.uncompress(curChunk);
        writeln(uncompressed);
        auto stringRange = uncompressed.splitLines();
        foreach (string line; stringRange)
        {
            // do something with line
        }
    }

Thanks a lot for the replies
Kadir Erdem
Feb 04 2015
On Wed, 04 Feb 2015 09:28:27 +0000, Kadir Erdem Demir wrote:

> I am sorry to make a busy community busier with false alarms.

don't mind it. ;-) "D.learn" is for *any* questions about language, no matter how strange they may seem.

> How can I write the code below better? How can I reduce the number of
> foreach statements?

actually, your loop seems to be not good anyway, as it may easily read only part of a line. sadly, there is no streaming interface to gz files, so your best bet is to read the whole file in memory, then unpack it all at once, and then process it. just be sure that you have enough RAM. something like this:

    import std.stdio;
    import std.string;
    import std.zlib;

    void main () {
      char[] unpacked;
      // read the whole file and unpack it
      {
        auto fl = File("test.txt.gz", "rb");
        // File.size is ulong; the array length must be size_t
        auto packed = new ubyte[](cast(size_t)fl.size);
        fl.rawRead(packed);
        auto up = new UnCompress();
        unpacked ~= cast(char[])up.uncompress(packed);
        unpacked ~= cast(char[])up.flush();
      }
      // process the unpacked text line by line
      foreach (s; unpacked.splitLines) {
        writeln(s);
      }
    }
Feb 04 2015
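The "may easily read only part of a line" problem can also be handled by keeping the unfinished tail of one chunk and prepending it to the next. The sketch below shows only that carry-over logic, under stated assumptions: `RecordAssembler`, `put` and `finish` are hypothetical names, records are assumed to be delimited by ASCII 30 as in the original question, and the chunks are plain in-memory strings. In a real program each chunk would be whatever `uc.uncompress(curChunk)` returned.

```d
import std.stdio : writeln;
import std.string : indexOf;

enum char recordSeparator = '\x1E'; // ASCII 30

/// Hypothetical helper: buffers text until a complete record is available.
struct RecordAssembler
{
    private char[] pending; // text seen so far that has no separator after it

    /// Appends a chunk and returns every record that this chunk completed.
    string[] put(const(char)[] chunk)
    {
        pending ~= chunk;
        string[] records;
        for (;;)
        {
            auto pos = pending.indexOf(recordSeparator);
            if (pos < 0)
                break; // the rest is an unfinished record, keep it for later
            records ~= pending[0 .. pos].idup;
            pending = pending[pos + 1 .. $];
        }
        return records;
    }

    /// Returns whatever is left once the input has ended.
    string finish()
    {
        auto last = pending.idup;
        pending = null;
        return last;
    }
}

void main()
{
    RecordAssembler assembler;

    // Stand-ins for uncompressed chunks; the record "bb" is cut in half.
    foreach (chunk; ["a\x1Eb", "b\x1Ec\x1E", "d"])
        foreach (record; assembler.put(chunk))
            writeln(record);

    writeln(assembler.finish());
}
```

This keeps memory use bounded by the chunk size plus the longest record, at the cost of a little more code than unpacking the whole file at once.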
Thanks a lot,

I will follow your advice and implement this part the same way as your example.

Regards
Kadir Erdem
Feb 04 2015
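One caveat when adapting the examples in this thread to the data from the original question: as far as its documentation lists them, `splitLines` breaks on line terminators such as `\n` and `\r\n`, and ASCII 30 is not among them. If the records are separated only by RS, splitting on that character explicitly may be what is needed. A minimal sketch, where `splitRecords` is a hypothetical helper name:

```d
import std.algorithm.iteration : splitter;
import std.array : array;
import std.stdio : writeln;

enum char recordSeparator = '\x1E'; // ASCII 30

/// Hypothetical helper: splits already uncompressed text into records.
string[] splitRecords(string data)
{
    return data.splitter(recordSeparator).array;
}

void main()
{
    // Same shape as the aRSbRScRSd sample from the question.
    string data = "a\x1Eb\x1Ec\x1Ed";

    foreach (record; splitRecords(data))
        writeln(record);
}
```

`splitter` is lazy, so the `.array` call can be dropped when the records are only iterated once.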