digitalmars.D.learn - Reading UTF32 files

Tim Locke <root vic-20.net> writes:
How do I read an UTF32 file? Stream only seems to support UTF8 with
readLine and UTF16 with readLineW.

Thanks
Aug 03 2006
Hasan Aljudy <hasan.aljudy gmail.com> writes:
Tim Locke wrote:
 How do I read an UTF32 file? Stream only seems to support UTF8 with
 readLine and UTF16 with readLineW.
 
 Thanks

I use mango to convert any file into UTF32. I haven't actually tested it very much, but I think it should work:

---------
static import std.file;
import mango = mango.convert.UnicodeBom;

version( build ) //TEMP until build learns renamed import syntax
{
    pragma( include, mango.convert.UnicodeBom )
}

dchar[] readFile( char[] fileName )
{
    if( std.file.exists( fileName ) )
        return toUtf32( std.file.read( fileName ) );
    else
        throw new Exception("File: " ~ fileName ~ " doesn't exist");
}

private
{
    alias mango.UnicodeBomTemplate!(dchar) Utf32Decoder;

    ///read BOM and decode/convert to utf-32
    dchar[] toUtf32(void[] content)
    {
        auto decoder = new Utf32Decoder(mango.Unicode.Unknown);
        return decoder.decode(content);
    }
}
---------
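
For anyone wondering what "read BOM" involves, here is a minimal sketch of byte-order-mark detection. It is not Mango's implementation; the names `Encoding` and `detectBom` are made up for illustration, and it assumes a present-day D compiler rather than the 2006 one.

```d
/// Encodings that can be recognised from a byte-order mark (hypothetical enum).
enum Encoding { unknown, utf8, utf16le, utf16be, utf32le, utf32be }

// Standard Unicode byte-order marks.
static immutable ubyte[] bomUtf32le = [0xFF, 0xFE, 0x00, 0x00];
static immutable ubyte[] bomUtf32be = [0x00, 0x00, 0xFE, 0xFF];
static immutable ubyte[] bomUtf8    = [0xEF, 0xBB, 0xBF];
static immutable ubyte[] bomUtf16le = [0xFF, 0xFE];
static immutable ubyte[] bomUtf16be = [0xFE, 0xFF];

/// True if `data` begins with the bytes in `prefix`.
bool hasPrefix(const(ubyte)[] data, const(ubyte)[] prefix)
{
    return data.length >= prefix.length && data[0 .. prefix.length] == prefix;
}

/// Inspect the leading bytes and report which BOM, if any, is present.
Encoding detectBom(const(ubyte)[] data)
{
    // UTF-32LE is tested before UTF-16LE because its BOM also starts with FF FE.
    if (hasPrefix(data, bomUtf32le)) return Encoding.utf32le;
    if (hasPrefix(data, bomUtf32be)) return Encoding.utf32be;
    if (hasPrefix(data, bomUtf8))    return Encoding.utf8;
    if (hasPrefix(data, bomUtf16le)) return Encoding.utf16le;
    if (hasPrefix(data, bomUtf16be)) return Encoding.utf16be;
    return Encoding.unknown;
}
```

Note that FF FE 00 00 could in principle also be a UTF-16LE BOM followed by a NUL character, so a real decoder may want a way to override the guess.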
Aug 03 2006
kris <foo bar.com> writes:
It's perhaps easier to use UnicodeFile instead:

# import mango.io.UnicodeFile;
#
# auto file = new UnicodeFileT!(dchar)("myfile", Unicode.Unknown);
# auto content = file.read;


Please note that Mango leverages a different IO model than Phobos, so 
you'll have to compile this along with a few other mango.io modules.

Mango typically requires the use of Build to pull in relevant modules, 
because the combination of D, libraries, and templates just doesn't work 
reliably at this time. If the compiler front-end were to handle 
recursive imports natively (like a very simple Build), it would be 
great! The changes to do so (for DMD) are minimal ;)




Aug 03 2006
Hasan Aljudy <hasan.aljudy gmail.com> writes:
kris wrote:
 It's perhaps easier to use UnicodeFile instead:
 
 # import mango.io.UnicodeFile;
 #
 # auto file = new UnicodeFileT!(dchar)("myfile", Unicode.Unknown);
 # auto content = file.read;
 

Ah nice! I didn't know about that. I wish someone had told me about it earlier. Are there any tutorials for mango that explain where everything is? I don't mean the documentation. I mean something that tells you: "if you want to read/decode files, see the documentation for mango.io.UnicodeFile" for example...
 
 Please note that Mango leverages a different IO model than Phobos, so 
 you'll have to compile this along with a few other mango.io modules.

I use build, so I don't really care.
 
 Mango typically requires the use of Build to pull in relevant modules, 
 because the combination of D, libraries, and templates just doesn't work 
 reliably at this time. If the compiler front-end were to handle 
 recursive imports natively (like a very simple Build), it would be 
 great! The changes to do so (for DMD) are minimal ;)
 

Yes, that would be great. Just let dmd recursively compile all imported modules, and because dmd is so fast, it doesn't matter even if dmd recompiles modules that have already been compiled. I always use the -full -clean switches on build anyway.
Aug 03 2006
kris <foo bar.com> writes:
Hasan Aljudy wrote:
 
 
 kris wrote:
 
 It's perhaps easier to use UnicodeFile instead:

 # import mango.io.UnicodeFile;
 #
 # auto file = new UnicodeFileT!(dchar)("myfile", Unicode.Unknown);
 # auto content = file.read;

Ah nice! I didn't know about that. I wish someone had told me about it earlier. Are there any tutorials for mango that explain where everything is? I don't mean the documentation. I mean something that tells you: "if you want to read/decode files, see the documentation for mango.io.UnicodeFile" for example...

No, but there should be :)

BTW: that should probably read "auto content = file.read();" with parens, since otherwise the 'auto' will try to take the function reference.

 Mango typically requires the use of Build to pull in relevant modules, 
 because the combination of D, libraries, and templates just doesn't 
 work reliably at this time. If the compiler front-end were to handle 
 recursive imports natively (like a very simple Build), it would be 
 great! The changes to do so (for DMD) are minimal ;)

Yes, that would be great. Just let dmd recursively compile all imported modules, and because dmd is so fast, it doesn't matter even if dmd recompiles modules that have already been compiled. I always use the -full -clean switches on build anyway.

Me too. Note that DMD *already* pulls in all imported modules during a compilation, and runs one or two stages on each of them. It just doesn't propagate those modules through the later stages of compilation and linking, choosing to discard them instead. A flag to include them in the compilation and linking stages would be just awesome.
Aug 03 2006
Markus Dangl <danglm in.tum.de> writes:
 BTW: that should probably read "auto content = file.read();" with 
 parens, since otherwise the 'auto' will try to take the function reference

Just a note: I think all methods that don't take parameters can be called without parens, just like you normally use properties, but it's a bit clearer to use parens here (because "read" should actually be used as a method). To take the reference you'd have to use something like "auto pointer = &file.read" ...
Aug 10 2006
Oskar Linde <olREM OVEnada.kth.se> writes:
Markus Dangl wrote:

 BTW: that should probably read "auto content = file.read();" with
 parens, since otherwise the 'auto' will try to take the function
 reference

Just a note: I think all methods that don't take parameters can be called without parens, just like you normally use properties, but it's a bit clearer to use parens here (because "read" should actually be used as a method). To take the reference you'd have to use something like "auto pointer = &file.read" ...

In this case, due to a bug or an unfortunate side effect,

auto content = file.read;

will neither call file.read() nor make content a reference to the function. It will try to make content a function type (as opposed to a reference to a function), which will fail to compile.

There is also still at least one case where an empty pair of parentheses is needed at a function call. Array extension methods:

void func(int[] t) {}

cannot be called as:

arr.func;

Though I'm not sure there is any fundamental reason it has to be that way.

All function (reference) and delegate types will also require the parentheses, which is more or less necessary to avoid ambiguities:

int delegate() func() { return { return 1; }; }
...
func;

/Oskar
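
To make the three cases concrete, here is a small sketch of how they behave with a present-day D compiler, which is an assumption on my part; the 2006 DMD discussed in this thread behaved differently, as described above. `Reader`, `total`, `makeOne` and `demo` are hypothetical names used only for illustration.

```d
class Reader
{
    int read() { return 42; }
}

// Free function taking an array, callable as arr.total (array extension method).
int total(int[] t)
{
    int sum = 0;
    foreach (x; t)
        sum += x;
    return sum;
}

// Function returning a delegate.
int delegate() makeOne()
{
    return delegate int() { return 1; };
}

void demo()
{
    auto r = new Reader;
    auto a = r.read();   // explicit call
    auto b = r.read;     // parentheses omitted: still a call here
    auto dg = &r.read;   // & is needed to obtain a delegate to the method
    assert(a == 42 && b == 42 && dg() == 42);

    int[] arr = [1, 2, 3];
    assert(arr.total == 6);   // extension call without parentheses

    auto d = makeOne;         // one call: d is the returned delegate
    assert(d() == 1);         // invoking the delegate needs its own ()
    assert(makeOne()() == 1); // both calls written out explicitly
}
```

The delegate case shows why the second pair of parentheses cannot be optional: without it there would be no way to tell "the delegate" from "the delegate's result".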
Aug 10 2006
Derek Parnell <derek nomail.afraid.org> writes:
On Fri, 04 Aug 2006 00:38:18 -0300, Tim Locke wrote:

 How do I read an UTF32 file? Stream only seems to support UTF8 with
 readLine and UTF16 with readLineW.
 
 Thanks

Read them in 4-byte chunks and, depending on endian-ness, convert to a ulong then cast to a dchar then append to a dchar[] ... simple!

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocrity!"
4/08/2006 2:15:37 PM
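
A minimal sketch of that approach, assuming a present-day D compiler and Phobos (`std.utf.isValidDchar`). The function name `decodeUtf32` and its `bigEndian` parameter are hypothetical. A 32-bit `uint` is enough to hold each code unit, since D's `ulong` is 64 bits.

```d
import std.utf : isValidDchar;

/// Decode raw UTF-32 bytes into a dchar[]; `bigEndian` selects the byte order.
dchar[] decodeUtf32(const(ubyte)[] bytes, bool bigEndian)
{
    if (bytes.length % 4 != 0)
        throw new Exception("UTF-32 data length must be a multiple of 4");

    dchar[] result;
    result.reserve(bytes.length / 4);

    for (size_t i = 0; i < bytes.length; i += 4)
    {
        // Assemble one 32-bit code unit from a 4-byte chunk.
        uint unit;
        if (bigEndian)
            unit = (cast(uint) bytes[i] << 24) | (cast(uint) bytes[i + 1] << 16)
                 | (cast(uint) bytes[i + 2] << 8) | cast(uint) bytes[i + 3];
        else
            unit = (cast(uint) bytes[i + 3] << 24) | (cast(uint) bytes[i + 2] << 16)
                 | (cast(uint) bytes[i + 1] << 8) | cast(uint) bytes[i];

        // Reject surrogates and values above U+10FFFF.
        if (!isValidDchar(cast(dchar) unit))
            throw new Exception("Invalid code point in UTF-32 data");

        result ~= cast(dchar) unit;
    }
    return result;
}
```

The bytes could come from something like `cast(ubyte[]) std.file.read(fileName)`. If the file starts with a byte-order mark, the first decoded element will be U+FEFF, which the caller can drop.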
Aug 03 2006
Tim Locke <root vic-20.net> writes:
On Fri, 4 Aug 2006 14:16:49 +1000, Derek Parnell
<derek nomail.afraid.org> wrote:

On Fri, 04 Aug 2006 00:38:18 -0300, Tim Locke wrote:

 How do I read an UTF32 file? Stream only seems to support UTF8 with
 readLine and UTF16 with readLineW.
 
 Thanks

Read them in 4-byte chunks and, depending on endian-ness, convert to a ulong then cast to a dchar then append to a dchar[] ... simple!

Thanks. I will try that.
Aug 04 2006