digitalmars.D - Transcoding - Summary
- Arcane Jill <Arcane_member pathlink.com> Aug 17 2004
- Arcane Jill <Arcane_member pathlink.com> Aug 17 2004
- Derek <derek psyc.ward> Aug 17 2004
- "Walter" <newshound digitalmars.com> Aug 17 2004
- Regan Heath <regan netwin.co.nz> Aug 17 2004
- Russ Lewis <spamhole-2001-07-16 deming-os.org> Aug 17 2004
- Regan Heath <regan netwin.co.nz> Aug 17 2004
- Arcane Jill <Arcane_member pathlink.com> Aug 18 2004
- Derek Parnell <derek psych.ward> Aug 17 2004
- "antiAlias" <fu bar.com> Aug 18 2004
- Arcane Jill <Arcane_member pathlink.com> Aug 18 2004
- "antiAlias" <fu bar.com> Aug 18 2004
- Arcane Jill <Arcane_member pathlink.com> Aug 19 2004
- J C Calvarese <jcc7 cox.net> Aug 18 2004
We have two separate problems:

(1) formatted I/O
(2) unformatted I/O

For unformatted I/O, we need the ability to read a sequence of dchars from some source, and the ability to write a sequence of dchars to some sink. The class which acts as a dchar source must perform decoding from some underlying ubyte source. The class which acts as a dchar sink must perform encoding to some underlying ubyte sink.

The source and sink could be anything - a string; a console; a file; a socket; - even a simple counter which counts bytes and throws away data. So, to keep things generic, I shall use the terms "ubyte source", "ubyte sink", "dchar source" and "dchar sink". The traditional terms are:

ubyte source = input stream
ubyte sink = output stream
dchar source = reader
dchar sink = writer

(I'm using new terms merely in order to avoid confusion with objects in std.stream, mango.io, and Java).

For formatted I/O, we need:

(1a) a replacement for printf() which emits a formatted sequence of dchars to an arbitrary dchar sink
(1b) a replacement for scanf() which parses a sequence of dchars obtained from an arbitrary dchar source

Further, for reasons of internationalization, our printf replacement must be able to random-access its variadic arguments.

Observe that if the output of (1a) is plumbed into an encoder, and the input to (1b) is plumbed into a decoder, then formatted transcoding is achieved. This makes our printf/scanf replacements relatively easy to write. They are likely to require very little modification from the existing format()/unformat() routines, with essentially the only difference being that they must be dchar-based, not char-based. (Random-access of the arguments would be a new feature, however, though not necessarily an urgent one).

Another oft-voiced requirement is that transcoding be independent of any particular string/stream implementation. (I suspect that if Phobos streams were fully-featured, fully-documented, bug-free and intuitive, then nobody would be asking for this requirement. But as things are, the requirement is there).

So ... listed below are the jobs which need to be done. Volunteers are requested for any unclaimed jobs:

(1) The source and sink interfaces need to be nailed down.
(2) Given (1), dchar-based format()/unformat() replacements can be written.
(3) Given (1), encoder and decoder classes/interfaces can be written.
(4) Given (3), classes can be written to attach our encoders/decoders to std and mango streams, to strings, etc.
(5) Given (3), encoders and decoders for SPECIFIC encodings can now be written.
(6) Will somebody /please/ document std.stream?

I volunteer for (1) and (3). I'm hoping Sean will volunteer for (2). AntiAlias's excellent ideas for throughput enhancement using buffers are part of (1) and (3), so I suggest AntiAlias and I send each other code back and forth until we are both happy with it.

Volunteers still needed for (4), (5) and (6) (though (4) and (5) are dependent upon (3)). Anyone who's a dab hand at Wiki might like to volunteer for (6).

Arcane Jill
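
To make the source/sink layering above concrete, here is a minimal sketch in present-day D. Every name in it (UbyteSource, UbyteSink, DcharSource, DcharSink, CountingSink, Utf8Encoder) is an assumption made up for illustration; the thread had not settled on any interface at this point. It shows a dchar sink that encodes to UTF-8 and forwards the bytes to the "simple counter" ubyte sink mentioned in the post. Only std.utf.encode is a real library call.

```d
import std.stdio : writeln;
import std.utf : encode;

// Hypothetical interfaces; the thread had not fixed any of these names.
interface UbyteSource { bool empty(); ubyte get(); }   // "input stream"
interface UbyteSink   { void put(ubyte b); }           // "output stream"
interface DcharSource { bool empty(); dchar get(); }   // "reader"
interface DcharSink   { void put(dchar c); }           // "writer"

// The "simple counter which counts bytes and throws away data".
class CountingSink : UbyteSink
{
    size_t count;
    void put(ubyte b) { ++count; }
}

// An encoder is a dchar sink layered over a ubyte sink. UTF-8 is used
// here only because std.utf already provides the encoding step.
class Utf8Encoder : DcharSink
{
    private UbyteSink sink;

    this(UbyteSink sink) { this.sink = sink; }

    void put(dchar c)
    {
        char[4] buf;
        immutable n = encode(buf, c);   // number of UTF-8 code units written
        foreach (unit; buf[0 .. n])
            sink.put(cast(ubyte) unit);
    }
}

void main()
{
    auto counter = new CountingSink;
    DcharSink writer = new Utf8Encoder(counter);
    foreach (dchar c; "na\u00EFve \u20AC"d)
        writer.put(c);
    writeln(counter.count);   // 10: five ASCII letters, a space, 2 bytes, 3 bytes
}
```

A decoder would be the mirror image: a DcharSource that pulls bytes from a UbyteSource. Because the formatter only ever sees DcharSink, swapping the encoding or the destination does not touch it.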
Aug 17 2004
In article <cfsm6d$va0$1 digitaldaemon.com>, Arcane Jill says...
> (1) The source and sink interfaces need to be nailed down.
> (2) Given (1), dchar-based format()/unformat() replacements can be written.
> (3) Given (1), encoder and decoder classes/interfaces can be written.
> (4) Given (3), classes can be written to attach our encoders/decoders to std and mango streams, to strings, etc.
> (5) Given (3), encoders and decoders for SPECIFIC encodings can now be written.
> (6) Will somebody /please/ document std.stream?

Nick, I think your work falls into category (5). If you want that job, I guess it's yours, but if so, please wait for (3) before you start.

Jill
Aug 17 2004
On Tue, 17 Aug 2004 10:21:01 +0000 (UTC), Arcane Jill wrote:
> [original post quoted in full]
I hope I'm not stating the bleeding obvious, but you are talking about TEXT I/O, aren't you? There is also a lot of other I/O that is not text based - sound and image files, databases, etc...

--
Derek
Melbourne, Australia
Aug 17 2004
"Arcane Jill" <Arcane_member pathlink.com> wrote in message news:cfsm6d$va0$1 digitaldaemon.com...
> Further, for reasons of internationalization, our printf replacement must be able to random-access its variadic arguments.
I disagree with this requirement. It breaks the nice way that std.format works. The only place where reordering the arguments is useful is in date/time formatting, and a specialized formatter would be suitable for that (and there are many other nice things one can do with a specialized date/time formatter).
Aug 17 2004
On Tue, 17 Aug 2004 15:00:28 -0700, Walter <newshound digitalmars.com> wrote:
> "Arcane Jill" <Arcane_member pathlink.com> wrote in message news:cfsm6d$va0$1 digitaldaemon.com...
>> Further, for reasons of internationalization, our printf replacement must be able to random-access its variadic arguments.
> I disagree with this requirement. It breaks the nice way that std.format works. The only place where reordering the arguments is useful is in date/time formatting, and a specialized formatter would be suitable for that (and there are many other nice things one can do with a specialized date/time formatter).
Did you miss the thread that mentioned that sentence structure differs between languages?

Example:
english :- "The DOG is BIG"
other   :- ".. BIG .. DOG"
(I don't actually know any other languages)

So, it would be kind of useful to be able to define the format strings as:
english :- "The $1 is $2"
other   :- ".. $2 .. $1"

and be able to go:

    printf(format[lang_id],"DOG","BIG");

Regan

--
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Aug 17 2004
Regan Heath wrote:
> Did you miss the thread that mentioned that sentence structure in various languages differ?
>
> Example:
> english :- "The DOG is BIG"
> other   :- ".. BIG .. DOG"
> (I don't actually know any other languages)
>
> So, it would be kind of useful to be able to define the format strings as:
> english :- "The $1 is $2"
> other   :- ".. $2 .. $1"
>
> and be able to go:
> printf(format[lang_id],"DOG","BIG");

This isn't strictly a requirement of the formatting tools. Perhaps a library function which, given a number of varargs, reordered them and passed them to another function? Your code could look (very roughly) like this:

    char[] formatString  = LookupNLSFormat (msgID, language);
    char[] reorderString = LookupNLSReorder(msgID, language);
    vwritef(formatString, doArgumentReorder(reorderString, <args>));

The advantage here is that you can do reordering for NLS support but writef stays simple.
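
A rough sketch of the reordering helper described above, in present-day D. It is an illustration under stated assumptions and not Russ's code: the reorder string is assumed to be a whitespace-separated list of 1-based positions, the arguments are restricted to strings to keep the example short, and the second format string is a made-up word order, not a real translation.

```d
import std.algorithm : map;
import std.array : array, split;
import std.conv : to;
import std.format : format;
import std.stdio : writeln;

// Hypothetical helper: `order` lists 1-based argument positions, e.g. "2 1".
string[] doArgumentReorder(string order, string[] args)
{
    return order.split                          // split on whitespace
                .map!(s => args[s.to!size_t - 1])
                .array;
}

void main()
{
    auto args = ["dog", "big"];

    // English: arguments are used in the order given.
    auto en = doArgumentReorder("1 2", args);
    writeln(format("The %s is %s", en[0], en[1]));    // The dog is big

    // Made-up word order for illustration only.
    auto other = doArgumentReorder("2 1", args);
    writeln(format("%s, the %s is", other[0], other[1]));  // big, the dog is
}
```

The format string stays an ordinary sequential one; all language-specific ordering lives in the separate reorder string.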
Aug 17 2004
On Tue, 17 Aug 2004 19:45:47 -0700, Russ Lewis <spamhole-2001-07-16 deming-os.org> wrote:
> Regan Heath wrote:
>> Did you miss the thread that mentioned that sentence structure in various languages differ?
>>
>> Example:
>> english :- "The DOG is BIG"
>> other   :- ".. BIG .. DOG"
>> (I don't actually know any other languages)
>>
>> So, it would be kind of useful to be able to define the format strings as:
>> english :- "The $1 is $2"
>> other   :- ".. $2 .. $1"
>>
>> and be able to go:
>> printf(format[lang_id],"DOG","BIG");
>
> This isn't strictly a requirement of the formatting tools. Perhaps a library function which, given a number of varargs, reordered them and passed them to another function? Your code could look (very roughly) like this:
>
> char[] formatString  = LookupNLSFormat (msgID, language);
> char[] reorderString = LookupNLSReorder(msgID, language);
> vwritef(formatString, doArgumentReorder(reorderString, <args>));
>
> The advantage here is that you can do reordering for NLS support but writef stays simple.
The disadvantage being that the above idea is harder to maintain, there are 2 things that define how the message is displayed, 2 things in which a mistake could be made, 2 things in which you have to make changes, ..

How hard or complex is it to implement a writef that can do:

    writef("The %1 is %2","dog","big");

(%1 and %2 can be changed to any symbol that fits with the current symbol set used in writef)

I can't see it being a particularly big leap from what it currently does.

Also consider:

    writef("A really long %1 that contains the same %1 several times. %1's like this could be quite common, yes?","string");

Regan

--
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
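
To give a feel for how small the leap is, here is a sketch of the %1/%2 substitution on its own, written in present-day D. The function name expand and its behaviour (digits 1 to 9 only, string arguments only, unknown placeholders left untouched) are assumptions for illustration; it is not writef and does none of writef's type-driven formatting.

```d
import std.array : appender;
import std.stdio : writeln;

// Hypothetical helper: replaces %1 .. %9 with the matching argument.
// A placeholder may appear any number of times, in any order.
string expand(string fmt, string[] args...)
{
    auto result = appender!string();
    for (size_t i = 0; i < fmt.length; ++i)
    {
        immutable c = fmt[i];
        if (c == '%' && i + 1 < fmt.length
            && fmt[i + 1] >= '1' && fmt[i + 1] <= '9')
        {
            immutable index = fmt[i + 1] - '1';   // "%1" selects args[0]
            if (index < args.length)
            {
                result.put(args[index]);
                ++i;                              // skip the digit as well
                continue;
            }
        }
        result.put(c);                            // ordinary character
    }
    return result.data;
}

void main()
{
    writeln(expand("The %1 is %2", "dog", "big"));
    writeln(expand("A really long %1 that contains the same %1 several times.", "string"));
}
```

Because each placeholder names its argument, repeating %1 costs nothing extra, which is the second case raised above.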
Aug 17 2004
In article <opscwr50rl5a2sq9 digitalmars.com>, Regan Heath says...
> On Tue, 17 Aug 2004 19:45:47 -0700, Russ Lewis wrote:
>> This isn't strictly a requirement of the formatting tools. Perhaps a library function which, given a number of varargs, reordered them and passed them to another function? Your code could look (very roughly) like this:
>>
>> char[] formatString  = LookupNLSFormat (msgID, language);
>> char[] reorderString = LookupNLSReorder(msgID, language);
>> vwritef(formatString, doArgumentReorder(reorderString, <args>));
>
> The disadvantage being that the above idea is harder to maintain, there are 2 things that define how the message is displayed, 2 things in which a mistake could be made, 2 things in which you have to make changes, ..
>
> How hard or complex is it to implement a writef that can do:
>
> writef("The %1 is %2","dog","big");
>
> (%1 and %2 can be changed to any symbol that fits with the current symbol set used in writef)
>
> I can't see it being a particularly big leap from what it currently does.
>
> Also consider:
>
> writef("A really long %1 that contains the same %1 several times. %1's like this could be quite common, yes?","string");
Well, I didn't mean to cause trouble here. :)

Anyway. I'm agreeing with Regan, and slightly disagreeing with Walter. There /is/ a need to be able to do:

# // English
# article = "the";
# adjective = "red";
# noun = "house";
# formatString = "%s %s %s"; // default order
#
# // French
# article = "la";
# adjective = "rouge";
# noun = "maison";
# formatString = "%(1)s %(3)s %(2)s";
#
# writef(formatString, article, adjective, noun);

Sorry, but that's a requirement. It's not an /urgent/ requirement, but you can bet vast sums of money that internationalization will start to become more and more of an issue once other transcoding issues have been dealt with.

Russ's idea is good, but obviously not /as/ good as simply coming up with an improved printf() replacement. Right now, POSIX-printf() can do this random-access, but D's writef() can't. It's not urgent, and we'll solve it in time. But it /is/ an internationalization issue, and it won't go away.

Arcane Jill
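
For comparison, the same example using POSIX-style positional specifiers (%1$s and so on) instead of the %(1)s notation in the post. This is a sketch in present-day D, on the understanding that current Phobos std.format accepts that syntax; that was not the case for writef when this thread was written. The wrapper name describe is made up for the example.

```d
import std.format : format;
import std.stdio : writeln;

// Hypothetical wrapper so that each language's format string can be swapped in
// while the call site always passes the arguments in the same order.
string describe(string fmt, string article, string adjective, string noun)
{
    return format(fmt, article, adjective, noun);
}

void main()
{
    // English: default (sequential) order.
    writeln(describe("%s %s %s", "the", "red", "house"));          // the red house

    // French: the adjective follows the noun, so arguments 2 and 3 swap.
    writeln(describe("%1$s %3$s %2$s", "la", "rouge", "maison"));  // la maison rouge
}
```

Only the format string changes per language, which is the property being asked for: translators edit one string and the code stays the same.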
Aug 18 2004
On Tue, 17 Aug 2004 15:00:28 -0700, Walter wrote:
> "Arcane Jill" <Arcane_member pathlink.com> wrote in message news:cfsm6d$va0$1 digitaldaemon.com...
>> Further, for reasons of internationalization, our printf replacement must be able to random-access its variadic arguments.
> I disagree with this requirement. It breaks the nice way that std.format works. The only place where reordering the arguments is useful is in date/time formatting, and a specialized formatter would be suitable for that (and there are many other nice things one can do with a specialized date/time formatter).
I think that AJ was suggesting that there exists a business need for a type of formatter that can express, in its template, the order in which arguments will appear in the resultant string, regardless of the order in which they are presented to the formatter.

For example (contrived for simplicity):

    char[] Msg;
    char[] temp;
    if (gUserLang == LANG_english)
        temp = "%{1}s %{2}s %{3}s %{4}s %{5}s\n";
    else
        temp = "%{2}s %{1}s %{5}s %{4}s %{3}s\n";
    Msg = expand(temp, pSubjectDesc, pSubject, pVerb, pObjectDesc, pObject);
    writef(Msg);

--
Derek
Melbourne, Australia
18/Aug/04 10:31:55 AM
Aug 17 2004
Jill ~ I have a utf-8 transcoder that I'm using as a plaything within Mango; if you're interested, I'll send it on.

"Arcane Jill" <Arcane_member pathlink.com> wrote in message news:cfsm6d$va0$1 digitaldaemon.com...
> [original post quoted in full]
Aug 18 2004
In article <cfv1d0$26s7$1 digitaldaemon.com>, antiAlias says...
> Jill ~ I have a utf-8 transcoder that I'm using as a plaything within Mango; if you're interested, I'll send it on.

Not really interested because (a) there's one in std.utf, and (b) I could write my own in just a few lines of code anyway.

But we're really talking about general concepts here, not specific encodings. We need to get the architecture "right" first - which I guess means, in a form that everyone is happy with - and /then/ we start plugging in specific encodings. UTF-8 is one of the easiest, so I'm really not troubled by it. (ASCII is /the/ easiest, obviously).

Antialias, it was you who came up with some ideas for throughput enhancement using buffers. I think we can use those ideas without sacrificing genericity, which is why I suggested we collaborate on the generic interface. Would you be interested in that?

Jill
Aug 18 2004
"Arcane Jill" <Arcane_member pathlink.com> wrote in message
> Antialias, it was you who came up with some ideas for throughput enhancement using buffers. I think we can use those ideas without sacrificing genericity, which is why I suggested we collaborate on the generic interface. Would you be interested in that?

Sure, Jill. That's what I was attempting <g>

Was offering a transcoder built in the manner suggested; to experiment with said interface. Sometimes it's easier to deal with a more concrete entity as opposed to something completely virtual -- if nothing else, it should serve to more fully describe the suggested approach.

I'll need an email address, if this module would be of any value to you?
Aug 18 2004
In article <cg01qr$13u$1 digitaldaemon.com>, antiAlias says...
> I'll need an email address, if this module would be of any value to you?

If you have an account on dsource, you can contact me privately there. My username is "Arcane Jill".

Jill
Aug 19 2004
Arcane Jill wrote:
...
> (6) Will somebody /please/ document std.stream?
>
> I volunteer for (1) and (3). I'm hoping Sean will volunteer for (2). AntiAlias's excellent ideas for throughput enhancement using buffers are part of (1) and (3), so I suggest AntiAlias and I send each other code back and forth until we are both happy with it.
>
> Volunteers still needed for (4), (5) and (6) (though (4) and (5) are dependent upon (3)). Anyone who's a dab hand at Wiki might like to volunteer for (6).
>
> Arcane Jill

I'm not volunteering to single-handedly re-document std.stream, but I did start a wiki page for that purpose:

http://www.prowiki.org/wiki4d/wiki.cgi?DocComments/Phobos/StdStream

(Anyone can edit it by clicking on the "Edit" link in the upper right corner of the page.)

--
Justin (a/k/a jcc7)
http://jcc_7.tripod.com/d/
Aug 18 2004








