digitalmars.D - Text in D article

Daniel Keep (12/12) Nov 18 2006 Here's a draft of an article which, hopefully, will explain some of the

Anders F Björklund (4/7) Nov 18 2006 If you change the license you can put it in the Wiki4D ?

Alexander Panek (4/15) Nov 18 2006 Would perfectly fit into a wiki! Would be great to have such a text on
Daniel Keep (14/23) Nov 18 2006 I'm happy to change the license so it can be used elsewhere... just

Anders F Björklund (10/13) Nov 18 2006 I would avoid the term "Unicode character" like the plague...

Daniel Keep (15/31) Nov 18 2006 Mmm. I was trying to use the correct terms where appropriate, I just

Jarrett Billingsley (5/9) Nov 18 2006 Is null-termination of string literals even part of the D spec? Or is i...
Anders F Björklund (29/33) Nov 18 2006 I'm not sure if the text primarily wants to discuss Unicode encodings,

Tydr Schnubbis (2/6) Nov 18 2006 Any chance of an .rtf, .doc, or even .txt? :)

Alexander Panek (2/8) Nov 18 2006

Max Samuha (5/13) Nov 18 2006 For those who are still on Windows :), there is a free and compact doc

Daniel Keep (8/23) Nov 18 2006 Hey, *I'm* still on Windows :P

Max Samuha (21/39) Nov 18 2006 Daniel, I didn't intend to offend you, really. Sorry, if I did.

Daniel Keep (15/61) Nov 18 2006 None taken at all. Hence the ":P" -- OpenOffice.org *does* work on

Chris Nicholson-Sauls (6/35) Nov 18 2006 Same here -- for the most part. Luckily I'm an OOo fanboy. ;) As for ...

Daniel Keep (11/51) Nov 18 2006 I actually have... oh, what's it called? PDFCreator or somesuch. That

Bill Baxter (14/29) Nov 18 2006 Thanks for the link, Max.

Daniel Keep (18/53) Nov 18 2006 You are, of course, right.

Lutger (15/25) Nov 18 2006 Cool information! I only recently became aware of how unicode works
Daniel Keep (35/52) Nov 18 2006 I used the .odt since I wanted people to be able to make modifications

Walter Bright (12/15) Nov 18 2006 I usually send articles around for review in .txt format, that way

Daniel Keep (14/34) Nov 18 2006 Usually I write up stuff in reStructuredText which is basically plain

Walter Bright (3/5) Nov 18 2006 To tell the truth, I haven't read it yet, because I am reluctant to

Daniel Keep (10/16) Nov 18 2006 Ah, well, the latest zip contains an XHTML version which should open in

Serg Kovrov (5/5) Nov 18 2006 Hi Daniel,

Daniel Keep (14/18) Nov 18 2006 Blech. No offense, but I hate web apps. Dialup makes these things slow

Serg Kovrov (10/19) Nov 18 2006 I haven't used this Google service before, but other people publish

Daniel Keep (13/33) Nov 18 2006 Not bad, except that there's no spacing between paragraphs. It also

Pierre Rouleau (34/38) Nov 18 2006 As someone who has not been coding in D except for trying out some D

Pierre Rouleau (14/69) Nov 18 2006 And BTW, the line::

Daniel Keep (38/110) Nov 18 2006 Read down a little bit further: it points out that you want to use

Pierre Rouleau (30/155) Nov 18 2006 I saw that. My point was that the article should be a little clearer as...

Chris Nicholson-Sauls (30/36) Nov 19 2006 No, but it ought to be easy enough to make. A quick hack at it:
Anders F Björklund (6/10) Nov 19 2006 Yes, we are using this in wxD - it also works for GNU gettext with D.

Daniel Keep (11/11) Nov 18 2006 Ok, here's the third revision. Includes some clearer examples, a Q&A

Daniel Keep (18/26) Nov 18 2006 I finally managed to find a copy of the C99 standard, and I've filled in

Hasan Aljudy (5/31) Nov 19 2006 Nice job on the article.

Daniel Keep (17/53) Nov 19 2006 Actually, I'm pretty sure it's supposed to be konnichiha: people keep

Chris Nicholson-Sauls (7/18) Nov 19 2006 Unless my Japanese mentor was playing a prank on me (which is /entirely/...

Hasan Aljudy (11/32) Nov 19 2006 It's written konnichiha in hiragana, but it's pronounced konnichiwa,

Chris Nicholson-Sauls (13/26) Nov 19 2006 It's the Kunreisiki 「訓令式」. I prefer it, personally, because it...

Bill Baxter (12/35) Nov 19 2006 yep.

Chris Nicholson-Sauls (6/51) Nov 19 2006 Could've sworn 'wo' was used to write 'o-' though... ah well. Either th...

Bruno Medeiros (8/29) Nov 20 2006 "D" wa sugoi desu ne...

Bill Baxter (3/31) Nov 20 2006 Yeh, maybe we should have the D Conference here in Tokyo, after all. ;-...

Don Clugston (3/4) Nov 20 2006 Fabulous. It's another *genuine* FAQ, and it'd be great to see this on

BCS (3/13) Dec 22 2006 Did this paper ever get hosted somewhere? I'm looking for a URL to cite

Daniel Keep <daniel.keep.lists gmail.com> writes:

Here's a draft of an article which, hopefully, will explain some of the
details of how text in D works.  Any constructive criticism is welcomed,
along with edits or corrections.

Also, any suggestions on where to put this?  Ideally it could go on the
D website, but I think anywhere would be fine so long as we can point
people to it.

	-- Daniel

-- 
Unlike Knuth, I have neither proven or tried the above; it may not even
make sense.

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/

Nov 18 2006

Anders F Björklund <afb algonet.se> writes:

Daniel Keep wrote:

 Also, any suggestions on where to put this?  Ideally it could go on the
 D website, but I think anywhere would be fine so long as we can point
 people to it.

If you change the license you can put it in the Wiki4D ?

Like http://www.prowiki.org/wiki4d/wiki.cgi?CharsAndStrs

--anders

Nov 18 2006

Alexander Panek <a.panek brainsware.org> writes:

Would perfectly fit into a wiki! Would be great to have such a text on 
wiki4d or dsource.org's tutorials.

Alex

Anders F Björklund wrote:
 Daniel Keep wrote:
 
 Also, any suggestions on where to put this?  Ideally it could go on the
 D website, but I think anywhere would be fine so long as we can point
 people to it.

 
 If you change the license you can put it in the Wiki4D ?
 
 Like http://www.prowiki.org/wiki4d/wiki.cgi?CharsAndStrs
 
 --anders

Nov 18 2006

Daniel Keep <daniel.keep.lists gmail.com> writes:

Anders F Björklund wrote:
 Daniel Keep wrote:
 
 Also, any suggestions on where to put this?  Ideally it could go on the
 D website, but I think anywhere would be fine so long as we can point
 people to it.

 
 If you change the license you can put it in the Wiki4D ?

I'm happy to change the license so it can be used elsewhere... just
trying to find a site that actually has the full FDL and isn't down :(

I chose CC At-Sa since it should be pretty permissive; all you need to
do is attribute the original author and make sure you don't change the
license.  I thought that's what the FDL did :P

 Like http://www.prowiki.org/wiki4d/wiki.cgi?CharsAndStrs

Some good info there; even a few things I didn't know!  Might try to
work some of it in.

 --anders

	-- Daniel

Nov 18 2006

Anders F Björklund <afb algonet.se> writes:

Daniel Keep wrote:

 Here's a draft of an article which, hopefully, will explain some of the
 details of how text in D works.  Any constructive criticism is welcomed,
 along with edits or corrections.

I would avoid the term "Unicode character" like the plague...
If you must have something similar, then use "code point" ?
It's OK to have it in the casual text, like "ASCII character,
BMP character, Unicode character" but better not in the lists.

It also has an example on why: printf("Hello, World!\n");
doesn't work. But it does, since string *literals* are all
NUL-terminated. However, when you then try to extend that
to a string variable, and that variable contains a slice...
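
A minimal sketch of the difference between a literal and a slice, written in current D syntax (the `string` alias and the `core.stdc` modules are newer than this thread). The helper name `cLength` is made up for illustration; `toStringz` is the std.string function for handing a D string to C.

```d
import core.stdc.stdio : printf;
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical helper: how many bytes does a C function see
// once the D string has been given a NUL terminator?
size_t cLength(const(char)[] s)
{
    return strlen(toStringz(s));
}

void main()
{
    string whole = "Hello, World!";
    string slice = whole[0 .. 5];     // "Hello"; the next byte is ',' and not NUL

    printf("Hello, World!\n");        // works: the literal itself is NUL-terminated
    printf("%s\n", toStringz(slice)); // prints "Hello"
    assert(cLength(slice) == 5);

    // printf("%s\n", slice.ptr) would keep reading past the end of the
    // slice until it happened to find a NUL byte.
}
```

The slice carries its own length, but C only sees the pointer, which is why the copy made by `toStringz` is needed here.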

--anders

Nov 18 2006

Daniel Keep <daniel.keep.lists gmail.com> writes:

Anders F Björklund wrote:
 Daniel Keep wrote:
 
 Here's a draft of an article which, hopefully, will explain some of the
 details of how text in D works.  Any constructive criticism is welcomed,
 along with edits or corrections.

 
 I would avoid the term "Unicode character" like the plague...
 If you must have something similar, then use "code point" ?
 It's OK to have it in the casual text, like "ASCII character,
 BMP character, Unicode character" but better not in the lists.

Mmm.  I was trying to use the correct terms where appropriate, I just
didn't want it to descend into unintelligible gibberish.  This is sort
of aimed at the person who has no idea what a 'code point' or 'code
unit' even is.

 It also has an example on why: printf("Hello, World!\n");
 doesn't work. But it does, since string *literals* are all
 NUL-terminated. However, when you then try to extend that
 to a string variable, and that variable contains a slice...
 
 --anders

Very true.  I suppose I *should* say that literals are NUL-terminated,
but I want to make it perfectly clear that relying on this is a bad
idea; is it accepted practice to simply treat all strings as if they
were possibly non NUL-terminated?

	-- Daniel

Nov 18 2006

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

"Daniel Keep" <daniel.keep.lists gmail.com> wrote in message 
news:ejn63u$1v79$1 digitaldaemon.com...

 Very true.  I suppose I *should* say that literals are NUL-terminated,
 but I want to make it perfectly clear that relying on this is a bad
 idea; is it accepted practice to simply treat all strings as if they
 were possibly non NUL-terminated?

Is null-termination of string literals even part of the D spec?  Or is it 
entirely up to the implementation?  If the latter, then I'd put something in 
there about it, saying that it can't even be relied on..

Nov 18 2006

Anders F Björklund <afb algonet.se> writes:

Daniel Keep wrote:

 Very true.  I suppose I *should* say that literals are NUL-terminated,
 but I want to make it perfectly clear that relying on this is a bad
 idea; is it accepted practice to simply treat all strings as if they
 were possibly non NUL-terminated?

I'm not sure if the text primarily wants to discuss Unicode encodings,
or if it wants to discuss strings and text in D in general, but....

The main problem with printf is that you see a line like printf("foo")
and think that all strings are allowed. If neither would work, then it
wouldn't be as tempting to try it. But your conclusion/practice is OK,
you shouldn't use printf with D strings without having a *good* reason
(chances are that the C library will choke on the UTF-8 format anyway?)

Even the good ole "%.*s" hack is not portable to all possible platforms.
(it depends on how parameters are passed, think it breaks on Solaris...)
toStringz is the safest, even if you probably need to couple it with a
call to an encoding conversion if the local platform isn't using UTF-8 ?
But then you are on your own, the D library doesn't do such conversions.
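
For reference, a sketch of the "%.*s" approach in current D syntax. It passes the length and the pointer explicitly as two separate C arguments, which is the standard C meaning of "%.*s", rather than passing the array itself as the hack discussed above does. The helper name `viaPrecision` and the 64-byte buffer are assumptions made for illustration.

```d
import core.stdc.stdio : printf, snprintf;

// Hypothetical helper: format a D slice through C's "%.*s" and
// return the result as a D string.
string viaPrecision(const(char)[] s)
{
    char[64] buf;
    // The precision argument must be an int; it caps how many bytes C reads.
    int n = snprintf(buf.ptr, buf.length, "%.*s", cast(int) s.length, s.ptr);
    if (n < 0)
        return null;
    size_t len = cast(size_t) n;
    if (len >= buf.length)
        len = buf.length - 1;         // snprintf truncated the output
    return buf[0 .. len].idup;
}

void main()
{
    string text = "slice me";
    string part = text[0 .. 5];       // "slice", with no NUL after it

    printf("%.*s\n", cast(int) part.length, part.ptr); // prints "slice"
    assert(viaPrecision(part) == "slice");
}
```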

Even simple D programs such as:
import std.stdio;
void main(char[][] args)
{
   foreach(char[] arg; args)
     writefln("%s", arg);
}

Will break down if you run them on a platform without UTF-8 support,
since you will get illegal strings in "args" (exceptions on writefln)
As a workaround you can cast them over to ubyte[], translate to UTF-8
from the local encoding, and cast them back into (now legal) char[]...
But I would hardly characterize that as a language "support" for the
legacy platforms, it's better to say D *requires* Unicode support ?
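
The workaround could look roughly like the sketch below, in current D syntax. It assumes, purely for illustration, that the local encoding is ISO-8859-1 (Latin-1); the function names `latin1ToUtf8` and `sanitizeArg` are made up. `std.utf.validate` is the Phobos check that throws on malformed UTF-8.

```d
import std.utf : validate, UTFException;

// Assumption: the legacy encoding is Latin-1, where each byte value
// equals the code point it stands for.
string latin1ToUtf8(const(ubyte)[] raw)
{
    char[] result;
    foreach (ubyte b; raw)
    {
        if (b < 0x80)
        {
            result ~= cast(char) b;                  // ASCII: one byte, unchanged
        }
        else
        {
            result ~= cast(char)(0xC0 | (b >> 6));   // lead byte
            result ~= cast(char)(0x80 | (b & 0x3F)); // continuation byte
        }
    }
    return result.idup;
}

// Returns the argument unchanged if it is already valid UTF-8,
// otherwise reinterprets its bytes as Latin-1.
string sanitizeArg(string arg)
{
    try
    {
        validate(arg);
        return arg;
    }
    catch (UTFException)
    {
        return latin1ToUtf8(cast(const(ubyte)[]) arg);
    }
}

void main(string[] args)
{
    import std.stdio : writeln;

    foreach (arg; args)
        writeln(sanitizeArg(arg));
}
```

A real program would have to find out the actual local encoding first; guessing Latin-1 only keeps the sketch short.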


You might also want to touch briefly on the topics on COW and mutability
and how you might get segfaults writing to string literals. Or not... :)

--anders

Nov 18 2006

Tydr Schnubbis <fake address.dude> writes:

Daniel Keep wrote:
 Here's a draft of an article which, hopefully, will explain some of the
 details of how text in D works.  Any constructive criticism is welcomed,
 along with edits or corrections.
 

Any chance of an .rtf, .doc, or even .txt? :)

Nov 18 2006

Alexander Panek <a.panek brainsware.org> writes:

PDF would be great, too.

Tydr Schnubbis wrote:
 Daniel Keep wrote:
 Here's a draft of an article which, hopefully, will explain some of the
 details of how text in D works.  Any constructive criticism is welcomed,
 along with edits or corrections.

 Any chance of an .rtf, .doc, or even .txt? :)

Nov 18 2006

Max Samuha <maxter i.com.ua> writes:

On Sat, 18 Nov 2006 15:59:33 +0100, Alexander Panek
<a.panek brainsware.org> wrote:

PDF would be great, too.

Tydr Schnubbis wrote:
 Daniel Keep wrote:
 Here's a draft of an article which, hopefully, will explain some of the
 details of how text in D works.  Any constructive criticism is welcomed,
 along with edits or corrections.

 Any chance of an .rtf, .doc, or even .txt? :)


For those who are still on Windows :), there is a free and compact doc
viewer that supports the open office format
http://www.officeviewers.com/

Nov 18 2006

Daniel Keep <daniel.keep.lists gmail.com> writes:

Max Samuha wrote:
 On Sat, 18 Nov 2006 15:59:33 +0100, Alexander Panek
 <a.panek brainsware.org> wrote:
 
 PDF would be great, too.

 Tydr Schnubbis wrote:
 Daniel Keep wrote:
 Here's a draft of an article which, hopefully, will explain some of the
 details of how text in D works.  Any constructive criticism is welcomed,
 along with edits or corrections.

 Any chance of an .rtf, .doc, or even .txt? :)


 For those who are still on Windows :), there is a free and compact doc
 viewer that supports the open office format
 http://www.officeviewers.com/ 

Hey, *I'm* still on Windows :P

	-- Daniel

Nov 18 2006

Max Samuha <maxter i.com.ua> writes:

On Sun, 19 Nov 2006 02:43:10 +1100, Daniel Keep
<daniel.keep.lists gmail.com> wrote:

Max Samuha wrote:
 On Sat, 18 Nov 2006 15:59:33 +0100, Alexander Panek
 <a.panek brainsware.org> wrote:
 
 PDF would be great, too.

 Tydr Schnubbis wrote:
 Daniel Keep wrote:
 Here's a draft of an article which, hopefully, will explain some of the
 details of how text in D works.  Any constructive criticism is welcomed,
 along with edits or corrections.

 Any chance of an .rtf, .doc, or even .txt? :)


 For those who are still on Windows :), there is a free and compact doc
 viewer that supports the open office format
 http://www.officeviewers.com/ 

Hey, *I'm* still on Windows :P

	-- Daniel

Daniel, I didn't intend to offend you, really. Sorry, if I did.

The article is great and useful. I would add a note for those coming

is a bad idea:

class BlackBox
{
	private char[] _text;

	this()
	{
		_text = "object state";
	}

	char[] text()
	{
		// Should be 'return _text.dup' if you don't want the user
		// of the object to change the internal _text.
		return _text;
	}
}

Or something like that.
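
A sketch of the aliasing problem in current D syntax, where the literal has to be copied with `.dup` before it can be stored in a mutable `char[]`. The two accessor names `textShared` and `textCopy` are made up to show both behaviours side by side.

```d
class BlackBox
{
    private char[] _text;

    this()
    {
        // .dup gives the object its own mutable copy of the literal.
        _text = "object state".dup;
    }

    // Leaky accessor: the caller gets a reference to the internal array.
    char[] textShared() { return _text; }

    // Defensive accessor: the caller gets an independent copy.
    char[] textCopy() { return _text.dup; }
}

void main()
{
    auto box = new BlackBox;

    auto copy = box.textCopy();
    copy[0] = 'X';
    assert(box.textShared() == "object state"); // internal state untouched

    auto leaked = box.textShared();
    leaked[0] = 'X';
    assert(box.textShared() == "Xbject state"); // changed from outside the class
}
```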

Nov 18 2006

Daniel Keep <daniel.keep.lists gmail.com> writes:

Max Samuha wrote:
 On Sun, 19 Nov 2006 02:43:10 +1100, Daniel Keep
 <daniel.keep.lists gmail.com> wrote:
 
 Max Samuha wrote:
 On Sat, 18 Nov 2006 15:59:33 +0100, Alexander Panek
 <a.panek brainsware.org> wrote:

 PDF would be great, too.

 Tydr Schnubbis wrote:
 Daniel Keep wrote:
 Here's a draft of an article which, hopefully, will explain some of the
 details of how text in D works.  Any constructive criticism is welcomed,
 along with edits or corrections.

 Any chance of an .rtf, .doc, or even .txt? :)


 For those who are still on Windows :), there is a free and compact doc
 viewer that supports the open office format
 http://www.officeviewers.com/ 

 Hey, *I'm* still on Windows :P

 	-- Daniel

 
 Daniel, I didn't intend to offend you, really. Sorry, if I did.

None taken at all.  Hence the ":P" -- OpenOffice.org *does* work on
Windows quite nicely :)

 The article is great and useful. I would add a note for those coming

 is a bad idea:
 
 class BlackBox
 {
 	private char[] _text;
 
 	this()
 	{
 		_text = "object state";
 	}
 
 	char[] text()
 	{
 		// Should be 'return _text.dup' if you don't want the user
 		// of the object to change the internal _text.
 		return _text;
 	}
 }
 
 Or something like that.  

Perhaps.  This was basically written to be a quick look at all the
things people expect to work, but don't.  To be honest, I've never had
this problem since strings are arrays and arrays are passed by reference
and thus can be mutated.  But then, maybe not everyone catches that
first time :P

I'll definitely give it some thought.

	-- Daniel

Nov 18 2006

Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:

Daniel Keep wrote:
 
 Max Samuha wrote:
 
On Sat, 18 Nov 2006 15:59:33 +0100, Alexander Panek
<a.panek brainsware.org> wrote:


PDF would be great, too.

Tydr Schnubbis wrote:

Daniel Keep wrote:

Here's a draft of an article which, hopefully, will explain some of the
details of how text in D works.  Any constructive criticism is welcomed,
along with edits or corrections.

Any chance of an .rtf, .doc, or even .txt? :)


For those who are still on Windows :), there is a free and compact doc
viewer that supports the open office format
http://www.officeviewers.com/ 

 
 
 Hey, *I'm* still on Windows :P
 
 	-- Daniel
 

Same here -- for the most part.  Luckily I'm an OOo fanboy.  ;)  As for making
the PDF, I have also noticed the bloat of OOo's PDF output, but you might try
CutePDF and see if it gives you better results.  (It's a virtual printer that
outputs to a PDF, so it's usable with anything supporting printers.)

-- Chris Nicholson-Sauls

Nov 18 2006

Daniel Keep <daniel.keep.lists gmail.com> writes:

Chris Nicholson-Sauls wrote:
 Daniel Keep wrote:
 Max Samuha wrote:

 On Sat, 18 Nov 2006 15:59:33 +0100, Alexander Panek
 <a.panek brainsware.org> wrote:


 PDF would be great, too.

 Tydr Schnubbis wrote:

 Daniel Keep wrote:

 Here's a draft of an article which, hopefully, will explain some
 of the
 details of how text in D works.  Any constructive criticism is
 welcomed,
 along with edits or corrections.

 Any chance of an .rtf, .doc, or even .txt? :)


 For those who are still on Windows :), there is a free and compact doc
 viewer that supports the open office format
 http://www.officeviewers.com/ 


 Hey, *I'm* still on Windows :P

     -- Daniel

 
 Same here -- for the most part.  Luckily I'm an OOo fanboy.  ;)  As for
 making the PDF, I have also noticed the bloat of OOo's PDF output, but
 you might try CutePDF and see if it gives you better results.  (It's a
 virtual printer that outputs to a PDF, so it's usable with anything
 supporting printers.)
 
 -- Chris Nicholson-Sauls

I actually have... oh, what's it called?  PDFCreator or somesuch.  That
doesn't usually do that much better than OOo.  I actually had to zip the
ODT and XHTML files since the newsgroup said they were too large
together.  I doubt I'd even be able to post the PDF at all :P

	-- Daniel

Nov 18 2006

Bill Baxter <dnewsgroup billbaxter.com> writes:

Max Samuha wrote:
 On Sat, 18 Nov 2006 15:59:33 +0100, Alexander Panek
 <a.panek brainsware.org> wrote:
 
 PDF would be great, too.

 Tydr Schnubbis wrote:
 Daniel Keep wrote:
 Here's a draft of an article which, hopefully, will explain some of the
 details of how text in D works.  Any constructive criticism is welcomed,
 along with edits or corrections.

 Any chance of an .rtf, .doc, or even .txt? :)


 For those who are still on Windows :), there is a free and compact doc
 viewer that supports the open office format
 http://www.officeviewers.com/ 

Thanks for the link, Max.

Daniel, I like it.  Seems quite clear to me.

One minor thing.  In one section you recommend just using dchar[] 
everywhere as a solution for not slicing characters in the middle.  But 
then in the next section you recommend using std.string as a 
comprehensive solution for manipulating strings.  Unfortunately 
std.string really only deals with char[] strings.  So you might want to 
point out explicitly the dilemma that poses to the developer:  If you go 
with dchar[] and have to do a lot of string munging, you're likely to 
find lots of toUTF8's and toUTF32's popping up in your code.  If you go 
with char[] you've got to remember that mystring[1..$] may not mean what 
you think it means.
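
A small sketch of both sides of that dilemma, in current D syntax. `stride`, `toUTF32` and `validate` are std.utf functions; the helper name `dropFirst` is made up for illustration.

```d
import std.utf : stride, toUTF32, validate, UTFException;

// Drops the first code point (not the first byte) of a UTF-8 string.
string dropFirst(string s)
{
    return s.length ? s[stride(s, 0) .. $] : s;
}

void main()
{
    string s = "\u00E9a";            // 'é' takes two UTF-8 code units, 'a' takes one
    assert(s.length == 3);

    // s[1 .. $] starts in the middle of 'é' and is no longer valid UTF-8.
    bool broken = false;
    try
    {
        validate(s[1 .. $]);
    }
    catch (UTFException)
    {
        broken = true;
    }
    assert(broken);

    assert(dropFirst(s) == "a");     // stride() skips the whole sequence

    dstring d = toUTF32(s);          // one array element per code point
    assert(d.length == 2);
    assert(d[1 .. $] == "a"d);       // slicing by index is safe here
}
```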

--bb

Nov 18 2006

Daniel Keep <daniel.keep.lists gmail.com> writes:

Bill Baxter wrote:
 Max Samuha wrote:
 On Sat, 18 Nov 2006 15:59:33 +0100, Alexander Panek
 <a.panek brainsware.org> wrote:

 PDF would be great, too.

 Tydr Schnubbis wrote:
 Daniel Keep wrote:
 Here's a draft of an article which, hopefully, will explain some of
 the
 details of how text in D works.  Any constructive criticism is
 welcomed,
 along with edits or corrections.

 Any chance of an .rtf, .doc, or even .txt? :)


 For those who are still on Windows :), there is a free and compact doc
 viewer that supports the open office format
 http://www.officeviewers.com/ 

 
 Thanks for the link, Max.
 
 Daniel, I like it.  Seems quite clear to me.
 
 One minor thing.  In one section you recommend just using dchar[]
 everywhere as a solution for not slicing characters in the middle.  But
 then in the next section you recommend using std.string as a
 comprehensive solution for manipulating strings.  Unfortunately
 std.string really only deals with char[] strings.  So you might want to
 point out explicitly the dilemma that poses to the developer:  If you go
 with dchar[] and have to do a lot of string munging, you're likely to
 find lots of toUTF8's and toUTF32's popping up in your code.  If you go
 with char[] you've got to remember that mystring[1..$] may not mean what
 you think it means.
 
 --bb

You are, of course, right.

"OK; if you're doing array indexing or slicing, stick to dchar; if
you're going to be using std.string, stick to char."

Doesn't really sound good.  It implies that either the standard library
has a hole in it or that indexing and slicing on char[] and wchar[]
*should* work as expected.

I think I'll change the article so that it's correct, but here's a
question for Walter:

  Is std.string going to support wchar[]s and dchar[]s?  If not, why?

Heh, they say the best way to learn something is to teach it.  Guess I'm
still learning :P

	-- Daniel

Nov 18 2006

Lutger <lutger.blijdestijn gmail.com> writes:

Daniel Keep wrote:
 Here's a draft of an article which, hopefully, will explain some of the
 details of how text in D works.  Any constructive criticism is welcomed,
 along with edits or corrections.
 
 Also, any suggestions on where to put this?  Ideally it could go on the
 D website, but I think anywhere would be fine so long as we can point
 people to it.
 
 	-- Daniel
 

Cool information! I only recently became aware of how unicode works 
because of this newsgroup. The current solution in D looks fine to me, 
it's just that people are not aware of it and the documentation doesn't 
help much in increasing unicode awareness. I would vote for this 
information being incorporated right into the relevant sections of the 
official documentation.

Probably the best advice I read here was that if you want your text to 
just work, you either use dchar or do all string handling with 
std.string. It's very simple, don't go messing with char[] without the 
help of phobos unless you know what you're doing. Perhaps you could put 
something like that in the beginning of your document.
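
As a sketch of what "with the help of phobos" can look like in current D syntax: a `foreach` with a `dchar` loop variable decodes the UTF-8 as it goes, and `std.string.indexOf` searches for a character without splitting any sequence. The helper name `countCodePoints` is made up for illustration.

```d
import std.string : indexOf;

// Counts code points by letting foreach decode the UTF-8 on the fly.
size_t countCodePoints(const(char)[] s)
{
    size_t n = 0;
    foreach (dchar c; s)    // each iteration yields one decoded code point
        ++n;
    return n;
}

void main()
{
    string s = "na\u00EFve";            // "naïve"
    assert(s.length == 6);              // six UTF-8 code units
    assert(countCodePoints(s) == 5);    // five code points

    // indexOf reports the position in code units, so the result can be
    // used directly as a slice boundary.
    auto i = indexOf(s, 'v');
    assert(i == 4);
    assert(s[i .. $] == "ve");
}
```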

D does have something similar to a string class in the form of 
std.string imo, the only thing is that it's procedural instead of 
object-based. I don't see a problem with that.

Nov 18 2006

Daniel Keep <daniel.keep.lists gmail.com> writes:

Alexander Panek wrote:
 PDF would be great, too.

 Tydr Schnubbis wrote:
 Daniel Keep wrote:
 Here's a draft of an article which, hopefully, will explain some of the
 details of how text in D works.  Any constructive criticism is welcomed,
 along with edits or corrections.

 Any chance of an .rtf, .doc, or even .txt? :)


I used the .odt since I wanted people to be able to make modifications
to it directly, if they wanted.

I really don't like .rtf or .doc (long, painful history with those two),
and .txt would probably destroy all formatting.  I usually write stuff
in reStructuredText, but just didn't on this occasion.

Finally, the OOo-produced .pdf is kinda big (by an order of magnitude).

So here is an .xhtml version, and I will continue to supply this with
any updates.  If someone needs it in something else, I'll do that as
necessary.  No point in continually converting it when I'm still
updating it :P

 If you change the license you can put it in the Wiki4D ?

I've dual-licensed it under CC At-Sa and FDL but WOW the FDL is bad.
Reading it is like trying to swim through tar.  Also, I'm not entirely
sure, but I think I may be violating the license by distributing it as
ODT... I'm... not entirely sure.

I've also got some moral objections to a few parts of the license, but I
suppose it's not enough to prevent me using it.  Problem is that GNU
state specifically that the CC At-Sa license is not compatible with the
FDL.  Bloody hippies :3

 I would avoid the term "Unicode character" like the plague...
 If you must have something similar, then use "code point" ?
 It's OK to have it in the casual text, like "ASCII character,
 BMP character, Unicode character" but better not in the lists.

I've changed references to "characters" to "code points", but it now
seems very cumbersome.  I read the Wikipedia article, but I'm still not
100% sure where the distinction lies.

So: what *precisely* is a "character", and when is it appropriate to use
the word?
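
One way to see why the word is slippery, as a sketch in current D syntax: the same text has a different length depending on whether you count code units, code points, or what a reader would call characters. `byGrapheme` is from std.uni and postdates this thread; the helper name `graphemeCount` is made up for illustration.

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : toUTF32;

// Hypothetical helper: number of user-perceived characters
// (grapheme clusters) in a string.
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT, displayed as one "é".
    string s = "e\u0301";

    assert(s.length == 3);            // UTF-8 code units
    assert(toUTF32(s).length == 2);   // code points
    assert(graphemeCount(s) == 1);    // what a reader sees as one character
}
```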

 It also has an example on why: printf("Hello, World!\n");
 doesn't work. But it does, since string *literals* are all
 NUL-terminated. However, when you then try to extend that
 to a string variable, and that variable contains a slice...

I've changed it to say that "statements like the above", and put in a
note that yeah, ok, the example actually *does* work, but you really
shouldn't count on that.

Apart from the "character" -> "code point" changes, I've tried to mark
all changes by highlighting them yellow.

	-- Daniel

Nov 18 2006

Walter Bright <newshound digitalmars.com> writes:

Daniel Keep wrote:
 I really don't like .rtf or .doc (long, painful history with those two),
 and .txt would probably destroy all formatting.  I usually write stuff
 in reStructuredText, but just didn't on this occasion.

I usually send articles around for review in .txt format, that way 
everyone can read them. After all the reviews are done, then I format it 
into html (using Ddoc) and put up the web page.

The problems with sending around text files in non-text format attached 
to postings are:

1) the discussions always seem to focus on how to read the files, rather 
than their content

2) when the posting gets archived, the content of the non-text format 
becomes inaccessible (it isn't searched by google, either)

That said, I think it's great you're working on a good article on 
strings in D. It'll be very helpful.

Nov 18 2006

Daniel Keep <daniel.keep.lists gmail.com> writes:

Walter Bright wrote:
 Daniel Keep wrote:
 I really don't like .rtf or .doc (long, painful history with those two),
 and .txt would probably destroy all formatting.  I usually write stuff
 in reStructuredText, but just didn't on this occasion.

 
 I usually send articles around for review in .txt format, that way
 everyone can read them. After all the reviews are done, then I format it
 into html (using Ddoc) and put up the web page.
 
 The problems with sending around text files in non-text format attached
 to postings are:
 
 1) the discussions always seem to focus on how to read the files, rather
 than their content
 
 2) when the posting gets archived, the content of the non-text format
 becomes inaccessible (it isn't searched by google, either)
 
 That said, I think it's great you're working on a good article on
 strings in D. It'll be very helpful.

Usually I write up stuff in reStructuredText which is basically plain
text with markup that can be read without running it through a
formatter.  In this case I didn't because... I'm not really sure why.  I
think it was just because OOo has a better spell-checker than Vim :P

I might try dumping it out to a text file and see what happens...

Also, thanks for the response.  Let me know if you think there's
anything I should include :)

	-- Daniel

Nov 18 2006

Walter Bright <newshound digitalmars.com> writes:

Daniel Keep wrote:
 Also, thanks for the response.  Let me know if you think there's
 anything I should include :)

To tell the truth, I haven't read it yet, because I am reluctant to 
download viewers and install them.

Nov 18 2006

Daniel Keep <daniel.keep.lists gmail.com> writes:

Walter Bright wrote:
 Daniel Keep wrote:
 Also, thanks for the response.  Let me know if you think there's
 anything I should include :)

 
 To tell the truth, I haven't read it yet, because I am reluctant to
 download viewers and install them.

Ah, well, the latest zip contains an XHTML version which should open in
just about any browser.  Don't tell me you don't even browse your own
website :3

	-- Daniel

Nov 18 2006

Serg Kovrov <kovrov no.spam> writes:

Hi Daniel,

You may want to give Google Docs a try: http://docs.google.com/
Your case seems to be exactly what it is for.


-- 
serg.

Nov 18 2006

Daniel Keep <daniel.keep.lists gmail.com> writes:

Serg Kovrov wrote:
 Hi Daniel,
 
 You may want to give Google Docs a try: http://docs.google.com/
 Your case seems to be exactly what it is for.

Blech.  No offense, but I hate web apps.  Dialup makes these things slow
as molasses to use.  I've made a website with Google Pages before, and
it was not a fun experience.

*click a button*  *wait* ... ... ... ... *page loads*

In an ideal world, I could edit in OOo or GVim and have the files
mirrored over FTP or somesuch.  I really ought to try that one of these
days...

	-- Daniel

Nov 18 2006

Serg Kovrov <kovrov no.spam> writes:

Daniel Keep wrote:
 Blech.  No offense, but I hate web apps.  Dialup makes these things slow
 as molasses to use.  I've made a website with Google Pages before, and
 it was not a fun experience.
 
 *click a button*  *wait* ... ... ... ... *page loads*
 
 In an ideal world, I could edit in OOo or GVim and have the files
 mirrored over FTP or somesuch.  I really ought to try that one of these
 days...

I haven't used this Google service before, but other people publish 
papers like yours this way. And if one does not have a wiki (I hate 
wikis, btw) or other means to publish versioned documents, Google Docs 
seems the best option.

Out of curiosity, I created a new document and pasted in the contents from 
OpenOffice. It took me about 10 seconds (OO was open already) to 
have it online - http://docs.google.com/View?docid=dtqh79k_1rbxfmb


-- 
serg.

Nov 18 2006

Daniel Keep <daniel.keep.lists gmail.com> writes:

Serg Kovrov wrote:
 Daniel Keep wrote:
 Blech.  No offense, but I hate web apps.  Dialup makes these things slow
 as molasses to use.  I've made a website with Google Pages before, and
 it was not a fun experience.

 *click a button*  *wait* ... ... ... ... *page loads*

 In an ideal world, I could edit in OOo or GVim and have the files
 mirrored over FTP or somesuch.  I really aught to try that one of these
 days...

 
 I'm haven't used this google service before, but other people publish
 papers like yours this way. And if one do not have a wiki (I hate
 wiki's, btw) or other means to publish versioned documents - google docs
 seems best option.
 
 Out of curiosity, I have created new document and pasted contents from
 open office. It takes me a about 10 seconds (OO was opened already) to
 have it online - http://docs.google.com/View?docid=dtqh79k_1rbxfmb
 

Not bad, except that there's no spacing between paragraphs.  It also
destroyed indenting on all the code examples :3

In any case, I dumped out the text to a plain text file, and re-marked
it up in reStructuredText.  Generates almost exactly the same HTML
output, but now people can't complain they can't view it :P  I'll post
it up as soon as I've worked out if I'm going to include this "Q&A" section.

	-- Daniel


Nov 18 2006

Pierre Rouleau <prouleau impathnetworks.com> writes:

Daniel Keep wrote:

 Here's a draft of an article which, hopefully, will explain some of the
 details of how text in D works.  Any constructive criticism is welcomed,
 along with edits or corrections.
 

As someone who has not been coding in D except for trying out some D 
every so often, I find:

- the discussion of Unicode and its support in D clear and useful
- the description of the use of printf and string confusing:

You wrote::

    Back before D had the std.stdio.writefln method, most examples used
    the old C function printf. This worked fine until you tried to output
    a string::

       printf(“Hello, World!\n”);

    The above statement was very likely to print out garbage that left
    many people scratching their heads. The reason is that C uses
    NUL-terminated strings, whereas D uses true arrays. In other words:

    - Strings in C are a pointer to the first character. A string ends at
      the first NUL character.
    - Strings in D are a pointer to the first character, followed by a
      length. There is no terminating character.

    And that's the problem: printf is looking for a terminator that
    doesn't necessarily exist.


That would lead me to believe that I could not use printf to print a 
string literal.  But then I just wrote and compiled the following D code::

   int
   main()
   {
      printf("Hello!\n");
      printf("Bye!\n");
      return 1;
   }

But it prints just fine.  So, something must be missing in your 
explanation or my understanding.  I'll have to read more about D to 
understand.

Just my 2 cents,

--
P.R.

Nov 18 2006

Pierre Rouleau <prouleau impathnetworks.com> writes:

Pierre Rouleau wrote:

 Daniel Keep wrote:
 
 Here's a draft of an article which, hopefully, will explain some of the
 details of how text in D works.  Any constructive criticism is welcomed,
 along with edits or corrections.

 
 As someone who has not been coding in D except for trying out some D 
 every so often, I find:
 
 - the discussion of Unicode and its support of D clear and useful
 - the description of the use of printf and string confusing:
 
 You wrote::
 
    Back before D had the std.stdio.writefln method, most examples used
    the old C function printf. This worked fine until you tried to output
    a string::
 
       printf(“Hello, World!\n”);
 
    The above statement was very likely to print out garbage that left
    many people scratching their heads. The reason is that C uses
    NUL-terminated strings, whereas D uses true arrays. In other words:
 
    - Strings in C are a pointer to the first character. A string ends at
      the first NUL character.
    - Strings in D are a pointer to the first character, followed by a
      length. There is no terminating character.
 
    And that's the problem: printf is looking for a terminator that
    doesn't necessarily exist.
 
 
 That would lead me to believe that I could not use printf to print a 
 string litteral.  But then I just wrote and compiled the following D code::
 
   int
   main()
   {
      printf("Hello!\n");
      printf("Bye!\n");
      return 1;
   }
 
 But it prints just fine.  So, something must be missing in your 
 explanation or my understanding.  I'll have to read more about D to 
 understand.
 
 Just my 2 cents,
 
 -- 
 P.R.
 
 

And BTW, the line::

   printf(“Hello, World!\n”);

does not compile because of the non-ASCII characters used for quoting.

So other questions come to mind:

- Can D source code contain Unicode characters freely?
- If so, how is it done?
- If not, how can we define a Unicode string literal?
- Does D have a Unicode string type like, say Python, or is it better at 
specifying them?
- How do we handle internationalization of presentation strings in D?
- gettext support...
- Do we have to use text codecs (as in Python for example)?


This information would fit quite nicely in an article describing text in D.

Nov 18 2006

Daniel Keep <daniel.keep.lists gmail.com> writes:

Pierre Rouleau wrote:
 Pierre Rouleau wrote:
 
 Daniel Keep wrote:

 Here's a draft of an article which, hopefully, will explain some of the
 details of how text in D works.  Any constructive criticism is welcomed,
 along with edits or corrections.

 As someone who has not been coding in D except for trying out some D
 every so often, I find:

 - the discussion of Unicode and its support of D clear and useful
 - the description of the use of printf and string confusing:

 You wrote::

    Back before D had the std.stdio.writefln method, most examples used
    the old C function printf. This worked fine until you tried to output
    a string::

       printf(“Hello, World!\n”);

    The above statement was very likely to print out garbage that left
    many people scratching their heads. The reason is that C uses
    NUL-terminated strings, whereas D uses true arrays. In other words:

    - Strings in C are a pointer to the first character. A string ends at
      the first NUL character.
    - Strings in D are a pointer to the first character, followed by a
      length. There is no terminating character.

    And that's the problem: printf is looking for a terminator that
    doesn't necessarily exist.


 That would lead me to believe that I could not use printf to print a
 string litteral.  But then I just wrote and compiled the following D
 code::

   int
   main()
   {
      printf("Hello!\n");
      printf("Bye!\n");
      return 1;
   }

 But it prints just fine.  So, something must be missing in your
 explanation or my understanding.  I'll have to read more about D to
 understand.

 Just my 2 cents,

 -- 
 P.R.


Read down a little bit further: it points out that you want to use
std.string.toStringz to ensure that the NUL terminator exists.

It also admits that the example actually DOES work, simply because dmd
sticks the NUL terminator on the end of all string literals.  But as
someone already pointed out, if what you're dealing with is NOT a string
literal: a slice of another string, or something read from disk, then it
won't be there and the code will choke.

I should probably reorganise the section to be clearer on this.  I used
that (wrong) example because an example that actually fails would be
somewhat longer, and probably make people think "Ok, so why can't I use
slices to C functions?  Are they not really strings?"
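
For illustration, here is a minimal sketch of the failing case and two ways around it. It assumes a current D compiler and Phobos (the `string` alias and the `core.stdc` modules are newer than this thread), and `printViaC` is a made-up helper name, not something from the article.

```d
import core.stdc.stdio : printf;
import core.stdc.string : strlen;
import std.string : toStringz;

/// Hypothetical helper: prints a D string through C's printf in two safe ways.
void printViaC(string s)
{
    // toStringz yields a NUL-terminated C string, copying if it has to.
    printf("%s\n", toStringz(s));

    // Or skip the copy and tell printf exactly how many bytes to print.
    printf("%.*s\n", cast(int) s.length, s.ptr);
}

unittest
{
    string text  = "Hello, World!";
    string hello = text[0 .. 5];   // slice: same memory, shorter length

    // The byte right after the slice is ',' rather than NUL, so
    // printf("%s", hello.ptr) would run on past the end of "Hello".
    assert(hello.ptr[5] == ',');

    // After toStringz, C sees exactly the five bytes of the slice.
    assert(strlen(toStringz(hello)) == 5);

    printViaC(hello);
}
```

The slice is still a perfectly good D string; it is only the C side that needs the terminator, because C has nowhere to store the length.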

 
 And BTW, the line::
 
   printf(“Hello, World!\n”);
 
 does not compile because of the non ASCII characters used for quoting.

Damnit... every time I go to write prose that option's off, and every
time I write code examples it's ON.  I swear OOo is out to get me >_<

 So other questions comes to mind:

Off the top of my head:

 - Can D source code contain Unicode characters freely?

- Yup, you betcha!

 - If so, how is it done?

- Use a text editor that supports saving files in UTF-8.  I'm not sure
off the top of my head if UTF-16 and UTF-32 are supported directly...

 - If not, how can we define a Unicode string literal?

- If you don't have access to a Unicode-enabled editor, you can use
escape sequences with \uXXXX (or \UXXXXXXXX for higher Unicode code points.)
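
A small sketch of the two escape forms, assuming a current D compiler and a source file saved as UTF-8 (the sample characters are arbitrary choices for illustration):

```d
unittest
{
    string direct  = "é€";            // typed directly into a UTF-8 source file
    string escaped = "\u00E9\u20AC";  // the same text written with \uXXXX escapes
    assert(direct == escaped);

    // Code points above U+FFFF need the eight-digit \UXXXXXXXX form.
    dstring clef = "\U0001D11E"d;     // U+1D11E MUSICAL SYMBOL G CLEF
    assert(clef.length == 1);
    assert(clef[0] == 0x1D11E);
}
```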

 - Does D have a Unicode string type like, say Python, or is it better at
 specifying them?

- That's *all* D has.  Remember, char, wchar and dchar correspond to
UTF-8, UTF-16 and UTF-32 which are the three main ways of storing
Unicode text.  Internally, Python uses UTF-16.
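
To make the three encodings concrete, a short sketch follows. It assumes a current D compiler, where `string`, `wstring` and `dstring` are the standard aliases for arrays of `char`, `wchar` and `dchar`; those aliases are newer than this thread.

```d
unittest
{
    string  s = "héllo";    // UTF-8:  array of char
    wstring w = "héllo"w;   // UTF-16: array of wchar
    dstring d = "héllo"d;   // UTF-32: array of dchar

    // .length counts code units, not characters.
    assert(s.length == 6);  // 'é' takes two UTF-8 code units
    assert(w.length == 5);
    assert(d.length == 5);

    // foreach with a dchar loop variable decodes code points on the fly.
    size_t codePoints;
    foreach (dchar c; s)
        ++codePoints;
    assert(codePoints == 5);
}
```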

 - How do we handle internationalization of presentation strings in D?
 - gettext support...

I don't know if gettext would work in D, simply because I've never seen
it tried.  D doesn't have any *direct* support for this, tho.

(Then again, I'm yet to see *any* programming language that does.)

 - Do we have to use text codecs (as in Python for example)?

D has no built-in support for converting between code pages, as far as I
know.  You need to download and use a conversion library like iconv to
convert between code pages.
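
One narrow case can be handled without a library: ISO-8859-1 (Latin-1) bytes map one-to-one onto the code points U+0000 to U+00FF. The sketch below relies on that fact and on a current D compiler; `latin1ToUtf8` is a hypothetical helper name, and other code pages would still need a real conversion table or something like iconv.

```d
/// Hypothetical helper: converts Latin-1 bytes to a UTF-8 D string.
string latin1ToUtf8(const(ubyte)[] bytes)
{
    string result;
    foreach (b; bytes)
    {
        // Each Latin-1 byte is numerically equal to its Unicode code point.
        // Appending a dchar to a string encodes it as UTF-8.
        result ~= cast(dchar) b;
    }
    return result;
}

unittest
{
    // "café" in Latin-1: 0xE9 is 'é'.
    ubyte[] raw = [0x63, 0x61, 0x66, 0xE9];
    assert(latin1ToUtf8(raw) == "café");
    assert(latin1ToUtf8(raw).length == 5); // 'é' became two UTF-8 code units
}
```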

 This information would fit quite nicely in an article describing text in D.

I may have to restructure it into two sections: a "What the... it's a
borken!" section and a "Q&A" section.

Thanks for the feedback.

	-- Daniel


Nov 18 2006

Pierre Rouleau <prouleau impathnetworks.com> writes:

Daniel Keep wrote:

 
 Pierre Rouleau wrote:
 
Pierre Rouleau wrote:


Daniel Keep wrote:


Here's a draft of an article which, hopefully, will explain some of the
details of how text in D works.  Any constructive criticism is welcomed,
along with edits or corrections.

As someone who has not been coding in D except for trying out some D
every so often, I find:

- the discussion of Unicode and its support of D clear and useful
- the description of the use of printf and string confusing:

You wrote::

   Back before D had the std.stdio.writefln method, most examples used
   the old C function printf. This worked fine until you tried to output
   a string::

      printf(“Hello, World!\n”);

   The above statement was very likely to print out garbage that left
   many people scratching their heads. The reason is that C uses
   NUL-terminated strings, whereas D uses true arrays. In other words:

   - Strings in C are a pointer to the first character. A string ends at
     the first NUL character.
   - Strings in D are a pointer to the first character, followed by a
     length. There is no terminating character.

   And that's the problem: printf is looking for a terminator that
   doesn't necessarily exist.


That would lead me to believe that I could not use printf to print a
string litteral.  But then I just wrote and compiled the following D
code::

  int
  main()
  {
     printf("Hello!\n");
     printf("Bye!\n");
     return 1;
  }

But it prints just fine.  So, something must be missing in your
explanation or my understanding.  I'll have to read more about D to
understand.

Just my 2 cents,

-- 
P.R.


 
 
 Read down a little bit further: it points out that you want to use
 std.string.toStringz to ensure that the NUL terminator exists.
 

I saw that.  My point was that the article should be a little clearer as 
to why you would want to use it.  As an introduction to text processing 
in D, and a treatment of the different string formats (NUL-terminated or 
length-based), a newbie would need to know the implications of the code 
he writes and the effect of transformations (such as slices or whatever).


 It also admits that the example actually DOES work, simply because dmd
 sticks the NUL terminator on the end of all string literals.  But as
 someone already pointed out, if what you're dealing with is NOT a string
 literal: a slice of another string, or something read from disk, then it
 won't be there and the code will choke.
 
 I should probably reorganise the section to be clearer on this.  I used
 that (wrong) example because an example that actually fails would be
 somewhat longer, and probably make people think "Ok, so why can't I use
 slices to C functions?  Are they not really strings?"

 
 
And BTW, the line::

  printf(“Hello, World!\n”);

does not compile because of the non ASCII characters used for quoting.

 
 
 Damnit... every time I go to write prose that option's off, and every
 time I write code examples it's ON.  I swear OOo is out to get me >_<

I also like reStructuredText myself...  but writing extra symbols is a 
little trickier...

 
So other questions comes to mind:

 Off the top of my head:
- Can D source code contain Unicode characters freely?

 - Yup, you betcha!
- If so, how is it done?

 - Use a text editor that supports saving files in UTF-8.  I'm not sure
 off the top of my head if UTF-16 and UTF-32 are supported directly...

Readers might be interested to know that they can use these in the 
source code file. They may also wonder whether non-ASCII 
characters are acceptable in things such as variable names.


- If not, how can we define a Unicode string literal?

 - If you don't have access to a Unicode-enabled editor, you can use
 escape sequences with \uXXXX (or \UXXXXXXXX for higher Unicode code points.)
- Does D have a Unicode string type like, say Python, or is it better at
specifying them?

 - That's *all* D has.  Remember, char, wchar and dchar correspond to
 UTF-8, UTF-16 and UTF-32 which are the three main ways of storing
 Unicode text.  Internally, Python uses UTF-16.
 
 
- How do we handle internationalization of presentation strings in D?
- gettext support...

 
 
 I don't know if gettext would work in D, simply because I've never seen
 it tried.  D doesn't have any *direct* support for this, tho.

I can't see why it would not.  Can we have a function named  '_()' in D?
Since gettext philosophy is to write all presentation strings in 
English, then the code can be written in ASCII-only files and since the 
strings are Unicode, the translated strings could contain any symbol at 
runtime.

One aspect is string formatting.  Does D support string formatting 
similar to Python's dictionary-based formatting, like:

a_dict = {'person_name': 'Daniel'}
a_string = 'Hello %(person_name)s ! How are you?' % a_dict

Python dictionaries are very useful for that purpose.  Translating 
presentation strings works better when the entire string context is 
available to the person doing the natural language translation.  As far 
as I am concerned, this is an important feature for a programming language 
used to write (client-side) applications.


 
 (Then again, I'm yet to see *any* programming language that does.)
 

Support for gettext does not have to be built into the language.  It is 
enough that the language does not preclude using gettext.

 
- Do we have to use text codecs (as in Python for example)?

 
 
 D has no built-in support for converting between code pages, as far as I
 know.  You need to download and use a conversion library like iconv to
 convert between code pages.
 


 
 Thanks for the feedback.
 

You're welcome.

--

Pierre

Nov 18 2006

Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:

Pierre Rouleau wrote:
 One aspect is the string formatting.  Does D support string formatting 
 similar to Python's dictionary-based formatting like:
 
 a_dict = {'person_name': 'Daniel'}
 a_string = 'Hello %(person_name)s ! How are you?' % a_dict
 

No, but it ought to be easy enough to make.  A quick hack at it:

[code listing not preserved in the archive]

Don't quote me on that working exactly right as is, since it's just off the top of my
head.  But usage would be fairly straightforward, while not quite as pretty as Python
since we don't yet have associative literals.
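
The code originally posted here is missing from this copy of the thread. As a rough stand-in, and not a reconstruction of the original, here is a minimal sketch of `%(name)s` substitution. It assumes a current D compiler (which does have associative array literals) and uses a made-up function name, `formatNamed`.

```d
import std.array : appender;

/// Hypothetical helper: replaces each %(name)s in fmt with values[name].
string formatNamed(string fmt, string[string] values)
{
    auto result = appender!string();
    size_t i = 0;
    while (i < fmt.length)
    {
        // Look for the start of a %(name)s placeholder.
        if (i + 1 < fmt.length && fmt[i] == '%' && fmt[i + 1] == '(')
        {
            size_t close = i + 2;
            while (close < fmt.length && fmt[close] != ')')
                ++close;

            // Only treat it as a placeholder if ")s" follows the name.
            if (close + 1 < fmt.length && fmt[close + 1] == 's')
            {
                string key = fmt[i + 2 .. close];
                if (auto p = key in values)
                    result.put(*p);                  // known key: substitute
                else
                    result.put(fmt[i .. close + 2]); // unknown key: keep as written
                i = close + 2;
                continue;
            }
        }
        // Ordinary text is copied through one code unit at a time.
        result.put(fmt[i]);
        ++i;
    }
    return result.data;
}

unittest
{
    string[string] values = ["person_name": "Daniel"];
    assert(formatNamed("Hello %(person_name)s ! How are you?", values)
           == "Hello Daniel ! How are you?");
}
```

Because the whole sentence stays in one format string, a translator sees the full context and can move the placeholder wherever the target language needs it.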





-- Chris Nicholson-Sauls

Nov 19 2006

Anders F Björklund <afb algonet.se> writes:

Pierre Rouleau wrote:

 I don't know if gettext would work in D, simply because I've never seen
 it tried.  D doesn't have any *direct* support for this, tho.

 
 I can't see why it would not.  Can we have a function named  '_()' in D?

Yes, we are using this in wxD - it also works for GNU gettext with D.

It's defined as an alias that leads to a function with a longer name:

public static string wx.wxObject.GetTranslation(string str);

extern(C) char * gettext (char * msgid);
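
The alias trick can be sketched without wxD or GNU gettext installed. In the example below, `translate` is a hypothetical stand-in for the real lookup function and its two-entry catalogue is invented for illustration; the `alias _ = ...;` syntax assumes a current D compiler.

```d
/// Hypothetical stand-in for a real catalogue lookup such as gettext.
string translate(string msgid)
{
    switch (msgid)
    {
        case "Hello":   return "Bonjour";
        case "Goodbye": return "Au revoir";
        default:        return msgid; // no translation: fall back to the original
    }
}

// A lone underscore is a legal D identifier, so it can name the alias.
alias _ = translate;

unittest
{
    assert(_("Hello") == "Bonjour");
    assert(_("Unknown text") == "Unknown text");
}
```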

--anders

Nov 19 2006

Daniel Keep <daniel.keep.lists gmail.com> writes:

Ok, here's the third revision.  Includes some clearer examples, a Q&A
section, and is now written in plain text, and then dumped out to HTML.
 If anyone complains about what file format it's in now, they can get
stuffed :P  (And *yes*, the HTML is generated directly from the .txt file.)

Again, all feedback and suggestions are welcome.

	-- Daniel


Nov 18 2006

Daniel Keep <daniel.keep.lists gmail.com> writes:

Daniel Keep wrote:
 Ok, here's the third revision.  Includes some clearer examples, a Q&A
 section, and is now written in plain text, and then dumped out to HTML.
  If anyone complains about what file format it's in now, they can get
 stuffed :P  (And *yes*, the HTML is generated directly from the .txt file.)
 
 Again, all feedback and suggestions is welcome.
 
 	-- Daniel

I finally managed to find a copy of the C99 standard, and I've filled in
what characters you can use... although it's still a bit tricky to
understand.  That said, I added an example which shows using function
names written entirely in hiragana, so it obviously works :P
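
This is not the article's own example, but a sketch of the same idea looks like the following, assuming a current D compiler and a source file saved as UTF-8:

```d
/// A function whose name is written entirely in hiragana.
string こんにちは()
{
    return "konnichiwa";
}

unittest
{
    // Called like any other function.
    assert(こんにちは() == "konnichiwa");
}
```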

Secondly, I've removed the references to std.utf.stride.  After going
over the docs again, and actually *testing* the code, it turns out I was
dead wrong on what stride does: it returns the length of the code point
sequence at the given location, not the number of code points from that
location.  Whoopsie.

I've replaced the code showing how to use std.utf.stride with a small
function that correctly computes the number of code points in a string.
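
The article's function is not reproduced in this thread. One way such a function could be written, as a sketch assuming current Phobos and a made-up name `countCodePoints`, is to let `std.utf.stride` do what it actually does: report how many code units the sequence at a given index occupies.

```d
import std.utf : stride;

/// Hypothetical helper: counts the code points in a UTF-8 string.
size_t countCodePoints(string s)
{
    size_t count = 0;
    // stride(s, i) is the length in code units of the sequence starting
    // at index i, so adding it jumps to the start of the next code point.
    for (size_t i = 0; i < s.length; i += stride(s, i))
        ++count;
    return count;
}

unittest
{
    assert("héllo".length == 6);           // code units
    assert(countCodePoints("héllo") == 5); // code points
    assert(countCodePoints("") == 0);
}
```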

	-- Daniel


Nov 18 2006

Hasan Aljudy <hasan.aljudy gmail.com> writes:

Daniel Keep wrote:
 Daniel Keep wrote:
 Ok, here's the third revision.  Includes some clearer examples, a Q&A
 section, and is now written in plain text, and then dumped out to HTML.
  If anyone complains about what file format it's in now, they can get
 stuffed :P  (And *yes*, the HTML is generated directly from the .txt file.)

 Again, all feedback and suggestions is welcome.

 	-- Daniel

 
 I finally managed to find a copy of the C99 standard, and I've filled in
 what characters you can use... although it's still a bit tricky to
 understand.  That said, I added an example which shows using function
 names written entirely in hiragana, so it obviously works :P

konnichiwa!!!!!!11one :D

 
 Secondly, I've removed the references to std.utf.stride.  After going
 over the docs again, and actually *testing* the code, it turns out I was
 dead wrong on what stride does: it returns the length of the code point
 sequence at the given location, not the number of code points from that
 location.  Whoopsie.
 
 I've replaced the code showing how to use std.utf.stride with a small
 function that correctly computes the number of code points in a string.
 
 	-- Daniel
 

Nice job on the article.
Why don't you place it on the dsource tutorials section? It's a wiki 
system, so you can update it more easily.

Nov 19 2006

Daniel Keep <daniel.keep.lists gmail.com> writes:

Hasan Aljudy wrote:
 
 
 Daniel Keep wrote:
 Daniel Keep wrote:
 Ok, here's the third revision.  Includes some clearer examples, a Q&A
 section, and is now written in plain text, and then dumped out to HTML.
  If anyone complains about what file format it's in now, they can get
 stuffed :P  (And *yes*, the HTML is generated directly from the .txt
 file.)

 Again, all feedback and suggestions is welcome.

     -- Daniel

 I finally managed to find a copy of the C99 standard, and I've filled in
 what characters you can use... although it's still a bit tricky to
 understand.  That said, I added an example which shows using function
 names written entirely in hiragana, so it obviously works :P

 
 konnichiwa!!!!!!11one :D

Actually, I'm pretty sure it's supposed to be konnichiha: people keep
spelling and saying it "konnichiwa" because westerners misheard what the
Japanese were saying :3

(Do correct me if I'm wrong, btw...)

 Secondly, I've removed the references to std.utf.stride.  After going
 over the docs again, and actually *testing* the code, it turns out I was
 dead wrong on what stride does: it returns the length of the code point
 sequence at the given location, not the number of code points from that
 location.  Whoopsie.

 I've replaced the code showing how to use std.utf.stride with a small
 function that correctly computes the number of code points in a string.

     -- Daniel

 
 Nice job on the article.
 Why don't you place it on the dsource tutorials section? It's a wiki
 system, so you can update it more easily.

Honestly, I'd love to see this on the official D website; from the
number of people coming to the forums saying "why doesn't this work?"
and "strings are teh borken!" it's obvious we need to have something
that says "this is how things work and why they work the way they do."

But if Walter doesn't want it, then I'm happy to stick it up on the
Wiki... yet another format I'll have to change it over to :P

	-- Daniel


Nov 19 2006

Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:

Daniel Keep wrote:
 
 Hasan Aljudy wrote:
konnichiwa!!!!!!11one :D

 
 
 Actually, I'm pretty sure it's supposed to be konnichiha: people keep
 spelling and saying it "konnichiwa" because westerners misheard what the
 Japanese were saying :3
 
 (Do correct me I'm wrong, btw...)

Unless my Japanese mentor was playing a prank on me (which is /entirely/ possible) it's
actually a quirk thing.  While it is written "kon'ityi-ha" it is indeed pronounced
"kon'nityi-wa", as the 'ha' kana is written for the particle 'wa' for some long-forgotten
reason.  (Kind of like the archaic 'wo' kana is still used for the 'o' prefix, as in
"(w)o-genki desu-ka".)

-- Chris Nicholson-Sauls

Nov 19 2006

Hasan Aljudy <hasan.aljudy gmail.com> writes:

Chris Nicholson-Sauls wrote:
 Daniel Keep wrote:
 Hasan Aljudy wrote:
 konnichiwa!!!!!!11one :D


 Actually, I'm pretty sure it's supposed to be konnichiha: people keep
 spelling and saying it "konnichiwa" because westerners misheard what the
 Japanese were saying :3

 (Do correct me I'm wrong, btw...)


It's written konnichiha in hiragana, but it's pronounced konnichiwa, 
because the "ha" is actually a particle, and the "ha" particle is 
pronounced "wa" even though it's written as "ha".
I think the phrase is basically an incomplete sentence understood to be 
"It's morning" or something like that ..

 
 Unless my Japanese mentor was playing a prank on me (which is /entirely/ 
 possible) its actually a quirk thing.  While it is written "kon'ityi-ha" 
 it is indeed pronouned "kon'nityi-wa", as the 'ha' kana is written for 
 the particle 'wa' for some long-forgotten reason.  (Kind of like the 
 archaic 'wo' kana is still used for the 'o' prefix, as in "(w)o-genki 
 desu-ka".)

kon'ity-ha?
Wow, what kind of romanization system is that? Now /that/ is a prank ..

I think what you said about the ha/wa is correct though. From what I've 
gathered, the particle used to be pronounced "ha" but its pronunciation 
has changed over the centuries, while the spelling for it didn't.


 
 -- Chris Nicholson-Sauls

Nov 19 2006

Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:

Hasan Aljudy wrote:
 Chris Nicholson-Sauls wrote:
 Unless my Japanese mentor was playing a prank on me (which is 
 /entirely/ possible) its actually a quirk thing.  While it is written 
 "kon'ityi-ha" it is indeed pronouned "kon'nityi-wa", as the 'ha' kana 
 is written for the particle 'wa' for some long-forgotten reason.  
 (Kind of like the archaic 'wo' kana is still used for the 'o' prefix, 
 as in "(w)o-genki desu-ka".)

 
 kon'ity-ha?
 Wow, what kind of romanization system is that? Now /that/ is a prank ..

It's the Kunreisiki 「訓令式」.  I prefer it, personally, because it stays a bit closer to
the way it would be written in hiragana/katakana.  (Like using "si" rather than "shi",
because that's the only way it is pronounced, or using "tya" rather than "cha" because it
would be written 「ちゃ」 in kana.)
Weblink: http://www.halcat.com/roomazi/doc/iso3602.html

That said, though... I actually did make a mistake.  *sigh*  It should've just been "ti"
rather than "tyi" at the end.  That's what I get for responding on the way to bed, though.

And I think you're right about it meaning basically "it's morning" or "it's a day", or some
such.  I never really asked, but looking at the kanji it's written with, it seems to be a
really awkward way of saying "good weather" or some such... ah hell.  :)

 I think what you said about the ha/wa is correct thu. From what I've 
 gathered, the particle used to be pronounced "ha" but its pronunciation 
 has changed over the centuries, while the spelling for it didn't.

That could well be.  It would make a little more sense than "it just is, and that's that."

-- Chris Nicholson-Sauls

Nov 19 2006

Bill Baxter <wbaxter gmail.com> writes:

Chris Nicholson-Sauls wrote:
 Daniel Keep wrote:
 
 Hasan Aljudy wrote:

 konnichiwa!!!!!!11one :D



 Actually, I'm pretty sure it's supposed to be konnichiha: people keep
 spelling and saying it "konnichiwa" because westerners misheard what the
 Japanese were saying :3

 (Do correct me I'm wrong, btw...)

 
 
 Unless my Japanese mentor was playing a prank on me (which is /entirely/ 
 possible) its actually a quirk thing.  While it is written "kon'ityi-ha" 
 it is indeed pronouned "kon'nityi-wa", as the 'ha' kana is written for 
 the particle 'wa' for some long-forgotten reason.  

yep.

(Kind of like the
 archaic 'wo' kana is still used for the 'o' prefix, as in "(w)o-genki 
 desu-ka".)

Now you're just making stuff up.  :-) 'wo' is used as a particle 
indicating the object of a transitive verb.  Like "hon wo yomu" (read a 
book)
    本を読む
Nothing to do with the polite 'o' prefix in o-genki desu ka:
    御元気ですか
(Though you're more likely to see it written with the hiragana 'o' 
instead: お元気ですか。)

--bb

Nov 19 2006

Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:

Bill Baxter wrote:
 Chris Nicholson-Sauls wrote:
 
 Daniel Keep wrote:

 Hasan Aljudy wrote:

 konnichiwa!!!!!!11one :D




 Actually, I'm pretty sure it's supposed to be konnichiha: people keep
 spelling and saying it "konnichiwa" because westerners misheard what the
 Japanese were saying :3

 (Do correct me I'm wrong, btw...)



 Unless my Japanese mentor was playing a prank on me (which is 
 /entirely/ possible) its actually a quirk thing.  While it is written 
 "kon'ityi-ha" it is indeed pronouned "kon'nityi-wa", as the 'ha' kana 
 is written for the particle 'wa' for some long-forgotten reason.  

 
 
 yep.
 
 (Kind of like the
 
 archaic 'wo' kana is still used for the 'o' prefix, as in "(w)o-genki 
 desu-ka".)

 
 
 Now you're just making stuff up.  :-) 'wo' is used as a particle 
 indicating the object of a transitive verb.  Like "hon wo yomu" (read a 
 book)
    本を読む
 Nothing to do with with the polite 'o' prefix in, o-genki desu ka:
    御元気ですか
 (Though you're more likely to see it written with the hiragana 'o' 
 instead: お元気ですか。)
 
 --bb

Could've sworn 'wo' was used to write 'o-' though... ah well.  Either that one /was/ a
prank, or it's just because I've hardly touched any Japanese in a couple of years or so.
The shame.  :)  Guess I could've played it safe and dug out one of my dictionaries to
check.  But where's the fun in that?

-- Chris Nicholson-Sauls

Nov 19 2006

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

Chris Nicholson-Sauls wrote:
 Daniel Keep wrote:
 Hasan Aljudy wrote:
 konnichiwa!!!!!!11one :D


 Actually, I'm pretty sure it's supposed to be konnichiha: people keep
 spelling and saying it "konnichiwa" because westerners misheard what the
 Japanese were saying :3

 (Do correct me I'm wrong, btw...)

 
 Unless my Japanese mentor was playing a prank on me (which is /entirely/ 
 possible) its actually a quirk thing.  While it is written "kon'ityi-ha" 
 it is indeed pronouned "kon'nityi-wa", as the 'ha' kana is written for 
 the particle 'wa' for some long-forgotten reason.  (Kind of like the 
 archaic 'wo' kana is still used for the 'o' prefix, as in "(w)o-genki 
 desu-ka".)
 
 -- Chris Nicholson-Sauls

"D" wa sugoi desu ne...

Whoa, do D community members have some bias towards learning Japanese? I 
myself am a (slow, but active) learner of Japanese (finished Pimsleur's 
Japanese Level 3 some time ago).



-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Nov 20 2006

Bill Baxter <dnewsgroup billbaxter.com> writes:

Bruno Medeiros wrote:
 Chris Nicholson-Sauls wrote:
 Daniel Keep wrote:
 Hasan Aljudy wrote:
 konnichiwa!!!!!!11one :D


 Actually, I'm pretty sure it's supposed to be konnichiha: people keep
 spelling and saying it "konnichiwa" because westerners misheard what the
 Japanese were saying :3

 (Do correct me I'm wrong, btw...)

 Unless my Japanese mentor was playing a prank on me (which is 
 /entirely/ possible) its actually a quirk thing.  While it is written 
 "kon'ityi-ha" it is indeed pronouned "kon'nityi-wa", as the 'ha' kana 
 is written for the particle 'wa' for some long-forgotten reason.  
 (Kind of like the archaic 'wo' kana is still used for the 'o' prefix, 
 as in "(w)o-genki desu-ka".)

 -- Chris Nicholson-Sauls

 
 "D" wa sugoi desu ne...
 
 Whoa, do D community members have some bias towards japanese learning? I 
 myself am a (slow, but active) learner of japanese (finished Pimsleur's 
 Japanese Level 3 some time ago).

Yeh, maybe we should have the D Conference here in Tokyo, after all.  ;-)


--bb

Nov 20 2006

Don Clugston <dac nospam.com.au> writes:

Daniel Keep wrote:
 Again, all feedback and suggestions is welcome.

Fabulous. It's another *genuine* FAQ, and it'd be great to see this on 
the official website.

Nov 20 2006

BCS <BCS pathilink.com> writes:

Daniel Keep wrote:
 Here's a draft of an article which, hopefully, will explain some of the
 details of how text in D works.  Any constructive criticism is welcomed,
 along with edits or corrections.
 
 Also, any suggestions on where to put this?  Ideally it could go on the
 D website, but I think anywhere would be fine so long as we can point
 people to it.
 
 	-- Daniel
 


Did this paper ever get hosted somewhere? I'm looking for a URL to cite 
it by.

Dec 22 2006
