
digitalmars.D - Random string samples & unicode - Reprise

reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 This goes into "bearophile's odd posts coming now and then".

I assume you have missed most of the things I was trying to say, maybe you have not even read the original post. So I try to explain better a subset of the things I have written.

This is a quite common piece of Python code:

from random import sample
d = "0123456789"
print "".join(sample(d, 2))

I need to perform the same thing in D. For me it's not easy to do that in D2 with Phobos2.

This doesn't work:

import std.stdio, std.random, std.array, std.range;
void main() {
    string d = "0123456789";
    string res = array(take(randomCover(d, rndGen), 2));
    writeln(res);
}

It returns:
test.d(4): Error: cannot implicitly convert expression (array(take(randomCover(d,rndGen()),2u))) of type dchar[] to string

If I change it like this:

import std.stdio, std.random, std.array, std.range;
void main() {
    string d = "0123456789";
    dchar[] res = array(take(randomCover(d, rndGen), 2));
    writeln(res);
}

It doesn't work, and gives a cloud of errors:
...\dmd\src\phobos\std\random.d(890): Error: cast(dchar)(this._input[this._current]) is not an lvalue
...\dmd\src\phobos\std\random.d(907): Error: template std.random.uniform(string boundaries = "[)",T1,T2,UniformRandomNumberGenerator) if (is(CommonType!(T1,UniformRandomNumberGenerator) == void) && !is(CommonType!(T1,T2) == void)) does not match any function template declaration
...\dmd\src\phobos\std\random.d(907): Error: template std.random.uniform(string boundaries = "[)",T1,T2,UniformRandomNumberGenerator) if (is(CommonType!(T1,UniformRandomNumberGenerator) == void) && !is(CommonType!(T1,T2) == void)) cannot deduce template function from argument types !()(int,uint,MersenneTwisterEngine!(uint,32,624,397,31,-1727483681u,11,7,-1658038656u,15,-272236544u,18))

If I replace the d string with a dchar[], it works:

import std.stdio, std.random, std.array, std.range;
void main() {
    dchar[] d = "0123456789"d.dup;
    dchar[] res = array(take(randomCover(d, rndGen), 2));
    writeln(res);
}

But now all strings in this little program are dchar arrays.

What I am trying to say is that with the recent changes to the management of strings in std.algorithm, when you use strings and char arrays and run algorithms over them, the dchar becomes viral, and you end up with most of the code made of dchar arrays or dstrings (unless you cast things back to char[]/string, and I don't know if this is possible in SafeD).

Do you understand now? Am I mistaken?

Bye,
bearophile
Sep 12 2010
next sibling parent reply Jonathan M Davis <jmdavisprog gmail.com> writes:
On Sunday 12 September 2010 17:09:04 bearophile wrote:
 Andrei Alexandrescu:
 This goes into "bearophile's odd posts coming now and then".

I assume you have missed most of the things I was trying to say, maybe you have not even read the original post. So I try to explain better a subset of the things I have written.

This is a quite common piece of Python code:

from random import sample
d = "0123456789"
print "".join(sample(d, 2))

You do seem to try to do a lot of things that most other folks never even think of doing, let alone have a need to. This is one of them. That's probably why Andrei reacted the way that he did.
 I need to perform the same thing in D.
 For me it's not easy to do that in D2 with Phobos2.
 
 This doesn't work:
 
 import std.stdio, std.random, std.array, std.range;
 void main() {
     string d = "0123456789";
     string res = array(take(randomCover(d, rndGen), 2));
     writeln(res);
 }
 
 It returns:
 test.d(4): Error: cannot implicitly convert expression
 (array(take(randomCover(d,rndGen()),2u))) of type dchar[] to string

I've found that if you want a string out of array(), what you need to do is to!string(array(...)). I don't know about this particular case, and it's a bit annoying - particularly when you started with a string in the first place - so perhaps take(), until(), and the others like them that have this problem should be altered so that array() would produce a string if you passed them a string, but for the moment to!string seems to be the solution.

I would point out, however, that if you're trying to grab random characters from a string, that's likely to work best with a dstring because it supports random access, so there's a decent chance that dstring is really what you want anyway, and trying to use string is just going to mean a lot of conversions no matter how well put together the Phobos functions are, simply because the underlying algorithm works best with random access and string doesn't provide it.

Just one of the irritations of UTF-8 vs UTF-16 vs UTF-32. Unicode is wonderful and Unicode sucks. At least D handles it explicitly as part of the language, which is a big improvement over languages like C, C++, or Java.

- Jonathan M Davis
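The random-access point can be illustrated outside D. What follows is only a minimal Python 3 sketch, not code from Phobos: the sample text and both helper names are made up for illustration. It compares a UTF-8 and a UTF-32 encoding of the same text. In UTF-32 the n-th code point sits at byte offset 4*n, while in UTF-8 it has to be located by scanning for lead bytes.

```python
text = "aä€b"  # hypothetical sample: 1-, 2-, 3- and 1-byte characters in UTF-8

utf8 = text.encode("utf-8")
utf32 = text.encode("utf-32-le")  # fixed width, no byte-order mark


def nth_code_point_utf32(data: bytes, n: int) -> str:
    """Constant time: every code point occupies exactly 4 bytes."""
    return data[4 * n:4 * n + 4].decode("utf-32-le")


def nth_code_point_utf8(data: bytes, n: int) -> str:
    """Linear scan: collect the offsets of all non-continuation bytes first."""
    starts = [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]
    end = starts[n + 1] if n + 1 < len(starts) else len(data)
    return data[starts[n]:end].decode("utf-8")


print(len(utf8), len(utf32))
print(nth_code_point_utf32(utf32, 2))
print(nth_code_point_utf8(utf8, 2))
```

Both helpers return the same character; the difference is only in how much of the buffer has to be inspected to find it, which is the trade-off between string and dstring described above.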
Sep 12 2010
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 09/12/2010 07:28 PM, Jonathan M Davis wrote:
 On Sunday 12 September 2010 17:09:04 bearophile wrote:
 Andrei Alexandrescu:
 This goes into "bearophile's odd posts coming now and then".

I assume you have missed most of the things I was trying to say, maybe you have not even read the original post. So I try to explain better a subset of the things I have written.

This is a quite common piece of Python code:

from random import sample
d = "0123456789"
print "".join(sample(d, 2))

You do seem to try to do a lot of things that most other folks never even think of doing, let alone have a need to. This is one of them. That's probably why Andrei reacted the way that he did.

No, it's not that at all. It's just this:
 I'll add it to Bugzilla later. But even if you remove that bug,
 forcing me to use dstrings in the whole program is strange. Or maybe
 it's a good thing, and the natural state for D programs is to just
 use dstrings everywhere. Andrei may offer his opinion on the
 situation.

I think it's not difficult to infer I wouldn't advocate using 32 bits characters everywhere. Andrei
Sep 12 2010
parent bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 I think it's not difficult to infer I wouldn't advocate using 32 bits 
 characters everywhere.

Yet, using std.algorithm on strings you may end up doing that. How exactly do you suggest I translate something like the original Python code to D?

Bye,
bearophile
Sep 12 2010
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Jonathan M Davis:
 You do seem to try to do a lot of things that most other folks never even think
 of doing, let alone have a need to. This is one of them.

Choosing a few random chars out of a sequence of possible chars is very normal in Python, it's even a common thing.
 I've found that if you want a string out of array(), what you need to do is 
 to!string(array(...))).

I see.
 if you're trying to grab random characters from 
 a string, that's likely to work best with a dstring because it supports random 
 access, so there's a decent chance that dstring is really what you want anyway,

This may be right. But as I have tried to explain two times, you may end up having string-processing code made mostly of dstrings. Bye, bearophile
Sep 12 2010
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 09/12/2010 08:13 PM, bearophile wrote:
 Jonathan M Davis:
 You do seem to try to do a lot of things that most other folks never even think
 of doing, let alone have a need to. This is one of them.

Choosing few random chars out of a sequence of possible chars is very normal in Python, it's even a common thing.
 I've found that if you want a string out of array(), what you need to do is
 to!string(array(...))).

I see.
 if you're trying to grab random characters from
 a string, that's likely to work best with a dstring because it supports random
 access, so there's a decent chance that dstring is really what you want anyway,

This may be right. But as I have tried to explain two times, you may end up having string-processing code made mostly of dstrings.

No, you end up having string-processing code dealing with ranges of dchar. Which is in fact exactly as it should be. If you want to keep the comparison with Python complete, Python's support for Unicode also needs to be part of the discussion.

Andrei
Sep 12 2010
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 No, you end up having string-processing code dealing with ranges of 
 dchar.

Well, in several situations it's better to produce a real string/dstring. Even in Haskell, that is designed to manage lazy computation well, you sometimes create eager lists/arrays to simplify the types or the code or to make the code more deterministic.
 If you want to keep the 
 comparison with Python complete, Python's support for Unicode also needs 
 to be part of the discussion.

Right. My code was written in Python 2.x. In Python 3.x the situation is different: all strings are Unicode by default (they are all UTF-16 or UTF-32 according to the way you have compiled CPython), and there is a built-in bytearray, that is an array of bytes that in some situations is seen as an ASCII string. So in Python it's like using dstrings everywhere (in Python there's no char type, it's a string of length 1) or using lazy generators of them.

Bye,
bearophile
Sep 12 2010
prev sibling parent Jonathan M Davis <jmdavisprog gmail.com> writes:
On Sunday 12 September 2010 19:22:02 bearophile wrote:
 Andrei Alexandrescu:
 No, you end up having string-processing code dealing with ranges of
 dchar.

Well, in several situations it's better to produce a real string/dstring. Even in Haskell, that is designed to manage lazy computation well, you sometimes create eager lists/arrays to simplify the types or the code or to make the code more deterministic.

Personally, I've had to use strict functions rather than lazy ones in Haskell primarily to save memory by forcing the program to actually do the computations rather than putting it off and piling up the whole list of operations to possibly do later in memory. When working on my thesis, I had a program which made me run out of memory - all 4 GB of memory and 6 GB of swap - because it wasn't processing _any_ of the files that I gave it until it had gotten the last one. I had to make it process each file and save the result before processing the next file rather than processing them all and then saving the result.
 If you want to keep the
 comparison with Python complete, Python's support for Unicode also needs
 to be part of the discussion.

Right. My code was written in Python 2.x. In Python 3.x the situation is different: all strings are Unicode by default (they are all UTF-16 or UTF-32 according to the way you have compiled CPython), and there is a built-in bytearray, that is an array of bytes that in some situations is seen as an ASCII string. So in Python it's like using dstrings everywhere (in Python there's no char type, it's a string of length 1) or using lazy generators of them.

Well, then in comparing Python 3 with D, it would seem like you wouldn't really lose anything by using dstrings everywhere. Sure, it's nice to be able to save space by using string, but if it's a comparison between Python and D and you end up using UTF-32 in both, then it doesn't seem to me that it's all that big a deal when porting code. Now, in comparing Python 2 and D, that may be a different issue, but it sounds like the Python 2 strings aren't unicode, which could be problematic.

The issues with UTF-8 vs UTF-32 and random access are just a natural side-effect of having all strings be unicode. And honestly, I _really_ don't want having non-unicode strings to be at all normal in D. The fact that D forces unicode is a _good_ thing.

- Jonathan M Davis
Sep 12 2010
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Jonathan M Davis:
 Well, I don't think that I've ever seen a program that did that sort of thing.

It's common Python code (and maybe in future it will be common D2 code). In another answer I have given few examples to Andrei.
 If your string processing doesn't require random access, then you 
 avoid the problem, but as long as it needs random access, you're pretty much 
 stuck.

I understand, this is probably the answer I was looking for, thank you :-) Bye, bearophile
Sep 12 2010
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from bearophile (bearophileHUGS lycos.com)'s article
 Jonathan M Davis:
 Well, I don't think that I've ever seen a program that did that sort of thing.


 If your string processing doesn't require random access, then you
 avoid the problem, but as long as it needs random access, you're pretty much
 stuck.

Bye, bearophile

I think what we need here is an AsciiString type. Such a type would be a thin wrapper over char[], or maybe immutable(char)[] for added safety. On construction it would enforce that the underlying string does not contain any multiple byte characters. It would only allow appending of chars, not wchars or dchars. If you appended a regular string to it, it would throw if the appended string contained any characters that couldn't be represented in a single byte. It would be a random access range of chars with lvalue elements, and would provide a way of documenting the assumption that you're only working with ASCII, and a mechanism for verifying this assumption at runtime.
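To make the proposal concrete, here is a rough Python 3 sketch of the same idea, offered only as an illustration. The class name and its methods are hypothetical and no such type exists in Phobos. Unlike the proposal, this version is immutable, so it shows the checked construction, checked appending and random access, but not lvalue elements.

```python
class AsciiString:
    """Hypothetical thin wrapper: ASCII-only text with constant-time indexing."""

    def __init__(self, s: str = ""):
        # str.encode("ascii") raises UnicodeEncodeError on any non-ASCII character,
        # which gives the runtime verification described above.
        self._data = s.encode("ascii")

    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, i: int) -> str:
        # One byte is one character, so indexing needs no decoding step.
        return chr(self._data[i])

    def __add__(self, other: str) -> "AsciiString":
        # Appending goes through the constructor, so it is checked as well.
        return AsciiString(str(self) + other)

    def __str__(self) -> str:
        return self._data.decode("ascii")


digits = AsciiString("0123456789")
print(digits[3])

try:
    digits + "ä"
except UnicodeEncodeError:
    print("rejected non-ASCII append")
```

The design choice worth noting is that the check happens once, at the boundary, so code working with the wrapper can index freely without re-validating.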
Sep 12 2010
parent bearophile <bearophileHUGS lycos.com> writes:
Jonathan M Davis:

 It's not necessarily a bad idea,

I don't know if it's a good idea.
 but I'm not sure that we want to encourage code
 that assumes ASCII. It's far too easy for English-speaking programmers to end up
 making that assumption in their code and then they run into problems later when
 they unexpectedly end up with unicode characters in their input, or they have to
 change their code to work with unicode.

On the other hand there are situations when you know you are dealing just with digits, or a few predetermined symbols like ()+-*/", or when you process very large biological strings that are composed of a restricted and limited number of different ASCII chars.

Bye,
bearophile
Sep 12 2010
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Brad Roberts:
 Existence != common.
 27 hits among millions != common.
 I think your viewpoint might be a little skewed.

Please Brad. I didn't mean that you are able to find thousands of strings like "".join(sample(d, 2)) in Python code around the world. What I meant to say is that in Python2 that's a very natural idiom. I have no idea how to demonstrate this last statement of mine.

Bye,
bearophile
Sep 12 2010
parent Walter Bright <newshound2 digitalmars.com> writes:
Andrej Mitrovic wrote:
 The "".join idiom itself is widespread (amongst those who know about
 it, at least). It's mentioned in several books and Python tutorials.
 As for taking random string samples, I've never used it so I can't
 judge whether it's common or not.

Yes, taking random substring samples seems very obscure to me. Sure, taking a random index into a string may wind up in the middle of a UTF8 sequence. But, in practice, indices into strings are not random. They are the result of some other operation on a string, and so they point to the start of a UTF8 sequence.
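A small Python 3 sketch of this observation, under the assumption that indices come from a search over the encoded bytes; the sample text and the helper name are invented for the example. An index returned by a search lands on the first byte of a UTF-8 sequence, whereas an arbitrary offset may land on a continuation byte (bit pattern 10xxxxxx).

```python
def is_sequence_start(data: bytes, i: int) -> bool:
    """True if data[i] is not a UTF-8 continuation byte (10xxxxxx)."""
    return (data[i] & 0xC0) != 0x80


s = "naïve café".encode("utf-8")

# An index produced by another string operation: it points at a sequence start.
i = s.find("café".encode("utf-8"))
print(i, is_sequence_start(s, i))

# An arbitrary index: one past the start of the two-byte "ï" is a continuation byte.
j = s.find("ï".encode("utf-8")) + 1
print(j, is_sequence_start(s, j))

# Slicing at the searched index decodes cleanly.
print(s[i:].decode("utf-8"))
```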
Sep 17 2010
prev sibling next sibling parent Jonathan M Davis <jmdavisprog gmail.com> writes:
On Sunday 12 September 2010 19:15:10 dsimcha wrote:
 == Quote from bearophile (bearophileHUGS lycos.com)'s article
 
 Jonathan M Davis:
 Well, I don't think that I've ever seen a program that did that sort of
 thing.

It's common Python code (and maybe in future it will be common D2 code). In another answer I have given few examples to Andrei.
 If your string processing doesn't require random access, then you
 avoid the problem, but as long as it needs random access, you're pretty
 much stuck.

I understand, this is probably the answer I was looking for, thank you :-) Bye, bearophile

I think what we need here is an AsciiString type. Such a type would be a thin wrapper over char[], or maybe immutable(char)[] for added safety. On construction it would enforce that the underlying string does not contain any multiple byte characters. It would only allow appending of chars, not wchars or dchars. If you appended a regular string to it, it would throw if the appended string contained any characters that couldn't be represented in a single byte. It would be a random access range of chars with lvalue elements, and would provide a way of documenting the assumption that you're only working with ASCII, and a mechanism for verifying this assumption at runtime.

It's not necessarily a bad idea, but I'm not sure that we want to encourage code that assumes ASCII. It's far too easy for English-speaking programmers to end up making that assumption in their code and then they run into problems later when they unexpectedly end up with unicode characters in their input, or they have to change their code to work with unicode. I'm inclined to force the issue and keep the status quo that _all_ strings in D are unicode of some variety. There's far too much code out there which is not unicode compliant when it should be.

- Jonathan M Davis
Sep 12 2010
prev sibling parent Daniel Gibson <metalcaedes gmail.com> writes:
On Mon, Sep 13, 2010 at 4:50 AM, bearophile <bearophileHUGS lycos.com> wrote:
 Jonathan M Davis:

 It's not necessarily a bad idea,

I don't know if it's a good idea.
 but I'm not sure that we want to encourage code
 that assumes ASCII. It's far too easy for English-speaking programmers to end up
 making that assumption in their code and then they run into problems later when
 they unexpectedly end up with unicode characters in their input, or they have to
 change their code to work with unicode.

On the other hand there are situations when you know you are dealing just with digits, or few predetermined symbols like ()+-*/", or when you process very large biological strings that are composed by a restricted and limited number of different ASCII chars.
 Bye,
 bearophile

Can't you just use byte[] for that? If you're 100% sure your string only contains ASCII characters, you can just cast it to byte[], feed that into algorithms and cast it back to char[] afterwards, I guess. Cheers, - Daniel
Sep 13 2010
prev sibling parent Brad Roberts <braddr puremagic.com> writes:
On 9/12/2010 7:09 PM, bearophile wrote:
 Jonathan M Davis:
 Well, I don't think that I've ever seen a program that did that sort of
 thing.

It's common Python code (and maybe in future it will be common D2 code). In another answer I have given few examples to Andrei.

Existence != common. 27 hits among millions != common. I think your viewpoint might be a little skewed.
Sep 12 2010
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 09/12/2010 07:09 PM, bearophile wrote:
 Andrei Alexandrescu:
 This goes into "bearophile's odd posts coming now and then".

I assume you have missed most of the things I was trying to say, maybe you have not even read the original post. So I try to explain better a subset of the things I have written.

This is a quite common piece of Python code:

from random import sample
d = "0123456789"
print "".join(sample(d, 2))

Well it's not that common code. How often would one need to generate a string that contains two random but distinct digits?
 I need to perform the same thing in D.
 For me it's not easy to do that in D2 with Phobos2.

 This doesn't work:

 import std.stdio, std.random, std.array, std.range;
 void main() {
      string d = "0123456789";
      string res = array(take(randomCover(d, rndGen), 2));
      writeln(res);
 }

 It returns:
 test.d(4): Error: cannot implicitly convert expression
(array(take(randomCover(d,rndGen()),2u))) of type dchar[] to string

The code compiles and runs as written on my system. I think it's David Simcha who changed the return type to ForEachType!Range[]. I'm not sure I agree with that, as it takes an oddity of foreach that I hoped would go away some time and propagates it.

About the original problem: strings are bidirectional ranges of dchar, which is the way they ought to be. Algorithms used on top of strings will inherently traffic in dchar. If you want to get a string back, this should work:

string res = to!string(take(randomCover(d, rndGen), 2));

That doesn't work for a different reason, and is a bug worth filing. In fact - no need, I just submitted a fix (http://www.dsource.org/projects/phobos/changeset/1988). Thanks for bringing this up!

Andrei
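The shape of that pipeline (UTF-8 storage, an algorithm that sees whole code points, conversion back to UTF-8 at the end) can be sketched in Python 3. This is an analogy, not the Phobos implementation: `code_points` is a made-up helper, the sample text is invented, and `random.sample` stands in for `take(randomCover(...), 2)`.

```python
import codecs
from random import sample


def code_points(data: bytes):
    """Lazily decode UTF-8 bytes front to back, one code point at a time."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for i in range(len(data)):
        ch = decoder.decode(data[i:i + 1])
        if ch:  # stays empty until a full multi-byte sequence has been consumed
            yield ch


stored = "0123456789äö".encode("utf-8")        # UTF-8 storage, like D's string
picked = sample(list(code_points(stored)), 2)   # the algorithm works on code points
result = "".join(picked).encode("utf-8")        # convert back to UTF-8 at the end

print(result.decode("utf-8"))
```

Only the intermediate step deals in decoded characters; what is stored before and after is still UTF-8, which is the point being made about ranges of dchar.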
Sep 12 2010
next sibling parent Jonathan M Davis <jmdavisprog gmail.com> writes:
On Sunday 12 September 2010 18:25:27 Andrei Alexandrescu wrote:
 string res = to!string(take(randomCover(d, rndGen), 2));
 
 That doesn't work for a different reason, and is a bug worth filing. In
 fact - no need, I just submitted a fix
 (http://www.dsource.org/projects/phobos/changeset/1988). Thanks for
 bringing this up!

Skipping the array() call and going straight to to!string() would certainly clean this sort of code up. - Jonathan M Davis
Sep 12 2010
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 from random import sample
 d = "0123456789"
 print "".join(sample(d, 2))

Well it's not that common code. How often would one need to generate a string that contains two random but distinct digits?

It's not easy to give a good answer to this question. In Python it's normal code, almost common. Google Code Search gives 27 answers:
http://www.google.com/codesearch?hl=en&lr=&q=%22.join%28sample%28%22+lang%3Apython&sbtn=Search

Think about a "Bulls and cows" game (it's a task on the Rosettacode site). It's similar to MasterMind: at the beginning you need to generate the secret key, four random distinct digits, that are later used in the program. The user has to guess them using the number of right items in the right place, or right items in the wrong place. To generate the key in Python you may use "".join(sample(d, 4)).
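For readers who have not seen the game, a minimal Python 3 sketch of the key generation and the scoring might look like this. The function names are invented here, and the scoring assumes the guess also consists of distinct digits.

```python
from random import sample

DIGITS = "0123456789"


def make_key(n: int = 4) -> str:
    """Pick n distinct digits in random order, as "".join(sample(d, n)) does."""
    return "".join(sample(DIGITS, n))


def score(key: str, guess: str):
    """Return (bulls, cows): right digit in the right place, right digit elsewhere."""
    bulls = sum(k == g for k, g in zip(key, guess))
    cows = sum(g in key for g in guess) - bulls
    return bulls, cows


key = make_key()
print(key)
print(score("1234", "1243"))
```

Because sample() draws without replacement, the key never repeats a digit, which is what the rules of the game require.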
 The code compiles and runs as written on my system.

Sorry. I have used the normal DMD 2.048, I don't use the svn head :-)
 I think it's David 
 Simcha who changed the return type to ForEachType!Range[]. I'm not sure 
 I agree with that, as it takes an oddity of foreach that I hoped would 
 go away some time and propagates it.

I see. If there is something you don't like about this situation, then I think it's a good moment to discuss it :-)
 About the original problem: strings are bidirectional ranges of dchar, 
 which is the way they ought to be. Algorithms used on top of strings 
 will inherently traffic in dchar. If you want to get a string back, this 
 should work:
 
 string res = to!string(take(randomCover(d, rndGen), 2));

OK, I accept this (but what you have just said has some consequences). Thank you for your answer.

With one of my suggestions:
http://d.puremagic.com/issues/show_bug.cgi?id=4851
that line becomes:

string res = to!string(take(randomCover(d), 2));
 That doesn't work for a different reason, and is a bug worth filing. In 
 fact - no need, I just submitted a fix 
 (http://www.dsource.org/projects/phobos/changeset/1988). Thanks for 
 bringing this up!

You are welcome and thank you for the answers and the fix. Bye, bearophile
Sep 12 2010
next sibling parent Kagamin <spam here.lot> writes:
bearophile Wrote:

 string that contains two random but distinct digits?

It's not easy to give a good answer to this question. In Python it's normal code, almost common. Google Code Search gives 27 answers: http://www.google.com/codesearch?hl=en&lr=&q=%22.join%28sample%28%22+lang%3Apython&sbtn=Search

Well, captcha is a good example, but simple to!string(rand()) is ok as a password generator.
 Think about a "Bulls and cows" game (it's a task of Rosettacode site), it's
 similar to MasterMind, at the beginning you need to generate the secret key,
 four random distinct digits, that later are used in the program, the user has
 to guess them using the number of right items in the right place, or right
 items in the wrong place. To generate the key in Python you may use
 "".join(sample(d, 4)).
Sep 12 2010
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
bearophile wrote:
 Think about a "Bulls and cows" game (it's a task of Rosettacode site), it's
 similar to MasterMind, at the beginning you need to generate the secret key,
 four random distinct digits, that later are used in the program, the user has
 to guess them using the number of right items in the right place, or right
 items in the wrong place. To generate the key in Python you may use
 "".join(sample(d, 4)).

Generate a 4 digit random integer and convert it to a string. It's probably a lot more efficient than the Python version.
Sep 17 2010
parent reply Jérôme M. Berger <jeberger free.fr> writes:
Walter Bright wrote:
 bearophile wrote:
 Think about a "Bulls and cows" game (it's a task of Rosettacode site), it's
 similar to MasterMind, at the beginning you need to generate the secret key,
 four random distinct digits, that later are used in the program, the user has
 to guess them using the number of right items in the right place, or right
 items in the wrong place. To generate the key in Python you may use
 "".join(sample(d, 4)).

Generate a 4 digit random integer and convert it to a string. It's probably a lot more efficient than the Python version.

Except that the Python version ensures that you don't have the same digit twice, which just generating a 4 digits random integer won't...

Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
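How often the two approaches differ can be counted directly. The following Python 3 sketch assumes the "4 digit random integer" is drawn uniformly from 0000 to 9999 and zero-padded, which is one possible reading of the suggestion.

```python
from itertools import permutations

# Every zero-padded four-digit string a random integer could produce.
all_keys = [f"{n:04d}" for n in range(10_000)]

# Those that would also be valid keys, i.e. contain no repeated digit.
distinct = [k for k in all_keys if len(set(k)) == 4]

print(len(distinct))                               # 5040 = 10 * 9 * 8 * 7
print(len(list(permutations("0123456789", 4))))    # 5040, what sample() chooses from
print(len(distinct) / len(all_keys))               # 0.504
```

So under that assumption roughly half of the integers would have to be rejected and redrawn before they could serve as a key.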
Sep 18 2010
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Jérôme M. Berger wrote:
 	Except that the Python version ensures that you don't have the same
 digit twice, which just generating a 4 digits random integer won't...

I didn't know sample() did that.
Sep 18 2010
prev sibling parent Kagamin <spam here.lot> writes:
Jérôme M. Berger Wrote:

 Generate a 4 digit random integer and convert it to a string. It's
 probably a lot more efficient than the Python version.

Except that the Python version ensures that you don't have the same digit twice, which just generating a 4 digits random integer won't...

So this trick is not good for captcha and password generation.
Sep 18 2010
prev sibling parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 On 09/12/2010 07:09 PM, bearophile wrote:
 Andrei Alexandrescu:
 This goes into "bearophile's odd posts coming now and then".

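I assume you have missed most of the things I was trying to say, maybe you have not even read the original post. So I try to explain better a subset of the things I have written.
 This is a quite common piece of Python code:

 from random import sample
 d = "0123456789"
 print "".join(sample(d, 2))

Well it's not that common code. How often would one need to generate a string that contains two random but distinct digits?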
 I need to perform the same thing in D.
 For me it's not easy to do that in D2 with Phobos2.

 This doesn't work:

 import std.stdio, std.random, std.array, std.range;
 void main() {
      string d = "0123456789";
      string res = array(take(randomCover(d, rndGen), 2));
      writeln(res);
 }

 It returns:
 test.d(4): Error: cannot implicitly convert expression
(array(take(randomCover(d,rndGen()),2u))) of type dchar[] to string

 The code compiles and runs as written on my system. I think it's David
 Simcha who changed the return type to ForEachType!Range[]. I'm not sure
 I agree with that, as it takes an oddity of foreach that I hoped would
 go away some time and propagates it.

Just to clear up some confusion, I specialized array() for narrow strings so it always returns a dchar[] instead of using ForeachType. Therefore, the behavior is effectively the same as before I changed array() to work with opApply, when it used ElementType. I figured there's two use cases for calling array() on a narrow string: Generic code and non-generic code. In generic code you want to be able to assume that the array returned will be a random access range with lvalue elements like every array type besides narrow strings is. In non-generic code you can just use std.conv to get exactly the type you want.
Sep 12 2010
prev sibling next sibling parent Jonathan M Davis <jmdavisprog gmail.com> writes:
On Sunday 12 September 2010 18:13:47 bearophile wrote:
 Jonathan M Davis:
 You do seem to try to do a lot of things that most other folks never even
 think of doing, let alone have a need to. This is one of them.

Choosing few random chars out of a sequence of possible chars is very normal in Python, it's even a common thing.

Well, I don't think that I've ever seen a program that did that sort of thing. Of course, I don't program in python (I bought a book on it but haven't gotten around to reading it yet), but I suspect that either it's simply an artifact of what you are trying to do as opposed to what's typical in python or that it's something that's typical to do in python but not other languages (which could be an artifact of which language people use for which task).
 if you're trying to grab random characters from
 a string, that's likely to work best with a dstring because it supports
 random access, so there's a decent chance that dstring is really what
 you want anyway,

This may be right. But as I have tried to explain two times, you may end up having string-processing code made mostly of dstrings.

Well, yes. That would be a side effect of using algorithms that need dstrings. If the algorithms that you're using are primarily random access-based, then naturally, most of your code will end up using dstrings rather than strings. There's not really any way around that unless you want to keep translating back and forth. If your string processing doesn't require random access, then you avoid the problem, but as long as it needs random access, you're pretty much stuck.

- Jonathan M Davis
Sep 12 2010
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
The "".join idiom itself is widespread (amongst those who know about
it, at least). It's mentioned in several books and Python tutorials.
As for taking random string samples, I've never used it so I can't
judge whether it's common or not.

On Mon, Sep 13, 2010 at 4:34 AM, bearophile <bearophileHUGS lycos.com> wrote:
 Brad Roberts:
 Existence != common.
 27 hits among millions != common.
 I think your viewpoint might be a little skewed.

Please Brad. I didn't mean that you are able to find thousands of strings like "".join(sample(d, 2)) in Python code around the world. What I meant to say is that in Python2 that's a very natural idiom. I have no idea how to demonstrate this last statement of mine.
 Bye,
 bearophile

Sep 12 2010
prev sibling parent reply Pelle <pelle.mansson gmail.com> writes:
On 09/13/2010 02:09 AM, bearophile wrote:
 Andrei Alexandrescu:
 This goes into "bearophile's odd posts coming now and then".

I assume you have missed most of the things I was trying to say, maybe you have not even read the original post. So I try to explain better a subset of the things I have written.

This is a quite common piece of Python code:

from random import sample
d = "0123456789"
print "".join(sample(d, 2))

I need to perform the same thing in D. For me it's not easy to do that in D2 with Phobos2.

This doesn't work:

import std.stdio, std.random, std.array, std.range;
void main() {
    string d = "0123456789";
    string res = array(take(randomCover(d, rndGen), 2));
    writeln(res);
}

It returns:
test.d(4): Error: cannot implicitly convert expression (array(take(randomCover(d,rndGen()),2u))) of type dchar[] to string

If I change it like this:

import std.stdio, std.random, std.array, std.range;
void main() {
    string d = "0123456789";
    dchar[] res = array(take(randomCover(d, rndGen), 2));
    writeln(res);
}

It doesn't work, and gives a cloud of errors:
...\dmd\src\phobos\std\random.d(890): Error: cast(dchar)(this._input[this._current]) is not an lvalue
...\dmd\src\phobos\std\random.d(907): Error: template std.random.uniform(string boundaries = "[)",T1,T2,UniformRandomNumberGenerator) if (is(CommonType!(T1,UniformRandomNumberGenerator) == void) && !is(CommonType!(T1,T2) == void)) does not match any function template declaration
...\dmd\src\phobos\std\random.d(907): Error: template std.random.uniform(string boundaries = "[)",T1,T2,UniformRandomNumberGenerator) if (is(CommonType!(T1,UniformRandomNumberGenerator) == void) && !is(CommonType!(T1,T2) == void)) cannot deduce template function from argument types !()(int,uint,MersenneTwisterEngine!(uint,32,624,397,31,-1727483681u,11,7,-1658038656u,15,-272236544u,18))

If I replace the d string with a dchar[], it works:

import std.stdio, std.random, std.array, std.range;
void main() {
    dchar[] d = "0123456789"d.dup;
    dchar[] res = array(take(randomCover(d, rndGen), 2));
    writeln(res);
}

But now all strings in this little program are dchar arrays.

What I am trying to say is that with the recent changes to the management of the strings in std.algorithm, when you use strings and char arrays, and you use algorithms over them, the dchar becomes viral, and you end using in most of the code composed dchar arrays or dstrings (unless you cast things back to char[]/string, and I don't know if this is possible in SafeD).

Do you understand now? I am mistaken?

Bye,
bearophile

pp ~% python
Python 2.6.5 (r265:79063, Apr 1 2010, 05:22:20)
[GCC 4.4.3 20100316 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from random import sample
>>> "äö"
'\xc3\xa4\xc3\xb6'
>>> "".join(sample("äö", 2))
'\xb6\xc3'

Doesn't work with utf8. The D version is clearly superior. :-)

pp ~/dee% cat test.d | tail -50 | head -8
void main() {
    string s = "";
    writeln(take(randomCover(to!dstring(s), rndGen), 2));
    return;
pp ~/dee% rdmd test.d
pp ~/dee% rdmd test.d
Sep 13 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Pelle:

  >>> from random import sample
  >>> "äö"
 '\xc3\xa4\xc3\xb6'
  >>> "".join(sample("äö", 2))
 '\xb6\xc3'
 
 Doesn't work with utf8. The D version is clearly superior. :-)

On the other hand D/Phobos/DMD have several thousand problems, small, big and HUGE, that Python lacks :-)

You are using Python 2.6.5, where you need to use unicode strings ("u" prefix). This works correctly on both Windows and Linux with Python 2.6.6, if your source code is UTF-8:

# coding: utf-8
from random import sample
print u"äö".encode("utf-8")
print "".join(sample(u"äö", 2)).encode("utf-8")

Strings have been changed in Python 3.x, where unicode strings are the default, so there is no need to use the "u" prefix.

Mine was not a comparison, and it didn't have the purpose to show that Python is better, it was a way to put in the limelight a possible problem with Phobos.

Bye,
bearophile
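The difference between sampling bytes and sampling characters can be checked exhaustively for a two-character string. This is a Python 3 sketch written for illustration; the helper name is made up, and the byte string matches the '\xc3\xa4\xc3\xb6' value shown in the session quoted above.

```python
from itertools import permutations

text = "äö"
raw = text.encode("utf-8")  # b'\xc3\xa4\xc3\xb6'


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode as UTF-8 without error."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Every ordered pair of distinct byte positions that sampling 2 bytes could return.
byte_pairs = [bytes(p) for p in permutations(raw, 2)]
valid = sum(is_valid_utf8(p) for p in byte_pairs)
print(valid, "of", len(byte_pairs))   # 4 of 12

# Every ordered pair of distinct characters: always well-formed text.
char_pairs = ["".join(p) for p in permutations(text, 2)]
print(char_pairs)
```

Only a third of the byte-level samples happen to form valid UTF-8, while every character-level sample does, which is why the choice of element type matters for this kind of algorithm.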
Sep 13 2010
parent Walter Bright <newshound2 digitalmars.com> writes:
bearophile wrote:
 On the other hand D/Phobos/DMD have several thousand problems, small, big and
HUGE, that Python lacks :-)

Python has 2507 open issues. http://bugs.python.org/
Sep 18 2010