digitalmars.D - Random string samples & unicode

reply bearophile <bearophileHUGS lycos.com> writes:
The need to take a random sample without replacement is very common. For
example, this is how in Python 2.x I create a fixed-size random string, sampled
without replacement, from an input string of chars:

from random import sample
d = "0123456789"
print "".join(sample(d, 2))


This seems to be similar D2 code:

import std.stdio, std.random, std.array, std.range;
void main() {
    dchar[] d = "0123456789"d.dup;
    dchar[] res = array(take(randomCover(d, rndGen), 2));
    writeln(res);
}


There, randomCover() doesn't work with a string, a dstring, or a char[]. If
you later need to process that res dchar[] with std.string, you will have
trouble.


But randomShuffle() is able to shuffle a char[] in place:

import std.stdio, std.random;
void main() {
    char[] d = "0123456789".dup;
    randomShuffle(d);
    writeln(d);
}


If randomCover() receives a char[], I think in theory it has to yield its
shuffled chars. And if it receives a string, it has to yield its shuffled dchars
(decoded from the chars). A string may contain UTF-8 chars that are longer
than 1 byte, but a char[] is not a string, and if you want its items in random
order, randomCover() has to act like randomShuffle().
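
To make the code-unit versus code-point distinction concrete, here is a minimal Python 3 sketch (not D, and not how randomCover() is implemented). The helper names sample_code_points, sample_code_units and is_valid_utf8 are made up for this illustration. It suggests why treating a char[] as plain items is risky: picking single UTF-8 bytes can split a multi-byte character.

```python
import random


def sample_code_points(text, k, rng):
    """Pick k distinct positions counted in code points (roughly D's dchar view)."""
    return "".join(rng.sample(list(text), k))


def sample_code_units(text, k, rng):
    """Pick k distinct positions counted in UTF-8 bytes (roughly D's char[] view)."""
    data = text.encode("utf-8")
    return bytes(rng.sample(list(data), k))


def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    rng = random.Random()  # unseeded, so the output differs between runs
    print(sample_code_points("0123456789", 2, rng))
    # "é" is two bytes in UTF-8; one sampled byte is never valid text on its own.
    print(is_valid_utf8(sample_code_units("é", 1, rng)))
```

For ASCII-only input such as "0123456789" both views give valid text; they only diverge once the input contains characters outside ASCII.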

My head hurts, and I don't know what the right thing to do is.

Maybe I have to work with ubyte[] instead of char[], and add casts:

import std.stdio, std.random, std.array, std.range;
void main() {
    char[] d = "0123456789".dup;
    char[] res = cast(char[])array(take(randomCover(cast(ubyte[])d, rndGen), 2));
    writeln(res);
}


Ideas welcome.

Bye,
bearophile
Sep 10 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
 There, randomCover() doesn't work with a string, a dstring, or a char[].
 If you later need to process that res dchar[] with std.string, you will have
 trouble.

The problems are more widespread, this is a simple generator of terms of the
"look and say" sequence (to generate a member of the sequence from the previous
member, read off the digits of the previous member, counting the number of
digits in groups of the same digit:
http://en.wikipedia.org/wiki/Look_and_say_sequence ):

import std.stdio, std.conv, std.algorithm;

string lookAndSay(string input) {
    string result;
    foreach (g; group(input))
        result ~= to!string(g._1) ~ (cast(char)g._0);
    return result;
}

void main() {
    string last = "1";
    writeln(last);
    foreach (i; 0 .. 10) {
        last = lookAndSay(last);
        writeln(last);
    }
}

I was not able to remove that cast(char), even if I replace all strings in
that program with dstrings.

Is someone else using D2?

Bye,
bearophile
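
For comparison, the look-and-say rule described above can be sketched in Python 3. This is only an illustration of the rule, not a translation of the D code: itertools.groupby plays roughly the role that group() plays there, and the names look_and_say and terms are chosen for this example.

```python
from itertools import groupby


def look_and_say(term):
    """Return the next term: each run of equal digits becomes <count><digit>."""
    return "".join(str(len(list(run))) + digit for digit, run in groupby(term))


def terms(first, n):
    """Return the starting term followed by the next n terms."""
    out = [first]
    for _ in range(n):
        out.append(look_and_say(out[-1]))
    return out


if __name__ == "__main__":
    for t in terms("1", 4):
        print(t)  # prints 1, 11, 21, 1211, 111221 on separate lines
```

Python strings are sequences of code points, so the char versus dchar question that forces the cast in the D version does not come up here.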
Sep 11 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrej Mitrovic:

 I think this might be a compiler bug:

I'll add it to Bugzilla later. But even if you remove that bug, forcing me to
use dstrings in the whole program is strange. Or maybe it's a good thing, and
the natural state for D programs is to just use dstrings everywhere. Andrei may
offer his opinion on the situation.

Bye,
bearophile
Sep 11 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/11/10 10:24 CDT, bearophile wrote:
 Andrej Mitrovic:

 I think this might be a compiler bug:

 I'll add it to Bugzilla later. But even if you remove that bug, forcing me to
 use dstrings in the whole program is strange. Or maybe it's a good thing, and
 the natural state for D programs is to just use dstrings everywhere. Andrei
 may offer his opinion on the situation.

 Bye,
 bearophile

This goes into "bearophile's odd posts coming now and then".

Andrei
Sep 11 2010
parent bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 This goes into "bearophile's odd posts coming now and then".

You aren't helping solve those problems.

Bye,
bearophile
Sep 11 2010
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/11/10 9:48 CDT, Andrej Mitrovic wrote:
 I think this might be a compiler bug:


 import std.conv : to;

 void main()
 {
      string mystring;
      dchar mydchar;

      // ok, appending dchar to string
      mystring ~= mydchar;

      // error:  incompatible types for
      // ((cast(uint)mydchar) ~ (cast(uint)mydchar)): 'uint' and 'uint'
      mystring ~= mydchar ~ mydchar;
 }

You can't concatenate two integrals.

Andrei
Sep 11 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 You can't concatenate two integrals.

The compiler has full type information, so what's wrong in concatenating two
char or two dchar into a string or dstring?

And I think there are other problems:
http://d.puremagic.com/issues/show_bug.cgi?id=4853

Bye,
bearophile
Sep 11 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
 The compiler has full type information, so what's wrong in concatenating two
 char or two dchar into a string or dstring?

But in C the ~ between two chars has a different meaning, so in D you may at best disallow it.

 And I think there are other problems:
 http://d.puremagic.com/issues/show_bug.cgi?id=4853

So that's invalid, I have closed it.

With a few contortions it's possible to write lookAndSay() with no casts, but
the code still isn't good:

import std.stdio, std.conv, std.algorithm;

string lookAndSay(string input) {
    string result;
    foreach (g; group(input)) {
        string s = to!string(g._1);
        s ~= g._0; // string ~ dchar wrong, string ~= dchar good
        result ~= s;
    }
    return result;
}

void main() {
    string last = "1";
    writeln(last);
    foreach (i; 0 .. 10) {
        last = lookAndSay(last);
        writeln(last);
    }
}

Bye,
bearophile
Sep 11 2010
parent bearophile <bearophileHUGS lycos.com> writes:
     foreach (g; group(input)) {
         string s = to!string(g._1);
         s ~= g._0; // string ~ dchar wrong, string ~= dchar good
         result ~= s;
     }

Shorter:

    foreach (g; group(input))
        result ~= text(g._1, g._0);

bearophile
Sep 11 2010
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 11 Sep 2010 13:20:25 -0400, bearophile <bearophileHUGS lycos.com>  
wrote:

 Andrei Alexandrescu:
 You can't concatenate two integrals.

The compiler has full type information, so what's wrong in concatenating two char or two dchar into a string or dstring?

It's ambiguous also:

string s1 = "abc", s2 = "def";
auto x = s1 ~ s2;

would you expect x to be "abcdef" or ["abc", "def"]?

Essentially, one of the arguments to concatenation must be an array type in
order to avoid ambiguity. Fortunately, you can get the results you wish with
the bracket notation:

auto x = [s1, s2];

-Steve
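
The two readings in the post above can be modelled with a small Python 3 sketch. Python has no ~ operator for this, so the two hypothetical functions below (concat_as_arrays and concat_as_elements, names invented here) simply stand in for the two meanings that an "element ~ element" rule would make possible in D:

```python
def concat_as_arrays(a, b):
    """Reading 1: both operands are arrays of characters, so join their items."""
    return a + b


def concat_as_elements(a, b):
    """Reading 2: both operands are single elements, so build a two-item array."""
    return [a, b]


if __name__ == "__main__":
    s1, s2 = "abc", "def"
    print(concat_as_arrays(s1, s2))    # prints abcdef
    print(concat_as_elements(s1, s2))  # prints ['abc', 'def']
```

Since a string is itself an array, both readings would type-check for the same expression, which is the ambiguity being pointed out.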
Sep 13 2010
prev sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
I think this might be a compiler bug:


import std.conv : to;

void main()
{
    string mystring;
    dchar mydchar;

    // ok, appending dchar to string
    mystring ~= mydchar;

    // error:  incompatible types for
    // ((cast(uint)mydchar) ~ (cast(uint)mydchar)): 'uint' and 'uint'
    mystring ~= mydchar ~ mydchar;
}


On Sat, Sep 11, 2010 at 3:42 PM, bearophile <bearophileHUGS lycos.com> wrote:
 There, randomCover() doesn't work with a string, a dstring, or a char[].
 If you later need to process that res dchar[] with std.string, you will have
 trouble.

 The problems are more widespread, this is a simple generator of terms of the
 "look and say" sequence (to generate a member of the sequence from the
 previous member, read off the digits of the previous member, counting the
 number of digits in groups of the same digit:
 http://en.wikipedia.org/wiki/Look_and_say_sequence ):

 import std.stdio, std.conv, std.algorithm;

 string lookAndSay(string input) {
     string result;
     foreach (g; group(input))
         result ~= to!string(g._1) ~ (cast(char)g._0);
     return result;
 }

 void main() {
     string last = "1";
     writeln(last);
     foreach (i; 0 .. 10) {
         last = lookAndSay(last);
         writeln(last);
     }
 }

 I was not able to remove that cast(char), even if I replace all strings in
 that program with dstrings.

 Is someone else using D2?

 Bye,
 bearophile

Sep 11 2010