digitalmars.D.learn - How to remove all characters from a string, except the integers?
- BoQsc (17/17) Mar 03 2022 I need to check if a string contains integers,
- Stanislav Blinov (25/30) Mar 03 2022 ```d
- BoQsc (3/33) Mar 03 2022 D language should be renamed into Exclamation-mark language.
- bachmeier (2/40) Mar 03 2022 But you have no problem with parenthesis and braces?
- Salih Dincer (11/16) Mar 03 2022 **When using ```find!isNumber```:**
- matheus (13/15) Mar 03 2022 I'm a simple man who uses D with the old C mentality:
- forkit (3/15) Mar 03 2022 mmm..but we no longer live in simple times ;-)
- H. S. Teoh (12/34) Mar 03 2022 ------
- matheus (22/31) Mar 03 2022 Just because I'm a simple man. :)
- H. S. Teoh (18/28) Mar 03 2022 [...]
- matheus (47/51) Mar 04 2022 OK but there is another problem, I tested your version and mine
- H. S. Teoh (11/17) Mar 04 2022 This line allocates a new string for every single loop iteration. This
- matheus (4/8) Mar 04 2022 Yes, in fact I usually do my coding/compiling with DMD because is
- ag0aep6g (11/52) Mar 04 2022 The second version involves auto-decoding, which isn't actually
- H. S. Teoh (9/12) Mar 04 2022 [...]
- matheus (5/17) Mar 04 2022 That's awesome my timing are pretty much like yours. In fact now
- Stanislav Blinov (4/9) Mar 04 2022 To add to the already-mentioned difference in allocation
- matheus (3/12) Mar 04 2022 Interesting and I'll try that way. Thanks.
- Ali Çehreli (4/6) Mar 03 2022 I assumed it would generate separate integers 123 and 456. I started to
- Salih Dincer (26/33) Mar 03 2022 It's called hit two targets with one arrow:
- Salih Dincer (35/39) Mar 03 2022 I worked on it a little. I guess it's better that way. But I
- H. S. Teoh (15/24) Mar 04 2022 [...]
- Salih Dincer (8/22) Mar 03 2022 If you look
- forkit (6/30) Mar 03 2022 If you get this question at an interview, please remember to
- Salih Dincer (7/9) Mar 04 2022 ```d
- Ali Çehreli (7/16) Mar 04 2022 I think what forkit means is, should the function consider numbers made
- BoQsc (25/42) Mar 04 2022 #### Regular expression solution
- Ali Çehreli (49/50) Mar 04 2022 Others assumed you wanted integer values but I think you want the digits...
- Salih Dincer (10/14) Mar 04 2022 It's delicious, only four lines:
I need to check if a string contains integers, and if it contains integers, remove all the regular string characters. I've looked around and it seems using regex is the only closest solution.

```
import std.stdio;

void main(string[] args){
    if (args.length > 1){
        write(args[1]); // Needs to print only integers.
    } else {
        write("Please write an argument.");
    }
}
```
Mar 03 2022
On Thursday, 3 March 2022 at 12:14:13 UTC, BoQsc wrote:
> I need to check if a string contains integers, and if it contains integers, remove all the regular string characters. I've looked around and it seems using regex is the only closest solution.

```d
import std.stdio;
import std.algorithm : find, filter;
import std.conv : to;
import std.uni : isNumber;

void main(string[] args){
    if (args.length > 1){
        auto filtered = () {
            auto r = args[1].find!isNumber; // check if a string contains integers
            return r.length ?
                r.filter!isNumber.to!string // and if it does, keep only integers
                : args[1];                  // otherwise keep original
        } ();
        filtered.writeln;
    } else {
        write("Please write an argument.");
    }
}
```
Mar 03 2022
On Thursday, 3 March 2022 at 13:25:32 UTC, Stanislav Blinov wrote:
> On Thursday, 3 March 2022 at 12:14:13 UTC, BoQsc wrote:
> > I need to check if a string contains integers, and if it contains integers, remove all the regular string characters. I've looked around and it seems using regex is the only closest solution.
>
> [...]
>     auto r = args[1].find!isNumber; // check if a string contains integers
>     return r.length ?
>         r.filter!isNumber.to!string // and if it does, keep only integers
>         : args[1];                  // otherwise keep original
> [...]

D language should be renamed into Exclamation-mark language. It feels overused everywhere and without a better alternative.
Mar 03 2022
On Thursday, 3 March 2022 at 13:55:47 UTC, BoQsc wrote:
> On Thursday, 3 March 2022 at 13:25:32 UTC, Stanislav Blinov wrote:
> > [...]
>
> D language should be renamed into Exclamation-mark language. It feels overused everywhere and without a better alternative.

But you have no problem with parentheses and braces?
Mar 03 2022
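For readers puzzled by the exclamation marks: `!` is D's template instantiation operator, and the dot-chaining is uniform function call syntax (UFCS). As a minimal sketch (the helper names `digitsShort` and `digitsLong` are hypothetical, and `std.ascii.isDigit` is swapped in for `std.uni.isNumber` to keep the example ASCII-only), the two functions below should be equivalent spellings of the same pipeline:

```d
import std.algorithm : filter;
import std.ascii : isDigit;
import std.conv : to;
import std.stdio : writeln;

// Hypothetical helper: keeps only ASCII digits, using the `!` shorthand
// and UFCS chaining as in the posts above.
string digitsShort(string s)
{
    return s.filter!isDigit.to!string;
}

// The same pipeline with explicit template parentheses and ordinary
// nested function calls instead of UFCS.
string digitsLong(string s)
{
    return to!(string)(filter!(isDigit)(s));
}

void main()
{
    writeln(digitsShort("4A0B1de!2C9~6")); // 401296
    writeln(digitsLong("4A0B1de!2C9~6"));  // 401296
}
```

The parentheses after `!` may be omitted only when there is a single, simple template argument, which is why the short form appears so often in range code.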
On Thursday, 3 March 2022 at 13:25:32 UTC, Stanislav Blinov wrote:
>     auto filtered = () {
>         auto r = args[1].find!isNumber; // check if a string contains integers

**When using ```find!isNumber```:**
```
0123456789
ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
`abcdefghijklmnopqrstuvwxyz{|}~
```
**When using ```find!isAlphaNum```:**
```
0123456789
```
Mar 03 2022
On Thursday, 3 March 2022 at 12:14:13 UTC, BoQsc wrote:
> I've looked around and it seems using regex is the only closest solution.

I'm a simple man who uses D with the old C mentality:

    import std.stdio;

    void main(){
        string s, str = "4A0B1de!2C9~6";
        foreach(i;str){
            if(i < '0' || i > '9'){ continue; }
            s ~= i;
        }
        writeln("Result: ", s);
    }

Result: 401296

Matheus.
Mar 03 2022
On Thursday, 3 March 2022 at 19:28:36 UTC, matheus wrote:
> I'm a simple man who uses D with the old C mentality:
> [...]
> Result: 401296

mmm..but we no longer live in simple times ;-) (i.e. unicode)
Mar 03 2022
On Thu, Mar 03, 2022 at 08:23:14PM +0000, forkit via Digitalmars-d-learn wrote:
> On Thursday, 3 March 2022 at 19:28:36 UTC, matheus wrote:
> > I'm a simple man who uses D with the old C mentality:
> > [...]
> > Result: 401296
>
> mmm..but we no longer live in simple times ;-) (i.e. unicode)

------
import std;

void main() {
    string s = "blahblah123blehbleh456bluhbluh";
    auto result = s.filter!(ch => ch.isDigit).to!int;
    assert(result == 123456);
}
------

Problem solved. Why write 6 lines when 3 will do?

T

--
People tell me that I'm skeptical, but I don't believe them.
Mar 03 2022
On Thursday, 3 March 2022 at 21:03:40 UTC, H. S. Teoh wrote:
> ...
> ------
> void main() {
>     string s = "blahblah123blehbleh456bluhbluh";
>     auto result = s.filter!(ch => ch.isDigit).to!int;
>     assert(result == 123456);
> }
> ------
>
> Problem solved. Why write 6 lines when 3 will do?

Just because I'm a simple man. :)

I usually program mostly in C, and when in D I go the same way but use features like GC, strings, AA etc.

Of course your version is a D'ish way of handling things, and I can't contest that it looks better visually. But if size was a problem I could have written:

    void main(){
        string s, str = "4A0B1de!2C9~6";
        foreach(i;str){
            (i >= '0' && i <= '9') ? s~=i : null;
        }
        writeln(s);
    }

Well, still 1 line off, but it goes with my flow. I mean, this example is a simple one, but usually I can see and understand what C code is doing (more) easily than D code just by looking at it. Don't even ask about C++, because I gave up. :)

Matheus.

PS: I spotted something in your code: you're converting the result to int, which can lead to an overflow depending on the values in the string.
Mar 03 2022
On Thu, Mar 03, 2022 at 10:54:39PM +0000, matheus via Digitalmars-d-learn wrote:
> On Thursday, 3 March 2022 at 21:03:40 UTC, H. S. Teoh wrote:
[...]
> > ------
> > void main() {
> >     string s = "blahblah123blehbleh456bluhbluh";
> >     auto result = s.filter!(ch => ch.isDigit).to!int;
> >     assert(result == 123456);
> > }
> > ------
[...]
> PS: I spotted something on your code, you're converting the result to int, this can lead to a overflow depending the values in the string.

If you need to, convert to long instead. Or if you want a string for subsequent manipulation, replace `int` with `string`.

Or, if you don't actually need to manipulate the value at all, but just print the digits, then it becomes even simpler:

    void main() {
        string s = "blahblah123blehbleh456bluhbluh";
        writeln(s.filter!(ch => ch.isDigit));
    }

This version doesn't even allocate extra storage for the filtered digits, since no storage is actually needed (each digit is spooled directly to the output).

T

--
The peace of mind---from knowing that viruses which exploit Microsoft system vulnerabilities cannot touch Linux---is priceless. -- Frustrated system administrator.
Mar 03 2022
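The overflow concern raised above can be made concrete. The following is only a sketch: `digitsFitInt` and `digitsToLong` are hypothetical helper names, and the digits are first collected into a string before conversion so that each step is easy to follow. `std.conv.to` throws a `ConvException` (an overflow is reported through its subclass `ConvOverflowException`) when the digits do not fit the target type:

```d
import std.algorithm : filter;
import std.ascii : isDigit;
import std.conv : to, ConvException;
import std.stdio : writeln;

// Hypothetical helper: true if the ASCII digits in `s` form a value that
// fits into an int. Also false when `s` contains no digits at all, since
// converting an empty string throws ConvException as well.
bool digitsFitInt(string s)
{
    try
    {
        auto value = s.filter!isDigit.to!string.to!int;
        return true;
    }
    catch (ConvException) // ConvOverflowException derives from ConvException
    {
        return false;
    }
}

// Hypothetical helper: the wider conversion suggested in the reply above.
long digitsToLong(string s)
{
    return s.filter!isDigit.to!string.to!long;
}

void main()
{
    writeln(digitsFitInt("blahblah123blehbleh456bluhbluh")); // true
    writeln(digitsFitInt("id 12345 678 901"));               // false: 12345678901 > int.max
    writeln(digitsToLong("id 12345 678 901"));               // 12345678901
}
```

Switching to `long` only moves the limit; for arbitrarily long digit runs, keeping the result as a `string` avoids the problem entirely.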
On Thursday, 3 March 2022 at 23:46:49 UTC, H. S. Teoh wrote:
> ...
> This version doesn't even allocate extra storage for the filtered digits, since no storage is actually needed (each digit is spooled directly to the output).

OK but there is another problem, I tested your version and mine and there is a HUGE difference in speed:

LDC 1.27.1, with -O2:

    import std.datetime.stopwatch;
    import std.stdio: write, writeln, writef, writefln;
    import std;

    void printStrTim(string s,StopWatch sw){
        writeln("\nstr: ", s
                ,"\nTim(ms): ", sw.peek.total!"msecs"
                ,"\nTim(us): ", sw.peek.total!"usecs"
               );
    }

    void main(){
        auto sw = StopWatch(AutoStart.no);
        string s, str = "4A0B1de!2C9~6";
        int j;

        sw.start();
        for(j=0;j<1_000_000;++j){
            s="";
            foreach(i;str){
                (i >= '0' && i <= '9') ? s~=i : null;
            }
        }
        sw.stop();
        printStrTim(s,sw);

        s = "";
        sw.reset();
        sw.start();
        for(j=0;j<1_000_000;++j){
            s="";
            s = str.filter!(ch => ch.isDigit).to!string;
        }
        sw.stop();
        printStrTim(s,sw);
    }

Prints:

    str: 401296
    Tim(ms): 306
    Tim(us): 306653

    str: 401296
    Tim(ms): 1112
    Tim(us): 1112648

-------------------------------

Unless I did something wrong (If anything please tell).

By the way on DMD was worse, it was like 5x slower in your version.

Matheus.
Mar 04 2022
On Fri, Mar 04, 2022 at 07:51:44PM +0000, matheus via Digitalmars-d-learn wrote:
[...]
>     for(j=0;j<1_000_000;++j){
>         s="";
>         s = str.filter!(ch => ch.isDigit).to!string;

This line allocates a new string for every single loop iteration. This is generally not something you want to do in an inner loop. :-)

>     }
[...]
> Unless I did something wrong (If anything please tell).
>
> By the way on DMD was worse, it was like 5x slower in your version.
[...]

I don't pay any attention to DMD when I'm doing anything remotely performance-related. Its optimizer is known to be suboptimal. :-P

T

--
Study gravitation, it's a field with a lot of potential.
Mar 04 2022
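One way to avoid a fresh allocation on every iteration is to reuse a single buffer. The sketch below is an assumption about how that could look, not something measured in this thread: `filterRepeatedly` is a hypothetical helper, and it relies on `std.array.appender` with a mutable `char[]` so that `clear()` can recycle the storage between rounds.

```d
import std.array : appender;
import std.ascii : isDigit;
import std.stdio : writeln;

// Hypothetical helper: filters the digits of `str` `rounds` times, writing
// into one reused buffer instead of building a new string each round.
string filterRepeatedly(string str, size_t rounds)
{
    auto buf = appender!(char[])();
    foreach (round; 0 .. rounds)
    {
        buf.clear(); // drop the contents but keep the allocated capacity
        foreach (char c; str) // iterate code units, no auto-decoding
        {
            if (isDigit(c))
                buf.put(c);
        }
    }
    return buf.data.idup; // a single immutable copy at the very end
}

void main()
{
    writeln(filterRepeatedly("4A0B1de!2C9~6", 1_000_000)); // 401296
}
```

Whether this beats either version in the benchmark depends on the compiler and flags, so it would need to be timed in the same harness before drawing conclusions.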
On Friday, 4 March 2022 at 20:33:08 UTC, H. S. Teoh wrote:
> On Fri, Mar 04, 2022 at 07:51:44PM +0000, matheus via ...
>
> I don't pay any attention to DMD when I'm doing anything remotely performance-related. Its optimizer is known to be suboptimal. :-P

Yes, in fact I usually do my coding/compiling with DMD because it compiles faster, then I go for LDC for production and speed.

Matheus.
Mar 04 2022
On Friday, 4 March 2022 at 19:51:44 UTC, matheus wrote:
> [...]
>     for(j=0;j<1_000_000;++j){
>         s="";
>         s = str.filter!(ch => ch.isDigit).to!string;
>     }
> [...]
> Prints:
>
>     str: 401296
>     Tim(ms): 306
>     Tim(us): 306653
>
>     str: 401296
>     Tim(ms): 1112
>     Tim(us): 1112648
>
> -------------------------------
>
> Unless I did something wrong (If anything please tell).

The second version involves auto-decoding, which isn't actually needed. You can work around it with `str.byCodeUnit.filter!...`. On my machine, times become the same then.

Typical output:

    str: 401296
    Tim(ms): 138
    Tim(us): 138505

    str: 401296
    Tim(ms): 137
    Tim(us): 137376
Mar 04 2022
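Spelled out, the `byCodeUnit` workaround might look like the sketch below. The helper names `digitsDecoded` and `digitsByCodeUnit` are hypothetical; `byCodeUnit` lives in `std.utf` and presents the string as a range of `char` code units, so `filter` no longer decodes each element to `dchar`. Since ASCII digits are always single code units in UTF-8, both variants should return the same text:

```d
import std.algorithm : filter;
import std.ascii : isDigit;
import std.conv : to;
import std.stdio : writeln;
import std.utf : byCodeUnit;

// Hypothetical helper: the original pipeline, which auto-decodes the
// string into dchar elements before filtering.
string digitsDecoded(string s)
{
    return s.filter!isDigit.to!string;
}

// Hypothetical helper: the same pipeline over raw UTF-8 code units.
string digitsByCodeUnit(string s)
{
    return s.byCodeUnit.filter!isDigit.to!string;
}

void main()
{
    writeln(digitsDecoded("4A0B1de!2C9~6"));    // 401296
    writeln(digitsByCodeUnit("4A0B1de!2C9~6")); // 401296
    writeln(digitsByCodeUnit("a1é2"));          // 12
}
```

The code-unit version is only a safe substitute here because the predicate accepts ASCII characters alone; a predicate such as `std.uni.isNumber` needs whole code points.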
On Fri, Mar 04, 2022 at 08:38:11PM +0000, ag0aep6g via Digitalmars-d-learn wrote:
[...]
> The second version involves auto-decoding, which isn't actually needed. You can work around it with `str.byCodeUnit.filter!...`. On my machine, times become the same then.
[...]

And this here is living proof of why autodecoding is a Bad Idea(tm). Whatever happened to Andrei's std.v2 effort?! The sooner we can shed this baggage, the better.

T

--
The two rules of success: 1. Don't tell everything you know. -- YHL
Mar 04 2022
On Friday, 4 March 2022 at 20:38:11 UTC, ag0aep6g wrote:
> ...
> The second version involves auto-decoding, which isn't actually needed. You can work around it with `str.byCodeUnit.filter!...`. On my machine, times become the same then.
>
> Typical output:
>
>     str: 401296
>     Tim(ms): 138
>     Tim(us): 138505
>
>     str: 401296
>     Tim(ms): 137
>     Tim(us): 137376

That's awesome, my timings are pretty much like yours. In fact, now with "byCodeUnit" it's faster. :)

Thanks,

Matheus.
Mar 04 2022
On Friday, 4 March 2022 at 19:51:44 UTC, matheus wrote:
> OK but there is another problem, I tested your version and mine and there is a HUGE difference in speed:
>
>     string s, str = "4A0B1de!2C9~6";
>
> Unless I did something wrong (If anything please tell).
>
> By the way on DMD was worse, it was like 5x slower in your version.

To add to the already-mentioned difference in allocation strategies, try replacing the input with e.g. a command-line argument. Looping over a literal may be skewing the results.
Mar 04 2022
On Friday, 4 March 2022 at 21:20:20 UTC, Stanislav Blinov wrote:
> On Friday, 4 March 2022 at 19:51:44 UTC, matheus wrote:
> > [...]
>
> To add to the already-mentioned difference in allocation strategies, try replacing the input with e.g. a command-line argument. Looping over a literal may be skewing the results.

Interesting and I'll try that way. Thanks.

Matheus.
Mar 04 2022
On 3/3/22 13:03, H. S. Teoh wrote:
>     string s = "blahblah123blehbleh456bluhbluh";
>     assert(result == 123456);

I assumed it would generate separate integers 123 and 456. I started to implement a range with findSkip, findSplit, and friends but failed. :/

Ali
Mar 03 2022
On Friday, 4 March 2022 at 02:36:35 UTC, Ali Çehreli wrote:
> On 3/3/22 13:03, H. S. Teoh wrote:
> >     string s = "blahblah123blehbleh456bluhbluh";
> >     assert(result == 123456);
>
> I assumed it would generate separate integers 123 and 456. I started to implement a range with findSkip, findSplit, and friends but failed. :/
>
> Ali

It's called hitting two targets with one arrow:

```d
import std.algorithm : filter;
import std.stdio : writeln;

auto splitNumbers(string str) {
    size_t[] n = [0];
    size_t i;
    foreach(s; str) {
        if(s >= '0' && s <= '9') {
            // accumulate the current run of digits
            n[i] = 10 * n[i] + (s - '0');
        } else {
            // every non-digit opens a new (possibly empty) slot
            i++;
            n.length++;
        }
    }
    // drop the empty slots (this also drops genuine zeros)
    return n.filter!(c => c > 0);
}

void main() {
    auto s = "abc1234567890def1234567890xyz";
    s.splitNumbers.writeln; // [1234567890, 1234567890]
}
```

SDB 79
Mar 03 2022
On Friday, 4 March 2022 at 02:36:35 UTC, Ali Çehreli wrote:
> I assumed it would generate separate integers 123 and 456. I started to implement a range with findSkip, findSplit, and friends but failed. :/

I worked on it a little. I guess it's better that way. But I didn't think about negative numbers.

```d
auto splitNumbers(string str) {
    size_t[] n;
    int i = -1;
    bool nextNumber = true;
    foreach(s; str) {
        if(s >= '0' && s <= '9') {
            if(nextNumber) {
                // first digit of a new run: open a new slot
                i++;
                n.length++;
                nextNumber = false;
            }
            n[i] = 10 * n[i] + (s - '0');
        }
        else nextNumber = true;
    }
    return n;
}

unittest {
    auto n = splitNumbers(" 1,23, 456\n\r7890...");
    assert(n[0] == 1Lu);
    assert(n[1] == 23Lu);
    assert(n[2] == 456Lu);
    assert(n[3] == 7890Lu);
}
```

Presumably D offers shorter and more idiomatic possibilities. This is what I can do while making little use of the library.

Thank you...

SDB 79
Mar 03 2022
On Thu, Mar 03, 2022 at 06:36:35PM -0800, Ali Çehreli via Digitalmars-d-learn wrote:
> On 3/3/22 13:03, H. S. Teoh wrote:
> >     string s = "blahblah123blehbleh456bluhbluh";
> >     assert(result == 123456);
>
> I assumed it would generate separate integers 123 and 456. I started to implement a range with findSkip, findSplit, and friends but failed. :/
[...]

    import std;

    void main() {
        string s = "blahblah123blehbleh456bluhbluh";
        auto result = s.matchAll(regex(`\d+`))
            .each!(m => writeln(m[0]));
    }

Output:

    123
    456

Takes only 3 lines of code. ;-)

T

--
People demand freedom of speech to make up for the freedom of thought which they avoid. -- Soren Aabye Kierkegaard (1813-1855)
Mar 04 2022
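If the separate numbers are wanted as values rather than printed text, the regex approach can be extended along the following lines. This is a sketch under assumptions: `extractInts` is a hypothetical helper, and the pattern is narrowed from `\d+` to `[0-9]+` so that every match is guaranteed to be something `to!int` can parse, regardless of how `\d` treats non-ASCII digits.

```d
import std.algorithm : map;
import std.array : array;
import std.conv : to;
import std.regex : matchAll, regex;
import std.stdio : writeln;

// Hypothetical helper: returns each run of ASCII digits as an int.
// A run too long for int makes to!int throw ConvOverflowException.
int[] extractInts(string s)
{
    return s.matchAll(regex(`[0-9]+`))
            .map!(m => m.hit.to!int) // m.hit is the matched slice of s
            .array;
}

void main()
{
    writeln(extractInts("blahblah123blehbleh456bluhbluh")); // [123, 456]
}
```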
On Thursday, 3 March 2022 at 20:23:14 UTC, forkit wrote:
> On Thursday, 3 March 2022 at 19:28:36 UTC, matheus wrote:
> > I'm a simple man who uses D with the old C mentality:
> > [...]
> >
> >     string s, str = "4A0B1de!2C9~6";
> >     foreach(i;str){
> >         if(i < '0' || i > '9'){ continue; }
> >         s ~= i;
> >     }
> >
> > [...]
>
> mmm..but we no longer live in simple times ;-) (i.e. unicode)

If you look [here](https://github.com/dlang/phobos/blob/master/std/ascii.d#L315), you'll see that it's already the same logic. If it were me I would have even written like this:

```d
"4A0B1de!2C9~6".filter!(c => '0' <= c && c <= '9').writeln; // 401296
```
Mar 03 2022
On Friday, 4 March 2022 at 02:10:11 UTC, Salih Dincer wrote:
> On Thursday, 3 March 2022 at 20:23:14 UTC, forkit wrote:
> > mmm..but we no longer live in simple times ;-) (i.e. unicode)
>
> If you look [here](https://github.com/dlang/phobos/blob/master/std/ascii.d#L315), you'll see that it's already the same logic. If it were me I would have even written like this:
> [...]

If you get this question at an interview, please remember to first ask whether it's ascii or unicode ;-)

"All of the functions in std.ascii accept Unicode characters but effectively ignore them if they're not ASCII." - https://github.com/dlang/phobos/blob/master/std/ascii.d
Mar 03 2022
On Friday, 4 March 2022 at 07:55:18 UTC, forkit wrote:
> If you get this question at an interview, please remember to first ask whether it's ascii or unicode 😀

```d
auto UTFsample = ` 1 İş 100€, 1.568,38 Türk Lirası çarşıda eğri 1 çöp 4lınmaz!`;

UTFsample.splitNumbers.writeln; // [1, 100, 1, 568, 38, 1, 4]
```
Mar 04 2022
On 3/4/22 01:53, Salih Dincer wrote:
> On Friday, 4 March 2022 at 07:55:18 UTC, forkit wrote:
> > If you get this question at an interview, please remember to first ask whether it's ascii or unicode 😀
>
>     auto UTFsample = ` 1 İş 100€, 1.568,38 Türk Lirası çarşıda eğri 1 çöp 4lınmaz!`;
>
>     UTFsample.splitNumbers.writeln; // [1, 100, 1, 568, 38, 1, 4]

I think what forkit means is, should the function consider numbers made of non-ascii characters as well? For example, the ones on this page:

https://www.fileformat.info/info/unicode/category/Nd/list.htm

As is typical of any programming task, all of us made assumptions about what is actually needed. :)

Ali
Mar 04 2022
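The ASCII-versus-Unicode question comes down to which predicate is used. As a small sketch (the helper names `asciiDigits` and `unicodeNumbers` are hypothetical), `std.ascii.isDigit` accepts only `'0'` through `'9'`, while `std.uni.isNumber` accepts any character in the Unicode number categories, which includes digits such as the Arabic-Indic ٤ and ٢:

```d
import std.algorithm : filter;
import std.conv : to;
import std.stdio : writeln;
import ascii = std.ascii;
import uni = std.uni;

// Hypothetical helper: keeps only the ASCII digits '0'..'9'.
string asciiDigits(string s)
{
    return s.filter!(c => ascii.isDigit(c)).to!string;
}

// Hypothetical helper: keeps every Unicode numeric character
// (general categories Nd, Nl and No).
string unicodeNumbers(string s)
{
    return s.filter!(c => uni.isNumber(c)).to!string;
}

void main()
{
    auto sample = "123 ab ٤٢ c 456";
    writeln(asciiDigits(sample));    // 123456
    writeln(unicodeNumbers(sample)); // 123٤٢456
}
```

Note that `isNumber` is broader than "decimal digit": it also matches characters such as fractions and Roman numeral symbols, so the result cannot always be fed to `to!int`.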
On Thursday, 3 March 2022 at 12:14:13 UTC, BoQsc wrote:
> I need to check if a string contains integers, and if it contains integers, remove all the regular string characters. I've looked around and it seems using regex is the only closest solution.
>
> [...]

#### Regular expression solution

```
import std.stdio;
import std.regex;
import std.string: isNumeric;
import std.conv;

void main(string[] args){
    if (args.length > 1){
        writeln(args[1]); // Needs to print only integers.

        // Strip everything that is not an ASCII digit. (A '.' must not be
        // kept here, because to!uint cannot parse a value such as "1.5".)
        string argument1 = args[1].replaceAll(regex(r"[^0-9]","g"), "");
        if (argument1.isNumeric){
            writeln(std.conv.to!uint(argument1));
        } else {
            writeln("Invalid value: ", args[1]," (must be an integer)");
        }
    } else {
        write("Please write an argument.");
    }
}
```
Mar 04 2022
On 3/3/22 04:14, BoQsc wrote:
> and if it contains integers, remove all the regular string characters.

Others assumed you wanted integer values but I think you want the digits of the integers. It took me a while to realize that chunkBy can do that:

    // Convenience functions to tuple members of the result
    // of chunkBy when used with a unary predicate.
    auto isMatched(T)(T tuple) {
        return tuple[0];
    }

    // Ditto
    auto chunkOf(T)(T tuple) {
        return tuple[1];
    }

    auto numbers(R)(R range) {
        import std.algorithm : chunkBy, filter, map;
        import std.uni : isNumber;

        return range
            .chunkBy!isNumber
            .filter!isMatched
            .map!chunkOf;
    }

    unittest {
        import std.algorithm : equal, map;
        import std.conv : text;

        // "٤٢" is a non-ASCII number example.
        auto r = "123 ab ٤٢ c 456 xyz 789".numbers;
        assert(r.map!text.equal(["123", "٤٢", "456", "789"]));
    }

    void main() {
    }

isMatched() and chunkOf() are not necessary at all. I wanted to use readable names to fields of the elements of chunkBy instead of the cryptic t[0] and t[1]:

    return range
        .chunkBy!isNumber
        .filter!(t => t[0])   // Not pretty
        .map!(t => t[1]);     // Not pretty

Those functions could not be nested functions because otherwise I would have to write e.g.

    return range
        .chunkBy!isNumber
        .filter!(t => isMatched(t))   // Not pretty
        .map!(t => chunkOf(t));       // Not pretty

To get integer values, .to!int would work as long as the numbers consist of ASCII digits. (I am removing ٤٢.)

    import std.stdio;
    import std.algorithm;
    import std.conv;
    writeln("123 abc 456 xyz 789".numbers.map!(to!int));

Ali
Mar 04 2022
On Friday, 4 March 2022 at 10:34:29 UTC, Ali Çehreli wrote:
> [...]
> isMatched() and chunkOf() are not necessary at all. I wanted to use readable names to fields of the elements of chunkBy instead of the cryptic t[0] and t[1]:

It's delicious, only four lines:

```d
"1,2,3".chunkBy!(n => '0' <= n && n <= '9')
       .filter!(t => t[0])
       .map!(c => c[1])
       .writeln;
```

Thank you very much for this information sharing...

SDB 79
Mar 04 2022