digitalmars.D - [Proposal] Add module for C-strings support in Phobos

reply Denis Shelomovskij <verylonglogin.reg gmail.com> writes:
It's filed as enhancement 12418 [2]:

C-string processing is a special and common case, so:
1. C-strings should be supported with both performance and usability.
2. There should be a dedicated module for C-strings (instead of adding
such functions here and there in other modules).

Current state: there is no good support for C-strings in Phobos. There
is a slow and broken `toStringz` (Issue 12417 [3]), and no standard way to
perform many common operations, like converting a returned C-string to a
string and releasing its memory, or creating a C-string from a string
using an allocation function.

So I propose to add the `unstd.c.string` [1] module to Phobos. It covers
all use cases I have seen while implementing C library wrappers, and it is
correct and fast, in contrast to existing solutions like GtkD's (yes, that
one is both incorrect and slow because of tons of GC allocations).


[1] http://denis-sh.bitbucket.org/unstandard/unstd.c.string.html
[2] https://d.puremagic.com/issues/show_bug.cgi?id=12418
[3] https://d.puremagic.com/issues/show_bug.cgi?id=12417
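
To make the two operations named above concrete (taking ownership of a returned C-string, and building a C-string with an allocation function), here is a minimal sketch using only druntime and Phobos. The helper names `toMallocedCString` and `takeCString` are made up for illustration and are not the `unstd.c.string` API; the sketch also assumes the C side allocates with `malloc`, so `free` is the matching release function.

```d
import core.stdc.stdlib : free, malloc;
import core.stdc.string : memcpy;
import std.string : fromStringz;

/// Hypothetical helper: copies a D string into a malloc'ed,
/// zero-terminated buffer. The caller must release it with free().
char* toMallocedCString(const(char)[] s)
{
    auto p = cast(char*) malloc(s.length + 1);
    if (p is null)
        return null; // allocation failed
    if (s.length)
        memcpy(p, s.ptr, s.length);
    p[s.length] = '\0';
    return p;
}

/// Hypothetical helper: copies a C string into GC memory and frees
/// the original buffer (assumed to come from malloc).
string takeCString(char* p)
{
    if (p is null)
        return null;
    scope (exit) free(p);       // runs after the copy below is made
    return fromStringz(p).idup; // fromStringz only slices; idup copies
}

void main()
{
    char* c = toMallocedCString("hello"); // stands in for a C library result
    string s = takeCString(c);            // c is freed here; do not reuse it
    assert(s == "hello");
}
```

The point of the sketch is only the ownership pattern: slice without copying, copy once into GC memory, then free the C buffer.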

-- 
Денис В. Шеломовский
Denis V. Shelomovskij
Mar 20 2014
next sibling parent reply "Rikki Cattermole" <alphaglosined gmail.com> writes:
Looks like it wouldn't be really useful with the Windows API, given that wstrings are more common there.

Another thing that would be nice to have is a wrapper struct for the pointer that allows access via opIndex, opSlice, etc.
Use case: store the struct on the D side to make sure the GC doesn't clean it up, while still being able to access and modify it easily, like a normal string.
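
A rough sketch of what such a wrapper might look like, to make the suggestion concrete. The struct name `CStringRef` and its members are hypothetical and not part of any existing module. Keeping the struct alive on the D side keeps the pointer visible to the GC if the buffer was GC-allocated.

```d
import core.stdc.string : strlen;

/// Hypothetical wrapper around a mutable, zero-terminated char buffer.
struct CStringRef
{
    char* ptr;

    /// Length in code units, excluding the terminator (O(n), like strlen).
    size_t length() const { return ptr is null ? 0 : strlen(ptr); }

    /// Element access; returns by ref so that s[i] = c works.
    ref char opIndex(size_t i)
    {
        assert(i < length, "index out of range");
        return ptr[i];
    }

    /// Whole-string slice, without the terminator.
    char[] opSlice() { return ptr[0 .. length]; }

    /// Sub-slice [a, b).
    char[] opSlice(size_t a, size_t b)
    {
        assert(a <= b && b <= length, "slice out of range");
        return ptr[a .. b];
    }
}

void main()
{
    char[6] buf = "hello\0";
    auto s = CStringRef(buf.ptr);
    s[0] = 'j';
    assert(s[] == "jello");
    assert(s[1 .. 3] == "el");
}
```

Note that every bounds check recomputes the length with `strlen`; a real design would have to decide whether to cache it.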
Mar 20 2014
parent reply Denis Shelomovskij <verylonglogin.reg gmail.com> writes:
20.03.2014 13:20, Rikki Cattermole wrote:
 Looks like it wouldn't be really useful with the Windows API, given that wstrings are more common there.
You misunderstand the terminology. A C string is a zero-terminated string. Also, it looks like you didn't even go to the docs page, as the second example is a WinAPI one.
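
To illustrate that "zero-terminated" applies to UTF-16 just as well, here is a small portable sketch using `std.utf.toUTF16z`, which exists in Phobos today. It does not call any real WinAPI function; the helper `cLength16` is made up for illustration and merely stands in for a C function taking a wide C string.

```d
import std.utf : toUTF16z;

/// Hypothetical stand-in for a C function that takes a wide C string:
/// counts UTF-16 code units up to, but not including, the terminator.
size_t cLength16(const(wchar)* p)
{
    size_t n = 0;
    while (p[n] != 0)
        ++n;
    return n;
}

void main()
{
    // GC-allocated, zero-terminated UTF-16 copy of a UTF-8 string.
    const(wchar)* w = toUTF16z("hello");
    assert(cLength16(w) == 5);
    assert(w[0 .. 5] == "hello"w);
}
```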
 Another thing that would be nice to have is a wrapper struct for the
 pointer that allows access via opIndex, opSlice, etc.
 Use case: store the struct on the D side to make sure the GC doesn't
 clean it up, while still being able to access and modify it easily,
 like a normal string.
I don't understand the use case. If you have implemented some C library wrappers and have personal experience, I'd like to hear your opinion on the problem of calling C functions and your proposal to solve it, if you dislike mine. Also with examples, please, where my solution fails and yours rocks. )

-- 
Денис В. Шеломовский
Denis V. Shelomovskij
Mar 20 2014
parent reply "Rikki Cattermole" <alphaglosined gmail.com> writes:
On Thursday, 20 March 2014 at 09:32:33 UTC, Denis Shelomovskij 
wrote:
 20.03.2014 13:20, Rikki Cattermole wrote:
 Looks like it wouldn't be really useful with the Windows API, given that wstrings are more common there.
 You misunderstand the terminology. A C string is a zero-terminated string. Also, it looks like you didn't even go to the docs page, as the second example is a WinAPI one.
I understand how C strings work. It would be nice to have more unittests for dstring/wstring, because it looks more geared towards char/string, which is why at the outset it looks less likely to work.
 Another thing that would be nice to have is a wrapper struct for the
 pointer that allows access via opIndex, opSlice, etc.
 Use case: store the struct on the D side to make sure the GC doesn't
 clean it up, while still being able to access and modify it easily,
 like a normal string.
 I don't understand the use case. If you have implemented some C library wrappers and have personal experience, I'd like to hear your opinion on the problem of calling C functions and your proposal to solve it, if you dislike mine. Also with examples, please, where my solution fails and yours rocks. )
I don't dislike your approach at all. I just feel that it needs to allow for a few more use cases, given that the proposal is for Phobos. What you have done looks fine for most cases of wrapping C libraries. I'm just worried that it has fewer use cases than it could have. I'm just nitpicking, so don't mind me too much :)
Mar 20 2014
parent Denis Shelomovskij <verylonglogin.reg gmail.com> writes:
20.03.2014 13:52, Rikki Cattermole wrote:
 On Thursday, 20 March 2014 at 09:32:33 UTC, Denis Shelomovskij wrote:
 20.03.2014 13:20, Rikki Cattermole wrote:
 Looks like it wouldn't be really useful with the Windows API, given that wstrings are more common there.
 You misunderstand the terminology. A C string is a zero-terminated string. Also, it looks like you didn't even go to the docs page, as the second example is a WinAPI one.
 I understand how C strings work. It would be nice to have more unittests for dstring/wstring, because it looks more geared towards char/string, which is why at the outset it looks less likely to work.
I'd say most unittests do test the UTF-16 & UTF-32 versions. As for documentation, function signatures contain a template parameter for the character type, but there is probably a lack of ddoc unittests and/or documentation.
 Another thing that would be nice to have is a wrapper struct for the
 pointer that allows access via opIndex, opSlice, etc.
 Use case: store the struct on the D side to make sure the GC doesn't
 clean it up, while still being able to access and modify it easily,
 like a normal string.
 I don't understand the use case. If you have implemented some C library wrappers and have personal experience, I'd like to hear your opinion on the problem of calling C functions and your proposal to solve it, if you dislike mine. Also with examples, please, where my solution fails and yours rocks. )
 I don't dislike your approach at all. I just feel that it needs to allow for a few more use cases, given that the proposal is for Phobos. What you have done looks fine for most cases of wrapping C libraries. I'm just worried that it has fewer use cases than it could have. I'm just nitpicking, so don't mind me too much :)
Thanks. So the algorithm is like this: find a C library which needs more love and file me an issue [1], as I have just added all the common use cases I have seen.

[1] https://bitbucket.org/denis-sh/unstandard/issues

-- 
Денис В. Шеломовский
Denis V. Shelomovskij
Mar 20 2014
prev sibling parent reply "angel" <andrey.gelman gmail.com> writes:
Going slightly beyond the code of a new module, it might possibly be
useful to enable zero-terminated string creation at the core
language level, with:
     auto mystr = "hello"z;

The 'z' at the end is much the same as the 'L' in '5L' ...
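
For context, D string literals already accept one-character postfixes that select the character type, which is the precedent a `z` postfix would follow. A minimal sketch of the existing postfixes (the `z` postfix itself is only proposed in this thread and is not valid D):

```d
void main()
{
    auto a = "hello"c; // string:  immutable(char)[],  UTF-8
    auto b = "hello"w; // wstring: immutable(wchar)[], UTF-16
    auto c = "hello"d; // dstring: immutable(dchar)[], UTF-32

    static assert(is(typeof(a) == string));
    static assert(is(typeof(b) == wstring));
    static assert(is(typeof(c) == dstring));
}
```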
Mar 21 2014
parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Friday, 21 March 2014 at 19:59:51 UTC, angel wrote:
 Going slightly beyond a new module code, it might, possibly, be 
 useful to enable zero-terminated string creation on the core 
 language level, with:
     auto mystr = "hello"z;
The core language already knows zero-terminated strings:

    void main() {
        immutable(char)* s = "lol";
    }

Regular 8-bit string literals implicitly convert to pointers without needing to explicitly call the .ptr property, and they are always zero-terminated automatically. This is why you can write

    printf("foo");

in D and have it just work without complaining about needing toStringz.

You can also write:

    const char* s = "lol";

and that works too. Not quite auto, but not a big hassle.
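
One caveat worth spelling out: the automatic terminator is a property of string literals, not of strings in general. The sketch below (my own illustration, not from the posts above) shows a slice that is not terminated at its own end and therefore still needs `toStringz` before being handed to C.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

void main()
{
    // A literal is followed by a hidden '\0', so it can be passed to C as-is.
    const(char)* lit = "lol";
    assert(strlen(lit) == 3);

    // A slice of a larger string is not terminated at its own end.
    string whole = "hello world";
    string part = whole[0 .. 5];
    assert(part.ptr[part.length] == ' '); // next byte is ' ', not '\0'

    // toStringz yields a properly terminated pointer for such strings.
    const(char)* z = part.toStringz;
    assert(strlen(z) == 5);
}
```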
Mar 21 2014
parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
You could also write:

alias toStringz z;

auto foo = "bar".z;

and that would work too!
Mar 21 2014
next sibling parent "Nordlöw" <per.nordlow gmail.com> writes:
 alias toStringz z;

 auto foo = "bar".z;
In this case "bar" is already zero-terminated, right? See "String literals already have a 0 appended to them" in http://dlang.org/arrays.html
Mar 22 2014
prev sibling parent reply "Nordlöw" <per.nordlow gmail.com> writes:
 You could also write:

 alias toStringz z;

 auto foo = "bar".z;

 and that would work too!
DMD currently cannot infer aliases to be callable using UFCS, unfortunately:

    unittest {
        import std.stdio: wln = writeln;
        import std.string;
        wln(typeof("a".z).stringof);
    }

errors with

    t_string.d(19,19): Error: no property 'z' for type 'string'

Shouldn't be too hard to fix, though.
Mar 22 2014
next sibling parent reply "Nordlöw" <per.nordlow gmail.com> writes:


 unittest {
     import std.stdio: wln = writeln;
     import std.string;
     wln(typeof("a".z).stringof);
 }
Correction:

    unittest {
        import std.stdio: wln = writeln;
        import std.string;
        alias z = toStringz;
        wln(typeof("a".z).stringof);
    }

gives same error

    t_string.d(7,19): Error: no property 'z' for type 'string'
Mar 22 2014
next sibling parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Saturday, 22 March 2014 at 13:01:11 UTC, Nordlöw wrote:
 gives same error
That's because you made the alias local, UFCS only works with global symbols right now (which is actually by design, though I don't think it is a great design). So this works:

    // move these out to module scope
    import std.string;
    alias z = toStringz;

    unittest {
        import std.stdio: wln = writeln;
        wln(typeof("a".z).stringof); // now we're good
    }
Mar 22 2014
next sibling parent "Nordlöw" <per.nordlow gmail.com> writes:
Ok. Great. Still... I believe a warning hint should be output. This is not obvious. /Per
Mar 22 2014
prev sibling parent "Nordlöw" <per.nordlow gmail.com> writes:
 That's because you made the alias local, UFCS only works with 
 global symbols right now (which is actually by design, though I 
 don't think it is a great design).
What were the motivations behind this choice of design?
Mar 22 2014
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/22/14, 6:01 AM, "Nordlöw" wrote:


 unittest {
     import std.stdio: wln = writeln;
     import std.string;
     wln(typeof("a".z).stringof);
 }
 Correction:

     unittest {
         import std.stdio: wln = writeln;
         import std.string;
         alias z = toStringz;
         wln(typeof("a".z).stringof);
     }

 gives same error

     t_string.d(7,19): Error: no property 'z' for type 'string'
Please bugzilla, thanks! -- Andrei
Mar 22 2014
prev sibling next sibling parent "Nordlöw" <per.nordlow gmail.com> writes:
 Shouldn't be too hard to fix, though.
Does anybody know if there is an Issue for this?
Mar 22 2014
prev sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 3/22/14, "Nordlöw" <per.nordlow gmail.com> wrote:
 DMD currently cannot infer aliases to be callable using UFCS,
 unfortunately
Actually you're running into UFCS not working for module-scoped imports. The following will work:

-----
import std.stdio: wln = writeln;
import std.string;

alias toStringz z;

void main()
{
    wln(typeof("a".z).stringof);  // works ok
}
-----

UFCS not working for module-scoped imports is a filed bug.
Mar 22 2014
next sibling parent "Nordlöw" <per.nordlow gmail.com> writes:
 UFCS not working for module-scoped imports is a filed bug.
Ok. Great!
Mar 22 2014
prev sibling parent reply "Nordlöw" <per.nordlow gmail.com> writes:
 UFCS not working for module-scoped imports is a filed bug.
Do you have a reference to this bugzilla issue?
Mar 22 2014
next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 3/22/14, "Nordlöw" <per.nordlow gmail.com> wrote:
 UFCS not working for module-scoped imports is a filed bug.
Do you have a reference to this bugzilla issue?
https://d.puremagic.com/issues/show_bug.cgi?id=6185
Mar 22 2014
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 3/22/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 On 3/22/14, "Nordlöw" <per.nordlow gmail.com> wrote:
 UFCS not working for module-scoped imports is a filed bug.
Do you have a reference to this bugzilla issue?
https://d.puremagic.com/issues/show_bug.cgi?id=6185
Oops, that's slightly different and solved. I'm not sure if the alias version is filed.
Mar 22 2014
prev sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 3/22/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 https://d.puremagic.com/issues/show_bug.cgi?id=6185
Oops, that's slightly different and solved. I'm not sure if the alias version is filed.
Looks like what happened was I filed the 'alias' version as a duplicate of 6185, then 6185 was fixed but not the test-case in 9515. Gonna reopen it now: https://d.puremagic.com/issues/show_bug.cgi?id=9515
Mar 22 2014