digitalmars.D.learn - mixin template's alias parameter ... ignored ?

reply someone <someone somewhere.com> writes:
```d
mixin template templateUGC (
    typeStringUTF,
    alias lstrStructureID
    ) {

    public struct lstrStructureID {

       typeStringUTF whatever;

    }

}

mixin templateUGC!(string,  "gudtUGC08");
mixin templateUGC!(dstring, "gudtUGC16");
mixin templateUGC!(wstring, "gudtUGC32");

void main() {

    gudtUGC32 something; /// Error: undefined identifier `gudtUGC32`

}
```

I cannot manage to get this right; not even with:

```d
    public struct mixin(lstrStructureID) { ... }
```

because the argument seems to require a complete statement.
Jul 10
next sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 7/10/21 10:20 PM, someone wrote:

 mixin template templateUGC (
     typeStringUTF,
     alias lstrStructureID
     ) {

     public struct lstrStructureID {
The only way that I know is to take a string parameter and use it with a string mixin:

    mixin template templateUGC (
        typeStringUTF,
        string lstrStructureID
        ) {

        mixin("public struct " ~ lstrStructureID ~ q{
            {
                typeStringUTF whatever;
            }
        });
    }

    mixin templateUGC!(string,  "gudtUGC08");
    mixin templateUGC!(dstring, "gudtUGC16");
    mixin templateUGC!(wstring, "gudtUGC32");

    void main() {
        gudtUGC32 something;
    }

Ali
Jul 10
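One way to keep the string mixin tiny, so that the long struct body stays ordinary, debuggable code, is to mix in only an alias declaration. This is a minimal sketch, not something proposed in the thread: the names `UGCImpl` and `declareUGC` are hypothetical, and the body of the struct is reduced to one field.

```d
/// Hypothetical implementation struct: the long body lives here as normal code.
struct UGCImpl(T) {
    T whatever;
}

/// Hypothetical mixin template: only the one-line alias goes through a string mixin.
mixin template declareUGC(T, string name) {
    mixin("alias " ~ name ~ " = UGCImpl!T;");
}

mixin declareUGC!(string,  "gudtUGC08");
mixin declareUGC!(wstring, "gudtUGC16");
mixin declareUGC!(dstring, "gudtUGC32");

void main() {
    gudtUGC32 something;
    // The generated name is a plain alias for the templated struct.
    static assert(is(gudtUGC32 == UGCImpl!dstring));
    static assert(is(typeof(something.whatever) == dstring));
}
```

Because the generated names are only aliases, compiler errors inside the struct point at real source lines rather than at mixed-in text.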
next sibling parent someone <someone somewhere.com> writes:
On Sunday, 11 July 2021 at 05:54:48 UTC, Ali Çehreli wrote:

 The only way that I know is to take a string parameter and use 
 it with a string mixin:
Yes, that I tried, but the structure has a lot of lines of code and so it is impractical, and of course it will turn out difficult to debug. Since this seems to be a dead-end I did reshuffle some things around:

```d
/// for illustration purposes only:

alias stringUTF08 = string;  /// = immutable(char )[];
alias stringUTF16 = dstring; /// = immutable(dchar)[];
alias stringUTF32 = wstring; /// = immutable(wchar)[];

alias stringUGC08 = gudtUGC!(stringUTF08);
alias stringUGC16 = gudtUGC!(stringUTF16);
alias stringUGC32 = gudtUGC!(stringUTF32);

public struct gudtUGC(typeStringUTF) {

   typeStringUTF whatever;

   /// ... lots of functions using typeStringUTF here

}

void main() {

   version (useUTF08) { stringUGC08 lugcSequence3 = stringUGC08(r"..."c); }
   version (useUTF16) { stringUGC16 lugcSequence3 = stringUGC16(r"..."d); }
   version (useUTF32) { stringUGC32 lugcSequence3 = stringUGC32(r"..."w); }

}
```

It works. Thanks Ali :) !
Jul 10
prev sibling parent reply someone <someone somewhere.com> writes:
On Sunday, 11 July 2021 at 05:54:48 UTC, Ali Çehreli wrote:

 Ali
Primarily to Ali & Steve for their help: be advised, this post will be somewhat ... long.

Some background to begin with: a week or so ago I posted asking for advice on code safeness, and I still haven't replied to the ones that kindly answered. Seeing some replies, and encountering a code issue regarding string manipulation, I pretty soon figured out that I still did not have solid knowledge of many basic things regarding D, so I put the brakes on, went back to square one, and started reading and researching some things a bit more ... slowly.

One of the things that struck me this week is that UniCode string manipulation in many cases is more complex than I previously thought, because there is no precise concept of what a character is in UniCode, at least not the way we are used to with plain-old ASCII. After reading a lot about it (this was good: https://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/) I learned of code-units, code-points, abstract-graphemes, grapheme-clusters, and the like. And I learned the inner details of the UTF encodings and that UTF-32 is best (almost required) for string processing (easier, faster, etc) and of course UTF-8 for definitive storage, and UTF-16 to the trashcan unless you need to interface with Windows (I was previously using UTF-8 within all my code for processing).

So, in order to manipulate a string, say, left(n), right(n), substr(n,m), ie: the usual stuff for many languages/libraries, I need to operate on grapheme-clusters and not on code-points and never ever on code-units, at least for unexpected text, ie: incoming text, user-input, etc, the things that we can not control beforehand.

Both primary D books, Andrei's and Ali's, as well as the D documentation, have plenty of examples but they are mainly focused on simple things like strings having nothing out of the ordinary. They perform string manipulation mainly by slicing the source string (ie: the char array) with the functions of std.range like take, takeOne, etc.

I needed to settle these things once and for all for my code and thus I decided to build a grapheme-aware UDT that, once instantiated with any given string, will provide the usual string manipulation functions so I can forget the minutiae about them. The unittest at the bottom has many usage examples.

The whole UDT needed to be templated for the three string types (string, dstring, wstring, and nothing else) and this was what produced this post to begin with. This issue was solved; not the way I would have liked, but solved. The code works except, alas, for something with the grapheme arrays (foreach always misses the last one).

I ended up with the following (as usual advice/suggestions welcomed):

```d
/// testing D on 2021-06~07

import std.algorithm : map, joiner;
import std.array : array;
import std.conv : to;
import std.range : walkLength, take, tail, drop, dropBack;
import std.stdio;
import std.uni : Grapheme, byGrapheme;

alias stringUGC = Grapheme;
alias stringUGC08 = gudtUGC!(stringUTF08);
alias stringUGC16 = gudtUGC!(stringUTF16);
alias stringUGC32 = gudtUGC!(stringUTF32);
alias stringUTF08 = string;  /// same as immutable(char )[];
alias stringUTF16 = dstring; /// same as immutable(dchar)[];
alias stringUTF32 = wstring; /// same as immutable(wchar)[];

void main() {}

//mixin templateUGC!(stringUTF08, r"gudtUGC08"w); /// if these main()
//mixin templateUGC!(stringUTF16, r"gudtUGC16"w);
//mixin templateUGC!(stringUTF32, r"gudtUGC32"w);

//template templateUGC (
//   typeStringUTF,
//   alias lstrStructureID
//   ) {

public struct gudtUGC(typeStringUTF) { /// UniCode grapheme cluster‐aware string manipulation

   void popFront() { ++pintSequenceCurrent; }
   bool empty() { return pintSequenceCurrent == pintSequenceCount; }
   typeStringUTF front() { return toUTFtake(pintSequenceCurrent); }

   private stringUGC[] pugcSequence;
   private size_t pintSequenceCount = cast(size_t) 0;
   private size_t pintSequenceCurrent = cast(size_t) 0;

   @property public size_t count() { return pintSequenceCount; }

   this(scope const typeStringUTF lstrSequence) { decode(lstrSequence); }

   @safe public size_t decode(
      scope const typeStringUTF lstrSequence
      ) {

      scope size_t lintSequenceCount = cast(size_t) 0;

      if (lstrSequence is null) {

         pugcSequence = null;
         pintSequenceCount = cast(size_t) 0;
         pintSequenceCurrent = cast(size_t) 0;

      } else {

         pugcSequence = lstrSequence.byGrapheme.array;
         pintSequenceCount = pugcSequence.walkLength;
         pintSequenceCurrent = cast(size_t) 1;
         lintSequenceCount = pintSequenceCount;

      }

      return lintSequenceCount;

   }

   @safe public typeStringUTF encode() { /// UniCode grapheme cluster to UniCode UTF‐encoded string

      scope typeStringUTF lstrSequence = null;

      if (pintSequenceCount >= cast(size_t) 1) {

         lstrSequence = pugcSequence
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            ;

      }

      return lstrSequence;

   }

   @safe public typeStringUTF toUTFtake( /// UniCode grapheme cluster to UniCode UTF‐encoded string
      scope const size_t lintStart,
      scope const size_t lintCount = cast(size_t) 1
      ) {

      scope typeStringUTF lstrSequence = null;

      if (lintStart <= lintStart + lintCount) {

         scope size_t lintRange1 = lintStart - cast(size_t) 1;
         scope size_t lintRange2 = lintRange1 + lintCount;

         if (lintRange1 >= cast(size_t) 0 && lintRange2 <= pintSequenceCount) {

            lstrSequence = pugcSequence[lintRange1..lintRange2]
               .map!((ref g) => g[])
               .joiner
               .to!(typeStringUTF)
               ;

         }

      }

      return lstrSequence;

   }

   @safe public typeStringUTF toUTFtakeL( /// UniCode grapheme cluster to UniCode UTF‐encoded string
      scope const size_t lintCount
      ) {

      scope typeStringUTF lstrSequence = null;

      if (lintCount <= pintSequenceCount) {

         lstrSequence = pugcSequence
            .take(lintCount)
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            ;

      }

      return lstrSequence;

   }

   @safe public typeStringUTF toUTFtakeR( /// UniCode grapheme cluster to UniCode UTF‐encoded string
      scope const size_t lintCount
      ) {

      scope typeStringUTF lstrSequence = null;

      if (lintCount <= pintSequenceCount) {

         lstrSequence = pugcSequence
            .tail(lintCount)
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            ;

      }

      return lstrSequence;

   }

   @safe public typeStringUTF toUTFchopL( /// UniCode grapheme cluster to UniCode UTF‐encoded string
      scope const size_t lintCount
      ) {

      scope typeStringUTF lstrSequence = null;

      if (lintCount <= pintSequenceCount) {

         lstrSequence = pugcSequence
            .drop(lintCount)
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            ;

      }

      return lstrSequence;

   }

   @safe public typeStringUTF toUTFchopR( /// UniCode grapheme cluster to UniCode UTF‐encoded string
      scope const size_t lintCount
      ) {

      scope typeStringUTF lstrSequence = null;

      if (lintCount <= pintSequenceCount) {

         lstrSequence = pugcSequence
            .dropBack(lintCount)
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            ;

      }

      return lstrSequence;

   }

   @safe public typeStringUTF toUTFpadL( /// UniCode grapheme cluster to UniCode UTF‐encoded string
      scope const size_t lintCount,
      scope const typeStringUTF lstrPadding = cast(typeStringUTF) r" "
      ) {

      scope typeStringUTF lstrSequence = null;

      if (lintCount > pintSequenceCount) {

         lstrSequence = null; /// pending

      }

      return lstrSequence;

   }

   @safe public typeStringUTF toUTFpadR( /// UniCode grapheme cluster to UniCode UTF‐encoded string
      scope const size_t lintCount,
      scope const typeStringUTF lstrPadding = cast(typeStringUTF) r" "
      ) {

      scope typeStringUTF lstrSequence = null;

      if (lintCount > pintSequenceCount) {

         lstrSequence = null; /// pending

      }

      return lstrSequence;

   }

   /* @safe public gudtUGC(typeStringUTF) take(
      scope const size_t lintStart,
      scope const size_t lintCount = cast(size_t) 1
      ) {

      /// the idea behind this new set of functions (returning a new object) is to enable the following one‐liner constructions:

      /// assert(lugcSequence3.take(35, 3).take(1,2).take(1,1).encode() == cast(stringUTF) r"日");

      /// ooops … error: function declaration without return type. (Note that constructors are always named `this`)
      /// ooops … error: no identifier for declarator `@safe gudtUGC(typeStringUTF)`

      scope gudtUGC(typeStringUTF) lugcSequence;

      if (lintStart <= lintStart + lintCount) {

         scope size_t lintRange1 = lintStart - cast(size_t) 1;
         scope size_t lintRange2 = lintRange1 + lintCount;

         if (lintRange1 >= cast(size_t) 0 && lintRange2 <= pintSequenceCount) {

            lugcSequence = gudtUGC(typeStringUTF)(pugcSequence[lintRange1..lintRange2]
               .map!((ref g) => g[])
               .joiner
               .to!(typeStringUTF)
               );

         }

      }

      return lugcSequence;

   }*/

}

//}

unittest {

   version (useUTF08) {
      scope stringUTF08 lstrSequence1 = r"12345678901234567890123456789012345678901234567890"c;
      scope stringUTF08 lstrSequence2 = r"1234567890АВГДЕЗИЙКЛABCDEFGHIJabcdefghijQRSTUVWXYZ"c;
      scope stringUTF08 lstrSequence3 = "äëåčñœß … russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese 😎"c;
   }

   version (useUTF16) {
      scope stringUTF16 lstrSequence1 = r"12345678901234567890123456789012345678901234567890"d;
      scope stringUTF16 lstrSequence2 = r"1234567890АВГДЕЗИЙКЛABCDEFGHIJabcdefghijQRSTUVWXYZ"d;
      scope stringUTF16 lstrSequence3 = "äëåčñœß … russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese 😎"d;
   }

   version (useUTF32) {
      scope stringUTF32 lstrSequence1 = r"12345678901234567890123456789012345678901234567890"w;
      scope stringUTF32 lstrSequence2 = r"1234567890АВГДЕЗИЙКЛABCDEFGHIJabcdefghijQRSTUVWXYZ"w;
      scope stringUTF32 lstrSequence3 = "äëåčñœß … russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese 😎"w;
   }

   scope size_t lintSequence1sizeUTF = lstrSequence1.length;
   scope size_t lintSequence2sizeUTF = lstrSequence2.length;
   scope size_t lintSequence3sizeUTF = lstrSequence3.length;

   scope size_t lintSequence1sizeUGA = lstrSequence1.walkLength;
   scope size_t lintSequence2sizeUGA = lstrSequence2.walkLength;
   scope size_t lintSequence3sizeUGA = lstrSequence3.walkLength;

   scope size_t lintSequence1sizeUGC = lstrSequence1.byGrapheme.walkLength;
   scope size_t lintSequence2sizeUGC = lstrSequence2.byGrapheme.walkLength;
   scope size_t lintSequence3sizeUGC = lstrSequence3.byGrapheme.walkLength;

   assert(lintSequence1sizeUGC == cast(size_t) 50);
   assert(lintSequence2sizeUGC == cast(size_t) 50);
   assert(lintSequence3sizeUGC == cast(size_t) 50);

   assert(lintSequence1sizeUGA == cast(size_t) 50);
   assert(lintSequence2sizeUGA == cast(size_t) 50);
   assert(lintSequence3sizeUGA == cast(size_t) 52);

   version (useUTF08) {
      assert(lintSequence1sizeUTF == cast(size_t) 50);
      assert(lintSequence2sizeUTF == cast(size_t) 60);
      assert(lintSequence3sizeUTF == cast(size_t) 91);
   }

   version (useUTF16) {
      assert(lintSequence1sizeUTF == cast(size_t) 50);
      assert(lintSequence2sizeUTF == cast(size_t) 50);
      assert(lintSequence3sizeUTF == cast(size_t) 52);
   }

   version (useUTF32) {
      assert(lintSequence1sizeUTF == cast(size_t) 50);
      assert(lintSequence2sizeUTF == cast(size_t) 50);
      assert(lintSequence3sizeUTF == cast(size_t) 57);
   }

   /// the following should be the same regardless of the encoding being used and is the whole point of this UDT being made:

   version (useUTF08) { alias stringUTF = stringUTF08; scope stringUGC08 lugcSequence3 = stringUGC08(lstrSequence3); }
   version (useUTF16) { alias stringUTF = stringUTF16; scope stringUGC16 lugcSequence3 = stringUGC16(lstrSequence3); }
   version (useUTF32) { alias stringUTF = stringUTF32; scope stringUGC32 lugcSequence3 = stringUGC32(lstrSequence3); }

   assert(lugcSequence3.encode() == lstrSequence3);

   assert(lugcSequence3.toUTFtake(21) == cast(stringUTF) r"р");
   assert(lugcSequence3.toUTFtake(27) == cast(stringUTF) r"й");
   assert(lugcSequence3.toUTFtake(35) == cast(stringUTF) r"日");
   assert(lugcSequence3.toUTFtake(37) == cast(stringUTF) r"語");

   assert(lugcSequence3.toUTFtake(21, 7) == cast(stringUTF) r"русский");
   assert(lugcSequence3.toUTFtake(35, 3) == cast(stringUTF) r"日本語");

   assert(lugcSequence3.toUTFtakeL(1) == cast(stringUTF) r"ä");
   assert(lugcSequence3.toUTFtakeR(1) == cast(stringUTF) r"😎");
   assert(lugcSequence3.toUTFtakeL(7) == cast(stringUTF) r"äëåčñœß");
   assert(lugcSequence3.toUTFtakeR(16) == cast(stringUTF) r"日本語 = japanese 😎");

   assert(lugcSequence3.toUTFchopL(10) == cast(stringUTF) r"russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese 😎");
   assert(lugcSequence3.toUTFchopR(21) == cast(stringUTF) r"äëåčñœß … russian = русский 🇷🇺");

   version (useUTF08) { scope stringUTF08 lstrSequence3reencoded; }
   version (useUTF16) { scope stringUTF16 lstrSequence3reencoded; }
   version (useUTF32) { scope stringUTF32 lstrSequence3reencoded; }

   for (
      size_t lintSequenceUGC = cast(size_t) 1;
      lintSequenceUGC <= lintSequence3sizeUGC;
      ++lintSequenceUGC
      ) {

      lstrSequence3reencoded ~= lugcSequence3.toUTFtake(lintSequenceUGC);

   }

   assert(lstrSequence3reencoded == lstrSequence3);

   lstrSequence3reencoded = null;

   version (useUTF08) { foreach (stringUTF08 lstrSequence3UGC; lugcSequence3) { lstrSequence3reencoded ~= lstrSequence3UGC; } }
   version (useUTF16) { foreach (stringUTF16 lstrSequence3UGC; lugcSequence3) { lstrSequence3reencoded ~= lstrSequence3UGC; } }
   version (useUTF32) { foreach (stringUTF32 lstrSequence3UGC; lugcSequence3) { lstrSequence3reencoded ~= lstrSequence3UGC; } }

   assert(lstrSequence3reencoded == lstrSequence3); /// ooops …

}
```
Jul 11
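A likely cause of the "foreach always misses the last one" symptom is the `empty` test: with a 1-based cursor that starts at 1, `current == count` already reports empty while the cursor still points at the last element. The sketch below uses a hypothetical stand-in type (`OneBased`, over an `int[]` rather than graphemes) to show an `empty` that only becomes true once the cursor has moved past the end. It assumes the cursor convention of the posted struct and is not a drop-in patch for it.

```d
/// Hypothetical minimal input range with a 1-based cursor.
struct OneBased {
    int[] items;
    size_t current = 1;                       // 1-based position of front

    // Empty only when the cursor has moved PAST the last element.
    bool empty() const { return current > items.length; }
    int front() const { return items[current - 1]; }
    void popFront() { ++current; }
}

/// Collects everything foreach yields from the range.
int[] collect(OneBased r) {
    int[] seen;
    foreach (x; r) { seen ~= x; }
    return seen;
}

void main() {
    assert(collect(OneBased([10, 20, 30])) == [10, 20, 30]);
    assert(collect(OneBased([])) == []);
}
```

With `current == items.length` as the test instead, the same loop would stop after yielding 10 and 20.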
parent reply ag0aep6g <anonymous example.com> writes:
On 12.07.21 03:37, someone wrote:
 I ended up with the following (as usual advice/suggestions welcomed): 
[...]
 alias stringUTF16 = dstring; /// same as immutable(dchar)[];
 alias stringUTF32 = wstring; /// same as immutable(wchar)[];
Bug: You mixed up `wstring` and `dstring`. `wstring` is UTF-16. `dstring` is UTF-32.
[...]
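To make the encoding widths concrete, here is a small sketch (the sample text is chosen for illustration, it is not from the thread) that stores the same two grapheme clusters in all three string types. `.length` counts code units, so it differs per encoding, while `byGrapheme.walkLength` gives the same answer everywhere.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

void main() {
    // "e" followed by U+0301 (combining acute accent), then U+1F60E (an emoji):
    // three code points, two grapheme clusters.
    string  s8  = "e\u0301\U0001F60E";   // UTF-8,  char code units
    wstring s16 = "e\u0301\U0001F60E"w;  // UTF-16, wchar code units
    dstring s32 = "e\u0301\U0001F60E"d;  // UTF-32, dchar code units

    assert(s8.length  == 7);  // 1 + 2 + 4 bytes
    assert(s16.length == 4);  // 1 + 1 + a surrogate pair
    assert(s32.length == 3);  // one unit per code point

    assert(s8.byGrapheme.walkLength  == 2);
    assert(s16.byGrapheme.walkLength == 2);
    assert(s32.byGrapheme.walkLength == 2);
}
```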
 public struct gudtUGC(typeStringUTF) { /// UniCode grapheme 
 cluster‐aware string manipulation
Style: `typeStringUTF` is a type, so it should start with a capital letter (`TypeStringUTF`). [...]
     private size_t pintSequenceCount = cast(size_t) 0;
     private size_t pintSequenceCurrent = cast(size_t) 0;
Style: There's no need for the casts (throughout). [...]
      @safe public typeStringUTF encode() { /// UniCode grapheme cluster 
 to UniCode UTF‐encoded string
 
        scope typeStringUTF lstrSequence = null;
[...]
        return lstrSequence;
 
     }
Bug: `scope` makes no sense if you want to return `lstrSequence` (throughout).
      @safe public typeStringUTF toUTFtake( /// UniCode grapheme cluster 
 to UniCode UTF‐encoded string
        scope const size_t lintStart,
        scope const size_t lintCount = cast(size_t) 1
        ) {
Style: `scope` does nothing on `size_t` parameters (throughout). [...]
        if (lintStart <= lintStart + lintCount) {
[...]
           scope size_t lintRange1 = lintStart - cast(size_t) 1;
Possible bug: Why subtract 1?
           scope size_t lintRange2 = lintRange1 + lintCount;
 
           if (lintRange1 >= cast(size_t) 0 && lintRange2 <= 
 pintSequenceCount) {
Style: The first half of that condition is pointless. `lintRange1` is unsigned, so it will always be greater than or equal to 0. If you want to defend against overflow, you have to do it before subtracting. [...]
           }
 
        }
[...]
     }
[...]
      @safe public typeStringUTF toUTFpadL( /// UniCode grapheme cluster 
 to UniCode UTF‐encoded string
        scope const size_t lintCount,
        scope const typeStringUTF lstrPadding = cast(typeStringUTF) r" "
Style: Cast is not needed (throughout).
        ) {
[...]
     }
[...]
 }
[...]
Jul 11
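The point about the unsigned comparison can be checked directly. This is a small illustrative sketch (the variable names are made up): subtracting 1 from a `size_t` zero wraps around to `size_t.max`, so a `>= 0` test after the subtraction can never fail, and the guard has to come before it.

```d
void main() {
    size_t start = 0;

    // Unsigned arithmetic wraps instead of going negative.
    size_t index = start - 1;
    assert(index == size_t.max);

    // Always true for an unsigned value, so it protects nothing.
    assert(index >= 0);

    // The useful check happens BEFORE subtracting.
    bool startIsValid = start >= 1;
    assert(!startIsValid);
}
```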
parent reply someone <someone somewhere.com> writes:
On Monday, 12 July 2021 at 05:33:22 UTC, ag0aep6g wrote:

 Bug: You mixed up `wstring` and `dstring`. `wstring` is UTF-16. 
 `dstring` is UTF-32.
I can't believe this one ... these lines were introduced almost a week ago LoL !
 Style: `typeStringUTF` is a type, so it should start with a 
 capital letter (`TypeStringUTF`).
Style is a personal preference; I am not following D style conventions (if any) nor do I follow any other language style conventions; I have my personal style and I apply it everywhere, I think it is not important which style you use, what is important in the end is that you adhere to your chosen style all the time -unless, of course, you are contributing to x project which states its own style and then there's no choice but to follow it.
 private size_t pintSequenceCount = cast(size_t) 0;
 private size_t pintSequenceCurrent = cast(size_t) 0;
 Style: There's no need for the casts (throughout).
I know. I do these primarily because of muscle memory and secondly because I try to write code thinking someone not knowing the language details may be porting it later so I tend to state the obvious; besides, it won't hurt, and it helps me in many ways.
  @safe public typeStringUTF encode() {
 
        scope typeStringUTF lstrSequence = null;
[...]
        return lstrSequence;
 
     }
Bug: `scope` makes no sense if you want to return `lstrSequence` (throughout).
Teach me please: if I declare a variable right after the function declaration like this one ... ain't scope its default visibility ? I understand (not quite sure whether correct or not right now) that everything you declare without explicitly stating its visibility (public/private/whatever) becomes scope ie: what in many languages are called a local variable. What actually is the visibility of lstrSequence without my scope declaration ?
  @safe public typeStringUTF toUTFtake(
    scope const size_t lintStart,
    scope const size_t lintCount = cast(size_t) 1
    ) {
 Style: `scope` does nothing on `size_t` parameters (throughout).
A week ago I was using [in] almost everywhere for parameters, ain't [in] an alias for [scope const] ? Did I get it wrong ? I'm not talking style here, I'm talking unexpected (to me) functionality.
 scope size_t lintRange1 = lintStart - cast(size_t) 1;
 scope size_t lintRange2 = lintRange1 + lintCount;
 Possible bug: Why subtract 1?
Because ranges are zero-based for their first argument and one-based for their second; ie: something[n..m] where m should always be one beyond the last one we want.
 if (lintRange1 >= cast(size_t) 0 && lintRange2 <= 
 pintSequenceCount) {
 Style: The first half of that condition is pointless. 
 `lintRange1` is unsigned, so it will always be greater than or 
 equal to 0. If you want to defend against overflow, you have to 
 do it before subtracting.
Indeed. Refactored the code (previously were int parameters) and got stuck in the wrong place ! All in all, thank you very much for your detailed reply, this kind of stuff is what helps me most understanding the language nuances :)
Jul 12
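The mapping from a 1-based (start, count) pair to a D slice can be sketched as follows. D slices are half-open: `a[lo .. hi]` includes index `lo` and excludes index `hi`. The helper name `takeOneBased` is hypothetical, and returning `null` for out-of-range requests is an assumption that mirrors the posted code; every check is done before any subtraction, so nothing can wrap around.

```d
/// Hypothetical helper: 1-based start plus count mapped to a half-open slice.
T[] takeOneBased(T)(T[] items, size_t start, size_t count) {
    if (start < 1 || start > items.length) { return null; }  // guard before subtracting
    immutable lo = start - 1;                                  // zero-based first index
    if (count > items.length - lo) { return null; }          // would run past the end
    return items[lo .. lo + count];                            // lo included, lo + count excluded
}

void main() {
    auto a = [10, 20, 30, 40];
    assert(takeOneBased(a, 2, 2) == [20, 30]);
    assert(takeOneBased(a, 4, 1) == [40]);
    assert(takeOneBased(a, 0, 1) is null);   // start 0 is rejected, no wraparound
    assert(takeOneBased(a, 4, 2) is null);   // runs past the end
}
```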
next sibling parent reply jfondren <julian.fondren gmail.com> writes:
On Monday, 12 July 2021 at 22:35:27 UTC, someone wrote:
 Bug: `scope` makes no sense if you want to return 
 `lstrSequence` (throughout).
 Teach me please: if I declare a variable right after the 
 function declaration like this one ... ain't scope its default 
 visibility ? I understand (not quite sure whether correct or 
 not right now) that everything you declare without explicitly 
 stating its visibility (public/private/whatever) becomes scope 
 ie: what in many languages are called a local variable. What 
 actually is the visibility of lstrSequence without my scope 
 declaration ?
Local variables don't have a visibility in the sense of public or private. They do have a 'scope' in the general computer science sense, and a variable can be said to be in or out of scope at different points in a program, but this is the case without regard for whether the variable is declared with D's `scope`. What `scope` says is https://dlang.org/spec/attribute.html#scope
For local declarations, scope ... means that the destructor for 
an object is automatically called when the reference to it goes 
out of scope.
The value of a normal, non-scope local variable has a somewhat indefinite lifetime: you have to examine the program and think about operations on the variable to be sure about that lifetime. Does it survive the function? Might it die even before the function completes? Does it live until the next GC collection or until the program ends? These are questions you can ask.

For a `scope` variable, the lifetime of its value ends with the scope of the variable.

Consider:

```d
import std.stdio : writeln, writefln;
import std.conv : to;
import core.memory : pureMalloc, pureFree;

class Noisy {
    static int ids;
    int* id;
    this() {
        id = cast(int*) pureMalloc(int.sizeof);
        *id = ids++;
    }

    ~this() {
        writefln!"[%d] I perish."(*id);
        pureFree(id);
    }
}

Noisy f() {
    scope n = new Noisy;
    return n;
}

void main() {
    scope a = f();
    writeln("Checking a.n...");
    writefln!"a.n = %d"(*a.id);
}
```

Which has this output on my system:

```
[0] I perish.
Checking a.n...
Error: program killed by signal 11
```

Or with -preview=dip1000, this dmd output:

```
Error: scope variable `n` may not be returned
```

The lifetime of the Noisy object bound by `scope n` is the same as the scope of the variable, and the variable goes out of scope when the function returns, so the Noisy object is destructed at that point.
Jul 12
parent reply someone <someone somewhere.com> writes:
On Monday, 12 July 2021 at 23:18:57 UTC, jfondren wrote:
 [...]
Some days ago I assumed scope was, as I previously stated, the local default scope, and explicitly added scope to all my *local* variables. Soon afterward I encountered a situation which gave me the "program killed by signal 11", and I did not fully understand why it was happening at all, because it never occurred to me it was connected to my previous scope refactor. Now I understand.

Regarding -preview=dip1000 (and the explicit error description that could have helped me a lot back then) : DMD man page says the preview switch lists upcoming language features, so DIP1000 is something like a D proposal as I glanced somewhere sometime ago ... where do DIPs get listed (docs I mean) ?

So, every *local* variable within a chunk of code, say, a function, should be declared without anything else to avoid this type of behavior ? I mean, anything in code that is not private/public/etc.

Or, as I presume, every *local* meaning *aux* variable that won't need to survive the function should be declared scope but *not* the one we are returning ... lstrSequence in my specific case ?

Can I declare everything *scope* within and on the last line use lstrSequence.dup instead ? dup/idup duplicates the variable (the first allowing mutability while the second not) right ?

Which one of these approaches do you consider best practice if you were directed to explicitly state as much behavior as possible ?

Your reply with this example included was very illustrative to me, right to the point. Thanks a lot for your time :) !
Jul 12
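On the `dup`/`idup` question, a short sketch may help (the function and variable names are invented for illustration). For arrays, `.dup` yields a new mutable copy and `.idup` a new immutable one. A plain, non-`scope` local that is returned from a function needs neither.

```d
/// Hypothetical builder: the local is NOT scope, so returning it is fine.
string build() {
    string result;
    result ~= "ab";
    result ~= "c";
    return result;
}

void main() {
    string original = build();
    assert(original == "abc");

    char[] mutableCopy = original.dup;   // independent, mutable copy
    mutableCopy[0] = 'x';
    assert(original == "abc");           // the source is untouched
    assert(mutableCopy == "xbc");

    string frozen = mutableCopy.idup;    // independent, immutable copy
    mutableCopy[1] = 'y';
    assert(frozen == "xbc");             // later edits do not leak into it
}
```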
parent Mike Parker <aldacron gmail.com> writes:
On Monday, 12 July 2021 at 23:45:57 UTC, someone wrote:
 Regarding -preview=dip1000 (and the explicit error description 
 that could have helped me a lot back then) : DMD man page says 
 the preview switch lists upcoming language features, so DIP1000 
 is something like a D proposal as I glanced somewhere sometime 
 ago ... where do DIPs get listed (docs I mean) ?
DIPs are handled in this repository:

https://github.com/dlang/DIPs

This is a list of every DIP that is going through or has gone through the review process:

https://github.com/dlang/DIPs/blob/master/DIPs/README.md

DIP1000 is here:

https://github.com/dlang/DIPs/blob/master/DIPs/other/DIP1000.md

But it doesn't describe the actual implementation, as described here:

https://github.com/dlang/DIPs/blob/master/DIPs/other/DIP1000.md#addendum

I don't know what all the differences are, as I haven't followed it.
 So, every *local* variable within a chunk of code, say, a 
 function, should be declared without anything else to avoid 
 this type of behavior ? I mean, anything in code that it is not 
 private/public/etc.
Not "without anything", but without scope---unless you're using -preview=dip1000, or unless you're applying it to class references (see below).
 Or, as I presume, every *local* meaning *aux* variable that 
 won't need to survive the function should be declared scope but 
 *not* the one we are returning ... lstrSequence in my specific 
 case ?

 Can I declare everything *scope* within and on the last line 
 using lstrSequence.dup instead ? dup/idup duplicates the 
 variable (the first allowing mutability while the second not) 
 right ?

 Which one of the following approaches do you consider best 
 practice if you were directed to explicitly state as much 
 behavior as possible ?
Consider this example, which demonstrates the original purpose of scope prior to DIP 1000:

```d
import std.stdio;

class C {
    int id;
    this(int id) { this.id = id; }
}

struct S {
    int id;
    this(int id) { this.id = id; }
}

void main() {
    {
        C c1 = new C(1);
        scope c2 = new C(2);

        S s1 = S(1);
        S* s2 = new S(2);
        scope s3 = new S(3);

        writeln("The inner scope is exiting now.");
    }
    writeln("Main is exiting now.");
}

static ~this() { writeln("The GC will cleanup after this point."); }
```

Classes are reference types and must be allocated. c1 is allocated on the GC heap and lives beyond its scope. By applying the scope attribute to c2, its destructor is forced to execute when its scope exits. It is not allocated on the GC heap, but on the stack.

Structs are value types, so s1 is automatically allocated on the stack. Its destructor will always be called when the scope exits. s2 is a pointer allocated on the GC heap, so its lifetime is managed by the GC and it exists beyond its scope. s3 is also of type S*. The scope attribute has no effect on it, and it is still managed by the GC. If you want stack allocation and RAII destructors for structs, you just use the default behavior like s1.

You can run it here: https://run.dlang.io/is/iu7QiO

Someone else will have to explain what DIP 1000 actually does right now (if anyone really knows). What I'm certain about is that it prevents things like this:

```d
int* func() {
    int i = 10;
    int* pi = &i;
    return pi;
}
```

The compiler has always raised an error when it encountered something like `return &i`, but the above would slip by. With -preview=dip1000, that is also an error. But scope isn't needed on either variable for it to do so.

Beyond that, my knowledge of DIP 1000's implementation is limited. But I do know that scope has no effect on variables with no indirections. It's all about indirections (pointers & references).

At any rate, DIP 1000 is not yet ready for prime time. Getting it to that state is a current priority of the language maintainers. So for now, you probably just shouldn't worry about scope at all.
Jul 12
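The deterministic cases described above can be observed with a counter. This is a minimal sketch with made-up types (`StackThing`, `HeapThing`) and a hypothetical module-level counter. It covers only the two cases whose destruction point is fixed: a struct value on the stack, and a class instance declared with `scope`. GC-managed objects are left out because their destruction time is not deterministic.

```d
int destroyed;   // hypothetical counter, bumped by every destructor call

struct StackThing {
    ~this() { ++destroyed; }
}

class HeapThing {
    ~this() { ++destroyed; }
}

void main() {
    {
        StackThing s;               // value type: lives on the stack
        assert(destroyed == 0);
    }                               // destructor runs here
    assert(destroyed == 1);

    {
        scope h = new HeapThing;    // scope class: destroyed at scope exit
        assert(destroyed == 1);
    }
    assert(destroyed == 2);
}
```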
prev sibling next sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 7/12/21 3:35 PM, someone wrote:

 private size_t pintSequenceCurrent = cast(size_t) 0;
 Style: There's no need for the casts (throughout).
[...] besides, it won't hurt, and it helps me in many ways.
I think you are doing it only for literal values but in general, casts can be very cumbersome and harmful. For example, if we change the parameter from 'int' to 'long', the cast in the function body is a bug to be chased and fixed:

    // Used to be 'int arg'
    void foo(long arg) {
      // ...
      auto a = cast(int)arg;  // BUG?
      // ...
    }

    void main() {
      foo(long.max);
    }

Ali
Jul 12
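What the cast hides can be shown in a few lines. In this illustrative sketch (variable names are made up), the narrowing cast compiles and silently keeps only the low 32 bits, whereas the same assignment without a cast is rejected at compile time, which is the type checking the cast bypasses.

```d
void main() {
    long big = long.max;

    // The cast compiles and silently truncates to the low 32 bits.
    int truncated = cast(int) big;
    assert(truncated == -1);

    // Without the cast, the compiler refuses the narrowing assignment...
    static assert(!__traits(compiles, { long l = 1; int i = l; }));

    // ...and with the cast it accepts it without complaint.
    static assert( __traits(compiles, { long l = 1; int i = cast(int) l; }));
}
```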
parent reply someone <someone somewhere.com> writes:
On Monday, 12 July 2021 at 23:25:13 UTC, Ali Çehreli wrote:
 On 7/12/21 3:35 PM, someone wrote:

 private size_t pintSequenceCurrent = cast(size_t) 0;
 Style: There's no need for the casts (throughout).
[...] besides, it won't hurt, and it helps me in many ways.
I think you are doing it only for literal values but in general, casts can be very cumbersome and harmful.
Cumbersome and harmful ... could you explain ?
 For example, if we change the parameter from 'int' to 'long', 
 the cast in the function body is a bug to be chased and fixed:

 // Used to be 'int arg'
 void foo(long arg) {
   // ...
   auto a = cast(int)arg;  // BUG?
   // ...
 }
nope, I'll never do such a downcast UNLESS I previously tested with if () {} for proper int range; I use cast a lot, but this is mainly because I am used to strongly-typed languages etc etc, for example if for whatever reason I have to:

    ushort a = 250;
    ubyte b = cast(ubyte) a;

I'll do:

    ushort a = 250;
    ubyte b = cast(ubyte) 0; /// redundant of course; but we don't have nulls in D for ints so this is muscle-memory
    if (a <= 255) { /// or ubyte.max instead of 255 (I think it is possible)
        b = cast(ubyte) a;
    }
 void main() {
   foo(long.max);
 }

 Ali
Jul 12
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 7/12/21 5:42 PM, someone wrote:

 On Monday, 12 July 2021 at 23:25:13 UTC, Ali Çehreli wrote:
 On 7/12/21 3:35 PM, someone wrote:

 private size_t pintSequenceCurrent = cast(size_t) 0;
 Style: There's no need for the casts (throughout).
[...] besides, it won't hurt, and it helps me in many ways.
I think you are doing it only for literal values but in general, casts can be very cumbersome and harmful.
Cumbersome and harmful ... could you explain ?
Cumbersome because one has to make sure existing casts are correct after changing a type. Harmful because it bypasses the compiler's type checking.
 For example, if we change the parameter from 'int' to 'long', the cast
 in the function body is a bug to be chased and fixed:

 // Used to be 'int arg'
 void foo(long arg) {
   // ...
   auto a = cast(int)arg;  // BUG?
   // ...
 }
nope, I'll never do such a downcast
The point was, nobody did a downcast in that code. The original parameter was 'int' so cast(int) was "correct" initially. Then somebody changed the parameter to "long" and the cast became potentially harmful.
 UNLESS I previously tested with if
 () {} for proper int range; I use cast a lot, but this is mainly because
 I am used to strongly-typed languages etc etc,
Hm. I am used to strongly-typed languages as well and that's exactly why I *avoid* casts as much as possible. :)
 for example if for
 whatever reason I have to:

 ushort a = 250;
 ubyte b = cast(ubyte) a;

 I'll do:

 ushort a = 250;
 ubyte b = cast(ubyte) 0; /// redundant of course; but we don't have
We have a different way of looking at this. :) My first preference would be:

    ubyte b;

This alternative has less typing than your method and is easier to change the code because 'ubyte' appears only in one place. (DRY principle.)

    auto b = ubyte(0);

Another alternative:

    auto b = ubyte.init;

Ali
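A small sketch to confirm that the three spellings above end up with the same type and value, and that `ubyte.max` can stand in for the literal 255 in a range check. The helpers `allInitialisationsAgree` and `narrowOrZero` are hypothetical names used only here.

```d
// All three declarations produce a ubyte holding 0.
bool allInitialisationsAgree() {
    ubyte a;             // default-initialized to ubyte.init
    auto b = ubyte(0);   // type appears once, value is explicit
    auto c = ubyte.init; // type appears once, value is the type's default
    static assert(is(typeof(b) == ubyte));
    static assert(is(typeof(c) == ubyte));
    return a == 0 && b == 0 && c == 0;
}

// Range check before narrowing, written with ubyte.max instead of 255.
ubyte narrowOrZero(ushort value) {
    ubyte result; // 0 unless the value fits
    if (value <= ubyte.max) {
        result = cast(ubyte) value;
    }
    return result;
}

void main() {
    static assert(ubyte.max == 255);
    assert(allInitialisationsAgree());
    assert(narrowOrZero(250) == 250);
    assert(narrowOrZero(300) == 0);
}
```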
Jul 12
parent someone <someone somewhere.com> writes:
On Tuesday, 13 July 2021 at 05:26:56 UTC, Ali Çehreli wrote:

 Cumbersome because one has to make sure existing casts are 
 correct after changing a type.
ACK.
 Harmful because it bypasses the compiler's type checking.
Hmmm ... I'll be reconsidering my cast usage approach then.
 For example, if we change the parameter from 'int' to
'long', the cast
 in the function body is a bug to be chased and fixed:

 // Used to be 'int arg'
 void foo(long arg) {
   // ...
   auto a = cast(int)arg;  // BUG?
   // ...
 }
nope, I'll never do such a downcast
The point was, nobody did a downcast in that code. The original parameter was 'int' so cast(int) was "correct" initially. Then somebody changed the parameter to "long" and the cast became potentially harmful.
ACK.
 UNLESS I previously tested with if
 () {} for proper int range; I use cast a lot, but this is
mainly because
 I am used to strongly-typed languages etc etc,
Hm. I am used to strongly-typed languages as well and that's exactly why I *avoid* casts as much as possible. :)
 for example if for
 whatever reason I have to:

 ushort a = 250;
 ubyte b = cast(ubyte) a;

 I'll do:

 ushort a = 250;
 ubyte b = cast(ubyte) 0; /// redundant of course; but we don't have
We have a different way of looking at this. :) My first preference would be: ubyte b; This alternative has less typing than your method and is easier to change the code because 'ubyte' appears only in one place. (DRY principle.) auto b = ubyte(0); Another alternative: auto b = ubyte.init;
ACK. I'll be revisiting the whole matter. I just re-read your http://ddili.org/ders/d.en/cast.html chapter. I did not have a clear understanding between the difference of to!(...) and cast() for example; and, re-reading integer promotion and arithmetic conversions refreshed my knowledge at this point.
 Ali
Jul 13
prev sibling parent reply ag0aep6g <anonymous example.com> writes:
On Monday, 12 July 2021 at 22:35:27 UTC, someone wrote:
 On Monday, 12 July 2021 at 05:33:22 UTC, ag0aep6g wrote:
[...]
 Teach me please: if I declare a variable right after the 
 function declaration like this one ... ain't scope its default 
 visibility ? I understand (not quite sure whether correct or 
 not right now) that everything you declare without explicitly 
 stating its visibility (public/private/whatever) becomes scope 
 ie: what in many languages are called a local variable. What 
 actually is the visibility of lstrSequence without my scope 
 declaration ?
`scope` is not a visibility level. `lstrSequence` is local to the function, so visibility (`public`, `private`, ...) doesn't even apply. Most likely, you don't have any use for `scope` at the moment. You're obviously not compiling with `-preview=dip1000`. And neither should you, because the feature is not ready for a general audience yet. [...]
 Style: `scope` does nothing on `size_t` parameters 
 (throughout).
A week ago I was using [in] almost everywhere for parameters, ain't [in] an alias for [scope const] ? Did I get it wrong ? I'm not talking style here, I'm talking unexpected (to me) functionality.
I'm not sure where we stand with `in`, but let's say that it means `scope const`. The `scope` part of `scope const` still does nothing to a `size_t`. These are all the same: `in size_t`, `const size_t`, `scope const size_t`.
 scope size_t lintRange1 = lintStart - cast(size_t) 1;
 scope size_t lintRange2 = lintRange1 + lintCount;
 Possible bug: Why subtract 1?
Because ranges are zero-based for their first argument and one-based for their second; ie: something[n..m] where m should always be one-beyond than the one we want.
That doesn't make sense. A length of zero is perfectly fine. It's just an empty range. You're making `lintStart` one-based for no reason.
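For reference, a minimal sketch of how D's half-open slices behave with a zero-based start and a count. `sliceByCount` is a made-up helper, and the text is ASCII-only on purpose, because slicing a `string` works on UTF-8 code units rather than on characters.

```d
// Zero-based start plus a count maps directly onto a half-open slice.
string sliceByCount(string text, size_t start, size_t count) {
    return text[start .. start + count];
}

void main() {
    string text = "abcdef";
    assert(text[1 .. 3] == "bc");                // end index is exclusive
    assert(text[2 .. 2].length == 0);            // empty slice, perfectly valid
    assert(text[0 .. text.length] == text);      // the whole string
    assert(sliceByCount(text, 1, 2) == "bc");    // no "- 1" needed anywhere
    assert(sliceByCount(text, 3, 0).length == 0);
}
```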
Jul 12
parent reply someone <someone somewhere.com> writes:
On Monday, 12 July 2021 at 23:28:29 UTC, ag0aep6g wrote:

 `scope` is not a visibility level.
Well, that explains why it is not listed among the visibility attributes to begin with -something that at first glance seemed weird to me.
 `lstrSequence` is local to the function, so visibility 
 (`public`, `private`, ...) doesn't even apply.
Being *local* to ... ain't imply visibility too regardless scope not being a visibility attribute ? I mean, scope is restricting the variable to be leaked outside the function/whatever and to me it seems like restricted to be seen from the outside. *Please note* that I am not making an argument against the implementation, I am just trying to understand why it is not being classified as another visibility attribute given that more-or-less has the same concept as a local variable like in other languages.
 Most likely, you don't have any use for `scope` at the moment.
Almost sure if you say so given your vast knowledge of D against my humble first steps LoL.
 You're obviously not compiling with `-preview=dip1000`.
Nope. I didn't know it even existed.
 And neither should you, because the feature is not ready for a 
 general audience yet.
ACK.
 [...]
 Style: `scope` does nothing on `size_t` parameters 
 (throughout).
A week ago I was using [in] almost everywhere for parameters, ain't [in] an alias for [scope const] ? Did I get it wrong ? I'm not talking style here, I'm talking unexpected (to me) functionality.
I'm not sure where we stand with `in`
You mean *we* = D developers ?
 but let's say that it means `scope const`
This I stated because I read it somewhere in the docs, it was not my assumption.
 The `scope` part of `scope const` still does nothing to a 
 `size_t`.
 These are all the same:
 in size_t
 const size_t
 scope const size_t
OK. Specifically to integers nothing then. But, what about strings and whatever else ? I put them more-or-less as a general rule or so was the idea when I replaced the in's in the parameters app-wide.
 scope size_t lintRange1 = lintStart - cast(size_t) 1;
 scope size_t lintRange2 = lintRange1 + lintCount;
 Possible bug: Why subtract 1?
Because ranges are zero-based for their first argument and one-based for their second; ie: something[n..m] where m should always be one-beyond than the one we want.
That doesn't make sense. A length of zero is perfectly fine. It's just an empty range. You're making `lintStart` one-based for no reason.
For a UDT like mine I think it has a lot of sense because when I think of a string and I want to chop/count/whatever on it my mind works one-based not zero-based. Say "abc" needs b my mind works a lot easier mid("abc", 2, 1) than mid("abc", 1, 1) and besides I am *not* returning a range or a reference slice to a range or whatever I am returning a whole new string construction. If I would be returning a range I will follow common sense since I don't know what will be done thereafter of course.
Jul 12
next sibling parent reply Mike Parker <aldacron gmail.com> writes:
On Tuesday, 13 July 2021 at 01:03:11 UTC, someone wrote:
 Being *local* to ... ain't imply visibility too regardless 
 scope not being a visibility attribute ? I mean, scope is 
 restricting the variable to be leaked outside the 
 function/whatever and to me it seems like restricted to be seen 
 from the outside. *Please note* that I am not making an 
 argument against the implementation, I am just trying to 
 understand why it is not being classified as another visibility 
 attribute given that more-or-less has the same concept as a 
 local variable like in other languages.
 OK. Specifically to integers nothing then. But, what about 
 strings and whatever else ? I put them more-or-less as a 
 general rule or so was the idea when I replaced the in's in the 
 parameters app-wide.
Hopefully, my post above will shed some light on this.
Jul 12
next sibling parent reply Mike Parker <aldacron gmail.com> writes:
On Tuesday, 13 July 2021 at 02:22:46 UTC, Mike Parker wrote:
 On Tuesday, 13 July 2021 at 01:03:11 UTC, someone wrote:
 Being *local* to ... ain't imply visibility too regardless 
 scope not being a visibility attribute ? I mean, scope is 
 restricting the variable to be leaked outside the 
 function/whatever and to me it seems like restricted to be 
 seen from the outside.
And I meant to add... local variables are by default visible only inside the scope in which they are declared and, by extension, any inner scopes within that scope, and can never be visible outside.

```d
{
    // Scope A
    // x can never be visible here
    {
        // Scope B
        int x;
        {
            // Scope C
            // x is visible here
        }
    }
}
```

The only possible use for your concept of scope applying to visibility would be to prevent x from being visible in Scope C. But since we already have the private attribute, it would make more sense to use that instead, e.g., `private int x` would not be visible in scope C. I don't know of any language that has that kind of feature, or if it would even be useful. But at any rate, there's no need for a visibility attribute to prevent outer scopes from seeing a local variable, as that's already impossible.
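The same nesting can be checked by the compiler itself. This is only a sketch: it uses `__traits(compiles, ...)` to ask whether the name `x` resolves at a given point, and `readFromInnerScope` is a name invented for the illustration.

```d
int readFromInnerScope() {
    // Scope A: x does not exist here.
    static assert(!__traits(compiles, x));
    int seen;
    {
        // Scope B: x is declared here.
        int x = 42;
        {
            // Scope C: x is visible in any nested scope.
            static assert(__traits(compiles, x));
            seen = x;
        }
    }
    // Back in Scope A: x is gone again, no attribute required.
    static assert(!__traits(compiles, x));
    return seen;
}

void main() {
    assert(readFromInnerScope() == 42);
}
```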
Jul 12
parent someone <someone somewhere.com> writes:
On Tuesday, 13 July 2021 at 02:34:07 UTC, Mike Parker wrote:
 On Tuesday, 13 July 2021 at 02:22:46 UTC, Mike Parker wrote:
 On Tuesday, 13 July 2021 at 01:03:11 UTC, someone wrote:
 Being *local* to ... ain't imply visibility too regardless 
 scope not being a visibility attribute ? I mean, scope is 
 restricting the variable to be leaked outside the 
 function/whatever and to me it seems like restricted to be 
 seen from the outside.
And I meant to add... local variables are by default visible only inside the scope in which they are declared and, by extension, any inner scopes within that scope, and can never be visible outside.

```d
{
    // Scope A
    // x can never be visible here
    {
        // Scope B
        int x;
        {
            // Scope C
            // x is visible here
        }
    }
}
```
Yes. This one I understood from the beginning -it was on Ali's book and previously I remember seeing it in Andrei's one too IIRC. http://ddili.org/ders/d.en/name_space.html The thing that I supposed started my confusion was the lack of a statement for it, nothing more; something like: whatever int x; ... it was more of form than concept.
 The only possible use for your concept of scope applying to 
 visibility would be to prevent x from being visible in Scope 
 C. But since we already have the private attribute, it would 
 make more sense to use that instead, e.g., `private int x` 
 would not be visible in scope C.
No. My concept is/was the same that the one above. It was form not function.
 I don't know of any language that has that kind of feature, or 
 if it would even be useful. But at any rate, there's no need 
 for a visibility attribute to prevent outer scopes from seeing 
 a local variable, as that's already impossible.
Me neither.
Jul 12
prev sibling parent someone <someone somewhere.com> writes:
On Tuesday, 13 July 2021 at 02:22:46 UTC, Mike Parker wrote:

 Hopefully, my post above will shed some light on this.
Yes Mike, a *lot*. Your previous example was crystal-clear -it makes a lot of sense for some class usage scenarios I am thinking of but not for what I did with my example. Now I understand a couple of things more clearly. I was using scope thinking it was something else -now glancing at my code using scope like the way I did is ... pointless; period. I am getting rid of all those statements. Thanks a lot for your example and the links :) !
Jul 12
prev sibling parent reply ag0aep6g <anonymous example.com> writes:
On 13.07.21 03:03, someone wrote:
 On Monday, 12 July 2021 at 23:28:29 UTC, ag0aep6g wrote:
[...]
 I'm not sure where we stand with `in`
You mean *we* = D developers ?
Yes. Let me rephrase and elaborate: I'm not sure what the current status of `in` is. It used to mean `const scope`. But DIP1000 changes the effects of `scope` and there was some discussion about its relation to `in`. Checking the spec, it says that `in` simply means `const` unless you use `-preview=in`. The preview switch makes it `const scope` again, but that's not all. There's also something about passing by reference. https://dlang.org/spec/function.html#in-params [...]
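A minimal sketch of what this means for a by-value `size_t` parameter; the function names are invented for the illustration. Whether `in` is read as `const` or as `const scope`, the body sees a const integer, and `scope` has no indirection to restrict.

```d
// `in` on a by-value parameter: the body sees a const copy.
size_t twice(in size_t n) {
    static assert(is(typeof(n) == const(size_t)));
    static assert(!__traits(compiles, n = 0)); // assignment is rejected
    return n * 2;
}

// The other spellings from the thread accept exactly the same calls.
size_t viaConst(const size_t n) { return n * 2; }
size_t viaScopeConst(scope const size_t n) { return n * 2; }

void main() {
    assert(twice(21) == 42);
    assert(viaConst(21) == 42);
    assert(viaScopeConst(21) == 42);
}
```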
 For a UDT like mine I think it has a lot of sense because when I think 
 of a string and I want to chop/count/whatever on it my mind works 
 one-based not zero-based. Say "abc" needs b my mind works a lot easier 
 mid("abc", 2, 1) than mid("abc", 1, 1) and besides I am *not* returning 
 a range or a reference slice to a range or whatever I am returning a 
 whole new string construction. If I would be returning a range I will 
 follow common sense since I don't know what will be done thereafter of 
 course.
I think you're setting yourself up for off-by-one bugs by going against the grain like that. Your functions are one-based. The rest of the D world, including the standard library, is zero-based. You're bound to forget to account for the difference. But it's your code, and you can do whatever you want, of course. Just looked like it might be a mistake.
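A sketch of the kind of off-by-one this can cause, assuming a hypothetical one-based `mid(text, start, count)` helper (not the code from this thread) combined with the zero-based `std.string.indexOf`:

```d
import std.string : indexOf;

// Hypothetical one-based helper: mid("abc", 2, 1) yields "b".
string mid(string text, size_t start, size_t count) {
    return text[start - 1 .. start - 1 + count];
}

void main() {
    string text = "hello world";
    auto found = text.indexOf('w'); // zero-based result from Phobos
    assert(found == 6);

    // Forgetting to convert between the two conventions shifts the result.
    assert(mid(text, found, 5) == " worl");
    // Every crossing between the two worlds needs an explicit "+ 1".
    assert(mid(text, found + 1, 5) == "world");
}
```

With zero-based parameters the `indexOf` result could be passed through unchanged.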
Jul 12
parent someone <someone somewhere.com> writes:
On Tuesday, 13 July 2021 at 05:37:49 UTC, ag0aep6g wrote:
 On 13.07.21 03:03, someone wrote:
 On Monday, 12 July 2021 at 23:28:29 UTC, ag0aep6g wrote:
[...]
 I'm not sure where we stand with `in`
You mean *we* = D developers ?
Yes. Let me rephrase and elaborate: I'm not sure what the current status of `in` is. It used to mean `const scope`. But DIP1000 changes the effects of `scope` and there was some discussion about its relation to `in`. Checking the spec, it says that `in` simply means `const` unless you use `-preview=in`. The preview switch makes it `const scope` again, but that's not all. There's also something about passing by reference. https://dlang.org/spec/function.html#in-params
ACK. So for the time being I'll be reverting all my input parameters to const (unless ref or out of course) and when the whole in DIP matter resolves (one way or the other) I'll revert them (or not) accordingly. Parameters declared in read more naturally (and akin to out) than const but is form not function what I need to get right right now.
 For a UDT like mine I think it has a lot of sense because when 
 I think of a string and I want to chop/count/whatever on it my 
 mind works one-based not zero-based. Say "abc" needs b my mind 
 works a lot easier mid("abc", 2, 1) than mid("abc", 1, 1) and 
 besides I am *not* returning a range or a reference slice to a 
 range or whatever I am returning a whole new string 
 construction. If I would be returning a range I will follow 
 common sense since I don't know what will be done thereafter 
 of course.
I think you're setting yourself up for off-by-one bugs by going against the grain like that. Your functions are one-based. The rest of the D world, including the standard library, is zero-based. You're bound to forget to account for the difference.
And I think you have a good point. I'll reconsider.
 But it's your code, and you can do whatever you want, of 
 course. Just looked like it might be a mistake.
All in all the whole module was updated accordingly and it seems it is working as expected (further testing needed) but, in the meantime, I learned a lot of things following the advice given by you, Ali, and others in this forum: ```d /// implementation-bugs [-] using foreach (with this structure) 20483 unittest's last line /// implementation‐tasks [+] reconsider making this whole UDT zero‐based as suggested by ag0aep6g—has a good point /// implementation‐tasks [+] reconsider excessive cast usage as suggested by Ali: bypassing compiler checks could be potentially harmful … cast and integer promotion http://ddili.org/ders/d.en/cast.html /// implementation‐tasks [-] for the time being input parameters are declared const instead of in; eventually they'll be back to in when the related DIP was setted once and for all; but, definetely—not scope const /// implementation‐tasks‐possible [-] pad[L|R] /// implementation‐tasks‐possible [-] replicate/repeat /// implementation‐tasks‐possible [-] replace(string, string) /// implementation‐tasks‐possible [-] translate(string, string) … same‐size strings matching one‐to‐one /// usage: array slicing can be used for usual things like: left() right() substr() etc … mainly when grapheme‐clusters are not expected at all /// usage: array slicing needs a zero‐based first range argument and a second one one‐based (or one‐past‐beyond; which it is somehow … counter‐intuitive module fw.types.UniCode; import std.algorithm : map, joiner; import std.array : array; import std.conv : to; import std.range : walkLength, take, tail, drop, dropBack; /// repeat, padLeft, padRight import std.stdio; import std.uni : Grapheme, byGrapheme; /// within this file: gudtUGC shared static this() { } /// the following will be executed only‐once per‐app: static this() { } /// the following will be executed only‐once per‐thread: static ~this() { } /// the following will be executed only‐once per‐thread: shared static ~this() { } /// the following will be executed 
only‐once per‐app: alias stringUGC = Grapheme; alias stringUGC08 = gudtUGC!(stringUTF08); alias stringUGC16 = gudtUGC!(stringUTF16); alias stringUGC32 = gudtUGC!(stringUTF32); alias stringUTF08 = string; /// same as immutable(char )[]; alias stringUTF16 = wstring; /// same as immutable(wchar)[]; alias stringUTF32 = dstring; /// same as immutable(dchar)[]; /// mixin templateUGC!(stringUTF08, r"gudtUGC08"d); /// mixin templateUGC!(stringUTF16, r"gudtUGC16"d); /// mixin templateUGC!(stringUTF32, r"gudtUGC32"d); /// template templateUGC (typeStringUTF, alias lstrStructureID) { aliases in main() public struct gudtUGC(typeStringUTF) { /// UniCode grapheme‐cluster‐aware string manipulation (implemented for one‐based operations) /// provides: public property size_t count /// provides: public size_t decode(typeStringUTF strSequence) /// provides: public typeStringUTF encode() /// provides: public gudtUGC!(typeStringUTF) take(size_t intStart, size_t intCount = 1) /// provides: public gudtUGC!(typeStringUTF) takeL(size_t intCount) /// provides: public gudtUGC!(typeStringUTF) takeR(size_t intCount) /// provides: public gudtUGC!(typeStringUTF) chopL(size_t intCount) /// provides: public gudtUGC!(typeStringUTF) chopR(size_t intCount) /// provides: public gudtUGC!(typeStringUTF) padL(size_t intCount, typeStringUTF strPadding = r" ") /// provides: public gudtUGC!(typeStringUTF) padR(size_t intCount, typeStringUTF strPadding = r" ") /// provides: public typeStringUTF takeasUTF(size_t intStart, size_t intCount = 1) /// provides: public typeStringUTF takeLasUTF(size_t intCount) /// provides: public typeStringUTF takeRasUTF(size_t intCount) /// provides: public typeStringUTF chopLasUTF(size_t intCount) /// provides: public typeStringUTF chopRasUTF(size_t intCount) /// provides: public typeStringUTF padL(size_t intCount, typeStringUTF strPadding = r" ") /// provides: public typeStringUTF padR(size_t intCount, typeStringUTF strPadding = r" ") /// usage; eg: stringUGC32("äëåčñœß … 
russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese"d).take(35, 3).take(1,2).take(1,1).encode(); /// 日 /// usage; eg: stringUGC32("äëåčñœß … russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese"d).take(35).encode(); /// 日 /// usage; eg: stringUGC32("äëåčñœß … russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese"d).takeasUTF(35); /// 日 void popFront() { ++pintSequenceCurrent; } bool empty() { return pintSequenceCurrent == pintSequenceCount; } typeStringUTF front() { return takeasUTF(pintSequenceCurrent); } private stringUGC[] pugcSequence; private size_t pintSequenceCount = cast(size_t) 0; private size_t pintSequenceCurrent = cast(size_t) 0; property public size_t count() { return pintSequenceCount; } this( const typeStringUTF lstrSequence ) { /// (1) given UTF‐encoded sequence decode(lstrSequence); } safe public size_t decode( /// UniCode (UTF‐encoded → grapheme‐cluster) sequence const typeStringUTF lstrSequence ) { /// (1) given UTF‐encoded sequence size_t lintSequenceCount = cast(size_t) 0; if (lstrSequence is null) { pugcSequence = null; pintSequenceCount = cast(size_t) 0; pintSequenceCurrent = cast(size_t) 0; } else { pugcSequence = lstrSequence.byGrapheme.array; pintSequenceCount = pugcSequence.walkLength; pintSequenceCurrent = cast(size_t) 1; lintSequenceCount = pintSequenceCount; } return lintSequenceCount; } safe public typeStringUTF encode() { /// UniCode (grapheme‐cluster → UTF‐encoded) sequence typeStringUTF lstrSequence = null; if (pintSequenceCount >= cast(size_t) 1) { lstrSequence = pugcSequence .map!((ref g) => g[]) .joiner .to!(typeStringUTF) ; } return lstrSequence; } safe public gudtUGC!(typeStringUTF) take( /// UniCode (grapheme‐cluster → grapheme‐cluster) sequence const size_t lintStart, const size_t lintCount = cast(size_t) 1 ) { /// (1) given start position >= 1 /// (2) given count >= 1 gudtUGC!(typeStringUTF) lugcSequence; if (lintStart >= cast(size_t) 1 && lintCount >= cast(size_t) 1) { size_t lintRange1 = lintStart - cast(size_t) 1; size_t lintRange2 = lintRange1 + 
lintCount; if (lintRange2 <= pintSequenceCount) { lugcSequence = gudtUGC!(typeStringUTF)(pugcSequence[lintRange1..lintRange2] .map!((ref g) => g[]) .joiner .to!(typeStringUTF) ); } } return lugcSequence; } safe public gudtUGC!(typeStringUTF) takeL( /// UniCode (grapheme‐cluster → grapheme‐cluster) sequence const size_t lintCount ) { /// (1) given count >= 1 gudtUGC!(typeStringUTF) lugcSequence; if (lintCount >= cast(size_t) 1 && lintCount <= pintSequenceCount) { lugcSequence = gudtUGC!(typeStringUTF)(pugcSequence .take(lintCount) .map!((ref g) => g[]) .joiner .to!(typeStringUTF) ); } return lugcSequence; } safe public gudtUGC!(typeStringUTF) takeR( /// UniCode (grapheme‐cluster → grapheme‐cluster) sequence const size_t lintCount ) { /// (1) given count >= 1 gudtUGC!(typeStringUTF) lugcSequence; if (lintCount >= cast(size_t) 1 && lintCount <= pintSequenceCount) { lugcSequence = gudtUGC!(typeStringUTF)(pugcSequence .tail(lintCount) .map!((ref g) => g[]) .joiner .to!(typeStringUTF) ); } return lugcSequence; } safe public gudtUGC!(typeStringUTF) chopL( /// UniCode (grapheme‐cluster → grapheme‐cluster) sequence const size_t lintCount ) { /// (1) given count >= 1 gudtUGC!(typeStringUTF) lugcSequence; if (lintCount >= cast(size_t) 1 && lintCount <= pintSequenceCount) { lugcSequence = gudtUGC!(typeStringUTF)(pugcSequence .drop(lintCount) .map!((ref g) => g[]) .joiner .to!(typeStringUTF) ); } return lugcSequence; } safe public gudtUGC!(typeStringUTF) chopR( /// UniCode (grapheme‐cluster → grapheme‐cluster) sequence const size_t lintCount ) { /// (1) given count >= 1 gudtUGC!(typeStringUTF) lugcSequence; if (lintCount >= cast(size_t) 1 && lintCount <= pintSequenceCount) { lugcSequence = gudtUGC!(typeStringUTF)(pugcSequence .dropBack(lintCount) .map!((ref g) => g[]) .joiner .to!(typeStringUTF) ); } return lugcSequence; } safe public typeStringUTF takeasUTF( /// UniCode (grapheme‐cluster → UTF‐encoded) sequence const size_t lintStart, const size_t lintCount = cast(size_t) 1 ) 
{ /// (1) given start position >= 1 /// (2) given count >= 1 typeStringUTF lstrSequence = null; if (lintStart >= cast(size_t) 1 && lintCount >= cast(size_t) 1) { size_t lintRange1 = lintStart - cast(size_t) 1; size_t lintRange2 = lintRange1 + lintCount; if (lintRange2 <= pintSequenceCount) { lstrSequence = pugcSequence[lintRange1..lintRange2] .map!((ref g) => g[]) .joiner .to!(typeStringUTF) ; } } return lstrSequence; } safe public typeStringUTF takeLasUTF( /// UniCode (grapheme‐cluster → UTF‐encoded) sequence const size_t lintCount ) { /// (1) given count >= 1 typeStringUTF lstrSequence = null; if (lintCount >= cast(size_t) 1 && lintCount <= pintSequenceCount) { lstrSequence = pugcSequence .take(lintCount) .map!((ref g) => g[]) .joiner .to!(typeStringUTF) ; } return lstrSequence; } safe public typeStringUTF takeRasUTF( /// UniCode (grapheme‐cluster → UTF‐encoded) sequence const size_t lintCount ) { /// (1) given count >= 1 typeStringUTF lstrSequence = null; if (lintCount >= cast(size_t) 1 && lintCount <= pintSequenceCount) { lstrSequence = pugcSequence .tail(lintCount) .map!((ref g) => g[]) .joiner .to!(typeStringUTF) ; } return lstrSequence; } safe public typeStringUTF chopLasUTF( /// UniCode (grapheme‐cluster → UTF‐encoded) sequence const size_t lintCount ) { /// (1) given count >= 1 typeStringUTF lstrSequence = null; if (lintCount >= cast(size_t) 1 && lintCount <= pintSequenceCount) { lstrSequence = pugcSequence .drop(lintCount) .map!((ref g) => g[]) .joiner .to!(typeStringUTF) ; } return lstrSequence; } safe public typeStringUTF chopRasUTF( /// UniCode (grapheme‐cluster → UTF‐encoded) sequence const size_t lintCount ) { /// (1) given count >= 1 typeStringUTF lstrSequence = null; if (lintCount >= cast(size_t) 1 && lintCount <= pintSequenceCount) { lstrSequence = pugcSequence .dropBack(lintCount) .map!((ref g) => g[]) .joiner .to!(typeStringUTF) ; } return lstrSequence; } safe public typeStringUTF padLasUTF( /// UniCode (grapheme‐cluster → UTF‐encoded) sequence 
      const size_t lintCount,
      const typeStringUTF lstrPadding = cast(typeStringUTF) r" "
      ) {

      /// (1) given count >= 1
      /// [2] given padding (default is a single blank space)

      typeStringUTF lstrSequence = null;

      if (lintCount >= cast(size_t) 1 && lintCount > pintSequenceCount) {

         lstrSequence = null; /// pending

      }

      return lstrSequence;

   }

   @safe public typeStringUTF padRasUTF( /// UniCode (grapheme‐cluster → UTF‐encoded) sequence
      const size_t lintCount,
      const typeStringUTF lstrPadding = cast(typeStringUTF) r" "
      ) {

      /// (1) given count >= 1
      /// [2] given padding (default is a single blank space)

      typeStringUTF lstrSequence = null;

      if (lintCount >= cast(size_t) 1 && lintCount > pintSequenceCount) {

         lstrSequence = null; /// pending

      }

      return lstrSequence;

   }

}

unittest {

   version (useUTF08) {
      stringUTF08 lstrSequence1 = r"12345678901234567890123456789012345678901234567890"c;
      stringUTF08 lstrSequence2 = r"1234567890АВГДЕЗИЙКЛABCDEFGHIJabcdefghijQRSTUVWXYZ"c;
      stringUTF08 lstrSequence3 = "äëåčñœß … russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese 😎"c;
   }

   version (useUTF16) {
      stringUTF16 lstrSequence1 = r"12345678901234567890123456789012345678901234567890"w;
      stringUTF16 lstrSequence2 = r"1234567890АВГДЕЗИЙКЛABCDEFGHIJabcdefghijQRSTUVWXYZ"w;
      stringUTF16 lstrSequence3 = "äëåčñœß … russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese 😎"w;
   }

   version (useUTF32) {
      stringUTF32 lstrSequence1 = r"12345678901234567890123456789012345678901234567890"d;
      stringUTF32 lstrSequence2 = r"1234567890АВГДЕЗИЙКЛABCDEFGHIJabcdefghijQRSTUVWXYZ"d;
      stringUTF32 lstrSequence3 = "äëåčñœß … russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese 😎"d;
   }

   /// code units
   size_t lintSequence1sizeUTF = lstrSequence1.length;
   size_t lintSequence2sizeUTF = lstrSequence2.length;
   size_t lintSequence3sizeUTF = lstrSequence3.length;

   /// code points
   size_t lintSequence1sizeUGA = lstrSequence1.walkLength;
   size_t lintSequence2sizeUGA = lstrSequence2.walkLength;
   size_t lintSequence3sizeUGA = lstrSequence3.walkLength;

   /// grapheme clusters
   size_t lintSequence1sizeUGC = lstrSequence1.byGrapheme.walkLength;
   size_t lintSequence2sizeUGC = lstrSequence2.byGrapheme.walkLength;
   size_t lintSequence3sizeUGC = lstrSequence3.byGrapheme.walkLength;

   assert(lintSequence1sizeUGC == cast(size_t) 50);
   assert(lintSequence2sizeUGC == cast(size_t) 50);
   assert(lintSequence3sizeUGC == cast(size_t) 50);

   assert(lintSequence1sizeUGA == cast(size_t) 50);
   assert(lintSequence2sizeUGA == cast(size_t) 50);
   assert(lintSequence3sizeUGA == cast(size_t) 52);

   version (useUTF08) {
      assert(lintSequence1sizeUTF == cast(size_t) 50);
      assert(lintSequence2sizeUTF == cast(size_t) 60);
      assert(lintSequence3sizeUTF == cast(size_t) 91);
   }

   version (useUTF16) {
      assert(lintSequence1sizeUTF == cast(size_t) 50);
      assert(lintSequence2sizeUTF == cast(size_t) 50);
      assert(lintSequence3sizeUTF == cast(size_t) 57);
   }

   version (useUTF32) {
      assert(lintSequence1sizeUTF == cast(size_t) 50);
      assert(lintSequence2sizeUTF == cast(size_t) 50);
      assert(lintSequence3sizeUTF == cast(size_t) 52);
   }

   /// the following should be the same regardless of the encoding being used and is the whole point of this UDT being made:

   version (useUTF08) { alias stringUTF = stringUTF08; stringUGC08 lugcSequence3 = stringUGC08(lstrSequence3); }
   version (useUTF16) { alias stringUTF = stringUTF16; stringUGC16 lugcSequence3 = stringUGC16(lstrSequence3); }
   version (useUTF32) { alias stringUTF = stringUTF32; stringUGC32 lugcSequence3 = stringUGC32(lstrSequence3); }

   assert(lugcSequence3.encode() == lstrSequence3);

   assert(lugcSequence3.take(35, 3).take(1,2).take(1,1).encode() == cast(stringUTF) r"日");

   assert(lugcSequence3.take(21).encode() == cast(stringUTF) r"р");
   assert(lugcSequence3.take(27).encode() == cast(stringUTF) r"й");
   assert(lugcSequence3.take(35).encode() == cast(stringUTF) r"日");
   assert(lugcSequence3.take(37).encode() == cast(stringUTF) r"語");
   assert(lugcSequence3.take(21, 7).encode() == cast(stringUTF) r"русский");
   assert(lugcSequence3.take(35, 3).encode() == cast(stringUTF) r"日本語");

   assert(lugcSequence3.takeasUTF(21) == cast(stringUTF) r"р");
   assert(lugcSequence3.takeasUTF(27) == cast(stringUTF) r"й");
   assert(lugcSequence3.takeasUTF(35) == cast(stringUTF) r"日");
   assert(lugcSequence3.takeasUTF(37) == cast(stringUTF) r"語");
   assert(lugcSequence3.takeasUTF(21, 7) == cast(stringUTF) r"русский");
   assert(lugcSequence3.takeasUTF(35, 3) == cast(stringUTF) r"日本語");

   assert(lugcSequence3.takeL(1).encode() == cast(stringUTF) r"ä");
   assert(lugcSequence3.takeR(1).encode() == cast(stringUTF) r"😎");
   assert(lugcSequence3.takeL(7).encode() == cast(stringUTF) r"äëåčñœß");
   assert(lugcSequence3.takeR(16).encode() == cast(stringUTF) r"日本語 = japanese 😎");

   assert(lugcSequence3.takeLasUTF(1) == cast(stringUTF) r"ä");
   assert(lugcSequence3.takeRasUTF(1) == cast(stringUTF) r"😎");
   assert(lugcSequence3.takeLasUTF(7) == cast(stringUTF) r"äëåčñœß");
   assert(lugcSequence3.takeRasUTF(16) == cast(stringUTF) r"日本語 = japanese 😎");

   assert(lugcSequence3.chopL(10).encode() == cast(stringUTF) r"russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese 😎");
   assert(lugcSequence3.chopR(21).encode() == cast(stringUTF) r"äëåčñœß … russian = русский 🇷🇺");

   assert(lugcSequence3.chopLasUTF(10) == cast(stringUTF) r"russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese 😎");
   assert(lugcSequence3.chopRasUTF(21) == cast(stringUTF) r"äëåčñœß … russian = русский 🇷🇺");

   version (useUTF08) { stringUTF08 lstrSequence3reencoded; }
   version (useUTF16) { stringUTF16 lstrSequence3reencoded; }
   version (useUTF32) { stringUTF32 lstrSequence3reencoded; }

   for (
      size_t lintSequenceUGC = cast(size_t) 1;
      lintSequenceUGC <= lintSequence3sizeUGC;
      ++lintSequenceUGC
      ) {

      lstrSequence3reencoded ~= lugcSequence3.takeasUTF(lintSequenceUGC);

   }

   assert(lstrSequence3reencoded == lstrSequence3);

   lstrSequence3reencoded = null;

   version (useUTF08) { foreach (stringUTF08 lstrSequence3UGC; lugcSequence3) { lstrSequence3reencoded ~= lstrSequence3UGC; } }
   version (useUTF16) { foreach (stringUTF16 lstrSequence3UGC; lugcSequence3) { lstrSequence3reencoded ~= lstrSequence3UGC; } }
   version (useUTF32) { foreach (stringUTF32 lstrSequence3UGC; lugcSequence3) { lstrSequence3reencoded ~= lstrSequence3UGC; } }

   //assert(lstrSequence3reencoded == lstrSequence3); /// ooops …

}
```
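The three size checks in the unittest above (`.length`, `.walkLength`, `.byGrapheme.walkLength`) count code units, code points and grapheme clusters respectively. A minimal sketch of the difference, assuming only Phobos; the helper name `graphemeCount` is made up for illustration and is not part of the UDT:

```d
import std.range : walkLength;
import std.uni : byGrapheme;

/// Hypothetical helper (not from the post): number of user-perceived characters.
size_t graphemeCount(S)(S text) {
   return text.byGrapheme.walkLength;
}

void main() {

   /// "e" followed by U+0301 (combining acute accent): displayed as one character
   string  lstrUTF08 = "e\u0301"c;
   wstring lstrUTF16 = "e\u0301"w;
   dstring lstrUTF32 = "e\u0301"d;

   /// code units depend on the encoding
   assert(lstrUTF08.length == 3); /// 1 byte for 'e' + 2 bytes for U+0301
   assert(lstrUTF16.length == 2);
   assert(lstrUTF32.length == 2);

   /// code points do not depend on the encoding
   assert(lstrUTF08.walkLength == 2);
   assert(lstrUTF16.walkLength == 2);
   assert(lstrUTF32.walkLength == 2);

   /// grapheme clusters: one in every encoding
   assert(graphemeCount(lstrUTF08) == 1);
   assert(graphemeCount(lstrUTF16) == 1);
   assert(graphemeCount(lstrUTF32) == 1);

}
```

Only the grapheme count is the same in every encoding and matches what the reader sees, which is why the UDT indexes by it.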
Jul 13
prev sibling parent reply Adam D Ruppe <destructionator gmail.com> writes:
On Sunday, 11 July 2021 at 05:20:49 UTC, someone wrote:
 ```d
 mixin template templateUGC (
    typeStringUTF,
    alias lstrStructureID
    ) {

    public struct lstrStructureID {

       typeStringUTF whatever;

    }
 ```
This creates a struct with the literal name `lstrStructureID`. Just like any other name. So it is NOT the value of the variable.
 ```d
    public struct mixin(lstrStructureID) { ... }
 ```

 because the argument seems to require a complete statement.
Indeed, you'd have to mixin the whole thing like

mixin("public struct " ~ lstrStructureID ~ " { ... } ");
Jul 11
next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 7/11/21 8:49 AM, Adam D Ruppe wrote:
 On Sunday, 11 July 2021 at 05:20:49 UTC, someone wrote:
 ```d
 mixin template templateUGC (
    typeStringUTF,
    alias lstrStructureID
    ) {

    public struct lstrStructureID {

       typeStringUTF whatever;

    }
 ```
 This creates a struct with the literal name `lstrStructureID`. Just like any other name. So it is NOT the value of the variable.
 ```d
    public struct mixin(lstrStructureID) { ... }
 ```

 because the argument seems to require a complete statement.
 Indeed, you'd have to mixin the whole thing like

 mixin("public struct " ~ lstrStructureID ~ " { ... } ");
When I've done this kind of stuff, what I usually do is:

```d
struct Thing {
   ... // actual struct
}

mixin("alias ", lstrStructureID, " = Thing;");
```

The downside is that the actual struct name symbol will be `Thing`, or whatever you called it. But at least you are not writing lots of code using mixins.

-Steve
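A fuller sketch of this alias idea, under a few assumptions: the struct is a templated `UGC(T)` at module level (a hypothetical name, with a one-field body standing in for the real one), and the name parameter is a `string` rather than an `alias`:

```d
/// Hypothetical stand-in for the real struct; the body is ordinary code.
public struct UGC(typeStringUTF) {
   typeStringUTF whatever;
}

mixin template templateUGC(typeStringUTF, string lstrStructureID) {
   /// Only this one-line alias goes through a string mixin.
   mixin("alias ", lstrStructureID, " = UGC!typeStringUTF;");
}

mixin templateUGC!(string,  "gudtUGC08");
mixin templateUGC!(wstring, "gudtUGC16");
mixin templateUGC!(dstring, "gudtUGC32");

/// The alias resolves to the template instance; the real symbol is still UGC!(...).
static assert(is(gudtUGC32 == UGC!dstring));

void main() {
   gudtUGC32 something;
   something.whatever = "abc"d;
   assert(something.whatever.length == 3);
}
```

The multi-argument `mixin(...)` form needs a reasonably recent compiler; on older ones the pieces can be joined with `~` instead.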
Jul 11
parent someone <someone somewhere.com> writes:
On Sunday, 11 July 2021 at 13:14:23 UTC, Steven Schveighoffer 
wrote:

 when I've done this kind of stuff, what I usually do is:

 ```d
 struct Thing {
   ... // actual struct
 }

 mixin("alias ", lstrStructureID, " = Thing;");
 ```

 the downside is that the actual struct name symbol will be 
 `Thing`, or whatever you called it. But at least you are not 
 writing lots of code using mixins.

 -Steve
Thanks for the tip, Steve. I ended up with something similar; I'll be posting my whole example below.
Jul 11
prev sibling next sibling parent reply zjh <fqbqrr 163.com> writes:
On Sunday, 11 July 2021 at 12:49:28 UTC, Adam D Ruppe wrote:

 This creates a struct with the literal name `lstrStructureID`. 
 Just like any other name. So it is NOT the value of the 
 variable.
Could you explain in more detail?
Jul 11
parent reply Adam D Ruppe <destructionator gmail.com> writes:
On Sunday, 11 July 2021 at 13:30:27 UTC, zjh wrote:
 Could you explain in more detail?
It is just normal code with a normal name. The fact there's another variable with the same name doesn't change anything.
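A small check of that claim, as a sketch only: it mixes the original template in once and asks the compiler which names exist afterwards. The `static assert` lines are an illustration added here, not something from the posts above:

```d
mixin template templateUGC(typeStringUTF, alias lstrStructureID) {
   /// The identifier after `struct` is taken literally;
   /// the template parameter of the same name plays no part in it.
   public struct lstrStructureID {
      typeStringUTF whatever;
   }
}

mixin templateUGC!(string, "gudtUGC08");

/// A struct literally named lstrStructureID now exists ...
static assert(is(lstrStructureID == struct));
static assert(is(typeof(lstrStructureID.whatever) == string));

/// ... while nothing named gudtUGC08 was declared.
static assert(!__traits(compiles, { gudtUGC08 x; }));

void main() {
   lstrStructureID something;
   something.whatever = "ok";
   assert(something.whatever == "ok");
}
```

Mixing the template in more than once at the same scope would declare the same name several times, so any later use of `lstrStructureID` would become ambiguous.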
Jul 11
parent reply zjh <fqbqrr 163.com> writes:
```d
mixin template templateUGC ( typeStringUTF, alias lstrStructureID){
    public struct lstrStructureID {
       typeStringUTF w;
    }
}
mixin templateUGC!(string,  "gudtUGC08");
```
You say "This creates a struct with the literal name `lstrStructureID`".
I tried it and it compiles, but I don't know what it generates.

Could you explain in more detail?
Jul 11
parent zjh <fqbqrr 163.com> writes:
On Sunday, 11 July 2021 at 14:04:14 UTC, zjh wrote:

It just generates a struct named `lstrStructureID`.
Jul 11
prev sibling parent someone <someone somewhere.com> writes:
On Sunday, 11 July 2021 at 12:49:28 UTC, Adam D Ruppe wrote:

 Indeed, you'd have to mixin the whole thing like

 mixin("public struct " ~ lstrStructureID ~ " { ... } ");
As I mentioned in my previous reply to Ali, this could be viable for one-liners or so, but for chunks of code with, say, a couple hundred lines for one UDT, it will soon become debug/maintenance hell ... so clearly it is a no-go, for my specific case at least.
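For what it's worth, one way to keep the string-mixin part down to a single line is to move the long body into a second mixin template written as ordinary code. This is only a sketch: `bodyUGC` and `unitCount` are hypothetical names, and the one-field body stands in for the couple hundred real lines:

```d
/// Hypothetical: the long struct body, written and debugged as normal code.
mixin template bodyUGC(typeStringUTF) {

   typeStringUTF whatever;

   size_t unitCount() const {
      return whatever.length;
   }

}

mixin template templateUGC(typeStringUTF, string lstrStructureID) {
   /// Only the struct shell is built from a string.
   mixin("public struct " ~ lstrStructureID ~ " { mixin bodyUGC!typeStringUTF; }");
}

mixin templateUGC!(string,  "gudtUGC08");
mixin templateUGC!(wstring, "gudtUGC16");
mixin templateUGC!(dstring, "gudtUGC32");

/// Unlike the alias approach, the struct really carries the requested name.
static assert(__traits(identifier, gudtUGC32) == "gudtUGC32");

void main() {
   gudtUGC32 something;
   something.whatever = "日本語"d;
   assert(something.unitCount == 3);
}
```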
Jul 11