digitalmars.D - I can crash program (core.exception.InvalidMemoryOperationError) by
- James Blachly (119/119) Jan 18 2019 First, let me apologize for not being able to reduce this to a
- Steven Schveighoffer (8/142) Jan 18 2019 https://github.com/blachlylab/dhtslib/blob/master/source/dhtslib/htslib/...
- Steven Schveighoffer (6/17) Jan 18 2019 If I had to guess why __FUNCTION__ causes it and others do not, then I
- James Blachly (6/25) Jan 18 2019 Steve: Thanks for your quick reply. Why does manipulation of the
- Steven Schveighoffer (21/47) Jan 18 2019 It's a good question.
- Adam D. Ruppe (5/6) Jan 18 2019 The only downside here is the internal namespace has no public
- Steven Schveighoffer (4/10) Jan 18 2019 I'm 99.9% sure the behavior of tempCString won't be changed or removed
- Jonathan M Davis (14/23) Jan 18 2019 The safest thing for anyone wanting to use std.internal.cstring is to ju...
- Steven Schveighoffer (5/29) Jan 18 2019 I'm not speaking about rules or restrictions, I'm talking about
- Neia Neutuladh (5/8) Jan 18 2019 It's low risk but entirely unsupported. Using it in your own private cod...
- H. S. Teoh (7/17) Jan 18 2019 The best solution is just copy the code into your own module and use
- H. S. Teoh (9/21) Jan 18 2019 [...]
- Jonathan M Davis (7/25) Jan 18 2019 It should be possible to make it package(std), and really, it should be,...
First, let me apologize for not being able to reduce this to a minimal test case. I have tried to do so but was unsuccessful. Offending code, and links to codebase at bottom. TL;DR: I believe this is a compiler error. Please help me prove it, or teach me where I've gone wrong. --- Briefly, I have a class, SAMRecord, that contains: a default constructor, a non-default ctor, and a destructor, each of which for debugging purposes emits a log-line that includes __FUNCTION__. When the default ctor passes __FUNCTION__ to the log function, the program will crash after about 15k or 30k objects are created when using dmd or ldc2, respectively. Further, if I change __FUNCTION__ to any other string (except __MODULE__, which induces the same behavior), the program operates flawlessly. The error is: core.exception.InvalidMemoryOperationError src/core/exception.d(700): Invalid memory operation and is thrown at the the time that when under normal operation the GC would start clearing out the old objects. _Importantly, my code never calls the default constructor_ -- This part puzzles me the most. Additional data points: 1. I added a dummy log function to demonstrate it (likely?) not my library code offending 2. adding assert statements into the default constructor seems to elide the offending behaviour (i.e., I can leave the __FUNCTION__ in without memory error) 3. Commenting out the usage of __FUNCTION__ in the non-default ctor, which IS being used, does not prevent the error. Overall, the error seems to be related to passing __FUNCTION__ out of default class constructors, even when that constructor is not called explicitly by user code. Altering compiler behavior by e.g. adding assert(0) prevents the error. Thus, I conclude this is almost surely a compiler error, but I would be glad to be proven wrong. I would be happy to provide any more information, or to help develop an appropriate minimal reproducible bug, if given appropriate guidance. Otherwise, to replicate this you will need to install a C library. Details below. Thanks all in advance. James --- DMD: v2.081.2 LDC2: 1.11.0 repository: https://github.com/blachlylab/dhtslib/ commit: ee87434d0c48919aaa0ff6d04d6e0ca403564618 (current as of this post) requirement: htslib; https://github.com/samtools/htslib To replicate: 1. Download any BAM file. (random BAM file: https://www.encodeproject.org/files/ENCFF765NLQ/ download/ENCFF765NLQ.bam ) 2. Update test/samreader.d to open your BAM file (sorry I have not parameterized this, have been tearing hair about re this crash for hours) 3. `dub build -c sam_test && ./samreader` Attempt at minimal reproducible test-case (that does NOT trigger the error): repo/test/bug_minimal.d Offending code section: (note that I wrote test_log as a substitute for hts_log_debug to prove that the error is still triggered even when not calling library code) ``` import core.stdc.stdlib: calloc, free; import std.format; import std.parallelism: totalCPUs; import std.stdio: writeln, writefln; import std.string: fromStringz, toStringz; import dhtslib.htslib.hts: htsFile, hts_open, hts_close; import dhtslib.htslib.hts: hts_itr_t; import dhtslib.htslib.hts: seq_nt16_str; import dhtslib.htslib.hts: hts_set_threads; import dhtslib.htslib.hts_log; import dhtslib.htslib.kstring; import dhtslib.htslib.sam; void test_log(string ctx, string msg) { import std.stdio; stderr.writeln(ctx, msg); } /** Encapsulates a SAM/BAM/CRAM record, using the bam1_t type for memory efficiency, and the htslib helper functions for speed. **/ class SAMRecord { /// bam1_t *b; /// this() { //debug(dhtslib_debug) hts_log_debug(__FUNCTION__, "ctor()"); /// This line triggers memory error when __FUNCTION__, but not when "Other string" test_log(__FUNCTION__, "ctor()"); /// This line will also trigger the memory error when __FUNCTION__, but not other strings //writeln(__FUNCTION__); // this will not trigger the memory error this.b = bam_init1(); //assert(0); // This will elide(?) the memory error //assert(1 == 2); // This will elide(?) the memory error } /// this(bam1_t *b) { debug(dhtslib_debug) hts_log_debug(__FUNCTION__, "ctor(bam1_t *b)"); this.b = b; } ~this() { writeln("Record dtor"); debug(dhtslib_debug) hts_log_debug(__FUNCTION__, "dtor"); //bam_destroy1(this.b); // we don't own it! } } ```
Jan 18 2019
On 1/18/19 10:29 AM, James Blachly wrote:First, let me apologize for not being able to reduce this to a minimal test case. I have tried to do so but was unsuccessful. Offending code, and links to codebase at bottom. TL;DR: I believe this is a compiler error. Please help me prove it, or teach me where I've gone wrong. --- Briefly, I have a class, SAMRecord, that contains: a default constructor, a non-default ctor, and a destructor, each of which for debugging purposes emits a log-line that includes __FUNCTION__. When the default ctor passes __FUNCTION__ to the log function, the program will crash after about 15k or 30k objects are created when using dmd or ldc2, respectively. Further, if I change __FUNCTION__ to any other string (except __MODULE__, which induces the same behavior), the program operates flawlessly. The error is: core.exception.InvalidMemoryOperationError src/core/exception.d(700): Invalid memory operation and is thrown at the the time that when under normal operation the GC would start clearing out the old objects. _Importantly, my code never calls the default constructor_ -- This part puzzles me the most. Additional data points: 1. I added a dummy log function to demonstrate it (likely?) not my library code offending 2. adding assert statements into the default constructor seems to elide the offending behaviour (i.e., I can leave the __FUNCTION__ in without memory error) 3. Commenting out the usage of __FUNCTION__ in the non-default ctor, which IS being used, does not prevent the error. Overall, the error seems to be related to passing __FUNCTION__ out of default class constructors, even when that constructor is not called explicitly by user code. Altering compiler behavior by e.g. adding assert(0) prevents the error. Thus, I conclude this is almost surely a compiler error, but I would be glad to be proven wrong. I would be happy to provide any more information, or to help develop an appropriate minimal reproducible bug, if given appropriate guidance. Otherwise, to replicate this you will need to install a C library. Details below. Thanks all in advance. James --- DMD: v2.081.2 LDC2: 1.11.0 repository: https://github.com/blachlylab/dhtslib/ commit: ee87434d0c48919aaa0ff6d04d6e0ca403564618 (current as of this post) requirement: htslib; https://github.com/samtools/htslib To replicate: 1. Download any BAM file. (random BAM file: https://www.encodeproject.org/files/ENCFF765NLQ/ download/ENCFF765NLQ.bam ) 2. Update test/samreader.d to open your BAM file (sorry I have not parameterized this, have been tearing hair about re this crash for hours) 3. `dub build -c sam_test && ./samreader` Attempt at minimal reproducible test-case (that does NOT trigger the error): repo/test/bug_minimal.d Offending code section: (note that I wrote test_log as a substitute for hts_log_debug to prove that the error is still triggered even when not calling library code) ``` import core.stdc.stdlib: calloc, free; import std.format; import std.parallelism: totalCPUs; import std.stdio: writeln, writefln; import std.string: fromStringz, toStringz; import dhtslib.htslib.hts: htsFile, hts_open, hts_close; import dhtslib.htslib.hts: hts_itr_t; import dhtslib.htslib.hts: seq_nt16_str; import dhtslib.htslib.hts: hts_set_threads; import dhtslib.htslib.hts_log; import dhtslib.htslib.kstring; import dhtslib.htslib.sam; void test_log(string ctx, string msg) { import std.stdio; stderr.writeln(ctx, msg); } /** Encapsulates a SAM/BAM/CRAM record, using the bam1_t type for memory efficiency, and the htslib helper functions for speed. **/ class SAMRecord { /// bam1_t *b; /// this() { //debug(dhtslib_debug) hts_log_debug(__FUNCTION__, "ctor()"); /// This line triggers memory error when __FUNCTION__, but not when "Other string" test_log(__FUNCTION__, "ctor()"); /// This line will also trigger the memory error when __FUNCTION__, but not other strings //writeln(__FUNCTION__); // this will not trigger the memory error this.b = bam_init1(); //assert(0); // This will elide(?) the memory error //assert(1 == 2); // This will elide(?) the memory error } /// this(bam1_t *b) { debug(dhtslib_debug) hts_log_debug(__FUNCTION__, "ctor(bam1_t *b)"); this.b = b; } ~this() { writeln("Record dtor"); debug(dhtslib_debug) hts_log_debug(__FUNCTION__, "dtor"); //bam_destroy1(this.b); // we don't own it! } } ```https://github.com/blachlylab/dhtslib/blob/master/source/dhtslib/htslib/hts_log.d#L94 That is your problem. toStringz is going to attempt to allocate using the GC. Allocating during a GC collection causes an InvalidMemoryOperationError. Take that out of your destructor, and you should not cause that error. Alternatively, fix the logger so it doesn't use the GC. -Steve
Jan 18 2019
On 1/18/19 10:35 AM, Steven Schveighoffer wrote:https://github.com/blachlylab/dhtslib/blob/master/source/dhtslib/ tslib/hts_log.d#L94 That is your problem. toStringz is going to attempt to allocate using the GC. Allocating during a GC collection causes an InvalidMemoryOperationError. Take that out of your destructor, and you should not cause that error. Alternatively, fix the logger so it doesn't use the GC.If I had to guess why __FUNCTION__ causes it and others do not, then I would say it's because toStringz has particular cases where it DOESN'T allocate, and that depends completely on the length of the string. https://github.com/dlang/phobos/blob/master/std/string.d#L364-L365 -Steve
Jan 18 2019
On Friday, 18 January 2019 at 15:42:55 UTC, Steven Schveighoffer wrote:On 1/18/19 10:35 AM, Steven Schveighoffer wrote:Steve: Thanks for your quick reply. Why does manipulation of the default constructor, which is never called, affect whether or not the destructor throws the error? Jameshttps://github.com/blachlylab/dhtslib/blob/master/source/dhtslib/htslib/hts_log.d#L94 That is your problem. toStringz is going to attempt to allocate using the GC. Allocating during a GC collection causes an InvalidMemoryOperationError. Take that out of your destructor, and you should not cause that error. Alternatively, fix the logger so it doesn't use the GC.If I had to guess why __FUNCTION__ causes it and others do not, then I would say it's because toStringz has particular cases where it DOESN'T allocate, and that depends completely on the length of the string. https://github.com/dlang/phobos/blob/master/std/string.d#L364-L365 -Steve
Jan 18 2019
On 1/18/19 11:37 AM, James Blachly wrote:On Friday, 18 January 2019 at 15:42:55 UTC, Steven Schveighoffer wrote:It's a good question. My explanation above is slightly wrong, though, it's not just the length, but the address of the end byte. If the end byte is on a 4-byte boundary, then it allocates, otherwise it checks for a zero (which is very likely true, especially with string literals), and then just returns the pointer. When you have behavior changes based on where something is allocated, many bets are off. It's almost like the randomness of memory corruption. I would say, if you fix the destructor to not allocate, and your problems all go away, you should assume that's the fix and move on. Something to consider using instead of toStringz: https://github.com/dlang/phobos/blob/5e6fe2f9c8f72f4c3b0497dc363fc61d823ff489/std/internal/cstring.d#L77 This only allocates on the C heap (if necessary) and cleans itself up upon going out of scope. Use like this: import std.internal.cstring; auto cmsg = tempCString(msg); auto cctx = tempCString(ctx); hts_log(htsLogLevel.HTS_LOG_DEBUG, cctx, cmsg); -SteveOn 1/18/19 10:35 AM, Steven Schveighoffer wrote:Steve: Thanks for your quick reply. Why does manipulation of the default constructor, which is never called, affect whether or not the destructor throws the error?https://github.com/blachlylab/dhtslib/blob/master/source/dhtslib/ tslib/hts_log.d#L94 That is your problem. toStringz is going to attempt to allocate using the GC. Allocating during a GC collection causes an InvalidMemoryOperationError. Take that out of your destructor, and you should not cause that error. Alternatively, fix the logger so it doesn't use the GC.If I had to guess why __FUNCTION__ causes it and others do not, then I would say it's because toStringz has particular cases where it DOESN'T allocate, and that depends completely on the length of the string. https://github.com/dlang/phobos/blob/master/std/string.d#L364-L365
Jan 18 2019
On Friday, 18 January 2019 at 17:03:59 UTC, Steven Schveighoffer wrote:import std.internal.cstring;The only downside here is the internal namespace has no public guarantees; the phobos devs can remove or modify it as they want without notice.
Jan 18 2019
On 1/18/19 12:08 PM, Adam D. Ruppe wrote:On Friday, 18 January 2019 at 17:03:59 UTC, Steven Schveighoffer wrote:I'm 99.9% sure the behavior of tempCString won't be changed or removed in a way that breaks code. -Steveimport std.internal.cstring;The only downside here is the internal namespace has no public guarantees; the phobos devs can remove or modify it as they want without notice.
Jan 18 2019
On Friday, January 18, 2019 10:17:02 AM MST Steven Schveighoffer via Digitalmars-d wrote:On 1/18/19 12:08 PM, Adam D. Ruppe wrote:The safest thing for anyone wanting to use std.internal.cstring is to just copy the module into their own project, though arguably, it's one of those things that should be cleaned up and made public instead of being in internal (which would actually break any code currently using std.internal.cstring). Regardless, I would definitely _not_ advise anyone to just import something from internal, and I definitely don't want to see anyone trying to argue later that we can't change something from internal just because a bunch of people decided that something from there was useful and started importing it. While the risk of your code breaking later on because it's using something from internal _might_ be low, it is basically begging for it to be broken at some point. - Jonathan M DavisOn Friday, 18 January 2019 at 17:03:59 UTC, Steven Schveighoffer wrote:I'm 99.9% sure the behavior of tempCString won't be changed or removed in a way that breaks code.import std.internal.cstring;The only downside here is the internal namespace has no public guarantees; the phobos devs can remove or modify it as they want without notice.
Jan 18 2019
On 1/18/19 1:37 PM, Jonathan M Davis wrote:On Friday, January 18, 2019 10:17:02 AM MST Steven Schveighoffer via Digitalmars-d wrote:I'm not speaking about rules or restrictions, I'm talking about stability. tempCString is pretty stable and likely won't change in any way except implementation details. -SteveOn 1/18/19 12:08 PM, Adam D. Ruppe wrote:The safest thing for anyone wanting to use std.internal.cstring is to just copy the module into their own project, though arguably, it's one of those things that should be cleaned up and made public instead of being in internal (which would actually break any code currently using std.internal.cstring). Regardless, I would definitely _not_ advise anyone to just import something from internal, and I definitely don't want to see anyone trying to argue later that we can't change something from internal just because a bunch of people decided that something from there was useful and started importing it. While the risk of your code breaking later on because it's using something from internal _might_ be low, it is basically begging for it to be broken at some point.On Friday, 18 January 2019 at 17:03:59 UTC, Steven Schveighoffer wrote:I'm 99.9% sure the behavior of tempCString won't be changed or removed in a way that breaks code.import std.internal.cstring;The only downside here is the internal namespace has no public guarantees; the phobos devs can remove or modify it as they want without notice.
Jan 18 2019
On Fri, 18 Jan 2019 13:47:15 -0500, Steven Schveighoffer wrote:I'm not speaking about rules or restrictions, I'm talking about stability. tempCString is pretty stable and likely won't change in any way except implementation details.It's low risk but entirely unsupported. Using it in your own private code is okay; you can choose when to switch compiler versions and so forth, and if it causes any issues, you can change it right away. Using it in code intended for other people to compile is not a great idea.
Jan 18 2019
On Fri, Jan 18, 2019 at 07:08:02PM +0000, Neia Neutuladh via Digitalmars-d wrote:On Fri, 18 Jan 2019 13:47:15 -0500, Steven Schveighoffer wrote:The best solution is just copy the code into your own module and use that. In this day and age there's a stigma against copy-pasted code, but sometimes, that's exactly what the situation calls for. T -- They say that "guns don't kill people, people kill people." Well I think the gun helps. If you just stood there and yelled BANG, I don't think you'd kill too many people. -- Eddie Izzard, Dressed to KillI'm not speaking about rules or restrictions, I'm talking about stability. tempCString is pretty stable and likely won't change in any way except implementation details.It's low risk but entirely unsupported. Using it in your own private code is okay; you can choose when to switch compiler versions and so forth, and if it causes any issues, you can change it right away. Using it in code intended for other people to compile is not a great idea.
Jan 18 2019
On Fri, Jan 18, 2019 at 11:37:02AM -0700, Jonathan M Davis via Digitalmars-d wrote: [...]The safest thing for anyone wanting to use std.internal.cstring is to just copy the module into their own project, though arguably, it's one of those things that should be cleaned up and made public instead of being in internal (which would actually break any code currently using std.internal.cstring). Regardless, I would definitely _not_ advise anyone to just import something from internal, and I definitely don't want to see anyone trying to argue later that we can't change something from internal just because a bunch of people decided that something from there was useful and started importing it. While the risk of your code breaking later on because it's using something from internal _might_ be low, it is basically begging for it to be broken at some point.[...] Shouldn't std.internal.* declare all symbols as private to std.*, like with `package` or something? Or is that not expressible currently? It's a pretty bad idea for user code to depend on std.internal.*... T -- Philosophy: how to make a career out of daydreaming.
Jan 18 2019
On Friday, January 18, 2019 12:03:23 PM MST H. S. Teoh via Digitalmars-d wrote:On Fri, Jan 18, 2019 at 11:37:02AM -0700, Jonathan M Davis via Digitalmars-d wrote: [...]It should be possible to make it package(std), and really, it should be, but it probably predates being able to do that, since that's a recent-ish feature. It's also something that I think folks tend to forget about (I certainly do), because it doesn't get used often. - Jonathan M DavisThe safest thing for anyone wanting to use std.internal.cstring is to just copy the module into their own project, though arguably, it's one of those things that should be cleaned up and made public instead of being in internal (which would actually break any code currently using std.internal.cstring). Regardless, I would definitely _not_ advise anyone to just import something from internal, and I definitely don't want to see anyone trying to argue later that we can't change something from internal just because a bunch of people decided that something from there was useful and started importing it. While the risk of your code breaking later on because it's using something from internal _might_ be low, it is basically begging for it to be broken at some point.[...] Shouldn't std.internal.* declare all symbols as private to std.*, like with `package` or something? Or is that not expressible currently? It's a pretty bad idea for user code to depend on std.internal.*...
Jan 18 2019