digitalmars.D - Type safety could prevent nuclear war
- tsbockman (10/10) Feb 04 2016 The annual Underhanded C Contest announced their winners today.
- H. S. Teoh via Digitalmars-d (21/34) Feb 04 2016 The C preprocessor accepts all sorts of nasty, nonsensical things. For
- tsbockman (9/27) Feb 04 2016 Definitely. What puzzles me about the winning entry, though, is
- Ola Fosheim Grøstad (10/18) Feb 04 2016 Linkers don't know anything about types. A type is a language
- tsbockman (6/18) Feb 04 2016 Yes, that was my point...
- Ola Fosheim Grøstad (16/19) Feb 04 2016 The context is a compilation system for building big software on
- tsbockman (15/28) Feb 04 2016 OK. That's a good reason for C's original design.
- Ola Fosheim Grøstad (29/41) Feb 04 2016 C has to be backwards compatible, but I don't know why people do
- tsbockman (5/13) Feb 04 2016 Why would simply adding a warning change any of that?
- Ola Fosheim Grøstad (10/24) Feb 04 2016 Not sure what you mean by adding a warning. You can probably find
- H. S. Teoh via Digitalmars-d (24/29) Feb 04 2016 That's a lot more expensive than you think. There's a reason most modern
- tsbockman (9/12) Feb 04 2016 I should also point out that D can link to (more or less)
- H. S. Teoh via Digitalmars-d (49/52) Feb 04 2016 It cannot, because C symbols are not mangled. The function name uniquely
- Walter Bright (5/6) Feb 04 2016 The preprocessor makes C++ into an inherently unreliable, unsafe program...
- Ola Fosheim Grøstad (6/9) Feb 04 2016 AFAICT C would have complained if he had included <math.h>. This
- tsbockman (7/16) Feb 04 2016 What restriction does not checking, by default, that the
- Chris Wright (11/24) Feb 04 2016 C linkage does zero name mangling, which is the problem. C++ introduced
- tsbockman (4/14) Feb 04 2016 That explains why the linker doesn't catch it. I still don't see
- Ola Fosheim Grøstad (5/8) Feb 04 2016 The excuse is that C use the same mechanism for creating bindings
- Chris Wright (12/15) Feb 04 2016 Doing this sort of validation requires build system integration (track
- tsbockman (17/25) Feb 04 2016 There is no need to take "as much time as compiling the whole
- Chris Wright (12/25) Feb 04 2016 True. That works if this is baked into your compiler, or if your compile...
- tsbockman (11/21) Feb 04 2016 On Friday, 5 February 2016 at 00:56:28 UTC, Ola Fosheim Grøstad
- Chris Wright (5/7) Feb 04 2016 The compiler doesn't have all the information you need. You could add it...
- tsbockman (15/20) Feb 04 2016 What information, specifically, is the compiler missing?
- Chris Wright (31/39) Feb 04 2016 It doesn't know what targets I'm ultimately creating, and it doesn't kno...
- tsbockman (16/50) Feb 04 2016 No spurious error is generated by my proposal in your example 2,
- H. S. Teoh via Digitalmars-d (27/50) Feb 04 2016 This fails for multi-executable projects, which may legally have
- tsbockman (15/49) Feb 04 2016 It's a small fraction of the total data being handled by the
- H. S. Teoh via Digitalmars-d (37/87) Feb 05 2016 The problem is, the linker knows nothing about the language. Arguably it
- Chris Wright (22/52) Feb 05 2016 I think you're talking about maintaining an in-memory, modifiable data
- tsbockman (13/43) Feb 05 2016 It doesn't necessarily have to be slow when you only changed one
- tsbockman (7/17) Feb 05 2016 I did some quick tests on my system, and even with 100,000,000
- Ola Fosheim Grøstad (19/25) Feb 05 2016 Well, compilers "should" only implement the standard, then they
- Ola Fosheim Grøstad (6/6) Feb 05 2016 Let me add to this that the superior approach is to compile to an
- anonymous (22/26) Feb 04 2016 You can do the same thing in D, using extern(C) to get no mangling:
- tsbockman (9/31) Feb 04 2016 You can do the same thing in D if you try, but it's not natural
- anonymous (5/11) Feb 04 2016 We do have a lot of bindings to C libraries, though. When there's a
- tsbockman (7/15) Feb 04 2016 The compiler cannot (in the general case) verify that `extern(C)`
- Ola Fosheim Grøstad (15/18) Feb 04 2016 I guess D could do it, although this is a rather unlikely source
- tsbockman (9/17) Feb 04 2016 Aliasing types like that can be useful sometimes, but only within
- Daniel Murphy (12/18) Feb 05 2016 Currently D allows overloading extern(C) declarations, see
- tsbockman (12/26) Feb 05 2016 I think it makes sense (when actually linking to C) to allow
- Daniel Murphy (6/15) Feb 05 2016 Safety on C functions is always going to need to be hand verified, the
- H. S. Teoh via Digitalmars-d (15/25) Feb 04 2016 Nah... while D, by default, tries to be type-safe and prevent guffaws
- Chris Wright (6/10) Feb 04 2016 Which suggests a check of this sort should be a warning rather than an
- tsbockman (2/7) Feb 04 2016 Yes.
- tsbockman (11/28) Feb 04 2016 I'm not saying that `extern(C)` is bad in general; I understand
- Adam D. Ruppe (10/13) Feb 04 2016 D allows that. This is why I recommend putting `static
- tsbockman (12/21) Feb 04 2016 D *doesn't* allow that though - at least, not in a monolithic,
- Adam D. Ruppe (25/29) Feb 04 2016 Well, technically, a .di file is just a .d file renamed, but it
- tsbockman (16/46) Feb 04 2016 Thanks for the explanation. That does sound basically the same as
- H. S. Teoh via Digitalmars-d (13/25) Feb 04 2016 [...]
- tsbockman (9/18) Feb 04 2016 I should have clarified that I was considering static libraries,
The annual Underhanded C Contest announced their winners today. As always, the results are very entertaining, and also an excellent advertisement for languages-that-are-not-C. The first place entry is particularly ridiculous; is there any modern language that would make it so easy to commit such an awful "mistake"? http://www.underhanded-c.org/#winner Actually, I'm surprised that this works even in C - I would have expected at least a compiler (or linker?) warning; this seems like it should be easy to detect automatically.
Feb 04 2016
On Thu, Feb 04, 2016 at 10:57:00PM +0000, tsbockman via Digitalmars-d wrote:
> The annual Underhanded C Contest announced their winners today. As always, the results are very entertaining, and also an excellent advertisement for languages-that-are-not-C. The first place entry is particularly ridiculous; is there any modern language that would make it so easy to commit such an awful "mistake"? http://www.underhanded-c.org/#winner Actually, I'm surprised that this works even in C - I would have expected at least a compiler (or linker?) warning; this seems like it should be easy to detect automatically.

The C preprocessor accepts all sorts of nasty, nonsensical things. For example, the following code compiles and runs (without any warning(!) on my Linux box's standard gcc installation), and prints "No":

    #include <stdio.h>
    #define if(a) if(!(a))
    int main() {
        int i = 1;
        if (i == 1)
            printf("Yes\n");
        else
            printf("No\n");
    }

Imagine if this nasty #define is buried somewhere under several layers of #include's. I'm pretty sure somebody can also concoct some nasty #define that will break the standard #include headers in horrible ways by changing the semantics of certain supposedly-built-in constructs.

T

-- Mediocrity has been pushed to extremes.
Feb 04 2016
On Thursday, 4 February 2016 at 23:10:23 UTC, H. S. Teoh wrote:
> On Thu, Feb 04, 2016 at 10:57:00PM +0000, tsbockman via Digitalmars-d wrote:
>> The annual Underhanded C Contest announced their winners today. As always, the results are very entertaining, and also an excellent advertisement for languages-that-are-not-C. The first place entry is particularly ridiculous; is there any modern language that would make it so easy to commit such an awful "mistake"? http://www.underhanded-c.org/#winner Actually, I'm surprised that this works even in C - I would have expected at least a compiler (or linker?) warning; this seems like it should be easy to detect automatically.
> The C preprocessor accepts all sorts of nasty, nonsensical things.

Definitely. What puzzles me about the winning entry, though, is that the compiler and/or linker should be able to trivially detect the type mismatch *after* the preprocessor pass(es) are already done. It should just see that the post-preprocessor signatures of `spectral_contrast()` in match.c and spectral_contrast.c are in conflict, and either issue a warning, or refuse to link them at all.
Feb 04 2016
On Thursday, 4 February 2016 at 23:21:54 UTC, tsbockman wrote:
> Definitely. What puzzles me about the winning entry, though, is that the compiler and/or linker should be able to trivially detect the type mismatch *after* the preprocessor pass(es) are already done.

Linkers don't know anything about types. A type is a language feature.

> It should just see that the post-preprocessor signatures of `spectral_contrast()` in match.c and spectral_contrast.c are in conflict, and either issue a warning, or refuse to link them at all.

Has nothing to do with the preprocessor. He defined float_t to be an alias for double in one compilation unit, and float_t to be an alias for float in another compilation unit. In C, compilation units are completely independent, and can in fact come from different compilers and different languages. C is very much a system level programming language.
Feb 04 2016
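To make the float_t point above concrete, here is a minimal single-file sketch of what happens when one side of a call believes the elements are double and the other reads them as float. It is an illustration only, not the contest entry's code; the names `sum_as_float` and `call_with_doubles` are hypothetical, and the two "compilation units" are simulated by passing the data through a `void *`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callee: stands in for a unit where the element type is float. */
float sum_as_float(const void *data, size_t count)
{
    float total = 0.0f;
    for (size_t i = 0; i < count; i++) {
        float value;
        /* memcpy reads the raw bytes without type-punned pointer access. */
        memcpy(&value, (const unsigned char *)data + i * sizeof value,
               sizeof value);
        total += value;
    }
    return total;
}

/* Hypothetical caller: stands in for a unit where the element type is double. */
float call_with_doubles(void)
{
    double samples[2] = { 1.0, 1.0 };
    /* The callee walks the first 2 * sizeof(float) bytes of the array,
       i.e. the two halves of the first double, not two values of 1.0. */
    return sum_as_float(samples, 2);
}
```

Nothing at link time ties the two views of the data together, so the callee silently computes a sum that is not 2.0.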
On Thursday, 4 February 2016 at 23:25:58 UTC, Ola Fosheim Grøstad wrote:
> On Thursday, 4 February 2016 at 23:21:54 UTC, tsbockman wrote:
>> It should just see that the post-preprocessor signatures of `spectral_contrast()` in match.c and spectral_contrast.c are in conflict, and either issue a warning, or refuse to link them at all.
> Has nothing to do with the preprocessor.

Yes, that was my point...

> He defined float_t to be an alias for double in one compilation unit, and float_t to be an alias for float in another compilation unit. In C, compilation units are completely independent, and can in fact come from different compilers and different languages. C is very much a system level programming language.

Just because *sometimes* the source code of the other module must be compiled independently, is a poor excuse to skip obvious, useful safety checks *all* the time.
Feb 04 2016
On Thursday, 4 February 2016 at 23:35:46 UTC, tsbockman wrote:
> Just because *sometimes* the source code of the other module must be compiled independently, is a poor excuse to skip obvious, useful safety checks *all* the time.

The context is a compilation system for building big software on very slow CPUs with kilobytes of RAM. C was designed for always compiling independently and compiling source files that are bigger than what can be held in RAM, and also for building executables that can fill most of system RAM. So the compilation system was designed for using external memory (disk) and that affects C a lot. The forerunner of C, BCPL, was a bootstrap language for writing compilers. So C is minimal by design.

BTW, C++ programmers sometimes use similar unsafe hacks of "pruned header files" to break dependencies and speed up compilation. So this is not unique to C, but C++ introduced the mangling of types into names to support overloading of functions on parameter types, which is why C++ detects (some) type issues at link time.
Feb 04 2016
On Thursday, 4 February 2016 at 23:53:58 UTC, Ola Fosheim Grøstad wrote:On Thursday, 4 February 2016 at 23:35:46 UTC, tsbockman wrote:OK. That's a good reason for C's original design. But it's 2016 and my PC has 32GiB of RAM. Why should a C compiler running on such a system skip safety checks just because they would be too expensive to run on some *other* computer? This isn't even a particularly expensive (in compile-time costs) check to perform anyway; all that is necessary is to store a temporary table of symbol signatures somewhere (it doesn't need to be in RAM), and check that any duplicate entries are consistent with each other before linking. This is already a solved problem in most other programming languages; there is no fundamental reason that the solutions used in D, C++, or Java could not be applied to C - without even changing any of the language semantics.Just because *sometimes* the source code of the other module must be compiled independently, is a poor excuse to skip obvious, useful safety checks *all* the time.The context is a compilation system for building big software on very slow CPUs with kilobytes of RAM. C was designed for always compiling independently and compiling source files that are bigger than what can be held in RAM, and also for building executables that can fill most of system RAM. So the compilation system was designed for using external memory (disk) and that affects C a lot. The forerunner for C, BCPL was a bootstrap language for writing compilers. So C is minimal by design.
Feb 04 2016
On Friday, 5 February 2016 at 00:14:11 UTC, tsbockman wrote:
> But it's 2016 and my PC has 32GiB of RAM. Why should a C compiler running on such a system skip safety checks just because they would be too expensive to run on some *other* computer?

C has to be backwards compatible, but I don't know why people do larger projects in C in 2016. Libraries are done in C for portability and because it provides a FFI interface defined as the ABI by hardware and OS vendors. BeOS tried to define a specific C++ compiler as their ABI, but it was problematic. C++ does not have an ABI, you cannot link object files from So, basically, there is no suitable industry standard other than C.

> This is already a solved problem in most other programming languages; there is no fundamental reason that the solutions used in D, C++, or Java could not be applied to C - without even changing any of the language semantics.

D and C++ change. C uses the ABI defined by the hardware/OS vendor. It is locked in stone, frozen, beyond discussion. As mentioned BeOS adopted C++. Apple has adopted Objective-C and Swift. But how can you make _all_ the other vendors (Microsoft, Google, IBM etc) standardize on something that isn't C?

> Aliasing types like that can be useful sometimes, but only within certain limits. In particular, the size (with alignment padding) of the types in question must match, otherwise you will corrupt the stack.

I see where you are coming from, but I meant what I said literally. Machine language only deals with bitpatterns. When we interface with machine language we just add lots of constraints on what we hand over to it. Adding _more_ constraints than the creator of the machine language code intended is never wrong. Not adding enough constraints is not ideal, but often difficult to avoid if we care about performance. So if I write a piece of machine language code and give you the object file you only have my words for what the input is supposed to be. And then you have to make a formulation of the constraints that fits your use case and is expressible in your language. Different languages have different levels of expressiveness for describing and enforcing type constraints.
Feb 04 2016
On Friday, 5 February 2016 at 00:41:52 UTC, Ola Fosheim Grøstad wrote:On Friday, 5 February 2016 at 00:14:11 UTC, tsbockman wrote:Why would simply adding a warning change any of that? No ABI changes are required. Backwards compatibility is not broken.But it's 2016 and my PC has 32GiB of RAM. Why should a C compiler running on such a system skip safety checks just because they would be too expensive to run on some *other* computer?C has to be backwards compatible, but I don't know why people do larger projects in C in 2016. [...]
Feb 04 2016
On Friday, 5 February 2016 at 00:50:32 UTC, tsbockman wrote:On Friday, 5 February 2016 at 00:41:52 UTC, Ola Fosheim Grøstad wrote:Not sure what you mean by adding a warning. You can probably find sanitizers that do it, but the standard does not require warnings for anything (AFAIK). That is up to compiler vendors. As for why C isn't displaced by something better, maybe the right question is: why don't new languages stick to the C ABI and provide sensible C code gen. Well, they want more features... and features... and features... There is probably a market for it, but nobody can be bothered to create and maintain a simple modern system level language.On Friday, 5 February 2016 at 00:14:11 UTC, tsbockman wrote:Why would simply adding a warning change any of that? No ABI changes are required. Backwards compatibility is not broken.But it's 2016 and my PC has 32GiB of RAM. Why should a C compiler running on such a system skip safety checks just because they would be too expensive to run on some *other* computer?C has to be backwards compatible, but I don't know why people do larger projects in C in 2016. [...]
Feb 04 2016
On Fri, Feb 05, 2016 at 12:14:11AM +0000, tsbockman via Digitalmars-d wrote: [...]This isn't even a particularly expensive (in compile-time costs) check to perform anyway; all that is necessary is to store a temporary table of symbol signatures somewhere (it doesn't need to be in RAM), and check that any duplicate entries are consistent with each other before linking.That's a lot more expensive than you think. There's a reason most modern linkers do not do full cross-referencing of symbols -- because doing so would be excruciatingly slow and consume gobs of memory. Even a 32GB machine would not be able to hold *all* the symbols in some very large software projects, and looking things up on disk is unacceptably slow for software of those sizes. Most modern linkers instead use faster algorithms that rely on clever scheduling of the order of symbol resolution, just so they *don't* have to cross-reference all symbols at once. Besides, all this is unnecessary work. All you need to do is to have C compilers mangle function names. Mission accomplished. (However, this *will* break a lot of existing inter-language code that rely on being able to spell out symbols explicitly. So it probably will not fly. But, in theory, it *is* possible...) And to paraphrase one of my favorite Walter quotes: fixing inconsistent function signatures is only plugging one hole in a cheese grater. C has far more dangerous gotchas than just function signature mismatches. T -- They say that "guns don't kill people, people kill people." Well I think the gun helps. If you just stood there and yelled BANG, I don't think you'd kill too many people. -- Eddie Izzard, Dressed to Kill
Feb 04 2016
On Thursday, 4 February 2016 at 23:25:58 UTC, Ola Fosheim Grøstad wrote:In C, compilation units are completely independent, and can in fact come from different compilers and different languages. C is very much a system level programming language.I should also point out that D can link to (more or less) anything that C can, and yet does not have the weakness exploited by the winning entry. The only real reason that D is one wit less of a "system level programming language" than C, is the heavyweight runtime library - but that is irrelevant to the problem of type-checking cross-module references within the same code base.
Feb 04 2016
On Thu, Feb 04, 2016 at 11:21:54PM +0000, tsbockman via Digitalmars-d wrote: [...]
> Definitely. What puzzles me about the winning entry, though, is that the compiler and/or linker should be able to trivially detect the type mismatch *after* the preprocessor pass(es) are already done.

It cannot, because C symbols are not mangled. The function name uniquely identifies the function, and the signature is not encoded anywhere. The linker knows nothing about types or parameters; all it knows is that within offset X of binary blob B, there's a binary number (usually a 32- or 64-bit address) associated with a symbol that it needs to replace with the value (i.e., address) of that symbol, which it obtains from the object file that defines that symbol. So as far as the linker is concerned, the function names match up, and that's all there is to it.

C provides zero protection against calling functions with mismatched parameters if the caller is not in the same file, and does not have the right declaration. E.g.:

    /* module1.c */
    void func(int a, int b) { ... }

    /* module2.c */
    extern int func(double x); /* I'm too lazy to #include a header */
    int main() {
        int x = func(1.0); /* kaboom */
    }

In theory, this problem is solved by #include'ing the appropriate header file, but even that isn't free from accidents like forgetting to update the header after you change the function signature. Of course, most sane C projects will also #include the header in the file that defines the function, in which case, finally, the compiler will catch the mistake. But you can see just how fragile this is, and how many points of failure it has, and, believe it or not, there *are* still C projects out there that don't follow the convention of one header per .c file, and of those that do, a frightening number do not #include the header in the .c file.

This isn't the whole story, either. Even if you follow said conventions to prevent function signature mismatches, problems can still occur. For instance, I once had to debug a mysterious crash in an enterprise project whose cause, seemingly, could not be found in the code. It turned out to be caused by two shared libraries that defined two different functions under the same name. Since the conflicting functions are in separately-compiled libraries, the compiler is oblivious to the conflict. Furthermore, the linker doesn't detect it either, because, being shared libraries, all the linker knows is that it found symbol X in library1, so it didn't bother looking for symbol X again in library2 which is processed afterward. An unrelated code change caused the order of libraries linked to change, and suddenly now the linker finds symbol X in library2 first, leading to the function call being linked to the wrong implementation. So at runtime, kaboom.

Name mangling singlehandedly solves all of the above problems.

T

-- Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald Knuth
Feb 04 2016
On 2/4/2016 3:10 PM, H. S. Teoh via Digitalmars-d wrote:The C preprocessor accepts all sorts of nasty, nonsensical things.The preprocessor makes C++ into an inherently unreliable, unsafe programming language. I've talked to some C++ committee members about this, about why there is no push to rid (at least deprecate) all use of the preprocessor. The general reaction I get is it is unimportant to do so.
Feb 04 2016
On Thursday, 4 February 2016 at 22:57:00 UTC, tsbockman wrote:Actually, I'm surprised that this works even in C - I would have expected at least a compiler (or linker?) warning; this seems like it should be easy to detect automatically.AFAICT C would have complained if he had included <math.h>. This is a rather unlikely mistake... Anyway, in C being able to work around restrictions is sometimes desired, so... if you don't want the ability to do it, don't use C.
Feb 04 2016
On Thursday, 4 February 2016 at 23:19:20 UTC, Ola Fosheim Grøstad wrote:On Thursday, 4 February 2016 at 22:57:00 UTC, tsbockman wrote:What restriction does not checking, by default, that the parameter types match allow one to work around, though? C already has `void*` and explicit casts, either of which would allow one to explicitly indicate that type checking is not desired.Actually, I'm surprised that this works even in C - I would have expected at least a compiler (or linker?) warning; this seems like it should be easy to detect automatically.AFAICT C would have complained if he had included <math.h>. This is a rather unlikely mistake... Anyway, in C being able to work around restrictions is sometimes desired, so... if you don't want the ability to do it, don't use C.
Feb 04 2016
On Thu, 04 Feb 2016 22:57:00 +0000, tsbockman wrote:The annual Underhanded C Contest announced their winners today. As always, the results are very entertaining, and also an excellent advertisement for languages-that-are-not-C. The first place entry is particularly ridiculous; is there any modern language that would make it so easy to commit such an awful "mistake"? http://www.underhanded-c.org/#winner Actually, I'm surprised that this works even in C - I would have expected at least a compiler (or linker?) warning; this seems like it should be easy to detect automatically.C linkage does zero name mangling, which is the problem. C++ introduced name mangling, so compiling with g++ would show the error rather quickly. C99 is pretty close to C++98, but there are enough differences that that isn't a reliable diagnostic. (Though if you're familiar with the differences, you could use it as a quick way to show potential problem areas.) I suppose a compiler could produce two symbol tables, one featuring mangled names and one with unmangled names. The linker would prefer matching mangled names and issue a warning if it only had an unmangled match with a mangled false match.
Feb 04 2016
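The two-symbol-table idea in the post above can be sketched roughly as follows. This is a toy model under stated assumptions, not how any real linker or ABI works: the mangling scheme (name, two underscores, then a string of parameter codes), the parameter codes themselves, and the functions `mangle` and `resolve` are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical scheme: "<name>__<parameter codes>".
   Returns 1 on success, 0 if the result did not fit in `out`. */
int mangle(char *out, size_t out_size, const char *name, const char *params)
{
    int written = snprintf(out, out_size, "%s__%s", name, params);
    return written >= 0 && (size_t)written < out_size;
}

enum match { MATCH_EXACT, MATCH_NAME_ONLY, MATCH_NONE };

/* Compare a reference against a definition: prefer the mangled match and
   fall back to the plain C name, which is where a warning would be issued. */
enum match resolve(const char *wanted_name, const char *wanted_params,
                   const char *defined_name, const char *defined_params)
{
    char wanted[128], defined[128];

    if (strcmp(wanted_name, defined_name) != 0)
        return MATCH_NONE;
    /* Names too long for the buffers are treated as unmatched in this toy. */
    if (!mangle(wanted, sizeof wanted, wanted_name, wanted_params) ||
        !mangle(defined, sizeof defined, defined_name, defined_params))
        return MATCH_NONE;
    return strcmp(wanted, defined) == 0 ? MATCH_EXACT : MATCH_NAME_ONLY;
}
```

A result of `MATCH_NAME_ONLY` corresponds to the case described in the post: the unmangled names agree, the mangled ones do not, so the link could still succeed for compatibility while a diagnostic is printed.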
On Thursday, 4 February 2016 at 23:24:21 UTC, Chris Wright wrote:C linkage does zero name mangling, which is the problem. C++ introduced name mangling, so compiling with g++ would show the error rather quickly. C99 is pretty close to C++98, but there are enough differences that that isn't a reliable diagnostic. (Though if you're familiar with the differences, you could use it as a quick way to show potential problem areas.) I suppose a compiler could produce two symbol tables, one featuring mangled names and one with unmangled names. The linker would prefer matching mangled names and issue a warning if it only had an unmangled match with a mangled false match.That explains why the linker doesn't catch it. I still don't see much excuse for the compiler allowing it though, beyond a desire to allow each module to be compiled independently.
Feb 04 2016
On Thursday, 4 February 2016 at 23:29:10 UTC, tsbockman wrote:
> That explains why the linker doesn't catch it. I still don't see much excuse for the compiler allowing it though, beyond a desire to allow each module to be compiled independently.

The excuse is that C uses the same mechanism for creating bindings to C and non-C code. It is actually very handy, IF you want a system level language and full separation of compilation units (which allows for very fast compilation).
Feb 04 2016
On Thu, 04 Feb 2016 23:29:10 +0000, tsbockman wrote:
> That explains why the linker doesn't catch it. I still don't see much excuse for the compiler allowing it though, beyond a desire to allow each module to be compiled independently.

Doing this sort of validation requires build system integration (track the command line arguments that went into producing this object file; find which object files are combined into which targets; run the analysis on that) and costs as much time as compiling the whole project from scratch. Developing such a system is nontrivial, so it's not a matter of conjuring excuses; rather, someone would have to put in considerable effort to make it work.

I'm betting some of the commercial static analyzers for C do this, but they're not the sort of things you install on every dev machine and run on every build. Generally they're the sort of thing that you send off to the security company and they send you a report some weeks later.
Feb 04 2016
On Friday, 5 February 2016 at 00:03:56 UTC, Chris Wright wrote:Doing this sort of validation requires build system integration (track the command line arguments that went into producing this object file; find which object files are combined into which targets; run the analysis on that) and costs as much time as compiling the whole project from scratch.There is no need to take "as much time as compiling the whole project from scratch". The necessary information is already gathered during the normal course of compilation; all that is required is to actually save it somewhere until link-time, instead of throwing it away. The time required for the check should be at most O(N log(N)), where N is the number of function and global variable declarations in the project. The space required for the table is O(N). In both cases the constant factors should be quite small.Developing such a system is nontrivial, so it's not a matter of conjuring excuses; rather, someone would have to put in considerable effort to make it work.Adding any interesting feature to a build system is usually nontrivial, but I still think you're overestimating the cost of this one. Again, the hard part (finding all the signatures and processing them into a semantically meaningful form) is already being done by the compiler. The results just need to be saved, sorted, and scanned for conflicts.
Feb 04 2016
On Fri, 05 Feb 2016 00:38:16 +0000, tsbockman wrote:On Friday, 5 February 2016 at 00:03:56 UTC, Chris Wright wrote:True. That works if this is baked into your compiler, or if your compiler has plugin support. And you'd have to compile with this plugin or the relevant options turned on by default in order for you not to duplicate work. That's partly an engineering issue (build this thing in this particular way) and partly a social issue (get people to run it by default; have them add the extra flag to the makefile to specify to create the relevant output; possibly get your compiler vendor to build it in, depending on what compiler your devs are using). I imagine Google, to take a random example where I have experience, would add this as a presubmit step rather than requiring it on every build.Doing this sort of validation requires build system integration (track the command line arguments that went into producing this object file; find which object files are combined into which targets; run the analysis on that) and costs as much time as compiling the whole project from scratch.There is no need to take "as much time as compiling the whole project from scratch". The necessary information is already gathered during the normal course of compilation; all that is required is to actually save it somewhere until link-time, instead of throwing it away.
Feb 04 2016
On Friday, 5 February 2016 at 00:56:16 UTC, Chris Wright wrote:True. That works if this is baked into your compiler, or if your compiler has plugin support. And you'd have to compile with this plugin or the relevant options turned on by default in order for you not to duplicate work.On Friday, 5 February 2016 at 00:56:28 UTC, Ola Fosheim Grøstad wrote:Not sure what you mean by adding a warning. You can probably find sanitizers that do it, but the standard does not require warnings for anything (AFAIK). That is up to compiler vendors.Quoting myself (emphasis added): On Thursday, 4 February 2016 at 22:57:00 UTC, tsbockman wrote:Actually, I'm surprised that this works even in C - I would have expected at least a COMPILER (or linker?) warning; this seems like it should be easy to detect automatically.All along I have been saying this is something that *compilers* should warn about. As far as I can recall, I never suggested using linters, sanitizers, changing the C standard - or even compiler plugins. (I did suggest the linker as an alternative, but you all have already explained why that can't work for C.)
Feb 04 2016
On Fri, 05 Feb 2016 01:10:53 +0000, tsbockman wrote:All along I have been saying this is something that *compilers* should warn about.The compiler doesn't have all the information you need. You could add it to the build system or the linker as well as the compiler. Adding it to the linker is almost identical to my previous suggestion of adding optional name mangling to C.
Feb 04 2016
On Friday, 5 February 2016 at 03:46:37 UTC, Chris Wright wrote:
> On Fri, 05 Feb 2016 01:10:53 +0000, tsbockman wrote:
> The compiler doesn't have all the information you need. You could add it to the build system or the linker as well as the compiler. Adding it to the linker is almost identical to my previous suggestion of adding optional name mangling to C.

What information, specifically, is the compiler missing? The compiler already computes the name and type signature of each function. As far as I can see, all that is necessary is to:

1) Insert that information (together with what file and line number it came from) into a big list in a temporary file.
2) After all modules have been compiled, go back and sort the list by function name.
3) Finally, scan the list for entries that share the same name, but have incompatible type signatures. Emit warning messages as needed. (The compiler should be used for this step, because it already has a lot of information about C's type system built into it that can help define "incompatible" sensibly.)

As far as I can see, this requires an extra pass, but no additional information. What am I missing?
Feb 04 2016
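The three steps tsbockman lists above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the `SymRecord` type, its fields, and `check_signatures` are hypothetical names, not part of any real toolchain, and signatures are compared as plain strings, which is much cruder than C's actual rules for compatible types.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Step 1: one record per prototype or definition the compiler has seen.
   The type and its fields are hypothetical. */
typedef struct {
    const char *name;      /* unmangled C symbol name */
    const char *signature; /* assumed canonical type string, e.g. "void(int)" */
    const char *file;      /* where the declaration came from */
    int line;
} SymRecord;

/* Order by name, then by signature, so that equal signatures of the
   same name end up next to each other. */
static int compare_records(const void *a, const void *b)
{
    const SymRecord *x = a;
    const SymRecord *y = b;
    int by_name = strcmp(x->name, y->name);
    return by_name != 0 ? by_name : strcmp(x->signature, y->signature);
}

/* Steps 2 and 3: sort the list, then scan neighbouring entries.
   Warns on stderr and returns how many neighbouring pairs share a
   name but differ in signature. */
size_t check_signatures(SymRecord *recs, size_t n)
{
    size_t conflicts = 0;

    assert(recs != NULL || n == 0);
    if (n < 2)
        return 0;

    qsort(recs, n, sizeof recs[0], compare_records);

    for (size_t i = 1; i < n; i++) {
        const SymRecord *prev = &recs[i - 1];
        const SymRecord *cur = &recs[i];
        if (strcmp(prev->name, cur->name) == 0 &&
            strcmp(prev->signature, cur->signature) != 0) {
            fprintf(stderr,
                    "warning: '%s' declared as %s (%s:%d) but as %s (%s:%d)\n",
                    cur->name, prev->signature, prev->file, prev->line,
                    cur->signature, cur->file, cur->line);
            conflicts++;
        }
    }
    return conflicts;
}
```

Because the list is only appended to, sorted once, and then read sequentially, nothing here needs random access, which matches the cost argument made later in the thread.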
On Fri, 05 Feb 2016 04:02:41 +0000, tsbockman wrote:
> On Friday, 5 February 2016 at 03:46:37 UTC, Chris Wright wrote:
> > The compiler doesn't have all the information you need. You could add it to the build system or the linker as well as the compiler. Adding it to the linker is almost identical to my previous suggestion of adding optional name mangling to C.
>
> What information, specifically, is the compiler missing?

It doesn't know what targets I'm ultimately creating, and it doesn't know what files have been modified that I'm about to compile (but haven't compiled yet).

Example 1: I compile one .c file referencing a function:

    void foo(int);

That's going to end up in libfoo.so. I compile another .c file in the same directory defining a function:

    void foo(float);

That's going to end up in libbar.so. No bug here. (The linker should tell us if someone depends on foo from libbar and foo from libfoo in the same executable.)

How does your putative compiler plugin handle it? Either I have to define a build rule for every source file to specify where to put this symbol cache (and you need to add parameters for the plugin to look for multiple caches, because libfoo and libbar share a lot of source files), or the plugin gives me false positives.

Example 2: I compile a.c:

    int foo(int i) { return i + 1; }

In the course of refactoring, I delete that function from a.c and add it to b.c with modifications:

    int foo(int i, int increment) { return i + increment; }

My build script recompiles b.c before it recompiles a.c. Your compiler plugin produces a build error, halting my build. I have to make clean && make in order to proceed -- and that's assuming I know your tool doesn't work well with incremental compilation.

The first problem might be uncommon, but the second would crop up constantly. They have the same fix: collect the information when you compile, evaluate it when you link.
Feb 04 2016
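Chris Wright's Example 1 suggests that records have to be grouped by link target before they are compared. A minimal sketch of that idea in C, assuming a hypothetical `LinkRecord` type and `conflicts_in_target` helper (neither exists in any real linker), and again using string comparison as a stand-in for real type compatibility:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One record per declaration, tagged with the link target it ends up in.
   The type and its fields are hypothetical. */
typedef struct {
    const char *target;    /* e.g. "libfoo.so" */
    const char *name;      /* unmangled C symbol name */
    const char *signature; /* assumed canonical type string */
} LinkRecord;

/* Counts pairs of records that go into `target`, share a name, and
   disagree on the signature. Passing NULL as `target` ignores the
   target field and compares every record against every other one.
   The quadratic scan keeps the sketch short; it is not meant to scale. */
size_t conflicts_in_target(const LinkRecord *recs, size_t n, const char *target)
{
    size_t conflicts = 0;

    assert(recs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (target != NULL && strcmp(recs[i].target, target) != 0)
            continue;
        for (size_t j = i + 1; j < n; j++) {
            if (target != NULL && strcmp(recs[j].target, target) != 0)
                continue;
            if (strcmp(recs[i].name, recs[j].name) == 0 &&
                strcmp(recs[i].signature, recs[j].signature) != 0)
                conflicts++;
        }
    }
    return conflicts;
}
```

With the two `foo` declarations from Example 1, a per-target check reports nothing for either library, while a check that ignores targets reports the false positive Chris describes.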
On Friday, 5 February 2016 at 06:05:49 UTC, Chris Wright wrote:
> It doesn't know what targets I'm ultimately creating, and it doesn't know what files have been modified that I'm about to compile (but haven't compiled yet).
>
> [...]
>
> The first problem might be uncommon, but the second would crop up constantly. They have the same fix: collect the information when you compile, evaluate it when you link.

No spurious error is generated by my proposal in your example 2, because I specifically stated that the extra pass must be done once, after *all* modules have been compiled.

I see, however, that this would require one of:

1) Modifying build scripts to pass the complete list of .c files to the compiler in a single command, or
2) Modifying build scripts to run the compiler one extra time after processing all the .c files, or
3) Running the final check at link-time.

For a C tool chain with a clean-sheet design, any of those would handle example 2 fine. (1) or (3) could also handle example 1 without issue.

However, as you say, only (3) is backwards compatible with existing make files and what-not. (This is not a limitation of the C language or ABI, though.)
Feb 04 2016
On Fri, Feb 05, 2016 at 04:02:41AM +0000, tsbockman via Digitalmars-d wrote:
> On Friday, 5 February 2016 at 03:46:37 UTC, Chris Wright wrote:
> > The compiler doesn't have all the information you need. You could add it to the build system or the linker as well as the compiler. Adding it to the linker is almost identical to my previous suggestion of adding optional name mangling to C.
>
> What information, specifically, is the compiler missing?
>
> The compiler already computes the name and type signature of each function. As far as I can see, all that is necessary is to:
>
> 1) Insert that information (together with what file and line number it came from) into a big list in a temporary file.
> 2) After all modules have been compiled, go back and sort the list by function name.

This would make compilation of large projects excruciatingly slow.

> 3) Finally, scan the list for entries that share the same name, but have incompatible type signatures. Emit warning messages as needed. (The compiler should be used for this step, because it already has a lot of information about C's type system built into it that can help define "incompatible" sensibly.)

This fails for multi-executable projects, which may legally have different functions under the same name. (Even though that's arguably a very bad idea.)

> As far as I can see, this requires an extra pass, but no additional information. What am I missing?

The fact that the C compiler only sees one file at a time, and has no idea which one, if any, of them will even end up in the final executable. Many projects produce multiple executables with some shared sources between them, and only the build system knows which file(s) go with which executables.

So as others have said, this can only work for compilers that are aware of the larger picture than just the single source file it's currently compiling. Even in D, for a sufficiently large project the compiler can't see everything at once either, because it won't fit into your RAM. Thankfully, D doesn't suffer from this particular problem because of name mangling.

Which is why I said, adding name mangling to the C compiler will solve this problem. Except that it breaks existing inter-language code, so it won't work for *all* C programs. And it will also break linkage with existing shared libraries, which are *not* name-mangled. (Recompiling said libraries may not be an option if they are OEM, binary-only blobs.) So it can only work for self-contained, independent projects with no inter-language linkage, which would be a very restricted subset of C codebases.

T

-- 
Nobody is perfect. I am Nobody. -- pepoluan, GKC forum
Feb 04 2016
On Friday, 5 February 2016 at 07:05:06 UTC, H. S. Teoh wrote:
> On Fri, Feb 05, 2016 at 04:02:41AM +0000, tsbockman via Digitalmars-d wrote:
> > What information, specifically, is the compiler missing?
> >
> > The compiler already computes the name and type signature of each function. As far as I can see, all that is necessary is to:
> >
> > 1) Insert that information (together with what file and line number it came from) into a big list in a temporary file.
> > 2) After all modules have been compiled, go back and sort the list by function name.
>
> This would make compilation of large projects excruciatingly slow.

It's a small fraction of the total data being handled by the compiler (smaller than the source code), and the list could probably be directly generated in a partially sorted state. Little-to-no random access to the list is required at any point in the process. It does not ever need to all be in RAM at the same time.

I can see it may cost more than it's actually worth, but where does the "excruciatingly slow" part come from?

> > 3) Finally, scan the list for entries that share the same name, but have incompatible type signatures. Emit warning messages as needed. (The compiler should be used for this step, because it already has a lot of information about C's type system built into it that can help define "incompatible" sensibly.)
>
> This fails for multi-executable projects, which may legally have different functions under the same name. (Even though that's arguably a very bad idea.)

Chris Wright pointed this out, as well. This just means the final pass should be done at link-time, though. It's not a fundamental problem with generating the warning.

> > As far as I can see, this requires an extra pass, but no additional information. What am I missing?
>
> The fact that the C compiler only sees one file at a time, and has no idea which one, if any, of them will even end up in the final executable. Many projects produce multiple executables with some shared sources between them, and only the build system knows which file(s) go with which executables.

This could be worked around with a little cooperation between the compiler and the linker. It's not even a feature of C the language - it's just the way current tool chains happen to work.
Feb 04 2016
On Fri, Feb 05, 2016 at 07:31:34AM +0000, tsbockman via Digitalmars-d wrote:
> It's a small fraction of the total data being handled by the compiler (smaller than the source code), and the list could probably be directly generated in a partially sorted state. Little-to-no random access to the list is required at any point in the process. It does not ever need to all be in RAM at the same time.
>
> I can see it may cost more than it's actually worth, but where does the "excruciatingly slow" part come from?

OK, probably I'm misunderstanding something here. :-P

> Chris Wright pointed this out, as well. This just means the final pass should be done at link-time, though. It's not a fundamental problem with generating the warning.

The problem is, the linker knows nothing about the language. Arguably it should, but as things stand currently, it doesn't, and can't, because usually linkers are shipped with the OS, and are expected to link object files of *any* pedigree without needing to code for language-explicit checks.

Perhaps this is slowly starting to change, as LTO and other recent innovations are pushing the envelope of what the linker can do. Maybe one day there will emerge a language-agnostic way for the linker to check for such errors... but I really don't see it happening, because languages *other* than C have already solved the problem with name mangling. There isn't much motivation for linkers to change just because C has some language design issues.

(And note that I'm not trying to disagree with you -- I'm totally in agreement that what C allows is oftentimes extremely dangerous and rather unwise. But the way things are is just so entrenched that it's unlikely to change in the near (or even distant) future.)

> This could be worked around with a little cooperation between the compiler and the linker. It's not even a feature of C the language - it's just the way current tool chains happen to work.

And that's where the sticky part lies. Current toolchains work in this, arguably suboptimal, way mainly because of historical baggage, but more because doing otherwise will make the toolchain incompatible with existing other toolchains and systems. The current divide between compiler and linker is actually IMO not in the best place it could be, as it hampers a lot of what, arguably, should be the compiler's job, not the linker's.

Nevertheless, changing this means you become incompatible with much of the ecosystem and become a walled garden -- like Java (JNI was an afterthought, and requires a very specific setup to even work -- there's definitely no way to link Java objects with OS-level object files without jumping through lots of hoops with lots of caveats). I just don't see this ever happening, especially not for something that, in the big picture, really isn't *that* big of a deal. After all, C coders have gotten used to working with far more dangerous things in C than merely mismatched prototypes; it would take a LOT more than that for people to accept changing the way things work.

T

-- 
Skill without imagination is craftsmanship and gives us many useful objects such as wickerwork picnic baskets. Imagination without skill gives us modern art. -- Tom Stoppard
Feb 05 2016
On Fri, 05 Feb 2016 10:04:01 -0800, H. S. Teoh via Digitalmars-d wrote:
> OK, probably I'm misunderstanding something here. :-P

I think you're talking about maintaining an in-memory, modifiable data structure, doing one insert per operation and one point query per use. That's useful for incremental compilation, but it's going to be pretty slow.

tsbockman is thinking of a single pass at link time that checks everything at once. You append an entry to a list for each prototype and definition, then later sort all those lists together by name. Error on duplicate names with mismatched signatures.

This is faster for fresh builds than it is for incremental compilation -- tsbockman mentioned a brief benchmark, and that cost would crop up on every build, even if you'd only changed one line of code. (Granted, that example was pretty huge.) But this might typically be faster than a bunch of point queries even with incremental compilation.

Anyway, that's why I'm thinking most people who used such a feature would turn it on in their continuous integration server or as a presubmit step rather than every build.

> The problem is, the linker knows nothing about the language.

We're only talking about a linker because we need to run this tool after compiling all your files, and it has to know what input files you're putting into the linker. So this "linker" is really just a shell script that invokes our checker and then calls the system linker.
Feb 05 2016
On Friday, 5 February 2016 at 20:35:16 UTC, Chris Wright wrote:
> I think you're talking about maintaining an in-memory, modifiable data structure, doing one insert per operation and one point query per use. That's useful for incremental compilation, but it's going to be pretty slow.
>
> tsbockman is thinking of a single pass at link time that checks everything at once. You append an entry to a list for each prototype and definition, then later sort all those lists together by name. Error on duplicate names with mismatched signatures.

Yes.

> This is faster for fresh builds than it is for incremental compilation -- tsbockman mentioned a brief benchmark, and that cost would crop up on every build, even if you'd only changed one line of code. (Granted, that example was pretty huge.) But this might typically be faster than a bunch of point queries even with incremental compilation.
>
> Anyway, that's why I'm thinking most people who used such a feature would turn it on in their continuous integration server or as a presubmit step rather than every build.

It doesn't necessarily have to be slow when you only changed one line:

* The list from the previous compilation could be re-used to speed things up considerably, although retaining it would cost some disk space.
* If that's still too expensive, just don't cross-check files that aren't being recompiled. The check will be less useful on incremental builds, but not *useless*.

The CI server can still do the full check (using the compiler), as you suggest.

> We're only talking about a linker because we need to run this tool after compiling all your files, and it has to know what input files you're putting into the linker. So this "linker" is really just a shell script that invokes our checker and then calls the system linker.

Yes. (Or, it's the compiler with a special option set, which then calls the linker after it finishes its global pre-link tasks.)
Feb 05 2016
On Friday, 5 February 2016 at 07:05:06 UTC, H. S. Teoh wrote:
> On Fri, Feb 05, 2016 at 04:02:41AM +0000, tsbockman via Digitalmars-d wrote:
> > 1) Insert that information (together with what file and line number it came from) into a big list in a temporary file.
> > 2) After all modules have been compiled, go back and sort the list by function name.
>
> This would make compilation of large projects excruciatingly slow.

I did some quick tests on my system, and even with 100,000,000 names (more names than there are lines of code in the Linux kernel...) this can be done in less than three minutes. Smaller projects take seconds or less.

I suspect there is a major disconnect between what I meant, and what you think I meant.
Feb 05 2016
On Friday, 5 February 2016 at 01:10:53 UTC, tsbockman wrote:
> All along I have been saying this is something that *compilers* should warn about. As far as I can recall, I never suggested using linters, sanitizers, changing the C standard - or even compiler plugins.

Well, compilers "should" only implement the standard, then they "may" add extra static analysis.

The direction C and C++ take is that increasing compilation times by doing extra static analysis on every build isn't desirable. Therefore compilers should focus on what is necessary for code gen and optimization, and sanitizers should focus on correctness.

This is different from Rust, which does sanitization as part of its compilation, but that makes the compiler more complicated and/or much _slower_.

> (I did suggest the linker as an alternative, but you all have already explained why that can't work for C.)

It can work if you compile all source files with the same compiler. That has historically not been the case, as commercial libraries would be compiled with other compilers or be handwritten assembly.

C compilers that do Whole Program Analysis have dedicated linkers that should be able to do extended type checking if the IR used in the object file provides typing info. I don't know if Clang or GCC emits typing info, though, but they _could_.

Yes.
Feb 05 2016
Let me add to this that the superior approach is to compile to an intermediate high-level format that retains type information. I guess this is where Rust is heading. It just isn't possible with C semantics to make a reasonable version of that, since the language itself is 90% unsafe and just a small step up from assembly (for good and bad).
Feb 05 2016
On 04.02.2016 23:57, tsbockman wrote:
> http://www.underhanded-c.org/#winner
>
> Actually, I'm surprised that this works even in C - I would have expected at least a compiler (or linker?) warning; this seems like it should be easy to detect automatically.

You can do the same thing in D, using extern(C) to get no mangling:

main.d:
----
alias float_t = double;
extern(C) float_t deref(float_t* a);
void main()
{
    import std.stdio: writeln;
    float_t d = 1.23;
    writeln(deref(&d)); /* prints "1.01856e-314" */
}
----

deref.d:
----
alias float_t = float;
extern(C) float_t deref(float_t* a) {return *a;}
----

Command to build and run:
----
dmd main.d deref.d && ./main
----
Feb 04 2016
On Thursday, 4 February 2016 at 23:40:13 UTC, anonymous wrote:
> You can do the same thing in D, using extern(C) to get no mangling:
>
> [...]

You can do the same thing in D if you try, but it's not natural at all to use `extern(C)` for *internal* linkage of an all-D program like that. Any competent reviewer would certainly question why you were using `extern(C)`; this scores much lower in "underhanded-ness" than the original C program.

Even so, I think that qualifies as a compiler bug or a hole in the D spec.
Feb 04 2016
On 05.02.2016 00:47, tsbockman wrote:
> You can do the same thing in D if you try, but it's not natural at all to use `extern(C)` for *internal* linkage of an all-D program like that. Any competent reviewer would certainly question why you were using `extern(C)`; this scores much lower in "underhanded-ness" than the original C program.

We do have a lot of bindings to C libraries, though. When there's a wrong alias in one of them, you have the same scenario.

> Even so, I think that qualifies as a compiler bug or a hole in the D spec.

Can anything be done about it? The compiler simply has no way to verify declarations, has it?
Feb 04 2016
On Thursday, 4 February 2016 at 23:51:57 UTC, anonymous wrote:
> We do have a lot of bindings to C libraries, though. When there's a wrong alias in one of them, you have the same scenario.
>
> On 05.02.2016 00:47, tsbockman wrote:
> > Even so, I think that qualifies as a compiler bug or a hole in the D spec.
>
> Can anything be done about it? The compiler simply has no way to verify declarations, has it?

The compiler cannot (in the general case) verify that `extern(C)` declarations are *correct*. What it could do, though, is verify that they are *consistent*.

If the same `extern(C)` symbol is declared multiple places in the D source code for a program, the compiler should issue at least a warning if the D signatures don't agree with each other.
Feb 04 2016
On Friday, 5 February 2016 at 00:03:20 UTC, tsbockman wrote:
> If the same `extern(C)` symbol is declared multiple places in the D source code for a program, the compiler should issue at least a warning if the D signatures don't agree with each other.

I guess D could do it, although this is a rather unlikely source for bugs. C cannot do it. It would be annoying as declarations are file local. C doesn't really build programs, it builds object files that are linked into a program.

It makes perfect sense for one compilation unit to type a parameter pointer to float and another unit to type the same parameter as a simd-array of floats.

The underlying code could be machine language. And in machine language there are no types (on current CPUs), only bit patterns. So you can have multiple reasonable interpretations of the same machine language entry. A type is a constraint, but it isn't a property of the actual bits, it is a language specific interpretation.
Feb 04 2016
On Friday, 5 February 2016 at 00:12:07 UTC, Ola Fosheim Grøstad wrote:
> It makes perfect sense for one compilation unit to type a parameter pointer to float and another unit to type the same parameter as a simd-array of floats.
>
> The underlying code could be machine language. And in machine language there are no types (on current CPUs), only bit patterns. So you can have multiple reasonable interpretations of the same machine language entry. A type is a constraint, but it isn't a property of the actual bits, it is a language specific interpretation.

Aliasing types like that can be useful sometimes, but only within certain limits. In particular, the size (with alignment padding) of the types in question must match, otherwise you will corrupt the stack.

It is often useful to cast from one pointer type to another, but that is why C has void* and explicit casts - so that one may document that the reinterpretation is intentional.
Feb 04 2016
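tsbockman's two conditions, matching sizes and an explicitly documented reinterpretation, can be illustrated in C. This is a minimal sketch that assumes `float` is a 32-bit IEEE 754 type; `float_bits` is a hypothetical helper, not something from the thread. It copies the bytes with `memcpy` instead of dereferencing a cast pointer, because reading a `float` through a `uint32_t *` would conflict with C's aliasing rules.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Refuse to compile if the two types do not have the same size;
   reinterpreting between differently sized types is the corrupting case. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "float must be 32 bits wide for this reinterpretation");

/* Takes void* so the call site makes no claim about the pointee type;
   the reinterpretation is deliberate and happens only here. */
uint32_t float_bits(const void *p)
{
    uint32_t bits;

    assert(p != NULL);
    memcpy(&bits, p, sizeof bits); /* copy the object representation */
    return bits;
}
```

The size check happens at compile time, so a mismatch like the `float`/`double` one in anonymous's example would stop the build rather than produce a garbage value at run time.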
On 5/02/2016 11:03 AM, tsbockman wrote:
> The compiler cannot (in the general case) verify that `extern(C)` declarations are *correct*. What it could do, though, is verify that they are *consistent*.
>
> If the same `extern(C)` symbol is declared multiple places in the D source code for a program, the compiler should issue at least a warning if the D signatures don't agree with each other.

Currently D allows overloading extern(C) declarations, see https://issues.dlang.org/show_bug.cgi?id=15217

Checking for invalid overloads with non-D linkage is covered here: https://issues.dlang.org/show_bug.cgi?id=2789

But neither of these covers overloads that aren't simultaneously visible. 15217 shows us that this lack of checking, when combined with D's abundant binary-compatible-but-distinct types, is somewhat useful.

Apart from some scary ABI hacks there is nothing really stopping us from enforcing that all non-D functions in all modules included in a single compilation have distinct symbol names or (at least binary-compatible) matching D parameters.
Feb 05 2016
On Friday, 5 February 2016 at 10:49:50 UTC, Daniel Murphy wrote:
> Currently D allows overloading extern(C) declarations, see https://issues.dlang.org/show_bug.cgi?id=15217
>
> Checking for invalid overloads with non-D linkage is covered here: https://issues.dlang.org/show_bug.cgi?id=2789
>
> But neither of these cover overloads that aren't simultaneously visible. 15217 shows us that this lack of checking, when combined with D's abundant binary-compatible-but-distinct types, is somewhat useful.
>
> Apart from some scary ABI hacks there is nothing really stopping us from enforcing that all non-D function in all modules included in a single compilation have distinct symbol names or (at least binary-compatible) matching D parameters.

I think it makes sense (when actually linking to C) to allow stuff like druntime's creative use of overloads. The signatures of the two bsd_signal() overloads are compatible (from C's perspective), so why not?

However, multiple `extern(C)` overloads that differ in the number or size of arguments should trigger a warning. Signed versus unsigned or even int versus floating point is more of a gray area.

Overloads with conflicting pointer types should definitely be allowed, but ideally the compiler would force them to be marked @system or @trusted, since there is an implied unsafe cast in there somewhere.
Feb 05 2016
On 5/02/2016 10:07 PM, tsbockman wrote:
> I think it makes sense (when actually linking to C) to allow stuff like druntime's creative use of overloads. The signatures of the two bsd_signal() overloads are compatible (from C's perspective), so why not?
>
> However, multiple `extern(C)` overloads that differ in the number or size of arguments should trigger a warning. Signed versus unsigned or even int versus floating point is more of a gray area.

That's what I meant by binary compatible.

> Overloads with conflicting pointer types should definitely be allowed, but ideally the compiler would force them to be marked system or trusted, since there is an implied unsafe cast in there somewhere.

Safety on C functions is always going to need to be hand verified; the presence of overloads doesn't change that. Conflicting pointer types are pretty much the same as a function taking void* - all the unsafe stuff is on the other side and invisible to the D compiler.
Feb 05 2016
On Thu, Feb 04, 2016 at 11:47:53PM +0000, tsbockman via Digitalmars-d wrote:
[...]
> You can do the same thing in D if you try, but it's not natural at all to use `extern(C)` for *internal* linkage of an all-D program like that. Any competent reviewer would certainly question why you were using `extern(C)`; this scores much lower in "underhanded-ness" than the original C program.
>
> Even so, I think that qualifies as a compiler bug or a hole in the D spec.

Nah... while D, by default, tries to be type-safe and prevent guffaws like the above, it *is* also a systems programming language (or at least, that's one of the stated goals), so it does allow you to go under the hood to do things that you normally aren't allowed to do.

Linking to foreign languages is a use case for allowing extern(C) function names: if you know the mangling scheme of the target language, you can declare the mangled name under extern(C) and that will allow D code to call functions written in the target language directly. Otherwise you'd have to change the compiler (and wait for the next release, etc.) before you could do that.

T

-- 
Do not reason with the unreasonable; you lose by definition.
Feb 04 2016
On Thu, 04 Feb 2016 15:59:06 -0800, H. S. Teoh via Digitalmars-d wrote:
> Nah... while D, by default, tries to be type-safe and prevent guffaws like the above, it *is* also a systems programming language (or at least, that's one of the stated goals), so it does allow you to go under the hood to do things that you normally aren't allowed to do.

Which suggests a check of this sort should be a warning rather than an error, or perhaps that a pragma or attribute could be offered to ignore it. Systems languages let you go into "Here Be Dragons" territory, but it would be nice if they still pointed out the signs to you.
Feb 04 2016
On Friday, 5 February 2016 at 00:07:45 UTC, Chris Wright wrote:
> Which suggests a check of this sort should be a warning rather than an error, or perhaps that a pragma or attribute could be offered to ignore it. Systems languages let you go into "Here Be Dragons" territory, but it would be nice if they still pointed out the signs to you.

Yes.
Feb 04 2016
On Thursday, 4 February 2016 at 23:59:06 UTC, H. S. Teoh wrote:
> On Thu, Feb 04, 2016 at 11:47:53PM +0000, tsbockman via Digitalmars-d wrote:
> [...]
> > Even so, I think that qualifies as a compiler bug or a hole in the D spec.
>
> Nah... while D, by default, tries to be type-safe and prevent guffaws like the above, it *is* also a systems programming language (or at least, that's one of the stated goals), so it does allow you to go under the hood to do things that you normally aren't allowed to do.
>
> Linking to foreign languages is a use case for allowing extern(C) function names: [...]

I'm not saying that `extern(C)` is bad in general; I understand why it's necessary.

I'm saying that anonymous' example (http://forum.dlang.org/post/n90ngu$1r6v$1@digitalmars.com) showcases a hole in the spec, because in it the D compiler has access to the full source code of the function being linked to, and doesn't bother to verify that its signature in main.d is compatible with the definition in deref.d.

If the D compiler does *not* have access to the function's definition, then obviously it cannot perform this verification.
Feb 04 2016
On Thursday, 4 February 2016 at 22:57:00 UTC, tsbockman wrote:
> The first place entry is particularly ridiculous; is there any modern language that would make it so easy to commit such an awful "mistake"?

D allows that. This is why I recommend putting `static assert(foo.sizeof == expectation);` in code that interfaces with external things, like C code, or D .di stuff.

    #include <math.h> /* sqrt */

that line is an interesting one too: the trick is depending on namespace pollution by the include. In D, you might write `import core.stdc.math : sqrt;` and make that misleading comment part of the code.... though then you could perhaps exploit that module bug (314?).
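A minimal sketch of the `static assert` guard recommended above. The struct `Sample`, its fields, and the constant `expectedSize` are hypothetical names chosen for illustration; in real code the expected value would come from whatever the external (C or .di) side was actually built with.

```d
import std.stdio : writeln;

// Hypothetical D-side mirror of an external struct,
// e.g. a C `struct sample { float re; float im; };`.
struct Sample
{
    float re;
    float im;
}

// Assumed expectation: two 4-byte floats, no padding.
enum expectedSize = 8;

// Compilation fails if someone changes a field type (say, float -> double)
// on the D side without updating the external definition.
static assert(Sample.sizeof == expectedSize,
    "Sample no longer matches the layout the external code was built with");
static assert(Sample.im.offsetof == 4);

void main()
{
    writeln("Sample.sizeof = ", Sample.sizeof);
}
```

The check costs nothing at run time, but it only guards the D declaration against a number written down by hand; it cannot tell whether that number still matches the other side.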
Feb 04 2016
On Friday, 5 February 2016 at 01:14:05 UTC, Adam D. Ruppe wrote:
> D allows that. This is why I recommend putting `static assert(foo.sizeof == expectation);` in code that interfaces with external things, like C code, or D .di stuff.

D *doesn't* allow that though - at least, not in a monolithic, idiomatic D program: there wouldn't be any duplicate declaration of `spectral_contrast()` to mess up.

Yes, you can force the matter using `extern(C)` like anonymous demonstrated earlier - but using `extern(C)` for internal linkage in an all-D program would certainly attract scrutiny from reviewers; it would score poorly on the "underhanded-ness" test.

As to the ".di" stuff - I've not used them. Care to educate me? How can they cause similar problems?

>     #include <math.h> /* sqrt */
>
> that line is an interesting one too: the trick is depending on namespace pollution by the include. In D, you might write `import core.stdc.math : sqrt;` and make that misleading comment part of the code.... though then you could perhaps exploit that module bug (314?).

314 definitely has potential. Should we start an "Underhanded D" contest? Sounds like bad marketing, but a lot of fun :-P
Feb 04 2016
On Friday, 5 February 2016 at 01:33:14 UTC, tsbockman wrote:
> As to the ".di" stuff - I've not used them. Care to educate me? How can they cause similar problems?

Well, technically, a .di file is just a .d file renamed, but it tends to have the bodies stripped out.

Separate compilation is a supported feature of D. The way you'd do it is something like this:

    struct Foo { float a; float b; }
    void bar(Foo* f) { f.b = whatever; }

Then compile it with -lib and make a "header" file manually:

    struct Foo { double a; double b; }
    void bar(Foo*);

You can now create D modules that import this and link against the compiled library. Very similar to C's model...

But I redefined Foo! The name mangling won't catch this. bar will be mangled to take `Foo` as an argument and the linker will catch if we change that, but it doesn't know what Foo actually is. By changing that, we introduce the problem.

> 314 definitely has potential. Should we start an "Underhanded D" contest? Sounds like bad marketing, but a lot of fun :-P

it might be :)
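The effect of the mismatched header can be simulated in a single file. This is only a sketch under stated assumptions: the two layouts are renamed `FooLib` and `FooHeader` so that they can coexist in one module (in the scenario above both are called `Foo`), the cast stands in for what the linker silently permits, and the stored value `1.5f` is arbitrary.

```d
import std.stdio : writefln;

// Layout the library was actually compiled with.
struct FooLib { float a; float b; }

// Layout declared in the hand-written "header".
struct FooHeader { double a; double b; }

// Stand-in for the library's bar(); it writes through the library's layout.
void bar(FooLib* f) { f.b = 1.5f; }

void main()
{
    auto h = FooHeader(0.0, 0.0);

    // What the caller effectively does when the header lies about Foo:
    // it passes memory laid out one way to code expecting another.
    bar(cast(FooLib*) &h);

    writefln("sizeof:      lib=%s header=%s", FooLib.sizeof, FooHeader.sizeof);
    writefln("offset of b: lib=%s header=%s",
             FooLib.b.offsetof, FooHeader.b.offsetof);

    // bar() wrote 4 bytes at offset 4, which lands inside h.a;
    // h.b, at offset 8, was never touched.
    writefln("h.a = %s, h.b = %s", h.a, h.b);
}
```

The library writes at offset 4 while the caller reads `b` at offset 8, so the caller sees `b` unchanged and finds `a` corrupted instead. Nothing in the symbol name records the difference, which is why the linker has nothing to object to.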
Feb 04 2016
On Friday, 5 February 2016 at 04:25:09 UTC, Adam D. Ruppe wrote:
> On Friday, 5 February 2016 at 01:33:14 UTC, tsbockman wrote:
>> As to the ".di" stuff - I've not used them. Care to educate me? How can they cause similar problems?
>
> Well, technically, a .di file is just a .d file renamed, but it tends to have the bodies stripped out.
>
> Separate compilation is a supported feature of D. The way you'd do it is something like this:
>
>     struct Foo { float a; float b; }
>     void bar(Foo* f) { f.b = whatever; }
>
> Then compile it with -lib and make a "header" file manually:
>
>     struct Foo { double a; double b; }
>     void bar(Foo*);
>
> You can now create D modules that import this and link against the compiled library. Very similar to C's model...
>
> But I redefined Foo! The name mangling won't catch this. bar will be mangled to take `Foo` as an argument and the linker will catch if we change that, but it doesn't know what Foo actually is. By changing that, we introduce the problem.
>
>> 314 definitely has potential. Should we start an "Underhanded D" contest? Sounds like bad marketing, but a lot of fun :-P
>
> it might be :)

Thanks for the explanation. That does sound basically the same as the C issue.

Since .di files are normally generated automatically, this seems like an easily solvable problem:

1) When compiling a library and its attendant .di file(s), generate a unique version identifier (such as a UUID or a hash of the completed binary) and append it to both the library and each .di file.

2) Whenever someone tries to link against the library, verify that the version ID matches. If it does not, issue a prominent warning.

Problem solved? Or is this harder than it looks?

(Of course there are various details to consider, such as how to efficiently share one set of .di files across many platforms/compiler settings; this is just a rough sketch.)
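A related idea can be sketched in user code today, with the caveat that it is a swapped-in technique and not the scheme proposed above: instead of a UUID or a hash of the binary, derive a fingerprint from each struct's size, field types, and field offsets at compile time, and compare the library's fingerprint with the header's. The helper `layoutFingerprint` and the struct names are hypothetical, and how the library's fingerprint would be stored and checked at link time is left open.

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical helper: describe a struct's layout as a string.
// Usable at compile time (CTFE) because it only touches type properties.
string layoutFingerprint(T)()
{
    string s = T.sizeof.to!string;
    // Unrolled at compile time: one iteration per field of T.
    foreach (i, F; typeof(T.tupleof))
    {
        s ~= "|" ~ F.stringof ~ "@" ~ T.tupleof[i].offsetof.to!string;
    }
    return s;
}

// The two conflicting definitions of Foo, renamed so that both
// fit in one file.
struct FooLib { float a; float b; }
struct FooHeader { double a; double b; }

enum libId = layoutFingerprint!FooLib();
enum headerId = layoutFingerprint!FooHeader();

// A real check would assert equality; here the mismatch is the point.
static assert(libId != headerId);

void main()
{
    writeln("library: ", libId);
    writeln("header:  ", headerId);
}
```

Unlike a hash of the whole binary, a layout fingerprint changes only when the layout changes, so recompiling the library without touching the struct would not invalidate existing headers.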
Feb 04 2016
On Fri, Feb 05, 2016 at 04:39:13AM +0000, tsbockman via Digitalmars-d wrote:
[...]
> Thanks for the explanation. That does sound basically the same as the C issue.
>
> Since .di files are normally generated automatically, this seems like an easily solvable problem:
>
> 1) When compiling a library and its attendant .di file(s), generate a unique version identifier (such as a UUID or a hash of the completed binary) and append it to both the library and each .di file.
>
> 2) Whenever someone tries to link against the library, verify that the version ID matches. If it does not, issue a prominent warning.
[...]

This would break shared library upgrades that do not change the ABI.

Plus, it doesn't fix wrong linkage at runtime, because the dynamic linker is part of the OS and the D compiler has no control over what it does beyond the standard symbol matching and relocation mechanisms. If you compile against libfoo, but at runtime the user happens to have a stale, ABI-incompatible version of libfoo hanging around that gets picked up by the dynamic linker, you'll have the same problem.

T

-- 
VI = Visual Irritation
Feb 04 2016
On Friday, 5 February 2016 at 07:15:56 UTC, H. S. Teoh wrote:
> This would break shared library upgrades that do not change the ABI.
>
> Plus, it doesn't fix wrong linkage at runtime, because the dynamic linker is part of the OS and the D compiler has no control over what it does beyond the standard symbol matching and relocation mechanisms. If you compile against libfoo, but at runtime the user happens to have a stale, ABI-incompatible version of libfoo hanging around that gets picked up by the dynamic linker, you'll have the same problem.

I should have clarified that I was considering static libraries, only. (I thought D's dynamic library support was kind of broken right at the moment, anyway?)

Dynamic libraries are definitely a harder problem. I think useful automated protection against bad .di files could be developed for dynamic libraries as well, but the scheme wouldn't be anywhere near as simple and it might require the maintainer to actually follow SemVer to be useful.
Feb 04 2016