digitalmars.D - Type safety could prevent nuclear war

reply tsbockman <thomas.bockman gmail.com> writes:
The annual Underhanded C Contest announced their winners today.

As always, the results are very entertaining, and also an 
excellent advertisement for languages-that-are-not-C.

The first place entry is particularly ridiculous; is there any 
modern language that would make it so easy to commit such an 
awful "mistake"?

http://www.underhanded-c.org/#winner

Actually, I'm surprised that this works even in C - I would have 
expected at least a compiler (or linker?) warning; this seems 
like it should be easy to detect automatically.
Feb 04 2016
next sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Thu, Feb 04, 2016 at 10:57:00PM +0000, tsbockman via Digitalmars-d wrote:
 The annual Underhanded C Contest announced their winners today.
 
 As always, the results are very entertaining, and also an excellent
 advertisement for languages-that-are-not-C.
 
 The first place entry is particularly ridiculous; is there any modern
 language that would make it so easy to commit such an awful "mistake"?
 
 http://www.underhanded-c.org/#winner
 
 Actually, I'm surprised that this works even in C - I would have
 expected at least a compiler (or linker?) warning; this seems like it
 should be easy to detect automatically.
The C preprocessor accepts all sorts of nasty, nonsensical things. For example, the following code compiles and runs (without any warning(!) on my Linux box's standard gcc installation), and prints "No":

    #include <stdio.h>
    #define if(a) if(!(a))
    int main() {
        int i = 1;
        if (i == 1)
            printf("Yes\n");
        else
            printf("No\n");
    }

Imagine if this nasty #define is buried somewhere under several layers of #include's.

I'm pretty sure somebody can also concoct some nasty #define that will break the standard #include headers in horrible ways by changing the semantics of certain supposedly-built-in constructs.

T

--
Mediocrity has been pushed to extremes.
Feb 04 2016
next sibling parent reply tsbockman <thomas.bockman gmail.com> writes:
On Thursday, 4 February 2016 at 23:10:23 UTC, H. S. Teoh wrote:
 On Thu, Feb 04, 2016 at 10:57:00PM +0000, tsbockman via 
 Digitalmars-d wrote:
 The annual Underhanded C Contest announced their winners today.
 
 As always, the results are very entertaining, and also an 
 excellent advertisement for languages-that-are-not-C.
 
 The first place entry is particularly ridiculous; is there any 
 modern language that would make it so easy to commit such an 
 awful "mistake"?
 
 http://www.underhanded-c.org/#winner
 
 Actually, I'm surprised that this works even in C - I would 
 have expected at least a compiler (or linker?) warning; this 
 seems like it should be easy to detect automatically.
The C preprocessor accepts all sorts of nasty, nonsensical things.
Definitely. What puzzles me about the winning entry, though, is that the compiler and/or linker should be able to trivially detect the type mismatch *after* the preprocessor pass(es) are already done. It should just see that the post-preprocessor signatures of `spectral_contrast()` in match.c and spectral_contrast.c are in conflict, and either issue a warning, or refuse to link them at all.
Feb 04 2016
next sibling parent reply Ola Fosheim Grøstad writes:
On Thursday, 4 February 2016 at 23:21:54 UTC, tsbockman wrote:
 Definitely. What puzzles me about the winning entry, though, is 
 that the compiler and/or linker should be able to trivially 
 detect the type mismatch *after* the preprocessor pass(es) are 
 already done.
Linkers don't know anything about types. A type is a language feature.
 It should just see that the post-preprocessor signatures of 
 `spectral_contrast()` in match.c and spectral_contrast.c are in 
 conflict, and either issue a warning, or refuse to link them at 
 all.
Has nothing to do with the preprocessor. He defined float_t to be an alias for double in one compilation unit, and float_t to be an alias for float in another compilation unit. In C, compilation units are completely independent, and can in fact come from different compilers and different languages. C is very much a system level programming language.
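To make the consequence concrete, here is a rough single-file model of that kind of mismatch. It is only a sketch under stated assumptions: the data is assumed to reach the callee through a pointer, `float` and `double` are assumed to be 4-byte and 8-byte IEEE 754 types, and the function names `sum_as_float` and `caller_view` are invented for illustration. The real case involves two separately compiled files; `memcpy` stands in for the callee's differing view of the same bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callee: compiled believing its samples are 4-byte floats. */
float sum_as_float(const void *samples, size_t count)
{
    float total = 0.0f;
    for (size_t i = 0; i < count; i++) {
        float f;
        /* Copy the raw bytes of the i-th "float" out of the buffer. */
        memcpy(&f, (const unsigned char *)samples + i * sizeof f, sizeof f);
        total += f;
    }
    return total;
}

/* Hypothetical caller: compiled believing the same samples are doubles. */
float caller_view(void)
{
    double samples[1] = { 1.0 };
    /* Assuming 8-byte double and 4-byte float, the callee sees one
       double as two unrelated floats. */
    return sum_as_float(samples, sizeof samples / sizeof(float));
}
```

Under those assumptions the single double 1.0 is read as the two floats 0.0 and 1.875 (in an order that depends on endianness), so the callee computes with values the caller never wrote, and nothing in the object files records the disagreement.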
Feb 04 2016
next sibling parent reply tsbockman <thomas.bockman gmail.com> writes:
On Thursday, 4 February 2016 at 23:25:58 UTC, Ola Fosheim Grøstad 
wrote:
 On Thursday, 4 February 2016 at 23:21:54 UTC, tsbockman wrote:
 It should just see that the post-preprocessor signatures of 
 `spectral_contrast()` in match.c and spectral_contrast.c are 
 in conflict, and either issue a warning, or refuse to link 
 them at all.
Has nothing to do with the preprocessor.
Yes, that was my point...
 He defined float_t to be an alias for double in one compilation 
 unit, and float_t to be an alias for float in another 
 compilation unit.

 In C, compilation units are completely independent, and can in 
 fact come from different compilers and different languages. C 
 is very much a system level programming language.
Just because *sometimes* the source code of the other module must be compiled independently, is a poor excuse to skip obvious, useful safety checks *all* the time.
Feb 04 2016
parent reply Ola Fosheim Grøstad writes:
On Thursday, 4 February 2016 at 23:35:46 UTC, tsbockman wrote:
 Just because *sometimes* the source code of the other module 
 must be compiled independently, is a poor excuse to skip 
 obvious, useful safety checks *all* the time.
The context is a compilation system for building big software on very slow CPUs with kilobytes of RAM. C was designed for always compiling independently and compiling source files that are bigger than what can be held in RAM, and also for building executables that can fill most of system RAM. So the compilation system was designed for using external memory (disk) and that affects C a lot. The forerunner for C, BCPL was a bootstrap language for writing compilers. So C is minimal by design. BTW, C++ programmers sometimes use similar unsafe hacks of "pruned header files" to break dependencies and speed up compilation. So this is not unique to C, but C++ introduced the mangling of types into names to support overloading of functions on parameter types, which is why C++ detects (some) type issues at link time.
Feb 04 2016
parent reply tsbockman <thomas.bockman gmail.com> writes:
On Thursday, 4 February 2016 at 23:53:58 UTC, Ola Fosheim Grøstad 
wrote:
 On Thursday, 4 February 2016 at 23:35:46 UTC, tsbockman wrote:
 Just because *sometimes* the source code of the other module 
 must be compiled independently, is a poor excuse to skip 
 obvious, useful safety checks *all* the time.
The context is a compilation system for building big software on very slow CPUs with kilobytes of RAM. C was designed for always compiling independently and compiling source files that are bigger than what can be held in RAM, and also for building executables that can fill most of system RAM. So the compilation system was designed for using external memory (disk) and that affects C a lot. The forerunner for C, BCPL was a bootstrap language for writing compilers. So C is minimal by design.
OK. That's a good reason for C's original design. But it's 2016 and my PC has 32GiB of RAM. Why should a C compiler running on such a system skip safety checks just because they would be too expensive to run on some *other* computer? This isn't even a particularly expensive (in compile-time costs) check to perform anyway; all that is necessary is to store a temporary table of symbol signatures somewhere (it doesn't need to be in RAM), and check that any duplicate entries are consistent with each other before linking. This is already a solved problem in most other programming languages; there is no fundamental reason that the solutions used in D, C++, or Java could not be applied to C - without even changing any of the language semantics.
Feb 04 2016
next sibling parent reply Ola Fosheim Grøstad writes:
On Friday, 5 February 2016 at 00:14:11 UTC, tsbockman wrote:
 But it's 2016 and my PC has 32GiB of RAM. Why should a C 
 compiler running on such a system skip safety checks just 
 because they would be too expensive to run on some *other* 
 computer?
C has to be backwards compatible, but I don't know why people do larger projects in C in 2016. Libraries are done in C for portability and because it provides a FFI interface defined as the ABI by hardware and OS vendors. BeOS tried to define a specific C++ compiler as their ABI, but it was problematic. C++ does not have an ABI, you cannot link object files from So, basically, there is no suitable industry standard other than C.
 This is already a solved problem in most other programming 
 languages; there is no fundamental reason that the solutions 
 used in D, C++, or Java could not be applied to C - without 
 even changing any of the language semantics.
D and C++ change. C uses the ABI defined by the hardware/OS vendor. It is locked in stone, frozen, beyond discussion. As mentioned BeOS adopted C++. Apple has adopted Objective-C and Swift. But how can you make _all_ the other vendors (Microsoft, Google, IBM etc) standardize on something that isn't C?
 Aliasing types like that can be useful sometimes, but only 
 within certain limits. In particular, the size (with alignment 
 padding) of the types in question must match, otherwise you 
 will corrupt the stack.
I see where you are coming from, but I meant what I said literally. Machine language only deals with bit patterns. When we interface with machine language we just add lots of constraints on what we hand over to it. Adding _more_ constraints than the creator of the machine language code intended is never wrong. Not adding enough constraints is not ideal, but often difficult to avoid if we care about performance. So if I write a piece of machine language code and give you the object file, you only have my word for what the input is supposed to be. And then you have to make a formulation of the constraints that fits your use case and is expressible in your language. Different languages have different levels of expressiveness for describing and enforcing type constraints.
Feb 04 2016
parent reply tsbockman <thomas.bockman gmail.com> writes:
On Friday, 5 February 2016 at 00:41:52 UTC, Ola Fosheim Grøstad 
wrote:
 On Friday, 5 February 2016 at 00:14:11 UTC, tsbockman wrote:
 But it's 2016 and my PC has 32GiB of RAM. Why should a C 
 compiler running on such a system skip safety checks just 
 because they would be too expensive to run on some *other* 
 computer?
C has to be backwards compatible, but I don't know why people do larger projects in C in 2016. [...]
Why would simply adding a warning change any of that? No ABI changes are required. Backwards compatibility is not broken.
Feb 04 2016
parent Ola Fosheim Grøstad writes:
On Friday, 5 February 2016 at 00:50:32 UTC, tsbockman wrote:
 On Friday, 5 February 2016 at 00:41:52 UTC, Ola Fosheim Grøstad 
 wrote:
 On Friday, 5 February 2016 at 00:14:11 UTC, tsbockman wrote:
 But it's 2016 and my PC has 32GiB of RAM. Why should a C 
 compiler running on such a system skip safety checks just 
 because they would be too expensive to run on some *other* 
 computer?
C has to be backwards compatible, but I don't know why people do larger projects in C in 2016. [...]
Why would simply adding a warning change any of that? No ABI changes are required. Backwards compatibility is not broken.
Not sure what you mean by adding a warning. You can probably find sanitizers that do it, but the standard does not require warnings for anything (AFAIK). That is up to compiler vendors. As for why C isn't displaced by something better, maybe the right question is: why don't new languages stick to the C ABI and provide sensible C code gen. Well, they want more features... and features... and features... There is probably a market for it, but nobody can be bothered to create and maintain a simple modern system level language.
Feb 04 2016
prev sibling parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Fri, Feb 05, 2016 at 12:14:11AM +0000, tsbockman via Digitalmars-d wrote:
[...]
 This isn't even a particularly expensive (in compile-time costs) check
 to perform anyway; all that is necessary is to store a temporary table
 of symbol signatures somewhere (it doesn't need to be in RAM), and
 check that any duplicate entries are consistent with each other before
 linking.
That's a lot more expensive than you think. There's a reason most modern linkers do not do full cross-referencing of symbols -- because doing so would be excruciatingly slow and consume gobs of memory. Even a 32GB machine would not be able to hold *all* the symbols in some very large software projects, and looking things up on disk is unacceptably slow for software of those sizes. Most modern linkers instead use faster algorithms that rely on clever scheduling of the order of symbol resolution, just so they *don't* have to cross-reference all symbols at once.

Besides, all this is unnecessary work. All you need to do is to have C compilers mangle function names. Mission accomplished. (However, this *will* break a lot of existing inter-language code that relies on being able to spell out symbols explicitly. So it probably will not fly. But, in theory, it *is* possible...)

And to paraphrase one of my favorite Walter quotes: fixing inconsistent function signatures is only plugging one hole in a cheese grater. C has far more dangerous gotchas than just function signature mismatches.

T

--
They say that "guns don't kill people, people kill people." Well I think the gun helps. If you just stood there and yelled BANG, I don't think you'd kill too many people. -- Eddie Izzard, Dressed to Kill
Feb 04 2016
prev sibling parent tsbockman <thomas.bockman gmail.com> writes:
On Thursday, 4 February 2016 at 23:25:58 UTC, Ola Fosheim Grøstad 
wrote:
 In C, compilation units are completely independent, and can in 
 fact come from different compilers and different languages. C 
 is very much a system level programming language.
I should also point out that D can link to (more or less) anything that C can, and yet does not have the weakness exploited by the winning entry. The only real reason that D is one whit less of a "system level programming language" than C is the heavyweight runtime library - but that is irrelevant to the problem of type-checking cross-module references within the same code base.
Feb 04 2016
prev sibling parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Thu, Feb 04, 2016 at 11:21:54PM +0000, tsbockman via Digitalmars-d wrote:
[...]
 Definitely. What puzzles me about the winning entry, though, is that the
 compiler and/or linker should be able to trivially detect the type mismatch
 *after* the preprocessor pass(es) are already done.
It cannot, because C symbols are not mangled. The function name uniquely identifies the function, and the signature is not encoded anywhere. The linker knows nothing about types or parameters; all it knows is that within offset X of binary blob B, there's a binary number (usually a 32- or 64-bit address) associated with a symbol that it needs to replace with the value (i.e., address) of that symbol, which it obtains from the object file that defines that symbol.

So as far as the linker is concerned, the function names match up, and that's all there is to it. C provides zero protection against calling functions with mismatched parameters if the caller is not in the same file, and does not have the right declaration. E.g.:

    /* module1.c */
    void func(int a, int b) { ... }

    /* module2.c */
    extern int func(double x); /* I'm too lazy to #include a header */
    int main() {
        int x = func(1.0); /* kaboom */
    }

In theory, this problem is solved by #include'ing the appropriate header file, but even that isn't free from accidents like forgetting to update the header after you change the function signature. Of course, most sane C projects will also #include the header in the file that defines the function, in which case, finally, the compiler will catch the mistake. But you can see just how fragile this is, and how many points of failure it has, and, believe it or not, there *are* still C projects out there that don't follow the convention of one header per .c file, and of those that do, a frightening number do not #include the header in the .c file.

This isn't the whole story, either. Even if you follow said conventions to prevent function signature mismatches, problems can still occur. For instance, I once had to debug a mysterious crash in an enterprise project whose cause, seemingly, could not be found in the code. It turned out to be caused by two shared libraries that defined two different functions under the same name. Since the conflicting functions are in separately-compiled libraries, the compiler is oblivious to the conflict. Furthermore, the linker doesn't detect it either, because, being shared libraries, all the linker knows is that it found symbol X in library1, so it didn't bother looking for symbol X again in library2, which is processed afterward. An unrelated code change caused the order of libraries linked to change, and suddenly now the linker finds symbol X in library2 first, leading to the function call being linked to the wrong implementation. So at runtime, kaboom.

Name mangling singlehandedly solves all of the above problems.

T

--
Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald Knuth
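The order dependence in the shared-library story can be modelled with a toy "first match wins" lookup. This is only a sketch: `struct library` and `first_provider` are names made up for illustration, and real linkers and dynamic loaders follow more elaborate rules than a linear scan.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of a library: a name plus a NULL-terminated export list. */
struct library {
    const char *name;
    const char *const *exports;
};

/* Return the name of the first library, in link order, that exports
   `symbol`, or NULL if none does. Later libraries are never consulted
   once a match is found, so a duplicate definition goes unnoticed. */
const char *first_provider(const struct library *libs, size_t nlibs,
                           const char *symbol)
{
    for (size_t i = 0; i < nlibs; i++)
        for (const char *const *e = libs[i].exports; *e != NULL; e++)
            if (strcmp(*e, symbol) == 0)
                return libs[i].name;
    return NULL;
}
```

With two libraries that both export X, swapping their positions in the array changes which one `first_provider` returns, and nothing signals that a second, different X exists.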
Feb 04 2016
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/4/2016 3:10 PM, H. S. Teoh via Digitalmars-d wrote:
 The C preprocessor accepts all sorts of nasty, nonsensical things.
The preprocessor makes C++ into an inherently unreliable, unsafe programming language. I've talked to some C++ committee members about this, about why there is no push to get rid of (or at least deprecate) all use of the preprocessor. The general reaction I get is that it is unimportant to do so.
Feb 04 2016
prev sibling next sibling parent reply Ola Fosheim Grøstad writes:
On Thursday, 4 February 2016 at 22:57:00 UTC, tsbockman wrote:
 Actually, I'm surprised that this works even in C - I would 
 have expected at least a compiler (or linker?) warning; this 
 seems like it should be easy to detect automatically.
AFAICT C would have complained if he had included <math.h>. This is a rather unlikely mistake... Anyway, in C being able to work around restrictions is sometimes desired, so... if you don't want the ability to do it, don't use C.
Feb 04 2016
parent tsbockman <thomas.bockman gmail.com> writes:
On Thursday, 4 February 2016 at 23:19:20 UTC, Ola Fosheim Grøstad 
wrote:
 On Thursday, 4 February 2016 at 22:57:00 UTC, tsbockman wrote:
 Actually, I'm surprised that this works even in C - I would 
 have expected at least a compiler (or linker?) warning; this 
 seems like it should be easy to detect automatically.
AFAICT C would have complained if he had included <math.h>. This is a rather unlikely mistake... Anyway, in C being able to work around restrictions is sometimes desired, so... if you don't want the ability to do it, don't use C.
What restriction does not checking, by default, that the parameter types match allow one to work around, though? C already has `void*` and explicit casts, either of which would allow one to explicitly indicate that type checking is not desired.
Feb 04 2016
prev sibling next sibling parent reply Chris Wright <dhasenan gmail.com> writes:
On Thu, 04 Feb 2016 22:57:00 +0000, tsbockman wrote:

 The annual Underhanded C Contest announced their winners today.
 
 As always, the results are very entertaining, and also an excellent
 advertisement for languages-that-are-not-C.
 
 The first place entry is particularly ridiculous; is there any modern
 language that would make it so easy to commit such an awful "mistake"?
 
 http://www.underhanded-c.org/#winner
 
 Actually, I'm surprised that this works even in C - I would have
 expected at least a compiler (or linker?) warning; this seems like it
 should be easy to detect automatically.
C linkage does zero name mangling, which is the problem. C++ introduced name mangling, so compiling with g++ would show the error rather quickly. C99 is pretty close to C++98, but there are enough differences that that isn't a reliable diagnostic. (Though if you're familiar with the differences, you could use it as a quick way to show potential problem areas.) I suppose a compiler could produce two symbol tables, one featuring mangled names and one with unmangled names. The linker would prefer matching mangled names and issue a warning if it only had an unmangled match with a mangled false match.
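A minimal sketch of that two-table idea, assuming each symbol carries its plain C name plus an optional type-encoding name. The encoding strings used with it (such as "func(int,int)") are invented for illustration and are not any real mangling scheme; `struct symbol` and `resolve` are likewise hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A symbol as seen by this hypothetical linker. `mangled` may be NULL
   for objects built by an older compiler or another language. */
struct symbol {
    const char *plain;
    const char *mangled;
};

enum resolution { RESOLVED, RESOLVED_WITH_WARNING, UNRESOLVED };

/* Prefer an exact mangled match. If only the plain name matches and both
   sides carry mangled names that disagree, still resolve but warn. */
enum resolution resolve(const struct symbol *defs, size_t ndefs,
                        const struct symbol *ref)
{
    int saw_plain = 0, saw_mismatch = 0;
    for (size_t i = 0; i < ndefs; i++) {
        if (strcmp(defs[i].plain, ref->plain) != 0)
            continue;
        saw_plain = 1;
        if (defs[i].mangled == NULL || ref->mangled == NULL)
            continue;               /* no type information to compare */
        if (strcmp(defs[i].mangled, ref->mangled) == 0)
            return RESOLVED;        /* names and types both agree */
        saw_mismatch = 1;           /* same name, different types */
    }
    if (!saw_plain)
        return UNRESOLVED;
    return saw_mismatch ? RESOLVED_WITH_WARNING : RESOLVED;
}
```

Falling back to the plain name keeps existing inter-language code linking as before; the warning only appears when both sides supplied type information and it disagrees.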
Feb 04 2016
parent reply tsbockman <thomas.bockman gmail.com> writes:
On Thursday, 4 February 2016 at 23:24:21 UTC, Chris Wright wrote:
 C linkage does zero name mangling, which is the problem. C++ 
 introduced name mangling, so compiling with g++ would show the 
 error rather quickly. C99 is pretty close to C++98, but there 
 are enough differences that that isn't a reliable diagnostic. 
 (Though if you're familiar with the differences, you could use 
 it as a quick way to show potential problem areas.)

 I suppose a compiler could produce two symbol tables, one 
 featuring mangled names and one with unmangled names. The 
 linker would prefer matching mangled names and issue a warning 
 if it only had an unmangled match with a mangled false match.
That explains why the linker doesn't catch it. I still don't see much excuse for the compiler allowing it though, beyond a desire to allow each module to be compiled independently.
Feb 04 2016
next sibling parent Ola Fosheim Grøstad writes:
On Thursday, 4 February 2016 at 23:29:10 UTC, tsbockman wrote:
 That explains why the linker doesn't catch it. I still don't 
 see much excuse for the compiler allowing it though, beyond a 
 desire to allow each module to be compiled independently.
The excuse is that C uses the same mechanism for creating bindings to C and non-C code. It is actually very handy. IF you want a system level language and full separation of compilation units (which allows for very fast compilation).
Feb 04 2016
prev sibling parent reply Chris Wright <dhasenan gmail.com> writes:
On Thu, 04 Feb 2016 23:29:10 +0000, tsbockman wrote:

 That explains why the linker doesn't catch it. I still don't see much
 excuse for the compiler allowing it though, beyond a desire to allow
 each module to be compiled independently.
Doing this sort of validation requires build system integration (track the command line arguments that went into producing this object file; find which object files are combined into which targets; run the analysis on that) and costs as much time as compiling the whole project from scratch. Developing such a system is nontrivial, so it's not a matter of conjuring excuses; rather, someone would have to put in considerable effort to make it work.

I'm betting some of the commercial static analyzers for C do this, but they're not the sort of things you install on every dev machine and run on every build. Generally they're the sort of thing that you send off to the security company and they send you a report some weeks later.
Feb 04 2016
parent reply tsbockman <thomas.bockman gmail.com> writes:
On Friday, 5 February 2016 at 00:03:56 UTC, Chris Wright wrote:
 Doing this sort of validation requires build system integration 
 (track the command line arguments that went into producing this 
 object file; find which object files are combined into which 
 targets; run the analysis on that) and costs as much time as 
 compiling the whole project from scratch.
There is no need to take "as much time as compiling the whole project from scratch". The necessary information is already gathered during the normal course of compilation; all that is required is to actually save it somewhere until link-time, instead of throwing it away. The time required for the check should be at most O(N log(N)), where N is the number of function and global variable declarations in the project. The space required for the table is O(N). In both cases the constant factors should be quite small.
 Developing such a system is nontrivial, so it's not a matter of
 conjuring excuses; rather, someone would have to put in
 considerable effort to make it work.
Adding any interesting feature to a build system is usually nontrivial, but I still think you're overestimating the cost of this one. Again, the hard part (finding all the signatures and processing them into a semantically meaningful form) is already being done by the compiler. The results just need to be saved, sorted, and scanned for conflicts.
Feb 04 2016
parent reply Chris Wright <dhasenan gmail.com> writes:
On Fri, 05 Feb 2016 00:38:16 +0000, tsbockman wrote:

 On Friday, 5 February 2016 at 00:03:56 UTC, Chris Wright wrote:
 Doing this sort of validation requires build system integration (track
 the command line arguments that went into producing this object file;
 find which object files are combined into which targets; run the
 analysis on that) and costs as much time as compiling the whole project
 from scratch.
There is no need to take "as much time as compiling the whole project from scratch". The necessary information is already gathered during the normal course of compilation; all that is required is to actually save it somewhere until link-time, instead of throwing it away.
True. That works if this is baked into your compiler, or if your compiler has plugin support. And you'd have to compile with this plugin or the relevant options turned on by default in order for you not to duplicate work. That's partly an engineering issue (build this thing in this particular way) and partly a social issue (get people to run it by default; have them add the extra flag to the makefile to specify to create the relevant output; possibly get your compiler vendor to build it in, depending on what compiler your devs are using). I imagine Google, to take a random example where I have experience, would add this as a presubmit step rather than requiring it on every build.
Feb 04 2016
parent reply tsbockman <thomas.bockman gmail.com> writes:
On Friday, 5 February 2016 at 00:56:16 UTC, Chris Wright wrote:
 True. That works if this is baked into your compiler, or if 
 your compiler has plugin support. And you'd have to compile 
 with this plugin or the relevant options turned on by default 
 in order for you not to duplicate work.
On Friday, 5 February 2016 at 00:56:28 UTC, Ola Fosheim Grøstad wrote:
 Not sure what you mean by adding a warning. You can probably 
 find sanitizers that do it, but the standard does not require 
 warnings for anything (AFAIK). That is up to compiler vendors.
Quoting myself (emphasis added): On Thursday, 4 February 2016 at 22:57:00 UTC, tsbockman wrote:
 Actually, I'm surprised that this works even in C - I would 
 have expected at least a COMPILER (or linker?) warning; this 
 seems like it should be easy to detect automatically.
All along I have been saying this is something that *compilers* should warn about. As far as I can recall, I never suggested using linters, sanitizers, changing the C standard - or even compiler plugins. (I did suggest the linker as an alternative, but you all have already explained why that can't work for C.)
Feb 04 2016
next sibling parent reply Chris Wright <dhasenan gmail.com> writes:
On Fri, 05 Feb 2016 01:10:53 +0000, tsbockman wrote:

 All along I have been saying this is something that *compilers* should
 warn about.
The compiler doesn't have all the information you need. You could add it to the build system or the linker as well as the compiler. Adding it to the linker is almost identical to my previous suggestion of adding optional name mangling to C.
Feb 04 2016
parent reply tsbockman <thomas.bockman gmail.com> writes:
On Friday, 5 February 2016 at 03:46:37 UTC, Chris Wright wrote:
 On Fri, 05 Feb 2016 01:10:53 +0000, tsbockman wrote:
 The compiler doesn't have all the information you need. You 
 could add it to the build system or the linker as well as the 
 compiler. Adding it to the linker is almost identical to my 
 previous suggestion of adding optional name mangling to C.
What information, specifically, is the compiler missing?

The compiler already computes the name and type signature of each function. As far as I can see, all that is necessary is to:

1) Insert that information (together with what file and line number it came from) into a big list in a temporary file.

2) After all modules have been compiled, go back and sort the list by function name.

3) Finally, scan the list for entries that share the same name, but have incompatible type signatures. Emit warning messages as needed. (The compiler should be used for this step, because it already has a lot of information about C's type system built into it that can help define "incompatible" sensibly.)

As far as I can see, this requires an extra pass, but no additional information. What am I missing?
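Steps 2 and 3 could be sketched roughly as follows. The assumptions are loud ones: the table is held in memory rather than in a temporary file, `struct decl` and `count_conflicts` are invented names, and "incompatible" is reduced to a plain string comparison of a made-up canonical signature text, where a real compiler would apply C's type compatibility rules.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One recorded declaration (step 1 would append these during compilation). */
struct decl {
    const char *name;       /* symbol name */
    const char *signature;  /* canonical type text; format is hypothetical */
    const char *file;
    int line;
};

/* qsort comparator: order declarations by symbol name. */
static int by_name(const void *a, const void *b)
{
    const struct decl *x = a, *y = b;
    return strcmp(x->name, y->name);
}

/* Step 2: sort by name. Step 3: compare each entry with its neighbour.
   Returns the number of adjacent same-name pairs whose signatures differ. */
size_t count_conflicts(struct decl *table, size_t n)
{
    size_t conflicts = 0;
    qsort(table, n, sizeof table[0], by_name);
    for (size_t i = 1; i < n; i++) {
        if (strcmp(table[i].name, table[i - 1].name) == 0 &&
            strcmp(table[i].signature, table[i - 1].signature) != 0) {
            fprintf(stderr, "warning: '%s' declared as %s at %s:%d but as %s at %s:%d\n",
                    table[i].name,
                    table[i - 1].signature, table[i - 1].file, table[i - 1].line,
                    table[i].signature, table[i].file, table[i].line);
            conflicts++;
        }
    }
    return conflicts;
}
```

The sort dominates the cost, which is where the O(N log(N)) estimate comes from; the scan afterwards is a single linear pass.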
Feb 04 2016
next sibling parent reply Chris Wright <dhasenan gmail.com> writes:
On Fri, 05 Feb 2016 04:02:41 +0000, tsbockman wrote:

 On Friday, 5 February 2016 at 03:46:37 UTC, Chris Wright wrote:
 On Fri, 05 Feb 2016 01:10:53 +0000, tsbockman wrote:
 The compiler doesn't have all the information you need. You could add
 it to the build system or the linker as well as the compiler. Adding it
 to the linker is almost identical to my previous suggestion of adding
 optional name mangling to C.
What information, specifically, is the compiler missing?
It doesn't know what targets I'm ultimately creating, and it doesn't know what files have been modified that I'm about to compile (but haven't compiled yet).

Example 1:

I compile one .c file referencing a function:

    void foo(int);

That's going to end up in libfoo.so.

I compile another .c file in the same directory defining a function:

    void foo(float);

That's going to end up in libbar.so.

No bug here. (The linker should tell us if someone depends on foo from libbar and foo from libfoo in the same executable.)

How does your putative compiler plugin handle it? Either I have to define a build rule for every source file to specify where to put this symbol cache (and you need to add parameters for the plugin to look for multiple caches, because libfoo and libbar share a lot of source files), or the plugin gives me false positives.

Example 2:

I compile a.c:

    int foo(int i) { return i + 1; }

In the course of refactoring, I delete that function from a.c and add it to b.c with modifications:

    int foo(int i, int increment) { return i + increment; }

My build script recompiles b.c before it recompiles a.c. Your compiler plugin produces a build error, halting my build. I have to make clean && make in order to proceed -- and that's assuming I know your tool doesn't work well with incremental compilation.

The first problem might be uncommon, but the second would crop up constantly. They have the same fix: collect the information when you compile, evaluate it when you link.
Feb 04 2016
parent tsbockman <thomas.bockman gmail.com> writes:
On Friday, 5 February 2016 at 06:05:49 UTC, Chris Wright wrote:
 It doesn't know what targets I'm ultimately creating, and it 
 doesn't know what files have been modified that I'm about to 
 compile (but haven't compiled yet).

 Example 1:

 I compile one .c file referencing a function:
 void foo(int);

 That's going to end up in libfoo.so.

 I compile another .c file in the same directory defining a 
 function:
 void foo(float);

 That's going to end up in libbar.so.

 No bug here. (The linker should tell us if someone depends on 
 foo from libbar and foo from libfoo in the same executable.)

 How does your putative compiler plugin handle it? Either I have 
 to define a build rule for every source file to specify where 
 to put this symbol cache (and you need to add parameters for 
 the plugin to look for multiple caches, because libfoo and 
 libbar share a lot of source files), or the plugin gives me 
 false positives.

 Example 2:

 I compile a.c:
 int foo(int i) { return i + 1; }

 In the course of refactoring, I delete that function from a.c 
 and add it
 to b.c with modifications:
 int foo(int i, int increment) { return i + increment; }

 My build script recompiles b.c before it recompiles a.c. Your 
 compiler plugin produces a build error, halting my build. I 
 have to make clean && make in order to proceed -- and that's 
 assuming I know your tool doesn't work well with incremental 
 compilation.

 The first problem might be uncommon, but the second would crop 
 up constantly. They have the same fix: collect the information 
 when you compile, evaluate it when you link.
No spurious error is generated by my proposal in your example 2, because I specifically stated that the extra pass must be done once, after *all* modules have been compiled.

I see, however, that this would require one of:

1) Modifying build scripts to pass the complete list of .c files to the compiler in a single command, or
2) Modifying build scripts to run the compiler one extra time after processing all the .c files, or
3) Run the final check at link-time.

For a C tool chain with a clean-sheet design, any of those would handle example 2 fine. (1) or (3) could also handle example 1 without issue.

However, as you say, only (3) is backwards compatible with existing make files and what-not. (This is not a limitation of the C language or ABI, though.)
Feb 04 2016
prev sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Fri, Feb 05, 2016 at 04:02:41AM +0000, tsbockman via Digitalmars-d wrote:
 On Friday, 5 February 2016 at 03:46:37 UTC, Chris Wright wrote:
On Fri, 05 Feb 2016 01:10:53 +0000, tsbockman wrote:
The compiler doesn't have all the information you need. You could add it
to the build system or the linker as well as the compiler. Adding it to
the linker is almost identical to my previous suggestion of adding
optional name mangling to C.
 What information, specifically, is the compiler missing?

 The compiler already computes the name and type signature of each function. As far as I can see, all that is necessary is to:

 1) Insert that information (together with what file and line number it came from) into a big list in a temporary file.
 2) After all modules have been compiled, go back and sort the list by function name.
This would make compilation of large projects excruciatingly slow.
 3) Finally, scan the list for entries that share the same name, but
 have incompatible type signatures. Emit warning messages as needed.
 (The compiler should be used for this step, because it already has a
 lot of information about C's type system built into it that can help
 define "incompatible" sensibly.)
This fails for multi-executable projects, which may legally have different functions under the same name. (Even though that's arguably a very bad idea.)
 As far as I can see, this requires an extra pass, but no additional
 information. What am I missing?
The fact that the C compiler only sees one file at a time, and has no idea which one, if any, of them will even end up in the final executable. Many projects produce multiple executables with some shared sources between them, and only the build system knows which file(s) go with which executables.

So as others have said, this can only work for compilers that are aware of a larger picture than just the single source file they're currently compiling. Even in D, for a sufficiently large project the compiler can't see everything at once either, because it won't fit into your RAM.

Thankfully, D doesn't suffer from this particular problem because of name mangling. Which is why I said, adding name mangling to the C compiler will solve this problem. Except that it breaks existing inter-language code, so it won't work for *all* C programs. And it will also break linkage with existing shared libraries, which are *not* name-mangled. (Recompiling said libraries may not be an option if they are OEM, binary-only blobs.) So it can only work for self-contained, independent projects with no inter-language linkage, which would be a very restricted subset of C codebases.

T

--
Nobody is perfect. I am Nobody. -- pepoluan, GKC forum
Feb 04 2016
next sibling parent reply tsbockman <thomas.bockman gmail.com> writes:
On Friday, 5 February 2016 at 07:05:06 UTC, H. S. Teoh wrote:
 On Fri, Feb 05, 2016 at 04:02:41AM +0000, tsbockman via 
 Digitalmars-d wrote:
 On Friday, 5 February 2016 at 03:46:37 UTC, Chris Wright wrote:
On Fri, 05 Feb 2016 01:10:53 +0000, tsbockman wrote:
 What information, specifically, is the compiler missing?

 The compiler already computes the name and type signature of each function. As far as I can see, all that is necessary is to:

 1) Insert that information (together with what file and line number it came from) into a big list in a temporary file.
 2) After all modules have been compiled, go back and sort the list by function name.
This would make compilation of large projects excruciatingly slow.
It's a small fraction of the total data being handled by the compiler (smaller than the source code), and the list could probably be directly generated in a partially sorted state. Little-to-no random access to the list is required at any point in the process. It does not ever need to all be in RAM at the same time. I can see it may cost more than it's actually worth, but where does the "excruciatingly slow" part come from?
 3) Finally, scan the list for entries that share the same 
 name, but have incompatible type signatures. Emit warning 
 messages as needed. (The compiler should be used for this 
 step, because it already has a lot of information about C's 
 type system built into it that can help define "incompatible" 
 sensibly.)
This fails for multi-executable projects, which may legally have different functions under the same name. (Even though that's arguably a very bad idea.)
Chris Wright pointed this out, as well. This just means the final pass should be done at link-time, though. It's not a fundamental problem with generating the warning.
 As far as I can see, this requires an extra pass, but no 
 additional information. What am I missing?
The fact that the C compiler only sees one file at a time, and has no idea which one, if any, of them will even end up in the final executable. Many projects produce multiple executables with some shared sources between them, and only the build system knows which file(s) go with which executables.
This could be worked around with a little cooperation between the compiler and the linker. It's not even a feature of C the language - it's just the way current tool chains happen to work.
Feb 04 2016
parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Fri, Feb 05, 2016 at 07:31:34AM +0000, tsbockman via Digitalmars-d wrote:
 On Friday, 5 February 2016 at 07:05:06 UTC, H. S. Teoh wrote:
On Fri, Feb 05, 2016 at 04:02:41AM +0000, tsbockman via Digitalmars-d
wrote:
On Friday, 5 February 2016 at 03:46:37 UTC, Chris Wright wrote:
On Fri, 05 Feb 2016 01:10:53 +0000, tsbockman wrote:
What information, specifically, is the compiler missing?

The compiler already computes the name and type signature of each function. As far as I can see, all that is necessary is to:

1) Insert that information (together with what file and line number it came from) into a big list in a temporary file.
2) After all modules have been compiled, go back and sort the list by function name.
This would make compilation of large projects excruciatingly slow.
It's a small fraction of the total data being handled by the compiler (smaller than the source code), and the list could probably be directly generated in a partially sorted state. Little-to-no random access to the list is required at any point in the process. It does not ever need to all be in RAM at the same time. I can see it may cost more than it's actually worth, but where does the "excruciatingly slow" part come from?
OK, probably I'm misunderstanding something here. :-P
3) Finally, scan the list for entries that share the same name, but
have incompatible type signatures. Emit warning messages as needed.
(The compiler should be used for this step, because it already has a
lot of information about C's type system built into it that can help
define "incompatible" sensibly.)
This fails for multi-executable projects, which may legally have different functions under the same name. (Even though that's arguably a very bad idea.)
Chris Wright pointed this out, as well. This just means the final pass should be done at link-time, though. It's not a fundamental problem with generating the warning.
The problem is, the linker knows nothing about the language. Arguably it should, but as things stand currently, it doesn't, and can't, because usually linkers are shipped with the OS, and are expected to link object files of *any* pedigree without needing to code for language-explicit checks.

Perhaps this is slowly starting to change, as LTO and other recent innovations are pushing the envelope of what the linker can do. Maybe one day there will emerge a language-agnostic way for the linker to check for such errors... but I really don't see it happening, because languages *other* than C have already solved the problem with name mangling. There isn't much motivation for linkers to change just because C has some language design issues.

(And note that I'm not trying to disagree with you -- I'm totally in agreement that what C allows is oftentimes extremely dangerous and rather unwise. But the way things are is just so entrenched that it's unlikely to change in the near (or even distant) future.)
As far as I can see, this requires an extra pass, but no additional
information. What am I missing?
The fact that the C compiler only sees one file at a time, and has no idea which one, if any, of them will even end up in the final executable. Many projects produce multiple executables with some shared sources between them, and only the build system knows which file(s) go with which executables.
This could be worked around with a little cooperation between the compiler and the linker. It's not even a feature of C the language - it's just the way current tool chains happen to work.
And that's where the sticky part lies. Current toolchains work in this, arguably suboptimal, way mainly because of historical baggage, but more because doing otherwise will make the toolchain incompatible with existing other toolchains and systems.

The current divide between compiler and linker is actually IMO not in the best place it could be, as it hampers a lot of what, arguably, should be the compiler's job, not the linker's. Nevertheless, changing this means you become incompatible with much of the ecosystem and become a walled garden -- like Java (JNI was an afterthought, and requires a very specific setup to even work -- there's definitely no way to link Java objects with OS-level object files without jumping through lots of hoops with lots of caveats).

I just don't see this ever happening, especially not for something that, in the big picture, really isn't *that* big of a deal. After all, C coders have gotten used to working with far more dangerous things in C than merely mismatched prototypes; it would take a LOT more than that for people to accept changing the way things work.

T

--
Skill without imagination is craftsmanship and gives us many useful objects such as wickerwork picnic baskets. Imagination without skill gives us modern art. -- Tom Stoppard
Feb 05 2016
parent reply Chris Wright <dhasenan gmail.com> writes:
On Fri, 05 Feb 2016 10:04:01 -0800, H. S. Teoh via Digitalmars-d wrote:

 On Fri, Feb 05, 2016 at 07:31:34AM +0000, tsbockman via Digitalmars-d
 wrote:
 On Friday, 5 February 2016 at 07:05:06 UTC, H. S. Teoh wrote:
On Fri, Feb 05, 2016 at 04:02:41AM +0000, tsbockman via Digitalmars-d
wrote:
On Friday, 5 February 2016 at 03:46:37 UTC, Chris Wright wrote:
On Fri, 05 Feb 2016 01:10:53 +0000, tsbockman wrote:
What information, specifically, is the compiler missing?

The compiler already computes the name and type signature of each function. As far as I can see, all that is necessary is to:

1) Insert that information (together with what file and line number it came from) into a big list in a temporary file.
2) After all modules have been compiled, go back and sort the list by function name.
This would make compilation of large projects excruciatingly slow.
It's a small fraction of the total data being handled by the compiler (smaller than the source code), and the list could probably be directly generated in a partially sorted state. Little-to-no random access to the list is required at any point in the process. It does not ever need to all be in RAM at the same time. I can see it may cost more than it's actually worth, but where does the "excruciatingly slow" part come from?
OK, probably I'm misunderstanding something here. :-P
I think you're talking about maintaining an in-memory, modifiable data structure, doing one insert per operation and one point query per use. That's useful for incremental compilation, but it's going to be pretty slow.

tsbockman is thinking of a single pass at link time that checks everything at once. You append an entry to a list for each prototype and definition, then later sort all those lists together by name. Error on duplicate names with mismatched signatures.

This is faster for fresh builds than it is for incremental compilation -- tsbockman mentioned a brief benchmark, and that cost would crop up on every build, even if you'd only changed one line of code. (Granted, that example was pretty huge.) But this might typically be faster than a bunch of point queries even with incremental compilation.

Anyway, that's why I'm thinking most people who used such a feature would turn it on in their continuous integration server or as a presubmit step rather than every build.
 The problem is, the linker knows nothing about the language.
We're only talking about a linker because we need to run this tool after compiling all your files, and it has to know what input files you're putting into the linker. So this "linker" is really just a shell script that invokes our checker and then calls the system linker.
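The append, sort and scan steps described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `struct decl`, `compare_decls` and `check_decls` are hypothetical names, the list is held in memory rather than in a temporary file, and a plain string comparison of signatures stands in for a real test of whether two C types are compatible.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One record per prototype or definition seen during compilation.
   The layout is hypothetical, chosen only for this sketch. */
struct decl {
    const char *name;      /* symbol name, e.g. "foo" */
    const char *signature; /* normalized signature, e.g. "int(int)" */
    const char *file;      /* file the declaration came from */
    int line;              /* line number within that file */
};

/* Order records by name, then by signature, so that records which
   agree with each other end up next to each other. */
static int compare_decls(const void *a, const void *b)
{
    const struct decl *x = a;
    const struct decl *y = b;
    int by_name = strcmp(x->name, y->name);
    return by_name != 0 ? by_name : strcmp(x->signature, y->signature);
}

/* Sort the list in place, then scan neighbours: a warning is printed
   whenever two adjacent records share a name but not a signature.
   Returns the number of warnings printed. */
size_t check_decls(struct decl *list, size_t count)
{
    size_t mismatches = 0;
    assert(list != NULL || count == 0);
    if (count > 1)
        qsort(list, count, sizeof list[0], compare_decls);
    for (size_t i = 1; i < count; i++) {
        const struct decl *prev = &list[i - 1];
        const struct decl *cur = &list[i];
        if (strcmp(prev->name, cur->name) == 0 &&
            strcmp(prev->signature, cur->signature) != 0) {
            fprintf(stderr,
                    "warning: '%s' is %s at %s:%d but %s at %s:%d\n",
                    cur->name, prev->signature, prev->file, prev->line,
                    cur->signature, cur->file, cur->line);
            mismatches++;
        }
    }
    return mismatches;
}
```

Feeding it the two `foo` definitions from example 2 (one from a stale a.c, one from b.c) would produce one warning. Restricting the input to the records of the object files actually passed to the linker is what avoids the false positive in example 1.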
Feb 05 2016
parent tsbockman <thomas.bockman gmail.com> writes:
On Friday, 5 February 2016 at 20:35:16 UTC, Chris Wright wrote:
 On Fri, 05 Feb 2016 10:04:01 -0800, H. S. Teoh via 
 Digitalmars-d wrote:

 On Fri, Feb 05, 2016 at 07:31:34AM +0000, tsbockman via 
 Digitalmars-d wrote:
 On Friday, 5 February 2016 at 07:05:06 UTC, H. S. Teoh wrote:
OK, probably I'm misunderstanding something here. :-P
I think you're talking about maintaining an in-memory, modifiable data structure, doing one insert per operation and one point query per use. That's useful for incremental compilation, but it's going to be pretty slow. tsbockman is thinking of a single pass at link time that checks everything at once. You append an entry to a list for each prototype and definition, then later sort all those lists together by name. Error on duplicate names with mismatched signatures.
Yes.
 This is faster for fresh builds than it is for incremental 
 compilation -- tsbockman mentioned a brief benchmark, and that 
 cost would crop up on every build, even if you'd only changed 
 one line of code. (Granted, that example was pretty huge.) But 
 this might typically be faster than a bunch of point queries 
 even with incremental compilation.

 Anyway, that's why I'm thinking most people who used such a 
 feature would turn it on in their continuous integration server 
 or as a presubmit step rather than every build.
It doesn't necessarily have to be slow when you only changed one line:

* The list from the previous compilation could be re-used to speed things up considerably, although retaining it would cost some disk space.
* If that's still too expensive, just don't cross-check files that aren't being recompiled. The check will be less useful on incremental builds, but not *useless*.

The CI server can still do the full check (using the compiler), as you suggest.
 The problem is, the linker knows nothing about the language.
We're only talking about a linker because we need to run this tool after compiling all your files, and it has to know what input files you're putting into the linker. So this "linker" is really just a shell script that invokes our checker and then calls the system linker.
Yes. (Or, it's the compiler with a special option set, which then calls the linker after it finishes its global pre-link tasks.)
Feb 05 2016
prev sibling parent tsbockman <thomas.bockman gmail.com> writes:
On Friday, 5 February 2016 at 07:05:06 UTC, H. S. Teoh wrote:
 On Fri, Feb 05, 2016 at 04:02:41AM +0000, tsbockman via 
 Digitalmars-d wrote:
 1) Insert that information (together with what file and line 
 number it
 came from) into a big list in a temporary file.
 2) After all modules have been compiled, go back and sort the 
 list by
 function name.
This would make compilation of large projects excruciatingly slow.
I did some quick tests on my system, and even with 100,000,000 names (more names than there are lines of code in the Linux kernel...) this can be done in less than three minutes. Smaller projects take seconds or less. I suspect there is a major disconnect between what I meant, and what you think I meant.
Feb 05 2016
prev sibling parent reply Ola Fosheim Grøstad writes:
On Friday, 5 February 2016 at 01:10:53 UTC, tsbockman wrote:
 All along I have been saying this is something that *compilers* 
 should warn about. As far as I can recall, I never suggested 
 using linters, sanitizers, changing the C standard - or even 
 compiler plugins.
Well, compilers "should" only implement the standard, then they "may" add extra static analysis.

The direction C and C++ takes is that increasing compilation times by doing extra static analysis on every build isn't desirable. Therefore compilers should focus on what is necessary for code gen and optimization and sanitizers should focus on correctness.

This is different from Rust, which does sanitization as part of its compilation, but that makes the compiler more complicated and/or much _slower_.
 (I did suggest the linker as an alternative, but you all have 
 already explained why that can't work for C.)
It can work if you compile all source files with the same compiler; that has historically not been the case, as commercial libraries would be compiled with other compilers or be handwritten assembly.

C compilers that do Whole Program Analysis have dedicated linkers that should be able to do extended type checking if the IR used in the object file provides typing info. I don't know if Clang or GCC does emit typing info though, but they _could_. Yes.
Feb 05 2016
parent Ola Fosheim Grøstad writes:
Let me add to this that the superior approach is to compile to an 
intermediate high-level format that retains type information. I 
guess this is where Rust is heading.

It just isn't possible with C semantics to make a reasonable 
version of that, since the language itself is 90% unsafe and just 
a small step up from assembly (for good and bad).
Feb 05 2016
prev sibling next sibling parent reply anonymous <anonymous example.com> writes:
On 04.02.2016 23:57, tsbockman wrote:
 http://www.underhanded-c.org/#winner

 Actually, I'm surprised that this works even in C - I would have
 expected at least a compiler (or linker?) warning; this seems like it
 should be easy to detect automatically.
You can do the same thing in D, using extern(C) to get no mangling:

main.d:
----
alias float_t = double;
extern(C) float_t deref(float_t* a);
void main()
{
    import std.stdio: writeln;
    float_t d = 1.23;
    writeln(deref(&d)); /* prints "1.01856e-314" */
}
----

deref.d:
----
alias float_t = float;
extern(C) float_t deref(float_t* a) {return *a;}
----

Command to build and run:
----
dmd main.d deref.d && ./main
----
Feb 04 2016
parent reply tsbockman <thomas.bockman gmail.com> writes:
On Thursday, 4 February 2016 at 23:40:13 UTC, anonymous wrote:
 You can do the same thing in D, using extern(C) to get no 
 mangling:

 main.d:
 ----
 alias float_t = double;
 extern(C) float_t deref(float_t* a);
 void main()
 {
     import std.stdio: writeln;
     float_t d = 1.23;
     writeln(deref(&d)); /* prints "1.01856e-314" */
 }
 ----

 deref.d:
 ----
 alias float_t = float;
 extern(C) float_t deref(float_t* a) {return *a;}
 ----

 Command to build and run:
 ----
 dmd main.d deref.d && ./main
 ----
You can do the same thing in D if you try, but it's not natural at all to use `extern(C)` for *internal* linkage of an all-D program like that.

Any competent reviewer would certainly question why you were using `extern(C)`; this scores much lower in "underhanded-ness" than the original C program.

Even so, I think that qualifies as a compiler bug or a hole in the D spec.
Feb 04 2016
next sibling parent reply anonymous <anonymous example.com> writes:
On 05.02.2016 00:47, tsbockman wrote:
 You can do the same thing in D if you try, but it's not natural at all
 to use `extern(C)` for *internal* linkage of an all-D program like that.

 Any competent reviewer would certainly question why you were using
 `extern(C)`; this scores much lower in "underhanded-ness" than the
 original C program.
We do have a lot of bindings to C libraries, though. When there's a wrong alias in one of them, you have the same scenario.
 Even so, I think that qualifies as a compiler bug or a hole in the D spec.
Can anything be done about it? The compiler simply has no way to verify declarations, has it?
Feb 04 2016
parent reply tsbockman <thomas.bockman gmail.com> writes:
On Thursday, 4 February 2016 at 23:51:57 UTC, anonymous wrote:
 We do have a lot of bindings to C libraries, though. When 
 there's a wrong alias in one of them, you have the same 
 scenario.

 On 05.02.2016 00:47, tsbockman wrote:
 Even so, I think that qualifies as a compiler bug or a hole in 
 the D spec.
Can anything be done about it? The compiler simply has no way to verify declarations, has it?
The compiler cannot (in the general case) verify that `extern(C)` declarations are *correct*. What it could do, though, is verify that they are *consistent*. If the same `extern(C)` symbol is declared multiple places in the D source code for a program, the compiler should issue at least a warning if the D signatures don't agree with each other.
Feb 04 2016
next sibling parent reply Ola Fosheim Grøstad writes:
On Friday, 5 February 2016 at 00:03:20 UTC, tsbockman wrote:
 If the same `extern(C)` symbol is declared multiple places in 
 the D source code for a program, the compiler should issue at 
 least a warning if the D signatures don't agree with each other.
I guess D could do it, although this is a rather unlikely source for bugs.

C cannot do it. It would be annoying as declarations are file local. C doesn't really build programs, it builds object files that are linked into a program.

It makes perfect sense for one compilation unit to type a parameter pointer to float and another unit to type the same parameter as a simd-array of floats. The underlying code could be machine language. And in machine language there are no types (on current CPUs), only bit patterns. So you can have multiple reasonable interpretations of the same machine language entry.

A type is a constraint, but it isn't a property of the actual bits, it is a language specific interpretation.
Feb 04 2016
parent tsbockman <thomas.bockman gmail.com> writes:
On Friday, 5 February 2016 at 00:12:07 UTC, Ola Fosheim Grøstad 
wrote:
 It makes perfect sense for one compilation unit to type a 
 parameter pointer to float  and another unit to type the same 
 parameter as a simd-array of floats. The underlying code could 
 be machine language. And in machine language there are no types 
 (on current CPUs), only bit patterns. So you can have multiple 
 reasonable interpretations of the same machine language entry.

 A type is a constraint, but it isn't a property of the actual 
 bits, it is a language specific interpretation.
Aliasing types like that can be useful sometimes, but only within certain limits. In particular, the size (with alignment padding) of the types in question must match, otherwise you will corrupt the stack. It is often useful to cast from one pointer type to another, but that is why C has void* and explicit casts - so that one may document that the reinterpretation is intentional.
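A small C sketch of what an intentional, documented reinterpretation might look like. The helper names `float_bits` and `first_float` are hypothetical, C11 is assumed for `_Static_assert`, and the size check is exactly the limit mentioned above: the build fails if the two types do not have the same size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Refuse to compile if the two types differ in size (C11). */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "float and uint32_t must have the same size");

/* Hypothetical helper: return the bit pattern of a float.
   Copying the bytes with memcpy makes the reinterpretation explicit
   and does not depend on casting between unrelated pointer types. */
uint32_t float_bits(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Hypothetical helper: the void* parameter tells the reader that the
   callee decides how to interpret the bytes, and the cast is visible
   at the one place where that decision is made. */
float first_float(const void *data)
{
    const float *values = (const float *)data; /* intentional cast */
    assert(values != NULL);
    return values[0];
}
```

On a platform with IEEE 754 single-precision floats, `float_bits(1.0f)` yields `0x3F800000`. A mismatched prototype in another translation unit offers neither the size check nor the visible cast.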
Feb 04 2016
prev sibling parent reply Daniel Murphy <yebbliesnospam gmail.com> writes:
On 5/02/2016 11:03 AM, tsbockman wrote:
 The compiler cannot (in the general case) verify that `extern(C)`
 declarations are *correct*. What it could do, though, is verify that
 they are *consistent*.

 If the same `extern(C)` symbol is declared multiple places in the D
 source code for a program, the compiler should issue at least a warning
 if the D signatures don't agree with each other.
Currently D allows overloading extern(C) declarations, see
https://issues.dlang.org/show_bug.cgi?id=15217

Checking for invalid overloads with non-D linkage is covered here:
https://issues.dlang.org/show_bug.cgi?id=2789

But neither of these cover overloads that aren't simultaneously visible. 15217 shows us that this lack of checking, when combined with D's abundant binary-compatible-but-distinct types, is somewhat useful.

Apart from some scary ABI hacks there is nothing really stopping us from enforcing that all non-D function in all modules included in a single compilation have distinct symbol names or (at least binary-compatible) matching D parameters.
Feb 05 2016
parent reply tsbockman <thomas.bockman gmail.com> writes:
On Friday, 5 February 2016 at 10:49:50 UTC, Daniel Murphy wrote:
 Currently D allows overloading extern(C) declarations, see
 https://issues.dlang.org/show_bug.cgi?id=15217

 Checking for invalid overloads with non-D linkage is covered 
 here:
 https://issues.dlang.org/show_bug.cgi?id=2789

 But neither of these cover overloads that aren't simultaneously 
 visible.
 15217 shows us that this lack of checking, when combined with 
 D's abundant binary-compatible-but-distinct types, is somewhat 
 useful.

 Apart from some scary ABI hacks there is nothing really 
 stopping us from enforcing that all non-D function in all 
 modules included in a single compilation have distinct symbol 
 names or (at least binary-compatible) matching D parameters.
I think it makes sense (when actually linking to C) to allow stuff like druntime's creative use of overloads. The signatures of the two bsd_signal() overloads are compatible (from C's perspective), so why not?

However, multiple `extern(C)` overloads that differ in the number or size of arguments should trigger a warning. Signed versus unsigned or even int versus floating point is more of a gray area.

Overloads with conflicting pointer types should definitely be allowed, but ideally the compiler would force them to be marked @system or @trusted, since there is an implied unsafe cast in there somewhere.
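The "number or size of arguments" rule could be sketched like this, written in C for illustration. Everything here is an assumption made for the sketch: `struct abi_sig`, `MAX_PARAMS` and `sizes_compatible` are hypothetical names, and only parameter count and parameter sizes are compared. A real check would also need to consider the gray areas above, since an ABI may pass integers and floating-point values differently even when their sizes match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PARAMS 8 /* arbitrary limit for this sketch */

/* Hypothetical, deliberately simplified view of a C-linkage signature:
   only the number of parameters and the size of each one. */
struct abi_sig {
    size_t nparams;
    size_t param_size[MAX_PARAMS];
};

/* Two signatures pass this check when they take the same number of
   parameters and every corresponding pair has the same size. */
bool sizes_compatible(const struct abi_sig *a, const struct abi_sig *b)
{
    assert(a != NULL && b != NULL);
    assert(a->nparams <= MAX_PARAMS && b->nparams <= MAX_PARAMS);
    if (a->nparams != b->nparams)
        return false;
    for (size_t i = 0; i < a->nparams; i++) {
        if (a->param_size[i] != b->param_size[i])
            return false;
    }
    return true;
}
```

Under this rule two overloads taking `float*` and `double*` are accepted on platforms where both pointers have the same size, while overloads taking `float` and `double` by value are flagged wherever those two types differ in size.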
Feb 05 2016
parent Daniel Murphy <yebbliesnospam gmail.com> writes:
On 5/02/2016 10:07 PM, tsbockman wrote:
 I think it makes sense (when actually linking to C) to allow stuff like
 druntime's creative use of overloads. The signatures of the two
 bsd_signal() overloads are compatible (from C's perspective), so why not?

 However, multiple `extern(C)` overloads that differ in the number or
 size of arguments should trigger a warning. Signed versus unsigned or
 even int versus floating point is more of a gray area.
That's what I meant by binary compatible.
 Overloads with conflicting pointer types should definitely be allowed,
 but ideally the compiler would force them to be marked @system or
 @trusted, since there is an implied unsafe cast in there somewhere.
Safety on C functions is always going to need to be hand-verified; the presence of overloads doesn't change that. Conflicting pointer types are pretty much the same as a function taking void* - all the unsafe stuff is on the other side and invisible to the D compiler.
Feb 05 2016
prev sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Thu, Feb 04, 2016 at 11:47:53PM +0000, tsbockman via Digitalmars-d wrote:
[...]
 You can do the same thing in D if you try, but it's not natural at all
 to use `extern(C)` for *internal* linkage of an all-D program like
 that.
 
 Any competent reviewer would certainly question why you were using
 `extern(C)`; this scores much lower in "underhanded-ness" than the
 original C program.
 
 Even so, I think that qualifies as a compiler bug or a hole in the D
 spec.
Nah... while D, by default, tries to be type-safe and prevent guffaws like the above, it *is* also a systems programming language (or at least, that's one of the stated goals), so it does allow you to go under the hood to do things that you normally aren't allowed to do.

Linking to foreign languages is a use case for allowing extern(C) function names: if you know the mangling scheme of the target language, you can declare the mangled name under extern(C) and that will allow D code to call functions written in the target language directly. Otherwise you'd have to change the compiler (and wait for the next release, etc.) before you could do that.

T

--
Do not reason with the unreasonable; you lose by definition.
Feb 04 2016
next sibling parent reply Chris Wright <dhasenan gmail.com> writes:
On Thu, 04 Feb 2016 15:59:06 -0800, H. S. Teoh via Digitalmars-d wrote:

 Nah... while D, by default, tries to be type-safe and prevent guffaws
 like the above, it *is* also a systems programming language (or at
 least, that's one of the stated goals), so it does allow you to go under
 the hood to do things that you normally aren't allowed to do.
Which suggests a check of this sort should be a warning rather than an error, or perhaps that a pragma or attribute could be offered to ignore it. Systems languages let you go into "Here Be Dragons" territory, but it would be nice if they still pointed out the signs to you.
Feb 04 2016
parent tsbockman <thomas.bockman gmail.com> writes:
On Friday, 5 February 2016 at 00:07:45 UTC, Chris Wright wrote:
 Which suggests a check of this sort should be a warning rather 
 than an error, or perhaps that a pragma or attribute could be 
 offered to ignore it.

 Systems languages let you go into "Here Be Dragons" territory, 
 but it would be nice if they still pointed out the signs to you.
Yes.
Feb 04 2016
prev sibling parent tsbockman <thomas.bockman gmail.com> writes:
On Thursday, 4 February 2016 at 23:59:06 UTC, H. S. Teoh wrote:
 On Thu, Feb 04, 2016 at 11:47:53PM +0000, tsbockman via 
 Digitalmars-d wrote: [...]
 Even so, I think that qualifies as a compiler bug or a hole in 
 the D spec.
Nah... while D, by default, tries to be type-safe and prevent guffaws like the above, it *is* also a systems programming language (or at least, that's one of the stated goals), so it does allow you to go under the hood to do things that you normally aren't allowed to do. Linking to foreign languages is a use case for allowing extern(C) function names: if you know the mangling scheme of the target language, you can declare the mangled name under extern(C) and that will allow D code to call functions written in the target language directly. Otherwise you'd have to change the compiler (and wait for the next release, etc.) before you could do that. T
I'm not saying that `extern(C)` is bad in general; I understand why it's necessary.

I'm saying that anonymous' example (http://forum.dlang.org/post/n90ngu$1r6v$1 digitalmars.com) showcases a hole in the spec, because in it the D compiler has access to the full source code of the function being linked to, and doesn't bother to verify that its signature in main.d is compatible with the definition in deref.d.

If the D compiler does *not* have access to the function's definition, then obviously it cannot perform this verification.
Feb 04 2016
prev sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Thursday, 4 February 2016 at 22:57:00 UTC, tsbockman wrote:
 The first place entry is particularly ridiculous; is there any 
 modern language that would make it so easy to commit such an 
 awful "mistake"?
D allows that. This is why I recommend putting `static assert(foo.sizeof == expectation);` in code that interfaces with external things, like C code, or D .di stuff.

 #include <math.h> /* sqrt */

That line is an interesting one too: the trick is depending on namespace pollution by the include. In D, you might write `import core.stdc.math : sqrt;` and make that misleading comment part of the code.... though then you could perhaps exploit that module bug (314?).
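The `static assert` guard recommended above might look roughly like this. It is only a sketch: the struct name `Sample` and the expected size and offset are assumptions made up for illustration, standing in for whatever the C side really declares.

```d
// Hypothetical D-side mirror of a C struct holding two floats.
// The name and the expected numbers are illustrative assumptions.
struct Sample
{
    float a;
    float b;
}

// Compilation stops here if the D declaration drifts away from the
// layout the external code is believed to use.
static assert(Sample.sizeof == 8, "Sample no longer matches the expected size");
static assert(Sample.b.offsetof == 4, "Sample.b is not where the C side expects it");
```

The check costs nothing at run time, but it only compares the declaration against a number written by hand, so it catches accidental drift rather than a deliberately wrong expectation.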
Feb 04 2016
parent reply tsbockman <thomas.bockman gmail.com> writes:
On Friday, 5 February 2016 at 01:14:05 UTC, Adam D. Ruppe wrote:
 D allows that. This is why I recommend putting `static 
 assert(foo.sizeof == expectation);` in code that interfaces 
 with external things, like C code, or D .di stuff.

 #include <math.h> /* sqrt */
D *doesn't* allow that though - at least, not in a monolithic, idiomatic D program: there wouldn't be any duplicate declaration of `spectral_contrast()` to mess up.

Yes, you can force the matter using `extern(C)` like anonymous demonstrated earlier - but using `extern(C)` for internal linkage in an all-D program would certainly attract scrutiny from reviewers; it would score poorly on the "underhanded-ness" test.

As to the ".di" stuff - I've not used them. Care to educate me? How can they cause similar problems?
 that line is an interesting one too: the trick is depending on 
 namespace pollution by the include. In D, you might write 
 `import core.stdc.math : sqrt;` and make that misleading 
 comment part of the code.... though then you could perhaps 
 exploit that module bug (314?).
314 definitely has potential. Should we start an "Underhanded D" contest? Sounds like bad marketing, but a lot of fun :-P
Feb 04 2016
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Friday, 5 February 2016 at 01:33:14 UTC, tsbockman wrote:
 As to the ".di" stuff - I've not used them. Care to educate me? 
 How can they cause similar problems?
Well, technically, a .di file is just a .d file renamed, but it tends to have the bodies stripped out. Separate compilation is a supported feature of D.

The way you'd do it is something like this:

struct Foo {
    float a;
    float b;
}

void bar(Foo* f) {
    f.b = whatever;
}

Then compile it with -lib and make a "header" file manually:

struct Foo {
    double a;
    double b;
}

void bar(Foo*);

You can now create D modules that import this and link against the compiled library. Very similar to C's model...

But I redefined Foo! The name mangling won't catch this. bar will be mangled to take `Foo` as an argument and the linker will catch if we change that, but it doesn't know what Foo actually is. By changing that, we introduce the problem.
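To make the consequence concrete, here is a small sketch that puts the two layouts side by side. `LibFoo` and `HeaderFoo` are hypothetical names used only so both definitions can live in one file; in the scenario described above both are simply called `Foo`, which is exactly why the mangled name of `bar` cannot tell them apart.

```d
// Layout the library was actually compiled with.
struct LibFoo
{
    float a;
    float b;
}

// Layout the hand-written "header" claims.
struct HeaderFoo
{
    double a;
    double b;
}

// The two disagree about both the size of the struct and the
// position of `b`, so code built against the header reads and
// writes the wrong bytes.
static assert(LibFoo.sizeof == 8);
static assert(HeaderFoo.sizeof == 16);
static assert(LibFoo.b.offsetof == 4);
static assert(HeaderFoo.b.offsetof == 8);
```

A caller that allocates 16 bytes and a library that writes a 4-byte float at offset 4 will link without complaint, which is the same class of mismatch the contest entry relies on.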
 314 definitely has potential. Should we start an "Underhanded 
 D" contest? Sounds like bad marketing, but a lot of fun :-P
it might be :)
Feb 04 2016
parent reply tsbockman <thomas.bockman gmail.com> writes:
On Friday, 5 February 2016 at 04:25:09 UTC, Adam D. Ruppe wrote:
 On Friday, 5 February 2016 at 01:33:14 UTC, tsbockman wrote:
 As to the ".di" stuff - I've not used them. Care to educate 
 me? How can they cause similar problems?
 Well, technically, a .di file is just a .d file renamed, but it tends to have the bodies stripped out. Separate compilation is a supported feature of D.

 The way you'd do it is something like this:

 struct Foo {
     float a;
     float b;
 }

 void bar(Foo* f) {
     f.b = whatever;
 }

 Then compile it with -lib and make a "header" file manually:

 struct Foo {
     double a;
     double b;
 }

 void bar(Foo*);

 You can now create D modules that import this and link against the compiled library. Very similar to C's model...

 But I redefined Foo! The name mangling won't catch this. bar will be mangled to take `Foo` as an argument and the linker will catch if we change that, but it doesn't know what Foo actually is. By changing that, we introduce the problem.
 314 definitely has potential. Should we start an "Underhanded 
 D" contest? Sounds like bad marketing, but a lot of fun :-P
it might be :)
Thanks for the explanation. That does sound basically the same as the C issue.

Since .di files are normally generated automatically, this seems like an easily solvable problem:

1) When compiling a library and its attendant .di file(s), generate a unique version identifier (such as a UUID or a hash of the completed binary) and append it to both the library and each .di file.

2) Whenever someone tries to link against the library, verify that the version ID matches. If it does not, issue a prominent warning.

Problem solved? Or is this harder than it looks?

(Of course there are various details to consider, such as how to efficiently share one set of .di files across many platforms/compiler settings; this is just a rough sketch.)
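The two steps above could be sketched like this. Nothing here is an existing compiler or linker feature: `fnv1a` and `linkWarning` are hypothetical helpers, and FNV-1a is used only because it is short, where a real scheme would presumably want a UUID or a stronger hash.

```d
// Step 1: derive an ID from the bytes of the compiled library.
// FNV-1a (64-bit) is a stand-in chosen for brevity.
ulong fnv1a(const(ubyte)[] data) pure nothrow @nogc @safe
{
    ulong h = 0xcbf29ce484222325UL;   // FNV offset basis
    foreach (b; data)
    {
        h ^= b;
        h *= 0x100000001b3UL;         // FNV prime; overflow wraps by design
    }
    return h;
}

// Step 2: compare the ID stored in the library with the one stored
// in the .di file. An empty result means the two match.
string linkWarning(ulong libraryId, ulong interfaceId) pure nothrow @nogc @safe
{
    return libraryId == interfaceId
        ? ""
        : "warning: .di file was not generated from this library build";
}

// Two made-up "binaries" that differ in a single byte.
enum ubyte[] buildA = [0x01, 0x02, 0x03];
enum ubyte[] buildB = [0x01, 0x02, 0x04];

static assert(linkWarning(fnv1a(buildA), fnv1a(buildA)).length == 0);
static assert(linkWarning(fnv1a(buildA), fnv1a(buildB)).length != 0);
```

Hashing the whole binary means any rebuild changes the ID, even one that leaves every signature alone, so a practical version would probably hash only the exported interface.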
Feb 04 2016
parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Fri, Feb 05, 2016 at 04:39:13AM +0000, tsbockman via Digitalmars-d wrote:
[...]
 Thanks for the explanation. That does sound basically the same as the
 C issue.
 
 Since .di files are normally generated automatically, this seems like
 an easily solvable problem:
 
 1) When compiling a library and its attendant .di file(s), generate a
 unique version identifier (such as a UUID or a hash of the completed
 binary) and append it to both the library and each .di file.
 
 2) Whenever someone tries to link against the library, verify that the
 version ID matches. If it does not, issue a prominent warning.
[...]
This would break shared library upgrades that do not change the ABI.

Plus, it doesn't fix wrong linkage at runtime, because the dynamic linker is part of the OS and the D compiler has no control over what it does beyond the standard symbol matching and relocation mechanisms. If you compile against libfoo, but at runtime the user happens to have a stale, ABI-incompatible version of libfoo hanging around that gets picked up by the dynamic linker, you'll have the same problem.

T

-- 
VI = Visual Irritation
Feb 04 2016
parent tsbockman <thomas.bockman gmail.com> writes:
On Friday, 5 February 2016 at 07:15:56 UTC, H. S. Teoh wrote:
 This would break shared library upgrades that do not change the 
 ABI.

 Plus, it doesn't fix wrong linkage at runtime, because the 
 dynamic linker is part of the OS and the D compiler has no 
 control over what it does beyond the standard symbol matching 
 and relocation mechanisms. If you compile against libfoo, but 
 at runtime the user happens to have a stale, ABI-incompatible 
 version of libfoo hanging around that gets picked up by the 
 dynamic linker, you'll have the same problem.
I should have clarified that I was considering static libraries, only. (I thought D's dynamic library support was kind of broken right at the moment, anyway?) Dynamic libraries are definitely a harder problem. I think useful automated protection against bad .di files could be developed for dynamic libraries as well, but the scheme wouldn't be anywhere near as simple and it might require the maintainer to actually follow SemVer to be useful.
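As one possible reading of how SemVer could help here, a compatibility test might look like the sketch below. It assumes the maintainer applies SemVer to the ABI as well as the API, which SemVer itself does not promise; `SemVer` and `abiCompatible` are hypothetical names, not part of any existing tool.

```d
// Hypothetical version triple recorded in both the .di file and the library.
struct SemVer
{
    uint major;
    uint minor;
    uint patch;
}

// True if a library found at run time may stand in for the one the
// program was built against, under the assumption stated above.
bool abiCompatible(SemVer builtAgainst, SemVer foundAtRuntime) pure nothrow @nogc @safe
{
    // SemVer makes no promises for 0.y.z, so demand an exact match.
    if (builtAgainst.major == 0 || foundAtRuntime.major == 0)
        return builtAgainst == foundAtRuntime;

    // Same major version, and nothing the program relies on is
    // newer than what the library provides.
    return foundAtRuntime.major == builtAgainst.major
        && foundAtRuntime.minor >= builtAgainst.minor;
}

static assert( abiCompatible(SemVer(1, 2, 0), SemVer(1, 3, 1)));
static assert(!abiCompatible(SemVer(1, 2, 0), SemVer(2, 0, 0)));
static assert(!abiCompatible(SemVer(1, 2, 0), SemVer(1, 1, 9)));
```

This lets an ABI-preserving upgrade through, unlike an exact hash match, but it is only as trustworthy as the version numbers the maintainer assigns.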
Feb 04 2016