D - C Intermediate Language
- Mark Evans <Mark_member pathlink.com> Feb 19 2003
- "J. Daniel Smith" <j_daniel_smith HoTMaiL.com> Feb 19 2003
http://manju.cs.berkeley.edu/cil/index.html

I was amused that CIL is written in OCaml. OCaml just continues to amaze. The CIL license is loose, so this tool might have uses for D. I can envision a D front end written in OCaml that is one-quarter its present size and twice as robust. The CIL tool has processed the ENTIRE Linux kernel successfully, quirks and all.

-M.

---------------------------------------------------------------

CIL (C Intermediate Language) is a high-level representation along with a set of tools that permit easy analysis and source-to-source transformation of C programs.

CIL is both lower-level than abstract-syntax trees, by clarifying ambiguous constructs and removing redundant ones, and higher-level than typical intermediate languages designed for compilation, by maintaining types and a close relationship with the source program. The main advantage of CIL is that it compiles all valid C programs into a few core constructs with very clean semantics. CIL also has a syntax-directed type system that makes it easy to analyze and manipulate C programs. Furthermore, the CIL front end can process not only ANSI C programs but also those using Microsoft C or GNU C extensions. If you do not use CIL and instead use just a C parser and analyze programs expressed as abstract-syntax trees, your analysis will have to handle a lot of ugly corners of the language (not to mention that parsing C itself is not a trivial task). See Section 15 for some examples of such extreme programs that CIL simplifies for you.

In essence, CIL is a highly structured, 'clean' subset of C. CIL features a reduced number of syntactic and conceptual forms. For example, all looping constructs are reduced to a single form, all function bodies are given explicit return statements, syntactic sugar like "->" is eliminated, and function arguments with array types become pointers. (For an extensive list of how CIL simplifies C programs, see Section 3.) This reduces the number of cases that must be considered when manipulating a C program. CIL also separates type declarations from code and flattens scopes within function bodies. This structures the program in a manner more amenable to rapid analysis and transformation. CIL computes the types of all program expressions and makes all type promotions and casts explicit. CIL supports all GCC and MSVC extensions except for nested functions and complex numbers. Finally, CIL organizes C's imperative features into expressions, instructions and statements based on the presence or absence of side effects and control flow. Every statement can be annotated with successor and predecessor information. Thus CIL provides an integrated program representation that can be used with routines that require an AST (e.g., type-based analyses and pretty-printers), as well as with routines that require a CFG (e.g., dataflow analyses).

CIL comes accompanied by a number of Perl scripts that perform generally useful operations on code:
- A driver that behaves as either the gcc or the Microsoft VC compiler and can invoke the preprocessor followed by the CIL application. The advantage of this script is that you can easily use CIL, and the analyses written for CIL, with existing makefiles.
- A whole-program merger that you can use as a replacement for your compiler. It learns all the files you compile when you make a project and merges all of the preprocessed source files into a single one. This makes it easy to do whole-program analysis.
- A patcher that makes it easy to create modified copies of the system include files. The CIL driver can then be told to use these patched copies instead of the standard ones.

CIL has been tested very extensively. It is able to process the SPECINT95 benchmarks, the Linux kernel, GIMP and other open-source projects. All of these programs are compiled to the simple CIL and then passed to gcc, and they still run! We consider the compilation of Linux a major feat, especially since Linux contains many of the ugly GCC extensions (see Section 15.2). All told, this adds up to about 1,000,000 lines of code that we have tested it on. It is also able to process the few Microsoft NT device drivers that we have had access to. CIL was tested against GCC's c-torture test suite and, except for the tests involving complex numbers and inner functions (which CIL does not currently implement), passes most of the tests. Specifically, CIL fails 23 of the 904 c-torture tests that it should pass; GCC itself fails 19. A total of 1400 regression test cases are run automatically on each change to the CIL sources.

CIL is relatively independent of the underlying machine and compiler. When you build it, CIL configures itself according to the underlying compiler. However, CIL has only been tested on Intel x86, using the gcc compiler on Linux and Cygwin and using the MS Visual C compiler. (See below for specific versions of these compilers that we have used CIL for.)

The largest application we have used CIL for is CCured, a compiler that compiles C code into type-safe code by analyzing your pointer usage and inserting runtime checks in the places that cannot be statically guaranteed to be type safe.

[Note: the Cyclone folks think they did CCured one better; see their PDF intro which mentions CCured.]
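The simplifications listed above (a single loop form, explicit returns, no "->", array parameters turned into pointers, flattened scopes, explicit casts, side effects pulled out of expressions) can be illustrated with a small before/after sketch in C. This is a hand-written approximation of the kind of code CIL produces, not actual CIL output: CIL's printed code differs in details such as loop labels and temporary names, and its pretty-printer may render pointer field accesses with "->" again. The names `struct point`, `sum_x`, `sum_x_cil`, `scale`, `scale_cil` and `demo` are hypothetical, chosen only for this example.

```c
#include <assert.h>

/* Hypothetical example type; all names here are illustrative, and the
   "_cil" functions are a hand-written approximation, not CIL output. */
struct point {
    int x;
    int y;
};

/* ---- Original C, as a programmer might write it ---- */

/* Sums the x fields: a for loop with a block-scoped counter, an
   array-typed parameter, "->", and a side effect (p++) buried
   inside an expression. */
long sum_x(struct point pts[], int n)
{
    long total = 0;
    struct point *p = pts;
    for (int i = 0; i < n; i++)
        total += (p++)->x;
    return total;
}

/* Scales a point in place; the body falls off the end with no return. */
void scale(struct point *p, int k)
{
    p->x *= k;
    p->y *= k;
}

/* ---- CIL-style equivalents (approximation only) ---- */

/* Array parameter becomes a pointer; locals hoisted to function scope. */
long sum_x_cil(struct point *pts, int n)
{
    long total;
    struct point *p;
    int i;
    struct point *tmp;      /* temporary that pulls p++ out of the expression */

    total = 0L;
    p = pts;
    i = 0;
    while (1) {             /* the single loop form, with an explicit exit */
        if (!(i < n)) {
            break;
        }
        tmp = p;            /* instructions: plain assignments carrying the side effects */
        p = p + 1;
        total = total + (long)(*tmp).x; /* no "->"; int-to-long promotion made explicit */
        i = i + 1;
    }
    return total;           /* statement: control flow */
}

void scale_cil(struct point *p, int k)
{
    (*p).x = (*p).x * k;
    (*p).y = (*p).y * k;
    return;                 /* explicit return added to the void body */
}

/* Runs both versions on the same input and checks that they agree. */
void demo(void)
{
    struct point a[3] = { {1, 2}, {3, 4}, {5, 6} };
    struct point b[3] = { {1, 2}, {3, 4}, {5, 6} };

    assert(sum_x(a, 3) == 9);
    assert(sum_x_cil(b, 3) == 9);

    scale(&a[0], 10);
    scale_cil(&b[0], 10);
    assert(a[0].x == b[0].x && a[0].y == b[0].y);
    assert(sum_x(a, 3) == sum_x_cil(b, 3));
}
```

Both versions compute the same results. The point of the second one is its narrower vocabulary: only assignments, one `while (1)` loop with an explicit exit, side-effect-free expressions, and explicit `return` statements, which is roughly the set of cases an analysis written against CIL has to handle.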
Feb 19 2003
Pretty neat! Seems like a much easier route than implementing all the back-end pieces of a compiler yet again.

Dan

"Mark Evans" <Mark_member pathlink.com> wrote in message news:b2vekk$2940$1 digitaldaemon.com...
http://manju.cs.berkeley.edu/cil/index.html I was amused that CIL is written in OCaml. OCaml just continues to amaze.
Feb 19 2003

"J. Daniel Smith" <j_daniel_smith HoTMaiL.com>