D - C Intermediate Language

Mark Evans <Mark_member pathlink.com> writes:
http://manju.cs.berkeley.edu/cil/index.html

I was amused that CIL is written in OCaml.  OCaml just continues to amaze.  The
CIL license is loose, so this tool might have uses for D.  I can envision a D
front end written in OCaml that is one-quarter its present size and twice as
robust.  The CIL tool has processed the ENTIRE linux kernel successfully, quirks
and all.  -M.

---------------------------------------------------------------

CIL (C Intermediate Language) is a high-level representation along with a set of
tools that permit easy analysis and source-to-source transformation of C
programs.

CIL is both lower-level than abstract-syntax trees, by clarifying ambiguous
constructs and removing redundant ones, and also higher-level than typical
intermediate languages designed for compilation, by maintaining types and a
close relationship with the source program. The main advantage of CIL is that it
compiles all valid C programs into a few core constructs with a very clean
semantics. Also CIL has a syntax-directed type system that makes it easy to
analyze and manipulate C programs. Furthermore, the CIL front-end is able to
process not only ANSI-C programs but also those using Microsoft C or GNU C
extensions. If you do not use CIL and want instead to use just a C parser and
analyze programs expressed as abstract-syntax trees then your analysis will have
to handle a lot of ugly corners of the language (let alone the fact that parsing
C itself is not a trivial task). See Section 15 for some examples of such
extreme programs that CIL simplifies for you.

In essence, CIL is a highly-structured, 'clean' subset of C. CIL features a
reduced number of syntactic and conceptual forms. For example, all looping
constructs are reduced to a single form, all function bodies are given explicit
return statements, syntactic sugar like "->" is eliminated and function
arguments with array types become pointers. (For an extensive list of how CIL
simplifies C programs, see Section 3.) This reduces the number of cases that
must be considered when manipulating a C program. CIL also separates type
declarations from code and flattens scopes within function bodies. This
structures the program in a manner more amenable to rapid analysis and
transformation. CIL computes the types of all program expressions, and makes all
type promotions and casts explicit. CIL supports all GCC and MSVC extensions
except for nested functions and complex numbers. Finally, CIL organizes C's
imperative features into expressions, instructions and statements based on the
presence or absence of side effects and control flow. Every statement can be
annotated with successor and predecessor information. Thus CIL provides an
integrated program representation that can be used with routines that require an
AST (e.g. type-based analyses and pretty-printers), as well as with routines
that require a CFG (e.g., dataflow analyses).
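
To make the simplifications above concrete, here is a minimal hand-written
sketch in plain C. It is not actual CIL output; the names (node, sum_original,
sum_lowered, reset_original, reset_lowered, self_check) are made up for
illustration. The "lowered" versions apply the rewrites the text lists: one
loop form, explicit returns, no "->", array parameters turned into pointers,
casts and promotions written out, and declarations hoisted to the top of the
function body.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

/* Ordinary C: a for loop, "->", an array-typed parameter and an
   implicit promotion from short to int to long. */
long sum_original(struct node *head, short weights[], size_t n)
{
    long total = 0;
    size_t i = 0;
    for (struct node *p = head; p != NULL && i < n; p = p->next, i++)
        total += p->value * weights[i];
    return total;
}

/* Hand-lowered in the spirit of the text (an illustration, not CIL output). */
long sum_lowered(struct node *head, short *weights, size_t n)
{
    long total;
    size_t i;
    struct node *p;
    int tmp;

    total = (long)0;
    i = (size_t)0;
    p = head;
    while (1) {                       /* the single looping form */
        if (!(p != (struct node *)0)) {
            break;
        }
        if (!(i < n)) {               /* "&&" split into explicit control flow */
            break;
        }
        tmp = (*p).value * (int)*(weights + i);
        total = total + (long)tmp;
        p = (*p).next;
        i = i + (size_t)1;
    }
    return total;
}

/* A void function with no return statement... */
void reset_original(struct node *n)
{
    n->value = 0;
}

/* ...and the same function with "->" removed and the return made explicit. */
void reset_lowered(struct node *n)
{
    (*n).value = 0;
    return;
}

/* Both versions must agree on a small three-element list. */
void self_check(void)
{
    struct node c = { 3, NULL };
    struct node b = { 2, &c };
    struct node a = { 1, &b };
    short w[3] = { 10, 20, 30 };

    assert(sum_original(&a, w, 3) == sum_lowered(&a, w, 3));
    reset_original(&a);
    reset_lowered(&b);
    assert(a.value == 0 && b.value == 0);
}
```

An analysis written against the lowered shape only has to handle one kind of
loop, one kind of field access and one kind of function exit, which is the
reduction in cases that the paragraph above describes.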

CIL comes accompanied by a number of Perl scripts that perform generally useful
operations on code:

A driver that behaves as either the gcc or Microsoft VC compiler and can invoke
the preprocessor followed by the CIL application. The advantage of this script
is that you can easily use CIL and the analyses written for CIL with existing
makefiles.

A whole-program merger that you can use as a replacement for your compiler. It
learns all the files you compile when you make a project and merges all of the
preprocessed source files into a single one. This makes it easy to do
whole-program analysis.

A patcher that makes it easy to create modified copies of the system include
files. The CIL driver can then be told to use these patched copies instead of
the standard ones.

CIL has been tested very extensively. It is able to process the SPECINT95
benchmarks, the Linux kernel, GIMP and other open-source projects. All of these
programs are compiled to the simple CIL and then passed to gcc and they still
run! We consider the compilation of Linux a major feat especially since Linux
contains many of the ugly GCC extensions (see Section 15.2). This adds up to
about 1,000,000 lines of code that we tested it on. CIL is also able to process
the few Microsoft NT device drivers that we have had access to. CIL was tested against
GCC's c-torture testsuite and (except for the tests involving complex numbers
and inner functions, which CIL does not currently implement) CIL passes most of
the tests. Specifically CIL fails 23 tests out of the 904 c-torture tests that
it should pass. GCC itself fails 19 tests. A total of 1400 regression test cases
are run automatically on each change to the CIL sources.

CIL is relatively independent of the underlying machine and compiler. When you
build it, CIL will configure itself according to the underlying compiler.
However, CIL has only been tested on Intel x86 using the gcc compiler on Linux
and cygwin and using the MS Visual C compiler. (See below for specific versions
of these compilers that we have used CIL for.)

The largest application we have used CIL for is CCured, a compiler that compiles
C code into type-safe code by analyzing your pointer usage and inserting runtime
checks in the places that cannot be guaranteed statically to be type safe.
[Note: the Cyclone folks think they did CCured one better; see their PDF intro
which mentions CCured.]
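
As a rough illustration of what "inserting runtime checks" can mean, here is a
minimal C sketch of a bounds-checked read through a pointer that carries its
own bounds. This is an assumption-laden toy, not CCured's actual pointer
representation or generated code; checked_ptr, make_checked and checked_read
are hypothetical names. A real tool would typically abort on a failed check,
while this sketch returns a status so the behavior is easy to observe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "fat" pointer: the current position plus the bounds of
   the buffer it was derived from. */
struct checked_ptr {
    int *cur;    /* where the pointer currently points */
    int *base;   /* first element of the buffer */
    int *end;    /* one past the last element */
};

struct checked_ptr make_checked(int *buf, size_t n)
{
    struct checked_ptr p;
    p.cur = buf;
    p.base = buf;
    p.end = buf + n;
    return p;
}

/* Read p.cur[offset] only if that element lies inside the buffer.
   Returns 1 and stores the value on success, 0 if the check fails.
   The index is computed as an integer so no out-of-bounds pointer
   is ever formed. */
int checked_read(struct checked_ptr p, ptrdiff_t offset, int *out)
{
    ptrdiff_t index = (p.cur - p.base) + offset;
    ptrdiff_t len = p.end - p.base;

    if (index < 0 || index >= len) {
        return 0;                 /* the inserted runtime check fired */
    }
    *out = p.base[index];
    return 1;
}

/* An access that is safe by inspection (constant index into a
   fixed-size array) needs no check at all. */
int first_of_four(void)
{
    int a[4] = { 7, 8, 9, 10 };
    return a[0];
}
```

The point of the static analysis described above is to leave as many accesses
as possible in the unchecked form and pay for the check only where safety
cannot be established at compile time.
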
Feb 19 2003
"J. Daniel Smith" <j_daniel_smith HoTMaiL.com> writes:
Pretty neat!  Seems like a much easier route than implementing all the
back-end pieces of a compiler yet again.

   Dan

Feb 19 2003