
D - C Intermediate Language

reply Mark Evans <Mark_member pathlink.com> writes:
http://manju.cs.berkeley.edu/cil/index.html

I was amused that CIL is written in OCaml.  OCaml just continues to amaze.  The
CIL license is loose, so this tool might have uses for D.  I can envision a D
front end written in OCaml that is one-quarter its present size and twice as
robust.  The CIL tool has processed the ENTIRE linux kernel successfully, quirks
and all.  -M.

---------------------------------------------------------------

CIL (C Intermediate Language) is a high-level representation along with a set of
tools that permit easy analysis and source-to-source transformation of C
programs.

CIL is both lower-level than abstract-syntax trees, by clarifying ambiguous
constructs and removing redundant ones, and higher-level than typical
intermediate languages designed for compilation, by maintaining types and a
close relationship with the source program. The main advantage of CIL is that it
compiles all valid C programs into a few core constructs with very clean
semantics. CIL also has a syntax-directed type system that makes it easy to
analyze and manipulate C programs. Furthermore, the CIL front-end is able to
process not only ANSI C programs but also those using Microsoft C or GNU C
extensions. If you do not use CIL and instead use just a C parser and analyze
programs expressed as abstract-syntax trees, then your analysis will have to
handle a lot of ugly corners of the language (not to mention that parsing C is
itself not a trivial task). See Section 15 for some examples of such extreme
programs that CIL simplifies for you.

In essence, CIL is a highly-structured, 'clean' subset of C. CIL features a
reduced number of syntactic and conceptual forms. For example, all looping
constructs are reduced to a single form, all function bodies are given explicit
return statements, syntactic sugar like "->" is eliminated and function
arguments with array types become pointers. (For an extensive list of how CIL
simplifies C programs, see Section 3.) This reduces the number of cases that
must be considered when manipulating a C program. CIL also separates type
declarations from code and flattens scopes within function bodies. This
structures the program in a manner more amenable to rapid analysis and
transformation. CIL computes the types of all program expressions, and makes all
type promotions and casts explicit. CIL supports all GCC and MSVC extensions
except for nested functions and complex numbers. Finally, CIL organizes C's
imperative features into expressions, instructions and statements based on the
presence and absence of side-effects and control-flow. Every statement can be
annotated with successor and predecessor information. Thus CIL provides an
integrated program representation that can be used with routines that require an
AST (e.g., type-based analyses and pretty-printers), as well as with routines
that require a CFG (e.g., dataflow analyses).
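
To make the simplifications above concrete, here is a rough hand-written sketch in C. It is not actual CIL output; the struct `vec` and the functions `sum_original`, `sum_lowered` and `check_lowering` are invented for illustration. The second function shows approximately what the description implies: one loop form, no "->", the array parameter turned into a pointer, declarations hoisted out of inner scopes, and the int-to-long promotion written as an explicit cast.

```c
#include <assert.h>

struct vec { int len; int data[4]; };

/* Ordinary C: a for loop, "->" access, an array-typed parameter,
   and an implicit int -> long conversion in the accumulation. */
long sum_original(struct vec *v, int weights[4]) {
    long total = 0;
    for (int i = 0; i < v->len; i++)
        total += v->data[i] * weights[i];
    return total;
}

/* Hand-lowered approximation of the simplified form (assumption:
   this only mimics the style described, it was not produced by CIL). */
long sum_lowered(struct vec *v, int *weights) {
    long total;   /* declarations hoisted to function scope */
    int i;
    int tmp;

    total = 0L;
    i = 0;
    while (1) {                               /* the single loop form */
        if (!(i < (*v).len)) break;           /* "->" written as (*v). */
        tmp = (*v).data[i] * *(weights + i);  /* array parameter used as a pointer */
        total = total + (long)tmp;            /* promotion made explicit */
        i = i + 1;
    }
    return total;
}

/* Both versions should compute the same result on a small sample. */
void check_lowering(void) {
    struct vec v = { 3, { 1, 2, 3, 0 } };
    int weights[4] = { 10, 20, 30, 40 };
    assert(sum_original(&v, weights) == sum_lowered(&v, weights));
}
```

An analysis written against the second form only has to understand one kind of loop and one kind of member access, which is the reduction in cases the description is talking about.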

CIL comes accompanied by a number of Perl scripts that perform generally useful
operations on code:

- A driver that behaves as either the gcc or Microsoft VC compiler and can
  invoke the preprocessor followed by the CIL application. The advantage of
  this script is that you can easily use CIL and the analyses written for CIL
  with existing makefiles.

- A whole-program merger that you can use as a replacement for your compiler.
  It learns all the files you compile when you make a project and merges all of
  the preprocessed source files into a single one. This makes it easy to do
  whole-program analysis.

- A patcher that makes it easy to create modified copies of the system include
  files. The CIL driver can then be told to use these patched copies instead of
  the standard ones.

CIL has been tested very extensively. It is able to process the SPECINT95
benchmarks, the Linux kernel, GIMP and other open-source projects. All of these
programs are compiled to the simple CIL and then passed to gcc, and they still
run! We consider the compilation of Linux a major feat, especially since Linux
contains many of the ugly GCC extensions (see Section 15.2). This adds up to
about 1,000,000 lines of code that we tested it on. It is also able to process
the few Microsoft NT device drivers that we have had access to. CIL was tested
against GCC's c-torture test suite and (except for the tests involving complex
numbers and inner functions, which CIL does not currently implement) CIL passes
most of the tests. Specifically, CIL fails 23 of the 904 c-torture tests that
it should pass. GCC itself fails 19 tests. A total of 1400 regression test cases
are run automatically on each change to the CIL sources.

CIL is relatively independent of the underlying machine and compiler. When you
build it, CIL will configure itself according to the underlying compiler.
However, CIL has only been tested on Intel x86 using the gcc compiler on Linux
and cygwin and using the MS Visual C compiler. (See below for specific versions
of these compilers that we have used CIL with.)

The largest application we have used CIL for is CCured, a compiler that compiles
C code into type-safe code by analyzing your pointer usage and inserting runtime
checks in the places that cannot be guaranteed statically to be type safe.
[Note: the Cyclone folks think they did CCured one better; see their PDF intro
which mentions CCured.]
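
For readers unfamiliar with the idea of "inserting runtime checks", here is a minimal sketch in C of one common way to do it: a pointer that carries its bounds, with a check before each read. This is only an illustration under my own assumptions; `checked_ptr`, `checked_from_array`, `checked_in_bounds` and `checked_read` are hypothetical names, and the code does not reproduce CCured's actual representation or generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>  /* abort */

/* Hypothetical bounds-carrying pointer (not CCured's real layout). */
struct checked_ptr {
    int *p;     /* current position */
    int *base;  /* first valid element */
    int *end;   /* one past the last valid element */
};

/* Wrap a plain array of n elements. */
struct checked_ptr checked_from_array(int *arr, size_t n) {
    struct checked_ptr cp;
    assert(arr != NULL);
    cp.p = arr;
    cp.base = arr;
    cp.end = arr + n;
    return cp;
}

/* Returns 1 if reading cp.p[offset] stays inside the buffer, else 0.
   Works on indices so no out-of-range pointer is ever formed. */
int checked_in_bounds(struct checked_ptr cp, ptrdiff_t offset) {
    ptrdiff_t idx = (cp.p - cp.base) + offset;
    return idx >= 0 && idx < (cp.end - cp.base);
}

/* The kind of check a tool would insert where safety cannot be
   proven statically: stop the program instead of reading out of bounds. */
int checked_read(struct checked_ptr cp, ptrdiff_t offset) {
    if (!checked_in_bounds(cp, offset))
        abort();
    return cp.p[offset];
}
```

The point of the static analysis is to leave out such checks (and the extra bounds fields) wherever pointer usage can be shown safe at compile time, so only the remaining accesses pay the runtime cost.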
Feb 19 2003
parent "J. Daniel Smith" <j_daniel_smith HoTMaiL.com> writes:
Pretty neat!  Seems like a much easier route than implementing all the
back-end pieces of a compiler yet again.

   Dan

Feb 19 2003