Using Python-RTL

First you need Python version 2.4 installed.

Testing analysis with sample code

You can test the analysis with sample code provided with RTL-Check

$ cd python-rtl
$ python analysis.py

By default it will analyze the RTL code from ../samples/mem.c.00.rtl.bin, which has been compiled from the C code ../samples/mem.c.

You can also ask it to analyze another file by passing it as the first argument:

$ python analysis.py ../samples/fact.c.00.rtl.bin

Understanding the output

For each function of the file that is analyzed, a diagnostic will be printed to indicate whether the function is considered safe or not. If it is considered safe, you will get a message like:

Analyzing write_global ...
this function is safe for the specified preconditions

The specified preconditions are hardcoded for now, see the next section for more informations about what they are.

If a function cannot be considered safe, a message will indicate why. For example:

Analyzing list_sum ...
Warning: line 10 : Memory access error in zone Zone(1_Unknown):

word of length 4 at offset 0 may be outside the bounds [Unknown,Unknown]

This does not mean that the function is unsafe, it only means Python-RTL does not know (yet) what makes it safe. In this case, there is a linked list terminated by a NULL pointer and we cannot tell Python-RTL to expect a pointer or NULL for now. When the code follows the pointer it does not know where it leads. This is why the message talks about Zone(1_Unknown). The message indicates that the "error" is at line 10 of the file mem.c. It tells that the code tries to access a word of 4 bytes (32 bits) at the offset 0 of an unknown zone.

The following message indicates a real error:

Analyzing global_sum_error ...
Warning: line 42 : Memory access error in zone Zone(22_Sym_array):

word of length 4 at offset [0, 52] may be outside the bounds [0,40]

Here the name of the zone has a significative name: array is a symbol declared in mem.c, it is an array of 10 integers. The message tells that the code tries to access this array at offsets between 0 and 52 (inclusively) but the array contains only 40 bytes.

Another possible message is:

Analyzing test_alloca ...
Warning: line 71 : Function call to f not supported

All function calls are reported as possible unsafe operation (for now) because Python-RTL ignores them. A function could access unallocated memory or it could corrupt the data structures of its calling function, thus they are all considered unsafe.

Analyzing your own code

Here are the steps to follow if you want to analyze your own code.

Build a patched GCC

The code that can be analysed is the RTL that is dumped by a patch to GCC. You will find this patch in the subdirectory rtl-dump. It is against version 3.3.2 of GCC. The steps are:

Download gcc-3.3.2.tar.gz from ftp://ftp.gnu.org/gnu/gcc/gcc-3.3.2
untar it
cd gcc-3.3.2
patch -p1 <../../rtl-check/rtl-dump/rtl-dump.patch
follow the instructions from GCC to build it or simply:
- ./configure
- make (this will take a long time...)
make install (by default it will install to /usr/local/...) 2. Dump the rtl of your source code

Now you have to dump the RTL that GCC generates as a binary file by compiling it with the patched GCC. For this use the switch -dr. Make sure you use the patched GCC by ensuring /usr/local/bin is in your PATH before /usr/bin:

PATH=/usr/local/bin:$PATH
gcc -g -c -dr yourfile.c

If you omit -g you will not see line numbers. The option -c prevents GCC from trying to link the file. This will generate a file name yourfile.c.00.rtl.bin. If you only have yourfile.c.00.rtl (without .bin) this is because you use an unpatched gcc. Make sure you applied the patch, make sure you installed it in /usr/local and make sure your PATH is set correctly. You can test your installation of GCC on the sample files provided with rtl-check. You will see the date of the .bin file has changed.

It is possible to dump the RTL at different stages during the compilation by using different -d? swithes, but you will get the best results with -dr because the RTL is still much like the source code at this step.

3. First analysis of your code

cd rtl-check/python-rtl
python analysis.py ../../path/to/yourfile.c.00.rtl.bin

There will be many errors reported because Python-RTL does not know about your data (the preconditions).

4. Getting more useful results

By default, Python-RTL does not give very useful results yet. This will be improved in future versions but for now you have to enter the preconditions of your functions manually. These preconditions indicate the number of parameters expected by a function, the number of local variables it uses, the size of static variables, where (in memory or registers) to expect pointers and the size of the zone they point to.

Some preconditions are common to all functions (e.g. which register points to the arguments or the local variables). They are hardcoded in analysis.py.

The other preconditions are specific to a source file (those about static variables) or a function (those about local variables or arguments). For now these must be entered in datadb.py. This file contains the preconditions for all sample files provided. See the declaration of _addStatic, _addFunction and _setSpecial for information on how to specify preconditions for your code. Keep in mind that some complex conditions cannot be represented for now and some that can be represented are ignored by the analysis. For example a linked list terminated by a NULL pointer or a string terminated by a NULL character cannot be represented. Sometimes it is useful to use many different set of preconditions for one function. It is not possible to tell that a variable can point to "X or Y", but it is possible to run the analysis once with a pointer to X and once with a pointer to Y.

Thus, the steps to improve the results of the analysis are:

add preconditions to datadb.py (entries for sample code are good examples)
run the analysis again
modify the preconditions if necessary
try again...

Python-RTL does not support all RTL instructions yet and there might still be some bugs left in the analysis. If you see an exception or an assertion that fails, please report it to the mailing list rtlcheck-devel@lists.sourceforge.net or me directly (tootix@writeme.com). I would also be very happy to know about your successes, failures or suggestions.

In a future version, the information about static variables, arguments and local variables will be extracted automatically from the structures of GCC. Also, some of the relations between pointers could be infered by static analysis.