
5  Transformation

Coccinelle semantic patches are able to transform C code.

5.1  Basic transformations

The transformation specification essentially has the form of C code, except that lines to remove are annotated with - in the first column, and lines to add are annotated with +.

A transformation specification can also use dots, “...”, describing an arbitrary sequence of function arguments or instructions within a control-flow path. Implicitly, “...” matches the shortest path between something that matches the pattern before the dots (or the beginning of the function, if there is nothing before the dots) and something that matches the pattern after the dots (or the end of the function, if there is nothing after the dots). The shortest path constraint is implemented by requiring that the pattern (if any) appearing immediately before the dots and the pattern (if any) appearing immediately after the dots are not matched by the code matched by the dots. Dots may be modified with a when clause, indicating a pattern that should not occur anywhere within the matched sequence. The modifier when any removes the aforementioned constraint that “...” matches the shortest path.

Finally, a transformation can specify a disjunction of patterns, of the form ( pat1 | … | patn ) where each (, | or ) is in column 0 or preceded by \. Similarly, a transformation can specify a conjunction of patterns, of the form ( pat1 & … & patn ) where each (, & or ) is in column 0 or preceded by \. For a conjunction, all of the patterns must be matched at the same place in the control-flow graph.
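As an informal illustration of the shortest-path rule, the following C sketch marks in comments what a hypothetical pattern of the form acquire(); ... release(); would be expected to match. The functions acquire, release and work and the two counters are invented for this example, and the comments restate the rule described above; they are not recorded output from Coccinelle.

```c
/* Counters used only to make the sketch observable; all names are invented. */
int held;       /* acquire() calls not yet balanced by release() */
int work_done;  /* number of work() calls executed */

void acquire(void) { held++; }
void release(void) { held--; }
void work(void)    { work_done++; }

/* One acquire()/release() pair: the dots would cover the two work() calls. */
void single_region(void)
{
    acquire();  /* matches the pattern before the dots */
    work();     /* covered by "..." */
    work();     /* covered by "..." */
    release();  /* matches the pattern after the dots */
}

/*
 * Two consecutive regions. A match starting at the first acquire() ends at
 * the first release(): extending it to the second release() would place a
 * release() call (and an acquire() call) inside the dots, which the
 * shortest-path constraint rules out. Per the text above, "when any" on the
 * dots would lift that restriction.
 */
void two_regions(void)
{
    acquire();  /* first match starts here */
    work();
    release();  /* first match ends here, not at the later release() */
    acquire();  /* second match starts here */
    work();
    release();  /* second match ends here */
}
```

Under these assumptions, a clause such as when != work() on the dots would be expected to reject both functions, since work() occurs inside every matched sequence.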

The grammar that we present for the transformation is not actually the grammar of the SmPL code that can be written by the programmer. Instead, it is the grammar of a slice of that code, consisting of either the - annotated code and the unannotated code (the context of the transformed lines), or the + annotated code and the unannotated code. For example, for parsing purposes, the following transformation is split into the two variants shown below it, and each is parsed separately.

    proc_info_func(...) {
      <...
    - hostno
    + hostptr->host_no
      ...>
    }

    proc_info_func(...) {
      <...
    - hostno
      ...>
    }

    proc_info_func(...) {
      <...
    + hostptr->host_no
      ...>
    }

Requiring that both slices parse correctly ensures that the rule matches syntactically valid C code and that it produces syntactically valid C code. The generated parse trees are then merged for use in the subsequent matching and transformation process.
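To make the effect of this transformation concrete, here is a small C sketch of code before and after it is applied. The structure host_info, the function names and the function bodies are hypothetical. The change of the parameter from hostno to hostptr is not part of the fragment above; it is assumed here only so that the transformed function compiles.

```c
/* Hypothetical stand-in for the host structure; only the field used here. */
struct host_info {
    unsigned int host_no;
};

/* Before: each use of hostno in the body is a site the minus slice matches. */
unsigned int proc_info_before(unsigned int hostno)
{
    unsigned int id = hostno;        /* first occurrence */
    return id * 100 + hostno;        /* second occurrence */
}

/*
 * After: each occurrence has been replaced by the plus-slice code.
 * The parameter change is assumed to be made elsewhere.
 */
unsigned int proc_info_after(const struct host_info *hostptr)
{
    unsigned int id = hostptr->host_no;
    return id * 100 + hostptr->host_no;
}
```

Because the pattern sits between <... and ...>, every occurrence in the function body is rewritten, and a body with no occurrence at all would still match.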

The grammar for the minus or plus slice of a transformation is as follows:

transformation  ::=  include+
                 |   OPTDOTSEQ(top, when)

include         ::=  #include include_string

top             ::=  expr
                 |   decl_stmt+
                 |   fundecl

when            ::=  when != when_code
                 |   when = rule_elem_stmt
                 |   when COMMA_LIST(any_strict)
                 |   when true != expr
                 |   when false != expr

when_code       ::=  OPTDOTSEQ(decl_stmt+, when)
                 |   OPTDOTSEQ(expr, when)

rule_elem_stmt  ::=  one_decl
                 |   expr;
                 |   return [expr];
                 |   break;
                 |   continue;
                 |   \(rule_elem_stmt (\| rule_elem_stmt)+\)

any_strict      ::=  any
                 |   strict
                 |   forall
                 |   exists

OPTDOTSEQ(grammar_ds, when_ds)  ::=
    [... (when_ds)*] grammar_ds (... (when_ds)* grammar_ds)* [... (when_ds)*]

Lines may be annotated with an element of the set {-, +, *} or the singleton ?, or one of each set. ? represents at most one match of the given pattern, i.e., a match of the pattern is optional. * is used for semantic match, i.e., a pattern that highlights the fragments annotated with *, but does not perform any modification of the matched code. The code is presented with lines containing a match of a starred line preceded by -, but this is not intended as a removal, and applying the output as a patch to the original code will likely not result in correct code. * cannot be mixed with - and +. There are some constraints on the use of these annotations:

An #include may be followed by "...", <...> or simply .... With either quotes or angle brackets, it is possible to put a partial path, ending with ..., such as <include/...>, or to put a complete path. A #include with ... matches any include, with either quotes or angle brackets. Partial or complete paths are not allowed in the latter case. Something that is added before an include will be put before the last matching include that is not under an ifdef in the file. Likewise, something that is added after an include will be put after the last matching include that is not under an ifdef in the file.

Each element of a disjunction must be a proper term like an expression, a statement, an identifier or a declaration. The constraint on a conjunction is similar. Thus, the first rule below is not a syntactically correct SmPL rule, because each branch of its disjunction is only a fragment of a function call. One may use the second rule instead.

    @@
    type T;
    T b;
    @@

    (
      writeb(...,
    |
      readb(...,
    )
    -(T)
      b)

    @@
    type T;
    T b;
    @@

    (
      readb
    |
      writeb
    )
      (...,
    - (T)
      b)
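The following C sketch shows the kind of code the valid form of this rule is aimed at: a call to readb or writeb whose last argument is cast to the type it already has. The two accessor definitions are hypothetical stand-ins written for this example, not the real kernel functions, and the "after" function is the result one would expect from removing the cast, not recorded tool output.

```c
/* Hypothetical stand-ins for the accessors named in the rule. */
static void writeb(unsigned char value, unsigned char *addr) { *addr = value; }
static unsigned char readb(unsigned char *addr) { return *addr; }

/* Before: reg already has type unsigned char *, so both casts are redundant. */
unsigned char store_and_load_before(unsigned char *reg)
{
    writeb(0x5a, (unsigned char *)reg);
    return readb((unsigned char *)reg);
}

/* After: the cast on the last argument has been removed in both calls. */
unsigned char store_and_load_after(unsigned char *reg)
{
    writeb(0x5a, reg);
    return readb(reg);
}
```

Since the metavariable b is declared with type T, the rule only applies where the cast type and the type of the argument coincide, so removing the cast does not change the meaning of the call.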

Some kinds of terms can only appear in + code. These include comments, ifdefs, and attributes (__attribute__((...))).

5.2  Advanced transformations

You may run into the situation where a semantic patch needs to add several disjoint terms at the same place in the code. Coccinelle does not know in which order these terms should appear, and thus gives an “already tagged token” error in this situation. If you are sure that order does not matter, you can use the optional double addition token ++ to indicate to Coccinelle that it may add things in any order. This may be safe, for instance, when extending a data structure with more members, based on existing members of the data structure. The following rule adds a float field for each int field already present in a data structure. If there is only one int field in the data structure, this semantic patch works well with the simple +.

    @simpleplus@
    identifier x,v;
    fresh identifier xx = v ## "_float";
    @@

    struct x {
    + float xx;
      ...
      int v;
      ...
    }

This semantic patch works fine, for example, on the following code (plusplus1.c):

    struct x {
        int z;
        char b;
    };

If, however, there are multiple int fields that Coccinelle can transform, the order in which Coccinelle makes the additions cannot be guaranteed. If you are sure that order does not matter for the transformation, you may use ++ instead, as follows:

    @plusplus@
    identifier x,v;
    fresh identifier xx = v ## "_float";
    @@

    struct x {
    ++ float xx;
      ...
      int v;
      ...
    }

This rule would work against a file plusplus2.c that has three int fields:

    struct x {
        int z;
        int a;
        char b;
        int c;
        int *d;
    };

A possible result is as shown below. The precise order of the float fields is however not guaranteed with respect to each other:

    struct x {
        float a_float;
        float c_float;
        float z_float;
        int z;
        int a;
        char b;
        int c;
        int *d;
    };

If you used the simpleplus rule on plusplus2.c, you would end up with an “already tagged token” error, due to the ordering considerations explained in this section.
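One way to read “order does not matter” is that nothing in the surrounding code depends on the relative position of the added fields. The following C sketch takes the possible result shown above and checks only properties that hold for any ordering of the three float fields. The two helper functions are invented for this illustration and are not part of Coccinelle or of the example files.

```c
#include <stddef.h>  /* offsetof, size_t */

/* One possible result of the plusplus rule; the float order may differ. */
struct x {
    float a_float;
    float c_float;
    float z_float;
    int z;
    int a;
    char b;
    int c;
    int *d;
};

/* Nonzero when every added float field lies before the first original field. */
int floats_precede_originals(void)
{
    size_t first_original = offsetof(struct x, z);
    return offsetof(struct x, a_float) < first_original
        && offsetof(struct x, c_float) < first_original
        && offsetof(struct x, z_float) < first_original;
}

/* Nonzero when the original fields keep their relative order. */
int originals_keep_order(void)
{
    return offsetof(struct x, z) < offsetof(struct x, a)
        && offsetof(struct x, a) < offsetof(struct x, b)
        && offsetof(struct x, b) < offsetof(struct x, c)
        && offsetof(struct x, c) < offsetof(struct x, d);
}
```

Neither check compares the float fields with one another, so both would still hold if the three added lines appeared in a different order. Code that relied on a particular order among them, for example through positional struct initializers, would not be a safe candidate for ++.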

