How can I build an AST (Abstract Syntax Tree) from GCC C code, apply transformations like the one below, and then regenerate C source from the result?

    if(condition_1){
     //lines of code 1
    }
    #ifdef expression_1
        else if(condition_2){
           //lines of code 2
        }
    #endif

into

    bool test = condition_1;
    if(test){
     //lines of code 1
    }
    #ifdef expression_1
      if(!(test) && condition_2){
        //lines of code 2
      }
    #endif
  • Would be closers: this question is clear, and has a clear answer.
    – Ira Baxter
    Commented Dec 29, 2015 at 7:41

1 Answer

GCC itself will build ASTs, but only after expanding the preprocessor directives, so the preprocessor conditionals are gone by then. Reinstalling them after you have done the transformations will be extremely hard, and transformations that involve the conditionals themselves will be impossible. So GCC itself is not a good way to get the ASTs you want.
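
To make the point concrete, here is a minimal sketch in the shape of the question's code. The names branch_taken and check_branch_taken are made up for illustration. With expression_1 undefined, the else-if never reaches the compiler proper, so no AST node for it can exist; running gcc -E on such a file shows the text the parser actually receives.

```c
#include <assert.h>

/* expression_1 is deliberately left undefined here; compile with
   -Dexpression_1 to get the other variant. */

/* Returns 1 or 2 for the branch that ran, 0 if neither did. */
int branch_taken(int condition_1, int condition_2)
{
    (void)condition_2; /* unused when expression_1 is undefined */
    if (condition_1) {
        return 1;
    }
#ifdef expression_1
    else if (condition_2) {
        return 2;
    }
#endif
    return 0;
}

/* Records what each preprocessor configuration can observe. */
void check_branch_taken(void)
{
    assert(branch_taken(1, 1) == 1);
#ifdef expression_1
    assert(branch_taken(0, 1) == 2);
#else
    /* The else-if was removed before parsing, so it can never run. */
    assert(branch_taken(0, 1) == 0);
#endif
}
```

Each compilation sees exactly one of the two variants, which is why a tool working on GCC's own tree cannot reason about both at once.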

If you want to parse your code example (the conditional wrapped around the else if is really nice!), you need a reengineering parser. These are parsers designed to support refactoring. Such parsers need to capture more than traditional parsers, e.g., column numbers of tokens, the format of lexical items, etc., to enable the regeneration of source text from the modified tree. For C, such a parser must capture the preprocessor directives, too. Parsers like this are pretty rare.
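
As a rough illustration of why positions and original spellings matter, the sketch below rebuilds source text from tokens that remember where they came from. The Token type and the regenerate function are hypothetical and are not how DMS represents anything; the sketch also assumes tokens contain no newlines and that layout used spaces rather than tabs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token record: position plus exact original spelling. */
typedef struct {
    int line;          /* 1-based source line */
    int column;        /* 1-based source column */
    const char *text;  /* lexeme exactly as written, e.g. "0x10" rather than 16 */
} Token;

/* Rebuild source text by placing each token at its recorded position.
   Tokens must be in source order. Returns the number of characters
   written (excluding the terminating NUL), or (size_t)-1 if the buffer
   is too small or a position lies before the current output point. */
size_t regenerate(const Token *toks, size_t n, char *out, size_t cap)
{
    size_t len = 0;
    int line = 1, column = 1;

    assert(out != NULL && (toks != NULL || n == 0));
    if (cap == 0) return (size_t)-1;

    for (size_t i = 0; i < n; i++) {
        size_t tlen = strlen(toks[i].text);

        if (toks[i].line < line ||
            (toks[i].line == line && toks[i].column < column))
            return (size_t)-1;

        while (line < toks[i].line) {        /* emit the missing newlines */
            if (len + 1 >= cap) return (size_t)-1;
            out[len++] = '\n';
            line++;
            column = 1;
        }
        while (column < toks[i].column) {    /* pad with spaces */
            if (len + 1 >= cap) return (size_t)-1;
            out[len++] = ' ';
            column++;
        }
        if (len + tlen >= cap) return (size_t)-1;
        memcpy(out + len, toks[i].text, tlen);
        len += tlen;
        column += (int)tlen;
    }
    out[len] = '\0';
    return len;
}
```

A parser that discards columns and spellings could still print valid C, but not text the original author would recognize. Comments and preprocessor directives need the same treatment.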

One such reengineering parser is our DMS Software Reengineering Toolkit and its C front end, which handles many dialects of C including GCC 2/3/4/5. It is designed explicitly to capture preprocessor conditionals (including your specific example). DMS also has support for carrying out the transformations using source-to-source transformations.

For a changed-to-make-legal version of OP's example, placed in test.c:

void main () {
  if (condition_1) {
     x++; 
  }
  #ifdef expression_1
  else if (condition_2) {
         y++;
       }
  #endif
}

... the DMS C~GCC4 parser (out of the box) produces the following AST:

C:\DMS\Domains\C\GCC4\Tools\Parser\Source>run ..\domainparser ++AST C:\temp\test.c
C~GCC4 Domain Parser Version 3.0.1(28449)
Copyright (C) 1996-2015 Semantic Designs, Inc; All Rights Reserved; SD Confidential
Powered by DMS (R) Software Reengineering Toolkit
AST Optimizations: remove constant tokens, remove unary productions, compact sequences
Using encoding Unicode-UTF-8?ANSI +CRLF +1 /^I

28 tree nodes in tree.
(translation_unit@C~GCC4=2#3cde920^0 Line 1 Column 1 File C:/temp/test.c
 (function_definition@C~GCC4=966#3cde740^1#3cde920:1 Line 1 Column 1 File C:/temp/test.c
  (function_head@C~GCC4=967#3047320^1#3cde740:1 Line 1 Column 1 File C:/temp/test.c
   (simple_type_specifier@C~GCC4=686#3047180^1#3047320:1 Line 1 Column 1 File C:/temp/test.c)simple_type_specifier
   (direct_declarator@C~GCC4=852#3047380^1#3047320:2 Line 1 Column 6 File C:/temp/test.c
   |(IDENTIFIER@C~GCC4=1531#3047160^1#3047380:1[`main'] Line 1 Column 6 File C:/temp/test.c)IDENTIFIER
   |(parameter_declaration_clause@C~GCC4=900#30473c0^1#3047380:2 Line 1 Column 12 File C:/temp/test.c)parameter_declaration_clause
   )direct_declarator#3047380
  )function_head#3047320
  (compound_statement@C~GCC4=507#3cde1e0^1#3cde740:2 Line 1 Column 14 File C:/temp/test.c
   (selection_statement@C~GCC4=539#3cde940^1#3cde1e0:1 Line 2 Column 3 File C:/temp/test.c
   |(if_head@C~GCC4=550#30476e0^1#3cde940:1 Line 2 Column 3 File C:/temp/test.c
   | (IDENTIFIER@C~GCC4=1531#30473e0^1#30476e0:1[`condition_1'] Line 2 Column 7 File C:/temp/test.c)IDENTIFIER
   |)if_head#30476e0
   |(compound_statement@C~GCC4=507#3cde700^1#3cde940:2 Line 2 Column 20 File C:/temp/test.c
   | (expression_statement@C~GCC4=503#3047740^1#3cde700:1 Line 3 Column 6 File C:/temp/test.c
   |  (postfix_expression@C~GCC4=205#3047720^1#3047740:1 Line 3 Column 6 File C:/temp/test.c
   |   (IDENTIFIER@C~GCC4=1531#3047700^1#3047720:1[`x'] Line 3 Column 6 File C:/temp/test.c)IDENTIFIER
   |  )postfix_expression#3047720
   | )expression_statement#3047740
   |)compound_statement#3cde700
   |(if_directive@C~GCC4=1088#3cde7a0^1#3cde940:3 Line 5 Column 3 File C:/temp/test.c
   | ('#'@C~GCC4=1548#3cde820^1#3cde7a0:1[Keyword:0] Line 5 Column 3 File C:/temp/test.c)'#'
   | (IDENTIFIER@C~GCC4=1531#3cde1c0^1#3cde7a0:2[`expression_1'] Line 5 Column 10 File C:/temp/test.c)IDENTIFIER
   | (new_line@C~GCC4=1578#3cde800^1#3cde7a0:3[Keyword:0] Line 5 Column 22 File C:/temp/test.c)new_line
   |)if_directive#3cde7a0
   |(selection_statement@C~GCC4=527#3cde840^1#3cde940:4 Line 6 Column 8 File C:/temp/test.c
   | (IDENTIFIER@C~GCC4=1531#3047340^1#3cde840:1[`condition_2'] Line 6 Column 12 File C:/temp/test.c)IDENTIFIER
   | (compound_statement@C~GCC4=507#3cde860^1#3cde840:2 Line 6 Column 25 File C:/temp/test.c
   |  (expression_statement@C~GCC4=503#3cde8a0^1#3cde860:1 Line 7 Column 12 File C:/temp/test.c
   |   (postfix_expression@C~GCC4=205#3cde880^1#3cde8a0:1 Line 7 Column 12 File C:/temp/test.c
   |   |(IDENTIFIER@C~GCC4=1531#3cde780^1#3cde880:1[`y'] Line 7 Column 12 File C:/temp/test.c)IDENTIFIER
   |   )postfix_expression#3cde880
   |  )expression_statement#3cde8a0
   | )compound_statement#3cde860
   |)selection_statement#3cde840
   |(endif_directive@C~GCC4=1092#3cde8c0^1#3cde940:5 Line 9 Column 3 File C:/temp/test.c
   | ('#'@C~GCC4=1548#3cde900^1#3cde8c0:1[Keyword:0] Line 9 Column 3 File C:/temp/test.c)'#'
   | (new_line@C~GCC4=1578#3cde8e0^1#3cde8c0:2[Keyword:0] Line 9 Column 9 File C:/temp/test.c)new_line
   |)endif_directive#3cde8c0
   )selection_statement#3cde940
  )compound_statement#3cde1e0
 )function_definition#3cde740
)translation_unit#3cde920

EDIT: OP asks for an example of how to do his transformation. As stated earlier, DMS allows source-to-source transformation patterns of the form "if you see this, replace it by that", stated in the surface syntax of the target language being manipulated (in this case, the GCC4 dialect of C). The value of such transformations is that they are much easier to write than traditional AST-hacking code built from procedure calls.

To achieve OP's effect, he needs the following DMS transformation:

    default domain C~GCC4; // tells DMS to use C domain with GCC4 dialect

    rule transform_pp_conditional_else(c1: condition, c2: condition,
                                       s1: statements, s2: statements, 
                                       pc1: preprocessor_condition):
         statement -> statement

      "if (\c1) { \s1 }
       #ifdef \pc1
       else if (\c2) { \s2 }
       #endif"
    ->
       "{ bool test=\c1;
          if (test) { \s1 }
          #ifdef \pc1
          if (!test && \c2) { \s2 }
          #endif
        }"

The default domain declaration tells DMS that the rules that follow apply to the GCC4 dialect of C. DMS calls the transformation a "rule"; its parameters are typed by the kind of subtree each one matches. The metaquotes "..." separate DMS rewrite-rule syntax from C~GCC4 syntax, and inside them a backslash-prefixed name such as \c1 refers to the rule parameter of that name. I think the rest of it is clear enough.
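
Independently of DMS, the rewritten shape can be sanity-checked in plain C. The sketch below is only an illustration: the function names are invented, the two bodies are reduced to recording which block ran, and expression_1 is defined at the top so the conditional branch is present (remove the #define to check the other configuration).

```c
#include <assert.h>
#include <stdbool.h>

/* Define or remove this macro to exercise both configurations. */
#define expression_1

/* Original shape: returns 1 or 2 for the block that ran, 0 for neither. */
int original(bool condition_1, bool condition_2)
{
    int ran = 0;
    (void)condition_2;      /* unused when expression_1 is undefined */
    if (condition_1) {
        ran = 1;            /* lines of code 1 */
    }
#ifdef expression_1
    else if (condition_2) {
        ran = 2;            /* lines of code 2 */
    }
#endif
    return ran;
}

/* Shape produced by the rule: the else is replaced by a test on a saved flag. */
int rewritten(bool condition_1, bool condition_2)
{
    int ran = 0;
    (void)condition_2;
    {
        bool test = condition_1;
        if (test) {
            ran = 1;        /* lines of code 1 */
        }
#ifdef expression_1
        if (!test && condition_2) {
            ran = 2;        /* lines of code 2 */
        }
#endif
    }
    return ran;
}

/* Asserts that both shapes agree on all four input combinations. */
void check_forms_agree(void)
{
    for (int c1 = 0; c1 <= 1; c1++)
        for (int c2 = 0; c2 <= 1; c2++)
            assert(original(c1, c2) == rewritten(c1, c2));
}
```

Because && short-circuits, condition_2 is still evaluated only when condition_1 was false, so the rewrite preserves evaluation order as well as the branch taken.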

  • Thanks!!! Do you have any tutorial or examples showing how to work with the AST in order to do transformations? Commented Dec 29, 2015 at 14:03
  • Sure. Check out semanticdesigns.com/Products/DMS/DMSRewriteRules.html to see how DMS lets you write source-to-source transformations. You can also do direct AST tree hacking through a procedural interface, and practical tools mix both of these.
    – Ira Baxter
    Commented Dec 29, 2015 at 15:11
  • Many thanks, it's just what I need! Is there an academic license for DMS and its C front end? Commented Dec 29, 2015 at 16:59
  • There is. Contact the company directly to inquire about this.
    – Ira Baxter
    Commented Dec 29, 2015 at 17:11
  • It's done. One more time, thanks for your availability! Commented Dec 29, 2015 at 17:22
