课程主页:

https://www.edx.org/course/compilers

课程视频:

https://www.bilibili.com/video/BV1NE411376V?p=20&t=6

参考资料:

https://github.com/dychen/compilers/tree/master/PA3

https://github.com/yifanyang96/Compiler/tree/master/PA3

https://github.com/skyzluo/CS143-Compilers-Stanford

https://github.com/afterthat97/cool-compiler

error处理:

http://marvin.cs.uidaho.edu/Teaching/CS445/bisonErrorToken.html

https://www.gnu.org/software/bison/manual/html_node/Error-Recovery.html

这次回顾作业3——语法分析,本次作业和前一次相比要容易一些,困难的部分主要是错误处理,主要参考了第一份资料。

作业分析

准备工作

完成本次作业的整体流程如下:

  1. 阅读如下内容:

    1. PA3。
    2. cool-manual第16页的运算符优先级,第17页的上下文无关文法。
    3. cool-tour第6章。
    4. bison介绍:
      1. flex与bison第3章(重点推荐)。
      2. http://westes.github.io/flex/manual/
      3. https://www.gnu.org/software/bison/manual/bison.html
    5. cool-parse.h,stringtab.h,stringtab_functions.h
  2. 编写cool.y

  3. 下载评分脚本:

    wget https://courses.edx.org/asset-v1:StanfordOnline+SOE.YCSCS1+1T2020+type@asset+block@pa2-grading.pl

    编译:

    make -f Makefile
    make parser

    测试:

    perl pa2-grading.pl

上下文无关文法

本次作业主要参考上下文无关文法编写:

代码中写明了对应的非终结符关系:

/*
*  cool.y
*              Parser definition for the COOL language.
*
*/
%{
  #include <iostream>
  #include "cool-tree.h"
  #include "stringtab.h"
  #include "utilities.h"
  
  extern char *curr_filename;
  
  
  /* Locations */
  #define YYLTYPE int              /* the type of locations */
  #define cool_yylloc curr_lineno  /* use the curr_lineno from the lexer for the location of tokens */
    
    extern int node_lineno;          /* set before constructing a tree node
    to whatever you want the line number
    for the tree node to be */
      
      
      #define YYLLOC_DEFAULT(Current, Rhs, N)         \
      Current = Rhs[1];                             \
      node_lineno = Current;
    
    
    #define SET_NODELOC(Current)  \
    node_lineno = Current;
    
    /* IMPORTANT NOTE ON LINE NUMBERS
    *********************************
    * The above definitions and macros cause every terminal in your grammar to 
    * have the line number supplied by the lexer. The only task you have to
    * implement for line numbers to work correctly, is to use SET_NODELOC()
    * before constructing any constructs from non-terminals in your grammar.
    * Example: Consider you are matching on the following very restrictive 
    * (fictional) construct that matches a plus between two integer constants. 
    * (SUCH A RULE SHOULD NOT BE  PART OF YOUR PARSER):
    
    plus_consts	: INT_CONST '+' INT_CONST 
    
    * where INT_CONST is a terminal for an integer constant. Now, a correct
    * action for this rule that attaches the correct line number to plus_const
    * would look like the following:
    
    plus_consts	: INT_CONST '+' INT_CONST 
    {
      // Set the line number of the current non-terminal:
      // ***********************************************
      // You can access the line numbers of the i'th item with @i, just
      // like you acess the value of the i'th exporession with $i.
      //
      // Here, we choose the line number of the last INT_CONST (@3) as the
      // line number of the resulting expression (@$). You are free to pick
      // any reasonable line as the line number of non-terminals. If you 
      // omit the statement @$=..., bison has default rules for deciding which 
      // line number to use. Check the manual for details if you are interested.
      @$ = @3;
      
      
      // Observe that we call SET_NODELOC(@3); this will set the global variable
      // node_lineno to @3. Since the constructor call "plus" uses the value of 
      // this global, the plus node will now have the correct line number.
      SET_NODELOC(@3);
      
      // construct the result node:
      $$ = plus(int_const($1), int_const($3));
    }
    
    */
    
    
    
    void yyerror(char *s);        /*  defined below; called for each parse error */
    extern int yylex();           /*  the entry point to the lexer  */
    
    /************************************************************************/
    /*                DONT CHANGE ANYTHING IN THIS SECTION                  */
    
    Program ast_root;	      /* the result of the parse  */
    Classes parse_results;        /* for use in semantic analysis */
    int omerrs = 0;               /* number of errors in lexing and parsing */
    %}
    
    /* A union of all the types that can be the result of parsing actions. */
    %union {
      Boolean boolean;
      Symbol symbol;
      Program program;
      Class_ class_;
      Classes classes;
      Feature feature;
      Features features;
      Formal formal;
      Formals formals;
      Case case_;
      Cases cases;
      Expression expression;
      Expressions expressions;
      char *error_msg;
    }
    
    /* 
    Declare the terminals; a few have types for associated lexemes.
    The token ERROR is never used in the parser; thus, it is a parse
    error when the lexer returns it.
    
    The integer following token declaration is the numeric constant used
    to represent that token internally.  Typically, Bison generates these
    on its own, but we give explicit numbers to prevent version parity
    problems (bison 1.25 and earlier start at 258, later versions -- at
    257)
    */
    %token CLASS 258 ELSE 259 FI 260 IF 261 IN 262 
    %token INHERITS 263 LET 264 LOOP 265 POOL 266 THEN 267 WHILE 268
    %token CASE 269 ESAC 270 OF 271 DARROW 272 NEW 273 ISVOID 274
    %token <symbol>  STR_CONST 275 INT_CONST 276 
    %token <boolean> BOOL_CONST 277
    %token <symbol>  TYPEID 278 OBJECTID 279 
    %token ASSIGN 280 NOT 281 LE 282 ERROR 283
    
    /*  DON'T CHANGE ANYTHING ABOVE THIS LINE, OR YOUR PARSER WONT WORK       */
    /**************************************************************************/
    
    /* Complete the nonterminal list below, giving a type for the semantic
    value of each non terminal. (See section 3.6 in the bison 
    documentation for details). */
    
    /* Declare types for the grammar's non-terminals. */
    %type <program> program
    %type <classes> class_list
    %type <class_> class
    
    /* You will want to change the following line. */
    %type <feature> feature
    %type <features> feature_list
    %type <formal> formal
    %type <formals> formal_list   
    %type <case_> branch
    %type <cases> branch_list
    %type <expression> expr
    %type <expression> while
    %type <expression> let
    %type <expressions> expr_list
    %type <expressions> expr_list_
    
    /* Precedence declarations go here. */
    %right ASSIGN
    %left NOT
    %nonassoc LE '<' '='
    %left '+' '-'
    %left '*' '/'
    %left ISVOID
    %left '~'
    %left '@'
    %left '.'

    %%
    /* 
    Save the root of the abstract syntax tree in a global variable.
    */
    program	
      : class_list	
        { @$ = @1;
          ast_root = program($1); };
    
    /* [[class;]]+ */
    class_list
      : class			/* single class */
        { $$ = single_Classes($1);
          parse_results = $$; }
      | class_list class	/* several classes */
        { $$ = append_Classes($1,single_Classes($2)); 
          parse_results = $$; };
    
    /* If no parent is specified, the class inherits from the Object class. */
    //* class; */
    class
      : CLASS TYPEID '{' feature_list '}' ';'
        { $$ = class_($2,idtable.add_string("Object"),$4,
          stringtable.add_string(curr_filename)); }
      | CLASS TYPEID INHERITS TYPEID '{' feature_list '}' ';'
        { $$ = class_($2,$4,$6,stringtable.add_string(curr_filename)); }
      | CLASS error ';' class 
        { $$ = $4; }
    
    /* Feature list may be empty, but no empty features in list. */
    /* [[feature;]]* */
    feature_list
      :		/* empty */
        { $$ = nil_Features(); }
      | feature_list feature
        { $$ = append_Features($1, single_Features($2)); };
      | feature_list error ';'
        { $$ = $1; }

    /* feature; */
    feature
      : OBJECTID '(' ')' ':' TYPEID '{' expr '}' ';'
        { $$ = method($1, nil_Formals(), $5, $7); }
      | OBJECTID '(' formal_list ')' ':' TYPEID '{' expr '}' ';'
        { $$ = method($1, $3, $6, $8); }
      | OBJECTID ':' TYPEID ';'
        { $$ = attr($1, $3, no_expr()); }
      | OBJECTID ':' TYPEID ASSIGN expr ';'
        { $$ = attr($1, $3, $5); } 

    /* formal */
    formal
      : OBJECTID ':' TYPEID
        { $$ = formal($1, $3); };
    
    /* [formal [[, formal]]* */
    formal_list
      : formal
        { $$ = single_Formals($1); }
      | formal_list ',' formal
        { $$ = append_Formals($1, single_Formals($3)); };

    /* expr [[, expr]]* */
    expr_list
      : expr
        { $$ = single_Expressions($1); }
      | expr_list ',' expr
        { $$ = append_Expressions($1, single_Expressions($3)); };

    /* [[expr;]]+ */
    expr_list_
      : expr ';'
        { $$ = single_Expressions($1); }
      | expr_list_  expr ';'
        { $$ = append_Expressions($1, single_Expressions($2)); };
      | expr_list_ error ';' 
        { $$ = $1; }      

    /* ID : TYPE [ <- expr ] [[, ID : TYPE [ <- expr ]]]* in expr */
    let
      : OBJECTID ':' TYPEID IN expr
        { $$ = let($1, $3, no_expr(), $5); }
      | OBJECTID ':' TYPEID ASSIGN expr IN expr
        { $$ = let($1, $3, $5, $7); }
      | OBJECTID ':' TYPEID ',' let
        { $$ = let($1, $3, no_expr(), $5); }
      | OBJECTID ':' TYPEID ASSIGN expr ',' let
        { $$ = let($1, $3, $5, $7); };
      | error ',' let
        { $$ = $3; }

    /* ID : TYPE => expr; */
    branch
      : OBJECTID ':' TYPEID DARROW expr ';'
        { $$ = branch($1, $3, $5); };

    /* [[ID : TYPE => expr;]]+ */
    branch_list
      : branch
        { $$ = single_Cases($1); }
      | branch_list branch 
        { $$ = append_Cases($1, single_Cases($2)); };

    /* nonempty expr  */
    expr
      // assign
      : OBJECTID ASSIGN expr
        { $$ = assign($1, $3); };
      // expr[@TYPE]:ID( [ expr [[; expr]]* ] )
      // dispath
      | expr '.' OBJECTID '(' ')'
        { $$ = dispatch($1, $3, nil_Expressions()); }
      | expr '.' OBJECTID '(' expr_list ')'
        { $$ = dispatch($1, $3, $5); }
      // static dispatch
      | expr '@' TYPEID '.' OBJECTID '(' ')'
        { $$ = static_dispatch($1, $3, $5, nil_Expressions()); }
      | expr '@' TYPEID '.' OBJECTID '(' expr_list ')'
        { $$ = static_dispatch($1, $3, $5, $7); };
      // ID( [ expr [[; expr]]* ] )
      | OBJECTID '(' ')'
        { $$ = dispatch(object(idtable.add_string("self")), $1, nil_Expressions()); }
      | OBJECTID '(' expr_list ')'
        { $$ = dispatch(object(idtable.add_string("self")), $1, $3); };
      // if expr then expr else expr fi
      | IF expr THEN expr ELSE expr FI
        { $$ = cond($2, $4, $6); };
      // while expr loop expr pool
      | WHILE expr LOOP expr POOL
        { $$ = loop($2, $4); };
      // { [[expr; ]]+ }
      | '{' expr_list_ '}'
        { $$ = block($2); };
      // let ID : TYPE [ <- expr ] [[; ID : TYPE [ <- expr ]]]* in expr
      | LET let
        { $$ = $2; };
      // case expr of [[ID : TYPE => expr; ]]+ esac
      | CASE expr OF branch_list ESAC
        { $$ = typcase($2, $4); };
      // new TYPE
      | NEW TYPEID
        { $$ = new_($2); };
      // isvoid expr
      | ISVOID expr
        { $$ = isvoid($2); };
      // expr + expr
      | expr '+' expr
        { $$ = plus($1, $3); };
      // expr - expr
      | expr '-' expr
        { $$ = sub($1, $3); };
      // expr * expr
      | expr '*' expr
        { $$ = mul($1, $3); };
      // expr / expr
      | expr '/' expr
        { $$ = divide($1, $3); };
      // ~expr
      | '~' expr
        { $$ = neg($2); };
      // expr < expr
      | expr '<' expr
        { $$ = lt($1, $3); };
      // expr <= expr
      | expr LE expr
        { $$ = leq($1, $3); };
      // expr = expr
      | expr '=' expr
        { $$ = eq($1, $3); };
      // not expr
      | NOT expr
        { $$ = comp($2); };
      // (expr)
      | '(' expr ')'
        { $$ = $2; };
      // ID
      | OBJECTID
        { $$ = object($1); };
      // integer
      | INT_CONST
        { $$ = int_const($1); };
      // string
      | STR_CONST
        { $$ = string_const($1); };
      // true, false
      | BOOL_CONST
        { $$ = bool_const($1); };
    
    /* end of grammar */
    %%
    
    /* This function is called automatically when Bison detects a parse error. */
    void yyerror(char *s)
    {
      extern int curr_lineno;
      
      cerr << "\"" << curr_filename << "\", line " << curr_lineno << ": " \
      << s << " at or near ";
      print_cool_token(yychar);
      cerr << endl;
      omerrs++;
      
      if(omerrs>50) {fprintf(stdout, "More than 50 errors\n"); exit(1);}
    }  

优先级申明

下表给出了从最高到最低的中缀二元运算和前缀一元运算的优先级:

.
@
~
isvoid
* /
+ -
<= < =
not
<-

所有二元运算都是左结合的,除了赋值是右结合的,还有三个比较运算是没有结合性的。

据此给出代码中的运算符优先级,注意bison中优先级是从低到高排列:

/* Precedence declarations go here. */
%right ASSIGN
%left NOT
%nonassoc LE '<' '='
%left '+' '-'
%left '*' '/'
%left ISVOID
%left '~'
%left '@'
%left '.'

错误处理

完成之前可以先阅读:

http://marvin.cs.uidaho.edu/Teaching/CS445/bisonErrorToken.html

https://www.gnu.org/software/bison/manual/html_node/Error-Recovery.html

这部分是这次作业的难点,主要是根据PA2的错误处理部分:

  • 如果类定义中有错误,但该类正确终止,并且下一个类的语法正确,则解析器应该能够在下一个类定义时重新启动。
  • 类似地,解析器应该从feature(继续到下一个feature)、let绑定(继续到下一个变量)和$\{\ldots \}$块中的表达式中的错误中恢复。

这两部分对应如下三种含义:

  • 如果处理class时报错,应该忽略该错误,然后处理后续的class。
  • 如果处理feature时报错,则忽略该错误,然后处理后续的feature。
  • 如果处理let时报错,则忽略该错误,然后处理后续的let。

对应的代码:

class
  : CLASS TYPEID '{' feature_list '}' ';'
    { $$ = class_($2,idtable.add_string("Object"),$4,
      stringtable.add_string(curr_filename)); }
  | CLASS TYPEID INHERITS TYPEID '{' feature_list '}' ';'
    { $$ = class_($2,$4,$6,stringtable.add_string(curr_filename)); }
  | CLASS error ';' class 
    { $$ = $4; }
    
feature_list
  :		/* empty */
    { $$ = nil_Features(); }
  | feature_list feature
    { $$ = append_Features($1, single_Features($2)); };
  | feature_list error ';'
    { $$ = $1; }
    
let
  : OBJECTID ':' TYPEID IN expr
    { $$ = let($1, $3, no_expr(), $5); }
  | OBJECTID ':' TYPEID ASSIGN expr IN expr
    { $$ = let($1, $3, $5, $7); }
  | OBJECTID ':' TYPEID ',' let
    { $$ = let($1, $3, no_expr(), $5); }
  | OBJECTID ':' TYPEID ASSIGN expr ',' let
    { $$ = let($1, $3, $5, $7); };
  | error ',' let
    { $$ = $3; }

运行结果

=====================================================================
submission: ..

=====================================================================
You got a score of 70 out of 70.