Compiler CH4 Grammars and Parsing
Compiler CH4 Grammars and Parsing
Parsing: Syntax Analysis
- decides which part of the incoming token stream should be grouped together.
- the output of parsing is some representation of a parse tree.
- intermediate code generator transforms the parse tree into an intermediate language.
Comparisons between reqular expr. and context-free grammars
A context-free grammar:
𝑒𝑥𝑝→exp〖𝑜𝑝 exp|(𝑒𝑥𝑝)〗 |𝑛𝑢𝑚𝑏𝑒𝑟
𝑜𝑝→+|−|∗
A regular expression:
𝑛𝑢𝑚𝑏𝑒𝑟=𝑑𝑖𝑔𝑖𝑡 〖𝑑𝑖𝑔𝑖𝑡〗^∗
𝑑𝑖𝑔𝑖𝑡=0|1|2|3|4|5|6|7|8|9
The major difference is that the rules of a context-free grammar are recursive.
Rules from F.A. to CFG
- For each state there is a nonterminal symbol.
- If state 𝐴 has a transition to state 𝐵 on symbol 𝑎, introduce 𝐴→𝑎𝐵.
- If 𝐴 goes to 𝐵 on input 𝜆, introduce 𝐴→𝐵.
- If 𝐴 is an accepting state, introduce 𝐴→𝜆.
- Make the start state of the NFA be the start symbol of the grammar.
Ex.
Why do not we use c.f.g to replac r.e?
r.e. => easy & clear description for token.
r.e. => efficient token recognizer
modularizing the components (The grammar rules use regular expressions as components)
Feature of programming language
contents:
- declarations
- sequential statements
- iterative statements
- conditional statements
Description of programming language
- Syntax Diagrams
- Context Free Grammars (CFG)
Capabilities of Context-free grammars
- give precise syntactic specification of programming languages
- a parser can be constructed automatically by CFG
- the syntax entity specified in CFG can be used for translating into object code.
- useful for describing nested structures such as balanced parentheses, matching begin-end’s, corresponding if-then-else, etc.
Context-Free Grammars: Concepts and Notation
- A context-free grammar 𝐺=
- A finite terminal vocabulary
- The token set produced by scanner
- A finite set of nonterminal vocabulary
- Intermediate symbols
- A finite terminal vocabulary
- A start symbol 𝑆∈
that starts all derivations - Also called goal symbol
- 𝑃, a finite set of productions (rewriting rules) of the form
- 𝐴→𝜆 is a valid production
Derivation
- One step derivation
- If 𝐴→𝛾, then 𝛼𝐴𝛽⇒𝛼𝛾𝛽
- One or more steps derivation ⟹^+
- Zero or more steps derivation ⟹^∗
If 𝑆⟹
- SF(G) is the set of sentential forms of grammar G (may contain nonterminal vocabulary )
𝐿(𝐺)={
𝐿(𝐺)=SF(G)∩
回溯CFG
Errors in Context-Free Grammars
unless nonterminal
ambiguous
- rewrite CFG
- specifying the intention(使用括號)
Transforming Extened BNF Grammars
Extended BNF≡BNF
- Extended BNF allows
- Square bracket []
- Optional list {}
Parsers and Recognizers
Two general approaches to parsing
- Top-down parser
- Expanding the parse tree (via predictions) in a depth-first manner
- Preorder traversal of the parse tree
- Predictive in nature
- lm
- LL
- Buttom-down parser
- Beginning at its bottom (the leaves of the tree, which are terminal symbols) and determining the productions used to generate the leaves
- Postorder traversal of the parse tree
- rm
- LR
Grammar Analysis Algorithms
Goal of this section
- discuss a number of important analysis algo. for grammars
Cont’d
- data structure of a grammar G
what nonterminals can derive
- A => BCD => BC => B =>
- An iterative marking algo.
ex.
Rule
Follow(A)
- A is any nonterminal
- Follow(A) is the set of terminals that may follow A in some sentential form (跟隨在A後面的終止符號)
- $Follow(A) = {a \in V_t | S=> ^* …Aa…} \cup {if, S=> ^+aA , then , \
Compiler CH4 Grammars and Parsing
https://z-hwa.github.io/webHome/[object Object]/Compiler/Compiler-CH4-Grammars-and-Parsing/