Compiler CH4 Grammars and Parsing

Parsing: Syntax Analysis

decides which part of the incoming token stream should be grouped together.
the output of parsing is some representation of a parse tree.
intermediate code generator transforms the parse tree into an intermediate language.

Comparisons between reqular expr. and context-free grammars

A context-free grammar:

𝑒𝑥𝑝→exp⁡〖𝑜𝑝 exp⁡|(𝑒𝑥𝑝)〗 |𝑛𝑢𝑚𝑏𝑒𝑟
𝑜𝑝→+|−|∗

A regular expression:

𝑛𝑢𝑚𝑏𝑒𝑟=𝑑𝑖𝑔𝑖𝑡 〖𝑑𝑖𝑔𝑖𝑡〗^∗
𝑑𝑖𝑔𝑖𝑡=0|1|2|3|4|5|6|7|8|9

The major difference is that the rules of a context-free grammar are recursive.

Rules from F.A. to CFG

For each state there is a nonterminal symbol.
If state 𝐴 has a transition to state 𝐵 on symbol 𝑎, introduce 𝐴→𝑎𝐵.
If 𝐴 goes to 𝐵 on input 𝜆, introduce 𝐴→𝐵.
If 𝐴 is an accepting state, introduce 𝐴→𝜆.
Make the start state of the NFA be the start symbol of the grammar.

Ex.

Why do not we use c.f.g to replac r.e?

r.e. => easy & clear description for token.
r.e. => efficient token recognizer
modularizing the components (The grammar rules use regular expressions as components)

Feature of programming language

contents:

declarations
sequential statements
iterative statements
conditional statements

Description of programming language

Syntax Diagrams
Context Free Grammars (CFG)

Capabilities of Context-free grammars

give precise syntactic specification of programming languages
a parser can be constructed automatically by CFG
the syntax entity specified in CFG can be used for translating into object code.
useful for describing nested structures such as balanced parentheses, matching begin-end’s, corresponding if-then-else, etc.

Context-Free Grammars: Concepts and Notation

A context-free grammar 𝐺=
- A finite terminal vocabulary
  - The token set produced by scanner
- A finite set of nonterminal vocabulary
  - Intermediate symbols
A start symbol 𝑆∈ that starts all derivations
- Also called goal symbol
𝑃, a finite set of productions (rewriting rules) of the form
- 𝐴→𝜆 is a valid production

Derivation

One step derivation
- If 𝐴→𝛾, then 𝛼𝐴𝛽⇒𝛼𝛾𝛽
One or more steps derivation ⟹^+
Zero or more steps derivation ⟹^∗

If 𝑆⟹, then 𝛽 is said to be sentential form of the CFG

SF(G) is the set of sentential forms of grammar G (may contain nonterminal vocabulary )

𝐿(𝐺)={ if }

𝐿(𝐺)=SF(G)∩

回溯CFG

Errors in Context-Free Grammars

unless nonterminal

ambiguous

rewrite CFG
specifying the intention(使用括號)

Transforming Extened BNF Grammars

Extended BNF≡BNF

Extended BNF allows
- Square bracket []
- Optional list {}

Parsers and Recognizers

Two general approaches to parsing

Top-down parser
- Expanding the parse tree (via predictions) in a depth-first manner
- Preorder traversal of the parse tree
- Predictive in nature
- lm
- LL
Buttom-down parser
- Beginning at its bottom (the leaves of the tree, which are terminal symbols) and determining the productions used to generate the leaves
- Postorder traversal of the parse tree
- rm
- LR

Grammar Analysis Algorithms

Goal of this section

discuss a number of important analysis algo. for grammars

Cont’d

data structure of a grammar G

what nonterminals can derive

A => BCD => BC => B =>
An iterative marking algo.

ex.

Rule

Follow(A)

A is any nonterminal
Follow(A) is the set of terminals that may follow A in some sentential form (跟隨在A後面的終止符號)
- $Follow(A) = {a \in V_t | S=> ^* …Aa…} \cup {if, S=> ^+aA , then , \

Compiler

Compiler CH4 Grammars and Parsing

https://z-hwa.github.io/webHome/[object Object]/Compiler/Compiler-CH4-Grammars-and-Parsing/

作者

crown tako

發布於

2024年4月18日

許可協議

Compiler CH3 Scanning 上一篇

Compiler CH5 Top-Down Parsing 下一篇