Compiler CH5 Top-Down Parsing
Compiler CH5 Top-Down Parsing
objectves of Top-Down Parsing
an attempt to find a leftmost derivation for an input string
an attempt to construct a parse tree for the input string starting from the root and creating the nodes of the parse tree in preorder
2 forms
- Recursive-descent parsers
- contain a set of mutually recursive procedures that cooperate to parse a string.
- Code for these procedures can be written directly from a suitable grammar.
- Table-driven LL parsers
- use a generic LL(k) parsing engine and a parse table that directs the activity of the engine.
- The entries for the parse table are determined by the particular LL(k) grammar. The notation LL(k) is explained below.
basic method of recursive-descent
using EBNF(擴展巴斯科範式)
- Consider now the case of an exp in the grammar for simple arithmetic expressions in BNF:
- 𝑒𝑥𝑝→exp〖𝑎𝑑𝑑𝑜𝑝 𝑡𝑒𝑟𝑚 | 𝑡𝑒𝑟𝑚〗(可能不斷走exp)
- The solution is to use the EBNF rule
- 𝑒𝑥𝑝→𝑡𝑒𝑟𝑚{𝑎𝑑𝑑𝑜𝑝 𝑡𝑒𝑟𝑚}
Approaches of Top-parsing
with backtracking (making repeated scans of the input, a general form of top-down parsing)
Methods: To create a procedure for each nonterminal.
- 為每個終止符創造procedure
problems?
left-recursion (can cause a top-down parser to go into an infinite loop)
- Def. A grammar is said to be left-recursive
- if it has a nonterminal 𝐴 s.t. there is a derivation 𝐴⟹𝐴𝛿 for some 𝛿.
- Def. A grammar is said to be left-recursive
backtracking - undo not only the movement but also the semantics entering in symbol table.
- 回溯不僅撤銷動作,也撤銷進入符號表的語意
the order the alternatives are tried (For the grammar shown above, try 𝑤=𝑐𝑎𝑏𝑑 where 𝐴→𝑎 is applied first)
- 我看不懂…
LL(1) predict function
single symbol lookahead
The limitation of LL(1)
- LL(1) contains exactly those grammars that have disjoint predict sets for productions that share a common left-hand side
- 會包含公共左側且不相交的預測集
A grammar G is LL(1) if and only if whenever 𝐴→𝛼|𝛽 are two distinct productions of G, the following conditions hold:
- For no terminal 𝒂 do both 𝛼 and 𝛽 derive strings beginning with 𝒂.
- At most one of 𝛼 and 𝛽 can derive the empty string.
- If 𝛽⇒
then 𝛼 does not derive any string beginning with a terminal in FOLLOW(𝐴). - Likewise, if 𝛼⇒
then 𝛽 does not derive any string beginning with a terminal in FOLLOW(𝐴)
- Likewise, if 𝛼⇒
First set
Ex.
Follow set
Ex.
LL(1) prediction function
A grammar 𝐺 is LL(1) if and only if whenever 𝐴→𝛼|𝛽 are two distinct productions of 𝐺, the following conditions hold:
- For no terminal 𝑎 do both 𝛼 and 𝛽 derive strings beginning with 𝑎. First(𝛼)∩First(𝛽)=𝜙
- At most one of 𝛼 and 𝛽 can derive the empty string.
- If 𝛽⟹
, then 𝛼 does not derive any string beginning with a terminal in Follow(𝐴). - Likewise, if 𝛼⟹
, then 𝛽 does not derive any string beginning with a terminal in Follow(𝐴). - First(𝛼)∩Follow(𝐴)=𝜙 (i.e. If First(𝛼) contains 𝜆 then First(𝛽)∩Follow(𝐴)=𝜙 )
- Likewise, if 𝛼⟹
end of file token
- $
LL(1) Parse Table
- An LL(1) parse table
- The definition of 𝑇
- 𝑇[𝐴][𝑡]=
if 𝑡∈ - 𝑇[𝐴][𝑡]=Error, otherwise
- 𝑇[𝐴][𝑡]=
LL(1) Parsing
* 𝑆→(𝑆)𝑆
- 𝑆→𝜆
- Input String: ()
- then
- 𝑆⟹${lm} (𝑆)𝑆⟹{lm} ()𝑆⟹_{lm} ()$
Elimination of left recursion
- It is possible for a recursive-descent parser to loop forever. A problem arises with “left-recursive” productions like
- expr → expr + term
- A left-recursive production can be eliminated by rewriting the offending production. Consider a nonterminal A with two productions
- A → A𝛼|𝛽
- For example, A= expr, 𝛼= + term, 𝛽= term
We can convert left recursion to right recursion in the following manner, using a new nonterminal R:
- A →𝛽R
- R → 𝛼R|ϵ
Immediate left recursion can be eliminated by the following technique, which works for any number of 𝐴-productions. First, group the productions as
- 𝐴→
where no 𝛽_𝑖 begins with an 𝐴. Then, replace the 𝐴-productions by
- 𝐴→
- 𝐴′→
algorithm: Elimination left recursion
Input: Context-free Grammar G with no cycles or 𝜆-production
Methods:
- Arrange the nonterminals in some order
for 𝑖=1 to 𝑛 do { for 𝑗=1 to 𝑖−1 do { replace each production of the form 𝐴_𝑖→𝐴_𝑗 𝛾 by the production 𝐴_𝑖→𝛿_1 𝛾|𝛿_2 𝛾|⋯|𝛿_𝑘 𝛾, where 𝐴_𝑗→𝛿_1 |𝛿_2 |⋯|𝛿_𝑘 are all current 𝐴_𝑗-production; } eliminate the immediate left-recursion among the 𝐴_𝑖-production; }
Ex.
e.g.
𝑆 →𝐴𝑎 | 𝑏
𝐴 → 𝐴𝑐 | 𝑆𝑑 | 𝑒
- Step 1: ==> 𝑆 → 𝐴𝑎 | 𝑏
- Step 2: ==> 𝐴 → 𝐴𝑐 | 𝐴𝑎𝑑 | 𝑏𝑑 | 𝑒
- Step 3: ==> 𝐴 → 𝑏𝑑𝐴′ |𝑒𝐴′ 𝐴′ → 𝑐𝐴′ |𝑎𝑑𝐴′ |
Non-backtracking(recursive-descent) parsing
recursive descent : use a collection of mutually recursive routines to perform the syntax analysis.
Predicative Parsing
Features:
- maintains a stack rather than recursive calls
- table-driven
Components:
- An input buffer with end marker ($)
- A stack with endmarker ($) on the bottom
- A parsing table, a two-dimensional array 𝑀[𝐴,𝑎]
- where ‘𝐴’ is a nonterminal symbol and ‘𝑎’ is the current input symbol (terminal/token).
Parsing Table
- an ex.
Algorithm
input
- an input string
and a parsing table for grammar G
ouput
- a leftmost derivation of
or an error indication
initially w
- Method
construct a Predicative parsing table
LL(1) grammar
A grammar whose parsing table has no multiply-defined entries is said to be LL(1).
First ‘L’ : scan the input from left to right.
Second ‘L’ : produce a leftmost derivation.
‘1’ : use one input symbol to determine parsing action.
- No ambiguous or left-recursive grammar can be LL(1).