编译原理笔记(二)——正则表达式和NFA、DFA转化原理
#mermaid-svg-tRPiQapdJeShdcwB .label{font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family);fill:#333;color:#333}#mermaid-svg-tRPiQapdJeShdcwB .label text{fill:#333}#mermaid-svg-tRPiQapdJeShdcwB .node rect,#mermaid-svg-tRPiQapdJeShdcwB .node circle,#mermaid-svg-tRPiQapdJeShdcwB .node ellipse,#mermaid-svg-tRPiQapdJeShdcwB .node polygon,#mermaid-svg-tRPiQapdJeShdcwB .node path{fill:#ECECFF;stroke:#9370db;stroke-width:1px}#mermaid-svg-tRPiQapdJeShdcwB .node .label{text-align:center;fill:#333}#mermaid-svg-tRPiQapdJeShdcwB .node.clickable{cursor:pointer}#mermaid-svg-tRPiQapdJeShdcwB .arrowheadPath{fill:#333}#mermaid-svg-tRPiQapdJeShdcwB .edgePath .path{stroke:#333;stroke-width:1.5px}#mermaid-svg-tRPiQapdJeShdcwB .flowchart-link{stroke:#333;fill:none}#mermaid-svg-tRPiQapdJeShdcwB .edgeLabel{background-color:#e8e8e8;text-align:center}#mermaid-svg-tRPiQapdJeShdcwB .edgeLabel rect{opacity:0.9}#mermaid-svg-tRPiQapdJeShdcwB .edgeLabel span{color:#333}#mermaid-svg-tRPiQapdJeShdcwB .cluster rect{fill:#ffffde;stroke:#aa3;stroke-width:1px}#mermaid-svg-tRPiQapdJeShdcwB .cluster text{fill:#333}#mermaid-svg-tRPiQapdJeShdcwB div.mermaidTooltip{position:absolute;text-align:center;max-width:200px;padding:2px;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family);font-size:12px;background:#ffffde;border:1px solid #aa3;border-radius:2px;pointer-events:none;z-index:100}#mermaid-svg-tRPiQapdJeShdcwB .actor{stroke:#ccf;fill:#ECECFF}#mermaid-svg-tRPiQapdJeShdcwB text.actor>tspan{fill:#000;stroke:none}#mermaid-svg-tRPiQapdJeShdcwB .actor-line{stroke:grey}#mermaid-svg-tRPiQapdJeShdcwB .messageLine0{stroke-width:1.5;stroke-dasharray:none;stroke:#333}#mermaid-svg-tRPiQapdJeShdcwB .messageLine1{stroke-width:1.5;stroke-dasharray:2, 2;stroke:#333}#mermaid-svg-tRPiQapdJeShdcwB #arrowhead path{fill:#333;stroke:#333}#mermaid-svg-tRPiQapdJeShdcwB .sequenceNumber{fill:#fff}#mermaid-svg-tRPiQapdJeShdcwB #sequencenumber{fill:#333}#mermaid-svg-tRPiQapdJeShdcwB #crosshead path{fill:#333;stroke:#333}#mermaid-svg-tRPiQapdJeShdcwB .messageText{fill:#333;stroke:#333}#mermaid-svg-tRPiQapdJeShdcwB .labelBox{stroke:#ccf;fill:#ECECFF}#mermaid-svg-tRPiQapdJeShdcwB .labelText,#mermaid-svg-tRPiQapdJeShdcwB .labelText>tspan{fill:#000;stroke:none}#mermaid-svg-tRPiQapdJeShdcwB .loopText,#mermaid-svg-tRPiQapdJeShdcwB .loopText>tspan{fill:#000;stroke:none}#mermaid-svg-tRPiQapdJeShdcwB .loopLine{stroke-width:2px;stroke-dasharray:2, 2;stroke:#ccf;fill:#ccf}#mermaid-svg-tRPiQapdJeShdcwB .note{stroke:#aa3;fill:#fff5ad}#mermaid-svg-tRPiQapdJeShdcwB .noteText,#mermaid-svg-tRPiQapdJeShdcwB .noteText>tspan{fill:#000;stroke:none}#mermaid-svg-tRPiQapdJeShdcwB .activation0{fill:#f4f4f4;stroke:#666}#mermaid-svg-tRPiQapdJeShdcwB .activation1{fill:#f4f4f4;stroke:#666}#mermaid-svg-tRPiQapdJeShdcwB .activation2{fill:#f4f4f4;stroke:#666}#mermaid-svg-tRPiQapdJeShdcwB .mermaid-main-font{font-family:"trebuchet ms", verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-tRPiQapdJeShdcwB .section{stroke:none;opacity:0.2}#mermaid-svg-tRPiQapdJeShdcwB .section0{fill:rgba(102,102,255,0.49)}#mermaid-svg-tRPiQapdJeShdcwB .section2{fill:#fff400}#mermaid-svg-tRPiQapdJeShdcwB .section1,#mermaid-svg-tRPiQapdJeShdcwB .section3{fill:#fff;opacity:0.2}#mermaid-svg-tRPiQapdJeShdcwB .sectionTitle0{fill:#333}#mermaid-svg-tRPiQapdJeShdcwB .sectionTitle1{fill:#333}#mermaid-svg-tRPiQapdJeShdcwB .sectionTitle2{fill:#333}#mermaid-svg-tRPiQapdJeShdcwB .sectionTitle3{fill:#333}#mermaid-svg-tRPiQapdJeShdcwB .sectionTitle{text-anchor:start;font-size:11px;text-height:14px;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-tRPiQapdJeShdcwB .grid .tick{stroke:#d3d3d3;opacity:0.8;shape-rendering:crispEdges}#mermaid-svg-tRPiQapdJeShdcwB .grid .tick text{font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-tRPiQapdJeShdcwB .grid path{stroke-width:0}#mermaid-svg-tRPiQapdJeShdcwB .today{fill:none;stroke:red;stroke-width:2px}#mermaid-svg-tRPiQapdJeShdcwB .task{stroke-width:2}#mermaid-svg-tRPiQapdJeShdcwB .taskText{text-anchor:middle;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-tRPiQapdJeShdcwB .taskText:not([font-size]){font-size:11px}#mermaid-svg-tRPiQapdJeShdcwB .taskTextOutsideRight{fill:#000;text-anchor:start;font-size:11px;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-tRPiQapdJeShdcwB .taskTextOutsideLeft{fill:#000;text-anchor:end;font-size:11px}#mermaid-svg-tRPiQapdJeShdcwB .task.clickable{cursor:pointer}#mermaid-svg-tRPiQapdJeShdcwB .taskText.clickable{cursor:pointer;fill:#003163 !important;font-weight:bold}#mermaid-svg-tRPiQapdJeShdcwB .taskTextOutsideLeft.clickable{cursor:pointer;fill:#003163 !important;font-weight:bold}#mermaid-svg-tRPiQapdJeShdcwB .taskTextOutsideRight.clickable{cursor:pointer;fill:#003163 !important;font-weight:bold}#mermaid-svg-tRPiQapdJeShdcwB .taskText0,#mermaid-svg-tRPiQapdJeShdcwB .taskText1,#mermaid-svg-tRPiQapdJeShdcwB .taskText2,#mermaid-svg-tRPiQapdJeShdcwB .taskText3{fill:#fff}#mermaid-svg-tRPiQapdJeShdcwB .task0,#mermaid-svg-tRPiQapdJeShdcwB .task1,#mermaid-svg-tRPiQapdJeShdcwB .task2,#mermaid-svg-tRPiQapdJeShdcwB .task3{fill:#8a90dd;stroke:#534fbc}#mermaid-svg-tRPiQapdJeShdcwB .taskTextOutside0,#mermaid-svg-tRPiQapdJeShdcwB .taskTextOutside2{fill:#000}#mermaid-svg-tRPiQapdJeShdcwB .taskTextOutside1,#mermaid-svg-tRPiQapdJeShdcwB .taskTextOutside3{fill:#000}#mermaid-svg-tRPiQapdJeShdcwB .active0,#mermaid-svg-tRPiQapdJeShdcwB .active1,#mermaid-svg-tRPiQapdJeShdcwB .active2,#mermaid-svg-tRPiQapdJeShdcwB .active3{fill:#bfc7ff;stroke:#534fbc}#mermaid-svg-tRPiQapdJeShdcwB .activeText0,#mermaid-svg-tRPiQapdJeShdcwB .activeText1,#mermaid-svg-tRPiQapdJeShdcwB .activeText2,#mermaid-svg-tRPiQapdJeShdcwB .activeText3{fill:#000 !important}#mermaid-svg-tRPiQapdJeShdcwB .done0,#mermaid-svg-tRPiQapdJeShdcwB .done1,#mermaid-svg-tRPiQapdJeShdcwB .done2,#mermaid-svg-tRPiQapdJeShdcwB .done3{stroke:grey;fill:#d3d3d3;stroke-width:2}#mermaid-svg-tRPiQapdJeShdcwB .doneText0,#mermaid-svg-tRPiQapdJeShdcwB .doneText1,#mermaid-svg-tRPiQapdJeShdcwB .doneText2,#mermaid-svg-tRPiQapdJeShdcwB .doneText3{fill:#000 !important}#mermaid-svg-tRPiQapdJeShdcwB .crit0,#mermaid-svg-tRPiQapdJeShdcwB .crit1,#mermaid-svg-tRPiQapdJeShdcwB .crit2,#mermaid-svg-tRPiQapdJeShdcwB .crit3{stroke:#f88;fill:red;stroke-width:2}#mermaid-svg-tRPiQapdJeShdcwB .activeCrit0,#mermaid-svg-tRPiQapdJeShdcwB .activeCrit1,#mermaid-svg-tRPiQapdJeShdcwB .activeCrit2,#mermaid-svg-tRPiQapdJeShdcwB .activeCrit3{stroke:#f88;fill:#bfc7ff;stroke-width:2}#mermaid-svg-tRPiQapdJeShdcwB .doneCrit0,#mermaid-svg-tRPiQapdJeShdcwB .doneCrit1,#mermaid-svg-tRPiQapdJeShdcwB .doneCrit2,#mermaid-svg-tRPiQapdJeShdcwB .doneCrit3{stroke:#f88;fill:#d3d3d3;stroke-width:2;cursor:pointer;shape-rendering:crispEdges}#mermaid-svg-tRPiQapdJeShdcwB .milestone{transform:rotate(45deg) scale(0.8, 0.8)}#mermaid-svg-tRPiQapdJeShdcwB .milestoneText{font-style:italic}#mermaid-svg-tRPiQapdJeShdcwB .doneCritText0,#mermaid-svg-tRPiQapdJeShdcwB .doneCritText1,#mermaid-svg-tRPiQapdJeShdcwB .doneCritText2,#mermaid-svg-tRPiQapdJeShdcwB .doneCritText3{fill:#000 !important}#mermaid-svg-tRPiQapdJeShdcwB .activeCritText0,#mermaid-svg-tRPiQapdJeShdcwB .activeCritText1,#mermaid-svg-tRPiQapdJeShdcwB .activeCritText2,#mermaid-svg-tRPiQapdJeShdcwB .activeCritText3{fill:#000 !important}#mermaid-svg-tRPiQapdJeShdcwB .titleText{text-anchor:middle;font-size:18px;fill:#000;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-tRPiQapdJeShdcwB g.classGroup text{fill:#9370db;stroke:none;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family);font-size:10px}#mermaid-svg-tRPiQapdJeShdcwB g.classGroup text .title{font-weight:bolder}#mermaid-svg-tRPiQapdJeShdcwB g.clickable{cursor:pointer}#mermaid-svg-tRPiQapdJeShdcwB g.classGroup rect{fill:#ECECFF;stroke:#9370db}#mermaid-svg-tRPiQapdJeShdcwB g.classGroup line{stroke:#9370db;stroke-width:1}#mermaid-svg-tRPiQapdJeShdcwB .classLabel .box{stroke:none;stroke-width:0;fill:#ECECFF;opacity:0.5}#mermaid-svg-tRPiQapdJeShdcwB .classLabel .label{fill:#9370db;font-size:10px}#mermaid-svg-tRPiQapdJeShdcwB .relation{stroke:#9370db;stroke-width:1;fill:none}#mermaid-svg-tRPiQapdJeShdcwB .dashed-line{stroke-dasharray:3}#mermaid-svg-tRPiQapdJeShdcwB #compositionStart{fill:#9370db;stroke:#9370db;stroke-width:1}#mermaid-svg-tRPiQapdJeShdcwB #compositionEnd{fill:#9370db;stroke:#9370db;stroke-width:1}#mermaid-svg-tRPiQapdJeShdcwB #aggregationStart{fill:#ECECFF;stroke:#9370db;stroke-width:1}#mermaid-svg-tRPiQapdJeShdcwB #aggregationEnd{fill:#ECECFF;stroke:#9370db;stroke-width:1}#mermaid-svg-tRPiQapdJeShdcwB #dependencyStart{fill:#9370db;stroke:#9370db;stroke-width:1}#mermaid-svg-tRPiQapdJeShdcwB #dependencyEnd{fill:#9370db;stroke:#9370db;stroke-width:1}#mermaid-svg-tRPiQapdJeShdcwB #extensionStart{fill:#9370db;stroke:#9370db;stroke-width:1}#mermaid-svg-tRPiQapdJeShdcwB #extensionEnd{fill:#9370db;stroke:#9370db;stroke-width:1}#mermaid-svg-tRPiQapdJeShdcwB .commit-id,#mermaid-svg-tRPiQapdJeShdcwB .commit-msg,#mermaid-svg-tRPiQapdJeShdcwB .branch-label{fill:lightgrey;color:lightgrey;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-tRPiQapdJeShdcwB .pieTitleText{text-anchor:middle;font-size:25px;fill:#000;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-tRPiQapdJeShdcwB .slice{font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-tRPiQapdJeShdcwB g.stateGroup text{fill:#9370db;stroke:none;font-size:10px;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-tRPiQapdJeShdcwB g.stateGroup text{fill:#9370db;fill:#333;stroke:none;font-size:10px}#mermaid-svg-tRPiQapdJeShdcwB g.statediagram-cluster .cluster-label text{fill:#333}#mermaid-svg-tRPiQapdJeShdcwB g.stateGroup .state-title{font-weight:bolder;fill:#000}#mermaid-svg-tRPiQapdJeShdcwB g.stateGroup rect{fill:#ECECFF;stroke:#9370db}#mermaid-svg-tRPiQapdJeShdcwB g.stateGroup line{stroke:#9370db;stroke-width:1}#mermaid-svg-tRPiQapdJeShdcwB .transition{stroke:#9370db;stroke-width:1;fill:none}#mermaid-svg-tRPiQapdJeShdcwB .stateGroup .composit{fill:white;border-bottom:1px}#mermaid-svg-tRPiQapdJeShdcwB .stateGroup .alt-composit{fill:#e0e0e0;border-bottom:1px}#mermaid-svg-tRPiQapdJeShdcwB .state-note{stroke:#aa3;fill:#fff5ad}#mermaid-svg-tRPiQapdJeShdcwB .state-note text{fill:black;stroke:none;font-size:10px}#mermaid-svg-tRPiQapdJeShdcwB .stateLabel .box{stroke:none;stroke-width:0;fill:#ECECFF;opacity:0.7}#mermaid-svg-tRPiQapdJeShdcwB .edgeLabel text{fill:#333}#mermaid-svg-tRPiQapdJeShdcwB .stateLabel text{fill:#000;font-size:10px;font-weight:bold;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-tRPiQapdJeShdcwB .node circle.state-start{fill:black;stroke:black}#mermaid-svg-tRPiQapdJeShdcwB .node circle.state-end{fill:black;stroke:white;stroke-width:1.5}#mermaid-svg-tRPiQapdJeShdcwB #statediagram-barbEnd{fill:#9370db}#mermaid-svg-tRPiQapdJeShdcwB .statediagram-cluster rect{fill:#ECECFF;stroke:#9370db;stroke-width:1px}#mermaid-svg-tRPiQapdJeShdcwB .statediagram-cluster rect.outer{rx:5px;ry:5px}#mermaid-svg-tRPiQapdJeShdcwB .statediagram-state .divider{stroke:#9370db}#mermaid-svg-tRPiQapdJeShdcwB .statediagram-state .title-state{rx:5px;ry:5px}#mermaid-svg-tRPiQapdJeShdcwB .statediagram-cluster.statediagram-cluster .inner{fill:white}#mermaid-svg-tRPiQapdJeShdcwB .statediagram-cluster.statediagram-cluster-alt .inner{fill:#e0e0e0}#mermaid-svg-tRPiQapdJeShdcwB .statediagram-cluster .inner{rx:0;ry:0}#mermaid-svg-tRPiQapdJeShdcwB .statediagram-state rect.basic{rx:5px;ry:5px}#mermaid-svg-tRPiQapdJeShdcwB .statediagram-state rect.divider{stroke-dasharray:10,10;fill:#efefef}#mermaid-svg-tRPiQapdJeShdcwB .note-edge{stroke-dasharray:5}#mermaid-svg-tRPiQapdJeShdcwB .statediagram-note rect{fill:#fff5ad;stroke:#aa3;stroke-width:1px;rx:0;ry:0}:root{--mermaid-font-family: '"trebuchet ms", verdana, arial';--mermaid-font-family: "Comic Sans MS", "Comic Sans", cursive}#mermaid-svg-tRPiQapdJeShdcwB .error-icon{fill:#522}#mermaid-svg-tRPiQapdJeShdcwB .error-text{fill:#522;stroke:#522}#mermaid-svg-tRPiQapdJeShdcwB .edge-thickness-normal{stroke-width:2px}#mermaid-svg-tRPiQapdJeShdcwB .edge-thickness-thick{stroke-width:3.5px}#mermaid-svg-tRPiQapdJeShdcwB .edge-pattern-solid{stroke-dasharray:0}#mermaid-svg-tRPiQapdJeShdcwB .edge-pattern-dashed{stroke-dasharray:3}#mermaid-svg-tRPiQapdJeShdcwB .edge-pattern-dotted{stroke-dasharray:2}#mermaid-svg-tRPiQapdJeShdcwB .marker{fill:#333}#mermaid-svg-tRPiQapdJeShdcwB .marker.cross{stroke:#333}
:root { --mermaid-font-family: "trebuchet ms", verdana, arial;}#mermaid-svg-tRPiQapdJeShdcwB {
color: rgba(0, 0, 0, 0.75);
font: ;
}
Thompson算法
子集构造算法
Hopcroft算法
RE
NFA
DFA
词法分析器
从RE到NFA(Thompson算法)
Thompson
Kenneth Lane Thompson(1943年2月4日 -),计算机科学家,美国计算机学者,1983年图灵奖得主,因为其在Unix上做出的重大贡献,
算法介绍
基于对RE结构做归纳
- 对基本的RE结构,直接构造
- 对于复合的RE结构,递归构造
RE正则表达式的五种结构
e -> 空串j | c | e1 e2 连接 | e1 | e2 选择 | e1* 闭包
- 对于空串
j
,其属于一个基本的RE结构,直接构造,NFA为
- 对于字符c,其也为一个基本的RE结构,NFA的构造为
- 对于连接
e1、e2
- 对于选择
e1|e2
- 闭包
e1*
实例
比如将**a(b|c)***转化成NFA
- 首先需要注意优先级
() > 闭包 > 连接 > 选择,所以上面的描述应该是b和c的选择的闭包再和a的连接
- 从左至右将每个基本模块拆列出来
- 然后连接上去
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-IWzmhuq2-1637423203277)(https://i.loli.net/2021/11/12/dwXeZsE9cuMAtCR.png)]
从NFA到DFA(子集构造算法)
算法思想
从初始状态
q0
出发,看他识别一个字符能到达的所有状态的集合
q1
,空字符是不需要消耗字符,然后再从集合q1出发,看他能到达的下一个状态的集合q2,依次类推下去,直到所有状态都被遍历完毕
实例
以RE表达式
a(b|c)*
的NFA为列,生成DFA的过程为
从初始状态q0开始,识别a,能到达的状态为**{1,2,3,4,6,9},q1 = {1,2,3,4,6,9},识别b和c没有任何状态,再从q1开始,对于q1内部的子状态,分别再识别符号表中的a,b,c,到4时,识别b,能到{5,8,9,3,4,6}=q2**,然后再从q2出发…
最终得到四个合并状态,他们的简化表示为
空字符闭包的计算(伪代码)
深度优先遍历
set closure ={}
void eps_closure(x)
closure +={x}
foreach(y: x -- 空字符 --> y)
if(!visited(y))
eps->closure(y)# 因为有一个判断结点是否访问的visited,时间复杂度为O(N)
广度优先遍历
set closure ={}
queue =[]
void eps_closure(x)=
queue =[x]
while(queue not empty)
q <- deQueue(queue)
closure +={q}
foreach(y: q--空字符-->y)
if(!visited(y))
enQueue(queue,y)
工作表算法
q0 <- eps_closure(n0)
Q <- {q0}
workList <- q0
while(workList !=[])
remove q from workList
foreach(character c)
t <- eps_closure(delta(q, c))
D[q, c]<- t
if(t not in Q)add t to Q and workList
minDFA算法(Hopcroft算法)
Hopcroft教授是美国理论计算机科学家,美国科学院、工程院及艺术和科学院院士, 康奈尔大学终身教授,1986获得 “图灵奖” 、2010年获得 “冯诺依曼奖章” 、2016年获得 “中国政府友谊奖” 、2017获得西蒙雷曼创始人奖
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-OGipZDAP-1637423203280)(https://i.loli.net/2021/11/13/ZwBUuQPFDi4r3Hj.png)]
基于等价类的思想,拆分子集
其伪代码:
split(s)
foreach(character c)
if(c can split S)split S into T1,T2,T3.....
hopcroft()split all nodes into N,A
while(set is still changes)
split(S)
现将DFA拆分成,
终止状态集A
和
正常状态集N
,然后再在N和A里面拆分,直到不能再拆分为止,从大集合到不能再拆分的小集合,即同一个大集合里面的几个状态扫描一个字符能到同样的状态,那么这几个状态就组成了一个集合,反之就要拆分出来,不断这样下去,一直拿到这个最小集合此时这个最小集合就是minDFA
实例
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-W81F5HkH-1637423203283)(https://i.loli.net/2021/11/13/6zMjGqfVUlFNrCg.png)]
版权归原作者 问号小朋友 所有, 如有侵权,请联系我们删除。