0


编译原理笔记(二)——正则表达式到有限状态自动机

编译原理笔记(二)——正则表达式和NFA、DFA转化原理

#mermaid-svg-tRPiQapdJeShdcwB .label{font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family);fill:#333;color:#333}#mermaid-svg-tRPiQapdJeShdcwB .label text{fill:#333}#mermaid-svg-tRPiQapdJeShdcwB .node rect,#mermaid-svg-tRPiQapdJeShdcwB .node circle,#mermaid-svg-tRPiQapdJeShdcwB .node ellipse,#mermaid-svg-tRPiQapdJeShdcwB .node polygon,#mermaid-svg-tRPiQapdJeShdcwB .node path{fill:#ECECFF;stroke:#9370db;stroke-width:1px}#mermaid-svg-tRPiQapdJeShdcwB .node .label{text-align:center;fill:#333}#mermaid-svg-tRPiQapdJeShdcwB .node.clickable{cursor:pointer}#mermaid-svg-tRPiQapdJeShdcwB .arrowheadPath{fill:#333}#mermaid-svg-tRPiQapdJeShdcwB .edgePath .path{stroke:#333;stroke-width:1.5px}#mermaid-svg-tRPiQapdJeShdcwB .flowchart-link{stroke:#333;fill:none}#mermaid-svg-tRPiQapdJeShdcwB .edgeLabel{background-color:#e8e8e8;text-align:center}#mermaid-svg-tRPiQapdJeShdcwB .edgeLabel rect{opacity:0.9}#mermaid-svg-tRPiQapdJeShdcwB .edgeLabel span{color:#333}#mermaid-svg-tRPiQapdJeShdcwB .cluster rect{fill:#ffffde;stroke:#aa3;stroke-width:1px}#mermaid-svg-tRPiQapdJeShdcwB .cluster text{fill:#333}#mermaid-svg-tRPiQapdJeShdcwB div.mermaidTooltip{position:absolute;text-align:center;max-width:200px;padding:2px;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family);font-size:12px;background:#ffffde;border:1px solid #aa3;border-radius:2px;pointer-events:none;z-index:100}#mermaid-svg-tRPiQapdJeShdcwB .actor{stroke:#ccf;fill:#ECECFF}#mermaid-svg-tRPiQapdJeShdcwB text.actor>tspan{fill:#000;stroke:none}#mermaid-svg-tRPiQapdJeShdcwB .actor-line{stroke:grey}#mermaid-svg-tRPiQapdJeShdcwB .messageLine0{stroke-width:1.5;stroke-dasharray:none;stroke:#333}#mermaid-svg-tRPiQapdJeShdcwB .messageLine1{stroke-width:1.5;stroke-dasharray:2, 2;stroke:#333}#mermaid-svg-tRPiQapdJeShdcwB #arrowhead path{fill:#333;stroke:#333}#mermaid-svg-tRPiQapdJeShdcwB .sequenceNumber{fill:#fff}#mermaid-svg-tRPiQapdJeShdcwB #sequencenumber{fill:#333}#mermaid-svg-tRPiQapdJeShdcwB #crosshead path{fill:#333;stroke:#333}#mermaid-svg-tRPiQapdJeShdcwB .messageText{fill:#333;stroke:#333}#mermaid-svg-tRPiQapdJeShdcwB .labelBox{stroke:#ccf;fill:#ECECFF}#mermaid-svg-tRPiQapdJeShdcwB .labelText,#mermaid-svg-tRPiQapdJeShdcwB .labelText>tspan{fill:#000;stroke:none}#mermaid-svg-tRPiQapdJeShdcwB .loopText,#mermaid-svg-tRPiQapdJeShdcwB .loopText>tspan{fill:#000;stroke:none}#mermaid-svg-tRPiQapdJeShdcwB .loopLine{stroke-width:2px;stroke-dasharray:2, 2;stroke:#ccf;fill:#ccf}#mermaid-svg-tRPiQapdJeShdcwB .note{stroke:#aa3;fill:#fff5ad}#mermaid-svg-tRPiQapdJeShdcwB .noteText,#mermaid-svg-tRPiQapdJeShdcwB .noteText>tspan{fill:#000;stroke:none}#mermaid-svg-tRPiQapdJeShdcwB .activation0{fill:#f4f4f4;stroke:#666}#mermaid-svg-tRPiQapdJeShdcwB .activation1{fill:#f4f4f4;stroke:#666}#mermaid-svg-tRPiQapdJeShdcwB .activation2{fill:#f4f4f4;stroke:#666}#mermaid-svg-tRPiQapdJeShdcwB .mermaid-main-font{font-family:"trebuchet ms", verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-tRPiQapdJeShdcwB .section{stroke:none;opacity:0.2}#mermaid-svg-tRPiQapdJeShdcwB .section0{fill:rgba(102,102,255,0.49)}#mermaid-svg-tRPiQapdJeShdcwB .section2{fill:#fff400}#mermaid-svg-tRPiQapdJeShdcwB .section1,#mermaid-svg-tRPiQapdJeShdcwB .section3{fill:#fff;opacity:0.2}#mermaid-svg-tRPiQapdJeShdcwB .sectionTitle0{fill:#333}#mermaid-svg-tRPiQapdJeShdcwB .sectionTitle1{fill:#333}#mermaid-svg-tRPiQapdJeShdcwB .sectionTitle2{fill:#333}#mermaid-svg-tRPiQapdJeShdcwB .sectionTitle3{fill:#333}#mermaid-svg-tRPiQapdJeShdcwB .sectionTitle{text-anchor:start;font-size:11px;text-height:14px;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-tRPiQapdJeShdcwB .grid .tick{stroke:#d3d3d3;opacity:0.8;shape-rendering:crispEdges}#mermaid-svg-tRPiQapdJeShdcwB .grid .tick text{font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-tRPiQapdJeShdcwB .grid path{stroke-width:0}#mermaid-svg-tRPiQapdJeShdcwB .today{fill:none;stroke:red;stroke-width:2px}#mermaid-svg-tRPiQapdJeShdcwB .task{stroke-width:2}#mermaid-svg-tRPiQapdJeShdcwB .taskText{text-anchor:middle;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-tRPiQapdJeShdcwB .taskText:not([font-size]){font-size:11px}#mermaid-svg-tRPiQapdJeShdcwB .taskTextOutsideRight{fill:#000;text-anchor:start;font-size:11px;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-tRPiQapdJeShdcwB .taskTextOutsideLeft{fill:#000;text-anchor:end;font-size:11px}#mermaid-svg-tRPiQapdJeShdcwB .task.clickable{cursor:pointer}#mermaid-svg-tRPiQapdJeShdcwB .taskText.clickable{cursor:pointer;fill:#003163 !important;font-weight:bold}#mermaid-svg-tRPiQapdJeShdcwB .taskTextOutsideLeft.clickable{cursor:pointer;fill:#003163 !important;font-weight:bold}#mermaid-svg-tRPiQapdJeShdcwB .taskTextOutsideRight.clickable{cursor:pointer;fill:#003163 !important;font-weight:bold}#mermaid-svg-tRPiQapdJeShdcwB .taskText0,#mermaid-svg-tRPiQapdJeShdcwB .taskText1,#mermaid-svg-tRPiQapdJeShdcwB .taskText2,#mermaid-svg-tRPiQapdJeShdcwB .taskText3{fill:#fff}#mermaid-svg-tRPiQapdJeShdcwB .task0,#mermaid-svg-tRPiQapdJeShdcwB .task1,#mermaid-svg-tRPiQapdJeShdcwB .task2,#mermaid-svg-tRPiQapdJeShdcwB .task3{fill:#8a90dd;stroke:#534fbc}#mermaid-svg-tRPiQapdJeShdcwB .taskTextOutside0,#mermaid-svg-tRPiQapdJeShdcwB .taskTextOutside2{fill:#000}#mermaid-svg-tRPiQapdJeShdcwB .taskTextOutside1,#mermaid-svg-tRPiQapdJeShdcwB .taskTextOutside3{fill:#000}#mermaid-svg-tRPiQapdJeShdcwB .active0,#mermaid-svg-tRPiQapdJeShdcwB .active1,#mermaid-svg-tRPiQapdJeShdcwB .active2,#mermaid-svg-tRPiQapdJeShdcwB .active3{fill:#bfc7ff;stroke:#534fbc}#mermaid-svg-tRPiQapdJeShdcwB .activeText0,#mermaid-svg-tRPiQapdJeShdcwB .activeText1,#mermaid-svg-tRPiQapdJeShdcwB .activeText2,#mermaid-svg-tRPiQapdJeShdcwB .activeText3{fill:#000 !important}#mermaid-svg-tRPiQapdJeShdcwB .done0,#mermaid-svg-tRPiQapdJeShdcwB .done1,#mermaid-svg-tRPiQapdJeShdcwB .done2,#mermaid-svg-tRPiQapdJeShdcwB .done3{stroke:grey;fill:#d3d3d3;stroke-width:2}#mermaid-svg-tRPiQapdJeShdcwB .doneText0,#mermaid-svg-tRPiQapdJeShdcwB .doneText1,#mermaid-svg-tRPiQapdJeShdcwB .doneText2,#mermaid-svg-tRPiQapdJeShdcwB .doneText3{fill:#000 !important}#mermaid-svg-tRPiQapdJeShdcwB .crit0,#mermaid-svg-tRPiQapdJeShdcwB .crit1,#mermaid-svg-tRPiQapdJeShdcwB .crit2,#mermaid-svg-tRPiQapdJeShdcwB .crit3{stroke:#f88;fill:red;stroke-width:2}#mermaid-svg-tRPiQapdJeShdcwB .activeCrit0,#mermaid-svg-tRPiQapdJeShdcwB .activeCrit1,#mermaid-svg-tRPiQapdJeShdcwB .activeCrit2,#mermaid-svg-tRPiQapdJeShdcwB .activeCrit3{stroke:#f88;fill:#bfc7ff;stroke-width:2}#mermaid-svg-tRPiQapdJeShdcwB .doneCrit0,#mermaid-svg-tRPiQapdJeShdcwB .doneCrit1,#mermaid-svg-tRPiQapdJeShdcwB .doneCrit2,#mermaid-svg-tRPiQapdJeShdcwB .doneCrit3{stroke:#f88;fill:#d3d3d3;stroke-width:2;cursor:pointer;shape-rendering:crispEdges}#mermaid-svg-tRPiQapdJeShdcwB .milestone{transform:rotate(45deg) scale(0.8, 0.8)}#mermaid-svg-tRPiQapdJeShdcwB .milestoneText{font-style:italic}#mermaid-svg-tRPiQapdJeShdcwB .doneCritText0,#mermaid-svg-tRPiQapdJeShdcwB .doneCritText1,#mermaid-svg-tRPiQapdJeShdcwB .doneCritText2,#mermaid-svg-tRPiQapdJeShdcwB .doneCritText3{fill:#000 !important}#mermaid-svg-tRPiQapdJeShdcwB .activeCritText0,#mermaid-svg-tRPiQapdJeShdcwB .activeCritText1,#mermaid-svg-tRPiQapdJeShdcwB .activeCritText2,#mermaid-svg-tRPiQapdJeShdcwB .activeCritText3{fill:#000 !important}#mermaid-svg-tRPiQapdJeShdcwB .titleText{text-anchor:middle;font-size:18px;fill:#000;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-tRPiQapdJeShdcwB g.classGroup text{fill:#9370db;stroke:none;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family);font-size:10px}#mermaid-svg-tRPiQapdJeShdcwB g.classGroup text .title{font-weight:bolder}#mermaid-svg-tRPiQapdJeShdcwB g.clickable{cursor:pointer}#mermaid-svg-tRPiQapdJeShdcwB g.classGroup rect{fill:#ECECFF;stroke:#9370db}#mermaid-svg-tRPiQapdJeShdcwB g.classGroup line{stroke:#9370db;stroke-width:1}#mermaid-svg-tRPiQapdJeShdcwB .classLabel .box{stroke:none;stroke-width:0;fill:#ECECFF;opacity:0.5}#mermaid-svg-tRPiQapdJeShdcwB .classLabel .label{fill:#9370db;font-size:10px}#mermaid-svg-tRPiQapdJeShdcwB .relation{stroke:#9370db;stroke-width:1;fill:none}#mermaid-svg-tRPiQapdJeShdcwB .dashed-line{stroke-dasharray:3}#mermaid-svg-tRPiQapdJeShdcwB #compositionStart{fill:#9370db;stroke:#9370db;stroke-width:1}#mermaid-svg-tRPiQapdJeShdcwB #compositionEnd{fill:#9370db;stroke:#9370db;stroke-width:1}#mermaid-svg-tRPiQapdJeShdcwB #aggregationStart{fill:#ECECFF;stroke:#9370db;stroke-width:1}#mermaid-svg-tRPiQapdJeShdcwB #aggregationEnd{fill:#ECECFF;stroke:#9370db;stroke-width:1}#mermaid-svg-tRPiQapdJeShdcwB #dependencyStart{fill:#9370db;stroke:#9370db;stroke-width:1}#mermaid-svg-tRPiQapdJeShdcwB #dependencyEnd{fill:#9370db;stroke:#9370db;stroke-width:1}#mermaid-svg-tRPiQapdJeShdcwB #extensionStart{fill:#9370db;stroke:#9370db;stroke-width:1}#mermaid-svg-tRPiQapdJeShdcwB #extensionEnd{fill:#9370db;stroke:#9370db;stroke-width:1}#mermaid-svg-tRPiQapdJeShdcwB .commit-id,#mermaid-svg-tRPiQapdJeShdcwB .commit-msg,#mermaid-svg-tRPiQapdJeShdcwB .branch-label{fill:lightgrey;color:lightgrey;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-tRPiQapdJeShdcwB .pieTitleText{text-anchor:middle;font-size:25px;fill:#000;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-tRPiQapdJeShdcwB .slice{font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-tRPiQapdJeShdcwB g.stateGroup text{fill:#9370db;stroke:none;font-size:10px;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-tRPiQapdJeShdcwB g.stateGroup text{fill:#9370db;fill:#333;stroke:none;font-size:10px}#mermaid-svg-tRPiQapdJeShdcwB g.statediagram-cluster .cluster-label text{fill:#333}#mermaid-svg-tRPiQapdJeShdcwB g.stateGroup .state-title{font-weight:bolder;fill:#000}#mermaid-svg-tRPiQapdJeShdcwB g.stateGroup rect{fill:#ECECFF;stroke:#9370db}#mermaid-svg-tRPiQapdJeShdcwB g.stateGroup line{stroke:#9370db;stroke-width:1}#mermaid-svg-tRPiQapdJeShdcwB .transition{stroke:#9370db;stroke-width:1;fill:none}#mermaid-svg-tRPiQapdJeShdcwB .stateGroup .composit{fill:white;border-bottom:1px}#mermaid-svg-tRPiQapdJeShdcwB .stateGroup .alt-composit{fill:#e0e0e0;border-bottom:1px}#mermaid-svg-tRPiQapdJeShdcwB .state-note{stroke:#aa3;fill:#fff5ad}#mermaid-svg-tRPiQapdJeShdcwB .state-note text{fill:black;stroke:none;font-size:10px}#mermaid-svg-tRPiQapdJeShdcwB .stateLabel .box{stroke:none;stroke-width:0;fill:#ECECFF;opacity:0.7}#mermaid-svg-tRPiQapdJeShdcwB .edgeLabel text{fill:#333}#mermaid-svg-tRPiQapdJeShdcwB .stateLabel text{fill:#000;font-size:10px;font-weight:bold;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-tRPiQapdJeShdcwB .node circle.state-start{fill:black;stroke:black}#mermaid-svg-tRPiQapdJeShdcwB .node circle.state-end{fill:black;stroke:white;stroke-width:1.5}#mermaid-svg-tRPiQapdJeShdcwB #statediagram-barbEnd{fill:#9370db}#mermaid-svg-tRPiQapdJeShdcwB .statediagram-cluster rect{fill:#ECECFF;stroke:#9370db;stroke-width:1px}#mermaid-svg-tRPiQapdJeShdcwB .statediagram-cluster rect.outer{rx:5px;ry:5px}#mermaid-svg-tRPiQapdJeShdcwB .statediagram-state .divider{stroke:#9370db}#mermaid-svg-tRPiQapdJeShdcwB .statediagram-state .title-state{rx:5px;ry:5px}#mermaid-svg-tRPiQapdJeShdcwB .statediagram-cluster.statediagram-cluster .inner{fill:white}#mermaid-svg-tRPiQapdJeShdcwB .statediagram-cluster.statediagram-cluster-alt .inner{fill:#e0e0e0}#mermaid-svg-tRPiQapdJeShdcwB .statediagram-cluster .inner{rx:0;ry:0}#mermaid-svg-tRPiQapdJeShdcwB .statediagram-state rect.basic{rx:5px;ry:5px}#mermaid-svg-tRPiQapdJeShdcwB .statediagram-state rect.divider{stroke-dasharray:10,10;fill:#efefef}#mermaid-svg-tRPiQapdJeShdcwB .note-edge{stroke-dasharray:5}#mermaid-svg-tRPiQapdJeShdcwB .statediagram-note rect{fill:#fff5ad;stroke:#aa3;stroke-width:1px;rx:0;ry:0}:root{--mermaid-font-family: '"trebuchet ms", verdana, arial';--mermaid-font-family: "Comic Sans MS", "Comic Sans", cursive}#mermaid-svg-tRPiQapdJeShdcwB .error-icon{fill:#522}#mermaid-svg-tRPiQapdJeShdcwB .error-text{fill:#522;stroke:#522}#mermaid-svg-tRPiQapdJeShdcwB .edge-thickness-normal{stroke-width:2px}#mermaid-svg-tRPiQapdJeShdcwB .edge-thickness-thick{stroke-width:3.5px}#mermaid-svg-tRPiQapdJeShdcwB .edge-pattern-solid{stroke-dasharray:0}#mermaid-svg-tRPiQapdJeShdcwB .edge-pattern-dashed{stroke-dasharray:3}#mermaid-svg-tRPiQapdJeShdcwB .edge-pattern-dotted{stroke-dasharray:2}#mermaid-svg-tRPiQapdJeShdcwB .marker{fill:#333}#mermaid-svg-tRPiQapdJeShdcwB .marker.cross{stroke:#333}

:root { --mermaid-font-family: "trebuchet ms", verdana, arial;}#mermaid-svg-tRPiQapdJeShdcwB {
color: rgba(0, 0, 0, 0.75);
font: ;
}
Thompson算法

子集构造算法

Hopcroft算法

       RE 
     

       NFA 
     

       DFA 
     

       词法分析器 
     

从RE到NFA(Thompson算法)

Thompson

Kenneth Lane Thompson(1943年2月4日 -),计算机科学家,美国计算机学者,1983年图灵奖得主,因为其在Unix上做出的重大贡献,

算法介绍

基于对RE结构做归纳

  1. 对基本的RE结构,直接构造
  2. 对于复合的RE结构,递归构造

RE正则表达式的五种结构

e -> 空串j
    | c
    | e1 e2  连接
    | e1 | e2  选择
    | e1* 闭包
  • 对于空串j,其属于一个基本的RE结构,直接构造,NFA为

image-20211112142623485

  • 对于字符c,其也为一个基本的RE结构,NFA的构造为

image-20211112142747888

  • 对于连接e1、e2

image-20211112143154868

  • 对于选择e1|e2

image-20211112143451392

  • 闭包e1*

image-20211112143815025

实例

比如将**a(b|c)***转化成NFA

  • 首先需要注意优先级

() > 闭包 > 连接 > 选择,所以上面的描述应该是b和c的选择的闭包再和a的连接

  • 从左至右将每个基本模块拆列出来
  • 然后连接上去

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-IWzmhuq2-1637423203277)(https://i.loli.net/2021/11/12/dwXeZsE9cuMAtCR.png)]

从NFA到DFA(子集构造算法)

算法思想

从初始状态

q0

出发,看他识别一个字符能到达的所有状态的集合

q1

,空字符是不需要消耗字符,然后再从集合q1出发,看他能到达的下一个状态的集合q2,依次类推下去,直到所有状态都被遍历完毕

实例

以RE表达式

a(b|c)*

的NFA为列,生成DFA的过程为

从初始状态q0开始,识别a,能到达的状态为**{1,2,3,4,6,9},q1 = {1,2,3,4,6,9},识别b和c没有任何状态,再从q1开始,对于q1内部的子状态,分别再识别符号表中的a,b,c,到4时,识别b,能到{5,8,9,3,4,6}=q2**,然后再从q2出发…

最终得到四个合并状态,他们的简化表示为

image-20211112152313278

空字符闭包的计算(伪代码)

深度优先遍历

set closure ={}

void eps_closure(x)
    closure +={x}
    foreach(y: x -- 空字符 --> y)
        if(!visited(y))
            eps->closure(y)# 因为有一个判断结点是否访问的visited,时间复杂度为O(N)

广度优先遍历

set closure ={}
queue =[]

void eps_closure(x)=
    queue =[x]
    while(queue not empty)
        q <- deQueue(queue)
        closure +={q}
        foreach(y: q--空字符-->y)
            if(!visited(y))
                enQueue(queue,y)

工作表算法

q0 <- eps_closure(n0)
Q <- {q0}

workList <- q0
while(workList !=[])
    remove q from workList
    foreach(character c)
        t <- eps_closure(delta(q, c))
        D[q, c]<- t
        if(t not in Q)add t to Q and workList

minDFA算法(Hopcroft算法)

Hopcroft教授是美国理论计算机科学家,美国科学院、工程院及艺术和科学院院士, 康奈尔大学终身教授,1986获得 “图灵奖” 、2010年获得 “冯诺依曼奖章” 、2016年获得 “中国政府友谊奖” 、2017获得西蒙雷曼创始人奖

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-OGipZDAP-1637423203280)(https://i.loli.net/2021/11/13/ZwBUuQPFDi4r3Hj.png)]

基于等价类的思想,拆分子集

其伪代码:

split(s)
    foreach(character c)
        if(c can split S)split S into T1,T2,T3.....
            
hopcroft()split all nodes into N,A
    while(set is still changes)
        split(S)

现将DFA拆分成,

终止状态集A

正常状态集N

,然后再在N和A里面拆分,直到不能再拆分为止,从大集合到不能再拆分的小集合,即同一个大集合里面的几个状态扫描一个字符能到同样的状态,那么这几个状态就组成了一个集合,反之就要拆分出来,不断这样下去,一直拿到这个最小集合此时这个最小集合就是minDFA

实例

image-20211113235357478

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-W81F5HkH-1637423203283)(https://i.loli.net/2021/11/13/6zMjGqfVUlFNrCg.png)]


本文转载自: https://blog.csdn.net/qq_48201696/article/details/121447930
版权归原作者 问号小朋友 所有, 如有侵权,请联系我们删除。

“编译原理笔记(二)——正则表达式到有限状态自动机”的评论:

还没有评论