Learning and Using YARA Rules

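I recently needed to provide some detection capability for the malware samples I analyze. After looking around, I found that YARA rules fit my needs well.
This article covers:

A brief introduction to YARA rules
Writing YARA rules (string definitions and condition definitions; this part is essentially a translation of the official documentation)
Using YARA from Python (basic usage)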

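1. Introduction and Installation

Introduction: YARA is a tool developed by VirusTotal (VT) for writing rules that identify and classify malware.

Official GitHub repository: https://github.com/VirusTotal/yara/releases

Official documentation: https://yara.readthedocs.io

A simple example: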

rule silent_banker : banker
{
    meta:
        description = "This is just an example"
        threat_level = 3
        in_the_wild = true
    strings:
        $a = {6A 40 68 00 30 00 00 6A 14 8D 91}
        $b = {8D 4D B0 2B C1 83 C0 27 99 6A 4E 59 F7 F9}
        $c = "UVODFRYSIHLNWPEJXQZAKCBGMT"
    condition:
        $a or $b or $c
}

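Installation: download a release; the binaries can be used as they are.
Usage: yara.exe rule.yara <file or directory to scan>

2. Writing YARA Rules

A rule generally has two parts: strings and a condition.
The strings section defines strings that may appear in the software.
The condition section combines string occurrences into a Boolean expression so that samples can be filtered more precisely.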

// Two simple string forms: a text string and a hex string
rule ExampleRule
{
    strings:
        $my_text_string = "text here"
        $my_hex_string = { E2 34 A1 C8 23 FB }

    condition:
        $my_text_string or $my_hex_string
}

2.1 Keywords

Keywords work much as they do in C: the following words are reserved and cannot be used as identifiers.

all    and    any    ascii    at    base64    base64wide    condition
contains    endswith    entrypoint    false    filesize    for    fullword    global
import    icontains    iendswith    iequals    in    include    int16    int16be
int32    int32be    int8    int8be    istartswith    matches    meta    nocase
none    not    of    or    private    rule    startswith    strings
them    true    uint16    uint16be    uint32    uint32be    uint8    uint8be
wide    xor    defined

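2.2 String definitions (strings)

A string identifier starts with $ and is followed by letters, digits, and underscores. Text strings are written between double quotes, and hex strings between braces {}.

$my_hex_string = { E2 34 A1 C8 23 FB }
$hex_string = { E2 34 ?? C8 A? FB }    // ? is a wildcard for one nibble (half a byte)
$hex_string = { F4 23 [4-6] 62 B4 }    // jump: any 4 to 6 bytes
$hex_string = { F4 23 ( 62 B4 | 56 ) 45 }    // alternatives: either 62 B4 or 56
$my_text_string = "text here\"\\\r\t\n\xdd"    // escape sequences as in C strings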
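
To make the wildcard syntax concrete, the sketch below hand-translates the pattern { E2 34 ?? C8 A? FB } into a Python bytes regular expression. It only illustrates the matching semantics and is not how YARA matches internally; the names HEX_PATTERN and matches_hex_pattern and the two sample buffers are made up for this example.

```python
import re

# Hand translation of { E2 34 ?? C8 A? FB }:
#   ??  -> any single byte              -> .
#   A?  -> high nibble A, any low one   -> [\xA0-\xAF]
# re.DOTALL lets "." match the newline byte 0x0A as well.
HEX_PATTERN = re.compile(rb"\xE2\x34.\xC8[\xA0-\xAF]\xFB", re.DOTALL)


def matches_hex_pattern(data: bytes) -> bool:
    """Return True if the byte pattern occurs anywhere in data."""
    return HEX_PATTERN.search(data) is not None


hit = bytes.fromhex("00 E2 34 0A C8 A7 FB 00")   # ?? = 0A, A? = A7
miss = bytes.fromhex("00 E2 34 0A C8 B7 FB 00")  # B7 does not fit A?

print(matches_hex_pattern(hit))   # True
print(matches_hex_pattern(miss))  # False
```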

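String modifiers: after a string is defined, modifiers can be appended to change how it is matched, and several modifiers can be combined. For example, nocase makes the match case-insensitive.

$text_string = "foobar" nocase    // case-insensitive: matches Foobar, FOOBAR, and fOoBaR
$wide_string = "Borland" wide    // matches the wide (two bytes per character) form, e.g. B\x00o\x00
$wide_and_ascii_string = "Borland" wide ascii    // matches both the wide and the ASCII form
$xor_string = "This program cannot" xor    // finds the string after a single-byte XOR has been applied
$a = "This program cannot" base64    // finds the string in base64-encoded form
$a = "This program cannot" base64("!@#$%^&*(){}[].,|ABCDEFGHIJ\x09LMNOPQRSTUVWXYZabcdefghijklmnopqrstu")
// a custom base64 alphabet is also supported
The fullword modifier requires the match to be delimited by non-alphanumeric characters. For example, domain defined as fullword does not match www.mydomain.com but does match www.my-domain.com.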

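Restrictions on combining modifiers

Keyword      Effect                                                    Cannot be combined with
nocase       Ignores case                                              xor, base64, base64wide
wide         Matches wide (UTF-16 style) strings                       none
ascii        Matches ASCII characters                                  none
xor          Matches the string after a single-byte XOR                nocase, base64, base64wide
base64       Matches the base64-encoded string                         nocase, xor, fullword
base64wide   Matches the base64-encoded string with interleaved 0x00   nocase, xor, fullword
fullword     Matches only whole words                                  base64, base64wide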
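
As a rough illustration of what the wide and xor modifiers look for, the sketch below builds the corresponding byte sequences in plain Python. The helper names (wide, xor_variants, find_xor_key) and the sample buffer are hypothetical, and the key range 0x00-0xFF is assumed to be what a bare xor covers.

```python
def wide(text: str) -> bytes:
    """Bytes searched for by `wide`: each ASCII character followed by 0x00."""
    return b"".join(bytes([c, 0]) for c in text.encode("ascii"))


def xor_variants(text: str) -> dict:
    """Map every single-byte key (0x00-0xFF) to the XOR-ed form of text."""
    raw = text.encode("ascii")
    return {key: bytes(b ^ key for b in raw) for key in range(256)}


def find_xor_key(data: bytes, text: str):
    """Return the first key whose XOR-ed form of text occurs in data, else None."""
    for key, variant in xor_variants(text).items():
        if variant in data:
            return key
    return None


print(wide("Borland"))  # b'B\x00o\x00r\x00l\x00a\x00n\x00d\x00'

# A made-up buffer holding the DOS stub message XOR-ed with key 0x41.
sample = b"\x90\x90" + bytes(b ^ 0x41 for b in b"This program cannot") + b"\x90"
print(hex(find_xor_key(sample, "This program cannot")))  # 0x41
```

This also shows why xor cannot be combined with nocase: each added variation multiplies the number of byte sequences that would have to be searched.
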
Regular expressions: enclose the pattern between forward slashes, as in /pattern/ (a regex tutorial: https://www.runoob.com/regexp/regexp-tutorial.html).

$re1 = /md5: [0-9a-fA-F]{32}/
$re2 = /state: (on|off)/
$re1 = /foo/i    // case-insensitive
$re2 = /bar./s   // In this regexp the dot matches everything, including new-line
$re3 = /baz./is  // Both modifiers can be used together

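Regular expression special characters

Symbol   Meaning
\        Quotes the next metacharacter (for example \\ or \|)
^        Matches the beginning of the input
$        Matches the end of the input
.        Matches any single character
()       Groups the enclosed expression
[]       Matches any character listed inside the brackets
*        Matches 0 or more times
+        Matches at least once
?        Matches 0 or 1 times
{n}      Matches exactly n times
{n,}     Matches at least n times
{,m}     Matches at most m times
{n,m}    Matches n to m times
\t       Tab
\n       New line
\r       Carriage return
\xNN     The character whose hexadecimal code is NN
\w       Matches a word character (digit, letter, underscore)
\W       Matches a non-word character
\s       Matches a whitespace character
\S       Matches a non-whitespace character
\d       Matches a digit
\D       Matches a non-digit
\b       Word boundary
\B       Non-word boundary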
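
The effect of the i and s modifiers can be tried out with Python's re module on a bytes buffer. This is only an approximation for experimenting with patterns: Python's regex engine is not YARA's, and the sample buffer below is made up.

```python
import re

# /md5: [0-9a-fA-F]{32}/ carried over unchanged as a bytes pattern.
RE_MD5 = re.compile(rb"md5: [0-9a-fA-F]{32}")
# /foo/i  -> case-insensitive
RE_FOO = re.compile(rb"foo", re.IGNORECASE)
# /bar./s -> the dot also matches a new-line
RE_BAR = re.compile(rb"bar.", re.DOTALL)

sample = b"log start\nmd5: d41d8cd98f00b204e9800998ecf8427e\nFOO bar\n"

print(RE_MD5.search(sample) is not None)  # True
print(RE_FOO.search(sample).group())      # b'FOO'
print(RE_BAR.search(sample).group())      # b'bar\n'
```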

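2.3 Condition definitions (condition)

Conditions are essentially Boolean expressions, as found in any programming language.

Boolean operators: and, or, not
Relational operators: >=, <=, <, >, ==, !=
Arithmetic operators: +, -, *, \, % (\ is integer division)
Bitwise operators: &, |, <<, >>, ~, ^

The hash sign (#) gives the number of occurrences of a string.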

rule CountExample
{
    strings:
        $a = "dummy1"
        $b = "dummy2"

    condition:
        #a == 6 and #b > 10
        // Occurrences can also be counted within a range, for example:
        // #a in (filesize-500..filesize) == 2
}

at tests whether a string appears at a given file offset or virtual address.

rule AtExample
{
    strings:
        $a = "dummy1"
        $b = "dummy2"

    condition:
        $a at 100 and $b at 200    // $a at offset 100 and $b at offset 200
}

in tests whether a string appears within a range of offsets.

rule InExample
{
    strings:
        $a = "dummy1"
        $b = "dummy2"

    condition:
        $a in (0..100) and $b in (100..filesize)
}
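
The counting, at, and in operators above can be mimicked in a few lines of Python, which may help when reasoning about why a rule does or does not fire. This is a sketch under assumptions: the offsets helper and the test buffer are made up, and ranges are treated as inclusive at both ends.

```python
def offsets(data: bytes, pattern: bytes) -> list:
    """Return every offset at which pattern occurs in data."""
    found, start = [], data.find(pattern)
    while start != -1:
        found.append(start)
        start = data.find(pattern, start + 1)
    return found


# Made-up buffer: "dummy1" at offsets 0 and 100, "dummy2" at offset 200.
buf = bytearray(300)
buf[0:6] = b"dummy1"
buf[100:106] = b"dummy1"
buf[200:206] = b"dummy2"
data = bytes(buf)
filesize = len(data)

a = offsets(data, b"dummy1")
b = offsets(data, b"dummy2")

print(len(a))                                      # #a -> 2
print(100 in a and 200 in b)                       # $a at 100 and $b at 200 -> True
print(any(0 <= o <= 100 for o in a))               # $a in (0..100) -> True
print(any(100 <= o <= filesize for o in b))        # $b in (100..filesize) -> True
print(sum(filesize - 250 <= o <= filesize for o in a))  # #a in (filesize-250..filesize) -> 1
```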

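The filesize keyword holds the size of the file being scanned, in bytes. The rule below matches files larger than 200 KB. filesize is only meaningful when the rule is applied to a file.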

rule FileSizeExample
{
    condition:
        filesize > 200KB
}

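The entrypoint keyword holds the entry point of the executable. It is commonly used to detect packers or file infectors by looking for patterns at the entry point. Note that the official documentation marks entrypoint as deprecated in favor of pe.entry_point from the PE module.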

rule EntryPointExample1
{
    strings:
        $a={ E8 00 00 00 00 }

    condition:
        $a at entrypoint
}

rule EntryPointExample2
{
    strings:
        $a = { 9C 50 66 A1 ?? ?? ?? 00 66 A9 ?? ?? 58 0F 85 }

    condition:
        $a in (entrypoint..entrypoint + 10)
}

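Reading data at a file offset or virtual address

The functions below read an integer of the given width and signedness from the offset. The plain variants read little-endian values, and the be variants read big-endian values.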

int8(<offset or virtual address>)
int16(<offset or virtual address>)
int32(<offset or virtual address>)

uint8(<offset or virtual address>)
uint16(<offset or virtual address>)
uint32(<offset or virtual address>)

int8be(<offset or virtual address>)
int16be(<offset or virtual address>)
int32be(<offset or virtual address>)

uint8be(<offset or virtual address>)
uint16be(<offset or virtual address>)
uint32be(<offset or virtual address>)

rule IsPE
{
    condition:
        // MZ signature at offset 0 and ...
        uint16(0)== 0x5A4D and
        // ... PE signature at offset stored in MZ header at 0x3C
        uint32(uint32(0x3C))== 0x00004550
}
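
The IsPE condition maps directly onto little-endian reads with Python's struct module. The sketch below mirrors it on an in-memory buffer; the helpers uint16, uint32, and is_pe are written for this example only, and the test buffer is a made-up stub rather than a loadable executable.

```python
import struct


def uint16(data: bytes, offset: int) -> int:
    """Little-endian unsigned 16-bit read, like YARA's uint16(offset)."""
    return struct.unpack_from("<H", data, offset)[0]


def uint32(data: bytes, offset: int) -> int:
    """Little-endian unsigned 32-bit read, like YARA's uint32(offset)."""
    return struct.unpack_from("<I", data, offset)[0]


def is_pe(data: bytes) -> bool:
    """Mirror the IsPE condition; out-of-range reads count as no match."""
    try:
        return (uint16(data, 0) == 0x5A4D
                and uint32(data, uint32(data, 0x3C)) == 0x00004550)
    except struct.error:
        return False


# "MZ" at offset 0, the PE header offset 0x80 stored at 0x3C,
# and the "PE\0\0" signature at 0x80.
fake = bytearray(0x100)
fake[0:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x80)
fake[0x80:0x84] = b"PE\x00\x00"

print(is_pe(bytes(fake)))     # True
print(is_pe(b"MZ not a PE"))  # False
```

The bytes "MZ" read as a little-endian 16-bit value give 0x5A4D, which is why the constant in the rule looks reversed.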

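String sets: a set can be written as a parenthesized list of identifiers or with the * wildcard, and them refers to all strings in the rule.

rule OfExample1
{
    strings:
        $a = "dummy1"
        $b = "dummy2"
        $c = "dummy3"
        $foo1 = "foo1"
        $foo2 = "foo2"
        $foo3 = "foo3"

    condition:
        // The three set expressions are joined with "or" only so they fit in one rule
        2 of ($a,$b,$c) or
        2 of ($foo*) or  // equivalent to 2 of ($foo1,$foo2,$foo3)
        1 of them        // equivalent to 1 of ($*)
}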

all of them       // all strings in the rule
any of them       // any string in the rule
all of ($a*)      // all strings whose identifier starts with $a
any of ($a,$b,$c) // any of $a, $b or $c
1 of ($*)         // same as "any of them"
none of ($b*)     // zero of the set of strings that start with "$b"
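
The set expressions above boil down to counting how many strings of a set are present and comparing that count with the quantifier. A minimal Python sketch of that idea follows; count_present, the strings dictionary, and the sample data are all hypothetical, and a trailing * is modeled as a simple prefix test.

```python
def count_present(data: bytes, strings: dict, prefix: str = "$") -> int:
    """Count the strings whose identifier starts with prefix and that occur in data."""
    return sum(
        1
        for name, pattern in strings.items()
        if name.startswith(prefix) and pattern in data
    )


strings = {
    "$a": b"dummy1", "$b": b"dummy2", "$c": b"dummy3",
    "$foo1": b"foo1", "$foo2": b"foo2", "$foo3": b"foo3",
}
data = b"...dummy1...dummy3...foo2..."

# 2 of ($a,$b,$c): an explicit list of identifiers
print(sum(strings[n] in data for n in ("$a", "$b", "$c")) >= 2)  # True
# 2 of ($foo*): only foo2 is present
print(count_present(data, strings, "$foo") >= 2)                 # False
# any of them
print(count_present(data, strings) >= 1)                         # True
# none of ($b*)
print(count_present(data, strings, "$b") == 0)                   # True
```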

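Applying one condition to many strings: inside the parentheses, # stands for the number of occurrences, @ for the first offset, and ! for the match length of each string in the set.

for all of them : ( # > 3 )
for all of ($a*) : ( @ > @b )

Iterating over collections (the pe examples require import "pe"; the last line is the general form)

for any section in pe.sections : ( section.name == ".text" )
for any i in (0..pe.number_of_sections-1) : ( pe.sections[i].name == ".text" )
for any k,v in some_dict : ( k == "foo" and v == "bar" )
for <quantifier> <variables> in <iterable> : ( <some condition using the loop variables> )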

Referencing other rules: a rule can reuse a previously defined rule directly in its condition.

rule Rule1
{
    strings:
        $a="dummy1"

    condition:
        $a
}

rule Rule2
{
    strings:
        $a="dummy2"

    condition:
        $a and Rule1
}

2.4 Other syntax

Global rules (global): the restriction of a global rule is applied to every other rule.

global rule SizeLimit
{
    condition:
        filesize < 2MB
}

Private rules: they are not reported in the output when they match and serve as building blocks for other rules.

private rule PrivateRuleExample
{...
}

Metadata: stores additional information about the rule.

rule MetadataExample
{
    meta:
        my_identifier_1 = "Some string data"
        my_identifier_2 = 24
        my_identifier_3 = true

    strings:
        $my_text_string = "text here"
        $my_hex_string = { E2 34 A1 C8 23 FB }

    condition:
        $my_text_string or $my_hex_string
}

Importing modules

import "pe"
import "cuckoo"

rule Test
{
    strings:
        $a="some string"

    condition:
        $a and pe.entry_point == 0x1000
}

Including other YARA files

include "other.yar"
include "./includes/other.yar"
include "../includes/other.yar"

3. Using YARA Rules in Python

Install the yara-python library

pip install yara-python

A simple demo

import os

import yara


# Collect the YARA rule files in a directory and compile them together
def getRules(path):
    filepath = {}
    for index, file in enumerate(os.listdir(path)):
        rupath = os.path.join(path, file)
        key = "rule" + str(index)  # namespace for this rule file
        filepath[key] = rupath
    yararule = yara.compile(filepaths=filepath)
    return yararule


# Scan every file in a directory and print the rules that match
def scan(rule, path):
    for file in os.listdir(path):
        mapath = os.path.join(path, file)
        if not os.path.isfile(mapath):
            continue  # skip subdirectories
        print(mapath)
        with open(mapath, 'rb') as fp:
            matches = rule.match(data=fp.read())
        if len(matches) > 0:
            print(file, matches)


if __name__ == '__main__':
    rulepath = "/home/authenticate/yara/rule_yara/"  # directory of YARA rules
    malpath = "/home/authenticate/yara/test_simple/"  # directory of samples
    yararule = getRules(rulepath)  # compile the rules
    scan(yararule, malpath)  # run the scan

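4. Summary

Writing a rule mainly comes down to defining strings and defining a condition, and neither is difficult. Writing rules that are accurate, generalize well, and produce few false positives is still hard, though; it takes practice and imagination.
References:
Official GitHub repository: https://github.com/VirusTotal/yara/releases
Official documentation: https://yara.readthedocs.io
Demo of using YARA in Python: https://blog.csdn.net/weixin_40596016/article/details/79865670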


Reposted from: https://blog.csdn.net/abel_big_xu/article/details/125381650
Copyright belongs to the original author, 努力学习的大康.
