首页
社区
课程
招聘
[原创]Angr基础笔记
发表于: 2019-5-10 14:28 11030

[原创]Angr基础笔记

2019-5-10 14:28
11030

最近学习angr,整理了一下相关资料内容

官方文档
3caK9s2c8@1M7s2y4Q4x3@1q4Q4x3V1k6Q4x3V1k6Y4K9i4c8Z5N6h3u0Q4x3X3g2U0L8$3#2Q4x3V1k6S2L8X3N6J5i4K6u0r3j5h3&6Y4M7W2)9J5k6r3c8G2j5H3`.`.
e86K9s2c8@1M7s2y4Q4x3@1q4Q4x3V1k6Q4x3V1k6S2L8X3N6J5i4K6u0W2K9h3!0Q4x3V1k6S2M7r3W2Q4x3X3c8V1L8$3y4Q4x3V1j5`.
ad8K9s2c8@1M7s2y4Q4x3@1q4Q4x3V1k6Q4x3V1k6V1L8$3y4K6i4K6u0W2j5h3&6Y4M7W2)9J5k6h3W2G2i4K6u0r3

PyVEX

Claripy

在内存位置上放符号变量
方便我们追踪并求解内存位置上的值

注意这里的s.se.add(bvs > 1000),构造了一个符号限制表达式。类似的逻辑限制表达式如下表:

获取内存数据
s.se是state的一个solver engine

给输入的值(stdin)加上限制

这里通过st.se.add(k != 10)限制了输入不为'\n'

实际用 angr 跑,会发现
● 跑了几个小时都还没有找到目标路径
● 跑着跑着就崩了
● 自动分析似乎很美好,但却隐藏着许多问题

情境
– 对符号执行来说, libc 里复杂无比,一旦进入 libc function 分析可能就挂在里面了
– Crypto function
– 看不懂的 syscall

这个时候可以采用Veritesting的技术

AST 是语法分析树的简写。 se 是符号执行的简写。
为什么要用Claripy?

如何修改寄存器的值?
state.regs.eax=state.solver.BVV(0x4,32)
state.regs.eax=4
以上两个在效果上都是一样的。

如何修改内存?如何获取内存

此处mem数组的索引就是内存地址
读取内存的两种方式

注意:如果使用resolved,获取到的是BV类型的值,如果是concrete,那么获取到的数值是采用python内置的数据类型的。

Symbolic Execution
– Angr: 446K9s2c8@1M7q4)9K6b7g2)9J5c8W2)9J5c8X3q4F1k6%4u0Q4x3X3g2A6L8#2)9J5c8R3`.`.
– KLEE: f12K9s2c8@1M7s2y4Q4x3@1q4Q4x3V1k6Q4x3V1k6C8L8r3g2W2i4K6u0W2k6$3W2@1K9s2g2T1i4K6u0W2K9h3!0Q4x3V1j5`.
– Triton: 9efK9s2c8@1M7q4)9K6b7g2)9J5c8W2)9J5c8Y4c8J5K9i4c8G2L8W2)9J5k6i4q4#2j5i4u0C8M7$3I4S2j5W2)9J5k6h3y4G2L8g2)9J5c8R3`.`.

 
>>> print b.loader.find_symbol_got_entry('__libc_start_main')
>>> print b.loader.main_bin.imports
{'__gmon_start__': <cle.elf.ELFRelocation at 0x7f9928941650>,
'__libc_start_main': <cle.elf.ELFRelocation at 0x7f9928941dd0>,
'__stack_chk_fail': <cle.elf.ELFRelocation at 0x7f9928941590>,
'fgets': <cle.elf.ELFRelocation at 0x7f9928941550>,
'getenv': <cle.elf.ELFRelocation at 0x7f9928406810>,
'printf': <cle.elf.ELFRelocation at 0x7f99284062d0>,
'ptrace': <cle.elf.ELFRelocation at 0x7f99286cca10>,
'puts': <cle.elf.ELFRelocation at 0x7f99284068d0>}
import angr
p = angr.Project("test")
ex = p.surveyors.Explorer(find=(0x400844, ), avoid=(0x400855,))
ex.run()
print ex.found[0].state.posix.dumps(0)
import angr
p = angr.Project("test")
initial_state = p.factory.entry_state()
pg = p.factory.path_group(initial_state)
pg.explore(find=(0x4005d1,))
print pg
# <PathGroup with 18 deadended, 4 active, 1 found>
print pg.found[0]
# <Path with 64 runs (at 0x4005d1)>
print pg.found[0].state.posix.dumps(0)
# input_string
>>> import angr
>>> b = angr.Project('/bin/true')
>>> s = b.factory.blank_state(addr=0x08048591)
>>> s = b.factory.entry_state()
# The first 5 bytes of the binary
>>> print s.memory.load(b.loader.min_addr(), 5)
import angr
import claripy
p = angr.Project("test")
args = claripy.BVS('args', 8*16)
initial_state = prog.factory.entry_state(args=["./vul", args])
pg = p.factory.path_group(initial_state)
pg.explore(find=(0x4005d1,))
print pg
# <PathGroup with 18 deadended, 4 active, 1 found>
print pg.found[0]
# <Path with 64 runs (at 0x4005d1)>
print pg.found[0].state.posix.dumps(0)
# input_string
# Create a 32-bit symbolic bitvector "x"
>>> claripy.BVS('x', 32)
# Create a 32-bit bitvectory with the value 0x12345678
>>> claripy.BVV(0x12345678, 32)
<BV32 BVV(0x12345678, 32)>
# 64-bit bitvectors with concrete values 1 and 100
>>> one = state.solver.BVV(1, 64)
>>> one
<BV64 0x1>
# create a 27-bit bitvector with concrete value 9
>>> weird_nine = state.solver.BVV(9, 27)
>>> weird_nine
<BV27 0x9>
>>> weird_nine.zero_extend(64 - 27)
<BV64 0x9>
>>> one + weird_nine.zero_extend(64 - 27)
<BV64 0xa>
import angr
p = angr.Project('./vul')
s = p.factory.blank_state(addr=0x80485c8)
bvs = s.se.BVS('to_memory', 8*4)
s.se.add(bvs > 1000)
s.memory.store(0x08049b80, bvs, endness='Iend_LE')
pg = p.factory.path_group(s, immutable=False)
# get the integer
>>> print s.se.any_int(s.regs.rax)
# get the string
>>> print s.se.any_str(s.memory.load(0x1000, 10, endness='Iend_LE'))
# storing data
>>> s.regs.rax = aaaa
>>> s.memory.store(0x1000, aaaa, endness='Iend_LE')
>>> s.memory.store(s.regs.rax, aaaa, endness='Iend_LE')
p = angr.Project('./vul')
st = p.factory.full_init_state(args=['./vul'])
# Constrain the first 28 bytes to be non-null and non-newline
for _ in xrange(28):
 k = st.posix.files[0].read_from(1)
 st.se.add(k != 0)
 st.se.add(k != 10)
# Constrain the last byte to be a newline
k = st.posix.files[0].read_from(1)
st.se.add(k == 10)
# Reset the symbolic stdin's properties and set its length
st.posix.files[0].seek(0)
st.posix.files[0].length = 29
p = angr.Project('./vul',
load_options={'auto_load_libs': True},
use_sim_procedures=True,
exclude_sim_procedures_func='strcmp')
class my_strcmp(simuvex.SimProcedure):
def run(self):
...
return ...
p.hook_symbol('strcmp', my_strcmp)
'''
$ objdump -M intel -d ./vul | grep -A2 85d7
80485d7: e8 9f 00 00 00 call 804867b
80485dc: 89 44 24 10 mov DWORD PTR [esp+0x10],eax
80485e0: 83 7c 24 10 ff cmp DWORD PTR [esp+0x10],0xffffffff
'''
def check1(state):
state.regs.eax = 20
p.hook(0x080485d7, check1, length=5)
initial_state = project.factory.entry_state(
args=[project.filename, arg1],
add_options={'BYPASS_UNSUPPORTED_SYSCALL'})
pg = p.factory.path_group(initial_state, immutable=False)
pg.use_technique(angr.exploration_techniques.DFS())
# pg.explore(find=(0x08041234, ))
pg.run(step_func=my_find_func)
int counter = 0, values = 0;
for(i=0; i<100; i++){
if(input[i] == 'B'){
counter++;
values += 2;
}
}
if(counter == 75)
bug();
pg = p.factory.path_group(initial_state, immutable=False, veritesting=True)
initial_state = project.factory.entry_state(args=[project.filename, arg1])
initial_state.options.discard('LAZY_SOLVES')
state.mem[0x100000].uint8_t=state.solver.BVV(0x4,8)
state.mem[0x100000].uint16_t=0
state.mem[0x100000].uint32_t=state.solver.BVV(0x3,32)
>> state.mem[0x100000].uint32_t.resolved
<BV32 0x3>
state.mem[0x100000].uint32_t.concrete
0x3
  • CLE(CLE Loading Everything)。angr工具第一步就是将二进制文件加载到cle.loader类里.loader类加载所有的对象并导出一个进程内存的抽象。生成该程序已加载和准备运行的地址空间。具体的源码分析可以参考这里
    >>> print b.loader.find_symbol_got_entry('__libc_start_main')
    >>> print b.loader.main_bin.imports
    {'__gmon_start__': <cle.elf.ELFRelocation at 0x7f9928941650>,
    '__libc_start_main': <cle.elf.ELFRelocation at 0x7f9928941dd0>,
    '__stack_chk_fail': <cle.elf.ELFRelocation at 0x7f9928941590>,
    'fgets': <cle.elf.ELFRelocation at 0x7f9928941550>,
    'getenv': <cle.elf.ELFRelocation at 0x7f9928406810>,
    'printf': <cle.elf.ELFRelocation at 0x7f99284062d0>,
    'ptrace': <cle.elf.ELFRelocation at 0x7f99286cca10>,
    'puts': <cle.elf.ELFRelocation at 0x7f99284068d0>}
    
  • PyVEX

    • 将指令转换成中间语言 (IR) 、分析 IR 并且模拟
    • i.e., 不只知道他是什么,还知道他做了什么
    • state, symbolic memory, SimProcedure
  • Claripy

    • 设置符号变量以及 solver 、收集限制式(约束条件)
    • 是一个前端界面,而后端可以是各种 solver ,比如 z3
  • angr&surveyors
    • 一整个集成符号执行
    • path, path_group, factory, ...

      脚本示例

      对于这样的题目


      我们有多种方式写
  • Surveyors
    通过构建Surveyors的方法
    import angr
    p = angr.Project("test")
    ex = p.surveyors.Explorer(find=(0x400844, ), avoid=(0x400855,))
    ex.run()
    print ex.found[0].state.posix.dumps(0)
    
  • path_group
    import angr
    p = angr.Project("test")
    initial_state = p.factory.entry_state()
    pg = p.factory.path_group(initial_state)
    pg.explore(find=(0x4005d1,))
    print pg
    # <PathGroup with 18 deadended, 4 active, 1 found>
    print pg.found[0]
    # <Path with 64 runs (at 0x4005d1)>
    print pg.found[0].state.posix.dumps(0)
    # input_string
    
  • SimState
    • entry_state: a SimState initialized to the program state at the binary's entry point
    • blank_state: a SimState object with little initialization
      >>> import angr
      >>> b = angr.Project('/bin/true')
      >>> s = b.factory.blank_state(addr=0x08048591)
      >>> s = b.factory.entry_state()
      # The first 5 bytes of the binary
      >>> print s.memory.load(b.loader.min_addr(), 5)
      

      如何设置args

      import angr
      import claripy
      p = angr.Project("test")
      args = claripy.BVS('args', 8*16)
      initial_state = prog.factory.entry_state(args=["./vul", args])
      pg = p.factory.path_group(initial_state)
      pg.explore(find=(0x4005d1,))
      print pg
      # <PathGroup with 18 deadended, 4 active, 1 found>
      print pg.found[0]
      # <Path with 64 runs (at 0x4005d1)>
      print pg.found[0].state.posix.dumps(0)
      # input_string
      

      Claripy后端

      # Create a 32-bit symbolic bitvector "x"
      >>> claripy.BVS('x', 32)
      # Create a 32-bit bitvectory with the value 0x12345678
      >>> claripy.BVV(0x12345678, 32)
      <BV32 BVV(0x12345678, 32)>
      
  • 建立一个32bit的符号值容器 "x":
    claripy.BVS('x',32)
  • 建立一个32bit的具体值(0xc001b3475)容器:
    claripy.BVV(0xc001b3a75,32)
  • 建立一个32bit的步进值,从1000到2000能被10整除的数:
    claripy.SI(name='x',bits=32,lower_bound=1000,upper_bound=2000,stride=10)
  • 如果两个bitvector的长度不同是不能共同运算的,需要将其扩展:
    # 64-bit bitvectors with concrete values 1 and 100
    >>> one = state.solver.BVV(1, 64)
    >>> one
    <BV64 0x1>
    # create a 27-bit bitvector with concrete value 9
    >>> weird_nine = state.solver.BVV(9, 27)
    >>> weird_nine
    <BV27 0x9>
    >>> weird_nine.zero_extend(64 - 27)
    <BV64 0x9>
    >>> one + weird_nine.zero_extend(64 - 27)
    <BV64 0xa>
    
  • 将指令转换成中间语言 (IR) 、分析 IR 并且模拟
  • i.e., 不只知道他是什么,还知道他做了什么
  • state, symbolic memory, SimProcedure
  • 设置符号变量以及 solver 、收集限制式(约束条件)
  • 是一个前端界面,而后端可以是各种 solver ,比如 z3
  • 一整个集成符号执行
  • path, path_group, factory, ...

    脚本示例

    对于这样的题目


    我们有多种方式写
  • entry_state: a SimState initialized to the program state at the binary's entry point
  • blank_state: a SimState object with little initialization
    >>> import angr
    >>> b = angr.Project('/bin/true')
    >>> s = b.factory.blank_state(addr=0x08048591)
    >>> s = b.factory.entry_state()
    # The first 5 bytes of the binary
    >>> print s.memory.load(b.loader.min_addr(), 5)
    

    如何设置args

    import angr
    import claripy
    p = angr.Project("test")
    args = claripy.BVS('args', 8*16)
    initial_state = prog.factory.entry_state(args=["./vul", args])
    pg = p.factory.path_group(initial_state)
    pg.explore(find=(0x4005d1,))
    print pg
    # <PathGroup with 18 deadended, 4 active, 1 found>
    print pg.found[0]
    # <Path with 64 runs (at 0x4005d1)>
    print pg.found[0].state.posix.dumps(0)
    # input_string
    

    Claripy后端

    # Create a 32-bit symbolic bitvector "x"
    >>> claripy.BVS('x', 32)
    # Create a 32-bit bitvectory with the value 0x12345678
    >>> claripy.BVV(0x12345678, 32)
    <BV32 BVV(0x12345678, 32)>
    
  • Environment
    – shared library
  • Exploration Strategy
    – BFS
    – DFS
  • Explosion
    – path explosion
    – path pruning
  • SimProcedure
    p = angr.Project('./vul',
    load_options={'auto_load_libs': True},
    use_sim_procedures=True,
    exclude_sim_procedures_func='strcmp')
    
  • Hook symbol
    class my_strcmp(simuvex.SimProcedure):
    def run(self):
    ...
    return ...
    p.hook_symbol('strcmp', my_strcmp)
    
  • Go into library
  • Hook
    '''
    $ objdump -M intel -d ./vul | grep -A2 85d7
    80485d7: e8 9f 00 00 00 call 804867b
    80485dc: 89 44 24 10 mov DWORD PTR [esp+0x10],eax
    80485e0: 83 7c 24 10 ff cmp DWORD PTR [esp+0x10],0xffffffff
    '''
    def check1(state):
    state.regs.eax = 20
    p.hook(0x080485d7, check1, length=5)
    
  • Unknown syscall
    initial_state = project.factory.entry_state(
    args=[project.filename, arg1],
    add_options={'BYPASS_UNSUPPORTED_SYSCALL'})
    

    Exploration Strategy

  • Exploration techniques
    pg = p.factory.path_group(initial_state, immutable=False)
    pg.use_technique(angr.exploration_techniques.DFS())
    # pg.explore(find=(0x08041234, ))
    pg.run(step_func=my_find_func)
    
  • 例如有这么一个情景
    int counter = 0, values = 0;
    for(i=0; i<100; i++){
    if(input[i] == 'B'){
    counter++;
    values += 2;
    }
    }
    if(counter == 75)
    bug();
    
    会发现在二叉决策的时候,会发生路径爆炸的现象。
  • 情景2
    Unsatisfiable path 代表这条路不可能发生,即无法发生任何一組
    input 使得 binary 可以照这条路执行
  • LAZY_SOLVES
    自己给他取了个名字剪枝
    – 懒得检查,意思是当路径探索完的时候才进行检查
    – 预设是开启的
    initial_state = project.factory.entry_state(args=[project.filename, arg1])
    initial_state.options.discard('LAZY_SOLVES')
    

  • Dynamic path pruning
    • 根据已经检查的路径,推测现在 unsatisfiable path 的比例
    • 依照 unsatisfiable path 的比例调整之后路径要不要进行检查的概率

[培训]内核驱动高级班,冲击BAT一流互联网大厂工作,每周日13:00-18:00直播授课

收藏
免费 4
支持
分享
最新回复 (5)
雪    币: 280
活跃值: (66)
能力值: ( LV2,RANK:10 )
在线值:
发帖
回帖
粉丝
2
感谢分享。
2019-7-25 17:19
0
雪    币: 104
能力值: ( LV1,RANK:0 )
在线值:
发帖
回帖
粉丝
3
谢谢分享
2019-10-17 22:59
0
雪    币: 8611
活跃值: (5261)
能力值: ( LV4,RANK:45 )
在线值:
发帖
回帖
粉丝
4
mark
2019-10-18 13:39
0
雪    币: 977
活跃值: (435)
能力值: ( LV7,RANK:100 )
在线值:
发帖
回帖
粉丝
5
mark
2020-6-1 15:06
0
雪    币: 18
活跃值: (1076)
能力值: ( LV2,RANK:10 )
在线值:
发帖
回帖
粉丝
6
mark 
2020-10-12 10:31
0
游客
登录 | 注册 方可回帖
返回