[Repost] TypeRefHasher, updated 11 November 2020: 1.0.2
Posted: 2020-11-12 09:04
Introduction
A CLI tool for computing the TypeRefHash of .NET binaries.
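Installation
There are several options for obtaining TypeRefHasher.
Windows
Install the tool with winget. The tool is added to the PATH environment variable.
winget install GDATA.TypeRefHasher
Another option is to download just the binary or the installer from the GitHub releases tab.
trh.msi -> Windows x64 installer (allows uninstalling and adds the tool to your PATH)
trh.exe -> Windows x64 (standalone binary)
Linux
Download the binary from the GitHub releases tab.
trh -> Linux x64 (standalone binary)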
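Usage
Usage is as straightforward as it can be.
Windows:
> trh.exe file
Linux:
> trh file
In both cases, the output is either the TRH of the given file (for example: 1defec485ab3060a9201f35d69cfcdec4b70b84a2b71c83b53795ca30d1ae8be) or an error message explaining why the hash could not be computed.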
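For scripting, the tool's output can be checked before it is used. The following is a minimal Python sketch, not part of the tool. It assumes that `trh` (or `trh.exe`) is on the PATH and that a successful run prints nothing but the hash to standard output; the function names `parse_trh_output` and `run_trh` are hypothetical, and the tool's exit codes are not relied on because the text above does not specify them.

```python
import re
import subprocess

# A TRH is a SHA256 digest, i.e. exactly 64 hexadecimal characters.
TRH_PATTERN = re.compile(r"[0-9a-fA-F]{64}")


def parse_trh_output(text):
    """Return the hash if the output is a single SHA256 hex digest, else None."""
    candidate = text.strip()
    return candidate.lower() if TRH_PATTERN.fullmatch(candidate) else None


def run_trh(path, executable="trh"):
    """Run the CLI on one file (assumes the executable is on PATH)."""
    result = subprocess.run([executable, path], capture_output=True, text=True)
    trh = parse_trh_output(result.stdout)
    if trh is None:
        # Assumption: anything that is not a bare hash is an error message.
        raise ValueError((result.stdout + result.stderr).strip())
    return trh
```

On Windows, `run_trh("sample.exe", executable="trh.exe")` would be the equivalent call.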
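Introducing the TypeRefHash (TRH)
We introduce the TypeRefHash (TRH), an alternative to the ImpHash, which does not work with .NET binaries. Our evaluation shows that it can be used effectively to identify .NET malware families.
The ImpHash was introduced by FireEye in 2014 [1]. Since then, many malware analysts have used it, and it has been implemented in tools such as VirusTotal to identify similar malware samples by their imports. The theory is that programs that use the same imports also use similar source code.
.NET samples usually import only mscoree.dll, so all .NET binaries share just a few different ImpHashes. The ImpHash therefore cannot be used here. This motivated us to find an alternative, the TypeRefHash (TRH). To display the imported DLLs, functions and the TypeRef table, we use the online tool penet.io.
The only import of a .NET sample is "mscoree.dll".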
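To illustrate why the ImpHash collapses for .NET files, here is a simplified Python sketch of an ImpHash-style computation (MD5 over comma-joined, lowercased `dll.function` pairs). It is not the FireEye reference implementation: the extension handling is simplified, and the import lists are made-up examples. The function name `_CorExeMain` is the usual entry import of a .NET executable and is not taken from the text above.

```python
import hashlib


def imphash_like(imports):
    """MD5 over 'dll.function' pairs in import-table order (ImpHash-style)."""
    parts = []
    for dll, function in imports:
        # Lowercase the names and drop the file extension (simplified).
        module = dll.lower().rsplit(".", 1)[0]
        parts.append(f"{module}.{function.lower()}")
    return hashlib.md5(",".join(parts).encode("utf-8")).hexdigest()


# Two unrelated, hypothetical .NET executables: both import the same function.
sample_a = [("mscoree.dll", "_CorExeMain")]
sample_b = [("mscoree.dll", "_CorExeMain")]
# A hypothetical native program with a richer import table.
native = [("KERNEL32.dll", "CreateFileW"), ("KERNEL32.dll", "WriteFile")]

print(imphash_like(sample_a) == imphash_like(sample_b))  # True
print(imphash_like(sample_a) == imphash_like(native))    # False
```

Because the hash input is identical for both .NET samples, the resulting value carries no information about what the programs actually do.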
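.NET files store the imported namespaces together with their referenced types in so-called metadata tables. We can use these to construct an identifier like the ImpHash. Similar to the combination of DLL and function names in the import table, the TypeRef table contains a list of type names and their corresponding namespaces. For example, a .NET binary may import the type DebuggerBrowsableState from the namespace System.Diagnostics.
The TypeRef table of a .NET binary, with the imported namespaces and types.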
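To compute the TRH, we extract the TypeRef table and resolve the indices to their corresponding strings.
- Order the entries by type namespace, then by type name.
- Concatenate "TypeNamespace" and "TypeName" with a dash. If the namespace is empty, the concatenated string starts with a dash.
- Join all strings with commas and compute the SHA256 hash of the resulting UTF8 byte string.
We use SHA256 instead of the MD5 used for the ImpHash because we have already seen MD5 collisions in our dataset. We order the entries in the table to prevent attacks that create a different TypeRefHash for a sample by reordering the table. Balles and Sharfuddin showed a similar attack against the ImpHash [2]. We chose the dash and the comma as separators because they are not valid in namespace and type names in .NET.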
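The three steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the PeNet reference implementation: it starts from already resolved `(namespace, type_name)` pairs rather than parsing a binary, the function name `type_ref_hash` and the sample rows are hypothetical, and Python's default code-point string ordering is assumed, which may differ from the comparison the reference implementation uses.

```python
import hashlib


def type_ref_hash(type_refs):
    """Compute a TRH-style hash from (namespace, type_name) pairs."""
    # Step 1: order by namespace, then by type name.
    ordered = sorted(type_refs, key=lambda ref: (ref[0], ref[1]))
    # Step 2: "namespace-typename"; an empty namespace yields a leading dash.
    entries = [f"{namespace}-{name}" for namespace, name in ordered]
    # Step 3: comma-join, encode as UTF-8, hash with SHA256.
    canonical = ",".join(entries)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return canonical, digest


# Hypothetical TypeRef rows, deliberately out of order.
rows = [
    ("System.Reflection", "AssemblyTitleAttribute"),
    ("", "Nested"),
    ("System", "Object"),
]

canonical, digest = type_ref_hash(rows)
print(canonical)  # -Nested,System-Object,System.Reflection-AssemblyTitleAttribute

# Reordering the table does not change the hash, which is the point of sorting.
print(type_ref_hash(list(reversed(rows)))[1] == digest)  # True
```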
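Assume we have a .NET sample with the following simplified TypeRef table:
# | TypeName (resolved) | TypeNamespace (resolved) |
0 | CompilationRelaxationsAttribute | System.Runtime.CompilerServices |
1 | RuntimeCompatibilityAttribute | System.Runtime.CompilerServices |
2 | TargetFrameworkAttribute | System.Runtime.Versioning |
3 | DebuggingModes | |
4 | AssemblyFileVersionAttribute | System.Reflection |
This leads to the following ordered and concatenated strings. Note that TypeRefs with an empty namespace are sorted to the beginning of the list.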
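-DebuggingModesSystem |
System.Reflection-AssemblyFileVersionAttribute |
System.Runtime.CompilerServices-CompilationRelaxationsAttribute |
System.Runtime.CompilerServices-RuntimeCompatibilityAttribute |
System.Runtime.Versioning-TargetFrameworkAttribute |
These are joined into the following final string: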
-DebuggingModesSystem,System.Reflection-AssemblyFileVersionAttribute,System.Runtime.CompilerServices-CompilationRelaxationsAttribute,System.Runtime.CompilerServices-RuntimeCompatibilityAttribute,System.Runtime.Versioning-TargetFrameworkAttribute
The resulting TRH is the SHA256 hash of the string above:
63AE8074B4C2EF8E36FE3272BE23B445CEAB495E14877935C457E75CFB5E5A1E
You can find the TRH reference implementation in the PeNet library here.
How well can a TypeRefHash identify a certain malware family? To answer this, we evaluated .NET samples that we received between mid-May and mid-June 2020 and looked at the corresponding hashes for seven families. We chose these families because we were able to collect enough samples of each for a meaningful evaluation.
We looked at the following families:
Malware Family | # Samples |
AsyncRAT | 558 |
Blackshades | 5035 |
Bladabindi | 7793 |
DiscordTokenGrabber | 159 |
Nanocore | 1335 |
QuasarRAT | 517 |
RevengeRAT | 276 |
We inspected the distribution of different TypeRefHashes for those families. In the following figures, the blue sections depict the most common TRH for each family. TypeRefHashes shared by five or fewer samples are aggregated in the shaded areas to keep the charts readable.
Percentage of different TRH per malware family.
We can see that in most cases one TypeRefHash dominates a family. Blackshades in particular could be identified very successfully, with the two most common TRHs covering 97% of all analysed samples.
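The per-family distribution described above can be reproduced from a list of hashes with a few lines of Python. This is only a sketch: the function name `trh_distribution` is hypothetical, the hash strings are short made-up placeholders, and the counts are invented for illustration and are not the evaluation data.

```python
from collections import Counter


def trh_distribution(trhs, min_count=6):
    """Share of each TRH within one family; rare TRHs are pooled as 'other'."""
    counts = Counter(trhs)
    total = sum(counts.values())
    shares = {}
    for trh, count in counts.most_common():
        # TRHs shared by five or fewer samples are aggregated, as in the charts.
        key = trh if count >= min_count else "other"
        shares[key] = shares.get(key, 0.0) + count / total
    return shares


# Invented example: 100 samples of one family with four distinct hashes.
family_trhs = ["aaa"] * 90 + ["bbb"] * 7 + ["ccc"] * 2 + ["ddd"] * 1
for trh, share in trh_distribution(family_trhs).items():
    print(f"{trh}: {share:.0%}")
```

In this invented example the dominant hash covers 90% of the samples, and the two rare hashes end up in the pooled "other" bucket.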
We evaluated the distribution for different malware families. The most common TypeRefHash for each family can be seen in the following table:
Malware Family | Most common TRH |
AsyncRAT | 4807b5cd7256fad54967dfe3c394c27d16bad1ac95b0306911a3546025bd6ccf |
Blackshades | 306db7dcdf4dd7bbf2eaa054a8c050fb97cbe84c0da87528c6e508ac5e11607b |
Bladabindi | 695409c18e59ff8a2c04f5572f61d35157ea1ce34e6f3db4975dfbaeb5d7e07f |
DiscordTokenGrabber | 6f917770f111b5e0f6bd7b1ccd3adf491fbc756bf031fe107233d3b19d4737d |
Nanocore | 31feea84c77a972ebe0bfc87ac90630ad824e91965b664c47d0d2b0761b30d16 |
QuasarRAT | 03d72f6a261029edbd5028d814b27b075f5c3c62219dbfe8a349998909d07b9a |
RevengeRAT | faaf850b8f9ce7eeed4c9d18b2fbd70ef1c9dde8d920c6e333829f3150d9ca08 |
The distribution can be seen in the following figures.
Percentage of malware families per most common TRH.
We can see that for five families, the most common TRH matched only samples of the right family (100% of the cases). For the most common TypeRefHash of QuasarRAT, we also found one CardinalRAT sample. Only for RevengeRAT are the results less accurate, as we found 15 Bladabindi samples and one AsyncRAT sample. We also found two samples known to be clean. The TypeRefHash therefore cannot be used effectively for some malware families, such as RevengeRAT.
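The check behind these figures, which families share a given TRH, amounts to a grouped count. The Python sketch below shows one way to compute it. The function names `families_for_trh` and `precision`, the family labels and the hashes are all hypothetical placeholders, and the records are invented rather than taken from the evaluation set.

```python
from collections import Counter


def families_for_trh(samples, trh):
    """Count the family labels among samples that carry the given TRH."""
    return Counter(family for family, sample_trh in samples if sample_trh == trh)


def precision(samples, trh, family):
    """Fraction of samples with this TRH that belong to the expected family."""
    hits = families_for_trh(samples, trh)
    total = sum(hits.values())
    return hits[family] / total if total else 0.0


# Invented records of the form (family label, TRH).
records = (
    [("FamilyA", "h1")] * 8
    + [("FamilyB", "h1")] * 2
    + [("FamilyB", "h2")] * 5
)

print(families_for_trh(records, "h1"))      # Counter({'FamilyA': 8, 'FamilyB': 2})
print(precision(records, "h1", "FamilyA"))  # 0.8
print(precision(records, "h2", "FamilyB"))  # 1.0
```

A precision of 1.0 corresponds to the "100% of the cases" result, while a lower value indicates that the hash also appears in other families or in clean files.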
As the ImpHash cannot be used with .NET binaries, we developed a similar method called TypeRefHash (TRH). The TRH is a SHA256 hash over the imported .NET namespaces and types. This is similar to the ImpHash, which is an MD5 hash over the imported DLLs and their functions.
Our evaluation showed that the TRH can identify malware families with a precision similar to that of the ImpHash for non-.NET files. Depending on the family, a TRH can be unique to one malware family or can occur in several families.
You can find the reference implementation in the PeNet library here.
You can find a list with the samples used for the evaluation with their corresponding family name and TRH here.
A command line tool to compute the TRH on Windows and Linux can be found here.
[2]: Balles, C. and Sharfuddin, A., 2019. Breaking Imphash. (accessed: 17.06.2020)
Disclaimer: The PeNet library and penet.io are both projects from one of the authors of this blog entry (Stefan Hausotte).