Joern
A source-code analysis tool
Features: parses C/C++/Java source code and provides intermediate graph representations of it, including:
- Abstract Syntax Trees (AST)
- Control Flow Graphs (CFG)
- Control Dependence Graphs (CDG)
- Data Dependence Graphs (DDG)
- Program Dependence Graphs (PDG)
- Code Property Graphs (CPG14)
- Entire graph, i.e. convert to a different graph format (ALL)
Environment setup and installation
System
Java:
- openjdk version "11.0.15" 2022-04-19
- OpenJDK Runtime Environment (build 11.0.15+10-Ubuntu-0ubuntu0.22.04.1)
- OpenJDK 64-Bit Server VM (build 11.0.15+10-Ubuntu-0ubuntu0.22.04.1, mixed mode, sharing)
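You can check whether the installed JDK matches the version listed above with a small Python script that parses the `java -version` banner. This is a sketch that assumes the banner contains `version "<number>..."`, as in the output shown. The helper names `parse_java_major` and `installed_java_major` are hypothetical.

```python
import re
import subprocess


def parse_java_major(version_text):
    """Extract the major Java version from `java -version` output (hypothetical helper)."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_text)
    if match is None:
        raise ValueError("no version string found")
    major = int(match.group(1))
    # Java 8 and older report "1.x", so the real major version is the second number
    if major == 1 and match.group(2) is not None:
        major = int(match.group(2))
    return major


def installed_java_major():
    """Run `java -version`; OpenJDK prints its banner to stderr."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr or result.stdout)
```

For the banner above, `parse_java_major` returns `11`.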
Packages:
Installation
```shell
mkdir joern && cd joern
curl -L "https://github.com/joernio/joern/releases/latest/download/joern-install.sh" -o joern-install.sh
chmod u+x joern-install.sh
./joern-install.sh --interactive
```
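The batch script further down calls `joern-parse` and `joern-export` from a `JOERNPATH` directory. The sketch below checks that those launchers exist and are executable. It assumes `joern_dir` is the directory holding the launchers; the exact location depends on your answers to the interactive installer. The helper `missing_joern_tools` is hypothetical.

```python
import os

# Launchers this post relies on (assumed to sit side by side in one directory)
JOERN_TOOLS = ("joern", "joern-parse", "joern-export")


def missing_joern_tools(joern_dir):
    """Return the launchers that are absent or not executable in joern_dir."""
    missing = []
    for tool in JOERN_TOOLS:
        path = os.path.join(joern_dir, tool)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(tool)
    return missing
```

An empty list means all three launchers were found.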
Importing source code & creating a project
```
joern> importCode.$Language.fromString("$Code")
res0: Cpg = Cpg (Graph [$number nodes])
```
Alternatively, import a file or directory from disk under a project name:
```
joern> importCode(inputPath="$path", projectName="$name")
```
Parse C source code and output the corresponding CPG in .dot format
```
joern> cpg.method("$name").dotCpg14.l
```
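The REPL steps above can also run non-interactively. The sketch below writes them into a Joern script and runs it with `joern --script`. It makes three assumptions:

- `joern` is on `PATH`.
- `importCode` and `cpg` are available inside scripts as they are in the REPL.
- `json.dumps` produces valid double-quoted string literals, which holds for plain ASCII paths and names.

The helpers `build_joern_script` and `run_joern_script` and the file name `export_dot.sc` are hypothetical.

```python
import json
import subprocess


def build_joern_script(input_path, project_name, method_name):
    """Return Joern script text that imports code and prints a method's CPG14 as DOT."""
    # json.dumps yields a quoted, escaped literal; assumed valid Scala for plain ASCII text
    return "\n".join([
        f"importCode(inputPath = {json.dumps(input_path)}, projectName = {json.dumps(project_name)})",
        f"cpg.method.name({json.dumps(method_name)}).dotCpg14.l.foreach(println)",
        "",
    ])


def run_joern_script(script_text, script_path="export_dot.sc", joern="joern"):
    """Write the script to disk, run it with `joern --script`, and return its stdout."""
    with open(script_path, "w", encoding="utf-8") as fh:
        fh.write(script_text)
    result = subprocess.run([joern, "--script", script_path],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

`cpg.method.name(...)` treats its argument as a regular expression, so a name like `main` also matches only methods whose full name is `main`.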
Visualizing the .dot output and exporting it as SVG
Use the VS Code extension Graphviz Interactive Preview.
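Without VS Code, the Graphviz command-line tool does the same conversion: `dot -Tsvg input.dot -o output.svg`. Here is a minimal Python sketch that assumes Graphviz is installed so `dot` is on `PATH`. The names `dot_to_svg_command` and `convert_dir_to_svg` are hypothetical.

```python
import os
import subprocess


def dot_to_svg_command(dot_path, svg_path):
    """Build the Graphviz command that renders one .dot file as SVG."""
    return ["dot", "-Tsvg", dot_path, "-o", svg_path]


def convert_dir_to_svg(dot_dir):
    """Render every .dot file in dot_dir to an .svg next to it; return the SVG paths."""
    svg_paths = []
    for name in sorted(os.listdir(dot_dir)):
        if name.endswith(".dot"):
            dot_path = os.path.join(dot_dir, name)
            svg_path = os.path.splitext(dot_path)[0] + ".svg"
            subprocess.run(dot_to_svg_command(dot_path, svg_path), check=True)
            svg_paths.append(svg_path)
    return svg_paths
```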
Batch-processing source code with a Python script (CPG & PDG)
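```python
import os
import subprocess

JOERNPATH = "$JOERNPATH"  # placeholder: directory containing joern-parse / joern-export, ending in "/"
root_dir = './data'
source_dir = "$src"


def parse_source_code_to_dot(file_path, f, out_dir_pdg='/parsed/dot/pdg/', out_dir_cpg='/parsed/dot/cpg/'):
    """Parse one source file with joern-parse, then export its CPG14 and PDG as .dot files."""
    out_dir_pdg = root_dir + out_dir_pdg
    out_dir_cpg = root_dir + out_dir_cpg
    # exist_ok replaces the bare try/except, which skipped the CPG dir whenever the PDG dir already existed
    os.makedirs(out_dir_pdg, exist_ok=True)
    os.makedirs(out_dir_cpg, exist_ok=True)

    # Invoke the launchers directly so their own shebang picks the interpreter;
    # joern-parse writes cpg.bin to the current working directory
    subprocess.call(JOERNPATH + "joern-parse " + file_path, shell=True)

    name = f.split('.')[0]
    # joern-export reads cpg.bin from the working directory by default
    shell_export_cpg = JOERNPATH + "joern-export --repr cpg14 --out " + out_dir_cpg + name + os.sep
    subprocess.call(shell_export_cpg, shell=True)
    shell_export_pdg = JOERNPATH + "joern-export --repr pdg --out " + out_dir_pdg + name + os.sep
    subprocess.call(shell_export_pdg, shell=True)
```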
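The same pipeline can run without `shell=True` by passing argument lists to `subprocess.run`, which avoids quoting problems with unusual file names. For each C/C++ file in a source directory, the sketch runs `joern-parse` and then one `joern-export` per representation. The defaults are CPG14 and PDG, and any lowercase name from the list at the top works as `--repr`. Like the script above, it assumes `joern-parse` writes `cpg.bin` to the working directory and `joern-export` reads it from there. `build_commands` and `run_batch` are hypothetical names, and `dry_run=True` only collects the commands without running them.

```python
import os
import subprocess
from pathlib import Path


def build_commands(joern_dir, source_file, out_root, reprs=("cpg14", "pdg")):
    """Return the joern-parse / joern-export argument lists for one source file."""
    stem = Path(source_file).stem
    commands = [[os.path.join(joern_dir, "joern-parse"), source_file]]
    for rep in reprs:
        # One output directory per representation and per source file
        out_dir = os.path.join(out_root, rep, stem)
        commands.append([os.path.join(joern_dir, "joern-export"),
                         "--repr", rep, "--out", out_dir])
    return commands


def run_batch(joern_dir, source_dir, out_root, dry_run=True):
    """Process every .c/.cpp file in source_dir; return the commands in order."""
    issued = []
    for name in sorted(os.listdir(source_dir)):
        if not name.endswith((".c", ".cpp")):
            continue
        for cmd in build_commands(joern_dir, os.path.join(source_dir, name), out_root):
            issued.append(cmd)
            if not dry_run:
                subprocess.run(cmd, check=True)
    return issued
```

Use `dry_run=True` to inspect the command list first, then pass `dry_run=False` to execute it. Files are processed one at a time because every `joern-parse` call overwrites the same `cpg.bin`.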
Decompressing a gzipped JSON file and writing it out as formatted JSON:
```python
import json
import gzip

path = 'data/poj104/test.gzip'
with gzip.open(path, 'r') as fin:
    json_bytes = fin.read()

json_str = json_bytes.decode('utf-8')
objs = json.loads(json_str)

with open('json.json', 'w', encoding='utf-8') as file:
    file.write(json.dumps(objs, indent=2, ensure_ascii=False))
```