Joern

2023-07-30

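Technical notes

A source code analysis tool.

Features: parses C/C++/Java source code and produces intermediate graph representations of the code, including: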

Abstract Syntax Trees (AST)
Control Flow Graphs (CFG)
Control Dependence Graphs (CDG)
Data Dependence Graphs (DDG)
Program Dependence Graphs (PDG)
Code Property Graphs (CPG14)
Entire graph, i.e. convert to a different graph format (ALL)
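The abbreviations in parentheses correspond, in lowercase, to the values passed to the --repr option of joern-export, which is used later in these notes. A minimal sketch of that mapping, assuming the lowercase spellings are accepted as-is; the helper name export_args is made up for illustration:

```python
# Representations from the list above, keyed by the lowercase value
# assumed to be accepted by `joern-export --repr`.
REPRESENTATIONS = {
    "ast": "Abstract Syntax Tree",
    "cfg": "Control Flow Graph",
    "cdg": "Control Dependence Graph",
    "ddg": "Data Dependence Graph",
    "pdg": "Program Dependence Graph",
    "cpg14": "Code Property Graph",
    "all": "Entire graph",
}


def export_args(repr_name, out_dir):
    """Build the argument list for one joern-export call (hypothetical helper)."""
    if repr_name not in REPRESENTATIONS:
        raise ValueError(f"unknown representation: {repr_name}")
    return ["joern-export", "--repr", repr_name, "--out", out_dir]


print(export_args("pdg", "out/pdg"))
```

Rejecting unknown names up front avoids starting a Joern process only to have it fail on a typo.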

Environment Setup and Installation

System

WSL2 (Ubuntu 22.04)

Java:

openjdk version "11.0.15" 2022-04-19
OpenJDK Runtime Environment (build 11.0.15+10-Ubuntu-0ubuntu0.22.04.1)
OpenJDK 64-Bit Server VM (build 11.0.15+10-Ubuntu-0ubuntu0.22.04.1, mixed mode, sharing)

Packages:

unzip

Installation

mkdir joern && cd joern # optional
curl -L "https://github.com/joernio/joern/releases/latest/download/joern-install.sh" -o joern-install.sh
chmod u+x joern-install.sh
./joern-install.sh --interactive
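Before running the installer, it can help to confirm that the prerequisites listed above are available. A small sketch, assuming java, unzip, and curl are the only tools needed; the function name missing_tools is illustrative:

```python
import shutil


def missing_tools(tools=("java", "unzip", "curl")):
    """Return the entries of `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Install these before running joern-install.sh:", ", ".join(missing))
else:
    print("All prerequisites found.")
```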

Importing Source Code and Creating a Project

Method 1: fromString

joern> importCode.$Language.fromString("$Code")
res0: Cpg = Cpg (Graph [$number nodes])

e.g.

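Method 2: from a path

joern> importCode(inputPath="$path", projectName="$name")

Parsing C source code and outputting the corresponding CPG in .dot format

joern> cpg.method($name).dotCpg14.l

Visualizing the .dot file and exporting it as SVG

Use the VS Code extension Graphviz Interactive Preview.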
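When there are many .dot files, rendering them from the command line may be more convenient than opening each one in the editor. The sketch below is an alternative to the extension, not part of the original workflow: it assumes Graphviz is installed and its dot executable is on PATH, and the function names are illustrative.

```python
import subprocess
from pathlib import Path


def svg_command(dot_file):
    """Build the Graphviz command that renders `dot_file` to an SVG next to it."""
    dot_path = Path(dot_file)
    return ["dot", "-Tsvg", str(dot_path), "-o", str(dot_path.with_suffix(".svg"))]


def render_all(dot_dir):
    """Render every .dot file under `dot_dir` (requires Graphviz's `dot` on PATH)."""
    for dot_path in sorted(Path(dot_dir).rglob("*.dot")):
        subprocess.run(svg_command(dot_path), check=True)
```

Calling render_all on an export directory would place one .svg beside each .dot file.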

Batch-processing source code with a Python script (CPG and PDG)

import os
import subprocess

JOERNPATH = "$JOERNPATH"  # directory containing joern-parse and joern-export
root_dir = './data'
source_dir = "$src"


def parse_source_code_to_dot(file_path, f,
                             out_dir_pdg='/parsed/dot/pdg/',
                             out_dir_cpg='/parsed/dot/cpg/'):
    """Parse one source file with joern-parse, then export its PDG and CPG as .dot files."""
    name = os.path.splitext(f)[0]  # file name without its extension
    # joern-parse writes cpg.bin to the working directory by default.
    subprocess.call([os.path.join(JOERNPATH, "joern-parse"), file_path])
    for repr_name, out_dir in (("pdg", out_dir_pdg), ("cpg14", out_dir_cpg)):
        parent = root_dir + out_dir
        os.makedirs(parent, exist_ok=True)
        # joern-export reads cpg.bin by default; the per-file target
        # directory is expected not to exist before the export runs.
        subprocess.call([os.path.join(JOERNPATH, "joern-export"),
                         "--repr", repr_name,
                         "--out", os.path.join(parent, name)])
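The function above handles a single file, so batch processing still needs a loop over the source directory. A minimal sketch of such a driver, assuming sources are selected by file extension; iter_source_files and process_all are hypothetical helpers, and the handler is passed in so the sketch runs on its own:

```python
import os


def iter_source_files(source_dir, extensions=(".c", ".cpp", ".java")):
    """Yield (file_path, file_name) for every matching file under source_dir."""
    for dirpath, _dirnames, filenames in os.walk(source_dir):
        for file_name in sorted(filenames):
            if file_name.endswith(extensions):
                yield os.path.join(dirpath, file_name), file_name


def process_all(source_dir, handler):
    """Call handler(file_path, file_name) for each source file; return the count."""
    count = 0
    for file_path, file_name in iter_source_files(source_dir):
        handler(file_path, file_name)
        count += 1
    return count


# Usage with the function defined above as the handler:
# process_all(source_dir, parse_source_code_to_dot)
```

Because the output directory is derived from the file name alone, two files with the same name in different subdirectories would map to the same output path; renaming or prefixing may be needed in that case.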

Writing to JSON (decompress the gzip-compressed data and save it as formatted JSON):

import json
import gzip

path = 'data/poj104/test.gzip'
with gzip.open(path, 'r') as fin:
  json_bytes = fin.read()
json_str = json_bytes.decode('utf-8')
objs = json.loads(json_str)

with open('json.json','w',encoding='utf-8') as file:
  file.write(json.dumps(objs,indent=2,ensure_ascii=False))
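The same conversion can be wrapped in a reusable function. This is a sketch that assumes each archive holds a single UTF-8 JSON document; the name gzip_json_to_file is illustrative:

```python
import gzip
import json


def gzip_json_to_file(src_path, dst_path):
    """Decompress a gzip-compressed JSON file and save it pretty-printed."""
    # Text mode ("rt") lets json.load decode UTF-8 while reading from the archive.
    with gzip.open(src_path, "rt", encoding="utf-8") as fin:
        objs = json.load(fin)
    with open(dst_path, "w", encoding="utf-8") as fout:
        json.dump(objs, fout, indent=2, ensure_ascii=False)
    return objs


# Usage with the paths from the snippet above:
# gzip_json_to_file('data/poj104/test.gzip', 'json.json')
```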