Code property graph
In computer science, a code property graph (CPG) is a computer program representation that captures syntactic structure, control flow, and data dependencies in a property graph. The concept was originally introduced to identify security vulnerabilities in C and C++ system code,[1] but has since been employed to analyze web applications,[2] [3] [4] [5] cloud deployments,[6] and smart contracts.[7] Beyond vulnerability discovery, code property graphs find applications in code clone detection,[8] [9] attack-surface detection,[10] exploit generation,[11] measuring code testability,[12] and backporting of security patches.[13]
Definition
A code property graph of a program is a graph representation of the program obtained by merging its abstract syntax trees (AST), control-flow graphs (CFG) and program dependence graphs (PDG) at statement and predicate nodes. The resulting graph is a property graph, the graph model underlying graph databases such as Neo4j, JanusGraph and OrientDB, in which data is stored in nodes and edges as key-value pairs. As a result, code property graphs can be stored in graph databases and queried using graph query languages.
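A minimal C sketch of the property-graph model described above follows. It assumes a fixed-capacity, in-memory graph in which one array of nodes is shared by all three representations and every edge carries a label (`EDGE_AST`, `EDGE_CFG` or `EDGE_PDG`) plus optional key-value properties. All type names, function names, node types and property keys here are hypothetical and do not reproduce the schema of Joern or any other implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PROPS 4
#define MAX_NODES 16
#define MAX_EDGES 32

/* A key-value pair: the unit of data stored on nodes and edges. */
typedef struct {
    const char *key;
    const char *value;
} property;

/* Edge labels for the three merged representations (hypothetical names). */
typedef enum { EDGE_AST, EDGE_CFG, EDGE_PDG } edge_kind;

typedef struct {
    property props[MAX_PROPS];
    int nprops;
} node;

typedef struct {
    int src, dst;
    edge_kind kind;
    property props[MAX_PROPS];
    int nprops;
} edge;

/* One node set shared by AST, CFG and PDG, plus labeled edges. */
typedef struct {
    node nodes[MAX_NODES];
    int nnodes;
    edge edges[MAX_EDGES];
    int nedges;
} property_graph;

/* Adds a node carrying "type" and "code" properties; returns its index. */
static int add_node(property_graph *g, const char *type, const char *code) {
    assert(g->nnodes < MAX_NODES);
    node *n = &g->nodes[g->nnodes];
    n->props[0] = (property){"type", type};
    n->props[1] = (property){"code", code};
    n->nprops = 2;
    return g->nnodes++;
}

/* Adds a labeled edge with an optional key-value property (key may be NULL). */
static void add_edge(property_graph *g, int src, int dst, edge_kind kind,
                     const char *key, const char *value) {
    assert(g->nedges < MAX_EDGES);
    edge *e = &g->edges[g->nedges++];
    e->src = src;
    e->dst = dst;
    e->kind = kind;
    e->nprops = 0;
    if (key != NULL)
        e->props[e->nprops++] = (property){key, value};
}

/* Returns the value stored under `key` on node `id`, or NULL if absent. */
static const char *node_property(const property_graph *g, int id, const char *key) {
    const node *n = &g->nodes[id];
    for (int i = 0; i < n->nprops; i++)
        if (strcmp(n->props[i].key, key) == 0)
            return n->props[i].value;
    return NULL;
}

/* Returns the set of edge kinds touching node `id` as a bitmask of (1u << kind). */
static unsigned incident_kinds(const property_graph *g, int id) {
    unsigned mask = 0;
    for (int i = 0; i < g->nedges; i++)
        if (g->edges[i].src == id || g->edges[i].dst == id)
            mask |= 1u << g->edges[i].kind;
    return mask;
}
```

Because AST, CFG and PDG edges all point into the same node array, a single statement node (for example, a declaration of a variable `x`) can have an AST child, a CFG successor and a PDG dependent at the same time, and `incident_kinds` makes that overlap visible. A graph database would hold the same node, edge and property shape without the fixed-size arrays used in this sketch.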
Example
Consider the following function of a C program:
void foo
The code property graph of the function is obtained by merging its abstract syntax tree, control-flow graph, and program dependence graph at statements and predicates as seen in the following figure:
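As a rough illustration of how such a merged graph can be queried, the sketch below hand-encodes the statement and predicate nodes of an assumed body for `foo`, in which the return value of `source()` flows through `x` and `y` into `sink()` under the condition `x < MAX`. The body, the names `source`, `sink` and `MAX`, and the helpers `reaches_sink` and `guarding_predicate` are assumptions for illustration only. The hand-written traversal stands in for a graph-query-language query: it follows data-dependence edges from the `source()` statement to see whether a `sink()` call is reachable, and uses a control-dependence edge to find the predicate guarding that call. AST edges below the statement level are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Assumed body of foo (illustrative only):
 *   void foo() {
 *       int x = source();
 *       if (x < MAX) {
 *           int y = 2 * x;
 *           sink(y);
 *       }
 *   }
 */

typedef enum { CFG, DATA_DEP, CONTROL_DEP } edge_kind;

typedef struct {
    int src, dst;
    edge_kind kind;
    const char *var; /* variable carried by a data-dependence edge, else NULL */
} edge;

/* Statement and predicate nodes of foo; the array index is the node id. */
static const char *const nodes[] = {
    "ENTRY",            /* 0 */
    "int x = source()", /* 1 */
    "x < MAX",          /* 2 */
    "int y = 2 * x",    /* 3 */
    "sink(y)",          /* 4 */
    "EXIT",             /* 5 */
};
enum { NUM_NODES = sizeof nodes / sizeof nodes[0] };

static const edge edges[] = {
    /* control flow */
    {0, 1, CFG, NULL}, {1, 2, CFG, NULL}, {2, 3, CFG, NULL},
    {2, 5, CFG, NULL}, {3, 4, CFG, NULL}, {4, 5, CFG, NULL},
    /* data dependences: definition -> use */
    {1, 2, DATA_DEP, "x"}, {1, 3, DATA_DEP, "x"}, {3, 4, DATA_DEP, "y"},
    /* control dependences: the predicate guards the two inner statements */
    {2, 3, CONTROL_DEP, NULL}, {2, 4, CONTROL_DEP, NULL},
};
enum { NUM_EDGES = sizeof edges / sizeof edges[0] };

/* Returns true if a node whose code contains `sink_name` is reachable from
 * node `from` by following data-dependence edges only (depth-first search). */
static bool reaches_sink(int from, const char *sink_name) {
    assert(from >= 0 && from < NUM_NODES);
    bool visited[NUM_NODES] = {false};
    int stack[NUM_NODES];
    int top = 0;
    stack[top++] = from;
    visited[from] = true;
    while (top > 0) {
        int n = stack[--top];
        if (strstr(nodes[n], sink_name) != NULL)
            return true;
        for (int i = 0; i < NUM_EDGES; i++) {
            const edge *e = &edges[i];
            if (e->kind == DATA_DEP && e->src == n && !visited[e->dst]) {
                visited[e->dst] = true; /* mark on push: each node pushed once */
                stack[top++] = e->dst;
            }
        }
    }
    return false;
}

/* Returns the predicate node that controls node `n`, or -1 if none is recorded. */
static int guarding_predicate(int n) {
    for (int i = 0; i < NUM_EDGES; i++)
        if (edges[i].kind == CONTROL_DEP && edges[i].dst == n)
            return edges[i].src;
    return -1;
}
```

With this encoding, `reaches_sink(1, "sink(")` returns true because the data-dependence edges labeled `x` and `y` lead from `int x = source()` to `sink(y)`, and `guarding_predicate(4)` returns node 2, `x < MAX`. Answering both questions from one structure is the practical benefit of merging the representations; in practice the equivalent check would typically be written in a graph query language rather than hand-coded C.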
Implementations
Joern CPG. The original code property graph was implemented for C/C++ in 2013 at the University of Göttingen as part of the open-source code analysis tool Joern.[14] This original version has been discontinued and superseded by the open-source Joern Project,[15] which provides a formal code property graph specification[16] applicable to multiple programming languages. The project provides code property graph generators for C/C++, Java, Java bytecode, Kotlin, Python, JavaScript, TypeScript, LLVM bitcode, and x86 binaries (via the Ghidra disassembler).
Plume CPG. Developed at Stellenbosch University in 2020 and sponsored by Amazon Science, the open-source Plume[17] project provides a code property graph for Java bytecode compatible with the code property graph specification provided by the Joern project. The two projects merged in 2021.
Fraunhofer AISEC CPG. Fraunhofer AISEC provides open-source code property graph generators for C/C++, Java, Golang, and Python,[18] albeit without a formal schema specification. It also provides the Cloud Property Graph,[19] an extension of the code property graph concept that models details of cloud deployments.
Galois’ CPG for LLVM. Galois Inc. provides a code property graph based on the LLVM compiler.[20] The graph represents code at different stages of compilation, together with the mappings between these representations. It follows a custom schema defined in its documentation.
Machine learning on code property graphs
Code property graphs provide the basis for several machine-learning-based approaches to vulnerability discovery. In particular, graph neural networks (GNN) have been employed to derive vulnerability detectors.[21] [22] [23] [24] [25] [26] [27]
Notes and References
- Yamaguchi, Fabian; Golde, Nico; Arp, Daniel; Rieck, Konrad (May 2014). "Modeling and Discovering Vulnerabilities with Code Property Graphs". 2014 IEEE Symposium on Security and Privacy. pp. 590–604. doi:10.1109/SP.2014.44. ISBN 978-1-4799-4686-0. S2CID 2231082.
- Backes, Michael; Rieck, Konrad; Skoruppa, Malte; Stock, Ben; Yamaguchi, Fabian (April 2017). "Efficient and Flexible Discovery of PHP Application Vulnerabilities". 2017 IEEE European Symposium on Security and Privacy (EuroS&P). pp. 334–349. doi:10.1109/EuroSP.2017.14. ISBN 978-1-5090-5762-7. S2CID 206649536.
- Li, Song; Kang, Mingqing; Hou, Jianwei; Cao, Yinzhi (2022). "Mining Node.js Vulnerabilities via Object Dependence Graph and Query". pp. 143–160. ISBN 9781939133311.
- Brito, Tiago; Lopes, Pedro; Santos, Nuno; Santos, José Fragoso (1 July 2022). "Wasmati: An efficient static vulnerability scanner for WebAssembly". Computers & Security. 118: 102745. doi:10.1016/j.cose.2022.102745. arXiv:2204.12575. S2CID 248405811.
- Khodayari, Soheil; Pellegrino, Giancarlo (2021). "JAW: Studying Client-side CSRF with Hybrid Property Graphs and Declarative Traversals". pp. 2525–2542. ISBN 9781939133243.
- Banse, Christian; Kunz, Immanuel; Schneider, Angelika; Weiss, Konrad (September 2021). "Cloud Property Graph: Connecting Cloud Security Assessments with Static Code Analysis". 2021 IEEE 14th International Conference on Cloud Computing (CLOUD). pp. 13–19. doi:10.1109/CLOUD53861.2021.00014. arXiv:2206.06938. ISBN 978-1-6654-0060-2. S2CID 243946828.
- Giesen, Jens-Rene; Andreina, Sebastien; Rodler, Michael; Karame, Ghassan; Davi, Lucas. "Practical Mitigation of Smart Contract Bugs". TeraFlow. www.teraflow-h2020.eu.
- Wi, Seongil; Woo, Sijae; Whang, Joyce Jiyoung; Son, Sooel (25 April 2022). "HiddenCPG: Large-Scale Vulnerable Clone Detection Using Subgraph Isomorphism of Code Property Graphs". Proceedings of the ACM Web Conference 2022. pp. 755–766. doi:10.1145/3485447.3512235. ISBN 9781450390965. S2CID 248367462.
- Bowman, Benjamin; Huang, H. Howie (September 2020). "VGRAPH: A Robust Vulnerable Code Clone Detection System Using Code Property Triplets". 2020 IEEE European Symposium on Security and Privacy (EuroS&P). pp. 53–69. doi:10.1109/EuroSP48549.2020.00012. ISBN 978-1-7281-5087-1. S2CID 226268429.
- Du, Xiaoning; Chen, Bihuan; Li, Yuekang; Guo, Jianmin; Zhou, Yaqin; Liu, Yang; Jiang, Yu (May 2019). "LEOPARD: Identifying Vulnerable Code for Vulnerability Assessment Through Program Metrics". 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). pp. 60–71. doi:10.1109/ICSE.2019.00024. arXiv:1901.11479. ISBN 978-1-7281-0869-8. S2CID 59523689.
- Alhuzali, Abeer; Gjomemo, Rigel; Eshete, Birhanu; Venkatakrishnan, V. N. (2018). "NAVEX: Precise and Scalable Exploit Generation for Dynamic Web Applications". pp. 377–392. ISBN 9781939133045.
- Al Kassar, Feras; Clerici, Giulia; Compagna, Luca; Balzarotti, Davide; Yamaguchi, Fabian. "Testability Tarpits: the Impact of Code Patterns on the Security Testing of Web Applications". NDSS Symposium.
- Shi, Youkun; Zhang, Yuan; Luo, Tianhan; Mao, Xiangyu; Cao, Yinzhi; Wang, Ziwen; Zhao, Yudi; Huang, Zongan; Yang, Min (2022). "Backporting Security Patches of Web Applications: A Prototype Design and Implementation on Injection Vulnerability Patches". pp. 1993–2010. ISBN 9781939133311.
- "Joern - A Robust Code Analysis Platform for C/C++". www.mlsec.org.
- "Joern - The Bug Hunter's Workbench". Joern - The Bug Hunter's Workbench.
- "Code Property Graph Specification". cpg-spec.github.io.
- "Plume". plume-oss.github.io.
- "Code Property Graph". Fraunhofer AISEC. 31 August 2022.
- Banse, Christian; Kunz, Immanuel; Schneider, Angelika; Weiss, Konrad (September 2021). "Cloud Property Graph: Connecting Cloud Security Assessments with Static Code Analysis". 2021 IEEE 14th International Conference on Cloud Computing (CLOUD). pp. 13–19. doi:10.1109/CLOUD53861.2021.00014. arXiv:2206.06938. ISBN 978-1-6654-0060-2. S2CID 243946828.
- "The Code Property Graph — MATE 0.1.0.0 documentation". galoisinc.github.io.
- Zhou, Yaqin; Liu, Shangqing; Siow, Jingkai; Du, Xiaoning; Liu, Yang (8 December 2019). "Devign: effective vulnerability identification by learning comprehensive program semantics via graph neural networks". Proceedings of the 33rd International Conference on Neural Information Processing Systems. Curran Associates Inc. pp. 10197–10207. arXiv:1909.03496.
- Haojie, Zhang; Yujun, Li; Yiwei, Liu; Nanxin, Zhou (December 2021). "Vulmg: A Static Detection Solution for Source Code Vulnerabilities Based on Code Property Graph and Graph Attention Network". 2021 18th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP). pp. 250–255. doi:10.1109/ICCWAMTIP53232.2021.9674145. ISBN 978-1-6654-1364-0. S2CID 246039350.
- Zheng, Weining; Jiang, Yuan; Su, Xiaohong (October 2021). "Vu1SPG: Vulnerability detection based on slice property graph representation learning". 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE). pp. 457–467. doi:10.1109/ISSRE52982.2021.00054. ISBN 978-1-6654-2587-2. S2CID 246751595.
- Chakraborty, Saikat; Krishna, Rahul; Ding, Yangruibo; Ray, Baishakhi (2021). "Deep Learning based Vulnerability Detection: Are We There Yet". IEEE Transactions on Software Engineering. 48 (9): 3280–3296. doi:10.1109/TSE.2021.3087402. arXiv:2009.07235. S2CID 221703797.
- Zhou, Li; Huang, Minhuan; Li, Yujun; Nie, Yuanping; Li, Jin; Liu, Yiwei (October 2021). "GraphEye: A Novel Solution for Detecting Vulnerable Functions Based on Graph Attention Network". 2021 IEEE Sixth International Conference on Data Science in Cyberspace (DSC). pp. 381–388. doi:10.1109/DSC53577.2021.00060. arXiv:2202.02501. ISBN 978-1-6654-1815-7. S2CID 246634824.
- Ganz, Tom; Härterich, Martin; Warnecke, Alexander; Rieck, Konrad (15 November 2021). "Explaining Graph Neural Networks for Vulnerability Discovery". Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security. pp. 145–156. doi:10.1145/3474369.3486866. ISBN 9781450386579. S2CID 240001850.
- Duan, Xu; Wu, Jingzheng; Ji, Shouling; Rui, Zhiqing; Luo, Tianyue; Yang, Mutian; Wu, Yanjun (August 2019). "VulSniper: Focus Your Attention to Shoot Fine-Grained Vulnerabilities". Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. pp. 4665–4671. doi:10.24963/ijcai.2019/648. ISBN 978-0-9992411-4-1. S2CID 199466292.