Professional Documents
Culture Documents
2022 Codaspy Achallade BuildingACommitLevelDataset
2022 Codaspy Achallade BuildingACommitLevelDataset
2022 Codaspy Achallade BuildingACommitLevelDataset
Real-World Vulnerabilities
Alexis Challande, Robin David, Guénaël Renault
Biography
whoami
Alexis Challande, Ph.D. student (3rd year)
Collaboration between Quarkslab and LiX (Ecole Polytechnique/Inria) in France
Alexis Challande 1 / 16
Problem
Situation
Let’s imagine I have a shiny new tool that detect if a patch has been applied to a binary,
Potential Solutions
� Gather a few couples of binaries from real vulnerabilities
� Create a synthetic project where I add hand-crafted vulnerabilities
� …
Alexis Challande 2 / 16
Problem
Situation
Let’s imagine I have a shiny new tool that detect if a patch has been applied to a binary,
Potential Solutions
� Gather a few couples of binaries from real vulnerabilities
� Create a synthetic project where I add hand-crafted vulnerabilities
� …
� Use the dataset presented today
Alexis Challande 2 / 16
Contributions
Commit Level Dataset
A dataset of more than 1,900 vulnerabilities:
� Based on real-world issues (CVEs);
� Precise at a Commit Level.
Precompiled Dataset
Binaries for a 600 of these vulnerabilities:
� 4 architectures (ARM, x86, x86-64, ARM64);
� With debug symbols.
Alexis Challande 4 / 16
Potential Applications
� Patch Characterization
Draw an identity card of Patch
� Patch Detection
Detect if a patch has been applied
� …
Alexis Challande 5 / 16
Android Open Source Project (AOSP)
What is AOSP?
6 architectures 90 GB / version
Alexis Challande 6 / 16
Android’s security
Update process
� Multiple stakeholders (Google, OEM, carriers…)
� Intricated, complex, and usually not followed in time 2 years of security-update
� Project Treble [6] to solve part of the problem starting to get enforced
Alexis Challande 7 / 16
Android Security Bulletins
Objective
Based on the Security Patch Level, find CVEs affecting a device.
Alexis Challande 9 / 16
Towards Binary Artefacts
Motivations
Not every vulnerability is found in Open Source.
There exists a need for binary only solutions.
Alexis Challande 10 / 16
AOSPBuilder: Overview
Process
1. Reuse Roy results for vulnerabilities
2. Prepare a build environment for AOSP
3. (Build-magic)
Extract
Roy
CVE Data
Fixed binaries
Build vulnerable version
Build diffing
Alexis Challande 11 / 16
Dataset Overview: At Source Level
Results
� Huge set of vulnerabilities (3400+ and more than 1900 with commits)
� Ever increasing but parser often needs to be updated
Limitations
� Only Open Source components
� Rely on Google’s commitment to publish bulletins
Alexis Challande 12 / 16
Dataset Overview: At Source Level
1000
800
3000
# of CVE
600
# CVEs
400 2000
200
1000
0
2015 2016 2017 2018 2019 2020 2021 2022*
CVEs with commit CVEs 0
2 4 6 8 10
Alexis Challande 12 / 16
Dataset Overview: At Binary Level
Results
� 600 vulnerabilities compiled
� 120 GB links to download on GitHub
� 4 architectures the same binaries in various architectures
Limitations
� Only vulnerabilities on C/C++
� Build automation is hard: lots of failures
� Only vulnerabilities after Android 6
Alexis Challande 13 / 16
Dataset Overview: At Binary Level
CVE-2017-0738
(1d919d737)
7e8e7da[...]_libbundlewrapper.so 6e6d6c2[...]_libbundlewrapper.so
fd9556f0[...]_libbundlewrapper.so ae4c026[...]_libbundlewrapper.so
Alexis Challande 13 / 16
Dataset Overview: At Binary Level
Name Count
Library Object libbluetooth.so 953
34.8% 35.8%
bluetooth.default.so 748
libnfc-nci.so 650
Executable
Archive
7.92%
libstagefright.so 421
13.2%
Other
8.23% net_test_btif 417
Most Common Binaries
Binary Files Types
Alexis Challande 13 / 16
Usage
aosp-dataset/
├── cves
│ ├── CVE-2012-6701.json
│ ├── CVE-2012-6702.json CVE Details
│ ├── ....
├── LICENSE Links towards
� Information is stored in JSON Files ├── precompiled precompiled
│ └── links.json
� A helper module in Python provided to ├── pyproject.toml
vulnerabilities
ease usage ├── README.md
├── schemas
│ ├── AospCve.schema.json Schemas
│ ├── precompiled.schema.json definitions
└── src
Python
Alexis Challande 14 / 16
Conclusion
Dataset
� +1,900 Commit-precise real-world vulnerabilities
� +600 Precompiled artefacts for four architectures
Useful for: Patch characterization, Silent fix detection, Cross architecture diffing …
� https://github.com/quarkslab/aosp_dataset
Alexis Challande 15 / 16
Thank you
Contact information:
� achallande@quarkslab.com
� +33 1 58 30 81 51
� https://www.quarkslab.com
References I
Frederick Boland and Paul Black.
The juliet 1.1 c/c++ and java test suite.
(45).
Publisher: Computer (IEEE Computer).
DARPA.
Cyber grand challenge.
Brendan Dolan-Gavitt, Patrick Hulin, Engin Kirda, Tim Leek, Andrea Mambretti, Wil
Robertson, Frederick Ulrich, and Ryan Whelan.
LAVA: Large-scale automated vulnerability addition.
In 2016 IEEE Symposium on Security and Privacy (SP), pages 110–121. IEEE.
Junaid Akram and Luo Ping.
How to build a vulnerability benchmark to overcome cyber security attacks.
14(1):60–71.
Google.
Android security bulletins.
MITRE Corporation.
MITRE.
Alexandre Dulaunoy and Pieter-Jan Moreels.
cve-search - a free software to collect, search and analyse common vulnerabilities and
exposures in software.
Serena Elisa Ponta, Henrik Plate, Antonino Sabetta, Michele Bezzi, and Cedric
Dangremont.
A manually-curated dataset of fixes to vulnerabilities of open-source software.
In 2019 IEEE/ACM 16th International Conference on Mining Software Repositories
(MSR), pages 383–387. IEEE.