Shellcon 2019 Rolling Your Own - 2

Rolling Your Own
How to Write Custom, Lightweight Static Analysis Tools
Clint Gibler Daniel DeFreez

@clintgibler @defreez
https://tldrsec.com
By the end of this talk you will...
● Understand at a high level how static analysis tools work
● See several practical, hands-on examples using open source tools
● Have an intuition for which use cases are easy and which are hard
● Know when commercial SAST tools may be useful for your company
About Us
● Daniel DeFreez
○ PhD student at UC Davis (program analysis and automated reasoning group)
○ LLVM and Linux Enthusiast
● Clint Gibler
○ Technical Director and Research Director at NCC Group
○ PhD from UC Davis
○ Loves building tools to find bugs
NCC Group - Books
NCC Group - Alumni (roles shown: CEO, Co-founder, C(I)SO, Head of Red Team, Head of AppSec / Head of InfraSec, Sec manager, Security Engineer)
Agenda
● Background
○ Static analysis vs dynamic analysis
○ Motivating example - finding NodeJS command injection
○ Fundamentals - parsing, abstract syntax trees (ASTs)
● Hands-on Examples
○ Interactively exploring Rails code bases
○ Finding command injection in NodeJS apps
● Challenges and Limitations
○ When static analysis is hard
● Going Forward, Other Resources
○ Other use cases of AST matching
○ Should I buy a SAST tool?
○ Open source tools
Background
Static vs. Dynamic Analysis Overview
● Static analysis - reason about code based on looking at it
● Dynamic analysis - run code and observe how it behaves
● ShellCon 2019 - Automated Bug Finding in Practice -> Slides | Video
Grep / Linters
Motivating Example: Find command injection
● NodeJS child_process.exec()
“Spawns a shell then executes the command within that shell... The command
string passed to the exec function is processed directly by the shell and special
characters need to be dealt with accordingly.”
Aww yeah
Attempt 1:
Attempt 2: -g = only return files that match pattern
● Not exec()
● Hard-coded strings aren’t injectable :(
Attempt 3:
The Fundamental Issue
String (regex) != Source Code (?)
exec(
  cmd // multi-line calls are OK
)
other_exec(cmd) // another function
// exec(arg) in a comment
console.log("exec(foo) in a string")
// arg to exec is a combination of
// variable and hard-coded string
exec(arg + " constant")
exec("constant" + arg)
Search language vs Problem domain mismatch
● Clever regexes can get closer, but we’re fundamentally limited by the expressivity of our query language
● Source code can be written in many ways - regex rabbit hole
Find all method calls where
- Method name == “exec”
- 1st arg is not a hard-coded string
Solution: Parse the Code
● Goal: search code in a way that’s aware of syntax and the language’s
structure
● Parsing allows us to go from operating on text strings (concrete syntax) to
operating on language constructs (abstract syntax)
○ E.g. Conditionals, loops, variable assignments, function calls…
○ Why? Know when something is a variable vs method call, text is in a comment or string literal, ...
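A minimal sketch of what syntax-aware search buys you: the query "calls named exec whose first argument is not a hard-coded string" becomes a short walk over a parsed tree. This uses Python's built-in ast module on a Python rendering of the earlier snippet as a stand-in, because the Python standard library has no JavaScript parser. The function name find_non_constant_exec_calls is made up for illustration.

```python
import ast

# Python rendering of the tricky cases; it is only parsed, never executed.
SOURCE = '''
exec(
    cmd  # multi-line calls are OK
)
other_exec(cmd)  # another function
# exec(arg) in a comment
print("exec(foo) in a string")
exec(arg + " constant")
exec("constant" + arg)
exec("hard-coded")
'''


def find_non_constant_exec_calls(source):
    """Return line numbers of exec() calls whose first argument is not a string literal."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Method name == "exec" (a bare name, so other_exec does not match)
        if not (isinstance(func, ast.Name) and func.id == "exec"):
            continue
        if not node.args:
            continue
        first = node.args[0]
        # Skip hard-coded strings: they are not injectable
        if isinstance(first, ast.Constant) and isinstance(first.value, str):
            continue
        hits.append(node.lineno)
    return sorted(hits)


print(find_non_constant_exec_calls(SOURCE))
```

Comments and string literals never show up as call nodes, so the cases that defeat a regex need no special handling here.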
Parsing: How?
● Write your own parser
○ Don’t do this! Languages are complex and ever evolving
● A grammar and generic parser, e.g. ANTLR
● A tool designed specifically for a language, e.g. JavaParser, pycparser
● An IDE library, e.g. Eclipse JDT
● A compiler, e.g. clang
● A parsing tool that handles multiple languages and lets you easily extract the parsed structure
○ https://github.com/github/semantic/ - used to power GitHub’s language intelligence
○ https://doc.bblf.sh/
○ ...
Why roll your own static analysis?
● Commercial static analysis tools are:
○ Money expensive
○ Computationally expensive
○ Opaque
● Exploratory, iterative searching (vs knowing exactly what to look for)
● Goal: keep it under X lines of Python where X is some small number
Why shouldn’t you?

● You have to update/maintain the tool
● No built-in set of rules
● Data-flow analysis is hard to write
Static Analysis Security Testing (SAST) Architecture
Source code -> [Parse] -> Abstract syntax tree (AST) -> [Control and data flow] -> Graph structure -> [Apply security rules] -> Security bugs
● Graph structure
○ Call graph
○ Static single assignment (SSA)
○ => Reason about control and data dependencies between statements and functions
(“This talk” label marks part of this pipeline on the slide.)
example.rb -> semantic -> JSON AST
Ruby source: name = “world”
AST from Semantic: (JSON output not reproduced)
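To give a feel for what a JSON AST looks like, here is a rough analogue that parses the same one-line assignment with Python's built-in ast module and converts the tree to nested dictionaries. This is not semantic's output format. The to_jsonable helper is hypothetical, and the "term" key is only borrowed from the semantic output shown later in this deck.

```python
import ast
import json


def to_jsonable(node):
    """Convert a Python AST into nested dicts and lists that json can serialize."""
    if isinstance(node, ast.AST):
        out = {"term": type(node).__name__}
        for field, value in ast.iter_fields(node):
            out[field] = to_jsonable(value)
        return out
    if isinstance(node, list):
        return [to_jsonable(item) for item in node]
    return node  # plain values such as str, int, None


tree = to_jsonable(ast.parse('name = "world"'))
# default=repr guards against the few constant types json cannot encode
print(json.dumps(tree, indent=2, default=repr))
```

Once the tree is plain JSON, any tool that can walk nested dictionaries can query it.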
Hands-on Examples
Case Study
Exploring a Rails Code Base
Here’s a massive Rails code base
Now go find some bugs
● Static analysis can be iterative and exploratory
○ We don’t know what we’re looking for
● Not just a black box that outputs results
#1 - What languages are being used? cloc
#2 - What dependencies are being used? Gemfile
● Data probably stored in Elasticsearch
● Makes network requests
○ SSRF
○ HTTP?
● Library used for authentication
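A small sketch of turning a Gemfile into review leads like the ones above. Assumptions: gems are declared one per line as gem "name", and the REVIEW_NOTES table is illustrative only. Clearance is the authentication library named in this deck; the other gem names are examples picked for this sketch, not taken from the code base on the slide.

```python
import re

# Matches lines such as:  gem "rails", "~> 5.2"   or   gem 'clearance'
GEM_LINE = re.compile(r"""^\s*gem\s+['"]([^'"]+)['"]""")

# Illustrative mapping from gem name to what it suggests looking at
REVIEW_NOTES = {
    "clearance": "authentication library: check how it is configured",
    "elasticsearch": "data probably stored in Elasticsearch",
    "faraday": "makes network requests: look for SSRF and plain HTTP",
}


def gems_in(gemfile_text):
    """Return gem names in declaration order, skipping commented-out lines."""
    names = []
    for line in gemfile_text.splitlines():
        match = GEM_LINE.match(line)
        if match:
            names.append(match.group(1))
    return names


def review_leads(gemfile_text):
    """Keep only the gems we have a review note for."""
    return {name: REVIEW_NOTES[name] for name in gems_in(gemfile_text) if name in REVIEW_NOTES}


GEMFILE = '''source "https://rubygems.org"
gem "rails", "~> 5.2"
gem 'clearance'
# gem "faraday"
gem "elasticsearch"
'''

for gem, note in review_leads(GEMFILE).items():
    print(f"{gem}: {note}")
```

A regex is adequate here because a Gemfile is close to a flat list; the same shortcut does not carry over to application code.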
Big Picture Methodology
@github/semantic -> Abstract Syntax Trees (ASTs) -> Custom scripts -> Program understanding
Background
● Controller - class that defines methods that respond to HTTP reqs (“route”)
● before_action - “call this method before running this route”
#3 - Routes: --rails-summarize-controllers
(Figure: controller summary output, annotated with each controller's methods, its before_action filters, and the helper methods used by other API controllers)
Rails Practice: Helpers and Common Filters
ApplicationController < ActionController::Base
● Define helper functions used by other controllers

● before_actions that apply to all subclasses
○ CSRF protections, authentication/authorization checks
Api::BaseController < ApplicationController
● API-specific helper functions

● before_actions that apply to other API controllers
○ …
Not subclassing the right controller -> security checks may not be applied
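The subclassing risk lends itself to a mechanical check once class names and superclasses have been pulled out of the ASTs. The sketch below assumes that extraction has already happened and works on hypothetical (class, superclass) pairs. Internal::PingController is an invented example, and this is not the implementation behind the talk's tooling.

```python
from collections import defaultdict

# Hypothetical (class name, superclass) pairs, as a script might extract them from ASTs
CONTROLLERS = [
    ("ApplicationController", "ActionController::Base"),
    ("Api::BaseController", "ApplicationController"),
    ("ProfilesController", "ApplicationController"),
    ("Api::V1::OwnersController", "Api::BaseController"),
    ("Internal::PingController", "ActionController::Base"),
]

# Base classes that carry the shared before_action security checks
EXPECTED_BASES = {"ApplicationController", "Api::BaseController"}


def group_by_superclass(controllers):
    """Map each superclass to the controllers that inherit from it directly."""
    groups = defaultdict(list)
    for name, superclass in controllers:
        groups[superclass].append(name)
    return dict(groups)


def suspicious_controllers(controllers, expected_bases):
    """Controllers that bypass the expected bases (the bases themselves are exempt)."""
    return sorted(
        name
        for name, superclass in controllers
        if superclass not in expected_bases and name not in expected_bases
    )


print(group_by_superclass(CONTROLLERS))
print(suspicious_controllers(CONTROLLERS, EXPECTED_BASES))
```

Only direct superclasses are checked; a controller that inherits through an intermediate class would need the hierarchy resolved first.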
#4 --rails-controllers-by-superclass
● How should Clearance be used?
● What config settings are important?
● Any gotchas?
All controllers inherit from ApplicationController or Api::BaseController 👍
#5 --rails-controllers-by-before-action
:redirect_to_signin
Dashboards || affected = show | ignored =
EmailConfirmations || affected = | ignored = update, new
Profiles || affected = edit, update, delete, destroy | ignored = show
Notifiers || affected = show, update | ignored =
MultifactorAuths || affected = new, create, update | ignored =
Api::V1::ApiKeys || affected = reset | ignored = show
:verify_password
Profiles || affected = update, destroy | ignored = edit, show, delete
:verify_with_otp
Api::V1::Owners || affected = create, destroy | ignored = show, gems,
verify_gem_ownership
Api::V1::Rubygems || affected = create | ignored = index, show,
reverse_dependencies
Api::V1::Deletions || affected = create | ignored =
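A sketch of how the affected / ignored split in this listing can be computed from a before_action declaration, following Rails' only: and except: options. The option values used below are assumptions chosen to reproduce two of the listed lines. They are not read from the real controllers, and the function names are hypothetical.

```python
def split_actions(actions, only=None, except_=None):
    """Split a controller's actions into (affected, ignored) for one before_action.

    only=[...]    -> the filter runs for the listed actions only
    except_=[...] -> the filter runs for every action except the listed ones
    neither       -> the filter runs for every action
    """
    if only is not None:
        affected = [a for a in actions if a in only]
    elif except_ is not None:
        affected = [a for a in actions if a not in except_]
    else:
        affected = list(actions)
    ignored = [a for a in actions if a not in affected]
    return affected, ignored


def summarize(controller, actions, only=None, except_=None):
    """Format one line in the same shape as the listing."""
    affected, ignored = split_actions(actions, only=only, except_=except_)
    return f"{controller} || affected = {', '.join(affected)} | ignored = {', '.join(ignored)}"


# Assumed declaration: before_action :redirect_to_signin, except: [:show]
print(summarize("Profiles", ["edit", "update", "delete", "destroy", "show"], except_=["show"]))
# Assumed declaration: before_action :redirect_to_signin, only: [:reset]
print(summarize("Api::V1::ApiKeys", ["reset", "show"], only=["reset"]))
```

The ignored column is the one to read closely, since it lists the routes a security filter does not cover.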
GitHub Case Study
JavaScript Command Injection
NodeJS shell command injection
● Typical example of unsafe API (should use execFile() or spawn())

● If the first parameter is tainted, then attacker has command injection
ExpressJS - popular NodeJS web framework
Big Picture Methodology
Filter (.js files that contain child_process, exec, req) -> Interesting files -> @github/semantic -> Abstract Syntax Trees (ASTs) -> Command injection
Finding Command Injection in a Haystack
(% is of previous working set)
● Source files in GitHub dataset: 2.2 billion
● JavaScript files: 284 million (13%)
● Containing child_process & exec: 322,000 (0.1%)
● Also containing req: 12,236 (3.8%)
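The narrowing steps are a cheap text pre-filter that runs before any parsing. A minimal sketch, assuming the file contents are already in memory. Matching on word boundaries is a choice made for this sketch so that "require" does not count as "req"; the original filter may well have been a plain substring search.

```python
import re

REQUIRED_TOKENS = ("child_process", "exec", "req")


def is_interesting(js_source, tokens=REQUIRED_TOKENS):
    """True if every token appears as a whole word somewhere in the file."""
    return all(re.search(r"\b" + re.escape(token) + r"\b", js_source) for token in tokens)


def filter_interesting(files):
    """files maps path -> source text; returns the paths worth parsing."""
    return sorted(path for path, text in files.items() if is_interesting(text))


FILES = {
    "a.js": 'const { exec } = require("child_process");\n'
            'app.get("/", (req, res) => exec("ls " + req.query.dir));',
    "b.js": 'const re = /a+/; re.exec("aaa");',
    "c.js": 'const cp = require("child_process"); cp.exec("uptime");',
}

print(filter_interesting(FILES))
```

Parsing is the expensive step, so discarding files this way first keeps the total run time manageable.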
Extract Function Calls Named exec() - jq
cat ast.json
| jq '..
| .callFunction?
| select(.name == "exec")'

"callFunction": {
"sourceRange": [ 750, 754 ],
"name": "exec",
"sourceSpan": {
"start": [29, 11],
"end": [29, 15 ]
},
"term": "Identifier"
}
jq Limitations
● Works best for a pipeline that gets narrower and narrower
● Cumbersome to match part of the tree then go back up
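One way around the "go back up" limitation is to walk the JSON in a general-purpose language and carry the ancestors along. The sketch below finds the same callFunction nodes named exec but returns the enclosing call node and its ancestor chain. Only callFunction, name, term, and sourceSpan appear in the output shown in this deck; the "Call", "Statements", "statements", and "callParams" keys in the sample tree are assumptions made for illustration.

```python
def walk(node, ancestors=()):
    """Yield (node, ancestors) for every value in a nested JSON structure."""
    yield node, ancestors
    if isinstance(node, dict):
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return
    for child in children:
        yield from walk(child, ancestors + (node,))


def exec_call_sites(tree):
    """Return (call_node, ancestors) for each node whose callFunction is named exec."""
    sites = []
    for node, ancestors in walk(tree):
        if not isinstance(node, dict):
            continue
        callee = node.get("callFunction")
        if isinstance(callee, dict) and callee.get("name") == "exec":
            sites.append((node, ancestors))
    return sites


# Hand-written sample tree; every key except those noted above is hypothetical
TREE = {
    "term": "Statements",
    "statements": [
        {
            "term": "Call",
            "callFunction": {
                "term": "Identifier",
                "name": "exec",
                "sourceSpan": {"start": [29, 11], "end": [29, 15]},
            },
            "callParams": [{"term": "Identifier", "name": "cmd"}],
        },
        {
            "term": "Call",
            "callFunction": {"term": "Identifier", "name": "log"},
            "callParams": [],
        },
    ],
}

for call, ancestors in exec_call_sites(TREE):
    print(call["callFunction"]["sourceSpan"], "nesting depth:", len(ancestors))
```

Because the matching node is the call itself rather than its callFunction child, the arguments and any enclosing statement are available without a second query.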
Quick Terminology - Taint
● Taint source - where untrusted, attacker-controlled data comes in
○ URL parameters, cookies, data from a third-party service, ...
● Taint sink - function call that’s dangerous if attacker controlled data reaches
○ Unparameterized SQL query, shell exec()
“Two-step” taint analysis
● A quick way to reduce false positives is to limit path length
● “One step” would be directly using taint source at sink
We’re going to:
1. Assign taint to variable (uses req)

2. Use variable as argument to child_process.exec()
Step 1 Step 2
Taint Source
● Extremely rough approximation: any variable named “req”
○ Clearly this could be improved
● ...few results
AST Subtree Approximation
● Node is tainted if subtree contains a taint source.
● Handles expressions
○ In particular, string concatenation
● Assume the entire right side is tainted.
Our analysis in a nutshell
Three rules:
1. An AST node is tainted if any descendant (child, grandchild, etc.) is a taint source.
2. An AST node is tainted if it is the result of assignment from (1)
3. An AST node is a bug if it is a taint sink called with an argument from (2)
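The three rules fit in a few dozen lines. The sketch below applies them to Python's built-in ast as a stand-in for a JavaScript AST, so it shows the shape of the analysis rather than the talk's actual implementation. The taint source is any name spelled req, the sink is any call to something named exec, and the pass is flow-insensitive: the order of statements is ignored.

```python
import ast

SINKS = {"exec"}
SOURCE_NAME = "req"


def contains_source(node):
    """Rule 1: a node is tainted if it is, or contains, the taint source."""
    return any(isinstance(n, ast.Name) and n.id == SOURCE_NAME for n in ast.walk(node))


def tainted_variables(tree):
    """Rule 2: a variable is tainted if it is assigned from a tainted expression."""
    tainted = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and contains_source(node.value):
            tainted.update(t.id for t in node.targets if isinstance(t, ast.Name))
    return tainted


def sink_name(call):
    """Name of the called function for both exec(...) and obj.exec(...)."""
    func = call.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return None


def find_bugs(source):
    """Rule 3: report sinks whose first argument is a tainted variable."""
    tree = ast.parse(source)
    tainted = tainted_variables(tree)
    bugs = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and sink_name(node) in SINKS and node.args:
            first = node.args[0]
            if isinstance(first, ast.Name) and first.id in tainted:
                bugs.append((node.lineno, first.id))
    return sorted(bugs)


EXAMPLE = "\n".join([
    'cmd = "ls " + req.query.dir',                # rules 1 and 2: cmd becomes tainted
    "child_process.exec(cmd)",                    # rule 3: reported
    'safe = "uptime"',
    "child_process.exec(safe)",                   # not tainted
    'child_process.exec("ls " + req.query.dir)',  # one-step use: outside these rules
])

print(find_bugs(EXAMPLE))
```

The last line of the example is a real injection that the two-step rules miss, which is the kind of false negative accepted in exchange for simplicity.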
Some real command injection examples
Sanitizer is insufficient: ; , && ...

Not Perfect
● It’s a simple technique so there are problems
● False positive rate is low
○ Could be lower by removing all conditional exec()s
○ Tradeoff between false negatives / false positives
var command = req.params.command;
switch (command) {
  case "sudo poweroff":
    // Flagged, but command can only equal this constant here
    exec(command, function (error, stdout, stderr) {});
    break;
}
2 hours of parsing
37 command injection bugs
Challenges and Limitations
Things that are hard for static analysis
● Hard truth: Reasoning about any non-trivial program property is undecidable
○ Hard proof: Rice’s theorem
● Dynamically typed languages
● eval(), new Function(), reflection, metaprogramming, …
● Calls to libraries where you don’t have the source code
● To be performant/tractable, all tools make trade-offs
○ How precisely do you model the code vs memory and CPU costs
○ E.g. Tracking taint for: individual keys in a map, objects to a max depth foo.attr.bar.baz
The “Find Everything” vs “Find Only Real Bugs” Knob
Only report guaranteed bugs (false negatives) <- Your Goals -> Try to catch all bugs (false positives)
Only report guaranteed bugs:
● You don’t have time / people to triage many FPs.
● Some risk is OK.
● Most companies...
Try to catch all bugs:
● Bugs making it to prod is massively costly or unacceptable
Types Matter - JS Exec, not so simple
● Just knowing exec() is a function call isn’t enough
● Here, we’re running a regex, not executing a command in a shell
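Without real type information, a cheap approximation is to check what the receiver of .exec() was bound to. The sketch below again uses Python's ast as a stand-in, and the snippet it analyzes imports a module named child_process purely to mirror the JavaScript case; that module and the shell_exec_calls helper are hypothetical. Only calls whose receiver is an alias of that import are reported.

```python
import ast

# Parsed only, never executed; child_process is a stand-in module name.
SOURCE = """\
import child_process as cp
import re
pattern = re.compile("a+")
cp.exec(cmd)
pattern.exec(text)
"""


def shell_exec_calls(source, module="child_process"):
    """Return line numbers of X.exec(...) calls where X is an alias of `module`."""
    tree = ast.parse(source)

    # Collect the local names the module was imported under
    aliases = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == module:
                    aliases.add(alias.asname or alias.name)

    hits = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "exec"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id in aliases
        ):
            hits.append(node.lineno)
    return sorted(hits)


print(shell_exec_calls(SOURCE))
```

This resolves only direct import aliases. Reassignments, destructuring, and values passed between functions need data-flow or type inference.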
Going Forward, Other Resources
Example Code Patterns You Can Find with ASTs
● Find all methods that <return a value> or <take a parameter> of this type
● Find all classes that inherit <this interface> or subclass <this class>
● Find all unauthenticated routes
○ Find all methods that define routes
○ Filter out methods that check the user’s session / apply the right middleware
Killer applications:
● Find when our secure wrapper library isn’t used (ensure safe defaults)
● Find company-specific business logic bugs. E.g. “It’s a bug when:”
○ A method calls foo() before bar()
○ A method calls foo() and not bar()
○ The arguments to foo() are (not) a constant value (e.g. nonces in crypto APIs, creds, …)
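The foo() / bar() patterns above can be sketched as a per-function scan of call names. This version uses Python's ast on Python source and treats textual order as call order, which ignores branches and loops. The helper names are hypothetical, and foo and bar are the placeholders from the bullets.

```python
import ast


def ordered_call_names(func):
    """Names of plain function calls inside func, in source order."""
    calls = [
        (node.lineno, node.col_offset, node.func.id)
        for node in ast.walk(func)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    ]
    return [name for _, _, name in sorted(calls)]


def functions(source):
    """All function definitions in the source, including nested ones."""
    return [n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.FunctionDef)]


def calls_without(source, required, companion):
    """Functions that call `required` but never call `companion`."""
    flagged = []
    for func in functions(source):
        names = ordered_call_names(func)
        if required in names and companion not in names:
            flagged.append(func.name)
    return sorted(flagged)


def calls_before(source, first, second):
    """Functions whose first call to `first` precedes their first call to `second`."""
    flagged = []
    for func in functions(source):
        names = ordered_call_names(func)
        if first in names and second in names and names.index(first) < names.index(second):
            flagged.append(func.name)
    return sorted(flagged)


EXAMPLE = """\
def a():
    bar()
    foo()

def b():
    foo()
    bar()

def c():
    foo()
"""

print(calls_without(EXAMPLE, "foo", "bar"))
print(calls_before(EXAMPLE, "foo", "bar"))
```

Calls inside a nested function are also counted for its enclosing function, which is acceptable for a first pass but worth knowing when triaging results.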
Static Analysis Difficulty Cheat Sheet
(Figure: implementation difficulty vs. expressivity and analysis complexity; techniques from lowest to highest)
1. Regex
2. AST matching
3. Types, control flow
4. Intra-procedural data-flow
5. Inter-procedural data-flow
6. Abstract Interpretation / formal methods
Annotations on the figure:
● Use a tool for parsing
● Probably use an existing framework

Should I buy a SAST tool?
● It depends on your company: technologies, development processes, culture
● Generally run daily or weekly (takes hours to days)
Tl;dr - if >= 1 of these apply, potentially not the best immediate use of your time
and money.
● You use modern frameworks and don’t have significant legacy code
● You ship code rapidly (embraced Agile/DevOps)
● Your code is not written in C/C++
● You haven’t already invested significant effort in building secure by default libs
● You don’t have one or more AppSec engineers who can dedicate months to
onboarding/tuning the SAST tool
(Borrowed: AppSec EU 2018 - “How Leading Companies are Scaling Their Security”)
Some considerations:
● Do you have large, legacy code bases that haven’t been thoroughly vetted?
● Is much of your code in Java or other statically typed languages?
○ Dynamic languages are harder for static analysis (Python / Ruby / JavaScript)
● How mature is your security program?
● Does your org heavily use custom frameworks with non-standard control
flow?
● Are you willing to invest months of security engineer time in the rollout?
Calculate ROI
● How much security engineer time is required to comb through results vs. how
many bugs of what severity do we find?
● Initial upfront time investment? Low recurring time cost?
● Might find many good bugs on initial integration but fewer over the years
Security engineering / secure wrapper libraries may have higher ROI
● AppSec Cali 2019 - Lessons Learned from the DevSecOps Trenches
○ Panel of senior security leaders from: Netflix, Dropbox, Datadog, Snap, DocuSign
● DevSecCon Seattle 2019 - Ban Footguns: How to standardize how
developers use dangerous aspects of your framework
● BSidesSF 2019 - DevSecOps State of the Union
○ Several examples of security engineering solving classes of vulnerabilities
Open Source Static Analysis Tools
● C/C++ - Clang Static Analyzer, Phasar, Cppcheck
● C#/.NET - Puma Scan, Security Code Scan
● Golang - gosec, glasgo
● Java - SpotBugs, Frameworks: Soot, WALA
● JavaScript/TypeScript - NodeJsScan, eslint, tslint
● Python - bandit, dlint, pyre-check (data-flow analysis to find web app bugs)
● Ruby - Brakeman
Massive list: mre/awesome-static-analysis

Tl;dr sec Newsletter - https://bit.ly/tldrsec
● Talk summaries | Tools & Resources links | Original research
@clintgibler | @defreez
Conclusion
● Static analysis can find bugs (but not all bugs) at scale in your code
● Data-flow analysis is hard (and can be slow/imprecise)
○ SAST can provide value, but it depends on your env and goals. (Usually) Not an easy win.
● AST matching can find interesting code patterns (code-aware grep)
○ You can find issues unique to your code bases using open source tools
● Static analysis can be interactive, not just a black box that spits out results
● Slides: https://bit.ly/2019ShellCon_StaticAnalysis
● Code: https://github.com/nccgroup/lightweight_static_analysis
● Uplevel your security knowledge: https://tldrsec.com
