Formal Verification of Programs with Arrays
Vu An Hoa
Assoc. Prof. Chin Wei Ngan
Agenda
• Introduction
• Review of first order logic
• Satisfiability modulo theory problem
• Implementation & illustrations
• Evaluation
• Concluding remarks
• Questions and Answers
Introduction
• Hip/Sleek: a formal program verification system
– Designed to reason about recursive data structures
(linked list, binary tree, etc.)
– Developed at NUS by Chin et al.
• Hip
– Verification front end: parses and verifies programs written in our own variant of C/C#/Java
• Sleek
– Separation logic entailment checker
– Also supports classical (pure) logic
Formal verification
• “… act of proving or disproving the correctness of
intended algorithms underlying a system with
respect to a certain formal specification or property,
using formal methods of mathematics.”
• Independent of the model of computation
– Does not consider language issues such as the limited range of type int, a + b ≠ b + a, …
• Immune to algorithmic bugs
– Unlike testing methods such as path coverage, boundary value analysis, …
– Desirable!
Formal verification
• Example
– Prove that the following function
    int sumfirst(int n) {
        if (n <= 0) return 0;
        else return n + sumfirst(n-1);
    }
always returns the sum 1 + 2 + … + n for input n ≥ 0.
(In C, type int allows for values up to 2³¹ − 1. What if n > 10⁹?)
– Proof in English, by induction on the input
• Base case n = 0 is trivial.
• Assume that n ≥ 0 and sumfirst(n) returns 1 + 2 + … + n. Then n + 1 ≥ 1 > 0, and hence sumfirst(n+1) returns (n+1) + sumfirst(n), which equals 1 + 2 + … + n + (n+1) by the induction hypothesis.
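The induction proves the mathematical claim, but the side note about int matters in C. A minimal sketch, assuming a 32-bit int, compares sumfirst with the closed form n(n+1)/2 on small inputs and locates the first n whose sum no longer fits. The helpers closed_form, first_overflowing_n and agrees_up_to are hypothetical names, and a finite check is no substitute for the proof.

```c
#include <assert.h>
#include <limits.h>

/* The function from the slide, unchanged apart from spacing. */
int sumfirst(int n) {
    if (n <= 0) return 0;
    else return n + sumfirst(n - 1);
}

/* Closed form 1 + 2 + ... + n = n(n+1)/2, computed in long long so the
   product itself cannot overflow for any int n >= 0. */
long long closed_form(int n) {
    return (long long)n * ((long long)n + 1) / 2;
}

/* Smallest n >= 0 whose mathematical sum n(n+1)/2 exceeds INT_MAX.
   From this n on, sumfirst(n) would overflow, which is undefined
   behaviour in C. */
int first_overflowing_n(void) {
    int n = 0;
    while (closed_form(n) <= INT_MAX) n++;
    return n;
}

/* Spot-check sumfirst against the closed form for 0 <= n <= limit.
   Keep limit small: overflow and deep recursion are outside what the
   informal proof covers. */
int agrees_up_to(int limit) {
    for (int n = 0; n <= limit; n++)
        if ((long long)sumfirst(n) != closed_form(n)) return 0;
    return 1;
}
```

With a 32-bit int, first_overflowing_n() returns 65536, so sumfirst(n) already overflows for n far below 10⁹.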
The project
• Hip/Sleek is effective at verifying properties of recursive data structures
– It had no support for arrays.
– Why is an array not a recursive data structure?
• Problem: extend Hip/Sleek to support arrays.
– Important
• Arrays are used intensively in many programs (why?)
• Array code is highly exposed to algorithmic bugs
– Challenging
Array vs. Recursive data structures
• Array vs. Linked list
– Array: elements are stored contiguously in memory, so the memory location of any array element can be computed directly => efficient!
– Linked list: must follow the node pointers to reach a farther node.
• Random access vs. recursive access
• Array = a map from integer to value
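The contrast between random and recursive access can be sketched in C. The names array_get, list_get and struct node are illustrative assumptions; list_get reports how many pointer hops it needed, which grows with the index, while array_get computes the element address directly.

```c
#include <assert.h>
#include <stddef.h>

/* A singly linked list node: a recursive data structure. */
struct node {
    int value;
    struct node *next;
};

/* Random access: the address of a[i] is computed directly as a + i,
   so the cost does not depend on i. */
int array_get(const int *a, size_t i) {
    return a[i];
}

/* Recursive access: follow i next-pointers from the head, so the cost
   grows linearly with i. Assumes i is less than the list length.
   The number of hops taken is stored in *hops if hops is non-NULL. */
int list_get(const struct node *head, size_t i, size_t *hops) {
    size_t steps = 0;
    while (i > 0) {
        head = head->next;
        i--;
        steps++;
    }
    if (hops) *hops = steps;
    return head->value;
}
```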
First order logic: Syntax
• Alphabet
– Logical symbols: ¬, ∧, ∨, →, ↔, ∀, ∃, =, (, ), variables (v1, v2, …, x, y, …)
– Parameters: symbols for functions (f1, f2, …), relations (R1, R2, …) and constants (c1, c2, …)
• Terms
– Any variable or constant symbol is a term. If t1, …, tk are terms and fi is a k-ary function symbol, then fi t1 t2 … tk is a term.
• Formulas
– Atomic: t1 = t2 or Ri t1 … tk, where the tj are terms and Ri is a k-ary relation symbol.
– If α and β are formulas, then (¬α), (α ∧ β), (α ∨ β), (α → β), (α ↔ β), ∀vi α and ∃vi α are formulas.
First order logic: Syntax
• Example
– Language of arithmetic L = {S, +, −, ×} ∪ {<} ∪ {0, 1}
• Formal deduction system
– Symbol manipulation mechanism to derive a formula
α (conclusion) from a set of formulas Γ (hypotheses).
– Hilbert’s system = Axioms + 1 rule (Modus Ponens)
– Natural deduction = 2 × 7 + 2 logical rules
• (universal | existential | …) (introduction | elimination)
– Notation: Γ ⊢ α means that α is derivable from Γ
– “Formal methods of mathematics” = formal deduction
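Hilbert's system has many axioms; the sketch below keeps only Modus Ponens and restricts the hypotheses to propositional atoms and atom-to-atom implications (a deliberate simplification, with hypothetical names). It computes everything derivable from Γ by applying the rule until nothing new appears, then answers whether Γ ⊢ α for an atom α in this tiny fragment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ATOMS 64

/* An implication p -> q between propositional atoms numbered 0..MAX_ATOMS-1. */
struct implication { int premise; int conclusion; };

/* Returns true if `goal` is derivable from the hypotheses using only
   Modus Ponens (from p and p -> q, infer q). The hypotheses are the atoms
   in `facts` together with the implications in `imps`. */
bool derivable_by_mp(const int *facts, size_t n_facts,
                     const struct implication *imps, size_t n_imps,
                     int goal) {
    bool known[MAX_ATOMS] = { false };
    for (size_t i = 0; i < n_facts; i++) known[facts[i]] = true;

    /* Apply Modus Ponens repeatedly until a fixpoint is reached. */
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < n_imps; i++) {
            if (known[imps[i].premise] && !known[imps[i].conclusion]) {
                known[imps[i].conclusion] = true;
                changed = true;
            }
        }
    }
    return known[goal];
}
```

For example, with atoms p = 0, q = 1, r = 2, s = 3 and Γ = {p, p → q, q → r}, the atom r is derivable and s is not.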
First order logic: Semantics
• Structure 𝒮 for a language L
– Universe of discourse: a set, denoted by |𝒮|
– Interpretation: functions, relations and constants on |𝒮| corresponding to the parameters of L
• Variable instantiation in a structure 𝒮
– A map ε from the variable symbols to |𝒮|
• Truth value
– In a fixed structure 𝒮 under a variable instantiation ε, a formula is either true or false.
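Evaluating terms and atomic formulas shows how truth depends on both 𝒮 and ε. The C sketch below, an illustration with hypothetical names, represents terms of the language of arithmetic as a recursive data type and evaluates them in the structure with universe ℕ. Interpreting − as truncated subtraction is an assumption needed to stay inside ℕ, and unsigned long wraps around for very large values, unlike true ℕ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Terms of the language of arithmetic: variables v0, v1, ..., the
   constants 0 and 1, the unary function S and the binary +, -, x. */
enum term_kind { VAR, ZERO, ONE, SUCC, ADD, SUB, MUL };

struct term {
    enum term_kind kind;
    int var;                  /* index i of variable v_i (kind == VAR) */
    const struct term *left;  /* argument of S, or left operand */
    const struct term *right; /* right operand of +, -, x */
};

/* Value of t in the structure with universe N; eps[i] is the value the
   instantiation assigns to v_i. "-" is truncated subtraction (assumption). */
unsigned long eval_term(const struct term *t, const unsigned long *eps) {
    switch (t->kind) {
    case VAR:  return eps[t->var];
    case ZERO: return 0;
    case ONE:  return 1;
    case SUCC: return eval_term(t->left, eps) + 1;
    case ADD:  return eval_term(t->left, eps) + eval_term(t->right, eps);
    case SUB: {
        unsigned long a = eval_term(t->left, eps);
        unsigned long b = eval_term(t->right, eps);
        return a > b ? a - b : 0;
    }
    case MUL:  return eval_term(t->left, eps) * eval_term(t->right, eps);
    }
    return 0; /* unreachable for well-formed terms */
}

/* Atomic formulas t1 = t2 and t1 < t2 are true or false once the
   structure and the instantiation eps are fixed. */
bool eval_eq(const struct term *a, const struct term *b,
             const unsigned long *eps) {
    return eval_term(a, eps) == eval_term(b, eps);
}

bool eval_lt(const struct term *a, const struct term *b,
             const unsigned long *eps) {
    return eval_term(a, eps) < eval_term(b, eps);
}
```

For example, with ε(v0) = 2 and ε(v1) = 3 the term (S v0) + v1 evaluates to 6 and the formula (S v0) + v1 = v1 + v1 is true; with ε(v0) = 0 and ε(v1) = 5 the same formula is false.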
First order logic: Semantics
• Satisfaction
– Write Γ ⊨ α [𝒮, ε] if, in 𝒮 under ε, α is true whenever every member of Γ is true.
– Write Γ ⊨ α if Γ ⊨ α [𝒮, ε] holds for every 𝒮 and ε.
– Γ is satisfiable if there are 𝒮 and ε that make every member of Γ true. Otherwise, Γ is unsatisfiable.
• Example
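– Let L = {A} ∪ ∅ ∪ ∅ (one binary function symbol A; no relation or constant symbols) and α = ∃x ∀y (A x y = y)
– In 𝒮 with |𝒮| = ℕ, A: (x,y) ↦ x + y and any ε, α is true (take x = 0).
– In 𝒮 with |𝒮| = ℝ, A: (x,y) ↦ xy and any ε, α is also true (take x = 1); with A: (x,y) ↦ x − y instead, α is false (y = 0 forces x = 0, but 0 − 1 ≠ 1).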
– x = x is true in any 𝒮 and ε
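Because ℕ and ℝ are infinite, a program cannot decide α in them by search. The sketch below substitutes finite structures, the integers modulo n (an assumption made only so that both quantifiers can be checked exhaustively; the function names are hypothetical), to show one formula being true in some structures and false in others.

```c
#include <assert.h>
#include <stdbool.h>

/* An interpretation of A on the finite universe {0, 1, ..., n-1}. */
typedef int (*binop)(int x, int y, int n);

/* Decide alpha = "exists x. forall y. A x y = y" in the finite structure
   with universe {0, ..., n-1} and A interpreted as op. */
bool alpha_holds(binop op, int n) {
    for (int x = 0; x < n; x++) {              /* exists x ...   */
        bool all = true;
        for (int y = 0; y < n && all; y++)     /* ... forall y   */
            if (op(x, y, n) != y) all = false;
        if (all) return true;                  /* x is a witness */
    }
    return false;
}

int add_mod(int x, int y, int n) { return (x + y) % n; }
int mul_mod(int x, int y, int n) { return (x * y) % n; }
/* Subtraction modulo n, kept non-negative. */
int sub_mod(int x, int y, int n) { return ((x - y) % n + n) % n; }
```

Modulo 5, x = 0 is a witness for addition and x = 1 for multiplication, while subtraction has no witness.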
Satisfiability Modulo Theory
• Satisfiability
– Given a collection of formulas Γ, decide whether there are 𝒮 and ε such that every member of Γ is true.
• Modulo Theory
– Standard models/theories: integers, real numbers, arrays, bit vectors, …
• Undecidable in general
– Decidable in many interesting cases: Presburger arithmetic, the elementary theory of real numbers, Boolean SAT, …
• SMT solvers: Z3, CVC3, Yices, …
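Boolean SAT is decidable because there are only finitely many truth assignments to try. The sketch below decides a CNF formula by exhaustive enumeration; the clause encoding borrows the DIMACS literal convention (k for a variable, −k for its negation, 0 as terminator), and the function names are hypothetical. It takes time exponential in the number of variables and only illustrates decidability, not how practical solvers work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Is the 0-terminated clause satisfied by the assignment encoded in `bits`
   (bit k-1 holds the value of variable k)? */
static bool clause_true(const int *clause, unsigned long bits) {
    for (; *clause != 0; clause++) {
        int v = abs(*clause);
        bool value = (bits >> (v - 1)) & 1UL;
        if ((*clause > 0) == value) return true; /* literal is true */
    }
    return false;
}

/* Decide satisfiability of a CNF formula over variables 1..n_vars by trying
   all 2^n_vars assignments. n_vars must be smaller than the bit width of
   unsigned long. The loop always terminates, hence decidability. */
bool cnf_satisfiable(const int *const *clauses, int n_clauses, int n_vars) {
    for (unsigned long bits = 0; bits < (1UL << n_vars); bits++) {
        bool all = true;
        for (int c = 0; c < n_clauses && all; c++)
            all = clause_true(clauses[c], bits);
        if (all) return true;
    }
    return false;
}
```

For example, (x1 ∨ x2) ∧ ¬x1 is satisfiable (x2 true), while x1 ∧ ¬x1 is not.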
Satisfiability Modulo Theory
• SMTLIB
– Standard input language for SMT solvers
– Library of benchmarks
• Output of SMT solvers:
– sat: there are a structure and an instantiation making every formula true (the collection is satisfiable)
– unsat: the collection is unsatisfiable
– unknown: the solver cannot decide
• Example
Satisfiability Modulo Theory
• We want to use SMT to construct proofs.
• From the definitions:
– Γ ⊨ α ⇔ Γ ∪ {¬α} is unsatisfiable.
– Γ ⊢ α ⇔ Γ ∪ {¬α} is inconsistent (i.e. Γ ∪ {¬α} derives two contradictory formulas)
• Soundness and completeness theorems:
– Γ ⊢ α ⇔ Γ ⊨ α
• So Γ ⊢ α ⇔ Γ ∪ {¬α} is unsatisfiable.
– In fact, an SMT solver establishes unsatisfiability by deriving an inconsistency.
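The reduction can be illustrated with a toy stand-in for the solver: to test Γ ⊨ α, search for an assignment satisfying Γ and ¬α. The sketch below (hypothetical names; Γ = {x > 0, y = x + 1} chosen only for illustration) searches a finite box of integers, so it can report sat (a counterexample) but never unsat; an SMT solver instead reasons symbolically over all integers. Failing to find a counterexample yields unknown, which is not a proof.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcomes mirroring the solver outputs sat / unsat / unknown. */
enum result { SAT, UNSAT, UNKNOWN };

/* A formula over two integer variables x and y. */
typedef bool (*formula)(int x, int y);

/* Gamma = { x > 0, y = x + 1 } */
static bool hyp_gamma(int x, int y) { return x > 0 && y == x + 1; }
/* alpha = (y > 1): follows from Gamma. */
static bool alpha_follows(int x, int y) { (void)x; return y > 1; }
/* alpha = (y > 2): does not follow from Gamma. */
static bool alpha_fails(int x, int y) { (void)x; return y > 2; }

/* Look for a model of hyp together with (not alpha) in [-bound, bound]^2.
   A model found is a genuine counterexample (SAT); finding none only gives
   UNKNOWN, since values outside the box are never examined. */
enum result check_negation(formula hyp, formula alpha, int bound,
                           int *cx, int *cy) {
    for (int x = -bound; x <= bound; x++)
        for (int y = -bound; y <= bound; y++)
            if (hyp(x, y) && !alpha(x, y)) {
                *cx = x;
                *cy = y;
                return SAT;
            }
    return UNKNOWN;
}
```

Here check_negation finds the counterexample x = 1, y = 2 for α = (y > 2), while for α = (y > 1) it returns UNKNOWN even though α does follow from Γ.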
Implementation
• Hip/Sleek
– Symbolic execution (context transformation)
– Important assumptions:
• every loop terminates
• every recursive function is well-founded (eventually reaches the base case)
• Arrays
– Currently supports pure logic only
– Additional assumption:
• no pointer aliasing, i.e. distinct array variables denote distinct arrays
– Allows user-defined relations
Implementation
• Arrays
– Treated as values, just like integers
– Encoded with the SMT theory of arrays
• Relations
– Defined in SMT as Boolean-valued functions
– Axiomatized from the user’s definitions
• Implication
– For Γ ⊢ α, ask whether Γ ∪ {¬α} is satisfiable
– Both outputs sat and unknown are treated as satisfiable => the implication is reported invalid => no proof is produced (treated as Γ ⊬ α)
– Reliable as long as Z3 is reliable: a proof is claimed only when Z3 answers unsat
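The SMT theory of arrays treats an array as a value with two operations, select and store, governed by the read-over-write axioms. The C sketch below mimics that value semantics with a copying store; arr_select, arr_store, read_over_write_holds and struct farray are hypothetical names, not Z3's API. Because store copies, two array variables never share storage, which matches the no-aliasing assumption above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* An array viewed as a value: a length and its elements. */
struct farray { size_t len; int *data; };

/* select(a, i): read index i. */
int arr_select(struct farray a, size_t i) { return a.data[i]; }

/* store(a, i, v): a NEW array equal to a except at index i.
   The argument is never mutated. The caller must free the result's data;
   on allocation failure the result has len 0 and data NULL. */
struct farray arr_store(struct farray a, size_t i, int v) {
    struct farray b;
    b.len = a.len;
    b.data = malloc(a.len * sizeof *a.data);
    if (b.data == NULL) { b.len = 0; return b; }
    memcpy(b.data, a.data, a.len * sizeof *a.data);
    b.data[i] = v;
    return b;
}

/* Check the read-over-write axioms on concrete values:
   (1) select(store(a, i, v), i) = v
   (2) i != j  implies  select(store(a, i, v), j) = select(a, j) */
int read_over_write_holds(struct farray a, size_t i, size_t j, int v) {
    struct farray b = arr_store(a, i, v);
    if (b.data == NULL) return 0;
    int ok = arr_select(b, i) == v
             && (i == j || arr_select(b, j) == arr_select(a, j));
    free(b.data);
    return ok;
}
```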
Illustration
• Sum of the elements of an array
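A sum-of-elements program is typically verified with a loop invariant tying the running total to a recursively defined specification. The C sketch below (hypothetical names, not the program from the original illustration) states that invariant and the postcondition as run-time asserts; they check only the inputs actually run, whereas a verifier such as Hip/Sleek aims to prove them for all inputs.

```c
#include <assert.h>
#include <stddef.h>

/* Specification: sum of a[i..j-1], defined recursively in the style of a
   user-defined relation. */
long spec_sum(const int *a, size_t i, size_t j) {
    return i >= j ? 0 : a[i] + spec_sum(a, i + 1, j);
}

/* Iterative sum. The asserts express the loop invariant
   s == spec_sum(a, 0, k) and the postcondition; here they are merely
   checked at run time. */
long array_sum(const int *a, size_t n) {
    long s = 0;
    for (size_t k = 0; k < n; k++) {
        assert(s == spec_sum(a, 0, k)); /* invariant before iteration k */
        s += a[k];
    }
    assert(s == spec_sum(a, 0, n));     /* postcondition */
    return s;
}
```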
Illustration
• Selection sort
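For selection sort, the key ingredients are a recursively defined sortedness relation and a loop invariant about the already-sorted prefix. The C sketch below (hypothetical names, not the original illustration) writes sorted in the recursive style of a user-defined relation and checks the invariant with asserts at run time. It does not check that the output is a permutation of the input, which a full specification would also require.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* sorted(a, i, j): a[i..j-1] is in non-decreasing order, defined
   recursively in the spirit of a user-defined relation. */
bool sorted(const int *a, size_t i, size_t j) {
    if (j <= i + 1) return true;
    return a[i] <= a[i + 1] && sorted(a, i + 1, j);
}

/* Every element of a[lo..hi-1] is >= x. */
static bool all_geq(const int *a, size_t lo, size_t hi, int x) {
    for (size_t k = lo; k < hi; k++)
        if (a[k] < x) return false;
    return true;
}

void selection_sort(int *a, size_t n) {
    for (size_t i = 0; i < n; i++) {
        /* Invariant: a[0..i-1] is sorted and no element of a[i..n-1]
           is smaller than a[i-1]. Checked at run time only. */
        assert(sorted(a, 0, i));
        assert(i == 0 || all_geq(a, i, n, a[i - 1]));

        size_t m = i; /* index of the minimum of a[i..n-1] */
        for (size_t k = i + 1; k < n; k++)
            if (a[k] < a[m]) m = k;
        int tmp = a[i];
        a[i] = a[m];
        a[m] = tmp;
    }
    assert(sorted(a, 0, n)); /* postcondition */
}
```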
Illustration
• Data structures
Evaluation
• Limitations
– Incapable of performing induction
– Cannot discover intermediate proof steps
– No memory context: aliasing (two variables referring to the same array) is not handled
– Can only tackle easy problems
– Works by model checking (no formal proof generation)
Evaluation
• Frama-C + Jessie + Why
– Do not allow recursive relations
– Check for memory safety
– Prove termination using user-supplied loop variants
– Incorporate many solvers
• Hip/Sleek
– Allows recursive relations
– Uses only Z3 for arrays
The glory unveils
• Def: The difficulty level of α with respect to Γ is the least n ∈ ℕ such that α ∈ Γₙ.
• When the difficulty level is < 3, the result is said to be trivial. When Γ ⊬ α, the level of α is ∞.
• Observation: the provable examples are all of trivial level.
Concluding remarks
• In this project, we achieved
– An (incomplete) solution to the verification of programs with arrays
– A collection of comprehensive examples that illustrate the capability of our system
– An analysis of the verification power
Concluding remarks
• Many interesting features can be added
– Separation logic
– Proof generation
– Induction
– Invariant detection heuristics
– Etc.
Thank you for your attention