Presentation: On Optimal Stopping Strategies For Text Recognition in A Video Stream As An Application of A Monotone Sequential Decision Model

On optimal stopping strategies for text recognition in a video stream as an application of a monotone sequential decision model
Konstantin Bulatov, Nikita Razumnyi, Vladimir V. Arlazarov

September 24, 2019
Introduction
Mobile document analysis and recognition (DAR) systems:
• Offline, real-time document data extraction on mobile devices
• Use of the video stream to increase recognition quality
Problems:
• How to combine per-frame results
• When to stop
Goals
• Improving per-frame accuracy
• Improving the combination strategy
• Stopping strategies
Goals
• Explore a decision-theoretic framework for the recognition stopping problem
• Describe a stopping method based on modeling the next integrated result
• Provide experimental evaluation results
Problem statement
Optimal stopping problem:
$X_1, X_2, X_3, \dots$ – observed sequence
$L_n(X_1, X_2, \dots, X_n)$ – loss function
$N$ – stopping time
Goal: minimize the expected loss at the stopping time:
$\mathbb{E}\, L_N(X_1, X_2, \dots, X_N) \to \min$
Problem statement
Proofreading problem:
$M$ – number of initial errors
$X_i$ – number of errors corrected by the $i$-th proofreading
$c$ – cost of each proofreading
Loss function:
$L_n = M - \sum_{i=1}^{n} X_i + c \cdot n$
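Given a record of how many errors each pass fixed, the loss sequence can be tabulated directly. A minimal sketch (the function name is hypothetical); note that it needs $M$, which is unknown in practice, so it describes the loss in hindsight rather than a decision rule.

```python
from itertools import accumulate
from typing import List, Sequence


def proofreading_losses(m: int, corrected: Sequence[int], c: float) -> List[float]:
    """Return [L_0, L_1, ..., L_n] with L_n = M - sum_{i<=n} X_i + c * n.

    m         -- number of initial errors M
    corrected -- X_1, X_2, ...: errors corrected by each proofreading pass
    c         -- cost of one proofreading pass
    """
    losses = [float(m)]  # L_0: nothing proofread yet, no cost paid
    for n, fixed_so_far in enumerate(accumulate(corrected), start=1):
        losses.append(m - fixed_so_far + c * n)
    return losses


# 10 initial errors, passes fix 5, 3 and 1 errors, each pass costs 2.
print(proofreading_losses(10, [5, 3, 1], 2.0))  # [10.0, 7.0, 6.0, 7.0]
```

Here the loss is smallest after two passes: the third pass fixes fewer errors than it costs.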
Problem statement
Recognition stopping problem:
$X^*$ – correct result
$X_i$ – $i$-th per-frame result
$R_n = R(X_1, \dots, X_n)$ – combined result of the first $n$ frames
$\rho$ – distance (metric) between recognition results
$c$ – cost of each observation
Loss function:
$L_n = \rho(R_n, X^*) + c \cdot n$
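To make the loss concrete, the sketch below tabulates $L_1, \dots, L_n$ for one clip. The discrete metric (0 for identical strings, 1 otherwise) and the majority vote are placeholders chosen for brevity, not the $\rho$ and $R$ used in the experiments, and all names are hypothetical. Because $X^*$ is required, such a table is only available offline, for evaluation.

```python
from collections import Counter
from typing import Callable, List, Sequence

Metric = Callable[[str, str], float]
Combiner = Callable[[Sequence[str]], str]


def discrete_rho(a: str, b: str) -> float:
    """Placeholder metric: 0 if the strings are identical, 1 otherwise."""
    return 0.0 if a == b else 1.0


def majority(results: Sequence[str]) -> str:
    """Placeholder combiner R: the most frequent per-frame result
    (Counter.most_common breaks ties in favour of the result seen first)."""
    return Counter(results).most_common(1)[0][0]


def recognition_losses(frames: Sequence[str], truth: str, c: float,
                       rho: Metric = discrete_rho,
                       combine: Combiner = majority) -> List[float]:
    """Return [L_1, ..., L_n] with L_n = rho(R_n, X*) + c * n, R_n = R(X_1..X_n)."""
    return [rho(combine(frames[:n]), truth) + c * n
            for n in range(1, len(frames) + 1)]
```

For frames "AB1", "AB12", "AB12" with $X^*$ = "AB12" and $c = 0.1$, the losses are about 1.1, 1.2 and 0.3: the tie at $n = 2$ goes to the wrong reading, and the third frame resolves it.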
Monotone stopping problems
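Monotone problems ($\mathbb{E}_n$ denotes expectation conditional on $X_1, \dots, X_n$):
$\forall n:\ \{L_n \le \mathbb{E}_n(L_{n+1})\} \subset \{L_{n+1} \le \mathbb{E}_{n+1}(L_{n+2})\}$
Myopic rule (stop as soon as one more observation is not expected to reduce the loss):
$N_A = \min\{n \ge 0 : L_n \le \mathbb{E}_n(L_{n+1})\}$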
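The proofreading problem illustrates the myopic rule well. A minimal sketch under modelling assumptions that are ours, not the slide's: $M \sim \text{Poisson}(\lambda)$, and each pass catches every remaining error independently with probability $p$. The function name is hypothetical.

```python
def myopic_proofreading_stop(lam: float, p: float, c: float,
                             max_passes: int = 10_000) -> int:
    """Myopic rule N_A for proofreading, assuming M ~ Poisson(lam) and that each
    pass catches every remaining error independently with probability p."""
    for n in range(max_passes + 1):
        # By Poisson thinning, the errors left after n passes are
        # Poisson(lam * (1 - p) ** n) independently of what was observed, so pass
        # n + 1 is expected to remove p * lam * (1 - p) ** n errors at cost c.
        expected_gain = p * lam * (1 - p) ** n
        if expected_gain <= c:  # one more pass is not expected to reduce the loss
            return n
    return max_passes


print(myopic_proofreading_stop(lam=100, p=0.5, c=5))  # 4: gains 50, 25, 12.5, 6.25, 3.125
```

Because the expected gain $p\lambda(1-p)^n$ only decreases with $n$, once the stopping condition holds it keeps holding, which is exactly the inclusion required of a monotone problem.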
Proposed approach
Assumption about the integrator function:
$\forall n:\ \mathbb{E}\,\rho(R_n, R_{n+1}) \ge \mathbb{E}\,\rho(R_{n+1}, R_{n+2})$
Monotone condition events:
$\{L_n \le \mathbb{E}_n(L_{n+1})\} = \{\rho(R_n, X^*) - \mathbb{E}_n\,\rho(R_{n+1}, X^*) \le c\}$
Triangle inequality:
$\rho(R_n, X^*) - \mathbb{E}_n\,\rho(R_{n+1}, X^*) \le \mathbb{E}_n\,\rho(R_n, R_{n+1})$
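Chaining the two displays above makes the stopping criterion explicit. A short derivation, assuming $\rho$ is a metric and, as in the event equality above, treating $\rho(R_n, X^*)$ as fixed given the first $n$ observations:

```latex
\begin{aligned}
\rho(R_n, X^*) &\le \rho(R_n, R_{n+1}) + \rho(R_{n+1}, X^*)
  && \text{(triangle inequality)} \\
\rho(R_n, X^*) - \mathbb{E}_n\,\rho(R_{n+1}, X^*) &\le \mathbb{E}_n\,\rho(R_n, R_{n+1})
  && \text{(apply } \mathbb{E}_n \text{ and rearrange)} \\
\mathbb{E}_n\,\rho(R_n, R_{n+1}) \le c &\;\Longrightarrow\; L_n \le \mathbb{E}_n(L_{n+1})
  && \text{(by the event equality)}
\end{aligned}
```

So $\mathbb{E}_n\,\rho(R_n, R_{n+1}) \le c$ is a sufficient, possibly conservative, condition for the myopic stopping event.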
Proposed approach
1. Estimate the expected distance $\mathbb{E}_n\,\rho(R_n, R_{n+1})$ given the current observations;
2. Threshold the obtained estimate, thus approximating the myopic rule.
Estimation:
$\hat{\Delta}_n = \frac{1}{n+1}\left(\delta + \sum_{i=1}^{n} \rho\big(R_n, R(x_1, x_2, \dots, x_n, x_i)\big)\right)$
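The two steps and the estimate $\hat{\Delta}_n$ can be sketched in Python. This is a minimal sketch under assumptions that are ours, not the slides': $\rho$ is taken to be the normalized Levenshtein distance $2\,\mathrm{lev}/(|a|+|b|+\mathrm{lev})$ of Yujian and Bo (a metric, but not necessarily the $\rho_L$ used in the experiments); $R$ is a hypothetical medoid combiner standing in for ROVER; $\delta$ is treated as a tunable constant; and `threshold` plays the role of the observation cost $c$. All function names are hypothetical.

```python
from typing import Optional, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def rho(a: str, b: str) -> float:
    """Assumed metric: normalized Levenshtein distance 2*lev / (|a| + |b| + lev)."""
    lev = levenshtein(a, b)
    denom = len(a) + len(b) + lev
    return 2 * lev / denom if denom else 0.0


def combine(results: Sequence[str]) -> str:
    """Hypothetical stand-in for R (not ROVER): the medoid, i.e. the observed
    result with the smallest total distance to all observed results."""
    return min(results, key=lambda x: sum(rho(x, y) for y in results))


def delta_hat(observed: Sequence[str], delta: float) -> float:
    """Estimate of E_n rho(R_n, R_{n+1}): the next frame is modeled as a repeat
    of each observed frame x_i in turn, plus the constant term delta."""
    n = len(observed)
    r_n = combine(observed)
    total = delta + sum(rho(r_n, combine(list(observed) + [x_i])) for x_i in observed)
    return total / (n + 1)


def stopping_frame(frames: Sequence[str], delta: float, threshold: float) -> Optional[int]:
    """Return the first n with delta_hat(frames[:n]) <= threshold, or None."""
    for n in range(1, len(frames) + 1):
        if delta_hat(frames[:n], delta) <= threshold:
            return n
    return None


print(stopping_frame(["AB12", "AB12", "AB12"], delta=0.5, threshold=0.2))  # 2
```

With three identical readings and $\delta = 0.5$, the estimate is $0.25$ after one frame and $1/6$ after two, so with threshold $0.2$ the rule stops after the second frame. Each step runs the combiner $n + 1$ times, so the cost of the estimate grows with the number of processed frames.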
Dataset
MIDV-500 dataset:
• 50 document types, 10 clips per type
• 15000 frames in total
• 546 fields in total
• Analyzed field groups: document numbers, dates, MRZ, Latin names
• 2239 field clips in total
• Tesseract v4 + ROVER
Results
[Figure: Tesseract v4.0.0. Mean distance ρL to X∗ versus mean number of processed frames for the stopping methods NK, NCX, NCR and N.]
Results
Each cell: E(N) / E(ρL(RN, X∗)), for target intervals of the average number of observations (∅: no result in the interval)
Stopping method   5 ± 0.5         6 ± 0.5         7 ± 0.5         8 ± 0.5         9 ± 0.5         10 ± 0.5
NCX               5.332 / 0.195   ∅               ∅               8.471 / 0.170   ∅               ∅
NCR               5.099 / 0.201   ∅               6.920 / 0.180   ∅               8.594 / 0.167   10.103 / 0.164
NK                5.000 / 0.197   6.000 / 0.191   7.000 / 0.185   8.000 / 0.180   9.000 / 0.178   10.000 / 0.171
N                 4.571 / 0.174   5.539 / 0.165   6.683 / 0.161   7.742 / 0.161   8.771 / 0.158   9.779 / 0.158
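Each cell of the table pairs two averages over the field clips. The sketch below shows how such a pair can be computed for any stopping rule. The discrete metric and the majority-vote combiner are placeholders (not ρL or ROVER), and all names are hypothetical. The NK row, whose mean frame counts are exact integers, presumably corresponds to stopping after a fixed number of frames; `fixed_k` mimics such a baseline.

```python
from collections import Counter
from statistics import mean
from typing import Callable, Sequence, Tuple

StopRule = Callable[[Sequence[str]], int]


def discrete_rho(a: str, b: str) -> float:
    """Placeholder metric: 0 if identical, 1 otherwise."""
    return 0.0 if a == b else 1.0


def majority(results: Sequence[str]) -> str:
    """Placeholder combiner: most frequent result, ties to the first seen."""
    return Counter(results).most_common(1)[0][0]


def evaluate(clips: Sequence[Sequence[str]], truths: Sequence[str],
             stop_rule: StopRule) -> Tuple[float, float]:
    """Return (mean N, mean rho(R_N, X*)) over all clips."""
    processed, distances = [], []
    for frames, truth in zip(clips, truths):
        n = stop_rule(frames)  # number of frames the rule consumed
        processed.append(n)
        distances.append(discrete_rho(majority(frames[:n]), truth))
    return mean(processed), mean(distances)


def fixed_k(k: int) -> StopRule:
    """Baseline rule: stop after k frames (or at the end of a shorter clip)."""
    return lambda frames: min(k, len(frames))


print(evaluate([["A", "B", "B"], ["C", "C", "C"]], ["B", "C"], fixed_k(2)))  # (2, 0.5)
```

Sweeping a rule's parameter and keeping the settings whose mean N falls in each target interval would fill one row of such a table.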
Conclusion
1. A decision-theoretic framework for stopping text recognition in a video stream
2. A stopping method based on assumed monotonicity and on modeling the next integrated result
3. Evaluation on the open dataset MIDV-500: the proposed method outperforms previously introduced approaches
4. Future work: incorporating confidence scores; generalization to other objects; multiple objects
Questions?
