An intuitive approach to DTW — Dynamic Time Warping

How to visualize the algorithm, so that you can customize it on your own.

Himanshu Chandra


Published in Towards Data Science · 8 min read · Jun 28, 2020


Photo by Nigel Tadyanehondo on Unsplash

Since you are here, I assume you already know the reason why we use Dynamic Time Warping, or DTW, in time-series data. Simply put, it’s used to align or match two similar patterns.

A brief overview

One of the reasons DTW was initially developed was speech recognition. Your mother may speak slowly one day and hurriedly on another; she may even have a bit of a cold and a sore throat on some days, but you can still recognize her voice. Can machines do the same? Can they somehow match the highs and lows, the peaks and troughs, the frequencies of her voice, no matter how she speaks, and tell us it is indeed her voice?

There are several awesome real-life situations where DTW just rocks — if you do not know them already, I recommend getting to know them — they are really fun!

You can read more about DTW here and about its many applications here (refer to the Introduction section).

Two signals being aligned — “Image by author”

Take me there, quick!

I love this algorithm! In essence, it is one of those short and elegant algos
which pack a punch. It is a classic example of finding the shortest path
using the Dynamic Programming approach.

I have had colleagues running away from understanding the inner workings
of the algo as it involves recursion. This eventually stops them from
understanding the nuances of the approach and from learning how to tweak
it as per their requirements.

Let’s visualize the logic behind the algorithm in a non-programmatic way so


that we can write it from scratch.

To understand recursion, one must first understand recursion.

— Stephen Hawking

Imagine you are standing at the blue square and wish to go to the red square.
All the numbers that you see in the cells along the path, correspond to the
toll amount you have to pay at each step (arbitrarily decided, for now). If I
were to ask you, “Tell me the least amount one needs to spend to reach from blue
to red”, how would you do it?

Find the ‘cheapest’ route — “Image by author”


In fact, I’ll make it simpler by saying that you are supposed to move only in
the ‘forward’ direction. The black arrows show the only 3 ‘allowed’ directions
(right, down, right-down diagonal) and red arrows show ‘restricted’
directions, which are everything apart from the 3 allowed ones (up, left,
other diagonals, etc.). This is similar to what DTW assumes before solving
the problem.

Allowed directions — “Image by author”

Another way to say the above fact is that, “You can arrive at any square only
from one of the 3 adjacent squares”:

Another way to look at it — “Image by author”
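Put as a recurrence (where D(i, j) is just shorthand introduced here for the cheapest total toll to reach square (i, j) from the blue square, and cost(i, j) is the toll at that square):

```latex
D(0, 0) = 0, \qquad
D(i, j) = \mathrm{cost}(i, j) + \min\bigl(D(i-1, j-1),\ D(i-1, j),\ D(i, j-1)\bigr)
```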

A ‘brute-force’ way would be to try all the paths possible from the blue to the
red square and choose the cheapest one. However, dynamic programming
and recursion give us a better, smarter way.
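To make the contrast concrete, here is what the brute-force idea looks like as a sketch (a hypothetical helper, not from the article's code): enumerate every allowed path from the blue square to the red square, then take the cheapest.

```python
def all_paths(i, j):
    ## enumerate every allowed path from (0, 0) to (i, j),
    ## arriving only from the 3 allowed directions
    if (i, j) == (0, 0):
        return [[(0, 0)]]
    paths = []
    for pi, pj in [(i-1, j-1), (i-1, j), (i, j-1)]:
        if (pi >= 0) and (pj >= 0):
            for p in all_paths(pi, pj):
                paths.append(p + [(i, j)])
    return paths

cost = [
    [0,2,5,1,1],
    [5,3,4,2,2],
    [1,1,6,1,3],
    [1,3,2,1,2]
]

paths = all_paths(3, 4)
cheapest_brute = min(sum(cost[r][c] for r, c in p) for p in paths)
print(len(paths), cheapest_brute)
```

Even this small 4×5 grid has well over a hundred distinct allowed paths, and the count grows exponentially with grid size — which is exactly why the smarter approach is worth understanding.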

The lazy boss

I like to solve all recursion problems by thinking as ‘the lazy boss’ who has
uncountable minions at his disposal to do the job. If someone asked me
to solve the above problem, I would simply say, “Minions A, B and C, tell me
the least cost to reach the 3 squares surrounding the red square, and then I’ll
calculate the answer in a minute. Till then, don’t bother me.”

This is the trick recursion uses — give the exact same kind of work you have been asked to do, to your underlings, after ‘scaling down’ the work a bit. And make sure to keep something simple for yourself, so that no one can say you did nothing!

Suppose by some magic (which you are not interested in knowing about), the
minions bring you the answer, marked in green cells:
Answers of Minions A, B and C — “Image by author”

All I need to do now is find the minimum of (10, 7, 7), which is 7, and then add the current square’s cost, which is 2. This gives me the answer as 9. Done!

9 is the minimum cost of travelling from blue to red. But wait, how did the minions come up with the green values? They, of course, emulate you and act as lazy bosses too!

Each minion gets hold of 3 junior minions (the damned hierarchy in the office, I tell you!) and tells them to bring the least-cost values for their 3 neighbourhood/adjacent squares. Here’s an animation to better explain this:

Minions at work — “Image by author”

This keeps going on and on, with each junior minion ‘magically’ getting its
answer. But for how long? Surely it ends sometime? Yes, it does. When there
is no way to subdivide the work further. When a minion gets the task to
calculate the cost at the blue square, he can’t delegate it to anyone else, as
there is no neighbourhood to go to. He just needs to know that the cost there
is zero.

That’s it. You can translate this into a recursive python function in just a few
lines of code:

import numpy as np

def cheapest(cost, i, j):
    if (i==0) & (j==0):
        return 0 ## can't subdivide the work anymore

    if (i<0) | (j<0):
        return np.inf ## ignore neighbours that don't exist

    ## current square's cost + minimum of the 3 neighbours
    return cost[i][j] + min(cheapest(cost, i-1, j-1),
                            cheapest(cost, i-1, j),
                            cheapest(cost, i, j-1))

Here, cost is a 2-D array, our initial matrix:


cost = [
    [0,2,5,1,1],
    [5,3,4,2,2],
    [1,1,6,1,3],
    [1,3,2,1,2]
]

Notice how I have returned np.inf (infinity) when either i or j is less than
zero. This is just a way to ignore the neighbour which doesn’t exist. For
example, when you are in a top-row square, the only neighbour you need to
consider is the left one, as there’s no square above the current square.
Similar logic goes for squares in the first column, where there’s no
neighbour to the left.

And you can call the recursive function by simply calling:

output_cost = cheapest(cost, 3, 4)
print(output_cost)

The output it will give is 9.

If you call it for all the squares, you can finally create a cheapest cost matrix
as output:

output_cost_matrix = [] ## used to store all outputs

for i in range(4):
    for j in range(5):
        output_cost_matrix.append(cheapest(cost, i, j))

## reshape the output for better display
output_cost_matrix = np.array(output_cost_matrix).reshape(-1, 5)
print(output_cost_matrix)

Output:

[[ 0  2  7  8  9]
 [ 5  3  6  8 10]
 [ 6  4  9  7 10]
 [ 7  7  6  7  9]]
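As an aside (not covered in the article's original code): the plain recursion recomputes the same squares many times, which gets slow for large matrices. Caching each square's answer — the standard dynamic-programming trick — keeps the logic identical while visiting each square only once. A sketch using Python's functools.lru_cache:

```python
import numpy as np
from functools import lru_cache

cost = [
    [0,2,5,1,1],
    [5,3,4,2,2],
    [1,1,6,1,3],
    [1,3,2,1,2]
]

@lru_cache(maxsize=None)
def cheapest_memo(i, j):
    ## same logic as cheapest(), but each (i, j) is computed
    ## only once and then served from the cache
    if (i == 0) and (j == 0):
        return 0
    if (i < 0) or (j < 0):
        return np.inf
    return cost[i][j] + min(cheapest_memo(i-1, j-1),
                            cheapest_memo(i-1, j),
                            cheapest_memo(i, j-1))

print(cheapest_memo(3, 4))  ## 9, same answer as before
```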

So, what would be the cheapest path from the blue square to the red square?

We just need to track the minimum neighbour around each square. Here’s the code if you want to print the path too. This is certainly not optimized, but it is simple to understand:

def trace_path(output_cost_matrix, i, j):
    def value_at(r, c):
        ## out-of-grid neighbours are treated as infinitely
        ## expensive, so they are never picked (plain negative
        ## indices would silently wrap around in numpy)
        return output_cost_matrix[r, c] if (r>=0) & (c>=0) else np.inf

    path = [(i, j)]
    while (i>0) | (j>0):
        neighbours = [value_at(i-1, j-1),
                      value_at(i-1, j),
                      value_at(i, j-1)]
        ## see which neighbour is the smallest
        path_min = np.argmin(neighbours)

        ## store the position of the smallest neighbour
        if path_min==0:
            i=i-1
            j=j-1
        elif path_min==1:
            i=i-1
        else:
            j=j-1
        path.append((i, j))
    return path[::-1] ## return after reversing the list

Let’s see the path returned from blue to red square by calling:

trace_path(output_cost_matrix, 3, 4)

Output:

[(0, 0), (0, 1), (1, 2), (2, 3), (3, 4)]

Representing this graphically:

Traced path — “Image by author”

Or tracing it over our original matrix:

Cheapest path — “Image by author”

Cool! But where’s DTW in all of this?

Quite right! How does DTW figure into all of this?

Well, DTW is nothing but simply matching one point in time-series pattern1 with the closest point in pattern2.

Let’s look at the mapping of two signals again. The one in cyan is pattern1
and the one in orange is pattern2, with the red lines trying to find a
corresponding point on pattern1 for every point on pattern2:
Pattern mapping — “Image by author”

So essentially, we are finding the shortest or cheapest path between a pattern2-point and a pattern1-point. But what is the cost between any two points? Do we assign it arbitrarily, like we did for each square in our example?

You have various choices to pick from here, as per your use-case, but the most common one is the Euclidean distance between those points.

You arrange all pattern1-points along the column axis of the matrix and all pattern2-points along the row axis. Then fill each square with the Euclidean distance:
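For 1-D signals, the Euclidean distance between two points is just the absolute difference of their values. As a sketch with two small made-up patterns (the values here are arbitrary, purely for illustration):

```python
import numpy as np

pattern1 = np.array([1.0, 2.0, 3.0, 2.0, 0.0])  ## made-up signals,
pattern2 = np.array([1.0, 1.0, 2.0, 3.0, 1.0])  ## just for illustration

## rows <-> pattern2-points, columns <-> pattern1-points;
## broadcasting fills every square with |pattern2[i] - pattern1[j]|
cost = np.abs(pattern2[:, None] - pattern1[None, :])
print(cost)
```

This matrix can then be fed to the same cheapest()/trace_path() machinery from earlier.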

Patterns arranged as a matrix — “Image by author”

And then it’s the recursive algorithm we just went through. An output path
like [(0,0), (0,1), (1,1), (2,2), (3,3), (4,3)] would mean that point 0 in
pattern1 should be matched up with points 0 and 1 in pattern2. Points 1, 2, 3
and 4 in pattern1 should be matched with points 1, 2, 3 and 3 in pattern2.
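Collapsing such a path into explicit point-to-point matches takes only a small helper (hypothetical, following the (pattern1-index, pattern2-index) convention of the path above):

```python
def path_to_matches(path):
    ## group the path pairs by pattern1-index: each pattern1 point
    ## maps to the list of pattern2 points it is matched with
    matches = {}
    for i, j in path:
        matches.setdefault(i, []).append(j)
    return matches

path = [(0, 0), (0, 1), (1, 1), (2, 2), (3, 3), (4, 3)]
print(path_to_matches(path))  ## {0: [0, 1], 1: [1], 2: [2], 3: [3], 4: [3]}
```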

But what does that actually achieve? What does such shortest-distance
matching mean?

I like to imagine it as a solution where you are trying to match pattern2 with pattern1 in a manner that ensures least stretching (or shrinking) of pattern2; greedily matching it to the closest neighbour.

And there you have it. A simple DTW algorithm implemented from scratch.

I have not mentioned several nuances and variations of the DTW process, for
example a windowed DTW where we add a locality constraint. I hope this
article has sparked your interest in knowing more about it. Do let me know if
you discover something cool about it!

Interested in sharing ideas, asking questions or simply discussing thoughts? Connect with me on LinkedIn, YouTube, GitHub or through my website: I am Just a Student.

See you around & happy learning!

