
Utilizing Q-Learning to Optimize Deep Brain Stimulation in Parkinson’s Patients


Rahul Kumar,1 Timothy Matchen,2 Jeff Moehlis2
1 Indus International School, Billapura Cross, Sarjapura, Bangalore, Karnataka
rahulukumar02@gmail.com
2 Department of Mechanical Engineering, University of California, Santa Barbara, CA 93106


Over 6 million people around the world are affected by Parkinson’s Disease, a degenerative disorder that affects the motor system. The loss of dopaminergic cells in the striatum is normally treated with levodopa, a precursor to dopamine which acts as a dopamine replacement agent. However, levodopa’s efficacy diminishes over time, and it causes adverse side effects such as dyskinesia, so it is not viable as a long-term treatment. Deep Brain Stimulation (DBS) is a surgical method designed to reduce the effects of Parkinson’s by placing a neurostimulator in the brain which sends electrical impulses through electrodes in order to treat movement disorders such as Parkinson’s. However, DBS is not well understood, and stimulation parameters are optimized manually, which is a time-consuming and difficult task. If the stimulation parameters could be found automatically rather than manually, time and effort could be saved; furthermore, by ensuring that the parameters are as close to optimal as possible, the patient would get the greatest benefit. Since Parkinsonian neurons show excess synchronization, it is necessary to desynchronize the neurons via a stimulator. Existing approaches control this process by offsetting the firing of each stimulator, but there is no guarantee that predetermined offsets are the most efficient. We propose a Q-Learning solution in which the offsets are not predetermined, in order to find the optimal stimulation parameters as quickly as possible.

Keywords: Parkinson’s Disease, Deep Brain Stimulation, Machine Learning, Reinforcement Learning, Q-Learning

I. INTRODUCTION

Parkinson’s Disease (PD) is a degenerative disorder of the central nervous system for which there is no known cure. Over 6 million people worldwide suffer from PD, and it causes over 100,000 deaths each year [1]. PD sufferers experience several motor symptoms such as shaking, rigidity, and bradykinesia, caused by the death of cells in the substantia nigra, resulting in a dopamine deficiency [2]. As PD progresses, patients often demonstrate non-motor symptoms such as dementia, depression, sleep deprivation, and behavioral issues [3].

While PD cannot be cured, the symptoms can be partially alleviated by levodopa, or L-DOPA [3]. The dopamine molecule itself is too polar to pass through the blood-brain barrier, but L-DOPA, as an amino acid, can [4]. In the brain, L-DOPA is converted into dopamine via decarboxylation. However, since PD is a degenerative disease, L-DOPA becomes less and less effective as the disorder progresses [5].

When PD can no longer be controlled through medication alone, Deep Brain Stimulation (DBS) is used to control the symptoms, especially motor fluctuations and tremors [6]. DBS is the electrical stimulation, by a neurostimulator, of areas of the brain including the globus pallidus internus, thalamus, subthalamic nucleus, and pedunculopontine nucleus. Each patient has different symptoms, so the specific location of DBS within these areas differs. DBS is highly effective, and patients show significant improvement in motor score evaluations [7]. In order for DBS to be as effective as possible, several parameters, such as frequency, amplitude, and pulse width, must be optimized [8]. This optimization, which is done manually, is time-consuming for both the surgeon and the PD patient.

Therefore, we wish to automate this process by adjusting the stimulation parameters based on electrical feedback from the patient. Machine learning models for deep brain stimulation have been built before, utilizing closed-loop models, deep learning classification, and optimization algorithms. However, these models are not optimized, and therefore do not approach the efficiency needed for DBS.

In this paper, we propose a reinforcement learning solution which utilizes Q-Learning. Reinforcement learning algorithms do not require input/output pairs or explicit correction. Instead, they strike a balance between exploiting current knowledge and obtaining new knowledge to find an optimal solution.

Unlike supervised learning, reinforcement learning does not require a target output. Instead, the agent is given feedback about the appropriateness of its response; it is not told what the correct response should be. For this reason, reinforcement learning is much more realistic than supervised learning: in real life and in nature, an agent rarely knows the right answer, only that it has made a mistake.
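For reference, Q-Learning learns an estimate $Q(s,a)$ of the long-term value of taking action $a$ in state $s$, and refines that estimate after every step using the standard update rule, with learning rate $\alpha$ and discount factor $\gamma$:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]$$

Here $r_{t+1}$ is the feedback (reward) received after acting, and the $\max$ term estimates the value of the best action available from the next state; this is how the agent trades immediate rewards off against future ones.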

II. METHODS

Patients with Parkinson’s Disease exhibit excessively synchronized neurons in the brain. Because of this, the brain struggles to control the motion of the body, leading to the motor symptoms commonly experienced in PD such as tremors, shaking, and bradykinesia. Deep Brain Stimulation desynchronizes the neurons, which helps the patient function normally. Our goal is to use Q-Learning to automatically stimulate the patient. To model the activity of neurons in the brain, we used a network of phase oscillators, which represent our Parkinsonian neurons. The activity of these oscillators is modeled by Equation 1 in Tass 2003 [9]. The oscillators, like the neurons they represent, have a tendency to clump together. For this reason, before we turn on our stimulators, our oscillators are all at the same phase. Our model’s aim is to use our stimulators to spread apart the phases of the oscillators. We use Equation 4 in Tass 2003 [9] to measure the level of synchronization in the neurons.
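As a minimal illustration of this kind of environment, the following sketch (Python) simulates a globally coupled, Kuramoto-type population of phase oscillators and measures synchronization with the standard Kuramoto order parameter R, where R near 1 means fully synchronized. The coupling form, parameter values, and phase-dependent stimulation term are illustrative assumptions standing in for Equations 1 and 4 of Tass 2003 [9], not the paper’s exact model.

import numpy as np

def simulate_step(phases, omegas, K, dt, stim=0.0):
    # Advance globally coupled phase oscillators by one Euler step.
    # phases: current phase of each oscillator (radians)
    # omegas: eigenfrequency of each oscillator
    # K:      global coupling strength (pulls the phases together)
    # stim:   stimulation amplitude (0 when the stimulator is off); the
    #         cos(phase) sensitivity is an illustrative stand-in for the
    #         stimulation term of Tass 2003, Equation 1 [9]
    n = len(phases)
    coupling = (K / n) * np.sum(np.sin(phases[None, :] - phases[:, None]), axis=1)
    return phases + dt * (omegas + coupling + stim * np.cos(phases))

def order_parameter(phases):
    # Kuramoto order parameter R in [0, 1]; R close to 1 means synchronized.
    return np.abs(np.mean(np.exp(1j * phases)))

# A closely synchronized initial population with similar eigenfrequencies.
rng = np.random.default_rng(0)
n = 100
phases = np.zeros(n)                           # all start at the same phase
omegas = 1.0 + 0.01 * rng.standard_normal(n)   # similar eigenfrequencies
print(order_parameter(phases))                 # ~1.0: fully synchronized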
We built a Q-Learning model which decides whether or not to turn the neurostimulator on. The environment of the model consists of neurons, which are assigned random eigenfrequencies. In order to closely model patients with Parkinson’s Disease, the neurons start out closely synchronized: their eigenfrequencies are similar. We assign a learning rate, discount factor, and exploration/exploitation rate to the model.

The Q-table has two states, Phase2 and R2, and only two actions: either the neurostimulator is turned on, or it is turned off. The reward is calculated by measuring the synchronization of the neurons: the further apart the neurons’ phases, the higher the reward. The goal of our Q-Learning model is to maximize the reward by desynchronizing the neurons.

The model starts out by exploring the state and action space. Since the exploration rate is initially very high, the agent almost always picks a random action. As the agent learns more, the exploration rate decreases, and the agent begins to use its own judgement to maximize its reward.

The agent’s discount factor determines the weight it places on current rewards versus future rewards. If the discount factor is too high (the agent weighs future rewards too heavily), the agent is less likely to converge to the highest reward. For this reason, the agent starts with a low discount factor which gradually increases over time.

The first iteration of our model had only one stimulator, and so all of the neurons were placed in the same group: our stimulator had knowledge of all of their eigenfrequencies. Our Q-Learning agent successfully desynchronized the neurons.

However, stimulating each neuron individually would require many electrodes [9], so we decided to split our neurons into multiple populations, each controlled by a different stimulator. These stimulators only have information about the set of neurons they control: they are not aware of the frequencies of other neurons. Each stimulator’s reward is calculated by measuring only the synchronization of the neurons it controls.

Originally, we used 4 stimulators, as this was the accepted number in previous literature [9]. We also experimented with different numbers of stimulators to find the optimum.
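A minimal sketch of this control loop, reusing simulate_step, order_parameter, phases, and omegas from the sketch above. The state discretization, reward shaping, and parameter values are illustrative assumptions (the paper’s states follow Tass 2003 [9], and the paper additionally anneals the discount factor, which is omitted here); only the two actions, stimulator off or on, match the setup exactly.

N_STATES = 10        # order parameter R binned into 10 states (assumption)
ACTIONS = [0, 1]     # 0 = stimulator off, 1 = stimulator on
Q = np.zeros((N_STATES, len(ACTIONS)))

alpha, gamma, epsilon = 0.1, 0.9, 1.0   # learning, discount, exploration rates

def discretize(R):
    # Map the order parameter R in [0, 1] to a discrete state index.
    return min(int(R * N_STATES), N_STATES - 1)

state = discretize(order_parameter(phases))
for step in range(5000):
    # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
    if rng.random() < epsilon:
        action = int(rng.choice(ACTIONS))
    else:
        action = int(np.argmax(Q[state]))

    # Apply the chosen action to the oscillator population.
    phases = simulate_step(phases, omegas, K=1.0, dt=0.01,
                           stim=5.0 if action == 1 else 0.0)

    # Lower synchronization (smaller R) earns a higher reward.
    R = order_parameter(phases)
    reward = 1.0 - R

    # Standard Q-Learning update toward the best next-state value.
    next_state = discretize(R)
    Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state])
                                 - Q[state, action])
    state = next_state

    epsilon = max(0.05, epsilon * 0.999)   # decay exploration over time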
III. RESULTS

Our first and main goal was to build a working Q-Learning algorithm that could desynchronize our population of neurons. Our algorithm largely succeeded in that task. Our neurons started off completely synchronized, and ended up largely desynchronized (see Appendix 1). Completely desynchronizing the population is impossible, as the neurons are globally coupled and thus resist desynchronization. However, our algorithm could be further optimized to increase the extent of desynchronization.

Our second goal was to test whether we could achieve a similar level of desynchronization with multiple populations of neurons, all hidden from each other. In other words, each population was completely separate, with no interaction or knowledge flow between different stimulators. We found that our model with separate populations did an effective job of desynchronizing our neurons (see Appendix 2). This is important, as it is impractical to stimulate each neuron separately. By dividing the neurons into groups and stimulating each group as a whole, we reduce the amount of electricity applied to the neural tissue, and thus reduce the chance of damage.

A previous study on this subject succeeded in desynchronizing neurons by separating them into groups [9]. That study demonstrated that the optimal stimulation pattern for multiple stimulators was to have them offset from each other: each would peak at a different time. This behavior was therefore hard-coded into the stimulators. Our stimulators, on the other hand, were given no such information, yet they learned the same type of behavior by themselves (see Appendix 3): over time, the stimulators begin to fire offset from each other. This tells us that it is not necessary to hard-code that behavior into our stimulators, saving time.

Next, we wanted to find whether 4 stimulators was indeed the optimum. For this reason, we ran our model again, with the number of stimulators varying from 3 to 8. We found that, in our model, the optimum number of stimulators was indeed 4, because the model runs into issues if the number of stimulators is either increased or decreased.

If the number of stimulators is increased, each stimulator controls an increasingly smaller slice of the population. For example, with 8 stimulators, each stimulator only controls 12.5% of the environment, and has no information about the remaining 87.5%. For this reason, the stimulators cannot desynchronize the entire population, as they do not know the frequency values of most of the neurons. In many runs of our model, the neurons tend to clump together, as they are globally coupled and the stimulators do not take explicit action to prevent this.

With a smaller number of stimulators, we faced the opposite problem. As mentioned above, the neurons want to synchronize, and must be forced apart by the stimulators. Neurons in the same population usually end up with similar frequencies, as the stimulator always performs the same action on its entire population: it cannot choose to stimulate only half of its population; it must stimulate all or none. Different stimulators, on the other hand, perform different actions, and so different populations of neurons diverge rapidly. With a smaller number of stimulators (2 or 3), the population is not pulled apart as effectively.

IV. CONCLUSIONS

Q-Learning can be used effectively in Deep Brain Stimulation. Our model successfully desynchronizes the neuronal population. However, our original model is not viable, as stimulating each neuron individually would involve delivering far too much electricity into the patient.

Our second model is far more viable, as it significantly limits the number of stimulators, and thus the danger of damage to the patient. We find that 4 stimulators do almost as well as stimulating each neuron individually.

We also find that it is unnecessary to force the stimulators to fire at different times. While that is the most efficient firing pattern, our stimulators learn to do so by themselves, thereby saving the operator’s time.

Finally, 4 stimulators, the accepted number in previous studies, is indeed optimal for our setup. Increasing or decreasing the number of stimulators decreases the extent of the desynchronization.
V. FUTURE WORK

There are several ways in which we can expand the work that we have done so far. Currently, our stimulators receive no knowledge about the actions of nearby stimulators. This is the reason why their effectiveness drops significantly as the number of stimulators increases. It is possible that a larger number of stimulators would be more effective if each stimulator received partial information, namely the actions of its immediate neighbors. We would like to see if the extent of desynchronization increases with this change.

Secondly, our Q-Learning algorithm has numerous free parameters, such as the learning rate, exploitation-exploration coefficient, discount factor, intensity, strength of inter-neuronal attraction, and number of stimulators. We would like to vary these parameters, as sketched below, in order to see whether there is any correlation between them and the desynchronization of the neurons.
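One hypothetical way to organize such a sweep is a simple grid search. The sketch below assumes a run_model function that runs the full simulation with the given parameters and returns the final extent of desynchronization; the parameter names and value grids are placeholders, not values used in the paper.

import itertools

# Placeholder grids; the paper does not specify which values to test.
grid = {
    "alpha":   [0.05, 0.1, 0.2],   # learning rate
    "gamma":   [0.8, 0.9, 0.99],   # discount factor
    "epsilon": [0.5, 1.0],         # initial exploration rate
    "n_stim":  [3, 4, 6, 8],       # number of stimulators
}

results = []
for values in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    # run_model is assumed, not shown: it runs the simulation with these
    # parameters and returns the final desynchronization score.
    score = run_model(**params)
    results.append((params, score))

# Rank parameter settings by how strongly they desynchronize the population.
results.sort(key=lambda item: item[1], reverse=True)
print(results[0])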
Furthermore, in some tests of our model, some stimulators seem to abruptly turn off, though this does not affect the extent of desynchronization. Since Parkinsonian neurons tend to fire at similar frequencies, it is odd that the neurons do not tend back towards synchronization once the stimulators stop acting on them. Further research is required to understand whether this is simply an error in our model or whether the neurons truly manage to stay desynchronized.

ACKNOWLEDGEMENTS

The first author would like to thank his co-authors for their continued mentorship and guidance throughout this project. Their insight and technical ability have been invaluable, and their scientific ability was crucial to this project. We would like to thank the rest of the Moehlis lab for their assistance and support. We would also like to thank the University of California, Santa Barbara for providing us with the facilities we required in order to complete this project, as well as Dr. Lina Kim, Dr. Michael Hughes, and the rest of the Research Mentorship Program for giving us the ability to undertake and complete this project.

REFERENCES

[1] Sejvar, James. “Faculty of 1000 Evaluation for Global, Regional, and National Burden of Neurological Disorders during 1990-2015: A Systematic Analysis for the Global Burden of Disease Study 2015.” F1000 - Post-Publication Peer Review of the Biomedical Literature, 2018.
[2] “Parkinson's Disease Information Page.” National Institute of Neurological Disorders and Stroke, U.S. Department of Health and Human Services, www.ninds.nih.gov/Disorders/All-Disorders/Parkinsons-Disease-Information-Page.
[3] Sveinbjornsdottir, Sigurlaug. “The Clinical Symptoms of Parkinson's Disease.” Journal of Neurochemistry, vol. 139, 2016, pp. 318-324.
[4] “L-DOPA.” ScienceDirect Topics, www.sciencedirect.com/topics/neuroscience/l-dopa.
[5] Thanvi, B. R. “Long Term Motor Complications of Levodopa: Clinical Features, Mechanisms, and Management Strategies.” Postgraduate Medical Journal, vol. 80, no. 946, 2004, pp. 452-458.
[6] Montgomery, Erwin B. “What Is Deep Brain Stimulation?” 20 Things to Know about Deep Brain Stimulation, 2014, pp. 1-13.
[7] “Deep Brain Stimulation.” AANS, www.aans.org/en/Patients/Neurosurgical-Conditions-and-Treatments/Deep-Brain-Stimulation.
[8] McIntyre, Cameron C., et al. “Optimizing Deep Brain Stimulation Parameter Selection with Detailed Models of the Electrode-Tissue Interface.” 2006 International Conference of the IEEE Engineering in Medicine and Biology Society, 2006.
[9] Tass, Peter A. “A Model of Desynchronizing Deep Brain Stimulation with a Demand-Controlled Coordinated Reset of Neural Subpopulations.” Biological Cybernetics, vol. 89, no. 2, 2003, pp. 81-88.

APPENDIX

1. Desynchronization

a. Extent of desynchronization

FIG. 1. (Color online) The extent of desynchronization, measured by Equation 4 in Tass 2003 [9]. Our model starts with a set of synchronized phase oscillators and desynchronizes them effectively.

b. Graphical representation

FIG. 2. (Color online) A graphical representation of Fig. 1. The initial part of the graph shows the synchronization of phase oscillators without DBS. The right-hand side of the graph shows the rapid desynchronization of the oscillators once stimulation is applied.

2. Stimulator offset

FIG. 3. (Color online) Each line represents a different stimulator. While the stimulators initially fire at random without any pattern, they eventually learn to peak offset from the other stimulators: they all peak at different times.

3. Different number of stimulators

a. 3 Stimulators

FIG. 4. (Color online) With 3 stimulators, the different stimulators show no pattern in their firing, likely because there are too few stimulators to force the oscillators apart.

b. 6 Stimulators

FIG. 5. (Color online) Our model with 6 stimulators shows the same characteristic offset. However, it does not do as well, likely because the stimulators do not have enough information. In this graph, we notice that some stimulators occasionally “turn off”.
