
Journal of Neuroscience Methods 234 (2014) 66–72

Contents lists available at ScienceDirect

Journal of Neuroscience Methods


journal homepage: www.elsevier.com/locate/jneumeth

Basic Neuroscience

Diving deeper into Zebrafish development of social behavior: Analyzing high resolution data
Christine Buske a,b,∗, Robert Gerlai b
a Papers/Springer SBM, London, UK (previously: University of Toronto, Department of Cell & Systems Biology)
b University of Toronto Mississauga, Department of Psychology, Toronto, Canada

Highlights

• Zebrafish are a high throughput, cost effective vertebrate model.
• Behavioral data collection/analysis is time-consuming.
• The R programming language is a powerful tool in data analytics.
• R was used to analyze a large behavioral dataset.

Article info

Article history:
Received 2 May 2014
Received in revised form 16 June 2014
Accepted 16 June 2014
Available online 23 June 2014

Keywords:
Zebrafish
Behavior
Methods

Abstract

Vertebrate model organisms have been utilized in high throughput screening, but only with substantial cost and human capital investment. The zebrafish is a vertebrate model species that is a promising and cost effective candidate for efficient high throughput screening. Larval zebrafish have already been successfully employed in this regard (Lessman, 2011), but adult zebrafish also show great promise. High throughput screening requires the use of a large number of subjects and the collection of a substantial amount of data. Data collection is only one of the demanding aspects of screening. In most screening approaches that involve behavioral data, the main bottleneck that slows throughput is the time consuming analysis of the collected data. Some automated analytical tools do exist, but often they only work for one subject at a time, which prevents zebrafish from being fully utilized as a screening tool. This is a particularly important limitation for complex phenotypes such as social behavior. Testing multiple fish at a time can reveal complex social interactions, and it may also allow the identification of outliers in a group of mutagenized or pharmacologically treated fish. Here, we describe a novel method that combines a custom software tool developed within our laboratory, which enables tracking of multiple fish, with a sophisticated analytical approach for summarizing and analyzing high resolution behavioral data. This paper focuses on the latter, the analytical tool, which we have developed using the R programming language and environment for statistical computing. We argue that combining sophisticated data collection methods with appropriate analytical tools will propel zebrafish into the future of neurobehavioral genetic research.
© 2014 Published by Elsevier B.V.

∗ Corresponding author. Tel.: +44 7867410264.
E-mail address: Christine.Buske@springer.com (C. Buske).
http://dx.doi.org/10.1016/j.jneumeth.2014.06.019
0165-0270/© 2014 Published by Elsevier B.V.

1. Introduction

Larval zebrafish have been an excellent vertebrate model for high throughput studies (Lessman, 2011), but adult zebrafish may also hold substantial promise in this regard (Brennan, 2014). The limiting factors in using adult zebrafish for high throughput studies lie not in maintenance or acquisition costs, but in time consuming experimental procedures and analysis. Nevertheless, adult zebrafish have become increasingly popular in behavioral neuroscience, because mutation- or drug-induced alterations in brain function may be best identified with behavioral test paradigms (Gerlai, 2010). While larval zebrafish offer behavioral endpoints, adult zebrafish provide the advantage of a far wider-ranging behavioral repertoire (Norton and Bally-Cuif, 2010). Behavior is particularly appropriate when modeling human conditions for which abnormal behavior is a core symptom (Gerlai, 2012; Brennan, 2014). Previous work has also indicated a correlation between neurochemical changes and behavioral maturation in zebrafish (Buske and Gerlai, 2012). This leads to the argument that changes in behavior are accompanied by changes in
neurochemistry. As we further develop our understanding of how these differences interplay, we may gain further understanding of the brain and behavior.

Zebrafish offer a rich behavioral repertoire, of which shoaling is one component (Buske and Gerlai, 2011a,b; Miller and Gerlai, 2011). Several behaviors have been characterized in zebrafish and are being further investigated within a behavioral neuroscience context. These have been reviewed previously and include reward, learning and memory, aggression, locomotion, anxiety, mating, and sleep (Norton and Bally-Cuif, 2010). Shoaling is a complex behavior to study and quantify. A number of protocols mimic a shoaling situation, but quantifying shoaling in an open field setting (where subjects are able to interact with each other) has been difficult in the past (Buske and Gerlai, 2011a,b; Gerlai, 2014).

While behavior offers insights into brain function, recording and analysis of behavioral trials can be time consuming. Throughput in behavioral screening may be increased by running several testing arenas in parallel, and the cost of setting up such systems is decreasing as technology advances. This makes parallel behavioral test systems a reality even for the average academic laboratory. The bottleneck in such behavioral screening then becomes the proper extraction and analysis of the acquired data. Several commercially available tracking systems exist for testing a single subject at a time (e.g. Noldus' EthoVision, and CleverSys). These systems are highly sophisticated, but they still present some trade-offs. While they accurately track a single fish under optimal conditions, they have difficulty tracking very small fish in a large area (in larger tanks), and they require optimal light conditions. An additional disadvantage is that most of these systems have difficulty tracking multiple fish, particularly shoals with more than four members. The final, and by no means trivial, limiting factor is cost: with a single license one can analyze only one trial at a time. This presents a significant time constraint on data extraction from videotaped trials. Processing video files at a higher throughput rate would require the purchase of multiple licenses and an equal number of computers. With tracking systems costing $5–15 thousand per license, parallel processing (multiple licenses) is not possible for most academic laboratories.

Several commercial software applications exist that allow for automated tracking of zebrafish and quantification of several behavioral outputs. As discussed above, these can be costly, and the cost can limit throughput. Nevertheless, these methods offer increasingly sophisticated means of measuring behavior. Tracking of animal behavior in a two dimensional plane has been possible for some time (Noldus and Spink, 2001). More recently, various groups have started applying 3D video behavioral tracking (Maaswinkel et al., 2013). These methods have provided insight into various behavioral outputs, particularly those resulting from drug exposure (Cachat et al., 2013; Maaswinkel et al., 2013).

When developing our in-house tools, we first considered the requirements of the tracking system and the specifics of behavioral analysis as they pertain to zebrafish. High throughput capabilities would require a simple tracking system and reliable behavioral paradigms (Blaser and Gerlai, 2006). The tracking system should ideally be capable of tracking shoals of fish, even under sub-optimal conditions (e.g. when reflections cannot be avoided, or under sub-optimal light/background conditions). Preferably, such systems should also process video data in real time with a small number of computers. Under such sub-optimal conditions, for example, commercially available tracking systems frequently mistake small particles such as air bubbles or debris for fish, which introduces substantial errors. These errors need to be corrected by the experimenter, which requires continuous monitoring and time.

Recent improvements in optics and computing power have revolutionized the analysis of behavior. Just under 30 years ago, tracking with one camera and one computer cost thousands of dollars and could generate a position for the animal only once per second. Now the cost of basic equipment is in the range of a few hundred dollars, and tracking occurs 30 times per second with a simple, inexpensive camera (Dusenbery, 1985).

The rapid improvements in electronics and video equipment have opened up the possibility of extracting large amounts of data. Extracting positional data from a video recorded at 30 frames per second yields 18,000 data points for a short 10-min video (30 frames/s × 600 s). Processing and analysis that require manual input would be prohibitively time consuming for the number of trials required in the average experiment. As technology improves, Big Data is becoming the new bottleneck across many disciplines (Marx, 2013).

Shoaling behavior is complex: subjects leave and rejoin the shoal, and the distances between shoal members vary. Several research groups have attempted to describe and quantify shoaling behavior using multiple zebrafish simultaneously. However, these approaches typically either count the number of individuals occupying an arbitrary area of the testing tank at different time intervals (Echevarria et al., 2011), or assign a shoal cohesion 'score', a subjective judgment made by the observer, at different time points during the trial (Piato et al., 2011). In other studies, a single fish is exposed to a shoal of conspecifics separated by a glass divider, and the time spent within a preset compartment next to the stimulus is measured (Savio et al., 2012). Aside from being time consuming, these methods do not offer an objective or particularly informative description of shoal cohesion. For example, a group of fish may be divided across two different arbitrary areas defined by the experimenter, yet be physically very close to each other. In another scenario, the group may be present within the same arbitrary area of the testing arena, yet be physically further apart. Shoal cohesion would be rated lower in the first scenario than in the second, which misrepresents how close the shoal truly is. When a shoal cohesion score is assigned to each time interval sampled, substantial information is also lost, and there is the possibility of experimenter bias. Subtle differences in shoal cohesion would be missed by such a method, even when it is applied by multiple raters and inter-rater reliability is high. Moreover, these methods do not allow for high throughput analysis of behavioral trials, and they do not provide objective and precise measurements of group behavior. Reproducibility of these methods across laboratories is also a concern (Benjamini et al., 2010). Notably, methods in which a single fish is tested in the presence of a stimulus shoal (Savio et al., 2012) do not describe shoal behavior properly, because there is no possibility of interactive communication between the subject and the stimulus. For example, even when live stimulus fish are used, the experimental zebrafish cannot sense the presence of the stimulus fish with their lateral line, because a glass barrier is placed between the test subject and the stimulus fish. Similarly, while larval zebrafish can be observed and tracked in multiwell plates, each subject is isolated in its own well, and thus the subjects do not interact (Cario et al., 2011). Automated tracking methods have so far fallen short when measuring shoaling in a group setting.

An earlier tracking program developed within our laboratory (described in detail previously by Miller and Gerlai, 2007) allowed for more objective and accurate measurement of inter-individual distance and other parameters of group behavior in zebrafish. This method has been successfully employed in previous studies (Buske and Gerlai, 2011a,b). Notably, it relied on a human observer manually identifying each individual in a shoal at each time interval sampled, and the method was therefore highly time consuming. Because the observer only identified the location of the subject on a screen by clicking on it, and made no assessment of shoal cohesion, this method was arguably
more objective than several prior manual scoring or rating methods. However, extracting high-resolution data from trials using this method is prohibitively time consuming. As a consequence, loss of information cannot be avoided unless a substantial time investment is made.

More recently, a sophisticated yet simple tracking system, internally named 'Real Fish Tracker', was developed in our laboratory by James McCrae, a computer science PhD candidate at the University of Toronto. This tracking software can track multiple subjects within the same environment. It records precise location data (X-Y coordinates) for each fish at the frame rate of the video being sampled, which translates to a sampling rate of 29 times per second for videos created with conventional digital cameras. The program records location data at 29× per second in real time on computers with average processing speeds. In addition, multiple sessions can run simultaneously on the same computer, so that up to five different trials can be tracked in parallel on a laptop computer with a modest processor and memory. When this many instances of the program are running, tracking does not proceed in real time. Some time is still saved, however, because a few trials are calibrated together once every half hour instead of one file every few minutes.

The program supports a range of conventional video formats. It tracks the fish by comparing the frame being sampled with an average image computed over the previous several frames of the video. The difference between these images allows the program to identify the changed pixels, i.e. the location of the fish, and to assign x-y coordinates to each subject. The algorithms take into account the previously known position of the subject, the current image, and the average image of previous frames. This allows for a highly reliable determination of the position of each subject. It also minimizes instances where a subject is 'lost' by the program, for example when two subjects cross each other's path. Although minimized, such instances do occur. Like any other application, the software does not guarantee consistent identification of the same fish. This would be a concern in paradigms where an individual mutant or differentially treated fish is exposed to a shoal of control fish. The experiments discussed in this article focus on characterizing the dynamics of movement of the shoal. They therefore do not require individual labeling of each shoal member, an additional goal that will require future software development.

The current software program provides high-resolution positional data for each of the subjects in the trial. However, it does not quantify any particular (established) behavioral endpoints, such as inter-individual distance, nearest neighbor distance, distance from the closest wall or center, or distance from a corner. As described before, some sophisticated software packages exist that can analyze select behavioral outputs in zebrafish (Ahmad et al., 2012), but for many research groups the cost of acquiring these packages can still be prohibitive. In addition, these packages offer a fixed set of behavioral outputs that can be quantified and no flexibility beyond these predetermined measures. The current method was deployed in a different study that validated its reliability for the endpoints measured. Comparable results were obtained in the description of shoaling behavior with the currently described method (Buske and Gerlai, 2012; Mahabir et al., 2013) and with older studies using the previously discussed methods (Buske and Gerlai, 2011a,b).

Several other tracking programs exist that have been either custom developed or released as open source (Aguiar et al., 2007; Wolfer et al., 2001; Kane et al., 2004). The community faces the challenge of frequently reinventing the wheel, with several tracking programs being developed with sometimes overlapping features. Most of these programs produce data files consisting of coordinate data in an xy field for the subject being tracked. At this stage, the analysis of the data becomes the second major challenge.

Working from the raw data output of the application developed in-house affords the flexibility that predetermined packages lack. We have developed a quantification module based on the R environment for statistical computing that allows us to extract numerous behavioral endpoints from the raw data output in a highly flexible, user-specific manner. In our behavioral experiments, we recorded the behavior of our fish for 8 min. Each 8-min trial resulted in a data file of 13,920 rows (sampled 29× per second, i.e. 29 × 480 s). More complex behavioral studies, e.g. those that follow the subject's behavior across extended periods of time, generate even larger data matrices. A recently completed longitudinal study of the effects of embryonic ethanol exposure over the course of development generated 1300 data files (13 age points, n = 20 for each of five treatment groups). This resulted in a total of 18 million data points (Buske and Gerlai, 2011a,b).

Processing and computation of many hundreds, or thousands, of data files corresponding to an equal number of trials requires a sophisticated approach. The R programming language and environment for statistical computing is particularly suitable for this purpose (Venables and Smith, 2012). R can be regarded as an implementation of the S language, which was developed at Bell Laboratories by Rick Becker, John Chambers and Allan Wilks and which also forms the basis of the S-Plus systems. It is an effective programming language that facilitates data manipulation, calculation, and graphics.

R is referred to as an 'environment' because it is a system for developing methods of interactive data analysis, rather than a typical statistical package or data analysis program. R provides the researcher with flexibility in designing analytical tools suited to very specific data sets or to goals unique to a particular project.

By creating R programs for the manipulation and analysis of high resolution positional data acquired with the Real Fish Tracker, hundreds of output files can be processed and analyzed quickly, efficiently, and with minimal intervention by the experimenter.

2. Data processing and analysis

Like several other tracking programs, the Real Fish Tracker produces individual .txt output files containing high resolution positional time series data for each individual tracked within the same arena. Video files are sampled at a rate of 29× per second, creating data files with thousands of rows.

Each data file corresponds to a single trial. Various behavioral measures can be computed from the recorded positional data in a user-defined and highly flexible manner. For our own purposes we decided to compute the following behavioral measures: inter-individual distance, nearest neighbor distance, distance from the closest wall or center, distance from a corner, distance traveled, and time spent in a perimeter or particular zone. Because only positional data are provided, further processing is necessary, but it is also a great advantage because it provides flexibility. The 29 Hz sampling rate generates many time points, and the average of these (the temporal mean) is calculated for each of the above measures over user-defined intervals. Importantly, the variance of the time point data is also calculated. It represents the within-individual temporal variance of the behavior.

The xy coordinates are in screen units and need to be converted to an experimenter-defined distance unit, e.g. cm or body lengths. Body lengths are commonly used in animal research, and particularly in fish research (Partridge, 1981). Subsequently, a new data frame can be constructed with the means of these time intervals or trials
for statistical analysis. Processing each data file individually is prohibitively time consuming, so an automated procedure for high throughput processing is a necessity. The analytical tool described below, developed in the R programming language to complement the tracking program, addresses this need.
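The unit conversion and interval summaries described in this section can be sketched in R as follows. This is a minimal illustration and not the authors' script library. The helper names screen_to_cm(), cm_to_body_lengths() and summarise_intervals() are hypothetical. The sketch assumes that the calibration ruler is given as two endpoints in screen units plus its physical length in cm, and that intervals are fixed-length blocks of consecutive frames.

```r
# Hypothetical helpers illustrating Section 2; not the authors' script library.

# Convert a distance in screen units to cm, given a calibration ruler whose
# two endpoints are known in screen units and whose true length is ruler_cm.
screen_to_cm <- function(d_screen, ruler_start, ruler_end, ruler_cm) {
  ruler_screen <- sqrt(sum((ruler_end - ruler_start)^2))  # ruler length in screen units
  d_screen * ruler_cm / ruler_screen
}

# Express a distance in body lengths instead of cm.
cm_to_body_lengths <- function(d_cm, body_length_cm) d_cm / body_length_cm

# Summarise a per-frame measure (e.g. distance to the closest wall) over
# consecutive intervals of interval_s seconds: the temporal mean and the
# within-individual temporal variance of the frames in each interval.
summarise_intervals <- function(x, fps = 29, interval_s = 60) {
  frames_per_interval <- fps * interval_s
  interval <- (seq_along(x) - 1) %/% frames_per_interval + 1
  data.frame(
    interval = sort(unique(interval)),
    mean     = as.vector(tapply(x, interval, mean, na.rm = TRUE)),
    variance = as.vector(tapply(x, interval, var, na.rm = TRUE))
  )
}
```

For an 8-min trial (13,920 frames at 29 frames/s), interval_s = 60 yields eight rows, one per minute, and interval_s = 480 yields a single whole-trial summary. One such row per trial can then be stacked into the data frame used for statistical analysis.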

3. Requirements

Batch automation of all script files applied to a selection of data files has been accomplished using the args package (Piipari, 2010). The R environment runs on both Macintosh and Windows operating systems. Basic knowledge of data processing in R is required for basic analysis.
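Applying one per-trial script to a whole folder of output files can be sketched in base R as follows. This sketch deliberately uses only list.files(), lapply() and write.csv(), not the args package cited above, whose interface is not reproduced here. The function batch_process() is hypothetical. process_trial stands for a user-supplied function that reads one .txt file and returns a one-row data frame of summary measures.

```r
# Minimal batch-processing sketch in base R (not the 'args' package workflow).
# process_trial: hypothetical user function mapping one file path to a
# one-row data frame of per-trial summary measures.
batch_process <- function(dir, process_trial, pattern = "\\.txt$",
                          out_file = NULL) {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  rows <- lapply(files, function(f) {
    res <- process_trial(f)
    res$file <- basename(f)  # record which trial each row came from
    res
  })
  summary_table <- do.call(rbind, rows)  # one row per trial
  # Optionally export the combined table for analysis in other software
  if (!is.null(out_file)) write.csv(summary_table, out_file, row.names = FALSE)
  summary_table
}
```

For example, batch_process("trials", my_trial_summary, out_file = "summary.csv") would apply a user-written my_trial_summary() to every .txt file in a folder named trials and write one row per trial to summary.csv. Both names are placeholders.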

3.1. Data processing workflow

Raw data files are in .txt format and contain the following variables:

• Tracking area (coordinate points)
• Calibration rulers x and y (coordinate points), and length (in cm or another measure defined by the user)
• Seconds
• FishX coordinates for each subject
• FishY coordinates for each subject
• Confidence (a confidence measure for detecting the subject in the trial)
• Ruler 1 and Ruler 2 calibrated positions for each subject

R scripts are written to process and analyze the obtained data in an automated manner.

3.2. Data pre-processing

The first R script applied to the data reshapes it for further analysis. ANOVAs and other statistical analyses can be performed in R on data in long format. The data obtained from the tracking program are already in long format, but the first six rows of the matrix contain calibration information. The original data sets can therefore be simplified first for easier processing and behavioral quantification.

The calibration information is transferred to a new column in the data matrix. For functions to be applied across the entire data matrix, the R script requires a complete rectangular matrix (every row of equal width and every column of equal length). All pertinent information held in the first rows of the output file is therefore reshaped to fit this requirement. In the interest of processing speed, non-essential information is removed from the new data matrix. All original data files are preserved, and the script outputs a newly reshaped data matrix.

First, the calibration information is read from each trial:

# Calibration values sit in fixed cells [row, column] of the header rows.
# as.character() before as.numeric() converts factor columns by their
# labels rather than by their internal integer codes.
tanksize <- as.numeric(as.character(trial[6,5]))
xa <- as.numeric(as.character(trial[4,1]))
xb <- as.numeric(as.character(trial[4,3]))
ya <- as.numeric(as.character(trial[4,2]))
yb <- as.numeric(as.character(trial[4,4]))

xa, xb, ya, and yb are standard arguments used across the script library to denote the x and y limits of the tank, and hence its corners (Fig. 1). In the data file, the calibration information is stored at specific locations, given as [row, column] in the square brackets, which are queried automatically using these arguments. This step normalizes the xy coordinates for the arena calibration. As a result, the same script can be applied to all data files even though the corners of the arena have slightly different xy coordinates in each trial (Fig. 1).

Fig. 1. Calibration of the testing arena. All calibration information, which is unique for each trial, is captured by the R scripts and transformed into a numeric object. This allows each script to be applied across all trials, despite differing calibration information.

Subsequently, the calibration information is skipped, the required columns of the data matrix are selected, and the new matrix is saved.

3.3. Calculation of inter-individual distances

After the pre-processing step, a new R script is applied to all matrices generated in the first step described above. In this script, the formula for the distance between each focal fish and each other fish in the arena is applied across the entire data table. This generates one new column of distance data for each of the 45 unique pairs among the ten focal fish in a trial (nine distances per fish).

The formula used follows from the Pythagorean theorem:

# Squared Euclidean distance between fish 1 and fish 2 at every time point,
# using the calibrated (ruler) coordinates
f1f2d2 <- (trial.sub$f2rx - trial.sub$f1rx)^2 + (trial.sub$f2ry - trial.sub$f1ry)^2
f1f2d <- sqrt(f1f2d2)

The code above computes the distance between fish 1 and fish 2. The script repeats this calculation for every other pair of focal fish, covering all unique distance combinations.

Distances are taken from the calibrated ruler information provided in the data matrix, and are therefore expressed in the unit in which the arena was calibrated. In each of our experiments, calibration was done in centimeters, but any distance measure may be used as desired by the user.

All vectors containing the distance of one focal fish to one other focal fish are combined into a new object called indiv.dist. The mean of each row is then calculated, giving the mean inter-individual distance at each time point sampled.

# One column per unique pair (45 columns for 10 fish); one row per time point
indiv.dist <- cbind(f1f2d, f1f3d, f1f4d, f1f5d, f1f6d, f1f7d, f1f8d,
  f1f9d, f1f10d, f2f3d, f2f4d, f2f5d, f2f6d, f2f7d, f2f8d, f2f9d, f2f10d,
  f3f4d, f3f5d, f3f6d, f3f7d, f3f8d, f3f9d, f3f10d, f4f5d, f4f6d, f4f7d,
  f4f8d, f4f9d, f4f10d, f5f6d, f5f7d, f5f8d, f5f9d, f5f10d, f6f7d, f6f8d,
  f6f9d, f6f10d, f7f8d, f7f9d, f7f10d, f8f9d, f8f10d, f9f10d)
# Row-wise mean = mean inter-individual distance at each time point
average.dist <- rowMeans(indiv.dist, na.rm = FALSE, dims = 1)

The new vector with the mean inter-individual distance is added to the individual distance vectors. Subsequently, this data frame is
merged with the original data matrix containing the XY coordinate data. A small matrix containing independent factor information (in our case the age, strain, treatment, and body length of the subjects) is then added to the data matrix.

The minimum of the inter-individual distances is extracted into a new vector, representing the minimum-neighbor distance. This differs from the nearest neighbor distance. The nearest neighbor distance is the average, across all focal fish, of each fish's smallest inter-individual distance. The minimum-neighbor distance is the single smallest value across all focal fish. Both measures are computed (Fig. 3).
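The inter-individual measures described above can also be computed for a shoal of any size without writing out every pair by hand. The following is a minimal vectorized sketch and not the authors' script. It assumes that the calibrated coordinates have been gathered into two numeric matrices x and y (rows = time points, columns = fish), and the function name shoal_distances() is hypothetical. For each time point it returns three values:
• the mean inter-individual distance (the counterpart of average.dist);
• the nearest neighbor distance (the mean over fish of each fish's closest-neighbor distance);
• the minimum-neighbor distance (the smallest pairwise distance in the shoal).

```r
# Hypothetical vectorized version of the pairwise-distance step.
# x, y: numeric matrices of calibrated coordinates, rows = time points,
# columns = fish (at least two fish).
shoal_distances <- function(x, y) {
  n_fish <- ncol(x)
  pairs <- combn(n_fish, 2)  # 2 x choose(n_fish, 2) matrix of fish indices
  i <- pairs[1, ]
  j <- pairs[2, ]
  # One column per unique pair: Euclidean distance at every time point
  d <- sqrt((x[, j, drop = FALSE] - x[, i, drop = FALSE])^2 +
            (y[, j, drop = FALSE] - y[, i, drop = FALSE])^2)
  # Nearest neighbor distance of each fish: minimum over the pairs it is in
  nn <- matrix(
    vapply(seq_len(n_fish),
           function(f) apply(d[, i == f | j == f, drop = FALSE], 1, min),
           numeric(nrow(d))),
    nrow = nrow(d)
  )
  data.frame(
    mean_iid = rowMeans(d),       # mean inter-individual distance
    nnd      = rowMeans(nn),      # mean of the per-fish nearest neighbor distances
    mnnd     = apply(d, 1, min)   # minimum-neighbor distance
  )
}
```

With ten fish, combn() generates the same 45 pairs that are listed by hand in indiv.dist. Because the pairs are generated rather than typed out, the same function also works for shoals of other sizes.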
Subsequently, the distance for each focal fish from each of the
walls of the arena is extracted using the raw xy coordinate data
matrix. At each time point measured, four new vectors are calcu-
lated for each of the subjects in the arena. Each vector represents
the distance of the focal fish from one of the walls. The minimum
of the row is taken for each fish, and summarized in a new vector.
The example code for fish 1 is as follows:

# Distance of fish 1 from each wall; xa, xb, ya, yb are the arena
# limits read from the calibration rows during pre-processing
f1.dist.xa <- (trial$f1.x - xa)
f1.dist.xb <- (xb - trial$f1.x)
f1.dist.ya <- (trial$f1.y - ya)
f1.dist.yb <- (yb - trial$f1.y)
f1.wall <- cbind(f1.dist.xa, f1.dist.xb, f1.dist.ya, f1.dist.yb)
# Row-wise minimum = distance to the closest wall at each time point
f1.min <- apply(f1.wall, 1, min)

After the script has been applied to a data file, entering f1.min at the R console returns the time series of the distance to the closest wall for fish 1.

Each vector is created using a collection of formulae and saved in R as an object. Any of these objects can be called for an individual data file, provided the entire script has been run on that file first.

Fig. 2. Explanation of the distance to the closest wall measurement. For each focal fish (red), the distances to each of the four walls of the arena are calculated from the individual's location on the XY coordinate grid. Each of these wall distances is saved to a separate vector. The minimum value of these four vectors is extracted into a fifth vector and used as the minimum wall distance. The original four vectors are discarded once the calculation has been completed across all time points sampled. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

3.4. Extracting data

The R library created consists of 20 separate scripts, customized for either single-subject or multi-subject data.

Data from each trial can be summarized. The measure of interest (e.g. inter-individual distance or nearest neighbor distance) can be called either for the complete trial or for time windows within each trial. These values can then be collected across trials into one data matrix for statistical analysis across treatment groups.

We created a customized script containing commands for each of the objects to be called from the library. This script can then be applied to an unlimited number of data files to quickly and efficiently process and summarize the data for all trials in an experiment. The same script can also export a file containing all statistical analyses, which automates the data processing and analysis steps of a research project. Because the program exports the summary tables for each measure, the experimenter can perform additional statistical analyses or import the tables into a standard statistical analysis package as desired (Figs. 2 and 3).

4. Discussion & future directions

High throughput screening is typically thought of in the context of simple observable characteristics in small model organisms (Lessman, 2011). With increasingly powerful behavioral paradigms and methodologies, however, screening is now becoming feasible even for complex behavioral phenotypes. Adult zebrafish are a small and highly social species (Buske and Gerlai, 2011a,b). The above methods suggest that high throughput screens for complex behaviors, including numerous parameters of group forming (shoaling) and exploratory behavior, are becoming a reality. Hardware and software solutions for screening with zebrafish larvae have become commercially available, and such screens are usually performed in 96-well plates (Redfern et al., 2008; Giacomotto and Ségalat, 2010; Schnorr et al., 2012). Adult zebrafish provide additional benefits not yet present in larvae, e.g. a more sophisticated behavioral repertoire (Norton and Bally-Cuif, 2010) that includes complex forms of social behavior such as shoaling (Buske and Gerlai, 2011a,b). We have previously shown that shoaling can be disrupted at the adult stage by a single embryonic insult with a teratogen such as ethanol (Fernandes and Gerlai, 2009; Buske and Gerlai, 2011a,b). Behavioral tests can provide insights into brain function and may, for example, help in analyzing the effects of early embryonic teratogenic insults (Schnorr et al., 2012). Utilizing adult zebrafish as a model to study altered brain function in a vertebrate is particularly attractive. However, when attempting to identify mutants or efficacious drugs from a library of compounds, high throughput methods are necessary (Zon and Peterson, 2005). Housing and testing adult zebrafish is economical, so the practical and financial bottleneck in high throughput studies lies in the extraction and analysis of data.

Previously, manual tracking techniques only partially automated the process of obtaining reliable and quantifiable data from behavioral recordings (Miller and Gerlai, 2007). With Big Data, the time required for the tracking itself and for the manual processing of the data in a spreadsheet program such as Microsoft Excel is prohibitive. Human error and bias are also of some concern with such approaches. Even when automated tracking software is used, the raw data generated still require substantial preparation and processing before downstream analyses. Furthermore, commercial automated tracking software is expensive, and acquiring sufficient licenses to run a high throughput operation is prohibitive for the average laboratory. These systems have so far proven inefficient at, or incapable of, tracking multiple subjects at the same time in the same arena, which severely limits zebrafish behavioral research. We address each of these
C. Buske, R. Gerlai / Journal of Neuroscience Methods 234 (2014) 66–72 71

Fig. 3. Explanation of minimum nearest neighbor (MNND) vs. nearest neighbor distance (NND). Each circle represents an individual (i.e. a fish). The Nearest Neighbor Distance
(NND) of the focal (red) individual is the distance between it and the closest other individual, represented by the red arrow. The NND is calculated as mean of all nearest
neighbor distances for each focal fish (B). The Minimum nearest neighbor distance only measures the minimum of all nearest neighbor distances for a sampled time point,
so only the smallest nearest neighbor distance is calculated as shown in (A). (For interpretation of the references to color in this figure legend, the reader is referred to the
web version of this article.)

limitations with a novel approach consisting of a custom tracking the analytics process do not need to be recoded, and the prover-
application and analytical tool developed in our laboratory. bial reinventing the wheel does not need to occur, when another
The tracking system described generates multi-subject track researcher has contributed an appropriate resource already.
data in an xy coordinate format. The raw data generated by The R is open source software, making it accessible to virtually any-
Real Fish Tracker can be now processed and analyzed so that often one, anywhere. The number of R users today is estimated to be in
complex behavioral measures are extracted and quantified in an the millions, across a variety of different disciplines and industries.
automated manner using the R language (Gentleman et al., 2004). Although R is a command line driven system, development of an
Using a command line interface (such as Terminal app on Macintosh R package containing many of the behavioral outputs discussed in
computers, and Windows Command Processor on PCs), the scripts this study will minimize the R-specific experience and knowledge
developed for the desired behavioral measures can be applied to required to operate the package, while maintaining the flexibility of
an unlimited number of data files within a project folder. Just as an the R environment for more experienced users. The popularity for
example, it is possible to process and analyze 1300 individual data the R language has grown dramatically in recent years. The R lan-
files (containing a total of 18 million rows and 28 columns of data) guage is an appropriate choice from a methodological perspective.
within approximately 3 h using a Macintosh computer running OSX In addition, with a growing number of users and the open source
10.8, with a 2.6 GHz Intel core i7 processor and 16 GB 1600 MHz nature of the platform, there are many opportunities for innovation
DDR3 of memory. It would take several weeks or even months for and collaboration.
an individual to process the same amount of data and perform all The behavioral outputs discussed in this article may be viewed
calculations, even when assisted with e.g. an Excel Macro for some as examples and starting points of a vast array of future possibilities.
of the tasks. Any behavior that can be quantified using xy positional data can be
At this time the library of R scripts we have developed for these extracted using these methods. One does not need to rely on pre-
analytical purposes exists as a separate set of script files. Moving set parameters that can be quantified, and instead the researchers
forward, it will be necessary to rewrite aspects of these for effi- has free reign of how in depth the analysis proceeds.
ciency and processing speed, and combine them together in the In addition to benefits of analytical flexibility, these methods
R package. This package can then easily be distributed to anyone are attractive to the large number of laboratories that are not able
who is interested through the Comprehensive R Archive Network to afford costly commercially available tracking systems. While the
(CRAN), and through any other means deemed appropriate (e.g. zebrafish is growing in popularity in behavioral neuroscience (Sison
potentially a dedicated website where both the tracking software et al., 2006; Gerlai, 2010), the aspects involving data acquisition
and the analytical package can be offered.) and processing in behavioral studies do not match the low cost and
At the time of writing the CRAN repository features 3971 avail- high throughput capabilities typical of research conducted with
able packages. Each of these has been contributed by a member of this species. Our tracking software, combined with basic R skills
the community, and made available to anyone else interested in for pre-made R scripts can make high powered analytics accessible
using the analytical tools contained within them. This illustrates at minimal cost. This eliminates one of the primary bottlenecks in
the breadth and activity of the R programming community, and high throughput screening of adult zebrafish, and high throughput
the R user community as a whole. It also emphasizes that steps in behavioral testing in general. These types of software applications
and analytical tools are not limited to zebrafish; they can be applied to other model organisms as well. We therefore hope that our methods will gain popularity across a broad spectrum of Big Data applications and will make analysis of results easier and more efficient.

References

Aguiar P, Medonça L, Galhardo V. OpenControl. A free opensource software for video tracking and automated control of behavioral mazes. J Neurosci Methods 2007;166(October (1)):66–72.
Ahmad F, Noldus L, Tegelenbosch R, Richardson M. Zebrafish embryos and larvae in behavioural assays. Behavior 2012;149(January (10–12)):1241–81.
Benjamini Y, Lipkind D, Horev G, Fonio E, Kafkafi N, Golani I. Ten ways to improve the quality of descriptions of whole-animal movement. Neurosci Behav Rev 2010;34(July (8)):1351–65.
Blaser R, Gerlai R. Behavioral phenotyping in zebrafish: comparison of three behavioral quantification methods. Behav Res Methods 2006;38(August (3)):456–69.
Brennan CH. Zebrafish behavioural assays of translational relevance for the study of psychiatric disease. IEEE 2014;22(1).
Buske C, Gerlai R. Early embryonic ethanol exposure impairs shoaling and the dopaminergic and serotoninergic systems in adult zebrafish. Neurotoxicol Teratol 2011a;33(November (6)):698–707.
Buske C, Gerlai R. Shoaling develops with age in Zebrafish (Danio rerio). Prog Neuro-Psychopharmacol Biol Psychiatr 2011b;35(August (6)):1409–15.
Buske C, Gerlai R. Maturation of shoaling behavior is accompanied by changes in the dopaminergic and serotoninergic systems in zebrafish. Dev Psychobiol 2012;54(January (1)):28–35.
Cachat J, Kyzar EJ, Collins C, Gaikwad S, Green J, Roth A, et al. Unique and potent effects of acute ibogaine on zebrafish: the developing utility of novel aquatic models for hallucinogenic drug research. Behav Brain Res 2013;236(January (1)):258–69.
Cario CL, Farrell TC, Milanese C, Burton EA. Automated measurement of zebrafish larval movement. J Physiol Physiol Soc 2011;589(August (15)):3703–8.
Dusenbery D. Using a microcomputer and video camera to simultaneously track 25 animals. Comput Biol Med 1985;15(4).
Echevarria DJ, Buske C, Toms CN, Jouandot DJ. A novel test battery to assess drug-induced changes in zebrafish social behavior. In: Zebrafish neurobehavioral protocols. Totowa, NJ: Humana Press; 2011. p. 109–24.
Fernandes Y, Gerlai R. Long-term behavioral changes in response to early developmental exposure to ethanol in Zebrafish. Alcohol Clin Exp Res 2009;33(4):601–9.
Gentleman R, Carey V, Bates D, Bolstad B, Dettling M, Dudoit S, et al. Bioconductor: open software development for computational biology and bioinformatics. Genom Biol 2004;5(10):R80.
Gerlai R. High-throughput behavioral screens: the first step towards finding genes involved in vertebrate brain function using zebrafish. Molecules 2010;15(April (4)):2609–22.
Gerlai R. Using zebrafish to unravel the genetics of complex brain disorders. Behav Neurogen 2012;12:3–24.
Gerlai R. Social behavior of zebrafish: from synthetic images to biological mechanisms of shoaling. J Neurosci Methods 2014.
Giacomotto J, Ségalat L. High-throughput screening and small animal models, where are we? Br J Pharmacol 2010;160(March (2)):204–16.
Kane A, Salierno J, Gipson G, Molteno T, Hunter C. A video-based movement analysis system to quantify behavioral stress responses of fish. Water Res 2004;38(November (18)):3993–4001.
Lessman C. The developing zebrafish (Danio rerio): a vertebrate model for high-throughput screening of chemical libraries. Birth Defects Res Part C: Embryo Today: Rev 2011;93(3):268–80.
Maaswinkel H, Le X, He L, Zhu L, Weng W. Dissociating the effects of habituation, black walls, buspirone and ethanol on anxiety-like behavioral responses in shoaling zebrafish. A 3D approach to social behavior. Pharmacol Biochem Behav 2013;108(July):16–27.
Mahabir S, Chatterjee D, Buske C, Gerlai R. Maturation of shoaling in two zebrafish strains: a behavioral and neurochemical analysis. Behav Brain Res 2013;247(June):1–8.
Marx V. Biology: the big challenges of big data. Nat Technol 2013;498(June (7453)):255–60.
Miller N, Gerlai R. Quantification of shoaling behaviour in zebrafish (Danio rerio). Behav Brain Res 2007;184(December (2)):157–66.
Miller N, Gerlai R. Shoaling in zebrafish: what we don’t know. IEEE 2011;22(1).
Noldus L, Spink AJ, Tegelenbosch RAJ. EthoVision. A versatile video tracking system for automation of behavioral experiments. Behav Res Methods Instrum Comput 2001;33(August (3)):398–414.
Norton W, Bally-Cuif L. Adult zebrafish as a model organism for behavioural genetics. BMC Neurosci 2010;11:90.
Partridge B. Internal dynamics and the interrelations of fish in schools. J Comp Physiol 1981;144(3):313–25.
Piato A, Rosemberg D, Capiotti K, Siebel A, Herrmann A, Ghisleni G, et al. Acute restraint stress in zebrafish: behavioral parameters and purinergic signaling. Neurochem Res 2011;36(October (10)):1876–86.
Piipari M. Inference and classification of eukaryotic cis-regulatory motifs; 2010. Available from ftp://anonymous@ftp.sanger.ac.uk/pub4/theses/mp4/abstract.pdf
Redfern W, Waldron G, Winter M, Butler P, Holbrook M, Wallis R, et al. Zebrafish assays as early safety pharmacology screens: paradigm shift or red herring? J Pharmacol Toxicol Methods 2008;58(September (2)):110–7.
Savio L, Vuaden F, Piato A, Bonan C, Wyse A. Behavioral changes induced by long-term proline exposure are reversed by antipsychotics in zebrafish. Prog Neuro-Psychopharmacol Biol Psychiatr 2012;36(March (2)):258–63.
Schnorr S, Steenbergen P, Richardson M, Champagne D. Measuring thigmotaxis in larval zebrafish. Behav Brain Res 2012;228(March (2)):367–74.
Sison M, Cawker J, Buske C, Gerlai R. Fishing for genes influencing vertebrate behavior: zebrafish making headway. Lab Anim (NY) 2006;35(May (5)):33–9.
Venables W, Smith D, R Core Team. An Introduction to R: Notes on R, a Programming Environment for Data Analysis and Graphics; 2012. http://cran.r-project.org/doc/manuals/R-intro.pdf
Wolfer D, Madani R, Valenti P, Lipp H. Extended analysis of path data from mutant mice using the public domain software Wintrack. Physiol Behav 2001;73(August (5)):745–53.
Zon L, Peterson R. In vivo drug discovery in the zebrafish. Nat Rev Drug Discov 2005;4(January (1)):35–44.
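The two group-cohesion measures defined in the Fig. 3 legend, NND and MNND, can be made concrete with a short sketch. The authors implemented their measures as R scripts, which are not reproduced in this article. The Python below only illustrates the definitions given in the legend. The function names (`nearest_neighbor_distances`, `nnd`, `mnnd`) and the input format (one list of (x, y) positions per sampled time point) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical input format: one (x, y) position per fish at a single sampled time point.
Point = Tuple[float, float]


def nearest_neighbor_distances(points: Sequence[Point]) -> List[float]:
    """Return, for each focal fish, the distance to its closest other fish."""
    if len(points) < 2:
        raise ValueError("at least two fish are needed to define a nearest neighbor")
    distances = []
    for i, (xi, yi) in enumerate(points):
        # Euclidean distance from focal fish i to every other fish; keep the smallest.
        closest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        distances.append(closest)
    return distances


def nnd(points: Sequence[Point]) -> float:
    """NND at one time point: mean of the nearest neighbor distances of all fish (Fig. 3B)."""
    distances = nearest_neighbor_distances(points)
    return sum(distances) / len(distances)


def mnnd(points: Sequence[Point]) -> float:
    """MNND at one time point: only the smallest nearest neighbor distance (Fig. 3A)."""
    return min(nearest_neighbor_distances(points))


if __name__ == "__main__":
    # Four fish at one time point (arbitrary units).
    frame = [(0.0, 0.0), (3.0, 4.0), (3.0, 5.0), (3.0, 9.0)]
    print(nearest_neighbor_distances(frame))  # [5.0, 1.0, 1.0, 4.0]
    print(nnd(frame))   # 2.75
    print(mnnd(frame))  # 1.0
```

The example shows how the two measures diverge. A single tight pair (distance 1.0) fixes the MNND, while the NND (2.75) also reflects the more distant fish. The pairwise search is quadratic in the number of fish, which is unproblematic for small shoals. For much larger groups, a spatial index such as a k-d tree would be one possible alternative.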

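The batch workflow described in the Discussion applies one script to every data file in a project folder and exports a summary table for further statistics. It might look roughly like the following sketch. This is not the authors' R code. It assumes a hypothetical CSV layout with `frame`, `fish_id`, `x` and `y` columns. The actual Real Fish Tracker output (28 columns in the example given in the text) is not specified in this article, so the column names, the folder layout and the `summarize_folder` function are all illustrative assumptions.

```python
import csv
import math
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def read_track(path: Path) -> Dict[int, List[Point]]:
    """Group positions by frame. ASSUMED (hypothetical) columns: frame, fish_id, x, y."""
    frames: Dict[int, List[Point]] = defaultdict(list)
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            frames[int(row["frame"])].append((float(row["x"]), float(row["y"])))
    return frames


def nn_distances(points: List[Point]) -> List[float]:
    """Distance from each fish to its closest other fish within one frame."""
    return [
        min(math.hypot(xi - xj, yi - yj) for j, (xj, yj) in enumerate(points) if j != i)
        for i, (xi, yi) in enumerate(points)
    ]


def summarize_folder(folder: Path, out_csv: Path) -> List[dict]:
    """Apply the same analysis to every *.csv in `folder`; write one summary row per file."""
    rows = []
    for path in sorted(Path(folder).glob("*.csv")):
        nnd_per_frame, mnnd_per_frame = [], []
        for points in read_track(path).values():
            if len(points) < 2:  # a nearest neighbor needs at least two fish
                continue
            d = nn_distances(points)
            nnd_per_frame.append(sum(d) / len(d))  # NND: mean nearest neighbor distance
            mnnd_per_frame.append(min(d))          # MNND: smallest nearest neighbor distance
        n = len(nnd_per_frame)
        rows.append({
            "file": path.name,
            "frames": n,
            "mean_nnd": sum(nnd_per_frame) / n if n else float("nan"),
            "mean_mnnd": sum(mnnd_per_frame) / n if n else float("nan"),
        })
    # Summary table that can be imported into any statistics package.
    with open(out_csv, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["file", "frames", "mean_nnd", "mean_mnnd"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

A call such as `summarize_folder(Path("project/tracks"), Path("project/summary.csv"))` (hypothetical paths) processes every trial in one pass. The summary file is written outside the input folder so that a rerun does not mistake it for a track file. Averaging the per-frame values into one number per trial is only one possible summary. The same loop could instead export per-frame values for time-course analyses.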