Capstone
Jake Horne, Kevin Yoshimoto, Luis Navarro, Michael Watson & Myles Lopez
Summer 2023
Executive Summary
The Monterey Bay Aquarium Research Institute (MBARI) has spent the past few decades collecting
data on many aspects of the ocean, from changing salinity levels to underwater topography and
temperature changes. In total, millions of raw data points have been collected, then
extrapolated and cleaned for use in a range of current projects. These projects examine the
overall trends occurring in our oceans, as well as relationships that are not immediately
apparent to the human eye but are readily found through machine learning models and data
extrapolation. Our goal as a team from CSUMB is to create widgets that make data visualization
more accessible to a human user, whether that be toggling for a certain time period,
From an end user perspective, this makes looking at different views much more user friendly:
a single click of a button creates the desired data visualization, rather than coding the
desired output explicitly each time. From an outcome perspective, our team is expected to
create a pull request with a Jupyter notebook that contains sample public data and uses the
newly developed widgets, demonstrating their capabilities and flexibility in visualizing data
that has already been cleaned. The aim is to give users an easier way to look at the data from
a human’s perspective and to see trends over time as well as anticipated projections based off
the data that has
Table of Contents
Introduction/Background
Project Name and Description
Problem and Issue With Technology
Solution to the problem and/or issue in technology
Environmental Scan/Literature Review
Stakeholders
Ethical Considerations
Legal Considerations
Project Goals and Objectives
Goals
Objectives
Final Deliverables
Approach/Methodology
Timeline/Resources
Milestones
Resources Needed
Platform
Risks and Dependencies
Risks
Dependencies
Testing Plan
Team Members
References
Introduction/Background
The organization that we are working in tandem with is MBARI, a non-profit oceanographic
research center that uses data to make inferences about oceanic changes and the impact that
these changes will have in the future. There is no specific project name associated with the
work that we will be doing; however, it will be used by the internal team that we are working
with directly. The goal is for the work to support the Dorado class sensor data processing
project and eventually to serve as a general function for other teams. This work is important
at MBARI not just from a direct oceanic perspective: since a large portion of the world’s
population is in direct contact with the oceans, the information provided by MBARI can be used
to assess the living conditions of these populations, for example the effects of rising sea
levels and decreasing pH due to
MBARI uses robots, both autonomous and remotely controlled, to collect information on salinity
and chlorophyll monthly, to take a few examples of the many projects currently being
conducted. The issue with the current situation at MBARI is that the number of data points
collected from these robots is much too large for scientists to process and analyze
Our goal as students at CSUMB is to give scientists on the MBARI team easier access to
visualizations of the data in a format that is more appealing to a human viewer, where the
intention of the visualization is apparent from the get-go. We will use JavaScript-backed
widgets in a Jupyter notebook to show data, providing readily available masks to quickly toggle
between different views of the data. The technology used for gathering data is constantly
being fine-tuned to provide increasing levels of precision; however, our scope does not deal
directly with the technology involved in collecting data, but rather with the user interface
for viewing data that has already been preprocessed, cleaned, and extrapolated. In their
current state, the masks in use take too long given the volume of data provided, so new masks
need to be created to alleviate the current
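To illustrate what a mask means here, the following is a minimal sketch in plain Python. The
record fields (activity_name, time, depth) follow the selectors named in this proposal, but
the Record class, the helper names, and the sample values are hypothetical and are not taken
from the MBARI code base. In the actual notebook the same idea would more likely be expressed
as boolean indexing on a pandas or xarray object.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Record:
    """One cleaned AUV sample (hypothetical layout, for illustration only)."""
    activity_name: str
    time: datetime
    depth: float      # metres below the surface
    salinity: float


def depth_mask(records, min_depth, max_depth):
    """Return one boolean per record: True where depth is inside the range."""
    return [min_depth <= r.depth <= max_depth for r in records]


def time_mask(records, start, end):
    """Return one boolean per record: True where time is inside the range."""
    return [start <= r.time <= end for r in records]


def combine(*masks):
    """AND several masks together element by element."""
    return [all(flags) for flags in zip(*masks)]


def apply_mask(records, mask):
    """Keep only the records whose mask entry is True."""
    return [r for r, keep in zip(records, mask) if keep]


# Made-up sample values, not real MBARI data.
records = [
    Record("mission_a", datetime(2023, 6, 1, 8, 0), 5.0, 33.4),
    Record("mission_a", datetime(2023, 6, 1, 9, 0), 40.0, 33.7),
    Record("mission_b", datetime(2023, 6, 2, 8, 0), 12.0, 33.5),
    Record("mission_b", datetime(2023, 6, 3, 8, 0), 80.0, 34.0),
]

mid_depth = depth_mask(records, 10.0, 50.0)
first_day = time_mask(records, datetime(2023, 6, 1), datetime(2023, 6, 1, 23, 59))
selected = apply_mask(records, combine(mid_depth, first_day))
```

Keeping each mask separate and combining them only at the end means that changing one
selector only requires rebuilding that one mask, which is one possible way to keep toggling
between views cheap.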
Environmental Scan/Literature Review
Mote Marine Laboratory in Florida deployed an AUV named “Genie” that is similar to the MBARI
vessel. This AUV’s main goal is to gather data useful for ocean observing and research. Genie
carries instruments that can monitor water temperature, depth, salinity, colored dissolved
organic matter, and turbidity. The vessel can also be used to monitor microscopic plant-like
organisms, including a toxic alga that causes Florida red tides and can be harmful to marine
life and people. Where this AUV differs from MBARI’s is the acoustic receiver that Genie
carries to detect fish that were tagged by researchers, which provides data on fish migration
patterns. Scientists at Mote Marine Laboratory believe that collecting this data can help
discover patterns in the movement of the toxic algae, which can then be used to mitigate the
problems it causes for the Florida population. Once researchers have data on the red tide
location, they send the information out to the public and to resource managers as soon as it
is available. This is one of the many benefits that AUV data collection can provide to the
public: access to helpful information when going to the beach or into the ocean. All the data
collected by Genie is sent to the GCOOS (Gulf of Mexico Coastal Ocean Observing System)
website and is used to populate an animated dashboard of the ocean currents, showing their
direction and speed. The dashboard also shows the locations of all the AUVs that are providing
data points to the website. This information reveals patterns in the ocean currents, which can
help engineers determine where to design and build structures like bridges, dams, and offshore
platforms. Shipping companies can also use this information to optimize shipping routes and
avoid areas with strong currents. Marine ecosystems are also affected by water currents, and
this data can help understand how they
Stakeholders
The stakeholders for this project are the team we are working with directly at MBARI, with the
long-term goal of providing a resource that is versatile and flexible enough to be used and
adapted further by other teams. The risk is minimal, apart from delivering functionality that
does not work to the full extent expected or that has limitations. However, if the
functionality is tested thoroughly and rigorously, the team can hope to gain widgets that
lessen the load of its current visualizations. This can help with creating easy-to-read
visualizations that let end users understand trends occurring in the ocean, using the data
collected from the robots
Ethical Considerations
There are no ethical considerations in question; if anything, the research done by MBARI
Legal Considerations
The only legal considerations are ensuring that countries acknowledge and accept the rationale
for robots gathering data in international waters, should they need to explore that far from
shore, and that the robots minimally affect the environment they are gathering data from.
Project Goals and Objectives
Our goals go beyond the actual deliverable that is due at the end of the capstone planning and
execution; they also encompass the nature of working in a team with a project outline and
deadline. Instead of a prefabricated outline and rubric describing what is expected, the only
information provided is the final deliverable, without the addition of the step-by-step
Goals
Objectives
can be utilized for overall into manageable and tangible meticulous planning and
succinctly as complete or
Final Deliverables
Our project has the tangible goal of creating widgets that enable interactive data
visualization for MBARI AUV data. The widgets currently in use take seconds to fully process,
whereas our goal centers on giving end users a more responsive interface that shows data
visualizations for specified parameters much more quickly, giving the user a seamless
experience. This is similar to an existing project, STOQS, whose data visualization is still
too slow for consumers and as such is only used by the engineers working directly with the
data, not the casual bystander. We hope to bridge this gap by providing widgets that allow
selection by activity_name, time, and depth specifically, and by creating a pull request that
consists of a notebook with public example data demonstrating this functionality. This pull
request will be accepted and merged by Mr. McCann directly, if he sees it as a valuable feature to
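As a rough sketch of what such widgets could look like, the code below builds the three
selectors with ipywidgets. It assumes the ipywidgets package is installed in the notebook
environment. The function names, the labels, and the placeholder callback are hypothetical;
the callback only prints the current selection and stands in for the existing notebook's
filter-and-plot step, which is not reproduced here.

```python
def selection_summary(activity_name, depth_range, time_range):
    """Describe the current selection; stands in for the real filter-and-plot step."""
    lo, hi = depth_range
    start, end = time_range
    return f"{activity_name}: depth {lo:g}-{hi:g} m, {start} to {end}"


def build_selectors(activity_names, max_depth, time_labels):
    """Create the three selection widgets (requires ipywidgets, imported lazily)."""
    import ipywidgets as widgets

    activity = widgets.Dropdown(options=activity_names, description="Activity")
    depth = widgets.FloatRangeSlider(
        value=[0.0, max_depth], min=0.0, max=max_depth, step=1.0,
        description="Depth (m)",
    )
    time = widgets.SelectionRangeSlider(
        options=time_labels, index=(0, len(time_labels) - 1),
        description="Time",
    )
    return activity, depth, time


def show_selectors(activity_names, max_depth, time_labels):
    """Display the widgets and re-run the callback whenever one of them changes."""
    import ipywidgets as widgets
    from IPython.display import display

    activity, depth, time = build_selectors(activity_names, max_depth, time_labels)

    def on_change(activity_name, depth_range, time_range):
        print(selection_summary(activity_name, depth_range, time_range))

    out = widgets.interactive_output(
        on_change,
        {"activity_name": activity, "depth_range": depth, "time_range": time},
    )
    display(widgets.VBox([activity, depth, time, out]))
```

In a notebook cell, a call such as show_selectors(["mission_a", "mission_b"], 100.0,
["2023-06-01", "2023-06-02", "2023-06-03"]) would render a drop-down and two range sliders,
with made-up values used here purely for demonstration.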
Approach/Methodology
As a team, we hope to hold weekly meetings rather than the daily standups typical of an Agile
workflow (a choice mainly dictated by differing work schedules), and to assign points and
stories based on the bandwidth of each team member. If information overlaps in a way that
benefits multiple stories, the team members working on those stories will work together to
troubleshoot and develop, either in real time or through messaging. The preliminary stage of
the project will involve planning as well as getting familiar with the resources required for
the project, which includes making sure all team members have all software configured the same
way so that integration is seamless. All information relating to data that needs to be
extrapolated and cleaned will be provided by Professor McCann. Any additional research needed
on the creation of widgets can be drawn from the sample he provided, which is based on other
configurations, as well as from searching the web for anything we deem applicable. Once all
research has been conducted, the stories will be separated by functionality (widgets for
activity_name, time, depth) and further divided into substories focused on creation
main development branch once all necessary checks have passed and have been approved by the
team as a whole.
Timeline/Resources
Detailed Timeline
Week 1 and 2: Setup of Brew, Anaconda, and Poetry, become familiarized with the software and
make sure compatibility of software is seamless among members
Week 3 and 4: Begin design of widgets based off SME suggestion and start documentation of
preplanning and initial stages
Week 5 and 6: Implement changes based on design and design review at the end of Week 4,
meeting twice a week for status updates regarding timeline, do testing for each widget
separately throughout. Document any results as well as testing results.
Week 7: Integrate all widgets into a main dev branch and check for compatibility and any
residual issues before submitting for a pull request for Professor McCann.
Milestones
Resources Needed
No main resources needed besides publicly provided data and personal computers.
Platform
The software used to create the AUV data processing application consists only of Python and
sets of tools for it. Python is the main driver of the application, whose purpose is
processing large amounts of data. Python has a large collection of libraries and frameworks
that are useful for data manipulation, transformation, and analysis. For our capstone we are
writing a Jupyter Notebook file that processes large amounts of data collected by an AUV and
filters the data based on the user’s preference, which makes Python a well-suited language for
the project. We are using a Jupyter Notebook file because it lets us combine code with rich
text, allowing us to write our script and explain to the user how to use it. Anaconda is the
platform that we will be using to write the notebook file. It consists of a package manager,
some pre-compiled packages, and other tools that make it easier to work with Python. As a
team, we found that Anaconda provides a robust foundation for working with Python. Overall,
Anaconda is the platform on which we write the Jupyter Notebook file that manipulates the data
shown to a user based on what they would like displayed. Because our capstone is to create a
script that works with a project that has already been created by our client, we are also
using Poetry to manage the dependencies necessary for our project. With Poetry, our team can
inherit all the dependencies that the existing project has imported and will be able to jump
right into working on the script without
Risks and Dependencies
The risks and dependencies of the project are not all known, because we do not yet have access
to the Jupyter notebook that the client will provide. Based on our meetings with the client,
that notebook will determine the project’s schedule and division of labor. However, the main
risks for the project are known. These risks are few, and this is mainly
Risks
The risks to the success of this project are few due to the nature of the project’s scope. The
most concerning risk is the limited ability to test adequately against the totality of the
dataset. Because the project involves a vast number of data points, the code will have to be
tested consistently on small sections of the dataset. This limits our ability to establish
adequate bounds on the behavior we can predict before we are able to test on the entire
dataset. Another risk is that the parts of the project may not merge together cleanly near the
end of development. Because individual team members will be focusing on specific features and
parts of the project, bringing these parts together may prove problematic. We intend for each
section of the project to work together in a single Jupyter notebook, so there is substantial
risk that these sections prove incompatible and require significant tuning to work correctly.
One risk we are avoiding is corruption of the dataset, since we will not be altering any data
directly in the database.
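One way to work within the small-sections constraint is to time the filtering code on
progressively larger slices of the data and watch how the cost grows before committing to a
full run. The sketch below uses only the standard library; the filter function, the synthetic
depth values, and the slice sizes are placeholders chosen for illustration and do not come
from the project.

```python
import random
import time


def filter_by_depth(depths, min_depth, max_depth):
    """Placeholder filter: keep the depths that fall inside the range."""
    return [d for d in depths if min_depth <= d <= max_depth]


def time_on_subsets(depths, sizes, min_depth, max_depth):
    """Time the filter on the first n values for each n in sizes.

    Returns a list of (n, seconds, kept) tuples.
    """
    results = []
    for n in sizes:
        subset = depths[:n]
        started = time.perf_counter()
        kept = filter_by_depth(subset, min_depth, max_depth)
        elapsed = time.perf_counter() - started
        results.append((len(subset), elapsed, len(kept)))
    return results


# Synthetic depths with a fixed seed so that repeated runs use the same values.
rng = random.Random(0)
depths = [rng.uniform(0.0, 200.0) for _ in range(20_000)]

timings = time_on_subsets(depths, [1_000, 5_000, 20_000], 10.0, 50.0)
for n, seconds, kept in timings:
    print(f"{n:>6} points: {seconds:.6f} s, {kept} kept")
```

If the measured time grows roughly in proportion to the slice size, that gives a first
estimate of the cost on the full dataset, although real data may behave differently from
these synthetic values.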
Dependencies
When it comes to dependencies, there are only two parts of the project that depend on prior
parts. We will start from the existing Jupyter notebook, to which we will gain access at the
beginning of the project. This notebook will be our main dependency because it is the
foundation on which the rest of the project will be based. Once we are able to view this
notebook, the team will split up to work on specific features. The next dependency is joining
all of the features towards the end of the project: the project depends on each team member’s
work being combined into a single Jupyter notebook. Another dependency of note is access to
the dataset provided by MBARI, which is publicly available. This project will follow the agile
methodology, and as such the dependencies of the project are not known beyond these two until
we have all information
Testing Plan
To deliver a reliable, user-friendly, and performant solution for exploring and visualizing
the extensive AUV data archive, our testing plan takes a comprehensive approach to ensuring
the quality and effectiveness of the Jupyter Notebook and the newly developed data selection
widgets. We will begin with unit testing, verifying the correctness of each function and
module individually. Integration testing will follow, focusing on the interaction between
different components to identify any potential conflicts or issues. The next phase involves
functional testing, where we will validate the functionality of the data selection widgets.
This will include testing the drop-down selector for activity__name, range selectors for depth
and time, and their integration with the existing biplot() function. We will explore various
combinations of selections to ensure accurate data filtering and visualization, and to verify
that the generated plots align with the selected criteria. In addition to functional testing,
we recognize the importance of usability testing. We will engage users and domain experts who
are familiar with AUV data and its use cases. Through usability testing, we will gather
feedback on the user interface, the intuitiveness of the data selection widgets, and the
overall user experience. This feedback will guide us in refining the user interaction and
interface design. Further, performance testing to evaluate handling of large datasets, and
rigorous testing of error handling and edge cases, will be considered. Thorough documentation
review and peer review will be conducted to ensure clarity, accuracy, and usability.
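The unit-testing and edge-case steps above can be sketched with Python's built-in unittest
module. The select_depths function below is a hypothetical stand-in for one piece of widget
logic, not the project's actual code, and the choice to reject an inverted range with
ValueError is an assumption made for the example.

```python
import unittest


def select_depths(depths, min_depth, max_depth):
    """Return the depths inside [min_depth, max_depth], both ends included."""
    if min_depth > max_depth:
        raise ValueError("min_depth must not exceed max_depth")
    return [d for d in depths if min_depth <= d <= max_depth]


class SelectDepthsTests(unittest.TestCase):
    def test_keeps_values_inside_range(self):
        self.assertEqual(select_depths([5.0, 20.0, 80.0], 10.0, 50.0), [20.0])

    def test_bounds_are_inclusive(self):
        self.assertEqual(select_depths([10.0, 50.0], 10.0, 50.0), [10.0, 50.0])

    def test_empty_input_gives_empty_output(self):
        self.assertEqual(select_depths([], 0.0, 100.0), [])

    def test_inverted_range_is_rejected(self):
        with self.assertRaises(ValueError):
            select_depths([1.0], 50.0, 10.0)


def run_tests():
    """Run the test case and return True if every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SelectDepthsTests)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

Calling run_tests() from a notebook cell runs the suite without stopping the kernel, which
suits a project that lives in a Jupyter notebook.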
Team Members
Kevin Yoshimoto - Project manager (team leader). Creation of project plan and delegation of
Jake Horne - Widget creation. Creation of widget drop-down selector for activity_name, range
Michael Watson - Widget creation. Creation of widget drop-down selector for activity_name,
Luis Navarro - Stress testing. Comprehensive testing to ensure the accurate filtering and
visualization of data, as well as the alignment of generated plots with the selected criteria.
This testing will involve the drop-down selector for activity names, the range selectors for
depth and time, and their integration with the existing biplot() function.
Myles Lopez - Stress testing. Comprehensive testing to ensure the accurate filtering and
visualization of data, as well as the alignment of generated plots with the selected criteria.
This testing will involve the drop-down selector for activity names, the range selectors for
depth and time, and their integration with the existing biplot() function.
References
https://www.mbari.org/technology/seafloor-mapping-auv/
Rutger, H. (2015, November 10). New underwater robot “Genie” deployed to monitor harmful algae
and more.
https://mote.org/news/article/new-underwater-robot-genie-deployed-to-monitor-harmful-algae-and-more