
Maddie Gamache 1/4/2022

1.4 Sourcing the Right Data

Population Data by Geography - US Census


Data Source
This is an external data source from the US Census Bureau. This data source should be
considered trustworthy since it is from a government agency.

Data Collection
The US Census Bureau collects data from many different sources including federal, state, and
local governments. This would be considered administrative data. Some data is collected from
respondents directly through censuses and surveys conducted by the US Census Bureau. This
would be considered survey data. The US Census Bureau combines administrative data with
survey data.
https://www.census.gov/about/what/admin-data.html

Data Contents
This data set contains US population statistics by geographical location. It is further broken
down by county & state, year, total population, male total population, female total population,
and age group (per 5 years).

Data Limitations
Since this dataset was collected manually, it may be prone to human bias, typographical errors,
collection errors, and inaccuracies. Since this dataset covers the years 2009-2017, it is
also important to consider that data collection has not been consistent in each county
through the relevant years. The age group numbers are estimates and may not add up to the
total population.
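
Since the age group estimates may not add up to the total, a quick check can show how large the gap is for each record. This is only a minimal sketch: the field names (total_population, age_groups) and the sample numbers are made up for illustration and are not the actual columns or values in the Census file.

```python
# Hypothetical record layout; these field names and numbers are
# illustrative assumptions, not the real Census columns or data.
def age_group_gap(record):
    """Return total population minus the sum of the age-group estimates."""
    return record["total_population"] - sum(record["age_groups"].values())


sample = {
    "county": "Example County",
    "total_population": 1000,
    "age_groups": {"under_5": 60, "5_to_64": 780, "65_plus": 150},
}

gap = age_group_gap(sample)
print(f"{sample['county']}: age groups differ from total by {gap}")
```

A gap of zero means the estimates match the total; any other value shows how far off they are for that record.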

Data Relevance
Project Objective: Provide insight into influenza trends and establish a proactive plan to help
hospitals and clinics across the country with temporary staffing assistance during peak influenza
outbreaks, when additional staffing is in high demand.
Hypothesis: If a geographical location has a high percentage of vulnerable populations and a
low percentage of population vaccinated against influenza, then those locations are at higher
risk of developing a high influenza-related mortality rate and will require additional staffing
relief.

This data set is relevant because it gives us the most accurate look at total populations broken
down by geographical location and age group. This will allow us to find the most densely
populated areas in the US with the highest vulnerable populations by age group (under 5 and
over 65 years). This data set does not identify individuals with chronic medical conditions as
vulnerable populations. We can also cross-reference this data set with vaccination records to
find out which locations will potentially need the highest priority for staffing assistance.
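
The vulnerable-population share described above could be calculated along these lines. This is a rough sketch under stated assumptions: the field names, the age group labels (under_5, 65_plus), and the two sample locations are hypothetical and would need to be mapped to the real 5-year age columns in the Census file.

```python
# Age groups treated as vulnerable in this report: under 5 and 65 and over.
# The labels below are hypothetical stand-ins for the real Census columns.
VULNERABLE_GROUPS = ("under_5", "65_plus")


def vulnerable_share(record):
    """Return the fraction of a location's population in the vulnerable groups."""
    total = record["total_population"]
    if total == 0:
        return 0.0  # avoid dividing by zero for empty records
    vulnerable = sum(record["age_groups"][group] for group in VULNERABLE_GROUPS)
    return vulnerable / total


# Made-up sample locations, for illustration only.
records = [
    {"state": "A", "total_population": 1000,
     "age_groups": {"under_5": 50, "5_to_64": 800, "65_plus": 150}},
    {"state": "B", "total_population": 2000,
     "age_groups": {"under_5": 100, "5_to_64": 1700, "65_plus": 200}},
]

# Rank locations from highest to lowest vulnerable share.
ranked = sorted(records, key=vulnerable_share, reverse=True)
for record in ranked:
    print(record["state"], round(vulnerable_share(record), 3))
```

Using a share instead of a raw count keeps small locations with a high proportion of vulnerable residents from being hidden by larger ones.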

CDC Influenza Laboratory Tests and Patient Visits
Data Source
These two data sets (separated by lab tests and patient visits) are both external data sources
that come from the CDC. These should be considered trustworthy as the CDC is a government
agency and is aware that the data sets will be used by many different organizations.

Data Collection
Both datasets are considered survey data and are updated on a weekly basis. Information is
reported by voluntary participants from state, local, and territorial health departments, public
health and clinical laboratories, vital statistics offices, health care providers, hospitals, clinics,
emergency departments, and long-term care facilities.

Data Contents
Laboratory Tests: Categorized by region, year (2010-2015), week, total specimens collected,
percent positive for influenza, and type of influenza virus detected.
Patient Visits: Categorized by region, year (2010-2019), week, weighted and unweighted
percentage of influenza-like illness (ILI), age groups, total ILI, number of providers, and total
patients.

Data Limitations
Since the data is reported by voluntary participants, it is prone to collection errors,
inaccuracy, and inconsistent information. The datasets do not cover the same years, and neither
covers a long enough period to accurately predict patterns in flu activity. There are no
records of different age groups, so we are unable to differentiate vulnerable populations from
the general population.
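
To see how much the timelines actually overlap, the year ranges can be compared directly. The sketch below uses the year ranges stated in this report (Laboratory Tests 2010-2015, Patient Visits 2010-2019, Census population 2009-2017); the function name is only an illustrative choice.

```python
def overlapping_years(*ranges):
    """Return the years covered by every (first_year, last_year) range."""
    start = max(first for first, _ in ranges)
    end = min(last for _, last in ranges)
    return list(range(start, end + 1)) if start <= end else []


# Year ranges as stated in this report.
lab_tests = (2010, 2015)
patient_visits = (2010, 2019)
census_population = (2009, 2017)

common = overlapping_years(lab_tests, patient_visits, census_population)
print(common)
```

Only the years in the overlap can be compared across all three data sets without gaps.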

Data Relevance
Project Objective: Provide insight into influenza trends and establish a proactive plan to help
hospitals and clinics across the country with temporary staffing assistance during peak influenza
outbreaks, when additional staffing is in high demand.
Hypothesis: If a geographical location has a high percentage of vulnerable populations and a
low percentage of population vaccinated against influenza, then those locations are at higher
risk of developing a high influenza-related mortality rate and will require additional staffing
relief.

These data sets provide valuable insights on positive influenza tests, type of virus detected, and
number of patients visiting health care facilities for influenza-like illness, but they are not
necessarily relevant to the hypothesis due to their limitations. The Laboratory Tests data set could
be used to identify geographical locations with the highest percentage of positive test results in
the years 2010-2015, which may be useful if cross-referenced with another dataset.

NIS Child Flu Shots
Data Source
This dataset is from the National Immunization Survey (NIS), owned by the CDC. This should be considered
trustworthy as the CDC is a government agency and is aware that the data sets will be used by
many different organizations.

Data Collection
This data was collected manually through telephone surveys monitoring flu vaccination in
children and adolescents (6 months – 17 years old). This is considered survey data.

Data Contents
This dataset contains flu vaccination data for children and adolescents. It is categorized by year
(2017), age group, family demographics, state, and insurance coverage and history.

Data Limitations
This dataset only looks at flu vaccinations in children and adolescents and not adult or elderly
populations. It also includes only one year of data (2017). The data was collected through
phone surveys and could easily be prone to error from respondents who misreport or who are
reluctant to share information.

Data Relevance
Project Objective: Provide insight into influenza trends and establish a proactive plan to help
hospitals and clinics across the country with temporary staffing assistance during peak influenza
outbreaks, when additional staffing is in high demand.
Hypothesis: If a geographical location has a high percentage of vulnerable populations and a
low percentage of population vaccinated against influenza, then those locations are at higher
risk of developing a high influenza-related mortality rate and will require additional staffing
relief.
This dataset provides insight on vaccination rates among children under the age of 5
(vulnerable population) and could help determine geographical locations with low percentages
of population vaccinated against influenza.
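
One way to combine this dataset with the population data is to flag locations that meet both parts of the hypothesis: a high share of vulnerable population and a low vaccination rate. The sketch below is an assumption-heavy illustration: the state labels, the numbers, and both thresholds are made up, and the vaccination rates would only reflect children and adolescents in 2017.

```python
def high_risk_states(vulnerable_share, vaccination_rate,
                     min_vulnerable, max_vaccinated):
    """Return states present in both inputs that have a high vulnerable
    share and a low vaccination rate, in alphabetical order."""
    shared = vulnerable_share.keys() & vaccination_rate.keys()
    return sorted(
        state for state in shared
        if vulnerable_share[state] >= min_vulnerable
        and vaccination_rate[state] <= max_vaccinated
    )


# Made-up values for illustration; not taken from the Census or NIS data.
vulnerable = {"A": 0.22, "B": 0.15, "C": 0.25}
vaccinated = {"A": 0.40, "B": 0.35, "C": 0.70, "D": 0.30}

# Hypothetical thresholds; real cutoffs would need to be justified.
flagged = high_risk_states(vulnerable, vaccinated,
                           min_vulnerable=0.20, max_vaccinated=0.50)
print(flagged)
```

States that appear in only one of the two data sets are skipped, since both values are needed to test the hypothesis.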
