
TSlab1_revathy.

revak

2024-05-19
##1> Set Up

library(fpp3)

## Warning: package 'fpp3' was built under R version 4.3.3

## ── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──

## ✔ tibble 3.2.1 ✔ tsibble 1.1.4
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.0 ✔ feasts 0.3.2
## ✔ lubridate 1.9.3 ✔ fable 0.3.4
## ✔ ggplot2 3.5.1 ✔ fabletools 0.4.2

## Warning: package 'ggplot2' was built under R version 4.3.3

## Warning: package 'tsibble' was built under R version 4.3.3

## Warning: package 'tsibbledata' was built under R version 4.3.3

## Warning: package 'feasts' was built under R version 4.3.3

## Warning: package 'fabletools' was built under R version 4.3.3

## Warning: package 'fable' was built under R version 4.3.3

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.5
## ✔ purrr 1.0.2 ✔ stringr 1.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

##2> Download the file


##The path to the downloaded file is "C:\Users\revak\Downloads\jobs.csv".
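The path above is written with Windows backslashes, while the `read_csv()` call in the next step uses forward slashes. In an R string literal a backslash starts an escape sequence, so `"C:\Users\..."` cannot be pasted into code as-is (`\U` is read as the start of a Unicode escape). The sketch below, with illustrative variable names `win_path`, `fwd_path` and `raw_path`, shows three spellings that name the same file; raw strings assume R 4.0.0 or later.

```r
# Three equivalent spellings of the same Windows path in R source code.
# A bare "C:\Users\..." literal would fail to parse: "\U" starts a Unicode escape.
win_path <- "C:\\Users\\revak\\Downloads\\jobs.csv"  # escaped backslashes
fwd_path <- "C:/Users/revak/Downloads/jobs.csv"      # forward slashes, as in read_csv()
raw_path <- r"(C:\Users\revak\Downloads\jobs.csv)"   # raw string literal (R >= 4.0.0)

# The escaped and raw forms produce identical strings
identical(win_path, raw_path)
#> [1] TRUE

# Replacing each backslash with a forward slash gives the form used in the lab code
converted <- gsub("\\", "/", win_path, fixed = TRUE)
identical(converted, fwd_path)
#> [1] TRUE

# Optional guard before reading; the file only exists on the author's machine
if (!file.exists(fwd_path)) message("jobs.csv not found at ", fwd_path)
```

Forward slashes are usually the most portable choice, since they also work on macOS and Linux.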

##3> Load the dataset


jobs <- read_csv("C:/Users/revak/Downloads/jobs.csv")

## Rows: 19 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): North American Industry Classification System (NAICS) 3
## dbl (7): 2002, 2003, 2004, 2005, 2006, 2007, 2008
## num (13): 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, ...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# View the first few rows and column names
head(jobs)

## # A tibble: 6 × 21
## North American Indus…¹ `2002` `2003` `2004` `2005` `2006` `2007` `2008` `2009`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Forestry, logging and… 7941 8101 7966 7666 7205 6332 5544 4406
## 2 Mining, quarrying, an… 17184 17256 18975 19338 20416 23222 28611 23251
## 3 Utilities 43723 43280 46111 44619 45578 46787 46434 45569
## 4 Construction 213684 221792 227238 233124 244687 258014 271446 259994
## 5 Manufacturing 903171 896256 875269 854352 835430 799984 753952 654947
## 6 Transportation and wa… 209937 213273 215467 221099 227585 232039 233812 229055
## # ℹ abbreviated name:
## # ¹`North American Industry Classification System (NAICS) 3`
## # ℹ 12 more variables: `2010` <dbl>, `2011` <dbl>, `2012` <dbl>, `2013` <dbl>,
## # `2014` <dbl>, `2015` <dbl>, `2016` <dbl>, `2017` <dbl>, `2018` <dbl>,
## # `2019` <dbl>, `2020` <dbl>, `2021` <dbl>

colnames(jobs)

## [1] "North American Industry Classification System (NAICS) 3"
## [2] "2002"
## [3] "2003"
## [4] "2004"
## [5] "2005"
## [6] "2006"
## [7] "2007"
## [8] "2008"
## [9] "2009"
## [10] "2010"
## [11] "2011"
## [12] "2012"
## [13] "2013"
## [14] "2014"
## [15] "2015"
## [16] "2016"
## [17] "2017"
## [18] "2018"
## [19] "2019"
## [20] "2020"
## [21] "2021"

###Lab Steps
##Step 1: Data Cleaning
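jobs <- jobs %>%
  # Reshape from wide (one column per year) to long (one row per industry-year)
  pivot_longer(cols = `2002`:`2021`, names_to = "year", values_to = "value") %>%
  # Year labels arrive as character; convert to integer for a yearly index
  mutate(year = as.integer(year)) %>%
  # Declare a tsibble: one series per industry, indexed by year
  as_tsibble(key = `North American Industry Classification System (NAICS) 3`,
             index = year)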
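`pivot_longer()` turns the 20 year columns into two columns, `year` and `value`, so each of the 19 industries contributes 20 rows and the tsibble has 19 × 20 = 380 rows. As a rough illustration of that reshape (not the lab's method, which uses tidyr), the base-R sketch below applies it to a small excerpt copied from the `head(jobs)` output. The names `wide`, `long` and `industry` are hypothetical.

```r
# Hypothetical excerpt of the wide table: one row per industry, one column per year
wide <- data.frame(
  industry = c("Utilities", "Construction"),
  `2002` = c(43723, 213684),
  `2003` = c(43280, 221792),
  `2004` = c(46111, 227238),
  check.names = FALSE  # keep the year labels as column names
)

year_cols <- setdiff(names(wide), "industry")

# Stack the year columns: unlist() walks the columns in order, so the industry
# vector is recycled once per year column and each year label repeats per industry
long <- data.frame(
  industry = rep(wide$industry, times = length(year_cols)),
  year     = as.integer(rep(year_cols, each = nrow(wide))),
  value    = unlist(wide[year_cols], use.names = FALSE)
)

# Sort by series, then by time, similar to how a keyed tsibble is displayed
long <- long[order(long$industry, long$year), ]
rownames(long) <- NULL
print(long)
```

The long layout is what `as_tsibble()` needs: one column identifying the series (the key), one for time (the index) and one for the measurement.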

# View the cleaned dataset
print(jobs)

## # A tsibble: 380 x 3 [1Y]
## # Key: North American Industry Classification System (NAICS) 3 [19]
## `North American Industry Classification System (NAICS) 3` year value
## <chr> <int> <dbl>
## 1 Accommodation and food services 2002 346025
## 2 Accommodation and food services 2003 344845
## 3 Accommodation and food services 2004 342658
## 4 Accommodation and food services 2005 340312
## 5 Accommodation and food services 2006 350512
## 6 Accommodation and food services 2007 366718
## 7 Accommodation and food services 2008 383017
## 8 Accommodation and food services 2009 376547
## 9 Accommodation and food services 2010 377721
## 10 Accommodation and food services 2011 391459
## # ℹ 370 more rows

##Step 2: Filter Data for Arts, Entertainment, and Recreation


arts <- jobs %>%
  # Keep only the rows for this industry (the label must match exactly)
  filter(`North American Industry Classification System (NAICS) 3` ==
           "Arts, entertainment and recreation")

# View the filtered data
print(arts)

## # A tsibble: 20 x 3 [1Y]
## # Key: North American Industry Classification System (NAICS) 3 [1]
## `North American Industry Classification System (NAICS) 3` year value
## <chr> <int> <dbl>
## 1 Arts, entertainment and recreation 2002 83730
## 2 Arts, entertainment and recreation 2003 87619
## 3 Arts, entertainment and recreation 2004 90838
## 4 Arts, entertainment and recreation 2005 90790
## 5 Arts, entertainment and recreation 2006 91562
## 6 Arts, entertainment and recreation 2007 93434
## 7 Arts, entertainment and recreation 2008 94049
## 8 Arts, entertainment and recreation 2009 94024
## 9 Arts, entertainment and recreation 2010 91980
## 10 Arts, entertainment and recreation 2011 92515
## 11 Arts, entertainment and recreation 2012 93126
## 12 Arts, entertainment and recreation 2013 92469
## 13 Arts, entertainment and recreation 2014 94026
## 14 Arts, entertainment and recreation 2015 101652
## 15 Arts, entertainment and recreation 2016 112399
## 16 Arts, entertainment and recreation 2017 114862
## 17 Arts, entertainment and recreation 2018 114857
## 18 Arts, entertainment and recreation 2019 119120
## 19 Arts, entertainment and recreation 2020 84254
## 20 Arts, entertainment and recreation 2021 80992

###This code filters the cleaned jobs tsibble to create a new tsibble, arts, containing only the "Arts, entertainment and recreation" industry. It keeps the rows whose industry name (the "North American Industry Classification System (NAICS) 3" column) exactly matches that label. The resulting arts tsibble has 20 rows, one per year from 2002 to 2021.
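Because `filter()` with `==` keeps only rows where the label matches character for character, the category string has to be typed exactly as it appears in the data and on a single line. A minimal base-R sketch with a hypothetical vector of labels: a string literal that wraps onto a second line silently contains a newline and matches nothing.

```r
# Hypothetical sample of industry labels
industries <- c("Utilities", "Arts, entertainment and recreation", "Construction")

target  <- "Arts, entertainment and recreation"
wrapped <- "Arts,
entertainment and recreation"  # the line break becomes part of the string

sum(industries == target)   # exact match: one row kept
sum(industries == wrapped)  # embedded newline: no rows kept
grepl("\n", wrapped, fixed = TRUE)
```

Breaking the line outside the quotes, as in the filter code above, avoids the problem.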

##Step 3: Display the Time Plot


arts %>%
ggplot(aes(x = year, y = value)) +
geom_line() +
labs(title = "Arts, Entertainment, and Recreation Jobs Over Time",
x = "Year",
y = "Number of Jobs") +
theme_minimal()

# Save the plot
ggsave("time_plot.png")

## Saving 5 x 4 in image

###This time series plot shows the number of Arts, Entertainment and Recreation jobs from 2002 to 2021.
###The x-axis represents the year, and the y-axis shows the number of jobs as recorded in the dataset (the value column), which ranges from about 81,000 to 119,000.
###The series has an upward but non-linear trend: jobs rise from 83,730 in 2002 to about 94,000 in 2008, stay roughly flat through 2014, then climb quickly to a peak of 119,120 in 2019. Jobs then drop sharply to 84,254 in 2020 and decline further to 80,992 in 2021.
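The description of the plot can be checked numerically. The sketch below copies the 20 annual values from the `print(arts)` output into a plain numeric vector (the names `years`, `arts_jobs` and `changes` are illustrative) and computes year-over-year changes with `diff()`, which locates the 2019 peak and shows the size of the 2020 drop.

```r
years <- 2002:2021
arts_jobs <- c(83730, 87619, 90838, 90790, 91562, 93434, 94049, 94024, 91980,
               92515, 93126, 92469, 94026, 101652, 112399, 114862, 114857,
               119120, 84254, 80992)

# Absolute and percentage change from the previous year
change     <- diff(arts_jobs)
pct_change <- round(100 * change / head(arts_jobs, -1), 1)
changes    <- data.frame(year = years[-1], change, pct_change)

years[which.max(arts_jobs)]                      # year of the peak
changes[changes$year %in% c(2019, 2020, 2021), ] # last rise, then the sharp fall
```

The 2020 change works out to 84,254 - 119,120 = -34,866 jobs, roughly a 29% fall in a single year.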

##Step 4: Display an ACF Plot


arts %>%
ACF(value) %>%
autoplot() +
labs(title = "ACF of Arts, Entertainment, and Recreation Jobs",
x = "Lag",
y = "ACF") +
theme_minimal()
# Save the plot
ggsave("acf_plot.png")

## Saving 5 x 4 in image

###X-axis (Lag): The number of years separating each pair of observations being correlated. In this plot the lags run from 1 up to about 13; for 20 yearly observations the default maximum lag is roughly 10 × log10(20) ≈ 13.
###Y-axis (ACF): The autocorrelation coefficient at each lag, which always lies between -1 and 1.
###Dotted Blue Lines (Significance Bounds): These lines are drawn at approximately ±1.96/√T, where T = 20 is the number of observations, i.e. at about ±0.44. If the series were white noise (no autocorrelation), about 95% of the sample autocorrelations would be expected to fall between them.
###Interpretation: A bar that extends beyond the bounds suggests statistically significant autocorrelation at that lag. Computed from the 20 values printed in Step 2, the lag-1 autocorrelation is about 0.60, above the upper bound, while the autocorrelations at the other lags fall within the bounds. This indicates a significant positive correlation between the number of Arts, entertainment and recreation jobs in one year and the number in the previous year, as expected for a series with a trend; there is no evidence of significant correlation at longer lags.
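The autocorrelation at lag k compares the series with a copy of itself shifted by k years: r_k = Σ(y_t - ȳ)(y_{t+k} - ȳ) / Σ(y_t - ȳ)². The base-R sketch below (names `arts_jobs`, `acf_at` and `bound` are illustrative) computes it directly from the 20 values printed in Step 2, cross-checks the result against `stats::acf()`, and computes the approximate 95% bounds ±1.96/√T. The lag range follows the `stats::acf()` default of floor(10 · log10(T)); it is a check on the plot, not a replacement for `feasts::ACF()`.

```r
arts_jobs <- c(83730, 87619, 90838, 90790, 91562, 93434, 94049, 94024, 91980,
               92515, 93126, 92469, 94026, 101652, 112399, 114862, 114857,
               119120, 84254, 80992)
n <- length(arts_jobs)

# Sample autocorrelation at lag k (same definition as stats::acf)
acf_at <- function(y, k) {
  y_bar <- mean(y)
  m <- length(y)
  sum((y[1:(m - k)] - y_bar) * (y[(1 + k):m] - y_bar)) / sum((y - y_bar)^2)
}

lag_max <- floor(10 * log10(n))          # default maximum lag for one series
r <- sapply(1:lag_max, function(k) acf_at(arts_jobs, k))

# Approximate 95% bounds for a white-noise series of length n
bound <- qnorm(0.975) / sqrt(n)

# Cross-check against base R (drop lag 0, which is always 1)
r_base <- as.vector(acf(arts_jobs, lag.max = lag_max, plot = FALSE)$acf)[-1]
all.equal(r, r_base)

round(r, 2)
which(abs(r) > bound)                    # lags outside the bounds
```

A large positive lag-1 value paired with small values at higher lags is typical of a short series dominated by a trend and level shifts rather than by a repeating pattern.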
