Macha Final Project

Final Project

Anuraag K. Macha

ALY6010: Probability Theory and Introductory Statistics

Dr. Thomas Goulding

06/27/24
Introduction

The initial exploratory data analysis of the bike sharing dataset focused on uncovering the

dynamics of bike rental patterns in urban environments. This dataset, sourced from a publicly

available repository, includes various features such as date, season, weather conditions,

temperature, humidity, and rental counts, which provide a comprehensive view of bike usage.

Our analysis involved data cleaning to ensure quality and reliability, followed by a detailed

exploration of relationships between key variables.

We began with visualizing the data through scatter plots, density plots, and histograms to

get an initial understanding of the relationships. For instance, scatter plots revealed clear

relationships between temperature and bike rental counts, with higher temperatures generally

associated with increased rentals. Similarly, lower wind speeds tended to correspond with higher

rental counts, suggesting that favorable weather conditions promote bike usage. Density plots

and histograms provided deeper insights into the distributions of temperature, humidity, and bike

rental counts, highlighting the frequency of different rental counts and temperature values.

From this rudimentary analysis, two key questions emerged that warranted further

investigation:

1. How do rental patterns differ between casual and registered users?

2. What are the peak hours for bike rentals, and how do they vary by day of the week?

These questions were chosen because understanding the differences in rental patterns

between user types and the timing of peak bike usage can provide valuable

insights for optimizing bike sharing operations and improving user experience.
How Do Rental Patterns Differ Between Casual and Registered Users?

This question was chosen to explore how the behaviors of casual and registered users

differ. Casual users may represent tourists or infrequent users, while registered users are likely

more consistent commuters. By examining the rental patterns of these two groups, we can

identify specific trends and preferences, which can help in tailoring marketing strategies and

resource allocation to better meet the needs of each user type.

To explore how rental patterns differ between casual and registered users, a hypothesis

test was conducted to compare the average number of bike rentals between these two user

groups.

• Null Hypothesis (H0): The average number of rentals does not differ between casual

and registered users.

• Alternative Hypothesis (Ha): The average number of rentals differs between casual and

registered users.

A two-sample t-test at a significance level of 0.05 was used to compare the means of bike

rentals between casual and registered users. This test compares the means of two groups,

treating the casual and registered counts as independent samples.

Figure 1: Two-sample t-test using the t.test command


Since the p-value (< 2.2e-16) is much less than the significance level (0.05), we reject the null

hypothesis. This indicates a statistically significant difference in the average number of rentals

between casual and registered users.
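
The comparison described above can be sketched outside of R as well. The following is a minimal Python illustration of a Welch two-sample t statistic, which is the variant R's t.test computes by default. Whether the report's call kept that default is an assumption, and the function name welch_t and the sample counts are hypothetical values made up for illustration, not the report's data.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, df


# Made-up hourly counts, for illustration only (not the report's data).
casual = [12, 30, 25, 40, 18, 35]
registered = [110, 150, 95, 170, 130, 160]

t_stat, df = welch_t(casual, registered)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

A large absolute t value relative to the t distribution with df degrees of freedom corresponds to a small p-value, which is the quantity compared against the 0.05 significance level.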

To visualize the relationship between casual and registered rental counts, we created a scatterplot and

performed a linear regression analysis.

Figure 2: Scatter plot and linear regression

The scatterplot and regression analysis indicate a positive relationship between casual and

registered users, suggesting that both user types follow similar rental patterns. However, the low

R-squared value (0.256) indicates that casual rentals explain only about 26% of the variation

in registered rentals.
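
To make the R-squared interpretation concrete, the sketch below fits a least-squares line and computes R-squared as the share of variation in the response captured by the fit. It is a minimal Python illustration, not the report's R code. The function name fit_line and the data values are hypothetical, chosen only to show the calculation.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares slope, intercept, and R-squared for y regressed on x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # R-squared: 1 minus (residual variation / total variation in y).
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Made-up counts, for illustration only (not the report's data).
casual = [10, 20, 30, 40, 50]
registered = [120, 90, 160, 130, 180]

slope, intercept, r_squared = fit_line(casual, registered)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R^2 = {r_squared:.3f}")
```

An R-squared near 1 means the points lie close to the fitted line, while a value such as 0.256 means most of the variation in the response is left unexplained by the predictor.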

What are the Peak Hours for Bike Rentals, and How Do They Vary by Day of the Week?

Understanding the peak hours for bike rentals and their variation by day of the week is

fundamental for optimizing bike sharing operations. By identifying the times when bike rentals
are highest, bike sharing operators can ensure adequate bike availability during these peak

periods, thereby enhancing user satisfaction and operational efficiency. Additionally, the

question addresses different user behaviors on weekdays versus weekends. Analyzing these

variations can help in tailoring services and promotional strategies to cater to different user

groups.

• Null Hypothesis (H0): There is no significant difference in the average number of

rentals across different hours of the day and different days of the week.

• Alternative Hypothesis (Ha): The average number of rentals differs across at least some

hours of the day or days of the week.

We used an ANOVA test to compare the means of bike rentals across different hours of

the day and days of the week. This test helps determine if there are statistically significant

differences between the means of multiple groups.

Figure 3: ANOVA test using the aov command

The very low p-value (< 2e-16) for hour indicates that there is a statistically significant

difference in the average number of bike rentals across different hours of the day. This means

that the hour of the day has a significant impact on bike rental patterns. The very low p-value

(7.89e-08) for weekday indicates that there is a statistically significant difference in the average
number of bike rentals across different days of the week. This means that the day of the week

also has a significant impact on bike rental patterns.

The high F-value for hour (760.744) compared to the F-value for weekday (7.318)

suggests that the hour of the day has a much stronger effect on bike rental counts than the day of

the week.
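
The F-values discussed above compare variation between group means with variation within groups. The sketch below shows that ratio for a single factor in Python. It is a simplified one-way illustration, whereas the report's aov model included both hour and weekday, so it does not reproduce the reported values. The function name one_way_f and the rental counts are hypothetical.

```python
from statistics import mean


def one_way_f(groups):
    """F statistic for a one-way ANOVA over a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Made-up rental counts grouped by hour, for illustration only.
rentals_by_hour = {
    8: [300, 320, 310],
    13: [150, 170, 160],
    3: [10, 20, 15],
}

f_hour = one_way_f(list(rentals_by_hour.values()))
print(f"F = {f_hour:.1f}")
```

When group means are far apart relative to the spread inside each group, the F statistic is large, which is the pattern the report describes for hour compared with weekday.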

To further visualize the relationship between the hour of the day and the number of bike

rentals, we created scatterplots and performed a linear regression analysis.

Figure 4: Scatter plot

The scatterplot, colored by the day of the week, visualizes the relationship between the hour of

the day and bike rental counts. The linear regression lines show the trends for each day of the

week.
• Peak Hours: The scatterplot and regression analysis revealed peak rental hours during

morning (7-9 AM) and evening (4-6 PM) commutes. These peaks are more pronounced

on weekdays, reflecting typical work commute patterns.

• Day of the Week Variations: The analysis showed variations across different days of the

week. Weekdays generally have higher rental counts during commute hours, while

weekends show a more even distribution of rentals throughout the day, peaking slightly

later in the morning.
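
One way to identify the peak hours summarized above is to average rentals by hour within each day type and rank the hours. The sketch below does this in Python under stated assumptions: the record layout (day type, hour, count), the function name peak_hours, and all values are hypothetical and are not taken from the report's dataset.

```python
from collections import defaultdict
from statistics import mean


def peak_hours(records, top_n=2):
    """Return the top_n hours with the highest mean rentals per day type."""
    by_key = defaultdict(list)
    for day_type, hour, count in records:
        by_key[(day_type, hour)].append(count)

    # Mean rentals for each hour, grouped under its day type.
    hourly_means = defaultdict(dict)
    for (day_type, hour), counts in by_key.items():
        hourly_means[day_type][hour] = mean(counts)

    # Rank hours by mean rentals, highest first, and keep the top_n.
    return {
        day_type: sorted(means, key=means.get, reverse=True)[:top_n]
        for day_type, means in hourly_means.items()
    }


# Made-up records, for illustration only: (day type, hour, rental count).
records = [
    ("weekday", 8, 400), ("weekday", 8, 420), ("weekday", 13, 180),
    ("weekday", 17, 450), ("weekend", 8, 90), ("weekend", 11, 260),
    ("weekend", 13, 300), ("weekend", 17, 220),
]

peaks = peak_hours(records)
print(peaks)
```

Ranking group means in this way captures two separate peaks in a day, which a single straight regression line through the hourly data cannot represent.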

Conclusion

The insights gained from this analysis are crucial for optimizing bike sharing operations.

Understanding peak rental times helps ensure adequate bike availability during high-demand

periods, enhancing user satisfaction and operational efficiency. Recognizing the different rental

patterns of casual and registered users can inform targeted marketing strategies and resource

allocation to better meet the needs of each user group. Moreover, the significant impact of the

hour of the day and day of the week on rental patterns underscores the importance of temporal

considerations in planning and managing bike sharing systems.

These findings provide actionable insights that can guide bike sharing operators in

making data-driven decisions to improve service efficiency, user experience, and overall

sustainability of urban transportation solutions.


Appendix

The written and executed R commands are included in the R script file that was submitted

alongside this file.
