Macha Final Project
Anuraag K. Macha
06/27/24
Introduction
The initial exploratory data analysis of the bike sharing dataset focused on uncovering the
dynamics of bike rental patterns in urban environments. This dataset, sourced from a publicly
available repository, includes various features such as date, season, weather conditions,
temperature, humidity, and rental counts, which provide a comprehensive view of bike usage.
Our analysis involved data cleaning to ensure quality and reliability, followed by a detailed
We began by visualizing the data through scatter plots, density plots, and histograms to
get an initial understanding of the relationships. For instance, scatter plots revealed clear
relationships between temperature and bike rental counts, with higher temperatures generally
associated with increased rentals. Similarly, lower wind speeds tended to correspond with higher
rental counts, suggesting that favorable weather conditions promote bike usage. Density plots
and histograms provided deeper insights into the distributions of temperature, humidity, and bike
rental counts, highlighting the frequency of different rental counts and temperature values.
From this rudimentary analysis, two key questions emerged that warranted further
investigation:
1. How do rental patterns differ between casual and registered users?
2. What are the peak hours for bike rentals, and how do they vary by day of the week?
These questions were chosen because understanding the differences in rental patterns
between user types and the influence of holidays and peak bike usage can provide valuable
insights for optimizing bike sharing operations and improving user experience.
How Do Rental Patterns Differ Between Casual and Registered Users?
This question was chosen to explore how the behaviors of casual and registered users
differ. Casual users may represent tourists or infrequent users, while registered users are likely
more consistent commuters. By examining the rental patterns of these two groups, we can
identify specific trends and preferences, which can help in tailoring marketing strategies and
To explore how rental patterns differ between casual and registered users, a hypothesis
test was conducted to compare the average number of bike rentals between these two user
groups.
Null Hypothesis (H0): The average number of rentals does not differ between casual and
registered users.
Alternative Hypothesis (Ha): The average number of rentals differs between casual and
registered users.
A two-sample t-test at a significance level of 0.05 was used to compare the means of bike
rentals between casual and registered users. This test is appropriate because it compares the
hypothesis. This indicates that there is a significant difference in the average number of rentals
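As a rough illustration of the comparison described above, the following sketch computes a two-sample t statistic in Python using only the standard library. It assumes Welch's unequal-variance form (the report says only "two-sample t-test"), and the function name `welch_t` and the sample counts are hypothetical, made up for illustration rather than taken from the dataset or from the submitted R script.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df

# Made-up rental counts for illustration only (assumption, not real data).
casual = [12, 30, 25, 8, 40, 22]
registered = [110, 150, 95, 130, 170, 125]

t_stat, dof = welch_t(casual, registered)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

Turning the statistic into a p-value requires the t distribution, which the standard library does not provide. In practice a statistics package would perform that step and compare the result against the 0.05 significance level.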
To visualize the impact of holidays on bike rental counts, we created scatterplots and
The scatterplot and regression analysis indicate a positive relationship between casual and
registered users, suggesting that both user types follow similar rental patterns. However, the low
R-squared value (0.256) suggests that casual user counts do not effectively explain the variation
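To make the R-squared figure concrete, here is a minimal Python sketch of how the statistic is computed for a simple linear regression. It assumes casual counts are the predictor and registered counts the response, which is our reading of the text. The function name `simple_r_squared` and the data values are hypothetical and are not drawn from the dataset.

```python
from statistics import mean

def simple_r_squared(x, y):
    """R-squared of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares: variation the line fails to explain.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Total sum of squares: variation of y around its mean.
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Made-up paired counts for illustration only (assumption, not real data).
casual = [5, 12, 20, 33, 41, 18, 9]
registered = [80, 150, 110, 210, 160, 230, 95]

r2 = simple_r_squared(casual, registered)
print(f"R-squared = {r2:.3f}")
```

An R-squared of 0.256 would mean that roughly a quarter of the variation in the response is accounted for by the fitted line, leaving most of it unexplained.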
What Are the Peak Hours for Bike Rentals, and How Do They Vary by Day of the Week?
Understanding the peak hours for bike rentals and their variation by day of the week is
fundamental for optimizing bike sharing operations. By identifying the times when bike rentals
are highest, bike sharing operators can ensure adequate bike availability during these peak
periods, thereby enhancing user satisfaction and operational efficiency. Additionally, the
question addresses different user behaviors on weekdays versus weekends. Analyzing these
variations can help in tailoring services and promotional strategies to cater to different user
groups.
rentals across different hours of the day and different days of the week.
We used an ANOVA test to compare the means of bike rentals across different hours of
the day and days of the week. This test helps determine if there are statistically significant
The very low p-value (< 2e-16) for hour indicates that there is a statistically significant
difference in the average number of bike rentals across different hours of the day. This means
that the hour of the day has a significant impact on bike rental patterns. The very low p-value
(7.89e-08) for weekday indicates that there is a statistically significant difference in the average
number of bike rentals across different days of the week. This means that the day of the week
also has a significant impact on bike rental patterns.
The high F-value for hour (760.744) compared to the F-value for weekday (7.318)
suggests that the hour of the day has a much stronger effect on bike rental counts than the day of
the week.
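The F-values compared above are ratios of between-group to within-group variance. The Python sketch below computes that ratio for a single factor (a one-way ANOVA). This is a simplification: the report's model includes both hour and weekday, so the sketch would not reproduce the reported values. The function name `one_way_anova_f` and the grouped counts are hypothetical.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    group_means = [mean(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Made-up counts grouped by hour for illustration only (assumption, not real data).
rentals_by_hour = [
    [20, 25, 18],     # hypothetical early-morning hour
    [400, 420, 390],  # hypothetical commute hour
    [150, 170, 140],  # hypothetical midday hour
]
f_hour = one_way_anova_f(rentals_by_hour)
print(f"F = {f_hour:.1f}")
```

A large F means the group means differ by far more than the scatter within each group would suggest, which is the sense in which hour has a stronger effect than weekday in the results above.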
To further visualize the relationship between the hour of the day and the number of bike
The scatterplot, colored by the day of the week, visualizes the relationship between the hour of
the day and bike rental counts. The linear regression lines show the trends for each day of the
week.
Peak Hours: The scatterplot and regression analysis revealed peak rental hours during
morning (7-9 AM) and evening (4-6 PM) commutes. These peaks are more pronounced
Day of the Week Variations: The analysis showed variations across different days of the
week. Weekdays generally have higher rental counts during commute hours, while
weekends show a more even distribution of rentals throughout the day, peaking slightly
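One simple way to locate peak hours like those described above is to average rental counts per hour and rank the hours. The Python sketch below does this separately for weekday and weekend records. The helper names `mean_rentals_by_hour` and `peak_hours` and all record values are hypothetical, chosen only to mimic the pattern described, and are not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean

def mean_rentals_by_hour(records):
    """Map each hour to its mean rental count; records are (hour, count) pairs."""
    by_hour = defaultdict(list)
    for hour, count in records:
        by_hour[hour].append(count)
    return {hour: mean(counts) for hour, counts in by_hour.items()}

def peak_hours(records, top_n=2):
    """Return the top_n hours with the highest mean rentals, in clock order."""
    hourly = mean_rentals_by_hour(records)
    ranked = sorted(hourly, key=hourly.get, reverse=True)
    return sorted(ranked[:top_n])

# Made-up (hour, count) records for illustration only (assumption, not real data).
weekday_records = [(8, 400), (8, 420), (12, 150), (17, 450), (17, 470), (22, 60)]
weekend_records = [(8, 90), (13, 300), (14, 320), (17, 210)]

print("weekday peaks:", peak_hours(weekday_records))
print("weekend peaks:", peak_hours(weekend_records))
```

Splitting the records by day type before ranking keeps the weekday commute peaks from masking the flatter weekend profile.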
Conclusion
The insights gained from this analysis are crucial for optimizing bike sharing operations.
Understanding peak rental times helps ensure adequate bike availability during high-demand
periods, enhancing user satisfaction and operational efficiency. Recognizing the different rental
patterns of casual and registered users can inform targeted marketing strategies and resource
allocation to better meet the needs of each user group. Moreover, the significant impact of the
hour of the day and day of the week on rental patterns underscores the importance of temporal
These findings provide actionable insights that can guide bike sharing operators in
making data-driven decisions to improve service efficiency, user experience, and overall
Kabacoff, R. (2022). R in action: Data analysis and graphics with R and Tidyverse. Manning
Publications.
Appendix
The written and executed R commands are included in the R script file that was submitted