Professional Documents
Culture Documents
Review - EDA J Component
Review - EDA J Component
In this work, dataset is collected from an internet source and cleaned. Using Python
programming, data preprocessing is done where we identify, handle and treat the missing
values from the taken dataset. Then, we visualize a graphical representation, where
insights behind our dataset on uber drives are explained with first and last 10 records of
uber drives, information about the variables and the summary of the dataset.
Keywords: uber drives , missing values, visualization, bar graph and summary
INTRODUCTION:
Exploratory Data Analysis (EDA) is an approach/philosophy for data analysis that employs a variety of techniques
which includes,
to maximize insight into a data set,
to uncover underlying structure,
to extract important variables,
to detect outliers and anomalies,
to test underlying assumptions,
to develop parsimonious models and
to determine optimal factor settings.
PROBLEM DEFINITION
Authors: K. Abt
Article title: Descriptive data analysis: A concept between confirmatory and exploratory data analysis
Journal: Methods of Information in Medicine
Description
[4] Confirmatory Data Analysis (CDA) in randomized comparative ("controlled") studies with many variables and/or time points of
interest finds its limitations in the multiplicity of desired inferential statements which leads to unfeasibly small adjusted significance
levels ("Bonferronization") and, thereby, to unduly increased risks of not rejecting false hypotheses. In general, analytical models
adequate for such complex data structures and suitable for practical use do not exist as yet. Exploratory Data Analysis (EDA), on the
other hand, is usually intended to generate hypotheses and not to lead to final conclusions based on the results of the study. In this paper,
it is proposed to fill the conceptual gap between CDA and EDA by "Descriptive Data Analysis" ("DDA") which concept is mainly based
on descriptive inferential statements. The results of a DDA in a controlled study are interpreted simultaneously on the basis of the
investigator's experience with respect to numerically relevant treatment effect differences and on "descriptive significances" as they
appear in "near regular" patterns corresponding to the resulting relevant effect differences. A DDA may also contain confirmatory parts
and/or tests on global hypotheses at a prechosen maximum risk a of erroneously rejecting true hypotheses. The paper is in parts
expository and is addressed to investigators as well as statisticians.
METHODOLOGY
Data Collection
Data Cleaning
EDA
Data Visualization
RESULTS AND DISCUSSION
We have found the first and last 10 records of uber drives, dimension and size of the
dataset, information about all variables, information on missing values, summary of the
original data, unique start destinations, total number of unique start and stop destinations,
most popular starting and dropping destinations, most frequent routes and other
information on bar plot for purposes, miles and number of trips and proportion of trips
between Business and Personal trips.
Initially, we need to load the above libraries and import and load the dataset from Kaggle
website [9].
LAST AND FIRST 10 RECORDS OF
UBER DRIVES:
SUMMARY
[1] M. Wang and L. Mu, “Spatial disparities of Uber accessibility: An exploratory analysis in Atlanta, USA,” Computers, Environment
and Urban Systems, vol. 67, pp. 169–175, 2018, doi: 10.1016/j.compenvurbsys.2017.09.003.
[2] A. R. L. JOHN T. BEHRENS, KRISTEN E. DICERBO, NEDIM YEL, “Exploratory data analysis,” Handbook of Psychology, vol.
Second Edi, pp. 185–203, 2016, doi: 10.1007/978-3-319-43742-2_15.
[3] J. Cramer and A. B. Krueger, “Disruptive change in the taxi business: The case of uber,” American Economic Review, vol. 106, no. 5,
pp. 177–182, 2016, doi: 10.1257/aer.p20161002.
[4] K. Abt, “Descriptive data analysis: A concept between confirmatory and exploratory data analysis,” Methods of Information in
Medicine, vol. 26, no. 2, pp. 77–88, 1987, doi: 10.1055/s-0038-1635488.
[5] https://blog.kiprosh.com/exploratory-data-analysis-a-vital-approach/
[6] https://backlinko.com/uber-users
[7] https://morioh.com/p/4a87b1620dda
[8] https://medium.com/swlh/uber-case-study-8da589c6161b
[9] https://www.kaggle.com/zusmani/uberdrives
CONCLUSION
In this work, we could conclude that, from the uber drives dataset taken, the first and last 10
records, missing values, the summary and other information on bar plot such as
the total number of unique start and stop destinations are 176 and 187,
the most popular starting and dropping point is Cary,
the most frequent routes taken by uber driver is Cary to Morrisville and vice-versa,
longest miles travelled and maximum number of trips travelled is for the meeting
purpose and
the proportion of trips between Business and Personal is 93.33 : 6.67.