Business Analytics

Assignment I

Business Analytics

Submitted To: Dr. Sridhar Manohar


Submitted By: Khyati Tiwari
Roll No.: 020
Date: 25TH OCTOBER 2021
INTRODUCTION
This dataset includes several months (and counting) of data on daily trending
YouTube videos. Data is included for the US, GB, DE, CA, FR, RU, MX, KR, JP, IN
regions (USA, Great Britain, Germany, Canada, France, Russia, Mexico, South Korea,
Japan and India respectively), with up to 300 statistical values.
Data includes the video title, views, likes and dislikes.

Possible uses for this dataset could include:


• Sentiment analysis in a variety of forms
• Categorising YouTube videos based on their comments and statistics
• Training ML algorithms like RNNs to generate their own YouTube comments
• Analysing what factors affect how popular a YouTube video will be
• Statistical analysis over time

I have taken the data from: Trending YouTube Video Statistics (Kaggle)

Descriptive Analysis
Descriptive statistics are brief descriptive coefficients that summarize a given data set,
which can be either a representation of the entire population or a sample of a
population. Descriptive statistics are broken down into measures of central tendency
and measures of variability (spread). Measures of central tendency include the mean,
median, and mode, while measures of variability include standard deviation, variance,
and the minimum and maximum values. Kurtosis and skewness describe the shape of the
distribution.
This data contains statistics for 300 trending YouTube videos. For these I have
calculated the measures of central tendency (mean, median, and mode), the measures
of variability (standard deviation, variance, minimum and maximum values), and
kurtosis and skewness.
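
As a rough illustration, the measures listed above could be computed in Python along the following lines. The like counts are made-up placeholder values, not figures from the dataset, and `describe` is a hypothetical helper name. Skewness and kurtosis are computed here in their plain population form; some spreadsheet functions apply small-sample corrections, so their results can differ slightly.

```python
import statistics

def describe(values):
    """Return common descriptive measures for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    # Standardised third and fourth moments (population form).
    skew = sum(((x - mean) / sd) ** 3 for x in values) / n
    kurt = sum(((x - mean) / sd) ** 4 for x in values) / n - 3  # excess kurtosis
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.pvariance(values),
        "std_dev": sd,
        "min": min(values),
        "max": max(values),
        "skewness": skew,
        "kurtosis": kurt,
    }

# Hypothetical like counts (assumption: not taken from the dataset).
likes = [120, 340, 340, 560, 980, 1500, 2200]
summary = describe(likes)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A positive skewness, as in this toy list, indicates a long right tail: a few videos with far more likes than the rest.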

Sample Mean
A sample mean is an average of a set of data. The sample mean can be used to
calculate the central tendency, standard deviation and the variance of a data set. The
sample mean can be applied to a variety of uses, including calculating population
averages.
Of the 300 observations I have taken 100 as a sample and calculated the same
measures for it: the mean, median, and mode, the standard deviation, variance,
minimum and maximum values, and the kurtosis and skewness of the sample.
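
One possible way to draw such a sample is sketched below. How the 100 observations were actually selected is not stated, so simple random sampling without replacement is an assumption here, and the population values are randomly generated placeholders rather than the real like counts.

```python
import random
import statistics

random.seed(42)  # fixed seed so the draw is repeatable

# Hypothetical population of 300 like counts (placeholder values).
population = [random.randint(0, 500_000) for _ in range(300)]

# Draw 100 observations without replacement (assumed sampling method).
sample = random.sample(population, 100)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)
sample_sd = statistics.stdev(sample)  # uses n - 1 in the denominator

print(f"population mean: {population_mean:.1f}")
print(f"sample mean:     {sample_mean:.1f}")
print(f"sample std dev:  {sample_sd:.1f}")
```

The sample mean will generally differ a little from the population mean, which is why the population and sample figures reported in the following sections are not identical.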

Correlation
Correlation is a statistic that measures the degree to which two variables move in
relation to each other; in the finance and investment industries it is typically
applied to two securities. Correlations are used in advanced portfolio management
and are computed as the correlation coefficient, which has a value that must fall
between -1.0 and +1.0.

The correlation for the population is 0.364, which means that there is a positive
relationship between likes and dislikes, but the relationship is weak.

The correlation for the sample of 100 is 0.288, which again means that there is a
positive but weak relationship between likes and dislikes.

Covariance
Covariance measures the directional relationship between two variables, such as the
returns on two assets. A positive covariance means that the variables move together,
while a negative covariance means they move inversely. Covariance is calculated by
analysing at-return surprises (standard deviations from the expected return) or by
multiplying the correlation between the two variables by the standard deviation of
each variable.

As per my calculations, the population covariance of likes with likes (that is, the
variance of likes) is 24793477973, of likes with dislikes is 609861232.1, and of
dislikes with dislikes (the variance of dislikes) is 112930643.7.
For the sample, the covariance of likes with likes is 35421848759, of likes with
dislikes is 947440868.9, and of dislikes with dislikes is 305391014.4.
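
The covariance figures above can be cross-checked against the correlation values reported earlier, since the correlation coefficient equals the covariance divided by the product of the two standard deviations. The short sketch below does this with the reported numbers; `correlation_from_covariance` is a hypothetical helper name.

```python
import math

def correlation_from_covariance(cov_xy, var_x, var_y):
    """Correlation = cov(x, y) / (sd(x) * sd(y))."""
    return cov_xy / math.sqrt(var_x * var_y)

# Reported population figures: cov(likes, dislikes), var(likes), var(dislikes).
r_population = correlation_from_covariance(609861232.1, 24793477973, 112930643.7)

# Reported sample figures in the same order.
r_sample = correlation_from_covariance(947440868.9, 35421848759, 305391014.4)

print(round(r_population, 3), round(r_sample, 3))
```

Both results agree with the reported correlations of 0.364 and 0.288 to three decimal places, so the covariance and correlation figures are consistent with each other.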

Regression
Regression is a statistical method used in finance, investing, and other disciplines that
attempts to determine the strength and character of the relationship between one
dependent variable (usually denoted by Y) and a series of other variables (known as
independent variables).
Regression helps investment and financial managers to value assets and understand
the relationships between variables, such as commodity prices and the stocks of
businesses dealing in those commodities.
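
As a minimal sketch of the idea, a simple linear regression with one independent variable can be fitted by ordinary least squares as shown below. The values are made up for illustration, and treating dislikes as the dependent variable (Y) and likes as the independent variable is an assumption, not something stated in this report. `fit_line` is a hypothetical helper name.

```python
import statistics

def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical values (assumption: not taken from the dataset).
likes = [100, 200, 300, 400, 500]
dislikes = [12, 19, 33, 41, 50]

slope, intercept = fit_line(likes, dislikes)
print(f"dislikes = {intercept:.3f} + {slope:.3f} * likes")
```

The slope describes the character of the relationship (here, the expected change in dislikes per additional like), while the strength of the relationship is usually judged from the correlation or R-squared.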

T-Test
A t-test is a type of inferential statistic used to determine if there is a significant
difference between the means of two groups, which may be related in certain features.
It is mostly used when the data sets, like the data set recorded as the outcome from
flipping a coin 100 times, would follow a normal distribution and may have unknown
variances. A t-test is used as a hypothesis testing tool, which allows testing of
an assumption applicable to a population. 

A t-test looks at the t-statistic, the t-distribution values, and the degrees of freedom to
determine the statistical significance. To conduct a test with three or more means, one
must use an analysis of variance.
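
To make the t-statistic and degrees of freedom concrete, the sketch below computes them for two small groups. The text does not say which variant of the t-test is meant, so Welch's unequal-variance version is used here as an assumption, and the two groups are made-up numbers. `welch_t` is a hypothetical helper name.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t-statistic and approximate degrees of freedom for two groups."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    # Sample variances (n - 1 denominator) divided by group size.
    se_a = statistics.variance(a) / len(a)
    se_b = statistics.variance(b) / len(b)
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    return t, df

# Hypothetical groups (assumption: not taken from the dataset).
group_a = [5, 7, 9, 11, 13]
group_b = [2, 4, 6, 8, 10]

t_stat, df = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, degrees of freedom = {df:.1f}")
```

The resulting t-statistic is then compared with the critical value of the t-distribution for the computed degrees of freedom to decide whether the difference in means is significant.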

Scatter Plot
A scatter plot (aka scatter chart, scatter graph) uses dots to represent values for two
different numeric variables. The position of each dot on the horizontal and vertical
axis indicates values for an individual data point. Scatter plots are used to observe
relationships between variables.

CLT (Central Limit Theorem)


According to the central limit theorem, as the sample size increases, the distribution
of the sample mean approaches a normal distribution centred on the mean of the overall
population in question, notwithstanding the actual distribution of the data. In other
words, the result holds whether the underlying distribution is normal or aberrant.

As a general rule, sample sizes of around 30-50 are deemed sufficient for the CLT to
hold, meaning that the distribution of the sample means is fairly normally distributed.
Therefore, the more samples one takes, the more the graphed results take the shape of
a normal distribution.
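
The effect can be illustrated with a small simulation. The sketch below uses a deliberately skewed, randomly generated population (an exponential distribution, chosen as an assumption and unrelated to the YouTube data) and compares the sample means for sample sizes of 5 and 50. `sample_means` and `skewness` are hypothetical helper names.

```python
import random
import statistics

random.seed(0)  # fixed seed so the simulation is repeatable

# Hypothetical, strongly right-skewed population.
population = [random.expovariate(1.0) for _ in range(10_000)]

def sample_means(sample_size, draws=2_000):
    """Means of repeated random samples drawn with replacement."""
    return [
        statistics.fmean(random.choices(population, k=sample_size))
        for _ in range(draws)
    ]

def skewness(values):
    """Population skewness: 0 for a perfectly symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

means_small = sample_means(5)
means_large = sample_means(50)

for label, means in (("n = 5", means_small), ("n = 50", means_large)):
    print(
        f"{label}: centre = {statistics.fmean(means):.3f}, "
        f"spread = {statistics.stdev(means):.3f}, "
        f"skewness = {skewness(means):.3f}"
    )
```

With the larger sample size the sample means cluster more tightly around the population mean and their skewness moves closer to zero, which is the bell shape the theorem describes.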
