
1. Define statistics, state its limitations, and explain its role in business and trade.

Statistics is a branch of science dealing with collecting, organizing, summarizing, analyzing and making decisions
from data.
There are two types of statistics:
 Descriptive
 Inferential
Limitations of Statistics

 It cannot deal with a single observation or value.
 It cannot be used for qualitative characteristics.
 It gives information about a group of people and not about individuals.
 It does not depict the entire story of a phenomenon.
 It is liable to be misused.
 Its results are true only on average.
 Statistical results are not always beyond doubt.
Role of Statistics in Business and Trade
Statistics are crucial in business and trade for quantitative analysis, facilitating informed decision-making and operational efficiency. Here are five key applications (a brief code sketch follows the list):
Descriptive Statistics: Summarizes data with measures like mean and standard deviation to understand sales
performance and market demographics.
Inferential Statistics: Uses sample data to make predictions about a population, employing techniques like hypothesis
testing and regression analysis for sales forecasting and marketing strategy assessment.
Predictive Analytics: Analyzes historical data to predict future trends using methods like time series analysis, aiding in
inventory management and demand forecasting.
Quality Control: Monitors and improves processes using control charts and Six Sigma methodologies to ensure
product quality and reduce defects.
Risk Management: Identifies and assesses risks with tools like Value at Risk (VaR) and Monte Carlo simulations,
helping businesses manage uncertainties and make strategic decisions.
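As a small illustration of the descriptive-statistics application above, the sketch below (Python) computes the mean and standard deviation of a hypothetical monthly sales series; the figures are made up purely for illustration.

    import statistics

    # Hypothetical monthly sales figures (illustrative data only)
    monthly_sales = [120, 135, 128, 142, 150, 138, 131, 145, 152, 149, 140, 144]

    mean_sales = statistics.mean(monthly_sales)    # central tendency
    stdev_sales = statistics.stdev(monthly_sales)  # sample standard deviation (spread)

    print(f"Mean monthly sales: {mean_sales:.2f}")
    print(f"Standard deviation: {stdev_sales:.2f}")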
2. Explain sampling and its purpose.
Sampling is a process in statistical analysis where researchers take a predetermined number of observations from a
larger population. Sampling allows researchers to conduct studies about a large group by using a small portion of the
population. The method of sampling depends on the type of analysis being performed, but it may include simple
random sampling or systematic sampling.
Probability Sampling
It is mainly used in quantitative research. If you want to produce results that are representative of the whole population, probability sampling techniques are the most valid choice.

In systematic sampling every member of the population is listed with a number, but instead of randomly generating
numbers, individuals are chosen at regular intervals.
Stratified sampling involves dividing the population into subpopulations that may differ in important ways. It allows you to draw more precise conclusions by ensuring that every subgroup is properly represented in the sample.

Cluster sampling also involves dividing the population into subgroups, but each subgroup should have similar
characteristics to the whole sample. Instead of sampling individuals from each subgroup, you randomly select entire
subgroups. This method is good for dealing with large and dispersed populations.
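A minimal sketch of two of these schemes in Python; the population and sample sizes are hypothetical and chosen only to show the mechanics.

    import random

    population = list(range(1, 101))  # hypothetical population of 100 numbered members
    sample_size = 10

    # Simple random sampling: every member has an equal chance of selection
    simple_random = random.sample(population, k=sample_size)

    # Systematic sampling: pick a random start, then take every k-th member
    k = len(population) // sample_size
    start = random.randrange(k)
    systematic = population[start::k]

    print(simple_random)
    print(systematic)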

Sampling is essential in quantitative analysis for several key reasons:

Efficiency: It saves time and reduces costs by allowing researchers to study a subset of the population instead of the
entire group.
Manageability: Handling a smaller, representative sample makes data collection, processing, and analysis more
practical and less resource-intensive.
Accuracy: Proper sampling techniques, such as random sampling, ensure that the sample represents the population
well, reducing bias and increasing the reliability of results.
Detailed Analysis: Sampling makes it feasible to conduct in-depth analyses, which would be impractical with a larger
population.
Hypothesis Testing: It enables researchers to test hypotheses and make inferences about the population based on
sample data, supporting decision-making with known levels of confidence.

3. What is regression analysis? How does it differ from correlation?


Regression analysis is a form of predictive modelling technique that investigates the relationship between a dependent variable and one or more independent variables. Basically, regression analysis involves fitting a line through a set of data points so that it most closely matches the overall shape of the data; the fitted line shows how the dependent variable on the y-axis changes with the explanatory variable on the x-axis. It differs from correlation in that correlation only measures the strength and direction of the association between two variables, whereas regression produces an equation that can be used to predict the value of the dependent variable from the independent variable(s).
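A minimal sketch with NumPy (the x and y values are made up): fit a least-squares line y = a + bx and, for comparison, compute the correlation coefficient.

    import numpy as np

    x = np.array([1, 2, 3, 4, 5, 6], dtype=float)   # hypothetical explanatory variable
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])   # hypothetical dependent variable

    b, a = np.polyfit(x, y, deg=1)                  # slope and intercept of the fitted line
    r = np.corrcoef(x, y)[0, 1]                     # correlation coefficient

    print(f"Fitted line: y = {a:.2f} + {b:.2f}x")
    print(f"Correlation: r = {r:.3f}")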

4. Show that the sample variance is an unbiased estimator of the population variance.
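A standard sketch of the argument, assuming an i.i.d. sample X1, ..., Xn with mean μ and variance σ², and the sample variance defined with divisor n − 1:

    S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2,
    \qquad \sum_{i=1}^{n}(X_i - \bar{X})^2 = \sum_{i=1}^{n}X_i^2 - n\bar{X}^2 .

    E\left[\sum_{i=1}^{n}X_i^2\right] = n(\sigma^2 + \mu^2),
    \qquad E\left[n\bar{X}^2\right] = n\left(\frac{\sigma^2}{n} + \mu^2\right) = \sigma^2 + n\mu^2 .

    \text{Hence } E\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = (n-1)\sigma^2
    \quad\Rightarrow\quad E[S^2] = \sigma^2 ,

so S², computed with the n − 1 divisor, is an unbiased estimator of the population variance σ².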


5. Explain the concept of point estimation.
Point estimators are defined as functions that can be used to find the approximate value of a particular point from a
given population parameter. The sample data of a population is used to find a point estimate or a statistic that can act
as the best estimate of an unknown parameter that is given for a population.
What are the Properties of Point Estimators?
It is desirable for a point estimate to be the following :

Consistent - The larger the sample size, the more accurate the estimate.
Unbiased - The expectation of the estimator over repeated samples equals the corresponding population parameter. For example, the sample mean is an unbiased estimator of the population mean.
Most Efficient (also known as Best Unbiased) - Of all the consistent, unbiased estimates, the most efficient is the one possessing the smallest variance (a measure of the amount of dispersion away from the estimate). In simple words, the most efficient estimator varies least from sample to sample, and this generally depends on the particular distribution of the population. For example, the mean is more efficient than the median (the middle value) for the normal distribution, but not for more "skewed" (asymmetrical) distributions.

What are the Methods Used to Calculate Point Estimators?


The maximum likelihood method is a popular way to calculate point estimators. This method uses differential calculus to maximize the likelihood function of the sample with respect to the unknown parameters.
Named after Thomas Bayes, the Bayesian method is another way in which a parameter can be estimated; it is a less traditional approach that treats the parameter itself as having a distribution. Enough information on the distribution of the parameter is not always available, but when it is, the estimation can be done fairly easily.
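For instance, a small hedged illustration of a maximum likelihood point estimate: for a hypothetical sample of independent 0/1 (Bernoulli) observations, maximizing the likelihood p^k (1 − p)^(n−k) gives the sample proportion as the point estimate of p.

    # Maximum likelihood point estimate of a Bernoulli success probability p.
    # For n independent 0/1 observations with k successes, the MLE is k / n.
    observations = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]  # hypothetical 0/1 sample
    p_hat = sum(observations) / len(observations)  # MLE of p (sample proportion)
    print(f"Point estimate of p: {p_hat:.2f}")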

6. What are primary and secondary data, and how do they differ? What precautions should be taken when using secondary data?


Primary data - collected by the investigator himself for the purpose of a specific enquiry or study. It is used as original data and is generated by conducting a survey.

Secondary data - data that has been collected and analyzed by some agency for its own use and is later used by a different agency.
Precautions
(i) Suitable Purpose of Investigation- The investigator must ensure that the data are suitable for the purpose of the
enquiry.
(ii) Inadequate Data- Adequacy of the data is to be judged in the light of the requirements of the survey as well as the
geographical area covered by the available data.
(iii) Definition of Units- The investigator must ensure that the definitions of units are the same as in the earlier
investigation.
(iv) Degree of Accuracy- The investigator should keep in mind the degree of accuracy maintained by each investigator.
(v) Time and Condition of Collection of Facts- Before making use of the available data, it should be ascertained for which period and under what conditions the data were collected.

7. Differences between the critical region and the region of acceptance, and between the null and alternative hypotheses.
Critical Region vs Region of Acceptance
Definition: The critical region contains the values of the test statistic that lead to rejection of the null hypothesis; the region of acceptance contains the values that lead to acceptance of the null hypothesis.
Decision: Critical region - reject the null hypothesis; region of acceptance - do not reject the null hypothesis.
Significance level: The critical region is determined by α (e.g., 0.05); the region of acceptance is the complement of the critical region.
Probability: The critical region has probability equal to the Type I error rate (α); the region of acceptance is where the Type II error (β) is considered.
Location: The critical region lies in the tails of the distribution; the region of acceptance lies in the central part of the distribution.

Null Hypothesis vs Alternative Hypothesis
Statement: The null hypothesis assumes no effect, no difference, or no relationship; the alternative hypothesis assumes that a specific effect, difference, or relationship exists.
Notation: The null hypothesis is denoted H0; the alternative hypothesis is denoted H1 (or Ha).
Objective: The null hypothesis is the subject of statistical testing and often represents the status quo; the alternative hypothesis challenges or contradicts the null hypothesis.
Default assumption: The null hypothesis is presumed to be true until evidence suggests otherwise; the alternative hypothesis requires evidence to support its assertion.
Hypothesis test: The null hypothesis is tested against sample data to determine its validity; the alternative hypothesis is tested alongside the null hypothesis to determine significance.
Decision: The null hypothesis is either rejected or not rejected based on the evidence; the alternative hypothesis is accepted only if the null hypothesis is rejected.

8. Method of moments
The moments in the "Method of Moments" refer to the statistical properties (sample mean and sample variance) of a
distribution, and the method utilizes these moments to estimate the parameters of the distribution.

Let f(x; θ1, θ2, …, θk) be the p.d.f. of the population and let x1, x2, …, xn be a random sample taken from the population.
In the method of moments we find the first k moments of the population and equate them to the corresponding moments of the sample to obtain k equations.
Then the values of θ1, θ2, …, θk which are obtained as the solutions of these equations are taken as their estimates.
In short, the method of moments involves equating sample moments with theoretical moments.
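For example, a minimal sketch for a normal population (hypothetical data): the first two population moments are μ and σ² + μ², so equating them to the sample moments m1 and m2 gives the estimates μ̂ = m1 and σ̂² = m2 − m1².

    data = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.3, 4.4]  # hypothetical sample

    n = len(data)
    m1 = sum(data) / n                  # first sample moment
    m2 = sum(x * x for x in data) / n   # second sample moment

    mu_hat = m1                         # method-of-moments estimate of the mean
    sigma2_hat = m2 - m1 ** 2           # method-of-moments estimate of the variance

    print(mu_hat, sigma2_hat)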

9. Multiple regression
Multiple regression analysis is a statistical technique that analyzes the relationship between two or more variables and uses the information to estimate the value of the dependent variable. In multiple regression, the objective is to develop a model that relates a dependent variable y to more than one independent variable.
The multiple regression equation is given by y = a + b1x1 + b2x2 + … + bkxk.
Multiple regression analysis permits us to control explicitly for the many other circumstances that concurrently influence the dependent variable. The objective of regression analysis is to model the relationship between a dependent variable and one or more independent variables. Let k represent the number of independent variables, denoted by x1, x2, x3, …, xk. Such an equation is useful for predicting the value of y when the values of the x variables are known.
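A minimal sketch with NumPy, fitting y = a + b1x1 + b2x2 by least squares for two hypothetical predictors (all values are made up):

    import numpy as np

    x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])      # hypothetical predictor 1
    x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])      # hypothetical predictor 2
    y  = np.array([5.1, 6.9, 12.2, 13.8, 19.9, 21.1])  # hypothetical response

    X = np.column_stack([np.ones_like(x1), x1, x2])    # design matrix with intercept column
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)     # least-squares solution [a, b1, b2]

    a, b1, b2 = coeffs
    print(f"y = {a:.2f} + {b1:.2f}*x1 + {b2:.2f}*x2")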
Stepwise regression is a step-by-step process that begins by developing a regression model with a single predictor variable and adds or deletes predictor variables one step at a time.
Stepwise multiple regression is the method to determine a regression equation that begins with
a single independent variable and add independent variables one by one. The stepwise multiple
regression method is also known as the forward selection method because we begin with no
independent variables and add one independent variable to the regression equation at each of
the iterations. There is another method called backwards elimination method, which begins with
an entire set of variables and eliminates one independent variable at each of the iterations.
Multicollinearity is the term used to describe the case when the inter-correlation of the predictor variables is high. Signs of multicollinearity include:
 High correlation between pairs of predictor variables.
 Magnitudes or signs of regression coefficients that do not make good physical sense.
 Non-significant regression coefficients on important predictors.
 Extreme sensitivity of the magnitude or sign of regression coefficients to the insertion or deletion of a predictor variable.
10. Neyman-Pearson lemma
The Neyman-Pearson Lemma is a way to find out if the hypothesis test you are using is the one with the greatest
statistical power.
The power of a hypothesis test is the probability that the test correctly rejects the null hypothesis when the alternative hypothesis is true.
The goal would be to maximize this power, so that the null hypothesis is rejected as much as possible when the
alternate is true.
The lemma basically tells us that good hypothesis tests are likelihood ratio tests.
The Neyman-Pearson lemma is based on a simple hypothesis test. A “simple” hypothesis test is one where the
unknown parameters are specified as single values.
The Neyman-Pearson Lemma is a statistical principle that helps to make optimal decisions when we have to choose
between two hypotheses.
In simpler words, it's a way of figuring out the best way to decide between two options when we don't know which
one is true.
The lemma suggests that we should base our decision on the likelihood ratio of the two hypotheses. Specifically, we should choose the hypothesis that has the highest likelihood ratio, as it is most likely to be true.
However, there's a catch: we can only make this decision if we specify the level of significance we are willing to accept. This means we have to decide beforehand how likely we want to be wrong (i.e., reject a true hypothesis or fail to reject a false hypothesis) when making our decision.
So, in summary, the Neyman-Pearson Lemma helps us to make an optimal decision between two hypotheses by
considering the likelihood ratio, but we also need to specify our level of significance to make a decision.
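In symbols, for simple hypotheses H0: θ = θ0 versus H1: θ = θ1, the most powerful test of size α is the likelihood-ratio test (a hedged sketch of the standard statement):

    \text{Reject } H_0 \text{ when } \Lambda(x) = \frac{L(x \mid \theta_1)}{L(x \mid \theta_0)} > k,
    \qquad \text{with } k \text{ chosen so that } P\bigl(\Lambda(X) > k \mid H_0\bigr) = \alpha .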

11. What is hypothesis testing? Explain the Z-test for a single mean and the Z-test for the difference of means.
A hypothesis is defined as a formal statement, which gives the explanation about the relationship between two or
more variables of a specified population.
If, assuming that a particular hypothesis is true, we find that the results observed in a random sample differ markedly from those expected, we say that the observed differences are significant and we reject the hypothesis.
Procedures that enable us to decide whether to accept or reject a hypothesis are called tests of hypotheses, tests of significance, or decision rules.
The Z-test for a single mean is a statistical test used to determine whether the mean of a sample differs significantly
from a known or hypothesized population mean. This test is typically used when the population variance is known
and the sample size is large (generally n > 30).
Here’s a step-by-step explanation of how the Z-test for a single mean is conducted:
1. Formulate the Hypotheses
 Null Hypothesis (H₀): The sample mean is equal to the population mean. H0: μ = μ0
 Alternative Hypothesis (H₁): The sample mean is not equal to the population mean (two-tailed test), or it is greater than or less than the population mean (one-tailed test). H1: μ ≠ μ0 (two-tailed), H1: μ > μ0 or H1: μ < μ0 (one-tailed)
2. Determine the Level of Significance
 Choose the significance level (α), which is the probability of rejecting the null hypothesis when it is true.
Common choices are 0.05, 0.01, and 0.10.
3. Calculate the Test Statistic
 The test statistic for the Z-test is calculated using the formula Z = (x̄ − μ0) / (σ / √n), where x̄ is the sample mean, μ0 is the hypothesized population mean, σ is the population standard deviation, and n is the sample size.
4. Determine the Critical Value(s)


 For a two-tailed test, find the critical values Zα/2 and −Zα/2 from the standard normal distribution corresponding to the chosen α.
 For a one-tailed test, find the critical value Zα for a right-tailed test or −Zα for a left-tailed test.
5. Make the Decision
 Compare the calculated Z value to the critical value(s):
 For a two-tailed test, if Z lies outside the range −Zα/2 to Zα/2, reject the null hypothesis.
 For a one-tailed test, if Z is greater than Zα (right-tailed) or less than −Zα (left-tailed), reject the null hypothesis.
6. Interpret the Results
 If you reject the null hypothesis, it suggests that there is enough evidence to say that the sample mean
significantly differs from the population mean.
 If you fail to reject the null hypothesis, it suggests that there is not enough evidence to say that the sample
mean significantly differs from the population mean.
The Z-test for the difference of means is used to determine if there is a significant difference between the means of
two independent samples. This test is applicable when the population variances are known and/or the sample sizes
are large (generally n > 30).
Here’s a step-by-step explanation of how to conduct a Z-test for the difference of means:
1. Formulate the Hypotheses
 Null Hypothesis (H₀): The means of the two populations are equal. H0: μ1 = μ2
 Alternative Hypothesis (H₁): The means of the two populations are not equal (two-tailed test), or one mean is greater than or less than the other mean (one-tailed test). H1: μ1 ≠ μ2 (two-tailed), H1: μ1 > μ2 or H1: μ1 < μ2 (one-tailed)
2. Determine the Level of Significance
 Choose the significance level (α), which is the probability of rejecting the null hypothesis when it is true.
Common choices are 0.05, 0.01, and 0.10.
3. Calculate the Test Statistic
 The test statistic for the Z-test for the difference of means is calculated using the formula Z = (x̄1 − x̄2) / √(σ1²/n1 + σ2²/n2), where x̄1 and x̄2 are the sample means, σ1² and σ2² are the population variances, and n1 and n2 are the sample sizes.
4. Determine the Critical Value(s)


 For a two-tailed test, find the critical values Zα/2 and −Zα/2 from the standard normal distribution corresponding to the chosen α.
 For a one-tailed test, find the critical value Zα for a right-tailed test or −Zα for a left-tailed test.
5. Make the Decision
 Compare the calculated Z value to the critical value(s):
 For a two-tailed test, if Z lies outside the range −Zα/2 to Zα/2, reject the null hypothesis.
 For a one-tailed test, if Z is greater than Zα (right-tailed) or less than −Zα (left-tailed), reject the null hypothesis.
6. Interpret the Results
 If you reject the null hypothesis, it suggests that there is enough evidence to say that the means of the two
populations significantly differ.
 If you fail to reject the null hypothesis, it suggests that there is not enough evidence to say that the means of
the two populations significantly differ.
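A minimal Python sketch of the difference-of-means procedure above (all numbers are hypothetical; a two-tailed test at α = 0.05):

    import math
    from statistics import NormalDist

    # Hypothetical summary statistics for two independent samples
    mean1, sigma1, n1 = 52.0, 8.0, 50
    mean2, sigma2, n2 = 49.0, 7.5, 60
    alpha = 0.05

    # Z statistic for the difference of means (known population variances)
    z = (mean1 - mean2) / math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

    # Two-tailed critical value Z_(alpha/2)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    print(f"Z = {z:.3f}, critical value = ±{z_crit:.3f}")
    print("Reject H0" if abs(z) > z_crit else "Fail to reject H0")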

12. What is stratified sampling? What are its merits and demerits?


In a stratified sample, researchers divide a population into homogeneous subpopulations called strata (the plural of
stratum) based on specific characteristics (e.g., race, gender identity, location). Every member of the population
studied should be in exactly one stratum.
Each stratum is then sampled using another probability sampling method, such as cluster or simple random
sampling, allowing researchers to estimate statistical measures for each subpopulation.
Researchers rely on stratified sampling when a population’s characteristics are diverse and they want to ensure that
every characteristic is properly represented in the sample.
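A minimal sketch of proportional stratified sampling in Python (the strata and sizes are hypothetical):

    import random

    # Hypothetical population grouped into two strata
    strata = {
        "north": list(range(1, 61)),    # 60 members
        "south": list(range(61, 101)),  # 40 members
    }
    sample_fraction = 0.10

    stratified_sample = []
    for members in strata.values():
        k = max(1, round(len(members) * sample_fraction))  # proportional allocation
        stratified_sample.extend(random.sample(members, k))

    print(stratified_sample)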
Advantages
1. Unbiased in nature
When the population is divided into a number of homogeneous groups according to purposive characteristics and the technique of random selection is then used to gather samples from each stratum, a well-prepared and well-executed stratified random sampling plan avoids the disadvantages of purposive sampling and of simple random sampling while still enjoying the benefits of both methods.
2. Higher accuracy
Compared to regular random sampling, stratified random sampling offers more accurate estimates since the
variability within each and every stratum is reduced.
3. Efficiency in survey execution
Stratified sampling can make data collection easier and reduce survey expenses.
4. Reliable source for sampling
For various demographic groups, it is occasionally desirable to achieve distinct levels of accuracy. The only sampling
strategy that permits us to obtain findings with known precision for each stratum is stratified random sampling.
Disadvantages
1. Dependency on other factors
The effective division of the population into homogeneous strata and the appropriate size of the sample to be
obtained from each stratum are essential for stratified random sampling to be successful.
2. Issue with value application
The values that must be applied to the various strata in a disproportional stratified sample must be accurate;
otherwise, the samples will not be fair and may produce biased results.
3. Needs proper focus
Stratified sampling yields benefits only when the researchers are able to create subgroups that are reasonably homogeneous relative to the overall population.

13. Explain MAE and MAPE as measures of the performance of a regression model.
The mean absolute error (MAE) is the simplest regression error metric to understand. We calculate the residual for every data point, taking only the absolute value of each so that negative and positive residuals do not cancel out, and then take the average of these absolute residuals: MAE = (1/n) Σ |yᵢ − ŷᵢ|.

The MAE is also the most intuitive of the metrics since we’re just looking at the absolute difference between the data
and the model’s predictions. Because we use the absolute value of the residual, the MAE does not indicate
underperformance or overperformance of the model.

The mean absolute percentage error (MAPE) is the percentage equivalent of MAE. The equation looks just like that of MAE, but with adjustments to convert everything into percentages: MAPE = (100%/n) Σ |(yᵢ − ŷᵢ)/yᵢ|. The MAPE is how far the model's predictions are off from their corresponding actual values, on average, expressed as a percentage.

This formula helps us understand one of the important caveats when using MAPE. In order to calculate this metric,
we need to divide the difference by the actual value. This means that if you have actual values close to or at 0 then
your MAPE score will either receive a division by 0 error, or be extremely large. Therefore, it is advised to not use
MAPE when you have actual values close to 0.

The MAPE is a commonly used measure in machine learning because of how easy it is to interpret. The lower the
value for MAPE, the better the machine learning model is at predicting values. Inversely, the higher the value for
MAPE, the worse the model is at predicting values.
for example, if we calculate a MAPE value of 20% for a given machine learning model, then the average difference
between the predicted value and the actual value is 20%. MAPE will favor models that are under-forecast rather than
over-forecast.
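A minimal sketch computing both metrics for hypothetical actual and predicted values:

    actual    = [100.0, 120.0, 80.0, 150.0, 110.0]  # hypothetical observed values
    predicted = [ 98.0, 130.0, 75.0, 160.0, 105.0]  # hypothetical model predictions

    n = len(actual)
    mae  = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    mape = sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n * 100  # actuals must be non-zero

    print(f"MAE  = {mae:.2f}")
    print(f"MAPE = {mape:.2f}%")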

14. What is diagrammatic representation of data? What are its advantages and disadvantages?


15. Method of maximum likelihood? Advantages and disadvantages.

Advantages of the Maximum Likelihood Method


Maximum likelihood provides a consistent approach to parameter estimation problems. This means that maximum
likelihood estimates can be developed for a large variety of estimation situations. For example, they can be applied in
reliability analysis to censored data under various censoring models.
Maximum likelihood methods have desirable mathematical and optimality properties: as the sample size increases, the estimates become asymptotically unbiased and minimum-variance (efficient), and they are approximately normally distributed, which allows approximate confidence intervals and hypothesis tests to be constructed for the parameters.
Several popular statistical software packages provide excellent algorithms for maximum likelihood estimation for many of the commonly used distributions. This helps mitigate the computational complexity of maximum likelihood estimation.
Disadvantages of the Maximum Likelihood Method
The likelihood equations need to be specifically worked out for a given distribution and estimation problem. The
mathematics is often non-trivial, particularly if confidence intervals for the parameters are desired.
The numerical estimation is usually non-trivial. Except for a few cases where the maximum likelihood formulas are
in fact simple, it is generally best to rely on high quality statistical software to obtain maximum likelihood estimates.
Fortunately, high quality maximum likelihood software is becoming increasingly common.
Maximum likelihood estimates can be heavily biased for small samples, and the optimality properties may not apply for small samples.
 Maximum likelihood can be sensitive to the choice of starting values.
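As a hedged illustration of the point about relying on numerical optimisation software, the sketch below fits the mean and standard deviation of a normal distribution by maximising the log-likelihood with SciPy; the data and starting values are made up.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    data = np.array([4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4, 4.7])  # hypothetical sample

    def neg_log_likelihood(params):
        mu, sigma = params
        if sigma <= 0:
            return np.inf                      # keep the scale parameter positive
        return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

    # Reasonable starting values matter: MLE can be sensitive to them
    result = minimize(neg_log_likelihood, x0=[data.mean(), data.std()], method="Nelder-Mead")
    mu_hat, sigma_hat = result.x
    print(mu_hat, sigma_hat)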

16. Consistency and unbiasedness of point estimators, with an example.
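A standard example, assuming an i.i.d. sample X1, …, Xn with finite mean μ and variance σ²: the sample mean X̄ is both unbiased and consistent for μ.

    E[\bar{X}] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \mu \qquad \text{(unbiasedness)}

    \operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n} \xrightarrow{\;n \to \infty\;} 0
    \quad\Rightarrow\quad \bar{X} \xrightarrow{P} \mu \qquad \text{(consistency, by Chebyshev's inequality)}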


19. List the methods of collecting statistical data. Which of these is most reliable and why?
20. Define a random variable and its mathematical expectation.
21. What are the tests of skewness?
22. What are the assumptions of multiple linear regression (MLR)?
23. What can you say about the maximum error?
