
Probability and Statistics

Chapter 1
Introduction to Statistics

Prof. Simón Ortiz

School of Economics and Business


Universidad de Navarra

2021-22

Contents

1 Introduction

2 Population and sample

3 Types of variables

4 Software: R and Stata

Introduction

• This course introduces the basic concepts of Probability &
Statistics. Using different statistical techniques and datasets,
students will be able to draw scientific conclusions about particular
economic or business problems.
• Sessions:
• First day: Tuesday
• Second day: Thursday
• Professors:
• Simón Ortiz (sortiz3@alumni.unav.es).
Office: 2300
Office hours: Tuesday 8:00 - 10:00 & Thursday 10:00 - 12:00.
By appointment (email)
• Juncal Cuñado (jcunado@unav.es).
Office: 2180
Office hours: TBA

Program

1 Introduction to Statistics
2 Descriptive Statistics
3 Probability
4 Probability Models
5 Statistical Inference
6 One-sample hypothesis tests
7 Two-sample hypothesis tests
8 Analysis of Variance
9 Simple Linear Regression

Educational activities

• Lectures: theoretical and practical classes.
• Group assignment.
• Personal study and teamwork.
• Midterm and final exams.

Course plan
Month      Week  Lecture      Assessment
September  1     Chapter 1-2
September  2     Chapter 2
September  3     Chapter 3
September  4     Chapter 3-4
September  5     Chapter 4    Group assignment
October    1     Chapter 5    Midterm
October    2     Chapter 5-6
October    3     Chapter 6
October    4     Chapter 6-7
November   1     Chapter 7
November   2     Chapter 7-8  Group assignment
November   3     Chapter 8
November   4     Chapter 9
December                      Final exam
Table 1: Course plan
Assessment

• Group assignment (20%) - September 30 & November 15 (at noon)
• Midterm exam (25%) - October 8 (13:00 - 15:00)
• Final exam (55%) - December 14
Students must achieve at least a 4 in the final exam to pass the
course.
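
The weights above can be turned into a small calculation. The sketch below is only an illustration: it assumes that all grades are on a 0-10 scale and that 5 is the passing mark (neither is stated on this slide), and the function name and the example grades are hypothetical.

```python
# Weights taken from the assessment slide.
WEIGHTS = {"assignment": 0.20, "midterm": 0.25, "final": 0.55}
MIN_FINAL = 4.0   # minimum final-exam grade required to pass the course
PASS_MARK = 5.0   # ASSUMPTION: passing mark on a 0-10 scale (not stated in the slides)


def course_grade(assignment, midterm, final):
    """Return (weighted grade, passed?) under the assumptions above."""
    weighted = (WEIGHTS["assignment"] * assignment
                + WEIGHTS["midterm"] * midterm
                + WEIGHTS["final"] * final)
    # Both conditions must hold: the minimum in the final and the overall pass mark.
    passed = final >= MIN_FINAL and weighted >= PASS_MARK
    return round(weighted, 2), passed


# Hypothetical grades, for illustration only.
print(course_grade(8.0, 6.0, 5.0))     # 0.20*8 + 0.25*6 + 0.55*5
print(course_grade(10.0, 10.0, 3.5))   # final exam below the minimum of 4
```

In the second call the weighted grade is above 5, but the student would still not pass because the final exam grade is below 4.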

Teaching material

• Slides on ADI
• Notes from Lectures
• Newbold, P., Carlson, W.L., Thorne, B., (2012), “Statistics for
Business and Economics”, Prentice Hall, 8th edition.

Software

• R (RStudio) for Probability (Chapters 1-4)
• Stata for Statistics (Chapters 5-9)

How to study

• Studying a little every day is the recommended approach. Trying to
study the whole program two weeks before the exam is the surest way to fail.
• Read the slides before coming to the lectures.
• Try to solve the exercises before looking at the solutions.
Hard work pays off!

Motivation

Why do we study probability and statistics?


• Core course needed to understand later ones: Econometrics, Time Series...
• First step for Data Analysis, Machine Learning...
• Selection processes
• GMAT & GRE
Moneyball clip
21 Blackjack clip

Definitions

• Probability: the mathematical tools needed to perform inference
on the data.
• Statistics: the science of collecting, organizing, presenting, analysing
and interpreting data to assist in making more effective decisions.
• Descriptive Statistics: focuses on graphical and numerical procedures
that are used to summarize and process data. In other
words, collecting, describing and summarizing data.
• Inferential Statistics: focuses on using the data to make predictions,
forecasts, and estimates to make better decisions. In sum,
drawing conclusions from data.
Example: assessing the efficacy of a new drug:
1 Collect data, e.g. two groups of volunteers receiving (without
knowing which group they are in) either the drug or a placebo.
Describe and summarize the data (tables, charts...).
2 Infer conclusions from the data, e.g. is the new drug effective?
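
The two steps of the drug example can be sketched in a few lines. This is a minimal illustration, not the method used later in the course: the data are simulated (not from any real trial), the group sizes and means are arbitrary assumptions, and the inference step uses a permutation test chosen here only because it needs nothing beyond the Python standard library.

```python
import random
import statistics

random.seed(1)  # fixed seed so the simulated data are reproducible

# SIMULATED outcomes (hypothetical improvement scores), 30 volunteers per group.
drug = [random.gauss(5.0, 2.0) for _ in range(30)]
placebo = [random.gauss(4.0, 2.0) for _ in range(30)]

# Step 1 (descriptive statistics): summarize each group.
mean_drug = statistics.mean(drug)
mean_placebo = statistics.mean(placebo)
observed_diff = mean_drug - mean_placebo


# Step 2 (inferential statistics): how often would a difference at least this
# large appear if the group labels were assigned purely by chance?
def permutation_p_value(a, b, n_perm=2000):
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = a + b  # new list, so the original groups are not modified
    extreme = 0
    for _ in range(n_perm):
        random.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:len(a)])
                   - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    return extreme / n_perm


p_value = permutation_p_value(drug, placebo)
print(f"difference in means: {observed_diff:.2f}, p-value: {p_value:.3f}")
```

A small p-value would suggest that the observed difference is unlikely to be due to chance alone; the descriptive step by itself cannot answer that question.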
Population

• A Population is the complete set of all items that interest an
investigator. Population size, N, can be very large or even
infinite.
• In other words, a population is the total collection of elements
of interest.
Examples: all potential buyers of a new product, all stocks traded
on the NYSE, all registered voters in a particular city or country...

Sample

• A Sample is an observed subset (or portion) of a population,
with sample size given by n. In other words, a subgroup of the
population.
• Simple Random Sampling is a procedure used to select a sample of
n objects from a population in such a way that each member
of the population is chosen strictly by chance, the selection
of one member does not influence the selection of any other
member, each member of the population is equally likely to be
chosen, and every possible sample of a given size, n, has the
same chance of selection. The resulting sample is called a Random Sample.
• Inferential Statistics usually involves drawing conclusions about
the whole population using information from a (random) sample.
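
As a rough illustration of the random-sampling procedure described above, the sketch below draws a sample without replacement using Python's standard `random.sample`. The population of ID numbers and the sizes N = 10,000 and n = 500 are hypothetical.

```python
import random

random.seed(42)  # fixed seed only to make the illustration reproducible

# HYPOTHETICAL population: ID numbers of N = 10,000 registered voters.
population = list(range(1, 10_001))
n = 500

# random.sample selects n distinct members without replacement, so that
# every possible subset of size n has the same chance of being chosen.
sample = random.sample(population, n)

print(len(sample), len(set(sample)))  # sample size and number of distinct members
```

Selecting, say, the first 500 IDs in the list would also give a sample of size n, but not a random one, because most members of the population would have no chance of being chosen.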

Parameter and statistic

• A Parameter is a numerical measure that describes a specific
characteristic of a population.
• A Statistic is a numerical measure that describes a specific
characteristic of a sample.
Example: We want to know the average age of registered voters in
the US. The population is too big, so we take only a random sample (500
US voters) and calculate their average age.
Because this average is based on sample data, it is called a statistic.
If we were able to calculate the average age of the entire population,
then the resulting average would be called a parameter.
• Sampling error: the difference between a statistic and the parameter
it estimates. It arises because information is available on only a
subset of all the population members.
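
The voter-age example can be mimicked with simulated data. In the sketch below the "population" is a hypothetical list of 100,000 ages drawn uniformly between 18 and 90, which is an assumption made only for illustration and not a realistic age distribution; the point is just to show the parameter, the statistic and the sampling error side by side.

```python
import random
import statistics

random.seed(7)  # fixed seed so the illustration is reproducible

# SIMULATED population of N = 100,000 voter ages (uniform on 18-90 by assumption).
population_ages = [random.randint(18, 90) for _ in range(100_000)]

# Parameter: computed from the whole population (rarely possible in practice).
parameter = statistics.mean(population_ages)

# Statistic: computed from a random sample of n = 500 voters.
sample_ages = random.sample(population_ages, 500)
statistic = statistics.mean(sample_ages)

# Sampling error: how far the statistic falls from the parameter.
sampling_error = statistic - parameter

print(f"parameter = {parameter:.2f}")
print(f"statistic = {statistic:.2f}")
print(f"sampling error = {sampling_error:.2f}")
```

Drawing a different random sample would give a different statistic and therefore a different sampling error, while the parameter stays fixed.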

Types of variables
• Qualitative variable: nonnumeric responses that belong to groups
or categories. Also called a categorical or string variable.
• Examples: car ownership, gender, state of birth, eye color.
• Quantitative variable: numeric. Also called a numerical variable.
• Discrete numerical variable: may (but does not necessarily) have
a finite number of values. The most common type of discrete
numerical variable produces a response that comes from a counting
process.
Examples: number of students enrolled in the class, number of
cars sold in 2010, number of stocks in an investor’s portfolio.
• Continuous numerical variable: may take on any value within a
given range of real numbers and usually arises from a measurement
(not a counting) process.
Examples: class time until break, weight, salary, height.
• How to differentiate between discrete and continuous variables?
A continuous variable can only be recorded up to the precision of
the measurement instrument used, so the recorded value may deviate
from the true value by a small amount. For the same reason, we can
truncate (or round) continuous variables and treat them as discrete.
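
The classification above can be illustrated with a single made-up survey record. The sketch below is a rough heuristic under stated assumptions: the record, its field names and the helper `variable_type` are hypothetical, and classifying by Python type is only a convenience for this example.

```python
# One HYPOTHETICAL survey record with one variable of each type.
record = {
    "eye_color": "brown",   # qualitative (categorical): a label, not a number
    "cars_owned": 2,        # discrete numerical: the result of counting
    "height_m": 1.7349,     # continuous numerical: the result of measuring
}


def variable_type(value):
    """Rough classification by Python type; an illustrative heuristic only."""
    if isinstance(value, str):
        return "qualitative"
    if isinstance(value, int):
        return "discrete numerical"
    if isinstance(value, float):
        return "continuous numerical"
    raise TypeError(f"unsupported value: {value!r}")


for name, value in record.items():
    print(name, "->", variable_type(value))

# Recording height to the nearest centimetre turns the measurement into a
# whole number of centimetres, which can be treated as a discrete variable.
height_cm = round(record["height_m"] * 100)
print(height_cm, "->", variable_type(height_cm))
```

The heuristic fails for numeric codes that stand for categories (for example, a postal code stored as an integer), which are qualitative even though they look like numbers.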
R

• R is a programming language and free software environment for
statistical computing and graphics supported by the R Foundation
for Statistical Computing (http://www.r-project.org/).
• Available on Windows, macOS, Linux... Its interface looks slightly
austere.
• But it can be used in combination with RStudio (http://rstudio.org/).

Stata

• Stata is a general-purpose statistical software package developed
by StataCorp for data manipulation, visualization, statistics,
and automated reporting.
• It is not free.
• Easier to use than R, but less powerful and less feature-rich.
• It is installed on the university computers.
