
How to take the samples using sample() in R?

JournalDev

Let's understand one of the most frequently used functions in R, sample(). In data analysis, taking
samples of the data is one of the most common tasks analysts perform. Sampling is often the best way
to study and understand a dataset, especially when working with big data.

R offers the standard function sample() to take a sample from a dataset. Many business and data
analysis problems require taking samples from data. Samples can be drawn with or without
replacement, as illustrated in the sections below.

Let’s roll into the topic!!!

Table of Contents
 1 Syntax of sample() in R
 2 Taking samples with replacement
 3 Samples Without Replacement in R
 4 Taking samples using the function set.seed()
 5 Taking the sample from a dataset
 6 Taking the samples from the dataset using the set.seed() function
 7 Generating a random sample using sample() in R
 8 Taking samples by setting the probabilities
 9 Wrapping up

Syntax of sample() in R

sample(x, size, replace = FALSE, prob = NULL)


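 x – a vector of elements to choose from, or a single positive number n (in which case sampling
   is from 1:n).
 size – the number of items to choose; defaults to length(x).
 replace – whether sampling is with replacement (TRUE) or without replacement (FALSE, the default).
 prob – an optional vector of probability weights, one for each element of x.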
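One detail of the x argument is easy to miss: when x is a single number n (n >= 1), sample() draws from 1:n instead of returning that number. The sketch below illustrates this. The vector v is hypothetical, and the resample() helper mirrors the one given in the examples of R's own ?sample help page for vectors that might have length 1.

```r
# sample(n) with a single number n samples from 1:n
perm <- sample(10)          # a random permutation of 1:10
print(perm)

# The same call on a length-1 vector can be surprising:
v <- c(10)                  # hypothetical vector that happens to have one element
print(sample(v))            # samples from 1:10, not from c(10)

# Safer helper (as in the examples of R's ?sample help page):
# it always samples from the elements of x, whatever its length
resample <- function(x, ...) x[sample.int(length(x), ...)]
print(resample(v))          # always returns 10
```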

Taking samples with replacement

You may wonder: what does taking samples with replacement mean?

Sampling with replacement means that each drawn value is put back before the next draw, so the same
value can be picked more than once. In sample(), you enable this by specifying replace=TRUE (or the
shorthand T).

The examples below show the default behaviour first and then sampling with replacement.


#shuffle the numbers 1 to 5 (size defaults to the length of x)
x <- sample(1:5)
#prints the samples
x
#Output -> 3 2 1 5 4

#sample range is 1 to 5 and the number of samples is 3
x <- sample(1:5, 3)
#prints the samples (3 samples)
x
#Output -> 2 4 5

#sample range is 1 to 5 and the number of samples is 6
x <- sample(1:5, 6)
#throws an error: without replacement, at most 5 values can be drawn from 1:5
#Error in sample.int(length(x), size, replace, prob) :
#  cannot take a sample larger than the population when 'replace = FALSE'

#specifying replace=TRUE (or T) allows repetition of values, so the function
#can generate 6 samples in the range 1 to 5. Here 2 and 4 are repeated.
x <- sample(1:5, 6, replace=TRUE)
x
#Output -> 2 4 2 2 4 3
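To see the repetition more clearly, you can draw more values than there are distinct ones and count them. A minimal sketch, assuming a six-sided die as the population (the seed and the variable names are arbitrary):

```r
set.seed(42)                                # any seed; fixes the result for this run
rolls <- sample(1:6, 20, replace = TRUE)    # 20 draws from only 6 possible values
print(rolls)

# table() counts how many times each value was drawn;
# counts above 1 show the repetition that replace = TRUE allows
counts <- table(rolls)
print(counts)

# duplicated() flags every draw that repeats an earlier one
print(sum(duplicated(rolls)))
```

Since only 6 distinct values exist, at least 14 of the 20 draws must be repeats.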

Samples Without Replacement in R

In this section, we take samples without replacement.

Sampling without replacement is specified with the argument replace=F (or FALSE, which is also the
default). Each value can be drawn at most once, so the sample size cannot exceed the number of
values in x.

#samples without replacement
x <- sample(1:8, 7, replace=FALSE)
x
#Output -> 4 1 6 5 3 2 7

#asking for 9 values from only 8 throws an error
x <- sample(1:8, 9, replace=FALSE)
#Error in sample.int(length(x), size, replace, prob) :
#  cannot take a sample larger than the population when 'replace = FALSE'

#here the sample size equals the number of values in 1:5, so every value appears exactly once
x <- sample(1:5, 5, replace=FALSE)
x
#Output -> 5 4 1 3 2
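The guarantee that no value repeats holds for any vector, not only numbers. In this sketch the cards vector is purely illustrative:

```r
cards <- c("A", "K", "Q", "J", "10")     # illustrative values
hand <- sample(cards, 3)                 # replace = FALSE is the default
print(hand)

# Without replacement no element can appear twice ...
print(anyDuplicated(hand) == 0)
# ... and every drawn element comes from the original vector
print(all(hand %in% cards))
```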

Taking samples using the function set.seed()

As you may have noticed, samples are random and change each time you run the code. If you don't
want different samples each time, for example when results need to be reproducible, you can use the
set.seed() function.

set.seed() – sets the starting point (seed) of R's random number generator, so the same code
produces the same sequence of random values every time it runs.

Execute the code below; it returns the same random sample each time.

#set the seed
set.seed(5)
#takes the random samples with replacement
sample(1:5, 4, replace=TRUE)
#Output -> 2 3 1 3

set.seed(5)
sample(1:5, 4, replace=TRUE)
#Output -> 2 3 1 3

set.seed(5)
sample(1:5, 4, replace=TRUE)
#Output -> 2 3 1 3
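You can also check reproducibility directly by comparing two samples drawn after the same seed. Note that R 3.6.0 changed the default algorithm behind sample() (see the sample.kind argument of RNGkind()), so the exact numbers produced by a given seed can differ between R versions, although they repeat reliably within one version.

```r
set.seed(5)
first <- sample(1:5, 4, replace = TRUE)

set.seed(5)                          # reset to the same seed ...
second <- sample(1:5, 4, replace = TRUE)

print(identical(first, second))      # ... and the same sample comes out

# Without resetting the seed, the generator moves on,
# so the next call generally returns a different sample
third <- sample(1:5, 4, replace = TRUE)
print(third)
```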

Taking the sample from a dataset

In this section, we generate samples from a dataset in R.

The code below takes a random sample of 10 rows from the built-in 'ToothGrowth' dataset and
displays them. In this way, you can take samples of any required size from a dataset.

#sample 10 row numbers from the 'ToothGrowth' dataset
rows <- sample(1:nrow(ToothGrowth), 10)
rows
#Output -> 53 12 16 26 37 27 9 22 28 10
#display the 10 sampled rows
ToothGrowth[rows, ]

    len supp dose
53 22.4   OJ  2.0
12 16.5   VC  1.0
16 17.3   VC  1.0
26 32.5   VC  2.0
37  8.2   OJ  0.5
27 26.7   VC  2.0
9   5.2   VC  0.5
22 18.5   VC  2.0
28 21.5   VC  2.0
10  7.0   VC  0.5
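If you sample rows often, the two steps above can be wrapped in a small function. The name sample_rows() is hypothetical, not part of base R; it is just a sketch of the same idea:

```r
# Hypothetical helper: draw n random rows from a data frame.
# drop = FALSE keeps the result a data frame even when it has one column.
sample_rows <- function(data, n, replace = FALSE) {
  rows <- sample.int(nrow(data), n, replace = replace)
  data[rows, , drop = FALSE]
}

set.seed(1)
tg <- sample_rows(ToothGrowth, 10)
print(tg)
```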

Taking the samples from the dataset using the set.seed() function

In this section, we are going to use the set.seed() function to take the samples from the dataset.

Execute the code below to generate a reproducible sample from the iris dataset using set.seed().

#fix the seed so the same rows are drawn every time
set.seed(10)
#take a sample of 10 rows from the iris dataset
x <- sample(1:nrow(iris), 10)
x
#Output -> 137 74 112 72 88 15 143 149 24 13
#displays the 10 rows
iris[x, ]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
137 6.3 3.4 5.6 2.4 virginica
74 6.1 2.8 4.7 1.2 versicolor
112 6.4 2.7 5.3 1.9 virginica
72 6.1 2.8 4.0 1.3 versicolor
88 6.3 2.3 4.4 1.3 versicolor
15 5.8 4.0 1.2 0.2 setosa
143 5.8 2.7 5.1 1.9 virginica
149 6.2 3.4 5.4 2.3 virginica
24 5.1 3.3 1.7 0.5 setosa
13 4.8 3.0 1.4 0.1 setosa

You will get the same rows when you execute the code multiple times. The values won’t change as
we have used the set.seed() function.
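A common use of a seeded row sample is splitting a dataset into training and test sets. A minimal sketch, assuming a 70/30 split of iris is wanted (the proportion and variable names are illustrative):

```r
set.seed(10)
n <- nrow(iris)
# Pick 70% of the row indices for training (105 of the 150 rows)
train_idx <- sample(n, size = round(0.7 * n))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]     # every row that was not drawn
cat(nrow(train), nrow(test), "\n")
```

Negative indexing with -train_idx selects the remaining rows, so the two sets never overlap, and the seed makes the same split come out on every run.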

Generating a random sample using sample() in R

Let's understand this concept with the help of a small problem.


Problem: A gift shop has decided to give a surprise gift to one of its customers. For this purpose,
they have collected a list of customer names. The task is to choose one name at random from the list.

Hint: use the sample() function to generate random samples.

As you can see below, every time you run this code, it picks a name at random, so the result can
change from run to run.

#creates a vector of names and picks one name at random from it
customers <- c('jack','Rossie','Kyle','Edwards','Joseph','Paloma','Kelly','Alok','Jolie')
sample(customers, 1)
#Output -> "Rossie"
sample(customers, 1)
#Output -> "Jolie"
sample(customers, 1)
#Output -> "jack"
sample(customers, 1)
#Output -> "Edwards"
sample(customers, 1)
#Output -> "Kyle"
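The same idea extends to choosing several winners. The sketch below stores the names once in a hypothetical customers vector; because sampling is without replacement by default, no customer can win twice:

```r
customers <- c('jack', 'Rossie', 'Kyle', 'Edwards', 'Joseph',
               'Paloma', 'Kelly', 'Alok', 'Jolie')

# One winner, as above
winner <- sample(customers, 1)
print(winner)

# Three different winners: replace = FALSE (the default)
# guarantees that no customer is picked twice
winners <- sample(customers, 3)
print(winners)

# set.seed() makes the draw repeatable, e.g. so it can be checked later
set.seed(2024)
print(sample(customers, 1))
```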

Taking samples by setting the probabilities

With the help of the above examples and concepts, you have seen how to generate random samples and
extract specific data from a dataset.

R also lets you assign a probability to each value through the prob argument, which is useful for
many problems. Let's see how it works with the help of a simple example.

Suppose a company manufactures 10 watches, and each watch has a 20% chance of being defective. The
code below simulates the quality of these 10 watches.

#each watch is 'Good' with probability 0.80 and 'Defective' with probability 0.20
sample(c('Good','Defective'), size=10, replace=TRUE, prob=c(.80,.20))
#Output -> "Good" "Good" "Good" "Defective" "Good" "Good" "Good" "Good" "Defective" "Good"

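With only 10 watches, the number of defective ones varies from run to run. Drawing a much larger batch makes the 80/20 split visible; this sketch uses an arbitrary seed and batch size:

```r
set.seed(123)
# Simulate a much larger batch so the proportions are easier to see
batch <- sample(c('Good', 'Defective'), size = 100000, replace = TRUE,
                prob = c(0.80, 0.20))

# Share of each outcome; with this many draws it should be close to 0.20 / 0.80
print(prop.table(table(batch)))
```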
You can also try different probabilities, for example 60% good and 40% defective:

sample(c('Good','Defective'), size=10, replace=TRUE, prob=c(.60,.40))
#Output -> "Good" "Defective" "Good" "Defective" "Defective" "Good" "Good" "Good" "Defective" "Good"

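The values in prob are treated as weights: according to R's ?sample documentation they need not sum to one, and R rescales them internally. So prob=c(3, 2) describes the same 60/40 split. A quick check, with an arbitrary seed:

```r
set.seed(7)
# Weights 3:2 are rescaled internally to 0.6 : 0.4
batch <- sample(c('Good', 'Defective'), size = 100000, replace = TRUE,
                prob = c(3, 2))

# Share of each outcome; should be close to Good 0.6, Defective 0.4
print(prop.table(table(batch)))
```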
Wrapping up

In this tutorial, you have learned how to draw samples from vectors and datasets, with or without
replacement, and with probability weights. The set.seed() function is helpful when you need to
reproduce the same sequence of samples.

Try taking samples from the various datasets available in R. You can also import your own CSV files
and take samples from them, with probability adjustments as shown above.
