Distributions Demo
Uniform distribution
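The uniform distribution is one of the simplest and most useful continuous distributions. Its support
is defined by two parameters, a and b, which are its minimum and maximum values. The probability
density function of the continuous uniform distribution is:
$$ f(x) = \frac{1}{b-a} \text{ for } a \le x \le b, \qquad f(x) = 0 \text{ otherwise} $$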
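# Fitting a uniform distribution to the given data
# (assumes data_uniform, a and b were defined when the samples were generated)
from scipy.stats import uniform
# uniform.fit returns (loc, scale); the fitted support is [loc, loc + scale]
loc_fit, scale_fit = uniform.fit(data_uniform)
a_fit, b_fit = loc_fit, loc_fit + scale_fit
print(f"actual a and b parameters : {a},{b}")
print(f"fitted a and b parameters : {a_fit},{b_fit}")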
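For reference, the maximum-likelihood estimates of a and b for a uniform distribution are simply the sample minimum and maximum: the likelihood (b − a)^(−n) grows as the interval shrinks, but every sample must stay inside [a, b]. The sketch below shows this with the Python standard library only; the true values a = 2 and b = 5, the seed, and the helper `fit_uniform_mle` are illustrative assumptions, not part of the original notebook.

```python
import random


def fit_uniform_mle(samples):
    """Maximum-likelihood estimates of (a, b) for a continuous uniform distribution.

    The likelihood (b - a)^(-n) is largest for the narrowest interval that
    still contains every sample, i.e. [min(samples), max(samples)].
    """
    return min(samples), max(samples)


rng = random.Random(42)          # fixed seed for reproducibility
a, b = 2.0, 5.0                  # illustrative true parameters (assumed)
data_uniform = [rng.uniform(a, b) for _ in range(10_000)]

a_fit, b_fit = fit_uniform_mle(data_uniform)
print(f"actual a and b parameters : {a},{b}")
print(f"fitted a and b parameters : {a_fit:.4f},{b_fit:.4f}")
```

With 10,000 samples the estimates land within a few thousandths of the true endpoints, always slightly inside them.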
Normal distribution
A normal distribution has a bell-shaped density curve described by its mean μ and
standard deviation σ. The curve is symmetric and centered at the mean, and its spread is set
by the standard deviation: values near the mean occur more frequently than values far from
the mean. Its pdf is:
$$ f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) $$
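The pdf formula can be evaluated directly to confirm the properties described above: the peak sits at the mean and the curve is symmetric about it. This is a standard-library sketch; the helper `normal_pdf` is a hypothetical name, and the cross-check uses `statistics.NormalDist` (Python 3.8+) rather than scipy.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Evaluate 1/(sigma*sqrt(2*pi)) * exp(-(x - mu)^2 / (2*sigma^2))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# The density peaks at the mean; for mu=0, sigma=1 the peak is 1/sqrt(2*pi).
peak = normal_pdf(0.0)
print(f"pdf at the mean : {peak:.4f}")

# Symmetric about mu: points at equal distance on either side have equal density.
print(normal_pdf(-1.5) == normal_pdf(1.5))

# Cross-check against the standard library's implementation.
print(abs(normal_pdf(1.2, 0, 1) - NormalDist(0, 1).pdf(1.2)) < 1e-12)
```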
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
# For parameters μ=0, σ=1
mu=0
sigma=1
x = np.linspace(norm.ppf(0.01, mu, sigma),norm.ppf(0.99, mu, sigma),
100)
plt.plot(x, norm.pdf(x, mu, sigma),'r-', lw=3, alpha=0.6,
label='Normal pdf')
plt.legend()
plt.xlabel('X-variable')
plt.ylabel('pdf')
plt.show()
plt.plot(x, norm.cdf(x, mu, sigma),'g-', lw=3, alpha=0.6,
label='Normal cdf')
plt.legend()
plt.xlabel('X-variable')
plt.ylabel('cdf')
plt.show()
# Generating 100000 samples and plotting histogram
n = 100000
data_norm = norm.rvs(size=n, loc = 0, scale=1,random_state =3)
fig,ax=plt.subplots(figsize=(16,6))
ax.hist(data_norm,bins=1000)
ax.set(xlabel='Normal ', ylabel='Frequency')
plt.show()
Exponential distribution
The exponential distribution describes the time between events in a Poisson point process,
i.e., a process in which events occur continuously and independently at a constant average
rate. It has a single parameter λ > 0, called the rate parameter, and its PDF is:
$$ f(x; \lambda) = \lambda e^{-\lambda x}, \qquad x \ge 0 $$
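Because λ is a rate while scipy's `expon` is parameterized by scale = 1/λ, the two are easy to mix up. The standard-library sketch below uses `random.expovariate`, which takes the rate λ directly, and checks that the sample mean is close to 1/λ and that 1/mean recovers λ (the maximum-likelihood estimate of the rate). The seed and sample size are illustrative assumptions.

```python
import random
import statistics

Lambda = 2.0                      # rate parameter λ
rng = random.Random(42)           # fixed seed for reproducibility

# random.expovariate takes the *rate* λ, whereas scipy.stats.expon takes
# scale = 1/λ; both describe the density λ·exp(-λx) for x >= 0.
samples = [rng.expovariate(Lambda) for _ in range(10_000)]

mean = statistics.fmean(samples)  # should be close to 1/λ = 0.5
lambda_hat = 1 / mean             # maximum-likelihood estimate of the rate
print(f"sample mean   : {mean:.4f}")
print(f"fitted lambda : {lambda_hat:.4f}")
```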
# For parameter λ = 2
# scipy's expon takes a scale parameter scale = 1/λ and a shift parameter loc
from scipy.stats import expon
Lambda=2
x = np.linspace(expon.ppf(0.01, loc=0, scale=1/Lambda),
                expon.ppf(0.99, loc=0, scale=1/Lambda), 100)
plt.plot(x, expon.pdf(x, loc=0, scale=1/Lambda),'r-', lw=3, alpha=0.6,
         label='Exponential pdf')
plt.legend()
plt.xlabel('X-variable')
plt.ylabel('pdf')
plt.show()
plt.plot(x, expon.cdf(x, loc=0, scale=1/Lambda),'g-', lw=3, alpha=0.6,
         label='Exponential cdf')
plt.legend()
plt.xlabel('X-variable')
plt.ylabel('cdf')
plt.show()
# Generating 10000 samples and plotting histogram
import seaborn as sns
n = 10000
# scale = 1/λ, so these samples have rate λ = Lambda
data_expon = expon.rvs(size=n, loc=0, scale=1/Lambda, random_state=42)
# distplot is deprecated in seaborn; histplot is its axes-level replacement
ax = sns.histplot(data_expon, bins=100, color='blue', alpha=0.2)
ax.set(xlabel='Exponential ', ylabel='Frequency')
plt.show()
actual lambda : 2
fitted lambda : 1.9549746397343684
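# Overlay the fitted pdf on the normalized histogram.
# Fit with loc fixed at 0; scipy's scale is 1/λ, so the fitted rate is 1/scale_fit.
_, scale_fit = expon.fit(data_expon, floc=0)
x = np.linspace(expon.ppf(0.01, loc=0, scale=scale_fit),
                expon.ppf(0.99, loc=0, scale=scale_fit), 100)
fig,ax=plt.subplots(figsize=(16,6))
ax.hist(data_expon,bins=100,density=True)
ax.plot(x,expon.pdf(x, loc=0, scale=scale_fit),lw=5)
plt.show()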
Beta distribution
The beta distribution is a family of continuous probability distributions defined on the interval
[0, 1] and parametrized by two positive shape parameters, denoted α and β, which appear as
exponents of the random variable and control the shape of the distribution. Its pdf is:
$$ f(x; \alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha-1} (1-x)^{\beta-1}, \qquad 0 < x < 1 $$
where Γ is the gamma function; for a positive integer n, Γ(n) = (n−1)!.
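The beta pdf can be evaluated with the standard library's `math.gamma`, which also makes it easy to check the claims above: Γ(n) = (n−1)! for integers, Beta(1, 1) is flat, and the density integrates to 1. The helper `beta_pdf` and the choice of Beta(2, 5) for the integral are illustrative assumptions; the integral uses a simple midpoint rule.

```python
import math


def beta_pdf(x, alpha, beta):
    """Beta density Γ(α+β)/(Γ(α)Γ(β)) · x^(α-1) · (1-x)^(β-1) on 0 < x < 1."""
    if not 0 < x < 1:
        return 0.0
    norm_const = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return norm_const * x ** (alpha - 1) * (1 - x) ** (beta - 1)


# For positive integers, Γ(n) = (n-1)!
print(math.gamma(5) == math.factorial(4))  # True

# Beta(1, 1) is flat: the density is 1 everywhere on (0, 1).
print([beta_pdf(x, 1, 1) for x in (0.1, 0.5, 0.9)])

# A midpoint-rule integral of Beta(2, 5) over (0, 1) should be close to 1.
n = 10_000
area = sum(beta_pdf((i + 0.5) / n, 2, 5) for i in range(n)) / n
print(f"area under Beta(2, 5): {area:.6f}")
```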
from scipy.stats import beta
# With a=1, b=1 the beta pdf is constant: Beta(1, 1) is the uniform distribution on [0, 1].
a=1
b=1
x = np.linspace(beta.ppf(0.01, a, b),beta.ppf(0.99, a, b), 100)
plt.plot(x, beta.pdf(x, a, b),'r-', lw=3, alpha=0.6, label='beta pdf')
plt.legend()
plt.xlabel('X-variable')
plt.ylabel('pdf')
plt.show()
plt.plot(x, beta.cdf(x, a, b),'g-', lw=3, alpha=0.6, label='beta cdf')
plt.legend()
plt.xlabel('X-variable')
plt.ylabel('cdf')
plt.show()
# Let us generate 10000 random numbers from the beta distribution.
# The histogram of Beta(1, 1) is flat, like a uniform distribution.
import seaborn as sns
data_beta = beta.rvs(a, b, size=10000, random_state=42)
# distplot is deprecated in seaborn; histplot is its axes-level replacement
ax = sns.histplot(data_beta, bins=100, color='blue', alpha=0.2)
ax.set(xlabel='Beta', ylabel='Frequency')
plt.show()
Try plotting histograms with different values of a and b to see the beta distribution take
different forms.
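# Fit a beta distribution to data generated with a = b = 10
# (assumes data_beta was regenerated, e.g. with beta.rvs(10, 10, size=10000), before this cell)
# beta.fit also estimates loc and scale unless they are fixed with floc= and fscale=
a1,b1,loc1,scale1 = beta.fit(data_beta)
print(f"fitted alpha : {a1}")
print(f"fitted beta : {b1}")
print(f"loc : {loc1}")
print(f"scale : {scale1}")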
fitted alpha : 9.146392200712828
fitted beta : 8.551425730437387
loc : 0.012318289332077735
scale : 0.940469126410479
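Because `beta.fit` estimated loc and scale as well as the two shapes, the fitted alpha and beta above are not directly comparable to the true a = b = 10; passing `floc=0` and `fscale=1` to `beta.fit` pins the support to [0, 1]. As a library-free cross-check, the sketch below swaps in a different technique, the method of moments, which solves the beta mean and variance equations for α and β. The seed, sample size and the helper `fit_beta_moments` are illustrative assumptions.

```python
import random
import statistics


def fit_beta_moments(samples):
    """Method-of-moments estimates (alpha, beta) for data on [0, 1].

    Matches the sample mean m and variance v to the beta moments
    m = α/(α+β) and v = αβ / ((α+β)^2 (α+β+1)).
    """
    m = statistics.fmean(samples)
    v = statistics.variance(samples)
    total = m * (1 - m) / v - 1       # estimate of α + β
    return m * total, (1 - m) * total


rng = random.Random(42)
sample = [rng.betavariate(10, 10) for _ in range(10_000)]
alpha_hat, beta_hat = fit_beta_moments(sample)
print(f"fitted alpha : {alpha_hat:.3f}")
print(f"fitted beta  : {beta_hat:.3f}")
```

With loc and scale held at 0 and 1, both estimates should come out close to 10.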
Gamma distribution
The gamma distribution is a two-parameter family of continuous probability distributions.
Although it is rarely used in its raw form, several popular distributions, such as the
exponential, chi-squared and Erlang distributions, are special cases of it.
The gamma distribution can be parameterized in terms of a shape parameter α = k and an
inverse scale parameter β = 1/θ, called the rate parameter.
# scipy's gamma takes the shape parameter a = α, a scale parameter scale = θ = 1/β,
# and a shift parameter loc.
# When a is a positive integer, gamma reduces to the Erlang distribution, and
# when a = 1, to the exponential distribution.
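Both special cases mentioned above can be checked numerically. This standard-library sketch evaluates the gamma pdf with shape k and scale θ, shows that k = 1 gives the exponential density with rate 1/θ, and compares `random.gammavariate` (whose second argument is the scale θ) with a sum of k exponential draws, the Erlang construction. The helpers `gamma_pdf` and `expon_pdf` and the values θ = 0.5, k = 3 are illustrative assumptions.

```python
import math
import random
import statistics


def gamma_pdf(x, k, theta):
    """Gamma density with shape k and scale θ: x^(k-1) e^(-x/θ) / (Γ(k) θ^k), x > 0."""
    if x <= 0:
        return 0.0
    return x ** (k - 1) * math.exp(-x / theta) / (math.gamma(k) * theta ** k)


def expon_pdf(x, rate):
    """Exponential density λ·exp(-λx) for x >= 0."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


# With k = 1 the gamma density equals the exponential density with rate 1/θ.
theta = 0.5
print(all(math.isclose(gamma_pdf(x, 1, theta), expon_pdf(x, 1 / theta))
          for x in (0.1, 1.0, 3.0)))

# With integer k (Erlang), a gamma variate is a sum of k independent
# exponentials with rate 1/θ, so both samplers have mean k·θ.
rng = random.Random(0)
k = 3
gamma_draws = [rng.gammavariate(k, theta) for _ in range(20_000)]
erlang_draws = [sum(rng.expovariate(1 / theta) for _ in range(k)) for _ in range(20_000)]
print(f"gamma mean  : {statistics.fmean(gamma_draws):.3f}")
print(f"erlang mean : {statistics.fmean(erlang_draws):.3f}  (k*theta = {k * theta})")
```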
Discrete distributions
• Bernoulli
• Binomial
• Poisson
• Geometric
from scipy.stats import bernoulli, binom, nbinom, poisson, geom
Bernoulli distribution
It is the discrete probability distribution of a random variable that takes the value 1 with
probability p and the value 0 with probability q = 1 − p. Its probability mass function over
the possible outcomes k is:
$$ f(k; p) = p^k (1-p)^{1-k}, \qquad k \in \{0, 1\} $$
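The pmf can be written directly from the formula, and the maximum-likelihood estimate of p is the fraction of ones in the sample. This standard-library sketch assumes p = 0.3 (matching the actual p reported below), an illustrative seed, and a hypothetical helper `bernoulli_pmf`.

```python
import random
import statistics


def bernoulli_pmf(k, p):
    """P(X = k) = p^k (1-p)^(1-k) for k in {0, 1}; zero elsewhere."""
    if k not in (0, 1):
        return 0.0
    return p ** k * (1 - p) ** (1 - k)


p = 0.3
print(bernoulli_pmf(1, p), bernoulli_pmf(0, p))   # p and 1 - p

# The maximum-likelihood estimate of p is the sample mean of the 0/1 outcomes.
rng = random.Random(1)
data_bern = [1 if rng.random() < p else 0 for _ in range(10_000)]
p_hat = statistics.fmean(data_bern)
print(f"actual p : {p}")
print(f"fitted p : {p_hat}")
```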
actual p : 0.3
fitted p : 0.2887
Binomial distribution
It can be used to model the number of successes in n independent Bernoulli trials. The
probability of getting exactly k successes in n trials, with success probability p for each
Bernoulli trial, is given by the probability mass function:
$$ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \qquad k = 0, 1, \dots, n $$
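The pmf above can be computed with `math.comb` (Python 3.8+). The sketch checks that it sums to 1 with mean n·p, then estimates p from simulated counts; when n is known, the maximum-likelihood estimate is the total number of successes divided by the total number of trials. The values n = 10 and p = 0.4, the seed, and the helper `binom_pmf` are illustrative assumptions.

```python
import math
import random


def binom_pmf(k, n, p):
    """P(X = k) = C(n, k) p^k (1-p)^(n-k) for k = 0..n."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 10, 0.4
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(pmf))
print(f"sum of pmf : {sum(pmf):.6f}")   # probabilities over 0..n add up to 1
print(f"mean       : {mean:.6f}")       # equals n*p

# Simulate: each draw counts the successes in n Bernoulli(p) trials.
rng = random.Random(7)
draws = [sum(rng.random() < p for _ in range(n)) for _ in range(10_000)]
p_hat = sum(draws) / (len(draws) * n)   # MLE of p when n is known
print(f"fitted p   : {p_hat:.4f}")
```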
actual p : 0.4
fitted p : 0.3955
Poisson distribution
It expresses the probability of a given number of events occurring in a fixed interval of time
or space if these events occur independently and at a known constant rate. The Poisson
distribution can also be used for the number of events in other specified intervals such as
distance, area or volume.
The probability of observing k events in an interval is given by the equation:
$$ P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!} $$
where
• λ is the average number of events per interval (rate parameter).
• k takes values 0, 1, 2, …
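The Poisson pmf is straightforward to evaluate, and its mean equals λ, which is why the sample mean is the maximum-likelihood estimate of λ. This standard-library sketch uses λ = 3 (as in the code below), an illustrative seed, and hypothetical helpers; for sampling it swaps in Knuth's multiplication method, which counts uniform draws until their running product falls to e^(−λ) or below.

```python
import math
import random


def poisson_pmf(k, lam):
    """P(X = k) = lam^k e^(-lam) / k! for k = 0, 1, 2, ..."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def sample_poisson(lam, rng):
    """Knuth's method: multiply uniforms until the product drops to e^(-lam) or below."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


lam = 3.0
pmf = [poisson_pmf(k, lam) for k in range(50)]   # the tail beyond k = 49 is negligible
mean = sum(k * q for k, q in enumerate(pmf))
print(f"sum of pmf : {sum(pmf):.6f}")
print(f"mean       : {mean:.6f}")                # equals lam

rng = random.Random(3)
data_pois = [sample_poisson(lam, rng) for _ in range(10_000)]
lam_hat = sum(data_pois) / len(data_pois)        # MLE of lambda is the sample mean
print(f"fitted lambda : {lam_hat:.4f}")
```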
# scipy's poisson takes mu = λ as its shape parameter and loc as a shift parameter.
# For parameter λ = 3
Lambda = 3
x = np.arange(poisson.ppf(0.01, mu=Lambda), poisson.ppf(0.99, mu=Lambda) + 1)
plt.plot(x, poisson.pmf(x, mu=Lambda), 'bo', ms=8, label='poisson pmf')
plt.vlines(x, 0, poisson.pmf(x, mu=Lambda), colors='b', lw=5, alpha=0.5)
plt.legend()
plt.xlabel('X-variable')
plt.ylabel('pmf')
plt.show()
actual lambda : 3
fitted lambda : 2.9946
Geometric distribution
The geometric distribution gives the probability that the first success requires exactly k
independent trials, each with success probability p. The probability that the kth trial is the
first success is
$$ P(X = k) = (1-p)^{k-1} p $$
for k = 1, 2, 3, ...
The following form of the geometric distribution is used for modeling the number of
failures before the first success:
$$ P(Y = k) = (1-p)^{k} p $$
for k = 0, 1, 2, 3, ...
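The two forms describe the same distribution shifted by one (Y = X − 1), so their means differ by exactly one: 1/p for the number of trials and (1 − p)/p for the number of failures. scipy's `geom` uses the number-of-trials form, with support starting at k = 1. The sketch below checks the shift and the means using p = 0.4 (as in the code below) and hypothetical helper names, truncating the infinite sums at k = 500, where the remaining tail is negligible.

```python
def geom_pmf_trials(k, p):
    """P(first success on trial k) = (1-p)^(k-1) p, for k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p if k >= 1 else 0.0


def geom_pmf_failures(k, p):
    """P(k failures before the first success) = (1-p)^k p, for k = 0, 1, 2, ..."""
    return (1 - p) ** k * p if k >= 0 else 0.0


p = 0.4
# Same distribution shifted by one: P(Y = k - 1) == P(X = k).
print(all(geom_pmf_failures(k - 1, p) == geom_pmf_trials(k, p) for k in range(1, 20)))

# Means: 1/p for the trials form, (1-p)/p for the failures form.
mean_trials = sum(k * geom_pmf_trials(k, p) for k in range(1, 500))
mean_failures = sum(k * geom_pmf_failures(k, p) for k in range(0, 500))
print(f"{mean_trials:.4f} {mean_failures:.4f}")
```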
# scipy's geom takes p as its shape parameter and loc as a shift parameter;
# it uses the number-of-trials form, with support k = 1, 2, 3, ...
# For parameter p = 0.4
p=0.4
x = np.arange(geom.ppf(0.01, p),geom.ppf(0.99, p)+1)
plt.plot(x, geom.pmf(x, p), 'bo', ms=8, label='geometric pmf')
plt.vlines(x, 0, geom.pmf(x, p), colors='b', lw=5, alpha=0.5)
plt.legend()
plt.xlabel('X-variable')
plt.ylabel('pmf')
plt.show()
#fit geometric distribution to above generated data
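The fitting code for this step is not shown, so here is one possible sketch. For the number-of-trials form, the log-likelihood n·log(p) + (Σk − n)·log(1 − p) is maximized at p̂ = 1/mean(k). This standard-library version generates its own data instead of reusing the notebook's; the sampler `sample_geom_trials`, the seed, and the sample size are illustrative assumptions.

```python
import random
import statistics


def sample_geom_trials(p, rng):
    """Number of Bernoulli(p) trials up to and including the first success."""
    k = 1
    while rng.random() >= p:      # failure with probability 1 - p
        k += 1
    return k


p = 0.4
rng = random.Random(42)
data_geom = [sample_geom_trials(p, rng) for _ in range(10_000)]

# Setting the derivative of n*log(p) + (sum(k) - n)*log(1 - p) to zero
# gives p_hat = n / sum(k) = 1 / mean(k).
p_hat = 1 / statistics.fmean(data_geom)
print(f"actual p : {p}")
print(f"fitted p : {p_hat:.4f}")
```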