Oil Export Indonesia

This study tests whether the variables last_oil_price, mid_USD_rate
and inflation influence oil_export volume in Indonesia. The test
uses regression on sample data from 2021 to 2022. The data were
downloaded from the BPS, BI and investing websites.
1. Preparation of Data
Import all of the required libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sklearn
import seaborn as sns
from math import log
from scipy import stats
import statsmodels.api as sm
from statsmodels.stats.stattools import jarque_bera
%matplotlib inline
#Read in the export_oil csv file as DataFrame called oil_indonesia
oil_indonesia = pd.read_csv("/content/export_oil.csv")
#Check the head of oil_indonesia
oil_indonesia.head()
last_oil_price mid_USD_rate oil_export inflation

0 51.56 13662.0 815.3 0.0268
1 44.76 14234.0 805.2 0.0298
2 20.48 16367.0 617.4 0.0296
3 18.84 15157.0 562.1 0.0267
4 35.49 14733.0 560.9 0.0219
#Check the oil_indonesia table's shape
oil_indonesia.shape
(48, 5)
#Check the oil_indonesia info
oil_indonesia.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48 entries, 0 to 47
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 end_month 48 non-null object
1 last_oil_price 48 non-null float64
2 mid_USD_rate 48 non-null int64
3 oil_export 48 non-null float64
4 inflation 48 non-null float64
dtypes: float64(3), int64(1), object(1)
memory usage: 2.0+ KB
Change the data type of mid_USD_rate to float64

oil_indonesia['mid_USD_rate'] = oil_indonesia['mid_USD_rate'].astype('float64')
oil_indonesia.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48 entries, 0 to 47
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 end_month 48 non-null object
1 last_oil_price 48 non-null float64
2 mid_USD_rate 48 non-null float64
3 oil_export 48 non-null float64
4 inflation 48 non-null float64
dtypes: float64(4), object(1)
memory usage: 2.0+ KB
#Check for null values in oil_indonesia
print(oil_indonesia.isnull().sum())
end_month 0
last_oil_price 0
mid_USD_rate 0
oil_export 0
inflation 0
dtype: int64
Check for possible duplicated rows in oil_indonesia

oil_indonesia.duplicated().sum()
#Drop the "end_month" column, which is not used in the regression
oil_indonesia.drop(['end_month'],axis=1,inplace=True)
Check the summary statistics of oil_indonesia with describe()

oil_indonesia.describe()
last_oil_price mid_USD_rate oil_export inflation

count 48.000000 48.000000 48.000000 48.000000
mean 69.608542 14776.416667 1061.262500 0.028981
std 22.174758 554.931463 342.029537 0.014455
min 18.840000 13662.000000 0.000000 0.013200
25% 52.040000 14346.750000 849.275000 0.015975
50% 74.580000 14671.500000 1066.150000 0.026050
75% 82.242500 15101.500000 1311.150000 0.038500
max 114.670000 16367.000000 1662.900000 0.059500
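The summary above reports a minimum oil_export of 0, which could be a genuine month without exports or a missing value stored as zero. The sketch below shows one way to locate such rows. It uses a small toy frame rather than the real data, and dropping the rows is only one possible treatment, not something the original analysis does.

```python
import pandas as pd

# Toy stand-in for oil_indonesia; the values are illustrative, not the real data.
toy = pd.DataFrame({
    "last_oil_price": [51.56, 44.76, 20.48, 18.84],
    "oil_export": [815.3, 0.0, 617.4, 562.1],
})

# Rows where the export volume is exactly zero deserve a manual look.
zero_rows = toy[toy["oil_export"] == 0]
print(zero_rows)

# One possible treatment (an assumption, not the author's choice): drop them.
cleaned = toy[toy["oil_export"] != 0].reset_index(drop=True)
print(cleaned.shape)
```

Whether a zero should be kept, dropped or imputed depends on what the source data actually records for that month.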
2. Exploratory Data Analysis
Use a heatmap to visualize the correlation matrix

plt.figure(figsize=(10,5))
c = oil_indonesia.corr()
sns.heatmap(c,cmap="BrBG",annot=True)
c
last_oil_price mid_USD_rate oil_export inflation
last_oil_price 1.000000 0.073903 0.755607 0.458852
mid_USD_rate 0.073903 1.000000 0.156464 0.545627
oil_export 0.755607 0.156464 1.000000 0.541510
inflation 0.458852 0.545627 0.541510 1.000000
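By default, DataFrame.corr() computes the Pearson correlation coefficient. As a sketch of what each entry in the matrix means, the block below computes it by hand for last_oil_price and oil_export. It uses only the five rows shown by head(), so the result is not expected to match the full-sample value of 0.755607; the helper name pearson is made up for this illustration.

```python
import numpy as np

# First five rows shown by oil_indonesia.head(); five points only, so the
# result will differ from the full-sample value in the correlation matrix.
last_oil_price = np.array([51.56, 44.76, 20.48, 18.84, 35.49])
oil_export = np.array([815.3, 805.2, 617.4, 562.1, 560.9])

def pearson(x, y):
    """Pearson correlation: co-deviation divided by the product of the spreads."""
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_manual = pearson(last_oil_price, oil_export)
r_numpy = np.corrcoef(last_oil_price, oil_export)[0, 1]
print(r_manual, r_numpy)
```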
Use a pairplot to view the pairwise relationships between variables

sns.pairplot(oil_indonesia)
<seaborn.axisgrid.PairGrid at 0x78b559296e90>
Create a linear model plot (using seaborn lmplot) of inflation vs oil_export
sns.lmplot(x='inflation', y="oil_export", data=oil_indonesia)
<seaborn.axisgrid.FacetGrid at 0x78b557c3aa10>
3. Training and Testing Data
Set a variable X equal to the numerical features of oil_indonesia
and a variable y equal to the oil_export column
X = oil_indonesia.drop(['oil_export'],axis=1)
y = oil_indonesia['oil_export']
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X)
X = scaler.transform(X)
Import train_test_split from sklearn.model_selection
from sklearn.model_selection import train_test_split
Split the data into training and testing sets. Set test_size=0.2 and random_state=0
X_train,X_test, y_train, y_test = train_test_split(X,y, test_size=0.2,
random_state=0)
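In the cells above, the StandardScaler is fitted on all 48 rows before the split, so the test rows contribute to the scaling statistics. A common alternative is to split first and let a pipeline fit the scaler on the training rows only. The block below is a minimal sketch of that idea on synthetic data; the names X_demo, y_demo and model are made up for the illustration, and this is not the procedure used in this notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the three predictors and the target (not the real data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(48, 3)) * [22.0, 555.0, 0.014] + [70.0, 14776.0, 0.029]
y_demo = 10.0 * X_demo[:, 0] + rng.normal(scale=50.0, size=48)

# Split first, so the test rows play no part in fitting the scaler.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# The pipeline fits StandardScaler on X_tr only, then reuses it on X_te.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_tr, y_tr)
pred_demo = model.predict(X_te)
print(pred_demo.shape)
```

With only 48 observations the difference is likely to be small, but keeping the test rows out of the scaler makes the test metrics a cleaner estimate of out-of-sample error.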
4. Training the Model
Import LinearRegression from sklearn.linear_model
from sklearn.linear_model import LinearRegression
Create an instance of a LinearRegression model named lm
lm = LinearRegression()
Train/fit lm on the training data

lm.fit(X_train, y_train)
LinearRegression()
Print out the coefficients of the model

lm.coef_
array([200.44050601, -10.80755483, 121.11087484])
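Because X was standardized, each coefficient above is the change in oil_export per one standard deviation of the corresponding predictor, not per unit. Dividing by the scaler's standard deviations converts them back: for instance, 200.44 divided by the standard deviation of last_oil_price (about 22) gives roughly 9 per unit of price. The sketch below checks this relationship on synthetic data; all names and values in it are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic predictors on very different scales (illustrative, not the real data).
rng = np.random.default_rng(1)
X_raw = rng.normal(size=(48, 3)) * [22.0, 555.0, 0.014] + [70.0, 14776.0, 0.029]
y_demo = (9.0 * X_raw[:, 0] - 0.02 * X_raw[:, 1] + 6000.0 * X_raw[:, 2]
          + rng.normal(scale=30.0, size=48))

scaler_demo = StandardScaler().fit(X_raw)
lm_scaled = LinearRegression().fit(scaler_demo.transform(X_raw), y_demo)

# A coefficient on a standardized column is the change in y per one standard
# deviation of that column; dividing by that deviation gives per-unit slopes.
coef_per_unit = lm_scaled.coef_ / scaler_demo.scale_

# Cross-check against a fit on the unscaled columns.
lm_raw = LinearRegression().fit(X_raw, y_demo)
print(coef_per_unit)
print(lm_raw.coef_)
```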

Predict on X_test and print out the predictions
prediction = lm.predict(X_test)
prediction
array([1528.45452188, 705.9208487 , 1343.42451644, 1511.02483507,

1416.10915511, 1357.42519896, 1372.09041617, 1159.72902241,
700.81715592, 757.00143871])
5. Predicting the Test Data
Create a scatterplot of the real test values vs the predicted values
plt.scatter(y_test, prediction)
plt.xlabel("Y_test")
plt.ylabel('predicted values')
Text(0, 0.5, 'predicted values')

6. Evaluating the Model
Calculate the mean absolute error (MAE), the mean squared error (MSE)
and the root mean squared error (RMSE) of the predictions
from sklearn import metrics
print('MAE',metrics.mean_absolute_error(y_test, prediction))
print('MSE',metrics.mean_squared_error(y_test,prediction))
print('RMSE',np.sqrt(metrics.mean_squared_error(y_test,prediction)))
MAE 139.51781104780872
MSE 25329.345231343865
RMSE 159.15195641695348
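As a cross-check, the three metrics can be recomputed directly from their definitions. The sketch below uses the ten actual and predicted values listed in the Actual/Predicted table of this notebook; since the predictions there are rounded to six decimals, the results should agree with the printed values only up to rounding.

```python
import numpy as np

# Actual and predicted oil_export values for the ten test rows, as listed in
# the Actual/Predicted table of this notebook.
actual = np.array([1551.8, 560.9, 1493.3, 1287.6, 1259.0,
                   1186.5, 1101.9, 1308.6, 599.6, 762.2])
predicted = np.array([1528.454522, 705.920849, 1343.424516, 1511.024835,
                      1416.109155, 1357.425199, 1372.090416, 1159.729022,
                      700.817156, 757.001439])

errors = actual - predicted
mae = np.mean(np.abs(errors))   # mean absolute error
mse = np.mean(errors ** 2)      # mean squared error
rmse = np.sqrt(mse)             # root mean squared error, in units of oil_export
print("MAE", mae)
print("MSE", mse)
print("RMSE", rmse)
```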
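Print out the explained variance score to gauge how accurate the model's
predictions are. This score equals R-squared only when the prediction errors
average to zero; metrics.r2_score returns R-squared itself.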
metrics.explained_variance_score(y_test,prediction)
0.8234756635987014
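The explained variance score and R-squared differ when the predictions are biased. The sketch below computes both from their definitions, again using the ten values from the Actual/Predicted table of this notebook. On these values the mean residual is negative (the model over-predicts on average), so R-squared comes out lower than the explained variance, at roughly 0.77.

```python
import numpy as np

# Values copied from the Actual/Predicted table of this notebook.
actual = np.array([1551.8, 560.9, 1493.3, 1287.6, 1259.0,
                   1186.5, 1101.9, 1308.6, 599.6, 762.2])
predicted = np.array([1528.454522, 705.920849, 1343.424516, 1511.024835,
                      1416.109155, 1357.425199, 1372.090416, 1159.729022,
                      700.817156, 757.001439])

residuals = actual - predicted
mean_residual = residuals.mean()

# Explained variance compares the variance of the residuals with the variance
# of y, so a constant offset in the predictions does not lower it.
explained_variance = 1.0 - np.var(residuals) / np.var(actual)

# R-squared uses the raw squared errors, so a systematic bias does lower it.
r_squared = 1.0 - np.sum(residuals ** 2) / np.sum((actual - actual.mean()) ** 2)

print("mean residual", mean_residual)
print("explained variance", explained_variance)
print("R-squared", r_squared)
```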
Print the head of the independent variables (X) from the oil_indonesia DataFrame
X = oil_indonesia.drop(["oil_export"],axis=1)
X.head()
last_oil_price mid_USD_rate inflation

0 51.56 13662.0 0.0268
1 44.76 14234.0 0.0298
2 20.48 16367.0 0.0296
3 18.84 15157.0 0.0267
4 35.49 14733.0 0.0219
Print the head of the dependent variable (y) from the oil_indonesia DataFrame
y = oil_indonesia['oil_export']
y.head()
0 815.3
1 805.2
2 617.4
3 562.1
4 560.9
Name: oil_export, dtype: float64
Add a constant to the X variable

X = sm.add_constant(X)
X.head()
const last_oil_price mid_USD_rate inflation

0 1.0 51.56 13662.0 0.0268
1 1.0 44.76 14234.0 0.0298
2 1.0 20.48 16367.0 0.0296
3 1.0 18.84 15157.0 0.0267
4 1.0 35.49 14733.0 0.0219
Print the OLS regression results to assess the performance of the regression model
OLS = sm.OLS(endog = y, exog = X).fit()
OLS.summary()
<class 'statsmodels.iolib.summary.Summary'>
"""
                            OLS Regression Results
==============================================================================
Dep. Variable:             oil_export   R-squared:                       0.620
Model:                            OLS   Adj. R-squared:                  0.594
Method:                 Least Squares   F-statistic:                     23.93
Date:                Wed, 17 Jan 2024   Prob (F-statistic):           2.44e-09
Time:                        10:50:40   Log-Likelihood:                -324.46
No. Observations:                  48   AIC:                             656.9
Df Residuals:                      44   BIC:                             664.4
Df Model:                           3
Covariance Type:            nonrobust
==================================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
const            546.7744   1021.789      0.535      0.595   -1512.506    2606.054
last_oil_price     9.7749      1.661      5.887      0.000       6.428      13.122
mid_USD_rate      -0.0238      0.070     -0.339      0.736      -0.166       0.118
inflation       6432.0732   3031.540      2.122      0.040     322.405    1.25e+04
==============================================================================
Omnibus:                       48.862   Durbin-Watson:                   1.555
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              286.338
Skew:                          -2.453   Prob(JB):                     6.65e-63
Kurtosis:                      13.913   Cond. No.                     1.45e+06
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.45e+06. This might indicate that there are
strong multicollinearity or other numerical problems.
"""
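Note [2] in the summary flags a large condition number. A large value can come from genuinely collinear predictors, but also simply from columns on very different scales, as is the case here (mid_USD_rate is in the tens of thousands, inflation in hundredths). The sketch below illustrates the scale effect on synthetic, independent columns whose means and spreads loosely follow the describe() output; it does not use the real data, and the variable names are made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 48

# Synthetic, mutually independent columns with scales similar to the real ones.
last_oil_price = rng.normal(70.0, 22.0, n)
mid_usd_rate = rng.normal(14776.0, 555.0, n)
inflation = rng.normal(0.029, 0.014, n)

# Design matrix with a constant column, on the raw scales.
raw = np.column_stack([np.ones(n), last_oil_price, mid_usd_rate, inflation])
cond_raw = np.linalg.cond(raw)

# Same predictors after standardizing each column to mean 0 and std 1.
predictors = raw[:, 1:]
standardized = (predictors - predictors.mean(axis=0)) / predictors.std(axis=0)
scaled = np.column_stack([np.ones(n), standardized])
cond_scaled = np.linalg.cond(scaled)

print("condition number, raw scales:", cond_raw)
print("condition number, standardized:", cond_scaled)
```

If the condition number of the real design matrix also drops sharply after standardizing, the warning is more likely a scaling artifact than evidence of strong multicollinearity; the correlation matrix in section 2 is a complementary check.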
7. Conclusion
Compare the predictions with the test data prepared earlier. Create a new
DataFrame (new_df) that holds the actual and the predicted values
new_df = pd.DataFrame({"Actual":y_test,"Predicted":prediction})
new_df
Actual Predicted
29 1551.8 1528.454522
4 560.9 705.920849
26 1493.3 1343.424516
30 1287.6 1511.024835
32 1259.0 1416.109155
37 1186.5 1357.425199
34 1101.9 1372.090416
40 1308.6 1159.729022
7 599.6 700.817156
10 762.2 757.001439
