Tutorial 1 - Regression
Data modelling is a machine learning technique for identifying hidden patterns in a historical
dataset; these patterns then help to predict future values for the same kind of data. The technique
focuses heavily on past user actions and learns users' tastes. Many popular organizations have
adopted data modelling techniques to understand the behaviour of their customers based on their
past transactions. These techniques analyse the data and predict what customers are looking for.
Amazon, Google, Facebook, eBay, LinkedIn, Twitter, and many other organizations use data mining
to reshape their applications.
Objective:
• Regression: In statistics, regression is a classic technique for identifying the relationship
between two or more scalar variables by fitting a straight line to the variable values. That
relationship helps to predict the value of a variable for future events. For example, a variable y can
be modelled as a linear function of another variable x with the formula y = mx + c. Here, x is the
predictor variable, y is the response variable, m is the slope of the line, and c is the intercept. Sales
forecasting for products or services and stock price prediction can both be approached with this kind
of regression. R provides regression through the lm() function, which is available in R by default.
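As a minimal sketch of the y = mx + c idea, the following fits a straight line with lm() to simulated data. The data, the true slope of 2, and the true intercept of 5 are illustrative assumptions and are not part of this tutorial's dataset.

```r
# Simulated data: the true slope (2) and intercept (5) are illustrative assumptions.
set.seed(42)
x <- 1:50
y <- 2 * x + 5 + rnorm(50, sd = 1)

# Fit y = m*x + c by ordinary least squares.
fit <- lm(y ~ x)

m_hat <- unname(coef(fit)["x"])            # estimated slope m
c_hat <- unname(coef(fit)["(Intercept)"])  # estimated intercept c

# Predict the response for a new predictor value.
predict(fit, newdata = data.frame(x = 60))
```

The estimates m_hat and c_hat should land close to the values used to simulate the data, which is a quick way to check that the formula has been specified as intended.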
Method:
1. Before starting to code, install the required packages: go to Package > Install, as
shown in Figure 1.
2. Go to the Packages section, type the names of the following packages, and install them as shown in
Figure 2.
i. ggplot2
ii. plyr
iii. shiny
iv. Rpubs
v. devtools.
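The same installation can probably be scripted from the console instead of the menus. The sketch below assumes the packages are fetched from CRAN, where names are case-sensitive and the Shiny package is published as `shiny`. "Rpubs" is left out of the vector because its availability under that exact name is not confirmed here. The helper `missing_packages` is a hypothetical name introduced for illustration.

```r
# Packages from the list above; "Rpubs" is left out (availability not confirmed).
pkgs <- c("ggplot2", "plyr", "shiny", "devtools")

# Hypothetical helper: return the packages in `pkgs` that are not yet installed.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(pkgs)

# Only install in an interactive session, so sourcing this script has no side effects.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Checking what is missing first avoids reinstalling packages that are already present.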
Big Data
3. Go to File > Import Dataset > From Excel, as shown in Figure 3. The "LungCapData"
dataset is imported and displayed on the screen, as shown in Figure 4.
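For readers who prefer the console, the import and the model fit that produces `mod` can be sketched as follows. The file name `LungCapData.xlsx` and the helper `fit_lungcap` are assumptions made for illustration, and the readxl package is assumed to be installed. The sketch passes the data frame through the `data` argument, whereas the Call line in the output suggests the tutorial referred to the columns directly.

```r
# Hypothetical file name; adjust to wherever the workbook is saved.
path <- "LungCapData.xlsx"

# Hypothetical helper: fit LungCap as a linear function of Age
# on any data frame that has those two columns.
fit_lungcap <- function(df) {
  lm(LungCap ~ Age, data = df)
}

# Run the import only if the file and the readxl package are available.
if (file.exists(path) && requireNamespace("readxl", quietly = TRUE)) {
  LungCapData <- as.data.frame(readxl::read_excel(path))
  mod <- fit_lungcap(LungCapData)
  print(summary(mod))
}
```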
> summary(mod)

Call:
lm(formula = LungCap ~ Age)

Residuals:
    Min      1Q  Median      3Q     Max
-4.7799 -1.0203 -0.0005  0.9789  4.2650

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.14686    0.18353   6.249 7.06e-10 ***
Age          0.54485    0.01416  38.476  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
>
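The coefficient estimates in the summary output define the fitted line LungCap = 1.14686 + 0.54485 × Age. As a small worked example, the sketch below plugs those two printed estimates into the line. The function name `predict_lungcap` and the age of 10 are illustrative assumptions.

```r
# Coefficients copied from the summary(mod) output above.
intercept <- 1.14686
slope     <- 0.54485

# Hypothetical helper: predicted LungCap on the fitted line for a given age.
predict_lungcap <- function(age) intercept + slope * age

predict_lungcap(10)  # 1.14686 + 0.54485 * 10 = 6.59536
```

With the fitted model object available, predict(mod, newdata = data.frame(Age = 10)) should give the same value up to rounding of the printed coefficients.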
> # To list the components stored in the fitted model object
> attributes(mod)
$names
[1] "coefficients" "residuals" "effects" "rank" "fitted.values" "assign" "qr"
[8] "df.residual" "xlevels" "call" "terms" "model"
$class
[1] "lm"
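The component names listed by attributes() can be read with accessor functions. The sketch below uses R's built-in `cars` data as a stand-in, since the tutorial's dataset is not reproduced here. The same calls should apply to `mod`.

```r
# Stand-in model on the built-in `cars` data (not the tutorial's dataset).
fit <- lm(dist ~ speed, data = cars)

names(fit)            # component names, as in attributes(mod)$names
class(fit)            # "lm"

coef(fit)             # accessor for fit$coefficients
head(residuals(fit))  # accessor for fit$residuals
head(fitted(fit))     # accessor for fit$fitted.values

# Residuals are the observed values minus the fitted values.
all.equal(unname(residuals(fit)), cars$dist - unname(fitted(fit)))
```

Using coef(), residuals(), and fitted() rather than the `$` components keeps the code working across other model classes that provide the same accessors.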
> anova(mod)
Analysis of Variance Table

Response: LungCap
     Df Sum Sq Mean Sq F value    Pr(>F)
Age   1 3447.0  3447.0  1480.4 < 2.2e-16 ***
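In a regression with a single predictor, the F value in the analysis of variance table equals the square of the predictor's t value from the coefficient table. The sketch below checks this on the built-in `cars` data, used here as a stand-in dataset, and then on the rounded values printed in this tutorial.

```r
# Stand-in model on the built-in `cars` data (not the tutorial's dataset).
fit <- lm(dist ~ speed, data = cars)

t_value <- summary(fit)$coefficients["speed", "t value"]
f_value <- anova(fit)["speed", "F value"]

# With one predictor, F equals t squared.
all.equal(t_value^2, f_value)

# Same check with the rounded values printed in this tutorial:
38.476^2  # approximately 1480.4, matching the F value for Age
```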