
SCHOOL OF COMPUTER STUDIES (PG)

COMPUTER SCIENCE/COMPUTER APPLICATIONS/INFORMATION TECHNOLOGY
Assignment 5: R, Big Data Processing Technology
Data Scenario:

Wal-Mart wants to analyse its retail sales along the dimensions of Sales Person,
Customer and Product. For all 3 subjects (Person, Customer & Product), we have to get the
summarised data from HIVE in the following combinations:
a. Overall sales across months
b. Year-wise sales summary for each sales person
c. Month-wise shopping summary for each customer
d. Year-wise sales summary for each product
After extracting the above data, we may have to do the following analysis:
1. Plot the distribution of each data set and draw the curve after finding its mean, median,
mode and variance, to analyse the trend in each case
2. Draw suitable charts and graphs to present the trends (pie, line, bar, etc.)
1. Wal-Mart has to set up the R analysis environment for its year-to-year sales analysis, so the IT
department has to build the Dev and Test environments pointing to the folders
/home/usr/R/Dev and /home/usr/R/Test. Configure this in R.
Ans: > setwd("/home/usr/R/Dev")    # for Development
> setwd("/home/usr/R/Test")   # for Testing
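A minimal sketch of how the switch between the two folders could be wrapped in one function. The helper name use_env and the temporary root folder are assumptions made only so that the example runs on any machine; on the real server the root would be /home/usr/R.

```r
# Hypothetical helper: point the R session at the Dev or the Test folder.
use_env <- function(env = c("Dev", "Test"), root = "/home/usr/R") {
  env <- match.arg(env)                      # accept only "Dev" or "Test"
  path <- file.path(root, env)
  if (!dir.exists(path)) dir.create(path, recursive = TRUE)
  setwd(path)
  invisible(getwd())
}

old_wd <- getwd()
demo_root <- file.path(tempdir(), "R")       # stand-in for /home/usr/R
use_env("Dev", root = demo_root)
dev_dir <- basename(getwd())                 # folder now in use: Dev
use_env("Test", root = demo_root)
test_dir <- basename(getwd())                # folder now in use: Test
setwd(old_wd)                                # restore the original directory
```

Note that R is case-sensitive here too: the function is setwd, not Setwd, and the path must be a quoted string.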
2. The Wal-Mart IT team wants to re-use its shell scripts for data processing in the R environment, in both
Development & Testing. The Unix shell is taken here to be non-case-sensitive in its program constructs. So what
is the impact of migrating the shell scripts to R scripts?
Ans: R is case-sensitive, so names that differ only in letter case refer to different objects. We must
therefore cross-check all the shell source code and keep the case of every identifier consistent in R.
Example: after > X=10, asking for the value with lower-case x fails, because R finds no object named x.
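The example can be checked directly. This is a small sketch that uses a separate environment (an assumption, chosen so that objects already present in the session do not affect the result):

```r
# Create X (upper case) in a clean environment and look up both spellings.
env <- new.env()
assign("X", 10, envir = env)

upper_found <- exists("X", envir = env, inherits = FALSE)  # the name we created
lower_found <- exists("x", envir = env, inherits = FALSE)  # a different name

# Asking for the lower-case name raises an error, which we capture as text.
msg <- tryCatch(get("x", envir = env, inherits = FALSE),
                error = function(e) conditionMessage(e))
```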
3. Need to build a quick analysis on independent sales values: TV sales and furniture sales. Declare
R variables for the below computation.
a. TV sales value is 12,00,000.
b. Furniture sales value is 50,00,200.
Compute Sum of total Electronic sales using appropriate variables.
Ans : The above computation needs scalar variables, because the two values do not have any
relationship and do not need to be analysed together. In R a scalar is simply a vector of
length one, and numbers are typed without digit-grouping commas:
> X=1200000; Y=5000200; X+Y
[1] 6200200
4. There is a need to get the sales values of Wal-Mart across 4 months and compare these sales
for quick analysis. Values are M1=1.2M, M2=1M, M3=2M, M4=4M. Declare appropriate
variables for them and find out the Max and Min of sales among the 4 months.
Ans: (Vector Variables)
> Sal=c(1.2,1,2,4)
> Sal
[1] 1.2 1.0 2.0 4.0
> max(Sal)
[1] 4
> min(Sal)
[1] 1
5. Wal-Mart wants to do quick analysis between New-York and London sales of last quarter (3
months sales). Values are New-York (1M, 3M, 4M) and London(3M, 1.2M, 2M). Declare
appropriate variables and sum the New-York and London sales.
Ans: (Vector Variables)
> NY = c(1,3,4)
> LN = c(3,1.2,2)
> Sal_sum = NY + LN
> Sal_sum
[1] 4.0 4.2 6.0
6. Received a mail with Sales values (in Crore) of Delhi, Mumbai, Chennai & Cochin for Jan, Feb
and Mar as (Delhi, 1.2), (Mumbai, 3), (Chennai, 4), (Cochin, 5). Find out the average sales across the 4
cities and the most deviated city for further improvement, using appropriate variables.
Ans: Create a single vector for the 4 cities' sales values
> sal_val=c(1.2,3,4,5)
> a=mean(sal_val)
> a
[1] 3.3
> a - sal_val[1]
[1] 2.1
> a - sal_val[2]
[1] 0.3
> a - sal_val[3]
[1] -0.7
> a - sal_val[4]
[1] -1.7
Delhi has the largest absolute deviation from the mean (2.1), so it is the most deviated city.
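The four subtractions can also be done in one step on a named vector. A short sketch (the variable names avg, dev and most_deviated are only illustrative):

```r
# Sales per city in Crore, as given in the question.
sal_val <- c(Delhi = 1.2, Mumbai = 3, Chennai = 4, Cochin = 5)

avg <- mean(sal_val)        # average across the 4 cities
dev <- avg - sal_val        # deviation of every city from the average

# City with the largest absolute deviation from the average.
most_deviated <- names(which.max(abs(dev)))
```

Using abs() matters here: without it, a city far above the mean would never be picked.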
7. Received new product details for 3 countries as a summary (Location, #product, Opening_sales) for
the Master data update in the Analysis Database. Create them in R using suitable variables/objects for
table creation in MySQL:
a. (India, 20, 200000)
b. (Singapore, 30, 4000000)
c. (New Zealand, 7, 9008788)
Ans:
> Countries=c("India","Singapore","New Zealand")
> product=c(20,30,7)
> open_sal=c(200000,4000000,9008788)
> sal_df=data.frame(Countries,product,open_sal)
> sal_df
    Countries product open_sal
1       India      20   200000
2   Singapore      30  4000000
3 New Zealand       7  9008788

8. There is a meeting with the Sales Head of Wal-Mart explaining the last 3 years' sales, Y1(20.4M),
Y2(18M), Y3(30M). Prepare a few stats and charts for the presentation.
Ans: As it is a Sales Head meeting, we must have the data about the average sales and their
spread. So we should find the mean and the deviation of each year's sales from the mean. Also we should
prepare a Bar and a Pie chart to visualise the 3 years' sales values.
> Ysal=c(20.4,18,30)
> mean(Ysal)
[1] 22.8
> Ysal-22.8
[1] -2.4 -4.8  7.2
> labels=c("2013", "2014", "2015")
> pie(Ysal,labels)
> barplot(Ysal,names.arg=labels)
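A possible way to collect the statistics and both charts in one go. This sketch writes the charts to a PDF file in a temporary folder (an assumption; png() works the same way), and the variance shown is the sample variance returned by var().

```r
# Yearly sales in millions, labelled with the years used above.
Ysal <- c("2013" = 20.4, "2014" = 18, "2015" = 30)

avg      <- mean(Ysal)      # average yearly sales
variance <- var(Ysal)       # sample variance (divides by n - 1)
std_dev  <- sd(Ysal)        # standard deviation

# Draw the bar chart and the pie chart side by side in one file.
out_file <- file.path(tempdir(), "yearly_sales.pdf")
pdf(file = out_file, width = 10, height = 5)
par(mfrow = c(1, 2))
barplot(Ysal, main = "Sales by year", ylab = "Sales (M)")
pie(Ysal, labels = names(Ysal), main = "Share of 3-year sales")
invisible(dev.off())
```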

9. The sales performance of 5 cities is being received with weighted average; find out the
Max, Min & Avg of sales without using R functions.
Five city sales (30000.34, 8990000, 200000, 23899999, 200000)
Ans:
Declare them as a Vector v = c(30000.34, 8990000, 200000, 23899999, 200000)

& find out the maximum, minimum and average of the sales values using the below R Code.
Start ma and mi from the first value; starting mi from 0 would never find the minimum.
> ma=v[1];mi=v[1];su=0
> for (i in 1:length(v)) {if (v[i] > ma) ma=v[i] }
> for (i in 1:length(v)) {if (v[i] < mi) mi=v[i];su=su+v[i] }
> ma ;mi;su/length(v)
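The same loop can be written as one function with a single pass over the data. This is a sketch: the function name manual_stats is hypothetical, and the five values assume that the garbled figure in the question reads as 8990000 followed by 200000.

```r
# Max, min and average computed by hand, without max(), min() or mean().
manual_stats <- function(v) {
  ma <- v[1]; mi <- v[1]    # start both from the first value, not from 0
  su <- 0; n <- 0
  for (val in v) {
    if (val > ma) ma <- val
    if (val < mi) mi <- val
    su <- su + val          # running total for the average
    n <- n + 1              # running count, so length() is not needed
  }
  c(max = ma, min = mi, avg = su / n)
}

v <- c(30000.34, 8990000, 200000, 23899999, 200000)
res <- manual_stats(v)
```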

10. Apollo Hospital wants to build an alert system for their service, to attend to the patients according to their
conditions (Advice for Medication & Normal with Health awareness), based on their tests.
Blood Test Range:
Sugar       Haemoglobin   WBC count
70-120      14-18         4000-10000
Haematology:
RBC Count   Salt          Platelet Count
4.5-6.5     9.2-15.5      1-3 Lakhs
2 customer Results:
#   Sugar   Haemoglobin   WBC count   RBC count   Salt   Platelet count
1   110     17            8000        5.0         11.4   2.5 Lakhs
2   140     19            11000       7           7      60000

Write R code to alert for Medication or Health awareness, depending on whether the results are out of
range or within range.
Steps :
Build 2 Vectors, one for each customer (platelet count in absolute units, 2.5 Lakhs = 250000)
C1 = c(110, 17, 8000, 5.0, 11.4, 250000)
C2 = c(140, 19, 11000, 7, 7, 60000)

Use array variables to get each test result and check the range using an if statement. Set the flag
H (Health awareness) if all the results are within the range, or M (Medication) otherwise [ use if and
switch statement].

Ans: Steps :
Build 2 Vectors, one for each patient
## We have to do 6 tests, namely Sugar, Haemoglobin, WBC count,
## RBC count, Salt, Platelet count
## So we create a Tests vector with 6 numbers
Tests = c(1,2,3,4,5,6)
## For first Patient
PatientValue1 <- c(110, 17, 8000, 5.0, 11.4, 250000)
PatientStatus1 = 'Normal'
for (x in Tests) {
  ## switch() runs only the x-th branch, so each test has one branch
  ## that checks both its lower and its upper limit
  switch(x,
    if (PatientValue1[1] < 70 || PatientValue1[1] > 120) { PatientStatus1 = 'Medication Advised' },
    if (PatientValue1[2] < 14 || PatientValue1[2] > 18) { PatientStatus1 = 'Medication Advised' },
    if (PatientValue1[3] < 4000 || PatientValue1[3] > 10000) { PatientStatus1 = 'Medication Advised' },
    if (PatientValue1[4] < 4.5 || PatientValue1[4] > 6.5) { PatientStatus1 = 'Medication Advised' },
    if (PatientValue1[5] < 9.2 || PatientValue1[5] > 15.5) { PatientStatus1 = 'Medication Advised' },
    if (PatientValue1[6] < 100000 || PatientValue1[6] > 300000) { PatientStatus1 = 'Medication Advised' }
  )
}

## For Second Patient
PatientValue2 <- c(140, 19, 11000, 7.0, 7, 60000)
PatientStatus2 = 'Normal'
for (y in Tests) {
  switch(y,
    if (PatientValue2[1] < 70 || PatientValue2[1] > 120) { PatientStatus2 = 'Medication Advised' },
    if (PatientValue2[2] < 14 || PatientValue2[2] > 18) { PatientStatus2 = 'Medication Advised' },
    if (PatientValue2[3] < 4000 || PatientValue2[3] > 10000) { PatientStatus2 = 'Medication Advised' },
    if (PatientValue2[4] < 4.5 || PatientValue2[4] > 6.5) { PatientStatus2 = 'Medication Advised' },
    if (PatientValue2[5] < 9.2 || PatientValue2[5] > 15.5) { PatientStatus2 = 'Medication Advised' },
    if (PatientValue2[6] < 100000 || PatientValue2[6] > 300000) { PatientStatus2 = 'Medication Advised' }
  )
}
cat('PatientStatus1:', PatientStatus1, '\n')
cat('PatientStatus2:', PatientStatus2, '\n')
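As an alternative to the if/switch approach asked for in the question, the range check can be done on whole vectors at once. This is only a sketch: the function name check_patient and the status strings are illustrative, and the limits are the ranges from the tables above with platelets expressed in absolute counts.

```r
# Reference ranges, in the same order as the test results.
tests <- c("Sugar", "Haemoglobin", "WBC", "RBC", "Salt", "Platelet")
lower <- c(70, 14, 4000, 4.5, 9.2, 100000)
upper <- c(120, 18, 10000, 6.5, 15.5, 300000)

# Hypothetical helper: flag a patient and list the tests that are out of range.
check_patient <- function(values) {
  out <- values < lower | values > upper     # TRUE where a result is out of range
  list(status = if (any(out)) "Medication Advised" else "Normal",
       failed = tests[out])
}

p1 <- check_patient(c(110, 17, 8000, 5.0, 11.4, 250000))
p2 <- check_patient(c(140, 19, 11000, 7, 7, 60000))
```

Keeping the limits in two vectors means a changed range is edited in one place instead of in every if statement.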

11. Need to prepare a quick report for the Wal-Mart Sales Head on their 12 months' sales across the world.
Sales values are in Millions: (4, 7, 8, 12, 14, 6, 17, 20, 7, 12, 11, 18). Prepare the activity plan for
the IT team to execute the same.
Ans:
> sal=c(4, 7, 8, 12, 14, 6, 17, 20, 7, 12, 11, 18)
> summary(sal)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   4.00    7.00   11.50   11.33   14.75   20.00
Guidelines for Reports/Interpretations:
a. Highlight the difference between Min and Max
b. The difference between Mean and Median should be negligible
c. The Quartiles should not differ much from each other
If the above measures are fine, then the sales values are normal. Otherwise we may have to alert
the Sales Head for further analysis and subsequent actions.
Also plot the Bar chart and line graph for the Sales Trends, and explain the reasons.
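The guidelines and the two charts could be scripted as below. This is a sketch which assumes the 12 values run from January to December, and which saves the charts to a PDF in a temporary folder.

```r
sal <- c(4, 7, 8, 12, 14, 6, 17, 20, 7, 12, 11, 18)

stats <- summary(sal)                        # Min, quartiles, median, mean, Max
spread <- max(sal) - min(sal)                # guideline a: Min vs Max
mean_median_gap <- abs(mean(sal) - median(sal))   # guideline b
quartile_gap <- IQR(sal)                     # guideline c: 3rd Qu. minus 1st Qu.

# Bar chart and line graph of the monthly trend (months assumed Jan..Dec).
out_file <- file.path(tempdir(), "monthly_sales.pdf")
pdf(file = out_file, width = 10, height = 5)
par(mfrow = c(1, 2))
barplot(sal, names.arg = month.abb, las = 2, main = "Monthly sales (M)")
plot(sal, type = "o", xaxt = "n", xlab = "Month", ylab = "Sales (M)",
     main = "Sales trend")
axis(1, at = 1:12, labels = month.abb, las = 2)
invisible(dev.off())
```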
12. Need to send a request with technical approach (Details on data & R commands) to IT head
office of Wal-Mart to execute Application Status Checking and subsequent activity using the
below application status (totally 9 columns):
Date   Time   AppModule   AppStatus   Txt1   Txt2   Txt3   Txt4   Txt5

Prepare this data set & status checking approach with Trend analysis visualization.
Ans: Step 1: Set the working directory to your application log directory
> setwd("E:/Training/MCA Training/RVS")
Step 2: Read the CSV file into a data frame using
> sal_frm=read.csv("E:/Training/MCA Training/RVS/salapplog.csv")
Step 3: Create a data frame having only AppStatus
> as=sal_frm$AppStatus
> x=data.frame(as)
Step 4: Get the summary of the data frame from Step 3 for inference
> summary(x)
Step 5: Draw a line graph & a Bar chart for the AppStatus trends. Finally save the AppStatus values in
another file:
> write.csv(as,"appstat.csv")
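The five steps can be tried end to end on a small made-up log. Everything in the sample data below (dates, times, modules, the status values OK and FAIL) is invented for illustration; only the nine column names come from the question. Files are written to a temporary folder instead of the real log directory.

```r
# Made-up application log with the 9 columns from the question.
demo <- data.frame(
  Date = rep(c("D1", "D2"), each = 3),
  Time = rep(c("09:00", "12:00", "18:00"), times = 2),
  AppModule = rep(c("Billing", "Stock", "Billing"), times = 2),
  AppStatus = c("OK", "OK", "FAIL", "OK", "FAIL", "FAIL"),
  Txt1 = "", Txt2 = "", Txt3 = "", Txt4 = "", Txt5 = ""
)
log_file <- file.path(tempdir(), "salapplog.csv")
write.csv(demo, log_file, row.names = FALSE)

# Steps 2 to 4: read the file, keep AppStatus, summarise it.
sal_frm <- read.csv(log_file, stringsAsFactors = FALSE)
status_counts <- table(sal_frm$AppStatus)            # how often each status occurs
by_day <- table(sal_frm$Date, sal_frm$AppStatus)     # status counts per day

# Step 5: bar chart of the counts, line graph of failures per day, then save.
chart_file <- file.path(tempdir(), "appstatus.pdf")
pdf(file = chart_file, width = 10, height = 5)
par(mfrow = c(1, 2))
barplot(status_counts, main = "AppStatus counts")
plot(by_day[, "FAIL"], type = "o", xlab = "Day", ylab = "Failures",
     main = "Failures per day")
invisible(dev.off())
stat_file <- file.path(tempdir(), "appstat.csv")
write.csv(as.data.frame(status_counts), stat_file, row.names = FALSE)
```

table() is used for the summary because AppStatus is text; summary() on a text column only reports its length and type.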

13. There is a requirement to prepare an E2E blueprint for Customer Shopping Trends, in order to plan
the Marketing strategy accordingly. So prepare the plan of analysis, right from bringing the CSV file
into R to saving the result back from R. Sample data structure:

It has around 89 customers with their shopping values for 3 Years.


Step 1: Load the file custsal.csv into R
Step 2: Get the stats summary for inferences
Step 3: Draw the Bell-curve to find the distribution & draw subsequent inferences
Step 4: Present the Customer Shopping trends and Year-wise comparison using appropriate
charts/graphs.
Ans:
Step 1: Load the customer sales file into R (named CustomerYearSale.csv here)
> cust=read.csv("CustomerYearSale.csv")
Step 2: Get the stats summary for inferences
> summary(cust)
Step 3: Draw the Bell-curve to find the distribution & draw subsequent inferences

Step 4: Present the Customer Shopping trends and Year-wise comparison using
appropriate charts/graphs.
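A sketch of all four steps, including saving the result back from R. The real file layout is not shown in the source, so the columns Customer, Y1, Y2 and Y3 and the randomly generated values are assumptions; the file is created in a temporary folder so that the example runs on its own.

```r
# Made-up data: 89 customers with shopping values for 3 years (assumed layout).
set.seed(1)
demo <- data.frame(Customer = paste0("C", 1:89),
                   Y1 = round(rnorm(89, mean = 500, sd = 100)),
                   Y2 = round(rnorm(89, mean = 550, sd = 100)),
                   Y3 = round(rnorm(89, mean = 600, sd = 100)))
in_file <- file.path(tempdir(), "CustomerYearSale.csv")
write.csv(demo, in_file, row.names = FALSE)

# Step 1 and Step 2: load the file and summarise each year.
cust <- read.csv(in_file)
year_cols <- c("Y1", "Y2", "Y3")
year_summary <- summary(cust[year_cols])

# Step 3: bell curve of the 3-year total per customer.
cust$Total <- rowSums(cust[year_cols])
x <- sort(cust$Total)
dens <- dnorm(x, mean = mean(x), sd = sd(x))

# Step 4: bell curve and year-wise comparison, then save the result.
chart_file <- file.path(tempdir(), "customer_trends.pdf")
pdf(file = chart_file, width = 10, height = 5)
par(mfrow = c(1, 2))
plot(x, dens, type = "l", xlab = "3-year shopping value", ylab = "Density",
     main = "Bell curve")
barplot(colSums(cust[year_cols]), main = "Total shopping per year")
invisible(dev.off())
out_file <- file.path(tempdir(), "CustomerYearSale_result.csv")
write.csv(cust, out_file, row.names = FALSE)
```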
14. Wal-Mart is planning to start their business in New Zealand, so they need to find the most consistent
Sales person from their group to depute as NZ Sales Head, to initiate and manage the sales
operations. Below is the Salespersons' selling data for your analysis.

Step 1: Load data into R


Step 2: Get the Stats summary of sales, year-wise
Step 3: Get the Stats summary of sales, Sales person-wise (need to turn the rows into columns)
Step 4: Get the summary and identify the consistent sales person for NZ deputation.
Ans :
Step 1: Load data into R
> employee=read.csv("EmployeeYearSale.csv")
Step 2: Get the Stats summary of sales, year-wise
> summary(employee)
Step 3: Get the Stats summary of sales, Sales person-wise (need to turn the rows into
columns)
Step 4: Get the summary and identify the consistent sales person for NZ deputation.
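Steps 3 and 4 might look like the sketch below. The data layout (one row per sales person, one column per year), the names and the figures are all made up, because the real table is not shown in the source. "Consistent" is interpreted here as the lowest coefficient of variation (standard deviation divided by mean), which is one possible choice and not the only one.

```r
# Made-up data: one row per sales person, one column per year (assumed layout).
employee <- data.frame(SalesPerson = c("A", "B", "C"),
                       Y1 = c(10, 5, 12),
                       Y2 = c(11, 20, 8),
                       Y3 = c(10.5, 9, 15))

# Step 2: year-wise summary (one column per year).
year_summary <- summary(employee[, -1])

# Step 3: turn the rows into columns, so that each column is a sales person.
by_person <- t(employee[, -1])
colnames(by_person) <- employee$SalesPerson
person_summary <- summary(as.data.frame(by_person))

# Step 4: lowest coefficient of variation = most consistent seller.
cv <- apply(by_person, 2, sd) / apply(by_person, 2, mean)
most_consistent <- names(which.min(cv))
```

Dividing by the mean keeps a high-volume seller from looking inconsistent only because the figures are larger.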
15. Wal-Mart finds that its recent quarter sales are slowing against the target, so they are planning
to re-size the inventory by removing a set of non-performing products. Prepare the plan and approach
to extract the same using R. Given data:

Step 1: Load the given data into R as a Data frame


Step 2: Find out the mean and add it as the 5th Column
Step 3: Sort the whole data frame by the 5th Column in ascending order
Step 4: Extract the first 10 rows & get the summary of sales for those products for inference

Step 5: Draw a Bar chart comparing those 10 poorly performing products, to remove them from
Inventory.
Ans:
Step 1: Load the given data into R as a Data frame
> product=read.csv("ProductYearSale.csv")
Step 2: Find out the mean and add it as the 5th Column

Step 3: Sort the whole data frame by the 5th Column in ascending order
Step 4: Extract the first 10 rows (the 10 lowest means) & get the summary of sales for those products for inference
Step 5: Draw a Bar chart comparing those 10 poorly performing products, to remove them
from Inventory.
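Steps 2 to 5 could be written as follows. The four-column layout (Product, Y1, Y2, Y3), the number of products and the random values are assumptions, since the real data is not shown; the new column name MeanSale is also only illustrative.

```r
# Made-up data: 25 products with sales for 3 years (assumed layout).
set.seed(42)
product <- data.frame(Product = paste0("P", 1:25),
                      Y1 = round(runif(25, 10, 100)),
                      Y2 = round(runif(25, 10, 100)),
                      Y3 = round(runif(25, 10, 100)))

# Step 2: mean sales per product, added as the 5th column.
product$MeanSale <- rowMeans(product[, c("Y1", "Y2", "Y3")])

# Step 3: sort the whole data frame by the 5th column, ascending.
product <- product[order(product$MeanSale), ]

# Step 4: the first 10 rows are the 10 poorest performers.
poor10 <- head(product, 10)
poor_summary <- summary(poor10$MeanSale)

# Step 5: bar chart comparing those 10 products.
chart_file <- file.path(tempdir(), "poor_products.pdf")
pdf(file = chart_file)
barplot(poor10$MeanSale, names.arg = poor10$Product, las = 2,
        main = "10 lowest mean sales", ylab = "Mean sales")
invisible(dev.off())
```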

Normalization:
# Create a sequence of numbers between -10 and 10 incrementing by 0.1.
x <- seq(-10,10,by=0.1)
# Choose the mean as 2.5 and standard deviation as 0.5.
y <- pnorm(x, mean= 2.5, sd = 0.5)
# Give the chart file a name.
png(file = "pnorm.png")
plot(x,y)
# Save the file.
dev.off()

[Figures: plots for PNORM, DNORM, QNORM and RNORM]
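The four functions named above belong together. A short sketch using the same mean (2.5) and standard deviation (0.5) as the pnorm example; the probabilities passed to qnorm, the sample size and the seed are arbitrary choices.

```r
x <- seq(-10, 10, by = 0.1)

d <- dnorm(x, mean = 2.5, sd = 0.5)    # density: height of the bell curve at x
p <- pnorm(x, mean = 2.5, sd = 0.5)    # cumulative probability P(X <= x)
q <- qnorm(c(0.025, 0.5, 0.975), mean = 2.5, sd = 0.5)   # inverse of pnorm
set.seed(7)
r <- rnorm(1000, mean = 2.5, sd = 0.5) # 1000 random draws

# One chart per function, saved together in a temporary file.
chart_file <- file.path(tempdir(), "norm_functions.pdf")
pdf(file = chart_file)
par(mfrow = c(2, 2))
plot(x, p, type = "l", main = "pnorm")
plot(x, d, type = "l", main = "dnorm")
plot(c(0.025, 0.5, 0.975), q, type = "b", xlab = "probability", main = "qnorm")
hist(r, main = "rnorm")
invisible(dev.off())
```

qnorm and pnorm undo each other: qnorm(0.5, ...) returns the mean, and pnorm of the mean returns 0.5.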
