Report PSA Assessment
16 October 2018
——————————————————————————————–
Install the following packages before running this report on your PC. Uncomment the
statements below to run them.
# install.packages("sqldf")
# install.packages("ggplot2")
# install.packages("dplyr")
# install.packages("plotrix")
——————————————————————————————–
#Loading the packages
library("dplyr")
##
## Attaching package: 'dplyr'
library("sqldf")
library("ggplot2")
library("plotrix")
## NOTE: The data file was read as .csv and NOT as .xlsx
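The report does not show how MyData was loaded. The following is a minimal sketch of one way to read such a CSV, not the author's code: the helper name read_psa_csv and the temporary demo file are assumptions made for illustration. Passing fileEncoding = "UTF-8-BOM" is one way to keep a byte-order mark at the start of the file from being folded into the first column name.

```r
# Hypothetical helper (not from the report): read the CSV so that a leading
# byte-order mark does not end up in the first column name.
read_psa_csv <- function(path) {
  read.csv(path, fileEncoding = "UTF-8-BOM", stringsAsFactors = FALSE)
}

# Self-contained demonstration: write a small temporary CSV that starts with a BOM.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("Hospital,Dateofdiagnosis,Indicator_X\nA,2015-01-01,1\nB,2016-02-03,0\n"), con)
close(con)

demo <- read_psa_csv(tmp)
names(demo)
```

With the real file, the call would be MyData <- read_psa_csv(<path to the .csv>), where the path is whatever location the data was saved to.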
Checking whether any of the columns contain missing (NA) values
sum(is.na(MyData$Indicator_X))
## [1] 0
sum(is.na(MyData$Dateofdiagnosis))
## [1] 0
#No NA values in the Dateofdiagnosis column
#The "ï.." prefix below is most likely a byte-order-mark artifact of the CSV import
sum(is.na(MyData$ï..Hospital))
## [1] 0
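The is.na() checks above only detect NA values, not empty strings. A small sketch of a check that covers both, applied to every column at once, is shown below. The function name count_missing and the toy data frame are assumptions for illustration; with the real data the call would be count_missing(MyData).

```r
# Hypothetical helper: count NA or blank (empty / whitespace-only) entries per column.
count_missing <- function(df) {
  sapply(df, function(col) sum(is.na(col) | trimws(as.character(col)) == ""))
}

# Toy data with one blank hospital name and one NA indicator.
toy <- data.frame(
  Hospital = c("A", "", "B"),
  Indicator_X = c(1, NA, 0),
  stringsAsFactors = FALSE
)

missing_per_column <- count_missing(toy)
missing_per_column
```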
We run some SQL queries to get an overview of the data before
analysing it
#The query below counts, for each hospital, the records where the PSA
#assessment was completed and documented (Indicator_X = 1)
sqldf("select Hospital, count((Indicator_X)) from MyData where Indicator_X==1
GROUP BY Hospital")
## Hospital count((Indicator_X))
## 1 A 81
## 2 B 578
## 3 C 130
## 4 D 164
## 5 E 700
## 6 F 409
## 7 G 22
## 8 H 288
## 9 I 438
## 10 L 39
## 11 M 828
## 12 N 601
## 13 O 220
## 14 P 308
The query output shows that Hospital G (22) and Hospital L (39) have the fewest
records with a completed PSA assessment (Indicator_X = 1), followed by Hospital A (81)
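To read off the hospitals with the fewest Indicator_X = 1 records without scanning the table by eye, the counts printed by the query can be sorted. The sketch below re-enters those printed counts by hand; the object names are assumptions for illustration.

```r
# Counts of Indicator_X = 1 per hospital, copied from the query output above.
completed <- data.frame(
  Hospital = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "L", "M", "N", "O", "P"),
  n_completed = c(81, 578, 130, 164, 700, 409, 22, 288, 438, 39, 828, 601, 220, 308),
  stringsAsFactors = FALSE
)

# Sort ascending and keep the three smallest counts.
ranked <- completed[order(completed$n_completed), ]
lowest_three <- head(ranked, 3)
lowest_three
```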
Next, checking how many 0's and 1's the Indicator_X column contains
across all hospitals and plotting them in a histogram
hist(MyData$Indicator_X,
     xlab="Indicator_X",
     main="HISTOGRAM FOR PSA ASSESSMENTS",
     border="blue",
     col="yellow",
     xlim=c(0,1),
     las=1,
     breaks=5)
From the above plot, we see that overall more PSA assessments were
completed than left incomplete
Counts:
Indicator_X = 0: 3358
Indicator_X = 1: 4806
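The two counts can be turned into shares of the total to quantify the gap. This is a small sketch using the counts reported above; with the data loaded, the counts themselves could be obtained with table(MyData$Indicator_X).

```r
# Counts of each Indicator_X value, as reported above.
counts <- c("0" = 3358, "1" = 4806)

# Share of each value in the total number of records.
total <- sum(counts)
shares <- round(counts / total, 3)
shares
```

Under these counts, roughly 59% of the records have Indicator_X = 1 and roughly 41% have Indicator_X = 0.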
Density plot of Indicator_X to check how densely the two indicator
values are populated
## Defining the themes to make the graphs look better.
red.bold.italic.text <- element_text(face = "bold.italic", color = "red", size = 15)
blue.bold.italic.text <- element_text(face = "bold.italic", color = "blue", size = 15)
ggplot(MyData, aes(x = Indicator_X)) +
  geom_density(colour = "blue", fill = "gray") +
  labs(title = "PSA ASSESSMENT", x = "Indicator_X") +
  theme(title = red.bold.italic.text, axis.title = blue.bold.italic.text) +
  theme(plot.title = element_text(hjust = 0.5))
From the above plot, we see that the density of completed PSA assessments
(Indicator_X = 1) is higher than that of non-completed
PSA assessments (Indicator_X = 0)
Calculating, for each hospital, the proportion of records with
Indicator_X = 0 and with Indicator_X = 1
ag1 <- aggregate(MyData$Indicator_X==0 ~ MyData$Hospital, FUN = mean, data =
MyData)
ag1
## MyData$Hospital MyData$Indicator_X == 0
## 1 A 0.4527027
## 2 B 0.2444444
## 3 C 0.3193717
## 4 D 0.4184397
## 5 E 0.5694957
## 6 F 0.2735346
## 7 G 0.4500000
## 8 H 0.4396887
## 9 I 0.3240741
## 10 L 0.3809524
## 11 M 0.3925165
## 12 N 0.4445471
## 13 O 0.5546559
## 14 P 0.2000000
ag2 <- aggregate(MyData$Indicator_X==1 ~ MyData$Hospital, FUN = mean, data =
MyData)
ag2
## MyData$Hospital MyData$Indicator_X == 1
## 1 A 0.5472973
## 2 B 0.7555556
## 3 C 0.6806283
## 4 D 0.5815603
## 5 E 0.4305043
## 6 F 0.7264654
## 7 G 0.5500000
## 8 H 0.5603113
## 9 I 0.6759259
## 10 L 0.6190476
## 11 M 0.6074835
## 12 N 0.5554529
## 13 O 0.4453441
## 14 P 0.8000000
In the output, we see that the completion rate of the PSA assessment
(Indicator_X = 1) is lowest in Hospital E (0.43) and Hospital O (0.45), the only
hospitals below 0.5. Hospital A (0.55) is the next lowest, and Hospital P (0.80) is the highest.
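The completion rates can be cross-checked against the raw counts, since each rate is simply completed / (completed + not completed). The sketch below re-enters the per-hospital counts reported in this document by hand; the object names are assumptions for illustration.

```r
# Per-hospital counts reported in this document.
hospital  <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "L", "M", "N", "O", "P")
completed <- c(81, 578, 130, 164, 700, 409, 22, 288, 438, 39, 828, 601, 220, 308)  # Indicator_X = 1
not_done  <- c(67, 187, 61, 118, 926, 154, 18, 226, 210, 24, 535, 481, 274, 77)    # Indicator_X = 0

# Completion rate per hospital, rounded like the aggregate() output.
rates <- data.frame(
  Hospital = hospital,
  completion_rate = round(completed / (completed + not_done), 4),
  stringsAsFactors = FALSE
)

# Hospitals sorted from lowest to highest completion rate.
rates[order(rates$completion_rate), ]

# Hospitals where fewer than half of the assessments were completed.
below_half <- rates$Hospital[rates$completion_rate < 0.5]
below_half
```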
## For each hospital and each date of diagnosis, the proportion of
## non-completed PSA assessments (Indicator_X = 0)
ag3 <- aggregate(MyData$Indicator_X==0 ~
MyData$Dateofdiagnosis+MyData$Hospital, FUN = mean, data = MyData)
#ag3
## For each hospital and each date of diagnosis, the proportion of
## completed PSA assessments (Indicator_X = 1)
ag4 <- aggregate(MyData$Indicator_X==1 ~
MyData$Dateofdiagnosis+MyData$Hospital, FUN = mean, data = MyData)
#ag4
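Because Indicator_X only takes the values 0 and 1, its mean within a group is already the completion rate, and the non-completion rate is one minus that value. The sketch below shows this on toy data, grouped by year of diagnosis and hospital. The toy values, the ISO date format, and the Year column are assumptions for illustration; the real date format of Dateofdiagnosis is not shown in the report.

```r
# Toy data (assumed layout, not the real records).
toy <- data.frame(
  Hospital = c("A", "A", "A", "B", "B"),
  Dateofdiagnosis = c("2015-03-01", "2015-07-15", "2016-01-10", "2015-05-05", "2016-09-30"),
  Indicator_X = c(1, 0, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Extract the year of diagnosis (assumes dates are stored as YYYY-MM-DD).
toy$Year <- format(as.Date(toy$Dateofdiagnosis), "%Y")

# Mean of a 0/1 column = proportion of 1's, i.e. the completion rate.
rate_by_year <- aggregate(Indicator_X ~ Year + Hospital, data = toy, FUN = mean)
names(rate_by_year)[names(rate_by_year) == "Indicator_X"] <- "completion_rate"

# The non-completion rate is the complement.
rate_by_year$non_completion_rate <- 1 - rate_by_year$completion_rate
rate_by_year
```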
Counting the number of PSA assessments completed and not
completed in each of the hospitals
# counting number of 0's and 1's in the Indicator_X column for Hospital A
## Hospital count((Indicator_X))
## 1 A 81
## Hospital count((Indicator_X))
## 1 A 67
#No of 0's : 67
#No of 1's : 81
## Hospital count((Indicator_X))
## 1 B 578
## Hospital count((Indicator_X))
## 1 B 187
## Hospital count((Indicator_X))
## 1 C 130
## Hospital count((Indicator_X))
## 1 C 61
#No of 0's : 61
#No of 1's : 130
## Hospital count((Indicator_X))
## 1 D 164
## Hospital count((Indicator_X))
## 1 D 118
## Hospital count((Indicator_X))
## 1 E 700
## Hospital count((Indicator_X))
## 1 E 926
## Hospital count((Indicator_X))
## 1 F 409
## Hospital count((Indicator_X))
## 1 F 154
#No of 0's : 154
#No of 1's : 409
## Hospital count((Indicator_X))
## 1 G 22
## Hospital count((Indicator_X))
## 1 G 18
#No of 0's : 18
#No of 1's : 22
## Hospital count((Indicator_X))
## 1 H 288
## Hospital count((Indicator_X))
## 1 H 226
## Hospital count((Indicator_X))
## 1 I 438
## Hospital count((Indicator_X))
## 1 I 210
## Hospital count((Indicator_X))
## 1 L 39
## Hospital count((Indicator_X))
## 1 L 24
#No of 0's : 24
#No of 1's : 39
## Hospital count((Indicator_X))
## 1 M 828
## Hospital count((Indicator_X))
## 1 M 535
## Hospital count((Indicator_X))
## 1 N 601
## Hospital count((Indicator_X))
## 1 N 481
## Hospital count((Indicator_X))
## 1 O 220
## Hospital count((Indicator_X))
## 1 O 274
#No of 0's : 274
#No of 1's : 220
## Hospital count((Indicator_X))
## 1 P 308
## Hospital count((Indicator_X))
## 1 P 77
#No of 0's : 77
#No of 1's : 308
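The per-hospital counts above come from separate queries for each hospital and each indicator value. As an alternative sketch, a single cross-tabulation gives both counts for every hospital at once. The toy data frame is an assumption for illustration; with the real data the call would be table(MyData$Hospital, MyData$Indicator_X).

```r
# Toy data (assumed layout, not the real records).
toy <- data.frame(
  Hospital = c("A", "A", "A", "B", "B"),
  Indicator_X = c(1, 1, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Rows are hospitals, columns are the Indicator_X values 0 and 1.
counts <- table(Hospital = toy$Hospital, Indicator_X = toy$Indicator_X)
counts
```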
# The dates are grouped by year and shown on the x-axis, and the
# y-axis shows the different hospitals.
Grouping the data for individual hospitals with their completed PSA
assessments.
grp1 <- MyData %>%
  filter(Indicator_X == 1 & Hospital == 'A')
# 81 obs
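The same grouping can be produced for all hospitals in one step instead of one filter per hospital. The sketch below uses base R subset() and split() on toy data; the toy values and object names are assumptions for illustration.

```r
# Toy data (assumed layout, not the real records).
toy <- data.frame(
  Hospital = c("A", "A", "B", "B", "B"),
  Indicator_X = c(1, 0, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Keep only completed assessments, then split into one data frame per hospital.
completed <- subset(toy, Indicator_X == 1)
by_hospital <- split(completed, completed$Hospital)

# Number of completed assessments per hospital.
n_completed <- sapply(by_hospital, nrow)
n_completed
```

Each element of by_hospital plays the role of grp1 for its hospital, for example by_hospital[["A"]].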