Data Analysis 2 Assignment, TOPIC 1

UNIVERSIDAD DE MANIZALES
FACULTY OF ACCOUNTING, ECONOMIC AND ADMINISTRATIVE SCIENCES
Course: Data Analysis II
Customer personality analysis with RStudio
Case taken from: https://www.kaggle.com/
The following study resource will be consulted for this activity:
Betancourt Uscátegui, J., & Polanco Guzmán, I. (2021). Análisis de datos con Power Bi, R-Rstudio y Knime. Rama Editorial. https://www-digitaliapublishing-com.biblioproxy.umanizales.edu.co/a/110209
Development guidelines: the objective of this activity is to build charts with the R libraries ggplot2, tidyr, readxl and dplyr. To do so, download the Customer Personality Analysis dataset (https://www.kaggle.com/datasets/imakash3011/customer-personality-analysis?select=marketing_campaign.csv) and carry out a detailed analysis to help the company better understand its customers, determining the following:
• Segment the customers into:
  o Long-standing customers with high income and high spending.
  o New customers with below-average income and low spending.
  o New customers with high income and high spending.
  o Long-standing customers with below-average income and low spending.
• Define three (3) customer segments according to age, education level, income, product type and seniority. You may use whichever attributes you consider relevant for the segmentation, for example: customers with an average income, customers with an average total spend, customers by education level, or customers who buy the most of each product type, among others.
For submission, attach the R code file and a Word document containing an analysis supported by charts, with an interpretation of each chart, based on the "Customer Personality Analysis" case.
Acknowledgement: the dataset for this project is provided by Dr. Omar Romero Hernández.
DEVELOPMENT
Given the complexity of organizing the data and of the algorithms needed to build the database and the segmentations required for the written assignment, we present the following analysis for evaluation.
We begin with captures of the RStudio session in which we built the segmentation charts, together with the step-by-step process.
We remove the outlier case
> install.packages("ggplot2")
Installing package into ‘C:/Users/jcqui/AppData/Local/R/win-library/4.3’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.3/ggplot2_3.5.0.zip'
Content type 'application/zip' length 4821877 bytes (4.6 MB)
downloaded 4.6 MB
package ‘ggplot2’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
        C:\Users\jcqui\AppData\Local\Temp\RtmpYpVcAf\downloaded_packages
> install.packages("tidyr")
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.3/tidyr_1.3.1.zip'
downloaded 1.2 MB
package ‘tidyr’ successfully unpacked and MD5 sums checked
> install.packages("readxl")
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.3/readxl_1.4.3.zip'
downloaded 1.1 MB
package ‘readxl’ successfully unpacked and MD5 sums checked
> install.packages("dplyr")
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.3/dplyr_1.1.4.zip'
downloaded 1.5 MB
package ‘dplyr’ successfully unpacked and MD5 sums checked
> install.packages("gower")
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.3/gower_1.0.1.zip'
Content type 'application/zip' length 314815 bytes (307 KB)
downloaded 307 KB
package ‘gower’ successfully unpacked and MD5 sums checked
> install.packages("Rtsne")
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.3/Rtsne_0.17.zip'
downloaded 527 KB
package ‘Rtsne’ successfully unpacked and MD5 sums checked
> install.packages("dpTyr")
Warning in install.packages :
  package ‘dpTyr’ is not available for this version of R
A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages
> install.packages("corrgram")
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.3/corrgram_1.14.zip'
downloaded 393 KB
package ‘corrgram’ successfully unpacked and MD5 sums checked
> install.packages("stringr")
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.3/stringr_1.5.1.zip'
downloaded 311 KB
package ‘stringr’ successfully unpacked and MD5 sums checked
> install.packages("clister")
package ‘clister’ is not available for this version of R
A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages
> install.packages("corrplot")
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.3/corrplot_0.92.zip'
downloaded 3.7 MB
package ‘corrplot’ successfully unpacked and MD5 sums checked
> library(dpTyr)
Error in library(dpTyr) : there is no package called ‘dpTyr’
> library(dpTyr)
Error in library(dpTyr) : there is no package called ‘dpTyr’
> library(dptyr)
Error in library(dptyr) : there is no package called ‘dptyr’
> library(stringr)
> library(corrgram)
> library(ggplot2)
> library(clister)
Error in library(clister) : there is no package called ‘clister’
> library(cluster)
> install.packages("cluster")
Error in install.packages : Updating loaded packages
> install.packages("cluster")
package ‘cluster’ is in use and will not be installed
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.3/dplyr_1.1.4.zip'
downloaded 1.5 MB
package ‘dplyr’ successfully unpacked and MD5 sums checked
cannot remove prior installation of package ‘dplyr’
problem copying C:\Users\jcqui\AppData\Local\R\win-library\4.3\00LOCK\dplyr\libs\x64\dplyr.dll to C:\Users\jcqui\AppData\Local\R\win-library\4.3\dplyr\libs\x64\dplyr.dll: Permission denied
restored ‘dplyr’
> library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
> library(dplyr)
> library(tidyr)
> library(corrplot)
corrplot 0.92 loaded
package ‘corrplot’ is in use and will not be installed
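The failed calls above (`dpTyr`, `dptyr`, `clister`) look like misspellings of `dplyr` and `cluster`, and reinstalling a package that is already loaded is what triggers the "in use" messages. A minimal sketch, assuming those were the intended packages, that installs only what is missing and then loads everything; the helper name `ensure_packages` is hypothetical and not part of the original session:

```r
# Hypothetical helper: install only the packages that are not yet available,
# then attach all of them with library().
ensure_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!available]
  if (length(to_install) > 0) install.packages(to_install)
  invisible(lapply(pkgs, library, character.only = TRUE))
}

# Correctly spelled names of the packages used in this session (assumed list)
needed <- c("ggplot2", "tidyr", "readxl", "dplyr", "stringr", "cluster", "corrplot")

# ensure_packages(needed)  # uncomment to run; it may download from CRAN
```

Running the helper in a fresh R session, before any of the packages is loaded, avoids the permission error shown above for `dplyr.dll`.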
> setwd("C:/Users/jcqui/OneDrive/Imágenes/analisis de datos")
> data <- read.csv("marketing_campaign2.csv", header = TRUE, sep = ";", dec = ",")
> View(data)
> sum(is.na(data))
[1] 24
> data <- na.omit(data)
> str(data)
'data.frame': 2216 obs. of 29 variables:
 $ ID                 : int 5524 2174 4141 6182 5324 7446 965 6177 4855 5899 ...
 $ Year_Birth         : int 1957 1954 1965 1984 1981 1967 1971 1985 1974 1950 ...
 $ Education          : chr "Graduation" "Graduation" "Graduation" "Graduation" ...
 $ Marital_Status     : chr "Single" "Single" "Together" "Together" ...
 $ Income             : int 58138 46344 71613 26646 58293 62513 55635 33454 30351 5648 ...
 $ Kidhome            : int 0 1 0 1 1 0 0 1 1 1 ...
 $ Teenhome           : int 0 1 0 0 0 1 1 0 0 1 ...
 $ Dt_Customer        : chr "4/9/2012" "8/3/2014" "21/8/2013" "10/2/2014" ...
 $ Recency            : int 58 38 26 26 94 16 34 32 19 68 ...
 $ MntWines           : int 635 11 426 11 173 520 235 76 14 28 ...
 $ MntFruits          : int 88 1 49 4 43 42 65 10 0 0 ...
 $ MntMeatProducts    : int 546 6 127 20 118 98 164 56 24 6 ...
 $ MntFishProducts    : int 172 2 111 10 46 0 50 3 3 1 ...
 $ MntSweetProducts   : int 88 1 21 3 27 42 49 1 3 1 ...
 $ MntGoldProds       : int 88 6 42 5 15 14 27 23 2 13 ...
 $ NumDealsPurchases  : int 3 2 1 2 5 2 4 2 1 1 ...
 $ NumWebPurchases    : int 8 1 8 2 5 6 7 4 3 1 ...
 $ NumCatalogPurchases: int 10 1 2 0 3 4 3 0 0 0 ...
 $ NumStorePurchases  : int 4 2 10 4 6 10 7 4 2 0 ...
 $ NumWebVisitsMonth  : int 7 5 4 6 5 6 6 8 9 20 ...
 $ AcceptedCmp3       : int 0 0 0 0 0 0 0 0 0 1 ...
 $ AcceptedCmp4       : int 0 0 0 0 0 0 0 0 0 0 ...
 $ AcceptedCmp5       : int 0 0 0 0 0 0 0 0 0 0 ...
 $ AcceptedCmp1       : int 0 0 0 0 0 0 0 0 0 0 ...
 $ AcceptedCmp2       : int 0 0 0 0 0 0 0 0 0 0 ...
 $ Complain           : int 0 0 0 0 0 0 0 0 0 0 ...
 $ Z_CostContact      : int 3 3 3 3 3 3 3 3 3 3 ...
 $ Z_Revenue          : int 11 11 11 11 11 11 11 11 11 11 ...
 $ Response           : int 1 0 0 0 0 0 0 0 1 0 ...
 - attr(*, "na.action")= 'omit' Named int [1:24] 11 28 44 49 59 72 91 92 93 129 ...
  ..- attr(*, "names")= chr [1:24] "11" "28" "44" "49" ...
> data$age<-2015-data$Year_Birth
> data$Education[data$Education =="2n Cycle"]="UG"
> data$Education[data$Education =="Basic"]="UG"
> data$Education[data$Education =="Graduation"]="pG"
> data$Education[data$Education =="Master"]="PG"
> data$Education[data$Education =="PhD"]="UG"
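The recoding above writes one label as `pG` and another as `PG`, and sends `PhD` to `UG`, so the column ends up with three levels rather than two. A sketch of a two-level recode with a named lookup vector follows. It assumes that `Basic` and `2n Cycle` were meant to be `UG` and that `Graduation`, `Master` and `PhD` were meant to be `PG`; that grouping is an assumption for illustration, not something the source confirms.

```r
# Assumed grouping (illustrative, not the author's confirmed intent)
education_map <- c(
  "Basic"      = "UG",
  "2n Cycle"   = "UG",
  "Graduation" = "PG",
  "Master"     = "PG",
  "PhD"        = "PG"
)

# Toy stand-in for data$Education
education <- c("Graduation", "PhD", "Basic", "2n Cycle", "Master")

# Look each value up by name; unname() drops the lookup names afterwards
education_grouped <- unname(education_map[education])

print(table(education_grouped))
```

With the real data the same lookup would be applied to `data$Education`. Keeping the mapping in one vector makes casing slips such as `pG` easier to spot.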
> View(data)
> data$Marital_Status[data$Marital_Status== "Divorced"]= "Single"
> data$Marital_Status[data$Marital_Status== "Absurd"]= "Single"
> data$Marital_Status[data$Marital_Status== "YOLO"]= "Single"
> data$Marital_Status[data$Marital_Status== "Widow"]= "Single"
> data$Marital_Status[data$Marital_Status== "Together"]= "Single"
> data$Marital_Status[data$Marital_Status== "Married"]= "Single"
> data$Marital_Status[data$Marital_Status== "Alone"]= "Single"
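After the assignments above, every category (including `Married` and `Together`) becomes `Single`, so `Marital_Status` is constant and cannot help separate customers. This is why the cluster centers later show the value 1 for every cluster. A sketch of a two-group recode, assuming the intent was to separate customers who live with a partner from everyone else (the label `Partner` is hypothetical):

```r
# Toy stand-in for data$Marital_Status, one value per category in the dataset
marital <- c("Married", "Together", "Divorced", "Widow",
             "Alone", "Absurd", "YOLO", "Single")

# Assumed grouping: living with a partner versus everyone else
with_partner <- c("Married", "Together")
marital_grouped <- ifelse(marital %in% with_partner, "Partner", "Single")

print(table(marital_grouped))
```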
> data$Customer_year <- str_sub(data$Dt_Customer,-4)
> data$Customer_year <- as.numeric(data$Customer_year)
> data$Customer_Seniority <- 2015 - data$Customer_year
> View(data)
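The seniority calculation above takes the last four characters of `Dt_Customer` as the year. That works because the year comes last in every value, but parsing the column as a date makes the day/month/year format explicit. A sketch using only base R, with sample values copied from the `str()` output and the same reference year (2015) as the transcript:

```r
# Sample values copied from the str() output above (day/month/year)
dt_customer <- c("4/9/2012", "8/3/2014", "21/8/2013", "10/2/2014")

# Parse as real dates instead of slicing characters
enrolled <- as.Date(dt_customer, format = "%d/%m/%Y")

# Seniority in whole calendar years relative to 2015
customer_year <- as.integer(format(enrolled, "%Y"))
customer_seniority <- 2015L - customer_year

print(customer_seniority)
```

A parsed date also allows finer measures, such as months since enrolment, if the yearly resolution turns out to be too coarse for separating new from long-standing customers.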
> data$child<-data$Kidhome+data$Teenhome
> View(data)
> data$Ant_Spent <- data$MntWines + data$MntFruits + data$MntMeatProducts + data$MntGoldProds
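Note that `Ant_Spent` as computed above adds four of the six `Mnt*` columns: `MntFishProducts` and `MntSweetProducts` are left out. If the goal is total spending across all product types, a sketch with `rowSums()` over every column whose name starts with `Mnt` could look like this. The column name `Total_Spent` is hypothetical, and the three rows are the first customers from the `str()` output:

```r
# First three customers from the str() output above
purchases <- data.frame(
  MntWines         = c(635, 11, 426),
  MntFruits        = c(88, 1, 49),
  MntMeatProducts  = c(546, 6, 127),
  MntFishProducts  = c(172, 2, 111),
  MntSweetProducts = c(88, 1, 21),
  MntGoldProds     = c(88, 6, 42)
)

# Pick every column whose name starts with "Mnt" and add them row by row
mnt_cols <- grep("^Mnt", names(purchases), value = TRUE)
purchases$Total_Spent <- rowSums(purchases[, mnt_cols])

print(purchases$Total_Spent)
```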
> View(data)
> data$Num_purchases_made <- data$NumWebPurchases + data$NumCatalogPurchases + data$NumStorePurchases
> View(data)
> typeof(data$Income)
[1] "integer"
> data <- data[c(1,30,3,4,5,33,32,9,34,35,16,20,10,11,12,13,14,15,17,18,19)]
> data2<-na.omit(data2)
Error: object 'data2' not found
> data2<-data
> data2$Education <- unclass(as.factor(data2$Education))
> data2$Marital_Status <- unclass(as.factor(data2$Marital_Status))
> data2$Education <- as.numeric(data2$Education)
> data2$Marital_Status <- as.numeric(data2$Marital_Status)
> View(data2)
> set.seed(478)
> k <-kmeans(data2[,-c(1)],center = 4, iter.max = 3000)
> k$centers
       age Education Marital_Status    Income     child Customer_Seniority  Recency
1 47.70629  1.780420              1  75544.14 0.4699301           1.945455 49.05874
2 48.88058  1.854331              1  51691.11 1.2335958           2.003937 49.60367
3 42.50000  2.125000              1 221604.50 0.6250000           1.875000 48.62500
4 41.91108  1.844049              1  28186.57 1.1190150           1.964432 48.35568
   Ant_Spent Num_purchases_made NumDealsPurchases NumWebVisitsMonth  MntWines MntFruits
1 1137.70070          19.339860          1.661538          3.299301 623.20839 56.720280
2  427.33071          12.741470          3.098425          5.729659 273.49344 17.695538
3  656.62500          11.125000          4.250000          1.125000  26.50000  4.500000
4   78.85499           5.746922          2.142271          6.912449  29.92476  5.923393
  MntMeatProducts MntFishProducts MntSweetProducts MntGoldProds NumWebPurchases
1       386.50210       81.551049        59.458741     71.26993        5.527273
2        92.06824       24.160105        17.040682     44.07349        4.628609
3       621.87500        4.250000         1.250000      3.75000        0.500000
4        25.42134        9.099863         6.002736     17.58550        2.147743
  NumCatalogPurchases NumStorePurchases
1           5.3216783          8.490909
2           2.1666667          5.946194
3           9.8750000          0.750000
4           0.5253078          3.073871
> table(k$cluster)
  1   2   3   4
715 762   8 731
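In the result above, cluster 3 holds only 8 customers with a mean `Income` of 221604.50. Because `kmeans()` works with Euclidean distances on the raw columns, a variable measured in tens of thousands, such as `Income`, can dominate the grouping. A sketch of standardising the columns with `scale()` before clustering is shown below. The data are synthetic, and `nstart = 25` is an added choice that repeats the random start; it is not taken from the transcript.

```r
set.seed(478)  # same seed as the transcript; the data below are synthetic

# Synthetic stand-in: income in tens of thousands, spending in hundreds
toy <- data.frame(
  Income    = c(rnorm(50, 30000, 5000), rnorm(50, 75000, 8000)),
  Ant_Spent = c(rnorm(50, 80, 30),      rnorm(50, 1100, 200))
)

# Standardise each column to mean 0 and standard deviation 1
toy_scaled <- scale(toy)

# nstart = 25 repeats the random initialisation and keeps the best solution
k_scaled <- kmeans(toy_scaled, centers = 2, nstart = 25, iter.max = 100)

print(table(k_scaled$cluster))
```

The centers of a scaled fit are in standard-deviation units; computing group means of the original columns per cluster gives values that are easier to interpret.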
> ggplot(aes(y = Ant_Spent, x = income), data = data2) + geom_point(aes(color = k$cluster))
Error in `geom_point()`:
! Problem while computing aesthetics.
ℹ Error occurred in the 1st layer.
Caused by error:
! object 'income' not found
Run `rlang::last_trace()` to see where the error occurred.
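The error above comes from the column name: R is case-sensitive, and the data frame has `Income`, not `income`. A sketch of the same scatter plot with the exact column names follows. It runs on synthetic data and additionally stores the cluster as a factor, so that ggplot2 draws discrete colours instead of a continuous gradient (that conversion is an added suggestion, not part of the original call):

```r
library(ggplot2)

set.seed(478)
# Synthetic stand-in for data2; only the two plotted columns are needed
toy <- data.frame(
  Income    = c(rnorm(30, 30000, 5000), rnorm(30, 75000, 8000)),
  Ant_Spent = c(rnorm(30, 80, 30),      rnorm(30, 1100, 200))
)
k_toy <- kmeans(scale(toy), centers = 2, nstart = 10)

# Keep the cluster label inside the data frame, as a factor
toy$cluster <- factor(k_toy$cluster)

p <- ggplot(toy, aes(x = Income, y = Ant_Spent, color = cluster)) +
  geom_point() +
  labs(x = "Income", y = "Ant_Spent", color = "Cluster")
print(p)
```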
> ggplot(aes(y = Age, x = Income), data = data2) + geom_point(aes(color = k$cluster))
Caused by error:
! object 'Age' not found
> ggplot(aes(y = Age, x = Income), data2 = data2) + geom_point(aes(color = k$cluster))
Error in `fortify()`:
! `data` must be a <data.frame>, or an object coercible by `fortify()`, or a valid
  <data.frame>-like object coercible by `as.data.frame()`, not a <uneval> object.
ℹ Did you accidentally pass `aes()` to the `data` argument?
> ggplot(aes(y = Ant_Spent, x = Income), data = data2) + geom_point(aes(color = k$cluster))
> g_caja<-boxplot(data2$Income, col ="skyblue", frame.plot=F)
> g_caja$out
[1] 157243 162397 153924 160803 157733 157146 156924 666666
> data2<-data2[!(data2$Income %in% g_caja$out),]
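By default, `boxplot()` flags values that lie more than 1.5 times the interquartile range beyond the hinges, which is how the eight incomes above (including 666666) were identified. A sketch of the same rule without drawing a plot, using `boxplot.stats()` from base R on a toy income vector (seven values taken from the `str()` output plus the 666666 entry):

```r
# Toy incomes: seven ordinary values plus the 666666 entry seen above
income <- c(26646, 33454, 46344, 55635, 58138, 62513, 71613, 666666)

# boxplot.stats() applies the same 1.5 * IQR rule as boxplot(), without plotting
outliers <- boxplot.stats(income)$out

# Keep only the values that were not flagged
income_clean <- income[!(income %in% outliers)]

print(outliers)
```

After removing rows this way, any model fitted earlier (such as `k`) still reflects the old data and has to be fitted again before its centers are read.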
> data2<-data2[data2$MntWines %in% g_caja$out).]
Error: unexpected ')' in "data2<-data2[data2$MntWines %in% g_caja$out)"
> umbral_antiguedad<-1
> umbral_ingresos <-68000
> umbral_gasto<-1000
> clientes_segmentados_1 <- data2 %>%
+ Filters(Customer_Seniority > umbral_antiguedad & Income > umbral_ingresos & Ant_Spent > umbral_gasto)
Error in Filters(., Customer_Seniority > umbral_antiguedad & Income > :
  could not find function "Filters"
> head(clientes_segmentados_1)
Error: object 'clientes_segmentados_1' not found
> # filter long-standing customers with high income and high spending
> clientes_segmentados_1 <- data2 %>%
+ filter(Customer_Seniority < umbral_antiguedad & Income < umbral_ingresos & Ant_Spent < umbral_gasto)
> head(clientes_segmentados_1)
 [1] ID                  age                 Education           Marital_Status
 [5] Income              child               Customer_Seniority  Recency
 [9] Ant_Spent           Num_purchases_made  NumDealsPurchases   NumWebVisitsMonth
[13] MntWines            MntFruits           MntMeatProducts     MntFishProducts
[17] MntSweetProducts    MntGoldProds        NumWebPurchases     NumCatalogPurchases
[21] NumStorePurchases
<0 rows> (or 0-length row.names)
> clientes_segmentados_1 <- data2 %>%
+ filter(Customer_Seniority > umbral_antiguedad & Income > umbral_ingresos & Ant_Spent > umbral_gasto)
> head(clientes_segmentados_1)
    ID age Education Marital_Status Income child Customer_Seniority Recency Ant_Spent
1 2114  69         3              1  82800     0                  3      23      1188
2 6565  66         2              1  76995     1                  2      91      1766
3 1966  50         3              1  84618     0                  2      96      1585
4 8755  69         2              1  68657     0                  2       4      1009
5 8601  35         1              1  80011     1                  2       3      1135
6 6566  61         3              1  72550     2                  3      39      1231
  Num_purchases_made NumDealsPurchases NumWebVisitsMonth MntWines MntFruits MntMeatProducts
1                 25                 1                 3     1006        22             115
2                 24                 2                 5     1012        80             498
3                 25                 1                 2      684       100             801
4                 17                 1                 7      482        34             471
5                 19                 2                 4      421        76             536
6                 19                 9                 8      826        50             317
  MntFishProducts MntSweetProducts MntGoldProds NumWebPurchases NumCatalogPurchases
1              59               68           45               7                   6
2               0               16          176              11                   4
3              21               66            0               6                   9
4             119               68           22               3                   5
5              82              178          102               8                   6
6              50               38           38               5                   2
  NumStorePurchases
1                12
2                 9
3                10
4                 9
5                 5
6                12
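The filter above isolates one of the four groups requested in the assignment. A sketch that labels all four groups in a single pass with `dplyr::case_when()` is given below, reusing the thresholds from the transcript. The helper columns `antiguo`, `alto` and `bajo`, the segment labels and the last three toy rows are hypothetical; customers who match none of the four definitions are labelled `other`.

```r
library(dplyr)

# Thresholds taken from the transcript
umbral_antiguedad <- 1
umbral_ingresos   <- 68000
umbral_gasto      <- 1000

# Toy rows: the first two come from the head() output above, the rest are invented
clientes <- data.frame(
  ID                 = c(2114, 6565, 1, 2, 3),
  Customer_Seniority = c(3, 2, 1, 1, 3),
  Income             = c(82800, 76995, 30000, 90000, 25000),
  Ant_Spent          = c(1188, 1766, 50, 1500, 40)
)

clientes <- clientes %>%
  mutate(
    antiguo  = Customer_Seniority > umbral_antiguedad,
    alto     = Income > umbral_ingresos & Ant_Spent > umbral_gasto,
    bajo     = Income <= umbral_ingresos & Ant_Spent <= umbral_gasto,
    segmento = case_when(
      antiguo  & alto ~ "old_high_income_high_spend",
      !antiguo & bajo ~ "new_low_income_low_spend",
      !antiguo & alto ~ "new_high_income_high_spend",
      antiguo  & bajo ~ "old_low_income_low_spend",
      TRUE            ~ "other"
    )
  )

print(count(clientes, segmento))
```

Labelling every row keeps the full dataset in one data frame, so a chart can colour or facet by `segmento` instead of relying on four separate filtered copies.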
> kmeans_reslt <- kmeans(caraacteristicas_escala, centers = num_clusters)
Error: object 'caraacteristicas_escala' not found
> set.seed(123)
> num_clusters <- 3
> kmeans_reslt <- kmeans(caraacteristicas_escala, centers = num_clusters)
> caracteristicas_segmentacion <- clientes_segmentados_1 %>%
+ select(Income, Ant_Spent)
> )
Error: unexpected ')' in ")"
> num_clusters <- 3
> kmeans_reslt <- kmeans(caraacteristicas_escala, centers = num_clusters)
> source("~/.active-rstudio-document")
Error in source("~/.active-rstudio-document") :
  ~/.active-rstudio-document:165:0: unexpected end of input
163:
164:
   ^
> set.seed(123)
> num_clusters <- 3
> kmeans_reslt <- kmeans(caracteristicas_escala, centers = num_clusters)
Error: object 'caracteristicas_escala' not found
> # scale the features to avoid bias caused by differences in scale
> caracteristicas_escala <- scale(caracteristicas_segmentacion)
> # apply the k-means algorithm for the segmentation
> set.seed(123)
> num_clusters <- 3
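Most of the errors in this part of the session come from the order of the steps and from inconsistent object names (`caraacteristicas_escala`, `kmeans_reslt`, `keams_result`). A sketch of the intended sequence with one consistent set of names: select the features, scale them, fit k-means, then attach the cluster label. The data frame below is a synthetic stand-in for `clientes_segmentados_1`, and `nstart = 25` is an added choice.

```r
set.seed(123)

# Synthetic stand-in for clientes_segmentados_1 (values are invented)
clientes_segmentados_1 <- data.frame(
  Income    = runif(60, 68000, 100000),
  Ant_Spent = runif(60, 1000, 2300)
)

# 1. Pick the features, 2. scale them, 3. fit k-means, in that order
caracteristicas_segmentacion <- clientes_segmentados_1[, c("Income", "Ant_Spent")]
caracteristicas_escala <- scale(caracteristicas_segmentacion)
num_clusters <- 3
kmeans_result <- kmeans(caracteristicas_escala, centers = num_clusters, nstart = 25)

# 4. Attach the cluster label to the original rows as a factor
clientes_segmentados_con_cluster <- cbind(
  clientes_segmentados_1,
  cluster = as.factor(kmeans_result$cluster)
)

print(table(clientes_segmentados_con_cluster$cluster))
```

With the `cluster` column in place, a plot can map `x = Income`, `y = Ant_Spent` and `color = cluster`, using the exact column names of the data frame.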
> clientes_segmentados_con_cluster <- cbind(clientes_segmentados_1, cluster, aes(x = income, y = Amt_Spent, color = cluster))
Error: object 'cluster' not found
> clientes_segmentados_con_cluster <- cbind(clientes_segmentados_1, cluster, as.factor(kmeans_reslt$cluster)
+ # add the cluster assignment to the original dataset
+ clientes_segmentados_con_cluster <- cbind(clientes_segmentados_1,
Error: unexpected symbol in:
"# add the cluster assignment to the original dataset
clientes_segmentados_con_cluster"
> # add the cluster assignment to the original dataset
+
+ keams_result$centers
Error: unexpected symbol in:
"
keams_result"
+
+ kmeams_result$centers
Error: unexpected symbol in:
"
kmeams_result"
+
+ kmeans_result$centers
Error: unexpected symbol in:
"
kmeans_result"
> # characteristics of the customers segmented with k-means
> set.seed(123)
> num_clusters <- 3
> gplot(clientes_segmentados_con_cluster, es(x = income, y = Amt_Spent, color = cluster))
Error in gplot(clientes_segmentados_con_cluster, es(x = income, y = Amt_Spent, :
  could not find function "gplot"
> ggplot(clientes_segmentados_con_cluster, es(x = income, y = Amt_Spent, color = cluster))
Error: object 'clientes_segmentados_con_cluster' not found
> ggplot(clientes_segmentados_con_cluster, aes(x = income, y = Amt_Spent, color = cluster)) +
+ geom_point() +
+ labs(x = "Income", y = "Amt_Spent", color = "cluster") +
+ ggtitle("segmentacion de clientes con k-means") +
+ theme_minimal()
> kmeans_result$centers
Error: object 'kmeans_result' not found
> set.seed(478)
> k <-kmeans(data2[,-c(1)],center = 4, iter.max = 3000)
> k$centers
       age Education Marital_Status   Income     child Customer_Seniority  Recency  Ant_Spent
1 49.20746  1.875203              1 60901.91 1.0000000           2.025932 48.60130  717.37763
2 47.40816  1.730612              1 79552.78 0.3387755           1.920408 49.74286 1232.20612
3 40.45796  1.842920              1 23218.81 0.9889381           2.006637 48.70133   59.84292
4 46.40370  1.842835              1 41532.87 1.3312789           1.935285 49.07396  189.94299
  Num_purchases_made NumDealsPurchases NumWebVisitsMonth  MntWines MntFruits MntMeatProducts
1          16.836305          2.933549          4.920583 450.02431 31.956240       175.89465
2          19.502041          1.340816          2.846939 650.00408 63.253061       444.68163
3           5.296460          2.035398          7.088496  17.11726  5.537611        20.87832
4           8.320493          2.662558          6.383667 110.88906  7.942989        45.03852
  MntFishProducts MntSweetProducts MntGoldProds NumWebPurchases NumCatalogPurchases
1        43.15559        30.235008     59.50243        5.635332           3.4424635
2        90.91837        68.010204     74.26735        5.332653           5.7653061
3         8.04646         5.747788     16.30973        1.951327           0.4557522
4        13.18490         8.178737     26.07242        3.200308           1.0554700
  NumStorePurchases
1          7.758509
2          8.404082
3          2.889381
4          4.064715
> table(k$cluster)
  1   2   3   4
617 490 452 649
> g_caja<-boxplot(data2$Income, col ="skyblue", frame.plot=F)
> g_caja$out
numeric(0)
> summary(data$Income)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1730   35303   51382   52247   68522  666666
> #
> summary(data$Ant_Spent)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    3.0    57.0   349.0   542.4   928.5  2306.0
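The thresholds used earlier (68000 for income, 1000 for spending) sit close to the third quartiles reported in these summaries. A sketch of deriving such cut-offs from the data rather than typing them by hand, assuming the third quartile marks "high" and the mean marks "below average". Both choices, the variable names and the toy vectors are assumptions for illustration.

```r
# Toy vectors standing in for data$Income and data$Ant_Spent
income    <- c(1730, 26646, 35303, 46344, 51382, 58138, 68522, 71613)
ant_spent <- c(3, 11, 57, 173, 349, 635, 928, 1766)

# Third quartile as the cut-off for "high"; mean as the reference for "below average"
umbral_ingresos_alto <- unname(quantile(income, 0.75))
umbral_gasto_alto    <- unname(quantile(ant_spent, 0.75))
ingreso_promedio     <- mean(income)

print(c(umbral_ingresos_alto, umbral_gasto_alto, ingreso_promedio))
```

Computing the cut-offs on the cleaned data (after the outlier removal) keeps a single extreme value such as 666666 from inflating the mean.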
> centers
Error: object 'centers' not found
> kmeans_result$centers
Error: object 'kmeans_result' not found
