
UNIVERSIDAD DE MANIZALES

FACULTY OF ACCOUNTING, ECONOMIC AND ADMINISTRATIVE SCIENCES

Course: Data Analysis II

Customer personality analysis with RStudio

Case taken from: https://www.kaggle.com/

The following study resource will be consulted for this activity:

Betancourt Uscátegui, J., & Polanco Guzmán, I. (2021). Análisis de datos con Power Bi, R-Rstudio y Knime. Rama Editorial. https://www-digitaliapublishing-com.biblioproxy.umanizales.edu.co/a/110209

Development guidelines: the objective of this activity is to build charts using the R libraries ggplot2, tidyr, readxl and dplyr. To do so, download the Customer Personality Analysis dataset (https://www.kaggle.com/datasets/imakash3011/customer-personality-analysis?select=marketing_campaign.csv) and carry out a detailed analysis that helps the company understand its customers better, determining the following:

• Segment the customers into:

o Long-standing customers with high income and a high-spending profile.
o New customers with below-average income and a low-spending profile.
o New customers with high income and a high level of spending.
o Long-standing customers with below-average income and a low-spending profile.

• Define three (3) customer segments based on age, education level, income, product type and seniority (you may use whichever attributes you consider relevant for the segmentation, for example: customers with an average income, customers with an average total spend, customers by education level, customers who buy the most by product type, among others).

For submission, attach the R code file and a Word document containing an analysis supported by charts, with an interpretation of each chart, for the "Customer Personality Analysis" case.

Acknowledgement: the dataset for this project is provided by Dr. Omar Romero Hernández.

DEVELOPMENT

Given the complexity of organizing the data and of the algorithms needed to build the dataset and the segmentations required in the written assignment, we present the following analysis for evaluation.

We begin with captures of the RStudio session in which we built the segmentation charts, showing the process step by step.
We remove the atypical case (outlier).
> install.packages("ggplot2")
Installing package into ‘C:/Users/jcqui/AppData/Local/R/win-library/4.3’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.3/ggplot2_3.5.0.zip'
Content type 'application/zip' length 4821877 bytes (4.6 MB)
downloaded 4.6 MB

package ‘ggplot2’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in


C:\Users\jcqui\AppData\Local\Temp\RtmpYpVcAf\downloaded_packages
> install.packages("tidyr")
Installing package into ‘C:/Users/jcqui/AppData/Local/R/win-library/4.3’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.3/tidyr_1.3.1.zip'
Content type 'application/zip' length 1267089 bytes (1.2 MB)
downloaded 1.2 MB

package ‘tidyr’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in


C:\Users\jcqui\AppData\Local\Temp\RtmpYpVcAf\downloaded_packages
> install.packages("readxl")
Installing package into ‘C:/Users/jcqui/AppData/Local/R/win-library/4.3’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.3/readxl_1.4.3.zip'
Content type 'application/zip' length 1197242 bytes (1.1 MB)
downloaded 1.1 MB

package ‘readxl’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in


C:\Users\jcqui\AppData\Local\Temp\RtmpYpVcAf\downloaded_packages
> install.packages("dplyr")
Installing package into ‘C:/Users/jcqui/AppData/Local/R/win-library/4.3’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.3/dplyr_1.1.4.zip'
Content type 'application/zip' length 1558956 bytes (1.5 MB)
downloaded 1.5 MB

package ‘dplyr’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in


C:\Users\jcqui\AppData\Local\Temp\RtmpYpVcAf\downloaded_packages
> install.packages("gower")
Installing package into ‘C:/Users/jcqui/AppData/Local/R/win-library/4.3’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.3/gower_1.0.1.zip'
Content type 'application/zip' length 314815 bytes (307 KB)
downloaded 307 KB
package ‘gower’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in


C:\Users\jcqui\AppData\Local\Temp\RtmpYpVcAf\downloaded_packages
> install.packages("Rtsne")
Installing package into ‘C:/Users/jcqui/AppData/Local/R/win-library/4.3’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.3/Rtsne_0.17.zip'
Content type 'application/zip' length 539659 bytes (527 KB)
downloaded 527 KB

package ‘Rtsne’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in


C:\Users\jcqui\AppData\Local\Temp\RtmpYpVcAf\downloaded_packages
> install.packages("dpTyr")
Installing package into ‘C:/Users/jcqui/AppData/Local/R/win-library/4.3’
(as ‘lib’ is unspecified)
Warning in install.packages :
package ‘dpTyr’ is not available for this version of R

A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages
> install.packages("corrgram")
Installing package into ‘C:/Users/jcqui/AppData/Local/R/win-library/4.3’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.3/corrgram_1.14.zip'
Content type 'application/zip' length 403257 bytes (393 KB)
downloaded 393 KB

package ‘corrgram’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in


C:\Users\jcqui\AppData\Local\Temp\RtmpYpVcAf\downloaded_packages
> install.packages("stringr")
Installing package into ‘C:/Users/jcqui/AppData/Local/R/win-library/4.3’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.3/stringr_1.5.1.zip'
Content type 'application/zip' length 319020 bytes (311 KB)
downloaded 311 KB

package ‘stringr’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in


C:\Users\jcqui\AppData\Local\Temp\RtmpYpVcAf\downloaded_packages
> install.packages("clister")
Installing package into ‘C:/Users/jcqui/AppData/Local/R/win-library/4.3’
(as ‘lib’ is unspecified)
Warning in install.packages :
package ‘clister’ is not available for this version of R

A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages
> install.packages("corrplot")
Installing package into ‘C:/Users/jcqui/AppData/Local/R/win-library/4.3’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.3/corrplot_0.92.zip'
Content type 'application/zip' length 3844699 bytes (3.7 MB)
downloaded 3.7 MB

package ‘corrplot’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in


C:\Users\jcqui\AppData\Local\Temp\RtmpYpVcAf\downloaded_packages
> library(dpTyr)
Error in library(dpTyr) : there is no package called ‘dpTyr’
> library(dpTyr)
Error in library(dpTyr) : there is no package called ‘dpTyr’
> library(dptyr)
Error in library(dptyr) : there is no package called ‘dptyr’
> library(stringr)
> library(corrgram)
> library(ggplot2)
> library(clister)
Error in library(clister) : there is no package called ‘clister’
> library(cluster)
> install.packages("cluster")
Error in install.packages : Updating loaded packages
> install.packages("cluster")
Warning in install.packages :
package ‘cluster’ is in use and will not be installed
> install.packages("dplyr")
Error in install.packages : Updating loaded packages
> install.packages("dplyr")
Installing package into ‘C:/Users/jcqui/AppData/Local/R/win-library/4.3’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.3/dplyr_1.1.4.zip'
Content type 'application/zip' length 1558956 bytes (1.5 MB)
downloaded 1.5 MB

package ‘dplyr’ successfully unpacked and MD5 sums checked


Warning in install.packages :
cannot remove prior installation of package ‘dplyr’
Warning in install.packages :
problem copying C:\Users\jcqui\AppData\Local\R\win-library\4.3\00LOCK\dplyr\libs\x64\dplyr.dll to
C:\Users\jcqui\AppData\Local\R\win-library\4.3\dplyr\libs\x64\dplyr.dll: Permission denied
Warning in install.packages :
restored ‘dplyr’

The downloaded binary packages are in


C:\Users\jcqui\AppData\Local\Temp\RtmpYpVcAf\downloaded_packages
> library(dplyr)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

filter, lag

The following objects are masked from ‘package:base’:

intersect, setdiff, setequal, union

> library(dplyr)
> library(tidyr)
> library(corrplot)
corrplot 0.92 loaded
> install.packages("corrplot")
Error in install.packages : Updating loaded packages
> install.packages("corrplot")
Warning in install.packages :
package ‘corrplot’ is in use and will not be installed
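
Several of the failed calls above come from misspelled package names (dpTyr, clister) and from trying to reinstall packages that are already loaded in the session. A minimal sketch of a check that could be run before installing anything follows. The helper missing_packages and the vector paquetes are hypothetical names that do not appear in the original session, and the package list is simply the set of packages that the session ends up loading.

```r
# Packages the session eventually loads (list assembled from the transcript).
paquetes <- c("ggplot2", "tidyr", "dplyr", "stringr",
              "corrgram", "corrplot", "cluster")

# Hypothetical helper: returns the names that cannot be loaded as namespaces.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

faltantes <- missing_packages(paquetes)
if (length(faltantes) > 0) {
  message("Not installed: ", paste(faltantes, collapse = ", "))
  # install.packages(faltantes)  # best run in a fresh session, before any library() call
}
```

A misspelled name such as "dpTyr" would show up in faltantes immediately, before any download is attempted.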
> setwd("C:/Users/jcqui/OneDrive/Imágenes/analisis de datos")
> data <-read.csv("marketing_campaign2.csv",header = TRUE,sep = ";",dec =
",")
> View(data)
> sum(is.na(data))
[1] 24
> data <- na.omit(data)
> str(data)
'data.frame': 2216 obs. of 29 variables:
 $ ID                 : int 5524 2174 4141 6182 5324 7446 965 6177 4855 5899 ...
 $ Year_Birth         : int 1957 1954 1965 1984 1981 1967 1971 1985 1974 1950 ...
 $ Education          : chr "Graduation" "Graduation" "Graduation" "Graduation" ...
 $ Marital_Status     : chr "Single" "Single" "Together" "Together" ...
 $ Income             : int 58138 46344 71613 26646 58293 62513 55635 33454 30351 5648 ...
 $ Kidhome            : int 0 1 0 1 1 0 0 1 1 1 ...
 $ Teenhome           : int 0 1 0 0 0 1 1 0 0 1 ...
 $ Dt_Customer        : chr "4/9/2012" "8/3/2014" "21/8/2013" "10/2/2014" ...
 $ Recency            : int 58 38 26 26 94 16 34 32 19 68 ...
 $ MntWines           : int 635 11 426 11 173 520 235 76 14 28 ...
 $ MntFruits          : int 88 1 49 4 43 42 65 10 0 0 ...
 $ MntMeatProducts    : int 546 6 127 20 118 98 164 56 24 6 ...
 $ MntFishProducts    : int 172 2 111 10 46 0 50 3 3 1 ...
 $ MntSweetProducts   : int 88 1 21 3 27 42 49 1 3 1 ...
 $ MntGoldProds       : int 88 6 42 5 15 14 27 23 2 13 ...
 $ NumDealsPurchases  : int 3 2 1 2 5 2 4 2 1 1 ...
 $ NumWebPurchases    : int 8 1 8 2 5 6 7 4 3 1 ...
 $ NumCatalogPurchases: int 10 1 2 0 3 4 3 0 0 0 ...
 $ NumStorePurchases  : int 4 2 10 4 6 10 7 4 2 0 ...
 $ NumWebVisitsMonth  : int 7 5 4 6 5 6 6 8 9 20 ...
 $ AcceptedCmp3       : int 0 0 0 0 0 0 0 0 0 1 ...
 $ AcceptedCmp4       : int 0 0 0 0 0 0 0 0 0 0 ...
 $ AcceptedCmp5       : int 0 0 0 0 0 0 0 0 0 0 ...
 $ AcceptedCmp1       : int 0 0 0 0 0 0 0 0 0 0 ...
 $ AcceptedCmp2       : int 0 0 0 0 0 0 0 0 0 0 ...
 $ Complain           : int 0 0 0 0 0 0 0 0 0 0 ...
 $ Z_CostContact      : int 3 3 3 3 3 3 3 3 3 3 ...
 $ Z_Revenue          : int 11 11 11 11 11 11 11 11 11 11 ...
 $ Response           : int 1 0 0 0 0 0 0 0 1 0 ...
 - attr(*, "na.action")= 'omit' Named int [1:24] 11 28 44 49 59 72 91 92 93 129 ...
  ..- attr(*, "names")= chr [1:24] "11" "28" "44" "49" ...
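
sum(is.na(data)) reports 24 missing values in total but not which columns hold them. The following sketch, which uses a few invented rows in place of marketing_campaign2.csv, shows how the count per column could be inspected before calling na.omit(). The names csv_text, toy, na_per_column and toy_clean are illustrative only.

```r
# Invented rows standing in for marketing_campaign2.csv (semicolon-separated).
csv_text <- "ID;Year_Birth;Income
1;1957;58138
2;1954;
3;1965;71613"

toy <- read.csv(text = csv_text, header = TRUE, sep = ";", dec = ",")

# Missing values per column, instead of one overall total.
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Drop incomplete rows, as the session does with na.omit().
toy_clean <- na.omit(toy)
```

Knowing the affected column makes it easier to decide whether dropping rows or imputing a value is the better option.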
> data$age<-2015-data$Year_Birth
> data$Education[data$Education =="2n Cycle"]="UG"
> data$Education[data$Education =="Basic"]="UG"
> data$Education[data$Education =="Graduation"]="pG"
> data$Education[data$Education =="Master"]="PG"
> data$Education[data$Education =="PhD"]="UG"
> View(data)
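
In the recoding above, "Graduation" becomes "pG" and "Master" becomes "PG". R treats these as different strings, so Education ends up with three levels, which is consistent with the Education codes 1 to 3 that appear later in this session. "PhD" is also sent to "UG". A sketch of a two-level recoding through a lookup vector follows; the mapping itself (Graduation, Master and PhD as "PG", the rest as "UG") is an assumption about the intent, and education_map is a hypothetical name.

```r
# Assumed mapping: undergraduate-or-below vs. graduate-and-above.
education_map <- c(
  "2n Cycle"   = "UG",
  "Basic"      = "UG",
  "Graduation" = "PG",
  "Master"     = "PG",
  "PhD"        = "PG"
)

# Example values covering the five categories named in the transcript.
education_raw <- c("Graduation", "PhD", "Master", "Basic", "2n Cycle")

# Look up each raw value; unname() drops the lookup names from the result.
education_grouped <- unname(education_map[education_raw])
print(table(education_grouped))
```

Keeping the mapping in one place avoids the case mismatch that separate assignments make easy to introduce.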
> data$Marital_Status[data$Marital_Status== "Divorced"]= "Single"
> data$Marital_Status[data$Marital_Status== "Absurd"]= "Single"
> data$Marital_Status[data$Marital_Status== "YOLO"]= "Single"
> data$Marital_Status[data$Marital_Status== "Widow"]= "Single"
> data$Marital_Status[data$Marital_Status== "Together"]= "Single"
> data$Marital_Status[data$Marital_Status== "Married"]= "Single"
> data$Marital_Status[data$Marital_Status== "Alone"]= "Single"
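
Every status above, including "Together" and "Married", is mapped to "Single", so the column ends up with a single value. This matches the constant Marital_Status of 1 in the cluster centers printed later, where the variable cannot contribute to the segmentation. If the intent was to separate people living alone from couples (an assumption, and the label "Couple" does not appear in the original), the step could be sketched as:

```r
# The eight status values handled in the transcript.
marital_raw <- c("Single", "Divorced", "Widow", "Alone",
                 "Absurd", "YOLO", "Together", "Married")

# Assumed grouping: these two values describe a couple, everything else is single.
couple_labels <- c("Together", "Married")

marital_grouped <- ifelse(marital_raw %in% couple_labels, "Couple", "Single")
print(table(marital_grouped))
```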
> data$Customer_year <- str_sub(data$Dt_Customer,-4)
> data$Customer_year <- as.numeric(data$Customer_year)
> data$Customer_Seniority <- 2015 - data$Customer_year
> View(data)
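
str_sub(data$Dt_Customer, -4) works here because every date ends in a four-digit year. An alternative sketch parses the strings as dates, assuming the day/month/year order suggested by values such as "21/8/2013". The reference year 2015 is the one used in the session; the variable names are illustrative.

```r
# The four enrolment dates shown by str(data).
dt_customer <- c("4/9/2012", "8/3/2014", "21/8/2013", "10/2/2014")

# Parse as day/month/year (assumed order).
enrol_date <- as.Date(dt_customer, format = "%d/%m/%Y")

# Extract the year and compute seniority relative to 2015.
customer_year <- as.integer(format(enrol_date, "%Y"))
customer_seniority <- 2015 - customer_year
print(customer_seniority)
```

Parsing full dates also makes it possible to measure seniority in months or days, should whole years turn out to be too coarse.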
> data$child<-data$Kidhome+data$Teenhome
> View(data)
> data$Ant_Spent<-
data$MntWines+data$MntFruits+data$MntMeatProducts+data$MntGoldProds
> View(data)
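
The sum above adds four of the six Mnt columns; MntFishProducts and MntSweetProducts are left out. If total spending across all product categories is what is wanted (an assumption), the columns could be selected by prefix rather than typed one by one. The sketch uses the first two rows shown by str(data); Total_Spent is a hypothetical column name.

```r
# First two customers from the str(data) output above.
purchases <- data.frame(
  MntWines         = c(635, 11),
  MntFruits        = c(88, 1),
  MntMeatProducts  = c(546, 6),
  MntFishProducts  = c(172, 2),
  MntSweetProducts = c(88, 1),
  MntGoldProds     = c(88, 6)
)

# Pick every column whose name starts with "Mnt" and add them row by row.
mnt_cols <- grep("^Mnt", names(purchases), value = TRUE)
purchases$Total_Spent <- unname(rowSums(purchases[mnt_cols]))
print(purchases$Total_Spent)
```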
> data$Num_purchases_made<-
data$NumWebPurchases+data$NumCatalogPurchases+data$NumStorePurchases
> View(data)
> typeof(data$Income)
[1] "integer"
> data<-
data[c(1,30,3,4,5,33,32,9,34,35,16,20,10,11,12,13,14,15,17,18,19)]
> data2<-na.omit(data2)
Error: object 'data2' not found
> data2<-data
> data2$Education <- unclass(as.factor(data2$Education))
> data2$Marital_Status <- unclass(as.factor(data2$Marital_Status))
> data2$Education <- as.numeric(data2$Education)
> data2$Marital_Status <- as.numeric(data2$Marital_Status)
> View(data2)
> set.seed(478)
> k <-kmeans(data2[,-c(1)],center = 4, iter.max = 3000)
> k$centers
       age Education Marital_Status    Income     child Customer_Seniority  Recency
1 47.70629  1.780420              1  75544.14 0.4699301           1.945455 49.05874
2 48.88058  1.854331              1  51691.11 1.2335958           2.003937 49.60367
3 42.50000  2.125000              1 221604.50 0.6250000           1.875000 48.62500
4 41.91108  1.844049              1  28186.57 1.1190150           1.964432 48.35568
   Ant_Spent Num_purchases_made NumDealsPurchases NumWebVisitsMonth  MntWines MntFruits
1 1137.70070          19.339860          1.661538          3.299301 623.20839 56.720280
2  427.33071          12.741470          3.098425          5.729659 273.49344 17.695538
3  656.62500          11.125000          4.250000          1.125000  26.50000  4.500000
4   78.85499           5.746922          2.142271          6.912449  29.92476  5.923393
  MntMeatProducts MntFishProducts MntSweetProducts MntGoldProds NumWebPurchases
1       386.50210       81.551049        59.458741     71.26993        5.527273
2        92.06824       24.160105        17.040682     44.07349        4.628609
3       621.87500        4.250000         1.250000      3.75000        0.500000
4        25.42134        9.099863         6.002736     17.58550        2.147743
  NumCatalogPurchases NumStorePurchases
1           5.3216783          8.490909
2           2.1666667          5.946194
3           9.8750000          0.750000
4           0.5253078          3.073871
> table(k$cluster)

1 2 3 4
715 762 8 731
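
The call above runs k-means on unscaled columns, so Income, whose values are in the tens of thousands, dominates the Euclidean distance. The centers differ mostly in Income, and the cluster with only 8 customers has a mean income of about 221,604. A sketch of the same step on standardized columns follows. The toy data frame is simulated rather than taken from the dataset, and nstart = 25 is an added choice that the session did not use.

```r
set.seed(478)

# Simulated stand-in for two columns of data2 (values are not from the dataset).
toy <- data.frame(
  Income  = c(rnorm(50, mean = 30000, sd = 3000),
              rnorm(50, mean = 80000, sd = 3000)),
  Recency = runif(100, min = 0, max = 99)
)

# Standardize each column to mean 0 and standard deviation 1.
toy_scaled <- scale(toy)

# Several random starts reduce the risk of ending in a poor local optimum.
k_scaled <- kmeans(toy_scaled, centers = 2, nstart = 25, iter.max = 100)

# Convert the centers back to the original units for interpretation.
centers_original <- sweep(k_scaled$centers, 2, attr(toy_scaled, "scaled:scale"), "*")
centers_original <- sweep(centers_original, 2, attr(toy_scaled, "scaled:center"), "+")
print(centers_original)
```

Reporting the centers in original units keeps the interpretation (income in currency, recency in days) readable even though the algorithm worked on standardized values.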
> ggplot(aes(y= Ant_Spent, x = income), data= data2) +
geom_point(aes(color = k$cluster))
Error in `geom_point()`:
! Problem while computing aesthetics.
ℹ Error occurred in the 1st layer.
Caused by error:
! object 'income' not found
Run `rlang::last_trace()` to see where the error occurred.
> ggplot(aes(y = Age, x = Income), data = data2) + geom_point(aes(color =
k$cluster))
Error in `geom_point()`:
! Problem while computing aesthetics.
ℹ Error occurred in the 1st layer.
Caused by error:
! object 'Age' not found
Run `rlang::last_trace()` to see where the error occurred.
> ggplot(aes(y = Age, x = Income), data2 = data2) + geom_point(aes(color
= k$cluster))
Error in `fortify()`:
! `data` must be a <data.frame>, or an object coercible by `fortify()`,
or a valid
<data.frame>-like object coercible by `as.data.frame()`, not a <uneval>
object.
ℹ Did you accidentally pass `aes()` to the `data` argument?
Run `rlang::last_trace()` to see where the error occurred.
> ggplot(aes(y = Ant_Spent, x = Income), data = data2) +
geom_point(aes(color = k$cluster))
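
The plotting errors above have two causes: column names are case-sensitive (Income and age exist in data2, income and Age do not), and the data frame has to be passed through the data argument. A hedged sketch of the working call follows, with the cluster converted to a factor so that ggplot2 uses a discrete color scale (an added choice). The Income and Ant_Spent values are those of four customers printed later in this session; the cluster labels are made up, and the plot is only built if ggplot2 is installed.

```r
plot_data <- data.frame(
  Income    = c(82800, 76995, 84618, 68657),
  Ant_Spent = c(1188, 1766, 1585, 1009)
)
cluster_id <- c(1, 2, 2, 1)  # made-up labels standing in for k$cluster

# Check the column names before plotting; names are case-sensitive.
wanted <- c("Income", "Ant_Spent")
missing_cols <- setdiff(wanted, names(plot_data))
if (length(missing_cols) > 0) {
  stop("Columns not found: ", paste(missing_cols, collapse = ", "))
}

# A factor gives one color per cluster instead of a continuous gradient.
plot_data$cluster <- factor(cluster_id)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(plot_data,
                       ggplot2::aes(x = Income, y = Ant_Spent, color = cluster)) +
    ggplot2::geom_point()
}
```

Storing the cluster inside the data frame, rather than referring to k$cluster from inside aes(), also keeps rows and labels aligned if the data frame is filtered later.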
> g_caja<-boxplot(data2$Income, col ="skyblue", frame.plot=F)
> g_caja$out
[1] 157243 162397 153924 160803 157733 157146 156924 666666
> data2<-data2[!(data2$Income %in% g_caja$out),]
> data2<-data2[!(data2$Income %in% g_caja$out),]
> g_caja$out
[1] 157243 162397 153924 160803 157733 157146 156924 666666
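
g_caja$out prints the same eight values after the filter because it is a stored result of the earlier boxplot() call, not a fresh computation on the filtered data. The outliers can also be obtained without drawing anything through boxplot.stats(), which applies the same 1.5 × IQR rule. The sketch uses the first ten Income values from str(data) plus the 666666 value reported above; the variable names are illustrative.

```r
# Ten incomes from the str(data) output, plus the extreme value flagged above.
income <- c(58138, 46344, 71613, 26646, 58293, 62513,
            55635, 33454, 30351, 5648, 666666)

# Values beyond 1.5 * IQR from the hinges, as in a default boxplot.
outliers <- boxplot.stats(income)$out
print(outliers)

# Keep only the observations that are not flagged.
income_clean <- income[!(income %in% outliers)]

# Recomputing on the cleaned vector shows whether anything is still flagged.
remaining <- boxplot.stats(income_clean)$out
```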
> data2<-data2[data2$MntWines %in% g_caja$out).]
Error: unexpected ')' in "data2<-data2[data2$MntWines %in% g_caja$out)"
> umbral_antiguedad<-1
> umbral_ingresos <-68000
> umbral_gasto<-1000
> clientes_segmentados_1 <-data2 %>%
+ Filters(Customer_Seniority>umbral_antiguedad & Income >
umbral_ingresos & Ant_Spent > umbral_gasto)
Error in Filters(., Customer_Seniority > umbral_antiguedad & Income > :
could not find function "Filters"
> head(clientes_segmentados_1)
Error: object 'clientes_segmentados_1' not found
> # filter long-standing customers with high income and high spending
> clientes_segmentados_1<-data2 %>%
+ filter(Customer_Seniority < umbral_antiguedad & Income <
umbral_ingresos & Ant_Spent < umbral_gasto)
> head(clientes_segmentados_1)
[1] ID age Education
Marital_Status
[5] Income child Customer_Seniority Recency
[9] Ant_Spent Num_purchases_made NumDealsPurchases
NumWebVisitsMonth
[13] MntWines MntFruits MntMeatProducts
MntFishProducts
[17] MntSweetProducts MntGoldProds NumWebPurchases
NumCatalogPurchases
[21] NumStorePurchases
<0 rows> (or 0-length row.names)
> # filter long-standing customers with high income and high spending
> clientes_segmentados_1<-data2 %>%
+ filter(Customer_Seniority > umbral_antiguedad & Income >
umbral_ingresos & Ant_Spent > umbral_gasto)
> head(clientes_segmentados_1)
    ID age Education Marital_Status Income child Customer_Seniority Recency Ant_Spent
1 2114  69         3              1  82800     0                  3      23      1188
2 6565  66         2              1  76995     1                  2      91      1766
3 1966  50         3              1  84618     0                  2      96      1585
4 8755  69         2              1  68657     0                  2       4      1009
5 8601  35         1              1  80011     1                  2       3      1135
6 6566  61         3              1  72550     2                  3      39      1231
  Num_purchases_made NumDealsPurchases NumWebVisitsMonth MntWines MntFruits MntMeatProducts
1                 25                 1                 3     1006        22             115
2                 24                 2                 5     1012        80             498
3                 25                 1                 2      684       100             801
4                 17                 1                 7      482        34             471
5                 19                 2                 4      421        76             536
6                 19                 9                 8      826        50             317
  MntFishProducts MntSweetProducts MntGoldProds NumWebPurchases NumCatalogPurchases
1              59               68           45               7                   6
2               0               16          176              11                   4
3              21               66            0               6                   9
4             119               68           22               3                   5
5              82              178          102               8                   6
6              50               38           38               5                   2
  NumStorePurchases
1                12
2                 9
3                10
4                 9
5                 5
6                12
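
The filter above isolates only the first of the four segments requested in the guidelines. A sketch of labelling all four in one pass follows, using the three thresholds defined in the session. It is written in base R so that it runs without extra packages; the first two rows reuse customers from the output above, the remaining rows and the column name segment are hypothetical.

```r
# Thresholds as defined in the session.
umbral_antiguedad <- 1
umbral_ingresos   <- 68000
umbral_gasto      <- 1000

clientes <- data.frame(
  ID                 = c(2114, 6565, 9001, 9002, 9003),  # 9001-9003 are invented
  Customer_Seniority = c(3, 2, 1, 3, 1),
  Income             = c(82800, 76995, 30000, 25000, 90000),
  Ant_Spent          = c(1188, 1766, 50, 80, 1500)
)

is_old      <- clientes$Customer_Seniority > umbral_antiguedad
high_income <- clientes$Income > umbral_ingresos
high_spend  <- clientes$Ant_Spent > umbral_gasto

# Customers that match none of the four profiles keep the label "other".
clientes$segment <- "other"
clientes$segment[is_old & high_income & high_spend]    <- "old / high income / high spend"
clientes$segment[!is_old & !high_income & !high_spend] <- "new / low income / low spend"
clientes$segment[!is_old & high_income & high_spend]   <- "new / high income / high spend"
clientes$segment[is_old & !high_income & !high_spend]  <- "old / low income / low spend"

print(table(clientes$segment))
```

The four profiles do not cover every combination (for example, high income with low spending), which is why an explicit "other" label is kept.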
> kmeans_reslt <- kmeans (caraacteristicas_escala, centers =
num_clusters)
Error: object 'caraacteristicas_escala' not found
> set.seed(123)
> num_clusters <- 3
> kmeans_reslt <- kmeans (caraacteristicas_escala, centers =
num_clusters)
Error: object 'caraacteristicas_escala' not found
> # filter long-standing customers with high income and high spending
> clientes_segmentados_1<-data2 %>%
+ filter(Customer_Seniority > umbral_antiguedad & Income >
umbral_ingresos & Ant_Spent > umbral_gasto)
> caracteristicas_segmentacion <- clientes_segmentados_1 %>%
+ select(Income, Ant_Spent)
> )
Error: unexpected ')' in ")"
> num_clusters <- 3
> kmeans_reslt <- kmeans(caraacteristicas_escala, centers = num_clusters)
Error: object 'caraacteristicas_escala' not found
> source("~/.active-rstudio-document")
Error in source("~/.active-rstudio-document") :
~/.active-rstudio-document:165:0: unexpected end of input
163:
164:
^
> set.seed(123)
> num_clusters <- 3
> kmeans_reslt <- kmeans(caraacteristicas_escala, centers = num_clusters)
Error: object 'caraacteristicas_escala' not found
> set.seed(123)
> num_clusters <- 3
> kmeans_reslt <- kmeans(caraacteristicas_escala, centers = num_clusters)
Error: object 'caraacteristicas_escala' not found
> set.seed(123)
> num_clusters <- 3
> kmeans_reslt <- kmeans(caracteristicas_escala, centers = num_clusters)
Error: object 'caracteristicas_escala' not found
> # scale the features to avoid bias caused by differences in scale
> caracteristicas_escala <- scale(caracteristicas_segmentacion)
> # apply the k-means algorithm for the segmentation
> set.seed(123)
> num_clusters <- 3
> kmeans_reslt <- kmeans(caracteristicas_escala, centers = num_clusters)
> clientes_segmentados_con_cluster <-cbind(clientes_segmentados_1,
cluster, aes(x = income, y =Amt_Spent, color = cluster))
Error: object 'cluster' not found
> clientes_segmentados_con_cluster <-cbind(clientes_segmentados_1,
cluster, as.factor(kmeans_reslt$cluster)
+ # add the cluster assignment to the original dataset
+ clientes_segmentados_con_cluster <-cbind(clientes_segmentados_1,
cluster, as.factor(kmeans_reslt$cluster)
Error: unexpected symbol in:
"# add the cluster assignment to the original dataset
clientes_segmentados_con_cluster"
> # add the cluster assignment to the original dataset
> clientes_segmentados_con_cluster <-cbind(clientes_segmentados_1,
cluster, as.factor(kmeans_reslt$cluster)
+
+ keams_result$centers
Error: unexpected symbol in:
"
keams_result"
> # add the cluster assignment to the original dataset
> clientes_segmentados_con_cluster <-cbind(clientes_segmentados_1,
cluster, as.factor(kmeans_reslt$cluster)
+
+ kmeams_result$centers
Error: unexpected symbol in:
"
kmeams_result"
> # add the cluster assignment to the original dataset
> clientes_segmentados_con_cluster <-cbind(clientes_segmentados_1,
cluster, as.factor(kmeans_reslt$cluster)
+
+ kmeans_result$centers
Error: unexpected symbol in:
"
kmeans_result"
> # filter long-standing customers with high income and high spending
> clientes_segmentados_1<-data2 %>%
+ filter(Customer_Seniority > umbral_antiguedad & Income >
umbral_ingresos & Ant_Spent > umbral_gasto)
> # features of the segmented customers for k-means
> caracteristicas_segmentacion <- clientes_segmentados_1 %>%
+ select(Income, Ant_Spent)
> # scale the features to avoid bias caused by differences in scale
> caracteristicas_escala <- scale(caracteristicas_segmentacion)
> # apply the k-means algorithm for the segmentation
> set.seed(123)
> num_clusters <- 3
> kmeans_reslt <- kmeans(caracteristicas_escala, centers = num_clusters)
> # add the cluster assignment to the original dataset
> clientes_segmentados_con_cluster <-cbind(clientes_segmentados_1,
cluster, as.factor(kmeans_reslt$cluster)
+
+ kmeans_result$centers
Error: unexpected symbol in:
"
kmeans_result"
> gplot(clientes_segmentados_con_cluster, es(x = income, y =Amt_Spent,
color = cluster))
Error in gplot(clientes_segmentados_con_cluster, es(x = income, y =
Amt_Spent, :
could not find function "gplot"
> ggplot(clientes_segmentados_con_cluster, es(x = income, y =Amt_Spent,
color = cluster))
Error: object 'clientes_segmentados_con_cluster' not found
> ggplot(clientes_segmentados_con_cluster, aes(x = income, y =Amt_Spent,
color = cluster))+
+ geom_point() +
+ labs(x = "Income", y = "Amt_Spent", color = "cluster")+
+ ggtitle("segmentacion de clientes con k-means")+
+ theme_minimal()
Error: object 'clientes_segmentados_con_cluster' not found
> kmeans_result$centers
Error: object 'kmeans_result' not found
> # add the cluster assignment to the original dataset
> clientes_segmentados_con_cluster <-cbind(clientes_segmentados_1,
cluster, as.factor(kmeans_reslt$cluster)
+ kmeans_result$centers
Error: unexpected symbol in:
"clientes_segmentados_con_cluster <-cbind(clientes_segmentados_1,
cluster, as.factor(kmeans_reslt$cluster)
kmeans_result"
> set.seed(478)
> k <-kmeans(data2[,-c(1)],center = 4, iter.max = 3000)
> k$centers
       age Education Marital_Status   Income     child Customer_Seniority  Recency  Ant_Spent
1 49.20746  1.875203              1 60901.91 1.0000000           2.025932 48.60130  717.37763
2 47.40816  1.730612              1 79552.78 0.3387755           1.920408 49.74286 1232.20612
3 40.45796  1.842920              1 23218.81 0.9889381           2.006637 48.70133   59.84292
4 46.40370  1.842835              1 41532.87 1.3312789           1.935285 49.07396  189.94299
  Num_purchases_made NumDealsPurchases NumWebVisitsMonth  MntWines MntFruits MntMeatProducts
1          16.836305          2.933549          4.920583 450.02431 31.956240       175.89465
2          19.502041          1.340816          2.846939 650.00408 63.253061       444.68163
3           5.296460          2.035398          7.088496  17.11726  5.537611        20.87832
4           8.320493          2.662558          6.383667 110.88906  7.942989        45.03852
  MntFishProducts MntSweetProducts MntGoldProds NumWebPurchases NumCatalogPurchases
1        43.15559        30.235008     59.50243        5.635332           3.4424635
2        90.91837        68.010204     74.26735        5.332653           5.7653061
3         8.04646         5.747788     16.30973        1.951327           0.4557522
4        13.18490         8.178737     26.07242        3.200308           1.0554700
  NumStorePurchases
1          7.758509
2          8.404082
3          2.889381
4          4.064715
> table(k$cluster)

1 2 3 4
617 490 452 649
> ggplot(aes(y = Ant_Spent, x = Income), data = data2) +
geom_point(aes(color = k$cluster))
> g_caja<-boxplot(data2$Income, col ="skyblue", frame.plot=F)
> g_caja$out
numeric(0)
> data2<-data2[!(data2$Income %in% g_caja$out),]
> summary(data$Income)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1730 35303 51382 52247 68522 666666
> umbral_antiguedad<-1
> umbral_ingresos <-68000
> umbral_gasto<-1000
> #
> summary(data$Ant_Spent)
Min. 1st Qu. Median Mean 3rd Qu. Max.
3.0 57.0 349.0 542.4 928.5 2306.0
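
The thresholds are typed in by hand. The guidelines ask for income "below average", and the summary above gives a mean of 52247, while the chosen 68000 sits close to the third quartile (68522). Note also that summary(data$Income) is computed on data, which still holds the 666666 value that was removed from data2. A sketch of deriving candidate thresholds from the data itself follows; it uses the first ten Income values from str(data), and the variable names are illustrative.

```r
# First ten incomes shown by str(data).
income <- c(58138, 46344, 71613, 26646, 58293,
            62513, 55635, 33454, 30351, 5648)

# "Below average" read literally: compare against the mean.
income_threshold_mean <- mean(income)

# A stricter "high income" cut-off: the third quartile.
income_threshold_q3 <- unname(quantile(income, probs = 0.75))

below_average <- income < income_threshold_mean
print(c(mean = income_threshold_mean, q3 = income_threshold_q3))
print(sum(below_average))
```

Whichever definition is used, computing it on the same cleaned data frame that is later filtered keeps the thresholds and the segments consistent.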
> clientes_segmentados_1<-data2 %>%
+ filter(Customer_Seniority > umbral_antiguedad & Income >
umbral_ingresos & Ant_Spent > umbral_gasto)
> # segmented customers
> head(clientes_segmentados_1)
    ID age Education Marital_Status Income child Customer_Seniority Recency Ant_Spent
1 2114  69         3              1  82800     0                  3      23      1188
2 6565  66         2              1  76995     1                  2      91      1766
3 1966  50         3              1  84618     0                  2      96      1585
4 8755  69         2              1  68657     0                  2       4      1009
5 8601  35         1              1  80011     1                  2       3      1135
6 6566  61         3              1  72550     2                  3      39      1231
  Num_purchases_made NumDealsPurchases NumWebVisitsMonth MntWines MntFruits MntMeatProducts
1                 25                 1                 3     1006        22             115
2                 24                 2                 5     1012        80             498
3                 25                 1                 2      684       100             801
4                 17                 1                 7      482        34             471
5                 19                 2                 4      421        76             536
6                 19                 9                 8      826        50             317
  MntFishProducts MntSweetProducts MntGoldProds NumWebPurchases NumCatalogPurchases
1              59               68           45               7                   6
2               0               16          176              11                   4
3              21               66            0               6                   9
4             119               68           22               3                   5
5              82              178          102               8                   6
6              50               38           38               5                   2
  NumStorePurchases
1                12
2                 9
3                10
4                 9
5                 5
6                12
> caracteristicas_segmentacion <- clientes_segmentados_1 %>%
+ select(Income, Ant_Spent)
> # scale the features to avoid bias caused by differences in scale
> caracteristicas_escala <- scale(caracteristicas_segmentacion)
> set.seed(123)
> num_clusters <- 3
> kmeans_reslt <- kmeans(caracteristicas_escala, centers = num_clusters)
> # add the cluster assignment to the original dataset
> clientes_segmentados_con_cluster <-cbind(clientes_segmentados_1,
cluster, as.factor(kmeans_reslt$cluster)
+ kmeans_result$centers
Error: unexpected symbol in:
"clientes_segmentados_con_cluster <-cbind(clientes_segmentados_1,
cluster, as.factor(kmeans_reslt$cluster)
kmeans_result"
> ggplot(clientes_segmentados_con_cluster, aes(x = income, y =Amt_Spent,
color = cluster))+
+ geom_point() +
+ labs(x = "Income", y = "Amt_Spent", color = "cluster")+
+ ggtitle("segmentacion de clientes con k-means")+
+ theme_minimal()
Error: objeto 'clientes_segmentados_con_cluster' no encontrado
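Because k-means is run on scaled features, the centers it returns are in standard-deviation units rather than in income or spending units. A minimal sketch of how they could be mapped back to the original scale, using the `scaled:center` and `scaled:scale` attributes that `scale()` stores. The data frame `caracteristicas` is simulated toy data, and `nstart = 25` is an added assumption that does not appear in the session:

```r
# Toy stand-in for the selected features (simulated values, not the real data).
set.seed(123)
caracteristicas <- data.frame(
  Income    = c(rnorm(20, 70000, 1000), rnorm(20, 80000, 1000), rnorm(20, 90000, 1000)),
  Ant_Spent = c(rnorm(20, 1100, 30), rnorm(20, 1500, 30), rnorm(20, 2000, 30))
)

# Standardise each column, then cluster on the standardised values.
escaladas <- scale(caracteristicas)
km <- kmeans(escaladas, centers = 3, nstart = 25)

# Undo the scaling: multiply by the column SDs, then add the column means.
centros_originales <- sweep(km$centers, 2, attr(escaladas, "scaled:scale"), FUN = "*")
centros_originales <- sweep(centros_originales, 2, attr(escaladas, "scaled:center"), FUN = "+")

centros_originales
```

Each row of `centros_originales` is then the mean income and mean spending of one cluster, which is easier to interpret in a written analysis than the scaled values.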
