Project
The aim of this study is to gain a better understanding of tea consumption by examining the role of consumer sustainability awareness in tea purchase decisions, both when tea is purchased as a gift and when it is purchased for household self-use.
library(dplyr)
library(tidyr)
library(factoextra)
library(janitor)
library(DataExplorer)
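# Read the SPSS file; keep the eight categorical variables (columns 1-8)
# and the five factor-score variables (columns 30-34)
tea <- haven::read_sav("data set tea consumption-shared data.sav")
tea <- tea %>% dplyr::select(1:8, 30:34)
tea <- janitor::clean_names(tea)
# Treat the eight coded variables as categorical
cols <- colnames(tea)[1:8]
tea[cols] <- lapply(tea[cols], factor)
# One-hot encode the factors so that every column is numeric
tea_dum <- DataExplorer::dummify(tea)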
EDA
Missing values
plot_intro(tea)
skimr::skim(tea)
Name | tea |
Number of rows | 280 |
Number of columns | 13 |
_______________________ | |
Column type frequency: | |
factor | 8 |
numeric | 5 |
________________________ | |
Group variables | None |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
purchase_purpose | 0 | 1 | FALSE | 2 | 1: 156, 2: 124 |
sex | 0 | 1 | FALSE | 2 | 2: 144, 1: 136 |
hometown | 0 | 1 | FALSE | 5 | 3: 106, 2: 98, 1: 46, 5: 25 |
age | 0 | 1 | FALSE | 6 | 3: 94, 2: 91, 1: 56, 4: 26 |
ethnic | 0 | 1 | FALSE | 2 | 1: 266, 2: 14 |
education | 0 | 1 | FALSE | 5 | 3: 148, 1: 70, 4: 43, 5: 11 |
job | 0 | 1 | FALSE | 9 | 4: 76, 2: 51, 3: 47, 9: 41 |
monthly_income | 0 | 1 | FALSE | 5 | 2: 102, 3: 82, 4: 39, 5: 31 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
fac1_sustain | 0 | 1 | 0 | 1 | -3.45 | -0.52 | 0.21 | 0.77 | 1.55 | ▁▂▃▇▇ |
fac2_brand | 0 | 1 | 0 | 1 | -2.80 | -0.76 | -0.04 | 0.81 | 1.82 | ▁▅▇▇▆ |
fac3_prg | 0 | 1 | 0 | 1 | -3.36 | -0.70 | 0.13 | 0.80 | 1.89 | ▁▂▆▇▅ |
fac4_fashion | 0 | 1 | 0 | 1 | -3.04 | -0.57 | 0.08 | 0.65 | 2.62 | ▁▃▇▆▁ |
fac5_conformty | 0 | 1 | 0 | 1 | -3.27 | -0.57 | 0.08 | 0.63 | 2.03 | ▁▂▅▇▃ |
Clustering
WSS plot
factoextra::fviz_nbclust(tea_dum, kmeans, method = "wss")
Judging from the WSS plot, 4 seems to be an acceptable number of clusters.
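The WSS curve can also be computed by hand, which makes it clearer what the plot shows. The following is a minimal sketch on toy data (the `toy` matrix is a hypothetical stand-in, not the survey data): it runs k-means for several values of k and records the total within-cluster sum of squares.

```r
# Toy numeric data with three well-separated groups
# (hypothetical stand-in for tea_dum, not the survey data)
set.seed(123)
toy <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 4), ncol = 2),
  matrix(rnorm(100, mean = 8), ncol = 2)
)

# Total within-cluster sum of squares for k = 1..8
ks <- 1:8
wss <- sapply(ks, function(k) {
  kmeans(toy, centers = k, nstart = 25, iter.max = 100)$tot.withinss
})

# The "elbow" is the k after which WSS stops dropping sharply
print(data.frame(k = ks, wss = round(wss, 1)))
```

On real data the elbow is often less clear-cut than in this toy case, so the choice of k remains a judgment call.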
K-means
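set.seed(123)
# kmeans() needs numeric input, so cluster the dummified data
# (the same data used for the WSS plot)
tea_kmeans <- kmeans(tea_dum, 4, nstart = 50, iter.max = 100)
tea_dum$Clusters <- factor(tea_kmeans$cluster)
tea$Clusters <- factor(tea_kmeans$cluster)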
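A numeric profile of the clusters can complement the plots. The sketch below uses simulated factor scores (the `toy` data frame is hypothetical and only borrows three column names from the data set) to show how cluster sizes and per-cluster means could be tabulated.

```r
set.seed(123)
# Hypothetical stand-in for three of the factor-score columns
toy <- data.frame(
  fac1_sustain = rnorm(120),
  fac2_brand   = rnorm(120),
  fac3_prg     = rnorm(120)
)

km <- kmeans(toy, centers = 4, nstart = 50, iter.max = 100)
toy$Clusters <- factor(km$cluster)

# Number of respondents in each cluster
print(table(toy$Clusters))

# Mean factor score per cluster; large positive or negative means
# indicate what characterises a cluster
profile <- aggregate(. ~ Clusters, data = toy, FUN = mean)
print(profile)
```

Applied to `tea`, the same `aggregate()` call would need to be restricted to the numeric factor-score columns plus `Clusters`.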
Plots
plot_bar(tea, by = "Clusters", order_bar = FALSE, by_position = "fill")
Correlation Heatmap
plot_correlation(tea_dum)
Purchase purpose 1 is related to pragmatism.
Purchase purpose 2 is related to brand and prestige chasing.
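Each cell of the heatmap is a correlation between two columns, for example a 0/1 dummy and a factor score. The sketch below illustrates this on simulated data with a built-in association; the variable names mirror the data set, but the values and the effect size of 0.8 are assumptions made for illustration only.

```r
set.seed(123)
n <- 200

# Simulated purchase purpose (coded 1 or 2) and a pragmatism score that is
# deliberately shifted upward for purpose 1 (assumed effect, not a result)
purpose  <- factor(sample(1:2, n, replace = TRUE))
fac3_prg <- rnorm(n) + ifelse(purpose == 1, 0.8, 0)

# 0/1 indicator of purpose 1, the kind of column dummify() creates
purpose_1 <- as.numeric(purpose == 1)

# Pearson correlation between the dummy and the score (point-biserial)
r <- cor(purpose_1, fac3_prg)
print(r)

# cor.test() adds a confidence interval and p-value
print(cor.test(purpose_1, fac3_prg))
```

A positive correlation here means that respondents with purpose 1 score higher on the factor on average; it says nothing about causation.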
Cluster Analysis
Cluster 1 - Novelty-seekers
These people tend to prefer distinctive products with a special appearance and enjoy novelty when deciding on a tea purchase. They may be more likely to have sensation-seeking personalities.
Cluster 2 - Non-eco-friendly People
There is little to say about this cluster other than that its members tend not to consider environmental sustainability in their tea purchase decisions. In other words, they are not particularly concerned about their carbon footprint or about being eco-friendly.
Cluster 3 - Gift-givers
They look for a well-known brand and prestige when deciding on tea purchases and do not care about the tea's pragmatic utility. They are also more likely to purchase tea as a gift for other people than for themselves. In short, these people tend to buy prestigious tea brands for others.
Cluster 4 - Utilitarians
They tend to choose a tea after considering its pragmatic utility, meaning they choose what is most needed and consider affordability. They are also more likely to purchase tea for themselves than as a gift for other people.
Data Coding
Gender
Male
Female
Place of birth
Northern China
Eastern China
Southern China
Western China
Central China
Age
18–24
25–34
35–44
45–54
55–64
65+
Ethnic group
Han
Minority group
Education background
Below high school graduate
High school degree
Bachelor’s degree
Master’s degree
PhD and above
Occupation
Civil servant
Diplomats
Professional (Educator, Engineering, IT, Doctor, Nurse, Lawyer, Consultant, Athletes)
General Business Clerk
Corporate Management
Artists (e.g. Producer, Actor, Director, Designer)
Self-employed
Farmer
Others
Household monthly income (RMB)
Up to 3499
3500–7499
7500–12499
12500–16499
More than 16500
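The coding above can be attached to the factors as labels so that plots and tables show category names instead of numbers. This is a minimal sketch on made-up codes; it assumes that the codes start at 1 and follow the order in which the categories are listed above.

```r
# Hypothetical coded responses (not taken from the survey)
sex_code    <- c(1, 2, 2, 1)
income_code <- c(2, 5, 1, 3)

# Assumption: code 1 = first listed category, code 2 = second, and so on
sex <- factor(sex_code, levels = 1:2, labels = c("Male", "Female"))

# Income bands have a natural order, so an ordered factor is used
monthly_income <- factor(
  income_code,
  levels = 1:5,
  labels = c("Up to 3499", "3500–7499", "7500–12499",
             "12500–16499", "More than 16500"),
  ordered = TRUE
)

print(table(sex))
print(monthly_income)
```

If the .sav file carries SPSS value labels, `haven::as_factor()` may be able to apply them directly, which avoids retyping the labels by hand.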