Research Paper
Abstract
This research aims to predict app prices on the Google Play Store from app
features using machine learning. The study uses PySpark, the Python API for
Apache Spark's distributed data processing engine, and applies three regression
models for prediction: Random Forest, Linear Regression, and Gradient-Boosted
Trees. The research also covers data preprocessing, model training, and
evaluation.
1 Introduction
The growing number of mobile applications on the Google Play Store presents
an opportunity to study the factors that influence app prices [1]. In this
study, we use machine learning models to predict app prices from key features:
rating, number of reviews, and number of installs.
3 Feature Engineering
We identify three key features, 'Rating', 'Reviews', and 'Installs', as predictors
of app price [3]. These features are assembled into a single feature vector that
serves as input to the machine learning models. The dataset is then split into
training and testing sets so that each model is evaluated on data it was not
trained on.
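The assembly and splitting steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's code: in PySpark these steps are usually performed with `pyspark.ml.feature.VectorAssembler` and `DataFrame.randomSplit`. The helper names `assemble_features` and `random_split`, the 80/20 ratio, the seed, and the toy records are all hypothetical; the paper does not report its split ratio.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Fixed feature order, matching the three predictors named in the text.
FEATURE_COLS = ("Rating", "Reviews", "Installs")


def assemble_features(rows: Sequence[Dict[str, float]],
                      cols: Sequence[str] = FEATURE_COLS) -> List[List[float]]:
    """Convert each record into a numeric feature vector with a fixed column order."""
    return [[float(row[c]) for c in cols] for row in rows]


def random_split(items: Sequence, train_frac: float = 0.8,
                 seed: int = 42) -> Tuple[list, list]:
    """Shuffle a copy of the items with a fixed seed, then cut into train and test parts."""
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must be strictly between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Hypothetical toy records, not data from the study.
    apps = [
        {"Rating": 4.5, "Reviews": 1200, "Installs": 50000, "Price": 0.99},
        {"Rating": 3.9, "Reviews": 300, "Installs": 10000, "Price": 2.99},
        {"Rating": 4.8, "Reviews": 15000, "Installs": 1000000, "Price": 0.0},
        {"Rating": 4.1, "Reviews": 800, "Installs": 50000, "Price": 4.99},
        {"Rating": 4.3, "Reviews": 2500, "Installs": 100000, "Price": 1.99},
    ]
    features = assemble_features(apps)
    prices = [app["Price"] for app in apps]
    # Split (feature vector, price) pairs together so labels stay aligned.
    train, test = random_split(list(zip(features, prices)), train_frac=0.8, seed=42)
    print(len(train), "training rows,", len(test), "test rows")
```

Splitting the (features, price) pairs together keeps each label attached to its vector, and fixing the seed makes the split reproducible. Unlike Spark's `randomSplit`, which assigns rows randomly and so yields only approximately the requested proportions, this shuffle-and-cut version produces exact split sizes.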
4 Model Training and Evaluation
Three regression models, Random Forest, Linear Regression, and Gradient-Boosted
Trees, are trained on the training set [4]. The models are evaluated on the test
set using the Root Mean Squared Error (RMSE), which expresses the typical size of
a prediction error in the same units as the price; a lower RMSE indicates better
predictive performance.
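For n test examples, RMSE = sqrt((1/n) * Σ_i (y_i - ŷ_i)^2), where y_i is an actual price and ŷ_i the corresponding prediction. The sketch below computes it in plain Python and selects the model with the lowest test RMSE. In PySpark the same metric is available through `pyspark.ml.evaluation.RegressionEvaluator` with `metricName="rmse"`. The helper names, the model labels `model_a` and `model_b`, and all prices and predictions are hypothetical placeholders, not results from the study.

```python
import math
from typing import Dict, Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Squared Error between two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def best_model(actual: Sequence[float],
               predictions: Dict[str, Sequence[float]]) -> str:
    """Return the name of the model whose predictions have the lowest RMSE."""
    return min(predictions, key=lambda name: rmse(actual, predictions[name]))


if __name__ == "__main__":
    # Hypothetical test-set prices and model outputs, for illustration only.
    actual = [0.99, 2.99, 4.99, 0.0]
    predictions = {
        "model_a": [1.10, 2.70, 4.60, 0.20],
        "model_b": [1.50, 2.20, 3.90, 0.60],
    }
    for name, preds in predictions.items():
        print(f"{name}: RMSE = {rmse(actual, preds):.3f}")
    print("Lowest RMSE:", best_model(actual, predictions))
```

Because errors are squared before averaging, RMSE penalizes a few large mispredictions more heavily than many small ones, which matters when a handful of apps carry unusually high prices.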
5 Predictive Analysis
We demonstrate the application of a trained model by predicting the price of
a new app from specified feature values. This illustrates how the models can
be used for real-world prediction and decision-making.
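To illustrate this workflow, the sketch below fits the simplest of the three models, ordinary least-squares Linear Regression, in plain Python by solving the normal equations (XᵀX)β = Xᵀy, and then predicts the price of a new app from its feature vector. It is not the study's implementation (PySpark provides `pyspark.ml.regression.LinearRegression` for this). The helper names, the feature scaling (reviews in thousands, installs in millions), the training rows, and the new app's values are hypothetical; the toy prices are generated from a known linear rule so the fitted coefficients can be checked.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear_regression(X: Sequence[Sequence[float]],
                          y: Sequence[float]) -> List[float]:
    """Ordinary least squares with an intercept; returns [intercept, w1, ..., wk]."""
    rows = [[1.0] + list(x) for x in X]  # constant column for the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef: Sequence[float], x: Sequence[float]) -> float:
    """Predicted price for one feature vector in the training feature order."""
    return coef[0] + sum(w * v for w, v in zip(coef[1:], x))


if __name__ == "__main__":
    # Hypothetical rows: [Rating, Reviews (thousands), Installs (millions)].
    X = [[4.5, 1.2, 0.05], [3.8, 0.4, 0.01], [4.1, 2.5, 0.10],
         [4.8, 5.0, 0.50], [3.2, 0.1, 0.005], [4.0, 3.3, 1.00]]
    # Toy prices from a known rule, so the fit should recover these weights.
    y = [0.5 + 0.4 * r + 0.1 * v - 0.2 * i for r, v, i in X]
    coef = fit_linear_regression(X, y)
    new_app = [4.3, 2.0, 0.2]  # hypothetical new app
    print("coefficients:", [round(c, 4) for c in coef])
    print("predicted price:", round(predict(coef, new_app), 2))
```

The new app's features must be assembled in the same order and scale as the training vectors; otherwise each weight is applied to the wrong feature.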
7 Visualization of Predictions
To aid interpretation, the models' predictions are visualized in a bar chart [6]
that places actual and predicted prices side by side, giving an intuitive view of
model performance.
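The study's chart is not reproduced here. As a dependency-free stand-in for a plotting library such as matplotlib, the sketch below renders the same actual-versus-predicted comparison as a text bar chart with one pair of bars per app. This text rendering is a swapped-in technique, not the paper's figure; the function name `text_bar_chart`, its `width` parameter, and the app names and prices are hypothetical.

```python
from typing import List, Sequence


def text_bar_chart(labels: Sequence[str], actual: Sequence[float],
                   predicted: Sequence[float], width: int = 40) -> List[str]:
    """Render paired actual/predicted bars; the largest value spans `width` characters."""
    if not (len(labels) == len(actual) == len(predicted)):
        raise ValueError("labels, actual and predicted must have equal length")
    peak = max(list(actual) + list(predicted) + [0.0])
    scale = width / peak if peak > 0 else 0.0
    pad = max((len(s) for s in labels), default=0)
    lines = []
    for name, a, p in zip(labels, actual, predicted):
        a_bar = "#" * round(a * scale)  # actual price bar
        p_bar = "=" * round(p * scale)  # predicted price bar
        lines.append(f"{name.ljust(pad)}  actual    |{a_bar.ljust(width)}| {a:.2f}")
        lines.append(f"{''.ljust(pad)}  predicted |{p_bar.ljust(width)}| {p:.2f}")
    return lines


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    for line in text_bar_chart(["App A", "App B"], [2.99, 0.99], [2.60, 1.30], width=30):
        print(line)
```

Scaling every bar by the same factor keeps actual and predicted lengths directly comparable, which is the property that makes the grouped bar chart useful for spotting over- and under-prediction.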
8 Conclusion
In conclusion, this research demonstrates the application of machine learning
techniques to predicting app prices on the Google Play Store. The study emphasizes
the importance of data preprocessing, feature engineering, and model evaluation
for achieving accurate predictions, and the comparison of different regression
models provides insights for future research and practical applications.
9 Future Work
Future research could explore additional features, hyperparameter tuning, and
more sophisticated modeling techniques to further improve predictive accuracy.
The study could also be extended to analyze how other factors affect app prices
in the dynamic, evolving mobile app market.
References
[1] Roma, Paolo, and Daniele Ragaglia. (2016). Revenue models, in-app purchase,
and the app performance: Evidence from Apple’s App Store and Google Play.
Electronic Commerce Research and Applications, 17, 173-190.
https://doi.org/10.1016/j.elerap.2016.04.007
[3] Li, Zheng, Xianfeng Ma, and Hongliang Xin. (2017). Feature engineering of
machine-learning chemisorption models for catalyst design. Catalysis Today,
280, 232-238. https://doi.org/10.1016/j.cattod.2016.04.013
[4] Raschka, Sebastian. (2018). Model evaluation, model selection, and algorithm
selection in machine learning. arXiv preprint arXiv:1811.12808.
https://doi.org/10.48550/arXiv.1811.12808
[5] Pollock, Michael L., Carl Foster, Donald Schmidt, Charles Hellman, A. C.
Linnerud, and Ann Ward. (1982). Comparative analysis of physiologic responses
to three different maximal graded exercise test protocols in healthy women.
American Heart Journal, 103, 363-373.
https://doi.org/10.1016/0002-8703(82)90275-7
[6] Hong, Jiayi, Ross Maciejewski, Alain Trubuil, and Tobias Isenberg. (2023).
Visualizing and comparing machine learning predictions to improve Human-AI
teaming on the example of cell lineage. IEEE Transactions on Visualization and
Computer Graphics. https://doi.org/10.1109/TVCG.2023.3302308