Explanatory Report BachNguyen Nvb180000
Recommender Systems
Introduction
At BlueEvent, speedy event recommendation and matching is the primary purpose
of our flagship software products. We require a reliable computational hardware system
that allows us to prototype a machine learning (ML) powered recommender model and aids in the
rapid release and integration of the model into our future software shipments. However, the cost of
such hardware systems, mainly the parallel computing hardware necessary for training ML
models, is somewhat overwhelming for our small private company. Therefore, we require a
graphics processing unit solution (a popular form of parallel computing hardware) that offers
the best performance per dollar.
There are several options for graphics processing units (GPUs) available on the
market. The first option is to invest in NVidia GPUs to build an on-premise system;
NVidia is the most popular and most potent hardware option for parallel computing and ML, with
broad support and adoption on the market, but the initial pricing is rather steep. The
alternative is to rent on-demand GPU time from cloud-hosting platforms such as Google Cloud,
a versatile and initially affordable option, but one with downsides in performance
and long-term cost. Our main goal is to build a system that quickly
and reliably prototypes new ML recommender models; thus, stability and performance come first,
and pricing comes second in our evaluation.
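A concrete sense of the workload helps frame the hardware choice. Below is a minimal sketch, in the spirit of the KNN item-based collaborative filtering approach Liao (2018) describes, of the kind of recommender prototype such a system would train; the interaction matrix, event indices, and function names are illustrative assumptions, not BlueEvent's actual model.

```python
import numpy as np

# Hypothetical user-event interaction matrix (rows: users, columns: events);
# 1 means the user attended or liked the event. Values are illustrative only.
ratings = np.array([
    [1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1],
], dtype=float)

def item_similarity(r):
    """Cosine similarity between event (column) vectors."""
    norms = np.linalg.norm(r, axis=0, keepdims=True)
    norms[norms == 0] = 1.0  # guard against division by zero for unseen events
    unit = r / norms
    return unit.T @ unit

def recommend(r, user, k=2):
    """Score the user's unseen events by similarity to events they interacted with."""
    sim = item_similarity(r)
    scores = sim @ r[user]         # aggregate similarity to the user's liked events
    scores[r[user] > 0] = -np.inf  # exclude events the user has already seen
    return np.argsort(scores)[::-1][:k]

# Top-2 event suggestions for the first user:
print(recommend(ratings, user=0))
```

Even this toy version involves dense matrix products, which is why GPU throughput dominates prototyping speed at realistic scales of millions of users and thousands of events.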
On-premise NVidia GPU
Advantages
● Native Support for Machine Learning Libraries: NVidia GPUs are unmatched in their
support for standard ML libraries, including the NVidia CUDA Deep Neural Network
library (cuDNN); powerful alternatives exist, but they significantly lack library
support (Ramesh, 2018). A widely supported platform gives the company more flexibility
when it needs to adapt the recommender systems to a new environment.
● Long-term Support: NVidia GPUs are covered by a three-year warranty policy, ensuring
that the company can rely on each unit for model training throughout its lifecycle.
● Higher Control and Security over Data: On-premise servers give the company
complete control over data processing, so the data is less susceptible to the attacks
that are prevalent on cloud computing platforms (“Cloud vs.”, 2018). Because
BlueEvent will handle the personal information of several million customers and
thousands of events, this promise of privacy can be a critical factor in the success of the
company’s recommender systems and applications.
● High Performance: On-premise GPU setups have been shown to deliver unparalleled
performance, up to six times faster than on-demand cloud GPUs (Boesen, 2017). This
allows more rapid prototyping of the company’s recommender systems.
Disadvantages
● High Upfront Cost: Building an on-premise server requires a steep initial hardware
investment, which weighs heavily on a small private company’s budget.
● Maintenance Responsibility: The company must handle resource-consuming server
maintenance itself rather than delegating it to a hosting platform.
Cloud-based GPU Rental
The requirements for running computational tasks on the cloud are vastly different from
running the same tasks on-premise with NVidia GPUs. The data is loaded onto an online server, the
company then chooses from a variety of supported GPUs to run on, and the GPU workloads are
handled by the cloud-hosting platform (“Cloud GPUs”, n.d.). A high-speed, stable Internet
connection is required to keep the data stream stable, and the cost is determined by GPU time,
i.e., the time the computation tasks consume on the upstream servers.
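The GPU-time billing model described above can be sketched in a few lines; the GPU tiers and hourly rates below are hypothetical placeholders, not actual Google Cloud prices.

```python
# Hypothetical hourly rates in USD for three rented GPU tiers (assumed values,
# not real cloud-platform pricing).
HOURLY_RATE_USD = {"entry_gpu": 0.45, "mid_gpu": 1.46, "high_gpu": 2.48}

def rental_cost(gpu: str, hours: float) -> float:
    """Cost of a training run billed purely by consumed GPU time."""
    return HOURLY_RATE_USD[gpu] * hours

# A 40-hour prototype training run on the mid-tier option:
print(round(rental_cost("mid_gpu", 40), 2))
```

The key property of this model is that spending scales linearly with training time, which is what drives the long-term cost comparison discussed below.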
Advantages
● Low Upfront Cost: Renting on-demand GPU time requires no initial hardware
investment, making it an affordable way to develop early recommender prototypes.
● No Maintenance Burden: The hosting platform manages the hardware, relieving the
company of resource-consuming maintenance responsibilities.
Disadvantages
● Reliance on Internet Connection: Cloud computing requires the data to be uploaded to
the server before processing, so the company must ensure that its connection to the
server is fast and stable. This is a compromise on BlueEvent’s side, since any unplanned
interruption of Internet service will delay prototyping.
● Low Customizability: To reduce operational complications, cloud-based servers are not
nearly as customizable as on-premise servers. This means the company cannot tailor the
hardware to custom recommender systems or build a dedicated environment for a prototype.
● Lower Data Security: Transferring data over the Internet for processing carries
data-leakage risks, as demonstrated by the 2016 leak of voter information from an AWS
server (“Cloud vs.”, 2018). This is a liability for BlueEvent, since the company handles
massive flows of user information in its software products.
● Expensive Long-term: Even though cloud-based servers’ start-up cost is small, the
cumulative cost of running models on them eventually surpasses that of on-premise
NVidia GPU servers (Cheng, 2017). This can prove troublesome if the company needs to
maintain trained recommender models in the long run: after around 100 days of training
on a cloud server, the cost exceeds that of training on-premise (Boesen, 2017).
● Lower Performance compared to On-premise Servers: As mentioned before, on-premise
servers outperform cloud-based computing, which is a significant factor given the
company’s requirement for rapid recommender-system prototyping. Furthermore, the same
ML task’s running time is, on average, lower on an on-premise server than on a
cloud-based server (Figure 4) (Cheng, 2017).
Figure 4. Training time for each platform, on-premise and two clouds (left to right)
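The long-term cost crossover can be illustrated with a simple break-even calculation. The upfront hardware price and daily rental cost below are round numbers assumed only for illustration, not figures from the cited sources, and the sketch ignores on-premise electricity and maintenance costs.

```python
import math

# Assumed, illustrative prices (not quoted from Boesen or Cheng):
ON_PREMISE_UPFRONT = 7200.0  # one-time hardware cost in USD
CLOUD_DAILY_COST = 72.0      # cost of a full day of rented GPU time in USD

def breakeven_days(upfront: float, daily: float) -> int:
    """First day on which cumulative rental spending exceeds the upfront cost."""
    return math.floor(upfront / daily) + 1

# With these rates, the cloud becomes the pricier option after roughly 100 days
# of continuous training, consistent in spirit with Boesen's (2017) estimate:
print(breakeven_days(ON_PREMISE_UPFRONT, CLOUD_DAILY_COST))
```

The crossover point shifts with actual utilization: a team training only a few hours per day reaches break-even far later, which is why the cloud remains attractive for intermittent experimentation.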
Conclusion
Both an on-premise server and a cloud-based computation solution meet different
requirements for training the recommender systems BlueEvent needs. An on-premise
server is a costly upfront investment, but it is powerful enough to roll out prototypes of
the recommender system rapidly, which can speed up software development for
event-matching functionality. The focus on on-premise performance requires more maintenance
but gives the company more control over the data-security and model-customizability
aspects of the development process. Alternatively, renting a cloud-based server for computation
lets the company shed the resource-consuming maintenance responsibilities while also
offering a more affordable way to develop early recommender prototypes. The trade-off is
lower performance and less control over the data model and security, making this option ideal if
BlueEvent would like to explore early recommender systems with less initial commitment and
a more experimentation-oriented approach.
Works Cited
Boesen, M. R. (2017, May 8). GPU servers for machine learning startups: Cloud vs On-premise? Medium. https://medium.com/@thereibel/gpu-servers-for-machine-learning-startups-cloud-vs-on-premise-9a9dedfcadc9
Cheng, C. H. (2017, September 5). On-premise (DIY) vs Cloud GPU. Towards Data Science. https://towardsdatascience.com/on-premise-diy-vs-cloud-gpu-d5280320d53d
Dettmers, T. (2020, September 7). Which GPU(s) to Get for Deep Learning: My Experience and Advice for Using GPUs in Deep Learning. Tim Dettmers. http://timdettmers.com/2020/09/07/which-gpu-for-deep-learning/#The_Most_Important_GPU_Specs_for_Deep_Learning_Processing_Speed
Exxact Corporation. (2018, November 13). Cloud vs. On-Premises: Which is Really Better for Deep Learning? https://blog.exxactcorp.com/cloud-vs-on-premises-which-is-really-better-for-deep-learning/
Liao, K. (2018, November 11). Prototyping a Recommender System Step by Step Part 1: KNN Item-Based Collaborative Filtering. Towards Data Science. https://towardsdatascience.com/prototyping-a-recommender-system-step-by-step-part-1-knn-item-based-collaborative-filtering-637969614ea
Ramesh, P. (2018, August 29). NVIDIA leads the AI hardware race. But which of its GPUs should you use for deep learning? Packt Publishing Ltd. https://hub.packtpub.com/nvidia-leads-the-ai-hardware-race-but-which-of-its-gpus-should-you-use-for-deep-learning/