Comparative Study Among AWS, Azure and Hortonworks

Comparative Study of Hortonworks, AWS and Microsoft Azure HDInsight
Class: CA191
Jamahuriya University of Science and Technology

Abstract -- In recent years, new technologies have generated massive amounts of data every day. Companies face challenges in collecting, storing, analyzing, and using large amounts of data to create added value. For businesses and governments, the point is to avoid losing valuable information in the shuffle. This is where "Big Data" technology comes into play: it is based on the analysis of very large masses of fine-grained data [3]. Cloud computing is a pay-as-you-go model that has the potential to eliminate the need for high-cost and complex IT infrastructure. It offers adaptable and scalable resources for easy access and development via mobile devices. All data in cloud computing depends on networked resources that can be shared over the internet [2].

Keywords -- Amazon Web Services (AWS), Hortonworks, Microsoft Azure, Big Data, Hadoop Distribution, Cloud Service

I. INTRODUCTION (BRIEF BACKGROUND)

A few years ago, all data was processed by a single machine, which needed only one storage unit and one processor. This approach was slow and inefficient, especially when processing large volumes of varied data. Today data is growing exponentially, and a single machine can no longer handle such volumes. New technologies produce large volumes of data that must be collected, processed and stored; this is where the term "big data" appears, and companies are benefiting from big data technology [1]. Many businesses manage big data with Hadoop-based distributions such as Hortonworks, or with cloud services such as Amazon Web Services (AWS) and Microsoft Azure [1]. Cloud computing relies on shared resources rather than local servers or personal devices to run applications. It is comparable to grid computing, a type of computing in which the unused processing cycles of all computers in a network are harnessed to solve problems too hard for any stand-alone machine [2]. Once companies began offering cloud services that gave users easy and efficient access to storage space at a reasonable price, these services gained popularity and many users subscribed to them [2].

II. HISTORY AND EVOLUTION OF THE DISTRIBUTIONS/SERVICES

A. Hortonworks
Hortonworks was established in June 2011 as an independent company by members of the Yahoo team to drive the success of the Hadoop project. Hortonworks is open source and Apache-licensed; the idea behind the project is not to sell licenses but to sell support and training [3]. The Hortonworks Data Platform (HDP) includes Apache Hadoop to store, process and analyze large volumes of data. Hortonworks completed its merger with Cloudera in January 2019 [4]. HDP offers a security-rich, enterprise-ready, open-source Hadoop distribution based on a centralized architecture; its fundamental components are Apache Hadoop YARN and the Hadoop Distributed File System (HDFS) [5]. HDFS is designed to handle huge volumes of data: it splits the data into blocks, distributes and stores the blocks on several different computers, and keeps copies of the data on multiple systems [5].

Figure 1: Hortonworks in Hadoop platform (HDP) [1]

B. Amazon Web Services (AWS)
Amazon Web Services was established in 2002 to provide tools and services that help developers incorporate features of Amazon. In 2006 AWS began to offer its first cloud computing services: on March 14, 2006, it was re-launched with three initial services, S3 cloud storage, Simple Queue Service (SQS), and EC2. AWS provides four kinds of cloud service: computing, database, networking and content storage [1]. Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides scalable computing capacity in the cloud. It is designed to make web-scale computing easier for developers. Its simple web service interface allows you to obtain and configure capacity with minimal friction. It gives you complete control of your computing resources and lets you run on Amazon's proven computing environment. Amazon EC2 reduces the time required to obtain and boot new server instances to minutes, allowing you to quickly scale capacity, both up and down, as your computing requirements change. Amazon EC2 changes the economics of computing by allowing you to pay only for the capacity you actually use [4].

Figure 2: Components of Amazon EC2 [1]

C. Microsoft Azure HDInsight
Microsoft Azure is the cloud computing service offered by Microsoft. The platform offers over 600 services and is a web-based platform on which you can build, test, deploy and manage applications and services [6]. Microsoft designed Azure HDInsight as a cloud-based service for processing and analyzing large volumes of streaming and historical data. Enterprises can further use HDInsight as a fully managed analytics service. HDInsight enables developers to build big data applications using open-source frameworks such as Apache Hadoop, Apache Spark, Apache Hive, Apache Kafka, Apache LLAP, Apache Storm, and Microsoft Machine Learning Server [7]. Azure HDInsight allows developers to build custom big data solutions and process massive amounts of data using implementations of widely used Apache products. Developers can run batch processing with Apache Pig, Apache Spark, or Apache Hive. Likewise, they can access NoSQL data using Apache HBase, and process millions of streaming events using Apache Storm, Apache Spark, or Apache Kafka. Users can further integrate Apache Spark with Hadoop MapReduce to extract, transform, and load (ETL) large volumes of data on demand [7]. Microsoft Azure offers Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS). It supports any kind of programming language, tool and framework, which currently brings it to the top of the marketplace of services available to customers [1].

Figure 3: Cloud service models

III. HIGHLIGHTS OF THE DISTRIBUTIONS/SERVICES AND THEIR COMPONENTS

A. Amazon Web Services (AWS)
Amazon Web Services is a cloud computing platform that offers a huge cloud infrastructure and many services and products. The AWS cloud provides a broad range of scalable, flexible infrastructure services that you can select to match your workloads and tasks. This gives you the ability to choose the most appropriate mix of resources for your specific applications. Cloud computing makes it easy to experiment with infrastructure components and architecture design [8].

1. Highlights of AWS
• AWS offers more than 100 services.
• Users benefit from AWS's years of experience.
• It is easy to use.
• It allows services to be scaled up and down.

2. Components of AWS [9]
1. Data Management and Data Transfer
2. Compute & Networking
3. Storage
4. Automation and Orchestration
5. Visualization
6. Operations and Management
7. Security and Compliance

B. Hortonworks
Hortonworks Data Platform (HDP) is an open-source framework for distributed storage and processing of large, multi-source data sets. Hortonworks is the Hadoop distribution that supports the Windows platform. It can run on premises while helping you drive new revenue streams, improve customer experience, and control costs.

1. Highlights of Hortonworks [1]
1. The Hortonworks economic model is to sell support and training, not licenses.
2. It is a big contributor to Hadoop.
3. It uses existing data platforms to embed Hadoop.

2. Components of Hortonworks
The Hortonworks Data Platform consists of three layers:
1. Core Hadoop 2: The basic components of Apache Hadoop version 2.x.
• Hadoop Distributed File System (HDFS)
• YARN
• MapReduce 2 (MR2)
2. Essential Hadoop: A set of Apache components designed to ease working with Core Hadoop. Some of these are Apache HBase, Apache HCatalog, Apache Hive, and Apache Pig.
3. Supporting Components: A set of components that allow you to monitor your Hadoop installation and to connect Hadoop with your larger compute environment.

C. Microsoft Azure
Azure is a public cloud computing platform with solutions including Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS) that can be used for services such as analytics, virtual computing, storage, networking, and much more.

1. Highlights of Azure [10], [11]
• Azure supports the IaaS, PaaS and SaaS domains.
• Global: data is housed in geo-synchronous data centers.
• Open: supports almost any OS, language, tool, or framework.
• Flexible: compute resources can be moved up and down as needed.
• Azure facilitates easy mobility and a reliable, consistent platform between on-premises and public cloud.
• Azure has hybrid capabilities that make it unique.

2. Components of Microsoft Azure [12]
• Compute
• Storage
• Database
• Security and Authentication
• Networking
• Monitoring
• Web Services
• Mobile Services

IV. COMPARISONS AMONG THE DISTRIBUTIONS/SERVICES AND THEIR COMPONENTS (ANALYSIS/RESULTS)

The aim of this report is to compare three big data cloud platforms: Amazon Web Services (AWS), Microsoft Azure and Hortonworks. There are many ways to approach the comparison, but we narrow it down to three areas:
• Features of the distributions
• Weaknesses of the distributions
• Strengths of the distributions

I. Features of AWS, Azure and Hortonworks

Feature: Storage services
Azure: Blob Storage, Containers, Azure Drive, Table Storage, Tables, Storage Stats
AWS: S3, Buckets, EBS, SDB, Easy to use, SQS, CloudFront, AWS Import/Export
Hortonworks: Hortonworks Data Platform (HDP) is an open-source framework for distributed storage and processing of large, multi-source data sets.

Feature: Database services
Azure: MS SQL, SQL Sync
AWS: MySQL, Oracle, DynamoDB
Hortonworks: PostgreSQL, MySQL, Oracle

Feature: Deployment services
Azure: Cspkg (fancy zip file); upload via portal or API via blob storage; coarse-grained updates; "click to scale"; more magic
AWS: Amazon Web Services; Amazon Machine Image (AMI); traditional deployment models; fine-grained updates; Elastic Beanstalk; CloudFormation
Hortonworks: HDP offers a range of infrastructure choices to deploy an open and flexible data platform. Users have the flexibility to combine the infrastructure options that best suit their unique use cases: on premises, cloud, hybrid cloud and Cloudbreak.

Feature: Security
Azure: Provides security by offering permissions on the whole account.
AWS: Security is provided using defined roles with a permission control feature.
Hortonworks: A Hadoop-powered data lake can provide a robust foundation for a new generation of analytics and insight.
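
The distributed storage listed for Hortonworks in the table above rests on the HDFS behaviour described in Section II.A: data is split into blocks, and copies of each block are stored on several machines. The Python sketch below illustrates the idea only. The 128 MB block size and the replication factor of 3 are assumed values (common HDFS defaults, not figures from this report), the node names are hypothetical, and the round-robin placement is a simplification of the rack-aware policy a real cluster would apply.

```python
BLOCK_SIZE = 128 * 1024 * 1024   # assumed block size: 128 MB
REPLICATION = 3                  # assumed number of copies per block


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes (in bytes) of the blocks a file is cut into."""
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block may be smaller
    return sizes


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Assign every block to `replication` distinct nodes, round-robin.

    Simplified illustration: real HDFS placement also considers racks.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + i) % len(nodes)] for i in range(replication)
        ]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(300 * 1024 * 1024)       # a 300 MB file
    layout = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])
    for block_id, holders in layout.items():
        print(block_id, blocks[block_id], holders)
```

Because each block lives on more than one node, the loss of a single machine does not lose data, which is the fault tolerance credited to Hortonworks in the strengths comparison.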
II. Strengths of AWS, Azure and Hortonworks [13]

Azure
1. Capability for developers and users to create, maintain and deploy applications
2. Fully scalable cloud computing platform that offers open access across multiple languages, frameworks, and tools
3. Total support for Microsoft legacy apps
4. Greater awareness of enterprise needs
5. Easy one-click migrations in many cases
6. Conversion of on-prem licenses to the cloud
7. Support for mixed Linux/Windows environments
8. Offers inbuilt tools like Azure Stack to help the organization deliver Azure services from its own data center

AWS
1. High transfer stability
2. Trusted by high-profile customers
3. Robust partner ecosystem
4. Broad and deep service offerings
5. AWS enables you to select an operating system, programming language, and database of your choice
6. Compute Cloud allows you to increase or decrease storage according to the needs of your organization

Hortonworks
1. Cost efficiency
2. Scalability
3. Versatility
4. Compatibility with multiple file systems and processing engines
5. Fault tolerance

III. Weaknesses of AWS, Azure and Hortonworks [13], [14]

Azure
1. Customer service is not transparent, and data is hosted globally. If you have data restrictions that require storage in a specific country, you need to verify and specify this with Microsoft.
2. You will be charged extra for paying as you go.
3. Azure cloud-based services are full of glitches. To fix these bugs, you will need to spend additional money.
4. Less flexibility with non-Windows server platforms when compared to AWS

AWS
1. Less hybrid-cloud-friendly
2. The AWS elastic load balancer is not equipped to handle as many requests as it receives.
3. AWS lacks customer support, so it is more suitable for technically savvy consumers and for companies that have their own in-house tech support team.
4. The number of choices offered by AWS is confusing to those who may not speak the language of technology.
5. Incompatible and weak hybrid strategy
6. AWS is a less open private cloud. This makes it an unpopular storage option for sensitive industries like banking.
7. AWS has too many products, which makes the selection process much harder.

Hortonworks
1. Overall complexity and challenging learning curve
2. Low speed and no real-time data processing
3. Small file problem
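
The "small file problem" listed as a Hortonworks weakness can be made concrete with a rough count of the metadata that the HDFS NameNode keeps in memory. The sketch below is an estimate under stated assumptions, not a measurement: it assumes one metadata object per file plus one per block, a 128 MB block size, and about 150 bytes of memory per object. The last figure is a commonly quoted rule of thumb rather than a number from this report, and the function names are hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024   # assumed block size: 128 MB
BYTES_PER_OBJECT = 150           # assumed memory per metadata object (rule of thumb)


def namenode_objects(file_sizes, block_size=BLOCK_SIZE):
    """Count metadata objects: one per file plus one per block of that file."""
    total = 0
    for size in file_sizes:
        blocks = -(-size // block_size)   # ceiling division
        total += 1 + blocks
    return total


def estimated_memory_bytes(file_sizes, block_size=BLOCK_SIZE,
                           bytes_per_object=BYTES_PER_OBJECT):
    """Rough NameNode memory needed to track the given files."""
    return namenode_objects(file_sizes, block_size) * bytes_per_object


if __name__ == "__main__":
    one_large_file = [1024 ** 3]               # a single 1 GB file
    many_small_files = [1024 ** 2] * 1024      # the same 1 GB as 1024 files of 1 MB
    print(namenode_objects(one_large_file), estimated_memory_bytes(one_large_file))
    print(namenode_objects(many_small_files), estimated_memory_bytes(many_small_files))
```

Under these assumptions the same gigabyte of data costs far more metadata when stored as many small files than as one large file, which is why workloads made of many small files put pressure on a Hadoop cluster.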
V. DISCUSSION AND RECOMMENDATION (APPLICATION DOMAIN)

We have discussed big data at length: how data changes over time and how massively it differs from earlier years. Earlier, managing data was not hard, since a single machine was enough to analyze and process it, but today data is growing at an unimaginable rate, and that is where big data comes in. Different companies offer big data and cloud services; this report covers three of them: Amazon Web Services (AWS), Microsoft Azure and Hortonworks. AWS and Microsoft Azure are the two biggest cloud services in the world, and each competes in the market with its own products. AWS and Azure offer essentially the same basic capabilities around flexible compute, storage, networking, and pricing. Both share the common elements of a public cloud: autoscaling, self-service, pricing, security, compliance, identity access management features, and instant provisioning [15]. Hortonworks no longer operates separately, since it has merged with Cloudera.

Today organizations in healthcare, the public sector, education, insurance services, industry, transportation, and finance and crime detection need cloud and big data services and benefit from what big data companies offer. When choosing which company or service should maintain and analyze its data, an organization needs to look at areas such as pricing, experience, performance, security, integration with the business, and anything else that matters to that business. In terms of pricing, AWS and Azure charge according to the scale of service the business uses, whereas Hortonworks is a free, open-source platform that charges only for training and support. Reports show AWS leading with 40% of the market, Azure with 30%, and others sharing the remaining 30%. Microsoft Azure has increased its market share in recent years, but not to the point where there is real competition between the two companies, at least in the near future [13]. An organization may work with any big data service or company such as AWS or Azure, but it sometimes already uses services from a specific company: if a business already uses Microsoft Office, Microsoft Azure is a good choice because the business is already familiar with Microsoft services. In the future, data looks set to grow massively, and big data companies will develop their services and bring new ideas for processing and analyzing it.

VI. CONCLUSION

Big Data is a term that has gained popularity in recent years to describe the fact that businesses are confronted with large and steadily growing volumes of data to handle, with high stakes at the commercial and marketing levels. This trend in Big Data collection and analysis has given rise to new solutions that combine traditional data warehouse technologies with Big Data systems in a logical architecture [3]. Finally, Hortonworks has merged with Cloudera, which is now its parent company. Hortonworks is primarily a data software company, while Amazon Web Services (AWS) and Microsoft Azure serve as cloud platforms. These two offer highly secure cloud services: Microsoft provides strong security via the Azure platform, whereas Amazon Web Services provides security via EC2. When it comes to cloud service providers, there is no single best one; it all comes down to what best fits your needs [1].

VII. TEAM MEMBER ROLES

Title                             Name + ID
Abstract                          Abduulahi Said Fatih C119008
Introduction                      Mohamed Said Mohamed C119046
History and evolution             Sabirin Ali Mohamed C119007, Sagal Abshir Yusuf C119017
Highlights of distributions       AbiWali Abdulkadir Ali C119744
Comparisons among distributions   Yoonis Mohamed Abdullahi C119775
Discussion and recommendation     Mohamed Abdirahman Ahmed C119704
Conclusion                        AbdiNor Ibrahim Ahmed C119734

VIII. REFERENCES

[1] K. A. Pinto, "A Comparative study one of the Hadoop distribution Hortonworks with Amazon Web Service (AWS) and Microsoft Azure".
[2] T. Madhuri and P. Sowjanya, "Microsoft Azure v/s Amazon AWS cloud services: A comparative study," Int. J. Innov. Res. Sci. Eng. Technol., vol. 5, no. 3, pp. 3904–3907, 2016.
[3] A. Erraissi, A. Belangour, and A. Tragha, "A big data hadoop building blocks comparative study," Int. J. Comput. Trends Technol. Accessed June, vol. 18, 2017.
[4] V. A. Gandhi and C. Kumbharana, "Comparative study of Amazon EC2 and Microsoft Azure cloud architecture," Int. J. Adv. Netw. Appl., pp. 117–123, 2014.
[5] "Hortonworks Data Platform," p. 6.
[6] "Everything you ever wanted to know about Microsoft Azure," Nigel Frank, Nov. 05, 2018. https://www.nigelfrank.com/insights/everything-you-ever-wanted-to-know-about-microsoft-azure (accessed Oct. 31, 2022).
[7] "What is Azure HDInsight?," BluePi, Feb. 07, 2019. https://www.bluepiit.com/blog/what-is-azure-hdinsight/ (accessed Oct. 30, 2022).
[8] "Solution Components," Amazon Web Services, Inc. https://aws.amazon.com/hpc/solution-components/ (accessed Nov. 06, 2022).
[9] "Amazon Web Services (AWS): Products, Components and Services." https://www.knowledgehut.com/blog/cloud-computing/what-is-aws (accessed Nov. 06, 2022).
[10] L. McCoy, "Microsoft Azure Explained: What It Is and Why It Matters," CCB Technology, Mar. 06, 2019. https://ccbtechnology.com/what-microsoft-azure-is-and-why-it-matters/ (accessed Nov. 07, 2022).
[11] "Top 10 Azure Services and Products in 2022," Intellipaat Blog, Nov. 26, 2021. https://intellipaat.com/blog/top-azure-services/ (accessed Nov. 07, 2022).
[12] "Azure Components | Top 8 Awesome Components of Azure," EDUCBA, Jan. 10, 2020. https://www.educba.com/azure-components/ (accessed Nov. 07, 2022).
[13] D. Taylor, "Azure vs AWS: Difference Between Them," Jan. 31, 2020. https://www.guru99.com/azure-vs-aws.html (accessed Nov. 08, 2022).
[14] "The Good and the Bad of Hadoop Big Data Framework," AltexSoft. https://www.altexsoft.com/blog/hadoop-pros-cons/ (accessed Nov. 08, 2022).
[15] "AWS vs Azure-Who is the big winner in the cloud war?," ProjectPro. https://www.projectpro.io/article/aws-vs-azure-who-is-the-big-winner-in-the-cloud-war/401 (accessed Nov. 08, 2022).
