What is Big Data?

Big Data refers to volumes of data so large that traditional applications cannot process them
effectively. Processing begins with raw, unaggregated data that is usually too large to store in the
memory of a single computer.

A buzzword used to describe immense volumes of data, both unstructured and structured, Big Data
inundates a business on a day-to-day basis. Big Data can be analyzed for insights that lead to better
decisions and strategic business moves.

The definition of Big Data, given by Gartner, is, “Big data is high-volume, high-velocity and/or
high-variety information assets that demand cost-effective, innovative forms of information processing
that enable enhanced insight, decision making, and process automation.”

How Big is Big Data

Big Data is a popular term used to describe massive collections of data, whether structured,
semi-structured, unstructured or raw. Data may even be treated as an asset on a balance sheet.

According to Gartner, Big Data comprises high-volume, high-velocity and high-variety information assets
that demand cost-effective, innovative forms of information processing for enhanced insight and
decision-making. Hence, the widely accepted 3 Vs of Big Data are:

• Volume: the amount of data,

• Velocity: the speed of data in and out, and

• Variety: the range of data types and sources, which include unstructured text documents, pictures,
video, email, audio, stock ticker data, financial transactions, etc.

However, recent studies have added two more components that describe Big Data:

• Variability: At times, data flow is highly inconsistent, with periodic peaks that hamper the
process of handling and managing data effectively.

• Complexity: As large volumes of data come from multiple sources, data management becomes a
challenging task.

In fact, the data sets are so big and complex that it becomes very difficult to process them using
traditional data processing applications. It is estimated that about 2.5 quintillion bytes of data are
created every day.

It is also estimated that about 90% of the world’s total data was created in the last two years. About
80% of the total data is unstructured: it includes data collected from sensors used to gather weather
information, social media posts, digital photos and videos, purchase transaction records, mobile phone
GPS signals and many more sources.

Both government and private sectors have used Big Data to enhance their productivity. Big Data
analytics played a key role in Barack Obama’s successful re-election campaign in 2012. Recognizing the
role of Big Data in addressing the problems faced by the government, the Obama administration
announced the Big Data Research and Development Initiative in 2012. The United States
Federal government owns six of the ten most powerful supercomputers in the world.

In the private sector, Facebook uses Big Data to handle 50 billion photos from its user base.
Amazon.com uses Linux-based technology to handle millions of back-end operations every day.
eBay.com uses two data warehouses of 7.5 PB and 40 PB as well as a 40 PB Hadoop cluster for search.
The FICO Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts across the globe.

Walmart handles more than 1 million customer transactions every hour, which are imported into
databases estimated to contain more than 2.5 petabytes of data. According to estimates, the volume of
data worldwide doubles every 1.2 years.
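
The figures quoted above can be put on a common scale with a little arithmetic. The sketch below is only an illustration: it takes the 2.5 quintillion bytes per day and the 1.2-year doubling time at face value, assumes decimal (SI) units, and uses function names invented for this example.

```python
# Back-of-the-envelope arithmetic for the estimates quoted in the text.
# All names here are hypothetical; the input figures come from the text.

BYTES_PER_DAY = 2.5e18        # "2.5 quintillion bytes" created per day
DOUBLING_TIME_YEARS = 1.2     # "doubles every 1.2 years"
EXABYTE = 10**18              # decimal (SI) exabyte, an assumption


def exabytes_per_day(bytes_per_day: float = BYTES_PER_DAY) -> float:
    """Convert a daily byte count into exabytes."""
    return bytes_per_day / EXABYTE


def growth_factor(years: float, doubling_time: float = DOUBLING_TIME_YEARS) -> float:
    """How many times larger the data volume is after `years`,
    assuming steady doubling every `doubling_time` years."""
    return 2 ** (years / doubling_time)


if __name__ == "__main__":
    print(f"{exabytes_per_day():.1f} EB created per day")
    print(f"volume grows about {growth_factor(6):.0f}x in 6 years")
```

Under these assumptions the daily figure is 2.5 exabytes, and six years of doubling every 1.2 years multiplies the total volume by roughly 32.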

Role of Big Data in an Enterprise

The evolution of Big Data databases has enabled enterprises to recognize the importance of data in
their growth and success. These databases have helped enterprises to save money, increase revenue
and achieve many other business objectives. The real challenge faced by enterprises is finding the
critical piece of information that provides a competitive edge. Hadoop helps in managing and handling
massive amounts of data. It also helps in transforming the data into a more usable structure and format,
and in extracting valuable analytics from it.

Big Data in International Development

The scope of Big Data is not limited to IT companies or to any particular sector. According to research
on the effective use of Information and Communication Technologies for Development (ICT4D), Big Data
technologies can make important contributions to solving challenges in international development.
Advancements in Big Data technologies create cost-effective opportunities that help improve
decision-making in critical areas of development such as healthcare, employment, law and crime,
security and natural disasters.

Big Job Opportunities in Big Data

Big data offers huge job opportunities in the IT sector, provided one possesses the right
qualifications. A 2011 study by McKinsey & Company reported that the United States could face an acute
shortage of people with deep analytical skills in big data. Companies are looking, and will continue to
look, for skilled people who can tap Big Data’s promise of competitive advantage.

There are several big data jobs that require skilled professionals.

1. Chief Data Officer

The Chief Data Officer is responsible for the overall implementation and execution of Big Data in an
organization. He or she holds a senior position and should be a member of the executive board,
reporting directly to the CEO.

2. Big Data Scientist

This is going to be one of the most sought-after jobs of the 21st century. As the Big Data industry is
growing rapidly, the demand for Big Data Scientists is higher than ever. But this is not an easy role.
In order to become a successful Big Data Scientist, one needs to possess specialized skills such as
natural language processing, machine learning, conceptual modeling, statistical analysis, predictive
modeling and hypothesis testing.

In order to be a successful Big Data Scientist, one needs to master the following capabilities too:

• Ability to work in a fast-paced, multi-disciplinary environment

• Strong written and verbal communication

• Ability to develop and program databases

• Ability to query databases and perform statistical analyses

• Ability to create examples and demonstrations

• Ability to work autonomously

• Good understanding of design and architecture principles

3. Big Data Analyst

A Big Data Analyst assists the Big Data Scientist in performing the necessary jobs. The job is primarily
to work on data in a given system and analyze different data sets. The next job for a Big Data Analyst
can be that of a Big Data Scientist.

Hence, he or she needs to possess a similar set of skills and capabilities. The skills include data
mining (including data auditing, aggregation, validation and reconciliation), advanced modeling
techniques, testing, and creating and explaining data in clear and concise reports. The testing skills of
a big data analyst are very important for the successful analysis of databases. A big data analyst must
also be a successful communicator, as he or she needs to communicate complex findings and ideas in
much simpler language.

4. Big Data Visualizer

Big Data Visualizer is a creative job in which a person is expected to visualize the data in a way that
makes it understandable for the senior management of an organization. He or she must understand
user interface design as well as other visualization skills such as typography, user experience design
and visual art design. His or her job is to make the abstract information from the data analyses
appealing and present it in an understandable way.

The role calls for strong knowledge of JavaScript and HTML, familiarity with modern visualization
frameworks such as Gephi, experience with common web libraries and tools such as jQuery and LESS,
sharp analytical abilities, proven design skills, and proficiency in Photoshop, Illustrator, InDesign and
other Adobe Creative Suite products.

5. Big Data Manager

The Big Data Manager acts as a bridge between the technical team members and the strategic
management in an organization. He or she leads and manages the teams of big data scientists, big data
analysts and big data visualizers. He or she must master core management skills such as communicating
effectively and efficiently, building personal relationships with the big data team, staying flexible in a
changing environment, and understanding, interpreting and relating the organization’s strategy to the team.

6. Big Data Solutions Architect

This role aims to address specific big data problems and requirements. Big Data Solutions
Architects are quite important for an organization, as they are trained to describe the structure and
behavior of a big data solution. He or she must be familiar with Hadoop. Some of the skills required for
a Big Data Solutions Architect are the ability to clearly articulate the pros and cons of various
technologies, the ability to document use cases, solutions and recommendations, strong written and
verbal communication skills, being a self-starter, and the ability to work in teams.

7. Big Data Engineer

Big Data Engineers develop, maintain, test and evaluate big data solutions within an organization. He or
she must possess extensive knowledge of programming or scripting languages such as Java, C++, PHP,
Ruby and Python. Building data processing systems with Hadoop and Hive is another important skill
for the role.

With all that said, Big Data is clearly becoming mainstream in today’s tech-savvy world, and more and
more organizations are investing in it to save time and effort while still succeeding in their businesses.

Data Science vs. Big Data vs. Data Analytics

Figure 1: Data Science vs. Big Data vs. Data Analytics (Source: https://www.simplilearn.com/)

Data is everywhere. The amount of digital data that exists is growing at a rapid rate, doubling
every two years, and changing the way we live. An article by Forbes states that data is growing faster
than ever before and projected that by the year 2020 about 1.7 megabytes of new information would be
created every second for every human being on the planet. This makes it extremely important to know
at least the basics of the field. After all, this is where our future lies.

Figure 2: Where Are They Used (Source: https://www.simplilearn.com/)

What is Data Science?

Dealing with unstructured and structured data, Data Science is a field that comprises everything
that relates to data cleansing, preparation, and analysis.

Data Science is the combination of statistics, mathematics, programming, problem-solving,
capturing data in ingenious ways, the ability to look at things differently, and the activity of cleansing,
preparing, and aligning the data.

In simple terms, it is the umbrella of techniques used when trying to extract insights and
information from data.

What is Data Analytics: Everything You Need To Know

Living in the 21st century, you have probably come across the term ‘data analytics’ often. It is currently
one of the most talked-about terms. If you want to begin your journey in data analytics, this is the
right read for you.

What is Data Analytics?

Companies around the globe generate vast volumes of data daily, in the form of log files, web
servers, transactional data, and various customer-related data. In addition to this, social media websites
also generate enormous amounts of data. 

Companies ideally need to use all of the data they generate to derive value from it and make
impactful business decisions. Data analytics serves this purpose.

Data analytics is the science of examining raw data to draw conclusions about the information it
contains. It is the process of exploring and analyzing large datasets to find hidden patterns and unseen
trends, discover correlations, and derive valuable insights to make business predictions. It improves the
speed and efficiency of your business. It is used in several industries to allow organizations and
companies to make better decisions as well as to verify or disprove existing theories or models. The
focus of Data Analytics lies in inference, which is the process of deriving conclusions that are solely
based on what the researcher already knows.

Businesses use many modern tools and technologies to perform data analytics. This is data
analytics for beginners, in a nutshell. 

Data Analytics involves applying an algorithmic or mechanical process to derive insights, for
example by running through several data sets to look for meaningful correlations between them.
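
Looking for a correlation between two data sets can be sketched in a few lines. The example below is a minimal illustration, assuming the Pearson correlation coefficient as the measure (the text does not name one); the function name and the sample figures are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Returns a value between -1 and 1; values near 0 suggest no linear
    relationship.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return covariance / sqrt(spread_x * spread_y)


if __name__ == "__main__":
    # Hypothetical weekly figures: advertising spend and number of orders.
    ad_spend = [10, 20, 30, 40, 50]
    orders = [12, 24, 33, 46, 55]
    print(f"correlation: {pearson(ad_spend, orders):.3f}")
```

A strong correlation found this way is a starting point for further analysis, not proof that one variable drives the other.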

Ways to Use Data Analytics

Now that you have looked at what data analytics is, let’s understand how we can use data analytics. 

Figure 3: Ways to use Data Analytics (Source: https://www.simplilearn.com/)

1. Improved Decision Making

Data analytics eliminates guesswork and manual tasks, be it in choosing the right content,
planning marketing campaigns, or developing products. Organizations can use the insights they gain
from data analytics to make informed decisions, leading to better outcomes and customer
satisfaction.

2. Better Customer Service

Data analytics allows you to tailor customer service to your customers’ needs. It also enables
personalization and builds stronger relationships with customers. Analyzed data can reveal
information about customers’ interests, concerns, and more. It helps you give better
recommendations for products and services.

3. Efficient Operations

With the help of data analytics, you can streamline your processes, save money, and boost
production. With an improved understanding of what your audience wants, you spend less time
creating ads and content that are not in line with your audience’s interests.

4. Effective Marketing

Data analytics gives you valuable insights into how your campaigns are performing. This helps in
fine-tuning them for optimal outcomes. Additionally, you can find potential customers who are
most likely to interact with a campaign and convert into leads.

Steps Involved in Data Analytics


The next step in understanding data analytics is to learn how data is analyzed in
organizations. There are a few steps involved in the data analytics lifecycle. Let’s have a look at
them with the help of an example.

Imagine you are running an e-commerce business and your company has a customer base of nearly a
million. Your aim is to identify certain problems related to your business, and subsequently
come up with data-driven solutions to grow your business.

Below are the steps that you can take to solve your problems.

Figure 4: Data Analytics process steps (Source: https://www.simplilearn.com/)

1. Understand the problem: Understanding the business problems, defining the organizational goals,
and planning a solution is the first step in the analytics process. E-commerce companies often
encounter issues such as predicting the return of items, giving relevant product recommendations,
handling order cancellations, identifying fraud, optimizing vehicle routing, etc.

2. Data Collection: Next, you need to collect transactional business data and customer-related
information from the past few years to address the problems your business is facing. The data can
include the total units sold for a product, the sales and profit made, and when each order was
placed. Past data plays a crucial role in shaping the future of a business.

3. Data Cleaning: The data you collect will often be disorderly and messy, and will contain missing
values. Such data is not suitable for performing data analysis. Hence, you need to clean the data by
removing unwanted and redundant values and handling missing ones to make it ready for analysis.

4. Data Exploration and Analysis: After you gather the right data, the next vital step is to execute
exploratory data analysis. You can use data visualization and business intelligence tools, data mining
techniques, and predictive modeling to analyze, visualize, and predict future outcomes from this data.
Applying these methods can tell you how a given feature relates to, and affects, other variables.

Below are the results you can get from the analysis:

• You can predict when a customer is likely to purchase the next product.

• You can understand how long it took to deliver the product.

• You get a better insight into the kind of items a customer looks for, product returns, etc.

• You will be able to predict the sales and profit for the next quarter.

• You can minimize order cancellation by dispatching only relevant products.

• You’ll be able to figure out the shortest route to deliver the product, etc.

5. Interpret the results: The final step is to interpret the results and validate whether the outcomes
meet your expectations. You can find hidden patterns and future trends. This will help you gain insights
that support appropriate data-driven decision making.
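
The cleaning, exploration and interpretation steps above can be sketched on a toy data set. This is a minimal illustration under stated assumptions: the order records, field names and functions are hypothetical, "cleaning" here simply drops records with missing values and repeated order IDs, and the "analysis" is a single revenue total per product.

```python
from collections import defaultdict

# Hypothetical raw order records (step 2); None marks a missing value.
RAW_ORDERS = [
    {"order_id": 1, "product": "lamp", "units": 2, "unit_price": 15.0},
    {"order_id": 2, "product": "desk", "units": 1, "unit_price": 120.0},
    {"order_id": 2, "product": "desk", "units": 1, "unit_price": 120.0},   # duplicate
    {"order_id": 3, "product": "lamp", "units": None, "unit_price": 15.0}, # missing units
    {"order_id": 4, "product": "chair", "units": 4, "unit_price": 45.0},
]


def clean(records):
    """Step 3: drop records with missing values and repeated order IDs."""
    seen_ids = set()
    cleaned = []
    for record in records:
        if any(value is None for value in record.values()):
            continue
        if record["order_id"] in seen_ids:
            continue
        seen_ids.add(record["order_id"])
        cleaned.append(record)
    return cleaned


def revenue_by_product(records):
    """Step 4: aggregate revenue (units x unit price) per product."""
    totals = defaultdict(float)
    for record in records:
        totals[record["product"]] += record["units"] * record["unit_price"]
    return dict(totals)


def top_product(totals):
    """Step 5: a simple interpretation, the product with the highest revenue."""
    return max(totals, key=totals.get)


if __name__ == "__main__":
    totals = revenue_by_product(clean(RAW_ORDERS))
    print(totals, "top seller:", top_product(totals))
```

In practice each step is far richer (for example, missing values may be imputed rather than dropped), but the flow from raw records to a decision-ready summary is the same.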

Data Analytics for Beginners - Tools used

Figure 5: Data Analytics for Beginners - Tools Used (Source: https://www.simplilearn.com/)


1. Python: Python is an object-oriented, open-source programming language. It supports a range of
libraries for data manipulation, data visualization, and data modeling.
2. R: R is an open-source programming language mainly used for numerical and statistical
analysis. It provides a range of libraries for data analysis and visualization.
3. Tableau: Tableau is a data visualization and analytics tool. It helps you create a variety of
visualizations to present data interactively, and build reports and dashboards to showcase
insights and trends.
4. Power BI: Power BI is a business intelligence tool with easy ‘drag and drop’ functionality.
It supports multiple data sources and has features that make data visually appealing. Power BI
also lets you ask questions of your data and get immediate insights.
5. QlikView: QlikView offers interactive analytics with in-memory storage technology to analyze
vast volumes of data and use data discoveries to support decision making. It provides social data
discovery and interactive guided analytics. It can manipulate very large data sets quickly and
accurately.
6. Apache Spark: Apache Spark is an open-source data analytics engine that processes data in
real time and carries out sophisticated analytics using SQL queries and machine learning algorithms.
7. SAS: SAS is statistical analysis software that can help you perform analytics, visualize data,
write SQL queries, perform statistical analysis, and build machine learning models to make
future predictions.

Data Analytics Applications 

Figure 6: Various applications of data analytics (Source: https://www.simplilearn.com/)

Data analytics is used in almost every sector of business. Some examples are below:

a) Retail: Data analytics helps retailers understand their customers’ needs and buying habits to
predict trends, recommend new products, and boost their business. Retailers also optimize the
supply chain and retail operations at every step of the customer journey.

b) Healthcare: The main challenge for hospitals, as cost pressures tighten, is to treat as many
patients as they can efficiently while improving the quality of care. Instrument and machine
data are increasingly being used to track and optimize patient flow, treatment, and equipment
use in hospitals. It is estimated that a 1% efficiency gain could yield more than $63 billion in
global healthcare savings. Healthcare industries analyze patient data to provide lifesaving
diagnoses and treatment options. Data analytics helps in discovering new drug development
methods as well.

c) Manufacturing: Using data analytics, manufacturing sectors can discover new cost-saving
opportunities. They can solve complex supply chain issues, labor constraints, and equipment
breakdowns.

d) Banking sector: Banking and financial institutions use analytics to identify probable loan
defaulters and the customer churn rate. It also helps in detecting fraudulent transactions
immediately.

e) Logistics: Logistics companies use data analytics to develop new business models and optimize
routes. This, in turn, ensures that deliveries arrive on time in a cost-efficient manner.

f) Travel: Data analytics can optimize the buying experience through analysis of mobile, weblog and
social media data. Travel sites can gain insights into the customer’s desires and preferences.
Products can be upsold by correlating current sales with subsequent browsing, and browse-to-buy
conversions can be increased via customized packages and offers. Personalized travel
recommendations can also be delivered by data analytics based on social media data.

g) Gaming: Data analytics helps in collecting data to optimize spend within as well as across
games. Game companies gain insight into the likes, dislikes, and relationships of their users.

h) Energy Management: Most firms are using data analytics for energy management, including
smart-grid management, energy optimization, energy distribution, and building automation in
utility companies. The application here centers on controlling and monitoring network
devices, dispatching crews, and managing service outages. Utilities gain the ability to integrate
millions of data points on network performance, which lets engineers use analytics to
monitor the network.
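
As one illustration of the examples above, the idea of flagging unusual transactions (item d) can be sketched with a simple statistical rule. This is not how any particular bank or fraud system works; it is a swapped-in textbook technique (a z-score threshold), and the function name, threshold and amounts are assumptions made for the example.

```python
from statistics import mean, pstdev


def flag_outliers(amounts, threshold=2.0):
    """Return the amounts lying more than `threshold` standard
    deviations away from the mean of the batch."""
    mu = mean(amounts)
    sigma = pstdev(amounts)
    if sigma == 0:
        return []  # all amounts identical, nothing stands out
    return [a for a in amounts if abs(a - mu) / sigma > threshold]


if __name__ == "__main__":
    # Hypothetical card transactions for one customer.
    transactions = [20, 25, 22, 19, 24, 21, 23, 500]
    print("flagged for review:", flag_outliers(transactions))
```

Real fraud detection relies on many more signals (location, merchant, timing) and on trained models; the sketch only shows the basic shape of separating typical values from unusual ones.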

Those were a few of the applications involving data analytics. To make things simpler, this blog will
also focus on a case study from Walmart. Here you can observe how data analytics is applied to grow a
business and serve its customers better.

Applications of Data Science

• Internet Search

Search engines make use of data science algorithms to deliver the best results for search
queries in a fraction of a second.

• Digital Advertisements

The entire digital marketing spectrum uses data science algorithms, from display
banners to digital billboards. This is the main reason digital ads get a higher click-through
rate (CTR) than traditional advertisements.

• Recommender Systems

Recommender systems not only make it easy to find relevant products among the billions
available but also add a lot to the user experience. Many companies use such systems to
promote their products and suggestions in accordance with the user’s demands and the relevance of
information. The recommendations are based on the user’s previous search results.
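
The idea of recommending items from previous searches can be sketched as follows. This is a deliberately simple illustration, assuming a "users with overlapping searches" heuristic rather than any production algorithm; the search histories and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical search histories, one list of searched items per user.
HISTORIES = {
    "ana": ["laptop", "mouse", "keyboard"],
    "ben": ["laptop", "mouse", "monitor"],
    "cho": ["laptop", "monitor"],
    "dev": ["novel", "bookmark"],
}


def recommend(user, histories, limit=2):
    """Suggest items the user has not searched for, scored by how much
    the users who did search for them overlap with this user."""
    own = set(histories[user])
    scores = Counter()
    for other, items in histories.items():
        if other == user:
            continue
        other_items = set(items)
        overlap = len(own & other_items)
        if overlap == 0:
            continue  # nothing in common, ignore this user
        for item in other_items - own:
            scores[item] += overlap
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:limit]]


if __name__ == "__main__":
    print(recommend("cho", HISTORIES))
```

For the user "cho", the sketch suggests "mouse" first and "keyboard" second, because the users who share her searches also looked for those items.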

Applications of Big Data

1. Big Data for Financial Services

Credit card companies, retail banks, private wealth management advisories, insurance firms,
venture funds, and institutional investment banks use big data for their financial services. The
common problem among them all is the massive amount of multi-structured data living in multiple
disparate systems, which can be solved by big data. Big data is thus used in several ways, such as:

• Customer analytics

• Fraud analytics

• Compliance analytics

• Operational analytics

2. Big Data in Communications

Gaining new subscribers, retaining customers, and expanding within current subscriber
bases are top priorities for telecommunication service providers. The solutions to these challenges
lie in the ability to combine and analyze the masses of customer-generated data and machine-
generated data that is being created every day.

3. Big Data for Retail

Whether a brick-and-mortar store or an online e-tailer, the answer to staying in the game and being
competitive is understanding the customer better in order to serve them. This requires the ability to
analyze all the disparate data sources that companies deal with every day, including weblogs, customer
transaction data, social media, store-branded credit card data, and loyalty program data.

Figure 7: What Are the Skills Required (Source: https://www.simplilearn.com/)

Skills Required to Become a Data Scientist

• Education: 88% of data scientists have a Master’s degree, and 46% have a PhD.

• In-depth knowledge of SAS or R: For Data Science, R is generally preferred.

• Python coding: Python is the most common coding language used in data science, along
with Java, Perl, and C/C++.

• Hadoop platform: Although not always a requirement, knowing the Hadoop platform is still
preferred for the field. Having a bit of experience in Hive or Pig is also a huge selling point.

• SQL database/coding: Though NoSQL and Hadoop have become a significant part of the Data
Science background, it is still preferred if you can write and execute complex queries in SQL.

• Working with unstructured data: It is essential that a Data Scientist can work with unstructured
data, be it from social media, video feeds, or audio.

Skills Required to Become a Big Data Specialist

• Analytical skills: The ability to make sense of the piles of data that you get. With
analytical skills, you will be able to determine which data is relevant to your solution, much like
problem-solving.

• Creativity: You need to have the ability to create new methods to gather, interpret, and analyze
a data strategy. This is an extremely valuable skill to possess.

• Mathematics and statistical skills: Good, old-fashioned “number crunching.” This is extremely
necessary, be it in data science, data analytics, or big data.

• Computer science: Computers are the workhorses behind every data strategy. Programmers will
have a constant need to come up with algorithms to process data into insights.

• Business skills: Big Data professionals will need to have an understanding of the business
objectives that are in place, as well as the underlying processes that drive the growth and
profit of the business.

Skills Required to Become a Data Analyst

• Programming skills: Knowing programming languages such as R and Python is extremely important
for any data analyst.

• Statistical skills and mathematics: Descriptive and inferential statistics and experimental design
are a must for data analysts.

• Machine learning skills

• Data wrangling skills: The ability to map raw data and convert it into another format that allows
for more convenient consumption of the data.

• Communication and Data Visualization skills

• Data Intuition: It is extremely important for a professional to be able to think like a data analyst.
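
Data wrangling, as described in the list above, means mapping raw data into a format that is easier to consume. The sketch below is a minimal illustration: the pipe-delimited input format, the field names and both helper functions are assumptions made for the example, and only Python's standard `csv` and `io` modules are used.

```python
import csv
import io

# Hypothetical raw export: one pipe-delimited line per order.
RAW_LINES = [
    "2020-01-05|lamp|2|15.00",
    "2020-01-06|desk|1|120.00",
]
FIELDS = ["date", "product", "units", "unit_price"]


def parse_line(line):
    """Map one raw line to a typed record (a dict)."""
    date, product, units, unit_price = line.strip().split("|")
    return {
        "date": date,
        "product": product,
        "units": int(units),
        "unit_price": float(unit_price),
    }


def to_csv(rows):
    """Convert typed records into CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    records = [parse_line(line) for line in RAW_LINES]
    print(to_csv(records))
```

Once the records are typed and in a standard format such as CSV, they can be loaded directly into a spreadsheet, a database, or an analysis tool.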

Salary Trends

Though they work in the same domain, data scientists, big data specialists, and data analysts earn
different salaries.

Data Scientist Salary

According to Glassdoor, the average salary of a Data Scientist is $108,224 per year.

Big Data Specialist Salary

According to Glassdoor, the average salary of a Big Data Specialist is $106,784 per year.

Data Analyst Salary

According to Glassdoor, the average salary for a Data Analyst is $61,473 per year.

Salaries increase with the knowledge and expertise you bring to the table.

Now that you know the differences, which one do you think is most suited for you – Data Science? Big
Data? Or Data Analytics?

Activity 1: My Big Data Experience

Required Article Reading for Video Reflection:

How Facebook is Using Big Data - The Good, the Bad, and the Ugly

https://www.simplilearn.com/how-facebook-is-using-big-data-article?source=frs_left_nav_clicked

References:

SimpliLearn. What is Data Analytics: Everything You Need To Know. Retrieved from
https://www.simplilearn.com/tutorials/data-analytics-tutorial/what-is-data-analytics?source=sl_frs_eng_user_clicks_on_watch_tutorial

Monnappa, Avantika (2020). Data Science vs. Big Data vs. Data Analytics. Retrieved from
https://www.simplilearn.com/data-science-vs-big-data-vs-data-analytics-article

Verma, Eshna (2020). How Big is Big Data. Retrieved from
https://www.simplilearn.com/how-big-is-big-data-rar335-article?source=frs_category

What is Business Analytics. Retrieved from
https://www.microstrategy.com/us/resources/introductory-guides/business-analytics-everything-you-need-to-know
