Task 2a
This article will highlight the primary attributes of both data analytics and data science,
explain what data analysts and data scientists do and the skills they should have, clarify
when to use each process, and explain how to choose which role to hire for your team.
Sensitivity: Internal document for Union Bank
The goal of data science is to apply scientific methods and predictions to business
goals and discover new and unique questions to drive the business forward. Some
useful predictions that data science can help with include working out how many
supplies should be purchased based on expected sales volume, or answering a
question like “if we raise prices by X%, what is the predicted impact on sales and
revenue?”
There are four main types of data analytics: descriptive, diagnostic, predictive, and
prescriptive analytics. Descriptive and diagnostic analytics are done by data analysts, but
predictive and prescriptive analytics fall under the realm of data science. This is the
main difference between the two fields: data analytics looks backward and focuses on
past data, aiming to identify trends (by describing the past and diagnosing why certain
events happened). Data science looks forward and focuses on the future (by predicting
it or prescribing what should happen).
Data science involves coming up with and answering key questions that are game-
changers for driving businesses forward. Data analytics focuses on asking specific
questions that are on more of a micro-scale or are specific to a particular team. Despite
the smaller scale of the questions, data analytics answers very useful questions that
tend to be asked on a regular basis, which is why a key part of data analytics is
operationalizing (procedurally automating) analytics reports.
Both data analytics and data science make use of statistics; however, the types of
statistics used in data analytics tend to be more rudimentary than those used by data
science. Data analytics tends to use aggregation methods such as averages,
percentiles, sums, and counts in spreadsheets, analytics tools (such
as Mixpanel, Amplitude, or PostHog), or relational databases and data warehouses. Data
scientists, on the other hand, use more advanced statistical methods such as
regression or cluster analysis. Data scientists also commonly use machine learning
models, whereas data analysts are much less likely to do so.
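The contrast between aggregation and regression can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ad_spend` and `sales` figures are invented, and `fit_line` is a hypothetical helper that implements ordinary least squares for a single predictor.

```python
from statistics import mean, median

# Hypothetical figures, invented for illustration (not from the source).
ad_spend = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [25.0, 45.0, 65.0, 85.0, 105.0]

# Data-analytics style: descriptive aggregates over past data.
summary = {
    "count": len(sales),
    "sum": sum(sales),
    "average": mean(sales),
    "median": median(sales),  # the 50th percentile
}

# Data-science style: fit a simple linear regression and predict forward.
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar

slope, intercept = fit_line(ad_spend, sales)
predicted_sales = slope * 60.0 + intercept  # forecast for an unseen spend level
print(summary, predicted_sales)
```

The aggregates describe what already happened, while the fitted line is used to estimate a value that has not been observed yet.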
Data analysts will always be provided with a question that needs answering and will
usually have access to structured data to help them with their analysis. Structured data
is data that is highly organized in its structure: for example, data that is stored in a
spreadsheet or relational database. Data scientists, by contrast, often have to wade
through large amounts of unstructured data (for example, image data, social media
posts, or large amounts of free text) and use data mining techniques to find useful
insights from it. They may also have to come up with their own questions, and they
must be able to justify why answering these questions adds value to the business.
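A small sketch may help show why unstructured data takes more work. The `orders` rows and `posts` text below are made-up examples: the structured rows can be filtered by field name straight away, whereas the free text has to be broken into words before anything can be counted.

```python
import re
from collections import Counter

# Structured data: every record has the same named fields (hypothetical rows).
orders = [
    {"region": "north", "amount": 120},
    {"region": "south", "amount": 80},
    {"region": "north", "amount": 50},
]
north_total = sum(row["amount"] for row in orders if row["region"] == "north")

# Unstructured data: free text has no fields, so structure is extracted first.
posts = [
    "Love the new app, checkout is fast",
    "Checkout keeps failing on my phone",
]
words = Counter(
    word for post in posts for word in re.findall(r"[a-z]+", post.lower())
)
top_word, top_count = words.most_common(1)[0]
print(north_total, top_word, top_count)
```

Real data mining goes far beyond word counts, but the extra extraction step is the point of the comparison.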
One of the areas of confusion in comparing data analytics and data science is that
predictive and prescriptive analytics are sometimes viewed as part of data analytics
(because they are two of the four main types of data analytics), but they are also viewed
as part of data science because they tend to be done by data scientists. This Venn
diagram shows which activities are considered part of data science or data analytics
(some things are done in both fields), as well as using a color-coded key for which of
these tasks are done by data scientists, data analysts, or both.
REF: Data Analytics vs. Data Science (rudderstack.com)
Overview: Data science vs data analytics
Think of data science as the overarching umbrella that covers a wide range of tasks performed to find
patterns in large datasets, structure data for use, train machine learning models and develop artificial
intelligence (AI) applications. Data analytics is a task that resides under the data science umbrella and is
done to query, interpret and visualize datasets. Data scientists will often perform data analysis tasks to
understand a dataset or evaluate outcomes.
Business users will also perform data analytics within business intelligence (BI) platforms for insight into
current market conditions or probable decision-making outcomes. Many functions of data analytics—such
as making predictions—are built on machine learning algorithms and models that are developed by data
scientists. In other words, while the two concepts are not the same, they are heavily intertwined.
REF: Data science vs data analytics: Unpacking the differences - IBM Blog
Data analysts examine large data sets to identify trends, develop charts, and create visual presentations to
help businesses make more strategic decisions.
Data scientists, on the other hand, design and construct new processes for data modeling and production
using prototypes, algorithms, predictive models, and custom analysis.
Data analysts have a range of fields and titles, including (but not limited to) database analyst, business
analyst, market research analyst, sales analyst, financial analyst, marketing analyst, advertising analyst,
customer success analyst, operations analyst, pricing analyst, and international strategy analyst. The best
data analysts have both technical expertise and the ability to communicate quantitative findings to non-
technical colleagues or clients.
Working professionals who are considering a career change could benefit from experience in
mathematical or statistical fields. Pursuing an advanced degree in a data-related field can also improve
their job opportunities and ease the transition into a data analysis position.
Data science
Data scientists use programming, math, and statistics to gain insights and drive organizational
strategy. Data scientists are highly adept at machine learning, data modeling, and the use of
algorithms to automate processes. Since meaningful data is field-specific, data scientists also
must have domain expertise, the understanding of their industry or company, to provide context for
the data they work with. For example, data science research in healthcare can drive diagnoses,
help prevent disease, or teach computers to read X-rays or MRIs.
Data scientists work closely with sales and marketing, product development, information
technology, finance, and business leaders to help identify trends, spot issues, understand
consumer behavior, and present solutions that support strategic decision-making.
Data analytics
Data analytics professionals are responsible for data collection, organization, and maintenance,
as well as for using statistics, programming, and other techniques to gain insights from data. The
role of a data analyst is to spot trends and help solve problems. Examples of data analytics in
retail include order tracking, recommendation features, and identification of store locations.
Data analysts tend to respond to requests from decision-makers rather than drive the decision-
making process.
Professionals in both data science and data analytics manipulate huge data sets with millions of
data points. These massive databases may have low-quality data that must be wrangled
(cleaned), maintained, and organized so that any analysis is accurate.
Technical skills
Both fields require programming and tooling skills (such as R, Python, SQL, and Tableau), as well as
statistics, Excel, and data visualization and modeling proficiency. Professionals in both fields must
be highly analytical and have a methodical approach to problem-solving and project management.
Communication skills
Data scientists and data analysts work with colleagues across departments, many of whom may
not have a tech background. Professionals in both fields are responsible for presenting their
findings in a clear and effective manner.
Differences between data science and data analytics
The major difference between data science and data analytics is scope. A data scientist’s role is
far broader than that of a data analyst, even though the two work with the same data sets. For that
reason, a data scientist often starts their career as a data analyst.
Responsibilities
Data scientists model data to make predictions, identify opportunities, and support strategy. They
use data to understand the future. The role of the data analyst is to solve problems and spot
trends. They work with the data as a snapshot of what exists now.
Data scientists use algorithms and machine learning to improve the ways that data supports
business goals. Data analysts collect, store, and maintain data and analyze results.
Ref: Data Science vs. Data Analytics: What’s the Difference? | Maryville Online
Conducting data analysis involves a variety of tools, skills and computing languages to perform statistical
analyses and answer questions to solve organizational challenges. A data analyst may use a query language
like SQL, programming languages like R and SAS, and visualization tools like Power BI and Tableau in the
course of their work. This often involves figuring out how to deal with missing data.
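Two common ways of dealing with missing data are dropping the incomplete values or filling them in. The sketch below assumes missing values are marked with `None` and uses invented `readings`; mean imputation is just one simple choice and is not prescribed by the source.

```python
from statistics import mean

# Hypothetical measurements; None marks a missing value.
readings = [12.0, None, 15.0, None, 18.0]

# Option 1: drop the missing values and analyze only what was observed.
observed = [r for r in readings if r is not None]

# Option 2: impute, here by filling each gap with the mean of the observed values.
fill_value = mean(observed)
imputed = [fill_value if r is None else r for r in readings]
print(observed, imputed)
```

Dropping keeps only real measurements but shrinks the dataset; imputing keeps the size but adds values that were never measured.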
Strong communication skills are also useful in data analysis. Data analysts are often required to convey their
findings to outside teams or stakeholders, explaining their reasoning and research to justify their
conclusions.
Data analysts and data scientists serve important yet distinct roles in an organization. Here are a few ways
they can contribute to the same data set or project:
A data analyst makes sense out of existing data through routine analysis and writing reports. A data
scientist works on new ways to capture, store, manipulate and analyze that data.
A data analyst works toward answering business-related questions. A data scientist works to develop new
ways to ask and answer those questions.
A data analyst relies on database software, business intelligence programs and statistical software. A data
scientist uses Python, Java and machine learning to manipulate and analyze data.
TASK 2B
Categories of Data
1. Qualitative Data
Qualitative data represents non-numerical information. This data type describes the
qualities and characteristics of the given information, such as colour, gender, symbols, text, taste, etc. It
cannot be presented in numerical form. These data are obtained from interviews, meetings, surveys, etc.
They are also known as categorical data. There are two main types of qualitative data: nominal data and
ordinal data. Let us look at each in detail.
1. Nominal Data
Nominal data is a type of qualitative data that sorts values into labelled categories, such as colour or
gender. The categories do not have any specific order or numerical significance.
2. Ordinal Data
Ordinal data is also a type of qualitative, non-numerical data. It is similar to nominal data, with one
major difference: ordinal data is arranged in a meaningful order, whereas nominal data does not follow
any specific order.
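The practical effect of the nominal/ordinal distinction can be sketched as follows. The `colours` and `sizes` values and the `SIZE_RANK` scale are assumptions chosen for illustration: nominal labels can only be counted, while ordinal labels can also be sorted and compared.

```python
from collections import Counter

# Nominal: labels with no inherent order, so counting is the meaningful operation.
colours = ["red", "blue", "red", "green"]
colour_counts = Counter(colours)

# Ordinal: labels with a meaningful order (the ranking here is an assumed scale).
SIZE_RANK = {"small": 0, "medium": 1, "large": 2}
sizes = ["large", "small", "medium", "small"]
sorted_sizes = sorted(sizes, key=SIZE_RANK.__getitem__)
largest = max(sizes, key=SIZE_RANK.__getitem__)
print(colour_counts, sorted_sizes, largest)
```

Sorting the colours alphabetically would be possible but meaningless, which is what makes them nominal.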
2. Quantitative Data
Quantitative data is a type of data that represents numerical information that we can count and measure.
It is also known as numerical data. It generally answers questions such as “how many” or “how much”. This
data can be represented in graphical and chart forms such as bar graphs, histograms, pie charts, etc. Let us
understand quantitative data with some examples.
Marks in a test
Temperature
Weight
Sales figure
These are some common examples of numerical data. It will always represent information in numerical
form. There are two major types of quantitative data: Discrete and continuous. Let us know about them in
detail.
1. Discrete Data
Discrete data represents distinct, separate numerical values. The values are whole numbers or integers
that cannot be divided into smaller parts, so discrete data can be counted. It can be easily represented
by various graphs and charts, such as bar graphs, number lines, etc.
2. Continuous Data
Continuous data is a data type that can take any of an infinite number of values within a specific range.
It can be divided into smaller fractional or decimal values, unlike discrete data, which uses only whole
numbers or integers.
The main difference between continuous data and discrete data is that discrete data cannot be presented in
decimal or fractional form, while continuous data can be presented in fractional form. Let us understand it
with some common examples.
Height of a person
Temperature in Celsius or Fahrenheit
Weight in pounds or kilograms
Distance in meters or kilometers
Share price on the stock market
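One way to see the difference in practice: discrete values can be tallied exactly, while continuous measurements are usually grouped into intervals first. The figures below are invented, and `bin_label` is a hypothetical helper with an assumed interval width of 10.

```python
from collections import Counter

# Discrete: whole-number counts (hypothetical children per household).
children = [0, 2, 1, 2, 3, 2]
frequency = Counter(children)  # exact values can be tallied directly

# Continuous: measurements that can take any value in a range (hypothetical, in cm).
heights_cm = [158.4, 162.0, 171.25, 169.9, 180.3]

def bin_label(value, width=10):
    """Return the half-open interval [low, low + width) containing value."""
    low = int(value // width) * width
    return f"{low}-{low + width}"

height_bins = Counter(bin_label(h) for h in heights_cm)
print(frequency, height_bins)
```

Tallying the raw heights would give one entry per measurement, so grouping them is what makes a histogram useful.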
REF: 4 Types Of Data- Nominal, Ordinal, Discrete And Continuous (pwskills.com)
TASK 2C
DATA VIRTUALIZATION
What is Data Virtualization?
Data Virtualization is a data integration approach that allows an application to retrieve and
manipulate data without requiring technical details, such as how it is formatted or where it is
physically located. It provides a single, unified, and consistent business view of data across
various, disparate data sources, making it easier for business users to access data.
Real-time access to data: It provides business users with real-time access to data
regardless of its location.
Data abstraction: It hides the complexities of data, such as its source, format, location,
and storage technology, from end-users.
Data federation: It aggregates data from multiple sources and delivers a unified,
consolidated view of it.
Cache: To improve performance, it saves recent or frequent data requests in the cache.
Data transformation: It transforms data into business-friendly formats.
Architecture
The architecture of data virtualization comprises three primary components: the data
consumers (applications, BI tools, etc.), the data virtualization layer (which abstracts and provides
a unified view of the data), and the data providers (databases, web services, flat files, etc.).
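The three components can be sketched as a toy model in Python. Everything here is hypothetical: `VirtualizationLayer`, `crm_rows`, and `billing_rows` are illustrative names and do not represent any product's API. The consumer calls one method, and the layer federates rows from each provider and caches the unified result.

```python
class VirtualizationLayer:
    """Toy virtualization layer: one query interface over several providers."""

    def __init__(self, providers):
        self.providers = providers  # name -> callable returning rows (dicts)
        self.cache = {}             # customer_id -> unified record
        self.source_calls = 0       # counts trips to the underlying sources

    def customer_view(self, customer_id):
        if customer_id in self.cache:  # repeated requests come from the cache
            return self.cache[customer_id]
        record = {"id": customer_id}
        for fetch in self.providers.values():
            self.source_calls += 1
            for row in fetch():
                if row["id"] == customer_id:
                    record.update(row)  # federate fields from each source
        self.cache[customer_id] = record
        return record

# Hypothetical data providers: a CRM "database" and a billing "web service".
def crm_rows():
    return [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}]

def billing_rows():
    return [{"id": 1, "balance": 42.5}]

# The data consumer only talks to the layer, never to the providers directly.
layer = VirtualizationLayer({"crm": crm_rows, "billing": billing_rows})
first = layer.customer_view(1)
second = layer.customer_view(1)  # answered from the cache
print(first, layer.source_calls)
```

The consumer never learns which provider supplied which field, which is the abstraction the architecture is meant to provide.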
Latency and performance issues can occur if data is being accessed from multiple,
geographically-dispersed sources.
Security control implementation can be complex due to diverse data sources.
As it depends on source systems for data, any changes in those systems can impact
the virtualization layer.
Integration with Data Lakehouse
Implementing Data Virtualization in a data lakehouse environment can simplify data management
and enhance accessibility. A lakehouse merges the features of data lakes and data warehouses.
Thus, data virtualization becomes a key capability in a lakehouse architecture to provide a unified
view of data, regardless of its format or location.
Security Aspects
Data Virtualization employs data security measures like data masking, encryption, and role-
based access control to ensure data privacy and compliance with regulations.
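As a rough illustration of how data masking and role-based access control can work together, the sketch below shows raw account numbers only to permitted roles and masks them for everyone else. The role names, the `ROLE_CAN_SEE_RAW` policy table, and both helper functions are assumptions made for this example.

```python
def mask_account(number, visible=4):
    """Replace all but the last `visible` characters with '*'."""
    tail = number[-visible:] if visible > 0 else ""
    return "*" * (len(number) - len(tail)) + tail

# Assumed role policy: which roles may see unmasked values.
ROLE_CAN_SEE_RAW = {"auditor": True, "analyst": False}

def read_account(record, role):
    """Return the raw account number only for permitted roles; mask otherwise."""
    if ROLE_CAN_SEE_RAW.get(role, False):  # unknown roles default to masked
        return record["account"]
    return mask_account(record["account"])

record = {"account": "1234567890"}
print(read_account(record, "auditor"), read_account(record, "analyst"))
```

Defaulting unknown roles to the masked view is a deliberate choice here, so that a missing policy entry fails safe.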
Performance
While Data Virtualization facilitates real-time access to data, its performance can be influenced by
factors such as network latency, the performance of source systems, and hardware limitations.
What is Data Virtualization?
Data virtualization decouples the database layer that sits between the storage and application layers
in the application stack. Just like a hypervisor sits between the server and the OS to create a virtual
server, database virtualization software sits between the database and the OS to abstract/virtualize
the data store resources.
Because database resources are virtualized, they require a much smaller storage footprint than the
source database. Instead of making and moving new blocks of data, virtual data (virtual data copies)
use pointers to data blocks, providing high-performance access to data already in place.
Data virtualization provides the ability to securely manage and distribute policy-governed virtual
copies of production-quality datasets. No matter the underlying database management system
(DBMS) or source database location, data virtualization technology creates block-mapped virtual
copies of the database for rapid and controlled distribution all while leaving a minimal storage
footprint no matter how many copies are used.
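The pointer-based idea can be illustrated with a toy copy-on-write model. This is a simplified sketch under stated assumptions, not how any particular product implements block mapping: `VirtualCopy` is a hypothetical class, and each "block" is just a string.

```python
class VirtualCopy:
    """Toy virtual copy: shares source blocks, stores only blocks it changes."""

    def __init__(self, source_blocks):
        self._source = source_blocks  # shared reference; never modified here
        self._changed = {}            # block index -> privately written block

    def read(self, index):
        # Prefer the private version of a block; otherwise follow the pointer.
        return self._changed.get(index, self._source[index])

    def write(self, index, data):
        self._changed[index] = data   # copy-on-write: the source stays untouched

    def private_block_count(self):
        return len(self._changed)

source = ["block-a", "block-b", "block-c"]
dev_copy = VirtualCopy(source)
test_copy = VirtualCopy(source)
dev_copy.write(1, "block-b-modified")
print(dev_copy.read(1), test_copy.read(1), dev_copy.private_block_count())
```

Each copy stores only the blocks it has changed, so adding more copies adds very little storage while writes stay isolated from the source and from other copies.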
Why Virtualize Data?
The speed of innovation and ability to adapt to rapidly changing market trends rests on the agility of
your release cycle and the ability to quickly diagnose, triage, and fix errors. Data virtualization is the
critical lever used by forward-thinking enterprises to provision production-quality data to dev and
test environments on demand or via APIs.
Virtual data copies are fully readable/writeable, and can be provisioned or torn down in just
minutes, eliminating development’s reliance on slow serial ticketing systems and DBA involvement
for initial data delivery as well as data refreshes after destructive testing.
Data virtualization technology facilitates data delivery across all phases of application development,
including testing, release, and production fix. Traditionally, IT organizations rely on a request-fulfill
model, in which developers and testers often find their requests queuing behind others. Because it
takes significant time and effort to create a copy of test data, it can take days, or even weeks to
provision or refresh data for a test environment. This creates massive wait states in the software
delivery life cycle, slowing the pace of application delivery.
To keep pace with a faster release cadence, dev and test teams are forced to work with a stale copy
of data because refreshing test data takes too long. This can result in missed test cases and
ultimately data-related defects escaping into production.
Data Virtualization Capabilities
By virtualizing data, software teams have:
Enterprise Grade Distribution: Provision lightweight virtual database copies in minutes
(depending on the types and size of files) via UI or API that scale with your agile
development goals.
Built for Scale: Replicate data from production to non-production environments at scale,
either on-premises or in the cloud for multiple instances. Teams can provision virtual
databases as necessary without taxing storage.
Data Governance: Put your InfoSec department at ease with data controls that govern who
can do what, where, and when over specific datasets. When combining best-in-class security,
consistent data-masking policies, and robust auditing, data virtualization becomes a security
asset.
Cost Savings: Maximize testing throughput while minimizing storage use. Provisioning, destroying,
refreshing, and rewinding virtual datasets give application testers new tools to
maximize testing throughput with virtually no additional storage cost.
REF : What is Data Virtualization? | Delphix
Data virtualization is a special kind of data integration technology that provides access to data in real time,
seamlessly, all in one place. Think of it like a television guide that lists shows on a variety
of channels, without you having to be on a given channel to see what is on. In data virtualization, customers can
access and manipulate each datum, regardless of physical location or formatting; it is one-stop
shopping. “Data virtualization solutions, also, create integrated views of the data, across the multiple
sources, without moving the data to a new location.” Data virtualization can typically access a wide variety
of enterprise data architectures, including those on premises and in the cloud, and adapts agilely
to structural changes, without impacting the business.
A single database view/s allowing access to distributed databases and multiple heterogeneous data stores.
(DAMA DMBoK2)
“A technology that delivers information from various data sources, including big data sources such as Hadoop
and distributed data stores in real-time and near-real time.” (Boston University)
Abstraction of IT data resources “that masks the physical nature and boundaries of those resources from
resource users.” (Gartner)
A technical “approach by which data access can be easily centralized, standardized, and secured across the
enterprise, no matter the location, design or platform of the data source.” (Indiana University)
“The process of aggregating data from different sources of information to develop a single, logical and virtual
view of information so that it can be accessed by front-end solutions such as applications, dashboards and
portals without having to know the data’s exact storage location.” (Techopedia)
Creating “a world class process and strategy to automate the data forensics and resolve regulatory
requirements across the organization”
Complying with the European General Data Protection Regulation (GDPR)
Aiding in the development of blockchain and machine learning projects within an organization
Businesses Need Data Virtualization To:
Spend 40 percent less on building and managing data integration
Connect distributed data assets
Limit data silos
Drive new innovations
Streamline operations
REF: What Is Data Virtualization? - DATAVERSITY