
Chapter 1: Introduction to Big Data Analytics

1. What are the 3 characteristics of big data?


→ Volume, velocity, variety

2. What are the 4 big data structures? Give examples.


- Structured data: Data containing a defined data type, format, and structure
(e.g. tables in an RDBMS, a relational database management system)
- Semi-structured data: Textual data files with a discernible pattern that
enables parsing (e.g. XML)
- Quasi-structured data: Textual data with erratic data formats that can be
formatted with effort, tools, and time (e.g. web clickstream data)
- Unstructured data: Data that has no inherent structure (e.g. text, images, video)

3. Give examples of data repositories and why they are used.


+ Data islands / spreadmarts (e.g. Excel spreadsheets): easy to create, but
result in many versions of the data
+ Data warehouses: centralized data with security and a single source for the
data; enable BI, but are controlled by IT groups and DBAs
+ Analytic sandbox: resolves the conflict between analysts and data scientists
and the EDW (enterprise data warehouse)
4. What is an analytical sandbox?
- A workspace designed to enable teams to explore many datasets in a
controlled fashion.
5. What are the business drivers for advanced analytics?
- Optimize business operations - sales, pricing, profitability, efficiency
- Identify business risks - churn, fraud, default
- Predict new business opportunities - upsell, cross-sell, best new customer
prospects
- Comply with laws or regulatory requirements - AML (anti-money laundering),
KYC (know your customer)
6. Why is an analytical sandbox important?
- Because it enables flexible, high-performance analysis in a nonproduction
environment, and reduces the costs and risks associated with replicating data
into "shadow" file systems. It is also quicker, because the analysis can be done
inside the database instead of moving the data into another program for
analysis.
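The in-database point can be sketched in Python with the standard-library `sqlite3` module. The `sales` table and its rows are hypothetical, and SQLite stands in for whatever database a real sandbox would use. The first query lets the database aggregate and returns only summary rows; the second pulls every row out and aggregates in the client program.

```python
import sqlite3

# Hypothetical sandbox table with a few illustrative rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100.0), ("south", 250.0), ("north", 50.0)],
)

# In-database analysis: the engine does the aggregation, so only one
# summary row per region leaves the database.
in_db = dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))

# Extract-then-analyze: every row is copied out before any work is done.
pulled = conn.execute("SELECT region, amount FROM sales").fetchall()
outside = {}
for region, amount in pulled:
    outside[region] = outside.get(region, 0.0) + amount

print(in_db)
print(len(pulled), "rows moved for the same answer")
conn.close()
```

Both approaches give the same totals; the difference is how much data has to leave the database, which matters as tables grow.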
7. Explain the difference between BI and data science:
- BI: provides reports, dashboards, and queries on business questions for the
current period or the past
- uses highly structured data organized in rows and columns for accurate reporting
- DS: uses data about the present to support informed decision making about the
future
- tends to use many types of data sources, including large or unconventional datasets
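As a rough illustration of the contrast, the sketch below uses made-up monthly sales figures. The BI-style step summarizes what already happened; the DS-style step fits a simple least-squares trend line and projects the next month. A straight-line fit is only an assumed stand-in for whatever model a data science team would actually choose.

```python
# Hypothetical monthly sales for four past months (month index 0..3).
sales = [100.0, 110.0, 120.0, 130.0]

# BI-style question: what happened? Report totals and averages of past data.
total_sales = sum(sales)
average_sales = total_sales / len(sales)

# DS-style question: what is likely to happen next?
# Fit y = intercept + slope * x by ordinary least squares.
xs = list(range(len(sales)))
mean_x = sum(xs) / len(xs)
mean_y = average_sales
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, sales)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

# Project the trend one month past the observed data.
next_month = len(sales)
forecast = intercept + slope * next_month

print("BI report: total =", total_sales, "average =", average_sales)
print("DS forecast for month", next_month, "=", forecast)
```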
8.
