2. What are the four big data structures? Give examples:
- Structured data: data with a defined data type, format, and structure (e.g., tables in an RDBMS, a relational database management system)
- Semi-structured data: textual data files with a discernible pattern that enables parsing (e.g., XML)
- Quasi-structured data: textual data with erratic formats that can be formatted with effort, tools, and time (e.g., web clickstream data)
- Unstructured data: data that has no inherent structure (e.g., text, images, video)
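The four categories above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the table, XML snippet, URL, and review text are all made-up sample data, and an in-memory SQLite database stands in for an RDBMS.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET
from urllib.parse import urlparse, parse_qs

# Structured: typed columns in a relational table.
# (Hypothetical table; in-memory SQLite stands in for an RDBMS.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, balance REAL)")
conn.execute("INSERT INTO customers VALUES (1, 'Ada', 120.5)")
structured_row = conn.execute("SELECT id, name, balance FROM customers").fetchone()

# Semi-structured: XML has a discernible pattern, so a parser can walk it.
xml_text = "<order id='42'><item sku='A1' qty='2'/><item sku='B7' qty='1'/></order>"
root = ET.fromstring(xml_text)
semi_structured = [(item.get("sku"), int(item.get("qty"))) for item in root.iter("item")]

# Quasi-structured: a clickstream URL needs some effort and tooling
# before it becomes usable fields. (Made-up URL.)
click = "https://shop.example.com/search?q=red+shoes&page=2"
parsed = urlparse(click)
quasi_structured = {"path": parsed.path, "query": parse_qs(parsed.query)}

# Unstructured: free text has no inherent structure; tokenizing it is
# something we impose, not something the data provides.
review = "Loved the shoes, but shipping took forever."
unstructured_tokens = re.findall(r"[a-z']+", review.lower())

print(structured_row)
print(semi_structured)
print(quasi_structured)
print(unstructured_tokens)
```

The amount of work needed before the data becomes usable fields grows from the first case to the last, which is the practical distinction between the four categories.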
3. Give examples of data repositories and explain why they are used:
- Data islands / spreadmarts: spreadsheet-style stores such as Excel files; they result in many versions of the data
- Data warehouses: centralized data with security and a single source for the data; they enable BI, but are controlled by IT groups and DBAs
- Analytic sandbox: resolves the conflict that analysts and data scientists have with the EDW (enterprise data warehouse)
4. What is an analytical sandbox?
- A workspace designed to enable teams to explore many datasets in a controlled fashion.
5. What are the business drivers for advanced analytics?
- Optimize business operations: sales, pricing, profitability, efficiency
- Identify business risks: churn, fraud, default
- Predict new business opportunities: upsell, cross-sell, best new customer prospects
- Comply with laws or regulatory requirements: AML (anti-money laundering), KYC (know your customer)
6. Why is an analytical sandbox important?
- It enables flexible, high-performance analysis in a nonproduction environment and reduces the costs and risks associated with data replication into "shadow" file systems. It is also faster, because the analysis runs in the database instead of the data being moved into another program for analysis.
7. Explain the difference between BI and data science:
- BI: provides reports, dashboards, and queries on business questions for the current period or the past; uses highly structured data organized in rows and columns for accurate reporting
- DS: uses data about the present to support informed decision making about the future; tends to use many types of data sources, including large or unconventional datasets
8.
- Hidden patterns, unknown correlations, market trends, customer preferences, and other useful information that can help organizations make more-informed business decisions
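Returning to question 6, the point about doing the analysis in the database rather than moving the data into another program can be sketched as follows. This is a toy illustration, not a benchmark: the `sales` table and its rows are made up, and an in-memory SQLite database stands in for the sandbox database.

```python
import sqlite3

# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 50.0), ("west", 70.0)],
)

# Approach A: pull every row out of the database, then aggregate in the client.
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
totals_client = {}
for region, amount in rows:
    totals_client[region] = totals_client.get(region, 0.0) + amount

# Approach B: let the database aggregate; only the summary rows leave it.
totals_db = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
)

print(totals_client)
print(totals_db)
```

Both approaches give the same totals, but in approach B only one row per region crosses the database boundary, which is why in-database analysis tends to scale better as the tables grow.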