Database Design and Introduction To MySQL Day - 1


#LifeKoKaroLift

AI-ML

Module Name: Database Design and Introduction to MySQL

Course : DBMS
Lecture On : Database Design and Introduction to MySQL - Day - 4
Instructor :

Today’s Agenda

● Revision
● Data Warehouse
● ERD
● Star and Snowflake Schemas
● OLAP vs OLTP
● Entity Constraints
● Referential Constraints
● Key Takeaways

Data Science Certification


Revision

In our study of the fundamentals of Database Management Systems, we previously explored the concepts of:

● Entity-Relationship Diagram illustration


● Specialization and Generalization.
Data Warehouse

● Imagine you work as a data analyst at DMart and are responsible for three
departments: marketing, sales, and finance. Each department has its own database,
which is an organized collection of data. Because each department queries its own
database, the same question can produce different answers in different departments.
The larger DMart's operations become, the more serious this inconsistency gets.

● To solve this issue, a data warehouse can provide a unified version of facts and
truth. It acts as a central repository for all the data of the entire organization. This
helps in creating a single source of truth for data analytics. In this section, let's
understand what a data warehouse is and how it can aid data analytics for
companies.
Data Warehouse

● The concept of a data warehouse is similar to that of a physical warehouse. The
similarities between the two can be seen in the following comparison table:
Data Warehouse

● The characteristics of a data warehouse include being


○ subject-oriented,
○ integrated,
○ non-volatile, and
○ time variant.

● It is organized around a specific subject area (subject-oriented) and consolidates
data from multiple sources into a single, consistent form (integrated).

● Additionally, data stored in a data warehouse is not modified once it has been loaded
(non-volatile), and it retains historical information over time (time variant).
Data Warehouse

The following diagram depicts the process of capturing data in an organization.


Data Warehouse

● The diagram displays the following steps in the data warehouse process:

● Data is gathered from multiple sources, such as databases or Excel files.

● The data undergoes an Extract, Transform, Load (ETL) process, which consists of:

○ Extracting information from the sources


○ Transforming the data with necessary functions for loading into the final
destination
○ Loading the data into the central repository, known as the data warehouse.

● The stored data can be utilized for further processing using Online Analytical
Processing (OLAP) or for creating reports using tools such as Tableau or PowerBI.
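
● The ETL steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the source records, the `sales` table, and the function names are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical source records, standing in for rows pulled from
# departmental databases or Excel files.
sales_source = [
    {"order_id": 1, "amount": "250.00", "city": " mumbai "},
    {"order_id": 2, "amount": "99.50", "city": "Pune"},
]


def extract(source):
    """Extract: read the raw records from a source."""
    return list(source)


def transform(rows):
    """Transform: convert types and normalise text before loading."""
    return [
        (row["order_id"], float(row["amount"]), row["city"].strip().title())
        for row in rows
    ]


def load(conn, rows):
    """Load: write the cleaned rows into the central repository."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (order_id INTEGER, amount REAL, city TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
load(warehouse, transform(extract(sales_source)))

loaded_rows = warehouse.execute(
    "SELECT order_id, amount, city FROM sales ORDER BY order_id"
).fetchall()
print(loaded_rows)
```

Keeping the three stages as separate functions mirrors the Extract, Transform, Load split, so each stage can be changed or tested on its own.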
Data Warehouse
● In data warehousing, the dimensional model is used to improve data retrieval
speed. This model has two components: Facts and Dimensions.

● Facts refer to numerical information, while dimensions are metadata that provide
context to the facts. Both are crucial in generating meaningful insights from data.

● Before constructing a database, it is important to create a clear Entity Relationship
Diagram (ERD) that outlines the schema of the database.

● This diagram acts as a map, enabling us to visualize the database structure and
answer the following questions:

○ What tables does it include?


○ What are the columns for each table?
○ What are the data types and constraints (if any) for each column?
○ What is the relationship between the various tables?
Poll Question
Q. Which of the following is not a characteristic of a data warehouse?

a) Time variant
b) Volatile
c) Disintegrated
d) Subject Oriented
Entity Relationship Diagram (ERD)

● An Entity Relationship Diagram (ERD) is a graphical representation of entities and
their relationships to each other.

● It is similar to a map that shows the relationships between different parts of a
geographical region.

● In a database context, an ERD can provide an overview of the connections between
tables and entities in a database.

● Designing an ERD is a crucial step in data modeling. Before creating the tables, an
ERD is typically created to visualize the relationships between entities in a
database.
Entity Relationship Diagram (ERD)
● Now, let's examine the ERD and understand the symbols used. These symbols
have been described in the accompanying image.
Entity Relationship Diagram (ERD)

● In an ERD, the following notations are used to represent the relationships between
tables:

○ A star symbol indicates a primary key in a table.


○ A connecting line between two tables signifies that there is a relationship
between them.
○ A line connecting a table back to itself represents a self-referencing
relationship.

● A representation of the various types of cardinalities that can occur in a
relationship can be seen in the diagram below.
Entity Relationship Diagram (ERD)

● In an ERD, four types of relationships can be identified:

1. One-to-One: This type of relationship exists when a single instance of an entity is
related to only one instance of another entity.

2. One-to-Many: This relationship occurs when a single instance of an entity is
associated with multiple instances of another entity.

3. Many-to-One: This type of relationship exists when multiple instances of an entity
are related to a single instance of another entity.

4. Many-to-Many: This type of relationship exists when multiple instances of an entity
are associated with multiple instances of another entity.
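
● The relationship types above can be sketched as tables. The example below is only an illustration: the `department`, `employee`, `badge`, `project`, and `employee_project` tables are hypothetical, and Python's built-in sqlite3 module is used as a stand-in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One-to-many: one department has many employees.
    -- Read from the employee side, the same link is many-to-one.
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department(dept_id)
    );

    -- One-to-one: UNIQUE on the linking column means each employee
    -- can appear in at most one badge row.
    CREATE TABLE badge (
        badge_id INTEGER PRIMARY KEY,
        emp_id   INTEGER UNIQUE REFERENCES employee(emp_id)
    );

    -- Many-to-many: resolved through a junction table that pairs
    -- employees with projects.
    CREATE TABLE project (project_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE employee_project (
        emp_id     INTEGER REFERENCES employee(emp_id),
        project_id INTEGER REFERENCES project(project_id),
        PRIMARY KEY (emp_id, project_id)
    );

    INSERT INTO department VALUES (1, 'Sales');
    INSERT INTO employee VALUES (10, 'Asha', 1), (11, 'Ravi', 1);
    INSERT INTO project VALUES (100, 'Pricing'), (101, 'Forecast');
    INSERT INTO employee_project VALUES (10, 100), (10, 101), (11, 100);
    """
)

# One department row is linked to several employee rows.
employees_in_sales = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE dept_id = 1"
).fetchone()[0]

# One employee is linked to several projects through the junction table.
projects_for_asha = conn.execute(
    "SELECT COUNT(*) FROM employee_project WHERE emp_id = 10"
).fetchone()[0]

print(employees_in_sales, projects_for_asha)
```

One-to-many and many-to-one describe the same foreign key link read from opposite sides, which is why a single `dept_id` column covers both.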
Star and Snowflake Schemas

● You learned about facts and dimensions, which are the two fundamental
components of dimensional modeling.

● When faced with multiple databases containing numerous variables, it may not be
necessary to include all of them in the data model.

● Therefore, only a specific combination of facts and dimensions is used to create the
structure of the data model, known as a schema diagram.

● A schema diagram is a blueprint of the entire data model and demonstrates the
relationships between various data sets and how the attributes of each data set are
utilized in the database design.

● One type of schema design is called a Star schema, which is characterized by a
central fact table that is connected to multiple dimension tables.
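
● A star schema of this kind can be sketched as follows. The table and column names (`fact_sales`, `dim_product`, `dim_store`) and the sample rows are hypothetical, and sqlite3 is used here as a stand-in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive context for the facts.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);

    -- Central fact table: numeric measures plus one key per dimension.
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        quantity   INTEGER,
        revenue    REAL
    );

    INSERT INTO dim_product VALUES (1, 'Rice'), (2, 'Soap');
    INSERT INTO dim_store VALUES (1, 'Mumbai'), (2, 'Pune');
    INSERT INTO fact_sales VALUES
        (1, 1, 1, 2, 200.0),
        (2, 2, 1, 1, 40.0),
        (3, 1, 2, 5, 500.0);
    """
)

# A single join from the fact table reaches the store dimension.
revenue_by_city = conn.execute(
    """
    SELECT s.city, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_store AS s ON s.store_id = f.store_id
    GROUP BY s.city
    ORDER BY s.city
    """
).fetchall()
print(revenue_by_city)
```

The fact table holds the numbers (quantity, revenue), while each dimension table supplies the context used to group or filter them.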
Star and Snowflake Schemas

● Advantages of Star Schemas include:

○ Simple to comprehend.

○ Straightforward to put into practice.

○ Efficient when it comes to querying, as each dimension table is only one join away from the fact table.

● Let's examine the composition of a Star and Snowflake Schema to have a clearer
understanding.
Star and Snowflake Schemas

● Let's take a closer look at the structure of star and snowflake schemas. The diagram
below illustrates the star-like shape of a star schema.
Star and Snowflake Schemas

● The snowflake schema gets its name from its structure: the dimension tables that
branch off from the central fact table branch out further into related tables, so the
overall shape resembles a snowflake.

● The diagram provided below gives a clearer representation of this design.


Star and Snowflake Schemas

● When working with a snowflake schema, it's common to split up related dimension
information into separate tables.

● This helps in better management of the dimensional table size.

● As an example, in a customer dimension, there may be an attribute for city ID.

● To avoid repeating the same details for every customer, additional city information
such as the city name and postal code can be stored in a separate dimension table.
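
● The customer and city example above can be sketched as two linked dimension tables. The names `dim_customer` and `dim_city` and the sample rows are hypothetical, and sqlite3 is used as a stand-in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- City details are stored once, in their own dimension table.
    CREATE TABLE dim_city (
        city_id     INTEGER PRIMARY KEY,
        city_name   TEXT,
        postal_code TEXT
    );

    -- The customer dimension keeps only the city ID.
    CREATE TABLE dim_customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT,
        city_id     INTEGER REFERENCES dim_city(city_id)
    );

    INSERT INTO dim_city VALUES (1, 'Mumbai', '400001');
    INSERT INTO dim_customer VALUES (1, 'Asha', 1), (2, 'Ravi', 1);
    """
)

# Reaching the city name now takes one extra join through dim_city.
customer_cities = conn.execute(
    """
    SELECT c.name, ci.city_name, ci.postal_code
    FROM dim_customer AS c
    JOIN dim_city AS ci ON ci.city_id = c.city_id
    ORDER BY c.customer_id
    """
).fetchall()
print(customer_cities)
```

The trade-off is visible in the query: the customer table stays smaller, but reading the city name costs an additional join compared with a star schema.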
OLAP vs OLTP
● You may have a clear understanding of databases, but it's essential to know how
data warehouses and regular databases differ.

● Now, let's learn how data warehouses differ from regular databases.

● One significant difference between OLTP and OLAP is evident from their names:
OLTP is designed for transactional operations, while OLAP is optimized for
analytical purposes.

● Dimensional modeling and star schema were introduced earlier as crucial elements
in constructing a data warehouse.

● These methods include identifying the variables that can be analyzed and
integrating them with metadata to obtain useful insights.
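
● The contrast between transactional and analytical workloads can be sketched as follows. This is only an illustration on a single hypothetical `orders` table in SQLite; real OLTP and OLAP systems are usually separate, and the example shows only the difference in query shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, city TEXT, amount REAL)"
)

# OLTP-style work: many small transactions, each touching a single row.
with conn:
    conn.execute("INSERT INTO orders VALUES (1, 'Mumbai', 120.0)")
with conn:
    conn.execute("INSERT INTO orders VALUES (2, 'Pune', 80.0)")
with conn:
    conn.execute("UPDATE orders SET amount = 100.0 WHERE order_id = 2")
with conn:
    conn.execute("INSERT INTO orders VALUES (3, 'Mumbai', 30.0)")

# OLAP-style work: one read that scans and aggregates many rows.
totals_by_city = conn.execute(
    "SELECT city, SUM(amount) FROM orders GROUP BY city ORDER BY city"
).fetchall()
print(totals_by_city)
```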
Entity Constraints

● You learned about the ETL process, which is used for transferring data into a
schema and carrying out operations on it.

● But is it possible to add any value to a schema without restrictions, or are there
limitations that maintain the integrity of the schema being established?

● Think about being a data analyst at Uber. The company's data warehouse includes
multiple tables that keep track of different aspects such as the rider's information,
driver's details, the vehicle used and transaction information.

● Each table has several associated attributes that describe it in detail.


Entity Constraints

● Let's take the "Rider" table as an example.

● Suppose you are reviewing the values in a column that records the fare for each
ride. It is reasonable to expect that the fare would not exceed four digits.

● A fare above ₹5,000 would raise questions; it is an anomaly that needs to be
addressed.

● Implementing the appropriate constraints for the appropriate attributes in a table can
prevent such issues.
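
● One way to express the fare rule above is a CHECK constraint, which is an assumption here since the slides do not name the constraint to use. The `ride` table and the ₹5,000 upper limit are hypothetical choices for illustration, and sqlite3 is used as a stand-in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE ride (
        ride_id INTEGER PRIMARY KEY,
        -- Assumed rule: a fare must lie between 0 and 5000.
        fare    REAL NOT NULL CHECK (fare >= 0 AND fare <= 5000)
    )
    """
)
conn.execute("INSERT INTO ride VALUES (1, 350.0)")  # accepted

try:
    conn.execute("INSERT INTO ride VALUES (2, 75000.0)")  # violates the CHECK
    anomaly_rejected = False
except sqlite3.IntegrityError:
    anomaly_rejected = True

stored_rides = conn.execute("SELECT COUNT(*) FROM ride").fetchone()[0]
print(anomaly_rejected, stored_rides)
```

With the constraint in place, the anomalous fare never reaches the table, so it does not have to be cleaned up later.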
Entity Constraints

● MySQL uses constraints to establish rules for the values that can be stored in the
columns of a database.

● This helps maintain data integrity, that is, the accuracy and consistency of the data
stored in the database.

● There are several types of entity constraints, including:

○ Unique: Ensures that each value in the column is unique.
○ Null / Not Null: Specifies whether a column may contain null values.
○ Primary Key: Identifies the column that uniquely identifies each row in a table; it
must be both unique and not null.
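
● The three entity constraints above can be sketched as follows. The `rider` table and its columns are hypothetical, and sqlite3 is used as a stand-in for MySQL; the constraint keywords shown (PRIMARY KEY, UNIQUE, NOT NULL) are written the same way in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE rider (
        rider_id INTEGER PRIMARY KEY,  -- uniquely identifies each row
        email    TEXT UNIQUE,          -- no two riders share an email
        name     TEXT NOT NULL         -- a name must always be supplied
    )
    """
)
conn.execute("INSERT INTO rider VALUES (1, 'asha@example.com', 'Asha')")


def is_rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_key = is_rejected(
    "INSERT INTO rider VALUES (1, 'ravi@example.com', 'Ravi')"
)
duplicate_email = is_rejected(
    "INSERT INTO rider VALUES (2, 'asha@example.com', 'Ravi')"
)
missing_name = is_rejected(
    "INSERT INTO rider VALUES (3, 'ravi@example.com', NULL)"
)
print(duplicate_key, duplicate_email, missing_name)
```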
Referential Constraints

● The next type of constraint is called a referential integrity constraint.

● This type of constraint enforces a relationship between two tables by limiting the
values in one table based on the values in another table.

● For example, consider the tables below.


Referential Constraints

● In a database, referential constraints are used to ensure the consistency of data by
establishing a relationship between two tables.

● One table, called the referencing table, contains a foreign key column that refers to the primary
key column of another table, called the referenced table.

● For example,
○ The 'Orders' table has a foreign key column called 'CustomerID', which refers to the
primary key of the 'Customers' table.
○ By using the foreign key, it is possible to determine which customer placed a particular
order and access all related customer information.

● It's important to note that each table can have only one primary key but multiple foreign keys.

● Before assigning a column as a foreign key, the primary key column in the referenced table
must exist and have no null or duplicate values.
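
● The 'Orders' and 'Customers' example above can be sketched as follows. The column definitions and sample rows are assumptions for illustration, and sqlite3 is used as a stand-in for MySQL. The `PRAGMA foreign_keys = ON` line is specific to SQLite, which does not enforce foreign keys unless they are switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement
conn.executescript(
    """
    -- Referenced table: its primary key must exist first.
    CREATE TABLE Customers (
        CustomerID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL
    );

    -- Referencing table: CustomerID must match an existing customer.
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL,
        FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
    );

    INSERT INTO Customers VALUES (1, 'Asha');
    INSERT INTO Orders VALUES (100, 1);
    """
)

# An order for a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO Orders VALUES (101, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The foreign key lets us find who placed a particular order.
customer_for_order = conn.execute(
    """
    SELECT c.Name
    FROM Orders AS o
    JOIN Customers AS c ON c.CustomerID = o.CustomerID
    WHERE o.OrderID = 100
    """
).fetchone()[0]
print(orphan_rejected, customer_for_order)
```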
Poll Question
Q. What is the purpose of entity constraints in a database management system?

A) To limit the number of values in a column


B) To ensure data is entered in the right format
C) To ensure the accuracy and consistency of the data stored in the database
D) To restrict the amount of data that can be stored in a database
Key Takeaways

● Data Warehouse
● ERD
● Star and Snowflake Schemas
● OLAP vs OLTP
● Entity Constraints
● Referential Constraints



#LifeKoKaroLift

Thank You!
